Utilities

Farley Lai edited this page Apr 30, 2021 · 1 revision

A compilation of useful code snippets for working with ML:

Dataset Archive in Parallel

Parallel compression with symlink dereference

alias tarbz='tar --use-compress-program=lbzip2'
# -h dereferences symlinks; -f must come last in the cluster so that it takes the archive name
tarbz -chf file.tar.bz2 /path/to/dataset
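
When the archive has to be built from Python rather than the shell, the standard library `tarfile` module can do roughly the same job. The following is a minimal sketch under two assumptions: single-threaded bzip2 compression is acceptable (it does not use lbzip2 and is not parallel), and `archive_dataset` is a hypothetical helper name that is not part of ML.

```python
import tarfile
from pathlib import Path


def archive_dataset(src, dest):
    """Write `src` to a bzip2-compressed tar at `dest`, following symlinks.

    Hypothetical helper for illustration; compression is single-threaded.
    """
    src = Path(src)
    # dereference=True stores the files that symlinks point to,
    # which corresponds to tar's -h option.
    with tarfile.open(dest, "w:bz2", dereference=True) as tar:
        tar.add(src, arcname=src.name)
    return Path(dest)
```

For example, `archive_dataset('/path/to/dataset', 'file.tar.bz2')` stores the contents under the top-level name `dataset` inside the archive.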

Dataset Validation and Cleanup

Count files or directories

find data/store/videos/train/*/* -maxdepth 0 -type f | wc -l
find data/store/videos/train/*/* -maxdepth 0 -type d | wc -l
find frames-1fps/val -type d -empty | wc -l
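
The same counts can be gathered from Python when they feed into a validation script. This is a rough sketch using only the standard library; `count_entries` and `empty_dirs` are hypothetical names. It is not an exact equivalent of the `find` commands: `Path.is_file()` and `Path.is_dir()` follow symlinks, and hidden entries may be matched differently than by a shell glob.

```python
import os
from pathlib import Path


def count_entries(root, pattern="*/*"):
    """Return (files, dirs) among the entries matching `pattern` under `root`."""
    files = dirs = 0
    for entry in Path(root).glob(pattern):
        if entry.is_file():
            files += 1
        elif entry.is_dir():
            dirs += 1
    return files, dirs


def empty_dirs(root):
    """Return every directory under `root` (including `root`) with no entries."""
    return sorted(
        Path(current)
        for current, subdirs, filenames in os.walk(root)
        if not subdirs and not filenames
    )
```

For example, `count_entries('data/store/videos/train')` returns the number of files and directories two levels below `train`, and `len(empty_dirs('frames-1fps/val'))` gives the number of empty frame directories.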

Count distinct suffixes

find data/store/videos/train/*/* -type f | perl -ne 'print "$1\n" if m/\.([^.\/]+)$/' | sort -u
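
When the number of files per suffix matters as well as the set of suffixes, a small Python sketch can tally them. `suffix_counts` is a hypothetical helper name, and the sketch relies on `pathlib`'s notion of a suffix, so a dotfile such as `.gitignore` counts as having no suffix.

```python
from collections import Counter
from pathlib import Path


def suffix_counts(root):
    """Count files under `root` by suffix (without the leading dot)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Files without a suffix are skipped, like non-matching lines in the regex.
        if path.is_file() and path.suffix:
            counts[path.suffix[1:]] += 1
    return counts
```

Calling `sorted(suffix_counts('data/store/videos/train'))` yields the distinct suffixes in sorted order, while the counter values show how many files carry each one.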

Download/Upload from/to S3

from ml import hub
path = 'weights/yolov5x-store.pt'
hub.upload_s3(path, 'pretrained', 'detection/yolo')
hub.download_s3('pretrained', 'detection/yolo/yolov5x-store.pt')

Download from Google Drive

from ml import hub
fid = '1I7OjhaomWqd8Quf7o5suwLloRlY0THbp'
path = 'WiderPerson.zip'
hub.download_gdrive(fid, path)