Get the DeepLesion CT Image data set into a GCP Storage Bucket

suyashkumar/deeplesion-gcp-loader

DeepLesion GCP Loader

This program fetches, uncompresses, and uploads the DeepLesion dataset of 32,000 CT images into a Google Cloud Storage bucket. Basic usage:

./deeplesion-loader --removeFiles=true --bucketName=my-bucket

This downloads each 4GB zip from the dataset, unzips it, and uploads the images to my-bucket. With removeFiles=true, each zip file is deleted after its contents have been successfully uploaded to GCP.

./deeplesion-loader --bucketName=my-bucket --parallel=true

This runs all file downloads and uploads in parallel, which is much faster but requires more disk space and resources.
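
The difference between the two modes can be sketched in shell. This is an illustration only, not the loader's actual implementation: process_archive is a hypothetical stand-in that echoes each step instead of doing real downloads or uploads, and the archive names are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for the per-archive work; it only echoes each step.
process_archive() {
  echo "download $1"
  echo "unzip $1"
  echo "upload $1"
  # Mirrors --removeFiles=true: the zip is removed after a successful upload.
  echo "remove $1"
}

# Default mode (assumed): archives are handled one after another.
run_sequential() {
  for archive in "$@"; do
    process_archive "$archive"
  done
}

# --parallel=true (assumed behaviour): every archive is started at once,
# so many zips and their extracted images can be on disk at the same time.
run_parallel() {
  for archive in "$@"; do
    process_archive "$archive" &
  done
  wait  # block until every background job has finished
}

run_sequential part_01.zip part_02.zip
run_parallel part_01.zip part_02.zip
```

In the sequential case the steps for one archive finish before the next begins. In the parallel case the lines interleave in no fixed order, which is why that mode needs more disk space and resources.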

Note: The machine running this program must have write access to your GCP bucket (for example, through GCP application default credentials). See "Ensuring GCP Write access" below for details.

General installation and setup

Download the binary for your platform from the releases tab and run it as described above. You can also fetch the binary from the command line:

wget -qO- $BINARY_RELEASE_LINK | tar xvz

where $BINARY_RELEASE_LINK is the link of the download from the releases tab.

Ensuring GCP Write access

The machine this program runs on needs to have write access to your bucket. This can be done in two ways:

  • Ensure application default credentials are set. Running gcloud auth application-default login is usually enough.
  • Or spin up a GCP virtual machine with the "Storage" API permission set to "Read Write". You can do this when creating the VM by clicking "Set access for each API".
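
Before starting a long transfer, it can help to check where application default credentials would be read from. The following is a minimal sketch, assuming the usual Linux/macOS locations used by gcloud: the GOOGLE_APPLICATION_CREDENTIALS environment variable if it is set, otherwise ~/.config/gcloud/application_default_credentials.json. adc_path is a hypothetical helper and not part of the loader.

```shell
#!/bin/sh
# Hypothetical helper: print the file that application default credentials
# would most likely be loaded from (assumes the default gcloud config dir).
adc_path() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    printf '%s\n' "$GOOGLE_APPLICATION_CREDENTIALS"
  else
    printf '%s\n' "${HOME}/.config/gcloud/application_default_credentials.json"
  fi
}

if [ -f "$(adc_path)" ]; then
  echo "credentials file found: $(adc_path)"
else
  echo "no credentials file at $(adc_path)"
fi
```

On a GCP virtual machine, credentials normally come from the instance's service account rather than a local file, so a "no credentials file" result there does not by itself mean the upload will fail.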