CZ CELLxGENE Discover


CELLxGENE Discover enables the publication, discovery and exploration of interoperable single-cell datasets. Data contributors can upload, review and publish datasets for private or public use. Via Discover, data consumers are able to discover, download and connect data to visualization tools such as CELLxGENE Explorer to perform further analysis. The goal of Discover is to catalyze distributed collaboration of single-cell research by providing a large, well-labeled repository of interoperable datasets.

Developer Guidelines

FE Setup

Follow instructions in /frontend README

Pre-requisite installations and setups

Note: Before you begin to install any Python packages, make sure you have activated your Python virtual environment.

  1. Install pre-commit: run pre-commit install, or check the doc here.
  2. Set up your machine to work with AWS using the instructions here. Please follow the step 3 AWS CLI access instructions all the way to the bottom so that you are also set up for SSH access. When you run the final command that requires the team's infra repo, use single-cell-infra.
  3. Install jq. If brew is installed, run brew install jq.
  4. Install libpq. If brew is installed, run brew install libpq. This provides the psql command-line utility.
  5. Install chamber. Chamber is a tool for reading secrets stored in AWS Secrets Manager and Parameter Store, and you will need it to run the functional tests below. On Linux, download the latest version (>= 2.9.0) from https://github.com/segmentio/chamber/releases and add it somewhere on your PATH. On Mac, run brew install chamber. A combined install sketch follows this list.
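
As a rough sketch, on macOS with Homebrew the tool installs above (items 1 and 3-5) reduce to the commands below, assuming pre-commit is installed with pip into your activated virtualenv; on Linux, install jq and libpq via your package manager and download chamber from its releases page instead.

```sh
# Install pre-commit and register the repo's git hooks (run inside your activated virtualenv)
pip install pre-commit
pre-commit install

# CLI dependencies: jq for JSON, libpq for psql, chamber for reading AWS secrets
brew install jq libpq chamber
# note: brew's libpq is keg-only, so you may need to add its bin directory to your PATH for psql
```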

Development Quickstart

Note: Make sure your Python virtual environment is activated before going through the development guides.

Once you have completed the pre-requisite steps, you are ready to begin developing for CELLxGENE Discover. As you start to change code, you may want to deploy a test instance of Discover so that you can check how your changes perform. There are three ways to deploy your changes:

  1. Creating a local python-only development environment. This is a minimal environment suitable for running the automated test suite quickly. It also allows you to run individual test cases, which is not possible in the dockerized local deployment environment (see below), since its make commands run test suites in their entirety inside Docker. Skipping Docker also makes for much faster test runs. To set up a python-only local development environment, see this guide.

  2. Creating a local deployment environment. This environment is hosted entirely on your own machine. It relies on Docker to run the CELLxGENE Discover servers, the Discover unit tests, and the infrastructure service dependencies (AWS, Postgres, OIDC), and it is initialized with a small amount of dummy data. This environment is great to have up and running while you are actively developing; a sketch of the day-to-day loop follows this list. See this guide for instructions on how to set up a local deployment.

  3. Creating a remote deployment. This environment creates a lightweight replica of CELLxGENE Discover, hosted on AWS, and provides a more realistic test bed for your changes before you open a PR or try them out with a cross-functional partner. Deploying changes to a remote development environment takes longer, which is why the local deployment is preferred until your changes are ready for broader review. See this guide for instructions on how to set up an rDev environment.
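
For option 2, the typical day-to-day loop with the dockerized environment looks roughly like this; the make targets are the ones referenced in the sections below.

```sh
# Bring up the local Docker environment (servers, Postgres, mock AWS, OIDC)
make local-init

# Iterate on code, then run the unit test suite against the running containers
make unit-test

# Tear the environment down when finished
make local-stop
```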

Common Commands

| Command | Description | Notes |
| --- | --- | --- |
| `make fmt` | Auto-format codebase using black. | This should be run before merging in any changes. |
| `make lint` | Perform lint checks on codebase using flake8. | This should be run before merging in any changes. |
| `make unit-test` | Run all unit tests. | |
| `DEPLOYMENT_STAGE=<deployed_env> python3 -m unittest discover tests/functional/backend` | Run all functional tests. | |
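
A typical pre-PR pass simply chains the first three targets from the table:

```sh
# Format, lint, and run the unit tests before opening a PR
make fmt
make lint
make unit-test
```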

Environment variables

Environment variables are set using the command export <name>=<value>, for example export DEPLOYMENT_STAGE=dev. These variables typically need to be set before you can set up your environments (local, rDev) and before you can successfully run any test suite.

| Name | Description | Values |
| --- | --- | --- |
| `DEPLOYMENT_STAGE` | Specifies an app deployment stage for tasks such as deployments and functional tests. The `test` value implies the local Docker development environment (and should probably be renamed `local`). | `test`, `dev`, `staging`, `prod` |
| `AWS_PROFILE` | Specifies the profile used to interact with AWS resources via the awscli. | `single-cell-dev`, `single-cell-prod` |
| `CORPORA_LOCAL_DEV` | Flag: if this variable is set to any value, the app will look for the database on `localhost:5432` and will use the AWS secret `corpora/backend/${DEPLOYMENT_STAGE}/database`. | Any |
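
For example, a typical export block for working against the dev deployment, using values from the table above (adjust for your own workflow):

```sh
# Target the dev deployment with the matching AWS profile
export DEPLOYMENT_STAGE=dev
export AWS_PROFILE=single-cell-dev

# Optional: tell the app to use a Postgres instance on localhost:5432
export CORPORA_LOCAL_DEV=1
```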

Database Procedures

If you need to make a change to the CELLxGENE Discover database, see CELLxGENE Discover Database Procedures.

Running Unit Tests

  1. Set AWS_PROFILE.
  2. Ensure that you have set up your local development environment per the instructions above and run make local-init to launch a local dev environment.
  3. Run the tests using the command $ make unit-test.
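
Put together, an end-to-end invocation of the steps above (using the single-cell-dev profile from the Environment variables table) looks like:

```sh
# Select an AWS profile, start the local environment, and run the unit tests
export AWS_PROFILE=single-cell-dev
make local-init
make unit-test
```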

Running the WMG pipeline and endpoints locally in an interactive environment

Please take a look at the tutorial notebooks in example_dev_notebooks for examples on how to run the WMG pipeline and endpoints locally for development purposes.
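
As a minimal sketch, assuming Jupyter is installed into your activated virtualenv (the notebooks themselves may document a different setup):

```sh
# Install Jupyter into the active virtualenv and open the tutorial notebooks
pip install jupyter
jupyter notebook example_dev_notebooks/
```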

Troubleshooting Unit Tests

  1. If your unit tests crash with an Error 137, the running Docker containers are using more memory than the Docker application has allocated. First run $ make local-stop to kill all Docker containers and rerun the tests. If that doesn't work, increase the memory allocation in the settings pane of your desktop Docker application; 16GB is a reasonable allocation (see the sketch below).
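
A suggested recovery sequence; docker stats is the standard Docker CLI command for checking container memory use and is only an optional check here:

```sh
# Optionally confirm memory pressure across the running containers
docker stats --no-stream

# Kill the local containers and rerun the tests
make local-stop
make unit-test
```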

Running Functional Tests

  1. Set AWS_PROFILE.
  2. Set DEPLOYMENT_STAGE to the deployed environment you want to run your locally checked-out tests against (dev, staging, or prod).
  3. Run a specific suite of tests using DEPLOYMENT_STAGE=<deployed_env> python3 -m unittest <path_to_functional_test>. For example, DEPLOYMENT_STAGE=dev python3 -m unittest tests/functional/backend/corpora/test_revisions.py
  4. Run all functional tests by using DEPLOYMENT_STAGE=<deployed_env> python3 -m unittest discover tests/functional/backend
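
For example, a run against the dev deployment using the commands from the list above:

```sh
# Target the dev deployment
export AWS_PROFILE=single-cell-dev
export DEPLOYMENT_STAGE=dev

# Run a single functional test module...
python3 -m unittest tests/functional/backend/corpora/test_revisions.py

# ...or discover and run the whole functional suite
python3 -m unittest discover tests/functional/backend
```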

Running e2e Tests

Follow instructions in e2e tests README

Upload processing container

The upload processing container is split into two parts: a base container that provides the R libraries, and the CELLxGENE Discover upload application code that builds on top of it.

Because the base container takes a long time to build and is expected to change infrequently, the container is built separately from the standard release process.

Building the image

The base image is built using GitHub Actions. It is built nightly and whenever the Dockerfile.processing_base file changes.

By default, the CELLxGENE Discover upload application code uses the base image tagged "branch-main" (which the nightly and on-change base image builds reassign).

If a new base image build is needed but the Dockerfile has no functional change (e.g. upstream R library versions have changed), Dockerfile.processing_base can be modified with a non-functional change (e.g. adding a blank line) to force the build.

In the rare event that a new base image needs to be built without GitHub Actions (e.g. GitHub Actions is down), follow the steps in GitHub's documentation for creating a personal access token, then build locally and push like any other Docker image.
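
A minimal sketch of that manual path, assuming the base image lives on GitHub Container Registry; the registry, image name, and exact tag here are placeholders to check against the GitHub Actions workflow:

```sh
# Authenticate to the registry with a personal access token (registry and image name are assumptions)
echo "$GITHUB_TOKEN" | docker login ghcr.io -u <your-github-username> --password-stdin

# Build the base image from its Dockerfile and tag it the way the workflow would
docker build -f Dockerfile.processing_base -t ghcr.io/chanzuckerberg/processing-base:branch-main .

# Push it so application image builds can pull it
docker push ghcr.io/chanzuckerberg/processing-base:branch-main
```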