TOSTADAS → Toolkit for Open Sequence Triage, Annotation and DAtabase Submission 🧬 💻

PATHOGEN ANNOTATION AND SUBMISSION PIPELINE

For full documentation on the TOSTADAS pipeline, see our Wiki page: Wiki

Overview

TOSTADAS is designed to fulfill common sequence submission use cases. The tool runs three sub-processes:

Metadata Validation – This workflow checks whether the metadata conforms to NCBI standards and matches the input .fasta file(s).
Gene Annotation – This workflow runs gene annotation on FASTA-formatted genomes using one of three annotation methods: RepeatMasker and Liftoff, VADR, or BAKTA.
Submission – This workflow generates the files and information needed for submission to NCBI and optionally submits them to NCBI.

TOSTADAS is flexible, allowing you to choose which portions of the pipeline to run and which to skip. For example, you can submit .fastq files and metadata without performing gene annotation.

The current distribution has been tested with Pox virus sequences as well as some bacteria. Ongoing development aims to make the pipeline pathogen agnostic.

Environment Setup

For in-depth set-up instructions, follow the Installation Guide in our wiki.

❗ Note: If you are a CDC user, please follow the set-up instructions found here: CDC User Guide

(1) Install Nextflow using Mamba and the Bioconda channel:

If you don't already have Nextflow on your system, there are several ways to install it. With mamba:

mamba install -c bioconda nextflow

❗ Optionally, you may install nextflow without mamba by following the instructions found in the Nextflow Installation Documentation Page: Nextflow Install
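Before moving on, it can help to confirm which of the required tools are already on your PATH. The following is a minimal sketch, not part of the pipeline itself; the helper name `have_cmd` is an illustrative assumption.

```shell
# Return 0 if the given command is available on PATH, 1 otherwise.
# (have_cmd is a hypothetical helper name, not something TOSTADAS provides.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print a status line for each tool this guide relies on.
for tool in mamba nextflow; do
    if have_cmd "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If `nextflow` is reported as missing, install it with the mamba command above or by following the Nextflow installation documentation.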

(2) Clone the repository to your local machine:

git clone https://github.com/CDCgov/tostadas.git
cd tostadas

❗ Note: If you already have mamba or Nextflow installed in your local environment, you may skip the corresponding installation steps.

(3) Create and activate the conda environment:

mamba env create -n tostadas -f environment.yml
conda activate tostadas
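To confirm that the environment is active before running the pipeline, you can inspect the `CONDA_DEFAULT_ENV` variable, which conda sets to the name of the active environment. This is a small sketch; the function name `env_is_active` is an illustrative assumption.

```shell
# Return 0 if the named conda environment is the currently active one.
# conda sets CONDA_DEFAULT_ENV when an environment is activated by name.
# (env_is_active is a hypothetical helper name.)
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active tostadas; then
    echo "tostadas environment is active"
else
    echo "tostadas environment is not active; run: conda activate tostadas"
fi
```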

(4) Test your installation by running one of the following Nextflow commands on the test data:

# for virus reads
nextflow run main.nf -profile test,<singularity/docker/conda> --virus
# for bacterial reads
nextflow run main.nf -profile test,<singularity/docker/conda> --bacteria

The pipeline outputs appear in the test_output folder within the tostadas directory.
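A quick way to check that the test run produced results is to verify that test_output exists and is not empty. The sketch below assumes you run it from the tostadas directory; `dir_has_output` is an illustrative helper name.

```shell
# Return 0 if the directory exists and contains at least one entry.
# (dir_has_output is a hypothetical helper name.)
dir_has_output() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if dir_has_output test_output; then
    echo "test_output contains results:"
    ls test_output
else
    echo "test_output is missing or empty; check .nextflow.log for errors"
fi
```

Nextflow writes `.nextflow.log` to the directory from which the run was launched, so that file is usually the first place to look when the output folder is empty.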

(5) Start running your own analysis

Annotate and submit viral reads

nextflow run main.nf -profile docker --virus --fasta_path <path/to/fasta/files> --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml> --output_dir <path/to/output/dir/>
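Because a mistyped path is a common cause of a failed run, it can be useful to check that the three inputs exist before launching. The following is a sketch only; the function name `check_inputs` and the example paths are hypothetical and should be replaced with your own.

```shell
# Verify that the three pipeline inputs exist before launching.
# Prints each missing path to stderr and returns the number of missing inputs.
# (check_inputs is a hypothetical helper name.)
check_inputs() {
    missing=0
    [ -e "$1" ] || { echo "missing FASTA path: $1" >&2; missing=$((missing + 1)); }
    [ -f "$2" ] || { echo "missing metadata file: $2" >&2; missing=$((missing + 1)); }
    [ -f "$3" ] || { echo "missing submission config: $3" >&2; missing=$((missing + 1)); }
    return "$missing"
}

# Hypothetical example paths; replace with your own.
FASTA_PATH="data/fasta"
META_PATH="data/metadata.xlsx"
SUBMISSION_CONFIG="data/submission_config.yaml"

if check_inputs "$FASTA_PATH" "$META_PATH" "$SUBMISSION_CONFIG"; then
    echo "inputs look OK; ready to run nextflow"
else
    echo "fix the paths listed above before running nextflow"
fi
```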

Annotate and submit bacterial reads

nextflow run main.nf -profile docker --bacteria --fasta_path <path/to/fasta/files> --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml> --download_bakta_db --bakta_db_type <light/full> --output_dir <path/to/output/dir/>
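The bacterial command accepts one of two database types, light or full. If you wrap the command in a script, a small guard like the one below can catch a typo before Nextflow starts. This is a sketch under that assumption; `valid_bakta_db_type` is an illustrative helper name, and the command is only printed, not executed.

```shell
# Return 0 only for the two database types named in the command above.
# (valid_bakta_db_type is a hypothetical helper name.)
valid_bakta_db_type() {
    case "$1" in
        light|full) return 0 ;;
        *) return 1 ;;
    esac
}

BAKTA_DB_TYPE="light"   # example choice; "full" is the other option

if valid_bakta_db_type "$BAKTA_DB_TYPE"; then
    # Print the command for review instead of running it directly.
    echo nextflow run main.nf -profile docker --bacteria \
        --fasta_path "<path/to/fasta/files>" \
        --meta_path "<path/to/metadata_file.xlsx>" \
        --submission_config "<path/to/submission_config.yaml>" \
        --download_bakta_db --bakta_db_type "$BAKTA_DB_TYPE" \
        --output_dir "<path/to/output/dir/>"
else
    echo "bakta_db_type must be 'light' or 'full', got: $BAKTA_DB_TYPE"
fi
```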

Refer to the wiki for more information on input parameters and use cases.

Get in Touch

If you have ideas for improving our existing codebase, feel free to open an Issue Request (found here: Open New Issue).

Steps to Open Issue Request:

(1) Select Appropriate Template

The link above offers four issue templates. Which one to choose depends on (1) whether you are a user or a maintainer/collaborator and (2) whether the request concerns a bug or a feature enhancement. Please select the template that best reflects your situation.

(2) Fill Out Necessary Information

Once you have selected the appropriate template, fill in every field and answer every question it specifies. This information gives us the details and context we need to understand the issue.

(3) Submit the Issue

Once you have provided all the information, submit the issue.

Please allow some turnaround time for us to review the issue and potentially start addressing it. If the request is urgent and you have neither heard from us nor seen any progress after more than a week, feel free to start a discussion (found here: Start New Discussion) mentioning the following:

Issue Number
Date Submitted
General Background on Bug/Feature
Reason for Urgency

We will get back to you as soon as possible.

Acknowledgements

Contributors

Tools

The submission portion of this pipeline was adapted from SeqSender. To find more information on this tool, please refer to their GitHub page: SeqSender.
