PARANOiD

Pipeline for Automated Read ANalysis Of iCLIP Data

PARANOiD is a versatile software for the fully automated analysis of iCLIP and iCLIP2 data. It covers all steps needed for preprocessing and for determining cross-link locations, plus several additional analyses that detect specific characteristics, e.g. defined distances between cross-link events or binding motifs. Cross-link sites are reported as WIG files, which can be visualized easily, e.g. with IGV, for which a config file is provided. Results are additionally offered as statistical plots for a quick overview, and in standardized bioinformatics file formats or as TSV files for use in further analysis steps.

Overview

Basic usage
Inputs
Parameters
Additional analyses
Outputs

Basic usage

nextflow PARANOiD.nf --reads <reads.fastq> --reference <reference_sequence.fasta> --barcodes <barcodes.tsv>

Inputs

Reads (essential)

Reads generated by iCLIP experiments. Can be provided as one or more files. To provide more than one file, use a glob pattern (wildcards or brace lists) enclosed in quotation marks.
Format: FASTQ

Usage

--reads reads_file.fastq
--reads "reads_{1,2}.fastq"
--reads "*.fastq"

Reference (essential)

File containing the reference to which the reads will be mapped.
Format: FASTA

Usage

--reference reference_file.fasta

Barcodes (essential)

Barcode sequences are used to assign reads to their experiment. The file is provided as a TSV (tab-separated values) file. The first column contains the experiment name and the second the nucleotide sequence of that experiment's barcode. Each line describes one experiment, and the columns are separated by a tab.
Experiment names should follow this scheme:
<experiment_name>_rep_<replicate-number>

Example:

experiment1_rep_1	GCATTG  
experiment1_rep_2	CAGTAA  
experiment1_rep_3	GGCCTA  
experiment2_rep_1	AATCCG  
experiment2_rep_2	CCGTTA  
experiment2_rep_3	GTCATT

Usage

--barcodes barcode_file.tsv
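Checking the barcode file before starting a run can catch formatting mistakes early. The following is a minimal sketch, not part of PARANOiD: the function name `parse_barcodes` is hypothetical, and the restriction of barcodes to the letters A, C, G and T is an assumption that goes beyond what the text above states.

```python
import csv
import io
import re

# Naming scheme taken from the text: <experiment_name>_rep_<replicate-number>
NAME_RE = re.compile(r"^(?P<experiment>.+)_rep_(?P<replicate>\d+)$")
# Assumption: barcodes consist only of the four unambiguous nucleotides.
BARCODE_RE = re.compile(r"^[ACGT]+$")


def parse_barcodes(handle):
    """Return (experiment, replicate, barcode) tuples; raise ValueError on a malformed line."""
    entries = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip empty lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated columns, got {len(fields)}")
        name, barcode = fields[0], fields[1].upper()
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"line {line_no}: name {name!r} does not match <experiment_name>_rep_<n>")
        if BARCODE_RE.match(barcode) is None:
            raise ValueError(f"line {line_no}: barcode {barcode!r} contains non-ACGT characters")
        entries.append((match["experiment"], int(match["replicate"]), barcode))
    return entries


if __name__ == "__main__":
    example = "experiment1_rep_1\tGCATTG\nexperiment1_rep_2\tCAGTAA\n"
    for entry in parse_barcodes(io.StringIO(example)):
        print(entry)
```

For a real file, pass an open file handle instead of the `io.StringIO` object, e.g. `parse_barcodes(open("barcode_file.tsv", newline=""))`.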

Annotation

File containing annotations for the provided reference. Advised when working with splicing-capable organisms. Required for RNA subtype analysis.
Formats: GFF, GTF

Usage

--annotation annotation_file.gff

Parameters

--barcode_pattern

A string that adapts the pipeline to other barcode patterns (the default is the iCLIP2 pattern). Each N represents a position of the random barcode and each X a position of the experimental barcode.
Default: NNNNNXXXXXXNNNN

Usage

Example for iCLIP:

--barcode_pattern NNXXXXNNN

Default:

--barcode_pattern NNNNNXXXXXXNNNN
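To make the N/X notation concrete, the sketch below splits the leading bases of a read according to a pattern. It is an illustration only and not PARANOiD's implementation: the function name `split_barcode` is hypothetical, and it assumes that the barcode sits at the very start of the read.

```python
def split_barcode(read, pattern="NNNNNXXXXXXNNNN"):
    """Split the first len(pattern) bases of a read into random and experimental barcode.

    N positions are collected into the random barcode, X positions into the
    experimental barcode. Assumes the barcode starts at the first base of the read.
    """
    if set(pattern) - {"N", "X"}:
        raise ValueError("pattern may only contain the letters N and X")
    if len(read) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    # zip stops at the end of the pattern, so only the barcode region is inspected
    random_part = "".join(base for base, kind in zip(read, pattern) if kind == "N")
    experimental_part = "".join(base for base, kind in zip(read, pattern) if kind == "X")
    return random_part, experimental_part


if __name__ == "__main__":
    read = "ACGTAGCATTGTTCAGGGG"
    print(split_barcode(read))               # ('ACGTATTCA', 'GCATTG')
    print(split_barcode(read, "NNXXXXNNN"))  # ('ACCAT', 'GTAG')
```

With the default iCLIP2 pattern, the six X positions yield the experimental barcode `GCATTG`, which corresponds to `experiment1_rep_1` in the barcode file example.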

--domain

Selects the mapping tool and enables a splicing-capable one (STAR) if necessary.
Options:
pro -> Bowtie2, for splicing-incapable organisms or spliced transcripts
eu -> STAR, for splicing-capable organisms
Default: pro

Usage

--domain eu

Default:

--domain pro

--output

Path to the output directory. Use it to save outputs to another location.
Default: ./output

Usage

--output /path/to/output

Default:

--output ./output

--min_length

Minimum length a read must have after adapter trimming to be retained. Reads that are cut shorter than this during trimming are removed.
Default: 30

Usage

--min_length 30

--min_qual

Minimum quality a base must have to be retained. Bases below that quality are cut off. Furthermore, reads with a certain percentage of bases below that quality are removed completely (see --min_percent_qual_filter). The value is based on the Phred score:

Quality score	Error	Accuracy
10	10%	90%
20	1%	99%
30	0.1%	99.9%
40	0.01%	99.99%

Default: 20

Usage

--min_qual 20
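The table above follows from the standard Phred definition, in which a quality score Q corresponds to an error probability of 10^(-Q/10). A short sketch that reproduces the table rows (the function name is chosen for illustration only):

```python
def phred_to_error_probability(quality):
    """Convert a Phred quality score into the probability that the base call is wrong."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    for quality in (10, 20, 30, 40):
        error = phred_to_error_probability(quality)
        print(f"Q{quality}: error {error:.2%}, accuracy {1 - error:.2%}")
```

The default of 20 therefore keeps bases with an estimated error probability of at most 1%.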

--min_percent_qual_filter

Minimum percent of bases above the stated quality (see --min_qual) necessary to retain a read after quality filtering.
Default: 90

Usage

--min_percent_qual_filter 90
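The interplay of --min_qual and --min_percent_qual_filter can be sketched as follows. This is a simplified illustration, not the code PARANOiD runs: it assumes Phred+33 encoded quality strings (the common FASTQ encoding), counts a base as passing when its score is at least the minimum quality, and uses a hypothetical function name.

```python
def passes_quality_filter(quality_string, min_qual=20, min_percent=90, offset=33):
    """Return True if enough bases of a read reach the minimum quality.

    quality_string is the FASTQ quality line of one read; offset=33 assumes
    Phred+33 encoding. Empty reads never pass.
    """
    if not quality_string:
        return False
    scores = [ord(char) - offset for char in quality_string]
    good_bases = sum(score >= min_qual for score in scores)
    return 100 * good_bases / len(scores) >= min_percent


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33
    print(passes_quality_filter("IIIIIIIII#"))  # 9 of 10 bases pass: True
    print(passes_quality_filter("IIIIIIII##"))  # 8 of 10 bases pass: False
```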

--barcode_mismatches

Number of mismatches allowed in the experimental barcode sequence when assigning reads to experiments. This makes it possible to still assign a read when a sequencing error occurred in its barcode sequence.
Default: 1

Usage

--barcode_mismatches 1
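Assigning a read despite a sequencing error amounts to comparing the observed barcode with each known barcode and tolerating a limited number of differing positions (the Hamming distance). The sketch below illustrates the idea with barcodes from the example file above. The function names are hypothetical, and leaving a read unassigned when several experiments match equally well is an assumption, not documented PARANOiD behavior.

```python
def hamming_distance(first, second):
    """Count the positions at which two equally long sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(first, second))


def assign_experiment(observed, barcodes, max_mismatches=1):
    """Return the experiment whose barcode matches within max_mismatches.

    barcodes maps experiment names to barcode sequences. Returns None if no
    experiment matches or if more than one matches (assumed handling of ambiguity).
    """
    hits = [
        name
        for name, barcode in barcodes.items()
        if len(barcode) == len(observed)
        and hamming_distance(observed, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    barcodes = {
        "experiment1_rep_1": "GCATTG",
        "experiment1_rep_2": "CAGTAA",
        "experiment1_rep_3": "GGCCTA",
    }
    print(assign_experiment("GCATTC", barcodes))  # one mismatch: experiment1_rep_1
    print(assign_experiment("TTTTTT", barcodes))  # no barcode close enough: None
```

Raising the allowed number of mismatches recovers more reads but increases the risk that a barcode falls within reach of two experiments, so the barcodes in use should differ from each other in more positions than the chosen value.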

--mapq

Usage

--mapq 2

--split_fastq_by

Usage

--split_fastq_by 1000000

Additional analyses

Transcript analysis

--map_to_transcripts

Usage

--map_to_transcripts

--number_top_transcripts

Usage

--number_top_transcripts 10

Peak calling

--omit_peak_calling

Usage

--omit_peak_calling

Merging of replicates

--merge_replicates

Usage

--merge_replicates

RNA subtype distribution

--rna_subtypes

Usage

--rna_subtypes 3_prime_UTR,transcript,5_prime_UTR

--gene_id

Usage

--gene_id ID

--color_barplot

Usage

--color_barplot #69b3a2

Peak distance analysis

--omit_peak_distance

Usage

--omit_peak_distance

--percentile

Shared with sequence extraction

Usage

--percentile 90

--distance

Usage

--distance 50

Sequence extraction

--omit_sequence_extraction

Usage

--omit_sequence_extraction

--percentile

Shared with peak distance analysis

Usage

--percentile 90

--seq_len

Usage

--seq_len 20

--sequence_format_txt

Usage

--sequence_format_txt

TODO: document STREME parameters

params.max_motif_num = 50 // INT max number of motifs to search for
params.min_motif_width = 8 // INT minimum motif width to report, >=3
params.max_motif_width = 15 // INT maximum motif width to report, <= 30
