Skip to content

smowton/cufflinks

 
 

Repository files navigation

CUFFLINKS
----------------------------
Cufflinks is a reference-guided assembler for RNA-Seq experiments. It
simultaneously assembles transcripts from reads and estimates their relative
abundances, without using a reference annotation.  The software expects as 
input RNA-Seq read alignments in SAM format (http://samtools.sourceforge.net).

Here's an example spliced read alignment record:
s6.25mer.txt-913508	16	chr1	4482736	255	14M431N11M	*	0	0	CAAGATGCTAGGCAAGTCTTGGAAG	IIIIIIIIIIIIIIIIIIIIIIIII	NM:i:0	XS:A:-

This record includes a custom tag used by Cufflinks to determine the strand 
of the mRNA from which this read originated.  Often, RNA-Seq experiments lose
strand information, and so reads will map to either strand of the genome.  
However, strandedness of spliced alignments can often be inferred from the 
orientation of consensus splice sites, and Cufflinks requires that spliced 
aligments have the custom strand tag XS, which has SAM attribute type "A", and
can take values of "+" or "-".  If your RNA-Seq protocol is strand specific,
including this tag for all alignments, including unspliced alignments, will 
improve the assembly quality.

The SAM records MUST BE SORTED by reference coordinate, like so:

sort -k 3,3 -k 4,4n hits.sam 

The program is fully threaded, and when running with multiple threads, should
be run on a machine with plenty of RAM. 4 GB per thread is probably reasonable
for most experiments.  Since many experiments feature a handful of genes that
are very abundantly transcribed, Cufflinks will spend much of its time 
assembling" a few genes.  When using more than one thread, Cufflinks may 
appear to "hang" while these genes are being assembled.

Cufflinks assumes that library fragment lengths are size selected and normally
distributed. When using paired end RNA-Seq reads, you must take care to supply
Cufflinks with the mean and variance on the inner distances between mate 
pairs. For the moment, Cufflinks doesn't support assembling mixtures of paired
end reads from different fragment size distributions.  Mixtures of single 
ended reads (of varying lengths) with paired ends are supported.

Cufflinks also assumes that the donor does not contain major structural 
variations with respect to the reference.  Tumor genomes are often highly 
rearranged, and while Cufflinks may eventually identify gene fusions and 
gracefully handle genome breakpoints, users are encouraged to be careful when
using Cufflinks on tumor RNA-Seq experiments. 

The full manual may be found at http://cufflinks.cbcb.umd.edu

CUFFCOMPARE
----------------------------
Please see http://cole-trapnell-lab.github.io/cufflinks/manual

REQUIREMENTS
---------------------------

Cufflinks is a standalone tool that requires gcc 4.0 or greater, and runs on
Linux and OS X.  It depends on Boost (http://www.boost.org) version 1.38 or 
higher.

REFERENCES
---------------------------
Cufflinks builds on many ideas, including some
proposed in the following papers:

Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer and Barbara 
Wold, "Mapping and quantifying mammalian transcriptomes by RNA-Seq",Nature 
Methods, volume 5, 621 - 628 (2008)

Hui Jiang and Wing Hung Wong, "Statistical Inferences for isoform expression", 
Bioinformatics, 2009 25(8):1026-1032

Nicholas Eriksson, Lior Pachter, Yumi Mitsuya, Soo-Yon Rhee, Chunlin Wang, 
Baback Gharizadeh, Mostafa Ronaghi, Robert W. Shafer, Niko Beerenwinkel, "Viral 
population estimation using pyrosequencing", PLoS Computational Biology, 
4(5):e1000074

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 86.6%
  • C 12.5%
  • Other 0.9%