add prebuilt index for genome.fa #70

lincoln-harris · 2020-06-08T18:06:15Z

Can we add a prebuilt index for the human genome .gtf / .fa files so that they load much faster?

hg38.fa -> 3 GB
hg38.gtf -> 144 MB

lincoln-harris · 2020-06-08T18:19:29Z

The rate-limiting step here is construction of the genome interval tree, rather than building the genome.fa index. Not sure what to do about this.
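One possible direction, sketched here only as an illustration: build the interval structure once, write it to disk, and reload it on later runs as long as the annotation file has not changed. Everything in this sketch is an assumption, including the names `build_index` and `load_or_build`, the `.index.pkl` cache suffix, and the plain dict of sorted spans that stands in for the real tree. Whether the actual tree object can be serialized this way has not been checked.

```python
import os
import pickle


def build_index(gtf_path):
    """Hypothetical builder: maps chromosome -> sorted list of (start, end)."""
    index = {}
    with open(gtf_path) as handle:
        for line in handle:
            # Skip GTF header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # GTF columns 1, 4 and 5 are seqname, start and end.
            chrom, start, end = fields[0], int(fields[3]), int(fields[4])
            index.setdefault(chrom, []).append((start, end))
    for spans in index.values():
        spans.sort()
    return index


def load_or_build(gtf_path, cache_path=None):
    """Return (index, was_cached); rebuild when the cache is missing or stale."""
    cache_path = cache_path or gtf_path + ".index.pkl"
    stat = os.stat(gtf_path)
    # Size plus modification time is a cheap staleness check, not a checksum.
    fingerprint = (stat.st_size, stat.st_mtime_ns)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            saved_fingerprint, index = pickle.load(handle)
        if saved_fingerprint == fingerprint:
            return index, True
    index = build_index(gtf_path)
    with open(cache_path, "wb") as handle:
        pickle.dump((fingerprint, index), handle, protocol=pickle.HIGHEST_PROTOCOL)
    return index, False
```

The first call pays the full construction cost and writes the cache; later calls only pay for deserialization. Since `pickle` can execute code on load, a cache like this should only be read if it was written locally.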

rvanheusden · 2020-06-10T22:57:14Z

Have you confirmed that building the interval tree is the main contributor to startup time? If so, it may be worth checking whether most of that time is spent in Python calls or in the low-level C code that NCLS uses for the underlying interval tree implementation. If the C code dominates, optimizing the low-level code may be worthwhile. Because the interval tree only needs to be built once and can then be used from multiple threads, a custom low-level implementation that lets multiple threads build the tree cooperatively might speed up this process.
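A minimal sketch of how that check could be done with the standard library profiler. `build_interval_index` is a hypothetical stand-in for the real construction step (it just sorts the intervals), and `profile_build` is likewise an illustrative name. In the resulting report, time inside compiled code shows up under entries such as `{built-in method ...}`, while time in Python-level functions is attributed to the functions themselves.

```python
import cProfile
import io
import pstats


def build_interval_index(intervals):
    """Hypothetical stand-in for the real tree construction step."""
    # Sorting is only a placeholder workload for the profiler to measure.
    return sorted(intervals)


def profile_build(intervals, top=10):
    """Profile one build call and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(build_interval_index, intervals)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time shows which call paths dominate the build.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    demo = [(i * 7 % 1000, i * 7 % 1000 + 50) for i in range(10_000)]
    _, report = profile_build(demo)
    print(report)
```

Swapping the placeholder for the real build function and comparing the `tottime` of Python functions against the built-in entries should indicate which side to optimize.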

lincoln-harris added the enhancement (New feature or request) label Jun 8, 2020

lincoln-harris changed the title ~~add prebuilt indexes~~ add prebuilt index for genome.fa Jun 8, 2020

lincoln-harris added the wontfix (This will not be worked on) label Jun 9, 2020
