File types

Peter Kerpedjiev edited this page Mar 21, 2017 · 5 revisions

The HiGlass server is capable of loading tile data from different file types. While they may physically store the data in different formats, they share the capability of being queried for data at a given zoom level and location.

Multires cooler files

Multires cooler files are HDF5 files which store multiple contact matrices binned at different resolutions. Each individual contact matrix is stored using the standard cooler format.

Regular cooler files can be turned into multires files using the cooler coarsegrain command. See the Processing and importing data section of the wiki for more information about the format.

Hitile files

Hitile files store 1D genomic data at multiple resolutions using the HDF5 format. They are created using the clodius package. See the BigWig part of the Processing and importing data section of the wiki for information about creating hitile files.

Contents

At the root level, attributes define metadata about the file. This is perhaps best explained with a chunk of code:

    import math

    import h5py

    # Assumes zoom_step, assembly_size, assembly, chrom_order and tile_size are
    # already defined, and that bwf is an open bigWig file (e.g. from pyBigWig).
    f = h5py.File('file.hitile', 'a')       # open for reading and writing
    d = f['meta']                           # the dataset that carries the metadata attributes
    d.attrs['zoom-step'] = zoom_step        # store every nth aggregation (zoom) level (default: 8)
    d.attrs['max-length'] = assembly_size   # the size of the genome assembly (default: the size of hg19)
    d.attrs['assembly'] = assembly          # the name of the genome assembly (default: hg19)
    # the chromosome names in the assembly (default ['chr1', 'chr2',...]);
    # encoded to bytes because h5py cannot store arrays of Python 3 str
    d.attrs['chrom-names'] = [name.encode('utf-8') for name in bwf.chroms().keys()]
    d.attrs['chrom-sizes'] = list(bwf.chroms().values())  # the sizes of the chromosomes (e.g. [249250621, ...])
    d.attrs['chrom-order'] = chrom_order    # the order in which the chromosomes are stored (default ['chr1'..., 'chrX', 'chrY', 'chrM'])
    d.attrs['tile-size'] = tile_size        # the size of each individual tile (default: 1024)
    d.attrs['max-zoom'] = max_zoom = math.ceil(math.log(d.attrs['max-length'] / tile_size) / math.log(2))
                                            # the maximum zoom level (default: 22)
    d.attrs['max-width'] = tile_size * 2 ** max_zoom
                                            # the maximum width of a tileset with this tile size and maximum zoom
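
As a worked example of the last two attributes, the sketch below recomputes max-zoom and max-width and then derives the genomic range covered by a single tile. The assembly size of 3.1 billion bases is an illustrative round number close to hg19, not a value read from a real file. The helper names (max_zoom_for, tile_range) are hypothetical, as is the assumption that a tile at zoom level z spans max-width / 2^z bases. None of this is taken from the HiGlass or clodius code.

```python
import math


def max_zoom_for(max_length, tile_size=1024):
    """Number of halvings needed before the whole assembly fits in one tile."""
    return math.ceil(math.log(max_length / tile_size) / math.log(2))


def tile_range(zoom, x, max_width):
    """Half-open genomic range [start, end) covered by tile x at a zoom level.

    Assumption: zoom level 0 is a single tile spanning max_width bases, and
    each further zoom level halves the width of a tile.
    """
    width = max_width // 2 ** zoom
    return x * width, (x + 1) * width


assembly_size = 3_100_000_000  # illustrative value, roughly the size of hg19
tile_size = 1024

max_zoom = max_zoom_for(assembly_size, tile_size)
max_width = tile_size * 2 ** max_zoom

print(max_zoom, max_width)                  # -> 22 4294967296
print(tile_range(0, 0, max_width))          # -> (0, 4294967296)
print(tile_range(max_zoom, 1, max_width))   # -> (1024, 2048)
```

Under these assumptions, a tile at the maximum zoom level covers exactly tile-size bases, which is why max-width is tile-size multiplied by 2 to the power of max-zoom.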

Internally, only every zoom-step'th zoom level is stored, and each stored level is one long array.
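
The toy sketch below illustrates the idea of keeping only every zoom-step'th aggregation level. It assumes that values are aggregated by summing neighbouring bins and that levels are counted upwards from the finest resolution. Both are assumptions made for illustration, and the function names are hypothetical. The actual clodius aggregation function and the way it names its datasets may differ.

```python
def aggregate(values, factor):
    """Sum consecutive groups of `factor` values (the last group may be shorter)."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


def stored_levels(values, zoom_step, tile_size):
    """Return {aggregation level: array}, keeping every zoom_step-th level.

    Level 0 is the raw data. Each stored level merges 2 ** zoom_step bins of
    the previously stored level, until the array fits into a single tile.
    """
    levels = {0: list(values)}
    level, current = 0, list(values)
    while len(current) > tile_size:
        current = aggregate(current, 2 ** zoom_step)
        level += zoom_step
        levels[level] = current
    return levels


levels = stored_levels(list(range(16)), zoom_step=2, tile_size=2)
print(sorted(levels))   # -> [0, 2, 4]
print(levels[2])        # -> [6, 22, 38, 54]
print(levels[4])        # -> [120]
```

In this sketch, the levels that are skipped would have to be derived when a tile is requested by aggregating the nearest stored, finer level. The payoff is that far fewer arrays need to be written to the file.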

Size

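Because the data is compressed when it is stored in HDF5, hitile files end up smaller than their bigWig counterparts. In the examples below, the hitile file is between roughly 14% and 72% smaller than the bigWig file it was created from.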

| File | BigWig size | HiTile size | Conversion time (seconds) |
|---|---|---|---|
| wgEncodeSydhTfbsA549CtcfbIggrabSig | 595M | 166M | 480 |
| E116-H3K4me2.fc.signal | 203M | 175M | 455 |
| E004-H3K79me1.fc.signal | 710M | 465M | 577 |