Tiny Scala API to stream, parse, and cache atom coordinates from Protein Data Bank files. Only ATOM and HETATM records are parsed. Uses ScalaCache as a facade to support virtually any caching backend, such as Ehcache, Redis, Caffeine, or a custom backend.
Provides a model (PdbAtom
), a parser (PdbParser
), and a cache (AtomTin
). There are three SBT subprojects:
- core, which includes only the parser, model, and cache
- caffeinated, which uses Caffeine for in-memory caching (via
CaffeinatedAtomTin
) - pickled, which uses Pickling for on-disk serialization (via
PickledAtomTin
)
Simple example of using pickled and synchronous lookups:
val tin = new PickledAtomTin() // directory for serialization defaults to ~/atom-tin-cache
val atoms: TraversableOnce[PdbAtom] = tin.loadAndWait("1hiv") // max wait defaults to infinite
Or the same with caffeinated:
val tin = new CaffeinatedAtomTin(_.maximumSize(100)) // alter Caffeine defaults for maximumSize
val atoms: TraversableOnce[PdbAtom] = tin.loadAndWait("1hiv")
To use asynchronous lookups, use:
val atoms: Future[TraversableOnce[PdbAtom]] = tin.load("1hiv")
Here's a more complex example that prints the coordinates of Arginines for multiple PDB structures asynchronously:
def printArginines(pdbId: String) = {
tin.load(pdbId) map {
atoms => atoms filter (a => a.residueName == Right(AminoAcid.Arginine))
map (_.coordinates)
} onSuccess {
case coordinates => println(coordinates)
}
}
Seq("1HIV", "5AYR", "2D26", "3JD6") map printArginines
You can bypass the cache, delete items from the cache, or clear the cache:
AtomTin.download("1hiv") // skips the cache
cache.delete("1hiv") // removes 1hiv from the cache
cache.deleteAll() // clears the cache
- The parser is covered by ScalaCheck property tests. The caching currently lacks tests but seems to work.
AtomTin
and its subclasses require an implicitExecutionContext
.PickledAtomTin
is currently limited to Gzipped JSON.
The software is licensed under the Apache License, Version 2.0 by Douglas Myers-Turnbull.