SereneAnt/segystream

SegyStream

Reactive streaming SEG-Y parser

Features

  • Seg-Y version 1 format supported.
  • Supports asynchronous stream processing with non-blocking, adaptive pull/push back pressure, as defined by Reactive Streams.
  • Built with Akka Streams.
  • Contains examples of different use cases: streaming from file source, AWS S3, transformation, visualization, statistics, parallel processing, etc.
  • APIs for both Scala and Java.

Further work

  • Configurable Seg-Y data chunk size and text encoding
  • Add GitHub badges: code coverage, stable version, etc.
  • Add more examples for streaming from file source, S3, transformation, visualization, parallel processing
  • Add benchmarks, taking commonly used Seg-Y parsers as a baseline
  • Add full support for the Seg-Y v1 feature set (variable extended text headers, etc.)
  • Add Seg-Y v2 support
  • Cross-validation against other commonly used Seg-Y parsers

Prerequisites

  • Java 1.8
  • sbt 1.x

How to use

Add dependency:

Sbt

libraryDependencies += "com.github.sereneant.segystream" %% "segystream-core" % "0.1.0"

Maven

<dependency>
    <groupId>com.github.sereneant.segystream</groupId>
    <artifactId>segystream-core_2.12</artifactId>
    <version>0.1.0</version>
</dependency>

Gradle

dependencies {
  compile group: 'com.github.sereneant.segystream', name: 'segystream-core_2.12', version: '0.1.0'
}

The streaming implementation is based on Akka Streams.

Scala

Setup streams:

  import akka.actor.ActorSystem
  import akka.stream.ActorMaterializer

  implicit val system: ActorSystem = ActorSystem("segystream-examples")
  implicit val mat: ActorMaterializer = ActorMaterializer()

Construct a stream blueprint from a Seg-Y file or another byte source (S3, HDFS, etc.):

  val segySource: Source[SegyPart, Future[SegyHeaders]] = fileSource.viaMat(SegyFlow())(Keep.right)
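The snippet above leaves `fileSource` undefined. A minimal sketch of one way to create it, assuming that `SegyFlow` consumes a plain Akka Streams source of `ByteString` (an assumption, not confirmed here) and using a placeholder file name:

```scala
import java.nio.file.Paths

import akka.stream.IOResult
import akka.stream.scaladsl.{FileIO, Source}
import akka.util.ByteString

import scala.concurrent.Future

// Assumption: fileSource is an ordinary Akka Streams byte source.
// The file name is a placeholder; nothing is read until the stream is run.
val fileSource: Source[ByteString, Future[IOResult]] =
  FileIO.fromPath(Paths.get("SegY_file_name.segy"))
```

`FileIO.fromPath` only describes the read; the file is opened when the stream is materialized with `run()`.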

The full spectrum of Alpakka connectors can be used to stream from different sources and to different sinks.

Run the flow, make actions/transformations:

  val done: Future[Done] = segySource
    .map {
      case th: TraceHeader => println(s"Trace Header: ${th.traceSequenceNumberWithinLine}")
      case td: TraceDataChunk => println(s"Trace Data Chunk: length=${td.length}")
      case _ => // NoOp
    }
    .toMat(Sink.ignore)(Keep.right) // keep the Sink's Future[Done], which completes when the stream ends
    .run()

Wait for stream termination, then shut down the actor system:

  implicit val ec: ExecutionContextExecutor = system.dispatcher
  done.onComplete { _ =>
    system.terminate()
    println("Stream completed")
  }
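Simple statistics can be gathered by folding over the stream instead of ignoring its elements. The sketch below is illustrative only: `Part`, `Header` and `DataChunk` are hypothetical stand-ins for the library's `SegyPart`, `TraceHeader` and `TraceDataChunk`, and the in-memory source replaces a real `segySource`, so the block runs with nothing but Akka Streams on the classpath.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source

import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

// Hypothetical stand-in types; the real hierarchy comes from segystream.
sealed trait Part
final case class Header(traceNumber: Int) extends Part
final case class DataChunk(length: Int) extends Part

implicit val system: ActorSystem = ActorSystem("stats-sketch")
implicit val mat: ActorMaterializer = ActorMaterializer()

// In-memory source standing in for segySource: two traces, 2500 data bytes.
val parts: Source[Part, NotUsed] = Source(List[Part](
  Header(1), DataChunk(1024), DataChunk(452),
  Header(2), DataChunk(1024)
))

// Fold over the stream: count trace headers and sum the data chunk lengths.
val stats: Future[(Int, Long)] = parts.runFold((0, 0L)) {
  case ((traces, bytes), _: Header)      => (traces + 1, bytes)
  case ((traces, bytes), DataChunk(len)) => (traces, bytes + len)
}

// Blocking is acceptable in a short demo; prefer onComplete in real code.
val (traceCount, byteCount) = Await.result(stats, 5.seconds)
println(s"traces=$traceCount, bytes=$byteCount")
system.terminate()
```

Because the fold keeps only a running total, memory use stays constant regardless of file size.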

Scala examples

Java

The full power of Akka Streams is available in Java as well.

Java Examples

Configuration

Trace data in the Seg-Y stream is split into chunks of configurable length; the default is 1024 bytes.
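To make the chunking concrete, here is a plain Scala illustration of splitting a trace payload into fixed-size chunks. It does not use the library and is not its internal implementation; the 2500-byte payload is made up for the example.

```scala
// Illustration only: how a payload divides into chunks of the default size.
val dataChunkSize = 1024 // bytes, the documented default

// Made-up trace payload of 2500 zero bytes.
val traceData: Array[Byte] = Array.fill(2500)(0.toByte)

// grouped yields full chunks followed by one shorter remainder chunk.
val chunks: List[Array[Byte]] = traceData.grouped(dataChunkSize).toList

println(chunks.map(_.length)) // List(1024, 1024, 452)
```

The last chunk of a trace is shorter whenever the payload size is not a multiple of the chunk size.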

Custom configuration can be passed to SegyFlow constructor:

import java.nio.charset.Charset

val segyFlow = new SegyFlow(SegyConfig(
  charset = Charset.forName("CP037"), // textual data charset (EBCDIC)
  dataChunkSize = 1024 // bytes
))
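The `charset` setting matters because CP037 is an EBCDIC code page, whose byte values differ from ASCII. A small JVM-only sketch of the difference, using a made-up header line and no library calls:

```scala
import java.nio.charset.{Charset, StandardCharsets}

val ebcdic: Charset = Charset.forName("CP037")

// Made-up header text, encoded as it would be stored in EBCDIC.
val encoded: Array[Byte] = "C 1 CLIENT".getBytes(ebcdic)

// Decoding with the matching charset recovers the text.
val decoded: String = new String(encoded, ebcdic)

// Decoding the same bytes as ASCII does not: 'C' is 0xC3 in CP037, not 0x43.
val misread: String = new String(encoded, StandardCharsets.US_ASCII)

println(decoded)            // C 1 CLIENT
println(decoded == misread) // false
```

Files whose textual headers are stored in ASCII would need a different `charset` value.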

Building from sources

sbt package

Publish to local repository

Ivy

sbt publishLocal

Maven

sbt publishM2

Running tests

sbt test

Running benchmarks

TBD

Running examples

Examples are located in the examples folder.

sbt "examples/runMain com.github.sereneant.segystrem.examples.CollectSegyStats SegY_file_name.segy"

Known Issues

  • Parser does not support variable extended text headers.
  • Parser does not support Data Sample Format Code 4 (4-byte fixed-point with gain, obsolete).

Contributing

Contributions are welcome! Please open issues and pull requests on the project's GitHub page.

Please keep the code clean (whatever that means to you) and comply with coding style standards.

Please keep the CHANGELOG.md file up to date; its format is based on Keep a Changelog.

Versioning

SemVer is used as the versioning standard. For the available versions, see the git tags.

License

Licensed under the MIT License - see the LICENSE file.

Acknowledgments

Alternative noteworthy implementations

All references are given in alphabetical order.

Java

Python