Teraslice processors for working with data stored in files on disk, S3 or HDFS.

file-assets

A set of Teraslice processors for working with data stored in files on disk. The readers utilize the chunked-file-reader module (migrated into this bundle from the Teraslice monorepo) to break data into records.
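
As a rough illustration of what "breaking data into records" involves, the sketch below splits a sequence of text chunks into newline-delimited records, carrying a partial trailing line over into the next chunk. It is not the chunked-file-reader implementation; the `splitChunks` function and the newline delimiter are assumptions made for the example.

```typescript
// Illustrative only: NOT the chunked-file-reader implementation.
// Splits text chunks into delimited records, carrying any partial
// trailing record over to the next chunk.
function splitChunks(chunks: string[], delimiter = '\n'): string[] {
    const records: string[] = [];
    let leftover = '';

    for (const chunk of chunks) {
        const parts = (leftover + chunk).split(delimiter);
        // The last element is either '' (the chunk ended on a delimiter)
        // or an incomplete record that continues in the next chunk.
        leftover = parts.pop() ?? '';
        for (const part of parts) {
            if (part.length > 0) records.push(part);
        }
    }

    // Whatever remains after the final chunk is the last record.
    if (leftover.length > 0) records.push(leftover);
    return records;
}

// The second record is split across the two chunks.
const records = splitChunks(['{"a":1}\n{"b"', ':2}\n{"c":3}']);
console.log(records);
```

The point of the carry-over is that a chunk boundary can fall in the middle of a record, so a record is only emitted once its closing delimiter (or the end of the input) has been seen.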

Since all the readers in this asset bundle use DataEntities, the slice's file path can be retrieved from each record by using something like record.getMetadata('path'). More information about DataEntities can be found here.
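
A minimal sketch of using that metadata downstream, for example to count how many records came from each file. To keep the example self-contained, `FakeEntity` is a hypothetical stand-in that models only the `getMetadata` method mentioned above; it is not the real DataEntity class, and the file paths are made up.

```typescript
// Hypothetical stand-in for a DataEntity: only getMetadata is modeled.
class FakeEntity {
    readonly data: Record<string, unknown>;
    private readonly metadata: Record<string, unknown>;

    constructor(data: Record<string, unknown>, metadata: Record<string, unknown>) {
        this.data = data;
        this.metadata = metadata;
    }

    getMetadata(key: string): unknown {
        return this.metadata[key];
    }
}

// Count records per source file using the 'path' metadata key.
function countByPath(entities: FakeEntity[]): Map<string, number> {
    const counts = new Map<string, number>();
    for (const entity of entities) {
        const path = String(entity.getMetadata('path'));
        counts.set(path, (counts.get(path) ?? 0) + 1);
    }
    return counts;
}

const pathCounts = countByPath([
    new FakeEntity({ id: 1 }, { path: '/data/a.json' }),
    new FakeEntity({ id: 2 }, { path: '/data/a.json' }),
    new FakeEntity({ id: 3 }, { path: '/data/b.json' }),
]);
console.log(pathCounts);
```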

This bundle includes the following processors:

Releases

You can find a list of releases, changes, and pre-built asset bundles here.

Getting Started

This asset bundle requires a running Teraslice cluster. You can find the documentation here.

# Step 1: make sure you have teraslice-cli installed
yarn global add teraslice-cli

# Step 2: deploy the asset bundle to your cluster (replace clusterAlias with your cluster's alias)
teraslice-cli assets deploy clusterAlias terascope/file-assets

Connectors

S3 Connector

Configuration:

The S3 connector configuration, in your Teraslice configuration file, includes the following parameters:

| Configuration | Description | Type | Notes |
| --- | --- | --- | --- |
| endpoint | Target S3 HTTP endpoint, must be URL | String | optional, defaults to http://127.0.0.1:80 |
| accessKeyId | S3 access key ID | String | required |
| secretAccessKey | S3 secret access key | String | required |
| region | AWS Region where bucket is located | String | optional, defaults to us-east-1 |
| maxRetries | Maximum retry attempts | Number | optional, defaults to 3 |
| sslEnabled | Flag to enable/disable SSL communication | Boolean | optional, defaults to true |
| caCertificate | A string containing a single or multiple ca certificates | String | optional, defaults to ' ' |
| certLocation | DEPRECATED - use caCertificate. Location of ssl cert | String | optional, defaults to ' ' |
| forcePathStyle | Whether to force path style URLs for S3 objects | Boolean | optional, defaults to false |
| bucketEndpoint | Whether to use the bucket name as the endpoint for this request | Boolean | optional, defaults to false |

Terafoundation S3 configuration example:

terafoundation:
    connectors:
        s3:
            default:
                endpoint: "http://localhost:9000"
                accessKeyId: "yourId"
                secretAccessKey: "yourPassword"
                forcePathStyle: true
                sslEnabled: true
                caCertificate: |
                    -----BEGIN CERTIFICATE-----
                    MIICGTCCAZ+gAwIBAgIQCeCTZaz32ci5PhwLBCou8zAKBggqhkjOPQQDAzBOMQs
                    ...
                    DXZDjC5Ty3zfDBeWUA==
                    -----END CERTIFICATE-----
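
To make the documented defaults concrete, the sketch below shows how a partial connector configuration could resolve to a full one. It is an illustration of the parameter list only, not the connector's actual code: `withS3Defaults`, `S3_DEFAULTS`, and the type names are hypothetical, and the certificate options are left out for brevity.

```typescript
// Hypothetical types and helper; the connector's real code may differ.
interface S3ConnectorConfig {
    endpoint: string;
    accessKeyId: string;
    secretAccessKey: string;
    region: string;
    maxRetries: number;
    sslEnabled: boolean;
    forcePathStyle: boolean;
    bucketEndpoint: boolean;
}

type RequiredKeys = 'accessKeyId' | 'secretAccessKey';
type S3ConnectorInput =
    Pick<S3ConnectorConfig, RequiredKeys>
    & Partial<Omit<S3ConnectorConfig, RequiredKeys>>;

// Defaults as listed in the configuration parameters above.
const S3_DEFAULTS: Omit<S3ConnectorConfig, RequiredKeys> = {
    endpoint: 'http://127.0.0.1:80',
    region: 'us-east-1',
    maxRetries: 3,
    sslEnabled: true,
    forcePathStyle: false,
    bucketEndpoint: false,
};

function withS3Defaults(input: S3ConnectorInput): S3ConnectorConfig {
    if (!input.accessKeyId || !input.secretAccessKey) {
        throw new Error('accessKeyId and secretAccessKey are required');
    }
    const merged: S3ConnectorConfig = { ...S3_DEFAULTS, ...input };
    // The endpoint must be a URL; the URL constructor throws if it is not.
    new URL(merged.endpoint);
    return merged;
}

// Same values as the YAML example, minus the certificate.
const resolved = withS3Defaults({
    endpoint: 'http://localhost:9000',
    accessKeyId: 'yourId',
    secretAccessKey: 'yourPassword',
    forcePathStyle: true,
});
console.log(resolved);
```

Only `accessKeyId` and `secretAccessKey` have to be supplied; every other field falls back to its documented default unless overridden.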

Development

Tests

Run the file-assets tests

Requirements:

yarn test

Build

Build a compiled asset bundle to deploy to a Teraslice cluster.

Install Teraslice CLI

yarn global add teraslice-cli
teraslice-cli assets build

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT licensed.
