Skip to content

Flexible and efficient data processing engine and an evolution of the popular Scramjet Framework based on node.js. Our Transform Hub was designed specifically for data processing and has its own unique algorithms included.

License

open-pwa/transform-hub

 
 

Repository files navigation

Introduction

This repository is intended for developers that would like to:

  • participate in the Scramjet Transform Hub development community
  • register feature requests, issues and PRs for this product
  • build Transform Hub from source
  • dive into code

For developers that would like to simply use Scramjet Transform Hub to run data processing applications, we recommend following resources:

Scramjet Transform Hub

Scramjet Transform Hub is a deployment and execution platform. Once installed on a server, it will allow you to start your programs and keep them running on a remote machine. You will be able to start programs in the background or connect to them and see their output directly on your terminal. You will be able to pipe your local data to the program as if it was running from your terminal. You can start your server in AWS, Google Cloud or Azure, start it on your local machine, install it on a Rasperry Pi or wherever else you'd like.

There's no limit what you can use it for. You want a stock checker? A chat bot? Maybe you'd like to automate your home? Retrieve sensor data? Maybe you have a lot of data and want to transfer and wrangle it? You have a database of cities and you'd like to enrich your data? You do machine learning and you want to train your set while the data is fetched in real time? Hey, you want to use it for something else and ask us if that's a good use? Ask us via email or hop on our Scramjet Slack!

This is the STH development repo. In order to use it, you need to have linux based os, for instance Ubuntu, docker and node.js v14.x installed. We're working on development guides for Mac and Windows and node.js v16.x compatibility. A docker install will also be shortly available.

TL;DR

First clone build and start the hub, copy the following commands to the terminal:

git clone https://github.com/scramjetorg/transform-hub.git && \
  cd transform-hub && sudo gpasswd -a $USER docker && \
  yarn install && yarn build:all && npm i -g ./dist/cli && yarn start -P 8000

Depending on your machine this may take some time. When it's done the Hub should be running and you should see initial logs showing that the API server has been started on port 8000, something like this:

2021-07-07T18:19:36.808Z info (object:Host) API listening on port: localhost:8000

Now create an application, let's say you want to get the crypto prices every second. In a clean folder save this as index.js:

const { PassThrough } = require("stream");
const fetch = require("node-fetch");

const getData = async (baseCurrency, currency) =>
    fetch(`https://api.coinbase.com/v2/prices/${baseCurrency}-${currency}/spot`)
        .then(res => res.json());

module.exports = async function(_stream, baseCurrency = "BTC", currency = "USD") {
    const outputStream = new PassThrough();

    setInterval(async () => {
        getData(baseCurrency, currency)
            .then(data => {
                outputStream.write(JSON.stringify(data) + "\r\n");
            })
            .catch(() => {
                outputStream.write(JSON.stringify({ error: true }) + "\r\n");
            });
    }, 1000);

    return outputStream;
};

Save this as package.json

{
  "name": "@scramjet/crypto-prices",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "author": "",
  "license": "ISC",
  "repository": {
    "type": "git",
    "url": "https://github.com/scramjetorg/transform-hub.git"
  },
  "dependencies": {
    "node-fetch": "^2.6.1"
  }
}

Open a terminal and run your program on the transform hub:

# install dependencies
npm install

# make a compressed package with sequence
si pack . -o crypto-prices.tar.gz

# send sequence to transform hub, this will output Sequence ID
si seq send crypto-prices.tar.gz

# start a sequence with parameters, this will output Instance ID
si seq start <sequence-id> ETH USD

# See output
si inst output <instance-id>

For more information see si help.

Table of contents

Intro

The readme file contains information about the development process of the STH. It is focused mainly on a day by day commands. Commands won't work as long as you don't set up the environment correctly. You can find setup instructions in the docs.

The basics

Scramjet Transform Hub allows you to deploy and execute programs that you build and develop. As mentioned above, you can run any program you like, but you need to know a couple of important things:

  • The program should consist of a function or an array of functions, such a program is called a Transform Sequence.
  • The sequence will be executed within a separate docker instance (we're working on other execution environment integrations - help will be appreciated).
  • The sequence function will receive a stream as input in the first argument - you can send the data to it via the command si instance input.
  • If your sequence contains more than one function, then the output from the previous one is passed to the next one. The first function in sequence receives the input from API.
  • The last (or the only) function in sequence can return a Promise or a Stream - based on this, STH will know when processing is done.
  • Once the returned Promise is resolved, or the Stream is ended, STH will gracefully stop the sequence and remove its container.
  • You can communicate with the server via API, command line client si which we wrote for your convenience.
  • The sequence is called with an AppContext as this, a class that allows you to communicate back from the sequence: send logs, provide health info, send and receive events from the API or CLI.
  • You can run your sequence multiple times with different arguments (like for instance currency tickers with different symbols or sensor data readers for each sensor)
  • The program does not leave your server and doesn't use any external systems. It runs on the server you install the host on.
  • Currently STH supports node.js runner only, we're working on bringing you runners for other languages, with Python and C++ as the first ones.

Some important links:

How to start development

If you want to help out, we're happy to accept your pull requests. Please follow the below information to start development.

git clone git@github.com:scramjetorg/transform-hub.git      # clone the repo
cd transform-hub/                                           # enter the cloned directory
yarn install                                                # install dependencies
yarn build:all                                              # build all packages
                                                            #    -> modules, samples and docker images
yarn global add file:$(pwd)/dist/cli                        # install the cli
yarn packseq                                                # packs the sequencees
yarn start                                                  # start the hub

Now in another window:

si pack /path/to/my/folder -o ~/package.tar.gz # compress the app to a package
SEQ_ID=$(si seq send ~/package.tar.gz)         # upload the package to the server SEQ_ID is now it's id
INT_ID=$(si seq start $SEQ_ID -C "{}" $APIKEY BTC EUR)
                                               # start the program on the host with arguments
si inst stdout $INT_ID                         # see the output from the program.

Start host

Host can be started in multiple ways

yarn start                          # Starts Host after it's been built
node dist/host/bin/start            # This is the same as above
ts-node packages/host/src/bin/start # This starts node from source code

Lerna commands

We use Lerna to control our monorepo. Here's a couple of helpful commands:

lerna create package_name # Add new package:
lerna ls # List all of the public packages in the current Lerna repo:
lerna run [script] # Run an npm script in each package that contains that script.
lerna run --ignore @scramjet/<package_name> <script-name>
    # Run script in all packages excluding one package:
lerna run --ignore @scramjet/<package_name> --ignore @scramjet/<package_name> <script-name>
    # ... or run script excluding more packages
lerna run --scope @scramjet/<package_name> <script-name>
    # Run script only in one package
lerna run --scope @scramjet/<package_name> --scope @scramjet/<package_name> <script-name>
    # Run script in more packages

Clean build

This is how to perform a clean build

yarn install:clean        # this command will perform yarn clean && yarn clean:modules && yarn install at once
yarn build:all-packages   # optionally build:all if you want all dockerfiles.

Docker commands

During development some artifact may be left over in docker, here's how to clean them

docker ps                      # list containers
docker volume prune -f         # remove all unused local volumes
docker system prune --all -f   # remove all unused images not just dangling ones
docker stop $(docker ps -a -q) # stops all running containers

*(-f) - don't prompt confirmation

Install Host and execute

After build is done, you can install and run Hub globally:

npm install -g ./dist/hub  # installs packages globally
scramjet-transform-hub     # starts host

You can also install current Hub release from registry:

npm install -g @scramjet/hub
scramjet-transform-hub

Install CLI and execute

In the root folder, after building run the following commands:

npm i -g ./dist/cli # install CLI globally
si help             # show CLI commands

You can also install the package from NPM.

npm i -g @scramjet/cli # install CLI globally
si help                # show CLI commands

HINT: If something goes wrong make clean, install, build.

Build Host on Docker

Build from current source:

cd ./packages/host/
yarn build:docker

Build current release:

cd ./packages/host/
yarn build:docker-release

Run Transform Hub in Docker

cd ./packages/sth/
docker-compose up

# or run in detached mode
docker-compose up -d
docker-compose logs -f

To run Hub without docker-compose:

docker run -d --init \
  --name scramjet-hub \
  -p 8000:8000 \
  -v /tmp/:/tmp/ \
  -v /var/run/docker.sock:/var/run/docker.sock \
  scramjetorg/host:$(jq -r .version < package.json)

Run components

Runner

Starting Runner script: ./packages/runner/src/bin/start-runner.ts

Example of usage:

node dist/runner/bin/start-runner.js sequence-file-path fifo-files-path

Sequences and samples

To run sequence / sample (example Alice), first, you need to install all the dependencies, install and execute host, compress the package, and then you're good to go and use curl commands.

HINT: The following instructions apply to the state of the repository from the release/0.10.

Compress the package

The sequence in a tar.gz file format with package.js (aka package) can be generated in different ways.

Assuming that you have the host running use command:

yarn packseq # this creates tar.gz for all packages in the repo

When the host is not running you can use a script:

lerna run prepare-sample-tar

To compress specific package use linux tar command:

tar -C /path/to/package/dir czf <package-name.tar.gz> .

"Hello Alice" sample

To execute the sample run the commands listed below from the level of the main folder.

If the sequence is not packed:

lerna run prepare-sample-tar

HINT: remember that to use curl commands host must be running. See how to execute host =>

Now upload the package:

SEQ_ID=$(
    curl -H 'content-type: application/octet-stream' \
    --data-binary '@packages/reference-apps/hello-alice-out.tar.gz' \
    "http://localhost:8000/api/v1/sequence" | jq ".id" -r
)

You can use the following that will build and send any of the reference packages and samples in this repo:

SEQ_ID=$(./scripts/_/upload-sequence packages/reference-apps/hello-alice-out) # -> when you want to upload the package (it will be built if needed)
SEQ_ID=$(./scripts/_/upload-sequence packages/reference-apps/hello-alice-out -r) # -> when you want to upload the package and make sure it's rebuilt
SEQ_ID=$(./scripts/_/upload-sequence dist/my-package.tgz -r) # -> when you want to upload a ready tarball

HINT: INSTANCE_ID and SEQ_ID are shell variables.

Start the sequence and see the output from it.

INSTANCE_ID=$(curl -H "Content-Type: application/json" --data-raw '{"appConfig": {},"args": ["/package/data.json"]}' http://localhost:8000/api/v1/sequence/$SEQ_ID/start | jq ".id" -r)
curl -X GET -H "Content-Type: application/octet-stream" "http://localhost:8000/api/v1/instance/$INSTANCE_ID/stdout"

as a result you should see something like this in the console:

Hello Alice!
Hello Ada!
Hello Aga!
Hello Michał!
Hello Maciek!
Hello Marcin!
Hello Patryk!
Hello Rafał!
Hello Aida!
Hello Basia!
Hello Natalia!
Hello Monika!
Hello Wojtek!
Hello Arek!

after that hit enter and type kill to exit the process:

sequence: kill

See more about streams and curl commands =>

HINT: If something goes wrong run clean, build.

Copy and paste 🤞

yarn clean && yarn build

How to run tests

Make unit and bdd tests via command:

yarn test

It will execute:

yarn test:parallel && yarn test:bdd

Unit tests

yarn test:packages

If you want to run a particular test file, go to directory where the test file is and run command:

npx ava name-of-the-file.spec.ts

If you want to run one particular test in the file, go to directory where the test file is and run command:

npx ava name-of-the-file.spec.ts -m "Name-of-the-unit-test"

If you add -w a the end of the command above the test will run automaticaly after every change you make in the test, eg.:

npx ava runner.spec.ts -m "Stop sequence" -w

BDD tests

The following instructions apply to the state of the repository from the release/0.10. BDD tests are located in a bdd folder, to execute them simply follow the steps below:

  • start with:
yarn clean && yarn install && yarn build:all && yarn packseq

Remeber if you want to test core dump file you must set echo '/cores/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern on your linux machine.

  • execute all bdd test from the command line:
yarn test:bdd
  • or execute a particular bdd scenario by adding the scenario title after a --name flag:
yarn test:bdd --name="Execute example HelloAlice"

When you want to execute a group of tests you can do it using the substring of their name, for example, to execute all E2E tests:

yarn test:bdd --name="E2E"

You can also execute tests based on their tag name, for example:

yarn test:bdd --tags '@ci'

Results of the performed test will be displayed in the console. There is also a report generated in html which illustrates the results in a very user friendly form. Html report is generated every time we run a bdd test, those html's are saved in bdd/reports folder.

In a result of running all the test, both unit and bdd (command: yarn test), Lerna goes through all the packages and runs unit tests and also checks the bdd directory and runs all bdd scenarios.

If you see the error along the way, that means some tests were not passed.

Below you can see an example, which shows the result of all passed unit test in all the packages:

lerna success run Ran npm script 'test' in 17 packages in 26.1s:
lerna success - @scramjet/adapters
lerna success - @scramjet/api-client
lerna success - @scramjet/api-server
lerna success - @scramjet/sth-config
lerna success - @scramjet/host
lerna success - @scramjet/logger
lerna success - @scramjet/model
lerna success - @scramjet/pre-runner
lerna success - @scramjet/runner
lerna success - @scramjet/example
lerna success - @scramjet/example2
lerna success - @scramjet/hello-alice-out
lerna success - @scramjet/supervisor
lerna success - @scramjet/symbols
lerna success - @scramjet/test-ava-ts-node
lerna success - @scramjet/types
Done in 26.66s.

Publishing

To perform full publishing of packages with build and install, perform the following commands:

# <clone>
yarn cache clean           # optional clean cache
yarn install               # install dependencies
yarn build:all-packages    # build all packages
yarn bump:version          # bump version and docker images prior to publishing
yarn bump:postversion      # prepare dist folder, publish packages from dist, push git tags

License and contributions

This project is licensed dual licensed under the AGPL-3.0 and MIT licences. Parts of the project that are linked with your programs are MIT licensed, the rest is AGPL.

We accept valid contributions and we will be publishing a more specific project roadmap so contributors can propose features and also help us implement them. We kindly ask you that contributed commits are Signed-Off git commit --sign-off.

We provide support for contributions via test cases. If you expect a certain type of workflow to be officially supported, please specify and implement a test case in Gherkin format in bdd directory.

Help wanted

The project need's your help! There's lots of work to do and we have a lot of plans. If you want to help and be part of the Scramjet team, please reach out to us, on slack or email us: opensource@scramjet.org.

Donation

Do you like this project? It helped you to reduce time spent on delivering your solution? You are welcome to buy us a coffee ;)

You can sponsor us on github

  • There's also a Paypal donation link if you prefer that: paypal

About

Flexible and efficient data processing engine and an evolution of the popular Scramjet Framework based on node.js. Our Transform Hub was designed specifically for data processing and has its own unique algorithms included.

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Languages

  • TypeScript 80.6%
  • JavaScript 8.8%
  • Gherkin 7.9%
  • Shell 1.4%
  • Other 1.3%