Capillaries

Capillaries is a data processing framework that:

addresses scalability issues and manages intermediate data storage, enabling users to concentrate on data transforms and quality control;
bridges the gap between distributed, scalable data processing/integration solutions and the necessity to produce enriched, customer-ready, production-quality, human-curated data within SLA time limits.

Why Capillaries?

	BEFORE	AFTER
Cloud-friendly	Depends	Can be deployed to the cloud within minutes; Docker-ready
Data aggregation	SQL joins	Capillaries lookups in Cassandra + Go expressions (scalability, parallel execution)
Data filtering	SQL queries, custom code	Go expressions (scalability, maintainability)
Data transform	SQL expressions, custom code	Go expressions, Python formulas (parallel execution, maintainability)
Intermediate data storage	Files, relational databases	Cassandra keyspaces and tables created on the fly (scalability, maintainability)
Workflow execution	Shell scripts, custom code, workflow frameworks	RabbitMQ as scheduler, workflow status stored in Cassandra (parallel execution, fault tolerance, incremental computing)
Workflow monitoring and interaction	Custom solutions	Capillaries UI, Toolbelt utility, API, Web API (transparency, operator validation support)
Workflow management	Shell scripts, custom code	Capillaries configuration: script file with DAG, Python formulas

Getting started

On macOS, WSL, or Linux, run the following in a bash shell:

git clone https://github.com/capillariesio/capillaries.git
cd capillaries
./copy_demo_data.sh
docker-compose -p "test_capillaries_containers" up

Wait until all containers are started and Cassandra is fully initialized (it will log something like Created default superuser role 'cassandra'). Now Capillaries is ready to process sample demo input data according to the sample demo scripts (all copied by copy_demo_data.sh above).
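If you prefer to wait for that log line from a script rather than by watching the console, here is a minimal sketch. The helper name wait_for_log_line is hypothetical (not part of Capillaries), and the exact Cassandra message may differ between versions, so treat the search string as an assumption.

```shell
# Hypothetical helper: read log lines from stdin and return 0 as soon as
# one of them contains the fixed string given in $1. Returns non-zero if
# the stream ends without a match. There is no timeout in this sketch.
wait_for_log_line() {
  grep -q -F -- "$1"
}

# Possible usage from a second terminal, in the same directory as
# docker-compose.yml (the message text is taken from the paragraph above):
#   docker-compose -p "test_capillaries_containers" logs -f \
#     | wait_for_log_line "Created default superuser role"
```

Because `docker-compose up` runs in the foreground, the log-following command has to be started from another shell.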

Navigate to http://localhost:8080 to see Capillaries UI.

Start a new Capillaries data processing run by clicking "New run" and providing the following parameters (no tabs or spaces allowed):

Field	Value
Keyspace	portfolio_quicktest
Script URI	/tmp/capi_cfg/portfolio_quicktest/script.json
Script parameters URI	/tmp/capi_cfg/portfolio_quicktest/script_params.json
Start nodes	1_read_accounts,1_read_txns,1_read_period_holdings

Alternatively, you can start a new run with the Capillaries toolbelt by executing the following command on the Docker host machine. It should have the same effect as starting a run from the UI:

docker exec -it capillaries_webapi /usr/local/bin/capitoolbelt start_run -script_file=/tmp/capi_cfg/portfolio_quicktest/script.json -params_file=/tmp/capi_cfg/portfolio_quicktest/script_params.json -keyspace=portfolio_quicktest -start_nodes=1_read_accounts,1_read_txns,1_read_period_holdings
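The long command above is easier to read and reuse when its repeated parts are factored out. The sketch below only prints the command; it does not run it. The helper name print_start_run_cmd is hypothetical, and it assumes the configuration directory under /tmp/capi_cfg is named after the keyspace, which holds for portfolio_quicktest but is not guaranteed for other scripts. It also rejects tabs and spaces, mirroring the restriction on the UI fields.

```shell
# Hypothetical helper: print (do not execute) the toolbelt start_run command.
# $1 = keyspace, $2 = comma-separated start nodes.
# Assumption: config files live in /tmp/capi_cfg/<keyspace>/.
print_start_run_cmd() {
  keyspace="$1"
  start_nodes="$2"
  # Refuse parameters containing tabs, spaces, or other whitespace.
  case "$keyspace$start_nodes" in
    *[[:space:]]*)
      echo "parameters must not contain whitespace" >&2
      return 1
      ;;
  esac
  cfg_dir="/tmp/capi_cfg/$keyspace"
  printf '%s\n' "docker exec -it capillaries_webapi /usr/local/bin/capitoolbelt start_run -script_file=$cfg_dir/script.json -params_file=$cfg_dir/script_params.json -keyspace=$keyspace -start_nodes=$start_nodes"
}

# Possible usage:
#   print_start_run_cmd portfolio_quicktest 1_read_accounts,1_read_txns,1_read_period_holdings
```

Printing the command first makes it easy to review before pasting it into the shell.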

Watch the progress in Capillaries UI. A new keyspace portfolio_quicktest will appear in the keyspace list. Click on it and watch the run complete. Nodes 7_file_account_period_sector_perf and 7_file_account_year_perf should produce result files:

cat /tmp/capi_out/portfolio_quicktest/account_period_sector_perf.csv
cat /tmp/capi_out/portfolio_quicktest/account_year_perf.csv
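To confirm from a script that both result files were actually produced, a small check like the following may help. The helper name check_result_files is hypothetical; it only verifies that each file exists and is non-empty, not that its contents are correct.

```shell
# Hypothetical helper: succeed only if every path given as an argument
# is an existing regular file with a size greater than zero.
check_result_files() {
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  return 0
}

# Possible usage once the run has completed:
#   check_result_files \
#     /tmp/capi_out/portfolio_quicktest/account_period_sector_perf.csv \
#     /tmp/capi_out/portfolio_quicktest/account_year_perf.csv
```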

Log messages generated by:

Capillaries Daemon
Capillaries WebAPI
Capillaries UI
RabbitMQ
Cassandra

are collected by fluentd and saved in /tmp/capi_log.

For more details about getting started, see Getting started. For more details about this particular demo, see Capillaries blog: Use Capillaries to calculate ARK portfolio performance. To learn how this demo runs on a bigger dataset with 14 million transactions, see Capillaries: ARK portfolio performance calculation at scale.

Capillaries in depth

What it is and what it is not (use case discussion and diagrams)

Getting started (run a quick Docker-based demo without compiling a single line of code)
