Intro

While testing the new schema support available in the Kafka ecosystem and its best practices, more specifically Protobuf, I was surprised not to find open references for implementing personal data protection. Please see the Kafka references and general information for links to the solutions found.

This repo intends to present some GDPR experiments which were not ...

Furthermore, it aims to provide an open space to collaborate on such a complex subject, which has many possible combinations: for example cloud KMS implementations and use cases such as ACLs, across the extensive Kafka ecosystem.

Project Goals

GDPR compliant / right to be forgotten
No deletion, event loss, or data loss of non-personal data
Explicit data classification over implicit encryption (as part of the schema)
Composable with the current Kafka clients / serializers
Composable with different key management systems
Composable with the Kafka ecosystem (could be used directly by the client or by Kafka Connect)
Yet, providing a simple implementation
Composability should enable different ACLs / ways to access data from different consumers
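
The first two goals are commonly reached with crypto-shredding: personal fields are encrypted with a key that belongs to one subject, and "forgetting" the subject means deleting that key while the events themselves stay in the log. The following is only a minimal sketch of that idea under stated assumptions. The class `CryptoShreddingSketch`, its methods, and the in-memory key map (standing in for a real key management system) are hypothetical names chosen for illustration and are not the pi2schema API.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of crypto-shredding; not the pi2schema implementation.
class CryptoShreddingSketch {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Assumption: an in-memory map stands in for a key management system.
    private final Map<String, SecretKey> keysBySubject = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Encrypts personal data with the subject's key; returns IV followed by ciphertext.
    byte[] encrypt(String subjectId, String personalData) {
        try {
            SecretKey key = keysBySubject.computeIfAbsent(subjectId, id -> newKey());
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(personalData.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the plaintext, or empty when the subject's key has been forgotten.
    Optional<String> decrypt(String subjectId, byte[] payload) {
        SecretKey key = keysBySubject.get(subjectId);
        if (key == null) {
            return Optional.empty();
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, payload, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(payload, IV_BYTES, payload.length - IV_BYTES);
            return Optional.of(new String(plain, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Right to be forgotten: drop the key; stored events are left untouched.
    void forget(String subjectId) {
        keysBySubject.remove(subjectId);
    }

    private static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CryptoShreddingSketch sketch = new CryptoShreddingSketch();
        byte[] stored = sketch.encrypt("user-1", "alice@example.com");
        System.out.println(sketch.decrypt("user-1", stored).isPresent());
        sketch.forget("user-1");
        System.out.println(sketch.decrypt("user-1", stored).isPresent());
    }
}
```

Non-personal fields would be written unencrypted next to the encrypted payload, so they remain readable after the key is gone. A production setup would keep the keys in a KMS rather than in process memory.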

Background

Event-driven architectures and their persistence are finally becoming known and becoming the new core.
- The new source of truth
- Streaming platforms with long-term durability rather than data in transit, especially with KIP-405 (tiered storage)
- Streaming platforms extending to provide database-like operations instead of the opposite - LSM ;)
Data governance at the center, with personal data laws (GDPR/LGPD)
- Maturity levels: early, often mixed with bureaucracy and spreadsheets

Challenges

Multiple areas of knowledge:
- Serializers (Avro, Protobuf, JSON Schema, ...)
- Schema registries (Confluent, Apicurio, ...)
- Cryptography / shredding approach
- Multiple KMS implementations (AWS, GCP, ...)

Getting started

Please see the kotlin-springboot code sample and video.

Concepts

The pi2schema project relies on the following three modules/components, which can be composed with one another. They are designed for extensibility and to support multiple cloud providers, encryption mechanisms, and security levels.

Schema

The schema is the central part of the pi2schema solution. All the metadata is intended to be described explicitly and naturally as part of the schema, even if the information itself comes from outside.

The core metadata information to be described in the schema consists of:

Subject Identifier: Identifies which subject the personal data belongs to. It can be, for instance, the user UUID, the user email, or any other identifier.
Personal Information: The data related to the subject identifier which should be protected.
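
To make the two pieces of metadata concrete, here is a minimal sketch of explicit classification expressed with Java field annotations. This is an assumption made purely for illustration: pi2schema describes the metadata in the schema definition itself, and the annotations `@SubjectIdentifier` and `@PersonalData`, the `UserRegistered` event, and the `ClassificationSketch` helper are all hypothetical names, not part of the project.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical marker: the field identifies the subject the data belongs to.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SubjectIdentifier {}

// Hypothetical marker: the field holds personal data that should be protected.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface PersonalData {}

// Hypothetical event mixing personal and non-personal fields.
class UserRegistered {
    @SubjectIdentifier String userId;
    @PersonalData String email;
    @PersonalData String phone;
    String registeredAt; // not personal, stays readable
}

class ClassificationSketch {
    // Returns the names of the fields carrying the given marker, sorted for stable output.
    static List<String> fieldsAnnotatedWith(Class<?> type, Class<? extends Annotation> marker) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(marker)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldsAnnotatedWith(UserRegistered.class, SubjectIdentifier.class));
        System.out.println(fieldsAnnotatedWith(UserRegistered.class, PersonalData.class));
    }
}
```

Because the classification is declared with the data, a serializer can discover which fields to protect and which subject they belong to without any per-topic configuration.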

Although this project started as part of an exploration of the Confluent Protobuf support, the goal is to be extensible to any schema / serialization format. While the intention is to keep the definition and usage as close as possible across the implementations, they will inevitably differ depending on the schema capabilities. Please refer to the specific documentation for details:

Crypto

Application

Next steps

DelegateSecretKey and cloud implementations/providers
Secret key wrapping and ACLs
Multi-language support, similar to librdkafka, implemented in Rust
Extending schema support/vocabulary
