
Kafka Connect Stack with Integration Tests


TL;DR

If you want to learn how to run Kafka Connect, create a custom SMT, and test that a Connector is working end-to-end in your local environment with automated tests asserting on data in the Database... *gasps for air* 😮... this project is for you; read on...

Tip

Run ./gradlew build to build the project, including running the integration tests. Afterwards you can check the code coverage and container logs; more details about where to find them are below.

Note

The Docker Compose stack used in this project is a variant of the Confluent cp-all-in-one stack found at https://github.com/confluentinc/cp-all-in-one/tree/7.6.0-post


Project purpose

The purpose of this project is to help a developer understand how to use:

  1. Kafka Connect :: Run a Docker Compose stack with all dependent services a Kafka Connect instance needs
  2. Single Message Transforms :: Develop custom Kafka Connect SMTs and build a jar library (a minimal SMT is sketched after this list)
  3. Kafka Connect Docker Image :: Build a Docker image based on confluentinc/cp-server-connect-base
  4. Auto Connector creation :: Add Connector configs to the Docker image; connectors are created at startup by a custom script
  5. JDBCSinkConnector :: Create a Kafka Connect Connector to move data from a Kafka Topic to a Database
  6. Gradle Multi-Project structure :: Separate our SMT library code and the Spring Boot application/tests
  7. Spring Boot and @SpringBootTest :: Produce a Kafka record on a topic and read from a Database with JPA
  8. Integration Tests :: Run a single Gradle command to ensure our Connector/SMTs are working end-to-end
  9. gradle-docker-compose-plugin :: Start and stop the full stack of Docker Compose services using Gradle
  10. JaCoCo Report Aggregation Plugin :: Get code coverage stats from subprojects in a single HTML/XML report
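
To make item 2 concrete, here is a minimal sketch of a custom SMT. The package, class name, and prefix config are hypothetical (this is not one of the SMTs shipped in connect-smt-lib); it only illustrates the org.apache.kafka.connect.transforms.Transformation contract that every custom SMT implements:

```java
package com.example.smt; // hypothetical package, not the one used in connect-smt-lib

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Minimal SMT sketch: routes every record to a prefixed topic.
 * Kafka Connect instantiates it via the "transforms" connector config.
 */
public class TopicPrefixTransform<R extends ConnectRecord<R>> implements Transformation<R> {

    public static final String PREFIX_CONFIG = "prefix";

    private String prefix;

    @Override
    public void configure(Map<String, ?> configs) {
        Object value = configs.get(PREFIX_CONFIG);
        prefix = value == null ? "prefixed." : value.toString();
    }

    @Override
    public R apply(R record) {
        // newRecord() copies the record, changing only the topic name
        return record.newRecord(prefix + record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), record.value(),
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define(PREFIX_CONFIG, ConfigDef.Type.STRING, "prefixed.",
                ConfigDef.Importance.MEDIUM, "Prefix added to the topic name of every record");
    }

    @Override
    public void close() {
        // no resources to release
    }
}
```

A connector config would then enable it with "transforms": "prefix" and "transforms.prefix.type": "com.example.smt.TopicPrefixTransform" (the transform alias and class name here are the hypothetical ones from the sketch).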

Continuous Integration

Circle CI


Every commit pushed to git is built by Circle CI to ensure the integration tests still pass, and SonarCloud analysis is run if possible (see SONAR_TOKEN requirement below). You can see the Circle CI build configuration in the .circleci/config.yml file.

The Gradle dependencies are cached based on a checksum of all the build.gradle files, which can speed up the builds significantly; see more at the Caching dependencies page.

After the build has finished, the test results can be found in the Tests tab of the build job; see more at the Collect test data page.

The test HTML reports, the JaCoCo aggregate report, and the connect container logs can be found in the Artifacts tab of the build job. The log files are useful to check when integration tests fail; see more at the Storing build artifacts page.

SonarCloud


SonarCloud helps to keep the codebase clean by reporting on several metrics such as issues (problems in code where rules are violated), code coverage, and technical debt.

The sonar Gradle task will only run if the SONAR_TOKEN environment variable is present, which makes the build more robust in different environments. Circle CI has the token applied, so it will run there.

Important

If you fork this repo and want to be able to run SonarCloud analysis, you will need to change the projectKey and organization sonar properties to point to your own project; they can be found in the root build.gradle.


Project structure

The code in this project is split into a few Gradle subprojects with dependencies between them. Best practices from the Gradle User Manual have been followed:

| Project | Description |
| --- | --- |
| Root project | Contains the Java Toolchain plugin in settings.gradle, and plugin definitions in build.gradle which are applied to subprojects (this centralises the version numbers). |
| buildSrc | Contains Gradle convention plugins to share build logic between subprojects. |
| connect-smt-lib | Contains Java code and tests for custom SMTs that are exported to a jar library and added to the Kafka Connect Docker image at build time. |
| connect-spring-boot-app | Contains Java code which uses Spring Boot, Spring Kafka, Spring Data JPA, and @SpringBootTest to run end-to-end integration tests, with the help of some Gradle plugins to control the Docker Compose stack. |
| jacoco-report-aggregation | Contains Gradle build logic to run a plugin which aggregates all JaCoCo execution data and creates a consolidated HTML/XML report with coverage data for all subprojects. |
| connect-connector-configs | Contains Kafka Connect connector configurations in JSON format. Each JSON file is processed by the Kafka Connect container at startup to create a connector automatically once the REST API is ready to receive requests. |
| connect-scripts | Contains custom bash scripts to start the Connect service with the Confluent startup script, wait for the REST API to be available, then create the connectors automatically (a Java sketch of the equivalent REST call follows this table). |
| db | Contains SQL scripts run by the postgres service on startup; we usually add CREATE TABLE definitions in these scripts. |
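
The startup scripts themselves are bash, but the effect of their last step is easy to picture: each JSON file in connect-connector-configs becomes a request to the Kafka Connect REST API on port 8083. A rough Java equivalent, assuming a hypothetical file that contains only the connector's config map (files that instead use the POST /connectors envelope with name/config keys would be POSTed to /connectors):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of what the startup script does for each connector config file:
// PUT the JSON config to the Connect REST API, which creates or updates the connector.
public class ConnectorCreator {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String connectorName = "my-jdbc-sink";          // hypothetical connector name
        Path configFile = Path.of("my-jdbc-sink.json"); // hypothetical config file

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/" + connectorName + "/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(Files.readString(configFile)))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```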

How do I extend this project?

The objective of this project is to allow you to easily add your own SMTs and use them in Connectors.

If you want to do this, follow these steps:

  1. Add a new SMT to the connect-smt-lib module
    • 📝 don't forget to add a unit test too!
  2. Add a new connector config to the connect-connector-configs directory
    • 📝 It will automatically be detected at container startup and the connector will be created
    • 💡 Keep an eye on the container logs at startup to ensure your connector config is valid
    • 💡 Or use the Confluent Control Center UI to add a connector by uploading the JSON file
  3. Add an AVRO schema for the new record/topic here
  4. Add an AVRO JSON data file for the new record/topic here
  5. If you are adding a new JDBCSinkConnector
    1. Add a new Database table definition to the db directory
    2. Add a new Entity for the new DB Table here (a minimal Entity/Repository sketch follows this list)
    3. Add a new JPA Repository for the entity here
  6. Add a new Creator class to map AVRO data to a GenericData.Record based on FFVIIAllyUpdateCreator (also sketched below)
  7. Add a @SpringBootTest integration test for the new connector based on FFVIIAllyUpdateTest
  8. Run the integration tests: ./gradlew integrationTest
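
For steps 5.2 and 5.3, the Entity and Repository are plain Spring Data JPA. A minimal sketch with a hypothetical ally table and fields (the real classes follow the FFVII examples mentioned above):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

import org.springframework.data.jpa.repository.JpaRepository;

// Hypothetical entity mapped to a table created by a script in the db directory
@Entity
@Table(name = "ally")
public class Ally {

    @Id
    private String name;  // primary key column written by the JDBC sink connector

    private Integer hp;

    protected Ally() { }  // no-arg constructor required by JPA

    public String getName() { return name; }

    public Integer getHp() { return hp; }
}

// Hypothetical repository used by the integration test to assert on the sunk data
interface AllyRepository extends JpaRepository<Ally, String> { }
```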

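For step 6, a Creator is essentially a small factory that parses the AVRO schema and fills a GenericData.Record. A hedged sketch where the schema path and field names are hypothetical; see FFVIIAllyUpdateCreator for the real pattern:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

// Hypothetical creator: builds an Avro record matching the schema registered for the topic
public class AllyUpdateCreator {

    private final Schema schema;

    public AllyUpdateCreator() throws IOException {
        // hypothetical classpath location for the .avsc file added in step 3
        try (InputStream in = getClass().getResourceAsStream("/avro/ally-update.avsc")) {
            schema = new Schema.Parser().parse(in);
        }
    }

    public GenericData.Record create(String name, int hp) {
        GenericData.Record record = new GenericData.Record(schema);
        record.put("name", name); // field names must match the AVRO schema
        record.put("hp", hp);
        return record;
    }
}
```
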
Technologies Used

  1. Java 17 - Used by Gradle to build the Java code/tests, and it is the JRE running inside the Connect container
  2. Gradle Wrapper - The project contains a Gradle Wrapper to execute the build
    • The Gradle installation will be downloaded the first time you run the wrapper
  3. Docker & Docker Compose - to build Docker images and run containers in the stack
  4. Postgres Database - to store data from Kafka Connect

Important

Please ensure you have Java 17 set as your JAVA_HOME, otherwise you will get errors like this:

```
FAILURE: Build failed with an exception.

* What went wrong:
A problem occurred configuring root project 'kafka-connect-stack'.
> Could not resolve all artifacts for configuration ':classpath'.
   > Could not resolve org.springframework.boot:spring-boot-gradle-plugin:3.2.4.
     Required by:
         project : > org.springframework.boot:org.springframework.boot.gradle.plugin:3.2.4
      > No matching variant of org.springframework.boot:spring-boot-gradle-plugin:3.2.4 was found.
        The consumer was configured to find a library for use during runtime, compatible with Java 11,
        packaged as a jar, and its dependencies declared externally, as well as attribute
        'org.gradle.plugin.api-version' with value '8.7' but:
          - Variant 'apiElements' declares a library, packaged as a jar, and its dependencies declared externally:
              - Incompatible because this component declares a component for use during compile-time,
                compatible with Java 17 and the consumer needed a component for use during runtime,
                compatible with Java 11
              - Other compatible attribute:
                  - Doesn't say anything about org.gradle.plugin.api-version (required '8.7')
```

Docker Compose stack

The Docker Compose stack used in this project is a variant of the Confluent cp-all-in-one stack found at https://github.com/confluentinc/cp-all-in-one/tree/7.6.0-post

Tip

You can start the stack by running the docker compose command below, but we recommend using the gradle-docker-compose-plugin task ./gradlew intTestComposeUp to start it, as that will also rebuild the Connect image if the SMT library or connector configs have changed in any way.

```
docker compose up -d --build
```

Ports

To connect to services in Docker, refer to the following ports:

| Service | Port | Notes |
| --- | --- | --- |
| ZooKeeper | 2181 | |
| Kafka broker | 9092 | |
| Kafka broker JMX | 9101 | |
| Confluent Schema Registry | 8081 | Schema Registry API Reference |
| Kafka Connect | 8083 | Kafka Connect REST API documentation |
| Confluent Control Center | 9021 | Confluent Control Center documentation |
| ksqlDB | 8088 | |
| Confluent REST Proxy | 8082 | API Reference for Confluent REST Proxy |

The most useful service is the Confluent Control Center, which exposes a graphical user interface at http://localhost:9021/clusters/ to manage various services including:

  • the Kafka Broker
  • Topics (including producing and consuming records)
  • Consumers/Consumer groups
  • Connectors running on the Kafka Connect service
  • Schemas associated with topics (via the Schema Registry under the hood)
  • ksqlDB Streams, Tables, and queries

Integration Tests

An integrationTest Gradle sourceSet has been added so that we can isolate the unit tests from the integration tests (the former complete in a much shorter time when run with the standard ./gradlew test task).

Running the ./gradlew check Gradle task will invoke the integrationTest task, which has the following steps:

  1. bring the Docker Compose stack up in a project named kafka-connect-stack_inttest
  2. run the integration tests (one such test is sketched after this list)
  3. bring the Docker Compose stack down
  4. remove all the containers
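
An integration test in this project's style produces an AVRO record with KafkaTemplate and then polls the database through a JPA repository until the connector has written the row. Below is a hedged sketch reusing the hypothetical Ally types from earlier, and assuming Awaitility (or a similar polling helper) is on the test classpath; see FFVIIAllyUpdateTest for the real version:

```java
import static org.assertj.core.api.Assertions.assertThat;

import java.time.Duration;

import org.apache.avro.generic.GenericData;
import org.awaitility.Awaitility;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.kafka.core.KafkaTemplate;

@SpringBootTest
class AllyUpdateIntegrationTest {

    @Autowired
    private KafkaTemplate<String, GenericData.Record> kafkaTemplate; // assumes an Avro serializer is configured

    @Autowired
    private AllyRepository allyRepository;

    @Test
    void connectorSinksRecordToPostgres() throws Exception {
        // produce a record on the topic the connector is subscribed to (hypothetical topic name)
        GenericData.Record record = new AllyUpdateCreator().create("Cloud", 9999);
        kafkaTemplate.send("ally-update", "Cloud", record).get();

        // poll until the JDBC sink connector has written the row to Postgres
        Awaitility.await().atMost(Duration.ofSeconds(30)).untilAsserted(() ->
                assertThat(allyRepository.findById("Cloud"))
                        .hasValueSatisfying(ally -> assertThat(ally.getHp()).isEqualTo(9999)));
    }
}
```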

Tip

This behaviour can be changed using the various configuration properties of the gradle-docker-compose-plugin, such as stopContainers and removeContainers.

Container logs

After running the integration tests, you can view the logs for all the containers in the connect-spring-boot-app/build directory.

If the integration tests fail 🛠, you might want to check the Kafka Connect container logs for errors on startup when automatic connector creation happens, or when a connector tries to process a record and fails.

Running Integration tests once

As mentioned above, all containers are stopped and removed after the integration tests have finished running, which is useful on a CI server. It is also possible to run the integration tests without stopping and removing all the Docker containers in the stack, which is better suited to local development.

This is useful if you are making frequent changes to the tests, the underlying Spring Boot application, or even the SMTs in the connect-smt-lib library/connector configurations (making changes to the latter will force the connect container to be recreated, which is desirable).

  • To do this, run ./gradlew integrationTestRun

Tip

Exclude the intTestComposeUp task to run the integration tests even faster by not checking if all containers are responding on the correct port before the tests are run.

Skipping this check can be useful if we are only making changes to the tests, and not the SMTs, connector configs or the Kafka Connect Dockerfile.

  • To do this, run ./gradlew integrationTestRun -x intTestComposeUp

Note

Making changes to any SMT library code or connector configs will rebuild the Kafka Connect Docker image via the intTestComposeBuild task, which integrationTestRun depends on.


JaCoCo coverage

After running the ./gradlew check task (included in build), you can find the aggregate report for code in all the subprojects in the jacoco-report-aggregation/build directory.


TODOs

Some outstanding tasks to make the project more complete can be found here
