
[jaeger-v2] Storage backend integration tests #5254

Open
4 of 5 tasks
james-ryans opened this issue Mar 6, 2024 · 14 comments

@james-ryans
Contributor

james-ryans commented Mar 6, 2024

Requirement

Since the Jaeger storage extension for Jaeger-v2 is going to fully support Jaeger-v1's storage backends, unit tests on each storage backend are not enough. We need to conduct end-to-end tests of the OpenTelemetry Collector pipeline against the target database.

Problem

There are currently no integration tests that verify traces are actually stored in the database by the v2 Jaeger storage extension.

Proposal

Fortunately, the OpenTelemetry Collector already provides a testbed framework to help us conduct end-to-end tests.

Testbed is a controlled environment and set of tools for conducting end-to-end tests of the OpenTelemetry Collector, including reproducible short-term benchmarks, correctness tests, long-running stability tests, and maximum-load stress tests. However, we will only use the correctness tests from the testbed: they generate and send traces covering combinations of trace attributes and match each one against the traces received at the other end.

Architecture of the integration test

Here's the architecture we will use to test the OpenTelemetry Collector pipeline from end-to-end with the designated storage backends.
(diagram: jaeger-v2-testbed architecture)
Testbed components:

  • LoadGenerator - encapsulates a DataProvider and a DataSender in order to generate and send data.
    • Golden DataProvider - generates traces from a "Golden" dataset produced using pairwise combinatorial testing techniques. The testbed example uses PICT to generate the test data, e.g. testdata.
    • OTLP Trace DataSender - sends the traces generated by the DataProvider to the OTLP receiver in the collector instance.
  • MockBackend - encapsulates a DataReceiver and provides consume functionality.
    • DataReceiver - we will create a custom DataReceiver that hosts a Jaeger storage extension and retrieves traces from the database by pulling them through our artificial Jaeger storage receiver ([jaeger-v2] Add support for artificial jaeger storage receiver #5242).
    • Consumer - not actually a component of MockBackend; it is shown only to make the diagram intuitive. The traces received by our artificial receiver are stored inside MockBackend.
  • Correctness Test Validator - checks that the traces received by MockBackend all match the traces generated by the DataProvider.
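At its core, the Correctness Test Validator step described above is a set-matching problem: every generated trace must show up in the set received from the MockBackend. A minimal sketch of that idea, using a hypothetical `validate` helper keyed on trace IDs (not the testbed's actual API):

```go
package main

import "fmt"

// validate returns the IDs of traces that were sent into the pipeline
// but never arrived at the MockBackend. In the real testbed the
// comparison is richer (attributes, not just IDs); this is simplified.
func validate(sent, received []string) []string {
	got := make(map[string]bool, len(received))
	for _, id := range received {
		got[id] = true
	}
	var missing []string
	for _, id := range sent {
		if !got[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	sent := []string{"trace-1", "trace-2", "trace-3"}
	received := []string{"trace-3", "trace-1"}
	fmt.Println(validate(sent, received)) // trace-2 was lost in the pipeline
}
```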

Plan

The integration tests will be executed incrementally, one supported storage backend at a time:

Open questions

No response

@pavolloffay pavolloffay added the v2 label Mar 13, 2024
yurishkuro pushed a commit that referenced this issue Mar 16, 2024
## Which problem is this PR solving?
- Resolves GRPC integration test sub-task at #5254

## Description of the changes
- Created a `grpc-integration-test.sh` script to run a jaeger remote
storage and execute the end-to-end test through the OpenTelemetry
Collector pipeline, with the jaeger storage extension inside connected to
the remote storage. For a visualization of this architecture, see
the proposal at #5254
- Separated the GRPC and Badger integration test CI jobs, because GRPC needs to
be run twice, for the v1 and v2 versions.

## How was this change tested?
- Run `./scripts/grpc-integration-test.sh latest` and the whole remote
storage and pipeline will be built and executed for you.
- I also ran a `jaeger-query` component to query traces from the remote
storage for manual checks.
(screenshot: Jaeger UI showing traces queried from the remote storage)

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
@yurishkuro
Member

@james-ryans as I was reviewing the PRs that follow from this issue, I am starting to have some concerns with this approach. Here is the set of requirements that I think we need to meet:

  1. we need to exercise the full pipeline to write data externally and verify that it makes it to the storage
    * (1b) we need to write data in different formats, not just OTLP
  2. we then also need to exercise the querying API
  3. we need to exercise archiving capability
  4. we need to validate that the config files we're providing in cmd/jaeger are valid by doing an e2e smoke test
    * (4b) in v1 we also had some docker-compose files that need to be tested
  5. we need to generate code coverage for some parts of the code that do not get exercised in unit tests (usually related to initializing the storage drivers)
  6. we need to provide a capability for external plugin providers (implementing the gRPC Storage API, such as a Quickwit or Postgres plugin) to also run e2e tests for writing and querying, as a way of certifying compatibility with Jaeger

In the current state:

  • OTEL testbed solves only (1)
  • crossdock tests solve (1b) and partially (4b)
  • Our storage integration tests solve (2), (3) and (5), and I think (6)
  • Nothing solves (4)

I think we can solve all 6 requirements by building upon our existing integration tests rather than with the OTEL testbed. Perhaps we can also find a way to utilize the testbed's data generation ability and incorporate it as a step in the overall integration, but by itself I don't see how it can solve all requirements.

  • we can keep integration tests operating as unit tests to address (2), (3), (5)
  • we can abstract how integration tests write and read span data, such that in the unit test mode they would call storage API as a library, but in e2e mode they will do the same via RPC requests. This can solve (1) and (4)
    • note 1: right now the testbed-based config for gRPC is different ([jaeger-v2] add Badger storage backend integration test #5281) from the one in cmd/jaeger, but probably artificially; if we run the main config and test writes and reads via RPCs, we don't need to separate Badger storage into another process, it can run in all-in-one mode
    • note 2: when bootstrapping e2e tests, perhaps we can rely on docker-compose files instead of starting docker containers manually, this will help with (4b)
  • we can run unit-test mode and e2e mode in the same workflow so that we don't have to start up storage backends multiple times (expensive for ES/OS/Cassandra)
  • not sure about (6); my guess is it should be able to reuse the grpc-storage tests
  • the crossdock tests have two parts: testing interoperability between SDKs, and exercising the receipt of different formats of data produced by various legacy SDKs. We can retire the former part, but the latter is still useful since we cannot reproduce it locally without depending on deprecated SDKs. If the whole e2e test runs in docker-compose we can find a way to reuse this.

Achieving this will streamline our integration tests by converging onto a single framework, instead of using 3 different ones for bits and pieces. This is probably a large task, so I would like to find a path of incremental improvements that lead us to the overall goal. Let's give it some thought.

@james-ryans
Contributor Author

There are some points that are still ambiguous to me, and I want to clarify them. Right now, I just want to focus on the first three points of your vision and intention:

  • Point 1 states "keep integration tests operating as unit tests". From what I understand, by operating as unit tests you mean implementing it with test cases similar to the StorageIntegration struct in plugin/storage/integration/integration.go. Thus, instead of testing end-to-end, where the only components visible to us are the source (data provider) and the sink (data receiver), with the collector pipeline kept isolated, we would manually read and write span data ourselves by calling storage API library functions.

  • Point 2 states "unit test mode they would call storage API as a library, but in e2e mode they will do the same via RPC requests". I don't quite understand this statement. From my perspective, I think it means that in unit-test mode we create only the storage extension and directly write and read span data through the Writer and Reader in the storage/spanstore/interface.go API, while in e2e mode we create the whole collector pipeline, send traces through the receiver, and validate them by reading with the Reader storage API. Is that correct?

  • With the two points above, both unit-test mode and e2e mode actually write to the storage backends (even though it is called a unit test, I think the "unit" refers solely to the storage extension). So we can start the storage backend once and reuse it.

@yurishkuro
Member

Yes, that is all correct. For instance, with ES, in unit-test mode the test will instantiate es.SpanWriter, and when it calls writer.WriteSpan() it's an in-process call to the ES storage implementation that writes directly to ES. But in e2e mode, a different SpanWriter will be instantiated that executes an OTLP RPC request to the running collector, where it will be accepted by the receiver and written to storage by the exporter.
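The two-mode idea described here can be sketched as a single SpanWriter interface with two implementations; the types below are illustrative stand-ins, not Jaeger's actual spanstore interfaces:

```go
package main

import "fmt"

// SpanWriter is the abstraction the test code programs against; the
// test itself is identical in both modes.
type SpanWriter interface {
	WriteSpan(span string) error
}

// directWriter models unit-test mode: an in-process call straight
// into the storage implementation.
type directWriter struct{ backend *[]string }

func (w directWriter) WriteSpan(span string) error {
	*w.backend = append(*w.backend, span)
	return nil
}

// rpcWriter models e2e mode: the same interface, but the write becomes
// an RPC to a running collector (stubbed here as a function field).
type rpcWriter struct{ send func(string) error }

func (w rpcWriter) WriteSpan(span string) error { return w.send(span) }

func main() {
	var storage []string
	writers := []SpanWriter{
		directWriter{backend: &storage},
		rpcWriter{send: func(s string) error { storage = append(storage, s); return nil }},
	}
	// The same test loop exercises both modes.
	for _, w := range writers {
		_ = w.WriteSpan("span-a")
	}
	fmt.Println(len(storage)) // prints 2
}
```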

@yurishkuro
Member

yurishkuro commented Mar 28, 2024

unit test mode

```mermaid
flowchart LR
    Test -->|writeSpan| SpanWriter
    SpanWriter --> B(StorageBackend)
    Test -->|readSpan| SpanReader
    SpanReader --> B

    subgraph Integration Test Executable
        Test
        SpanWriter
        SpanReader
    end
```

e2e test mode

```mermaid
flowchart LR
    Test -->|writeSpan| SpanWriter
    SpanWriter --> RPCW[RPC_client]
    RPCW --> Receiver
    Receiver --> Exporter
    Exporter --> B(StorageBackend)
    Test -->|readSpan| SpanReader
    SpanReader --> RPCR[RPC_client]
    RPCR --> jaeger_query
    jaeger_query --> B

    subgraph Integration Test Executable
        Test
        SpanWriter
        SpanReader
        RPCW
        RPCR
    end

    subgraph jaeger-v2
        Receiver
        Exporter
        jaeger_query
    end
```

@james-ryans
Contributor Author

I have created an action plan to provide us with a clear, structured pathway so we can execute this in parallel. Thoughts are welcome if my idea doesn't match our vision.

  1. Prototyping the new integration tests

    1. Implement the unit tests that exercise the querying API (2) and archiving (3). The storage driver initialization code (5) should be covered by these tests automatically.


      My thought on how this will be implemented: we only need to pass the config to a setup function that starts the storage extension; within the setup we can retrieve the SpanWriter and SpanReader. Not sure, but we can probably reuse the StorageIntegration module.
Also, I found that the archiving capability is only tested on Elasticsearch storage.

    2. Extend the unit test into an e2e test: instead of starting only the storage extension, use the config file in cmd/jaeger to spawn the whole collector pipeline (4), then implement the SpanWriter and SpanReader to send span data through gRPC requests to the receiver and read it back from jaeger_query (1).

    List of the storage backends that need to be tested:

    • memory
    • gRPC
    • badger
    • cassandra
    • elasticsearch
    • opensearch
  2. Refactoring and an example for external plugin provider.

    1. Refactor the unit test and e2e test to run in the same workflow so they use the same storage backend. We need to be extra careful with previously written data.
    2. Refactor bootstrapping tests to rely on docker-compose files.
    3. Add an example on how to test external plugin providers with our gRPC storage tests.
  3. Add the crossdock tests.

With this, we can prototype the unit-test and e2e-test modes in parallel. However, after the unit test is merged, we need to refactor the e2e test to have a similar structure. Once the unit test and e2e test for one of the storage backends are merged, we can continue working on the other backends. After that, we can do the refactoring and the example from plan 2 in parallel. The last step is to give some thought to how to test interoperability between SDKs and exercise the receipt of different data formats in crossdock fashion.

@james-ryans
Contributor Author

And I'll try to prototype the e2e test for the gRPC storage backend, since @Pushkarm029 is working on the gRPC unit test.

@yurishkuro
Member

@james-ryans a couple of thoughts

  • there is a PR in progress to migrate archive test from ES to the main test suite Move Archive test into shared integration test suite #5207
  • my diagrams only show the extension of the existing /integration/ tests to work in e2e mode. Do you see the benefits of also using OTEL testbed in this setup?

@james-ryans
Contributor Author

Ohh wow, nice.. I overlooked that this task exists. I'll take a look at it.

  • my diagrams only show the extension of the existing /integration/ tests to work in e2e mode. Do you see the benefits of also using OTEL testbed in this setup?

Some components of it might be useful, but we can easily implement them on our own if we want to, probably modifying them for our specific use case. I'm thinking that we should be able to use the OTEL testbed collector (testbed/testbed/in_process_collector.go) to start jaeger-v2.

We could probably also use the OTEL testbed sender component to write the span data through RPC requests. However, I still need to examine it to get a concrete picture. One concern is that the sender lacks the functionality to close the RPC connection.

@yurishkuro
Member

yurishkuro commented Mar 29, 2024

One main difference to me is that our integration tests generate very specific traces and then query for them in very specific ways, to actually exercise the querying capabilities & permutations. But the OTEL testbed just generates a random flood of data and only checks that it all gets through (not even that, as I believe it only checks the IDs). That was really my question - what is the value of such a data source? It's not really fuzz-testing since the data is still hardcoded (just permuted for the load). I could see it potentially being useful for stress testing, but we don't do that today (it would need dedicated HW, not GH runners).

@james-ryans
Contributor Author

james-ryans commented Mar 29, 2024

The sender is just a wrapper around the OTLP exporter: we can call the ConsumeTraces func with our own specific traces, and the sender handles the rest of the RPC requests. The OTEL testbed has data provider and sender components; the data provider is the one that generates random traces and pushes them through the sender. With the sender alone, we should be able to utilize it for our integration tests.

@james-ryans
Contributor Author

@yurishkuro with the new integration requirements, we no longer need to test the collector pipeline with the testbed as I proposed before, right? If so, we can just delete it.

@yurishkuro
Member

I think so, but that was really my question to you - if we used the testbed, what additional aspects or behavior would it be testing?

@james-ryans
Contributor Author

Okay. It doesn't provide any benefit at this point, since all the test cases are already covered by the existing StorageIntegration. But we can reuse some of its components to provide an easier setup for the new integration tests.

yurishkuro added a commit that referenced this issue Apr 9, 2024
## Which problem is this PR solving?
- Part of
#5254 (comment)
solves (1) and (4) for gRPC storage integration test in e2e mode.

## Description of the changes
- Utilizing existing `StorageIntegration` to test the jaeger-v2 OTel
Collector and gRPC storage backend with the provided config file at
`cmd/jaeger/grpc_config.yaml`.
- Creates an abstraction for e2e test mode that initializes the
collector, span writer to the receiver, and span reader from
jaeger_query.

## How was this change tested?
- Run `STORAGE=grpc SPAN_STORAGE_TYPE=memory make
jaeger-v2-storage-integration-test`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
@yurishkuro
Member

Copying from #5355 (comment) - let's add this to the README.

```mermaid
flowchart LR
    Receiver --> Processor
    Processor --> Exporter
    JaegerStorageExtension -->|"(1) get storage"| Exporter
    Exporter -->|"(2) write trace"| Badger

    Badger_e2e_test -->|"(1) POST /purge"| HTTP_endpoint
    JaegerStorageExtension -->|"(2) getStorage()"| HTTP_endpoint
    HTTP_endpoint -.->|"(3) storage.(*Badger).Purge()"| Badger

    subgraph Jaeger Collector
        Receiver
        Processor
        Exporter

        Badger
        BadgerCleanerExtension
        HTTP_endpoint
        subgraph JaegerStorageExtension
            Badger
        end
        subgraph BadgerCleanerExtension
            HTTP_endpoint
        end
    end
```

yurishkuro added a commit that referenced this issue Apr 26, 2024
## Which problem is this PR solving?
- part of #5254
## Description of the changes
- added badger e2e integration test

## How was this change tested?
- tested locally 

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <122517264+akagami-harsh@users.noreply.github.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <github@ysh.us>
varshith257 pushed a commit to varshith257/jaeger that referenced this issue May 3, 2024
yurishkuro pushed a commit that referenced this issue May 3, 2024
## Which problem is this PR solving?
- part of #5254 

## Description of the changes
- Utilizing existing `StorageIntegration` to test the jaeger-v2 OTel
Collector and gRPC storage backend with the provided config file at
`cmd/jaeger/config-elasticsearch.yaml`.

## How was this change tested?
- Start an Elasticsearch or OpenSearch docker instance.
- Run `STORAGE=elasticsearch SPAN_STORAGE_TYPE=elasticsearch make
jaeger-v2-storage-integration-test`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Pushkarm029 added a commit to Pushkarm029/jaeger that referenced this issue May 4, 2024
…ertracing#5345)

yurishkuro added a commit that referenced this issue May 5, 2024
## Which problem is this PR solving?
-  part of #5254 

## Description of the changes
- added cassandra integration tests
- added method to purge cassandra storage

## How was this change tested?
- some tests are failing

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <122517264+akagami-harsh@users.noreply.github.com>
Signed-off-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <github@ysh.us>