Prototype ingestor-less replication-less architecture #3192

Open
3 tasks
petethepig opened this issue Apr 10, 2024 · 0 comments
Labels
backend Mostly go code

petethepig commented Apr 10, 2024

Is your feature request related to a problem? Please describe.

At large scale, Pyroscope currently struggles with a few issues:

  • compaction
    • uses a lot of resources, particularly RAM
      • this is mostly because data must be deduplicated, and the duplicates are caused by replication
  • replication (x3)
    • uses a lot of resources (CPU, disk, RAM)
    • is hard to maintain
    • leads to complex bugs
    • reduces read performance because data must be deduplicated
  • reads and writes are not isolated, so a spike in reads often affects writes
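To illustrate why replication forces deduplication, here is a minimal sketch in Go. It is not Pyroscope's actual code: the `Sample` type, its `ID` field, and `mergeReplicas` are hypothetical names used only for illustration. With replication x3 the same profile can come back from up to three replicas, so anything that merges their results has to remember every ID it has already seen, and that set is where the extra memory goes.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical, simplified profile record. ID stands in for
// whatever uniquely identifies a profile in the real system.
type Sample struct {
	ID    string
	Value int64
}

// mergeReplicas merges the results returned by several replicas and keeps
// each ID only once. The "seen" set grows with the number of distinct
// samples, which is the memory cost that replication adds to merging.
func mergeReplicas(replicas ...[]Sample) []Sample {
	seen := make(map[string]struct{})
	var out []Sample
	for _, replica := range replicas {
		for _, s := range replica {
			if _, dup := seen[s.ID]; dup {
				continue // already returned by another replica
			}
			seen[s.ID] = struct{}{}
			out = append(out, s)
		}
	}
	// Sort for a deterministic result.
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	a := []Sample{{ID: "p1", Value: 10}, {ID: "p2", Value: 20}}
	b := []Sample{{ID: "p1", Value: 10}, {ID: "p2", Value: 20}}
	c := []Sample{{ID: "p2", Value: 20}, {ID: "p3", Value: 30}}

	merged := mergeReplicas(a, b, c)
	fmt.Println(len(merged)) // prints 3: six stored samples, three distinct
}
```

Without replication every sample is stored once, so a merge like this could simply concatenate its inputs and drop the `seen` set.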

Describe the solution you'd like

We could do the following:

  • remove ingesters
  • make it so that distributors create small blocks in memory and flush them to object storage
  • remove deduplication code from read path and compaction
  • tweak compaction to work with an increased number of blocks
  • tweak the read path to work with an increased number of blocks
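The "small blocks in memory, flushed to object storage" step could look roughly like the Go sketch below. Everything in it is an assumption made for illustration: `ObjectStore`, `memStore`, `BlockWriter`, the `blocks/NNNNNN` key scheme, and the size-based flush threshold are hypothetical and not taken from the Pyroscope codebase. Real blocks would also have an actual format rather than being concatenated bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// ObjectStore is a hypothetical stand-in for an object storage client.
type ObjectStore interface {
	Put(key string, data []byte) error
}

// memStore is an in-memory ObjectStore used only for this demonstration.
type memStore struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{objects: make(map[string][]byte)}
}

func (s *memStore) Put(key string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[key] = append([]byte(nil), data...) // copy: the caller reuses its buffer
	return nil
}

// BlockWriter buffers serialized profiles in memory and writes them out as
// one small block once maxBytes is reached.
type BlockWriter struct {
	mu       sync.Mutex
	buf      bytes.Buffer
	store    ObjectStore
	maxBytes int
	seq      int
}

func NewBlockWriter(store ObjectStore, maxBytes int) *BlockWriter {
	return &BlockWriter{store: store, maxBytes: maxBytes}
}

// Append adds one serialized profile and flushes if the buffer is full.
func (w *BlockWriter) Append(profile []byte) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.buf.Write(profile)
	if w.buf.Len() >= w.maxBytes {
		return w.flushLocked()
	}
	return nil
}

// Flush writes out whatever is currently buffered, if anything.
func (w *BlockWriter) Flush() error {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.flushLocked()
}

func (w *BlockWriter) flushLocked() error {
	if w.buf.Len() == 0 {
		return nil
	}
	key := fmt.Sprintf("blocks/%06d", w.seq)
	if err := w.store.Put(key, w.buf.Bytes()); err != nil {
		return err // buffer is kept so a later flush can retry
	}
	w.seq++
	w.buf.Reset()
	return nil
}

func main() {
	store := newMemStore()
	w := NewBlockWriter(store, 8)
	for _, p := range []string{"aaaa", "bbbb", "cc"} {
		if err := w.Append([]byte(p)); err != nil {
			panic(err)
		}
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
	fmt.Println(len(store.objects)) // prints 2
}
```

A real implementation would presumably also flush on a timer so that data becomes queryable within a bounded delay. The threshold and the flush interval together determine how many small blocks compaction and the read path have to handle.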

Concerns / Risks

These are not blockers, but rather things that might derail the project, so we should keep them in mind and address them early:

  • increased object storage write costs
  • reduced query performance due to too many small blocks
  • running into some unforeseen performance limitations of underlying object storage
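A back-of-the-envelope estimate helps size the first two risks. The Go sketch below assumes that every distributor writes exactly one block per flush. The function name and the example inputs (10 distributors, one flush per second) are hypothetical and are not measurements from any real deployment.

```go
package main

import "fmt"

// blocksPerDay estimates how many blocks, and therefore object storage
// write requests, are produced per day, assuming each distributor writes
// exactly one block per flush. Both parameters are hypothetical inputs.
func blocksPerDay(distributors, flushesPerMinute int) int {
	const minutesPerDay = 24 * 60
	return distributors * flushesPerMinute * minutesPerDay
}

func main() {
	// Hypothetical deployment: 10 distributors, one flush per second each.
	fmt.Println(blocksPerDay(10, 60)) // prints 864000
}
```

The same number is both the daily write request count and the number of new small blocks that compaction and queries have to deal with, which is why the flush interval is a key tuning knob.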

Acceptance Criteria

  • it should work with traffic in ops
  • costs should not go up. It is fine if we have to trade reduced ingester costs for increased querier costs, though, because we can address that later
  • performance should not go down

Timeline / Staffing

The good news is that we already have all of the components built for this, so this project is much more about moving things around and tweaking the system than about building new things.

I think if we split up the task, 3 people could get a good working prototype in about 4 weeks. I think @kolesnikovae should lead this project.

Assuming the project succeeds, we could then spend another 4 weeks on migrations and on further tweaking the algorithms to work with the new system.

Outcome

The main thing is that the system becomes simpler. To elaborate on that, these changes would:

  • improve maintainability of the system
  • reduce toil
  • significantly reduce TCO
  • improve read performance

Additional context

All credit for this idea should go to @kolesnikovae — I'm just trying to document the proposed solution. Also, this is somewhat of a meta-issue. I imagine a lot of other existing issues could be steps towards implementation of this project.

@petethepig petethepig added the backend Mostly go code label Apr 10, 2024