OS Platform and Distribution (e.g., Linux Ubuntu 20.0): Mariner 5.15.111.1-1.cm2
JDK version: 17
Describe the problem
Currently, the on-disk state inside the RocksDB folder of a Venice server for a given resource (store version) can contain folders for partitions that are no longer assigned to the host, due to missed "partition drops".
We need to ensure that only the known (metadata ?) or "assigned" partitions are present in the folder for each store_version, and remove the rest promptly in order to recover disk space.
It is NOT sufficient to clean up this data on start without some introspection. Venice relies on delayed rebalance on its controller to avoid unnecessary bootstraps, so a partition that looks unassigned at startup may be reassigned to the same host shortly afterwards. Some examination is therefore needed to determine what the appropriate disk state should be.
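To make the request concrete, here is a minimal sketch of the detection step only, not a proposed implementation. It assumes (hypothetically) that each partition lives in a sub-folder named `<store_version>_<partitionId>` under the store version folder, that the caller can supply the set of currently assigned partition IDs, and that a simple grace period on the folder's last-modified time can stand in for the "introspection" mentioned above. The class name `StalePartitionFinder` and all parameters are illustrative and do not exist in Venice; a real fix would need to consult the controller's delayed-rebalance state rather than a timestamp.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper: lists partition folders that are not assigned to this host.
class StalePartitionFinder {
  /**
   * Returns partition folders under storeVersionDir whose partition ID is not in
   * assignedPartitions and which have not been modified for at least gracePeriod.
   * ASSUMPTION: folders are named "<store_version>_<partitionId>".
   */
  static List<Path> findStalePartitions(
      Path storeVersionDir,
      Set<Integer> assignedPartitions,
      Duration gracePeriod,
      Instant now) throws IOException {
    String prefix = storeVersionDir.getFileName().toString() + "_";
    List<Path> stale = new ArrayList<>();
    try (Stream<Path> children = Files.list(storeVersionDir)) {
      for (Path child : (Iterable<Path>) children::iterator) {
        String name = child.getFileName().toString();
        if (!Files.isDirectory(child) || !name.startsWith(prefix)) {
          continue; // not a partition folder under the assumed naming scheme
        }
        int partitionId;
        try {
          partitionId = Integer.parseInt(name.substring(prefix.length()));
        } catch (NumberFormatException e) {
          continue; // suffix is not a partition number; leave the folder alone
        }
        if (assignedPartitions.contains(partitionId)) {
          continue; // still assigned to this host
        }
        // Grace period: a crude stand-in for checking delayed rebalance.
        Instant lastModified = Files.getLastModifiedTime(child).toInstant();
        if (Duration.between(lastModified, now).compareTo(gracePeriod) >= 0) {
          stale.add(child);
        }
      }
    }
    return stale;
  }
}
```

The sketch only reports candidates; deletion is deliberately left out so that the decision to drop a partition stays with whatever component knows the true assignment.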
Tracking information
No response
Code to reproduce bug
No response
What component(s) does this bug affect?
Controller: This is the control-plane for Venice. Used to create/update/query stores and their metadata.
Router: This is the stateless query-routing layer for serving read requests.
Server: This is the component that persists all the store data.
VenicePushJob: This is the component that pushes derived data from Hadoop to Venice backend.
VenicePulsarSink: This is a Sink connector for Apache Pulsar that pushes data from Pulsar into Venice.
Thin Client: This is a stateless client users use to query Venice Router for reading store data.
Fast Client: This is a stateful client users use to query Venice Server for reading store data.
Da Vinci Client: This is an embedded, stateful client that materializes store data locally.
Alpini: This is the framework that fast-client and routers use to route requests to the storage nodes that have the data.
Samza: This is the library users use to make nearline updates to store data.
Admin Tool: This is the stand-alone client used for ad-hoc operations on Venice.
Scripts: These are the various ops scripts in the repo.
Willingness to contribute
No. I cannot contribute a bug fix at this time.
Venice version
Observed since March in production
System information