Add separate parallel/intensity control for tablets #3792
Also, it might be confusing for the user that the default value of |
We already have enough parameters for the parallel intensity. I think it is better to use the existing ones but adjust automatically internally for the tablet tables when scheduling. It would be a nightmare for users to figure out the exact meaning of --tablet-intensity, --tablet-parallel, --intensity and --parallel. E.g., for tablet tables, if --intensity is 1, we make sure SM sends 1 range per shard when scheduling. |
Since Tablets are new, we will likely need to update the default instantly for tablets in the field while keeping the vNode intensity as is (or vice versa). So, I suggest having different APIs for Tablets and vNodes as long as we have a mixed cluster with both. It does add complexity, but one does not have to set any of these values. The suggested default above makes sense. |
We already have enough parameters for the parallel intensity. I think it is better to use the existing ones but adjust automatically internally for the tablet tables when scheduling. It would be a nightmare for users to figure out the exact meaning of --tablet-intensity, --tablet-parallel, --intensity and --parallel. E.g., for tablet tables, if --intensity is 1, we make sure SM sends 1 range per shard when scheduling.
With --intensity 0, SM still sends max_repair_ranges_in_parallel ranges per shard, resulting in max_repair_ranges_in_parallel * number_of_shards ranges in total. We know which shard owns each range for tablets, so SM could select the correct ranges for each shard easily.
|
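A tiny worked Go sketch of the arithmetic above (the values are illustrative, not scylla's or SM's actual defaults):

```go
package main

import "fmt"

func main() {
	// Illustrative values only, not scylla's or SM's actual defaults.
	maxRepairRangesInParallel := 8
	numberOfShards := 4

	// With --intensity 0, SM keeps max_repair_ranges_in_parallel ranges in
	// flight per shard, so the total number of ranges in flight is:
	fmt.Println(maxRepairRangesInParallel * numberOfShards) // 32
}
```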
If we expose the new options, users will start to use them. It is confusing to set values differently for vnode and tablet tables. In the long term, scylla core is going to schedule and control the intensity internally once we have a built-in repair scheduler, which works better with tablet migration, e.g., no need to disable migration.
|
The reality is these are two different parameters with two different defaults.
+1 |
No. The defaults are the same for both vnode and tablet tables. The meaning of the intensity is also the same: it controls the number of ranges repaired per shard.
The default --intensity 1 still makes sense for tablets; there is no need to change it. The only difference is internal to SM, which sends one range per shard to scylla core (number_of_shards ranges in total), but this does not change the promise that one range per shard is repaired, which is exactly what we did for vnode tables previously.
|
@asias Just to clarify: Is
Also, can you elaborate on this? Does it mean that when choosing ranges for a repair job, SM should do it so that each shard owns |
Keeping in mind that for some time we'll have both tablets and vnodes in the same cluster, what does it actually mean to run with different intensity? How do we prevent one from hurting the other? |
This is correct. The requested range will be worked on in parallel by all shards. It does what --intensity 1 is supposed to control.
Not multiplied by max_repair_ranges_in_parallel. With --intensity 1, SM will find ranges for each shard and send one such range per shard. With tablets, each range will always be owned by one shard. E.g., there are two shards and 8 tablets (8 ranges) r1 to r8, where r1, r2, r3, r4 are owned by shard 0. With --intensity 1, SM will find that r1, r2, r3, r4 belong to shard 0 and the rest belong to shard 1. SM sends one range (out of r1 to r4) to shard 0 and one range (out of r5 to r8) to shard 1 independently. This ensures that, at any point in time, scylla will repair at most one range per shard. This is exactly what --intensity 1 does for vnode. Does this make sense now?
max_repair_ranges_in_parallel is a per-SHARD limit.
See the example above. |
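To make the example above concrete, here is a minimal Go sketch of the grouping idea (types and names are illustrative, not SM's actual code): ranges are grouped by their owning shard and at most one range per shard is picked at a time.

```go
package main

import "fmt"

// Hypothetical tablet range with a known owner shard; SM's real types differ.
type tabletRange struct {
	name  string
	shard int
}

// pickOnePerShard returns at most one range per shard, mirroring the promise
// of --intensity 1: at any point in time each shard repairs at most one range.
func pickOnePerShard(ranges []tabletRange) map[int]tabletRange {
	picked := make(map[int]tabletRange)
	for _, r := range ranges {
		if _, ok := picked[r.shard]; !ok {
			picked[r.shard] = r
		}
	}
	return picked
}

func main() {
	// Two shards, 8 tablets (ranges): r1-r4 owned by shard 0, r5-r8 by shard 1.
	ranges := []tabletRange{
		{"r1", 0}, {"r2", 0}, {"r3", 0}, {"r4", 0},
		{"r5", 1}, {"r6", 1}, {"r7", 1}, {"r8", 1},
	}
	for shard, r := range pickOnePerShard(ranges) {
		fmt.Printf("shard %d repairs %s\n", shard, r.name)
	}
}
```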
I am actually suggesting not to run different intensities for vnode and tablet tables. SM repairs table after table, so they will not hurt each other. |
@asias, after reading the integration-with-SM part of the tablet repair design doc, I wasn't aware that SM is supposed to calculate range ownership and treat intensity as owned ranges per shard. Right now it's not done, but it can be added. Do I understand correctly that it is important to support this? Also, is this range-to-shard ownership calculated based on Murmur3Partitioner arithmetic (like here)? |
But after this explanation, perhaps it's indeed ok to use the same intensity flag for both vnode and tablet tables (but, for tablet tables, send intensity owned ranges per shard). |
No, the tablet token-range-to-shard ownership does not use the vnode algorithms at all. We would need scylla core to expose this information if needed. However, I have an idea to avoid asking SM to be aware of this shard mapping, which simplifies the request logic on the SM side significantly. We could use the ranges_parallelism option we introduced to control the parallelism.
Without knowing which ranges belong to which shard, SM sends all ranges (or a big portion of them) that need repair per repair job, while specifying ranges_parallelism as 1. This will ensure all shards have work to do and each shard will only work on 1 range in parallel. This avoids asking SM to send only 1 token range per repair job in order to implement --intensity 1. |
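A rough Go sketch of this idea, assuming a repair request that accepts a ranges list and the ranges_parallelism option discussed above (parameter names are illustrative, not SM's actual client code):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildRepairQuery sends a large set of ranges in one repair job while capping
// per-shard work with ranges_parallelism=1, so every shard has work to do but
// repairs at most one range at a time. Parameter names are illustrative.
func buildRepairQuery(table string, ranges []string) url.Values {
	q := url.Values{}
	q.Set("columnFamilies", table)
	q.Set("ranges", strings.Join(ranges, ","))
	// SM does not need the tablet range-to-shard mapping to honour --intensity 1.
	q.Set("ranges_parallelism", "1")
	return q
}

func main() {
	q := buildRepairQuery("my_table", []string{"r1", "r2", "r3", "r4"})
	fmt.Println(q.Encode())
}
```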
Yes, the option was introduced exactly for vnode tables in the past. I suggested back then to use ranges_parallelism to implement range parallelism while also sending more ranges in a batch. |
@Michal-Leszczynski FYI. This PR adds ranges_parallelism for tablet. scylladb/scylladb#18385 |
@asias btw, can we expect small_table_optimization to be implemented for tablets in Scylla 6.0? |
No. It is not implemented. We do not need small table optimization for tablet tables.
|
@Michal-Leszczynski FYI, scylladb/scylladb#18385 is now in master. |
Always sending all (or a lot of) ranges belonging to a given replica set in a single repair job hurts task granularity. Is re-repairing a given token range as expensive as the first repair of that token range? Or is it way cheaper and can be ignored (because it just streams hashes and not actual data)? If re-repairing is usually as expensive as the first one, then we would need to somehow control the amount of ranges sent in a single repair job. I see 3 options:
@asias what do you think? |
It should be much cheaper because the difference between nodes is much smaller than it was. When there is no difference, repair only reads data and calculates a checksum. A granularity of 10% of ranges looks good enough, i.e., sending 10% of ranges in a single job. If the job with 10% of ranges fails, we retry it. It is not a big deal. Also, failed repairs are supposed to be rare compared to successful repairs.
It is best to avoid exposing the --batch-size config to users. We have enough options to control it.
If some 10% of ranges fail, we could continue to repair the next 10% of ranges. This prevents some of the ranges from blocking the other ranges indefinitely. We could also add some randomness to how we group them into 10% batches.
In the long run, we could provide an API to return the successfully repaired ranges. |
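A minimal Go sketch of the batching-with-randomness idea above (illustrative only, not SM code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// makeBatches shuffles the ranges and splits them into batches of roughly the
// given percentage, so a failing batch does not keep blocking the exact same
// set of ranges on every attempt.
func makeBatches(ranges []string, percent int) [][]string {
	shuffled := append([]string(nil), ranges...)
	rand.Shuffle(len(shuffled), func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })
	size := len(shuffled) * percent / 100
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(shuffled); start += size {
		end := start + size
		if end > len(shuffled) {
			end = len(shuffled)
		}
		batches = append(batches, shuffled[start:end])
	}
	return batches
}

func main() {
	ranges := []string{"r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10"}
	for i, batch := range makeBatches(ranges, 10) {
		fmt.Println("batch", i, batch)
	}
}
```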
I agree.
By default, repair does not end task execution on the first encountered error, but just goes through all ranges, repairs them, and then retries any failed jobs after some backoff. So when I wrote the following:
I meant that perhaps SM should drop this 10% batching (return to the current behavior) when retrying failed ranges after the backoff. So all ranges will be first repaired with 10% batching, and only in case of an error will they be retried without batching. The same could be applied to the situation when the repair task needs to be interrupted because of going outside of its work window.
There is some slight possibility that applying this 10% batching could result in SM not being able to make progress between work windows. So this approach (dropping batching on retry) would be a general safety valve for this implementation that could perhaps be removed in the future. |
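A Go sketch of this fallback (illustrative only, not SM code): repair in batches first, then retry any failed ranges one at a time.

```go
package main

import (
	"context"
	"fmt"
)

// repairWithFallback repairs ranges batch by batch; ranges from a failed batch
// are collected and retried one at a time, so a single bad range does not force
// the whole batch to be re-repaired over and over.
func repairWithFallback(ctx context.Context, batches [][]string, repair func(context.Context, []string) error) error {
	var failed []string
	for _, batch := range batches {
		if err := repair(ctx, batch); err != nil {
			failed = append(failed, batch...)
		}
	}
	// Retry pass: drop batching, one range per repair job.
	for _, r := range failed {
		if err := repair(ctx, []string{r}); err != nil {
			return fmt.Errorf("range %s failed after retry: %w", r, err)
		}
	}
	return nil
}

func main() {
	batches := [][]string{{"r1", "r2"}, {"r3", "r4"}}
	err := repairWithFallback(context.Background(), batches, func(_ context.Context, rs []string) error {
		fmt.Println("repairing", rs)
		return nil
	})
	fmt.Println("done, err:", err)
}
```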
@Michal-Leszczynski I labelled this issue so that we can go through it on Monday's grooming. Let's sum up your discussion with @asias there and prepare the summary for Wednesday's planning and triage it with expected SM version. |
GROOMING NOTES
The initial idea was to introduce completely new flags to control intensity and parallelism for tablet-based tables. @asias suggested that we should use the same flag as we do for regular tables.
Intensity = 1 means that only one range is sent in a single repair job, and the job is going to be handled by all the shards. For tablet-based repair, ranges are mapped to shards, which means that a single range will be repaired by the "owner" shard only. Scylla Manager (SM) doesn't know the mapping and is not aware of which shard owns which ranges. This makes it impossible to find the correct set of ranges to be sent in a single repair job. Due to the facts described above, the proposed solution includes changing the internal meaning of the intensity flag for tablet-based tables.
This is interpreted by SM code as "how many token ranges to send in a single repair job" (see scylla-manager/swagger/scylla_v1.json, lines 10407 to 10412 at 4d169fe).
By implementing this, we improve the CPU/shard utilization during the repair process, which should lead to improved performance. The following design document must be updated to include the chosen approach: https://docs.google.com/document/d/1mBaufbomXU6hDO_25KkhC7CO65AbFgKpHVpB5Mei8xA/edit#heading=h.o646d4880ftd
The proposal from @Michal-Leszczynski is to create a fallback for the approach with 10% of ranges sent in a single repair job. If one range of these 10% fails, then SM would need to repair all these ranges once again. To avoid a loop of retried repairs, we should use 10% only in the first repair job execution; failover should proceed with the old approach (intensity = number of ranges in the repair job).
To evaluate if the given approach is correct, we need to have a Scylla Cluster Test (SCT) allowing us to execute the repair on a large cluster (1TB?) and compare the metrics. (cc: @mikliapko)
cc: @mykaul @tzach @vladzcloudius -> We need to decide if we want to change the repair now and include @asias's suggestions, or if we agree to underutilize the nodes during the repair and send the intensity number of token ranges in a single repair job, maintaining the 3.2 approach. Let's bring this to the planning. |
Retry after failure and resume after pause with less than 10% of ranges per job sounds reasonable. It is possible that 10% of ranges would take more than the specified window. However, it is also possible that even a single range would exceed the window too, which should be rare. Consider: 10% of ranges -> failure -> 1 range at a time until all ranges in the first 10% are repaired. How about the next 10% of ranges: would we use 10% or 1 range at a time? |
I am fine with either sending some percentage of ranges per job (+ retry with 1 job at a time) or sending one job at a time (under-utilizing the cluster for repair) for tablets. But let's not add new options for tablet intensity control. Another point is that I am already working on the in-core automatic repair scheduler that schedules repair jobs inside scylla core, which could cope better with tablet migration and topology changes. |
The following issue is expected to help with measuring/validating the upgraded repair on tablets #3853 |
As per tablet repair design, SM should introduce separate flags for controlling repair parallelism/intensity for tablet tables. Those new flags should also be supported by the sctool repair control command.

As per the mentioned doc, the default for --tablet-intensity should be the amount of shards (even despite max_repair_ranges_in_parallel). I'm not aware of any reasonable limit for this value, so I suppose that SM won't cap it at any point.

I'm wondering whether it makes sense to introduce a --tablet-parallel flag (@tgrabiec please advise). If there is a need for this flag, then I believe that it should behave in the same way as the --parallel flag. This means that it has the same default (1) and the same cap.

cc: @karol-kokoszka @tzach @tgrabiec @asias