os/bluestore: Recompression, part 5. Testing. #57450

Open
aclamk wants to merge 41 commits into main from wip-aclamk-bs-compression-recompression-test

Conversation

@aclamk (Contributor) commented May 13, 2024

This is the testing part; it accumulates the settings needed to run unit tests and teuthology tests.

#54075 Nice debugs.
#54504 New write path.
#57448 Segmented onode.
#56975 Main
#57450 Test

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Introduce a printer class that allows selecting which parts of a Blob are to be printed.
It severely reduces the amount of clutter in the output.
Usage:
using P = Bluestore::Blob::printer;
dout << blob->printer(P::ptr + P::sdisk + P::schk);

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
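As a side note on the flag-combining pattern above, here is a minimal sketch of how such a printer could be structured (hypothetical names and members, not the actual BlueStore code; the real usage combines flags with `+`, plain bit-OR is used here for simplicity):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Illustrative only: a printer-style selector carrying a bitmask of what to show.
struct Blob {
  std::string disk_layout = "0x1000~0x2000";
  std::string checksum    = "crc32c/0x1000";

  struct printer {
    static constexpr uint32_t ptr   = 1u << 0;  // print object address
    static constexpr uint32_t sdisk = 1u << 1;  // short disk layout
    static constexpr uint32_t schk  = 1u << 2;  // short checksum info
    const Blob& blob;
    uint32_t what;  // bitwise OR of the flags above
  };
  printer print(uint32_t what) const { return printer{*this, what}; }
};

std::ostream& operator<<(std::ostream& out, const Blob::printer& p) {
  using P = Blob::printer;
  if (p.what & P::ptr)   out << &p.blob << " ";
  if (p.what & P::sdisk) out << p.blob.disk_layout << " ";
  if (p.what & P::schk)  out << p.blob.checksum;
  return out;
}

int main() {
  using P = Blob::printer;
  Blob b;
  // Only the selected parts are emitted, keeping log lines short.
  std::cout << b.print(P::sdisk | P::schk) << "\n";
}
```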
Modify Extent similarly to Blob, so that the improved Blob printing can also be used
when printing extents.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Printing a Blob can now include buffers.
There are 2 variants:
- 'buf': same as the original in dump_onode
- 'sbuf': only fundamental params, no ptr etc.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Added a nicer replacement for the dump_onode function.
Introduce a printer class that allows selecting which parts of an Onode are to be printed.
It severely reduces the amount of clutter in the output.
Usage:
using P = Bluestore::printer;
dout << blob->print(P::ptr + P::sdisk + P::schk + P::buf + P::attrs);

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
- moved operator<< to BlueStore_debug file
- upcased Printer {} flags
- more reliable heap begin detection
- fixup after rebase

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@ifed01 (Contributor) left a comment


There are a bunch of failed test cases in store_test:
[ FAILED ] ObjectStore/StoreTestSpecificAUSize.ReproBug41901Test/1, where GetParam() = "bluestore" (1426 ms)
[ FAILED ] ObjectStore/StoreTestSpecificAUSize.BluestoreStatFSTest/1, where GetParam() = "bluestore" (7027 ms)
[ FAILED ] ObjectStore/StoreTestSpecificAUSize.BluestoreFragmentedBlobTest/1, where GetParam() = "bluestore" (4740 ms)

And finally this stops at:
[ RUN ] ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCsumVsCompression/1
---------------------- 1 / 16 ----------------------
bluestore_min_alloc_size = 4096
max_write = 131072
max_size = 262144
alignment = 512
bluestore_compression_mode = force
bluestore_compression_algorithm = snappy
bluestore_csum_type = crc32c
bluestore_default_buffered_read = true
bluestore_default_buffered_write = true
bluestore_sync_submit_transaction = false
seeding object 0
seeding object 500
Op 0
available_objects: 994 in_flight_objects: 6 total objects: 1000 in_flight 6
Op 1000
available_objects: 998 in_flight_objects: 0 total objects: 998 in_flight 0
Op 2000
available_objects: 1004 in_flight_objects: 0 total objects: 1004 in_flight 0
--- buffer mismatch between offset 0x16c00 and 0x17000, total 0x30000
...
/home/if/ceph.3/src/test/objectstore/store_test.cc: In function 'virtual void SyntheticWorkloadState::C_SyntheticOnReadable::finish(int)' thread 7f2a99a8f6c0 time 2024-05-21T00:23:03.209227+0300
/home/if/ceph.3/src/test/objectstore/store_test.cc: 4314: FAILED ceph_assert(bl_eq(state->contents[hoid].data, r2))

@aclamk aclamk force-pushed the wip-aclamk-bs-compression-recompression-test branch 3 times, most recently from ac8d66d to fcfbd4e on May 23, 2024 06:48
Small improvement on debug output.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Created a new variant of the bluestore_blob_t::release_extents function.
The function now takes a range [offset~length] as an argument,
a simplification that allows it to have much better performance.

Created a comprehensive unit test that checks 40k random blobs.
The unit test does not cover the potential case of
bluestore_blob_t.extents that are not allocation-unit aligned.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
p2remain gives the remaining data in a block.
It is similar to p2nphase, but for offset 0 it returns the full size.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
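A minimal sketch of the distinction (illustrative implementations, not necessarily the exact code in Ceph's intarith helpers): p2nphase returns 0 for an aligned offset, while p2remain returns the full block size.

```cpp
#include <cassert>
#include <cstdint>

// Bytes from offset x up to the next 'align' boundary; 0 if already aligned.
constexpr uint64_t p2nphase(uint64_t x, uint64_t align) {
  return (-x) & (align - 1);
}

// Bytes remaining in the current block; a full 'align' if x is aligned.
constexpr uint64_t p2remain(uint64_t x, uint64_t align) {
  return align - (x & (align - 1));
}

int main() {
  // For unaligned offsets the two agree...
  assert(p2nphase(0x100, 0x1000) == 0xf00);
  assert(p2remain(0x100, 0x1000) == 0xf00);
  // ...the difference shows up only at an aligned offset:
  assert(p2nphase(0, 0x1000) == 0);
  assert(p2remain(0, 0x1000) == 0x1000);
}
```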
New version of put().
It is simpler and faster, but does not allow for
overprovisioning of used AUs.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Created dedicated mutator of ExtentMap that is useful when
a logical extent must be split.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Introducing new logic for Onode processing during write.
The new punch_hole_2 function empties a range, but keeps track of elements:
- allocations that are no longer used
- blobs that are now empty
- shared blobs that got modified
- statfs changes to apply later

This change allows allocations to be reused freely for deferred writes,
which means that allocations can be used in deferred mode in a different blob
than the one they came from.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
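A rough sketch of the bookkeeping idea described above (hypothetical types and names, purely to illustrate the commit message, not the actual BlueStore structures): punch_hole_2 empties a range and hands everything the caller needs back, so statfs changes and releases can be applied later.

```cpp
#include <cstdint>
#include <vector>

// Illustrative placeholders, not the real BlueStore types.
struct PExtent     { uint64_t offset; uint64_t length; };  // released disk space
struct BlobRef     { int id; };                            // blob that became empty
struct SharedRef   { int id; };                            // shared blob that was modified
struct StatFSDelta { int64_t allocated = 0; int64_t stored = 0; int64_t compressed = 0; };

// Everything a punch_hole_2-style call collects while emptying [offset, offset+length).
// Deferring the release of allocations is what makes it safe to reuse the freed
// space for deferred writes in a different blob than the one it came from.
struct PunchResult {
  std::vector<PExtent>   released;      // allocations no longer used
  std::vector<BlobRef>   empty_blobs;   // blobs that are now empty
  std::vector<SharedRef> dirty_shared;  // shared blobs that got modified
  StatFSDelta            statfs_delta;  // to apply when the transaction commits
};

int main() {
  PunchResult res;
  res.released.push_back({0x10000, 0x8000});
  res.statfs_delta.allocated -= 0x8000;
  // 'res' would be handed back to the write path to finish the transaction.
}
```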
Comprehensive tests for punch_hole_2.
New formulation of punch_hole_2 makes it very easy to
create patterns and inspect results.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
It is more organized this way.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Set of various simple functions to simplify code.
No special logic here.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
aclamk added 24 commits May 27, 2024 08:48
BlueStore::Writer is a toolkit that gives more options to control writes.
It gives more control over the compression process, letting the user of the class
manually split incoming data into blobs.
Now, for large writes, all but the last blob can be fully filled with data.

There is now a single place that decides on deferred/direct.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
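A simplified sketch of the "fill every blob except the last" idea (a hypothetical helper, not the Writer API itself): split an incoming write into blob-sized pieces so only the tail blob can be partially filled.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Split a write of 'length' bytes into blob-sized chunks.
// All chunks except possibly the last one are exactly max_blob_size long.
std::vector<uint64_t> split_into_blobs(uint64_t length, uint64_t max_blob_size) {
  std::vector<uint64_t> sizes;
  while (length > 0) {
    uint64_t take = std::min(length, max_blob_size);
    sizes.push_back(take);
    length -= take;
  }
  return sizes;
}

int main() {
  // 300 KiB written with a 128 KiB max blob size -> 128K + 128K + 44K.
  for (auto s : split_into_blobs(300 * 1024, 128 * 1024)) {
    std::cout << s << "\n";
  }
}
```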
Extensive tests for BlueStore::Writer.
Some extra debug hooks for BlueStore.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
New "write_lat" tracks wallclock time spent in execution of BlueStore::_write().

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
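A minimal sketch of the measurement (using std::chrono for illustration; Ceph's own clock and perf-counter plumbing are not shown): time the body of the write and feed the elapsed wall-clock time into the counter.

```cpp
#include <chrono>
#include <iostream>

// Illustration only: measure wall-clock time of one write and report the sample.
int main() {
  auto start = std::chrono::steady_clock::now();
  // ... the write work would happen here ...
  auto elapsed = std::chrono::steady_clock::now() - start;
  auto usec = std::chrono::duration_cast<std::chrono::microseconds>(elapsed);
  std::cout << "write_lat sample: " << usec.count() << " us\n";
}
```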
Added _do_write_v2 function to BlueStore.
Function is not integrated yet.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add a new conf variable:
bluestore_write_v2 = true  : use the new _do_write_v2() function
bluestore_write_v2 = false : use the legacy _do_write() function
This variable is only read at startup.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
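For reference, enabling the new path would look roughly like the ceph.conf entry below (a sketch; the option name comes from this series, and since it is only read at startup the OSD must be restarted for a change to take effect):

```ini
[osd]
# true  -> use the new _do_write_v2() path
# false -> use the legacy _do_write() path
bluestore_write_v2 = true
```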
We need to scan all extents that fall under the proposed read / padding range.
The scan was broken, and we could clear out something that was defined.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
The search must cover the entire range in which aligned mutable blobs can exist.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
A review proposal.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Result of ongoing review.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
It was not checked whether the necessary location is still within the blob range.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Split object data into segments of conf.bluestore_segment_data_size bytes.
This means that no blob will span two segments at the same time.
Modified the reshard function to prefer segment separation lines.
As a result, no spanning blobs are created.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
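A small sketch of the segmentation constraint (illustrative, reusing the p2remain idea from earlier in this series; the helper name is hypothetical): when building a blob, never take more data than remains in the current segment, so no blob crosses a segment boundary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Bytes remaining in the current segment; a full segment if offset is aligned.
constexpr uint64_t p2remain(uint64_t x, uint64_t align) {
  return align - (x & (align - 1));
}

// Clamp a blob so it never crosses a segment boundary.
constexpr uint64_t clamp_to_segment(uint64_t logical_offset, uint64_t want,
                                    uint64_t segment_size) {
  return std::min(want, p2remain(logical_offset, segment_size));
}

int main() {
  // With 512 KiB segments, a 300 KiB blob starting at 400 KiB is clamped
  // to 112 KiB so it ends exactly at the segment boundary.
  assert(clamp_to_segment(400 * 1024, 300 * 1024, 512 * 1024) == 112 * 1024);
}
```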
New segment_size pool option creates segmentation in BlueStore Onodes.
This simplifies metadata encoding.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add a recompression scanner that looks around the write region to see how much
would be gained if we read some more data around it and wrote more.
Added Compression.h / Compression.cc.
Added the debug_bluestore_compression dout.
Created the Scanner class.
It provides write_lookaround() for scanning loaded extents.
Created the Estimator class.
It is used by the Scanner to decide whether a specific extent is to be recompressed.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
An error caused the recompression lookup to go into an infinite loop.
We now properly skip over shared blobs in the compressed-expansion step.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Split do_write into do_write and do_write_with_blobs.
The original is used when only uncompressed data is written.
The new one accepts a stream of data formatted into blobs;
the blobs can be compressed or uncompressed.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Modify do_write_v2() to also handle compressed data.
Segmented and regular cases are recognized and handled properly.
The new do_write_v2_compressed() oversees compression / recompression.

Added the split_and_compress() function.
It is naive for now and always assumes compression makes sense.
To be improved.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add blob_create_full_compressed.
Fix do_put_new_blobs to handle compressed blobs.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Move most logic from Scanner to Estimator.
Prepare for future machine learning / adaptive algorithm for estimation.
Renamed functions, added interface comments.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Give the Estimator proper logic.
It now learns expected recompression values
and uses them in subsequent iterations to predict.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
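A rough sketch of what "learning expected recompression values" could look like (purely illustrative, not the actual Estimator): keep an exponentially weighted average of observed compression ratios and use it to predict how much a rewrite would gain.

```cpp
#include <cstdint>

// Illustrative estimator: learns the typical compressed/uncompressed ratio
// and uses it to predict the size an extent would have after recompression.
class Estimator {
public:
  // Record the outcome of a finished compression: bytes in vs. bytes out.
  void observe(uint64_t raw_bytes, uint64_t compressed_bytes) {
    double ratio = double(compressed_bytes) / double(raw_bytes);
    expected_ratio_ = alpha_ * ratio + (1.0 - alpha_) * expected_ratio_;
  }

  // Predict how many bytes an extent of 'raw_bytes' would occupy after
  // recompression, based on what has been learned so far.
  uint64_t predict(uint64_t raw_bytes) const {
    return uint64_t(raw_bytes * expected_ratio_);
  }

private:
  double expected_ratio_ = 0.5;  // initial guess before any observations
  double alpha_ = 0.1;           // learning rate of the moving average
};

int main() {
  Estimator est;
  est.observe(64 * 1024, 20 * 1024);          // one observed compression result
  uint64_t predicted = est.predict(128 * 1024);
  (void)predicted;  // would be weighed against the cost of rewriting the extent
}
```

Keeping one such estimator per collection, as the next commit does, lets the learned ratio track the compressibility of each collection's data.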
Make one estimator per collection.
This makes it possible for the estimator to learn collection-specific compressibility.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Added missing files to alienstore CMake list.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk aclamk force-pushed the wip-aclamk-bs-compression-recompression-test branch from fcfbd4e to ec09451 on May 27, 2024 07:03
Tests that rely on knowledge specific to the original write path are now failing.
For such tests, force conf/bluestore_write_v2=false.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Empty line.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk aclamk force-pushed the wip-aclamk-bs-compression-recompression-test branch from ec09451 to ca39eb4 on May 29, 2024 06:49