[DYOD] Add variable string segment #2593
* Also create temp-directory when not exists
[skip ci]
Co-authored-by: Marie Fischer <marie.fischer@student.hpi.de> Co-authored-by: Clemens <68013019+clfesc@users.noreply.github.com>
…feature/variable-string-length-segment-three-layers
… github.com:phkeese/hyrise into feature/variable-string-length-segment-three-layers
… github.com:phkeese/hyrise into feature/variable-string-length-segment-three-layers
…eHeaders pipeline step
…string-length-segment-three-layers
src/lib/storage/variable_string_dictionary/variable_string_dictionary_encoder.hpp
    const AllTypeVariant& value)
    const {
  DebugAssert(!variant_is_null(value), "Null value passed.");
  access_counter[SegmentAccessCounter::AccessType::Dictionary] +=
Wondering if we should also increase other counters. We also search the offset_vector, right?
/**
 * @brief Segment implementing variable length string encoding.
 *
 * Uses vector compression schemes for its attribute vector.
As I haven't seen it yet (still reviewing): could you give a high-level description of your implementation here?
auto create_iterable_from_segment(const VariableStringDictionarySegment<T>& segment) {
#ifdef HYRISE_ERASE_VARIABLESTRINGDICTIONARY
  PerformanceWarning("VariableStringDictionarySegmentIterable erased by compile-time setting");
  return AnySegmentIterable<T>(DictionarySegmentIterable<T, FixedStringVector>(segment));
Looks like a copy-paste error.
5d4719b to db0344f
Description
Adds VariableStringDictionarySegment, a new segment for storing strings.
It works similarly to the FixedStringDictionarySegment by storing deduplicated strings in a contiguous block of memory.
It uses an additional layer of indirection to allow VariableStringDictionarySegment to behave like a dictionary, including optimized scan performance due to lower_bound() and upper_bound().
Point access needs to go through another indirection, though.
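To make the layout easier to picture, here is a minimal sketch of the idea described above. It is not the code from this PR: the struct name ToyVariableStringDictionary and its members (buffer, offsets, attribute_vector) are hypothetical, the attribute vector is left uncompressed, and the sketch assumes that strings contain no embedded null bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical toy model of a variable-length string dictionary (not the Hyrise class).
struct ToyVariableStringDictionary {
  std::vector<char> buffer;                // Sorted unique strings, each null-terminated, back to back.
  std::vector<uint32_t> offsets;           // Value id -> start offset in buffer (the extra indirection).
  std::vector<uint32_t> attribute_vector;  // Row -> value id.

  explicit ToyVariableStringDictionary(const std::vector<std::string>& values) {
    // Deduplicate and sort so that value ids follow the string order.
    auto unique_values = values;
    std::sort(unique_values.begin(), unique_values.end());
    unique_values.erase(std::unique(unique_values.begin(), unique_values.end()), unique_values.end());

    for (const auto& value : unique_values) {
      offsets.push_back(static_cast<uint32_t>(buffer.size()));
      buffer.insert(buffer.end(), value.begin(), value.end());
      buffer.push_back('\0');
    }

    // Every input value is in the dictionary, so lower_bound yields its value id.
    for (const auto& value : values) {
      attribute_vector.push_back(lower_bound(value));
    }
  }

  // Resolves a value id to its string via the offset vector.
  std::string_view value_of_value_id(uint32_t value_id) const {
    assert(value_id < offsets.size());
    return std::string_view{buffer.data() + offsets[value_id]};
  }

  // Binary search over the offset vector: id of the first string >= value.
  uint32_t lower_bound(std::string_view value) const {
    const auto iter = std::lower_bound(offsets.begin(), offsets.end(), value,
                                       [this](uint32_t offset, std::string_view search) {
                                         return std::string_view{buffer.data() + offset} < search;
                                       });
    return static_cast<uint32_t>(std::distance(offsets.begin(), iter));
  }

  // Binary search over the offset vector: id of the first string > value.
  uint32_t upper_bound(std::string_view value) const {
    const auto iter = std::upper_bound(offsets.begin(), offsets.end(), value,
                                       [this](std::string_view search, uint32_t offset) {
                                         return search < std::string_view{buffer.data() + offset};
                                       });
    return static_cast<uint32_t>(std::distance(offsets.begin(), iter));
  }

  // Point access: row -> value id -> offset -> string (two lookups before reaching the characters).
  std::string_view get(size_t row) const {
    return value_of_value_id(attribute_vector[row]);
  }
};
```

In this sketch, a scan can translate a predicate into a value id range once via lower_bound() and upper_bound() and then compare integers in the attribute vector, while get() shows the extra hop through the offset vector that point access has to pay.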
Benchmarking
To run a benchmark, use the following command:
python3 scripts/evaluate_string_segments.py benchmark -b hyriseBenchmarkTPCH -b hyriseBenchmarkTPCDS -b hyriseBenchmarkJoinOrder -b hyriseBenchmarkStarSchema -d -e VariableStringDictionary -p cmake-build-release -s SCALE && python3 scripts/evaluate_string_segments.py benchmark -b hyriseBenchmarkTPCH -b hyriseBenchmarkTPCDS -b hyriseBenchmarkJoinOrder -b hyriseBenchmarkStarSchema -d -e VariableStringDictionary -p cmake-build-release -s SCALE --metrics
Remember to replace SCALE with the intended scale factor.
To create analysis diagrams from this, use the following command:
Performance
The other variable string segment branches are different approaches that did not perform well enough.
Update:
This is the most recent run with the final implementation: