[DYOD] Add variable string segment #2593

Open
wants to merge 82 commits into
base: master

@phkeese phkeese commented Jul 12, 2023

Description

Adds VariableStringSegment, a new segment for storing strings.
It works similarly to the FixedStringDictionarySegment by storing deduplicated strings in a contiguous block of memory.
It uses an additional layer of indirection to allow VariableStringDictionarySegment to behave like a dictionary, including optimized scan performance due to lower_bound() and upper_bound().
Point access needs to go through another indirection, though.
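The layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the names `StringDictionarySketch`, `chars`, `offsets`, `value_ids`, and `make_sketch` are hypothetical, and the sketch uses plain `std::vector`s instead of compressed vectors. Unique strings are stored back to back, a sorted offset vector adds the extra layer of indirection, and `lower_bound()` / `upper_bound()` binary-search over that offset vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch; none of these names are taken from the PR.
struct StringDictionarySketch {
  std::vector<char> chars;               // all unique strings, stored back to back
  std::vector<std::uint32_t> offsets;    // start of each unique string (sorted), plus one end sentinel
  std::vector<std::uint32_t> value_ids;  // one entry per row, indexing into offsets

  std::uint32_t unique_count() const { return static_cast<std::uint32_t>(offsets.size() - 1); }

  // Resolves a value id to its string via the offset vector.
  std::string_view value_at(std::uint32_t value_id) const {
    assert(value_id < unique_count());
    const auto begin = offsets[value_id];
    const auto end = offsets[value_id + 1];
    return std::string_view(chars.data() + begin, end - begin);
  }

  // Point access: row -> value id -> offset -> characters.
  std::string_view get(std::size_t row) const { return value_at(value_ids[row]); }

  // First value id whose string is not less than `value`.
  std::uint32_t lower_bound(std::string_view value) const {
    std::uint32_t low = 0;
    std::uint32_t high = unique_count();
    while (low < high) {
      const auto mid = low + (high - low) / 2;
      if (value_at(mid) < value) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }

  // First value id whose string is greater than `value`.
  std::uint32_t upper_bound(std::string_view value) const {
    std::uint32_t low = 0;
    std::uint32_t high = unique_count();
    while (low < high) {
      const auto mid = low + (high - low) / 2;
      if (!(value < value_at(mid))) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return low;
  }
};

StringDictionarySketch make_sketch(const std::vector<std::string>& rows) {
  // Deduplicate and sort so that value ids follow the lexicographic order.
  auto sorted = rows;
  std::sort(sorted.begin(), sorted.end());
  sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());

  StringDictionarySketch sketch;
  for (const auto& value : sorted) {
    sketch.offsets.push_back(static_cast<std::uint32_t>(sketch.chars.size()));
    sketch.chars.insert(sketch.chars.end(), value.begin(), value.end());
  }
  sketch.offsets.push_back(static_cast<std::uint32_t>(sketch.chars.size()));  // end sentinel

  for (const auto& row : rows) {
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), row);
    sketch.value_ids.push_back(static_cast<std::uint32_t>(it - sorted.begin()));
  }
  return sketch;
}
```

In this sketch a scan for a range of values only needs the two bounds and can then compare integer value ids, while `get()` shows the extra hop that point access has to take.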

Benchmarking

To run a benchmark, use the following command:

python3 scripts/evaluate_string_segments.py benchmark -b hyriseBenchmarkTPCH -b hyriseBenchmarkTPCDS -b hyriseBenchmarkJoinOrder -b hyriseBenchmarkStarSchema -d -e VariableStringDictionary -p cmake-build-release -s SCALE && python3 scripts/evaluate_string_segments.py benchmark -b hyriseBenchmarkTPCH -b hyriseBenchmarkTPCDS -b hyriseBenchmarkJoinOrder -b hyriseBenchmarkStarSchema -d -e VariableStringDictionary -p cmake-build-release -s SCALE --metrics

Remember to replace SCALE with the intended scale factor.

To create analysis diagrams from this, use the following command:

python3 scripts/evaluate_string_segments.py evaluate -t ./tmp/*False.json -m ./tmp/*True.json -o results

Performance

[Figure: comparison]

The other variable string segment branches are different approaches that did not perform well enough.

Update:

This is the most recent run with the final implementation:

[Figure: comparison]

phkeese and others added 2 commits July 12, 2023 17:53
Co-authored-by: Marie Fischer <marie.fischer@student.hpi.de>
Co-authored-by: Clemens <68013019+clfesc@users.noreply.github.com>
@phkeese phkeese added the FullCI Run all CI tests (slow, but required for merge) label Aug 23, 2023
@phkeese phkeese removed the FullCI Run all CI tests (slow, but required for merge) label Aug 24, 2023
phkeese and others added 3 commits August 24, 2023 21:04
[skip ci]
… github.com:phkeese/hyrise into feature/variable-string-length-segment-three-layers
@phkeese phkeese added the FullCI Run all CI tests (slow, but required for merge) label Aug 24, 2023
@Bouncner
Collaborator

Bouncner commented Oct 9, 2023

Hi @phkeese, @ClFeSc, and @23mafi: I assume you already left the Hyrise Slack channel? Had a question concerning updated benchmark results.

const AllTypeVariant& value)
const {
DebugAssert(!variant_is_null(value), "Null value passed.");
access_counter[SegmentAccessCounter::AccessType::Dictionary] +=
Collaborator

Wondering if we should also increase other counters. We also search the offset_vector, right?

/**
* @brief Segment implementing variable length string encoding.
*
* Uses vector compression schemes for its attribute vector.
Collaborator

As I haven't seen one yet (still reviewing): could you give a high-level description of your implementation here?

src/lib/storage/dictionary_segment.hpp (resolved)
src/lib/storage/create_iterable_from_segment.hpp (outdated, resolved)
auto create_iterable_from_segment(const VariableStringDictionarySegment<T>& segment) {
#ifdef HYRISE_ERASE_VARIABLESTRINGDICTIONARY
PerformanceWarning("VariableStringDictionarySegmentIterable erased by compile-time setting");
return AnySegmentIterable<T>(DictionarySegmentIterable<T, FixedStringVector>(segment));
Collaborator

Looks like a copy paste error.

@Bouncner Bouncner force-pushed the feature/variable-string-length-segment-three-layers branch from 5d4719b to db0344f Compare May 7, 2024 10:49
Labels
Feature FullCI Run all CI tests (slow, but required for merge)
6 participants