Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute embedded string literals hash code #10596

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

etiennebarrie
Copy link
Contributor

[Feature #20415]

With embedded strings we often have some space left in the slot, which we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to make it easy/fast to compute, and not require one more union in RString.

compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|

This comment has been minimized.

string.c Outdated
RUBY_ASSERT(free_bytes >= sizeof(st_index_t));
#endif

*(st_index_t *)(RSTRING_END(str) + TERM_LEN(str)) = rb_str_hash(str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid if this may cause a bus error on some platform. Is it safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may cause a bus error on some platform

Because of alignment? If it's a concern, I think we could ensure it's always aligned, the overhead would be very slightly larger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes > because of alignment

It would be safer to use aligned address if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll look into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*(st_index_t *)(RSTRING_END(str) + TERM_LEN(str)) = rb_str_hash(str);
typedef struct {char bytes[sizeof(st_index_t)];} unaligned_index;
union {st_index_t i; unaligned_index b;} u = {.i = rb_str_hash(str)};
*(unaligned_index *)(RSTRING_END(str) + TERM_LEN(str)) = u.b;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @nobu, interesting trick. I suppose we need it on read as well?

@casperisfine
Copy link
Contributor

Looks like we broke something after the refactor, I'll investigate.

Comment on lines +248 to +250
if (FL_TEST_RAW(str, STR_PRECOMPUTED_HASH)) {
real_size += sizeof(st_index_t);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the bug. The compactor thought the string could fit in a smaller slot, corrupting the precomputed hash.

@casperisfine casperisfine force-pushed the interned-string-hashcode branch 5 times, most recently from 2bd83d4 to 8f949a4 Compare April 22, 2024 14:19
@casperisfine
Copy link
Contributor

Alright, I think the bugs have been squashed now.

With embedded strings we often have some space left in the slot, which
we can use to store the string Hash code.

It's probably only worth it for string literals, as they are the ones
likely to be used as hash keys.

We chose to store the Hash code right after the string terminator as to
make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants