Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use better string type #4946

Closed
2 of 6 tasks
kdy1 opened this issue Jun 11, 2022 · 42 comments · Fixed by #8126
Closed
2 of 6 tasks

Use better string type #4946

kdy1 opened this issue Jun 11, 2022 · 42 comments · Fixed by #8126
Milestone

Comments

@kdy1
Copy link
Member

kdy1 commented Jun 11, 2022

Using JsWord everywhere was my mistake. We should use Arc<str> instead of JsWord for some types.
But I'll use a wrapper named Atom

Tasks

Atoms should be (conceptually)

enum Atom {
  Inlined([u8; 7])
  Static(&'static str),
  Arc(Arc<String>)
}

but with pointer tagging.

  • Add thread-local interner

  • Use it in es ast

  • Use it in css ast

  • Use it in html ast

@kdy1 kdy1 added this to the Planned milestone Jun 11, 2022
@overlookmotel
Copy link
Contributor

@kdy1 While working on #2175, I've noticed that serializing strings (JsWords) is one of the slowest parts of the process. I've been working on an alternative serializer for JsWord to speed it up, but I've realized that may be wasted effort if plan is to replace JsWord with Atom anyway.

Can I ask a few questions:

  1. Why was using JsWord a mistake?
  2. You say above that using Arc<str> should be used for "some" types. What types should continue to use JsWord and not the new Atom struct?
  3. Is the implementation of Atom complete or does it need more work before it can be used?
  4. Is what's required to complete the transition simply replacing JsWord everywhere with Atom?

Please let me know if you have time. I may be able to make a PR.

@kdy1
Copy link
Member Author

kdy1 commented Jul 20, 2022

  1. It's useful in only some cases
  2. I think we can use Atom for all cases.
  3. It's complete
  4. Parser should be patched to use atom generator

@overlookmotel
Copy link
Contributor

overlookmotel commented Jul 20, 2022

Thanks for swift reply. Sorry for more questions, but:

  1. Can you expand on "It's useful in only some cases"? In which cases is it not useful?

  2. Are there some cases where it is useful? Or will Atom be faster than JsWord in all cases?

  3. Is there some other motivation behind this change aside from speed?

  4. Presumably the usage of Atom::new_bad should be replaced with AtomGenerator::gen in these places?

String::deserialize(deserializer).map(Self::new_bad)

static CACHE: $crate::once_cell::sync::Lazy<$crate::Atom> =
$crate::once_cell::sync::Lazy::new(|| $crate::Atom::new_bad($s));

  1. Is the idea to have a single static AtomGenerator same as string_cache does it? Or would each pass of lexer create its own AtomGenerator instance? (I assume the first, but just checking)

  2. Does Atom need an rkyv::SerializeWith etc implementation, or can RKYV do its default thing for Arc<str>?

  3. Is the optimization which JsWord uses for common words not useful to re-implement in Atom?

Sorry for all the questions. I'd like to make a PR for this, but I don't fully understand your intention.

@kdy1
Copy link
Member Author

kdy1 commented Jul 20, 2022

  1. Can you expand on "It's useful in only some cases"? In which cases is it not useful?

It's useful for some common strings (like React) or parsing. In almost all cases Atom can reaplace JsWord.

  1. Are there some cases where it is useful? Or will Atom be faster than JsWord in all cases?

We can make Atom faster almost all cases except hashing.

  1. Is there some other motivation behind this change aside from speed?

Using interned string everywhere, cross-module/cross-thread seems like a wrong design to me.
I didn't care about it at the start.

  1. Presumably the usage of Atom::new_bad should be replaced with AtomGenerator::gen in these places?

Nope. new_bad is named bad because it does not allow sharing but actually in case of deserialization it's not a performance issue.
Also serde deserializer has no shared storage for AtomGenerator.

  1. Is the idea to have a single static AtomGenerator same as string_cache does it? Or would each pass of lexer create its own AtomGenerator instance? (I assume the first, but just checking)

Nope. It should never share the atom generator. Otherwise the situation will become more worse.

  1. Does Atom need an rkyv::SerializeWith etc implementation, or can RKYV do its default thing for Arc?

I already patched required things.

  1. Is the optimization which JsWord uses for common words not useful to re-implement in Atom?

It will be useful but I'm not sure about the way to do it without another pointer indirection.

@overlookmotel
Copy link
Contributor

Thanks for answers. OK, I think I get it - no sharing between threads.

What did you have in mind for passing the AtomGenerator around? In the lexer, it could be included in State, but there are other places that create JsWords e.g. transforms where, as far as I can see, there isn't a context object e.g.:

https://github.com/swc-project/swc/blob/main/crates/swc_ecma_transforms_base/src/hygiene/mod.rs#L59

https://github.com/swc-project/swc/blob/main/crates/swc_ecma_transforms_react/src/jsx/mod.rs#L1266

Would the AtomGenerator need to be passed into every such method? Or would you consider using something like std ::thread ::LocalKey?

LocalKey would also give access to AtomGenerator in serde deserialize method. This would have complications for code running in main thread (parse_sync etc) but I guess these methods could clear the contents of the AtomGenerator when they're done, so it's left empty for the next task. I also don't know if it's performant or not.

@kdy1
Copy link
Member Author

kdy1 commented Jul 22, 2022

We don't need to share AtomGenerator.
It's used by parser just because it will be faster.

But ti's not true tor other places

@overlookmotel
Copy link
Contributor

overlookmotel commented Jul 23, 2022

Recap

To recap, my understanding is:

Old JsWord uses string_cache crate which combines 3 methods for storing strings:

  1. "Static" store for common words, created at compile time.
  2. "Dynamic" store for other strings.
  3. Short strings (7 chars or less) stored inline with no hashing.

Both the "static" and "dynamic" stores are global and shared across all threads.

Advantages of JsWord:

  • Global "static" store for common words saves memory because e.g. "React" will only be stored in memory once no matter how many times it's used across all threads.
  • "Static" store for common words is created at compile time with a perfect hash function, so fast lookup.
  • Reduced memory use for other strings if used more than once - again, stored once globally in for all uses across all threads.

Disadvantages of JsWord:

  • Global "dynamic" store requires a Mutex, which means threads can block other threads while they add/remove strings from the hash set.
  • Uses slow hashing function SipHash (in phf crate)

New Atom aims to fix JsWord's shortcomings by:

  • Separate AtomGenerator store per-thread to avoid blocking between threads.
  • Using faster hashing function FxHasher

Possible combined approach

I wonder if we can combine the advantages of both into a single entity, which can be used everywhere.

  1. Global store for common words (as in string_cache). This is read-only so no Mutex required.

  2. Global store is created with perfect hashing, so fast lookup.

  3. Per-thread store for less common strings (as with new Atom) - no Mutex.

  4. Both stores would use FxHasher, so only 1 hash calculation on each lookup.

In some places, the string is very unlikely to be repeated e.g. comments, so a second entity JsWordUncommon could be provided for these cases. But that would make it clearer when to use which than JsWord vs Atom, and JsWordUncommon could avoid using Arc at all.

Implementation

Probably simplest way would be to fork and amend string_cache and phf to:

  1. Use FxHasher
  2. Not create a global "dynamic store", but take store as a parameter (so can be separate per thread)

rust-phf/rust-phf#88 (comment) suggests FxHasher could be used for perfect hashing (FxHash is a close cousin of FNV). In our case, the list of common words is quite short, so that would make it easier for PHF generation to reach a solution, even if the hash function is lower quality.

It'd be ideal to get these changes implemented upstream, but maybe better get it working in SWC first and then see if can get those changes merged upstream after. rust-phf/rust-phf#236 has been open for over a year.

Question

I'm confused about one thing:

PR #5271 switched over to Atom in a lot of places. Atom currently uses Arc internally, but I was under the impression that the idea was to avoid sharing across threads to avoid the blocking problem. But @kdy1 said this isn't the case in #5271 (comment).

My understanding is that Arc uses atomic operations for ref count increment/decrement so that may re-introduce the blocking problem. Seems like atomic operations are also more expensive in general.

So I'm confused - in what circumstances are Atoms shared across threads? Wasn't the starting motivation to avoid that?

Sorry this is long. I'm coming into this cold, and I have probably missed a lot of previous conversations.

@kdy1
Copy link
Member Author

kdy1 commented Jul 23, 2022

We don't use lock while creating it but we pass it to other threads.
So Arc is required. The timing is not same.

@overlookmotel
Copy link
Contributor

Thanks for fast reply. Why/where is Atom passed between threads? Can you point me to an example in code? If I can read it, it'll probably make sense and I can stop asking you questions!

Would be very interested in your thoughts on the rest of this idea when you have time.

@kdy1
Copy link
Member Author

kdy1 commented Jul 23, 2022

I was actually planning something similar, phf + per-thread interned string without a mutex.
We process stuffs in parallel.

GLOBALS.with(|globals| {
changed = nodes
.par_iter_mut()
.map(|node| {
GLOBALS.set(globals, || {
let mut v = Pure {
expr_ctx: self.expr_ctx.clone(),
ctx: Ctx {
par_depth: self.ctx.par_depth + 1,
..self.ctx
},
changed: false,
..*self
};
node.visit_mut_with(&mut v);
v.changed
})
})
.reduce(|| false, |a, b| a || b);
})

For example minifier has lots of stateless logic

@overlookmotel
Copy link
Contributor

overlookmotel commented Jul 24, 2022

Ah. Interesting. Another question:

Outside of the parser, where are most new JsWords created, where the string is not static? I can think of 3 places which will create a lot of strings:

  1. Hygiene renamer

fn new_name_for(&self, orig: &Id, n: &mut usize) -> swc_atoms::JsWord {
let res = if *n == 0 {
orig.0.clone()
} else {
format!("{}{}", orig.0, n).into()
};
*n += 1;
res
}

  1. Minifier where it mangles identifiers to a, b etc (can't find the place in the code)

  2. Various transforms where an identifier is prepended with _ (foo -> _foo)

Is there anywhere else which creates a lot of new JsWords?

What I'm thinking is, if it's rare for new JsWords to be created after parsing:

  • JsWord could use the current scheme of static/dynamic/inline representations.
  • During parsing, dynamic set is filled with strings.
  • Once parsing is complete, the dynamic set is made read-only, and then it can be safely shared across threads without use of Mutex or atomic operations.
  • The dynamic set would remain in memory unchanged until the whole transform/minify process is complete, and then be dropped in its entirety (similar to an arena allocator).
  • Therefore no need for reference counting each entry in the dynamic set -> no atomic operations.

@overlookmotel
Copy link
Contributor

A small prototype here: https://github.com/overlookmotel/swc/blob/new-jsword/crates/swc_atoms/src/new_jsword.rs
And some explanation here: https://github.com/overlookmotel/swc/blob/new-jsword/crates/swc_atoms/src/string%20storage.md

Code is probably full of errors, and I have no idea if this will work anyway, but it was fun to write!

@kdy1
Copy link
Member Author

kdy1 commented Jul 25, 2022

We need atomic operation even if it's read-only.
Actually Atom is already read only

@kdy1
Copy link
Member Author

kdy1 commented Jul 25, 2022

Oh arena? Yeah it's possible but it's too much restriction for rust-side users.

@overlookmotel
Copy link
Contributor

It was the dynamic set I was saying I think could avoid Mutex and atomic operations, not individual Atoms / JsWords. But actually, I think maybe they can too, if they can avoid being reference-counted.

I need to read more about concurrency in Rust. I'll come back to you.

In the meantime, if you'd have time to answer my other question, that'd be really useful:

Outside of the parser, where are most new JsWords created, where the string is not static?

@kdy1
Copy link
Member Author

kdy1 commented Jul 25, 2022

Strings are mostly not static, because we don't add atoms for injected helpers or variables

@kdy1
Copy link
Member Author

kdy1 commented Jul 27, 2022

I think we should implement inline/static for Atom and use it for everywhere

kdy1 added a commit that referenced this issue Oct 13, 2022
**Description:**

This PR changes the size of `Atom` type to `usize` from 2 * usize`.

**Related issue:**

 - #4946.
@hyf0
Copy link
Contributor

hyf0 commented Jan 10, 2023

We would like to know if there are some works that the community could contribute.

@kdy1
Copy link
Member Author

kdy1 commented Jan 10, 2023

@iheyunfei It would be really nice if someone adds support for static atoms to swc_atoms::Atom

@hyf0
Copy link
Contributor

hyf0 commented Feb 17, 2023

I wonder, if servo/string-cache#268 is merged, whether we should still investigate a better string type for swc?

@kdy1
Copy link
Member Author

kdy1 commented Feb 17, 2023

I don't think we need a new type, then. JsWord was problematic because of the mutex while other characteristics are very good

@YoniFeng
Copy link
Contributor

YoniFeng commented Mar 7, 2023

@hyf0 @kdy1 FYI that PR was merged, but there were some issues (prior breaking change got published together, and then it turned out the PR's change caused a debug assert).
I have #6980 open for a while wanting to get it in to swc.

I my fix for the debug assert was merged, we're just waiting on a new version of string-cache getting published now.

kdy1 added a commit that referenced this issue Oct 8, 2023
…7976)

**Description:**

This PR is to prepare removal of `string-cache`. Actually, this PR does not remove it. Instead, this PR only removes direct usages of `js_word!`s, especially in patterns.



**Related issue:**

 - #4946.
@kdy1
Copy link
Member Author

kdy1 commented Nov 5, 2023

@phoenix-ru You can create Lazy<Atom> instead. I profiled it but static atoms did not improve performance

@phoenix-ru
Copy link
Contributor

phoenix-ru commented Nov 6, 2023

@kdy1 After updating to swc_core@0.86.* all micro-benchmarks show a regression of 2-5% compared to js_word! + JsWord usage.

upd. Clarification: I removed all usages of a js_word! macro and replaced it with a generic Atom::from. Unfortunately, I also had to change pattern-matching, as doing matches!(smth, js_word!("something")) is not possible with Atom::from and had to be replaced with smth == "something".

upd2. A single-threaded static set sounds like it can benefit both single-threaded and multi-threaded usage.

@kdy1 kdy1 closed this as completed in #8126 Nov 7, 2023
kdy1 added a commit that referenced this issue Nov 7, 2023
**Description:**

`hstr` is an alternative for `string-cache` which does not support static strings and does not use a global mutex.
 
**Related issue:**

 - Closes #4946.
 - Closes #7974.
@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

@phoenix-ru Can you try newer version?

@phoenix-ru
Copy link
Contributor

@kdy1 This is terribly bad.

image

@phoenix-ru
Copy link
Contributor

phoenix-ru commented Nov 7, 2023

I reverted to swc_core@0.86.39 just to make sure, the results are consistent there


Summarizing:

  • generate micro-benchmark located here which does Transforms and produces a swc_core::ecma::ast::Module is suffering the most.

  • parse which is only Parsing suffers a bit less;

  • compile_sync_naive bench which does Parsing (html+TS) + Transforms + Stringify suffers even less.


Results from 0.86.39:
image

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

I think you are doing something wrong, see vercel/turbo#6343 (comment)
It improved performance

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

Can you share some code?
I have some guess, but can't be sure without any code.

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

Ah. You must use https://rustdoc.swc.rs/swc_atoms/struct.AtomStore.html
So I also made a regression for CSS/XML/HTML parser of SWC.
Well... My bad that I didn't documented this.

@phoenix-ru
Copy link
Contributor

@kdy1 From here you can pick whatever part you want to see: parsing/transforming/codegen: https://github.com/phoenix-ru/fervid/blob/592025308719345388d6b3775276dc134e9458d9/crates/fervid/src/lib.rs#L57 (use fervid_parser one for SWC-based parsing).

What should I do to use AtomStore? I did an Atom polyfill like that for a moment: https://github.com/phoenix-ru/fervid/blob/592025308719345388d6b3775276dc134e9458d9/crates/fervid_core/src/structs.rs#L7

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

You should patch your parser to have a field typed AtomStore and use AtomStore::atom with your string.

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

You can call like self.atoms.atom(string)

@phoenix-ru
Copy link
Contributor

Is this the final design? Can AtomStore be static (not Send+Sync)?
The price of adjusting a codebase to new SWC atoms is getting high...

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

Yes, it's final design. Actually the problem of string-cache is that it uses Mutex internally.

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

It should never be static. That's the whole point of transition from string-cache.

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

@phoenix-ru I managed to make it fast by default. dudykr/ddbase#14
cargo update -p hstr will do the trick. You don't need to use AtomStore

@phoenix-ru
Copy link
Contributor

@kdy1 Thanks! Even though you said

It should never be static.

With hstr@0.2.4 I got improvements of 3-15% across the board with an own static store:

pub static mut ATOM_STORE: Lazy<AtomStore> = Lazy::new(|| AtomStore::default());

#[macro_export]
macro_rules! fervid_atom {
    ($lit: literal) => {{
        let store = unsafe { &mut fervid_core::ATOM_STORE };
        let atom = store.atom($lit);
        atom
    }}
}

make it fast by default
You don't need to use AtomStore

Do you mean I can use Atom::from without a store?

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2023

Yeah. It will use per-thread AtomStore internally. (Via an internal static variable 🤣 )

@phoenix-ru
Copy link
Contributor

Via an internal static variable

Yeah, I saw the sources and was confused before I noticed thread_local!.

FYI, benched the Atom::from implementation, results are depending on a stage. Parsing time improved (-2%) while transformation time increased (up to +5%) compared to a custom unsafe static store. Ignoring the bench.

I also compared swc_core@0.86.39 vs 0.86.43 with hstr@0.2.4 and results show a general improvement of 6-16%. Good work @kdy1!

@kdy1 kdy1 modified the milestones: Planned, v1.3.97 Nov 9, 2023
@kdy1 kdy1 modified the milestones: Planned, v1.3.97 Nov 20, 2023
@swc-bot
Copy link
Collaborator

swc-bot commented Dec 21, 2023

This closed issue has been automatically locked because it had no new activity for a month. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@swc-project swc-project locked as resolved and limited conversation to collaborators Dec 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging a pull request may close this issue.

6 participants