Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement custom fold for WhileSome #780

Conversation

Owen-CH-Leung
Copy link
Contributor

#755

This PR introduces a custom fold method for WhileSome. The optimization hinges on the underlying iterator's fold:

With Custom fold: If the underlying iterator provides its own specialized fold, the performance gain depends on its implementation.
Without Custom fold: In the absence of a specialized fold for the underlying iterator, the performance remains on par with the default.

Benchmark Results:

Default fold: time: [497.09 ns 497.45 ns 497.82 ns]
Custom fold: time: [456.09 ns 458.15 ns 460.80 ns]

Comment on lines 585 to 599
fn fold<B, F>(self, acc: B, f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
self.iter
.take_while(|opt| opt.is_some())
.map(|item| item.unwrap())
.fold(acc, f)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to implement this without the unwrap, using try_fold:

Suggested change
fn fold<B, F>(self, acc: B, f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
self.iter
.take_while(|opt| opt.is_some())
.map(|item| item.unwrap())
.fold(acc, f)
}
fn fold<B, F>(mut self, init: B, mut f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
// Replace with `ControlFlow`, once MSRV >= 1.55.0
use core::result::Result::{Ok as Continue, Err as Break};
let res = self.iter.try_fold(init, |acc, item| match item {
Some(item) => Continue(f(acc, item)),
None => Break(acc),
});
let (Break(res) | Continue(res)) = res;
res
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jswrenn Thanks for your feedback! I'm a bit hesitate to override fold with try_fold though.

The names of the methods in Rust are usually semantically clear. When overriding fold, it makes sense to delegate its responsibility to the underlying iterator's fold. Similarly, try_fold should be reserved for overriding try_fold.

What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Rust standard library, it's quite common for fold to be implemented in terms of try_fold. In your original code you construct a TakeWhile and call fold on it. But TakeWhile's fold implementation delegates to try_fold!

The general rule to follow is something like this:

  • If you need to completely consume an iterator, use for_each or fold.
    • Use for_each if you don't need to maintain an accumulator.
    • Use fold if you do need to maintain an accumulator.
  • If you need to partly consume an iterator, use try_for_each or try_fold.
    • Same rules about accumulators apply.

Here, try_fold is the perfect tool: We need to consume the iterator partially, up until the first None, with an accumulator.

@jswrenn jswrenn changed the title Implement custom fold for while some Implement custom fold for WhileSome Oct 10, 2023
@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Oct 11, 2023

Untested, I think the bench part should be

let result = data.fold(String::new(), |mut acc, ch| {
    acc.push(ch);
    acc
});
assert_eq!(result, "0123456789abcdef"); // or result.as_str() if needed

@Owen-CH-Leung
Copy link
Contributor Author

Untested, I think the bench part should be

let result = data.fold(String::new(), |mut acc, ch| {
    acc.push(ch);
    acc
});
assert_eq!(result, "0123456789abcdef"); // or result.as_str() if needed

Interestingly, when I re-benchmark again using the revised benchmark (using both fold / try_fold), the performance gain disappears.

default: time: [59.414 ns 59.554 ns 59.697 ns]
overriden: time: [84.891 ns 85.273 ns 85.691 ns]

Seems that the overhead from calling to_string() is far more heavier than just push(ch)

@Philippe-Cholet
Copy link
Member

Each time in fold, to_string here creates a non-empty string and therefore heap-allocates (slow) while push(ch) merely mutates the existing acc string, so yes way heavier and the reason behind my quick comment, I should have said something.
The assert_eq was probably a bit slow too for allocating to a vector while my suggestion would not.

src/adaptors/mod.rs Outdated Show resolved Hide resolved
@Philippe-Cholet Philippe-Cholet force-pushed the Implement-custom-fold-for-while-some branch from 2808bcc to 2a59240 Compare October 24, 2023 07:42
@Philippe-Cholet
Copy link
Member

I guess I can't merge this because an owner (jswrenn) requested changes.

@jswrenn jswrenn added this pull request to the merge queue Oct 24, 2023
@jswrenn jswrenn added this to the next milestone Oct 24, 2023
Merged via the queue into rust-itertools:master with commit 6710e71 Oct 24, 2023
8 checks passed
@jswrenn jswrenn mentioned this pull request Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants