Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add split_unfused() to Itertools. #883

Closed
wants to merge 7 commits into from

Conversation

shanecelis
Copy link

@shanecelis shanecelis commented Feb 25, 2024

Here is the API for the proposed addition.

API doc

Split an unfused iterator into a group of iterators.

Given an "unfused iterator"---that is an iterator contains elements after next() returns None---the split_unfused() function will provide each None-terminated sequence as its own iterator. Iterators can be dropped and consumed out of order just like .group_by(). If consumed in order, no memory is allocated.

When the "unfused iterator" returns None twice consecutively, it is considered exhausted.

use itertools::Itertools;

// This iterator returns None every third element until it reaches seven.
struct Frayed(u8);
impl Iterator for Frayed {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        self.0 += 1;
        (self.0 % 3 != 0 && self.0 <= 7).then_some(self.0)
    }
}
let split = Frayed(0).split_unfused();
let mut iters = split.into_iter();
let first = iters.next().unwrap();
let second = iters.next().unwrap();
let third = iters.next().unwrap();
assert!(iters.next().is_none());
// Iterators can be consumed out of order (but they will then allocate
// memory).
assert_eq!(third.collect::<Vec<_>>(), [7]);
assert_eq!(second.collect::<Vec<_>>(), [4, 5]);
assert_eq!(first.collect::<Vec<_>>(), [1, 2]);

Motivation

For some reason I can often write a unfused “None-terminated” iterator that represents multiple sequences. But they're unconventional and unwieldly. split_unfused() uses a lot of the machinery from chunk_by() to turn an unfused iterator into an iterator of iterators.

How's the name? The only other name I thought of was fray() but that seems not descriptive enough.

Tests

I added four tests as well.

@scottmcm
Copy link
Contributor

Hmm, I'm not sure how I feel about having a bunch of fancy "store stuff" logic for this.

.by_ref().fuse() will give you an iterator that does the first part, then you could do that again on the original iterator after exhausting it to continue it.

How often do you need the "sometimes allocates, sometimes doesn't" part? Because if allocating is fine, might as well just collect the parts up-front, I'd think...


At a meta level, it's not clear to me that itertools should have support for unfused iterators, since they're rather acanonical, and I don't know that we'd want to implicitly encourage them.

@shanecelis
Copy link
Author

Thanks for your consideration and the by_ref() tip.

I think the sometimes allocates, sometimes not is purely trying to respect the out of order iterator like chunk_by() does, which I was kind of floored by when I saw it implemented there. I probably would have just punted to panicking if someone wanted to go "backwards."

I'm happy to retract this pull request in light of the meta consideration. Maybe it can go in a deviant iterators crate unfusedtools or something.

Thanks for the great crate!

@scottmcm
Copy link
Contributor

BTW, I'm just peanut gallery here, so wait for a real maintainer to respond before worrying about retracting :)

@Philippe-Cholet
Copy link
Member

I'm no peanut gallery (he is a VIP for me really) but the new kid (latest maintainer).

I have on my TODO list (low priority) to make an issue about fusedness to clarify the situation for our iterators. One reference for me is #55.

I'm not particularly fond of using unfused iterators after it yielded None.
For example, calling fold (or any other related method, see #755) consume the iterator entirely. Technically, with iter.by_ref().fold(..); iter... we can still use it, but are the internals still in a valid state? That does not seem guaranteed to me.

@shanecelis
Copy link
Author

I started putting this functionality into a thing called frayed.

@shanecelis shanecelis closed this Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants