Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descendants #836

Merged
merged 24 commits into from Feb 13, 2024
Merged

Descendants #836

merged 24 commits into from Feb 13, 2024

Conversation

wagnerf42
Copy link
Contributor

hi,

this is an equivalent of the split function but for tree like patterns.
it solves an issue that was posted recently and i could also use it with some of my code.

tell me if you think it is worth inclusion for you. if so i can add some more tests and benches.

Copy link
Member

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems definitely useful to me. I feel like an awful lot of parallel patterns wind up descending over trees.

/// assert_eq!(v, vec![3, 10, 14, 18]);
/// ```
///
pub fn descendants<S, B, I>(root: S, breed: B) -> Descendants<S, B, I>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about a different name, like iter::walk_tree or something?

@cuviper
Copy link
Member

cuviper commented Mar 24, 2021

Do I understand correctly that this makes no guarantees about traversal order?

Of course parallel execution is unordered, but it's still visible in fold and reduce. I think it would be a lot stronger if we could specify the traversal and hold to that. It could even be generic pre/in/post-order if the user provided two closures, or one closure returning two things, so they choose what is logically traversed before or after the current item. Then examples like your tree could collect in the desired order, rather than sorting afterward.

I'm sure that would make the implementation harder though...

@wagnerf42
Copy link
Contributor Author

Do I understand correctly that this makes no guarantees about traversal order?

Of course parallel execution is unordered, but it's still visible in fold and reduce. I think it would be a lot stronger if we could specify the traversal and hold to that. It could even be generic pre/in/post-order if the user provided two closures, or one closure returning two things, so they choose what is logically traversed before or after the current item. Then examples like your tree could collect in the desired order, rather than sorting afterward.

I'm sure that would make the implementation harder though...

i'll take a look. note that it's not for binary trees but for arbitrary degrees so the only meaningful orders are pre and post

@wagnerf42
Copy link
Contributor Author

the ordering question was interesting.
actually i need to bench the sequential part (several algorithms in mind).
maybe i would do two functions walk_tree_prefix and walk_tree_postfix because they have very distinctive properties.
i might take a few days for that.

@wagnerf42
Copy link
Contributor Author

hi, so i did two functions with two different orderings.
prefix and postfix (order between children also differs in the two functions)
i benchmarked quite some number of variants and stayed with the fastest ones.
it is actually pretty fast. you can try the tree example.

there is an overhead for the prefix order because breed borrows the state so we must consume the whole iterator before consuming s and then loop again on the same data towards the children.

i'd like to have some feedback at this point if you can.
if that's ok for you then i'll add some tests for arbitrary degrees and the debug+clone tests

Copy link
Member

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I made some documentation suggestions. I think we should introduce a walk_tree that simply calls walk_tree_postfix, so that people have something they can use that does not guarantee ordering.

src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
@wagnerf42
Copy link
Contributor Author

thanks for all your suggestions. it definitely is much cleaner now.

Copy link
Member

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! One last nit.

src/iter/walk_tree.rs Show resolved Hide resolved
/// which guarantees a postfix order.
/// If you don't care about ordering, you should use [`walk_tree`],
/// which will use whatever is believed to be fastest.
/// Between siblings, children are reduced in reverse order -- that is, the children that returned last are reduced first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reverse order seems really bizarre -- is that necessary? As a user, I would expect a "normal" pre-order traversal of a binary tree to first visit the node, then the left children, then the right children.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, this is the fastest version. since it is a prefix order we need to consume the father before its children. but we need to borrow the father to iterate on children. this means we first need to extend the stack with the children.
because it's a vec the last pushed will be the first poped and this reverses children's order.

other options are:

  • use a more complex struct and some unsafe.
  • require the iterator to be double ended
  • first collect the iterator into a vec and then extend the stack in the right order

to me the only acceptable option is the third one. it comes with a performance hit though.

if you think it would be better to have the standard order then i can re-implement this version.

Copy link
Contributor Author

@wagnerf42 wagnerf42 Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so i did a quick bench (the one in rayon_demo)
this is the current version:
sum: 382us collect: 683us
this is the version with the "fixed" ordering:
sum: 705us collect: 950us

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That speed difference seems pretty significant to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first collect the iterator into a vec and then extend the stack in the right order

You could do this in-place by noting the previous length, extending, then reversing the new tail slice. That's still some added cost, but at least it doesn't require a new allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did it like this :

         self.reorder_buffer.extend((self.breed)(&e));
         self.to_explore.extend(self.reorder_buffer.drain(..).rev());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear, i do understand there is a choice to be made here.
the overhead is between 10 and 15 nanoseconds per element. this is not zero cost but not excessive.
if you ask me i would favor the current (reversed) order because the algorithm is "natural" this way. user can also reverse their iterators to get to the other order. however i do see why the other option also makes sense.

in the end, choose whatever you want and let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, internal algorithm details should not be so visible in the public API, especially since those details might want to change in the future. I want to aim for what is most "natural" to the user first, without exposing details like "well actually you probably want to reverse this." That's even evident in your tree_prefix_collect benchmark where you have an explanatory comment, "large indices to the left, small to the right", being different than the others.

To that end, I think DoubleEndedIterator might be reasonable, although you immediately dismissed that. The user can provide the children in their natural order, matching the visitation order, but we'll have the flexibility to internally iterate in reverse. Are there realistic use-cases that you think could not be double-ended?

Of course there are plenty of non-DE iterators, but I'm asking for a real data structure that could not operate this way, especially since you say "users can also reverse their iterators." If they can so easily reverse, then DE should be no issue; if they can't, then I think we won't be serving them well with an algorithmically-reversed order.

Copy link
Contributor Author

@wagnerf42 wagnerf42 Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i did the switch to double ended iterators.
the non double ended iterators i use from time to time are scan and successors but i don't really see how they would be used here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we now just need to update (or remove) this line about reversal.

src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
@wagnerf42
Copy link
Contributor Author

wagnerf42 commented Apr 16, 2021

i added a bit more tests with higher degrees

Copy link
Member

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good to me.

@wagnerf42
Copy link
Contributor Author

do i need to do something more ? i'm in no hurry but just tell me if you want something from me.

@nikomatsakis
Copy link
Member

I'm satisfied -- @cuviper ?

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some minor cleanup, but otherwise I'm satisfied.

Please also take the PR out of draft state if you think it's ready.

/// which guarantees a postfix order.
/// If you don't care about ordering, you should use [`walk_tree`],
/// which will use whatever is believed to be fastest.
/// Between siblings, children are reduced in reverse order -- that is, the children that returned last are reduced first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we now just need to update (or remove) this line about reversal.

S: Send,
B: Fn(&S) -> I + Send + Sync,
IT: DoubleEndedIterator<Item = S>,
I: IntoIterator<Item = S, IntoIter = IT> + Send,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove IT as a public API parameter by constraining I::IntoIter: Double... instead.

Should we also require I::IntoIter: Send? We consume it immediately now, but might there be a future where we'd want to be more lazy about that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure i get it. what would send allow us to do ? IT would not be a parallel iterator so it would just mean using another thread. I is already Send.
so i'm not getting what this would potentially allow us to do and why I being Send is not enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: Send doesn't actually give us much, only that after getting such a return value of B, we could hold and send that I value to other threads before calling into_iter(). Once that's called, we now have the IntoIter type, which we might want to lazily pull items out one at a time.

For example, suppose to_explore were instead a Vec<Either<S, IntoIter>>, storing either unvisited nodes or their unprocessed (or partially processed) brood. We would need IntoIter: Send to be able to split this between threads.

Maybe we don't need to do this, especially if we don't expect there to be very many children most of the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I: Send is only necessary right now because you have that PhantomData<I>, but I don't think you actually need that. The structs themselves don't need the I parameter at all, only the impl constraints.

where
S: Send,
B: Fn(&S) -> I + Send + Sync,
I: IntoIterator<Item = S> + Send,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about maybe I::IntoIter: Send.

src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
src/iter/walk_tree.rs Outdated Show resolved Hide resolved
where
S: Send,
B: Fn(&S) -> I + Send + Sync,
I: IntoIterator<Item = S> + Send,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about maybe I::IntoIter: Send.

@nikomatsakis
Copy link
Member

@wagnerf42 should we take the PR out of draft state?

@nikomatsakis
Copy link
Member

you gotta click that "ready for review" button:

image

or we can...

@wagnerf42 wagnerf42 marked this pull request as ready for review May 21, 2021 14:22
frederic wagner and others added 12 commits February 9, 2024 15:20
this commit adds the `descendants` function which is an equivalent of `split`
for tree structured data.
    * renaming descendants to walk_tree
    * preserving order
        -> this has a performance cost so i'll try to do some benches
sequential folding is recursive
i'm not sure how to do it sequentially since
this would imply a self borrowing struct
(we need to store both the state and the iterator borrowing it)
postfix collect test + bench, removed deque
only thing needed now is tests for n-ary trees
removed vectors allocations + no recursion
changed order between siblings for that
Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
wagnerf42 and others added 12 commits February 9, 2024 15:20
Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
- prefix order is reversed, we now required double ended iterators
- tests and benches updated accordingly
- two more tests for flat trees
- removed unneeded malloc in task splitting
tests for graphs with higher degrees
- renamed breed to children_of
- doc cleanup
- using consume_iter
Otherwise we'll be under-constrained, unable to actually make a choice
between the implementations. (Even though we think the faster one
doesn't need that right now.)
@cuviper
Copy link
Member

cuviper commented Feb 10, 2024

Hi @wagnerf42! Sorry that it's been a while. I took the liberty of rebasing your PR and making a few of those tweaks I had suggested way back in my last review. Please take a look, and if you're happy with this then I think we should merge it!

@wagnerf42
Copy link
Contributor Author

hi, well it seems fine for me.

@cuviper
Copy link
Member

cuviper commented Feb 13, 2024

Thanks!

@cuviper cuviper added this pull request to the merge queue Feb 13, 2024
Merged via the queue into rayon-rs:main with commit 46c49e6 Feb 13, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants