Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differences in Base.Iterators.partition and IterTools.partition #67

Open
ssfrr opened this issue Nov 25, 2019 · 7 comments
Open

Differences in Base.Iterators.partition and IterTools.partition #67

ssfrr opened this issue Nov 25, 2019 · 7 comments

Comments

@ssfrr
Copy link

ssfrr commented Nov 25, 2019

There are some differences between Base.Iterators.partition and IterTools.partition. I brought it up on Slack and @oxinabox asked me to file an issue.

  1. Base.Iterators.partition returns each chunk as an Array, whereas IterTools.partition returns a Tuple
  2. In the case that the last chunk is smaller than the requested partition size, Base.Iterators.partition returns a shorter chunk, whereas IterTools.partition drops the last values.
  3. IterTools.partition accepts a step argument that determines the hop size for each chunk.
@ssfrr
Copy link
Author

ssfrr commented Mar 5, 2020

update - I think on master Base.Iterators.partition now returns views into the original array, not Arrays.

This might (?) be the last inconsistency that would need to be addressed before something like #30 could be resolved. How do folks want to resolve the differences listed above? I'd propose using the Base.Iterators behavior for (1) and (2), but keeping the step argument.

One path forward would be to add the step argument as a PR to Base, then just reexport Base.Iterators.partition. This would also take advantage of the recent improvements that @mbauman made.

@omus
Copy link
Contributor

omus commented Mar 5, 2020

Change to make Base.Iterators.partition return views was done in: JuliaLang/julia#33533

@goretkin
Copy link
Contributor

goretkin commented May 19, 2020

I'm in favor of keeping the current behavior of IterTools.partition, perhaps under a different name, to avoid confusion with Base.

For example, implementing Base.diff, but for an iterable it instead of just AbstractVectors is very natural in terms of IterTools.partition(it, 2, 1), and the fact that it produces a stream of tuples, which have a length known at compile time is (I'm guessing) beneficial for efficient code generation.

@ssfrr
Copy link
Author

ssfrr commented May 24, 2020

Yeah, it's sort of a shame that the implementation needs to be different depending on whether it's designed for long or short partitions (which would use Arrays or Tuples, respectively). For instance, in my use case (streaming audio) I often want to chop the stream into chunks of 512 or 1024 elements, so Tuples aren't good for that. Maybe the partition type should be provided as an optional argument to partition?

@goretkin
Copy link
Contributor

Oh, good. That's an important use too. Base.Iterators.partition returns views into the array, instead of copies, now. Does that fit the audio chunking need?

@ssfrr
Copy link
Author

ssfrr commented May 24, 2020

Yeah, array views are good for me

@pepijndevos
Copy link
Collaborator

Difference 2 is handled in Clojure by having partition-all and partition, where base partition maps to Clojure's partition-all behavior.

Difference 1 & 2 are grouped together in Transducers under Consecutive vs Partition. So you either get fixed length tuples or variable length arrays.

I think this roughly aligns with common usage patterns where you either want to have small fixed length groups, or process large chunks of an array.

In my ideal universe both consecutive (IterTools partition) and partition would be in base, both supporting a step argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants