Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lazily materialize the groups #2031

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

lazily materialize the groups #2031

wants to merge 2 commits into from

Conversation

Fil
Copy link
Contributor

@Fil Fil commented Mar 22, 2024

Makes lazy calls to take (which is expensive on arrow tables). The total # of calls to take in the tests goes from 34068 to 11778 (differenceY and differenceYVariable are the main consumers of this function). This should give better performance to the group transform on large datasets.

Related: #2030

@Fil Fil requested a review from mbostock March 22, 2024 08:04
Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t like relying on Proxy to do this — it’s too magical. Can we instead change the group transform to not compute arrays of data unless explicitly requested?

@Fil
Copy link
Contributor Author

Fil commented Mar 22, 2024

More experimentation: I've pushed another strategy, even though I'm not convinced it's better. Maybe it will give us new ideas.

@Fil Fil marked this pull request as draft March 22, 2024 17:59
…nsform, that needs to be invoked when we want to use the data. That means that downstream consumers of this need to know how to handle it. In the tests there is only one case, and it goes through arrayify, which is where I "decode" the structure. Not sure if it's any better.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants