Using NumPy arrays as columns of R data.frames #1481

Open
t-kalinowski opened this issue Sep 14, 2023 · 0 comments

Being able to use NumPy arrays as columns in R data.frames would:

  • Provide very nice ergonomics for managing nd-arrays with metadata (e.g., frames of a video stored as one ndarray, with miscellaneous columns of metadata for each frame)
  • Open the door to tracing + compiling operations in the context of a dataframe / dplyr (e.g., with jax, tensorflow, or similar).

This would need an implementation of vctrs::vec_proxy() for NumPy arrays that takes advantage of ALTREP to avoid materializing R atomic vectors where it doesn't make sense (e.g., if the NumPy array has dtype int8, we don't want to materialize an int32 R atomic vector if we can avoid it).
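To make the int8 concern concrete, here is a rough NumPy-only sketch of the size difference. It is an illustration under stated assumptions, not a description of how reticulate converts arrays: it assumes an eager conversion would widen each element to 32 bits (R's integer type), and the variable names are made up for the example. It does not touch vctrs or ALTREP.

```python
import numpy as np

# Ten int8 values, matching the 1:10 data in the nanoarrow snippet in this issue.
values = np.arange(1, 11, dtype=np.int8)

# Assumption: an eager conversion to an R integer vector widens every
# element to 32 bits. astype() stands in for that copy here.
widened = values.astype(np.int32)

print(values.nbytes)   # 10 bytes: one byte per element
print(widened.nbytes)  # 40 bytes: four bytes per element
```

The 4x factor is the same for any shape, so for something like a stack of uint8 video frames the widened copy is what a lazy proxy would want to avoid.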

(from Dewey Dunnington in Slack)
Looking at nanoarrow_buffer may be helpful:

int8_array <- arrow::Array$create(1:10, arrow::int8())
int8_array_nanoarrow <- nanoarrow::as_nanoarrow_array(int8_array)
int8_buffer <- int8_array_nanoarrow$buffers[[2]]

list(
  int8_buffer
)
#> [[1]]
#> <nanoarrow_buffer data<int8>[10][10 b]> `1 2 3 4 5 6 7 8 9 10`

https://arrow.apache.org/docs/format/CanonicalExtensions.html#fixed-shape-tensor
