Add ByteView::try_new #5735

tustvold · 2024-05-08T06:45:46Z

Which issue does this PR close?

Closes #.

Rationale for this change

Potentially simpler version of #5619

What changes are included in this PR?

Are there any user-facing changes?

tustvold · 2024-05-08T06:46:35Z

arrow-data/src/byte_view.rs

+    ///
+    /// If `v` instead contains the binary data inline, returns an `Err` containing it
+    #[inline]
+    pub fn try_new(v: &u128) -> Result<Self, &[u8]> {


The idea would then be to remove the From<u128> implementation. That would be a breaking change, but the next release is going to be breaking anyway (and I suspect few people are relying on this API yet).

I think what most confuses me about ByteView as a struct is that it doesn't represent in Rust anywhere the different layouts of the u128s in ByteViewArrays

For example, if you look at the Rust ByteView struct without consulting the Arrow spec, you may come to the conclusion that the u128s in a ByteViewArray all have this format. That is not the case, and thus you need to:

Know to check "is the length at most 12" and, if so, handle things specially (this API helps here by encapsulating that check for certain cases)

Know how to construct a view from bytes (i.e. how much of the prefix to copy, and where).

This API seems to improve things (though I think if we went this way we should expand its docstring to explain the difference between the two types of byte views)
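To make the two layouts concrete, here is a minimal sketch of the check that try_new encapsulates. It assumes the layout from the Arrow columnar spec (little-endian, length in the low 4 bytes, values of up to 12 bytes stored inline in the remaining 12 bytes). The names `LongView`, `MAX_INLINE_LEN` and `decode_view` are hypothetical, and the inline bytes are returned as an owned `Vec<u8>` to keep the sketch free of `unsafe`, unlike the `&[u8]` in the signature above.

```rust
/// Hypothetical stand-in for the non-inlined ("long") view.
/// Not the arrow-data type; field names follow the Arrow spec.
#[derive(Debug, PartialEq)]
pub struct LongView {
    pub length: u32,
    pub prefix: u32,
    pub buffer_index: u32,
    pub offset: u32,
}

/// Per the Arrow spec, values of at most 12 bytes are stored inline.
pub const MAX_INLINE_LEN: u32 = 12;

/// Interpret a raw 16-byte view.
///
/// Returns `Ok` for the long layout and `Err` carrying the inline
/// bytes for the short layout (an assumption mirroring `try_new`).
pub fn decode_view(v: u128) -> Result<LongView, Vec<u8>> {
    // The length always occupies the low 32 bits.
    let length = v as u32;
    if length <= MAX_INLINE_LEN {
        // Short layout: the data itself follows the length.
        let bytes = v.to_le_bytes();
        return Err(bytes[4..4 + length as usize].to_vec());
    }
    // Long layout: 4-byte prefix, buffer index, offset.
    Ok(LongView {
        length,
        prefix: (v >> 32) as u32,
        buffer_index: (v >> 64) as u32,
        offset: (v >> 96) as u32,
    })
}
```

A caller never has to compare the length against 12 itself; it only matches on the result.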

represent in Rust anywhere the different layouts of the u128s in ByteViewArrays

Correct, it represents the non-inlined case where you have a view, and not just a short inlined byte array.

This is consistent with the terminology used in the docs - https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-view-layout

And with the terminology for ListView and LargeListView, where a view represents a view into a separate buffer of data

this API helps here by encapsulating that check for certain cases

What cases does it not encapsulate?

Correct, it represents the non-inlined case where you have a view, and not just a short inlined byte array.

I see -- in my mind the combination of (length, inlined data) is also a "view" but I can see how you have a different interpretation (perhaps if you view the types as (length, inline) or (length, view) 🤔 )

The difference I am thinking about is between the two layouts shown here (described as "view structures")

However, there is a single ByteView Rust struct (that corresponds to "Long strings")

What cases does it not encapsulate?

One case is creating the u128 initially (e.g. if we should copy 4 bytes or up to 12)

One case is creating the u128 initially (e.g. if we should copy 4 bytes or up to 12)

Right, but does this need to be its own type, or could it be a free function? I don't know the answer to this, yet, but I would always take no abstraction over a bad abstraction
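For illustration, the construction case could be written as a free function along these lines. This is only a sketch under the Arrow spec layout (little-endian, at most 12 bytes inline, otherwise a 4-byte prefix plus buffer index and offset); the name `make_view` and its parameters are assumptions for this example, not something this PR adds.

```rust
/// Build a raw u128 view for `data` (hypothetical free function).
///
/// `buffer_index` and `offset` say where `data` lives in the array's
/// data buffers; they are ignored when the value fits inline.
pub fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    if data.len() <= 12 {
        // Short layout: length, then the bytes, zero padded to 16 bytes.
        let mut bytes = [0u8; 16];
        bytes[0..4].copy_from_slice(&len.to_le_bytes());
        bytes[4..4 + data.len()].copy_from_slice(data);
        u128::from_le_bytes(bytes)
    } else {
        // Long layout: only the first 4 bytes are copied as a prefix.
        let prefix = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        (len as u128)
            | ((prefix as u128) << 32)
            | ((buffer_index as u128) << 64)
            | ((offset as u128) << 96)
    }
}
```

The "4 bytes or up to 12" decision lives in one place here, without introducing a new type.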

tustvold · 2024-05-08T12:39:27Z

but I probably don't fully understand it

I updated one call site to show how it works; I believe it achieves the same end as the linked PR whilst being significantly simpler.

tustvold · 2024-05-08T12:40:02Z

arrow-array/src/array/byte_view_array.rs

-            let data = self.buffers.get_unchecked(view.buffer_index as usize);
-            let offset = view.offset as usize;
-            data.get_unchecked(offset..offset + len as usize)
+        let b = match ByteView::try_new(v) {


We can see here how try_new encapsulates the logic for interpreting the u128
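A self-contained sketch of what such a call site might look like follows. The `ByteView` here is a simplified local stand-in, not the arrow-data type: its `try_new` takes the u128 by value and returns the inline bytes as an owned `Vec<u8>` (an assumption made to avoid `unsafe`), and the `value` helper with its `views` / `buffers` parameters is hypothetical.

```rust
/// Simplified local stand-in for the long view (not the arrow-data type).
#[allow(dead_code)]
#[derive(Debug)]
pub struct ByteView {
    pub length: u32,
    pub prefix: u32,
    pub buffer_index: u32,
    pub offset: u32,
}

impl ByteView {
    /// `Ok` for the long layout, `Err` with the inline bytes otherwise.
    pub fn try_new(v: u128) -> Result<Self, Vec<u8>> {
        let length = v as u32;
        if length <= 12 {
            return Err(v.to_le_bytes()[4..4 + length as usize].to_vec());
        }
        Ok(Self {
            length,
            prefix: (v >> 32) as u32,
            buffer_index: (v >> 64) as u32,
            offset: (v >> 96) as u32,
        })
    }
}

/// Resolve the bytes of element `i`, with bounds checks left in place.
pub fn value(views: &[u128], buffers: &[Vec<u8>], i: usize) -> Vec<u8> {
    match ByteView::try_new(views[i]) {
        // Long value: follow the view into the referenced data buffer.
        Ok(view) => {
            let start = view.offset as usize;
            let end = start + view.length as usize;
            buffers[view.buffer_index as usize][start..end].to_vec()
        }
        // Short value: the bytes were stored in the view itself.
        Err(inline) => inline,
    }
}
```

The match has exactly two arms, one per layout, so the call site no longer needs its own length comparison.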

alamb

Thank you for this PR @tustvold

It seems to me the key difference between the existing ByteView approach (that this PR extends) and the approach in #5619 is an explicit Rust API for manipulating / accessing the inline variant of the u128 views.

I think this PR improves the usability of ByteView but I still think #5619 (or another approach that models the two types of views as separate Rust structs) is easier to understand
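As a rough illustration of the "two types of views" idea, both layouts could be modelled as variants of one enum. This is a hypothetical sketch based on the Arrow spec layout, not the API proposed in #5619; the names `View`, `from_raw` and `inline_bytes` are invented for this example.

```rust
/// Hypothetical model with one variant per layout.
#[derive(Debug, PartialEq)]
pub enum View {
    /// Value of at most 12 bytes, stored in the view itself.
    Inline { length: u8, data: [u8; 12] },
    /// Longer value, stored in a separate data buffer.
    Buffer { length: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

impl View {
    /// Classify a raw little-endian u128 view.
    pub fn from_raw(v: u128) -> Self {
        let b = v.to_le_bytes();
        let length = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        if length <= 12 {
            let mut data = [0u8; 12];
            data.copy_from_slice(&b[4..16]);
            View::Inline { length: length as u8, data }
        } else {
            View::Buffer {
                length,
                prefix: [b[4], b[5], b[6], b[7]],
                buffer_index: u32::from_le_bytes([b[8], b[9], b[10], b[11]]),
                offset: u32::from_le_bytes([b[12], b[13], b[14], b[15]]),
            }
        }
    }

    /// The inline bytes, if this is the short layout.
    pub fn inline_bytes(&self) -> Option<&[u8]> {
        match self {
            View::Inline { length, data } => Some(&data[..*length as usize]),
            View::Buffer { .. } => None,
        }
    }
}
```

Compared with a `Result`-returning constructor, the trade-off is an extra type in exchange for both layouts being visible in the Rust type system.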

alamb · 2024-05-08T12:54:23Z

arrow-data/src/byte_view.rs

+    ///
+    /// If `v` instead contains the binary data inline, returns an `Err` containing it
+    #[inline]
+    pub fn try_new(v: &u128) -> Result<Self, &[u8]> {



Add ByteView::try_new

7146a7a

github-actions bot added the arrow Changes to the arrow crate label May 8, 2024


tustvold mentioned this pull request May 8, 2024

Encapsulate View logic for GenericByteViewArray #5619

Draft


tustvold mentioned this pull request May 8, 2024

Structured ByteView Access (underlying StringView/BinaryView representation) #5736

Open

tustvold mentioned this pull request May 23, 2024

Allow constructing ByteViewArray from existing blocks #5796

Draft
