Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type #41373

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Apr 25, 2024

Rationale for this change

I want to instantiate this primitive operation in other scenarios (e.g. the optimized version of Take that handles chunked arrays) and extend the sub-classes of GatherCRTP with different member functions that re-use the WriteValue function generically (any fixed-width type and even bit-wide booleans).

When taking these improvements to Filter I will also re-use the "gather" concept and parameterize it by bitmaps/boolean-arrays instead of selection vectors (indices) like take does. So gather is not a "renaming of take" but rather a generalization of take and filter do in Arrow with different representations of what should be gathered from the values array.

What changes are included in this PR?

  • Introduce the Gather class helper to delegate fixed-width memory gathering: both static and dynamically sized (size known at compile time or size known at runtime)
  • Specialized Take implementation for values/indices without nulls
  • Fold the Boolean, Primitives, and Fixed-Width Binary implementation of Take into a single one
  • Skip validity bitmap allocation when inputs (values and indices) have no nulls

Are these changes tested?

  • Existing tests
  • New test assertions that check that Take guarantees null values are zeroed out

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 3, 2024
@felipecrv felipecrv requested a review from pitrou May 3, 2024 15:06
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 3, 2024
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical that you want to reuse this for Filter, unless you add Gather methods for batch selection. For Filter performance, it is essential to write out ranges of selected values at a time, not one value at a time. I don't know if that's what you have in mind.

Other than that, some assorted comments.

cpp/src/arrow/util/macros.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/gather_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/gather_internal.h Outdated Show resolved Hide resolved
}
template <typename IndexCType, typename ValueBitWidthConstant,
typename OutputIsZeroInitialized = std::false_type,
typename WithFactor = std::false_type>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not bool kOutputIsZeroInitialized and bool kWithFactor = false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific reason.

I will change, as I prefer to use bool directly myself. I was trying to be compatible with the use of std::integral_constant<> for ValueBitWidthConstant that was added by you in a previous PR :)

What's the reasoning for using std::integral_constant instead of an integer typed template parameter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for using std::integral_constant instead of an integer typed template parameter?

I forgot to respond to this, sorry. I don't remember exactly, but using an integer probably failed for some reason, perhaps because it would prevent partial specialization? (but my memory is fuzzy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change everything to bool and int and see what happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for these tparams being types and not values is that they are populated from a pack of template type parameters:

template <template <typename...> class TakeImpl, typename... Args>
void TakeIndexDispatch(KernelContext* ctx, const ArraySpan& values,
                       const ArraySpan& indices, ArrayData* out, int64_t factor = 1) {

TakeIndexDispatch<FixedWidthTakeImpl, ...all params here should be types...>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH, of course. Let's add a comment explaining this for future readers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cpp/src/arrow/util/gather_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/gather_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/gather_internal.h Outdated Show resolved Hide resolved
cpp/src/arrow/util/gather_internal.h Outdated Show resolved Hide resolved
@felipecrv
Copy link
Contributor Author

I'm skeptical that you want to reuse this for Filter, unless you add Gather methods for batch selection. For Filter performance, it is essential to write out ranges of selected values at a time, not one value at a time. I don't know if that's what you have in mind.

I want to expand the set of WriteValue implementations to support writing multiple values. Then add a version of Gather::Execute*() that takes boolean arrays (masks) instead of indices (selection vectors).

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels May 6, 2024
@felipecrv felipecrv requested a review from pitrou May 6, 2024 20:43
@felipecrv felipecrv force-pushed the gather_fixed branch 3 times, most recently from 3dd6c56 to 6d769d3 Compare May 15, 2024 01:36
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 17, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 18, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 4, 2024
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some assorted comments below. The code looks good on the principle.

cpp/src/arrow/compute/kernels/gather_internal.h Outdated Show resolved Hide resolved
// See derived Gather classes below for the meaning of the parameters, pre and
// post-conditions.
template <bool kOutputIsZeroInitialized, typename IndexCType>
ARROW_FORCE_INLINE int64_t ExecuteWithNulls(const ArraySpan& src_validity,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ARROW_FORCE_INLINE desirable here? Generally, the compiler is able to make more informed heuristics than us (but not always admittedly). This function is rather complex so I'm not sure force-inlining it would actually be beneficial (and could blow up code generation sizes).

Copy link
Contributor Author

@felipecrv felipecrv Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I checked code and benchmarks (a month or more ago).

Here's the overall approach with this header.

GatherBaseCRTP is a template that generates these templates:

template <typename Global, typename Params>
class Gather {
  a simple;
  b scalar;
  c members;
  
  void WriteValue(pos) { ... }
  void WriteZero(pos) { ... }
  void WriteZeroSegment(pos, len) { ... }
  
  int64_t ExecuteNoNulls(len) { .. }
  
  template<bool OutputIsZeroInitialized>
  int64_t Execute(src_validity, idx_validity, out_is_valid) { ... }
};

The usage should be as follows. On the same function, the specific Gather instance should be instantiated and the method should be called on it:

    const bool out_has_validity = values.MayHaveNulls() || indices.MayHaveNulls();

    const uint8_t* src;
    int64_t src_offset;
    std::tie(src_offset, src) = util::OffsetPointerOfFixedBitWidthValues(values);
    uint8_t* out = util::MutableFixedWidthValuesPointer(out_arr);
    int64_t valid_count = 0;
    arrow::internal::Gather<kValueWidthInBits, IndexCType, WithFactor::value> gather{
        /*src_length=*/values.length,
        src,
        src_offset,
        /*idx_length=*/indices.length,
        /*idx=*/indices.GetValues<IndexCType>(1),
        out,
        factor};
    if (out_has_validity) {
      DCHECK_EQ(out_arr->offset, 0);
      // out_is_valid must be zero-initiliazed, because Gather::Execute
      // saves time by not having to ClearBit on every null element.
      auto out_is_valid = out_arr->GetMutableValues<uint8_t>(0);
      memset(out_is_valid, 0, bit_util::BytesForBits(out_arr->length));
      valid_count = gather.template Execute<OutputIsZeroInitialized::value>(
          /*src_validity=*/values, /*idx_validity=*/indices, out_is_valid);
    } else {
      valid_count = gather.Execute();
    }

What this achieves? Gather<kValueWidthInBits, ...> gather{...} isn't really the initialization of a portion of the memory in the stack, but of registers. Because the lifetime of gather is confined to a single function and all the method calls on it are inlined, the member of gather can live in registers from initialization to the end of the kernel loop. I did this to achieve the same performance we had when these loops were hand-written (more than once!) in vector_selection_take_internal.cc.

This idea of keeping variables in registers is mentioned in database implementation reports (Efficiently Compiling Efficient Query Plans for Modern Hardware) and it's not just compiler-brain perfectionism (https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates).

By force-inlining the Execute methods I'm actually reducing code size (together with the fact that I instantiate Gather and these specific Execute calls from a single place). Without inlining Execute there would be more binary code to deal with construction/destruction, loads to/from memory.

To make sure we only ever instantiate these templates once, I could wrap this code in templates outside the inline namespace. The templates would contain the initialization of Gather<...> gather and the inlined calls to the two inlined Execute functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to build that functional façade to these classes before we concretely have more than two instantiations of these templates? I could create an issue to track this concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know using registers is critical for performance. I'm just surprised that the compiler wouldn't cache member variables in registers in a (non-inlined) method's body. Adding a couple memory lookups at the start and end of such a heavy function should not yield a noticeable slowdown.

Just because a compiler feels better with some particular organization of the source code doesn't mean we've unlocked a robust performance improvement. We may just have hit a sweeter spot in this optimizer's behaviour.

In any case, we can proceed with this. This is helper code that should only be used in a couple kernels, so the potential code size increase is hopefully (!) minor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the inline namespace annotation as well as it seems to be a red-herring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just surprised that the compiler wouldn't cache member variables in registers in a (non-inlined) method's body. Adding a couple memory lookups at the start and end of such a heavy function should not yield a noticeable slowdown.

Without inlining Execute, there has to be a this pointer pointing to the struct laid out in memory and call to Write* have to load members through the this pointer. Not an impossible optmization, but wasn't happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without inlining Execute, there has to be a this pointer pointing to the struct laid out in memory and call to Write* have to load members through the this pointer. Not an impossible optmization, but wasn't happening.

I see. Compilers are annoying :-)

@felipecrv
Copy link
Contributor Author

Proof that this PR reduces binary size:

~/code/arrow/cpp/release $ wc -c ./src/arrow/CMakeFiles/arrow_compute.dir/compute/kernels/vector_selection_internal.cc.o
  278808 ./src/arrow/CMakeFiles/arrow_compute.dir/compute/kernels/vector_selection_internal.cc.o
$ git co origin/main
HEAD is now at 7f0c4070dd GH-41397: [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934)
~/code/arrow/cpp/release $ ninja ./src/arrow/CMakeFiles/arrow_compute.dir/compute/kernels/vector_selection_internal.cc.o
[1/1] Building CXX object src/arrow/CMakeFiles/arrow_compute.dir/compute/kernels/vector_selection_internal.cc.o
~/code/arrow/cpp/release $ wc -c ./src/arrow/CMakeFiles/arrow_compute.dir/compute/kernels/vector_selection_internal.cc.o
  289832 ./src/arrow/CMakeFiles/arrow_compute.dir/compute/kernels/vector_selection_internal.cc.o

-11024 bytes 🎉

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 4, 2024
@pitrou
Copy link
Member

pitrou commented Jun 5, 2024

Proof that this PR reduces binary size:

Nice! Can you also post numbers obtained with the size utility for completeness?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 5, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 5, 2024
@felipecrv
Copy link
Contributor Author

felipecrv commented Jun 6, 2024

Proof that this PR reduces binary size:

Nice! Can you also post numbers obtained with the size utility for completeness?

Looking at both vector_selection_take_internal.cc.o and vector_selection_internal.cc.o I'm net-adding 1.45 KBytes.

bloaty -d symbols -C full -n 0 \
HEAD-vector_selection_take_internal.cc.o HEAD-vector_selection_internal.cc.o -- \
MAIN-vector_selection_take_internal.cc.o MAIN-vector_selection_internal.cc.o
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +32.8Ki  [NEW] +32.7Ki    arrow::compute::internal::FixedWidthTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  [NEW]    +204  [NEW]    +168    GCC_except_table272
  [NEW]    +189  [NEW]    +104    arrow::Status arrow::Status::NotImplemented<char const (&) [38], arrow::DataType const&>(char const (&&&) [38], arrow::DataType const&&&)
Details

  +183%    +168  +300%    +168    GCC_except_table170
   +43%    +164   +47%    +164    GCC_except_table20
  [NEW]    +156  [NEW]    +120    GCC_except_table302
  [NEW]    +146  [NEW]     +76    GCC_except_table99
   +40%    +109   +71%    +144    GCC_except_table24
   +72%    +143   +66%   gg +108    GCC_except_table13
  [NEW]    +126  [NEW]     +56    GCC_except_table60
  [NEW]    +108  [NEW]     +72    GCC_except_table138
  +169%    +108  +257%     +72    GCC_except_table172
  [NEW]    +108  [NEW]     +72    GCC_except_table225
  [NEW]    +107  [NEW]     +72    GCC_except_table64
  +195%    +107  +360%     +72    GCC_except_table87
  [NEW]     +96  [NEW]     +60    GCC_except_table111
  +160%     +96  +250%     +60    GCC_except_table151
   +25%     +95   +18%     +60    GCC_except_table17
  [NEW]     +95  [NEW]     +60    GCC_except_table72
  [NEW]     +95  [NEW]     +60    GCC_except_table81
  +116%     +88  +220%     +88    GCC_except_table192
  [NEW]     +84  [NEW]     +48    GCC_except_table188
  [NEW]     +76  [NEW]     +40    GCC_except_table107
   +96%     +68  +189%     +68    GCC_except_table10
  [NEW]     +68  [NEW]     +32    GCC_except_table102
  [NEW]     +68  [NEW]     +32    GCC_except_table114
  [NEW]     +68  [NEW]     +32    GCC_except_table144
  [NEW]     +68  [NEW]     +32    GCC_except_table197
  [NEW]     +68  [NEW]     +32    GCC_except_table223
  [NEW]     +67  [NEW]     +32    GCC_except_table43
  [NEW]     +67  [NEW]     +32    GCC_except_table53
  [NEW]     +67  [NEW]     +32    GCC_except_table90
  [NEW]     +64  [NEW]     +28    GCC_except_table164
  [NEW]     +64  [NEW]     +28    GCC_except_table199
  [NEW]     +64  [NEW]     +28    GCC_except_table207
  [NEW]     +64  [NEW]     +28    GCC_except_table213
  [NEW]     +63  [NEW]     +28    GCC_except_table28
  [NEW]     +60  [NEW]     +24    GCC_except_table128
  [NEW]     +60  [NEW]     +24    GCC_except_table155
  [NEW]     +60  [NEW]     +24    GCC_except_table298
  [NEW]     +59  [NEW]     +24    GCC_except_table36
  [NEW]     +56  [NEW]     +20    GCC_except_table182
  [NEW]     +56  [NEW]     +20    GCC_except_table190
  [NEW]     +56  [NEW]     +20    GCC_except_table209
  [NEW]     +52  [NEW]     +16    GCC_except_table133
  [NEW]     +52  [NEW]     +16    GCC_except_table134
  [NEW]     +52  [NEW]     +16    GCC_except_table162
  [NEW]     +52  [NEW]     +16    GCC_except_table168
  [NEW]     +52  [NEW]     +16    GCC_except_table179
  [NEW]     +52  [NEW]     +16    GCC_except_table194
  +100%     +52  +100%     +16    GCC_except_table202
  [NEW]     +52  [NEW]     +16    GCC_except_table218
  [NEW]     +52  [NEW]     +16    GCC_except_table221
  [NEW]     +52  [NEW]     +16    GCC_except_table277
  [NEW]     +52  [NEW]     +16    GCC_except_table279
  [NEW]     +51  [NEW]     +16    GCC_except_table38
  [NEW]     +50  [NEW]     +16    GCC_except_table4
   +92%     +48  +300%     +48    GCC_except_table132
  [NEW]     +47  [NEW]     +12    GCC_except_table54
  [NEW]     +42  [NEW]     +16    lCPI223_1
  [NEW]     +40  [NEW]     +16    lJTI2_8
  [NEW]     +40  [NEW]     +16    lJTI2_9
   +25%     +32   +53%     +32    GCC_except_table84
  +0.4%     +32  +0.4%     +32    ltmp9
   +30%     +31  -5.9%      -4    GCC_except_table85
  [NEW]     +31  [NEW]      +5    lJTI170_0
  [NEW]     +30  [NEW]      +4    lJTI170_1
   +44%     +28  +100%     +28    GCC_except_table171
  -6.4%      -7   +70%     +28    GCC_except_table35
   +32%     +24   +60%     +24    GCC_except_table177
   +46%     +24  +150%     +24    GCC_except_table178
   +38%     +20  +125%     +20    GCC_except_table217
   +26%     +15  +100%     +16    GCC_except_table100
   +19%     +12   +43%     +12    GCC_except_table206
   +19%     +12   +43%     +12    GCC_except_table212
   +14%      +8   +40%      +8    GCC_except_table181
  +5.6%      +4   +11%      +4    GCC_except_table216
  +2.9%      +2  +7.4%      +2    typeinfo name for arrow::StructArray
  +1.2%      +2  +2.6%      +2    typeinfo name for std::__1::__shared_ptr_emplace<arrow::ChunkedArray, std::__1::allocator<arrow::ChunkedArray> >
  -6.8%      -4 -16.7%      -4    GCC_except_table83
 -13.3%      -8 -33.3%      -8    GCC_except_table149
 -17.9%     -12 -37.5%     -12    GCC_except_table41
 -20.7%     -12 -50.0%     -12    GCC_except_table8
 -23.9%     -16 -50.0%     -16    GCC_except_table52
 -26.3%     -20 -50.0%     -20    GCC_except_table215
  [DEL]     -30  [DEL]      -4    lJTI169_1
 -27.3%     -30 -46.9%     -30    lJTI2_0
  [DEL]     -31  [DEL]      -5    lJTI169_0
  +2.9%      +3 -47.1%     -32    GCC_except_table34
  [DEL]     -41  [DEL]     -16    lJTI25_0
  [DEL]     -42  [DEL]     -16    lCPI222_1
 -40.7%     -44 -61.1%     -44    GCC_except_table173
  [DEL]     -46  [DEL]     -12    GCC_except_table7
  [DEL]     -47  [DEL]     -12    GCC_except_table55
  -0.6%     -48  -0.6%     -48    arrow::compute::internal::FSLTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  [DEL]     -51  [DEL]     -16    GCC_except_table37
  [DEL]     -51  [DEL]     -16    GCC_except_table51
  [DEL]     -51  [DEL]     -16    GCC_except_table61
  [DEL]     -52  [DEL]     -16    GCC_except_table135
  [DEL]     -52  [DEL]     -16    GCC_except_table148
  [DEL]     -52  [DEL]     -16    GCC_except_table161
  [DEL]     -52  [DEL]     -16    GCC_except_table203
  [DEL]     -52  [DEL]     -16    GCC_except_table285
  [DEL]     -52  [DEL]     -16    GCC_except_table287
  [DEL]     -52  [DEL]     -16    GCC_except_table288
  [DEL]     -55  [DEL]     -20    GCC_except_table33
  [DEL]     -55  [DEL]     -20    GCC_except_table40
  [DEL]     -56  [DEL]     -20    GCC_except_table208
  [DEL]     -56  [DEL]     -20    GCC_except_table214
 -48.3%     -58 -46.2%     -24    GCC_except_table2
  [DEL]     -60  [DEL]     -24    GCC_except_table127
  [DEL]     -60  [DEL]     -24    GCC_except_table306
  [DEL]     -63  [DEL]     -28    GCC_except_table29
  [DEL]     -64  [DEL]     -28    GCC_except_table163
  [DEL]     -64  [DEL]     -28    GCC_except_table180
  [DEL]     -68  [DEL]     -32    GCC_except_table113
  [DEL]     -68  [DEL]     -32    GCC_except_table145
 -22.1%     -72 -28.1%     -72    GCC_except_table11
  [DEL]     -75  [DEL]     -40    GCC_except_table59
  [DEL]     -76  [DEL]     -40    GCC_except_table108
  [DEL]     -76  [DEL]     -40    GCC_except_table205
  [DEL]     -76  [DEL]     -40    GCC_except_table211
 -61.3%     -92 -79.3%     -92    GCC_except_table9
 -20.2%     -95 -15.0%     -60    GCC_except_table16
  [DEL]     -95  [DEL]     -60    GCC_except_table71
  [DEL]     -95  [DEL]     -60    GCC_except_table82
  [DEL]     -95  [DEL]     -60    GCC_except_table98
  [DEL]     -96  [DEL]     -60    GCC_except_table110
 -61.5%     -96 -71.4%     -60    GCC_except_table152
 -64.3%     -99 -76.2%     -64    GCC_except_table86
  [DEL]    -100  [DEL]     -64    GCC_except_table131
  [DEL]    -100  [DEL]     -64    GCC_except_table174
  [DEL]    -103  [DEL]     -24    typeinfo for arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl
  [DEL]    -107  [DEL]     -72    GCC_except_table65
  [DEL]    -108  [DEL]     -72    GCC_except_table139
  [DEL]    -108  [DEL]     -72    GCC_except_table226
  [DEL]    -113  [DEL]     -28    arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl::Finish()
  [DEL]    -120  [DEL]     -48    GCC_except_table222
  [DEL]    -127  [DEL]     -48    vtable for arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl
  [DEL]    -132  [DEL]     -60    GCC_except_table198
 -24.1%    -132 -25.8%    -132    GCC_except_table21
  [DEL]    -133  [DEL]      -8    arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>::Init()
  [DEL]    -134  [DEL]     -64    GCC_except_table89
  [DEL]    -136  [DEL]     -64    GCC_except_table101
 -26.2%    -136 -28.1%    -136    GCC_except_table18
  [DEL]    -137  [DEL]     -16    typeinfo for arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>
  [DEL]    -137  [DEL]     -58    typeinfo name for arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl
  [DEL]    -140  [DEL]     -68    GCC_except_table189
 -64.8%    -140 -72.2%    -104    GCC_except_table193
  [DEL]    -149  [DEL]     -96    arrow::FixedSizeBinaryArray::~FixedSizeBinaryArray()
 -74.5%    -152 -90.5%    -152    GCC_except_table280
  [DEL]    -156  [DEL]    -120    GCC_except_table310
  [DEL]    -169  [DEL]     -48    vtable for arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>
 -50.1%    -208 -54.7%    -208    GCC_except_table22
  [DEL]    -221  [DEL]    -100    typeinfo name for arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>
  [DEL]    -252  [DEL]      -8    arrow::compute::internal::(anonymous namespace)::Selection<arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl, arrow::FixedSizeBinaryType>::~Selection()
  -2.1%    -256  -2.1%    -256    ltmp5
  [DEL]    -312  [DEL]    -240    GCC_except_table169
 -82.5%    -316 -90.8%    -316    GCC_except_table25

  -8.9%    -352  -9.2%    -352    arrow::compute::internal::PopulateTakeKernels(std::__1::vector<arrow::compute::internal::SelectionKernelData, std::__1::allocator<arrow::compute::internal::SelectionKernelData> >*)
  [DEL]    -464  [DEL]    -304    arrow::compute::internal::(anonymous namespace)::FSBSelectionImpl::~FSBSelectionImpl()
  -0.8%    -677  [ = ]       0    [Unmapped]
  [DEL] -5.83Ki  [DEL] -5.72Ki    arrow::compute::internal::FSBTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  [DEL] -23.9Ki  [DEL] -23.8Ki    arrow::compute::internal::PrimitiveTakeExec(arrow::compute::KernelContext*, arrow::compute::ExecSpan const&, arrow::compute::ExecResult*)
  -0.1%    -504  +0.5% +1.45Ki    TOTAL

UPDATE: Last 3 commits lead to:

  -0.1%    -528  +0.5% +1.21Ki    TOTAL

  Impact on origin/main:

    Before this commit
     -0.1%    -504  +0.5% +1.45Ki    TOTAL

    After this commit
     -0.1%    -392  +0.5% +1.27Ki    TOTAL
In a follow-up PR I'm adding Status return type out of necessity, so
this change is not only for binary size reasons.

    -0.1%    -528  +0.5% +1.21Ki    TOTAL
@felipecrv felipecrv requested a review from pitrou June 6, 2024 17:09
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants