-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41460: [C++] Use ASAN to poison temp vector stack memory #41695
Conversation
|
Hi @felipecrv , this is a followup of our discussion in #41335 (comment). Think you might want to take a look. Thanks. |
@@ -20,16 +20,29 @@ | |||
#include "arrow/compute/util.h" | |||
#include "arrow/memory_pool.h" | |||
|
|||
#ifdef ADDRESS_SANITIZER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some platforms don't have ASAN, so wrap the ASAN related code with a flag that is only defined with ASAN.
TempVectorStack() = default; | ||
~TempVectorStack(); | ||
|
||
TempVectorStack(const TempVectorStack&) = delete; | ||
TempVectorStack& operator=(const TempVectorStack&) = delete; | ||
|
||
TempVectorStack(TempVectorStack&&) = default; | ||
TempVectorStack& operator=(TempVectorStack&&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boilerplate code to cope with the explicit destructor, otherwise the compiler will complain that implicit copy constructor is deleted due to a member of unique_ptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two handy macros for you here ->
arrow/cpp/src/arrow/util/macros.h
Lines 27 to 37 in f5ac05c
#ifndef ARROW_DISALLOW_COPY_AND_ASSIGN | |
#define ARROW_DISALLOW_COPY_AND_ASSIGN(TypeName) \ | |
TypeName(const TypeName&) = delete; \ | |
void operator=(const TypeName&) = delete | |
#endif | |
#ifndef ARROW_DEFAULT_MOVE_AND_ASSIGN | |
#define ARROW_DEFAULT_MOVE_AND_ASSIGN(TypeName) \ | |
TypeName(TypeName&&) = default; \ | |
TypeName& operator=(TypeName&&) = default | |
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks for tips! Will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
if (buffer_) { | ||
ASAN_UNPOISON_MEMORY_REGION(buffer_->mutable_data(), buffer_size_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must unpoison the buffer before releasing.
TempVectorStack() = default; | ||
~TempVectorStack(); | ||
|
||
TempVectorStack(const TempVectorStack&) = delete; | ||
TempVectorStack& operator=(const TempVectorStack&) = delete; | ||
|
||
TempVectorStack(TempVectorStack&&) = default; | ||
TempVectorStack& operator=(TempVectorStack&&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two handy macros for you here ->
arrow/cpp/src/arrow/util/macros.h
Lines 27 to 37 in f5ac05c
#ifndef ARROW_DISALLOW_COPY_AND_ASSIGN | |
#define ARROW_DISALLOW_COPY_AND_ASSIGN(TypeName) \ | |
TypeName(const TypeName&) = delete; \ | |
void operator=(const TypeName&) = delete | |
#endif | |
#ifndef ARROW_DEFAULT_MOVE_AND_ASSIGN | |
#define ARROW_DEFAULT_MOVE_AND_ASSIGN(TypeName) \ | |
TypeName(TypeName&&) = default; \ | |
TypeName& operator=(TypeName&&) = default | |
#endif |
static int64_t PaddedAllocationSize(int64_t num_bytes); | ||
|
||
void alloc(uint32_t num_bytes, uint8_t** data, int* id); | ||
void release(int id, uint32_t num_bytes); | ||
static constexpr uint64_t kGuard1 = 0x3141592653589793ULL; | ||
static constexpr uint64_t kGuard2 = 0x0577215664901532ULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should keep these, write them after the poisoning. Poisoning can only help us with invalid reads, but not with undesirable writes.
They were added to help detecting bugs like #11335
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @felipecrv , I'll add them back.
But I'm also putting down some notes here about what I've been thinking, perhaps just as a memo to myself.
I'm removing these guards because ASAN can prevent both invalid reads and writes to poisoned memory. I don't think I can replicate the exact issue that #11335 was trying to fix, but I simply reverted #11335 and ran hash join ut, seeing ASAN successfully caught write after poison.
However there could be more tricky cases like below:
func1
allocates a temp vectorv1
and reads/writes tov1
;func1
callsfunc2
(note: w/o releasingv1
- this is the essential of a "stack" and the tricky part);func2
allocates another temp vectorv2
and reads/writes tov2
;func2
releasesv2
and returns tofunc1
;func1
reads/writes tov1
again;func1
releasesv1
and returns to its caller.
A) Suppose writes to v1
at offset >= v1.size()
happen in 1, then ASAN can catch it because that piece of memory is already poisoned.
B) Suppose writes to v1
at offset < 0
happen in 1, then whether ASAN can catch it depends on the validity of the memory address lower than v1
, but the guards have a better chance.
C) Suppose writes to v2
at offset >= v2.size()
happen in 3, same as A.
D) Suppose writes to v2
at offset < 0
happen in 3, then ASAN has zero chance to catch it as long as the offset is within v1.size()
, but the guards have a better chance.
E) Is it possible that func1
writes to v1
at offset >= v1.size()
while v2
is still alive? So it is corrupting v2
's content without ASAN noticing because v2
's memory is unpoisoned throughout v2
's lifespan. The answer is NO, the control flow will be in func2
as long as v2
is alive so func1
won't be able to execute any code.
To summarize, ASAN poisoning is not able to 100% cover what the guards can do. The guards have a better chance to catch invalid writes at negative offsets (B and D). But other than that, ASAN is equally good as the guards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
Thank you for doing this! ASAN is so cool. |
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit dcdf4e6. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ache#41695) ### Rationale for this change See apache#41460. And reduce the overhead of current manual poisoning (filling the entire stack space with `0xFF`s) that happens even in release mode. ### What changes are included in this PR? Use ASAN API to replace the current manual poisoning of the temp vector stack memory. ### Are these changes tested? Wanted to add cases to assert that ASAN poison/unpoison is functioning. However I found it too tricky to catch an ASAN error because ASAN directly uses signals that are hard to interoperate in C++/gtest. So I just manually checked poisoning is working in my local, e.g. by intentionally not unpoisoning the allocated buffer and seeing ASAN unhappy. Just leveraging existing cases that use temp stack such as acero tests, which should cover this change well. ### Are there any user-facing changes? None. * GitHub Issue: apache#41460 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Rationale for this change
See #41460. And reduce the overhead of current manual poisoning (filling the entire stack space with
0xFF
s) that happens even in release mode.What changes are included in this PR?
Use ASAN API to replace the current manual poisoning of the temp vector stack memory.
Are these changes tested?
Wanted to add cases to assert that ASAN poison/unpoison is functioning. However I found it too tricky to catch an ASAN error because ASAN directly uses signals that are hard to interoperate in C++/gtest. So I just manually checked poisoning is working in my local, e.g. by intentionally not unpoisoning the allocated buffer and seeing ASAN unhappy.
Just leveraging existing cases that use temp stack such as acero tests, which should cover this change well.
Are there any user-facing changes?
None.