Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add supported backend information in Sig (signature for a expression) #5700

Closed
ivoson opened this issue May 11, 2024 · 10 comments
Closed

Add supported backend information in Sig (signature for a expression) #5700

ivoson opened this issue May 11, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@ivoson
Copy link
Contributor

ivoson commented May 11, 2024

Description

Sig is a mapping from spark expression to substrait name, used to convert spark functions to substrait and offload the funciton.
Currently, it includes below information:

case class Sig(expClass: Class[_], name: String)

IIUC all the registered functions are maintained in ExpressionMappings and serve for both velox and clickhouse backend. But some of them are supported by only one of the backend. For example: Expm1 and StringLocate are only supported by clickhouse.

In the mapping, it's hard to tell which function is supported by which backend. Adding supported backend information here might be helpful for the maintenance/track the status of the functions.

@ivoson ivoson added the enhancement New feature or request label May 11, 2024
@ivoson
Copy link
Contributor Author

ivoson commented May 11, 2024

cc @PHILO-HE does this make sense? Please let me know if I missed anything. Thanks.

@PHILO-HE
Copy link
Contributor

Hi @ivoson, thanks for your proposal!

Actually, a kind of validate-by-trying-offloading (covering Java/Scala code & native code) is introduced in Gluten to reduce code maintenance efforts. At least for Velox backend (CH backend may still have some gaps to support validate-by-trying-offloading), if validation process fails (e.g., function signature is not found in velox), we expect fallback happens.

Maybe, there are some cases that the expected fallback doesn't happen when validation fails. Do you have any findings that need us to fix? Thanks!

@ivoson
Copy link
Contributor Author

ivoson commented May 11, 2024

Maybe, there are some cases that the expected fallback doesn't happen when validation fails. Do you have any findings that need us to fix?

@PHILO-HE thanks for the response. I didn't find any cases need to be fixed. I created this issue because that I supposed the functions added in ExpressionMappings would be all supported by velox, but it was not. So it's not clear from the code to identify that how many of the expressions supported in velox/clickhouse. I am just wondering whether it's worth doing it. If not, we can rely on the docs up-to-date I think.

@PHILO-HE
Copy link
Contributor

Maybe, there are some cases that the expected fallback doesn't happen when validation fails. Do you have any findings that need us to fix?

@PHILO-HE thanks for the response. I didn't find any cases need to be fixed. I created this issue because that I supposed the functions added in ExpressionMappings would be all supported by velox, but it was not. So it's not clear from the code to identify that how many of the expressions supported in velox/clickhouse. I am just wondering whether it's worth doing it. If not, we can rely on the docs up-to-date I think.

@ivoson, I think it is not worth doing that. cc @FelixYBW, @zhouyuan

@FelixYBW
Copy link
Contributor

In our plan, the validation should all happens in native library. native library should have all the info it supports the expression/operators or not. But Velox itself doesn't have this feature due to its low priority, we still put all the logic in Gluten c++ now. It's where the mismatch comes from. It should be tracked when we nightly rebase to Velox but currently we didn't. How the sig() function can solve the mismatch issue

@ivoson
Copy link
Contributor Author

ivoson commented May 14, 2024

In our plan, the validation should all happens in native library. native library should have all the info it supports the expression/operators or not. But Velox itself doesn't have this feature due to its low priority, we still put all the logic in Gluten c++ now. It's where the mismatch comes from. It should be tracked when we nightly rebase to Velox but currently we didn't. How the sig() function can solve the mismatch issue

Thanks @PHILO-HE @FelixYBW for your explanation. The proposal is not targeting for plan validation, it's more like a metadata to point out the expression is supported by which backend.

For example, adding the metadata in Sig like Sig(Add, 'add', Set(Velox, CH)) can tell that the expression is supported by both Velox and CH. There could be 2 scenarios we can leverage this:

  1. When do plan validation, it can be eagerly detected if the function is not supported by the backend;
  2. Track the function support status;

@zhztheplayer
Copy link
Member

When do plan validation, it can be eagerly detected if the function is not supported by the backend;

We had this API

to customize the supported expression list from backend implementations. Did you have a check on that code?

@ivoson
Copy link
Contributor Author

ivoson commented May 14, 2024

When do plan validation, it can be eagerly detected if the function is not supported by the backend;

We had this API

to customize the supported expression list from backend implementations. Did you have a check on that code?

Thanks @zhztheplayer. I noticed this, but I am not very clear about what's is the design purpose of this API.

In the very beginning, I supposed that the expressions defined in ExpressionMappings should be supported by both engines(velox and ch), and the ones supported by only one of the engines should be defined with in above API. But it's not the case in current code implementation.

@PHILO-HE
Copy link
Contributor

@ivoson, some functions have multiple signatures. And it is possible that not all signatures are supported by native engine. If we add a fag in ExpressionMappings, we have to consider such partially supported state, which makes the code a bit complex. As I said, the native side validation can reduce the efforts for code maintenance, though it is not clear to developers to know which functions are not supported.

@ivoson
Copy link
Contributor Author

ivoson commented May 15, 2024

@ivoson, some functions have multiple signatures. And it is possible that not all signatures are supported by native engine. If we add a fag in ExpressionMappings, we have to consider such partially supported state, which makes the code a bit complex. As I said, the native side validation can reduce the efforts for code maintenance, though it is not clear to developers to know which functions are not supported.

Understood. As the potential changes could introduce maintenance burden, closing the issue.

@ivoson ivoson closed this as completed May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants