Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(function): Implement map_delete function #15480

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

shamb0
Copy link
Contributor

@shamb0 shamb0 commented May 11, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Implement map_delete function

  • Addresses part of 15295

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0 shamb0 changed the title Refactor, Add map_delete functionality to map.rs feat(function): Implement map_delete function May 11, 2024
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label May 11, 2024
@shamb0
Copy link
Contributor Author

shamb0 commented May 11, 2024

Hi @b41sh,

  • I've completed the first draft of map_delete. Could you please review the code and provide feedback when you get a chance?

  • Let me know if there are any areas that need improvement or alternative approaches to consider.

Thanks!

@b41sh b41sh self-requested a review May 11, 2024 06:14
Signed-off-by: shamb0 <r.raajey@gmail.com>
@b41sh
Copy link
Member

b41sh commented May 11, 2024

Hi @b41sh,

  • I've completed the first draft of map_delete. Could you please review the code and provide feedback when you get a chance?
  • Let me know if there are any areas that need improvement or alternative approaches to consider.

Thanks!

Hi @shamb0 , thanks for continuing to contribute code.
Since the map_delete function has a variable number of arguments and may have more than one key argument, it is better to use Factory function instead of Builtin function. You can refer to these Factory functions as examples.

https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L88
https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L247
https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/geo.rs#L311

There are many more implementations of the factory function, and you can find more examples by searching the code base for registry.register_function_factory

The factory function is not written in the same way as the builtin function, it can support more flexible argument types, but it needs to deal with more problems, there are mainly the following steps:

First of all, you need to check whether the input arguments are supported types, for the map_delete function, the number of arguments is at least two, the type of the first argument must be map, the type of the second and later arguments must be the same as the type of the map key, and can not be nullable. If the arguments do not meet these conditions, return NULL, otherwise, we can get args_type and return_type, used to generate FunctionSignature.

The second part of writing the logic for executing the function is that the input args are of type ValueRef, and each argument could be a Scalar or a Column, which needs to be handled in a different way. You also need a builder to store the result data, there are many examples in this part, you can refer to the implementation of other functions.

@shamb0
Copy link
Contributor Author

shamb0 commented May 11, 2024

Hi @b41sh,

  • I've completed the first draft of map_delete. Could you please review the code and provide feedback when you get a chance?
  • Let me know if there are any areas that need improvement or alternative approaches to consider.

Thanks!

Hi @shamb0 , thanks for continuing to contribute code. Since the map_delete function has a variable number of arguments and may have more than one key argument, it is better to use Factory function instead of Builtin function. You can refer to these Factory functions as examples.

https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L88 https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L247 https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/geo.rs#L311

There are many more implementations of the factory function, and you can find more examples by searching the code base for registry.register_function_factory

The factory function is not written in the same way as the builtin function, it can support more flexible argument types, but it needs to deal with more problems, there are mainly the following steps:

First of all, you need to check whether the input arguments are supported types, for the map_delete function, the number of arguments is at least two, the type of the first argument must be map, the type of the second and later arguments must be the same as the type of the map key, and can not be nullable. If the arguments do not meet these conditions, return NULL, otherwise, we can get args_type and return_type, used to generate FunctionSignature.

The second part of writing the logic for executing the function is that the input args are of type ValueRef, and each argument could be a Scalar or a Column, which needs to be handled in a different way. You also need a builder to store the result data, there are many examples in this part, you can refer to the implementation of other functions.

Thanks @b41sh, for sharing those details! I'll take a good look at the requirements and circle back with any questions or updates.

@b41sh
Copy link
Member

b41sh commented May 27, 2024

Hi @shamb0 It's been a long time since last update, have you made any progress recently?

@shamb0
Copy link
Contributor Author

shamb0 commented May 27, 2024

Hi @shamb0 It's been a long time since last update, have you made any progress recently?

Hi @b41sh,

Sorry for the delay. I've been tied up with some personal matters. I have some updates that are partially complete and will aim to make a full commit in the next few days.

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0 shamb0 marked this pull request as ready for review May 29, 2024 00:13
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0
Copy link
Contributor Author

shamb0 commented May 29, 2024

Hi @b41sh,

I've refactored the implementation to use a factory function instead of a built-in function. The proto implementation is ready for your review. Currently, it only supports the key variant of type DataType::String. Since the DataType collection is extensive, could you please suggest which other variants we should support next?

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
@b41sh
Copy link
Member

b41sh commented May 29, 2024

Hi @b41sh,

I've refactored the implementation to use a factory function instead of a built-in function. The proto implementation is ready for your review. Currently, it only supports the key variant of type DataType::String. Since the DataType collection is extensive, could you please suggest which other variants we should support next?

Hi @shamb0 Thank you for your modification. There are still some minor problems that need to be modified. You can have a look at them when you are free.

@shamb0
Copy link
Contributor Author

shamb0 commented Jun 1, 2024

Hi @b41sh,
I've refactored the implementation to use a factory function instead of a built-in function. The proto implementation is ready for your review. Currently, it only supports the key variant of type DataType::String. Since the DataType collection is extensive, could you please suggest which other variants we should support next?

Hi @shamb0 Thank you for your modification. There are still some minor problems that need to be modified. You can have a look at them when you are free.

Hi @b41sh,

Thank you for the review comments. I'll incorporate the suggested changes and have an updated version ready for your review soon.

@shamb0
Copy link
Contributor Author

shamb0 commented Jun 5, 2024

Hi @b41sh,

Code refactored based on the review comments. Appreciate the detailed feedback, it helped in making the necessary changes.
Please review and let me know if any further modifications are required.

Signed-off-by: shamb0 <r.raajey@gmail.com>
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0
Copy link
Contributor Author

shamb0 commented Jun 7, 2024

Hi @b41sh,

I've addressed all the comments and made the necessary changes.
The PR is now ready for In-take review.

Please let me know if there's anything else that needs attention.

Thanks,

src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
src/query/functions/src/scalars/map.rs Outdated Show resolved Hide resolved
Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>

let inner_builder_type = match input_map_sref.infer_data_type() {
DataType::Map(box typ) => typ,
DataType::EmptyMap => DataType::EmptyMap,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed, as the ScalarRef::Map(col) can't have EmptyMap type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a quick clarification regarding the test cases for the EmptyMap condition. I have captured the current behavior of the function as follows:

// Deleting keys from an empty map
run_ast(file, "map_delete({})", &[]);

run_ast(file, "map_delete({}, NULL, NULL)", &[]);

// Deleting keys from a map literal
run_ast(file, "map_delete({}, ['k3'], ['k2'])", &[]);
run_ast(
    file,
    "map_delete({'k1': 'v1', 'k2': 'v2', 'k3': 'v3', 'k4': 'v4'}, 'k3', 'k2')",
    &[],
);

Here are the unit test results:

error: 
  --> SQL:1:1
  |
1 | map_delete({})
  | ^^^^^^^^^^^^^^ no overload satisfies `map_delete(Map(Nothing))`


error: 
  --> SQL:1:1
  |
1 | map_delete({}, NULL, NULL)
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^ no overload satisfies `map_delete(Map(Nothing), NULL, NULL)`


error: 
  --> SQL:1:1
  |
1 | map_delete({}, ['k3'], ['k2'])
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no overload satisfies `map_delete(Map(Nothing), Array(String), Array(String))`

Is this behavior aligned with the requirements? I couldn't find details for handling the EmptyMap condition in the documentation: Snowflake Documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sqls do not match the type required by the map_delete function. The first example has only one argument, and the second and third examples have the wrong type of key value, so return the error is correct.

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
}

let map_key_type = match args_type[0].remove_nullable() {
DataType::Map(box DataType::Tuple(type_tuple)) if type_tuple.len() == 2 => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need support EmptyMap here, the previous code is correct, you can modify it back. Delete DataType::EmptyMap on line 317 is because we have already processed the EmptyMap type on line 299, so EmptyMap type can't appear on line 317.
I think there is basically ok with the rest of the codes. After this modification is completed, we can merge this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankyou

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shamb0 , seems not correct. select map_delete({}, 'a') will return SemanticError, you can write like this:

let map_key_type = match args_type[0].remove_nullable() {
    DataType::Map(box DataType::Tuple(type_tuple)) if type_tuple.len() == 2 => {
        Some(type_tuple[0].clone())
    }
    DataType::EmptyMap => None,
    _ => return None,
};

if let Some(map_key_type) = map_key_type {
    for arg_type in args_type.iter().skip(1) {
        if arg_type != &map_key_type {
            return None;
        }
    }
} else {
    let key_type = &args_type[1];
    if !key_type.is_boolean()
        && !key_type.is_string()
        && !key_type.is_numeric()
        && !key_type.is_decimal()
        && !key_type.is_date_or_date_time()
    {
        return None;
    }
    for arg_type in args_type.iter().skip(2) {
        if arg_type != key_type {
            return None;
        }
    }
}

Please also remove DataType::EmptyMap => DataType::EmptyMap at line 325, because this case will never matched.

Add a unit test for EmptyMap, like map_delete({}, 'a', 'b'). After these changes are complete, we can merge this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @b41sh,

Changes are done, and I've added the suggested test case. Could you please double-check the result? It looks correct on my end.

ast            : map_delete({}, 'a', 'b')
raw expr       : map_delete(map(array(), array()), 'a', 'b')
checked expr   : map_delete<Map(Nothing), String, String>(map<Array(Nothing), Array(Nothing)>(array<>(), array<>()), "a", "b")
optimized expr : {} :: Map(Nothing)
output type    : Map(Nothing)
output domain  : {}
output         : {}

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants