Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support json_valid and json_keys #8491

Merged
merged 29 commits into from
Dec 19, 2023

Conversation

SeaRise
Copy link
Contributor

@SeaRise SeaRise commented Dec 11, 2023

What problem does this PR solve?

Issue Number: close #8490

Problem Summary:

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 11, 2023
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 11, 2023
@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2023
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 11, 2023
@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 11, 2023
@SeaRise SeaRise changed the title WIP: support json_valid and json_keys support json_valid and json_keys Dec 11, 2023
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2023
@SeaRise
Copy link
Contributor Author

SeaRise commented Dec 11, 2023

/run-all-tests

dbms/src/Functions/FunctionsJson.h Outdated Show resolved Hide resolved
dbms/src/Functions/FunctionsJson.h Outdated Show resolved Hide resolved
dbms/src/Functions/FunctionsJson.h Outdated Show resolved Hide resolved
dbms/src/Functions/FunctionsJson.h Outdated Show resolved Hide resolved
dbms/src/Functions/FunctionsJson.h Outdated Show resolved Hide resolved
dbms/src/Functions/FunctionsJson.h Outdated Show resolved Hide resolved
{
auto cur_offset = tmp_buffer.offset();
JsonBinary::appendStringRef(tmp_buffer, key);
auto after_offset = tmp_buffer.offset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I‘m a little confused that looks like key is already a StringRef/string_view based on the real data in data_from, why here need to deep copy it to tmp_buffer and construct new StringRef based on tmp_buffer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok..
add a new function buildKeyArrayInBuffer for std::vector<std::string_view> keys

ColumnUInt8::Container & vec_null_map = col_null_map->getData();

{
JsonBinary::JsonBinaryWriteBuffer write_buffer(data_to, data_from.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried that it will reserve too many memories for the result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use two loops, first is to extract the keys from orignal json column, then use another loop to do the copy? The intermediate result should not be too large if we keep the key_binaries as StringRef refering the orignal data in data_from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be inconvenient, especially when dealing with nullmap.
Typically, keys are very small, how about automatically resize instead of reserve?
https://github.com/pingcap/tiflash/pull/8491/files#diff-52627c7f14d286918db163406a85acac41a1ad79c22184768379ebe163c1131aR1664-R1665

Copy link
Contributor

@yibin87 yibin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM

throw Exception(
fmt::format("Illegal json path expression of function {}", getName()),
ErrorCodes::ILLEGAL_COLUMN);
auto path_expr_containor = std::make_unique<JsonPathExprRefContainer>(path_expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo here: path_expr_container

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, updated.

throw Exception(
fmt::format("Illegal json path expression of function {}", getName()),
ErrorCodes::ILLEGAL_COLUMN);
auto path_expr_containor = std::make_unique<JsonPathExprRefContainer>(path_expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, path_expr_container

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, updated.

std::vector<JsonPathExprRefContainerPtr> path_expr_containor_vec(1);
path_expr_containor_vec[0] = std::move(path_expr_containor);

const auto & json_val = json_source->getWhole();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If actual extract codes can be extracted as a common function, it would look better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, updated.

@@ -389,12 +391,12 @@ JsonBinary JsonBinary::getArrayElement(size_t index) const
return getValueEntry(HEADER_SIZE + index * VALUE_ENTRY_SIZE);
}

String JsonBinary::getObjectKey(size_t index) const
std::string_view JsonBinary::getObjectKey(size_t index) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since JsonBinary already use StringRef to represent the string reference, and there is no significant difference between StringRef and string_view, why not use StringRef here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, used StringRef instead.

offsets_to,
vec_null_map);
}
else if (path_col->isColumnConst())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If json_col is constant and path_col is not constant, will doExecuteCommon handle it correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SeaRise
Copy link
Contributor Author

SeaRise commented Dec 18, 2023

/run-all-tests

1 similar comment
@SeaRise
Copy link
Contributor Author

SeaRise commented Dec 18, 2023

/run-all-tests

@SeaRise
Copy link
Contributor Author

SeaRise commented Dec 19, 2023

/run-all-tests

Copy link
Contributor

@windtalker windtalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added the lgtm label Dec 19, 2023
Copy link
Contributor

ti-chi-bot bot commented Dec 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: windtalker, yibin87

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

ti-chi-bot bot commented Dec 19, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-12-15 03:02:32.037417746 +0000 UTC m=+584443.074644668: ☑️ agreed by yibin87.
  • 2023-12-19 03:30:53.957174791 +0000 UTC m=+931744.994401719: ☑️ agreed by windtalker.

@ti-chi-bot ti-chi-bot bot merged commit 8a19c86 into pingcap:master Dec 19, 2023
6 checks passed
@SeaRise SeaRise deleted the support_more_json_functions branch December 19, 2023 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note-none size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support json_valid and json_keys
3 participants