Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lookup): exposing vector lookup logic through wasm #12

Closed
wants to merge 2 commits into from

Conversation

dhable
Copy link
Contributor

@dhable dhable commented Jun 14, 2023

chore: bring cargo version inline with repo tags

There were a number of manual tag/releases created in our fork repo to release changes that could be picked up in the vector fork. With Jenkins, we now need to reset the version in the Cargo.toml files to match the tag set so things don't get weird.

Ref: LOG-16869


feat(lookup): exposing vector lookup logic through wasm

Expose two simple methods - one for parsing VRL paths into segments and one for joining segments into a VRL path. This accounts for for fields, indexes and coalesce. The code will also handle quoting and minimizing quoting on field to be more human friendly.

These two functions could be used to perform richer validation on user input and simplify some of the more complex path construction in the pipeline service and/or front end UI.

Ref: LOG-17092, LOG-16869

@dhable dhable force-pushed the dhable/LOG-17092 branch 2 times, most recently from 81e340f to fbba862 Compare June 14, 2023 18:43
Jenkinsfile Outdated Show resolved Hide resolved
@dhable dhable force-pushed the dhable/LOG-17092 branch 8 times, most recently from 0a93545 to b5eb30b Compare June 14, 2023 20:59
@dhable
Copy link
Contributor Author

dhable commented Jun 14, 2023

Ugh, I had to make a version tag v0.3.0 and then reset the version in Cargo.toml files to that because cargo doesn't think a 4 component version number is a valid semver (and it's not according to the RFC). Just more semver rage for Dan.

@dhable dhable requested a review from a team June 14, 2023 21:12
Copy link
Contributor

@c-nixon c-nixon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits, one perf note and a few questions

lib/lookup/Cargo.toml Outdated Show resolved Hide resolved
lib/lookup/Cargo.toml Outdated Show resolved Hide resolved
lib/lookup/Cargo.toml Show resolved Hide resolved
lib/lookup/src/lib.rs Show resolved Hide resolved
lib/lookup/src/mezmo_wasm.rs Outdated Show resolved Hide resolved
lib/lookup/src/mezmo_wasm.rs Show resolved Hide resolved
lib/lookup/src/mezmo_wasm.rs Outdated Show resolved Hide resolved
lib/lookup/src/mezmo_wasm.rs Show resolved Hide resolved
lib/lookup/src/mezmo_wasm.rs Show resolved Hide resolved
None
}

fn into_jsvalue(segment: &Segment) -> JsValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit like this would be better as a method on Segment, or as a From<Segment> for JsValue impl somewhere since we're already touching the segment file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed at https://github.com/answerbook/vrl/pull/12/files#diff-a35fa3fd1a9b06a65e5a0b37ce02e015a90db55d305c79308f2adbd5b8a0a834R39-R69.

I didn't want to leak JsValue into the rest of the code base so instead of implementing the method directly on Segment, I made an implementation of From on JsValue.

@penick
Copy link
Contributor

penick commented Jun 20, 2023

Is the goal to use this in the pipeline-service, etc.? If that the case then this should not live in this repo. It should likely be its own crate that uses the vrl crate.

pub struct Field<'a> {
pub name: &'a str,
// This is a very lazy optimization to avoid having to scan for escapes.
pub requires_quoting: bool,
}

// LOG-17092: Part of trying to validate whether a field segment is within a path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to only set requires_quoting when it contains invalid chars a construction time:

diff --git a/lib/lookup/src/lookup_buf/segmentbuf.rs b/lib/lookup/src/lookup_buf/segmentbuf.rs
index fc803fb36..c2abcb713 100644
--- a/lib/lookup/src/lookup_buf/segmentbuf.rs
+++ b/lib/lookup/src/lookup_buf/segmentbuf.rs
@@ -38,8 +38,9 @@ impl From<String> for FieldBuf {
             // So we have to take a slice and clone it.
             let len = name.len();
             name = name[1..len - 1].to_string();
-            requires_quoting = true;
-        } else if !field::is_valid_fieldname(&name) {
+        }
+
+        if !field::is_valid_fieldname(&name) {
             requires_quoting = true
         }

Note: Same change required in FieldBuf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had started down this route and ended up running into a number of test cases that asserted that parsing and formatting resulted in the same string. For example, parsing top."middle".child and then formatting that segment would result in top.middle.child. Both are semantically the same but we then lose the ability to recreate the input from a parsed representation. From what I can tell of the path syntax, this is fine but the rippling number of changes to test code was concerning and why I opt-ed for localized implementation.

We could add an additional boolean into the struct to denote if the original field was quoted, along with the flag if quoting is required. Then the default formatting could retain the existing behavior while we add a new method to display with reduced quoting. I decided against this to avoid changing things alongside upstream too much. It's worth considering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing a boolean if the original was quoted or not does allow the default to_string() method to maintain the existing behavior while adding a new to_humanized_string() method that does the reduced quoting.

Copy link
Contributor

@c-nixon c-nixon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All makes sense to me!

There were a number of manual tag/releases created in our fork repo to
release changes that could be picked up in the vector fork. With
Jenkins, we now need to reset the version in the Cargo.toml files to
match the tag set so things don't get weird.

Ref: LOG-16869
@dhable dhable force-pushed the dhable/LOG-17092 branch 2 times, most recently from 6efcd6e to f8d72f7 Compare June 21, 2023 15:57
Expose two simple methods - one for parsing VRL paths into segments and
one for joining segments into a VRL path. This accounts for for fields,
indexes and coalesce. The code will also handle quoting and minimizing
quoting on field to be more human friendly.

These two functions could be used to perform richer validation on user
input and simplify some of the more complex path construction in the
pipeline service and/or front end UI.

Ref: LOG-17092, LOG-16869
@dhable dhable closed this Nov 16, 2023
@dhable dhable deleted the dhable/LOG-17092 branch November 16, 2023 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants