New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuzzy Matching Functions - Jaro Winkler Similarity and Levenshtein Distance #2839
base: master
Are you sure you want to change the base?
Conversation
...nPlan-dependencies/src/main/java/org/finos/legend/engine/plan/dependencies/util/Library.java
Outdated
Show resolved
Hide resolved
...gine-pure-code-compiled-core/src/main/resources/core/pure/router/routing/router_routing.pure
Show resolved
Hide resolved
...rces/core_java_platform_binding/legendJavaPlatformBinding/planConventions/stringLibrary.pure
Outdated
Show resolved
Hide resolved
...main/resources/core_relational_snowflake/relational/sqlQueryToString/snowflakeExtension.pure
Outdated
Show resolved
Hide resolved
...in/resources/core_relational/relational/sqlQueryToString/testSuite/dynaFunctions/string.pure
Outdated
Show resolved
Hide resolved
fc2(hash_String_1__HashType_1__String_1_, {ctx,text,hashType | $library->j_invoke('hash', [$text, $hashType], javaString())}) | ||
|
||
fc2(hash_String_1__HashType_1__String_1_, {ctx,text,hashType | $library->j_invoke('hash', [$text, $hashType], javaString())}), | ||
fc2(jaroWinklerSimilarity_String_1__String_1__Float_1_, {ctx,str1,str2 | $library->j_invoke('jaroWinklerSimilarity',[$str1, $str2], javaFloat())}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't return type javaDouble? As the Library method is this -
public static double jaroWinklerSimilarity(String str1, String str2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -42,6 +42,7 @@ function <<access.private>> meta::relational::functions::sqlQueryToString::duckD | |||
{ | |||
let allStates = allGenerationStates(); | |||
[ | |||
dynaFnToSql('jaroWinklerSimilarity', $allStates, ^ToSql(format='JARO_WINKLER_SIMILARITY(%s, %s)')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing edit distance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, added
@@ -158,9 +158,11 @@ function <<access.private>> meta::relational::functions::sqlQueryToString::snowf | |||
dynaFnToSql('hour', $allStates, ^ToSql(format='date_part(\'hour\', %s)')), | |||
dynaFnToSql('indexOf', $allStates, ^ToSql(format='CHARINDEX(%s)', transform={p:String[2] | $p->at(1) + ', ' + $p->at(0)})), | |||
dynaFnToSql('isAlphaNumeric', $allStates, ^ToSql(format=regexpPattern('%s'), transform={p:String[1]|$p->transformAlphaNumericParamsDefault()})), | |||
dynaFnToSql('jaroWinklerSimilarity', $allStates, ^ToSql(format='ROUND(JAROWINKLER_SIMILARITY(%s, %s)/100, 2)')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why round(x, 2)? Anyways it will be two digits only as does number b/n 0-100/100. Also, should we keep these in lowercase like most other functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added for precision to output. But we can work without it as well, removed. Yes, lowercase would be more semantic, done.
{ | ||
if (string1 == ValueNull.INSTANCE || string2 == ValueNull.INSTANCE) | ||
{ | ||
throw new IllegalArgumentException("Null values not supported in legend_h2_extension_edit_distance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should return null rather than failing. Same goes for other method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
What type of PR is this?
Improvement
What does this PR do / why is it needed ?
Support for Jaro Winkler Similarity and Levenshtein Distance in Snowflake and DuckDb Runtime
Which issue(s) this PR fixes:
Fixes #
Other notes for reviewers:
Depends on finos/legend-pure#805
Does this PR introduce a user-facing change?
Yes, new functions