Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HACKDAYS] ES_PREDICATES and ES_STREET_PREDICATES pipelines #88

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

kwiersema
Copy link
Contributor

@kwiersema kwiersema commented Jan 31, 2024

Context

Hackdays POC for making full address validation logic run in the predicate pipeline

Approach

  • Create a new temporary es_predicates and es_street_predicates pipelines for A/B testing
  • Create the full address predicates found, zip_known, city_known, province_known, street_known
  • Add the new predicates to the es_predicates and es_street_predicates pipelines
  • Update validator pipeline executor flow with full address specific logic

Testing

With ES strategy:

image

With ES predicates strategy:

image

🐞 Bug fix 🐞

ES strategy has no concerns on invalid zip if a candidate is found:
image

ES_PREDICATES strategy will still run the zip valid checks as a final sanity check even if a candidate is found and determined to be "good"

image

🎩 Benchmarking results on dev_tagged suite for US 🎩

+----------------------------------------------------------+---------+
| Field                                                    | Percent |
+----------------------------------------------------------+---------+
| % of confirmed validated addresses                       | 99.4%   |
| % of validated addresses that we INCORRECTLY flagged     | 0.6%    |
| % of known invalid addresses (with CORRECT suggestion)   | 81.0%   |
| % of known invalid addresses (with INCORRECT suggestion) | 7.1%    |
| % of known invalid addresses (with no suggestion)        | 11.9%   |
+----------------------------------------------------------+---------+

vs current baseline performance of

+----------------------------------------------------------+---------+
| Field                                                    | Percent |
+----------------------------------------------------------+---------+
| % of confirmed validated addresses                       | 99.4%   |
| % of validated addresses that we INCORRECTLY flagged     | 0.6%    |
| % of known invalid addresses (with CORRECT suggestion)   | 79.8%   |
| % of known invalid addresses (with INCORRECT suggestion) | 7.1%    |
| % of known invalid addresses (with no suggestion)        | 13.1%   |
+----------------------------------------------------------+---------+

Checklist

  • I have added a CHANGELOG entry for this change (or determined that it isn't needed)
  • Added Sorbet signatures to new methods I've introduced
  • Commits squashed

@@ -117,17 +117,45 @@ def execute_pipeline
local_concerns[config.field] = [] if local_concerns[config.field].nil?
next if local_concerns[config.field].present?

next if config.field == :address && concerns_preclude_validation(local_concerns.values.flatten)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not execute any full address predicate if we already have local concerns.


sig { returns(T.nilable(AtlasEngine::AddressValidation::Validators::FullAddress::AddressComparison)) }
def address_comparison
@address_comparison ||= AddressValidation::Es::CandidateSelector.new(
Copy link
Contributor Author

@kwiersema kwiersema Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the simple non multi-locale call for now. Supporting multi-locale would be a small refactor to move the logic from the best_candidate method in FullAddress to another class and call that here.

@address_comparison ||= AddressValidation::Es::CandidateSelector.new(
datastore: AtlasEngine::AddressValidation::Es::Datastore.new(address: @address),
address: @address,
).best_candidate&.address_comparison # TODO: get rid of the tuple, it's not needed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need the CandidateTuple anymore since the address comparison is now initialized with the candidate.

- field: address
class: AtlasEngine::AddressValidation::Validators::Predicates::FullAddress::Found
- field: zip
class: AtlasEngine::AddressValidation::Validators::Predicates::Zip::ValidForProvince
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the zip checks at the end of the pipeline to still return zip concerns if a candidate is not found.

@kwiersema kwiersema changed the title [WIP] add temp es_predicates pipeline + full address found predicate [HACKDAYS] ES_PREDICATES and ES_STREET_PREDICATES pipelines Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant