More Structured Errors? #65

elsiehupp · 2021-09-06T23:29:22Z

Problem Scope

I am contributing to a project that uses python-email-validator, and a difficulty I'm running into is triggering custom (and possibly localized) error messages in my own (downstream) code based on an input email address's failure mode.

Right now, the more detailed contextual information for a validation failure is only available as part of the message string of the error raised. In order to define my own error messages based on the failure mode, I would need to parse the message string, which is brittle and messy.

Possible Solutions

The solution I tried here is to create a large number of sub-classes of python-email-validator's existing custom error classes. This does the trick, I guess, though I'm uneasy about the resulting number of error classes being somewhat excessive.

An alternate solution could involve fleshing out python-email-validator's existing custom error classes with named parameters and the like. The reason I'm hesitant to take this route is that I'm new enough to Python that I'm concerned the complexity involved might be difficult to target to Python 2.7. Granted, this is almost certainly the superior approach.

Miscellaneous Comments

Wherever I raised the new error subclasses, they remained child classes of the error class I was replacing. This way, under the possible scenario that someone was already testing for error classes, the new subclasses would not break their code.
I pulled the error subclasses out into their own submodule so that it's easier to find them and read their descriptions. This may prove unnecessary with an alternative approach, and a lot of the new information could be extracted into the documentation.
I added the new error subclasses to the test script, and (at least with Python 3) testing for errors by class works as well as testing errors by message string.
I imported the new error subclasses to the test script individually, in a big, long list, rather than with a *, in part in order to make it easier to visualize which cases from validate_email() are covered by the tests. Considering there's roughly one error subclass per failure case, it's clear that the existing tests have pretty thin coverage of the existing cases.
The way I implemented the tests duplicates a whole bunch of the existing test code. I could probably reimplement the tests so as to avoid this code duplication, but I wanted to leave the existing tests initially unchanged in order to demonstrate that the new subclasses don't break the existing tests.

Anyway, what are your thoughts on resolving this problem scope? Does this initial approach seem like the better direction, or should I try the alternate approach instead? And do you have any other feedback I can work with in continuing to polish this pull request? Thanks!

elsiehupp · 2021-09-06T23:31:33Z

Spoiler alert: this is related to mail-in-a-box/mailinabox#2028, as I see you're the maintainer over there, too.

Signed-off-by: Elsie Hupp <9206310+elsiehupp@users.noreply.github.com>

elsiehupp · 2021-09-06T23:40:28Z

Okay, it looks like my test code is broken. (I probably wasn't running the test script correctly on my computer.)

Any idea how to fix this?

>  assert isinstance(exc_info, error_class)
E  AssertionError: assert False
E  +  where False = isinstance(<ExceptionInfo EmailDomainStartsWithPeriodError tblen=3>, <class 'email_validator.error_classes.EmailDomainStartsWithPeriodError'>)

JoshData · 2021-09-11T13:13:17Z

Hi. Thanks. Yep, I started this project in part to support Mail-in-a-Box.

This is probably overly complicated if this is just about localization. Typically, localization is handled by mapping strings or ID codes for error messages to tables of localized strings. In that case structure isn't needed, except for error messages that have variable substitutions within them. Rather, the error message strings are either considered stable keys for lookup into a localization table, or the strings are replaced with keys and we build-in a localization table.

How far can you get by just copying the hard-coded error message strings (e.g. "An email address cannot have a period immediately after the @-sign.") into your application to check which error the exception is?

elsiehupp · 2021-09-11T18:02:31Z

I would say the issue with depending on strings is that parsing code is difficult and messy, and it’s cleaner to pass information in closer to the original format if necessary. So, for example, downstream code could be something like:

try:
    response = do_check(thing)
catch ErrorTypeA:
    response = error_message_a(thing)
catch ErrorTypeB:
    response = error_message_b(thing)
catch ErrorTypeC:
    response = error_message_c(thing)
catch ErrorTypeD:
    response = error_message_d(thing)
catch ErrorTypeE:
    response = error_message_e(thing)
...
return response

The alternate approach would be something like:

try:
    response = do_check(thing)
catch ErrorTypeA as err:
    if err.param1:
	     response = error_message_a1(thing)
	if err.param2:
	     response = error_message_a2(thing)
	if err.param3:
	     response = error_message_a3(thing)
	if err.param4:
	     response = error_message_a4(thing)
	finally:
	     response = error_message_a_other(thing)
catch ErrorTypeB as err:
    if err.param1:
	     response = error_message_b1(thing)
	if err.param2:
	     response = error_message_b2(thing)
	if err.param3:
	     response = error_message_b3(thing)
	if err.param4:
	     response = error_message_b4(thing)
	finally:
	     response = error_message_b_other(thing)
...
return response

Versus the current approach:

try:
    response = do_check(thing)
catch ErrorTypeA as err:
    # lots of complicated string parsing
catch ErrorTypeB as err:
    # lots of complicated string parsing
...
return response

Does this distinction make sense? Basically, adding a little bit of complexity in this library dramatically simplifies enumerating the errors in a downstream application. Either way, it can be done without breaking existing code that depends on string parsing. And it would even be possible to implement both the first two options simultaneously just to give client applications more flexibility.

Anyway, my thought is that I could start by improving the test coverage, and then I could go from there.

elsiehupp · 2021-09-11T18:08:07Z

FWIW the example code in my comment just now didn’t account for the fact that a lot of the existing error messages are constructed from variables, so a function call like error_message_a1(err) might actually do something with other parameters from err, or a client application could subclass ErrorTypeA and give it a custom __str__() function.

JoshData · 2021-09-12T21:23:19Z

I understand completely. But creating a zillion error classes is not the way localization is typically implemented.

Checking if a string exactly matches ""There must be something before the @-sign." is not complicated string parsing. Hence my question, how far can you get by just copying the hard-coded error message strings into your application? I know that there are some messages that have some dynamic content, so what my question means is, how many have dynamic content? How can we separate the dynamic content from the static content of those messages to make them easier to work with?

elsiehupp · 2021-09-12T22:59:26Z

I understand completely. But creating a zillion error classes is not the way localization is typically implemented.

Yeah, I had that feeling, too. But regardless I think my next step here is going to be fleshing out the unit tests for the existing code. (The existing unit tests don’t cover all of the checks.) I know that Mail-in-a-Box (which I would like to contribute to as well) specifically mentions building out test coverage as a welcome contribution, and building out test coverage is something I need more practice with anyway.

elsiehupp · 2021-09-12T23:00:57Z

(I’m marking this as a draft just to emphasize that it isn’t done, and that the code in this pull request is not my immediate priority.)

More Structured Errors?

7799902

Signed-off-by: Elsie Hupp <9206310+elsiehupp@users.noreply.github.com>

elsiehupp force-pushed the more-structured-errors branch from a2bb69d to 7799902 Compare September 6, 2021 23:35

elsiehupp marked this pull request as draft September 12, 2021 22:59

JoshData force-pushed the main branch 5 times, most recently from ffe1669 to 95deaf8 Compare May 1, 2022 22:18

JoshData force-pushed the main branch 3 times, most recently from 1c0390b to d6a5d4b Compare March 1, 2023 23:31

JoshData force-pushed the main branch 3 times, most recently from 5c9973d to f18da74 Compare April 16, 2023 01:19

JoshData force-pushed the main branch 2 times, most recently from 0df8b7a to dbf4618 Compare September 26, 2023 11:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More Structured Errors? #65

More Structured Errors? #65

elsiehupp commented Sep 6, 2021

elsiehupp commented Sep 6, 2021

elsiehupp commented Sep 6, 2021

JoshData commented Sep 11, 2021

elsiehupp commented Sep 11, 2021

elsiehupp commented Sep 11, 2021

JoshData commented Sep 12, 2021

elsiehupp commented Sep 12, 2021

elsiehupp commented Sep 12, 2021

More Structured Errors? #65

Are you sure you want to change the base?

More Structured Errors? #65

Conversation

elsiehupp commented Sep 6, 2021

Problem Scope

Possible Solutions

Miscellaneous Comments

elsiehupp commented Sep 6, 2021

elsiehupp commented Sep 6, 2021

JoshData commented Sep 11, 2021

elsiehupp commented Sep 11, 2021

elsiehupp commented Sep 11, 2021

JoshData commented Sep 12, 2021

elsiehupp commented Sep 12, 2021

elsiehupp commented Sep 12, 2021