Validations with Callback or Iterator #497

doc987 · 2018-02-24T23:49:52Z

I'd like to be able to do an action before and after each field validation, and then have the option to redo the validation after the action. This would help to accept a JSON document that has some invalid entries, but could still be accepted with some modification. Example use cases:

Invalid Format: Fix date field formats prior to validating. For example, if format = date-time, then reformat the date, and then re-validate.
Invalid Value: Remove invalid values after failing validation. For example, if a value fails a test, remove it from the input, and then re-validate.

This could be achieved by either of the following:

Callback: Allow a callback function before each validation and a callback function after each validation. Both callbacks would need access to the relevant input data and corresponding piece of the JSON schema. The post validation callback would need access to the result of the validation, and be able to optionally call the validation again after possibly making changes to the input data.
Iterator: Have an iterator (e.g. https://php.net/manual/en/class.iterator.php) where the validations could be applied one field at a time (and optionally be reapplied after possibly making changes to the input data). The user of the iterator would need access to the same information as the callback functions.

erayd · 2018-02-25T08:34:22Z

@doc987 I don't have the time to implement this; I already have too much stuff on my todo list, and this feature is complex. However, you are welcome to implement it yourself and submit a pull request.

Due to the way JSON schema works, and some of the changes coming up in draft-08, this would be best done as a callback, rather than an iterator. I'd be happy with a callback feature, but would object strongly to an iterator-based implementation.

doc987 · 2018-02-25T13:25:17Z

I'm not familiar with the code for this project. Would you be able to provide any guidance on where a callback implementation should go? Any pointers or suggestions you have could be helpful.

Out of curiosity, what drawbacks do you envision for an iterator?

erayd · 2018-02-25T13:43:13Z

I'm not familiar with the code for this project.

It's a mess. Lots of legacy garbage. It's been cleaned up somewhat over the last year or so, but there's still a long way to go.

Would you be able to provide any guidance on where a callback implementation should go? Any pointers or suggestions you have could be helpful.

The most logical place is probably in the UndefinedConstraint class.
Bear in mind that you'll need some way of dealing with schema branches which are allowed to fail (e.g. anyOf), or even supposed to fail (oneOf, not, enum).
You'll also need to update the error list.
Your PR will need to provide full test coverage for any code you are adding. PRs without proper unit testing will be rejected.

Out of curiosity, what drawbacks do you envision for an iterator?

Iteration is a preprocessing model, which isn't possible with JSON schema. Attempting to pre-process $ref can result in infinite recursion, which violates the spec - validators are not allowed to infinitely recurse, even if the schema definition is infinitely recursive. As such, this feature would need to be implemented in a lazy / on-demand fashion, which realistically means it needs to be done using a callback model.

shmax · 2018-02-25T18:34:40Z

Also: JSON Schema uses an injection model that should allow you to subclass as many of the constraints as you need and then you can simply override any point in the code that seems appropriate for what you want to do. The upside of such an approach is that you don't have to sell the idea to us or get put through the unit test wringer. The downside is that you will be circumventing the API contract and will be on your own for making sure your code doesn't break if internal changes ever happen.

For example:

use JsonSchema\Constraints\NiumberConstraint;

class MyNumberConstrant extends NumberConstraint {
   public function check($element, $schema = null, JsonPointer $path = null, $i = null)
    {
        $this->_preValidate($element, $schema); // your function
        parent::check($element, $schema, $path, $i);
        $this->_postValidate($element, $schema); // your other function
    }
}

$factory = new \JsonSchema\Constraints\Factory();
$factory->setConstraintClass( "number", "MyNumberConstraint");

$validator = new \JsonSchema\Validator($factory);

doc987 · 2018-02-27T02:42:59Z

What do you think about changing the $value argument to a reference on some of the "check" functions in https://github.com/justinrainbow/json-schema/blob/master/src/JsonSchema/Constraints/Constraint.php?

For example, consider the function checkString() shown below.

    protected function checkString($value, $schema = null, JsonPointer $path = null, $i = null)
    {
        $validator = $this->factory->createInstanceFor('string');
        $validator->check($value, $schema, $path, $i);

        $this->addErrors($validator->getErrors());
    }

Suppose the $value argument is changed to &$value. A pre-validation callback could be called immediately before check(), and a post-validation callback could be called immediately after check(). Both callbacks could be passed the original data ($value) as a reference, and so would be able to modify it. Both callbacks could also be passed the relevant schema piece ($schema). Maybe pass $path and $i too? The post-validation callback could also be passed $validator so the errors (if any) could be found.

The change of $value to a reference would maybe apply to:

checkString()
checkNumber()
checkEnum()
checkFormat()

Callbacks could be added to:

checkArray()
checkObject()
checkType()
checkUndefined()
checkString()
checkNumber()
checkEnum()
checkFormat()

Side Note: Many of the functions in that file have a similar format. Do you think it makes sense to make a more generic function (example below)?

    protected function checkType($type, $value, $schema = null, JsonPointer $path = null, $more = [])
    {
        $validator = $this->factory->createInstanceFor($type);
        $validator->check($value, $schema, $path, $more);
        $this->addErrors($validator->getErrors());
    }

erayd · 2018-02-27T04:42:32Z

What do you think about changing the $value argument to a reference on some of the "check" functions...

I'd be OK with that, provided there's a legitimate reason for doing so - some of them already do this for the purpose of setting defaults. However...

...why do you want to change them? You've proposed adding callbacks in many places (rather than just one), and changing $value to a reference in several places - but I don't understand why you've done it this way.

In almost all cases, all values pass through UndefinedConstraint::check() anyway, and are assigned a more specific validator after the non-type-specific checks are run.
In the few cases that don't work that way, they either don't call those checkXXXX() methods anyway (e.g. item iteration in CollectionConstraint, which calls the check() method on specific validator classes), or
They are subordinate to another validator that would already have gone through UndefinedConstraint::check() earlier in the process (e.g. chained calls to FormatValidator).

UndefinedConstraint::check() is probably the most logical home for the kind of callback behavior you're wanting to add. Is there a reason you're wanting to put it elsewhere?

doc987 · 2018-02-27T05:21:51Z

I was thinking that specific checks (e.g. string) didn't go through UndefinedConstraint::check(), but upon closer examination, it looks like that does happen. In that case, your suggestion (update https://github.com/justinrainbow/json-schema/blob/master/src/JsonSchema/Constraints/UndefinedConstraint.php) sounds better. Perhaps a pre-validation callback would go immediately before validateCommonProperties() and a post-validation callback would go immediately after validateTypes().

At that stage the $validator with the validation results is no longer available. Maybe $this->getErrors() would be passed to the post-validation callback instead?

erayd · 2018-02-27T05:51:47Z

That sounds good.

How do you intend to handle cases where failure is actually the expected outcome (i.e. failure does not mean that the value is invalid)?

doc987 · 2018-02-27T13:28:45Z

I'm not sure that the expected outcome is relevant for this purpose. The callback functions would be called in all cases, so those functions get to choose what to do (if anything). Am I missing something?

erayd · 2018-02-27T18:40:51Z

Am I missing something?

Well, you mentioned that the primary purpose of these callbacks was to modify values that failed validation, in order to make them pass validation on a second attempt. If something is supposed to fail, then the callback would need to know that, in order to avoid modifying a value that should not be modified.

If you don't pass expected-failure information to the callback, then how will you know whether or not to modify the value?

doc987 · 2018-02-28T00:02:02Z

I was thinking that $validator (originally) or $this->getErrors() (more recently) would be passed to the callback, and provide the necessary information. In other words, if the test result information (including whether or not the value tested is valid) could be passed to the post-validation callback, then that callback could decide what to do based on the information received. Are you saying that information is not available, or that something different needs to be passed? Ideally the test would provide enough return information to make a validity assessment.

erayd · 2018-02-28T00:18:17Z

Are you saying that information is not available, or that something different needs to be passed?

I'm saying that, as things stand, you cannot tell the difference between the following:

A branch of the schema that is supposed to fail when the document is valid; and
A branch of the schema that is not supposed to fail, and will cause the document to be deemed invalid.

With the way validation works currently, they look exactly the same - in both cases, $this->getErrors() will contain errors, and the current schema branch will say that validation has failed. The only difference is that in cases where the failure is deliberate, the errors get thrown away afterwards.

For example, take the following schema:

{
    "not": {
        "type": "string"
    },
    "maximum": "10"
}

If your document is 5, and you validate it against that schema, it will (correctly) pass validation, because it's not a string, and is less than 10. However, there is still an error triggered inside the schema for #/not/type - that rule checks to see if the value is a string, and 5 is not a string, and therefore does not pass validation against that specific rule. However, because not means that a failure is deemed to be valid, the error gets discarded during the evaluation of #/not.

In order to implement your feature, you will need some way for the callback to tell the difference between an error in #/maximum (which will cause validation to fail), and an error in #/not/type (which will cause validation to succeed). It's this part of it which makes your feature complex to implement.

doc987 · 2018-02-28T00:52:52Z

Is it possible to "wait" until all validations have been run for a field? In your example, that would mean that the post-validation callback isn't called until the not string and maximum tests have been run, at which point the callback would be informed that the field is valid (no errors).

Alternatively, is it possible to wait until the full not string validation is complete before calling the post-validation callback?

erayd · 2018-02-28T01:08:52Z

There is not currently any mechanism in the library to do this - you would need to implement it yourself as part of the PR.

dafeder · 2024-05-14T15:54:29Z

I would humbly suggest closing this. It seems outside the scope of what this library is trying to accomplish, and would significantly increase the complexity of the business logic, an order of magnitude more edge cases that would need to be covered in tests, etc. I think the more important thing is to have granular and complete error reporting so that the application using the validator can react, re-validate as needed, etc.

DannyvdSluijs · 2024-05-14T18:33:45Z

Although I've mentioned this issue in #716 in the list of Issues that need a more in depth triage. But considering this once again and seeing the thoughts of @dafeder I think that it is indeed best to consider this out of scope for this library indeed.
And put the focus and efforts into the code of what json-schema embodies.

@doc987 you probably can still make this into a separate project which uses this library in its core for the validation.

bighappyface added the PRs Welcome label Feb 27, 2018

DannyvdSluijs mentioned this issue Mar 8, 2024

Triage on Issues & Pull requests #716

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validations with Callback or Iterator #497

Validations with Callback or Iterator #497

doc987 commented Feb 24, 2018 •

edited

erayd commented Feb 25, 2018

doc987 commented Feb 25, 2018

erayd commented Feb 25, 2018

shmax commented Feb 25, 2018 •

edited

doc987 commented Feb 27, 2018 •

edited

erayd commented Feb 27, 2018

doc987 commented Feb 27, 2018

erayd commented Feb 27, 2018

doc987 commented Feb 27, 2018

erayd commented Feb 27, 2018

doc987 commented Feb 28, 2018

erayd commented Feb 28, 2018 •

edited

doc987 commented Feb 28, 2018

erayd commented Feb 28, 2018

dafeder commented May 14, 2024

DannyvdSluijs commented May 14, 2024

Validations with Callback or Iterator #497

Validations with Callback or Iterator #497

Comments

doc987 commented Feb 24, 2018 • edited

erayd commented Feb 25, 2018

doc987 commented Feb 25, 2018

erayd commented Feb 25, 2018

shmax commented Feb 25, 2018 • edited

doc987 commented Feb 27, 2018 • edited

erayd commented Feb 27, 2018

doc987 commented Feb 27, 2018

erayd commented Feb 27, 2018

doc987 commented Feb 27, 2018

erayd commented Feb 27, 2018

doc987 commented Feb 28, 2018

erayd commented Feb 28, 2018 • edited

doc987 commented Feb 28, 2018

erayd commented Feb 28, 2018

dafeder commented May 14, 2024

DannyvdSluijs commented May 14, 2024

doc987 commented Feb 24, 2018 •

edited

shmax commented Feb 25, 2018 •

edited

doc987 commented Feb 27, 2018 •

edited

erayd commented Feb 28, 2018 •

edited