More standard support for date/time validation #30

Open
bodiam opened this issue Apr 15, 2020 · 9 comments

@bodiam

bodiam commented Apr 15, 2020

First of all, thanks for your amazing efforts on JSON Schemas, it's a wonderful schema, and it works really well for us.

There's only one issue which we encounter on a regular basis, which is date validation. I've been scouring the documentation and the issues, but I haven't seen a great solution so far.

What we'd like to do is validate dates. Unfortunately, the formats of these dates are beyond our control and don't follow any ISO standard. Often they are delivered in formats such as "YYYYMMDD", "DD-MM-YYYY", or "DDMMYYYYHHMMSS". I guess you get the idea.

Right now, we use regular expressions to validate these dates, or, depending on the validator, a custom format (not all validators support this, however, and it requires extra implementation effort). Neither is ideal: regular expressions struggle with rules such as leap years, so getting this right is quite hard. So I was hoping that the current date/date-time formats could be made a bit more flexible, and that something like a pattern could be used?

We had a few things in mind, such as:

"processDate": {
    "type": "string",
    "format": "date-time",
    "pattern": "dd-MM-yyyy"
}

Pro: It's quite backwards compatible
Con: it's using the pattern regex for something else now. But maybe in combination with format, this might be okay?

or

"processDate": {
    "type": "string",
    "format": "dd-MM-yyyy"
}

Pro: We believe formats should be used for something like this
Con: It's using a custom format, not really in line with the current user defined formats

or

"processDate": {
    "type": "string",
    "format": "dd-MM-yyyy",
    "pattern": "[0-9]{2}-[0-9]{2}-[1-2][0-9]{3}"
}

Pro: We believe formats should be used for something like this, plus there's a fallback pattern for those validators not understanding the format
Con: It might be slightly unclear which one "wins", especially when they conflict.

or:

"processDate": {
    "type": "date",
    "format": "dd-MM-yyyy"
}

Pro: it's clear that these are dates, in a specific format
Con: it's a whole new datatype in JSON schema, which might be more complex to implement.

I understand none of these are ideal, but validating a date against a format like this should be doable in most programming languages.
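As a rough illustration of that point, the formats from the opening post can be checked in Python with `datetime.strptime`, which rejects impossible dates such as 30 February. The mapping from the post's format names to `strptime` directives is an assumption made for this sketch; nothing like it exists in JSON Schema.

```python
from datetime import datetime

# Hypothetical mapping from the format names in this issue to strptime directives.
FORMATS = {
    "YYYYMMDD": "%Y%m%d",
    "DD-MM-YYYY": "%d-%m-%Y",
    "DDMMYYYYHHMMSS": "%d%m%Y%H%M%S",
}


def is_valid_date(value: str, fmt_name: str) -> bool:
    """Return True if value parses as a real calendar date in the named format."""
    try:
        datetime.strptime(value, FORMATS[fmt_name])
    except ValueError:
        # Raised both for malformed input and for impossible dates.
        return False
    return True


print(is_valid_date("20200229", "YYYYMMDD"))    # True: 2020 is a leap year
print(is_valid_date("20190229", "YYYYMMDD"))    # False: 2019 is not
print(is_valid_date("30-02-2020", "DD-MM-YYYY"))  # False
```

Note that `strptime` is lenient in other ways (for example, `%d` also accepts a single-digit day), so pairing it with a strict pattern may still be worthwhile.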

Therefore, I was wondering what your thoughts are on this, and whether you have alternatives to the above suggestions.

@bodiam bodiam changed the title from "Better support for date formats" to "More standard support for date/time validation" on Apr 15, 2020
@handrews
Contributor

@bodiam a regular expression is going to be far more reliable than format, which has always been inconsistently implemented and is now not treated as a validation assertion by default. It is simply an annotation: a bit of information that tells the application that it might want to do some additional validation of its own. Technically you can still ask an implementation to validate it, but that's unlikely to work well and need not be supported by implementations (because, as a practical matter, people tended not to implement it no matter what the spec said).

format is an awful mess and we're hoping that people will use the new vocabularies feature to create suites of purpose-specific keywords (like a set of flexible and comprehensive date-time keywords).

While this is not the official position of the project, it is my personal hope that we can eventually drop format because it's a huge source of confusion and errors. date-time is actually one of the few formats that tends to work reliably. This is another reason format is bad: it takes a simple problem (date and time validation), staples it to a bunch of complicated problems, and leads people to not support the keyword at all, even the easy parts.

Honestly, I'd just do this (using 2019-09 syntax, replace $defs with definitions for draft-07 and earlier):

"properties": {
    "someDate": {"$ref": "#/$defs/customDateTime"}
},
"$defs": {
    "customDateTime": {
        "type": "string",
        "pattern": "the-actual-regex-for-your-format"
    }
}
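One detail worth keeping in mind with this approach: JSON Schema's `pattern` is not implicitly anchored, so a regex without `^` and `$` also accepts strings that merely contain a date. The Python sketch below uses `re.search` to approximate that "match anywhere" behavior; the DD-MM-YYYY regex is only an example stand-in for "the-actual-regex-for-your-format".

```python
import re

# Example stand-in regexes for a DD-MM-YYYY format.
UNANCHORED = r"[0-9]{2}-[0-9]{2}-[0-9]{4}"
ANCHORED = r"^[0-9]{2}-[0-9]{2}-[0-9]{4}$"


def pattern_matches(pattern: str, value: str) -> bool:
    # Like a "pattern" check, re.search looks for a match anywhere in the string.
    return re.search(pattern, value) is not None


print(pattern_matches(UNANCHORED, "id-15-04-2020-x"))  # True: a date appears inside
print(pattern_matches(ANCHORED, "id-15-04-2020-x"))    # False
print(pattern_matches(ANCHORED, "30-02-2020"))         # True: only the shape is checked
```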

We definitely can't overload pattern with anything other than regexes; regexes are complicated enough as it is. This is another reason that 3rd-party vocabularies are a better idea. Someone could design a keyword that is an efficient date templating system (based on strftime or whatever).

@bodiam
Author

bodiam commented Apr 26, 2020

Hi @handrews ,

Thanks for your reply. I appreciate the feedback, but using a regular expression for date validation seems like a less than ideal solution. For example, even a rough regular expression for dates looks like this:

((0[13578]|1[02])[\/.](0[1-9]|[12][0-9]|3[01])[\/.](18|19|20)[0-9]{2})|((0[469]|11)[\/.](0[1-9]|[12][0-9]|30)[\/.](18|19|20)[0-9]{2})|((02)[\/.](0[1-9]|1[0-9]|2[0-8])[\/.](18|19|20)[0-9]{2})|((02)[\/.]29[\/.](((18|19|20)(04|08|[2468][048]|[13579][26]))|2000))

I got it from here: https://stackoverflow.com/questions/8647893/regular-expression-leap-years-and-more, and it handles things like leap years. As you'll probably agree, this is a terribly complex regular expression. A simpler regex, such as [0-9]{4}-[0-9]{2}-[0-9]{2}, will work for most dates, but it makes JSON Schema a poor choice for input validation, since it won't flag 2020-02-30 as an invalid date. And validation that checks only a subset of the rules is hardly validation at all.
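To make the 2020-02-30 example concrete, here is a small Python sketch contrasting the simple regex (anchored here) with a regex-plus-parser check. `shape_ok` and `really_valid` are illustrative names, and `date.fromisoformat` stands in for "a date parsing library".

```python
import re
from datetime import date

SIMPLE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")


def shape_ok(value: str) -> bool:
    """What the simple regex alone can tell us."""
    return SIMPLE.match(value) is not None


def really_valid(value: str) -> bool:
    """Shape check first, then let a date parser apply the calendar rules."""
    if not shape_ok(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


for v in ["2020-02-29", "2020-02-30", "2019-02-29"]:
    print(v, shape_ok(v), really_valid(v))
# 2020-02-29 True True
# 2020-02-30 True False
# 2019-02-29 True False
```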

I hope you'll agree with me here that regexes are a powerful language, but not a golden hammer and not the best tool for date validation.

I did some research, and it seems that all the languages I checked (JavaScript, C#, Python, Java) have some way of validating date formats.

Would you be open to a different solution besides using Regex? What I'd like, and not sure if it's possible, is to have something like this:

"processDate": {
    "type": "string",
    "date-format": "dd-MM-yyyy",
    "pattern": "[0-9]{2}-[0-9]{2}-[1-2][0-9]{3}"
}

Something like this could use the date-format for accurate date formats, but use regular expressions as a fallback for validators which don't implement the date-format attribute. Would that be something which could work?
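A minimal sketch of how a validator might treat this proposal, assuming a hypothetical `date-format` keyword and a made-up token-to-`strptime` translation. Because JSON Schema keywords are evaluated independently and all must pass, a validator that understands `date-format` would still check `pattern`; the regex is anchored here so it rejects surrounding text.

```python
import re
from datetime import datetime

# Hypothetical translation of "date-format" tokens into strptime directives.
TOKENS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"), ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]


def to_strptime(date_format: str) -> str:
    for token, directive in TOKENS:
        date_format = date_format.replace(token, directive)
    return date_format


def validate(schema: dict, value: str, supports_date_format: bool) -> bool:
    # "pattern" is a standard keyword, so every validator checks it.
    pattern = schema.get("pattern")
    if pattern is not None and re.search(pattern, value) is None:
        return False
    # Only validators implementing the hypothetical keyword add this check;
    # others ignore the unknown keyword and rely on the pattern alone.
    if supports_date_format and "date-format" in schema:
        try:
            datetime.strptime(value, to_strptime(schema["date-format"]))
        except ValueError:
            return False
    return True


SCHEMA = {
    "type": "string",
    "date-format": "dd-MM-yyyy",
    "pattern": "^[0-9]{2}-[0-9]{2}-[1-2][0-9]{3}$",
}

print(validate(SCHEMA, "30-02-2020", supports_date_format=False))  # True
print(validate(SCHEMA, "30-02-2020", supports_date_format=True))   # False
```

Under these semantics neither keyword "wins": the stricter validator simply reports more errors.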

@handrews
Contributor

@bodiam We added extensible, re-usable vocabulary support because there are endless requests for new keywords, often for odd cases (like non-standard, or less-common standard, date formats). There is no way we will ever produce a final standard with all of these requests, and we've long since hit the point of diminishing returns in most areas.

We are encouraging folks to write their own keyword specifications and build extensions to handle their keywords. Using $vocabulary in a meta-schema, you can indicate what extensions your schemas require.

If you want to use the existing keywords, you'll need to use a regex. Yes, they are often complicated, that is the nature of regexes. If you want a separate keyword, you'll need to make your own.

Sensible date-time handling is something that we think would be an excellent, and fairly easy, extension vocabulary for someone to produce. But that someone will not be the JSON Schema project, as we have our hands full as it is. The line must be drawn somewhere: half the community wants more keywords and half the community thinks we should have standardized years ago.

I'm going to move this to the vocabularies repo. That doesn't mean the JSON Schema org will work on it- that repo is a holding area for vocabulary proposals so people interested in writing extensions will see what other ideas have been floated so far, and maybe collaborate on them.

@handrews handrews transferred this issue from json-schema-org/json-schema-spec Apr 26, 2020
@awwright
Member

> Using a simpler regex, such as [0-9]{4}-[0-9]{2}-[0-9]{2} will work for most dates, but it makes using JSON schemas for input validation a poor choice, since it won't say that 2020-02-30 is an invalid date. And a validation which validates only a subset is hardly validation at all.

There are always multiple passes to validation, and JSON Schema is no exception. That date (2020-02-30) actually follows the ABNF grammar laid out in RFC 3339; it is "well formed". What makes it invalid is that it does not meet additional post-processing requirements. Among other things, these include computing leap years, checking day-of-month limits against a lookup table, and handling leap seconds, which, as a rule, are not known until less than a year in advance.
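The two passes can be sketched in Python for an RFC 3339 `full-date`: first the grammar (4DIGIT "-" 2DIGIT "-" 2DIGIT), then the month range, leap-year rule, and day-of-month lookup table. This is an illustrative sketch, not a complete RFC 3339 validator; times and leap seconds are not covered.

```python
import re

# Pass 1: the full-date grammar, which only fixes the number of digits.
FULL_DATE = re.compile(r"^(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})$")

# Pass 2 data: maximum day of each month in a non-leap year.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def check_full_date(value: str):
    """Return None if valid, otherwise a message saying which check failed."""
    m = FULL_DATE.match(value)
    if m is None:
        return "not well formed"
    year, month, day = (int(m.group(k)) for k in ("year", "month", "day"))
    if not 1 <= month <= 12:
        return "month out of range"
    max_day = DAYS_IN_MONTH[month - 1]
    if month == 2 and is_leap_year(year):
        max_day = 29
    if not 1 <= day <= max_day:
        return "day out of range"
    return None


print(check_full_date("2020-02-29"))  # None
print(check_full_date("2020-02-30"))  # day out of range
print(check_full_date("2020/02/30"))  # not well formed
```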

I think the best way to do this is to break apart validation into the tools that do it best. When you're parsing a JSON document, use a JSON parser. When you want to verify the formatting is correct, you use a linter. When you want to test code, you use a unit test suite. When you want to validate the structure of JSON, you use a JSON Schema validator. When strings inside that document specify a date, you parse it with a date parsing library.

All of these possibly generate errors that need to be relayed to the user. And this way, JSON Schema doesn't have to know a thing about leap seconds and leap years, we can just leave that task to a tool that specializes in it.
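A dependency-free Python sketch of that division of labor: the JSON parser, a structural check, and a date parser each run in turn and report their own errors. The structural step is a hand-written stand-in for a real JSON Schema validator, and `processDate` is just the field name from this thread.

```python
import json
from datetime import date


def validate_document(text: str) -> list:
    """Run each tool in turn and collect its errors for the user."""
    # 1. JSON parser: syntax errors come with line and column information.
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"JSON: {exc.msg} at line {exc.lineno}, column {exc.colno}"]

    # 2. Structural check (stand-in for a JSON Schema validator).
    value = doc.get("processDate") if isinstance(doc, dict) else None
    if not isinstance(value, str):
        return ["schema: processDate must be a string"]

    # 3. Date parsing library: calendar rules live here, not in the schema.
    try:
        date.fromisoformat(value)
    except ValueError as exc:
        return [f"date: {exc}"]
    return []


print(validate_document('{"processDate": "2020-02-29"}'))  # []
print(validate_document('{"processDate": 5}'))  # ['schema: processDate must be a string']
```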


Now, I understand there are other purposes where we still want to declare the format of something, even though it's not related to structural validation. This is a case where it makes sense to use a custom vocabulary keyword, probably in the strftime/strptime format. For example, so that I can look at a schema and know how to construct a date, or how to parse it.
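For example, a single strftime/strptime-style string in the schema could drive both directions. The `x-strftime` keyword below is hypothetical (it is not defined by JSON Schema or any vocabulary I know of), and the helper names are illustrative.

```python
from datetime import datetime

# Hypothetical custom keyword holding a strftime/strptime format string.
SCHEMA = {"type": "string", "x-strftime": "%d-%m-%Y"}


def produce(schema: dict, when: datetime) -> str:
    """Construct a date string the way the schema describes."""
    return when.strftime(schema["x-strftime"])


def consume(schema: dict, text: str) -> datetime:
    """Parse a date string using the same format (raises ValueError if invalid)."""
    return datetime.strptime(text, schema["x-strftime"])


s = produce(SCHEMA, datetime(2020, 4, 15))
print(s)                          # 15-04-2020
print(consume(SCHEMA, s).date())  # 2020-04-15
```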

Also, for good error reporting, it does make sense to have some hook into the parser. I think if you have a streaming+validating JSON parser (like I'm working on), you can register a callback that parses the date at the moment it's encountered during parsing, and if your date parsing library emits an error, maybe the parser can pass that error through, preserving line and position information relative to the entire JSON document.

@MarcGodard

MarcGodard commented Aug 24, 2023

Sorry to bump this old issue, but I also keep running into problems with date-time. My request is related, so I thought I'd add it here.

I use { type: 'string', format: 'date-time' } with the option coerceTypes: true. However, when a Date object is passed, I get a validation error. 1) Why isn't coercion being applied? 2) What would be the best way to write my own?
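One implementation-independent workaround is to turn native date objects into RFC 3339 strings before handing the data to a validator, so that `type: string` sees a string. The Python sketch below shows the idea; `to_json_compatible` is a hypothetical helper, and it refuses naive datetimes because an RFC 3339 date-time needs a UTC offset. It does not describe how Ajv's `coerceTypes` works.

```python
from datetime import datetime, timezone


def to_json_compatible(value):
    """Recursively replace datetime objects with RFC 3339 strings."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime has no UTC offset")
        return value.isoformat()
    if isinstance(value, dict):
        return {k: to_json_compatible(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_json_compatible(v) for v in value]
    return value


data = {"processDate": datetime(2023, 8, 24, 12, 0, tzinfo=timezone.utc), "tags": ["a"]}
print(to_json_compatible(data))
# {'processDate': '2023-08-24T12:00:00+00:00', 'tags': ['a']}
```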

Still a little new to this, and so far love it all but date issues.

@gregsdennis
Member

@MarcGodard I'm not sure what coerceTypes is. It's not anything defined by JSON Schema. Is it something specific to the implementation you're using? If so, maybe you need to contact them.

@MarcGodard

MarcGodard commented Aug 24, 2023

@gregsdennis yes, sorry I should have mentioned that I am using Ajv to coerce the types. But is string and format date-time valid? Or is it just objects?

@gregsdennis
Member

What you put is a valid schema, yes, but format generally isn't validated by default (Ajv has a history of doing its own thing, though, so check with them).
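A toy Python model of that default: `format` is reported as an annotation for the application, and only checked when a hypothetical `assert_format` option is enabled. It covers only `type: string` and a simplified RFC 3339 `date-time` subset (no fractional seconds, uppercase `T` and `Z` only) and is not meant to mirror any particular implementation, Ajv included.

```python
import re
from datetime import datetime

# Simplified RFC 3339 date-time subset.
DATE_TIME = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})$"
)


def is_date_time(value: str) -> bool:
    if DATE_TIME.match(value) is None:
        return False
    try:
        # Older Pythons' fromisoformat does not accept "Z", so spell it out.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def validate(schema: dict, value, assert_format: bool = False):
    """Return (errors, annotations) for a tiny subset of JSON Schema."""
    errors, annotations = [], {}
    if schema.get("type") == "string" and not isinstance(value, str):
        errors.append("expected a string")
        return errors, annotations
    if "format" in schema:
        # By default "format" only produces an annotation.
        annotations["format"] = schema["format"]
        if assert_format and schema["format"] == "date-time" and not is_date_time(value):
            errors.append("not a valid date-time")
    return errors, annotations


schema = {"type": "string", "format": "date-time"}
print(validate(schema, "2020-02-30T10:00:00Z"))                      # ([], {'format': 'date-time'})
print(validate(schema, "2020-02-30T10:00:00Z", assert_format=True))  # (['not a valid date-time'], {'format': 'date-time'})
```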

@MarcGodard

Ok thanks. Will do


5 participants