REST Basics - Data formats

{MUST} use standard data formats

Open API (based on JSON Schema Validation vocabulary) defines formats from ISO and IETF standards for date/time, integers/numbers and binary data. You must use these formats, whenever applicable:

[cols="1,1,4,2",options="header"]
|===
| OpenAPI type | OpenAPI format | Specification | Example

| integer | int32 | 4 byte signed integer between -2^31^ and 2^31^-1 | 772107100
| integer | int64 | 8 byte signed integer between -2^63^ and 2^63^-1 | 772107100456824
| integer | bigint | arbitrarily large signed integer number | 77210710045682438959
| number | float | binary32 single precision decimal number, see {IEEE-754-2008}[IEEE 754-2008/ISO 60559:2011] | 3.1415927
| number | double | binary64 double precision decimal number, see {IEEE-754-2008}[IEEE 754-2008/ISO 60559:2011] | 3.141592653589793
| number | decimal | arbitrarily precise signed decimal number | 3.141592653589793238462643383279
| string | byte | base64url encoded byte following {RFC-7493}#section-4.4[RFC 7493 Section 4.4] | "VA=="
| string | binary | base64url encoded byte sequence following {RFC-7493}#section-4.4[RFC 7493 Section 4.4] | "VGVzdA=="
| string | date | {RFC-3339}[RFC 3339] internet profile, a subset of ISO 8601 | "2019-07-30"
| string | date-time | {RFC-3339}[RFC 3339] internet profile, a subset of ISO 8601 | "2019-07-30T06:43:40.252Z"
| string | time | {RFC-3339}[RFC 3339] internet profile, a subset of ISO 8601 | "06:43:40.252Z"
| string | duration | {RFC-3339}[RFC 3339] internet profile, a subset of ISO 8601 | "P1DT30H4S"
| string | period | {RFC-3339}[RFC 3339] internet profile, a subset of ISO 8601 | "2019-07-30T06:43:40.252Z/PT3H"
| string | password | | "secret"
| string | email | {RFC-5322}[RFC 5322] | "example@zalando.de"
| string | idn-email | {RFC-6531}[RFC 6531] | "hello@bücher.example"
| string | hostname | {RFC-1034}[RFC 1034] | "www.zalando.de"
| string | idn-hostname | {RFC-5890}[RFC 5890] | "bücher.example"
| string | ipv4 | {RFC-2673}[RFC 2673] | "104.75.173.179"
| string | ipv6 | {RFC-4291}[RFC 4291] | "2600:1401:2::8a"
| string | uri | {RFC-3986}[RFC 3986] | "https://www.zalando.de/"
| string | uri-reference | {RFC-3986}[RFC 3986] | "/clothing/"
| string | uri-template | {RFC-6570}[RFC 6570] | "/users/{id}"
| string | iri | {RFC-3987}[RFC 3987] | "https://bücher.example/"
| string | iri-reference | {RFC-3987}[RFC 3987] | "/damenbekleidung-jacken-mäntel/"
| string | uuid | {RFC-4122}[RFC 4122] | "e2ab873e-b295-11e9-9c02-…"
| string | json-pointer | {RFC-6901}[RFC 6901] | "/items/0/id"
| string | relative-json-pointer | Relative JSON pointers | "1/id"
| string | regex | regular expressions as defined in {ECMA-262}[ECMA 262] | "^[a-z0-9]+$"
|===

Note: Formats bigint and decimal have been added to the OpenAPI defined formats — see also {MUST} define a format for number and integer types and {MUST} use standard formats for date and time properties below.

We add further OpenAPI formats that are especially useful in an e-commerce environment, e.g. language code, country code, and currency, based on other ISO and IETF standards. You must use these formats, whenever applicable:

[cols="1,1,4,2",options="header"]
|===
| OpenAPI type | format | Specification | Example

| string | iso-639-1 | two letter language code, see {ISO-639-1}[ISO 639-1]. Hint: In the past we used iso-639 as format. | "en"
| string | bcp47 | multi letter language tag, see {BCP47}[BCP 47]. It is a compatible extension of {ISO-639-1}[ISO 639-1] optionally with additional information for language usage, like region, variant, script. | "en-DE"
| string | iso-3166-alpha-2 | two letter country code, see {ISO-3166-1-alpha-2}[ISO 3166-1 alpha-2]. Hint: In the past we used iso-3166 as format. | "GB" Hint: It is "GB", not "UK", even though "UK" has seen some use at Zalando.
| string | iso-4217 | three letter currency code, see {ISO-4217}[ISO 4217] | "EUR"
| string | gtin-13 | Global Trade Item Number, see {GTIN}[GTIN] | "5710798389878"
|===

Remark: Please note that this list of standard data formats is not exhaustive and everyone is encouraged to propose additions.
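
As an illustration of what a format such as gtin-13 constrains beyond "a string of digits", the following minimal Python sketch checks the GS1 modulo-10 check digit of a 13-digit value. The function names `gtin13_check_digit` and `is_valid_gtin13` are hypothetical helpers introduced here, not part of any guideline tooling.

```python
def gtin13_check_digit(first_twelve: str) -> int:
    """Compute the check digit from the first twelve digits.

    Digits are weighted 1, 3, 1, 3, ... from left to right.
    """
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


def is_valid_gtin13(value: str) -> bool:
    """Return True if value is 13 ASCII digits with a matching check digit."""
    if len(value) != 13 or not (value.isascii() and value.isdigit()):
        return False
    return gtin13_check_digit(value[:12]) == int(value[12])


print(is_valid_gtin13("5710798389878"))  # example value from the table
```

A check like this only confirms that the value is well formed; it does not tell you whether the number has been assigned to a product.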

{MUST} define a format for number and integer types

In {MUST} use standard data formats we added bigint and decimal to the OpenAPI defined formats. As a consequence, you must always provide a format when you define an API property of JSON type number or integer: int32, int64, or bigint for integer, and float, double, or decimal for number.

By this we prevent clients from guessing the precision incorrectly, and thereby changing the value unintentionally. The precision must be translated by clients and servers into the most specific language types; in Java, for instance, the number type with decimal format will translate into BigDecimal and integer type with int32 format will translate to int or Integer Java types.
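
The effect of guessing the precision can be seen in a short Python sketch. It assumes a hypothetical payload with an `amount` property of format decimal and a `count` property of format bigint; the helper `fits_integer_format` is likewise an illustrative name, not an existing API.

```python
import json
from decimal import Decimal

# Hypothetical payload: "amount" is number/decimal, "count" is integer/bigint.
payload = '{"amount": 3.141592653589793238462643383279, "count": 77210710045682438959}'

# Default parsing maps fractional JSON numbers to binary64 floats,
# which drops the digits beyond double precision.
as_float = json.loads(payload)

# parse_float=Decimal keeps every digit of the literal as written.
as_decimal = json.loads(payload, parse_float=Decimal)

INTEGER_RANGES = {
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def fits_integer_format(value: int, fmt: str) -> bool:
    """Check whether value is representable in the given integer format."""
    if fmt == "bigint":
        return True  # arbitrarily large by definition
    low, high = INTEGER_RANGES[fmt]
    return low <= value <= high


print(as_float["amount"])
print(as_decimal["amount"])
print(fits_integer_format(as_decimal["count"], "int64"))
```

With the format declared in the API definition, a client knows up front to parse `amount` as an arbitrary precision decimal rather than discovering the loss later.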

{MUST} encode binary data in base64url

You may expose binary data. You must use a standard media type and data format, if applicable — see Rule 168. If no standard is available, you must define the binary data as string typed property with binary format using base64url encoding — as also described in {MUST} use standard data formats.
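
A minimal Python sketch of the encoding, using the standard library `base64` module. The helper names `encode_binary` and `decode_binary` are hypothetical.

```python
import base64


def encode_binary(data: bytes) -> str:
    """Encode bytes with the URL-safe alphabet ('-' and '_' replace '+' and '/')."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Decode a base64url string back into bytes."""
    return base64.urlsafe_b64decode(text)


print(encode_binary(b"Test"))
print(encode_binary(b"\xfb\xff"))
```

Note that `urlsafe_b64decode` is lenient about characters outside the alphabet, so a service that needs strict validation would have to add its own check.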

{MUST} use standard formats for date and time properties

As a specific case of {MUST} use standard data formats, you must use the string typed formats date, date-time, time, duration, or period for the definition of date and time properties. The formats are based on the standard {RFC-3339}[RFC 3339] internet profile, a subset of ISO 8601.

APIs {MUST} use the upper-case T as a separator between date and time and upper-case Z at the end when generating dates as opposed to lower-case t, z, space, or any other character. This is stricter than the date/time format as defined in RFC 3339, which leaves it up to the specification.

Exception: For passing date/time information via standard protocol headers, HTTP RFC 7231 requires following the date and time specification used by the Internet Message Format RFC 5322.

The standard permits time zone offsets; however, we recommend using only times based on UTC without local offsets, for example 2015-05-28T14:07:17Z rather than 2015-05-28T14:07:17+00:00. From experience we have learned that zone offsets are not easy to understand and are often handled incorrectly. Note also that zone offsets are different from local times, which may include daylight saving time. When it comes to storage, all dates should be consistently stored in UTC without a zone offset. Localization should be done locally by the services that provide user interfaces, if required.
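
One way to produce such values is sketched below in Python. The function name `to_rfc3339_utc` is hypothetical, and whole-second precision is an assumption made to match the example above.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339_utc(moment: datetime) -> str:
    """Render an aware datetime in UTC with upper-case 'T' and 'Z'."""
    if moment.tzinfo is None:
        # A naive datetime carries no zone, so converting it would be a guess.
        raise ValueError("naive datetime: time zone is unknown")
    utc = moment.astimezone(timezone.utc)
    # isoformat() renders UTC as "+00:00"; swap in the 'Z' suffix.
    return utc.isoformat(timespec="seconds").replace("+00:00", "Z")


local = datetime(2015, 5, 28, 16, 7, 17, tzinfo=timezone(timedelta(hours=2)))
print(to_rfc3339_utc(local))
```

Rejecting naive datetimes rather than assuming a zone keeps the conversion explicit at the point where the value enters the API layer.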

Hint: We discourage using numerical timestamps. It typically creates issues with precision, e.g. whether to represent a timestamp as 1460062925, 1460062925000 or 1460062925.000. Date strings, though more verbose and requiring more effort to parse, avoid this ambiguity.

{SHOULD} select appropriate one of date or date-time format

When choosing between date and date-time formats you should take into account the following:

  • date should be used for properties where no exact point in time is required and day time-range is sufficient, for instance, document dates, birthdays, ETAs (estimated time of arrival). Without further context, date implies the time period from midnight to midnight in the local time zone. However, the time zone information can also be provided as additional context via other fields indicating location.

  • date-time should be used in all other cases where an exact point in time is required, for instance, date-times for supplier advice, specific processing events, fast delivery planning dates. As required in {MUST} use standard formats for date and time properties, date-time requires an explicit time zone offset, which avoids misinterpretation and removes the need for additional context.

{SHOULD} use standard formats for time duration and interval properties

Properties that are by design durations and time intervals should be represented as strings formatted as defined by {ISO-8601}[ISO 8601] ({RFC-3339}#appendix-A[RFC 3339 Appendix A contains a grammar] for durations and periods, the latter called time intervals in {ISO-8601}[ISO 8601]). ISO 8601-1:2019 defines an extension (..) to express open-ended time intervals that are very convenient in searches and are included in the {RFC-2234}[ABNF] grammar below:

   dur-second        = 1*DIGIT "S"
   dur-minute        = 1*DIGIT "M" [dur-second]
   dur-hour          = 1*DIGIT "H" [dur-minute]
   dur-time          = "T" (dur-hour / dur-minute / dur-second)
   dur-day           = 1*DIGIT "D"
   dur-week          = 1*DIGIT "W"
   dur-month         = 1*DIGIT "M" [dur-day]
   dur-year          = 1*DIGIT "Y" [dur-month]
   dur-date          = (dur-day / dur-month / dur-year) [dur-time]
   duration          = "P" (dur-date / dur-time / dur-week)

   period-explicit   = iso-date-time "/" iso-date-time
   period-start      = iso-date-time "/" (duration / "..")
   period-end        = (duration / "..") "/" iso-date-time
   period            = period-explicit / period-start / period-end

Time interval query parameters should use <time-property>_between instead of the parameter tuple <time-property>_before/<time-property>_after, while properties providing a time interval should be named <time-property>_interval.
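
The grammar above can be transcribed rule by rule into regular expressions. The Python sketch below does this for duration and period; `is_duration` and `is_period` are hypothetical helper names, and the `DATE_TIME` pattern is only a simplified shape check standing in for iso-date-time, which the grammar excerpt does not define.

```python
import re

# One pattern per grammar rule, composed bottom-up.
_SECOND = r"[0-9]+S"
_MINUTE = rf"[0-9]+M(?:{_SECOND})?"
_HOUR = rf"[0-9]+H(?:{_MINUTE})?"
_TIME = rf"T(?:{_HOUR}|{_MINUTE}|{_SECOND})"
_DAY = r"[0-9]+D"
_WEEK = r"[0-9]+W"
_MONTH = rf"[0-9]+M(?:{_DAY})?"
_YEAR = rf"[0-9]+Y(?:{_MONTH})?"
_DATE = rf"(?:{_DAY}|{_MONTH}|{_YEAR})(?:{_TIME})?"
DURATION = re.compile(rf"P(?:{_DATE}|{_TIME}|{_WEEK})")

# Assumption: simplified stand-in for iso-date-time (shape only, no range checks).
DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})"
)


def is_duration(text: str) -> bool:
    return DURATION.fullmatch(text) is not None


def is_period(text: str) -> bool:
    """Accept period-explicit, period-start and period-end forms."""
    parts = text.split("/")
    if len(parts) != 2:
        return False
    start, end = parts
    if DATE_TIME.fullmatch(start):
        return bool(DATE_TIME.fullmatch(end)) or is_duration(end) or end == ".."
    if DATE_TIME.fullmatch(end):
        return is_duration(start) or start == ".."
    return False


print(is_period("2019-07-30T06:43:40.252Z/PT3H"))
print(is_period("../2019-07-30T06:43:40.252Z"))
```

Read strictly, the grammar nests each component inside the next larger one, so a duration that skips a unit in the middle (for instance hours followed directly by seconds) is rejected by this sketch; more lenient parsers may accept such values.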

{MUST} use standard formats for country, language and currency properties

As a specific case of {MUST} use standard data formats you must use the following standard formats:

  • Country codes: {ISO-3166-1-alpha-2}[ISO 3166-1-alpha-2] two letter country codes indicated via format iso-3166-alpha-2 in the OpenAPI specification.

  • Language codes: {ISO-639-1}[ISO 639-1] two letter language codes indicated via format iso-639-1 in the OpenAPI specification.

  • Language variant tags: {BCP47}[BCP 47] multi letter language tag indicated via format bcp47 in the OpenAPI specification. (It is a compatible extension of {ISO-639-1}[ISO 639-1] with additional optional information for language usage, like region, variant, script)

  • Currency codes: {ISO-4217}[ISO 4217] three letter currency codes indicated via format iso-4217 in the OpenAPI specification.
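
A lightweight shape check for these formats might look like the following Python sketch. The patterns assume the casing used in the examples of this document ("GB", "en", "EUR"); `FORMAT_SHAPES` and `has_expected_shape` are hypothetical names.

```python
import re

# Shape-only patterns; they do not prove that a code is actually assigned.
FORMAT_SHAPES = {
    "iso-3166-alpha-2": re.compile(r"[A-Z]{2}"),
    "iso-639-1": re.compile(r"[a-z]{2}"),
    "iso-4217": re.compile(r"[A-Z]{3}"),
}


def has_expected_shape(fmt: str, value: str) -> bool:
    """Return True if value has the letter count and casing of the format."""
    pattern = FORMAT_SHAPES.get(fmt)
    if pattern is None:
        raise KeyError(f"no shape check defined for format {fmt!r}")
    return pattern.fullmatch(value) is not None


print(has_expected_shape("iso-3166-alpha-2", "GB"))
print(has_expected_shape("iso-4217", "EU"))
```

A shape check cannot tell "GB" from "UK", since both are two upper-case letters; verifying that a code is valid requires a lookup against the official code lists.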

{SHOULD} use content negotiation, if clients may choose from different resource representations

In some situations the API supports serving different representations of a specific resource (at the same URL), e.g. JSON, PDF, TEXT, or HTML representations for an invoice resource. You should use content negotiation so that clients can specify, via the standard HTTP headers {Accept}, {Accept-Language}, and {Accept-Encoding}, which representation is best suited for their use case, for example, which language of a document, representation / content format, or content encoding. You should use standard media types (see rule 172) like application/json or application/pdf for defining the content format in the {Accept} header.
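
To make the selection step concrete, here is a simplified Python sketch of server-side negotiation over the {Accept} header. It is not a complete implementation of the HTTP rules: it ignores media type parameters other than q and does not handle quoted strings. All function names are hypothetical.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_range, q) pairs from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_range, q))
    return entries


def _matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def _specificity(media_range: str) -> int:
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def choose_representation(accept: str, available: list[str]) -> str | None:
    """Pick the available media type with the highest client weight, or None."""
    ranges = parse_accept(accept) if accept.strip() else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in available:
        # The most specific matching range decides the weight for this type.
        candidates = [(_specificity(r), q) for r, q in ranges if _matches(r, media_type)]
        if not candidates:
            continue
        _, q = max(candidates)
        if q > best_q:
            best, best_q = media_type, q
    return best


offered = ["application/json", "application/pdf"]
print(choose_representation("application/pdf;q=0.9, application/json", offered))
```

When the function returns None, no offered representation is acceptable to the client; how the service responds in that case is left to the surrounding application.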

{SHOULD} only use UUIDs if necessary

Generating IDs can be a scaling problem in high frequency and near real time use cases. UUIDs solve this problem, as they can be generated without collisions in a distributed, non-coordinated way and without additional server round trips.

However, they also come with some disadvantages:

  • pure technical key without meaning; not ready for naming or name scope conventions that might be helpful for pragmatic reasons, e.g. we learned to use names for product attributes, instead of UUIDs

  • less usable, because…​

    • cannot be memorized and easily communicated by humans

    • harder to use in debugging and logging analysis

    • less convenient for consumer facing usage

  • quite long: readable representation requires 36 characters and comes with higher memory and bandwidth consumption

  • not ordered along their creation history and no indication of used id volume

  • may be in conflict with additional backward compatibility support of legacy ids

UUIDs should be avoided when not needed for large scale id generation. Instead, for instance, server side support with id generation can be preferred ({POST} on id resource, followed by idempotent {PUT} on entity resource). Usage of UUIDs is especially discouraged as primary keys of master and configuration data, like brand-ids or attribute-ids which have low id volume but widespread steering functionality.

Please be aware that sequential, strictly monotonically increasing numeric identifiers may reveal critical, confidential business information, like order volume, to non-privileged clients.

In any case, we should always use string rather than number type for identifiers. This gives us more flexibility to evolve the identifier naming scheme. Accordingly, if used as identifiers, UUIDs should not be qualified using a format property.

Hint: Usually, random UUIDs are used - see UUID version 4 in {RFC-4122}[RFC 4122]. Though UUID version 1 also contains a timestamp, the timestamp is not reflected in its lexicographic sort order. This deficit is addressed by ULID (Universally Unique Lexicographically Sortable Identifier). You may favour ULID instead of UUID, for instance, for pagination use cases ordered along creation time.
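
The sorting behaviour of version 1 UUIDs can be reproduced with the Python standard library. The sketch below builds two version 1 UUIDs from chosen timestamps; `uuid1_at` is a hypothetical helper, and the clock sequence and node are fixed to arbitrary values for illustration.

```python
import uuid


def uuid1_at(timestamp: int) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (illustrative only)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    # 0x80 sets the RFC 4122 variant; clock sequence and node are placeholders.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


earlier = uuid1_at(0x0000_0000_FFFF_FFFF)
later = uuid1_at(0x0000_0001_0000_0000)

# The low 32 bits of the timestamp come first in the text form,
# so string order does not follow creation order.
print(earlier, earlier.time < later.time)
print(later, str(earlier) > str(later))

# A random (version 4) UUID in its readable form has 36 characters.
print(len(str(uuid.uuid4())))
```

This is why an identifier scheme that places the most significant timestamp bits first is a better fit when string order should follow creation time.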