Skip to content

Latest commit

 

History

History
526 lines (421 loc) · 18.7 KB

json-guidelines.adoc

File metadata and controls

526 lines (421 loc) · 18.7 KB

REST Basics - JSON payload

These guidelines provides recommendations for defining JSON data at Zalando. JSON here refers to {RFC-7159}[RFC 7159] (which updates {RFC-4627}[RFC 4627]), the "application/json" media type and custom JSON media types defined for APIs. The guidelines clarifies some specific cases to allow Zalando JSON data to have an idiomatic form across teams and services.

{MUST} use JSON as payload data interchange format

Use JSON ({RFC-7159}[RFC 7159]) to represent structured (resource) data passed with HTTP requests and responses as body payload. The JSON payload must use a JSON object as top-level data structure (if possible) to allow for future extension. This also applies to collection resources, where you ad-hoc would use an array — see also [110].

Additionally, the JSON payload must comply to the more restrictive Internet JSON ({RFC-7493}[RFC 7493]), particularly

  • {RFC-7493}#section-2.1[Section 2.1] on encoding of characters, and

  • {RFC-7493}#section-2.3[Section 2.3] on object constraints.

As a consequence, a JSON payload must

  • use {RFC-7493}#section-2.1[UTF-8 encoding]

  • consist of {RFC-7493}#section-2.1[valid Unicode strings], i.e. must not contain non-characters or surrogates, and

  • contain only {RFC-7493}#section-2.3[unique member names] (no duplicate names).

{SHOULD} design single resource schema for reading and writing

To simplify API specs and API maintenance, as well as the cognitive load of developers, endpoints should utilize a common model for reading and writing the same resource type. The differences between read and write models are best addressed by

  • marking properties writeOnly that are only available in a request, e.g. passwords, and

  • marking properties readOnly that are only available in a response, e.g. resource identifiers.

Note: A resource with property marked readOnly in the API according to the {json-schema-validation}#rfc.section.9.4[JSON schema validation] may be ignored or rejected by a server. Both are in line with the idea behind the compatibility guidance, however, we suggest to document when rejecting to prevent client side surprises.

{SHOULD} be aware of services not fully supporting JSON/unicode

JSON payloads passed via API requests consist of valid unicode characters (see {MUST} use JSON as payload data interchange format). While \uxxxx are valid characters in a JSON strings, they can create failures when leaving the API request processing context, e.g. by writing the data into a database or piping it through command line tools that are not 100% compatible with the JSON or unicode standard.

A prominent example for such a tool is the Postgres database:

  • Postgres cannot handle \u0000 in jsonb and text types (see datatype-json).

As a consequence, servers and clients that forward JSON content to other tools should check that these tools fully support the JSON or unicode standard. If not, they should reject or sanitize unicode characters not supported by the tools.

Note: A sanitization and rejection are actions that change the API behavior and therefore should be described by the API specification.

{MAY} pass non-JSON media types using data specific standard formats

Non-JSON media types may be supported, if you stick to a business object specific standard format for the payload data, for instance, image data format (JPG, PNG, GIF), document format (PDF, DOC, ODF, PPT), or archive format (TAR, ZIP).

Generic structured data interchange formats other than JSON (e.g. XML, CSV) may be provided, but only additionally to JSON as default format using content negotiation, for specific use cases where clients may not interpret the payload structure.

{SHOULD} use standard media types

You should use standard media types (defined in {iana-media-types}[media type registry] of Internet Assigned Numbers Authority (IANA)) as content-type (or accept) header information. More specifically, for JSON payload you should use the standard media type application/json (or application/problem+json for [176]).

You should avoid using custom media types like application/x.zalando.article+json. Custom media types beginning with x bring no advantage compared to the standard media type for JSON, and make automated processing more difficult.

Exception: Custom media type should be only used in situations where you need to provide API endpoint versioning (with content negotiation) due to incompatible changes.

{SHOULD} pluralize array names

Names of arrays should be pluralized to indicate that they contain multiple values. This implies in turn that object names should be singular.

{MUST} property names must be snake_case (and never camelCase)

Property names are restricted to ASCII snake_case strings matching regex ^[a-z_][a-z_0-9]*$. The first character must be a lower case letter, or an underscore, and subsequent characters can be a letter, an underscore, or a number.

Examples:

customer_number, sales_order_number, billing_address

Rationale: No established industry standard exists, but many popular Internet companies prefer snake_case: e.g. GitHub, Stack Exchange, Twitter. Others, like Google and Amazon, use both - but not only camelCase. It’s essential to establish a consistent look and feel such that JSON looks as if it came from the same hand.

{SHOULD} declare enum values using UPPER_SNAKE_CASE string

Enumerations should be represented as string typed OpenAPI definitions of request parameters or model properties. Enum values (using enum or {x-extensible-enum}) need to consistently use the upper-snake case format, e.g. VALUE or YET_ANOTHER_VALUE. This approach allows to clearly distinguish values from properties or other elements.

Exception: This rule does not apply for case sensitive values sourced from outside API definition scope, e.g. for language codes from {ISO-639-1}[ISO 639-1], or when declaring possible values for a rule 137 [sort parameter].

{SHOULD} name date/time properties with _at suffix

Dates and date-time properties should end with _at to distinguish them from boolean properties which otherwise would have very similar or even identical names:

  • {created_at} rather than {created},

  • {modified_at} rather than {modified},

  • occurred_at rather than occurred, and

  • returned_at rather than returned.

Hint: Use format: date-time (or as format: date) as required in [238].

Note: {created} and {modified} were mentioned in an earlier version of the guideline and are therefore still accepted for APIs that predate this rule.

{SHOULD} define maps using additionalProperties

A "map" here is a mapping from string keys to some other type. In JSON this is represented as an object, the key-value pairs being represented by property names and property values. In OpenAPI schema (as well as in JSON schema) they should be represented using additionalProperties with a schema defining the value type. Such an object should normally have no other defined properties.

The map keys don’t count as property names in the sense of rule 118, and can follow whatever format is natural for their domain. Please document this in the description of the map object’s schema.

Here is an example for such a map definition (the translations property):

components:
  schemas:
    Message:
      description:
        A message together with translations in several languages.
      type: object
      properties:
        message_key:
          type: string
          description: The message key.
        translations:
          description:
            The translations of this message into several languages.
            The keys are [IETF BCP-47 language tags](https://tools.ietf.org/html/bcp47).
          type: object
          additionalProperties:
            type: string
            description:
              the translation of this message into the language identified by the key.

An actual JSON object described by this might then look like this:

{ "message_key": "color",
  "translations": {
    "de": "Farbe",
    "en-US": "color",
    "en-GB": "colour",
    "eo": "koloro",
    "nl": "kleur"
  }
}

{MUST} use same semantics for null and absent properties

OpenAPI 3.x allows to mark properties as required and as nullable to specify whether properties may be absent ({}) or null ({"example":null}). If a property is defined to be not required and nullable (see 2nd row in Table below), this rule demands that both cases must be handled in the exact same manner by specification.

The following table shows all combinations and whether the examples are valid:

{CODE-START}required{CODE-END} {CODE-START}nullable{CODE-END} {CODE-START}{}{CODE-END} {CODE-START}{"example":null}{CODE-END}

true

true

{NO}

{YES}

false

true

{YES}

{YES}

true

false

{NO}

{NO}

false

false

{YES}

{NO}

While API designers and implementers may be tempted to assign different semantics to both cases, we explicitly decide against that option, because we think that any gain in expressiveness is far outweighed by the risk of clients not understanding and implementing the subtle differences incorrectly.

As an example, an API that provides the ability for different users to coordinate on a time schedule, e.g. a meeting, may have a resource for options in which every user has to make a choice. The difference between undecided and decided against any of the options could be modeled as absent and null respectively. It would be safer to express the null case with a dedicated Null object, e.g. {} compared to {"id":"42"}.

Moreover, many major libraries have somewhere between little to no support for a null/absent pattern (see Gson, Moshi, Jackson, JSON-B). Especially strongly-typed languages suffer from this since a new composite type is required to express the third state. Nullable Option/Optional/Maybe types could be used but having nullable references of these types completely contradicts their purpose.

The only exception to this rule is JSON Merge Patch {RFC-7396}[RFC 7396]) which uses null to explicitly indicate property deletion while absent properties are ignored, i.e. not modified.

{MUST} not use null for boolean properties

Schema based JSON properties that are by design booleans must not be presented as nulls. A boolean is essentially a closed enumeration of two values, true and false. If the content has a meaningful null value, we strongly prefer to replace the boolean with enumeration of named values or statuses - for example accepted_terms_and_conditions with enumeration values YES, NO, UNDEFINED.

{SHOULD} not use null for empty arrays

Empty array values can unambiguously be represented as the empty list, [].

{MUST} use common field names and semantics

You must use common field names and semantics whenever applicable. Common fields are idiomatic, create consistency across APIs and support common understanding for API consumers.

We define the following common field names:

  • {id}: the identity of the object. If used, IDs must be opaque strings and not numbers. IDs are unique within some documented context, are stable and don’t change for a given object once assigned, and are never recycled cross entities.

  • {xyz_id}: an attribute within one object holding the identifier of another object must use a name that corresponds to the type of the referenced object or the relationship to the referenced object followed by _id (e.g. partner_id not partner_number, or parent_node_id for the reference to a parent node from a child node, even if both have the type Node). Exception: We use customer_number instead of customer_id for customer facing identification of customers due to legacy reasons. (Hint: customer_id used to be defined as internal only, technical integer key, see {customer-naming-decision}[Naming Decision: customer_number vs customer_id [internal_link]]).

  • {e_tag}: the {e_tag} of an embedded sub-resource. It typically is used to carry the {ETag} for subsequent {PUT}/{PATCH} calls (see ETag and [etag-in-result-entities]).

Further common fields are defined in {SHOULD} name date/time properties with _at suffix. The following guidelines define standard objects and fields:

Example JSON schema:

tree_node:
  type: object
  properties:
    id:
      description: the identifier of this node
      type: string
    parent_node_id:
      description: the identifier of the parent node of this node
      type: string
    created_at:
      description: when got this node created
      type: string
      format: 'date-time'
    modified_at:
      description: when got this node last updated
      type: string
      format: 'date-time'
  example:
    id: '123435'
    parent_node_id: '534321'
    created_at: '2017-04-12T23:20:50.52Z'
    modified_at: '2017-04-12T23:20:50.52Z'

{MUST} use the common address fields

Address structures play a role in different business and use-case contexts, including country variances. All attributes that relate to address information must follow the naming and semantics defined below.

addressee:
  description: a (natural or legal) person that gets addressed
  type: object
  properties:
    salutation:
      description: |
        a salutation and/or title used for personal contacts to some
        addressee; not to be confused with the gender information!
      type: string
      example: Mr
    first_name:
      description: |
        given name(s) or first name(s) of a person; may also include the
        middle names.
      type: string
      example: Hans Dieter
    last_name:
      description: |
        family name(s) or surname(s) of a person
      type: string
      example: Mustermann
    business_name:
      description: |
        company name of the business organization. Used when a business is
        the actual addressee; for personal shipments to office addresses, use
        `care_of` instead.
      type: string
      example: Consulting Services GmbH
  required:
    - first_name
    - last_name

address:
  description:
    an address of a location/destination
  type: object
  properties:
    care_of:
      description: |
        (aka c/o) the person that resides at the address, if different from
        addressee. E.g. used when sending a personal parcel to the
        office /someone else's home where the addressee resides temporarily
      type: string
      example: Consulting Services GmbH
    street:
      description: |
        the full street address including house number and street name
      type: string
      example: Schönhauser Allee 103
    additional:
      description: |
        further details like building name, suite, apartment number, etc.
      type: string
      example: 2. Hinterhof rechts
    city:
      description: |
        name of the city / locality
      type: string
      example: Berlin
    zip:
      description: |
        zip code or postal code
      type: string
      example: 14265
    country_code:
      description: |
        the country code according to
        [iso-3166-1-alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)
      type: string
      example: DE
  required:
    - street
    - city
    - zip
    - country_code

Grouping and cardinality of fields in specific data types may vary based on the specific use case (e.g. combining addressee and address fields into a single type when modeling an address label vs distinct addressee and address types when modeling users and their addresses).

{MUST} use the common money object

Use the following common money structure:

link:../models/money-1.0.0.yaml[role=include]

APIs are encouraged to include a reference to the global schema for Money.

SalesOrder:
  properties:
    grand_total:
      $ref: 'https://opensource.zalando.com/restful-api-guidelines/models/money-1.0.0.yaml#/Money'

Please note that APIs have to treat Money as a closed data type, i.e. it’s not meant to be used in an inheritance hierarchy. That means the following usage is not allowed:

{
  "amount": 19.99,
  "currency": "EUR",
  "discounted_amount": 9.99
}

Cons

A better approach is to favor composition over inheritance:

{
    "price": {
      "amount": 19.99,
      "currency": "EUR"
    },
    "discounted_price": {
      "amount": 9.99,
      "currency": "EUR"
    }
}

Pros

  • No inheritance, hence no issue with the substitution principle

  • Makes use of existing library support

  • No coupling, i.e. mixed currencies is an option

  • Prices are now self-describing, atomic values

Notes

Please be aware that some business cases (e.g. transactions in Bitcoin) call for a higher precision, so applications must be prepared to accept values with unlimited precision, unless explicitly stated otherwise in the API specification.

Examples for correct representations (in EUR):

  • 42.20 or 42.2 = 42 Euros, 20 Cent

  • 0.23 = 23 Cent

  • 42.0 or 42 = 42 Euros

  • 1024.42 = 1024 Euros, 42 Cent

  • 1024.4225 = 1024 Euros, 42.25 Cent

Make sure that you don’t convert the "amount" field to float / double types when implementing this interface in a specific language or when doing calculations. Otherwise, you might lose precision. Instead, use exact formats like Java’s BigDecimal. See Stack Overflow for more info.

Some JSON parsers (NodeJS’s, for example) convert numbers to floats by default. After discussing the pros and cons we’ve decided on "decimal" as our amount format. It is not a standard OpenAPI format, but should help us to avoid parsing numbers as float / doubles.