Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prep release: v1.44.0 #4944

Merged
merged 4 commits into from Apr 12, 2024
Merged

prep release: v1.44.0 #4944

merged 4 commits into from Apr 12, 2024

Conversation

o0Ignition0o
Copy link
Contributor

@o0Ignition0o o0Ignition0o commented Apr 12, 2024

Note

When approved, this PR will merge into the 1.44.0 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily the most up to date. See the Files Changed for the true reality.)
  • Version bumps
  • That it targets the right release branch (1.44.0 in this case!).

🚀 Features

Add details to router service call failed errors (Issue #4899)

The router now includes more details in router service call failed error messages to improve their understandability and debuggability.

By @garypen in #4900

Support exporting metrics via OTLP HTTP (Issue #4559)

In addition to exporting metrics via OTLP/gRPC, the router now supports exporting metrics via OTLP/HTTP.

You can enable exporting via OTLP/HTTP by setting the protocol key to http in your router.yaml:

telemetry:
  exporters:
    metrics:
      otlp:
        enabled: true
        protocol: http

By @BrynCooke in #4842

Add support of instruments in configuration for telemetry (Issue #4319)

Add support for custom and standard instruments through the configuration file. You'll be able to add your own custom metrics just using the configuration file. They may:

  • be conditional
  • get values from selectors, for instance headers, context or body
  • have different types like histogram or counter.

Example:

telemetry:
  instrumentation:
    instruments:
      router:
        http.server.active_requests: true
        acme.request.duration:
          value: duration
          type: counter
          unit: kb
          description: "my description"
          attributes:
            http.response.status_code: true
            "my_attribute":
              response_header: "x-my-header"

      supergraph:
        acme.graphql.requests:
          value: unit
          type: counter
          unit: count
          description: "supergraph requests"

      subgraph:
        acme.graphql.subgraph.errors:
          value: unit
          type: counter
          unit: count
          description: "my description"

Documentation

By @bnjjj in #4771

Reuse cached query plans across schema updates (Issue #4834)

The router now supports an experimental feature to reuse schema aware query hashing—introduced with the entity caching feature—to cache query plans. It reduces the amount of work when reloading the router. The hash of the cache stays the same for a query across schema updates if the schema updates don't change the query. If query planner cache warm-up is configured, the router can reuse previous cache entries for which the hash does not change, consequently reducing both CPU usage and reload duration.

You can enable reuse of cached query plans by setting the supergraph.query_planning.experimental_reuse_query_plans option:

supergraph:
  query_planning:
    warmed_up_queries: 100
    experimental_reuse_query_plans: true

By @Geal in #4883

Set a default TTL for query plans (Issue #4473)

The router has updated the default TTL for query plan caches. The new default TTL is 30 days. With the previous default being an infinite duration, the new finite default better supports the fact that the router updates caches with schema updates.

By @Geal in #4588

🐛 Fixes

Replace null separator in cache key with : to match Redis convention (PR #4886)

To conform with Redis convention, the router now uses : instead of null as the separator in cache keys. This conformance helps to properly display cache keys in nested form in Redis clients.

This PR (#4886) updates the separator for APQ cache keys. Another PR (#4583) updates the separator for query plan cache keys.

By @tapaderster in #4886

Make 'router' user the owner of the docker image's /dist/data directory (PR #4898)

Since we made our images more secure, we run our router process as user 'router'. If we are running under 'heaptrack', e.g.: in a debug image, then we cannot write to /dist/data because it is owned by 'root'.

This changes the ownership of /dist/data from 'root' to 'router' to allow writes to succeed.

By @garypen in #4898

Accept extensions: null in a GraphQL request (Issue #3388)

In GraphQL requests, extensions is an optional map.
Passing an explicit null was incorrectly considered a parse error.
Now it is equivalent to omiting that field entirely, or to passing an empty map.

By @SimonSapin in #4911

Require Cache-Control header for entity cache (Issue #4880)

Previously, the router's entity cache plugin didn't use a subgraph's Cache-Control header to decide whether to store a response. Instead, it cached all responses.

Now, the router's entity cache plugin expects a Cache-Control header from a subgraph. If a subgraph does not provide it, the aggregated Cache-Control header sent to the client will contain no-store.

Additionally, the router now verifies that a TTL is configured for all subgraphs, either globally or for each subgraph configuration.

By @Geal in #4882

Helm: include all standard labels in pod spec but complete sentence that stands on its own (PR #4862)

The templates for the router's Helm chart have been updated so that the helm.sh/chart, app.kubernetes.io/version, and app.kubernetes.io/managed-by labels are now included on pods, as they already were for all other resources created by the Helm chart.

The specific change to the template is that the pod spec template now uses the router.labels template function instead of the router.selectorLabels template function. This allows you to remove a label from the selector without removing it from resource metadata by overriding the router.selectorLabels and router.labels functions and moving the label from the former to the latter.

By @glasser in #4862

Persisted queries return 4xx errors (PR #4887

Previously, sending an invalid persisted query request could return a 200 status code to the client when they should have returned errors. These requests now return errors as 4xx status codes:

  • Sending a PQ ID that is unknown returns 404 (Not Found).

  • Sending freeform GraphQL when no freeform GraphQL is allowed returns
    400 (Bad Request).

  • Sending both a PQ ID and freeform GraphQL in the same request (if the
    APQ feature is not also enabled) returns 400 (Bad Request).

  • Sending freeform GraphQL that is not in the safelist when the safelist
    is enabled returns (403 Forbidden).

  • A particular internal error that shouldn't happen returns 500 (Internal
    Server Error).

    By @glasser in feat(pq): use 4xx status code on PQ errors #4887

📃 Configuration

Add generate_query_fragments configuration option (PR #4885)

Add a new supergraph configuration option generate_query_fragments. When set to true, the query planner will extract inline fragments into fragment definitions before sending queries to subgraphs. This can significantly reduce the size of the query sent to subgraphs, but may increase the time it takes to plan the query. Note that this option and reuse_query_fragments are mutually exclusive; if both are set to true, generate_query_fragments will take precedence.

An example router configuration:

supergraph:
  generate_query_fragments: true

By @trevor-scheer in #4885

Fix integration test warning on macOS (PR #4919)

Previously, integration tests of the router on macOS could produce the warning messages:

warning: unused import: `common::Telemetry`
 --> apollo-router/tests/integration/mod.rs:4:16
  |
4 | pub(crate) use common::Telemetry;
  |                ^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `common::ValueExt`
 --> apollo-router/tests/integration/mod.rs:5:16
  |
5 | pub(crate) use common::ValueExt;
  |                ^^^^^^^^^^^^^^^^

That issue is now resolved.

By @garypen in #4919

@router-perf
Copy link

router-perf bot commented Apr 12, 2024

CI performance tests

  • reload - Reload test over a long period of time at a constant rate of users
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • large-request - Stress test with a 1 MB request payload
  • const - Basic stress test that runs with a constant number of users
  • no-graphos - Basic stress test, no GraphOS.
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • xxlarge-request - Stress test with 100 MB request payload
  • xlarge-request - Stress test with 10 MB request payload
  • step - Basic stress test that steps up the number of users over time

@o0Ignition0o o0Ignition0o enabled auto-merge (squash) April 12, 2024 09:13
@o0Ignition0o o0Ignition0o merged commit eb2c03d into 1.44.0 Apr 12, 2024
12 checks passed
@o0Ignition0o o0Ignition0o deleted the prep-1.44.0 branch April 12, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants