aud) validation (PR #7578)
The router now supports JWT audience (aud) validation. This allows the router to ensure that the JWT is intended for the specific audience it is being used with, enhancing security by preventing token misuse across different audiences.
The following sample configuration will validate the JWT's aud claim against the specified audiences and ensure a match with either https://my.api or https://my.other.api. If the aud claim does not match either of those configured audiences, the router will reject the request.
authentication:
  router:
    jwt:
      jwks: # This key is required.
        - url: https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json
          issuers: # optional list of issuers
            - https://issuer.one
            - https://issuer.two
          audiences: # optional list of audiences
            - https://my.api
            - https://my.other.api
          poll_interval: <optional poll interval>
          headers: # optional list of static headers added to the HTTP request to the JWKS URL
            - name: User-Agent
              value: router
      # These keys are optional. Default values are shown.
      header_name: Authorization
      header_value_prefix: Bearer
      on_error: Error
      # array of alternative token sources
      sources:
        - type: header
          name: X-Authorization
          value_prefix: Bearer
        - type: cookie
          name: authz
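As a rough illustration of the matching rule described above, the following Python sketch checks a token's aud claim against a configured list. It is not the router's implementation: the function name audience_matches is hypothetical, and treating a missing aud claim as a mismatch is an assumption. Per RFC 7519, the aud claim may be a single string or an array of strings, so the sketch handles both shapes.

```python
def audience_matches(aud_claim, configured_audiences):
    """Return True if the token names at least one configured audience."""
    # RFC 7519 allows "aud" to be a single string or an array of strings.
    if isinstance(aud_claim, str):
        token_audiences = {aud_claim}
    elif isinstance(aud_claim, list):
        token_audiences = {a for a in aud_claim if isinstance(a, str)}
    else:
        # Assumption: a missing or malformed claim never matches.
        token_audiences = set()
    return not token_audiences.isdisjoint(configured_audiences)


# Audiences taken from the sample configuration above.
CONFIGURED = {"https://my.api", "https://my.other.api"}
```

A token whose aud is "https://my.api", or a list containing "https://my.other.api", would pass this check; any other value would be rejected.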
By @Velfi in https://github.com/apollographql/router/pull/7578
The router warms up its query planning cache during a hot reload. This change decreases the priority of warm up tasks in the compute job queue to reduce the impact of warmup on serving requests.
This change adds new values to the job.type dimension of the following metrics:
apollo.router.compute_jobs.duration - A histogram of time spent in the compute pipeline by the job, including the queue and query planning.
  job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
  job.outcome: (executed_ok, executed_error, channel_error, rejected_queue_full, abandoned)
apollo.router.compute_jobs.queue.wait.duration - A histogram of time spent in the compute queue by the job.
  job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
apollo.router.compute_jobs.execution.duration - A histogram of time spent to execute job (excludes time spent in the queue).
  job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
apollo.router.compute_jobs.active_jobs - A gauge of the number of compute jobs being processed in parallel.
  job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
By @carodewig in https://github.com/apollographql/router/pull/7223
PERSISTED_QUERY_NOT_IN_LIST error for debuggability (PR #7768)
When persisted query safelisting is enabled and a request has an unknown PQ ID, the GraphQL error now has the extension field operation_name containing the GraphQL operation name (if provided explicitly in the request). Note that this only applies to the PERSISTED_QUERY_NOT_IN_LIST error returned when manifest-based PQs are enabled, APQs are disabled, and the request contains an operation ID that is not in the list.
By @glasser in https://github.com/apollographql/router/pull/7768
The cooperative cancellation feature allows the router to gracefully handle query planning timeouts and cancellations, improving resource utilization.
The mode can be set to measure or enforce. We recommend starting with measure. In measure mode, the router will measure the time taken for query planning and emit metrics accordingly. In enforce mode, the router will cancel query planning operations that exceed the specified timeout.
To observe this behavior, the router telemetry has been updated:
outcome attribute to the apollo.router.query_planning.plan.duration metric
outcome attribute to the query_planning span
Below is a sample configuration to configure cooperative cancellation in measure mode:
supergraph:
  query_planning:
    experimental_cooperative_cancellation:
      enabled: true
      mode: measure
      timeout: 1s
By @Velfi in https://github.com/apollographql/router/pull/7604
on_graphql_error selector with subgraph_on_graphql_error (PR #7676)
The on_graphql_error selector will now return true or false, in alignment with the subgraph_on_graphql_error selector. Previously, the selector would return true or None.
By @carodewig in https://github.com/apollographql/router/pull/7676
PR #7141 added checks on GraphQL responses returned from coprocessors to ensure compliance with GraphQL specifications. This surfaced an issue where subscription responses over websockets could omit the required data field during the handshake, resulting in invalid GraphQL response payloads. All websocket subscription responses will now return a valid GraphQL response when doing the websocket handshake.
By @bnjjj in https://github.com/apollographql/router/pull/7680
Fixed an issue introduced in Router 2.3.0 where some SigV4 configurations would fail to start, preventing communication with SigV4-enabled services.
By @dylan-apollo in https://github.com/apollographql/router/pull/7726
When a variable in a GraphQL request is missing or contains an invalid value, the router now returns more useful error messages. Example:
-invalid type for variable: 'x'
+invalid input value at x.coordinates[0].longitude: found JSON null for GraphQL Float!
By @SimonSapin in https://github.com/apollographql/router/pull/7567
By default, the Prometheus metrics exporter will only export resources as target_info metrics, not inline on every metric. Now, you can add resources to every metric by setting resource_selector to all (default is none).
telemetry:
  exporters:
    metrics:
      common:
        resource:
          "test-resource": "test"
      prometheus:
        enabled: true
        resource_selector: all # This will add resources to every metric
Note: this change only affects Prometheus, not OTLP.
By @bnjjj in https://github.com/apollographql/router/pull/7394
@link directives for supergraph schemas where purpose is EXECUTION or SECURITY
The legacy JavaScript query planner forbids any usage of unknown @link specs in supergraph schemas with either EXECUTION or SECURITY value set for the for argument (aka, the spec's "purpose"). This behavior had not been ported to the native query planner previously. This PR implements the expected behavior in the native query planner.
By @duckki in https://github.com/apollographql/router/pull/7587
on_graphql_error selector (PR #7669)
The on_graphql_error selector will now correctly fire on the supergraph stage; previously it only worked on the router stage.
By @carodewig in https://github.com/apollographql/router/pull/7669
@defer fetch
The query planner was adding an inline spread (...) conditioned on the Query type in deferred subgraph fetch queries. Such a query would be invalid in the subgraph when the subgraph schema renamed the root query type to something other than Query. The fix removes the root type condition from all subgraph queries, so that they stay valid even when root types are renamed.
By @duckki in https://github.com/apollographql/router/pull/7580
content-type for file uploads when Rhai scripts are in use (PR #7559)
If a Rhai script was invoked during file upload processing, then the "Content-Type" of the request was not preserved correctly. This would cause a file upload to fail.
The error message would be something like:
"message": "invalid multipart request: Content-Type is not multipart/form-data",
This issue has now been fixed.
By @garypen in https://github.com/apollographql/router/pull/7559
We made substantial updates to OpenTelemetry in router 2.0, but didn't catch that OpenTelemetry changed how it processed "endpoints" (destinations for metrics and traces) until now.
With the undetected change, the router wasn't setting the path correctly, resulting in failure to export metrics over HTTP when using the "default" endpoint. Neither metrics via gRPC nor traces were impacted.
We have fixed our interactions with the dependency and improved our testing to make sure this does not occur again. Additionally, the router now supports setting standard OpenTelemetry environment variables for endpoints.
There is still a known problem when using environment variables to configure endpoints for the HTTP protocol when transmitting to an un-encrypted endpoint (i.e., TLS not configured). This affects the following environment variables:
OTEL_EXPORTER_OTLP_ENDPOINT
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
When these environment variables are set to insecure hosts, messages will appear in the logs indicating an error, but the metrics and traces will still be sent correctly:
2025-06-06T15:12:47.992144Z ERROR OpenTelemetry metric error occurred: Metrics exporter otlp failed with the grpc server returns error (Unknown error): , detailed error message: h2 protocol error: http2 error tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", FRAME_SIZE_ERROR, Library) }))
2025-06-06T15:12:47.992763Z ERROR OpenTelemetry trace error occurred: Exporter otlp encountered the following error(s): the grpc server returns error (Unknown error): , detailed error message: h2 protocol error: http2 error tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", FRAME_SIZE_ERROR, Library) }))
This is tracked upstream at https://github.com/open-telemetry/opentelemetry-collector/issues/10952.
By @garypen in https://github.com/apollographql/router/pull/7595
graphql.operation.name attribute to apollo.router.opened.subscriptions counter (PR #7606)
The apollo.router.opened.subscriptions metric has a graphql.operation.name attribute applied to identify the named operation of open subscriptions.
By @bnjjj in https://github.com/apollographql/router/pull/7606
preview_extended_error_metrics in Apollo config telemetry (PR #7597)
By @timbotnik in https://github.com/apollographql/router/pull/7597
The Apollo Runtime Container is now included in our documentation for deployment options. It also includes instructions for running Apollo Router with the Apollo MCP Server.
By @jonathanrainer and @lambertjosh in https://github.com/apollographql/router/pull/7734 and https://github.com/apollographql/router/pull/7668
apollo.router.schema.load.duration (PR #7582)
The in-memory cache documentation was referencing an incorrect metric to track schema load times. Previously it was referred to as apollo.router.schema.loading.time, whereas the metric being emitted by the router since v2.0.0 is actually apollo.router.schema.load.duration. This is now fixed.
By @lrlna in https://github.com/apollographql/router/pull/7582
PR #7141 added more checks on GraphQL responses returned by coprocessors to ensure compliance with the GraphQL specification. Subscription responses over websockets returned no data during the handshake, so they were not valid GraphQL response payloads. This fix always returns a valid GraphQL response when doing the websocket handshake.
By @bnjjj in https://github.com/apollographql/router/pull/7680
http.route (PR #7405)
Per the OpenTelemetry spec, the http.route should only include "the matched route, that is, the path template used in the format used by the respective server framework."
The router currently sends the full URI in http.route, which can be high cardinality (e.g., /graphql?operation=one_of_many_values). After this change, the router will only include the path (/graphql).
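To make the difference concrete, here is a minimal Python sketch that reduces a request URI to its path using the standard urllib.parse module. The helper name http_route is hypothetical and this is not the router's code; it only illustrates why the path alone is low cardinality while the full URI is not.

```python
from urllib.parse import urlsplit


def http_route(uri: str) -> str:
    """Keep only the path of a URI, dropping the query string and fragment."""
    # Fall back to "/" when the URI has no path component at all.
    return urlsplit(uri).path or "/"


# Two requests that differ only in their query string share one route value.
first = http_route("/graphql?operation=one_of_many_values")
second = http_route("/graphql?operation=another_value")
```

Both calls yield the same attribute value, so a metrics backend sees a single series instead of one per distinct query string.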
By @carodewig in https://github.com/apollographql/router/pull/7405
graphql.operation.name attribute to apollo.router.opened.subscriptions counter (PR #7606)
The apollo.router.opened.subscriptions metric has a graphql.operation.name attribute applied to identify the named operation of subscriptions which are still open.
By @bnjjj in https://github.com/apollographql/router/pull/7606
Connectors improvements: Router 2.3.0 supports Connect spec v0.2, including batch requests, error customization, and direct access to HTTP headers. To use these features: upgrade your Router to 2.3, update your version of Federation to 2.11, and update the @link directives in your subgraphs to https://specs.apollo.dev/connect/v0.2.
See the Connectors changelog for more details.
When logging unknown operations encountered during safe-listing, include information about whether enforcement was skipped. This will help distinguish between truly problematic external operations (where enforcement_skipped is false) and internal operations that are intentionally allowed to bypass safelisting (where enforcement_skipped is true).
By @DaleSeo in https://github.com/apollographql/router/pull/7509
The Router now supports a response_body selector which provides access to the response body in telemetry configurations. This enables more detailed monitoring and logging of response data in the Router.
Example configuration:
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          "my_attribute":
            response_body: true
By @Velfi in https://github.com/apollographql/router/pull/7363
Connectors now inspect the content-type header of responses to determine how they should treat the response. This allows more flexibility as prior to this change, all responses were treated as JSON which would lead to errors on non-json responses.
The behavior is as follows:
content-type ends with /json (like application/json) OR +json (like application/vnd.foo+json): content is parsed as JSON.
content-type is text/plain: content will be treated as a UTF-8 string. Content can be accessed in selection mapping via $ variable.
content-type is any other value: content will be treated as a JSON null.
No content-type header is provided: content is assumed to be JSON and therefore parsed as JSON.
If deserialization fails, an error message of Response deserialization failed with an error code of CONNECTOR_DESERIALIZE will be returned:
"errors": [
  {
    "message": "Response deserialization failed",
    "extensions": {
      "code": "CONNECTOR_DESERIALIZE"
    }
  }
]
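The content-type rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the connectors implementation: the names parse_connector_response and ConnectorDeserializeError are hypothetical, a missing header is represented as None and treated as JSON, and parameters such as charset are simply ignored.

```python
import json


class ConnectorDeserializeError(Exception):
    """Hypothetical error type carrying the code from the entry above."""
    code = "CONNECTOR_DESERIALIZE"


def parse_connector_response(content_type, body: bytes):
    """Decode a response body according to its content-type header."""
    try:
        if content_type is None:
            # No header: the body is assumed to be JSON.
            return json.loads(body)
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type.endswith("/json") or media_type.endswith("+json"):
            return json.loads(body)
        if media_type == "text/plain":
            return body.decode("utf-8")
        # Any other media type is treated as a JSON null.
        return None
    except ValueError as err:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        raise ConnectorDeserializeError("Response deserialization failed") from err
```

Keeping the fallthrough as None means a non-JSON, non-text response never raises in this sketch; only a body that claims to be JSON or text but cannot be decoded produces the deserialization error.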
By @andrewmcgivery in https://github.com/apollographql/router/pull/7380
For errors pertaining to connectors and demand control features, Apollo telemetry will now include the original error message and path as part of the traces sent to GraphOS.
By @timbotnik in https://github.com/apollographql/router/pull/7378
The Router now supports ignoring specific headers when deduplicating requests to subgraphs which provide subscription events. Previously, any differing headers which didn't actually affect the subscription response (e.g., user-agent) would prevent or limit the potential of deduplication.
The introduction of the ignored_headers option allows you to specify headers to ignore during deduplication, enabling you to benefit from subscription deduplication even when requests include headers with unique or varying values that don't affect the subscription's event data.
Configuration example:
subscription:
  enabled: true
  deduplication:
    enabled: true # optional, default: true
    ignored_headers: # (optional) List of ignored headers when deduplicating subscriptions
      - x-transaction-id
      - custom-header-name
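One way to picture the effect of ignored_headers is as a change to the key used to decide whether two subscription requests are duplicates. The Python sketch below is a simplified model, not the router's deduplication code: the function dedup_key and the shape of its key (subgraph name, query text, remaining headers) are assumptions made for illustration.

```python
def dedup_key(subgraph, query, headers, ignored_headers=()):
    """Build a hashable key; requests with equal keys can share a subscription."""
    # Header names are case-insensitive, so compare them in lowercase.
    ignored = {name.lower() for name in ignored_headers}
    relevant = tuple(sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower() not in ignored
    ))
    return (subgraph, query, relevant)


# Two clients that differ only in a per-request transaction ID.
client_a = {"Authorization": "Bearer t", "X-Transaction-Id": "1"}
client_b = {"authorization": "Bearer t", "x-transaction-id": "2"}
```

With x-transaction-id ignored, both clients produce the same key and can share one subgraph subscription; without it, the differing header value keeps them apart.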
By @bnjjj in https://github.com/apollographql/router/pull/7070
During the development of Router 2.0, the health check endpoint support was converted to be a plugin. Unfortunately, the support for disabling the health check endpoint was lost during the conversion.
This is now fixed and a new unit test ensures that disabling the health check does not result in the creation of a health check endpoint.
By @garypen in https://github.com/apollographql/router/pull/7519
The Router accepts modifications to the client name and version (apollo::telemetry::client_name and apollo::telemetry::client_version), but those modifications were not propagated through the telemetry layers to update spans and traces.
After this change, the modifications from plugins on the router service are propagated through the telemetry layers.
By @carodewig in https://github.com/apollographql/router/pull/7369
The connectors plugin will no longer error when using a variable in a nested input argument. The following example would error prior to this change:
query Query($query: String) {
  complexInputType(filters: { inSpace: true, search: $query })
}
By @andrewmcgivery in https://github.com/apollographql/router/pull/7472
http.route (PR #7390)
Per the OpenTelemetry spec, the http.route should only include "the matched route, that is, the path template used in the format used by the respective server framework."
Prior to this change, the Router sent the full URI in http.route, which can be high cardinality (e.g., /graphql?operation=one_of_many_values). The Router will now only include the path (/graphql).
By @carodewig in https://github.com/apollographql/router/pull/7390
A recent change increased the log level of JWT authentication failures from info to error. This reverts that change.
By @carodewig in https://github.com/apollographql/router/pull/7396
When configuring the same header name in both @connect(http: { headers: }) (or @source(http: { headers: })) in SDL and propagate in Router YAML configuration, the request had both headers, even if the value is the same. After this change, Router YAML configuration always wins.
By @andrewmcgivery in https://github.com/apollographql/router/pull/7499
The legacy JavaScript query planner forbids any usage of unknown @link specs in supergraph schemas with either EXECUTION or SECURITY value set for the for argument (aka, the spec's "purpose"). This behavior had not been ported to the native query planner previously. This PR implements the expected behavior in the native query planner.
By @duckki in https://github.com/apollographql/router/pull/7587
@defer fetch
The query planner could add an inline spread conditioned on the Query type in deferred subgraph fetch queries. Such a query would be invalid in the subgraph when the subgraph schema renamed the root query type. This fix removes the root type condition from all subgraph queries, so that they stay valid even when root types are renamed.
By @duckki in https://github.com/apollographql/router/pull/7580
The Router's internal Redis configuration has been improved to increase client resiliency under various failure modes (TCP failures and timeouts, unresponsive sockets, Redis server failures, etc.). It also adds heartbeats (a PING every 10 seconds) to the Redis clients.
By @aembke, @carodewig in https://github.com/apollographql/router/pull/7526
The documentation for standard metric instruments for coprocessors has been updated:
apollo.router.operations.coprocessor.total to apollo.router.operations.coprocessor
coprocessor.succeeded attribute applies to apollo.router.operations.coprocessor only.
By @shorgi in https://github.com/apollographql/router/pull/7359
A new section has been added to the demand control documentation to demonstrate how to use Rhai scripts to expose cost estimation data in response headers. This allows clients to see the estimated cost, actual cost, and other demand control metrics directly in HTTP responses, which is useful for debugging and client-side optimization.
By @abernix in https://github.com/apollographql/router/pull/7564
When logging unknown operations encountered during safe-listing, include information about whether enforcement was skipped. This will help distinguish between truly problematic external operations (where enforcement_skipped is false) and internal operations that are intentionally allowed to bypass safelisting (where enforcement_skipped is true).
By @DaleSeo in https://github.com/apollographql/router/pull/7509
The router performs a 'hot reload' whenever it detects a schema update. During this reload, it effectively instantiates a new internal router, warms it up (optional), redirects all traffic to this new router, and drops the old internal router.
This change fixes a bug in that "drop" process where the Redis connections are never told to terminate, even though the Redis client pool is dropped. This leads to an ever-increasing number of inactive Redis connections as each new schema comes in and goes out of service, which eats up memory.
The solution adds a new up-down counter metric, apollo.router.cache.redis.connections, to track the number of open Redis connections. This metric includes a kind label to discriminate between different Redis connection pools, which mirrors the kind label on other cache metrics (e.g., apollo.router.cache.hit.time).
By @carodewig in https://github.com/apollographql/router/pull/7319
The router accepts modifications to the client name and version (apollo::telemetry::client_name and apollo::telemetry::client_version), but those modifications are not currently propagated through the telemetry layers to update spans and traces.
This PR moves where the client name and version are bound to the span, so that the modifications from plugins on the router service are propagated.
By @carodewig in https://github.com/apollographql/router/pull/7369
Prior to this fix, introducing a connector disabled the progressive override plugin.
By @lennyburdette in https://github.com/apollographql/router/pull/7351
The deduplication plugin always cloned responses, even if there were not multiple simultaneous requests that would benefit from the cloned response.
We now check to see if deduplication will provide a benefit before we clone the subgraph response.
There was also an undiagnosed race condition which meant that a notification could be missed. This would have resulted in additional work being performed as the missed notification would have led to another subgraph request.
By @garypen in https://github.com/apollographql/router/pull/7347
http.route (PR #7390)
Per the OpenTelemetry spec, the http.route should only include "the matched route, that is, the path template used in the format used by the respective server framework."
The router currently sends the full URI in http.route, which can be high cardinality (e.g., /graphql?operation=one_of_many_values). After this change, the router will only include the path (/graphql).
By @carodewig in https://github.com/apollographql/router/pull/7390
A recent change inadvertently increased the log level of JWT authentication failures from info to error. This reverts that change returning it to the previous behavior.
By @carodewig in https://github.com/apollographql/router/pull/7396
apollo.router.operations.batching.size metrics for GraphQL request batch sizes (PR #7306)
Corrects the calculation of the apollo.router.operations.batching.size metric to reflect accurate batch sizes rather than occasionally returning fractional numbers.
By @bnjjj in https://github.com/apollographql/router/pull/7306
context configuration usage (PR #7349)
context: true is an alias for context: deprecated but should not be used. The router now logs a runtime warning on startup if you do use it.
Instead of:
coprocessor:
  supergraph:
    request:
      context: true # ❌
Explicitly use deprecated or all:
coprocessor:
  supergraph:
    request:
      context: deprecated # ✅
See the 2.x upgrade guide for more detailed upgrade steps.
By @goto-bus-stop in https://github.com/apollographql/router/pull/7349
The default build images provided in our CI environment have a relatively modern version of glibc (2.35). This means that on some distributions, notably those based around RedHat, it wasn't possible to use our binaries since the version of glibc was older than 2.35.
We now maintain a build image which is based on a distribution with glibc 2.28. This is old enough that recent releases of either of the main Linux distribution families (Debian and RedHat) can make use of our binary releases.
By @garypen in https://github.com/apollographql/router/pull/7355
@skip/@include on subscription root fields in validation (PR #7338)
This implements a GraphQL spec RFC, rejecting subscriptions in validation that can be invalid during execution.
By @goto-bus-stop in https://github.com/apollographql/router/pull/7338
Added a new page under Routing docs about Query Planning Best Practices.
By @smyrick in https://github.com/apollographql/router/pull/7263
This fixes the apollo.router.operations.authentication.jwt counter metric to behave as documented: emitted for every request that uses JWT, with the authentication.jwt.failed attribute set to true or false for failed or successful authentication.
Previously, it was only used for failed authentication.
The attribute-less and accidentally-differently-named apollo.router.operations.jwt counter was and is only emitted for successful authentication, but is deprecated now.
By @SimonSapin in https://github.com/apollographql/router/pull/7258
The router performs a 'hot reload' whenever it detects a schema update. During this reload, it effectively instantiates a new internal router, warms it up (optional), redirects all traffic to this new router, and drops the old internal router.
This change fixes a bug in that drop process where the Redis connections are never told to terminate, even though the Redis client pool is dropped. This leads to an ever-increasing number of inactive Redis connections, which eats up memory.
It also adds a new up-down counter metric, apollo.router.cache.redis.connections, to track the number of open Redis connections. This metric includes a kind label to discriminate between different Redis connection pools, which mirrors the kind label on other cache metrics (e.g., apollo.router.cache.hit.time).
By @carodewig in https://github.com/apollographql/router/pull/7319
Previously, the router ignored a data: null property inside a GraphQL response returned by a coprocessor.
According to the GraphQL Specification:
If an error was raised during the execution that prevented a valid response, the "data" entry in the response should be null.
That means that if a coprocessor returned a valid execution error, for example:
{
  "data": null,
  "errors": [{ "message": "Some execution error" }]
}
then the router violated the above restriction from the GraphQL Specification by returning the following response to the client:
{
  "errors": [{ "message": "Some execution error" }]
}
This fix ensures full compliance with the GraphQL specification by preserving the complete structure of error responses from coprocessors.
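The distinction that matters here is between a data key that is absent and one that is present with a null value. The following Python sketch shows how a pass-through step can preserve that distinction; the function forward_coprocessor_response is hypothetical and only models the behavior described above, not the router's actual code path.

```python
import json


def forward_coprocessor_response(raw: str) -> str:
    """Re-serialize a coprocessor response while keeping an explicit data: null."""
    response = json.loads(raw)
    forwarded = {}
    # Test for key presence, not truthiness: None (JSON null) must be kept.
    if "data" in response:
        forwarded["data"] = response["data"]
    if "errors" in response:
        forwarded["errors"] = response["errors"]
    return json.dumps(forwarded)


execution_error = '{"data": null, "errors": [{"message": "Some execution error"}]}'
forwarded = json.loads(forward_coprocessor_response(execution_error))
```

A check such as `if response.get("data")` would silently drop the null entry, which is the kind of mistake this sketch is meant to highlight.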
Contributed by @IvanGoncharov in #7141
apollo.router.operations.batching.size metrics for GraphQL request batch sizes (PR #7306)
Correct the calculation of the apollo.router.operations.batching.size metric to reflect accurate batch sizes rather than occasionally returning fractional numbers.
By @bnjjj in https://github.com/apollographql/router/pull/7306
This change exposes the server's header read timeout as the server.http.header_read_timeout configuration option.
By default, server.http.header_read_timeout is set to the previously hard-coded 10 seconds. A longer timeout can be configured using the server.http.header_read_timeout option.
server:
  http:
    header_read_timeout: 30s
By @gwardwell in https://github.com/apollographql/router/pull/7262
@skip/@include on subscription root fields in validation (PR #7338)
This implements a GraphQL spec RFC, rejecting subscriptions in validation that can be invalid during execution.
By @goto-bus-stop in https://github.com/apollographql/router/pull/7338
Added support for connector header propagation via YAML config. All of the existing header propagation in the Router now works for connectors by using
headers.connector.all to apply rules to all connectors or headers.connector.sources.* to apply rules to specific sources.
Note that if one of these rules conflicts with a header set in your schema, either in @connect or @source, the value in your Router config will
take priority and be treated as an override.
headers:
  connector:
    all: # configuration for all connectors across all subgraphs
      request:
        - insert:
            name: "x-inserted-header"
            value: "hello world!"
        - propagate:
            named: "x-client-header"
    sources:
      connector-graph.random_person_api:
        request:
          - insert:
              name: "x-inserted-header"
              value: "hello world!"
          - propagate:
              named: "x-client-header"
By @andrewmcgivery in https://github.com/apollographql/router/pull/7152
To facilitate configuration evolution within major versions of the router's lifecycles (e.g., within 2.x.x versions), YAML configuration migrations are applied automatically. To avoid configuration drift and facilitate maintenance, when upgrading to a new major version the migrations from the previous major (e.g., 1.x.x) will not be applied automatically. These will need to be applied with router config upgrade prior to the upgrade. To facilitate major version upgrades, we recommend regularly applying the configuration changes using router config upgrade and committing those to your version control system.
By @bnjjj in https://github.com/apollographql/router/pull/7162
Previously, we only allowed expressions in very specific locations in Connectors URIs:
/users/{$args.id}
/users?id={$args.id}
Expressions can now be used anywhere in or after the path of the URI.
For example, you can do
@connect(http: {GET: "/users?{$args.filterName}={$args.filterValue}"}).
The result of any expression will always be percent encoded.
Note: Parts of this feature are only available when composing with Apollo Federation v2.11 or above (currently in preview).
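As a rough model of what percent encoding every expression result means, the Python sketch below expands {$args.name} placeholders in a URI template. It is an assumption-laden illustration: the expand helper and its regular expression are hypothetical, only simple $args references are handled, and the exact set of characters the router encodes may differ from urllib.parse.quote with safe="".

```python
import re
from urllib.parse import quote

# Matches simple placeholders such as {$args.filterName}.
EXPRESSION = re.compile(r"\{\$args\.(\w+)\}")


def expand(template: str, args: dict) -> str:
    """Substitute each placeholder with its percent-encoded argument value."""
    return EXPRESSION.sub(
        # safe="" encodes reserved characters such as "&", "=" and "/" too.
        lambda match: quote(str(args[match.group(1)]), safe=""),
        template,
    )


uri = expand(
    "/users?{$args.filterName}={$args.filterValue}",
    {"filterName": "first name", "filterValue": "a&b"},
)
```

Because the values are encoded after substitution, an argument containing & or = cannot introduce extra query parameters into the resulting URI.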
By @dylan-apollo in https://github.com/apollographql/router/pull/7220
This change allows the router to report usage metrics by persisted query ID to Apollo, so that we can show usage stats for PQs.
By @bonnici in https://github.com/apollographql/router/pull/7166
http_request span (Issue #6739)
Coprocessor requests will now emit an http_request span. This span can help to gain insight into latency that may be introduced over the network stack when communicating with coprocessor.
Coprocessor span attributes are:
otel.kind: CLIENT
http.request.method: POST
server.address: <target address>
server.port: <target port>
url.full: <url.full>
otel.name: <method> <url.full>
otel.original_name: http_request
By @theJC in https://github.com/apollographql/router/pull/6776
Apollo client libraries can send the library name and version information in the extensions key of an operation request. If those values are found in a request the router will include them in the telemetry operation report sent to Apollo.
By @calvincestari in https://github.com/apollographql/router/pull/7264
The compute job pool in the router is used to execute CPU intensive work outside of the main I/O worker threads, including GraphQL parsing, query planning, and introspection. This PR adds spans to jobs that are on this pool to allow users to see when latency is introduced due to resource contention within the compute job pool.
compute_job:
  job.type: (query_parsing|query_planning|introspection)
compute_job.execution
  job.age: P1-P8
  job.type: (query_parsing|query_planning|introspection)
Jobs are executed highest priority (P8) first. Jobs that are low priority (P1) age over time, eventually executing at highest priority. The age of a job can be used to diagnose whether a job was waiting in the queue because of other higher-priority jobs also in the queue.
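Priority aging of this kind can be sketched in a few lines of Python. The model below is illustrative only: the source does not describe how quickly jobs age, so the aging_step_ms parameter, the linear aging rule, and the job dictionaries are all assumptions rather than the router's scheduling policy.

```python
MAX_PRIORITY = 8  # P8, the highest priority named above


def effective_priority(base_priority: int, waited_ms: int, aging_step_ms: int = 100) -> int:
    """Raise a job's priority one level per aging step spent waiting, capped at P8."""
    return min(MAX_PRIORITY, base_priority + waited_ms // aging_step_ms)


def next_job(queue):
    """Pick the job with the highest effective priority; ties go to the longest wait."""
    return max(
        queue,
        key=lambda job: (
            effective_priority(job["priority"], job["waited_ms"]),
            job["waited_ms"],
        ),
    )


queue = [
    {"name": "warmup", "priority": 1, "waited_ms": 900},
    {"name": "plan", "priority": 8, "waited_ms": 0},
]
```

In this model the long-waiting P1 job has aged up to P8 and wins the tie on wait time, which is the starvation-avoidance effect that aging is meant to provide.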
By @bryncooke in https://github.com/apollographql/router/pull/7236
Allow JWT authorization options to support multiple issuers using the same JWKS.
Configuration change: any issuer defined on a currently existing authentication.router.jwt.jwks entry needs to be migrated to an entry in the issuers list. This migration will happen automatically until the next major version of the router. This change can be committed using ./router config upgrade prior to the next major release.
For example, the following configuration:
authentication:
  router:
    jwt:
      jwks:
        - url: https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json
          issuer: https://issuer.one
Will be changed to contain an array of issuers rather than a single issuer:
authentication:
  router:
    jwt:
      jwks:
        - url: https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json
          issuers:
            - https://issuer.one
            - https://issuer.two
By @theJC in https://github.com/apollographql/router/pull/7170
This fixes the apollo.router.operations.authentication.jwt counter metric to behave as documented: emitted for every request that uses JWT, with the authentication.jwt.failed attribute set to true or false for failed or successful authentication.
Previously, it was only used for failed authentication.
The attribute-less and accidentally-differently-named apollo.router.operations.jwt counter was and is only emitted for successful authentication, but is deprecated now.
By @SimonSapin in https://github.com/apollographql/router/pull/7258
The tracing_subscriber crate uses RwLocks to manage access to a Span's Extensions. Deadlocks are possible when
multiple threads access this lock, including with reentrant locks:
// Thread 1              |  // Thread 2
let _rg1 = lock.read();  |
                         |  // will block
                         |  let _wg = lock.write();
// may deadlock          |
let _rg2 = lock.read();  |
This fix removes an opportunity for reentrant locking while extracting a Datadog identifier.
There is also a potential for deadlocks when the root and active spans' Extensions are acquired at the same time, if
multiple threads are attempting to access those Extensions but in a different order. This fix removes a few cases
where multiple spans' Extensions are acquired at the same time.
By @carodewig in https://github.com/apollographql/router/pull/7142
In v2.1.0 we introduced logs for the jwt_expires_in function, which caused unexpectedly chatty logging when using subscriptions.
By @bnjjj in https://github.com/apollographql/router/pull/7069
Fixes a bug where enums that were arguments to nested queries were not being reported.
By @merylc in https://github.com/apollographql/router/pull/6900
The compute job pool is used within the router for compute intensive jobs that should not block the Tokio worker threads. When this pool becomes saturated it is difficult for users to see why so that they can take action. This change adds new metrics to help users understand how long jobs are waiting to be processed.
New metrics:
apollo.router.compute_jobs.queue_is_full - A counter of requests rejected because the queue was full.
apollo.router.compute_jobs.duration - A histogram of time spent in the compute pipeline by the job, including the queue and query planning.
  job.type: (query_planning, query_parsing, introspection)
  job.outcome: (executed_ok, executed_error, channel_error, rejected_queue_full, abandoned)
apollo.router.compute_jobs.queue.wait.duration - A histogram of time spent in the compute queue by the job.
  job.type: (query_planning, query_parsing, introspection)
apollo.router.compute_jobs.execution.duration - A histogram of time spent to execute job (excludes time spent in the queue).
  job.type: (query_planning, query_parsing, introspection)
apollo.router.compute_jobs.active_jobs - A gauge of the number of compute jobs being processed in parallel.
  job.type: (query_planning, query_parsing, introspection)
By @carodewig in https://github.com/apollographql/router/pull/7184
Previously, a URI like @connect(http: {GET: "/users/"}) could be normalized to @connect(http: {GET: "/users"}). This
change preserves the trailing slash, which is significant to some web servers.
By @dylan-apollo in https://github.com/apollographql/router/pull/7220
This fixes a bug that dropped the @context and @fromContext directives when introducing a connector.
By @lennyburdette in https://github.com/apollographql/router/pull/7132
Fixed an issue where conditional telemetry events weren't being properly evaluated.
This affected both standard events (response, error) and custom telemetry events.
For example, with a configuration like this:
telemetry:
  instrumentation:
    events:
      supergraph:
        request:
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - testing
        response:
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - testing
The Router would emit the request event when the header matched, but never emit the response event - even with the same matching header.
This fix ensures that all event conditions are properly evaluated, restoring expected telemetry behavior and making conditional logging work correctly throughout the entire request lifecycle.
By @IvanGoncharov in https://github.com/apollographql/router/pull/7325
When a connection is closed we call graceful_shutdown on hyper and then wait for the connection to close.
Hyper 0.x has various issues around shutdown that may result in the router waiting for extended periods for the connection to eventually be closed.
This PR introduces a configurable timeout from the termination signal to actual termination, defaulting to 60 seconds. The connection is forcibly terminated after the timeout is reached.
To configure the timeout, set the option in the router's YAML configuration. It accepts human-readable durations:
supergraph:
  connection_shutdown_timeout: 60s
Note that even after connections have been terminated, the router will still hold onto pipelines while it tries to complete in-flight requests, unless early_cancel is set to true.
To avoid this, users can set early_cancel to true:
supergraph:
  early_cancel: true
and/or use traffic shaping timeouts:
traffic_shaping:
  router:
    timeout: 60s
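The "graceful shutdown with a deadline" pattern behind connection_shutdown_timeout can be sketched generically. The following is an analogy in Python's asyncio, not the router's code: `close_connection` and its arguments are hypothetical names, and the awaitable stands in for a connection that is draining after a graceful shutdown request.

```python
import asyncio


async def close_connection(connection_drained, shutdown_timeout_s):
    """Wait for a draining connection, forcing termination after a deadline.

    `connection_drained` is an awaitable that completes once the connection
    has closed on its own (a stand-in for the graceful shutdown path).
    """
    try:
        await asyncio.wait_for(connection_drained, timeout=shutdown_timeout_s)
        return "closed gracefully"
    except asyncio.TimeoutError:
        # wait_for cancels the pending awaitable when the deadline passes,
        # which models forcibly terminating the connection.
        return "forcibly terminated"


# A connection that drains immediately closes gracefully; one that never
# drains within the deadline is terminated.
fast = asyncio.run(close_connection(asyncio.sleep(0), 0.5))
slow = asyncio.run(close_connection(asyncio.sleep(10), 0.01))
```

The design point is that the deadline bounds only how long the connection may linger; work still attached to the request is governed separately, which is why the entry pairs this option with early_cancel and traffic shaping timeouts.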
By @BrynCooke in https://github.com/apollographql/router/pull/7058
Trace messages in coprocessors previously used the external extensibility namespace. They now use coprocessor in the message instead, for clarity.
By @briannafugate408
When an invalid query plan is generated, the router could panic and crash. This could happen if there are gaps in the GraphQL validation implementation. Now, even if there are unresolved gaps, the router will handle it gracefully and reject the request.
By @goto-bus-stop in https://github.com/apollographql/router/pull/7214
By @bonnici in https://github.com/apollographql/router/pull/7021
Fixes an issue where numeric error codes (e.g. 400, 500) were not properly parsed into a string and thus were not reported to Apollo error telemetry.
By @rregitsky in https://github.com/apollographql/router/pull/7226
The compute job pool in the router is used to execute CPU intensive work outside of the main I/O worker threads, including GraphQL parsing, query planning, and introspection. When the pool is busy, jobs enter a queue.
We previously set this queue size to 20 (per thread). However, this may be too small in resource-constrained environments.
This patch increases the queue size to 1,000 jobs per thread. For reference, in older router versions before the introduction of the compute job worker pool, the equivalent queue size was 1,000.
By @goto-bus-stop in https://github.com/apollographql/router/pull/7205
Characters outside of { } expressions will no longer be percent encoded unless they are completely invalid for a
URI. For example, in an expression like @connect(http: {GET: "/products?filters[category]={$args.category}"}) the
square brackets [ ] will no longer be percent encoded. Any string produced by a dynamic { } expression will still be percent encoded.
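The distinction between literal template text and dynamic values can be sketched in a few lines. This is a simplified illustration, not the connectors implementation: `render_uri` is a hypothetical helper, values are supplied as a flat dictionary keyed by the expression text, and the "completely invalid for a URI" case for literal characters is not handled.

```python
import re
from urllib.parse import quote


def render_uri(template, values):
    """Fill `{...}` expressions with percent-encoded values; keep literal text as-is."""

    def encode(match):
        # Only the dynamic value is encoded. safe="" means reserved characters
        # such as "&", "/" and "[" inside the value are escaped too.
        return quote(str(values[match.group(1)]), safe="")

    return re.sub(r"\{([^{}]+)\}", encode, template)


uri = render_uri(
    "/products?filters[category]={$args.category}",
    {"$args.category": "home & garden"},
)
# uri == "/products?filters[category]=home%20%26%20garden"
```

The literal `[category]` survives untouched while the space and ampersand in the argument value are escaped, so a value cannot inject extra query parameters.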
By @dylan-apollo in https://github.com/apollographql/router/pull/7220
data: null when handling coprocessor GraphQL responses which included errors (PR #7141)
Previously, the router incorrectly swallowed data: null conditions on GraphQL responses returned from a coprocessor.
According to the GraphQL specification:
If an error was raised during the execution that prevented a valid response, the "data" entry in the response should be null.
That means that if a coprocessor returned a valid execution error, for example:
{
  "data": null,
  "errors": [{ "message": "Some execution error" }]
}
It was incorrect (and inadvertent) to return the following response to the client:
{
  "errors": [{ "message": "Some execution error" }]
}
This fix ensures compliance with the GraphQL specification in this regard by preserving the complete structure of the response returned from coprocessors.
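The underlying pitfall is treating "the data key is present and null" the same as "the data key is absent". A minimal sketch of keeping the two apart is shown below; `forward_coprocessor_response` is a hypothetical function used for illustration and does not reflect the router's internal types.

```python
import json

_MISSING = object()  # sentinel that distinguishes "no data key" from "data: null"


def forward_coprocessor_response(body):
    """Rebuild a GraphQL response while preserving an explicit `data: null`."""
    response = json.loads(body)
    forwarded = {}

    data = response.get("data", _MISSING)
    if data is not _MISSING:
        # Keep the key even when its value is None (JSON null).
        forwarded["data"] = data

    if response.get("errors"):
        forwarded["errors"] = response["errors"]

    return json.dumps(forwarded)


with_null = forward_coprocessor_response(
    '{"data": null, "errors": [{"message": "Some execution error"}]}'
)
without_data = forward_coprocessor_response(
    '{"errors": [{"message": "Some execution error"}]}'
)
```

A check such as `if response.get("data"):` would drop the key in the first case, which is the kind of shortcut that produces the non-compliant response shown above.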
Contributed by @IvanGoncharov in #7141
resource property in ConfigMap (Issue #6104)
The Helm chart was using an outdated value when emitting the telemetry.exporters.metrics.common.resource.service.name values. This has been updated to use the correct (singular) version of resource (rather than the incorrect resources which was used earlier in 1.x's life-cycle).
By @vatsalpatel in https://github.com/apollographql/router/pull/6105
#!/bin/bash instead of #!/usr/bin/env bash (Issue #3517)
For users of the Google Cloud Platform (GCP) Cloud Run platform, using the router's default Docker image was not possible due to an error that would occur during startup:
"/usr/bin/env: 'bash ': No such file or directory"
To avoid this issue, we've changed the script to use #!/bin/bash instead of #!/usr/bin/env bash, as we use a fixed Linux distribution in Docker which has the Bash binary located in a fixed location.
By @lleadbet in https://github.com/apollographql/router/pull/7198
If Uplink was enabled, Router 2.1.x emitted this warning at startup even when there was no user configuration responsible for the condition:
WARN setting resource attributes is not allowed for Apollo telemetry
The warning is removed entirely.
By @SimonSapin in https://github.com/apollographql/router/pull/7272
This change exposes the server's header read timeout as the server.http.header_read_timeout configuration option.
By default, server.http.header_read_timeout is set to the previously hard-coded 10 seconds. A longer timeout can be configured using the same option:
server:
  http:
    header_read_timeout: 30s
By @gwardwell in https://github.com/apollographql/router/pull/7262
include_subgraph_errors (Issue #6402)
Update include_subgraph_errors with additional configuration options for both global and subgraph levels. This update provides finer control over error messages and extension keys for each subgraph.
For more details, please read subgraph error inclusion.
include_subgraph_errors:
  all:
    redact_message: true
    allow_extensions_keys:
      - code
  subgraphs:
    product:
      redact_message: false # Propagate original error messages
      allow_extensions_keys: # Extend global allow list - `code` and `reason` will be propagated
        - reason
      exclude_global_keys: # Exclude `code` from global allow list - only `reason` will be propagated.
        - code
    account:
      deny_extensions_keys: # Overrides global allow list
        - classification
    review: false # Redact everything.
    # Undefined subgraphs inherit default global settings from `all`
Note: Using a deny_extensions_keys approach carries security risks because any sensitive information not explicitly included in the deny list will be exposed to clients. For better security, prefer redacting everything or using allow_extensions_keys when possible.
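How the global and per-subgraph settings combine can be sketched as a small function that follows the comments in the sample configuration. This is an interpretation for illustration only: `filter_subgraph_error`, the `REDACTED` placeholder text, and the assumption that the global section uses an allow list are all hypothetical, and the router's actual precedence rules may differ in cases the sample does not cover.

```python
REDACTED = "redacted"  # placeholder text; the router's real message is not assumed here


def filter_subgraph_error(error, global_cfg, subgraph_cfg=None):
    """Apply global and per-subgraph settings to one error (simplified model).

    subgraph_cfg is None (inherit `all`), False (redact everything),
    or a dict of per-subgraph overrides.
    """
    if subgraph_cfg is False:
        return {"message": REDACTED}

    cfg = subgraph_cfg or {}
    redact = cfg.get("redact_message", global_cfg.get("redact_message", True))
    extensions = error.get("extensions", {})

    if "deny_extensions_keys" in cfg:
        # A deny list overrides the global allow list: everything else passes.
        denied = set(cfg["deny_extensions_keys"])
        kept = {k: v for k, v in extensions.items() if k not in denied}
    else:
        # Start from the global allow list, drop excluded keys, add local ones.
        allowed = set(global_cfg.get("allow_extensions_keys", []))
        allowed -= set(cfg.get("exclude_global_keys", []))
        allowed |= set(cfg.get("allow_extensions_keys", []))
        kept = {k: v for k, v in extensions.items() if k in allowed}

    result = {"message": REDACTED if redact else error["message"]}
    if kept:
        result["extensions"] = kept
    return result


GLOBAL = {"redact_message": True, "allow_extensions_keys": ["code"]}
ERROR = {
    "message": "boom",
    "extensions": {"code": "X", "reason": "r", "classification": "c"},
}
product = filter_subgraph_error(
    ERROR,
    GLOBAL,
    {
        "redact_message": False,
        "allow_extensions_keys": ["reason"],
        "exclude_global_keys": ["code"],
    },
)
account = filter_subgraph_error(ERROR, GLOBAL, {"deny_extensions_keys": ["classification"]})
```

In this model the `account` result still exposes both `code` and `reason`, because everything not on the deny list passes through, which is the risk the note above warns about.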
By @Samjin and @bryncooke in https://github.com/apollographql/router/pull/7164
This change provides a secondary pathway for new "realtime" GraphOS Studio metrics whose delivery interval is configurable due to their higher cardinality. These metrics will respect telemetry.apollo.batch_processor.scheduled_delay as configured on the realtime path. All other Apollo metrics will maintain the previous hardcoded 60s send interval.
By @rregitsky and @timbotnik in https://github.com/apollographql/router/pull/7138
Added documentation for more GraphQL error codes that can occur during router execution, including better differentiation between HTTP status codes and GraphQL error extensions codes.
By @timbotnik in https://github.com/apollographql/router/pull/7160
Updated the Router vs Gateway Tech Note with more details now that we have connectors.
By @smyrick in https://github.com/apollographql/router/pull/7261
We've introduced documentation for GraphOS extended error reporting.
By @timbotnik in https://github.com/apollographql/router/pull/7038
Apollo-Expose-Query-Plan: dry-run to Cache warm-up (PR #6973)
The Cache warm-up documentation now flags the availability of the Apollo-Expose-Query-Plan: dry-run header.
By @smyrick in https://github.com/apollographql/router/pull/6973