Skip to content

Conversation

@ardatan
Copy link
Member

@ardatan ardatan commented Oct 31, 2025

Implementation of Timeout in #317
Ref ROUTER-110
Ref ROUTER-151
This also adds subgraphs and all options to traffic_shaping as in Apollo Router. So subgraph specific configuration can be done with subgraphs;

Apollo Router -> https://www.apollographql.com/docs/graphos/routing/performance/traffic-shaping#configuration

traffic_shaping:
   all:
      request_timeout: 5s
   subgraphs:
       products:
           request_timeout:
                expression: |
                   if (.request.operation.kind == "mutation") {
                        "15s"
                    } else {
                        "5s"
                    }

Documentation -> graphql-hive/console#7214

@ardatan ardatan changed the base branch from main to primitive-expression October 31, 2025 10:55
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ardatan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the router's resilience and configurability by adding a robust timeout feature for subgraph requests. It allows operators to define timeouts either as fixed durations or dynamically via VRL expressions, applicable globally or on a per-subgraph basis. This enhancement is integrated into a refactored traffic shaping configuration, providing greater control over request execution behavior.

Highlights

  • Subgraph Timeout Implementation: Introduced a timeout mechanism for subgraph requests, allowing for both fixed duration and dynamic VRL expression-based timeouts.
  • Flexible Traffic Shaping Configuration: Enhanced the traffic shaping configuration to support global settings (all) and specific overrides for individual subgraphs (subgraphs), mirroring Apollo Router's approach.
  • VRL Expression Integration: Centralized VRL expression handling into a new Expression primitive, simplifying its use across various configurations like subgraph URLs, headers, and now, timeouts.
  • Error Handling: Added specific error types for timeout failures and issues with VRL expression resolution for timeouts.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable subgraph timeout feature, enhancing the router's resilience. The implementation is well-structured, using tokio::timeout and providing flexibility with dynamic timeouts through VRL expressions. The configuration refactoring to support per-subgraph settings is also a welcome improvement.

My review focuses on several key areas to further refine the implementation:

  • Performance: I've identified a high-severity issue where new HTTP clients and connection pools are created unnecessarily. I've also pointed out a couple of medium-severity opportunities to reduce allocations on hot paths, aligning with the repository's performance-first ethos.
  • Correctness & Readability: I've suggested improvements to the timeout expression evaluation to provide clearer, more accurate error messages and handle edge cases more robustly. Additionally, I've noted some documentation comments that could be clarified for better user understanding.

The proposed changes aim to enhance performance, improve error handling, and increase the overall clarity and maintainability of the new feature.

@github-actions
Copy link

github-actions bot commented Oct 31, 2025

k6-benchmark results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 209781      ✗ 0    
     data_received..................: 6.1 GB  204 MB/s
     data_sent......................: 82 MB   2.7 MB/s
     http_req_blocked...............: avg=4.34µs   min=711ns   med=1.8µs   max=8.02ms   p(90)=2.5µs   p(95)=2.81µs  
     http_req_connecting............: avg=1.47µs   min=0s      med=0s      max=3.95ms   p(90)=0s      p(95)=0s      
     http_req_duration..............: avg=20.97ms  min=2.16ms  med=20ms    max=128.86ms p(90)=28.69ms p(95)=31.9ms  
       { expected_response:true }...: avg=20.97ms  min=2.16ms  med=20ms    max=128.86ms p(90)=28.69ms p(95)=31.9ms  
     http_req_failed................: 0.00%   ✓ 0           ✗ 69947
     http_req_receiving.............: avg=159.52µs min=24.77µs med=39.79µs max=99.1ms   p(90)=89.5µs  p(95)=400.27µs
     http_req_sending...............: avg=25.24µs  min=5.49µs  med=10.72µs max=22.62ms  p(90)=15.87µs p(95)=29.13µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=20.79ms  min=2.09ms  med=19.87ms max=88.73ms  p(90)=28.42ms p(95)=31.5ms  
     http_reqs......................: 69947   2326.26073/s
     iteration_duration.............: avg=21.44ms  min=4.83ms  med=20.34ms max=248.63ms p(90)=29.16ms p(95)=32.39ms 
     iterations.....................: 69927   2325.595581/s
     vus............................: 50      min=50        max=50 
     vus_max........................: 50      min=50        max=50 

@github-actions
Copy link

github-actions bot commented Oct 31, 2025

🐋 This PR was built and pushed to the following Docker images:

Image Names: ghcr.io/graphql-hive/router

Platforms: linux/amd64,linux/arm64

Image Tags: ghcr.io/graphql-hive/router:pr-541 ghcr.io/graphql-hive/router:sha-cc9fb63

Docker metadata
{
"buildx.build.ref": "builder-129e9f65-ee08-4627-875e-3da8e8b8be74/builder-129e9f65-ee08-4627-875e-3da8e8b8be740/r7r547s2pph1jq3p3xzs6727d",
"containerimage.descriptor": {
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "digest": "sha256:023f5b5f8266ea36afabf41292c8d3d982fd4f48b5e3bee2d8a209242929f49c",
  "size": 1609
},
"containerimage.digest": "sha256:023f5b5f8266ea36afabf41292c8d3d982fd4f48b5e3bee2d8a209242929f49c",
"image.name": "ghcr.io/graphql-hive/router:pr-541,ghcr.io/graphql-hive/router:sha-cc9fb63"
}

@ardatan ardatan changed the title Subgraph timeout feat(router): Subgraph Timeout Configuration Oct 31, 2025
@ardatan ardatan force-pushed the subgraph_timeout branch 3 times, most recently from 0a61e05 to e319e11 Compare October 31, 2025 15:10
@ardatan ardatan force-pushed the primitive-expression branch 2 times, most recently from 128b10a to 766bd0e Compare November 3, 2025 11:30
@ardatan ardatan force-pushed the subgraph_timeout branch 4 times, most recently from 98ceed4 to 7cc1030 Compare November 3, 2025 13:31
@ardatan ardatan force-pushed the primitive-expression branch from 046cc07 to 8650809 Compare November 5, 2025 23:25
Base automatically changed from primitive-expression to main November 21, 2025 15:38
@kamilkisiela kamilkisiela force-pushed the subgraph_timeout branch 2 times, most recently from 48c0b6b to 7d062f6 Compare November 26, 2025 12:38
@github-actions
Copy link

github-actions bot commented Nov 26, 2025

🐋 This PR was built and pushed to the following Docker images:

Image Names: ghcr.io/graphql-hive/router

Platforms: linux/amd64,linux/arm64

Image Tags: ghcr.io/graphql-hive/router:pr-541 ghcr.io/graphql-hive/router:sha-8a10098

Docker metadata
{
"buildx.build.ref": "builder-5f3e3909-9fd3-42c5-83ec-721241aa21fc/builder-5f3e3909-9fd3-42c5-83ec-721241aa21fc0/j99ap2qjevelxrijlqj22ipsk",
"containerimage.descriptor": {
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "digest": "sha256:4c9a9c2a0ad3dc4922d804108fc8c9e341723785a9e16d151bd56a25c518a435",
  "size": 1609
},
"containerimage.digest": "sha256:4c9a9c2a0ad3dc4922d804108fc8c9e341723785a9e16d151bd56a25c518a435",
"image.name": "ghcr.io/graphql-hive/router:pr-541,ghcr.io/graphql-hive/router:sha-8a10098"
}

@kamilkisiela
Copy link
Contributor

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a powerful and flexible timeout configuration system for subgraphs, which is a great enhancement for router resilience. The implementation, including the refactoring of VRL expression handling into a dedicated expressions module, is well-structured and improves code clarity and reusability. I've identified a few issues: an incorrect unit in a timeout error message, a potential memory leak due to unbounded caching of dynamic subgraph executors, a missing case for float values when parsing durations, and a changeset file that doesn't fully align with the repository's contribution guidelines. Addressing these points will make this feature even more robust.

The previous implementation had expression handling scattered across
multiple locations: - `duration_or_prog.rs` handled duration expressions
- `utils/expression.rs` had generic compilation logic - Type-specific
conversions were ad-hoc

This refactor centralizes all expression handling into a module with:

1. Generic traits for extensibility - `FromVrlValue`: Provides a way to
convert VRL values to specific types - `CompileExpression`: Compiles VRL
expression strings into executable programs - `ExecutableProgram`:
Executes compiled VRL programs with proper context setup

2. Type-safe ValueOrProgram enum - A generic enum that can represent
either a static value or a computed expression - Eliminates duplication
(previously had DurationOrProgram only) - Enables easy extension for new
types (StringOrProgram, HeaderValueOrProgram)

3. Proper error handling - Dedicated error types for compilation and
execution failures - Type-specific conversion errors
(DurationConversionError, StringConversionError, etc.)
Rename timeout label from 'global' to 'all' in SubgraphExecutorMap.
Use concrete error types for parsing (FromUtf8Error and
humantime::DurationError) and wrap them via DurationParseErrorSource to
simplify conversion errors and From conversions. Re-export the moved
ProgramResolutionError and streamline expression resolution to use ? for
propagation instead of manual wrapping.
@kamilkisiela kamilkisiela merged commit cd1d576 into main Nov 27, 2025
20 checks passed
@kamilkisiela kamilkisiela deleted the subgraph_timeout branch November 27, 2025 09:48
kamilkisiela pushed a commit that referenced this pull request Nov 28, 2025
> [!IMPORTANT]
> Merging this pull request will create these releases

# router 0.0.21 (2025-11-28)
## Features

- Subgraph Timeout Configuration (#541)

### Subgraph Request Timeout Feature

Adds support for configurable subgraph request timeouts via the
`traffic_shaping` configuration. The `request_timeout` option allows you
to specify the maximum time the router will wait for a response from a
subgraph before timing out the request. You can set a static timeout
(e.g., `30s`) globally or per-subgraph, or use dynamic timeouts with VRL
expressions to vary timeout values based on request characteristics.
This helps protect your router from hanging requests and enables
fine-grained control over how long requests to different subgraphs
should be allowed to run.

### Rename `original_url` variable to `default` in subgraph URL override
expressions.

This change aligns the variable naming with other configuration
expressions, such as timeout configuration.

When using expressions to override subgraph URLs, use `.default` to
refer to the original URL defined in the subgraph definition.

Example:

```yaml
url:
  expression: |
    if .request.headers."x-region" == "us-east" {
      "https://products-us-east.example.com/graphql"
    } else {
      .default
    }
```

## Fixes

- support `@include` and `@skip` in initial fetch node (#591)
- Fixed an issue where `@skip` and `@include` directives were
incorrectly removed from the initial Fetch of the Query Plan.
# node-addon 0.0.5 (2025-11-28)
## Fixes

- support `@include` and `@skip` in initial fetch node (#591)
- Fixed an issue where `@skip` and `@include` directives were
incorrectly removed from the initial Fetch of the Query Plan.
# query-planner 2.1.1 (2025-11-28)
## Fixes

- support `@include` and `@skip` in initial fetch node (#591)
- Fixed an issue where `@skip` and `@include` directives were
incorrectly removed from the initial Fetch of the Query Plan.
# config 0.0.13 (2025-11-28)
## Features

### Subgraph Request Timeout Feature

Adds support for configurable subgraph request timeouts via the
`traffic_shaping` configuration. The `request_timeout` option allows you
to specify the maximum time the router will wait for a response from a
subgraph before timing out the request. You can set a static timeout
(e.g., `30s`) globally or per-subgraph, or use dynamic timeouts with VRL
expressions to vary timeout values based on request characteristics.
This helps protect your router from hanging requests and enables
fine-grained control over how long requests to different subgraphs
should be allowed to run.

### Rename `original_url` variable to `default` in subgraph URL override
expressions.

This change aligns the variable naming with other configuration
expressions, such as timeout configuration.

When using expressions to override subgraph URLs, use `.default` to
refer to the original URL defined in the subgraph definition.

Example:

```yaml
url:
  expression: |
    if .request.headers."x-region" == "us-east" {
      "https://products-us-east.example.com/graphql"
    } else {
      .default
    }
```

## Fixes

- support `@include` and `@skip` in initial fetch node (#591)
# executor 6.2.0 (2025-11-28)
## Features

### Subgraph Request Timeout Feature

Adds support for configurable subgraph request timeouts via the
`traffic_shaping` configuration. The `request_timeout` option allows you
to specify the maximum time the router will wait for a response from a
subgraph before timing out the request. You can set a static timeout
(e.g., `30s`) globally or per-subgraph, or use dynamic timeouts with VRL
expressions to vary timeout values based on request characteristics.
This helps protect your router from hanging requests and enables
fine-grained control over how long requests to different subgraphs
should be allowed to run.

### Rename `original_url` variable to `default` in subgraph URL override
expressions.

This change aligns the variable naming with other configuration
expressions, such as timeout configuration.

When using expressions to override subgraph URLs, use `.default` to
refer to the original URL defined in the subgraph definition.

Example:

```yaml
url:
  expression: |
    if .request.headers."x-region" == "us-east" {
      "https://products-us-east.example.com/graphql"
    } else {
      .default
    }
```

## Fixes

- support `@include` and `@skip` in initial fetch node (#591)

Co-authored-by: knope-bot[bot] <152252888+knope-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants