How It Works
Circuit Breaker Grouping
Circuit breakers are created and managed based on unique URLs. Each unique full URL, including the complete path, gets its own dedicated circuit breaker. This means that multiple subgraphs sharing the same URL will also share the same circuit breaker instance. However, there’s an important exception to this rule: if a subgraph has its own specific circuit breaker configuration defined, it will get a dedicated circuit breaker even when sharing a URL with other subgraphs.
Time-Based Sliding Window
The circuit breaker uses a time-based sliding window with buckets to track request statistics over time. When you configure the circuit breaker with num_buckets set to 5 and rolling_duration set to 60 seconds, the router creates 5 buckets of 12 seconds each (60 divided by 5). This bucketing allows granular tracking of request patterns and outcomes.
When you make a request, the router records both the request itself and its outcome—whether it succeeded or failed—in the current time bucket. The circuit breaker then continuously evaluates error rates and request counts across all active buckets within the specified rolling duration.
After 60 seconds have elapsed, the circuit breaker has collected a full window of data across all 5 buckets, as illustrated in the diagram below:


Circuit Breaker States
The circuit breaker operates in three distinct states, each serving a specific purpose in the failure detection and recovery process:
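Closed State: This is the normal operating state. Requests flow through to the subgraph, and the circuit breaker records each request and its outcome in the sliding window.
Open State: Once the failure conditions are met, the circuit opens and the router stops sending requests to the subgraph, failing them immediately instead. The circuit stays open for the duration specified by the sleep_window configuration.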
Half-Open State (Testing Recovery): After the sleep window expires, the circuit breaker enters a cautious testing phase called the half-open state. During this phase, it allows a limited number of test requests (defined by half_open_attempts) to pass through to the subgraph to probe whether the subgraph has recovered and can handle traffic again. Based on the results of these test requests, the circuit breaker either closes (if enough requests succeed, as defined by required_successful) or returns to the open state if the requests continue to fail.
State Transition Logic
The circuit breaker’s state transitions are designed to balance protection with availability.
Transition from Closed to Open: The circuit breaker transitions from closed to open only when two conditions are met at the same time. First, at least the number of requests specified by request_threshold must have been received within the rolling window. Second, the error rate must exceed the percentage defined by error_threshold_percentage. This dual condition prevents the circuit from opening because of a few isolated failures when there isn’t enough data to make a reliable decision. For example, even with a 100% error rate, the circuit won’t open until the request threshold is met, which prevents premature opening during low-traffic periods.
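The two conditions can be expressed as a single predicate. This Go sketch is illustrative only: shouldOpen is a hypothetical name, the counts would come from the rolling window, and reading “exceed” as a strict greater-than comparison is an assumption about the router’s exact boundary behavior.

```go
package main

import "fmt"

// shouldOpen is a hypothetical sketch of the closed-to-open decision.
// Assumption: "exceed" means strictly greater than error_threshold_percentage.
func shouldOpen(requests, failures, requestThreshold int, errorThresholdPercentage float64) bool {
	if requests == 0 || requests < requestThreshold {
		return false // not enough data yet, whatever the error rate
	}
	errorRate := float64(failures) * 100 / float64(requests)
	return errorRate > errorThresholdPercentage
}

func main() {
	fmt.Println(shouldOpen(5, 5, 20, 50))   // false: 100% errors, but only 5 of the 20 required requests
	fmt.Println(shouldOpen(20, 11, 20, 50)) // true: 55% exceeds 50% and the threshold is met
	fmt.Println(shouldOpen(20, 10, 20, 50)) // false: 50% does not exceed 50%
}
```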
Transition from Open to Half-Open: This transition happens automatically after the sleep_window duration expires. No external trigger is required; the circuit breaker simply moves to the half-open state to begin testing whether the downstream service has recovered.
Transition from Half-Open to Closed or Open: From the half-open state, the circuit can transition in two directions. If the number of successful requests defined by required_successful is reached during the testing phase, the circuit transitions back to closed and normal traffic resumes. If any of the test requests fail, the circuit immediately returns to the open state and waits for another sleep window before testing recovery again.
Identifying Failures
The circuit breaker determines what constitutes a failure based on Go’s HTTP RoundTripper behavior and specific timeout conditions.
What Counts as a Failure
The circuit breaker considers the following scenarios as failures.
Network-Level Failures
When Go’s RoundTripper returns an error, the circuit breaker treats this as a failure. According to the Go source:
“RoundTrip should not attempt to interpret the response. In particular, RoundTrip must return err == nil if it obtained a response, regardless of the response’s HTTP status code. A non-nil err should be reserved for failure to obtain a response.”
This means that any situation where a response is received from the subgraph, regardless of HTTP status code, is not considered a failure. However, network-level issues that prevent obtaining a response are counted as failures, including but not limited to:
- Connection failures: DNS resolution errors, network unreachable, connection refused, connection timeout
- TLS/SSL errors: certificate verification failures, handshake timeouts, protocol negotiation issues
- Transport errors: broken connections, premature connection closure, read/write timeouts during data transfer
Execution Timeouts
When a request exceeds the configured execution_timeout, it gets marked as an error for circuit breaker statistics. This timeout is independent of request cancellations and client-side timeouts.
What Does NOT Count as a Failure
HTTP Error Status Codes
Since the circuit breaker relies on Go’s RoundTripper behavior, HTTP error responses (4xx and 5xx status codes) are not considered failures as long as a response is received. The circuit breaker focuses on connectivity and availability rather than application-level errors.
Request Cancellations and Timeouts
When a request is cancelled or times out because of client-side constraints, it is not recorded as a failure for the circuit breaker. These scenarios are treated differently from execution timeouts, which are specific to the circuit breaker.
Example YAML Configuration
You can find information on each individual configuration option here.
Configuration Scopes
Global Configuration
You can apply circuit breaker settings to all subgraphs by default using the all scope in your configuration. This approach provides a consistent baseline level of protection across your entire graph.
Subgraph-Specific Configuration
Individual subgraphs can have their circuit breaker behavior customized or completely disabled by adding specific configuration blocks. This granular control lets you tailor protection levels to the reliability characteristics and criticality of different services. Note that configuring a circuit breaker at the subgraph level also creates a distinct transport for that subgraph, which uses the default transport values unless you specify otherwise.
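Subgraph-specific configuration also changes which breaker a subgraph uses, per the rule in Circuit Breaker Grouping. This Go sketch shows that selection as a map key; the subgraph type, the hasOwnBreaker flag, the key prefixes, and the example names and URLs are all hypothetical.

```go
package main

import "fmt"

// subgraph is a hypothetical description of a configured subgraph.
type subgraph struct {
	name          string
	url           string // full URL, including the path
	hasOwnBreaker bool   // true if a subgraph-specific circuit breaker config exists
}

// breakerKey decides which circuit breaker instance a subgraph uses.
func breakerKey(s subgraph) string {
	if s.hasOwnBreaker {
		return "subgraph:" + s.name // dedicated breaker, even if the URL is shared
	}
	return "url:" + s.url // one shared breaker per unique full URL
}

func main() {
	subgraphs := []subgraph{
		{name: "products", url: "http://svc:4001/graphql"},
		{name: "reviews", url: "http://svc:4001/graphql"},
		{name: "inventory", url: "http://svc:4001/graphql", hasOwnBreaker: true},
		{name: "accounts", url: "http://svc:4002/graphql"},
	}
	groups := map[string][]string{}
	for _, s := range subgraphs {
		k := breakerKey(s)
		groups[k] = append(groups[k], s.name)
	}
	fmt.Println(len(groups))                           // 3
	fmt.Println(groups["url:http://svc:4001/graphql"]) // [products reviews]
}
```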
Important Considerations
Retry Interaction
Circuit breakers work in conjunction with the router’s retry mechanism, and their interaction is important to understand. When retries are configured and the circuit opens during the retry attempts for a request, no further retries are attempted for that request.
Multi-Window Recovery Scenarios
When you configure half_open_attempts to be less than required_successful, the recovery process spans multiple sleep windows. Consider an example where half_open_attempts is set to 3, required_successful to 5, and sleep_window to 300 milliseconds. When the circuit enters the half-open state, it allows 3 test requests to pass through. Even if all 3 requests succeed, the circuit still needs 2 more successful requests to meet the required_successful threshold. Because the half-open attempts are exhausted, the circuit remains half-open and waits for another sleep window to expire before allowing the next batch of test requests.
Timeout Behavior
The execution_timeout serves as an internal timer specifically for circuit breaker error tracking. When a request exceeds this timeout, it gets marked as an error for circuit breaker statistics. However, the actual request might still succeed and return a response to the client before the circuit breaker trips. This separation allows the circuit breaker to track slow requests as potential indicators of service degradation.
Num Buckets and Rolling Duration
The rolling duration must be evenly divisible by the number of buckets. If rolling_duration % num_buckets is not zero, the router returns a configuration error.