HTTP Retry with Exponential Backoff Specification¶
- Authors
- Matt Cockayne, Claude (claude-opus-4-6) (AI drafting assistant)
- Date
- 26 March 2026
- Status
- IMPLEMENTED
Overview¶
The pkg/http package provides a secure HTTP client with sensible defaults (TLS 1.2+, redirect policy, timeouts) but has no retry mechanism. When the update command runs behind corporate proxies or over unreliable networks, transient failures such as context deadline exceeded, HTTP 429, 502, 503, and 504 responses cause immediate failure with no recovery attempt.
This specification adds an opt-in retry mechanism with exponential backoff, jitter, configurable retry limits, and context-aware cancellation. The mechanism is implemented as a http.RoundTripper decorator so it composes cleanly with the existing ClientOption pattern and works transparently for all HTTP requests made through the client.
Design Decisions¶
RoundTripper decorator: Implement retry as a wrapping http.RoundTripper rather than patching NewClient internals. This keeps the retry logic isolated, testable, and composable with custom transports set via WithTransport.
Opt-in via ClientOption: Retry is not enabled by default. Consumers who need it use WithRetry(...). This avoids unexpected behaviour changes for existing callers and prevents double-retries when callers already have their own retry logic.
Exponential backoff with full jitter: Use the "full jitter" strategy ([0, min(cap, base * 2^attempt)]) as recommended by AWS architecture best practices. Full jitter reduces thundering-herd effects compared to equal jitter or no jitter.
Retryable conditions are configurable: The default set of retryable status codes (429, 502, 503, 504) and network errors (timeouts, connection resets) covers common transient failures. Consumers can override this with a custom RetryPolicy function.
Request body buffering: For requests with a body, the body must be re-readable across retries. The retry transport buffers the body on first read and resets it for each attempt. Requests with GetBody set (as produced by http.NewRequest) are handled natively.
Respect Retry-After header: When a 429 or 503 response includes a Retry-After header, use that value as the delay floor for the next attempt rather than the computed backoff.
Public API Changes¶
New ClientOption: WithRetry¶
// RetryConfig configures the retry behaviour of the HTTP client.
type RetryConfig struct {
// MaxRetries is the maximum number of retry attempts. Zero means no retries.
MaxRetries int
// InitialBackoff is the base delay before the first retry. Default: 500ms.
InitialBackoff time.Duration
// MaxBackoff caps the computed delay. Default: 30s.
MaxBackoff time.Duration
// RetryableStatusCodes defines which HTTP status codes trigger a retry.
// Default: []int{429, 502, 503, 504}.
RetryableStatusCodes []int
// ShouldRetry is an optional custom predicate. When set, it replaces the
// default status-code and network-error checks. The attempt count (0-based)
// and either the response or the transport error are provided.
ShouldRetry func(attempt int, resp *http.Response, err error) bool
}
// WithRetry enables automatic retry with exponential backoff for transient failures.
func WithRetry(cfg RetryConfig) ClientOption
Default Retry Configuration¶
// DefaultRetryConfig returns a RetryConfig suitable for most use cases.
func DefaultRetryConfig() RetryConfig {
return RetryConfig{
MaxRetries: 3,
InitialBackoff: 500 * time.Millisecond,
MaxBackoff: 30 * time.Second,
RetryableStatusCodes: []int{http.StatusTooManyRequests, http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout},
}
}
Usage¶
client := http.NewClient(
http.WithTimeout(60*time.Second),
http.WithRetry(http.DefaultRetryConfig()),
)
Internal Implementation¶
retryTransport¶
// retryTransport wraps an http.RoundTripper with retry logic.
type retryTransport struct {
next http.RoundTripper
cfg RetryConfig
}
func (t *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
var (
resp *http.Response
err error
)
for attempt := range t.cfg.MaxRetries + 1 {
if attempt > 0 {
delay := t.computeDelay(attempt, resp)
select {
case <-req.Context().Done():
return nil, req.Context().Err()
case <-time.After(delay):
}
// Reset request body for retry
if req.GetBody != nil {
req.Body, err = req.GetBody()
if err != nil {
return nil, errors.Wrap(err, "failed to reset request body for retry")
}
}
}
resp, err = t.next.RoundTrip(req)
if !t.shouldRetry(attempt, resp, err) {
break
}
// Drain and close response body before retry to reuse connection
if resp != nil {
_, _ = io.Copy(io.Discard, resp.Body)
_ = resp.Body.Close()
}
}
return resp, err
}
Backoff Computation¶
func (t *retryTransport) computeDelay(attempt int, resp *http.Response) time.Duration {
// Check Retry-After header
if resp != nil {
if ra := parseRetryAfter(resp.Header.Get("Retry-After")); ra > 0 {
return ra
}
}
// Exponential backoff: base * 2^attempt
backoff := t.cfg.InitialBackoff * (1 << uint(attempt-1))
if backoff > t.cfg.MaxBackoff {
backoff = t.cfg.MaxBackoff
}
// Full jitter: uniform random in [0, backoff]
jitter := time.Duration(rand.Int64N(int64(backoff) + 1))
return jitter
}
Integration with NewClient¶
The WithRetry option stores the RetryConfig on clientConfig. In NewClient, after the base transport is resolved, the retry transport wraps it:
func NewClient(opts ...ClientOption) *http.Client {
cfg := defaultClientConfig()
for _, opt := range opts {
opt(cfg)
}
var transport http.RoundTripper
if cfg.transport != nil {
transport = cfg.transport
} else {
transport = NewTransport(cfg.tlsConfig)
}
if cfg.retry != nil {
transport = &retryTransport{next: transport, cfg: *cfg.retry}
}
return &http.Client{
Timeout: cfg.timeout,
Transport: transport,
CheckRedirect: redirectPolicy(cfg.maxRedirects),
}
}
Project Structure¶
pkg/http/
โโโ client.go <- MODIFIED: RetryConfig on clientConfig, WithRetry option, wrapping in NewClient
โโโ retry.go <- NEW: retryTransport, computeDelay, parseRetryAfter, default shouldRetry
โโโ retry_test.go <- NEW: retry tests
โโโ tls.go <- UNCHANGED
Testing Strategy¶
| Test | Scenario |
|---|---|
TestRetryTransport_NoRetryOnSuccess |
200 response on first attempt, no retry |
TestRetryTransport_RetriesOn503 |
503 then 200, verifies single retry and success |
TestRetryTransport_RetriesOn429WithRetryAfter |
429 with Retry-After header, verifies delay respects header |
TestRetryTransport_ExhaustsMaxRetries |
503 on all attempts, returns final 503 response |
TestRetryTransport_ContextCancelled |
Context cancelled during backoff wait, returns context error |
TestRetryTransport_NetworkError |
Connection reset on first attempt, retries and succeeds |
TestRetryTransport_BodyRewind |
POST with body, retried, body correctly re-sent |
TestRetryTransport_NoRetryOn4xx |
400/401/403/404 not retried |
TestRetryTransport_CustomShouldRetry |
Custom predicate overrides default logic |
TestRetryTransport_BackoffJitter |
Verify delay is within expected bounds across multiple attempts |
TestRetryTransport_MaxBackoffCap |
High attempt count does not exceed MaxBackoff |
TestDefaultRetryConfig |
Default values are correct |
TestWithRetry_Integration |
NewClient with WithRetry, full round-trip through httptest.Server |
Test Helpers¶
// countingHandler returns an http.Handler that responds with the given status
// codes in sequence, then 200 for all subsequent requests.
func countingHandler(t *testing.T, statusCodes ...int) http.Handler
Coverage¶
- Target: 90%+ for
pkg/http/including retry paths.
Linting¶
golangci-lint run --fixmust pass.- No new
nolintdirectives.
Documentation¶
- Godoc for
RetryConfig,WithRetry, andDefaultRetryConfig. - Godoc on
retryTransportexplaining the wrapping strategy and body-rewind behaviour. - Update
docs/components/http.md(if it exists) with retry usage examples.
Backwards Compatibility¶
- No breaking changes. Retry is opt-in via
WithRetry. Existing callers that do not use this option see identical behaviour. WithRetrycomposes withWithTransport-- the retry transport wraps whatever base transport is configured.
Future Considerations¶
- Circuit breaker: A circuit breaker could wrap the retry transport to fail-fast when a downstream service is consistently unavailable, avoiding wasted retry attempts.
- Per-request retry override: Allow individual requests to opt out of retry via a context value.
- Metrics/observability: Emit retry attempt counts and delays for monitoring integration.
Implementation Phases¶
Phase 1 -- Types and Configuration¶
- Add
RetryConfigstruct andDefaultRetryConfig()topkg/http/retry.go - Add
WithRetryClientOption - Add
retry *RetryConfigfield toclientConfig
Phase 2 -- Retry Transport¶
- Implement
retryTransportwithRoundTrip - Implement
computeDelaywith exponential backoff and full jitter - Implement
parseRetryAfterfor Retry-After header support - Implement default
shouldRetrychecking status codes and network errors - Wire retry transport wrapping into
NewClient
Phase 3 -- Tests¶
- Create
countingHandlertest helper - Add unit tests for
retryTransportcovering all scenarios - Add integration test via
httptest.Server - Run with race detector
Verification¶
go build ./...
go test -race ./pkg/http/...
go test ./...
golangci-lint run --fix
# Verify retry transport exists
grep -n 'retryTransport' pkg/http/retry.go
# Verify WithRetry option
grep -n 'WithRetry' pkg/http/client.go