Migrating: pkg/controls supervisor & lifecycle hardening¶
This release hardens the pkg/controls supervisor and goroutine lifecycle. It
fixes seven interlocking defects (see the spec). The change is behaviourally
breaking for one case (described below) but introduces no breaking
exported-signature changes; all API additions are backward compatible.
Breaking Changes¶
Restart policy no longer restarts a clean Start return¶
Package: pkg/controls
Previously, a service whose StartFunc returned nil (the normal case for a
server that spawns its listener in a background goroutine) was treated as having
exited and, if it had a RestartPolicy, was restarted in a loop, including
sending nil on the error channel.
Each run is now classified explicitly. A nil return is a clean start, not an
exit: the service is supervised via its StatusFunc/health check (when
HealthFailureThreshold > 0) and otherwise simply runs until shutdown. Only a
genuine error (or a health-threshold breach) triggers a restart.
Before (buggy):
// A RestartPolicy on a background-serving service caused a restart storm at
// startup, because Start returning nil was misread as "exited".
controller.Register("api",
	controls.WithStart(func(ctx context.Context) error {
		go srv.Serve(ln) // Serve runs in the background, so Start returns nil immediately
		return nil
	}),
	controls.WithRestartPolicy(controls.RestartPolicy{MaxRestarts: 5}),
)
After (correct):
The same registration no longer restarts on the clean nil return. To have such a
service restarted on failure, supervise it via a health check:
// state is a caller-defined, concurrency-safe holder for the goroutine's exit
// error; the health check reports that error once Serve has failed.
controller.Register("api",
	controls.WithStart(func(ctx context.Context) error {
		go func() {
			if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
				state.setExit(err)
			}
		}()
		return nil
	}),
	controls.WithStatus(func() error { return state.exitErr() }),
	controls.WithRestartPolicy(controls.RestartPolicy{
		MaxRestarts:            5,
		HealthFailureThreshold: 3,
		HealthCheckInterval:    5 * time.Second,
	}),
)
Migration: Most callers need no change: the built-in pkg/http and pkg/grpc
transports register without a restart policy and were never meant to restart on
a clean start. If you relied on the old (incorrect) restart-on-clean-return
behaviour, switch to health-check-driven supervision as shown above.
RestartPolicy.MaxRestarts now counts consecutive failures¶
MaxRestarts previously counted lifetime restarts. It now counts consecutive
failures: after a service runs healthily for RestartResetInterval (default 30 s),
the counter resets to zero. A service that fails, recovers, and later fails again is
no longer prematurely declared as having exceeded its restart budget.
New (additive, non-breaking) APIs¶
WithValidError(fn ValidErrorFunc) ControllerOpt¶
Registers a predicate identifying expected terminal errors (e.g.
http.ErrServerClosed, context.Canceled). A matching error is treated as a
graceful end-of-run: it neither counts toward the restart total nor is forwarded on
the error channel.
controller := controls.NewController(ctx,
	controls.WithValidError(func(err error) bool {
		return errors.Is(err, http.ErrServerClosed)
	}),
)
WithRestartResetInterval(d time.Duration) ServiceOption¶
Sets the healthy-run duration after which a service's consecutive-failure counter
resets (default controls.DefaultRestartResetInterval = 30 s). Implies a restart
policy if the service has none.
RestartPolicy.RestartResetInterval time.Duration¶
New field on the existing struct (zero selects the default). Adding the field is backward compatible for keyed struct literals; unkeyed (positional) RestartPolicy literals would need the new value added.
Other behavioural fixes (no API change)¶
- Idempotent Start(): a second Start() while already running is a no-op (no double-start, no hung Wait()).
- No busy-spin: the error/context handler no longer spins a CPU core after cancellation; all controller goroutines terminate at shutdown.
- Nil Start/Stop: services registered without these default to no-ops instead of panicking.
- WithoutSignals now genuinely leaves the default OS signal disposition in place (signal.Notify is registered only after options are applied, and signal.Stop is called when the channel is swapped or at shutdown).
- Force-stop: on shutdown, services stop in reverse registration order, one at a time; a StopFunc that ignores its context is abandoned at the shutdown deadline rather than hanging Wait() forever.
- Readiness fails closed: /readyz reports not-ready for an async readiness check until its first run completes.