Implementation notes β config hot-reload rework¶
Spec: docs/development/specs/2026-06-12-config-hot-reload-rework.md
What was implemented¶
All six findings closed via a container-owned fsnotify watcher in pkg/config.
- D1 β Container owns the watcher.
Containernow carriesfs,configFiles,reloadDebounce, a*fsnotify.Watcher, awatchDonechannel, and async.Mutex.watchConfig()creates one watcher andAdds every configured file, then runswatchLoopin a goroutine. Viper'sWatchConfig/OnConfigChangeare no longer used. - D2 β Candidate-validate-swap.
reload()callsbuildCandidate(fs, files)which readsfile[0]thenMergeInConfigsfiles[1:]into a fresh viper built by the new sharednewResolverViperhelper (same afero fs, env prefix, AutomaticEnv, key replacer, type-by-default as the live viper). If a schema is set, the candidate is validated via the newvalidateViper(v, schema); only on success isc.viperswapped under the lock. - D3 β Returned-error observer contract.
Observable.Run(Containable) errorandAddObserverFunc(func(Containable) error). The unbuffered/unreaderrschannel and the per-observer goroutine +WaitGroupare gone. Observers run sequentially in registration order; a returned error is logged and never blocks subsequent observers or reloads. - D4 β Race-safety.
observers,schema, and the liveviperpointer are all guarded by the container mutex. AllGet*/Has/IsSet/Set/Sub/WriteConfigAs/ToJSON/Validatereads route throughliveViper()(reads the pointer under lock).notify()copies the observer slice under lock and iterates the copy outside the lock. The watcher is started only after construction completes (see below). Verified clean undergo test -race(including a concurrent-AddObserver-during-reload test). - D5 β Single-file watch.
watchConfig()moved out of thelen(files) > 1branch; it is now called for every file-backed container. The previously-vacuous single-file observer test now requiresobserved > 0(polled withrequire.Eventually). - D6 β Configurable debounce + re-watch.
WithReloadDebounce(d)(defaultDefaultReloadDebounce= 250 ms).watchLoopcoalesces the event burst behind a single timer;rewatch()removes and re-Adds the affected path after every event so atomic-rename saves are not missed. - D7 β Fail-closed partial merge.
buildCandidatereturns an error if any file fails to parse/merge (or iffile[0]is missing, honouringErrConfigFileNotFound);reloadthen keeps last-known-good and does not swap.
Watcher start points¶
watchConfig() is invoked after construction completes, in NewFilesContainer,
LoadFilesContainer, and LoadFilesContainerWithSchema (the last after schema is set), so
the reload goroutine never observes a half-built container.
New public surface¶
func WithReloadDebounce(time.Duration) ContainerOptionconst DefaultReloadDebounce = 250 * time.Millisecondfunc (c *Container) Close() errorβ stops the watcher; idempotent; safe on non-watching containers.
Observer signature change (before / after)¶
// Before
type Observable interface { Run(Containable, chan error) }
cfg.AddObserverFunc(func(c config.Containable, errs chan error) { errs <- err })
// After
type Observable interface { Run(Containable) error }
cfg.AddObserverFunc(func(c config.Containable) error { return err })
Migration note: docs/migration/v0.16-hot-reload-observer.md
(also linked from docs/migration/index.md). This is a breaking change; commit carries a
BREAKING CHANGE: footer. Pre-1.0, so it ships as a minor bump with no shim.
Deviations from the spec (and why)¶
-
"Notify observers of the error" on a rejected reload. The spec (D2/D3/D7) says to notify observers of the error on a failed reload. With the returned-error contract there is no longer a channel to push an error into an observer β observers return errors, they don't receive them. Notifying observers on a rejected reload would hand them the unchanged last-known-good config and imply a change occurred. The implementation therefore logs the reload error (container's responsibility) and does not call observers when nothing changed. The verifiable guarantees β
Get*serves last-known-good and the bad config is never visible β are fully met and tested. Flagged as an open question below. -
Observers run sequentially, not in parallel. The old code spawned a goroutine per observer; the new code runs them in registration order on the watch goroutine. This matches the documented "registration order" semantics, removes a goroutine/WaitGroup per reload, and keeps the contract simple. Expensive observers should still offload work (documented in the how-to).
Tests added / changed¶
pkg/config/hotreload_test.go (new): merge-preservation, validation-rollback, partial-merge
fail-closed, primary-file-removed fail-closed, observer-error-no-deadlock, env-prefix
preserved across reload, Close idempotency, and a -race concurrent-observer test.
pkg/config/container_test.go: TestObserver and the single-file watch test updated to the
new signature and a real observed > 0 assertion. Mocks regenerated (go tool mockery).
Coverage on the changed surface: most new functions 80β100%; remaining gaps are defensive
fault-injection paths (fsnotify error channel, watcher.Add/Remove failures,
ReadInConfig mid-reload errors) that need a fake watcher/fs to hit deterministically.
Open questions for review¶
-
Rejected-reload notification (deviation 1). β RESOLVED. Per review, an
OnReloadError(func(error))hook was added to the file-backed container (and to theContainableinterface; mocks regenerated). It is the faithful realization of the spec's "notify observers of the error" intent under the returned-error contract: observers return errors (and so cannot receive a pushed reload-time error), so a dedicated reload-error hook lets embedders react to a rejected reload programmatically. The hook fires on every rejected reload β a fail-closed candidate-build failure (partial-merge / parse error / missing primary file) or a schema-validation failure β where last-known-good is retained; it is not called on a successful reload (observers are). Callbacks are stored in a mutex-guarded slice, copied under the lock and invoked outside it (the same discipline asnotify()), so concurrent registration during active reloads is race-safe. The container still logs the rejection atERROR; the hook is additive. Documented indocs/components/config.md(Β§ Reacting to rejected reloads),docs/how-to/config-hot-reload.md, and the migration note. Tests: validation-rejection fires the hook, fail-closed partial-merge fires the hook, a successful reload does not fire it, and a-raceconcurrent-registration test. -
Reader/embedded containers. Confirmed out of scope and unaffected β
NewReaderContainersets noconfigFiles, sowatchConfig()is a no-op for it.Close()is also a no-op there. -
Test timing tolerance. Hot-reload tests use a small injected debounce (
WithReloadDebounce(20ms)) and poll withrequire.Eventually/require.Neverrather than hard-sleeping. They are green across repeated-raceruns locally, but are inherently filesystem/timer-sensitive; CI on a heavily loaded runner may need the generous timeouts (currently 3 s eventually / 1 s never) kept or raised. -
Migration-note location/versioning. The note is filed as
docs/migration/v0.16-hot-reload-observer.mdand added to the migration index with av0.16 | βrow (no fixed target version, since releaser-pleaser computes the bump). Confirm the naming/versioning convention is acceptable. -
Debounce default validation.
WithReloadDebounce(d)and the option pipeline treatd <= 0as "use default 250 ms". There is no upper bound. Confirm no max clamp is desired.