Desktop-gated Integration Tests Specification
- Authors: Matt Cockayne, Claude Opus 4.8 (AI drafting assistant)
- Date: 20 June 2026
- Status: APPROVED
Environment requirement: read first
Work Items 2 (OS keychain) and 3 (WKD) MUST be done on a desktop
environment with a working, unlocked OS keychain. The headless homelab dev
server has no Secret Service daemon (no gnome-keyring + dbus session), so
go-keyring cannot store or retrieve and these tests cannot run there at all.
Do not attempt them over SSH on the homelab box; schedule a session on a
macOS / Windows / Linux-desktop machine with an unlocked login keyring.
Work Item 1 (live VCS) and Work Item 4 (live chat) need credentials but are not desktop-gated; they run anywhere with the right tokens/keys.
Summary
Phase 11 ("integration: GitLab + keychain + WKD + hot-reload") and Phase 15
("chat live-provider") of docs/development/plans/2026-06-17-test-coverage-closure.md
are partially complete:
- DONE: config hot-reload. pkg/config/hotreload_test.go already exercises the real fsnotify round-trip end-to-end (write a real file in t.TempDir(), assert observers fire) across eight scenarios. The plan's "unit-only" note was stale; nothing to add.
- DEFERRED: everything else in Phases 11 and 15. These need live credentials (VCS tokens, AI API keys) or a real desktop environment (an OS keychain), neither of which the headless homelab dev server provides. This spec pins them down so they can be picked up in one sitting when the prerequisites exist, and so they are not forgotten.
This spec is the single authoritative record of this deferred work; it absorbs and replaces the earlier lightweight follow-up plan (now removed).
All work items are gated with testutil.SkipIfNotIntegration(t, "<tag>") so
they compile and skip in normal/CI runs and only execute when the matching
INT_TEST_<TAG>=1 (or INT_TEST=1) env var and prerequisites are present.
They live in dedicated *_integration_test.go files per the project convention,
and each new INT_TEST_* tag is added to the inventory in
docs/development/integration-testing.md when its work item lands.
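
As a rough illustration of the gating rule described above, the sketch below shows one way the "tag or global switch" check could be expressed. It is not the project's testutil.SkipIfNotIntegration; the helper name integrationEnabled, the injected getenv function, and the upper-casing of the tag are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// integrationEnabled reports whether a gated test should run for tag.
// Hypothetical stand-in for the project's helper: INT_TEST_<TAG>=1 enables
// one tag, INT_TEST=1 enables every tag. getenv is injected so the rule can
// be exercised without touching the real environment.
func integrationEnabled(tag string, getenv func(string) string) bool {
	if getenv("INT_TEST") == "1" {
		return true
	}
	return getenv("INT_TEST_"+strings.ToUpper(tag)) == "1"
}

func main() {
	// Report which of the tags named in this spec would run right now.
	for _, tag := range []string{"vcs", "keychain", "wkd", "chat_live"} {
		fmt.Printf("%-10s enabled=%v\n", tag, integrationEnabled(tag, os.Getenv))
	}
}
```

In a real test file the equivalent decision would end in a t.Skip call rather than a boolean, so that ungated runs still compile the test and report it as skipped.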
Goals & Non-Goals
Goals
- Real-dependency coverage for the four surfaces currently exercised only against mocks / httptest: live VCS (GitLab + GitHub), OS keychain, WKD, and live chat providers.
- A crisp, environment-explicit record so the desktop-only work is scheduled on the right machine and the homelab-runnable work is picked up opportunistically.
- Per-item prerequisites, gating tag, what-to-test, and acceptance criteria.
Non-Goals
- No changes to production code are expected; these are test-only additions. If a minimal, additive test seam is genuinely required (mirroring an existing one), note it in the implementing MR.
- No re-doing the hermetic coverage already delivered (config hot-reload; the httptest/in-memory unit suites for the VCS providers, chat client, WKD derivation, and keychain via credtest).
- Not a CI-gating change. These stay opt-in; running them in CI is a separate decision once the secrets/runners exist.
Work Item 1: Live VCS integration (GitLab + GitHub) · NOT desktop-gated
Effort: low. Where: anywhere, including the homelab. Blocked on: a token + a throwaway test project. Pick this up first: it is the cheapest and needs no special environment.
Why it matters: pkg/vcs/gitlab and pkg/vcs/github are unit-tested with
httptest only. GitLab nested-group / Enterprise path handling and the live
PR / release-asset round-trips have no real-API coverage; the headline VCS
capability is exercised only against mocks. (A prior pkg/vcs/github
client_integration_test.go was removed because it hardcoded a fake
GITHUB_TOKEN; do not reintroduce that pattern.)
What to test (gated, INT_TEST_VCS=1):
- GitLab GetLatestRelease / GetReleaseByTag / ListReleases / DownloadReleaseAsset against a real project, including a nested-group path (group/subgroup/repo) and, if available, a self-hosted/Enterprise host via ReleaseSource.Host.
- GitLab MR create / update / label / get-by-branch on a scratch branch, with cleanup in t.Cleanup.
- GitHub equivalents (release read + a PR lifecycle), including a WithEnterpriseURLs host if a GHE instance is available.
- Token resolution precedence (env vs config).
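
The nested-group case is worth a live test because GitLab's REST API addresses a project by its URL-encoded full path, so every slash in group/subgroup/repo has to be escaped. The sketch below shows that encoding for the v4 "list releases" endpoint. The releasesURL helper is hypothetical; pkg/vcs/gitlab may well delegate this to a client library, so treat it as an illustration of what the live test ultimately exercises rather than the provider's code.

```go
package main

import (
	"fmt"
	"net/url"
)

// releasesURL builds the GitLab REST v4 "list releases" URL for a project
// identified by its full path. url.PathEscape also escapes "/", which is
// what a nested-group path needs when used as the :id segment.
// Hypothetical helper for illustration only.
func releasesURL(host, projectPath string) string {
	return "https://" + host + "/api/v4/projects/" + url.PathEscape(projectPath) + "/releases"
}

func main() {
	// A nested-group path on gitlab.com and on an assumed self-hosted host.
	fmt.Println(releasesURL("gitlab.com", "group/subgroup/repo"))
	fmt.Println(releasesURL("gitlab.example.internal", "group/subgroup/repo"))
}
```

A hermetic httptest server can easily accept an unescaped or doubly escaped path by accident, which is one reason the nested-group path is called out for real-API coverage.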
Prerequisites / credentials:
| Need | Detail |
|---|---|
| GITLAB_TOKEN | api scope on a throwaway project the test can create branches/MRs in. |
| GitLab test project | owner/repo; ideally also a nested-group project; optionally a self-hosted host for the Enterprise path. |
| GITHUB_TOKEN | repo scope on a throwaway GitHub repo (NOT the archived phpboyscout/gtb mirror, which rejects writes). |
Acceptance: pkg/vcs/gitlab/*_integration_test.go (+ a GitHub counterpart)
gated by INT_TEST_VCS=1; each test self-cleans created branches/MRs; inventory
updated.
Work Item 2: OS keychain real round-trip · ⚠️ DESKTOP REQUIRED
Effort: medium. Where: a desktop with a working OS keychain, NOT the homelab. Blocked on: an unlocked OS keychain session.
Why it matters: pkg/credentials/keychain is exercised today only via
keyring.MockInit / the in-memory credtest backend. The real OS backend
round-trip and the runtime resolution precedence against a real keychain are
unverified.
What to test (gated, INT_TEST_KEYCHAIN=1):
- Real store → retrieve → delete via the OS backend (blank-import pkg/credentials/keychain), with a unique per-run service/account so repeat/parallel runs don't collide; clean up in t.Cleanup.
- Runtime resolution precedence end-to-end with a real keychain entry present: env → keychain → literal → well-known fallback (the matrix in CLAUDE.md § Credential Storage).
- The doctor credentials.no-literal check against a real-keychain-backed config.
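
To make the precedence bullet concrete, here is a minimal sketch of a "first non-empty source wins" resolver using the order stated above. The candidate type and resolveCredential function are hypothetical; the authoritative matrix is the one in CLAUDE.md, and the real resolver may treat errors, locked keyrings, or empty values differently.

```go
package main

import "fmt"

// candidate is one place a credential value may come from.
// Hypothetical type for illustration only.
type candidate struct {
	Source string // label such as "env" or "keychain"
	Value  string // empty means the source has nothing set
}

// resolveCredential returns the first candidate with a non-empty value.
// Callers pass candidates already sorted by precedence.
func resolveCredential(candidates []candidate) (candidate, bool) {
	for _, c := range candidates {
		if c.Value != "" {
			return c, true
		}
	}
	return candidate{}, false
}

func main() {
	// Order taken from the spec: env, keychain, literal, well-known fallback.
	got, ok := resolveCredential([]candidate{
		{"env", ""},
		{"keychain", "value-from-keychain"},
		{"literal", "value-from-config"},
		{"well-known fallback", ""},
	})
	fmt.Println(got.Source, ok)
}
```

An integration test for this item would populate the keychain slot through the real OS backend and then assert which source wins as the env var is set and unset around it.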
Prerequisites / environment:
| Need | Detail |
|---|---|
| Desktop OS keychain | macOS Keychain / Windows Credential Manager built-in, or a Linux desktop with gnome-keyring + an active dbus session. |
| Unlocked login keyring | A locked keyring prompts or errors; unlock it for the run. |
| NOT the homelab | The headless server has no Secret Service daemon; these will not run there. |
Acceptance: pkg/credentials/keychain/*_integration_test.go gated by
INT_TEST_KEYCHAIN=1; documented as desktop-only in
docs/development/integration-testing.md.
Work Item 3: WKD against a live openpgpkey host · bundled with Item 2
Effort: low. Where: bundled into the same desktop session as Item 2 (no hard desktop dependency itself, only network egress, but grouped so both real-crypto items are done in one sitting). Blocked on: a live WKD host.
Why it matters: the WKD resolver (pkg/openpgpkey + the update signing trust
path) is tested against httptest only. Resolving a real key from a real Web Key
Directory over HTTPS (advanced vs direct method URLs) is unverified end-to-end.
What to test (gated, INT_TEST_WKD=1):
- Resolve a known email to its published key from a live openpgpkey host and assert the fingerprint matches an expected value. The project's own openpgpkey.phpboyscout.uk (which publishes the release-signing key) is the natural target if its WKD tree is live; otherwise a known third-party WKD email.
- Both the advanced-method and direct-method URLs.
- A not-found / unreachable-host path returning the expected error.
Prerequisites:
| Need | Detail |
|---|---|
| Live WKD host | e.g. openpgpkey.phpboyscout.uk serving a key for a known email; confirm the tree is published. |
| Known email + fingerprint | The expected identity to assert against. |
| Network egress | HTTPS to the host. |
Acceptance: pkg/openpgpkey/*_integration_test.go gated by INT_TEST_WKD=1;
inventory updated.
Work Item 4: Live chat-provider coverage · NOT desktop-gated
Effort: medium. Where: anywhere with API keys (costs money; run sparingly). Blocked on: AI provider API keys. This is Phase 15 of the coverage plan.
Why it matters: pkg/chat is unit-tested with in-process fakes / httptest
only. Real Anthropic / OpenAI / Gemini SSE streaming and auth-mode behaviour have
zero live coverage.
What to test (gated, INT_TEST_CHAT_LIVE=1, distinct from the existing
in-process chat tag):
- A minimal real request/response and an SSE streaming turn per provider (Anthropic, OpenAI, Gemini), asserting event ordering and final assembly.
- Auth-mode resolution (env-var key) and ValidateBaseURL against the real endpoints.
- Keep token usage tiny (short prompts, low max-tokens); these cost money.
Prerequisites:
| Need | Detail |
|---|---|
| ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY | One per provider under test; tests skip per-provider when its key is absent. |
| Network egress | HTTPS to each provider. |
Acceptance: pkg/chat/*_live_integration_test.go gated by a new
INT_TEST_CHAT_LIVE tag (so it never runs with the existing in-process chat
suite); per-provider skip when the key is missing; inventory updated. Pair with
the doc note already added in integration-testing.md clarifying the in-process
chat tests need no keys.
Testing Strategy & gating
- Each item is a dedicated *_integration_test.go gated by its tag via testutil.SkipIfNotIntegration. New tags: INT_TEST_KEYCHAIN, INT_TEST_WKD, INT_TEST_CHAT_LIVE (VCS reuses the existing INT_TEST_VCS).
- Tests must self-clean any remote/keychain state they create (t.Cleanup, unique per-run identifiers).
- They stay opt-in: never enabled by the default suite or CI without a separate, deliberate secrets/runners decision.
- When each item lands: update Phase 11/15 of the coverage-closure plan and add
the tag + env vars to the inventory in
docs/development/integration-testing.md.
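
The "unique per-run identifiers" requirement can be met with a short random suffix on every branch name, MR title, or keychain service/account a test creates. The sketch below is one assumed approach; the runID helper and the gtb-it prefix are hypothetical and not an existing project convention.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// runID returns prefix plus 16 random hex characters, so repeated or
// parallel runs do not collide on remote or keychain state.
// Hypothetical helper for illustration only.
func runID(prefix string) (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(b), nil
}

func main() {
	id, err := runID("gtb-it")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// In a test, register deletion of whatever was created under this
	// name with t.Cleanup immediately after creating it.
	fmt.Println(id)
}
```

A recognisable prefix also makes it possible to sweep up leftovers by hand if a run is killed before its cleanup executes.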
Implementation phases (pick-up order)
- Item 1: live VCS (GitLab/GitHub). As soon as a token + throwaway project are to hand. Homelab-runnable. Low effort.
- Items 2 + 3: keychain + WKD, together, on a desktop with an unlocked OS keychain. This is the desktop-gated sitting.
- Item 4: live chat. Whenever AI keys are available; independent of the others; keep it cheap.
Open Questions
- Which throwaway GitLab/GitHub projects to target for Item 1? A dedicated *-it-sandbox repo per forge is recommended so cleanup can be aggressive.
- Is openpgpkey.phpboyscout.uk's WKD tree currently published for a known email (Item 3)? If not, choose a stable third-party WKD identity.
- Should Item 4 run on a schedule (cron) once keys exist, given its cost, rather than on every opt-in run? Default: manual/opt-in only.