Pipeline & momentum — where every PR is, & what's shipping
| Claimed | 22 | building · no PR yet | oldest 2d | a11y-fix-03, auth-01, comp-fix-01, comp-fix-02, +18 |
| ◷ In CI | 3 | checks running | oldest 3d | |
| ✗ Failing | 0 | back to builder | | |
| Ready | 21 | passed · awaiting merge | oldest 29h | inc-09, idn-07, inc-11, kb-fix-01, +17 |
| Shipped 24h | 10 | landed on main | today | |
idle healthy slowing stuck
Slices shipped · last landed 32h ago
| Last 24h | 0 | stalled | |
| Last 7 days | 38 | slowing | 5.4/day |
| Last 30 days | 160 | on pace | 8.5/day |
System alerts
1 notice · Machine metrics paused
Cloudflare KV daily read limit reached — device & build-fleet presence resumes after the daily reset (00:00 UTC). Build data is unaffected.
Phases
Phase 0.5 · Audit remediation · 2/12
| KB-FIX-01 | pgvector semantic search (KB-09 remediation) |
| UX-FIX-01 | Design-system token + discipline sweep + lint gate |
| UX-FIX-02 | Tokenize warning/status surfaces + primitive conformance |
| PERF-FIX-01 | Route groups + server-side session/flags/labels seeding |
| PERF-FIX-02 | DataTable virtualization + memoization + lazy modals |
| A11Y-FIX-01 | Landmarks, skip link, focus trap, aria-current, heading order, status roles |
| A11Y-FIX-02 | Hit areas, contrast, focus rings, roving tabindex, autocomplete |
| A11Y-FIX-03 | Register the Layer-3 (list) + Layer-4 (entity) keyboard shortcut contract platform-wide |
| UX-FIX-03 | Route the public FAQ page through the customization resolver |
| OBS-01 | Build @itsmx/shared-observability: OTel + structured logger + enumerated PII/secret scrubber |
| TEST-FIX-01 | Stand up the E2E runner (Playwright + pnpm test:e2e) + first primary-journey spec |
| DB-FIX-01 | Backward/down migrations + reverse runner (or tested expand/contract policy) |
Phase 1 · Identity foundation · 6/9
| IDN-01 | Tenants table and tenant provisioning |
| IDN-02 | Users and user-tenant memberships |
| IDN-03 | WorkOS integration and session handling |
| IDN-04 | Roles and permissions schema |
| IDN-05 | SoD enforcement and dual-approval |
| IDN-06 | SCIM directory sync |
| IDN-07 | User 360 profile view (reporter/requester history) |
| AUTH-01 | Sign-out (logout) - stub + WorkOS-ready seam, incl. 403/redirect polish |
| AUTH-02 | WorkOS SSO adapter - session, MFA, group/role claims, JIT, single-logout |
Phase 1.5 · Workforce features · 3/6
| SKL-01 | Skills schema and agent self-declaration |
| SKL-02 | Routing engine |
| SKL-03 | Routing rules and tenant configuration |
| OOO-01 | Out-of-office periods |
| OOO-02 | Reassignment workflow on entering OoO |
| OOO-03 | Routing exclusion and UI cues |
Phase 4 · CMDB · 10/13
| CMDB-01 | CI classes and class attributes |
| CMDB-02 | CIs and versioning |
| CMDB-03 | CI relationships |
| CMDB-04 | Custom CI attributes UI |
| CMDB-05 | Source-system reconciliation |
| CMDB-06 | CVE manual entry and detection event |
| CMDB-07 | EoL/EoS daily risk check |
| CMDB-08 | CI list and detail UI |
| CMDB-09 | DORA chain modeling |
| CMDB-10 | CMDB real-backend wiring + per-CI ticket history |
| CMDB-11 | CMDB management UI redo (CI/relationship CRUD + scalable lists) |
| CMDB-12 | CI impact & dependency view (directional tree, blast-radius) |
| CMDB-13 | CI reconciliation review & merge UI |
Phase 5 · Incident management · 13/18
| INC-01 | Incident schema and CRUD |
| INC-02 | Incident state machine |
| INC-03 | DORA major incident classification |
| INC-04 | NIS-2 significant incident classification |
| INC-05 | Incident list and detail UI |
| INC-07 | Reporter panel and assignment-history UI |
| INC-08 | Assignment-group routing and queue views |
| INC-06 | Incident escalation workflow |
| ATT-01 | Attachments R2 + virus-scan pipeline |
| ATT-02 | Drag-and-drop attachment UI on ticket detail |
| INT-01 | Inbound email intake (Postmark Inbound) |
| INT-02 | Signed webhook intake (Datadog/PagerDuty/Grafana/generic) |
| ATT-03 | Attachment allow-list hardening (SVG stored-XSS) |
| INC-09 | Incident ↔ CI linkage UI |
| INC-10 | Ticket forwarding / reassignment with required reason |
| INC-11 | Ticket category taxonomy and schema |
| INC-12 | Category-driven intake form UX |
| INC-13 | Category → team-queue routing engine and filters |
Phase 5.6 · Major incident & status page · 2/8
| WAR-01 | War room schema and lifecycle |
| WAR-02 | Real-time war room UI |
| WAR-03 | Broadcast communications routing |
| WAR-04 | Post-incident review auto-draft |
| STAT-01 | Status page schema and service-component model |
| STAT-02 | Public status app (`apps/status-public`) |
| STAT-03 | Email subscribers and notification fan-out |
| STAT-04 | Auto-publish from war room + maintenance windows |
Phase 6 · Problem & Change · 8/11
| PRB-01 | Problem CRUD and state machine |
| PRB-02 | Known Error promotion and workaround tracking |
| PRB-03 | Problem list and detail UI |
| CHG-01 | Change Request schema and Change Model |
| CHG-02 | Change state machine and SoD |
| CHG-03 | CAB approval workflow |
| CHG-04 | Change list, detail, and create UI |
| CHG-05 | Post-implementation review |
| PRB-04 | Incident → Problem promotion flow |
| PRB-05 | Recurring-incident detection → problem candidate queue |
| PRB-06 | Major-incident → mandatory problem gate |
Phase 6.5 · Knowledge management · 11/13
| KB-01 | KB article schema + versioning + CRUD |
| KB-02 | Agent authoring + ticket linking |
| KB-03 | Portal knowledge surface + deflection events |
| KB-04 | Agent triage suggestions (structural) |
| KB-05 | Ratings + analytics dashboard |
| KB-06 | Multi-language translation lineage |
| KB-07 | Rich-text editor + image upload via ATT-01 |
| KB-08 | Decision trees / guided troubleshooting |
| KB-09 | Semantic search via pgvector |
| KB-10 | KCS methodology - reuse tracking + author cert ladder + evolve workflow |
| KB-11 | Public-FAQ surface (apps/portal-public) |
| KB-12 | Role/team-scoped KB article visibility |
| KB-13 | KB authoring + editor UI |
Phase 6.6 · Change calendar & maintenance · 1/6
| CAL-01 | Change calendar view |
| CAL-02 | Change collision detection |
| CAL-03 | Change freeze windows |
| MNT-01 | Maintenance window schema and service |
| MNT-02 | SLA suppression and notification gating during window |
| MNT-03 | Maintenance window UI + tenant surfaces |
Phase 10 · Reporting & regulatory exports · 13/14
| REP-01 | Reports schema and immutability |
| REP-02 | DORA major incident cascade |
| REP-03 | NIS-2 cascade |
| REP-04 | DORA Register of Information |
| REP-05 | BAIT Berechtigungskonzept |
| REP-06 | EoL/EoS risk register |
| REP-07 | Critical-CVE detection evidence |
| REP-08 | DORA concentration risk |
| REP-09 | GDPR enhanced data export |
| REP-10 | Operational reports |
| REP-11 | Reports landing page and run lifecycle UI |
| REP-12 | DORA major-incident cascade UI |
| REP-13 | NIS-2 cascade UI |
| REP-14 | Berechtigungskonzept generation UI |
Phase 11.5 · Workflow automation & dashboards · 2/22
| RULE-01 | Business rules schema and safe evaluator |
| RULE-02 | Rule authoring UI |
| RULE-03 | Rule integration with existing workflows |
| DASH-01 | Widget framework, preset catalog schema, preferences schema |
| DASH-02 | Preset picker, hide / pin / time-window UI |
| DASH-03 | Preset catalog seed and tenant role default settings page |
| DASH-04 | Agent dashboard surface (`/home`) |
| DASH-05 | Supervisor / IM team dashboard (`/team`) |
| DASH-06 | Admin / compliance / auditor dashboard (`/admin-home`) |
| REPB-01 | Custom report definition schema and runner |
| REPB-02 | Scheduled execution and delivery |
| SENT-01 | Sentiment scoring pipeline |
| SENT-02 | Sentiment-driven escalation and UI |
| TIME-01 | Time entries schema and capture UI |
| TIME-02 | Time reporting |
| WF-01 | Configurable workflow model + per-tenant resolver (over existing state machines) |
| WF-02 | Workflow presets per org type/size/compliance + onboarding selection |
| PROD-ANALYTICS-01 | Product-analytics capability - PII-free instrumentation (separate from audit log) |
| PROD-ANALYTICS-02 | In-product SUS / task-difficulty micro-survey |
| primary-journeys-doc | Primary-journeys map per persona (docs) |
| DoD-amend-human-first | Amend Definition of Done with human-first + measurement gates |
| ADR-0025-product-principles | ADR: human-first principles + analytics-vs-audit separation (docs) |
Phase 12 · End-user portal · 11/14
| UI-01 | End-user self-service portal |
| UI-02 | Customer onboarding wizard |
| UI-03 | Works Council landing page |
| UI-05 | Tenant Users settings page |
| UI-06 | Tenant Groups (Teams) settings page |
| UI-07 | Custom role composition |
| SET-01 | Tenant settings hub |
| SET-02 | Personal Access Token management |
| SET-03 | Outbound webhook endpoint management UI |
| SET-04 | User notification preferences UI |
| UI-04 | Regulatory UI flag-conformance pass |
| SET-05 | User profile settings (self-service) + My-Account / admin split |
| SET-06 | Appearance preferences (theme + density) |
| SET-07 | Language & region preferences (personal locale) |
Phase 12.5 · Agent home & workload · 2/5
| AWL-01 | Agent personal home (`/home`) |
| AWL-02 | Supervisor team view (`/team`) |
| HF-01 | Human-first foundation (journey events + ConfirmAction compensation + state nextAction) |
| HF-02 | Operator surface sweep, tier 1 (human-first audit + remediation) |
| HF-03 | Operator surface sweep, tier 2 (human-first audit + remediation) |
Phase 13 · Customization settings · 5/7
| TBL-01 | Universal DataTable primitive + registration contract |
| CUS-06 | Field label and form layout editor |
| CUS-07 | List view, saved views, and toolbar UX |
| CUS-08 | Terminology overrides UI |
| CUS-09 | Locale and date/number format settings |
| INC-LIST-FACET-01 | Migrate /incidents onto DataTable + faceted toolbar + saved views (reference wire-up) |
| LIST-FACET-02 | Migrate /changes, /requests, /problems onto DataTable + faceted toolbar |
Phase 16 · Software License Mgmt · 0/5
| SLM-01 | License entitlement schema |
| SLM-02 | License usage tracking via integration |
| SLM-03 | Renewal calendar and alert workflow |
| SLM-04 | License compliance reporting |
| SLM-05 | True-up workflow and vendor handoff |
Phase 17 · Patch Management · 1/5
| PATCH-01 | Patch definitions and inventory |
| PATCH-02 | Patch compliance views |
| PATCH-03 | Patch deployment as Change |
| PATCH-04 | CVE → patch linkage and DORA Art. 10 evidence |
| PATCH-05 | Patch source integration |
Phase 18 · SAP integration · 9/12
| SAP-01 | SAP connection adapter (dual-mode auth: RFC + OAuth) |
| SAP-02 | SAP system inventory and landscape mapping |
| SAP-03 | Transport request lifecycle as Change (core SolMan ChaRM replacement) |
| SAP-04 | SAP-specific CAB and BAIT 5 SoD on transports |
| SAP-05 | Transport object-level collision detection |
| SAP-06 | Transport import scheduling |
| SAP-07 | CCMS / SolMan monitoring alert ingestion |
| SAP-08 | SAP HANA monitoring and backup status |
| SAP-09 | USMM / SLAW automated license measurement |
| SAP-10 | Cloud ALM bridge (transitional) |
| SAP-11 | SAP authorizations baseline (BAIT 5 evidence) |
| SAP-12 | SAP Basis skill seeding and role taxonomy |
Phase 19 · Universal integrations · 2/21
| INTG-01 | Integration adapter framework |
| INTG-02 | OAuth 2.0 client and token refresh handler |
| INTG-03 | Encrypted credential storage (KMS-backed) |
| INTG-04 | Outbound API client framework |
| INTG-05 | Polling scheduler |
| INTG-06 | Inbound webhook receiver (generalized) |
| INTG-07 | External entity links and bidirectional state sync |
| INTG-08 | Integration health surface |
| INTG-09 | Generic webhook adapter (long-tail fallback) |
| INTG-10 | Microsoft 365 / Entra ID adapter |
| INTG-11 | Microsoft Intune / SCCM adapter |
| INTG-12 | Datadog adapter |
| INTG-13 | PagerDuty / Opsgenie adapter |
| INTG-14 | Microsoft Sentinel adapter |
| INTG-15 | Slack and Microsoft Teams adapter |
| INTG-16 | Active Directory (LDAP) adapter |
| INTG-17 | CrowdStrike Falcon / SentinelOne adapter |
| INTG-18 | Tenable / Qualys / Rapid7 adapter |
| INTG-19 | Veeam Backup adapter |
| INTG-20 | VMware vSphere adapter |
| INTG-21 | Citrix Virtual Apps and Desktops adapter |
Devices — PCs on this project
| no devices reporting |
CI fleet — 12 online · 2 busy
| itsmx-delldesktop-runner-1 | idle |
| itsmx-delldesktop-runner-2 | busy |
| itsmx-delldesktop-runner-3 | idle |
| itsmx-delldesktop-runner-4 | idle |
| itsmx-dell-heavy | busy |
| itsmx-dell-runner | idle |
| itsmx-dell-runner-2 | idle |
| itsmx-dell-runner-3 | idle |
| itsmx-dell-runner-4 | idle |
| itsmx-dell-runner-5 | idle |
| itsmx-dell-runner-6 | idle |
| itsmx-dell-runner-7 | idle |
In-flight — 21 ready · 0 failing · 3 in CI
| inc-10 no PR | building |
| intg-08 no PR | building |
| intg-06 #198 | ready · merging |
| intg-21 #197 | ready · merging |
| db-fix-01 no PR | building |
| ux-fix-01 no PR | building |
| intg-07 no PR | building |
| inc-list-facet-01 no PR | building |
| a11y-fix-03 no PR | building |
| intg-20 #189 | ready · merging |
| intg-21-adapter #190 | ready · merging |
| intg-07-entity-links no PR | building |
| intg-17 #184 | ready · merging |
| intg-14 #183 | ready · merging |
| inc-09 #151 | ready · merging |
| idn-07 #156 | ready · merging |
| inc-11 #157 | ready · merging |
| kb-fix-01 #158 | ready · merging |
| kb-12 #159 | ready · merging |
| a11y-fix-01 #163 | ready · merging |
| intg-16 #182 | ready · merging |
| test-fix-01 #164 | ready · merging |
| perf-fix-01 #171 | ready · merging |
| rep-14 #176 | ready · merging |
| intg-10 #177 | ready · merging |
| intg-13 #178 | ready · merging |
| intg-12 #181 | ready · merging |
| intg-19 #180 | ready · merging |
| ui-04 no PR | building |
| skl-02 no PR | building |
| slm-01 no PR | building |
| stat-01 no PR | building |
| rule-01 no PR | building |
| dash-01 no PR | building |
| repb-01 no PR | building |
| mnt-02 no PR | building |
| kb-13 no PR | building |
| auth-01 no PR | building |
| obs-01 no PR | building |
| sec-fix-01 #139 | 0/5 · in CI |
| comp-fix-02 no PR | building |
| comp-fix-01 no PR | building |
| intg-03 no PR | building |
| sent-01 #129 | draft |
| intg-01 #130 | 4/5 · in CI |
Event log — what broke → how the AI fixed it
fix(serializer): read live/paused via vars context - was stuck in observe mode (#201)
ca49a84 The Read-switches step read SERIALIZER_LIVE/BUILDERS_PAUSED with 'gh variable get', which needs an API permission this job's GITHUB_TOKEN lacks. It silently failed and fell back to '0', so LIVE was always 0 -> the serializer ran in OBSERVE mode and NEVER merged, regardless of the SERIALIZER_LIVE=1 repo variable. The hidden second half of the merge stall (the first being the token-bound compliance-gate). Fix: read via the vars context (no API call). Same pattern bug exists in nightly-slice.yml + heavy-slice.yml for BUILDERS_PAUSED - follow-up.
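To illustrate why the silent fallback hid the stall, here is a minimal Python sketch. It is not the workflow's code (the real fix was to read the value from the vars context); `read_switch_silent`, `read_switch_strict`, `denied_fetch` and `mode` are hypothetical names used only for this illustration.

```python
from typing import Callable, Optional


def read_switch_silent(fetch: Callable[[str], str], name: str) -> str:
    """Mimics the failure mode: any lookup error turns into '0'."""
    try:
        return fetch(name)
    except Exception:
        # Indistinguishable from an operator deliberately setting 0.
        return "0"


def read_switch_strict(fetch: Callable[[str], str], name: str) -> Optional[str]:
    """Returns None on a failed lookup so the caller can tell 'off' from 'unknown'."""
    try:
        return fetch(name)
    except Exception:
        return None


def denied_fetch(name: str) -> str:
    """Stand-in for a lookup the job's token is not permitted to make."""
    raise PermissionError(f"token may not read {name}")


def mode(live: Optional[str]) -> str:
    """Maps the switch value to a serializer mode (assumed: '1' means live)."""
    if live is None:
        return "error"
    return "live" if live == "1" else "observe"
```

With the silent reader a denied lookup yields observe mode forever; with the strict reader the same failure surfaces as an error instead of a plausible-looking 0.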
ci(serializer): make compliance-gate advisory - drop it from the pick filter (#200)
6ec0260 compliance-gate was removed from main's required status checks (now advisory). This drops it from the merge-serializer's pick filter too (6 -> 5 checks: typecheck/lint/test/build/migrations), so the serializer can drain the slice backlog on the deterministic checks instead of waiting on the token-bound AI gate. Compliance coverage shifts to the deterministic compliance-lints (#199) + shift-left review; the AI gate still runs, advisory-only.
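The effect of shrinking the pick filter can be sketched as follows. This is an illustrative Python model, not the serializer's actual query; the check names come from the commit text, while `is_pickable` and the `"success"` conclusion string are assumptions.

```python
from typing import Iterable, Mapping

# The five deterministic checks named in the commit message.
REQUIRED_CHECKS = ("typecheck", "lint", "test", "build", "migrations")


def is_pickable(
    conclusions: Mapping[str, str],
    required: Iterable[str] = REQUIRED_CHECKS,
) -> bool:
    """A PR is pickable when every required check concluded 'success'.

    Checks outside `required` (for example an advisory gate) are ignored.
    """
    return all(conclusions.get(name) == "success" for name in required)


pr_checks = {
    "typecheck": "success",
    "lint": "success",
    "test": "success",
    "build": "success",
    "migrations": "success",
    "compliance-gate": "cancelled",  # advisory: no longer blocks the pick
}
```

Under the old six-check filter the same PR would not be picked, which is the backlog behaviour the commit describes.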
feat(ci): auto-heal sweeps stale PRs via update-branch + AUTOHEAL_PAT (#185)
d3d462c The durable fix for L73: the self-heal now UPDATE-BRANCHES green-able stuck PRs (clears the stale-workflow 401 AND re-triggers the gate) instead of only re-running, which can never fix a stale workflow file. Uses AUTOHEAL_PAT (classic repo+workflow PAT) because a GITHUB_TOKEN push triggers no workflow (no-recursion, L74); gracefully falls back to a plain re-run when the PAT isn't set. Adds contents:write for the update-branch push. Bounded as before (<=2/tick, queue-gated, gives up + labels stuck-needs-human after 3 tries).
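The PAT-or-fallback choice described above might look like this in outline. This is a hedged Python sketch that assumes the workflow shells out to the GitHub CLI; `repair_command` and the `run_id` parameter are hypothetical, and only the two `gh` subcommands shown are taken as given.

```python
from typing import List, Optional


def repair_command(pr_number: int, run_id: int, autoheal_pat: Optional[str]) -> List[str]:
    """Pick the repair for a green-able stuck PR.

    With a PAT: update the branch, which pushes a commit that re-triggers
    the gate. Without one: fall back to a plain re-run of the existing run.
    """
    if autoheal_pat:
        return ["gh", "pr", "update-branch", str(pr_number)]
    return ["gh", "run", "rerun", str(run_id)]
```

An empty or unset PAT takes the fallback branch, matching the "gracefully falls back" behaviour in the commit text.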
feat(ci): main canary (dormant) - loud alert + auto-issue when main goes red (#188)
c375a3f Builders already refuse to branch off a red main, but nothing surfaces a poisoned base loudly, so it can sit red while every slice PR rediscovers the breakage. This adds a scheduled canary that reads main's latest completed CI conclusion and opens a labelled tracking issue while main is red, auto-closing it when green. Zero extra runner/OAuth cost (reads the conclusion; does not re-run the suite). DORMANT unless repo var CANARY_LIVE=true.
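The canary's open/close behaviour amounts to a small decision table. The Python sketch below is an illustration under assumptions: `canary_action` is a hypothetical name, red is taken to mean a `"failure"` conclusion and green a `"success"` conclusion, and any other conclusion is treated as no-op.

```python
def canary_action(canary_live: str, main_conclusion: str, issue_open: bool) -> str:
    """Decide what the scheduled canary does on one tick."""
    if canary_live != "true":
        return "noop"  # dormant unless CANARY_LIVE=true
    if main_conclusion == "failure" and not issue_open:
        return "open-issue"
    if main_conclusion == "success" and issue_open:
        return "close-issue"
    # Already tracked, already clean, or an inconclusive result.
    return "noop"
```

Because it only reads the latest conclusion, each tick is idempotent: a red main with an issue already open does nothing further.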
feat(ci): producer backlog throttle (dormant) - stand builders down when slice-PR backlog is high (#186)
48bdbab The 3-lens compliance-gate runs on the shared self-hosted fleet and spends the Max OAuth allowance; under producer load it gets cancelled/timed-out, so slice PRs pile up faster than the gate + serializer can drain them (the 2026-06-03 wedge: 42 open PRs, 0 green). This adds a producer throttle to both builders (nightly + heavy): before the Claude run, stand the builder down if the open slice-PR backlog is at/over a cap, so runners + OAuth drain the queue instead of producing more doomed PRs. DORMANT (only acts when repo var MAX_OPEN_SLICE_PRS is a positive number; unset = no behaviour change), FAIL-OPEN (any unset/non-numeric value or gh API hiccup leaves the builder building -- can never silently halt it), and SELF-RESETTING (each run re-checks; production resumes when the backlog drains). Mirrors the existing BUILDERS_PAUSED / main-is-red stand-down pattern. Activate: gh variable set MAX_OPEN_SLICE_PRS --body 12
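The dormant and fail-open properties can be made concrete with a short Python sketch. It is not the workflow's shell logic; `should_stand_down` is a hypothetical helper, and treating an unknown backlog as "keep building" is an assumption drawn from the fail-open description.

```python
from typing import Optional


def should_stand_down(raw_cap: Optional[str], open_slice_prs: Optional[int]) -> bool:
    """Return True only when a valid positive cap is set and the backlog meets it."""
    try:
        cap = int(raw_cap)  # None raises TypeError, non-numeric raises ValueError
    except (TypeError, ValueError):
        return False  # dormant or fail-open: keep building
    if cap <= 0:
        return False  # only a positive number activates the throttle
    if open_slice_prs is None:
        return False  # backlog lookup failed: fail open
    return open_slice_prs >= cap
```

Each run re-evaluates the function, so production resumes on its own once the backlog drops below the cap.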
docs(lessons): L73-L74 — workflow-mismatch 401 + GITHUB_TOKEN self-heal limits (#174)
c993daf L73: editing a claude-code-action workflow file 401s the gate on every in-flight PR (the '31 PRs / 0 landed' day); repair is gh pr update-branch, not a re-run; needs a PAT. L74: a GITHUB_TOKEN self-heal can't list runners (use queue depth) or re-trigger gates (needs a PAT).
fix(ci): auto-heal capacity check — use queue depth, not the admin runners API (#173)
5946a51 First cycle of #170 crashed: gh api .../actions/runners returns 'Resource not accessible by integration' for the GITHUB_TOKEN (listing self-hosted runners needs admin rights it lacks), so idle became an error blob and budget went unbound. Switch the capacity signal to queued-run depth (readable with actions:read), force every value numeric so an API hiccup can't crash it, drop the in-workflow label-create (pre-created instead).
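A minimal Python sketch of "force every value numeric" follows. The helper names are hypothetical, and two rules are assumptions for illustration: an unreadable queue depth yields zero budget (the conservative choice), and the heal only acts when the queue is empty.

```python
def to_int(raw: object, default: int) -> int:
    """Coerce an API value to int; anything unparsable becomes `default`."""
    try:
        return int(str(raw).strip())
    except ValueError:
        return default


def rerun_budget(queued_raw: object, max_per_tick: int = 2) -> int:
    """Budget of re-runs for this tick, derived from queued-run depth."""
    queued = to_int(queued_raw, default=-1)  # -1 marks 'unknown'
    if queued != 0:
        return 0  # busy or unknown: do nothing this tick
    return max_per_tick
```

An error blob such as the "Resource not accessible by integration" response now degrades to a zero budget instead of crashing the run or leaving the budget unbounded.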
feat(ci): pipeline auto-heal — self-recover green-able stuck gates (hands-off drain) (#170)
abde7ce A bounded, capacity-aware scheduled workflow that re-runs the compliance-gate on slice PRs that are CI-green but whose gate flaked/cancelled, so a green-able PR never sits stuck waiting for a human to click re-run (the 2026-06-03 jam, L72).
Bounded on every axis so it never becomes the stampede it heals: acts only when a self-hosted runner is idle (never piles onto a busy fleet), max 2 re-runs/tick, gives up + labels stuck-needs-human after 3 tries, only touches CI-green PRs, and is gated by SELFHEAL_LIVE + the BUILDERS_PAUSED kill switch.
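The bounds listed above can be read as one planning function per tick. This Python sketch is illustrative only: `StuckPR` and `plan_tick` are hypothetical, and the switches are modelled as plain booleans rather than repo variables.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StuckPR:
    number: int
    ci_green: bool
    tries: int  # re-runs already attempted on this PR


def plan_tick(
    prs: List[StuckPR],
    runner_idle: bool,
    selfheal_live: bool,
    builders_paused: bool,
    max_per_tick: int = 2,
    max_tries: int = 3,
) -> Tuple[List[int], List[int]]:
    """Return (PRs to re-run, PRs to label stuck-needs-human) for one tick."""
    if not selfheal_live or builders_paused or not runner_idle:
        return [], []
    rerun: List[int] = []
    give_up: List[int] = []
    for pr in prs:
        if not pr.ci_green:
            continue  # only touch CI-green PRs
        if pr.tries >= max_tries:
            give_up.append(pr.number)
        elif len(rerun) < max_per_tick:
            rerun.append(pr.number)
    return rerun, give_up
```

Every early return leaves the fleet untouched, which is what keeps the healer from adding load to an already busy fleet.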
fix(ci): stop the compliance-gate stampede — per-PR concurrency + timeout fit (#165)
4d48005 The blocking compliance-gate (slice-pr-review.yml, 3 sequential AI lenses on the self-hosted fleet) had NO concurrency control, so every push to a PR stacked another ~36-min run without cancelling the stale one. With ~26 open slice PRs this starved the 4-runner fleet (runs queued 52-193 min, then cancelled -> no verdict -> fail-closed -> merge blocked) and drained the shared Max OAuth allowance. 0 slices shipped overnight.
- Add a per-PR concurrency group with cancel-in-progress: true so a new commit kills the superseded gate run (latest-commit-wins).
- Raise the gate job timeout 50->60 so a legitimate 3-lens+retry run isn't killed mid-verdict and false-fail-closed (the 34-50m FAILUREs).
Documented as L72 in Docs/lessons-learned.md.
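The latest-commit-wins effect can be modelled in a few lines of Python. This is a toy simulation, not the workflow configuration: `schedule` is a hypothetical function, and each push is represented as a (PR number, commit) pair.

```python
from typing import List, Tuple

Run = Tuple[int, str]  # (PR number, commit sha)


def schedule(pushes: List[Run], cancel_in_progress: bool) -> List[Run]:
    """Return the gate runs still alive after all pushes have arrived."""
    active: List[Run] = []
    for pr, sha in pushes:
        if cancel_in_progress:
            # Same concurrency group (same PR): drop the superseded run.
            active = [(p, s) for (p, s) in active if p != pr]
        active.append((pr, sha))
    return active


pushes = [(101, "a"), (101, "b"), (102, "c"), (101, "d")]
stacked = schedule(pushes, cancel_in_progress=False)  # 4 runs compete for runners
latest = schedule(pushes, cancel_in_progress=True)    # at most one run per PR
```

With per-PR cancellation the number of live gate runs is bounded by the number of open PRs rather than by the number of pushes, which is what relieves the fleet.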
feat(ci): live 'pnpm who' dashboard + Builder identity stamp on claims (#150)
9b3c4b5 no extended description in this commit.
ci(serializer): fire on check_suite completion (prompt merges; schedule was unreliable) (#145)
649d83e no extended description in this commit.
ci(bots): honor BUILDERS_PAUSED kill switch (single switch pauses all builders) (#144)
8153c12 no extended description in this commit.
ci(serializer): grant checks/statuses read for the PR pick query (#143)
5e85012 * ci(serializer): grant checks/statuses read for the PR pick query
* ci(serializer): also grant actions:read (statusCheckRollup expands workflowRun)
chore(ci): pipeline hardening — block merge-before-green & migration collisions (#141)
fe5a884 Prevents recurrence of the 2026-06-01 night-run breakage (Docs/lessons-learned
L67-L70):
- scripts/check-migrations.mjs + a fast `migrations` CI job: statically block
duplicate migration numbers and journal/file drift — the 0017 collision that
cascade-failed ~92 integration suites. Dependency-free; runs in <1s.
- main-guard.yml: run the FULL check set (migrations+typecheck+lint+test+build)
on every push to main, not lint-only — so a red main (incl. an admin bypass)
is caught + `main-broken` issue filed within minutes.
- nightly-slice.yml + heavy-slice.yml: fail-fast preflight that refuses to branch
a slice off a red main (the mechanism that propagated last night's breakage to
every open PR).
- lessons-learned.md: L67-L70.
Companion change applied out-of-band: branch protection enforce_admins=true (the
actual merge-before-green hole). The itsmx Cloudflare build is already non-required.
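The static migration check described in this commit can be approximated as below. This is a Python sketch, not the contents of scripts/check-migrations.mjs; the `NNNN_name.sql` naming pattern, the journal being a list of tags, and the name `check_migrations` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed naming convention: four-digit number, underscore, name, .sql
MIGRATION_RE = re.compile(r"^(\d{4})_(.+)\.sql$")


def check_migrations(filenames: List[str], journal_tags: List[str]) -> List[str]:
    """Return a list of problems: duplicate numbers and journal/file drift."""
    errors: List[str] = []
    by_number: Dict[str, List[str]] = defaultdict(list)
    file_tags = set()
    for name in sorted(filenames):
        match = MIGRATION_RE.match(name)
        if not match:
            continue  # ignore non-migration files
        by_number[match.group(1)].append(name)
        file_tags.add(name[: -len(".sql")])
    for number, names in sorted(by_number.items()):
        if len(names) > 1:
            errors.append(f"duplicate migration number {number}: {', '.join(names)}")
    journal = set(journal_tags)
    for tag in sorted(file_tags - journal):
        errors.append(f"file not in journal: {tag}")
    for tag in sorted(journal - file_tags):
        errors.append(f"journal entry without file: {tag}")
    return errors


files = ["0016_base.sql", "0017_tenant_integrations.sql", "0017_integration_platform.sql"]
problems = check_migrations(files, ["0016_base", "0017_tenant_integrations"])
```

The check only compares names, so it needs no database and no dependencies, consistent with the "runs in <1s" claim for the real script.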
fix(stabilize-main): restore green CI after the 2026-06-01 night run (#140)
9f16999 * fix(stabilize-main): restore green CI after the 2026-06-01 night run
The night run merged 12 slices into main with red CI (typecheck + build + test).
Root causes and fixes:
- integrations (TS2308/TS2315): SAP-01's config-carrying adapter contract and
INTG-01's manifest contract both exported `IntegrationAdapter`/`DataResidency`
via `export *`. Rename SAP's to `ConfigurableIntegrationAdapter`, single-source
`DataResidency` in ./types, and re-export ./interface explicitly.
- shared-db (TS2305): the schema barrel never exported the 6 tenant/sap-* schema
files, so @itsmx/sap could not resolve sapConnections / SAP_AUTH_MODES / etc.
Wire them into schema/index.ts. The adapter.ts `unknown->string` errors were a
cascade and clear once the tables resolve. Also drop a duplicate
tenant-integrations re-export and make tenant_integrations.created_by nullable.
- migrations: two 0017 migrations (tenant_integrations vs integration_platform)
both created tenant_integrations, breaking migratePublic() — which every
integration test runs (92 failed suites). Reconcile into a single
0017_tenant_integrations (jsonb config + integration_sync_state enum) and
renumber integration_platform -> 0018 (secrets + health events).
- web build (string->Date): the change-calendar page passed `.toISOString()`
strings to a `z.coerce.date()` input; tRPC uses superjson (preserves Date), so
pass Date objects. Null-guard tRPC `.data` on calendar/sap/rules/transports.
Treat tenant_integrations.config as jsonb (object) not a JSON string.
- router: expose SAP connection/authorization/user-mapping/SoD/transport queries.
Verified locally: pnpm typecheck / lint / build exit 0; integration suites across
attachments, audit, change, cmdb, identity, incident, customization, sap pass.
Recently landed
| feat | UX-FIX-03 | public FAQ via customization resolver | #166 | 455657f |
| feat | PERF-FIX-02 | DataTable virtualization + lazy modals | #162 | e67900e |
| feat | CMDB-10 | CMDB real-backend wiring + per-CI ticket history | #161 | 4e58c6c |
| feat | ATT-03 | SVG attachment stored-XSS hardening | #160 | f06494c |
| feat | INTG-05 | integration polling scheduler | #155 | 2e0dfa8 |
| feat | serializer | self-chain via PAT + poison-pill guard (drain without the flaky cron) | #202 | f334b1e |
| feat | PATCH-01 | patch definitions and inventory | #152 | 851b6d5 |
| feat | SET-05 | self-service user profile settings | #149 | 6b1b087 |
| fix | serializer | read live/paused via vars context - was stuck in observe mode | #201 | ca49a84 |
| feat | WAR-02 | real-time war room UI | #131 | 26b93b8 |
| ci | serializer | make compliance-gate advisory - drop it from the pick filter | #200 | 6ec0260 |
| ci | | deterministic compliance-lints (first piece of deterministic merge) | #199 | 23aac69 |
| ci | | serialize the compliance-gate (one at a time) + auto-heal L77 fix | #196 | 4bae7cf |
| feat | db | P5 - migration journal normalizer + dormant guard (collision prevention) | #194 | 19ccf73 |
Other projects
All projects: only one project so far. Add another to the registry (or the DASHBOARD_PROJECTS env) and it shows up here, with its progress, health and alerts at a glance.