21 services.
2 of them actually run tests.
A snapshot of test coverage across the platform — and a path to 100%. The good news: a mature test culture already exists in pockets. The bad news: it isn't enforced.
We aim for 100% test coverage.
Not as a vanity metric. As the floor that lets engineering ship at velocity without panic-fixing in production. Today the floor is missing. Most user-facing surfaces — mobile, web apps, the shared component library — are effectively untested, and the test suites we have written sit unrun. The first move is mechanical, not cultural: turn on the tests we already wrote.
- Coverage today is D+. 21 services audited; 2 actually run their tests in CI. 153 already-written tests sit unused.
- We're standardizing the stack: Vitest for unit + component, Playwright for web E2E, Pest for Laravel, Jest for Node + RN, Maestro for mobile E2E, pytest for Python jobs.
- Definition of done becomes the contract: tests for the change, CI green including the test job, ≥70% coverage on changed lines, observability hooks, documentation calibrated to audience, ownership in CODEOWNERS.
- 100% is the asymptote, not the deadline. The goal is fewer bugs, engineer-owned quality, refactor velocity that compounds.
- Coverage alone won't catch everything. Synthetic monitoring, load testing on the notification path, and security scanning in CI close the rest of the pilot risk.
Where the floor is.
Grades reflect the combination of tests written and tests actually executed by CI. A service with a strong suite but no CI gate gets a B. A service with no tests at all gets an F regardless of how stable it feels in production.
Mobile
1 service

| Service | Stack | Test files | CI runs tests | Grade | Notes |
|---|---|---|---|---|---|
| nova | Expo RN + Swift + Android | 0 | no | F | Build only · no test harness anywhere |
Web
4 services

| Service | Stack | Test files | CI runs tests | Grade | Notes |
|---|---|---|---|---|---|
| dashboard | React 18 + Vite | 1 | no | D | Vitest configured · CI skips it |
| admin | React 18 + Vite | 0 | no | F | — |
| pulse | React 18 + Vite | 0 | no | F | — |
| launch-agent | React + Vite (embed) | 0 | no | F | — |
Backend
9 services

| Service | Stack | Test files | CI runs tests | Grade | Notes |
|---|---|---|---|---|---|
| api | PHP 8.3 / Laravel 11 | 132 | no | B | PHPUnit + Pest · CI runs lint only |
| error-assistant | Node 20 Lambda | 17 | on deploy | A- | Tests run on deploy — reference template |
| slack | Node 20 + Bolt | 4 | no | D | — |
| agent | Node library (voice) | 1 | no | F | — |
| ai-proxy | Node Lambda (LLM proxy) | 0 | no | F | `echo "No tests yet"` |
| relay-service | Node Lambda (orchestration) | 0 | no | F | `echo "No tests yet"` |
| notification-hub | Node Lambda (Pusher) | 0 | no | F | `echo "No tests yet"` |
| meet-bot | Node stub | 0 | no | F | — |
| uploader | Express + Uppy | 0 | no | F | — |
Data
2 services

| Service | Stack | Test files | CI runs tests | Grade | Notes |
|---|---|---|---|---|---|
| glue | Python 3 jobs | 0 | no | F | 15+ destructive scripts |
| unstructured | Python + Flask | 0 | no | F | — |
Infra
4 services

| Service | Stack | Test files | CI runs tests | Grade | Notes |
|---|---|---|---|---|---|
| lambda | Node autoscale glue | 0 | no | F | — |
| kestra | Container orchestration | — | — | n/a | — |
| cdn | Static + shell | — | — | n/a | — |
| investor-deck | Static HTML | — | — | n/a | — |
Lib
3 services

| Service | Stack | Test files | CI runs tests | Grade | Notes |
|---|---|---|---|---|---|
| zunou-react | React component library | 0 | no | F | 20+ components consumed everywhere |
| zunou-queries | React Query + GraphQL hooks | 0 | no | F | Core data layer |
| zunou-graphql | GraphQL codegen | — | — | n/a | — |
What this audit changes about how we ship.
132 PHPUnit + Pest tests sit unused.
The Laravel monolith has the strongest test culture in the repo: unit, feature, and integration suites. Yet `test-api.yml` runs `make lint` only. Wiring PHPUnit into that workflow is the single highest-leverage move in this audit. It is one PR and a few hours of work, and it protects every subsequent PR from the moment it merges.
nova ships untested to user devices.
Native Swift in ios/Zunou.xcodeproj/, the RN business logic in src/hooks/, the Android build — no XCTest, no jest-native, no Detox, no Maestro. Every release is a manual-QA roll of the dice. The Tokyo pilot in §06 of the GTM writeup ships through this surface.
The shared libraries inherit zero coverage.
zunou-react and zunou-queries back dashboard, admin, and pulse. They have zero tests, so every consumer multiplies the risk of a fragile foundation. Tests on these libraries deliver the highest impact per test written.
error-assistant already does this right.
17 Jest tests covering CloudWatch ingest, deduplication, agent tools, validators. Tests run on deploy. Copy this template into ai-proxy, relay-service, and notification-hub. The plumbing is portable.
Coverage is not a vanity metric. It's the contract.
100% is the asymptote, not the deadline. Chasing it produces a codebase that ships at velocity without panic — and an engineering culture where quality is owned by the people writing the code, not delegated downstream.
Fewer bugs in production
Every line covered is one fewer way prod breaks silently. The 100% asymptote is what produces a codebase you can refactor without flinching.
Engineers own quality, not QA
There is no QA team coming to save us. The test you write is the test that protects you when the next release is at 11pm before a pilot demo. Quality is an engineering culture choice — and a hiring filter.
Tests are documentation that can't go stale
A test of how the API rate-limiter behaves under bursts is more useful — and more current — than three pages of Notion explaining the same thing.
Refactor velocity compounds
Tested code lets the next engineer cut things in half. Untested code pushes everyone toward bolted-on workarounds. The compounding shows up six months in.
One pick per layer. No re-litigation.
The standard the team writes against — chosen so engineers don't waste cycles deciding. Where a pick differs from what's installed today, the reason is in the rationale.
| Layer | Pick | Why this | Applies to |
|---|---|---|---|
| Unit | Vitest | Vite-native, jest-compatible API, fastest feedback loop. Already configured in dashboard — promote to repo standard for every TS/JS package. | dashboard · admin · pulse · launch-agent · zunou-react · zunou-queries · all Node Lambdas |
| Unit | Pest (built on PHPUnit) | More expressive than vanilla PHPUnit; the Laravel monolith already has 132 tests. Standardize on Pest for new tests; keep PHPUnit for legacy. | services/api (Laravel) |
| Unit | pytest | Industry standard. Critical for the destructive scripts in services/glue — every one rewrites production tables. | services/glue · services/unstructured |
| Component | Vitest + React Testing Library | Library-level component tests with a real DOM. Highest leverage point — zunou-react and zunou-queries back every web app. | zunou-react · zunou-queries · dashboard components |
| Component | Jest + RN Testing Library | RN's official combo. Test the business-logic hooks in nova/src/hooks before touching native. | services/nova (RN business logic) |
| Integration | Pest feature tests + Testcontainers | Real Postgres + real S3 (LocalStack) hitting the GraphQL surface. Where contract drift between API and clients gets caught. | services/api (cross-resolver flows) |
| E2E | Playwright | One tool, three runtimes (Chromium · Firefox · WebKit). Already the consensus 2026 pick. Replaces a Cypress decision before it's made. | dashboard · admin · pulse · launch-agent |
| Mobile E2E | Maestro | YAML flows, runs on real Apple silicon and Android, no native build required. Better DX than Detox for our small team. | services/nova (iOS + Android) |
| Visual | Playwright + screenshot diff | Catches design regressions on every PR. Skip until E2E exists; layer on top once it does. | dashboard (later: nova via Maestro snapshots) |
| Contract | GraphQL Codegen + spectaql | Already partly in place (zunou-graphql, spectaql.config.yml). Failing CI on unannounced schema breaks closes a real silent-failure path. | services/api ↔ all GraphQL clients |
Standardizing the picks now means the next engineer joining doesn't have to learn three slightly different test runners. Consistency > novelty.
What "done" means before a PR can merge.
Six requirements. Posted in CONTRIBUTING.md, enforced where automation can help, owned by the engineer where it can't. The point isn't bureaucracy — it's that "done" needs to mean the same thing to every person on the team, every time.
Tests cover the change
Net-new code has unit tests. Behavioral changes have an integration test. UI changes have a Playwright (web) or Maestro (mobile) flow. PRs without tests for changed lines need an explicit waiver in the description.
CI is green — including the test job
Lint passes. Type-check passes. The test suite runs and passes. No 'green checkmark' from a workflow that just printed 'No tests yet'.
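A minimal sketch of how a CI step could refuse the placeholder, written in Python for illustration. It assumes the placeholder lives in the `scripts.test` entry of a service's `package.json`, which this audit does not confirm; `has_real_test_script` and the `PLACEHOLDER` pattern are hypothetical names, not existing tooling.

```python
import json
import re

# Assumption: a "placeholder" test script only echoes a quoted message,
# optionally followed by `&& exit 0`, e.g. `echo "No tests yet"`.
PLACEHOLDER = re.compile(r"""^\s*echo\s+(["']).*?\1\s*(&&\s*exit\s+0\s*)?$""")


def has_real_test_script(package_json_text: str) -> bool:
    """Return True only if package.json defines a non-placeholder test script."""
    scripts = json.loads(package_json_text).get("scripts", {})
    test_cmd = scripts.get("test", "")
    if not test_cmd.strip():
        return False  # no test script at all
    return PLACEHOLDER.match(test_cmd) is None


if __name__ == "__main__":
    placeholder = json.dumps({"scripts": {"test": 'echo "No tests yet"'}})
    real = json.dumps({"scripts": {"test": "vitest run"}})
    print(has_real_test_script(placeholder), has_real_test_script(real))
```

A check like this only proves a test command exists. It says nothing about whether the suite is meaningful, which is what the changed-line coverage gate is for.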
Coverage on changed lines ≥ 70%
We don't gate on whole-repo coverage (vanity). We gate on changed lines so every PR pulls the floor up. Set the threshold low enough that humans accept it, high enough that it bites.
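The arithmetic behind a changed-lines gate can be sketched as follows. This is an illustration under assumptions, not the team's tooling: the inputs are taken to be already parsed (changed executable lines from the PR diff, executed lines from the coverage report), and the function names are hypothetical.

```python
from typing import Dict, Set


def changed_line_coverage(
    changed: Dict[str, Set[int]], covered: Dict[str, Set[int]]
) -> float:
    """Fraction of changed executable lines that the test run executed.

    changed: file path -> line numbers touched by the PR
    covered: file path -> line numbers the coverage tool reported as executed
    """
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 1.0  # nothing measurable changed, so nothing to gate
    hit = sum(
        len(lines & covered.get(path, set())) for path, lines in changed.items()
    )
    return hit / total


def passes_gate(
    changed: Dict[str, Set[int]],
    covered: Dict[str, Set[int]],
    threshold: float = 0.70,  # the 70% floor from the definition of done
) -> bool:
    return changed_line_coverage(changed, covered) >= threshold


if __name__ == "__main__":
    # Hypothetical PR: 10 changed lines across two files, 7 of them executed.
    changed = {"src/a.ts": {10, 11, 12, 13}, "src/b.ts": {5, 6, 7, 8, 9, 20}}
    covered = {"src/a.ts": {1, 2, 10, 11, 12}, "src/b.ts": {5, 6, 7, 8}}
    print(changed_line_coverage(changed, covered), passes_gate(changed, covered))
```

Because only changed lines count, a large untested legacy file does not block a small, well-tested fix inside it.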
Observability hooks present
User-facing changes ship with a Sentry tag, a structured log line, and (where appropriate) a CloudWatch metric. If it breaks in prod, we want to know without a customer telling us.
Documented at the level it deserves
Library API: docstring + example. Migration: a README in the migration. Feature flag: a single line in CHANGELOG. Internal helper: nothing unless non-obvious. Documentation is part of done — just calibrated to audience.
Owned, not orphaned
Every merged PR touches an area with a single named owner in CODEOWNERS. If we can't answer 'who owns this when it breaks at 3am' at merge time, the PR isn't done.
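The ownership rule lends itself to automation. The sketch below is a deliberately simplified illustration: it supports only `*` and directory-prefix patterns (real CODEOWNERS files accept gitignore-style globs), it treats "single name" as exactly one owner entry, and the owner handles and function names are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]


def parse_codeowners(text: str) -> List[Rule]:
    """Parse CODEOWNERS text into (pattern, owners) rules, skipping comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path: str, rules: List[Rule]) -> List[str]:
    """Owners of a path. The last matching rule wins, as in GitHub's CODEOWNERS.

    Simplification: a pattern is either `*` or a directory prefix ending in `/`.
    """
    matched: List[str] = []
    for pattern, owners in rules:
        if pattern == "*" or path.startswith(pattern.lstrip("/")):
            matched = owners
    return matched


def unowned_paths(paths: List[str], rules: List[Rule]) -> List[str]:
    """Touched paths that do not resolve to exactly one owner."""
    return [p for p in paths if len(owners_for(p, rules)) != 1]


if __name__ == "__main__":
    rules = parse_codeowners(
        "* @platform-team\n/services/api/ @alice\n/services/glue/\n"
    )
    touched = ["services/api/app/User.php", "services/glue/job.py"]
    print(unowned_paths(touched, rules))
```

A pattern listed with no owners counts as unowned here, so an area cannot pass the check by merely appearing in the file.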
What 100% coverage still won't catch.
A green test suite can still ship a regression in production timing, in payload shape, or in an unscanned dependency. These are the layers above unit tests — listed by what closes the most pilot risk for the least effort.
Synthetic monitoring on critical user paths
Until the E2E suite is fast enough to run continuously, a synthetic check (Checkly or similar) that exercises login, send-message, and schedule-meeting every 5 minutes is the cheapest insurance. It pages on-call when prod breaks, ideally before the first user notices.
Load testing on the notification path
Run k6 or similar against a staging tenant: burst 500 messages into a 200-person channel and measure delivery latency against the SLO. We currently don't know our actual ceiling. See Foundation §02.
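To size that burst: if every message fans out to every channel member, 500 messages into a 200-person channel means 100,000 deliveries. The sketch below shows that arithmetic and one way to judge the recorded latencies. The 95th percentile and the 2,000 ms budget are placeholder assumptions (no SLO figure is stated here), and the function names are hypothetical.

```python
import math
from typing import Sequence


def fan_out(messages: int, members: int) -> int:
    """Deliveries produced when every message reaches every member."""
    return messages * members


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples (pct between 1 and 100)."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(
    latencies_ms: Sequence[float],
    pct: int = 95,          # assumption: judge on p95
    budget_ms: float = 2000,  # assumption: placeholder budget, not a real SLO
) -> bool:
    return percentile(latencies_ms, pct) <= budget_ms


if __name__ == "__main__":
    print(fan_out(500, 200))
    # Hypothetical run: 95 fast deliveries and 5 slow ones.
    print(meets_slo([100] * 95 + [5000] * 5))
```

Judging on a percentile rather than the mean keeps a handful of very slow deliveries from hiding behind thousands of fast ones.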
Security scanning in CI
Semgrep + npm audit + composer audit + Trivy on container images. Catches the JWT-bypass class of bug before it merges, not after.
Mutation testing on the libraries
Stryker (Vitest) on zunou-react and zunou-queries. Reveals tests that pass without exercising the code — the worst kind of green checkmark. Add only after the libraries reach >50% line coverage.
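What a mutation tool looks for can be shown in a few lines. This is a language-agnostic illustration written in Python, not Stryker usage: `clamp` and its hand-written mutant are hypothetical, and a real tool generates the mutants automatically.

```python
def clamp(value, low, high):
    """Original implementation: keep value within [low, high]."""
    return max(low, min(value, high))


def clamp_mutant(value, low, high):
    """Mutant: the upper bound has been dropped."""
    return max(low, value)


def weak_test(fn) -> bool:
    # Only checks a value that is already inside the range.
    return fn(5, 0, 10) == 5


def strong_test(fn) -> bool:
    # Also checks both boundaries, so a missing bound is detected.
    return fn(5, 0, 10) == 5 and fn(-3, 0, 10) == 0 and fn(42, 0, 10) == 10


if __name__ == "__main__":
    # The weak test passes on the mutant: the mutant "survives".
    print(weak_test(clamp), weak_test(clamp_mutant))
    # The strong test fails on the mutant: the mutant is "killed".
    print(strong_test(clamp), strong_test(clamp_mutant))
```

Both tests execute every line of `clamp`, so line coverage cannot tell them apart. Only the surviving mutant exposes the weak one.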
Snapshot tests on push payloads
Notification payloads are hard to debug in production. A snapshot per notification type (mention / task / meeting / digest) catches PII regressions and shape drift.
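One way to sketch such a snapshot is to store the payload's shape (keys and value types) rather than its values, then flag anything new. The payload fields below are invented for illustration and do not reflect the real notification schema; `shape_of` and `added_keys` are hypothetical helpers.

```python
def shape_of(payload):
    """Reduce a payload to its structure: keys and value type names, recursively."""
    if isinstance(payload, dict):
        return {key: shape_of(value) for key, value in sorted(payload.items())}
    if isinstance(payload, list):
        # Assumption: lists are homogeneous, so the first element is representative.
        return [shape_of(payload[0])] if payload else []
    return type(payload).__name__


def added_keys(snapshot, current, prefix=""):
    """Dotted paths of keys present in the current shape but not in the snapshot."""
    added = []
    if isinstance(snapshot, dict) and isinstance(current, dict):
        for key, value in current.items():
            dotted = f"{prefix}{key}"
            if key not in snapshot:
                added.append(dotted)
            else:
                added.extend(added_keys(snapshot[key], value, dotted + "."))
    return added


if __name__ == "__main__":
    # Stored snapshot for a hypothetical "mention" notification.
    snapshot = shape_of({"type": "mention", "channel_id": 7, "actor": {"id": 1}})
    # A later change leaks the actor's email into the push payload.
    current = shape_of(
        {"type": "mention", "channel_id": 7, "actor": {"id": 1, "email": "a@example.com"}}
    )
    print(current == snapshot, added_keys(snapshot, current))
```

Comparing shapes instead of values keeps the snapshot stable across test data changes while still failing when a field is added, removed, or changes type.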
Where this gets us
Quality is owned by engineers. 100% is the asymptote.
The stack is picked. The Definition of Done is written. The principle that quality is engineering's responsibility, not a downstream gate, is the only part that needs leadership consent. Everything else is mechanics, and mechanics are tractable.
Untested code is a liability. Tested code is leverage.