
Quality · Test coverage audit
Marco Avila · 2026-05-12 · pulled from zunou-services @ main

21 services.
2 of them actually run tests.

A snapshot of test coverage across the platform — and a path to 100%. The good news: a mature test culture already exists in pockets. The bad news: it isn't enforced.

The principle

We aim for 100% test coverage.

Not as a vanity metric. As the floor that lets engineering ship at velocity without panic-fixing in production. Today the floor is missing. Most user-facing surfaces — mobile, web apps, the shared component library — are effectively untested, and the test suites we have written sit unrun. The first move is mechanical, not cultural: turn on the tests we already wrote.

Overall posture

D+

Mature pockets, no enforcement

Services testing in CI

1 / 19

Only error-assistant runs its suite on deploy

Test files (already written)

155

Mostly in api (132) and error-assistant (17)

E2E automation

No Playwright / Cypress / Detox / Maestro anywhere

Executive summary

5 bullets · 30 seconds

Coverage today grades at D+. Of the 19 services with testable code on the scoreboard, only error-assistant runs its suite in CI. The other 138 of the 155 already-written test files sit unrun.
We're standardizing the stack: Vitest for unit + component, Playwright for web E2E, Pest for Laravel, Jest for Node + RN, Maestro for mobile E2E, pytest for Python jobs.
Definition of done becomes the contract: tests for the change, CI green including the test job, ≥70% coverage on changed lines, observability hooks, ownership in CODEOWNERS.
100% is the asymptote, not the deadline. The goal is fewer bugs, engineer-owned quality, refactor velocity that compounds.
Coverage alone won't catch everything. Synthetic monitoring, load testing on the notification path, and security scanning in CI close the rest of the pilot risk.

01 Scoreboard

Where the floor is.

Grades reflect the combination of tests written and tests actually executed by CI. A service with a strong suite but no CI gate gets a B. A service with no tests at all gets an F regardless of how stable it feels in production.

Mobile

1 service

Service	Stack	Test files	CI runs tests	Grade	Notes
nova	Expo RN + Swift + Android	0	no	F	Build only · no test harness anywhere

Web

4 services

Service	Stack	Test files	CI runs tests	Grade	Notes
dashboard	React 18 + Vite	1	no	D	Vitest configured · CI skips it
admin	React 18 + Vite	0	no	F	—
pulse	React 18 + Vite	0	no	F	—
launch-agent	React + Vite (embed)	0	no	F	—

Backend

9 services

Service	Stack	Test files	CI runs tests	Grade	Notes
api	PHP 8.3 / Laravel 11	132	no	B	PHPUnit + Pest · CI runs lint only
error-assistant	Node 20 Lambda	17	on deploy	A-	Tests run on deploy — reference template
slack	Node 20 + Bolt	4	no	D	—
agent	Node library (voice)	1	no	F	—
ai-proxy	Node Lambda (LLM proxy)	0	no	F	`echo "No tests yet"`
relay-service	Node Lambda (orchestration)	0	no	F	`echo "No tests yet"`
notification-hub	Node Lambda (Pusher)	0	no	F	`echo "No tests yet"`
meet-bot	Node stub	0	no	F	—
uploader	Express + Uppy	0	no	F	—

Data

2 services

Service	Stack	Test files	CI runs tests	Grade	Notes
glue	Python 3 jobs	0	no	F	15+ destructive scripts
unstructured	Python + Flask	0	no	F	—

Infra

4 services

Service	Stack	Test files	CI runs tests	Grade	Notes
lambda	Node autoscale glue	0	no	F	—
kestra	Container orchestration	—	—	n/a	—
cdn	Static + shell	—	—	n/a	—
investor-deck	Static HTML	—	—	n/a	—

Lib

3 services

Service	Stack	Test files	CI runs tests	Grade	Notes
zunou-react	React component library	0	no	F	20+ components consumed everywhere
zunou-queries	React Query + GraphQL hooks	0	no	F	Core data layer
zunou-graphql	GraphQL codegen	—	—	n/a	—

02 The four facts

What this audit changes about how we ship.

01 · The biggest unforced error

132 PHPUnit + Pest tests sit unused.

The Laravel monolith has the strongest test culture in the repo: unit, feature, and integration suites. Yet test-api.yml runs only make lint. Wiring PHPUnit into that workflow is the single highest-leverage move in this audit: one PR and a few hours of work, and every subsequent PR is protected.

02 · The biggest blast radius

nova ships untested to user devices.

Native Swift in ios/Zunou.xcodeproj/, the RN business logic in src/hooks/, the Android build — no XCTest, no jest-native, no Detox, no Maestro. Every release is a manual-QA roll of the dice. The Tokyo pilot in §06 of the GTM writeup ships through this surface.

03 · The compounding gap

The shared libraries inherit zero coverage.

zunou-react and zunou-queries back dashboard, admin, and pulse, and they have zero tests. Every untested consumer multiplies the risk of a fragile foundation. Tests on these libraries deliver the highest impact per test written.

04 · The reference template

error-assistant already does this right.

17 Jest tests cover CloudWatch ingest, deduplication, agent tools, and validators. Tests run on deploy. Copy this template into ai-proxy, relay-service, and notification-hub. The plumbing is portable.

03 Why 100%

Coverage is not a vanity metric. It's the contract.

100% is the asymptote, not the deadline. Chasing it produces a codebase that ships at velocity without panic — and an engineering culture where quality is owned by the people writing the code, not delegated downstream.

Reason 01

Fewer bugs in production

Every line covered is one fewer way prod breaks silently. The 100% asymptote is what produces a codebase you can refactor without flinching.

Reason 02

Engineers own quality, not QA

There is no QA team coming to save us. The test you write is the test that protects you when the next release is at 11pm before a pilot demo. Quality is an engineering culture choice — and a hiring filter.

Reason 03

Tests are documentation that can't go stale

A test of how the API rate-limiter behaves under bursts is more useful — and more current — than three pages of Notion explaining the same thing.
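
As an illustration of that kind of test, here is a minimal sketch in Python. The `TokenBucket` class, its parameters, and the numbers are hypothetical stand-ins, not the API's actual rate limiter. The point is only that the test states the burst behavior in a form that fails the moment the behavior changes.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical limiter: holds up to `capacity` tokens, refilled over time."""

    capacity: int
    refill_per_sec: float
    tokens: float = 0.0
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = float(self.capacity)  # the bucket starts full

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def test_burst_is_capped_then_recovers() -> None:
    bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
    # Ten requests at the same instant: only the first five are admitted.
    burst = [bucket.allow(now=0.0) for _ in range(10)]
    assert burst == [True] * 5 + [False] * 5
    # Two seconds later, two tokens have refilled, so two of three pass.
    later = [bucket.allow(now=2.0) for _ in range(3)]
    assert later == [True, True, False]
```

Passing `now` in explicitly, rather than reading the clock inside the limiter, keeps the test deterministic.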

Reason 04

Refactor velocity compounds

Tested code lets the next engineer cut things in half. Untested code pushes everyone toward bolted-on workarounds. The compounding shows up six months in.

04 Test stack

One pick per layer. No re-litigation.

The standard the team writes against — chosen so engineers don't waste cycles deciding. Where a pick differs from what's installed today, the reason is in the rationale.

Layer	Pick	Why this	Applies to
Unit	Vitest	Vite-native, jest-compatible API, fastest feedback loop. Already configured in dashboard — promote to repo standard for every TS/JS package.	dashboard · admin · pulse · launch-agent · zunou-react · zunou-queries · all Node Lambdas
Unit	Pest (built on PHPUnit)	More expressive than vanilla PHPUnit; the Laravel monolith already has 132 tests. Standardize on Pest for new tests; keep PHPUnit for legacy.	services/api (Laravel)
Unit	pytest	Industry standard. Critical for the destructive scripts in services/glue — every one rewrites production tables.	services/glue · services/unstructured
Component	Vitest + React Testing Library	Library-level component tests with a real DOM. Highest leverage point — zunou-react and zunou-queries back every web app.	zunou-react · zunou-queries · dashboard components
Component	Jest + RN Testing Library	RN's official combo. Test the business-logic hooks in nova/src/hooks before touching native.	services/nova (RN business logic)
Integration	Pest feature tests + Testcontainers	Real Postgres + real S3 (LocalStack) hitting the GraphQL surface. Where contract drift between API and clients gets caught.	services/api (cross-resolver flows)
E2E	Playwright	One tool, three runtimes (Chromium · Firefox · WebKit). Already the consensus 2026 pick. Replaces a Cypress decision before it's made.	dashboard · admin · pulse · launch-agent
Mobile E2E	Maestro	YAML flows, runs on real Apple silicon and Android, no native build required. Better DX than Detox for our small team.	services/nova (iOS + Android)
Visual	Playwright + screenshot diff	Catches design regressions on every PR. Skip until E2E exists; layer on top once it does.	dashboard (later: nova via Maestro snapshots)
Contract	GraphQL Codegen + spectaql	Already partly in place (zunou-graphql, spectaql.config.yml). Failing CI on unannounced schema breaks closes a real silent-failure path.	services/api ↔ all GraphQL clients

Standardizing the picks now means the next engineer joining doesn't have to learn three slightly different test runners. Consistency > novelty.
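
For the pytest row, a sketch of what a first test on a destructive job could look like. Everything here is assumed for illustration: the `events` table, the `purge_stale_rows` function, and the `dry_run` flag are hypothetical, and an in-memory SQLite database stands in for the real store. The scripts in services/glue may be shaped quite differently.

```python
import sqlite3


def purge_stale_rows(conn: sqlite3.Connection, cutoff: str, dry_run: bool = True) -> int:
    """Hypothetical destructive job: delete events created before `cutoff`.

    Returns the number of rows that were (or, in a dry run, would be) deleted.
    """
    count = conn.execute(
        "SELECT COUNT(*) FROM events WHERE created_at < ?", (cutoff,)
    ).fetchone()[0]
    if not dry_run:
        conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
        conn.commit()
    return count


def make_db() -> sqlite3.Connection:
    """Build a throwaway database with three rows, two of them stale."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany(
        "INSERT INTO events (created_at) VALUES (?)",
        [("2025-01-01",), ("2025-06-01",), ("2026-01-01",)],
    )
    return conn


def remaining(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def test_dry_run_deletes_nothing() -> None:
    conn = make_db()
    assert purge_stale_rows(conn, "2025-12-31") == 2  # reports what it would delete
    assert remaining(conn) == 3                       # but leaves the table intact


def test_real_run_deletes_only_stale_rows() -> None:
    conn = make_db()
    assert purge_stale_rows(conn, "2025-12-31", dry_run=False) == 2
    assert remaining(conn) == 1
```

Defaulting `dry_run` to True is a design choice in this sketch: the destructive path has to be requested explicitly.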

05 Definition of done

What "done" means before a PR can merge.

Six requirements. Posted in CONTRIBUTING.md, enforced where automation can help, owned by the engineer where it can't. The point isn't bureaucracy — it's that "done" needs to mean the same thing to every person on the team, every time.

Tests cover the change

Net-new code has unit tests. Behavioral changes have an integration test. UI changes have a Playwright (web) or Maestro (mobile) flow. PRs without tests for changed lines need an explicit waiver in the description.

CI is green — including the test job

Lint passes. Type-check passes. The test suite runs and passes. No 'green checkmark' from a workflow that just printed 'No tests yet'.
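
One way to catch that false green mechanically is a small pre-check over each package.json. The sketch below is an assumption about how such a check might look, not an existing script in the repo. The placeholder patterns it recognizes (`echo`, `true`, `exit 0`) are a guess at common no-op commands.

```python
import json
import re

# Commands that run nothing. Assumed for this sketch, not exhaustive.
PLACEHOLDER = re.compile(r"^\s*(echo\b.*|true|exit 0)\s*$", re.IGNORECASE)


def is_placeholder_test_script(package_json_text: str) -> bool:
    """Return True when package.json has no `test` script that runs real tests."""
    scripts = json.loads(package_json_text).get("scripts", {})
    test_cmd = scripts.get("test")
    if not test_cmd:
        return True  # a missing script means nothing would run
    # Split chained commands; the script is a placeholder only if every part is.
    parts = [p for p in re.split(r"&&|;", test_cmd) if p.strip()]
    return all(PLACEHOLDER.match(p) for p in parts)
```

A CI step could fail the build whenever this returns True for a service that is expected to have tests.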

Coverage on changed lines ≥ 70%

We don't gate on whole-repo coverage (vanity). We gate on changed lines so every PR pulls the floor up. Set the threshold low enough that humans accept it, high enough that it bites.
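
The arithmetic behind a changed-lines gate is simple. The sketch below assumes the changed, executable, and covered line numbers per file have already been extracted from the diff and the coverage report. The function names and input shape are illustrative; in practice a coverage tool would supply this data.

```python
def changed_line_coverage(
    changed: dict[str, set[int]],
    executable: dict[str, set[int]],
    covered: dict[str, set[int]],
) -> float:
    """Percent of changed executable lines that tests actually hit."""
    total = 0
    hit = 0
    for path, lines in changed.items():
        # Ignore changed lines that hold no executable code (comments, blanks).
        relevant = lines & executable.get(path, set())
        total += len(relevant)
        hit += len(relevant & covered.get(path, set()))
    # A PR that changes no executable lines has nothing to cover.
    return 100.0 if total == 0 else 100.0 * hit / total


def passes_gate(percent: float, threshold: float = 70.0) -> bool:
    """Apply the changed-lines threshold (70% per the definition of done)."""
    return percent >= threshold
```

For example, a PR that changes ten executable lines and covers seven of them scores exactly 70% and passes. Covering six scores 60% and fails.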

Observability hooks present

User-facing changes ship with a Sentry tag, a structured log line, and (where appropriate) a CloudWatch metric. If it breaks in prod, we want to know without a customer telling us.

Documented at the level it deserves

Library API: docstring + example. Migration: a README in the migration. Feature flag: a single line in CHANGELOG. Internal helper: nothing unless non-obvious. Documentation is part of done — just calibrated to audience.

Owned, not orphaned

Every merged PR has a single named owner in CODEOWNERS for the touched area. If we can't answer 'who owns this when it breaks at 3am' at merge time, the PR isn't done.
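
The ownership rule can be checked automatically. The sketch below is deliberately simplified and should be read as an assumption, not a full CODEOWNERS implementation: it handles only the `*` catch-all and directory-prefix patterns, and it applies the last matching rule, which is how GitHub resolves CODEOWNERS. The owner handles in the usage example are made up.

```python
Rule = tuple[str, list[str]]


def parse_codeowners(text: str) -> list[Rule]:
    """Parse CODEOWNERS text into (pattern, owners) pairs, skipping comments."""
    rules: list[Rule] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path: str, rules: list[Rule]) -> list[str]:
    """Owners of `path`; the last matching rule wins.

    Simplification: only `*` and directory prefixes such as `/services/api/`
    are understood, not the full glob syntax.
    """
    matched: list[str] = []
    for pattern, owners in rules:
        if pattern == "*" or path.startswith(pattern.lstrip("/")):
            matched = owners
    return matched


def paths_without_single_owner(paths: list[str], rules: list[Rule]) -> list[str]:
    """Touched paths that have zero owners or more than one."""
    return [p for p in paths if len(owners_for(p, rules)) != 1]
```

Given the rules `* @lead`, `/services/api/ @alice`, and `/services/nova/ @bob @carol`, a PR touching files under services/nova would be flagged because two names share the area.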

06 Beyond raw coverage

What 100% coverage still won't catch.

A green test suite can still ship a regression in production timing, in payload shape, or in an unscanned dependency. These are the layers above unit tests — listed by what closes the most pilot risk for the least effort.

Synthetic monitoring on critical user paths

Until the E2E suite is fast enough to run continuously, a Checkly check (or similar) that hits login, send-message, and schedule-meeting every 5 minutes is the cheapest insurance. It pages on-call as soon as prod breaks, before the first user notices.

high

Load testing on the notification path

k6 or similar against a staging tenant — burst 500 messages to a 200-person channel and measure delivery SLO. Currently we don't know our actual ceiling. See Foundation §02.

high
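
The scenario above fans out to 500 × 200 = 100,000 deliveries. Here is a sketch of how the result of such a run could be judged, assuming a p95 latency budget of 2,000 ms and a nearest-rank percentile. The budget is a placeholder, since no SLO value is stated here, and the function names are illustrative.

```python
import math


def fan_out(messages: int, recipients: int) -> int:
    """Total deliveries produced by a burst to one channel."""
    return messages * recipients


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of `samples` (pct in the range 0 to 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def meets_slo(latencies_ms: list[float], pct: float = 95.0, budget_ms: float = 2000.0) -> bool:
    """True when the chosen percentile of delivery latency is within budget."""
    return percentile(latencies_ms, pct) <= budget_ms
```

Judging on a percentile rather than the mean keeps a handful of very slow deliveries from hiding behind a fast majority, and vice versa.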

Security scanning in CI

Semgrep + npm audit + composer audit + Trivy on container images. Catches the JWT-bypass class of bug before it merges, not after.

high

Mutation testing on the libraries

Stryker (Vitest) on zunou-react and zunou-queries. Reveals tests that pass without exercising the code — the worst kind of green checkmark. Add only after the libraries reach >50% line coverage.

medium
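
To make the idea concrete, here is a hand-rolled illustration in Python of what a mutation tool does automatically. The function under test and both tests are hypothetical. A mutant flips `>` to `>=`; the weak test passes against both versions (the mutant survives), while the boundary test fails against the mutant (the mutant is killed).

```python
from typing import Callable

Check = Callable[[int, int], bool]


def is_over_limit(count: int, limit: int) -> bool:
    """Original code under test."""
    return count > limit


def is_over_limit_mutant(count: int, limit: int) -> bool:
    """Mutant: the comparison `>` has been changed to `>=`."""
    return count >= limit


def weak_test(fn: Check) -> bool:
    """Never probes the boundary, so it cannot tell the two versions apart."""
    return fn(10, 5) is True and fn(1, 5) is False


def strong_test(fn: Check) -> bool:
    """Adds the boundary case count == limit, which the mutant gets wrong."""
    return weak_test(fn) and fn(5, 5) is False
```

A surviving mutant is the signal that a test is green without pinning down the behavior it claims to cover.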

Snapshot tests on push payloads

Notification payloads are hard to debug in production. A snapshot per notification type (mention / task / meeting / digest) catches PII regressions and shape drift.

medium
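
A sketch of one such snapshot check, written in Python for illustration. The payload shape, the `build_mention_payload` function, and the list of forbidden keys are all assumptions; the real payloads and PII fields may differ. The test compares a canonical JSON rendering against a stored string and separately scans for PII keys at any depth.

```python
import json

# Keys treated as PII for this sketch (assumed list).
FORBIDDEN_KEYS = {"email", "phone"}

# Stored snapshot for the "mention" notification type (hypothetical shape).
MENTION_SNAPSHOT = (
    '{"channel": "general", "message_id": "m-1", '
    '"title": "Aiko mentioned you", "type": "mention"}'
)


def build_mention_payload(actor: str, channel: str, message_id: str) -> dict:
    """Hypothetical builder for a mention push payload."""
    return {
        "type": "mention",
        "title": f"{actor} mentioned you",
        "channel": channel,
        "message_id": message_id,
    }


def canonical(payload: dict) -> str:
    """Render with sorted keys so key order never causes a false diff."""
    return json.dumps(payload, sort_keys=True)


def pii_keys(payload: object) -> set[str]:
    """Collect forbidden keys found anywhere in a nested payload."""
    found: set[str] = set()
    stack = [payload]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in FORBIDDEN_KEYS:
                    found.add(key)
                stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
    return found


def test_mention_payload_matches_snapshot() -> None:
    payload = build_mention_payload("Aiko", "general", "m-1")
    assert canonical(payload) == MENTION_SNAPSHOT  # shape drift fails here
    assert pii_keys(payload) == set()              # PII regressions fail here
```

Keeping the PII scan separate from the snapshot comparison means a deliberate snapshot update cannot silently accept a new PII field.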

Where this gets us

Quality is owned by engineers. 100% is the asymptote.

The stack is picked. The Definition of Done is written. The principle — quality is engineering's responsibility, not a downstream gate — is the only one that needs leadership consent. Everything else is mechanics, and mechanics are tractable.

Untested code is a liability. Tested code is leverage.
