Quality · Test coverage audit
Marco Avila · 2026-05-12 · pulled from zunou-services @ main

23 services.
1 of them actually runs its tests in CI.

A snapshot of test coverage across the platform — and a path to 100%. The good news: a mature test culture already exists in pockets. The bad news: it isn't enforced.

The principle

We aim for 100% test coverage.

Not as a vanity metric. As the floor that lets engineering ship at velocity without panic-fixing in production. Today the floor is missing. Most user-facing surfaces — mobile, web apps, the shared component library — are effectively untested, and the test suites we have written sit unrun. The first move is mechanical, not cultural: turn on the tests we already wrote.

Services testing in CI: 1 / 19. Only error-assistant runs its suite on deploy.
Test files (already written): 155. Mostly in api (132) and error-assistant (17).
E2E automation: 0. No Playwright / Cypress / Detox / Maestro anywhere.
Executive summary 5 bullets · 30 seconds
  1. Coverage today is D+. The scoreboard lists 23 services, 19 of them testable; only 1 actually runs its tests in CI. 138 already-written test files sit unrun.
  2. We're standardizing the stack: Vitest for unit + component across TS/JS (including the Node Lambdas), Pest for Laravel, Jest + RN Testing Library for React Native, pytest for Python jobs, Playwright for web E2E, Maestro for mobile E2E.
  3. Definition of done becomes the contract: tests for the change, CI green including the test job, ≥70% coverage on changed lines, observability hooks, documentation calibrated to audience, ownership in CODEOWNERS.
  4. 100% is the asymptote, not the deadline. The goal is fewer bugs, engineer-owned quality, refactor velocity that compounds.
  5. Coverage alone won't catch everything. Synthetic monitoring, load testing on the notification path, and security scanning in CI close the rest of the pilot risk.

01 Scoreboard

Where the floor is.

Grades reflect the combination of tests written and tests actually executed by CI. A service with a strong suite but no CI gate gets a B. A service with no tests at all gets an F regardless of how stable it feels in production.

Mobile

1 service
| Service | Stack | Test files | CI runs tests | Grade | Notes |
| --- | --- | --- | --- | --- | --- |
| nova | Expo RN + Swift + Android | 0 | no | F | Build only · no test harness anywhere |

Web

4 services
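| Service | Stack | Test files | CI runs tests | Grade | Notes |
| --- | --- | --- | --- | --- | --- |
| dashboard | React 18 + Vite | 1 | no | D | Vitest configured · CI skips it |
| admin | React 18 + Vite | 0 | no | F | |
| pulse | React 18 + Vite | 0 | no | F | |
| launch-agent | React + Vite (embed) | 0 | no | F | |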

Backend

9 services
| Service | Stack | Test files | CI runs tests | Grade | Notes |
| --- | --- | --- | --- | --- | --- |
| api | PHP 8.3 / Laravel 11 | 132 | no | B | PHPUnit + Pest · CI runs lint only |
| error-assistant | Node 20 Lambda | 17 | on deploy | A- | Tests run on deploy; reference template |
| slack | Node 20 + Bolt | 4 | no | D | |
| agent | Node library (voice) | 1 | no | F | |
| ai-proxy | Node Lambda (LLM proxy) | 0 | no | F | `echo "No tests yet"` |
| relay-service | Node Lambda (orchestration) | 0 | no | F | `echo "No tests yet"` |
| notification-hub | Node Lambda (Pusher) | 0 | no | F | `echo "No tests yet"` |
| meet-bot | Node stub | 0 | no | F | |
| uploader | Express + Uppy | 0 | no | F | |

Data

2 services
| Service | Stack | Test files | CI runs tests | Grade | Notes |
| --- | --- | --- | --- | --- | --- |
| glue | Python 3 jobs | 0 | no | F | 15+ destructive scripts |
| unstructured | Python + Flask | 0 | no | F | |

Infra

4 services
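| Service | Stack | Test files | CI runs tests | Grade | Notes |
| --- | --- | --- | --- | --- | --- |
| lambda | Node autoscale glue | 0 | no | F | |
| kestra | Container orchestration | | | n/a | |
| cdn | Static + shell | | | n/a | |
| investor-deck | Static HTML | | | n/a | |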

Lib

3 services
| Service | Stack | Test files | CI runs tests | Grade | Notes |
| --- | --- | --- | --- | --- | --- |
| zunou-react | React component library | 0 | no | F | 20+ components consumed everywhere |
| zunou-queries | React Query + GraphQL hooks | 0 | no | F | Core data layer |
| zunou-graphql | GraphQL codegen | | | n/a | |
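
The counts quoted in this audit can be re-derived from the scoreboard rows. A minimal sketch, with the rows transcribed by hand from the tables above; it assumes "on deploy" counts as running tests in CI and that rows marked n/a are not gradable:

```python
# Scoreboard rows transcribed from the tables above:
# (service, test_files, runs_tests_in_ci). test_files=None marks an n/a row.
SCOREBOARD = [
    ("nova", 0, False),
    ("dashboard", 1, False),
    ("admin", 0, False),
    ("pulse", 0, False),
    ("launch-agent", 0, False),
    ("api", 132, False),
    ("error-assistant", 17, True),  # "on deploy" treated as running in CI
    ("slack", 4, False),
    ("agent", 1, False),
    ("ai-proxy", 0, False),
    ("relay-service", 0, False),
    ("notification-hub", 0, False),
    ("meet-bot", 0, False),
    ("uploader", 0, False),
    ("glue", 0, False),
    ("unstructured", 0, False),
    ("lambda", 0, False),
    ("kestra", None, False),
    ("cdn", None, False),
    ("investor-deck", None, False),
    ("zunou-react", 0, False),
    ("zunou-queries", 0, False),
    ("zunou-graphql", None, False),
]


def summarize(rows):
    """Tally listed services, gradable services, CI-tested services, and test files."""
    gradable = [row for row in rows if row[1] is not None]
    return {
        "listed": len(rows),
        "gradable": len(gradable),
        "running_in_ci": sum(1 for row in gradable if row[2]),
        "test_files": sum(row[1] for row in gradable),
        "test_files_not_run": sum(row[1] for row in gradable if not row[2]),
    }


summary = summarize(SCOREBOARD)
print(summary)
```

Keeping the tally next to the tables makes it cheap to refresh the headline figures whenever a row changes.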

02 The four facts

What this audit changes about how we ship.

01 · The biggest unforced error

132 PHPUnit + Pest test files sit unrun.

The Laravel monolith has the strongest test culture in the repo, with unit, feature, and integration suites. Yet `test-api.yml` runs `make lint` only. Wiring PHPUnit into that workflow is the single highest-leverage move in this audit: one PR, hours of work, and every subsequent PR is protected.

02 · The biggest blast radius

nova ships untested to user devices.

Native Swift in ios/Zunou.xcodeproj/, the RN business logic in src/hooks/, the Android build: none of it is tested. No XCTest, no jest-native, no Detox, no Maestro. Every release is a manual-QA roll of the dice. The Tokyo pilot in §06 of the GTM writeup ships through this surface.

03 · The compounding gap

The shared libraries inherit zero coverage.

zunou-react and zunou-queries back dashboard, admin, and pulse. They have zero tests. Every consumer multiplies the risk of that fragile foundation, so a test written against the libraries has the highest impact per test anywhere in the repo.

04 · The reference template

error-assistant already does this right.

17 Jest test files covering CloudWatch ingest, deduplication, agent tools, and validators. Tests run on deploy. Copy this template into ai-proxy, relay-service, and notification-hub. The plumbing is portable.

03 Why 100%

Coverage is not a vanity metric. It's the contract.

100% is the asymptote, not the deadline. Chasing it produces a codebase that ships at velocity without panic — and an engineering culture where quality is owned by the people writing the code, not delegated downstream.

Reason 01

Fewer bugs in production

Every line covered is one fewer way prod breaks silently. The 100% asymptote is what produces a codebase you can refactor without flinching.

Reason 02

Engineers own quality, not QA

There is no QA team coming to save us. The test you write is the test that protects you when the next release is at 11pm before a pilot demo. Quality is an engineering culture choice — and a hiring filter.

Reason 03

Tests are documentation that can't go stale

A test of how the API rate-limiter behaves under bursts is more useful — and more current — than three pages of Notion explaining the same thing.

Reason 04

Refactor velocity compounds

Tested code lets the next engineer cut things in half. Untested code pushes everyone toward bolted-on workarounds. The compounding shows up six months in.


04 Test stack

One pick per layer. No re-litigation.

The standard the team writes against — chosen so engineers don't waste cycles deciding. Where a pick differs from what's installed today, the reason is in the rationale.

| Layer | Pick | Why this | Applies to |
| --- | --- | --- | --- |
| Unit | Vitest | Vite-native, jest-compatible API, fastest feedback loop. Already configured in dashboard; promote to repo standard for every TS/JS package. | dashboard · admin · pulse · launch-agent · zunou-react · zunou-queries · all Node Lambdas |
| Unit | Pest (built on PHPUnit) | More expressive than vanilla PHPUnit; the Laravel monolith already has 132 tests. Standardize on Pest for new tests; keep PHPUnit for legacy. | services/api (Laravel) |
| Unit | pytest | Industry standard. Critical for the destructive scripts in services/glue: every one rewrites production tables. | services/glue · services/unstructured |
| Component | Vitest + React Testing Library | Library-level component tests with a real DOM. Highest leverage point: zunou-react and zunou-queries back every web app. | zunou-react · zunou-queries · dashboard components |
| Component | Jest + RN Testing Library | RN's official combo. Test the business-logic hooks in nova/src/hooks before touching native. | services/nova (RN business logic) |
| Integration | Pest feature tests + Testcontainers | Real Postgres + real S3 (LocalStack) hitting the GraphQL surface. Where contract drift between API and clients gets caught. | services/api (cross-resolver flows) |
| E2E | Playwright | One tool, three runtimes (Chromium · Firefox · WebKit). Already the consensus 2026 pick. Replaces a Cypress decision before it's made. | dashboard · admin · pulse · launch-agent |
| Mobile E2E | Maestro | YAML flows, runs on real Apple silicon and Android, no native build required. Better DX than Detox for our small team. | services/nova (iOS + Android) |
| Visual | Playwright + screenshot diff | Catches design regressions on every PR. Skip until E2E exists; layer on top once it does. | dashboard (later: nova via Maestro snapshots) |
| Contract | GraphQL Codegen + spectaql | Already partly in place (zunou-graphql, spectaql.config.yml). Failing CI on unannounced schema breaks closes a real silent-failure path. | services/api ↔ all GraphQL clients |

Standardizing the picks now means the next engineer joining doesn't have to learn three slightly different test runners. Consistency > novelty.

05 Definition of done

What "done" means before a PR can merge.

Six requirements. Posted in CONTRIBUTING.md, enforced where automation can help, owned by the engineer where it can't. The point isn't bureaucracy — it's that "done" needs to mean the same thing to every person on the team, every time.

01

Tests cover the change

Net-new code has unit tests. Behavioral changes have an integration test. UI changes have a Playwright (web) or Maestro (mobile) flow. PRs without tests for changed lines need an explicit waiver in the description.

02

CI is green — including the test job

Lint passes. Type-check passes. The test suite runs and passes. No 'green checkmark' from a workflow that just printed 'No tests yet'.
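
One way to keep a placeholder from passing as a test job is to inspect each package's test script before trusting its green check. A minimal sketch, assuming the services declare their test command under `scripts.test` in package.json; the marker list is hypothetical and would need to match whatever placeholders the repo actually uses:

```python
import json

# Hypothetical placeholder markers, compared case-insensitively.
# "no test specified" is the default script that `npm init` generates.
PLACEHOLDER_MARKERS = ("no tests yet", "no test specified")


def has_real_test_script(package_json_text: str) -> bool:
    """Return True only if scripts.test exists and is not a placeholder echo."""
    scripts = json.loads(package_json_text).get("scripts", {})
    test_cmd = scripts.get("test", "").strip().lower()
    if not test_cmd:
        return False
    return not any(marker in test_cmd for marker in PLACEHOLDER_MARKERS)


# Illustrative package.json contents, not copied from the repo.
placeholder = '{"scripts": {"test": "echo \\"No tests yet\\""}}'
real = '{"scripts": {"test": "vitest run"}}'

print(has_real_test_script(placeholder))
print(has_real_test_script(real))
```

A check like this could run as its own CI step and fail the build for any package whose test script is still a placeholder.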

03

Coverage on changed lines ≥ 70%

We don't gate on whole-repo coverage (vanity). We gate on changed lines so every PR pulls the floor up. Set the threshold low enough that humans accept it, high enough that it bites.
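
The changed-lines gate reduces to a small calculation: of the executable lines a PR touched, what share did the test run execute? A minimal sketch, assuming the changed lines come from the diff and the covered lines from a coverage report; the file name and line numbers are made up for illustration:

```python
THRESHOLD = 70.0  # the changed-lines gate described in this section


def changed_line_coverage(changed: dict[str, set[int]],
                          covered: dict[str, set[int]]) -> float:
    """Percent of changed executable lines that the test run covered."""
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 100.0  # nothing to cover, e.g. a docs-only PR
    hit = sum(len(lines & covered.get(path, set()))
              for path, lines in changed.items())
    return 100.0 * hit / total


# Hypothetical PR: ten changed lines in one file, seven of them covered.
changed = {"src/billing.py": {10, 11, 12, 13, 14, 15, 16, 17, 18, 19}}
covered = {"src/billing.py": {10, 11, 12, 13, 14, 15, 16, 30, 31}}

pct = changed_line_coverage(changed, covered)
print(f"{pct:.1f}% of changed lines covered; gate passes: {pct >= THRESHOLD}")
```

Lines covered elsewhere in the file (30 and 31 here) do not help the PR, which is the point of gating on the diff rather than on the whole repo.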

04

Observability hooks present

User-facing changes ship with a Sentry tag, a structured log line, and (where appropriate) a CloudWatch metric. If it breaks in prod, we want to know without a customer telling us.

05

Documented at the level it deserves

Library API: docstring + example. Migration: a README in the migration. Feature flag: a single line in CHANGELOG. Internal helper: nothing unless non-obvious. Documentation is part of done — just calibrated to audience.

06

Owned, not orphaned

Every merged PR touches only areas that have a single named owner in CODEOWNERS. If we can't answer 'who owns this when it breaks at 3am' at merge time, the PR isn't done.
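
The ownership rule can be checked mechanically. A minimal sketch, assuming a simplified CODEOWNERS in which every pattern is a directory prefix (real CODEOWNERS patterns are richer); the owners and file paths below are hypothetical:

```python
def parse_codeowners(text: str) -> list[tuple[str, list[str]]]:
    """Parse 'pattern owner...' lines, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path: str, rules: list[tuple[str, list[str]]]) -> list[str]:
    """Owners of a path; the last matching rule wins.

    Simplification: patterns are treated as directory prefixes only.
    """
    matched: list[str] = []
    for pattern, owners in rules:
        if path.startswith(pattern.lstrip("/")):
            matched = owners
    return matched


def not_singly_owned(paths: list[str], rules) -> list[str]:
    """Touched paths that do not resolve to exactly one owner."""
    return [p for p in paths if len(owners_for(p, rules)) != 1]


# Hypothetical CODEOWNERS content and touched files.
CODEOWNERS = """
# area            owner(s)
/services/api/    @alice
/services/nova/   @bob @carol
"""
rules = parse_codeowners(CODEOWNERS)
touched = [
    "services/api/app/Models/User.php",    # one owner: fine
    "services/nova/src/hooks/useAuth.ts",  # two owners: flagged
    "services/glue/jobs/cleanup.py",       # no owner: flagged
]
flagged = not_singly_owned(touched, rules)
print(flagged)
```

Flagging shared ownership as well as missing ownership follows the "single name" wording of the requirement.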


06 Beyond raw coverage

What 100% coverage still won't catch.

A green test suite can still ship a regression in production timing, in payload shape, or in an unscanned dependency. These are the layers above unit tests — listed by what closes the most pilot risk for the least effort.

Synthetic monitoring on critical user paths

Until the E2E suite is fast enough to run continuously, a Checkly (or similar) check hitting login, send-message, and schedule-meeting every 5 minutes is the cheapest insurance. It pages the on-call engineer when prod breaks, ideally before the first user notices.

high

Load testing on the notification path

k6 or similar against a staging tenant — burst 500 messages to a 200-person channel and measure delivery SLO. Currently we don't know our actual ceiling. See Foundation §02.

high
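
The load-test scenario implies some simple arithmetic worth writing down before running it. A minimal sketch, assuming every message fans out to every channel member and that the SLO is expressed as a latency percentile; the sample latencies and the 1000 ms limit are invented for illustration, since no SLO target is stated here:

```python
import math


def fanout(messages: int, recipients: int) -> int:
    """Deliveries if every message goes to every channel member (an assumption)."""
    return messages * recipients


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_slo(latencies_ms: list[float], pct: float, limit_ms: float) -> bool:
    """True if the chosen latency percentile is within the limit."""
    return percentile(latencies_ms, pct) <= limit_ms


# 500 messages burst into a 200-person channel.
deliveries = fanout(500, 200)

# Hypothetical delivery latencies in milliseconds from a test run.
sample = [120, 140, 150, 180, 200, 220, 260, 300, 450, 900]

print(deliveries)
print(percentile(sample, 95), meets_slo(sample, 95, 1000))
```

Measuring a high percentile rather than the mean keeps a slow tail, the part users actually notice, from being averaged away.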

Security scanning in CI

Semgrep + npm audit + composer audit + Trivy on container images. Catches the JWT-bypass class of bug before it merges, not after.

high

Mutation testing on the libraries

Stryker (Vitest) on zunou-react and zunou-queries. Reveals tests that execute the code but would still pass if its behavior changed: the worst kind of green checkmark. Add only after the libraries reach >50% line coverage.

medium
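
What a surviving mutant looks like can be shown by hand. This is an illustration of the idea only, not of how Stryker works internally; the rate-limit function is hypothetical:

```python
def is_rate_limited(requests_in_window: int, limit: int) -> bool:
    """Original code: block once the limit is reached."""
    return requests_in_window >= limit


def is_rate_limited_mutant(requests_in_window: int, limit: int) -> bool:
    """Mutant: '>=' flipped to '>', the kind of change a mutation tool makes."""
    return requests_in_window > limit


def weak_test(fn) -> bool:
    """Executes every line of fn but never probes the boundary."""
    return fn(0, 10) is False and fn(100, 10) is True


def strong_test(fn) -> bool:
    """Adds the boundary case, so the flipped operator is caught."""
    return weak_test(fn) and fn(10, 10) is True


# Both tests pass on the original code.
print(weak_test(is_rate_limited), strong_test(is_rate_limited))

# A test "kills" a mutant when it fails on the mutated code.
print("weak test kills mutant:", not weak_test(is_rate_limited_mutant))
print("strong test kills mutant:", not strong_test(is_rate_limited_mutant))
```

The weak test reaches full line coverage and still lets the boundary bug through, which is the gap mutation testing is meant to expose.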

Snapshot tests on push payloads

Notification payloads are hard to debug in production. A snapshot per notification type (mention / task / meeting / digest) catches PII regressions and shape drift.

medium
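
A snapshot test for a notification payload can be as small as a stable serialization plus a string comparison. A minimal sketch; the payload builder and its fields are hypothetical stand-ins for the real notification code:

```python
import json


def snapshot(payload: dict) -> str:
    """Stable serialization: sorted keys so ordering noise never causes a diff."""
    return json.dumps(payload, sort_keys=True, indent=2)


def build_mention_payload(sender: str, channel: str) -> dict:
    """Hypothetical builder; the real payload fields will differ."""
    return {
        "type": "mention",
        "title": f"{sender} mentioned you",
        "channel": channel,
    }


def matches_snapshot(payload: dict, approved: str) -> bool:
    """True if the payload serializes to exactly the approved snapshot."""
    return snapshot(payload) == approved


# The approved snapshot would normally live in a file checked into the repo.
APPROVED = snapshot(
    {"type": "mention", "title": "Aiko mentioned you", "channel": "general"}
)

ok = matches_snapshot(build_mention_payload("Aiko", "general"), APPROVED)

# A regression that starts leaking an email address changes the shape.
leaky = dict(build_mention_payload("Aiko", "general"), email="aiko@example.com")
leak_caught = not matches_snapshot(leaky, APPROVED)

print(ok, leak_caught)
```

Because any new field fails the comparison, adding personal data to a payload becomes a visible diff in review instead of a silent change.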

Where this gets us

Quality is owned by engineers. 100% is the asymptote.

The stack is picked. The Definition of Done is written. The principle (quality is engineering's responsibility, not a downstream gate) is the only piece that needs leadership consent. Everything else is mechanics, and mechanics are tractable.

Untested code is a liability. Tested code is leverage.