Testing & quality
Jarvis is a distributed system handling sensitive personal, biometric and financial data. Quality is non-negotiable.
Goals
- Coverage ≥ 80% on core code (server, agents)
- Zero regressions in production: every bug fix ships with a regression test
- Device-bridge tests: every integration (Oura, Whoop, Frame, …) has mock contract tests
- Green CI before every merge to `main`
Testing pyramid
```text
         ▲
        ╱ ╲        E2E (Playwright · device emulators)
       ╱E2E╲       few, slow, high value
      ╱─────╲
     ╱  INT  ╲     Integration (server + DB + LLM mock)
    ╱─────────╲
   ╱   UNIT    ╲   Unit (pytest · vitest)
  ╱─────────────╲  many, fast, isolated
```
Testing stack per language
| Language | Framework | Mocks & test tooling | Coverage |
|---|---|---|---|
| Python | pytest + pytest-asyncio | pytest-mock, respx (HTTPX), freezegun | coverage.py |
| TypeScript / JS | vitest or jest | msw, nock | c8 / istanbul |
| Go (if used) | testing + testify | gomock, httptest | go test -cover |
| Mobile | XCTest (iOS), Espresso (Android) | Detox, Maestro | platform-specific |
| E2E | Playwright | – | – |
TDD workflow
1. RED → write a failing test
2. GREEN → write the minimum code to make it pass
3. IMPROVE → refactor while keeping tests green
4. COVERAGE → verify that coverage is still ≥ 80%
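Below is a minimal sketch of one RED → GREEN cycle in pytest-style Python. The function `resting_hr_average` and its behaviour are hypothetical and chosen only to illustrate the loop. The test is written first and fails because the function does not exist yet. Then the smallest implementation that satisfies it is added.

```python
from __future__ import annotations

from typing import Optional


# 1. RED: written first; it fails until resting_hr_average exists and behaves.
def test_resting_hr_average_ignores_missing_samples() -> None:
    # Arrange
    samples = [60, None, 64]
    # Act
    result = resting_hr_average(samples)
    # Assert
    assert result == 62


# 2. GREEN: the minimum code that makes the test above pass.
def resting_hr_average(samples: list[Optional[int]]) -> float:
    readings = [s for s in samples if s is not None]  # drop missing readings
    return sum(readings) / len(readings)


# 3. IMPROVE: refactor freely; re-running the test keeps the behaviour pinned.
```

With an empty list this sketch raises `ZeroDivisionError`. Deciding how to handle that case would start the next RED step with a new failing test.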
Test types required per feature
- ✅ Unit tests on pure functions
- ✅ Integration tests hitting the real database (Postgres in container) — no DB mocks
- ✅ Contract tests for every external API client (Oura, Whoop, Plaid…)
- ✅ E2E tests for every critical user flow (login, device pairing, cross-device conversation)
- ✅ Security tests for every endpoint handling auth or sensitive data
Mandatory security tests
- 🔒 No hardcoded secrets: checked in CI with `gitleaks` or `trufflehog`
- 🔒 Input validation at boundaries (Pydantic schemas, Zod schemas)
- 🔒 SQL injection / XSS tests on forms
- 🔒 Rate-limiting tests on public endpoints
- 🔒 Dependency audit: `pip-audit`, `npm audit`, `osv-scanner`
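A real rate-limiting test would call the public endpoint through an HTTP test client. The sketch below isolates the logic instead. It uses a hypothetical fixed-window limiter with an injected clock, so the test can move time forward without sleeping. `FixedWindowLimiter`, `FakeClock` and the limits used are illustrative assumptions, not the project's actual implementation.

```python
from __future__ import annotations

from typing import Callable


class FixedWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float]) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._windows: dict[str, tuple[float, int]] = {}  # key -> (window start, count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired: open a new one
        if count >= self.limit:
            return False  # over the limit: reject without recording the request
        self._windows[key] = (start, count + 1)
        return True


class FakeClock:
    """Controllable time source so tests never need real sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def test_rate_limiter_allow_blocks_after_limit_then_resets() -> None:
    # Arrange
    clock = FakeClock()
    limiter = FixedWindowLimiter(limit=3, window=60.0, clock=clock)
    # Act / Assert: the fourth request in the same window is rejected
    assert [limiter.allow("1.2.3.4") for _ in range(4)] == [True, True, True, False]
    # Other clients keep their own budget
    assert limiter.allow("5.6.7.8") is True
    # Jump past the window instead of sleeping
    clock.advance(60.0)
    assert limiter.allow("1.2.3.4") is True
```

Injecting the clock keeps the test deterministic. It also satisfies the "no arbitrary sleeps" rule from the test-writing guidelines.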
CI / CD
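```yaml
# .github/workflows/ci.yml (simplified; the `make` targets are hypothetical placeholders)
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: lint          # ruff, eslint, prettier
        run: make lint
      - name: typecheck     # mypy, tsc
        run: make typecheck
      - name: unit          # pytest, vitest
        run: make unit
      - name: integration   # with services in containers
        run: make integration
      - name: e2e           # playwright, only on PRs to release branches
        # assumes release branches are named release/*
        if: github.event_name == 'pull_request' && startsWith(github.base_ref, 'release/')
        run: make e2e
      - name: security      # gitleaks, pip-audit
        run: make security
```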
Writing a good test
- 🎯 One test = one behaviour
- 📛 Naming: `test_<component>_<action>_<expected>`
- 🧱 Clear and visible Arrange / Act / Assert
- 🚫 No tests that depend on execution order
- 🚫 No arbitrary sleeps — use fixtures and explicit awaits
- 🪪 Every test cleans up after itself (roll back transactions, use ephemeral containers)
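Below is a sketch of the "no arbitrary sleeps" rule using `asyncio`. The test does not sleep and hope a background task has finished. It awaits the exact event it expects, bounded by `asyncio.wait_for`, so a bug fails fast rather than hanging. `EventBus` and `send_status` are hypothetical stand-ins for a device-agent channel. Under pytest-asyncio the test would carry `@pytest.mark.asyncio`. Here it is driven with `asyncio.run`, so the sketch needs only the stdlib.

```python
import asyncio


class EventBus:
    """Hypothetical in-process channel standing in for a device-agent connection."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[str]" = asyncio.Queue()

    async def publish(self, message: str) -> None:
        await self._queue.put(message)

    async def next_message(self) -> str:
        return await self._queue.get()


async def send_status(bus: EventBus, status: str) -> None:
    # Background producer; the test never guesses how long it takes.
    await bus.publish(status)


async def test_event_bus_publish_delivers_message() -> None:
    # Arrange
    bus = EventBus()
    # Act: start the producer in the background, then await the message explicitly,
    # with a timeout so a regression fails quickly instead of hanging the suite.
    producer = asyncio.create_task(send_status(bus, "paired"))
    received = await asyncio.wait_for(bus.next_message(), timeout=1.0)
    await producer
    # Assert
    assert received == "paired"


asyncio.run(test_event_bus_publish_delivers_message())
```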
Mock strategy for device agents
Integrations with physical devices (Frame, OctoPrint, smartwatches) require two levels:
- Contract test on the client → HTTP mock of the remote server
- Optional smoke test on real hardware → run locally by whoever develops that device agent, not in CI
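Below is a stdlib-only sketch of the first level, the contract test. A throwaway `http.server` plays the remote device API and answers only requests that match the agreed contract (path and auth header). The client under test parses the reply. `DeviceClient`, the `/v1/status` path, the token and the `battery` field are all hypothetical. For HTTPX-based clients, `respx` from the stack table would typically replace the local server and avoid opening a socket.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DeviceClient:
    """Hypothetical client for a device's HTTP API (endpoint and fields are illustrative)."""

    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url
        self.token = token

    def battery_level(self) -> int:
        req = urllib.request.Request(
            f"{self.base_url}/v1/status",
            headers={"Authorization": f"Bearer {self.token}"},
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            return int(json.load(resp)["battery"])


class FakeDeviceHandler(BaseHTTPRequestHandler):
    """Mock remote server: replies only if the request matches the agreed contract."""

    def do_GET(self) -> None:
        if self.path == "/v1/status" and self.headers.get("Authorization") == "Bearer t0k3n":
            body = json.dumps({"battery": 87}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(400, "request does not match contract")

    def log_message(self, *args) -> None:  # keep test output quiet
        pass


def test_device_client_battery_level_parses_status() -> None:
    # Arrange: port 0 lets the OS pick a free port
    server = HTTPServer(("127.0.0.1", 0), FakeDeviceHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        client = DeviceClient(f"http://127.0.0.1:{server.server_port}", token="t0k3n")
        # Act
        level = client.battery_level()
        # Assert
        assert level == 87
    finally:
        server.shutdown()
        server.server_close()
```

If the client drifts from the contract, for example by dropping the auth header, the fake server answers 400. `urlopen` then raises `HTTPError`, so the test fails loudly.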
Example: server integration test
```python
# server/tests/integration/test_memory_search.py
import pytest


@pytest.mark.asyncio
async def test_memory_search_returns_relevant_results(client, db, qdrant):
    # client, db and qdrant are fixtures defined elsewhere (e.g. conftest.py)
    # Arrange
    await client.memory.add(user_id="u1", text="I love sushi")

    # Act
    results = await client.memory.search(user_id="u1", query="japanese food")

    # Assert
    assert len(results) >= 1
    assert "sushi" in results[0].text
```