## Overview
GAIA has two test suites:

- **API** (`apps/api/tests/`) — pytest + pytest-asyncio, ~918 tests, runs in ~15 seconds
- **Bots** (`apps/bots/__tests__/`) — vitest, covers the Discord and Slack adapters
## Running Tests
### With mise (recommended)
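The original command listing appears to have been lost here. `mise test:bots` is referenced later in this document; the API task name below is an assumption and may differ in your `mise.toml`:

```sh
mise test        # hypothetical task name: run the API suite
mise test:bots   # run the bots suite (referenced in the Bot Adapters section)
```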
### Without mise
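The raw invocations, reconstructed from the tools named in this document (the `cd apps/api && pytest` form is an assumption; the vitest command is quoted from the Bot Adapters section):

```sh
cd apps/api && pytest              # API suite (pytest + pytest-asyncio)
cd apps/bots && pnpm vitest run    # bots suite
```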
## Test Layout
### API
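The layout listing seems to have been dropped during extraction. As a partial sketch, these are the directories referenced throughout this document (subdirectory nesting is inferred, not verified):

```
apps/api/tests/
├── conftest.py           # root fixtures (see Test Infrastructure)
├── api/                  # endpoint tests
├── unit/
│   ├── services/
│   ├── agents/
│   └── workers/
└── integration/
    ├── api/
    └── agents/
```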
### Bots
## Test Markers
| Marker | What it covers | External deps? |
|---|---|---|
| `unit` | Individual functions and classes with mocked I/O | None |
| `integration` | Real FastAPI app lifecycle or compiled graphs, mocked services | None |
| `e2e` | Full agent runs with near-real services | Redis, MongoDB |
| `composio` | Live Composio API calls | Composio credentials |
The default `pytest.ini` configuration runs everything except `e2e` and `composio`.
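A hedged sketch of what that selection likely looks like in `pytest.ini` (marker descriptions are paraphrased from the table above; the actual file contents are not reproduced here):

```ini
[pytest]
addopts = -m "not e2e and not composio"
markers =
    unit: individual functions and classes with mocked I/O
    integration: real app lifecycle or compiled graphs, mocked services
    e2e: full agent runs (needs Redis, MongoDB)
    composio: live Composio API calls (needs credentials)
```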
## What Gets Tested
### API Endpoints (auth, routing, response contracts)
Tests in `tests/api/` and `tests/integration/api/` run the real FastAPI app via `httpx.AsyncClient`. They verify:

- Correct HTTP status codes (200, 401, 403, 422)
- Auth enforcement on every protected route
- Response body shape and required fields
- SSE content type (`text/event-stream`) and headers (`x-stream-id`, `cache-control`)
- Error paths: Redis unavailable → `[STREAM_ERROR]` in the SSE body
- Pagination parameter validation

Service calls are patched at the endpoint boundary (e.g. `patch("app.api.v1.endpoints.conversations.create_conversation_service")`), so the routing and response logic is exercised without hitting a database.
### Service Layer (core business logic)
Tests in `tests/unit/services/` import production functions directly and test real logic with mocked databases and LLM clients. Covered services:

- `chat_service`: `run_chat_stream_background`, `_initialize_new_conversation`, `_save_conversation_async`, `extract_tool_data`, `_extract_response_text`
- `conversation_service`: CRUD operations, pagination, read/unread state
- `user_service`: user creation, lookup, preference management
- `memory_service`: memory extraction and persistence
- `mail_service`: email parsing, send logic
- `workflow_service`: workflow creation and trigger evaluation
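The service-layer pattern can be illustrated with a stdlib-only sketch: real logic is imported and exercised while only the database client is mocked. The function below is hypothetical, not one of GAIA's actual service signatures:

```python
import asyncio
from unittest.mock import AsyncMock

async def get_unread_count(db, user_id: str) -> int:
    # Real branching logic under test; only the database client is mocked.
    count = await db.conversations.count_documents(
        {"user_id": user_id, "read": False}
    )
    return max(int(count), 0)

# AsyncMock auto-creates awaitable child mocks, so a Motor-style client
# can be faked attribute by attribute.
mock_db = AsyncMock()
mock_db.conversations.count_documents.return_value = 3

unread = asyncio.run(get_unread_count(mock_db, "user-1"))
```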
### Agent Routing and Graph Construction
Tests in `tests/unit/agents/` and `tests/integration/agents/` call the real `create_agent` factory and the `build_comms_graph` / `build_executor_graph` builders. Verified behaviors:

- The conditional edge from `"agent"` is registered with its routing targets
- `"tools"` is always reachable from the agent node
- `"select_tools"` appears only when `retrieve_tools` is enabled
- `"end_graph_hooks"` appears only when hooks are provided
- Plain text response → no ToolMessages (routes to END / `end_graph_hooks`)
- Tool call response → ToolMessage produced with the correct `tool_call_id`
- Multiple tool calls → all produce ToolMessages
- State accumulates across turns via `InMemorySaver` checkpointing
- `add_memory` and `search_memory` are wired into the comms agent tool registry
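The core routing rule these tests verify can be sketched as a plain predicate: a tool-calling response routes to the tools node, a plain text response ends the run. This mirrors the behavior described above; it is not GAIA's actual routing function:

```python
def route_after_agent(last_message: dict) -> str:
    """Route to the tools node iff the model emitted tool calls."""
    if last_message.get("tool_calls"):
        return "tools"
    # Plain text response: go to END (or end_graph_hooks when hooks exist).
    return "end"

plain = route_after_agent({"content": "just text"})
tooled = route_after_agent({"tool_calls": [{"id": "call_1"}]})
```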
### Workers and Background Tasks
Tests in `tests/unit/workers/` cover the ARQ background task functions:

- `cleanup_tasks`: old conversation pruning, orphan cleanup
- `memory_tasks`: background memory extraction scheduling
- `reminder_tasks`: reminder triggering and delivery
- `user_tasks`: user lifecycle operations
- `workflow_tasks`: cron trigger evaluation
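ARQ tasks receive a shared `ctx` dict as their first argument, which is what makes them easy to unit-test: dependencies injected through `ctx` can be replaced with mocks. A hypothetical sketch (the body and field names are illustrative, not GAIA's real task):

```python
import asyncio
from unittest.mock import AsyncMock

async def cleanup_tasks(ctx: dict) -> int:
    # ARQ passes ctx to every task; keeping the DB client in ctx lets
    # unit tests inject an AsyncMock instead of a live connection.
    deleted = await ctx["db"].conversations.delete_many({"stale": True})
    return int(deleted)

mock_db = AsyncMock()
mock_db.conversations.delete_many.return_value = 5

removed = asyncio.run(cleanup_tasks({"db": mock_db}))
```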
### MCP and Tool Registry
Tests cover the MCP tool store, connection flows, and token management:

- `ChromaStore` indexing with namespace metadata
- Redis cache hit/miss paths
- MCP server connection lifecycle
- Token refresh and expiry handling
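The cache hit/miss path can be sketched with a dict standing in for Redis. Function, key format, and counters below are illustrative, not the registry's real API:

```python
def get_tool_spec(cache: dict, store: dict, name: str, stats: dict) -> dict:
    key = f"tool:{name}"
    if key in cache:                 # cache hit: skip the backing store
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1             # cache miss: read through and populate
    spec = store[name]
    cache[key] = spec
    return spec

cache, stats = {}, {"hits": 0, "misses": 0}
store = {"send_mail": {"namespace": "mail"}}

first = get_tool_spec(cache, store, "send_mail", stats)   # miss
second = get_tool_spec(cache, store, "send_mail", stats)  # hit
```

Tests for such a path typically assert on both counters and on the returned value, covering each branch once.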
### Bot Adapters (Discord and Slack)
Tests in `apps/bots/__tests__/` use vitest and run against production adapter code. They verify:

- Discord message adapter formatting and embed rendering
- Slack message adapter formatting and mention handling
- Shared rich-text renderer output
- Command parsing utilities
- Text formatting helpers

Run them with `mise test:bots` or `cd apps/bots && pnpm vitest run`.

## Test Infrastructure
### Root `conftest.py`
The root `conftest.py` (at `tests/conftest.py`) sets up the test environment before any app modules load: the app is started with a no-op lifespan (`_noop_lifespan`) so database connections are never attempted, and auth is bypassed via `app.dependency_overrides[get_current_user] = lambda: FAKE_USER`.
### Key fixtures
| Fixture | Scope | What it provides |
|---|---|---|
| `test_app` | session | FastAPI app with no-op lifespan and auth override |
| `client` | function | `AsyncClient` bound to the test app, authenticated |
| `unauthed_client` | function | `AsyncClient` without auth override (gets 401) |
| `fake_user` | function | Dict with test user data |
| `mock_mongodb` | function | `AsyncMock()` for MongoDB operations |
## Writing New Tests
### The Golden Rule
If you deleted the production function this test targets, would the test still fail? If the answer is "no", the test is worthless. Always import the code under test directly from `app.`.
### Mock at the boundary
Mock external I/O (databases, HTTP, Redis), never the logic under test.

### Assert on behavior, not mock calls
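The contrast can be shown in a small stdlib-only sketch (the function under test is hypothetical): the live assertion checks the returned value, while the commented-out alternative only checks that a mock was called and would still pass if the logic broke:

```python
from unittest.mock import Mock

def normalize_email(fetch) -> str:
    # Hypothetical logic under test: fetch is the boundary, the rest is real.
    return fetch().strip().lower()

fetch = Mock(return_value="  Alice@Example.COM ")
result = normalize_email(fetch)

assert result == "alice@example.com"  # behavioral: fails if the logic breaks
# fetch.assert_called_once()          # brittle: passes even with broken logic
```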
### Cover error paths
Production bugs cluster in error handling. Always test what happens when a dependency fails.

## Coverage Configuration
Coverage is configured in `pytest.ini`:
The `fail_under = 3` threshold is intentionally low. Run coverage locally to track actual coverage:
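A typical invocation, assuming the standard pytest-cov flags (the exact package path is an assumption; adjust to your checkout):

```sh
cd apps/api && pytest --cov=app --cov-report=term-missing
```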

