SearchMuse Architecture¶
SearchMuse employs a hexagonal architecture (Ports & Adapters) to ensure clean separation of concerns, testability, and extensibility. This document describes the overall architecture, design decisions, and layer responsibilities.
Hexagonal Architecture Overview¶
Hexagonal architecture organizes code into distinct layers, with dependencies pointing inward toward the domain layer. This ensures the domain logic (business rules) remains independent of implementation details.
graph TB
subgraph "External World"
CLI["CLI Interface"]
WEB["Web Browser"]
DB[(SQLite)]
LLM["Ollama LLM"]
SEARCH["DuckDuckGo"]
end
subgraph "Infrastructure Layer"
TYPER["Typer CLI"]
RICH["Rich Formatter"]
end
subgraph "Adapter Layer"
OLLAMA["OllamaLLM Adapter"]
HTTP["HttpxScraper Adapter"]
PW["PlaywrightScraper Adapter"]
TRAF["TrafilaturaExtractor Adapter"]
SQLITE["SQLiteRepository Adapter"]
MARKDOWN["MarkdownRenderer Adapter"]
DDG["DuckDuckGoSearch Adapter"]
end
subgraph "Port Layer (Interfaces)"
LLMPORT["LLMPort"]
SCRAPERPORT["ScraperPort"]
EXTRACTPORT["ContentExtractorPort"]
REPOPORT["SourceRepositoryPort"]
RENDERPORT["ResultRendererPort"]
SEARCHPORT["SearchPort"]
end
subgraph "Application Layer"
ORCHESTRATOR["ResearchOrchestrator"]
STRATEGYENGINE["StrategyEngine"]
end
subgraph "Domain Layer"
ENTITIES["Entities<br/>SearchQuery, SearchState<br/>Source, Citation, Result"]
VALUEOBJECTS["Value Objects<br/>URL, ContentBlock<br/>Reference"]
EXCEPTIONS["Domain Exceptions<br/>SearchError, ValidationError"]
end
CLI --> TYPER
WEB --> RICH
TYPER --> ORCHESTRATOR
RICH --> MARKDOWN
ORCHESTRATOR --> STRATEGYENGINE
STRATEGYENGINE --> LLMPORT
ORCHESTRATOR --> SCRAPERPORT
ORCHESTRATOR --> EXTRACTPORT
ORCHESTRATOR --> REPOPORT
ORCHESTRATOR --> RENDERPORT
ORCHESTRATOR --> SEARCHPORT
LLMPORT --> OLLAMA
SCRAPERPORT --> HTTP
SCRAPERPORT --> PW
EXTRACTPORT --> TRAF
REPOPORT --> SQLITE
RENDERPORT --> MARKDOWN
SEARCHPORT --> DDG
OLLAMA --> LLM
HTTP --> WEB
PW --> WEB
TRAF --> WEB
SQLITE --> DB
DDG --> SEARCH
ORCHESTRATOR -.-> ENTITIES
STRATEGYENGINE -.-> ENTITIES
ADAPTERS -.-> ENTITIES
Layer Descriptions¶
Domain Layer (Core)¶
The domain layer contains pure business logic with zero external dependencies. It defines:
- Entities: Core business objects with identity (SearchQuery, SearchState, Source)
- Value Objects: Immutable objects without identity (URL, Citation, ContentBlock)
- Exceptions: Domain-specific errors (SearchError, ValidationError, MaxIterationsExceeded)
- Business Rules: Logic for combining and validating data
Key principle: All domain objects are frozen dataclasses for immutability.
Port Layer (Interfaces)¶
Ports define contracts for external services using Python Protocol interfaces:
- LLMPort: Language model integration
- ScraperPort: Web scraping abstraction
- ContentExtractorPort: Content extraction from HTML
- SourceRepositoryPort: Data persistence
- ResultRendererPort: Output formatting
- SearchPort: Search engine integration
Key principle: Protocol over ABC for maximum flexibility and runtime zero-cost.
Adapter Layer (Implementation)¶
Adapters implement Port interfaces, integrating external libraries:
- OllamaLLM: Ollama integration via ollama-python
- HttpxScraper: Lightweight HTTP scraping via httpx
- PlaywrightScraper: JavaScript-capable scraping via playwright
- TrafilaturaExtractor: Content extraction via trafilatura
- SQLiteRepository: Persistent storage via aiosqlite
- MarkdownRenderer: Output formatting via rich
- DuckDuckGoSearch: Search via duckduckgo-search
Key principle: Adapters are interchangeable via dependency injection.
Application Layer¶
Orchestrates domain logic and port interactions:
- ResearchOrchestrator: Manages overall research workflow
- StrategyEngine: Generates search strategies using LLM
- ResultAggregator: Combines results from multiple iterations
Infrastructure Layer¶
Provides runtime infrastructure:
- DependencyContainer: Manages adapter instances and configuration
- ConfigurationLoader: Parses YAML and environment configuration
- AsyncRuntime: Handles async execution and error propagation
CLI Layer¶
Exposes the application as command-line interface:
- Typer Commands: Research, configuration, history commands
- Rich Output: Formatted results in terminal
Dependency Rule¶
Dependencies point inward. Outer layers depend on inner layers; inner layers never depend on outer layers.
External APIs (Ollama, DuckDuckGo, etc.)
↓
Adapters (OllamaLLM, HttpxScraper, etc.)
↓
Ports (Protocol interfaces)
↓
Application Layer (Orchestrators)
↓
Domain Layer (Entities, Value Objects, Exceptions)
Violations of this rule create tight coupling and reduce testability.
Architecture Decision Records¶
ADR-1: Hexagonal Architecture¶
Decision: Use hexagonal architecture (Ports & Adapters pattern).
Rationale: - Domain logic isolated from external dependencies - Easy to swap implementations (Ollama -> Claude -> open-source LLM) - Testable without external services (mock ports) - Clear separation of concerns - Scalable as new adapters are added
Status: Accepted
ADR-2: Frozen Dataclasses for Domain Objects¶
Decision: All domain entities and value objects are frozen dataclasses.
Rationale: - Immutability prevents hidden side effects - Hashable, can be used in sets and dicts - Reduced cognitive load (no mutation tracking) - Thread-safe by default - Better memory efficiency than traditional classes
Status: Accepted
ADR-3: Protocol over ABC¶
Decision: Use Python Protocol for port interfaces instead of ABC.
Rationale: - Structural typing: no explicit inheritance required - Adapters can implement multiple protocols - Zero runtime cost (protocols are not classes) - Better tooling support (type checkers understand protocols) - More Pythonic for runtime duck typing
Status: Accepted
ADR-4: Async-First Design¶
Decision: All I/O operations use async/await.
Rationale: - Efficient handling of multiple concurrent requests - Better resource utilization - Scales to thousands of concurrent operations - Cleaner than threading callbacks
Status: Accepted
ADR-5: Configuration as YAML + Environment Variables¶
Decision: Configuration via YAML files with environment variable overrides.
Rationale: - YAML is human-readable and version-controllable - Environment variables enable containerized deployments - Three-tier precedence (default -> custom -> env) prevents surprises - No secrets in code or version control
Status: Accepted
Cross-Layer Communication¶
Layers communicate through well-defined interfaces:
- CLI → Application: Typer commands pass user input to orchestrators
- Application → Domain: Orchestrators use domain entities and value objects
- Application → Ports: Orchestrators depend on port interfaces (not implementations)
- Ports → Adapters: Dependency injection provides concrete implementations
- Adapters → External: Adapters call external services (Ollama, DuckDuckGo, etc.)
Testing Implications¶
The architecture enables comprehensive testing:
- Unit Tests: Domain logic with no external dependencies
- Integration Tests: Adapters with real external services
- E2E Tests: Full research flow with mock search results
- Contract Tests: Port interfaces verified by adapters
See Testing Strategy for details.
Related Documentation¶
- Components Guide - Detailed component descriptions
- Data Flow - How data moves through layers
- API Reference - Domain classes and port interfaces
- Contributing Guide - Extending the architecture
Last updated: 2026-02-28