SearchMuse Architecture¶

SearchMuse employs a hexagonal architecture (Ports & Adapters) to ensure clean separation of concerns, testability, and extensibility. This document describes the overall architecture, design decisions, and layer responsibilities.

Hexagonal Architecture Overview¶

Hexagonal architecture organizes code into distinct layers, with dependencies pointing inward toward the domain layer. This ensures the domain logic (business rules) remains independent of implementation details.

graph TB
    subgraph "External World"
        CLI["CLI Interface"]
        WEB["Web Browser"]
        DB[(SQLite)]
        LLM["Ollama LLM"]
        SEARCH["DuckDuckGo"]
    end

    subgraph "Infrastructure Layer"
        TYPER["Typer CLI"]
        RICH["Rich Formatter"]
    end

    subgraph "Adapter Layer"
        OLLAMA["OllamaLLM Adapter"]
        HTTP["HttpxScraper Adapter"]
        PW["PlaywrightScraper Adapter"]
        TRAF["TrafilaturaExtractor Adapter"]
        SQLITE["SQLiteRepository Adapter"]
        MARKDOWN["MarkdownRenderer Adapter"]
        DDG["DuckDuckGoSearch Adapter"]
    end

    subgraph "Port Layer (Interfaces)"
        LLMPORT["LLMPort"]
        SCRAPERPORT["ScraperPort"]
        EXTRACTPORT["ContentExtractorPort"]
        REPOPORT["SourceRepositoryPort"]
        RENDERPORT["ResultRendererPort"]
        SEARCHPORT["SearchPort"]
    end

    subgraph "Application Layer"
        ORCHESTRATOR["ResearchOrchestrator"]
        STRATEGYENGINE["StrategyEngine"]
    end

    subgraph "Domain Layer"
        ENTITIES["Entities<br/>SearchQuery, SearchState<br/>Source, Citation, Result"]
        VALUEOBJECTS["Value Objects<br/>URL, ContentBlock<br/>Reference"]
        EXCEPTIONS["Domain Exceptions<br/>SearchError, ValidationError"]
    end

    CLI --> TYPER
    WEB --> RICH
    TYPER --> ORCHESTRATOR
    RICH --> MARKDOWN

    ORCHESTRATOR --> STRATEGYENGINE
    STRATEGYENGINE --> LLMPORT
    ORCHESTRATOR --> SCRAPERPORT
    ORCHESTRATOR --> EXTRACTPORT
    ORCHESTRATOR --> REPOPORT
    ORCHESTRATOR --> RENDERPORT
    ORCHESTRATOR --> SEARCHPORT

    LLMPORT --> OLLAMA
    SCRAPERPORT --> HTTP
    SCRAPERPORT --> PW
    EXTRACTPORT --> TRAF
    REPOPORT --> SQLITE
    RENDERPORT --> MARKDOWN
    SEARCHPORT --> DDG

    OLLAMA --> LLM
    HTTP --> WEB
    PW --> WEB
    TRAF --> WEB
    SQLITE --> DB
    DDG --> SEARCH

    ORCHESTRATOR -.-> ENTITIES
    STRATEGYENGINE -.-> ENTITIES
    ADAPTERS -.-> ENTITIES

Layer Descriptions¶

Domain Layer (Core)¶

The domain layer contains pure business logic with zero external dependencies. It defines:

Entities: Core business objects with identity (SearchQuery, SearchState, Source)
Value Objects: Immutable objects without identity (URL, Citation, ContentBlock)
Exceptions: Domain-specific errors (SearchError, ValidationError, MaxIterationsExceeded)
Business Rules: Logic for combining and validating data

Key principle: All domain objects are frozen dataclasses for immutability.
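
As a minimal sketch of what a frozen domain object could look like, the example below defines a `URL` value object and a `Source` entity. Only the class names `URL`, `Source`, and `ValidationError` come from this document; the field names (`value`, `id`, `title`) and the validation rule are assumptions made for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from urllib.parse import urlparse


class ValidationError(ValueError):
    """Domain exception raised when a value object receives invalid data."""


@dataclass(frozen=True)
class URL:
    """Value object: compared by value, with no identity of its own."""

    value: str  # hypothetical field name

    def __post_init__(self) -> None:
        parsed = urlparse(self.value)
        # Assumed business rule: only absolute http(s) URLs are accepted.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValidationError(f"not an absolute http(s) URL: {self.value!r}")


@dataclass(frozen=True)
class Source:
    """Entity: carries an identifier in addition to its data."""

    id: str  # hypothetical identity field
    url: URL
    title: str


source = Source(id="src-1", url=URL("https://example.com/article"), title="Example")

try:
    source.title = "Changed"  # type: ignore[misc]
except FrozenInstanceError:
    # Assignment to any field of a frozen dataclass raises this error.
    print("domain objects cannot be mutated in place")
```

Because validation runs in `__post_init__`, an invalid `URL` can never exist, so code deeper in the application does not need to re-check it.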

Port Layer (Interfaces)¶

Ports define contracts for external services using Python Protocol interfaces:

LLMPort: Language model integration
ScraperPort: Web scraping abstraction
ContentExtractorPort: Content extraction from HTML
SourceRepositoryPort: Data persistence
ResultRendererPort: Output formatting
SearchPort: Search engine integration

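Key principle: Protocol over ABC. Structural typing lets an adapter satisfy a port without inheriting from it, and conformance is checked statically by the type checker rather than enforced at runtime.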
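
A port written as a `Protocol` might look like the sketch below. The method name `fetch` and its signature are assumptions, since this document does not list the port methods. `StaticScraper` is a hypothetical toy adapter used only to show that no inheritance is needed.

```python
import asyncio
from typing import Protocol


class ScraperPort(Protocol):
    """Port: what the application needs from any scraper."""

    async def fetch(self, url: str) -> str:
        """Return the raw HTML for ``url`` (assumed signature)."""
        ...


class StaticScraper:
    """Toy adapter. Note that it does not inherit from ScraperPort."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    async def fetch(self, url: str) -> str:
        return self._pages[url]


async def page_length(scraper: ScraperPort, url: str) -> int:
    """Application code is typed against the port, not a concrete adapter."""
    html = await scraper.fetch(url)
    return len(html)


length = asyncio.run(
    page_length(StaticScraper({"https://example.com": "<html></html>"}), "https://example.com")
)
```

A type checker accepts `StaticScraper` wherever a `ScraperPort` is expected because the method signatures match, which is the structural typing this layer relies on.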

Adapter Layer (Implementation)¶

Adapters implement Port interfaces, integrating external libraries:

OllamaLLM: Ollama integration via ollama-python
HttpxScraper: Lightweight HTTP scraping via httpx
PlaywrightScraper: JavaScript-capable scraping via playwright
TrafilaturaExtractor: Content extraction via trafilatura
SQLiteRepository: Persistent storage via aiosqlite
MarkdownRenderer: Output formatting via rich
DuckDuckGoSearch: Search via duckduckgo-search

Key principle: Adapters are interchangeable via dependency injection.
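
The following sketch shows one way constructor injection could make adapters interchangeable. `ResultRendererPort` is named in this document, but its `render` signature, the two renderer classes, and `ReportService` are hypothetical stand-ins, not the project's actual classes.

```python
from typing import Protocol


class ResultRendererPort(Protocol):
    def render(self, title: str, lines: list[str]) -> str: ...


class MarkdownListRenderer:
    """Hypothetical adapter that renders a Markdown heading and bullet list."""

    def render(self, title: str, lines: list[str]) -> str:
        body = "\n".join(f"- {line}" for line in lines)
        return f"## {title}\n\n{body}"


class PlainTextRenderer:
    """Hypothetical adapter that renders plain lines of text."""

    def render(self, title: str, lines: list[str]) -> str:
        body = "\n".join(lines)
        return f"{title}\n{body}"


class ReportService:
    """Hypothetical application service; it only knows the port type."""

    def __init__(self, renderer: ResultRendererPort) -> None:
        self._renderer = renderer

    def report(self, findings: list[str]) -> str:
        return self._renderer.render("Findings", findings)


# Composition root: the only place that names concrete adapters.
markdown_report = ReportService(MarkdownListRenderer()).report(["a", "b"])
plain_report = ReportService(PlainTextRenderer()).report(["a", "b"])
```

Swapping the renderer changes the output format without touching `ReportService`, which is the property dependency injection is meant to provide here.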

Application Layer¶

Orchestrates domain logic and port interactions:

ResearchOrchestrator: Manages overall research workflow
StrategyEngine: Generates search strategies using the LLM
ResultAggregator: Combines results from multiple iterations
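
This document does not specify how results from different iterations are combined. As one plausible illustration, the sketch below merges per-iteration sources and keeps the first occurrence of each URL. The `aggregate` function, the simplified `Source` fields, and the de-duplication rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Source:
    """Simplified stand-in for the domain entity (hypothetical fields)."""

    url: str
    title: str


def aggregate(iterations: Iterable[Iterable[Source]]) -> tuple[Source, ...]:
    """Merge per-iteration sources, keeping the first occurrence of each URL."""
    seen: set[str] = set()
    merged: list[Source] = []
    for batch in iterations:
        for source in batch:
            if source.url not in seen:
                seen.add(source.url)
                merged.append(source)
    # Return an immutable tuple so the result matches the frozen domain style.
    return tuple(merged)


first_pass = [Source("https://a.example", "A"), Source("https://b.example", "B")]
second_pass = [Source("https://b.example", "B again"), Source("https://c.example", "C")]
combined = aggregate([first_pass, second_pass])
```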

Infrastructure Layer¶

Provides runtime infrastructure:

DependencyContainer: Manages adapter instances and configuration
ConfigurationLoader: Parses YAML and environment configuration
AsyncRuntime: Handles async execution and error propagation

CLI Layer¶

Exposes the application as a command-line interface:

Typer Commands: Research, configuration, history commands
Rich Output: Formatted results in terminal

Dependency Rule¶

Dependencies point inward. Outer layers depend on inner layers; inner layers never depend on outer layers.

External APIs (Ollama, DuckDuckGo, etc.)
           ↓
Adapters (OllamaLLM, HttpxScraper, etc.)
           ↓
Ports (Protocol interfaces)
           ↓
Application Layer (Orchestrators)
           ↓
Domain Layer (Entities, Value Objects, Exceptions)

Violations of this rule create tight coupling and reduce testability.

Architecture Decision Records¶

ADR-1: Hexagonal Architecture¶

Decision: Use hexagonal architecture (Ports & Adapters pattern).

Rationale:

- Domain logic isolated from external dependencies
- Easy to swap implementations (Ollama -> Claude -> open-source LLM)
- Testable without external services (mock ports)
- Clear separation of concerns
- Scalable as new adapters are added

Status: Accepted

ADR-2: Frozen Dataclasses for Domain Objects¶

Decision: All domain entities and value objects are frozen dataclasses.

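Rationale:

- Immutability prevents hidden side effects
- Hashable (as long as every field is hashable), so instances can be used in sets and as dict keys
- Reduced cognitive load (no mutation tracking)
- Safe to share across threads, since instances cannot be modified after creation
- Freezing alone does not save memory; per-instance memory can be reduced by also passing slots=True (Python 3.10+)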

Status: Accepted
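
With frozen objects, a state change produces a new instance instead of mutating the old one. The sketch below shows this with `dataclasses.replace`. The `SearchState` fields and the `with_visit` method are assumptions made for illustration; only the class name comes from this document.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchState:
    query: str
    iteration: int = 0  # hypothetical field
    visited: frozenset[str] = frozenset()  # hypothetical field

    def with_visit(self, url: str) -> "SearchState":
        """Return a new state; the original instance is left untouched."""
        return replace(
            self,
            iteration=self.iteration + 1,
            visited=self.visited | {url},
        )


initial = SearchState(query="hexagonal architecture")
after = initial.with_visit("https://example.com")

# Equal field values give equal hashes, so duplicates collapse in a set.
unique_states = {initial, SearchState(query="hexagonal architecture"), after}
```

Using `frozenset` rather than `set` for the collection field keeps the whole object hashable, which a frozen dataclass does not guarantee on its own.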

ADR-3: Protocol over ABC¶

Decision: Use Python Protocol for port interfaces instead of ABC.

Rationale:

- Structural typing: no explicit inheritance required
- Adapters can implement multiple protocols
- No runtime enforcement cost: conformance is checked statically, and isinstance checks work only for protocols marked @runtime_checkable
- Type checkers such as mypy and pyright verify that adapters match the port signatures
- Fits Python's duck-typing style

Status: Accepted
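
Protocol conformance is normally verified by a static type checker. If a runtime check is also wanted, for example when wiring adapters at startup, `typing.runtime_checkable` allows `isinstance` to be used. The sketch below assumes a synchronous `search` method purely for brevity; `CannedSearch` and `NotASearcher` are hypothetical classes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SearchPort(Protocol):
    def search(self, query: str) -> list[str]: ...


class CannedSearch:
    """No inheritance from SearchPort; it only has a matching method."""

    def search(self, query: str) -> list[str]:
        return [f"https://example.com/?q={query}"]


class NotASearcher:
    """Has no search method, so it does not satisfy the protocol."""


conforms = isinstance(CannedSearch(), SearchPort)
rejected = isinstance(NotASearcher(), SearchPort)
```

The runtime check only confirms that a `search` attribute exists. It does not validate argument or return types, so static checking remains the main safeguard.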

ADR-4: Async-First Design¶

Decision: All I/O operations use async/await.

Rationale:

- Efficient handling of many concurrent I/O-bound requests
- Better resource utilization, since a single thread can wait on many network operations at once
- Scales to thousands of concurrent I/O operations
- Cleaner than thread-based or callback-based code

Status: Accepted
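
As a rough sketch of the concurrency this decision enables, the example below fetches several pages at once while capping the number of requests in flight. `fake_fetch` is a stand-in that sleeps instead of performing real network I/O, and the `max_concurrency` parameter is an assumption rather than a documented setting.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Stand-in for a real scraper call; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def fetch_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


pages = asyncio.run(fetch_all([f"https://example.com/{i}" for i in range(20)]))
```

The semaphore keeps the application from overwhelming a remote site even when hundreds of URLs are queued.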

ADR-5: Configuration as YAML + Environment Variables¶

Decision: Configuration via YAML files with environment variable overrides.

Rationale:

- YAML is human-readable and version-controllable
- Environment variables enable containerized deployments
- Three-tier precedence (default -> custom -> env) prevents surprises
- No secrets in code or version control

Status: Accepted
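
The three-tier precedence can be sketched as a simple merge. In this example plain dicts stand in for parsed YAML files. The configuration keys, the `llama3` default, and the `SEARCHMUSE_` environment prefix are assumptions, since this document does not specify them.

```python
from typing import Mapping

# Tier 1: built-in defaults (hypothetical keys and values).
DEFAULTS = {"llm_model": "llama3", "max_iterations": "5"}


def load_config(
    custom: Mapping[str, str],
    environ: Mapping[str, str],
    prefix: str = "SEARCHMUSE_",  # assumed prefix, not confirmed by the docs
) -> dict[str, str]:
    """Merge defaults, custom file values, and environment overrides."""
    config = dict(DEFAULTS)
    # Tier 2: values from the user's custom YAML file override defaults.
    config.update(custom)
    # Tier 3: environment variables override everything else.
    for key in list(config):
        env_name = prefix + key.upper()
        if env_name in environ:
            config[key] = environ[env_name]
    return config


config = load_config(
    custom={"max_iterations": "8"},
    environ={"SEARCHMUSE_MAX_ITERATIONS": "3"},
)
```

In real use, `os.environ` would be passed as `environ`. Accepting it as a parameter keeps the function deterministic and easy to test.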

Cross-Layer Communication¶

Layers communicate through well-defined interfaces:

CLI → Application: Typer commands pass user input to orchestrators
Application → Domain: Orchestrators use domain entities and value objects
Application → Ports: Orchestrators depend on port interfaces (not implementations)
Ports → Adapters: Dependency injection provides concrete implementations
Adapters → External: Adapters call external services (Ollama, DuckDuckGo, etc.)

Testing Implications¶

The architecture enables comprehensive testing:

Unit Tests: Domain logic with no external dependencies
Integration Tests: Adapters with real external services
E2E Tests: Full research flow with mock search results
Contract Tests: Port interfaces verified by adapters
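
A contract test can be written once against the port and then run against every adapter. The sketch below assumes `save` and `get` methods and an overwrite-on-save rule for `SourceRepositoryPort`; these are illustrative guesses, and `InMemoryRepository` is a hypothetical fake rather than a shipped adapter.

```python
import asyncio
from typing import Optional, Protocol


class SourceRepositoryPort(Protocol):
    async def save(self, url: str, content: str) -> None: ...
    async def get(self, url: str) -> Optional[str]: ...


class InMemoryRepository:
    """Hypothetical fake adapter backed by a dict, for fast tests."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    async def save(self, url: str, content: str) -> None:
        self._rows[url] = content

    async def get(self, url: str) -> Optional[str]:
        return self._rows.get(url)


async def check_repository_contract(repo: SourceRepositoryPort) -> None:
    """Behavior every repository adapter is assumed to share."""
    assert await repo.get("https://example.com") is None
    await repo.save("https://example.com", "first")
    await repo.save("https://example.com", "second")
    # Assumed contract: saving the same URL again overwrites the old content.
    assert await repo.get("https://example.com") == "second"


repo = InMemoryRepository()
asyncio.run(check_repository_contract(repo))
```

The same `check_repository_contract` coroutine could be pointed at a SQLite-backed adapter in an integration test, so the fake and the real adapter are held to the same expectations.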

See Testing Strategy for details.

Related Documentation¶

Components Guide - Detailed component descriptions
Data Flow - How data moves through layers
API Reference - Domain classes and port interfaces
Contributing Guide - Extending the architecture

Last updated: 2026-02-28
