Architecture Overview
LynxDB is a columnar log analytics database written from scratch in Go. Every component -- the storage engine, query engine, indexing layer, and HTTP API -- is purpose-built for log search and aggregation workloads. There are no embedded third-party databases (no RocksDB, no SQLite, no Lucene). This page provides a high-level map of the system and explains how the pieces fit together.
System Diagram
┌─────────────────────────────┐
│ lynxdb CLI │
│ query · ingest · server │
│ status · mv · config │
└──────────────┬──────────────┘
│
┌────────────────────────┼────────────────────────┐
│ │ │
┌──────┴──────┐ ┌───────┴──────┐ ┌───────┴──────┐
│ Pipe Mode │ │ Server Mode │ │ Cluster Mode │
│ (ephemeral │ │ (persistent │ │ (distributed │
│ in-memory) │ │ single-node)│ │ multi-node) │
└─────────────┘ └──────┬───────┘ └──────┬───────┘
│ │
┌───────────────────────────┼───────────────────────┘
│
┌───────────┴───────────────────────────────────────────────┐
│ Storage Engine │
│ │
│ ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ WAL │→ │ Memtable │→ │ Segment │→ │ S3 Tier │ │
│ │ (append │ │ (sharded │ │ Writer │ │ (warm/ │ │
│ │ only) │ │ btree) │ │ (.lsg) │ │ cold) │ │
│ └─────────┘ └──────────┘ └─────┬─────┘ └──────────┘ │
│ │ │
│ ┌────────────────────────────────┤ │
│ │ Indexes │ │
│ │ ┌────────┐ ┌──────────────┐ │ ┌─────────────────┐ │
│ │ │ Bloom │ │ Inverted │ │ │ Compaction │ │
│ │ │ Filter │ │ Index (FST │ │ │ L0 → L1 → L2 │ │
│ │ │ │ │ + roaring) │ │ │ │ │
│ │ └────────┘ └──────────────┘ │ └─────────────────┘ │
│ └────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
│
┌───────────┴───────────────────────────────────────────────┐
│ Query Engine │
│ │
│ ┌─────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ │
│ │ SPL2 │→ │ Optimizer │→ │ Pipeline │→ │ Output │ │
│ │ Parser │ │ (23 rules)│ │ (Volcano │ │ (JSON/ │ │
│ │ │ │ │ │ iterator)│ │ table/csv│ │
│ └─────────┘ └───────────┘ └──────────┘ └──────────┘ │
│ │ │
│ ┌─────────────┐ ┌───────────────┤ ┌─────────────────┐ │
│ │ Bytecode VM │ │ 18 Pipeline │ │ Segment Cache │ │
│ │ (60+ ops, │ │ Operators │ │ (TTL + LRU, │ │
│ │ 0 allocs) │ │ │ │ persistent) │ │
│ └─────────────┘ └───────────────┘ └─────────────────┘ │
└───────────────────────────────────────────────────────────┘
Three Execution Modes
LynxDB ships as a single static binary. The same binary runs in three modes depending on how you invoke it. Internally, all three modes share the same storage engine and query engine -- the difference is lifecycle and networking.
Pipe Mode (Ephemeral, In-Memory)
cat app.log | lynxdb query '| where level="ERROR" | stats count by service'
lynxdb query --file '/var/log/nginx/*.log' '| where status>=500 | top 10 uri'
In pipe mode, LynxDB behaves like grep or awk. The binary:
- Creates an ephemeral in-memory storage engine (no data directory, no WAL).
- Ingests all data from stdin or the specified file(s).
- Runs the full SPL2 pipeline -- parser, optimizer, Volcano iterator, bytecode VM.
- Prints results to stdout and exits.
There is no daemon, no config file, no network I/O. The entire SPL2 engine runs locally in-process. This mode is ideal for ad-hoc debugging, scripting, and integration with Unix pipelines.
Server Mode (Persistent, Single-Node)
lynxdb server --data-dir /var/lib/lynxdb
Server mode starts a long-running process that:
- Opens (or creates) a data directory with a WAL, segment files, and metadata.
- Replays the WAL to recover any uncommitted memtable state.
- Starts the HTTP API on port 3100 (configurable).
- Accepts ingest requests, buffers events in the memtable, flushes to .lsg segments.
- Runs background compaction (L0 -> L1 -> L2).
- Serves queries against both the memtable and on-disk segments.
A single server handles 50-100K events/sec on commodity hardware. Optional S3 tiering offloads older segments to object storage while keeping them queryable through a local segment cache.
Cluster Mode (Distributed, Multi-Node)
lynxdb server --cluster.seeds node1:9400,node2:9400,node3:9400
Cluster mode uses the same binary with clustering flags. Nodes discover each other via seed addresses and form a cluster with three roles:
| Role | Responsibility | Scales with |
|---|---|---|
| Meta (3-5 nodes) | Raft consensus, shard map, node registry, failure detection | Cluster size |
| Ingest (N nodes) | WAL write, memtable, segment flush, S3 upload, ISR replication | Write throughput |
| Query (M nodes) | Scatter-gather, partial aggregation merge, segment caching | Query concurrency |
In small clusters (< 10 nodes), every node runs all three roles. At scale, you split roles for independent scaling. The architecture is shared-storage: S3 is the source of truth for segments, making ingest and query nodes effectively stateless.
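The scatter-gather step on query nodes can be illustrated with a minimal Go sketch of the partial-aggregation merge, assuming each peer returns its `stats count by service` result as a per-node map (the function and shape here are hypothetical, not LynxDB's actual API):

```go
package main

import "fmt"

// mergePartials sums per-node partial aggregation results into one final
// map, as a query-node coordinator would after scattering a query.
// This is an illustrative sketch; real LynxDB internals may differ.
func mergePartials(partials []map[string]int64) map[string]int64 {
	out := make(map[string]int64)
	for _, p := range partials {
		for k, v := range p {
			out[k] += v
		}
	}
	return out
}

func main() {
	// Pretend two ingest/query peers each returned a partial count-by-service.
	node1 := map[string]int64{"checkout": 12, "auth": 3}
	node2 := map[string]int64{"checkout": 8, "cart": 5}
	merged := mergePartials([]map[string]int64{node1, node2})
	fmt.Println(merged["checkout"], merged["auth"], merged["cart"]) // 20 3 5
}
```

Because counts (and other decomposable aggregates like sum or min/max) merge associatively, each node can aggregate locally and ship only small partial results to the coordinator.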
See Distributed Architecture for full details.
Major Subsystems
Storage Engine
The storage engine handles the full data lifecycle: ingest -> WAL -> memtable -> segment flush -> compaction -> tiering.
Key design choices:
- WAL: Append-only with configurable fsync policy (none, write, full). Batch sync every 100ms. 256 MB segment rotation. Replayed on crash recovery.
- Memtable: Sharded by CPU core for lock-free concurrent writes. Flushed to disk when the total size reaches 512 MB (default).
- Segments: Columnar .lsg V2 format. Per-column encoding: delta-varint for timestamps, dictionary for strings, Gorilla for floats, LZ4 for raw text. Each segment embeds a bloom filter and FST-based inverted index.
- Compaction: Size-tiered with three levels (L0 overlapping -> L1 merged -> L2 fully compacted at ~1 GB). Rate-limited to avoid I/O starvation.
- Tiering: Hot (local SSD, < 7d) -> Warm (S3, < 30d) -> Cold (Glacier, < 90d). Local segment cache for warm-tier queries.
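To make the memtable sharding concrete, here is a minimal Go sketch of a memtable split into one shard per CPU core. It uses a plain mutex per shard rather than the lock-free structures described above, and all type and method names are illustrative, not LynxDB's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
	"sync"
)

// Shard holds a slice of buffered events behind its own lock, so writers
// hashing to different shards never contend with each other.
type Shard struct {
	mu     sync.Mutex
	events []string
	bytes  int
}

// Memtable fans writes out across GOMAXPROCS shards.
type Memtable struct {
	shards []*Shard
}

func NewMemtable() *Memtable {
	n := runtime.GOMAXPROCS(0)
	m := &Memtable{shards: make([]*Shard, n)}
	for i := range m.shards {
		m.shards[i] = &Shard{}
	}
	return m
}

// Insert hashes the key to pick a shard, then appends under that shard's
// lock only.
func (m *Memtable) Insert(key, event string) {
	h := fnv.New32a()
	h.Write([]byte(key))
	s := m.shards[int(h.Sum32())%len(m.shards)]
	s.mu.Lock()
	s.events = append(s.events, event)
	s.bytes += len(event)
	s.mu.Unlock()
}

// Size reports total buffered bytes; a flusher would compare this against
// the 512 MB default threshold before writing a segment.
func (m *Memtable) Size() int {
	total := 0
	for _, s := range m.shards {
		s.mu.Lock()
		total += s.bytes
		s.mu.Unlock()
	}
	return total
}

func main() {
	m := NewMemtable()
	m.Insert("svc-a", `{"level":"ERROR"}`)
	m.Insert("svc-b", `{"level":"INFO"}`)
	fmt.Println(m.Size()) // total buffered bytes: 33
}
```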
See Storage Engine and Segment Format for deep dives.
Query Engine
The query engine transforms SPL2 text into a streaming execution pipeline:
Key components:
- Parser: Recursive descent parser for full SPL2. Error recovery with syntax suggestions and Splunk SPL1 compatibility hints.
- Optimizer: 23 rules in 6 phases -- expression simplification, predicate/projection pushdown, scan optimization, aggregation optimization, expression optimization, join optimization.
- Pipeline: Volcano iterator model with 18 streaming operators. Pull-based with 1024-row batches. head 10 on 100M events reads one batch, not the entire dataset.
- Bytecode VM: Stack-based VM with 60+ opcodes for evaluating WHERE, EVAL, and STATS expressions. Fixed 256-slot stack. 22 ns/op for a simple predicate (status >= 500). Zero heap allocations on the hot path.
- Cache: Filesystem-based segment query cache. Key = (segment_id, CRC32, query_hash, time_range). TTL + LRU eviction. Persistent across restarts.
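The pull-based pipeline above can be sketched in a few lines of Go. This toy version uses int rows and a tiny batch size for readability (LynxDB uses 1024-row batches); the operator names are illustrative, not the engine's real types:

```go
package main

import "fmt"

const batchSize = 4 // 1024 in the real engine; small here for illustration

// Operator is the Volcano-model interface: each call to Next returns the
// next batch of rows, or nil when the operator is exhausted.
type Operator interface {
	Next() []int
}

// Scan yields rows from an in-memory slice, one batch at a time.
type Scan struct {
	rows []int
	pos  int
}

func (s *Scan) Next() []int {
	if s.pos >= len(s.rows) {
		return nil
	}
	end := s.pos + batchSize
	if end > len(s.rows) {
		end = len(s.rows)
	}
	b := s.rows[s.pos:end]
	s.pos = end
	return b
}

// Head stops after emitting n rows and never pulls further batches from
// upstream, which is why `head 10` never scans the whole dataset.
type Head struct {
	in   Operator
	n    int
	done int
}

func (h *Head) Next() []int {
	if h.done >= h.n {
		return nil
	}
	b := h.in.Next()
	if b == nil {
		return nil
	}
	if h.done+len(b) > h.n {
		b = b[:h.n-h.done]
	}
	h.done += len(b)
	return b
}

func main() {
	scan := &Scan{rows: make([]int, 1000)} // pretend: 1000 events on disk
	head := &Head{in: scan, n: 10}
	total := 0
	for b := head.Next(); b != nil; b = head.Next() {
		total += len(b)
	}
	fmt.Println(total, scan.pos) // 10 12 — only 12 rows were ever scanned
}
```

The consumer drives execution by pulling: once Head has its 10 rows, no operator below it does any more work.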
See Query Engine for the full breakdown.
Indexing
LynxDB builds two index structures per segment during flush:
- Bloom filters: Per-segment probabilistic index for term-level segment skipping. If a bloom filter says a term is absent, the entire segment is skipped without reading column data.
- FST-based inverted index: Finite State Transducer mapping terms to roaring bitmap posting lists. Supports prefix, wildcard, and fuzzy matching with sub-millisecond lookup.
Together, these indexes enable full-text search that is orders of magnitude faster than brute-force scanning. See Indexing.
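The bloom-filter skip logic can be sketched as follows in Go. This is a deliberately tiny two-hash filter with toy sizing, not LynxDB's actual implementation; the key property is that a negative answer is definitive, so the whole segment can be skipped:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a toy bloom filter: two FNV hash variants set/check bits.
// Real per-segment filters would be sized to the term cardinality.
type Bloom struct {
	bits []bool
}

func NewBloom(size int) *Bloom { return &Bloom{bits: make([]bool, size)} }

// hashes derives two independent-ish hash values for a term.
func hashes(term string) [2]uint32 {
	h1 := fnv.New32a()
	h1.Write([]byte(term))
	h2 := fnv.New32()
	h2.Write([]byte(term))
	return [2]uint32{h1.Sum32(), h2.Sum32()}
}

// Add sets the term's bits; called for each term during segment flush.
func (b *Bloom) Add(term string) {
	for _, h := range hashes(term) {
		b.bits[int(h)%len(b.bits)] = true
	}
}

// MayContain returns false only when the term is certainly absent —
// that is the signal to skip the segment without reading column data.
func (b *Bloom) MayContain(term string) bool {
	for _, h := range hashes(term) {
		if !b.bits[int(h)%len(b.bits)] {
			return false
		}
	}
	return true
}

func main() {
	seg := NewBloom(1 << 12)
	for _, t := range []string{"ERROR", "checkout", "web-01"} {
		seg.Add(t)
	}
	fmt.Println(seg.MayContain("ERROR"))    // true: must read the segment
	fmt.Println(seg.MayContain("zzz-none")) // false barring a rare false positive: skip it
}
```

False positives only cost a wasted segment read; false negatives never occur, so skipping is always safe.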
REST API
The HTTP API layer (pkg/api/rest) exposes all functionality over HTTP:
- Ingest: JSON, NDJSON, plain text, plus compatibility endpoints for Elasticsearch _bulk, OTLP, and Splunk HEC.
- Query: Sync, async, and hybrid execution modes. NDJSON streaming for large result sets. SSE for real-time progress.
- Management: Fields catalog, saved queries, alerts, dashboards, materialized views, configuration.
See REST API Overview for endpoint documentation.
Data Flow
Write Path
HTTP POST /api/v1/ingest
→ Parse (JSON / NDJSON / text / auto-detect)
→ Extract timestamp (_timestamp, @timestamp, time, ts, ...)
→ Append to WAL (configurable fsync)
→ Insert into sharded memtable
→ [When memtable >= 512 MB]
→ Flush to .lsg segment (columnar encoding + bloom + inverted index)
→ Register segment in metadata
→ [In cluster mode: upload to S3]
→ [Background]
→ Compaction merges L0 → L1 → L2
→ Tiering offloads aged segments to S3/Glacier
Read Path
SPL2 query text
→ Parse to AST
→ Optimize (23 rules, 6 phases)
→ Build Volcano pipeline
→ Scan operator:
→ Check segment time ranges (prune by time)
→ Check bloom filters (prune by term)
→ Check inverted index (resolve full-text search)
→ Read matching columns from .lsg segments
→ Read from memtable (in-memory events)
→ Stream through pipeline operators (Filter, Eval, Aggregate, Sort, ...)
→ Bytecode VM evaluates expressions (0 allocs per row)
→ Output (JSON / table / CSV / NDJSON)
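The first pruning step in the read path — checking segment time ranges — reduces to an interval-overlap test against per-segment min/max timestamps. Here is a minimal Go sketch; the struct and function names are illustrative, not LynxDB's metadata format:

```go
package main

import "fmt"

// SegmentMeta carries the min/max timestamp recorded for a segment at
// flush time (Unix seconds).
type SegmentMeta struct {
	ID           string
	MinTS, MaxTS int64
}

// pruneByTime keeps only segments whose time range overlaps the query
// window [from, to]; everything else is dropped before any bloom-filter
// or index checks run.
func pruneByTime(segs []SegmentMeta, from, to int64) []SegmentMeta {
	var out []SegmentMeta
	for _, s := range segs {
		if s.MaxTS >= from && s.MinTS <= to { // ranges overlap
			out = append(out, s)
		}
	}
	return out
}

func main() {
	segs := []SegmentMeta{
		{"seg-001", 1000, 1999},
		{"seg-002", 2000, 2999},
		{"seg-003", 3000, 3999},
	}
	for _, s := range pruneByTime(segs, 2500, 3500) { // query window
		fmt.Println(s.ID) // prints seg-002 then seg-003
	}
}
```

Because log queries are almost always time-bounded, this cheap metadata check typically eliminates most segments before any per-term index work begins.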
Performance Characteristics
| Metric | Value |
|---|---|
| VM simple predicate (status >= 500) | 22 ns/op, 0 allocs |
| VM complex predicate (status >= 500 AND host = "web-01") | 55 ns/op, 0 allocs |
| Streaming head 10 on 100K events | 0.23 ms |
| Binary search on sorted timestamps (1M events) | 15 ns vs 232 us full scan |
| Cache hit latency | 299 ns |
| Pipeline throughput (WHERE + STATS, 100K events) | ~2.1M events/sec |
| Single-node ingest throughput | 300K+ events/sec |
| Materialized view acceleration | ~400x vs raw scan |
Next Steps
- Storage Engine -- WAL, memtable, flush, compaction, tiering
- Segment Format -- columnar .lsg V2 format and encoding strategies
- Query Engine -- parser, optimizer, pipeline, bytecode VM
- Indexing -- bloom filters, FST inverted index, roaring bitmaps
- Distributed Architecture -- Raft, sharding, scatter-gather, S3
- Design Decisions -- why columnar, why SPL2, why Go, why single binary