Large Cluster (10-1000+ Nodes)

For large-scale deployments, LynxDB supports role splitting. Each node runs one specific role, allowing independent scaling of metadata management, write throughput, and query concurrency.

Current implementation status

The repo includes the core cluster building blocks for this topology, but the operational surface for large separated-role clusters is still broader than the small-cluster path. Treat this guide as advanced deployment guidance and verify failover, rebalancing, and shard-routing behavior in staging before production use.

Architecture

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Meta Nodes   │    │ Ingest Nodes │    │ Query Nodes  │
│ (3-5, Raft)  │    │ (N, stateless│    │ (M, stateless│
│              │    │  + batcher)  │    │  + cache)    │
│ - Shard map  │    │ - Batcher    │    │ - Scatter-   │
│ - Node reg   │    │ - Memtable   │    │   gather     │
│ - Leases     │    │ - Flush→S3   │    │ - Partial    │
│              │    │ - Replicate  │    │   merge      │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                     ┌─────┴─────┐
                     │  S3/MinIO │
                     │ (source   │
                     │  of truth)│
                     └───────────┘

Roles

Role	Scales with	Count	Responsibility
Meta	Cluster size	3-5 (fixed)	Raft consensus, shard map, node registry, leader leases, field catalog, source registry
Ingest	Write throughput	N nodes	Async batcher, direct-to-part writes, S3 upload, batcher replication
Query	Query concurrency	M nodes	Scatter-gather, partial aggregation merge, result caching

Meta Node Configuration

Meta nodes run the Raft consensus protocol. You need 3 for production (tolerates 1 failure) or 5 (tolerates 2 failures). Meta nodes are lightweight and do not store log data.

# /etc/lynxdb/config.yaml (meta-1, meta-2, meta-3)
listen: "0.0.0.0:3100"
data_dir: "/var/lib/lynxdb"
log_level: "info"

cluster:
  node_id: "meta-1"    # Unique per node
  roles: [meta]
  seeds:
    - "meta-1.example.com:9400"
    - "meta-2.example.com:9400"
    - "meta-3.example.com:9400"

Ingest Node Configuration

Ingest nodes receive data, buffer events in the async batcher, write parts directly to immutable .lsg files, upload them to S3, and replicate batches to ISR peers via gRPC streaming. They are stateless after flush -- if an ingest node dies, its shards are reassigned and replicated batches are available on ISR peers.

# /etc/lynxdb/config.yaml (ingest nodes)
listen: "0.0.0.0:3100"
data_dir: "/var/lib/lynxdb"
log_level: "info"

cluster:
  enabled: true
  node_id: "ingest-1"    # Unique per node
  roles: [ingest]
  seeds:
    - "meta-1.example.com:9400"
    - "meta-2.example.com:9400"
    - "meta-3.example.com:9400"
  virtual_partition_count: 1024
  time_bucket_size: "24h"
  replication_factor: 3
  ack_level: "one"
  heartbeat_interval: "5s"
  lease_duration: "10s"

storage:
  s3_bucket: "my-lynxdb-logs"
  s3_region: "us-east-1"
  compression: "lz4"
  flush_threshold: "512mb"
  compaction_workers: 4

ingest:
  max_body_size: "50mb"

Query Node Configuration

Query nodes execute distributed queries using scatter-gather. They pull segments from S3 (with a local cache) and merge partial aggregation results from shards.

# /etc/lynxdb/config.yaml (query nodes)
listen: "0.0.0.0:3100"
data_dir: "/var/lib/lynxdb"
log_level: "info"

cluster:
  node_id: "query-1"    # Unique per node
  roles: [query]
  seeds:
    - "meta-1.example.com:9400"
    - "meta-2.example.com:9400"
    - "meta-3.example.com:9400"

storage:
  s3_bucket: "my-lynxdb-logs"
  s3_region: "us-east-1"
  segment_cache_size: "50gb"
  cache_max_bytes: "8gb"

query:
  max_concurrent: 50
  max_query_runtime: "30m"

Distributed Query Execution

The optimizer automatically splits queries for distributed execution:

"source=nginx | where status>=500 | stats count by uri | sort -count | head 10"

  Shard-level (pushed):    | where status>=500 | stats count by uri   (partial agg)
  Coordinator (merged):    | sort -count | head 10

Operators that push down to shards: scan, filter, eval, partial stats, TopK. Operators that run on the coordinator: sort, join, global dedup, streamstats.

Network Architecture

                    ┌─────────────────────────────────────┐
                    │         Load Balancer                │
                    │   (ingest LB)    (query LB)         │
                    └────────┬──────────────┬──────────────┘
                             │              │
              ┌──────────────┤              ├──────────────┐
              │              │              │              │
        ┌─────┴────┐  ┌─────┴────┐  ┌─────┴────┐  ┌─────┴────┐
        │Ingest-1  │  │Ingest-2  │  │ Query-1  │  │ Query-2  │
        │Ingest-3  │  │Ingest-4  │  │ Query-3  │  │ Query-4  │
        └──────────┘  └──────────┘  └──────────┘  └──────────┘
              │              │              │              │
              └──────────────┼──────────────┼──────────────┘
                             │              │
                     ┌───────┴───┐  ┌───────┴───┐
                     │  Meta-1   │  │  Meta-2   │
                     │  Meta-3   │  │           │
                     └───────────┘  └───────────┘

Required Ports

Port	Protocol	Direction	Purpose
3100	TCP	Inbound	HTTP API (ingest, query, health)
9400	TCP	Between nodes	Cluster communication (Raft, shard rebalancing, replication)

Load Balancer Setup

Use separate load balancers (or URL-based routing) for ingest and query traffic:

Ingest LB routes POST /api/v1/ingest* to ingest nodes
Query LB routes all other traffic to query nodes
Health check: GET /health on port 3100

Scaling Guidelines

Ingest Nodes

Scale ingest nodes based on write throughput:

Events/sec	Recommended Ingest Nodes
100K	2-3
500K	5-10
1M	10-20
5M+	50+

Query Nodes

Scale query nodes based on concurrent query load:

Concurrent Queries	Recommended Query Nodes
10	2-3
50	5-10
200	20-30
500+	50+

Meta Nodes

Meta nodes are lightweight. 3 nodes handle clusters up to ~500 nodes. Use 5 for larger clusters.

Node Failure Recovery

Scenario	Impact	Recovery Time
Ingest node failure	Shards reassigned to surviving ingest nodes	~25 seconds
Query node failure	Load balancer routes to surviving query nodes	Immediate (health check)
Meta node failure (1 of 3)	Raft quorum maintained, cluster operates normally	Automatic
Meta node failure (2 of 3)	Raft quorum lost, no new shard assignments	Bring 1 meta node back
Meta quorum loss (all)	Ingest continues in degraded mode for `meta_loss_timeout` (30s)	Restore meta quorum

S3 is the source of truth. No data is lost when ingest nodes fail because batches are replicated to ISR peers.

Example: 50-Node Cluster

Meta:   3 nodes (m5.large)
Ingest: 20 nodes (c5.2xlarge, 8 vCPU, 16GB RAM)
Query:  27 nodes (r5.2xlarge, 8 vCPU, 64GB RAM)
S3:     Single bucket, us-east-1

Estimated capacity:

Ingest: ~2M events/sec
Query: ~100 concurrent queries
Storage: Unlimited (S3)
Hot cache: ~1.3TB across query nodes

Next Steps

S3 Storage Setup -- configure the S3 backend
Small Cluster -- simpler deployment for fewer than 10 nodes
Performance Tuning -- optimize cluster performance
Monitoring -- cluster-wide observability

Architecture​

Roles​

Meta Node Configuration​

Ingest Node Configuration​

Query Node Configuration​

Distributed Query Execution​

Network Architecture​

Required Ports​

Load Balancer Setup​

Scaling Guidelines​

Ingest Nodes​

Query Nodes​

Meta Nodes​

Node Failure Recovery​

Example: 50-Node Cluster​

Next Steps​