Skip to main content

Run Aggregations

The stats stage is the core of log analytics in LynxDB. It computes aggregate functions over events, optionally grouped by one or more fields. This guide covers all the aggregation patterns you need for day-to-day log analysis.

Basic counting

Count all events

lynxdb query '| stats count()'
note

count() always takes parentheses in LynxFlow — stats count is a parse error. An unaliased count() produces a column literally named count(), so alias it (stats count() as count) whenever a later stage or output references the column.

Count with a filter

Search-sugar terms like level=error are accepted directly after the source in a from stage:

lynxdb query 'from main level=error | stats count()'

In a bare pipeline (no from), use where with the == comparison operator:

lynxdb query 'where level == "error" | stats count()'

Quick count shortcut

The lynxdb count command is a faster way to get a simple count. It wraps your filter stage in from main | <filter> | stats count():

lynxdb count 'where level == "error"' --since 1h

Group by a field

Add by <field> to break results down by category:

lynxdb query 'from main level=error | stats count() as count by source'

Result:

sourcecount
nginx847
api-gateway523
postgres211

Group by multiple fields

lynxdb query 'from nginx | stats count() as count by status, uri'

Name your aggregation

Use as to give the result column a meaningful name:

lynxdb query 'from main level=error | stats count() as error_count by source'

Aggregation functions

LynxDB supports 26 aggregate functions. Here are the most common ones.

Count, sum, avg

lynxdb query 'from nginx | stats count() as count, sum(bytes) as total_bytes, avg(duration_ms) as avg_lat by uri'

Result:

uricounttotal_bytesavg_lat
/api/v2/users1423489210045.2
/api/v1/health89112450012.1
/api/v1/login45634200089.7
note

SPL2's mean is gone — use avg.

Min and max

lynxdb query 'from nginx | stats min(duration_ms) as fastest, max(duration_ms) as slowest by uri'

Distinct count

Count unique values with dc():

lynxdb query 'from nginx | stats dc(client_ip) as unique_visitors by uri'

dc() is exact below 10K distinct values and switches to HyperLogLog above; estdc() always uses HyperLogLog.

Percentiles

Compute latency percentiles:

lynxdb query 'from nginx | stats avg(duration_ms) as avg_lat, p50(duration_ms) as p50_lat, p95(duration_ms) as p95_lat, p99(duration_ms) as p99_lat by uri'

Available percentile functions: p50, p75, p90, p95, p99 (the SPL2 perc50...perc99 zoo is gone). For arbitrary percentiles the registry defines perc(x, p) with p in [0, 100] — see the aggregate functions reference.

For weighted samples, use perc_weighted(x, weight, p):

lynxdb query 'from metrics | stats perc_weighted(latency_ms, sample_count, 95) as weighted_p95 by endpoint'

There is also a percentiles sugar stage that expands to the full p50/p75/p90/p95/p99 set:

lynxdb query 'from nginx | percentiles duration_ms by uri'

Standard deviation and variance

lynxdb query 'from nginx | stats avg(duration_ms) as avg_lat, stdev(duration_ms) as stdev_lat by uri'

stdev() is the sample standard deviation; var() is the sample variance.

Robust spread

Use mad() for median absolute deviation, an exact robust spread measure over numeric values:

lynxdb query 'from nginx | stats mad(duration_ms) as latency_spread by uri'

Collect values

The values() function collects all distinct non-null values of a field into an array:

lynxdb query 'from main level=error | stats count() as count, values(source) as sources by host'

list() collects all non-null values in row order (duplicates included).

Use top_k_weighted() to rank values by summed numeric weight:

lynxdb query 'from sales | stats top_k_weighted(product, revenue, 10) as top_products by region'

Earliest and latest

Get the first and last value seen (by _time):

lynxdb query 'from main | stats earliest(message) as first_msg, latest(message) as last_msg by source'

first() and last() are the row-order equivalents.

See the aggregate functions reference for the complete list.


Conditional counting

Count events that match a condition with a where clause inside the aggregate (formerly count(eval(...)) in SPL2):

lynxdb query 'from nginx | stats count() as total, count(where status >= 500) as errors by uri'

All standard aggregates support where clauses, e.g. sum(bytes, where status == 200):

lynxdb query 'from nginx | stats sum(bytes, where status == 200) as ok_bytes by uri'

Compute error rates

Combine conditional counting with extend (formerly eval) to calculate ratios:

lynxdb query 'from nginx
| stats count() as total, count(where status >= 500) as errors by uri
| extend error_rate = round(errors / total * 100, 1)
| where error_rate > 5
| sort -error_rate
| keep uri, total, errors, error_rate'

Sorting results

Pipe aggregation results into sort to order them:

# Sort descending by count (prefix with -)
lynxdb query 'from main level=error | stats count() as count by source | sort -count'

# Sort ascending
lynxdb query 'from main level=error | stats count() as count by source | sort count'

# Sort by multiple fields
lynxdb query 'from nginx | stats count() as count by status, uri | sort status, -count'

Top and rare

The top and rare stages are shortcuts for the most and least common values:

# Top 10 URIs by request count
lynxdb query 'from nginx | top 10 uri'

# Rarest error messages
lynxdb query 'from main level=error | rare 10 message'

These desugar to stats count() as count by <field> | sort -count | head N (and sort +count for rare) — visible with --show-rewritten.


Multi-level aggregation

You can chain multiple stats stages in a pipeline. Each one aggregates the output of the previous step:

# First: count errors per host per source
# Then: find hosts with more than 100 total errors
lynxdb query 'from main level=error
| stats count() as count by host, source
| stats sum(count) as total_errors by host
| where total_errors > 100
| sort -total_errors'

Streaming aggregations

streamstats -- running aggregations

streamstats computes running (cumulative) aggregations without collapsing events:

lynxdb query 'from nginx
| sort +_time
| streamstats count() as request_num, avg(duration_ms) as running_avg_latency'

streamstats also supports window-only functions such as lag, lead, and row_number — see the aggregate functions reference.

eventstats -- enrich events with aggregates

eventstats adds aggregation values to each event without collapsing:

lynxdb query 'from nginx
| eventstats avg(duration_ms) as global_avg by uri
| where duration_ms > global_avg * 3
| keep _time, uri, duration_ms, global_avg'

This is useful for finding outliers: events where the latency is more than 3x the average.


Aggregations on local files

All aggregation stages work in pipe mode and file mode:

# Aggregate a local file
lynxdb query --file access.log '| stats count() as count by status'

# Aggregate piped input
kubectl logs deploy/api | lynxdb query '| stats avg(duration_ms) as avg_dur, p99(duration_ms) as p99_dur by endpoint'

# Combine with Unix tools
lynxdb query --file access.log '| stats count() as count by status' --format csv | sort -t, -k2 -rn

Performance tips

  • Time range first: Always narrow the scan window with a bracket range on the source: from main[-1h] level=error | stats count() is much faster than scanning all data. The --since CLI flag works too.
  • Filter before aggregating: Place where before stats to reduce the number of events processed.
  • Use materialized views: For queries you run repeatedly, create a materialized view to precompute the aggregation and get results up to 400x faster.
  • Partial aggregation: LynxDB automatically uses two-phase partial aggregation (per-segment, then global merge), so stats scales linearly with data size.

Next steps