Skip to main content

Time Series Analysis

Log analytics often requires understanding trends over time: when do errors spike? Is latency increasing? How does traffic change throughout the day? LynxDB provides the every stage and the bin() function for time-based aggregation (formerly TIMECHART, BIN, and time_bucket() in SPL2).

Time bucketing with every

The every stage is the primary tool for time series analysis. It buckets events into time intervals and computes aggregations for each bucket.

Basic time series

Count events per 5-minute interval:

lynxdb query 'from main level=error | every 5m stats count() as count'

every is pure sugar: it desugars to stats count() as count by bin(_time, 5m) (visible with --show-rewritten), and the bucket timestamp emits as _time.

Choose the time span

The span argument controls the bucket size. Combine it with a bracket time range on from:

# 1-minute granularity (high detail)
lynxdb query 'from main[-1h] level=error | every 1m stats count() as count'

# 1-hour granularity (overview)
lynxdb query 'from main[-7d] level=error | every 1h stats count() as count'

# 1-day granularity (trend)
lynxdb query 'from main[-30d] level=error | every 1d stats count() as count'

Common span values: 1m, 5m, 10m, 15m, 30m, 1h, 6h, 12h, 1d.

Aggregation functions in every

Use any aggregate function, not just count:

# Average latency over time
lynxdb query 'from nginx | every 5m stats avg(duration_ms) as avg_lat'

# P99 latency over time
lynxdb query 'from nginx | every 5m stats p99(duration_ms) as p99_lat'

# Multiple aggregations
lynxdb query 'from nginx | every 5m stats count() as count, avg(duration_ms) as avg_lat, p99(duration_ms) as p99_lat'

# Sum of bytes transferred
lynxdb query 'from nginx | every 1h stats sum(bytes) as total_bytes'

Split by a field

Use by <field> to produce separate series for each value:

# Error count over time, split by source
lynxdb query 'from main level=error | every 5m by source stats count() as count'

# Latency by endpoint
lynxdb query 'from nginx | every 5m by uri stats avg(duration_ms) as avg_lat'

# Status code distribution over time
lynxdb query 'from nginx | every 5m by status stats count() as count'
warning

In every, the by clause comes before the stats keyword: every 5m by source stats count(). Writing every 5m stats count() by source is a parse error.

The rate shortcut

For plain event counts per bucket, the rate sugar stage is even shorter — it desugars to every <per> [by <keys>] stats count() as rate:

lynxdb query 'from nginx | rate per 5m by service'

bin() -- explicit time buckets in stats

bin(_time, <span>) snaps a timestamp to a bucket boundary. Used as a group key in stats, it is the explicit form of every — useful when you want to combine time bucketing with other groupings or computations.

info

_time is the canonical timestamp field in LynxFlow queries and API output. In a stats by list, the binned key emits as _time.

Basic binning

lynxdb query 'from nginx
| stats count() as count, avg(duration_ms) as avg_lat by bin(_time, 5m)'

Correlate metrics with binned timestamps

lynxdb query 'from postgres duration_ms>1000
| stats count() as slow_queries, avg(duration_ms) as avg_latency by bin(_time, 5m)
| where slow_queries > 10'

bin() in extend

You can also materialize the bucket as a named column:

lynxdb query 'from nginx
| extend hour = bin(_time, 1h)
| stats avg(duration_ms) as avg_lat, count() as count by hour
| sort hour'

every vs explicit bin()

Featureeverystats ... by bin(_time, ...)
ConvenienceOne keyword, reads naturallyExplicit group key
Split byby clause before statsAny mix of keys in the by list
FlexibilityFixed to time groupingCombine time with other groupings, name the bucket via extend
Typical useQuick time series chartsComplex multi-dimensional analysis

Both compile to the same plan — every is pure sugar.


bin() in materialized views

bin()-based bucketing (formerly time_bucket()) is essential for defining materialized views:

lynxdb mv create mv_errors_5m \
'from main level=error | every 5m by source stats count() as count, avg(duration) as avg_duration' \
--retention 90d

Practical examples

Find error spikes

Identify 5-minute windows with abnormally high error counts:

lynxdb query 'from main[-24h] level=error
| every 5m stats count() as count
| where count > 100'

Compare latency across endpoints

lynxdb query 'from nginx[-6h] | every 10m by uri stats avg(duration_ms) as avg_lat'

Traffic pattern analysis

See how request volume changes hour by hour:

lynxdb query 'from nginx[-7d] | every 1h stats count() as count'

Correlate errors with slow queries

lynxdb query 'from main[-6h] level=error or (source=postgres duration_ms>1000)
| extend event_type = if(source == "postgres", "slow_query", "error")
| every 5m by event_type stats count() as count'

Compute moving averages with streamstats

Smooth out spiky time series with a running average:

lynxdb query 'from nginx
| every 5m stats avg(duration_ms) as avg_lat
| sort +_time
| streamstats window=6 avg(avg_lat) as moving_avg
| keep _time, avg_lat, moving_avg'

The window=6 option computes the average over the trailing 6 rows (30 minutes at 5-minute intervals).


Time series on local files

Time analysis works in pipe mode and file mode:

# Analyze a local access log
lynxdb query --file access.log '| every 1h stats count() as count'

# Analyze kubectl output over time
kubectl logs deploy/api --since=6h | lynxdb query '
| stats count() as count, avg(duration_ms) as avg_dur by bin(_time, 5m)'

Output format for time series

Time series data often needs to be consumed by other tools. Use --format csv for spreadsheet compatibility or --format json for programmatic use:

# CSV for spreadsheets or graphing tools
lynxdb query 'from main[-7d] level=error | every 1h stats count() as count' --format csv > errors_by_hour.csv

# JSON for programmatic use
lynxdb query 'from main[-7d] level=error | every 1h stats count() as count' --format json

Next steps