
Upgrading from grep/awk

If you live in the terminal and your log analysis toolkit is grep, awk, sed, and jq, LynxDB pipe mode gives you the power of a full analytics engine with zero setup. Same philosophy: read from stdin, process, write to stdout. No server, no config file, no daemon.

How Pipe Mode Works

LynxDB's query command detects when data is piped via stdin. It creates an ephemeral in-memory engine, ingests the data, runs your LynxFlow query, prints results, and exits. Nothing is saved to disk.

cat app.log | lynxdb query 'stats count() by level'

In a single command, this ingests the log, groups events by level, and prints a count for each group.
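The stdin detection described above follows a common Unix convention. As a rough sketch of how any shell tool can tell piped input from an interactive terminal (the `input_mode` helper is hypothetical and is not LynxDB's actual implementation), the POSIX test `[ -t 0 ]` checks whether file descriptor 0 is a terminal:

```shell
# Hypothetical helper, for illustration only: report whether stdin is a
# terminal or a pipe/redirect. `[ -t 0 ]` is true when fd 0 is a terminal.
input_mode() {
  if [ -t 0 ]; then
    echo "interactive"
  else
    echo "piped"
  fi
}

# With a pipe on stdin, the function takes the "piped" branch.
printf 'level=info\n' | input_mode   # prints: piped
```

A tool built this way can offer one command that reads stdin when data is piped in and falls back to another source otherwise.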

Side-by-Side Comparisons

Count Lines Matching a Pattern

# grep
grep -c "ERROR" app.log

# LynxDB
lynxdb query --file app.log 'from main level=error | stats count()'

Count by Field Value

# grep + sort + uniq
grep -oP 'level=\K\w+' app.log | sort | uniq -c | sort -rn

# awk
awk -F'level=' '{print $2}' app.log | awk '{print $1}' | sort | uniq -c | sort -rn

# LynxDB
lynxdb query --file app.log 'stats count() by level | sort -count'

Filter and Aggregate

# grep + awk (fragile, depends on log format)
grep "status=5" access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -10

# LynxDB (works with any log format)
lynxdb query --file access.log 'where status >= 500 | stats count() by uri | sort -count | head 10'

Average of a Numeric Field

# awk
awk '{sum+=$NF; n++} END {print sum/n}' data.log

# LynxDB
lynxdb query --file data.log 'stats avg(duration_ms)'
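The awk one-liner above only works when the number is the last column on every line. For key=value logs where the field can appear anywhere, a slightly more defensive awk version might look like the sketch below. The `avg_field` helper and the sample lines are assumptions made for illustration, not part of LynxDB or of the original page.

```shell
# Hypothetical helper: average a numeric key=value field found anywhere on
# the line. Reads log lines on stdin; $1 is the field name.
avg_field() {
  awk -v key="$1" '
    # match() sets RSTART/RLENGTH to the position of "key=<digits>".
    match($0, key "=[0-9]+") {
      sum += substr($0, RSTART + length(key) + 1, RLENGTH - length(key) - 1)
      n++
    }
    # Guard against division by zero when no line carries the field.
    END { if (n > 0) print sum / n; else print "no data" }
  '
}

# Sample input (made up): the field sits in a different column on each line.
printf '%s\n' \
  'ts=1 path=/a duration_ms=100 status=200' \
  'ts=2 path=/b status=500 duration_ms=300' \
  'ts=3 path=/c duration_ms=200' | avg_field duration_ms   # prints: 200
```

Note that the pattern is unanchored, so a field named `x_duration_ms` would also match. That kind of edge case is what makes hand-rolled awk fragile.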

Percentiles

# awk (requires writing a percentile function)
# ... complex multi-line awk script ...

# LynxDB
lynxdb query --file data.log 'stats p50(duration_ms), p95(duration_ms), p99(duration_ms)'
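To give a sense of what the elided awk script involves, here is a minimal sketch of a nearest-rank percentile built from `sort` and `awk`. The `percentile` helper and the sample values are assumptions for illustration. LynxDB's p50/p95/p99 may use a different method (for example interpolation or approximation), so results are not guaranteed to match.

```shell
# Hypothetical helper: nearest-rank percentile.
# Reads one number per line on stdin; $1 is the percentile (1-100).
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }                 # values arrive sorted, so v[] is ordered
    END {
      if (NR == 0) exit 1          # no input, nothing to report
      rank = p * NR / 100          # fractional rank
      idx = int(rank)
      if (idx < rank) idx++        # round up (ceiling)
      if (idx < 1) idx = 1
      print v[idx]
    }
  '
}

# Ten made-up durations in milliseconds.
durations='120 80 300 95 2000 110 105 90 100 130'
printf '%s\n' $durations | percentile 50   # prints: 105
printf '%s\n' $durations | percentile 95   # prints: 2000
```

Even this short version has to buffer every value in memory and be run once per percentile, which is the overhead the single `stats` call avoids.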

Time-Based Aggregation

# awk (requires parsing timestamps, bucketing, counting)
# ... very complex awk script ...

# LynxDB
lynxdb query --file app.log 'from main level=error | every 5m stats count()'
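For comparison, a hand-written 5-minute bucketing script could be sketched as follows. It assumes every line starts with an ISO-8601 UTC timestamp such as `2024-05-01T12:03:45Z`. That format, the `bucket_5m` helper, and the sample lines are all assumptions for illustration. The script buckets by string slicing, so it breaks as soon as the timestamp layout changes.

```shell
# Hypothetical helper: count error lines per 5-minute bucket.
# Assumes field 1 is a timestamp shaped like 2024-05-01T12:03:45Z.
bucket_5m() {
  awk '
    /level=error/ {
      ts = $1
      hour_prefix = substr(ts, 1, 14)     # "2024-05-01T12:"
      minute = substr(ts, 15, 2) + 0      # "03" -> 3
      bucket = minute - (minute % 5)      # floor to a multiple of 5
      count[sprintf("%s%02d", hour_prefix, bucket)]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort
}

# Made-up sample: four errors and one info line.
printf '%s\n' \
  '2024-05-01T12:01:10Z level=error msg=a' \
  '2024-05-01T12:03:45Z level=error msg=b' \
  '2024-05-01T12:04:59Z level=info msg=c' \
  '2024-05-01T12:07:00Z level=error msg=d' \
  '2024-05-01T12:14:30Z level=error msg=e' | bucket_5m
# prints:
# 2024-05-01T12:00 2
# 2024-05-01T12:05 1
# 2024-05-01T12:10 1
```

Unlike a real time-series aggregation, this sketch does not emit empty buckets and knows nothing about time zones.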

Top Values

# grep + sort + uniq + head
grep -oP 'host=\K\S+' app.log | sort | uniq -c | sort -rn | head -5

# LynxDB
lynxdb query --file app.log 'top 5 host'

JSON Logs

# jq (one field at a time)
cat app.json | jq -r '.level' | sort | uniq -c | sort -rn

# jq (complex aggregation -- difficult)
cat app.json | jq -r '[.level, .source] | @tsv' | sort | uniq -c | sort -rn

# LynxDB (handles JSON natively)
cat app.json | lynxdb query 'stats count() by level, source | sort -count'

Extracting Fields with Regex

# grep -oP
grep -oP 'duration=\K\d+' app.log

# LynxDB (named capture groups)
lynxdb query --file app.log 'parse regex r"duration=(?P<dur>\d+)" | keep dur'

Chaining with Unix Tools

LynxDB outputs NDJSON when piped, so it composes with standard tools:

# LynxDB aggregation -> jq for further processing
lynxdb query --file app.log 'stats count() by host' | jq '.host'

# LynxDB filter -> CSV export -> sort
lynxdb query --file app.log 'stats count() by status' --format csv | sort -t, -k2 -rn

# LynxDB as a filter in a pipeline
cat huge.log | lynxdb query 'where level == "ERROR"' | wc -l
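One thing to watch when sorting CSV downstream: if the export begins with a header row (an assumption here, so check the actual output), a plain `sort` will shuffle that header in with the data. A small sketch of a header-preserving sort, using a hypothetical `sort_csv_by_count` helper and made-up sample rows:

```shell
# Hypothetical helper: keep the first CSV line in place and sort the
# remaining rows by the second column, numerically, descending.
sort_csv_by_count() {
  {
    # The shell builtin `read` consumes exactly one line, leaving the
    # rest of stdin for sort.
    IFS= read -r header && printf '%s\n' "$header"
    sort -t, -k2,2 -rn
  }
}

# Made-up CSV standing in for the exported result.
printf '%s\n' 'status,count' '500,7' '200,120' '404,33' | sort_csv_by_count
# prints:
# status,count
# 200,120
# 404,33
# 500,7
```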

Common Recipes

Quick Error Count

cat app.log | lynxdb query 'where level == "ERROR" | stats count()'

Errors Per Service in the Last Hour

# Against a running server
lynxdb query 'from main level=error | stats count() by source' --since 1h

# Against a local file
lynxdb query --file app.log 'where level == "ERROR" | stats count() by source'

Slow Requests

kubectl logs deploy/api | lynxdb query 'where duration_ms > 1000 | stats avg(duration_ms), count() by endpoint | sort -count'

HTTP Status Code Distribution

lynxdb query --file access.log 'stats count() by status | sort -count'

Unique Visitors

lynxdb query --file access.log 'stats dc(client_ip) as unique_visitors'

Error Spike Detection

lynxdb query --file app.log 'from main level=error | every 1m stats count()'

Parse Unstructured Logs

# Extract IP, method, URI, and status from Apache combined log format
lynxdb query --file access.log \
'parse regex r"^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] .(?P<method>\S+) (?P<uri>\S+) \S+ (?P<status>\d+)"
| stats count() by status | sort -count'
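To check what the capture groups in that pattern pick up, the same expression can be approximated in POSIX extended regex syntax and tried on a sample line with `sed -E`. The `extract_fields` helper and the sample log line are made up for illustration. `[^ ]+` stands in for `\S+` and `[0-9]+` for `\d+`.

```shell
# Hypothetical helper: print "ip method uri status" for each line that
# matches the combined log format, and drop lines that do not match.
extract_fields() {
  sed -E -n 's/^([^ ]+) [^ ]+ [^ ]+ \[[^]]+\] "([^ ]+) ([^ ]+) [^ ]+ ([0-9]+).*$/\1 \2 \3 \4/p'
}

# Made-up access log line in Apache combined format.
printf '%s\n' \
  '203.0.113.7 - - [01/May/2024:12:03:45 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0"' \
  | extract_fields
# prints: 203.0.113.7 GET /index.html 200
```

Trying the expression on one or two real lines this way is a quick sanity check before running it over a whole file.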

Docker/Kubernetes Logs

# Docker
docker logs myapp 2>&1 | lynxdb query 'from main "OOM" | stats count() by container'

# Kubernetes
kubectl logs deploy/api --since=1h | lynxdb query 'stats avg(duration_ms) by endpoint'

# Multiple pods
kubectl logs -l app=api --all-containers | lynxdb query 'where level == "ERROR" | stats count() by pod'

Process Compressed Logs

zcat /var/log/app.log.gz | lynxdb query 'stats count() by level'

Why LynxDB Over grep/awk

Capability               grep/awk/jq                  LynxDB Pipe Mode
Simple text search       Easy                         Easy
Count by field           Awkward (sort | uniq -c)     stats count() by field
Averages, percentiles    Write your own function      Built-in (avg, p99, etc.)
Time-based buckets       Very difficult               every 5m stats count()
JSON parsing             jq (separate tool)           Native
Multiple aggregations    Near impossible              stats count(), avg(x), p99(x) by y
Top-N                    sort | head (no ties)        top 10 field
Joins                    Not possible                 join, let bindings
Output formats           Text only                    JSON, table, CSV, TSV

When to Keep Using grep

  • Simple text search in a single file: grep "error" app.log is faster for one-off searches
  • When you need regex match highlighting
  • When you need line numbers: grep -n "pattern" file

LynxDB complements grep rather than replacing it. Use grep for quick text searches, and LynxDB when you need aggregation, statistics, or structured analysis.

Next Steps