Working with JSON Logs
LynxDB provides multiple tools for working with JSON log data, from one-stage extraction to nested object access and array manipulation. This guide covers all the approaches and when to use each one.
Quick reference
| Approach | Use when | Example |
|---|---|---|
| parse json | Extracting all/some fields | | parse json | stats count() as count by level |
Object access f.a.b | Querying nested values after extraction | | parse json | where response.status >= 500 |
| parse json into (...) | Production pipelines with typed, projected captures | | parse json into (status as int) |
from_json() | Extracting one embedded JSON value in an extend/where | | extend payload = from_json(body) |
| explode | Exploding JSON arrays into rows | | parse json | explode items |
parse json -- extraction
The parse stage is LynxFlow's unified schema-on-read extractor (formerly the json and unpack_json commands in SPL2). parse json extracts all JSON keys from _raw (or a specified field) into columns:
# Extract all fields
cat app.json | lynxdb query '| parse json | stats count() as count by level'
# Extract specific fields only (faster for wide objects)
cat app.json | lynxdb query '| parse json into (level, status, duration_ms) | where status >= 500'
# Extract from a non-default field
cat logs.json | lynxdb query '| parse json from message | keep level, service'
Typed captures
into (...) both projects the fields you want and coerces their types at extraction time:
lynxdb query 'from main | parse json into (level as string, status as int)'
Prefixes
Add a prefix to avoid field name collisions:
lynxdb query 'from main | parse json prefix app_ | where app_level == "error"'
Merge rules
parse stages never delete columns and never silently overwrite an existing non-null field — on collision the existing value wins and a per-query warning counter increments. There is no keep_original= option because the original field is always kept.
Object and array access
After parse json, nested values are real objects and arrays. Use dot-notation for objects and [index] for arrays:
echo '{"level":"error","request":{"method":"POST","duration_ms":5012}}' \
| lynxdb query '| where request.duration_ms > 1000 | extend method = request.method | keep level, method'
(Pipe mode auto-detects JSON input, so the nested request object is already a column here. For raw text containing JSON, add | parse json first.)
Multi-level nesting
// Access deeply nested values
| parse json
| where response.headers.content_type == "application/json"
| extend origin = request.headers.origin
| stats count() as count by response.status
Array elements
| parse json
| extend first_tag = tags[0]
| extend num_items = len(order.items)
| where num_items > 10
len() returns the element count of an array (formerly json_array_length); keys() returns the keys of an object (formerly json_keys).
How field resolution works
When LynxFlow sees request.method in an expression, it resolves in order:
- Flat column -- a column literally named
request.method(e.g. produced by ingest-time extraction). - Object access -- the
methodkey of the object columnrequest.
Backticks force interpretation 1 (`request.method`); (request).method or request["method"] force interpretation 2.
There is no implicit _raw JSON fallback anymore. If no stage produced the field, dotted access yields missing — it does not silently re-parse _raw. Add | parse json to extract structure.
Safe access with ?.
Use ?. when intermediate objects may be absent:
lynxdb query 'from main | parse json | extend origin = request?.headers?.origin'
Single values: from_json()
When one field contains an embedded JSON string and you only need a value or two, from_json() (formerly json_extract) parses it into an object you can access directly:
lynxdb query 'from main | extend payload = from_json(body) | extend user = payload?.user?.name'
from_json() returns null on invalid JSON — never the original string. The strict variant from_json!() raises a query error instead.
Build JSON output
Construct objects with {...} literals and serialize with to_json() (formerly json_object):
lynxdb query 'from main
| stats count() as count by host, level
| extend summary = to_json({host: host, level: level, count: count})
| keep summary'
Handling malformed JSON
Instead of pre-validating with json_valid(), LynxFlow's parse stage has explicit on_error modes:
# Drop rows that fail to parse
lynxdb query 'from main | parse json on_error drop'
The default mode is propagate: rows survive with best-effort fields, and the _error / _error_detail columns record what went wrong. Inspecting failures is a first-class workflow:
lynxdb query 'from main | parse json | where exists(_error) | keep _error, _error_detail, _raw'
Other modes: null (row survives, fields null, no error columns) and strict (the query fails on the first offending row). See the parse reference.
explode -- arrays into rows
When a field contains an array, explode (formerly unroll) creates one row per element:
echo '{"order":"ORD-1","items":[{"sku":"A1","qty":2},{"sku":"B3","qty":1}]}' \
| lynxdb query '| parse json | explode items | extend sku = items.sku, qty = items.qty | keep order, sku, qty'
Output:
order sku qty
ORD-1 A1 2
ORD-1 B3 1
Rows with a missing or empty array are dropped.
Array of objects
After explode items, each row's items column holds one element. Object elements are reached with dot-notation (items.sku, items.qty). Note that keep only accepts flat column names — materialize nested values with extend first, as in the example above.
Array of scalars
Use as to name the element column:
// Input: {"name": "alice", "tags": ["admin", "user"]}
| parse json | explode tags as tag
// Row 1: name=alice, tag=admin
// Row 2: name=alice, tag=user
Aggregate over exploded data
lynxdb query '| parse json
| explode items
| stats sum(items.qty) as total_sold, dc(order_id) as orders by items.sku
| sort -total_sold
| head 20'
Chaining parsers
Real-world logs often have nested formats. Chain parse stages to handle them:
Docker JSON logs with embedded application log
# Docker wraps each log line in JSON: {"log":"...","stream":"stdout","time":"..."}
# The inner "log" field contains the application logfmt output
cat docker-logs.json | lynxdb query '
| parse json
| parse logfmt from log prefix app_
| where app_level == "error"
| stats count() as count by app_service'
Syslog with embedded JSON
# Syslog header wraps a JSON application message
cat syslog.log | lynxdb query '
| parse syslog
| parse json from message prefix app_
| stats count() as count by hostname, app_level'
Nginx access log with JSON body field
cat access.log | lynxdb query '
| parse combined
| parse json from request_body prefix body_
| where body_action == "purchase"
| stats sum(body_amount) as revenue by client_ip'
Mixed-format streams
When lines may be JSON or logfmt, use a fallback chain — the first format that succeeds per row wins:
cat app.log | lynxdb query '| parse first_of(json, logfmt)'
Performance tips
-
Use
into (...)for wide objects. Extracting 3 fields from a 50-key JSON object is much faster than extracting all 50, and you get typed columns for free. -
Use
from_json()in extend/where for single embedded values. When only one field holds JSON, you avoid running a full parse stage. -
Place extraction early in the pipeline. Extract before
whereso the filter operates on typed fields:| parse json into (status as int) | where status >= 500 // fast: status is an int -
Choose an
on_errormode deliberately.on_error dropkeeps malformed data out of production pipelines; the defaultpropagatekeeps rows and lets you audit failures via_error.
Next steps
- parse reference
- explode reference
- Scalar functions reference -- object and array function sections
- Field Extraction Guide -- parse regex, extend, and schema-on-read