# Working with JSON Logs
LynxDB provides multiple tools for working with JSON log data, from zero-config dot-notation to full structural extraction and array manipulation. This guide covers all the approaches and when to use each one.
## Quick reference

| Approach | Use when | Example |
|---|---|---|
| Dot-notation | Ad-hoc queries on nested JSON | `\| where response.status >= 500` |
| `json` | Quick extraction of all/some fields | `\| json \| stats count by level` |
| `unpack_json` | Production pipelines with prefix/field control | `\| unpack_json prefix=app_` |
| `json_extract()` | Extracting one value in an `eval`/`where` | `\| eval name = json_extract(data, "user.name")` |
| `unroll` | Exploding JSON arrays into rows | `\| unroll field=items` |
## Dot-notation: zero-config nested access

LynxDB supports dot-notation for accessing nested JSON fields directly in `where`, `eval`, and `stats` without any explicit extraction step.

```
# Filter on a nested field -- no unpack needed
echo '{"level":"error","request":{"method":"POST","duration_ms":5012}}' \
  | lynxdb query '| where request.duration_ms > 1000 | table level, request.method'
```
### How it works

When the VM encounters a field like `request.method`:

1. **Direct lookup** -- checks if a field literally named `request.method` exists (e.g., from a previous `unpack_json`).
2. **Root extraction** -- if not found, looks for a field named `request` and extracts `method` from its JSON value.
3. **`_raw` fallback** -- if the root field doesn't exist either, tries to extract `request.method` from `_raw`.

This means dot-notation works whether or not you've explicitly extracted fields.
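You can see the resolution order in action with a minimal sketch. Both invocations below print `POST`: the first resolves `request.method` via direct lookup (assuming `unpack_json` flattens nested keys to dotted field names, as the direct-lookup step above implies), while the second falls through to `_raw` extraction:

```
# With extraction: request.method exists as a literal field name
echo '{"request":{"method":"POST"}}' \
  | lynxdb query '| unpack_json | table request.method'

# Without extraction: the same expression resolves via the _raw fallback
echo '{"request":{"method":"POST"}}' \
  | lynxdb query '| table request.method'
```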
### Multi-level nesting

```
-- Access deeply nested values
| where response.headers.content_type = "application/json"
| eval origin = request.headers.origin
| stats count by response.status
```
## `| json` -- quick extraction

The `| json` command extracts all JSON keys from `_raw` (or a specified field) into top-level fields.

```
# Extract all fields
cat app.json | lynxdb query '| json | stats count by level'

# Extract specific paths only (faster for wide objects)
cat app.json | lynxdb query '| json level, status, duration_ms | where status >= 500'

# Extract from a non-default field
cat logs.json | lynxdb query '| json field=message | table level, service'
```
## `| unpack_json` -- production extraction

For production pipelines, `unpack_json` provides full control over extraction:

```
-- Add a prefix to avoid field name collisions
| unpack_json prefix=app_
| where app_level = "error"

-- Extract only specific fields (performance optimization)
| unpack_json fields=level,status,duration_ms

-- Keep the original _raw field
| unpack_json keep_original=true

-- Extract from a specific field
| unpack_json field=metadata prefix=meta_
```
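These options also compose in a single call. A minimal sketch, assuming the options combine as documented above (the `region` field is a hypothetical example):

```
# Pull two fields out of the metadata object, namespace them under meta_,
# and keep _raw around for downstream commands
cat app.json | lynxdb query '
  | unpack_json field=metadata fields=level,region prefix=meta_ keep_original=true
  | stats count by meta_level, meta_region'
```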
## JSON eval functions

Use JSON functions in `eval` and `where` for precise extraction and validation.
### Extract a single value

```
| eval user = json_extract(payload, "user.name")
| eval first_tag = json_extract(metadata, "tags.0")
```
### Validate JSON before processing

```
| where json_valid(message) = true
| unpack_json field=message
```
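To see the guard end to end, here is an illustrative run mixing a valid and an invalid line (this assumes each stdin line becomes an event with the raw text in `_raw`, as in the other examples in this guide):

```
# The second line fails json_valid() and is filtered out before | json runs
printf '%s\n' '{"level":"info"}' 'not-json' \
  | lynxdb query '| where json_valid(_raw) = true | json | stats count by level'
```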
### Inspect structure

```
-- Get keys of a JSON object
| eval keys = json_keys(config)

-- Get length of a JSON array
| eval num_items = json_array_length(order, "items")
| where num_items > 10
```
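For instance, to flag orders with more than two line items -- an illustrative sketch with made-up data, reusing the `json_array_length(field, path)` signature shown above:

```
# After | json, the order field holds the JSON object; dot-notation reads order.id
echo '{"order":{"id":"ORD-1","items":["A1","B3","C7"]}}' \
  | lynxdb query '| json | eval n = json_array_length(order, "items") | where n > 2 | table order.id, n'
```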
### Build JSON output

```
-- Construct a JSON object for export
| stats count by host, level
| eval summary = json_object("host", host, "level", level, "count", count)
| table summary
```
See the JSON Functions Reference for the complete API.
## `| unroll` -- exploding arrays

When a field contains a JSON array, `| unroll` creates one row per element.

```
echo '{"order":"ORD-1","items":[{"sku":"A1","qty":2},{"sku":"B3","qty":1}]}' \
  | lynxdb query '| json | unroll field=items | table order, items.sku, items.qty'
```

Output:

```
order  items.sku  items.qty
ORD-1  A1         2
ORD-1  B3         1
```
### Array of objects

Object elements are flattened with dot-notation. For `field=items`:

- `items.sku` = the `sku` key from each element
- `items.qty` = the `qty` key from each element
### Array of scalars

Scalar elements replace the field value directly:

```
-- Input: {"name": "alice", "tags": ["admin", "user"]}
| json | unroll field=tags
-- Row 1: name=alice, tags=admin
-- Row 2: name=alice, tags=user
```
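Run end to end, this makes per-tag counting straightforward -- the same documented commands over the sample record above:

```
# Each tag becomes its own row, so stats can group by the tags field
echo '{"name":"alice","tags":["admin","user"]}' \
  | lynxdb query '| json | unroll field=tags | stats count by tags'
```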
### Aggregate over unrolled data

```
| json
| unroll field=items
| stats sum(items.qty) AS total_sold, dc(order_id) AS orders by items.sku
| sort -total_sold
| head 20
```
## Chaining parsers

Real-world logs often have nested formats. Chain extraction commands to handle them:
### Docker JSON logs with embedded application log

```
# Docker wraps each log line in JSON: {"log":"...","stream":"stdout","time":"..."}
# The inner "log" field contains the application's logfmt output
cat docker-logs.json | lynxdb query '
  | json
  | unpack_logfmt field=log prefix=app_
  | where app_level = "error"
  | stats count by app_service'
```
### Syslog with embedded JSON

```
# Syslog header wraps a JSON application message
cat syslog.log | lynxdb query '
  | unpack_syslog
  | unpack_json field=message prefix=app_
  | stats count by hostname, app_level'
```
### Nginx access log with JSON body field

```
cat access.log | lynxdb query '
  | unpack_combined
  | unpack_json field=request_body prefix=body_
  | where body_action = "purchase"
  | stats sum(body_amount) by client_ip'
```
## Performance tips

- **Use dot-notation for ad-hoc queries.** No extraction overhead for fields you don't materialize.
- **Specify `fields=` when using `unpack_json`.** Extracting 3 fields from a 50-key JSON object is much faster than extracting all 50.
- **Use `json_extract()` in `eval`/`where` for single values.** When you only need one field, `json_extract` avoids parsing the entire object.
- **Place extraction early in the pipeline.** Extract before `where` so the filter can operate on typed fields: `| json | where status >= 500` -- fast: `status` is an integer.
- **Use `json_valid()` to guard against malformed data.** Avoid parse errors in production pipelines: `| where json_valid(_raw) = true | json`
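Putting several of these tips together in one pipeline -- an illustrative sketch over the `app.json` sample used earlier, built only from commands documented above:

```
# Validate first, extract only the two fields the query needs, then filter early
cat app.json | lynxdb query '
  | where json_valid(_raw) = true
  | unpack_json fields=status,duration_ms
  | where status >= 500
  | stats count by status'
```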
## Next steps

- `json` command reference
- `unpack_json` reference
- `unroll` reference
- JSON Functions reference
- Field Extraction Guide -- REX, EVAL, and schema-on-read