Extract Fields at Query Time
LynxDB follows a schema-on-read philosophy: you do not need to define a schema before ingesting data. JSON and text logs remain queryable without an upfront schema, but each ingest transport has its own endpoint contract. Fields from JSON events are indexed automatically. For unstructured text logs, use parse regex (formerly rex in SPL2) and extend (formerly eval) to extract and compute fields at query time.
How schema-on-read works
When you ingest data, LynxDB automatically discovers fields:
- JSON events: All top-level keys become searchable fields with their types preserved.
- Raw text: The full line is stored in the
_rawfield. You can extract structure from it at query time.
Check what fields LynxDB has discovered:
lynxdb fields
FIELD TYPE COVERAGE TOP VALUES
--------------------------------------------------------------------------------
_timestamp datetime 100%
level string 100% INFO(72%), ERROR(17%), WARN(11%)
status integer 50% 200(90%), 404(5%), 500(3%)
duration_ms float 50% min=0.1, max=30001.0, avg=145.3
source string 100% nginx(50%), api-gw(37%), redis(13%)
See the lynxdb fields command reference for details. Inside a query, the describe stage gives the same view of the current stream — one row per field with type, coverage, and top values:
lynxdb query 'from main | parse json | describe'
Extract fields with parse regex
The parse regex stage extracts fields from a text field using named capture groups in a regular expression.
Basic extraction
Given raw log lines like:
2026-01-15 14:23:01 host=web-01 service=api duration=245ms status=200
Extract host, service, duration, and status:
lynxdb query 'from main "duration"
| parse regex r"host=(?P<host>\S+) service=(?P<service>\S+) duration=(?P<duration>\d+)ms status=(?P<status>\d+)"
| keep _time, host, service, duration, status'
Named capture group syntax
parse regex uses Go-style named capture groups: (?P<field_name>pattern).
| Pattern | Matches |
|---|---|
(?P<ip>\d+\.\d+\.\d+\.\d+) | An IPv4 address |
(?P<host>\S+) | A non-whitespace token |
(?P<code>\d{3}) | A 3-digit status code |
(?P<path>[^ ]+) | A path (no spaces) |
(?P<msg>.+) | Everything to end of line |
Patterns are raw strings (r"..."): backslashes are literal, so \d and \S need no double-escaping. A raw string cannot contain a literal " — match a quote character with \x22 instead.
Extract from Apache/Nginx access logs
For standard access-log formats, prefer the purpose-built parse combined (see below). The regex equivalent looks like this:
lynxdb query --file access.log '
| parse regex r"(?P<client_ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] \x22(?P<method>\w+) (?P<uri>\S+) [^\x22]*\x22 (?P<status>\d+) (?P<bytes>\d+)"
| stats count() as count by method, status
| sort -count'
Extract from application logs
lynxdb query 'from main "connection refused"
| parse regex r"host=(?P<host>\S+)"
| stats count() as count by host
| sort -count'
Extract from a non-default field
By default, parse operates on _raw. Use from <field> to extract from any field:
lynxdb query 'from main
| parse regex r"user_id=(?P<uid>\d+)" from message
| stats dc(uid) as unique_users'
Typed captures
into (...) coerces captures at extraction time — no separate conversion step needed:
lynxdb query 'from main
| parse regex r"duration=(?P<dur>\d+)ms" into (dur as int)
| stats avg(dur) as avg_ms'
Compute fields with extend
The extend stage (formerly eval in SPL2) creates new fields by evaluating expressions. Remember: = binds, == compares.
Create a computed field
lynxdb query 'from nginx
| extend duration_sec = duration_ms / 1000
| keep uri, duration_ms, duration_sec'
Conditional fields with if
lynxdb query 'from nginx
| extend severity = if(status >= 500, "critical", if(status >= 400, "warning", "ok"))
| stats count() as count by severity'
Conditional fields with case
A trailing odd argument is the default — no 1=1 sentinel needed:
lynxdb query 'from nginx
| extend category = case(
status >= 500, "5xx",
status >= 400, "4xx",
status >= 300, "3xx",
status >= 200, "2xx",
"other"
)
| stats count() as count by category'
String manipulation
# Convert to lowercase
lynxdb query 'from main | extend level_lower = lower(level) | stats count() as count by level_lower'
# Extract substring (0-based start; SPL2 substr was 1-based)
lynxdb query 'from main | extend short_path = substr(uri, 0, 20) | stats count() as count by short_path'
# String length
lynxdb query 'from main | extend msg_len = len(message) | where msg_len > 500 | keep _time, msg_len, message'
Type conversion
The SPL2 tonumber/tostring functions are gone — cast with the type name:
# Convert a string field to a number
lynxdb query 'from main | extend status_num = int(status) | where status_num >= 500'
# Convert a number to string for display
lynxdb query 'from main | extend status_str = string(status) | keep status_str, uri'
int() and float() return null on failure; the strict variants int!() and float!() raise a query error instead.
Coalesce (first non-null)
lynxdb query 'from main
| extend display_time = coalesce(timestamp, `@timestamp`, _time)
| keep display_time, message'
For a single fallback, the ?? operator is shorter: region ?? "unknown". Field names with special characters (like @timestamp) are quoted with backticks.
Time formatting
lynxdb query 'from main
| extend human_time = strftime(_time, "%Y-%m-%d %H:%M:%S")
| keep human_time, level, message'
See the scalar functions reference for the complete list of available functions.
Combine parse regex and extend
Extract typed values with parse regex ... into, then transform them with extend:
lynxdb query 'from main "request completed"
| parse regex r"duration=(?P<dur>\d+)ms" into (dur as int)
| extend is_slow = if(dur > 1000, "slow", "fast")
| stats count() as count by is_slow'
Array operations
LynxFlow arrays are first-class values (SPL2's multivalue mv* functions are gone). When a field contains an array — for example, from a values() aggregation or parsed JSON — use the native array functions:
# Join an array into a string (formerly mvjoin)
lynxdb query 'from main level=error | stats values(source) as sources by host | extend src_list = join(sources, ", ")'
# Deduplicate an array (formerly mvdedup)
lynxdb query 'from main | extend unique_tags = array_distinct(tags)'
# Concatenate values into one array (formerly mvappend)
lynxdb query 'from main | extend all_ids = array_concat([primary_id], [secondary_id])'
See the array section of the scalar functions reference for slice, flatten, filter, map, and more.
Null handling
LynxFlow distinguishes null (the field is present with an explicit null) from missing (the field was never extracted). SPL2's isnull/isnotnull are gone:
# Find events where a field is explicitly null
lynxdb query 'from main | where is_null(user_id) | stats count() as count by source'
# Find events that have a non-null field
lynxdb query 'from main | where exists(duration_ms) | stats avg(duration_ms)'
# Replace null/missing with a default
lynxdb query 'from main | extend region = region ?? "unknown" | stats count() as count by region'
is_missing(f) is true only when the field was never extracted at all.
Field extraction on local files
All extraction stages work in pipe mode:
# Extract from a local file
lynxdb query --file /var/log/syslog '
| parse regex r"(?P<process>\w+)\[(?P<pid>\d+)\]"
| stats count() as count by process
| sort -count
| head 10'
# Extract from piped input
kubectl logs deploy/api | lynxdb query '
| parse regex r"endpoint=(?P<ep>\S+) status=(?P<code>\d+) duration=(?P<dur>\d+)ms" into (dur as int)
| stats avg(dur) as avg_ms, p99(dur) as p99_ms by ep'
Structured log parsing with parse formats
For structured log formats, the unified parse stage has purpose-built named formats (formerly the 16 unpack_* commands) that are faster and more accurate than regex extraction:
| Format | Stage | Example input |
|---|---|---|
| JSON | parse json | {"level":"error","msg":"timeout"} |
| logfmt | parse logfmt | level=error msg="request failed" duration=245ms |
| Key=value | parse kv | host=web-01 status=200 duration=45ms |
| Syslog | parse syslog | <134>Jan 15 14:23:01 web-01 nginx: connection reset |
| Combined (access log) | parse combined | 10.0.1.5 - - [10/Oct/2025:13:55:36 -0700] "GET /api HTTP/1.1" 200 2326 "-" "curl/7.64" |
| CLF | parse clf | 127.0.0.1 - frank [10/Oct/2025:13:55:36 -0700] "GET /api HTTP/1.1" 200 2326 |
| Nginx error | parse nginx_error | 2026/02/14 14:52:01 [error] 12345#67: *890 message, client: 10.0.1.5 |
| CEF | parse cef | CEF:0|Vendor|Product|1.0|100|Alert|7|src=10.0.0.1 |
Additional named formats: docker, redis, apache_error, postgres, mysql_slow, haproxy, leef, w3c. Fallback chains try formats in order per row: parse first_of(json, logfmt).
# Parse logfmt and aggregate
cat app.log | lynxdb query '| parse logfmt | stats count() as count by level'
# Parse nginx access logs
lynxdb query --file access.log '| parse combined | where status >= 500 | stats count() as count by uri'
For JSON-specific workflows (object access, from_json, explode), see the Working with JSON Logs guide.
Next steps
- Working with JSON Logs -- parse json, object access, explode
- Search and filter logs -- filter before extracting fields
- Run aggregations -- aggregate over extracted fields
- parse reference -- full parse syntax, formats, and options
- extend reference -- full extend syntax
- Scalar functions reference -- complete function list