Extract Fields at Query Time

LynxDB follows a schema-on-read philosophy: you do not need to define a schema before ingesting data. JSON and text logs remain queryable without an upfront schema, but each ingest transport has its own endpoint contract. Fields from JSON events are indexed automatically. For unstructured text logs, use parse regex (formerly rex in SPL2) and extend (formerly eval) to extract and compute fields at query time.

How schema-on-read works

When you ingest data, LynxDB automatically discovers fields:

JSON events: All top-level keys become searchable fields with their types preserved.
Raw text: The full line is stored in the _raw field. You can extract structure from it at query time.

Check what fields LynxDB has discovered:

lynxdb fields

FIELD                     TYPE       COVERAGE   TOP VALUES
--------------------------------------------------------------------------------
_timestamp                datetime      100%
level                     string        100%    INFO(72%), ERROR(17%), WARN(11%)
status                    integer        50%    200(90%), 404(5%), 500(3%)
duration_ms               float          50%    min=0.1, max=30001.0, avg=145.3
source                    string        100%    nginx(50%), api-gw(37%), redis(13%)

See the lynxdb fields command reference for details. Inside a query, the describe stage gives the same view of the current stream — one row per field with type, coverage, and top values:

lynxdb query 'from main | parse json | describe'

Extract fields with `parse regex`

The parse regex stage extracts fields from a text field using named capture groups in a regular expression.

Basic extraction

Given raw log lines like:

2026-01-15 14:23:01 host=web-01 service=api duration=245ms status=200

Extract host, service, duration, and status:

lynxdb query 'from main "duration"
  | parse regex r"host=(?P<host>\S+) service=(?P<service>\S+) duration=(?P<duration>\d+)ms status=(?P<status>\d+)"
  | keep _time, host, service, duration, status'

Named capture group syntax

parse regex uses Go-style named capture groups: (?P<field_name>pattern).

Pattern	Matches
`(?P<ip>\d+\.\d+\.\d+\.\d+)`	An IPv4 address
`(?P<host>\S+)`	A non-whitespace token
`(?P<code>\d{3})`	A 3-digit status code
`(?P<path>[^ ]+)`	A path (no spaces)
`(?P<msg>.+)`	Everything to end of line

note

Patterns are raw strings (r"..."): backslashes are literal, so \d and \S need no double-escaping. A raw string cannot contain a literal " — match a quote character with \x22 instead.

Extract from Apache/Nginx access logs

For standard access-log formats, prefer the purpose-built parse combined (see below). The regex equivalent looks like this:

lynxdb query --file access.log '
  | parse regex r"(?P<client_ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] \x22(?P<method>\w+) (?P<uri>\S+) [^\x22]*\x22 (?P<status>\d+) (?P<bytes>\d+)"
  | stats count() as count by method, status
  | sort -count'

Extract from application logs

lynxdb query 'from main "connection refused"
  | parse regex r"host=(?P<host>\S+)"
  | stats count() as count by host
  | sort -count'

Extract from a non-default field

By default, parse operates on _raw. Use from <field> to extract from any field:

lynxdb query 'from main
  | parse regex r"user_id=(?P<uid>\d+)" from message
  | stats dc(uid) as unique_users'

Typed captures

into (...) coerces captures at extraction time — no separate conversion step needed:

lynxdb query 'from main
  | parse regex r"duration=(?P<dur>\d+)ms" into (dur as int)
  | stats avg(dur) as avg_ms'

Compute fields with `extend`

The extend stage (formerly eval in SPL2) creates new fields by evaluating expressions. Remember: = binds, == compares.

Create a computed field

lynxdb query 'from nginx
  | extend duration_sec = duration_ms / 1000
  | keep uri, duration_ms, duration_sec'

Conditional fields with if

lynxdb query 'from nginx
  | extend severity = if(status >= 500, "critical", if(status >= 400, "warning", "ok"))
  | stats count() as count by severity'

Conditional fields with case

A trailing odd argument is the default — no 1=1 sentinel needed:

lynxdb query 'from nginx
  | extend category = case(
      status >= 500, "5xx",
      status >= 400, "4xx",
      status >= 300, "3xx",
      status >= 200, "2xx",
      "other"
    )
  | stats count() as count by category'

String manipulation

# Convert to lowercase
lynxdb query 'from main | extend level_lower = lower(level) | stats count() as count by level_lower'

# Extract substring (0-based start; SPL2 substr was 1-based)
lynxdb query 'from main | extend short_path = substr(uri, 0, 20) | stats count() as count by short_path'

# String length
lynxdb query 'from main | extend msg_len = len(message) | where msg_len > 500 | keep _time, msg_len, message'

Type conversion

The SPL2 tonumber/tostring functions are gone — cast with the type name:

# Convert a string field to a number
lynxdb query 'from main | extend status_num = int(status) | where status_num >= 500'

# Convert a number to string for display
lynxdb query 'from main | extend status_str = string(status) | keep status_str, uri'

int() and float() return null on failure; the strict variants int!() and float!() raise a query error instead.

Coalesce (first non-null)

lynxdb query 'from main
  | extend display_time = coalesce(timestamp, `@timestamp`, _time)
  | keep display_time, message'

For a single fallback, the ?? operator is shorter: region ?? "unknown". Field names with special characters (like @timestamp) are quoted with backticks.

Time formatting

lynxdb query 'from main
  | extend human_time = strftime(_time, "%Y-%m-%d %H:%M:%S")
  | keep human_time, level, message'

See the scalar functions reference for the complete list of available functions.

Combine `parse regex` and `extend`

Extract typed values with parse regex ... into, then transform them with extend:

lynxdb query 'from main "request completed"
  | parse regex r"duration=(?P<dur>\d+)ms" into (dur as int)
  | extend is_slow = if(dur > 1000, "slow", "fast")
  | stats count() as count by is_slow'

Array operations

LynxFlow arrays are first-class values (SPL2's multivalue mv* functions are gone). When a field contains an array — for example, from a values() aggregation or parsed JSON — use the native array functions:

# Join an array into a string (formerly mvjoin)
lynxdb query 'from main level=error | stats values(source) as sources by host | extend src_list = join(sources, ", ")'

# Deduplicate an array (formerly mvdedup)
lynxdb query 'from main | extend unique_tags = array_distinct(tags)'

# Concatenate values into one array (formerly mvappend)
lynxdb query 'from main | extend all_ids = array_concat([primary_id], [secondary_id])'

See the array section of the scalar functions reference for slice, flatten, filter, map, and more.

Null handling

LynxFlow distinguishes null (the field is present with an explicit null) from missing (the field was never extracted). SPL2's isnull/isnotnull are gone:

# Find events where a field is explicitly null
lynxdb query 'from main | where is_null(user_id) | stats count() as count by source'

# Find events that have a non-null field
lynxdb query 'from main | where exists(duration_ms) | stats avg(duration_ms)'

# Replace null/missing with a default
lynxdb query 'from main | extend region = region ?? "unknown" | stats count() as count by region'

is_missing(f) is true only when the field was never extracted at all.

Field extraction on local files

All extraction stages work in pipe mode:

# Extract from a local file
lynxdb query --file /var/log/syslog '
  | parse regex r"(?P<process>\w+)\[(?P<pid>\d+)\]"
  | stats count() as count by process
  | sort -count
  | head 10'

# Extract from piped input
kubectl logs deploy/api | lynxdb query '
  | parse regex r"endpoint=(?P<ep>\S+) status=(?P<code>\d+) duration=(?P<dur>\d+)ms" into (dur as int)
  | stats avg(dur) as avg_ms, p99(dur) as p99_ms by ep'

Structured log parsing with parse formats

For structured log formats, the unified parse stage has purpose-built named formats (formerly the 16 unpack_* commands) that are faster and more accurate than regex extraction:

Format	Stage	Example input
JSON	`parse json`	`{"level":"error","msg":"timeout"}`
logfmt	`parse logfmt`	`level=error msg="request failed" duration=245ms`
Key=value	`parse kv`	`host=web-01 status=200 duration=45ms`
Syslog	`parse syslog`	`<134>Jan 15 14:23:01 web-01 nginx: connection reset`
Combined (access log)	`parse combined`	`10.0.1.5 - - [10/Oct/2025:13:55:36 -0700] "GET /api HTTP/1.1" 200 2326 "-" "curl/7.64"`
CLF	`parse clf`	`127.0.0.1 - frank [10/Oct/2025:13:55:36 -0700] "GET /api HTTP/1.1" 200 2326`
Nginx error	`parse nginx_error`	`2026/02/14 14:52:01 [error] 12345#67: *890 message, client: 10.0.1.5`
CEF	`parse cef`	`CEF:0\|Vendor\|Product\|1.0\|100\|Alert\|7\|src=10.0.0.1`

Additional named formats: docker, redis, apache_error, postgres, mysql_slow, haproxy, leef, w3c. Fallback chains try formats in order per row: parse first_of(json, logfmt).

# Parse logfmt and aggregate
cat app.log | lynxdb query '| parse logfmt | stats count() as count by level'

# Parse nginx access logs
lynxdb query --file access.log '| parse combined | where status >= 500 | stats count() as count by uri'

For JSON-specific workflows (object access, from_json, explode), see the Working with JSON Logs guide.

Next steps

Working with JSON Logs -- parse json, object access, explode
Search and filter logs -- filter before extracting fields
Run aggregations -- aggregate over extracted fields
parse reference -- full parse syntax, formats, and options
extend reference -- full extend syntax
Scalar functions reference -- complete function list

How schema-on-read works​

Extract fields with parse regex​

Basic extraction​

Named capture group syntax​

Extract from Apache/Nginx access logs​

Extract from application logs​

Extract from a non-default field​

Typed captures​

Compute fields with extend​

Create a computed field​

Conditional fields with if​

Conditional fields with case​

String manipulation​

Type conversion​

Coalesce (first non-null)​

Time formatting​

Combine parse regex and extend​

Array operations​

Null handling​

Field extraction on local files​

Structured log parsing with parse formats​

Next steps​