Static Analysis Tool for Wazuh Decoder XML Files

Wazuh decoder XML files define how raw log lines are parsed into structured security events. A misconfigured decoder – a missing <order> element, an orphaned parent reference, or a regex group mismatch – can silently drop critical fields from alerts, leaving blind spots in your SIEM pipeline. Manual code review catches some of these issues, but it does not scale across hundreds of decoder files shipped with Wazuh or maintained by your organization.

This post introduces wazuh-decoder-linter, an open-source static analysis tool that validates Wazuh decoder XML files automatically. It checks structure, regex/order consistency, element attributes, and cross-file parent-child chains. It optionally integrates with the Wazuh logtest API to verify decoders against a live manager instance. By the end of this guide, you will know how to install the tool, understand every validation rule it enforces, use it from the command line and Python code, and integrate it into CI/CD pipelines.

Understanding Wazuh Decoder XML Structure

Wazuh decoders are defined in XML files located under /var/ossec/etc/decoders/ (custom) or /var/ossec/ruleset/decoders/ (default). Each file contains one or more <decoder> elements. A typical decoder has two stages: a parent decoder that matches the program name, and a child decoder that extracts fields with a regex.

<decoder name="example">
  <program_name>^example</program_name>
</decoder>

<decoder name="example">
  <parent>example</parent>
  <regex>User '(\w+)' logged from '(\d+.\d+.\d+.\d+)'</regex>
  <order>user, srcip</order>
</decoder>

The Wazuh decoder schema supports 16 child elements inside <decoder>: parent, prematch, program_name, regex, order, fts, ftscomment, plugin_decoder, accumulate, type, json_null_field, json_array_structure, var, use_own_name, match, and description. Each element has specific constraints: <regex> requires a corresponding <order>, <plugin_decoder> must reference one of the five known plugins (JSON_Decoder, OSSECAlert_Decoder, PF_Decoder, SymantecWS_Decoder, SonicWall_Decoder), and <type> accepts only eight values (firewall, ids, web-log, syslog, squid, windows, host-information, ossec).
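
To make these constraints concrete, here is a minimal sketch of how a few of them could be checked with Python's standard library. It is not the linter's actual code: the check_decoder helper is a hypothetical name, and it uses xml.etree.ElementTree instead of lxml to stay dependency-free. The value sets are copied from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Value sets copied from the paragraph above.
KNOWN_PLUGINS = {"JSON_Decoder", "OSSECAlert_Decoder", "PF_Decoder",
                 "SymantecWS_Decoder", "SonicWall_Decoder"}
KNOWN_TYPES = {"firewall", "ids", "web-log", "syslog", "squid",
               "windows", "host-information", "ossec"}


def check_decoder(xml_text: str) -> list[str]:
    """Return readable problems for one <decoder> block (hypothetical helper)."""
    decoder = ET.fromstring(xml_text)
    name = decoder.get("name", "<unnamed>")
    problems = []

    # <regex> needs an <order> to map its capture groups to field names.
    if decoder.find("regex") is not None and decoder.find("order") is None:
        problems.append(f"{name}: <regex> without <order>")

    # <plugin_decoder> must name one of the known plugins.
    for plugin in decoder.findall("plugin_decoder"):
        value = (plugin.text or "").strip()
        if value not in KNOWN_PLUGINS:
            problems.append(f"{name}: unknown plugin_decoder {value!r}")

    # <type> must be one of the recognized decoder types.
    for dtype in decoder.findall("type"):
        value = (dtype.text or "").strip()
        if value not in KNOWN_TYPES:
            problems.append(f"{name}: unknown type {value!r}")

    return problems
```

Calling check_decoder on a block that has a <regex> but no <order> returns one message per problem, and an empty list means none of these three checks fired.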

Understanding this structure is essential for writing correct decoders.

Common Decoder Configuration Errors

Through analysis of real-world decoder files – including the 80+ files shipped with the Wazuh default ruleset – several recurring error patterns emerge:

Regex without order (and vice versa). The most frequent error. A <regex> element captures groups, but the <order> element that maps those groups to field names is missing, so the decoder silently discards the captured data. The reverse – an <order> with no <regex> – is also an error unless a <plugin_decoder> is present to supply the fields.

Capture group mismatches. A regex with 2 capturing groups paired with an <order> listing 3 fields. The third field is never populated. This is subtle because Wazuh does not raise an error at runtime – it simply leaves the field empty.

Orphaned parent references. A child decoder declares <parent>sshd-custom</parent>, but no decoder named sshd-custom exists in any loaded file. The child decoder never activates.

Invalid element attributes. The offset attribute on <prematch> only accepts after_regex and after_parent. Using after_prematch on a <prematch> element (valid only on <regex>) produces undefined behavior. Similarly, <regex type="osmatch"> is invalid because osmatch does not support capturing groups.

OS_Regex syntax violations. Wazuh’s default regex engine (osregex) does not support (?...) constructs, {n,m} quantifiers, or alternation (|) inside groups. Using these patterns causes silent matching failures.

Empty plugin_decoder elements. A <plugin_decoder></plugin_decoder> with no content is always an error – the element must specify which plugin to invoke.
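
Of the patterns above, the orphaned parent reference is the one that needs a view across files. The following sketch shows one way such a cross-file check might work; find_orphaned_parents is a hypothetical helper, not the tool's API. It assumes each input is the already well-formed text of a decoder file without an XML declaration.

```python
import xml.etree.ElementTree as ET


def find_orphaned_parents(xml_documents: list[str]) -> list[tuple[str, str]]:
    """Return (child_name, missing_parent) pairs across all given documents."""
    declared: set[str] = set()
    references: list[tuple[str, str]] = []

    for text in xml_documents:
        # Decoder files hold several top-level <decoder> elements, so wrap
        # them in a synthetic root to parse the text as a single document.
        root = ET.fromstring(f"<root>{text}</root>")
        for decoder in root.iter("decoder"):
            name = decoder.get("name", "")
            declared.add(name)
            parent = (decoder.findtext("parent") or "").strip()
            if parent:
                references.append((name, parent))

    # Resolve references only after every file has been scanned, so the
    # order in which files are read does not matter.
    return [(child, parent) for child, parent in references
            if parent not in declared]
```

Collecting all names first and resolving references afterwards is what lets a child in one file point at a parent defined in another.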

Architecture of wazuh-decoder-linter

The tool is built in Python (3.10+) with lxml for XML parsing and click for the CLI. The codebase follows a modular architecture:

wazuh_decoder_linter/
  cli.py           # Click CLI: argument parsing, output formatting
  constants.py     # All validation constants, enums, valid element sets
  linter.py        # Core engine: WazuhDecoderLinter class
  logtest.py       # Wazuh API logtest integration
  models.py        # Data models: LintResult, LintReport, DecoderMeta
  regex_utils.py   # Regex group counting and syntax validation

The core engine (WazuhDecoderLinter) uses a two-pass parsing strategy for resilience. It first attempts to parse the entire file as XML. If that fails – common when a single decoder block has malformed content – it falls back to extracting individual <decoder>...</decoder> blocks using depth-tracking and processes each independently. This means one broken decoder does not prevent linting of other valid decoders in the same file.
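
The two-pass idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the linter's implementation: the function names are hypothetical, the standard library's xml.etree.ElementTree stands in for lxml, and the depth tracker ignores edge cases such as self-closing <decoder/> tags.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening and closing <decoder> tags; group 1 is "/" for a closing tag.
_DECODER_TAG = re.compile(r"<(/?)decoder\b[^>]*>")


def extract_decoder_blocks(text: str) -> list[str]:
    """Slice out top-level <decoder>...</decoder> blocks by tracking depth."""
    blocks, depth, start = [], 0, 0
    for match in _DECODER_TAG.finditer(text):
        if match.group(1) == "":      # opening tag
            if depth == 0:
                start = match.start()
            depth += 1
        elif depth > 0:               # closing tag that matches an open one
            depth -= 1
            if depth == 0:
                blocks.append(text[start:match.end()])
    return blocks


def parse_decoders(text: str) -> tuple[list[ET.Element], list[str]]:
    """Pass 1: parse the whole file. Pass 2: parse each block on its own."""
    try:
        root = ET.fromstring(f"<root>{text}</root>")
        return list(root.iter("decoder")), []
    except ET.ParseError:
        pass  # fall through to per-block parsing

    parsed, errors = [], []
    for block in extract_decoder_blocks(text):
        try:
            parsed.append(ET.fromstring(block))
        except ET.ParseError as exc:
            errors.append(f"unparseable decoder block: {exc}")
    return parsed, errors
```

With this shape, a file containing one malformed decoder still yields parsed elements for its neighbors plus one error entry for the broken block.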

XML sanitization handles Wazuh-specific patterns that break standard XML parsing: unescaped & characters, \< OS_Regex word boundaries, and bare < characters that are not XML tags. The sanitizer preserves valid XML entities (such as &amp; and numeric character references like &#123;) while escaping everything else.
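
A sanitizer along these lines could be approximated with two regular expressions. The sketch below is an assumption about how such escaping might work, not the tool's actual rules: it treats only the five predefined XML entities and numeric character references as valid, and it uses a simple heuristic (a < not followed by a letter, underscore, /, ! or ?) to decide that a < is not a tag.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does NOT start a predefined entity (&amp; &lt; &gt;
# &quot; &apos;), a decimal reference (&#123;) or a hex reference (&#x7B;).
_BARE_AMP = re.compile(
    r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

# A "<" that does not open a tag, closing tag, comment, or processing
# instruction. This is a heuristic and can misjudge text such as "a <b".
_BARE_LT = re.compile(r"<(?![A-Za-z_/!?])")


def sanitize(text: str) -> str:
    """Escape bare & and < so the text becomes parseable XML."""
    text = _BARE_AMP.sub("&amp;", text)
    return _BARE_LT.sub("&lt;", text)


raw = '<decoder name="d"><regex>a & b \\<\\d+</regex></decoder>'
element = ET.fromstring(sanitize(raw))
print(element.findtext("regex"))  # the parser restores the original characters
```

Because the XML parser decodes the inserted escapes again, the regex text handed to later checks is identical to what the author wrote.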

The Validation Rules

The linter maps each validation rule to a severity level (ERROR, WARNING, or INFO). The two tables below list the nine ERROR-level and nine WARNING-level rules:

Errors (must fix)

Rule	Description
Name attribute	Every `<decoder>` must have a `name` attribute
Empty parent	`<parent>` elements must not be empty
Regex/order consistency	`<regex>` requires `<order>` and vice versa (unless `<plugin_decoder>` is present)
Order field format	Field names must match known static fields or the dynamic field pattern `^\w[\w.\- ]*$`
Invalid offsets	`offset` attributes must use valid values for each element type
Invalid regex types	`type` attribute must be osregex, pcre2, or osmatch
Empty plugin_decoder	`<plugin_decoder>` must not be empty
Invalid JSON fields	`<json_null_field>` must be “string” or “discard”; `<json_array_structure>` must be “array” or “csv”
use_own_name value	Must be “true”
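
As an illustration of the order field format rule, the sketch below splits an <order> value on commas and tests each entry against the dynamic field pattern quoted in the table. The invalid_order_fields helper is a hypothetical name. The sketch skips the separate list of static fields, on the assumption that static names such as user, srcip and id consist only of word characters and therefore satisfy the same pattern.

```python
import re

# Dynamic field pattern quoted from the table above.
FIELD_PATTERN = re.compile(r"^\w[\w.\- ]*$")


def invalid_order_fields(order_text: str) -> list[str]:
    """Return the comma-separated <order> entries that fail the pattern."""
    fields = [field.strip() for field in order_text.split(",")]
    return [field for field in fields if not FIELD_PATTERN.match(field)]
```

An empty entry caused by a stray comma fails the pattern as well, because the pattern requires at least one leading word character.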

Warnings (should fix)

Rule	Description
Unknown elements	XML elements not in the 16 known decoder child elements
Group count mismatch	Regex capture groups do not match order field count
Decoder type	`<type>` value not in the 8 recognized types
Unknown plugin	Plugin decoder name not in the 5 known plugins
use_own_name without parent	`<use_own_name>` requires `<parent>`
Accumulate without ID	`<accumulate>` requires “id” in `<order>` fields
FTS field names	FTS fields must match known static fields plus location and name
OS_Regex syntax	Flags unsupported constructs in osregex patterns
Parent chain	Cross-file: parent decoder names must exist in scanned files
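
Two of the warning rules are purely structural and easy to picture in code. The sketch below, with the hypothetical helper name structural_warnings, checks the use_own_name and accumulate rules for a single well-formed <decoder> block using the standard library. The message wording is made up for illustration.

```python
import xml.etree.ElementTree as ET


def structural_warnings(xml_text: str) -> list[str]:
    """Check two warning rules from the table above for one <decoder> block."""
    decoder = ET.fromstring(xml_text)
    name = decoder.get("name", "<unnamed>")
    order = [f.strip() for f in (decoder.findtext("order") or "").split(",")]
    warnings = []

    # <use_own_name> only makes sense on a child decoder.
    if decoder.find("use_own_name") is not None and decoder.find("parent") is None:
        warnings.append(f"{name}: <use_own_name> requires <parent>")

    # <accumulate> requires an "id" field in <order>.
    if decoder.find("accumulate") is not None and "id" not in order:
        warnings.append(f"{name}: <accumulate> requires 'id' in <order>")

    return warnings
```

Note the explicit "is not None" comparisons: an ElementTree element without children is falsy, so a bare truth test would miss empty elements such as <accumulate />.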

The regex utilities module deserves special attention. It distinguishes between Wazuh OS_Regex and PCRE2 when counting capture groups. In OS_Regex, \( is a literal parenthesis (not a group), while in PCRE2, non-capturing groups ((?:...)), lookaheads ((?=...)), and lookbehinds ((?<=...)) are excluded from the count. This precision prevents false positives in group count validation.
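
The distinction can be sketched with two small counters. These are simplified illustrations, not the contents of regex_utils.py: the function names are hypothetical, the PCRE2 counter additionally treats named groups as capturing (an assumption not stated above), and its character-class handling ignores corner cases such as a ] placed first inside a class.

```python
import re


def count_groups_osregex(pattern: str) -> int:
    """OS_Regex: an escaped parenthesis is literal; any other ( opens a group."""
    count, i = 0, 0
    while i < len(pattern):
        if pattern[i] == "\\":
            i += 2                      # skip the escaped character
            continue
        if pattern[i] == "(":
            count += 1
        i += 1
    return count


def count_groups_pcre2(pattern: str) -> int:
    """PCRE2: skip (?:...), lookarounds, escapes, and parens inside [...]."""
    count, i, in_class = 0, 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2                      # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"        # stay in the class until it closes
        elif ch == "[":
            in_class = True
        elif ch == "(":
            rest = pattern[i + 1:]
            if not rest.startswith("?"):
                count += 1              # plain capturing group
            elif re.match(r"\?(P?<[A-Za-z_]|')", rest):
                count += 1              # named group, assumed to capture
        i += 1
    return count
```

Applied to the child decoder regex from the earlier example, count_groups_osregex finds two groups, which matches the two fields in its <order>.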

Installation and CLI Usage

Install directly from GitHub:

# Core installation
pip install git+https://github.com/pyToshka/wazuh-linter.git

# With API testing support
pip install "wazuh-decoder-linter[api] @ git+https://github.com/pyToshka/wazuh-linter.git"

Static Analysis

Lint a single file or entire directory:

# Lint a single file
wazuh-decoder-lint /var/ossec/etc/decoders/local_decoder.xml

# Lint all decoder files in a directory
wazuh-decoder-lint /var/ossec/etc/decoders/

# Strict mode: treat warnings as errors
wazuh-decoder-lint --strict /var/ossec/etc/decoders/

# Show INFO-level messages
wazuh-decoder-lint --show-info /var/ossec/etc/decoders/

# JSON output for CI/CD integration
wazuh-decoder-lint --format json /var/ossec/etc/decoders/

API Testing Against a Live Manager

The tool can verify decoders against a running Wazuh manager using the logtest API:

# Inline test log
wazuh-decoder-lint /var/ossec/etc/decoders/ \
  --test-api \
  --api-url https://wazuh-manager:55000 \
  --api-user wazuh-wui \
  --test-log "sshd:Oct 15 21:07:00 myhost sshd[1234]: Failed password for root from 10.0.0.1"

# YAML test file
wazuh-decoder-lint /var/ossec/etc/decoders/ \
  --test-api \
  --test-file test_cases.yml

The YAML test file format:

tests:
  - event: 'Oct 15 21:07:00 myhost sshd[1234]: Failed password for root from 10.0.0.1'
    decoder: sshd
    description: 'SSH failed password'
  - event: '192.168.1.1 - - [15/Oct/2024:21:07:00 +0000] "GET / HTTP/1.1" 200 1234'
    decoder: web-accesslog
    description: 'Apache access log entry'

For API password security, use the WAZUH_API_PASSWORD environment variable instead of the --api-pass flag to avoid exposing credentials in process listings.

Exit Codes

Code	Meaning
0	All checks passed
1	Errors found (or warnings in strict mode, or API test failures)
2	CLI usage error
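
If you call the CLI from a script, these exit codes are the contract to rely on. The wrapper below is a minimal sketch: run_linter and describe_exit are hypothetical helper names, and only the wazuh-decoder-lint command, the --strict flag, and the code meanings come from this post. It assumes the CLI is installed and on PATH; otherwise subprocess.run raises FileNotFoundError.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "all checks passed",
    1: "errors found (or warnings in strict mode, or API test failures)",
    2: "CLI usage error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_linter(paths: list[str], strict: bool = False) -> tuple[int, str]:
    """Run wazuh-decoder-lint on the given paths and describe the result."""
    command = ["wazuh-decoder-lint"]
    if strict:
        command.append("--strict")
    command.extend(paths)
    # check=False: a non-zero exit is an expected outcome, not an exception.
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.returncode, describe_exit(completed.returncode)
```

Keeping the mapping separate from the subprocess call makes the interpretation easy to unit-test without a Wazuh installation.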

Programmatic Python API

The tool exposes a clean Python API for integration into custom tooling:

import sys

from wazuh_decoder_linter import WazuhDecoderLinter

linter = WazuhDecoderLinter()
report = linter.lint_paths(["path/to/decoders/"])

for result in report.results:
    print(f"[{result.severity}] {result.file}:{result.line} -- {result.message}")

if report.has_errors:
    print(f"Found {len(report.errors)} error(s)")

# Strict mode: treat warnings as failures
if report.has_failures(strict=True):
    sys.exit(1)

For API testing:

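import os

from wazuh_decoder_linter import WazuhLogtest

with WazuhLogtest(
    api_url="https://wazuh:55000",
    user="wazuh-wui",
    # Read the password from the environment instead of hardcoding it
    password=os.environ["WAZUH_API_PASSWORD"],
    # Disables TLS verification; only for lab setups with self-signed certificates
    verify_ssl=False,
) as tester:
    results = tester.test_batch([
        {
            "event": "Failed password for root from 10.0.0.1",
            "decoder": "sshd",
            # Mirrors the description key of the YAML test format
            "description": "SSH failed password",
        },
    ])
    for result in results:
        print(f"[{result['status']}] {result['description']}")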

Integrating with CI/CD Pipelines

GitHub Actions

name: Lint Wazuh Decoders
on:
  push:
    paths:
      - 'decoders/**'
  pull_request:
    paths:
      - 'decoders/**'

jobs:
  lint-decoders:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install wazuh-decoder-linter
        run: pip install "wazuh-decoder-linter[api] @ git+https://github.com/pyToshka/wazuh-linter.git"

      - name: Lint decoders
        run: wazuh-decoder-lint --strict --format json decoders/ > lint-results.json

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lint-results
          path: lint-results.json

Pre-commit Hook

The repository includes a .pre-commit-config.yaml that enforces code quality with black, isort, flake8, bandit, and mypy. You can add decoder linting as a local pre-commit hook:

repos:
  - repo: local
    hooks:
      - id: wazuh-decoder-lint
        name: Wazuh Decoder Lint
        entry: wazuh-decoder-lint --strict
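        language: python
        # A local hook with language: python gets its own virtualenv,
        # so the linter must be listed as a dependency to be installed there
        additional_dependencies:
          - 'wazuh-decoder-linter @ git+https://github.com/pyToshka/wazuh-linter.git'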
        files: '\.xml$'
        types: [file]

The JSON output format integrates with any CI/CD system that can parse structured output. The exit code convention (0 = pass, 1 = fail, 2 = usage error) follows standard Unix conventions for seamless pipeline integration.

For a similar approach to automated security validation in container environments, see Boosting Container Image Security Using Wazuh and Trivy.

Testing Against Real-World Decoders

The project includes a comprehensive test suite: 80+ valid decoder XML files from the Wazuh default ruleset, 80+ intentionally broken copies for error testing, and 90+ parametrized integration test cases covering decoders from SSH, Apache, Cisco, Fortinet, Snort, auditd, Windows Security, AWS, Docker, Kubernetes, and dozens of other sources.

Sample text output:

[ERROR] broken_decoder.xml:14 -- Empty <plugin_decoder> in decoder 'json-msgraph'
[WARNING] decoder.xml:58 -- Decoder 'sshd-success': regex (osregex) has 2 capture
  group(s) but order has 3 field(s)

============================================================
  Wazuh Logtest API Results
============================================================
  [PASS] SSH failed password
  [FAIL] Apache access log
         expected: apache, got: web-accesslog

  API tests: 1 passed, 1 failed, 0 errors

Summary: 1 error(s), 1 warning(s)

The integration test suite spins up a full Wazuh stack (manager, indexer, dashboard) via Docker Compose and runs all 90+ test cases against the live logtest API. This validates that the linter’s static analysis results align with actual Wazuh runtime behavior.

Conclusion

Static analysis for Wazuh decoder XML files closes the gap between writing decoders and deploying them with confidence. The wazuh-decoder-linter tool catches misconfigurations – missing order elements, regex group mismatches, orphaned parent chains, invalid attributes – before they reach production and cause silent data loss in your SIEM pipeline.

The tool is open source under the BSD 3-Clause license, supports Python 3.10+, and is available at github.com/pyToshka/wazuh-linter. Contributions to validation rules, test cases, and documentation are welcome.

Boosting Container Image Security Using Wazuh and Trivy - Automated security validation with Wazuh
RAG for Wazuh Documentation: Part 1 - Building retrieval systems over Wazuh knowledge base
Wazuh LLM: Fine-Tuned Llama 3.1 for Security Analysis - AI model for Wazuh security event analysis