Wazuh decoder XML files define how raw log lines are parsed into structured security events. A misconfigured decoder – a missing <order> element, an orphaned parent reference, or a regex group mismatch – can silently drop critical fields from alerts, leaving blind spots in your SIEM pipeline. Manual code review catches some of these issues, but it does not scale across hundreds of decoder files shipped with Wazuh or maintained by your organization.
This post introduces wazuh-decoder-linter, an open-source static analysis tool that validates Wazuh decoder XML files automatically. It checks structure, regex/order consistency, element attributes, and cross-file parent-child chains. It optionally integrates with the Wazuh logtest API to verify decoders against a live manager instance. By the end of this guide, you will know how to install the tool, understand every validation rule it enforces, use it from the command line and Python code, and integrate it into CI/CD pipelines.
Understanding Wazuh Decoder XML Structure
Wazuh decoders are defined in XML files located under /var/ossec/etc/decoders/ (custom) or /var/ossec/ruleset/decoders/ (default). Each file contains one or more <decoder> elements. A typical decoder has two stages: a parent decoder that matches the program name, and a child decoder that extracts fields with a regex.
<decoder name="example">
  <program_name>^example</program_name>
</decoder>
<decoder name="example">
  <parent>example</parent>
  <regex>User '(\w+)' logged from '(\d+.\d+.\d+.\d+)'</regex>
  <order>user, srcip</order>
</decoder>
The Wazuh decoder schema supports 16 child elements inside <decoder>: parent, prematch, program_name, regex, order, fts, ftscomment, plugin_decoder, accumulate, type, json_null_field, json_array_structure, var, use_own_name, match, and description. Each element has specific constraints: <regex> requires a corresponding <order>, <plugin_decoder> must reference one of the five known plugins (JSON_Decoder, OSSECAlert_Decoder, PF_Decoder, SymantecWS_Decoder, SonicWall_Decoder), and <type> accepts only eight values (firewall, ids, web-log, syslog, squid, windows, host-information, ossec).
Understanding this structure is essential for writing correct decoders.
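As a rough illustration of how these constraints translate into checks, here is a minimal sketch that uses only the standard library. The element, plugin, and type lists are copied from the paragraph above; the function name `check_decoder` and the message strings are hypothetical and are not taken from the linter's source. It assumes the decoder block is already well-formed XML.

```python
import xml.etree.ElementTree as ET

# Sets copied from the lists in the text above.
KNOWN_CHILDREN = {
    "parent", "prematch", "program_name", "regex", "order", "fts",
    "ftscomment", "plugin_decoder", "accumulate", "type", "json_null_field",
    "json_array_structure", "var", "use_own_name", "match", "description",
}
KNOWN_PLUGINS = {
    "JSON_Decoder", "OSSECAlert_Decoder", "PF_Decoder",
    "SymantecWS_Decoder", "SonicWall_Decoder",
}
KNOWN_TYPES = {
    "firewall", "ids", "web-log", "syslog", "squid",
    "windows", "host-information", "ossec",
}


def check_decoder(xml_text: str) -> list[str]:
    """Return a list of problems found in one well-formed <decoder> block."""
    decoder = ET.fromstring(xml_text)
    problems = []
    for child in decoder:
        if child.tag not in KNOWN_CHILDREN:
            problems.append(f"unknown element <{child.tag}>")
    plugin = decoder.find("plugin_decoder")
    if plugin is not None and (plugin.text or "").strip() not in KNOWN_PLUGINS:
        problems.append("unknown or empty <plugin_decoder>")
    dtype = decoder.find("type")
    if dtype is not None and (dtype.text or "").strip() not in KNOWN_TYPES:
        problems.append("unrecognized <type>")
    if decoder.find("regex") is not None and decoder.find("order") is None:
        problems.append("<regex> without <order>")
    return problems
```

Real decoder files often contain characters that a strict XML parser rejects, so a production check would need to sanitize the text first.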
Common Decoder Configuration Errors
Through analysis of real-world decoder files – including the 80+ files shipped with the Wazuh default ruleset – several recurring error patterns emerge:
Regex without order (and vice versa). The most frequent error: a <regex> element captures groups, but the <order> element that maps those groups to field names is missing, so the decoder silently discards the captured data. The reverse, an <order> without a <regex>, is also an error unless a <plugin_decoder> is present to supply the fields.
Capture group mismatches. A regex with 2 capturing groups paired with an <order> listing 3 fields. The third field is never populated. This is subtle because Wazuh does not raise an error at runtime – it simply leaves the field empty.
Orphaned parent references. A child decoder declares <parent>sshd-custom</parent>, but no decoder named sshd-custom exists in any loaded file. The child decoder never activates.
Invalid element attributes. The offset attribute on <prematch> only accepts after_regex and after_parent. Using after_prematch on a <prematch> element (valid only on <regex>) produces undefined behavior. Similarly, <regex type="osmatch"> is invalid because osmatch does not support capturing groups.
OS_Regex syntax violations. Wazuh’s default regex engine (osregex) does not support (?...) constructs, {n,m} quantifiers, or alternation (|) inside groups. Using these patterns causes silent matching failures.
Empty plugin_decoder elements. A <plugin_decoder></plugin_decoder> with no content is always an error – the element must specify which plugin to invoke.
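Of the patterns above, the orphaned parent reference is the one that needs information from more than one file. The following is a minimal sketch of that cross-file check, assuming each file's content is already well-formed once wrapped in a synthetic root element (and carries no XML declaration). The function name `find_orphaned_parents` is hypothetical and is not part of the tool's API.

```python
import xml.etree.ElementTree as ET


def find_orphaned_parents(xml_documents: list[str]) -> list[tuple[str, str]]:
    """Return (decoder_name, missing_parent) pairs across all documents."""
    decoders = []
    for text in xml_documents:
        # Decoder files hold several top-level <decoder> elements, so wrap
        # them in a synthetic root to obtain a single XML document.
        root = ET.fromstring(f"<root>{text}</root>")
        decoders.extend(root.iter("decoder"))

    # Collect every declared name first, so that a parent defined in a
    # later file still counts as present.
    names = {d.get("name") for d in decoders}

    orphans = []
    for d in decoders:
        parent = d.findtext("parent")
        if parent is not None and parent.strip() not in names:
            orphans.append((d.get("name"), parent.strip()))
    return orphans
```

Collecting all names before checking any reference is what makes the check independent of file order.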
Architecture of wazuh-decoder-linter
The tool is built in Python (3.10+) with lxml for XML parsing and click for the CLI. The codebase follows a modular architecture:
wazuh_decoder_linter/
    cli.py          # Click CLI: argument parsing, output formatting
    constants.py    # All validation constants, enums, valid element sets
    linter.py       # Core engine: WazuhDecoderLinter class
    logtest.py      # Wazuh API logtest integration
    models.py       # Data models: LintResult, LintReport, DecoderMeta
    regex_utils.py  # Regex group counting and syntax validation
The core engine (WazuhDecoderLinter) uses a two-pass parsing strategy for resilience. It first attempts to parse the entire file as XML. If that fails – common when a single decoder block has malformed content – it falls back to extracting individual <decoder>...</decoder> blocks using depth-tracking and processes each independently. This means one broken decoder does not prevent linting of other valid decoders in the same file.
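The two-pass idea can be sketched as follows. This is an illustration of the strategy described above, not the linter's actual code: the names `extract_decoder_blocks` and `parse_resilient` are hypothetical, the standard library parser stands in for lxml, and the tag matching is simplified (it ignores XML comments, for example).

```python
import re
import xml.etree.ElementTree as ET

# Matches opening and closing <decoder> tags only.
_DECODER_TAG = re.compile(r"<(/?)decoder\b[^>]*>")


def extract_decoder_blocks(text: str) -> list[str]:
    """Return each top-level <decoder>...</decoder> span, tracking depth."""
    blocks, depth, start = [], 0, 0
    for match in _DECODER_TAG.finditer(text):
        if match.group(1) != "/":          # opening tag
            if depth == 0:
                start = match.start()
            depth += 1
        elif depth > 0:                    # closing tag
            depth -= 1
            if depth == 0:
                blocks.append(text[start:match.end()])
    return blocks


def parse_resilient(text: str) -> list[ET.Element]:
    """First pass: parse the whole file. Second pass: parse block by block."""
    try:
        return list(ET.fromstring(f"<root>{text}</root>").iter("decoder"))
    except ET.ParseError:
        parsed = []
        for block in extract_decoder_blocks(text):
            try:
                parsed.append(ET.fromstring(block))
            except ET.ParseError:
                pass  # a real linter would report this block as an error
        return parsed
```

With this approach a single malformed block costs only that block, rather than the whole file.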
XML sanitization handles Wazuh-specific patterns that break standard XML parsing: unescaped & characters, \< OS_Regex word boundaries, and bare < characters that are not XML tags. The sanitizer preserves valid XML entities (named entities such as &amp; and numeric character references such as &#123;) while escaping everything else.
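A minimal sketch of the entity-preserving part of such a sanitizer is shown below. The function name `sanitize` and the exact replacement strings are assumptions made for illustration; the tool's real sanitizer may behave differently, and bare `<` characters are left out here for brevity.

```python
import re

# An '&' that does not start a named, decimal, or hexadecimal entity.
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def sanitize(text: str) -> str:
    """Escape bare ampersands and the OS_Regex '\\<' token for XML parsing."""
    # Existing entities such as &amp; or &#123; are left untouched.
    text = _BARE_AMP.sub("&amp;", text)
    # Assumption: keep the backslash and escape only the '<' that follows it.
    return text.replace("\\<", "\\&lt;")
```

The negative lookahead is what keeps the function idempotent: running it twice does not turn `&amp;` into `&amp;amp;`.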
Validation Rules
The linter maps each validation rule to a severity level (ERROR, WARNING, or INFO). The tables below list 18 checks: nine errors and nine warnings.
Errors (must fix)
| Rule | Description |
|---|---|
| Name attribute | Every <decoder> must have a name attribute |
| Empty parent | <parent> elements must not be empty |
| Regex/order consistency | <regex> requires <order> and vice versa (unless <plugin_decoder> is present) |
| Order field format | Field names must match known static fields or the dynamic field pattern ^\w[\w.\- ]*$ |
| Invalid offsets | offset attributes must use valid values for each element type |
| Invalid regex types | type attribute must be osregex, pcre2, or osmatch |
| Empty plugin_decoder | <plugin_decoder> must not be empty |
| Invalid JSON fields | <json_null_field> must be “string” or “discard”; <json_array_structure> must be “array” or “csv” |
| use_own_name value | Must be “true” |
Warnings (should fix)
| Rule | Description |
|---|---|
| Unknown elements | XML elements not in the 16 known decoder child elements |
| Group count mismatch | Regex capture groups do not match order field count |
| Decoder type | <type> value not in the 8 recognized types |
| Unknown plugin | Plugin decoder name not in the 5 known plugins |
| use_own_name without parent | <use_own_name> requires <parent> |
| Accumulate without ID | <accumulate> requires “id” in <order> fields |
| FTS field names | FTS fields must match known static fields plus location and name |
| OS_Regex syntax | Flags unsupported constructs in osregex patterns |
| Parent chain | Cross-file: parent decoder names must exist in scanned files |
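A few of the rules in these tables can be sketched as a single function that returns (severity, message) pairs. This is an illustration under stated assumptions, not the linter's implementation: the function name `check_rules` and the message wording are hypothetical, and the offset sets cover only the <prematch> and <regex> elements discussed earlier in this post.

```python
import xml.etree.ElementTree as ET

# Allowed offset values per element, as described earlier in the post.
VALID_OFFSETS = {
    "prematch": {"after_regex", "after_parent"},
    "regex": {"after_regex", "after_parent", "after_prematch"},
}
# Elements whose text must be one of a fixed set of values.
VALID_VALUES = {
    "json_null_field": {"string", "discard"},
    "json_array_structure": {"array", "csv"},
    "use_own_name": {"true"},
}


def check_rules(decoder: ET.Element) -> list[tuple[str, str]]:
    """Apply a handful of the tabulated rules to one parsed <decoder>."""
    findings = []
    for child in decoder:
        offset = child.get("offset")
        allowed = VALID_OFFSETS.get(child.tag)
        if offset is not None and allowed is not None and offset not in allowed:
            findings.append(("ERROR", f"invalid offset '{offset}' on <{child.tag}>"))
        values = VALID_VALUES.get(child.tag)
        if values is not None and (child.text or "").strip() not in values:
            findings.append(("ERROR", f"invalid value for <{child.tag}>"))

    if decoder.find("use_own_name") is not None and decoder.find("parent") is None:
        findings.append(("WARNING", "<use_own_name> requires <parent>"))

    order = [f.strip() for f in (decoder.findtext("order") or "").split(",")]
    if decoder.find("accumulate") is not None and "id" not in order:
        findings.append(("WARNING", "<accumulate> requires 'id' in <order>"))
    return findings
```

Keeping the allowed values in lookup tables means a new rule of the same shape is a data change rather than a code change.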
The regex utilities module deserves special attention. It distinguishes between Wazuh OS_Regex and PCRE2 when counting capture groups. In OS_Regex, \( is a literal parenthesis (not a group), while in PCRE2, non-capturing groups ((?:...)), lookaheads ((?=...)), and lookbehinds ((?<=...)) are excluded from the count. This precision prevents false positives in group count validation.
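The counting logic described above might look roughly like this. It is a simplified sketch, not the code in regex_utils.py: the function name `count_capture_groups` is hypothetical, and edge cases such as a `]` placed first inside a PCRE2 character class are ignored.

```python
def count_capture_groups(pattern: str, regex_type: str = "osregex") -> int:
    """Count capturing groups in an OS_Regex or PCRE2 pattern."""
    count, i, in_class = 0, 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \( or \\
            continue
        if regex_type == "pcre2":
            # Parentheses inside a character class such as [()] are literals.
            if in_class:
                in_class = ch != "]"
            elif ch == "[":
                in_class = True
        if ch == "(" and not in_class:
            rest = pattern[i + 1:]
            if regex_type == "pcre2" and rest.startswith("?"):
                # Named groups capture; (?:...), lookaheads and
                # lookbehinds do not.
                named = rest.startswith(("?P<", "?'")) or (
                    rest.startswith("?<")
                    and not rest.startswith(("?<=", "?<!"))
                )
                if named:
                    count += 1
            else:
                count += 1
        i += 1
    return count
```

Comparing this count with the number of comma-separated names in <order> gives the group count mismatch check from the warnings table.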
Installation and CLI Usage
Install directly from GitHub:
# Core installation
pip install git+https://github.com/pyToshka/wazuh-linter.git
# With API testing support
pip install "wazuh-decoder-linter[api] @ git+https://github.com/pyToshka/wazuh-linter.git"
Static Analysis
Lint a single file or entire directory:
# Lint a single file
wazuh-decoder-lint /var/ossec/etc/decoders/local_decoder.xml
# Lint all decoder files in a directory
wazuh-decoder-lint /var/ossec/etc/decoders/
# Strict mode: treat warnings as errors
wazuh-decoder-lint --strict /var/ossec/etc/decoders/
# Show INFO-level messages
wazuh-decoder-lint --show-info /var/ossec/etc/decoders/
# JSON output for CI/CD integration
wazuh-decoder-lint --format json /var/ossec/etc/decoders/
API Testing Against a Live Manager
The tool can verify decoders against a running Wazuh manager using the logtest API:
# Inline test log
wazuh-decoder-lint /var/ossec/etc/decoders/ \
  --test-api \
  --api-url https://wazuh-manager:55000 \
  --api-user wazuh-wui \
  --test-log "sshd:Oct 15 21:07:00 myhost sshd[1234]: Failed password for root from 10.0.0.1"
# YAML test file
wazuh-decoder-lint /var/ossec/etc/decoders/ \
  --test-api \
  --test-file test_cases.yml
The YAML test file format:
tests:
  - event: 'Oct 15 21:07:00 myhost sshd[1234]: Failed password for root from 10.0.0.1'
    decoder: sshd
    description: 'SSH failed password'
  - event: '192.168.1.1 - - [15/Oct/2024:21:07:00 +0000] "GET / HTTP/1.1" 200 1234'
    decoder: web-accesslog
    description: 'Apache access log entry'
For API password security, use the WAZUH_API_PASSWORD environment variable instead of the --api-pass flag to avoid exposing credentials in process listings.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | All checks passed |
| 1 | Errors found (or warnings in strict mode, or API test failures) |
| 2 | CLI usage error |
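In a wrapper script, these exit codes can be turned into readable outcomes. The sketch below assumes the `wazuh-decoder-lint` command is installed and on the PATH; the helper names `describe_exit` and `run_linter` are hypothetical and are not part of the tool.

```python
import subprocess

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "passed",
    1: "lint or API test failures",
    2: "CLI usage error",
}


def describe_exit(code: int) -> str:
    """Translate a linter exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_linter(path: str, strict: bool = False) -> str:
    """Run the CLI on a path and describe the outcome."""
    cmd = ["wazuh-decoder-lint"]
    if strict:
        cmd.append("--strict")
    cmd.append(path)
    # check=False: a non-zero exit is an expected result, not an exception.
    completed = subprocess.run(cmd, check=False)
    return describe_exit(completed.returncode)
```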
Programmatic Python API
The tool exposes a clean Python API for integration into custom tooling:
import sys

from wazuh_decoder_linter import WazuhDecoderLinter

linter = WazuhDecoderLinter()
report = linter.lint_paths(["path/to/decoders/"])

# Print every finding with its location
for result in report.results:
    print(f"[{result.severity}] {result.file}:{result.line} -- {result.message}")

if report.has_errors:
    print(f"Found {len(report.errors)} error(s)")

# Strict mode: treat warnings as failures
if report.has_failures(strict=True):
    sys.exit(1)
For API testing:
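import os

from wazuh_decoder_linter import WazuhLogtest

with WazuhLogtest(
    api_url="https://wazuh:55000",
    user="wazuh-wui",
    # Read the password from the environment rather than hardcoding it
    password=os.environ["WAZUH_API_PASSWORD"],
    # Disable TLS verification only for lab managers with self-signed certificates
    verify_ssl=False,
) as tester:
    results = tester.test_batch([
        {"event": "Failed password for root from 10.0.0.1", "decoder": "sshd"},
    ])
    for result in results:
        print(f"[{result['status']}] {result['description']}")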
Integrating with CI/CD Pipelines
GitHub Actions
name: Lint Wazuh Decoders
on:
  push:
    paths:
      - 'decoders/**'
  pull_request:
    paths:
      - 'decoders/**'
jobs:
  lint-decoders:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install wazuh-decoder-linter
        run: pip install "wazuh-decoder-linter[api] @ git+https://github.com/pyToshka/wazuh-linter.git"
      - name: Lint decoders
        run: wazuh-decoder-lint --strict --format json decoders/ > lint-results.json
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lint-results
          path: lint-results.json
Pre-commit Hook
The repository includes a .pre-commit-config.yaml that enforces code quality with black, isort, flake8, bandit, and mypy. You can add decoder linting as a local pre-commit hook:
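repos:
  - repo: local
    hooks:
      - id: wazuh-decoder-lint
        name: Wazuh Decoder Lint
        entry: wazuh-decoder-lint --strict
        language: python
        # Install the linter into the hook's isolated environment
        additional_dependencies:
          - "wazuh-decoder-linter @ git+https://github.com/pyToshka/wazuh-linter.git"
        files: '\.xml$'
        types: [file]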
The JSON output format integrates with any CI/CD system that can parse structured output. The exit code convention (0 = pass, 1 = fail, 2 = usage error) follows standard Unix conventions for seamless pipeline integration.
For a similar approach to automated security validation in container environments, see Boosting Container Image Security Using Wazuh and Trivy.
Testing Against Real-World Decoders
The project includes a comprehensive test suite: 80+ valid decoder XML files from the Wazuh default ruleset, 80+ intentionally broken copies for error testing, and 90+ parametrized integration test cases covering decoders from SSH, Apache, Cisco, Fortinet, Snort, auditd, Windows Security, AWS, Docker, Kubernetes, and dozens of other sources.
Sample text output:
[ERROR] broken_decoder.xml:14 -- Empty <plugin_decoder> in decoder 'json-msgraph'
[WARNING] decoder.xml:58 -- Decoder 'sshd-success': regex (osregex) has 2 capture
group(s) but order has 3 field(s)
============================================================
Wazuh Logtest API Results
============================================================
[PASS] SSH failed password
[FAIL] Apache access log
expected: apache, got: web-accesslog
API tests: 1 passed, 1 failed, 0 errors
Summary: 1 error(s), 1 warning(s)
The integration test suite spins up a full Wazuh stack (manager, indexer, dashboard) via Docker Compose and runs all 90+ test cases against the live logtest API. This validates that the linter’s static analysis results align with actual Wazuh runtime behavior.
Conclusion
Static analysis for Wazuh decoder XML files closes the gap between writing decoders and deploying them with confidence. The wazuh-decoder-linter tool catches misconfigurations – missing order elements, regex group mismatches, orphaned parent chains, invalid attributes – before they reach production and cause silent data loss in your SIEM pipeline.
The tool is open source under the BSD 3-Clause license, supports Python 3.10+, and is available at github.com/pyToshka/wazuh-linter. Contributions to validation rules, test cases, and documentation are welcome.
Related Reading
- Boosting Container Image Security Using Wazuh and Trivy - Automated security validation with Wazuh
- RAG for Wazuh Documentation: Part 1 - Building retrieval systems over Wazuh knowledge base
- Wazuh LLM: Fine-Tuned Llama 3.1 for Security Analysis - AI model for Wazuh security event analysis