Wazuh Rule Static Analysis: Linter Evolution

“Wazuh Static Analysis” series:

  • Part 1: Decoders - decoder XML validation
  • Part 2: Rules (you are here) - rule validation and cross-type checking

In Part 1 we built a linter for Wazuh decoder XML files - a tool that validates structure, regex/order consistency, and parent-child decoder chains. But decoders are only half of the event processing pipeline. Decoders extract fields from raw logs, while rules decide what to do with those fields: generate an alert, escalate a threat level, or trigger an automated response. An error in a rule - a missed alert or a false positive - can be more dangerous than a decoder misconfiguration.

The tool has grown. It is now wazuh-linter - a static analysis platform that validates decoders, rules, and the relationships between them. This article covers the architectural evolution of the tool, its 24 validation rules for Wazuh rule XML files, and the cross-type checking mechanism.

Common Rule Configuration Errors

Wazuh rule XML files are located in /var/ossec/etc/rules/ (custom) and /var/ossec/ruleset/rules/ (default). Each file contains <rule> elements inside a root <group> element. Analysis of real-world configurations reveals consistent error patterns.

timeframe without frequency. A rule with timeframe="120" but no frequency attribute defines a time window with no firing threshold. Wazuh silently ignores the timeframe in this case. The exception is rules with <if_matched_sid> or <if_matched_group>, which inherit frequency context from the referenced rule. Note that frequency without timeframe is valid - Wazuh applies a default timeframe.

<!-- Error: timeframe without frequency -->
<rule id="100001" level="10" timeframe="120">
  <description>Missing frequency</description>
</rule>
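A minimal sketch of how such a check could look, using only the Python standard library. The function name find_timeframe_without_frequency is hypothetical and not part of wazuh-linter, and the sketch assumes the input parses as a single <group> root:

```python
import xml.etree.ElementTree as ET


def find_timeframe_without_frequency(xml_text: str) -> list[str]:
    """Return IDs of rules that set timeframe but define no firing threshold."""
    root = ET.fromstring(xml_text)
    flagged = []
    for rule in root.iter("rule"):
        has_timeframe = "timeframe" in rule.attrib
        has_frequency = "frequency" in rule.attrib
        # Rules with if_matched_sid / if_matched_group are exempt from the check.
        inherits_context = any(
            rule.find(tag) is not None
            for tag in ("if_matched_sid", "if_matched_group")
        )
        if has_timeframe and not has_frequency and not inherits_context:
            flagged.append(rule.get("id", "?"))
    return flagged


SAMPLE_TIMEFRAME = """
<group name="local,">
  <rule id="100001" level="10" timeframe="120">
    <description>Missing frequency</description>
  </rule>
  <rule id="100002" level="10" timeframe="120">
    <if_matched_sid>5710</if_matched_sid>
    <description>Exempt: if_matched_sid present</description>
  </rule>
  <rule id="100003" level="10" frequency="5">
    <description>Valid: default timeframe applies</description>
  </rule>
</group>
"""

print(find_timeframe_without_frequency(SAMPLE_TIMEFRAME))
```

Only rule 100001 is reported: 100002 is exempt because of <if_matched_sid>, and 100003 sets frequency alone, which is valid.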

if_sid referencing a non-existent ID. A rule declares <if_sid>100500</if_sid>, but rule 100500 does not exist in any loaded file. The rule will never fire because its activation condition cannot be satisfied.

Duplicate rule IDs. Two rules with the same id without overwrite="yes" produce undefined behavior. Wazuh loads one of them, but which one depends on file processing order.
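Both ID problems reduce to building a registry of known IDs and comparing against it. The following is a rough sketch under simplifying assumptions (one well-formed file, IDs compared as strings); check_rule_ids is a hypothetical name, not the linter's API:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_rule_ids(xml_text: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (duplicate IDs, (rule ID, missing if_sid reference) pairs)."""
    rules = list(ET.fromstring(xml_text).iter("rule"))

    # A rule with overwrite="yes" intentionally redefines an existing ID,
    # so it is not counted as a duplicate.
    counts = Counter(r.get("id") for r in rules if r.get("overwrite") != "yes")
    duplicates = sorted(rule_id for rule_id, n in counts.items() if n > 1)

    known_ids = {r.get("id") for r in rules}
    dangling = []
    for rule in rules:
        for if_sid in rule.findall("if_sid"):
            # if_sid holds one or more comma-separated rule IDs.
            for ref in (if_sid.text or "").split(","):
                ref = ref.strip()
                if ref and ref not in known_ids:
                    dangling.append((rule.get("id"), ref))
    return duplicates, dangling


SAMPLE_IDS = """
<group name="local,">
  <rule id="100010" level="3"><description>Base</description></rule>
  <rule id="100011" level="5">
    <if_sid>100010, 100500</if_sid>
    <description>Child</description>
  </rule>
  <rule id="100011" level="7"><description>Duplicate</description></rule>
</group>
"""

duplicates, dangling = check_rule_ids(SAMPLE_IDS)
print(duplicates)
print(dangling)
```

A real linter would fill known_ids from every scanned file before resolving references, since the parent rule often lives in a different file.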

Invalid MITRE ATT&CK format. MITRE identifiers must follow the format Tnnnn or Tnnnn.nnn (for example, T1078 or T1078.001). Arbitrary strings like brute_force in the <id> element break MITRE framework integration.

osmatch in regex elements. The type="osmatch" attribute is valid for <match> and <prematch>, but not for <regex> - osmatch does not support capturing groups, and <regex> exists specifically for field capture.

Invalid time/weekday formats. The <time> element accepts ranges in 24-hour or 12-hour am/pm format (for example, 6 pm - 8:30 am), while <weekday> accepts day names or the special values weekdays/weekends. Typos like Mnday or 25:00 - 26:00 cause the time condition to be silently ignored.
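A simplified validator for both formats might look like the sketch below. It is an approximation: the accepted grammar (hh or hh:mm, optional am/pm, optional leading ! for negation, comma-separated full day names) is an assumption for illustration and may be stricter or looser than what Wazuh and wazuh-linter actually accept. All names are hypothetical.

```python
import re

# One time point: "6", "6 pm", "8:30 am", "22:00".
TIME_POINT = re.compile(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)?", re.IGNORECASE)

DAY_NAMES = {
    "monday", "tuesday", "wednesday", "thursday", "friday",
    "saturday", "sunday", "weekdays", "weekends",
}


def _valid_time_point(text: str) -> bool:
    match = TIME_POINT.fullmatch(text.strip())
    if match is None:
        return False
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    # With am/pm the hour is 1-12; without it, 0-23.
    hour_ok = 1 <= hour <= 12 if match.group(3) else 0 <= hour <= 23
    return hour_ok and 0 <= minute <= 59


def is_valid_time_range(value: str) -> bool:
    value = value.strip()
    if value.startswith("!"):  # negated range
        value = value[1:]
    parts = value.split("-")
    return len(parts) == 2 and all(_valid_time_point(p) for p in parts)


def is_valid_weekday(value: str) -> bool:
    days = [d.strip().lower() for d in value.split(",")]
    return all(d in DAY_NAMES for d in days)


for sample in ("6 pm - 8:30 am", "!22:00 - 06:00", "25:00 - 26:00"):
    print(sample, is_valid_time_range(sample))
for sample in ("weekdays", "monday, tuesday", "Mnday"):
    print(sample, is_valid_weekday(sample))
```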

Architectural Evolution: From Linter to Platform

The first version of the tool - wazuh-decoder-linter - was a monolith: a single WazuhDecoderLinter class handling XML parsing, sanitization, block extraction, and all validation checks. When it came time to add rule validation, it became clear that copying XML parsing logic was a dead end. Decoders and rules share the same mechanisms: file reading, malformed XML recovery, Wazuh-specific character escaping, and individual block extraction on parse failure.

The solution was extracting shared logic into a base class BaseXmlLinter:

BaseXmlLinter
  - File reading (UTF-8 / latin-1 fallback)
  - XML sanitization (unescaped &, \<, bare <)
  - Two-pass parsing strategy
  - Individual block extraction on failure
  - Line context formatting
      |
      +-- WazuhDecoderLinter
      |     14 decoder checks
      |     Decoder name registry
      |
      +-- WazuhRuleLinter
            24 rule checks
            Rule ID and group registry

Each specialized linter inherits from BaseXmlLinter and implements only domain logic. WazuhDecoderLinter checks regex/order consistency, parent chains, and plugin_decoder usage. WazuhRuleLinter checks frequency/timeframe pairing, if_sid chains, MITRE IDs, and time formats.
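To make the shared layer more concrete, here is a rough sketch of two of its responsibilities: encoding fallback and a strict-then-sanitized parse. It is not the actual BaseXmlLinter code. The function names, the entity regex, and the synthetic <root> wrapper are assumptions for illustration, and only bare ampersands are handled.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does not start a valid XML entity reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)")


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8, falling back to latin-1, which accepts any byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def parse_with_recovery(text: str) -> ET.Element:
    """First pass: strict parse. Second pass: sanitize and wrap in one root."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        sanitized = BARE_AMP.sub("&amp;", text)
        # Wazuh files often contain several top-level elements, which strict
        # XML forbids; a synthetic root makes them parseable together.
        return ET.fromstring(f"<root>{sanitized}</root>")


RAW_RULES = (
    b'<group name="a,"><rule id="1" level="3">'
    b"<description>A & B</description></rule></group>"
    b'<group name="b,"></group>'
)

recovered = parse_with_recovery(decode_bytes(RAW_RULES))
print(recovered.tag, len(recovered.findall("group")))
```

When the second pass also fails, the real tool falls back to extracting individual blocks, which this sketch does not attempt.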

The second architectural decision was LintSession. This is a shared state object that ties decoder and rule linters into a single validation pass. When the decoder linter processes files, it registers all discovered decoder names in the session. When the rule linter encounters <decoded_as>sshd</decoded_as>, it checks the session to verify that the sshd decoder exists. Without LintSession, this cross-type validation would be impossible.
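Conceptually, the session is little more than a registry that one linter writes and the other reads. The sketch below illustrates the idea with a hypothetical SessionSketch class and two helper functions; only the decoder_names attribute mirrors a name from the real tool, everything else is assumed:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class SessionSketch:
    """Hypothetical stand-in for LintSession: state shared across linters."""
    decoder_names: set[str] = field(default_factory=set)


def register_decoders(xml_text: str, session: SessionSketch) -> None:
    # Decoder files hold several top-level <decoder> elements, so wrap them.
    root = ET.fromstring(f"<root>{xml_text}</root>")
    for decoder in root.iter("decoder"):
        name = decoder.get("name")
        if name:
            session.decoder_names.add(name)


def missing_decoders(xml_text: str, session: SessionSketch) -> list[str]:
    """Return decoded_as values that no scanned decoder file defines."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter("decoded_as")
        if el.text and el.text.strip() not in session.decoder_names
    ]


SAMPLE_DECODERS = '<decoder name="sshd"><program_name>^sshd</program_name></decoder>'
SAMPLE_RULES = """<group name="local,">
  <rule id="100100" level="5"><decoded_as>custom-nginx</decoded_as></rule>
  <rule id="100101" level="5"><decoded_as>sshd</decoded_as></rule>
</group>"""

sketch_session = SessionSketch()
register_decoders(SAMPLE_DECODERS, sketch_session)
print(missing_decoders(SAMPLE_RULES, sketch_session))
```

The order matters: decoders must be registered before rules are checked, which is why the decoder linter runs first in a shared pass.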

The third change was automatic file type detection. The CLI analyzes XML content to determine whether a file contains decoders (<decoder> elements) or rules (<group> with child <rule> elements). This allows running the unified wazuh-lint command on a directory with mixed file types.
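The detection heuristic can be approximated in a few lines. This is a sketch of the idea as described above, not the CLI's actual implementation; detect_file_type and its return values are hypothetical, and input that fails to parse is simply reported as unknown:

```python
import xml.etree.ElementTree as ET


def detect_file_type(xml_text: str) -> str:
    """Classify XML content as 'decoder', 'rule', or 'unknown'."""
    try:
        # Wrap in a synthetic root: Wazuh files may have several top-level elements.
        root = ET.fromstring(f"<root>{xml_text}</root>")
    except ET.ParseError:
        return "unknown"
    if root.find(".//decoder") is not None:
        return "decoder"
    # Rules live as direct children of a <group> element.
    if any(group.find("rule") is not None for group in root.iter("group")):
        return "rule"
    return "unknown"


print(detect_file_type('<decoder name="sshd"></decoder>'))
print(detect_file_type('<group name="local,"><rule id="100001" level="3"/></group>'))
print(detect_file_type("<ossec_config/>"))
```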

24 Validation Rules for Wazuh Rule XML Files

The rule linter implements 24 checks, grouped by type.

Structural Checks

Rule                 Severity  Description
Required attributes  ERROR     <rule> must have id and level
ID range             ERROR     id must be an integer from 1 to 999999
Level range          ERROR     level must be an integer from 0 to 16
ID uniqueness        ERROR     Duplicate IDs within a file are forbidden
Unknown elements     WARNING   Child elements outside the set of 82 valid elements
Description          WARNING   Rule should contain <description>
Rule attributes      WARNING   Unknown attributes on <rule>

Logical Checks

Rule                 Severity  Description
frequency/timeframe  ERROR     Co-dependency (relaxed for if_matched_*)
if_sid format        ERROR     Valid comma-separated integers
if_level format      ERROR     Integer within 0-16 range
Overwrite value      ERROR     Only yes or no
Correlation context  WARNING   same_*/different_* require frequency or if_matched_*

Format Checks

Rule              Severity  Description
Regex types       ERROR     type must be osmatch, osregex, or pcre2
Negate attribute  ERROR     Only yes/no; valid only on matching elements
OS_Regex syntax   WARNING   Unsupported constructs in osregex patterns
Options values    ERROR     Only valid option values accepted
Time format       ERROR     Valid time range (24h, 12h am/pm, ! for negation)
Weekday format    ERROR     Valid day names or weekdays/weekends
MITRE ID format   WARNING   <id> must match Tnnnn or Tnnnn.nnn
List attributes   ERROR     <list> requires field=; valid lookup= values

Cross-File Checks

Rule                      Severity  Description
if_sid chain              WARNING   <if_sid> must reference existing IDs
if_matched_sid chain      WARNING   <if_matched_sid> must reference existing IDs
Cross-file duplicate IDs  ERROR     No duplicates without overwrite="yes"
decoded_as                INFO      <decoded_as> must reference an existing decoder

Let us examine several checks in more detail.

Correlation context. Elements <same_source_ip>, <different_source_ip>, and other correlation elements (40 total) only make sense in an aggregation context - when the rule has a frequency attribute or an <if_matched_sid>/<if_matched_group> element. Without one of these, the correlation element is silently ignored:

<!-- Warning: same_source_ip without frequency or if_matched_* -->
<rule id="100002" level="8">
  <if_sid>5710</if_sid>
  <same_source_ip />
  <description>Should correlate but cannot</description>
</rule>
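A sketch of this check, assuming correlation elements can be recognized by their same_ and different_ prefixes (the real linter works from an explicit list of 40 element names). The function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def find_orphan_correlation(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (rule ID, correlation tags) for rules lacking aggregation context."""
    findings = []
    for rule in ET.fromstring(xml_text).iter("rule"):
        correlation = [
            child.tag for child in rule
            if child.tag.startswith(("same_", "different_"))
        ]
        # Aggregation context: a frequency attribute or any if_matched_* element.
        has_context = "frequency" in rule.attrib or any(
            child.tag.startswith("if_matched_") for child in rule
        )
        if correlation and not has_context:
            findings.append((rule.get("id"), correlation))
    return findings


SAMPLE_CORRELATION = """
<group name="local,">
  <rule id="100002" level="8">
    <if_sid>5710</if_sid>
    <same_source_ip />
    <description>Should correlate but cannot</description>
  </rule>
  <rule id="100003" level="10" frequency="8" timeframe="120">
    <if_matched_sid>5710</if_matched_sid>
    <same_source_ip />
    <description>Correlates correctly</description>
  </rule>
</group>
"""

print(find_orphan_correlation(SAMPLE_CORRELATION))
```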

MITRE ATT&CK validation. The linter verifies that identifiers inside <mitre><id> conform to the Tnnnn or Tnnnn.nnn format. This is not full validation against the MITRE registry, but it catches obvious errors like text descriptions instead of identifiers.
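The format check amounts to a single regular expression. A minimal sketch, with a hypothetical function name, that validates the shape only and makes no attempt to confirm the technique exists:

```python
import re

# "T" plus four digits, optionally followed by a dot and three digits.
MITRE_ID = re.compile(r"T\d{4}(\.\d{3})?")


def is_valid_mitre_id(value: str) -> bool:
    return MITRE_ID.fullmatch(value.strip()) is not None


for candidate in ("T1078", "T1078.001", "T1078.1", "brute_force"):
    print(candidate, is_valid_mitre_id(candidate))
```

Using fullmatch rather than match matters here: it rejects trailing junk such as T1078.1 or T1078abc, which a prefix match would accept.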

List attributes. The <list> element for CDB lookups requires a mandatory field attribute and accepts lookup with values match_key, not_match_key, match_key_value, address_match_key, not_address_match_key, address_match_key_value. A missing field is an error, and so is an invalid lookup value.
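Expressed as code, the check is a required-attribute test plus a membership test. The lookup values below are the ones listed above; the function name, message wording, and sample list path are made up for illustration:

```python
import xml.etree.ElementTree as ET

VALID_LOOKUPS = {
    "match_key", "not_match_key", "match_key_value",
    "address_match_key", "not_address_match_key", "address_match_key_value",
}


def check_list_element(element: ET.Element) -> list[str]:
    """Return a list of problems found on a single <list> element."""
    problems = []
    if not element.get("field"):
        problems.append("missing field attribute")
    lookup = element.get("lookup")
    # lookup is optional; only an unrecognized value is a problem.
    if lookup is not None and lookup not in VALID_LOOKUPS:
        problems.append(f"invalid lookup value: {lookup}")
    return problems


good_list = ET.fromstring(
    '<list field="srcip" lookup="address_match_key">etc/lists/blocked</list>'
)
bad_list = ET.fromstring('<list lookup="contains">etc/lists/blocked</list>')

print(check_list_element(good_list))
print(check_list_element(bad_list))
```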

Cross-Type Validation: decoded_as

The most interesting capability of the new architecture is checking relationships between rules and decoders. The <decoded_as> element in a rule filters events by decoder name. If the specified decoder does not exist, the rule will never fire.

Consider this example. In a rules file:

<rule id="100100" level="5">
  <decoded_as>custom-nginx</decoded_as>
  <description>Custom nginx event detected</description>
</rule>

But no decoder named custom-nginx exists in the decoder files. The rule is formally valid XML, but functionally useless.

LintSession solves this problem. When running through the unified wazuh-lint entry point, a session object is created. The decoder linter populates the name registry (session.decoder_names). The rule linter then receives this registry and checks every <decoded_as> against it.

from wazuh_linter import WazuhDecoderLinter, WazuhRuleLinter, LintSession

session = LintSession()

decoder_linter = WazuhDecoderLinter()
decoder_report = decoder_linter.lint_paths(
    ["decoders/"], session=session
)

rule_linter = WazuhRuleLinter()
rule_report = rule_linter.lint_paths(
    ["rules/"], session=session
)

# session.decoder_names populated by decoder linter
# rule_linter used it to validate decoded_as
for result in rule_report.results:
    print(f"[{result.severity}] {result.file}:{result.line} - {result.message}")

Output when an issue is detected:

[INFO] local_rules.xml:12 - Rule '100100': <decoded_as> references
  decoder 'custom-nginx' which was not found in scanned decoder files

The severity is INFO rather than ERROR because the decoder may exist in files not included in the current scan (for example, Wazuh default decoders).

CLI Usage and CI/CD Integration

The updated tool provides three entry points:

# Auto-detect file type (recommended)
wazuh-lint /var/ossec/etc/

# Force specific type
wazuh-lint --type rule /var/ossec/etc/rules/
wazuh-lint --type decoder /var/ossec/etc/decoders/

# Legacy aliases (identical to wazuh-lint, kept for backwards compatibility)
wazuh-rule-lint /var/ossec/etc/rules/
wazuh-decoder-lint /var/ossec/etc/decoders/

The wazuh-lint command automatically detects the type of each XML file and creates a LintSession for cross-type validation. Use --type to force a specific mode. The wazuh-rule-lint and wazuh-decoder-lint commands are aliases for wazuh-lint - they do not force a type. Options --strict, --format json, and --show-info work across all modes.

Updated GitHub Actions example for validating the entire configuration:

name: Lint Wazuh Configuration
on:
  push:
    paths:
      - 'decoders/**'
      - 'rules/**'
  pull_request:
    paths:
      - 'decoders/**'
      - 'rules/**'

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install wazuh-linter
        run: pip install git+https://github.com/pyToshka/wazuh-linter.git

      - name: Lint decoders and rules
        run: wazuh-lint --strict --format json decoders/ rules/ > lint-results.json

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lint-results
          path: lint-results.json

Pre-commit hook for both file types:

repos:
  - repo: local
    hooks:
      - id: wazuh-lint
        name: Wazuh Lint
        entry: wazuh-lint --strict
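        language: python
        # A local hook with language: python runs in an isolated environment,
        # so the linter has to be listed as a dependency to be installed there.
        additional_dependencies: ['git+https://github.com/pyToshka/wazuh-linter.git']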
        files: '\.xml$'
        types: [file]

For a detailed walkthrough of the decoder linter and all 14 decoder checks, refer to Part 1 of this series.

Conclusion and Next Steps

wazuh-linter now covers both sides of the Wazuh event processing pipeline: decoders (14 checks) and rules (24 checks), connected through the LintSession cross-type validation mechanism. The BaseXmlLinter architecture makes adding new analysis types straightforward.

The tool is open source under the BSD 3-Clause license at github.com/pyToshka/wazuh-linter. The next part of the series will explore further extensions to the tool’s capabilities.

