RAG for Wazuh Documentation: Step-by-Step Guide, Part 1

Introduction to RAG

Retrieval-Augmented Generation (RAG) is a technique in which a language model is given relevant passages retrieved from external sources, so that it can generate more accurate and useful responses to questions.

In the context of Wazuh, RAG can be used to automate lookups in the documentation, give faster access to the relevant sections, and improve the quality of the answers users get.

Understanding RAG Technology

RAG combines the strengths of two distinct approaches: information retrieval and text generation. Traditional large language models generate responses based solely on their training data, which means they can produce outdated or inaccurate information about specific products or tools. RAG addresses this limitation by first retrieving relevant documents from a curated knowledge base and then feeding those documents as context to the language model alongside the user’s query.

The RAG pipeline consists of three primary stages:

  1. Document Ingestion: Source documents are processed, split into manageable chunks, and converted into vector embeddings that capture their semantic meaning.
  2. Retrieval: When a user submits a query, the system converts the query into a vector embedding and searches the document store for the most semantically similar chunks.
  3. Generation: The retrieved document chunks are provided as context to the language model, which generates a response grounded in the actual documentation rather than relying solely on its pretrained knowledge.
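
The three stages can be illustrated with a minimal sketch that uses only the Python standard library. It is a toy under stated assumptions, not the pipeline this series builds: the bag-of-words "embedding" stands in for a real embedding model, the sample passages are hypothetical rather than quoted from the Wazuh documentation, and the final step only assembles the prompt that would be sent to a language model.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Stage 1a: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Stage 1b: toy 'embedding' as a bag-of-words count vector.

    Assumption: a real system would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, index, top_k=2):
    """Stage 2: return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query, chunks):
    """Stage 3: assemble the context and question that would be sent to an LLM."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample passages, not quotes from the Wazuh documentation.
docs = [
    "File integrity monitoring watches selected directories and reports file changes.",
    "Agents are enrolled with the manager before they can send events.",
    "Rules are written in XML and matched against decoded log fields.",
]
index = [(chunk, embed(chunk)) for doc in docs for chunk in chunk_text(doc)]
question = "How do I monitor file changes in a directory?"
top = retrieve(question, index, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Word counts only match exact tokens, so this sketch behaves like keyword search; the semantic matching described above depends on replacing embed with a real embedding model.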

For Wazuh specifically, this approach is valuable because the platform’s documentation is extensive and frequently updated. Security teams often need quick answers about specific configuration options, rule syntax, or troubleshooting procedures. A RAG system built on the official Wazuh documentation can provide accurate, up-to-date responses that reference the actual documentation content.

Preparing for RAG Integration

Before integrating RAG with the Wazuh documentation, you need to complete the following steps:

  • Evaluate the Current Documentation: Review how the existing Wazuh documentation is structured, which source format it uses, and how it is built.
  • Collect Data: Gather all necessary data and information sources that RAG will use to generate responses.
  • Select Tools: Determine the appropriate tools and technologies for integrating RAG into your system.

Evaluating the Current Wazuh Documentation

The Wazuh documentation is available in the wazuh/wazuh-documentation repository on GitHub. This documentation contains essential information about Wazuh, its features, and capabilities.

The documentation is built with a documentation generator based on Sphinx, so you can compile it locally and use the output as a source for Retrieval-Augmented Generation (RAG).

How to Compile Wazuh Documentation Locally

To compile the Wazuh documentation for subsequent use in RAG, follow these steps:

  1. Ensure Python and pip Are Installed: Check that both are available on your computer, for example with python --version and pip --version.

  2. Alternative: Using Docker: If you do not want to install Python and pip locally, you can compile the documentation with Docker instead, as described in the Docker section of this guide.

  3. Download the Wazuh Documentation: Use the command:

    git clone https://github.com/wazuh/wazuh-documentation.git -b <branch-name>
    

    Replace <branch-name> with the desired branch or tag, such as v4.11.0.

  4. Navigate to the Documentation Directory: Run the command:

    cd wazuh-documentation
    
  5. Install Dependencies: Run the command:

    pip install -r requirements.txt
    
  6. Compile the Documentation: In the root directory of the repository, run the command:

    make <output-format>
    

    Replace <output-format> with the desired output format, such as singlehtml, which is used later in this guide.

By following these steps, you can compile the Wazuh documentation for use in RAG.

Compiling Wazuh Documentation Using Docker

You can compile the Wazuh documentation using Docker by following these steps:

  1. Install Docker: Ensure Docker is installed on your computer.

  2. Download the Wazuh Documentation: Use the command:

    git clone https://github.com/wazuh/wazuh-documentation.git -b v4.11.0
    
  3. Create a Dockerfile and docker-compose File: In the documentation directory, create the files Dockerfile and docker-compose.yml. In the Dockerfile, specify the necessary instructions for compilation, and in docker-compose.yml, configure the services for working with Docker.

  4. Compile the Documentation: Start the container and run the build command inside it.

By following these steps, you can efficiently compile the Wazuh documentation using Docker.
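
Before starting, it can help to confirm that the required command-line tools are on your PATH. The following is a small sketch of such a check; the helper name missing_tools and the list of tools are assumptions made for illustration, and the check only looks for the docker executable, not for the Compose plugin.

```python
import shutil


def missing_tools(tools):
    """Return the names in tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Assumption: these are the executables the Docker-based workflow relies on.
required = ["git", "docker"]
missing = missing_tools(required)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```

The same helper can be called with other names, for example the PDF converter used at the end of this guide.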

Now let’s do it step by step:

  1. Create a Compilation Directory:

    mkdir wazuh-documentation-rag
    cd wazuh-documentation-rag
    
  2. Download the Wazuh Documentation Repository: In this example, the documentation for version 4.11.0 is used:

    git clone https://github.com/wazuh/wazuh-documentation.git -b v4.11.0
    
  3. Create a Dockerfile: In the wazuh-documentation-rag directory, create a file named Dockerfile with the following content:

    # Use the base image with Python 3.9
    FROM python:3.9
    
    # Set the working directory
    WORKDIR /app
    
    # Copy the dependency list to /tmp/requirements.txt
    COPY wazuh-documentation/requirements.txt /tmp/requirements.txt
    
    # Install the dependencies
    RUN pip install -r /tmp/requirements.txt
    
    # Keep the container running so you can open a shell in it
    CMD ["sleep", "infinity"]
    
  4. Create a docker-compose.yml File: In the same directory, create a file named docker-compose.yml to manage the Docker container:

    services:
      wazuh-docs:
        build: .
        volumes:
          - ./wazuh-documentation:/app/wazuh-documentation
    
  5. Build and Start the Container:

    docker compose up -d --build
    

Unfortunately, it is currently not possible to compile the documentation directly to PDF.

However, you can compile the documentation into a single HTML file and then convert that file to PDF.

To build the single HTML file and convert it, follow these steps:

  1. Start the container if it is not already running: docker compose up -d --build
  2. Connect to the container using docker compose exec -it wazuh-docs bash
  3. Compile the documentation with the command cd /app/wazuh-documentation && make singlehtml
  4. Wait for the compilation to complete (it may take some time)
  5. Exit the container: exit
  6. Navigate to the directory with the compiled documentation: cd wazuh-documentation/build/singlehtml/
  7. Convert the single HTML file to PDF using the command wkhtmltopdf index.html wazuh.pdf, after making sure that wkhtmltopdf is installed.
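
If plain text suits your ingestion step better than PDF, the compiled HTML can also be reduced to text directly. The sketch below is one possible approach using only the Python standard library, not the method this series prescribes. The class and function names are hypothetical, and the sample markup stands in for the real index.html.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "li", "pre", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            # End of a block element: start a new line in the output.
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample markup standing in for the compiled index.html.
sample = (
    "<html><head><style>p{}</style></head>"
    "<body><h1>Title</h1><p>Some <b>text</b>.</p></body></html>"
)
text = html_to_text(sample)
print(text)
```

To apply it to the compiled documentation, read index.html with UTF-8 encoding and pass its contents to html_to_text. Whitespace is collapsed inside every block, so indentation in code samples is not preserved.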

Expected Outcomes and Benefits

Once the Wazuh documentation is compiled and prepared for RAG integration, you will have a foundation for building an AI-powered documentation assistant with the following capabilities:

  • Accurate query responses: The RAG system will provide answers grounded in the official Wazuh documentation, reducing the risk of hallucinated or outdated information that standalone LLMs may produce.
  • Contextual search: Unlike traditional keyword search, the RAG approach understands semantic meaning, allowing users to ask natural language questions such as “How do I configure file integrity monitoring for a specific directory?” and receive relevant documentation excerpts.
  • Version-specific answers: By compiling documentation for a specific Wazuh version (such as 4.11.0 in our example), the RAG system provides answers that are accurate for your deployed version, avoiding confusion caused by documentation changes between releases.
  • Reduced onboarding time: New team members can quickly find answers to Wazuh configuration and operational questions without manually searching through hundreds of documentation pages.

To be continued in Part 2, where we will implement the actual RAG pipeline code, including document chunking, embedding generation, and the query interface. Stay tuned for updates.

