
ripgrep crates/core/search.rs: Code Companion

Reference code for the Search Orchestration lecture. Sections correspond to the lecture document.


Section 1: The Configuration Foundation

/// The configuration for the search worker.
///
/// Among a few other things, the configuration primarily controls the way we
/// show search results to users at a very high level.
#[derive(Clone, Debug)]
struct Config {
    // External command to preprocess files before searching
    preprocessor: Option<std::path::PathBuf>,
    // Glob patterns to select which files get preprocessed
    preprocessor_globs: ignore::overrides::Override,
    // Whether to automatically decompress archives
    search_zip: bool,
    // Binary detection for files discovered during directory walking
    binary_implicit: grep::searcher::BinaryDetection,
    // Binary detection for files explicitly named by the user
    binary_explicit: grep::searcher::BinaryDetection,
}

impl Default for Config {
    fn default() -> Config {
        Config {
            preprocessor: None,
            preprocessor_globs: ignore::overrides::Override::empty(),
            search_zip: false,
            // By default, no binary detection - search everything
            binary_implicit: grep::searcher::BinaryDetection::none(),
            binary_explicit: grep::searcher::BinaryDetection::none(),
        }
    }
}

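The Override type from the ignore crate matches paths against a set of globs written in gitignore syntax; here it selects which files go through the preprocessor. The two binary detection fields let files named explicitly by the user and files found by directory traversal follow different policies.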
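
The two-policy idea can be sketched without the grep crate. This is a minimal illustration, not ripgrep's code: BinaryPolicy and PolicyConfig are hypothetical stand-ins for grep::searcher::BinaryDetection and Config, and the chosen variants are assumptions made for the example.

```rust
/// Hypothetical stand-in for `grep::searcher::BinaryDetection`.
#[derive(Clone, Debug, PartialEq)]
enum BinaryPolicy {
    /// Never treat a file as binary; search every byte.
    SearchAll,
    /// Stop searching a file as soon as the given byte is seen.
    QuitOn(u8),
}

/// Hypothetical, trimmed-down `Config` holding only the two policies.
#[derive(Clone, Debug)]
struct PolicyConfig {
    implicit: BinaryPolicy,
    explicit: BinaryPolicy,
}

impl PolicyConfig {
    /// Pick the policy according to how the file reached the search.
    fn policy_for(&self, named_by_user: bool) -> &BinaryPolicy {
        if named_by_user {
            &self.explicit
        } else {
            &self.implicit
        }
    }
}
```

Keeping the two fields separate means a caller can be strict about files it stumbled on while still honoring a file the user asked for by name.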


Section 2: The Builder Pattern in Action

/// A builder for configuring and constructing a search worker.
#[derive(Clone, Debug)]
pub(crate) struct SearchWorkerBuilder {
    config: Config,
    command_builder: grep::cli::CommandReaderBuilder,
}

impl SearchWorkerBuilder {
    /// Create a new builder for configuring and constructing a search worker.
    pub(crate) fn new() -> SearchWorkerBuilder {
        let mut command_builder = grep::cli::CommandReaderBuilder::new();
        // Async stderr prevents deadlocks when preprocessors write errors
        command_builder.async_stderr(true);

        SearchWorkerBuilder { config: Config::default(), command_builder }
    }

    /// Create a new search worker using the given searcher, matcher and
    /// printer.
    pub(crate) fn build<W: WriteColor>(
        &self,
        matcher: PatternMatcher,
        searcher: grep::searcher::Searcher,
        printer: Printer<W>,
    ) -> SearchWorker<W> {
        let config = self.config.clone();
        let command_builder = self.command_builder.clone();
        // Only create decomp_builder when search_zip is enabled
        // This avoids work like resolving decompression binary paths
        let decomp_builder = config.search_zip.then(|| {
            let mut decomp_builder =
                grep::cli::DecompressionReaderBuilder::new();
            decomp_builder.async_stderr(true);
            decomp_builder
        });
        SearchWorker {
            config,
            command_builder,
            decomp_builder,
            matcher,
            searcher,
            printer,
        }
    }
}

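bool::then turns a condition into an Option<T>. When the value is true it runs the closure and returns Some(result); when it is false it returns None without running the closure, which is why the decompression builder is never constructed unless search_zip is set.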
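
A small standalone sketch of the same idiom follows. DecompBuilder here is a hypothetical placeholder type, not the real grep::cli::DecompressionReaderBuilder.

```rust
/// Hypothetical stand-in for a builder that is costly to construct.
#[derive(Debug, PartialEq)]
struct DecompBuilder {
    async_stderr: bool,
}

/// Build the optional component only when the flag is set.
fn maybe_decomp_builder(search_zip: bool) -> Option<DecompBuilder> {
    // The closure runs only when `search_zip` is true.
    search_zip.then(|| DecompBuilder { async_stderr: true })
}
```

If the value is already at hand and costs nothing to build, bool::then_some takes it directly instead of a closure, at the price of always evaluating it.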


Section 3: Configuring the Preprocessor Pipeline

/// Set the path to a preprocessor command.
///
/// When this is set, instead of searching files directly, the given
/// command will be run with the file path as the first argument, and the
/// output of that command will be searched instead.
pub(crate) fn preprocessor(
    &mut self,
    cmd: Option<std::path::PathBuf>,
) -> anyhow::Result<&mut SearchWorkerBuilder> {
    if let Some(ref prog) = cmd {
        // Resolve the binary path up front so resolution errors surface early
        let bin = grep::cli::resolve_binary(prog)?;
        self.config.preprocessor = Some(bin);
    } else {
        self.config.preprocessor = None;
    }
    // Returns Result to allow early error detection
    Ok(self)
}

/// Set the globs for determining which files should be run through the
/// preprocessor. By default, with no globs and a preprocessor specified,
/// every file is run through the preprocessor.
pub(crate) fn preprocessor_globs(
    &mut self,
    globs: ignore::overrides::Override,
) -> &mut SearchWorkerBuilder {
    self.config.preprocessor_globs = globs;
    self
}

/// Returns true if and only if the given file path should be run through
/// the preprocessor.
fn should_preprocess(&self, path: &Path) -> bool {
    if !self.config.preprocessor.is_some() {
        return false;
    }
    if self.config.preprocessor_globs.is_empty() {
        // No globs means preprocess everything
        return true;
    }
    // Glob match: file is preprocessed unless it's explicitly ignored
    !self.config.preprocessor_globs.matched(path, false).is_ignore()
}

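The resolve_binary function handles a Windows-specific concern: it looks the program up in PATH itself, trying implicit extensions such as .exe, rather than leaving the lookup to the operating system. On other platforms it is expected to return the path unchanged.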
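
The shape of a fallible builder setter can be shown in isolation. In this sketch, resolve is a hypothetical validation step that only rejects an empty path (the real resolve_binary does more), and Builder is a reduced stand-in for SearchWorkerBuilder.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical validation standing in for `grep::cli::resolve_binary`.
/// It only rejects an empty path.
fn resolve(prog: &Path) -> Result<PathBuf, String> {
    if prog.as_os_str().is_empty() {
        return Err("empty preprocessor path".to_string());
    }
    Ok(prog.to_path_buf())
}

#[derive(Debug, Default)]
struct Builder {
    preprocessor: Option<PathBuf>,
}

impl Builder {
    /// Validate at configuration time. Returning `&mut Builder` inside
    /// `Result` lets callers apply `?` and keep chaining setters.
    fn preprocessor(
        &mut self,
        cmd: Option<PathBuf>,
    ) -> Result<&mut Builder, String> {
        self.preprocessor = match cmd {
            // On error, `?` returns before the field is overwritten.
            Some(ref prog) => Some(resolve(prog)?),
            None => None,
        };
        Ok(self)
    }
}
```

A failed call leaves the previous setting in place, so a caller that reports the error and continues still holds a consistent builder.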


Section 4: The PatternMatcher and Printer Enums

/// The pattern matcher used by a search worker.
#[derive(Clone, Debug)]
pub(crate) enum PatternMatcher {
    RustRegex(grep::regex::RegexMatcher),
    // Only included when pcre2 feature is enabled
    #[cfg(feature = "pcre2")]
    PCRE2(grep::pcre2::RegexMatcher),
}

/// The printer used by a search worker.
///
/// The `W` type parameter refers to the type of the underlying writer.
#[derive(Clone, Debug)]
pub(crate) enum Printer<W> {
    /// Use the standard printer, which supports the classic grep-like format.
    Standard(grep::printer::Standard<W>),
    /// Use the summary printer, which supports aggregate displays of search
    /// results.
    Summary(grep::printer::Summary<W>),
    /// A JSON printer, which emits results in the JSON Lines format.
    JSON(grep::printer::JSON<W>),
}

impl<W: WriteColor> Printer<W> {
    /// Return a mutable reference to the underlying printer's writer.
    pub(crate) fn get_mut(&mut self) -> &mut W {
        match *self {
            Printer::Standard(ref mut p) => p.get_mut(),
            Printer::Summary(ref mut p) => p.get_mut(),
            Printer::JSON(ref mut p) => p.get_mut(),
        }
    }
}

The WriteColor trait bound (from termcolor) ensures the writer supports colored output. Because Printer is generic over W, the same code works with stdout, files, and in-memory test buffers.
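
The enum-over-generic-writer pattern can be sketched with the standard library alone. This example substitutes std::io::Write for WriteColor, and Out, Plain and Json are hypothetical types invented for the illustration.

```rust
use std::io::Write;

/// Two hypothetical printers that each own a writer of type `W`.
struct Plain<W> {
    wtr: W,
}
struct Json<W> {
    wtr: W,
}

/// A closed set of output formats, generic over the writer.
enum Out<W> {
    Plain(Plain<W>),
    Json(Json<W>),
}

impl<W: Write> Out<W> {
    /// Return the writer regardless of which variant is active.
    fn get_mut(&mut self) -> &mut W {
        match self {
            Out::Plain(p) => &mut p.wtr,
            Out::Json(p) => &mut p.wtr,
        }
    }
}

/// Write through the enum into an in-memory buffer and hand it back.
fn write_banner() -> Vec<u8> {
    let mut out = Out::Json(Json { wtr: Vec::new() });
    out.get_mut().write_all(b"done\n").unwrap();
    match out {
        Out::Plain(p) => p.wtr,
        Out::Json(p) => p.wtr,
    }
}
```

An enum keeps dispatch static and the set of formats closed, which suits a program whose output modes are fixed at compile time.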


Section 5: The SearchWorker Structure

/// A worker for executing searches.
///
/// It is intended for a single worker to execute many searches, and is
/// generally intended to be used from a single thread. When searching using
/// multiple threads, it is better to create a new worker for each thread.
#[derive(Clone, Debug)]
pub(crate) struct SearchWorker<W> {
    config: Config,
    command_builder: grep::cli::CommandReaderBuilder,
    /// This is `None` when `search_zip` is not enabled, since in this case it
    /// can never be used. We do this because building the reader can sometimes
    /// do non-trivial work (like resolving the paths of decompression binaries
    /// on Windows).
    decomp_builder: Option<grep::cli::DecompressionReaderBuilder>,
    matcher: PatternMatcher,
    searcher: grep::searcher::Searcher,
    printer: Printer<W>,
}

Deriving Clone supports parallel search. Because search takes &mut self, a single worker cannot be shared across threads without locking, so each thread clones its own worker from one configured instance instead.
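
A minimal sketch of the clone-per-thread approach, using only std::thread. Worker here is a hypothetical substring searcher, and the batching scheme is an assumption of the example rather than how ripgrep distributes work.

```rust
use std::thread;

/// Hypothetical worker with some per-worker mutable state.
#[derive(Clone, Debug)]
struct Worker {
    needle: String,
    searched: usize,
}

impl Worker {
    /// Takes `&mut self`, so one worker cannot serve two threads at once.
    fn search(&mut self, haystack: &str) -> bool {
        self.searched += 1;
        haystack.contains(&self.needle)
    }
}

/// Clone the template once per batch and search each batch on its own thread.
fn count_matches_in_parallel(
    template: &Worker,
    batches: Vec<Vec<String>>,
) -> usize {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let mut worker = template.clone();
            thread::spawn(move || {
                batch.iter().filter(|h| worker.search(h)).count()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The template itself is never mutated; only the clones accumulate state, so no synchronization is needed between threads.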


Section 6: The Main Search Entry Point

impl<W: WriteColor> SearchWorker<W> {
    /// Execute a search over the given haystack.
    pub(crate) fn search(
        &mut self,
        haystack: &crate::haystack::Haystack,
    ) -> io::Result<SearchResult> {
        // Select binary detection based on how the file was discovered
        let bin = if haystack.is_explicit() {
            self.config.binary_explicit.clone()
        } else {
            self.config.binary_implicit.clone()
        };
        let path = haystack.path();
        log::trace!("{}: binary detection: {:?}", path.display(), bin);

        // Configure the searcher with the chosen detection strategy
        self.searcher.set_binary_detection(bin);

        // Decision tree: how should we read this file?
        if haystack.is_stdin() {
            self.search_reader(path, &mut io::stdin().lock())
        } else if self.should_preprocess(path) {
            self.search_preprocessor(path)
        } else if self.should_decompress(path) {
            self.search_decompress(path)
        } else {
            self.search_path(path)
        }
    }

    /// Returns true if and only if the given file path should be
    /// decompressed before searching.
    fn should_decompress(&self, path: &Path) -> bool {
        self.decomp_builder.as_ref().is_some_and(|decomp_builder| {
            decomp_builder.get_matcher().has_command(path)
        })
    }
}

The is_some_and method combines Option::is_some with a predicate check in one step, avoiding unnecessary unwrapping.
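
The same check can be sketched in isolation. DecompMatcher and its extension list are hypothetical; the real matcher returned by get_matcher works on globs rather than on plain suffixes.

```rust
/// Hypothetical table of file suffixes that have a decompression command.
struct DecompMatcher {
    extensions: Vec<&'static str>,
}

impl DecompMatcher {
    fn has_command(&self, path: &str) -> bool {
        self.extensions.iter().any(|ext| path.ends_with(ext))
    }
}

/// True only when a matcher exists and it recognizes the path.
fn should_decompress(matcher: Option<&DecompMatcher>, path: &str) -> bool {
    matcher.is_some_and(|m| m.has_command(path))
}
```

On older toolchains the equivalent spelling is map_or(false, ...), which reads less directly but behaves the same way.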


Section 7: The Preprocessor Search Path

/// Search the given file path by first asking the preprocessor for the
/// data to search instead of opening the path directly.
fn search_preprocessor(
    &mut self,
    path: &Path,
) -> io::Result<SearchResult> {
    use std::{fs::File, process::Stdio};

    let bin = self.config.preprocessor.as_ref().unwrap();
    let mut cmd = std::process::Command::new(bin);
    // Pass file path as argument, also provide file content via stdin
    cmd.arg(path).stdin(Stdio::from(File::open(path)?));

    // Build a reader that captures stdout from the command
    let mut rdr = self.command_builder.build(&mut cmd).map_err(|err| {
        io::Error::new(
            io::ErrorKind::Other,
            format!(
                "preprocessor command could not start: '{cmd:?}': {err}",
            ),
        )
    })?;

    // Search the command's output, not the original file
    let result = self.search_reader(path, &mut rdr).map_err(|err| {
        io::Error::new(
            io::ErrorKind::Other,
            format!("preprocessor command failed: '{cmd:?}': {err}"),
        )
    });

    // Important: close the reader to wait for the process to exit
    let close_result = rdr.close();
    let search_result = result?;
    close_result?;
    Ok(search_result)
}

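Each map_err wraps the original error with context about which command failed. The close() call runs before either result is propagated with ?, so the subprocess is reaped even when the search itself fails, and a search error takes precedence over a close error.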
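
The ordering of "search, close, then propagate" can be sketched on its own. ChildReader is a hypothetical stand-in for the command reader, and the search step is passed in as a closure purely to keep the example small.

```rust
use std::io;

/// Hypothetical reader whose `close` reports how the child process ended.
struct ChildReader {
    exit_ok: bool,
    closed: bool,
}

impl ChildReader {
    fn close(&mut self) -> io::Result<()> {
        self.closed = true;
        if self.exit_ok {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::Other, "child exited with failure"))
        }
    }
}

/// Run `search`, always close the reader, and report the search error first.
fn search_then_close(
    rdr: &mut ChildReader,
    search: impl FnOnce() -> io::Result<bool>,
) -> io::Result<bool> {
    let result = search().map_err(|err| {
        io::Error::new(
            io::ErrorKind::Other,
            format!("preprocessor command failed: {err}"),
        )
    });
    // Close before inspecting `result` so the child is reaped even on error.
    let close_result = rdr.close();
    let found = result?;
    close_result?;
    Ok(found)
}
```

Applying ? directly to the search call would skip close() on failure, which is the situation the two-step form avoids.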


Section 8: Decompression and Direct Path Search

/// Attempt to decompress the data at the given file path and search the
/// result. If the given file path isn't recognized as a compressed file,
/// then search it without doing any decompression.
fn search_decompress(&mut self, path: &Path) -> io::Result<SearchResult> {
    let Some(ref decomp_builder) = self.decomp_builder else {
        // No decompression builder - fall back to direct search
        return self.search_path(path);
    };
    let mut rdr = decomp_builder.build(path)?;
    let result = self.search_reader(path, &mut rdr);
    let close_result = rdr.close();
    let search_result = result?;
    close_result?;
    Ok(search_result)
}

/// Search the contents of the given file path.
fn search_path(&mut self, path: &Path) -> io::Result<SearchResult> {
    use self::PatternMatcher::*;

    let (searcher, printer) = (&mut self.searcher, &mut self.printer);
    // Dispatch to the appropriate regex engine
    match self.matcher {
        RustRegex(ref m) => search_path(m, searcher, printer, path),
        #[cfg(feature = "pcre2")]
        PCRE2(ref m) => search_path(m, searcher, printer, path),
    }
}

The let-else pattern (let Some(...) = ... else { return ... }) provides clean early returns for Option values.
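
A compact illustration of let-else outside the worker. The function and its strings are hypothetical and exist only to show the control flow.

```rust
/// Describe how a file would be read, given an optional decompression command.
fn describe_reader(decomp_cmd: Option<&str>, path: &str) -> String {
    let Some(cmd) = decomp_cmd else {
        // The else branch must diverge; here it returns the fallback.
        return format!("read {path} directly");
    };
    // Past this point `cmd` is a plain `&str`, with no further unwrapping.
    format!("run `{cmd} {path}` and read its output")
}
```

Compared with a match or if-let, the unwrapped binding stays at the function's top level, so the main path is not nested inside a block.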


Section 9: The Generic Search Functions

/// Search the contents of the given file path using the given matcher,
/// searcher and printer.
fn search_path<M: Matcher, W: WriteColor>(
    matcher: M,
    searcher: &mut grep::searcher::Searcher,
    printer: &mut Printer<W>,
    path: &Path,
) -> io::Result<SearchResult> {
    match *printer {
        Printer::Standard(ref mut p) => {
            // Create a sink that connects the printer to the path
            let mut sink = p.sink_with_path(&matcher, path);
            // Execute the search, writing results to the sink
            searcher.search_path(&matcher, path, &mut sink)?;
            Ok(SearchResult {
                has_match: sink.has_match(),
                stats: sink.stats().map(|s| s.clone()),
            })
        }
        Printer::Summary(ref mut p) => {
            let mut sink = p.sink_with_path(&matcher, path);
            searcher.search_path(&matcher, path, &mut sink)?;
            Ok(SearchResult {
                has_match: sink.has_match(),
                stats: sink.stats().map(|s| s.clone()),
            })
        }
        Printer::JSON(ref mut p) => {
            let mut sink = p.sink_with_path(&matcher, path);
            searcher.search_path(&matcher, path, &mut sink)?;
            Ok(SearchResult {
                has_match: sink.has_match(),
                // JSON printer always has stats
                stats: Some(sink.stats().clone()),
            })
        }
    }
}

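The Matcher trait bound accepts any regex engine that implements the grep crate's matching interface, and the sink is the piece that connects the searcher to the printer. The three match arms look alike but cannot be merged, because each printer produces its own sink type; the JSON sink also returns its stats directly rather than as an Option.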
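
The relationship between a generic matcher, a searcher and a sink can be sketched with two small traits. LineMatcher and Sink below are hypothetical simplifications; the real grep::matcher::Matcher and grep::searcher::Sink traits have much larger surfaces.

```rust
/// Hypothetical, simplified matcher interface.
trait LineMatcher {
    fn is_match(&self, line: &str) -> bool;
}

/// Forward the impl through references, so `&M` works where `M` is expected.
impl<'a, M: LineMatcher + ?Sized> LineMatcher for &'a M {
    fn is_match(&self, line: &str) -> bool {
        (**self).is_match(line)
    }
}

struct Literal(String);

impl LineMatcher for Literal {
    fn is_match(&self, line: &str) -> bool {
        line.contains(&self.0)
    }
}

/// A sink receives each match; the search loop knows nothing about output.
trait Sink {
    fn matched(&mut self, line_number: usize, line: &str);
}

#[derive(Default)]
struct Collect {
    lines: Vec<String>,
}

impl Sink for Collect {
    fn matched(&mut self, line_number: usize, line: &str) {
        self.lines.push(format!("{line_number}:{line}"));
    }
}

/// Generic over both the matching engine and the destination of results.
fn search_text<M: LineMatcher, S: Sink>(matcher: M, text: &str, sink: &mut S) {
    for (i, line) in text.lines().enumerate() {
        if matcher.is_match(line) {
            sink.matched(i + 1, line);
        }
    }
}
```

The blanket impl for references mirrors how a by-value `matcher: M` parameter can still be called with a borrowed matcher, as the worker does when it passes `m` from `ref m`.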


Section 10: The SearchResult Type

/// The result of executing a search.
///
/// Generally speaking, the "result" of a search is sent to a printer, which
/// writes results to an underlying writer such as stdout or a file. However,
/// every search also has some aggregate statistics or meta data that may be
/// useful to higher level routines.
#[derive(Clone, Debug, Default)]
pub(crate) struct SearchResult {
    has_match: bool,
    stats: Option<grep::printer::Stats>,
}

impl SearchResult {
    /// Whether the search found a match or not.
    pub(crate) fn has_match(&self) -> bool {
        self.has_match
    }

    /// Return aggregate search statistics for a single search, if available.
    ///
    /// It can be expensive to compute statistics, so these are only present
    /// if explicitly enabled in the printer provided by the caller.
    pub(crate) fn stats(&self) -> Option<&grep::printer::Stats> {
        self.stats.as_ref()
    }
}

Statistics are optional because computing them has performance cost. The caller decides whether to enable stats collection in the printer configuration.
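
One consequence of optional statistics is that a caller combining many results has to decide what "no stats" means. The sketch below is one way to do it; Stats, SearchOutcome and summarize are hypothetical and much smaller than grep::printer::Stats and SearchResult.

```rust
/// Hypothetical, reduced statistics record.
#[derive(Clone, Debug, Default, PartialEq)]
struct Stats {
    matches: u64,
}

/// Hypothetical mirror of `SearchResult` with public fields.
#[derive(Clone, Debug, Default)]
struct SearchOutcome {
    has_match: bool,
    stats: Option<Stats>,
}

/// Combine results: any match wins, and the total stays `None`
/// unless at least one result actually carried statistics.
fn summarize(results: &[SearchOutcome]) -> (bool, Option<u64>) {
    let any = results.iter().any(|r| r.has_match);
    let mut total = None;
    for s in results.iter().filter_map(|r| r.stats.as_ref()) {
        *total.get_or_insert(0) += s.matches;
    }
    (any, total)
}
```

Keeping the total as an Option preserves the difference between "zero matches counted" and "counting was never enabled".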


Quick Reference

Search Decision Flow

search(haystack)
    ├─ stdin? ──────────────► search_reader(stdin)
    ├─ preprocessor? ───────► search_preprocessor ──► search_reader
    ├─ compressed? ─────────► search_decompress ───► search_reader
    └─ regular file ────────► search_path (may use mmap)

Key Types

Type                  Purpose
Config                High-level search settings
SearchWorkerBuilder   Fluent API for worker construction
SearchWorker<W>       Executes searches, holds all components
PatternMatcher        Abstracts over regex engines
Printer<W>            Abstracts over output formats
SearchResult          Match status and optional statistics

Binary Detection Strategies

Strategy          Used For                            Typical Setting
binary_implicit   Files found via directory walking   quit (skip binary files)
binary_explicit   Files named by user                 none (search everything)

Trait Bounds

// Pattern matching abstraction
M: Matcher

// Colored output writer
W: WriteColor

// Generic readable input
R: io::Read