ripgrep crates/core/search.rs: Code Companion
Reference code for the Search Orchestration lecture. Sections correspond to the lecture document.
Section 1: The Configuration Foundation
/// The configuration for the search worker.
///
/// Among a few other things, the configuration primarily controls the way we
/// show search results to users at a very high level.
#[derive(Clone, Debug)]
struct Config {
    // External command to preprocess files before searching
    preprocessor: Option<std::path::PathBuf>,
    // Glob patterns to select which files get preprocessed
    preprocessor_globs: ignore::overrides::Override,
    // Whether to decompress common compressed files (gzip, xz, ...) before searching
    search_zip: bool,
    // Binary detection for files discovered during directory walking
    binary_implicit: grep::searcher::BinaryDetection,
    // Binary detection for files explicitly named by the user
    binary_explicit: grep::searcher::BinaryDetection,
}

impl Default for Config {
    fn default() -> Config {
        Config {
            preprocessor: None,
            preprocessor_globs: ignore::overrides::Override::empty(),
            search_zip: false,
            // By default, no binary detection - search everything
            binary_implicit: grep::searcher::BinaryDetection::none(),
            binary_explicit: grep::searcher::BinaryDetection::none(),
        }
    }
}
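The Override type from the ignore crate provides gitignore-style glob matching; for preprocessing, ripgrep builds it from the --pre-glob flag. The two binary detection fields allow different policies depending on how a file was discovered. Config::default() disables detection for both, and ripgrep's argument handling normally replaces these defaults through the builder.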
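To make the split between the two fields concrete, here is a minimal std-only model. `BinaryDetection` is a hypothetical enum standing in for grep::searcher::BinaryDetection (which exposes none(), quit(byte) and convert(byte)), and the quit/convert values assigned in main reflect the policies ripgrep's argument handling is assumed to install by default; that code lives outside search.rs.

```rust
// Hypothetical stand-in for grep::searcher::BinaryDetection.
#[derive(Clone, Debug, PartialEq)]
enum BinaryDetection {
    None,
    Quit(u8),
    Convert(u8),
}

#[derive(Clone, Debug)]
struct Config {
    binary_implicit: BinaryDetection,
    binary_explicit: BinaryDetection,
}

impl Default for Config {
    fn default() -> Config {
        // Mirrors Config::default(): no binary detection for either kind of file.
        Config {
            binary_implicit: BinaryDetection::None,
            binary_explicit: BinaryDetection::None,
        }
    }
}

impl Config {
    /// Pick the policy for one file, the same way SearchWorker::search does.
    fn policy_for(&self, is_explicit: bool) -> BinaryDetection {
        if is_explicit {
            self.binary_explicit.clone()
        } else {
            self.binary_implicit.clone()
        }
    }
}

fn main() {
    let mut config = Config::default();
    println!("default: {:?}", config.policy_for(true));

    // Assumed CLI-layer policies: quit on NUL when walking, convert when named.
    config.binary_implicit = BinaryDetection::Quit(b'\x00');
    config.binary_explicit = BinaryDetection::Convert(b'\x00');
    println!("walked: {:?}", config.policy_for(false));
    println!("named:  {:?}", config.policy_for(true));
}
```

Keeping the defaults "off" in the library-level Config means the worker never makes a policy decision on its own; the caller always supplies it.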
Section 2: The Builder Pattern in Action
/// A builder for configuring and constructing a search worker.
#[derive(Clone, Debug)]
pub(crate) struct SearchWorkerBuilder {
    config: Config,
    command_builder: grep::cli::CommandReaderBuilder,
}

impl SearchWorkerBuilder {
    /// Create a new builder for configuring and constructing a search worker.
    pub(crate) fn new() -> SearchWorkerBuilder {
        let mut command_builder = grep::cli::CommandReaderBuilder::new();
        // Read the child's stderr concurrently so a preprocessor that fills
        // its stderr pipe cannot block (and deadlock) while we read stdout
        command_builder.async_stderr(true);
        SearchWorkerBuilder { config: Config::default(), command_builder }
    }

    /// Create a new search worker using the given searcher, matcher and
    /// printer.
    pub(crate) fn build<W: WriteColor>(
        &self,
        matcher: PatternMatcher,
        searcher: grep::searcher::Searcher,
        printer: Printer<W>,
    ) -> SearchWorker<W> {
        let config = self.config.clone();
        let command_builder = self.command_builder.clone();
        // Only create decomp_builder when search_zip is enabled
        // This avoids work like resolving decompression binary paths
        let decomp_builder = config.search_zip.then(|| {
            let mut decomp_builder =
                grep::cli::DecompressionReaderBuilder::new();
            decomp_builder.async_stderr(true);
            decomp_builder
        });
        SearchWorker {
            config,
            command_builder,
            decomp_builder,
            matcher,
            searcher,
            printer,
        }
    }
}
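bool::then is a concise way to conditionally create an Option<T>: it returns Some(f()) when the bool is true and None otherwise. Because the closure runs only in the true case, no DecompressionReaderBuilder is constructed when search_zip is off. Its eager sibling, bool::then_some, evaluates its argument unconditionally.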
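The laziness of bool::then is what makes it a good fit here. The sketch below counts constructions of a hypothetical `DecompBuilder` (a stand-in for the real, potentially expensive builder) to show that the closure never runs when the flag is false, while then_some builds its value regardless.

```rust
use std::cell::Cell;

// Hypothetical stand-in for an expensive builder.
struct DecompBuilder;

fn new_decomp_builder(constructed: &Cell<u32>) -> DecompBuilder {
    constructed.set(constructed.get() + 1);
    DecompBuilder
}

fn build(search_zip: bool, constructed: &Cell<u32>) -> Option<DecompBuilder> {
    // The closure runs only when search_zip is true.
    search_zip.then(|| new_decomp_builder(constructed))
}

fn main() {
    let constructed = Cell::new(0);
    assert!(build(false, &constructed).is_none());
    assert_eq!(constructed.get(), 0); // closure never ran
    assert!(build(true, &constructed).is_some());
    assert_eq!(constructed.get(), 1);

    // then_some is eager: the builder is created even though the bool is false.
    let _unused = false.then_some(new_decomp_builder(&constructed));
    assert_eq!(constructed.get(), 2);
    println!("builders constructed: {}", constructed.get());
}
```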
Section 3: Configuring the Preprocessor Pipeline
/// Set the path to a preprocessor command.
///
/// When this is set, instead of searching files directly, the given
/// command will be run with the file path as the first argument, and the
/// output of that command will be searched instead.
pub(crate) fn preprocessor(
    &mut self,
    cmd: Option<std::path::PathBuf>,
) -> anyhow::Result<&mut SearchWorkerBuilder> {
    if let Some(ref prog) = cmd {
        // On Windows, resolve the program through PATH now so a missing
        // binary is reported up front; elsewhere the path is kept as given
        let bin = grep::cli::resolve_binary(prog)?;
        self.config.preprocessor = Some(bin);
    } else {
        self.config.preprocessor = None;
    }
    // Returning a Result lets resolution errors surface while the builder
    // is being configured, before any search runs
    Ok(self)
}

/// Set the globs for determining which files should be run through the
/// preprocessor. By default, with no globs and a preprocessor specified,
/// every file is run through the preprocessor.
pub(crate) fn preprocessor_globs(
    &mut self,
    globs: ignore::overrides::Override,
) -> &mut SearchWorkerBuilder {
    self.config.preprocessor_globs = globs;
    self
}

// Note: unlike the two setters above, this method is defined on
// SearchWorker, where it is consulted once per searched file
/// Returns true if and only if the given file path should be run through
/// the preprocessor.
fn should_preprocess(&self, path: &Path) -> bool {
    if !self.config.preprocessor.is_some() {
        return false;
    }
    if self.config.preprocessor_globs.is_empty() {
        // No globs means preprocess everything
        return true;
    }
    // Preprocess unless the overrides classify the path as ignored: it
    // matches a `!` glob, or whitelist globs exist and none of them match
    !self.config.preprocessor_globs.matched(path, false).is_ignore()
}
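On Windows, resolve_binary searches PATH itself (also trying common executable extensions such as .exe) instead of handing a relative name to the OS, which could otherwise pick up an executable planted in the current directory of the tree being searched. On other platforms it returns the path unchanged.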
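The three-step decision in should_preprocess can be modeled without the ignore crate. In this sketch, `Overrides` and `Match` are hypothetical, heavily simplified stand-ins for ignore::overrides::Override: each glob is only a file-name suffix such as `*.pdf`, a leading `!` negates it, later globs win, and a file matching no glob counts as ignored whenever at least one non-negated glob exists.

```rust
use std::path::{Path, PathBuf};

// Hypothetical, simplified model of the override match result.
#[derive(Debug, PartialEq)]
enum Match {
    None,
    Ignore,
    Whitelist,
}

// Hypothetical stand-in for ignore::overrides::Override (suffix globs only).
struct Overrides {
    globs: Vec<String>,
}

impl Overrides {
    fn is_empty(&self) -> bool {
        self.globs.is_empty()
    }

    fn matched(&self, path: &Path) -> Match {
        if self.is_empty() {
            return Match::None;
        }
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        let mut result = Match::None;
        let mut whitelists = 0;
        for glob in &self.globs {
            let (negated, pat) = match glob.strip_prefix('!') {
                Some(rest) => (true, rest),
                None => (false, glob.as_str()),
            };
            if !negated {
                whitelists += 1;
            }
            // Later globs take precedence, as in gitignore files.
            if name.ends_with(pat.trim_start_matches('*')) {
                result = if negated { Match::Ignore } else { Match::Whitelist };
            }
        }
        // With at least one whitelist glob, unmatched files count as ignored.
        if result == Match::None && whitelists > 0 {
            return Match::Ignore;
        }
        result
    }
}

fn should_preprocess(pre: &Option<PathBuf>, globs: &Overrides, path: &Path) -> bool {
    if pre.is_none() {
        return false;
    }
    if globs.is_empty() {
        return true;
    }
    globs.matched(path) != Match::Ignore
}

fn main() {
    let pre = Some(PathBuf::from("pdftotext"));
    let globs = Overrides { globs: vec!["*.pdf".to_string()] };
    for name in ["docs/report.pdf", "docs/notes.txt"] {
        println!("{name}: {}", should_preprocess(&pre, &globs, Path::new(name)));
    }
}
```

With only negated globs (say `!*.txt`), unmatched files fall through as Match::None and are still preprocessed, which matches the "preprocess unless ignored" reading of the original check.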
Section 4: The PatternMatcher and Printer Enums
/// The pattern matcher used by a search worker.
#[derive(Clone, Debug)]
pub(crate) enum PatternMatcher {
    RustRegex(grep::regex::RegexMatcher),
    // Only included when pcre2 feature is enabled
    #[cfg(feature = "pcre2")]
    PCRE2(grep::pcre2::RegexMatcher),
}

/// The printer used by a search worker.
///
/// The `W` type parameter refers to the type of the underlying writer.
#[derive(Clone, Debug)]
pub(crate) enum Printer<W> {
    /// Use the standard printer, which supports the classic grep-like format.
    Standard(grep::printer::Standard<W>),
    /// Use the summary printer, which supports aggregate displays of search
    /// results.
    Summary(grep::printer::Summary<W>),
    /// A JSON printer, which emits results in the JSON Lines format.
    JSON(grep::printer::JSON<W>),
}

impl<W: WriteColor> Printer<W> {
    /// Return a mutable reference to the underlying printer's writer.
    pub(crate) fn get_mut(&mut self) -> &mut W {
        match *self {
            Printer::Standard(ref mut p) => p.get_mut(),
            Printer::Summary(ref mut p) => p.get_mut(),
            Printer::JSON(ref mut p) => p.get_mut(),
        }
    }
}
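The WriteColor trait (from termcolor) extends io::Write with methods for setting colors. Implementations that cannot or should not emit color, such as termcolor's NoColor wrapper, simply ignore those calls, so the same printer types can write to a terminal, a file, or an in-memory test buffer.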
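The enum-dispatch shape of Printer::get_mut can be reproduced in plain std Rust. In this sketch, `Standard` and `Json` are hypothetical stand-ins that merely own a writer, and std::io::Write replaces termcolor's WriteColor so the example has no dependencies.

```rust
use std::io::{self, Write};

// Hypothetical stand-ins for the real printer types: each just owns a writer.
struct Standard<W> {
    wtr: W,
}

struct Json<W> {
    wtr: W,
}

impl<W> Standard<W> {
    fn get_mut(&mut self) -> &mut W {
        &mut self.wtr
    }
}

impl<W> Json<W> {
    fn get_mut(&mut self) -> &mut W {
        &mut self.wtr
    }
}

enum Printer<W> {
    Standard(Standard<W>),
    Json(Json<W>),
}

impl<W: Write> Printer<W> {
    /// Forward to whichever printer variant is active.
    fn get_mut(&mut self) -> &mut W {
        match *self {
            Printer::Standard(ref mut p) => p.get_mut(),
            Printer::Json(ref mut p) => p.get_mut(),
        }
    }
}

fn main() -> io::Result<()> {
    // A Vec<u8> stands in for stdout, a file, or a test buffer.
    let mut standard = Printer::Standard(Standard { wtr: Vec::new() });
    writeln!(standard.get_mut(), "src/main.rs:1:fn main() {{")?;
    let mut json = Printer::Json(Json { wtr: Vec::new() });
    json.get_mut().write_all(b"{\"type\":\"begin\"}\n")?;
    println!("{} + {} bytes buffered", standard.get_mut().len(), json.get_mut().len());
    Ok(())
}
```

An enum keeps the worker a single concrete type regardless of output format, at the cost of one match per call; a trait object would be the alternative.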
Section 5: The SearchWorker Structure
/// A worker for executing searches.
///
/// It is intended for a single worker to execute many searches, and is
/// generally intended to be used from a single thread. When searching using
/// multiple threads, it is better to create a new worker for each thread.
#[derive(Clone, Debug)]
pub(crate) struct SearchWorker<W> {
    config: Config,
    command_builder: grep::cli::CommandReaderBuilder,
    /// This is `None` when `search_zip` is not enabled, since in this case it
    /// can never be used. We do this because building the reader can sometimes
    /// do non-trivial work (like resolving the paths of decompression binaries
    /// on Windows).
    decomp_builder: Option<grep::cli::DecompressionReaderBuilder>,
    matcher: PatternMatcher,
    searcher: grep::searcher::Searcher,
    printer: Printer<W>,
}
The Clone derive enables creating per-thread worker instances for parallel search. Each thread gets its own worker to avoid contention.
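The clone-per-thread pattern can be sketched with std::thread::scope. Here `Worker` is a hypothetical stand-in that owns a reusable buffer (so searching needs `&mut self`), and files are split into fixed chunks rather than pulled from a parallel directory walker as ripgrep does.

```rust
use std::thread;

// Hypothetical worker: it mutates its own buffer on every search, so sharing
// one instance across threads would need a lock.
#[derive(Clone, Debug)]
struct Worker {
    pattern: String,
    buf: Vec<String>,
}

impl Worker {
    /// Count lines containing the pattern, reusing the worker's buffer.
    fn search(&mut self, haystack: &str) -> usize {
        let pattern = &self.pattern;
        self.buf.clear();
        self.buf.extend(
            haystack.lines().filter(|l| l.contains(pattern.as_str())).map(str::to_owned),
        );
        self.buf.len()
    }
}

fn search_parallel(proto: &Worker, files: &[&str]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(2)
            .map(|chunk| {
                // Each thread gets its own clone: no shared mutable state.
                let mut worker = proto.clone();
                s.spawn(move || chunk.iter().map(|h| worker.search(h)).sum::<usize>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let proto = Worker { pattern: "fn".to_string(), buf: Vec::new() };
    let files = ["fn a() {}\nlet x = 1;", "struct S;", "fn b() {}\nfn c() {}"];
    println!("matching lines: {}", search_parallel(&proto, &files));
}
```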
Section 6: The Main Search Entry Point
impl<W: WriteColor> SearchWorker<W> {
    /// Execute a search over the given haystack.
    pub(crate) fn search(
        &mut self,
        haystack: &crate::haystack::Haystack,
    ) -> io::Result<SearchResult> {
        // Select binary detection based on how the file was discovered
        let bin = if haystack.is_explicit() {
            self.config.binary_explicit.clone()
        } else {
            self.config.binary_implicit.clone()
        };
        let path = haystack.path();
        log::trace!("{}: binary detection: {:?}", path.display(), bin);
        // Configure the searcher with the chosen detection strategy
        self.searcher.set_binary_detection(bin);
        // Decision tree: how should we read this file?
        if haystack.is_stdin() {
            self.search_reader(path, &mut io::stdin().lock())
        } else if self.should_preprocess(path) {
            self.search_preprocessor(path)
        } else if self.should_decompress(path) {
            self.search_decompress(path)
        } else {
            self.search_path(path)
        }
    }

    /// Returns true if and only if the given file path should be
    /// decompressed before searching.
    fn should_decompress(&self, path: &Path) -> bool {
        self.decomp_builder.as_ref().is_some_and(|decomp_builder| {
            decomp_builder.get_matcher().has_command(path)
        })
    }
}
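Option::is_some_and (stable since Rust 1.70) combines the is_some check and a predicate in one call, with no unwrapping. Because it takes the Option by value, the code calls as_ref() first, so the predicate borrows the builder instead of moving it out of self.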
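Both the dispatch order of search() and the as_ref().is_some_and(...) idiom fit in a small std-only model. `Worker`, `Strategy`, and the extension list in `decomp_exts` are hypothetical: the real decompression matcher maps glob patterns to commands, and the real preprocessor check also consults globs, which are omitted here for brevity.

```rust
use std::path::Path;

// Hypothetical stand-in for the worker's configuration.
struct Worker {
    preprocessor: Option<String>,
    // Models decomp_builder: Some(extensions) only when search_zip is on.
    decomp_exts: Option<Vec<&'static str>>,
}

#[derive(Debug, PartialEq)]
enum Strategy {
    Stdin,
    Preprocess,
    Decompress,
    Direct,
}

impl Worker {
    fn should_decompress(&self, path: &Path) -> bool {
        // as_ref() turns &Option<Vec<_>> into Option<&Vec<_>>, so is_some_and
        // consumes only a borrow.
        self.decomp_exts.as_ref().is_some_and(|exts| {
            path.extension()
                .and_then(|e| e.to_str())
                .is_some_and(|e| exts.iter().any(|x| *x == e))
        })
    }

    // Same order as SearchWorker::search: stdin, preprocessor, decompression, file.
    fn strategy(&self, is_stdin: bool, path: &Path) -> Strategy {
        if is_stdin {
            Strategy::Stdin
        } else if self.preprocessor.is_some() {
            Strategy::Preprocess
        } else if self.should_decompress(path) {
            Strategy::Decompress
        } else {
            Strategy::Direct
        }
    }
}

fn main() {
    let w = Worker { preprocessor: None, decomp_exts: Some(vec!["gz", "xz"]) };
    for name in ["app.log.gz", "app.log"] {
        println!("{name}: {:?}", w.strategy(false, Path::new(name)));
    }
}
```

The order matters: a configured preprocessor takes priority over decompression, so a `.gz` file matched by the preprocessor globs is handed to the preprocessor as-is.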
Section 7: The Preprocessor Search Path
/// Search the given file path by first asking the preprocessor for the
/// data to search instead of opening the path directly.
fn search_preprocessor(
    &mut self,
    path: &Path,
) -> io::Result<SearchResult> {
    use std::{fs::File, process::Stdio};

    let bin = self.config.preprocessor.as_ref().unwrap();
    let mut cmd = std::process::Command::new(bin);
    // Pass file path as argument, also provide file content via stdin
    cmd.arg(path).stdin(Stdio::from(File::open(path)?));
    // Build a reader that captures stdout from the command
    let mut rdr = self.command_builder.build(&mut cmd).map_err(|err| {
        io::Error::new(
            io::ErrorKind::Other,
            format!(
                "preprocessor command could not start: '{cmd:?}': {err}",
            ),
        )
    })?;
    // Search the command's output, not the original file
    let result = self.search_reader(path, &mut rdr).map_err(|err| {
        io::Error::new(
            io::ErrorKind::Other,
            format!("preprocessor command failed: '{cmd:?}': {err}"),
        )
    });
    // Important: close the reader to wait for the process to exit
    let close_result = rdr.close();
    let search_result = result?;
    close_result?;
    Ok(search_result)
}
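The error handling wraps the original errors with context naming the command that failed. Note the ordering: close() runs before `result?` can return early, so the child process is reaped even when the search itself failed, and a search error takes precedence over a close error. Closing explicitly also matters because the search may stop before reading the command's output to EOF (for example, once a match is found in quiet mode), and close() reports a nonzero exit status as an error that includes the child's stderr.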
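The "search, always close, then propagate" ordering can be checked in isolation. `FakeCommandReader` and `search_then_close` are hypothetical names: the fake reader only records that close() was called and reports a failing exit status as an io::Error, loosely imitating grep::cli::CommandReader.

```rust
use std::io;

// Hypothetical stand-in for grep::cli::CommandReader.
struct FakeCommandReader {
    exit_ok: bool,
    closed: bool,
}

impl FakeCommandReader {
    fn close(&mut self) -> io::Result<()> {
        self.closed = true;
        if self.exit_ok {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::Other, "child exited with error"))
        }
    }
}

// Same ordering as search_preprocessor/search_decompress: close first,
// then let a search error win over a close error.
fn search_then_close<F>(rdr: &mut FakeCommandReader, search: F) -> io::Result<u64>
where
    F: FnOnce(&mut FakeCommandReader) -> io::Result<u64>,
{
    let result = search(rdr);
    let close_result = rdr.close();
    let search_result = result?;
    close_result?;
    Ok(search_result)
}

fn main() {
    let mut rdr = FakeCommandReader { exit_ok: false, closed: false };
    let err = search_then_close(&mut rdr, |_| {
        Err(io::Error::new(io::ErrorKind::Other, "search failed"))
    })
    .unwrap_err();
    println!("error: {err}; reader closed: {}", rdr.closed);
}
```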
Section 8: Decompression and Direct File Search
/// Attempt to decompress the data at the given file path and search the
/// result. If the given file path isn't recognized as a compressed file,
/// then search it without doing any decompression.
fn search_decompress(&mut self, path: &Path) -> io::Result<SearchResult> {
    let Some(ref decomp_builder) = self.decomp_builder else {
        // No decompression builder - fall back to direct search
        return self.search_path(path);
    };
    let mut rdr = decomp_builder.build(path)?;
    let result = self.search_reader(path, &mut rdr);
    let close_result = rdr.close();
    let search_result = result?;
    close_result?;
    Ok(search_result)
}

/// Search the contents of the given file path.
fn search_path(&mut self, path: &Path) -> io::Result<SearchResult> {
    use self::PatternMatcher::*;

    let (searcher, printer) = (&mut self.searcher, &mut self.printer);
    // Dispatch to the appropriate regex engine
    match self.matcher {
        RustRegex(ref m) => search_path(m, searcher, printer, path),
        #[cfg(feature = "pcre2")]
        PCRE2(ref m) => search_path(m, searcher, printer, path),
    }
}
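The let-else pattern (`let Some(...) = ... else { return ...; };`) gives a clean early return: the else block must diverge (return, break, continue, or panic), and it works with any refutable pattern, not only Option. Here it is defensive, since search() only calls search_decompress after should_decompress returned true, which requires decomp_builder to be Some.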
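A small std-only model of the fallback in search_decompress, plus a non-Option use of let-else. `Worker`, its `decomp` field, and `parse_count` are hypothetical names chosen for illustration; let-else requires Rust 1.65 or later.

```rust
use std::path::Path;

// Hypothetical worker: `decomp` is Some only when decompression is enabled.
struct Worker {
    decomp: Option<String>,
}

impl Worker {
    fn search_path(&self, path: &Path) -> String {
        format!("direct:{}", path.display())
    }

    fn search_decompress(&self, path: &Path) -> String {
        // Bind on a match; otherwise run the diverging else block.
        let Some(ref tool) = self.decomp else {
            return self.search_path(path);
        };
        format!("{tool}:{}", path.display())
    }
}

// let-else also works with other refutable patterns, such as Ok(_).
fn parse_count(s: &str) -> u32 {
    let Ok(n) = s.trim().parse::<u32>() else {
        return 0;
    };
    n
}

fn main() {
    let on = Worker { decomp: Some("gzip".to_string()) };
    let off = Worker { decomp: None };
    println!("{}", on.search_decompress(Path::new("a.gz")));
    println!("{}", off.search_decompress(Path::new("a.gz")));
    println!("{}", parse_count(" 42 "));
}
```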
Section 9: The Generic Search Functions
/// Search the contents of the given file path using the given matcher,
/// searcher and printer.
fn search_path<M: Matcher, W: WriteColor>(
    matcher: M,
    searcher: &mut grep::searcher::Searcher,
    printer: &mut Printer<W>,
    path: &Path,
) -> io::Result<SearchResult> {
    match *printer {
        Printer::Standard(ref mut p) => {
            // Create a sink that connects the printer to the path
            let mut sink = p.sink_with_path(&matcher, path);
            // Execute the search, writing results to the sink
            searcher.search_path(&matcher, path, &mut sink)?;
            Ok(SearchResult {
                has_match: sink.has_match(),
                stats: sink.stats().map(|s| s.clone()),
            })
        }
        Printer::Summary(ref mut p) => {
            let mut sink = p.sink_with_path(&matcher, path);
            searcher.search_path(&matcher, path, &mut sink)?;
            Ok(SearchResult {
                has_match: sink.has_match(),
                stats: sink.stats().map(|s| s.clone()),
            })
        }
        Printer::JSON(ref mut p) => {
            let mut sink = p.sink_with_path(&matcher, path);
            searcher.search_path(&matcher, path, &mut sink)?;
            Ok(SearchResult {
                has_match: sink.has_match(),
                // JSON printer always has stats
                stats: Some(sink.stats().clone()),
            })
        }
    }
}
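The Matcher trait bound (from the grep-matcher crate, re-exported as grep::matcher) lets one function body serve every regex engine that implements the matching interface. Inside the SearchWorker::search_path method, the unqualified call `search_path(m, ...)` resolves to this free function, because methods are only reachable through `self`. Each printer's sink_with_path returns a sink, an implementation of the searcher's Sink trait, which the searcher feeds results into; this is how the searcher and printer cooperate without depending on each other.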
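The matcher/sink split can be shown in miniature. The `Matcher` and `Sink` traits below are hypothetical, much smaller stand-ins for grep_matcher::Matcher and grep_searcher::Sink. The blanket impl for `&M` is what lets a caller pass a reference (as search_path does with `ref m`) and still satisfy the `M: Matcher` bound.

```rust
// Hypothetical, minimal matcher trait.
trait Matcher {
    fn is_match(&self, line: &str) -> bool;
}

// Blanket impl so that `&M` is itself a Matcher.
impl<M: Matcher + ?Sized> Matcher for &M {
    fn is_match(&self, line: &str) -> bool {
        (**self).is_match(line)
    }
}

struct Substring(String);

impl Matcher for Substring {
    fn is_match(&self, line: &str) -> bool {
        line.contains(self.0.as_str())
    }
}

// Hypothetical sink: the searcher reports matches, the sink decides what to do.
trait Sink {
    fn matched(&mut self, line_number: usize, line: &str);
}

#[derive(Default)]
struct CountingSink {
    matches: u64,
    first: Option<usize>,
}

impl Sink for CountingSink {
    fn matched(&mut self, line_number: usize, _line: &str) {
        self.matches += 1;
        self.first.get_or_insert(line_number);
    }
}

// Generic over both: one body works for any matcher and any sink.
fn search<M: Matcher, S: Sink>(matcher: M, haystack: &str, sink: &mut S) {
    for (i, line) in haystack.lines().enumerate() {
        if matcher.is_match(line) {
            sink.matched(i + 1, line);
        }
    }
}

fn main() {
    let matcher = Substring("TODO".to_string());
    let mut sink = CountingSink::default();
    search(&matcher, "a\n// TODO: x\nb\n// TODO: y", &mut sink);
    println!("{} matches, first on line {:?}", sink.matches, sink.first);
}
```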
Section 10: The SearchResult Type
/// The result of executing a search.
///
/// Generally speaking, the "result" of a search is sent to a printer, which
/// writes results to an underlying writer such as stdout or a file. However,
/// every search also has some aggregate statistics or meta data that may be
/// useful to higher level routines.
#[derive(Clone, Debug, Default)]
pub(crate) struct SearchResult {
    has_match: bool,
    stats: Option<grep::printer::Stats>,
}

impl SearchResult {
    /// Whether the search found a match or not.
    pub(crate) fn has_match(&self) -> bool {
        self.has_match
    }

    /// Return aggregate search statistics for a single search, if available.
    ///
    /// It can be expensive to compute statistics, so these are only present
    /// if explicitly enabled in the printer provided by the caller.
    pub(crate) fn stats(&self) -> Option<&grep::printer::Stats> {
        self.stats.as_ref()
    }
}
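Statistics are optional because computing them adds work to every search. For the standard and summary printers, the caller opts in through the printer configuration; the JSON printer always computes them, which is why search_path wraps its stats in Some unconditionally.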
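A caller that searches many files typically folds the per-file results into a total. This sketch uses a hypothetical single-counter `Stats` type and a `summarize` helper (neither is ripgrep's API) to show how optional stats combine: the overall match flag is an OR, and stats are summed only when at least one result carries them.

```rust
use std::ops::AddAssign;

// Hypothetical, single-counter mirror of grep::printer::Stats.
#[derive(Clone, Debug, Default, PartialEq)]
struct Stats {
    matches: u64,
}

impl AddAssign<&Stats> for Stats {
    fn add_assign(&mut self, rhs: &Stats) {
        self.matches += rhs.matches;
    }
}

#[derive(Clone, Debug, Default)]
struct SearchResult {
    has_match: bool,
    stats: Option<Stats>,
}

impl SearchResult {
    fn stats(&self) -> Option<&Stats> {
        self.stats.as_ref()
    }
}

/// Fold per-file results: OR the match flags, sum whatever stats exist.
fn summarize(results: &[SearchResult]) -> (bool, Option<Stats>) {
    let any_match = results.iter().any(|r| r.has_match);
    let mut total: Option<Stats> = None;
    for stats in results.iter().filter_map(SearchResult::stats) {
        *total.get_or_insert_with(Stats::default) += stats;
    }
    (any_match, total)
}

fn main() {
    let results = vec![
        SearchResult { has_match: true, stats: Some(Stats { matches: 2 }) },
        SearchResult { has_match: false, stats: Some(Stats { matches: 0 }) },
    ];
    println!("{:?}", summarize(&results));
}
```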
Quick Reference
Search Decision Flow
search(haystack)
│
├─ stdin? ──────────────► search_reader(stdin)
│
├─ preprocessor? ───────► search_preprocessor ──► search_reader
│
├─ compressed? ─────────► search_decompress ───► search_reader
│
└─ regular file ────────► search_path (may use mmap)
Key Types
| Type | Purpose |
|---|---|
| Config | High-level search settings |
| SearchWorkerBuilder | Fluent API for worker construction |
| SearchWorker<W> | Executes searches, holds all components |
| PatternMatcher | Abstracts over regex engines |
| Printer<W> | Abstracts over output formats |
| SearchResult | Match status and optional statistics |
Binary Detection Strategies
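| Field | Used For | Typical Setting |
|---|---|---|
| binary_implicit | Files found via directory walking | quit (stop searching a file at the first NUL byte, effectively skipping binary files) |
| binary_explicit | Files named by the user | convert (keep searching, treating NUL bytes as line terminators) |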
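The three detection modes differ in what bytes the searcher ends up seeing. This hypothetical, whole-buffer model (`apply` is not a ripgrep function) approximates the effect: the real searcher works incrementally over buffers, reports where binary data was found, and the printer adds its own warnings. With -a/--text, ripgrep switches both fields to none.

```rust
// Hypothetical stand-in for grep::searcher::BinaryDetection.
#[derive(Clone, Copy, Debug)]
enum BinaryDetection {
    None,
    Quit(u8),
    Convert(u8),
}

/// Returns the bytes that would be searched and whether binary data was seen.
fn apply(mode: BinaryDetection, haystack: &[u8]) -> (Vec<u8>, bool) {
    match mode {
        // No detection: search the bytes unchanged.
        BinaryDetection::None => (haystack.to_vec(), false),
        // Quit: stop at the first binary byte, as if EOF were reached.
        BinaryDetection::Quit(byte) => match haystack.iter().position(|&b| b == byte) {
            Some(i) => (haystack[..i].to_vec(), true),
            None => (haystack.to_vec(), false),
        },
        // Convert: keep going, replacing each binary byte with a line terminator.
        BinaryDetection::Convert(byte) => {
            let found = haystack.contains(&byte);
            let converted = haystack
                .iter()
                .map(|&b| if b == byte { b'\n' } else { b })
                .collect();
            (converted, found)
        }
    }
}

fn main() {
    let data = b"header\x00needle";
    for mode in [BinaryDetection::None, BinaryDetection::Quit(0), BinaryDetection::Convert(0)] {
        let (bytes, binary) = apply(mode, data);
        println!("{mode:?}: {:?} (binary: {binary})", String::from_utf8_lossy(&bytes));
    }
}
```

Under quit, a match that only appears after the NUL byte ("needle" above) is never seen, which is why that mode suits files nobody asked for by name.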