ripgrep crates/searcher/src/lib.rs: Code Companion

Reference code for the Searcher Overview lecture. Sections correspond to the lecture document.


Section 1: The Crate Documentation as Architecture Document

/*!
This crate provides an implementation of line oriented search, with optional
support for multi-line search.

# Brief overview

The principal type in this crate is a [`Searcher`], which can be configured
and built by a [`SearcherBuilder`]. A `Searcher` is responsible for reading
bytes from a source (e.g., a file), executing a search of those bytes using
a `Matcher` (e.g., a regex) and then reporting the results of that search to
a [`Sink`] (e.g., stdout). The `Searcher` itself is principally responsible
for managing the consumption of bytes from a source and applying a `Matcher`
over those bytes in an efficient way. The `Searcher` is also responsible for
inverting a search, counting lines, reporting contextual lines, detecting
binary data and even deciding whether or not to use memory maps.
*/

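The /*! ... */ block is an inner doc comment, the block form of //!. At the top of lib.rs it documents the crate root itself rather than the item that follows it. The [`Searcher`] syntax is an intra-doc link, which rustdoc resolves to the type in scope (here, the re-exported Searcher).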
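A minimal sketch of the two inner-doc-comment forms side by side. The inline modules line_form and block_form and their Widget types are hypothetical, used only so the example compiles as one file; in lib.rs the comment sits at the top of the file and documents the crate root instead of a module.

```rust
// Hypothetical inline modules. An inner doc comment documents the item that
// encloses it: here each module, in lib.rs the crate root.
pub mod line_form {
    //! Documented with the line form of an inner doc comment.
    //! [`Widget`] becomes a link to the type below in generated docs.

    /// A hypothetical type that serves as an intra-doc link target.
    pub struct Widget;
}

pub mod block_form {
    /*!
    Documented with the block form, the style lib.rs uses.
    [`Widget`] links to this module's Widget as well.
    */

    /// A second hypothetical link target.
    pub struct Widget;
}
```

Both forms produce identical rustdoc output; the block form is simply easier to use for long crate-level prose.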


Section 2: Public API Design Through Re-exports

pub use crate::{
    // Line iteration utilities from the lines module
    lines::{LineIter, LineStep},

    // Core searcher types and configuration
    searcher::{
        BinaryDetection,    // How to handle binary files
        ConfigError,        // Configuration validation errors
        Encoding,           // Text encoding settings
        MmapChoice,         // Memory-mapped file options
        Searcher,           // The main search executor
        SearcherBuilder,    // Builder for configuring Searcher
    },

    // Result handling types
    sink::{
        Sink,              // Trait for receiving search results
        SinkContext,       // Context lines around matches
        SinkContextKind,   // Before/after context distinction
        SinkError,         // Error type for sink operations
        SinkFinish,        // End-of-search summary data
        SinkMatch,         // Individual match data
        sinks,             // Module of convenience implementations
    },
};

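The crate:: prefix refers to the current crate's root. Because the modules themselves are private (see Section 7), these re-exports are the only public paths to the types: users import them directly from grep_searcher:: without knowing the internal module structure.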
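A single-file sketch of the same facade pattern, assuming a hypothetical inline module toy_searcher that stands in for the crate root. Inside an inline module the path prefix is self:: rather than crate::, because in this sketch the file itself is the real crate root.

```rust
// `toy_searcher` is a hypothetical stand-in for the grep_searcher crate root.
mod toy_searcher {
    // Private module: from outside `toy_searcher`, the path
    // `toy_searcher::searcher::Searcher` is a privacy error.
    mod searcher {
        /// Hypothetical searcher type living in an internal module.
        pub struct Searcher {
            pub line_number: bool,
        }

        impl Searcher {
            pub fn new() -> Searcher {
                Searcher { line_number: true }
            }
        }
    }

    // The re-export is the only public path to the type.
    pub use self::searcher::Searcher;
}

// Callers use the flat path, just as users write `grep_searcher::Searcher`.
use toy_searcher::Searcher;

fn make_searcher() -> Searcher {
    Searcher::new()
}
```

Moving Searcher to a different internal module would only require updating the pub use line; callers would not notice.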


Section 3: The Builder Pattern Ecosystem

pub use crate::{
    searcher::{
        // The builder creates configured Searcher instances
        SearcherBuilder,
        // The built type with all configuration applied
        Searcher,
    },
    // ...
};

The builder pattern pair is visible here: SearcherBuilder constructs Searcher. This mirrors RegexMatcherBuilder/RegexMatcher in the grep-regex crate (and RegexBuilder/Regex in the regex crate).
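A minimal sketch of a builder/built-type pair. ToySearcherBuilder, ToySearcher, and their two options are hypothetical and much simpler than the real types. The setters take &mut self and return &mut Self, a common Rust builder style that lets calls chain, and build(&self) leaves the builder reusable.

```rust
// Hypothetical configuration shared by the builder and the built type.
#[derive(Clone, Debug)]
struct ToyConfig {
    line_number: bool,
    invert_match: bool,
}

#[derive(Clone, Debug)]
struct ToySearcherBuilder {
    config: ToyConfig,
}

#[derive(Debug)]
struct ToySearcher {
    config: ToyConfig,
}

impl ToySearcherBuilder {
    fn new() -> ToySearcherBuilder {
        // Defaults chosen for illustration only.
        ToySearcherBuilder {
            config: ToyConfig { line_number: true, invert_match: false },
        }
    }

    // Each setter mutates the builder and returns it so calls can chain.
    fn line_number(&mut self, yes: bool) -> &mut ToySearcherBuilder {
        self.config.line_number = yes;
        self
    }

    fn invert_match(&mut self, yes: bool) -> &mut ToySearcherBuilder {
        self.config.invert_match = yes;
        self
    }

    // `build` only borrows, so one builder can produce many searchers.
    fn build(&self) -> ToySearcher {
        ToySearcher { config: self.config.clone() }
    }
}
```

Usage: ToySearcherBuilder::new().line_number(false).invert_match(true).build() yields a ToySearcher whose configuration is fixed once built.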


Section 4: The Sink Abstraction

pub use crate::{
    sink::{
        Sink,              // The core trait - defines callback methods
        SinkContext,       // Data passed for context lines
        SinkContextKind,   // Enum: Before, After, or Other
        SinkError,         // Errors that can occur in sink callbacks
        SinkFinish,        // Summary data at search completion
        SinkMatch,         // Data passed for each matching line
        sinks,             // Convenience implementations
    },
};

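The supporting types (SinkContext, SinkMatch, etc.) carry data to the Sink trait's callback methods. This is a push-based streaming API: the Searcher calls into the sink as each match or context line is found, instead of returning a collection for the caller to pull results from.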
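To make the push-based shape concrete, here is a sketch with a hypothetical, much-reduced ToySink trait (one callback instead of the real Sink's several) and a plain substring test standing in for a Matcher. The driver calls the sink for each matching line as soon as it finds it; no result collection is ever built.

```rust
// Hypothetical, simplified sink trait: the driver calls `matched` once per
// matching line. The real `Sink` trait has more callbacks and richer types.
trait ToySink {
    type Error;

    // Return Ok(false) to ask the driver to stop early.
    fn matched(&mut self, line_number: u64, line: &[u8]) -> Result<bool, Self::Error>;
}

// A sink that only counts matches and stores nothing else.
struct CountSink {
    count: u64,
}

impl ToySink for CountSink {
    type Error = std::convert::Infallible;

    fn matched(&mut self, _line_number: u64, _line: &[u8]) -> Result<bool, Self::Error> {
        self.count += 1;
        Ok(true)
    }
}

// Push-based driver: results flow into the sink as they are found.
// `needle` is a substring test standing in for a real Matcher.
fn toy_search<S: ToySink>(haystack: &[u8], needle: &[u8], sink: &mut S) -> Result<(), S::Error> {
    for (i, line) in haystack.split(|&b| b == b'\n').enumerate() {
        // Guard against an empty needle: `windows(0)` would panic.
        let is_match = !needle.is_empty()
            && line.windows(needle.len()).any(|w| w == needle);
        if is_match && !sink.matched(i as u64 + 1, line)? {
            break;
        }
    }
    Ok(())
}
```

Because the sink decides what to keep, the same driver can count, print, or collect without changing.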


Section 5: Convenience Sinks and Closures

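// From the documentation example: using the UTF8 convenience sink.
// `matcher` and `SHERLOCK` are defined as in Section 6, and this code runs
// inside a function that returns a Result, so `?` can propagate errors.
use grep_searcher::{sinks::UTF8, Searcher};

// Closure signature: FnMut(u64, &str) -> Result<bool, std::io::Error>,
// i.e. (line_number, line_content).
// Return Ok(true) to continue, Ok(false) to stop early.
Searcher::new().search_slice(&matcher, SHERLOCK, UTF8(|lnum, line| {
    // Process the matching line...
    Ok(true) // Continue searching
}))?;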

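The UTF8 sink wraps a closure and converts each matching line from bytes to &str before calling it; it returns an error if a matching line is not valid UTF-8 or if line numbers are disabled on the Searcher. The bool return value provides early-termination control, which is useful for "find first match" scenarios.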
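A sketch of how a closure adapter like UTF8 can be built, assuming a hypothetical one-method ToySink trait and a Utf8Fn wrapper (neither is the crate's real API). It exercises the three outcomes listed under Closure Sink Return Values in the Quick Reference: Ok(true) continues, Ok(false) stops early, and an Err (here from invalid UTF-8) aborts.

```rust
use std::io;

// Hypothetical simplified sink trait (the real `Sink` has more callbacks).
trait ToySink {
    fn matched(&mut self, line_number: u64, line: &[u8]) -> io::Result<bool>;
}

// Hypothetical closure adapter in the spirit of `sinks::UTF8`: it decodes
// each line to &str and forwards it to the closure. Invalid UTF-8 is an error.
struct Utf8Fn<F>(F);

impl<F> ToySink for Utf8Fn<F>
where
    F: FnMut(u64, &str) -> io::Result<bool>,
{
    fn matched(&mut self, line_number: u64, line: &[u8]) -> io::Result<bool> {
        let text = std::str::from_utf8(line)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        (self.0)(line_number, text)
    }
}

// Feed every line to the sink; stop as soon as it returns Ok(false).
fn feed_lines<S: ToySink>(text: &[u8], sink: &mut S) -> io::Result<()> {
    for (i, line) in text.split(|&b| b == b'\n').enumerate() {
        if !sink.matched(i as u64 + 1, line)? {
            break;
        }
    }
    Ok(())
}

// "Find first match": record the first line containing "Doctor", then stop.
fn first_doctor(text: &[u8]) -> io::Result<Option<(u64, String)>> {
    let mut first = None;
    feed_lines(text, &mut Utf8Fn(|lnum: u64, line: &str| -> io::Result<bool> {
        if line.contains("Doctor") {
            first = Some((lnum, line.to_string()));
            return Ok(false); // stop early
        }
        Ok(true) // keep going
    }))?;
    Ok(first)
}
```

The adapter keeps the closure free of byte handling, which is the same division of labor the UTF8 sink offers.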


Section 6: The Documentation Example Dissected

use {
    grep_matcher::Matcher,      // The trait (from grep-matcher); brings find() into scope
    grep_regex::RegexMatcher,   // The implementation (from grep-regex)
    grep_searcher::Searcher,    // The search orchestrator
    grep_searcher::sinks::UTF8, // Convenience sink for UTF-8 text
};

// Test data as a byte slice (text elided here)
const SHERLOCK: &[u8] = b"...";

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Step 1: Build the matcher
    let matcher = RegexMatcher::new(r"Doctor \w+")?;

    // Step 2: Prepare result storage
    let mut matches: Vec<(u64, String)> = vec![];

    // Step 3: Execute the search
    Searcher::new().search_slice(&matcher, SHERLOCK, UTF8(|lnum, line| {
        // The sink only receives matching lines, so unwrap() is safe here;
        // re-running the matcher recovers the exact match boundaries.
        let mymatch = matcher.find(line.as_bytes())?.unwrap();
        matches.push((lnum, line[mymatch].to_string()));
        Ok(true)
    }))?;
    Ok(())
}

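Three crates compose: grep_matcher defines the Matcher interface, grep_regex provides the RegexMatcher implementation, and grep_searcher orchestrates the search. The Searcher reports whole matching lines, so the closure calls matcher.find() again to extract the exact matched text; the grep_matcher::Matcher import is what makes the find method callable.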
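A std-only sketch of the same three-way split, with hypothetical stand-ins: a ToyMatcher trait for grep_matcher::Matcher, a DoctorWord matcher that approximates r"Doctor \w+" for ASCII text only (not a regex engine), and a search_lines function for the Searcher. As in the example above, the callback re-runs find() because the orchestrator only reports which lines matched.

```rust
use std::ops::Range;

// Interface role (like grep_matcher::Matcher): hypothetical, much simpler.
trait ToyMatcher {
    fn find(&self, haystack: &[u8]) -> Option<Range<usize>>;
}

// Implementation role (like grep_regex::RegexMatcher): a hand-rolled,
// ASCII-only approximation of r"Doctor \w+".
struct DoctorWord;

impl ToyMatcher for DoctorWord {
    fn find(&self, haystack: &[u8]) -> Option<Range<usize>> {
        let prefix: &[u8] = b"Doctor ";
        let mut start = 0;
        while let Some(pos) = haystack[start..]
            .windows(prefix.len())
            .position(|w| w == prefix)
        {
            let begin = start + pos;
            let word_start = begin + prefix.len();
            // Count following ASCII word characters ([A-Za-z0-9_]).
            let word_len = haystack[word_start..]
                .iter()
                .take_while(|b| b.is_ascii_alphanumeric() || **b == b'_')
                .count();
            if word_len > 0 {
                return Some(begin..word_start + word_len);
            }
            start = begin + 1; // no word after the prefix; keep scanning
        }
        None
    }
}

// Orchestration role (like grep_searcher::Searcher): generic over any matcher,
// it only decides which lines match and hands them to the callback.
fn search_lines<M, F>(matcher: &M, haystack: &[u8], mut on_match: F)
where
    M: ToyMatcher,
    F: FnMut(u64, &[u8]),
{
    for (i, line) in haystack.split(|&b| b == b'\n').enumerate() {
        if matcher.find(line).is_some() {
            on_match(i as u64 + 1, line);
        }
    }
}

// The callback re-runs the matcher to extract the exact matched text.
fn doctor_mentions(haystack: &[u8]) -> Vec<(u64, String)> {
    let matcher = DoctorWord;
    let mut out = Vec::new();
    search_lines(&matcher, haystack, |lnum, line| {
        // Safe to unwrap: search_lines only reports lines that matched.
        let m = matcher.find(line).unwrap();
        out.push((lnum, String::from_utf8_lossy(&line[m]).into_owned()));
    });
    out
}
```

Swapping DoctorWord for any other ToyMatcher requires no change to search_lines, which is the point of keeping the interface in its own crate.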


Section 7: Module Organization and Visibility

// Macro definitions available to subsequent modules
#[macro_use]
mod macros;

// Internal modules (no pub = private to crate)
mod line_buffer;  // Buffered reading implementation
mod lines;        // Line iteration utilities
mod searcher;     // Core Searcher type and builder
mod sink;         // Sink trait and implementations

// Test utilities only compiled in test builds
#[cfg(test)]
mod testutil;

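Because macro_rules! macros are textually scoped, mod macros must be declared before any module that uses its macros, and #[macro_use] is what keeps those macros in scope after the macros module ends. The #[cfg(test)] attribute compiles testutil only when building tests (e.g. cargo test), so it is excluded from every normal build, debug or release.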
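A single-file sketch of both attributes, using hypothetical inline modules and a made-up double! macro in place of the real macros.rs contents.

```rust
// `#[macro_use]` keeps `macro_rules!` macros defined inside this module in
// scope for the rest of the parent module, i.e. for items declared after it.
#[macro_use]
mod macros {
    // Hypothetical helper macro standing in for whatever macros.rs defines.
    macro_rules! double {
        ($e:expr) => {
            $e * 2
        };
    }
}

// Declared after `macros`, so `double!` is visible here. Moving this module
// above `mod macros` would make `double!` an unresolved macro.
mod uses_macro {
    pub fn quadruple(x: i32) -> i32 {
        double!(double!(x))
    }
}

// Compiled only when building tests (`cargo test`), in any profile.
#[cfg(test)]
mod testutil {
    pub fn sample() -> &'static str {
        "only exists in test builds"
    }
}
```

Declaration order matters only for textually scoped macro_rules! macros; ordinary items such as functions and types can be referenced regardless of where they are declared.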


Quick Reference

Architecture Flow

Source (file/stdin/bytes)
    |
    v
Searcher (orchestration)  <-->  Matcher (pattern matching)
    |
    v
Sink (result handling)

Key Types

Type              Purpose
----------------  -------------------------------------
Searcher          Orchestrates search over byte sources
SearcherBuilder   Configures and constructs Searcher
Sink              Trait for receiving search results
SinkMatch         Data for a matching line
SinkContext       Data for context lines
UTF8              Convenience sink wrapping a closure

Re-export Pattern

// Internal structure:
mod internal_module {
    pub struct SomeType;
}

// Public API (in lib.rs):
pub use crate::internal_module::SomeType;

// User sees:
use my_crate::SomeType;  // Clean, flat path

Closure Sink Return Values

Return      Effect
----------  ------------------
Ok(true)    Continue searching
Ok(false)   Stop search early
Err(e)      Abort with error