ripgrep crates/searcher/src/searcher/mod.rs: Code Companion

Reference code for the Searcher Core lecture. Sections correspond to the lecture document.


Section 1: The Builder Pattern in Action

/// A builder for configuring a searcher.
///
/// Once a searcher has been built, it is beneficial to reuse that searcher
/// for multiple searches, if possible.
#[derive(Clone, Debug)]
pub struct SearcherBuilder {
    config: Config,
}

impl Default for SearcherBuilder {
    fn default() -> SearcherBuilder {
        SearcherBuilder::new()
    }
}

impl SearcherBuilder {
    /// Create a new searcher builder with a default configuration.
    pub fn new() -> SearcherBuilder {
        SearcherBuilder { config: Config::default() }
    }

    /// Build a searcher from the current configuration.
    pub fn build(&self) -> Searcher {
        let mut config = self.config.clone();
        // Normalize passthru mode: context makes no sense when passing through all lines
        if config.passthru {
            config.before_context = 0;
            config.after_context = 0;
        }

        // Configure the transcoding layer
        let mut decode_builder = DecodeReaderBytesBuilder::new();
        decode_builder
            .encoding(self.config.encoding.as_ref().map(|e| e.0))
            .utf8_passthru(true)  // Pass UTF-8 through unchanged
            .strip_bom(self.config.bom_sniffing)
            .bom_override(true)   // BOM takes precedence over explicit encoding
            .bom_sniffing(self.config.bom_sniffing);

        Searcher {
            config,
            decode_builder,
            // Pre-allocate 8KB for transcoding scratch space
            decode_buffer: RefCell::new(vec![0; 8 * (1 << 10)]),
            line_buffer: RefCell::new(self.config.line_buffer()),
            multi_line_buffer: RefCell::new(vec![]),
        }
    }

    /// Example of a fluent builder method returning &mut Self
    pub fn line_terminator(
        &mut self,
        line_term: LineTerminator,
    ) -> &mut SearcherBuilder {
        self.config.line_term = line_term;
        self
    }

    // ... additional builder methods follow the same pattern
}

The build method performs normalization (passthru disables context) and initializes all infrastructure the Searcher needs. This ensures the returned searcher is immediately ready for use.
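
The normalization step can be tried out in isolation. The following is a minimal sketch, not ripgrep's code: MiniConfig, MiniBuilder, and MiniSearcher are hypothetical stand-ins that keep only the three fields involved, to show how fluent setters returning &mut Self combine with a build method that clones and normalizes the configuration.

```rust
// Hypothetical stand-in types; only the fields needed for the demo are kept.
#[derive(Clone, Debug, Default, PartialEq)]
struct MiniConfig {
    passthru: bool,
    before_context: usize,
    after_context: usize,
}

#[derive(Clone, Debug, Default)]
struct MiniBuilder {
    config: MiniConfig,
}

#[derive(Debug)]
struct MiniSearcher {
    config: MiniConfig,
}

impl MiniBuilder {
    fn new() -> MiniBuilder {
        MiniBuilder::default()
    }

    // Each setter mutates the builder and returns it so calls can be chained.
    fn passthru(&mut self, yes: bool) -> &mut MiniBuilder {
        self.config.passthru = yes;
        self
    }

    fn before_context(&mut self, lines: usize) -> &mut MiniBuilder {
        self.config.before_context = lines;
        self
    }

    fn after_context(&mut self, lines: usize) -> &mut MiniBuilder {
        self.config.after_context = lines;
        self
    }

    // build takes &self and works on a clone, so the builder stays reusable
    // and its own configuration is left untouched by normalization.
    fn build(&self) -> MiniSearcher {
        let mut config = self.config.clone();
        if config.passthru {
            config.before_context = 0;
            config.after_context = 0;
        }
        MiniSearcher { config }
    }
}
```

Calling builder.passthru(true).before_context(2).after_context(3) and then build() yields a searcher whose context settings are both 0, while the builder itself still records 2 and 3.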


Section 2: Configuration as a First-Class Concept

/// The internal configuration of a searcher. This is shared among several
/// search related types, but is only ever written to by the SearcherBuilder.
#[derive(Clone, Debug)]
pub struct Config {
    line_term: LineTerminator,
    invert_match: bool,
    after_context: usize,
    before_context: usize,
    passthru: bool,
    line_number: bool,
    /// When None, no explicit limit. When Some(0), only mmap strategy available.
    heap_limit: Option<usize>,
    mmap: MmapChoice,
    binary: BinaryDetection,
    multi_line: bool,
    encoding: Option<Encoding>,
    bom_sniffing: bool,
    stop_on_nonmatch: bool,
    max_matches: Option<u64>,
}

impl Default for Config {
    fn default() -> Config {
        Config {
            line_term: LineTerminator::default(),
            invert_match: false,
            after_context: 0,
            before_context: 0,
            passthru: false,
            line_number: true,          // Enabled by default
            heap_limit: None,           // No limit by default
            mmap: MmapChoice::default(),
            binary: BinaryDetection::default(),  // Disabled by default
            multi_line: false,
            encoding: None,
            bom_sniffing: true,         // Enabled by default
            stop_on_nonmatch: false,
            max_matches: None,
        }
    }
}

impl Config {
    /// Return the maximal amount of lines needed to fulfill this
    /// configuration's context.
    fn max_context(&self) -> usize {
        cmp::max(self.before_context, self.after_context)
    }

    /// Build a line buffer from this configuration.
    fn line_buffer(&self) -> LineBuffer {
        let mut builder = LineBufferBuilder::new();
        builder
            .line_terminator(self.line_term.as_byte())
            .binary_detection(self.binary.0);

        // Configure heap limits if set
        if let Some(limit) = self.heap_limit {
            let (capacity, additional) = if limit <= DEFAULT_BUFFER_CAPACITY {
                (limit, 0)
            } else {
                // Split limit between initial capacity and growth allowance
                (DEFAULT_BUFFER_CAPACITY, limit - DEFAULT_BUFFER_CAPACITY)
            };
            builder
                .capacity(capacity)
                .buffer_alloc(BufferAllocation::Error(additional));
        }
        builder.build()
    }
}

The Config struct centralizes all search parameters. The line_buffer method shows how configuration translates into concrete infrastructure, properly handling heap limits by splitting them between initial capacity and growth allowance.
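
The heap-limit arithmetic is easy to check by hand. The sketch below lifts the split into a hypothetical free function, split_heap_limit. The 64 KB value for DEFAULT_BUFFER_CAPACITY is an assumption made for illustration and should be checked against the crate.

```rust
/// Assumed value for illustration only; the real constant lives in the
/// line buffer module and may differ.
const DEFAULT_BUFFER_CAPACITY: usize = 64 * (1 << 10);

/// Hypothetical helper mirroring the branch in `line_buffer`: returns
/// (initial capacity, additional bytes the buffer may grow by).
fn split_heap_limit(limit: usize) -> (usize, usize) {
    if limit <= DEFAULT_BUFFER_CAPACITY {
        // The whole limit fits in the initial allocation; no growth allowed.
        (limit, 0)
    } else {
        // Start at the default capacity and allow growth up to the limit.
        (DEFAULT_BUFFER_CAPACITY, limit - DEFAULT_BUFFER_CAPACITY)
    }
}
```

In both branches the two numbers sum to the requested limit, so the buffer can never exceed it. Under the assumed constant, a 1 MiB limit splits into 65536 bytes of initial capacity plus 983040 bytes of growth allowance.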


Section 3: Binary Detection Strategies

/// The behavior of binary detection while searching.
#[derive(Clone, Debug, Default, Eq, PartialEq)]
pub struct BinaryDetection(line_buffer::BinaryDetection);

impl BinaryDetection {
    /// No binary detection is performed. Data reported by the searcher may
    /// contain arbitrary bytes. This is the default.
    pub fn none() -> BinaryDetection {
        BinaryDetection(line_buffer::BinaryDetection::None)
    }

    /// Binary detection is performed by looking for the given byte.
    /// When found, the search stops as if it reached EOF.
    ///
    /// Behavior differs by search mode:
    /// - Fixed buffer: all content is checked
    /// - Memory map/heap: only matched regions + initial prefix checked
    pub fn quit(binary_byte: u8) -> BinaryDetection {
        BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
    }

    /// Binary detection replaces the given byte with the line terminator.
    ///
    /// With fixed buffer: caller never observes this byte
    /// With memory map: this setting has no effect (data is read-only)
    pub fn convert(binary_byte: u8) -> BinaryDetection {
        BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
    }

    /// If this uses "quit" strategy, returns the trigger byte.
    pub fn quit_byte(&self) -> Option<u8> {
        match self.0 {
            line_buffer::BinaryDetection::Quit(b) => Some(b),
            _ => None,
        }
    }

    /// If this uses "convert" strategy, returns the byte to replace.
    pub fn convert_byte(&self) -> Option<u8> {
        match self.0 {
            line_buffer::BinaryDetection::Convert(b) => Some(b),
            _ => None,
        }
    }
}

BinaryDetection wraps the internal line_buffer::BinaryDetection type, providing a public API while keeping the implementation detail private. The three strategies (none, quit, convert) handle the inherent ambiguity of what constitutes "binary" data.
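
To make the three strategies concrete, here is a minimal sketch of what each does to a buffer of bytes. The Detection enum and apply_detection function are hypothetical and operate on a plain slice; the real logic lives inside the line buffer and is more involved.

```rust
/// Hypothetical stand-in for the three strategies described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Detection {
    None,
    Quit(u8),
    Convert(u8),
}

/// Applies the strategy to `buf` in place and returns how many leading
/// bytes remain searchable.
fn apply_detection(buf: &mut [u8], detection: Detection, line_term: u8) -> usize {
    match detection {
        // No detection: every byte is searchable, nothing is changed.
        Detection::None => buf.len(),
        // Quit: searching stops at the first occurrence of the byte,
        // as if EOF had been reached there.
        Detection::Quit(byte) => {
            buf.iter().position(|&b| b == byte).unwrap_or(buf.len())
        }
        // Convert: each occurrence is replaced by the line terminator,
        // so the caller never observes the original byte.
        Detection::Convert(byte) => {
            for b in buf.iter_mut() {
                if *b == byte {
                    *b = line_term;
                }
            }
            buf.len()
        }
    }
}
```

For the input ab\0cd with NUL as the binary byte, quit leaves 2 searchable bytes, while convert keeps all 5 and rewrites the buffer to ab\ncd.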


Section 4: Encoding and Transcoding Support

/// An encoding to use when searching.
///
/// An `Encoding` will always be cheap to clone.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Encoding(&'static encoding_rs::Encoding);

impl Encoding {
    /// Create a new encoding for the specified label.
    ///
    /// The encoding label is mapped via the Encoding Standard.
    /// Returns an error if the label doesn't correspond to a valid encoding.
    pub fn new(label: &str) -> Result<Encoding, ConfigError> {
        let label = label.as_bytes();
        match encoding_rs::Encoding::for_label_no_replacement(label) {
            Some(encoding) => Ok(Encoding(encoding)),
            None => {
                Err(ConfigError::UnknownEncoding { label: label.to_vec() })
            }
        }
    }
}

impl Searcher {
    /// Returns true if and only if the given slice needs to be transcoded.
    fn slice_needs_transcoding(&self, slice: &[u8]) -> bool {
        // Explicit encoding always requires transcoding
        self.config.encoding.is_some()
            // BOM sniffing enabled AND a BOM is actually present
            || (self.config.bom_sniffing && slice_has_bom(slice))
    }
}

/// Returns true if the slice begins with a UTF-8 or UTF-16 BOM.
fn slice_has_bom(slice: &[u8]) -> bool {
    let enc = match encoding_rs::Encoding::for_bom(slice) {
        None => return false,
        Some((enc, _)) => enc,
    };
    log::trace!("found byte-order mark (BOM) for encoding {enc:?}");
    // Only these three encodings are recognized via BOM
    [encoding_rs::UTF_16LE, encoding_rs::UTF_16BE, encoding_rs::UTF_8]
        .contains(&enc)
}

The Encoding type wraps encoding_rs to provide encapsulation. The slice_needs_transcoding method optimizes the common case: when no explicit encoding is configured, plain ASCII/UTF-8 data without a BOM can be searched directly without transcoding overhead.
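
The transcoding decision can be reproduced without encoding_rs. The sketch below is an illustration under stated assumptions: bom_encoding and needs_transcoding are hypothetical helpers that check the three BOM byte sequences by hand, where the listing above delegates to encoding_rs::Encoding::for_bom.

```rust
/// Hypothetical helper: names the encoding whose BOM starts `slice`, if any.
fn bom_encoding(slice: &[u8]) -> Option<&'static str> {
    if slice.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("UTF-8")
    } else if slice.starts_with(&[0xFF, 0xFE]) {
        Some("UTF-16LE")
    } else if slice.starts_with(&[0xFE, 0xFF]) {
        Some("UTF-16BE")
    } else {
        None
    }
}

/// Mirrors the boolean logic of `slice_needs_transcoding`: an explicit
/// encoding always forces transcoding; otherwise a BOM must be present
/// and BOM sniffing must be enabled.
fn needs_transcoding(explicit_encoding: bool, bom_sniffing: bool, slice: &[u8]) -> bool {
    explicit_encoding || (bom_sniffing && bom_encoding(slice).is_some())
}
```

With no explicit encoding, a slice such as b"hello" skips transcoding entirely, while a slice starting with FF FE is transcoded only if BOM sniffing is on.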


Section 5: The Searcher Type and Interior Mutability

/// A searcher executes searches over a haystack and writes results to a caller
/// provided sink.
#[derive(Clone, Debug)]
pub struct Searcher {
    /// The configuration for this searcher.
    config: Config,
    /// A builder for constructing a streaming transcoder.
    /// When no transcoding is needed, passes through bytes unchanged.
    decode_builder: DecodeReaderBytesBuilder,
    /// Buffer for transcoding scratch space.
    decode_buffer: RefCell<Vec<u8>>,
    /// Line buffer for incremental line-oriented searching.
    ///
    /// We wrap it in RefCell to permit lending out borrows of `Searcher`
    /// to sinks. We still require a mutable borrow to execute a search, so
    /// we statically prevent callers from causing RefCell to panic at runtime
    /// due to a borrowing violation.
    line_buffer: RefCell<LineBuffer>,
    /// Buffer for storing entire contents when multi-line searching.
    /// Multi-line searches cannot be performed incrementally.
    multi_line_buffer: RefCell<Vec<u8>>,
}

impl Searcher {
    /// Create a new searcher with a default configuration.
    pub fn new() -> Searcher {
        SearcherBuilder::new().build()
    }

    /// Dynamically update binary detection (useful between searches).
    pub fn set_binary_detection(&mut self, detection: BinaryDetection) {
        self.config.binary = detection.clone();
        // Also update the line buffer's detection setting
        self.line_buffer.borrow_mut().set_binary_detection(detection.0);
    }
}

The RefCell wrappers enable a specific borrowing pattern: during a search, sinks receive a shared reference to the Searcher (for example, to read its configuration) while the search itself mutates the buffers. Because every search method takes &mut self, two searches can never overlap on one Searcher, so the borrow_mut calls cannot panic at runtime.
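
A reduced version of this borrowing pattern is sketched below. MiniSearcher, its scratch buffer, and the closure-based sink are hypothetical simplifications (the real Sink is a trait, and real line splitting handles empty lines); the point is only that a &mut self entry point can hand out &Self while a RefCell-guarded buffer is being mutated.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in with one config field and one reusable buffer.
struct MiniSearcher {
    line_number: bool,
    scratch: RefCell<Vec<u8>>,
}

impl MiniSearcher {
    fn new(line_number: bool) -> MiniSearcher {
        MiniSearcher { line_number, scratch: RefCell::new(Vec::new()) }
    }

    /// Read-only accessor that a sink may call in the middle of a search.
    fn line_number(&self) -> bool {
        self.line_number
    }

    /// Taking `&mut self` means no other search can be running on this
    /// value, so the `borrow_mut` below can never find the buffer in use.
    fn search<F>(&mut self, haystack: &[u8], mut sink: F)
    where
        F: FnMut(&MiniSearcher, &[u8]),
    {
        // Downgrade to a shared reference for the rest of the search.
        let this: &MiniSearcher = self;
        let mut scratch = this.scratch.borrow_mut();
        // Empty lines are skipped to keep the sketch short.
        for line in haystack.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
            scratch.clear();
            scratch.extend_from_slice(line);
            // The sink sees the searcher and the buffer contents together.
            sink(this, &scratch[..]);
        }
    }
}
```

Without the RefCell, lending &MiniSearcher to the sink while holding a mutable borrow of one of its fields would be rejected by the borrow checker.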


Section 6: Search Strategy Selection

impl Searcher {
    /// Execute a search over the file with the given path.
    pub fn search_path<P, M, S>(
        &mut self,
        matcher: M,
        path: P,
        write_to: S,
    ) -> Result<(), S::Error>
    where
        P: AsRef<Path>,
        M: Matcher,
        S: Sink,
    {
        let path = path.as_ref();
        let file = File::open(path).map_err(S::Error::error_io)?;
        self.search_file_maybe_path(matcher, Some(path), &file, write_to)
    }

    fn search_file_maybe_path<M, S>(
        &mut self,
        matcher: M,
        path: Option<&Path>,
        file: &File,
        write_to: S,
    ) -> Result<(), S::Error>
    where
        M: Matcher,
        S: Sink,
    {
        // Strategy 1: Memory map if enabled and available
        if let Some(mmap) = self.config.mmap.open(file, path) {
            log::trace!("{:?}: searching via memory map", path);
            return self.search_slice(matcher, &mmap, write_to);
        }

        // Strategy 2: Multi-line fast path for files (pre-allocate based on file size)
        if self.multi_line_with_matcher(&matcher) {
            log::trace!("{:?}: reading entire file on to heap for multiline", path);
            self.fill_multi_line_buffer_from_file::<S>(file)?;
            log::trace!("{:?}: searching via multiline strategy", path);
            MultiLine::new(
                self,
                matcher,
                &*self.multi_line_buffer.borrow(),
                write_to,
            )
            .run()
        } else {
            // Strategy 3: Generic incremental reader
            log::trace!("{:?}: searching using generic reader", path);
            self.search_reader(matcher, file, write_to)
        }
    }

    /// Execute a search over any implementation of `std::io::Read`.
    pub fn search_reader<M, R, S>(
        &mut self,
        matcher: M,
        read_from: R,
        write_to: S,
    ) -> Result<(), S::Error>
    where
        M: Matcher,
        R: io::Read,
        S: Sink,
    {
        self.check_config(&matcher).map_err(S::Error::error_config)?;

        // Set up transcoding layer
        let mut decode_buffer = self.decode_buffer.borrow_mut();
        let decoder = self
            .decode_builder
            .build_with_buffer(read_from, &mut *decode_buffer)
            .map_err(S::Error::error_io)?;

        if self.multi_line_with_matcher(&matcher) {
            // Multi-line: read everything into memory
            self.fill_multi_line_buffer_from_reader::<_, S>(decoder)?;
            MultiLine::new(
                self,
                matcher,
                &*self.multi_line_buffer.borrow(),
                write_to,
            )
            .run()
        } else {
            // Incremental: use rolling line buffer
            let mut line_buffer = self.line_buffer.borrow_mut();
            let rdr = LineBufferReader::new(decoder, &mut *line_buffer);
            ReadByLine::new(self, matcher, rdr, write_to).run()
        }
    }
}

The search methods implement a tiered strategy: memory map first (when enabled and the file can be mapped, the contents are searched as a slice without first being read into a buffer), then file-specific multi-line (pre-allocates a buffer based on the file size), then generic incremental (works with any Read).
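
The ordering of the tiers can be captured in a few lines. This is a minimal sketch: the Strategy enum and choose_strategy function are hypothetical and reduce the two runtime checks in search_file_maybe_path to boolean inputs.

```rust
/// Hypothetical labels for the three tiers described above.
#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    MemoryMap,
    MultiLineFromFile,
    IncrementalReader,
}

/// Mirrors the order of checks: a usable memory map wins outright,
/// then multi-line mode, then the generic reader as the fallback.
fn choose_strategy(mmap_available: bool, multi_line: bool) -> Strategy {
    if mmap_available {
        Strategy::MemoryMap
    } else if multi_line {
        Strategy::MultiLineFromFile
    } else {
        Strategy::IncrementalReader
    }
}
```

Note that a memory-mapped search takes precedence even when multi-line mode is on, since the mapped slice already exposes the whole file at once.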


Quick Reference

Search Strategy Decision Tree

search_path / search_file
   │
   mmap available? ──yes──▶ search_slice (memory map)
   │
   no
   │
   multi_line needed? ──yes──▶ fill_multi_line_buffer_from_file ──▶ MultiLine search
   │
   no
   │
   ▼
search_reader (incremental)
   │
   multi_line needed? ──yes──▶ fill_multi_line_buffer_from_reader ──▶ MultiLine search
   │
   no
   │
   ▼
ReadByLine (rolling buffer)

Key Type Signatures

Type             Purpose
SearcherBuilder  Fluent builder for Searcher configuration
Searcher         Main search orchestrator with reusable buffers
Config           All search parameters in one cloneable struct
BinaryDetection  Strategy for handling binary data (none/quit/convert)
Encoding         Wrapper around encoding_rs::Encoding
ConfigError      Configuration errors (e.g. unknown encoding label)

Search Methods

Method         Input Type  Best For
search_path    Path        Files on disk (memory-mapped when mmap is enabled)
search_file    &File       Open file handles
search_reader  impl Read   Arbitrary streams
search_slice   &[u8]       In-memory data

Default Configuration

Setting       Default  Notes
line_number   true     Small perf cost
bom_sniffing  true     Detects UTF-8 and UTF-16 BOMs
binary        none     No detection
mmap          never    Explicit enable required
multi_line    false    Requires full file in memory
heap_limit    None     No limit