ripgrep crates/searcher/src/searcher/mod.rs: Code Companion
Reference code for the Searcher Core lecture. Sections correspond to the lecture document.
Section 1: The Builder Pattern in Action
/// A builder for configuring a searcher.
///
/// Once a searcher has been built, it is beneficial to reuse that searcher
/// for multiple searches, if possible.
#[derive(Clone, Debug)]
pub struct SearcherBuilder {
    config: Config,
}

impl Default for SearcherBuilder {
    fn default() -> SearcherBuilder {
        SearcherBuilder::new()
    }
}

impl SearcherBuilder {
    /// Create a new searcher builder with a default configuration.
    pub fn new() -> SearcherBuilder {
        SearcherBuilder { config: Config::default() }
    }

    /// Build a searcher from the builder's current configuration.
    pub fn build(&self) -> Searcher {
        let mut config = self.config.clone();
        // Normalize passthru mode: context makes no sense when passing through all lines
        if config.passthru {
            config.before_context = 0;
            config.after_context = 0;
        }

        // Configure the transcoding layer
        let mut decode_builder = DecodeReaderBytesBuilder::new();
        decode_builder
            .encoding(self.config.encoding.as_ref().map(|e| e.0))
            .utf8_passthru(true) // Pass UTF-8 through unchanged
            .strip_bom(self.config.bom_sniffing)
            .bom_override(true) // BOM takes precedence over explicit encoding
            .bom_sniffing(self.config.bom_sniffing);

        Searcher {
            config,
            decode_builder,
            // Pre-allocate 8KB for transcoding scratch space
            decode_buffer: RefCell::new(vec![0; 8 * (1 << 10)]),
            line_buffer: RefCell::new(self.config.line_buffer()),
            multi_line_buffer: RefCell::new(vec![]),
        }
    }

    /// Example of a fluent builder method returning &mut Self
    pub fn line_terminator(
        &mut self,
        line_term: LineTerminator,
    ) -> &mut SearcherBuilder {
        self.config.line_term = line_term;
        self
    }

    // ... additional builder methods follow the same pattern
}
The build method performs normalization (passthru disables context) and initializes all infrastructure the Searcher needs. This ensures the returned searcher is immediately ready for use.
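The shape of this pattern can be tried out without the ripgrep crates. The following is a minimal, std-only sketch: `MiniConfig`, `MiniBuilder`, and `MiniSearcher` are hypothetical stand-ins invented for illustration and are not part of grep-searcher. It shows setters that return `&mut Self` for chaining, and a `build(&self)` that normalizes a cloned configuration so the builder itself stays reusable.

```rust
/// Hypothetical, trimmed stand-in for the searcher's `Config`.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct MiniConfig {
    pub passthru: bool,
    pub before_context: usize,
    pub after_context: usize,
}

/// Hypothetical builder mirroring the shape of `SearcherBuilder`.
#[derive(Clone, Debug, Default)]
pub struct MiniBuilder {
    config: MiniConfig,
}

/// Hypothetical product type; the real `Searcher` also owns buffers.
#[derive(Clone, Debug)]
pub struct MiniSearcher {
    pub config: MiniConfig,
}

impl MiniBuilder {
    pub fn new() -> MiniBuilder {
        MiniBuilder::default()
    }

    /// Setters take `&mut self` and return it, so calls can be chained.
    pub fn passthru(&mut self, yes: bool) -> &mut MiniBuilder {
        self.config.passthru = yes;
        self
    }

    pub fn before_context(&mut self, lines: usize) -> &mut MiniBuilder {
        self.config.before_context = lines;
        self
    }

    pub fn after_context(&mut self, lines: usize) -> &mut MiniBuilder {
        self.config.after_context = lines;
        self
    }

    /// `build` only borrows the builder and normalizes a clone, so the
    /// builder keeps the caller's settings and can build again later.
    pub fn build(&self) -> MiniSearcher {
        let mut config = self.config.clone();
        if config.passthru {
            // Passthru reports every line, so context is redundant.
            config.before_context = 0;
            config.after_context = 0;
        }
        MiniSearcher { config }
    }
}
```

Calling `MiniBuilder::new().passthru(true).before_context(2).build()` in this sketch yields a searcher whose context values are both zero, while a builder without passthru keeps the requested context.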
Section 2: Configuration as a First-Class Concept
/// The internal configuration of a searcher. This is shared among several
/// search related types, but is only ever written to by the SearcherBuilder.
#[derive(Clone, Debug)]
pub struct Config {
    line_term: LineTerminator,
    invert_match: bool,
    after_context: usize,
    before_context: usize,
    passthru: bool,
    line_number: bool,
    /// When None, no explicit limit. When Some(0), only mmap strategy available.
    heap_limit: Option<usize>,
    mmap: MmapChoice,
    binary: BinaryDetection,
    multi_line: bool,
    encoding: Option<Encoding>,
    bom_sniffing: bool,
    stop_on_nonmatch: bool,
    max_matches: Option<u64>,
}

impl Default for Config {
    fn default() -> Config {
        Config {
            line_term: LineTerminator::default(),
            invert_match: false,
            after_context: 0,
            before_context: 0,
            passthru: false,
            line_number: true, // Enabled by default
            heap_limit: None,  // No limit by default
            mmap: MmapChoice::default(),
            binary: BinaryDetection::default(), // Disabled by default
            multi_line: false,
            encoding: None,
            bom_sniffing: true, // Enabled by default
            stop_on_nonmatch: false,
            max_matches: None,
        }
    }
}

impl Config {
    /// Return the maximal amount of lines needed to fulfill this
    /// configuration's context.
    fn max_context(&self) -> usize {
        cmp::max(self.before_context, self.after_context)
    }

    /// Build a line buffer from this configuration.
    fn line_buffer(&self) -> LineBuffer {
        let mut builder = LineBufferBuilder::new();
        // `self` is the Config here, so fields are read directly
        builder
            .line_terminator(self.line_term.as_byte())
            .binary_detection(self.binary.0);

        // Configure heap limits if set
        if let Some(limit) = self.heap_limit {
            let (capacity, additional) = if limit <= DEFAULT_BUFFER_CAPACITY {
                (limit, 0)
            } else {
                // Split limit between initial capacity and growth allowance
                (DEFAULT_BUFFER_CAPACITY, limit - DEFAULT_BUFFER_CAPACITY)
            };
            builder
                .capacity(capacity)
                .buffer_alloc(BufferAllocation::Error(additional));
        }
        builder.build()
    }
}
The Config struct centralizes all search parameters. The line_buffer method shows how configuration translates into concrete infrastructure, properly handling heap limits by splitting them between initial capacity and growth allowance.
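The heap-limit arithmetic is easy to check in isolation. The sketch below extracts the same branch into a free function; `split_heap_limit` and `ASSUMED_DEFAULT_CAPACITY` are names made up for this example, and the 64 KiB value is an assumption standing in for `DEFAULT_BUFFER_CAPACITY`, whose actual value is defined elsewhere in the crate.

```rust
/// Assumed stand-in for `DEFAULT_BUFFER_CAPACITY`; the real constant is
/// defined in the line buffer module and may have a different value.
pub const ASSUMED_DEFAULT_CAPACITY: usize = 64 * 1024;

/// Split a heap limit into `(initial capacity, additional growth allowance)`,
/// following the branch structure shown in `Config::line_buffer`.
pub fn split_heap_limit(limit: usize, default_capacity: usize) -> (usize, usize) {
    if limit <= default_capacity {
        // Small limit: the whole budget is the initial capacity, no growth.
        (limit, 0)
    } else {
        // Large limit: start at the default and allow growth up to the limit.
        (default_capacity, limit - default_capacity)
    }
}
```

In both branches the two parts add up to the original limit, so the buffer can never grow past what the caller asked for. With the assumed 64 KiB default, a limit of 100,000 bytes splits into 65,536 bytes of initial capacity and 34,464 bytes of growth allowance.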
Section 3: Binary Detection Strategies
/// The behavior of binary detection while searching.
#[derive(Clone, Debug, Default, Eq, PartialEq)]
pub struct BinaryDetection(line_buffer::BinaryDetection);

impl BinaryDetection {
    /// No binary detection is performed. Data reported by the searcher may
    /// contain arbitrary bytes. This is the default.
    pub fn none() -> BinaryDetection {
        BinaryDetection(line_buffer::BinaryDetection::None)
    }

    /// Binary detection is performed by looking for the given byte.
    /// When found, the search stops as if it reached EOF.
    ///
    /// Behavior differs by search mode:
    /// - Fixed buffer: all content is checked
    /// - Memory map/heap: only matched regions + initial prefix checked
    pub fn quit(binary_byte: u8) -> BinaryDetection {
        BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
    }

    /// Binary detection replaces the given byte with the line terminator.
    ///
    /// With fixed buffer: caller never observes this byte
    /// With memory map: this setting has no effect (data is read-only)
    pub fn convert(binary_byte: u8) -> BinaryDetection {
        BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
    }

    /// If this uses "quit" strategy, returns the trigger byte.
    pub fn quit_byte(&self) -> Option<u8> {
        match self.0 {
            line_buffer::BinaryDetection::Quit(b) => Some(b),
            _ => None,
        }
    }

    /// If this uses "convert" strategy, returns the byte to replace.
    pub fn convert_byte(&self) -> Option<u8> {
        match self.0 {
            line_buffer::BinaryDetection::Convert(b) => Some(b),
            _ => None,
        }
    }
}
BinaryDetection wraps the internal line_buffer::BinaryDetection type, providing a public API while keeping the implementation detail private. The three strategies (none, quit, convert) handle the inherent ambiguity of what constitutes "binary" data.
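To make the three strategies concrete, here is a std-only sketch of how each might treat a single fixed-size buffer. The `Detection` enum and `apply_detection` function are hypothetical and written only for this illustration; the real logic lives inside the line buffer and is more involved.

```rust
/// Hypothetical mirror of the three strategies; not the crate's own type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Detection {
    None,
    Quit(u8),
    Convert(u8),
}

/// Apply a strategy to one buffer, in the fixed-buffer style, and return
/// how many leading bytes remain searchable.
pub fn apply_detection(detection: Detection, buf: &mut [u8], line_term: u8) -> usize {
    match detection {
        // No detection: every byte is searchable, nothing is changed.
        Detection::None => buf.len(),
        // Quit: treat the first trigger byte as if it were end of input.
        Detection::Quit(byte) => {
            buf.iter().position(|&b| b == byte).unwrap_or(buf.len())
        }
        // Convert: overwrite each trigger byte with the line terminator.
        Detection::Convert(byte) => {
            for b in buf.iter_mut() {
                if *b == byte {
                    *b = line_term;
                }
            }
            buf.len()
        }
    }
}
```

On the bytes `ab\0cd` with a NUL trigger, this sketch leaves the buffer untouched for `None`, reports only the first two bytes as searchable for `Quit`, and rewrites the buffer to `ab\ncd` for `Convert`.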
Section 4: Encoding and Transcoding Support
/// An encoding to use when searching.
///
/// An `Encoding` will always be cheap to clone.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Encoding(&'static encoding_rs::Encoding);

impl Encoding {
    /// Create a new encoding for the specified label.
    ///
    /// The encoding label is mapped via the Encoding Standard.
    /// Returns an error if the label doesn't correspond to a valid encoding.
    pub fn new(label: &str) -> Result<Encoding, ConfigError> {
        let label = label.as_bytes();
        match encoding_rs::Encoding::for_label_no_replacement(label) {
            Some(encoding) => Ok(Encoding(encoding)),
            None => {
                Err(ConfigError::UnknownEncoding { label: label.to_vec() })
            }
        }
    }
}

impl Searcher {
    /// Returns true if and only if the given slice needs to be transcoded.
    fn slice_needs_transcoding(&self, slice: &[u8]) -> bool {
        // Explicit encoding always requires transcoding
        self.config.encoding.is_some()
            // BOM sniffing enabled AND a BOM is actually present
            || (self.config.bom_sniffing && slice_has_bom(slice))
    }
}

/// Returns true if the slice begins with a UTF-8 or UTF-16 BOM.
fn slice_has_bom(slice: &[u8]) -> bool {
    let enc = match encoding_rs::Encoding::for_bom(slice) {
        None => return false,
        Some((enc, _)) => enc,
    };
    log::trace!("found byte-order mark (BOM) for encoding {enc:?}");
    // Only these three encodings are recognized via BOM
    [encoding_rs::UTF_16LE, encoding_rs::UTF_16BE, encoding_rs::UTF_8]
        .contains(&enc)
}
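The Encoding type wraps a &'static encoding_rs::Encoding behind a private field, so callers construct encodings from labels rather than depending on encoding_rs directly. The slice_needs_transcoding method optimizes the common case: when no explicit encoding is configured, data without a BOM (plain ASCII or UTF-8) is searched directly, with no transcoding overhead.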
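The transcoding decision can be sketched without encoding_rs by matching the well-known BOM byte sequences by hand. `BomEncoding`, `sniff_bom`, and `needs_transcoding` below are hypothetical names for this illustration; the real code delegates BOM recognition to `encoding_rs::Encoding::for_bom`.

```rust
/// Encodings a BOM can identify in this sketch (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BomEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Recognize a leading byte-order mark, if any.
pub fn sniff_bom(slice: &[u8]) -> Option<BomEncoding> {
    if slice.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(BomEncoding::Utf8)
    } else if slice.starts_with(&[0xFF, 0xFE]) {
        Some(BomEncoding::Utf16Le)
    } else if slice.starts_with(&[0xFE, 0xFF]) {
        Some(BomEncoding::Utf16Be)
    } else {
        None
    }
}

/// Same boolean structure as `slice_needs_transcoding`: an explicit
/// encoding always transcodes; otherwise a BOM must be both looked for
/// and actually present.
pub fn needs_transcoding(explicit_encoding: bool, bom_sniffing: bool, slice: &[u8]) -> bool {
    explicit_encoding || (bom_sniffing && sniff_bom(slice).is_some())
}
```

With these definitions, plain `hello` needs no transcoding when no encoding is set, a UTF-16LE BOM triggers transcoding only while sniffing is enabled, and an explicit encoding forces transcoding regardless of the data.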
Section 5: The Searcher Type and Interior Mutability
/// A searcher executes searches over a haystack and writes results to a caller
/// provided sink.
#[derive(Clone, Debug)]
pub struct Searcher {
    /// The configuration for this searcher.
    config: Config,
    /// A builder for constructing a streaming transcoder.
    /// When no transcoding is needed, passes through bytes unchanged.
    decode_builder: DecodeReaderBytesBuilder,
    /// Buffer for transcoding scratch space.
    decode_buffer: RefCell<Vec<u8>>,
    /// Line buffer for incremental line-oriented searching.
    ///
    /// We wrap it in RefCell to permit lending out borrows of `Searcher`
    /// to sinks. We still require a mutable borrow to execute a search, so
    /// we statically prevent callers from causing RefCell to panic at runtime
    /// due to a borrowing violation.
    line_buffer: RefCell<LineBuffer>,
    /// Buffer for storing entire contents when multi-line searching.
    /// Multi-line searches cannot be performed incrementally.
    multi_line_buffer: RefCell<Vec<u8>>,
}

impl Searcher {
    /// Create a new searcher with a default configuration.
    pub fn new() -> Searcher {
        SearcherBuilder::new().build()
    }

    /// Dynamically update binary detection (useful between searches).
    pub fn set_binary_detection(&mut self, detection: BinaryDetection) {
        self.config.binary = detection.clone();
        // Also update the line buffer's detection setting
        self.line_buffer.borrow_mut().set_binary_detection(detection.0);
    }
}
The RefCell wrappers enable a specific borrowing pattern: the searcher can lend a shared reference to itself (and so to its configuration) to sink implementations while still mutating its buffers. Because every search method takes &mut self, two searches cannot overlap on one searcher, so the RefCell borrows never conflict and cannot panic at runtime.
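A small std-only sketch may help show why this combination works. `ScratchSearcher` is a hypothetical miniature invented for this example: it keeps a scratch buffer in a `RefCell`, takes `&mut self` to search, and hands a shared `&ScratchSearcher` to a closure standing in for a sink.

```rust
use std::cell::RefCell;

/// Hypothetical miniature of `Searcher`: one read-only setting plus a
/// scratch buffer behind a `RefCell`.
pub struct ScratchSearcher {
    line_number: bool,
    buffer: RefCell<Vec<u8>>,
}

impl ScratchSearcher {
    pub fn new(line_number: bool) -> ScratchSearcher {
        ScratchSearcher { line_number, buffer: RefCell::new(Vec::new()) }
    }

    /// The sink only ever sees `&ScratchSearcher`, so it can read settings.
    pub fn line_number(&self) -> bool {
        self.line_number
    }

    /// Taking `&mut self` means no second search can start on this
    /// searcher while this one runs, so `borrow_mut` below cannot panic.
    /// Returns the number of non-empty lines passed to the sink.
    pub fn search<F>(&mut self, haystack: &[u8], mut sink: F) -> usize
    where
        F: FnMut(&ScratchSearcher, &[u8]),
    {
        // Downgrade to a shared borrow so it can be lent to the sink
        // while the buffer is mutated through the RefCell.
        let this: &ScratchSearcher = self;
        let mut buffer = this.buffer.borrow_mut();
        let mut count = 0;
        for line in haystack.split(|&b| b == b'\n') {
            if line.is_empty() {
                continue;
            }
            buffer.clear();
            buffer.extend_from_slice(line);
            sink(this, buffer.as_slice());
            count += 1;
        }
        count
    }
}
```

Without the `RefCell`, the method would need `&mut` access to the buffer and a shared reference to the whole struct at the same time, which the borrow checker rejects. The exclusive borrow at the method boundary is what keeps the runtime check from ever failing in this sketch.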
Section 6: Search Strategy Selection
impl Searcher {
    /// Execute a search over the file with the given path.
    pub fn search_path<P, M, S>(
        &mut self,
        matcher: M,
        path: P,
        write_to: S,
    ) -> Result<(), S::Error>
    where
        P: AsRef<Path>,
        M: Matcher,
        S: Sink,
    {
        let path = path.as_ref();
        let file = File::open(path).map_err(S::Error::error_io)?;
        self.search_file_maybe_path(matcher, Some(path), &file, write_to)
    }

    fn search_file_maybe_path<M, S>(
        &mut self,
        matcher: M,
        path: Option<&Path>,
        file: &File,
        write_to: S,
    ) -> Result<(), S::Error>
    where
        M: Matcher,
        S: Sink,
    {
        // Strategy 1: Memory map if enabled and available
        if let Some(mmap) = self.config.mmap.open(file, path) {
            log::trace!("{:?}: searching via memory map", path);
            return self.search_slice(matcher, &mmap, write_to);
        }
        // Strategy 2: Multi-line fast path for files (pre-allocate based on file size)
        if self.multi_line_with_matcher(&matcher) {
            log::trace!("{:?}: reading entire file on to heap for multiline", path);
            self.fill_multi_line_buffer_from_file::<S>(file)?;
            log::trace!("{:?}: searching via multiline strategy", path);
            MultiLine::new(
                self,
                matcher,
                &*self.multi_line_buffer.borrow(),
                write_to,
            )
            .run()
        } else {
            // Strategy 3: Generic incremental reader
            log::trace!("{:?}: searching using generic reader", path);
            self.search_reader(matcher, file, write_to)
        }
    }

    /// Execute a search over any implementation of `std::io::Read`.
    pub fn search_reader<M, R, S>(
        &mut self,
        matcher: M,
        read_from: R,
        write_to: S,
    ) -> Result<(), S::Error>
    where
        M: Matcher,
        R: io::Read,
        S: Sink,
    {
        self.check_config(&matcher).map_err(S::Error::error_config)?;

        // Set up transcoding layer
        let mut decode_buffer = self.decode_buffer.borrow_mut();
        let decoder = self
            .decode_builder
            .build_with_buffer(read_from, &mut *decode_buffer)
            .map_err(S::Error::error_io)?;

        if self.multi_line_with_matcher(&matcher) {
            // Multi-line: read everything into memory
            self.fill_multi_line_buffer_from_reader::<_, S>(decoder)?;
            MultiLine::new(
                self,
                matcher,
                &*self.multi_line_buffer.borrow(),
                write_to,
            )
            .run()
        } else {
            // Incremental: use rolling line buffer
            let mut line_buffer = self.line_buffer.borrow_mut();
            let rdr = LineBufferReader::new(decoder, &mut *line_buffer);
            ReadByLine::new(self, matcher, rdr, write_to).run()
        }
    }
}
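The search methods implement a tiered strategy. A memory map is tried first and is used only when mmap is enabled and the file can be mapped. Otherwise, a multi-line search reads the whole file onto the heap through a file-specific path that can pre-allocate based on file size. Everything else goes through the generic incremental reader, which works with any Read.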
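The order of these checks can be written down as a pair of pure functions. This is only a sketch of the control flow shown above: the `Strategy` enum and both function names are hypothetical, and the two boolean inputs stand in for "the mmap choice produced a mapping" and "the matcher and configuration call for multi-line search".

```rust
/// Hypothetical labels for the code paths a search can take.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Strategy {
    /// Hand the mapped bytes to the slice search.
    MemoryMap,
    /// Read the whole file onto the heap, then run a multi-line search.
    MultiLineFromFile,
    /// Drain the reader onto the heap, then run a multi-line search.
    MultiLineFromReader,
    /// Incremental search over a rolling line buffer.
    ReadByLine,
}

/// Order of checks when starting from a file.
pub fn choose_for_file(mmap_opened: bool, multi_line: bool) -> Strategy {
    if mmap_opened {
        Strategy::MemoryMap
    } else if multi_line {
        Strategy::MultiLineFromFile
    } else {
        // Fall through to the generic reader path.
        choose_for_reader(multi_line)
    }
}

/// Order of checks when starting from an arbitrary reader.
pub fn choose_for_reader(multi_line: bool) -> Strategy {
    if multi_line {
        Strategy::MultiLineFromReader
    } else {
        Strategy::ReadByLine
    }
}
```

Note that in this sketch a successful memory map wins even when multi-line mode is on, mirroring the early `return` in `search_file_maybe_path`; how the slice search then handles multi-line is outside what the excerpt shows.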
Quick Reference
Search Strategy Decision Tree
search_path / search_file
        │
        ▼
mmap available? ──yes──▶ search_slice (memory map)
        │
        no
        │
        ▼
multi_line needed? ──yes──▶ fill_multi_line_buffer_from_file
        │                              │
        no                             ▼
        │                       MultiLine search
        ▼
search_reader (incremental)
        │
        ▼
multi_line needed? ──yes──▶ fill_multi_line_buffer_from_reader
        │                              │
        no                             ▼
        │                       MultiLine search
        ▼
ReadByLine (rolling buffer)
Key Type Signatures
| Type | Purpose |
|---|---|
| SearcherBuilder | Fluent builder for Searcher configuration |
| Searcher | Main search orchestrator with reusable buffers |
| Config | All search parameters in one cloneable struct |
| BinaryDetection | Strategy for handling binary data (none/quit/convert) |
| Encoding | Wrapper around encoding_rs::Encoding |
| ConfigError | Build-time configuration errors |
Search Methods
| Method | Input Type | Best For |
|---|---|---|
| search_path | Path | Files (enables mmap) |
| search_file | &File | Open file handles |
| search_reader | impl Read | Arbitrary streams |
| search_slice | &[u8] | In-memory data |
Default Configuration
| Setting | Default | Notes |
|---|---|---|
| line_number | true | Small perf cost |
| bom_sniffing | true | Detects UTF-16 |
| binary | none | No detection |
| mmap | never | Explicit enable required |
| multi_line | false | Requires full file in memory |
| heap_limit | None | No limit |