
ripgrep crates/searcher/src/sink.rs: The Sink Trait

What This File Does

This file defines the Sink trait, which is the output interface for ripgrep's search engine. While the Matcher trait you studied earlier describes how to find matches, the Sink trait describes what to do with those matches once they're found. Every search operation in ripgrep needs somewhere to send its results—whether that's printing to a terminal, counting matches, collecting them into a data structure, or streaming them to another process.

The design follows a "push" model where the searcher drives execution, calling methods on the sink whenever significant events occur: matches found, context lines encountered, binary data detected, or search completion. This inverts the typical iterator pattern and creates a callback-style API that proves surprisingly flexible for building diverse output behaviors.


Section 1: Push vs Pull Iteration Models

Before diving into the code, let's understand the fundamental architectural choice this file represents. When designing a search API, you have two options for how results flow from the searcher to the consumer.

In a "pull" model, the consumer controls execution. You'd call something like searcher.next_match() repeatedly, getting results one at a time. This is how Rust iterators work—the caller decides when to advance. The pull model is intuitive and composes well with iterator adapters like map, filter, and collect.

The "push" model inverts this relationship. The searcher controls execution and pushes results to a callback as they're discovered. The caller provides a handler—in this case, a Sink implementation—and the searcher calls its methods when events occur. This model appears in JavaScript event handlers, Rust's std::fmt::Write, and streaming parsers.

Ripgrep chose the push model for several practical reasons. Search involves complex state management: tracking line numbers, managing buffers, handling context lines, detecting binary content. Exposing all this through an iterator interface would require either duplicating state on every yield or maintaining complex suspension points. The push model keeps that complexity inside the searcher, presenting a clean callback interface to consumers. The trait documentation is refreshingly honest about this—it explicitly acknowledges that the choice stems from "the complexity of the searcher implementation."
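To make the contrast concrete, here is a toy line search written both ways. Neither function exists in ripgrep: matches_pull and matches_push are hypothetical names used only for this sketch. In the pull version the caller decides when to advance. In the push version the search loop owns its state and hands each hit to a callback, which can ask it to stop.

```rust
// Hypothetical toy functions contrasting pull (external) and push (internal)
// iteration; none of these names exist in ripgrep.

/// Pull model: returns an iterator, and the caller decides when to advance.
/// Yields (1-based line number, line) for each line containing `needle`.
pub fn matches_pull<'a>(
    haystack: &'a str,
    needle: &'a str,
) -> impl Iterator<Item = (u64, &'a str)> + 'a {
    haystack
        .lines()
        .enumerate()
        .filter(move |(_, line)| line.contains(needle))
        .map(|(i, line)| (i as u64 + 1, line))
}

/// Push model: the search loop drives execution and calls `on_match` for
/// each hit. The callback returns `false` to ask the search to stop early.
pub fn matches_push<F>(haystack: &str, needle: &str, mut on_match: F)
where
    F: FnMut(u64, &str) -> bool,
{
    // All search state (here just the line counter) stays inside this loop.
    for (i, line) in haystack.lines().enumerate() {
        if line.contains(needle) && !on_match(i as u64 + 1, line) {
            return; // the consumer asked us to stop
        }
    }
}
```

The push version needs no suspended iterator state: the line counter is an ordinary loop variable. Ripgrep's real searcher tracks far more state than this, which is exactly why keeping it inside a loop pays off.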

See: Companion Code Section 1


Section 2: The SinkError Trait and Error Flexibility

Before the main Sink trait, the file defines SinkError—a companion trait that describes what error types are acceptable. This pattern might seem like over-engineering at first, but it solves a real problem in library design.

Different consumers of the search API have different error handling needs. A command-line tool might use std::io::Error for everything. A library embedded in a larger application might need to wrap errors in a custom type. A testing harness might want Box<dyn Error> for maximum flexibility. The SinkError trait lets the Sink trait be generic over error types without restricting what those types can be.

The trait requires a single method: error_message, which constructs an error from anything implementing Display. This is the universal escape hatch—any error can be turned into a string representation. Two additional methods, error_io and error_config, provide specialized constructors for common error sources, but they default to calling error_message. This means implementing SinkError requires writing only one method while still allowing optimizations for specific error kinds.

Notice that io::Error implements SinkError with an optimization: its error_io implementation just returns the error unchanged rather than converting it to a string and back. This pattern—provide reasonable defaults but allow optimizations—recurs throughout ripgrep's design.
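A minimal sketch of this pattern follows. It assumes a simplified trait that keeps only error_message and error_io; the real SinkError also has error_config, which takes the crate's own ConfigError type. The AppError type and the report_missing_file function are hypothetical.

```rust
use std::fmt;
use std::io;

/// Simplified stand-in for grep-searcher's `SinkError` (no `error_config`).
pub trait SinkError: Sized {
    /// The one required constructor: build an error from any displayable message.
    fn error_message<T: fmt::Display>(message: T) -> Self;

    /// Specialized constructor whose default falls back to `error_message`.
    fn error_io(err: io::Error) -> Self {
        Self::error_message(err)
    }
}

/// `io::Error` overrides `error_io` to pass the error through unchanged,
/// preserving its `ErrorKind` instead of round-tripping through a string.
impl SinkError for io::Error {
    fn error_message<T: fmt::Display>(message: T) -> io::Error {
        io::Error::new(io::ErrorKind::Other, message.to_string())
    }

    fn error_io(err: io::Error) -> io::Error {
        err
    }
}

/// Hypothetical application error that implements only the required method.
#[derive(Debug, PartialEq)]
pub struct AppError(pub String);

impl SinkError for AppError {
    fn error_message<T: fmt::Display>(message: T) -> AppError {
        AppError(message.to_string())
    }
}

/// Generic code (like a searcher) can build errors without knowing the type.
pub fn report_missing_file<E: SinkError>() -> E {
    E::error_io(io::Error::new(io::ErrorKind::NotFound, "no such file"))
}
```

For io::Error the NotFound kind survives intact; for AppError the default error_io turns the I/O error into its message.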

See: Companion Code Section 2


Section 3: The Core Sink Trait

The Sink trait itself is the centerpiece of this file. It defines an associated type for errors and six methods representing different events during a search. Only one method, matched, is required; all others have default implementations that do nothing and return success.

The associated type declaration type Error: SinkError creates a powerful constraint. Any error type used with Sink must implement SinkError, ensuring that the searcher can always construct appropriate errors regardless of what specific type the sink uses. Each implementation chooses its own concrete error type, and the trait bound guarantees at compile time that this type has the constructors the searcher needs.

The matched method is the heart of the trait. When the searcher finds a match, it calls this method with a reference to the searcher itself (allowing the sink to query searcher configuration) and a SinkMatch struct containing match details. The return type Result<bool, Self::Error> encodes three possible outcomes: continue searching (Ok(true)), stop searching gracefully (Ok(false)), or abort with an error (Err(...)). This tri-state return appears in every event method except finish and provides fine-grained control over search termination.
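The following sketch shows how a search loop might act on each of the three outcomes. It assumes a hypothetical, stripped-down Sink whose matched method receives just a line number and the line text (the real method receives &Searcher and &SinkMatch, and the real Error is bounded by SinkError). The Collect sink and the search function are illustrative only.

```rust
use std::io;

/// Hypothetical, stripped-down Sink used only for this sketch.
pub trait Sink {
    type Error;
    fn matched(&mut self, line_number: u64, line: &str) -> Result<bool, Self::Error>;
}

/// Toy search loop showing how each of the three outcomes is handled.
pub fn search<S: Sink>(haystack: &str, needle: &str, sink: &mut S) -> Result<(), S::Error> {
    for (i, line) in haystack.lines().enumerate() {
        if !line.contains(needle) {
            continue;
        }
        match sink.matched(i as u64 + 1, line) {
            Ok(true) => {}               // continue searching
            Ok(false) => return Ok(()),  // stop gracefully: not an error
            Err(err) => return Err(err), // abort and propagate the error
        }
    }
    Ok(())
}

/// Hypothetical sink: records line numbers, stops after a line containing
/// "halt", and fails on a line containing "fail".
#[derive(Default)]
pub struct Collect {
    pub lines: Vec<u64>,
}

impl Sink for Collect {
    type Error = io::Error;

    fn matched(&mut self, line_number: u64, line: &str) -> Result<bool, io::Error> {
        if line.contains("fail") {
            return Err(io::Error::new(io::ErrorKind::Other, "sink failed"));
        }
        self.lines.push(line_number);
        Ok(!line.contains("halt"))
    }
}
```

Searching "x1\ny\nx2 halt\nx3" for "x" records lines 1 and 3 and then stops without error; line 4 is never examined.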

See: Companion Code Section 3


Section 4: Lifecycle and Event Methods

Beyond matched, the trait defines five additional methods that together form a complete lifecycle for search operations. Understanding when each is called reveals the searcher's execution model.

The begin method fires once when a search starts, before examining any input. This is your hook for initialization—opening output files, writing headers, resetting counters. The finish method bookends the search, called after all input is processed (unless an error occurred). It receives a SinkFinish struct containing summary statistics like total bytes searched.

Between begin and finish, three methods handle the main search flow. We've discussed matched for actual matches. The context method handles context lines—the lines before and after matches that tools like grep traditionally show with -B and -A flags. The context_break method signals gaps between groups of context lines, typically rendered as -- separator lines in grep output.

The binary_data method addresses a practical concern: what happens when a search encounters binary content. It receives the absolute byte offset at which binary data was detected. By default, it simply continues, but implementations might want to skip the file, report a warning, or switch to a binary-aware display mode.

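Every method except finish follows the same Result<bool, Self::Error> pattern, meaning any of those events can stop the search; finish returns Result<(), Self::Error>, since by then there is nothing left to stop. You could, for example, implement a sink that stops after ten matches by counting calls in matched and returning Ok(false) from the tenth call, after that match has been handled.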
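Here is a sketch of that idea generalized to an arbitrary limit, together with begin and finish. The trait is a hypothetical simplification (no &Searcher, a bare line number in place of SinkMatch, a byte count in place of SinkFinish), and the driver loop is a toy rather than ripgrep's search loop. As an assumption of this sketch, finish still runs after a graceful stop but is skipped when an error propagates.

```rust
use std::io;

/// Hypothetical lifecycle-only Sink for this sketch.
pub trait Sink {
    type Error;

    fn begin(&mut self) -> Result<bool, Self::Error> {
        Ok(true)
    }

    fn matched(&mut self, line_number: u64) -> Result<bool, Self::Error>;

    // Different return type: there is nothing left to stop.
    fn finish(&mut self, _byte_count: u64) -> Result<(), Self::Error> {
        Ok(())
    }
}

/// Toy driver: begin, then matched per hit, then finish. `?` propagates
/// errors, so finish is skipped on error but runs after a graceful stop.
pub fn search<S: Sink>(haystack: &str, needle: &str, sink: &mut S) -> Result<(), S::Error> {
    let mut searched = 0u64;
    if sink.begin()? {
        for (i, line) in haystack.lines().enumerate() {
            searched += line.len() as u64 + 1; // toy accounting: assume '\n' after each line
            if line.contains(needle) && !sink.matched(i as u64 + 1)? {
                break;
            }
        }
    }
    sink.finish(searched)
}

/// Hypothetical sink that handles at most `limit` matches and logs each event.
pub struct FirstN {
    pub limit: usize,
    pub events: Vec<String>,
    count: usize,
}

impl FirstN {
    pub fn new(limit: usize) -> Self {
        FirstN { limit, events: Vec::new(), count: 0 }
    }
}

impl Sink for FirstN {
    type Error = io::Error;

    fn begin(&mut self) -> Result<bool, io::Error> {
        self.events.push("begin".to_string());
        Ok(true)
    }

    fn matched(&mut self, line_number: u64) -> Result<bool, io::Error> {
        self.count += 1;
        self.events.push(format!("match {line_number}"));
        // On the limit-th call the match is recorded, then we ask to stop.
        Ok(self.count < self.limit)
    }

    fn finish(&mut self, byte_count: u64) -> Result<(), io::Error> {
        self.events.push(format!("finish {byte_count}"));
        Ok(())
    }
}
```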

See: Companion Code Section 4


Section 5: Blanket Implementations for Indirection

The file includes two implementations of Sink that don't define new behavior—they forward to existing implementations through indirection. These are impl Sink for &mut S and impl Sink for Box<S>.

The mutable reference implementation lets you pass a borrowed sink to a search function. Without it, you'd have to move the sink into the search call, with no way to get it back afterward. With it, you can write something like searcher.search_slice(&matcher, haystack, &mut my_sink) and retain ownership of my_sink for further use, such as checking how many matches were found.

The Box implementation enables dynamic dispatch. Sometimes you don't know at compile time which sink implementation you'll use. Perhaps a command-line flag determines whether output goes to a terminal with colors or to a file without them. Boxing the sink allows runtime selection; with hypothetical ColorSink and PlainSink types, that might look like let sink: Box<dyn Sink<Error = io::Error>> = if use_colors { Box::new(ColorSink::new()) } else { Box::new(PlainSink::new()) };

Note the ?Sized bound on the Box implementation: impl<S: Sink + ?Sized> Sink for Box<S>. This allows Box<dyn Sink> to work, not just Box<ConcreteSink>. The ?Sized relaxes Rust's default assumption that generic types have known sizes at compile time.

These implementations consist entirely of delegation—each method just calls (**self).method(...). The double dereference unwraps the reference or box to reach the underlying sink. The #[inline] attributes hint that the compiler should eliminate this indirection when possible.
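A self-contained sketch of the two forwarding implementations, written against a hypothetical one-method Sink so it compiles without ripgrep. The Counter sink and the search function are illustrative; search takes its sink by value, so it is the &mut implementation that lets the caller keep the counter.

```rust
use std::convert::Infallible;

/// Hypothetical one-method Sink, just enough to show the forwarding impls.
pub trait Sink {
    type Error;
    fn matched(&mut self, line: &str) -> Result<bool, Self::Error>;
}

// Lending a sink: `&mut S` is itself a Sink that forwards to `S`.
impl<'a, S: Sink + ?Sized> Sink for &'a mut S {
    type Error = S::Error;

    #[inline]
    fn matched(&mut self, line: &str) -> Result<bool, S::Error> {
        (**self).matched(line)
    }
}

// Boxing a sink; `?Sized` admits trait objects like `Box<dyn Sink<Error = E>>`.
impl<S: Sink + ?Sized> Sink for Box<S> {
    type Error = S::Error;

    #[inline]
    fn matched(&mut self, line: &str) -> Result<bool, S::Error> {
        (**self).matched(line)
    }
}

/// Toy search that takes its sink by value.
pub fn search<S: Sink>(lines: &[&str], needle: &str, mut sink: S) -> Result<(), S::Error> {
    for line in lines {
        if line.contains(needle) && !sink.matched(line)? {
            break;
        }
    }
    Ok(())
}

/// Hypothetical sink that counts matches and can never fail.
#[derive(Default)]
pub struct Counter {
    pub n: usize,
}

impl Sink for Counter {
    type Error = Infallible;

    fn matched(&mut self, _line: &str) -> Result<bool, Infallible> {
        self.n += 1;
        Ok(true)
    }
}

/// Lends the counter to `search`, so it is still ours to inspect afterward.
pub fn count_borrowed(lines: &[&str], needle: &str) -> usize {
    let mut counter = Counter::default();
    search(lines, needle, &mut counter).unwrap();
    counter.n
}
```

The two impls compose: a &mut Box<dyn Sink<Error = Infallible>> is also a Sink, so a boxed trait object can be lent to search as well as moved into it.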

See: Companion Code Section 5


Section 6: Match Data with SinkMatch

When the searcher finds a match, it doesn't just report the matching bytes—it packages extensive metadata into a SinkMatch struct. This struct is the primary currency of information exchange between searcher and sink.

The struct contains the bytes of the matching line or lines (whole lines including their terminators, not just the matched substring), an absolute byte offset indicating where in the input those bytes start, an optional line number (available only if the searcher was configured to track them), the line terminator being used, and access to the underlying buffer. The fields are marked pub(crate), meaning the searcher can construct these structs, but external code must use the accessor methods.

The lines method returns an iterator over individual lines within the match. This matters for multi-line matches, which can span several lines. The iterator is a LineIter, a utility from the crate's lines module that respects the configured line terminator. That terminator is \n by default on every platform; when CRLF handling is enabled (ripgrep's --crlf flag), \r\n is treated as the line ending; and in unusual cases, such as NUL-delimited data searched with --null-data, it can be a different byte entirely.

The buffer method and bytes_range_in_buffer method provide a window into the searcher's internal state. This is unusual—most APIs would hide such details—but it enables advanced use cases. A syntax highlighter, for example, might need to look at surrounding context that isn't part of the formal match. By exposing the buffer and the match's location within it, ripgrep enables such extensions without complicating the common case.
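A simplified model of these accessors may help. The Match type below is hypothetical: it stores the terminator as a single byte rather than ripgrep's LineTerminator, and it assumes the buffer starts at absolute offset 0. Its lines method uses the standard library's slice::split_inclusive, so each yielded line keeps its terminator.

```rust
use std::ops::Range;

/// Hypothetical, simplified mirror of `SinkMatch` (single-byte terminator,
/// buffer assumed to begin at absolute offset 0).
pub struct Match<'b> {
    line_term: u8,
    bytes: &'b [u8],
    absolute_byte_offset: u64,
    line_number: Option<u64>,
    buffer: &'b [u8],
    bytes_range_in_buffer: Range<usize>,
}

impl<'b> Match<'b> {
    /// `range` must cover whole lines of `buffer`, terminators included.
    pub fn new(buffer: &'b [u8], range: Range<usize>, line_term: u8, line_number: Option<u64>) -> Match<'b> {
        Match {
            line_term,
            bytes: &buffer[range.clone()],
            absolute_byte_offset: range.start as u64,
            line_number,
            buffer,
            bytes_range_in_buffer: range,
        }
    }

    /// The matching line(s), terminators included.
    pub fn bytes(&self) -> &'b [u8] {
        self.bytes
    }

    pub fn absolute_byte_offset(&self) -> u64 {
        self.absolute_byte_offset
    }

    pub fn line_number(&self) -> Option<u64> {
        self.line_number
    }

    /// The whole buffer the searcher was looking at, not just the match.
    pub fn buffer(&self) -> &'b [u8] {
        self.buffer
    }

    /// Where `bytes()` sits inside `buffer()`.
    pub fn bytes_range_in_buffer(&self) -> Range<usize> {
        self.bytes_range_in_buffer.clone()
    }

    /// Each line in the match, split on the terminator and keeping it.
    pub fn lines(&self) -> impl Iterator<Item = &'b [u8]> {
        let bytes: &'b [u8] = self.bytes;
        let term = self.line_term;
        bytes.split_inclusive(move |&b| b == term)
    }
}
```

A consumer that wants surrounding text can slice buffer() around bytes_range_in_buffer(), for instance taking everything before range.start to see the preceding lines.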

See: Companion Code Section 6


Section 7: Context Lines and Search Summary

The SinkContext struct parallels SinkMatch but describes non-matching lines shown for context. It includes a SinkContextKind enum distinguishing three cases: Before (lines preceding a match), After (lines following a match), and Other (used in "passthru" mode, which shows all lines regardless of match status).

Context handling is surprisingly complex in a real grep implementation. Consider overlapping contexts: if matches are close together, their before and after contexts might overlap. The searcher handles this complexity internally, and the sink just sees a stream of context lines with appropriate kind markers. Context breaks (signaled by context_break) indicate gaps where lines were skipped.

The SinkFinish struct provides summary statistics after a search completes. Currently, it contains the total byte count searched and, if binary detection is enabled, the offset where binary data was first encountered. This struct uses pub(crate) fields like SinkMatch, keeping construction internal while exposing read-only accessors.

The binary detection offset is particularly useful for tools that want to report "binary file matches" messages. By recording where binary data was first seen, the sink can make informed decisions about how to present results—perhaps switching from line-by-line output to a simple "Binary file matches" message.
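To illustrate how a searcher can turn overlapping context windows into a clean event stream, here is a toy version of that bookkeeping. The Event enum, context_events, and describe are hypothetical. As a simplifying assumption, a line that falls in both the after-context of one match and the before-context of the next is labeled After here. Finish mirrors SinkFinish's two accessors, and the summary wording is invented for the example.

```rust
/// Hypothetical mirror of `SinkContextKind` (this toy never emits `Other`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContextKind {
    Before,
    After,
    Other,
}

/// The event stream a sink would observe (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Event {
    Match(u64),
    Context(ContextKind, u64),
    Break,
}

/// Toy searcher-side bookkeeping: merges overlapping windows and emits
/// `Break` only between separated groups of shown lines.
pub fn context_events(lines: &[&str], needle: &str, before: usize, after: usize) -> Vec<Event> {
    let is_match: Vec<bool> = lines.iter().map(|l| l.contains(needle)).collect();
    let mut events = Vec::new();
    let mut last_shown: Option<usize> = None;
    let mut last_match: Option<usize> = None;
    for i in 0..lines.len() {
        // Within `after` lines of the previous match? (m < i always holds.)
        let in_after = last_match.map_or(false, |m| i - m <= after);
        // Within `before` lines of an upcoming match?
        let in_before = ((i + 1)..=(i + before)).any(|j| j < lines.len() && is_match[j]);
        let event = if is_match[i] {
            Event::Match(i as u64 + 1)
        } else if in_after {
            Event::Context(ContextKind::After, i as u64 + 1)
        } else if in_before {
            Event::Context(ContextKind::Before, i as u64 + 1)
        } else {
            continue; // line is not shown at all
        };
        if let Some(prev) = last_shown {
            if i > prev + 1 {
                events.push(Event::Break); // gap between shown groups
            }
        }
        events.push(event);
        last_shown = Some(i);
        if is_match[i] {
            last_match = Some(i);
        }
    }
    events
}

/// Hypothetical mirror of `SinkFinish`'s two accessors.
pub struct Finish {
    pub byte_count: u64,
    pub binary_byte_offset: Option<u64>,
}

/// Invented wording; ripgrep's printers format their own messages.
pub fn describe(finish: &Finish) -> String {
    match finish.binary_byte_offset {
        Some(offset) => format!("binary data first seen at byte {offset}"),
        None => format!("searched {} bytes", finish.byte_count),
    }
}
```

With one line of context on each side, the lines x, m, x produce Match, After, Match with no break between them: the two windows overlap, so the sink sees one continuous group.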

See: Companion Code Section 7


Section 8: The Convenience Sinks Module

The file concludes with a sinks module containing pre-built Sink implementations for common use cases. These are adapter types that wrap closures, trading flexibility for convenience.

The UTF8 sink is the most commonly useful. It wraps a closure that takes a line number and a string slice and returns Result<bool, io::Error>, and it handles the conversion from raw bytes internally. If the match contains invalid UTF-8, it returns an error. If line numbers weren't enabled in the searcher, it returns an error. These restrictions simplify the closure: you're guaranteed valid UTF-8 and a line number.

The Lossy sink relaxes the UTF-8 requirement. Invalid bytes are replaced with the Unicode replacement character (�) rather than causing an error. This relies on String::from_utf8_lossy, which borrows when the input is already valid and allocates only when a replacement is needed. A comment in the code notes that this allocation could in theory be amortized, but the standard library doesn't provide an API for doing so.

The Bytes sink skips UTF-8 conversion entirely, passing raw bytes to the closure. This is appropriate when you need to handle arbitrary binary data or when you'll do your own encoding handling.

All three convenience sinks wrap tuple structs around the closure: pub struct UTF8<F>(pub F). The public inner field allows construction without a new method: sinks::UTF8(|line, text| { ... }). This is a common Rust pattern for simple wrapper types.
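A sketch of the closure-wrapper pattern follows, assuming a simplified Match type with public bytes and line_number fields and a one-method Sink trait (both hypothetical). UTF8 and Lossy follow the behavior described above. Note the where clause on each tuple struct: it gives the compiler the closure's expected signature when you write UTF8(|n, s| ...), so the argument types need no annotations.

```rust
use std::io;

/// Hypothetical simplified match data (the real `SinkMatch` has more fields).
pub struct Match<'a> {
    pub bytes: &'a [u8],
    pub line_number: Option<u64>,
}

/// Hypothetical one-method Sink trait.
pub trait Sink {
    type Error;
    fn matched(&mut self, mat: &Match<'_>) -> Result<bool, Self::Error>;
}

fn require_line_number(mat: &Match<'_>) -> Result<u64, io::Error> {
    mat.line_number
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "line numbers not enabled"))
}

/// Strict wrapper: errors on invalid UTF-8 or missing line numbers.
pub struct UTF8<F>(pub F)
where
    F: FnMut(u64, &str) -> Result<bool, io::Error>;

impl<F> Sink for UTF8<F>
where
    F: FnMut(u64, &str) -> Result<bool, io::Error>,
{
    type Error = io::Error;

    fn matched(&mut self, mat: &Match<'_>) -> Result<bool, io::Error> {
        let text = std::str::from_utf8(mat.bytes)
            .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err))?;
        let line_number = require_line_number(mat)?;
        (self.0)(line_number, text)
    }
}

/// Lenient wrapper: invalid bytes become U+FFFD instead of an error.
pub struct Lossy<F>(pub F)
where
    F: FnMut(u64, &str) -> Result<bool, io::Error>;

impl<F> Sink for Lossy<F>
where
    F: FnMut(u64, &str) -> Result<bool, io::Error>,
{
    type Error = io::Error;

    fn matched(&mut self, mat: &Match<'_>) -> Result<bool, io::Error> {
        // Borrows when the bytes are valid UTF-8; allocates only to substitute.
        let text = String::from_utf8_lossy(mat.bytes);
        let line_number = require_line_number(mat)?;
        (self.0)(line_number, &text)
    }
}
```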

See: Companion Code Section 8


Section 9: Connection to the Builder Pattern Ecosystem

The Sink trait integrates with ripgrep's broader builder pattern infrastructure. Looking at how Searcher is configured (in searcher/mod.rs) and how printers work (in printer/standard.rs), you'll see a recurring pattern: builders construct configured objects, and those objects interact through traits like Sink.

When you build a Searcher with specific options (line numbers enabled, context lines set, binary detection configured), those choices affect what the Sink receives. A searcher configured without line counting will never populate the line_number field in SinkMatch. A searcher configured with neither context lines nor passthru mode will never call the context method. The convenience sinks in the sinks module enforce one of these expectations directly: they return an error if line numbers aren't enabled.

The grep-printer crate supplies the most sophisticated Sink implementations: its Standard, JSON, and Summary printers each hand out a sink (StandardSink, JSONSink, SummarySink) that handles concerns such as colored output, JSON Lines formatting, or summary statistics. Because these types implement Sink, they can be passed to any searcher without the searcher knowing anything about how results will be displayed. This separation of concerns (finding matches vs. displaying them) is the fundamental abstraction this trait enables.
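A toy version of this configuration flow is sketched below. It assumes a hypothetical SearcherBuilder with a single line_number option (grep-searcher's real builder has many more settings) and a hypothetical Numbered sink that requires line numbers in the same way the convenience sinks do.

```rust
/// Hypothetical toy builder with a single option.
pub struct SearcherBuilder {
    line_number: bool,
}

impl SearcherBuilder {
    pub fn new() -> SearcherBuilder {
        SearcherBuilder { line_number: true }
    }

    pub fn line_number(&mut self, yes: bool) -> &mut SearcherBuilder {
        self.line_number = yes;
        self
    }

    pub fn build(&self) -> Searcher {
        Searcher { line_number: self.line_number }
    }
}

pub struct Searcher {
    line_number: bool,
}

/// Hypothetical minimal Sink: line numbers arrive only if configured.
pub trait Sink {
    type Error;
    fn matched(&mut self, line_number: Option<u64>, line: &str) -> Result<bool, Self::Error>;
}

impl Searcher {
    pub fn search_str<S: Sink>(&self, haystack: &str, needle: &str, mut sink: S) -> Result<(), S::Error> {
        for (i, line) in haystack.lines().enumerate() {
            if !line.contains(needle) {
                continue;
            }
            // The searcher's configuration decides what the sink receives.
            let line_number = if self.line_number { Some(i as u64 + 1) } else { None };
            if !sink.matched(line_number, line)? {
                break;
            }
        }
        Ok(())
    }
}

/// Hypothetical sink that requires line numbers.
pub struct Numbered<'a>(pub &'a mut Vec<u64>);

impl Sink for Numbered<'_> {
    type Error = String;

    fn matched(&mut self, line_number: Option<u64>, _line: &str) -> Result<bool, String> {
        let n = line_number.ok_or("line numbers not enabled")?;
        self.0.push(n);
        Ok(true)
    }
}
```

The same Numbered sink succeeds or fails depending only on how the searcher was built; neither side knows the other's concrete type.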

See: Companion Code Section 9


Key Takeaways

First, the push model inverts typical iteration, having the searcher call into your code rather than your code pulling from the searcher—a design driven by the complexity of search state management.

Second, the SinkError trait demonstrates how to make a generic trait flexible over error types while ensuring necessary capabilities through trait bounds.

Third, the tri-state return pattern of Result<bool, Error> elegantly encodes continue, stop gracefully, and abort with error in a single type.

Fourth, blanket implementations for &mut S and Box<S> enable borrowed sinks and dynamic dispatch without requiring implementors to think about these patterns.

Fifth, separating match discovery from match handling through the Sink trait creates a clean abstraction boundary that allows searchers and output formatters to evolve independently.

Sixth, the convenience sinks in the sinks module show how wrapper types around closures can provide ergonomic APIs for common cases while the full trait remains available for complex ones.


How does the searcher actually call these sink methods? Start with crates/searcher/src/searcher/mod.rs, which defines Searcher; the loops that drive sink invocation live alongside it in core.rs and glue.rs in the same directory.

What do production-quality sink implementations look like? The crates/printer/src/standard.rs file implements Sink for terminal output with colors, line numbers, and context.

How does this trait connect to the configuration builders? Examine crates/searcher/src/searcher/mod.rs for SearcherBuilder, which creates searchers whose options determine what information sinks receive.