Ripgrep Autopsy: Anatomy of an Idiomatic Rust CLI Application

"ripgrep was initially a larger pile of tightly coupled code; it did not start out with most of its logic separated into crates."
— BurntSushi (Andrew Gallant)

Overview

Ripgrep (rg) is a widely studied, actively maintained Rust codebase. Written by Andrew Gallant, who also wrote regex, memchr, and other foundational Rust crates, it is a useful case study in:

  • Workspace-based modular architecture
  • Separation of concerns via crate boundaries
  • Performance-first design without sacrificing correctness
  • Idiomatic error handling
  • Cross-platform CLI development

This autopsy examines ripgrep's architecture from both high and low levels, extracting patterns and idioms applicable to any serious Rust application.


1. High-Level Architecture

1.1 The Workspace Pattern

Ripgrep uses Cargo's workspace feature to organize code into focused, reusable crates:

# From Cargo.toml
[workspace]
members = [
  "crates/globset",    # Glob pattern matching
  "crates/grep",       # Facade crate
  "crates/cli",        # CLI utilities
  "crates/matcher",    # Abstract matching trait
  "crates/pcre2",      # PCRE2 regex engine
  "crates/printer",    # Output formatting
  "crates/regex",      # Rust regex integration
  "crates/searcher",   # File searching logic
  "crates/ignore",     # .gitignore handling
]

Key Insight: The main binary is defined separately:

[[bin]]
bench = false
path = "crates/core/main.rs"
name = "rg"

This means the crates/core/ directory contains the CLI application itself, while all the library logic lives in separate, independently testable crates.

1.2 The Dependency Graph

                    ┌─────────────────┐
                    │   rg (binary)   │
                    │ crates/core/    │
                    └────────┬────────┘
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
         ┌────────┐    ┌──────────┐   ┌──────────┐
         │ ignore │    │   grep   │   │   cli    │
         │        │    │ (facade) │   │          │
         └────┬───┘    └────┬─────┘   └──────────┘
              │             │
              │    ┌────────┼────────┬─────────┐
              │    │        │        │         │
              ▼    ▼        ▼        ▼         ▼
         ┌────────┐  ┌─────────┐ ┌────────┐ ┌───────┐
         │globset │  │ matcher │ │searcher│ │printer│
         └────────┘  └────┬────┘ └────────┘ └───────┘
                    ┌─────┴─────┐
                    │           │
                    ▼           ▼
               ┌───────┐   ┌───────┐
               │ regex │   │ pcre2 │
               └───────┘   └───────┘

1.3 The Facade Pattern (grep crate)

The grep crate doesn't contain significant logic — it's a facade that re-exports functionality from sub-crates:

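// crates/grep/src/lib.rs
pub extern crate grep_cli as cli;
pub extern crate grep_matcher as matcher;
#[cfg(feature = "pcre2")]
pub extern crate grep_pcre2 as pcre2;
pub extern crate grep_printer as printer;
pub extern crate grep_regex as regex;
pub extern crate grep_searcher as searcher;

Why This Pattern?

  1. Simplifies downstream usage — Users depend on one crate instead of several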
  2. Hides internal organization — Implementation can be refactored without breaking API
  3. Version coordination — All sub-crates move in lockstep

2. Core Abstractions

2.1 The Matcher Trait

The cornerstone of ripgrep's extensibility is the Matcher trait in grep-matcher:

pub trait Matcher {
    type Captures: Captures;
    type Error: std::fmt::Display;

    fn find_at(&self, haystack: &[u8], at: usize) -> Result<Option<Match>, Self::Error>;
    fn new_captures(&self) -> Result<Self::Captures, Self::Error>;
    fn capture_count(&self) -> usize;
    fn line_terminator(&self) -> Option<LineTerminator>;

    // ... more methods
}

Key Design Decisions:

  1. Works on &[u8], not &str — Handles non-UTF8 files gracefully
  2. Associated types for flexibility — Different matchers can have different capture/error types
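  3. Optional line terminator — A matcher can declare that no match ever contains the terminator byte, which permits faster line-oriented searching; with --null-data that byte is \0 instead of \n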
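To make these design decisions concrete, here is a minimal sketch of a byte-oriented matcher trait with one implementation. ToyMatcher and LiteralMatcher are hypothetical names, and the trait is much smaller than the real one; it only shows how &[u8] haystacks and an associated error type fit together.

```rust
use std::fmt;

/// A half-open byte range [start, end) within the haystack.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Match {
    pub start: usize,
    pub end: usize,
}

/// Simplified, hypothetical version of a byte-oriented matcher trait.
pub trait ToyMatcher {
    type Error: fmt::Display;

    /// Find the first match starting at or after byte offset `at`.
    fn find_at(&self, haystack: &[u8], at: usize) -> Result<Option<Match>, Self::Error>;

    /// Provided method built on top of the single required method.
    fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> {
        Ok(self.find_at(haystack, 0)?.is_some())
    }
}

/// Matches a fixed byte string. It cannot fail, so its error type is
/// uninhabited.
pub struct LiteralMatcher {
    needle: Vec<u8>,
}

impl LiteralMatcher {
    pub fn new(needle: &[u8]) -> Self {
        Self { needle: needle.to_vec() }
    }
}

impl ToyMatcher for LiteralMatcher {
    type Error = std::convert::Infallible;

    fn find_at(&self, haystack: &[u8], at: usize) -> Result<Option<Match>, Self::Error> {
        if self.needle.is_empty() || at > haystack.len() {
            return Ok(None);
        }
        let found = haystack[at..]
            .windows(self.needle.len())
            .position(|w| w == self.needle.as_slice());
        Ok(found.map(|i| Match { start: at + i, end: at + i + self.needle.len() }))
    }
}
```

Because the haystack is a byte slice, the matcher works unchanged on input that is not valid UTF-8, such as Latin-1 text.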

2.2 The Sink Trait

Output formatting is abstracted through the Sink trait:

pub trait Sink {
    type Error: SinkError;

    fn matched(&mut self, searcher: &Searcher, mat: &SinkMatch<'_>) -> Result<bool, Self::Error>;
    fn context(&mut self, searcher: &Searcher, context: &SinkContext<'_>) -> Result<bool, Self::Error>;
    fn finish(&mut self, searcher: &Searcher, sink_finish: &SinkFinish) -> Result<(), Self::Error>;
    // ...
}

This enables:

  • Standard grep-like output
  • JSON Lines output
  • Custom output formats without modifying search logic
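The decoupling can be sketched with a much smaller trait. ToySink, CountSink, FirstLineSink, and search_lines below are hypothetical names for illustration; the point is that one search loop drives two unrelated output behaviors.

```rust
/// Hypothetical, simplified sink: it receives each matching line.
pub trait ToySink {
    type Error;

    /// Return Ok(false) to ask the search to stop early.
    fn matched(&mut self, line_number: u64, line: &[u8]) -> Result<bool, Self::Error>;
}

/// Counts matching lines and never stops the search.
pub struct CountSink {
    pub count: u64,
}

impl ToySink for CountSink {
    type Error = std::convert::Infallible;

    fn matched(&mut self, _line_number: u64, _line: &[u8]) -> Result<bool, Self::Error> {
        self.count += 1;
        Ok(true)
    }
}

/// Records the first matching line, then stops the search.
pub struct FirstLineSink {
    pub first: Option<(u64, Vec<u8>)>,
}

impl ToySink for FirstLineSink {
    type Error = std::convert::Infallible;

    fn matched(&mut self, line_number: u64, line: &[u8]) -> Result<bool, Self::Error> {
        self.first = Some((line_number, line.to_vec()));
        Ok(false)
    }
}

/// The search loop knows nothing about what the sink does with matches.
pub fn search_lines<S: ToySink>(
    haystack: &[u8],
    needle: &[u8],
    sink: &mut S,
) -> Result<(), S::Error> {
    for (i, line) in haystack.split(|&b| b == b'\n').enumerate() {
        let hit = !needle.is_empty() && line.windows(needle.len()).any(|w| w == needle);
        if hit && !sink.matched(i as u64 + 1, line)? {
            break;
        }
    }
    Ok(())
}
```

Adding a new output format means writing another ToySink implementation; search_lines stays untouched.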

2.3 Searcher Configuration

The Searcher uses the builder pattern:

let mut searcher = SearcherBuilder::new()
    .line_number(true)                   // report line numbers for matches
    .heap_limit(Some(64 * 1024 * 1024))  // cap heap usage at 64 MiB
    .build();
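For readers who want to apply the same pattern in their own code, here is a minimal sketch of a builder with defaults. ToySearcher and ToySearcherBuilder are hypothetical, and the default values are assumptions chosen for the example rather than ripgrep's.

```rust
/// The immutable, fully configured object.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToySearcher {
    pub line_number: bool,
    pub heap_limit: Option<usize>,
}

/// Collects options; every field starts at a sensible default.
#[derive(Clone, Debug)]
pub struct ToySearcherBuilder {
    line_number: bool,
    heap_limit: Option<usize>,
}

impl ToySearcherBuilder {
    pub fn new() -> Self {
        // Assumed defaults for this sketch.
        Self { line_number: true, heap_limit: None }
    }

    /// Setters take and return `&mut Self` so calls can be chained.
    pub fn line_number(&mut self, yes: bool) -> &mut Self {
        self.line_number = yes;
        self
    }

    pub fn heap_limit(&mut self, limit: Option<usize>) -> &mut Self {
        self.heap_limit = limit;
        self
    }

    /// `build` borrows the builder, so one builder can make many searchers.
    pub fn build(&self) -> ToySearcher {
        ToySearcher { line_number: self.line_number, heap_limit: self.heap_limit }
    }
}
```

Callers only mention the options they care about; everything else keeps its default.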

3. The CLI Architecture

3.1 Two-Level Argument Parsing

Ripgrep uses a sophisticated two-level argument system:

Command Line String
    ┌─────────┐
    │ lexopt  │  ← Low-level tokenization
    └────┬────┘
    ┌──────────┐
    │ LowArgs  │  ← Direct representation of CLI flags
    └────┬─────┘
    ┌──────────┐
    │ HiArgs   │  ← Validated, resolved configuration
    └──────────┘

LowArgs: Mirrors CLI flags directly

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Command {
    Search,
    SearchParallel,
    SearchNever,
    Files,
    FilesParallel,
    TypeList,
    PCRE2Version,
    RipgrepVersion,
    VersionLong,
}

HiArgs: High-level, validated configuration with computed values

pub struct HiArgs {
    // Configuration fields computed from LowArgs
    // Handles precedence, defaults, config files, env vars
}

impl HiArgs {
    pub fn matcher(&self) -> Result<impl Matcher>;
    pub fn searcher(&self) -> SearcherBuilder;
    pub fn printer(&self, mode: SearchMode, wtr: impl WriteColor) -> impl Sink;
    pub fn walk_builder(&self) -> Result<WalkBuilder>;
}
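The low-to-high conversion step can be sketched as follows. ToyLowArgs and ToyHiArgs are hypothetical types with only a few fields, and the validation rules are assumptions made for the example; the real structures carry far more state.

```rust
/// Low level: a direct record of what the user typed, with no validation.
#[derive(Debug, Default)]
pub struct ToyLowArgs {
    pub patterns: Vec<String>,
    pub threads: Option<usize>,
    pub ignore_case: bool,
    pub smart_case: bool,
}

/// High level: validated values that the rest of the program can trust.
#[derive(Debug, PartialEq, Eq)]
pub struct ToyHiArgs {
    pub pattern: String,
    pub threads: usize,
    pub case_insensitive: bool,
}

impl ToyHiArgs {
    /// All validation and defaulting happens in this one place.
    pub fn from_low_args(low: ToyLowArgs) -> Result<ToyHiArgs, String> {
        if low.patterns.is_empty() {
            return Err("no pattern given".to_string());
        }
        // Several patterns are combined into one alternation.
        let pattern = low.patterns.join("|");
        // Default to one thread and never allow zero.
        let threads = low.threads.unwrap_or(1).max(1);
        // Smart case: insensitive only if the pattern has no uppercase.
        let has_upper = pattern.chars().any(char::is_uppercase);
        let case_insensitive = low.ignore_case || (low.smart_case && !has_upper);
        Ok(ToyHiArgs { pattern, threads, case_insensitive })
    }
}
```

Code that runs after this conversion never has to ask whether a flag was given or whether two flags conflict; it reads the resolved value.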

3.2 The Haystack Abstraction

// crates/core/haystack.rs
pub struct Haystack {
    // Represents something to search over (file, stdin, etc.)
}

This abstraction normalizes:

  • Regular files
  • Standard input
  • Files from directory walking
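A rough sketch of such a normalizing type is shown below. ToyHaystack is hypothetical, and both the "-" convention for standard input and the "<stdin>" display label are assumptions of this example.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical haystack: a path plus how it was discovered.
#[derive(Debug, Clone)]
pub struct ToyHaystack {
    path: PathBuf,
    explicit: bool,
}

impl ToyHaystack {
    /// A path the user named directly on the command line.
    pub fn explicit(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), explicit: true }
    }

    /// A path found while walking a directory.
    pub fn walked(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), explicit: false }
    }

    /// Assumption: an explicit "-" means standard input. A walked file that
    /// happens to be named "-" is still a regular file.
    pub fn is_stdin(&self) -> bool {
        self.explicit && self.path.as_path() == Path::new("-")
    }

    pub fn is_explicit(&self) -> bool {
        self.explicit
    }

    /// The label to show in output.
    pub fn display_path(&self) -> String {
        if self.is_stdin() {
            "<stdin>".to_string()
        } else {
            self.path.display().to_string()
        }
    }
}
```

Downstream code handles one type and asks it questions, instead of branching on where each input came from.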

3.3 The Search Worker

// crates/core/search.rs
pub struct SearchWorker<M, S, P> {
    matcher: M,
    searcher: S,
    printer: P,
}

The SearchWorker coordinates:

  1. Matcher — Pattern matching engine
  2. Searcher — File reading and searching
  3. Printer — Output formatting


4. Performance Patterns

4.1 Memory-Mapped vs Buffered I/O

Ripgrep dynamically chooses search strategy:

// Pseudo-code from search logic
if single_large_file && file_fits_in_memory {
    use_memory_map();  // Faster for big single files
} else {
    use_incremental_buffer();  // Better for many small files
}
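The same decision can be written as a small pure function, which makes it easy to test. This is a sketch only: Strategy, choose_strategy, and the LARGE_FILE_BYTES cutoff are hypothetical, and ripgrep's actual heuristics are not reproduced here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    MemoryMap,
    IncrementalBuffer,
}

/// Assumed cutoff for "large"; a real tool would choose this by measurement.
const LARGE_FILE_BYTES: u64 = 1 << 20;

/// Mirrors the pseudo-code: map one large file, buffer everything else.
pub fn choose_strategy(file_count: usize, file_len: u64, mappable_bytes: u64) -> Strategy {
    let single_large_file = file_count == 1 && file_len >= LARGE_FILE_BYTES;
    let file_fits = file_len <= mappable_bytes;
    if single_large_file && file_fits {
        Strategy::MemoryMap
    } else {
        Strategy::IncrementalBuffer
    }
}
```

Keeping the policy separate from the I/O code means the heuristic can be changed or benchmarked without touching the search loop.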

4.2 Parallel Directory Walking

The ignore crate provides lock-free parallel directory traversal:

args.walk_builder()?.build_parallel().run(|| {
    // This closure is called on multiple threads
    let mut searcher = searcher.clone();

    Box::new(move |entry| {
        // Process each file in parallel
        WalkState::Continue
    })
});

Key Pattern: AtomicBool for coordination:

let matched = AtomicBool::new(false);
let searched = AtomicBool::new(false);

// In worker threads:
matched.store(true, Ordering::SeqCst);
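The flag pattern works the same way with plain standard-library threads. The sketch below uses std::thread::scope instead of the ignore crate's walker, so it is a stand-in for the technique rather than ripgrep's code; any_chunk_matches is a hypothetical function.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Searches each chunk on its own thread and reports whether any matched.
pub fn any_chunk_matches(chunks: &[&[u8]], needle: &[u8]) -> bool {
    // Shared flag: workers only ever set it to true, so no lock is needed.
    let matched = AtomicBool::new(false);

    thread::scope(|scope| {
        for chunk in chunks {
            let matched = &matched;
            scope.spawn(move || {
                let hit = !needle.is_empty()
                    && chunk.windows(needle.len()).any(|w| w == needle);
                if hit {
                    matched.store(true, Ordering::SeqCst);
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });

    matched.load(Ordering::SeqCst)
}
```

Since the only shared state is a single boolean that moves in one direction, an atomic is enough and the workers never block each other.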

4.3 Literal Optimization

The grep-regex crate extracts literals for fast pre-filtering:

// crates/regex/src/literal.rs
// Extracts literal substrings from regex patterns
// Uses memchr for SIMD-accelerated byte searching

If the pattern is foo.*bar, every match must contain the literal foo. Ripgrep can therefore search for foo first with memchr's SIMD-accelerated substring search, and run the full regex only on lines that contain it.
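The prefilter idea can be shown with a hand-written matcher for this one pattern. This is a simplified sketch: find is a naive stand-in for an accelerated substring search, the "full regex" step is reduced to checking for bar after foo, and filtering is done per line, which is an assumption of the example.

```rust
/// Naive substring search; a stand-in for a SIMD-accelerated routine.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Returns the 1-based numbers of lines matching `foo.*bar`.
pub fn matching_lines(haystack: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, line) in haystack.split(|&b| b == b'\n').enumerate() {
        // Cheap prefilter: a line without the required literal cannot match.
        let Some(foo_at) = find(line, b"foo") else { continue };
        // Verification: only candidate lines pay for the remaining check.
        if find(&line[foo_at + 3..], b"bar").is_some() {
            out.push(i + 1);
        }
    }
    out
}
```

Checking only the first foo is sufficient here: if any later foo is followed by bar, that same bar also follows the first foo.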

4.4 Smart Defaults

// Only count lines if we're going to display them
SearcherBuilder::new().line_number(needs_line_numbers).build()

Counting lines is fast but not free — ripgrep only pays the cost when needed.
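One way to pay for line counting only on demand is to count newlines lazily, between consecutive matches, and skip the work entirely when line numbers are off. The function find_all below is a hypothetical sketch of that idea and not the searcher's actual implementation.

```rust
/// Returns (byte offset, optional 1-based line number) for each occurrence
/// of `needle`. Newlines are counted only when `line_numbers` is true.
pub fn find_all(haystack: &[u8], needle: &[u8], line_numbers: bool) -> Vec<(usize, Option<u64>)> {
    let mut out = Vec::new();
    if needle.is_empty() {
        return out;
    }
    let mut line = 1u64; // line number at `counted_up_to`
    let mut counted_up_to = 0usize; // bytes already scanned for newlines
    let mut pos = 0usize; // where the next search starts

    while let Some(i) = haystack[pos..].windows(needle.len()).position(|w| w == needle) {
        let start = pos + i;
        let number = if line_numbers {
            // Count only the newlines since the previous match.
            let newlines = haystack[counted_up_to..start]
                .iter()
                .filter(|&&b| b == b'\n')
                .count();
            line += newlines as u64;
            counted_up_to = start;
            Some(line)
        } else {
            None
        };
        out.push((start, number));
        pos = start + needle.len();
    }
    out
}
```

With line_numbers set to false the loop never inspects a byte for b'\n', so callers that print only counts or file names skip that scan.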


5. Error Handling Idioms

5.1 The anyhow Crate

Ripgrep uses anyhow for top-level error handling:

// crates/core/main.rs
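fn main() {
    if let Err(err) = run() {
        // "{:#}" prints the error together with its chain of causes
        eprintln!("{:#}", err);
        // Exit code 2 signals an error; 1 is reserved for "no match"
        std::process::exit(2);
    }
}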

fn run() -> anyhow::Result<()> {
    // Application logic with ? operator
}
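The same shape can be tried without adding a dependency. The sketch below swaps anyhow for the standard library's Box<dyn Error>, which is a plainly different and less featureful mechanism (no context chain); run and report are hypothetical functions, and the exit codes follow the grep convention of 2 for errors as an assumption.

```rust
use std::error::Error;
use std::fs;

/// Application logic: `?` converts any concrete error into Box<dyn Error>.
pub fn run(path: &str) -> Result<usize, Box<dyn Error>> {
    let text = fs::read_to_string(path)?;
    Ok(text.lines().count())
}

/// What `main` would do with the result: report once, pick an exit code.
pub fn report(result: Result<usize, Box<dyn Error>>) -> i32 {
    match result {
        Ok(_) => 0,
        Err(err) => {
            eprintln!("{err}");
            2
        }
    }
}
```

A real main would call std::process::exit(report(run(path))). Errors are printed in exactly one place, and everything below it simply propagates with ?.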

5.2 Custom Error Types in Libraries

Library crates define specific error types:

// In grep-regex
pub struct Error {
    kind: ErrorKind,
}

pub enum ErrorKind {
    Regex(String),
    // ... other variants
}

5.3 Result Type Aliases

// Common pattern in each crate
pub type Result<T> = std::result::Result<T, Error>;
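Sections 5.2 and 5.3 combine naturally. Here is a minimal sketch of a library-style error with a private kind field, a Display impl, and a crate-local Result alias. The module name toy_regex, the variants, and check_pattern are all hypothetical.

```rust
pub mod toy_regex {
    use std::fmt;

    /// Opaque error; callers inspect it through `kind()`.
    #[derive(Debug)]
    pub struct Error {
        kind: ErrorKind,
    }

    /// Non-exhaustive so variants can be added without a breaking change.
    #[derive(Debug)]
    #[non_exhaustive]
    pub enum ErrorKind {
        Syntax(String),
        Banned(u8),
    }

    impl Error {
        pub fn kind(&self) -> &ErrorKind {
            &self.kind
        }
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match &self.kind {
                ErrorKind::Syntax(msg) => write!(f, "syntax error: {msg}"),
                ErrorKind::Banned(b) => write!(f, "pattern contains banned byte 0x{b:02x}"),
            }
        }
    }

    impl std::error::Error for Error {}

    /// Crate-local alias: signatures mention only the success type.
    pub type Result<T> = std::result::Result<T, Error>;

    /// Toy validation standing in for real pattern compilation.
    pub fn check_pattern(pattern: &str) -> Result<()> {
        if pattern.bytes().any(|b| b == 0) {
            return Err(Error { kind: ErrorKind::Banned(0) });
        }
        if pattern.matches('(').count() != pattern.matches(')').count() {
            let msg = format!("unbalanced parentheses in {pattern:?}");
            return Err(Error { kind: ErrorKind::Syntax(msg) });
        }
        Ok(())
    }
}
```

Because the error implements std::error::Error, an application using anyhow or Box<dyn Error> at the top level can propagate it with ? unchanged.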

6. Cross-Platform Patterns

6.1 Conditional Compilation

#[cfg(all(target_env = "musl", target_pointer_width = "64"))]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

#[cfg(windows)]
mod windows_specific;

6.2 Platform-Specific Allocators

# In Cargo.toml
[target.'cfg(all(target_env = "musl", target_pointer_width = "64"))'.dependencies.tikv-jemallocator]
version = "0.6.0"

Musl builds use jemalloc because musl's allocator is slower for ripgrep's workload.

6.3 Build Script (build.rs)

fn main() {
    set_git_revision_hash();
    set_windows_exe_options();
}

fn set_git_revision_hash() {
    // Embeds git hash at compile time via RIPGREP_BUILD_GIT_HASH
}

fn set_windows_exe_options() {
    // Enables long path support on Windows
    // Embeds manifest file
}
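A build script can pass a value to the compiler by printing a cargo:rustc-env directive. The sketch below shows one plausible way to do that for a git hash; the helper names and the exact git arguments are assumptions for illustration, and only the environment variable name comes from the text above.

```rust
use std::process::Command;

/// Asks git for a short revision hash; returns None if git is missing,
/// fails, or prints nothing (for example when building from a tarball).
pub fn git_revision_hash() -> Option<String> {
    let output = Command::new("git")
        .args(["rev-parse", "--short=10", "HEAD"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let rev = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if rev.is_empty() { None } else { Some(rev) }
}

/// Builds the line a build script prints so that the main crate can read
/// the value at compile time with option_env!.
pub fn rustc_env_directive(rev: Option<&str>) -> Option<String> {
    rev.map(|r| format!("cargo:rustc-env=RIPGREP_BUILD_GIT_HASH={r}"))
}
```

In build.rs, main would print the directive if one is produced and otherwise print nothing, so builds outside a git checkout still succeed.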

7. Testing Patterns

7.1 Integration Tests

# Cargo.toml
[[test]]
name = "integration"
path = "tests/tests.rs"

// tests/tests.rs
mod binary;
mod feature;
mod json;
mod misc;
mod multiline;
mod regression;
7.2 In-Crate Unit Tests

Each library crate has internal tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn config_error_heap_limit() {
        let matcher = RegexMatcher::new("");
        let sink = KitchenSink::new();
        let mut searcher = SearcherBuilder::new().heap_limit(Some(0)).build();
        let res = searcher.search_slice(matcher, &[], sink);
        assert!(res.is_err());
    }
}

7.3 Test Utilities

// tests/util.rs
pub struct Dir {
    // Temporary directory for testing
}

impl Dir {
    pub fn create_file(&self, name: &str, contents: &str);
    pub fn rg(&self) -> Command;
}
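A helper of this kind needs little more than the standard library. The sketch below is a hypothetical ToyDir that creates a unique temporary directory, writes files into it, and cleans up on drop; it omits the part that launches the rg binary, and the directory naming scheme is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Makes directory names unique when tests run in parallel.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// A temporary directory that is removed when the value is dropped.
pub struct ToyDir {
    root: PathBuf,
}

impl ToyDir {
    pub fn new(name: &str) -> io::Result<ToyDir> {
        let id = NEXT_ID.fetch_add(1, Ordering::SeqCst);
        let unique = format!("toy-rg-{}-{}-{}", name, std::process::id(), id);
        let root = std::env::temp_dir().join(unique);
        fs::create_dir_all(&root)?;
        Ok(ToyDir { root })
    }

    pub fn path(&self) -> &Path {
        &self.root
    }

    /// Writes `contents` to `name` inside the directory.
    pub fn create_file(&self, name: &str, contents: &str) -> io::Result<PathBuf> {
        let path = self.root.join(name);
        fs::write(&path, contents)?;
        Ok(path)
    }
}

impl Drop for ToyDir {
    fn drop(&mut self) {
        // Best-effort cleanup; a failure here should not fail the test.
        let _ = fs::remove_dir_all(&self.root);
    }
}
```

Each test gets its own directory, so tests can run in parallel without seeing each other's files.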

8. Configuration System

8.1 Environment Variables

// RIPGREP_CONFIG_PATH for config file location
// NO_COLOR for disabling colors
// TERM for terminal detection

8.2 Configuration File Format

# ~/.ripgreprc
--max-columns=150
--smart-case
--type-add
web:*.{html,css,js}

Key Constraint: No escaping, one argument per line.
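Because there is no quoting or escaping, a parser for this format is only a few lines. The sketch below assumes that surrounding whitespace is trimmed and that blank lines and lines starting with # are skipped; parse_config is a hypothetical name.

```rust
/// Turns config file contents into a list of arguments, one per line.
pub fn parse_config(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        // Skip blank lines and comments; everything else is taken verbatim.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}
```

A flag and its value must therefore sit on separate lines (or be joined with =), since a space inside a line is not treated as a separator.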

8.3 Configuration Precedence

1. Built-in defaults (lowest)
2. Configuration file (via RIPGREP_CONFIG_PATH)
3. Command-line arguments (highest, last flag wins)
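One simple way to get this precedence is to place config-file arguments before command-line arguments and let the last occurrence of a flag win. The sketch below illustrates that with the three case-sensitivity flags; CaseMode, effective_args, and resolve_case_mode are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CaseMode {
    Sensitive,
    Insensitive,
    Smart,
}

/// Config arguments come first, so anything on the command line is
/// processed later and overrides them.
pub fn effective_args<'a>(config_args: &[&'a str], cli_args: &[&'a str]) -> Vec<&'a str> {
    config_args.iter().chain(cli_args.iter()).copied().collect()
}

/// Last flag wins: each occurrence overwrites the previous choice.
pub fn resolve_case_mode(args: &[&str]) -> CaseMode {
    // Built-in default, used when no flag mentions case handling.
    let mut mode = CaseMode::Sensitive;
    for arg in args {
        match *arg {
            "--case-sensitive" => mode = CaseMode::Sensitive,
            "--ignore-case" => mode = CaseMode::Insensitive,
            "--smart-case" => mode = CaseMode::Smart,
            _ => {}
        }
    }
    mode
}
```

With this ordering no special override logic is needed: the three precedence levels fall out of a single left-to-right pass.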

9. Code Style and Conventions

9.1 rustfmt Configuration

# rustfmt.toml
max_width = 79
use_small_heuristics = "max"
edition = "2024"

79 columns — Stricter than Rust default (100), enables side-by-side diffs.

9.2 Documentation Style

Every public item has documentation:

/// The command that ripgrep should execute based on the command line
/// configuration.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Command {
    /// Search using exactly one thread.
    Search,
    /// Search using possibly many threads.
    SearchParallel,
    // ...
}

9.3 Derive Attributes

Standard derives for configuration types:

#[derive(Clone, Copy, Debug, Eq, PartialEq)]

10. Key Takeaways for Your Own Projects

10.1 Architectural Patterns

  1. Workspace for modularity — Separate concerns into crates
  2. Facade crates — Simplify external dependencies
  3. Trait-based abstraction — Enable extensibility without coupling
  4. Builder pattern — Complex configuration with sensible defaults

10.2 Performance Patterns

  1. Work on bytes (&[u8]) — Don't require UTF-8 when unnecessary
  2. Conditional optimization — Choose strategy based on workload
  3. Parallel by default — Use crossbeam for lock-free parallelism
  4. Literal extraction — Pre-filter with fast substring search

10.3 CLI Patterns

  1. Two-level argument parsing — Raw flags → Validated configuration
  2. Configuration file support — Respects environment and dotfiles
  3. Graceful degradation — Works on non-UTF8 files, handles errors cleanly

10.4 Code Organization

  1. Binary separate from library — crates/core/main.rs imports library crates
  2. Associated types in traits — Avoid boxing, enable optimization
  3. Platform abstraction — Conditional compilation for portability

Source Files Reference

Crate              Purpose
crates/core/       The rg binary — CLI, argument handling, search orchestration
crates/grep/       Facade re-exporting matcher, printer, regex, searcher
crates/matcher/    Abstract Matcher trait
crates/regex/      Rust regex implementation of Matcher
crates/pcre2/      PCRE2 implementation of Matcher (optional)
crates/searcher/   File searching with buffered/mmap strategies
crates/printer/    Output formatting (standard, JSON)
crates/ignore/     Parallel directory walking with .gitignore support
crates/globset/    Fast glob pattern matching
crates/cli/        CLI utilities (colors, decompression, process handling)

Further Reading

  1. BurntSushi's blog: https://blog.burntsushi.net/ripgrep/
  2. Regex internals: https://blog.burntsushi.net/regex-internals/
  3. Repository: https://github.com/BurntSushi/ripgrep

"Your performance intuition is useless. Run perf."
— Comment in Rust's layout.rs, but equally applicable here