Ripgrep haystack.rs: Code Companion

Reference code for the haystack.rs lecture. Sections correspond to the lecture document.


Section 1: The Builder Pattern

/// A builder for constructing things to search over.
#[derive(Clone, Debug)]
pub(crate) struct HaystackBuilder {
    strip_dot_prefix: bool,
}

impl HaystackBuilder {
    /// Return a new haystack builder with a default configuration.
    pub(crate) fn new() -> HaystackBuilder {
        HaystackBuilder { strip_dot_prefix: false }
    }

    /// When enabled, if the haystack's file path starts with `./` then it is
    /// stripped.
    ///
    /// This is useful when implicitly searching the current working directory.
    pub(crate) fn strip_dot_prefix(
        &mut self,
        yes: bool,
    ) -> &mut HaystackBuilder {
        self.strip_dot_prefix = yes;
        self
    }
}

Usage in HiArgs:

// From hiargs.rs
pub(crate) fn haystack_builder(&self) -> HaystackBuilder {
    let mut builder = HaystackBuilder::new();
    builder.strip_dot_prefix(self.paths.has_implicit_path);
    builder
}

Why a builder for one field?

  • Consistent API pattern across ripgrep
  • Room for future configuration
  • Separates construction from usage
  • Builder is Clone, so each worker thread can own its own copy
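
The last point can be sketched with a standalone copy of the builder. This is a minimal illustration, not ripgrep's code: `configure_and_fan_out` is a hypothetical helper, and the struct below only mirrors the single field shown in Section 1.

```rust
use std::thread;

/// Simplified stand-in for the builder in Section 1 (same single field).
#[derive(Clone, Debug)]
struct HaystackBuilder {
    strip_dot_prefix: bool,
}

impl HaystackBuilder {
    fn new() -> HaystackBuilder {
        HaystackBuilder { strip_dot_prefix: false }
    }

    fn strip_dot_prefix(&mut self, yes: bool) -> &mut HaystackBuilder {
        self.strip_dot_prefix = yes;
        self
    }
}

/// Hypothetical helper: configure once, then hand each worker its own clone.
/// Returns the setting each worker observed.
fn configure_and_fan_out(workers: usize) -> Vec<bool> {
    let mut builder = HaystackBuilder::new();
    builder.strip_dot_prefix(true);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread owns an independent copy of the configuration.
            let local = builder.clone();
            thread::spawn(move || local.strip_dot_prefix)
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because the builder holds only a `bool`, cloning is cheap, and no locking is needed since no thread shares state with another.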

Section 2: Building from Results

impl HaystackBuilder {
    /// Create a new haystack from a possibly missing directory entry.
    ///
    /// If the directory entry isn't present, then the corresponding error is
    /// logged if messages have been configured. Otherwise, if the directory
    /// entry is deemed searchable, then it is returned as a haystack.
    pub(crate) fn build_from_result(
        &self,
        result: Result<ignore::DirEntry, ignore::Error>,
    ) -> Option<Haystack> {
        match result {
            Ok(dent) => self.build(dent),
            Err(err) => {
                err_message!("{err}");
                None
            }
        }
    }
}

How it's used in main.rs:

// Single-threaded search
let unsorted = args
    .walk_builder()?
    .build()
    .filter_map(|result| haystack_builder.build_from_result(result));

// Parallel search
Box::new(move |result| {
    let haystack = match haystack_builder.build_from_result(result) {
        Some(haystack) => haystack,
        None => return WalkState::Continue,  // Skip, keep walking
    };
    // ... search the haystack
})

Error handling flow:

Walker yields Result<DirEntry, Error>
           ├── Err(e) → Log error, return None
           └── Ok(dent) → Apply filtering logic
                              ├── Passes → Some(Haystack)
                              └── Fails → None
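
The flow above is the usual Result-to-Option funnel behind `filter_map`. A minimal sketch with toy types follows; `Entry`, `keep_searchable`, and `collect_haystacks` are hypothetical names, and errors are collected into a Vec here only so the behavior can be observed (the real code logs them with `err_message!`).

```rust
/// Toy stand-in for a directory entry: a path plus a flag.
#[derive(Debug)]
struct Entry {
    path: String,
    is_file: bool,
}

/// Mirrors the shape of `build_from_result`: errors are recorded and dropped,
/// entries that fail the filter are dropped, the rest are kept.
fn keep_searchable(
    result: Result<Entry, String>,
    errors: &mut Vec<String>,
) -> Option<Entry> {
    match result {
        Ok(entry) if entry.is_file => Some(entry),
        Ok(_) => None,
        Err(err) => {
            errors.push(err); // assumption: stands in for logging
            None
        }
    }
}

/// Hypothetical driver: returns the kept paths and the collected errors.
fn collect_haystacks(
    results: Vec<Result<Entry, String>>,
) -> (Vec<String>, Vec<String>) {
    let mut errors = Vec::new();
    let kept: Vec<String> = results
        .into_iter()
        .filter_map(|r| keep_searchable(r, &mut errors))
        .map(|e| e.path)
        .collect();
    (kept, errors)
}
```

Returning `Option` keeps the walk going: a bad entry never aborts the iteration, it just yields nothing.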

Section 3: The Filtering Decision

impl HaystackBuilder {
    /// Create a new haystack using this builder's configuration.
    ///
    /// If a directory entry could not be created or should otherwise not be
    /// searched, then this returns `None` after emitting any relevant log
    /// messages.
    fn build(&self, dent: ignore::DirEntry) -> Option<Haystack> {
        let hay = Haystack { 
            dent, 
            strip_dot_prefix: self.strip_dot_prefix 
        };

        // Log any non-fatal error attached to this entry
        // (e.g., a problem parsing an ignore file)
        if let Some(err) = hay.dent.error() {
            ignore_message!("{err}");
        }

        // Rule 1: Explicit entries always pass
        if hay.is_explicit() {
            return Some(hay);
        }

        // Rule 2: Otherwise, only regular files pass
        // (symlinks the walker did not follow are not files here)
        if hay.is_file() {
            return Some(hay);
        }

        // Rule 3: Everything else is skipped; log it unless it is a
        // directory (logging every directory would just be noise)
        if !hay.is_dir() {
            log::debug!(
                "ignoring {}: failed to pass haystack filter: \
                 file type: {:?}, metadata: {:?}",
                hay.dent.path().display(),
                hay.dent.file_type(),
                hay.dent.metadata()
            );
        }
        None
    }
}

Decision flowchart:

Directory Entry
  is_explicit()?
      ├── Yes → SEARCH IT
      └── No
       is_file()?
           ├── Yes → SEARCH IT
           └── No
            is_dir()?
                ├── Yes → Skip silently (normal)
                └── No → Skip with debug log (unusual)
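
The flowchart can also be written as a pure function, which makes the ordering of the checks easy to test. This is a sketch under stated assumptions: `Decision` and `Facts` are hypothetical types, and the three booleans stand in for the results of `is_explicit()`, `is_file()`, and `is_dir()`.

```rust
/// Outcome of the haystack filter, made explicit for illustration.
#[derive(Debug, PartialEq)]
enum Decision {
    Search,
    SkipSilently,
    SkipWithDebugLog,
}

/// The three predicates that `build` consults, reduced to plain booleans.
struct Facts {
    is_explicit: bool,
    is_file: bool,
    is_dir: bool,
}

/// Same order of checks as the flowchart: explicit, then file, then dir.
fn decide(f: &Facts) -> Decision {
    if f.is_explicit || f.is_file {
        Decision::Search
    } else if f.is_dir {
        Decision::SkipSilently
    } else {
        Decision::SkipWithDebugLog
    }
}
```

For example, an explicitly named FIFO has `is_explicit = true` and is searched, while the same FIFO discovered during a walk falls through to the debug-log branch.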

Section 4: Explicit vs Implicit

impl Haystack {
    /// Returns true if and only if this entry corresponds to a haystack to
    /// search that was explicitly supplied by an end user.
    ///
    /// Generally, this corresponds to either stdin or an explicit file path
    /// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
    /// an explicit haystack, but, e.g., `./some-dir/some-other-file` is not.
    ///
    /// However, note that ripgrep does not see through shell globbing. e.g.,
    /// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
    /// as an explicit haystack.
    pub(crate) fn is_explicit(&self) -> bool {
        // stdin is always explicit
        // depth() == 0 means directly provided, not discovered
        // !is_dir() because directories are never "searched"
        self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
    }

    /// Returns true if and only if this entry corresponds to stdin.
    pub(crate) fn is_stdin(&self) -> bool {
        self.dent.is_stdin()
    }
}

Examples:

Command            Path            Explicit?   Why
rg foo file.txt    file.txt        Yes         depth=0, user provided
rg foo ./src/      ./src/main.rs   No          depth>0, discovered
rg foo             src/main.rs     No          depth>0, discovered
rg foo -           stdin           Yes         stdin is always explicit
echo x | rg foo    stdin           Yes         stdin is always explicit
rg foo ./src/*     ./src/lib.rs    Yes         shell expanded, depth=0

Why depth=0 means explicit:

// When you run: rg foo file1.txt ./dir/
// The walker sees:
//   file1.txt     → depth 0 (you provided it)
//   ./dir/        → depth 0 (you provided it)  
//   ./dir/a.rs    → depth 1 (walker found it)
//   ./dir/sub/b.rs → depth 2 (walker found it)
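
The walker trace above can be replayed against the `is_explicit` predicate with a toy record type. `Seen` and `explicit_paths` are hypothetical names introduced for this sketch; the predicate body is the same expression as in the source.

```rust
/// Toy record carrying only what `is_explicit` needs.
struct Seen<'a> {
    path: &'a str,
    depth: usize,
    is_dir: bool,
    is_stdin: bool,
}

/// Same expression as `Haystack::is_explicit`, over the toy record.
fn is_explicit(e: &Seen<'_>) -> bool {
    e.is_stdin || (e.depth == 0 && !e.is_dir)
}

/// Hypothetical helper: which of the entries the walker saw count as explicit?
fn explicit_paths<'a>(seen: &[Seen<'a>]) -> Vec<&'a str> {
    seen.iter().filter(|e| is_explicit(e)).map(|e| e.path).collect()
}
```

Note that `./dir/` is at depth 0 but is excluded by the `!is_dir` term: the directory itself is never a haystack, only the files found beneath it.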

Section 5: Path Display

impl Haystack {
    /// Return the file path corresponding to this haystack.
    ///
    /// If this haystack corresponds to stdin, then a special `<stdin>` path
    /// is returned instead.
    pub(crate) fn path(&self) -> &Path {
        if self.strip_dot_prefix && self.dent.path().starts_with("./") {
            self.dent.path().strip_prefix("./").unwrap()
        } else {
            self.dent.path()
        }
    }
}

Behavior examples:

Input           strip_dot_prefix   path() returns
./src/main.rs   true               src/main.rs
./src/main.rs   false              ./src/main.rs
src/main.rs     true               src/main.rs
src/main.rs     false              src/main.rs
- (stdin)       either             <stdin> (from DirEntry)

When strip_dot_prefix is enabled:

// In hiargs.rs
builder.strip_dot_prefix(self.paths.has_implicit_path);

// has_implicit_path is true when user ran `rg foo` with no paths
// This makes output cleaner:
//   "src/main.rs:42:fn main()"
// Instead of:
//   "./src/main.rs:42:fn main()"
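
The stripping itself needs only `std::path::Path`. The sketch below is a free-function variant of `Haystack::path`; `display_path` is a hypothetical name, and it uses `unwrap_or` in place of the `starts_with` check plus `unwrap` shown in the source, which should behave the same way.

```rust
use std::path::Path;

/// Variant of the `Haystack::path` logic as a free function.
fn display_path(path: &Path, strip_dot_prefix: bool) -> &Path {
    if strip_dot_prefix {
        // strip_prefix compares whole path components, so a name such as
        // ".hidden" is left alone; on a mismatch we keep the original path.
        path.strip_prefix("./").unwrap_or(path)
    } else {
        path
    }
}
```

Because `Path` prefix matching works on components and not on raw bytes, only a leading current-directory component is removed, never a leading dot that is part of a file name.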

Section 6: File Type Detection

impl Haystack {
    /// Returns true if and only if this haystack points to a directory after
    /// following symbolic links.
    fn is_dir(&self) -> bool {
        let ft = match self.dent.file_type() {
            None => return false,  // Can't determine = not a dir
            Some(ft) => ft,
        };
        if ft.is_dir() {
            return true;
        }
        // Symlink that points to a directory?
        self.dent.path_is_symlink() && self.dent.path().is_dir()
    }

    /// Returns true if and only if this haystack points to a file.
    fn is_file(&self) -> bool {
        self.dent.file_type().map_or(false, |ft| ft.is_file())
    }
}

File type scenarios:

Filesystem Object        is_file()   is_dir()   Searchable?
Regular file             true        false      Yes
Directory                false       true       No
Symlink → file           false       false      Only if explicit
Symlink → dir            false       true       No
Socket/FIFO/etc          false       false      Only if explicit
Unknown (no file type)   false       false      Only if explicit

Why symlink handling differs:

// is_dir follows symlinks:
self.dent.path_is_symlink() && self.dent.path().is_dir()

// is_file does NOT follow symlinks:
self.dent.file_type().map_or(false, |ft| ft.is_file())

// Reason: By the time we get here, --follow has already been applied.
// If the user wanted symlinks followed, the walker already did it.
// A symlink that wasn't followed should not be searched.
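
The asymmetry can be modeled without touching the filesystem. In this sketch `Kind` is a hypothetical enum describing what the walker reports for an entry when symlinks were not followed, and `searchable` combines the two predicates with the depth-0 rule from Section 4. It is a model of the table above, not the real implementation.

```rust
/// What the walker reports for an entry when symlinks were NOT followed.
#[derive(Clone, Copy, Debug)]
enum Kind {
    File,
    Dir,
    SymlinkToFile,
    SymlinkToDir,
    Other,   // socket, FIFO, device, ...
    Unknown, // no file type available
}

/// Models `is_file`: only the entry's own type counts; links are not resolved.
fn is_file(k: Kind) -> bool {
    matches!(k, Kind::File)
}

/// Models `is_dir`: a real directory, or a symlink whose target is one.
fn is_dir(k: Kind) -> bool {
    matches!(k, Kind::Dir | Kind::SymlinkToDir)
}

/// Models the filter: explicit non-directories pass, otherwise only files.
fn searchable(k: Kind, depth_zero: bool) -> bool {
    (depth_zero && !is_dir(k)) || is_file(k)
}
```

The model makes the practical consequence visible: a symlink to a file is searched when named on the command line, but skipped when merely discovered during a walk without `--follow`.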

Section 7: The Wrapper Structure

/// A haystack is a thing we want to search.
///
/// Generally, a haystack is either a file or stdin.
#[derive(Clone, Debug)]
pub(crate) struct Haystack {
    dent: ignore::DirEntry,
    strip_dot_prefix: bool,
}

What ignore::DirEntry provides:

// From the ignore crate (not shown in haystack.rs)
impl DirEntry {
    fn path(&self) -> &Path;
    fn file_type(&self) -> Option<FileType>;
    fn metadata(&self) -> Result<Metadata, Error>;
    fn depth(&self) -> usize;
    fn is_stdin(&self) -> bool;
    fn path_is_symlink(&self) -> bool;
    fn error(&self) -> Option<&Error>;
}

Why wrap instead of using DirEntry directly?

  1. Custom path logic — strip_dot_prefix behavior
  2. Application-level filtering — is_explicit, is_file
  3. Encapsulation — Search code sees Haystack API, not ignore API
  4. Future flexibility — Can add fields without changing ignore crate

Quick Reference: Full API

// Builder
impl HaystackBuilder {
    pub(crate) fn new() -> HaystackBuilder;
    pub(crate) fn strip_dot_prefix(&mut self, yes: bool) -> &mut HaystackBuilder;
    pub(crate) fn build_from_result(&self, result: Result<DirEntry, Error>) -> Option<Haystack>;
    fn build(&self, dent: DirEntry) -> Option<Haystack>;  // Filter (private)
}

// Haystack
impl Haystack {
    pub(crate) fn path(&self) -> &Path;        // Display path
    pub(crate) fn is_stdin(&self) -> bool;     // Is this stdin?
    pub(crate) fn is_explicit(&self) -> bool;  // User-provided?
    fn is_dir(&self) -> bool;                  // Directory? (private)
    fn is_file(&self) -> bool;                 // Regular file? (private)
}

Integration Example

// Simplified flow from main.rs search function

fn search(args: &HiArgs, mode: SearchMode) -> anyhow::Result<bool> {
    // 1. Get builder configured from HiArgs
    let haystack_builder = args.haystack_builder();

    // 2. Create iterator that converts DirEntry → Haystack
    let unsorted = args
        .walk_builder()?          // Configure directory walker
        .build()                  // Create iterator over Results
        .filter_map(|result| {    // Convert each Result
            haystack_builder.build_from_result(result)
        });

    // 3. Optionally sort
    let haystacks = args.sort(unsorted);

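    // 4. Search each haystack
    //    (setup of `searcher` from `args` and `mode` is elided)
    let mut matched = false;
    for haystack in haystacks {
        let result = searcher.search(&haystack)?;
        matched = matched || result.has_match();
        // haystack.path() used for output formatting
    }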

    Ok(matched)
}

Data flow:

WalkBuilder
    │  build
Iterator<Item = Result<DirEntry, Error>>
    │  filter_map + build_from_result
Iterator<Item = Haystack>
    │  sort (optional)
Iterator<Item = Haystack>
    │  search loop
Results
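
The optional sort step is the one place where the lazy pipeline may have to collect everything first. A minimal sketch with `String` standing in for `Haystack`: `maybe_sort` and `pipeline` are hypothetical names, and the boxed return type is an assumption chosen so that both branches share one iterator type.

```rust
/// Hypothetical stand-in for the sort step: either pass the iterator through
/// lazily, or collect and sort it. Boxing gives both branches one type.
fn maybe_sort<'a, I>(
    haystacks: I,
    sort_by_path: bool,
) -> Box<dyn Iterator<Item = String> + 'a>
where
    I: Iterator<Item = String> + 'a,
{
    if !sort_by_path {
        return Box::new(haystacks);
    }
    // Sorting needs every item up front, so laziness is lost on this branch.
    let mut all: Vec<String> = haystacks.collect();
    all.sort();
    Box::new(all.into_iter())
}

/// Walk results → filter_map → optional sort → consume.
fn pipeline(results: Vec<Result<String, String>>, sort_by_path: bool) -> Vec<String> {
    let unsorted = results.into_iter().filter_map(|r| r.ok());
    maybe_sort(unsorted, sort_by_path).collect()
}
```

With sorting off, searching can begin as soon as the walker yields its first entry; with sorting on, the whole walk must finish before the first haystack is searched.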