ripgrep crates/core/haystack.rs: Code Companion

Reference code for the Haystack Discovery lecture. Sections correspond to the lecture document.


Section 1: The Builder Pattern at Its Simplest

/// A builder for constructing things to search over.
#[derive(Clone, Debug)]
pub(crate) struct HaystackBuilder {
    strip_dot_prefix: bool,  // The single configuration option
}

impl HaystackBuilder {
    /// Return a new haystack builder with a default configuration.
    pub(crate) fn new() -> HaystackBuilder {
        HaystackBuilder { strip_dot_prefix: false }
    }

    /// When enabled, if the haystack's file path starts with `./` then it is
    /// stripped.
    ///
    /// This is useful when implicitly searching the current working directory.
    pub(crate) fn strip_dot_prefix(
        &mut self,
        yes: bool,
    ) -> &mut HaystackBuilder {  // Returns &mut Self for method chaining
        self.strip_dot_prefix = yes;
        self
    }
}

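The pub(crate) visibility restricts these types to the core crate. The strip_dot_prefix method returns &mut HaystackBuilder (equivalently, &mut Self), which enables fluent chaining: builder.strip_dot_prefix(true).build(entry). Because the setter borrows the builder instead of consuming it, one configured builder can create any number of haystacks.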
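A minimal sketch of that chaining, using stand-in types rather than ripgrep's own. SketchBuilder, SketchHaystack, and build_all are hypothetical names, and build here takes a plain string instead of an ignore::DirEntry.

```rust
use std::path::PathBuf;

/// Stand-in for the real builder; only the chaining behavior is modeled.
#[derive(Clone, Debug)]
struct SketchBuilder {
    strip_dot_prefix: bool,
}

/// Stand-in product type (hypothetical, not ripgrep's `Haystack`).
#[derive(Debug, PartialEq)]
struct SketchHaystack {
    path: PathBuf,
    strip_dot_prefix: bool,
}

impl SketchBuilder {
    fn new() -> SketchBuilder {
        SketchBuilder { strip_dot_prefix: false }
    }

    /// Setter that hands back `&mut Self` so further calls can be chained.
    fn strip_dot_prefix(&mut self, yes: bool) -> &mut SketchBuilder {
        self.strip_dot_prefix = yes;
        self
    }

    /// Takes `&self`, so it can terminate a chain of `&mut self` setters
    /// and can be called repeatedly on the same builder.
    fn build(&self, path: &str) -> SketchHaystack {
        SketchHaystack {
            path: PathBuf::from(path),
            strip_dot_prefix: self.strip_dot_prefix,
        }
    }
}

/// Configure the builder once, then build one value per input path.
fn build_all(paths: &[&str]) -> Vec<SketchHaystack> {
    let mut builder = SketchBuilder::new();
    builder.strip_dot_prefix(true);
    paths.iter().map(|&p| builder.build(p)).collect()
}
```

A one-off chain such as SketchBuilder::new().strip_dot_prefix(true).build("./a.rs") also works, because the temporary builder lives until the end of the statement. Binding the result of the setter itself to a variable and using it later would not compile, which is the usual trade-off of &mut self builders.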


Section 2: Constructing from Fallible Results

/// Create a new haystack from a possibly missing directory entry.
///
/// If the directory entry isn't present, then the corresponding error is
/// logged if messages have been configured. Otherwise, if the directory
/// entry is deemed searchable, then it is returned as a haystack.
pub(crate) fn build_from_result(
    &self,
    result: Result<ignore::DirEntry, ignore::Error>,  // From directory traversal
) -> Option<Haystack> {
    match result {
        Ok(dent) => self.build(dent),  // Delegate to main build logic
        Err(err) => {
            err_message!("{err}");     // Log error, don't propagate
            None                        // Signal "nothing to search"
        }
    }
}

The err_message! macro is defined elsewhere in ripgrep's core crate. As the doc comment above notes, it emits the error only when messages have been configured. Returning Option<Haystack> rather than Result lets callers use filter_map, so failed entries drop out of the iteration without extra error handling at the call site.
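The filter_map idea can be sketched with plain standard-library types. Here keep_ok and collect_searchable are hypothetical names, strings stand in for directory entries and errors, and eprintln! stands in for err_message!.

```rust
/// Hypothetical stand-in for `build_from_result`: an error is reported and
/// mapped to `None`, a success is mapped to `Some`.
fn keep_ok(result: Result<String, String>) -> Option<String> {
    match result {
        Ok(path) => Some(path),
        Err(err) => {
            // Report the problem here so the caller never sees it.
            eprintln!("{err}");
            None
        }
    }
}

/// Because `keep_ok` returns `Option`, `filter_map` drops the failures
/// and unwraps the successes in one step.
fn collect_searchable(results: Vec<Result<String, String>>) -> Vec<String> {
    results.into_iter().filter_map(keep_ok).collect()
}
```

The caller ends up with only the usable entries and does not need a match or ? at the iteration site.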


Section 3: The Filtering Decision Tree

/// Create a new haystack using this builder's configuration.
///
/// If a directory entry could not be created or should otherwise not be
/// searched, then this returns `None` after emitting any relevant log
/// messages.
fn build(&self, dent: ignore::DirEntry) -> Option<Haystack> {
    // Wrap immediately to use Haystack's helper methods
    let hay = Haystack { dent, strip_dot_prefix: self.strip_dot_prefix };

    // Log partial errors but continue processing
    if let Some(err) = hay.dent.error() {
        ignore_message!("{err}");
    }

    // Priority 1: Explicit entries always pass through
    if hay.is_explicit() {
        return Some(hay);
    }

    // Priority 2: Regular files pass through
    if hay.is_file() {
        return Some(hay);
    }

    // Priority 3: Everything else is rejected; only non-directories are debug-logged
    if !hay.is_dir() {
        log::debug!(
            "ignoring {}: failed to pass haystack filter: \
             file type: {:?}, metadata: {:?}",
            hay.dent.path().display(),
            hay.dent.file_type(),
            hay.dent.metadata()
        );
    }
    None
}

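The ignore_message! macro differs from err_message!: it reports non-fatal problems attached to an entry that was still yielded, rather than a failure to produce the entry at all. The order of the checks matters. The explicit check runs before the file-type check, so a path named directly by the user is searched even when it is not a regular file (a symlink, for example), while the same kind of entry found during traversal falls through to the rejection branch. Directories are rejected without a debug message, since their contents are searched rather than the directories themselves.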
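The branches of build can be modeled as a small pure function. This is a sketch, not ripgrep's code: EntryFacts, Outcome, and classify are hypothetical names, and the boolean fields stand in for the results of is_stdin, is_file, and is_dir on a real entry.

```rust
/// Hypothetical summary of what `build` inspects on a directory entry.
#[derive(Clone, Copy, Debug)]
struct EntryFacts {
    is_stdin: bool,
    /// 0 means the path was given directly to the walker.
    depth: usize,
    /// The entry's own file type is a regular file (symlinks not followed).
    is_file: bool,
    /// A directory, or a symlink that resolves to one.
    is_dir: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Corresponds to `Some(hay)`.
    Search,
    /// Corresponds to `None` without a debug message (directories).
    SkipQuietly,
    /// Corresponds to `None` after `log::debug!` (everything else).
    SkipWithDebug,
}

fn classify(e: EntryFacts) -> Outcome {
    // Same expression as `is_explicit`.
    let explicit = e.is_stdin || (e.depth == 0 && !e.is_dir);
    if explicit || e.is_file {
        Outcome::Search
    } else if e.is_dir {
        Outcome::SkipQuietly
    } else {
        Outcome::SkipWithDebug
    }
}
```

Under this model, a symlink to a file given on the command line (depth 0, is_file false, is_dir false) is searched, while the same symlink found at depth 2 is skipped with a debug message.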


Section 4: Explicit vs Discovered Paths

/// A haystack is a thing we want to search.
///
/// Generally, a haystack is either a file or stdin.
#[derive(Clone, Debug)]
pub(crate) struct Haystack {
    dent: ignore::DirEntry,      // The underlying directory entry
    strip_dot_prefix: bool,       // Configuration baked in at construction
}

impl Haystack {
    /// Returns true if and only if this entry corresponds to stdin.
    pub(crate) fn is_stdin(&self) -> bool {
        self.dent.is_stdin()
    }

    /// Returns true if and only if this entry corresponds to a haystack to
    /// search that was explicitly supplied by an end user.
    ///
    /// Generally, this corresponds to either stdin or an explicit file path
    /// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
    /// an explicit haystack, but, e.g., `./some-dir/some-other-file` is not.
    ///
    /// However, note that ripgrep does not see through shell globbing. e.g.,
    /// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
    /// as an explicit haystack.
    pub(crate) fn is_explicit(&self) -> bool {
        // stdin is obvious. When an entry has a depth of 0, that means it
        // was explicitly provided to our directory iterator, which means it
        // was in turn explicitly provided by the end user. The !is_dir check
        // means that we want to search files even if they're symlinks, again,
        // because they were explicitly provided. (And we never want to try
        // to search a directory.)
        self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
    }
}

The depth() == 0 check is the key heuristic: entries at depth zero were passed directly to the walker, not discovered during traversal. The !is_dir() check excludes directories since we search their contents, not the directories themselves.


/// Returns true if and only if this haystack points to a directory after
/// following symbolic links.
fn is_dir(&self) -> bool {
    let ft = match self.dent.file_type() {
        None => return false,      // No file type means can't be a directory
        Some(ft) => ft,
    };
    if ft.is_dir() {
        return true;               // Direct directory
    }
    // If this is a symlink, then we want to follow it to determine
    // whether it's a directory or not.
    self.dent.path_is_symlink() && self.dent.path().is_dir()
}

/// Returns true if and only if this haystack points to a file.
fn is_file(&self) -> bool {
    self.dent.file_type().map_or(false, |ft| ft.is_file())
}

Note the asymmetry: is_file does not follow symlinks (it relies only on file_type()), but is_dir does (it falls back to path().is_dir(), which follows symlinks). This is intentional. Discovered symlinks fail is_file and are skipped by the main filter, while is_explicit needs a symlink-aware directory check so that an explicitly given symlink to a directory is not treated as a searchable file.
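The same distinction can be reproduced with the standard library alone. The sketch below is an analogy, not ripgrep's code: is_file_no_follow and is_dir_following_links are hypothetical helpers, and fs::symlink_metadata stands in for the entry's own file type on the assumption that the walker is not following links.

```rust
use std::fs;
use std::path::Path;

/// Analogue of `is_file`: inspects the path itself, so a symlink to a
/// regular file is reported as "not a file".
fn is_file_no_follow(path: &Path) -> bool {
    // `symlink_metadata` does not traverse a trailing symlink.
    fs::symlink_metadata(path)
        .map(|md| md.file_type().is_file())
        .unwrap_or(false)
}

/// Analogue of `is_dir`: a real directory counts, and so does a symlink
/// whose target is a directory.
fn is_dir_following_links(path: &Path) -> bool {
    let ft = match fs::symlink_metadata(path) {
        Err(_) => return false, // no file type available
        Ok(md) => md.file_type(),
    };
    if ft.is_dir() {
        return true;
    }
    // `Path::is_dir` traverses symlinks to inspect the target.
    ft.is_symlink() && path.is_dir()
}
```

For a symlink pointing at a directory, is_dir_following_links returns true even though the link's own file type is "symlink". For a symlink pointing at a regular file, both helpers return false.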


Section 6: Path Presentation and User Experience

impl Haystack {
    /// Return the file path corresponding to this haystack.
    ///
    /// If this haystack corresponds to stdin, then a special `<stdin>` path
    /// is returned instead.
    pub(crate) fn path(&self) -> &Path {
        if self.strip_dot_prefix && self.dent.path().starts_with("./") {
            // strip_prefix returns Result, but we know "./" is present
            self.dent.path().strip_prefix("./").unwrap()
        } else {
            self.dent.path()
        }
    }
}

The unwrap() here is safe because we've already verified the prefix exists with starts_with("./"). This transforms ./src/main.rs into src/main.rs for cleaner output when searching the current directory.
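A standalone sketch of the same transformation on std::path::Path. The display_path function is a hypothetical helper, and it uses unwrap_or in place of the starts_with check followed by unwrap; the observable result is intended to be the same.

```rust
use std::path::Path;

/// Return `path` without a leading `./` component when stripping is enabled.
fn display_path(path: &Path, strip_dot_prefix: bool) -> &Path {
    if strip_dot_prefix {
        // `strip_prefix` compares whole path components and returns `Err`
        // when the prefix is absent, in which case the path is kept as-is.
        path.strip_prefix("./").unwrap_or(path)
    } else {
        path
    }
}
```

The comparison is made on path components, not raw characters, so a name such as .hidden/file is left untouched: its first component is .hidden, not the current-directory component.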


Quick Reference

Type Summary

Type             Visibility  Purpose
HaystackBuilder  pub(crate)  Configures and creates haystacks
Haystack         pub(crate)  Wraps searchable directory entries

Haystack Decision Flow

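Result<DirEntry, Error>
    ├─ Err? → Log via err_message!, return None
    └─ Ok(DirEntry)
        ├─ Partial error attached? → Log via ignore_message!, continue
        ├─ Explicit? (stdin, or depth = 0 and not a directory) → Return Some(haystack)
        ├─ Regular file? → Return Some(haystack)
        ├─ Directory? → Return None (no log)
        └─ Otherwise → Log debug, return None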

Key Method Signatures

// Builder
fn new() -> HaystackBuilder
fn build_from_result(&self, result: Result<DirEntry, Error>) -> Option<Haystack>
fn build(&self, dent: DirEntry) -> Option<Haystack>
fn strip_dot_prefix(&mut self, yes: bool) -> &mut HaystackBuilder

// Haystack
fn path(&self) -> &Path
fn is_stdin(&self) -> bool
fn is_explicit(&self) -> bool
fn is_dir(&self) -> bool      // follows symlinks
fn is_file(&self) -> bool     // does NOT follow symlinks

Explicit Path Heuristic

is_stdin() || (depth() == 0 && !is_dir())