Ripgrep haystack.rs: Code Companion
Reference code for the haystack.rs lecture. Sections correspond to the lecture document.
Section 1: The Builder Pattern
/// A builder for constructing things to search over.
#[derive(Clone, Debug)]
pub(crate) struct HaystackBuilder {
    strip_dot_prefix: bool,
}

impl HaystackBuilder {
    /// Return a new haystack builder with a default configuration.
    pub(crate) fn new() -> HaystackBuilder {
        HaystackBuilder { strip_dot_prefix: false }
    }

    /// When enabled, if the haystack's file path starts with `./` then it is
    /// stripped.
    ///
    /// This is useful when implicitly searching the current working directory.
    pub(crate) fn strip_dot_prefix(
        &mut self,
        yes: bool,
    ) -> &mut HaystackBuilder {
        self.strip_dot_prefix = yes;
        self
    }
}
Usage in HiArgs:
// From hiargs.rs
pub(crate) fn haystack_builder(&self) -> HaystackBuilder {
    let mut builder = HaystackBuilder::new();
    builder.strip_dot_prefix(self.paths.has_implicit_path);
    builder
}
Why a builder for one field?
- Consistent API pattern across ripgrep
- Room for future configuration
- Separates construction from usage
- Builder is Clone, so it can be shared across threads
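The points above can be tried out with a small stand-in. The sketch below is not ripgrep code: `LabelBuilder` and `labels_in_threads` are hypothetical names, and plain strings stand in for directory entries. It assumes the simplest way to use a Clone builder from several threads, which is to give each thread its own copy; ripgrep itself may just borrow the builder instead.

```rust
use std::thread;

/// Hypothetical stand-in for HaystackBuilder: one flag, one chainable setter.
#[derive(Clone, Debug)]
struct LabelBuilder {
    strip_dot_prefix: bool,
}

impl LabelBuilder {
    fn new() -> LabelBuilder {
        LabelBuilder { strip_dot_prefix: false }
    }

    /// Returning `&mut Self` leaves room for chaining more setters later.
    fn strip_dot_prefix(&mut self, yes: bool) -> &mut LabelBuilder {
        self.strip_dot_prefix = yes;
        self
    }

    /// Construction is separate from configuration: `build` only reads.
    fn build(&self, path: &str) -> String {
        if self.strip_dot_prefix {
            path.strip_prefix("./").unwrap_or(path).to_string()
        } else {
            path.to_string()
        }
    }
}

/// Configure once, then clone the builder into each worker thread.
fn labels_in_threads(paths: Vec<String>) -> Vec<String> {
    let mut builder = LabelBuilder::new();
    builder.strip_dot_prefix(true);
    let handles: Vec<_> = paths
        .into_iter()
        .map(|path| {
            let builder = builder.clone();
            thread::spawn(move || builder.build(&path))
        })
        .collect();
    // Joining in spawn order keeps the output order stable.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because the builder holds a single bool, each clone is trivially cheap; the pattern would still work unchanged if more fields were added.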
Section 2: Building from Results
impl HaystackBuilder {
    /// Create a new haystack from a possibly missing directory entry.
    ///
    /// If the directory entry isn't present, then the corresponding error is
    /// logged if messages have been configured. Otherwise, if the directory
    /// entry is deemed searchable, then it is returned as a haystack.
    pub(crate) fn build_from_result(
        &self,
        result: Result<ignore::DirEntry, ignore::Error>,
    ) -> Option<Haystack> {
        match result {
            Ok(dent) => self.build(dent),
            Err(err) => {
                err_message!("{err}");
                None
            }
        }
    }
}
How it's used in main.rs:
// Single-threaded search
let unsorted = args
    .walk_builder()?
    .build()
    .filter_map(|result| haystack_builder.build_from_result(result));

// Parallel search
Box::new(move |result| {
    let haystack = match haystack_builder.build_from_result(result) {
        Some(haystack) => haystack,
        None => return WalkState::Continue, // Skip, keep walking
    };
    // ... search the haystack
})
Error handling flow:
Walker yields Result<DirEntry, Error>
    │
    ├── Err(e) → Log error, return None
    │
    └── Ok(dent) → Apply filtering logic
            │
            ├── Passes → Some(Haystack)
            │
            └── Fails → None
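The flow above is the usual Result-to-Option collapse driven by `filter_map`. A minimal sketch, assuming strings in place of `DirEntry` and `Error` and a made-up filter rule (entries ending in `/` are treated as directories); the function names only mirror the originals and are not ripgrep's implementations:

```rust
/// Hypothetical stand-in for `build`: entries ending in `/` are filtered out.
fn build(entry: String) -> Option<String> {
    if entry.ends_with('/') {
        None
    } else {
        Some(entry)
    }
}

/// Errors are reported and dropped; good entries go through the filter.
fn build_from_result(result: Result<String, String>) -> Option<String> {
    match result {
        Ok(entry) => build(entry),
        Err(err) => {
            eprintln!("{err}");
            None
        }
    }
}

/// `filter_map` keeps only the `Some` values, so both error cases and
/// filtered entries vanish from the stream without stopping iteration.
fn searchable(results: Vec<Result<String, String>>) -> Vec<String> {
    results.into_iter().filter_map(build_from_result).collect()
}
```

The design choice worth noticing is that one bad entry never aborts the walk: the caller sees a plain stream of searchable items.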
Section 3: The Filtering Decision
impl HaystackBuilder {
    /// Create a new haystack using this builder's configuration.
    ///
    /// If a directory entry could not be created or should otherwise not be
    /// searched, then this returns `None` after emitting any relevant log
    /// messages.
    fn build(&self, dent: ignore::DirEntry) -> Option<Haystack> {
        let hay = Haystack {
            dent,
            strip_dot_prefix: self.strip_dot_prefix,
        };

        // Log any partial errors (e.g., metadata read failures)
        if let Some(err) = hay.dent.error() {
            ignore_message!("{err}");
        }

        // Rule 1: Explicit entries always pass
        if hay.is_explicit() {
            return Some(hay);
        }

        // Rule 2: Otherwise, only regular files are searched
        if hay.is_file() {
            return Some(hay);
        }

        // Rule 3: Everything else is skipped. Directories are skipped
        // silently; anything else gets a debug message.
        if !hay.is_dir() {
            log::debug!(
                "ignoring {}: failed to pass haystack filter: \
                 file type: {:?}, metadata: {:?}",
                hay.dent.path().display(),
                hay.dent.file_type(),
                hay.dent.metadata()
            );
        }
        None
    }
}
Decision flowchart:
Directory Entry
    │
    ▼
is_explicit()?
    │
    ├── Yes → SEARCH IT
    │
    └── No
        │
        ▼
    is_file()?
        │
        ├── Yes → SEARCH IT
        │
        └── No
            │
            ▼
        is_dir()?
            │
            ├── Yes → Skip silently (normal)
            │
            └── No → Skip with debug log (unusual)
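The flowchart can also be read as a pure function of three booleans. The sketch below is only an illustration: `Decision`, `Entry`, and `decide` are hypothetical names, and the three fields stand in for the results of `is_explicit()`, `is_file()`, and `is_dir()`.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Search,
    SkipSilently,
    SkipWithDebugLog,
}

/// Hypothetical flattened view of the three predicates on a haystack.
struct Entry {
    explicit: bool,
    file: bool,
    dir: bool,
}

/// Same order of checks as the flowchart: explicit, then file, then dir.
fn decide(entry: &Entry) -> Decision {
    if entry.explicit || entry.file {
        Decision::Search
    } else if entry.dir {
        Decision::SkipSilently
    } else {
        Decision::SkipWithDebugLog
    }
}
```

Writing it this way makes the asymmetry visible: being explicit overrides the file-type check entirely, while the directory check only decides whether a skip is logged.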
Section 4: Explicit vs Implicit
impl Haystack {
    /// Returns true if and only if this entry corresponds to a haystack to
    /// search that was explicitly supplied by an end user.
    ///
    /// Generally, this corresponds to either stdin or an explicit file path
    /// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
    /// an explicit haystack, but, e.g., `./some-dir/some-other-file` is not.
    ///
    /// However, note that ripgrep does not see through shell globbing. e.g.,
    /// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
    /// as an explicit haystack.
    pub(crate) fn is_explicit(&self) -> bool {
        // stdin is always explicit
        // depth() == 0 means directly provided, not discovered
        // !is_dir() because directories are never "searched"
        self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
    }

    /// Returns true if and only if this entry corresponds to stdin.
    pub(crate) fn is_stdin(&self) -> bool {
        self.dent.is_stdin()
    }
}
Examples:
| Command | Path | Explicit? | Why |
|---|---|---|---|
| rg foo file.txt | file.txt | Yes | depth=0, user provided |
| rg foo ./src/ | ./src/main.rs | No | depth>0, discovered |
| rg foo | src/main.rs | No | depth>0, discovered |
| rg foo - | stdin | Yes | stdin is always explicit |
| echo x \| rg foo | stdin | Yes | stdin is always explicit |
| rg foo ./src/* | ./src/lib.rs | Yes | shell expanded, depth=0 |
Why depth=0 means explicit:
// When you run: rg foo file1.txt ./dir/
// The walker sees:
// file1.txt → depth 0 (you provided it)
// ./dir/ → depth 0 (you provided it)
// ./dir/a.rs → depth 1 (walker found it)
// ./dir/sub/b.rs → depth 2 (walker found it)
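To check the numbering above by hand, depth can be approximated as the number of path components below the argument the user typed. This is only a sketch using `std::path`: `depth_below` is a hypothetical helper, and the real value comes from the walker in the ignore crate, not from path arithmetic. The `is_explicit` function here takes the three predicates as plain values.

```rust
use std::path::Path;

/// Number of path components below `root`; 0 means `path` is the root itself.
/// Returns None when `path` is not under `root`.
fn depth_below(root: &Path, path: &Path) -> Option<usize> {
    path.strip_prefix(root)
        .ok()
        .map(|rest| rest.components().count())
}

/// Same boolean shape as Haystack::is_explicit, with inputs passed as values.
fn is_explicit(is_stdin: bool, depth: usize, is_dir: bool) -> bool {
    is_stdin || (depth == 0 && !is_dir)
}
```

Note that `./dir/` itself has depth 0 but is still not explicit, because the `!is_dir` check excludes it; only the files found beneath it are candidates, and those have depth 1 or more.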
Section 5: Path Display
impl Haystack {
    /// Return the file path corresponding to this haystack.
    ///
    /// If this haystack corresponds to stdin, then a special `<stdin>` path
    /// is returned instead.
    pub(crate) fn path(&self) -> &Path {
        if self.strip_dot_prefix && self.dent.path().starts_with("./") {
            self.dent.path().strip_prefix("./").unwrap()
        } else {
            self.dent.path()
        }
    }
}
Behavior examples:
| Input | strip_dot_prefix | path() returns |
|---|---|---|
| ./src/main.rs | true | src/main.rs |
| ./src/main.rs | false | ./src/main.rs |
| src/main.rs | true | src/main.rs |
| src/main.rs | false | src/main.rs |
| - (stdin) | either | <stdin> (from DirEntry) |
When strip_dot_prefix is enabled:
// In hiargs.rs
builder.strip_dot_prefix(self.paths.has_implicit_path);
// has_implicit_path is true when user ran `rg foo` with no paths
// This makes output cleaner:
// "src/main.rs:42:fn main()"
// Instead of:
// "./src/main.rs:42:fn main()"
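The stripping logic depends only on `std::path::Path`, so it can be exercised without a `DirEntry`. A minimal sketch, assuming a free function named `display_path` (a hypothetical name) that takes the path and the flag directly:

```rust
use std::path::Path;

/// Same check-then-strip shape as Haystack::path, on a bare &Path.
/// `starts_with` compares whole components, so "./" matches a leading `.`
/// component and never a file name that merely begins with a dot.
fn display_path(path: &Path, strip_dot_prefix: bool) -> &Path {
    if strip_dot_prefix && path.starts_with("./") {
        // Safe to unwrap: starts_with just confirmed the prefix is present.
        path.strip_prefix("./").unwrap()
    } else {
        path
    }
}
```

The returned path borrows from the input, so no allocation happens on the output path, which matters when it is called once per printed match.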
Section 6: File Type Detection
impl Haystack {
    /// Returns true if and only if this haystack points to a directory after
    /// following symbolic links.
    fn is_dir(&self) -> bool {
        let ft = match self.dent.file_type() {
            None => return false, // Can't determine = not a dir
            Some(ft) => ft,
        };
        if ft.is_dir() {
            return true;
        }
        // Symlink that points to a directory?
        self.dent.path_is_symlink() && self.dent.path().is_dir()
    }

    /// Returns true if and only if this haystack points to a file.
    fn is_file(&self) -> bool {
        self.dent.file_type().map_or(false, |ft| ft.is_file())
    }
}
File type scenarios:
| Filesystem Object | is_file() | is_dir() | Searchable? |
|---|---|---|---|
| Regular file | true | false | Yes |
| Directory | false | true | No |
| Symlink → file | false | false | Only if explicit |
| Symlink → dir | false | true | No |
| Socket/FIFO/etc | false | false | Only if explicit |
| Unknown (no metadata) | false | false | Only if explicit |
Why symlink handling differs:
// is_dir follows symlinks:
self.dent.path_is_symlink() && self.dent.path().is_dir()
// is_file does NOT follow symlinks:
self.dent.file_type().map_or(false, |ft| ft.is_file())
// Reason: By the time we get here, --follow has already been applied.
// If the user wanted symlinks followed, the walker already did it.
// A symlink that wasn't followed should not be searched.
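The same asymmetry can be reproduced with the standard library alone. The sketch below assumes `std::fs::symlink_metadata` as a stand-in for the entry's own file type (it does not follow symlinks) and `Path::is_dir` for the followed check; the function names are hypothetical and this is not how the ignore crate obtains its file type.

```rust
use std::fs;
use std::path::Path;

/// Does NOT follow symlinks: a symlink pointing at a file reports false.
fn is_file_no_follow(path: &Path) -> bool {
    fs::symlink_metadata(path).map_or(false, |md| md.file_type().is_file())
}

/// Follows a symlink only to ask whether its target is a directory.
fn is_dir_following(path: &Path) -> bool {
    let ft = match fs::symlink_metadata(path) {
        Err(_) => return false, // Can't determine = not a dir
        Ok(md) => md.file_type(),
    };
    if ft.is_dir() {
        return true;
    }
    // Path::is_dir traverses symlinks, so this asks about the target.
    ft.is_symlink() && path.is_dir()
}
```

With these two functions, an unfollowed symlink to a file is neither a file nor a directory, which is the case that ends up in the debug log rather than being searched.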
Section 7: The Wrapper Structure
/// A haystack is a thing we want to search.
///
/// Generally, a haystack is either a file or stdin.
#[derive(Clone, Debug)]
pub(crate) struct Haystack {
    dent: ignore::DirEntry,
    strip_dot_prefix: bool,
}
What ignore::DirEntry provides:
// From the ignore crate (not shown in haystack.rs)
impl DirEntry {
    fn path(&self) -> &Path;
    fn file_type(&self) -> Option<FileType>;
    fn metadata(&self) -> Result<Metadata, Error>;
    fn depth(&self) -> usize;
    fn is_stdin(&self) -> bool;
    fn path_is_symlink(&self) -> bool;
    fn error(&self) -> Option<&Error>;
}
Why wrap instead of using DirEntry directly?
- Custom path logic — strip_dot_prefix behavior
- Application-level filtering — is_explicit, is_file
- Encapsulation — Search code sees Haystack API, not ignore API
- Future flexibility — Can add fields without changing ignore crate
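The encapsulation argument can be illustrated with a toy wrapper. Everything in this sketch is hypothetical: `RawEntry` stands in for a third-party entry type and `Wrapped` for the application-level type. It is not the layout of `ignore::DirEntry`, and stdin handling is left out.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a third-party entry type.
#[derive(Clone, Debug)]
struct RawEntry {
    path: PathBuf,
    depth: usize,
    is_dir: bool,
}

/// Wrapper that owns the raw entry and exposes only what search code needs.
#[derive(Clone, Debug)]
struct Wrapped {
    raw: RawEntry,
    strip_dot_prefix: bool,
}

impl Wrapped {
    /// Custom path logic lives on the wrapper, not on the raw type.
    fn path(&self) -> &Path {
        let path = self.raw.path.as_path();
        if self.strip_dot_prefix {
            path.strip_prefix("./").unwrap_or(path)
        } else {
            path
        }
    }

    /// Application-level rule built from raw fields callers never see.
    fn is_explicit(&self) -> bool {
        self.raw.depth == 0 && !self.raw.is_dir
    }
}
```

Callers depend only on `path` and `is_explicit`, so a new field on the wrapper or a change in the raw type stays invisible to them.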
Quick Reference: Full API
// Builder (signatures only; visibility matches the definitions above)
impl HaystackBuilder {
    pub(crate) fn new() -> HaystackBuilder;
    pub(crate) fn strip_dot_prefix(&mut self, yes: bool) -> &mut HaystackBuilder;
    pub(crate) fn build_from_result(&self, result: Result<DirEntry, Error>) -> Option<Haystack>;
}

// Haystack
impl Haystack {
    pub(crate) fn path(&self) -> &Path;        // Display path
    pub(crate) fn is_stdin(&self) -> bool;     // Is this stdin?
    pub(crate) fn is_explicit(&self) -> bool;  // User-provided?
    fn is_dir(&self) -> bool;                  // Directory? (private)
    fn is_file(&self) -> bool;                 // Regular file? (private)
}
Integration Example
// Simplified flow from the search function in main.rs
fn search(args: &HiArgs, mode: SearchMode) -> anyhow::Result<bool> {
    // 1. Get builder configured from HiArgs
    let haystack_builder = args.haystack_builder();

    // 2. Create iterator that converts DirEntry → Haystack
    let unsorted = args
        .walk_builder()?          // Configure directory walker
        .build()                  // Create iterator over Results
        .filter_map(|result| {    // Convert each Result
            haystack_builder.build_from_result(result)
        });

    // 3. Optionally sort
    let haystacks = args.sort(unsorted);

    // 4. Search each haystack
    // (construction of `searcher` from `args` and `mode` is elided here)
    let mut matched = false;
    for haystack in haystacks {
        let result = searcher.search(&haystack)?;
        matched = matched || result.has_match();
        // haystack.path() used for output formatting
    }
    Ok(matched)
}
Data flow: