Chapter: The Ignore Crate – Intelligent File Discovery
Overview
The ignore crate is ripgrep's file discovery engine. When you run rg pattern, something has to decide which files to search. That "something" is the ignore crate: a directory walker that respects gitignore rules, filters by file type, and can traverse directories in parallel.
This crate embodies a key insight: the fastest way to search a file is to not search it at all. By filtering during traversal, ripgrep never opens files it has been told to skip and never descends into ignored directories. This filtering happens at several levels (hidden files, gitignore patterns, file type globs, and file size limits), all evaluated from a path or its metadata before any regex touches the file's contents.
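A minimal sketch of that idea follows: a per-file check that decides from the path and file size alone, without reading any contents. Filters and should_search are hypothetical names, and the exact-name and extension tests stand in for real glob matching. This is not the ignore crate's API, and it does not reproduce the crate's exact order of checks or its precedence rules.

```rust
use std::path::Path;

/// Hypothetical filter settings (not the ignore crate's API).
struct Filters {
    skip_hidden: bool,
    /// Stand-in for compiled gitignore globs: exact file names to ignore.
    ignored_names: Vec<&'static str>,
    /// Stand-in for a -t selection: allowed extensions (empty = allow all).
    allowed_exts: Vec<&'static str>,
    /// Optional size limit in bytes.
    max_filesize: Option<u64>,
}

/// Decide whether to search a file using only its path and size.
/// No file contents are read here.
fn should_search(f: &Filters, path: &Path, size: u64) -> bool {
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(name) => name,
        None => return false,
    };
    // Level 1: hidden files (Unix convention: a leading dot).
    if f.skip_hidden && name.starts_with('.') {
        return false;
    }
    // Level 2: ignore rules (hypothetical exact-name rules).
    if f.ignored_names.iter().any(|n| *n == name) {
        return false;
    }
    // Level 3: file type selection by extension.
    if !f.allowed_exts.is_empty() {
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
        if !f.allowed_exts.iter().any(|e| *e == ext) {
            return false;
        }
    }
    // Level 4: size limit, taken from metadata instead of reading the file.
    match f.max_filesize {
        Some(max) => size <= max,
        None => true,
    }
}
```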
The ignore crate is also ripgrep's most reusable component. Unlike grep-searcher or grep-printer, which are tightly coupled to searching, ignore is a general-purpose library. Other tools, such as fd (a find replacement), use it directly.
What This Crate Provides
The ignore crate offers three main capabilities:
1. Recursive Directory Walking
- Sequential iteration via Walk
- Parallel traversal via WalkParallel
- Builder pattern for extensive configuration (WalkBuilder)
- The WalkState enum for controlling traversal
2. Gitignore Processing
- Parsing .gitignore, .ignore, and .rgignore files
- Proper precedence rules (child overrides parent)
- Negation patterns (lines starting with !)
- Global git excludes
3. File Type Filtering
- Built-in type definitions (rust, python, c, etc.)
- Custom type definitions via --type-add
- Type selection (-t) and negation (-T)
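The type-selection rules in item 3 can be modeled in a few lines. In this std-only sketch, DEFS, Selection and type_match are hypothetical, types are matched by file extension rather than by glob, and Match carries no payload. The precedence it encodes (the last matching selection wins, a -T match yields Ignore, and once any -t selection exists an unmatched file is ignored) is intended to approximate the crate's Types matcher, not to reproduce it exactly.

```rust
/// Simplified, payload-free version of the Match enum.
#[derive(Debug, PartialEq)]
enum Match {
    None,
    Ignore,
    Whitelist,
}

/// -t NAME is a selection (negated = false); -T NAME is a negation.
struct Selection {
    name: &'static str,
    negated: bool,
}

/// Hypothetical built-in definitions: type name -> file extensions.
const DEFS: &[(&str, &[&str])] = &[
    ("rust", &["rs"]),
    ("python", &["py", "pyi"]),
    ("c", &["c", "h"]),
];

fn type_match(selections: &[Selection], file_name: &str) -> Match {
    let ext = file_name.rsplit_once('.').map(|(_, e)| e).unwrap_or("");
    // Later selections take precedence, so scan from the end.
    for sel in selections.iter().rev() {
        let exts: &[&str] = DEFS
            .iter()
            .find(|(name, _)| *name == sel.name)
            .map(|(_, exts)| *exts)
            .unwrap_or(&[]);
        if exts.iter().any(|e| *e == ext) {
            return if sel.negated { Match::Ignore } else { Match::Whitelist };
        }
    }
    // Nothing matched. If the user asked for specific types (-t), every
    // other file is skipped; negations alone leave other files untouched.
    if selections.iter().any(|s| !s.negated) {
        Match::Ignore
    } else {
        Match::None
    }
}
```

With only a -T rust negation, a Python file comes back as Match::None: negations remove files but do not, on their own, restrict the search to other types.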
Crate Architecture
```text
┌────────────────────────────────────────────────────────────────────┐
│                            ignore crate                            │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────────┐     ┌───────────────┐     ┌──────────────────┐   │
│  │ walk.rs      │────▶│ dir.rs        │────▶│ gitignore.rs     │   │
│  │ (~2500)      │     │ (~1300)       │     │ (~850)           │   │
│  │              │     │               │     │                  │   │
│  │ Walk         │     │ Ignore        │     │ Gitignore        │   │
│  │ WalkBuilder  │     │ IgnoreBuilder │     │ GitignoreBuilder │   │
│  │ WalkParallel │     │               │     │                  │   │
│  │ WalkState    │     │               │     │                  │   │
│  └──────────────┘     └───────────────┘     └──────────────────┘   │
│         │                     │                      │             │
│         │                     ▼                      ▼             │
│         │             ┌───────────────┐     ┌──────────────────┐   │
│         │             │ types.rs      │     │ overrides.rs     │   │
│         │             │ (~580)        │     │ (~290)           │   │
│         │             │               │     │                  │   │
│         └────────────▶│ Types         │     │ Override         │   │
│                       │ TypesBuilder  │     │                  │   │
│                       └───────────────┘     └──────────────────┘   │
│                               ▲                                    │
│                               │                                    │
│                       ┌───────────────┐                            │
│                       │ default_types │                            │
│                       │ (~360)        │                            │
│                       │               │                            │
│                       │ Built-in      │                            │
│                       │ definitions   │                            │
│                       └───────────────┘                            │
│                                                                    │
│  Supporting: lib.rs (~540) - Error, Match types                    │
│              pathutil.rs (~140) - Path helpers                     │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
```
Module Summary
| Module | Lines | Purpose |
|---|---|---|
| walk.rs | ~2500 | Directory traversal, parallel iteration, the closure-factory pattern |
| dir.rs | ~1300 | Per-directory ignore state, rule accumulation |
| gitignore.rs | ~850 | Gitignore file parsing, glob matching |
| types.rs | ~580 | File type definitions and matching |
| default_types.rs | ~360 | Built-in type definitions (rust:*.rs, etc.) |
| overrides.rs | ~290 | Command-line glob overrides (-g flag) |
| pathutil.rs | ~140 | Path normalization utilities |
| lib.rs | ~540 | Crate root, Error enum, Match enum |
Table of Contents
This chapter covers the ignore crate in dependency order:
Part 1: Foundation Types
1.1 lib.rs: Crate Entry Point
- The Error enum and error wrapping patterns
- The Match<T> enum for ignore/whitelist decisions
- Partial error handling philosophy
Part 2: Glob and Pattern Matching
2.1 gitignore.rs: Gitignore Parsing
- Gitignore file format and semantics
- Pattern compilation and matching
- Precedence rules and negation
2.2 overrides.rs: CLI Glob Overrides
- The -g/--glob flag implementation
- Override vs ignore precedence
Part 3: File Type System
3.1 default_types.rs: Built-in Definitions
- How type definitions are stored
- The DEFAULT_TYPES static
3.2 types.rs: Type Matching
- TypesBuilder configuration
- Type selection and negation
- Glob-to-type matching
Part 4: Directory State
4.1 dir.rs: Per-Directory Ignore State
- The Ignore struct
- Rule accumulation as you descend
- Parent-child override semantics
Part 5: The Walker
5.1 walk.rs: Directory Traversal
- Walk for sequential iteration
- WalkParallel for parallel traversal
- WalkBuilder configuration
- The closure-factory pattern
- WalkState for traversal control
- Thread coordination and work stealing
Key Concepts Preview
The Match Enum
Every filtering decision returns a Match<T>:
```rust
pub enum Match<T> {
    None,         // No rule matched
    Ignore(T),    // Should be ignored (skip this file)
    Whitelist(T), // Explicitly included (override ignore)
}
```
Negation patterns (lines starting with !) produce Whitelist. The precedence rule: later rules win, so a whitelist after an ignore re-includes the file.
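A toy evaluator makes the "later rules win" behavior concrete. The enum below has the same shape as Match<T>, with T holding the winning line; glob_matches and matched are hypothetical helpers, and the glob syntax is cut down to *suffix or an exact name, far simpler than real gitignore globs.

```rust
/// Same shape as Match<T>; here T is the winning line.
#[derive(Debug, PartialEq)]
enum Match<T> {
    None,
    Ignore(T),
    Whitelist(T),
}

/// Toy glob: "*suffix" matches by suffix; anything else must equal the name.
fn glob_matches(glob: &str, name: &str) -> bool {
    match glob.strip_prefix('*') {
        Some(suffix) => name.ends_with(suffix),
        None => glob == name,
    }
}

/// Evaluate gitignore-style lines against one file name.
/// Scanning from the bottom means the LAST matching line wins.
fn matched<'a>(lines: &[&'a str], name: &str) -> Match<&'a str> {
    for &line in lines.iter().rev() {
        // A leading '!' turns the pattern into a whitelist (negation).
        let (glob, negated) = match line.strip_prefix('!') {
            Some(rest) => (rest, true),
            None => (line, false),
        };
        if glob_matches(glob, name) {
            return if negated {
                Match::Whitelist(line)
            } else {
                Match::Ignore(line)
            };
        }
    }
    Match::None
}
```

With the lines ["*.log", "!keep.log"], debug.log is ignored and keep.log is whitelisted. Swapping the two lines flips the result for keep.log: the unnegated *.log now matches last, so the file is ignored again.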
The Closure-Factory Pattern
Parallel traversal uses a pattern you saw in main.rs:
```rust
// walker: a configured WalkBuilder
walker.build_parallel().run(|| {
    // This closure is called once per thread
    // Return a closure that handles individual entries
    Box::new(|result| {
        // Process one entry (a Result<DirEntry, ignore::Error>)
        WalkState::Continue
    })
});
```
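The outer closure runs once per worker thread and sets up that thread's state (in ripgrep's case, its own clone of the searcher). The inner closure processes each entry that thread receives. Every thread therefore owns its resources outright, with no shared mutable state to lock.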
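The same shape can be reproduced with only the standard library. In this sketch, run_parallel, Visitor, State and demo are hypothetical: items are plain strings from a fixed list, and workers share one mutex-protected queue instead of the crate's work-stealing scheme (a real walker also pushes newly discovered directories back onto its queue). What it keeps is the pattern itself: the factory closure is called once per worker, and each boxed visitor owns its own state.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Simplified stand-in for WalkState (Skip is omitted).
#[derive(Debug, PartialEq)]
enum State {
    Continue,
    Quit,
}

/// A per-thread visitor, boxed so each worker can own a different closure.
type Visitor<'s> = Box<dyn FnMut(String) -> State + Send + 's>;

/// Hypothetical mini walker: calls the factory `mkf` once per worker, then
/// each worker pops items from a shared queue and feeds its own visitor.
fn run_parallel<'s, F>(items: Vec<String>, threads: usize, mut mkf: F)
where
    F: FnMut() -> Visitor<'s>,
{
    let queue = Mutex::new(items);
    // Build one visitor per worker before any thread starts.
    let visitors: Vec<Visitor<'s>> = (0..threads).map(|_| mkf()).collect();
    thread::scope(|s| {
        for mut visit in visitors {
            let queue = &queue;
            s.spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the visitor runs without holding the lock.
                let next = queue.lock().unwrap().pop();
                let item = match next {
                    Some(item) => item,
                    None => break, // queue drained
                };
                if visit(item) == State::Quit {
                    break;
                }
            });
        }
    });
}

/// Usage: returns (factory calls, total bytes of all item names).
fn demo(items: Vec<String>, threads: usize) -> (usize, usize) {
    let total = AtomicUsize::new(0);
    let mut factory_calls = 0;
    run_parallel(items, threads, || {
        factory_calls += 1; // outer closure: once per worker
        let total = &total;
        // Per-thread scratch buffer, reused for every item this thread sees.
        let mut buf = String::new();
        Box::new(move |path: String| {
            buf.clear();
            buf.push_str(&path);
            total.fetch_add(buf.len(), Ordering::Relaxed);
            State::Continue
        })
    });
    (factory_calls, total.load(Ordering::Relaxed))
}
```

Each item is visited exactly once, so the byte total matches a sequential count regardless of which thread handled which item, while each buf is touched by a single thread and needs no lock.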
Rule Hierarchy
Ignore rules stack as you descend directories:
```text
/project/.gitignore            # Applies to all of /project
/project/src/.gitignore        # Adds rules for /project/src
/project/src/test/.gitignore   # Adds rules for /project/src/test
```
Child rules take precedence over parent rules. A whitelist in a child directory can override an ignore in a parent.
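Stacking can be modeled as a list of rule sets for one kind of ignore file, ordered from the root down to the file's own directory. In this hypothetical std-only sketch (match_file and match_in_stack are illustrative names), each layer uses last-match-wins internally, and the deepest layer with a non-None answer decides. Real rules match paths relative to the ignore file's directory rather than bare file names, so treat this as a simplification.

```rust
/// Payload-free stand-in for Match<T>.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Match {
    None,
    Ignore,
    Whitelist,
}

/// One ignore file's lines; within a file the last matching line wins.
/// Toy globs: "*suffix" or an exact file name; a leading '!' negates.
fn match_file(lines: &[&str], name: &str) -> Match {
    for line in lines.iter().rev() {
        let (glob, negated) = match line.strip_prefix('!') {
            Some(rest) => (rest, true),
            None => (*line, false),
        };
        let hit = match glob.strip_prefix('*') {
            Some(suffix) => name.ends_with(suffix),
            None => glob == name,
        };
        if hit {
            return if negated { Match::Whitelist } else { Match::Ignore };
        }
    }
    Match::None
}

/// `stack` holds the ignore files found while descending, ordered from the
/// root to the file's own directory. The deepest file with an opinion wins.
fn match_in_stack(stack: &[&[&str]], name: &str) -> Match {
    for lines in stack.iter().rev() {
        let m = match_file(lines, name);
        if m != Match::None {
            return m;
        }
    }
    Match::None
}
```

If /project/.gitignore contains *.log and /project/src/.gitignore contains !keep.log, then keep.log under /project/src is whitelisted, while debug.log there still falls through to the parent's ignore rule.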
How Ripgrep Uses This Crate
From main.rs and hiargs.rs:
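```rust
use ignore::{WalkBuilder, WalkState};

// Building the walker (simplified; the flag variables stand in for
// ripgrep's parsed arguments)
let walker = WalkBuilder::new(path)
    .hidden(!show_hidden)         // true = skip hidden files
    .ignore(!no_ignore)           // Respect .ignore files?
    .git_ignore(!no_git_ignore)   // Respect .gitignore?
    .git_global(!no_git_global)   // Respect global gitignore?
    .git_exclude(!no_git_exclude) // Respect .git/info/exclude?
    .types(type_matcher)          // File type filtering (a Types value)
    .overrides(override_matcher)  // CLI glob overrides (an Override value)
    .threads(thread_count)        // Parallelism level
    .build_parallel();            // Construct parallel walker

// Running in parallel (from main.rs)
walker.run(|| {
    // Runs once per worker thread: each thread gets its own searcher
    let mut searcher = searcher.clone();
    Box::new(move |result| {
        // Each entry arrives as a Result<DirEntry, ignore::Error>
        let entry = match result {
            Ok(entry) => entry,
            Err(err) => {
                eprintln!("{}", err);
                return WalkState::Continue;
            }
        };
        // Search this file
        match searcher.search(&entry) { ... }
        WalkState::Continue
    })
});
```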
Reading Order Recommendation
For the deepest understanding, read in this order:
1. lib.rs: Understand Error and Match first
2. gitignore.rs: Core pattern matching
3. types.rs + default_types.rs: File type system
4. overrides.rs: CLI overrides (builds on gitignore)
5. dir.rs: How rules accumulate per-directory
6. walk.rs: The actual traversal (ties everything together)

Alternatively, for a top-down view:
1. walk.rs: See the big picture first
2. Then dive into components as questions arise
What You'll Learn
By the end of this chapter, you'll understand:
- How gitignore pattern matching actually works
- Why parallel directory traversal is non-trivial
- The closure-factory pattern for thread-local state
- How ignore rules cascade through directory hierarchies
- Why file type filtering happens during traversal, not after
- The design decisions that make ripgrep fast at file discovery
Let's begin with the foundation: lib.rs and the Error/Match types.