
Chapter: The Ignore Crate — Intelligent File Discovery

Overview

The ignore crate is ripgrep's file discovery engine. When you run rg pattern, something has to decide which files to search. That "something" is the ignore crate — a sophisticated directory walker that respects gitignore rules, filters by file type, and traverses directories in parallel.

This crate embodies a key insight: the fastest way to search a file is to not search it at all. By filtering during traversal, ripgrep never opens a file that the active rules exclude. Filtering happens at several levels (hidden files, gitignore patterns, file type globs, and size limits), and all of them are evaluated before any regex touches the file.

The ignore crate is also ripgrep's most reusable component. Unlike grep-searcher or grep-printer, which are tightly coupled to search functionality, ignore is a general-purpose library. Other tools, such as fd (a find replacement), use it directly.


What This Crate Provides

The ignore crate offers three main capabilities:

1. Recursive Directory Walking
  • Sequential iteration via Walk
  • Parallel traversal via WalkParallel
  • Builder pattern for extensive configuration (WalkBuilder)
  • The WalkState enum for controlling traversal

2. Gitignore Processing
  • Parsing .gitignore, .ignore, and .rgignore files
  • Proper precedence rules (child overrides parent)
  • Negation patterns (lines starting with !)
  • Global git excludes

3. File Type Filtering
  • Built-in type definitions (rust, python, c, etc.)
  • Custom type definitions via --type-add
  • Type selection (-t) and negation (-T)
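To make type selection and negation concrete, here is a minimal std-only sketch. It is a simplified model, not the crate's implementation: the DEFS table, the TypeFilter struct, and the has_type and should_search names are all assumptions made for illustration, and extensions stand in for the globs the real definitions use.

```rust
use std::path::Path;

/// Illustrative type table: (type name, file extensions).
/// These entries are assumptions for the sketch, not the crate's built-in list.
const DEFS: &[(&str, &[&str])] = &[
    ("rust", &["rs"]),
    ("python", &["py", "pyi"]),
    ("c", &["c", "h"]),
];

/// Which types were selected (like -t) and which were negated (like -T).
struct TypeFilter<'a> {
    selected: &'a [&'a str],
    negated: &'a [&'a str],
}

/// True if the path's extension belongs to any of the named types.
fn has_type(path: &Path, names: &[&str]) -> bool {
    let ext = match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => ext,
        None => return false, // no extension: matches no type in this model
    };
    DEFS.iter()
        .any(|(name, exts)| names.contains(name) && exts.contains(&ext))
}

impl TypeFilter<'_> {
    /// Decide from the path alone, without opening the file.
    fn should_search(&self, path: &Path) -> bool {
        if has_type(path, self.negated) {
            return false; // a negated type always excludes the file
        }
        // With no selection, everything not negated passes;
        // otherwise the file must belong to a selected type.
        self.selected.is_empty() || has_type(path, self.selected)
    }
}

fn main() {
    let filter = TypeFilter { selected: &["rust"], negated: &[] };
    for name in ["src/main.rs", "setup.py", "Makefile"] {
        println!("{name}: search = {}", filter.should_search(Path::new(name)));
    }
}
```

The point of the sketch is that the decision needs only the path, which is why this kind of filter can run during traversal rather than after a file has been opened.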


Crate Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         ignore crate                                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐          │
│   │   walk.rs   │────▶│   dir.rs    │────▶│ gitignore.rs│          │
│   │   (~2500)   │     │   (~1300)   │     │   (~850)    │          │
│   │             │     │             │     │             │          │
│   │ Walk        │     │ Ignore      │     │ Gitignore   │          │
│   │ WalkBuilder │     │ IgnoreBuilder│    │ GitignoreBuilder       │
│   │ WalkParallel│     │             │     │             │          │
│   │ WalkState   │     │             │     │             │          │
│   └─────────────┘     └─────────────┘     └─────────────┘          │
│          │                   │                                       │
│          │                   ▼                                       │
│          │            ┌─────────────┐     ┌─────────────┐          │
│          │            │  types.rs   │     │ overrides.rs│          │
│          │            │   (~580)    │     │   (~290)    │          │
│          │            │             │     │             │          │
│          └───────────▶│ Types       │     │ Override    │◀─────────┘
│                       │ TypesBuilder│     │             │
│                       └─────────────┘     └─────────────┘
│                              ▲
│                              │
│                       ┌──────────────┐
│                       │default_types │
│                       │   (~360)     │
│                       │              │
│                       │ Built-in     │
│                       │ definitions  │
│                       └──────────────┘
│                                                                      │
│   Supporting:  lib.rs (~540) - Error, Match types                   │
│                pathutil.rs (~140) - Path helpers                    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Module Summary

Module             Lines   Purpose
walk.rs            ~2500   Directory traversal, parallel iteration, the closure-factory pattern
dir.rs             ~1300   Per-directory ignore state, rule accumulation
gitignore.rs       ~850    Gitignore file parsing, glob matching
types.rs           ~580    File type definitions and matching
default_types.rs   ~360    Built-in type definitions (rust:*.rs, etc.)
overrides.rs       ~290    Command-line glob overrides (-g flag)
pathutil.rs        ~140    Path normalization utilities
lib.rs             ~540    Crate root, Error enum, Match enum

Table of Contents

This chapter covers the ignore crate in dependency order:

Part 1: Foundation Types

1.1 lib.rs — Crate Entry Point
  • The Error enum and error wrapping patterns
  • The Match<T> enum for ignore/whitelist decisions
  • Partial error handling philosophy

Part 2: Glob and Pattern Matching

2.1 gitignore.rs — Gitignore Parsing
  • Gitignore file format and semantics
  • Pattern compilation and matching
  • Precedence rules and negation

2.2 overrides.rs — CLI Glob Overrides
  • The -g/--glob flag implementation
  • Override vs ignore precedence

Part 3: File Type System

3.1 default_types.rs — Built-in Definitions
  • How type definitions are stored
  • The DEFAULT_TYPES static

3.2 types.rs — Type Matching
  • TypesBuilder configuration
  • Type selection and negation
  • Glob-to-type matching

Part 4: Directory State

4.1 dir.rs — Per-Directory Ignore State
  • The Ignore struct
  • Rule accumulation as you descend
  • Parent-child override semantics

Part 5: The Walker

5.1 walk.rs — Directory Traversal
  • Walk for sequential iteration
  • WalkParallel for parallel traversal
  • WalkBuilder configuration
  • The closure-factory pattern
  • WalkState for traversal control
  • Thread coordination and work stealing


Key Concepts Preview

The Match Enum

Every filtering decision returns a Match<T>:

pub enum Match<T> {
    None,        // No rule matched
    Ignore(T),   // Should be ignored (skip this file)
    Whitelist(T) // Explicitly included (override ignore)
}

Negation patterns (lines starting with !) produce Whitelist. The precedence rule: later rules win, so a whitelist after an ignore re-includes the file.
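The "later rules win" behavior can be sketched with nothing but the standard library. This is a simplified model under stated assumptions: the Rule struct and the matched function are hypothetical names, the enum only mirrors the shape shown above, and suffix comparison stands in for real glob matching.

```rust
/// Mirrors the shape of the enum shown above (redefined so the sketch is self-contained).
#[derive(Debug)]
enum Match<T> {
    None,
    Ignore(T),
    Whitelist(T),
}

/// One parsed line of an ignore file (hypothetical type for this sketch).
#[derive(Debug)]
struct Rule {
    /// File-name suffix to compare against; a stand-in for a compiled glob.
    suffix: &'static str,
    /// True when the original line started with `!`.
    negated: bool,
}

/// Scan the rules from last to first so the latest matching rule decides.
fn matched<'a>(rules: &'a [Rule], file_name: &str) -> Match<&'a Rule> {
    for rule in rules.iter().rev() {
        if file_name.ends_with(rule.suffix) {
            return if rule.negated {
                Match::Whitelist(rule)
            } else {
                Match::Ignore(rule)
            };
        }
    }
    Match::None
}

fn main() {
    // Roughly equivalent to an ignore file containing:
    //   *.log
    //   !keep.log
    let rules = [
        Rule { suffix: ".log", negated: false },
        Rule { suffix: "keep.log", negated: true },
    ];
    for name in ["debug.log", "keep.log", "main.rs"] {
        let verdict = match matched(&rules, name) {
            Match::None => "no rule matched",
            Match::Ignore(_) => "ignored",
            Match::Whitelist(_) => "whitelisted",
        };
        println!("{name}: {verdict}");
    }
}
```

Carrying the matching rule inside Ignore and Whitelist is what lets a caller report which pattern was responsible for a decision, which is the role the T parameter plays in this model.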

The Closure-Factory Pattern

Parallel traversal uses a pattern you saw in main.rs:

walker.build_parallel().run(|| {
    // This closure is called once per thread
    // Return a closure that handles individual entries
    Box::new(|entry| {
        // Process entry
        WalkState::Continue
    })
})

The outer closure sets up thread-local state. The inner closure processes entries. This enables per-thread resources without shared mutable state.
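If the two-level closure feels abstract, the following std-only sketch shows the same shape without the crate. Everything here is an assumption made for illustration: the run driver, the Visitor alias, the local WalkState enum, and count_bytes are hypothetical, entries are plain strings rather than directory entries, and a mutex-guarded vector stands in for whatever work distribution the real walker uses.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Stand-in for the crate's traversal-control enum (assumed for this sketch).
enum WalkState {
    Continue,
    Quit,
}

/// The inner closure: owns per-worker state and handles one entry at a time.
type Visitor<'s> = Box<dyn FnMut(String) -> WalkState + Send + 's>;

/// Hypothetical driver: builds one visitor per worker, then lets the
/// workers drain a shared queue until it is empty or someone asks to quit.
fn run<'s, F>(entries: Vec<String>, threads: usize, mut factory: F)
where
    F: FnMut() -> Visitor<'s>,
{
    let queue = Mutex::new(entries);
    let quit = AtomicBool::new(false);
    thread::scope(|scope| {
        for _ in 0..threads {
            let mut visit = factory(); // outer closure: called once per worker
            let (queue, quit) = (&queue, &quit);
            scope.spawn(move || loop {
                if quit.load(Ordering::Relaxed) {
                    break;
                }
                // Hold the lock only long enough to pop one entry.
                let next = queue.lock().unwrap().pop();
                match next {
                    Some(entry) => {
                        if let WalkState::Quit = visit(entry) {
                            quit.store(true, Ordering::Relaxed);
                        }
                    }
                    None => break,
                }
            });
        }
    });
}

/// Example use: each worker reuses its own scratch buffer, and only the
/// final tally is shared between workers.
fn count_bytes(entries: Vec<String>, threads: usize) -> usize {
    let total = AtomicUsize::new(0);
    let total_ref = &total;
    run(entries, threads, || {
        let mut scratch = String::new(); // per-worker state, no locking needed
        Box::new(move |entry: String| {
            scratch.clear();
            scratch.push_str(&entry);
            total_ref.fetch_add(scratch.len(), Ordering::Relaxed);
            WalkState::Continue
        })
    });
    total.load(Ordering::Relaxed)
}

fn main() {
    let entries: Vec<String> = (0..8).map(|i| format!("file{i}.txt")).collect();
    println!("total bytes in names: {}", count_bytes(entries, 4));
}
```

The scratch buffer is the part to notice: it is created in the outer closure and moved into the inner one, so each worker mutates its own copy freely while the only cross-thread traffic is the atomic counter.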

Rule Hierarchy

Ignore rules stack as you descend directories:

/project/.gitignore          # Applies to all of /project
/project/src/.gitignore      # Adds rules for /project/src
/project/src/test/.gitignore # Adds rules for /project/src/test

Child rules take precedence over parent rules. A whitelist in a child directory can override an ignore in a parent.
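One way to picture the stacking is as a list of per-directory rule sets that is consulted from the deepest directory outward, stopping at the first one that has an opinion. The sketch below is a simplified model, not the crate's data structure: Verdict, DirRules, and decide are hypothetical names, and suffix comparison again stands in for glob matching.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    None,
    Ignore,
    Whitelist,
}

/// Rules loaded from one directory's ignore file (illustrative model).
struct DirRules {
    dir: &'static str,
    /// (file-name suffix, negated?) pairs in file order.
    rules: Vec<(&'static str, bool)>,
}

impl DirRules {
    /// Within a single file, the last matching rule wins.
    fn matched(&self, file_name: &str) -> Verdict {
        for &(suffix, negated) in self.rules.iter().rev() {
            if file_name.ends_with(suffix) {
                return if negated { Verdict::Whitelist } else { Verdict::Ignore };
            }
        }
        Verdict::None
    }
}

/// `stack` is ordered root first, deepest directory last.
/// The deepest directory that has a matching rule decides.
fn decide(stack: &[DirRules], file_name: &str) -> Verdict {
    for level in stack.iter().rev() {
        let verdict = level.matched(file_name);
        if verdict != Verdict::None {
            return verdict;
        }
    }
    Verdict::None
}

fn main() {
    let stack = vec![
        DirRules { dir: "/project", rules: vec![(".log", false)] },
        DirRules { dir: "/project/src", rules: vec![] },
        DirRules { dir: "/project/src/test", rules: vec![("expected.log", true)] },
    ];
    for level in &stack {
        println!("{}: {} rule(s)", level.dir, level.rules.len());
    }
    for name in ["run.log", "expected.log", "lib.rs"] {
        println!("{name} in /project/src/test -> {:?}", decide(&stack, name));
    }
}
```

Passing a shorter slice of the stack models a file that lives higher in the tree: with only the first two levels, expected.log falls through to the root rule and is ignored.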


How Ripgrep Uses This Crate

From main.rs and hiargs.rs:

// Building the walker
let walker = WalkBuilder::new(path)
    .hidden(!show_hidden)           // Skip hidden files?
    .ignore(!no_ignore)             // Respect .ignore files?
    .git_ignore(!no_git_ignore)     // Respect .gitignore?
    .git_global(!no_git_global)     // Respect global gitignore?
    .git_exclude(!no_git_exclude)   // Respect .git/info/exclude?
    .types(type_matcher)            // File type filtering
    .overrides(override_matcher)    // CLI glob overrides
    .threads(thread_count)          // Parallelism level
    .build_parallel();              // Construct parallel walker

// Running in parallel (from main.rs)
walker.run(|| {
    let mut searcher = searcher.clone();
    Box::new(move |entry| {
        // Search this file
        match searcher.search(&entry) { ... }
        WalkState::Continue
    })
});

Reading Order Recommendation

For the deepest understanding, read in this order:

  1. lib.rs — Understand Error and Match first
  2. gitignore.rs — Core pattern matching
  3. types.rs + default_types.rs — File type system
  4. overrides.rs — CLI overrides (builds on gitignore)
  5. dir.rs — How rules accumulate per-directory
  6. walk.rs — The actual traversal (ties everything together)

Alternatively, for a top-down view:

  1. walk.rs — See the big picture first
  2. Then dive into components as questions arise


What You'll Learn

By the end of this chapter, you'll understand:

  • How gitignore pattern matching actually works
  • Why parallel directory traversal is non-trivial
  • The closure-factory pattern for thread-local state
  • How ignore rules cascade through directory hierarchies
  • Why file type filtering happens during traversal, not after
  • The design decisions that make ripgrep fast at file discovery

Let's begin with the foundation: lib.rs and the Error/Match types.