ripgrep crates/ignore/src/gitignore.rs: Gitignore Parsing

What This File Does

This file implements a complete gitignore parser and matcher from scratch, without shelling out to the git command line tool. It transforms the somewhat quirky syntax defined in the gitignore man page into the glob matching infrastructure we explored in the previous lesson. The result is a system that can read .gitignore files, parse their patterns according to git's exact semantics, and efficiently determine whether any given file path should be ignored.

The implementation follows the builder pattern we've seen throughout ripgrep's codebase. A GitignoreBuilder accumulates glob patterns from one or more gitignore files, then constructs an immutable Gitignore matcher. This separation allows for flexible configuration—you can combine patterns from multiple sources, control case sensitivity, and handle parse errors gracefully without abandoning valid patterns.


Section 1: The Glob Data Structure

The heart of gitignore matching lies in understanding what a pattern actually means. Git's gitignore syntax carries more semantic information than a simple glob string—a pattern might be a whitelist entry (prefixed with !), might only match directories (suffixed with /), or might have come from a specific file path that affects how errors are reported. The Glob struct captures all of this metadata alongside both the original pattern text and the transformed pattern that will actually be used for matching.

The distinction between original and actual is subtle but important. When you write foo/ in a gitignore file, you're saying "ignore directories named foo." But the trailing slash isn't part of what gets matched—it's a semantic modifier. Similarly, patterns without path separators need a **/ prefix to match anywhere in the directory tree, not just at the root. The Glob struct preserves both forms so you can see exactly what the user wrote while also having the correct matching pattern.

This design reflects a broader principle in ripgrep: maintain fidelity to user input while transforming it internally for correctness. When ripgrep reports which pattern matched a file, it shows the original pattern from the gitignore file, not the internally-transformed version. This makes error messages and diagnostics far more useful.
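
To make the original/actual split concrete, here is a minimal sketch of such a record. The names GlobSketch, describe, and foo_dir_example are illustrative assumptions, not ripgrep's actual definitions; only the idea of keeping both pattern forms alongside the whitelist flag, the directory-only flag, and the source path comes from the description above.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the metadata described above.
#[derive(Clone, Debug, PartialEq)]
struct GlobSketch {
    /// The gitignore file the pattern came from, if known.
    from: Option<PathBuf>,
    /// The pattern exactly as the user wrote it, e.g. "foo/".
    original: String,
    /// The pattern handed to the glob matcher, e.g. "**/foo".
    actual: String,
    /// True when the pattern was prefixed with `!`.
    is_whitelist: bool,
    /// True when the pattern ended with `/`.
    is_only_dir: bool,
}

impl GlobSketch {
    /// Diagnostics quote the user's text, never the rewritten form.
    fn describe(&self) -> String {
        match &self.from {
            Some(path) => format!("{}: {}", path.display(), self.original),
            None => self.original.clone(),
        }
    }
}

/// Builds the record for the `foo/` example from the text.
fn foo_dir_example() -> GlobSketch {
    GlobSketch {
        from: Some(PathBuf::from(".gitignore")),
        original: "foo/".to_string(),
        actual: "**/foo".to_string(),
        is_whitelist: false,
        is_only_dir: true,
    }
}
```

Calling foo_dir_example().describe() reports the user's foo/ together with its source file, while the actual field carries the rewritten pattern that the matcher would see.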

See: Companion Code Section 1


Section 2: The Gitignore Matcher

The Gitignore struct is the compiled, immutable matcher that results from building patterns. It wraps a GlobSet from the globset crate—the same infrastructure we studied in the previous lesson—along with metadata about the patterns it contains. This layered design means the gitignore module doesn't need to implement its own matching engine; it transforms gitignore semantics into glob semantics and delegates the heavy lifting.

Notice the matches field, which holds an Option<Arc<Pool<Vec<usize>>>>. This is the object pool pattern we've seen in ripgrep's searcher and printer infrastructure. When matching, the GlobSet needs a scratch vector to store which patterns matched. Rather than allocating a new vector for every match operation, the pool provides thread-safe reuse of these scratch buffers. The Arc wrapper allows multiple threads to share the same pool.
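
The pooling idea can be sketched with the standard library alone. This is not the Pool type ripgrep uses; ScratchPool and with_buffer are hypothetical names, and a Mutex-guarded stack is only one simple way to get thread-safe reuse of scratch vectors.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical scratch-buffer pool, standing in for the real Pool type.
struct ScratchPool {
    buffers: Mutex<Vec<Vec<usize>>>,
}

impl ScratchPool {
    /// Wraps the pool in an Arc so several threads can share it.
    fn new() -> Arc<ScratchPool> {
        Arc::new(ScratchPool { buffers: Mutex::new(Vec::new()) })
    }

    /// Lends a cleared buffer to `f`, then returns it to the pool so the
    /// next caller can reuse its allocation.
    fn with_buffer<R>(&self, f: impl FnOnce(&mut Vec<usize>) -> R) -> R {
        // Take an existing buffer if one is available, else start empty.
        let mut buf = self.buffers.lock().unwrap().pop().unwrap_or_default();
        buf.clear();
        let result = f(&mut buf);
        self.buffers.lock().unwrap().push(buf);
        result
    }
}
```

A second call to with_buffer on the same pool receives an empty vector that still has the capacity grown by the first call, which is the allocation saving the pool exists for.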

The distinction between num_ignores and num_whitelists matters for optimization. If a gitignore file contains only whitelist patterns (every pattern is prefixed with !), it can never cause a file to be ignored; it can only un-ignore files that were ignored by other patterns. This metadata enables short-circuit logic in the broader ignore infrastructure.

See: Companion Code Section 2


Section 3: Path Stripping and Relative Matching

One of the trickiest aspects of gitignore matching is handling the relationship between the gitignore file's location and the paths being matched. A pattern in /home/user/project/.gitignore applies to paths relative to /home/user/project/, not absolute paths. The strip method handles this normalization.

The implementation handles several edge cases that might not be obvious. A leading ./ is completely superfluous and gets stripped from both the root path and candidate paths. If the candidate path shares a common prefix with the gitignore's root, that prefix gets stripped so matching happens relative to the gitignore location. But there's a subtle case: if the path is just a filename with no directory components, stripping could incorrectly remove part of the filename itself if it happens to match the root.

The special case for root being . is particularly interesting. When the gitignore's root is the current directory represented as ., we shouldn't try to strip it from paths that happen to start with a dot (like .gitignore itself). This kind of edge case handling is why the file implements gitignore semantics from scratch rather than trying to adapt a simpler glob matcher.
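
The stripping rules above can be approximated with std::path alone. The function below is a simplified model under stated assumptions: strip_sketch is a hypothetical name, and it relies on Path::strip_prefix, which compares whole path components, so it may differ from the real method in corner cases.

```rust
use std::path::Path;

/// Simplified model of the stripping rules: returns `path` relative to
/// `root` where that is safe, and unchanged otherwise.
fn strip_sketch<'a>(root: &Path, path: &'a Path) -> &'a Path {
    // A leading `./` carries no information, so drop it first.
    let path = path.strip_prefix("./").unwrap_or(path);

    // A bare file name has an empty parent; there is nothing to strip.
    let is_bare_file_name = path
        .parent()
        .map_or(false, |parent| parent.as_os_str().is_empty());

    // A root of `.` must not be stripped from names such as `.gitignore`.
    if root == Path::new(".") || is_bare_file_name {
        return path;
    }

    // Otherwise remove the shared prefix, if there is one.
    path.strip_prefix(root).unwrap_or(path)
}
```

With a root of /home/user/project, the path /home/user/project/src/main.rs becomes src/main.rs, while a root of . leaves .gitignore untouched.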

See: Companion Code Section 3


Section 4: The Matching Algorithm

The matched_stripped method reveals how precedence works in gitignore files. When multiple patterns could match a path, the last one wins. This is why the method iterates through matches in reverse order—it's looking for the highest-precedence (last-defined) pattern that applies.

The is_only_dir check is crucial here. A pattern like foo/ should only match directories, but the GlobSet doesn't know whether we're matching a directory or a file—that's semantic information from the filesystem, not from the pattern. So the matcher checks this constraint after the glob matching succeeds, allowing the pattern to match only if the path is actually a directory.

The return type uses the Match enum from the ignore crate's public interface. This tri-state result—Ignore, Whitelist, or None—lets callers distinguish between "this should be ignored," "this was explicitly un-ignored," and "no pattern matched." The whitelist case is important because it affects how patterns from multiple gitignore files interact: a whitelist in a child directory can un-ignore something that was ignored by a parent directory's patterns.
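
The precedence rule and the directory check fit in a few lines. The sketch below assumes the glob matcher has already produced the indices of every matching pattern in definition order; MatchSketch, Rule, and resolve are hypothetical names that mirror the tri-state result described above rather than the crate's actual types.

```rust
/// Hypothetical tri-state result carrying the index of the deciding rule.
#[derive(Debug, PartialEq)]
enum MatchSketch {
    None,
    Ignore(usize),
    Whitelist(usize),
}

/// The two pieces of pattern metadata the decision depends on.
struct Rule {
    is_whitelist: bool,
    is_only_dir: bool,
}

/// `matched` holds indices into `rules` in definition order.
fn resolve(rules: &[Rule], matched: &[usize], is_dir: bool) -> MatchSketch {
    // Walk backwards: the last-defined matching pattern wins.
    for &i in matched.iter().rev() {
        let rule = &rules[i];
        // A directory-only pattern cannot decide the fate of a file.
        if rule.is_only_dir && !is_dir {
            continue;
        }
        return if rule.is_whitelist {
            MatchSketch::Whitelist(i)
        } else {
            MatchSketch::Ignore(i)
        };
    }
    MatchSketch::None
}
```

If a later whitelist rule and an earlier ignore rule both match, the whitelist rule is returned; if the only match is a directory-only rule and the path is a file, the result is None.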

See: Companion Code Section 4


Section 5: The Builder Pattern

The GitignoreBuilder follows the same builder pattern we've seen throughout ripgrep. It accumulates configuration and pattern data, then produces an immutable Gitignore through its build method. The builder owns a GlobSetBuilder internally, maintaining the layered abstraction—gitignore semantics on top of glob semantics.

Two configuration options deserve attention. The case_insensitive flag controls whether globs match without regard to case, which is useful on case-insensitive filesystems such as the default configurations of Windows and macOS. The allow_unclosed_class option defaults to true to match git's actual behavior: if you write [abc without a closing bracket, git treats it as a literal string rather than an error. This kind of compatibility concern is why implementing gitignore correctly requires careful attention to the specification's edge cases.

The builder also strips the ./ prefix from the root path during construction. This normalization happens once at builder creation rather than on every match operation, which is more efficient and ensures consistent behavior regardless of how the path was originally specified.

See: Companion Code Section 5


Section 6: Parsing Individual Lines

The add_line method is where gitignore syntax gets transformed into glob patterns. This is one of the more complex methods in the file because gitignore syntax has many special cases that need to be handled in the right order.

The method first handles comments (lines starting with #) and blank lines, which are skipped. Trailing whitespace is trimmed unless it's escaped with a backslash. Then comes the escape handling for patterns that actually start with ! or #—you can write \!foo to match a file literally named !foo.

The whitelist prefix ! and the absolute prefix / are processed next. A leading slash means the pattern should only match at the root of the gitignore's scope, not anywhere in the tree. This is implemented by not adding the **/ prefix that would otherwise allow matching anywhere.

Next, a trailing slash marks the pattern as directory-only and is removed from the text used for matching, with special attention to escaped slashes. Then comes the key transformation: patterns that are not anchored by a leading slash and contain no other slash get a **/ prefix so they match at any depth. Finally, patterns ending with /** get an extra /* appended so they match contents inside the directory rather than the directory itself.
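
The order of these steps is easier to follow in code. The function below is a simplified model of the transformation, not the method itself: ParsedLine and parse_line_sketch are hypothetical names, the escaped trailing slash case is left out, and no glob is actually compiled.

```rust
/// Hypothetical result of parsing one gitignore line.
#[derive(Debug, PartialEq)]
struct ParsedLine {
    original: String,
    actual: String,
    is_whitelist: bool,
    is_only_dir: bool,
}

/// Simplified model of the steps described above. Returns None for
/// comments and blank lines.
fn parse_line_sketch(line: &str) -> Option<ParsedLine> {
    if line.starts_with('#') {
        return None;
    }
    // Trailing whitespace is dropped unless the final space is escaped.
    let mut line = if line.ends_with("\\ ") { line } else { line.trim_end() };
    if line.is_empty() {
        return None;
    }
    let original = line.to_string();
    let (mut is_whitelist, mut is_absolute, mut is_only_dir) = (false, false, false);

    if line.starts_with("\\!") || line.starts_with("\\#") {
        // Escaped `!` or `#`: drop the backslash and keep the character.
        line = &line[1..];
    } else {
        if let Some(rest) = line.strip_prefix('!') {
            is_whitelist = true;
            line = rest;
        }
        if let Some(rest) = line.strip_prefix('/') {
            is_absolute = true;
            line = rest;
        }
    }
    // A trailing slash is a directory-only marker, not part of the glob.
    if let Some(rest) = line.strip_suffix('/') {
        is_only_dir = true;
        line = rest;
    }
    let mut actual = line.to_string();
    // Unanchored patterns without a slash may match at any depth.
    if !is_absolute && !line.contains('/') && !actual.starts_with("**/") && actual != "**" {
        actual = format!("**/{}", actual);
    }
    // `dir/**` should match what is inside `dir`, not `dir` itself.
    if actual.ends_with("/**") {
        actual.push_str("/*");
    }
    Some(ParsedLine { original, actual, is_whitelist, is_only_dir })
}
```

Under this model foo/ becomes **/foo with the directory-only flag set, !/build becomes the anchored whitelist pattern build, and docs/** becomes docs/**/*.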

See: Companion Code Section 6


Section 7: Reading Gitignore Files

The add method handles reading patterns from a gitignore file on disk. It returns an Option<Error> rather than a Result because partial errors are possible—one invalid pattern shouldn't prevent other valid patterns from being added.

The line-by-line reading handles a subtle Unicode case: git supports files that begin with the UTF-8 byte order mark (BOM). This invisible character at the start of a file is stripped if present on the first line. This kind of detail shows the commitment to matching git's actual behavior, not just a simplified version of it.

Error handling is tagged with both the file path and line number, which enables useful error messages that tell users exactly where a problem pattern was defined. The PartialErrorBuilder collects these errors without stopping processing, allowing ripgrep to use as many valid patterns as possible even when some patterns are malformed.
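
The combination of BOM stripping and error collection can be sketched as follows. This is an illustration under assumptions: collect_patterns is a hypothetical name, the validate callback stands in for real glob compilation, and errors are kept as plain (line number, message) pairs rather than the crate's error types.

```rust
/// Reads gitignore text line by line, keeping valid patterns and
/// recording failures with their 1-based line numbers.
fn collect_patterns<F>(content: &str, mut validate: F) -> (Vec<String>, Vec<(u64, String)>)
where
    F: FnMut(&str) -> Result<(), String>,
{
    const UTF8_BOM: &str = "\u{feff}";
    let mut patterns = Vec::new();
    let mut errors = Vec::new();

    for (i, line) in content.lines().enumerate() {
        let lineno = (i + 1) as u64;
        // Only the first line can carry a byte order mark.
        let line = if i == 0 { line.trim_start_matches(UTF8_BOM) } else { line };
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        match validate(line) {
            Ok(()) => patterns.push(line.to_string()),
            // Record the problem and keep going with the next line.
            Err(message) => errors.push((lineno, message)),
        }
    }
    (patterns, errors)
}
```

A bad pattern on line 2 ends up in the error list tagged with that line number, while the valid patterns on lines 1 and 3 are still returned.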

See: Companion Code Section 7


Section 8: Global Gitignore Discovery

Git supports a global gitignore file configured through core.excludesFile in the git configuration. The build_global method and its helper functions implement this discovery process, checking multiple locations in the right precedence order.

The search order follows git's conventions: first $HOME/.gitconfig, then $XDG_CONFIG_HOME/git/config (or $HOME/.config/git/config if the environment variable isn't set). If neither configuration file specifies an excludes file, the default location $XDG_CONFIG_HOME/git/ignore is used, with the same $HOME/.config fallback when the variable isn't set.

The parse_excludes_file function extracts the path from git's INI-style configuration format. It uses a regex pattern with case-insensitive matching and handles both quoted and unquoted values. The comment acknowledges this isn't a full INI parser, but it works for the common cases. Tilde expansion is also supported, so ~/foo/bar in the config file correctly expands to the user's home directory.
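
As a rough illustration of the extraction step, the sketch below finds an excludesfile entry using plain string operations instead of a regex. It is deliberately not the real function: parse_excludes_file_sketch and expand_tilde_sketch are hypothetical names, the home directory is passed in as a parameter rather than looked up, the key is matched without checking which section it sits in, and only a leading ~ is expanded.

```rust
use std::path::{Path, PathBuf};

/// Expands a leading `~` using the supplied home directory.
fn expand_tilde_sketch(value: &str, home: &Path) -> PathBuf {
    if value == "~" {
        home.to_path_buf()
    } else if let Some(rest) = value.strip_prefix("~/") {
        home.join(rest)
    } else {
        PathBuf::from(value)
    }
}

/// Returns the first `excludesfile` value found in git config text.
fn parse_excludes_file_sketch(config: &str, home: &Path) -> Option<PathBuf> {
    for line in config.lines() {
        if let Some((key, value)) = line.split_once('=') {
            // The key is compared case-insensitively.
            if key.trim().eq_ignore_ascii_case("excludesfile") {
                // Accept both quoted and unquoted values.
                let value = value.trim().trim_matches('"').trim();
                if !value.is_empty() {
                    return Some(expand_tilde_sketch(value, home));
                }
            }
        }
    }
    None
}
```

Given a home of /home/user, a line such as excludesFile = ~/foo/bar yields /home/user/foo/bar, and a config with no such key yields None.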

See: Companion Code Section 8


Section 9: Parent Directory Matching

The matched_path_or_any_parents method is a convenience for certain use cases. Sometimes you have a full path and need to know whether it or any of its parent directories matches a gitignore pattern. Rather than walking the directory tree yourself, you can let this method check each ancestor internally.

The method is documented as being more expensive than walking the tree top-to-bottom, which is the normal case during directory traversal. But when you have a list of paths without hierarchy information—perhaps from a different source like a file list—this method provides a convenient API.

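The method asserts that the stripped path has no root, that is, that it is not absolute. After stripping the gitignore's root prefix, the remaining path should be relative; if it is not, the caller passed a path that does not live under the matcher's root. The assertion surfaces that programming error early instead of letting matching proceed on a path the patterns were never meant for.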
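
The ancestor walk can be sketched with Path::ancestors. This is a model rather than the method's actual body: first_matching_ancestor is a hypothetical name, the matches callback stands in for the real per-path matcher, and treating every parent as a directory is an assumption of this sketch.

```rust
use std::path::Path;

/// Checks `path` and then each of its parents, returning the first one
/// that `matches` accepts.
fn first_matching_ancestor<'a, F>(path: &'a Path, is_dir: bool, matches: F) -> Option<&'a Path>
where
    F: Fn(&Path, bool) -> bool,
{
    assert!(!path.has_root(), "path is expected to be relative to the gitignore root");
    for (depth, candidate) in path.ancestors().enumerate() {
        // `ancestors` ends with the empty path; there is nothing to match.
        if candidate.as_os_str().is_empty() {
            break;
        }
        // Assumption: only the original path might be a file.
        let candidate_is_dir = if depth == 0 { is_dir } else { true };
        if matches(candidate, candidate_is_dir) {
            return Some(candidate);
        }
    }
    None
}
```

For the file path target/debug/app and a callback that accepts directories named target, the walk visits target/debug/app, then target/debug, and stops at target.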

See: Companion Code Section 9


Section 10: The Constructor Methods

The Gitignore::new constructor provides a convenient entry point for the common case of loading a single gitignore file. It creates a builder, adds the file, handles errors gracefully, and returns both the matcher and any errors that occurred. The design always returns a valid matcher—even if it's empty due to errors—so callers can proceed with what worked.

The Gitignore::global constructor wraps the global gitignore discovery process. It handles the case where even getting the current working directory might fail, returning an empty matcher with the error rather than panicking.

The Gitignore::empty constructor creates a matcher that never matches anything. This is useful as a default value or as a fallback when gitignore functionality is disabled. The pattern of providing an empty constructor alongside new appears throughout ripgrep's codebase for types that might reasonably have a "do nothing" instance.

See: Companion Code Section 10


Section 11: Test Macros and Verification

The test suite uses macros to define concise test cases for both positive and negative matching. The ignored! macro generates a test asserting that a given path is ignored by a given set of patterns, while not_ignored! asserts the opposite. This table-driven style makes it easy to add new test cases as edge cases are discovered.

The test cases cover a remarkable range of gitignore behaviors: basic filename matching, wildcards, the ** pattern for matching across directories, whitelist entries, directory-only patterns, escaped special characters, and interaction with different root paths. Many test cases reference specific GitHub issues, indicating they were added to prevent regressions.

The configuration file parsing also has its own test cases, verifying that core.excludesFile is correctly extracted from git config content, including tilde expansion and quoted values.

See: Companion Code Section 11


Key Takeaways

First, implementing gitignore correctly requires attention to many edge cases that aren't obvious from casual reading of the specification—escaped characters, trailing slashes, whitelist precedence, and directory-only matching all need special handling.

Second, the layered architecture separates concerns effectively: the globset crate handles pattern matching, while this module handles gitignore-specific syntax transformation and semantics.

Third, the builder pattern enables flexible construction from multiple sources while producing an immutable, thread-safe matcher optimized for repeated use.

Fourth, preserving both original and transformed pattern text enables better error messages and debugging without sacrificing matching correctness.

Fifth, object pooling through Pool<Vec<usize>> extends into the gitignore layer, maintaining ripgrep's commitment to minimizing allocations on hot paths.

Sixth, graceful error handling that collects partial errors rather than failing on the first problem lets ripgrep continue working with valid patterns even when some patterns are malformed.


How does the gitignore matcher integrate with directory traversal? Read crates/ignore/src/walk.rs to see how multiple gitignore files at different directory levels combine during a recursive search.

How are multiple ignore sources prioritized? Read crates/ignore/src/dir.rs to understand how .gitignore, .ignore, and command-line patterns interact.

Want to see the glob matching infrastructure this builds on? Return to crates/globset/src/lib.rs from the previous lesson for the underlying pattern compilation and matching engine.