lowargs.rs: Low-Level Arguments

What This File Does

This file defines the LowArgs struct and its associated types—the complete collection of "low-level" arguments that ripgrep's CLI parser produces. Think of it as the raw output of argument parsing: every flag value, every option, every mode toggle, all collected into a single structure that mirrors the actual command-line interface as closely as possible.

The key design constraint here is intentional simplicity. Low-level arguments contain validated user input but avoid higher-level abstractions. They don't discover hostnames, don't probe filesystem capabilities, don't make network calls. This separation allows ripgrep to guarantee that help and version information are always available, even when the environment is broken in ways that would cause higher-level configuration to fail. The transformation from these low-level arguments into the builder configurations you've seen in previous lessons happens elsewhere—this file is purely about capturing what the user asked for.

Section 1: The Anatomy of Low-Level Arguments

The LowArgs struct is surprisingly large—over fifty fields spanning boolean flags, optional values, vectors of configuration, and numerous mode enumerations. This isn't complexity for its own sake; it reflects the genuine richness of ripgrep's command-line interface. Each field here corresponds directly to one or more CLI flags.

Notice how the struct uses pub(crate) visibility for all its fields. This is a deliberate design choice: low-level arguments are internal implementation details, not part of ripgrep's public API. Other modules within the core crate can read and write these fields directly, but external code cannot. This gives the ripgrep developers freedom to restructure the argument representation without breaking external consumers.

The struct derives Debug and Default. Default is the one that matters for argument parsing: the parser starts with LowArgs::default() and then updates individual fields as it processes each flag. This incremental construction pattern means every field must have a sensible default value, supplied by the field type's own Default implementation, whether derived or written by hand.
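
The incremental construction pattern can be sketched as follows. This is a minimal illustration, not ripgrep's parser: MiniLowArgs, parse_mini, and the three flags it recognizes are hypothetical stand-ins chosen to keep the example small.

```rust
/// Hypothetical, heavily trimmed stand-in for LowArgs.
/// (The real struct has many more fields and uses pub(crate) visibility.)
#[derive(Debug, Default)]
pub struct MiniLowArgs {
    pub hidden: bool,           // defaults to false
    pub threads: Option<usize>, // defaults to None, meaning "not specified"
    pub globs: Vec<String>,     // defaults to an empty list
}

/// Toy parser: start from the defaults, then overwrite one field per flag.
pub fn parse_mini(argv: &[&str]) -> Result<MiniLowArgs, String> {
    let mut args = MiniLowArgs::default();
    let mut it = argv.iter();
    while let Some(&flag) = it.next() {
        match flag {
            "--hidden" => args.hidden = true,
            "--no-hidden" => args.hidden = false,
            "--threads" => {
                let value = it.next().ok_or("--threads needs a value")?;
                let n = value.parse::<usize>().map_err(|e| e.to_string())?;
                args.threads = Some(n);
            }
            "--glob" => {
                let value = it.next().ok_or("--glob needs a value")?;
                args.globs.push((*value).to_string());
            }
            other => return Err(format!("unrecognized flag: {other}")),
        }
    }
    Ok(args)
}
```

Calling parse_mini(&["--hidden", "--threads", "4"]) leaves globs at its default empty vector, which is the point: flags the user never mentions keep whatever Default produced.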

See: Companion Code Section 1

Section 2: Special Modes and Short-Circuiting

The SpecialMode enum represents commands that should bypass normal ripgrep operation entirely. When a user asks for --help or --version, ripgrep shouldn't attempt to build search configurations, probe for PCRE2 support, or discover hostnames. It should just print the requested information and exit.

This design is a reliability hedge. Imagine ripgrep needs to read environment variables or check filesystem permissions to construct its full configuration. If something is wrong with the environment, that construction might fail—but the user should still be able to run rg --help to understand how to fix their invocation. By separating special modes from regular operation, ripgrep ensures that basic help and version information remain accessible even in degraded environments.

The distinction between "short" and "long" variants for both help and version reflects ripgrep's Unix heritage. The -h flag gives condensed output suitable for quick reference, while --help provides comprehensive documentation. Similarly, -V shows just the version string, while --version includes build features and other details useful for bug reports.
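
A rough sketch of the short-circuit idea, under stated assumptions: the variant names below follow the short/long split described above, while find_special, run, and the returned strings are hypothetical placeholders rather than ripgrep's actual dispatch code.

```rust
/// Special modes, one per help/version flag (variant names assumed).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SpecialMode {
    HelpShort,
    HelpLong,
    VersionShort,
    VersionLong,
}

/// Hypothetical helper: report the first special flag found, if any.
pub fn find_special(argv: &[&str]) -> Option<SpecialMode> {
    argv.iter().find_map(|&arg| match arg {
        "-h" => Some(SpecialMode::HelpShort),
        "--help" => Some(SpecialMode::HelpLong),
        "-V" => Some(SpecialMode::VersionShort),
        "--version" => Some(SpecialMode::VersionLong),
        _ => None,
    })
}

/// Hypothetical entry point: special modes return before any
/// environment-dependent configuration would be built.
pub fn run(argv: &[&str]) -> &'static str {
    match find_special(argv) {
        Some(SpecialMode::HelpShort) => "condensed usage",
        Some(SpecialMode::HelpLong) => "full documentation",
        Some(SpecialMode::VersionShort) => "version string",
        Some(SpecialMode::VersionLong) => "version string plus build details",
        // Only this path would go on to build the full configuration.
        None => "build configuration, then search",
    }
}
```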

See: Companion Code Section 2

Section 3: Operational Modes and Override Semantics

The Mode enum defines what ripgrep is actually going to do: search for patterns, list files, show type definitions, or generate shell completions. Unlike special modes that short-circuit everything, these operational modes represent the primary functionality that ripgrep will execute after parsing.

The update method on Mode implements an override policy. While the current mode is a search mode (the default), any new mode replaces it. Once the current mode is a non-search mode such as Files, another non-search mode can still replace it, but a search mode cannot. This means that if you type rg --files -l, you stay in Mode::Files: the -l flag (which normally activates the FilesWithMatches search mode) doesn't override the explicit --files request.

This policy exists because flags often come from more than one source, typically the command line and a config file. A user who explicitly requests --files probably wants that behavior even if their config file includes some search-mode flag. The override semantics prevent surprising interactions between configuration sources.
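
The override policy described in this section can be written as a small match. The sketch below trims both enums to a few variants and is an illustration of the stated rules, not a copy of ripgrep's implementation.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SearchMode {
    Standard,
    FilesWithMatches,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Search(SearchMode),
    Files,
    Types,
}

impl Mode {
    /// Apply `new` on top of the current mode using the override rules:
    /// a search mode yields to anything, a non-search mode yields only
    /// to another non-search mode.
    pub fn update(&mut self, new: Mode) {
        match (*self, new) {
            // Currently searching: anything may replace it.
            (Mode::Search(_), _) => *self = new,
            // Currently non-search: a search mode is ignored.
            (_, Mode::Search(_)) => {}
            // Non-search replacing non-search is allowed.
            (_, _) => *self = new,
        }
    }
}
```

With these rules, applying Files and then Search(FilesWithMatches) leaves the mode at Files, which models the rg --files -l case.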

See: Companion Code Section 3

Section 4: Search Mode Variations

The SearchMode enum captures the different ways ripgrep can report search results. Standard mode prints matching lines. FilesWithMatches prints only the paths of files that contain a match. Count prints, for each file, the number of matching lines. Each mode represents a fundamentally different output format.

Notice that there's no explicit flag for Standard mode—it's simply the default. The other modes have activation flags (-l, -c, --json), and some have negation flags (--no-json) that reset back to standard mode. This asymmetry reflects the common pattern where you want to override a config file setting back to the default behavior.

The distinction between Count and CountMatches is subtle but important. Count reports the number of matching lines, while CountMatches reports the total number of matches. A line with three matches of the pattern would contribute 1 to Count but 3 to CountMatches. Having both as separate modes means the output format can be optimized for each case.
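
The difference is easy to see with plain substring matching. The two functions below are hypothetical helpers that use a literal needle instead of a regex; they only illustrate the two counting rules.

```rust
/// Number of lines that contain `needle` at least once (the Count rule).
pub fn count_lines(haystack: &str, needle: &str) -> usize {
    haystack.lines().filter(|line| line.contains(needle)).count()
}

/// Total number of non-overlapping occurrences of `needle`
/// across all lines (the CountMatches rule).
pub fn count_matches(haystack: &str, needle: &str) -> usize {
    haystack.lines().map(|line| line.matches(needle).count()).sum()
}
```

For the text "foo foo foo\nbar\nfoo\n" and the needle "foo", count_lines gives 2 while count_matches gives 4.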

See: Companion Code Section 4

Section 5: Binary Data Handling

The BinaryMode enum addresses one of the trickier aspects of text searching: what happens when ripgrep encounters binary files? The three options represent fundamentally different philosophies about how to handle non-text content.

Auto mode, the default, makes context-dependent decisions. Files searched explicitly (named on the command line) are searched with the SearchAndSuppress strategy: ripgrep searches them, but if a match turns up in a file containing binary data, it suppresses the match and emits a warning instead. Files found through directory traversal get more aggressive filtering: once ripgrep classifies a file as binary, it stops searching that file immediately.

The SearchAndSuppress mode includes an interesting implementation detail: it replaces NUL bytes with line terminators. This sounds strange until you understand the problem it solves. Binary files often contain long runs of NUL bytes, and treating those as line content would create astronomically long "lines" that exhaust heap memory. The NUL replacement is a heuristic that keeps memory usage bounded while still allowing binary-adjacent searching.
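
To see why the heuristic bounds memory, consider the longest "line" a line-oriented searcher would have to buffer. The helper below is a hypothetical model: it treats NUL as an extra line terminator instead of rewriting the buffer, which has the same effect on line lengths.

```rust
/// Length of the longest line in `data`, optionally treating NUL bytes
/// as line terminators in addition to `\n`.
pub fn longest_line(data: &[u8], nul_is_terminator: bool) -> usize {
    data.split(|&b| b == b'\n' || (nul_is_terminator && b == 0))
        .map(|line| line.len())
        .max()
        .unwrap_or(0)
}
```

For input made of "abc\n", then 1000 NUL bytes, then "xyz\n", the longest line is 1003 bytes when NUL is ordinary content and 3 bytes when NUL terminates lines.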

AsText mode removes all binary handling—every file is treated as text, period. This is useful when you know your files contain NUL bytes but are actually text (some logging formats do this), or when you genuinely want to search binary content and understand the memory implications.

See: Companion Code Section 5

Section 6: Context Mode and Precedence

The ContextMode enum and its companion ContextModeLimited struct handle the -A, -B, and -C flags that control how many lines around each match ripgrep displays. This seemingly simple feature has surprisingly complex precedence rules.

The Passthru variant is special: it means "print all lines, whether they match or not." This makes ripgrep behave much like the grep --color -E 'pattern|$' idiom, where you see the entire file with matches highlighted. It's useful when you want context but don't know how much context you'll need.

Limited mode is where the complexity lives. Notice that ContextModeLimited tracks before, after, and both as separate Option<usize> values rather than just two numbers. This is because -A (after) and -B (before) always override -C (both), regardless of the order they appear. If the user specifies -C 5 -A 2, they get 5 lines before and 2 lines after—the -A partially overrides the -C. This order-independence is important for configuration file support, where the user might want to set a default with -C but override just one direction on the command line.

The get method resolves these precedences into concrete numbers at the point of use, keeping the parsing logic separate from the resolution logic.
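
A minimal sketch of that resolution step, assuming the three Option<usize> fields described above and a get method that returns a (before, after) pair:

```rust
#[derive(Debug, Default)]
pub struct ContextModeLimited {
    pub before: Option<usize>, // set by -B
    pub after: Option<usize>,  // set by -A
    pub both: Option<usize>,   // set by -C
}

impl ContextModeLimited {
    /// Resolve to concrete (before, after) line counts.
    /// -B and -A win over -C no matter which flag came first,
    /// because each flag is stored in its own slot.
    pub fn get(&self) -> (usize, usize) {
        let base = self.both.unwrap_or(0);
        (self.before.unwrap_or(base), self.after.unwrap_or(base))
    }
}
```

With both set to 5 and after set to 2, get returns (5, 2), matching the -C 5 -A 2 example. Storing the raw flags and resolving late is what makes the result independent of flag order.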

See: Companion Code Section 6

Section 7: Separator Types with Escape Handling

The ContextSeparator, FieldContextSeparator, and FieldMatchSeparator types all follow the same pattern: they wrap a BString (binary string) and handle the conversion from user-provided arguments. This wrapping serves two purposes: it enforces that escape sequences are properly processed, and it documents the semantic role of each separator.

The new constructors on these types require valid UTF-8 input but then immediately convert to binary strings using Vec::unescape_bytes (provided by the bstr crate's ByteVec trait). This allows users to specify binary separators using escape sequences like \x00 for a NUL byte, while still providing a clear error message if the input can't even be parsed as UTF-8.

The ContextSeparator has an additional disabled constructor that stores None instead of a separator, representing the explicit absence of a separator. This is different from an empty string separator: disabled means "don't print anything between context groups," while empty string means "print a blank line between context groups."

These types demonstrate a common Rust pattern: creating newtypes that add semantic meaning and validation to primitive types. A BString could be anything; a FieldMatchSeparator clearly communicates its purpose and guarantees its content has been properly processed.
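
The newtype-plus-validation pattern might look like the sketch below. It is self-contained on purpose: it stores a Vec<u8> instead of a BString, returns a String error, and uses a small hand-written unescape function that handles only \n, \t, \\ and \xNN. All of that is a simplified stand-in for the bstr-based code the section describes.

```rust
use std::ffi::OsStr;

/// Newtype: the wrapped bytes are known to have been unescaped.
#[derive(Debug, PartialEq, Eq)]
pub struct FieldMatchSeparator(Vec<u8>);

impl FieldMatchSeparator {
    /// Require UTF-8 on input, then unescape into arbitrary bytes.
    pub fn new(os: &OsStr) -> Result<FieldMatchSeparator, String> {
        let Some(s) = os.to_str() else {
            return Err("separator must be valid UTF-8".to_string());
        };
        Ok(FieldMatchSeparator(unescape(s)))
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Simplified stand-in for real unescaping: supports \n, \t, \\ and \xNN.
/// Anything else (including a lone backslash) is copied through unchanged.
fn unescape(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let (byte, used) = match &bytes[i..] {
            [b'\\', b'n', ..] => (b'\n', 2),
            [b'\\', b't', ..] => (b'\t', 2),
            [b'\\', b'\\', ..] => (b'\\', 2),
            [b'\\', b'x', h, l, ..]
                if h.is_ascii_hexdigit() && l.is_ascii_hexdigit() =>
            {
                let hi = (*h as char).to_digit(16).unwrap() as u8;
                let lo = (*l as char).to_digit(16).unwrap() as u8;
                (hi * 16 + lo, 4)
            }
            [b, ..] => (*b, 1),
            [] => break,
        };
        out.push(byte);
        i += used;
    }
    out
}
```

Because the only way to obtain a FieldMatchSeparator is through new, any code that receives one can rely on the escape processing having already happened.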

See: Companion Code Section 7

Section 8: Encoding and Engine Choices

The EncodingMode and EngineChoice enums represent choices that significantly affect ripgrep's behavior but require careful defaulting. Both use Auto or Default as their default variant, reflecting the philosophy that ripgrep should "just work" for common cases while allowing expert users to override.

EncodingMode::Auto tells ripgrep to sniff for byte-order marks (BOMs) that indicate encoding. EncodingMode::Some lets users force a specific encoding while still respecting BOMs. EncodingMode::Disabled bypasses all encoding logic and searches raw bytes—useful when you know the encoding detection is wrong or when you're searching binary-ish content.

EngineChoice controls which regex engine ripgrep uses. The default Rust regex engine is fast and safe but doesn't support all regex features. PCRE2 supports features like backreferences and lookaround but requires an external library. The Auto choice implements a clever fallback: try the default engine first, and if the pattern doesn't compile, automatically try PCRE2. This gives users the best of both worlds—fast patterns when possible, powerful patterns when necessary.
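
The Auto fallback can be sketched with two toy "compilers". Everything here besides the EngineChoice variant names is hypothetical: compile_default merely rejects patterns that contain a backreference or lookahead marker, and compile_pcre2 accepts everything, so the example shows the control flow only and says nothing about real regex compilation.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum EngineChoice {
    #[default]
    Default,
    Auto,
    PCRE2,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Engine {
    Rust,
    Pcre2,
}

/// Toy stand-in for the default engine: refuses `\1` and `(?=`.
fn compile_default(pattern: &str) -> Result<Engine, String> {
    if pattern.contains("(?=") || pattern.contains("\\1") {
        Err(format!("unsupported syntax in {pattern:?}"))
    } else {
        Ok(Engine::Rust)
    }
}

/// Toy stand-in for PCRE2: accepts every pattern.
fn compile_pcre2(_pattern: &str) -> Result<Engine, String> {
    Ok(Engine::Pcre2)
}

/// Pick an engine for `pattern` according to the user's choice.
pub fn pick_engine(choice: EngineChoice, pattern: &str) -> Result<Engine, String> {
    match choice {
        EngineChoice::Default => compile_default(pattern),
        EngineChoice::PCRE2 => compile_pcre2(pattern),
        // Auto: try the default engine first, fall back only on failure.
        EngineChoice::Auto => {
            compile_default(pattern).or_else(|_| compile_pcre2(pattern))
        }
    }
}
```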

See: Companion Code Section 8

Section 9: Sorting and Platform Detection

The SortMode and SortModeKind types handle the --sort and --sortr flags, with an interesting twist: sorting by timestamps isn't always supported. The supported method performs runtime capability detection to determine whether the requested sort mode will work.

Look at how supported is implemented: it attempts to get metadata from the current executable and checks whether the relevant timestamp is available. This isn't checking whether the filesystem supports timestamps in general—it's checking whether this particular Rust runtime on this particular platform can retrieve them. Some platforms might support modification times but not creation times. Some embedded or restricted environments might not support any timestamp access.

This approach—probing capabilities at runtime rather than compile time—is pragmatic. It means ripgrep can be compiled once and deployed to varied environments, gracefully degrading when features aren't available. The user gets a clear error message explaining why their sort mode won't work, rather than a cryptic failure later in execution.
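
A sketch of such a probe using only the standard library. The struct layout and variant names are assumptions based on the description above, and the error type is a plain String here for self-containment.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SortModeKind {
    Path,
    LastModified,
    LastAccessed,
    Created,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SortMode {
    pub reverse: bool,
    pub kind: SortModeKind,
}

impl SortMode {
    /// Check at runtime whether the requested timestamp can be read.
    pub fn supported(&self) -> Result<(), String> {
        // The running executable is a file that is known to exist,
        // which makes it a convenient probe target.
        let metadata = || std::env::current_exe().and_then(|p| p.metadata());
        let probe = match self.kind {
            // Sorting by path needs no timestamps at all.
            SortModeKind::Path => return Ok(()),
            SortModeKind::LastModified => metadata().and_then(|md| md.modified()),
            SortModeKind::LastAccessed => metadata().and_then(|md| md.accessed()),
            SortModeKind::Created => metadata().and_then(|md| md.created()),
        };
        probe.map(|_| ()).map_err(|err| {
            format!("sorting by {:?} isn't supported: {err}", self.kind)
        })
    }
}
```

The std::fs::Metadata accessors return an error on platforms where the timestamp is unavailable, so the probe reports exactly what the current platform can deliver.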

See: Companion Code Section 9

Section 10: Pattern Sources and Type Changes

The PatternSource enum unifies two ways of providing patterns to ripgrep: the -e/--regexp flag for inline patterns and the -f/--file flag for pattern files. By collecting both into a single Vec<PatternSource>, ripgrep preserves the relative order of these flags.

Order matters because patterns are tried in sequence, and for certain operations (like replacement), the order determines which pattern "wins." If a user specifies -f patterns.txt -e 'inline', they expect the patterns from the file to come before the inline pattern.

The TypeChange enum handles the complex world of file type definitions. Users can add new types (--type-add), clear existing ones (--type-clear), select types for filtering (--type), or negate types (--type-not). These operations form a mini-language for customizing ripgrep's built-in file type system.

Importantly, these are recorded as changes, not computed results. The actual type definitions are built elsewhere by applying these changes in order to ripgrep's default type database. This separation keeps low-level arguments purely about "what the user said" rather than "what that means."
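
Recording changes and applying them later could look like this. The variant shapes are assumptions, and ResolvedTypes and resolve are hypothetical names for the "built elsewhere" step; the real type database is considerably richer than a map of globs.

```rust
use std::collections::BTreeMap;

/// One recorded user request, in command-line order.
#[derive(Clone, Debug)]
pub enum TypeChange {
    Clear { name: String },  // --type-clear NAME
    Add { def: String },     // --type-add 'NAME:GLOB'
    Select { name: String }, // --type NAME
    Negate { name: String }, // --type-not NAME
}

/// Hypothetical result of applying the changes to a default database.
#[derive(Debug, Default)]
pub struct ResolvedTypes {
    pub defs: BTreeMap<String, Vec<String>>,
    pub selected: Vec<String>,
    pub negated: Vec<String>,
}

/// Apply `changes` in order on top of `defaults`.
pub fn resolve(
    defaults: BTreeMap<String, Vec<String>>,
    changes: &[TypeChange],
) -> ResolvedTypes {
    let mut out = ResolvedTypes { defs: defaults, ..ResolvedTypes::default() };
    for change in changes {
        match change {
            TypeChange::Clear { name } => {
                out.defs.remove(name);
            }
            TypeChange::Add { def } => {
                // This sketch silently skips definitions without a colon.
                if let Some((name, glob)) = def.split_once(':') {
                    out.defs
                        .entry(name.to_string())
                        .or_default()
                        .push(glob.to_string());
                }
            }
            TypeChange::Select { name } => out.selected.push(name.clone()),
            TypeChange::Negate { name } => out.negated.push(name.clone()),
        }
    }
    out
}
```

Order matters here too: clearing a type and then adding to it leaves the new glob in place, while adding and then clearing removes the type entirely.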

See: Companion Code Section 10

Section 11: Color Choices and Terminal Integration

The ColorChoice enum demonstrates how command-line tools handle the perennial question of "should this output be colorful?" The four variants cover the full spectrum of possibilities: never, auto-detect, always, and always-with-ANSI.

The Ansi variant exists for a specific historical reason: Windows terminal handling. Older Windows consoles did not interpret ANSI escape sequences; colors had to be set through calls to the Windows console API. Modern Windows terminals understand ANSI, but ripgrep needs to support both. ColorChoice::Ansi forces ANSI escape sequences even when ripgrep would otherwise use the legacy Windows console API.

The to_termcolor method bridges ripgrep's internal representation to the termcolor crate's ColorChoice type. This is a common pattern when integrating with external crates: define your own types that match your semantics, then provide conversion methods to external types. This insulates your codebase from changes in external crate APIs.
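
The conversion itself is a one-to-one match. To keep the sketch free of external dependencies, TermColorChoice below is a local stand-in that mirrors the variants of termcolor::ColorChoice; in real code the method would return the crate's own type.

```rust
/// Internal representation, named after what the user asked for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColorChoice {
    Never,
    Auto,
    Always,
    Ansi,
}

/// Local stand-in mirroring the variants of termcolor::ColorChoice.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TermColorChoice {
    Always,
    AlwaysAnsi,
    Auto,
    Never,
}

impl ColorChoice {
    /// Translate the internal choice into the external crate's vocabulary.
    pub fn to_termcolor(&self) -> TermColorChoice {
        match *self {
            ColorChoice::Never => TermColorChoice::Never,
            ColorChoice::Auto => TermColorChoice::Auto,
            ColorChoice::Always => TermColorChoice::Always,
            ColorChoice::Ansi => TermColorChoice::AlwaysAnsi,
        }
    }
}
```

Because the match is exhaustive, adding a variant to the internal enum forces the conversion to be updated at compile time.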

See: Companion Code Section 11

Section 12: Connecting to the Builder Ecosystem

This file is the foundation that the builder pattern ecosystem you've studied depends upon. The LowArgs struct captures everything needed to configure searchers, printers, and walkers—but it captures them in their raw, unprocessed form.

Consider how fields like dfa_size_limit, regex_size_limit, and threads are all Option types. The None case doesn't mean "no limit" or "single-threaded"—it means "use the default." The translation from None to actual defaults happens when constructing the builders. This separation means the defaults can be computed contextually (perhaps based on available CPU cores or memory) without complicating the argument parsing.
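
As a sketch of that late resolution, the hypothetical function below turns an Option<usize> thread count into a concrete number. Falling back to std::thread::available_parallelism is an illustrative policy, not necessarily the one ripgrep applies.

```rust
use std::num::NonZeroUsize;

/// Hypothetical resolver: None means "the user did not say",
/// so a default is computed from the environment at this point.
pub fn resolve_threads(requested: Option<usize>) -> usize {
    requested.unwrap_or_else(|| {
        std::thread::available_parallelism().map_or(1, NonZeroUsize::get)
    })
}
```

Keeping the environment query out of the low-level struct means parsing never has to ask the system anything; only the later stage does.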

Similarly, boolean flags like multiline, crlf, and fixed_strings will eventually flow into RegexMatcherBuilder and SearcherBuilder configurations. The globs and iglobs vectors will configure file filtering. The heading, column, and line_number options will shape printer output. Every field here has a downstream consumer, but the low-level arguments don't need to know about those consumers—they just faithfully record what the user requested.

See: Companion Code Section 12

Key Takeaways

First, low-level arguments intentionally mirror the CLI surface rather than higher-level abstractions, creating a clean separation between "what the user said" and "what that means for the application."

Second, special modes that short-circuit normal operation ensure reliability—help and version information remain accessible even when the environment is broken.

Third, override semantics in enums like Mode and ContextMode handle the complexity of configuration coming from multiple sources while maintaining user expectations.

Fourth, newtypes like ContextSeparator add semantic meaning and validation to primitive types, making code self-documenting and enforcing invariants at construction time.

Fifth, runtime capability detection (as in SortMode::supported) allows a single binary to gracefully handle varied deployment environments.

Sixth, this file demonstrates the value of keeping argument parsing separate from argument interpretation—the transformation to builder configurations happens elsewhere, keeping each layer focused on a single responsibility.

What to Read Next

How do these low-level arguments get transformed into the higher-level configurations that drive searching? Read crates/core/flags/hiargs.rs to see the next stage of argument processing.

How does the parser actually populate these fields? Read crates/core/flags/parse.rs to understand the parsing machinery that produces LowArgs.

How do the separator types and encoding modes affect actual search output? Review crates/printer/src/standard.rs to see how printer configuration consumes these values.