process.rs: Process Management

What This File Does

This file provides infrastructure for executing external processes and streaming their output safely. When ripgrep needs to search through compressed files, it spawns decompression tools like gzip or xz and reads their output as a stream. The challenge is doing this correctly: a process can write to both stdout and stderr, and if the parent reads only one of the two pipes, the child can block forever once the other pipe's buffer fills up.

The CommandReader abstraction solves this by wrapping a child process in a type that implements Rust's standard Read trait. It handles the tricky details of process lifecycle management—spawning, reading, waiting, and cleaning up—while preventing deadlocks through asynchronous stderr reading. This design lets the rest of ripgrep treat decompressed data as just another input stream, without worrying about the underlying process management complexity.

Section 1: The Custom Error Type Strategy

The file begins by defining CommandError, a custom error type that captures the two ways a command can fail: through an I/O error during process management, or by exiting with an error status and writing to stderr. This distinction matters because the error message you want to show the user differs dramatically between these cases.

When a process fails due to an I/O error—perhaps the executable doesn't exist—you want to show that system error directly. But when a process runs successfully from the operating system's perspective but exits with a non-zero status, the meaningful error message is whatever the process wrote to stderr. A decompression tool might write "file is corrupted" or "unsupported format," and that's what the user needs to see.

The enum-based design with CommandErrorKind is a common Rust pattern for errors with multiple causes. The outer CommandError struct provides a stable public API while the inner enum handles the variant logic. Notice how the kind field is private—users of this type interact through methods, not by matching on variants directly. This encapsulation allows the internal representation to change without breaking callers.
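The struct-plus-private-enum pattern described above can be sketched as follows. This is a minimal illustration rather than the file's actual code: the variant names Io and Stderr, the is_empty helper, and the pub visibility of the constructors are assumptions made for this example.

```rust
use std::io;

/// Public error type: callers see only this struct and its methods.
#[derive(Debug)]
pub struct CommandError {
    // Private field, so the variants below can change without breaking callers.
    kind: CommandErrorKind,
}

/// Private enum holding the two failure modes (variant names are assumed).
#[derive(Debug)]
enum CommandErrorKind {
    /// Spawning, reading from, or waiting on the process failed.
    Io(io::Error),
    /// The process exited unsuccessfully; these are the bytes it wrote to stderr.
    Stderr(Vec<u8>),
}

impl CommandError {
    /// Wrap an I/O error.
    pub fn io(err: io::Error) -> CommandError {
        CommandError { kind: CommandErrorKind::Io(err) }
    }

    /// Wrap captured stderr bytes (possibly empty).
    pub fn stderr(bytes: Vec<u8>) -> CommandError {
        CommandError { kind: CommandErrorKind::Stderr(bytes) }
    }

    /// True only for a stderr-based error that carries no bytes.
    pub fn is_empty(&self) -> bool {
        match &self.kind {
            CommandErrorKind::Stderr(bytes) => bytes.is_empty(),
            CommandErrorKind::Io(_) => false,
        }
    }
}
```

Because callers can only go through the constructors and methods, a third failure mode could later be added to the enum without changing any public signature.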

See: Companion Code Section 1

Section 2: Error Display Formatting for Human Readability

The Display implementation for CommandError shows thoughtful attention to user experience. For I/O errors, it delegates directly to the underlying error's display logic. But for stderr content, it applies formatting that makes error messages from external tools visually distinct and readable.

The stderr formatting places the content between two divider lines of dashes, with the message trimmed of leading and trailing whitespace. This visual treatment serves a practical purpose: when ripgrep reports that a decompression tool failed, users can immediately distinguish between ripgrep's own message and the quoted stderr from the external tool. Each divider is 79 dashes long, which provides a clear boundary.

The empty stderr case receives special handling with the placeholder <stderr is empty>. This acknowledges reality: some tools exit with error codes without writing anything useful to stderr. Rather than showing nothing, which might confuse users, the error explicitly states that stderr was empty. This helps with debugging—users know the process failed but didn't explain why.
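The formatting rules above might look roughly like this as a standalone function. The function name render_stderr is hypothetical, and the exact placement of newlines around the dividers is an assumption of this sketch. The text only specifies the 79-dash dividers, the trimming, and the placeholder.

```rust
/// Hypothetical helper: render captured stderr bytes for display.
fn render_stderr(bytes: &[u8]) -> String {
    // Lossy conversion, since external tools are not guaranteed to emit valid UTF-8.
    let msg = String::from_utf8_lossy(bytes);
    let msg = msg.trim();
    if msg.is_empty() {
        // Say so explicitly rather than printing nothing.
        return "<stderr is empty>".to_string();
    }
    let div = "-".repeat(79);
    // Assumed layout: a leading newline so the first divider starts on its own line.
    format!("\n{div}\n{msg}\n{div}")
}
```

Trimming before the emptiness check means that stderr consisting only of whitespace is also reported as empty.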

See: Companion Code Section 2

Section 3: Bidirectional Error Conversion

The From implementations create a bridge between CommandError and io::Error, allowing seamless conversion in both directions. This is crucial for integrating with code that expects standard I/O errors while preserving the richer error information when available.

Converting from io::Error to CommandError is straightforward: the I/O error simply gets wrapped. The reverse direction is more nuanced. When converting a CommandError back to io::Error, I/O errors unwrap cleanly, but stderr-based errors become an io::Error with ErrorKind::Other. The original CommandError is stored as the new error's inner payload, where io::Error::get_ref or io::Error::into_inner can retrieve it, so no information is lost.

This bidirectional conversion supports a common pattern in Rust: functions that need to work with the standard library's error types while internally using richer domain-specific errors. The ? operator works naturally in both directions because Rust's error conversion is driven by From implementations.
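A compact sketch of the two conversions, using a flattened enum in place of the struct-plus-enum design to keep it short. The helper functions widen and narrow are hypothetical and exist only to show the ? operator picking each From impl.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Flattened stand-in for CommandError (the real type hides its variants).
#[derive(Debug)]
enum CommandError {
    Io(io::Error),
    Stderr(Vec<u8>),
}

impl fmt::Display for CommandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CommandError::Io(err) => fmt::Display::fmt(err, f),
            CommandError::Stderr(bytes) => {
                write!(f, "{}", String::from_utf8_lossy(bytes).trim())
            }
        }
    }
}

impl Error for CommandError {}

impl From<io::Error> for CommandError {
    fn from(err: io::Error) -> CommandError {
        CommandError::Io(err)
    }
}

impl From<CommandError> for io::Error {
    fn from(err: CommandError) -> io::Error {
        match err {
            // An I/O error comes back out unchanged.
            CommandError::Io(inner) => inner,
            // Anything else is boxed inside a new io::Error of kind Other.
            other => io::Error::new(io::ErrorKind::Other, other),
        }
    }
}

/// `?` converts io::Error into CommandError via the first impl.
fn widen(res: io::Result<()>) -> Result<(), CommandError> {
    res?;
    Ok(())
}

/// `?` converts CommandError into io::Error via the second impl.
fn narrow(res: Result<(), CommandError>) -> io::Result<()> {
    res?;
    Ok(())
}
```

Neither helper names a conversion explicitly. The return type alone decides which From impl the ? operator uses.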

See: Companion Code Section 3

Section 4: The Builder Pattern for Command Configuration

The CommandReaderBuilder follows the builder pattern you've seen throughout ripgrep's codebase. Even though there's currently only one configuration option—async_stderr—the builder establishes an extensible API that can grow without breaking existing code.

The builder stores configuration state and produces a CommandReader through its build method. The separation between configuration and construction is valuable: you might want to create a builder once, configure it based on runtime conditions, and then use it to build multiple readers. The Clone and Default derives support this use case.

The build method takes a mutable reference to a Command rather than owning it. This design choice reflects that the caller has already configured the command with its arguments, environment variables, and working directory. The builder only overrides the stdout and stderr settings to Stdio::piped(), which is necessary for capturing the process output.
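The shape of such a builder can be sketched as below. ReaderBuilder is a hypothetical stand-in: its build returns the raw Child instead of a CommandReader, and Default is written by hand so that the flag starts enabled, matching the default the text describes (a derived Default would leave a bool false).

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Hypothetical stand-in for the builder: one option today, room to grow.
#[derive(Clone, Debug)]
struct ReaderBuilder {
    async_stderr: bool,
}

impl Default for ReaderBuilder {
    fn default() -> ReaderBuilder {
        // Assumption: start enabled, as the surrounding text describes.
        ReaderBuilder { async_stderr: true }
    }
}

impl ReaderBuilder {
    fn new() -> ReaderBuilder {
        ReaderBuilder::default()
    }

    /// The setter returns `&mut Self` so calls can be chained.
    fn async_stderr(&mut self, yes: bool) -> &mut ReaderBuilder {
        self.async_stderr = yes;
        self
    }

    /// Borrow the caller's command, force both output streams to pipes, and spawn.
    /// A fuller version would pick the stderr strategy from `self.async_stderr` here.
    fn build(&self, command: &mut Command) -> io::Result<Child> {
        command.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()
    }
}
```

Since build takes &self, one configured builder can spawn any number of commands, and a spawn failure (for example a missing executable) surfaces as an ordinary error.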

See: Companion Code Section 4

Section 5: Understanding the Deadlock Problem

The async_stderr option addresses a subtle but serious problem with process I/O. When you spawn a process with piped stdout and stderr, the operating system backs each pipe with a buffer of limited capacity (64 KiB by default on modern Linux, and different on other platforms). If a process writes enough to fill a buffer and nobody reads from it, the process blocks.

Imagine spawning a decompression tool that writes the decompressed data to stdout and progress messages to stderr. If you read only from stdout, stderr's buffer might fill up. The process blocks waiting to write to stderr, but you're blocked waiting to read from stdout. Neither can proceed—a classic deadlock.

The async_stderr option solves this by reading stderr on a separate thread. While the main thread reads stdout, the background thread drains stderr. Neither buffer fills up, so the process runs to completion. The documentation notes this spawns an additional thread, which has some overhead, but the alternative—potential deadlocks—is far worse. This is why async_stderr defaults to enabled.
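The thread-based approach can be illustrated with a small standalone function. The name run_capturing_both is hypothetical, and unlike the streaming reader described here it collects all of stdout into memory, which is a simplification for the sake of a short example.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: run `cmd` and return (stdout, stderr),
/// draining both pipes concurrently so neither can fill up.
fn run_capturing_both(cmd: &mut Command) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;

    // Both handles are Some because both streams were configured as pipes.
    let mut stdout = child.stdout.take().expect("stdout is piped");
    let mut stderr = child.stderr.take().expect("stderr is piped");

    // Background thread drains stderr so the child never blocks writing to it.
    let drain = thread::spawn(move || {
        let mut buf = Vec::new();
        stderr.read_to_end(&mut buf).map(|_| buf)
    });

    // Meanwhile the calling thread drains stdout.
    let mut out = Vec::new();
    stdout.read_to_end(&mut out)?;

    // Collect the thread's result, then reap the child.
    let err = drain.join().expect("stderr thread does not panic")?;
    child.wait()?;
    Ok((out, err))
}
```

The cost is one extra thread per child process, which is the overhead the documentation mentions.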

See: Companion Code Section 5

Section 6: The CommandReader Core Structure

The CommandReader struct holds three things: the child process, a stderr reader, and an EOF flag. The child process owns the stdout pipe (as an Option<ChildStdout> inside the Child struct), the stderr reader handles the stderr pipe through either sync or async reading, and the EOF flag tracks whether we've finished reading stdout.

The EOF flag serves an important purpose in cleanup logic. When you close a reader before consuming all output (perhaps because you found what you were searching for), the child process hits a broken pipe the next time it tries to write: on Unix, a SIGPIPE signal or an EPIPE error. This is expected behavior, not an error. The EOF flag lets the close logic distinguish between "closed early, broken pipe is fine" and "read to completion, any error is real."

This structure embodies a key principle: the reader owns the child process and is responsible for its complete lifecycle. When the reader is dropped, it must clean up the process. When the reader encounters an error, it must provide useful diagnostics. Ownership in Rust isn't just about memory—it's about resources generally.

See: Companion Code Section 6

Section 7: The Close Protocol

The close method implements careful cleanup logic that handles multiple scenarios correctly. First, it checks if stdout has already been taken—if so, close was already called, and we return early. This makes close idempotent, safe to call multiple times.

Closing starts by dropping stdout. This closes the pipe, which signals to a well-behaved child process that no more output will be read. The process should then exit. We call wait() to collect the exit status, which also reaps the process and prevents it from becoming a zombie.

If the process exited successfully, we're done. If it failed, we read whatever stderr contains and return it as an error. But there's a special case: if we didn't read to EOF (meaning we closed early) and stderr is empty, we assume success. This handles the broken pipe scenario gracefully—the process "failed" only because we stopped reading, not because of an actual error.
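The steps above can be sketched as a free function over a Child. The name close_child and the explicit eof parameter are assumptions of this example. It also reads stderr synchronously after waiting, so it is only safe for children that write little to stderr (the risk Section 5 describes).

```rust
use std::io::{self, Read};
use std::process::Child;

/// Hypothetical free-function version of the close protocol.
/// `eof` is true when the caller already read stdout to its end.
fn close_child(child: &mut Child, eof: bool) -> io::Result<()> {
    // If stdout is already gone, a previous call did the work.
    let stdout = match child.stdout.take() {
        None => return Ok(()),
        Some(stdout) => stdout,
    };
    // Closing our end of the pipe tells the child nobody is reading anymore.
    drop(stdout);

    // wait() collects the exit status and reaps the process.
    if child.wait()?.success() {
        return Ok(());
    }

    // Simplification: read stderr on this thread, after the child has exited.
    let mut err = Vec::new();
    if let Some(mut stderr) = child.stderr.take() {
        stderr.read_to_end(&mut err)?;
    }

    // Closed early with nothing on stderr: assume an expected broken pipe.
    if !eof && err.is_empty() {
        return Ok(());
    }
    let msg = String::from_utf8_lossy(&err).trim().to_string();
    Err(io::Error::new(io::ErrorKind::Other, msg))
}
```

The early-close rule is a heuristic: a child that fails silently for an unrelated reason after an early close is also treated as a success.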

See: Companion Code Section 7

Section 8: Resource Safety Through Drop

The Drop implementation ensures that resources are always cleaned up, even if the caller forgets to call close. This is defense-in-depth: proper usage should call close explicitly to handle errors, but Drop provides a safety net.

When Drop calls close and encounters an error, it can't return that error, because Drop has no return value. Instead, it logs a warning. This is a pragmatic compromise: swallowing the error silently might hide real problems, but panicking would be too severe for what might be an expected broken pipe.

The documentation explicitly discusses this design choice, advising callers to call close explicitly if they want to handle errors properly. This is honest API design: the Drop implementation prevents resource leaks and process zombies, but it can't fully replace explicit cleanup. Rust's type system can't express "you must call close before dropping," so documentation fills the gap.
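A minimal sketch of the explicit-close-plus-Drop pairing. ChildGuard is a hypothetical type, and eprintln! stands in for a logging macro so that the example depends only on the standard library.

```rust
use std::io;
use std::process::Child;

/// Hypothetical owner of a child process.
struct ChildGuard {
    child: Option<Child>,
}

impl ChildGuard {
    /// Explicit cleanup: the caller gets to see and handle any error.
    fn close(&mut self) -> io::Result<()> {
        // take() leaves None behind, so a second call (including the one
        // made by Drop) is a no-op.
        let mut child = match self.child.take() {
            None => return Ok(()),
            Some(child) => child,
        };
        // Drop our end of stdout first so a child blocked on writing can exit.
        drop(child.stdout.take());
        let status = child.wait()?;
        if status.success() {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::Other, format!("child failed: {status}")))
        }
    }
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Drop cannot return an error, so report it and move on.
        if let Err(err) = self.close() {
            eprintln!("warning: {err}");
        }
    }
}
```

A caller that cares about the outcome calls close() and handles the Result. A caller that forgets still gets the process reaped, with the failure reported but not propagated.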

See: Companion Code Section 8

Section 9: Implementing the Read Trait

The Read implementation is where CommandReader becomes useful to the rest of the codebase. By implementing io::Read, a CommandReader can be used anywhere that expects a reader: passed to functions that read bytes, wrapped in buffered readers, or used with the standard read methods.

The implementation delegates to the child's stdout, but with two additions. First, it handles the case where stdout has been taken (after close), returning zero bytes to indicate EOF. Second, when a read returns zero bytes naturally (the process finished), it sets the EOF flag and calls close, converting any process failure into an I/O error.

This automatic close-on-EOF is convenient: if you read a CommandReader to completion using read_to_end, cleanup happens automatically. The EOF flag ensures that close knows this was a natural end, not an early termination. This distinction affects whether a broken pipe is considered an error.
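The read logic can be sketched on a pared-down reader. StdoutReader is hypothetical and omits stderr handling entirely: its simplified close reports a failing exit status only after a natural EOF, and otherwise assumes that an early close caused the failure.

```rust
use std::io::{self, Read};
use std::process::{Child, Command, Stdio};

/// Hypothetical reader over a child's stdout.
struct StdoutReader {
    child: Child,
    eof: bool,
}

impl StdoutReader {
    fn spawn(cmd: &mut Command) -> io::Result<StdoutReader> {
        let child = cmd.stdout(Stdio::piped()).spawn()?;
        Ok(StdoutReader { child, eof: false })
    }

    /// Simplified close: drop stdout, reap the child, check the exit status.
    fn close(&mut self) -> io::Result<()> {
        match self.child.stdout.take() {
            None => return Ok(()), // already closed
            Some(stdout) => drop(stdout),
        }
        let status = self.child.wait()?;
        if status.success() || !self.eof {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::Other, format!("child failed: {status}")))
        }
    }
}

impl Read for StdoutReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let stdout = match self.child.stdout.as_mut() {
            None => return Ok(0), // closed: behave like EOF
            Some(stdout) => stdout,
        };
        let n = stdout.read(buf)?;
        if n == 0 {
            // Natural end of output: record it, then surface any process failure.
            self.eof = true;
            self.close()?;
        }
        Ok(n)
    }
}
```

With this in place, read_to_end or read_to_string on the reader both consumes the output and reports a failed child as an I/O error, with no separate close call needed.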

See: Companion Code Section 9

Section 10: The Stderr Reader Abstraction

The StderrReader enum encapsulates the choice between synchronous and asynchronous stderr reading. The async variant holds a join handle to a background thread, while the sync variant holds the stderr pipe directly. Both variants provide the same interface through read_to_end.

The async variant is constructed by spawning a thread that reads all of stderr into a CommandError. The thread runs independently while the main thread reads stdout. When we need the stderr content—at close time—we join the thread and retrieve its result.

The raw identifier syntax r#async deserves explanation. async became a reserved keyword in the Rust 2018 edition for async/await syntax. Here it names a constructor function, so the r# prefix is required to tell Rust "this isn't the keyword, it's an identifier." This is a minor inconvenience from Rust's evolution, accepted so that the constructor's name mirrors the async variant it builds.
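The two variants and the raw-identifier constructor can be sketched as follows. To keep the example testable without a real process, this version is generic over any reader rather than tied to ChildStderr, and it yields raw bytes instead of a CommandError. Both are deliberate departures, as are the sync constructor name and the drain helper.

```rust
use std::io::Read;
use std::thread::{self, JoinHandle};

/// Sketch of the two strategies, generic over any reader.
enum StderrReader<R> {
    /// A background thread is already draining the stream.
    Async(Option<JoinHandle<Vec<u8>>>),
    /// The stream is read on demand, on the calling thread.
    Sync(R),
}

impl<R: Read + Send + 'static> StderrReader<R> {
    /// `async` is a keyword since the 2018 edition, hence the `r#` prefix.
    fn r#async(mut stderr: R) -> StderrReader<R> {
        let handle = thread::spawn(move || drain(&mut stderr));
        StderrReader::Async(Some(handle))
    }

    fn sync(stderr: R) -> StderrReader<R> {
        StderrReader::Sync(stderr)
    }

    /// Return everything written to the stream.
    /// For the async variant this may be called only once.
    fn read_to_end(&mut self) -> Vec<u8> {
        match self {
            StderrReader::Async(handle) => handle
                .take()
                .expect("read_to_end cannot be called more than once")
                .join()
                .expect("stderr reading thread does not panic"),
            StderrReader::Sync(stderr) => drain(stderr),
        }
    }
}

/// Hypothetical helper: read to the end, keeping whatever arrived
/// before any I/O error (a simplification of real error handling).
fn drain<R: Read>(stderr: &mut R) -> Vec<u8> {
    let mut bytes = Vec::new();
    let _ = stderr.read_to_end(&mut bytes);
    bytes
}
```

Callers invoke the constructor as StderrReader::r#async(stream). The r# prefix is needed at every use of the name, not only at its definition.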

See: Companion Code Section 10

Section 11: Thread Safety Considerations

The read_to_end method on StderrReader shows careful handling of the thread join. It takes the join handle out of the Option (using take()), ensuring that read_to_end can only be called once for the async case. The expect message documents this constraint: calling it twice is a programmer error.

The second expect on join() asserts that the thread doesn't panic. The stderr reading thread is simple—it just reads bytes—so panics are unexpected. If one did occur, it would indicate a bug, not a runtime error, so panicking in the caller is appropriate.

For the sync case, read_to_end simply reads stderr directly. This works when stderr output is small or when you know the process won't write much to stderr while producing stdout. The sync path avoids thread overhead but requires this guarantee from the caller.

See: Companion Code Section 11

Section 12: The Helper Function for Stderr Reading

The stderr_to_command_error function handles the actual reading of stderr into a CommandError. It reads all bytes into a vector, then wraps either the bytes (on success) or the I/O error (on failure) into the appropriate error variant.

This helper is used both by the async thread and the sync reader, avoiding code duplication. It takes a mutable reference to ChildStderr, which implements Read, and returns a CommandError that captures either the stderr content or any error encountered while reading.
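A sketch of the helper, again generic over Read instead of taking ChildStderr so that both outcomes can be exercised without spawning anything. The flattened CommandError enum and the always-failing Broken reader are stand-ins invented for this example.

```rust
use std::io::{self, Read};

/// Simplified stand-in for CommandError with visible variants.
#[derive(Debug)]
enum CommandError {
    Io(io::Error),
    Stderr(Vec<u8>),
}

/// Read the whole stream, then wrap either the bytes or the I/O error.
fn stderr_to_command_error<R: Read>(stderr: &mut R) -> CommandError {
    let mut bytes = Vec::new();
    match stderr.read_to_end(&mut bytes) {
        Ok(_) => CommandError::Stderr(bytes),
        Err(err) => CommandError::Io(err),
    }
}

/// A reader that always fails, used to show the error path.
struct Broken;

impl Read for Broken {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "pipe closed"))
    }
}
```

Note that a successful read still produces an error value: the helper is only consulted once the process is known to have failed, so its "success" case means "here is what the process said".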

The function's simplicity is intentional. The more involved logic, such as display formatting and conversion to io::Error, lives in CommandError's trait implementations. This function just connects the Read trait to the CommandError::stderr and CommandError::io constructors, following the single responsibility principle.

See: Companion Code Section 12

Key Takeaways

First, process management in Rust requires explicit attention to resource cleanup—child processes need to be waited on, pipes need to be closed, and the order of these operations matters for correctness.

Second, the deadlock risk with process I/O is real and subtle. Reading stdout and stderr concurrently, whether through async I/O or threads, prevents buffer-full deadlocks that can otherwise freeze your program.

Third, bidirectional From implementations create smooth interoperability between custom error types and standard library types, enabling the ? operator to work naturally across API boundaries.

Fourth, the builder pattern provides extensibility for APIs even when current configuration is minimal. CommandReaderBuilder has one option today but can grow without breaking callers.

Fifth, implementing standard traits like Read integrates custom types into Rust's ecosystem. A CommandReader works with any code expecting a reader, from BufReader wrappers to read_to_end calls.

Sixth, Drop implementations provide resource safety as a last resort, but explicit cleanup methods give callers control over error handling. Both are needed for robust APIs.

What to Read Next

How does ripgrep use this process infrastructure? Read crates/cli/src/decompress.rs to see CommandReader in action, spawning decompression tools and treating their output as searchable streams.

Want to understand more about streaming I/O in ripgrep? The searcher module in crates/searcher/src/searcher/mod.rs shows how various input sources, including process output, feed into the search machinery.

Curious about error handling patterns across ripgrep? The main binary in crates/core/main.rs demonstrates how errors from different subsystems, including command errors, propagate to user-facing messages.