Inside the Rust Compiler: From Source Code to Binary¶
This comprehensive guide explores the internal workings of the Rust compiler (rustc), from parsing source code to generating binaries. We'll examine the various intermediate representations (IRs) and learn how to interact with the compiler programmatically.
What is Rust?¶
Rust is defined by the Rust Language team as "a language empowering everyone to build reliable and efficient software." Key characteristics include:
- Systems-level control: Explicit memory allocation control
- Efficiency: No garbage collector, compiles to small binaries for embedded systems
- Memory safety: Prevents buffer overflows, use-after-free, and dangling pointers
- Thread safety: Data race freedom in concurrent code
- Compile-time guarantees: All safety checks happen at compile time
Borrowing and Lifetime Semantics¶
Rust's safety guarantees come from its borrowing rules: - Shared data cannot be mutated: Multiple immutable references are allowed - Mutable data cannot be aliased: Only one mutable reference at a time - Lifetimes ensure validity: References cannot outlive the data they point to
The unsafe Keyword¶
The unsafe keyword allows you to:
- Directly manipulate memory and raw pointers
- Call unsafe functions
- Access mutable static variables
- Implement unsafe traits
When used appropriately (like in the standard library), unsafe enables powerful abstractions while maintaining safety at higher levels.
The Rust Compiler Architecture¶
The Rust compiler (rustc) transforms source code through multiple intermediate representations before generating a binary:
Rust Source Code
↓
Abstract Syntax Tree (AST)
↓
High-level IR (HIR)
↓
Typed HIR (THIR)
↓
Mid-level IR (MIR)
↓
Code Generation (LLVM/GCC/Cranelift)
↓
Target Binary
Exploring the Rust Compiler Repository¶
The main Rust compiler repository contains several important directories:
compiler/: Core compiler code organized into multiple crateslibrary/: Standard library and core typessrc/: Bootstrap and build scriptstests/: Comprehensive test suite
Building the Compiler¶
To build the Rust compiler from source:
# Initial setup
./x.py setup
# Choose option B for compiler development
# Build the compiler
./x.py build
# Incremental builds (much faster)
./x.py build --keep-stage 1
The build process creates multiple compiler stages due to bootstrapping: - Stage 0: Minimal compiler built with existing Rust toolchain - Stage 1: Full compiler built with Stage 0 - Stage 2: Self-hosted compiler (Stage 1 building itself)
Intermediate Representations in Detail¶
Abstract Syntax Tree (AST)¶
The AST is created through lexing and parsing:
View the AST using:
The AST represents the program structure as a tree, with nodes for functions, expressions, and statements.
High-level IR (HIR)¶
HIR is created by lowering and desugaring the AST:
- Multiple loop constructs (for, while, loop) are normalized to loop
- Control flow is simplified to match statements
- The prelude is made explicit
# View HIR
rustc -Z unpretty=hir hello.rs
# View HIR tree structure
rustc -Z unpretty=hir-tree hello.rs
At the HIR level, the compiler performs: - Type inference: Determining types from context - Trait solving: Ensuring generic constraints can be satisfied - Type checking: Verifying type compatibility
Example of trait bounds:
fn log_elements<T: std::fmt::Display>(vec: Vec<T>) {
for (i, element) in vec.iter().enumerate() {
println!("{}: {}", i, element);
}
}
Typed HIR (THIR)¶
THIR contains fully elaborated types and is used for: - Unsafe code analysis - Additional type-related checks
Mid-level IR (MIR)¶
MIR transforms the tree structure into a Control Flow Graph (CFG):
MIR represents programs as: - Places in memory instead of variables - Basic blocks containing statements - Terminators that branch between blocks
Example MIR structure:
fn main() -> () {
let mut _0: ();
let _1: ();
bb0: {
_1 = const "Hello, world!";
_2 = &_1;
_3 = std::io::_print(_2);
goto -> bb1;
}
bb1: {
return;
}
}
At the MIR level, the compiler performs: - Borrow checking: Ensuring memory safety rules - Drop elaboration: Inserting cleanup code - Lifetime analysis: Validating reference lifetimes
Interacting with the Compiler Programmatically¶
You can write Rust programs that call the compiler internally using the rustc_driver crate:
Basic Setup¶
#![feature(rustc_private)]
extern crate rustc_driver;
extern crate rustc_interface;
use rustc_driver::Compilation;
use rustc_interface::interface;
struct MyCallbacks;
impl rustc_driver::Callbacks for MyCallbacks {
fn after_analysis<'tcx>(
&mut self,
compiler: &interface::Compiler,
queries: &'tcx rustc_interface::Queries<'tcx>,
) -> Compilation {
queries.global_ctxt().unwrap().peek_mut().enter(|tcx| {
// Access compiler internals here
println!("Analyzing crate: {}", tcx.crate_name(rustc_span::def_id::LOCAL_CRATE));
});
Compilation::Continue
}
}
fn main() {
let args: Vec<String> = std::env::args().collect();
rustc_driver::RunCompiler::new(&args, &mut MyCallbacks).run().unwrap();
}
Callback Points¶
The compiler provides three main callback points:
after_crate_root_parsing: After creating the initial ASTafter_expansion: After macro expansion and name resolutionafter_analysis: After all type checking and borrow checking
Accessing Compiler Internals¶
With a typing context (TyCtxt), you can call many internal compiler functions:
// Get all function bodies in the crate
let body_owners = tcx.hir().body_owners().collect::<Vec<_>>();
// Pretty print MIR
for def_id in body_owners {
rustc_middle::mir::pretty::write_mir_pretty(tcx, Some(def_id), &mut std::io::stdout()).unwrap();
}
Code Generation Backends¶
The final step transforms MIR into target code. Rust supports multiple backends:
- LLVM (default): Mature, highly optimizing
- GCC: Alternative with different optimization characteristics
- Cranelift: Faster compilation, less optimization
Custom Code Generation¶
You can implement custom code generation by implementing the CodegenBackend trait:
impl CodegenBackend for MyBackend {
fn codegen_crate(&self, tcx: TyCtxt, metadata: EncodedMetadata, need_metadata_module: bool) -> Box<dyn Any> {
// Custom code generation logic
}
// ... other required methods
}
This enables targeting custom architectures or generating specialized output formats.
Debugging and Optimization¶
Compiler Flags for Exploration¶
Useful flags for examining compiler internals:
# List all unstable options
rustc -Z help
# Common debugging flags
rustc -Z unpretty=ast # Show AST
rustc -Z unpretty=hir # Show HIR
rustc -Z unpretty=mir # Show MIR
rustc --emit=llvm-ir # Show LLVM IR
rustc --emit=asm # Show assembly
Development Workflow¶
When working on the compiler:
- Use the compiler profile:
./x.py setup→ option B - Make changes to compiler source
- Rebuild incrementally:
./x.py build --keep-stage 1 library - Test changes with the stage 1 compiler
This workflow reduces rebuild times from minutes to seconds.
Practical Applications¶
Understanding compiler internals enables:
- Static analysis tools: Analyze code without executing it
- Custom lints: Detect project-specific issues
- Code generation: Generate code from high-level descriptions
- Verification tools: Prove program properties
- Educational tools: Understand how language features work
Conclusion¶
The Rust compiler is a sophisticated piece of software that transforms high-level Rust code through multiple intermediate representations, each optimized for different types of analysis. By understanding these internals and learning to interact with them programmatically, you can build powerful tools that leverage the compiler's deep understanding of Rust programs.
The compiler's architecture demonstrates the careful balance between providing high-level language features and maintaining the performance and safety guarantees that make Rust unique in the systems programming landscape.