An AArch64 OS in Rust – Parsing the Device Tree (Part 2)

In the previous post, we built a structural parser that transformed the raw Device Tree Blob (DTB) into a searchable table of PlatformDevice entries. While we can now traverse the tree and see property names like reg or compatible, these properties are still just raw, big-endian byte buffers. Our kernel knows a device exists, but it doesn’t yet know its base address, the size of its register space, or how to route its interrupts.

This second part is where we move from structural parsing to semantic interpretation. We are shifting from simply identifying nodes to configuring hardware: extracting the precise memory maps, interrupt lines, and clock frequencies required to bring our drivers to life. This allows us to finally purge the hardcoded peripheral addresses from head.S and replace them with a dynamic discovery process that makes the kernel truly hardware-agnostic.

Driver Registry

To bridge the gap between the DTB and our kernel logic, we need a mechanism to look at a device node and answer a fundamental question: “Which initialization routine handles this specific hardware?”

In the Linux kernel, this is handled by the Platform Bus. Drivers register an of_device_id table containing the compatible strings they support. When the kernel finds a matching node in the Device Tree, it calls that driver’s probe function. For my kernel, I’ve implemented a simplified version of this pattern. Rather than a complex dynamic bus, I use a Driver Registry: a static table that pairs compatible strings directly with setup functions.


pub struct DeviceMatch {
    pub compatible: &'static str,
    pub setup_fn: fn(&PlatformDevice),
}

pub static CONFIGURED_DEVICES: [DeviceMatch; 3] = [
    DeviceMatch { compatible: "arm,gic-v3", setup_fn: gicv3::setup },
    DeviceMatch { compatible: "arm,pl011", setup_fn: pl011::setup },
    DeviceMatch { compatible: "arm,armv7-timer", setup_fn: arch_timer::setup },
];

This table maps the standardized compatible strings found in the DTB, like "arm,pl011", directly to a setup function in our driver code. This design creates a clean separation of concerns: the DTB parser handles the binary format, while the registry handles the kernel’s policy on which hardware to support.
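To make the matching step concrete, here is a minimal sketch of how a lookup against such a table could work. It relies on one fact from the Devicetree Specification: a compatible property is a list of NUL-terminated strings, ordered from most to least specific. The `Entry` type, the `driver` field, and `find_driver` are hypothetical names for illustration; the real registry stores a setup function and works on PlatformDevice entries.

```rust
/// Hypothetical, simplified registry entry: the real table stores a setup function.
pub struct Entry {
    pub compatible: &'static str,
    pub driver: &'static str,
}

pub static REGISTRY: [Entry; 3] = [
    Entry { compatible: "arm,gic-v3", driver: "gicv3" },
    Entry { compatible: "arm,pl011", driver: "pl011" },
    Entry { compatible: "arm,armv7-timer", driver: "arch_timer" },
];

/// Walks the NUL-separated strings of a raw `compatible` value in order and
/// returns the first registry entry that matches one of them.
pub fn find_driver(compatible_value: &[u8]) -> Option<&'static Entry> {
    compatible_value
        .split(|&b| b == 0)
        // The trailing NUL produces one empty chunk; ignore it.
        .filter(|s| !s.is_empty())
        .find_map(|s| REGISTRY.iter().find(|e| e.compatible.as_bytes() == s))
}
```

Because the strings are tried in order, a node that lists a newer compatible string first still falls back to an older one the registry knows about.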

The Zero-Infrastructure Reality

Before we dive into the drivers, we have to address a major technical hurdle: malloc does not exist yet. In a standard Rust program, we would just use a Vec or a HashMap to store the discovered devices. But in a bare-metal kernel, we don’t have a heap until we implement a memory allocator (reserved for future posts).

Because of the lack of a memory allocator, we have to work with what the hardware gives us: raw memory and static allocations. To handle this, I’ve implemented a few patterns in the driver logic:

  • Static Global Tables: Since we can’t allocate memory at runtime, our DEVICE_TABLE and PHANDLE_TABLE are fixed-size arrays defined at compile time.
  • Linear Search: Without a HashMap, we resolve phandles and compatible strings via linear scans.
  • Zero-Copy Parsing: Instead of copying property values into new strings or buffers, we store raw pointers that point directly back into the original DTB blob.
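The first two patterns can be sketched together as a fixed-capacity table with a linear lookup. This is an illustration under assumptions, not the kernel's actual DEVICE_TABLE or PHANDLE_TABLE: the `FixedTable` type and its methods are hypothetical names, and the element type is kept generic.

```rust
/// Hypothetical fixed-capacity table: all storage is reserved at compile time,
/// so no heap allocator is needed.
pub struct FixedTable<T: Copy, const N: usize> {
    entries: [Option<T>; N],
    len: usize,
}

impl<T: Copy, const N: usize> FixedTable<T, N> {
    /// `const fn` so the table can be built in a static initializer.
    pub const fn new() -> Self {
        Self { entries: [None; N], len: 0 }
    }

    /// Appends an item, handing it back as an error when the table is full.
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.len == N {
            return Err(item);
        }
        self.entries[self.len] = Some(item);
        self.len += 1;
        Ok(())
    }

    /// Linear scan over the occupied slots: O(n), but n is tiny at boot.
    pub fn find(&self, mut pred: impl FnMut(&T) -> bool) -> Option<&T> {
        self.entries[..self.len].iter().flatten().find(|e| pred(*e))
    }
}
```

The trade-off is explicit: a full table is a reported error rather than a reallocation, so the capacity has to be sized for the largest Devicetree we expect to meet.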

The Initialization Order: A Two-Pass Approach

One of the first obstacles I encountered was a dependency loop. Most peripherals (like the UART or Timer) need to register interrupts during their setup. However, they can’t do that if the Interrupt Controller (GIC) hasn’t been initialized yet.

To solve this, my initialization logic performs a two-pass sequence. This mirrors how real kernels often have specific early init levels for critical infrastructure.

  1. Pass 1: We search for the GIC (arm,gic-v3) and initialize it first. This maps the interrupt distributor and prepares the CPU interfaces.
  2. Pass 2: With the GIC ready, we loop through the remaining devices. Now, when the UART driver starts, it can safely tell the GIC: “Hey, I’m at IRQ 33, please start listening for me”.
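The ordering logic above can be sketched as two filtered loops over the same device list. This is a simplified stand-in: devices are represented only by their compatible string, and `init_in_two_passes` and `probe` are hypothetical names rather than functions from the kernel.

```rust
/// Compatible string of the device that must come up before everything else.
const GIC_COMPATIBLE: &str = "arm,gic-v3";

/// Calls `probe` once per device, interrupt controller first.
/// Devices are reduced to their compatible string for this sketch.
pub fn init_in_two_passes<'a>(devices: &[&'a str], mut probe: impl FnMut(&'a str)) {
    // Pass 1: only the interrupt controller.
    for &dev in devices.iter().filter(|d| **d == GIC_COMPATIBLE) {
        probe(dev);
    }
    // Pass 2: everything else, in discovery order.
    for &dev in devices.iter().filter(|d| **d != GIC_COMPATIBLE) {
        probe(dev);
    }
}
```

Even if the GIC node appears last in the DTB, it is probed first, so later drivers can safely configure their interrupts.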

Deep Dive: The Driver Setup Functions

Now that the kernel has matched a DTB node to a driver and decided on the initialization order, the work moves inside the setup functions. Each driver must perform its own semantic parsing – turning those big-endian cells into hardware configurations.
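Every setup function below leans on one primitive: reading a big-endian 32-bit cell at a byte offset. As a rough sketch, a bounds-checked version over a slice could look like this. Note that it is an assumption for illustration: the kernel's own convert::read_be_u32 takes a raw pointer into the DTB blob instead of a slice.

```rust
/// Reads one big-endian u32 cell at `offset`, or returns None when the
/// buffer is too short. Slice-based variant for illustration only.
pub fn read_be_u32(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}
```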

Initializing the GICv3

The GICv3 is the architectural anchor of our system. It is the first device that must be brought online, as it governs how every other peripheral communicates with the CPU.

Extraction of MMIO Ranges

According to the ARM GICv3 Bindings, the reg property must follow a specific order. For our basic initialization, we care about the first two:

  • GIC Distributor interface (GICD)
  • GIC Redistributors (GICR), one range per redistributor region
  • GIC CPU interface (GICC)
  • GIC Hypervisor interface (GICH)
  • GIC Virtual CPU interface (GICV)

To read this property correctly, we must follow the Devicetree Specification regarding how addresses and sizes are encoded. The spec defines the following rules for #address-cells and #size-cells:

The #address-cells and #size-cells properties may be used in any device node that has children in the devicetree hierarchy and describes how child device nodes should be addressed. The #address-cells property defines the number of <u32> cells used to encode the address field in a child node’s reg property. The #size-cells property defines the number of <u32> cells used to encode the size field in a child node’s reg property.
The #address-cells and #size-cells properties are not inherited from ancestors in the devicetree. They shall be explicitly defined.

In our setup, we use these properties as a ruler to slice the addresses out of the raw DTB buffer. To reach the GICR, we follow the standard and skip the first entry (the GICD address and size) by calculating the offset based on these cell counts.


pub fn setup(dev: &device::PlatformDevice) {
    let mut gicd_addr: usize = 0;
    let mut gicr_addr: usize = 0;
    
    // 1. Get cell counts from the parent
    let (addr_cells, size_cells) = dev.get_parent_cells();

    if let Some(reg_prop) = dev.find_property("reg") {
        // 2. Decode GICD (Distributor) Base
        for i in 0..addr_cells as usize {
            let cell = convert::read_be_u32(reg_prop.value, i * 4);
            gicd_addr = (gicd_addr << 32) | cell as usize;
        }

        // 3. Calculate the offset for GICR (Redistributor)
        // Skip (address cells + size cells) * 4 bytes to reach the next entry
        let gicr_off = (addr_cells + size_cells) as usize * 4;
        for i in 0..addr_cells as usize {
            let cell = convert::read_be_u32(reg_prop.value, gicr_off + (i * 4));
            gicr_addr = (gicr_addr << 32) | cell as usize;
        }
    }
    
    // Initialize hardware with the discovered addresses
    let mut gic = Gicv3::new(gicd_addr, gicr_addr);
    gic.init();
    
    // Final step: Enable the CPU interface to receive interrupts
    enable_grp1_ints();
}
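The two decoding loops above follow the same rule, so they can be generalized into a helper that returns the n-th (address, size) pair of any reg property. The sketch below is one possible shape, using slices instead of raw pointers; `read_cells` and `reg_entry` are hypothetical names, and cell counts above 2 are rejected because the result would not fit in a u64.

```rust
/// Folds `cells` big-endian u32 cells starting at `offset` into one u64.
fn read_cells(buf: &[u8], offset: usize, cells: usize) -> Option<u64> {
    let mut value = 0u64;
    for i in 0..cells {
        let start = offset.checked_add(i * 4)?;
        let b = buf.get(start..)?.get(..4)?;
        value = (value << 32) | u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as u64;
    }
    Some(value)
}

/// Returns the `index`-th (address, size) pair of a `reg` property.
pub fn reg_entry(
    reg: &[u8],
    addr_cells: usize,
    size_cells: usize,
    index: usize,
) -> Option<(u64, u64)> {
    // This sketch only supports values that fit in 64 bits.
    if addr_cells == 0 || addr_cells > 2 || size_cells > 2 {
        return None;
    }
    // Each entry occupies (address cells + size cells) * 4 bytes.
    let stride = (addr_cells + size_cells) * 4;
    let base = index.checked_mul(stride)?;
    let addr = read_cells(reg, base, addr_cells)?;
    let size = read_cells(reg, base.checked_add(addr_cells * 4)?, size_cells)?;
    Some((addr, size))
}
```

With two address cells and two size cells, entry 0 would yield the GICD range and entry 1 the GICR range, and a request past the end of the property returns None instead of reading out of bounds.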

PL011 UART

With the GIC initialized, we can set up the PL011 UART. Just like the GIC, the UART driver must first discover its own MMIO base address. Since we already navigated the cell-encoding rules for the reg property during the GIC setup, we apply that same logic here to find where the UART lives in the memory map.

Fulfilling the Interrupt Contract

The UART requires an interrupt line to notify the CPU of incoming data. In the Devicetree, the interrupt-parent property identifies the controller that receives this signal, the interrupts property describes the signal itself, and the controller's #interrupt-cells property defines how many cells each entry in interrupts occupies.

The ARM GICv3 Bindings define the rules for this property:

Specifies the number of cells needed to encode an interrupt source.
Must be a single cell with a value of at least 3.
If the system requires describing PPI affinity, then the value must
be at least 4.

The 1st cell is the interrupt type; 0 for SPI interrupts, 1 for PPI
interrupts, 2 for interrupts in the Extended SPI range, 3 for the
Extended PPI range. Other values are reserved for future use.

The 2nd cell contains the interrupt number for the interrupt type.
SPI interrupts are in the range [0-987]. PPI interrupts are in the
range [0-15]. Extended SPI interrupts are in the range [0-1023].
Extended PPI interrupts are in the range [0-127].

The 3rd cell is the flags, encoded as follows:
bits[3:0] trigger type and level flags.
1 = edge triggered
4 = level triggered

In our implementation, we define MAX_INTERRUPT_CELLS as 4 to remain fully compliant with the bindings, which require at least 4 cells when PPI affinity is involved. Even though the UART's interrupt (an SPI) only uses the first 3, the code is prepared for the full specifier:

  • 1st cell: The interrupt type (0 for SPI interrupts, 1 for PPI).
  • 2nd cell: The interrupt number relative to the type.
  • 3rd cell: The flags for trigger type and level (e.g., 1 for edge or 4 for level).
  • 4th cell: (Optional) PPI affinity, given as a phandle to a node describing the CPUs the interrupt is affine to.
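Put together, the first three cells are enough to derive the GIC interrupt ID and trigger mode. The following is a minimal sketch of that decoding under the rules quoted above; `Trigger` and `decode_specifier` are hypothetical names, the extended SPI/PPI ranges are deliberately left out, and only the two flag values listed in the binding excerpt are accepted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    Edge,
    Level,
}

/// Turns a decoded interrupt specifier into (GIC interrupt ID, trigger mode).
/// Returns None for anything this sketch does not handle.
pub fn decode_specifier(cells: &[u32]) -> Option<(u32, Trigger)> {
    if cells.len() < 3 {
        return None;
    }
    let intid = match cells[0] {
        // SPIs are numbered from 0 in the DTB but start at ID 32 in the GIC.
        0 if cells[1] <= 987 => 32 + cells[1],
        // PPIs are numbered from 0 in the DTB but start at ID 16 in the GIC.
        1 if cells[1] <= 15 => 16 + cells[1],
        // Extended ranges and reserved types are not handled here.
        _ => return None,
    };
    // bits[3:0] of the third cell carry the trigger type.
    let trigger = match cells[2] & 0xf {
        1 => Trigger::Edge,
        4 => Trigger::Level,
        _ => return None,
    };
    Some((intid, trigger))
}
```

For example, the specifier <0 1 4> decodes to interrupt ID 33, level triggered, which is the UART interrupt mentioned earlier.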

Phandles: The Devicetree Link

Beyond base addresses and interrupts, the UART driver requires a clock frequency to calculate the correct baud rate [1]. In our assembly version, we hardcoded this to a fixed 24 MHz (0x16e3600). To make this dynamic, we now use Phandle Resolution.

As defined by the Devicetree Specification:

The phandle property specifies a numerical identifier for a node that is unique within the devicetree. The phandle property value is used by other nodes that need to refer to the node associated with the property.

In our UART node, the clocks property contains one of these numerical IDs. This creates a "source of truth" relationship: the UART doesn't need to know the clock frequency; it only needs to know which node to ask. To find the frequency, the code follows this trail:

  • Extract the Phandle: Read the ID from the UART's clocks property.
  • Locate the Provider: Search the DTB for the node that owns that specific ID.
  • Read the Frequency: Once at the provider node, extract the value from its clock-frequency property.
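The three steps above can be sketched against a small in-memory table. This is a simplification for illustration: `Node`, `find_by_phandle`, and `resolve_clock` are hypothetical names, and the node type carries only the two fields the trail needs, whereas the kernel looks properties up on PlatformDevice entries.

```rust
/// Hypothetical, stripped-down view of a parsed node.
#[derive(Clone, Copy)]
pub struct Node {
    pub phandle: u32,
    pub clock_frequency: Option<u32>,
}

/// Step 2: linear search for the node that owns `id`.
pub fn find_by_phandle(nodes: &[Node], id: u32) -> Option<&Node> {
    nodes.iter().find(|n| n.phandle == id)
}

/// Steps 1 to 3: read the phandle from the raw `clocks` value,
/// locate the provider, and return its clock-frequency if present.
pub fn resolve_clock(nodes: &[Node], clocks_value: &[u8]) -> Option<u32> {
    let b = clocks_value.get(0..4)?;
    let id = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
    find_by_phandle(nodes, id)?.clock_frequency
}
```

Every step can fail independently (empty property, dangling phandle, provider without a frequency), which is why the sketch returns an Option rather than a bare number.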

The Unified Setup

The UART setup function represents the final evolution of our discovery logic. It consolidates the patterns we established with the GIC while adding the resolution of these new properties.


pub fn setup(dev: &device::PlatformDevice) {
    let mut addr: u64 = 0;
    let mut freq: u32 = 0;
    let mut interrupt_info: [u32; gicv3::MAX_INTERRUPT_CELLS] = [0; gicv3::MAX_INTERRUPT_CELLS];
    
    // 1. Get #address-cells from parent to parse 'reg' (Same logic as GIC)
    let (addr_cells, _) = dev.get_parent_cells();
    if let Some(reg_prop) = dev.find_property("reg") {
        for i in 0..addr_cells as usize {
            let cell = convert::read_be_u32(reg_prop.value, i * 4);
            addr = (addr << 32) | cell as u64;
        }
    }

    // 2. Parse interrupts property via the interrupt-parent contract
    if let Some(int_prop) = dev.find_property("interrupts") {
        if let Some(intc) = dtb::find_interrupt_parent(dev) {
            let mut interrupt_cells: u32 = 3;
            if let Some(cells_prop) = intc.find_property("#interrupt-cells") {
                interrupt_cells = convert::read_be_u32(cells_prop.value, 0);
            }

            // We read up to MAX_INTERRUPT_CELLS (4) to ensure spec compliance
            for i in 0..interrupt_cells.min(gicv3::MAX_INTERRUPT_CELLS as u32) {
                interrupt_info[i as usize] = convert::read_be_u32(int_prop.value, (i * 4) as usize);
            }

            if interrupt_info[0] == 0 { // SPI
                let spi_id = 32 + interrupt_info[1];
                
                // Parse trigger flags: bits 0-1 (edge), bits 2-3 (level)
                if (interrupt_info[2] & 0x3) != 0 {
                    gicv3::set_spi_trigger_edge(spi_id);
                } else {
                    gicv3::set_spi_trigger_level(spi_id);
                }
                
                gicv3::set_spi_priority(spi_id, 0x00);
                gicv3::set_spi_group(spi_id);
                gicv3::set_spi_routing(spi_id, 0); 
                gicv3::enable_spi(spi_id);
            }
        }
    }

    // 3. Parse clocks property for clock frequency
    if let Some(clocks_prop) = dev.find_property("clocks") {
        let phandle_id = convert::read_be_u32(clocks_prop.value, 0);
        if let Some(clock_node) = dtb::find_device_by_phandle(phandle_id) {
            if let Some(freq_prop) = clock_node.find_property("clock-frequency") {
                freq = convert::read_be_u32(freq_prop.value, 0);
            }
        }
    }

    init_uart(addr as *mut u32, freq);
    configure_uart();
}
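To show why the discovered frequency matters, here is a rough sketch of how a PL011 baud-rate divisor can be derived from it. The PL011 divides the UART clock by 16 times the baud rate and splits the result into a 16-bit integer part and a 6-bit fractional part. The function name is hypothetical, and the author's init_uart may compute these values differently.

```rust
/// Computes the PL011 (integer, fractional) baud-rate divisors.
/// Returns None when the divisor does not fit the 16-bit integer field.
pub fn pl011_divisors(uartclk_hz: u32, baud: u32) -> Option<(u32, u32)> {
    if baud == 0 {
        return None;
    }
    // divisor = clk / (16 * baud). Scaling by 64 gives clk * 4 / baud,
    // and adding baud / 2 rounds to the nearest 1/64th.
    let div64 = (uartclk_hz as u64 * 4 + baud as u64 / 2) / baud as u64;
    let ibrd = div64 >> 6; // integer part
    let fbrd = div64 & 0x3f; // 6-bit fractional part
    if ibrd == 0 || ibrd > 0xffff {
        return None;
    }
    Some((ibrd as u32, fbrd as u32))
}
```

With the 24 MHz clock from the earlier hardcoded constant and a baud rate of 115200, this yields an integer part of 13 and a fractional part of 1.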

Generic Timer

Unlike the UART or the GIC, the ARM Generic Timer isn't accessed through memory-mapped I/O registers. Instead, it is accessed via internal CPU system registers (CNTP_CTL_EL0, etc.). Because of this, we won't find a reg property in its node.

The timer exists in the DTB solely to describe its connection to the GIC. This makes the interrupt-parent handshake we discussed earlier the only task the setup function needs to perform.

PPI: Private Peripheral Interrupts

While the UART used an SPI (ID 32+), the timer uses a PPI. As the name suggests, these interrupts are private to each core. According to the GICv3 contract we cited:

  • 1st Cell = 1: Specifies a PPI.
  • Interrupt ID Offset: For PPIs, the GIC hardware maps the IDs starting at 16.

In the setup code, you'll notice a bit of offset arithmetic (ns_offset). This is because the ARM Arch Timer Bindings specify that the interrupts property must contain a list in a specific order:

  1. Secure Physical Timer.
  2. Non-Secure Physical Timer.
  3. Virtual Timer.
  4. Hypervisor Physical Timer.

Since we are running in a non-secure context, we skip the first interrupt specifier (the Secure timer) to grab the Non-Secure specifier.
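Skipping a specifier is just a matter of multiplying the index by the specifier size. As a minimal sketch, assuming a slice view of the property and a hypothetical helper name `nth_specifier`, the selection could look like this:

```rust
pub const MAX_INTERRUPT_CELLS: usize = 4;

/// Returns the `n`-th interrupt specifier of an `interrupts` property,
/// zero-padded to MAX_INTERRUPT_CELLS, or None if it is out of range.
pub fn nth_specifier(
    prop: &[u8],
    cells: usize,
    n: usize,
) -> Option<[u32; MAX_INTERRUPT_CELLS]> {
    if cells == 0 || cells > MAX_INTERRUPT_CELLS {
        return None;
    }
    // Each specifier occupies `cells` u32 values, i.e. cells * 4 bytes.
    let start = n.checked_mul(cells * 4)?;
    let end = start.checked_add(cells * 4)?;
    let raw = prop.get(start..end)?;

    let mut out = [0u32; MAX_INTERRUPT_CELLS];
    for (slot, chunk) in out.iter_mut().zip(raw.chunks_exact(4)) {
        *slot = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    Some(out)
}
```

Index 1 selects the Non-Secure Physical Timer from the ordered list above. Unlike raw pointer arithmetic, a property that is too short is reported as None rather than read past its end.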

Finalizing the Timer Setup

Because we have already established the interrupt-parent and #interrupt-cells logic, the timer initialization is streamlined. The driver simply verifies the PPI type, applies the +16 offset to the ID, and then configures and enables that interrupt in the GIC.


pub fn setup(dev: &device::PlatformDevice) {
    let mut interrupt_info: [u32; gicv3::MAX_INTERRUPT_CELLS] = [0; gicv3::MAX_INTERRUPT_CELLS];
    // Parse interrupts property
    if let Some(int_prop) = dev.find_property("interrupts") {
        if let Some(intc) = dtb::find_interrupt_parent(dev) {
            // Get #interrupt-cells from interrupt controller
            let mut interrupt_cells: u32 = 3;
            if let Some(cells_prop) = intc.find_property("#interrupt-cells") {
                interrupt_cells = convert::read_be_u32(cells_prop.value, 0);
            }

            // Read interrupt specifier cells, skipping the first (Secure) specifier
            let ns_offset = interrupt_cells as usize * 4;
            for i in 0..interrupt_cells.min(gicv3::MAX_INTERRUPT_CELLS as u32) {
                interrupt_info[i as usize] =
                    convert::read_be_u32(int_prop.value, ns_offset + (i * 4) as usize);
            }

            // interrupt_info[0] = irq_type (0 = SPI, 1 = PPI)
            // interrupt_info[1] = interrupt_number
            // interrupt_info[2] = flags (trigger type)
            if interrupt_info[0] == 1 {
                let ppi_id = 16 + interrupt_info[1];
                // bits 0-1: edge trigger (1=rising, 2=falling)
                // bits 2-3: level trigger (4=high, 8=low)
                if (interrupt_info[2] & 0x3) != 0 {
                    gicv3::set_ppi_trigger_edge(ppi_id);
                } else {
                    gicv3::set_ppi_trigger_level(ppi_id);
                }
                gicv3::set_ppi_priority(ppi_id, 0x00);
                gicv3::set_ppi_group(ppi_id);
                gicv3::enable_ppi(ppi_id);
            }
        }
    }
}

With the timer now dynamic, we have successfully migrated the final piece of our core infrastructure from hardcoded assembly constants to a fully discovered, data-driven system.

From Static Assembly to Dynamic Rust

The transition is best summarized by what we were able to delete. In the early version, we were essentially hardcoding the QEMU virt machine’s manual directly into the source code. By adopting a Devicetree discovery approach, we replaced those hardcoded assumptions with a generic, data-driven loop.

MMIO Discovery

In the previous version, we had to point the kernel to specific memory addresses. If the hardware layout changed, the kernel died. Now, we've replaced those rigid assumptions with a runtime search of the reg property.


--- a/head.S
- .equ GICD_BASE_ADDR, 0x08000000
- .equ UART_BASE_ADDR, 0x09000000
 
--- a/src/drivers/uart/pl011.rs
+ if let Some(reg_prop) = dev.find_property("reg") {
+     for i in 0..addr_cells as usize {
+         let cell = convert::read_be_u32(reg_prop.value, i * 4);
+         addr = (addr << 32) | cell as u64;
+     }
+ }

Interrupt Mapping

We stopped manually loading arbitrary interrupt IDs. By respecting the GICv3 bindings, our drivers now calculate their own connection points.


--- a/head.S
- mov w1, #30  // Hardcoded Timer PPI
- mov w1, #33  // Hardcoded UART SPI
 
--- a/src/drivers/gicv3.rs
+ let ppi_id = 16 + interrupt_info[1]; // Derived from GIC spec
+ let spi_id = 32 + interrupt_info[1]; // Derived from GIC spec

Clock Resolution

The UART is no longer tethered to a 24 MHz assumption. It now follows the Phandle trail to find the actual hardware truth.


--- a/head.S
- .equ UART_CLOCK_FREQ, 0x16e3600
 
--- a/src/drivers/uart/pl011.rs
+ if let Some(clocks_prop) = dev.find_property("clocks") {
+     let phandle_id = convert::read_be_u32(clocks_prop.value, 0);
+     if let Some(clock_node) = dtb::find_device_by_phandle(phandle_id) {
+         if let Some(freq_prop) = clock_node.find_property("clock-frequency") {
+             freq = convert::read_be_u32(freq_prop.value, 0);
+         }
+     }
+ }

The Boot Trampoline

The assembly boot sequence has been stripped of its complexity. Its only remaining job is to set up the environment and hand the DTB pointer to Rust.


--- a/head.S
- ldr x0, =GICD_BASE_ADDR
- bl init_gic_distributor
- ldr x0, =UART_BASE_ADDR
- bl init_uart
+ /* Dynamic discovery loop in Rust */
+ mov x0, x8       // x8 contains the DTB address from bootloader
+ bl parse_dtb

Removing these magic numbers moves the kernel from hardcoded assumptions to runtime discovery. While these snippets highlight the core logic shifts, the complete transformation (including the deleted assembly and full Rust implementation) is available in the following commit.

Next Steps

This marks a major milestone: we have successfully transformed a machine-specific binary into an autonomous, hardware-aware kernel that is no longer tethered to hardcoded constants. By decoupling the hardware description from the driver logic, the kernel remains simple and portable, capable of configuring itself on any ARMv8 platform whose Devicetree describes hardware these drivers support.

With physical discovery now solved, our next task will be to move beyond this "flat" physical world and implement support for Virtual Memory. This will allow us to map our own logical address space.

The full implementation of these drivers and the dynamic boot process is available in the aarch64_kernel repository on my GitHub.

References

  1. In this project the baud rate is fixed to 115200.