Published on

VFIO Deep Dive: Part 1


Virtualisation is the cornerstone of modern compute at scale, with tremendous efforts at both the hardware and software level to minimise the performance implications of running a machine within a machine.

Use Case

When optimised, hypervisors running on modern hardware enable low-latency VMs that perform within roughly 1% of native by taking advantage of previously esoteric features now finding their way into consumer-grade hardware: e.g. Intel APICv, traditionally exclusive to HEDT, is starting to appear in 12th/13th Gen consumer SKUs.

For a developer, this lets us work on, or with, compute-intensive native Windows applications without adopting the reduced security posture of running Windows on bare metal; for an org, this tech lets us combine high-performance (and high-cost) systems with the flexibility of virtual machine orchestration.

Nowadays, the benefits of virtualisation reach far beyond the datacentre - a user on Windows 11 is likely already running inside a virtual machine, perhaps without even knowing it. Microsoft's 'virtualisation-based security' effectively moves the OS's root of trust down to ring -1 and lets hardware (MMUs) handle the heavy lifting of kernel memory protection.

Hold on, MMUs? Ring -1?

Virtual Machine Extensions (VMX)

Before the advent of hardware-assisted virtualisation, a VMM (virtual machine monitor, aka hypervisor) was responsible for managing guest VMs purely through software. This was slow and complex, and was superseded on x86 in the mid 00s by hardware virtualisation extensions: Intel's Virtual Machine Extensions (VMX), marketed as VT-x, and AMD's equivalent Secure Virtual Machine (SVM), marketed as AMD-V.
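As a quick sanity check, here is a minimal sketch of how one might see whether a Linux host advertises these extensions. It assumes an x86 machine where /proc/cpuinfo exposes a "flags" line containing vmx (Intel) or svm (AMD); the function name virt_extensions is my own, and a missing flag can also just mean the feature is disabled in firmware or hidden from a guest.

```python
from pathlib import Path


def virt_extensions(cpuinfo_text):
    """Return which of 'vmx' / 'svm' appear in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only the plain "flags" line lists CPU feature flags; newer kernels
        # also print a separate "vmx flags" line, which is skipped here.
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(sorted(virt_extensions(text)) or "no vmx/svm flag found")
```

The parsing is kept separate from the file read so the same function can be pointed at cpuinfo text captured from another machine.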

VMX effectively introduced another privilege level, informally called ring -1 (hypervisor), below ring 0 (kernel) but still above the ominous lower levels of system management mode... but how secure can a system really be if it only takes one naughty Thunderbolt or PCIe device to DMA main memory anyway? Enter: IOMMUs (aka VT-d & AMD-Vi).

I/O Memory Management Unit (IOMMU)

MMUs translate virtual addresses to physical addresses, which prevents a process from accessing memory it shouldn't (cue SIGSEGV). Sometimes we need the same kind of control over a device, such as a GPU: input-output MMUs (IOMMUs) provide this by translating and restricting the addresses a device can reach via DMA. This lets a guest VM enjoy direct access to a device attached to the host, while the device stays confined to the memory mapped for that guest.
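On Linux, the kernel exposes how devices are grouped behind the IOMMU through sysfs, under /sys/kernel/iommu_groups/<group>/devices/. The sketch below walks that tree and returns a mapping of group number to PCI addresses. The function name iommu_groups and its root parameter are my own choices for illustration; an empty result usually means the IOMMU is disabled or not enabled on the kernel command line.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to a sorted list of its device addresses."""
    base = Path(root)
    groups = {}
    if not base.is_dir():
        # No sysfs tree: IOMMU support is absent or switched off.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                dev.name for dev in devices_dir.iterdir()
            )
    # Sort numerically so group 10 comes after group 2.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(f"group {group}: {', '.join(devices)}")
```

Groups matter because the group is the smallest unit the IOMMU can isolate: devices that share a group generally have to be handed to a guest together.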

Virtual Function I/O

Putting it all together brings us to the Linux kernel subsystem VFIO: hardware IOMMU-based DMA mapping and isolation. For example, VFIO lets us pass a PCI GPU through from a Linux host to a Windows guest at near-native speed.
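For passthrough, the device is typically bound to the vfio-pci driver on the host rather than its usual driver. A minimal sketch of checking that, assuming the standard sysfs layout where /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver's directory; the function name bound_driver and the address 0000:01:00.0 are hypothetical placeholders.

```python
import os
from pathlib import Path


def bound_driver(pci_address, root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = Path(root) / pci_address / "driver"
    if not link.exists():
        # No driver symlink: the device is unbound or does not exist.
        return None
    # The symlink target's final path component is the driver name.
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    address = "0000:01:00.0"  # hypothetical GPU address; substitute your own
    print(address, "->", bound_driver(address))
```

Seeing vfio-pci here would suggest the host has released the device for a guest; any other name means a host driver still owns it.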

As I mostly work with Unreal, setting up a flexible high-performance workstation that can run both Windows and Linux without resorting to the headache of dual-booting was essential - in part 2 we will look at putting this theory into action.