VFIO Deep Dive: Part 1

Virtualisation

Virtualisation is the cornerstone of modern compute at scale, with tremendous effort at both the hardware and software levels to minimise the performance cost of running a machine within a machine.

Use Case

When optimised, hypervisors running on modern hardware enable low-latency VMs within 1% of native performance by taking advantage of previously esoteric features now finding their way into consumer-grade hardware: Intel APICv, for example, traditionally exclusive to HEDT parts, is starting to appear in 12th/13th Gen consumer SKUs.

For a developer, this means working on (or with) compute-intensive native Windows applications without adopting the reduced security posture of running Windows on bare metal; for an organisation, it means combining high-performance (and high-cost) systems with the flexibility of virtual machine orchestration.

Nowadays, the benefits of virtualisation reach far beyond the datacentre - a user on Windows 11 is likely already running inside a virtual machine, perhaps without even knowing it. Microsoft's 'virtualisation-based security' effectively moves the OS's root of trust down to ring -1 and lets hardware (MMUs) handle the heavy lifting of kernel memory protection.
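
You can check this yourself: a hypervisor announces its presence through CPUID. Below is a minimal sketch of mine (not from any particular library) that reads the architectural 'hypervisor present' bit and, if set, the vendor signature leaf; it assumes an x86 machine and GCC or Clang's <cpuid.h>.

```c
/*
 * hv_detect.c - a minimal sketch (x86, GCC/Clang): ask the CPU whether
 * we are already running under a hypervisor.
 */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1: ECX bit 31 is the "hypervisor present" bit. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 31))) {
        puts("no hypervisor detected (bare metal, or it is hiding)");
        return 0;
    }

    /* Leaf 0x40000000: hypervisor vendor signature in EBX:ECX:EDX. */
    __cpuid(0x40000000, eax, ebx, ecx, edx);

    char sig[13] = { 0 };
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    printf("hypervisor signature: %s\n", sig);
    return 0;
}
```

Under Windows 11 with VBS enabled this typically reports 'Microsoft Hv'; inside a KVM guest it reports 'KVMKVMKVM'.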

Hold on - MMUs? Ring -1?

Virtual Machine Extensions (VMX)

Before the advent of hardware-assisted virtualisation, a VMM (virtual machine monitor, or hypervisor) was responsible for managing guest VMs purely in software. This was suboptimal, and was superseded on x86 by Virtual Machine Extensions (VMX) in the mid-2000s, commercialised by Intel and AMD as VT-x and AMD-V respectively.
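
As an illustrative aside (same assumptions as before: x86 with GCC/Clang's <cpuid.h>), CPUID also advertises whether these extensions exist at all: leaf 1, ECX bit 5 flags Intel VMX, while extended leaf 0x80000001, ECX bit 2 flags AMD's SVM. Firmware can still disable a supported feature, so treat this as a capability check, not an 'enabled' check.

```c
/*
 * vmx_check.c - a minimal sketch (x86, GCC/Clang): does this CPU
 * advertise hardware virtualisation extensions?
 */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Intel: CPUID leaf 1, ECX bit 5 = VMX (marketed as VT-x). */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 5)))
        puts("Intel VT-x (VMX) supported");

    /* AMD: extended leaf 0x80000001, ECX bit 2 = SVM (marketed as AMD-V). */
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 2)))
        puts("AMD-V (SVM) supported");

    return 0;
}
```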

VMX introduced another ring: -1 (hypervisor), below ring 0 (kernel) but still above the ominous lower levels of System Management Mode... but how secure can a system really be if it only takes one naughty USB stick to DMA main memory anyway? Enter: IOMMUs (aka VT-d and AMD-Vi).

I/O Memory Management Unit (IOMMU)

MMUs translate virtual memory to physical memory, which prevents a process from accessing things it probably shouldn't (cue SIGSEGV). Sometimes, though, we need similar controls over a device, such as a GPU: input-output MMUs provide this, letting a guest VM enjoy direct (DMA) access to a device attached to the host, with the IOMMU translating and policing every device-issued address.
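
On Linux, the kernel surfaces the IOMMU's isolation boundaries as 'IOMMU groups' in sysfs - the smallest sets of devices that can safely be passed through together. Here is a small sketch (assuming a Linux host booted with the IOMMU enabled, e.g. intel_iommu=on) that walks /sys/kernel/iommu_groups:

```c
/*
 * iommu_groups.c - a minimal sketch: list each IOMMU group and the
 * PCI devices it contains, as the kernel exposes them under sysfs.
 */
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    const char *root = "/sys/kernel/iommu_groups";
    DIR *groups = opendir(root);
    if (!groups) {
        /* Missing directory usually means the IOMMU is off or unsupported. */
        perror(root);
        return 1;
    }

    struct dirent *g;
    while ((g = readdir(groups)) != NULL) {
        if (g->d_name[0] == '.')
            continue;

        char path[512];
        snprintf(path, sizeof(path), "%s/%s/devices", root, g->d_name);

        DIR *devs = opendir(path);
        if (!devs)
            continue;

        printf("group %s:\n", g->d_name);
        struct dirent *d;
        while ((d = readdir(devs)) != NULL)
            if (d->d_name[0] != '.')
                printf("  %s\n", d->d_name); /* e.g. 0000:01:00.0 */
        closedir(devs);
    }
    closedir(groups);
    return 0;
}
```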

Virtual Function I/O

Putting it all together brings us to the Linux kernel subsystem VFIO, or: hardware IOMMU-based DMA mapping and isolation, exposed to userspace. For example, VFIO lets us pass a GPU through (PCI passthrough) from a Linux host to a Windows guest at near-native speed.
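
To make that concrete, here is a condensed sketch of the classic container/group flow, adapted from the kernel's own VFIO documentation (Documentation/driver-api/vfio.rst). The group number 26 and PCI address 0000:06:0d.0 are placeholders for whatever device you have bound to vfio-pci, and error handling is abbreviated.

```c
/*
 * vfio_map.c - a condensed sketch of the VFIO userspace flow, adapted
 * from Documentation/driver-api/vfio.rst. Group "26" and device
 * "0000:06:0d.0" are placeholders.
 */
#include <fcntl.h>
#include <linux/vfio.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };

    /* A container holds one IOMMU context; groups get attached to it. */
    int container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0 ||
        ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
        !ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
        fprintf(stderr, "no usable VFIO container\n");
        return 1;
    }

    /* Open the IOMMU group the device lives in (placeholder: group 26). */
    int group = open("/dev/vfio/26", O_RDWR);
    ioctl(group, VFIO_GROUP_GET_STATUS, &status);
    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
        fprintf(stderr, "group not viable: bind ALL its devices to vfio-pci\n");
        return 1;
    }

    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    /* Map 1 MiB of our memory at IOVA 0: the device may now DMA into
     * it, and the IOMMU confines it to exactly this range. */
    void *buf = mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    map.vaddr = (uint64_t)(uintptr_t)buf;
    map.size  = 1024 * 1024;
    map.iova  = 0;
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

    /* Finally, a device fd for config space, BARs, interrupts... */
    int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
    printf("device fd: %d\n", device);
    return 0;
}
```

In practice a hypervisor like QEMU performs this dance on our behalf when handed a vfio-pci device.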

As I mostly work with Unreal, setting up a flexible, high-performance workstation that can run both Windows and Linux without resorting to the headache of dual-booting was essential - in Part 2 we will put this theory into action.