The x86 KVM MMU has significant scaling issues with many VCPUs and lots of RAM. Over the last year, we have made substantial improvements to the x86 KVM MMU in the direct-mapped TDP case, to reduce lock contention and memory overheads, with the goal of migrating VMs with 416 VCPUs and 12TB of memory. With these changes, the x86 KVM MMU can handle EPT/NPT violations from all VCPUs in parallel, requires ~99% less MMU memory overhead in steady state with 2M pages, simplifies the implementation of MMU operations, and more. This talk will cover new synchronization models, abstractions, and data structures, and details of the performance we have gained from them.