Clever scheduling of virtual CPUs on Symmetric MultiThreading (SMT) systems, for example to make highly impractical side-channel attacks even less practical, is not a new idea. Unfortunately, by exploiting the L1TF and MDS vulnerabilities in Intel CPUs, the impractical is becoming practical!
But instead of disabling SMT, we can prevent different VMs from sharing cores. This is called core scheduling, and implementing it requires substantial scheduler changes. Nevertheless, work toward it is being done for both KVM and Xen (and other hypervisors have it already).
After an overview of L1TF and MDS, we will see how core scheduling can help and why it is so tricky to implement (in different ways) for both KVM and Xen.
We will show numbers from performance evaluations of the currently available implementations. After all, all of this only matters if performance is better than with SMT turned off.
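To illustrate the invariant that core scheduling enforces, here is a deliberately simplified sketch (not the actual KVM or Xen implementation, which works preemptively at runtime): vCPUs are packed onto SMT sibling pairs so that no physical core is ever shared between two different VMs.

```python
from collections import defaultdict

def core_schedule(vcpus, cores):
    """Toy static placement of vCPUs onto SMT cores (2 threads per
    core) such that each core only ever runs vCPUs of a single VM.

    vcpus: list of (vm_id, vcpu_id) tuples
    cores: number of physical cores available
    Returns: dict mapping core index -> list of assigned vCPUs
    """
    by_vm = defaultdict(list)
    for vm, vcpu in vcpus:
        by_vm[vm].append((vm, vcpu))

    assignment = {}
    core = 0
    for vm, vm_vcpus in by_vm.items():
        # Pack sibling threads with vCPUs of the same VM; a core with
        # only one vCPU leaves its sibling thread idle rather than
        # sharing the core with another VM.
        for i in range(0, len(vm_vcpus), 2):
            if core >= cores:
                raise RuntimeError("not enough cores")
            assignment[core] = vm_vcpus[i:i + 2]
            core += 1
    return assignment
```

The cost of the idle sibling threads in this model is exactly why the performance comparison against simply turning SMT off matters.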
Dario is a Virtualization Software Engineer at SUSE. He's been active in the Open Source virtualization space for a few years. Within the Xen-Project, he is still the maintainer of the Xen hypervisor scheduler. He also works on Linux kernel, KVM, Libvirt, and QEMU. Back during his... Read More →
RISC-V (pronounced "risk-five") is an Instruction Set Architecture (ISA) that's available under open, free and non-restrictive licences. It is a clean and modular ISA where new features are added as optional extensions. The RISC-V hypervisor extension provides virtualisation capabilities to a RISC-V CPU and it is designed considering both Type-1 and Type-2 hypervisors. In this talk Alistair and Anup will explain the RISC-V Hypervisor extension, discuss how it was implemented in QEMU and talk about the RISC-V implementation of KVM.
Alistair will talk about the current state of the RISC-V Hypervisor Extensions in QEMU. This will include details about the implementation and design choices that were made. He will discuss what we currently have upstream and how this compares to the latest and proposed future specification versions. This will include all known limitations and proposed future work in the QEMU implementation. He will also talk about current out of tree work that is not yet ready to be submitted upstream and discuss how this can be upstreamed.
Anup will then explain KVM RISC-V internals and the road ahead for KVM RISC-V. Anup will also show a demo of KVM RISC-V using KVMTOOL.
Alistair Francis currently works at Western Digital as part of the RISC-V software research team. He is the QEMU RISC-V maintainer; developing, reviewing and merging QEMU patches. He also has a focus on security, specifically secure operating systems related to Root of Trust (RoT... Read More →
Anup Patel is an open-source enthusiast with primary interest in hypervisors, firmware, and Linux kernel. He has 15+ years of experience developing system level software across architectures. He is part of the Western Digital system software research group which does lot of open-source... Read More →
VirtIO was designed to standardize hypervisor interfaces for virtual machines - but we are beginning to see the emergence of Virtio hardware. This talk will answer the questions: why does this make sense, what works and what are the issues hardware implementations of virtio have to overcome? Topics to be covered:
- What is the difference between hardware virtio devices and virtio data path accelerators?
- What are the minimal requirements of virtio in hardware?
- How can we handle compatibility, including hardware bugs and limitations?
- How to make live migration work? What about overcommit?
- Which changes included in the recent virtio specification help design hardware virtio devices?
- Which known issues remain and how does the Virtio committee plan to address them?
- Why design Virtio in hardware? Are there alternatives?
- Why get involved with the Virtio specification process?
Michael has been with Red Hat for more than 10 years. In his role as a Distinguished Engineer he acts as a chair of the Virtio Technical Committee, overseeing the development of the virtio specification for virtual devices. He also maintains several subsystems in QEMU and Linux and... Read More →
With the advent of fast storage technologies like NVMe and 3DXP, hypervisors are facing unprecedented challenges. The added software overhead involved in access validations, general data movement and notification between domains is more noticeable than ever. It affects all sorts of performance dimensions including bandwidth, IOPS and latency, most of which have been vastly hidden by the slow nature of devices until recently.
This talk is divided into two parts. First, we will focus on storage performance evaluation and benchmarks, showing how these translate to virtualisation. Second, we will dive into hypervisors based on KVM and Xen to compare how they work and discuss how they can deliver the best end-user experience in terms of performance and efficiency.
Felipe is a Senior Staff Software Engineer working for Nutanix since 2015, more specifically leading the engineering efforts of the Acropolis Hypervisor (AHV). He brings nearly 20 years of expertise in storage performance and virtualisation. This includes four years at Citrix working... Read More →
When it comes to performance monitoring, KVM provides sophisticated tools for deep dives into specific aspects. For example, kvm_stat or perf allow analysing the exits from guest mode. On the other hand, a system-level view or permanent monitoring and analytics is only available at the process level, e.g. with tools like sysstat. Other hypervisors offer a much better out-of-the-box experience. This talk is about extending the tooling for KVM stats to better integrate into the bigger picture.
A technical (and end-user oriented) Q&A panel discussion on a variety of topics related to KVM, QEMU and more. The discussion will last about an hour. Topics will be chosen on the spot from a prepared list, and from a live Etherpad, where the audience (in the room or remote) can add questions before or during the discussion.
Kashyap Chamarthy works as part of Red Hat's cloud engineering group. He focuses his efforts on integrating low-level virtualization components (KVM, QEMU, libvirt and related infrastructure) with high-level management software (e.g. OpenStack and others). Over the past 10 years... Read More →
Principal Engineer, Kernel & Operating System Team, Amazon
David is a Principal Engineer in Amazon’s Kernel and Operating System team, working on Linux and Xen to support Amazon EC2. David started hacking on Linux in 1995 when he was an undergraduate at the University of Cambridge. He has since worked at Red Hat, and in Intel’s Open... Read More →
Andrea Arcangeli joined Red Hat in 2008 because of his interest in working on the KVM Virtualization Hypervisor, with a special interest in virtual machine memory management. He worked on many parts of the Linux Kernel, especially on the Virtual Memory subsystem. Andrea started working... Read More →
Karen Noel is Director of Platform Virtualization and Network Engineering at Red Hat. She has been working on Operating System kernels her entire career and on Virtualization technologies since 2005. She was formerly with Digital Equipment Corporation and HP and has been with Red... Read More →
Konrad Wilk is a Software Director at Oracle. His group's mission is to make Linux and Xen Project virtualization better and faster. As part of this work, Konrad has been the maintainer of the Xen Project subsystem in Linux kernel, Xen Project maintainer and had been the Release Manager... Read More →
Paolo is a Distinguished Engineer at Red Hat and the upstream maintainer for both KVM and various subsystems in QEMU. As a contributor to QEMU, through the years, he has worked on various parts of the project architecture, including the threading architecture, the test frameworks... Read More →
A period of increased disruption has begun in the virtualization space, with new applications such as Kubernetes, KubeVirt and Kata Containers challenging traditional virtual machine usage paradigms. The libvirt developers have responded with self-examination, reconsidering historic decisions and identifying what is required to stay relevant to modern developer and application needs.
The talk will outline the many significant changes and plans to come out of this exercise: dramatic changes to the build system, with autotools replaced by a cutting-edge, easy-to-use alternative; the benefits of adopting the glib2 library to replace current APIs and GNULIB; the potential for using the modern Rust and Golang languages; modularization of the libvirt daemon and enabling daemon-less embedded use of the KVM driver; and a switch from email-based development to well-known web-based tooling.
Daniel is a long-term contributor in the open source virtualization space, working at Red Hat. A lead architect of the libvirt project since its inception, he is a frequent contributor and subsystem maintainer to QEMU and has been involved in many other projects including OpenStack, GTK-VNC, libosinfo... Read More →
Thursday October 31, 2019 09:30 - 10:00 CET
Forum 3
Firecracker is an open source VMM written in Rust, leveraging KVM to provide isolation for multi-tenant, serverless workloads like containers and functions. It is currently used in production by AWS Lambda and AWS Fargate.
Each Firecracker process has a low memory overhead; it boots virtual machines in as little as 125 milliseconds and oversubscribes host resources in order to pack thousands of microVMs on a single host. But in a multi-tenant environment, the most important requirement is properly enforcing the security isolation of workloads.
In this talk we will go over the design decisions we took when building Firecracker, showcasing the advantages as well as the limitations of this VMM. What does it take to run Firecracker at scale? Are Rust’s builtin protection mechanisms enough to ensure smooth sailing in production? Come and find out!
I am a software engineer with the Amazon Web Services Firecracker team. I am passionate about open source and, beyond Firecracker, I am also contributing to rust-vmm, a community effort to create a shared set of Rust-based Virtual Machine Monitor components. So far I’ve been talking... Read More →
Alexandra is a software development engineer at AWS and one of the maintainers of the Firecracker project. Her work is centered on the Firecracker virtual machine monitor.
Virtualization technologies form the infrastructure of cloud computing. However, with more and more VMs and workloads running in the cloud, traditional virtualization technologies have exposed weaknesses in cloud environments, i.e., virtualization overhead, performance fluctuation, higher cost overhead, etc. ZERO is Huawei's next-generation virtualization platform, targeting four '0's: '0' CPU reservation, '0' memory reservation, '0' virtualization overhead, and '0' performance fluctuation. By designing the ZERO virtualization chip, the ZERO system offloads overhead to the ZERO chip and card, including all network I/O, all storage I/O, and the entire cloud control plane. By designing a split hypervisor, ZERO leaves a very small and silent hypervisor on the x86/ARM server, thereby improving overall resource utilization and performance. Currently ZERO 1.0 has been launched on Huawei Cloud, supporting both VM and bare metal instances on both x86 and ARM servers.
Nested virtualization is one of the key functionalities of modern hypervisors. Yet, one central question is how to write functional tests that check and verify the entire KVM/QEMU/libvirt stack at each level of (nested) guest. How can each guest level be supervised, managed and tested without introducing high complexity and without writing duplicated code for each guest level?
In this presentation, Marc Hartmayer will discuss existing test approaches and present an alternative approach by using "self-replicating programs" in combination with the technique of remote proxy objects. Moreover, he will show a demo for a test case in which the pass-through functionality of a device will be tested up to the Nth level. Lastly, he'll give an outlook on how this approach could be integrated into existing frameworks like Avocado and what else could be done.
During the Spring Festival Gala, instantaneous traffic is several hundred times the normal level, and the burst traffic during the event greatly exceeds the current capacity of the IDCs. At the same time, to ensure the QoS of the co-deployed online services, the isolation level of various resources has to be very high in every aspect, not limited to page cache, CPU scheduling capability, memory bandwidth, etc. In this session, Ye Lu and Zhenwei will introduce how the decision for a KVM-based hybrid deployment solution was made, the performance optimizations made during rollout, and the system monitoring after virtualization, such as more accurate network analysis tools to distinguish app backend errors, physical network outages and virtualized network failures. The solution helps services get through the traffic peaks and improves the overall resource utilization of the IDCs.
Zhenwei Pi is working as a cloud computing engineer at ByteDance. He is responsible for the IaaS architecture of ByteDance's production environment, including private cloud and edge computing cloud.
Yelu is working as a cloud computing engineer at ByteDance, which has more than 600 million active users and hundreds of thousands of servers all over the world. She is responsible for the IaaS architecture of ByteDance's production environment, including private cloud and edge... Read More →
Nested virtualization on x86 is finally becoming a thing: lots of work has been done recently to eliminate bugs and make it faster. Testing, however, remains a challenge and regressions even for KVM-on-KVM are, unfortunately, not uncommon. Adding third party hypervisors (Hyper-V, VMware,...) and different types of L2 guests to the picture also doesn't make it any simpler.
The talk will try to cover the existing KVM testing frameworks: kvm-unit-tests and selftests, what these frameworks test and what they don't, the gaps we have between VMX and SVM. Possible improvements and additional testing approaches will be suggested. Overall, this is going to be an open discussion on how we can test nested virtualization better.
Frequent updates (software and firmware) have become a major pain point for Cloud Service Providers. There have been some approaches to address this, for example hot patching, live migration, etc., but each of them still has limitations. VMM fast restart proposes an alternative solution, which leverages kexec-based fast rebooting of the host machine while keeping VM states in memory across the reboot, to achieve short service downtime, a high success rate and low management overhead.
This talk will introduce the technical approaches, current status of development, and future plans of VMM fast restart. Related challenges will also be described in this talk.
Jason Zeng is a software engineer from the Intel virtualization team, focusing on various KVM/virtualization features and projects. Currently he is working on the VMM Fast Restart project, which aims to provide a solution for fast upgrading and rebooting of the VMM/host kernel while imposing less impact... Read More →
Tests from the KVM unit tests framework have traditionally been run on only one hypervisor: KVM. But having a clean and tiny test framework has been so invaluable that we started porting it to all the s390 hypervisors out there.
This allowed new users, such as hardware and firmware teams, to use it, and with the advent of Protected Virtualization it became an important part of software and hardware verification.
This talk concentrates on how we used KVM unit tests in the past, how we're using them right now and what lies in the future for s390 (and maybe other platforms as well).
Cross and stacked hypervisor testing to the rescue!
Janosch is a software engineer at IBM Germany and an s390 co-maintainer for KVM. He works on guest memory management, Protected Virtualization and KVM testing.
Ever since KVM was created, the tenant split has always been very clear: KVM inside the Linux kernel provides an abstraction layer for CPU and close-to-CPU hardware, guests run as if they were on real hardware and user space (QEMU usually) emulates real world hardware.
It's about time we start to reconsider that split, though. With Spectre mitigations in place, exiting guest context has suddenly become much more expensive than before. From a general security point of view, we ideally want to run as little code as possible in host context. Also, with device assignment becoming a commodity, maybe we can build faster virtual devices if we think outside the box.
In this presentation I will introduce a prototype I've been working on that implements legacy device emulation inside guest firmware, and explain all the security and tenant-split benefits that this brings.
Principal Software Engineer, Amazon Development Center Germany GmbH
Alexander currently works at AWS and is responsible for the Nitro Hypervisor. Previously, he worked on QEMU, KVM, openSUSE / SLES on ARM and U-Boot. Whenever something really useful comes to his mind, he implements it. Among others he did Mac OS X virtualization using KVM, nested... Read More →
virtio-fs is a new shared file system for virtual machines. Unlike previous approaches, it is designed to take advantage of the co-location of virtual machines and the hypervisor to achieve local file system semantics and performance. This talk covers the status of virtio-fs, its key features, and use cases.
Amongst its features, the ability to share the host page cache with the guest is unique and not available in other shared file systems. This leads to interesting applications, including local file system mmap MAP_SHARED semantics, memory footprint reduction, and efficient page cache sharing between guests.
This talk also covers metadata coherence and the shared memory version table that is being developed to achieve this. The table allows guests accessing the same files and directories to have a consistent view even when other guests make changes to the file system.
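The coherence idea behind such a version table can be sketched roughly as follows. This is a hypothetical simplification, not the actual virtio-fs data structures: each file maps to a slot in a table shared between host and guests; a guest caches metadata together with the version it saw, and revalidates by comparing versions on lookup.

```python
class VersionTable:
    """Toy model of a shared version table: one counter per slot."""
    def __init__(self, nslots=64):
        self.slots = [0] * nslots

    def bump(self, slot):
        # The host (or another guest) increments the version whenever
        # the corresponding file or directory changes.
        self.slots[slot] += 1

class GuestCache:
    """Toy guest-side metadata cache validated against the table."""
    def __init__(self, table):
        self.table = table
        self.cache = {}  # slot -> (version_seen, metadata)

    def store(self, slot, metadata):
        self.cache[slot] = (self.table.slots[slot], metadata)

    def lookup(self, slot):
        # A cached entry is valid only if the shared version still
        # matches the version recorded when the entry was cached.
        entry = self.cache.get(slot)
        if entry and entry[0] == self.table.slots[slot]:
            return entry[1]
        return None  # stale or missing: must re-fetch from the host
```

The attraction of this scheme is that validation is a single shared-memory read, with no round trip to the host on the hot path.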
Stefan works on QEMU and Linux in Red Hat's Virtualization team with a focus on storage, VIRTIO, and tracing. Recent projects include libblkio, virtiofs, storage performance optimization for NVMe drives, and out-of-process device emulation. Stefan has been active in the QEMU community... Read More →
Recent vulnerabilities like L1 Terminal Fault (L1TF) and Microarchitectural Data Sampling (MDS) have shown that the cpu hyper-threading architecture is prone to leaking data with speculative execution attacks.
With KVM, a guest VM can use speculative execution attacks to leak data from the sibling hyper-thread, thus potentially accessing data from the host kernel, from the hypervisor or from another VM.
Kernel Address Space Isolation is a project that aims to use address spaces to isolate parts of the kernel and prevent leaking sensitive data. If KVM can run in an address space that contains no sensitive data and is separated from the full kernel address space, then KVM would be immune to leaking secrets.
A first proposal to implement KVM Address Space Isolation and early discussions are available here: https://lkml.org/lkml/2019/5/13/515
Liran Alon is the Virtualization Architect of OCI Israel (Oracle Cloud Infrastructure). He is involved and lead projects in multiple areas of the company's public cloud offering such as Compute, Networking and Virtualization. In addition, Liran is a very active KVM contributor (mostly... Read More →
Alexandre Chartre is a Consulting Developer in the Linux and Virtualization engineering team at Oracle. Lately, he has been focusing on security issues on Linux, in particular on Spectre and Meltdown issues (and all variants and derivatives) and their impact on virtualization and... Read More →
Virtio-fs (https://virtio-fs.gitlab.io/) was proposed recently to provide file system sharing for lightweight VM and container workloads, where shared volumes are a requirement.
In this presentation, we propose an SPDK (Storage Performance Development Kit, https://spdk.io) userspace vhost-user-fs solution, which can be used together with QEMU/Kata Containers to accelerate virtio-fs. Virtio-fs uses FUSE instead of 9P for communication. We will present this solution in detail, including the techniques used, such as virtio-fs and blobfs (the SPDK file system), and the significant performance gains achieved. Blobfs can be built on the abstract block device layer in SPDK, which can access local or remote storage services via iSCSI/NVMe/NVMe-oF protocols in userspace. Relying on this solution, we aim to build a fast, consistent and secure way to share a directory tree on the host with guests.
Xiaodong Liu is a senior cloud engineer at Intel, working on storage related areas like Storage Performance Development Kit (SPDK) and Intel Intelligent acceleration Library (ISA-L). He focuses on acceleration, protocols and innovations among virtualization, cloud native storage and... Read More →
One concern with container workloads has always been the limited process isolation provided by the hosting OS. With Virtualization Based Hardening (VBH), a new set of security policies can be enforced by an open source thin-layer hypervisor, which can prevent compromised containers from tampering with the OS kernel or other containers via memory exploits and attack techniques. Intel, together with Bitdefender, worked on several memory introspection use cases designed to defend container workloads against zero-day binary exploits. We will review a few CVEs as examples.
In addition, the set of APIs exposed by the HV is intended to assist anyone in implementing hardening modules for containers. The solution can be used for other scenarios, such as debugging. We also present a tool for kernel developers which can help in some uncommon tasks such as finding self-modified kernel code.
Jun Nakajima is a Senior Principal Engineer at the Intel Open Source Technology Center, leading virtualization and security for open source projects. Jun presented a number of times at technical conferences, including LSS, KVM Forum, Xen Summit, LinuxCon, OpenStack Summit, and USENIX... Read More →
Andrei joined Bitdefender in October 2008, as a junior virus researcher. Initial responsibilities included reverse engineering of malicious samples, adding signatures for malicious files, developing disinfection routines and developing code-similarity methods and systems. He joined... Read More →
The virtio-vsock device provides a zero-configuration communication channel between guest agents and hypervisor services independent of the guest network configuration. QEMU and the Linux kernel have virtio-vsock vhost support. Firecracker is a new open source Virtual Machine Monitor (VMM) that makes use of KVM and includes support for virtio-vsock.
Andra will give an intro on the state of the art of virtio-vsock and its use cases. She will then present multiple proposed options for communication channels between a virtual machine and the host or between virtual machines using Firecracker. These options include the vhost backend as well as UNIX domain sockets. She will share performance metrics with regard to the discussed alternatives.
Stefano will describe the latest performance improvements within the Linux kernel and QEMU. He will also give an overview of tools that recently added vsock support (e.g. wireshark, tcpdump, iproute2-ss, ncat). Finally, he will present the next challenges that will be faced to improve virtio-vsock, such as support for nested VMs and network namespaces.
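To give a flavour of the zero-configuration aspect: an AF_VSOCK connection is addressed only by a context ID (CID) and a port number, with no IP setup involved. A minimal guest-side sketch (the port number here is an arbitrary example, and a kernel with vsock support is assumed):

```python
import socket

# Well-known vsock context ID of the host (see linux/vm_sockets.h)
VMADDR_CID_HOST = 2
EXAMPLE_PORT = 9999  # arbitrary port chosen for this sketch

def connect_to_host(cid=VMADDR_CID_HOST, port=EXAMPLE_PORT):
    """Open a guest-to-host stream connection. The address is just
    (CID, port): no interface, IP or routing configuration needed.
    Requires a kernel with vsock support (e.g. virtio-vsock)."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((cid, port))
    return s
```

A hypervisor-side service simply listens on the same port, bound to its own CID, which is what makes vsock attractive for guest agents that must work regardless of how the guest network is configured.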
Stefano is a Principal Software Engineer at Red Hat. He is working on virtualization and networking topics in QEMU and the Linux kernel. He is the maintainer of Linux's vsock subsystem (AF_VSOCK). Current projects cover vDPA for virtio-blk devices, virtio-vsock, QEMU network and storage... Read More →
Software Development Engineer, Amazon Web Services
Andra is a Software Development Engineer at Amazon Development Center, Romania, Bucharest, part of Amazon Web Services (AWS). She has been working on the virtualization stack of EC2, both on Xen and Nitro hypervisors. Before AWS, she was a Software Engineer at Intel, focusing on research... Read More →
For cloud providers it is important to keep private user data secure. One way to achieve this is to fuzz the interfaces available to the guest, to find new vulnerabilities and ways of exploitation. One such surface is the emulated devices used by the guest machines.
We present an approach to fuzzing virtio devices based on AFL to find bugs. We evaluate this approach by fuzzing the virtio devices in SPDK and QEMU. We found several crashes and hangs, and filed a new CVE (CVE-2019-9547). Also, to make the approach useful for our cloud production case, we integrated it with the CI for each release.
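To give an idea of the shape of such a harness (a deliberately simplified, hypothetical target, not the actual SPDK or QEMU code): AFL feeds mutated byte strings into a parsing entry point and watches for crashes. A virtio ring descriptor is a 16-byte little-endian structure (u64 addr, u32 len, u16 flags, u16 next), which a toy harness might decode like this:

```python
import struct

# Layout of struct vring_desc from the virtio specification:
# u64 addr, u32 len, u16 flags, u16 next
DESC_FMT = "<QIHH"
DESC_SIZE = struct.calcsize(DESC_FMT)  # 16 bytes

def parse_descriptors(data, ring_size=256):
    """Decode virtio descriptors from untrusted bytes, validating
    the fields a device implementation must not blindly trust."""
    descs = []
    for off in range(0, len(data) - DESC_SIZE + 1, DESC_SIZE):
        addr, length, flags, nxt = struct.unpack_from(DESC_FMT, data, off)
        if nxt >= ring_size:
            raise ValueError("descriptor 'next' index out of range")
        descs.append({"addr": addr, "len": length,
                      "flags": flags, "next": nxt})
    return descs

def harness(data):
    """AFL-style entry point: a clean rejection (ValueError) is
    expected behaviour; any other uncaught exception or crash in
    the real target would count as a finding."""
    try:
        parse_descriptors(data)
    except ValueError:
        pass
```

In a real campaign, AFL would repeatedly invoke the target with `data` read from a mutated input file and use coverage feedback to steer the mutations.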
10+ years of system-level development: gdb, gcc, Linux, RTOS. Right now I'm working on the Yandex Cloud project (https://cloud.yandex.com/), as part of the Kernel-Hypervisor team. My ongoing projects are: virtio-blk device optimization, stability and security; host security (from... Read More →
At the KVM Forum in Prague, Paolo and I presented the challenges in implementing virtualized Fibre Channel for QEMU. However, after some initial submissions the topic didn't receive much traction. So here I will present a way to efficiently map Fibre Channel devices onto virtio-scsi by just updating the mapping information without modifying the actual data layout. This preserves backwards compatibility with existing implementations while allowing new installations to take advantage of the new implementation.
Studied Physics with a main focus on image processing in Heidelberg from 1990 until 1997, followed by a PhD at Edinburgh's Heriot-Watt University in 2000. Worked as a sysadmin during his studies, mainly at the Mathematical Institute in Heidelberg. Now working at SUSE Labs as Kernel Storage... Read More →
Thursday October 31, 2019 15:45 - 16:15 CET
Forum 3
Traditionally, system administrators have been able to access all data on a running system, including memory belonging to Virtual Machines (VMs). Bugs in the hypervisor have also allowed cross-VM attacks.
A new upcoming feature for the s390x architecture will prevent those security issues, allowing VM guests to be protected from a broken or malicious hypervisor, without using memory encryption, while at the same time requiring a minimum amount of changes in the guest.
This presentation will introduce the technology, the architectural extensions, the unique features, and how KVM and QEMU have been adapted to exploit it. The presentation will also cover the typical lifecycle of host and guest, including interactions with the firmware.
KVM on s390x (IBM Z mainframes) developer and co-maintainer, KVM-unit-tests for s390x developer and co-maintainer. Mostly working on protected virtualization and related topics. Previously held talks about the s390x architecture and protected virtualization at GPN, CCC and KVM Forum... Read More →
The ivshmem device is a simple way to interconnect a number of VMs and let them exchange data and events without much hypervisor involvement. In fact, this is a common pattern in many hypervisors, specifically embedded ones. But the current design has a number of shortcomings, primarily around life-cycle management. And it has always been a stepchild, lacking even an upstream kernel driver.
This talk will present our effort to improve ivshmem. The new design gained essential missing features as well as a number of nice add-ons like uni-directional memory regions or optimized UIO interrupt handling. And it has been written to be applicable on QEMU as well as other hypervisors, e.g. Jailhouse.
The talk will furthermore present a prototype that stacks virtio over an ivshmem link, providing a lightweight backend-frontend channel that does not require virtio awareness in the hypervisor.
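At its core, a backend/frontend channel over a shared memory region boils down to a ring buffer plus a doorbell. A minimal single-producer/single-consumer sketch, far simpler than the actual ivshmem layout and only meant to show the mechanism:

```python
class ShmRing:
    """Toy single-producer/single-consumer byte ring. In a real
    ivshmem setup, buf, head and tail would all live in the shared
    memory region visible to both peers."""
    def __init__(self, size=4096):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # written only by the producer
        self.tail = 0  # written only by the consumer

    def used(self):
        return (self.head - self.tail) % self.size

    def write(self, data):
        # One slot is kept free to distinguish full from empty.
        if len(data) > self.size - 1 - self.used():
            raise BufferError("ring full")
        for b in data:
            self.buf[self.head] = b
            self.head = (self.head + 1) % self.size
        # A real implementation would now ring the doorbell, i.e.
        # trigger an interrupt in the peer VM.

    def read(self):
        out = bytearray()
        while self.tail != self.head:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % self.size
        return bytes(out)
```

Layering virtio on top of such a link means the virtqueues themselves live in the shared region, so the hypervisor never needs to understand virtio at all.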
Jan Kiszka is working as consultant, open source evangelist and Principal Key Expert Engineer in the Competence Center Embedded Linux at Siemens Technology. He is supporting Siemens businesses with adapting, enhancing or strategically driving open source as platform for their product... Read More →
AMD continues to improve and expand the support for SEV in the kernel/hypervisor. This talk will focus on the current development activities around SEV, such as eliminating memory pinning, live migration and SEV-ES.
Tom Lendacky is a member of the Linux OS group at Advanced Micro Devices where he is responsible for enabling and enhancing support for AMD processor features in the Linux kernel. He is currently working on extending the SEV support in the Linux kernel to further enhance the features... Read More →
The Network Block Device (NBD) protocol dates back to Linux 2.1.55 in April 1997, pre-dating iSCSI as a means for block device access of remote storage. However, in more recent years, the protocol has seen a revival as virtualization scenarios have used and extended its features for a variety of tasks.
This talk will cover recent developments: new commands (WRITE_ZEROES, BLOCK_STATUS, RESIZE), encryption support (X.509 certificates, TLS PSK), multi-connection throughput enhancement, underlying protocol improvements (structured replies, 64-bit requests), and standardization efforts for a common URI naming representation.
Richard Jones and Eric Blake will also discuss performance improvements, and userspace libraries for easier integration of the NBD protocol into other projects (nbdkit, libnbd). A demonstration of some interesting nbdkit plugins and filters will tie it all together.
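As an example of the URI naming effort mentioned above: NBD URIs take the form nbd://host[:port]/[exportname], with nbds:// for TLS-secured connections, and the default TCP port is 10809. A small parser sketch for the TCP transports (the Unix-socket forms are omitted here):

```python
from urllib.parse import urlparse

NBD_DEFAULT_PORT = 10809  # IANA-registered NBD port

def parse_nbd_uri(uri):
    """Parse an nbd:// or nbds:// URI into its components.
    Only the TCP transports are handled in this sketch."""
    u = urlparse(uri)
    if u.scheme not in ("nbd", "nbds"):
        raise ValueError("unsupported scheme: %s" % u.scheme)
    return {
        "tls": u.scheme == "nbds",
        "host": u.hostname,
        "port": u.port or NBD_DEFAULT_PORT,
        "export": u.path.lstrip("/"),  # empty string = default export
    }
```

For example, `parse_nbd_uri("nbd://example.com/backup")` selects the export named "backup" on port 10809 without TLS.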
Eric Blake is a software engineer at Red Hat, working on block device management in virtualization. He has contributed extensively to qemu and libvirt. He has spoken at several past KVM Forums, most recently about making the most of NBD in Oct 2019.
Intel has introduced a new hardware-assisted I/O virtualization technology, Scalable IOV, which provides much better flexibility and scalability in the sharing of I/O devices such as network interface cards, GPUs and hardware accelerators across containers and VMs, compared to the existing SR-IOV. In this presentation, the authors will give an overview of the Scalable IOV technology from the platform and device perspective, and introduce how to enable a typical Scalable IOV device driver through the vfio-mdev framework, how to compose a virtual device from a Scalable IOV capable device and bring it up, how the virtual device works, and how a Scalable IOV capable device works together with another PASID-based technology, SVA (Shared Virtual Address), in a virtualization environment.
Xin Zeng is a software engineer of Network Platform Group at Intel Data Center Group. He is now working on virtualization projects for Intel QuickAssist Technology product. Intel QuickAssist Technology can be used to handle compute-intensive security and compression operations that... Read More →
Yi is a software engineer from Intel Virtualization team, focusing on I/O virtualization technology. He works on Shared Virtual Memory, Scalable IOV and vIOMMU stuffs in recent years. He has been invited to give presentations at LPC 2017, LinuxCon Beijing 2018, KVM Forum 2018, Intel... Read More →
When emulating a system with QEMU's Tiny Code Generator (TCG), we must inspect and translate every single instruction that gets executed. As we completely control the system, we should be able to extract some interesting information about how the code executes. While other code instrumentation systems exist (DynamoRIO, Pin, Valgrind), QEMU is unique in supporting system emulation as well as not requiring the host to have the same instruction set as the guest. In this talk we shall discuss how we can answer some interesting questions about programs running under QEMU. We shall examine some of the drawbacks of QEMU's current introspection support. Finally, we shall discuss whether a plugin system will allow for more detailed experiments without compromising our ability to continually improve the quality of our emulation code.
Alex started learning to program in the 80s in an era of classic home computers that allowed you to get down and dirty at the system level. After graduating with a degree in Chemistry he's worked on a variety of projects including Fruit Machines, Line Cards, CCTV recorders and point-to-multipoint... Read More →
Mediated pass-through provides many merits, e.g. flexible resource management, scalability, and composability, while still sustaining a good user experience with regard to performance and features. While VFIO has included basic mediated pass-through support (mdev) since kernel 4.10, there is much inspiring value to be explored on top of it. In this talk, Kevin Tian will introduce work on extending the VFIO mdev framework in three main areas: efficiently enriching the portfolio of mediation capabilities, using the mediation framework to bridge hardware gaps, and bringing mediation capability to nested virtualization environments. Along this road, mediated pass-through could become a cornerstone of an uncompromised cloud experience for pass-through usages.
Kevin is a virtualization veteran from Intel with 16 years of experience in open source virtualization projects (KVM, Xen, etc.), including multiple presentations at associated conferences. He is currently a software architect in the Open Source Technology Center at Intel, with current focus... Read More →
QEMU includes a gdbserver stub which is capable of debugging the whole emulated system, including firmware, drivers, and BIOS code. However, debugging a multi-process operating system is tricky, because GDB does not distinguish between the processes within the guest. In this talk the authors discuss approaches for making such debugging better: detecting the processes, inspecting the address spaces, instrumenting the code, mapping the executables, and so on.
The talk also includes an overview of the new debugging stub for QEMU which allows using WinDbg without switching the guest system into debugging mode.
Pavel Dovgalyuk is a software developer at the Institute for System Programming (ISP) of the Russian Academy of Sciences (RAS). The activities of the Institute include fundamental research, software development, applied research for the benefit of industry, and education. For the... Read More →
Meet muser, a framework built on top of vfio/mdev for implementing PCI devices in userspace. It consists of a kernel module that acts as the mediated device and a userspace library where the core of the device is implemented. Applications using libmuser need only provide a device description and read/write callbacks.
muser abstracts the complexity yet allows tremendous flexibility. It manages interrupts, the PCI config space, and memory translation, handles interaction with vfio/mdev, and much more. While allowing customization where needed (for power users), it can also offer bindings for various languages. To demonstrate this simplicity, we will write and test a device live during the talk!
This is very useful with QEMU, where devices presented via vfio can be directly passed to VMs. It also enables a single userspace process to manage devices for multiple VMs, which has performance benefits.
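The programming model described above can be sketched like this (a Python toy with hypothetical names; the real libmuser is a C library and its API differs): the device author supplies only some state plus read/write callbacks, and the framework would invoke those callbacks on guest accesses.

```python
# Illustrative sketch of a "description + callbacks" userspace device model
# (hypothetical names -- not the actual libmuser interface).

class UserspaceDevice:
    def __init__(self, vendor_id, device_id, bar0_size):
        # Minimal PCI config space: vendor/device ID are little-endian
        # words at offsets 0x0 and 0x2.
        self.config = bytearray(256)
        self.config[0:2] = vendor_id.to_bytes(2, "little")
        self.config[2:4] = device_id.to_bytes(2, "little")
        self.bar0 = bytearray(bar0_size)

    # Callbacks the framework would invoke on guest BAR accesses.
    def bar0_read(self, offset, count):
        return bytes(self.bar0[offset:offset + count])

    def bar0_write(self, offset, data):
        self.bar0[offset:offset + len(data)] = data

# "Guest" pokes a register, then reads it back.
dev = UserspaceDevice(vendor_id=0x1234, device_id=0xabcd, bar0_size=4096)
dev.bar0_write(0x10, b"\x01\x02")
print(dev.bar0_read(0x10, 2).hex())   # 0102
```

Everything outside these callbacks (interrupts, mdev plumbing, DMA mapping) is exactly the boilerplate the talk says the framework takes off the device author's hands.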
I'm a software engineer at Nutanix working on storage virtualization. I'm currently working on the vfio-user protocol and libvfio-user, which allows us to use SPDK as a virtual NVMe controller outside QEMU in order to achieve high performance, low latency, and higher CPU efficien... Read More →
This talk will present the implementation of the QEMU frontend for the Qualcomm Hexagon DSP.
The QEMU frontend is automatically generated: the authors have extracted and formalized the pseudocode of each instruction from the ISA reference manual and implemented a translator. This translator transforms the pseudocode into C code that is embedded directly in QEMU, which performs instruction decoding and textual disassembly and generates the equivalent tiny code instructions. This approach reduces the implementation effort and makes it easy to add new instructions.
Another interesting aspect is that Hexagon is a VLIW architecture: it executes bundles of up to 4 parallel instructions that may also feature data dependencies.
This talk is also intended as the starting point for upstreaming our frontend, which is now almost feature-complete.
Lead LLVM Compiler & Tools Team, Qualcomm Innovation Center
Taylor Simpson (PhD) leads the LLVM compiler and tools team at Qualcomm Innovation Center. His team is dedicated to delivering state-of-the-art toolchains for all Qualcomm processors, including Hexagon. He presented a talk on the LLVM back end for Hexagon at the 2011 LLVM Developers... Read More →
N. Izzo received his Master’s degree in Computer Science and Engineering (cum laude) in 2017 from Politecnico di Milano (Italy); his thesis work was published as “Software-only Reverse Engineering of Physical (DRAM) Mappings for Rowhammer Attacks” (Costa Brava, Spain, IEEE... Read More →
Is QEMU bloated? Insecure? Obsolete? What are QEMU's tasks when virtualizing a modern guest, and how do they change for various workloads and scenarios? The aim of this talk is to provide hard data on the security and size of various components of QEMU, explain how the build can be tailored to minimize code size, attack surface and startup time, and give ideas for future development of QEMU. I will also shortly present the tools that helped me gather the data, so that anyone can reproduce my experiments in the future.
Paolo is a Distinguished Engineer at Red Hat and the upstream maintainer for both KVM and various subsystems in QEMU. As a contributor to QEMU, through the years, he has worked on various parts of the project architecture, including the threading architecture, the test frameworks... Read More →
As the cloud grows in popularity, more vendors are moving their business into the cloud, and nested virtualization technology is increasingly used in production environments (e.g. for secure containers and emulation). The Microsoft Azure cloud platform provides nested virtualization support.
I/O performance is still a bottleneck for a good experience with high throughput. This is due to the long code path and several data copies among the host, the L1 VM, and the L2 VM. The traditional solution is pass-through, exposing a virtual IOMMU to the L1 VM, but a virtual IOMMU still has side effects. This talk proposes a hybrid solution of vhost-user with userspace drivers (DPDK, SPDK) plus device pass-through (L0->L1) to accelerate nested VM I/O performance. In our tests, the L2 VM can achieve almost 100% of L0 I/O performance in some cases. This talk will show our performance results and some challenges.
Chao Peng is a senior software engineer on the Intel virtualization team. His responsibilities include enabling various hardware virtualization features in open source VMMs/OSes, as well as developing new usage models in virtualization and cloud environments. He has been a speaker at KVM Forum/Xen... Read More →
Tianyu is a Senior Software Engineer in COSINE (Core OS & Intelligent Edge) at Microsoft. He focuses on the performance optimization of Linux VMs on Hyper-V. Previously, Tianyu worked on ACPI, power management, KVM, and Xen open source projects at the Intel Open Source Technology Center... Read More →
rust-vmm is an open-source project that maintains a set of high-quality virtualization building blocks. It allows developers to focus on their VMM key differentiators rather than re-implementing components like KVM API wrappers, virtio devices and memory models.
In this presentation we go over the design and structure of the project, as well as the fundamentals of building VMMs using rust-vmm. We start by describing why we think Rust is the right language. We also highlight the implications of splitting virtualization components into standalone, separate repositories. Next, we look at how rust-vmm is used in practice by Rust-based VMMs and what changes are required to transition from a single-repo model to one where packages are consumed from a shared multi-repo. Finally, we outline how the modular nature of rust-vmm can be leveraged by non-Rust VMMs like QEMU.
I am a software engineer with the Amazon Web Services Firecracker team. I am passionate about open source and, beyond Firecracker, I am also contributing to rust-vmm, a community effort to create a shared set of Rust-based Virtual Machine Monitor components. So far I’ve been talking... Read More →
I currently work at Intel’s Open Source Technology Center where I’m busy with the cloud-hypervisor and Kata Containers projects. I’ve previously talked at the KVM Forum, the Open Infrastructure Summit, KubeCon and various other random open source conferences.
We argue that memory management for future VMs ought to differ from that for Linux processes. New types of memory, such as persistent memory and encrypted memory, are emerging, and they have different characteristics or require different (or additional) operations (e.g. cache flushes) for memory management. Although KVM has benefited from reusing the Linux kernel's mechanisms, it is becoming difficult to keep using kernel memory management for guests while meeting those requirements and achieving performance and simplicity. For example, various aspects of memory management differ: life cycles, page sizes, page invalidation, page access/modification tracking, memory ballooning, security, and isolation (e.g. from the host). In this session we discuss ideal/optimal memory management for guest VMs, possible implementation options, and a preliminary PoC.
Isaku Yamahata is a software architect in the Open Source Technology Center, Intel. For multiple years his main focus has been virtualization technology and network virtualization as Software Defined Networking. Isaku is active on Graphene LibOS and OpenStack Neutron (networking) and has... Read More →
In the recent past there has been an explosion of innovation in the technology area around Virtual Machine Monitors (also known as hypervisors) based around the Rust programming language including Google’s crosvm for ChromeOS, Amazon’s Firecracker for containers and Intel's Cloud Hypervisor project.
One defining aspect of all the Rust hypervisors that are active or under development is that they do not use traditional firmware for booting the guest operating system, and instead boot directly into a Linux kernel under the control of the host. This limitation makes it much harder to use the hypervisor to provide a general-purpose Virtual Machine, often known as a “pet”. To mitigate this, we have developed the Rust Hypervisor Firmware to allow these Rust-based hypervisors to load customer-controlled operating systems and enable a wider range of uses.
Rob has worked on Open Source at Intel for over 15 years on a wide variety of projects spanning from client user experiences, to graphics, to system software and now cloud technologies. In the field of cloud technologies Rob has been a key contributor to the Cloud Integrated Advanced... Read More →
On x86 platforms, interrupts are configured and delivered to the operating system either through interrupt controllers (e.g. PIC/APIC) or via MSI/MSI-X. The same set of technologies is used for virtualized x86 systems. However, this is not fundamentally required: for some lightweight virtualization usages like Kata Containers and Firecracker, which mainly focus on virtio devices and omit even MSI/MSI-X due to the complexity of PCI, the existing interrupt controllers and interrupt handling flow on both the host and guest sides are overkill. We prototyped a new, simplified virtual interrupt controller which fits the current kernel interrupt framework well while keeping only minimal code on the VMM side. This talk will present the solution as well as performance data, and demonstrate how it achieves simple and efficient interrupt handling for virtio-mmio devices.
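The idea of a minimal interrupt path can be illustrated with a toy model (our own sketch, not the actual prototype's design): the VMM side only sets bits in a pending bitmap, and the guest side runs a claim/acknowledge loop, with no PIC/APIC or PCI machinery in between.

```python
# Toy model of a minimal virtual interrupt controller (illustrative only).

class TinyIntc:
    def __init__(self):
        self.pending = 0        # one bit per interrupt line
        self.handlers = {}      # line number -> guest handler

    def register(self, line, handler):
        self.handlers[line] = handler

    def assert_line(self, line):
        """Device/VMM side: raising an interrupt is a single bit set."""
        self.pending |= 1 << line

    def dispatch(self):
        """Guest side: claim the lowest pending line, ack it, run its
        handler; repeat until nothing is pending."""
        fired = []
        while self.pending:
            line = (self.pending & -self.pending).bit_length() - 1
            self.pending &= ~(1 << line)            # acknowledge
            fired.append(self.handlers[line]())
        return fired

intc = TinyIntc()
intc.register(1, lambda: "virtio-net")
intc.register(3, lambda: "virtio-blk")
intc.assert_line(3)
intc.assert_line(1)
print(intc.dispatch())   # ['virtio-net', 'virtio-blk']
```

The appeal for virtio-mmio guests is exactly this shape: almost no state to emulate on the VMM side, and a trivial handling loop on the guest side.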
Jing Liu is a software engineer on the Intel virtualization team. In recent years she has focused on hardware virtualization enabling work and innovative optimization projects for the modern cloud. She previously spoke at colleges while working at IBM.
Chao Peng is a senior software engineer on the Intel virtualization team. His responsibilities include enabling various hardware virtualization features in open source VMMs/OSes, as well as developing new usage models in virtualization and cloud environments. He has been a speaker at KVM Forum/Xen... Read More →
The x86 KVM MMU has significant scaling issues with many VCPUs and lots of RAM. Over the last year, we have made substantial improvements to the x86 KVM MMU in the direct-mapped TDP case, to reduce lock contention and memory overheads, with the goal of migrating VMs with 416 VCPUs and 12TB of memory. With these changes, the x86 KVM MMU can handle EPT/NPT violations from all VCPUs in parallel, requires ~99% less MMU memory overhead in steady state with 2M pages, simplifies the implementation of MMU operations, and more. This talk will cover new synchronization models, abstractions, and data structures, and details of the performance we have gained from them.
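As a rough back-of-the-envelope check on the memory-overhead claim (our own arithmetic, not figures from the talk), one can estimate the last-level page-table memory needed to map a 12 TB guest:

```python
# Back-of-the-envelope estimate of last-level (E)PT page-table memory for a
# 12 TB guest. Assumes the standard x86 4-level layout: 4 KiB tables holding
# 512 8-byte entries.

TB = 1 << 40
guest_mem = 12 * TB

# With 4 KiB mappings, each 4 KiB leaf table maps 512 * 4 KiB = 2 MiB,
# so every 2 MiB of guest memory costs one 4 KiB table.
leaf_tables_4k = guest_mem // (2 << 20)
overhead_4k = leaf_tables_4k * 4096

# With 2 MiB mappings, the leaves move one level up: each table maps
# 512 * 2 MiB = 1 GiB of guest memory.
leaf_tables_2m = guest_mem // (1 << 30)
overhead_2m = leaf_tables_2m * 4096

print(f"4K pages:  {overhead_4k / (1 << 30):.1f} GiB of leaf tables")
print(f"2M pages:  {overhead_2m / (1 << 20):.1f} MiB of leaf tables")
print(f"reduction: {100 * (1 - overhead_2m / overhead_4k):.1f}%")
```

This comes out to roughly 24 GiB of leaf tables with 4 KiB pages versus about 48 MiB with 2 MiB pages, a ~99.8% reduction, which is consistent in magnitude with the "~99% less MMU memory overhead" cited in the abstract.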
KubeVirt enables Kubernetes to run VMs in addition to containers and was introduced two years ago. In these two years quite a lot has changed and KubeVirt has gained traction. In this talk we are:
- giving a small demo to illustrate how KubeVirt works
- looking at where KubeVirt stands today
- what features it has gained
- what architectural shifts it went through
- how traditional components like libvirt are used
- how the community is using KubeVirt
- and what lies ahead
Fabian Deutsch works for Red Hat and has been working in the virtualization space for the last couple of years, initially covering some node-level aspects in oVirt and now building a robust virtual machine add-on for Kubernetes with KubeVirt. Throughout the years he has spoken at... Read More →
Many common workloads aren't sensitive to VM-Exit performance, or they can be optimized through device assignment. The focus of this presentation will be on those workloads that are sensitive to VM-Exit performance and that cannot avoid triggering high-frequency VM-Exits. Those workloads aren't common, but they can materialize in the guest with some applications, such as databases. Incidentally, those are also the workloads that show the biggest impact from the software mitigations for some CPU models' speculative execution vulnerabilities.
The KVM x86-64 VM-Exits are already highly optimized, but there is still room for improvement. We'll first analyze the impact of the various software mitigations on VM-Exit execution, and then how we can change KVM to micro-optimize the VM-Exits further, both with and without the software mitigations enabled.
Andrea Arcangeli joined Red Hat in 2008 because of his interest in working on the KVM Virtualization Hypervisor, with a special interest in virtual machine memory management. He worked on many parts of the Linux Kernel, especially on the Virtual Memory subsystem. Andrea started working... Read More →
As we all know, post-copy can greatly reduce live migration downtime for devices with memory-intensive usages. While it is possible for emulated devices, live migration with post-copy on pass-through devices is still not supported. In this session, Yan will explain the benefits in detail and show a generic solution in VFIO for migrating pass-through devices with post-copy, and Shaopeng will present performance statistics from using post-copy to migrate SR-IOV VFs on Intel NICs.
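The core post-copy idea can be sketched in a few lines (a conceptual toy, not the VFIO implementation): the destination starts running immediately and pulls each page from the source on first access, so downtime no longer scales with the guest's memory size.

```python
# Conceptual sketch of post-copy live migration (illustrative only).

class Source:
    """The source host: still holds all of the guest's pages."""
    def __init__(self, pages):
        self.pages = pages              # page number -> contents

class Destination:
    """The destination host: starts empty and faults pages in on demand."""
    def __init__(self, source):
        self.source = source
        self.local = {}                 # pages received so far
        self.faults = 0

    def read_page(self, n):
        if n not in self.local:         # "page fault": fetch over the wire
            self.faults += 1
            self.local[n] = self.source.pages[n]
        return self.local[n]

src = Source({0: b"kernel", 1: b"heap", 2: b"stack"})
dst = Destination(src)
dst.read_page(1)        # first access faults and pulls page 1
dst.read_page(1)        # already local: no further fault
print(dst.faults)       # 1
```

The hard part for pass-through devices, and the subject of the talk, is that the "fault" in the sketch must also cover DMA from real hardware, which the host cannot simply trap the way it traps CPU accesses.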
Kevin is a virtualization veteran from Intel with 16 years of experience in open source virtualization projects (KVM, Xen, etc.), including multiple presentations at associated conferences. He is currently a software architect in the Open Source Technology Center at Intel, with current focus... Read More →
Shaopeng is a Senior Network Software Engineer at Intel. He focuses on Network Interface Controllers and I/O virtualization. Prior to Intel, he worked in the cloud and network industry for over 10 years.
The KVM hypervisor is at the core of cloud computing. Some customers, in sectors such as finance, online shopping, and gaming, prefer dedicated instances to avoid resource contention with other tenants, with security guaranteed by isolation. However, without further hypervisor optimizations, cloud providers still can't deliver performance that is "indistinguishable from metal."
In this presentation, we will introduce features which can reduce the hypervisor tax of KVM for dedicated instances, including: an exitless timer, the KVM_HINTS_DEDICATED performance hint, allowing userspace to disable MWAIT/HLT/PAUSE vmexits, adaptively tuning the advance of the LAPIC timer, and adaptive halt-polling in guest and host to reduce latency.
Wanpeng Li is a Linux kernel/virtualization developer with nine years of experience who currently works at Tencent Cloud. He mainly focuses on KVM, the scheduler, and memory management. In KVM, he has contributed many features to improve performance and stability. He previously worked in the IBM LTC kernel... Read More →
Virtual Performance Monitoring Units (vPMUs) are usually disabled in today’s KVM-based clouds, even though runnable vPMU code has been upstream for several years. Consequently, profiling software inside virtual machines remains a gap in the services that cloud vendors can provide. The main barriers are 1) the existing vPMU provides inaccurate profiling results in some cases; and 2) the advanced PMU features, e.g. LBR and PEBS, have not been supported, as they were not designed with virtualization in mind.
To tackle these issues, the existing vPMU is optimized by avoiding some heavyweight host perf operations. Tests show that the optimization can greatly reduce the emulation overhead of guest PMU operations, by roughly 3000x, achieving near-native efficiency. In addition, the virtualized LBR and PEBS features are brought to the cloud for the first time.
We have been working on KVM to better protect and isolate guests, and propose a more secure yet simpler architecture, where 1) guest memory is isolated from the host except for the areas used for I/O buffers, and 2) no MMIO emulation is used. Since it piggybacks on the Linux system, KVM tends to have more attack surface compared with other VMMs, making guests more vulnerable. For example, the kernel or QEMU can easily access guest data today. Even with memory encryption technologies, it is still easy for them to corrupt guest data (accidentally or intentionally) or to use potential side channels.
Our architecture requires limited changes to guests, but provides more protection and simplification compared with other approaches like XPFO, where the user level still has access to the entire guest memory. We share our experiences and data based on our PoC.
Jun Nakajima is a Senior Principal Engineer at the Intel Open Source Technology Center, leading virtualization and security for open source projects. Jun presented a number of times at technical conferences, including LSS, KVM Forum, Xen Summit, LinuxCon, OpenStack Summit, and USENIX... Read More →
QEMU can be susceptible to security attacks through the many interfaces it exposes to a guest VM. Each interface is an exposure point that, if compromised, gives a malicious guest the ability to assume the QEMU process's host privileges.
A multi-process QEMU involves separating QEMU services into multiple host processes. Each of these processes can be given only the privileges it needs to provide its service.
We introduced this topic at KVM Forum two years ago, and hosted a BoF on it last year. In this presentation, we will introduce the work we've done with an LSI SCSI controller model, including how it performs and what the next steps will be.
I've been working on virtualization technologies for a number of years, beginning with the LDOMs product at Sun Microsystems. Recently, I've been working on multi-process QEMU at Oracle, including presenting it at KVM Forum 2019.
Currently working at Oracle on the QEMU multi-process disaggregation project. Previously worked on the implementation of vNUMA topology for guests in the Xen hypervisor, as well as Xen livepatching and various Xen hypervisor improvements and fixes. Previously gave a talk... Read More →
This talk is a follow-up to our 2017 talk, “Bringing Commercial Grade Virtual Machine Introspection to KVM”. Since then we have made a lot of progress with regard to performance and stability, and we are also on track to include support for three Intel features that can greatly help with scalability: VMFUNC, #VE, and SPP. We also came across a surprise: in our tests, the speed of the more involved guest-to-hypervisor communication channel used on KVM (BSD sockets on top of vhost-vsock) comes very close to Xen’s lightweight event channel. And we have the numbers to prove it.
I lead the Linux development team at Bitdefender and am currently involved in integrating our HVI technology with open source hypervisors like Xen and KVM.
Recently more and more services, particularly microservices, have been moved from VMs to containers. Because containers share the host kernel, cloud providers are seeking technologies to create a more secure multi-tenant environment, such as Firecracker from AWS. But using a dedicated hypervisor for microservices brings an extra burden to develop and maintain it separately; furthermore, an improvement made for one could benefit the other. How about leveraging QEMU to fulfill the requirements of microservices? That is exactly what we did at Tencent Cloud. We will share our work adapting QEMU to rapidly deploy dense microservices in an extremely short period (< 35 ms) with resource utilization comparable to containers. This includes directly starting a VM from a parent, checkpoint/restore (C/R) of QEMU to start a new VM, modularizing QEMU, and reducing resource usage for both QEMU and Linux VMs.
Xiao Guangrong is a Linux kernel developer working on Ftrace, MM, and Btrfs, but his main interest is KVM. As an active contributor, he has been invited to give presentations at several conferences: Japan LinuxCon 2011, Japan LinuxCon 2012, China CLK 2012, and KVM Forum 2016, 2017, and 2018. He is... Read More →
Yulei is a software developer with more than 10 years of experience in the virtualization area. He used to work on graphics drivers and was involved in Intel GPU virtualization technology (a.k.a. Intel GVT-g). He is currently a senior software developer at Tencent Cloud; his recent presentations include: "Adaptive... Read More →