From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Catterall
Subject: [RFC 0/4] HVM x86 enhancements to run Xen deprivileged mode operations
Date: Thu, 6 Aug 2015 17:45:15 +0100
Message-ID: <1438879519-564-1-git-send-email-Ben.Catterall@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: xen-devel@lists.xensource.com
Cc: keir@xen.org, ian.campbell@citrix.com, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, jbeulich@suse.com, Ben Catterall
List-Id: xen-devel@lists.xenproject.org

Hi all,

I have a working base for this and would appreciate feedback at this
point to evaluate if it is moving in the right direction.

Many thanks in advance,
Ben

The aim of this work is to create a proof-of-concept to establish if it
is feasible to move certain Xen operations into a deprivileged context to
mitigate the impact of a bug or compromise in such areas. An example
would be x86_emulate or virtual device emulation which is not done in
QEMU for performance reasons.

This patch series contains the underlying support mechanisms for this
mode, which include:
- Setting up the necessary monitor page table entries for the
  deprivileged code, data and stack regions.
- Moving into and out of this mode
- Handling system calls from this mode
- Trapping exceptions taken whilst in this mode

Performance testing
-------------------
Performance testing indicates that the overhead for this deprivileged
mode is approximately 25%. This overhead is the cost of moving into
deprivileged mode and then fully back out of deprivileged mode. I
performed 100000 writes to a single I/O port on an Intel 2.2GHz Xeon
E5-2407 0 processor. The writes were issued from a Python script, timed
with time.time(), inside the HVM guest, which was running Debian Jessie.
Each write was trapped to cause a vmexit, and the time for each write was
calculated. These experiments were repeated. Note that only the host and
this HVM guest were running (both Debian Jessie) during the experiments.

20e-6 seconds was the average time for performing the write without the
deprivileged code running.

25e-6 seconds was the average time for performing the write with an entry
and exit from deprivileged mode.

Further Work
------------
- Support migration of vcpus between pcpus. This will likely be done by
  using a hard affinity to a pcpu and setting a 'migration pending' flag
  so that once we return from deprivileged mode and the stack has
  unwound, we can then migrate the vcpu.
- Prevent DoS attacks on migration: a counter is needed to stop code
  spinning in deprivileged mode from blocking migration. We could count
  the number of quanta which have passed since we failed to migrate,
  then migrate once the count becomes too high.
- Add support for SVM and test on AMD processors.
- We need to get the host MSRs for AMD SVM mode.

Signed-off-by: Ben Catterall
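
The 25% overhead in the performance section follows from the two averages
quoted there. The measurement script is not part of this series, so the
following is only a minimal sketch of that kind of timing loop, not the
original code. The names mean_write_time, relative_overhead and write_port
are hypothetical; write_port stands in for whatever performs the single
trapped I/O port write in the guest.

```python
import time


def mean_write_time(write_port, iterations=100000):
    """Average wall-clock seconds per call of write_port().

    write_port is a hypothetical callable that performs one I/O port
    write; the real mechanism used in the guest is not shown here.
    """
    start = time.time()
    for _ in range(iterations):
        write_port()
    return (time.time() - start) / iterations


def relative_overhead(baseline, with_depriv):
    """Fractional slowdown of with_depriv relative to baseline."""
    return (with_depriv - baseline) / baseline


# Averages quoted in the performance section, in seconds per write.
baseline = 20e-6      # trapped write, no deprivileged code run
with_depriv = 25e-6   # trapped write plus entry to and exit from the mode

print(round(relative_overhead(baseline, with_depriv) * 100))  # 25
```

Timing the whole loop and dividing by the iteration count, rather than
timing each write separately, keeps the cost of calling time.time() out
of the per-write figure.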