From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode
Date: Fri, 7 Aug 2015 15:24:54 +0100
Message-ID: <55C4BFB6.9070905@citrix.com>
References: <1438879519-564-1-git-send-email-Ben.Catterall@citrix.com>
 <1438879519-564-4-git-send-email-Ben.Catterall@citrix.com>
 <55C3C9C7.8030808@citrix.com>
 <55C4A9B6.1030303@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <55C4A9B6.1030303@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ben Catterall , xen-devel@lists.xensource.com
Cc: george.dunlap@eu.citrix.com, tim@xen.org, keir@xen.org, ian.campbell@citrix.com, jbeulich@suse.com
List-Id: xen-devel@lists.xenproject.org

On 07/08/15 13:51, Ben Catterall wrote:
> On 06/08/15 21:55, Andrew Cooper wrote:
>> On 06/08/15 17:45, Ben Catterall wrote:
>>> The process to switch into and out of deprivileged mode can be likened to
>>> setjmp/longjmp.
>>>
>>> To enter deprivileged mode, we take a copy of the stack from the guest's
>>> registers up to the current stack pointer. This allows us to restore the
>>> stack when we have finished the deprivileged mode operation, meaning we
>>> can continue execution from that point. This is similar to if a context
>>> switch had happened.
>>>
>>> To exit deprivileged mode, we copy the stack back, replacing the current
>>> stack. We can then continue execution from where we left off, which will
>>> unwind the stack and free up resources. This method means that we do not
>>> need to change any other code paths and its invocation will be
>>> transparent to callers. This should allow the feature to be more easily
>>> deployed to different parts of Xen.
>>>
>>> Note that this copy of the stack is per-vcpu but, it will contain
>>> per-pcpu data.
>>> Extra work is needed to properly migrate vcpus between pcpus.
>>
>> Under what circumstances do you see there being persistent state in the
>> depriv area between calls, given that the calls are synchronous from VM
>> actions?
>
> I don't know if we can make these synchronous as we need a way to
> interrupt the vcpu if it's spinning for a long time. Otherwise an
> attacker could just spin in depriv and cause a DoS. With that in mind,
> the scheduler may decide to migrate the vcpu whilst it's in depriv
> mode which would mean this per-pcpu data is held in the stack copy
> which is then migrated to another pcpu incorrectly.

If the emulator spins for a sufficient time, it is fine to shoot the
domain. This is a strict improvement on the current behaviour where a
spinning emulator would shoot the host, via a watchdog timeout.

As said elsewhere, this kind of DoS is not a very interesting attack
vector. State handling errors which cause Xen to change the wrong thing
are far more interesting from a guest's point of view.

http://xenbits.xen.org/xsa/advisory-123.html (full host compromise) or
http://xenbits.xen.org/xsa/advisory-108.html (read other guests' data)
are examples of kinds of interesting issues which could potentially be
mitigated with this depriv infrastructure.

>
>>
>>>
>>> The switch to and from deprivileged mode is performed using sysret
>>> and syscall respectively.
>>
>> I suspect we need to borrow the SS attribute workaround from Linux to
>> make this function reliably on AMD systems.
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61f01dd941ba9e06d2bf05994450ecc3d61b6b8b
>>
>>
>
>
> Ah! ok, I'll look into this. Thanks!

Just be aware of it. Don't spend your time attempting to retrofit it to
Xen. It is more work than it looks.

~Andrew