Re: [PATCH] x86/vmx: Revert "x86/VMX: sanitize rIP before re-entering guest"

From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"Jun Nakajima" <jun.nakajima@intel.com>,
	"Kevin Tian" <kevin.tian@intel.com>
Subject: Re: [PATCH] x86/vmx: Revert "x86/VMX: sanitize rIP before re-entering guest"
Date: Thu, 15 Oct 2020 10:01:28 +0200	[thread overview]
Message-ID: <1786f728-15c2-3877-c01a-035b11bd8504@suse.com> (raw)
In-Reply-To: <01bb2f27-4e0b-3637-e456-09eb7b9b233e@citrix.com>

On 14.10.2020 15:57, Andrew Cooper wrote:
> On 13/10/2020 16:58, Jan Beulich wrote:
>> On 09.10.2020 17:09, Andrew Cooper wrote:
>>> At the time of XSA-170, the x86 instruction emulator really was broken, and
>>> would allow arbitrary non-canonical values to be loaded into %rip.  This was
>>> fixed after the embargo by c/s 81d3a0b26c1 "x86emul: limit-check branch
>>> targets".
>>>
>>> However, in a demonstration that off-by-one errors really are one of the
>>> hardest programming issues we face, everyone involved with XSA-170, myself
>>> included, mistook the statement in the SDM which says:
>>>
>>>   If the processor supports N < 64 linear-address bits, bits 63:N must be identical
>>>
>>> to mean "must be canonical".  A real canonical check is bits 63:N-1.
>>>
>>> VMEntries really do tolerate a not-quite-canonical %rip, specifically to cater
>>> to the boundary condition at 0x0000800000000000.
>>>
>>> Now that the emulator has been fixed, revert the XSA-170 change to fix
>>> architectural behaviour at the boundary case.  The XTF test case for XSA-170
>>> exercises this corner case, and still passes.
>>>
>>> Fixes: ffbbfda377 ("x86/VMX: sanitize rIP before re-entering guest")
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> But why revert the change rather than fix ...
>>
>>> @@ -4280,38 +4280,6 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>>  out:
>>>      if ( nestedhvm_vcpu_in_guestmode(v) )
>>>          nvmx_idtv_handling();
>>> -
>>> -    /*
>>> -     * VM entry will fail (causing the guest to get crashed) if rIP (and
>>> -     * rFLAGS, but we don't have an issue there) doesn't meet certain
>>> -     * criteria. As we must not allow less than fully privileged mode to have
>>> -     * such an effect on the domain, we correct rIP in that case (accepting
>>> -     * this not being architecturally correct behavior, as the injected #GP
>>> -     * fault will then not see the correct [invalid] return address).
>>> -     * And since we know the guest will crash, we crash it right away if it
>>> -     * already is in most privileged mode.
>>> -     */
>>> -    mode = vmx_guest_x86_mode(v);
>>> -    if ( mode == 8 ? !is_canonical_address(regs->rip)
>> ... the wrong use of is_canonical_address() here? By reverting
>> you open up avenues for XSAs in case we get things wrong elsewhere,
>> including ...
>>
>>> -                   : regs->rip != regs->eip )
>> ... for 32-bit guests.
> 
> Because the only appropriate alternative would be ASSERT_UNREACHABLE()
> and domain crash.
> 
> This logic corrupts guest state.
> 
> Running with corrupt state is every bit an XSA as hitting a VMEntry
> failure if it can be triggered by userspace, but the latter safer and
> much more obvious.

I disagree. For CPL > 0 we don't "corrupt" guest state any more
than reporting a #GP fault when one is going to be reported
anyway (as long as the VM entry doesn't fail, and hence the
guest won't get crashed). IOW this raising of #GP actually is a
precautionary measure to _avoid_ XSAs.

Nor do I agree with the "much more obvious" aspect: A VM entry
failure requires quite a bit of analysis to recognize what has
caused it; whether a non-pseudo-canonical RIP is what catches your
eye right away is simply unknown. The gprintk() that you delete,
otoh, says very clearly what we have found to be wrong.

> It was the appropriate security fix (give or take the functional bug in
> it) at the time, given the complexity of retrofitting zero length
> instruction fetches to the emulator.
> 
> However, it is one of a very long list of guest-state-induced VMEntry
> failures, with non-trivial logic which we assert will pass, on a
> fastpath, where hardware also performs the same checks and we already
> have a runtime safe way of dealing with errors.  (Hence not actually
> using ASSERT_UNREACHABLE() here.)

"Runtime safe" as far as Xen is concerned, I take it. This isn't safe
for the guest at all, as vmx_failed_vmentry() results in an
unconditional domain_crash().

I certainly buy the fast path aspect of your comment, and if you were
moving the guest state adjustment into vmx_failed_vmentry(), I'd be
fine with the deletion here.

> It isn't appropriate for this check to exist on its own (i.e. without
> other guest state checks),

Well, if we run into cases where we get things wrong, more checks
and adjustments may want adding. Sadly each one of those has a fair
chance of needing an XSA.

As an aside, nvmx_n2_vmexit_handler()'s handling of
VMX_EXIT_REASONS_FAILED_VMENTRY looks pretty bogus - this is a flag,
not a separate exit reason. I guess I'll make a patch ...

Jan