Re: [PATCH] x86/cpuid: fix dom0 crash on skylake machine

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: huaitong.han@intel.com, Luwei Kang <luwei.kang@intel.com>,
	yong.y.wang@intel.com, xen-devel@lists.xen.org
Subject: Re: [PATCH] x86/cpuid: fix dom0 crash on skylake machine
Date: Wed, 1 Jun 2016 12:45:54 +0100	[thread overview]
Message-ID: <574ECAF2.4000502@citrix.com> (raw)
In-Reply-To: <574EE53C02000078000F0609@prv-mh.provo.novell.com>

On 01/06/16 12:38, Jan Beulich wrote:
>>>> On 01.06.16 at 13:27, <andrew.cooper3@citrix.com> wrote:
>> On 01/06/16 10:43, Jan Beulich wrote:
>>>>>> On 01.06.16 at 11:34, <andrew.cooper3@citrix.com> wrote:
>>>> On 01/06/16 10:17, Jan Beulich wrote:
>>>>>>>> On 01.06.16 at 11:03, <andrew.cooper3@citrix.com> wrote:
>>>>>> While this does work, it undoes some of the work I started with my cpuid
>>>>>> improvements in 4.7
>>>>>>
>>>>>> Does the attached patch also resolve your issue?
>>>>> While that's much better than the original, I don't think it's quite
>>>>> enough. The rest of the domain policy should be taken into account
>>>>> (and I think I had suggested to do so during review of your CPUID
>>>>> rework series), i.e. this can't be calculated once for every domain.
>>>> Like the current use of {hvm,pv}_featureset, as an upper bound, this is
>>>> just a stopgap fix.
>>>>
>>>> Fixing this in a properly per-domain way is part of my further plans for
>>>> cpuid improvements.  The reason it isn't done like this yet is because
>>>> there is a substantial quantity of work required to make this function.
>>> What substantial work other than XSTATE leaf handling inquiring
>>> other leaves do you see?
>> I want to adjust the representation of cpuid information in struct
>> domain. The current loop in domain_cpuid() causes an O(N) overhead for
>> every query, which is very poor for actions which really should be a
>> single bit test at a fixed offset.
>>
>> This needs to be combined with properly splitting the per-domain and
>> per-vcpu information, which requires knowing the expected vcpu topology
>> during domain creation.
>>
>> On top of that, there needs to be verification logic to check the
>> correctness of information passed from the toolstack.
>>
>> All of these areas are covered in the "known issues" section of the
>> feature doc, and I do plan to fix them all.  However, it isn't a couple
>> of hours worth of work.
> All understood, yet not to the point: The original remark was that
> the very XSTATE handling could be done better with far not as much
> of a change, at least afaict without having tried.

In which case I don't know what you were suggesting.

>
>>>>> And then, as said in reply to the original patch, handle_xsetbv()'s
>>>>> checking should be generalized from the special casing of PKRU (or
>>>>> if we don't want that, then that special case would better get
>>>>> removed for consistency reasons).
>>>> Oh - I hadn't even noticed that.  How about this incremental change?
>>>>
>>>> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
>>>> index a0cfcc2..67c0e4b 100644
>>>> --- a/xen/arch/x86/xstate.c
>>>> +++ b/xen/arch/x86/xstate.c
>>>> @@ -658,8 +658,8 @@ int handle_xsetbv(u32 index, u64 new_bv)
>>>>      if ( (new_bv & ~xfeature_mask) || !valid_xcr0(new_bv) )
>>>>          return -EINVAL;
>>>>  
>>>> -    /* XCR0.PKRU is disabled on PV mode. */
>>>> -    if ( is_pv_vcpu(curr) && (new_bv & XSTATE_PKRU) )
>>>> +    /* Sanity check against domain maximum. */
>>>> +    if ( new_bv & ~(is_pv_vcpu(curr) ? pv_xfeature_mask : hvm_xfeature_mask) )
>>>>          return -EOPNOTSUPP;
>>> Looks good.
>>>
>>> But one other aspect: MPX is tagged S, while PKU is tagged H. Do
>>> we perhaps need three masks to get the above right at least with
>>> just the global masking?
>> So it is.  I wonder why.  (It was previously like that, and I didn't
>> alter it as part of my cpuid work).  I can't see anything in MPX which
>> makes it unsafe to use with shadow.  All control state works on linear
>> addresses rather than physical addresses.
> Well, it's MPX that's tagged as available for shadow guests, and
> PKU as unavailable.

Ah yes - my confusion.  We should dynamically hide PKU not MPX.

>
>> For now, I will implement the same kind of runtime check as hides MPX
>> from shadow guests in hvm_cpuid().  This again will eventually be
>> subsumed by the further work.
>>
>> I am also wondering whether we should disable MPX by default for
>> domains.  We know it won't work properly for anything which ends up
>> being emulated.  Users who specifically want to try it should be able to
>> turn it on in their domain configuration file, but it really shouldn't
>> be on my default.
> For guests being introspected I agree. For others I'm not sure, as
> normally we shouldn't hit the instruction emulator for branches (the
> main exception being PAE mode shadow guests).

MPX can be used for any memory reference, and one stated purpose is for
buffer overflow detection.  While the instruction emulator doesn't
understand MPX, any MMIO or shadow pagetable updates are liable to be
incorrect.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel