Re: [PATCH for 4.9 3/6] x86/hvm: Fix segmentation logic for system segments

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>,
	Paul Durrant <paul.durrant@citrix.com>, Tim Deegan <tim@xen.org>,
	Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH for 4.9 3/6] x86/hvm: Fix segmentation logic for system segments
Date: Mon, 3 Apr 2017 18:37:11 +0100	[thread overview]
Message-ID: <c3bd56cd-774f-574d-bbf0-3418a7508f9f@citrix.com> (raw)
In-Reply-To: <58E28F9A020000780014C34A@prv-mh.provo.novell.com>

On 03/04/17 17:08, Jan Beulich wrote:
>>>> On 03.04.17 at 17:42, <andrew.cooper3@citrix.com> wrote:
>> On 03/04/17 16:07, Jan Beulich wrote:
>>>>>> On 03.04.17 at 16:27, <andrew.cooper3@citrix.com> wrote:
>>>> On 03/04/17 10:13, Jan Beulich wrote:
>>>>>>>> On 31.03.17 at 21:50, <andrew.cooper3@citrix.com> wrote:
>>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>>> @@ -2374,13 +2374,27 @@ int hvm_set_cr4(unsigned long value, bool_t 
>>>> may_defer)
>>>>>>      return X86EMUL_OKAY;
>>>>>>  }
>>>>>>  
>>>>>> +enum hvm_segmentation_mode hvm_seg_mode(
>>>>>> +    const struct vcpu *v, enum x86_segment seg,
>>>>>> +    const struct segment_register *cs)
>>>>> The inputs here are at least somewhat counterintuitive (for example,
>>>>> from an abstract pov it is unexpected that the result depends on seg
>>>>> and cs). At the very least I think the naming should make clear that
>>>>> cs is not just some code segment, but the CS v has currently in use
>>>>> (e.g. active_cs). Going further the question is whether having this
>>>>> helper is really useful (and not perhaps inviting mis-use), and hence
>>>>> whether the inputs passed here wouldn't better be passed directly
>>>>> to hvm_virtual_to_linear_addr(), the more that the "seg" argument
>>>>> is required to match up between the two calls.
>>>> I purposefully wanted to avoid people opencoding the logic and getting
>>>> it wrong (looks like even I got it wrong).
>>>>
>>>> I'm not convinced that passing the parameters individually is better.
>>>>
>>>>>> +{
>>>>>> +    if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
>>>>>> +        return hvm_seg_mode_real;
>>>>> What about VM86 mode?
>>>> Very good point.
>>>>
>>>>>> +    if ( hvm_long_mode_active(v) &&
>>>>>> +         (is_x86_system_segment(seg) || cs->attr.fields.l) )
>>>>>> +        return hvm_seg_mode_long;
>>>>>> +
>>>>>> +    return hvm_seg_mode_prot;
>>>>> Did you verify this actually matching real hardware behavior? There
>>>>> being obvious anomalies when compat ring-0 code executes
>>>>> LGDT/LIDT/SGDT/SIDT (in long mode these ought to have 10-byte
>>>>> operands, yet 32-bit/compat code would expect 6-byte ones, so
>>>>> one of the two views is necessarily wrong, and whichever it is, it
>>>>> introduces an inconsistency), I wouldn't take it for given that _all_
>>>>> descriptor table accessing insns behave like they would from a
>>>>> 64-bit code segment (I nevertheless assume they do, but as I
>>>>> can't see it written down anywhere, we shouldn't assume so,
>>>>> considering how many other oddities there are in x86).
>>>>>
>>>>> This question is also being supported by the SDM using the same
>>>>> standard "Same exceptions as in protected mode" in the
>>>>> respective insns' "Compatibility Mode Exceptions" sections, yet
>>>>> the behavior above implies that #GP(0) might also result for
>>>>> compat mode descriptor table accesses if the descriptor address
>>>>> ends up being non-canonical. Interestingly enough the PM
>>>>> doesn't separate exception specifications for 32-bit protected,
>>>>> compat, and 64-bit modes.
>>>> You are mistaken.
>>>>
>>>> {L,S}{I,G}DT are {read,write}_segment_register() operations, using a
>>>> plain memory operand.
>>>>
>>>> When we come to the segmentation check, it will be by default
>>>> %ds-relative, with size as calculated by op_bytes in the instruction
>>>> emulator.
>>> I think I didn't make myself clear then: I'm not at all talking about how
>>> the memory access of these insns get carried out, I solely talk about
>>> the size of their operands:
>> I still fail to see what the size of the operands have to do with the
>> choice of segmentation mode.
>>
>>> In long mode to load IDTR or GDTR you'd expect a 64-bit base and a 16-bit 
>> limit.
>>
>> Why?  I'd expect nothing of the sort, because 32bit compat segments are
>> purposefully designed to be no functional difference from regular 32bit
>> protected mode segments.  That includes not changing the behaviour of
>> instructions like this.
> Well, one can of course take the position that ring-0 compat code
> is simply a useless thing.

Compatibility mode segments exist for the purpose of making user code
continue to work.  I don't find it surprising that compatbility
supervisor segments have some rough corners.

>
>>> Hence if _all_ system segment
>>> register related insns worked consistently in long mode, the four
>>> named insns would have to have 10-byte operands.
>> This isn't a valid expectation to draw.
>>
>>>  I'm pretty sure
>>> they don't though, so there is _one_ anomaly already.
>> Indeed they don't.  In a compatibility mode segment, they have take a
>> 6-byte operand, identically to 32bit mode.
>>
>>> With that I don't think we can rule out there being other anomalies, with this
>>> not being talked about explicitly anywhere in the doc.
>> I don't think any of this is relevant to the correctness of this patch.
> I don't question the correctness; all I question is how far it is built
> upon assumptions vs actually observed (short of documented)
> behavior.

AMD Vol2 14.5 "Initialising Long Mode" is fairly clear on the subject. 
The layout and behaviour of the 4 system structures depend on

>
>> Without this fix, implicit accesses to system segments in a
>> compatibility mode segment will truncate the resulting linear address as
>> part of performing the segmentation calculations, which is obviously not
>> how real hardware behaves.
> I understand this. But things are a little more complicated. Just
> extend your line of thinking regarding IDTR/GDTR to LDTR and
> TR: Above you suggest that the former two get loaded in a fully
> 32-bit mode compatible way. LTR and LLDT (as well as LAR and
> LSL), however, access a descriptor table. 32-bit code would
> expect an 8-byte descriptor to be read. Is compat mode code
> then not allowed to make the same assumption?

Hmm - the wording of LTR/LLDT in both manuals states 64bit mode, not
long mode, so there is a decent chance that the compat behaviour is
identical.  Let me experiment.

~Andrew

>  I think you've
> made pretty much explicit that you'd expect a 16-byte descriptor
> to be loaded in this case. So ring-0 compat code operates
> identically to 32-bit mode in some respects, and identically to
> 64-bit code in others. Fundamentally I'd expect consistency here,
> but along the lines of the usefulness remark higher up I simply
> think that no-one had spent any thought on this when designing
> x86-64; otherwise it would have been easy to simply disallow
> ring-0 to enter compat mode, thus avoiding any inconsistencies.
>
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel