From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v3 07/32] xen/x86: fix arch_set_info_guest
 for HVM guests
Date: Tue, 4 Aug 2015 19:08:19 +0100
Message-ID: <55C0FF93.70501@citrix.com>
References: <1435923310-9019-1-git-send-email-roger.pau@citrix.com>
	<1435923310-9019-8-git-send-email-roger.pau@citrix.com>
	<55A3E0F102000078000903B5@mail.emea.novell.com>
	<55B0C100.6000308@citrix.com>
	<55B0EC520200007800094842@prv-mh.provo.novell.com>
	<1437651718.19412.92.camel@citrix.com> <55B103DD.4030901@citrix.com>
	<55B125440200007800094BD2@prv-mh.provo.novell.com>
	<55B10CDB.5010708@citrix.com> <1437667259.24746.12.camel@citrix.com>
	<55B11316.2000201@citrix.com>
	<55B130420200007800094CB8@prv-mh.provo.novell.com>
	<55B11CBE.8050704@citrix.com> <55B1208C.9090207@citrix.com>
	<55B211D70200007800094FCF@prv-mh.provo.novell.com>
	<55B20C9D.5070708@citrix.com>
	<55B233890200007800095124@prv-mh.provo.novell.com>
	<55B22B5A.8020304@citrix.com>
	<55B24F4A02000078000952CC@prv-mh.provo.novell.com>
	<55B25941.9@citrix.com>
	<55B27AAB02000078000954A0@prv-mh.provo.novell.com>
	<55B26DB1.4020302@citrix.com> <55BF9CE6.1030904@citrix.com>
	<55BFA569.7050102@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=6515eae1c=Andrew.Cooper3@citrix.com>)
	id 1ZMge4-0000uY-0r
	for xen-devel@lists.xenproject.org; Tue, 04 Aug 2015 18:09:00 +0000
In-Reply-To: <55BFA569.7050102@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: =?windows-1252?Q?Roger_Pau_Monn=E9?= <roger.pau@citrix.com>, Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xenproject.org, Boris Ostrovsky <boris.ostrovsky@oracle.com>, David Vrabel <david.vrabel@citrix.com>, Ian Campbell <ian.campbell@citrix.com>
List-Id: xen-devel@lists.xenproject.org

On 03/08/15 18:31, Roger Pau Monn=E9 wrote:
> El 03/08/15 a les 18.55, Andrew Cooper ha escrit:
>> On 24/07/15 17:54, Roger Pau Monn=E9 wrote:
>>>>> struct vcpu_hvm_context {
>>>>> /* 16bit fields of the strucutre will be used. */
>>>>> #define _VCPUHVM_MODE_16B              0
>>>>> #define VCPUHVM_MODE_16B               (1<<_VCPUHVM_MODE_16B)
>>>>> /* 32bit fields of the structure will be used. */
>>>>> #define _VCPUHVM_MODE_32B              1
>>>>> #define VCPUHVM_MODE_32B               (1<<_VCPUHVM_MODE_32B)
>>>>> /* 64bit fields of the structure will be used. */
>>>>> #define _VCPUHVM_MODE_64B              2
>>>>> #define VCPUHVM_MODE_64B               (1<<_VCPUHVM_MODE_64B)
>>>> As soon as we have three values here, a boolean flag approach
>>>> doesn't make sense anymore.
>>> Yes, taking into account that only _one_ of them can be used at the same
>>> time.
>>>
>>> I have the feeling that we are over engineering this interface. IMHO we
>>> should only allow the user to set the control registers, efer (on amd64)
>>> and the GP registers. Segment selectors would be set by Xen to point to
>>> a flat segment suitable for the mode the guest has selected. I think
>>> that should be enough to get the guest OS into it's entry point, and
>>> then it can do whatever it wants.
>>>
>>> If that's not suitable, then my second option would be to allow the
>>> guest to set the base, limit and AR bytes of the selectors, but not load
>>> a GDT.
>> In an ideal world, it would be nice to pass enough state to be able to
>> point eip at the idle loop and have everything JustWork.  However, this
>> makes a lot of assumptions about exactly which state the vcpu wants to
>> set up, which might include MSRs as well.
>>
>> Therefore, an appropriate compromise is to be able to point eip at
>> startup_cpu, running in the correct operating mode to avoid bouncing
>> through trampolines.
> IMHO, allowing OSes to jump straight into the idle loop is putting a =

> lot of burden in the Xen side of things. Also all OSes already contain =

> AP startup code, we should be able to hook into this code at a suitable =

> location, and that's what we should aim for.
>
>> >From that point of view, the minimum quantity of state required is:
>>
>> CR{0,3,4}, EFER, cs/eip, ds, ss/esp, gdt{base,limit}.  It would be
>> natural to extend this to all the 8 base GPRs and the user segment
>> selectors.  (Note that EFER is needed even for 32bit paged entries if NX
>> is set in the pagetables.)
> Extending this to the rest of the GPR is needed for Linux to boot. The =

> AP startup code of PVH Linux places the cpu id in a GPR for the boot =

> stub to find it:
>
> http://lxr.free-electrons.com/source/arch/x86/xen/smp.c#L420
>
>> In addition, a hint to Xen as to the eventual mode so it can widen its
>> reads to start with rather than attempting to reverse engineer the
>> operating mode and rereading half the structure later. =

>>
>> Specifying the gdt{base,limit} will require Xen to read the GDT to set
>> up the entry attributes for the segment, as simply setting the selector
>> is insufficient.  A complicating factor is that VT-x and SVM have
>> different representations for access rights.  (For migration, VT-x's
>> representation is mutated to match SVM.)
>>
>> I really undecided between suggesting a gdt{base,limit} and Xen reading
>> the access rights in an unambiguous fashon, vs specifying the internals
>> and avoiding any peeking into domain memory.
>>
>> It would be nice if the new cpus gs base could be set by the caller,
>> which allows the new cpu straight access to its per-cpu data area (if gs
>> is in use).  This is not representable using the gdt method and would
>> involve casing the GSBASE MSR if we were to go down the strict
>> architectural route.
> IMHO, this is out of the scope and I would rather allow the caller to =

> set the internal (cached) part of the segment selectors.
>
>> Therefore, I am leaning slightly towards the specify the internals side
>> of things, which removes some complexity from the Xen side of the hyperc=
all.
> I've updated the proposed structure, so now it looks like:
>
> (I still need to figure out how to order the bits in the access rights =

> (_ar) fields.)

All struct's need a flags register.

>
> struct vcpu_hvm_x86_16 {
>     uint16_t ax;
>     uint16_t cx;
>     uint16_t dx;
>     uint16_t bx;
>     uint16_t sp;
>     uint16_t bp;
>     uint16_t si;
>     uint16_t di;
>     uint16_t ip;
>
>     uint32_t cr[8];

Definitely no need for anything other than cr0 and 4 in 16 bit mode.

>
>     uint32_t cs_base;
>     uint32_t ds_base;
>     uint32_t ss_base;
>     uint32_t cs_limit;
>     uint32_t ds_limit;
>     uint32_t ss_limit;
>     uint16_t cs_ar;
>     uint16_t ds_ar;
>     uint16_t ss_ar;
> };
>
> struct vcpu_hvm_x86_32 {
>     uint32_t eax;
>     uint32_t ecx;
>     uint32_t edx;
>     uint32_t ebx;
>     uint32_t esp;
>     uint32_t ebp;
>     uint32_t esi;
>     uint32_t edi;
>     uint32_t eip;
>
>     uint32_t cr[8];

Don't need cr's 5-8.

>     uint64_t efer;
>
>     uint32_t cs_base;
>     uint32_t ds_base;
>     uint32_t ss_base;
>     uint32_t cs_limit;
>     uint32_t ds_limit;
>     uint32_t ss_limit;
>     uint16_t cs_ar;
>     uint16_t ds_ar;
>     uint16_t ss_ar;

You need selector entries for each segment as well.

> };
>
> struct vcpu_hvm_x86_64 {
>     uint64_t rax;
>     uint64_t rcx;
>     uint64_t rdx;
>     uint64_t rbx;
>     uint64_t rsp;
>     uint64_t rbp;
>     uint64_t rsi;
>     uint64_t rdi;
>     uint64_t r8;
>     uint64_t r9;
>     uint64_t r10;
>     uint64_t r11;
>     uint64_t r12;
>     uint64_t r13;
>     uint64_t r14;
>     uint64_t r15;
>     uint64_t rip;
>
>     uint64_t cr[8];
>     uint64_t efer;

What has happened to the segment information?  It cannot be omitted
completely, even in 64bit.

> };
>
> typedef enum {
>     VCPU_HVM_MODE_16B,       /* 16bit fields of the structure will be use=
d. */
>     VCPU_HVM_MODE_32B,       /* 32bit fields of the structure will be use=
d. */
>     VCPU_HVM_MODE_64B,       /* 64bit fields of the structure will be use=
d. */
> } vcpu_hvm_mode_t;
>
> struct vcpu_hvm_context {
>     vcpu_hvm_mode_t mode;

The width of an enum is implementation defined, which makes them
unsuitable in the public interface.

>
>     /* CPU registers. */
>     union {
>         struct vcpu_hvm_x86_16 x86_16;
>         struct vcpu_hvm_x86_32 x86_32;
>         struct vcpu_hvm_x86_64 x86_64;
>     };

We require C89 compatibility for the public interface, so no anonymous
unions.

~Andrew

> };
> typedef struct vcpu_hvm_context vcpu_hvm_context_t;
> DEFINE_XEN_GUEST_HANDLE(vcpu_hvm_context_t);
>
>