From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH v3 07/32] xen/x86: fix arch_set_info_guest for HVM guests Date: Tue, 4 Aug 2015 19:08:19 +0100 Message-ID: <55C0FF93.70501@citrix.com> References: <1435923310-9019-1-git-send-email-roger.pau@citrix.com> <1435923310-9019-8-git-send-email-roger.pau@citrix.com> <55A3E0F102000078000903B5@mail.emea.novell.com> <55B0C100.6000308@citrix.com> <55B0EC520200007800094842@prv-mh.provo.novell.com> <1437651718.19412.92.camel@citrix.com> <55B103DD.4030901@citrix.com> <55B125440200007800094BD2@prv-mh.provo.novell.com> <55B10CDB.5010708@citrix.com> <1437667259.24746.12.camel@citrix.com> <55B11316.2000201@citrix.com> <55B130420200007800094CB8@prv-mh.provo.novell.com> <55B11CBE.8050704@citrix.com> <55B1208C.9090207@citrix.com> <55B211D70200007800094FCF@prv-mh.provo.novell.com> <55B20C9D.5070708@citrix.com> <55B233890200007800095124@prv-mh.provo.novell.com> <55B22B5A.8020304@citrix.com> <55B24F4A02000078000952CC@prv-mh.provo.novell.com> <55B25941.9@citrix.com> <55B27AAB02000078000954A0@prv-mh.provo.novell.com> <55B26DB1.4020302@citrix.com> <55BF9CE6.1030904@citrix.com> <55BFA569.7050102@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1ZMge4-0000uY-0r for xen-devel@lists.xenproject.org; Tue, 04 Aug 2015 18:09:00 +0000 In-Reply-To: <55BFA569.7050102@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: =?windows-1252?Q?Roger_Pau_Monn=E9?= , Jan Beulich Cc: xen-devel@lists.xenproject.org, Boris Ostrovsky , David Vrabel , Ian Campbell List-Id: xen-devel@lists.xenproject.org On 03/08/15 18:31, Roger Pau Monn=E9 wrote: > El 03/08/15 a les 18.55, Andrew Cooper ha escrit: >> On 24/07/15 17:54, Roger Pau Monn=E9 wrote: >>>>> struct vcpu_hvm_context { >>>>> /* 16bit fields of the strucutre will be used. */ >>>>> #define _VCPUHVM_MODE_16B 0 >>>>> #define VCPUHVM_MODE_16B (1<<_VCPUHVM_MODE_16B) >>>>> /* 32bit fields of the structure will be used. */ >>>>> #define _VCPUHVM_MODE_32B 1 >>>>> #define VCPUHVM_MODE_32B (1<<_VCPUHVM_MODE_32B) >>>>> /* 64bit fields of the structure will be used. */ >>>>> #define _VCPUHVM_MODE_64B 2 >>>>> #define VCPUHVM_MODE_64B (1<<_VCPUHVM_MODE_64B) >>>> As soon as we have three values here, a boolean flag approach >>>> doesn't make sense anymore. >>> Yes, taking into account that only _one_ of them can be used at the same >>> time. >>> >>> I have the feeling that we are over engineering this interface. IMHO we >>> should only allow the user to set the control registers, efer (on amd64) >>> and the GP registers. Segment selectors would be set by Xen to point to >>> a flat segment suitable for the mode the guest has selected. I think >>> that should be enough to get the guest OS into it's entry point, and >>> then it can do whatever it wants. >>> >>> If that's not suitable, then my second option would be to allow the >>> guest to set the base, limit and AR bytes of the selectors, but not load >>> a GDT. >> In an ideal world, it would be nice to pass enough state to be able to >> point eip at the idle loop and have everything JustWork. However, this >> makes a lot of assumptions about exactly which state the vcpu wants to >> set up, which might include MSRs as well. >> >> Therefore, an appropriate compromise is to be able to point eip at >> startup_cpu, running in the correct operating mode to avoid bouncing >> through trampolines. > IMHO, allowing OSes to jump straight into the idle loop is putting a = > lot of burden in the Xen side of things. Also all OSes already contain = > AP startup code, we should be able to hook into this code at a suitable = > location, and that's what we should aim for. > >> >From that point of view, the minimum quantity of state required is: >> >> CR{0,3,4}, EFER, cs/eip, ds, ss/esp, gdt{base,limit}. It would be >> natural to extend this to all the 8 base GPRs and the user segment >> selectors. (Note that EFER is needed even for 32bit paged entries if NX >> is set in the pagetables.) > Extending this to the rest of the GPR is needed for Linux to boot. The = > AP startup code of PVH Linux places the cpu id in a GPR for the boot = > stub to find it: > > http://lxr.free-electrons.com/source/arch/x86/xen/smp.c#L420 > >> In addition, a hint to Xen as to the eventual mode so it can widen its >> reads to start with rather than attempting to reverse engineer the >> operating mode and rereading half the structure later. = >> >> Specifying the gdt{base,limit} will require Xen to read the GDT to set >> up the entry attributes for the segment, as simply setting the selector >> is insufficient. A complicating factor is that VT-x and SVM have >> different representations for access rights. (For migration, VT-x's >> representation is mutated to match SVM.) >> >> I really undecided between suggesting a gdt{base,limit} and Xen reading >> the access rights in an unambiguous fashon, vs specifying the internals >> and avoiding any peeking into domain memory. >> >> It would be nice if the new cpus gs base could be set by the caller, >> which allows the new cpu straight access to its per-cpu data area (if gs >> is in use). This is not representable using the gdt method and would >> involve casing the GSBASE MSR if we were to go down the strict >> architectural route. > IMHO, this is out of the scope and I would rather allow the caller to = > set the internal (cached) part of the segment selectors. > >> Therefore, I am leaning slightly towards the specify the internals side >> of things, which removes some complexity from the Xen side of the hyperc= all. > I've updated the proposed structure, so now it looks like: > > (I still need to figure out how to order the bits in the access rights = > (_ar) fields.) All struct's need a flags register. > > struct vcpu_hvm_x86_16 { > uint16_t ax; > uint16_t cx; > uint16_t dx; > uint16_t bx; > uint16_t sp; > uint16_t bp; > uint16_t si; > uint16_t di; > uint16_t ip; > > uint32_t cr[8]; Definitely no need for anything other than cr0 and 4 in 16 bit mode. > > uint32_t cs_base; > uint32_t ds_base; > uint32_t ss_base; > uint32_t cs_limit; > uint32_t ds_limit; > uint32_t ss_limit; > uint16_t cs_ar; > uint16_t ds_ar; > uint16_t ss_ar; > }; > > struct vcpu_hvm_x86_32 { > uint32_t eax; > uint32_t ecx; > uint32_t edx; > uint32_t ebx; > uint32_t esp; > uint32_t ebp; > uint32_t esi; > uint32_t edi; > uint32_t eip; > > uint32_t cr[8]; Don't need cr's 5-8. > uint64_t efer; > > uint32_t cs_base; > uint32_t ds_base; > uint32_t ss_base; > uint32_t cs_limit; > uint32_t ds_limit; > uint32_t ss_limit; > uint16_t cs_ar; > uint16_t ds_ar; > uint16_t ss_ar; You need selector entries for each segment as well. > }; > > struct vcpu_hvm_x86_64 { > uint64_t rax; > uint64_t rcx; > uint64_t rdx; > uint64_t rbx; > uint64_t rsp; > uint64_t rbp; > uint64_t rsi; > uint64_t rdi; > uint64_t r8; > uint64_t r9; > uint64_t r10; > uint64_t r11; > uint64_t r12; > uint64_t r13; > uint64_t r14; > uint64_t r15; > uint64_t rip; > > uint64_t cr[8]; > uint64_t efer; What has happened to the segment information? It cannot be omitted completely, even in 64bit. > }; > > typedef enum { > VCPU_HVM_MODE_16B, /* 16bit fields of the structure will be use= d. */ > VCPU_HVM_MODE_32B, /* 32bit fields of the structure will be use= d. */ > VCPU_HVM_MODE_64B, /* 64bit fields of the structure will be use= d. */ > } vcpu_hvm_mode_t; > > struct vcpu_hvm_context { > vcpu_hvm_mode_t mode; The width of an enum is implementation defined, which makes them unsuitable in the public interface. > > /* CPU registers. */ > union { > struct vcpu_hvm_x86_16 x86_16; > struct vcpu_hvm_x86_32 x86_32; > struct vcpu_hvm_x86_64 x86_64; > }; We require C89 compatibility for the public interface, so no anonymous unions. ~Andrew > }; > typedef struct vcpu_hvm_context vcpu_hvm_context_t; > DEFINE_XEN_GUEST_HANDLE(vcpu_hvm_context_t); > >