* [Draft A] Boot ABI for HVM guests without a device-model
@ 2015-06-10 12:34 Roger Pau Monné
  2015-06-10 13:15 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-10 12:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Elena Ufimtseva, Ian Campbell, Andrew Cooper, Stefano Stabellini,
	Tim Deegan, Jan Beulich, Boris Ostrovsky

Hello,

The discussion in [1] led to an agreement on the pieces still missing
from PVH (or HVM without a device-model) that are needed to progress
with its implementation.

One of the missing pieces is a new boot ABI that replaces the PV boot
ABI. The aim of this new boot ABI is to remove the limitations of the
PV boot ABI that no longer apply to auto-translated guests. The new
boot protocol should allow the same entry point to be used for both
32-bit and 64-bit guests, and let the guest choose its bitness at run
time without the domain builder knowing it in advance.

Roger.

[1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html

---
HVM direct boot ABI

Since the Xen entry point into the kernel can be different from the 
native entry point, ELFNOTES are used in order to tell the domain 
builder how to load and jump into the kernel entry point. At least the 
following ELFNOTES are required in order to use this boot ABI:

ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
ELFNOTE(Xen, XEN_ELFNOTE_PADDR_ENTRY,    .quad,  xen_start32)
ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")

The first three notes contain information about the guest kernel and 
the Xen hypercall ABI version. The following notes are of special 
interest:

 * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
   actual required physical address.
 * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
 * XEN_ELFNOTE_FEATURES: features required by the guest kernel in order
   to run.

The presence of the XEN_ELFNOTE_PADDR_ENTRY note indicates that the 
kernel supports the boot ABI described in this document.
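
As a rough illustration of the detection rule above, a domain builder
could walk the kernel's ELF note data and look for a "Xen" note of the
relevant type. The C sketch below assumes the standard ELF note layout
(namesz, descsz, type, followed by the name and descriptor, each padded
to 4 bytes). The numeric note type is a placeholder, since this draft
does not assign one, and find_xen_note() is a hypothetical helper, not
an existing libxc function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder note type: the draft does not assign a number yet. */
#define XEN_ELFNOTE_PADDR_ENTRY_PLACEHOLDER 0x1000u

/* Standard ELF note header; name and desc follow, each padded to 4 bytes. */
struct elf_note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

static size_t pad4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Walk a buffer of ELF notes and return a pointer to the descriptor of
 * the first note named "Xen" with the given type, or NULL if no such
 * note exists or the buffer is malformed.
 */
static const void *find_xen_note(const uint8_t *buf, size_t len,
                                 uint32_t type, uint32_t *descsz)
{
    size_t off = 0;

    while (len - off >= sizeof(struct elf_note_hdr)) {
        struct elf_note_hdr h;
        size_t name_off, desc_off, next;

        memcpy(&h, buf + off, sizeof(h));
        name_off = off + sizeof(h);
        if (h.namesz > len - name_off)
            return NULL;            /* name runs past the buffer */
        desc_off = name_off + pad4(h.namesz);
        if (desc_off > len || h.descsz > len - desc_off)
            return NULL;            /* descriptor runs past the buffer */
        next = desc_off + pad4(h.descsz);

        /* namesz counts the terminating NUL, so "Xen" has namesz 4. */
        if (h.type == type && h.namesz == 4 &&
            memcmp(buf + name_off, "Xen", 4) == 0) {
            if (descsz)
                *descsz = h.descsz;
            return buf + desc_off;
        }
        if (next > len)
            break;
        off = next;
    }
    return NULL;
}
```

A builder following this draft would treat a non-NULL result for the
entry note as "kernel supports this ABI" and read the entry point from
the returned descriptor.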

The domain builder will load the kernel into the guest memory space and 
jump into the entry point defined at XEN_ELFNOTE_PADDR_ENTRY with the 
following machine state:

 * esi: contains the physical memory address where the loader has placed
   the start_info page.

 * eax: contains the magic value 0xFF6BC1E2.

 * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
   are all undefined. 

 * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
   and a limit of ‘0xFFFFFFFF’. The exact value is undefined.

 * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
   offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
   undefined. 

 * eflags: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. 
   Other bits are all undefined.

 * A20 gate: must be enabled.

All other processor registers and flag bits are undefined. The OS is in 
charge of setting up its own stack, GDT and IDT.

Note that the boot protocol resembles the multiboot1 specification; 
this is done so that OSes with multiboot1 entry points can reuse them if 
desired. Also note that the processor starts with paging disabled, 
which means that all the memory addresses in the start_info page will 
be physical memory addresses.
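
To make the cr0 and eflags requirements above concrete, here is a
minimal C sketch of a predicate that a test harness or domain builder
could apply to the initial register values. The bit positions are the
architectural ones quoted in the list; the function name
draft_a_state_ok() and the macro names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PE      (1u << 0)   /* protected mode enable */
#define X86_CR0_PG      (1u << 31)  /* paging */
#define X86_EFLAGS_IF   (1u << 9)   /* interrupt enable */
#define X86_EFLAGS_VM   (1u << 17)  /* virtual-8086 mode */

/*
 * Check only the bits this draft constrains: cr0.PE set, cr0.PG clear,
 * eflags.IF and eflags.VM clear. All other bits are ignored.
 */
static bool draft_a_state_ok(uint32_t cr0, uint32_t eflags)
{
    if (!(cr0 & X86_CR0_PE) || (cr0 & X86_CR0_PG))
        return false;
    if (eflags & (X86_EFLAGS_IF | X86_EFLAGS_VM))
        return false;
    return true;
}
```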

---
Comments for further discussion:

Do we want to keep using the start_info page? Most of the fields there 
are not relevant for auto-translated guests, but without it we have to 
figure out how to pass the following information to the guest:

 - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
 - cmd_line: ?
 - console mfn: ?
 - console evtchn: ?
 - console_info address: ?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 12:34 [Draft A] Boot ABI for HVM guests without a device-model Roger Pau Monné
@ 2015-06-10 13:15 ` Jan Beulich
  2015-06-10 14:53   ` Roger Pau Monné
                     ` (2 more replies)
  2015-06-10 13:18 ` Andrew Cooper
  2015-06-10 18:55 ` Konrad Rzeszutek Wilk
  2 siblings, 3 replies; 17+ messages in thread
From: Jan Beulich @ 2015-06-10 13:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel, Boris Ostrovsky

>>> On 10.06.15 at 14:34, <roger.pau@citrix.com> wrote:
>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>    actual required physical address.

Why would that be needed? I.e. why would there ever be an offset?

>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
>  * XEN_ELFNOTE_FEATURES: features required by the guest kernel in order
>    to run.
> 
> The presence of the XEN_ELFNOTE_PADDR_ENTRY note indicates that the 
> kernel supports the boot ABI described in this document.
> 
> The domain builder will load the kernel into the guest memory space and 
> jump into the entry point defined at XEN_ELFNOTE_PADDR_ENTRY with the 
> following machine state:
> 
>  * esi: contains the physical memory address were the loader has placed
>    the start_info page.
> 
>  * eax: contains the magic value 0xFF6BC1E2.

On what basis was this value chosen? For my taste, it's getting too
close to something that could be a legitimate 32-bit kernel pointer
(agreed, all values could be valid pointers in 32-bit OSes, but with
OSes tending to place themselves high in memory, a value numerically
closer to what multiboot1 uses would seem more desirable).

>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>    are all undefined. 

I see that grub1 documentation says so, but I doubt this is realistic
(even less so for cr4 bits): Some of the bits (including ones not
currently defined) may have a meaning even in non-paged protected
mode, and the environment should be as completely defined as possible.
I.e. I think most other bits should be defined to be zero upon handoff.

>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.

I guess "exact value" really means "selector value".

>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
>    undefined. 

Same here, plus I don't think fs and gs should be defined to have any
particular value, base, limit, or attributes (such that handing off with
them holding null selectors would become acceptable).

>  * eflags: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. 
>    Other bits are all undefined.
> 
>  * A20 gate: must be enabled.

This is irrelevant on anything other than physical machines.

> Comments for further discussion:
> 
> Do we want to keep using the start_info page? Most of the fields there 
> are not relevant for auto-translated guests, but without it we have to 
> figure out how to pass the following information to the guest:
> 
>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>  - cmd_line: ?
>  - console mfn: ?
>  - console evtchn: ?
>  - console_info address: ?

Yeah, settling on ideally a reasonably arch-independent mechanism
that doesn't place undue constraints on future ports would be nice.
And considering a hypothetical variant of x86 Xen not supporting PV
guests anymore, this would no longer define XEN_HAVE_PV_GUEST_ENTRY
and hence no longer have a struct start_info. So from a puristic pov
the information should indeed be conveyed another way.

Jan


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 12:34 [Draft A] Boot ABI for HVM guests without a device-model Roger Pau Monné
  2015-06-10 13:15 ` Jan Beulich
@ 2015-06-10 13:18 ` Andrew Cooper
  2015-06-10 15:38   ` Roger Pau Monné
  2015-06-10 18:55 ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2015-06-10 13:18 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Tim Deegan,
	Jan Beulich, Boris Ostrovsky

On 10/06/15 13:34, Roger Pau Monné wrote:
> Hello,
>
> The discussion in [1] lead to an agreement of the missing pieces in PVH 
> (or HVM without a device-model) in order to progress with it's 
> implementation.
>
> One of the missing pieces is a new boot ABI, that replaces the PV boot 
> ABI. The aim of this new boot ABI is to remove the limitations of the 
> PV boot ABI, that are no longer present when using auto-translated 
> guests. The new boot protocol should allow to use the same entry point 
> for both 32bit and 64bit guests, and let the guest choose it's bitness 
> at run time without the domain builder knowing in advance.
>
> Roger.
>
> [1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html

Fantastic - thanks for doing this.

>
> ---
> HVM direct boot ABI
>
> Since the Xen entry point into the kernel can be different from the 
> native entry point, ELFNOTES are used in order to tell the domain 
> builder how to load and jump into the kernel entry point. At least the 
> following ELFNOTES are required in order to use this boot ABI:
>
> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
> ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_ENTRY,    .quad,  xen_start32)
> ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
> ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")
>
> The first three notes contain information about the guest kernel and 
> the Xen hypercall ABI version. The following notes are of special 
> interest:
>
>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>    actual required physical address.
>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
>  * XEN_ELFNOTE_FEATURES: features required by the guest kernel in order
>    to run.
>
> The presence of the XEN_ELFNOTE_PADDR_ENTRY note indicates that the 
> kernel supports the boot ABI described in this document.
>
> The domain builder will load the kernel into the guest memory space and 
> jump into the entry point defined at XEN_ELFNOTE_PADDR_ENTRY with the 
> following machine state:
>
>  * esi: contains the physical memory address were the loader has placed
>    the start_info page.
>
>  * eax: contains the magic value 0xFF6BC1E2.
>
>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>    are all undefined. 

"unspecified" is perhaps better phrasing.  Most will be 0, but ET will
be set as it is a read-only bit in all processors Xen will function on
these days.

Perhaps also worth calling out cr4 as well, which typically starts as
all zeroes.

>
>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.
>
>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
>    undefined. 

I would be tempted to only define ds and possibly es.  Any code using
this boot protocol will load its gdt and reload the segments in very
short order, and ss is useless until esp has been set up appropriately.

>
>  * eflags: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. 
>    Other bits are all undefined.
>
>  * A20 gate: must be enabled.
>
> All other processor registers and flag bits are undefined. The OS is in 
> charge of setting up it's own stack, GDT and IDT.
>
> Note that the boot protocol resembles the multiboot1 specification, 
> this is done so OSes with multiboot1 entry points can reuse those if 
> desired. Also note that the processor starts with paging disabled, 
> which means that all the memory addresses in the start_info page will 
> be physical memory addresses.
>
> ---
> Comments for further discussion:
>
> Do we want to keep using the start_info page? Most of the fields there 
> are not relevant for auto-translated guests, but without it we have to 
> figure out how to pass the following information to the guest:
>
>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>  - cmd_line: ?
>  - console mfn: ?
>  - console evtchn: ?
>  - console_info address: ?

All console information should be available from the HVMPARAMS.  I see
no reason to prevent a PVH guest getting at these.

This just leaves the command line being awkward.

~Andrew


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 13:15 ` Jan Beulich
@ 2015-06-10 14:53   ` Roger Pau Monné
  2015-06-10 15:53     ` Jan Beulich
  2015-06-10 15:42   ` Roger Pau Monné
  2015-06-11 11:01   ` Tim Deegan
  2 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-10 14:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel, Boris Ostrovsky

El 10/06/15 a les 15.15, Jan Beulich ha escrit:
>>>> On 10.06.15 at 14:34, <roger.pau@citrix.com> wrote:
>>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>>    actual required physical address.
> 
> Why would that be needed? I.e. why would there ever be an offset?

For example a FreeBSD kernel has the following program headers:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0xffffffff80200040 0xffffffff80200040
                 0x0000000000000150 0x0000000000000150  R E    8
  INTERP         0x0000000000000190 0xffffffff80200190 0xffffffff80200190
                 0x000000000000000d 0x000000000000000d  R      1
      [Requesting program interpreter: /red/herring]
  LOAD           0x0000000000000000 0xffffffff80200000 0xffffffff80200000
                 0x0000000001055b30 0x0000000001055b30  R E    200000
  LOAD           0x0000000001055b30 0xffffffff81455b30 0xffffffff81455b30
                 0x0000000000135c88 0x0000000000532348  RW     200000
  DYNAMIC        0x0000000001055b30 0xffffffff81455b30 0xffffffff81455b30
                 0x00000000000000d0 0x00000000000000d0  RW     8
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

I thought the loader needs XEN_ELFNOTE_PADDR_OFFSET in order to figure 
out the physical address where it has to load the kernel by using 
PhysAddr - XEN_ELFNOTE_PADDR_OFFSET, but maybe that's not the case. 
Maybe I can also fix the FreeBSD kernel in order to have the right 
PhysAddr, but I'm not sure if that's going to screw native loading.
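
As a worked example of the PhysAddr - XEN_ELFNOTE_PADDR_OFFSET
computation, the C sketch below applies it to the two LOAD segments
listed above. It assumes that KERNBASE on FreeBSD/amd64 is
0xffffffff80000000; load_paddr() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FreeBSD/amd64 KERNBASE, used as XEN_ELFNOTE_PADDR_OFFSET. */
#define ASSUMED_KERNBASE 0xffffffff80000000ULL

/* Physical load address = ELF p_paddr minus the PADDR_OFFSET note. */
static uint64_t load_paddr(uint64_t p_paddr, uint64_t paddr_offset)
{
    return p_paddr - paddr_offset;
}
```

Under that assumption the first LOAD segment (PhysAddr
0xffffffff80200000) lands at physical address 0x200000 and the second
(PhysAddr 0xffffffff81455b30) at 0x1455b30.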

> 
>>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
>>  * XEN_ELFNOTE_FEATURES: features required by the guest kernel in order
>>    to run.
>>
>> The presence of the XEN_ELFNOTE_PADDR_ENTRY note indicates that the 
>> kernel supports the boot ABI described in this document.
>>
>> The domain builder will load the kernel into the guest memory space and 
>> jump into the entry point defined at XEN_ELFNOTE_PADDR_ENTRY with the 
>> following machine state:
>>
>>  * esi: contains the physical memory address were the loader has placed
>>    the start_info page.
>>
>>  * eax: contains the magic value 0xFF6BC1E2.
> 
> On what basis was this value chosen?

It's a completely random value.

> For my taste, it's getting too
> close to something that could be a legitimate 32-bit kernel pointer
> (agreed, all values could be valid pointers in 32-bit OSes, but with
> OSes tending to place themselves high in memory, a value numerically
> closer to what multiboot1 uses would seem more desirable).

I don't have any strong opinions here, does the following seem more 
suitable:

0x336ec578 ("xEn3" with the 0x80 bit of the "E" set)

(from xc_dom_binloader.c)

Or we can follow multiboot1 and use:

0x3BADB002

(note the 3 instead of the 2).
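
For reference, the first candidate can be rebuilt from its description.
The C sketch below packs the characters of "xEn3" into a little-endian
32-bit value with bit 0x80 of the "E" set. The helper name xen3_magic()
is made up for this example; 0x2BADB002 is the value multiboot1 boot
loaders pass in eax.

```c
#include <assert.h>
#include <stdint.h>

/* Value a multiboot1-compliant boot loader passes in eax. */
#define MULTIBOOT1_BOOTLOADER_MAGIC 0x2BADB002u

/* "xEn3" as a little-endian 32-bit value, with 0x80 set in the 'E'. */
static uint32_t xen3_magic(void)
{
    return  (uint32_t)'x'
         | ((uint32_t)('E' | 0x80) << 8)
         | ((uint32_t)'n' << 16)
         | ((uint32_t)'3' << 24);
}
```

xen3_magic() evaluates to 0x336ec578, and the second candidate,
0x3BADB002, is the multiboot1 value plus 0x10000000.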

> 
>>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>>    are all undefined. 
> 
> I see that grub1 documentation says so, but I doubt this is realistic
> (even less so for cr4 bits): Some of the bits (including ones not
> currently defined) may have a meaning even in non-paged protected
> mode, and the environment should be as completely defined as possible.
> I.e. I think most other bits should be defined to be zero upon handoff.
> 
>>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.
> 
> I guess "exact value" really means "selector value".

I think so, it's a literal copy from the multiboot1 spec.

> 
>>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>>    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
>>    undefined. 
> 
> Same here, plus I don't think fs and gs should be defined to have any
> particular value, base, limit, or attributes (such that handing off with
> them holding nul selectors would become acceptable).

This is also copied from the multiboot1 spec. I don't have any issue 
with leaving fs and gs undefined.

> 
>>  * eflags: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. 
>>    Other bits are all undefined.
>>
>>  * A20 gate: must be enabled.
> 
> This is irrelevant on other than physical machines.

I had my doubts on this one, glad to know it's not relevant.

>> Comments for further discussion:
>>
>> Do we want to keep using the start_info page? Most of the fields there 
>> are not relevant for auto-translated guests, but without it we have to 
>> figure out how to pass the following information to the guest:
>>
>>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>>  - cmd_line: ?
>>  - console mfn: ?
>>  - console evtchn: ?
>>  - console_info address: ?
> 
> Yeah, settling on ideally a reasonably arch-independent mechanism
> that doesn't place undue constraints on future ports would be nice.
> And considering a hypothetical variant of x86 Xen not supporting PV
> guests anymore, this would no longer define XEN_HAVE_PV_GUEST_ENTRY
> and hence no longer have a struct start_info. So from a puristic pov
> the information should indeed be conveyed another way.

What about the following layout:

struct hvm_start_info {
    /* THE FOLLOWING ARE FILLED IN BOTH ON INITIAL BOOT AND ON RESUME.    */
    char magic[32];             /* "xen-<version>-<platform>".            */
    union {
        struct {
            xen_pfn_t console_paddr;    /* Physical address of console page.   */
            uint32_t  console_evtchn;   /* Event channel for console page.     */
        } domU;
        struct {
            uint32_t info_off;  /* Offset of console_info struct.         */
            uint32_t info_size; /* Size of console_info struct from start.*/
        } dom0;
    } console;
    unsigned long mod_start;    /* Physical address of pre-loaded module  */
    unsigned long mod_len;      /* Size (bytes) of pre-loaded module.     */
#define MAX_GUEST_CMDLINE 1024
    int8_t cmd_line[MAX_GUEST_CMDLINE];
};

We can even expand MAX_GUEST_CMDLINE if needed.
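
With a fixed-size cmd_line array like the one above, a guest should
not assume the string is NUL-terminated. A minimal C sketch of a
bounded length check follows; guest_cmdline_len() is a hypothetical
helper and only the cmd_line field of the proposed layout is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GUEST_CMDLINE 1024

/*
 * Return the length of the command line if a NUL terminator is found
 * within the MAX_GUEST_CMDLINE bytes of the buffer, or -1 otherwise.
 */
static long guest_cmdline_len(const int8_t *cmd_line)
{
    const void *nul = memchr(cmd_line, '\0', MAX_GUEST_CMDLINE);

    return nul ? (long)((const int8_t *)nul - cmd_line) : -1;
}
```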

Roger.


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 13:18 ` Andrew Cooper
@ 2015-06-10 15:38   ` Roger Pau Monné
  2015-06-10 15:57     ` Andrew Cooper
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-10 15:38 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Tim Deegan,
	Jan Beulich, Boris Ostrovsky

El 10/06/15 a les 15.18, Andrew Cooper ha escrit:
>>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>>    are all undefined. 
> 
> "unspecified" is perhaps better phrasing.  Most will be 0, but ET will
> be set as it is a read-only bit in all processors Xen will function on
> these days.

OK. I think we can say that:

 * cr0: bit 0 (PE) and bit 4 (ET) will be set. All the other bits are
   cleared.

> Perhaps also worth calling out cr4 as well, which typically starts as
> all zeroes.

 * cr4: all bits are cleared.

>>
>>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.
>>
>>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>>    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
>>    undefined. 
> 
> I would be tempted to only define ds and possibly es.  Any code using
> this boot protocol will load its gdt and reload the segments in very
> short order, and ss is useless until esp has been set up appropriately.

I would rather have ss already defined as in the text above, so that
you just need to load a valid stack pointer into esp, but I'm not
going to argue strongly about it.

>> Do we want to keep using the start_info page? Most of the fields there 
>> are not relevant for auto-translated guests, but without it we have to 
>> figure out how to pass the following information to the guest:
>>
>>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>>  - cmd_line: ?
>>  - console mfn: ?
>>  - console evtchn: ?
>>  - console_info address: ?
> 
> All console information should be available from the HVMPARAMS.  I see
> no reason to prevent a PVH guest getting at these.
> 
> This just leaves the command line being awkward.

I forgot to add the kernel payload (initramfs), which could also be
fetched using a HVMPARAM, so that just leaves the cmd_line, which could
be passed as a physical memory address in one of the gp registers. I
don't have a strong opinion on whether we should create a new
hvm_start_info struct that contains those, or whether we should just add
new HVMPARAMS.



* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 13:15 ` Jan Beulich
  2015-06-10 14:53   ` Roger Pau Monné
@ 2015-06-10 15:42   ` Roger Pau Monné
  2015-06-11 11:01   ` Tim Deegan
  2 siblings, 0 replies; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-10 15:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel, Boris Ostrovsky


Sorry, forgot to reply to one of your chunks.

El 10/06/15 a les 15.15, Jan Beulich ha escrit:
>>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>>    are all undefined. 
> 
> I see that grub1 documentation says so, but I doubt this is realistic
> (even less so for cr4 bits): Some of the bits (including ones not
> currently defined) may have a meaning even in non-paged protected
> mode, and the environment should be as completely defined as possible.
> I.e. I think most other bits should be defined to be zero upon handoff.

I think the following is more accurate:

 * cr0: bit 0 (PE) and bit 4 (ET) will be set. All the other bits are
   cleared.

 * cr4: all bits are cleared.

Roger.


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 14:53   ` Roger Pau Monné
@ 2015-06-10 15:53     ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2015-06-10 15:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel, Boris Ostrovsky

>>> On 10.06.15 at 16:53, <roger.pau@citrix.com> wrote:
> El 10/06/15 a les 15.15, Jan Beulich ha escrit:
>>>>> On 10.06.15 at 14:34, <roger.pau@citrix.com> wrote:
>> For my taste, it's getting too
>> close to something that could be a legitimate 32-bit kernel pointer
>> (agreed, all values could be valid pointers in 32-bit OSes, but with
>> OSes tending to place themselves high in memory, a value numerically
>> closer to what multiboot1 uses would seem more desirable).
> 
> I don't have any strong opinions here, does the following seem more 
> suitable:
> 
> 0x336ec578 ("xEn3" with the 0x80 bit of the "E" set)
> 
> (from xc_dom_binloader.c)

That would seem fine to me.

>>>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>>>    are all undefined. 
>> 
>> I see that grub1 documentation says so, but I doubt this is realistic
>> (even less so for cr4 bits): Some of the bits (including ones not
>> currently defined) may have a meaning even in non-paged protected
>> mode, and the environment should be as completely defined as possible.
>> I.e. I think most other bits should be defined to be zero upon handoff.
>> 
>>>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>>>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.
>> 
>> I guess "exact value" really means "selector value".
> 
> I think so, it's a literal copy from the multiboot1 spec.

In which case let's please try to be more accurate.

>>> Comments for further discussion:
>>>
>>> Do we want to keep using the start_info page? Most of the fields there 
>>> are not relevant for auto-translated guests, but without it we have to 
>>> figure out how to pass the following information to the guest:
>>>
>>>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>>>  - cmd_line: ?
>>>  - console mfn: ?
>>>  - console evtchn: ?
>>>  - console_info address: ?
>> 
>> Yeah, settling on ideally a reasonably arch-independent mechanism
>> that doesn't place undue constraints on future ports would be nice.
>> And considering a hypothetical variant of x86 Xen not supporting PV
>> guests anymore, this would no longer define XEN_HAVE_PV_GUEST_ENTRY
>> and hence no longer have a struct start_info. So from a puristic pov
>> the information should indeed be conveyed another way.
> 
> What about the following layout:
> 
> struct hvm_start_info {

I mean, if you want to go with another structure, then I can't see why
you wouldn't want to use what is there. I was rather understanding
you'd like to go without any such structure, and would allow the guest
to retrieve the respective data another way (CPUID, HVM param, ...).

Jan


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 15:38   ` Roger Pau Monné
@ 2015-06-10 15:57     ` Andrew Cooper
  2015-06-11  8:23       ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2015-06-10 15:57 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Tim Deegan,
	Jan Beulich, Boris Ostrovsky

On 10/06/15 16:38, Roger Pau Monné wrote:
> El 10/06/15 a les 15.18, Andrew Cooper ha escrit:
>>>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>>>    are all undefined. 
>> "unspecified" is perhaps better phrasing.  Most will be 0, but ET will
>> be set as it is a read-only bit in all processors Xen will function on
>> these days.
> OK. I think we can say that:
>
>  * cr0: bit 0 (PE) and bit 4 (ET) will be set. All the other bits are
>    cleared.

bit 0 set, all other writeable bits clear.

We should not state that ET will be set, even though that will be the
case in reality.
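
One way to express "bit 0 set, all other writeable bits clear" is to
mask out ET before comparing, so the check holds whether or not the
processor reports ET as set. This C sketch assumes ET (bit 4) is the
only non-writeable bit that needs to be ignored; cr0_handoff_ok() is a
made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PE (1u << 0)  /* protected mode enable */
#define X86_CR0_ET (1u << 4)  /* read-only on current CPUs, so ignored */

/* PE must be set; every bit other than ET must be clear. */
static bool cr0_handoff_ok(uint32_t cr0)
{
    return (cr0 & ~X86_CR0_ET) == X86_CR0_PE;
}
```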

>>>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>>>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.
>>>
>>>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>>>    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
>>>    undefined. 
>> I would be tempted to only define ds and possibly es.  Any code using
>> this boot protocol will load its gdt and reload the segments in very
>> short order, and ss is useless until esp has been set up appropriately.
> I would rather prefer to have ss already defined according to the above
> text, this way you just need to load a valid stack into esp, but I'm not
> going to strongly argue about it.

I would prefer not to give people rope to hang themselves with, when it
comes to making assumptions about starting state.

Any code which doesn't explicitly set up ss is skating on thin ice.

>
>>> Do we want to keep using the start_info page? Most of the fields there 
>>> are not relevant for auto-translated guests, but without it we have to 
>>> figure out how to pass the following information to the guest:
>>>
>>>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>>>  - cmd_line: ?
>>>  - console mfn: ?
>>>  - console evtchn: ?
>>>  - console_info address: ?
>> All console information should be available from the HVMPARAMS.  I see
>> no reason to prevent a PVH guest getting at these.
>>
>> This just leaves the command line being awkward.
> I've forgot to add the kernel payload (initramfs), which could also be
> fetched using a HVMPARAM, so that just leaves the cmd_line, which could
> be passed as a physical memory address in one of the gp registers. I
> don't have a strong opinion on whether we should create a new
> hvm_start_info struct that contains those, or whether we should just add
> new HVMPARAMS.

I should have remembered as well, as I have a curiously-pvh-like usecase
which wants similar bits.

The reason I suggested multiboot was for modules and command line
support.  If we are not going for exactly multiboot, but something
similar, we might want to make a pvh_start_info with a module list and
cmdline pointer in it.

~Andrew


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 12:34 [Draft A] Boot ABI for HVM guests without a device-model Roger Pau Monné
  2015-06-10 13:15 ` Jan Beulich
  2015-06-10 13:18 ` Andrew Cooper
@ 2015-06-10 18:55 ` Konrad Rzeszutek Wilk
  2015-06-10 21:31   ` Andrew Cooper
                     ` (2 more replies)
  2 siblings, 3 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-10 18:55 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, Jan Beulich, xen-devel, Boris Ostrovsky

On Wed, Jun 10, 2015 at 02:34:00PM +0200, Roger Pau Monné wrote:
> Hello,
> 
> The discussion in [1] lead to an agreement of the missing pieces in PVH 
> (or HVM without a device-model) in order to progress with it's 
> implementation.
> 
> One of the missing pieces is a new boot ABI, that replaces the PV boot 
> ABI. The aim of this new boot ABI is to remove the limitations of the 

To be fair, there is an existing boot ABI.

It is the same as the PV boot ABI, but since the guest is a PV
autotranslated one, some of the values that a PV guest requires are
undefined.

With that in mind, why can't we reuse that (xen_start_info) and
treat any field which is PV specific as reserved?


> PV boot ABI, that are no longer present when using auto-translated 
> guests. The new boot protocol should allow to use the same entry point 
> for both 32bit and 64bit guests, and let the guest choose it's bitness 
> at run time without the domain builder knowing in advance.

I like that idea, but it will push the work on 32-bit PVH and
AMD PVH out by at least another half year, which is rather sad.

Also, this change will require modifying the Linux 64-bit PVH
part. That should be mentioned, and it is likely to take
three months as well.


> 
> Roger.
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html
> 
> ---
> HVM direct boot ABI
> 
> Since the Xen entry point into the kernel can be different from the 
> native entry point, ELFNOTES are used in order to tell the domain 
> builder how to load and jump into the kernel entry point. At least the 
> following ELFNOTES are required in order to use this boot ABI:
> 
> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
> ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_ENTRY,    .quad,  xen_start32)
> ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")

That will choke on older hypervisors. That is, a normal PV
guest won't boot anymore, because the older hypervisors
will choke on 'hvm_callback_vector' being in XEN_ELFNOTE_FEATURES.

You have to put that in the XEN_ELFNOTE_SUPPORTED_FEATURES field.

> ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")
> 
> The first three notes contain information about the guest kernel and 
> the Xen hypercall ABI version. The following notes are of special 
> interest:
> 
>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>    actual required physical address.
>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.

Is 'P' supposed to be 'physical'?

I am not sure how this will work with an ELF 64-bit binary like
the Linux kernel. Usually we use the virtual address, but starting
in 32-bit mode with a 64-bit virtual address won't work.

But the ELF loader could figure out the offset of the virtual
address from the ELF starting point and just call at the delta, in
which case XEN_ELFNOTE_ENTRY can be used with the
understanding that we will just call at that offset.

>  * XEN_ELFNOTE_FEATURES: features required by the guest kernel in order
>    to run.
> 
> The presence of the XEN_ELFNOTE_PADDR_ENTRY note indicates that the 
> kernel supports the boot ABI described in this document.
> 
> The domain builder will load the kernel into the guest memory space and 
> jump into the entry point defined at XEN_ELFNOTE_PADDR_ENTRY with the 
> following machine state:
> 
>  * esi: contains the physical memory address were the loader has placed
>    the start_info page.
> 
>  * eax: contains the magic value 0xFF6BC1E2.
> 
>  * cr0: bit 31 (PG) must be cleared. Bit 0 (PE) must be set. Other bits
>    are all undefined. 
> 
>  * cs: must be a 32-bit read/execute code segment with an offset of ‘0’
>    and a limit of ‘0xFFFFFFFF’. The exact value is undefined.
> 
>  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
>    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
>    undefined. 
> 
>  * eflags: bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. 
>    Other bits are all undefined.
> 
>  * A20 gate: must be enabled.
> 
> All other processor registers and flag bits are undefined. The OS is in 
> charge of setting up it's own stack, GDT and IDT.
> 
> Note that the boot protocol resembles the multiboot1 specification, 
> this is done so OSes with multiboot1 entry points can reuse those if 
> desired. Also note that the processor starts with paging disabled, 
> which means that all the memory addresses in the start_info page will 
> be physical memory addresses.

Wow?! Pagetables disabled?! Why? Usually boot loaders start with some
pagetables set up for the OS, covering at least the kernel and the
ramdisk, either as 1-1 pagetables or something similar.

Why make this work harder for the guest?
Why can't the hypervisor set up most of these things for the guest?

> 
> ---
> Comments for further discussion:
> 
> Do we want to keep using the start_info page? Most of the fields there 

Yes. It suits its purpose here too.

> are not relevant for auto-translated guests, but without it we have to 
> figure out how to pass the following information to the guest:
> 
>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>  - cmd_line: ?
>  - console mfn: ?
>  - console evtchn: ?
>  - console_info address: ?


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 18:55 ` Konrad Rzeszutek Wilk
@ 2015-06-10 21:31   ` Andrew Cooper
  2015-06-11  8:31     ` Roger Pau Monné
  2015-06-11  7:18   ` Jan Beulich
  2015-06-11  8:43   ` Roger Pau Monné
  2 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2015-06-10 21:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Roger Pau Monné
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Tim Deegan,
	Jan Beulich, xen-devel, Boris Ostrovsky

On 10/06/2015 19:55, Konrad Rzeszutek Wilk wrote:
>> All other processor registers and flag bits are undefined. The OS is in 
>> charge of setting up it's own stack, GDT and IDT.
>>
>> Note that the boot protocol resembles the multiboot1 specification, 
>> this is done so OSes with multiboot1 entry points can reuse those if 
>> desired. Also note that the processor starts with paging disabled, 
>> which means that all the memory addresses in the start_info page will 
>> be physical memory addresses.
> Wow?! Pagetables disabled?! Why? Usually boot loaders start with some
> pagetables setup for the OS - to cover at least the kernel and the
> ramdisk. Either it being in 1-1 pagetables or such.
>
> Why make this work harder for the guest?
> Why can't the hypervisor setup most of these things for the guest?

If you start with paging enabled, the domain builder has to know the
intended runmode and paging details a priori. 

Starting with paging disabled allows one single guest binary to set
itself up however it likes, which includes one single binary being able
to chainload any further payload; an option not available to PV guests
at all.

There are usecases which actually want to run without paging, or without
PAE.  Few, granted, but some nonetheless.  Alternatively, a distro
might wish to choose between 32 or 64bit depending on the quantity
of RAM in the VM.

From a 32bit flat entry, it is 0x40 bytes worth of instructions for
32bit, or 0x4f bytes worth of instructions for 64bit to get set up in
the desired paging mode, with a usable stack (and I could definitely
reduce those numbers, at the expense of readability). (TODO - find
enough free time to finish the test framework and publish it.)  The
point is that it is not hard at all, but offers substantially more
flexibility to both host and guest software.
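
For reference, the entry state in the draft (magic value in eax, PE set,
PG clear, IF and VM clear) boils down to a few bit tests. A minimal
sketch in C, with the macro and function names made up for illustration;
only the bit positions and the magic value come from the draft:

```c
#include <stdbool.h>
#include <stdint.h>

#define HVM_BOOT_MAGIC 0xFF6BC1E2u   /* value the draft places in eax */
#define CR0_PE         (1u << 0)     /* protected mode enable */
#define CR0_PG         (1u << 31)    /* paging */
#define EFLAGS_IF      (1u << 9)     /* interrupt flag */
#define EFLAGS_VM      (1u << 17)    /* virtual-8086 mode */

/* Return true when the register values match the drafted entry state. */
bool entry_state_ok(uint32_t eax, uint32_t cr0, uint32_t eflags)
{
    if (eax != HVM_BOOT_MAGIC)
        return false;
    /* Protected mode must be on, paging must be off. */
    if (!(cr0 & CR0_PE) || (cr0 & CR0_PG))
        return false;
    /* Interrupts and virtual-8086 mode must both be disabled. */
    if (eflags & (EFLAGS_IF | EFLAGS_VM))
        return false;
    return true;
}
```

All other bits are left unchecked because the draft declares them
undefined.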

~Andrew


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 18:55 ` Konrad Rzeszutek Wilk
  2015-06-10 21:31   ` Andrew Cooper
@ 2015-06-11  7:18   ` Jan Beulich
  2015-06-12 13:30     ` Konrad Rzeszutek Wilk
  2015-06-11  8:43   ` Roger Pau Monné
  2 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2015-06-11  7:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel, Boris Ostrovsky, roger.pau

>>> On 10.06.15 at 20:55, <konrad.wilk@oracle.com> wrote:
> On Wed, Jun 10, 2015 at 02:34:00PM +0200, Roger Pau Monné wrote:
>> The first three notes contain information about the guest kernel and 
>> the Xen hypercall ABI version. The following notes are of special 
>> interest:
>> 
>>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>>    actual required physical address.
>>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
> 
> Is 'P' suppose to be 'physical' ?
> 
> I am not sure how this will work with an ELF 64-bit binary like
> the Linux kernel. Usually we use the virtual address but with
> us starting in 32-bit mode with an 64-bit virtual address won't work.

So first you correctly decode the 'P' as physical, and then you're
concerned about _virtual_ addresses? The Linux ELF PHDR has
perfectly valid virtual _and_ physical addresses in it afaict.

>> Note that the boot protocol resembles the multiboot1 specification, 
>> this is done so OSes with multiboot1 entry points can reuse those if 
>> desired. Also note that the processor starts with paging disabled, 
>> which means that all the memory addresses in the start_info page will 
>> be physical memory addresses.
> 
> Wow?! Pagetables disabled?! Why? Usually boot loaders start with some
> pagetables setup for the OS - to cover at least the kernel and the
> ramdisk. Either it being in 1-1 pagetables or such.

Mind pointing out which boot loaders you think about here? Both
multiboot variants surely start the OS in non-paged protected
mode. Of course, UEFI is completely different (because it wants
itself to run in 64-bit mode).

Jan


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 15:57     ` Andrew Cooper
@ 2015-06-11  8:23       ` Roger Pau Monné
  0 siblings, 0 replies; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-11  8:23 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Tim Deegan,
	Jan Beulich, Boris Ostrovsky

El 10/06/15 a les 17.57, Andrew Cooper ha escrit:
>>>> Do we want to keep using the start_info page? Most of the fields there 
>>>> are not relevant for auto-translated guests, but without it we have to 
>>>> figure out how to pass the following information to the guest:
>>>>
>>>>  - Flags: SIF_xxx flags, this could probably be done with cpuid instead.
>>>>  - cmd_line: ?
>>>>  - console mfn: ?
>>>>  - console evtchn: ?
>>>>  - console_info address: ?
>>> All console information should be available from the HVMPARAMS.  I see
>>> no reason to prevent a PVH guest getting at these.
>>>
>>> This just leaves the command line being awkward.
>> I've forgot to add the kernel payload (initramfs), which could also be
>> fetched using a HVMPARAM, so that just leaves the cmd_line, which could
>> be passed as a physical memory address in one of the gp registers. I
>> don't have a strong opinion on whether we should create a new
>> hvm_start_info struct that contains those, or whether we should just add
>> new HVMPARAMS.
> 
> I should have remembered as well, as I have a curiously-pvh-like usecase
> which wants similar bits.
> 
> The reason I suggested multiboot was for modules and command line
> support.  If we are not going for exactly multiboot, but something
> similar, we might want to make a pvh_start_info with a module list and
> cmdline pointer in it.

I don't think we can go for multiboot as-is. It implicitly requires an
ELF32 binary unless you want to set up the address header of multiboot,
and even in that case it lacks a proper way to load a symtab/strtab,
for example, which the generic Xen ELF loader has.

Roger.


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 21:31   ` Andrew Cooper
@ 2015-06-11  8:31     ` Roger Pau Monné
  0 siblings, 0 replies; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-11  8:31 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Tim Deegan,
	Jan Beulich, xen-devel, Boris Ostrovsky

El 10/06/15 a les 23.31, Andrew Cooper ha escrit:
> On 10/06/2015 19:55, Konrad Rzeszutek Wilk wrote:
>>> All other processor registers and flag bits are undefined. The OS is in 
>>> charge of setting up it's own stack, GDT and IDT.
>>>
>>> Note that the boot protocol resembles the multiboot1 specification, 
>>> this is done so OSes with multiboot1 entry points can reuse those if 
>>> desired. Also note that the processor starts with paging disabled, 
>>> which means that all the memory addresses in the start_info page will 
>>> be physical memory addresses.
>> Wow?! Pagetables disabled?! Why? Usually boot loaders start with some
>> pagetables setup for the OS - to cover at least the kernel and the
>> ramdisk. Either it being in 1-1 pagetables or such.
>>
>> Why make this work harder for the guest?
>> Why can't the hypervisor setup most of these things for the guest?
> 
> If you start with paging enabled, the domain builder has to know the
> intended runmode and paging details a priori. 

That's not 100% true: since we agreed to always launch the guest in
32bit protected mode, we could create some simple 32bit page tables
without PAE.

IMHO it's not worth it. It's very unlikely that the page tables we build
are going to be suitable for the guest OS, so the guest needs to build
its own page tables anyway.
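
To illustrate what such "simple" tables could be, here is a sketch in C
that fills a non-PAE page directory identity-mapping the whole 4GiB.
It assumes 4MiB pages (which would need CR4.PSE set, an assumption that
is not part of the draft), and the names are made up:

```c
#include <stdint.h>

#define PDE_PRESENT (1u << 0)
#define PDE_RW      (1u << 1)
#define PDE_PS      (1u << 7)    /* 4MiB page, requires CR4.PSE */
#define PD_ENTRIES  1024u        /* 1024 entries x 4MiB = 4GiB */

/*
 * Fill a page directory so that every virtual address maps to the same
 * physical address. On real hardware the array must be 4KiB aligned
 * before its address is loaded into cr3.
 */
void build_identity_pd(uint32_t pd[PD_ENTRIES])
{
    for (uint32_t i = 0; i < PD_ENTRIES; i++)
        pd[i] = (i << 22) | PDE_PRESENT | PDE_RW | PDE_PS;
}
```

Even this minimal layout bakes in choices (page size, no PAE, everything
writable) that a guest would most likely want to redo.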

> From a 32bit flat entry, it is 0x40 bytes worth of instructions for
> 32bit, or 0x4f bytes worth of instructions for 64bit to get set up in
> the desired paging mode, with a usable stack (and I could definitely
> reduce those numbers, at the expense of readability). (TODO - find
> enough free time to finish the test framework and publish it.)  The
> point is that it is not hard at all, but offers substantially more
> flexibility to both host and guest software.

I have the following trampoline which seems to work fine on a FreeBSD
64bit kernel (which is an ELF64 binary itself):

https://people.freebsd.org/~royger/xen-locore32.S

(I've also hacked the HVM builder code in libxc in order to load a
kernel directly instead of the hvmloader image, but that's too dirty to
post here).

Roger.


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 18:55 ` Konrad Rzeszutek Wilk
  2015-06-10 21:31   ` Andrew Cooper
  2015-06-11  7:18   ` Jan Beulich
@ 2015-06-11  8:43   ` Roger Pau Monné
  2015-06-12 13:23     ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2015-06-11  8:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, Jan Beulich, xen-devel, Boris Ostrovsky

El 10/06/15 a les 20.55, Konrad Rzeszutek Wilk ha escrit:
> On Wed, Jun 10, 2015 at 02:34:00PM +0200, Roger Pau Monné wrote:
>> Hello,
>>
>> The discussion in [1] lead to an agreement of the missing pieces in PVH 
>> (or HVM without a device-model) in order to progress with it's 
>> implementation.
>>
>> One of the missing pieces is a new boot ABI, that replaces the PV boot 
>> ABI. The aim of this new boot ABI is to remove the limitations of the 
> 
> To be fair, there is an existing boot ABI.
> 
> It is the same as the PV boot but since it is an PV autotranslated
> guest some of the values that an PV guest require are undefined.
> 
> With that in mind, why cannot we re-use that (xen_start_info) and
> any field which is PV specific can be treated as reserved?

IMHO I would rather get rid of start_info and fetch everything using
HVMPARAMS instead, which is more similar to what ARM guests already do.
This means we can get rid of start_info in the long run, and that we
don't paint ourselves into a corner: HVMPARAMS can always be expanded
without problems.

>> PV boot ABI, that are no longer present when using auto-translated 
>> guests. The new boot protocol should allow to use the same entry point 
>> for both 32bit and 64bit guests, and let the guest choose it's bitness 
>> at run time without the domain builder knowing in advance.
> 
> I like that idea, but that will make the work going forward
> on the 32-bit PVH and AMD PVH move out at least another half year
> - which is rather sad.
> 
> Also this change will require modifying the Linux 64-bit PVH
> part. That should be mentioned - and that is likely going to
> take also three months.
> 
> 
>>
>> Roger.
>>
>> [1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html
>>
>> ---
>> HVM direct boot ABI
>>
>> Since the Xen entry point into the kernel can be different from the 
>> native entry point, ELFNOTES are used in order to tell the domain 
>> builder how to load and jump into the kernel entry point. At least the 
>> following ELFNOTES are required in order to use this boot ABI:
>>
>> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
>> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
>> ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
>> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
>> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_ENTRY,    .quad,  xen_start32)
>> ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
> 
> That will choke on older hypervisors. That is a normal PV
> guest won't boot anymore. That is because the older hypervisors
> will choke on 'hvm_callback_vector' being in the XEN_ELFNOTE_FEATURES.

I see, this is what FreeBSD currently uses. We are going to choke on
older hypervisors anyway, since FreeBSD only supports PVH.

> You have to stick that in XEN_ELFNOTE_SUPPORTED_FEATURES field.
> 
>> ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")
>>
>> The first three notes contain information about the guest kernel and 
>> the Xen hypercall ABI version. The following notes are of special 
>> interest:
>>
>>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
>>    actual required physical address.
>>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
> 
> Is 'P' suppose to be 'physical' ?
> 
> I am not sure how this will work with an ELF 64-bit binary like
> the Linux kernel. Usually we use the virtual address but with
> us starting in 32-bit mode with an 64-bit virtual address won't work.

That's why we are defining a new entry point instead of reusing the
current XEN_ELFNOTE_ENTRY note. This entry point is expected to be a
32bit physical address.
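
A loader-side check for that constraint could look roughly like the
sketch below. The note header layout (namesz, descsz, type, then name
and descriptor each padded to 4 bytes) is the standard ELF one; the
function name is made up, and the note type is passed in as a parameter
because no number has been assigned to the proposed note yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF note header: three 32-bit words. */
struct elf_note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/*
 * Parse a single "Xen" note holding a .quad descriptor and accept it
 * only if the value fits in 32 bits. Assumes the note bytes are in the
 * host's byte order (little-endian on x86).
 */
bool xen_note_entry32(const uint8_t *buf, size_t len,
                      uint32_t want_type, uint32_t *entry)
{
    struct elf_note_hdr hdr;
    uint64_t val;
    size_t name_off = sizeof(hdr), desc_off;

    if (len < sizeof(hdr))
        return false;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.type != want_type || hdr.namesz != 4 ||
        hdr.descsz != sizeof(val))
        return false;

    /* The descriptor starts after the name, rounded up to 4 bytes. */
    desc_off = name_off + ((hdr.namesz + 3u) & ~3u);
    if (len < desc_off + sizeof(val))
        return false;
    if (memcmp(buf + name_off, "Xen", 4) != 0)
        return false;

    memcpy(&val, buf + desc_off, sizeof(val));
    if (val > UINT32_MAX)
        return false;        /* not reachable from 32bit protected mode */
    *entry = (uint32_t)val;
    return true;
}
```

A 64bit virtual address, such as the one typically found in
XEN_ELFNOTE_ENTRY for a 64bit kernel, would be rejected by the last
check.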

> But the ELF loader could figure out the offset of the virtual
> address from the ELF starting point and just call at the delta - in
> which case having XEN_ELFNOTE_ENTRY can be used with the
> understanding that we will just call at that that offset.

I'm not following you here, I don't think it's possible to reuse the
same entry point, that's why this new ELFNOTE is proposed.

Roger.



* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-10 13:15 ` Jan Beulich
  2015-06-10 14:53   ` Roger Pau Monné
  2015-06-10 15:42   ` Roger Pau Monné
@ 2015-06-11 11:01   ` Tim Deegan
  2 siblings, 0 replies; 17+ messages in thread
From: Tim Deegan @ 2015-06-11 11:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	xen-devel, Boris Ostrovsky, Roger Pau Monné

At 14:15 +0100 on 10 Jun (1433945712), Jan Beulich wrote:
> >>> On 10.06.15 at 14:34, <roger.pau@citrix.com> wrote:
> >  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
> >    actual required physical address.
> 
> Why would that be needed? I.e. why would there ever be an offset?

I had the same question -- given that ELF provides physical load
addresses the obvious thing to do here is load at the specified
paddrs.

The example FreeBSD headers elsewhere in the thread just have paddr ==
vaddr, which is clearly not the case in a real machine, so we ought
to be able to use those fields for their intended purpose.

> >  * ds, es, fs, gs, ss: must be a 32-bit read/write data segment with an
> >    offset of ‘0’ and a limit of ‘0xFFFFFFFF’. The exact values are all
> >    undefined.
> 
> Same here, plus I don't think fs and gs should be defined to have any
> particular value, base, limit, or attributes (such that handing off with
> them holding nul selectors would become acceptable).

Given that this is just following the multiboot spec, and that once
you've set %ds, setting the others is effectively free, I think we
should set all of them.  

In fact, going further, I think we should just include the multiboot
spec by reference, and specify the changes that we will make, e.g.:

- different magic number, so the guest can tell what's going on
- allowing ELF64 binaries as well as ELF32
- how to get at boot info (i.e. cpuid -> hypercall table -> hvm params).
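
The first step of that chain can be sketched as below. The signature
string "XenVMMXenVMM" and its ebx/ecx/edx ordering are Xen's existing
CPUID convention; the function name is made up, and the sketch takes the
register values as arguments instead of executing cpuid so that it can
be compiled anywhere. It assumes a little-endian host.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare the registers returned by the hypervisor CPUID base leaf
 * against the Xen signature, which is spread over ebx, ecx and edx in
 * that order.
 */
bool is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    uint32_t regs[3] = { ebx, ecx, edx };
    char sig[12];

    /* On a little-endian host the bytes come out in string order. */
    memcpy(sig, regs, sizeof(sig));
    return memcmp(sig, "XenVMMXenVMM", sizeof(sig)) == 0;
}
```

The remaining steps (installing the hypercall page and querying the HVM
params) are left out of this sketch.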

Cheers,

Tim.


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-11  8:43   ` Roger Pau Monné
@ 2015-06-12 13:23     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-12 13:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, Jan Beulich, xen-devel, Boris Ostrovsky

On Thu, Jun 11, 2015 at 10:43:08AM +0200, Roger Pau Monné wrote:
> El 10/06/15 a les 20.55, Konrad Rzeszutek Wilk ha escrit:
> > On Wed, Jun 10, 2015 at 02:34:00PM +0200, Roger Pau Monné wrote:
> >> Hello,
> >>
> >> The discussion in [1] lead to an agreement of the missing pieces in PVH 
> >> (or HVM without a device-model) in order to progress with it's 
> >> implementation.
> >>
> >> One of the missing pieces is a new boot ABI, that replaces the PV boot 
> >> ABI. The aim of this new boot ABI is to remove the limitations of the 
> > 
> > To be fair, there is an existing boot ABI.
> > 
> > It is the same as the PV boot but since it is an PV autotranslated
> > guest some of the values that an PV guest require are undefined.
> > 
> > With that in mind, why cannot we re-use that (xen_start_info) and
> > any field which is PV specific can be treated as reserved?
> 
> IMHO I would rather get rid of start_info and fetch everything using
> HVMPARAMS instead, this is more similar to what ARM guests already do.
> This means we can get rid of start_info in the long run, and that we
> don't paint ourselves into a corner, HVMPARAMS can always be expanded
> without problems.
> 
> >> PV boot ABI, that are no longer present when using auto-translated 
> >> guests. The new boot protocol should allow to use the same entry point 
> >> for both 32bit and 64bit guests, and let the guest choose it's bitness 
> >> at run time without the domain builder knowing in advance.
> > 
> > I like that idea, but that will make the work going forward
> > on the 32-bit PVH and AMD PVH move out at least another half year
> > - which is rather sad.
> > 
> > Also this change will require modifying the Linux 64-bit PVH
> > part. That should be mentioned - and that is likely going to
> > take also three months.
> > 
> > 
> >>
> >> Roger.
> >>
> >> [1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html
> >>
> >> ---
> >> HVM direct boot ABI
> >>
> >> Since the Xen entry point into the kernel can be different from the 
> >> native entry point, ELFNOTES are used in order to tell the domain 
> >> builder how to load and jump into the kernel entry point. At least the 
> >> following ELFNOTES are required in order to use this boot ABI:
> >>
> >> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,       .asciz, "FreeBSD")
> >> ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, __XSTRING(__FreeBSD_version))
> >> ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,    .asciz, "xen-3.0")
> >> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
> >> ELFNOTE(Xen, XEN_ELFNOTE_PADDR_ENTRY,    .quad,  xen_start32)
> >> ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
> > 
> > That will choke on older hypervisors. That is a normal PV
> > guest won't boot anymore. That is because the older hypervisors
> > will choke on 'hvm_callback_vector' being in the XEN_ELFNOTE_FEATURES.
> 
> I see, this is what FreeBSD currently uses. We are going to choke on
> older hypervisors anyway, since FreeBSD only supports PVH.

Linux would choke too. I would like Linux (next-version) to still
work on Amazon installations that use an older hypervisor.

> 
> > You have to stick that in XEN_ELFNOTE_SUPPORTED_FEATURES field.
> > 
> >> ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz, "generic")
> >>
> >> The first three notes contain information about the guest kernel and 
> >> the Xen hypercall ABI version. The following notes are of special 
> >> interest:
> >>
> >>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
> >>    actual required physical address.
> >>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
> > 
> > Is 'P' suppose to be 'physical' ?
> > 
> > I am not sure how this will work with an ELF 64-bit binary like
> > the Linux kernel. Usually we use the virtual address but with
> > us starting in 32-bit mode with an 64-bit virtual address won't work.
> 
> That's why we are defining a new entry point instead of reusing the
> current XEN_ELFNOTE_ENTRY note. This entry point is expected to be a
> 32bit physical address.

Thanks, Jan clued me in.
> 
> > But the ELF loader could figure out the offset of the virtual
> > address from the ELF starting point and just call at the delta - in
> > which case having XEN_ELFNOTE_ENTRY can be used with the
> > understanding that we will just call at that that offset.
> 
> I'm not following you here, I don't think it's possible to reuse the
> same entry point, that's why this new ELFNOTE is proposed.

<nods>
> 
> Roger.
> 


* Re: [Draft A] Boot ABI for HVM guests without a device-model
  2015-06-11  7:18   ` Jan Beulich
@ 2015-06-12 13:30     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-12 13:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Elena Ufimtseva, Ian Campbell, Stefano Stabellini, Andrew Cooper,
	Tim Deegan, xen-devel, Boris Ostrovsky, roger.pau

On Thu, Jun 11, 2015 at 08:18:27AM +0100, Jan Beulich wrote:
> >>> On 10.06.15 at 20:55, <konrad.wilk@oracle.com> wrote:
> > On Wed, Jun 10, 2015 at 02:34:00PM +0200, Roger Pau Monné wrote:
> >> The first three notes contain information about the guest kernel and 
> >> the Xen hypercall ABI version. The following notes are of special 
> >> interest:
> >> 
> >>  * XEN_ELFNOTE_PADDR_OFFSET: the offset of the ELF paddr field from the
> >>    actual required physical address.
> >>  * XEN_ELFNOTE_PADDR_ENTRY: the 32bit entry point into the kernel.
> > 
> > Is 'P' suppose to be 'physical' ?
> > 
> > I am not sure how this will work with an ELF 64-bit binary like
> > the Linux kernel. Usually we use the virtual address but with
> > us starting in 32-bit mode with an 64-bit virtual address won't work.
> 
> So first you correctly decode the 'P' as physical, and then you're
> concerned about _virtual_ addresses? The Linux ELF PHDR has
> perfectly valid virtual _and_ physical addresses in it afaict.
> 
> >> Note that the boot protocol resembles the multiboot1 specification, 
> >> this is done so OSes with multiboot1 entry points can reuse those if 
> >> desired. Also note that the processor starts with paging disabled, 
> >> which means that all the memory addresses in the start_info page will 
> >> be physical memory addresses.
> > 
> > Wow?! Pagetables disabled?! Why? Usually boot loaders start with some
> > pagetables setup for the OS - to cover at least the kernel and the
> > ramdisk. Either it being in 1-1 pagetables or such.
> 
> Mind pointing out which boot loaders you think about here? Both
> multiboot variants surely start the OS in non-paged protected
> mode. Of course, UEFI is completely different (because it wants
> itself to run in 64-bit mode).

I was thinking of UEFI. Other ones as you pointed out are more
primitive.
> 
> Jan
