xen-devel.lists.xenproject.org archive mirror
* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] <20160406024027.GX1990@wotan.suse.de>
@ 2016-04-06  9:40 ` David Vrabel
  2016-04-06 11:07 ` George Dunlap
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: David Vrabel @ 2016-04-06  9:40 UTC (permalink / raw)
  To: Luis R. Rodriguez, Andrew Cooper, Boris Ostrovsky,
	Roger Pau Monné,
	Matt Fleming, Juergen Gross, Charles Arndol, Jim Fehlig,
	Jan Beulich, Daniel Kiper, H. Peter Anvin, x86
  Cc: Stefano Stabellini, linux-kernel, Michael Chang, Andy Lutomirski,
	joeyli, Julien Grall, Vojtěch Pavlík, Borislav Petkov,
	xen-devel, Gary Lin, Jeffrey Cheung

On 06/04/16 03:40, Luis R. Rodriguez wrote:
> 
>     * You don't need full EFI emulation

I think needing any EFI emulation inside Xen (which is where it would
need to be for dom0) is not suitable because of the increase in
hypervisor ABI.

I also still do not understand your objection to the current tiny stub.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] <20160406024027.GX1990@wotan.suse.de>
  2016-04-06  9:40 ` HVMLite / PVHv2 - using x86 EFI boot entry David Vrabel
@ 2016-04-06 11:07 ` George Dunlap
  2016-04-06 11:11 ` Daniel Kiper
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-06 11:07 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Michael Chang, Julien Grall, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, Jim Fehlig, joeyli,
	Borislav Petkov, Boris Ostrovsky, Juergen Gross, Andrew Cooper,
	Linux Kernel Mailing List, Andy Lutomirski, David Vrabel

On Wed, Apr 6, 2016 at 3:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> A huge summary of the discussion over EFI boot option for HVMLite is now on a
> wiki [2], below I'll just provide the outline of the discussion. Consider this a
> request for more public review, feel free to take any of the items below and
> elaborate on it as you see fit.
[snip]
>   * Issues with boot x86 boot entries
>     * Small x86 zero page stubs
[snip]
>   * Points against using EFI
>     * Nulling the claimed boot loader effect

I'm a bit confused about this. You list exactly two arguments against
the proposed stub in the "con" section:
1. Bootloaders may not be able to use the extra entry point
2. It's an extra entry point

And then later, in another section, you actually list the reason #1 is
irrelevant: bootloaders don't matter because the stub is there to boot
from the Xen hypervisor.

So the only actual argument you have against the proposed PVH stub in
the linked document is that it's an extra entry point.

>   * Why use EFI for HVMlite
>     * EFI calling conventions are standardized
>     * EFI entry generalizes what new HVMLite entry proposes
>     * Further semantics may be needed
>     * Match Xen ARM's clean solution
>     * You don't need full EFI emulation
>       * Minimal EFI stubs for guests
>         * GetMemoryMap()
>         * ExitBootServices()
>       * EFI stubs which may be needed for guests
>         * Exit()
>         * Variable operation functions
>       * EFI stubs not needed for guests
>         * GetTime()/SetTime()
>         * SetVirtualAddressMap()
>         * ResetSystem()
>       * dom0 EFI
>       * domU EFI emulation possibilities
>         * Xen implements its own EFI environment for guests
>         * Xen uses Tianocore / OVMF
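
The three tiers in the outline above can be written down as a small lookup
table. This is only an illustrative sketch in Python: the tier names and the
classify() and must_emulate() helpers are hypothetical, and spelling out
"Variable operation functions" as GetVariable / GetNextVariableName /
SetVariable is an assumption about what the outline means.

```python
from typing import Optional

# Hypothetical tier names; service names follow the quoted outline.
EFI_SERVICE_TIERS = {
    "minimal": ("GetMemoryMap", "ExitBootServices"),
    # Assumption: "Variable operation functions" read as these three services.
    "may_be_needed": ("Exit", "GetVariable", "GetNextVariableName", "SetVariable"),
    "not_needed": ("GetTime", "SetTime", "SetVirtualAddressMap", "ResetSystem"),
}


def classify(service: str) -> Optional[str]:
    """Return the tier a service falls into, or None if the outline omits it."""
    for tier, services in EFI_SERVICE_TIERS.items():
        if service in services:
            return tier
    return None


def must_emulate(service: str) -> bool:
    """True only for services the outline lists as the minimal set."""
    return classify(service) == "minimal"
```

A service the outline does not mention (LoadImage, say) comes back as None
here, which is exactly the kind of gap the thread is arguing about.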

So rather than make a new entry point which does just the minimal
amount of work to run on a software interface (Xen), you want to take
an interface designed for hardware (EFI) and put in hacks so that it
knows that sometimes some EFI services are not available?  That sounds
like it's going to make the EFI path just as unmanageable as the
current PV path.

Using the EFI entry point would certainly make sense if it was
actually simpler than the proposed extra entry point.  But it sounds
like it's going to be more complicated, not only for Xen, but also for
Linux.

 -George


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] <20160406024027.GX1990@wotan.suse.de>
  2016-04-06  9:40 ` HVMLite / PVHv2 - using x86 EFI boot entry David Vrabel
  2016-04-06 11:07 ` George Dunlap
@ 2016-04-06 11:11 ` Daniel Kiper
       [not found] ` <CAFLBxZbRjB6QWH5GbG6osCXat9NQVUAyDYrAMrdALbCofpX3Dg@mail.gmail.com>
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Daniel Kiper @ 2016-04-06 11:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, linux-kernel, Julien Grall, Jan Beulich,
	H. Peter Anvin, x86, Vojtěch Pavlík, Gary Lin,
	xen-devel, Jeffrey Cheung, Charles Arndol, Stefano Stabellini,
	Jim Fehlig, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Michael Chang, Andy Lutomirski,
	david.vrabel, Roger Pau Monné

On Wed, Apr 06, 2016 at 04:40:27AM +0200, Luis R. Rodriguez wrote:
> Boris sent out the first HVMLite series of patches to add a new Xen guest type
> February 1, 2016 [0]. We've been talking off list with a few folks now over
> the prospect of instead of adding yet-another-boot-entry we instead fixate
> HVMLite to use the x86 EFI boot entry. There's a series of reasons to consider
> this, likewise there are reasons to question the effort required and if it's
> really needed. We'd like some more public review of this proposal, and see if
> others can come up with other ideas, both in favor of and against this proposal.
>
> This in particular is also a good time to get x86 Linux folks to chime in on
> the general HVMLite design proposal, given that outside of the boot
> entry discussion it would seem that we, myself included, didn't get the memo
> over the proposed architecture review [1]. For my part, perhaps the only
> sticking points of the design were the new boot entry, which came to me
> as a surprise and which this thread addresses, and the lack of semantics
> for early boot (which we may need to address; some of this is being
> addressed in parallel through other work). The HVMLite document talks about
> using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
> changes which should make it easy to integrate its use on HVMLite. Perhaps
> there are others that may have some other points they may want to raise now...
>
> A huge summary of the discussion over EFI boot option for HVMLite is now on a
> wiki [2], below I'll just provide the outline of the discussion. Consider this a
> request for more public review, feel free to take any of the items below and
> elaborate on it as you see fit.
>
> Worth mentioning also is that this topic will be discussed at the 2016 Xen
> Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters so if you can
> attend and this topic interests you, consider attending.

I hope that you will be there as one of the biggest proponents of the EFI entry point.
If you are not, it will be difficult or impossible to discuss this issue.
In the worst case I can raise this topic on your behalf and then we should organize
a phone call if possible (and accepted by others). However, to do that I must know your
plans in advance.

Daniel


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] ` <CAFLBxZbRjB6QWH5GbG6osCXat9NQVUAyDYrAMrdALbCofpX3Dg@mail.gmail.com>
@ 2016-04-06 15:02   ` Matt Fleming
  2016-04-07 18:51   ` Luis R. Rodriguez
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Matt Fleming @ 2016-04-06 15:02 UTC (permalink / raw)
  To: George Dunlap
  Cc: Michael Chang, Linux Kernel Mailing List, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Vojtěch Pavlík, Gary Lin,
	xen-devel, Jeffrey Cheung, Charles Arndol, Stefano Stabellini,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper, Jim Fehlig, Andy Lutomirski, Luis R. Rodriguez,
	David Vrabel

On Wed, 06 Apr, at 12:07:36PM, George Dunlap wrote:
> 
> So rather than make a new entry point which does just the minimal
> amount of work to run on a software interface (Xen), you want to take
> an interface designed for hardware (EFI) and put in hacks so that it
> knows that sometimes some EFI services are not available?  That sounds
> like it's going to make the EFI path just as unmanageable as the
> current PV path.
 
Requiring code in the new entry point to manipulate control registers
and do the switch to long-mode does not seem like a minimal amount of
code to me,

  http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00134.html

What's likely to happen in the future is that startup_(32|64) will be
entered with different settings depending on whether coming from
HVMlite or bare metal, due to the natural tendency for these kinds of
code paths to diverge.

Sometimes EFI runtime services are not available on bare metal
hardware too, for example, when booting 32-bit kernels on 64-bit EFI
or 64-bit kernels on 32-bit EFI without CONFIG_EFI_MIXED. Or when
booting with the "noefi" kernel command line parameter. That's how
things work today when booting Xen, we disable the runtime services.
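
A rough sketch of the cases listed above, in Python for brevity. The function
and parameter names are made up for illustration and do not correspond to
kernel symbols, and treating CONFIG_EFI_MIXED as covering only the
64-bit-kernel-on-32-bit-firmware case is an assumption about the sentence
above, not something it states.

```python
def runtime_services_available(kernel_bits: int, firmware_bits: int,
                               efi_mixed: bool = False,
                               noefi: bool = False) -> bool:
    """Simplified model of when EFI runtime services are usable."""
    if noefi:
        # "noefi" on the kernel command line turns them off outright.
        return False
    if kernel_bits == firmware_bits:
        return True
    # Assumption: mixed mode only rescues a 64-bit kernel on 32-bit firmware.
    return kernel_bits == 64 and firmware_bits == 32 and efi_mixed
```

The point of the model is only that callers already have to cope with a
"False" answer on bare metal, independently of Xen.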

EFI boot services are a different story however, and the EFI boot stub
would need to be changed to handle that. Though honestly, it would
make more sense to provide EFI service stubs in the kernel image
itself, implemented using hypercalls, assuming you can run
hypercalls that early in boot.

One place that struck me as suitable for this "hypercall in an EFI
service stub" approach is the trouble with doing ACPI reboot as
documented here,

  http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html

Performing the reset hypercall from within HVMlite's custom EfiReset()
service would avoid having to touch ACPICA at all, and would be
indistinguishable from bare metal.
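
As a minimal sketch of the "hypercall in an EFI service stub" idea, assuming a
runtime table where the reset service is backed by a hypercall and the other
services simply report "unsupported": everything here is hypothetical,
including the table layout, the make_runtime_table() helper and the
("shutdown", "reboot") labels, and the hypervisor is replaced by a recorder so
the example can run anywhere. Only the EFI_SUCCESS and EFI_UNSUPPORTED values
follow the UEFI status code encoding for 64-bit platforms.

```python
EFI_SUCCESS = 0
EFI_UNSUPPORTED = (1 << 63) | 3  # error bit set, code 3 (64-bit encoding)


class HypercallLog:
    """Stand-in for the hypervisor: records hypercalls instead of issuing them."""

    def __init__(self):
        self.calls = []

    def __call__(self, op, arg):
        self.calls.append((op, arg))
        return 0


def make_runtime_table(hypercall):
    """Build a hypothetical runtime-services table for an HVMlite guest."""

    def reset_system(reset_type):
        # A real stub would not return; the recorder lets us observe the call.
        hypercall("shutdown", "reboot")  # hypothetical labels, not Xen ABI names
        return EFI_SUCCESS

    def unsupported(*_args):
        # Services the guest does not get simply fail cleanly.
        return EFI_UNSUPPORTED

    return {
        "ResetSystem": reset_system,
        "GetTime": unsupported,
        "SetTime": unsupported,
        "SetVirtualAddressMap": unsupported,
    }
```

A caller would invoke table["ResetSystem"]("cold") exactly as it would on bare
metal; whether hypercalls are usable that early in boot is the open question
raised above.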

> Using the EFI entry point would certainly make sense if it was
> actually simpler than the proposed extra entry point.  But it sounds
> like it's going to be more complicated, not only for Xen, but also for
> Linux.

Until someone sits down and writes the code I think we're going to be
arguing back and forth over this particular point.


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]   ` <20160406150240.GO2701@codeblueprint.co.uk>
@ 2016-04-06 16:05     ` Konrad Rzeszutek Wilk
  2016-04-06 16:23       ` Konrad Rzeszutek Wilk
  2016-04-13 10:03     ` Roger Pau Monné
       [not found]     ` <20160413100312.647eocdtbmak4btk@mac>
  2 siblings, 1 reply; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-06 16:05 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Michael Chang, Jim Fehlig, Jan Beulich, H. Peter Anvin,
	Daniel Kiper, the arch/x86 maintainers, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross, Julien Grall,
	Stefano Stabellini, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper,
	Linux Kernel Mailing List, Andy Lutomirski, Luis R. Rodriguez

On Wed, Apr 06, 2016 at 04:02:40PM +0100, Matt Fleming wrote:
> On Wed, 06 Apr, at 12:07:36PM, George Dunlap wrote:
> > 
> > So rather than make a new entry point which does just the minimal
> > amount of work to run on a software interface (Xen), you want to take
> > an interface designed for hardware (EFI) and put in hacks so that it
> > knows that sometimes some EFI services are not available?  That sounds
> > like it's going to make the EFI path just as unmanageable as the
> > current PV path.
>  
> Requiring code in the new entry point to manipulate control registers
> and do the switch to long-mode does not seem like a minimal amount of
> code to me,
> 
>   http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00134.html
> 
> What's likely to happen in the future is that startup_(32|64) will be
> entered with different settings depending on whether coming from
> HVMlite or bare metal, due to the natural tendency for these kinds of
> code paths to diverge.

I hope they do not have the same churn as the rest of Linux code.

The startup_(32|64) entry points are called from different
bootloaders, and those bootloaders are responsible for setting the stage. In other
words, startup_(32|64) has some expectations of what the world
will look like. Changing those means the bootloader stubs have to change
too.

But if there is churn, it is surely less than what the PV code paths
are enforcing now in x86 code.

And it is in assembler, so only a few folks would venture into it.


> 
> Sometimes EFI runtime services are not available on bare metal
> hardware too, for example, when booting 32-bit kernels on 64-bit EFI
> or 64-bit kernels on 32-bit EFI without CONFIG_EFI_MIXED. Or when
> booting with the "noefi" kernel command line parameter. That's how
> things work today when booting Xen, we disable the runtime services.

Why? You can use GRUB2+EFI+MB2 and boot with EFI boot services..
Or boot Xen as an EFI application.
> 
> EFI boot services are a different story however, and the EFI boot stub
> would need to be changed to handle that. Though honestly, it would
> make more sense to provide EFI services stubs in the kernel image
> itself that are implemented using hypercalls, and assuming you can run
> hypercalls that early in boot.
> 
> One place that struck me as suitable for this "hypercall in an EFI
> service stub" approach is the trouble with doing ACPI reboot as
> documented here,
> 
>   http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html
> 
> Performing the reset hypercall from within HVMlite's custom EfiReset()
> service would avoid having to touch ACPICA at all, and would be
> indistinguishable from bare metal.
> 
> > Using the EFI entry point would certainly make sense if it was
> > actually simpler than the proposed extra entry point.  But it sounds
> > like it's going to be more complicated, not only for Xen, but also for
> > Linux.
> 
> Until someone sits down and writes the code I think we're going to be
> arguing back and forth over this particular point.

.. Pragmatic! I like that!


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-06 16:05     ` Konrad Rzeszutek Wilk
@ 2016-04-06 16:23       ` Konrad Rzeszutek Wilk
  2016-04-08 21:53         ` Luis R. Rodriguez
  0 siblings, 1 reply; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-06 16:23 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Michael Chang, Jim Fehlig, Jan Beulich, H. Peter Anvin,
	Daniel Kiper, the arch/x86 maintainers, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross, Julien Grall,
	Stefano Stabellini, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper,
	Linux Kernel Mailing List, Andy Lutomirski, Luis R. Rodriguez

On Wed, Apr 06, 2016 at 12:05:16PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 06, 2016 at 04:02:40PM +0100, Matt Fleming wrote:
> > On Wed, 06 Apr, at 12:07:36PM, George Dunlap wrote:
> > > 
> > > So rather than make a new entry point which does just the minimal
> > > amount of work to run on a software interface (Xen), you want to take
> > > an interface designed for hardware (EFI) and put in hacks so that it
> > > knows that sometimes some EFI services are not available?  That sounds
> > > like it's going to make the EFI path just as unmanageable as the
> > > current PV path.
> >  
> > Requiring code in the new entry point to manipulate control registers
> > and do the switch to long-mode does not seem like a minimal amount of
> > code to me,
> > 
> >   http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00134.html
> > 
> > What's likely to happen in the future is that startup_(32|64) will be
> > entered with different settings depending on whether coming from
> > HVMlite or bare metal, due to the natural tendency for these kinds of
> > code paths to diverge.
> 
> I hope they do not have the same churn as the rest of Linux code.
> 
> The startup_(32|64) are to be called from divergent
> bootloaders - and they are responsible to set the stage. Or in other
> words - startup_(32|64) has some expectations of what the world
> will look like. Changing those means the bootloaders stub have to change
> too.
> 
> But if there is churn it surely is less than what the PV code paths
> are enforcing now in x86 code.

Let me expand on that since I was not sure if I was clear.

Currently Boris tirelessly ends up fixing Xen-related fallout in almost
every merge window, that is, new functionality that breaks Xen.
He has been doing this for years, and before him I was doing it.

This is what a maintainer does, and with the HVMLite/PVH stub
paths that will still continue: fallout from
startup_(32|64) code changes will be handled as before.

However, the bigger goals are that:
 - this churn will be much, much lower than the existing one,

 - baremetal won't have to deal with some rather odd semantics
   placed by the pvops paths that are funky and drive x86
   maintainers to lose hair (amongst other things).



* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] ` <CAFLBxZbRjB6QWH5GbG6osCXat9NQVUAyDYrAMrdALbCofpX3Dg@mail.gmail.com>
  2016-04-06 15:02   ` Matt Fleming
@ 2016-04-07 18:51   ` Luis R. Rodriguez
       [not found]   ` <20160406150240.GO2701@codeblueprint.co.uk>
       [not found]   ` <20160407185148.GL1990@wotan.suse.de>
  3 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-07 18:51 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Julien Grall, Andy Lutomirski

On Wed, Apr 06, 2016 at 12:07:36PM +0100, George Dunlap wrote:
> On Wed, Apr 6, 2016 at 3:40 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > A huge summary of the discussion over EFI boot option for HVMLite is now on a
> > wiki [2], below I'll just provide the outline of the discussion. Consider this a
> > request for more public review, feel free to take any of the items below and
> > elaborate on it as you see fit.
> [snip]
> >   * Issues with boot x86 boot entries
> >     * Small x86 zero page stubs
> [snip]
> >   * Points against using EFI
> >     * Nulling the claimed boot loader effect
> 
> I'm a bit confused about this. You list exactly two arguments against
> the proposed stub in the "con" section:
> 1. Bootloaders may not be able to use the extra entry point
> 2. It's an extra entry point
> 
> And then later, in another section, you actually list the reason #1 is
> irrelevant: bootloaders don't matter because the stub is there to boot
> from the Xen hypervisor.

Forgive me, the private thread was ongoing and I really wanted to capture
both sides of the expressed arguments and move any extensions
to the discussion to the list. This meant annotating both positions and letting
others fill in the gaps to determine if in fact one position was really
nullified by the other.

First I should state that it is only natural for anyone sensible to have
a knee-jerk reaction, to kick and scream, at the idea of
adding yet another x86 entry point for Linux... IMO one should not expect
people to simply accept yet-another-entry-point to Linux;
rather, the expected behaviour should be for people to really dig
and do their homework, so that if they are going to
add yet-another-entry-point they have validated it and exhausted
review of all possible avenues.

It was Andrew Cooper's position that boot loaders would not need to be
involved, and that would seem to nullify Matt's original position on this.

While Andrew's position is right in that perhaps only Xen tools have to deal
with the HVMLite-specific entry, it would still mean diverging from ARM's
own EFI-entry-only position. I'd like to clarify that ARM has no custom
Xen entry, and we should strive to match that. Anything far from that to me really
deserves an explanation, especially if we are going to argue that HVMLite is
the best that x86 Xen can do.

Ultimately unifying entry approaches for Xen in a streamlined fashion seems
like a sensible thing to strive for. Anything we push in the other direction,
as small as it can be, should deserve at least a 'hey, wait a minute'...

> So the only actual argument you have against the proposed PVH stub in
> the linked document is that it's an extra entry point.

Then you have not really read the document well. More to the point,
EFI's entry already does what the small HVMLite stub does and already
provides an existing entry and path to the kernel, so why should we
add yet another small stub?

Put another way, if the EFI entry already provides a way into Linux
in a more streamlined fashion, bringing it closer to the bare metal
boot entry, why *would* we add another boot entry to x86, even if
it's small and self-contained?

Another position against small stubs which I listed myself is that we may need
more semantics for early boot even if the new HVMLite small stub is added. This
remains to be seen. If we are going to add new semantics, it would seem best to
use something more standard like EFI configuration tables rather than hack
further custom semantics onto x86. Custom sloppy semantics have proven to be
misused, and ultimately became a mess. To take this further,
virtualization semantics are being abused even outside of Xen -- driver
developers may think that just because some semantics are available they can
use them to fine-tune drivers for virtualized environments.
Even the best of our folks have taken positions to claim certain hacks are
*impossible* to change [0], when in fact only 4 days later a completely sensible
replacement was found [1], and this was even outside of Xen's situation, so it's
not only Xen I am careful about here with regards to semantics. If we need early
boot code semantics or general kernel semantics for virtualization, I want to
address that now, and I want to be very careful with that given the abuse.
I'm doing my part to ensure that we clarify sloppy old semantics on Xen [2],
and this effort is actually proving to pave the path for HVMLite. For
instance, consider the gains of leveraging the legacy devices struct
in the future for ACPI_FADT_NO_VGA, which HVMLite's specification seems
to indicate it will use. Clearing out the paravirt_enabled() hack for
pnpbios helped push for the right architectural solution to pave the path
for this in a generic fashion.

[0] http://lkml.kernel.org/r/s5hvb4151v1.wl-tiwai@suse.de
[1] https://www.spinics.net/lists/alsa-devel/msg48627.html
[2] http://lkml.kernel.org/r/1459987594-5434-1-git-send-email-mcgrof@kernel.org

> 
> >   * Why use EFI for HVMlite
> >     * EFI calling conventions are standardized
> >     * EFI entry generalizes what new HVMLite entry proposes
> >     * Further semantics may be needed
> >     * Match Xen ARM's clean solution
> >     * You don't need full EFI emulation
> >       * Minimal EFI stubs for guests
> >         * GetMemoryMap()
> >         * ExitBootServices()
> >       * EFI stubs which may be needed for guests
> >         * Exit()
> >         * Variable operation functions
> >       * EFI stubs not needed for guests
> >         * GetTime()/SetTime()
> >         * SetVirtualAddressMap()
> >         * ResetSystem()
> >       * dom0 EFI
> >       * domU EFI emulation possibilities
> >         * Xen implements its own EFI environment for guests
> >         * Xen uses Tianocore / OVMF
> 
> So rather than make a new entry point which does just the minimal
> amount of work to run on a software interface (Xen), you want to take
> an interface designed for hardware (EFI) and put in hacks so that it
> knows that sometimes some EFI services are not available? 

The purpose of the discussion is to evaluate the EFI entry as a possible
alternative candidate to yet another entry point, from a completely neutral
engineering position.

> That sounds like it's going to make the EFI path just as unmanageable as the
> current PV path.

Can you describe how?

> Using the EFI entry point would certainly make sense if it was
> actually simpler than the proposed extra entry point.  But it sounds
> like it's going to be more complicated, not only for Xen, but also for
> Linux.

How so? Please provide specifics.

  Luis


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] ` <20160406111130.GG3489@olila.local.net-space.pl>
@ 2016-04-07 19:12   ` Luis R. Rodriguez
  2016-04-09 17:02   ` Luis R. Rodriguez
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-07 19:12 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Matt Fleming, Jeff Mahoney, Michael Chang, linux-kernel,
	Julien Grall, Jan Beulich, H. Peter Anvin, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Jim Fehlig,
	Andy Lutomirski, Luis R. Rodriguez, david.vrabel,
	Roger Pau Monné

On Wed, Apr 06, 2016 at 01:11:30PM +0200, Daniel Kiper wrote:
> On Wed, Apr 06, 2016 at 04:40:27AM +0200, Luis R. Rodriguez wrote:
> > Boris sent out the first HVMLite series of patches to add a new Xen guest type
> > February 1, 2016 [0]. We've been talking off list with a few folks now over
> > the prospect of instead of adding yet-another-boot-entry we instead fixate
> > HVMLite to use the x86 EFI boot entry. There's a series of reasons to consider
> > this, likewise there are reasons to question the effort required and if it's
> > really needed. We'd like some more public review of this proposal, and see if
> > others can come up with other ideas, both in favor of and against this proposal.
> >
> > This in particular is also a good time to get x86 Linux folks to chime in on
> > the general HVMLite design proposal, given that outside of the boot
> > entry discussion it would seem that we, myself included, didn't get the memo
> > over the proposed architecture review [1]. For my part, perhaps the only
> > sticking points of the design were the new boot entry, which came to me
> > as a surprise and which this thread addresses, and the lack of semantics
> > for early boot (which we may need to address; some of this is being
> > addressed in parallel through other work). The HVMLite document talks about
> > using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
> > changes which should make it easy to integrate its use on HVMLite. Perhaps
> > there are others that may have some other points they may want to raise now...
> >
> > A huge summary of the discussion over EFI boot option for HVMLite is now on a
> > wiki [2], below I'll just provide the outline of the discussion. Consider this a
> > request for more public review, feel free to take any of the items below and
> > elaborate on it as you see fit.
> >
> > Worth mentioning also is that this topic will be discussed at the 2016 Xen
> > Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters so if you can
> > attend and this topic interests you, consider attending.
> 
> I hope that you will be there as one of the biggest proponents of the EFI entry point.

It would be a last minute trip to prepare for...

> If you are not, it will be difficult or impossible to discuss this issue.
> In the worst case I can raise this topic on your behalf and then we should organize
> a phone call if possible (and accepted by others). However, to do that I must know your
> plans in advance.

I understand. I'd like to make it clear that I am simply taking a neutral position
on this topic, even though it may seem I'm a die-hard on this idea. This was
simply an architectural question that came up, and I have just been
dissatisfied with the answers to the architectural questions I had over
this.

To help better evaluate how neutral a discussion like this can really be,
can someone please chime in on whether there are pressures to
just complete the HVMLite design already? How strong are those? Are we really
able to have a very neutral technical discussion on this?

  Luis


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]   ` <20160407185148.GL1990@wotan.suse.de>
@ 2016-04-08 14:16     ` George Dunlap
       [not found]     ` <5707BD2E.20204@citrix.com>
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-08 14:16 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper, Julien Grall

On 07/04/16 19:51, Luis R. Rodriguez wrote:
> While Andrew's position is right in that perhaps only Xen tools have to deal
> with the HVMLite specific entry, it would also still mean diverging from ARM's
> own EFI entry only position, which I'd like to clarify that ARM has no custom
> Xen entry, we should strive to match that. Anything far from that to me really
> deserves an explanation, especially if we are going to argue that HVMLite is
> the best that x86 Xen can do.
> 
> Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> like a sensible thing to strive for. Anything we push in the other direction,
> as small as it can be, should deserve at least a 'hey, wait a minute'...

Quick factual correction here.

"Since ARM guests only use the EFI entry point, x86 guests should also
only use the EFI entry point" is certainly a reasonable argument to make.

However, dom0 on ARM does not use the EFI entry point.  When starting
dom0, Xen uses the native entry point (the one that UBoot uses) and
hands dom0 a device-tree node.  The reason this is possible on ARM is
that there are no assumptions made about what hardware is or is not
present on the system -- everything that needs to be communicated about
what is or is not present can be passed in DT.

So it is incorrect to say that ARM has an "EFI entry only" position.

(On ACPI systems, it does apparently generate some UEFI informational
tables, which it passes to the dom0 kernel via DT; and the kernel
unpacks and puts in the right place.  Normal Xen ARM guests can use EFI,
but that's because we start OVMF in the guest context to provide the EFI
services.  These may be where the idea that ARM guests use only the UEFI
entry point came from.)

Obviously it would be nice if we could use the native entry point on x86
as well, but there's decades of legacy hardware and backwards
compatibility to deal with there.

(Julien is a Xen ARM maintainer, he can correct me if I've said
something incorrect.)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] ` <5704D978.1050101@citrix.com>
@ 2016-04-08 20:40   ` Luis R. Rodriguez
       [not found]   ` <20160408204032.GR1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-08 20:40 UTC (permalink / raw)
  To: David Vrabel, Stefano Stabellini
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski,
	Luis R. Rodriguez, Linus Torvalds, Roger Pau Monné

On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> > 
> >     * You don't need full EFI emulation
> 
> I think needing any EFI emulation inside Xen (which is where it would
> need to be for dom0) is not suitable because of the increase in
> hypervisor ABI.

Is this because of timing on architecture / design of HVMLite, or
a general position that the complexity to deal with EFI emulation
is too much for Xen's taste ?

ARM already went the EFI entry way for domU -- it went the OVMF route.
Would that be possible for x86 domU HVMLite? If not, why not? It would
seem to make sense to at least mimic the same type of early boot
environment, and perhaps there are some lessons to be learned from that
effort too.

Are there some lessons to be learned from ARM's effort? What are they?
If it could be redone with a cleaner path, what would that look like,
and how could it help the x86 side?

Although emulating EFI may require work, some folks have pointed out
that the amount of work may not be that much. If that is done can
we instead rely on the same code to replace OVMF to support both
Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
this ?

> I also still do not understand your objection to the current tiny stub.

It's more of a hypothetical -- can an EFI entry be used instead, given
it already does exactly what the new small entry does? It's also rather
odd to add a new entry without fully evaluating a possible alternative
that would provide the exact same mechanism.

A full technical unbiased evaluation of the different approaches is what I'd
hope we could strive to achieve through discussion and peer review, thinking
and prioritizing ultimately what is best to minimize the impact on Linux
and also help take advantage of the best features possible through both
means. Thinking long term, not immediate short term.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-06 16:23       ` Konrad Rzeszutek Wilk
@ 2016-04-08 21:53         ` Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-08 21:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Juergen Gross, Julien Grall, Stefano Stabellini, George Dunlap,
	joeyli, Borislav Petkov, Boris Ostrovsky, Charles Arndol,
	Andrew Cooper, Linux Kernel Mailing List, Andy Lutomirski

On Wed, Apr 06, 2016 at 12:23:47PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 06, 2016 at 12:05:16PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 06, 2016 at 04:02:40PM +0100, Matt Fleming wrote:
> > > On Wed, 06 Apr, at 12:07:36PM, George Dunlap wrote:
> > > > 
> > > > So rather than make a new entry point which does just the minimal
> > > > amount of work to run on a software interface (Xen), you want to take
> > > > an interface designed for hardware (EFI) and put in hacks so that it
> > > > knows that sometimes some EFI services are not available?  That sounds
> > > > like it's going to make the EFI path just as unmanageable as the
> > > > current PV path.
> > >  
> > > Requiring code in the new entry point to manipulate control registers
> > > and do the switch to long-mode does not seem like a minimal amount of
> > > code to me,
> > > 
> > >   http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00134.html
> > > 
> > > What's likely to happen in the future is that startup_(32|64) will be
> > > entered with different settings depending on whether coming from
> > > HVMlite or bare metal, due to the natural tendency for these kinds of
> > > code paths to diverge.
> > 
> > I hope they do not have the same churn as the rest of Linux code.
> > 
> > The startup_(32|64) are to be called from divergent
> > bootloaders - and they are responsible to set the stage. Or in other
> > words - startup_(32|64) has some expectations of what the world
> > will look like. Changing those means the bootloaders stub have to change
> > too.
> > 
> > But if there is churn it surely is less than what the PV code paths
> > are enforcing now in x86 code.

It's better for sure. But we can also look at other options to make it
even better. It's worth some review at the very least.

> Let me expand on that since I was not sure if I was clear.
> 
> Currently Boris tirelessly ends up fixing on almost every merge window
> Xen related fallout. That is new functionality that breaks Xen.
> He has been doing this for years and before him I was doing it.

FWIW the work I'm doing with linker tables and x86's use of this on
the boot side of things should help avoid these issues proactively.
Sounds too good to be true? I know. I thought it was rather impossible,
but it's what I've come up with and I think it should really help with
that.

This should help in either case: it lets us proactively avoid these
issues if we want to keep the old PV path for legacy junk, and if we
really want to remove the PV path completely and replace it with
HVMLite, it helps us avoid issues until we are ready to nuke the old
PV path completely.

So let's say we plan to remove the old PV path in 5 years: with the work
I'm doing on the old PV path, we'll have in place a proactive framework
to avoid Xen fallout *now*, while we churn away towards the lofty
HVMLite goals.

> This is what a maintainer does - and with the HVMLite/PVH stub
> paths that will still continue - that is fallout from the
> startup_(32|64) code changes will be handled as before.

Right, that's because we did not have a proactive solution to
the problem.

> However the bigger goals are that:
>  - This churn will be much much lower than the existing one,
> 
>  - baremetal won't have to deal with some rather odd semantics
>    placed by the pvops paths that are funky and drive x86
>    maintainers to lose hair (amongst other things).

Right on. We are all in strong agreement that the old PV path is a
grand piece of fecal matter.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]     ` <5707BD2E.20204@citrix.com>
@ 2016-04-08 21:58       ` Luis R. Rodriguez
       [not found]       ` <20160408215854.GU1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-08 21:58 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper, Julien Grall

On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > While Andrew's position is right in that perhaps only Xen tools have to deal
> > with the HVMLite specific entry, it would also still mean diverging from ARM's
> > own EFI entry only position, which I'd like to clarify that ARM has no custom
> > Xen entry, we should strive to match that. Anything far from that to me really
> > deserves an explanation, especially if we are going to argue that HVMLite is
> > the best that x86 Xen can do.
> > 
> > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > like a sensible thing to strive for. Anything we push in the other direction,
> > as small as it can be, should deserve at least a 'hey, wait a minute'...
> 
> Quick factual correction here.
> 
> "Since ARM guests only use the EFI entry point, x86 guests should also
> only use the EFI entry point" is certainly a reasonable argument to make.
> 
> However, dom0 on ARM does not use the EFI entry point.  When starting
> dom0, Xen uses the native entry point (the one that UBoot uses) and
> hands dom0 a device-tree node.  The reason this is possible on ARM is
> that there are no assumptions made about what hardware is or is not
> present on the system -- everything that needs to be communicated about
> what is or is not present can be passed in DT.
> 
> So it is incorrect to say that ARM has an "EFI entry only" position.
> 
> (On ACPI systems, it does apparently generate some UEFI informational
> tables, which it passes to the dom0 kernel via DT; and the kernel
> unpacks and puts in the right place.  Normal Xen ARM guests can use EFI,
> but that's because we start OVMF in the guest context to provide the EFI
> services.  These may be where the idea that ARM guests use only the UEFI
> entry point came from.)
> 
> Obviously it would be nice if we could use the native entry point on x86
> as well, but there's decades of legacy hardware and backwards
> compatibility to deal with there.

OK thanks for the clarification -- still no custom entries for Xen!
We should strive for that, at the very least.

You do have a point about the legacy stuff. There are two options there:

  * Fold legacy support under HVMLite -- which seems to be what we
    currently want to do (we should evaluate the implications and
    requirements here for that); or

  * Leave legacy stuff on the old PV path; this may be something to
    bring to the table if we had in place a proactive solution to
    avoid further fallout from the architecture of the huge differences
    on the entries. The work I'm doing should help with that. (We should
    also evaluate the implications and requirements here for that as
    well).

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found] ` <20160406111130.GG3489@olila.local.net-space.pl>
  2016-04-07 19:12   ` Luis R. Rodriguez
@ 2016-04-09 17:02   ` Luis R. Rodriguez
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-09 17:02 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, x86, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski,
	Luis R. Rodriguez, david.vrabel, Roger Pau Monné

On Wed, Apr 06, 2016 at 01:11:30PM +0200, Daniel Kiper wrote:
> On Wed, Apr 06, 2016 at 04:40:27AM +0200, Luis R. Rodriguez wrote:
> > Boris sent out the first HVMLite series of patches to add a new Xen guest type
> > February 1, 2016 [0]. We've been talking off list with a few folks now over
> > the prospect of instead of adding yet-another-boot-entry we instead fixate
> > HVMLite to use the x86 EFI boot entry. There's a series of reasons to consider
> > this, likewise there are reasons to question the effort required and if its
> > really needed. We'd like some more public review of this proposal, and see if
> > others can come up with other ideas, both in favor or against this proposal.
> >
> > This in particular is also a good time to get x86 Linux folks to chime in on
> > the general design proposal of HVMLite design, given that outside of the boot
> > entry discussion it would seem including myself that we didn't get the memo
> > over the proposed architecture review [1]. At least on my behalf perhaps the
> > only sticking thorns of the design was the new boot entry, which came to me
> > as a surprise, and this thread addresses and the lack of addressing semantics
> > for early boot (which we may seem to need to address; some of this is being
> > addressed in parallel through other work). The HVMLite document talks about
> > using ACPI_FADT_NO_VGA -- we don't use this yet upstream but I have some pending
> > changes which should make it easy to integrate its use on HVMLite. Perhaps
> > there are others that may have some other points they may want to raise now...
> >
> > A huge summary of the discussion over EFI boot option for HVMLite is now on a
> > wiki [2], below I'll just provide the outline of the discussion. Consider this a
> > request for more public review, feel free to take any of the items below and
> > elaborate on it as you see fit.
> >
> > Worth mentioning also is that this topic will be discussed at the 2016 Xen
> > Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters so if you can
> > attend and this topic interests you, consider attending.
> 
> I hope that you will be there as one of the biggest proponents of EFI entry point.
> If you are not, it will be difficult or impossible to discuss this issue without you.
> In the worst case I can raise this topic on behalf of you and then we should organize
> phone call if possible (and accepted by others). However, to do that I must know your
> plans in advance.

I'll be there!

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]   ` <20160408204032.GR1990@wotan.suse.de>
@ 2016-04-11  5:12     ` Juergen Gross
       [not found]     ` <570B3228.90400@suse.com>
  1 sibling, 0 replies; 68+ messages in thread
From: Juergen Gross @ 2016-04-11  5:12 UTC (permalink / raw)
  To: Luis R. Rodriguez, David Vrabel, Julien Grall, Stefano Stabellini
  Cc: Charles Arndol, xen-devel, Matt Fleming, Andrew Cooper,
	Daniel Kiper, x86, Michael Chang, Andy Lutomirski, joeyli,
	Jim Fehlig, Vojtěch Pavlík, Gary Lin, Jan Beulich,
	H. Peter Anvin, Borislav Petkov, Boris Ostrovsky, Linus Torvalds,
	Jeffrey Cheung, linux-kernel, Roger Pau Monné

On 08/04/16 22:40, Luis R. Rodriguez wrote:
> On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
>> On 06/04/16 03:40, Luis R. Rodriguez wrote:
>>>
>>>     * You don't need full EFI emulation
>>
>> I think needing any EFI emulation inside Xen (which is where it would
>> need to be for dom0) is not suitable because of the increase in
>> hypervisor ABI.
> 
> Is this because of timing on architecture / design of HVMLite, or
> a general position that the complexity to deal with EFI emulation
> is too much for Xen's taste ?

The Xen hypervisor should be as small as possible. Adding an EFI
emulator will be adding quite some code. This should be done after a
very thorough evaluation only.

> ARM already went the EFI entry way for domU -- it went the OVMF route,
> would such a possibility be possible for x86 domU HVMLite ? If not why
> not, I mean it would seem to make sense to at least mimic the same type
> of early boot environment, and perhaps there are some lessons to be
> learned from that effort too.

The final solution must be appropriate for dom0, too. So don't try
to limit the discussion to domU. If dom0 isn't going to be acceptable
there will be no need to discuss domU.

> Are there some lessons to be learned with ARM's effort? What are they?
> If that could be re-done again with any type of cleaner path, what
> could that be that could help the x86 side ?
> 
> Although emulating EFI may require work, some folks have pointed out
> that the amount of work may not be that much. If that is done can
> we instead rely on the same code to replace OVMF to support both
> Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> this ?
> 
>> I also still do not understand your objection to the current tiny stub.
> 
> Its more of a hypothetical -- can an EFI entry be used instead given
> it already does exactly what the new small entry does ? Its also rather
> odd to add a new entry without evaluating fully a possible alternative
> that would provide the same exact mechanism.

The interface isn't the new entry only. It should be evaluated how much
of the early EFI boot path would be common to the HVMlite one. What
would be gained by using the same entry but having two different boot
paths after it? You still need a way to distinguish between bare metal
EFI and HVMlite. And Xen needs a way to find out whether a kernel is
supporting HVMlite to boot it in the correct mode.
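
As a rough illustration of that last point, one way a loader could tell
whether a kernel image advertises an extra entry point is to look for a
vendor note in the image's ELF note segment. The sketch below walks a
note buffer and returns the descriptor of the first matching note. The
owner string "Xen" and the note type value 18 are assumptions made for
the example (check Xen's public elfnote.h for the real definitions), and
find_note() and struct note_hdr are hypothetical names, not existing Xen
or Linux code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic ELF note header: three 32-bit words, then name and descriptor. */
struct note_hdr {
    uint32_t namesz;  /* length of the owner name, including trailing NUL */
    uint32_t descsz;  /* length of the descriptor payload */
    uint32_t type;    /* owner-defined note type */
};

/* Assumed values, for illustration only. */
#define NOTE_OWNER_XEN     "Xen"
#define NOTE_PHYS32_ENTRY  18u

/* Name and descriptor are each padded to a 4-byte boundary. */
static size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/*
 * Walk a note segment and return a pointer to the descriptor of the
 * first note matching owner and type, or NULL if there is none or the
 * buffer is truncated. If descsz is non-NULL it receives the
 * descriptor length.
 */
static const uint8_t *find_note(const uint8_t *buf, size_t len,
                                const char *owner, uint32_t type,
                                uint32_t *descsz)
{
    size_t off = 0;
    size_t owner_len = strlen(owner) + 1;

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        size_t name_len, desc_len;

        memcpy(&h, buf + off, sizeof(h));
        off += sizeof(h);

        name_len = align4(h.namesz);
        desc_len = align4(h.descsz);
        if (name_len > len - off || desc_len > len - off - name_len)
            return NULL; /* note runs past the end of the buffer */

        if (h.type == type && h.namesz == owner_len &&
            memcmp(buf + off, owner, owner_len) == 0) {
            if (descsz)
                *descsz = h.descsz;
            return buf + off + name_len;
        }
        off += name_len + desc_len;
    }
    return NULL;
}
```

A loader that finds no such note would fall back to whatever other boot
mode the image supports. This only covers the "does the kernel support
it" half of the question, not how the kernel itself distinguishes bare
metal EFI from HVMlite after entry.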

> A full technical unbiased evaluation of the different approaches is what I'd
> hope we could strive to achieve through discussion and peer review, thinking
> and prioritizing ultimately what is best to minimize the impact on Linux
> and also help take advantage of the best features possible through both
> means. Thinking long term, not immediate short term.

Sure.


Juergen


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]     ` <570B3228.90400@suse.com>
@ 2016-04-12 21:02       ` Andy Lutomirski
       [not found]       ` <CALCETrXvGR3XKJf5Ab_ZPc-iuNuzR8AzLpRBciemKz4r0vSrGA@mail.gmail.com>
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Andy Lutomirski @ 2016-04-12 21:02 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Matt Fleming, Michael Chang, linux-kernel, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, X86 ML,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Julien Grall, Luis R. Rodriguez,
	David Vrabel, Linus Torvalds

On Sun, Apr 10, 2016 at 10:12 PM, Juergen Gross <jgross@suse.com> wrote:
> On 08/04/16 22:40, Luis R. Rodriguez wrote:
>> On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
>>> On 06/04/16 03:40, Luis R. Rodriguez wrote:
>>>>
>>>>     * You don't need full EFI emulation
>>>
>>> I think needing any EFI emulation inside Xen (which is where it would
>>> need to be for dom0) is not suitable because of the increase in
>>> hypervisor ABI.
>>
>> Is this because of timing on architecture / design of HVMLite, or
>> a general position that the complexity to deal with EFI emulation
>> is too much for Xen's taste ?
>
> The Xen hypervisor should be as small as possible. Adding an EFI
> emulator will be adding quite some code. This should be done after a
> very thorough evaluation only.
>
>> ARM already went the EFI entry way for domU -- it went the OVMF route,
>> would such a possibility be possible for x86 domU HVMLite ? If not why
>> not, I mean it would seem to make sense to at least mimic the same type
>> of early boot environment, and perhaps there are some lessons to be
>> learned from that effort too.
>
> The final solution must be appropriate for dom0, too. So don't try
> to limit the discussion to domU. If dom0 isn't going to be acceptable
> there will be no need to discuss domU.
>
>> Are there some lessons to be learned with ARM's effort? What are they?
>> If that could be re-done again with any type of cleaner path, what
>> could that be that could help the x86 side ?
>>
>> Although emulating EFI may require work, some folks have pointed out
>> that the amount of work may not be that much. If that is done can
>> we instead rely on the same code to replace OVMF to support both
>> Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
>> this ?
>>
>>> I also still do not understand your objection to the current tiny stub.
>>
>> Its more of a hypothetical -- can an EFI entry be used instead given
>> it already does exactly what the new small entry does ? Its also rather
>> odd to add a new entry without evaluating fully a possible alternative
>> that would provide the same exact mechanism.
>
> The interface isn't the new entry only. It should be evaluated how much
> of the early EFI boot path would be common to the HVMlite one. What
> would be gained by using the same entry but having two different boot
> paths after it? You still need a way to distinguish between bare metal
> EFI and HVMlite. And Xen needs a way to find out whether a kernel is
> supporting HVMlite to boot it in the correct mode.
>
>> A full technical unbiased evaluation of the different approaches is what I'd
>> hope we could strive to achieve through discussion and peer review, thinking
>> and prioritizing ultimately what is best to minimize the impact on Linux
>> and also help take advantage of the best features possible through both
>> means. Thinking long term, not immediate short term.
>
> Sure.

FWIW, someone just pointed me to u-boot's EFI implementation.
u-boot's lib/efi_loader contains a tiny (<3k LOC, 10kB compiled) UEFI
implementation that's sufficient to boot a Linux EFI payload.

An argument against making Xen's default domU entry use UEFI is that
it might become unnecessarily awkward to do something like
chainloading to OVMF.   But maybe OVMF can be compiled as a UEFI
binary :)

--Andy

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]       ` <20160408215854.GU1990@wotan.suse.de>
@ 2016-04-12 22:12         ` Luis R. Rodriguez
  2016-04-13  9:54         ` Roger Pau Monné
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-12 22:12 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, George Dunlap, joeyli,
	Borislav Petkov, Boris Ostrovsky, Charles Arndol, Andrew Cooper

On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> > On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > > While Andrew's position is right in that perhaps only Xen tools have to deal
> > > with the HVMLite specific entry, it would also still mean diverging from ARM's
> > > own EFI entry only position, which I'd like to clarify that ARM has no custom
> > > Xen entry, we should strive to match that. Anything far from that to me really
> > > deserves an explanation, especially if we are going to argue that HVMLite is
> > > the best that x86 Xen can do.
> > > 
> > > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > > like a sensible thing to strive for. Anything we push in the other direction,
> > > as small as it can be, should deserve at least a 'hey, wait a minute'...
> > 
> > Quick factual correction here.
> > 
> > "Since ARM guests only use the EFI entry point, x86 guests should also
> > only use the EFI entry point" is certainly a reasonable argument to make.
> > 
> > However, dom0 on ARM does not use the EFI entry point.  When starting
> > dom0, Xen uses the native entry point (the one that UBoot uses) and
> > hands dom0 a device-tree node.  The reason this is possible on ARM is
> > that there are no assumptions made about what hardware is or is not
> > present on the system -- everything that needs to be communicated about
> > what is or is not present can be passed in DT.
> > 
> > So it is incorrect to say that ARM has an "EFI entry only" position.
> > 
> > (On ACPI systems, it does apparently generate some UEFI informational
> > tables, which it passes to the dom0 kernel via DT; and the kernel
> > unpacks and puts in the right place.  Normal Xen ARM guests can use EFI,
> > but that's because we start OVMF in the guest context to provide the EFI
> > services.  These may be where the idea that ARM guests use only the UEFI
> > entry point came from.)
> > 
> > Obviously it would be nice if we could use the native entry point on x86
> > as well, but there's decades of legacy hardware and backwards
> > compatibility to deal with there.
> 
> OK thanks for the clarification -- still no custom entries for Xen!
> We should strive for that, at the very least.
> 
> You do have a point about the legacy stuff. There are two options there:
> 
>   * Fold legacy support under HVMLite -- which seems to be what we
>     currently want to do (we should evaluate the implications and
>     requirements here for that); or
> 
>   * Leave legacy stuff on the old PV path; this may be something to
>     bring to the table if we had in place a proactive solution to
>     avoid further fallout from the architecture of the huge differences
>     on the entries. The work I'm doing should help with that. (We should
>     also evaluate the implications and requirements here for that as
>     well).

Also, x86 does have a history of short DT use. Just pointing out that it's
there as an option as well. I'll Cc you on some thread about that.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]       ` <CALCETrXvGR3XKJf5Ab_ZPc-iuNuzR8AzLpRBciemKz4r0vSrGA@mail.gmail.com>
@ 2016-04-13  9:02         ` Roger Pau Monné
       [not found]         ` <20160413090202.bg2vfdl3iol7eedv@mac>
  1 sibling, 0 replies; 68+ messages in thread
From: Roger Pau Monné @ 2016-04-13  9:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Matt Fleming, Michael Chang, linux-kernel, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, X86 ML,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Julien Grall,
	Luis R. Rodriguez, David Vrabel

On Tue, Apr 12, 2016 at 02:02:52PM -0700, Andy Lutomirski wrote:
> On Sun, Apr 10, 2016 at 10:12 PM, Juergen Gross <jgross@suse.com> wrote:
> > On 08/04/16 22:40, Luis R. Rodriguez wrote:
> >> On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> >>> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >>>>
> >>>>     * You don't need full EFI emulation
> >>>
> >>> I think needing any EFI emulation inside Xen (which is where it would
> >>> need to be for dom0) is not suitable because of the increase in
> >>> hypervisor ABI.
> >>
> >> Is this because of timing on architecture / design of HVMLite, or
> >> a general position that the complexity to deal with EFI emulation
> >> is too much for Xen's taste ?
> >
> > The Xen hypervisor should be as small as possible. Adding an EFI
> > emulator will be adding quite some code. This should be done after a
> > very thorough evaluation only.
> >
> >> ARM already went the EFI entry way for domU -- it went the OVMF route,
> >> would such a possibility be possible for x86 domU HVMLite ? If not why
> >> not, I mean it would seem to make sense to at least mimic the same type
> >> of early boot environment, and perhaps there are some lessons to be
> >> learned from that effort too.
> >
> > The final solution must be appropriate for dom0, too. So don't try
> > to limit the discussion to domU. If dom0 isn't going to be acceptable
> > there will be no need to discuss domU.
> >
> >> Are there some lessons to be learned with ARM's effort? What are they?
> >> If that could be re-done again with any type of cleaner path, what
> >> could that be that could help the x86 side ?
> >>
> >> Although emulating EFI may require work, some folks have pointed out
> >> that the amount of work may not be that much. If that is done can
> >> we instead rely on the same code to replace OVMF to support both
> >> Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> >> this ?
> >>
> >>> I also still do not understand your objection to the current tiny stub.
> >>
> >> Its more of a hypothetical -- can an EFI entry be used instead given
> >> it already does exactly what the new small entry does ? Its also rather
> >> odd to add a new entry without evaluating fully a possible alternative
> >> that would provide the same exact mechanism.
> >
> > The interface isn't the new entry only. It should be evaluated how much
> > of the early EFI boot path would be common to the HVMlite one. What
> > would be gained by using the same entry but having two different boot
> > paths after it? You still need a way to distinguish between bare metal
> > EFI and HVMlite. And Xen needs a way to find out whether a kernel is
> > supporting HVMlite to boot it in the correct mode.
> >
> >> A full technical unbiased evaluation of the different approaches is what I'd
> >> hope we could strive to achieve through discussion and peer review, thinking
> >> and prioritizing ultimately what is best to minimize the impact on Linux
> >> and also help take advantage of the best features possible through both
> >> means. Thinking long term, not immediate short term.
> >
> > Sure.
> 
> FWIW, someone just pointed me to u-boot's EFI implementation.
> u-boot's lib/efi_loader contains a tiny (<3k LOC, 10kB compiled) UEFI
> implementation that's sufficient to boot a Linux EFI payload.

I guess this is a pretty minimal EFI implementation. Is it something
standard, or just an EFI implementation tailored to Linux's needs? (i.e. is
there any standard EFI flag to signal this kind of minimal EFI environment?)
 
> An argument against making Xen's default domU entry use UEFI is that
> it might become unnecessarily awkward to do something like
> chainloading to OVMF.   But maybe OVMF can be compiled as a UEFI
> binary :)

With my FreeBSD committer hat:

The FreeBSD kernel doesn't contain an EFI entry point, it just contains one 
single entry point that's used for both legacy BIOS and EFI. Then the 
FreeBSD loader is the one that contains the different entry points. I would 
really like to avoid adding an EFI entry point and the PE header to the 
FreeBSD kernel. The current trampoline in FreeBSD to tie the Xen entry point 
into the native path contains 96 lines of assembly (half of them are 
actually comments) and 66 lines of C. I think adding an EFI entry point is 
going to add a lot more code than this, and we would probably need 
changes to the build system in order to assemble the PE header and the ELF 
headers together.

IMHO, if we want to boot PVH using EFI the right solution is to use OVMF (or 
any other UEFI firmware) and port it so it's able to run as a PVH guest. I 
guess it should even be possible to use it for Dom0, although I think this 
is cumbersome.

Roger.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]       ` <20160408215854.GU1990@wotan.suse.de>
  2016-04-12 22:12         ` Luis R. Rodriguez
@ 2016-04-13  9:54         ` Roger Pau Monné
       [not found]         ` <20160412221225.GN1990@wotan.suse.de>
       [not found]         ` <20160413095428.5mcbrimvc6vxffcw@mac>
  3 siblings, 0 replies; 68+ messages in thread
From: Roger Pau Monné @ 2016-04-13  9:54 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, George Dunlap, joeyli,
	Borislav Petkov, Boris Ostrovsky, Charles Arndol, Andrew Cooper

On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> > On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > > While Andrew's position is right in that perhaps only Xen tools have to deal
> > > with the HVMLite specific entry, it would also still mean diverging from ARM's
> > > own EFI entry only position, which I'd like to clarify that ARM has no custom
> > > Xen entry, we should strive to match that. Anything far from that to me really
> > > deserves an explanation, especially if we are going to argue that HVMLite is
> > > the best that x86 Xen can do.
> > > 
> > > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > > like a sensible thing to strive for. Anything we push in the other direction,
> > > as small as it can be, should deserve at least a 'hey, wait a minute'...
> > 
> > Quick factual correction here.
> > 
> > "Since ARM guests only use the EFI entry point, x86 guests should also
> > only use the EFI entry point" is certainly a reasonable argument to make.
> > 
> > However, dom0 on ARM does not use the EFI entry point.  When starting
> > dom0, Xen uses the native entry point (the one that UBoot uses) and
> > hands dom0 a device-tree node.  The reason this is possible on ARM is
> > that there are no assumptions made about what hardware is or is not
> > present on the system -- everything that needs to be communicated about
> > what is or is not present can be passed in DT.
> > 
> > So it is incorrect to say that ARM has an "EFI entry only" position.
> > 
> > (On ACPI systems, it does apparently generate some UEFI informational
> > tables, which it passes to the dom0 kernel via DT; and the kernel
> > unpacks and puts in the right place.  Normal Xen ARM guests can use EFI,
> > but that's because we start OVMF in the guest context to provide the EFI
> > services.  These may be where the idea that ARM guests use only the UEFI
> > entry point came from.)
> > 
> > Obviously it would be nice if we could use the native entry point on x86
> > as well, but there's decades of legacy hardware and backwards
> > compatibility to deal with there.
> 
> OK thanks for the clarification -- still no custom entries for Xen!
> We should strive for that, at the very least.
> 
> You do have a point about the legacy stuff. There are two options there:
> 
>   * Fold legacy support under HVMLite -- which seems to be what we
>     currently want to do (we should evaluate the implications and
>     requirements here for that); or

I'm not following here. What does it mean to fold legacy support under 
HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
it comes to using native Linux entry points. Linux might expect some legacy 
PC hardware to be always present, which is not true for HVMlite.

Could you please clarify this point?

>   * Leave legacy stuff on the old PV path; this may be something to
>     bring to the table if we had in place a proactive solution to
>     avoid further fallout from the architecture of the huge differences
>     on the entries. The work I'm doing should help with that. (We should
>     also evaluate the implications and requirements here for that as
>     well).

Classic PV guests don't have legacy hardware at all, they just have PV 
interfaces, so I'm even less sure of what this means.

Roger.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]   ` <20160406150240.GO2701@codeblueprint.co.uk>
  2016-04-06 16:05     ` Konrad Rzeszutek Wilk
@ 2016-04-13 10:03     ` Roger Pau Monné
       [not found]     ` <20160413100312.647eocdtbmak4btk@mac>
  2 siblings, 0 replies; 68+ messages in thread
From: Roger Pau Monné @ 2016-04-13 10:03 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Michael Chang, Jim Fehlig, Jan Beulich, H. Peter Anvin,
	Daniel Kiper, the arch/x86 maintainers, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross, Julien Grall,
	Stefano Stabellini, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper,
	Linux Kernel Mailing List, Andy Lutomirski, Luis R. Rodriguez

On Wed, Apr 06, 2016 at 04:02:40PM +0100, Matt Fleming wrote:
[...]
> One place that struck me as suitable for this "hypercall in an EFI
> service stub" approach is the trouble with doing ACPI reboot as
> documented here,
> 
>   http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html
> 
> Performing the reset hypercall from within HVMlite's custom EfiReset()
> service would avoid having to touch ACPICA at all, and would be
> indistinguishable from bare metal.

I don't get this, the "reset/shutdown" hypercall requires the following 
steps from Dom0 (it's not as simple as calling a hypercall):

The way to perform a full system power off from Dom0 is different than 
what's done in a DomU guest. In order to perform a power off from Dom0 the 
native ACPI path should be followed, but the guest should not write the 
`SLP_EN` bit to the Pm1Control register. Instead the 
`XENPF_enter_acpi_sleep` hypercall should be used, filling the following 
data in the `xen_platform_op` struct:

    cmd = XENPF_enter_acpi_sleep
    interface_version = XENPF_INTERFACE_VERSION
    u.enter_acpi_sleep.pm1a_cnt_val = Pm1aControlValue
    u.enter_acpi_sleep.pm1b_cnt_val = Pm1bControlValue

At which point it means that we are either going to duplicate ACPICA code 
into the HVMlite's custom EfiReset() service, or we are going to call into 
ACPICA, which is what we already do now.
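
For illustration, the request described above could be prepared roughly as
follows. This is only a sketch: the `sketch_*` types are cut-down stand-ins
for Xen's public `xen_platform_op` definitions, the two `SKETCH_*` constants
are placeholder values rather than the real ones, and issuing the hypercall
itself is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder values for illustration only; the real constants are
 * defined in Xen's public headers and are NOT reproduced here. */
#define SKETCH_XENPF_ENTER_ACPI_SLEEP   1u
#define SKETCH_XENPF_INTERFACE_VERSION  1u

/* Hypothetical, cut-down mirror of the fields named in the text. */
struct sketch_enter_acpi_sleep {
    uint16_t pm1a_cnt_val;
    uint16_t pm1b_cnt_val;
};

struct sketch_platform_op {
    uint32_t cmd;
    uint32_t interface_version;
    union {
        struct sketch_enter_acpi_sleep enter_acpi_sleep;
    } u;
};

/* Fill the request instead of writing SLP_EN to the PM1 control
 * registers; handing it to the hypervisor is left to the caller. */
static void sketch_prepare_acpi_sleep(struct sketch_platform_op *op,
                                      uint16_t pm1a_control_value,
                                      uint16_t pm1b_control_value)
{
    assert(op != NULL);
    memset(op, 0, sizeof(*op));
    op->cmd = SKETCH_XENPF_ENTER_ACPI_SLEEP;
    op->interface_version = SKETCH_XENPF_INTERFACE_VERSION;
    op->u.enter_acpi_sleep.pm1a_cnt_val = pm1a_control_value;
    op->u.enter_acpi_sleep.pm1b_cnt_val = pm1b_control_value;
}
```

The two control values still have to come from the ACPI tables, which is why
the ACPI code cannot be bypassed.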

Roger.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]         ` <20160412221225.GN1990@wotan.suse.de>
@ 2016-04-13 10:05           ` George Dunlap
  2016-04-13 10:25           ` Roger Pau Monné
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-13 10:05 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Linux Kernel Mailing List

On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> Also, x86 does have a history of short DT use. Just pointing out that it's there as
> an option as well. I'll Cc you on some thread about that.

I'm not sure how this is relevant to anything.

What we're talking about is how to get from Xen to a point in the
Linux kernel where everything can Just Work.  The proposed feature is
a mini trampoline that (as I understand it):
1. Tells Xen where to jump to (via ELF note)
2. Sets up some basic modes and pagetables and then jumps to the zero
page so Linux can just carry on.
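
As a rough illustration of step 1, a loader can find the entry address by
walking the kernel's ELF notes. The sketch below assumes the standard ELF
note layout (three 32-bit words followed by a padded name and descriptor);
the function names are hypothetical, and the owner string and note type are
passed in by the caller rather than taken from Xen's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic ELF note header: three 32-bit words, followed by the name and
 * the descriptor, each padded to a 4-byte boundary. */
struct sketch_elf_note {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

static size_t sketch_align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Scan a note segment for a note with the given owner name and type and
 * return its first 32-bit descriptor word via *value.
 * Returns 1 if found, 0 otherwise. */
static int sketch_find_note32(const uint8_t *seg, size_t len,
                              const char *owner, uint32_t type,
                              uint32_t *value)
{
    size_t off = 0;

    assert(seg != NULL && owner != NULL && value != NULL);
    while (len - off >= sizeof(struct sketch_elf_note)) {
        struct sketch_elf_note hdr;
        size_t name_off, desc_off, next;

        memcpy(&hdr, seg + off, sizeof(hdr));
        name_off = off + sizeof(hdr);
        desc_off = name_off + sketch_align4(hdr.namesz);
        next = desc_off + sketch_align4(hdr.descsz);
        if (next > len || next < off)
            return 0;               /* truncated or malformed note */
        if (hdr.type == type &&
            hdr.namesz == strlen(owner) + 1 &&
            memcmp(seg + name_off, owner, hdr.namesz) == 0 &&
            hdr.descsz >= sizeof(*value)) {
            memcpy(value, seg + desc_off, sizeof(*value));
            return 1;
        }
        off = next;
    }
    return 0;
}
```

The address found this way is where the trampoline of step 2 would start
executing.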

 -George


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]         ` <20160413090202.bg2vfdl3iol7eedv@mac>
@ 2016-04-13 10:15           ` Matt Fleming
       [not found]           ` <20160413101515.GJ2829@codeblueprint.co.uk>
  1 sibling, 0 replies; 68+ messages in thread
From: Matt Fleming @ 2016-04-13 10:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Michael Chang, linux-kernel, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, X86 ML, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Julien Grall, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel

On Wed, 13 Apr, at 11:02:02AM, Roger Pau Monné wrote:
> 
> With my FreeBSD committer hat:
> 
> The FreeBSD kernel doesn't contain an EFI entry point, it just contains one 
> single entry point that's used for both legacy BIOS and EFI. Then the 
> FreeBSD loader is the one that contains the different entry points. I would 
> really like to avoid adding an EFI entry point and the PE header to the 
> FreeBSD kernel. The current trampoline in FreeBSD to tie the Xen entry point 
> into the native path contains 96 lines of assembly (half of them are 
> actually comments) and 66 lines of C. I think adding an EFI entry point is 
> going to add a lot more code than this, and we would probably need 
> changes to the build system in order to assemble the PE header and the ELF 
> headers together.
 
What does the boot flow look like for PVH2 on FreeBSD today?
Presumably it doesn't have the same entry point that Boris proposed
for Linux?

Does it go, Hypervisor -> FreeBSD loader -> FreeBSD kernel? Or are you
able to directly boot the kernel from the hypervisor and skip the
middle part by having secondary entry point for Xen marked by the ELF
note?

> IMHO, if we want to boot PVH using EFI the right solution is to use OVMF (or 
> any other UEFI firmware) and port it so it's able to run as a PVH guest. I 
> guess it should even be possible to use it for Dom0, although I think this 
> is cumbersome.

There are two levels of EFI boot entry features being discussed,

 1. Make the OS kernel a PE/COFF executable
 2. Provide some level of EFI service functionality

You can adopt 1. without 2, i.e. without actually providing any EFI
services at all, as long as the Xen hypervisor grows a PE/COFF loader
(since EFI firmware has to provide you one, for EFI platforms you
could use the LoadImage() service in the firmware, but for BIOS
platforms you'd need your own in Xen).
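
To give a feel for the smallest piece of such a loader, the sketch below
checks whether a buffer starts like a PE/COFF image: an `MZ` magic at offset
0, a little-endian offset stored at 0x3c, and a `PE\0\0` signature at that
offset. It is an illustration with a hypothetical function name, not code
from Xen or any firmware, and a real loader would go on to validate the COFF
header, the optional header and the sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the file offset of the COFF header (just past "PE\0\0"),
 * or 0 if the image does not look like a PE/COFF file. */
static size_t sketch_pe_coff_offset(const uint8_t *img, size_t len)
{
    uint32_t pe_off;

    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    /* Little-endian 32-bit offset of the PE signature, stored at 0x3c. */
    pe_off = (uint32_t)img[0x3c] | (uint32_t)img[0x3d] << 8 |
             (uint32_t)img[0x3e] << 16 | (uint32_t)img[0x3f] << 24;
    if (pe_off > len || len - pe_off < 4)
        return 0;                   /* signature would lie outside the buffer */
    if (memcmp(img + pe_off, "PE\0\0", 4) != 0)
        return 0;
    return (size_t)pe_off + 4;
}
```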

On Linux, this has the advantage of deferring the decompression of the
bzImage (x86 Linux kernel file format) to the stub on the front of the
bzImage. And while I realise that the toolstack already has support
for decompressing bzImages, given what Andrew has said about reducing
attack surface, having the guest perform the decompression should be a
win.

Of course, this is offset somewhat by the fact that you need to audit
the PE/COFF loader ;) But decompression in general is notoriously
vulnerable to security issues.

Using the in-kernel decompressor is how most (all?) Linux boot loaders
work today, so there's the added benefit of reducing the differences
between booting on Xen and booting bare metal. For example, you'd
probably be able to use CONFIG_RANDOMIZE_BASE (ASLR for kernel image)
for Xen if you use the kernel's decompressor. Xen would also get
future features in this area for free, and there is a tendency to push
boot features into the early stub.

For 1. we'd basically be using the PE/COFF file format with the EFI
ABI as an OS agnostic boot protocol, but not as a full firmware
runtime environment.

2. is also interesting, though I think less so than 1. I agree that
making OVMF work as a PVH guest is probably the right way to go, even
for Dom0, not least because you'd have a much cleaner/less buggy
implementation than what we see in the real world ;)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]     ` <20160413100312.647eocdtbmak4btk@mac>
@ 2016-04-13 10:21       ` Matt Fleming
  0 siblings, 0 replies; 68+ messages in thread
From: Matt Fleming @ 2016-04-13 10:21 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Michael Chang, Jim Fehlig, Jan Beulich, H. Peter Anvin,
	Daniel Kiper, the arch/x86 maintainers, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross, Julien Grall,
	Stefano Stabellini, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper,
	Linux Kernel Mailing List, Andy Lutomirski, Luis R. Rodriguez

On Wed, 13 Apr, at 12:03:12PM, Roger Pau Monné wrote:
> 
> I don't get this, the "reset/shutdown" hypercall requires the following 
> steps from Dom0 (it's not as simple as calling a hypercall):
> 
> The way to perform a full system power off from Dom0 is different than 
> what's done in a DomU guest. In order to perform a power off from Dom0 the 
> native ACPI path should be followed, but the guest should not write the 
> `SLP_EN` bit to the Pm1Control register. Instead the 
> `XENPF_enter_acpi_sleep` hypercall should be used, filling the following 
> data in the `xen_platform_op` struct:
> 
>     cmd = XENPF_enter_acpi_sleep
>     interface_version = XENPF_INTERFACE_VERSION
>     u.enter_acpi_sleep.pm1a_cnt_val = Pm1aControlValue
>     u.enter_acpi_sleep.pm1b_cnt_val = Pm1bControlValue
> 
> At which point it means that we are either going to duplicate ACPICA code 
> into the HVMlite's custom EfiReset() service, or we are going to call into 
> ACPICA, which is what we already do now.

Fair enough, I wasn't aware that you needed to call into ACPI to
perform the reset.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]         ` <20160412221225.GN1990@wotan.suse.de>
  2016-04-13 10:05           ` George Dunlap
@ 2016-04-13 10:25           ` Roger Pau Monné
       [not found]           ` <CAFLBxZbiGppNad=Z6-fLgx89O0yAFrSyARTCwv=vHBR3zJ=NsA@mail.gmail.com>
       [not found]           ` <20160413102156.b4qwhwbqvnnpmxgw@mac>
  3 siblings, 0 replies; 68+ messages in thread
From: Roger Pau Monné @ 2016-04-13 10:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, George Dunlap, joeyli,
	Borislav Petkov, Boris Ostrovsky, Charles Arndol, Andrew Cooper

On Wed, Apr 13, 2016 at 12:12:25AM +0200, Luis R. Rodriguez wrote:
[...]
> Also, x86 does have a history of short DT use. Just pointing out that it's there as
> an option as well. I'll Cc you on some thread about that.

I don't see how this is relevant to the conversation that's going on:

How much x86 hardware provides DT? I bet this is 0%.

How many OSes can boot on x86 using DT? Linux maybe; FreeBSD, Windows or 
OpenBSD certainly won't be able to boot at all when provided a DT on x86.

Is Xen going to craft a DT for x86 based on ACPI? No, because it can't parse 
the DSDT or other dynamic tables that contain the information about 
the devices in the system.

I would also like to point out that DT or not DT is not really the problem 
here, the issue that George was trying to point out is that on x86 there's 
some legacy hardware that's considered to be always there, so its presence 
is not signaled by ACPI, and HVMlite is _not_ emulating this hardware. It 
doesn't matter if the hardware description comes from ACPI or DT, this 
hardware is considered to be always present on PC compatible hardware.

Roger.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160413101515.GJ2829@codeblueprint.co.uk>
@ 2016-04-13 10:40             ` Matt Fleming
  2016-04-13 11:12             ` George Dunlap
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Matt Fleming @ 2016-04-13 10:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Michael Chang, linux-kernel, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, X86 ML, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Julien Grall, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel

On Wed, 13 Apr, at 11:15:15AM, Matt Fleming wrote:
> 
> For 1. we'd basically be using the PE/COFF file format with the EFI
> ABI as an OS agnostic boot protocol, but not as a full firmware
> runtime environment.

To add some balance to this proposal (since there's no such thing as a
free lunch) some of the disadvantages are,

The PE/COFF stub in Linux does assume that it is executing in native
cpu mode and does not perform any mode switching, i.e. from 32-bit
protected to long mode. This is due to the way that EFI works - by the
time the OS image entry point is jumped to on a 64-bit cpu we're
running in long mode with identity mapped page tables. To be fair,
when running Xen on EFI (bare metal) this would save you one cpu mode
switch when compared with the current HVMLite proposal.
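
What "identity mapped" means here can be shown with a small simulation. The
sketch below builds 4-level x86-64 page tables in an ordinary array standing
in for physical memory (an assumption made so the code can run in user
space), maps the first 1 GiB with 2 MiB pages, and walks the tables in
software. All `sk_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_PRESENT 0x001ull
#define SK_WRITE   0x002ull
#define SK_HUGE    0x080ull              /* PS bit: 2 MiB page at the PD level */
#define SK_ADDR    0x000ffffffffff000ull /* physical address bits of an entry */

/* Three 4 KiB tables in a fake "physical memory" array:
 * PML4 at 0x0000, PDPT at 0x1000, PD at 0x2000. */
static uint64_t sk_mem[3 * 512];

/* Map virtual [0, 1 GiB) onto the same physical range. */
static void sk_build_identity_1g(void)
{
    memset(sk_mem, 0, sizeof(sk_mem));
    sk_mem[0]   = 0x1000 | SK_PRESENT | SK_WRITE;   /* PML4[0] -> PDPT */
    sk_mem[512] = 0x2000 | SK_PRESENT | SK_WRITE;   /* PDPT[0] -> PD   */
    for (uint64_t i = 0; i < 512; i++)
        sk_mem[1024 + i] = (i << 21) | SK_PRESENT | SK_WRITE | SK_HUGE;
}

/* Software page walk; returns 1 and stores the physical address in *pa
 * if the virtual address is mapped, 0 otherwise. */
static int sk_translate(uint64_t va, uint64_t *pa)
{
    uint64_t e;

    assert(pa != NULL);
    e = sk_mem[(va >> 39) & 511];                          /* PML4 entry */
    if (!(e & SK_PRESENT))
        return 0;
    e = sk_mem[(e & SK_ADDR) / 8 + ((va >> 30) & 511)];    /* PDPT entry */
    if (!(e & SK_PRESENT))
        return 0;
    e = sk_mem[(e & SK_ADDR) / 8 + ((va >> 21) & 511)];    /* PD entry */
    if (!(e & SK_PRESENT) || !(e & SK_HUGE))
        return 0;
    *pa = (e & SK_ADDR & ~0x1fffffull) | (va & 0x1fffffull);
    return 1;
}
```

With such tables every mapped virtual address translates to itself, so code
can keep running at the same addresses when paging is switched on.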

I'm not aware of a direct equivalent for ELF notes in the PE/COFF
format. I'm still re-reading the spec to find something suitable.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160413101515.GJ2829@codeblueprint.co.uk>
  2016-04-13 10:40             ` Matt Fleming
@ 2016-04-13 11:12             ` George Dunlap
  2016-04-13 11:59             ` Roger Pau Monné
       [not found]             ` <20160413115846.hyt4lg24rfkenbxu@mac>
  3 siblings, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-13 11:12 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Michael Chang, linux-kernel, Julien Grall, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, X86 ML, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Jim Fehlig, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel

On Wed, Apr 13, 2016 at 11:15 AM, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> For 1. we'd basically be using the PE/COFF file format with the EFI
> ABI as an OS agnostic boot protocol, but not as a full firmware
> runtime environment.

But we still have the issue here that now the EFI entry point in
Linux has to figure out, "Am I running in a full firmware runtime
environment, or am I running under Xen?", and then change behavior
appropriately.  Then we get back to Juergen's comment:  "[The EFI
proposal] should be evaluated how much of the early EFI boot path
would be common to the HVMlite one. What would be gained by using the
same entry but having two different boot paths after it?"
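
One way such a check could look, purely as an illustration: guests commonly
recognise Xen by the 12-byte `XenVMMXenVMM` signature that the hypervisor's
CPUID leaves return in EBX, ECX and EDX. The sketch assumes the caller has
already executed CPUID and passes in the three register values; the function
name is hypothetical, and nothing here says the EFI stub would actually use
this method.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* regs[] holds EBX, ECX, EDX as returned by a hypervisor CPUID leaf.
 * Returns 1 if they spell out the Xen signature, 0 otherwise. */
static int sketch_is_xen_signature(const uint32_t regs[3])
{
    static const char xen_sig[] = "XenVMMXenVMM";
    char sig[12];

    assert(regs != NULL);
    /* Unpack each register least-significant byte first, which is how
     * the signature string is laid out across the registers. */
    for (unsigned i = 0; i < 12; i++)
        sig[i] = (char)((regs[i / 4] >> (8 * (i % 4))) & 0xff);
    return memcmp(sig, xen_sig, 12) == 0;
}
```

Every branch taken on the result of a test like this is a place where the
Xen path and the native firmware path stop being common code.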

> 2. is also interesting, though I think less so than 1. I agree that
> making OVMF work as a PVH guest is probably the right way to go, even
> for Dom0, not least because you'd have a much cleaner/less buggy
> implementation than what we see in the real world ;)

So rather than just add an extra entry point and a Xen-to-zero-page
stub, you're going to ask Xen on dom0 to import a full OVMF binary?
Or have the bootloader entries include xen, linux, the initrd, *and*
ovmf?  That seems a bit extreme. :-)

Keep in mind also that PVH needs to support not only the traditional
VM use-case (e.g., booting a full distro), but the small service VM
use-case (a la unikernels).  Booting a traditional distro as a domU via
OVMF -> EFI Linux makes sense; it reduces the distro's test burden,
and the OVMF doesn't add a lot to the memory or boot time compared to
the size and boot time of a full distro.  But booting tiny service
VMs, sometimes with not even any disk of their own (other than a
ramdisk), the extra cost of including OVMF in the guest address space
can be a non-negligible addition to the memory requirements and
boot-up time.

One of the reasons Xen on ARM prioritized getting EFI working for
domUs was that a representative from a certain distro vendor made it
absolutely clear that *their* distro would *only* support booting via
EFI on ARM.  But you can still, as I understand it, use uBoot with DT
to boot a lightweight domU if you want.

 -George


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160413101515.GJ2829@codeblueprint.co.uk>
  2016-04-13 10:40             ` Matt Fleming
  2016-04-13 11:12             ` George Dunlap
@ 2016-04-13 11:59             ` Roger Pau Monné
       [not found]             ` <20160413115846.hyt4lg24rfkenbxu@mac>
  3 siblings, 0 replies; 68+ messages in thread
From: Roger Pau Monné @ 2016-04-13 11:59 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Michael Chang, linux-kernel, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, X86 ML, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Julien Grall, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel

On Wed, Apr 13, 2016 at 11:15:15AM +0100, Matt Fleming wrote:
> On Wed, 13 Apr, at 11:02:02AM, Roger Pau Monné wrote:
> > 
> > With my FreeBSD committer hat:
> > 
> > The FreeBSD kernel doesn't contain an EFI entry point, it just contains one 
> > single entry point that's used for both legacy BIOS and EFI. Then the 
> > FreeBSD loader is the one that contains the different entry points. I would 
> > really like to avoid adding an EFI entry point and the PE header to the 
> > FreeBSD kernel. The current trampoline in FreeBSD to tie the Xen entry point 
> > into the native path contains 96 lines of assembly (half of them are 
> > actually comments) and 66 lines of C. I think adding an EFI entry point is 
> > going to add a lot more code than this, and we would probably need 
> > changes to the build system in order to assemble the PE header and the ELF 
> > headers together.
>  
> What does the boot flow look like for PVH2 on FreeBSD today?
> Presumably it doesn't have the same entry point that Boris proposed
> for Linux?

Yes it does have something quite similar to the entry point that Boris 
proposed for Linux.
 
> Does it go, Hypervisor -> FreeBSD loader -> FreeBSD kernel? Or are you
> able to directly boot the kernel from the hypervisor and skip the
> middle part by having secondary entry point for Xen marked by the ELF
> note?

We skip the bootloader and Xen loads the FreeBSD kernel directly using the 
ELF note that contains the PVH entry point.

I certainly want to be able to run the FreeBSD loader inside of a PVH guest, 
but I plan to simply chainload it from OVMF, so it would look like:

Hypervisor -> OVMF -> FreeBSD EFI loader -> FreeBSD kernel

> > IMHO, if we want to boot PVH using EFI the right solution is to use OVMF (or 
> > any other UEFI firmware) and port it so it's able to run as a PVH guest. I 
> > guess it should even be possible to use it for Dom0, although I think this 
> > is cumbersome.
> 
> There are two levels of EFI boot entry features being discussed,
> 
>  1. Make the OS kernel a PE/COFF executable
>  2. Provide some level of EFI service functionality
> 
> You can adopt 1. without 2, i.e. without actually providing any EFI
> services at all, as long as the Xen hypervisor grows a PE/COFF loader
> (since EFI firmware has to provide you one, for EFI platforms you
> could use the LoadImage() service in the firmware, but for BIOS
> platforms you'd need your own in Xen).

We could use native LoadImage for Dom0 maybe if we are booted on an EFI 
platform, but for DomUs we certainly need to implement our own inside of 
Xen, at which point we could do the same and always use the one inside of 
Xen in order to avoid diverging paths.

TBH, I don't think this is the right solution. We would force every OS 
kernel that wants to be loaded using Xen to become a PE/COFF executable. 
This also includes unikernels like MirageOS, which will be forced to become 
a PE/COFF executable.

Is this header compatible with the ELF header? Can both co-exist in the 
same binary without issues?

> On Linux, this has the advantage of deferring the decompression of the
> bzImage (x86 Linux kernel file format) to the stub on the front of the
> bzImage. And while I realise that the toolstack already has support
> for decompressing bzImages, given what Andrew has said about reducing
> attack surface, having the guest perform the decompression should be a
> win.
> 
> Of course, this is offset somewhat by the fact that you need to audit
> the PE/COFF loader ;) But decompression in general is notoriously
> vulnerable to security issues.
> 
> Using the in-kernel decompressor is how most (all?) Linux boot loaders
> work today, so there's the added benefit of reducing the differences
> between booting on Xen and booting bare metal. For example, you'd
> probably be able to use CONFIG_RANDOMIZE_BASE (ASLR for kernel image)
> for Xen if you use the kernel's decompressor. Xen would also get
> future features in this area for free, and there is a tendency to push
> boot features into the early stub.

All the issues that you mention above are also solved by chainloading OVMF 
instead of directly loading the guest kernel, and it avoids adding a PE/COFF 
loader into Xen.

> For 1. we'd basically be using the PE/COFF file format with the EFI
> ABI as an OS agnostic boot protocol, but not as a full firmware
> runtime environment.

This also means that we will be adding PE/COFF headers to (uni)kernels, but 
we still won't implement full EFI support inside of them, so although it 
would seem like they are capable of being loaded by a native EFI loader, 
they would not.

This seems misleading, and I think it's going to cause grief amongst OS 
developers in general. The current proposed entry point is unique to Xen 
(it's only mentioned in Xen ELF notes), and is certainly not going to cause 
confusion at all.

Also, doesn't this (the fact that Xen will use the EFI entry point 
without a runtime environment) mean that there are going to be diverging 
paths inside of Linux EFI entry point anyway?

At which point, does it really matter that much if this divergence includes 
a new entry point or not?

> 2. is also interesting, though I think less so than 1. I agree that
> making OVMF work as a PVH guest is probably the right way to go, even
> for Dom0, not least because you'd have a much cleaner/less buggy
> implementation than what we see in the real world ;)

I think we all agree that this is not suitable.

Roger.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]   ` <20160407185148.GL1990@wotan.suse.de>
  2016-04-08 14:16     ` George Dunlap
       [not found]     ` <5707BD2E.20204@citrix.com>
@ 2016-04-13 15:44     ` George Dunlap
       [not found]     ` <CAFLBxZbJ4QyJQ1-ZuXg_Q-9YNXnWzDyPNp4SX=d9g0DS8mJKaw@mail.gmail.com>
  3 siblings, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-13 15:44 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Julien Grall, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski

On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> So more to it, if the EFI entry already provides a way into Linux
> in a more streamlined fashion bringing it closer to the bare metal
> boot entry, why *would* we add another boot entry to x86, even if
> its small and self contained ?

We would avoid using EFI if:

* Being called both on real hardware and under Xen would make the EFI
entry point more complicated

* Adding the necessary EFI support into Xen would be a significant
chunk of extra work

* Requiring PVH mode to implement EFI would make it more difficult for
other kernels (NetBSD, FreeBSD) to act as dom0s.

* Requiring PVH mode to use EFI would make it more difficult to
support unikernel-style workloads for domUs.

Now as has been pointed out, we don't know a lot of the above
things for certain, because nobody has posted any code.  None of us
really want to post any code because:

* Reading and understanding the EFI spec, the Linux EFI path, and
implementing all that on both the Xen and the Linux side is a lot of
work

* It looks pretty likely that many of the above things will be true

* The only real objection to the currently proposed solution is really weak.

If you want to post some code I'm sure we could give you feedback on it.

> Another position against small stubs which I listed myself is that we may need
> more semantics for early boot even if the new HVMLite small stub is added. This
> remains to be seen. If we are going to add new semantics, it would seem best to
> use something more standard like EFI configuration tables rather than hack on
> to x86 further custom semantics. Custom sloppy semantics have proven to be
> misused, and were ultimately a sloppy mess.
[snip]
>> That sounds like it's going to make the EFI path just as unmanageable as the
>> current PV path.
>
> Can you describe how?
>
>> Using the EFI entry point would certainly make sense if it was
>> actually simpler than the proposed extra entry point.  But it sounds
>> like it's going to be more complicated, not only for Xen, but also for
>> Linux.
>
> How so? Please provide specifics.

Here is the juxtaposition that confuses me.  The problem with a lot of
the current code is that you have virtualization-specific hacks all
over the place making things complicated.  And in the first quote
above, you seem afraid that the extra entry point with stub code will
somehow be misused and end up in a similar "sloppy mess", even though
it's not at all clear how *having a stub entry point* could be
"abused" by anyone.  But then when I suggest that sharing a codepath
between systems that have actual EFI firmware, with platform hardware,
and a system that has no EFI firmware and no similar concept of the
hardware, might end up a sloppy mess of Xen-specific if clauses and
maintenance headaches due to broken assumptions, it doesn't even
register with you as a reasonable concern?

As Matt said, nobody will be able to provide specifics until someone
tries to code it up.  But coding things up is not free.

 -George


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]     ` <570B3228.90400@suse.com>
  2016-04-12 21:02       ` Andy Lutomirski
       [not found]       ` <CALCETrXvGR3XKJf5Ab_ZPc-iuNuzR8AzLpRBciemKz4r0vSrGA@mail.gmail.com>
@ 2016-04-13 18:29       ` Luis R. Rodriguez
       [not found]       ` <20160413182951.GW1990@wotan.suse.de>
  3 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 18:29 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Matt Fleming, Michael Chang, linux-kernel, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Julien Grall, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel, Linus Torvalds

On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> On 08/04/16 22:40, Luis R. Rodriguez wrote:
> > On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> >> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> >>>
> >>>     * You don't need full EFI emulation
> >>
> >> I think needing any EFI emulation inside Xen (which is where it would
> >> need to be for dom0) is not suitable because of the increase in
> >> hypervisor ABI.
> > 
> > Is this because of timing on architecture / design of HVMLite, or
> > a general position that the complexity to deal with EFI emulation
> > is too much for Xen's taste ?
> 
> The Xen hypervisor should be as small as possible. Adding an EFI
> emulator will be adding quite some code. This should be done after a
> very thorough evaluation only.

Sure.

> > ARM already went the EFI entry way for domU -- it went the OVMF route,
> > would such a possibility be possible for x86 domU HVMLite ? If not why
> > not, I mean it would seem to make sense to at least mimic the same type
> > of early boot environment, and perhaps there are some lessons to be
> > learned from that effort too.
> 
> The final solution must be appropriate for dom0, too. So don't try
> to limit the discussion to domU. If dom0 isn't going to be acceptable
> there will no need to discuss domU.

Understood. George noted that on ARM dom0 still uses the ARM native entry
point, it seems to accomplish this as it uses a device tree node. I'll
chime in on that in another thread.

> > Are there some lessons to be learned with ARM's effort? What are they?
> > If that could be re-done again with any type of cleaner path, what
> > could that be that could help the x86 side ?
> > 
> > Although emulating EFI may require work, some folks have pointed out
> > that the amount of work may not be that much. If that is done can
> > we instead rely on the same code to replace OVMF to support both
> > Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> > this ?
> > 
> >> I also still do not understand your objection to the current tiny stub.
> > 
> > Its more of a hypothetical -- can an EFI entry be used instead given
> > it already does exactly what the new small entry does ? Its also rather
> > odd to add a new entry without evaluating fully a possible alternative
> > that would provide the same exact mechanism.
> 
> The interface isn't the new entry only. It should be evaluated how much
> of the early EFI boot path would be common to the HVMlite one.

We also have other asm code which can be shared. I'll reply to Boris'
original e-mail with what I can identify as perhaps sharable. There is
obviously more as you allude.

> What would be gained by using the same entry but having two different boot
> paths after it?

It's a good question. In summary, for me it would be the push for sharing more
code and the push for semantics on early boot to address differences
proactively, and ultimately it may help us bring the old PV boot path
closer.

I'll elaborate on this, but first let's clarify why a new entry is used for
HVMlite to start off with:

  1) Xen ABI has historically not wanted to set up the boot params for Linux
     guests, instead it insists on letting the Linux kernel Xen boot stubs fill
     that out for it. This sticking point has implied the need for a boot stub.
     The HVMLite boot entry tries to bring the boot entry paths closer as it
     leverages more of the HVM boot path philosophy to mimic the regular PC boot
     path.

     Is HVMLite supposed to support legacy PV guests as well BTW ?

     The reason I'm highlighting the Xen ABI as a *reason* alone is that even
     with today's large discrepancy on the old PV boot path I believe we can
     bring the boot paths closer together if the Xen ABI was slightly
     flexible about this. I've highlighted before how I believe that is
     possible, *iff* the Xen ABI would at the very least set 2 things only:

     a) Hypervisor type
     b) A custom data pointer

     This would enable a single boot entry on the guest to handle then:

	Pseudo code:

	startup_32()                         startup_64()
	       |                                  |
	       |                                  |
	       V                                  V
	pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
	       |                                  |
	       |                                  |
	       V                                  V
	 [existing startup_32()]       [existing startup_64()]
	       |                                  |
	       |                                  |
	       V                                  V
	post_hypervisor_stub_32()       post_hypervisor_stub_64()

     
     If the Xen ABI was flexible about setting a hypervisor type and custom
     data pointer then we would have handlers for it, and in them it can
     do whatever it thinks is needed for its own guest types. It could
     also continue to set the zero page on its own as it sees fit.

     Again, note that if this is done it could also mean bringing the old PV
     boot path closer as well... so this is not just a prospect
     for HVMLite but also for old PV guests.

  2) Because of 1) we have no formal semantics available for early boot
     code, and so severe differences can best be addressed also
     by yet another boot entry. This has often meant not addressing,
     or not knowing if we've addressed, real differences between the different
     entries. Case in point, dead code [0]. How do we know we will not run
     certain code that should not run for the different entries ? Without
     *any* semantics later in boot code to distinguish where we came from
     and because we strive to build single kernels with different possible
     run time environments it means we have tons of code available to
     execute / run that we may not need.

     Because of the lack of semantics we may still have dead code prospects
     with the new HVMLite entry. How are we sure there are no differences ?

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html

  3) Unikernel / other OS requirements: this is really tied to 2) but even if
     we tried to evolve the Xen ABI it would mean considering existing solutions
     out there. Things to consider as an example: FreeBSD doesn't have an EFI
     entry, unikernels want a simple boot entry.
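
As an illustration only, the dispatch described in 1) could be sketched in C
as below. Every name here (the enum values, struct boot_hypervisor_info, the
stub table) is hypothetical; nothing like it exists in the x86 boot protocol
or the Xen ABI today, and the real thing would live in early asm rather than
C. The counters exist only so the call order can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hypervisor types; not part of any existing boot protocol. */
enum hypervisor_type {
	HYPERVISOR_NONE = 0,		/* bare metal, no stubs needed */
	HYPERVISOR_XEN_PV,
	HYPERVISOR_XEN_HVMLITE,
	HYPERVISOR_MAX,
};

/* The only two things the loader would have to set. */
struct boot_hypervisor_info {
	uint32_t type;	/* one of enum hypervisor_type */
	void *data;	/* opaque, interpreted by the matching stubs only */
};

struct hypervisor_stub {
	void (*pre)(void *data);	/* runs before the common startup code */
	void (*post)(void *data);	/* runs after the common startup code */
};

int xen_pre_calls, xen_post_calls, common_startup_calls;

void xen_hvmlite_pre(void *data)  { (void)data; xen_pre_calls++; }
void xen_hvmlite_post(void *data) { (void)data; xen_post_calls++; }

/* Stands in for the existing startup_32() / startup_64() body. */
void common_startup(void) { common_startup_calls++; }

/* Types without an entry get NULL handlers and run the common path only. */
const struct hypervisor_stub stubs[HYPERVISOR_MAX] = {
	[HYPERVISOR_XEN_HVMLITE] = { xen_hvmlite_pre, xen_hvmlite_post },
};

/* Returns 0 on success, -1 if the type is unknown. */
int startup(const struct boot_hypervisor_info *info)
{
	const struct hypervisor_stub *stub;

	if (!info || info->type >= HYPERVISOR_MAX)
		return -1;
	stub = &stubs[info->type];

	if (stub->pre)
		stub->pre(info->data);
	common_startup();
	if (stub->post)
		stub->post(info->data);
	return 0;
}
```

Bare metal gets an empty stub pair and falls straight through to the common
path; an unknown type is rejected rather than guessed at.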

With this in mind then, that I can think of:

Cons of using the same entry but having two different boot paths:

  * Pushes the Xen ABI, needs to make everyone happy, this is hard
  * Perhaps harder to implement

Gains of striving to use the same entry but having two different boot paths:

 * Helps to share more code easily
 * Reduce attack surface
 * Requires us to have semantics for early boot; this has a series of
   side benefits:
   - Means you should try to address differences explicitly rather than
     implicitly -- case in point Dead Code

> You still need a way to distinguish between bare metal
> EFI and HVMlite.

Great point! This is the semantics aspect. The new entry for HVMlite approach
deals with this by making the differences implicit by the new entry point.
My call for addressing this through a hypervisor type was to see if we can
get those semantics added explicitly so we can also later address dead
code concerns for the new HVMLite guest type.

Part of my own interest in an EFI entry here is that EFI could be used to help
expand on the semantics in an OS-agnostic form rather than pushing the x86 boot
protocol further. That seems to have its own set of drawbacks though.


> And Xen needs a way to find out whether a kernel is
> supporting HVMlite to boot it in the correct mode.

How was Xen going to find out if new kernels had HVMlite support with the
new entry ? An ELFNOTE() ? If an entry is shared, could we not use an
ELFNOTE() for this too ?
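
For reference, finding such a note from the loader side is not much code. The
sketch below walks a PT_NOTE segment using the standard ELF note layout
(namesz, descsz, type, then name and descriptor, each padded to 4 bytes). The
note type value and the function name are assumptions made up for the
example; they are not taken from the Xen headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical note type meaning "HVMlite entry point"; an assumption. */
#define DEMO_NOTE_HVMLITE_ENTRY 0x1000u

struct elf_note_hdr {
	uint32_t namesz;	/* length of name, including the NUL */
	uint32_t descsz;	/* length of descriptor */
	uint32_t type;
};

/* Round n up to a multiple of 4, as the note format requires. */
size_t align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Return a pointer to the descriptor of the first note in seg[0..len)
 * whose owner name and type match, or NULL if none is found or the
 * segment is malformed. *descsz receives the descriptor length.
 */
const void *find_note(const uint8_t *seg, size_t len, const char *owner,
		      uint32_t type, uint32_t *descsz)
{
	size_t off = 0;

	while (off <= len && len - off >= sizeof(struct elf_note_hdr)) {
		struct elf_note_hdr h;
		size_t name_off, desc_off, name_len, desc_len;

		memcpy(&h, seg + off, sizeof(h));
		name_off = off + sizeof(h);

		/* Bounds-check the raw sizes first, then the padded ones. */
		if (h.namesz > len - name_off)
			return NULL;
		name_len = align4(h.namesz);
		if (name_len > len - name_off)
			return NULL;
		desc_off = name_off + name_len;

		if (h.descsz > len - desc_off)
			return NULL;
		desc_len = align4(h.descsz);
		if (desc_len > len - desc_off)
			return NULL;

		if (h.type == type &&
		    h.namesz == strlen(owner) + 1 &&
		    memcmp(seg + name_off, owner, h.namesz) == 0) {
			*descsz = h.descsz;
			return seg + desc_off;
		}
		off = desc_off + desc_len;
	}
	return NULL;
}
```

A kernel can carry several notes with the same owner, so a shared entry and
an HVMlite-specific note are not mutually exclusive; the loader just looks
for the type it cares about.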

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]         ` <20160413095428.5mcbrimvc6vxffcw@mac>
@ 2016-04-13 18:50           ` Luis R. Rodriguez
       [not found]           ` <20160413185010.GX1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 18:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, George Dunlap, joeyli,
	Borislav Petkov, Boris Ostrovsky, Charles Arndol, Andrew Cooper

On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > OK thanks for the clarification -- still no custom entries for Xen!
> > We should strive for that, at the very least.
> > 
> > You do have a point about the legacy stuff. There are two options there:
> > 
> >   * Fold legacy support under HVMLite -- which seems to be what we
> >     currently want to do (we should evaluate the implications and
> >     requirements here for that); or
> 
> I'm not following here. What does it mean to fold legacy support under 
> HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
> it comes to using native Linux entry points. Linux might expect some legacy 
> PC hardware to be always present, which is not true for HVMlite.
> 
> Could you please clarify this point?

It seems there is some confusion about the terms used. By folding legacy
support under HVMLite I meant folding the legacy PV path (classic PV with PV
interfaces) under HVMlite.

I got the impression that if we wanted to remove the old PV path we had to see
if we can address old classic PV x86 guests through HVMlite, otherwise we'd
have to live with the old PV path for the long term.

> >   * Leave legacy stuff on the old PV path; this may be something to
> >     bring to the table if we had in place a proactive solution to
> >     avoid further fallout from the architecture of the huge differences
> >     on the entries. The work I'm doing should help with that. (We should
> >     also evaluate the implications and requirements here for that as
> >     well).
> 
> Classic PV guests don't have legacy hardware at all, they just have PV 
> interfaces, so I'm even less sure of what this means.

Using your terms, by "Leave legacy stuff on the old PV path" I meant
not having to address classic PV guest support through HVMLite.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <CAFLBxZbiGppNad=Z6-fLgx89O0yAFrSyARTCwv=vHBR3zJ=NsA@mail.gmail.com>
@ 2016-04-13 18:54             ` Luis R. Rodriguez
       [not found]             ` <20160413185451.GY1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 18:54 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Linux Kernel Mailing List, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, Julien Grall, joeyli,
	Borislav Petkov, Boris Ostrovsky, Juergen Gross, Andrew Cooper,
	Michael Chang

On Wed, Apr 13, 2016 at 11:05:00AM +0100, George Dunlap wrote:
> On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > Also, x86 does have a history of short DT use. Just pointing that its there as
> > an option as well. I'll Cc you on some thread about that.
> 
> I'm not sure how this is relevant to anything.

You brought up DT as a reason why ARM was able to use the native entry point.
I'm clarifying that nothing restricts the use of DT on x86.

> What we're talking about is how to get from Xen to a point in the
> Linux kernel where everything can Just Work.  The proposed feature is
> a mini trampoline that (as I understand it):
> 1. Tells Xen where to jump to (via ELF note)
> 2. Sets up some basic modes and pagetables and then jumps to the zero
> page so Linux can just carry on.

Right, and my goal is to see to it that we do enough homework to
ensure we have reviewed all possibilities to share as much code as possible
and looked at all options before saying we certainly need yet
another entry point. I am not convinced yet this has been done.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]       ` <20160413182951.GW1990@wotan.suse.de>
@ 2016-04-13 18:56         ` Konrad Rzeszutek Wilk
  2016-04-13 20:40           ` Luis R. Rodriguez
       [not found]           ` <20160413204055.GD1990@wotan.suse.de>
  0 siblings, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-13 18:56 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Jim Fehlig,
	Andy Lutomirski, David Vrabel, Linus Torvalds

On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> > On 08/04/16 22:40, Luis R. Rodriguez wrote:
> > > On Wed, Apr 06, 2016 at 10:40:08AM +0100, David Vrabel wrote:
> > >> On 06/04/16 03:40, Luis R. Rodriguez wrote:
> > >>>
> > >>>     * You don't need full EFI emulation
> > >>
> > >> I think needing any EFI emulation inside Xen (which is where it would
> > >> need to be for dom0) is not suitable because of the increase in
> > >> hypervisor ABI.
> > > 
> > > Is this because of timing on architecture / design of HVMLite, or
> > > a general position that the complexity to deal with EFI emulation
> > > is too much for Xen's taste ?
> > 
> > The Xen hypervisor should be as small as possible. Adding an EFI
> > emulator will be adding quite some code. This should be done after a
> > very thorough evaluation only.
> 
> Sure.
> 
> > > ARM already went the EFI entry way for domU -- it went the OVMF route,
> > > would such a possibility be possible for x86 domU HVMLite ? If not why
> > > not, I mean it would seem to make sense to at least mimic the same type
> > > of early boot environment, and perhaps there are some lessons to be
> > > learned from that effort too.
> > 
> > The final solution must be appropriate for dom0, too. So don't try
> > to limit the discussion to domU. If dom0 isn't going to be acceptable
> > there will no need to discuss domU.
> 
> Understood. George noted that on ARM dom0 still uses the ARM native entry
> point, it seems to accomplish this as it uses a device tree node. I'll
> chime in on that in another thread.
> 
> > > Are there some lessons to be learned with ARM's effort? What are they?
> > > If that could be re-done again with any type of cleaner path, what
> > > could that be that could help the x86 side ?
> > > 
> > > Although emulating EFI may require work, some folks have pointed out
> > > that the amount of work may not be that much. If that is done can
> > > we instead rely on the same code to replace OVMF to support both
> > > Xen ARM and Xen HVMLite on x86 ? What would be the pros / cons of
> > > this ?
> > > 
> > >> I also still do not understand your objection to the current tiny stub.
> > > 
> > > Its more of a hypothetical -- can an EFI entry be used instead given
> > > it already does exactly what the new small entry does ? Its also rather
> > > odd to add a new entry without evaluating fully a possible alternative
> > > that would provide the same exact mechanism.
> > 
> > The interface isn't the new entry only. It should be evaluated how much
> > of the early EFI boot path would be common to the HVMlite one.
> 
> We also have other asm code which can be shared. I'll reply to Boris'
> original e-mail with what I can identify as perhaps sharable. There is
> obviously more as you allude.
> 
> > What would be gained by using the same entry but having two different boot
> > paths after it?
> 
> Its a good question. In summary for me it would be the push for sharing more
> code and the push for semantics on early boot to address differences
> proactively, and ultimately it may enable us to help bring closer the old PV
> boot path closer.

But why? We want to kill PV (eventually).
> 
> I'll elaborate on this but first let's clarify why a new entry is used for
> HVMlite to start of with:
> 
>   1) Xen ABI has historically not wanted to set up the boot params for Linux
>      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
>      that out for it. This sticking point means it has implicated a boot stub.


Which is b/c it has to be OS agnostic. It has nothing to do with 'not wanting'.

>      The HVMLite boot entry tries to bring the boot entries paths closer as it
>      leverages more of the HVM boot path philosophy to mimic the regular PC boot
>      path.
> 
>      Is HVMLite supposed to support legacy PV guests as well BTW ?

Gosh no.
> 
>      Reason I'm highlighting Xen ABI as a *reason* alone is that even with
>      today's large discrepancy on the old PV boot path I believe we can
>      bring together the boot paths closer together if the Xen ABI was slightly
>      flexible about this, I've highlighted how I believe that is possible before,

<runs away screaming>

>      *iff* the Xen ABI would at the very least set 2 things only:
> 
>      a) Hypervisor type
>      b) A custom data pointer
> 
>      This would enable a single boot entry on the guest to handle then:
> 
> 	Pseudo code:
> 
> 	startup_32()                         startup_64()
> 	       |                                  |
> 	       |                                  |
> 	       V                                  V
> 	pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
> 	       |                                  |
> 	       |                                  |
> 	       V                                  V
> 	 [existing startup_32()]       [existing startup_64()]
> 	       |                                  |
> 	       |                                  |
> 	       V                                  V
> 	post_hypervisor_stub_32()       post_hypervisor_stub_64()
> 
>      
>      If the Xen ABI was flexible about setting a hypervisor type and custom
>      data pointer then we would haven handlers for it, and in it, it can
>      do whatever it thinks is needed for its own guest types. It could
>      also continue to set the zero page on its own as it sees fit.
> 
>      Again, note that if this is done it could also mean even bringing together
>      the old PV boot path closer together... so this is not just a prospect
>      for HVMLite but also for old PV guests.
> 
>   2) Because of 1) it has meant we have no formal semantics for early boot
>      code is available and so severe differences can best be addressed also
>      by yet another boot entry. This has meant often times not addressing

There are semantics written for this new code: http://xenbits.xen.org/docs/unstable/misc/hvmlite.html

All other ones related to low-level operations are described in Intel SDM.


>      or not knowing if we've addressed real differences between the different
>      entries. Case in point, dead code [0]. How do we know we will not run
>      certain code that should not run for the different entries ? Without
>      *any* semantics later in boot code to distinguish where we came from
>      and because we strive to build single kernels with different possible
>      run time environments it means we have tons of code available to
>      execute / run that we may not need.

I am not following that. PVH aka HVMLite will pretty much erase the need for the
pvops.
> 
>      Because of the lack of semantics we may still have dead code prospects
>      with the new HVMLite entry. How are we sure there is no differences ?
> 
> [0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
> 
>   3) Unikernel / other OS requirements: this is really tied to 2) but even if
>      we tried to evolve the Xen ABI it would mean considering existing solutions
>      out there. Things to consider as an example: FreeBSD doesn't have an EFI
>      entry, unikernels want a simple boot entry.
> 
> With this in mind then, that I can think of:
> 
> Cons of using the same entry but having two different boot paths:
> 
>   * Pushes the Xen ABI, needs to make everyone happy, this is hard
>   * Perhaps harder to implement
> 
> Gains of striving to use the same entry but having two different boot:
> 
>  * Helps to share more code easily
>  * Reduce attack surface
>  * Requires us to have semantics for early boot; this has a series of
>    side benefits:
>    - Means you should try to address differences explicitly rather than
>      implicitly -- case in point Dead Code
> 
> > You still need a way to distinguish between bare metal
> > EFI and HVMlite.
> 
> Great point! This is the semantics aspect. The new entry for HVMlite approach
> deals with this by making the differences implicit by the new entry point.
> My call for addressing this through a hypervisor type was to see if we can
> get those semantics added explicitly so we can also later address dead
> code concerns for the new HVMLite guest type.

Right, they are..
> 
> Part of my own interest in an EFI entry here is that EFI could be used to help
> expand on the semantics in an OS/agnostic form rather than pushing the x86 boot
> protocol further. That seems to have its own set of drawbacks though.
> 
> 
> > And Xen needs a way to find out whether a kernel is
> > supporting HVMlite to boot it in the correct mode.
> 
> How was Xen going to find out if new kernels had HVMlite support with the
> new entry ? An ELFNOTE() ? If an entry is shared could we note use an

Yeah.
> ELFNOTE() also for this though too ?

Not sure what you mean by 'shared'. But you can add multiple Elf PT_NOTEs.
See the ELF document.
> 
>   Luis
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160413185010.GX1990@wotan.suse.de>
@ 2016-04-13 19:02             ` Konrad Rzeszutek Wilk
  2016-04-13 19:14               ` Luis R. Rodriguez
       [not found]               ` <20160413191408.GA1990@wotan.suse.de>
  0 siblings, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-13 19:02 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper

On Wed, Apr 13, 2016 at 08:50:10PM +0200, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> > On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > > OK thanks for the clarification -- still no custom entries for Xen!
> > > We should strive for that, at the very least.
> > > 
> > > You do have a point about the legacy stuff. There are two options there:
> > > 
> > >   * Fold legacy support under HVMLite -- which seems to be what we
> > >     currently want to do (we should evaluate the implications and
> > >     requirements here for that); or
> > 
> > I'm not following here. What does it mean to fold legacy support under 
> > HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
> > it comes to using native Linux entry points. Linux might expect some legacy 
> > PC hardware to be always present, which is not true for HVMlite.
> > 
> > Could you please clarify this point?
> 
> It seems there is a confusion on terms used. By folding legacy support under
> HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
> HVMlite.

Ewww.
> 
> I got the impression that if we wanted to remove the old PV path we had to see
> if we can address old classic PV x86 guests through HVMlite, otherwise we'd
> have to live with the old PV path for the long term.

No. We need to deprecate the PV paths - and the agreement we hammered out
with the x86 maintainers was that once PVH/HVMLite is stable the clock
would start ticking on PV (pvops) life. All the big users of PV Linux
were told in person to prep them for this.

Keep in mind that this is not about deleting support in the hypervisor for
PV hypercalls - meaning you would still be able to run say 2.6.18 RHEL5
in years to come. It is just that Linux v6.1 won't have any more PV paths
and can only run in HVM or PVH/HVMLite mode under Xen.

> 
> > >   * Leave legacy stuff on the old PV path; this may be something to
> > >     bring to the table if we had in place a proactive solution to
> > >     avoid further fallout from the architecture of the huge differences
> > >     on the entries. The work I'm doing should help with that. (We should
> > >     also evaluate the implications and requirements here for that as
> > >     well).
> > 
> > Classic PV guests don't have legacy hardware at all, they just have PV 
> > interfaces, so I'm even less sure of what this means.
> 
> Using the terms you use by "Leave legacy stuff on the old PV path" I meant 
> not having to address classic PV guest support through HVMLite.
> 
>   Luis
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160413102156.b4qwhwbqvnnpmxgw@mac>
@ 2016-04-13 19:10             ` Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 19:10 UTC (permalink / raw)
  To: Roger Pau Monné, Sebastian Andrzej Siewior
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Jim Fehlig, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Juergen Gross,
	Stefano Stabellini, Julien Grall, George Dunlap, joeyli,
	Borislav Petkov, Boris Ostrovsky, Charles Arndol, Andrew Cooper

On Wed, Apr 13, 2016 at 12:25:03PM +0200, Roger Pau Monné wrote:
> On Wed, Apr 13, 2016 at 12:12:25AM +0200, Luis R. Rodriguez wrote:
> [...]
> > Also, x86 does have a history of short DT use. Just pointing that its there as
> > an option as well. I'll Cc you on some thread about that.
> 
> I don't see how this is relevant to the conversation that's going on:

It's relevant as George brought up DT as a *reason* why ARM was able
to cope with no custom entry point...

> How many x86 hardware provide DT?


One. CE4100.

arch/x86/platform/ce4100/falconfalls.dt

> I bet this is 0%.

That's slightly more than 0%.

> How many OSes can boot on x86 using DT? Linux maybe, certainly FreeBSD, 
> Windows or OpenBSD won't be able to boot at all when provided a DT on x86.

You guys seem to be taking these things too personally.

Let me repeat, my goal is to ensure we review things without bias. The points
you make here *now* are things I welcome to the discussion as reasons for
ruling out DT as a way to fine tune further semantics; it is, however, by no
means something we should have discarded.

> Is Xen going to craft a DT for x86 based on ACPI?  No, because it can't parse
> the DSDT or other dynamic tables that contain the information about the
> devices in the system.

Again, DT was brought up by George as reason why ARM was able to cope
with no custom entry point. That's all. What you raise is a good point
to highlight but it does not mean we can't use it if we wanted to for
other things, for instance as an alternative to extending the x86 boot
protocol with custom things which we may need to enhance semantics
early in boot. If that is a stupid prospect let's highlight that and
rule it out.

> I would also like to point out that DT or not DT is not really the problem 
> here, the issue that George was trying to point out is that on x86 there's 
> some legacy hardware that's considered to be always there, so it's presence 
> is not signaled by ACPI, and HVMlite is _not_ emulating this hardware. It 
> doesn't matter if the hardware description comes from ACPI or DT, this 
> hardware is considered to be always present on PC compatible hardware.

x86 Xen PV guests are not alone.  I'm adding quirks we can use to address this
in a clean way now which turns out to be very useful for other custom x86
platforms.
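
To give a rough idea of the shape such quirks could take, here is a minimal
sketch in C. The struct, the platform enum and the list of devices are all
hypothetical and chosen only for illustration; in particular, which devices
an HVMlite guest lacks is an assumption here, not a statement about the
actual design.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform identifiers. */
enum platform {
	PLATFORM_PC,		/* regular PC: legacy devices assumed present */
	PLATFORM_XEN_HVMLITE,	/* no emulated legacy hardware */
};

/* One flag per legacy device that ACPI does not describe. */
struct legacy_features {
	bool rtc;	/* CMOS real-time clock */
	bool i8042;	/* PS/2 keyboard controller */
	bool pic;	/* 8259 interrupt controller */
};

/* Each platform states up front what it has, once, early in boot. */
struct legacy_features legacy_defaults(enum platform p)
{
	struct legacy_features f = { true, true, true };

	if (p == PLATFORM_XEN_HVMLITE) {
		f.rtc = false;
		f.i8042 = false;
		f.pic = false;
	}
	return f;
}

/*
 * A driver consults the flag instead of poking I/O ports blindly.
 * Returns 0 if the device may be probed, -1 if it must be skipped.
 */
int rtc_probe(const struct legacy_features *f)
{
	if (!f->rtc)
		return -1;
	return 0;
}
```

The point of the design is that the difference is stated explicitly in one
place, rather than each driver carrying its own Xen-specific check.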

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-13 19:02             ` Konrad Rzeszutek Wilk
@ 2016-04-13 19:14               ` Luis R. Rodriguez
       [not found]               ` <20160413191408.GA1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 19:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper

On Wed, Apr 13, 2016 at 03:02:26PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 13, 2016 at 08:50:10PM +0200, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> > > On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > > > OK thanks for the clarification -- still no custom entries for Xen!
> > > > We should strive for that, at the very least.
> > > > 
> > > > You do have a point about the legacy stuff. There are two options there:
> > > > 
> > > >   * Fold legacy support under HVMLite -- which seems to be what we
> > > >     currently want to do (we should evaluate the implications and
> > > >     requirements here for that); or
> > > 
> > > I'm not following here. What does it mean to fold legacy support under 
> > > HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
> > > it comes to using native Linux entry points. Linux might expect some legacy 
> > > PC hardware to be always present, which is not true for HVMlite.
> > > 
> > > Could you please clarify this point?
> > 
> > It seems there is a confusion on terms used. By folding legacy support under
> > HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
> > HVMlite.
> 
> Ewww.

Probably confusion over terms again: by the above I meant what you seem
to be indicating below, which is to keep old PV guest support with PV
interfaces using a shiny new entry.

Or are we really going to nuke full support for old PV guests?

> > I got the impression that if we wanted to remove the old PV path we had to see
> > if we can address old classic PV x86 guests through HVMlite, otherwise we'd
> > have to live with the old PV path for the long term.
> 
> No. We need to deprecate the PV paths - and the agreement we hammered out
> with the x86 maintainers was that once PVH/HVMLite is stable the clock
> would start ticking on PV (pvops) life. All the big users of PV Linux
> were told in persons to prep them for this.

That's nice. *How* that is done is what we are determining here.

> Keep in mind that this is not for deleting of support in hypervisor for
> PV hypercalls - meaning you would still be able to run say 2.6.18 RHEL5
> in years to come. It is just that Linux v6.1 won't have any more PV paths
> and can only run in HVM or PVH/HVMLite mode under Xen.

Sure.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]               ` <20160413191408.GA1990@wotan.suse.de>
@ 2016-04-13 19:22                 ` Konrad Rzeszutek Wilk
  2016-04-13 20:01                   ` Luis R. Rodriguez
       [not found]                   ` <20160413200118.GC1990@wotan.suse.de>
  2016-04-14 10:13                 ` George Dunlap
  1 sibling, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-13 19:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper

On Wed, Apr 13, 2016 at 09:14:08PM +0200, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 03:02:26PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 13, 2016 at 08:50:10PM +0200, Luis R. Rodriguez wrote:
> > > On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> > > > On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > > > > OK thanks for the clarification -- still no custom entries for Xen!
> > > > > We should strive for that, at the very least.
> > > > > 
> > > > > You do have a point about the legacy stuff. There are two options there:
> > > > > 
> > > > >   * Fold legacy support under HVMLite -- which seems to be what we
> > > > >     currently want to do (we should evaluate the implications and
> > > > >     requirements here for that); or
> > > > 
> > > > I'm not following here. What does it mean to fold legacy support under 
> > > > HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
> > > > it comes to using native Linux entry points. Linux might expect some legacy 
> > > > PC hardware to be always present, which is not true for HVMlite.
> > > > 
> > > > Could you please clarify this point?
> > > 
> > > It seems there is a confusion on terms used. By folding legacy support under
> > > HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
> > > HVMlite.
> > 
> > Ewww.
> 
> Probably a confusion again on terms, by the above I meant to say what you seem
> to be indicating below, which is to keep old PV guest support with PV interfaces
> using a new shiny entry.
> 
> Or are we really going to nuke full support for old PV guests ?

Please re-read my email. The hypervisor is not going to nuke it. Linux
will stop using them - and hence the pvops will be obsolete.
> 
> > > I got the impression that if we wanted to remove the old PV path we had to see
> > > if we can address old classic PV x86 guests through HVMlite, otherwise we'd
> > > have to live with the old PV path for the long term.
> > 
> > No. We need to deprecate the PV paths - and the agreement we hammered out
> > with the x86 maintainers was that once PVH/HVMLite is stable the clock
> > would start ticking on PV (pvops) life. All the big users of PV Linux
> > were told in persons to prep them for this.
> 
> That's nice. *How* that is done is what we are determining here.

What is being discussed is how PVH/HVMLite is supposed to boot up.
Or the merits of different boot paths.

Unless you are saying that you want to be the maintainer of pvops
and want to extend the life of pvops in Linux and are trying to make
it work under HVMLite?

> 
> > Keep in mind that this is not for deleting of support in hypervisor for
> > PV hypercalls - meaning you would still be able to run say 2.6.18 RHEL5
> > in years to come. It is just that Linux v6.1 won't have any more PV paths
> > and can only run in HVM or PVH/HVMLite mode under Xen.
> 
> Sure.
> 
>   Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]     ` <CAFLBxZbJ4QyJQ1-ZuXg_Q-9YNXnWzDyPNp4SX=d9g0DS8mJKaw@mail.gmail.com>
@ 2016-04-13 19:52       ` Luis R. Rodriguez
       [not found]       ` <20160413195257.GB1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 19:52 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Julien Grall, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski

On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > So more to it, if the EFI entry already provides a way into Linux
> > in a more streamlined fashion bringing it closer to the bare metal
> > boot entry, why *would* we add another boot entry to x86, even if
> > its small and self contained ?
> 
> We would avoid using EFI if:

And this is what I was looking for, thanks!

> * Being called both on real hardware and under Xen would make the EFI
> entry point more complicated

That's on the EFI Linux maintainer to assess. And he seems willing to
consider this.

> * Adding the necessary EFI support into Xen would be a significant
> chunk of extra work

This seems to be a good sticking point, but Andi noted another aspect
of this or redundancy as well.

> * Requiring PVH mode to implement EFI would make it more difficult for
> other kernes (NetBSD, FreeBSD) to act as dom0s.

What if this is only an option, then?

> 
> * Requiring PVH mode to use EFI would make it more difficult to
> support unikernel-style workloads for domUs.

What if this is only an option, then?

> Now as has been pointed out, we don't know for a lot of the above
> things for certain, because nobody has posted any code.  None of us
> really want to post any code because:
> 
> * Reading and understanding the EFI spec, the Linux EFI path, and
> implementing all that on both the Xen and the Linux side is a lot of
> work
> 
> * It looks pretty likely that many of the above things will be true
> 
> * The only real objection to the currently proposed solution is really weak.

Not true:

  * Avoiding code duplication
  * Semantics may be needed anyway


> If you want to post some code I'm sure we could give you feedback on it.

Part of my engagement on HVMLite review is *because* I have been posting
code to help proactively address some old classic PV path issues and
semantics.

I've been addressing semantics on the PV path, and trying to help
bring the classic PV path closer to native entry points while trying
to also provide a proactive measure to help address regressions on the
classic PV path without having Xen be a bottleneck for x86 development.

As for the EFI stuff -- it's discussion for now, as it'd be pointless to
throw code out there if we already know we can't go down that path.

> > Another position against small stubs which I listed myself is that we may need
> > more semantics for early boot even if the new HVMLite small stub is added. This
> > remains to be seen. If we are going to add new semantics, it would seem best to
> > use something more standard like EFI configuration tables rather than hack on
> > to x86 further custom semantics. Custom sloppy semantics have proven to be
> > misused, and were ultimately a sloppy mess.
> [snip]
> >> That sounds like it's going to make the EFI path just as unmanageable as the
> >> current PV path.
> >
> > Can you describe how?
> >
> >> Using the EFI entry point would certainly make sense if it was
> >> actually simpler than the proposed extra entry point.  But it sounds
> >> like it's going to be more complicated, not only for Xen, but also for
> >> Linux.
> >
> > How so? Please provide specifics.
> 
> Here is the juxtaposition that confuses me.  The problem with a lot of
> the current code is that you have virtualization-specific hacks all
> over the place making things complicated.

That's because of sloppy solutions.

> And in the first quote
> above, you seem afraid that the extra entry point with stub code will
> somehow be misused and end up in a similar "sloppy mess", even though
> it's not at all clear how *having a stub entry point* could be
> "abused" by anyone.

You seem to be missing the points I've raised to Boris about semantics
and requirements for custom platform stuff.

>  But then when I suggest that sharing a codepath
> between systems that have actual EFI firmware, with platform hardware,
> and a system that has no EFI firmware and no similar concept of the
> hardware, might end up a sloppy mess of Xen-specific if clauses and
> maintenance headaches due to broken assumptions, it doesn't even
> register with you as a reasonable concern?

Quite the contrary! It does, the question is how we are going to address
the semantics clearly. EFI seemed to provide an OS agnostic way to
address some of this through configuration tables, which would mean
not having to extend the old x86 boot protocol further. More to the
point, this is beyond x86, if we are going to be striving to unify
entry points on Linux across architectures in the long term why not
start addressing needed semantics for virtualization through more
standard means now?
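
For those not familiar with configuration tables: the EFI system table
carries an array of (GUID, pointer) pairs, and the OS looks up what it
cares about by GUID. A minimal sketch of that lookup follows. The struct
names and the GUID value are made up for illustration, and the GUID is
treated as 16 raw bytes for simplicity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A GUID is 16 bytes; treated as raw bytes here for simplicity. */
struct guid {
	uint8_t b[16];
};

/* Same shape as a UEFI configuration table entry: GUID + pointer. */
struct config_table {
	struct guid vendor_guid;
	void *vendor_table;
};

/* Hypothetical GUID for a "hypervisor info" table, made up here. */
static const struct guid hv_info_guid = {{
	0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
	0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10
}};

/* Return the table registered under @guid, or NULL if there is none. */
static void *find_config_table(const struct config_table *tables,
			       size_t nr, const struct guid *guid)
{
	size_t i;

	assert(guid != NULL);
	for (i = 0; i < nr; i++)
		if (!memcmp(&tables[i].vendor_guid, guid, sizeof(*guid)))
			return tables[i].vendor_table;
	return NULL;
}
```

A guest type that needs extra early boot semantics would register its own
table; everyone else simply never finds one and carries on.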

> As Matt said, nobody will be able to provide specifics until someone
> tries to code it up.  But coding things up is not free.

And he is, but privately shared so far. We still can benefit from
more architectural discussion over these things.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-13 19:22                 ` Konrad Rzeszutek Wilk
@ 2016-04-13 20:01                   ` Luis R. Rodriguez
       [not found]                   ` <20160413200118.GC1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 20:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper

On Wed, Apr 13, 2016 at 03:22:23PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 13, 2016 at 09:14:08PM +0200, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 03:02:26PM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Apr 13, 2016 at 08:50:10PM +0200, Luis R. Rodriguez wrote:
> > > > On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> > > > > On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > > > > > OK thanks for the clarification -- still no custom entries for Xen!
> > > > > > We should strive for that, at the very least.
> > > > > > 
> > > > > > You do have a point about the legacy stuff. There are two options there:
> > > > > > 
> > > > > >   * Fold legacy support under HVMLite -- which seems to be what we
> > > > > >     currently want to do (we should evaluate the implications and
> > > > > >     requirements here for that); or
> > > > > 
> > > > > I'm not following here. What does it mean to fold legacy support under 
> > > > > HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
> > > > > it comes to using native Linux entry points. Linux might expect some legacy 
> > > > > PC hardware to be always present, which is not true for HVMlite.
> > > > > 
> > > > > Could you please clarify this point?
> > > > 
> > > > It seems there is a confusion on terms used. By folding legacy support under
> > > > HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
> > > > HVMlite.
> > > 
> > > Ewww.
> > 
> > Probably a confusion again on terms, by the above I meant to say what you seem
> > to be indicating below, which is to keep old PV guest support with PV interfaces
> > using a new shiny entry.
> > 
> > Or are we really going to nuke full support for old PV guests ?
> 
> Please re-read my email. The hypervisor is not going to nuke it. Linux
> will stop using them - and hence the pvops will be obsolete.

I meant remove old PV guest support from Linux. You made it crystal clear
that the hypervisor will keep legacy PV support.

Are we going to remove old PV guest support from Linux upstream long term?
If so then the HVMLite design need not be concerned with supporting legacy crap.

> > > > I got the impression that if we wanted to remove the old PV path we had to see
> > > > if we can address old classic PV x86 guests through HVMlite, otherwise we'd
> > > > have to live with the old PV path for the long term.
> > > 
> > > No. We need to deprecate the PV paths - and the agreement we hammered out
> > > with the x86 maintainers was that once PVH/HVMLite is stable the clock
> > > would start ticking on PV (pvops) life. All the big users of PV Linux
> > > were told in persons to prep them for this.
> > 
> > That's nice. *How* that is done is what we are determining here.
> 
> What is being discussed is how PVH/HVMLite is suppose to bootup.
> Or the merits of different bootup paths.

That's part of it...

> Unless you are saying that you want to be the maintainer of pvops
> and want to extend the life of pvops in Linux and are trying to make
> it work under HVMLite?

Huh? If you look at pvops commits you'll see I've been responsible for
most of the pvops removal already, my ongoing patches should show that
my goal is to streamline this further.

I want to clarify now then what our exist path is, do we need to care
about legacy crap ?

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                   ` <20160413200118.GC1990@wotan.suse.de>
@ 2016-04-13 20:11                     ` Konrad Rzeszutek Wilk
  2016-04-13 20:35                       ` Luis R. Rodriguez
       [not found]                       ` <CAB=NE6VdTB1Bc=c0oCd_tTHpwwkQcxhnOFdcLfck2jX=JjuOAQ@mail.gmail.com>
  0 siblings, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-13 20:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper

On Wed, Apr 13, 2016 at 10:01:18PM +0200, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 03:22:23PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 13, 2016 at 09:14:08PM +0200, Luis R. Rodriguez wrote:
> > > On Wed, Apr 13, 2016 at 03:02:26PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Apr 13, 2016 at 08:50:10PM +0200, Luis R. Rodriguez wrote:
> > > > > On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
> > > > > > On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
> > > > > > > OK thanks for the clarification -- still no custom entries for Xen!
> > > > > > > We should strive for that, at the very least.
> > > > > > > 
> > > > > > > You do have a point about the legacy stuff. There are two options there:
> > > > > > > 
> > > > > > >   * Fold legacy support under HVMLite -- which seems to be what we
> > > > > > >     currently want to do (we should evaluate the implications and
> > > > > > >     requirements here for that); or
> > > > > > 
> > > > > > I'm not following here. What does it mean to fold legacy support under 
> > > > > > HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
> > > > > > it comes to using native Linux entry points. Linux might expect some legacy 
> > > > > > PC hardware to be always present, which is not true for HVMlite.
> > > > > > 
> > > > > > Could you please clarify this point?
> > > > > 
> > > > > It seems there is a confusion on terms used. By folding legacy support under
> > > > > HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
> > > > > HVMlite.
> > > > 
> > > > Ewww.
> > > 
> > > Probably a confusion again on terms, by the above I meant to say what you seem
> > > to be indicating below, which is to keep old PV guest support with PV interfaces
> > > using a new shiny entry.
> > > 
> > > Or are we really going to nuke full support for old PV guests ?
> > 
> > Please re-read my email. The hypervisor is not going to nuke it. Linux
> > will stop using them - and hence the pvops will be obsolete.
> 
> I meant remove old PV guests support from Linux. You made it crystal clear
> that the hypervisor will keep legacy PV support.
> 
> Are we going to remove old PV guest support from Linux upstream long term ?

Yes!
> If so then HVMLite design need not be concerned with supporting legacy crap.

Exactly.
> 
> > > > > I got the impression that if we wanted to remove the old PV path we had to see
> > > > > if we can address old classic PV x86 guests through HVMlite, otherwise we'd
> > > > > have to live with the old PV path for the long term.
> > > > 
> > > > No. We need to deprecate the PV paths - and the agreement we hammered out
> > > > with the x86 maintainers was that once PVH/HVMLite is stable the clock
> > > > would start ticking on PV (pvops) life. All the big users of PV Linux
> > > > were told in persons to prep them for this.
> > > 
> > > That's nice. *How* that is done is what we are determining here.
> > 
> > What is being discussed is how PVH/HVMLite is suppose to bootup.
> > Or the merits of different bootup paths.
> 
> That's part of it...
> 
> > Unless you are saying that you want to be the maintainer of pvops
> > and want to extend the life of pvops in Linux and are trying to make
> > it work under HVMLite?
> 
> Huh? If you look at pvops commits you'll see I've been responsible for
> most of the pvops removal already, my ongoing patches should show that
> my goal is to streamline this further.
> 
> I want to clarify now then what our exist path is, do we need to care
> about legacy crap ?

exist? Existing?

And by 'legacy crap' you mean 'pvops' - then the answer is no.

The big existing use-case of pvops is to boot Linux as initial domain.
If we can swap it over to PVH/HVMLite then that frees us from having to
use pvops.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-13 20:11                     ` Konrad Rzeszutek Wilk
@ 2016-04-13 20:35                       ` Luis R. Rodriguez
       [not found]                       ` <CAB=NE6VdTB1Bc=c0oCd_tTHpwwkQcxhnOFdcLfck2jX=JjuOAQ@mail.gmail.com>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 20:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Jeff Mahoney, Michael Chang, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, Julien Grall, George Dunlap,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper

On Wed, Apr 13, 2016 at 1:11 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> I want to clarify now then what our exist path is, do we need to care
>> about legacy crap ?
>
> exist? Existing?

Sorry I meant 'exit path'.

> And by 'legacy crap' you mean 'pvops' - then the answer is no.

Not pvops -- but hardware without hardware virtualization bells and
whistles, are we then simply not going to need to support this old
crap hardware?

> The big existing use-case of pvops is to boot Linux as initial domain.
> If we can swap it over to PVH/HVMLite then that frees us from having to
> use pvops.

Right.

 Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-13 18:56         ` Konrad Rzeszutek Wilk
@ 2016-04-13 20:40           ` Luis R. Rodriguez
       [not found]           ` <20160413204055.GD1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 20:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Juergen Gross
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Jim Fehlig, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel, Linus Torvalds

On Wed, Apr 13, 2016 at 02:56:29PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> > On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> > 
> > > What would be gained by using the same entry but having two different boot
> > > paths after it?
> > 
> > Its a good question. In summary for me it would be the push for sharing more
> > code and the push for semantics on early boot to address differences
> > proactively, and ultimately it may enable us to help bring closer the old PV
> > boot path closer.
> 
> But why? We want to kill PV (eventually).

Yeah yeah, but it's still there, and we'll have to live with it for
at least 5 years, I hear. Part of my interest is to see to it
that this path gets less disruption and fewer issues, and that we also
address the dead code issues which pvops simply swept under the rug. The dead
code concerns may still exist for HVMlite, so unless someone is willing
to make a bold claim that there are none, it's something to consider.

How we address semantics then is *very* important to me.

> > I'll elaborate on this but first let's clarify why a new entry is used for
> > HVMlite to start of with:
> > 
> >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> >      that out for it. This sticking point means it has implicated a boot stub.
> 
> 
> Which is b/c it has to be OS agnostic. It has nothing to do 'not wanting'.

It can still be OS agnostic and pass on a type and a custom data pointer.

Would that be reasonable?
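
To be concrete about how little that is, here is a minimal sketch. Every
name and value below is hypothetical, none of this is an existing Xen or
Linux ABI: the hypervisor hands over a type and an opaque pointer, and the
common entry dispatches on the type before doing anything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: not part of any existing Xen or Linux ABI. */
enum hv_type {
	HV_TYPE_NONE = 0,	/* bare metal, nothing to do */
	HV_TYPE_XEN_PV,
	HV_TYPE_XEN_HVMLITE,
	HV_TYPE_MAX
};

/* The only two things the hypervisor would have to hand over. */
struct hv_boot_info {
	uint32_t type;	/* one of enum hv_type */
	void *data;	/* opaque to common code, owned by the handler */
};

typedef int (*hv_stub_fn)(void *data);

/* Stand-ins for the per-guest-type early handlers. */
static int xen_pv_pre_stub(void *data)
{
	return data ? 0 : -1;	/* needs its custom data to proceed */
}

static int xen_hvmlite_pre_stub(void *data)
{
	return data ? 0 : -1;
}

static const hv_stub_fn pre_stubs[HV_TYPE_MAX] = {
	[HV_TYPE_XEN_PV]	= xen_pv_pre_stub,
	[HV_TYPE_XEN_HVMLITE]	= xen_hvmlite_pre_stub,
};

/*
 * Would be called first thing from the common entry. Returns 0 when
 * there is nothing to do or the handler succeeded, -1 otherwise.
 */
static int pre_hypervisor_stub(const struct hv_boot_info *info)
{
	hv_stub_fn fn;

	if (!info || info->type == HV_TYPE_NONE)
		return 0;
	if (info->type >= HV_TYPE_MAX)
		return -1;	/* unknown type: refuse rather than guess */
	fn = pre_stubs[info->type];
	assert(fn != NULL);	/* every known type has a handler */
	return fn(info->data);
}
```

What the handler does with the data pointer stays entirely up to the
hypervisor's own guest code, so the common path never needs to know the
layout behind it.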

> >      The HVMLite boot entry tries to bring the boot entries paths closer as it
> >      leverages more of the HVM boot path philosophy to mimic the regular PC boot
> >      path.
> > 
> >      Is HVMLite supposed to support legacy PV guests as well BTW ?
> 
> Gosh no.

Interesting.. and *everyone* is happy about this?

> >      Reason I'm highlighting Xen ABI as a *reason* alone is that even with
> >      today's large discrepancy on the old PV boot path I believe we can
> >      bring together the boot paths closer together if the Xen ABI was slightly
> >      flexible about this, I've highlighted how I believe that is possible before,
> 
> <runs away screaming>

Everyone has. If you need to support old PV guests for more than 5 years the
work I'm doing should help with that. I'm trying to leverage gains of the
work I'm doing for HVMLite, and part of this is trying to address semantics
proactively.

> >      *iff* the Xen ABI would at the very least set 2 things only:
> > 
> >      a) Hypervisor type
> >      b) A custom data pointer
> > 
> >      This would enable a single boot entry on the guest to handle then:
> > 
> > 	Pseudo code:
> > 
> > 	startup_32()                         startup_64()
> > 	       |                                  |
> > 	       |                                  |
> > 	       V                                  V
> > 	pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
> > 	       |                                  |
> > 	       |                                  |
> > 	       V                                  V
> > 	 [existing startup_32()]       [existing startup_64()]
> > 	       |                                  |
> > 	       |                                  |
> > 	       V                                  V
> > 	post_hypervisor_stub_32()       post_hypervisor_stub_64()
> > 
> >      
> >      If the Xen ABI was flexible about setting a hypervisor type and custom
> >      data pointer then we would haven handlers for it, and in it, it can
> >      do whatever it thinks is needed for its own guest types. It could
> >      also continue to set the zero page on its own as it sees fit.
> > 
> >      Again, note that if this is done it could also mean even bringing together
> >      the old PV boot path closer together... so this is not just a prospect
> >      for HVMLite but also for old PV guests.
> > 
> >   2) Because of 1) it has meant we have no formal semantics for early boot
> >      code is available and so severe differences can best be addressed also
> >      by yet another boot entry. This has meant often times not addressing
> 
> There are semantics written for this new code: http://xenbits.xen.org/docs/unstable/misc/hvmlite.html

That only addressed semantics for early boot code implicitly through a new entry...

> All other ones related to low-level operations are described in Intel SDM.
> 
> 
> >      or not knowing if we've addressed real differences between the different
> >      entries. Case in point, dead code [0]. How do we know we will not run
> >      certain code that should not run for the different entries ? Without
> >      *any* semantics later in boot code to distinguish where we came from
> >      and because we strive to build single kernels with different possible
> >      run time environments it means we have tons of code available to
> >      execute / run that we may not need.
> 
> I am not following that. PVH aka HVMLite will pretty much erase the need for the
> pvops.

It does not mean there are no dead code concerns with HVMlite.

> > 
> >      Because of the lack of semantics we may still have dead code prospects
> >      with the new HVMLite entry. How are we sure there is no differences ?
> > 
> > [0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
> > 
> >   3) Unikernel / other OS requirements: this is really tied to 2) but even if
> >      we tried to evolve the Xen ABI it would mean considering existing solutions
> >      out there. Things to consider as an example: FreeBSD doesn't have an EFI
> >      entry, unikernels want a simple boot entry.
> > 
> > With this in mind then, that I can think of:
> > 
> > Cons of using the same entry but having two different boot paths:
> > 
> >   * Pushes the Xen ABI, needs to make everyone happy, this is hard
> >   * Perhaps harder to implement
> > 
> > Gains of striving to use the same entry but having two different boot:
> > 
> >  * Helps to share more code easily
> >  * Reduce attack surface
> >  * Requires us to have semantics for early boot; this has a series of
> >    side benefits:
> >    - Means you should try to address differences explicitly rather than
> >      implicitly -- case in point Dead Code
> > 
> > > You still need a way to distinguish between bare metal
> > > EFI and HVMlite.
> > 
> > Great point! This is the semantics aspect. The new entry for HVMlite approach
> > deals with this by making the differences implicit by the new entry point.
> > My call for addressing this through a hypervisor type was to see if we can
> > get those semantics added explicitly so we can also later address dead
> > code concerns for the new HVMLite guest type.
> 
> Right, they are..

There is huge merit in addressing large chunks of dead code concerns by
sticking closer to the native boot paths. That doesn't mean you have no
dead code concerns left with HVMlite, nor that HVMLite has no platform quirks.
It does, and part of some recent work is to pave a *clean* path for setting
these differences apart.

> > Part of my own interest in an EFI entry here is that EFI could be used to help
> > expand on the semantics in an OS/agnostic form rather than pushing the x86 boot
> > protocol further. That seems to have its own set of drawbacks though.
> > 
> > 
> > > And Xen needs a way to find out whether a kernel is
> > > supporting HVMlite to boot it in the correct mode.
> > 
> > How was Xen going to find out if new kernels had HVMlite support with the
> > new entry ? An ELFNOTE() ? If an entry is shared could we note use an
> 
> Yeah.
> > ELFNOTE() also for this though too ?
> 
> Not sure what you mean by 'shared'. But you can add multiple Elf PT_NOTEs.
> See the ELF document.

OK so even if we used a common/shared entry point we can address letting
Xen find out whether or not a kernel supports HVMlite.

  Luis

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                       ` <CAB=NE6VdTB1Bc=c0oCd_tTHpwwkQcxhnOFdcLfck2jX=JjuOAQ@mail.gmail.com>
@ 2016-04-13 20:48                         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-13 20:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Jeff Mahoney, Michael Chang, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, Julien Grall, George Dunlap,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper

On Wed, Apr 13, 2016 at 01:35:27PM -0700, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 1:11 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >> I want to clarify now then what our exist path is, do we need to care
> >> about legacy crap ?
> >
> > exist? Existing?
> 
> Sorry I meant 'exit path'.
> 
> > And by 'legacy crap' you mean 'pvops' - then the answer is no.
> 
> Not pvops -- but hardware without hardware virtualization bells and
> whistles, are we then simply not going to need to support this old
> crap hardware?

Yes, HVMLite means - HVM without QEMU. As in HVM without having to
emulate legacy hardware.
> 
> > The big existing use-case of pvops is to boot Linux as initial domain.
> > If we can swap it over to PVH/HVMLite then that frees us from having to
> > use pvops.
> 
> Right.
> 
>  Luis

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160413204055.GD1990@wotan.suse.de>
@ 2016-04-13 21:08             ` Konrad Rzeszutek Wilk
  2016-04-13 22:23               ` Luis R. Rodriguez
       [not found]               ` <20160413222317.GH1990@wotan.suse.de>
  0 siblings, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-13 21:08 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Jim Fehlig,
	Andy Lutomirski, David Vrabel, Linus Torvalds

On Wed, Apr 13, 2016 at 10:40:55PM +0200, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 02:56:29PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> > > On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> > > 
> > > > What would be gained by using the same entry but having two different boot
> > > > paths after it?
> > > 
> > > Its a good question. In summary for me it would be the push for sharing more
> > > code and the push for semantics on early boot to address differences
> > > proactively, and ultimately it may enable us to help bring closer the old PV
> > > boot path closer.
> > 
> > But why? We want to kill PV (eventually).
> 
> Yeah yeah, but its still there, and we'll have to live with it for
> at least minimum 5 years I hear. Part of my interest is to see to it
> that this path gets less disruption and issues, and we also address
> dead code issues which pvops simply folded under the rug. The dead code
> concerns may exist still for hvmlite, so unless someone is willing
> to make a bold claim there is none, its something to consider.

What is this dead code you speak of? Is it MTRR? Is it early path code
that PV misses (like KASLR or other)?


The entry point in Linux "proper" is startup_32 or startup_64 - the same
path that EFI uses.

If you were to draw this (very simplified):

a)- GRUB2 ---------------------\ (creates a bootparam structure)
                                \
                                 +---- startup_32 or startup_64
b) EFI -> Linux EFI stub -------/
       (creates bootparam)     /
c) GRUB2-EFI  -> Linux EFI----/
               stub         /
d) HVMLite ----------------/
      (creates bootparam)

(I am not sure about c) - I would have to look in the source to
be sure). There is also LILO in this, but I am not even sure if it
works anymore.


What you have is that every entry point creates the bootparams
and ends up calling startup_X. The startup_64 then hits the rest
of the kernel. The startup_X code is the one that would set up
the basic pagetables, segments, etc.

> 
> How we address semantics then is *very* important to me.

Which semantics? How the CPU is going to be at startup_X ? Or
how the CPU is going to be when EFI firmware invokes the EFI stub?
Or when GRUB2 loads Linux?

That (those bootloaders) is clearly defined. The URL I provided
mentions the HVMLite one. Documentation/x86/boot.txt mentions
what semantics are to be expected when providing a bootstrap
(which is what the HVMLite stub code in Linux would be written against -
and what the EFI stub code had been written against too).
> 
> > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > HVMlite to start of with:
> > > 
> > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > >      that out for it. This sticking point means it has implicated a boot stub.
> > 
> > 
> > Which is b/c it has to be OS agnostic. It has nothing to do 'not wanting'.
> 
> It can still be OS agnostic and pass on type and custom data pointer.

Sure. It has that (it MUST, otherwise how else would you pass data?).
It is documented as well: http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
(see "Start of day structure passed to PVH guests in %ebx.")

> 
> Would that be reasonable ?
> 
> > >      The HVMLite boot entry tries to bring the boot entries paths closer as it
> > >      leverages more of the HVM boot path philosophy to mimic the regular PC boot
> > >      path.
> > > 
> > >      Is HVMLite supposed to support legacy PV guests as well BTW ?
> > 
> > Gosh no.
> 
> Interesting.. and *everyone* is happy about this?

The Xen Linux _and_ x86 maintainers are.
And the Xen community developers as well (I haven't heard anybody screaming NOOO
so I am presuming so).

> 
> > >      Reason I'm highlighting Xen ABI as a *reason* alone is that even with
> > >      today's large discrepancy on the old PV boot path I believe we can
> > >      bring together the boot paths closer together if the Xen ABI was slightly
> > >      flexible about this, I've highlighted how I believe that is possible before,
> > 
> > <runs away screaming>
> 
> Everyone has. If you need to support old PV guests for more than 5 years the
> work I'm doing should help with that. I'm trying to leverage gains of the
> work I'm doing for HVMLite, and part of this is trying to address semantics
> proactively.

What do you mean by 'support'? Support an old kernel or support upstream Linux?

> 
> > >      *iff* the Xen ABI would at the very least set 2 things only:
> > > 
> > >      a) Hypervisor type
> > >      b) A custom data pointer
> > > 
> > >      This would enable a single boot entry on the guest to handle then:
> > > 
> > > 	Pseudo code:
> > > 
> > > 	startup_32()                         startup_64()
> > > 	       |                                  |
> > > 	       |                                  |
> > > 	       V                                  V
> > > 	pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
> > > 	       |                                  |
> > > 	       |                                  |
> > > 	       V                                  V
> > > 	 [existing startup_32()]       [existing startup_64()]
> > > 	       |                                  |
> > > 	       |                                  |
> > > 	       V                                  V
> > > 	post_hypervisor_stub_32()       post_hypervisor_stub_64()
> > > 
> > >      
> > >      If the Xen ABI was flexible about setting a hypervisor type and custom
> > >      data pointer then we would haven handlers for it, and in it, it can
> > >      do whatever it thinks is needed for its own guest types. It could
> > >      also continue to set the zero page on its own as it sees fit.
> > > 
> > >      Again, note that if this is done it could also mean even bringing together
> > >      the old PV boot path closer together... so this is not just a prospect
> > >      for HVMLite but also for old PV guests.
> > > 
> > >   2) Because of 1) it has meant we have no formal semantics for early boot
> > >      code is available and so severe differences can best be addressed also
> > >      by yet another boot entry. This has meant often times not addressing
> > 
> > There are semantics written for this new code: http://xenbits.xen.org/docs/unstable/misc/hvmlite.html
> 
> That only addressed semantics for early boot code implicitly through a new entry...

And there is the Documentation/x86/boot.txt.

You have two semantics from either side clearly defined. Now it is just
a matter of connecting the dots.

> 
> > All other ones related to low-level operations are described in Intel SDM.
> > 
> > 
> > >      or not knowing if we've addressed real differences between the different
> > >      entries. Case in point, dead code [0]. How do we know we will not run
> > >      certain code that should not run for the different entries ? Without
> > >      *any* semantics later in boot code to distinguish where we came from
> > >      and because we strive to build single kernels with different possible
> > >      run time environments it means we have tons of code available to
> > >      execute / run that we may not need.
> > 
> > I am not following that. PVH aka HVMLite will pretty much erase the need for the
> > pvops.
> 
> It does not mean there are no dead code concerns with HVMlite.

I am pretty sure there are none. But I need to make sure I understand
what you mean by 'dead code'.

> 
> > > 
> > >      Because of the lack of semantics we may still have dead code prospects
> > >      with the new HVMLite entry. How are we sure there is no differences ?
> > > 
> > > [0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
> > > 
> > >   3) Unikernel / other OS requirements: this is really tied to 2) but even if
> > >      we tried to evolve the Xen ABI it would mean considering existing solutions
> > >      out there. Things to consider as an example: FreeBSD doesn't have an EFI
> > >      entry, unikernels want a simple boot entry.
> > > 
> > > With this in mind then, that I can think of:
> > > 
> > > Cons of using the same entry but having two different boot paths:
> > > 
> > >   * Pushes the Xen ABI, needs to make everyone happy, this is hard
> > >   * Perhaps harder to implement
> > > 
> > > Gains of striving to use the same entry but having two different boot:
> > > 
> > >  * Helps to share more code easily
> > >  * Reduce attack surface
> > >  * Requires us to have semantics for early boot; this has a series of
> > >    side benefits:
> > >    - Means you should try to address differences explicitly rather than
> > >      implicitly -- case in point Dead Code
> > > 
> > > > You still need a way to distinguish between bare metal
> > > > EFI and HVMlite.
> > > 
> > > Great point! This is the semantics aspect. The new entry for HVMlite approach
> > > deals with this by making the differences implicit by the new entry point.
> > > My call for addressing this through a hypervisor type was to see if we can
> > > get those semantics added explicitly so we can also later address dead
> > > code concerns for the new HVMLite guest type.
> > 
> > Right, they are..
> 
> There is huge merit to address a huge chunks of dead code concerns by sticking
> more closer to the native booth paths, it doesn't mean you still have no

Right, which we do. Keep in mind that Linux does not boot by itself. It needs
a bootloader which sets the stage for it. We set the same exact stage.

> dead code concerns with HVMlite, nor that HVMLite has no platform quirks,
> it does and part of some recent work is to pave a *clean* path for setting
> these differences apart.

/me scratches his head.

There will always be platform quirks.

I guess I am not understanding your concerns. The work that Boris is doing is
to code against the bootparams - which has a spec.

> 
> > > Part of my own interest in an EFI entry here is that EFI could be used to help
> > > expand on the semantics in an OS/agnostic form rather than pushing the x86 boot
> > > protocol further. That seems to have its own set of drawbacks though.
> > > 
> > > 
> > > > And Xen needs a way to find out whether a kernel is
> > > > supporting HVMlite to boot it in the correct mode.
> > > 
> > > How was Xen going to find out if new kernels had HVMlite support with the
> > > new entry ? An ELFNOTE() ? If an entry is shared could we note use an
> > 
> > Yeah.
> > > ELFNOTE() also for this though too ?
> > 
> > Not sure what you mean by 'shared'. But you can add multiple Elf PT_NOTEs.
> > See the ELF document.
> 
> OK so even if we used a common/shared entry point we can address letting
> Xen find out whether or not a kernel supports HVMlite.

Yes. Xen parses the Linux ELF NOTEs and can figure out if the kernel
can do HVMLite or not.
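
To illustrate the kind of lookup involved, here is a rough sketch of
scanning a PT_NOTE segment image for a note owned by "Xen". It is not
Xen's actual parser. The helper names (find_xen_note, align4, note_hdr)
are made up for this example, and the note type value 18 is assumed to
match XEN_ELFNOTE_PHYS32_ENTRY from Xen's public elfnote.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match XEN_ELFNOTE_PHYS32_ENTRY; illustrative only. */
#define SKETCH_XEN_ELFNOTE_PHYS32_ENTRY 18u

/* Generic ELF note header: three 32-bit words, then name and desc. */
struct note_hdr {
	uint32_t namesz;
	uint32_t descsz;
	uint32_t type;
};

/* Round up to the 4-byte alignment used inside note segments. */
static size_t align4(size_t n)
{
	return (n + 3u) & ~(size_t)3u;
}

/*
 * Scan a PT_NOTE segment image for a note owned by "Xen" with the
 * given type. Returns a pointer to its descriptor and stores the
 * descriptor size in *descsz, or returns NULL if there is no match.
 */
static const uint8_t *find_xen_note(const uint8_t *seg, size_t len,
				    uint32_t type, uint32_t *descsz)
{
	size_t off = 0;

	assert(seg != NULL && descsz != NULL);
	while (len - off >= sizeof(struct note_hdr)) {
		struct note_hdr h;
		size_t name_off, desc_off, next;

		memcpy(&h, seg + off, sizeof(h));
		name_off = off + sizeof(h);
		desc_off = name_off + align4(h.namesz);
		next = desc_off + align4(h.descsz);
		if (desc_off < name_off || next < desc_off || next > len)
			break;	/* truncated or corrupt note */
		/* The owner name is stored with its trailing NUL. */
		if (h.type == type && h.namesz == 4 &&
		    memcmp(seg + name_off, "Xen", 4) == 0) {
			*descsz = h.descsz;
			return seg + desc_off;
		}
		off = next;
	}
	return NULL;
}
```

A loader built along these lines would treat a missing note as "this
kernel cannot be started in HVMLite mode" and fall back or refuse.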

> 
>   Luis

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-13 21:08             ` Konrad Rzeszutek Wilk
@ 2016-04-13 22:23               ` Luis R. Rodriguez
       [not found]               ` <20160413222317.GH1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-13 22:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Jim Fehlig,
	Andy Lutomirski, Luis R. Rodriguez, David Vrabel, Linus Torvalds

On Wed, Apr 13, 2016 at 05:08:01PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 13, 2016 at 10:40:55PM +0200, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 02:56:29PM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> > > > On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> > > > 
> > > > > What would be gained by using the same entry but having two different boot
> > > > > paths after it?
> > > > 
> > > > Its a good question. In summary for me it would be the push for sharing more
> > > > code and the push for semantics on early boot to address differences
> > > > proactively, and ultimately it may enable us to help bring closer the old PV
> > > > boot path closer.
> > > 
> > > But why? We want to kill PV (eventually).
> > 
> > Yeah yeah, but its still there, and we'll have to live with it for
> > at least minimum 5 years I hear. Part of my interest is to see to it
> > that this path gets less disruption and issues, and we also address
> > dead code issues which pvops simply folded under the rug. The dead code
> > concerns may exist still for hvmlite, so unless someone is willing
> > to make a bold claim there is none, its something to consider.
> 
> What is this dead code you speak of? Is it MTRR? Is early path code
> that PV misses (like KASL or other?)

Kasan is dead code to Xen. If you boot x86 Xen with Kasan enabled
Xen explodes. Quick question, will Kasan not explode with HVMLite ?

MTRR used to be a dead code concern, but since we have vetted most of that
code we are now pretty certain it should never run.

KASLR may be -- not sure, as I haven't vetted that, but from
what I have loosely heard, maybe.

VGA code will be dead code for HVMLite for sure, as the design doc
says it will not run VGA. The ACPI flag will be set, but the check
for that is not yet in Linux. That means the Linux VGA code will
be there, but we have no way to ensure that it will not run, or
that nothing will muck with it.

To be clear -- dead code concerns still exist even without
virtualization solutions, it's just that with virtualization
this stuff comes up more and there have been no proactive
measures to address this. The question of semantics here is
to see to what extent we need earlier boot code annotations
to ensure we address semantics proactively.

> The entrace point in Linux "proper" is startup_32 or startup_64 - the same
> path that EFI uses.
> 
> If you were to draw this (very simplified):
> 
> a)- GRUB2 ---------------------\ (creates an bootparam structure)
>                                 \
>                                  +---- startup_32 or startup_64
> b) EFI -> Linux EFI stub -------/
>        (creates bootparm)      /
> c) GRUB2-EFI  -> Linux EFI----/
>                stub         /
> d) HVMLite ----------------/
>       (creates bootparm)

b) and d) might be able to share paths there...
d) still has its own entry, it does more than create boot params.

> (I am not sure about the c) - I would have to look in source to
> be source). There is also LILO in this, but I am not even sure if
> works anymore.
> 
> 
> What you have is that every entry point creates the bootparams
> and ends up calling startup_X. The startup_64 then hit the rest
> of the kernel. The startp_X code is the one that would setup
> the basic pagetables, segments, etc.

Sure.. a full diagram should include both sides, and show how using
a custom entry runs the risk of skipping a lot of code setup.
There is that, and, as others have pointed out, certain guest types
are assumed to not have certain peripherals, and we have no way
to ensure certain old legacy code will never run or be accessed
by drivers.

> > How we address semantics then is *very* important to me.
> 
> Which semantics? How the CPU is going to be at startup_X ? Or
> how the CPU is going to be when EFI firmware invokes the EFI stub?
> Or when GRUB2 loads Linux?

What hypervisor kicked me and what guest type I am.

Let me elaborate more below.

> That (those bootloaders) is clearly defined. The URL I provided
> mentions the HVMLite one. The Documentation/x86/boot.c mentions
> what the semantics are to expected when providing an bootstrap
> (which is what HVMLitel stub code in Linux would write against -
> and what EFI stub code had been written against too).
> > 
> > > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > > HVMlite to start of with:
> > > > 
> > > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > > >      that out for it. This sticking point means it has implicated a boot stub.
> > > 
> > > 
> > > Which is b/c it has to be OS agnostic. It has nothing to do 'not wanting'.
> > 
> > It can still be OS agnostic and pass on type and custom data pointer.
> 
> Sure. It has that (it MUST otherwise how else would you pass data).
> It is documented as well http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
> (see " Start of day structure passed to PVH guests in %ebx.")

The design doc begs for a custom OS entry point though.
If we had a single 'type' and 'custom data' passed to the kernel that
should suffice for the default Linux entry point to just pivot off
of that and do what it needs without more entry points. Once.
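
A rough sketch of what pivoting on a type and a data pointer could look
like, purely for illustration. Nothing here is an existing Linux or Xen
interface: the enum values, the hook table, and pre_hypervisor_stub()
are hypothetical names borrowed from the pseudo code quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hypervisor types; not an existing Linux or Xen ABI. */
enum hv_type { HV_NONE = 0, HV_XEN_HVMLITE, HV_XEN_PV, HV_TYPE_MAX };

/* Hypothetical per-type hook; returns 0 on success. */
typedef int (*hv_stub_fn)(void *custom_data);

/* Records which hook ran, so the sketch can be observed in a test. */
static int last_stub_run = HV_NONE;

static int hvmlite_pre_stub(void *custom_data)
{
	(void)custom_data;	/* would translate Xen start info here */
	last_stub_run = HV_XEN_HVMLITE;
	return 0;
}

static int xen_pv_pre_stub(void *custom_data)
{
	(void)custom_data;	/* would handle the legacy PV handover here */
	last_stub_run = HV_XEN_PV;
	return 0;
}

/* One table instead of one entry point per guest type. */
static const hv_stub_fn pre_stubs[HV_TYPE_MAX] = {
	[HV_XEN_HVMLITE] = hvmlite_pre_stub,
	[HV_XEN_PV] = xen_pv_pre_stub,
};

/* Called once from the single common entry; bare metal has no hook. */
static int pre_hypervisor_stub(enum hv_type type, void *custom_data)
{
	int t = (int)type;

	if (t <= HV_NONE || t >= HV_TYPE_MAX || !pre_stubs[t])
		return 0;	/* nothing to do: native boot path */
	assert(custom_data != NULL);
	return pre_stubs[t](custom_data);
}
```

The point of the table is that the differences between guest types are
spelled out in one place rather than implied by which entry point ran.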

  Luis

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]               ` <20160413222317.GH1990@wotan.suse.de>
@ 2016-04-14  1:01                 ` Konrad Rzeszutek Wilk
       [not found]                 ` <20160414010131.GA21510@localhost.localdomain>
  1 sibling, 0 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-14  1:01 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Jim Fehlig,
	Andy Lutomirski, David Vrabel, Linus Torvalds

On Thu, Apr 14, 2016 at 12:23:17AM +0200, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 05:08:01PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 13, 2016 at 10:40:55PM +0200, Luis R. Rodriguez wrote:
> > > On Wed, Apr 13, 2016 at 02:56:29PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> > > > > On Mon, Apr 11, 2016 at 07:12:08AM +0200, Juergen Gross wrote:
> > > > > 
> > > > > > What would be gained by using the same entry but having two different boot
> > > > > > paths after it?
> > > > > 
> > > > > Its a good question. In summary for me it would be the push for sharing more
> > > > > code and the push for semantics on early boot to address differences
> > > > > proactively, and ultimately it may enable us to help bring closer the old PV
> > > > > boot path closer.
> > > > 
> > > > But why? We want to kill PV (eventually).
> > > 
> > > Yeah yeah, but its still there, and we'll have to live with it for
> > > at least minimum 5 years I hear. Part of my interest is to see to it
> > > that this path gets less disruption and issues, and we also address
> > > dead code issues which pvops simply folded under the rug. The dead code
> > > concerns may exist still for hvmlite, so unless someone is willing
> > > to make a bold claim there is none, its something to consider.
> > 
> > What is this dead code you speak of? Is it MTRR? Is early path code
> > that PV misses (like KASL or other?)
> 
> Kasan is dead code to Xen. If you boot x86 Xen with Kasan enabled

For Xen PV guests,
> Xen explodes. Quick question, will Kasan not explode with HVMLite ?

.. but for HVMLite or Xen HVM guests Kasan will run.
> 
> MTRR used to be dead code concern but since we have vetted most of that code
> now we are pretty certain that code should never run now.
> 
> KASLR may be -- not sure as I  haven't vetted that, but from
> what I have loosely heard maybe.
> 
> VGA code will be dead code for HVMlite for sure as the design doc
> says it will not run VGA, the ACPI flag will be set but the check
> for that is not yet on Linux. That means the VGA Linux code will
> be there but we have no way to ensure it will not run nor that
> anything will muck with it.

<shrugs> The worst it will do is try to read non-existent registers.
The VGA code should be able to handle failures like that and
not initialize itself when the hardware is dead (or non-existent).
> 
> To be clear -- dead code concerns still exist even without
> virtualization solutions, its just that with virtualization
> this stuff comes up more and there has been no proactive
> measures to address this. The question of semantics here is
> to see to what extent we need earlier boot code annotations
> to ensure we address semantics proactively.

I think what you mean by dead code is another word for
hardware test coverage?
> 
> > The entrace point in Linux "proper" is startup_32 or startup_64 - the same
> > path that EFI uses.
> > 
> > If you were to draw this (very simplified):
> > 
> > a)- GRUB2 ---------------------\ (creates an bootparam structure)
> >                                 \
> >                                  +---- startup_32 or startup_64
> > b) EFI -> Linux EFI stub -------/
> >        (creates bootparm)      /
> > c) GRUB2-EFI  -> Linux EFI----/
> >                stub         /
> > d) HVMLite ----------------/
> >       (creates bootparm)
> 
> b) and d) might be able to share paths there...

No idea. You would have to look in the assembler code to
figure that out.

> d) still has its own entry, it does more than create boot params.

d)'s purpose is to create boot params. It may do more, as nobody likes
to muck in assembler and make bootparams from within assembler.

> 
> > (I am not sure about the c) - I would have to look in source to
> > be source). There is also LILO in this, but I am not even sure if
> > works anymore.
> > 
> > 
> > What you have is that every entry point creates the bootparams
> > and ends up calling startup_X. The startup_64 then hit the rest
> > of the kernel. The startp_X code is the one that would setup
> > the basic pagetables, segments, etc.
> 
> Sure.. a full diagram should include both sides and how when using
> a custom entry one runs the risk of skipping a lot of code setup.

But it does not skip a lot of code setup. It starts at exactly
the same startup code that _all_ bootstrapping code starts at.

> There is that and as others have pointed out how certain guests types
> are assumed to not have certain peripherals, and we have no idea
> to ensure certain old legacy code may not ever run or be accessed
> by drivers.

Ok, but that is not at code setup. That is later - when device
drivers are initialized. This is no different than booting on
some hardware with missing functionality. ACPI, PCI and PnP
are there to help OSes discover this.
> 
> > > How we address semantics then is *very* important to me.
> > 
> > Which semantics? How the CPU is going to be at startup_X ? Or
> > how the CPU is going to be when EFI firmware invokes the EFI stub?
> > Or when GRUB2 loads Linux?
> 
> What hypervisor kicked me and what guest type I am.

cpuid software flags have that - and those semantics have been
there for eons.
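
As a small illustration of the cpuid approach: hypervisors report a
12-byte signature in EBX, ECX and EDX of the hypervisor CPUID leaf
(0x40000000 by default; Xen can place its leaves at a higher base, so
real code scans a range). The sketch below only decodes register values
that the caller has already read. hv_signature() and is_xen() are
made-up helper names, and it does not show how the guest type is
determined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unpack one register into four signature bytes, low byte first. */
static void unpack_reg(uint32_t reg, char *out)
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/*
 * Build the 12-byte vendor signature from the EBX, ECX and EDX values
 * returned by the hypervisor CPUID leaf. sig must hold 13 bytes.
 */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
			 char *sig)
{
	assert(sig != NULL);
	unpack_reg(ebx, sig);
	unpack_reg(ecx, sig + 4);
	unpack_reg(edx, sig + 8);
	sig[12] = '\0';
}

/* Returns nonzero if the signature is Xen's "XenVMMXenVMM". */
static int is_xen(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];

	hv_signature(ebx, ecx, edx, sig);
	return strcmp(sig, "XenVMMXenVMM") == 0;
}
```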
> 
> Let me elaborate more below.
> 
> > That (those bootloaders) is clearly defined. The URL I provided
> > mentions the HVMLite one. The Documentation/x86/boot.c mentions
> > what the semantics are to expected when providing an bootstrap
> > (which is what HVMLitel stub code in Linux would write against -
> > and what EFI stub code had been written against too).
> > > 
> > > > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > > > HVMlite to start of with:
> > > > > 
> > > > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > > > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > > > >      that out for it. This sticking point means it has implicated a boot stub.
> > > > 
> > > > 
> > > > Which is b/c it has to be OS agnostic. It has nothing to do 'not wanting'.
> > > 
> > > It can still be OS agnostic and pass on type and custom data pointer.
> > 
> > Sure. It has that (it MUST otherwise how else would you pass data).
> > It is documented as well http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
> > (see " Start of day structure passed to PVH guests in %ebx.")
> 
> The design doc begs for a custom OS entry point though.

That is what the ELF Note has.
> If we had a single 'type' and 'custom data' passed to the kernel that
> should suffice for the default Linux entry point to just pivot off
> of that and do what it needs without more entry points. Once.

And what about ramdisk? What about multiple ramdisks?
What about command line? All of that is what bootparams
tries to unify on Linux. But 'bootparams' is unique to Linux,
it does not exist on FreeBSD. Hence some stub code to transplant
OS-agnostic simple data into OS-specific form is necessary.
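
A minimal sketch of such a translation stub, under loudly stated
assumptions: both structures below are hypothetical, trimmed stand-ins
(not the real Xen start-of-day layout and not the real Linux
boot_params), and stub_fill_boot_params() is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MODULES 4

/* Hypothetical OS-agnostic start info handed over by the hypervisor. */
struct generic_module {
	uint64_t paddr;
	uint64_t size;
};

struct generic_start_info {
	uint64_t cmdline_paddr;
	uint32_t nr_modules;
	struct generic_module modules[SKETCH_MAX_MODULES];
};

/* Hypothetical, heavily trimmed stand-in for Linux's boot parameters. */
struct sketch_boot_params {
	uint64_t cmd_line_ptr;
	uint64_t ramdisk_image;
	uint64_t ramdisk_size;
};

/*
 * Translate generic start info into the Linux-specific layout.
 * Returns 0 on success, or -1 if more modules are passed than this
 * single-ramdisk layout can describe.
 */
static int stub_fill_boot_params(const struct generic_start_info *si,
				 struct sketch_boot_params *bp)
{
	assert(si != NULL && bp != NULL);
	if (si->nr_modules > 1)
		return -1;
	bp->cmd_line_ptr = si->cmdline_paddr;
	bp->ramdisk_image = si->nr_modules ? si->modules[0].paddr : 0;
	bp->ramdisk_size = si->nr_modules ? si->modules[0].size : 0;
	return 0;
}
```

A FreeBSD or unikernel guest would consume the same generic structure
with its own stub, which is why the hypervisor side stays OS agnostic.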
> 
>   Luis

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]             ` <20160413185451.GY1990@wotan.suse.de>
@ 2016-04-14  9:42               ` George Dunlap
       [not found]               ` <570F65F7.5050108@citrix.com>
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-14  9:42 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Linux Kernel Mailing List

On 13/04/16 19:54, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 11:05:00AM +0100, George Dunlap wrote:
>> On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> Also, x86 does have a history of short DT use. Just pointing that its there as
>>> an option as well. I'll Cc you on some thread about that.
>>
>> I'm not sure how this is relevant to anything.
> 
> You brought DT as a reason why ARM was able to use the native point.
> I'm clarifying DT has nothing to do as a restriction on x86.

No, DT isn't the reason Xen is able to use the native entry point on
ARM.  The reason is, to quote myself: "there are no assumptions made
about what hardware is or is not present on the system -- everything
that needs to be communicated about what is or is not present can be
passed in DT."

So that's four things:
1. DT is available to be used
2. DT is expected as the main thing that entry point accepts
3. There are no assumptions about what hardware is or is not present in
the system
4. Everything that needs to be communicated about what is or is not
present can be passed in DT.

Are #2, #3, and #4 true on x86?  If not then #1 is irrelevant.

[snip from another thread]

> One. CE4100.
>
> arch/x86/platform/ce4100/falconfalls.dt

You CC'd me on some patches related to that.  I don't know anything
about the code, but it looked like CE4100 is a subarch, and in response
to that thread Ingo specifically asked you to add a comment saying
basically "Don't add any more subarches".

And not only that, but the ugly, nasty legacy PV boot path we're trying
to get rid of IS ALSO A SUBARCH.  So instead of a quick stub with an
extra EFI flag, you're proposing we consider adding yet another Xen PV subarch?

>> What we're talking about is how to get from Xen to a point in the
>> Linux kernel where everything can Just Work.  The proposed feature is
>> a mini trampoline that (as I understand it):
>> 1. Tells Xen where to jump to (via ELF note)
>> 2. Sets up some basic modes and pagetables and then jumps to the zero
>> page so Linux can just carry on.
> 
> Right, and the my goal is to see to it we do enough homework to
> ensure we reviewed all possibilities to share as much code as possible
> already and looked at all options before saying we certainly need yet
> another entry point. I am not convinced yet this has been done.

I think we have different ideas about what an appropriate amount of
homework is. :-)  Everything you've put forward has been given
consideration and judged unlikely to be promising; and your suggestions
for further possibilities (like this one) keep getting more and more
obviously unsuitable.  We shouldn't be required to actually post code
for every single other option just to prove how ugly they are,
particularly when there's nothing particularly wrong with the code we have.

 -George



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]       ` <20160413195257.GB1990@wotan.suse.de>
@ 2016-04-14  9:53         ` George Dunlap
       [not found]         ` <570F68AB.2040400@citrix.com>
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-14  9:53 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Julien Grall, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski

On 13/04/16 20:52, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
>> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>> So more to it, if the EFI entry already provides a way into Linux
>>> in a more streamlined fashion bringing it closer to the bare metal
>>> boot entry, why *would* we add another boot entry to x86, even if
>>> its small and self contained ?
>>
>> We would avoid using EFI if:
> 
> And this is what I was looking for, thanks!
> 
>> * Being called both on real hardware and under Xen would make the EFI
>> entry point more complicated
> 
> That's on the EFI Linux maintainer to assess. And he seems willing to
> consider this.
> 
>> * Adding the necessary EFI support into Xen would be a significant
>> chunk of extra work
> 
> This seems to be a good sticking point, but Andi noted another aspect
> of this or redundancy as well.
> 
>> * Requiring PVH mode to implement EFI would make it more difficult for
> other kernels (NetBSD, FreeBSD) to act as dom0s.
> 
> What if this is an option only then ?
> 
>>
>> * Requiring PVH mode to use EFI would make it more difficult to
>> support unikernel-style workloads for domUs.
> 
> What if this is an option only then ?

So first of all, you asked why anyone would oppose EFI, and this is part
of the answer to that.

Secondly, you mean "What if this is the only thing the Linux maintainers
will accept?"  And you already know the answer to that.

How much of a burden it would be on the rest of the open-source
ecosystem (Xen, *BSDs, &c) is a combination of some as-yet unknown facts
(i.e., what a minimal Xen/Linux EFI interface would look like) and a
matter of judgement (i.e., given the same interface, reasonable people
may come to different conclusions about whether the interface is an
undue burden to impose on others or not).

But I would hope that the Linux maintainers would at least consider the
broader community when weighing their decisions, and not take advantage
of their position of dominance to simply ignore the effect of their
choices on everybody else.

 -George


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]               ` <20160413191408.GA1990@wotan.suse.de>
  2016-04-13 19:22                 ` Konrad Rzeszutek Wilk
@ 2016-04-14 10:13                 ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-14 10:13 UTC (permalink / raw)
  To: Luis R. Rodriguez, Konrad Rzeszutek Wilk
  Cc: Matt Fleming, jeffm, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Charles Arndol, Julien Grall, Stefano Stabellini,
	Julien Grall, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Linux Kernel Mailing List

On 13/04/16 20:14, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 03:02:26PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Wed, Apr 13, 2016 at 08:50:10PM +0200, Luis R. Rodriguez wrote:
>>> On Wed, Apr 13, 2016 at 11:54:29AM +0200, Roger Pau Monné wrote:
>>>> On Fri, Apr 08, 2016 at 11:58:54PM +0200, Luis R. Rodriguez wrote:
>>>>> OK thanks for the clarification -- still no custom entries for Xen!
>>>>> We should strive for that, at the very least.
>>>>>
>>>>> You do have a point about the legacy stuff. There are two options there:
>>>>>
>>>>>   * Fold legacy support under HVMLite -- which seems to be what we
>>>>>     currently want to do (we should evaluate the implications and
>>>>>     requirements here for that); or
>>>>
>>>> I'm not following here. What does it mean to fold legacy support under 
>>>> HVMlite? HVMlite doesn't have any legacy hardware, and that's the issue when 
>>>> it comes to using native Linux entry points. Linux might expect some legacy 
>>>> PC hardware to be always present, which is not true for HVMlite.
>>>>
>>>> Could you please clarify this point?
>>>
>>> It seems there is a confusion on terms used. By folding legacy support under
>>> HVMLite I meant folding legacy PV path (classic PV with PV interfaces) under
>>> HVMlite.
>>
>> Ewww.
> 
> Probably a confusion again on terms, by the above I meant to say what you seem
> to be indicating below, which is to keep old PV guest support with PV interfaces
> using a new shiny entry.
> 
> Or are we really going to nuke full support for old PV guests ?

Just to be clear: In this case "support for old PV guests" really means,
"Support for running new versions of Linux in PV mode on old
(non-HVMLite-capable) hypervisors".  And yes, that is the plan: in 5
years' time, if you're still running Xen 4.6, to run a Linux 5.17* guest
you'll have to run it in HVM mode, and you won't be able to use it as a
dom0.  (Xen 6.1 will still support Linux 4.5 running in PV mode, however.)

 -George

* Making up version numbers here, obviously


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                 ` <20160414010131.GA21510@localhost.localdomain>
@ 2016-04-14 18:40                   ` Luis R. Rodriguez
       [not found]                   ` <20160414184048.GM1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-14 18:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Stefano Stabellini, Josh Triplett,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper, Jim Fehlig, Andy Lutomirski, Luis R. Rodriguez

On Wed, Apr 13, 2016 at 09:01:32PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 14, 2016 at 12:23:17AM +0200, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 05:08:01PM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Apr 13, 2016 at 10:40:55PM +0200, Luis R. Rodriguez wrote:
> > > > On Wed, Apr 13, 2016 at 02:56:29PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> > > > 
> > > > and we also want to address dead code issues which pvops simply folded
> > > > under the rug. The dead code concerns may exist still for hvmlite, so
> > > > unless someone is willing to make a bold claim there is none, its
> > > > something to consider.
> > > 
> > > What is this dead code you speak of?
> > 
> > Kasan is dead code to Xen. If you boot x86 Xen with Kasan enabled
> 
> For Xen PV guests,

That's right. For 5 years this will be a bomb. That went unnoticed
and I feel I have to pull hair now to try to get folks to fix this.

How many other issues will go in which will explode during this 5 year
time line? How can we proactively address a solution to this now so we
avoid this in future ?

Do you believe me now that it's a real issue?

Fortunately I have a proactive solution for pvops now in my pipeline that
should help us avoid blowing more things up on Xen, and that should also
cause no headaches for x86 developers. But the reason I have been so
engaged in the HVMLite design review is that I want to ensure we take the
lessons learned from pvops and avoid this for an architecture that will be
de facto Xen on Linux 5 years from now.

Not bringing this up or addressing this now for HVMLite / PVH2 would simply be
silly, and since it wasn't addressed in pvops I obviously have to ensure I
convince enough people it was a real issue and ensure that we have enough
semantics available to address it.

Part of the semantics question, which has made my quest hard, is that the use of
semantics for virtualization in code early in boot and later in boot has been
rather sloppy, so we have recently needed to address some of these gaps. Some
of these discussions have however been productive, as I'll explain to George
soon regarding his DT questions.  The discussion is not over though and we need
to ensure that if we need semantics for HVMLite we'll have them available in
a *clean* way. One place where early semantics and design help address these
issues proactively is a clean boot entry -- and that's also why I've been so
pedantic over review of the new HVMlite boot entry.

> > Xen explodes. Quick question, will Kasan not explode with HVMLite ?
> 
> .. but for an HVMLite or Xen HVM guest Kasan will run.

Are you sure? That should mean that Xen HVM is fine as well.  Does that
work? Are we sure?

> > MTRR used to be dead code concern but since we have vetted most of that code
> > now we are pretty certain that code should never run now.
> > 
> > KASLR may be -- not sure as I  haven't vetted that, but from
> > what I have loosely heard maybe.
> > 
> > VGA code will be dead code for HVMlite for sure as the design doc
> > says it will not run VGA, the ACPI flag will be set but the check
> > for that is not yet on Linux. That means the VGA Linux code will
> > be there but we have no way to ensure it will not run nor that
> > anything will muck with it.
> 
> <shrugs> The worst it will do is try to read non-existent registers.

Really ?

Is that your position on all other possible dead code that may have been
possible on old Xen PV guests as well ?

As I hinted, after thinking about this for a while I realized that dead code is
likely present on bare metal as well even without virtualization, especially if
you build large single kernels to support a wide array of features which can
only be determined late at run time. Virtualization and the pvops design just
make this issue much more prominent. If there are other areas of code exposed
that may actually run, but that we are not sure will run, I figured some other
folks with more security-conscious minds might even simply take the position
that it is a security risk to leave that code exposed. So to take the position
that 'the worst it will do is try to read non-existent registers' seems
rather shortsighted here.

Anyway, for more details on my thoughts on this, refer to this wiki:

http://kernelnewbies.org/KernelProjects/kernel-sandboxing

Since this is now getting off topic please send me your feedback on another
thread for the non-virtualization aspects of this if that interests you. My
point here was rather to highlight the importance of clear semantics due to
virtualization in light of possible dead code.

> The VGA code should be able to handle failures like that and
> not initialize itself when the hardware is dead (or non-existent).

That's right, it's through ACPI_FADT_NO_VGA, and since it's part of the HVMLite
design doc we want the HVMlite design to address ACPI_FADT_NO_VGA properly.  I've
paved the way for this to be done cleanly and easily now, but that code should
be in place before HVMLite code gets merged.
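
To make the ACPI_FADT_NO_VGA point concrete, here is a minimal sketch of the
kind of check being discussed. It assumes the caller has already read the
IA-PC boot architecture flags out of the FADT; the names sketch_vga_may_probe()
and SKETCH_FADT_NO_VGA are made up for illustration and are not existing
kernel symbols. Bit 2 is the "VGA Not Present" bit of those flags.

```c
#include <stdint.h>

/* Bit 2 of the FADT IA-PC boot architecture flags: "VGA Not Present". */
#define SKETCH_FADT_NO_VGA (1u << 2)

/*
 * Hypothetical helper: returns 1 when legacy VGA probing may proceed,
 * 0 when the firmware (or hypervisor-built tables) says there is no VGA.
 */
static int sketch_vga_may_probe(uint16_t fadt_boot_flags)
{
	return (fadt_boot_flags & SKETCH_FADT_NO_VGA) ? 0 : 1;
}
```

A guest whose tables set the bit would then skip VGA setup on the flag alone,
instead of relying on reads of non-existent registers failing gracefully.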

Does domU for old Xen PV set ACPI_FADT_NO_VGA as well ?  Should it ?

> > To be clear -- dead code concerns still exist even without
> > virtualization solutions, its just that with virtualization
> > this stuff comes up more and there has been no proactive
> > measures to address this. The question of semantics here is
> > to see to what extent we need earlier boot code annotations
> > to ensure we address semantics proactively.
> 
> I think what you mean by dead code is another word for
> hardware test coverage?

No, no, it's very different given that with virtualization the scope of possible
dead code is significant and at run time you are certain a huge portion of code
should *never ever* run. So for instance we know once we boot bare metal none
of the Xen stuff should ever run, likewise on Xen dom0 we know none of the KVM
/ bare-metal only stuff should ever run, when on Xen domU, none of the Xen
domU-only stuff should ever run.

> > > The entry point in Linux "proper" is startup_32 or startup_64 - the same
> > > path that EFI uses.
> > > 
> > > If you were to draw this (very simplified):
> > > 
> > > a)- GRUB2 ---------------------\ (creates an bootparam structure)
> > >                                 \
> > >                                  +---- startup_32 or startup_64
> > > b) EFI -> Linux EFI stub -------/
> > >        (creates bootparm)      /
> > > c) GRUB2-EFI  -> Linux EFI----/
> > >                stub         /
> > > d) HVMLite ----------------/
> > >       (creates bootparm)
> > 
> > b) and d) might be able to share paths there...
> 
> No idea. You would have to look in the assembler code to
> figure that out.

And that's a pain, I get it.

I spotted one place already -- will note to Boris. I think Matt may have more
ideas ;)

> > d) still has its own entry, it does more than create boot params.
> 
> d) purpose is to create boot params.

OK good to know that's the only thing we acknowledge it *should* do.

>  It may do more as nobody likes to muck in assembler and make bootparams from
>  within assembler.

OK -- it does do more and that's where we'd like to avoid duplication if
possible and yet-another-entry (TM).

> > > (I am not sure about c) - I would have to look in the source to
> > > be sure). There is also LILO in this, but I am not even sure if
> > > it works anymore.
> > > 
> > > 
> > > What you have is that every entry point creates the bootparams
> > > and ends up calling startup_X. The startup_64 then hits the rest
> > > of the kernel. The startup_X code is the one that would set up
> > > the basic pagetables, segments, etc.
> > 
> > Sure.. a full diagram should include both sides and how when using
> > a custom entry one runs the risk of skipping a lot of code setup.
> 
> But it does not skip a lot of code setup. It starts exactly
> at the same code startup that _all_ bootstraping code start at.

It's a fair point.
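
For readers following along, the shape Konrad describes (each loader-specific
entry builds boot params and then joins the one common startup path) can be
sketched roughly as below. Everything here is a simplified stand-in: the
struct layouts and the sketch_* names are hypothetical and much smaller than
the real boot_params or the Xen start-of-day structure.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for the Linux boot parameters. */
struct sketch_boot_params {
	const char *cmdline;
	uint64_t ramdisk_addr;
	uint64_t ramdisk_size;
	int started;		/* set once the common entry has run */
};

/* Stand-in for startup_32/startup_64: the one path every loader reaches. */
static int sketch_startup(struct sketch_boot_params *bp)
{
	bp->started = 1;
	return bp->cmdline != NULL ? 0 : -1;
}

/* Hypothetical OS-agnostic start info, as a loader or hypervisor might pass. */
struct sketch_start_info {
	const char *cmdline;
	uint64_t module_addr;
	uint64_t module_size;
};

/* Stub: translate OS-agnostic data into boot params, then take the common path. */
static int sketch_stub_entry(const struct sketch_start_info *si,
			     struct sketch_boot_params *bp)
{
	memset(bp, 0, sizeof(*bp));
	bp->cmdline = si->cmdline;
	bp->ramdisk_addr = si->module_addr;
	bp->ramdisk_size = si->module_size;
	return sketch_startup(bp);
}
```

The only loader-specific part in this picture is the translation step; what
runs after sketch_startup() is shared by all entries.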

> > There is that and as others have pointed out how certain guests types
> > are assumed to not have certain peripherals, and we have no idea
> > to ensure certain old legacy code may not ever run or be accessed
> > by drivers.
> 
> Ok, but that is not at code setup. That is later - when device
> drivers are initialized. This no different than booting on
> some hardware with missing functionality. ACPI, PCI and PnP
> are set there to help OSes discover this.

To a certain extent this is true, but there may be things which are still missing.

We really have no idea what the full list of those things are.

It may be that things may have been running for ages without notice of an issue
or that only under certain situations will certain issues or bugs trigger a
failure. For instance, just yesterday I was Cc'd on a brand-spanking new legacy
conflict [0], caused by upstream commit 8c058b0b9c34d8c ("x86/irq: Probe for
PIC presence before allocating descs for legacy IRQs") merged on v4.4 where
some new code used nr_legacy_irqs() -- one proposed solution seems to be that
for Xen code NR_IRQS_LEGACY should be used instead, as it lacks PCI [1], and
another was to peg the legacy requirements as a quirk on the new x86 platform
legacy quirk stuff [2]. Are other uses of nr_legacy_irqs() correct ? Are
we sure ?

[0] http://lkml.kernel.org/r/570F90DF.1020508@oracle.com
[1] https://lkml.org/lkml/2016/4/14/532
[2] http://lkml.kernel.org/r/1460592286-300-1-git-send-email-mcgrof@kernel.org

> > > > How we address semantics then is *very* important to me.
> > > 
> > > Which semantics? How the CPU is going to be at startup_X ? Or
> > > how the CPU is going to be when EFI firmware invokes the EFI stub?
> > > Or when GRUB2 loads Linux?
> > 
> > What hypervisor kicked me and what guest type I am.
> 
> cpuid software flags have that - and that semantics has been 
> there for eons.

We cannot use cpuid early in asm code, I'm looking for something we
can use even in asm early in boot code. On x86 the best option we
have is boot_params, but I have even had issues with that
early in code, as I can only access it after load_idt(), as I
described in my effort to unify Xen PV and x86_64 init paths [3].

[3] http://lkml.kernel.org/r/CAB=NE6VTCRCazcNpCdJ7pN1eD3=x_fcGOdH37MzVpxkKEN5esw@mail.gmail.com
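
For reference, the cpuid mechanism Konrad mentions works on the hypervisor
signature returned in EBX, ECX and EDX by the hypervisor CPUID base leaf. The
sketch below only decodes register values the caller has already obtained; it
assumes the base leaf is 0x40000000 (it can sit at an offset in some
configurations), and the hv_signature() and hv_is_xen() names are made up for
this example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Assemble the 12-byte hypervisor signature from the EBX, ECX and EDX
 * values returned by the hypervisor CPUID base leaf. The caller is
 * assumed to have executed CPUID already; out must hold 13 bytes.
 */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
	uint32_t regs[3] = { ebx, ecx, edx };
	int i;

	/* Each register carries four ASCII bytes, lowest byte first. */
	for (i = 0; i < 12; i++)
		out[i] = (char)((regs[i / 4] >> (8 * (i % 4))) & 0xff);
	out[12] = '\0';
}

/* Returns 1 when the signature is the one Xen advertises, 0 otherwise. */
static int hv_is_xen(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];

	hv_signature(ebx, ecx, edx, sig);
	return strcmp(sig, "XenVMMXenVMM") == 0;
}
```

This identifies the hypervisor but not the guest type, which is part of why
the question of semantics available earlier in boot keeps coming up.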

> > Let me elaborate more below.
> > 
> > > That (those bootloaders) is clearly defined. The URL I provided
> > > mentions the HVMLite one. The Documentation/x86/boot.txt mentions
> > > what the semantics are to be expected when providing a bootstrap
> > > (which is what HVMLite stub code in Linux would write against -
> > > and what EFI stub code had been written against too).
> > > > 
> > > > > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > > > > HVMlite to start of with:
> > > > > > 
> > > > > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > > > > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > > > > >      that out for it. This sticking point means it has implicated a boot stub.
> > > > > 
> > > > > 
> > > > > Which is b/c it has to be OS agnostic. It has nothing to do with 'not wanting'.
> > > > 
> > > > It can still be OS agnostic and pass on type and custom data pointer.
> > > 
> > > Sure. It has that (it MUST otherwise how else would you pass data).
> > > It is documented as well http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
> > > (see " Start of day structure passed to PVH guests in %ebx.")
> > 
> > The design doc begs for a custom OS entry point though.
> 
> That is what the ELF Note has.

Right, but I'm saying that it's rather silly to be adding entry points if
all we want the code to do is copy the boot params for us. The design
doc requires a new entry, and likewise you'd need yet-another-entry if
HVMLite is thrown out the window and we come back 5 years later, after new
hardware solutions are in place, needing to redesign HVMLite. Kind of
where we are with PVH today. Likewise if other paravirtualization
developers want to support Linux and want to copy your strategy they'd
add yet-another-entry-point as well.

This is dumb.

> > If we had a single 'type' and 'custom data' passed to the kernel that
> > should suffice for the default Linux entry point to just pivot off
> > of that and do what it needs without more entry points. Once.
> 
> And what about ramdisk? What about multiple ramdisks?
> What about command line? All of that is what bootparams
> tries to unify on Linux. But 'bootparams' is unique to Linux,
> it does not exist on FreeBSD. Hence some stub code to transplant
> OS-agnostic simple data to OS-specific is necessary.

If we had a Xen ABI option where *all* that I'm asking is you pass
first:

  a) hypervisor type
  b) custom data pointer

We'd be able to avoid adding *any* entry point and just address
the requirements as I noted with pre / post stubs for the type.
This would require an x86 boot protocol bump, but given all the issues
creeping up randomly I think that's worth putting on the table now.

And maybe we don't want it to be hypervisor specific, perhaps there are other
*needs* for custom pre-post startup_32()/startup_64() stubs.
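
To illustrate what I mean by a type plus a custom data pointer with pre / post
stubs, here is a rough sketch. None of this exists today: the type values, the
stub table and all sketch_* names are hypothetical, and the real thing would
live in early boot code rather than plain C with function pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type values; no such field exists in the boot protocol today. */
enum sketch_hv_type {
	SKETCH_HV_NONE = 0,
	SKETCH_HV_XEN_HVMLITE = 1,
	SKETCH_HV_MAX
};

struct sketch_hv_stubs {
	void (*pre)(void *data);	/* runs before the common startup code */
	void (*post)(void *data);	/* runs after it */
};

/* Tiny event trace so the call order can be observed. */
static int sketch_trace[4];
static int sketch_trace_len;

static void sketch_log(int ev)
{
	if (sketch_trace_len < 4)
		sketch_trace[sketch_trace_len++] = ev;
}

static void xen_pre(void *data)  { (void)data; sketch_log(1); }
static void xen_post(void *data) { (void)data; sketch_log(3); }

/* Stand-in for the shared startup_32()/startup_64() work. */
static void sketch_common_startup(void) { sketch_log(2); }

static const struct sketch_hv_stubs sketch_stubs[SKETCH_HV_MAX] = {
	[SKETCH_HV_XEN_HVMLITE] = { xen_pre, xen_post },
};

/* Single entry: pivot on the type, pass the opaque data pointer through. */
static int sketch_boot(uint32_t type, void *data)
{
	const struct sketch_hv_stubs *s;

	if (type >= SKETCH_HV_MAX)
		return -1;
	s = &sketch_stubs[type];
	if (s->pre)
		s->pre(data);
	sketch_common_startup();
	if (s->post)
		s->post(data);
	return 0;
}
```

With type 0 (bare metal in this sketch) both hooks are NULL and only the common
path runs, which is the point: one entry, and per-type work hangs off of it.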

To avoid extending boot_params further I figured perhaps we can look
at EFI as another option instead. If we are going to drop all legacy
PV support from the kernel (not the hypervisor) and require hardware
virtualization 5 years from now on the Linux kernel, it doesn't seem
to me far fetched to at the very least consider using an EFI entry
instead, especially since all it does is set boot params and we can
re-use this for HVMLite too.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]         ` <570F68AB.2040400@citrix.com>
@ 2016-04-14 19:44           ` Luis R. Rodriguez
       [not found]           ` <20160414194408.GP1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-14 19:44 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Linux Kernel Mailing List, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Michael Chang,
	Andy Lutomirski

On Thu, Apr 14, 2016 at 10:53:47AM +0100, George Dunlap wrote:
> On 13/04/16 20:52, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
> >> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >>> So more to it, if the EFI entry already provides a way into Linux
> >>> in a more streamlined fashion bringing it closer to the bare metal
> >>> boot entry, why *would* we add another boot entry to x86, even if
> >>> its small and self contained ?
> >>
> >> We would avoid using EFI if:
> > 
> > And this is what I was looking for, thanks!
> > 
> >> * Being called both on real hardware and under Xen would make the EFI
> >> entry point more complicated
> > 
> > That's on the EFI Linux maintainer to assess. And he seems willing to
> > consider this.
> > 
> >> * Adding the necessary EFI support into Xen would be a significant
> >> chunk of extra work
> > 
> > This seems to be a good sticking point, but Andi noted another aspect
> > of this or redundancy as well.
> > 
> >> * Requiring PVH mode to implement EFI would make it more difficult for
> > >> other kernels (NetBSD, FreeBSD) to act as dom0s.
> > 
> > What if this is an option only then ?
> > 
> >>
> >> * Requiring PVH mode to use EFI would make it more difficult to
> >> support unikernel-style workloads for domUs.
> > 
> > What if this is an option only then ?
> 
> So first of all, you asked why anyone would oppose EFI, and this is part
> of the answer to that.
> 
> Secondly, you mean "What if this is the only thing the Linux maintainers
> will accept?"  And you already know the answer to that.

No, I meant to ask, would it be possible to make booting HVMLite using EFI
be optional ? That way if you already support EFI that can be used on
your entries with some small modifications.

> How much of a burden it would be on the rest of the open-source
> ecosystem (Xen, *BSDs, &c) is a combination of some as-yet unknown facts
> (i.e., what a minimal Xen/Linux EFI interface would look like) and a
> matter of judgement (i.e., given the same interface, reasonable people
> may come to different conclusions about whether the interface is an
> undue burden to impose on others or not).
> 
> But I would hope that the Linux maintainers would at least consider the
> broader community when weighing their decisions, and not take advantage
> of their position of dominance to simply ignore the effect of their
> choices on everybody else.

This has nothing to do with dominance or anything nefarious, I'm asking
simply for a full engineering evaluation of all possibilities, with
the long term in mind. Not for now, but for hardware assumptions which
are sensible 5 years from now.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                   ` <20160414184048.GM1990@wotan.suse.de>
@ 2016-04-14 19:56                     ` Konrad Rzeszutek Wilk
  2016-04-14 20:56                       ` Luis R. Rodriguez
       [not found]                       ` <20160414205619.GR1990@wotan.suse.de>
  0 siblings, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-14 19:56 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Stefano Stabellini, Josh Triplett,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper, Jim Fehlig, Andy Lutomirski, David Vrabel

On Thu, Apr 14, 2016 at 08:40:48PM +0200, Luis R. Rodriguez wrote:
> On Wed, Apr 13, 2016 at 09:01:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Thu, Apr 14, 2016 at 12:23:17AM +0200, Luis R. Rodriguez wrote:
> > > On Wed, Apr 13, 2016 at 05:08:01PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Wed, Apr 13, 2016 at 10:40:55PM +0200, Luis R. Rodriguez wrote:
> > > > > On Wed, Apr 13, 2016 at 02:56:29PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > > On Wed, Apr 13, 2016 at 08:29:51PM +0200, Luis R. Rodriguez wrote:
> > > > > 
> > > > > and we also want to address dead code issues which pvops simply folded
> > > > > under the rug. The dead code concerns may exist still for hvmlite, so
> > > > > unless someone is willing to make a bold claim there is none, its
> > > > > something to consider.
> > > > 
> > > > What is this dead code you speak of?
> > > 
> > > Kasan is dead code to Xen. If you boot x86 Xen with Kasan enabled
> > 
> > For Xen PV guests,
> 
> That's right. For 5 years this will be a bomb. That went unnoticed
> and I feel I have to pull hair now to try to get folks to fix this.

Sometimes you have to roll up your sleeves and do the work yourself.
> 
> How many other issues will go in which will explode during this 5 year
> time line? How can we proactively address a solution to this now so we
> avoid this in future ?
> 
> Do you believe me now its a real issue?

I never said otherwise. What confused me was that you grouped
this with HVMLite - which would not have a problem like this.

> 
> Fortunately I have a proactive solution for pvops now in my pipeline that
> should help avoid us having to blow more things up on Xen but also that should
> cause no headaches on behalf of x86 developers. But, reason I have been so
> engaged on HVMLite design review is I want to ensure we take the lessons
> learned from pvops and avoid this for an architecture that will be de facto
> Xen on Linux 5 years from now.
> 
> Not bringing this up or addressing this now for HVMLite / PVH2 would simply be
> silly, and since it wasn't addressed in pvops I obviously have to ensure I
> convince enough people it was a real issue and ensure that we have enough
> semantics available to address it.

Kasan came after pvops, so of course it was not addressed in pvops.

> 
> Part of the semantics question, which has made my quest hard, was the use of
> semantics for virtualization for code on early boot and later in boot has been
> rather sloppy, so we have recently needed to address some of these gaps. Some
> of these discussions have however been productive, as I'll explain to George
> soon regarding his DT questions.  The discussion is not over though and we need
> to ensure that if we need semantics for HVMLite we'll have them available in
> a *clean* way. One of the things where early semantics and design to address
> these issues help in a proactive manner is to address a clean boot entry -- and
> that's also why I've been so pedantic over review of the new HVMlite boot
> entry.

I must have missed your review of the patches. Sorry!
> 
> > > Xen explodes. Quick question, will Kasan not explode with HVMLite ?
> > 
> > .. but for an HVMLite or Xen HVM guest Kasan will run.
> 
> Are you sure? Should that mean that Xen HVM should be fine as well.  Does that
> work? Are we sure?

Yes, and yes.
> 
> > > MTRR used to be dead code concern but since we have vetted most of that code
> > > now we are pretty certain that code should never run now.
> > > 
> > > KASLR may be -- not sure as I  haven't vetted that, but from
> > > what I have loosely heard maybe.
> > > 
> > > VGA code will be dead code for HVMlite for sure as the design doc
> > > says it will not run VGA, the ACPI flag will be set but the check
> > > for that is not yet on Linux. That means the VGA Linux code will
> > > be there but we have no way to ensure it will not run nor that
> > > anything will muck with it.
> > 
> > <shrugs> The worst it will do is try to read non-existent registers.
> 
> Really ?
> 
> Is that your position on all other possible dead code that may have been
> possible on old Xen PV guests as well ?

This is not just with Xen - it is the same with other device drivers that
are invoked on baremetal for hardware that is not present anymore.
> 
> As I hinted, after thinking about this for a while I realized that dead code is
> likely present on bare metal as well even without virtualization, specially if

Yes!
> you build large single kernels to support a wide array of features which only
> late at run time can be determined. Virtualization and the pvops design just
> makes this issue much more prominent. If there are other areas of code exposed
> that actually may run, but we are not sure may run, I figured some other folks
> with a bit more security-conscious minds might even simply take the position
> it may be a security risk to leave that code exposed. So to take a position
> that 'the worst it will do is try to read non-existent registers' -- seems
> rather shortsighted here.

Security conscious people trim their CONFIG.
>  
> Anyway, for more details on my thoughts on this, refer to this wiki:
> 
> http://kernelnewbies.org/KernelProjects/kernel-sandboxing
> 
> Since this is now getting off topic please send me your feedback on another
> thread for the non-virtualization aspects of this if that interests you. My
> point here was rather to highlight the importance of clear semantics due to
> virtualization in light of possible dead code.

Thank you.
> 
> > The VGA code should be able to handle failures like that and
> > not initialize itself when the hardware is dead (or non-existent).
> 
> That's right, its through ACPI_FADT_NO_VGA and since its part of the HVMLite
> design doc we want HVMlite design to address ACPI_FADT_NO_VGA properly.  I've
> paved the way for this to be done cleanly and easily now, but that code should
> be in place before HVMLite code gets merged.
> 
> Does domU for old Xen PV also set ACPI_FADT_NO_VGA as well ?  Should it ?

It does not. Not sure - it seems to have worked fine for the last ten
years?
> 
> > > To be clear -- dead code concerns still exist even without
> > > virtualization solutions, its just that with virtualization
> > > this stuff comes up more and there has been no proactive
> > > measures to address this. The question of semantics here is
> > > to see to what extent we need earlier boot code annotations
> > > to ensure we address semantics proactively.
> > 
> > I think what you mean by dead code is another word for
> > hardware test coverage?
> 
> No, no, its very different given that with virtualization the scope of possible
> dead code is significant and at run time you are certain a huge portion of code
> should *never ever* run. So for instance we know once we boot bare metal none
> of the Xen stuff should ever run, likewise on Xen dom0 we know none of the KVM
> / bare-metal only stuff should ever run, when on Xen domU, none of the Xen

What is this 'bare metal only stuff' you speak of? On Xen dom0 most of
the baremetal code is running. In fact that is how the device drivers
work. Or are you talking about low level baremetal code? If so, then
PVH/HVMLite does that - it skips pvops so that it can run this
'low-level baremetal code'

> domU-only stuff should ever run.

You forgot KVM guest support on baremetal. That shouldn't run either.

> 
> > > > The entry point in Linux "proper" is startup_32 or startup_64 - the same
> > > > path that EFI uses.
> > > > 
> > > > If you were to draw this (very simplified):
> > > > 
> > > > a)- GRUB2 ---------------------\ (creates an bootparam structure)
> > > >                                 \
> > > >                                  +---- startup_32 or startup_64
> > > > b) EFI -> Linux EFI stub -------/
> > > >        (creates bootparm)      /
> > > > c) GRUB2-EFI  -> Linux EFI----/
> > > >                stub         /
> > > > d) HVMLite ----------------/
> > > >       (creates bootparm)
> > > 
> > > b) and d) might be able to share paths there...
> > 
> > No idea. You would have to look in the assembler code to
> > figure that out.
> 
> And that's a pain, I get it.
> 
> I spotted one place already -- will note to Boris. I think Matt may have more
> ideas ;)
> 
> > > d) still has its own entry, it does more than create boot params.
> > 
> > d) purpose is to create boot params.
> 
> OK good to know that's the only thing we acknowledge it *should* do.

And the purpose of b) and c) is the same - besides providing a mechanism
to call into EFI firmware.

And I realized that the early baremetal boot path also ends up calling C during
its startup (see main in arch/x86/boot/main.c), besides switching between
different modes.

> 
> >  It may do more as nobody likes to muck in assembler and make bootparams from
> >  within assembler.
> 
> OK -- it does do more and that's where we'd like to avoid duplication if
> possible and yet-another-entry (TM).

It does more? EFI stub entry does more than the GRUB2 entry.

If you have some patches to trim the code duplication within
those boot paths, please post them.
> 
> > > > (I am not sure about c) - I would have to look in the source to
> > > > be sure). There is also LILO in this, but I am not even sure if it
> > > > works anymore.
> > > > 
> > > > 
> > > > What you have is that every entry point creates the bootparams
> > > > and ends up calling startup_X. The startup_64 then hits the rest
> > > > of the kernel. The startup_X code is the one that would set up
> > > > the basic pagetables, segments, etc.
> > > 
> > > Sure.. a full diagram should include both sides and how when using
> > > a custom entry one runs the risk of skipping a lot of code setup.
> > 
> > But it does not skip a lot of code setup. It starts exactly
> > at the same code startup that _all_ bootstraping code start at.
> 
> Its a fair point.
> 
> > > There is that and as others have pointed out how certain guests types
> > > are assumed to not have certain peripherals, and we have no idea
> > > to ensure certain old legacy code may not ever run or be accessed
> > > by drivers.
> > 
> > Ok, but that is not at code setup. That is later - when device
> > drivers are initialized. This is no different than booting on
> > some hardware with missing functionality. ACPI, PCI and PnP
> > are there to help OSes discover this.
> 
> To a certain extent this is true, but there may be things which are missing still.

Like?
> 
> We really have no idea what the full list of those things are.

Ok, it sounds like you have some homework.
> 
> It may be that things may have been running for ages without notice of an issue
> or that only under certain situations will certain issues or bugs trigger a
> failure. For instance, just yesterday I was Cc'd on a brand-spanking new legacy
> conflict [0], caused by upstream commit 8c058b0b9c34d8c ("x86/irq: Probe for
> PIC presence before allocating descs for legacy IRQs") merged on v4.4 where
> some new code used nr_legacy_irqs() -- one proposed solution seems to be that
> for Xen code NR_IRQS_LEGACY should be used instead is as it lacks PCI [1] and
> another was to peg the legacy requirements as a quirk on the new x86 platform
> legacy quirk stuff [2]. Are other uses of nr_legacy_irqs() correct ? Are
> we sure ?

And how is this example related to 'early bootup' path?

It is not.

It is in fact related to PV codepaths - which PVH/HVMLite and HVM guests
do not exercise.
> 
> [0] http://lkml.kernel.org/r/570F90DF.1020508@oracle.com
> [1] https://lkml.org/lkml/2016/4/14/532
> [2] http://lkml.kernel.org/r/1460592286-300-1-git-send-email-mcgrof@kernel.org
> 
> > > > > How we address semantics then is *very* important to me.
> > > > 
> > > > Which semantics? How the CPU is going to be at startup_X ? Or
> > > > how the CPU is going to be when EFI firmware invokes the EFI stub?
> > > > Or when GRUB2 loads Linux?
> > > 
> > > What hypervisor kicked me and what guest type I am.
> > 
> > cpuid software flags have that - and that semantics has been 
> > there for eons.
> 
> We cannot use cpuid early in asm code, I'm looking for something we

?! Why!?
> can even use on asm early in boot code, on x86 the best option we
> have is the boot_params, but I've even had issues with that
> early in code, as I can only access it after load_idt() where I
> described my effort to unify Xen PV and x86_64 init paths [3].

Well, Xen PV skips x86_64_start_kernel..
> 
> [3] http://lkml.kernel.org/r/CAB=NE6VTCRCazcNpCdJ7pN1eD3=x_fcGOdH37MzVpxkKEN5esw@mail.gmail.com
> 
> > > Let me elaborate more below.
> > > 
> > > > That (those bootloaders) is clearly defined. The URL I provided
> > > > mentions the HVMLite one. Documentation/x86/boot.txt mentions
> > > > what semantics are expected when providing a bootstrap
> > > > (which is what HVMLite stub code in Linux would write against -
> > > > and what EFI stub code had been written against too).
> > > > > 
> > > > > > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > > > > > HVMlite to start of with:
> > > > > > > 
> > > > > > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > > > > > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > > > > > >      that out for it. This sticking point means it has implicated a boot stub.
> > > > > > 
> > > > > > 
> > > > > > Which is b/c it has to be OS agnostic. It has nothing to do with 'not wanting'.
> > > > > 
> > > > > It can still be OS agnostic and pass on type and custom data pointer.
> > > > 
> > > > Sure. It has that (it MUST otherwise how else would you pass data).
> > > > It is documented as well http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
> > > > (see " Start of day structure passed to PVH guests in %ebx.")
> > > 
> > > The design doc begs for a custom OS entry point though.
> > 
> > That is what the ELF Note has.
> 
> Right, but I'm saying that its rather silly to be adding entry points if
> all we want the code to do is copy the boot params for us. The design
> doc requires a new entry, and likewise you'd need yet-another-entry if
> HVMLite is thrown out the window and come back 5 years later after new
> hardware solutions are in place and need to redesign HVMLite. Kind of

Why would you need to redesign HVMLite based on hardware solutions?
The entry point and the CPU state are pretty well known - it is akin
to what the GRUB2 bootloader path is (protected mode).
> where we are with PVH today. Likewise if other paravirtualization
> developers want to support Linux and want to copy your strategy they'd
> add yet-another-entry-point as well.
> 
> This is dumb.

Are you saying the EFI entry point is dumb? That instead the EFI
firmware should understand Linux bootparams and boot that?

> 
> > > If we had a single 'type' and 'custom data' passed to the kernel that
> > > should suffice for the default Linux entry point to just pivot off
> > > of that and do what it needs without more entry points. Once.
> > 
> > And what about ramdisk? What about multiple ramdisks?
> > What about command line? All of that is what bootparams
> > tries to unify on Linux. But 'bootparams' is unique to Linux,
> > it does not exist on FreeBSD. Hence some stub code to transplant
> > OS-agnostic simple data to OS-specific is necessary.
> 
> If we had a Xen ABI option where *all* that I'm asking is you pass
> first:
> 
>   a) hypervisor type

Why can't you use cpuid.
>   b) custom data pointer

What is this custom data pointer you speak of?
> 
> We'd be able to avoid adding *any* entry point and just address
> the requirements as I noted with pre / post stubs for the type.

But you need some entry point to call into Linux. Are you
suggesting to use the existing ones? No, the existing one
wouldn't understand this.

> This would require an x86 boot protocol bump, but all the issues
> creeping up randomly I think that's worth putting on the table now.

Aaaah, so you are saying expand the bootparams. In other words
make Xen ABI call into Linux using the bootparams structure, similar
to how GRUB2 does it.

How is that OS agnostic?
> 
> And maybe we don't want it to be hypervisor specific, perhaps there are other
> *needs* for custom pre-post startup_32()/startup_64() stubs.

Multiboot?
> 
> To avoid extending boot_params further I figured perhaps we can look
> at EFI as another option instead. If we are going to drop all legacy

But EFI support is _huge_.
> PV support from the kernel (not the hypervisor) and require hardware
> virtualization 5 years from now on the Linux kernel, it doesn't seem
> to me far fetched to at the very least consider using an EFI entry
> instead, especially since all it does is set boot params and we can
> re-use this for HVMLite too.

But to make that work you have to emulate EFI firmware in the
hypervisor. Is that work you are signing up for?
> 
>   Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]               ` <570F65F7.5050108@citrix.com>
@ 2016-04-14 19:59                 ` Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-14 19:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Linux Kernel Mailing List, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, Julien Grall, joeyli,
	Borislav Petkov, Boris Ostrovsky, Juergen Gross, Andrew Cooper,
	Michael Chang

On Thu, Apr 14, 2016 at 10:42:15AM +0100, George Dunlap wrote:
> On 13/04/16 19:54, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 11:05:00AM +0100, George Dunlap wrote:
> >> On Tue, Apr 12, 2016 at 11:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >>> Also, x86 does have a history of short DT use. Just pointing that its there as
> >>> an option as well. I'll Cc you on some thread about that.
> >>
> >> I'm not sure how this is relevant to anything.
> > 
> > You brought DT as a reason why ARM was able to use the native point.
> > I'm clarifying DT has nothing to do as a restriction on x86.
> 
> No, DT isn't the reason Xen is able to use the native entry point on
> ARM.  The reason is, to quote myself: "there are no assumptions made
> about what hardware is or is not present on the system -- everything
> that needs to be communicated about what is or is not present can be
> passed in DT."
> 
> So that's four things:
> 1. DT is available to be used
> 2. DT is expected as the main thing that entry point accepts
> 3. There are no assumptions about what hardware is or is not present in
> the system
> 4. Everything that needs to be communicated about what is or is not
> present can be passed in DT.
> 
> Are #2, #3, and #4 true on x86?  If not then #1 is irrelevant.

2) Obviously not, but it can be used.
3) We're getting close to that, see the platform legacy work [0],
   that should help us mesh things into a generic form that we
   didn't have before. There may be others, as is being discussed.
   If you have other ideas now would be great to hear of them.
4) we have ACPI to fill in the gaps these days for not only x86
   but also ARM, as such I think it makes sense to only use DT
   when it makes sense and to standardize on ACPI when possible

[0] http://lkml.kernel.org/r/1460592286-300-1-git-send-email-mcgrof@kernel.org

> [snip from another thread]
> 
> > One. CE4100.
> >
> > arch/x86/platform/ce4100/falconfalls.dt
> 
> You CC'd me on some patches related to that.  I don't know anything
> about the code, but it looked like CE4100 is a subarch, and in response
> to that thread Ingo specifically asked you to add a comment saying
> basically "Don't add any more subarches".

Yeap!

> And not only that, but the ugly, nasty legacy PV boot path we're trying
> to get rid of IS ALSO A SUBARCH.  So instead of a quick stub with an
> extra EFI flag, you're proposing we consider adding yet another Xen PV subarch?

A little while ago I brought that up as a possibility, given that the
semantics of use of the subarch were also loose... hence the discussion
over that, and now a patch that helps clarify the use as you were
Cc'd on.

What's been decided is that we should not extend the subarch; however,
if we need a hypervisor type, that's a separate topic and we would need
to address it separately. It's possible. I find it sensible, especially if
the goal is to avoid more sporadic entries on Linux and to help with
early boot semantics / addressing dead code prospects.
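
To make the "hypervisor type plus custom data pointer" idea concrete, here is
a rough sketch of a single shared entry that pivots on a type. Everything in
it is hypothetical: the enum, the ops structure, and the function names are
made up for illustration and do not exist in Linux or Xen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hypervisor types a boot protocol could carry. */
enum hv_boot_type { HV_BOOT_NONE = 0, HV_BOOT_XEN_HVMLITE, HV_BOOT_TYPE_MAX };

/* Hypothetical per-type hooks run before and after the shared startup path. */
struct hv_boot_ops {
    void (*pre_startup)(void *custom_data);
    void (*post_startup)(void *custom_data);
};

static int pre_calls;      /* counts pre-stub invocations, for illustration */
static void *last_data;    /* last custom data pointer seen by the pre stub */

static void hvmlite_pre(void *custom_data)
{
    pre_calls++;
    last_data = custom_data;
}

/* Types without an entry get NULL hooks and run only the shared path. */
static const struct hv_boot_ops boot_ops[HV_BOOT_TYPE_MAX] = {
    [HV_BOOT_XEN_HVMLITE] = { .pre_startup = hvmlite_pre },
};

void common_entry(enum hv_boot_type type, void *custom_data)
{
    const struct hv_boot_ops *ops;

    assert(type < HV_BOOT_TYPE_MAX);
    ops = &boot_ops[type];

    if (ops->pre_startup)
        ops->pre_startup(custom_data);
    /* ... the shared startup_32()/startup_64() path would run here ... */
    if (ops->post_startup)
        ops->post_startup(custom_data);
}
```

The point of the table is that adding a new type means adding a row, not
adding another entry point.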

EFI is another option which already has code and an entry and its
why I've asked us to consider it. So we should probably not really
try to look at adding a hypervisor type until we've really decided
that EFI is a no go at all and makes no sense.

IMHO we should add new entries to x86 linux only as a last resort measure.

> >> What we're talking about is how to get from Xen to a point in the
> >> Linux kernel where everything can Just Work.  The proposed feature is
> >> a mini trampoline that (as I understand it):
> >> 1. Tells Xen where to jump to (via ELF note)
> >> 2. Sets up some basic modes and pagetables and then jumps to the zero
> >> page so Linux can just carry on.
> > 
> > Right, and my goal is to see to it that we do enough homework to
> > ensure we reviewed all possibilities to share as much code as possible
> > already and looked at all options before saying we certainly need yet
> > another entry point. I am not convinced yet this has been done.
> 
> I think we have different ideas about what an appropriate amount of
> homework is. :-)  Everything you've put forward has been given
> consideration and judged unlikely to be promising;

That's fine, I'm not afraid of suggestions being discarded; my goal
is to evaluate all possibilities from an engineering point of
view, and then make decisions.

> and your suggestions for further possibilities (like this one) keep getting
> more and more obviously unsuitable.

Really ? If it wasn't for me looking into the paravirt crap you'd
likely end up with some other semantic mess. If you'd really like
me to stop chiming in let me know and I'll look away from Xen for
good like others have.

> We shouldn't be required to actually post code
> for every single other option just to prove how ugly they are,
> particularly when there's nothing particularly wrong with the code we have.

I'm not asking that. I'm asking for an engineering evaluation. That's very
different. I am going to the Xen Hackathon after all as well, not sure what
else to tell you to show you I'm only after the best engineering solution and
it seems we could do much better here.

  Luis


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160414194408.GP1990@wotan.suse.de>
@ 2016-04-14 20:38             ` Konrad Rzeszutek Wilk
       [not found]             ` <20160414203847.GB21657@localhost.localdomain>
                               ` (4 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-14 20:38 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Julien Grall, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Juergen Gross, Stefano Stabellini, Jim Fehlig,
	George Dunlap, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Linux Kernel Mailing List

> This has nothing to do with dominance or anything nefarious, I'm asking
> simply for a full engineering evaluation of all possibilities, with
> the long term in mind. Not for now, but for hardware assumptions which
> are sensible 5 years from now.

There are four different things in my mind about this conversation:

 1). semantics of low-level code wrapped around pvops. On baremetal
   it is easy - just look at Intel and AMD SDM.
   And this is exactly what running in HVM or HVMLite mode will do -
   all those low-level operations will have the same exact semantic
   as baremetal.

   There is no hope for the pv_ops to fix that.

   And I am pretty sure the HVMLite in 5 years will have no
   trouble in this as it will be running in VMX mode (HVM).
   
 2). Boot entry.

   The semantics on Linux are well known - they are documented in
   Documentation/x86/boot.txt.

   HVMLite Linux guests have to somehow provide that.

   How that is done seems to come down to two options:

   a) Use existing boot paths - which means writing some
      extra stub code to call into those existing boot paths
      (for example Xen could bundle GRUB2-like code to be run
      when booting Linux using that boot path).

      Or EFI (for a ton more code). Granted, not all OSes
      support those, so it is not very OS agnostic.

      Hard part - if the bootparams change then we have to
      rev the code in there. It may get out of sync
      with Linux bootparams.

   b) Add another, simpler boot entry point which has to copy
      "some" strings from its own format into bootparams.

   So this part of the discussion does not fall under
   hardware assumptions. Neither the Intel nor the AMD manuals say
   anything about boot loaders or how to boot an OS - that is all
   in the realm of how software talks to software.

 3). And there is the discussion on man-power to make this
   happen.

 4). Lastly, which one is simpler and involves less code, so
   that there is less chance of bitrot.
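
Option b) amounts to a small translation step. A minimal sketch, assuming
made-up structures: start_info_sketch stands in for the OS-agnostic data the
hypervisor hands over, and boot_params_sketch for the handful of fields a
Linux stub would fill in. Neither matches the real Xen or Linux layouts, and
fill_boot_params() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the OS-agnostic start-of-day data. */
struct start_info_sketch {
    uint32_t cmdline_paddr;   /* physical address of the command line */
    uint32_t module_paddr;    /* physical address of the first module (ramdisk) */
    uint32_t module_size;     /* size of that module in bytes, 0 if none */
};

/* Hypothetical, simplified stand-in for the Linux-specific boot parameters. */
struct boot_params_sketch {
    uint32_t cmd_line_ptr;
    uint32_t ramdisk_image;
    uint32_t ramdisk_size;
};

/* Translate the OS-agnostic data into the OS-specific form. */
void fill_boot_params(struct boot_params_sketch *bp,
                      const struct start_info_sketch *si)
{
    assert(bp != NULL && si != NULL);

    memset(bp, 0, sizeof(*bp));
    bp->cmd_line_ptr = si->cmdline_paddr;

    /* Only advertise a ramdisk when a module was actually passed in. */
    if (si->module_size != 0) {
        bp->ramdisk_image = si->module_paddr;
        bp->ramdisk_size = si->module_size;
    }
}
```

Once the structure is filled in, the stub would jump to the same startup_X
code every other boot path ends up in.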


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
  2016-04-14 19:56                     ` Konrad Rzeszutek Wilk
@ 2016-04-14 20:56                       ` Luis R. Rodriguez
       [not found]                       ` <20160414205619.GR1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-14 20:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Stefano Stabellini, Josh Triplett,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper, Jim Fehlig, Andy Lutomirski, Luis R. Rodriguez

On Thu, Apr 14, 2016 at 03:56:53PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 14, 2016 at 08:40:48PM +0200, Luis R. Rodriguez wrote:
> > On Wed, Apr 13, 2016 at 09:01:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Apr 14, 2016 at 12:23:17AM +0200, Luis R. Rodriguez wrote:
> > > > VGA code will be dead code for HVMlite for sure as the design doc
> > > > says it will not run VGA, the ACPI flag will be set but the check
> > > > for that is not yet on Linux. That means the VGA Linux code will
> > > > be there but we have no way to ensure it will not run nor that
> > > > anything will muck with it.
> > > 
> > > <shrugs> The worst it will do is try to read non-existent registers.
> > 
> > Really ?
> > 
> > Is that your position on all other possible dead code that may have been
> > possible on old Xen PV guests as well ?
> 
> This is not just with Xen - it with other device drivers that are being
> invoked on baremetal and are not present in hardware anymore.

Indeed, however virtualization makes this issue much more prominent.

> > As I hinted, after thinking about this for a while I realized that dead code is
> > likely present on bare metal as well even without virtualization, specially if
> 
> Yes!
> > you build large single kernels to support a wide array of features which only
> > late at run time can be determined. Virtualization and the pvops design just
> > makes this issue much more prominent. If there are other areas of code exposed
> > that actually may run, but we are not sure may run, I figured some other folks
> > with a bit more security conscious minds might even simply take the position
> > it may be a security risk to leave that code exposed. So to take a position
> > that 'the worst it will do is try to read non-existent registers' -- seems
> > rather shortsighted here.
> 
> Security conscious people trim their CONFIG.

Not all Linux distributions want to do this, the more binaries the
higher the cost to test / vet.

> > Anyway, for more details on thoughts on this, refer to this wiki:
> > 
> > http://kernelnewbies.org/KernelProjects/kernel-sandboxing
> > 
> > Since this is now getting off topic please send me your feedback on another
> > thread for the non-virtualization aspects of this if that interests you. My
> > point here was rather to highlight the importance of clear semantics due to
> > virtualization in light of possible dead code.
> 
> Thank you.
> > 
> > > The VGA code should be able to handle failures like that and
> > > not initialize itself when the hardware is dead (or non-existent).
> > 
> > That's right, its through ACPI_FADT_NO_VGA and since its part of the HVMLite
> > design doc we want HVMlite design to address ACPI_FADT_NO_VGA properly.  I've
> > paved the way for this to be done cleanly and easily now, but that code should
> > be in place before HVMLite code gets merged.
> > 
> > Does domU for old Xen PV also set ACPI_FADT_NO_VGA as well ?  Should it ?
> 
> It does not. Not sure - it seems to have worked fine for the last ten
> years?

Maybe HVMLite will need it enabled then too, just for bug parity.
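
The check itself is small. A sketch, assuming the FADT's IA-PC boot
architecture flags have already been read into a 16-bit value. The bit
position mirrors the "VGA Not Present" bit that ACPI_FADT_NO_VGA refers to,
as far as I know; the macro and function names below are local to this
example and are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for ACPI_FADT_NO_VGA: bit 2 of the IA-PC boot flags. */
#define SKETCH_FADT_NO_VGA (1u << 2)

/* Return true when legacy VGA probing may go ahead. */
bool vga_probe_allowed(uint16_t boot_flags)
{
    return (boot_flags & SKETCH_FADT_NO_VGA) == 0;
}
```

A guest whose firmware tables set the flag would then skip VGA setup
entirely rather than poke at registers that are not there.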

> > > > To be clear -- dead code concerns still exist even without
> > > > virtualization solutions, its just that with virtualization
> > > > this stuff comes up more and there has been no proactive
> > > > measures to address this. The question of semantics here is
> > > > to see to what extent we need earlier boot code annotations
> > > > to ensure we address semantics proactively.
> > > 
> > > I think what you mean by dead code is another word for
> > > hardware test coverage?
> > 
> > No, no, its very different given that with virtualization the scope of possible
> > dead code is significant and at run time you are certain a huge portion of code
> > should *never ever* run. So for instance we know once we boot bare metal none
> > of the Xen stuff should ever run, likewise on Xen dom0 we know none of the KVM
> > / bare-metal only stuff should ever run, when on Xen domU, none of the Xen
> 
> What is this 'bare metal only stuff' you speak of? On Xen dom0 most of
> the baremetal code is running.

A lot, not all. In the past folks added stubs (used to be paravirt_enabled()
checks) to some code, but we are simply not sure of other possible conflicts.
This is a known unknown if you will.

> In fact that is how the device drivers work. Or are you talking about low
> level baremetal code? If so, then PVH/HVMLite does that - it skips pvops so
> that it can run this 'low-level baremetal code'

Are you telling me that HVMLite has no dead code issues ?

> > domU-only stuff should ever run.
> 
> You forgot KVM guest support on baremetal. That shouldn't run either.

Glad you bring that up, yes, that is correct. I'm being just as cautious with
Xen as with KVM on their possible dead-code issues, however their dead-code
concerns should be smaller given, as you note, the boot path.

It doesn't mean dead-code concerns do not exist for KVM... or other
virtualization solutions.

> > > > > The entry point in Linux "proper" is startup_32 or startup_64 - the same
> > > > > path that EFI uses.
> > > > > 
> > > > > If you were to draw this (very simplified):
> > > > > 
> > > > > a)- GRUB2 ---------------------\ (creates an bootparam structure)
> > > > >                                 \
> > > > >                                  +---- startup_32 or startup_64
> > > > > b) EFI -> Linux EFI stub -------/
> > > > >        (creates bootparm)      /
> > > > > c) GRUB2-EFI  -> Linux EFI----/
> > > > >                stub         /
> > > > > d) HVMLite ----------------/
> > > > >       (creates bootparm)
> > > > 
> > > > b) and d) might be able to share paths there...
> > > 
> > > No idea. You would have to look in the assembler code to
> > > figure that out.
> > 
> > And that's a pain, I get it.
> > 
> > I spotted one place already -- will note to Boris. I think Matt may have more
> > ideas ;)
> > 
> > > > d) still has its own entry, it does more than create boot params.
> > > 
> > > d) purpose is to create boot params.
> > 
> > OK good to know that's the only thing we acknowledge it *should* do.
> 
> And the purpose of b) and c) is the same - besides providing a mechanism
> to call into EFI firmware.

Sure.

> And I realized that the early baremetal boot path also ends up calling C during
> its startup (see main in arch/x86/boot/main.c), besides switching between
> different modes.

Sure.

> > >  It may do more as nobody likes to muck in assembler and make bootparams from
> > >  within assembler.
> > 
> > OK -- it does do more and that's where we'd like to avoid duplication if
> > possible and yet-another-entry (TM).
> 
> It does more? EFI stub entry does more than the GRUB2 entry.
> 
> If you have some patches to trim the code duplication within
> those boot paths, please post them.

Sure.

> > > > > (I am not sure about c) - I would have to look in the source to
> > > > > be sure). There is also LILO in this, but I am not even sure if it
> > > > > works anymore.
> > > > > 
> > > > > 
> > > > > What you have is that every entry point creates the bootparams
> > > > > and ends up calling startup_X. The startup_64 then hits the rest
> > > > > of the kernel. The startup_X code is the one that would set up
> > > > > the basic pagetables, segments, etc.
> > > > 
> > > > Sure.. a full diagram should include both sides and how when using
> > > > a custom entry one runs the risk of skipping a lot of code setup.
> > > 
> > > But it does not skip a lot of code setup. It starts exactly
> > > at the same code startup that _all_ bootstraping code start at.
> > 
> > Its a fair point.
> > 
> > > > There is that and as others have pointed out how certain guests types
> > > > are assumed to not have certain peripherals, and we have no idea
> > > > to ensure certain old legacy code may not ever run or be accessed
> > > > by drivers.
> > > 
> > > Ok, but that is not at code setup. That is later - when device
> > > drivers are initialized. This is no different than booting on
> > > some hardware with missing functionality. ACPI, PCI and PnP
> > > are there to help OSes discover this.
> > 
> > To a certain extent this is true, but there may be things which are missing still.
> 
> Like?

That's the thing, I had a list of things to look out for and then things
I ran across during code inspection. We need more work to be sure we're
really well covered.

Are you *sure* we have no dead code concerns with HVMLite ?
If there are dead code concerns, are you sure there might not
be differences between KVM and HVMLite ? Should cpuid be used to
address differences ? Will that let us distinguish between
hybrid versions of HVMLite ? Are we sure ?

> > We really have no idea what the full list of those things are.
> 
> Ok, it sounds like you have some homework.

We all do.

> > It may be that things may have been running for ages without notice of an issue
> > or that only under certain situations will certain issues or bugs trigger a
> > failure. For instance, just yesterday I was Cc'd on a brand-spanking new legacy
> > conflict [0], caused by upstream commit 8c058b0b9c34d8c ("x86/irq: Probe for
> > PIC presence before allocating descs for legacy IRQs") merged on v4.4 where
> > some new code used nr_legacy_irqs() -- one proposed solution seems to be that
> > for Xen code NR_IRQS_LEGACY should be used instead is as it lacks PCI [1] and
> > another was to peg the legacy requirements as a quirk on the new x86 platform
> > legacy quirk stuff [2]. Are other uses of nr_legacy_irqs() correct ? Are
> > we sure ?
> 
> And how is this example related to 'early bootup' path?
> 
> It is not.

For early boot code -- it is not. HVMLite is not merged, and PVH was never
completed, so how are you sure we won't have any issues there ?

> It is in fact related to PV codepaths - which PVH/HVMLite and HVM guests
> do not exercise.

Agreed.

> > [0] http://lkml.kernel.org/r/570F90DF.1020508@oracle.com
> > [1] https://lkml.org/lkml/2016/4/14/532
> > [2] http://lkml.kernel.org/r/1460592286-300-1-git-send-email-mcgrof@kernel.org
> > 
> > > > > > How we address semantics then is *very* important to me.
> > > > > 
> > > > > Which semantics? How the CPU is going to be at startup_X ? Or
> > > > > how the CPU is going to be when EFI firmware invokes the EFI stub?
> > > > > Or when GRUB2 loads Linux?
> > > > 
> > > > What hypervisor kicked me and what guest type I am.
> > > 
> > > cpuid software flags have that - and that semantics has been 
> > > there for eons.
> > 
> > We cannot use cpuid early in asm code, I'm looking for something we
> 
> ?! Why!?

What existing code uses it? If there is nothing you are still certain
it should work ? Would that work for old PV guest as well BTW ?
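
For reference, the cpuid-based detection being discussed can be sketched
roughly as below. This is a user-space illustration only, assuming an x86
build with the <cpuid.h> header that GCC and Clang ship; the enum and
function names are made up for the example and this is not the kernel's own
detection code. It also says nothing about how early in asm this can be
done, or about old PV guests, which is the open question above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical names for this sketch; not kernel identifiers. */
enum hv_type { HV_NONE, HV_XEN, HV_KVM, HV_OTHER };

/* Classify the 12-byte signature returned in EBX:ECX:EDX of the hypervisor leaf. */
enum hv_type classify_signature(const char sig[12])
{
    assert(sig != NULL);

    if (memcmp(sig, "XenVMMXenVMM", 12) == 0)
        return HV_XEN;
    if (memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0)
        return HV_KVM;
    return HV_OTHER;
}

enum hv_type detect_hypervisor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    char sig[12];

    /* CPUID.1:ECX bit 31 is the "running under a hypervisor" bit. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 31)))
        return HV_NONE;

    /* Use __cpuid directly: leaf 0x40000000 lies outside the basic range. */
    __cpuid(0x40000000, eax, ebx, ecx, edx);
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    return classify_signature(sig);
#else
    /* Not an x86 build: nothing to query. */
    return HV_NONE;
#endif
}
```

This tells you which hypervisor is present, not which guest type you are,
so it only answers half of the "what kicked me and what am I" question.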

> > can even use on asm early in boot code, on x86 the best option we
> > have is the boot_params, but I've even had issues with that
> > early in code, as I can only access it after load_idt() where I
> > described my effort to unify Xen PV and x86_64 init paths [3].
> 
> Well, Xen PV skips x86_64_start_kernel..

Yes, and in doing so often times people skip adding Xen PV specific
code, as was the case with Kasan.

> > [3] http://lkml.kernel.org/r/CAB=NE6VTCRCazcNpCdJ7pN1eD3=x_fcGOdH37MzVpxkKEN5esw@mail.gmail.com
> > 
> > > > Let me elaborate more below.
> > > > 
> > > > > That (those bootloaders) is clearly defined. The URL I provided
> > > > > mentions the HVMLite one. Documentation/x86/boot.txt mentions
> > > > > what semantics are expected when providing a bootstrap
> > > > > (which is what HVMLite stub code in Linux would write against -
> > > > > and what EFI stub code had been written against too).
> > > > > > 
> > > > > > > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > > > > > > HVMlite to start off with:
> > > > > > > > 
> > > > > > > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > > > > > > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > > > > > > >      that out for it. This sticking point means it has implicated a boot stub.
> > > > > > > 
> > > > > > > 
> > > > > > > Which is b/c it has to be OS agnostic. It has nothing to do with 'not wanting'.
> > > > > > 
> > > > > > It can still be OS agnostic and pass on type and custom data pointer.
> > > > > 
> > > > > Sure. It has that (it MUST otherwise how else would you pass data).
> > > > > It is documented as well http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
> > > > > (see " Start of day structure passed to PVH guests in %ebx.")
> > > > 
> > > > The design doc begs for a custom OS entry point though.
> > > 
> > > That is what the ELF Note has.
> > 
> > Right, but I'm saying that its rather silly to be adding entry points if
> > all we want the code to do is copy the boot params for us. The design
> > doc requires a new entry, and likewise you'd need yet-another-entry if
> > HVMLite is thrown out the window and come back 5 years later after new
> > hardware solutions are in place and need to redesign HVMLite. Kind of
> 
> Why would you need to redesign HVMLite based on hardware solutions?

That's what happened to Xen PV, right ? Are we sure 5 years from now we won't
have any new hardware virtualization features that will just obsolete HVMLite?

> The entrance point and the CPU state are pretty well known - it is akin
> to what the GRUB2 bootloader path is (protected mode).
> > where we are with PVH today. Likewise if other paravirtualization
> > developers want to support Linux and want to copy your strategy they'd
> > add yet-another-entry-point as well.
> > 
> > This is dumb.
> 
> You saying the EFI entry point is dumb? That instead the EFI
> firmware should understand Linux bootparams and booted that?

EFI is a standard. Xen is not. And since we are not talking about legacy
hardware in the future, EFI seems like a sensible option to consider for an
entry point. Especially given that it may mean that we can ultimately also help
unify more entry points on Linux in general. I'd prefer to consider using
EFI configuration tables instead of extending the x86 boot protocol.

> > > > If we had a single 'type' and 'custom data' passed to the kernel that
> > > > should suffice for the default Linux entry point to just pivot off
> > > > of that and do what it needs without more entry points. Once.
> > > 
> > > And what about ramdisk? What about multiple ramdisks?
> > > What about command line? All of that is what bootparams
> > > tries to unify on Linux. But 'bootparams' is unique to Linux,
> > > it does not exist on FreeBSD. Hence some stub code to transplant
> > > OS-agnostic simple data to OS-specific is necessary.
> > 
> > If we had a Xen ABI option where *all* that I'm asking is you pass
> > first:
> > 
> >   a) hypervisor type
> 
> Why can't you use cpuid.

I'll evaluate that.

> >   b) custom data pointer
> 
> What is this custom data pointer you speak of?

For Xen this is the xen_start_info, the structure into which Xen stuffs
its own version of what we need to fill the boot_params.
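
As a rough illustration of what "filling the boot_params" from a
start-of-day structure amounts to, here is a minimal sketch. Both
structures below are hypothetical, heavily simplified stand-ins (they
are not the real Xen start info or the real Linux boot_params layout),
and demo_fill_boot_params() is an invented name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the hypervisor's start-of-day data. */
struct demo_start_info {
    const char *cmdline;     /* NUL-terminated command line, may be NULL */
    uint64_t ramdisk_addr;   /* physical address of the initial ramdisk  */
    uint64_t ramdisk_size;   /* ramdisk size in bytes, 0 if none         */
};

/* Hypothetical, simplified stand-in for Linux's boot_params. */
struct demo_boot_params {
    char cmdline[256];
    uint64_t ramdisk_image;
    uint64_t ramdisk_size;
};

/* Copy the OS-agnostic fields into the Linux-specific layout. */
void demo_fill_boot_params(const struct demo_start_info *si,
                           struct demo_boot_params *bp)
{
    assert(si != NULL && bp != NULL);
    memset(bp, 0, sizeof(*bp));
    if (si->cmdline != NULL) {
        /* Truncate rather than overflow; the memset keeps it NUL-terminated. */
        strncpy(bp->cmdline, si->cmdline, sizeof(bp->cmdline) - 1);
    }
    bp->ramdisk_image = si->ramdisk_addr;
    bp->ramdisk_size = si->ramdisk_size;
}
```

The point of the sketch is only that the stub is a translation step:
the hypervisor side stays OS agnostic, and the knowledge of the Linux
layout lives in the Linux-side copy routine.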

> > We'd be able to avoid adding *any* entry point and just address
> > the requirements as I noted with pre / post stubs for the type.
> 
> But you need some entry point to call into Linux. Are you
> suggesting to use the existing ones? No, the existing one
> wouldn't understand this.

If we used the boot_params, yes it would be possible.

> > This would require an x86 boot protocol bump, but all the issues
> > creeping up randomly I think that's worth putting on the table now.
> 
> Aaaah, so you are saying expand the bootparams. In other words
> make Xen ABI call into Linux using the bootparams structure, similar
> to how GRUB2 does it.
> 
> How is that OS agnostic?

That's an issue, I understand. EFI is OS agnostic though.

> > And maybe we don't want it to be hypervisor specific, perhaps there are other
> > *needs* for custom pre-post startup_32()/startup_64() stubs.
> 
> Multiboot?

Can you elaborate?

> > To avoid extending boot_params further I figured perhaps we can look
> > at EFI as another option instead. If we are going to drop all legacy
> 
> But EFI support is _huge_.

I get that sense now. Perhaps we should explore to what extent that is
really the case at the Hackathon.

> > PV support from the kernel (not the hypervisor) and require hardware
> > virtualization 5 years from now on the Linux kernel, it doesn't seem
> > to me far fetched to at the very least consider using an EFI entry
> > instead, specially since all it does is set boot params and we can
> > make re-use this for HVMLite too.
> 
> But to make that work you have to emulate EFI firmware in the
> hypervisor. Is that work you are signing up for?

I'll do what is needed, as I have done before. If EFI is on the long
term roadmap for ARM perhaps there are a few birds to knock with one
stone here. If there is also interest in supporting other OSes through
standard EFI means, this should also help make that easier.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]             ` <20160414203847.GB21657@localhost.localdomain>
@ 2016-04-14 21:12               ` Luis R. Rodriguez
       [not found]               ` <20160414211201.GS1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-14 21:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, jeffm, Michael Chang, Julien Grall, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Juergen Gross, Stefano Stabellini, Jim Fehlig,
	George Dunlap, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Linux Kernel Mailing List

On Thu, Apr 14, 2016 at 04:38:47PM -0400, Konrad Rzeszutek Wilk wrote:
> > This has nothing to do with dominance or anything nefarious, I'm asking
> > simply for a full engineering evaluation of all possibilities, with
> > the long term in mind. Not for now, but for hardware assumptions which
> > are sensible 5 years from now.
> 
> There are two different things in my mind about this conversation:
> 
>  1). semantics of low-level code wrapped around pvops. On baremetal
>    it is easy - just look at Intel and AMD SDM.
>    And this is exactly what running in HVM or HVMLite mode will do -
>    all those low-level operations will have the same exact semantic
>    as baremetal.

Today Linux is KVM stupid for early boot code. I've pointed this out
before, but again, there has been no reason found to need this. Perhaps
for HVMLite we won't need this...

>    There is no hope for the pv_ops to fix that.

Actually I beg to differ. See my patches and ongoing work.

>    And I am pretty sure the HVMLite in 5 years will have no
>    trouble in this as it will be running in VMX mode (HVM).

HVMLite may still use PV drivers for some things, so it's not super
obvious to me yet that low level semantics will not be needed.

>  2). Boot entry.
> 
>    The semantics on Linux are well known - they are documented in
>    Documentation/x86/boot.txt.
> 
>    HVMLite Linux guests have to somehow provide that.
> 
>    And how it is done seems to be tied around:
> 
>    a) Use existing boot paths - which means making some
>       extra stub code to call in those existing boot paths
>       (for example Xen could bundle with an GRUB2-alike
>        code to be run when booting Linux using that boot-path).
> 
>       Or EFI (for a ton more code). Granted not all OSes
>       support those, so not very OS agnostic.

What other OSes do is something to consider, but the fact that they
don't do it because they are slacking in one domain should by no means
be a reason not to evaluate the possible long term gains,
especially if we have reasons to believe more architectures will
consider it and standardize on it.

It'd be silly not to take this a bit more seriously.

>        Hard part - if the bootparams change then have to
>       rev up the code in there. May be out of sync
>       with Linux bootparams.

If we are going to ultimately standardize on EFI boot for new
hardware it'd be rather silly to extend the boot params further.

>    b) Add another simpler boot entry point which has to copy
>      "some" strings from its format in bootparams.
> 
> 
>    So this part of the discussion does not fall in the
>    hardware assumptions. Intel SDM or AMD mention nothing about
>    boot loaders or how to boot an OS - that is all in realms
>    of how software talks to software.

Right -- so one question to ask here is what other uses are there
for this outside of say HVMLite. You mentioned Multiboot so far.

>  3). And there is the discussion on man-power to make this
>    happen.

Sure.

>  4). Lastly which one is simpler and involves less code so
>     that there is a less chance of bitrot.

Indeed.

You also forgot the tie-in between dead-code and semantics but
that clearly is not on your mind. But I'd say this is a good
summary.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                       ` <20160414205619.GR1990@wotan.suse.de>
@ 2016-04-15  2:02                         ` Konrad Rzeszutek Wilk
  2016-04-15 10:06                         ` Julien Grall
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-15  2:02 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Stefano Stabellini, Josh Triplett,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Andrew Cooper, Jim Fehlig, Andy Lutomirski, David Vrabel

On Thu, Apr 14, 2016 at 10:56:19PM +0200, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 03:56:53PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Thu, Apr 14, 2016 at 08:40:48PM +0200, Luis R. Rodriguez wrote:
> > > On Wed, Apr 13, 2016 at 09:01:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Apr 14, 2016 at 12:23:17AM +0200, Luis R. Rodriguez wrote:
> > > > > VGA code will be dead code for HVMlite for sure as the design doc
> > > > > says it will not run VGA, the ACPI flag will be set but the check
> > > > > for that is not yet on Linux. That means the VGA Linux code will
> > > > > be there but we have no way to ensure it will not run nor that
> > > > > anything will muck with it.
> > > > 
> > > > <shrugs> The worst it will do is try to read non-existent registers.
> > > 
> > > Really ?
> > > 
> > > Is that your position on all other possible dead code that may have been
> > > possible on old Xen PV guests as well ?
> > 
> > This is not just with Xen - it with other device drivers that are being
> > invoked on baremetal and are not present in hardware anymore.
> 
> Indeed, however virtualization makes this issue much more prominent.

I suppose - as it only exposes a certain type of platform and nothing
else.
> 
> > > As I hinted, after thinking about this for a while I realized that dead code is
> > > likely present on bare metal as well even without virtualization, specially if
> > 
> > Yes!
> > > you build large single kernels to support a wide array of features which only
> > > late at run time can be determined. Virtualization and the pvops design just
> > > makes this issue much more prominent. If there are other areas of code exposed
> > > that actually may run, but we are not sure may run, I figured some other folks
> > > with a bit more security conscience minds might even simply take the position
> > > it may be a security risk to leave that code exposed. So to take a position
> > > that 'the worst it will do is try to read non-existent registers' -- seems
> > > rather shortsighted here.
> > 
> > Security conscious people trim their CONFIG.
> 
> Not all Linux distributions want to do this, the more binaries the
> higher the cost to test / vet.

OK, but Linux distributions have many goals - and are pulled in
different directions so they cannot always achieve the 'low footprint -
small amount of code to inspect from a security standpoint' goal.

> 
> > > Anyway for more details on thoughts on this refer to the this wiki:
> > > 
> > > http://kernelnewbies.org/KernelProjects/kernel-sandboxing
> > > 
> > > Since this is now getting off topic please send me your feedback on another
> > > thread for the non-virtualization aspects of this if that interests you. My
> > > point here was rather to highlight the importance of clear semantics due to
> > > virtualization in light of possible dead code.
> > 
> > Thank you.
> > > 
> > > > The VGA code should be able to handle failures like that and
> > > > not initialize itself when the hardware is dead (or non-existent).
> > > 
> > > That's right, its through ACPI_FADT_NO_VGA and since its part of the HVMLite
> > > design doc we want HVMlite design to address ACPI_FADT_NO_VGA properly.  I've
> > > paved the way for this to be done cleanly and easily now, but that code should
> > > be in place before HVMLite code gets merged.
> > > 
> > > Does domU for old Xen PV also set ACPI_FADT_NO_VGA as well ?  Should it ?
> > 
> > It does not. Not sure - it seems to have worked fine for the last ten
> > years?
> 
> Maybe HVMLite will need it enabled then too, just for bug parity.

<shrugs> Sure.
> 
> > > > > To be clear -- dead code concerns still exist even without
> > > > > virtualization solutions, its just that with virtualization
> > > > > this stuff comes up more and there has been no proactive
> > > > > measures to address this. The question of semantics here is
> > > > > to see to what extent we need earlier boot code annotations
> > > > > to ensure we address semantics proactively.
> > > > 
> > > > I think what you mean by dead code is another word for
> > > > hardware test coverage?
> > > 
> > > No, no, its very different given that with virtualization the scope of possible
> > > dead code is significant and at run time you are certain a huge portion of code
> > > should *never ever* run. So for instance we know once we boot bare metal none
> > > of the Xen stuff should ever run, likewise on Xen dom0 we know none of the KVM
> > > / bare-metal only stuff should never run, when on Xen domU, none of the Xen
> > 
> > What is this 'bare metal only stuff' you speak of? On Xen dom0 most of
> > the baremetal code is running.
> 
> A lot, not all. In the past folks added stubs (used to be paravirt_enabled()
> checks) to some code, but we are simply not sure of other possible conflicts.
> This is an known unknown if you will.
> 
> > In fact that is how the device drivers work. Or are you talking about low
> > level baremetal code? If so, then PVH/HVMLite does that - it skips pvops so
> > that it can run this 'low-level baremetal code'
> 
> Are you telling me that HVMLite has no dead code issues ?

You said earlier that baremetal has a dead code issue. Then by extension
_any_ execution path has dead code issues.

..snip..
> > > > > There is that and as others have pointed out how certain guests types
> > > > > are assumed to not have certain peripherals, and we have no idea
> > > > > to ensure certain old legacy code may not ever run or be accessed
> > > > > by drivers.
> > > > 
> > > > Ok, but that is not at code setup. That is later - when device
> > > > drivers are initialized. This no different than booting on
> > > > some hardware with missing functionality. ACPI, PCI and PnP
> > > > are set there to help OSes discover this.
> > > 
> > > To a certain extent this is true, but there may things which are missing still.
> > 
> > Like?
> 
> That's the thing, I had a list of thing to look out for and then things
> I ran across over code inspection. We need more work to be sure we're
> really well covered.
> 
> Are you *sure* we have no dead code concerns with HVMLite ?
> If there are dead code concerns are you sure there might not
> be differences between KVM and HVMLite ? Should cpuid be used to
> address differences ? Will that enable to distinguish between
> hybrid versions of HVMLite ? Are we sure ?

HVMLite CPU semantics will be the same as what a baremetal CPU
semantics are.

Platform wise it will be different - as in, instead of say
having a speaker (one thing to emulate) or RTC clock (again, another
thing to emulate), or say IDE controller (again, another
thing to emulate), or Realtek network card (again, another
thing to emulate) - it has none of those.

[Keep in mind 'another thing to emulate', means 'another
@$@() thing in QEMU that could be a security bug']

So it differs from a consumer x86 platform in that it has
none of the 'legacy' stuff. And it requires PV drivers to
function. And since it requires PV drivers to function
only OSes that have those can use this mode.
> 
> > > We really have no idea what the full list of those things are.
> > 
> > Ok, it sounds like you have some homework.
> 
> We all do.
> 
> > > It may be that things may have been running for ages without notice of an issue
> > > or that only under certain situations will certain issues or bugs trigger a
> > > failure. For instance, just yesterday I was Cc'd on a brand-spanking new legacy
> > > conflict [0], caused by upstream commit 8c058b0b9c34d8c ("x86/irq: Probe for
> > > PIC presence before allocating descs for legacy IRQs") merged on v4.4 where
> > > some new code used nr_legacy_irqs() -- one proposed solution seems to be that
> > > for Xen code NR_IRQS_LEGACY should be used instead is as it lacks PCI [1] and
> > > another was to peg the legacy requirements as a quirk on the new x86 platform
> > > legacy quirk stuff [2]. Are other uses of nr_legacy_irqs() correct ? Are
> > > we sure ?
> > 
> > And how is this example related to 'early bootup' path?
> > 
> > It is not.
> 
> For early boot code -- it is not. HVMLite is not merged, and PVH was never
> completed.. so how are you sure we won't have any issues there ?

If we did not have issues we would be out of jobs.

But this is a separate topic - it is an issue about device drivers and
the assumptions they have. And those assumptions are not always
true (even with normal hardware).

> 
> > It is in fact related to PV codepaths - which PVH/HVMLite and HVM guests
> > do not exercise.
> 
> Agreed.
> 
> > > [0] http://lkml.kernel.org/r/570F90DF.1020508@oracle.com
> > > [1] https://lkml.org/lkml/2016/4/14/532
> > > [2] http://lkml.kernel.org/r/1460592286-300-1-git-send-email-mcgrof@kernel.org
> > > 
> > > > > > > How we address semantics then is *very* important to me.
> > > > > > 
> > > > > > Which semantics? How the CPU is going to be at startup_X ? Or
> > > > > > how the CPU is going to be when EFI firmware invokes the EFI stub?
> > > > > > Or when GRUB2 loads Linux?
> > > > > 
> > > > > What hypervisor kicked me and what guest type I am.
> > > > 
> > > > cpuid software flags have that - and that semantics has been 
> > > > there for eons.
> > > 
> > > We cannot use cpuid early in asm code, I'm looking for something we
> > 
> > ?! Why!?
> 
> What existing code uses it? If there is nothing you are still certain
> it should work ? Would that work for old PV guest as well BTW ?

Yeah. For HVM/HVMLite it traps to the hypervisor.

For old PV guests it is unwise to use it as it goes straight to
the hardware (as PV guests run in ring3 - they are considered
'userspace' and neither Intel nor AMD traps on 'cpuid' in ring3
- unless you run in a VMX container).
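
To make the cpuid route concrete, here is a minimal sketch of how the
software-defined leaf can be decoded. It assumes the hypervisor leaves
start at 0x40000000 and that the caller has already executed cpuid and
passes in the register values; the function and enum names are made up
for illustration. The signature strings are the ones Xen and KVM
advertise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum demo_hv { DEMO_HV_NONE, DEMO_HV_XEN, DEMO_HV_KVM, DEMO_HV_OTHER };

/*
 * Classify the 12-byte signature that CPUID leaf 0x40000000 returns in
 * EBX, ECX, EDX (in that order). hypervisor_bit is CPUID.1:ECX bit 31.
 * Assumption: the leaf base is 0x40000000; a hypervisor may place its
 * leaves at a different base, which this sketch does not search for.
 */
enum demo_hv demo_classify_hv(int hypervisor_bit,
                              uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[12];

    if (!hypervisor_bit)
        return DEMO_HV_NONE;

    /* Registers hold the signature bytes in memory order on x86. */
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);

    if (memcmp(sig, "XenVMMXenVMM", 12) == 0)
        return DEMO_HV_XEN;
    if (memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0)
        return DEMO_HV_KVM;
    return DEMO_HV_OTHER;
}
```

Note this only tells you which hypervisor answered the trapped cpuid.
It says nothing useful for an old PV guest, where the instruction is
not intercepted as described above.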
> 
> > > can even use on asm early in boot code, on x86 the best option we
> > > have is the boot_params, but I've even have had issues with that
> > > early in code, as I can only access it after load_idt() where I
> > > described my effort to unify Xen PV and x86_64 init paths [3].
> > 
> > Well, Xen PV skips x86_64_start_kernel..
> 
> Yes, and in doing so often times people skip adding Xen PV specific
> code, as was the case with Kasan.

Right. That is an existing problem Xen PV code has.

> 
> > > [3] http://lkml.kernel.org/r/CAB=NE6VTCRCazcNpCdJ7pN1eD3=x_fcGOdH37MzVpxkKEN5esw@mail.gmail.com
> > > 
> > > > > Let me elaborate more below.
> > > > > 
> > > > > > That (those bootloaders) is clearly defined. The URL I provided
> > > > > > > mentions the HVMLite one. Documentation/x86/boot.txt describes
> > > > > > > the semantics to expect when providing a bootstrap
> > > > > > > (which is what the HVMLite stub code in Linux would be written against -
> > > > > > > and what the EFI stub code was written against too).
> > > > > > > 
> > > > > > > > > I'll elaborate on this but first let's clarify why a new entry is used for
> > > > > > > > > HVMlite to start of with:
> > > > > > > > > 
> > > > > > > > >   1) Xen ABI has historically not wanted to set up the boot params for Linux
> > > > > > > > >      guests, instead it insists on letting the Linux kernel Xen boot stubs fill
> > > > > > > > >      that out for it. This sticking point means it has implicated a boot stub.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Which is b/c it has to be OS agnostic. It has nothing to do with 'not wanting'.
> > > > > > > 
> > > > > > > It can still be OS agnostic and pass on type and custom data pointer.
> > > > > > 
> > > > > > Sure. It has that (it MUST otherwise how else would you pass data).
> > > > > > It is documented as well http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,xen.h.html#incontents_startofday
> > > > > > (see " Start of day structure passed to PVH guests in %ebx.")
> > > > > 
> > > > > The design doc begs for a custom OS entry point though.
> > > > 
> > > > That is what the ELF Note has.
> > > 
> > > Right, but I'm saying that its rather silly to be adding entry points if
> > > all we want the code to do is copy the boot params for us. The design
> > > doc requires a new entry, and likewise you'd need yet-another-entry if
> > > HVMLite is thrown out the window and come back 5 years later after new
> > > hardware solutions are in place and need to redesign HVMLite. Kind of
> > 
> > Why would you need to redesign HVMLite based on hardware solutions?
> 
> That's what happened to Xen PV, right ? Are we sure 5 years from now we won't
> have any new hardware virtualization features that will just obsolete HVMLite?

There was no hardware virtualization when Xen PV came about.

If there is hardware virtualization that obsoletes HVMLite that means
it would also obsolete KVM and HVM mode - as HVMLite runs in a VMX
container - the same type that KVM and Xen HVM guests run in.

> 
> > The entrance point and the CPU state are pretty well known - it is akin
> > to what the GRUB2 bootloader path is (protected mode).
> > > where we are with PVH today. Likewise if other paravirtualization
> > > developers want to support Linux and want to copy your strategy they'd
> > > add yet-another-entry-point as well.
> > > 
> > > This is dumb.
> > 
> > You saying the EFI entry point is dumb? That instead the EFI
> > firmware should understand Linux bootparams and booted that?
> 
> EFI is a standard. Xen is not. And since we are not talking about legacy

And is a standard something that has to come out of a committee?

If so, then Linux bootparams is not a standard. Nor is LILO bootup
path.

> hardware in the future, EFI seems like a sensible option to consider for an
> entry point. Specially given that it may mean that we can ultimately also help
> unify more entry points on Linux in general. I'd prefer to consider using

<chokes>
I can just see that. On non-EFI hardware GRUB2/SYSLINUX would use the EFI entry
point and create a fake firmware.
> EFI configuration tables instead of extending the x86 boot protocol.

What is that? Are you talking about EFI runtime services? Take a look
at the EFI spec and see what you have to implement to emulate this.
> 
> > > > > If we had a single 'type' and 'custom data' passed to the kernel that
> > > > > should suffice for the default Linux entry point to just pivot off
> > > > > of that and do what it needs without more entry points. Once.
> > > > 
> > > > And what about ramdisk? What about multiple ramdisks?
> > > > What about command line? All of that is what bootparams
> > > > tries to unify on Linux. But 'bootparams' is unique to Linux,
> > > > it does not exist on FreeBSD. Hence some stub code to transplant
> > > > OS-agnostic simple data to OS-specific is necessary.
> > > 
> > > If we had a Xen ABI option where *all* that I'm asking is you pass
> > > first:
> > > 
> > >   a) hypervisor type
> > 
> > Why can't you use cpuid.
> 
> I'll evaluate that.
> 
> > >   b) custom data pointer
> > 
> > What is this custom data pointer you speak of?
> 
> For Xen this is the xen_start_info, the structure into which Xen stuffs
> its own version of what we need to fill the boot_params.

Ok, but that is what we already provide in some way.

I am lost here. You seem to be saying you want something that is
already there?

> 
> > > We'd be able to avoid adding *any* entry point and just address
> > > the requirements as I noted with pre / post stubs for the type.
> > 
> > But you need some entry point to call into Linux. Are you
> > suggesting to use the existing ones? No, the existing one
> > wouldn't understand this.
> 
> If we used the boot_params, yes it would be possible.

...OS agnostic... they are not.

> 
> > > This would require an x86 boot protocol bump, but all the issues
> > > creeping up randomly I think that's worth putting on the table now.
> > 
> > Aaaah, so you are saying expand the bootparams. In other words
> > make Xen ABI call into Linux using the bootparams structure, similar
> > to how GRUB2 does it.
> > 
> > How is that OS agnostic?
> 
> That's an issue, I understand. EFI is OS agnostic though.
> 
> > > And maybe we don't want it to be hypervisor specific, perhaps there are other
> > > *needs* for custom pre-post startup_32()/startup_64() stubs.
> > 
> > Multiboot?
> 
> Can you elaborate?

Google Multiboot specification.
> 
> > > To avoid extending boot_params further I figured perhaps we can look
> > > at EFI as another option instead. If we are going to drop all legacy
> > 
> > But EFI support is _huge_.
> 
> I get the sense now. Perhaps we should explore to what extent now really
> at the Hackathon.

Print out the EFI spec and carry it on the plane. The plane will tilt
to one side when trying to take off.

> 
> > > PV support from the kernel (not the hypervisor) and require hardware
> > > virtualization 5 years from now on the Linux kernel, it doesn't seem
> > > to me far fetched to at the very least consider using an EFI entry
> > > instead, specially since all it does is set boot params and we can
> > > make re-use this for HVMLite too.
> > 
> > But to make that work you have to emulate EFI firmware in the
> > hypervisor. Is that work you are signing up for?
> 
> I'll do what is needed, as I have done before. If EFI is on the long
> term roadmap for ARM perhaps there are a few birds to knock with one
> stone here. If there is also interest to support other OSes through
> EFI standard means this also should help make that easier.
> 
>   Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]               ` <20160414211201.GS1990@wotan.suse.de>
@ 2016-04-15  2:14                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-15  2:14 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Julien Grall, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, the arch/x86 maintainers,
	Takashi Iwai, Vojtěch Pavlík, Gary Lin, xen-devel,
	Jeffrey Cheung, Juergen Gross, Stefano Stabellini, Jim Fehlig,
	George Dunlap, joeyli, Borislav Petkov, Boris Ostrovsky,
	Charles Arndol, Andrew Cooper, Linux Kernel Mailing List

On Thu, Apr 14, 2016 at 11:12:01PM +0200, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 04:38:47PM -0400, Konrad Rzeszutek Wilk wrote:
> > > This has nothing to do with dominance or anything nefarious, I'm asking
> > > simply for a full engineering evaluation of all possibilities, with
> > > the long term in mind. Not for now, but for hardware assumptions which
> > > are sensible 5 years from now.
> > 
> > There are two different things in my mind about this conversation:
> > 
> >  1). semantics of low-level code wrapped around pvops. On baremetal
> >    it is easy - just look at Intel and AMD SDM.
> >    And this is exactly what running in HVM or HVMLite mode will do -
> >    all those low-level operations will have the same exact semantic
> >    as baremetal.
> 
> Today Linux is KVM stupid for early boot code. I've pointed this out

-EPARSE?
> before, but again, there has been no reason found to need this. Perhaps
> for HVMLite we won't need this...

Are you talking about kvmtools? Which BTW are similar to how HVMLite
would expose the platform.
> 
> >    There is no hope for the pv_ops to fix that.
> 
> Actually I beg to differ. See my patches and ongoing work.

I meant in terms of semantics. As in I cannot see some of
those pv-ops to have the same semantics as baremetal. For example
set_pte is simple on x86 (movq $<some value>, <memory address>).

While on Xen PV it is a potential batching hypercall with
a lookup in a P2M table, then perhaps a sidelong look at
the M2P, then maybe the M2P override.
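
A toy model of that difference, to show why the two cannot share
semantics. Everything here is hypothetical and simplified: the names
are invented, the "hypercall" is just a loop, and the M2P and M2P
override lookups are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BATCH 4

/* Baremetal: updating a PTE is a single store. */
void demo_native_set_pte(uint64_t *ptep, uint64_t val)
{
    *ptep = val;
}

/* Toy PV state: a pfn->mfn table and a queue of pending updates. */
struct demo_pv {
    const uint64_t *p2m;          /* p2m[pfn] == machine frame number */
    uint64_t *ptes[DEMO_BATCH];   /* which PTEs are waiting           */
    uint64_t vals[DEMO_BATCH];    /* what they will be set to         */
    size_t queued;
    unsigned flushes;             /* stands in for issued hypercalls  */
};

/* Stand-in for the batched hypercall: apply every queued update. */
void demo_pv_flush(struct demo_pv *pv)
{
    if (pv->queued == 0)
        return;
    for (size_t i = 0; i < pv->queued; i++)
        *pv->ptes[i] = pv->vals[i];
    pv->queued = 0;
    pv->flushes++;
}

/* PV: translate pfn to mfn, queue the update, flush when the batch fills. */
void demo_pv_set_pte(struct demo_pv *pv, uint64_t *ptep,
                     uint64_t pfn, uint64_t flags)
{
    assert(pv->queued < DEMO_BATCH);
    pv->ptes[pv->queued] = ptep;
    pv->vals[pv->queued] = (pv->p2m[pfn] << 12) | flags;
    if (++pv->queued == DEMO_BATCH)
        demo_pv_flush(pv);
}
```

In the native case the new value is visible as soon as the call
returns. In the PV model it is not visible until the batch is flushed,
and the value written is not the one the caller computed from the pfn.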

> 
> >    And I am pretty sure the HVMLite in 5 years will have no
> >    trouble in this as it will be running in VMX mode (HVM).
> 
> HVMLite may still use PV drivers for some things, its not super
> obvious to me that low level semantics will not be needed yet.

PV drivers are very different from low-level semantics.

And it will have to use them.

Maybe it is easier to think of this in terms of kvmtool - it
is pretty much how this would work - but instead of VirtIO
drivers you would be using the Xen PV drivers (though one
could also use VirtIO ones if you wanted).
> 
> >  2). Boot entry.
> > 
> >    The semantics on Linux are well known - they are documented in
> >    Documentation/x86/boot.txt.
> > 
> >    HVMLite Linux guests have to somehow provide that.
> > 
> >    And how it is done seems to be tied around:
> > 
> >    a) Use existing boot paths - which means making some
> >       extra stub code to call in those existing boot paths
> >       (for example Xen could bundle with an GRUB2-alike
> >        code to be run when booting Linux using that boot-path).
> > 
> >       Or EFI (for a ton more code). Granted not all OSes
> >       support those, so not very OS agnostic.
> 
> What other OSes do is something to consider but if they don't
> do it because they are slacking in one domain should by no means
> be a reason to not evaluate the long term possible gains.
> Specially if we have reasons to believe more architectures will
> consider it and standardize on it.
> 
> It'd be silly not to take this a bit more seriously.

Complexity vs simplicity.
> 
> >        Hard part - if the bootparams change then have to
> >       rev up the code in there. May be out of sync
> >       with Linux bootparams.
> 
> If we are going to ultimately standardize on EFI boot for new
> hardware it'd be rather silly to extend the boot params further.

Whoa there... Have you spoken to hpa, tglx about this?

> 
> >    b) Add another simpler boot entry point which has to copy
> >      "some" strings from its format in bootparams.
> > 
> > 
> >    So this part of the discussion does not fall in the
> >    hardware assumptions. Intel SDM or AMD mention nothing about
> >    boot loaders or how to boot an OS - that is all in realms
> >    of how software talks to software.
> 
> Right -- so one question to ask here is what other uses are there
> for this outside of say HVMLite. You mentioned Multiboot so far.
> 
> >  3). And there is the discussion on man-power to make this
> >    happen.
> 
> Sure.
> 
> >  4). Lastly which one is simpler and involves less code so
> >     that there is a less chance of bitrot.
> 
> Indeed.
> 
> You also forgot the tie-in between dead-code and semantics but

Wait, I just spoke about CPU semantics?! Which semantics
are you talking about?
> that clearly is not on your mind. But I'd say this is a good
> summary.

I put 'dead code' in the same realm as device driver work.
And drivers always seem to have some issue or another.
Or maybe I am just unlucky and keep getting copied on those bugs.
> 
>   Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160414194408.GP1990@wotan.suse.de>
  2016-04-14 20:38             ` Konrad Rzeszutek Wilk
       [not found]             ` <20160414203847.GB21657@localhost.localdomain>
@ 2016-04-15  5:50             ` Juergen Gross
  2016-04-15  9:59             ` George Dunlap
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Juergen Gross @ 2016-04-15  5:50 UTC (permalink / raw)
  To: Luis R. Rodriguez, George Dunlap
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Julien Grall, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Stefano Stabellini, joeyli,
	Borislav Petkov, Boris Ostrovsky, Charles Arndol, Andrew Cooper,
	Jim Fehlig, Andy Lutomirski, David Vrabel

On 14/04/16 21:44, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 10:53:47AM +0100, George Dunlap wrote:
>> On 13/04/16 20:52, Luis R. Rodriguez wrote:
>>> On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
>>>> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>>>> So more to it, if the EFI entry already provides a way into Linux
>>>>> in a more streamlined fashion bringing it closer to the bare metal
>>>>> boot entry, why *would* we add another boot entry to x86, even if
>>>>> its small and self contained ?
>>>>
>>>> We would avoid using EFI if:
>>>
>>> And this is what I was looking for, thanks!
>>>
>>>> * Being called both on real hardware and under Xen would make the EFI
>>>> entry point more complicated
>>>
>>> That's on the EFI Linux maintainer to assess. And he seems willing to
>>> consider this.
>>>
>>>> * Adding the necessary EFI support into Xen would be a significant
>>>> chunk of extra work
>>>
>>> This seems to be a good sticking point, but Andi noted another aspect
>>> of this or redundancy as well.
>>>
>>>> * Requiring PVH mode to implement EFI would make it more difficult for
>>>> other kernels (NetBSD, FreeBSD) to act as dom0s.
>>>
>>> What if this is an option only then ?
>>>
>>>>
>>>> * Requiring PVH mode to use EFI would make it more difficult to
>>>> support unikernel-style workloads for domUs.
>>>
>>> What if this is an option only then ?
>>
>> So first of all, you asked why anyone would oppose EFI, and this is part
>> of the answer to that.
>>
>> Secondly, you mean "What if this is the only thing the Linux maintainers
>> will accept?"  And you already know the answer to that.
> 
> No, I meant to ask, would it be possible to make booting HVMLite using EFI
> be optional ? That way if you already support EFI that can be used on
> your entries with some small modifications.

So you suggest to add two HVMlite modes regarding boot interface
instead of one?

I still have the impression you are suggesting that using the same entry
solves everything in the OS. You would still need HVMlite support,
especially in the early boot path, to make sure the OS won't try to use
the complete EFI standard.

> 
>> How much of a burden it would be on the rest of the open-source
>> ecosystem (Xen, *BSDs, &c) is a combination of some as-yet unknown facts
>> (i.e., what a minimal Xen/Linux EFI interface would look like) and a
>> matter of judgement (i.e., given the same interface, reasonable people
>> may come to different conclusions about whether the interface is an
>> undue burden to impose on others or not).
>>
>> But I would hope that the Linux maintainers would at least consider the
>> broader community when weighing their decisions, and not take advantage
>> of their position of dominance to simply ignore the effect of their
>> choices on everybody else.
> 
> This has nothing to do with dominance or anything nefarious, I'm asking
> simply for a full engineering evaluation of all possibilities, with
> the long term in mind. Not for now, but for hardware assumptions which
> are sensible 5 years from now.

No, they are not.

Given how long the EFI standard has been available and how buggy many
vendors' implementations are, I don't expect all computers sold in 5
years will have a usable EFI. This will be true especially for
consumer devices where no EFI is available today.


Juergen

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]           ` <20160414194408.GP1990@wotan.suse.de>
                               ` (2 preceding siblings ...)
  2016-04-15  5:50             ` Juergen Gross
@ 2016-04-15  9:59             ` George Dunlap
       [not found]             ` <57108121.1070307@suse.com>
       [not found]             ` <5710BB74.2060409@citrix.com>
  5 siblings, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-15  9:59 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Julien Grall, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski

On 14/04/16 20:44, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 10:53:47AM +0100, George Dunlap wrote:
>> On 13/04/16 20:52, Luis R. Rodriguez wrote:
>>> On Wed, Apr 13, 2016 at 04:44:54PM +0100, George Dunlap wrote:
>>>> On Thu, Apr 7, 2016 at 7:51 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>>>> So more to it, if the EFI entry already provides a way into Linux
>>>>> in a more streamlined fashion bringing it closer to the bare metal
>>>>> boot entry, why *would* we add another boot entry to x86, even if
>>>>> its small and self contained ?
>>>>
>>>> We would avoid using EFI if:
>>>
>>> And this is what I was looking for, thanks!
>>>
>>>> * Being called both on real hardware and under Xen would make the EFI
>>>> entry point more complicated
>>>
>>> That's on the EFI Linux maintainer to assess. And he seems willing to
>>> consider this.
>>>
>>>> * Adding the necessary EFI support into Xen would be a significant
>>>> chunk of extra work
>>>
>>> This seems to be a good sticking point, but Andi noted another aspect
>>> of this or redundancy as well.
>>>
>>>> * Requiring PVH mode to implement EFI would make it more difficult for
>>>> other kernels (NetBSD, FreeBSD) to act as dom0s.
>>>
>>> What if this is an option only then ?
>>>
>>>>
>>>> * Requiring PVH mode to use EFI would make it more difficult to
>>>> support unikernel-style workloads for domUs.
>>>
>>> What if this is an option only then ?
>>
>> So first of all, you asked why anyone would oppose EFI, and this is part
>> of the answer to that.
>>
>> Secondly, you mean "What if this is the only thing the Linux maintainers
>> will accept?"  And you already know the answer to that.
> 
> No, I meant to ask, would it be possible to make booting HVMLite using EFI
> be optional ? That way if you already support EFI that can be used on
> your entries with some small modifications.

Oh -- I read both those lines as, "What if this is *the only option*
then?" (which I then interpreted to mean, what if booting EFI is the
only thing Linux will accept).  The rest of my reply is based on that
misunderstanding.  Sorry about that.

Regarding the second one -- I wasn't talking about actual non-Linux
unikernels; I was talking about using Linux in the way that unikernels
are used ("unikernel-style").  That is, you boot a minimal Linux image
with a small ramdisk and have a single process running as init.  For
this use case, even an extra megabyte of guest RAM and an extra second
of boot time is a significant cost.  "Use OVMF for domUs" is an
excellent solution for traditional VMs where you boot a full distro, but
would impose a significant cost on using Linux in unikernel-style VMs.

Whether a stripped-down EFI support would be sufficiently low memory /
latency for such workloads is an open question that would take time and
engineering effort to discover.  And in any case, it would certainly
require the maintenance of Yet Another Bootloader in the Xen source tree.

 -George

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                       ` <20160414205619.GR1990@wotan.suse.de>
  2016-04-15  2:02                         ` Konrad Rzeszutek Wilk
@ 2016-04-15 10:06                         ` Julien Grall
       [not found]                         ` <5710BD0B.2070306@arm.com>
       [not found]                         ` <20160415020246.GA6956@localhost.localdomain>
  3 siblings, 0 replies; 68+ messages in thread
From: Julien Grall @ 2016-04-15 10:06 UTC (permalink / raw)
  To: Luis R. Rodriguez, Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Stefano Stabellini, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Josh Triplett, joeyli,
	Borislav Petkov, Boris Ostrovsky, Juergen Gross, Andrew Cooper,
	linux-kernel, Andy Lutomirski, David Vrabel, Vitaly Kuznetsov

Hello Luis,

On 14/04/16 21:56, Luis R. Rodriguez wrote:
> On Thu, Apr 14, 2016 at 03:56:53PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Thu, Apr 14, 2016 at 08:40:48PM +0200, Luis R. Rodriguez wrote:
>>> On Wed, Apr 13, 2016 at 09:01:32PM -0400, Konrad Rzeszutek Wilk wrote:
>>>> On Thu, Apr 14, 2016 at 12:23:17AM +0200, Luis R. Rodriguez wrote:
>>> PV support from the kernel (not the hypervisor) and require hardware
>>> virtualization 5 years from now on the Linux kernel, it doesn't seem
>>> to me far fetched to at the very least consider using an EFI entry
>>> instead, specially since all it does is set boot params and we can
>>> make re-use this for HVMLite too.
>>
>> But to make that work you have to emulate EFI firmware in the
>> hypervisor. Is that work you are signing up for?
>
> I'll do what is needed, as I have done before. If EFI is on the long
> term roadmap for ARM perhaps there are a few birds to knock with one
> stone here. If there is also interest to support other OSes through
> EFI standard means this also should help make that easier.

We already have a working solution for EFI on ARM which does not require
emulating the firmware in the hypervisor.

On ARM, the EFI stub communicates with the kernel using device-tree
[1]. Once the EFI stub has finished, the native path (i.e. non-UEFI) is
executed normally and it is no longer possible to use BootServices.

For the guest, we provide full EFI support using OVMF. For DOM0,
Xen will craft the UEFI system table and the UEFI memory map. The
locations of those tables will be passed to DOM0 using a tiny
device-tree [2] and the kernel will boot using the native path. The
runtime services for DOM0 will be provided via hypercall.

The DOM0 approach has been discussed for a long time (see [3]) and I 
believe this is better than emulating UEFI firmware in Xen. We want to 
keep Xen on ARM tiny. Adding any sort of emulation will increase the 
attack surface and require more maintenance from our side.

Regards,

[1] Documentation/arm/uefi.txt in Linux.

[2] 
http://xenbits.xen.org/docs/unstable-staging/misc/arm/device-tree/guest.txt

[3] http://www.gossamer-threads.com/lists/xen/devel/397349

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                         ` <5710BD0B.2070306@arm.com>
@ 2016-04-15 14:55                           ` Luis R. Rodriguez
       [not found]                           ` <CAB=NE6UDuLOnW8xfTcgCGSbJ1aS4TkkokcGdeJGHMBps0T9=Sg@mail.gmail.com>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-15 14:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: Matt Fleming, Michael Chang, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Stefano Stabellini, Daniel Kiper, X86 ML,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Josh Triplett, joeyli,
	Borislav Petkov, Boris Ostrovsky, Juergen Gross, Andrew Cooper,
	linux-kernel, Andy Lutomirski, David Vrabel

On Fri, Apr 15, 2016 at 3:06 AM, Julien Grall <julien.grall@arm.com> wrote:
> On 14/04/16 21:56, Luis R. Rodriguez wrote:
>> On Thu, Apr 14, 2016 at 03:56:53PM -0400, Konrad Rzeszutek Wilk wrote:
>>> But to make that work you have to emulate EFI firmware in the
>>> hypervisor. Is that work you are signing up for?
>>
>> I'll do what is needed, as I have done before. If EFI is on the long
>> term roadmap for ARM perhaps there are a few birds to knock with one
>> stone here. If there is also interest to support other OSes through
>> EFI standard means this also should help make that easier.
>
> We already have a working solution for EFI on ARM which does not require to
> emulate the firmware in the hypervisor.

I get that.

> On ARM, the EFI stub is communicating with the kernel using device-tree [1].
> Once the EFI stub has ended, the native path (i.e non-UEFI) will be executed
> normally and it won't be possible to use BootServices anymore.
>
> For the guest, we provide a full support of EFI using OVMF.

I get that as well. Is this the long term solution? That still
requires OVMF; will relying on OVMF always be what is used on Xen ARM?
Was it too much of a burden to require OVMF? Is the upstream OVMF
code pulled by Xen at build time on ARM, or is a binary just fetched
with wget?

> For DOM0, Xen
> will craft the UEFI system table and the UEFI memory map. The locations of
> those tables will be passed to DOM0 using a tiny device-tree [1] and the
> kernel will boot using the native path. The runtime services for DOM0 will
> be provided via hypercall.

Thanks this helps!

> The DOM0 approach has been discussed for a long time (see [3]) and I believe
> this is better than emulating UEFI firmware in Xen. We want to keep Xen on
> ARM tiny. Adding any sort of emulation will increase the attack surface and
> require more maintenance from our side.

OK thanks, would re-using OVMF (note, DT perhaps may not be ideal for
x86 for the rest though) be a reasonable solution on x86 as an option
then?

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]             ` <57108121.1070307@suse.com>
@ 2016-04-15 15:24               ` Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-15 15:24 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Matt Fleming, jeffm, Linux Kernel Mailing List, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Julien Grall,
	Stefano Stabellini, George Dunlap, joeyli, Borislav Petkov,
	Boris Ostrovsky, Charles Arndol, Andrew Cooper, Michael Chang,
	Andy Lutomirski

On Fri, Apr 15, 2016 at 07:50:25AM +0200, Juergen Gross wrote:
> On 14/04/16 21:44, Luis R. Rodriguez wrote:
> > No, I meant to ask, would it be possible to make booting HVMLite using EFI
> > be optional ? That way if you already support EFI that can be used on
> > your entries with some small modifications.
> 
> So you suggest to add two HVMlite modes regarding boot interface
> instead of one?

I'm not suggesting, I'm evaluating what options we have available. That's very
different from suggesting. That's the point of this whole topic: pure and
simple evaluation of options.

> Given how long the EFI standard has been available and how buggy many
> vendors' implementations are, I don't expect all computers sold in 5
> years will have a usable EFI. This will be true especially for
> consumer devices where no EFI is available today.

Thanks this really helps.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]             ` <5710BB74.2060409@citrix.com>
@ 2016-04-15 15:30               ` Luis R. Rodriguez
       [not found]               ` <20160415153028.GX1990@wotan.suse.de>
  1 sibling, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-15 15:30 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Linux Kernel Mailing List, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Michael Chang,
	Andy Lutomirski

On Fri, Apr 15, 2016 at 10:59:16AM +0100, George Dunlap wrote:
> On 14/04/16 20:44, Luis R. Rodriguez wrote:
> > No, I meant to ask, would it be possible to make booting HVMLite using EFI
> > be optional ? That way if you already support EFI that can be used on
> > your entries with some small modifications.
> 
> I wasn't talking about actual non-Linux unikernels; I was talking about using
> Linux in the way that unikernels are used ("unikernel-style").  That is, you
> boot a minimal Linux image with a small ramdisk and have a single process
> running as init.  For this use case, even an extra megabyte of guest RAM and
> an extra second of boot time is a significant cost.  "Use OVMF for domUs" is
> an excellent solution for traditional VMs where you boot a full distro, but
> would impose a significant cost on using Linux in unikernel-style VMs.

Understood.

> Whether a stripped-down EFI support would be sufficiently low memory /
> latency for such workloads is an open question that would take time and
> engineering effort to discover.  And in any case, it would certainly
> require the maintenance of Yet Another Bootloader in the Xen source tree.

OVMF is used by ARM, so using it should be a matter of adaptation, and
some changes other than perhaps DT use. The question still stands though:
would it be possible to have HVMLite use EFI as an option so that
some users could opt in if they so wish?

To be clear, at this point I am not suggesting this be done, just evaluating
the options available.

  Luis

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]               ` <20160415153028.GX1990@wotan.suse.de>
@ 2016-04-15 16:03                 ` George Dunlap
       [not found]                 ` <571110BB.2000408@citrix.com>
  1 sibling, 0 replies; 68+ messages in thread
From: George Dunlap @ 2016-04-15 16:03 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, jeffm, Michael Chang, Linux Kernel Mailing List,
	Julien Grall, Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Jim Fehlig, Andy Lutomirski

On 15/04/16 16:30, Luis R. Rodriguez wrote:
> On Fri, Apr 15, 2016 at 10:59:16AM +0100, George Dunlap wrote:
>> On 14/04/16 20:44, Luis R. Rodriguez wrote:
>>> No, I meant to ask, would it be possible to make booting HVMLite using EFI
>>> be optional ? That way if you already support EFI that can be used on
>>> your entries with some small modifications.
>>
>> I wasn't talking about actual non-Linux unikernels; I was talking about using
>> Linux in the way that unikernels are used ("unikernel-style").  That is, you
>> boot a minimal Linux image with a small ramdisk and have a single process
>> running as init.  For this use case, even an extra megabyte of guest RAM and
>> an extra second of boot time is a significant cost.  "Use OVMF for domUs" is
>> an excellent solution for traditional VMs where you boot a full distro, but
>> would impose a significant cost on using Linux in unikernel-style VMs.
> 
> Understood.
> 
>> Whether a stripped-down EFI support would be sufficiently low memory /
>> latency for such workloads is an open question that would take time and
>> engineering effort to discover.  And in any case, it would certainly
>> require the maintenance of Yet Another Bootloader in the Xen source tree.
> 
> OVMF is used by ARM, so using it should be a matter of adaptation, and
> some changes other than perhaps DT use. Question still stands though,
> would it be possible to have HVMLite be using EFI as an option so that
> some users could opt-in if they so wish ?

Well we definitely intend to have a mode of PVH* which boots OVMF to
EFI-enabled guests, if that's what you mean.  For one thing, that should
in theory allow us to boot Windows guests without needing to spin up
qemu to emulate any devices (since OVMF will be able to access the PV
devices until the Windows PV drivers come up).  Booting to EFI-enabled
distros is certainly something we want as well.

But we need an option for dom0, and ideally we'd like an option for
lightweight Linux guests.  It's using EFI for those purposes that we're
pushing back on.

 -George

* I'm saying PVH because I hope when everything is sorted out we can
just call HVMLite PVH again.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                         ` <20160415020246.GA6956@localhost.localdomain>
@ 2016-04-15 17:08                           ` Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-15 17:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Matt Fleming, Michael Chang, linux-kernel, Julien Grall,
	Jan Beulich, H. Peter Anvin, Daniel Kiper, x86,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Kees Cook, Stefano Stabellini, Josh Triplett,
	joeyli, Borislav Petkov, Boris Ostrovsky, Juergen Gross,
	Ard Biesheuvel, Andrew Cooper, Jim Fehlig, Andy Lutomirski

On Thu, Apr 14, 2016 at 10:02:47PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 14, 2016 at 10:56:19PM +0200, Luis R. Rodriguez wrote:
> > Are you telling me that HVMLite has no dead code issues ?
> 
> You said earlier that baremetal has dead code issue. Then by extensions
> _any_ execution path has dead code issues.

;)

> > Are you *sure* we have no dead code concerns with HVMLite ?
> > If there are dead code concerns are you sure there might not
> > be differences between KVM and HVMLite ? Should cpuid be used to
> > address differences ? Will that enable to distinguish between
> > hybrid versions of HVMLite ? Are we sure ?
> 
> HVMLite CPU semantics will be the same as what a baremetal CPU
> semantics are.
> 
> Platform wise it will be different - as in, instead of say
> having a speaker (to emulate it) or RTC clock (again, another
> thing to emulate), or say IDE controller (again, another
> thing to emulate), or Realtek network card (again, another
> thing to emulate) - it has none of those.
> 
> [Keep in mind 'another thing to emulate', means 'another
> @$@() thing in QEMU that could be a security bug']
> 
> So it differs from a consumer x86 platform in that it has
> none of the 'legacy' stuff. And it requires PV drivers to
> function. And since it requires PV drivers to function
> only OSes that have those can use this mode.

OK, that was a long-winded way of saying that dead code differences may
really lie in the implications of using some PV drivers. That seems
sensible to me and is a good starting point to consider in the future.
This is helpful, thanks.

For KVM it should be pretty similar; however, instead of PV drivers we'd
be dealing with emulated hardware and with drivers that use, or are able
to detect, emulation.

For both, the amount of possible dead code would grow depending on the
dependencies / libraries that these things use in comparison to bare metal.

Beyond this, both will rely on hardware virtualization extensions, and
that may have some dead code impact of its own. It will depend on what
run time differences this causes in the code and on the dependencies of
those components.

> > > > We cannot use cpuid early in asm code, I'm looking for something we
> > > 
> > > ?! Why!?
> > 
> > What existing code uses it? If there is nothing you are still certain
> > it should work ? Would that work for old PV guest as well BTW ?
> 
> Yeah. For HVM/HVMLite it traps to the hypervisor.
> 
> For old PV guests it is unwise to use it as it goes straight to
> the hardware (as PV guests run in ring3 - they are considered
> 'userspace' and the Intel nor AMD do not trap on 'cpuid' in ring3
> -unless  you run in an VMX container).

OK, one heuristic we may need then for both KVM and HVMLite could be 'Are we
using hardware virtualization extensions?', as that seems to imply we can
then trust cpuid to zero in on any further low level virtualization semantics
we may need. From what you are saying this should even work in asm code as
early as the sun rises.
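
For illustration, that kind of cpuid probe could look roughly like the
following in C. This is only a sketch: it assumes GCC or Clang's
<cpuid.h>, relies on the conventional hypervisor-present bit (CPUID leaf
1, ECX bit 31) and on leaf 0x40000000 for the vendor signature, and the
hv_signature() / detect_hypervisor() names are made up here. As noted in
the quoted text, it is only meaningful where cpuid traps to the
hypervisor (HVM / HVMLite), not for old PV guests.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helpers: __get_cpuid(), __cpuid() */
#endif

#define CPUID_ECX_HYPERVISOR  (1u << 31)   /* CPUID.1:ECX bit 31 */
#define HV_CPUID_BASE         0x40000000u  /* hypervisor vendor leaf */

/*
 * Assemble the 12-byte signature from EBX, ECX, EDX of the hypervisor
 * leaf, lowest byte of each register first.
 */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                         char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* Returns 1 and fills sig when a hypervisor advertises itself, else 0. */
static int detect_hypervisor(char sig[13])
{
    sig[0] = '\0';
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & CPUID_ECX_HYPERVISOR))
        return 0;
    /* __get_cpuid() would range-check this leaf, so use the raw macro. */
    __cpuid(HV_CPUID_BASE, eax, ebx, ecx, edx);
    hv_signature(ebx, ecx, edx, sig);
    return 1;
#else
    return 0;
#endif
}
```

Under Xen HVM the signature would be expected to read "XenVMMXenVMM" and
under KVM "KVMKVMKVM"; anything finer grained than that would have to
come from further leaves.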

> > > > Right, but I'm saying that its rather silly to be adding entry points if
> > > > all we want the code to do is copy the boot params for us. The design
> > > > doc requires a new entry, and likewise you'd need yet-another-entry if
> > > > HVMLite is thrown out the window and come back 5 years later after new
> > > > hardware solutions are in place and need to redesign HVMLite. Kind of
> > > 
> > > Why would you need to redesign HVMLite based on hardware solutions?
> > 
> > That's what happened to Xen PV, right ? Are we sure 5 years from now we won't
> > have any new hardware virtualization features that will just obsolete HVMLite?
> 
> There were no hardware virtualization when Xen PV came about.
> 
> If there is hardware virtualization that obsoletes HVMLite that means
> it would also obsolete KVM and HVM mode

Indeed.

> > > The entrance point and the CPU state are pretty well known - it is akin
> > > to what GRUB2 bootloader path is (protected mode).
> > > > where we are with PVH today. Likewise if other paravirtualization
> > > > developers want to support Linux and want to copy your strategy they'd
> > > > add yet-another-entry-point as well.
> > > > 
> > > > This is dumb.
> > > 
> > > You saying the EFI entry point is dumb? That instead the EFI
> > > firmware should understand Linux bootparams and booted that?
> > 
> > EFI is a standard. Xen is not. And since we are not talking about legacy
> 
> And is a standard something that has to come out of a committee?
> 
> If so, then Linux bootparams is not a standard. Nor is LILO bootup
> path.

My point was that it is very likely that OSes will implement booting via
EFI. If they are going to do that anyway, and if this gets streamlined
across different architectures, then EFI makes sense to consider as an
entry point for future code.

> > hardware in the future, EFI seems like a sensible option to consider for an
> > entry point. Specially given that it may mean that we can ultimately also help
> > unify more entry points on Linux in general. I'd prefer to consider using
> 
> <chokes>
> I can just see that. On non-EFI hardware GRUB2/SYSLINUX would use the EFI entry
> point and create an fake firmware.

That would be stupid, however it would also be stupid to not use EFI
if available if there were benefits from using it... so in context
the question here is if hardware supports EFI, would it be a sensible
option for HVMLite to use ?

> > EFI configuration tables instead of extending the x86 boot protocol.
> 
> What is that? Are you talking about EFI runtime services? Take a look
> at the EFI spec and see what you have to implement to emulate this.

Here's an example:

https://lkml.kernel.org/r/CAKv+Gu-=e4YiaR15+MWkazxH6s3ELnYj1BAV_PeuzhkLQpoNqA@mail.gmail.com

It's used to get screen_info on ARM.

> > 
> > > > > > If we had a single 'type' and 'custom data' passed to the kernel that
> > > > > > should suffice for the default Linux entry point to just pivot off
> > > > > > of that and do what it needs without more entry points. Once.
> > > > > 
> > > > > And what about ramdisk? What about multiple ramdisks?
> > > > > What about command line? All of that is what bootparams
> > > > > tries to unify on Linux. But 'bootparams' is unique to Linux,
> > > > > it does not exist on FreeBSD. Hence some stub code to transplant
> > > > > OS-agnostic simple data to OS-specific is necessary.
> > > > 
> > > > If we had a Xen ABI option where *all* that I'm asking is you pass
> > > > first:
> > > > 
> > > >   a) hypervisor type
> > > 
> > > Why can't you use cpuid.
> > 
> > I'll evaluate that.
> > 
> > > >   b) custom data pointer
> > > 
> > > What is this custom data pointer you speak of?
> > 
> > For Xen this is the xen_start_info, the structure that Xen stuffs in
> > a copy of its version of what we need to fill the boot_params.
> 
> Ok, but that is what we do in some way provide.
> 
> I am lost here. You seem to saying you want something that is
> already there?

It's not done in a way that avoids yet another entry point. I am looking
to see if we can instead let the OS agnostic data still be passed but
let the kernel stick to its existing entry points without adding
another entry point just for HVMLite.
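
To make the "transplant OS-agnostic data into OS-specific data" step
concrete, here is a small sketch in C. Every name and layout in it
(toy_start_info, toy_linux_params, TOY_MAGIC, toy_fill_linux_params) is
hypothetical. It mirrors neither the real Xen start info nor the real
Linux boot_params; it only shows the shape of the copy being argued
about, wherever that copy ends up living.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAGIC 0x544f5931u   /* hypothetical magic value */

/* Hypothetical OS-agnostic hand-off data from the hypervisor. */
struct toy_module {
    uint64_t paddr;             /* physical address of the module */
    uint64_t size;              /* size of the module in bytes */
};

struct toy_start_info {
    uint32_t magic;
    const char *cmdline;        /* may be NULL */
    uint32_t nr_modules;
    const struct toy_module *modules;
};

/* Hypothetical stand-in for the OS-specific boot parameters. */
struct toy_linux_params {
    char cmdline[256];
    uint64_t ramdisk_paddr;
    uint64_t ramdisk_size;
};

/* Copy what the OS needs; returns 0 on success, -1 on bad input. */
static int toy_fill_linux_params(const struct toy_start_info *si,
                                 struct toy_linux_params *lp)
{
    if (si == NULL || lp == NULL || si->magic != TOY_MAGIC)
        return -1;

    memset(lp, 0, sizeof(*lp));

    if (si->cmdline != NULL) {
        size_t n = strlen(si->cmdline);

        /* Truncate rather than overflow; memset left the terminator. */
        if (n >= sizeof(lp->cmdline))
            n = sizeof(lp->cmdline) - 1;
        memcpy(lp->cmdline, si->cmdline, n);
    }

    /* In this sketch the first module is treated as the ramdisk. */
    if (si->nr_modules > 0 && si->modules != NULL) {
        lp->ramdisk_paddr = si->modules[0].paddr;
        lp->ramdisk_size = si->modules[0].size;
    }
    return 0;
}
```

Whether such a copy runs behind a new entry point or as a stub in front
of an existing one is exactly the open question here; the sketch takes no
side on that.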

> > > > We'd be able to avoid adding *any* entry point and just address
> > > > the requirements as I noted with pre / post stubs for the type.
> > > 
> > > But you need some entry point to call into Linux. Are you
> > > suggesting to use the existing ones? No, the existing one
> > > wouldn't understand this.
> > 
> > If we used the boot_parms, yes it would be possible.
> 
> ...OS agnostic... they are not.

Right... 

> > > > This would require an x86 boot protocol bump, but all the issues
> > > > creeping up randomly I think that's worth putting on the table now.
> > > 
> > > Aaaah, so you are saying expand the bootparams. In other words
> > > make Xen ABI call into Linux using the bootparams structure, similar
> > > to how GRUB2 does it.
> > > 
> > > How is that OS agnostic?
> > 
> > That's an issue, I understand. EFI is OS agnostic though.
> > 
> > > > And maybe we don't want it to be hypervisor specific, perhaps there are other
> > > > *needs* for custom pre-post startup_32()/startup_64() stubs.
> > > 
> > > Multiboot?
> > 
> > Can you elaborate?
> 
> Google Multiboot specification.

Neat thanks. Hrm...

> > > > To avoid extending boot_params further I figured perhaps we can look
> > > > at EFI as another option instead. If we are going to drop all legacy
> > > 
> > > But EFI support is _huge_.
> > 
> > I get the sense now. Perhaps we should explore to what extent now really
> > at the Hackathon.
> 
> Print out the EFI spec and carry it on the plane. The plane will tilt
> to one side when trying to take off.

Maybe I'll take Xen code as well to balance it all out.

  Luis


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                 ` <571110BB.2000408@citrix.com>
@ 2016-04-15 17:17                   ` Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-15 17:17 UTC (permalink / raw)
  To: George Dunlap
  Cc: Matt Fleming, jeffm, Linux Kernel Mailing List, Jim Fehlig,
	Jan Beulich, H. Peter Anvin, Daniel Kiper,
	the arch/x86 maintainers, Takashi Iwai, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Julien Grall, Stefano Stabellini, joeyli, Borislav Petkov,
	Boris Ostrovsky, Juergen Gross, Andrew Cooper, Michael Chang,
	Andy Lutomirski

On Fri, Apr 15, 2016 at 05:03:07PM +0100, George Dunlap wrote:
> On 15/04/16 16:30, Luis R. Rodriguez wrote:
> > On Fri, Apr 15, 2016 at 10:59:16AM +0100, George Dunlap wrote:
> >> On 14/04/16 20:44, Luis R. Rodriguez wrote:
> >>> No, I meant to ask, would it be possible to make booting HVMLite using EFI
> >>> be optional ? That way if you already support EFI that can be used on
> >>> your entries with some small modifications.
> >>
> >> I wasn't talking about actual non-Linux unikernels; I was talking about using
> >> Linux in the way that unikernels are used ("unikernel-style").  That is, you
> >> boot a minimal Linux image with a small ramdisk and have a single process
> >> running as init.  For this use case, even an extra megabyte of guest RAM and
> >> an extra second of boot time is a significant cost.  "Use OVMF for domUs" is
> >> an excellent solution for traditional VMs where you boot a full distro, but
> >> would impose a significant cost on using Linux in unikernel-style VMs.
> > 
> > Understood.
> > 
> >> Whether a stripped-down EFI support would be sufficiently low memory /
> >> latency for such workloads is an open question that would take time and
> >> engineering effort to discover.  And in any case, it would certainly
> >> require the maintenance of Yet Another Bootloader in the Xen source tree.
> > 
> > OVMF is used by ARM, so using it should be a matter of adaptation, and
> > some changes other than perhaps DT use. Question still stands though,
> > would it be possible to have HVMLite be using EFI as an option so that
> > some users could opt-in if they so wish ?
> 
> Well we definitely intend to have a mode of PVH* which boots OVMF to
> EFI-enabled guests, if that's what you mean.  For one thing, that should
> in theory allow us to boot Windows guests without needing to spin up
> qemu to emulate any devices (since OVMF will be able to access the PV
> devices until the Windows PV drivers come up).

OK so for Windows x86 HVMLite will need to go the EFI boot route for sure,
only it will use OVMF ?

> Booting to EFI-enabled
> distros is certainly something we want as well.
> 
> But we need an option for dom0, and ideally we'd like an option for
> lightweight Linux guests.  It's using EFI for those purposes that we're
> pushing back on.
> 
>  -George
> 
> * I'm saying PVH because I hope when everything is sorted out we can
> just call HVMLite PVH again.

OK sure, so as long as:

 * Other OSes don't have to use EFI
 * We keep a Linux non-EFI lightweight boot mechanism

Then the OVMF / EFI route (perhaps alternatives might be minimal EFI
emulation) is still a prospect on the table long term.

  Luis


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]                           ` <CAB=NE6UDuLOnW8xfTcgCGSbJ1aS4TkkokcGdeJGHMBps0T9=Sg@mail.gmail.com>
@ 2016-04-15 18:44                             ` Stefano Stabellini
  0 siblings, 0 replies; 68+ messages in thread
From: Stefano Stabellini @ 2016-04-15 18:44 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Matt Fleming, linux-kernel, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Stefano Stabellini, Daniel Kiper, X86 ML,
	Vojtěch Pavlík, Gary Lin, xen-devel, Jeffrey Cheung,
	Charles Arndol, Julien Grall, Kees Cook, Josh Triplett, joeyli,
	Borislav Petkov, Boris Ostrovsky, Juergen Gross, Andrew Cooper,
	Michael Chang, Andy Lutomirski

On Fri, 15 Apr 2016, Luis R. Rodriguez wrote:
> On Fri, Apr 15, 2016 at 3:06 AM, Julien Grall <julien.grall@arm.com> wrote:
> > On 14/04/16 21:56, Luis R. Rodriguez wrote:
> >> On Thu, Apr 14, 2016 at 03:56:53PM -0400, Konrad Rzeszutek Wilk wrote:
> >>> But to make that work you have to emulate EFI firmware in the
> >>> hypervisor. Is that work you are signing up for?
> >>
> >> I'll do what is needed, as I have done before. If EFI is on the long
> >> term roadmap for ARM perhaps there are a few birds to knock with one
> >> stone here. If there is also interest to support other OSes through
> >> EFI standard means this also should help make that easier.
> >
> > We already have a working solution for EFI on ARM which does not require
> > emulating the firmware in the hypervisor.
> 
> I get that.
> 
> > On ARM, the EFI stub is communicating with the kernel using device-tree [1].
> > Once the EFI stub has ended, the native path (i.e non-UEFI) will be executed
> > normally and it won't be possible to use BootServices anymore.
> >
> > For the guest, we provide a full support of EFI using OVMF.
> 
> I get that as well, is this the long term solution ?

Yes, it is for Xen on ARM.


> That still requires OVMF, will relying on OVMF always be what is used
> on Xen ARM ?

Not always, the native boot path is still supported. It is possible to
boot a VM using "kernel=/path/to/linux" in your VM config file and that
is not going to boot via EFI but via the native boot path.

To summarize, on ARM:

# DomUs options:
1) xl create "kernel=/path/to/ovmf.bin" -> OVMF -> EFI stub -> Linux (regular entry point)
2) xl create "kernel=/path/to/Linux" -> Linux (regular entry point)

# Dom0 options:
1) native UEFI firmware -> Xen (ExitBootServices) -> Linux (regular entry point)
2) uBoot -> Xen -> Linux (regular entry point)


> Was it too much of a burden to require OVMF?

No, it wasn't. Especially because Anthony had already introduced Xen
support in it.


> Is the upstream OVMF code pulled by Xen at build time on ARM, or just
> wget a binary ?

At the moment the build is not integrated, so you need to go and build
it yourself or use Raisin to do it.


> > For DOM0, Xen will craft the UEFI system table and the UEFI memory
> > map. The locations of those tables will be passed to DOM0 using a
> > tiny device-tree [1] and the kernel will boot using the native path.
> > The runtime services for DOM0 will be provided via hypercall.
> 
> Thanks this helps!
> 
> > The DOM0 approach has been discussed for a long time (see [3]) and I believe
> > this is better than emulating UEFI firmware in Xen. We want to keep Xen on
> > ARM tiny. Adding any sort of emulation will increase the attack surface and
> > require more maintenance from our side.
> 
> OK thanks, would re-using OVMF (note, DT perhaps may not be ideal for
> x86 for the rest though) be a reasonable solution on x86 as an option
> then?

Reusing OVMF for HVMLite DomUs should be easy and something to look at
in the future. Reusing OVMF for HVMLite Dom0 is another story. I think
it is a bad idea.

If we wanted to do something like we did on ARM, we would need to understand
what the Linux internal API on x86 between the EFI stub and the regular
entry point looks like. Is there even one? Could we elevate that to an
external interface and use it to boot Linux from Xen? If so, that would
be an option.


* Re: HVMLite / PVHv2 - using x86 EFI boot entry
       [not found]             ` <20160413115846.hyt4lg24rfkenbxu@mac>
@ 2016-04-15 22:53               ` Matt Fleming
  0 siblings, 0 replies; 68+ messages in thread
From: Matt Fleming @ 2016-04-15 22:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Michael Chang, linux-kernel, Jim Fehlig, Jan Beulich,
	H. Peter Anvin, Daniel Kiper, X86 ML, Vojtěch Pavlík,
	Gary Lin, xen-devel, Jeffrey Cheung, Charles Arndol,
	Stefano Stabellini, joeyli, Borislav Petkov, Boris Ostrovsky,
	Juergen Gross, Andrew Cooper, Julien Grall, Andy Lutomirski,
	Luis R. Rodriguez, David Vrabel

(Sorry, just realised I never replied to this)

On Wed, 13 Apr, at 01:59:10PM, Roger Pau Monné wrote:
> 
> Is this header compatible with the ELF header? Can both co-exist in the
> same binary without issues?
 
Nope, they cannot. We get away with mixing bzImage headers and PE/COFF
headers for the EFI stub because bzImage has no magic string and
contains historical code at the start of the file. The code is never
executed in practice nowadays (it tells the user to use a boot loader
instead of direct execution) so we just stamp a PE/COFF header over it
when CONFIG_EFI_STUB is enabled.
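
The clash described above can be checked mechanically. ELF and PE/COFF both
claim file offset 0 ("\x7fELF" versus "MZ"), while the x86 boot protocol keeps
its "HdrS" marker at offset 0x202, well clear of the start of the file. The
sketch below is only an illustration in Python: header_kinds(),
make_stub_image() and the 0x80 PE offset are made up for the example and do not
reflect how the kernel build lays out a real image.

```python
import struct

ELF_MAGIC = b"\x7fELF"        # must sit at file offset 0
MZ_MAGIC = b"MZ"              # DOS/PE stub magic, also at file offset 0
PE_OFFSET_FIELD = 0x3C        # 32-bit file offset of the "PE\0\0" signature
BZIMAGE_MAGIC_OFFSET = 0x202  # "HdrS" in the Linux x86 setup header
BZIMAGE_MAGIC = b"HdrS"


def header_kinds(image):
    """Return the set of header formats whose magic values are present."""
    kinds = set()
    if image[:4] == ELF_MAGIC:
        kinds.add("elf")
    if image[:2] == MZ_MAGIC and len(image) >= PE_OFFSET_FIELD + 4:
        (pe_off,) = struct.unpack_from("<I", image, PE_OFFSET_FIELD)
        if image[pe_off:pe_off + 4] == b"PE\x00\x00":
            kinds.add("pe")
    if image[BZIMAGE_MAGIC_OFFSET:BZIMAGE_MAGIC_OFFSET + 4] == BZIMAGE_MAGIC:
        kinds.add("bzimage")
    return kinds


def make_stub_image():
    """Build a synthetic image carrying both PE/COFF and bzImage markers."""
    image = bytearray(0x400)
    image[0:2] = MZ_MAGIC
    struct.pack_into("<I", image, PE_OFFSET_FIELD, 0x80)  # hypothetical offset
    image[0x80:0x84] = b"PE\x00\x00"
    image[BZIMAGE_MAGIC_OFFSET:BZIMAGE_MAGIC_OFFSET + 4] = BZIMAGE_MAGIC
    return bytes(image)
```

The synthetic image reports both "pe" and "bzimage". No byte string can report
both "elf" and "pe", since the first two bytes would have to be "\x7fE" and
"MZ" at once.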


* HVMLite / PVHv2 - using x86 EFI boot entry
@ 2016-04-06  2:40 Luis R. Rodriguez
  0 siblings, 0 replies; 68+ messages in thread
From: Luis R. Rodriguez @ 2016-04-06  2:40 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky, david.vrabel,
	Roger Pau Monné,
	Matt Fleming, Juergen Gross, Charles Arndol, Jim Fehlig,
	Jan Beulich, Daniel Kiper, H. Peter Anvin, x86
  Cc: Stefano Stabellini, linux-kernel, Michael Chang, Andy Lutomirski,
	joeyli, Julien Grall, Vojtěch Pavlík, Borislav Petkov,
	xen-devel, Gary Lin, Jeffrey Cheung

Boris sent out the first HVMLite series of patches to add a new Xen guest type
on February 1, 2016 [0]. We've been talking off list with a few folks about
the prospect of having HVMLite use the x86 EFI boot entry instead of adding
yet another boot entry. There are a number of reasons to consider this;
likewise there are reasons to question the effort required and whether it's
really needed. We'd like some more public review of this proposal, and to see
if others can come up with other ideas, either in favor of or against it.

This is also a good time to get x86 Linux folks to chime in on the general
design of HVMLite, given that outside of the boot entry discussion it would
seem that we, myself included, didn't get the memo about the proposed
architecture review [1]. For my part, perhaps the only sticking points of the
design were the new boot entry, which came to me as a surprise and which this
thread addresses, and the lack of addressing semantics for early boot (which
we may need to address; some of this is being addressed in parallel through
other work). The HVMLite document talks about using ACPI_FADT_NO_VGA -- we
don't use this yet upstream but I have some pending changes which should make
it easy to integrate its use on HVMLite. Perhaps there are others that may
have some other points they may want to raise now...

A huge summary of the discussion over the EFI boot option for HVMLite is now on
a wiki [2]; below I'll just provide the outline of the discussion. Consider this a
request for more public review, feel free to take any of the items below and
elaborate on it as you see fit.

Worth mentioning also is that this topic will be discussed at the 2016 Xen
Hackathon April 18-19 [3] at the ARM Cambridge, UK Headquarters, so if this
topic interests you and you can make it, consider attending.

  * Linux x86 Xen EFI boot entry evaluation
  * Issues with x86 boot entries
    * Bypassing native startup_32() / startup_64()
    * Small x86 zero page stubs

  * Xen evolution and roadmap
    * About PVH
    * About HVMLite
    * Xen ARM solution

  * Why use EFI for HVMLite
    * EFI calling conventions are standardized
    * EFI entry generalizes what new HVMLite entry proposes
    * Further semantics may be needed
    * Match Xen ARM's clean solution
    * You don't need full EFI emulation
      * Minimal EFI stubs for guests
        * GetMemoryMap()
        * ExitBootServices()
      * EFI stubs which may be needed for guests
        * Exit()
        * Variable operation functions
      * EFI stubs not needed for guests
        * GetTime()/SetTime()
        * SetVirtualAddressMap()
        * ResetSystem()
      * dom0 EFI
      * domU EFI emulation possibilities
        * Xen implements its own EFI environment for guests
        * Xen uses Tianocore / OVMF
    * kexec needs a boot path as well

  * Points against using EFI
    * Legacy PV guests need to be supported
    * Nulling the claimed boot loader effect
    * startup_32 / startup_64 flexibility
  * Remaining questions
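
To make the "Minimal EFI stubs for guests" item a little more concrete, here is
a toy model (in Python, purely illustrative) of the one piece of state that
GetMemoryMap() and ExitBootServices() share: the map key. Per the UEFI
specification, ExitBootServices() fails with EFI_INVALID_PARAMETER when the
caller's key no longer matches the current memory map. The class and method
names below are assumptions for this sketch and are not Xen or OVMF code.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "EFI_SUCCESS"
    INVALID_PARAMETER = "EFI_INVALID_PARAMETER"


class MinimalBootServices:
    """Toy model of the two boot services a minimal emulation would need."""

    def __init__(self, memory_map):
        self._memory_map = list(memory_map)
        self._map_key = 1
        self.boot_services_active = True

    def get_memory_map(self):
        """Return a copy of the map plus the key identifying this version."""
        return list(self._memory_map), self._map_key

    def allocate(self, start, pages, kind="loader_data"):
        """Any change to the map invalidates previously returned keys."""
        self._memory_map.append((start, pages, kind))
        self._map_key += 1

    def exit_boot_services(self, map_key):
        """Succeed only if the caller holds the current map key."""
        if map_key != self._map_key:
            return Status.INVALID_PARAMETER
        self.boot_services_active = False
        return Status.SUCCESS
```

A caller that allocates memory after fetching the map has to fetch it again
before exit_boot_services() succeeds, which is the retry loop EFI boot stubs
typically implement.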

[0] http://lkml.kernel.org/r/1454341137-14110-3-git-send-email-boris.ostrovsky@oracle.com
[1] http://lists.xen.org/archives/html/xen-devel/2016-02/msg01609.html
[2] http://kernelnewbies.org/KernelProjects/x86-xen-efi
[3] http://wiki.xenproject.org/wiki/Hackathon/April2016

  Luis


end of thread, other threads:[~2016-04-15 22:53 UTC | newest]

Thread overview: 68+ messages
     [not found] <20160406024027.GX1990@wotan.suse.de>
2016-04-06  9:40 ` HVMLite / PVHv2 - using x86 EFI boot entry David Vrabel
2016-04-06 11:07 ` George Dunlap
2016-04-06 11:11 ` Daniel Kiper
     [not found] ` <CAFLBxZbRjB6QWH5GbG6osCXat9NQVUAyDYrAMrdALbCofpX3Dg@mail.gmail.com>
2016-04-06 15:02   ` Matt Fleming
2016-04-07 18:51   ` Luis R. Rodriguez
     [not found]   ` <20160406150240.GO2701@codeblueprint.co.uk>
2016-04-06 16:05     ` Konrad Rzeszutek Wilk
2016-04-06 16:23       ` Konrad Rzeszutek Wilk
2016-04-08 21:53         ` Luis R. Rodriguez
2016-04-13 10:03     ` Roger Pau Monné
     [not found]     ` <20160413100312.647eocdtbmak4btk@mac>
2016-04-13 10:21       ` Matt Fleming
     [not found]   ` <20160407185148.GL1990@wotan.suse.de>
2016-04-08 14:16     ` George Dunlap
     [not found]     ` <5707BD2E.20204@citrix.com>
2016-04-08 21:58       ` Luis R. Rodriguez
     [not found]       ` <20160408215854.GU1990@wotan.suse.de>
2016-04-12 22:12         ` Luis R. Rodriguez
2016-04-13  9:54         ` Roger Pau Monné
     [not found]         ` <20160412221225.GN1990@wotan.suse.de>
2016-04-13 10:05           ` George Dunlap
2016-04-13 10:25           ` Roger Pau Monné
     [not found]           ` <CAFLBxZbiGppNad=Z6-fLgx89O0yAFrSyARTCwv=vHBR3zJ=NsA@mail.gmail.com>
2016-04-13 18:54             ` Luis R. Rodriguez
     [not found]             ` <20160413185451.GY1990@wotan.suse.de>
2016-04-14  9:42               ` George Dunlap
     [not found]               ` <570F65F7.5050108@citrix.com>
2016-04-14 19:59                 ` Luis R. Rodriguez
     [not found]           ` <20160413102156.b4qwhwbqvnnpmxgw@mac>
2016-04-13 19:10             ` Luis R. Rodriguez
     [not found]         ` <20160413095428.5mcbrimvc6vxffcw@mac>
2016-04-13 18:50           ` Luis R. Rodriguez
     [not found]           ` <20160413185010.GX1990@wotan.suse.de>
2016-04-13 19:02             ` Konrad Rzeszutek Wilk
2016-04-13 19:14               ` Luis R. Rodriguez
     [not found]               ` <20160413191408.GA1990@wotan.suse.de>
2016-04-13 19:22                 ` Konrad Rzeszutek Wilk
2016-04-13 20:01                   ` Luis R. Rodriguez
     [not found]                   ` <20160413200118.GC1990@wotan.suse.de>
2016-04-13 20:11                     ` Konrad Rzeszutek Wilk
2016-04-13 20:35                       ` Luis R. Rodriguez
     [not found]                       ` <CAB=NE6VdTB1Bc=c0oCd_tTHpwwkQcxhnOFdcLfck2jX=JjuOAQ@mail.gmail.com>
2016-04-13 20:48                         ` Konrad Rzeszutek Wilk
2016-04-14 10:13                 ` George Dunlap
2016-04-13 15:44     ` George Dunlap
     [not found]     ` <CAFLBxZbJ4QyJQ1-ZuXg_Q-9YNXnWzDyPNp4SX=d9g0DS8mJKaw@mail.gmail.com>
2016-04-13 19:52       ` Luis R. Rodriguez
     [not found]       ` <20160413195257.GB1990@wotan.suse.de>
2016-04-14  9:53         ` George Dunlap
     [not found]         ` <570F68AB.2040400@citrix.com>
2016-04-14 19:44           ` Luis R. Rodriguez
     [not found]           ` <20160414194408.GP1990@wotan.suse.de>
2016-04-14 20:38             ` Konrad Rzeszutek Wilk
     [not found]             ` <20160414203847.GB21657@localhost.localdomain>
2016-04-14 21:12               ` Luis R. Rodriguez
     [not found]               ` <20160414211201.GS1990@wotan.suse.de>
2016-04-15  2:14                 ` Konrad Rzeszutek Wilk
2016-04-15  5:50             ` Juergen Gross
2016-04-15  9:59             ` George Dunlap
     [not found]             ` <57108121.1070307@suse.com>
2016-04-15 15:24               ` Luis R. Rodriguez
     [not found]             ` <5710BB74.2060409@citrix.com>
2016-04-15 15:30               ` Luis R. Rodriguez
     [not found]               ` <20160415153028.GX1990@wotan.suse.de>
2016-04-15 16:03                 ` George Dunlap
     [not found]                 ` <571110BB.2000408@citrix.com>
2016-04-15 17:17                   ` Luis R. Rodriguez
     [not found] ` <5704D978.1050101@citrix.com>
2016-04-08 20:40   ` Luis R. Rodriguez
     [not found]   ` <20160408204032.GR1990@wotan.suse.de>
2016-04-11  5:12     ` Juergen Gross
     [not found]     ` <570B3228.90400@suse.com>
2016-04-12 21:02       ` Andy Lutomirski
     [not found]       ` <CALCETrXvGR3XKJf5Ab_ZPc-iuNuzR8AzLpRBciemKz4r0vSrGA@mail.gmail.com>
2016-04-13  9:02         ` Roger Pau Monné
     [not found]         ` <20160413090202.bg2vfdl3iol7eedv@mac>
2016-04-13 10:15           ` Matt Fleming
     [not found]           ` <20160413101515.GJ2829@codeblueprint.co.uk>
2016-04-13 10:40             ` Matt Fleming
2016-04-13 11:12             ` George Dunlap
2016-04-13 11:59             ` Roger Pau Monné
     [not found]             ` <20160413115846.hyt4lg24rfkenbxu@mac>
2016-04-15 22:53               ` Matt Fleming
2016-04-13 18:29       ` Luis R. Rodriguez
     [not found]       ` <20160413182951.GW1990@wotan.suse.de>
2016-04-13 18:56         ` Konrad Rzeszutek Wilk
2016-04-13 20:40           ` Luis R. Rodriguez
     [not found]           ` <20160413204055.GD1990@wotan.suse.de>
2016-04-13 21:08             ` Konrad Rzeszutek Wilk
2016-04-13 22:23               ` Luis R. Rodriguez
     [not found]               ` <20160413222317.GH1990@wotan.suse.de>
2016-04-14  1:01                 ` Konrad Rzeszutek Wilk
     [not found]                 ` <20160414010131.GA21510@localhost.localdomain>
2016-04-14 18:40                   ` Luis R. Rodriguez
     [not found]                   ` <20160414184048.GM1990@wotan.suse.de>
2016-04-14 19:56                     ` Konrad Rzeszutek Wilk
2016-04-14 20:56                       ` Luis R. Rodriguez
     [not found]                       ` <20160414205619.GR1990@wotan.suse.de>
2016-04-15  2:02                         ` Konrad Rzeszutek Wilk
2016-04-15 10:06                         ` Julien Grall
     [not found]                         ` <5710BD0B.2070306@arm.com>
2016-04-15 14:55                           ` Luis R. Rodriguez
     [not found]                           ` <CAB=NE6UDuLOnW8xfTcgCGSbJ1aS4TkkokcGdeJGHMBps0T9=Sg@mail.gmail.com>
2016-04-15 18:44                             ` Stefano Stabellini
     [not found]                         ` <20160415020246.GA6956@localhost.localdomain>
2016-04-15 17:08                           ` Luis R. Rodriguez
     [not found] ` <20160406111130.GG3489@olila.local.net-space.pl>
2016-04-07 19:12   ` Luis R. Rodriguez
2016-04-09 17:02   ` Luis R. Rodriguez
2016-04-06  2:40 Luis R. Rodriguez
