* [DRAFT C] PVH CPU hotplug design document
From: Roger Pau Monné @ 2017-01-17 17:14 UTC
  To: xen-devel
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, Jan Beulich,
	Boris Ostrovsky

Hello,

Below is a draft of a design document for PVHv2 CPU hotplug. It should cover
both vCPU and pCPU hotplug. It is mainly centered on the hardware domain,
since for unprivileged PVH guests the vCPU hotplug mechanism is already
described in Boris' series [0], and is shared with HVM.

The aim here is to find a way to use ACPI vCPU hotplug for the hardware domain,
while still being able to properly detect and notify Xen of pCPU hotplug.

[0] https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

---8<---
% CPU hotplug support for PVH
% Roger Pau Monné <roger.pau@citrix.com>
% Draft C

# Revision History

| Version | Date        | Changes                                           |
|---------|-------------|---------------------------------------------------|
| Draft A | 5 Jan 2017  | Initial draft.                                    |
|---------|-------------|---------------------------------------------------|
| Draft B | 12 Jan 2017 | Removed the XXX comments and clarified some       |
|         |             | sections.                                         |
|         |             |                                                   |
|         |             | Added a sample of the SSDT ASL code that would be |
|         |             | appended to the hardware domain.                  |
|---------|-------------|---------------------------------------------------|
| Draft C | 17 Jan 2017 | Define a _SB.XEN0 bus device and place all the    |
|         |             | processor objects and the GPE block inside of it. |
|         |             |                                                   |
|         |             | Place the GPE status and enable registers and     |
|         |             | the vCPU enable bitmap in memory instead of IO    |
|         |             | space.                                            |

# Preface

This document describes the interface to be used in order to implement CPU
hotplug for PVH guests; it applies to hotplug of both physical and virtual
CPUs.

# Introduction

One of the design goals of PVH is to remove as much Xen PV-specific code as
possible, thus limiting the number of Xen PV interfaces used by guests, and
tending to use native interfaces (as used by bare metal) as much as possible.
This is in line with the efforts also made by Xen on ARM, and helps reduce the
burden of maintaining large amounts of Xen PV code inside guest kernels.

This however presents some challenges due to the model used by the Xen
Hypervisor, where some devices are handled by Xen while others are left for the
hardware domain to manage. The fact that Xen lacks an AML parser also makes it
harder, since it cannot get the full hardware description from the dynamic ACPI
tables (DSDT, SSDT) without the hardware domain's collaboration.

One such issue is CPU enumeration and hotplug, for both the hardware and
unprivileged domains. The aim is to be able to use the same enumeration and
hotplug interface for all PVH guests, regardless of their privilege.

This document aims to describe the interface used in order to fulfill the
following actions:

 * Virtual CPU (vCPU) enumeration at boot time.
 * Hotplug of vCPUs.
 * Hotplug of physical CPUs (pCPUs) to Xen.

# Prior work

## PV CPU hotplug

CPU hotplug for Xen PV guests is implemented using xenstore and hypercalls. The
guest has to set up a watch on the "cpu/" xenstore node, and react to changes
in this directory. CPUs are added by creating a new node and setting its
"availability" to online:

    cpu/X/availability = "online"

Where X is the vCPU ID. This is an out-of-band method that relies on
Xen-specific interfaces in order to perform CPU hotplug.
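
As an illustration, here is a minimal userspace sketch of the guest side of
this protocol using libxenstore (real PV guests implement this in their
kernel's xenbus driver); the watch token is arbitrary, relative paths are
assumed to resolve against the domain's xenstore home, and error handling is
omitted:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <xenstore.h>

    int main(void)
    {
        struct xs_handle *xsh = xs_open(0);
        unsigned int num, len, vcpu;

        if (!xsh || !xs_watch(xsh, "cpu", "cpu-hotplug"))
            return 1;

        for (;;) {
            /* Blocks until a node below "cpu/" changes. */
            char **event = xs_read_watch(xsh, &num);
            char *path = event[XS_WATCH_PATH];

            /* Only react to "cpu/X/availability" nodes. */
            if (sscanf(path, "cpu/%u/availability", &vcpu) == 1) {
                char *avail = xs_read(xsh, XBT_NULL, path, &len);

                if (avail && !strcmp(avail, "online"))
                    printf("vCPU %u onlined\n", vcpu); /* bring it up */
                else if (avail)
                    printf("vCPU %u offlined\n", vcpu);
                free(avail);
            }
            free(event);
        }
    }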

## QEMU CPU hotplug using ACPI

The ACPI tables provided to HVM guests contain processor objects, as created by
libacpi. The number of processor objects in the ACPI namespace matches the
maximum number of processors supported by HVM guests (up to 128 at the time of
writing). Processors that are currently disabled are marked as such in the MADT
and in their \_MAT and \_STA methods.

A PRST operation region in I/O space is also defined, with a size of 128 bits,
that is used as a bitmap of enabled vCPUs on the system. A PRSC method is
provided in order to check for updates to the PRST region and trigger
notifications on the affected processor objects. The PRSC method is executed in
response to a GPE event. The OSPM then checks the value returned by \_STA for
the ACPI\_STA\_DEVICE\_PRESENT flag in order to know whether the vCPU has been
enabled.

## Native CPU hotplug

The OSPM waits for a notification from ACPI on the processor object, and when
an event is received the return value of \_STA is checked in order to see
whether ACPI\_STA\_DEVICE\_PRESENT has been set. This notification is triggered
from the method of a GPE block.

# PVH CPU hotplug

The aim, as stated in the introduction, is to use a method as similar as
possible to bare-metal CPU hotplug for PVH. This is feasible for unprivileged
domains, since the ACPI tables can be created by the toolstack and provided to
the guest; a minimal I/O or memory handler is then added to Xen in order to
report the bitmap of enabled vCPUs. There's already a [series][0] posted to
xen-devel that implements this functionality for unprivileged PVH guests.
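
For reference, here is a hedged sketch of what such a handler could look like
inside Xen, modeled on the x86 portio handler interface; the port base and the
derivation of the bitmap from vCPU pause flags are illustrative assumptions,
not necessarily what the series implements:

    /* Assumed I/O port base for the vCPU availability bitmap. */
    #define XEN_CPU_MAP_PORT 0xaf00

    static int vcpu_bitmap_ioport_read(int dir, unsigned int port,
                                       unsigned int bytes, uint32_t *val)
    {
        const struct domain *d = current->domain;
        unsigned int i, first_bit = (port - XEN_CPU_MAP_PORT) * 8;
        uint32_t data = 0;

        if ( dir != IOREQ_READ )
            return X86EMUL_OKAY; /* writes are ignored */

        /* One bit per vCPU, starting at the vCPU covered by this port. */
        for ( i = 0; i < bytes * 8; i++ )
        {
            unsigned int cpu = first_bit + i;

            if ( cpu < d->max_vcpus && d->vcpu[cpu] != NULL &&
                 !test_bit(_VPF_down, &d->vcpu[cpu]->pause_flags) )
                data |= 1u << i;
        }

        *val = data;
        return X86EMUL_OKAY;
    }

A handler like this would be registered with register_portio_handler() (or an
MMIO equivalent) when the domain is built.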

This has however proven to be quite difficult to implement for the hardware
domain, since it has to manage both pCPUs and vCPUs. The hardware domain should
be able to notify Xen of the addition of new pCPUs, so that they can be used by
the hypervisor, and it should also be able to hotplug new vCPUs for its own
usage. Since Xen cannot access the dynamic (AML) ACPI tables, because it lacks
an AML parser, it is the duty of the hardware domain to parse those tables and
notify Xen of relevant events.
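
For the pCPU side, Xen already provides a platform op that the hardware domain
can invoke once it has parsed the relevant ACPI objects. The following sketch
shows that notification path with Linux-style includes; field usage follows my
reading of Xen's public headers and should be treated as illustrative:

    #include <xen/interface/platform.h>
    #include <asm/xen/hypercall.h>

    /* Notify Xen of a hot-added pCPU, parsed from the firmware tables. */
    static int pcpu_hotadd_notify(uint32_t apic_id, uint32_t acpi_id,
                                  uint32_t pxm)
    {
        struct xen_platform_op op = {
            .cmd = XENPF_cpu_hotadd,
            .interface_version = XENPF_INTERFACE_VERSION,
            .u.cpu_add = {
                .apic_id = apic_id, /* local APIC ID of the new CPU */
                .acpi_id = acpi_id, /* ACPI Processor UID */
                .pxm     = pxm,     /* NUMA proximity domain */
            },
        };

        return HYPERVISOR_platform_op(&op);
    }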

There are several related issues here that prevent a straightforward solution
to this problem:

 * Xen cannot parse AML tables, and thus cannot get notifications from ACPI
   events. And even if Xen could parse those tables, there can only be one
   OSPM registered with ACPI.
 * Xen can provide a valid MADT table to the hardware domain that describes the
   environment in which the hardware domain is running, but it cannot prevent
   the hardware domain from seeing the real processor devices in the ACPI
   namespace, nor can Xen currently provide the hardware domain with processor
   devices that match the vCPUs.

[0]: https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg00060.html

## Proposed solution using the STAO

The general idea of this method is to use the STAO in order to hide the pCPUs
from the hardware domain, and provide processor objects for vCPUs in an extra
SSDT table.

This method requires one change to the STAO, in order to be able to notify the
hardware domain which of the processors found in the ACPI tables are pCPUs. The
description of the new STAO field is as follows:

 |   Field            | Byte Length | Byte Offset |     Description          |
 |--------------------|:-----------:|:-----------:|--------------------------|
 | Processor List [n] |      -      |      -      | A list of ACPI numbers,  |
 |                    |             |             | where each number is the |
 |                    |             |             | Processor UID of a       |
 |                    |             |             | physical CPU, and should |
 |                    |             |             | be treated specially by  |
 |                    |             |             | the OSPM                 |

The list of UIDs in this new field would be matched against the ACPI Processor
UID field found in the local APIC and x2APIC MADT structures and in the
Processor objects in the ACPI namespace; the OSPM should either ignore those
objects, or, in case it implements pCPU hotplug, notify Xen of changes to these
objects.
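
As an illustration, the OSPM side of this check could look like the following
sketch; the MADT entry layout is simplified here, and a real OS would use its
own ACPI table accessors:

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified ACPI processor local x2APIC MADT entry (type 9). */
    struct madt_x2apic {
        uint8_t  type;
        uint8_t  length;
        uint16_t reserved;
        uint32_t x2apic_id;
        uint32_t flags;      /* bit 0: enabled */
        uint32_t acpi_uid;   /* ACPI Processor UID */
    };

    /* Return true if 'uid' names a physical CPU per the STAO list. */
    static bool stao_is_pcpu(const uint64_t *stao_uids, unsigned int n,
                             uint64_t uid)
    {
        unsigned int i;

        for (i = 0; i < n; i++)
            if (stao_uids[i] == uid)
                return true;

        return false;
    }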

The contents of the MADT provided to the hardware domain are also going to be
different from the contents of the MADT as found in native ACPI. The local/x2
APIC entries for all the pCPUs are going to be marked as disabled.

Extra entries are going to be added for each vCPU available to the hardware
domain, up to the maximum number of supported vCPUs. Note that the number of
supported vCPUs might differ from the number of enabled vCPUs, so it's possible
that some of these entries are also going to be marked as disabled. The entries
for vCPUs in the MADT are going to use a processor local x2APIC structure, and
the ACPI processor IDs of vCPUs are not going to reuse processor IDs already
used by pCPUs. Xen makes no guarantee about the processor ID of the first vCPU,
nor must the OS assume the IDs to be consecutive. Note that this would limit
the number of vCPUs so that (pCPUs + vCPUs) < 2^32.
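
The following sketch shows how Xen could append such local x2APIC entries when
building the hardware domain's MADT, reusing the simplified struct madt_x2apic
layout from the sketch above; the choice of first_uid (any range not used by a
pCPU) and of the APIC IDs is an assumption:

    #include <string.h>

    /* Append one local x2APIC entry per vCPU; returns the new cursor. */
    static uint8_t *madt_add_vcpu_entries(uint8_t *cursor,
                                          uint32_t first_uid,
                                          unsigned int max_vcpus,
                                          unsigned int enabled_vcpus)
    {
        unsigned int i;

        for (i = 0; i < max_vcpus; i++) {
            struct madt_x2apic e = {
                .type      = 9,                 /* local x2APIC */
                .length    = sizeof(e),
                .x2apic_id = i,                 /* vCPU APIC ID */
                .flags     = i < enabled_vcpus, /* bit 0: enabled */
                .acpi_uid  = first_uid + i,     /* no pCPU overlap */
            };

            memcpy(cursor, &e, sizeof(e));
            cursor += sizeof(e);
        }

        return cursor;
    }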

In order to be able to perform vCPU hotplug, the vCPUs must have an ACPI
processor object in the ACPI namespace, so that the OSPM can request
notifications and get the value of the \_STA and \_MAT methods. This can be
problematic because Xen doesn't know the ACPI name of the other processor
objects, so blindly adding new ones can create namespace clashes.

This can be solved by using a different ACPI name in order to describe vCPUs in
the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
the processor objects, so using a 'VP' (i.e. Virtual Processor) prefix should
prevent clashes.

A Xen GPE device block will be used in order to deliver events related to the
vCPUs available to the guest, since Xen doesn't know whether there are any bits
available in the native GPEs. An SCI interrupt will be injected into the guest
in order to trigger the event.
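
On the hypervisor side, delivering such an event amounts to updating the vCPU
bitmap, latching the status bit of the Xen GPE block, and injecting an SCI if
the guest has enabled that bit. The sketch below is purely illustrative: the
register layout and the assert_sci() helper are assumptions, not existing Xen
internals:

    /* Guest-visible registers backing the Xen GPE block (assumed). */
    struct xen_cpuhp_regs {
        unsigned long vcpu_bitmap[BITS_TO_LONGS(128)]; /* PRS */
        unsigned long gpe_status;                      /* GPE0 STS */
        unsigned long gpe_enable;                      /* GPE0 EN */
    };

    #define XEN_GPE_CPUHP_BIT 2 /* matches the _E02 method below */

    static void vcpu_hotplug_notify(struct domain *d,
                                    struct xen_cpuhp_regs *r,
                                    unsigned int vcpu, bool online)
    {
        if ( online )
            set_bit(vcpu, r->vcpu_bitmap);
        else
            clear_bit(vcpu, r->vcpu_bitmap);

        /* Latch the event in the Xen GPE block's status register... */
        set_bit(XEN_GPE_CPUHP_BIT, &r->gpe_status);

        /* ...and raise an SCI if the guest has this GPE bit enabled. */
        if ( test_bit(XEN_GPE_CPUHP_BIT, &r->gpe_enable) )
            assert_sci(d); /* hypothetical SCI injection helper */
    }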

The following snippet is a representation of the ASL SSDT code that is proposed
for the hardware domain:

    DefinitionBlock ("SSDT.aml", "SSDT", 5, "Xen", "HVM", 0)
    {
        Device ( \_SB.XEN0 ) {
            Name ( _HID, "ACPI0004" ) /* ACPI Module Device (bus node) */
        }
        Scope (\_SB.XEN0)
        {
            OperationRegion(XEN, SystemMemory, 0xXXXXXXXX, 41)
            Field(XEN, ByteAcc, NoLock, Preserve) {
                PRS, 2,   /* vCPU enabled bitmap */
                NCPU, 16, /* Number of vCPUs */
                MSUA, 32, /* MADT checksum address */
                MAPA, 32, /* MADT LAPIC0 address */
            }
            OperationRegion ( MSUM, SystemMemory, \_SB.XEN0.MSUA, 1 )
            Field ( MSUM, ByteAcc, NoLock, Preserve ) {
                MSU, 8
            }
            Method ( PMAT, 2 ) {
                If ( LLess(Arg0, NCPU) ) {
                    Return ( ToBuffer(Arg1) )
                }
                Return ( Buffer() {0, 8, 0xff, 0xff, 0, 0, 0, 0} )
            }
            Processor ( VP00, 0, 0x0000b010, 0x06 ) {
                Name ( _HID, "ACPI0007" )
                Name ( _UID, 1 )
                OperationRegion ( MATR, SystemMemory, Add(\_SB.XEN0.MAPA, 0), 8 )
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    MAT, 64
                }
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    Offset(4),
                    FLG, 1
                }
                Method ( _MAT, 0 ) {
                    Return ( ToBuffer(MAT) )
                }
                Method ( _STA ) {
                    If ( FLG ) {
                        Return ( 0xF )
                    }
                    Return ( 0x0 )
                }
                Method ( _EJ0, 1, NotSerialized ) {
                    Sleep ( 0xC8 )
                }
            }
            Processor ( VP01, 1, 0x0000b010, 0x06 ) {
                Name ( _HID, "ACPI0007" )
                Name ( _UID, 2 )
                OperationRegion ( MATR, SystemMemory, Add(\_SB.XEN0.MAPA, 8), 8 )
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    MAT, 64
                }
                Field ( MATR, ByteAcc, NoLock, Preserve ) {
                    Offset(4),
                    FLG, 1
                }
                Method ( _MAT, 0 ) {
                    Return ( PMAT (1, MAT) )
                }
                Method ( _STA ) {
                    If ( LLess(1, \_SB.XEN0.NCPU) ) {
                        If ( FLG ) {
                            Return ( 0xF )
                        }
                    }
                    Return ( 0x0 )
                }
                Method ( _EJ0, 1, NotSerialized ) {
                    Sleep ( 0xC8 )
                }
            }
            Method ( PRSC, 0 ) {
                Store ( ToBuffer(PRS), Local0 )
                Store ( DerefOf(Index(Local0, 0)), Local1 )
                And ( Local1, 1, Local2 )
                If ( LNotEqual(Local2, \_SB.XEN0.VP00.FLG) ) {
                    Store ( Local2, \_SB.XEN0.VP00.FLG )
                    If ( LEqual(Local2, 1) ) {
                        Notify ( VP00, 1 )
                        Subtract ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                    Else {
                        Notify ( VP00, 3 )
                        Add ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                }
                ShiftRight ( Local1, 1, Local1 )
                And ( Local1, 1, Local2 )
                If ( LNotEqual(Local2, \_SB.XEN0.VP01.FLG) ) {
                    Store ( Local2, \_SB.XEN0.VP01.FLG )
                    If ( LEqual(Local2, 1) ) {
                        Notify ( VP01, 1 )
                        Subtract ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                    Else {
                        Notify ( VP01, 3 )
                        Add ( \_SB.XEN0.MSU, 1, \_SB.XEN0.MSU )
                    }
                }
                Return ( One )
            }
        }
        Device ( \_SB.XEN0.GPE0 ) {
            Name ( _HID, "ACPI0006" )
            Name ( _UID, "XENGPE0" )
            Name ( _CRS, ResourceTemplate() {
                Memory32Fixed ( ReadWrite, 0xXXXXXXXX, 0x4 )
            } )
            Method ( _E02 ) {
                \_SB.XEN0.PRSC ()
            }
        }
    }

Since the position of the XEN data memory area is not known in advance, the
hypervisor will have to replace the address noted as 0xXXXXXXXX with the actual
memory address where this structure has been copied. The ACPI processor IDs
will also be replaced by Xen at runtime (noted as 1 and 2 in the snippet
above). The PRST region containing the vCPU enabled bitmap would also need to
be relocated by Xen over a RAM region, and updated accordingly when a vCPU is
added or removed.

The replacement can be done by compiling two different versions of the above
ASL code, each one having different values for the XEN operation region, the
ACPI processor object IDs and the other values that need to be set on a
per-system basis, and doing a binary comparison between them in order to get
the relative offsets of the differences. Note that the XEN operation region and
the GPE event and status regions would be placed over a RAM memory region.
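
A sketch of that comparison step: compile the SSDT twice with different
placeholder values and record the byte offsets at which the two blobs differ;
those offsets become the patch points. Note that the AML table checksum byte
(offset 9 of the header) will also show up as a difference, and has to be
recomputed after patching anyway:

    #include <stddef.h>

    /* Record up to 'max' offsets where blobs differ; return the count. */
    static size_t aml_diff(const unsigned char *a, const unsigned char *b,
                           size_t len, size_t *offs, size_t max)
    {
        size_t i, n = 0;

        for (i = 0; i < len && n < max; i++)
            if (a[i] != b[i])
                offs[n++] = i; /* patch point (or the checksum byte) */

        return n;
    }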

In order to implement this, the hypervisor build is going to use part of
libacpi and the iasl compiler.


* Re: [DRAFT C] PVH CPU hotplug design document
From: Jan Beulich @ 2017-01-23 16:30 UTC
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, xen-devel,
	Boris Ostrovsky

>>> On 17.01.17 at 18:14, <roger.pau@citrix.com> wrote:
> This can be solved by using a different ACPI name in order to describe vCPUs in
> the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
> the processor objects, so using a 'VP' (ie: Virtual Processor) prefix should
> prevent clashes.

I continue to think that this is insufficient, without seeing a nice
clean way to solve the issue properly.

Jan


* Re: [DRAFT C] PVH CPU hotplug design document
From: Roger Pau Monné @ 2017-01-23 16:42 UTC
  To: Jan Beulich
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, xen-devel,
	Boris Ostrovsky

On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
> >>> On 17.01.17 at 18:14, <roger.pau@citrix.com> wrote:
> > This can be solved by using a different ACPI name in order to describe vCPUs in
> > the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
> > the processor objects, so using a 'VP' (ie: Virtual Processor) prefix should
> > prevent clashes.
> 
> I continue to think that this is insufficient, without seeing a nice
> clean way to solve the issue properly.

But in this document the namespace path for processor objects will be
_SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
updated the wording here: every Xen-related ACPI bit will be inside the
_SB.XEN0 namespace.

Roger.

* Re: [DRAFT C] PVH CPU hotplug design document
From: Jan Beulich @ 2017-01-23 16:55 UTC
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, xen-devel,
	Boris Ostrovsky

>>> On 23.01.17 at 17:42, <roger.pau@citrix.com> wrote:
> On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
>> >>> On 17.01.17 at 18:14, <roger.pau@citrix.com> wrote:
>> > This can be solved by using a different ACPI name in order to describe vCPUs in
>> > the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
>> > the processor objects, so using a 'VP' (ie: Virtual Processor) prefix should
>> > prevent clashes.
>> 
>> I continue to think that this is insufficient, without seeing a nice
>> clean way to solve the issue properly.
> 
> But in this document the namespace path for processor objects will be
> _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
> updated the wording here, every Xen-related ACPI bit will be inside the
> _SB.XEN0 namespace.

Well, if we want to introduce our own parent name space, why the
special naming convention then? Any name not colliding with other
things in _SB.XEN0 should do then, so the only remaining risk would
then be that the firmware also has _SB.XEN0.

Jan


* Re: [DRAFT C] PVH CPU hotplug design document
From: Roger Pau Monné @ 2017-01-23 17:12 UTC
  To: Jan Beulich
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, xen-devel,
	Boris Ostrovsky

On Mon, Jan 23, 2017 at 09:55:19AM -0700, Jan Beulich wrote:
> >>> On 23.01.17 at 17:42, <roger.pau@citrix.com> wrote:
> > On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
> >> >>> On 17.01.17 at 18:14, <roger.pau@citrix.com> wrote:
> >> > This can be solved by using a different ACPI name in order to describe vCPUs in
> >> > the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
> >> > the processor objects, so using a 'VP' (ie: Virtual Processor) prefix should
> >> > prevent clashes.
> >> 
> >> I continue to think that this is insufficient, without seeing a nice
> >> clean way to solve the issue properly.
> > 
> > But in this document the namespace path for processor objects will be
> > _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
> > updated the wording here, every Xen-related ACPI bit will be inside the
> > _SB.XEN0 namespace.
> 
> Well, if we want to introduce our own parent name space, why the
> special naming convention then? Any name not colliding with other
> things in _SB.XEN0 should do then, so the only remaining risk would
> then be that the firmware also has _SB.XEN0.

Right, that's why I said that I should have reworded this. We can then use
PXXX, CXXX or whatever we want.

Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
no way to reserve anything in there (mostly because it's assumed that ACPI
tables will be created by a single entity I guess).

I think that the chance of this happening is 0%, and that there's no single
system out there with a _SB.XEN0 node. I've been wondering whether I should try
to post this to the ACPI working group, and try to get some feedback there.

Roger.

* Re: [DRAFT C] PVH CPU hotplug design document
From: Jan Beulich @ 2017-01-24  7:45 UTC
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, xen-devel,
	Boris Ostrovsky

>>> On 23.01.17 at 18:12, <roger.pau@citrix.com> wrote:
> On Mon, Jan 23, 2017 at 09:55:19AM -0700, Jan Beulich wrote:
>> >>> On 23.01.17 at 17:42, <roger.pau@citrix.com> wrote:
>> > On Mon, Jan 23, 2017 at 09:30:30AM -0700, Jan Beulich wrote:
>> >> >>> On 17.01.17 at 18:14, <roger.pau@citrix.com> wrote:
>> >> > This can be solved by using a different ACPI name in order to describe vCPUs in
>> >> > the ACPI namespace. Most hardware vendors tend to use CPU or PR prefixes for
>> >> > the processor objects, so using a 'VP' (ie: Virtual Processor) prefix should
>> >> > prevent clashes.
>> >> 
>> >> I continue to think that this is insufficient, without seeing a nice
>> >> clean way to solve the issue properly.
>> > 
>> > But in this document the namespace path for processor objects will be
>> > _SB.XEN0.VPXX, which should prevent any namespace clashes. Maybe I should have
>> > updated the wording here, every Xen-related ACPI bit will be inside the
>> > _SB.XEN0 namespace.
>> 
>> Well, if we want to introduce our own parent name space, why the
>> special naming convention then? Any name not colliding with other
>> things in _SB.XEN0 should do then, so the only remaining risk would
>> then be that the firmware also has _SB.XEN0.
> 
> Right, that's why I say that I should have reworded this. We can then use PXXX,
> CXXX or whatever we want.
> 
> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
> no way to reserve anything in there (mostly because it's assumed that ACPI
> tables will be created by a single entity I guess).

Right.

> I think that the chance of this happening is 0%, and that there's no single
> system out there with a _SB.XEN0 node. I've been wondering whether I should try
> to post this to the ACPI working group, and try to get some feedback there.

As you've said during some earlier discussion, it won't hurt to give
this a try.

Jan


* Re: [DRAFT C] PVH CPU hotplug design document
From: Boris Ostrovsky @ 2017-01-24 14:20 UTC
  To: Roger Pau Monné, Jan Beulich
  Cc: Stefano Stabellini, Graeme Gregory, Al Stone, Andrew Cooper,
	Anshul Makkar, Julien Grall, Paul Durrant, xen-devel


> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
> no way to reserve anything in there (mostly because it's assumed that ACPI
> tables will be created by a single entity I guess).
>
> I think that the chance of this happening is 0%, and that there's no single
> system out there with a _SB.XEN0 node. I've been wondering whether I should try
> to post this to the ACPI working group, and try to get some feedback there.

If you end up asking there, I'd suggest including Rafael Wysocki and Len
Brown (rafael@kernel.org and lenb@kernel.org) and maybe 
linux-acpi@vger.kernel.org as well.

-boris


* Re: [DRAFT C] PVH CPU hotplug design document
From: Al Stone @ 2017-02-06 23:06 UTC
  To: Boris Ostrovsky, Roger Pau Monné, Jan Beulich
  Cc: Stefano Stabellini, Graeme Gregory, Andrew Cooper, Anshul Makkar,
	Julien Grall, Paul Durrant, xen-devel

On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
> 
>> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
>> no way to reserve anything in there (mostly because it's assumed that ACPI
>> tables will be created by a single entity I guess).
>>
>> I think that the chance of this happening is 0%, and that there's no single
>> system out there with a _SB.XEN0 node. I've been wondering whether I should try
>> to post this to the ACPI working group, and try to get some feedback there.
> 
> If you end up asking there, I'd suggest including Rafael Wysocki and Len
> Brown (rafael@kernel.org and lenb@kernel.org) and maybe 
> linux-acpi@vger.kernel.org as well.
> 
> -boris
> 

My apologies for not leaping into this discussion earlier; real life has been
somewhat complicated lately.  Hopefully I won't annoy too many people.

So, I am on the ASWG (ACPI Spec Working Group) as a Red Hat and/or Linaro
representative.  To clarify something mentioned quite some time ago, the STAO
and XENV tables are in the ACPI spec in a special form.  Essentially, there are
two classes of tables within ACPI: official tables defined in the spec itself
that are meant to be used anywhere ACPI is used, and tables whose names are
recognized but whose content is defined elsewhere.  The STAO and XENV belong
to this second class -- the spec reserves their signatures so that others do
not use them, but then points to an external source -- Xen, specifically -- for
the definition.  The practical implication is that Xen can change the
definitions as it wishes, without direct oversight by the ASWG.  Just the same,
it is considered bad form to do so, so new revisions should at least be sent
to the ASWG for discussion (it may make sense to pull the table into the spec
itself...).  Stefano and I worked together to get the original reservation made
for the STAO and XENV tables.

The other thing I've noticed so far in the discussion is that everything
discussed may work on x86 or ia64, but will not work at all on arm64.  The
HARDWARE_REDUCED flag in the FADT was mentioned -- this is the crux of the
problem.  For arm64, that flag is required to be set, so overloading it is most
definitely an issue.  More problematic, however, is the notion of using GPE
blocks; when the HARDWARE_REDUCED flag is set, the spec requires that GPE block
definitions be ignored.

Then it gets messy :).  The APIC and/or x2APIC subtables of the MADT are not
likely to exist on arm64; chances are just about zero, actually.  There are
other similar MADT subtables for arm64, but APIC, x2APIC and many more just
won't be there.  There is some overlap with ia64, but not entirely.

The other issue is that a separate name space for the added CPUs would have
to be very carefully done.  If not, then the processor hierarchy information
in the AML either becomes useless, or at the least inconsistent, and OSPMs
are just now beginning to use some of that info to make scheduling decisions.
It would be possible to just assume the hot plug CPUs are outside of any
existing processor hierarchy, but I would then worry that power management
decisions made by the OSPM might be wrong; I can imagine a scenario where
a CPU is inserted and shares a power rail with an existing CPU, but the
existing CPU is idle so it decides to power off since it's the last in the
hierarchy, so the power rail isn't needed, and now the power gets turned off
to the unit just plugged in because the OSPM doesn't realize it shares power.

So at a minimum, it sounds like there would need to be a solution for each
architecture, with maybe some fiddling around on ia64, too.  Unfortunately,
I believe the ACPI spec provides a way to handle all of the things wanted,
but an ASL interpreter would be required because it does rely on executing
methods (e.g., _CRS to determine processor resources on hot plug).  The ACPICA
code is dual-licensed, GPL and commercial, and there is the OpenBSD code.
But without an interpreter, it feels like we're trying to push dynamic
behavior into static tables, and they really weren't designed for that.

That's my $0.02 worth at least....

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------

* Re: [DRAFT C] PVH CPU hotplug design document
From: Roger Pau Monné @ 2017-02-07 12:21 UTC
  To: Al Stone
  Cc: Stefano Stabellini, Graeme Gregory, Andrew Cooper, Anshul Makkar,
	Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Boris Ostrovsky

Hello Al,

Thanks for your comments, please see below.

On Mon, Feb 06, 2017 at 04:06:45PM -0700, Al Stone wrote:
> On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
> > 
> >> Yes, the only remaining risk is some vendor using _SB.XEN0, and AFAICT there's
> >> no way to reserve anything in there (mostly because it's assumed that ACPI
> >> tables will be created by a single entity I guess).
> >>
> >> I think that the chance of this happening is 0%, and that there's no single
> >> system out there with a _SB.XEN0 node. I've been wondering whether I should try
> >> to post this to the ACPI working group, and try to get some feedback there.
> > 
> > If you end up asking there, I'd suggest including Rafael Wysocki and Len
> > Brown (rafael@kernel.org and lenb@kernel.org) and maybe 
> > linux-acpi@vger.kernel.org as well.
> > 
> > -boris
> > 
> 
> My apologies for not leaping into this discussion earlier; real life has been
> somewhat complicated lately.  Hopefully I won't annoy too many people.
> 
> So, I am on the ASWG (ACPI Spec Working Group) as a Red Hat and/or Linaro
> representative.  To clarify something mentioned quite some time ago, the STAO
> and XENV tables are in the ACPI in a special form.  Essentially, there are two
> classes of tables within ACPI: official tables defined in the spec itself that
> are meant to be used anywhere ACPI is used, and, tables whose names are to be
> recognized but whose content is defined elsewhere.  The STAO and XENV belong
> to this second class -- the spec reserved their signatures so that others do
> not use them, but then points to an external source -- Xen, specifically -- for
> the definition.  The practical implication is that Xen can change definitions
> as they wish, without direct oversight of the ASWG.  Just the same, it is
> considered bad form to do so, however, so new revisions should at least be sent
> to the ASWG for discussion (it may make sense to pull the table into the spec
> itself...).  Stefano and I worked together to get the original reservation made
> for the STAO and XENV tables.
> 
> The other thing I've noticed so far in the discussion is that everything
> discussed may work on x86 or ia64, but will not work at all on arm64.  The
> HARDWARE_REDUCED flag in the FADT was mentioned -- this is the crux of the
> problem.  For arm64, that flag is required to be set, so overloading it is most
> definitely an issue.  More problematic, however, is the notion of using GPE
> blocks; when the HARDWARE_REDUCED flag is set, the spec requires GPE block
> definitions are to be ignored.

Yes, this document is specific to x86. I believe that the differences between
x86 and ARM regarding ACPI would make it too complicated to come up with a
solution that's usable on both, mainly because ACPI tables on ARM and x86 are
already too different.

> Then it gets messy :).  The APIC and/or x2APIC subtables of the MADT are not
> likely to exist on arm64; chances are just about zero, actually.  There are
> other similar MADT subtables for arm64, but APIC, x2APIC and many more just
> won't be there.  There is some overlap with ia64, but not entirely.

ia64 is also out of the picture here, all the more so since Xen doesn't support
it, and it doesn't look like anyone is working on it.

> The other issue is that a separate name space for the added CPUs would have
> to be very carefully done.  If not, then the processor hierarchy information
> in the AML either becomes useless, or at the least inconsistent, and OSPMs
> are just now beginning to use some of that info to make scheduling decisions.
> It would be possible to just assume the hot plug CPUs are outside of any
> existing processor hierarchy, but I would then worry that power management
> decisions made by the OSPM might be wrong; I can imagine a scenario where
> a CPU is inserted and shares a power rail with an existing CPU, but the
> existing CPU is idle so it decides to power off since it's the last in the
> hierarchy, so the power rail isn't needed, and now the power gets turned off
> to the unit just plugged in because the OSPM doesn't realize it shares power.

Well, my suggestion was to add the processor objects of the virtual CPUs inside
an ACPI Module Device that has the _SB.XEN0 namespace. However, AFAIK there's
no way to reserve the _SB.XEN0 namespace, so a vendor could use that for
something else. I think the chances of that happening are very low, but it's
not impossible.

Is there any way in ACPI to reserve a namespace for a certain usage? (i.e.
would it be possible to somehow reserve _SB.XEN0 for Xen usage?)

Or if we want to go more generic, we could reserve _SB.VIRT for generic
hypervisor usage.

> So at a minimum, it sounds like there would need to be a solution for each
> architecture, with maybe some fiddling around on ia64, too.  Unfortunately,
> I believe the ACPI spec provides a way to handle all of the things wanted,
> but an ASL interpreter would be required because it does rely on executing
> methods (e.g., _CRS to determine processor resources on hot plug).  The ACPICA
> code is dual-licensed, GPL and commercial, and there is the OpenBSD code.
> But without an interpreter, it feels like we're trying to push dynamic
> behavior into static tables, and they really weren't designed for that.

Yes, I think an arch-specific solution is needed in this case. Currently Dom0
passes all this information to Xen using hypercalls, but I don't think an AML
parser in Xen is strictly needed in order to implement the solution that I'm
proposing. We can get the ACPI processor object IDs from the MADT, and that
could be used in the STAO to hide them from Dom0 (provided that the STAO is
modified to add a new field, as described in the design document).

I'm also a member of the ACPI working group, and I was planning to send this
design document there for further discussion, just haven't found the time yet
to write a proper mail :(.

Roger.

* Re: [DRAFT C] PVH CPU hotplug design document
From: Al Stone @ 2017-02-22 19:29 UTC
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Graeme Gregory, Andrew Cooper, Anshul Makkar,
	Julien Grall, Paul Durrant, Jan Beulich, xen-devel,
	Boris Ostrovsky

On 02/07/2017 05:21 AM, Roger Pau Monné wrote:
> Hello Al,
> 
> Thanks for your comments, please see below.
> 
> On Mon, Feb 06, 2017 at 04:06:45PM -0700, Al Stone wrote:
>> On 01/24/2017 07:20 AM, Boris Ostrovsky wrote:
[snip....]

>> Then it gets messy :).  The APIC and/or x2APIC subtables of the MADT are not
>> likely to exist on arm64; chances are just about zero, actually.  There are
>> other similar MADT subtables for arm64, but APIC, x2APIC and many more just
>> won't be there.  There is some overlap with ia64, but not entirely.
> 
> ia64 is also out of the picture here, the more that Xen doesn't support it, and
> it doesn't look like anyone is working on it.

Aw.  That's kind of sad.  I worked on Xen/ia64 briefly many, many moons ago.

Yeah, there are arch differences.  Once you have the x86 side going, though, I
think adding in arm64 wouldn't be too bad; they're a little simpler, in some
respects.

>> The other issue is that a separate name space for the added CPUs would have
>> to be very carefully done.  If not, then the processor hierarchy information
>> in the AML either becomes useless, or at the least inconsistent, and OSPMs
>> are just now beginning to use some of that info to make scheduling decisions.
>> It would be possible to just assume the hot plug CPUs are outside of any
>> existing processor hierarchy, but I would then worry that power management
>> decisions made by the OSPM might be wrong; I can imagine a scenario where
>> a CPU is inserted and shares a power rail with an existing CPU, but the
>> existing CPU is idle so it decides to power off since it's the last in the
>> hierarchy, so the power rail isn't needed, and now the power gets turned off
>> to the unit just plugged in because the OSPM doesn't realize it shares power.
> 
> Well, my suggestion was to add the processor objects of the virtual CPUs inside
> an ACPI Module Device that has the _SB.XEN0 namespace. However, AFAIK there's
> no way to reserve the _SB.XEN0 namespace, so a vendor could use that for
> something else. I think the chances of that happening are very low, but it's
> not impossible.
> 
> Is there anyway in ACPI to reserve a namespace for a certain usage? (ie: would
> it be possible to somehow reserve _SB.XEN0 for Xen usage?)

The only really reserved namespace is "_XXX".  The rest is fair game; since one
can only use four characters, I suspect there will be some reluctance to set
aside more.

There are the top-level names (mostly just \_SB these days).  Maybe a top level
\_XEN or \_VRT could work, perhaps with some fairly strict rules on what can be
in that subspace.  I think the issue at that point would be whether or not this
is a solution to a general problem, or if it is something that affects only Xen.

> Or if we want to go more generic, we could reserve _SB.VIRT for generic
> hypervisor usage.

Right.  And this would be one of the key questions from ASWG -- can it be
generalized?

> [snip...] 
> I'm also a member of the ACPI working group, and I was planning to send this
> design document there for further discussion, just haven't found the time yet
> to write a proper mail :(.
> 
> Roger.
> 

No worries.  Getting things started is not too bad; it's the discussion after
that can go on for a while :-).

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Linaro Enterprise Group
al.stone@linaro.org
-----------------------------------
