All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@kaod.org>
To: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org, "Paul Mackerras" <paulus@samba.org>,
	"Cédric Le Goater" <clg@kaod.org>,
	linuxppc-dev@lists.ozlabs.org,
	"David Gibson" <david@gibson.dropbear.id.au>
Subject: [PATCH v4 00/17] KVM: PPC: Book3S HV: add XIVE native exploitation mode
Date: Wed, 20 Mar 2019 09:37:34 +0100	[thread overview]
Message-ID: <20190320083751.27001-1-clg@kaod.org> (raw)

Hello,

On the POWER9 processor, the XIVE interrupt controller can control
interrupt sources using MMIOs to trigger events, to EOI or to turn off
the sources. Priority management and interrupt acknowledgment is also
controlled by MMIO in the CPU presenter sub-engine.

PowerNV/baremetal Linux runs natively under XIVE but sPAPR guests need
special support from the hypervisor to do the same. This is called the
XIVE native exploitation mode and today, it can be activated under the
PowerPC Hypervisor, pHyp. However, Linux/KVM lacks XIVE native support
and still offers the old interrupt mode interface using a KVM device
implementing the XICS hcalls over XIVE.

The following series is proposal to add the same support under KVM.

A new KVM device is introduced for the XIVE native exploitation
mode. It reuses most of the XICS-over-XIVE glue implementation
structures which are internal to KVM but has a completely different
interface. A set of KVM device ioctls provide support for the
hypervisor calls, all handled in QEMU, to configure the sources and
the event queues. From there, all interrupt control is transferred to
the guest which can use MMIOs.

These MMIO regions (ESB and TIMA) are exposed to guests in QEMU,
similarly to VFIO, and the associated VMAs are populated dynamically
with the appropriate pages using a fault handler. These are now
implemented using mmap()s of the KVM device fd.

Migration has its own specific needs regarding memory. The patchset
provides a specific control to quiesce XIVE before capturing the
memory. The save and restore of the internal state is based on the
same ioctls used for the hcalls.

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with a
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode. Which means that the KVM interrupt
device should be created at run-time, after the machine has started.
This requires extra support from KVM to destroy KVM devices. It is
introduced at the end of the patchset as it still requires some
attention and a XIVE-only VM would not need.

This is based on 5.1-rc1 and should be a candidate for 5.2 now. The
OPAL patches have not yet been merged.


GitHub trees available here :
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-next
  
Linux/KVM:

  https://github.com/legoater/linux/commits/xive-5.1

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Thanks,

C.

Caveats :

 - We should introduce a set of definitions common to XIVE and XICS
 - The XICS-over-XIVE device file book3s_xive.c could be renamed to
   book3s_xics_on_xive.c or book3s_xics_p9.c
 - The XICS-over-XIVE device has locking issues in the setup. 

Changes since v3:

 - removed a couple of useless includes
 - fix the test ont the initial setting of the EQ toggle bit : 0 -> 1
 - renamed qsize to qshift
 - renamed qpage to qaddr
 - checked host page size
 - limited flags to KVM_XIVE_EQ_ALWAYS_NOTIFY to fit sPAPR specs
 - Fixed xive_timaval description in documentation

Changes since v2:

 - removed extra OPAL call definitions
 - removed ->q_order setting. Only useful in the XICS-on-XIVE KVM
   device which allocates the EQs on behalf of the guest.
 - returned -ENXIO when VP base is invalid
 - made use of the xive_vp() macro to compute VP identifiers
 - reworked locking in kvmppc_xive_native_connect_vcpu() to fix races 
 - stop advertising KVM_CAP_PPC_IRQ_XIVE as support is not fully
   available yet
 - fixed comment on XIVE IRQ number space
 - removed usage of the __x_* macros
 - fixed locking on source block
 - fixed comments on the KVM device attribute definitions
 - handled MASKED EAS configuration
 - fixed check on supported EQ size to restrict to 64K pages
 - checked kvm_eq.flags that need to be zero
 - removed the OPAL call when EQ qtoggle bit and index are zero. 
 - reduced the size of kvmppc_one_reg timaval attribute to two u64s
 - stopped returning of the OS CAM line value
 
Changes since v1:

 - Better documentation (was missing)
 - Nested support. XIVE not advertised on non PowerNV platforms. This
   is a good way to test the fallback on QEMU emulated devices.
 - ESB and TIMA special mapping done using the KVM device fd
 - All hcalls moved to QEMU. Dropped the patch moving the hcall flags.
 - Reworked of the KVM device ioctl controls to support hcalls and
   migration needs to capture/save states
 - Merged the control syncing XIVE and marking the EQ page dirty
 - Fixed passthrough support using the KVM device file address_space
   to clear the ESB pages from the mapping
 - Misc enhancements and fixes 

Cédric Le Goater (17):
  powerpc/xive: add OPAL extensions for the XIVE native exploitation
    support
  KVM: PPC: Book3S HV: add a new KVM device for the XIVE native
    exploitation mode
  KVM: PPC: Book3S HV: XIVE: introduce a new capability
    KVM_CAP_PPC_IRQ_XIVE
  KVM: PPC: Book3S HV: XIVE: add a control to initialize a source
  KVM: PPC: Book3S HV: XIVE: add a control to configure a source
  KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration
  KVM: PPC: Book3S HV: XIVE: add a global reset control
  KVM: PPC: Book3S HV: XIVE: add a control to sync the sources
  KVM: PPC: Book3S HV: XIVE: add a control to dirty the XIVE EQ pages
  KVM: PPC: Book3S HV: XIVE: add get/set accessors for the VP XIVE state
  KVM: introduce a 'mmap' method for KVM devices
  KVM: PPC: Book3S HV: XIVE: add a TIMA mapping
  KVM: PPC: Book3S HV: XIVE: add a mapping for the source ESB pages
  KVM: PPC: Book3S HV: XIVE: add passthrough support
  KVM: PPC: Book3S HV: XIVE: activate XIVE exploitation mode
  KVM: introduce a KVM_DESTROY_DEVICE ioctl
  KVM: PPC: Book3S HV: XIVE: clear the vCPU interrupt presenters

 arch/powerpc/include/asm/kvm_host.h        |    2 +
 arch/powerpc/include/asm/kvm_ppc.h         |   32 +
 arch/powerpc/include/asm/opal-api.h        |    7 +-
 arch/powerpc/include/asm/opal.h            |    7 +
 arch/powerpc/include/asm/xive.h            |   17 +
 arch/powerpc/include/uapi/asm/kvm.h        |   46 +
 arch/powerpc/kvm/book3s_xive.h             |   36 +
 include/linux/kvm_host.h                   |    1 +
 include/uapi/linux/kvm.h                   |   10 +
 arch/powerpc/kvm/book3s.c                  |   31 +-
 arch/powerpc/kvm/book3s_xics.c             |   19 +
 arch/powerpc/kvm/book3s_xive.c             |  170 ++-
 arch/powerpc/kvm/book3s_xive_native.c      | 1205 ++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c                 |   37 +
 arch/powerpc/platforms/powernv/opal-call.c |    3 +
 arch/powerpc/sysdev/xive/native.c          |  110 ++
 virt/kvm/kvm_main.c                        |   52 +
 Documentation/virtual/kvm/api.txt          |   29 +
 Documentation/virtual/kvm/devices/xive.txt |  197 ++++
 arch/powerpc/kvm/Makefile                  |    2 +-
 20 files changed, 1953 insertions(+), 60 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_xive_native.c
 create mode 100644 Documentation/virtual/kvm/devices/xive.txt

-- 
2.20.1

WARNING: multiple messages have this Message-ID (diff)
From: "Cédric Le Goater" <clg@kaod.org>
To: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org, "Paul Mackerras" <paulus@samba.org>,
	"Cédric Le Goater" <clg@kaod.org>,
	linuxppc-dev@lists.ozlabs.org,
	"David Gibson" <david@gibson.dropbear.id.au>
Subject: [PATCH v4 00/17] KVM: PPC: Book3S HV: add XIVE native exploitation mode
Date: Wed, 20 Mar 2019 08:37:34 +0000	[thread overview]
Message-ID: <20190320083751.27001-1-clg@kaod.org> (raw)

Hello,

On the POWER9 processor, the XIVE interrupt controller can control
interrupt sources using MMIOs to trigger events, to EOI or to turn off
the sources. Priority management and interrupt acknowledgment is also
controlled by MMIO in the CPU presenter sub-engine.

PowerNV/baremetal Linux runs natively under XIVE but sPAPR guests need
special support from the hypervisor to do the same. This is called the
XIVE native exploitation mode and today, it can be activated under the
PowerPC Hypervisor, pHyp. However, Linux/KVM lacks XIVE native support
and still offers the old interrupt mode interface using a KVM device
implementing the XICS hcalls over XIVE.

The following series is proposal to add the same support under KVM.

A new KVM device is introduced for the XIVE native exploitation
mode. It reuses most of the XICS-over-XIVE glue implementation
structures which are internal to KVM but has a completely different
interface. A set of KVM device ioctls provide support for the
hypervisor calls, all handled in QEMU, to configure the sources and
the event queues. From there, all interrupt control is transferred to
the guest which can use MMIOs.

These MMIO regions (ESB and TIMA) are exposed to guests in QEMU,
similarly to VFIO, and the associated VMAs are populated dynamically
with the appropriate pages using a fault handler. These are now
implemented using mmap()s of the KVM device fd.

Migration has its own specific needs regarding memory. The patchset
provides a specific control to quiesce XIVE before capturing the
memory. The save and restore of the internal state is based on the
same ioctls used for the hcalls.

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with a
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode. Which means that the KVM interrupt
device should be created at run-time, after the machine has started.
This requires extra support from KVM to destroy KVM devices. It is
introduced at the end of the patchset as it still requires some
attention and a XIVE-only VM would not need.

This is based on 5.1-rc1 and should be a candidate for 5.2 now. The
OPAL patches have not yet been merged.


GitHub trees available here :
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-next
  
Linux/KVM:

  https://github.com/legoater/linux/commits/xive-5.1

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Thanks,

C.

Caveats :

 - We should introduce a set of definitions common to XIVE and XICS
 - The XICS-over-XIVE device file book3s_xive.c could be renamed to
   book3s_xics_on_xive.c or book3s_xics_p9.c
 - The XICS-over-XIVE device has locking issues in the setup. 

Changes since v3:

 - removed a couple of useless includes
 - fix the test ont the initial setting of the EQ toggle bit : 0 -> 1
 - renamed qsize to qshift
 - renamed qpage to qaddr
 - checked host page size
 - limited flags to KVM_XIVE_EQ_ALWAYS_NOTIFY to fit sPAPR specs
 - Fixed xive_timaval description in documentation

Changes since v2:

 - removed extra OPAL call definitions
 - removed ->q_order setting. Only useful in the XICS-on-XIVE KVM
   device which allocates the EQs on behalf of the guest.
 - returned -ENXIO when VP base is invalid
 - made use of the xive_vp() macro to compute VP identifiers
 - reworked locking in kvmppc_xive_native_connect_vcpu() to fix races 
 - stop advertising KVM_CAP_PPC_IRQ_XIVE as support is not fully
   available yet
 - fixed comment on XIVE IRQ number space
 - removed usage of the __x_* macros
 - fixed locking on source block
 - fixed comments on the KVM device attribute definitions
 - handled MASKED EAS configuration
 - fixed check on supported EQ size to restrict to 64K pages
 - checked kvm_eq.flags that need to be zero
 - removed the OPAL call when EQ qtoggle bit and index are zero. 
 - reduced the size of kvmppc_one_reg timaval attribute to two u64s
 - stopped returning of the OS CAM line value
 
Changes since v1:

 - Better documentation (was missing)
 - Nested support. XIVE not advertised on non PowerNV platforms. This
   is a good way to test the fallback on QEMU emulated devices.
 - ESB and TIMA special mapping done using the KVM device fd
 - All hcalls moved to QEMU. Dropped the patch moving the hcall flags.
 - Reworked of the KVM device ioctl controls to support hcalls and
   migration needs to capture/save states
 - Merged the control syncing XIVE and marking the EQ page dirty
 - Fixed passthrough support using the KVM device file address_space
   to clear the ESB pages from the mapping
 - Misc enhancements and fixes 

Cédric Le Goater (17):
  powerpc/xive: add OPAL extensions for the XIVE native exploitation
    support
  KVM: PPC: Book3S HV: add a new KVM device for the XIVE native
    exploitation mode
  KVM: PPC: Book3S HV: XIVE: introduce a new capability
    KVM_CAP_PPC_IRQ_XIVE
  KVM: PPC: Book3S HV: XIVE: add a control to initialize a source
  KVM: PPC: Book3S HV: XIVE: add a control to configure a source
  KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration
  KVM: PPC: Book3S HV: XIVE: add a global reset control
  KVM: PPC: Book3S HV: XIVE: add a control to sync the sources
  KVM: PPC: Book3S HV: XIVE: add a control to dirty the XIVE EQ pages
  KVM: PPC: Book3S HV: XIVE: add get/set accessors for the VP XIVE state
  KVM: introduce a 'mmap' method for KVM devices
  KVM: PPC: Book3S HV: XIVE: add a TIMA mapping
  KVM: PPC: Book3S HV: XIVE: add a mapping for the source ESB pages
  KVM: PPC: Book3S HV: XIVE: add passthrough support
  KVM: PPC: Book3S HV: XIVE: activate XIVE exploitation mode
  KVM: introduce a KVM_DESTROY_DEVICE ioctl
  KVM: PPC: Book3S HV: XIVE: clear the vCPU interrupt presenters

 arch/powerpc/include/asm/kvm_host.h        |    2 +
 arch/powerpc/include/asm/kvm_ppc.h         |   32 +
 arch/powerpc/include/asm/opal-api.h        |    7 +-
 arch/powerpc/include/asm/opal.h            |    7 +
 arch/powerpc/include/asm/xive.h            |   17 +
 arch/powerpc/include/uapi/asm/kvm.h        |   46 +
 arch/powerpc/kvm/book3s_xive.h             |   36 +
 include/linux/kvm_host.h                   |    1 +
 include/uapi/linux/kvm.h                   |   10 +
 arch/powerpc/kvm/book3s.c                  |   31 +-
 arch/powerpc/kvm/book3s_xics.c             |   19 +
 arch/powerpc/kvm/book3s_xive.c             |  170 ++-
 arch/powerpc/kvm/book3s_xive_native.c      | 1205 ++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c                 |   37 +
 arch/powerpc/platforms/powernv/opal-call.c |    3 +
 arch/powerpc/sysdev/xive/native.c          |  110 ++
 virt/kvm/kvm_main.c                        |   52 +
 Documentation/virtual/kvm/api.txt          |   29 +
 Documentation/virtual/kvm/devices/xive.txt |  197 ++++
 arch/powerpc/kvm/Makefile                  |    2 +-
 20 files changed, 1953 insertions(+), 60 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_xive_native.c
 create mode 100644 Documentation/virtual/kvm/devices/xive.txt

-- 
2.20.1

             reply	other threads:[~2019-03-20  8:37 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-20  8:37 Cédric Le Goater [this message]
2019-03-20  8:37 ` [PATCH v4 00/17] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 01/17] powerpc/xive: add OPAL extensions for the XIVE native exploitation support Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 02/17] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 03/17] KVM: PPC: Book3S HV: XIVE: introduce a new capability KVM_CAP_PPC_IRQ_XIVE Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 04/17] KVM: PPC: Book3S HV: XIVE: add a control to initialize a source Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 05/17] KVM: PPC: Book3S HV: XIVE: add a control to configure " Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 06/17] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20 23:09   ` David Gibson
2019-03-20 23:09     ` David Gibson
2019-03-21  8:48     ` Cédric Le Goater
2019-03-21  8:48       ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 07/17] KVM: PPC: Book3S HV: XIVE: add a global reset control Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 08/17] KVM: PPC: Book3S HV: XIVE: add a control to sync the sources Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 09/17] KVM: PPC: Book3S HV: XIVE: add a control to dirty the XIVE EQ pages Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 10/17] KVM: PPC: Book3S HV: XIVE: add get/set accessors for the VP XIVE state Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-04-09  6:19   ` Paul Mackerras
2019-04-09  6:19     ` Paul Mackerras
2019-04-09  6:19     ` Paul Mackerras
2019-04-09  9:18     ` Cédric Le Goater
2019-04-09  9:18       ` Cédric Le Goater
2019-04-09  9:18       ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 11/17] KVM: introduce a 'mmap' method for KVM devices Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 12/17] KVM: PPC: Book3S HV: XIVE: add a TIMA mapping Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 13/17] KVM: PPC: Book3S HV: XIVE: add a mapping for the source ESB pages Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 14/17] KVM: PPC: Book3S HV: XIVE: add passthrough support Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 15/17] KVM: PPC: Book3S HV: XIVE: activate XIVE exploitation mode Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 16/17] KVM: introduce a KVM_DESTROY_DEVICE ioctl Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-04-09 14:12   ` Cédric Le Goater
2019-04-09 14:12     ` Cédric Le Goater
2019-04-09 14:12     ` Cédric Le Goater
2019-03-20  8:37 ` [PATCH v4 17/17] KVM: PPC: Book3S HV: XIVE: clear the vCPU interrupt presenters Cédric Le Goater
2019-03-20  8:37   ` Cédric Le Goater
2019-04-09 14:13 ` [RFC PATCH v4.1 16/17] KVM: PPC: Book3S HV: XIVE: introduce a xive_devices array under the VM Cédric Le Goater
2019-04-09 14:13   ` Cédric Le Goater
2019-04-09 14:13   ` [RFC PATCH v4 17/17] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation Cédric Le Goater
2019-04-09 14:13     ` Cédric Le Goater
2019-04-15  3:32     ` David Gibson
2019-04-15  3:32       ` David Gibson
2019-04-15  9:25       ` Paul Mackerras
2019-04-15  9:25         ` Paul Mackerras
2019-04-15 13:56         ` Cédric Le Goater
2019-04-15 13:56           ` Cédric Le Goater
2019-04-15 13:48       ` Cédric Le Goater
2019-04-15 13:48         ` Cédric Le Goater
2019-04-17  2:05         ` David Gibson
2019-04-17  2:05           ` David Gibson
2019-04-15  3:26   ` [RFC PATCH v4.1 16/17] KVM: PPC: Book3S HV: XIVE: introduce a xive_devices array under the VM David Gibson
2019-04-15  3:26     ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190320083751.27001-1-clg@kaod.org \
    --to=clg@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.