LinuxPPC-Dev Archive on lore.kernel.org
 help / Atom feed
* [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode
@ 2019-01-07 18:43 Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
                   ` (17 more replies)
  0 siblings, 18 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

Hello,

On the POWER9 processor, the XIVE interrupt controller can control
interrupt sources using MMIO to trigger events, to EOI or to turn off
the sources. Priority management and interrupt acknowledgment is also
controlled by MMIO in the CPU presenter subengine.

PowerNV/baremetal Linux runs natively under XIVE but sPAPR guests need
special support from the hypervisor to do the same. This is called the
XIVE native exploitation mode and today, it can be activated under the
PowerPC Hypervisor, pHyp. However, Linux/KVM lacks XIVE native support
and still offers the old interrupt mode interface using a
XICS-over-XIVE glue which implements the XICS hcalls.

The following series is proposal to add the same support under KVM.

A new KVM device is introduced for the XIVE native exploitation
mode. It reuses most of the XICS-over-XIVE glue implementation
structures which are internal to KVM but has a completely different
interface. A set of Hypervisor calls configures the sources and the
event queues and from there, all control is done by the guest through
MMIOs.

These MMIO regions (ESB and TIMA) are exposed to guests in QEMU,
similarly to VFIO, and the associated VMAs are populated dynamically
with the appropriate pages using a fault handler. This is implemented
with a couple of KVM device ioctls.

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with a
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode. Which means that the KVM interrupt
device should be created at runtime, after the machine as started.
This requires extra KVM support to create/destroy KVM devices. The
last patches are an attempt to solve that problem.

Migration has its own specific needs. The patchset provides the
necessary routines to quiesce XIVE, to capture and restore the state
of the different structures used by KVM, OPAL and HW. Extra OPAL
support is required for these.

GitHub trees available here :
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-next
  
Linux/KVM:

  https://github.com/legoater/linux/commits/xive-5.0

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Best wishes for 2019 !

C.


Cédric Le Goater (19):
  powerpc/xive: export flags for the XIVE native exploitation mode
    hcalls
  powerpc/xive: add OPAL extensions for the XIVE native exploitation
    support
  KVM: PPC: Book3S HV: check the IRQ controller type
  KVM: PPC: Book3S HV: export services for the XIVE native exploitation
    device
  KVM: PPC: Book3S HV: add a new KVM device for the XIVE native
    exploitation mode
  KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native
    device
  KVM: PPC: Book3S HV: add a GET_TIMA_FD control to XIVE native device
  KVM: PPC: Book3S HV: add a VC_BASE control to the XIVE native device
  KVM: PPC: Book3S HV: add a SET_SOURCE control to the XIVE native
    device
  KVM: PPC: Book3S HV: add a EISN attribute to kvmppc_xive_irq_state
  KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode
    hcalls
  KVM: PPC: Book3S HV: record guest queue page address
  KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration
  KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty
  KVM: PPC: Book3S HV: add get/set accessors for the source
    configuration
  KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration
  KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state
  KVM: PPC: Book3S HV: add passthrough support
  KVM: introduce a KVM_DELETE_DEVICE ioctl

 arch/powerpc/include/asm/kvm_host.h           |    2 +
 arch/powerpc/include/asm/kvm_ppc.h            |   69 +
 arch/powerpc/include/asm/opal-api.h           |   11 +-
 arch/powerpc/include/asm/opal.h               |    7 +
 arch/powerpc/include/asm/xive.h               |   40 +
 arch/powerpc/include/uapi/asm/kvm.h           |   47 +
 arch/powerpc/kvm/book3s_xive.h                |   82 +
 include/linux/kvm_host.h                      |    2 +
 include/uapi/linux/kvm.h                      |    5 +
 arch/powerpc/kvm/book3s.c                     |   31 +-
 arch/powerpc/kvm/book3s_hv.c                  |   29 +
 arch/powerpc/kvm/book3s_hv_builtin.c          |  196 +++
 arch/powerpc/kvm/book3s_hv_rm_xive_native.c   |   47 +
 arch/powerpc/kvm/book3s_xive.c                |  149 +-
 arch/powerpc/kvm/book3s_xive_native.c         | 1406 +++++++++++++++++
 .../powerpc/kvm/book3s_xive_native_template.c |  398 +++++
 arch/powerpc/kvm/powerpc.c                    |   30 +
 arch/powerpc/sysdev/xive/native.c             |  110 ++
 arch/powerpc/sysdev/xive/spapr.c              |   28 +-
 virt/kvm/kvm_main.c                           |   39 +
 arch/powerpc/kvm/Makefile                     |    4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |   52 +
 .../powerpc/platforms/powernv/opal-wrappers.S |    3 +
 23 files changed, 2722 insertions(+), 65 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive_native.c
 create mode 100644 arch/powerpc/kvm/book3s_xive_native.c
 create mode 100644 arch/powerpc/kvm/book3s_xive_native_template.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-09  3:33   ` David Gibson
  2019-01-09 13:08   ` Michael Ellerman
  2019-01-07 18:43 ` [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support Cédric Le Goater
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

These flags are shared between Linux/KVM implementing the hypervisor
calls for the XIVE native exploitation mode and the driver for the
sPAPR guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/xive.h  | 23 +++++++++++++++++++++++
 arch/powerpc/sysdev/xive/spapr.c | 28 ++++++++--------------------
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index 3c704f5dd3ae..32f033bfbf42 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
 /* xmon hook */
 extern void xmon_xive_do_dump(int cpu);
 
+/*
+ * Hcall flags shared by the sPAPR backend and KVM
+ */
+
+/* H_INT_GET_SOURCE_INFO */
+#define XIVE_SPAPR_SRC_H_INT_ESB	PPC_BIT(60)
+#define XIVE_SPAPR_SRC_LSI		PPC_BIT(61)
+#define XIVE_SPAPR_SRC_TRIGGER		PPC_BIT(62)
+#define XIVE_SPAPR_SRC_STORE_EOI	PPC_BIT(63)
+
+/* H_INT_SET_SOURCE_CONFIG */
+#define XIVE_SPAPR_SRC_SET_EISN		PPC_BIT(62)
+#define XIVE_SPAPR_SRC_MASK		PPC_BIT(63) /* unused */
+
+/* H_INT_SET_QUEUE_CONFIG */
+#define XIVE_SPAPR_EQ_ALWAYS_NOTIFY	PPC_BIT(63)
+
+/* H_INT_SET_QUEUE_CONFIG */
+#define XIVE_SPAPR_EQ_DEBUG		PPC_BIT(63)
+
+/* H_INT_ESB */
+#define XIVE_SPAPR_ESB_STORE		PPC_BIT(63)
+
 /* APIs used by KVM */
 extern u32 xive_native_default_eq_shift(void);
 extern u32 xive_native_alloc_vp_block(u32 max_vcpus);
diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index 575db3b06a6b..730284f838c8 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -184,9 +184,6 @@ static long plpar_int_get_source_info(unsigned long flags,
 	return 0;
 }
 
-#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
-#define XIVE_SRC_MASK     (1ull << (63 - 63)) /* unused */
-
 static long plpar_int_set_source_config(unsigned long flags,
 					unsigned long lisn,
 					unsigned long target,
@@ -243,8 +240,6 @@ static long plpar_int_get_queue_info(unsigned long flags,
 	return 0;
 }
 
-#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
-
 static long plpar_int_set_queue_config(unsigned long flags,
 				       unsigned long target,
 				       unsigned long priority,
@@ -286,8 +281,6 @@ static long plpar_int_sync(unsigned long flags, unsigned long lisn)
 	return 0;
 }
 
-#define XIVE_ESB_FLAG_STORE (1ull << (63 - 63))
-
 static long plpar_int_esb(unsigned long flags,
 			  unsigned long lisn,
 			  unsigned long offset,
@@ -321,7 +314,7 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 data, bool write)
 	unsigned long read_data;
 	long rc;
 
-	rc = plpar_int_esb(write ? XIVE_ESB_FLAG_STORE : 0,
+	rc = plpar_int_esb(write ? XIVE_SPAPR_ESB_STORE : 0,
 			   lisn, offset, data, &read_data);
 	if (rc)
 		return -1;
@@ -329,11 +322,6 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 data, bool write)
 	return write ? 0 : read_data;
 }
 
-#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
-#define XIVE_SRC_LSI           (1ull << (63 - 61))
-#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
-#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
-
 static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
 {
 	long rc;
@@ -349,11 +337,11 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
 	if (rc)
 		return  -EINVAL;
 
-	if (flags & XIVE_SRC_H_INT_ESB)
+	if (flags & XIVE_SPAPR_SRC_H_INT_ESB)
 		data->flags  |= XIVE_IRQ_FLAG_H_INT_ESB;
-	if (flags & XIVE_SRC_STORE_EOI)
+	if (flags & XIVE_SPAPR_SRC_STORE_EOI)
 		data->flags  |= XIVE_IRQ_FLAG_STORE_EOI;
-	if (flags & XIVE_SRC_LSI)
+	if (flags & XIVE_SPAPR_SRC_LSI)
 		data->flags  |= XIVE_IRQ_FLAG_LSI;
 	data->eoi_page  = eoi_page;
 	data->esb_shift = esb_shift;
@@ -374,7 +362,7 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
 	data->hw_irq = hw_irq;
 
 	/* Full function page supports trigger */
-	if (flags & XIVE_SRC_TRIGGER) {
+	if (flags & XIVE_SPAPR_SRC_TRIGGER) {
 		data->trig_mmio = data->eoi_mmio;
 		return 0;
 	}
@@ -391,8 +379,8 @@ static int xive_spapr_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq)
 {
 	long rc;
 
-	rc = plpar_int_set_source_config(XIVE_SRC_SET_EISN, hw_irq, target,
-					 prio, sw_irq);
+	rc = plpar_int_set_source_config(XIVE_SPAPR_SRC_SET_EISN, hw_irq,
+					 target, prio, sw_irq);
 
 	return rc == 0 ? 0 : -ENXIO;
 }
@@ -432,7 +420,7 @@ static int xive_spapr_configure_queue(u32 target, struct xive_q *q, u8 prio,
 	q->eoi_phys = esn_page;
 
 	/* Default is to always notify */
-	flags = XIVE_EQ_ALWAYS_NOTIFY;
+	flags = XIVE_SPAPR_EQ_ALWAYS_NOTIFY;
 
 	/* Configure and enable the queue in HW */
 	rc = plpar_int_set_queue_config(flags, target, prio, qpage_phys, order);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-09  4:26   ` David Gibson
  2019-01-07 18:43 ` [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type Cédric Le Goater
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

The support for XIVE native exploitation mode in Linux/KVM needs a
couple more OPAL calls to configure the sPAPR guest and to get/set the
state of the XIVE internal structures.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/opal-api.h           | 11 ++-
 arch/powerpc/include/asm/opal.h               |  7 ++
 arch/powerpc/include/asm/xive.h               | 14 +++
 arch/powerpc/sysdev/xive/native.c             | 99 +++++++++++++++++++
 .../powerpc/platforms/powernv/opal-wrappers.S |  3 +
 5 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 870fb7b239ea..cdfc54f78101 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -186,8 +186,8 @@
 #define OPAL_XIVE_FREE_IRQ			140
 #define OPAL_XIVE_SYNC				141
 #define OPAL_XIVE_DUMP				142
-#define OPAL_XIVE_RESERVED3			143
-#define OPAL_XIVE_RESERVED4			144
+#define OPAL_XIVE_GET_QUEUE_STATE		143
+#define OPAL_XIVE_SET_QUEUE_STATE		144
 #define OPAL_SIGNAL_SYSTEM_RESET		145
 #define OPAL_NPU_INIT_CONTEXT			146
 #define OPAL_NPU_DESTROY_CONTEXT		147
@@ -209,8 +209,11 @@
 #define OPAL_SENSOR_GROUP_ENABLE		163
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR		164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR		165
-#define	OPAL_NX_COPROC_INIT			167
-#define OPAL_LAST				167
+#define OPAL_HANDLE_HMI2			166
+#define OPAL_NX_COPROC_INIT			167
+#define OPAL_NPU_SET_RELAXED_ORDER		168
+#define OPAL_NPU_GET_RELAXED_ORDER		169
+#define OPAL_XIVE_GET_VP_STATE			170
 
 #define QUIESCE_HOLD			1 /* Spin all calls at entry */
 #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index a55b01c90bb1..4e978d4dea5c 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -279,6 +279,13 @@ int64_t opal_xive_allocate_irq(uint32_t chip_id);
 int64_t opal_xive_free_irq(uint32_t girq);
 int64_t opal_xive_sync(uint32_t type, uint32_t id);
 int64_t opal_xive_dump(uint32_t type, uint32_t id);
+int64_t opal_xive_get_queue_state(uint64_t vp, uint32_t prio,
+				  __be32 *out_qtoggle,
+				  __be32 *out_qindex);
+int64_t opal_xive_set_queue_state(uint64_t vp, uint32_t prio,
+				  uint32_t qtoggle,
+				  uint32_t qindex);
+int64_t opal_xive_get_vp_state(uint64_t vp, __be64 *out_w01);
 int64_t opal_pci_set_p2p(uint64_t phb_init, uint64_t phb_target,
 			uint64_t desc, uint16_t pe_number);
 
diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index 32f033bfbf42..d6be3e4d9fa4 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -132,12 +132,26 @@ extern int xive_native_configure_queue(u32 vp_id, struct xive_q *q, u8 prio,
 extern void xive_native_disable_queue(u32 vp_id, struct xive_q *q, u8 prio);
 
 extern void xive_native_sync_source(u32 hw_irq);
+extern void xive_native_sync_queue(u32 hw_irq);
 extern bool is_xive_irq(struct irq_chip *chip);
 extern int xive_native_enable_vp(u32 vp_id, bool single_escalation);
 extern int xive_native_disable_vp(u32 vp_id);
 extern int xive_native_get_vp_info(u32 vp_id, u32 *out_cam_id, u32 *out_chip_id);
 extern bool xive_native_has_single_escalation(void);
 
+extern int xive_native_get_queue_info(u32 vp_id, uint32_t prio,
+				      u64 *out_qpage,
+				      u64 *out_qsize,
+				      u64 *out_qeoi_page,
+				      u32 *out_escalate_irq,
+				      u64 *out_qflags);
+
+extern int xive_native_get_queue_state(u32 vp_id, uint32_t prio, u32 *qtoggle,
+				       u32 *qindex);
+extern int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle,
+				       u32 qindex);
+extern int xive_native_get_vp_state(u32 vp_id, u64 *out_state);
+
 #else
 
 static inline bool xive_enabled(void) { return false; }
diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c
index 1ca127d052a6..0c037e933e55 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -437,6 +437,12 @@ void xive_native_sync_source(u32 hw_irq)
 }
 EXPORT_SYMBOL_GPL(xive_native_sync_source);
 
+void xive_native_sync_queue(u32 hw_irq)
+{
+	opal_xive_sync(XIVE_SYNC_QUEUE, hw_irq);
+}
+EXPORT_SYMBOL_GPL(xive_native_sync_queue);
+
 static const struct xive_ops xive_native_ops = {
 	.populate_irq_data	= xive_native_populate_irq_data,
 	.configure_irq		= xive_native_configure_irq,
@@ -711,3 +717,96 @@ bool xive_native_has_single_escalation(void)
 	return xive_has_single_esc;
 }
 EXPORT_SYMBOL_GPL(xive_native_has_single_escalation);
+
+int xive_native_get_queue_info(u32 vp_id, u32 prio,
+			       u64 *out_qpage,
+			       u64 *out_qsize,
+			       u64 *out_qeoi_page,
+			       u32 *out_escalate_irq,
+			       u64 *out_qflags)
+{
+	__be64 qpage;
+	__be64 qsize;
+	__be64 qeoi_page;
+	__be32 escalate_irq;
+	__be64 qflags;
+	s64 rc;
+
+	rc = opal_xive_get_queue_info(vp_id, prio, &qpage, &qsize,
+				      &qeoi_page, &escalate_irq, &qflags);
+	if (rc) {
+		pr_err("OPAL failed to get queue info for VCPU %d/%d : %lld\n",
+		       vp_id, prio, rc);
+		return -EIO;
+	}
+
+	if (out_qpage)
+		*out_qpage = be64_to_cpu(qpage);
+	if (out_qsize)
+		*out_qsize = be32_to_cpu(qsize);
+	if (out_qeoi_page)
+		*out_qeoi_page = be64_to_cpu(qeoi_page);
+	if (out_escalate_irq)
+		*out_escalate_irq = be32_to_cpu(escalate_irq);
+	if (out_qflags)
+		*out_qflags = be64_to_cpu(qflags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xive_native_get_queue_info);
+
+int xive_native_get_queue_state(u32 vp_id, u32 prio, u32 *qtoggle, u32 *qindex)
+{
+	__be32 opal_qtoggle;
+	__be32 opal_qindex;
+	s64 rc;
+
+	rc = opal_xive_get_queue_state(vp_id, prio, &opal_qtoggle,
+				       &opal_qindex);
+	if (rc) {
+		pr_err("OPAL failed to get queue state for VCPU %d/%d : %lld\n",
+		       vp_id, prio, rc);
+		return -EIO;
+	}
+
+	if (qtoggle)
+		*qtoggle = be32_to_cpu(opal_qtoggle);
+	if (qindex)
+		*qindex = be32_to_cpu(opal_qindex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xive_native_get_queue_state);
+
+int xive_native_set_queue_state(u32 vp_id, u32 prio, u32 qtoggle, u32 qindex)
+{
+	s64 rc;
+
+	rc = opal_xive_set_queue_state(vp_id, prio, qtoggle, qindex);
+	if (rc) {
+		pr_err("OPAL failed to set queue state for VCPU %d/%d : %lld\n",
+		       vp_id, prio, rc);
+		return -EIO;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xive_native_set_queue_state);
+
+int xive_native_get_vp_state(u32 vp_id, u64 *out_state)
+{
+	__be64 state;
+	s64 rc;
+
+	rc = opal_xive_get_vp_state(vp_id, &state);
+	if (rc) {
+		pr_err("OPAL failed to get vp state for VCPU %d : %lld\n",
+		       vp_id, rc);
+		return -EIO;
+	}
+
+	if (out_state)
+		*out_state = be64_to_cpu(state);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xive_native_get_vp_state);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index f4875fe3f8ff..3179953d6b56 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -309,6 +309,9 @@ OPAL_CALL(opal_xive_get_vp_info,		OPAL_XIVE_GET_VP_INFO);
 OPAL_CALL(opal_xive_set_vp_info,		OPAL_XIVE_SET_VP_INFO);
 OPAL_CALL(opal_xive_sync,			OPAL_XIVE_SYNC);
 OPAL_CALL(opal_xive_dump,			OPAL_XIVE_DUMP);
+OPAL_CALL(opal_xive_get_queue_state,		OPAL_XIVE_GET_QUEUE_STATE);
+OPAL_CALL(opal_xive_set_queue_state,		OPAL_XIVE_SET_QUEUE_STATE);
+OPAL_CALL(opal_xive_get_vp_state,		OPAL_XIVE_GET_VP_STATE);
 OPAL_CALL(opal_signal_system_reset,		OPAL_SIGNAL_SYSTEM_RESET);
 OPAL_CALL(opal_npu_init_context,		OPAL_NPU_INIT_CONTEXT);
 OPAL_CALL(opal_npu_destroy_context,		OPAL_NPU_DESTROY_CONTEXT);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-09  4:27   ` David Gibson
  2019-01-22  4:56   ` Paul Mackerras
  2019-01-07 18:43 ` [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device Cédric Le Goater
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

We will have different KVM devices for interrupts, one for the
XICS-over-XIVE mode and one for the XIVE native exploitation
mode. Let's add some checks to make sure we are not mixing the
interfaces in KVM.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index f78d002f0fe0..8a4fa45f07f8 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
 
+	if (!kvmppc_xics_enabled(vcpu))
+		return -EPERM;
+
 	if (!xc)
 		return 0;
 
@@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
 	u8 cppr, mfrr;
 	u32 xisr;
 
+	if (!kvmppc_xics_enabled(vcpu))
+		return -EPERM;
+
 	if (!xc || !xive)
 		return -ENOENT;
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (2 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-11  4:09   ` David Gibson
  2019-01-07 18:43 ` [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode Cédric Le Goater
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

The KVM device for the XIVE native exploitation mode will reuse the
structures of the XICS-over-XIVE glue implementation. Some code will
also be shared : source block creation and destruction, target
selection and escalation attachment.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.h | 11 +++++
 arch/powerpc/kvm/book3s_xive.c | 89 +++++++++++++++++++---------------
 2 files changed, 62 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index a08ae6fd4c51..10c4aa5cd010 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -248,5 +248,16 @@ extern int (*__xive_vm_h_ipi)(struct kvm_vcpu *vcpu, unsigned long server,
 extern int (*__xive_vm_h_cppr)(struct kvm_vcpu *vcpu, unsigned long cppr);
 extern int (*__xive_vm_h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr);
 
+/*
+ * Common Xive routines for XICS-over-XIVE and XIVE native
+ */
+struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
+	struct kvmppc_xive *xive, int irq);
+void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
+int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
+void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
+int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio);
+int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu);
+
 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 8a4fa45f07f8..bb5d32f7e4e6 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -166,7 +166,7 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
+int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
 	struct xive_q *q = &xc->queues[prio];
@@ -291,7 +291,7 @@ static int xive_check_provisioning(struct kvm *kvm, u8 prio)
 			continue;
 		rc = xive_provision_queue(vcpu, prio);
 		if (rc == 0 && !xive->single_escalation)
-			xive_attach_escalation(vcpu, prio);
+			kvmppc_xive_attach_escalation(vcpu, prio);
 		if (rc)
 			return rc;
 	}
@@ -342,7 +342,7 @@ static int xive_try_pick_queue(struct kvm_vcpu *vcpu, u8 prio)
 	return atomic_add_unless(&q->count, 1, max) ? 0 : -EBUSY;
 }
 
-static int xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
+int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
 {
 	struct kvm_vcpu *vcpu;
 	int i, rc;
@@ -535,7 +535,7 @@ static int xive_target_interrupt(struct kvm *kvm,
 	 * priority. The count for that new target will have
 	 * already been incremented.
 	 */
-	rc = xive_select_target(kvm, &server, prio);
+	rc = kvmppc_xive_select_target(kvm, &server, prio);
 
 	/*
 	 * We failed to find a target ? Not much we can do
@@ -1055,7 +1055,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 }
 EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped);
 
-static void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
+void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
 	struct kvm *kvm = vcpu->kvm;
@@ -1225,7 +1225,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 		if (xive->qmap & (1 << i)) {
 			r = xive_provision_queue(vcpu, i);
 			if (r == 0 && !xive->single_escalation)
-				xive_attach_escalation(vcpu, i);
+				kvmppc_xive_attach_escalation(vcpu, i);
 			if (r)
 				goto bail;
 		} else {
@@ -1240,7 +1240,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 	}
 
 	/* If not done above, attach priority 0 escalation */
-	r = xive_attach_escalation(vcpu, 0);
+	r = kvmppc_xive_attach_escalation(vcpu, 0);
 	if (r)
 		goto bail;
 
@@ -1491,8 +1491,8 @@ static int xive_get_source(struct kvmppc_xive *xive, long irq, u64 addr)
 	return 0;
 }
 
-static struct kvmppc_xive_src_block *xive_create_src_block(struct kvmppc_xive *xive,
-							   int irq)
+struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
+	struct kvmppc_xive *xive, int irq)
 {
 	struct kvm *kvm = xive->kvm;
 	struct kvmppc_xive_src_block *sb;
@@ -1571,7 +1571,7 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr)
 	sb = kvmppc_xive_find_source(xive, irq, &idx);
 	if (!sb) {
 		pr_devel("No source, creating source block...\n");
-		sb = xive_create_src_block(xive, irq);
+		sb = kvmppc_xive_create_src_block(xive, irq);
 		if (!sb) {
 			pr_devel("Failed to create block...\n");
 			return -ENOMEM;
@@ -1795,7 +1795,7 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
 	xive_cleanup_irq_data(xd);
 }
 
-static void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
+void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
 {
 	int i;
 
@@ -1824,6 +1824,8 @@ static void kvmppc_xive_free(struct kvm_device *dev)
 
 	debugfs_remove(xive->dentry);
 
+	pr_devel("Destroying xive for partition\n");
+
 	if (kvm)
 		kvm->arch.xive = NULL;
 
@@ -1889,6 +1891,43 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 	return 0;
 }
 
+int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	unsigned int i;
+
+	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
+		struct xive_q *q = &xc->queues[i];
+		u32 i0, i1, idx;
+
+		if (!q->qpage && !xc->esc_virq[i])
+			continue;
+
+		seq_printf(m, " [q%d]: ", i);
+
+		if (q->qpage) {
+			idx = q->idx;
+			i0 = be32_to_cpup(q->qpage + idx);
+			idx = (idx + 1) & q->msk;
+			i1 = be32_to_cpup(q->qpage + idx);
+			seq_printf(m, "T=%d %08x %08x...\n", q->toggle,
+				   i0, i1);
+		}
+		if (xc->esc_virq[i]) {
+			struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
+			struct xive_irq_data *xd =
+				irq_data_get_irq_handler_data(d);
+			u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);
+
+			seq_printf(m, "E:%c%c I(%d:%llx:%llx)",
+				   (pq & XIVE_ESB_VAL_P) ? 'P' : 'p',
+				   (pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q',
+				   xc->esc_virq[i], pq, xd->eoi_page);
+			seq_puts(m, "\n");
+		}
+	}
+	return 0;
+}
 
 static int xive_debug_show(struct seq_file *m, void *private)
 {
@@ -1914,7 +1953,6 @@ static int xive_debug_show(struct seq_file *m, void *private)
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
-		unsigned int i;
 
 		if (!xc)
 			continue;
@@ -1924,33 +1962,8 @@ static int xive_debug_show(struct seq_file *m, void *private)
 			   xc->server_num, xc->cppr, xc->hw_cppr,
 			   xc->mfrr, xc->pending,
 			   xc->stat_rm_h_xirr, xc->stat_vm_h_xirr);
-		for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
-			struct xive_q *q = &xc->queues[i];
-			u32 i0, i1, idx;
 
-			if (!q->qpage && !xc->esc_virq[i])
-				continue;
-
-			seq_printf(m, " [q%d]: ", i);
-
-			if (q->qpage) {
-				idx = q->idx;
-				i0 = be32_to_cpup(q->qpage + idx);
-				idx = (idx + 1) & q->msk;
-				i1 = be32_to_cpup(q->qpage + idx);
-				seq_printf(m, "T=%d %08x %08x... \n", q->toggle, i0, i1);
-			}
-			if (xc->esc_virq[i]) {
-				struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
-				struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
-				u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);
-				seq_printf(m, "E:%c%c I(%d:%llx:%llx)",
-					   (pq & XIVE_ESB_VAL_P) ? 'P' : 'p',
-					   (pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q',
-					   xc->esc_virq[i], pq, xd->eoi_page);
-				seq_printf(m, "\n");
-			}
-		}
+		kvmppc_xive_debug_show_queues(m, vcpu);
 
 		t_rm_h_xirr += xc->stat_rm_h_xirr;
 		t_rm_h_ipoll += xc->stat_rm_h_ipoll;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (3 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-22  5:05   ` Paul Mackerras
  2019-01-07 18:43 ` [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device Cédric Le Goater
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

This is the basic framework for the new KVM device supporting the XIVE
native exploitation mode. The user interface exposes a new capability
and a new KVM device to be used by QEMU.

Internally, the interface to the new KVM device is protected with a
new interrupt mode: KVMPPC_IRQ_XIVE.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/kvm_host.h   |   2 +
 arch/powerpc/include/asm/kvm_ppc.h    |  21 ++
 arch/powerpc/kvm/book3s_xive.h        |   3 +
 include/uapi/linux/kvm.h              |   3 +
 arch/powerpc/kvm/book3s.c             |   7 +-
 arch/powerpc/kvm/book3s_xive_native.c | 332 ++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c            |  30 +++
 arch/powerpc/kvm/Makefile             |   2 +-
 8 files changed, 398 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_xive_native.c

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0f98f00da2ea..c522e8274ad9 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -220,6 +220,7 @@ extern struct kvm_device_ops kvm_xics_ops;
 struct kvmppc_xive;
 struct kvmppc_xive_vcpu;
 extern struct kvm_device_ops kvm_xive_ops;
+extern struct kvm_device_ops kvm_xive_native_ops;
 
 struct kvmppc_passthru_irqmap;
 
@@ -446,6 +447,7 @@ struct kvmppc_passthru_irqmap {
 #define KVMPPC_IRQ_DEFAULT	0
 #define KVMPPC_IRQ_MPIC		1
 #define KVMPPC_IRQ_XICS		2 /* Includes a XIVE option */
+#define KVMPPC_IRQ_XIVE		3 /* XIVE native exploitation mode */
 
 #define MMIO_HPTE_CACHE_SIZE	4
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index eb0d79f0ca45..1bb313f238fe 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -591,6 +591,18 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
 			       int level, bool line_status);
 extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
+
+static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.irq_type == KVMPPC_IRQ_XIVE;
+}
+
+extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
+				    struct kvm_vcpu *vcpu, u32 cpu);
+extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
+extern void kvmppc_xive_native_init_module(void);
+extern void kvmppc_xive_native_exit_module(void);
+
 #else
 static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
 				       u32 priority) { return -1; }
@@ -614,6 +626,15 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval) { retur
 static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
 				      int level, bool line_status) { return -ENODEV; }
 static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
+
+static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
+	{ return 0; }
+static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
+						  struct kvm_vcpu *vcpu, u32 cpu) { return -EBUSY; }
+static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvmppc_xive_native_init_module(void) { }
+static inline void kvmppc_xive_native_exit_module(void) { }
+
 #endif /* CONFIG_KVM_XIVE */
 
 /*
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 10c4aa5cd010..5f22415520b4 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -12,6 +12,9 @@
 #ifdef CONFIG_KVM_XICS
 #include "book3s_xics.h"
 
+#define KVMPPC_XIVE_FIRST_IRQ	0
+#define KVMPPC_XIVE_NR_IRQS	KVMPPC_XICS_NR_IRQS
+
 /*
  * State for one guest irq source.
  *
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6d4ea4b6c922..52bf74a1616e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_VM_IPA_SIZE 165
 #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
 #define KVM_CAP_HYPERV_CPUID 167
+#define KVM_CAP_PPC_IRQ_XIVE 168
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1211,6 +1212,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
 	KVM_DEV_TYPE_ARM_VGIC_ITS,
 #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
+	KVM_DEV_TYPE_XIVE,
+#define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
 	KVM_DEV_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index bd1a677dd9e4..de7eed191107 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1039,7 +1039,10 @@ static int kvmppc_book3s_init(void)
 #ifdef CONFIG_KVM_XIVE
 	if (xive_enabled()) {
 		kvmppc_xive_init_module();
+		kvmppc_xive_native_init_module();
 		kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
+		kvm_register_device_ops(&kvm_xive_native_ops,
+					KVM_DEV_TYPE_XIVE);
 	} else
 #endif
 		kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS);
@@ -1050,8 +1053,10 @@ static int kvmppc_book3s_init(void)
 static void kvmppc_book3s_exit(void)
 {
 #ifdef CONFIG_KVM_XICS
-	if (xive_enabled())
+	if (xive_enabled()) {
 		kvmppc_xive_exit_module();
+		kvmppc_xive_native_exit_module();
+	}
 #endif
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
 	kvmppc_book3s_exit_pr();
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
new file mode 100644
index 000000000000..115143e76c45
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2017-2019, IBM Corporation.
+ */
+
+#define pr_fmt(fmt) "xive-kvm: " fmt
+
+#include <linux/anon_inodes.h>
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+#include <linux/gfp.h>
+#include <linux/spinlock.h>
+#include <linux/delay.h>
+#include <linux/percpu.h>
+#include <linux/cpumask.h>
+#include <asm/uaccess.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/xics.h>
+#include <asm/xive.h>
+#include <asm/xive-regs.h>
+#include <asm/debug.h>
+#include <asm/debugfs.h>
+#include <asm/time.h>
+#include <asm/opal.h>
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+#include "book3s_xive.h"
+
+static void xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
+{
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	struct xive_q *q = &xc->queues[prio];
+
+	xive_native_disable_queue(xc->vp_id, q, prio);
+	if (q->qpage) {
+		put_page(virt_to_page(q->qpage));
+		q->qpage = NULL;
+	}
+}
+
+void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	int i;
+
+	if (!kvmppc_xive_enabled(vcpu))
+		return;
+
+	if (!xc)
+		return;
+
+	pr_devel("native_cleanup_vcpu(cpu=%d)\n", xc->server_num);
+
+	/* Ensure no interrupt is still routed to that VP */
+	xc->valid = false;
+	kvmppc_xive_disable_vcpu_interrupts(vcpu);
+
+	/* Disable the VP */
+	xive_native_disable_vp(xc->vp_id);
+
+	/* Free the queues & associated interrupts */
+	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
+		/* Free the escalation irq */
+		if (xc->esc_virq[i]) {
+			free_irq(xc->esc_virq[i], vcpu);
+			irq_dispose_mapping(xc->esc_virq[i]);
+			kfree(xc->esc_virq_names[i]);
+			xc->esc_virq[i] = 0;
+		}
+
+		/* Free the queue */
+		xive_native_cleanup_queue(vcpu, i);
+	}
+
+	/* Free the VP */
+	kfree(xc);
+
+	/* Cleanup the vcpu */
+	vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
+	vcpu->arch.xive_vcpu = NULL;
+}
+
+int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
+				    struct kvm_vcpu *vcpu, u32 cpu)
+{
+	struct kvmppc_xive *xive = dev->private;
+	struct kvmppc_xive_vcpu *xc;
+	int rc;
+
+	pr_devel("native_connect_vcpu(cpu=%d)\n", cpu);
+
+	if (dev->ops != &kvm_xive_native_ops) {
+		pr_devel("Wrong ops !\n");
+		return -EPERM;
+	}
+	if (xive->kvm != vcpu->kvm)
+		return -EPERM;
+	if (vcpu->arch.irq_type)
+		return -EBUSY;
+	if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
+		pr_devel("Duplicate !\n");
+		return -EEXIST;
+	}
+	if (cpu >= KVM_MAX_VCPUS) {
+		pr_devel("Out of bounds !\n");
+		return -EINVAL;
+	}
+	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
+	if (!xc)
+		return -ENOMEM;
+
+	mutex_lock(&vcpu->kvm->lock);
+	vcpu->arch.xive_vcpu = xc;
+	xc->xive = xive;
+	xc->vcpu = vcpu;
+	xc->server_num = cpu;
+	xc->vp_id = xive->vp_base + cpu;
+	xc->valid = true;
+
+	rc = xive_native_get_vp_info(xc->vp_id, &xc->vp_cam, &xc->vp_chip_id);
+	if (rc) {
+		pr_err("Failed to get VP info from OPAL: %d\n", rc);
+		goto bail;
+	}
+
+	/*
+	 * Enable the VP first as the single escalation mode will
+	 * affect escalation interrupts numbering
+	 */
+	rc = xive_native_enable_vp(xc->vp_id, xive->single_escalation);
+	if (rc) {
+		pr_err("Failed to enable VP in OPAL: %d\n", rc);
+		goto bail;
+	}
+
+	/* Configure VCPU fields for use by assembly push/pull */
+	vcpu->arch.xive_saved_state.w01 = cpu_to_be64(0xff000000);
+	vcpu->arch.xive_cam_word = cpu_to_be32(xc->vp_cam | TM_QW1W2_VO);
+
+	/* TODO: initialize queues ? */
+
+bail:
+	vcpu->arch.irq_type = KVMPPC_IRQ_XIVE;
+	mutex_unlock(&vcpu->kvm->lock);
+	if (rc)
+		kvmppc_xive_native_cleanup_vcpu(vcpu);
+
+	return rc;
+}
+
+static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
+				       struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
+				       struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
+				       struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+static void kvmppc_xive_native_free(struct kvm_device *dev)
+{
+	struct kvmppc_xive *xive = dev->private;
+	struct kvm *kvm = xive->kvm;
+	int i;
+
+	debugfs_remove(xive->dentry);
+
+	pr_devel("Destroying xive native for partition\n");
+
+	if (kvm)
+		kvm->arch.xive = NULL;
+
+	/* Mask and free interrupts */
+	for (i = 0; i <= xive->max_sbid; i++) {
+		if (xive->src_blocks[i])
+			kvmppc_xive_free_sources(xive->src_blocks[i]);
+		kfree(xive->src_blocks[i]);
+		xive->src_blocks[i] = NULL;
+	}
+
+	if (xive->vp_base != XIVE_INVALID_VP)
+		xive_native_free_vp_block(xive->vp_base);
+
+	kfree(xive);
+	kfree(dev);
+}
+
+static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
+{
+	struct kvmppc_xive *xive;
+	struct kvm *kvm = dev->kvm;
+	int ret = 0;
+
+	pr_devel("Creating xive native for partition\n");
+
+	if (kvm->arch.xive)
+		return -EEXIST;
+
+	xive = kzalloc(sizeof(*xive), GFP_KERNEL);
+	if (!xive)
+		return -ENOMEM;
+
+	dev->private = xive;
+	xive->dev = dev;
+	xive->kvm = kvm;
+	kvm->arch.xive = xive;
+
+	/* We use the default queue size set by the host */
+	xive->q_order = xive_native_default_eq_shift();
+	if (xive->q_order < PAGE_SHIFT)
+		xive->q_page_order = 0;
+	else
+		xive->q_page_order = xive->q_order - PAGE_SHIFT;
+
+	/* Allocate a bunch of VPs */
+	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
+	pr_devel("VP_Base=%x\n", xive->vp_base);
+
+	if (xive->vp_base == XIVE_INVALID_VP)
+		ret = -ENOMEM;
+
+	xive->single_escalation = xive_native_has_single_escalation();
+
+	if (ret)
+		kfree(xive);
+
+	return ret;
+}
+
+static int xive_native_debug_show(struct seq_file *m, void *private)
+{
+	struct kvmppc_xive *xive = m->private;
+	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
+	unsigned int i;
+
+	if (!kvm)
+		return 0;
+
+	seq_puts(m, "=========\nVCPU state\n=========\n");
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+
+		if (!xc)
+			continue;
+
+		seq_printf(m, "cpu server %#x NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x\n",
+			   xc->server_num,
+			   vcpu->arch.xive_saved_state.nsr,
+			   vcpu->arch.xive_saved_state.cppr,
+			   vcpu->arch.xive_saved_state.ipb,
+			   vcpu->arch.xive_saved_state.pipr,
+			   vcpu->arch.xive_saved_state.w01,
+			   (u32) vcpu->arch.xive_cam_word);
+
+		kvmppc_xive_debug_show_queues(m, vcpu);
+	}
+
+	return 0;
+}
+
+static int xive_native_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, xive_native_debug_show, inode->i_private);
+}
+
+static const struct file_operations xive_native_debug_fops = {
+	.open = xive_native_debug_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static void xive_native_debugfs_init(struct kvmppc_xive *xive)
+{
+	char *name;
+
+	name = kasprintf(GFP_KERNEL, "kvm-xive-%p", xive);
+	if (!name) {
+		pr_err("%s: no memory for name\n", __func__);
+		return;
+	}
+
+	xive->dentry = debugfs_create_file(name, 0444, powerpc_debugfs_root,
+					   xive, &xive_native_debug_fops);
+
+	pr_debug("%s: created %s\n", __func__, name);
+	kfree(name);
+}
+
+static void kvmppc_xive_native_init(struct kvm_device *dev)
+{
+	struct kvmppc_xive *xive = (struct kvmppc_xive *)dev->private;
+
+	/* Register some debug interfaces */
+	xive_native_debugfs_init(xive);
+}
+
+struct kvm_device_ops kvm_xive_native_ops = {
+	.name = "kvm-xive-native",
+	.create = kvmppc_xive_native_create,
+	.init = kvmppc_xive_native_init,
+	.destroy = kvmppc_xive_native_free,
+	.set_attr = kvmppc_xive_native_set_attr,
+	.get_attr = kvmppc_xive_native_get_attr,
+	.has_attr = kvmppc_xive_native_has_attr,
+};
+
+void kvmppc_xive_native_init_module(void)
+{
+	;
+}
+
+void kvmppc_xive_native_exit_module(void)
+{
+	;
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b90a7d154180..01d526e15e9d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -566,6 +566,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_ENABLE_HCALL:
 #ifdef CONFIG_KVM_XICS
 	case KVM_CAP_IRQ_XICS:
+#endif
+#ifdef CONFIG_KVM_XIVE
+	case KVM_CAP_PPC_IRQ_XIVE:
 #endif
 	case KVM_CAP_PPC_GET_CPU_CHAR:
 		r = 1;
@@ -753,6 +756,9 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 		else
 			kvmppc_xics_free_icp(vcpu);
 		break;
+	case KVMPPC_IRQ_XIVE:
+		kvmppc_xive_native_cleanup_vcpu(vcpu);
+		break;
 	}
 
 	kvmppc_core_vcpu_free(vcpu);
@@ -1941,6 +1947,30 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		break;
 	}
 #endif /* CONFIG_KVM_XICS */
+#ifdef CONFIG_KVM_XIVE
+	case KVM_CAP_PPC_IRQ_XIVE: {
+		struct fd f;
+		struct kvm_device *dev;
+
+		r = -EBADF;
+		f = fdget(cap->args[0]);
+		if (!f.file)
+			break;
+
+		r = -ENXIO;
+		if (!xive_enabled())
+			break;
+
+		r = -EPERM;
+		dev = kvm_device_from_filp(f.file);
+		if (dev)
+			r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
+							    cap->args[1]);
+
+		fdput(f);
+		break;
+	}
+#endif /* CONFIG_KVM_XIVE */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_FWNMI:
 		r = -EINVAL;
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 64f1135e7732..806cbe488410 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -99,7 +99,7 @@ endif
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
 	book3s_xics.o
 
-kvm-book3s_64-objs-$(CONFIG_KVM_XIVE) += book3s_xive.o
+kvm-book3s_64-objs-$(CONFIG_KVM_XIVE) += book3s_xive.o book3s_xive_native.o
 kvm-book3s_64-objs-$(CONFIG_SPAPR_TCE_IOMMU) += book3s_64_vio.o
 
 kvm-book3s_64-module-objs := \
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (4 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-22  5:09   ` Paul Mackerras
  2019-01-07 18:43 ` [PATCH 07/19] KVM: PPC: Book3S HV: add a GET_TIMA_FD control to " Cédric Le Goater
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

This will let the guest create a memory mapping to expose the ESB MMIO
regions used to control the interrupt sources, to trigger events, to
EOI or to turn off the sources.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
 arch/powerpc/kvm/book3s_xive_native.c | 97 +++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 8c876c166ef2..6bb61ba141c2 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
 #define  KVM_XICS_PRESENTED		(1ULL << 43)
 #define  KVM_XICS_QUEUED		(1ULL << 44)
 
+/* POWER9 XIVE Native Interrupt Controller */
+#define KVM_DEV_XIVE_GRP_CTRL		1
+#define   KVM_DEV_XIVE_GET_ESB_FD	1
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 115143e76c45..e20081f0c8d4 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -153,6 +153,85 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 	return rc;
 }
 
+static int xive_native_esb_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct kvmppc_xive *xive = vma->vm_file->private_data;
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	struct xive_irq_data *xd;
+	u32 hw_num;
+	u16 src;
+	u64 page;
+	unsigned long irq;
+
+	/*
+	 * Linux/KVM uses a two pages ESB setting, one for trigger and
+	 * one for EOI
+	 */
+	irq = vmf->pgoff / 2;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb) {
+		pr_err("%s: source %lx not found !\n", __func__, irq);
+		return VM_FAULT_SIGBUS;
+	}
+
+	state = &sb->irq_state[src];
+	kvmppc_xive_select_irq(state, &hw_num, &xd);
+
+	arch_spin_lock(&sb->lock);
+
+	/*
+	 * first/even page is for trigger
+	 * second/odd page is for EOI and management.
+	 */
+	page = vmf->pgoff % 2 ? xd->eoi_page : xd->trig_page;
+	arch_spin_unlock(&sb->lock);
+
+	if (!page) {
+		pr_err("%s: acessing invalid ESB page for source %lx !\n",
+		       __func__, irq);
+		return VM_FAULT_SIGBUS;
+	}
+
+	vmf_insert_pfn(vma, vmf->address, page >> PAGE_SHIFT);
+	return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct xive_native_esb_vmops = {
+	.fault = xive_native_esb_fault,
+};
+
+static int xive_native_esb_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	/* There are two ESB pages (trigger and EOI) per IRQ */
+	if (vma_pages(vma) + vma->vm_pgoff > KVMPPC_XIVE_NR_IRQS * 2)
+		return -EINVAL;
+
+	vma->vm_flags |= VM_IO | VM_PFNMAP;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_ops = &xive_native_esb_vmops;
+	return 0;
+}
+
+static const struct file_operations xive_native_esb_fops = {
+	.mmap = xive_native_esb_mmap,
+};
+
+static int kvmppc_xive_native_get_esb_fd(struct kvmppc_xive *xive, u64 addr)
+{
+	u64 __user *ubufp = (u64 __user *) addr;
+	int ret;
+
+	ret = anon_inode_getfd("[xive-esb]", &xive_native_esb_fops, xive,
+				O_RDWR | O_CLOEXEC);
+	if (ret < 0)
+		return ret;
+
+	return put_user(ret, ubufp);
+}
+
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
@@ -162,12 +241,30 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
+	struct kvmppc_xive *xive = dev->private;
+
+	switch (attr->group) {
+	case KVM_DEV_XIVE_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_XIVE_GET_ESB_FD:
+			return kvmppc_xive_native_get_esb_fd(xive, attr->addr);
+		}
+		break;
+	}
 	return -ENXIO;
 }
 
 static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
+	switch (attr->group) {
+	case KVM_DEV_XIVE_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_XIVE_GET_ESB_FD:
+			return 0;
+		}
+		break;
+	}
 	return -ENXIO;
 }
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 07/19] KVM: PPC: Book3S HV: add a GET_TIMA_FD control to XIVE native device
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (5 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device Cédric Le Goater
@ 2019-01-07 18:43 ` " Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the " Cédric Le Goater
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

This will let the guest create a memory mapping to expose the XIVE
MMIO region (TIMA) used for interrupt management at the CPU level.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/xive.h       |  1 +
 arch/powerpc/include/uapi/asm/kvm.h   |  1 +
 arch/powerpc/kvm/book3s_xive_native.c | 57 +++++++++++++++++++++++++++
 arch/powerpc/sysdev/xive/native.c     | 11 ++++++
 4 files changed, 70 insertions(+)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index d6be3e4d9fa4..7a7aa22d8258 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -23,6 +23,7 @@
  * same offset regardless of where the code is executing
  */
 extern void __iomem *xive_tima;
+extern unsigned long xive_tima_os;
 
 /*
  * Offset in the TM area of our current execution level (provided by
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 6bb61ba141c2..89c140cb9e79 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -678,5 +678,6 @@ struct kvm_ppc_cpu_char {
 /* POWER9 XIVE Native Interrupt Controller */
 #define KVM_DEV_XIVE_GRP_CTRL		1
 #define   KVM_DEV_XIVE_GET_ESB_FD	1
+#define   KVM_DEV_XIVE_GET_TIMA_FD	2
 
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index e20081f0c8d4..ee9d12bf2dae 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -232,6 +232,60 @@ static int kvmppc_xive_native_get_esb_fd(struct kvmppc_xive *xive, u64 addr)
 	return put_user(ret, ubufp);
 }
 
+static int xive_native_tima_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+
+	switch (vmf->pgoff) {
+	case 0: /* HW - forbid access */
+	case 1: /* HV - forbid access */
+		return VM_FAULT_SIGBUS;
+	case 2: /* OS */
+		vmf_insert_pfn(vma, vmf->address, xive_tima_os >> PAGE_SHIFT);
+		return VM_FAULT_NOPAGE;
+	case 3: /* USER - TODO */
+	default:
+		return VM_FAULT_SIGBUS;
+	}
+}
+
+static const struct vm_operations_struct xive_native_tima_vmops = {
+	.fault = xive_native_tima_fault,
+};
+
+static int xive_native_tima_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	/*
+	 * The TIMA is four pages wide but only the last two pages (OS
+	 * and User view) are accessible to the guest. The page fault
+	 * handler will handle the permissions.
+	 */
+	if (vma_pages(vma) + vma->vm_pgoff > 4)
+		return -EINVAL;
+
+	vma->vm_flags |= VM_IO | VM_PFNMAP;
+	vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
+	vma->vm_ops = &xive_native_tima_vmops;
+	return 0;
+}
+
+static const struct file_operations xive_native_tima_fops = {
+	.mmap = xive_native_tima_mmap,
+};
+
+static int kvmppc_xive_native_get_tima_fd(struct kvmppc_xive *xive, u64 addr)
+{
+	u64 __user *ubufp = (u64 __user *) addr;
+	int ret;
+
+	ret = anon_inode_getfd("[xive-tima]", &xive_native_tima_fops, xive,
+			       O_RDWR | O_CLOEXEC);
+	if (ret < 0)
+		return ret;
+
+	return put_user(ret, ubufp);
+}
+
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
@@ -248,6 +302,8 @@ static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
 		switch (attr->attr) {
 		case KVM_DEV_XIVE_GET_ESB_FD:
 			return kvmppc_xive_native_get_esb_fd(xive, attr->addr);
+		case KVM_DEV_XIVE_GET_TIMA_FD:
+			return kvmppc_xive_native_get_tima_fd(xive, attr->addr);
 		}
 		break;
 	}
@@ -261,6 +317,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 	case KVM_DEV_XIVE_GRP_CTRL:
 		switch (attr->attr) {
 		case KVM_DEV_XIVE_GET_ESB_FD:
+		case KVM_DEV_XIVE_GET_TIMA_FD:
 			return 0;
 		}
 		break;
diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c
index 0c037e933e55..7782201e5fe8 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -521,6 +521,9 @@ u32 xive_native_default_eq_shift(void)
 }
 EXPORT_SYMBOL_GPL(xive_native_default_eq_shift);
 
+unsigned long xive_tima_os;
+EXPORT_SYMBOL_GPL(xive_tima_os);
+
 bool __init xive_native_init(void)
 {
 	struct device_node *np;
@@ -573,6 +576,14 @@ bool __init xive_native_init(void)
 	for_each_possible_cpu(cpu)
 		kvmppc_set_xive_tima(cpu, r.start, tima);
 
+	/* Resource 2 is OS window */
+	if (of_address_to_resource(np, 2, &r)) {
+		pr_err("Failed to get thread mgmnt area resource\n");
+		return false;
+	}
+
+	xive_tima_os = r.start;
+
 	/* Grab size of provisionning pages */
 	xive_parse_provisioning(np);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the XIVE native device
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (6 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 07/19] KVM: PPC: Book3S HV: add a GET_TIMA_FD control to " Cédric Le Goater
@ 2019-01-07 18:43 ` " Cédric Le Goater
  2019-01-22  5:14   ` Paul Mackerras
  2019-01-07 18:43 ` [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE " Cédric Le Goater
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

The ESB MMIO region controls the interrupt sources of the guest. QEMU
will query an fd (GET_ESB_FD ioctl) and map this region at a specific
address for the guest to use. The guest will obtain this information
using the H_INT_GET_SOURCE_INFO hcall. To inform KVM of the address
setting used by QEMU, add a VC_BASE control to the KVM XIVE device

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h   |  1 +
 arch/powerpc/kvm/book3s_xive.h        |  3 +++
 arch/powerpc/kvm/book3s_xive_native.c | 39 +++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 89c140cb9e79..8b78b12aa118 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -679,5 +679,6 @@ struct kvm_ppc_cpu_char {
 #define KVM_DEV_XIVE_GRP_CTRL		1
 #define   KVM_DEV_XIVE_GET_ESB_FD	1
 #define   KVM_DEV_XIVE_GET_TIMA_FD	2
+#define   KVM_DEV_XIVE_VC_BASE		3
 
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 5f22415520b4..ae4a670eea63 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -125,6 +125,9 @@ struct kvmppc_xive {
 
 	/* Flags */
 	u8	single_escalation;
+
+	/* VC base address for ESBs */
+	u64     vc_base;
 };
 
 #define KVMPPC_XIVE_Q_COUNT	8
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index ee9d12bf2dae..29a62914de55 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -153,6 +153,25 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 	return rc;
 }
 
+static int kvmppc_xive_native_set_vc_base(struct kvmppc_xive *xive, u64 addr)
+{
+	u64 __user *ubufp = (u64 __user *) addr;
+
+	if (get_user(xive->vc_base, ubufp))
+		return -EFAULT;
+	return 0;
+}
+
+static int kvmppc_xive_native_get_vc_base(struct kvmppc_xive *xive, u64 addr)
+{
+	u64 __user *ubufp = (u64 __user *) addr;
+
+	if (put_user(xive->vc_base, ubufp))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int xive_native_esb_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -289,6 +308,16 @@ static int kvmppc_xive_native_get_tima_fd(struct kvmppc_xive *xive, u64 addr)
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
+	struct kvmppc_xive *xive = dev->private;
+
+	switch (attr->group) {
+	case KVM_DEV_XIVE_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_XIVE_VC_BASE:
+			return kvmppc_xive_native_set_vc_base(xive, attr->addr);
+		}
+		break;
+	}
 	return -ENXIO;
 }
 
@@ -304,6 +333,8 @@ static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
 			return kvmppc_xive_native_get_esb_fd(xive, attr->addr);
 		case KVM_DEV_XIVE_GET_TIMA_FD:
 			return kvmppc_xive_native_get_tima_fd(xive, attr->addr);
+		case KVM_DEV_XIVE_VC_BASE:
+			return kvmppc_xive_native_get_vc_base(xive, attr->addr);
 		}
 		break;
 	}
@@ -318,6 +349,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 		switch (attr->attr) {
 		case KVM_DEV_XIVE_GET_ESB_FD:
 		case KVM_DEV_XIVE_GET_TIMA_FD:
+		case KVM_DEV_XIVE_VC_BASE:
 			return 0;
 		}
 		break;
@@ -353,6 +385,11 @@ static void kvmppc_xive_native_free(struct kvm_device *dev)
 	kfree(dev);
 }
 
+/*
+ * ESB MMIO address of chip 0
+ */
+#define XIVE_VC_BASE   0x0006010000000000ull
+
 static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 {
 	struct kvmppc_xive *xive;
@@ -387,6 +424,8 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	if (xive->vp_base == XIVE_INVALID_VP)
 		ret = -ENOMEM;
 
+	xive->vc_base = XIVE_VC_BASE;
+
 	xive->single_escalation = xive_native_has_single_escalation();
 
 	if (ret)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE control to the XIVE native device
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (7 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the " Cédric Le Goater
@ 2019-01-07 18:43 ` " Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 10/19] KVM: PPC: Book3S HV: add a EISN attribute to kvmppc_xive_irq_state Cédric Le Goater
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

Interrupt sources are simply created at the OPAL level and then
MASKED. KVM only needs to know about their type: LSI or MSI.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h           |  5 +
 arch/powerpc/kvm/book3s_xive_native.c         | 98 +++++++++++++++++++
 .../powerpc/kvm/book3s_xive_native_template.c | 27 +++++
 3 files changed, 130 insertions(+)
 create mode 100644 arch/powerpc/kvm/book3s_xive_native_template.c

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 8b78b12aa118..6fc9660c5aec 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -680,5 +680,10 @@ struct kvm_ppc_cpu_char {
 #define   KVM_DEV_XIVE_GET_ESB_FD	1
 #define   KVM_DEV_XIVE_GET_TIMA_FD	2
 #define   KVM_DEV_XIVE_VC_BASE		3
+#define KVM_DEV_XIVE_GRP_SOURCES	2	/* 64-bit source attributes */
+
+/* Layout of 64-bit XIVE source attribute values */
+#define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
+#define KVM_XIVE_LEVEL_ASSERTED		(1ULL << 1)
 
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 29a62914de55..2518640d4a58 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -31,6 +31,24 @@
 
 #include "book3s_xive.h"
 
+/*
+ * We still instantiate them here because we use some of the
+ * generated utility functions as well in this file.
+ */
+#define XIVE_RUNTIME_CHECKS
+#define X_PFX xive_vm_
+#define X_STATIC static
+#define X_STAT_PFX stat_vm_
+#define __x_tima		xive_tima
+#define __x_eoi_page(xd)	((void __iomem *)((xd)->eoi_mmio))
+#define __x_trig_page(xd)	((void __iomem *)((xd)->trig_mmio))
+#define __x_writeb	__raw_writeb
+#define __x_readw	__raw_readw
+#define __x_readq	__raw_readq
+#define __x_writeq	__raw_writeq
+
+#include "book3s_xive_native_template.c"
+
 static void xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
@@ -305,6 +323,78 @@ static int kvmppc_xive_native_get_tima_fd(struct kvmppc_xive *xive, u64 addr)
 	return put_user(ret, ubufp);
 }
 
+static int kvmppc_xive_native_set_source(struct kvmppc_xive *xive, long irq,
+					 u64 addr)
+{
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	u64 __user *ubufp = (u64 __user *) addr;
+	u64 val;
+	u16 idx;
+
+	pr_devel("%s irq=0x%lx\n", __func__, irq);
+
+	if (irq < KVMPPC_XIVE_FIRST_IRQ || irq >= KVMPPC_XIVE_NR_IRQS)
+		return -ENOENT;
+
+	sb = kvmppc_xive_find_source(xive, irq, &idx);
+	if (!sb) {
+		pr_debug("No source, creating source block...\n");
+		sb = kvmppc_xive_create_src_block(xive, irq);
+		if (!sb) {
+			pr_err("Failed to create block...\n");
+			return -ENOMEM;
+		}
+	}
+	state = &sb->irq_state[idx];
+
+	if (get_user(val, ubufp)) {
+		pr_err("fault getting user info !\n");
+		return -EFAULT;
+	}
+
+	/*
+	 * If the source doesn't already have an IPI, allocate
+	 * one and get the corresponding data
+	 */
+	if (!state->ipi_number) {
+		state->ipi_number = xive_native_alloc_irq();
+		if (state->ipi_number == 0) {
+			pr_err("Failed to allocate IRQ !\n");
+			return -ENOMEM;
+		}
+		xive_native_populate_irq_data(state->ipi_number,
+					      &state->ipi_data);
+		pr_debug("%s allocated hw_irq=0x%x for irq=0x%lx\n", __func__,
+			 state->ipi_number, irq);
+	}
+
+	arch_spin_lock(&sb->lock);
+
+	/* Restore LSI state */
+	if (val & KVM_XIVE_LEVEL_SENSITIVE) {
+		state->lsi = true;
+		if (val & KVM_XIVE_LEVEL_ASSERTED)
+			state->asserted = true;
+		pr_devel("  LSI ! Asserted=%d\n", state->asserted);
+	}
+
+	/* Mask IRQ to start with */
+	state->act_server = 0;
+	state->act_priority = MASKED;
+	xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
+	xive_native_configure_irq(state->ipi_number, 0, MASKED, 0);
+
+	/* Increment the number of valid sources and mark this one valid */
+	if (!state->valid)
+		xive->src_count++;
+	state->valid = true;
+
+	arch_spin_unlock(&sb->lock);
+
+	return 0;
+}
+
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
@@ -317,6 +407,9 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 			return kvmppc_xive_native_set_vc_base(xive, attr->addr);
 		}
 		break;
+	case KVM_DEV_XIVE_GRP_SOURCES:
+		return kvmppc_xive_native_set_source(xive, attr->attr,
+						     attr->addr);
 	}
 	return -ENXIO;
 }
@@ -353,6 +446,11 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 			return 0;
 		}
 		break;
+	case KVM_DEV_XIVE_GRP_SOURCES:
+		if (attr->attr >= KVMPPC_XIVE_FIRST_IRQ &&
+		    attr->attr < KVMPPC_XIVE_NR_IRQS)
+			return 0;
+		break;
 	}
 	return -ENXIO;
 }
diff --git a/arch/powerpc/kvm/book3s_xive_native_template.c b/arch/powerpc/kvm/book3s_xive_native_template.c
new file mode 100644
index 000000000000..e7260da4a596
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_xive_native_template.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2017-2019, IBM Corporation.
+ */
+
+/* File to be included by other .c files */
+
+#define XGLUE(a, b) a##b
+#define GLUE(a, b) XGLUE(a, b)
+
+/*
+ * TODO: introduce a common template file with the XIVE native layer
+ * and the XICS-on-XIVE glue for the utility functions
+ */
+static u8 GLUE(X_PFX, esb_load)(struct xive_irq_data *xd, u32 offset)
+{
+	u64 val;
+
+	if (xd->flags & XIVE_IRQ_FLAG_SHIFT_BUG)
+		offset |= offset << 4;
+
+	val = __x_readq(__x_eoi_page(xd) + offset);
+#ifdef __LITTLE_ENDIAN__
+	val >>= 64-8;
+#endif
+	return (u8)val;
+}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 10/19] KVM: PPC: Book3S HV: add a EISN attribute to kvmppc_xive_irq_state
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (8 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE " Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls Cédric Le Goater
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

The Effective IRQ Source Number is the interrupt number pushed in the
event queue that the guest OS will use to dispatch events internally.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.h | 3 +++
 arch/powerpc/kvm/book3s_xive.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index ae4a670eea63..67e07b41061d 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -57,6 +57,9 @@ struct kvmppc_xive_irq_state {
 	bool saved_p;
 	bool saved_q;
 	u8 saved_scan_prio;
+
+	/* Xive native */
+	u32 eisn;			/* Guest Effective IRQ number */
 };
 
 /* Select the "right" interrupt (IPI vs. passthrough) */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index bb5d32f7e4e6..e9f05d9c9ad5 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1515,6 +1515,7 @@ struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
 
 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 		sb->irq_state[i].number = (bid << KVMPPC_XICS_ICS_SHIFT) | i;
+		sb->irq_state[i].eisn = 0;
 		sb->irq_state[i].guest_priority = MASKED;
 		sb->irq_state[i].saved_priority = MASKED;
 		sb->irq_state[i].act_priority = MASKED;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (9 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 10/19] KVM: PPC: Book3S HV: add a EISN attribute to kvmppc_xive_irq_state Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-22  5:23   ` Paul Mackerras
  2019-01-07 18:43 ` [PATCH 12/19] KVM: PPC: Book3S HV: record guest queue page address Cédric Le Goater
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

The XIVE native exploitation mode specs define a set of Hypervisor
calls to configure the sources and the event queues :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/kvm_ppc.h            |  43 ++
 arch/powerpc/kvm/book3s_xive.h                |  54 +++
 arch/powerpc/kvm/book3s_hv.c                  |  29 ++
 arch/powerpc/kvm/book3s_hv_builtin.c          | 196 +++++++++
 arch/powerpc/kvm/book3s_hv_rm_xive_native.c   |  47 +++
 arch/powerpc/kvm/book3s_xive_native.c         | 326 ++++++++++++++-
 .../powerpc/kvm/book3s_xive_native_template.c | 371 ++++++++++++++++++
 arch/powerpc/kvm/Makefile                     |   2 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |  52 +++
 9 files changed, 1118 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive_native.c

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 1bb313f238fe..4cc897039485 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -602,6 +602,7 @@ extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
 extern void kvmppc_xive_native_init_module(void);
 extern void kvmppc_xive_native_exit_module(void);
+extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
 
 #else
 static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
@@ -634,6 +635,8 @@ static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { }
 static inline void kvmppc_xive_native_init_module(void) { }
 static inline void kvmppc_xive_native_exit_module(void) { }
+static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
+	{ return 0; }
 
 #endif /* CONFIG_KVM_XIVE */
 
@@ -682,6 +685,46 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
 int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
 void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
 
+int kvmppc_rm_h_int_get_source_info(struct kvm_vcpu *vcpu,
+				    unsigned long flag,
+				    unsigned long lisn);
+int kvmppc_rm_h_int_set_source_config(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long lisn,
+				      unsigned long target,
+				      unsigned long priority,
+				      unsigned long eisn);
+int kvmppc_rm_h_int_get_source_config(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long lisn);
+int kvmppc_rm_h_int_get_queue_info(struct kvm_vcpu *vcpu,
+				   unsigned long flag,
+				   unsigned long target,
+				   unsigned long priority);
+int kvmppc_rm_h_int_set_queue_config(struct kvm_vcpu *vcpu,
+				     unsigned long flag,
+				     unsigned long target,
+				     unsigned long priority,
+				     unsigned long qpage,
+				     unsigned long qsize);
+int kvmppc_rm_h_int_get_queue_config(struct kvm_vcpu *vcpu,
+				     unsigned long flag,
+				     unsigned long target,
+				     unsigned long priority);
+int kvmppc_rm_h_int_set_os_reporting_line(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long reportingline);
+int kvmppc_rm_h_int_get_os_reporting_line(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long target,
+					  unsigned long reportingline);
+int kvmppc_rm_h_int_esb(struct kvm_vcpu *vcpu, unsigned long flag,
+			unsigned long lisn, unsigned long offset,
+			unsigned long data);
+int kvmppc_rm_h_int_sync(struct kvm_vcpu *vcpu, unsigned long flag,
+			 unsigned long lisn);
+int kvmppc_rm_h_int_reset(struct kvm_vcpu *vcpu, unsigned long flag);
+
 /*
  * Host-side operations we want to set up while running in real
  * mode in the guest operating on the xics.
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 67e07b41061d..31e598e62589 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -268,5 +268,59 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
 int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio);
 int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu);
 
+int xive_rm_h_int_get_source_info(struct kvm_vcpu *vcpu,
+				    unsigned long flag,
+				    unsigned long lisn);
+int xive_rm_h_int_get_source_config(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long lisn);
+int xive_rm_h_int_get_queue_info(struct kvm_vcpu *vcpu,
+				   unsigned long flag,
+				   unsigned long target,
+				   unsigned long priority);
+int xive_rm_h_int_get_queue_config(struct kvm_vcpu *vcpu,
+				     unsigned long flag,
+				     unsigned long target,
+				     unsigned long priority);
+int xive_rm_h_int_set_os_reporting_line(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long reportingline);
+int xive_rm_h_int_get_os_reporting_line(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long target,
+					  unsigned long reportingline);
+int xive_rm_h_int_esb(struct kvm_vcpu *vcpu, unsigned long flag,
+			unsigned long lisn, unsigned long offset,
+			unsigned long data);
+int xive_rm_h_int_sync(struct kvm_vcpu *vcpu, unsigned long flag,
+			 unsigned long lisn);
+
+extern int (*__xive_vm_h_int_get_source_info)(struct kvm_vcpu *vcpu,
+				    unsigned long flag,
+				    unsigned long lisn);
+extern int (*__xive_vm_h_int_get_source_config)(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long lisn);
+extern int (*__xive_vm_h_int_get_queue_info)(struct kvm_vcpu *vcpu,
+				   unsigned long flag,
+				   unsigned long target,
+				   unsigned long priority);
+extern int (*__xive_vm_h_int_get_queue_config)(struct kvm_vcpu *vcpu,
+				     unsigned long flag,
+				     unsigned long target,
+				     unsigned long priority);
+extern int (*__xive_vm_h_int_set_os_reporting_line)(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long reportingline);
+extern int (*__xive_vm_h_int_get_os_reporting_line)(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long target,
+					  unsigned long reportingline);
+extern int (*__xive_vm_h_int_esb)(struct kvm_vcpu *vcpu, unsigned long flag,
+			unsigned long lisn, unsigned long offset,
+			unsigned long data);
+extern int (*__xive_vm_h_int_sync)(struct kvm_vcpu *vcpu, unsigned long flag,
+			 unsigned long lisn);
+
 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5a066fc299e1..1fb17d529a88 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -930,6 +930,22 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 			break;
 		}
 		return RESUME_HOST;
+	case H_INT_GET_SOURCE_INFO:
+	case H_INT_SET_SOURCE_CONFIG:
+	case H_INT_GET_SOURCE_CONFIG:
+	case H_INT_GET_QUEUE_INFO:
+	case H_INT_SET_QUEUE_CONFIG:
+	case H_INT_GET_QUEUE_CONFIG:
+	case H_INT_SET_OS_REPORTING_LINE:
+	case H_INT_GET_OS_REPORTING_LINE:
+	case H_INT_ESB:
+	case H_INT_SYNC:
+	case H_INT_RESET:
+		if (kvmppc_xive_enabled(vcpu)) {
+			ret = kvmppc_xive_native_hcall(vcpu, req);
+			break;
+		}
+		return RESUME_HOST;
 	case H_SET_DABR:
 		ret = kvmppc_h_set_dabr(vcpu, kvmppc_get_gpr(vcpu, 4));
 		break;
@@ -5153,6 +5169,19 @@ static unsigned int default_hcall_list[] = {
 	H_IPOLL,
 	H_XIRR,
 	H_XIRR_X,
+#endif
+#ifdef CONFIG_KVM_XIVE
+	H_INT_GET_SOURCE_INFO,
+	H_INT_SET_SOURCE_CONFIG,
+	H_INT_GET_SOURCE_CONFIG,
+	H_INT_GET_QUEUE_INFO,
+	H_INT_SET_QUEUE_CONFIG,
+	H_INT_GET_QUEUE_CONFIG,
+	H_INT_SET_OS_REPORTING_LINE,
+	H_INT_GET_OS_REPORTING_LINE,
+	H_INT_ESB,
+	H_INT_SYNC,
+	H_INT_RESET,
 #endif
 	0
 };
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index a71e2fc00a4e..db690f914d78 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -51,6 +51,42 @@ EXPORT_SYMBOL_GPL(__xive_vm_h_ipi);
 EXPORT_SYMBOL_GPL(__xive_vm_h_cppr);
 EXPORT_SYMBOL_GPL(__xive_vm_h_eoi);
 
+int (*__xive_vm_h_int_get_source_info)(struct kvm_vcpu *vcpu,
+				       unsigned long flag,
+				       unsigned long lisn);
+int (*__xive_vm_h_int_get_source_config)(struct kvm_vcpu *vcpu,
+					 unsigned long flag,
+					 unsigned long lisn);
+int (*__xive_vm_h_int_get_queue_info)(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long target,
+				      unsigned long priority);
+int (*__xive_vm_h_int_get_queue_config)(struct kvm_vcpu *vcpu,
+					unsigned long flag,
+					unsigned long target,
+					unsigned long priority);
+int (*__xive_vm_h_int_set_os_reporting_line)(struct kvm_vcpu *vcpu,
+					     unsigned long flag,
+					     unsigned long line);
+int (*__xive_vm_h_int_get_os_reporting_line)(struct kvm_vcpu *vcpu,
+					     unsigned long flag,
+					     unsigned long target,
+					     unsigned long line);
+int (*__xive_vm_h_int_esb)(struct kvm_vcpu *vcpu, unsigned long flag,
+			   unsigned long lisn, unsigned long offset,
+			   unsigned long data);
+int (*__xive_vm_h_int_sync)(struct kvm_vcpu *vcpu, unsigned long flag,
+			    unsigned long lisn);
+
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_get_source_info);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_get_source_config);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_get_queue_info);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_get_queue_config);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_set_os_reporting_line);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_get_os_reporting_line);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_esb);
+EXPORT_SYMBOL_GPL(__xive_vm_h_int_sync);
+
 /*
  * Hash page table alignment on newer cpus(CPU_FTR_ARCH_206)
  * should be power of 2.
@@ -660,6 +696,166 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 }
 #endif /* CONFIG_KVM_XICS */
 
+#ifdef CONFIG_KVM_XIVE
+int kvmppc_rm_h_int_get_source_info(struct kvm_vcpu *vcpu,
+				    unsigned long flag,
+				    unsigned long lisn)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_get_source_info(vcpu, flag, lisn);
+	if (unlikely(!__xive_vm_h_int_get_source_info))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_get_source_info(vcpu, flag, lisn);
+}
+
+int kvmppc_rm_h_int_set_source_config(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long lisn,
+				      unsigned long target,
+				      unsigned long priority,
+				      unsigned long eisn)
+{
+	return H_TOO_HARD;
+}
+
+int kvmppc_rm_h_int_get_source_config(struct kvm_vcpu *vcpu,
+				      unsigned long flag,
+				      unsigned long lisn)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_get_source_config(vcpu, flag, lisn);
+	if (unlikely(!__xive_vm_h_int_get_source_config))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_get_source_config(vcpu, flag, lisn);
+}
+
+int kvmppc_rm_h_int_get_queue_info(struct kvm_vcpu *vcpu,
+				   unsigned long flag,
+				   unsigned long target,
+				   unsigned long priority)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_get_queue_info(vcpu, flag, target,
+						    priority);
+	if (unlikely(!__xive_vm_h_int_get_queue_info))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_get_queue_info(vcpu, flag, target, priority);
+}
+
+int kvmppc_rm_h_int_set_queue_config(struct kvm_vcpu *vcpu,
+				     unsigned long flag,
+				     unsigned long target,
+				     unsigned long priority,
+				     unsigned long qpage,
+				     unsigned long qsize)
+{
+	return H_TOO_HARD;
+}
+
+int kvmppc_rm_h_int_get_queue_config(struct kvm_vcpu *vcpu,
+				     unsigned long flag,
+				     unsigned long target,
+				     unsigned long priority)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_get_queue_config(vcpu, flag, target,
+						      priority);
+	if (unlikely(!__xive_vm_h_int_get_queue_config))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_get_queue_config(vcpu, flag, target, priority);
+}
+
+int kvmppc_rm_h_int_set_os_reporting_line(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long line)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_set_os_reporting_line(vcpu, flag, line);
+	if (unlikely(!__xive_vm_h_int_set_os_reporting_line))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_set_os_reporting_line(vcpu, flag, line);
+}
+
+int kvmppc_rm_h_int_get_os_reporting_line(struct kvm_vcpu *vcpu,
+					  unsigned long flag,
+					  unsigned long target,
+					  unsigned long line)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_get_os_reporting_line(vcpu,
+							   flag, target, line);
+	if (unlikely(!__xive_vm_h_int_get_os_reporting_line))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_get_os_reporting_line(vcpu, flag, target, line);
+}
+
+int kvmppc_rm_h_int_esb(struct kvm_vcpu *vcpu, unsigned long flag,
+			 unsigned long lisn, unsigned long offset,
+			 unsigned long data)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_esb(vcpu, flag, lisn, offset, data);
+	if (unlikely(!__xive_vm_h_int_esb))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_esb(vcpu, flag, lisn, offset, data);
+}
+
+int kvmppc_rm_h_int_sync(struct kvm_vcpu *vcpu, unsigned long flag,
+			 unsigned long lisn)
+{
+	if (!kvmppc_xive_enabled(vcpu))
+		return H_TOO_HARD;
+	if (!xive_enabled())
+		return H_TOO_HARD;
+
+	if (is_rm())
+		return xive_rm_h_int_sync(vcpu, flag, lisn);
+	if (unlikely(!__xive_vm_h_int_sync))
+		return H_NOT_AVAILABLE;
+	return __xive_vm_h_int_sync(vcpu, flag, lisn);
+}
+
+int kvmppc_rm_h_int_reset(struct kvm_vcpu *vcpu, unsigned long flag)
+{
+	return H_TOO_HARD;
+}
+#endif /* CONFIG_KVM_XIVE */
+
 void kvmppc_bad_interrupt(struct pt_regs *regs)
 {
 	/*
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xive_native.c b/arch/powerpc/kvm/book3s_hv_rm_xive_native.c
new file mode 100644
index 000000000000..0e72a6ae0f07
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_rm_xive_native.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+#include <linux/kernel_stat.h>
+
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/xics.h>
+#include <asm/debug.h>
+#include <asm/synch.h>
+#include <asm/cputhreads.h>
+#include <asm/pgtable.h>
+#include <asm/ppc-opcode.h>
+#include <asm/pnv-pci.h>
+#include <asm/opal.h>
+#include <asm/smp.h>
+#include <asm/asm-prototypes.h>
+#include <asm/xive.h>
+#include <asm/xive-regs.h>
+
+#include "book3s_xive.h"
+
+/* XXX */
+#include <asm/udbg.h>
+//#define DBG(fmt...) udbg_printf(fmt)
+#define DBG(fmt...) do { } while (0)
+
+static inline void __iomem *get_tima_phys(void)
+{
+	return local_paca->kvm_hstate.xive_tima_phys;
+}
+
+#undef XIVE_RUNTIME_CHECKS
+#define X_PFX xive_rm_
+#define X_STATIC
+#define X_STAT_PFX stat_rm_
+#define __x_tima		get_tima_phys()
+#define __x_eoi_page(xd)	((void __iomem *)((xd)->eoi_page))
+#define __x_trig_page(xd)	((void __iomem *)((xd)->trig_page))
+#define __x_writeb	__raw_rm_writeb
+#define __x_readw	__raw_rm_readw
+#define __x_readq	__raw_rm_readq
+#define __x_writeq	__raw_rm_writeq
+
+#include "book3s_xive_native_template.c"
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 2518640d4a58..35d806740c3a 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -171,6 +171,56 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 	return rc;
 }
 
+static int kvmppc_xive_native_set_source_config(struct kvmppc_xive *xive,
+					struct kvmppc_xive_src_block *sb,
+					struct kvmppc_xive_irq_state *state,
+					u32 server,
+					u8 priority,
+					u32 eisn)
+{
+	struct kvm *kvm = xive->kvm;
+	u32 hw_num;
+	int rc = 0;
+
+	/*
+	 * TODO: Do we need to safely mask and unmask a source ? can
+	 * we just let the guest handle the possible races ?
+	 */
+	arch_spin_lock(&sb->lock);
+
+	if (state->act_server == server && state->act_priority == priority &&
+	    state->eisn == eisn)
+		goto unlock;
+
+	pr_devel("new_act_prio=%d new_act_server=%d act_server=%d act_prio=%d\n",
+		 priority, server, state->act_server, state->act_priority);
+
+	kvmppc_xive_select_irq(state, &hw_num, NULL);
+
+	if (priority != MASKED) {
+		rc = kvmppc_xive_select_target(kvm, &server, priority);
+		if (rc)
+			goto unlock;
+
+		state->act_priority = priority;
+		state->act_server = server;
+		state->eisn = eisn;
+
+		rc = xive_native_configure_irq(hw_num, xive->vp_base + server,
+					       priority, eisn);
+	} else {
+		state->act_priority = MASKED;
+		state->act_server = 0;
+		state->eisn = 0;
+
+		rc = xive_native_configure_irq(hw_num, 0, MASKED, 0);
+	}
+
+unlock:
+	arch_spin_unlock(&sb->lock);
+	return rc;
+}
+
 static int kvmppc_xive_native_set_vc_base(struct kvmppc_xive *xive, u64 addr)
 {
 	u64 __user *ubufp = (u64 __user *) addr;
@@ -323,6 +373,20 @@ static int kvmppc_xive_native_get_tima_fd(struct kvmppc_xive *xive, u64 addr)
 	return put_user(ret, ubufp);
 }
 
+static int xive_native_validate_queue_size(u32 qsize)
+{
+	switch (qsize) {
+	case 12:
+	case 16:
+	case 21:
+	case 24:
+	case 0:
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static int kvmppc_xive_native_set_source(struct kvmppc_xive *xive, long irq,
 					 u64 addr)
 {
@@ -532,6 +596,248 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	return ret;
 }
 
+static int kvmppc_h_int_set_source_config(struct kvm_vcpu *vcpu,
+					  unsigned long flags,
+					  unsigned long irq,
+					  unsigned long server,
+					  unsigned long priority,
+					  unsigned long eisn)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	int rc = 0;
+	u16 idx;
+
+	pr_devel("H_INT_SET_SOURCE_CONFIG flags=%08lx irq=%lx server=%ld priority=%ld eisn=%lx\n",
+		 flags, irq, server, priority, eisn);
+
+	if (flags & ~(XIVE_SPAPR_SRC_SET_EISN | XIVE_SPAPR_SRC_MASK))
+		return H_PARAMETER;
+
+	sb = kvmppc_xive_find_source(xive, irq, &idx);
+	if (!sb)
+		return H_P2;
+	state = &sb->irq_state[idx];
+
+	if (!(flags & XIVE_SPAPR_SRC_SET_EISN))
+		eisn = state->eisn;
+
+	if (priority != xive_prio_from_guest(priority)) {
+		pr_err("invalid priority for queue %ld for VCPU %ld\n",
+		       priority, server);
+		return H_P3;
+	}
+
+	/* TODO: handle XIVE_SPAPR_SRC_MASK */
+
+	rc = kvmppc_xive_native_set_source_config(xive, sb, state, server,
+						  priority, eisn);
+	if (!rc)
+		return H_SUCCESS;
+	else if (rc == -EINVAL)
+		return H_P4; /* no server found */
+	else
+		return H_HARDWARE;
+}
+
+static int kvmppc_h_int_set_queue_config(struct kvm_vcpu *vcpu,
+					 unsigned long flags,
+					 unsigned long server,
+					 unsigned long priority,
+					 unsigned long qpage,
+					 unsigned long qsize)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	struct xive_q *q;
+	int rc;
+	__be32 *qaddr = 0;
+	struct page *page;
+
+	pr_devel("H_INT_SET_QUEUE_CONFIG flags=%08lx server=%ld priority=%ld qpage=%08lx qsize=%ld\n",
+		 flags, server, priority, qpage, qsize);
+
+	if (flags & ~XIVE_SPAPR_EQ_ALWAYS_NOTIFY)
+		return H_PARAMETER;
+
+	if (xc->server_num != server) {
+		vcpu = kvmppc_xive_find_server(kvm, server);
+		if (!vcpu) {
+			pr_debug("Can't find server %ld\n", server);
+			return H_P2;
+		}
+		xc = vcpu->arch.xive_vcpu;
+	}
+
+	if (priority != xive_prio_from_guest(priority) || priority == MASKED) {
+		pr_err("invalid priority for queue %ld for VCPU %d\n",
+		       priority, xc->server_num);
+		return H_P3;
+	}
+	q = &xc->queues[priority];
+
+	rc = xive_native_validate_queue_size(qsize);
+	if (rc) {
+		pr_err("invalid queue size %ld\n", qsize);
+		return H_P5;
+	}
+
+	/* reset queue and disable queueing */
+	if (!qsize) {
+		rc = xive_native_configure_queue(xc->vp_id, q, priority,
+						 NULL, 0, true);
+		if (rc) {
+			pr_err("Failed to reset queue %ld for VCPU %d: %d\n",
+			       priority, xc->server_num, rc);
+			return H_HARDWARE;
+		}
+
+		if (q->qpage) {
+			put_page(virt_to_page(q->qpage));
+			q->qpage = NULL;
+		}
+
+		return H_SUCCESS;
+	}
+
+	page = gfn_to_page(kvm, gpa_to_gfn(qpage));
+	if (is_error_page(page)) {
+		pr_warn("Couldn't get guest page for %lx!\n", qpage);
+		return H_P4;
+	}
+	qaddr = page_to_virt(page) + (qpage & ~PAGE_MASK);
+
+	rc = xive_native_configure_queue(xc->vp_id, q, priority,
+					 (__be32 *) qaddr, qsize, true);
+	if (rc) {
+		pr_err("Failed to configure queue %ld for VCPU %d: %d\n",
+		       priority, xc->server_num, rc);
+		put_page(page);
+		return H_HARDWARE;
+	}
+
+	rc = kvmppc_xive_attach_escalation(vcpu, priority);
+	if (rc) {
+		xive_native_cleanup_queue(vcpu, priority);
+		return H_HARDWARE;
+	}
+
+	return H_SUCCESS;
+}
+
+static void kvmppc_xive_reset_sources(struct kvmppc_xive_src_block *sb)
+{
+	int i;
+
+	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+		struct kvmppc_xive_irq_state *state = &sb->irq_state[i];
+
+		if (!state->valid)
+			continue;
+
+		if (state->act_priority == MASKED)
+			continue;
+
+		arch_spin_lock(&sb->lock);
+		state->eisn = 0;
+		state->act_server = 0;
+		state->act_priority = MASKED;
+		xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
+		xive_native_configure_irq(state->ipi_number, 0, MASKED, 0);
+		if (state->pt_number) {
+			xive_vm_esb_load(state->pt_data, XIVE_ESB_SET_PQ_01);
+			xive_native_configure_irq(state->pt_number,
+						  0, MASKED, 0);
+		}
+		arch_spin_unlock(&sb->lock);
+	}
+}
+
+static int kvmppc_h_int_reset(struct kvmppc_xive *xive, unsigned long flags)
+{
+	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
+	unsigned int i;
+
+	pr_devel("H_INT_RESET flags=%08lx\n", flags);
+
+	if (flags)
+		return H_PARAMETER;
+
+	mutex_lock(&kvm->lock);
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+		unsigned int prio;
+
+		if (!xc)
+			continue;
+
+		kvmppc_xive_disable_vcpu_interrupts(vcpu);
+
+		for (prio = 0; prio < KVMPPC_XIVE_Q_COUNT; prio++) {
+
+			if (xc->esc_virq[prio]) {
+				free_irq(xc->esc_virq[prio], vcpu);
+				irq_dispose_mapping(xc->esc_virq[prio]);
+				kfree(xc->esc_virq_names[prio]);
+				xc->esc_virq[prio] = 0;
+			}
+
+			xive_native_cleanup_queue(vcpu, prio);
+		}
+	}
+
+	for (i = 0; i <= xive->max_sbid; i++) {
+		if (xive->src_blocks[i])
+			kvmppc_xive_reset_sources(xive->src_blocks[i]);
+	}
+
+	mutex_unlock(&kvm->lock);
+
+	return H_SUCCESS;
+}
+
+int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 req)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	int rc;
+
+	if (!xive || !vcpu->arch.xive_vcpu)
+		return H_FUNCTION;
+
+	switch (req) {
+	case H_INT_SET_QUEUE_CONFIG:
+		rc = kvmppc_h_int_set_queue_config(vcpu,
+						   kvmppc_get_gpr(vcpu, 4),
+						   kvmppc_get_gpr(vcpu, 5),
+						   kvmppc_get_gpr(vcpu, 6),
+						   kvmppc_get_gpr(vcpu, 7),
+						   kvmppc_get_gpr(vcpu, 8));
+		break;
+
+	case H_INT_SET_SOURCE_CONFIG:
+		rc = kvmppc_h_int_set_source_config(vcpu,
+						    kvmppc_get_gpr(vcpu, 4),
+						    kvmppc_get_gpr(vcpu, 5),
+						    kvmppc_get_gpr(vcpu, 6),
+						    kvmppc_get_gpr(vcpu, 7),
+						    kvmppc_get_gpr(vcpu, 8));
+		break;
+
+	case H_INT_RESET:
+		rc = kvmppc_h_int_reset(xive, kvmppc_get_gpr(vcpu, 4));
+		break;
+
+	default:
+		rc =  H_NOT_AVAILABLE;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(kvmppc_xive_native_hcall);
+
 static int xive_native_debug_show(struct seq_file *m, void *private)
 {
 	struct kvmppc_xive *xive = m->private;
@@ -614,10 +920,26 @@ struct kvm_device_ops kvm_xive_native_ops = {
 
 void kvmppc_xive_native_init_module(void)
 {
-	;
+	__xive_vm_h_int_get_source_info = xive_vm_h_int_get_source_info;
+	__xive_vm_h_int_get_source_config = xive_vm_h_int_get_source_config;
+	__xive_vm_h_int_get_queue_info = xive_vm_h_int_get_queue_info;
+	__xive_vm_h_int_get_queue_config = xive_vm_h_int_get_queue_config;
+	__xive_vm_h_int_set_os_reporting_line =
+		xive_vm_h_int_set_os_reporting_line;
+	__xive_vm_h_int_get_os_reporting_line =
+		xive_vm_h_int_get_os_reporting_line;
+	__xive_vm_h_int_esb = xive_vm_h_int_esb;
+	__xive_vm_h_int_sync = xive_vm_h_int_sync;
 }
 
 void kvmppc_xive_native_exit_module(void)
 {
-	;
+	__xive_vm_h_int_get_source_info = NULL;
+	__xive_vm_h_int_get_source_config = NULL;
+	__xive_vm_h_int_get_queue_info = NULL;
+	__xive_vm_h_int_get_queue_config = NULL;
+	__xive_vm_h_int_set_os_reporting_line = NULL;
+	__xive_vm_h_int_get_os_reporting_line = NULL;
+	__xive_vm_h_int_esb = NULL;
+	__xive_vm_h_int_sync = NULL;
 }
diff --git a/arch/powerpc/kvm/book3s_xive_native_template.c b/arch/powerpc/kvm/book3s_xive_native_template.c
index e7260da4a596..ccde2786d203 100644
--- a/arch/powerpc/kvm/book3s_xive_native_template.c
+++ b/arch/powerpc/kvm/book3s_xive_native_template.c
@@ -8,6 +8,279 @@
 #define XGLUE(a, b) a##b
 #define GLUE(a, b) XGLUE(a, b)
 
+X_STATIC int GLUE(X_PFX, h_int_get_source_info)(struct kvm_vcpu *vcpu,
+						unsigned long flags,
+						unsigned long irq)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	struct xive_irq_data *xd;
+	u32 hw_num;
+	u16 src;
+	unsigned long esb_addr;
+
+	pr_devel("H_INT_GET_SOURCE_INFO flags=%08lx irq=%lx\n", flags, irq);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb) {
+		pr_debug("source %lx not found !\n", irq);
+		return H_P2;
+	}
+	state = &sb->irq_state[src];
+
+	arch_spin_lock(&sb->lock);
+	kvmppc_xive_select_irq(state, &hw_num, &xd);
+
+	vcpu->arch.regs.gpr[4] = 0;
+	if (xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
+		vcpu->arch.regs.gpr[4] |= XIVE_SPAPR_SRC_STORE_EOI;
+
+	/*
+	 * Force the use of the H_INT_ESB hcall in case of a Virtual
+	 * LSI interrupt. This is necessary under KVM to re-trigger
+	 * the interrupt if the level is still asserted
+	 */
+	if (state->lsi) {
+		vcpu->arch.regs.gpr[4] |= XIVE_SPAPR_SRC_LSI;
+		vcpu->arch.regs.gpr[4] |= XIVE_SPAPR_SRC_H_INT_ESB;
+	}
+
+	/*
+	 * Linux/KVM uses a two pages ESB setting, one for trigger and
+	 * one for EOI
+	 */
+	esb_addr = xive->vc_base + (irq << (PAGE_SHIFT + 1));
+
+	/* EOI/management page is the second/odd page */
+	if (xd->eoi_page &&
+	    !(vcpu->arch.regs.gpr[4] & XIVE_SPAPR_SRC_H_INT_ESB))
+		vcpu->arch.regs.gpr[5] = esb_addr + (1ull << PAGE_SHIFT);
+	else
+		vcpu->arch.regs.gpr[5] = -1;
+
+	/* Trigger page is always the first/even page */
+	if (xd->trig_page)
+		vcpu->arch.regs.gpr[6] = esb_addr;
+	else
+		vcpu->arch.regs.gpr[6] = -1;
+
+	vcpu->arch.regs.gpr[7] = PAGE_SHIFT;
+	arch_spin_unlock(&sb->lock);
+	return H_SUCCESS;
+}
+
+X_STATIC int GLUE(X_PFX, h_int_get_source_config)(struct kvm_vcpu *vcpu,
+						  unsigned long flags,
+						  unsigned long irq)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	u16 src;
+
+	pr_devel("H_INT_GET_SOURCE_CONFIG flags=%08lx irq=%lx\n", flags, irq);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb) {
+		pr_debug("source %lx not found !\n", irq);
+		return H_P2;
+	}
+	state = &sb->irq_state[src];
+
+	arch_spin_lock(&sb->lock);
+	vcpu->arch.regs.gpr[4] = state->act_server;
+	vcpu->arch.regs.gpr[5] = state->act_priority;
+	vcpu->arch.regs.gpr[6] = state->number;
+	arch_spin_unlock(&sb->lock);
+
+	return H_SUCCESS;
+}
+
+X_STATIC int GLUE(X_PFX, h_int_get_queue_info)(struct kvm_vcpu *vcpu,
+					       unsigned long flags,
+					       unsigned long server,
+					       unsigned long priority)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	struct xive_q *q;
+
+	pr_devel("H_INT_GET_QUEUE_INFO flags=%08lx server=%ld priority=%ld\n",
+		 flags, server, priority);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	if (xc->server_num != server) {
+		struct kvm_vcpu *vc;
+
+		vc = kvmppc_xive_find_server(vcpu->kvm, server);
+		if (!vc) {
+			pr_debug("server %ld not found\n", server);
+			return H_P2;
+		}
+		xc = vc->arch.xive_vcpu;
+	}
+
+	if (priority != xive_prio_from_guest(priority) || priority == MASKED) {
+		pr_debug("invalid priority for queue %ld for VCPU %ld\n",
+		       priority, server);
+		return H_P3;
+	}
+	q = &xc->queues[priority];
+
+	vcpu->arch.regs.gpr[4] = q->eoi_phys;
+	/* TODO: Power of 2 page size of the notification page */
+	vcpu->arch.regs.gpr[5] = 0;
+	return H_SUCCESS;
+}
+
+X_STATIC int GLUE(X_PFX, get_queue_state)(struct kvm_vcpu *vcpu,
+					  struct kvmppc_xive_vcpu *xc,
+					  unsigned long prio)
+{
+	int rc;
+	u32 qtoggle;
+	u32 qindex;
+
+	rc = xive_native_get_queue_state(xc->vp_id, prio, &qtoggle, &qindex);
+	if (rc)
+		return rc;
+
+	vcpu->arch.regs.gpr[4] |= ((unsigned long) qtoggle) << 62;
+	vcpu->arch.regs.gpr[7] = qindex;
+	return 0;
+}
+
+X_STATIC int GLUE(X_PFX, h_int_get_queue_config)(struct kvm_vcpu *vcpu,
+						 unsigned long flags,
+						 unsigned long server,
+						 unsigned long priority)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	struct xive_q *q;
+	u64 qpage;
+	u64 qsize;
+	u64 qeoi_page;
+	u32 escalate_irq;
+	u64 qflags;
+	int rc;
+
+	pr_devel("H_INT_GET_QUEUE_CONFIG flags=%08lx server=%ld priority=%ld\n",
+		 flags, server, priority);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags & ~XIVE_SPAPR_EQ_DEBUG)
+		return H_PARAMETER;
+
+	if (xc->server_num != server) {
+		struct kvm_vcpu *vc;
+
+		vc = kvmppc_xive_find_server(vcpu->kvm, server);
+		if (!vc) {
+			pr_debug("server %ld not found\n", server);
+			return H_P2;
+		}
+		xc = vc->arch.xive_vcpu;
+	}
+
+	if (priority != xive_prio_from_guest(priority) || priority == MASKED) {
+		pr_debug("invalid priority for queue %ld for VCPU %ld\n",
+		       priority, server);
+		return H_P3;
+	}
+	q = &xc->queues[priority];
+
+	rc = xive_native_get_queue_info(xc->vp_id, priority, &qpage, &qsize,
+					&qeoi_page, &escalate_irq, &qflags);
+	if (rc)
+		return H_HARDWARE;
+
+	vcpu->arch.regs.gpr[4] = 0;
+	if (qflags & OPAL_XIVE_EQ_ALWAYS_NOTIFY)
+		vcpu->arch.regs.gpr[4] |= XIVE_SPAPR_EQ_ALWAYS_NOTIFY;
+
+	vcpu->arch.regs.gpr[5] = qpage;
+	vcpu->arch.regs.gpr[6] = qsize;
+	if (flags & XIVE_SPAPR_EQ_DEBUG) {
+		rc = GLUE(X_PFX, get_queue_state)(vcpu, xc, priority);
+		if (rc)
+			return H_HARDWARE;
+	}
+	return H_SUCCESS;
+}
+
+/* TODO H_INT_SET_OS_REPORTING_LINE */
+X_STATIC int GLUE(X_PFX, h_int_set_os_reporting_line)(struct kvm_vcpu *vcpu,
+						      unsigned long flags,
+						      unsigned long line)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+
+	pr_devel("H_INT_SET_OS_REPORTING_LINE flags=%08lx line=%ld\n",
+		 flags, line);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	return H_FUNCTION;
+}
+
+/* TODO H_INT_GET_OS_REPORTING_LINE*/
+X_STATIC int GLUE(X_PFX, h_int_get_os_reporting_line)(struct kvm_vcpu *vcpu,
+						      unsigned long flags,
+						      unsigned long server,
+						      unsigned long line)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+
+	pr_devel("H_INT_GET_OS_REPORTING_LINE flags=%08lx server=%ld line=%ld\n",
+		 flags, server, line);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	if (xc->server_num != server) {
+		struct kvm_vcpu *vc;
+
+		vc = kvmppc_xive_find_server(vcpu->kvm, server);
+		if (!vc) {
+			pr_debug("server %ld not found\n", server);
+			return H_P2;
+		}
+		xc = vc->arch.xive_vcpu;
+	}
+
+	return H_FUNCTION;
+
+}
+
 /*
  * TODO: introduce a common template file with the XIVE native layer
  * and the XICS-on-XIVE glue for the utility functions
@@ -25,3 +298,101 @@ static u8 GLUE(X_PFX, esb_load)(struct xive_irq_data *xd, u32 offset)
 #endif
 	return (u8)val;
 }
+
+static u8 GLUE(X_PFX, esb_store)(struct xive_irq_data *xd, u32 offset, u64 data)
+{
+	u64 val;
+
+	if (xd->flags & XIVE_IRQ_FLAG_SHIFT_BUG)
+		offset |= offset << 4;
+
+	val = __x_readq(__x_eoi_page(xd) + offset);
+#ifdef __LITTLE_ENDIAN__
+	val >>= 64-8;
+#endif
+	return (u8)val;
+}
+
+X_STATIC int GLUE(X_PFX, h_int_esb)(struct kvm_vcpu *vcpu, unsigned long flags,
+				    unsigned long irq, unsigned long offset,
+				    unsigned long data)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	struct xive_irq_data *xd;
+	u32 hw_num;
+	u16 src;
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb) {
+		pr_debug("source %lx not found !\n", irq);
+		return H_P2;
+	}
+	state = &sb->irq_state[src];
+
+	if (offset > (1ull << PAGE_SHIFT))
+		return H_P3;
+
+	arch_spin_lock(&sb->lock);
+	kvmppc_xive_select_irq(state, &hw_num, &xd);
+
+	if (flags & XIVE_SPAPR_ESB_STORE) {
+		GLUE(X_PFX, esb_store)(xd, offset, data);
+		vcpu->arch.regs.gpr[4] = -1;
+	} else {
+		/* Virtual LSI EOI handling */
+		if (state->lsi && offset == XIVE_ESB_LOAD_EOI) {
+			GLUE(X_PFX, esb_load)(xd, XIVE_ESB_SET_PQ_00);
+			if (state->asserted && __x_trig_page(xd))
+				__x_writeq(0, __x_trig_page(xd));
+			vcpu->arch.regs.gpr[4] = 0;
+		} else {
+			vcpu->arch.regs.gpr[4] =
+				GLUE(X_PFX, esb_load)(xd, offset);
+		}
+	}
+	arch_spin_unlock(&sb->lock);
+
+	return H_SUCCESS;
+}
+
+X_STATIC int GLUE(X_PFX, h_int_sync)(struct kvm_vcpu *vcpu, unsigned long flags,
+				     unsigned long irq)
+{
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	struct xive_irq_data *xd;
+	u32 hw_num;
+	u16 src;
+
+	pr_devel("H_INT_SYNC flags=%08lx irq=%lx\n", flags, irq);
+
+	if (!xive)
+		return H_FUNCTION;
+
+	if (flags)
+		return H_PARAMETER;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb) {
+		pr_debug("source %lx not found !\n", irq);
+		return H_P2;
+	}
+	state = &sb->irq_state[src];
+
+	arch_spin_lock(&sb->lock);
+
+	kvmppc_xive_select_irq(state, &hw_num, &xd);
+	xive_native_sync_source(hw_num);
+
+	arch_spin_unlock(&sb->lock);
+	return H_SUCCESS;
+}
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 806cbe488410..1a5c65c59b13 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -81,6 +81,8 @@ kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
 
 kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
 	book3s_hv_rm_xics.o book3s_hv_rm_xive.o
+kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XIVE) += \
+	book3s_hv_rm_xive_native.o
 
 kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
 	book3s_hv_tm_builtin.o
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9b8d50a7cbaf..25b9489de249 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2462,6 +2462,58 @@ hcall_real_table:
 	.long	0		/* 0x2fc - H_XIRR_X*/
 #endif
 	.long	DOTSYM(kvmppc_h_random) - hcall_real_table
+	.long	0		/* 0x304 */
+	.long	0		/* 0x308 */
+	.long	0		/* 0x30c */
+	.long	0		/* 0x310 */
+	.long	0		/* 0x314 */
+	.long	0		/* 0x318 */
+	.long	0		/* 0x31c */
+	.long	0		/* 0x320 */
+	.long	0		/* 0x324 */
+	.long	0		/* 0x328 */
+	.long	0		/* 0x32c */
+	.long	0		/* 0x330 */
+	.long	0		/* 0x334 */
+	.long	0		/* 0x338 */
+	.long	0		/* 0x33c */
+	.long	0		/* 0x340 */
+	.long	0		/* 0x344 */
+	.long	0		/* 0x348 */
+	.long	0		/* 0x34c */
+	.long	0		/* 0x350 */
+	.long	0		/* 0x354 */
+	.long	0		/* 0x358 */
+	.long	0		/* 0x35c */
+	.long	0		/* 0x360 */
+	.long	0		/* 0x364 */
+	.long	0		/* 0x368 */
+	.long	0		/* 0x36c */
+	.long	0		/* 0x370 */
+	.long	0		/* 0x374 */
+	.long	0		/* 0x378 */
+	.long	0		/* 0x37c */
+	.long	0		/* 0x380 */
+	.long	0		/* 0x384 */
+	.long	0		/* 0x388 */
+	.long	0		/* 0x38c */
+	.long	0		/* 0x390 */
+	.long	0		/* 0x394 */
+	.long	0		/* 0x398 */
+	.long	0		/* 0x39c */
+	.long	0		/* 0x3a0 */
+	.long	0		/* 0x3a4 */
+	.long	DOTSYM(kvmppc_rm_h_int_get_source_info) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_set_source_config) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_get_source_config) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_get_queue_info) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_set_queue_config) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_get_queue_config) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_set_os_reporting_line) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_get_os_reporting_line) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_esb) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_sync) - hcall_real_table
+	.long	DOTSYM(kvmppc_rm_h_int_reset) - hcall_real_table
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 12/19] KVM: PPC: Book3S HV: record guest queue page address
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (10 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 13/19] KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration Cédric Le Goater
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

The guest physical address of the event queue will be part of the
state to transfer in the migration. Cache its value when the queue is
configured, it will save us an OPAL call.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/xive.h       | 2 ++
 arch/powerpc/kvm/book3s_xive_native.c | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index 7a7aa22d8258..e90c3c5d9533 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -74,6 +74,8 @@ struct xive_q {
 	u32			esc_irq;
 	atomic_t		count;
 	atomic_t		pending_count;
+	u64			guest_qpage;
+	u32			guest_qsize;
 };
 
 /* Global enable flags for the XIVE support */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 35d806740c3a..4ca75aade069 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -708,6 +708,10 @@ static int kvmppc_h_int_set_queue_config(struct kvm_vcpu *vcpu,
 	}
 	qaddr = page_to_virt(page) + (qpage & ~PAGE_MASK);
 
+	/* Backup queue page address and size for migration */
+	q->guest_qpage = qpage;
+	q->guest_qsize = qsize;
+
 	rc = xive_native_configure_queue(xc->vp_id, q, priority,
 					 (__be32 *) qaddr, qsize, true);
 	if (rc) {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 13/19] KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (11 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 12/19] KVM: PPC: Book3S HV: record guest queue page address Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty Cédric Le Goater
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

When migration of a VM is initiated, a first copy of the RAM is
transferred to the destination before the VM is stopped. At that time,
QEMU needs to perform a XIVE quiesce sequence to stop the flow of
event notifications and stabilize the EQs. The sources are masked and
the XIVE IC is synced with the KVM ioctl KVM_DEV_XIVE_GRP_SYNC.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h   |  1 +
 arch/powerpc/kvm/book3s_xive_native.c | 32 +++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 6fc9660c5aec..f3b859223b80 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -681,6 +681,7 @@ struct kvm_ppc_cpu_char {
 #define   KVM_DEV_XIVE_GET_TIMA_FD	2
 #define   KVM_DEV_XIVE_VC_BASE		3
 #define KVM_DEV_XIVE_GRP_SOURCES	2	/* 64-bit source attributes */
+#define KVM_DEV_XIVE_GRP_SYNC		3	/* 64-bit source attributes */
 
 /* Layout of 64-bit XIVE source attribute values */
 #define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 4ca75aade069..a8052867afc1 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -459,6 +459,35 @@ static int kvmppc_xive_native_set_source(struct kvmppc_xive *xive, long irq,
 	return 0;
 }
 
+static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 addr)
+{
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	struct xive_irq_data *xd;
+	u32 hw_num;
+	u16 src;
+
+	pr_devel("%s irq=0x%lx\n", __func__, irq);
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb)
+		return -ENOENT;
+
+	state = &sb->irq_state[src];
+
+	if (!state->valid)
+		return -ENOENT;
+
+	arch_spin_lock(&sb->lock);
+
+	kvmppc_xive_select_irq(state, &hw_num, &xd);
+	xive_native_sync_source(hw_num);
+	xive_native_sync_queue(hw_num);
+
+	arch_spin_unlock(&sb->lock);
+	return 0;
+}
+
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
@@ -474,6 +503,8 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 	case KVM_DEV_XIVE_GRP_SOURCES:
 		return kvmppc_xive_native_set_source(xive, attr->attr,
 						     attr->addr);
+	case KVM_DEV_XIVE_GRP_SYNC:
+		return kvmppc_xive_native_sync(xive, attr->attr, attr->addr);
 	}
 	return -ENXIO;
 }
@@ -511,6 +542,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 		}
 		break;
 	case KVM_DEV_XIVE_GRP_SOURCES:
+	case KVM_DEV_XIVE_GRP_SYNC:
 		if (attr->attr >= KVMPPC_XIVE_FIRST_IRQ &&
 		    attr->attr < KVMPPC_XIVE_NR_IRQS)
 			return 0;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (12 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 13/19] KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration Cédric Le Goater
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

When the VM is stopped in a migration sequence, the sources are masked
and the XIVE IC is synced to stabilize the EQs. When done, the KVM
ioctl KVM_DEV_XIVE_SAVE_EQ_PAGES is called to mark dirty the EQ pages.

The migration can then transfer the remaining dirty pages to the
destination and start collecting the state of the devices.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h   |  1 +
 arch/powerpc/kvm/book3s_xive_native.c | 40 +++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index f3b859223b80..1a8740629acf 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -680,6 +680,7 @@ struct kvm_ppc_cpu_char {
 #define   KVM_DEV_XIVE_GET_ESB_FD	1
 #define   KVM_DEV_XIVE_GET_TIMA_FD	2
 #define   KVM_DEV_XIVE_VC_BASE		3
+#define   KVM_DEV_XIVE_SAVE_EQ_PAGES	4
 #define KVM_DEV_XIVE_GRP_SOURCES	2	/* 64-bit source attributes */
 #define KVM_DEV_XIVE_GRP_SYNC		3	/* 64-bit source attributes */
 
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index a8052867afc1..f2de1bcf3b35 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -373,6 +373,43 @@ static int kvmppc_xive_native_get_tima_fd(struct kvmppc_xive *xive, u64 addr)
 	return put_user(ret, ubufp);
 }
 
+static int kvmppc_xive_native_vcpu_save_eq_pages(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	unsigned int prio;
+
+	if (!xc)
+		return -ENOENT;
+
+	for (prio = 0; prio < KVMPPC_XIVE_Q_COUNT; prio++) {
+		struct xive_q *q = &xc->queues[prio];
+
+		if (!q->qpage)
+			continue;
+
+		/* Mark EQ page dirty for migration */
+		mark_page_dirty(vcpu->kvm, gpa_to_gfn(q->guest_qpage));
+	}
+	return 0;
+}
+
+static int kvmppc_xive_native_save_eq_pages(struct kvmppc_xive *xive)
+{
+	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
+	unsigned int i;
+
+	pr_devel("%s\n", __func__);
+
+	mutex_lock(&kvm->lock);
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		kvmppc_xive_native_vcpu_save_eq_pages(vcpu);
+	}
+	mutex_unlock(&kvm->lock);
+
+	return 0;
+}
+
 static int xive_native_validate_queue_size(u32 qsize)
 {
 	switch (qsize) {
@@ -498,6 +535,8 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 		switch (attr->attr) {
 		case KVM_DEV_XIVE_VC_BASE:
 			return kvmppc_xive_native_set_vc_base(xive, attr->addr);
+		case KVM_DEV_XIVE_SAVE_EQ_PAGES:
+			return kvmppc_xive_native_save_eq_pages(xive);
 		}
 		break;
 	case KVM_DEV_XIVE_GRP_SOURCES:
@@ -538,6 +577,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 		case KVM_DEV_XIVE_GET_ESB_FD:
 		case KVM_DEV_XIVE_GET_TIMA_FD:
 		case KVM_DEV_XIVE_VC_BASE:
+		case KVM_DEV_XIVE_SAVE_EQ_PAGES:
 			return 0;
 		}
 		break;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (13 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-07 18:43 ` [PATCH 16/19] KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration Cédric Le Goater
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

Theses are use to capure the XIVE EAS table of the KVM device, the
configuration of the source targets.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h   | 11 ++++
 arch/powerpc/kvm/book3s_xive_native.c | 87 +++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 1a8740629acf..faf024f39858 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
 #define   KVM_DEV_XIVE_SAVE_EQ_PAGES	4
 #define KVM_DEV_XIVE_GRP_SOURCES	2	/* 64-bit source attributes */
 #define KVM_DEV_XIVE_GRP_SYNC		3	/* 64-bit source attributes */
+#define KVM_DEV_XIVE_GRP_EAS		4	/* 64-bit eas attributes */
 
 /* Layout of 64-bit XIVE source attribute values */
 #define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
 #define KVM_XIVE_LEVEL_ASSERTED		(1ULL << 1)
 
+/* Layout of 64-bit eas attribute values */
+#define KVM_XIVE_EAS_PRIORITY_SHIFT	0
+#define KVM_XIVE_EAS_PRIORITY_MASK	0x7
+#define KVM_XIVE_EAS_SERVER_SHIFT	3
+#define KVM_XIVE_EAS_SERVER_MASK	0xfffffff8ULL
+#define KVM_XIVE_EAS_MASK_SHIFT		32
+#define KVM_XIVE_EAS_MASK_MASK		0x100000000ULL
+#define KVM_XIVE_EAS_EISN_SHIFT		33
+#define KVM_XIVE_EAS_EISN_MASK		0xfffffffe00000000ULL
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index f2de1bcf3b35..0468b605baa7 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 addr)
 	return 0;
 }
 
+static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
+				      u64 addr)
+{
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	u64 __user *ubufp = (u64 __user *) addr;
+	u16 src;
+	u64 kvm_eas;
+	u32 server;
+	u8 priority;
+	u32 eisn;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb)
+		return -ENOENT;
+
+	state = &sb->irq_state[src];
+
+	if (!state->valid)
+		return -EINVAL;
+
+	if (get_user(kvm_eas, ubufp))
+		return -EFAULT;
+
+	pr_devel("%s irq=0x%lx eas=%016llx\n", __func__, irq, kvm_eas);
+
+	priority = (kvm_eas & KVM_XIVE_EAS_PRIORITY_MASK) >>
+		KVM_XIVE_EAS_PRIORITY_SHIFT;
+	server = (kvm_eas & KVM_XIVE_EAS_SERVER_MASK) >>
+		KVM_XIVE_EAS_SERVER_SHIFT;
+	eisn = (kvm_eas & KVM_XIVE_EAS_EISN_MASK) >> KVM_XIVE_EAS_EISN_SHIFT;
+
+	if (priority != xive_prio_from_guest(priority)) {
+		pr_err("invalid priority for queue %d for VCPU %d\n",
+		       priority, server);
+		return -EINVAL;
+	}
+
+	return kvmppc_xive_native_set_source_config(xive, sb, state, server,
+						    priority, eisn);
+}
+
+static int kvmppc_xive_native_get_eas(struct kvmppc_xive *xive, long irq,
+				      u64 addr)
+{
+	struct kvmppc_xive_src_block *sb;
+	struct kvmppc_xive_irq_state *state;
+	u64 __user *ubufp = (u64 __user *) addr;
+	u16 src;
+	u64 kvm_eas;
+
+	sb = kvmppc_xive_find_source(xive, irq, &src);
+	if (!sb)
+		return -ENOENT;
+
+	state = &sb->irq_state[src];
+
+	if (!state->valid)
+		return -EINVAL;
+
+	arch_spin_lock(&sb->lock);
+
+	if (state->act_priority == MASKED)
+		kvm_eas = KVM_XIVE_EAS_MASK_MASK;
+	else {
+		kvm_eas = (state->act_priority << KVM_XIVE_EAS_PRIORITY_SHIFT) &
+			KVM_XIVE_EAS_PRIORITY_MASK;
+		kvm_eas |= (state->act_server << KVM_XIVE_EAS_SERVER_SHIFT) &
+			KVM_XIVE_EAS_SERVER_MASK;
+		kvm_eas |= ((u64) state->eisn << KVM_XIVE_EAS_EISN_SHIFT) &
+			KVM_XIVE_EAS_EISN_MASK;
+	}
+	arch_spin_unlock(&sb->lock);
+
+	pr_devel("%s irq=0x%lx eas=%016llx\n", __func__, irq, kvm_eas);
+
+	if (put_user(kvm_eas, ubufp))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
@@ -544,6 +626,8 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 						     attr->addr);
 	case KVM_DEV_XIVE_GRP_SYNC:
 		return kvmppc_xive_native_sync(xive, attr->attr, attr->addr);
+	case KVM_DEV_XIVE_GRP_EAS:
+		return kvmppc_xive_native_set_eas(xive, attr->attr, attr->addr);
 	}
 	return -ENXIO;
 }
@@ -564,6 +648,8 @@ static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
 			return kvmppc_xive_native_get_vc_base(xive, attr->addr);
 		}
 		break;
+	case KVM_DEV_XIVE_GRP_EAS:
+		return kvmppc_xive_native_get_eas(xive, attr->attr, attr->addr);
 	}
 	return -ENXIO;
 }
@@ -583,6 +669,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 		break;
 	case KVM_DEV_XIVE_GRP_SOURCES:
 	case KVM_DEV_XIVE_GRP_SYNC:
+	case KVM_DEV_XIVE_GRP_EAS:
 		if (attr->attr >= KVMPPC_XIVE_FIRST_IRQ &&
 		    attr->attr < KVMPPC_XIVE_NR_IRQS)
 			return 0;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 16/19] KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (14 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration Cédric Le Goater
@ 2019-01-07 18:43 ` Cédric Le Goater
  2019-01-07 19:10 ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state Cédric Le Goater
  2019-01-22  4:46 ` [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Paul Mackerras
  17 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 18:43 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

These are used to capture the XIVE END table of the KVM device. It
relies on an OPAL call to retrieve from the XIVE IC the EQ toggle bit
and index which are updated by the HW when events are enqueued in the
guest RAM.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/uapi/asm/kvm.h   |  21 ++++
 arch/powerpc/kvm/book3s_xive_native.c | 166 ++++++++++++++++++++++++++
 2 files changed, 187 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index faf024f39858..95302558ce10 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -684,6 +684,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_DEV_XIVE_GRP_SOURCES	2	/* 64-bit source attributes */
 #define KVM_DEV_XIVE_GRP_SYNC		3	/* 64-bit source attributes */
 #define KVM_DEV_XIVE_GRP_EAS		4	/* 64-bit eas attributes */
+#define KVM_DEV_XIVE_GRP_EQ		5	/* 64-bit eq attributes */
 
 /* Layout of 64-bit XIVE source attribute values */
 #define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
@@ -699,4 +700,24 @@ struct kvm_ppc_cpu_char {
 #define KVM_XIVE_EAS_EISN_SHIFT		33
 #define KVM_XIVE_EAS_EISN_MASK		0xfffffffe00000000ULL
 
+/* Layout of 64-bit eq attribute */
+#define KVM_XIVE_EQ_PRIORITY_SHIFT	0
+#define KVM_XIVE_EQ_PRIORITY_MASK	0x7
+#define KVM_XIVE_EQ_SERVER_SHIFT	3
+#define KVM_XIVE_EQ_SERVER_MASK		0xfffffff8ULL
+
+/* Layout of 64-bit eq attribute values */
+struct kvm_ppc_xive_eq {
+	__u32 flags;
+	__u32 qsize;
+	__u64 qpage;
+	__u32 qtoggle;
+	__u32 qindex;
+};
+
+#define KVM_XIVE_EQ_FLAG_ENABLED	0x00000001
+#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY	0x00000002
+#define KVM_XIVE_EQ_FLAG_ESCALATE	0x00000004
+
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 0468b605baa7..f4eb71eafc57 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -607,6 +607,164 @@ static int kvmppc_xive_native_get_eas(struct kvmppc_xive *xive, long irq,
 	return 0;
 }
 
+static int kvmppc_xive_native_set_queue(struct kvmppc_xive *xive, long eq_idx,
+				      u64 addr)
+{
+	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
+	struct kvmppc_xive_vcpu *xc;
+	void __user *ubufp = (u64 __user *) addr;
+	u32 server;
+	u8 priority;
+	struct kvm_ppc_xive_eq kvm_eq;
+	int rc;
+	__be32 *qaddr = 0;
+	struct page *page;
+	struct xive_q *q;
+
+	/*
+	 * Demangle priority/server tuple from the EQ index
+	 */
+	priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
+		KVM_XIVE_EQ_PRIORITY_SHIFT;
+	server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
+		KVM_XIVE_EQ_SERVER_SHIFT;
+
+	if (copy_from_user(&kvm_eq, ubufp, sizeof(kvm_eq)))
+		return -EFAULT;
+
+	vcpu = kvmppc_xive_find_server(kvm, server);
+	if (!vcpu) {
+		pr_err("Can't find server %d\n", server);
+		return -ENOENT;
+	}
+	xc = vcpu->arch.xive_vcpu;
+
+	if (priority != xive_prio_from_guest(priority)) {
+		pr_err("Trying to restore invalid queue %d for VCPU %d\n",
+		       priority, server);
+		return -EINVAL;
+	}
+	q = &xc->queues[priority];
+
+	pr_devel("%s VCPU %d priority %d fl:%x sz:%d addr:%llx g:%d idx:%d\n",
+		 __func__, server, priority, kvm_eq.flags,
+		 kvm_eq.qsize, kvm_eq.qpage, kvm_eq.qtoggle, kvm_eq.qindex);
+
+	rc = xive_native_validate_queue_size(kvm_eq.qsize);
+	if (rc || !kvm_eq.qsize) {
+		pr_err("invalid queue size %d\n", kvm_eq.qsize);
+		return rc;
+	}
+
+	page = gfn_to_page(kvm, gpa_to_gfn(kvm_eq.qpage));
+	if (is_error_page(page)) {
+		pr_warn("Couldn't get guest page for %llx!\n", kvm_eq.qpage);
+		return -ENOMEM;
+	}
+	qaddr = page_to_virt(page) + (kvm_eq.qpage & ~PAGE_MASK);
+
+	/* Backup queue page guest address for migration */
+	q->guest_qpage = kvm_eq.qpage;
+	q->guest_qsize = kvm_eq.qsize;
+
+	rc = xive_native_configure_queue(xc->vp_id, q, priority,
+					 (__be32 *) qaddr, kvm_eq.qsize, true);
+	if (rc) {
+		pr_err("Failed to configure queue %d for VCPU %d: %d\n",
+		       priority, xc->server_num, rc);
+		put_page(page);
+		return rc;
+	}
+
+	rc = xive_native_set_queue_state(xc->vp_id, priority, kvm_eq.qtoggle,
+					 kvm_eq.qindex);
+	if (rc)
+		goto error;
+
+	rc = kvmppc_xive_attach_escalation(vcpu, priority);
+error:
+	if (rc)
+		xive_native_cleanup_queue(vcpu, priority);
+	return rc;
+}
+
+static int kvmppc_xive_native_get_queue(struct kvmppc_xive *xive, long eq_idx,
+				      u64 addr)
+{
+	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
+	struct kvmppc_xive_vcpu *xc;
+	struct xive_q *q;
+	void __user *ubufp = (u64 __user *) addr;
+	u32 server;
+	u8 priority;
+	struct kvm_ppc_xive_eq kvm_eq;
+	u64 qpage;
+	u64 qsize;
+	u64 qeoi_page;
+	u32 escalate_irq;
+	u64 qflags;
+	int rc;
+
+	/*
+	 * Demangle priority/server tuple from the EQ index
+	 */
+	priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
+		KVM_XIVE_EQ_PRIORITY_SHIFT;
+	server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
+		KVM_XIVE_EQ_SERVER_SHIFT;
+
+	vcpu = kvmppc_xive_find_server(kvm, server);
+	if (!vcpu) {
+		pr_err("Can't find server %d\n", server);
+		return -ENOENT;
+	}
+	xc = vcpu->arch.xive_vcpu;
+
+	if (priority != xive_prio_from_guest(priority)) {
+		pr_err("invalid priority for queue %d for VCPU %d\n",
+		       priority, server);
+		return -EINVAL;
+	}
+	q = &xc->queues[priority];
+
+	memset(&kvm_eq, 0, sizeof(kvm_eq));
+
+	if (!q->qpage)
+		return 0;
+
+	rc = xive_native_get_queue_info(xc->vp_id, priority, &qpage, &qsize,
+					&qeoi_page, &escalate_irq, &qflags);
+	if (rc)
+		return rc;
+
+	kvm_eq.flags = 0;
+	if (qflags & OPAL_XIVE_EQ_ENABLED)
+		kvm_eq.flags |= KVM_XIVE_EQ_FLAG_ENABLED;
+	if (qflags & OPAL_XIVE_EQ_ALWAYS_NOTIFY)
+		kvm_eq.flags |= KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY;
+	if (qflags & OPAL_XIVE_EQ_ESCALATE)
+		kvm_eq.flags |= KVM_XIVE_EQ_FLAG_ESCALATE;
+
+	kvm_eq.qsize = q->guest_qsize;
+	kvm_eq.qpage = q->guest_qpage;
+
+	rc = xive_native_get_queue_state(xc->vp_id, priority, &kvm_eq.qtoggle,
+					 &kvm_eq.qindex);
+	if (rc)
+		return rc;
+
+	pr_devel("%s VCPU %d priority %d fl:%x sz:%d addr:%llx g:%d idx:%d\n",
+		 __func__, server, priority, kvm_eq.flags,
+		 kvm_eq.qsize, kvm_eq.qpage, kvm_eq.qtoggle, kvm_eq.qindex);
+
+	if (copy_to_user(ubufp, &kvm_eq, sizeof(kvm_eq)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 				       struct kvm_device_attr *attr)
 {
@@ -628,6 +786,9 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 		return kvmppc_xive_native_sync(xive, attr->attr, attr->addr);
 	case KVM_DEV_XIVE_GRP_EAS:
 		return kvmppc_xive_native_set_eas(xive, attr->attr, attr->addr);
+	case KVM_DEV_XIVE_GRP_EQ:
+		return kvmppc_xive_native_set_queue(xive, attr->attr,
+						    attr->addr);
 	}
 	return -ENXIO;
 }
@@ -650,6 +811,9 @@ static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
 		break;
 	case KVM_DEV_XIVE_GRP_EAS:
 		return kvmppc_xive_native_get_eas(xive, attr->attr, attr->addr);
+	case KVM_DEV_XIVE_GRP_EQ:
+		return kvmppc_xive_native_get_queue(xive, attr->attr,
+						    attr->addr);
 	}
 	return -ENXIO;
 }
@@ -674,6 +838,8 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 		    attr->attr < KVMPPC_XIVE_NR_IRQS)
 			return 0;
 		break;
+	case KVM_DEV_XIVE_GRP_EQ:
+		return 0;
 	}
 	return -ENXIO;
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (15 preceding siblings ...)
  2019-01-07 18:43 ` [PATCH 16/19] KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration Cédric Le Goater
@ 2019-01-07 19:10 ` Cédric Le Goater
  2019-01-07 19:10   ` [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support Cédric Le Goater
  2019-01-07 19:10   ` [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl Cédric Le Goater
  2019-01-22  4:46 ` [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Paul Mackerras
  17 siblings, 2 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 19:10 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

At a VCPU level, the state of the thread context interrupt management
registers needs to be collected. These registers are cached under the
'xive_saved_state.w01' field of the VCPU when the VPCU context is
pulled from the HW thread. An OPAL call retrieves the backup of the
IPB register in the NVT structure and merges it in the KVM state.

The structures of the interface between QEMU and KVM provisions some
extra room (two u64) for further extensions if more state needs to be
transferred back to QEMU.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/kvm_ppc.h    |  5 ++
 arch/powerpc/include/uapi/asm/kvm.h   |  2 +
 arch/powerpc/kvm/book3s.c             | 24 +++++++++
 arch/powerpc/kvm/book3s_xive_native.c | 78 +++++++++++++++++++++++++++
 4 files changed, 109 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4cc897039485..49c488af168c 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -270,6 +270,7 @@ union kvmppc_one_reg {
 		u64	addr;
 		u64	length;
 	}	vpaval;
+	u64	xive_timaval[4];
 };
 
 struct kvmppc_ops {
@@ -603,6 +604,8 @@ extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
 extern void kvmppc_xive_native_init_module(void);
 extern void kvmppc_xive_native_exit_module(void);
 extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val);
+extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val);
 
 #else
 static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
@@ -637,6 +640,8 @@ static inline void kvmppc_xive_native_init_module(void) { }
 static inline void kvmppc_xive_native_exit_module(void) { }
 static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
 	{ return 0; }
+static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val) { return 0; }
+static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val) { return -ENOENT; }
 
 #endif /* CONFIG_KVM_XIVE */
 
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 95302558ce10..3c958c39a782 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -480,6 +480,8 @@ struct kvm_ppc_cpu_char {
 #define  KVM_REG_PPC_ICP_PPRI_SHIFT	16	/* pending irq priority */
 #define  KVM_REG_PPC_ICP_PPRI_MASK	0xff
 
+#define KVM_REG_PPC_VP_STATE	(KVM_REG_PPC | KVM_REG_SIZE_U256 | 0x8d)
+
 /* Device control API: PPC-specific devices */
 #define KVM_DEV_MPIC_GRP_MISC		1
 #define   KVM_DEV_MPIC_BASE_ADDR	0	/* 64-bit */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index de7eed191107..5ad658077a35 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -641,6 +641,18 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
 				*val = get_reg_val(id, kvmppc_xics_get_icp(vcpu));
 			break;
 #endif /* CONFIG_KVM_XICS */
+#ifdef CONFIG_KVM_XIVE
+		case KVM_REG_PPC_VP_STATE:
+			if (!vcpu->arch.xive_vcpu) {
+				r = -ENXIO;
+				break;
+			}
+			if (xive_enabled())
+				r = kvmppc_xive_native_get_vp(vcpu, val);
+			else
+				r = -ENXIO;
+			break;
+#endif /* CONFIG_KVM_XIVE */
 		case KVM_REG_PPC_FSCR:
 			*val = get_reg_val(id, vcpu->arch.fscr);
 			break;
@@ -714,6 +726,18 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 				r = kvmppc_xics_set_icp(vcpu, set_reg_val(id, *val));
 			break;
 #endif /* CONFIG_KVM_XICS */
+#ifdef CONFIG_KVM_XIVE
+		case KVM_REG_PPC_VP_STATE:
+			if (!vcpu->arch.xive_vcpu) {
+				r = -ENXIO;
+				break;
+			}
+			if (xive_enabled())
+				r = kvmppc_xive_native_set_vp(vcpu, val);
+			else
+				r = -ENXIO;
+			break;
+#endif /* CONFIG_KVM_XIVE */
 		case KVM_REG_PPC_FSCR:
 			vcpu->arch.fscr = set_reg_val(id, *val);
 			break;
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index f4eb71eafc57..1aefb366df0b 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -424,6 +424,84 @@ static int xive_native_validate_queue_size(u32 qsize)
 	}
 }
 
+#define TM_IPB_SHIFT 40
+#define TM_IPB_MASK  (((u64) 0xFF) << TM_IPB_SHIFT)
+
+int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val)
+{
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	u64 opal_state;
+	int rc;
+
+	if (!kvmppc_xive_enabled(vcpu))
+		return -EPERM;
+
+	if (!xc)
+		return -ENOENT;
+
+	/* Thread context registers. We only care about IPB and CPPR */
+	val->xive_timaval[0] = vcpu->arch.xive_saved_state.w01;
+
+	/*
+	 * Return the OS CAM line to print out the VP identifier in
+	 * the QEMU monitor. This is not restored.
+	 */
+	val->xive_timaval[1] = vcpu->arch.xive_cam_word;
+
+	/* Get the VP state from OPAL */
+	rc = xive_native_get_vp_state(xc->vp_id, &opal_state);
+	if (rc)
+		return rc;
+
+	/*
+	 * Capture the backup of IPB register in the NVT structure and
+	 * merge it in our KVM VP state.
+	 *
+	 * TODO: P10 support.
+	 */
+	val->xive_timaval[0] |= cpu_to_be64(opal_state & TM_IPB_MASK);
+
+	pr_devel("%s NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x opal=%016llx\n",
+		 __func__,
+		 vcpu->arch.xive_saved_state.nsr,
+		 vcpu->arch.xive_saved_state.cppr,
+		 vcpu->arch.xive_saved_state.ipb,
+		 vcpu->arch.xive_saved_state.pipr,
+		 vcpu->arch.xive_saved_state.w01,
+		 (u32) vcpu->arch.xive_cam_word, opal_state);
+
+	return 0;
+}
+
+int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val)
+{
+	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
+	struct kvmppc_xive *xive = vcpu->kvm->arch.xive;
+
+	pr_devel("%s w01=%016llx vp=%016llx\n", __func__,
+		 val->xive_timaval[0], val->xive_timaval[1]);
+
+	if (!kvmppc_xive_enabled(vcpu))
+		return -EPERM;
+
+	if (!xc || !xive)
+		return -ENOENT;
+
+	/* We can't update the state of a "pushed" VCPU	 */
+	if (WARN_ON(vcpu->arch.xive_pushed))
+		return -EIO;
+
+	/* Thread context registers. only restore IPB and CPPR ? */
+	vcpu->arch.xive_saved_state.w01 = val->xive_timaval[0];
+
+	/*
+	 * There is no need to restore the XIVE internal state (IPB
+	 * stored in the NVT) as the IPB register was merged in KVM VP
+	 * state.
+	 */
+	return 0;
+}
+
 static int kvmppc_xive_native_set_source(struct kvmppc_xive *xive, long irq,
 					 u64 addr)
 {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support
  2019-01-07 19:10 ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state Cédric Le Goater
@ 2019-01-07 19:10   ` Cédric Le Goater
  2019-01-22  5:26     ` Paul Mackerras
  2019-01-07 19:10   ` [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl Cédric Le Goater
  1 sibling, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 19:10 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

Clear the ESB pages from the VMA of the IRQ being pass through to the
guest and let the fault handler repopulate the VMA when the ESB pages
are accessed for an EOI or for a trigger.

Storing the VMA under the KVM XIVE device is a little ugly.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.h        |  8 +++++++
 arch/powerpc/kvm/book3s_xive.c        | 15 ++++++++++++++
 arch/powerpc/kvm/book3s_xive_native.c | 30 +++++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 31e598e62589..6e64d3496a2c 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -90,6 +90,11 @@ struct kvmppc_xive_src_block {
 	struct kvmppc_xive_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
 };
 
+struct kvmppc_xive;
+
+struct kvmppc_xive_ops {
+	int (*reset_mapped)(struct kvm *kvm, unsigned long guest_irq);
+};
 
 struct kvmppc_xive {
 	struct kvm *kvm;
@@ -131,6 +136,9 @@ struct kvmppc_xive {
 
 	/* VC base address for ESBs */
 	u64     vc_base;
+
+	struct kvmppc_xive_ops *ops;
+	struct vm_area_struct *vma;
 };
 
 #define KVMPPC_XIVE_Q_COUNT	8
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index e9f05d9c9ad5..9b4751713554 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -946,6 +946,13 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 	/* Turn the IPI hard off */
 	xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
 
+	/*
+	 * Reset ESB guest mapping. Needed when ESB pages are exposed
+	 * to the guest in XIVE native mode
+	 */
+	if (xive->ops && xive->ops->reset_mapped)
+		xive->ops->reset_mapped(kvm, guest_irq);
+
 	/* Grab info about irq */
 	state->pt_number = hw_irq;
 	state->pt_data = irq_data_get_irq_handler_data(host_data);
@@ -1031,6 +1038,14 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 	state->pt_number = 0;
 	state->pt_data = NULL;
 
+	/*
+	 * Reset ESB guest mapping. Needed when ESB pages are exposed
+	 * to the guest in XIVE native mode
+	 */
+	if (xive->ops && xive->ops->reset_mapped) {
+		xive->ops->reset_mapped(kvm, guest_irq);
+	}
+
 	/* Reconfigure the IPI */
 	xive_native_configure_irq(state->ipi_number,
 				  xive_vp(xive, state->act_server),
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 1aefb366df0b..12edac29995e 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -240,6 +240,32 @@ static int kvmppc_xive_native_get_vc_base(struct kvmppc_xive *xive, u64 addr)
 	return 0;
 }
 
+static int kvmppc_xive_native_reset_mapped(struct kvm *kvm, unsigned long irq)
+{
+	struct kvmppc_xive *xive = kvm->arch.xive;
+	struct mm_struct *mm = kvm->mm;
+	struct vm_area_struct *vma = xive->vma;
+	unsigned long address;
+
+	if (irq >= KVMPPC_XIVE_NR_IRQS)
+		return -EINVAL;
+
+	pr_debug("clearing esb pages for girq 0x%lx\n", irq);
+
+	down_read(&mm->mmap_sem);
+	/* TODO: can we clear the PTEs without keeping a VMA pointer ? */
+	if (vma) {
+		address = vma->vm_start + irq * (2ull << PAGE_SHIFT);
+		zap_vma_ptes(vma, address, 2ull << PAGE_SHIFT);
+	}
+	up_read(&mm->mmap_sem);
+	return 0;
+}
+
+static struct kvmppc_xive_ops kvmppc_xive_native_ops =  {
+	.reset_mapped = kvmppc_xive_native_reset_mapped,
+};
+
 static int xive_native_esb_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
@@ -292,6 +318,8 @@ static const struct vm_operations_struct xive_native_esb_vmops = {
 
 static int xive_native_esb_mmap(struct file *file, struct vm_area_struct *vma)
 {
+	struct kvmppc_xive *xive = vma->vm_file->private_data;
+
 	/* There are two ESB pages (trigger and EOI) per IRQ */
 	if (vma_pages(vma) + vma->vm_pgoff > KVMPPC_XIVE_NR_IRQS * 2)
 		return -EINVAL;
@@ -299,6 +327,7 @@ static int xive_native_esb_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_flags |= VM_IO | VM_PFNMAP;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_ops = &xive_native_esb_vmops;
+	xive->vma = vma; /* TODO: get rid of the VMA pointer */
 	return 0;
 }
 
@@ -992,6 +1021,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	xive->vc_base = XIVE_VC_BASE;
 
 	xive->single_escalation = xive_native_has_single_escalation();
+	xive->ops = &kvmppc_xive_native_ops;
 
 	if (ret)
 		kfree(xive);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
  2019-01-07 19:10 ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state Cédric Le Goater
  2019-01-07 19:10   ` [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support Cédric Le Goater
@ 2019-01-07 19:10   ` Cédric Le Goater
  2019-01-22  5:42     ` Paul Mackerras
  1 sibling, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-07 19:10 UTC (permalink / raw)
  To: kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

This will be used to destroy the KVM XICS or XIVE device when the
sPAPR machine is reseted. When the VM boots, the CAS negotiation
process will determine which interrupt mode to use and the appropriate
KVM device will then be created.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/linux/kvm_host.h              |  2 ++
 include/uapi/linux/kvm.h              |  2 ++
 arch/powerpc/kvm/book3s_xive.c        | 38 +++++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_xive_native.c | 24 +++++++++++++++++
 virt/kvm/kvm_main.c                   | 39 +++++++++++++++++++++++++++
 5 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c38cc5eb7e73..259b6885dc74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1218,6 +1218,8 @@ struct kvm_device_ops {
 	 */
 	void (*destroy)(struct kvm_device *dev);
 
+	int (*delete)(struct kvm_device *dev);
+
 	int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
 	int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
 	int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 52bf74a1616e..b00cb4d986cf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1331,6 +1331,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
 
+#define KVM_DELETE_DEVICE	  _IOWR(KVMIO,  0xf0, struct kvm_create_device)
+
 /*
  * ioctls for vcpu fds
  */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 9b4751713554..5449fb4c87f9 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1109,11 +1109,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
-	struct kvmppc_xive *xive = xc->xive;
+	struct kvmppc_xive *xive;
 	int i;
 
+	if (!kvmppc_xics_enabled(vcpu))
+		return;
+
+	if (!xc)
+		return;
+
 	pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num);
 
+	xive = xc->xive;
+
 	/* Ensure no interrupt is still routed to that VP */
 	xc->valid = false;
 	kvmppc_xive_disable_vcpu_interrupts(vcpu);
@@ -1150,6 +1158,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 	}
 	/* Free the VP */
 	kfree(xc);
+
+	/* Cleanup the vcpu */
+	vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
+	vcpu->arch.xive_vcpu = NULL;
 }
 
 int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
@@ -1861,6 +1873,29 @@ static void kvmppc_xive_free(struct kvm_device *dev)
 	kfree(dev);
 }
 
+static int kvmppc_xive_delete(struct kvm_device *dev)
+{
+	struct kvm *kvm = dev->kvm;
+	unsigned int i;
+	struct kvm_vcpu *vcpu;
+
+	if (!kvm->arch.xive)
+		return -EPERM;
+
+	/*
+	 * call kick_all_cpus_sync() to ensure that all CPUs have
+	 * executed any pending interrupts
+	 */
+	if (is_kvmppc_hv_enabled(kvm))
+		kick_all_cpus_sync();
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvmppc_xive_cleanup_vcpu(vcpu);
+
+	kvmppc_xive_free(dev);
+	return 0;
+}
+
 static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 {
 	struct kvmppc_xive *xive;
@@ -2035,6 +2070,7 @@ struct kvm_device_ops kvm_xive_ops = {
 	.create = kvmppc_xive_create,
 	.init = kvmppc_xive_init,
 	.destroy = kvmppc_xive_free,
+	.delete = kvmppc_xive_delete,
 	.set_attr = xive_set_attr,
 	.get_attr = xive_get_attr,
 	.has_attr = xive_has_attr,
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 12edac29995e..7367962e670a 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -979,6 +979,29 @@ static void kvmppc_xive_native_free(struct kvm_device *dev)
 	kfree(dev);
 }
 
+static int kvmppc_xive_native_delete(struct kvm_device *dev)
+{
+	struct kvm *kvm = dev->kvm;
+	unsigned int i;
+	struct kvm_vcpu *vcpu;
+
+	if (!kvm->arch.xive)
+		return -EPERM;
+
+	/*
+	 * call kick_all_cpus_sync() to ensure that all CPUs have
+	 * executed any pending interrupts
+	 */
+	if (is_kvmppc_hv_enabled(kvm))
+		kick_all_cpus_sync();
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvmppc_xive_native_cleanup_vcpu(vcpu);
+
+	kvmppc_xive_native_free(dev);
+	return 0;
+}
+
 /*
  * ESB MMIO address of chip 0
  */
@@ -1350,6 +1373,7 @@ struct kvm_device_ops kvm_xive_native_ops = {
 	.create = kvmppc_xive_native_create,
 	.init = kvmppc_xive_native_init,
 	.destroy = kvmppc_xive_native_free,
+	.delete = kvmppc_xive_native_delete,
 	.set_attr = kvmppc_xive_native_set_attr,
 	.get_attr = kvmppc_xive_native_get_attr,
 	.has_attr = kvmppc_xive_native_has_attr,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1f888a103f78..c93c35c43675 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3009,6 +3009,31 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
 	return 0;
 }
 
+static int kvm_ioctl_delete_device(struct kvm *kvm,
+				   struct kvm_create_device *cd)
+{
+	struct fd f;
+	struct kvm_device *dev;
+	int ret;
+
+	f = fdget(cd->fd);
+	if (!f.file)
+		return -EBADF;
+
+	dev = kvm_device_from_filp(f.file);
+	fdput(f);
+
+	if (!dev)
+		return -EPERM;
+
+	mutex_lock(&kvm->lock);
+	list_del(&dev->vm_node);
+	mutex_unlock(&kvm->lock);
+	ret = dev->ops->delete(dev);
+
+	return ret;
+}
+
 static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 {
 	switch (arg) {
@@ -3253,6 +3278,20 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_DELETE_DEVICE: {
+		struct kvm_create_device cd;
+
+		r = -EFAULT;
+		if (copy_from_user(&cd, argp, sizeof(cd)))
+			goto out;
+
+		r = kvm_ioctl_delete_device(kvm, &cd);
+		if (r)
+			goto out;
+
+		r = 0;
+		break;
+	}
 	case KVM_CHECK_EXTENSION:
 		r = kvm_vm_ioctl_check_extension_generic(kvm, arg);
 		break;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
  2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
@ 2019-01-09  3:33   ` David Gibson
  2019-01-09 13:08   ` Michael Ellerman
  1 sibling, 0 replies; 47+ messages in thread
From: David Gibson @ 2019-01-09  3:33 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, Paul Mackerras, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 5524 bytes --]

On Mon, Jan 07, 2019 at 07:43:13PM +0100, Cédric Le Goater wrote:
> These flags are shared between Linux/KVM implementing the hypervisor
> calls for the XIVE native exploitation mode and the driver for the
> sPAPR guests.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/include/asm/xive.h  | 23 +++++++++++++++++++++++
>  arch/powerpc/sysdev/xive/spapr.c | 28 ++++++++--------------------
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index 3c704f5dd3ae..32f033bfbf42 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>  /* xmon hook */
>  extern void xmon_xive_do_dump(int cpu);
>  
> +/*
> + * Hcall flags shared by the sPAPR backend and KVM
> + */
> +
> +/* H_INT_GET_SOURCE_INFO */
> +#define XIVE_SPAPR_SRC_H_INT_ESB	PPC_BIT(60)
> +#define XIVE_SPAPR_SRC_LSI		PPC_BIT(61)
> +#define XIVE_SPAPR_SRC_TRIGGER		PPC_BIT(62)
> +#define XIVE_SPAPR_SRC_STORE_EOI	PPC_BIT(63)
> +
> +/* H_INT_SET_SOURCE_CONFIG */
> +#define XIVE_SPAPR_SRC_SET_EISN		PPC_BIT(62)
> +#define XIVE_SPAPR_SRC_MASK		PPC_BIT(63) /* unused */
> +
> +/* H_INT_SET_QUEUE_CONFIG */
> +#define XIVE_SPAPR_EQ_ALWAYS_NOTIFY	PPC_BIT(63)
> +
> +/* H_INT_SET_QUEUE_CONFIG */
> +#define XIVE_SPAPR_EQ_DEBUG		PPC_BIT(63)
> +
> +/* H_INT_ESB */
> +#define XIVE_SPAPR_ESB_STORE		PPC_BIT(63)
> +
>  /* APIs used by KVM */
>  extern u32 xive_native_default_eq_shift(void);
>  extern u32 xive_native_alloc_vp_block(u32 max_vcpus);
> diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
> index 575db3b06a6b..730284f838c8 100644
> --- a/arch/powerpc/sysdev/xive/spapr.c
> +++ b/arch/powerpc/sysdev/xive/spapr.c
> @@ -184,9 +184,6 @@ static long plpar_int_get_source_info(unsigned long flags,
>  	return 0;
>  }
>  
> -#define XIVE_SRC_SET_EISN (1ull << (63 - 62))
> -#define XIVE_SRC_MASK     (1ull << (63 - 63)) /* unused */
> -
>  static long plpar_int_set_source_config(unsigned long flags,
>  					unsigned long lisn,
>  					unsigned long target,
> @@ -243,8 +240,6 @@ static long plpar_int_get_queue_info(unsigned long flags,
>  	return 0;
>  }
>  
> -#define XIVE_EQ_ALWAYS_NOTIFY (1ull << (63 - 63))
> -
>  static long plpar_int_set_queue_config(unsigned long flags,
>  				       unsigned long target,
>  				       unsigned long priority,
> @@ -286,8 +281,6 @@ static long plpar_int_sync(unsigned long flags, unsigned long lisn)
>  	return 0;
>  }
>  
> -#define XIVE_ESB_FLAG_STORE (1ull << (63 - 63))
> -
>  static long plpar_int_esb(unsigned long flags,
>  			  unsigned long lisn,
>  			  unsigned long offset,
> @@ -321,7 +314,7 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 data, bool write)
>  	unsigned long read_data;
>  	long rc;
>  
> -	rc = plpar_int_esb(write ? XIVE_ESB_FLAG_STORE : 0,
> +	rc = plpar_int_esb(write ? XIVE_SPAPR_ESB_STORE : 0,
>  			   lisn, offset, data, &read_data);
>  	if (rc)
>  		return -1;
> @@ -329,11 +322,6 @@ static u64 xive_spapr_esb_rw(u32 lisn, u32 offset, u64 data, bool write)
>  	return write ? 0 : read_data;
>  }
>  
> -#define XIVE_SRC_H_INT_ESB     (1ull << (63 - 60))
> -#define XIVE_SRC_LSI           (1ull << (63 - 61))
> -#define XIVE_SRC_TRIGGER       (1ull << (63 - 62))
> -#define XIVE_SRC_STORE_EOI     (1ull << (63 - 63))
> -
>  static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
>  {
>  	long rc;
> @@ -349,11 +337,11 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
>  	if (rc)
>  		return  -EINVAL;
>  
> -	if (flags & XIVE_SRC_H_INT_ESB)
> +	if (flags & XIVE_SPAPR_SRC_H_INT_ESB)
>  		data->flags  |= XIVE_IRQ_FLAG_H_INT_ESB;
> -	if (flags & XIVE_SRC_STORE_EOI)
> +	if (flags & XIVE_SPAPR_SRC_STORE_EOI)
>  		data->flags  |= XIVE_IRQ_FLAG_STORE_EOI;
> -	if (flags & XIVE_SRC_LSI)
> +	if (flags & XIVE_SPAPR_SRC_LSI)
>  		data->flags  |= XIVE_IRQ_FLAG_LSI;
>  	data->eoi_page  = eoi_page;
>  	data->esb_shift = esb_shift;
> @@ -374,7 +362,7 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
>  	data->hw_irq = hw_irq;
>  
>  	/* Full function page supports trigger */
> -	if (flags & XIVE_SRC_TRIGGER) {
> +	if (flags & XIVE_SPAPR_SRC_TRIGGER) {
>  		data->trig_mmio = data->eoi_mmio;
>  		return 0;
>  	}
> @@ -391,8 +379,8 @@ static int xive_spapr_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq)
>  {
>  	long rc;
>  
> -	rc = plpar_int_set_source_config(XIVE_SRC_SET_EISN, hw_irq, target,
> -					 prio, sw_irq);
> +	rc = plpar_int_set_source_config(XIVE_SPAPR_SRC_SET_EISN, hw_irq,
> +					 target, prio, sw_irq);
>  
>  	return rc == 0 ? 0 : -ENXIO;
>  }
> @@ -432,7 +420,7 @@ static int xive_spapr_configure_queue(u32 target, struct xive_q *q, u8 prio,
>  	q->eoi_phys = esn_page;
>  
>  	/* Default is to always notify */
> -	flags = XIVE_EQ_ALWAYS_NOTIFY;
> +	flags = XIVE_SPAPR_EQ_ALWAYS_NOTIFY;
>  
>  	/* Configure and enable the queue in HW */
>  	rc = plpar_int_set_queue_config(flags, target, prio, qpage_phys, order);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support
  2019-01-07 18:43 ` [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support Cédric Le Goater
@ 2019-01-09  4:26   ` David Gibson
  0 siblings, 0 replies; 47+ messages in thread
From: David Gibson @ 2019-01-09  4:26 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, Paul Mackerras, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 8614 bytes --]

On Mon, Jan 07, 2019 at 07:43:14PM +0100, Cédric Le Goater wrote:
> The support for XIVE native exploitation mode in Linux/KVM needs a
> couple more OPAL calls to configure the sPAPR guest and to get/set the
> state of the XIVE internal structures.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/include/asm/opal-api.h           | 11 ++-
>  arch/powerpc/include/asm/opal.h               |  7 ++
>  arch/powerpc/include/asm/xive.h               | 14 +++
>  arch/powerpc/sysdev/xive/native.c             | 99 +++++++++++++++++++
>  .../powerpc/platforms/powernv/opal-wrappers.S |  3 +
>  5 files changed, 130 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 870fb7b239ea..cdfc54f78101 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -186,8 +186,8 @@
>  #define OPAL_XIVE_FREE_IRQ			140
>  #define OPAL_XIVE_SYNC				141
>  #define OPAL_XIVE_DUMP				142
> -#define OPAL_XIVE_RESERVED3			143
> -#define OPAL_XIVE_RESERVED4			144
> +#define OPAL_XIVE_GET_QUEUE_STATE		143
> +#define OPAL_XIVE_SET_QUEUE_STATE		144
>  #define OPAL_SIGNAL_SYSTEM_RESET		145
>  #define OPAL_NPU_INIT_CONTEXT			146
>  #define OPAL_NPU_DESTROY_CONTEXT		147
> @@ -209,8 +209,11 @@
>  #define OPAL_SENSOR_GROUP_ENABLE		163
>  #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR		164
>  #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR		165
> -#define	OPAL_NX_COPROC_INIT			167
> -#define OPAL_LAST				167
> +#define OPAL_HANDLE_HMI2			166
> +#define OPAL_NX_COPROC_INIT			167
> +#define OPAL_NPU_SET_RELAXED_ORDER		168
> +#define OPAL_NPU_GET_RELAXED_ORDER		169
> +#define OPAL_XIVE_GET_VP_STATE			170
>  
>  #define QUIESCE_HOLD			1 /* Spin all calls at entry */
>  #define QUIESCE_REJECT			2 /* Fail all calls with OPAL_BUSY */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index a55b01c90bb1..4e978d4dea5c 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -279,6 +279,13 @@ int64_t opal_xive_allocate_irq(uint32_t chip_id);
>  int64_t opal_xive_free_irq(uint32_t girq);
>  int64_t opal_xive_sync(uint32_t type, uint32_t id);
>  int64_t opal_xive_dump(uint32_t type, uint32_t id);
> +int64_t opal_xive_get_queue_state(uint64_t vp, uint32_t prio,
> +				  __be32 *out_qtoggle,
> +				  __be32 *out_qindex);
> +int64_t opal_xive_set_queue_state(uint64_t vp, uint32_t prio,
> +				  uint32_t qtoggle,
> +				  uint32_t qindex);
> +int64_t opal_xive_get_vp_state(uint64_t vp, __be64 *out_w01);
>  int64_t opal_pci_set_p2p(uint64_t phb_init, uint64_t phb_target,
>  			uint64_t desc, uint16_t pe_number);
>  
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index 32f033bfbf42..d6be3e4d9fa4 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -132,12 +132,26 @@ extern int xive_native_configure_queue(u32 vp_id, struct xive_q *q, u8 prio,
>  extern void xive_native_disable_queue(u32 vp_id, struct xive_q *q, u8 prio);
>  
>  extern void xive_native_sync_source(u32 hw_irq);
> +extern void xive_native_sync_queue(u32 hw_irq);
>  extern bool is_xive_irq(struct irq_chip *chip);
>  extern int xive_native_enable_vp(u32 vp_id, bool single_escalation);
>  extern int xive_native_disable_vp(u32 vp_id);
>  extern int xive_native_get_vp_info(u32 vp_id, u32 *out_cam_id, u32 *out_chip_id);
>  extern bool xive_native_has_single_escalation(void);
>  
> +extern int xive_native_get_queue_info(u32 vp_id, uint32_t prio,
> +				      u64 *out_qpage,
> +				      u64 *out_qsize,
> +				      u64 *out_qeoi_page,
> +				      u32 *out_escalate_irq,
> +				      u64 *out_qflags);
> +
> +extern int xive_native_get_queue_state(u32 vp_id, uint32_t prio, u32 *qtoggle,
> +				       u32 *qindex);
> +extern int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle,
> +				       u32 qindex);
> +extern int xive_native_get_vp_state(u32 vp_id, u64 *out_state);
> +
>  #else
>  
>  static inline bool xive_enabled(void) { return false; }
> diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c
> index 1ca127d052a6..0c037e933e55 100644
> --- a/arch/powerpc/sysdev/xive/native.c
> +++ b/arch/powerpc/sysdev/xive/native.c
> @@ -437,6 +437,12 @@ void xive_native_sync_source(u32 hw_irq)
>  }
>  EXPORT_SYMBOL_GPL(xive_native_sync_source);
>  
> +void xive_native_sync_queue(u32 hw_irq)
> +{
> +	opal_xive_sync(XIVE_SYNC_QUEUE, hw_irq);
> +}
> +EXPORT_SYMBOL_GPL(xive_native_sync_queue);
> +
>  static const struct xive_ops xive_native_ops = {
>  	.populate_irq_data	= xive_native_populate_irq_data,
>  	.configure_irq		= xive_native_configure_irq,
> @@ -711,3 +717,96 @@ bool xive_native_has_single_escalation(void)
>  	return xive_has_single_esc;
>  }
>  EXPORT_SYMBOL_GPL(xive_native_has_single_escalation);
> +
> +int xive_native_get_queue_info(u32 vp_id, u32 prio,
> +			       u64 *out_qpage,
> +			       u64 *out_qsize,
> +			       u64 *out_qeoi_page,
> +			       u32 *out_escalate_irq,
> +			       u64 *out_qflags)
> +{
> +	__be64 qpage;
> +	__be64 qsize;
> +	__be64 qeoi_page;
> +	__be32 escalate_irq;
> +	__be64 qflags;
> +	s64 rc;
> +
> +	rc = opal_xive_get_queue_info(vp_id, prio, &qpage, &qsize,
> +				      &qeoi_page, &escalate_irq, &qflags);
> +	if (rc) {
> +		pr_err("OPAL failed to get queue info for VCPU %d/%d : %lld\n",
> +		       vp_id, prio, rc);
> +		return -EIO;
> +	}
> +
> +	if (out_qpage)
> +		*out_qpage = be64_to_cpu(qpage);
> +	if (out_qsize)
> +		*out_qsize = be32_to_cpu(qsize);
> +	if (out_qeoi_page)
> +		*out_qeoi_page = be64_to_cpu(qeoi_page);
> +	if (out_escalate_irq)
> +		*out_escalate_irq = be32_to_cpu(escalate_irq);
> +	if (out_qflags)
> +		*out_qflags = be64_to_cpu(qflags);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(xive_native_get_queue_info);
> +
> +int xive_native_get_queue_state(u32 vp_id, u32 prio, u32 *qtoggle, u32 *qindex)
> +{
> +	__be32 opal_qtoggle;
> +	__be32 opal_qindex;
> +	s64 rc;
> +
> +	rc = opal_xive_get_queue_state(vp_id, prio, &opal_qtoggle,
> +				       &opal_qindex);
> +	if (rc) {
> +		pr_err("OPAL failed to get queue state for VCPU %d/%d : %lld\n",
> +		       vp_id, prio, rc);
> +		return -EIO;
> +	}
> +
> +	if (qtoggle)
> +		*qtoggle = be32_to_cpu(opal_qtoggle);
> +	if (qindex)
> +		*qindex = be32_to_cpu(opal_qindex);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(xive_native_get_queue_state);
> +
> +int xive_native_set_queue_state(u32 vp_id, u32 prio, u32 qtoggle, u32 qindex)
> +{
> +	s64 rc;
> +
> +	rc = opal_xive_set_queue_state(vp_id, prio, qtoggle, qindex);
> +	if (rc) {
> +		pr_err("OPAL failed to set queue state for VCPU %d/%d : %lld\n",
> +		       vp_id, prio, rc);
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(xive_native_set_queue_state);
> +
> +int xive_native_get_vp_state(u32 vp_id, u64 *out_state)
> +{
> +	__be64 state;
> +	s64 rc;
> +
> +	rc = opal_xive_get_vp_state(vp_id, &state);
> +	if (rc) {
> +		pr_err("OPAL failed to get vp state for VCPU %d : %lld\n",
> +		       vp_id, rc);
> +		return -EIO;
> +	}
> +
> +	if (out_state)
> +		*out_state = be64_to_cpu(state);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(xive_native_get_vp_state);
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index f4875fe3f8ff..3179953d6b56 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -309,6 +309,9 @@ OPAL_CALL(opal_xive_get_vp_info,		OPAL_XIVE_GET_VP_INFO);
>  OPAL_CALL(opal_xive_set_vp_info,		OPAL_XIVE_SET_VP_INFO);
>  OPAL_CALL(opal_xive_sync,			OPAL_XIVE_SYNC);
>  OPAL_CALL(opal_xive_dump,			OPAL_XIVE_DUMP);
> +OPAL_CALL(opal_xive_get_queue_state,		OPAL_XIVE_GET_QUEUE_STATE);
> +OPAL_CALL(opal_xive_set_queue_state,		OPAL_XIVE_SET_QUEUE_STATE);
> +OPAL_CALL(opal_xive_get_vp_state,		OPAL_XIVE_GET_VP_STATE);
>  OPAL_CALL(opal_signal_system_reset,		OPAL_SIGNAL_SYSTEM_RESET);
>  OPAL_CALL(opal_npu_init_context,		OPAL_NPU_INIT_CONTEXT);
>  OPAL_CALL(opal_npu_destroy_context,		OPAL_NPU_DESTROY_CONTEXT);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type
  2019-01-07 18:43 ` [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type Cédric Le Goater
@ 2019-01-09  4:27   ` David Gibson
  2019-01-22  4:56   ` Paul Mackerras
  1 sibling, 0 replies; 47+ messages in thread
From: David Gibson @ 2019-01-09  4:27 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, Paul Mackerras, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

On Mon, Jan 07, 2019 at 07:43:15PM +0100, Cédric Le Goater wrote:
> We will have different KVM devices for interrupts, one for the
> XICS-over-XIVE mode and one for the XIVE native exploitation
> mode. Let's add some checks to make sure we are not mixing the
> interfaces in KVM.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_xive.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index f78d002f0fe0..8a4fa45f07f8 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu)
>  {
>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>  
> +	if (!kvmppc_xics_enabled(vcpu))
> +		return -EPERM;
> +
>  	if (!xc)
>  		return 0;
>  
> @@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
>  	u8 cppr, mfrr;
>  	u32 xisr;
>  
> +	if (!kvmppc_xics_enabled(vcpu))
> +		return -EPERM;
> +
>  	if (!xc || !xive)
>  		return -ENOENT;
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
  2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
  2019-01-09  3:33   ` David Gibson
@ 2019-01-09 13:08   ` Michael Ellerman
  2019-01-09 13:38     ` Cédric Le Goater
  1 sibling, 1 reply; 47+ messages in thread
From: Michael Ellerman @ 2019-01-09 13:08 UTC (permalink / raw)
  To: Cédric Le Goater, kvm-ppc
  Cc: kvm, Paul Mackerras, Cédric Le Goater, linuxppc-dev, David Gibson

Cédric Le Goater <clg@kaod.org> writes:

> These flags are shared between Linux/KVM implementing the hypervisor
> calls for the XIVE native exploitation mode and the driver for the
> sPAPR guests.
>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/include/asm/xive.h  | 23 +++++++++++++++++++++++
>  arch/powerpc/sysdev/xive/spapr.c | 28 ++++++++--------------------
>  2 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index 3c704f5dd3ae..32f033bfbf42 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>  /* xmon hook */
>  extern void xmon_xive_do_dump(int cpu);
>  
> +/*
> + * Hcall flags shared by the sPAPR backend and KVM
> + */
> +
> +/* H_INT_GET_SOURCE_INFO */
> +#define XIVE_SPAPR_SRC_H_INT_ESB	PPC_BIT(60)
> +#define XIVE_SPAPR_SRC_LSI		PPC_BIT(61)
> +#define XIVE_SPAPR_SRC_TRIGGER		PPC_BIT(62)
> +#define XIVE_SPAPR_SRC_STORE_EOI	PPC_BIT(63)

I have an (irrational) hatred of PPC_BIT, because it obfuscates what's
going on and makes PPC seem weirder than it needs to be. It could at
least be called IBM_BIT().

I know it helps people compare the code vs the documentation, but
basically no one has the documentation, and everyone has the code.

Anyway it's not a show stopper, just a pet-peeve of mine :)

cheers

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls
  2019-01-09 13:08   ` Michael Ellerman
@ 2019-01-09 13:38     ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-09 13:38 UTC (permalink / raw)
  To: Michael Ellerman, kvm-ppc; +Cc: Paul Mackerras, linuxppc-dev, kvm, David Gibson

On 1/9/19 2:08 PM, Michael Ellerman wrote:
> Cédric Le Goater <clg@kaod.org> writes:
> 
>> These flags are shared between Linux/KVM implementing the hypervisor
>> calls for the XIVE native exploitation mode and the driver for the
>> sPAPR guests.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  arch/powerpc/include/asm/xive.h  | 23 +++++++++++++++++++++++
>>  arch/powerpc/sysdev/xive/spapr.c | 28 ++++++++--------------------
>>  2 files changed, 31 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
>> index 3c704f5dd3ae..32f033bfbf42 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -93,6 +93,29 @@ extern void xive_flush_interrupt(void);
>>  /* xmon hook */
>>  extern void xmon_xive_do_dump(int cpu);
>>  
>> +/*
>> + * Hcall flags shared by the sPAPR backend and KVM
>> + */
>> +
>> +/* H_INT_GET_SOURCE_INFO */
>> +#define XIVE_SPAPR_SRC_H_INT_ESB	PPC_BIT(60)
>> +#define XIVE_SPAPR_SRC_LSI		PPC_BIT(61)
>> +#define XIVE_SPAPR_SRC_TRIGGER		PPC_BIT(62)
>> +#define XIVE_SPAPR_SRC_STORE_EOI	PPC_BIT(63)
> 
> I have an (irrational) hatred of PPC_BIT, because it obfuscates what's
> going on and makes PPC seem weirder than it needs to be. It could at
> least be called IBM_BIT().
> 
> I know it helps people compare the code vs the documentation, but
> basically no one has the documentation, and everyone has the code.
> 
> Anyway it's not a show stopper, just a pet-peeve of mine :)

Only the define matters, I can change that back to the non-PPC_BIT
version in v2. Not a problem. 

Cheers,

C. 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device
  2019-01-07 18:43 ` [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device Cédric Le Goater
@ 2019-01-11  4:09   ` David Gibson
  0 siblings, 0 replies; 47+ messages in thread
From: David Gibson @ 2019-01-11  4:09 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, Paul Mackerras, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 8687 bytes --]

On Mon, Jan 07, 2019 at 07:43:16PM +0100, Cédric Le Goater wrote:
> The KVM device for the XIVE native exploitation mode will reuse the
> structures of the XICS-over-XIVE glue implementation. Some code will
> also be shared : source block creation and destruction, target
> selection and escalation attachment.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_xive.h | 11 +++++
>  arch/powerpc/kvm/book3s_xive.c | 89 +++++++++++++++++++---------------
>  2 files changed, 62 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index a08ae6fd4c51..10c4aa5cd010 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -248,5 +248,16 @@ extern int (*__xive_vm_h_ipi)(struct kvm_vcpu *vcpu, unsigned long server,
>  extern int (*__xive_vm_h_cppr)(struct kvm_vcpu *vcpu, unsigned long cppr);
>  extern int (*__xive_vm_h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr);
>  
> +/*
> + * Common Xive routines for XICS-over-XIVE and XIVE native
> + */
> +struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
> +	struct kvmppc_xive *xive, int irq);
> +void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
> +int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
> +void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio);
> +int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu);
> +
>  #endif /* CONFIG_KVM_XICS */
>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 8a4fa45f07f8..bb5d32f7e4e6 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -166,7 +166,7 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> -static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
>  {
>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>  	struct xive_q *q = &xc->queues[prio];
> @@ -291,7 +291,7 @@ static int xive_check_provisioning(struct kvm *kvm, u8 prio)
>  			continue;
>  		rc = xive_provision_queue(vcpu, prio);
>  		if (rc == 0 && !xive->single_escalation)
> -			xive_attach_escalation(vcpu, prio);
> +			kvmppc_xive_attach_escalation(vcpu, prio);
>  		if (rc)
>  			return rc;
>  	}
> @@ -342,7 +342,7 @@ static int xive_try_pick_queue(struct kvm_vcpu *vcpu, u8 prio)
>  	return atomic_add_unless(&q->count, 1, max) ? 0 : -EBUSY;
>  }
>  
> -static int xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
> +int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio)
>  {
>  	struct kvm_vcpu *vcpu;
>  	int i, rc;
> @@ -535,7 +535,7 @@ static int xive_target_interrupt(struct kvm *kvm,
>  	 * priority. The count for that new target will have
>  	 * already been incremented.
>  	 */
> -	rc = xive_select_target(kvm, &server, prio);
> +	rc = kvmppc_xive_select_target(kvm, &server, prio);
>  
>  	/*
>  	 * We failed to find a target ? Not much we can do
> @@ -1055,7 +1055,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped);
>  
> -static void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
> +void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
>  {
>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>  	struct kvm *kvm = vcpu->kvm;
> @@ -1225,7 +1225,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  		if (xive->qmap & (1 << i)) {
>  			r = xive_provision_queue(vcpu, i);
>  			if (r == 0 && !xive->single_escalation)
> -				xive_attach_escalation(vcpu, i);
> +				kvmppc_xive_attach_escalation(vcpu, i);
>  			if (r)
>  				goto bail;
>  		} else {
> @@ -1240,7 +1240,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  	}
>  
>  	/* If not done above, attach priority 0 escalation */
> -	r = xive_attach_escalation(vcpu, 0);
> +	r = kvmppc_xive_attach_escalation(vcpu, 0);
>  	if (r)
>  		goto bail;
>  
> @@ -1491,8 +1491,8 @@ static int xive_get_source(struct kvmppc_xive *xive, long irq, u64 addr)
>  	return 0;
>  }
>  
> -static struct kvmppc_xive_src_block *xive_create_src_block(struct kvmppc_xive *xive,
> -							   int irq)
> +struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
> +	struct kvmppc_xive *xive, int irq)
>  {
>  	struct kvm *kvm = xive->kvm;
>  	struct kvmppc_xive_src_block *sb;
> @@ -1571,7 +1571,7 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr)
>  	sb = kvmppc_xive_find_source(xive, irq, &idx);
>  	if (!sb) {
>  		pr_devel("No source, creating source block...\n");
> -		sb = xive_create_src_block(xive, irq);
> +		sb = kvmppc_xive_create_src_block(xive, irq);
>  		if (!sb) {
>  			pr_devel("Failed to create block...\n");
>  			return -ENOMEM;
> @@ -1795,7 +1795,7 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
>  	xive_cleanup_irq_data(xd);
>  }
>  
> -static void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
> +void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>  {
>  	int i;
>  
> @@ -1824,6 +1824,8 @@ static void kvmppc_xive_free(struct kvm_device *dev)
>  
>  	debugfs_remove(xive->dentry);
>  
> +	pr_devel("Destroying xive for partition\n");
> +
>  	if (kvm)
>  		kvm->arch.xive = NULL;
>  
> @@ -1889,6 +1891,43 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>  	return 0;
>  }
>  
> +int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
> +{
> +	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> +	unsigned int i;
> +
> +	for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
> +		struct xive_q *q = &xc->queues[i];
> +		u32 i0, i1, idx;
> +
> +		if (!q->qpage && !xc->esc_virq[i])
> +			continue;
> +
> +		seq_printf(m, " [q%d]: ", i);
> +
> +		if (q->qpage) {
> +			idx = q->idx;
> +			i0 = be32_to_cpup(q->qpage + idx);
> +			idx = (idx + 1) & q->msk;
> +			i1 = be32_to_cpup(q->qpage + idx);
> +			seq_printf(m, "T=%d %08x %08x...\n", q->toggle,
> +				   i0, i1);
> +		}
> +		if (xc->esc_virq[i]) {
> +			struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
> +			struct xive_irq_data *xd =
> +				irq_data_get_irq_handler_data(d);
> +			u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);
> +
> +			seq_printf(m, "E:%c%c I(%d:%llx:%llx)",
> +				   (pq & XIVE_ESB_VAL_P) ? 'P' : 'p',
> +				   (pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q',
> +				   xc->esc_virq[i], pq, xd->eoi_page);
> +			seq_puts(m, "\n");
> +		}
> +	}
> +	return 0;
> +}
>  
>  static int xive_debug_show(struct seq_file *m, void *private)
>  {
> @@ -1914,7 +1953,6 @@ static int xive_debug_show(struct seq_file *m, void *private)
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> -		unsigned int i;
>  
>  		if (!xc)
>  			continue;
> @@ -1924,33 +1962,8 @@ static int xive_debug_show(struct seq_file *m, void *private)
>  			   xc->server_num, xc->cppr, xc->hw_cppr,
>  			   xc->mfrr, xc->pending,
>  			   xc->stat_rm_h_xirr, xc->stat_vm_h_xirr);
> -		for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) {
> -			struct xive_q *q = &xc->queues[i];
> -			u32 i0, i1, idx;
>  
> -			if (!q->qpage && !xc->esc_virq[i])
> -				continue;
> -
> -			seq_printf(m, " [q%d]: ", i);
> -
> -			if (q->qpage) {
> -				idx = q->idx;
> -				i0 = be32_to_cpup(q->qpage + idx);
> -				idx = (idx + 1) & q->msk;
> -				i1 = be32_to_cpup(q->qpage + idx);
> -				seq_printf(m, "T=%d %08x %08x... \n", q->toggle, i0, i1);
> -			}
> -			if (xc->esc_virq[i]) {
> -				struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
> -				struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
> -				u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);
> -				seq_printf(m, "E:%c%c I(%d:%llx:%llx)",
> -					   (pq & XIVE_ESB_VAL_P) ? 'P' : 'p',
> -					   (pq & XIVE_ESB_VAL_Q) ? 'Q' : 'q',
> -					   xc->esc_virq[i], pq, xd->eoi_page);
> -				seq_printf(m, "\n");
> -			}
> -		}
> +		kvmppc_xive_debug_show_queues(m, vcpu);
>  
>  		t_rm_h_xirr += xc->stat_rm_h_xirr;
>  		t_rm_h_ipoll += xc->stat_rm_h_ipoll;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode
  2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
                   ` (16 preceding siblings ...)
  2019-01-07 19:10 ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state Cédric Le Goater
@ 2019-01-22  4:46 ` Paul Mackerras
  2019-01-23 19:07   ` Cédric Le Goater
  17 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  4:46 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 07:43:12PM +0100, Cédric Le Goater wrote:
> Hello,
> 
> On the POWER9 processor, the XIVE interrupt controller can control
> interrupt sources using MMIO to trigger events, to EOI or to turn off
> the sources. Priority management and interrupt acknowledgment is also
> controlled by MMIO in the CPU presenter subengine.
> 
> PowerNV/baremetal Linux runs natively under XIVE but sPAPR guests need
> special support from the hypervisor to do the same. This is called the
> XIVE native exploitation mode and today, it can be activated under the
> PowerPC Hypervisor, pHyp. However, Linux/KVM lacks XIVE native support
> and still offers the old interrupt mode interface using a
> XICS-over-XIVE glue which implements the XICS hcalls.
> 
> The following series is proposal to add the same support under KVM.
> 
> A new KVM device is introduced for the XIVE native exploitation
> mode. It reuses most of the XICS-over-XIVE glue implementation
> structures which are internal to KVM but has a completely different
> interface. A set of Hypervisor calls configures the sources and the
> event queues and from there, all control is done by the guest through
> MMIOs.
> 
> These MMIO regions (ESB and TIMA) are exposed to guests in QEMU,
> similarly to VFIO, and the associated VMAs are populated dynamically
> with the appropriate pages using a fault handler. This is implemented
> with a couple of KVM device ioctls.
> 
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with a
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode. Which means that the KVM interrupt
> device should be created at runtime, after the machine as started.
> This requires extra KVM support to create/destroy KVM devices. The
> last patches are an attempt to solve that problem.
> 
> Migration has its own specific needs. The patchset provides the
> necessary routines to quiesce XIVE, to capture and restore the state
> of the different structures used by KVM, OPAL and HW. Extra OPAL
> support is required for these.

Thanks for the patchset.  It mostly looks good, but there are some
more things we need to consider, and I think a v2 will be needed.

One general comment I have is that there are a lot of acronyms in this
code and you mostly seem to assume that people will know what they all
mean.  It would make the code more readable if you provide the
expansion of the acronym on first use in a comment or whatever.  For
example, one of the patches in this series talks about the "EAS"
without ever expanding it in any comment or in the patch description,
and I have forgotten just at the moment what EAS stands for (I just
know that understanding the XIVE is not eas-y. :)

Another general comment is that you seem to have written all this
code assuming we are using HV KVM in a host running bare-metal.
However, we could be using PR KVM (either in a bare-metal host or in a
guest), or we could be doing nested HV KVM where we are using the
kvm_hv module inside a KVM guest and using special hypercalls for
controlling our guests.

It would be perfectly acceptable for now to say that we don't yet
support XIVE exploitation in those scenarios, as long as we then make
sure that the new KVM capability reports false in those scenarios, and
any attempt to use the XIVE exploitation interfaces fails cleanly.
I don't see that either of those is true in the patch set as it
stands, so that is one area that needs to be fixed.

A third general comment is that the new KVM interfaces you have added
need to be documented in the files under Documentation/virtual/kvm.

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type
  2019-01-07 18:43 ` [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type Cédric Le Goater
  2019-01-09  4:27   ` David Gibson
@ 2019-01-22  4:56   ` Paul Mackerras
  2019-01-23 16:24     ` Cédric Le Goater
  1 sibling, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  4:56 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 07:43:15PM +0100, Cédric Le Goater wrote:
> We will have different KVM devices for interrupts, one for the
> XICS-over-XIVE mode and one for the XIVE native exploitation
> mode. Let's add some checks to make sure we are not mixing the
> interfaces in KVM.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/kvm/book3s_xive.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index f78d002f0fe0..8a4fa45f07f8 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu)
>  {
>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>  
> +	if (!kvmppc_xics_enabled(vcpu))
> +		return -EPERM;
> +
>  	if (!xc)
>  		return 0;
>  
> @@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
>  	u8 cppr, mfrr;
>  	u32 xisr;
>  
> +	if (!kvmppc_xics_enabled(vcpu))
> +		return -EPERM;
> +
>  	if (!xc || !xive)
>  		return -ENOENT;

I can't see how these new checks could ever trigger in the code as it
stands.  Is there a way at present?  Do following patches ever add a
path where the new checks could trigger, or is this just an excess of
caution?  (Your patch description should ideally have answered these
questions for me.)

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode
  2019-01-07 18:43 ` [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode Cédric Le Goater
@ 2019-01-22  5:05   ` Paul Mackerras
  2019-01-23 16:28     ` Cédric Le Goater
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  5:05 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 07:43:17PM +0100, Cédric Le Goater wrote:
> This is the basic framework for the new KVM device supporting the XIVE
> native exploitation mode. The user interface exposes a new capability
> and a new KVM device to be used by QEMU.

[snip]
> @@ -1039,7 +1039,10 @@ static int kvmppc_book3s_init(void)
>  #ifdef CONFIG_KVM_XIVE
>  	if (xive_enabled()) {
>  		kvmppc_xive_init_module();
> +		kvmppc_xive_native_init_module();
>  		kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
> +		kvm_register_device_ops(&kvm_xive_native_ops,
> +					KVM_DEV_TYPE_XIVE);

I think we want tighter conditions on initializing the xive_native
stuff and creating the xive device class.  We could have
xive_enabled() returning true in a guest, and this code will get
called both by PR KVM and HV KVM (and HV KVM no longer implies that we
are running bare metal).

> @@ -1050,8 +1053,10 @@ static int kvmppc_book3s_init(void)
>  static void kvmppc_book3s_exit(void)
>  {
>  #ifdef CONFIG_KVM_XICS
> -	if (xive_enabled())
> +	if (xive_enabled()) {
>  		kvmppc_xive_exit_module();
> +		kvmppc_xive_native_exit_module();

Same comment here.

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device
  2019-01-07 18:43 ` [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device Cédric Le Goater
@ 2019-01-22  5:09   ` Paul Mackerras
  2019-01-23 16:48     ` Cédric Le Goater
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  5:09 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
> This will let the guest create a memory mapping to expose the ESB MMIO
> regions used to control the interrupt sources, to trigger events, to
> EOI or to turn off the sources.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
>  arch/powerpc/kvm/book3s_xive_native.c | 97 +++++++++++++++++++++++++++
>  2 files changed, 101 insertions(+)
> 
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 8c876c166ef2..6bb61ba141c2 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>  #define  KVM_XICS_PRESENTED		(1ULL << 43)
>  #define  KVM_XICS_QUEUED		(1ULL << 44)
>  
> +/* POWER9 XIVE Native Interrupt Controller */
> +#define KVM_DEV_XIVE_GRP_CTRL		1
> +#define   KVM_DEV_XIVE_GET_ESB_FD	1
> +
>  #endif /* __LINUX_KVM_POWERPC_H */
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 115143e76c45..e20081f0c8d4 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -153,6 +153,85 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>  	return rc;
>  }
>  
> +static int xive_native_esb_fault(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct kvmppc_xive *xive = vma->vm_file->private_data;
> +	struct kvmppc_xive_src_block *sb;
> +	struct kvmppc_xive_irq_state *state;
> +	struct xive_irq_data *xd;
> +	u32 hw_num;
> +	u16 src;
> +	u64 page;
> +	unsigned long irq;
> +
> +	/*
> +	 * Linux/KVM uses a two pages ESB setting, one for trigger and
> +	 * one for EOI
> +	 */
> +	irq = vmf->pgoff / 2;
> +
> +	sb = kvmppc_xive_find_source(xive, irq, &src);
> +	if (!sb) {
> +		pr_err("%s: source %lx not found !\n", __func__, irq);

In general it's a bad idea to have a printk that userspace can trigger
at will without any rate-limiting.  Is there a real reason why this
printk is needed (and can't be pr_devel)?

> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	state = &sb->irq_state[src];
> +	kvmppc_xive_select_irq(state, &hw_num, &xd);
> +
> +	arch_spin_lock(&sb->lock);
> +
> +	/*
> +	 * first/even page is for trigger
> +	 * second/odd page is for EOI and management.
> +	 */
> +	page = vmf->pgoff % 2 ? xd->eoi_page : xd->trig_page;
> +	arch_spin_unlock(&sb->lock);
> +
> +	if (!page) {
> +		pr_err("%s: acessing invalid ESB page for source %lx !\n",
> +		       __func__, irq);

Does this represent a exceptional condition that userspace can't just
trigger at will (i.e. it implies the presence of a kernel bug)?  If
not then the same comment as above applies.

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the XIVE native device
  2019-01-07 18:43 ` [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the " Cédric Le Goater
@ 2019-01-22  5:14   ` Paul Mackerras
  2019-01-23 16:56     ` Cédric Le Goater
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  5:14 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 07:43:20PM +0100, Cédric Le Goater wrote:
> The ESB MMIO region controls the interrupt sources of the guest. QEMU
> will query an fd (GET_ESB_FD ioctl) and map this region at a specific
> address for the guest to use. The guest will obtain this information
> using the H_INT_GET_SOURCE_INFO hcall. To inform KVM of the address
> setting used by QEMU, add a VC_BASE control to the KVM XIVE device

This needs a little more explanation.  I *think* the only way this
gets used is that it gets returned to the guest by the new
hypercalls.  If that is indeed the case it would be useful to mention
that in the patch description, because otherwise taking a value that
userspace provides and which looks like it is an address, and not
doing any validation on it, looks a bit scary.

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls
  2019-01-07 18:43 ` [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls Cédric Le Goater
@ 2019-01-22  5:23   ` Paul Mackerras
       [not found]     ` <de777e750665d92509b1487a90acb79d70854cf3.camel@kernel.crashing.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  5:23 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 07:43:23PM +0100, Cédric Le Goater wrote:
> The XIVE native exploitation mode specs define a set of Hypervisor
> calls to configure the sources and the event queues :
> 
>  - H_INT_GET_SOURCE_INFO
> 
>    used to obtain the address of the MMIO page of the Event State
>    Buffer (PQ bits) entry associated with the source.
> 
>  - H_INT_SET_SOURCE_CONFIG
> 
>    assigns a source to a "target".
> 
>  - H_INT_GET_SOURCE_CONFIG
> 
>    determines which "target" and "priority" is assigned to a source
> 
>  - H_INT_GET_QUEUE_INFO
> 
>    returns the address of the notification management page associated
>    with the specified "target" and "priority".
> 
>  - H_INT_SET_QUEUE_CONFIG
> 
>    sets or resets the event queue for a given "target" and "priority".
>    It is also used to set the notification configuration associated
>    with the queue, only unconditional notification is supported for
>    the moment. Reset is performed with a queue size of 0 and queueing
>    is disabled in that case.
> 
>  - H_INT_GET_QUEUE_CONFIG
> 
>    returns the queue settings for a given "target" and "priority".
> 
>  - H_INT_RESET
> 
>    resets all of the guest's internal interrupt structures to their
>    initial state, losing all configuration set via the hcalls
>    H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> 
>  - H_INT_SYNC
> 
>    issue a synchronisation on a source to make sure all notifications
>    have reached their queue.

Which ones of these could be implemented in QEMU?  Are there any that
can't possibly be implemented in QEMU because they need to do things
that require calling internal interfaces that userspace doesn't have
access to?

How often do we expect each of these hypercalls to be called?

[snip]

> @@ -682,6 +685,46 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
>  int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
>  void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
>  
> +int kvmppc_rm_h_int_get_source_info(struct kvm_vcpu *vcpu,
> +				    unsigned long flag,
> +				    unsigned long lisn);
> +int kvmppc_rm_h_int_set_source_config(struct kvm_vcpu *vcpu,
> +				      unsigned long flag,
> +				      unsigned long lisn,
> +				      unsigned long target,
> +				      unsigned long priority,
> +				      unsigned long eisn);
> +int kvmppc_rm_h_int_get_source_config(struct kvm_vcpu *vcpu,
> +				      unsigned long flag,
> +				      unsigned long lisn);
> +int kvmppc_rm_h_int_get_queue_info(struct kvm_vcpu *vcpu,
> +				   unsigned long flag,
> +				   unsigned long target,
> +				   unsigned long priority);
> +int kvmppc_rm_h_int_set_queue_config(struct kvm_vcpu *vcpu,
> +				     unsigned long flag,
> +				     unsigned long target,
> +				     unsigned long priority,
> +				     unsigned long qpage,
> +				     unsigned long qsize);
> +int kvmppc_rm_h_int_get_queue_config(struct kvm_vcpu *vcpu,
> +				     unsigned long flag,
> +				     unsigned long target,
> +				     unsigned long priority);
> +int kvmppc_rm_h_int_set_os_reporting_line(struct kvm_vcpu *vcpu,
> +					  unsigned long flag,
> +					  unsigned long reportingline);
> +int kvmppc_rm_h_int_get_os_reporting_line(struct kvm_vcpu *vcpu,
> +					  unsigned long flag,
> +					  unsigned long target,
> +					  unsigned long reportingline);
> +int kvmppc_rm_h_int_esb(struct kvm_vcpu *vcpu, unsigned long flag,
> +			unsigned long lisn, unsigned long offset,
> +			unsigned long data);
> +int kvmppc_rm_h_int_sync(struct kvm_vcpu *vcpu, unsigned long flag,
> +			 unsigned long lisn);
> +int kvmppc_rm_h_int_reset(struct kvm_vcpu *vcpu, unsigned long flag);

Why do we need to provide real-mode versions of these hypercall
handlers?  I thought these hypercalls would only get called
infrequently, and in any case certainly much less frequently than once
per interrupt delivered.  If they are infrequent, then let's leave out
the real-mode version and just handle them in book3s_hv.c.

> @@ -5153,6 +5169,19 @@ static unsigned int default_hcall_list[] = {
>  	H_IPOLL,
>  	H_XIRR,
>  	H_XIRR_X,
> +#endif
> +#ifdef CONFIG_KVM_XIVE
> +	H_INT_GET_SOURCE_INFO,
> +	H_INT_SET_SOURCE_CONFIG,
> +	H_INT_GET_SOURCE_CONFIG,
> +	H_INT_GET_QUEUE_INFO,
> +	H_INT_SET_QUEUE_CONFIG,
> +	H_INT_GET_QUEUE_CONFIG,
> +	H_INT_SET_OS_REPORTING_LINE,
> +	H_INT_GET_OS_REPORTING_LINE,
> +	H_INT_ESB,
> +	H_INT_SYNC,
> +	H_INT_RESET,
>  #endif

The policy is not to add new hcalls to default_hcall_list[].  Is there
a strong reason for adding them here?

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support
  2019-01-07 19:10   ` [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support Cédric Le Goater
@ 2019-01-22  5:26     ` Paul Mackerras
  2019-01-23  6:45       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  5:26 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 08:10:05PM +0100, Cédric Le Goater wrote:
> Clear the ESB pages from the VMA of the IRQ being pass through to the
> guest and let the fault handler repopulate the VMA when the ESB pages
> are accessed for an EOI or for a trigger.

Why do we want to do this?

I don't see any possible advantage to removing the PTEs from the
userspace mapping.  You'll need to explain further.

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
  2019-01-07 19:10   ` [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl Cédric Le Goater
@ 2019-01-22  5:42     ` Paul Mackerras
  2019-01-23 18:39       ` Cédric Le Goater
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-22  5:42 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Mon, Jan 07, 2019 at 08:10:06PM +0100, Cédric Le Goater wrote:
> This will be used to destroy the KVM XICS or XIVE device when the
> sPAPR machine is reseted. When the VM boots, the CAS negotiation
> process will determine which interrupt mode to use and the appropriate
> KVM device will then be created.

What would be the consequence if we didn't destroy the device?

The reason I ask is that we will have to be much more careful about
memory allocation lifetimes with this patch.  Having KVM devices last
until the KVM instance is destroyed means that we generally avoid
use-after-free bugs.  With this patch we will have to do a careful
analysis of the lifetime of the xive structures vs. possible accesses
on other threads to prove there are no use-after-free bugs.

For example, it is not sufficient to set any pointers in struct kvm or
struct kvm_vcpu that point into xive structures to NULL before freeing
the structures.  There could be code on another CPU that has read the
pointer value before you set it to NULL and then goes and accesses it
after you have freed it.  You need to prove that can't happen,
possibly using some sort of explicit synchronization that ensures that
no other CPU could still be accessing the structure at the time when
you free it.  RCU can help with this, but in general means you need
RCU synchronization primitives (rcu_read_lock() etc.) at all the
places where you use the pointer, which I don't think you currently
have.

If there is a good fundamental reason why this can't happen, even
though you don't have explicit synchronization, then at a minimum you
need to explain that in the patch description, and ideally also in
code comments.

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support
  2019-01-22  5:26     ` Paul Mackerras
@ 2019-01-23  6:45       ` Benjamin Herrenschmidt
  2019-01-23 10:30         ` Paul Mackerras
  0 siblings, 1 reply; 47+ messages in thread
From: Benjamin Herrenschmidt @ 2019-01-23  6:45 UTC (permalink / raw)
  To: Paul Mackerras, Cédric Le Goater
  Cc: linuxppc-dev, kvm, kvm-ppc, David Gibson

On Tue, 2019-01-22 at 16:26 +1100, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 08:10:05PM +0100, Cédric Le Goater wrote:
> > Clear the ESB pages from the VMA of the IRQ being pass through to the
> > guest and let the fault handler repopulate the VMA when the ESB pages
> > are accessed for an EOI or for a trigger.
> 
> Why do we want to do this?
> 
> I don't see any possible advantage to removing the PTEs from the
> userspace mapping.  You'll need to explain further.

Afaik bcs we change the mapping to point to the real HW irq ESB page
instead of the "IPI" that was there at VM init time.

Cedric, is that correct ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls
       [not found]     ` <de777e750665d92509b1487a90acb79d70854cf3.camel@kernel.crashing.org>
@ 2019-01-23  8:48       ` Cédric Le Goater
  2019-01-23 10:26         ` Paul Mackerras
  0 siblings, 1 reply; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23  8:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, kvm, kvm-ppc, David Gibson

On 1/23/19 7:44 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
>>
>> Which ones of these could be implemented in QEMU?  Are there any that
>> can't possibly be implemented in QEMU because they need to do things
>> that require calling internal interfaces that userspace doesn't have
>> access to?
> 
> Implementing them in qemu doesn't make a lot of sense. Qemu doesn't
> have access to most of the XIVE HW state. There's a XIVE model for full
> emulation but when using the real thing, almost none of it is visible.
> A lot of those hcalls effectively turn into OPAL calls.
> 
>> How often do we expect each of these hypercalls to be called?
> 
> It depends, I can't tell for AIX. For Linux, not often with one
> exception: H_INT_ESB which will be used whenever you EOI an emulated
> LSI.

yes. This one is only doing loads and stores.

>  .../...
> 
>> Why do we need to provide real-mode versions of these hypercall
>> handlers?  I thought these hypercalls would only get called
>> infrequently, and in any case certainly much less frequently than once
>> per interrupt delivered.  If they are infrequent, then let's leave out
>> the real-mode version and just handle them in book3s_hv.c.
> 
> Agreed with the exception maybe of H_INT_ESB

ok. 

Some of these hcalls are really simple and only getting local info from 
the host (h_int_get_*). I thought handling the hcall ASAP was a preferred 
practice, even if the hcall is not called frequently. Isn't it ?

 
>>> @@ -5153,6 +5169,19 @@ static unsigned int default_hcall_list[] = {
>>>        H_IPOLL,
>>>        H_XIRR,
>>>        H_XIRR_X,
>>> +#endif
>>> +#ifdef CONFIG_KVM_XIVE
>>> +     H_INT_GET_SOURCE_INFO,
>>> +     H_INT_SET_SOURCE_CONFIG,
>>> +     H_INT_GET_SOURCE_CONFIG,
>>> +     H_INT_GET_QUEUE_INFO,
>>> +     H_INT_SET_QUEUE_CONFIG,
>>> +     H_INT_GET_QUEUE_CONFIG,
>>> +     H_INT_SET_OS_REPORTING_LINE,
>>> +     H_INT_GET_OS_REPORTING_LINE,
>>> +     H_INT_ESB,
>>> +     H_INT_SYNC,
>>> +     H_INT_RESET,
>>>   #endif
>>
>> The policy is not to add new hcalls to default_hcall_list[].  Is there
>> a strong reason for adding them here?

I don't remember. I will check for v2.

Thanks, 

C.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls
  2019-01-23  8:48       ` Cédric Le Goater
@ 2019-01-23 10:26         ` Paul Mackerras
  2019-01-23 10:48           ` Cédric Le Goater
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Mackerras @ 2019-01-23 10:26 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On Wed, Jan 23, 2019 at 09:48:31AM +0100, Cédric Le Goater wrote:
> On 1/23/19 7:44 AM, Benjamin Herrenschmidt wrote:
> > On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
> >> Why do we need to provide real-mode versions of these hypercall
> >> handlers?  I thought these hypercalls would only get called
> >> infrequently, and in any case certainly much less frequently than once
> >> per interrupt delivered.  If they are infrequent, then let's leave out
> >> the real-mode version and just handle them in book3s_hv.c.
> > 
> > Agreed with the exception maybe of H_INT_ESB
> 
> ok. 
> 
> Some of these hcalls are really simple and only getting local info from 
> the host (h_int_get_*). I thought handling the hcall ASAP was a preferred 
> practice, even if the hcall is not called frequently. Isn't it ?

If we are going to handle a given hcall in the kernel at all, then we
have to have a virtual mode handler.  If we have a real-mode handler
as well then we in general incur a certain amount of code duplication
with consequent maintenance costs and possibility of bugs.  So we
generally only have real-mode handlers for the hcalls where it is
critical to minimize the latency.  From what Ben is saying that would
only be H_INT_ESB, and maybe not even that.

If H_INT_ESB is only used for LSIs, then is a guest going to be using
it at all?  My understanding was that with XIVE, only a small number
of interrupts that are to do with system management functions are
LSIs; all of the interrupts relating to PCI-e devices are MSIs.  So do
we actually have a real high-frequency use case for LSIs in a guest?

For now I would prefer that you remove all the real-mode hcall
handlers.  We can add them later if we get performance data showing
that they are needed.

Regarding whether or not to have a given hcall handler in the kernel
at all - if there is for example an hcall which is just called once
on guest startup, and its function is just to provide information to
the guest, and QEMU has that information, then why not have that hcall
implemented by QEMU?  Are any of the hcalls like that?

For example, if H_INT_GET_SOURCE_INFO was implemented in QEMU, could
we then remove the VC_BASE thing from the xive device?

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support
  2019-01-23  6:45       ` Benjamin Herrenschmidt
@ 2019-01-23 10:30         ` Paul Mackerras
  2019-01-23 11:07           ` Cédric Le Goater
  2019-01-23 21:25           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 47+ messages in thread
From: Paul Mackerras @ 2019-01-23 10:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm, kvm-ppc, Cédric Le Goater, linuxppc-dev, David Gibson

On Wed, Jan 23, 2019 at 05:45:24PM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2019-01-22 at 16:26 +1100, Paul Mackerras wrote:
> > On Mon, Jan 07, 2019 at 08:10:05PM +0100, Cédric Le Goater wrote:
> > > Clear the ESB pages from the VMA of the IRQ being pass through to the
> > > guest and let the fault handler repopulate the VMA when the ESB pages
> > > are accessed for an EOI or for a trigger.
> > 
> > Why do we want to do this?
> > 
> > I don't see any possible advantage to removing the PTEs from the
> > userspace mapping.  You'll need to explain further.
> 
> Afaik bcs we change the mapping to point to the real HW irq ESB page
> instead of the "IPI" that was there at VM init time.

So that makes it sound like there is a whole lot going on that hasn't
even been hinted at in the patch descriptions...  It sounds like we
need a good description of how all this works and fits together
somewhere under Documentation/.

In any case we need much more informative patch descriptions.  I
realize that it's all currently in Cedric's head, but I bet that in
two or three years' time when we come to try to debug something, it
won't be in anyone's head...

Paul.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls
  2019-01-23 10:26         ` Paul Mackerras
@ 2019-01-23 10:48           ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 10:48 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/23/19 11:26 AM, Paul Mackerras wrote:
> On Wed, Jan 23, 2019 at 09:48:31AM +0100, Cédric Le Goater wrote:
>> On 1/23/19 7:44 AM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
>>>> Why do we need to provide real-mode versions of these hypercall
>>>> handlers?  I thought these hypercalls would only get called
>>>> infrequently, and in any case certainly much less frequently than once
>>>> per interrupt delivered.  If they are infrequent, then let's leave out
>>>> the real-mode version and just handle them in book3s_hv.c.
>>>
>>> Agreed with the exception maybe of H_INT_ESB
>>
>> ok. 
>>
>> Some of these hcalls are really simple and only getting local info from 
>> the host (h_int_get_*). I thought handling the hcall ASAP was a preferred 
>> practice, even if the hcall is not called frequently. Isn't it ?
> 
> If we are going to handle a given hcall in the kernel at all, then we
> have to have a virtual mode handler.  If we have a real-mode handler
> as well then we in general incur a certain amount of code duplication
> with consequent maintenance costs and possibility of bugs.  So we
> generally only have real-mode handlers for the hcalls where it is
> critical to minimize the latency.  From what Ben is saying that would
> only be H_INT_ESB, and maybe not even that.

ok. and yes, even the H_INT_ESB is questionable as this is really a rare
configuration. 

> If H_INT_ESB is only used for LSIs, then is a guest going to be using
> it at all?  My understanding was that with XIVE, only a small number
> of interrupts that are to do with system management functions are
> LSIs; all of the interrupts relating to PCI-e devices are MSIs.  So do
> we actually have a real high-frequency use case for LSIs in a guest?

The guest should be using a rtl8139 or a e1000 NIC under QEMU/KVM, which 
is not the common scenario. 

> For now I would prefer that you remove all the real-mode hcall
> handlers.  We can add them later if we get performance data showing
> that they are needed.

ok. I will.

> Regarding whether or not to have a given hcall handler in the kernel
> at all - if there is for example an hcall which is just called once
> on guest startup, and its function is just to provide information to
> the guest, and QEMU has that information, then why not have that hcall
> implemented by QEMU?  Are any of the hcalls like that?
> 
> For example, if H_INT_GET_SOURCE_INFO was implemented in QEMU, could
> we then remove the VC_BASE thing from the xive device?

Yes. H_INT_GET_SOURCE_INFO looks like a good candidate, all info should
be in QEMU and there are no OPAL calls, and we would get rid of the 
VC_BASE kvm device ioctl at the same time.

Thanks,

C.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support
  2019-01-23 10:30         ` Paul Mackerras
@ 2019-01-23 11:07           ` Cédric Le Goater
  2019-01-23 21:25           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 11:07 UTC (permalink / raw)
  To: Paul Mackerras, Benjamin Herrenschmidt
  Cc: linuxppc-dev, kvm, kvm-ppc, David Gibson

On 1/23/19 11:30 AM, Paul Mackerras wrote:
> On Wed, Jan 23, 2019 at 05:45:24PM +1100, Benjamin Herrenschmidt wrote:
>> On Tue, 2019-01-22 at 16:26 +1100, Paul Mackerras wrote:
>>> On Mon, Jan 07, 2019 at 08:10:05PM +0100, Cédric Le Goater wrote:
>>>> Clear the ESB pages from the VMA of the IRQ being pass through to the
>>>> guest and let the fault handler repopulate the VMA when the ESB pages
>>>> are accessed for an EOI or for a trigger.
>>>
>>> Why do we want to do this?
>>>
>>> I don't see any possible advantage to removing the PTEs from the
>>> userspace mapping.  You'll need to explain further.
>>
>> Afaik bcs we change the mapping to point to the real HW irq ESB page
>> instead of the "IPI" that was there at VM init time.

yes exactly. You need to clean up the pages each time.
 
> So that makes it sound like there is a whole lot going on that hasn't
> even been hinted at in the patch descriptions...  It sounds like we
> need a good description of how all this works and fits together
> somewhere under Documentation/.

OK. I have started doing so for the models merged in QEMU but not yet 
for KVM. I will work on it.

> In any case we need much more informative patch descriptions.  I
> realize that it's all currently in Cedric's head, but I bet that in
> two or three years' time when we come to try to debug something, it
> won't be in anyone's head...

I agree. 


So, storing the ESB VMA under the KVM device is not shocking anyone ?  

Thanks,

C.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type
  2019-01-22  4:56   ` Paul Mackerras
@ 2019-01-23 16:24     ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 16:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/22/19 5:56 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 07:43:15PM +0100, Cédric Le Goater wrote:
>> We will have different KVM devices for interrupts, one for the
>> XICS-over-XIVE mode and one for the XIVE native exploitation
>> mode. Let's add some checks to make sure we are not mixing the
>> interfaces in KVM.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  arch/powerpc/kvm/book3s_xive.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index f78d002f0fe0..8a4fa45f07f8 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
>> @@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>>  
>> +	if (!kvmppc_xics_enabled(vcpu))
>> +		return -EPERM;
>> +
>>  	if (!xc)
>>  		return 0;
>>  
>> @@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
>>  	u8 cppr, mfrr;
>>  	u32 xisr;
>>  
>> +	if (!kvmppc_xics_enabled(vcpu))
>> +		return -EPERM;
>> +
>>  	if (!xc || !xive)
>>  		return -ENOENT;
> 
> I can't see how these new checks could ever trigger in the code as it
> stands.  Is there a way at present? 

It would require some custom QEMU doing silly things : create the XICS 
KVM device, and then call kvm_get_one_reg(KVM_REG_PPC_ICP_STATE) or 
kvm_set_one_reg(icp->cs, KVM_REG_PPC_ICP_STATE) without connecting the
vCPU to its presenter. 

Today, you get a ENOENT.

> Do following patches ever add a path where the new checks could trigger, 
> or is this just an excess of caution? 

With the following patches, QEMU could to do something even more silly,
which is to mix the interrupt mode interfaces : create a KVM XICS device    
and call KVM CPU ioctls of the KVM XIVE device, or the opposite.  

> (Your patch description should ideally have answered these questions > for me.)

Yes. I also think that I introduced this patch to early in the series.
It make more sense when the XICS and the XIVE KVM devices are available.  

Thanks,

C.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode
  2019-01-22  5:05   ` Paul Mackerras
@ 2019-01-23 16:28     ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 16:28 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/22/19 6:05 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 07:43:17PM +0100, Cédric Le Goater wrote:
>> This is the basic framework for the new KVM device supporting the XIVE
>> native exploitation mode. The user interface exposes a new capability
>> and a new KVM device to be used by QEMU.
> 
> [snip]
>> @@ -1039,7 +1039,10 @@ static int kvmppc_book3s_init(void)
>>  #ifdef CONFIG_KVM_XIVE
>>  	if (xive_enabled()) {
>>  		kvmppc_xive_init_module();
>> +		kvmppc_xive_native_init_module();
>>  		kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
>> +		kvm_register_device_ops(&kvm_xive_native_ops,
>> +					KVM_DEV_TYPE_XIVE);
> 
> I think we want tighter conditions on initializing the xive_native
> stuff and creating the xive device class.  We could have
> xive_enabled() returning true in a guest, and this code will get
> called both by PR KVM and HV KVM (and HV KVM no longer implies that we
> are running bare metal).

Ah yes, I agree. I haven't addressed at all the nested flavor. I have 
some questions about this that I will ask in summary email you sent. 

Thanks,

C. 
  

> 
>> @@ -1050,8 +1053,10 @@ static int kvmppc_book3s_init(void)
>>  static void kvmppc_book3s_exit(void)
>>  {
>>  #ifdef CONFIG_KVM_XICS
>> -	if (xive_enabled())
>> +	if (xive_enabled()) {
>>  		kvmppc_xive_exit_module();
>> +		kvmppc_xive_native_exit_module();
> 
> Same comment here.
> 
> Paul.
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device
  2019-01-22  5:09   ` Paul Mackerras
@ 2019-01-23 16:48     ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 16:48 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/22/19 6:09 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
>> This will let the guest create a memory mapping to expose the ESB MMIO
>> regions used to control the interrupt sources, to trigger events, to
>> EOI or to turn off the sources.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
>>  arch/powerpc/kvm/book3s_xive_native.c | 97 +++++++++++++++++++++++++++
>>  2 files changed, 101 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>> index 8c876c166ef2..6bb61ba141c2 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>>  #define  KVM_XICS_PRESENTED		(1ULL << 43)
>>  #define  KVM_XICS_QUEUED		(1ULL << 44)
>>  
>> +/* POWER9 XIVE Native Interrupt Controller */
>> +#define KVM_DEV_XIVE_GRP_CTRL		1
>> +#define   KVM_DEV_XIVE_GET_ESB_FD	1
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
>> index 115143e76c45..e20081f0c8d4 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -153,6 +153,85 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>>  	return rc;
>>  }
>>  
>> +static int xive_native_esb_fault(struct vm_fault *vmf)
>> +{
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct kvmppc_xive *xive = vma->vm_file->private_data;
>> +	struct kvmppc_xive_src_block *sb;
>> +	struct kvmppc_xive_irq_state *state;
>> +	struct xive_irq_data *xd;
>> +	u32 hw_num;
>> +	u16 src;
>> +	u64 page;
>> +	unsigned long irq;
>> +
>> +	/*
>> +	 * Linux/KVM uses a two pages ESB setting, one for trigger and
>> +	 * one for EOI
>> +	 */
>> +	irq = vmf->pgoff / 2;
>> +
>> +	sb = kvmppc_xive_find_source(xive, irq, &src);
>> +	if (!sb) {
>> +		pr_err("%s: source %lx not found !\n", __func__, irq);
> 
> In general it's a bad idea to have a printk that userspace can trigger
> at will without any rate-limiting.  Is there a real reason why this
> printk is needed (and can't be pr_devel)?

yes. It should. The SIGBUS is enough to know what's going on. 

> 
>> +		return VM_FAULT_SIGBUS;
>> +	}
>> +
>> +	state = &sb->irq_state[src];
>> +	kvmppc_xive_select_irq(state, &hw_num, &xd);
>> +
>> +	arch_spin_lock(&sb->lock);
>> +
>> +	/*
>> +	 * first/even page is for trigger
>> +	 * second/odd page is for EOI and management.
>> +	 */
>> +	page = vmf->pgoff % 2 ? xd->eoi_page : xd->trig_page;
>> +	arch_spin_unlock(&sb->lock);
>> +
>> +	if (!page) {
>> +		pr_err("%s: acessing invalid ESB page for source %lx !\n",
>> +		       __func__, irq);
> 
> Does this represent a exceptional condition that userspace can't just
> trigger at will (i.e. it implies the presence of a kernel bug)?  If
> not then the same comment as above applies.

Not having an ESB page (trigger or EOI) implies that the xive_irq_data 
for the source is bogus. This probably deserves a WARN().

Thanks,

C. 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the XIVE native device
  2019-01-22  5:14   ` Paul Mackerras
@ 2019-01-23 16:56     ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 16:56 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/22/19 6:14 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 07:43:20PM +0100, Cédric Le Goater wrote:
>> The ESB MMIO region controls the interrupt sources of the guest. QEMU
>> will query an fd (GET_ESB_FD ioctl) and map this region at a specific
>> address for the guest to use. The guest will obtain this information
>> using the H_INT_GET_SOURCE_INFO hcall. To inform KVM of the address
>> setting used by QEMU, add a VC_BASE control to the KVM XIVE device
> 
> This needs a little more explanation.  I *think* the only way this
> gets used is that it gets returned to the guest by the new
> hypercalls.  If that is indeed the case it would be useful to mention
> that in the patch description, because otherwise taking a value that
> userspace provides and which looks like it is an address, and not
> doing any validation on it, looks a bit scary.

I think we have solved this problem in another email thread. 

The H_INT_GET_SOURCE_INFO hcall does not need to be implemented in KVM
as all the source information should already be available in QEMU. In
that case, there is no need to inform KVM of where the ESB pages are 
mapped in the guest address space. So we don't need that extra control
on the KVM device. This is good news. 

Thanks,

C. 




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl
  2019-01-22  5:42     ` Paul Mackerras
@ 2019-01-23 18:39       ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 18:39 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/22/19 6:42 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 08:10:06PM +0100, Cédric Le Goater wrote:
>> This will be used to destroy the KVM XICS or XIVE device when the
>> sPAPR machine is reseted. When the VM boots, the CAS negotiation
>> process will determine which interrupt mode to use and the appropriate
>> KVM device will then be created.
> 
> What would be the consequence if we didn't destroy the device?

So, if we don't destroy the device, it would mean that we are 
maintaining its availability under the KVM PPC structures, VM and
vCPUs, I think the changes would be significant to have two interrupt 
devices unde the VM. We would also need a way to activate one or 
the other depending on the interrupt mode chosen by CAS. In other 
words, it's moving all the interrupt mode politics from QEMU to KVM. 
It's possible of course but I would prefer to leave the ugly details 
in QEMU.  

Let's suppose now that we keep the device alive but disconnect the 
presenters from it, and from the VM also. We would have an unused 
device in the VM. We would need way to keep an handle on it (fd 
certainly) and a KVM interface to soft reset a KVM device partially 
initialized. That's one other option.

It seemed easier to do an hard reset : create/destroy.  

> The reason I ask is that we will have to be much more careful about
> memory allocation lifetimes with this patch. 

yes. bad refcounting will lead the host kernel to a crash. 

> Having KVM devices last
> until the KVM instance is destroyed means that we generally avoid
> use-after-free bugs.  With this patch we will have to do a careful
> analysis of the lifetime of the xive structures vs. possible accesses
> on other threads to prove there are no use-after-free bugs.
> 
> For example, it is not sufficient to set any pointers in struct kvm or
> struct kvm_vcpu that point into xive structures to NULL before freeing
> the structures.  There could be code on another CPU that has read the
> pointer value before you set it to NULL and then goes and accesses it
> after you have freed it.  You need to prove that can't happen,
> possibly using some sort of explicit synchronization that ensures that
> no other CPU could still be accessing the structure at the time when
> you free it.  RCU can help with this, but in general means you need
> RCU synchronization primitives (rcu_read_lock() etc.) at all the
> places where you use the pointer, which I don't think you currently
> have.

no. indeed. I have overlooked the synchronization aspect.

> If there is a good fundamental reason why this can't happen, even
> though you don't have explicit synchronization, then at a minimum you
> need to explain that in the patch description, and ideally also in
> code comments.

OK. I did leave that patch at the end for one reason. It needs more care.

Thanks,

C.
 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode
  2019-01-22  4:46 ` [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Paul Mackerras
@ 2019-01-23 19:07   ` Cédric Le Goater
  0 siblings, 0 replies; 47+ messages in thread
From: Cédric Le Goater @ 2019-01-23 19:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, linuxppc-dev, David Gibson

On 1/22/19 5:46 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 07:43:12PM +0100, Cédric Le Goater wrote:
>> Hello,
>>
>> On the POWER9 processor, the XIVE interrupt controller can control
>> interrupt sources using MMIO to trigger events, to EOI or to turn off
>> the sources. Priority management and interrupt acknowledgment is also
>> controlled by MMIO in the CPU presenter subengine.
>>
>> PowerNV/baremetal Linux runs natively under XIVE but sPAPR guests need
>> special support from the hypervisor to do the same. This is called the
>> XIVE native exploitation mode and today, it can be activated under the
>> PowerPC Hypervisor, pHyp. However, Linux/KVM lacks XIVE native support
>> and still offers the old interrupt mode interface using a
>> XICS-over-XIVE glue which implements the XICS hcalls.
>>
>> The following series is proposal to add the same support under KVM.
>>
>> A new KVM device is introduced for the XIVE native exploitation
>> mode. It reuses most of the XICS-over-XIVE glue implementation
>> structures which are internal to KVM but has a completely different
>> interface. A set of Hypervisor calls configures the sources and the
>> event queues and from there, all control is done by the guest through
>> MMIOs.
>>
>> These MMIO regions (ESB and TIMA) are exposed to guests in QEMU,
>> similarly to VFIO, and the associated VMAs are populated dynamically
>> with the appropriate pages using a fault handler. This is implemented
>> with a couple of KVM device ioctls.
>>
>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>> negotiation process determines whether the guest operates with a
>> interrupt controller using the XICS legacy model, as found on POWER8,
>> or in XIVE exploitation mode. Which means that the KVM interrupt
>> device should be created at runtime, after the machine as started.
>> This requires extra KVM support to create/destroy KVM devices. The
>> last patches are an attempt to solve that problem.
>>
>> Migration has its own specific needs. The patchset provides the
>> necessary routines to quiesce XIVE, to capture and restore the state
>> of the different structures used by KVM, OPAL and HW. Extra OPAL
>> support is required for these.
> 
> Thanks for the patchset.  It mostly looks good, but there are some
> more things we need to consider, and I think a v2 will be needed.
>> One general comment I have is that there are a lot of acronyms in this
> code and you mostly seem to assume that people will know what they all
> mean.  It would make the code more readable if you provide the
> expansion of the acronym on first use in a comment or whatever.  For
> example, one of the patches in this series talks about the "EAS"

 Event Assignment Structure, a.k.a IVE (Interrupt Virtualization Entry)

All the names changed somewhere between XIVE v1 and XIVE v2. OPAL and
Linux should be adjusted ...

> without ever expanding it in any comment or in the patch description,
> and I have forgotten just at the moment what EAS stands for (I just
> know that understanding the XIVE is not eas-y. :)
ah ! yes. But we have great documentation :)

We pushed some high level description of XIVE in QEMU :

  https://git.qemu.org/?p=qemu.git;a=blob;f=include/hw/ppc/xive.h;h=ec23253ba448e25c621356b55a7777119a738f8e;hb=HEAD

I should do the same for Linux with a KVM section to explain the 
interfaces which do not directly expose the underlying XIVE concepts. 
It's better to understand a little what is happening under the hood.

> Another general comment is that you seem to have written all this
> code assuming we are using HV KVM in a host running bare-metal.

Yes. I didn't look at the other configurations. I thought that we could
use the kernel_irqchip=off option to begin with. A couple of checks
are indeed missing.

> However, we could be using PR KVM (either in a bare-metal host or in a
> guest), or we could be doing nested HV KVM where we are using the
> kvm_hv module inside a KVM guest and using special hypercalls for
> controlling our guests.

Yes. 

It would be good to talk a little about the nested support (offline 
may be) to make sure that we are not missing some major interface that 
would require a lot of change. If we need to prepare ground, I think
the timing is good.

The size of the IRQ number space might be a problem. It seems we 
would need to increase it considerably to support multiple nested 
guests. That said I haven't look much how nested is designed.  

> It would be perfectly acceptable for now to say that we don't yet
> support XIVE exploitation in those scenarios, as long as we then make
> sure that the new KVM capability reports false in those scenarios, and
> any attempt to use the XIVE exploitation interfaces fails cleanly.

ok. That looks the best approach for now.

> I don't see that either of those is true in the patch set as it
> stands, so that is one area that needs to be fixed.
> 
> A third general comment is that the new KVM interfaces you have added
> need to be documented in the files under Documentation/virtual/kvm.

ok. 

Thanks,

C. 



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support
  2019-01-23 10:30         ` Paul Mackerras
  2019-01-23 11:07           ` Cédric Le Goater
@ 2019-01-23 21:25           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 47+ messages in thread
From: Benjamin Herrenschmidt @ 2019-01-23 21:25 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, kvm-ppc, Cédric Le Goater, linuxppc-dev, David Gibson

On Wed, 2019-01-23 at 21:30 +1100, Paul Mackerras wrote:
> > Afaik bcs we change the mapping to point to the real HW irq ESB page
> > instead of the "IPI" that was there at VM init time.
> 
> So that makes it sound like there is a whole lot going on that hasn't
> even been hinted at in the patch descriptions...  It sounds like we
> need a good description of how all this works and fits together
> somewhere under Documentation/.
> 
> In any case we need much more informative patch descriptions.  I
> realize that it's all currently in Cedric's head, but I bet that in
> two or three years' time when we come to try to debug something, it
> won't be in anyone's head...

The main problem is understanding XIVE itself. It's not realistic to
ask Cedric to write a proper documentation for XIVE as part of the
patch series, but sadly IBM doesn't have a good one to provide either.

Ben.



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, back to index

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-07 18:43 [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Cédric Le Goater
2019-01-07 18:43 ` [PATCH 01/19] powerpc/xive: export flags for the XIVE native exploitation mode hcalls Cédric Le Goater
2019-01-09  3:33   ` David Gibson
2019-01-09 13:08   ` Michael Ellerman
2019-01-09 13:38     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 02/19] powerpc/xive: add OPAL extensions for the XIVE native exploitation support Cédric Le Goater
2019-01-09  4:26   ` David Gibson
2019-01-07 18:43 ` [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type Cédric Le Goater
2019-01-09  4:27   ` David Gibson
2019-01-22  4:56   ` Paul Mackerras
2019-01-23 16:24     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 04/19] KVM: PPC: Book3S HV: export services for the XIVE native exploitation device Cédric Le Goater
2019-01-11  4:09   ` David Gibson
2019-01-07 18:43 ` [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode Cédric Le Goater
2019-01-22  5:05   ` Paul Mackerras
2019-01-23 16:28     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device Cédric Le Goater
2019-01-22  5:09   ` Paul Mackerras
2019-01-23 16:48     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 07/19] KVM: PPC: Book3S HV: add a GET_TIMA_FD control to " Cédric Le Goater
2019-01-07 18:43 ` [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the " Cédric Le Goater
2019-01-22  5:14   ` Paul Mackerras
2019-01-23 16:56     ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE " Cédric Le Goater
2019-01-07 18:43 ` [PATCH 10/19] KVM: PPC: Book3S HV: add a EISN attribute to kvmppc_xive_irq_state Cédric Le Goater
2019-01-07 18:43 ` [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls Cédric Le Goater
2019-01-22  5:23   ` Paul Mackerras
     [not found]     ` <de777e750665d92509b1487a90acb79d70854cf3.camel@kernel.crashing.org>
2019-01-23  8:48       ` Cédric Le Goater
2019-01-23 10:26         ` Paul Mackerras
2019-01-23 10:48           ` Cédric Le Goater
2019-01-07 18:43 ` [PATCH 12/19] KVM: PPC: Book3S HV: record guest queue page address Cédric Le Goater
2019-01-07 18:43 ` [PATCH 13/19] KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration Cédric Le Goater
2019-01-07 18:43 ` [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty Cédric Le Goater
2019-01-07 18:43 ` [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration Cédric Le Goater
2019-01-07 18:43 ` [PATCH 16/19] KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration Cédric Le Goater
2019-01-07 19:10 ` [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state Cédric Le Goater
2019-01-07 19:10   ` [PATCH 18/19] KVM: PPC: Book3S HV: add passthrough support Cédric Le Goater
2019-01-22  5:26     ` Paul Mackerras
2019-01-23  6:45       ` Benjamin Herrenschmidt
2019-01-23 10:30         ` Paul Mackerras
2019-01-23 11:07           ` Cédric Le Goater
2019-01-23 21:25           ` Benjamin Herrenschmidt
2019-01-07 19:10   ` [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl Cédric Le Goater
2019-01-22  5:42     ` Paul Mackerras
2019-01-23 18:39       ` Cédric Le Goater
2019-01-22  4:46 ` [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode Paul Mackerras
2019-01-23 19:07   ` Cédric Le Goater

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org linuxppc-dev@archiver.kernel.org
	public-inbox-index linuxppc-dev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox