linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution
@ 2022-03-14 19:44 Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 01/32] s390/sclp: detect the zPCI load/store interpretation facility Matthew Rosato
                   ` (32 more replies)
  0 siblings, 33 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Note: A few patches in this series depend on Baolu's IOMMU domain ops
split, which is currently in the next branch of linux-iommu. This series
applies on top of:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git

Enable interpretive execution of zPCI instructions + adapter interruption
forwarding for s390x KVM vfio-pci.  This is done by introducing a new IOMMU
domain for s390x (KVM-managed), indicating via vfio that this IOMMU domain
should be used instead of the default, with subsequent management of the
hardware assists being handled via a new KVM ioctl for zPCI management.

By allowing interpretation of zPCI instructions and firmware delivery of
interrupts to guests, we can significantly reduce the frequency of guest
SIE exits for zPCI.  We then see additional gains by handling a hot-path
instruction that can still intercept to the hypervisor (RPCIT) directly
in KVM via the new IOMMU domain, whose map operations update the host
DMA table with pinned guest entries over the specified range.

From the perspective of guest configuration, you pass through zPCI devices
in the same manner as before, with interpretation support being used by
default if available in kernel+qemu.

Will reply with a link to the associated QEMU series.

Changelog v3->v4:
v3: https://lore.kernel.org/kvm/20220204211536.321475-1-mjrosato@linux.ibm.com/ 
- Significant overhaul of the userspace API.  Remove all vfio device
  feature ioctls.  Remove CONFIG_VFIO_PCI_ZDEV, this is once again always
  built with vfio-pci for s390; IS_ENABLED checks can instead look at
  CONFIG_VFIO_PCI.  Most earlier patches in the series could maintain
  their reviews, but some needed to be removed due to required code
  changes.
- Instead use a KVM ioctl for zPCI management.  The API is very similar
  to the feature ioctls used in the prior series, with an additional step
  to create an association between an iommu domain + KVM + zPCI device.
- Introduce a new iommu domain ops type for s390-iommu, to be used when
  KVM manages the IOMMU instead of in response to VFIO mapping ioctls.
- Add an iommu method for specifying the type of domain to allocate.
- Add a new type to vfio_iommu_type1 (KVM-owned) to trigger the allocation
  of the KVM-owned IOMMU domain when zPCI interpretation is requested.
  In this case, the KVM-owned type is specified on VFIO_SET_IOMMU.
- Wire the RPCIT intercepts into the new IOMMU domain via the kernel
  IOMMU API.
- Remove a bunch of unnecessary symbol externs; make the associated
  functions static.
- Now that we keep a list of zPCI devices associated with a given KVM, we
  can do fh lookup on this list rather than on the list of all zPCI
  devices on the host.  We only need to do a host-wide fh lookup during
  the initial device<->KVM association.


Matthew Rosato (32):
  s390/sclp: detect the zPCI load/store interpretation facility
  s390/sclp: detect the AISII facility
  s390/sclp: detect the AENI facility
  s390/sclp: detect the AISI facility
  s390/airq: pass more TPI info to airq handlers
  s390/airq: allow for airq structure that uses an input vector
  s390/pci: externalize the SIC operation controls and routine
  s390/pci: stash associated GISA designation
  s390/pci: export some routines related to RPCIT processing
  s390/pci: stash dtsm and maxstbl
  s390/pci: add helper function to find device by handle
  s390/pci: get SHM information from list pci
  s390/pci: return status from zpci_refresh_trans
  iommu: introduce iommu_domain_alloc_type and the KVM type
  vfio: introduce KVM-owned IOMMU type
  vfio-pci/zdev: add function handle to clp base capability
  KVM: s390: pci: add basic kvm_zdev structure
  iommu/s390: add support for IOMMU_DOMAIN_KVM
  KVM: s390: pci: do initial setup for AEN interpretation
  KVM: s390: pci: enable host forwarding of Adapter Event Notifications
  KVM: s390: mechanism to enable guest zPCI Interpretation
  KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM
  KVM: s390: pci: provide routines for enabling/disabling interpretation
  KVM: s390: pci: provide routines for enabling/disabling interrupt
    forwarding
  KVM: s390: pci: provide routines for enabling/disabling IOAT assist
  KVM: s390: pci: handle refresh of PCI translations
  KVM: s390: intercept the rpcit instruction
  KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices
  vfio-pci/zdev: add DTSM to clp group capability
  KVM: s390: introduce CPU feature for zPCI Interpretation
  MAINTAINERS: additional files related kvm s390 pci passthrough
  MAINTAINERS: update s390 IOMMU entry

 Documentation/virt/kvm/api.rst   |  60 +++
 MAINTAINERS                      |   4 +-
 arch/s390/include/asm/airq.h     |   7 +-
 arch/s390/include/asm/kvm_host.h |   7 +
 arch/s390/include/asm/kvm_pci.h  |  40 ++
 arch/s390/include/asm/pci.h      |  12 +
 arch/s390/include/asm/pci_clp.h  |  11 +-
 arch/s390/include/asm/pci_dma.h  |   3 +
 arch/s390/include/asm/pci_insn.h |  31 +-
 arch/s390/include/asm/sclp.h     |   4 +
 arch/s390/include/asm/tpi.h      |  13 +
 arch/s390/include/uapi/asm/kvm.h |   1 +
 arch/s390/kvm/Makefile           |   1 +
 arch/s390/kvm/interrupt.c        |  95 +++-
 arch/s390/kvm/kvm-s390.c         |  90 +++-
 arch/s390/kvm/kvm-s390.h         |  10 +
 arch/s390/kvm/pci.c              | 833 +++++++++++++++++++++++++++++++
 arch/s390/kvm/pci.h              |  63 +++
 arch/s390/kvm/priv.c             |  46 ++
 arch/s390/pci/pci.c              |  31 ++
 arch/s390/pci/pci_clp.c          |  28 +-
 arch/s390/pci/pci_dma.c          |   7 +-
 arch/s390/pci/pci_insn.c         |  15 +-
 arch/s390/pci/pci_irq.c          |  48 +-
 drivers/iommu/Kconfig            |   8 +
 drivers/iommu/Makefile           |   1 +
 drivers/iommu/iommu.c            |   7 +
 drivers/iommu/s390-iommu.c       |  53 +-
 drivers/iommu/s390-iommu.h       |  53 ++
 drivers/iommu/s390-kvm-iommu.c   | 469 +++++++++++++++++
 drivers/s390/char/sclp_early.c   |   4 +
 drivers/s390/cio/airq.c          |  12 +-
 drivers/s390/cio/qdio_thinint.c  |   6 +-
 drivers/s390/crypto/ap_bus.c     |   9 +-
 drivers/s390/virtio/virtio_ccw.c |   6 +-
 drivers/vfio/pci/vfio_pci_zdev.c |  17 +-
 drivers/vfio/vfio_iommu_type1.c  |  12 +-
 include/linux/iommu.h            |  12 +
 include/uapi/linux/kvm.h         |  43 ++
 include/uapi/linux/vfio.h        |   6 +
 include/uapi/linux/vfio_zdev.h   |   6 +
 41 files changed, 2090 insertions(+), 94 deletions(-)
 create mode 100644 arch/s390/include/asm/kvm_pci.h
 create mode 100644 arch/s390/kvm/pci.c
 create mode 100644 arch/s390/kvm/pci.h
 create mode 100644 drivers/iommu/s390-iommu.h
 create mode 100644 drivers/iommu/s390-kvm-iommu.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v4 01/32] s390/sclp: detect the zPCI load/store interpretation facility
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 02/32] s390/sclp: detect the AISII facility Matthew Rosato
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Detect the zPCI Load/Store Interpretation facility.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/sclp.h   | 1 +
 drivers/s390/char/sclp_early.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index c68ea35de498..58a4d3d354b7 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -88,6 +88,7 @@ struct sclp_info {
 	unsigned char has_diag318 : 1;
 	unsigned char has_sipl : 1;
 	unsigned char has_dirq : 1;
+	unsigned char has_zpci_lsi : 1;
 	unsigned int ibc;
 	unsigned int mtid;
 	unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index e9943a86c361..b88dd0da1231 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -45,6 +45,7 @@ static void __init sclp_early_facilities_detect(void)
 	sclp.has_gisaf = !!(sccb->fac118 & 0x08);
 	sclp.has_hvs = !!(sccb->fac119 & 0x80);
 	sclp.has_kss = !!(sccb->fac98 & 0x01);
+	sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
 	if (sccb->fac85 & 0x02)
 		S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
 	if (sccb->fac91 & 0x40)
-- 
2.27.0


* [PATCH v4 02/32] s390/sclp: detect the AISII facility
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 01/32] s390/sclp: detect the zPCI load/store interpretation facility Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 03/32] s390/sclp: detect the AENI facility Matthew Rosato
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Detect the Adapter Interruption Source ID Interpretation facility.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/sclp.h   | 1 +
 drivers/s390/char/sclp_early.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 58a4d3d354b7..8b56ac5ae496 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -89,6 +89,7 @@ struct sclp_info {
 	unsigned char has_sipl : 1;
 	unsigned char has_dirq : 1;
 	unsigned char has_zpci_lsi : 1;
+	unsigned char has_aisii : 1;
 	unsigned int ibc;
 	unsigned int mtid;
 	unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index b88dd0da1231..29fee179e197 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -45,6 +45,7 @@ static void __init sclp_early_facilities_detect(void)
 	sclp.has_gisaf = !!(sccb->fac118 & 0x08);
 	sclp.has_hvs = !!(sccb->fac119 & 0x80);
 	sclp.has_kss = !!(sccb->fac98 & 0x01);
+	sclp.has_aisii = !!(sccb->fac118 & 0x40);
 	sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
 	if (sccb->fac85 & 0x02)
 		S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
-- 
2.27.0


* [PATCH v4 03/32] s390/sclp: detect the AENI facility
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 01/32] s390/sclp: detect the zPCI load/store interpretation facility Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 02/32] s390/sclp: detect the AISII facility Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 04/32] s390/sclp: detect the AISI facility Matthew Rosato
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Detect the Adapter Event Notification Interpretation facility.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/sclp.h   | 1 +
 drivers/s390/char/sclp_early.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 8b56ac5ae496..8c2e142000d4 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -90,6 +90,7 @@ struct sclp_info {
 	unsigned char has_dirq : 1;
 	unsigned char has_zpci_lsi : 1;
 	unsigned char has_aisii : 1;
+	unsigned char has_aeni : 1;
 	unsigned int ibc;
 	unsigned int mtid;
 	unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index 29fee179e197..e9af01b4c97a 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -46,6 +46,7 @@ static void __init sclp_early_facilities_detect(void)
 	sclp.has_hvs = !!(sccb->fac119 & 0x80);
 	sclp.has_kss = !!(sccb->fac98 & 0x01);
 	sclp.has_aisii = !!(sccb->fac118 & 0x40);
+	sclp.has_aeni = !!(sccb->fac118 & 0x20);
 	sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
 	if (sccb->fac85 & 0x02)
 		S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
-- 
2.27.0


* [PATCH v4 04/32] s390/sclp: detect the AISI facility
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (2 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 03/32] s390/sclp: detect the AENI facility Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 05/32] s390/airq: pass more TPI info to airq handlers Matthew Rosato
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Detect the Adapter Interruption Suppression Interpretation facility.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/sclp.h   | 1 +
 drivers/s390/char/sclp_early.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 8c2e142000d4..33b174007848 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -91,6 +91,7 @@ struct sclp_info {
 	unsigned char has_zpci_lsi : 1;
 	unsigned char has_aisii : 1;
 	unsigned char has_aeni : 1;
+	unsigned char has_aisi : 1;
 	unsigned int ibc;
 	unsigned int mtid;
 	unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index e9af01b4c97a..c13e55cc4a5d 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -47,6 +47,7 @@ static void __init sclp_early_facilities_detect(void)
 	sclp.has_kss = !!(sccb->fac98 & 0x01);
 	sclp.has_aisii = !!(sccb->fac118 & 0x40);
 	sclp.has_aeni = !!(sccb->fac118 & 0x20);
+	sclp.has_aisi = !!(sccb->fac118 & 0x10);
 	sclp.has_zpci_lsi = !!(sccb->fac118 & 0x01);
 	if (sccb->fac85 & 0x02)
 		S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
-- 
2.27.0


* [PATCH v4 05/32] s390/airq: pass more TPI info to airq handlers
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (3 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 04/32] s390/sclp: detect the AISI facility Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 06/32] s390/airq: allow for airq structure that uses an input vector Matthew Rosato
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

A subsequent patch will introduce an airq handler that requires additional
TPI information beyond directed vs. floating, so pass the entire tpi_info
structure to the handler.  Only PCI actually uses this information today;
for the other airq handlers this is effectively a no-op.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/airq.h     | 3 ++-
 arch/s390/kvm/interrupt.c        | 4 +++-
 arch/s390/pci/pci_irq.c          | 9 +++++++--
 drivers/s390/cio/airq.c          | 2 +-
 drivers/s390/cio/qdio_thinint.c  | 6 ++++--
 drivers/s390/crypto/ap_bus.c     | 9 ++++++---
 drivers/s390/virtio/virtio_ccw.c | 4 +++-
 7 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/s390/include/asm/airq.h b/arch/s390/include/asm/airq.h
index 01936fdfaddb..7918a7d09028 100644
--- a/arch/s390/include/asm/airq.h
+++ b/arch/s390/include/asm/airq.h
@@ -12,10 +12,11 @@
 
 #include <linux/bit_spinlock.h>
 #include <linux/dma-mapping.h>
+#include <asm/tpi.h>
 
 struct airq_struct {
 	struct hlist_node list;		/* Handler queueing. */
-	void (*handler)(struct airq_struct *airq, bool floating);
+	void (*handler)(struct airq_struct *airq, struct tpi_info *tpi_info);
 	u8 *lsi_ptr;			/* Local-Summary-Indicator pointer */
 	u8 lsi_mask;			/* Local-Summary-Indicator mask */
 	u8 isc;				/* Interrupt-subclass */
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index db933c252dbc..65e75ca2fc5d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -28,6 +28,7 @@
 #include <asm/switch_to.h>
 #include <asm/nmi.h>
 #include <asm/airq.h>
+#include <asm/tpi.h>
 #include "kvm-s390.h"
 #include "gaccess.h"
 #include "trace-s390.h"
@@ -3269,7 +3270,8 @@ int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc)
 }
 EXPORT_SYMBOL_GPL(kvm_s390_gisc_unregister);
 
-static void gib_alert_irq_handler(struct airq_struct *airq, bool floating)
+static void gib_alert_irq_handler(struct airq_struct *airq,
+				  struct tpi_info *tpi_info)
 {
 	inc_irq_stat(IRQIO_GAL);
 	process_gib_alert_list();
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 2b6062c486f5..cc4c8d7c8f5c 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -11,6 +11,7 @@
 
 #include <asm/isc.h>
 #include <asm/airq.h>
+#include <asm/tpi.h>
 
 static enum {FLOATING, DIRECTED} irq_delivery;
 
@@ -216,8 +217,11 @@ static void zpci_handle_fallback_irq(void)
 	}
 }
 
-static void zpci_directed_irq_handler(struct airq_struct *airq, bool floating)
+static void zpci_directed_irq_handler(struct airq_struct *airq,
+				      struct tpi_info *tpi_info)
 {
+	bool floating = !tpi_info->directed_irq;
+
 	if (floating) {
 		inc_irq_stat(IRQIO_PCF);
 		zpci_handle_fallback_irq();
@@ -227,7 +231,8 @@ static void zpci_directed_irq_handler(struct airq_struct *airq, bool floating)
 	}
 }
 
-static void zpci_floating_irq_handler(struct airq_struct *airq, bool floating)
+static void zpci_floating_irq_handler(struct airq_struct *airq,
+				      struct tpi_info *tpi_info)
 {
 	unsigned long si, ai;
 	struct airq_iv *aibv;
diff --git a/drivers/s390/cio/airq.c b/drivers/s390/cio/airq.c
index e56535c99888..2f2226786319 100644
--- a/drivers/s390/cio/airq.c
+++ b/drivers/s390/cio/airq.c
@@ -99,7 +99,7 @@ static irqreturn_t do_airq_interrupt(int irq, void *dummy)
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(airq, head, list)
 		if ((*airq->lsi_ptr & airq->lsi_mask) != 0)
-			airq->handler(airq, !tpi_info->directed_irq);
+			airq->handler(airq, tpi_info);
 	rcu_read_unlock();
 
 	return IRQ_HANDLED;
diff --git a/drivers/s390/cio/qdio_thinint.c b/drivers/s390/cio/qdio_thinint.c
index 8e09bf3a2fcd..9b9335dd06db 100644
--- a/drivers/s390/cio/qdio_thinint.c
+++ b/drivers/s390/cio/qdio_thinint.c
@@ -15,6 +15,7 @@
 #include <asm/qdio.h>
 #include <asm/airq.h>
 #include <asm/isc.h>
+#include <asm/tpi.h>
 
 #include "cio.h"
 #include "ioasm.h"
@@ -93,9 +94,10 @@ static inline u32 clear_shared_ind(void)
 /**
  * tiqdio_thinint_handler - thin interrupt handler for qdio
  * @airq: pointer to adapter interrupt descriptor
- * @floating: flag to recognize floating vs. directed interrupts (unused)
+ * @tpi_info: interrupt information (e.g. floating vs directed -- unused)
  */
-static void tiqdio_thinint_handler(struct airq_struct *airq, bool floating)
+static void tiqdio_thinint_handler(struct airq_struct *airq,
+				   struct tpi_info *tpi_info)
 {
 	u64 irq_time = S390_lowcore.int_clock;
 	u32 si_used = clear_shared_ind();
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 1986243f9cd3..df1a038442db 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -27,6 +27,7 @@
 #include <linux/kthread.h>
 #include <linux/mutex.h>
 #include <asm/airq.h>
+#include <asm/tpi.h>
 #include <linux/atomic.h>
 #include <asm/isc.h>
 #include <linux/hrtimer.h>
@@ -129,7 +130,8 @@ static int ap_max_adapter_id = 63;
 static struct bus_type ap_bus_type;
 
 /* Adapter interrupt definitions */
-static void ap_interrupt_handler(struct airq_struct *airq, bool floating);
+static void ap_interrupt_handler(struct airq_struct *airq,
+				 struct tpi_info *tpi_info);
 
 static bool ap_irq_flag;
 
@@ -442,9 +444,10 @@ static enum hrtimer_restart ap_poll_timeout(struct hrtimer *unused)
 /**
  * ap_interrupt_handler() - Schedule ap_tasklet on interrupt
  * @airq: pointer to adapter interrupt descriptor
- * @floating: ignored
+ * @tpi_info: ignored
  */
-static void ap_interrupt_handler(struct airq_struct *airq, bool floating)
+static void ap_interrupt_handler(struct airq_struct *airq,
+				 struct tpi_info *tpi_info)
 {
 	inc_irq_stat(IRQIO_APB);
 	tasklet_schedule(&ap_tasklet);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index d35e7a3f7067..52c376d15978 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -33,6 +33,7 @@
 #include <asm/virtio-ccw.h>
 #include <asm/isc.h>
 #include <asm/airq.h>
+#include <asm/tpi.h>
 
 /*
  * virtio related functions
@@ -203,7 +204,8 @@ static void drop_airq_indicator(struct virtqueue *vq, struct airq_info *info)
 	write_unlock_irqrestore(&info->lock, flags);
 }
 
-static void virtio_airq_handler(struct airq_struct *airq, bool floating)
+static void virtio_airq_handler(struct airq_struct *airq,
+				struct tpi_info *tpi_info)
 {
 	struct airq_info *info = container_of(airq, struct airq_info, airq);
 	unsigned long ai;
-- 
2.27.0


* [PATCH v4 06/32] s390/airq: allow for airq structure that uses an input vector
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (4 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 05/32] s390/airq: pass more TPI info to airq handlers Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 07/32] s390/pci: externalize the SIC operation controls and routine Matthew Rosato
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

When doing device passthrough where interrupts are being forwarded from
host to guest, we wish to use a pinned section of guest memory as the
vector (the same memory area the guest uses as its vector). To accomplish
this, add a new parameter to airq_iv_create which allows passing an
existing vector to be used instead of allocating a new one. The caller
is responsible for ensuring the vector is pinned in memory as well as for
unpinning the memory when the vector is no longer needed.

A subsequent patch will use this new parameter for zPCI interpretation.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/airq.h     |  4 +++-
 arch/s390/pci/pci_irq.c          |  8 ++++----
 drivers/s390/cio/airq.c          | 10 +++++++---
 drivers/s390/virtio/virtio_ccw.c |  2 +-
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/airq.h b/arch/s390/include/asm/airq.h
index 7918a7d09028..e82e5626e139 100644
--- a/arch/s390/include/asm/airq.h
+++ b/arch/s390/include/asm/airq.h
@@ -47,8 +47,10 @@ struct airq_iv {
 #define AIRQ_IV_PTR		4	/* Allocate the ptr array */
 #define AIRQ_IV_DATA		8	/* Allocate the data array */
 #define AIRQ_IV_CACHELINE	16	/* Cacheline alignment for the vector */
+#define AIRQ_IV_GUESTVEC	32	/* Vector is a pinned guest page */
 
-struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags);
+struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags,
+			       unsigned long *vec);
 void airq_iv_release(struct airq_iv *iv);
 unsigned long airq_iv_alloc(struct airq_iv *iv, unsigned long num);
 void airq_iv_free(struct airq_iv *iv, unsigned long bit, unsigned long num);
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index cc4c8d7c8f5c..0d0a02a9fbbf 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -296,7 +296,7 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 		zdev->aisb = bit;
 
 		/* Create adapter interrupt vector */
-		zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA | AIRQ_IV_BITLOCK);
+		zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA | AIRQ_IV_BITLOCK, NULL);
 		if (!zdev->aibv)
 			return -ENOMEM;
 
@@ -419,7 +419,7 @@ static int __init zpci_directed_irq_init(void)
 	union zpci_sic_iib iib = {{0}};
 	unsigned int cpu;
 
-	zpci_sbv = airq_iv_create(num_possible_cpus(), 0);
+	zpci_sbv = airq_iv_create(num_possible_cpus(), 0, NULL);
 	if (!zpci_sbv)
 		return -ENOMEM;
 
@@ -441,7 +441,7 @@ static int __init zpci_directed_irq_init(void)
 		zpci_ibv[cpu] = airq_iv_create(cache_line_size() * BITS_PER_BYTE,
 					       AIRQ_IV_DATA |
 					       AIRQ_IV_CACHELINE |
-					       (!cpu ? AIRQ_IV_ALLOC : 0));
+					       (!cpu ? AIRQ_IV_ALLOC : 0), NULL);
 		if (!zpci_ibv[cpu])
 			return -ENOMEM;
 	}
@@ -458,7 +458,7 @@ static int __init zpci_floating_irq_init(void)
 	if (!zpci_ibv)
 		return -ENOMEM;
 
-	zpci_sbv = airq_iv_create(ZPCI_NR_DEVICES, AIRQ_IV_ALLOC);
+	zpci_sbv = airq_iv_create(ZPCI_NR_DEVICES, AIRQ_IV_ALLOC, NULL);
 	if (!zpci_sbv)
 		goto out_free;
 
diff --git a/drivers/s390/cio/airq.c b/drivers/s390/cio/airq.c
index 2f2226786319..375a58b1c838 100644
--- a/drivers/s390/cio/airq.c
+++ b/drivers/s390/cio/airq.c
@@ -122,10 +122,12 @@ static inline unsigned long iv_size(unsigned long bits)
  * airq_iv_create - create an interrupt vector
  * @bits: number of bits in the interrupt vector
  * @flags: allocation flags
+ * @vec: pointer to pinned guest memory if AIRQ_IV_GUESTVEC
  *
  * Returns a pointer to an interrupt vector structure
  */
-struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
+struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags,
+			       unsigned long *vec)
 {
 	struct airq_iv *iv;
 	unsigned long size;
@@ -146,6 +148,8 @@ struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
 					     &iv->vector_dma);
 		if (!iv->vector)
 			goto out_free;
+	} else if (flags & AIRQ_IV_GUESTVEC) {
+		iv->vector = vec;
 	} else {
 		iv->vector = cio_dma_zalloc(size);
 		if (!iv->vector)
@@ -185,7 +189,7 @@ struct airq_iv *airq_iv_create(unsigned long bits, unsigned long flags)
 	kfree(iv->avail);
 	if (iv->flags & AIRQ_IV_CACHELINE && iv->vector)
 		dma_pool_free(airq_iv_cache, iv->vector, iv->vector_dma);
-	else
+	else if (!(iv->flags & AIRQ_IV_GUESTVEC))
 		cio_dma_free(iv->vector, size);
 	kfree(iv);
 out:
@@ -204,7 +208,7 @@ void airq_iv_release(struct airq_iv *iv)
 	kfree(iv->bitlock);
 	if (iv->flags & AIRQ_IV_CACHELINE)
 		dma_pool_free(airq_iv_cache, iv->vector, iv->vector_dma);
-	else
+	else if (!(iv->flags & AIRQ_IV_GUESTVEC))
 		cio_dma_free(iv->vector, iv_size(iv->bits));
 	kfree(iv->avail);
 	kfree(iv);
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 52c376d15978..410498d693f8 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -241,7 +241,7 @@ static struct airq_info *new_airq_info(int index)
 		return NULL;
 	rwlock_init(&info->lock);
 	info->aiv = airq_iv_create(VIRTIO_IV_BITS, AIRQ_IV_ALLOC | AIRQ_IV_PTR
-				   | AIRQ_IV_CACHELINE);
+				   | AIRQ_IV_CACHELINE, NULL);
 	if (!info->aiv) {
 		kfree(info);
 		return NULL;
-- 
2.27.0


* [PATCH v4 07/32] s390/pci: externalize the SIC operation controls and routine
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (5 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 06/32] s390/airq: allow for airq structure that uses an input vector Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 08/32] s390/pci: stash associated GISA designation Matthew Rosato
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

A subsequent patch will be issuing SIC from KVM -- export the necessary
routine and make the operation control definitions available from a header.
Because the routine will now be exported, let's rename __zpci_set_irq_ctrl
to zpci_set_irq_ctrl and get rid of the zeroed-iib wrapper function of
the same name.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci_insn.h | 17 +++++++++--------
 arch/s390/pci/pci_insn.c         |  3 ++-
 arch/s390/pci/pci_irq.c          | 26 ++++++++++++--------------
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index 61cf9531f68f..5331082fa516 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -98,6 +98,14 @@ struct zpci_fib {
 	u32 gd;
 } __packed __aligned(8);
 
+/* Set Interruption Controls Operation Controls  */
+#define	SIC_IRQ_MODE_ALL		0
+#define	SIC_IRQ_MODE_SINGLE		1
+#define	SIC_IRQ_MODE_DIRECT		4
+#define	SIC_IRQ_MODE_D_ALL		16
+#define	SIC_IRQ_MODE_D_SINGLE		17
+#define	SIC_IRQ_MODE_SET_CPU		18
+
 /* directed interruption information block */
 struct zpci_diib {
 	u32 : 1;
@@ -134,13 +142,6 @@ int __zpci_store(u64 data, u64 req, u64 offset);
 int zpci_store(const volatile void __iomem *addr, u64 data, unsigned long len);
 int __zpci_store_block(const u64 *data, u64 req, u64 offset);
 void zpci_barrier(void);
-int __zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib);
-
-static inline int zpci_set_irq_ctrl(u16 ctl, u8 isc)
-{
-	union zpci_sic_iib iib = {{0}};
-
-	return __zpci_set_irq_ctrl(ctl, isc, &iib);
-}
+int zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib);
 
 #endif
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index 4dd58b196cea..2a47b3936e44 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -97,7 +97,7 @@ int zpci_refresh_trans(u64 fn, u64 addr, u64 range)
 }
 
 /* Set Interruption Controls */
-int __zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
+int zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
 {
 	if (!test_facility(72))
 		return -EIO;
@@ -108,6 +108,7 @@ int __zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(zpci_set_irq_ctrl);
 
 /* PCI Load */
 static inline int ____pcilg(u64 *data, u64 req, u64 offset, u8 *status)
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 0d0a02a9fbbf..2f675355fd0c 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -15,13 +15,6 @@
 
 static enum {FLOATING, DIRECTED} irq_delivery;
 
-#define	SIC_IRQ_MODE_ALL		0
-#define	SIC_IRQ_MODE_SINGLE		1
-#define	SIC_IRQ_MODE_DIRECT		4
-#define	SIC_IRQ_MODE_D_ALL		16
-#define	SIC_IRQ_MODE_D_SINGLE		17
-#define	SIC_IRQ_MODE_SET_CPU		18
-
 /*
  * summary bit vector
  * FLOATING - summary bit per function
@@ -154,6 +147,7 @@ static struct irq_chip zpci_irq_chip = {
 static void zpci_handle_cpu_local_irq(bool rescan)
 {
 	struct airq_iv *dibv = zpci_ibv[smp_processor_id()];
+	union zpci_sic_iib iib = {{0}};
 	unsigned long bit;
 	int irqs_on = 0;
 
@@ -165,7 +159,7 @@ static void zpci_handle_cpu_local_irq(bool rescan)
 				/* End of second scan with interrupts on. */
 				break;
 			/* First scan complete, reenable interrupts. */
-			if (zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC))
+			if (zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC, &iib))
 				break;
 			bit = 0;
 			continue;
@@ -193,6 +187,7 @@ static void zpci_handle_remote_irq(void *data)
 static void zpci_handle_fallback_irq(void)
 {
 	struct cpu_irq_data *cpu_data;
+	union zpci_sic_iib iib = {{0}};
 	unsigned long cpu;
 	int irqs_on = 0;
 
@@ -203,7 +198,7 @@ static void zpci_handle_fallback_irq(void)
 				/* End of second scan with interrupts on. */
 				break;
 			/* First scan complete, reenable interrupts. */
-			if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC))
+			if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib))
 				break;
 			cpu = 0;
 			continue;
@@ -234,6 +229,7 @@ static void zpci_directed_irq_handler(struct airq_struct *airq,
 static void zpci_floating_irq_handler(struct airq_struct *airq,
 				      struct tpi_info *tpi_info)
 {
+	union zpci_sic_iib iib = {{0}};
 	unsigned long si, ai;
 	struct airq_iv *aibv;
 	int irqs_on = 0;
@@ -247,7 +243,7 @@ static void zpci_floating_irq_handler(struct airq_struct *airq,
 				/* End of second scan with interrupts on. */
 				break;
 			/* First scan complete, reenable interrupts. */
-			if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC))
+			if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib))
 				break;
 			si = 0;
 			continue;
@@ -407,11 +403,12 @@ static struct airq_struct zpci_airq = {
 static void __init cpu_enable_directed_irq(void *unused)
 {
 	union zpci_sic_iib iib = {{0}};
+	union zpci_sic_iib ziib = {{0}};
 
 	iib.cdiib.dibv_addr = (u64) zpci_ibv[smp_processor_id()]->vector;
 
-	__zpci_set_irq_ctrl(SIC_IRQ_MODE_SET_CPU, 0, &iib);
-	zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC);
+	zpci_set_irq_ctrl(SIC_IRQ_MODE_SET_CPU, 0, &iib);
+	zpci_set_irq_ctrl(SIC_IRQ_MODE_D_SINGLE, PCI_ISC, &ziib);
 }
 
 static int __init zpci_directed_irq_init(void)
@@ -426,7 +423,7 @@ static int __init zpci_directed_irq_init(void)
 	iib.diib.isc = PCI_ISC;
 	iib.diib.nr_cpus = num_possible_cpus();
 	iib.diib.disb_addr = virt_to_phys(zpci_sbv->vector);
-	__zpci_set_irq_ctrl(SIC_IRQ_MODE_DIRECT, 0, &iib);
+	zpci_set_irq_ctrl(SIC_IRQ_MODE_DIRECT, 0, &iib);
 
 	zpci_ibv = kcalloc(num_possible_cpus(), sizeof(*zpci_ibv),
 			   GFP_KERNEL);
@@ -471,6 +468,7 @@ static int __init zpci_floating_irq_init(void)
 
 int __init zpci_irq_init(void)
 {
+	union zpci_sic_iib iib = {{0}};
 	int rc;
 
 	irq_delivery = sclp.has_dirq ? DIRECTED : FLOATING;
@@ -502,7 +500,7 @@ int __init zpci_irq_init(void)
 	 * Enable floating IRQs (with suppression after one IRQ). When using
 	 * directed IRQs this enables the fallback path.
 	 */
-	zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC);
+	zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib);
 
 	return 0;
 out_airq:
-- 
2.27.0



* [PATCH v4 08/32] s390/pci: stash associated GISA designation
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (6 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 07/32] s390/pci: externalize the SIC operation controls and routine Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 09/32] s390/pci: export some routines related to RPCIT processing Matthew Rosato
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

For passthrough devices, we will need to know the GISA designation of the
guest if interpretation facilities are to be used.  Set up to stash this in
the zdev and set a default of 0 (no GISA designation) for now; a subsequent
patch will set a valid GISA designation for passthrough devices.
Also, extend the mpcifc routines to specify this stashed designation as
part of the mpcifc command.
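
As a sketch with stand-in structures (not the kernel definitions), each
mpcifc-issuing helper now seeds its FIB with the stashed designation, where
the default of 0 means no GISA:

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-ins for the kernel's zpci_fib / zpci_dev; fields elided. */
struct zpci_fib { unsigned int gd; };
struct zpci_dev { unsigned int gisa; };

/* Pattern each helper now follows before issuing the mpcifc command. */
static void fib_set_gisa(struct zpci_fib *fib, const struct zpci_dev *zdev)
{
	memset(fib, 0, sizeof(*fib));
	fib->gd = zdev->gisa;	/* 0 (default) = no GISA designation */
}
```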

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci.h     | 1 +
 arch/s390/include/asm/pci_clp.h | 3 ++-
 arch/s390/pci/pci.c             | 6 ++++++
 arch/s390/pci/pci_clp.c         | 1 +
 arch/s390/pci/pci_irq.c         | 5 +++++
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 90824be5ce9a..d07d7c3205de 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -123,6 +123,7 @@ struct zpci_dev {
 	enum zpci_state state;
 	u32		fid;		/* function ID, used by sclp */
 	u32		fh;		/* function handle, used by insn's */
+	u32		gisa;		/* GISA designation for passthrough */
 	u16		vfn;		/* virtual function number */
 	u16		pchid;		/* physical channel ID */
 	u8		pfgid;		/* function group ID */
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index 1f4b666e85ee..f3286bc5ba6e 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -173,7 +173,8 @@ struct clp_req_set_pci {
 	u16 reserved2;
 	u8 oc;				/* operation controls */
 	u8 ndas;			/* number of dma spaces */
-	u64 reserved3;
+	u32 reserved3;
+	u32 gisa;			/* GISA designation */
 } __packed;
 
 /* Set PCI function response */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 792f8e0f2178..ca9c29386de6 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -119,6 +119,7 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
 	fib.pba = base;
 	fib.pal = limit;
 	fib.iota = iota | ZPCI_IOTA_RTTO_FLAG;
+	fib.gd = zdev->gisa;
 	cc = zpci_mod_fc(req, &fib, &status);
 	if (cc)
 		zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
@@ -132,6 +133,8 @@ int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
 	struct zpci_fib fib = {0};
 	u8 cc, status;
 
+	fib.gd = zdev->gisa;
+
 	cc = zpci_mod_fc(req, &fib, &status);
 	if (cc)
 		zpci_dbg(3, "unreg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
@@ -159,6 +162,7 @@ int zpci_fmb_enable_device(struct zpci_dev *zdev)
 	atomic64_set(&zdev->unmapped_pages, 0);
 
 	fib.fmb_addr = virt_to_phys(zdev->fmb);
+	fib.gd = zdev->gisa;
 	cc = zpci_mod_fc(req, &fib, &status);
 	if (cc) {
 		kmem_cache_free(zdev_fmb_cache, zdev->fmb);
@@ -177,6 +181,8 @@ int zpci_fmb_disable_device(struct zpci_dev *zdev)
 	if (!zdev->fmb)
 		return -EINVAL;
 
+	fib.gd = zdev->gisa;
+
 	/* Function measurement is disabled if fmb address is zero */
 	cc = zpci_mod_fc(req, &fib, &status);
 	if (cc == 3) /* Function already gone. */
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index be077b39da33..4dcc37ddeeaf 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -240,6 +240,7 @@ static int clp_set_pci_fn(struct zpci_dev *zdev, u32 *fh, u8 nr_dma_as, u8 comma
 		rrb->request.fh = zdev->fh;
 		rrb->request.oc = command;
 		rrb->request.ndas = nr_dma_as;
+		rrb->request.gisa = zdev->gisa;
 
 		rc = clp_req(rrb, CLP_LPS_PCI);
 		if (rrb->response.hdr.rsp == CLP_RC_SETPCIFN_BUSY) {
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index 2f675355fd0c..a19ac0282929 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -43,6 +43,7 @@ static int zpci_set_airq(struct zpci_dev *zdev)
 	fib.fmt0.aibvo = 0;	/* each zdev has its own interrupt vector */
 	fib.fmt0.aisb = virt_to_phys(zpci_sbv->vector) + (zdev->aisb / 64) * 8;
 	fib.fmt0.aisbo = zdev->aisb & 63;
+	fib.gd = zdev->gisa;
 
 	return zpci_mod_fc(req, &fib, &status) ? -EIO : 0;
 }
@@ -54,6 +55,8 @@ static int zpci_clear_airq(struct zpci_dev *zdev)
 	struct zpci_fib fib = {0};
 	u8 cc, status;
 
+	fib.gd = zdev->gisa;
+
 	cc = zpci_mod_fc(req, &fib, &status);
 	if (cc == 3 || (cc == 1 && status == 24))
 		/* Function already gone or IRQs already deregistered. */
@@ -72,6 +75,7 @@ static int zpci_set_directed_irq(struct zpci_dev *zdev)
 	fib.fmt = 1;
 	fib.fmt1.noi = zdev->msi_nr_irqs;
 	fib.fmt1.dibvo = zdev->msi_first_bit;
+	fib.gd = zdev->gisa;
 
 	return zpci_mod_fc(req, &fib, &status) ? -EIO : 0;
 }
@@ -84,6 +88,7 @@ static int zpci_clear_directed_irq(struct zpci_dev *zdev)
 	u8 cc, status;
 
 	fib.fmt = 1;
+	fib.gd = zdev->gisa;
 	cc = zpci_mod_fc(req, &fib, &status);
 	if (cc == 3 || (cc == 1 && status == 24))
 		/* Function already gone or IRQs already deregistered. */
-- 
2.27.0



* [PATCH v4 09/32] s390/pci: export some routines related to RPCIT processing
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (7 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 08/32] s390/pci: stash associated GISA designation Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 10/32] s390/pci: stash dtsm and maxstbl Matthew Rosato
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

KVM will re-use dma_walk_cpu_trans to walk the host shadow table and
will also need to be able to call zpci_refresh_trans to re-issue an RPCIT.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/pci/pci_dma.c  | 1 +
 arch/s390/pci/pci_insn.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index f46833a25526..a81de48d5ea7 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -116,6 +116,7 @@ unsigned long *dma_walk_cpu_trans(unsigned long *rto, dma_addr_t dma_addr)
 	px = calc_px(dma_addr);
 	return &pto[px];
 }
+EXPORT_SYMBOL_GPL(dma_walk_cpu_trans);
 
 void dma_update_cpu_trans(unsigned long *entry, phys_addr_t page_addr, int flags)
 {
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index 2a47b3936e44..0509554301c7 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -95,6 +95,7 @@ int zpci_refresh_trans(u64 fn, u64 addr, u64 range)
 
 	return (cc) ? -EIO : 0;
 }
+EXPORT_SYMBOL_GPL(zpci_refresh_trans);
 
 /* Set Interruption Controls */
 int zpci_set_irq_ctrl(u16 ctl, u8 isc, union zpci_sic_iib *iib)
-- 
2.27.0



* [PATCH v4 10/32] s390/pci: stash dtsm and maxstbl
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (8 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 09/32] s390/pci: export some routines related to RPCIT processing Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 11/32] s390/pci: add helper function to find device by handle Matthew Rosato
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Store information about which IOAT designation types the underlying
hardware supports, as well as the largest store block size allowed.
These values will be needed by passthrough devices.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci.h     | 2 ++
 arch/s390/include/asm/pci_clp.h | 6 ++++--
 arch/s390/pci/pci_clp.c         | 2 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index d07d7c3205de..7ee52a70a96f 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -126,9 +126,11 @@ struct zpci_dev {
 	u32		gisa;		/* GISA designation for passthrough */
 	u16		vfn;		/* virtual function number */
 	u16		pchid;		/* physical channel ID */
+	u16		maxstbl;	/* Maximum store block size */
 	u8		pfgid;		/* function group ID */
 	u8		pft;		/* pci function type */
 	u8		port;
+	u8		dtsm;		/* Supported DT mask */
 	u8		rid_available	: 1;
 	u8		has_hp_slot	: 1;
 	u8		has_resources	: 1;
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index f3286bc5ba6e..d6189ed14f84 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -153,9 +153,11 @@ struct clp_rsp_query_pci_grp {
 	u8			:  6;
 	u8 frame		:  1;
 	u8 refresh		:  1;	/* TLB refresh mode */
-	u16 reserved2;
+	u16			:  3;
+	u16 maxstbl		: 13;	/* Maximum store block size */
 	u16 mui;
-	u16			: 16;
+	u8 dtsm;			/* Supported DT mask */
+	u8 reserved3;
 	u16 maxfaal;
 	u16			:  4;
 	u16 dnoi		: 12;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index 4dcc37ddeeaf..dc733b58e74f 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -103,6 +103,8 @@ static void clp_store_query_pci_fngrp(struct zpci_dev *zdev,
 	zdev->max_msi = response->noi;
 	zdev->fmb_update = response->mui;
 	zdev->version = response->version;
+	zdev->maxstbl = response->maxstbl;
+	zdev->dtsm = response->dtsm;
 
 	switch (response->version) {
 	case 1:
-- 
2.27.0



* [PATCH v4 11/32] s390/pci: add helper function to find device by handle
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (9 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 10/32] s390/pci: stash dtsm and maxstbl Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 12/32] s390/pci: get SHM information from list pci Matthew Rosato
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Intercepted zPCI instructions will specify the desired function via a
function handle.  Add a routine to find the device with the specified
handle.

Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci.h |  1 +
 arch/s390/pci/pci.c         | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 7ee52a70a96f..3c0b9986dcdc 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -275,6 +275,7 @@ static inline struct zpci_dev *to_zpci_dev(struct device *dev)
 }
 
 struct zpci_dev *get_zdev_by_fid(u32);
+struct zpci_dev *get_zdev_by_fh(u32 fh);
 
 /* DMA */
 int zpci_dma_init(void);
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index ca9c29386de6..04c16312ad54 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -76,6 +76,22 @@ struct zpci_dev *get_zdev_by_fid(u32 fid)
 	return zdev;
 }
 
+struct zpci_dev *get_zdev_by_fh(u32 fh)
+{
+	struct zpci_dev *tmp, *zdev = NULL;
+
+	spin_lock(&zpci_list_lock);
+	list_for_each_entry(tmp, &zpci_list, entry) {
+		if (tmp->fh == fh) {
+			zdev = tmp;
+			break;
+		}
+	}
+	spin_unlock(&zpci_list_lock);
+	return zdev;
+}
+EXPORT_SYMBOL_GPL(get_zdev_by_fh);
+
 void zpci_remove_reserved_devices(void)
 {
 	struct zpci_dev *tmp, *zdev;
-- 
2.27.0



* [PATCH v4 12/32] s390/pci: get SHM information from list pci
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (10 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 11/32] s390/pci: add helper function to find device by handle Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 13/32] s390/pci: return status from zpci_refresh_trans Matthew Rosato
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

KVM will need information on the special handle mask used to indicate
emulated devices.  To obtain this, a new type of list PCI call must be
made to gather the information.  Extend clp_list_pci_req to also fetch
the model-dependent-data field that holds this mask.
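
A hypothetical sketch of how a consumer such as KVM might use the mask once
retrieved via zpci_get_mdd(); the bit test and mask placement are
illustrative only and not architected detail from this patch:

```c
#include <assert.h>

/* Hypothetical helper: given the special handle mask (shm) extracted from
 * the model-dependent-data field, decide whether a function handle flags
 * an emulated device.  Illustrative only. */
static int fh_is_emulated(unsigned int fh, unsigned int shm)
{
	return (fh & shm) == shm;
}
```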

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci.h     |  1 +
 arch/s390/include/asm/pci_clp.h |  2 +-
 arch/s390/pci/pci_clp.c         | 25 ++++++++++++++++++++++---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 3c0b9986dcdc..e8a3fd5bc169 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -227,6 +227,7 @@ int clp_enable_fh(struct zpci_dev *zdev, u32 *fh, u8 nr_dma_as);
 int clp_disable_fh(struct zpci_dev *zdev, u32 *fh);
 int clp_get_state(u32 fid, enum zpci_state *state);
 int clp_refresh_fh(u32 fid, u32 *fh);
+int zpci_get_mdd(u32 *mdd);
 
 /* UID */
 void update_uid_checking(bool new);
diff --git a/arch/s390/include/asm/pci_clp.h b/arch/s390/include/asm/pci_clp.h
index d6189ed14f84..dc2041e97de4 100644
--- a/arch/s390/include/asm/pci_clp.h
+++ b/arch/s390/include/asm/pci_clp.h
@@ -76,7 +76,7 @@ struct clp_req_list_pci {
 struct clp_rsp_list_pci {
 	struct clp_rsp_hdr hdr;
 	u64 resume_token;
-	u32 reserved2;
+	u32 mdd;
 	u16 max_fn;
 	u8			: 7;
 	u8 uid_checking		: 1;
diff --git a/arch/s390/pci/pci_clp.c b/arch/s390/pci/pci_clp.c
index dc733b58e74f..7477956be632 100644
--- a/arch/s390/pci/pci_clp.c
+++ b/arch/s390/pci/pci_clp.c
@@ -328,7 +328,7 @@ int clp_disable_fh(struct zpci_dev *zdev, u32 *fh)
 }
 
 static int clp_list_pci_req(struct clp_req_rsp_list_pci *rrb,
-			    u64 *resume_token, int *nentries)
+			    u64 *resume_token, int *nentries, u32 *mdd)
 {
 	int rc;
 
@@ -354,6 +354,8 @@ static int clp_list_pci_req(struct clp_req_rsp_list_pci *rrb,
 	*nentries = (rrb->response.hdr.len - LIST_PCI_HDR_LEN) /
 		rrb->response.entry_size;
 	*resume_token = rrb->response.resume_token;
+	if (mdd)
+		*mdd = rrb->response.mdd;
 
 	return rc;
 }
@@ -365,7 +367,7 @@ static int clp_list_pci(struct clp_req_rsp_list_pci *rrb, void *data,
 	int nentries, i, rc;
 
 	do {
-		rc = clp_list_pci_req(rrb, &resume_token, &nentries);
+		rc = clp_list_pci_req(rrb, &resume_token, &nentries, NULL);
 		if (rc)
 			return rc;
 		for (i = 0; i < nentries; i++)
@@ -383,7 +385,7 @@ static int clp_find_pci(struct clp_req_rsp_list_pci *rrb, u32 fid,
 	int nentries, i, rc;
 
 	do {
-		rc = clp_list_pci_req(rrb, &resume_token, &nentries);
+		rc = clp_list_pci_req(rrb, &resume_token, &nentries, NULL);
 		if (rc)
 			return rc;
 		fh_list = rrb->response.fh_list;
@@ -468,6 +470,23 @@ int clp_get_state(u32 fid, enum zpci_state *state)
 	return rc;
 }
 
+int zpci_get_mdd(u32 *mdd)
+{
+	struct clp_req_rsp_list_pci *rrb;
+	u64 resume_token = 0;
+	int nentries, rc;
+
+	rrb = clp_alloc_block(GFP_KERNEL);
+	if (!rrb)
+		return -ENOMEM;
+
+	rc = clp_list_pci_req(rrb, &resume_token, &nentries, mdd);
+
+	clp_free_block(rrb);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(zpci_get_mdd);
+
 static int clp_base_slpc(struct clp_req *req, struct clp_req_rsp_slpc *lpcb)
 {
 	unsigned long limit = PAGE_SIZE - sizeof(lpcb->request);
-- 
2.27.0



* [PATCH v4 13/32] s390/pci: return status from zpci_refresh_trans
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (11 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 12/32] s390/pci: get SHM information from list pci Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type Matthew Rosato
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Current callers of zpci_refresh_trans don't need to interrogate the status
returned from the underlying instructions.  However, a subsequent patch
will add a KVM caller that needs this information.  Add a new argument to
zpci_refresh_trans to pass the address of a status byte and update
existing call sites to provide it.
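
The resulting convention, sketched with a stub (not the kernel routine):
the status byte produced by the underlying RPCIT instruction is now
surfaced through the new out-parameter rather than consumed internally.

```c
#include <assert.h>

/* Stub with the new zpci_refresh_trans() signature: the caller supplies a
 * status byte that the routine fills in from the RPCIT instruction. */
static int zpci_refresh_trans(unsigned long long fn, unsigned long long addr,
			      unsigned long long range, unsigned char *status)
{
	(void)fn; (void)addr; (void)range;
	*status = 0;	/* pretend the refresh completed cleanly */
	return 0;
}
```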

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci_insn.h |  2 +-
 arch/s390/pci/pci_dma.c          |  6 ++++--
 arch/s390/pci/pci_insn.c         | 10 +++++-----
 drivers/iommu/s390-iommu.c       |  4 +++-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index 5331082fa516..32759c407b8f 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -135,7 +135,7 @@ union zpci_sic_iib {
 DECLARE_STATIC_KEY_FALSE(have_mio);
 
 u8 zpci_mod_fc(u64 req, struct zpci_fib *fib, u8 *status);
-int zpci_refresh_trans(u64 fn, u64 addr, u64 range);
+int zpci_refresh_trans(u64 fn, u64 addr, u64 range, u8 *status);
 int __zpci_load(u64 *data, u64 req, u64 offset);
 int zpci_load(u64 *data, const volatile void __iomem *addr, unsigned long len);
 int __zpci_store(u64 data, u64 req, u64 offset);
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index a81de48d5ea7..b0a2380bcad8 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -23,8 +23,9 @@ static u32 s390_iommu_aperture_factor = 1;
 
 static int zpci_refresh_global(struct zpci_dev *zdev)
 {
+	u8 status;
 	return zpci_refresh_trans((u64) zdev->fh << 32, zdev->start_dma,
-				  zdev->iommu_pages * PAGE_SIZE);
+				  zdev->iommu_pages * PAGE_SIZE, &status);
 }
 
 unsigned long *dma_alloc_cpu_table(void)
@@ -183,6 +184,7 @@ static int __dma_purge_tlb(struct zpci_dev *zdev, dma_addr_t dma_addr,
 			   size_t size, int flags)
 {
 	unsigned long irqflags;
+	u8 status;
 	int ret;
 
 	/*
@@ -201,7 +203,7 @@ static int __dma_purge_tlb(struct zpci_dev *zdev, dma_addr_t dma_addr,
 	}
 
 	ret = zpci_refresh_trans((u64) zdev->fh << 32, dma_addr,
-				 PAGE_ALIGN(size));
+				 PAGE_ALIGN(size), &status);
 	if (ret == -ENOMEM && !s390_iommu_strict) {
 		/* enable the hypervisor to free some resources */
 		if (zpci_refresh_global(zdev))
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index 0509554301c7..ca6399d52767 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -77,20 +77,20 @@ static inline u8 __rpcit(u64 fn, u64 addr, u64 range, u8 *status)
 	return cc;
 }
 
-int zpci_refresh_trans(u64 fn, u64 addr, u64 range)
+int zpci_refresh_trans(u64 fn, u64 addr, u64 range, u8 *status)
 {
-	u8 cc, status;
+	u8 cc;
 
 	do {
-		cc = __rpcit(fn, addr, range, &status);
+		cc = __rpcit(fn, addr, range, status);
 		if (cc == 2)
 			udelay(ZPCI_INSN_BUSY_DELAY);
 	} while (cc == 2);
 
 	if (cc)
-		zpci_err_insn(cc, status, addr, range);
+		zpci_err_insn(cc, *status, addr, range);
 
-	if (cc == 1 && (status == 4 || status == 16))
+	if (cc == 1 && (*status == 4 || *status == 16))
 		return -ENOMEM;
 
 	return (cc) ? -EIO : 0;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 3833e86c6e7b..73a85c599dc2 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -214,6 +214,7 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	unsigned long irq_flags, nr_pages, i;
 	unsigned long *entry;
 	int rc = 0;
+	u8 status;
 
 	if (dma_addr < s390_domain->domain.geometry.aperture_start ||
 	    dma_addr + size > s390_domain->domain.geometry.aperture_end)
@@ -238,7 +239,8 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain,
 	spin_lock(&s390_domain->list_lock);
 	list_for_each_entry(domain_device, &s390_domain->devices, list) {
 		rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
-					start_dma_addr, nr_pages * PAGE_SIZE);
+					start_dma_addr, nr_pages * PAGE_SIZE,
+					&status);
 		if (rc)
 			break;
 	}
-- 
2.27.0



* [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (12 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 13/32] s390/pci: return status from zpci_refresh_trans Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 21:36   ` Jason Gunthorpe
  2022-03-15 10:49   ` Robin Murphy
  2022-03-14 19:44 ` [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type Matthew Rosato
                   ` (18 subsequent siblings)
  32 siblings, 2 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

s390x will introduce an additional domain type that is used for
managing an IOMMU owned by KVM.  Define the type here and add an
interface for allocating a domain of a specified type rather than the
default type.
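
The new type composes the existing paging capability with a KVM-ownership
bit, mirroring how the DMA-API types are built.  A small sketch of the bit
arithmetic, assuming __IOMMU_DOMAIN_PAGING is 1U << 0 as in mainline (that
define is outside this hunk):

```c
#include <assert.h>

/* Bit values as in the patch hunk; __IOMMU_DOMAIN_PAGING assumed 1U << 0. */
#define __IOMMU_DOMAIN_PAGING	(1U << 0)
#define __IOMMU_DOMAIN_KVM	(1U << 4)

#define IOMMU_DOMAIN_KVM	(__IOMMU_DOMAIN_PAGING |	\
				 __IOMMU_DOMAIN_KVM)
```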

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 drivers/iommu/iommu.c |  7 +++++++
 include/linux/iommu.h | 12 ++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..8bb57e0e3945 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1976,6 +1976,13 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
+struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
+					     unsigned int t)
+{
+	return __iommu_domain_alloc(bus, t);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_alloc_type);
+
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..b427bbb9f387 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -63,6 +63,7 @@ struct iommu_domain_geometry {
 					      implementation              */
 #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)  /* DMA-API uses flush queue    */
+#define __IOMMU_DOMAIN_KVM	(1U << 4)  /* Domain is controlled by KVM */
 
 /*
  * This are the possible domain-types
@@ -77,6 +78,7 @@ struct iommu_domain_geometry {
  *				  certain optimizations for these domains
  *	IOMMU_DOMAIN_DMA_FQ	- As above, but definitely using batched TLB
  *				  invalidation.
+ *	IOMMU_DOMAIN_KVM	- DMA mappings managed by KVM, used for VMs
  */
 #define IOMMU_DOMAIN_BLOCKED	(0U)
 #define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
@@ -86,6 +88,8 @@ struct iommu_domain_geometry {
 #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
 				 __IOMMU_DOMAIN_DMA_API |	\
 				 __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_KVM	(__IOMMU_DOMAIN_PAGING |	\
+				 __IOMMU_DOMAIN_KVM)
 
 struct iommu_domain {
 	unsigned type;
@@ -421,6 +425,8 @@ extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
 extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
 extern struct iommu_group *iommu_group_get_by_id(int id);
 extern void iommu_domain_free(struct iommu_domain *domain);
+extern struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
+						    unsigned int t);
 extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
@@ -708,6 +714,12 @@ static inline void iommu_domain_free(struct iommu_domain *domain)
 {
 }
 
+static inline struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
+							   unsigned int t)
+{
+	return NULL;
+}
+
 static inline int iommu_attach_device(struct iommu_domain *domain,
 				      struct device *dev)
 {
-- 
2.27.0


* [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (13 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 21:38   ` Jason Gunthorpe
  2022-03-14 22:50   ` Alex Williamson
  2022-03-14 19:44 ` [PATCH v4 16/32] vfio-pci/zdev: add function handle to clp base capability Matthew Rosato
                   ` (17 subsequent siblings)
  32 siblings, 2 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

s390x will introduce a new IOMMU domain type where the mappings are
managed by KVM rather than in response to userspace mapping ioctls.  Allow
this type to be specified on the VFIO_SET_IOMMU ioctl, triggering the
appropriate IOMMU interface to override the default domain.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 drivers/vfio/vfio_iommu_type1.c | 12 +++++++++++-
 include/uapi/linux/vfio.h       |  6 ++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9394aa9444c1..0bec97077d61 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -77,6 +77,7 @@ struct vfio_iommu {
 	bool			nesting;
 	bool			dirty_page_tracking;
 	bool			container_open;
+	bool			kvm;
 	struct list_head	emulated_iommu_groups;
 };
 
@@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		goto out_free_group;
 
 	ret = -EIO;
-	domain->domain = iommu_domain_alloc(bus);
+
+	if (iommu->kvm)
+		domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
+	else
+		domain->domain = iommu_domain_alloc(bus);
+
 	if (!domain->domain)
 		goto out_free_domain;
 
@@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	case VFIO_TYPE1v2_IOMMU:
 		iommu->v2 = true;
 		break;
+	case VFIO_KVM_IOMMU:
+		iommu->kvm = true;
+		break;
 	default:
 		kfree(iommu);
 		return ERR_PTR(-EINVAL);
@@ -2637,6 +2646,7 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 	case VFIO_TYPE1_NESTING_IOMMU:
 	case VFIO_UNMAP_ALL:
 	case VFIO_UPDATE_VADDR:
+	case VFIO_KVM_IOMMU:
 		return 1;
 	case VFIO_DMA_CC_IOMMU:
 		if (!iommu)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ef33ea002b0b..666edb6957ac 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -52,6 +52,12 @@
 /* Supports the vaddr flag for DMA map and unmap */
 #define VFIO_UPDATE_VADDR		10
 
+/*
+ * The KVM_IOMMU type implies that the hypervisor will control the mappings
+ * rather than userspace.
+ */
+#define VFIO_KVM_IOMMU			11
+
 /*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
-- 
2.27.0


* [PATCH v4 16/32] vfio-pci/zdev: add function handle to clp base capability
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (14 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 17/32] KVM: s390: pci: add basic kvm_zdev structure Matthew Rosato
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

The function handle is a system-wide unique identifier for a zPCI
device.  It is used as input for various zPCI operations.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 drivers/vfio/pci/vfio_pci_zdev.c | 5 +++--
 include/uapi/linux/vfio_zdev.h   | 3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index ea4c0d2b0663..4a653ce480c7 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -23,14 +23,15 @@ static int zpci_base_cap(struct zpci_dev *zdev, struct vfio_info_cap *caps)
 {
 	struct vfio_device_info_cap_zpci_base cap = {
 		.header.id = VFIO_DEVICE_INFO_CAP_ZPCI_BASE,
-		.header.version = 1,
+		.header.version = 2,
 		.start_dma = zdev->start_dma,
 		.end_dma = zdev->end_dma,
 		.pchid = zdev->pchid,
 		.vfn = zdev->vfn,
 		.fmb_length = zdev->fmb_length,
 		.pft = zdev->pft,
-		.gid = zdev->pfgid
+		.gid = zdev->pfgid,
+		.fh = zdev->fh
 	};
 
 	return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
index b4309397b6b2..78c022af3d29 100644
--- a/include/uapi/linux/vfio_zdev.h
+++ b/include/uapi/linux/vfio_zdev.h
@@ -29,6 +29,9 @@ struct vfio_device_info_cap_zpci_base {
 	__u16 fmb_length;	/* Measurement Block Length (in bytes) */
 	__u8 pft;		/* PCI Function Type */
 	__u8 gid;		/* PCI function group ID */
+	/* End of version 1 */
+	__u32 fh;		/* PCI function handle */
+	/* End of version 2 */
 };
 
 /**
-- 
2.27.0


* [PATCH v4 17/32] KVM: s390: pci: add basic kvm_zdev structure
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (15 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 16/32] vfio-pci/zdev: add function handle to clp base capability Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 18/32] iommu/s390: add support for IOMMU_DOMAIN_KVM Matthew Rosato
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

This structure will be used to carry KVM passthrough information related to
zPCI devices.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_pci.h | 27 +++++++++++++++++++++++
 arch/s390/include/asm/pci.h     |  3 +++
 arch/s390/kvm/Makefile          |  1 +
 arch/s390/kvm/pci.c             | 38 +++++++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+)
 create mode 100644 arch/s390/include/asm/kvm_pci.h
 create mode 100644 arch/s390/kvm/pci.c

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
new file mode 100644
index 000000000000..ae8669105f72
--- /dev/null
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KVM PCI Passthrough for virtual machines on s390
+ *
+ * Copyright IBM Corp. 2022
+ *
+ *    Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ */
+
+#ifndef ASM_KVM_PCI_H
+#define ASM_KVM_PCI_H
+
+#include <linux/types.h>
+#include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/pci.h>
+
+struct kvm_zdev {
+	struct zpci_dev *zdev;
+	struct kvm *kvm;
+};
+
+int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
+void kvm_s390_pci_dev_release(struct zpci_dev *zdev);
+
+#endif /* ASM_KVM_PCI_H */
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index e8a3fd5bc169..4faff673078b 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -97,6 +97,7 @@ struct zpci_bar_struct {
 };
 
 struct s390_domain;
+struct kvm_zdev;
 
 #define ZPCI_FUNCTIONS_PER_BUS 256
 struct zpci_bus {
@@ -190,6 +191,8 @@ struct zpci_dev {
 	struct dentry	*debugfs_dev;
 
 	struct s390_domain *s390_domain; /* s390 IOMMU domain data */
+
+	struct kvm_zdev *kzdev; /* passthrough data */
 };
 
 static inline bool zdev_enabled(struct zpci_dev *zdev)
diff --git a/arch/s390/kvm/Makefile b/arch/s390/kvm/Makefile
index 26f4a74e5ce4..00cf6853d93f 100644
--- a/arch/s390/kvm/Makefile
+++ b/arch/s390/kvm/Makefile
@@ -10,4 +10,5 @@ ccflags-y := -Ivirt/kvm -Iarch/s390/kvm
 kvm-y += kvm-s390.o intercept.o interrupt.o priv.o sigp.o
 kvm-y += diag.o gaccess.o guestdbg.o vsie.o pv.o
 
+kvm-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
new file mode 100644
index 000000000000..612faf87126d
--- /dev/null
+++ b/arch/s390/kvm/pci.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * s390 kvm PCI passthrough support
+ *
+ * Copyright IBM Corp. 2022
+ *
+ *    Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/pci.h>
+#include <asm/kvm_pci.h>
+
+int kvm_s390_pci_dev_open(struct zpci_dev *zdev)
+{
+	struct kvm_zdev *kzdev;
+
+	kzdev = kzalloc(sizeof(struct kvm_zdev), GFP_KERNEL);
+	if (!kzdev)
+		return -ENOMEM;
+
+	kzdev->zdev = zdev;
+	zdev->kzdev = kzdev;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_open);
+
+void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
+{
+	struct kvm_zdev *kzdev;
+
+	kzdev = zdev->kzdev;
+	WARN_ON(kzdev->zdev != zdev);
+	zdev->kzdev = NULL;
+	kfree(kzdev);
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_release);
-- 
2.27.0


* [PATCH v4 18/32] iommu/s390: add support for IOMMU_DOMAIN_KVM
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (16 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 17/32] KVM: s390: pci: add basic kvm_zdev structure Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 19/32] KVM: s390: pci: do initial setup for AEN interpretation Matthew Rosato
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Add an alternate set of domain ops for type IOMMU_DOMAIN_KVM.  This type
is intended for use when KVM is managing the IOMMU domain on behalf of a
VM.  Mappings can only be performed once both a KVM and a guest IOTA
(address translation anchor) have been registered with the domain.

The map operation is expected to be received in response to an
04 intercept of a guest RPCIT instruction, and will perform a
synchronization of the host DMA and guest DMA tables over the
range specified.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_pci.h |   6 +
 arch/s390/include/asm/pci_dma.h |   3 +
 drivers/iommu/Kconfig           |   8 +
 drivers/iommu/Makefile          |   1 +
 drivers/iommu/s390-iommu.c      |  49 ++--
 drivers/iommu/s390-iommu.h      |  53 ++++
 drivers/iommu/s390-kvm-iommu.c  | 469 ++++++++++++++++++++++++++++++++
 7 files changed, 562 insertions(+), 27 deletions(-)
 create mode 100644 drivers/iommu/s390-iommu.h
 create mode 100644 drivers/iommu/s390-kvm-iommu.c

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index ae8669105f72..ebc0da5d9ac1 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -11,6 +11,7 @@
 #define ASM_KVM_PCI_H
 
 #include <linux/types.h>
+#include <linux/iommu.h>
 #include <linux/kvm_types.h>
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
@@ -19,9 +20,14 @@
 struct kvm_zdev {
 	struct zpci_dev *zdev;
 	struct kvm *kvm;
+	struct iommu_domain *dom; /* Used to invoke IOMMU API for RPCIT */
 };
 
 int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
 void kvm_s390_pci_dev_release(struct zpci_dev *zdev);
 
+int zpci_iommu_attach_kvm(struct zpci_dev *zdev, struct kvm *kvm);
+int zpci_iommu_kvm_assign_iota(struct zpci_dev *zdev, u64 iota);
+int zpci_iommu_kvm_remove_iota(struct zpci_dev *zdev);
+
 #endif /* ASM_KVM_PCI_H */
diff --git a/arch/s390/include/asm/pci_dma.h b/arch/s390/include/asm/pci_dma.h
index 91e63426bdc5..38004e0a4383 100644
--- a/arch/s390/include/asm/pci_dma.h
+++ b/arch/s390/include/asm/pci_dma.h
@@ -50,6 +50,9 @@ enum zpci_ioat_dtype {
 #define ZPCI_TABLE_ALIGN		ZPCI_TABLE_SIZE
 #define ZPCI_TABLE_ENTRY_SIZE		(sizeof(unsigned long))
 #define ZPCI_TABLE_ENTRIES		(ZPCI_TABLE_SIZE / ZPCI_TABLE_ENTRY_SIZE)
+#define ZPCI_TABLE_PAGES		(ZPCI_TABLE_SIZE >> PAGE_SHIFT)
+#define ZPCI_TABLE_ENTRIES_PAGES	(ZPCI_TABLE_ENTRIES * ZPCI_TABLE_PAGES)
+#define ZPCI_TABLE_ENTRIES_PER_PAGE	(ZPCI_TABLE_ENTRIES / ZPCI_TABLE_PAGES)
 
 #define ZPCI_TABLE_BITS			11
 #define ZPCI_PT_BITS			8
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3eb68fa1b8cc..9637f73925ec 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -411,6 +411,14 @@ config S390_AP_IOMMU
 	  Enables bits of IOMMU API required by VFIO. The iommu_ops
 	  is not implemented as it is not necessary for VFIO.
 
+config S390_KVM_IOMMU
+	bool "S390 KVM IOMMU Support"
+	depends on (S390_IOMMU && KVM) || COMPILE_TEST
+	select IOMMU_API
+	help
+	  Extends the S390 IOMMU API to support a domain owned and managed by
+	  KVM, allowing KVM rather than userspace to manage the nested mappings.
+
 config MTK_IOMMU
 	tristate "MediaTek IOMMU Support"
 	depends on ARCH_MEDIATEK || COMPILE_TEST
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index bc7f730edbb0..5476e978d7f5 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
+obj-$(CONFIG_S390_KVM_IOMMU) += s390-kvm-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 73a85c599dc2..0ead37f6e232 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -11,6 +11,7 @@
 #include <linux/iommu-helper.h>
 #include <linux/sizes.h>
 #include <asm/pci_dma.h>
+#include "s390-iommu.h"
 
 /*
  * Physically contiguous memory regions can be mapped with 4 KiB alignment,
@@ -21,24 +22,6 @@
 
 static const struct iommu_ops s390_iommu_ops;
 
-struct s390_domain {
-	struct iommu_domain	domain;
-	struct list_head	devices;
-	unsigned long		*dma_table;
-	spinlock_t		dma_table_lock;
-	spinlock_t		list_lock;
-};
-
-struct s390_domain_device {
-	struct list_head	list;
-	struct zpci_dev		*zdev;
-};
-
-static struct s390_domain *to_s390_domain(struct iommu_domain *dom)
-{
-	return container_of(dom, struct s390_domain, domain);
-}
-
 static bool s390_iommu_capable(enum iommu_cap cap)
 {
 	switch (cap) {
@@ -55,7 +38,12 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 {
 	struct s390_domain *s390_domain;
 
-	if (domain_type != IOMMU_DOMAIN_UNMANAGED)
+	if (domain_type != IOMMU_DOMAIN_UNMANAGED &&
+	    domain_type != IOMMU_DOMAIN_KVM)
+		return NULL;
+
+	if (domain_type == IOMMU_DOMAIN_KVM &&
+	    !IS_ENABLED(CONFIG_S390_KVM_IOMMU))
 		return NULL;
 
 	s390_domain = kzalloc(sizeof(*s390_domain), GFP_KERNEL);
@@ -68,23 +56,30 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 		return NULL;
 	}
 
+	/* If KVM-managed, swap in alternate ops now */
+	if (IS_ENABLED(CONFIG_S390_KVM_IOMMU) &&
+	    domain_type == IOMMU_DOMAIN_KVM)
+		s390_domain->domain.ops = &s390_kvm_domain_ops;
+
 	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
+	mutex_init(&s390_domain->kvm_dom.ioat_lock);
 	INIT_LIST_HEAD(&s390_domain->devices);
 
 	return &s390_domain->domain;
 }
 
-static void s390_domain_free(struct iommu_domain *domain)
+void s390_domain_free(struct iommu_domain *domain)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 
 	dma_cleanup_tables(s390_domain->dma_table);
+	mutex_destroy(&s390_domain->kvm_dom.ioat_lock);
 	kfree(s390_domain);
 }
 
-static int s390_iommu_attach_device(struct iommu_domain *domain,
-				    struct device *dev)
+int s390_iommu_attach_device(struct iommu_domain *domain,
+			     struct device *dev)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
@@ -143,8 +138,8 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	return rc;
 }
 
-static void s390_iommu_detach_device(struct iommu_domain *domain,
-				     struct device *dev)
+void s390_iommu_detach_device(struct iommu_domain *domain,
+			      struct device *dev)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	struct zpci_dev *zdev = to_zpci_dev(dev);
@@ -200,7 +195,7 @@ static void s390_iommu_release_device(struct device *dev)
 	if (zdev && zdev->s390_domain) {
 		domain = iommu_get_domain_for_dev(dev);
 		if (domain)
-			s390_iommu_detach_device(domain, dev);
+			domain->ops->detach_dev(domain, dev);
 	}
 }
 
@@ -282,8 +277,8 @@ static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova,
 	return rc;
 }
 
-static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
-					   dma_addr_t iova)
+phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
+				    dma_addr_t iova)
 {
 	struct s390_domain *s390_domain = to_s390_domain(domain);
 	unsigned long *sto, *pto, *rto, flags;
diff --git a/drivers/iommu/s390-iommu.h b/drivers/iommu/s390-iommu.h
new file mode 100644
index 000000000000..21c8243a36b1
--- /dev/null
+++ b/drivers/iommu/s390-iommu.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * IOMMU API for s390 PCI devices
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ */
+
+#ifndef _S390_IOMMU_H
+#define _S390_IOMMU_H
+
+#include <linux/iommu.h>
+#include <linux/kvm_host.h>
+
+extern const struct iommu_domain_ops s390_kvm_domain_ops;
+
+struct s390_kvm_domain {
+	struct kvm		*kvm;
+	unsigned long		*head[ZPCI_TABLE_PAGES];
+	unsigned long		**seg;
+	unsigned long		***pt;
+	struct page *(*pin)(struct kvm *kvm, gfn_t gfn);
+	void (*unpin)(kvm_pfn_t pfn);
+	struct mutex		ioat_lock;
+	bool			map_enabled;
+};
+
+struct s390_domain {
+	struct iommu_domain	domain;
+	struct list_head	devices;
+	unsigned long		*dma_table;
+	spinlock_t		dma_table_lock;
+	spinlock_t		list_lock;
+	struct s390_kvm_domain	kvm_dom;
+};
+
+struct s390_domain_device {
+	struct list_head	list;
+	struct zpci_dev		*zdev;
+};
+
+static inline struct s390_domain *to_s390_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct s390_domain, domain);
+}
+
+void s390_domain_free(struct iommu_domain *domain);
+int s390_iommu_attach_device(struct iommu_domain *domain, struct device *dev);
+void s390_iommu_detach_device(struct iommu_domain *domain, struct device *dev);
+phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
+				    dma_addr_t iova);
+
+#endif /* _S390_IOMMU_H */
diff --git a/drivers/iommu/s390-kvm-iommu.c b/drivers/iommu/s390-kvm-iommu.c
new file mode 100644
index 000000000000..d24e6904d5f8
--- /dev/null
+++ b/drivers/iommu/s390-kvm-iommu.c
@@ -0,0 +1,469 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * IOMMU API domain ops for s390 PCI devices using KVM passthrough
+ *
+ * Copyright IBM Corp. 2022
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ */
+
+#include <linux/pci.h>
+#include <linux/iommu.h>
+#include <linux/iommu-helper.h>
+#include <linux/sizes.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_pci.h>
+#include <asm/pci_dma.h>
+#include "s390-iommu.h"
+
+const struct iommu_domain_ops s390_kvm_domain_ops;
+
+static int dma_shadow_cpu_trans(struct s390_kvm_domain *kvm_dom,
+				unsigned long *entry, unsigned long *gentry)
+{
+	phys_addr_t gaddr = 0;
+	unsigned long idx;
+	struct page *page;
+	kvm_pfn_t pfn;
+	gpa_t addr;
+	int rc = 0;
+
+	if (pt_entry_isvalid(*gentry)) {
+		/* pin and validate */
+		addr = *gentry & ZPCI_PTE_ADDR_MASK;
+		idx = srcu_read_lock(&kvm_dom->kvm->srcu);
+		page = kvm_dom->pin(kvm_dom->kvm, gpa_to_gfn(addr));
+		srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+		if (is_error_page(page))
+			return -EIO;
+		gaddr = page_to_phys(page) + (addr & ~PAGE_MASK);
+	}
+
+	if (pt_entry_isvalid(*entry)) {
+		/* Either we are invalidating, replacing or no-op */
+		if (gaddr != 0) {
+			if ((*entry & ZPCI_PTE_ADDR_MASK) == gaddr) {
+				/* Duplicate */
+				kvm_dom->unpin(*entry >> PAGE_SHIFT);
+			} else {
+				/* Replace */
+				pfn = (*entry >> PAGE_SHIFT);
+				invalidate_pt_entry(entry);
+				set_pt_pfaa(entry, gaddr);
+				validate_pt_entry(entry);
+				kvm_dom->unpin(pfn);
+				rc = 1;
+			}
+		} else {
+			/* Invalidate */
+			pfn = (*entry >> PAGE_SHIFT);
+			invalidate_pt_entry(entry);
+			kvm_dom->unpin(pfn);
+			rc = 1;
+		}
+	} else if (gaddr != 0) {
+		/* New Entry */
+		set_pt_pfaa(entry, gaddr);
+		validate_pt_entry(entry);
+	}
+
+	return rc;
+}
+
+static unsigned long *dma_walk_guest_cpu_trans(struct s390_kvm_domain *kvm_dom,
+					       dma_addr_t dma_addr)
+{
+	unsigned long *rto, *sto, *pto;
+	unsigned int rtx, rts, sx, px, idx;
+	struct page *page;
+	gpa_t addr;
+	int i;
+
+	/* Pin guest segment table if needed */
+	rtx = calc_rtx(dma_addr);
+	rto = kvm_dom->head[(rtx / ZPCI_TABLE_ENTRIES_PER_PAGE)];
+	rts = rtx * ZPCI_TABLE_PAGES;
+	if (!kvm_dom->seg[rts]) {
+		if (!reg_entry_isvalid(rto[rtx % ZPCI_TABLE_ENTRIES_PER_PAGE]))
+			return NULL;
+		sto = get_rt_sto(rto[rtx % ZPCI_TABLE_ENTRIES_PER_PAGE]);
+		addr = ((u64)sto & ZPCI_RTE_ADDR_MASK);
+		idx = srcu_read_lock(&kvm_dom->kvm->srcu);
+		for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+			page = kvm_dom->pin(kvm_dom->kvm, gpa_to_gfn(addr));
+			if (is_error_page(page)) {
+				srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+				return NULL;
+			}
+			kvm_dom->seg[rts + i] = (page_to_virt(page) +
+						 (addr & ~PAGE_MASK));
+			addr += PAGE_SIZE;
+		}
+		srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+	}
+
+	/* Allocate pin pointers for another segment table if needed */
+	if (!kvm_dom->pt[rtx]) {
+		kvm_dom->pt[rtx] = kcalloc(ZPCI_TABLE_ENTRIES,
+					   (sizeof(unsigned long *)),
+					   GFP_KERNEL);
+		if (!kvm_dom->pt[rtx])
+			return NULL;
+	}
+	/* Pin guest page table if needed */
+	sx = calc_sx(dma_addr);
+	sto = kvm_dom->seg[(rts + (sx / ZPCI_TABLE_ENTRIES_PER_PAGE))];
+	if (!kvm_dom->pt[rtx][sx]) {
+		if (!reg_entry_isvalid(sto[sx % ZPCI_TABLE_ENTRIES_PER_PAGE]))
+			return NULL;
+		pto = get_st_pto(sto[sx % ZPCI_TABLE_ENTRIES_PER_PAGE]);
+		if (!pto)
+			return NULL;
+		addr = ((u64)pto & ZPCI_STE_ADDR_MASK);
+		idx = srcu_read_lock(&kvm_dom->kvm->srcu);
+		page = kvm_dom->pin(kvm_dom->kvm, gpa_to_gfn(addr));
+		srcu_read_unlock(&kvm_dom->kvm->srcu, idx);
+		if (is_error_page(page))
+			return NULL;
+		kvm_dom->pt[rtx][sx] = page_to_virt(page) + (addr & ~PAGE_MASK);
+	}
+	pto = kvm_dom->pt[rtx][sx];
+
+	/* Return guest PTE */
+	px = calc_px(dma_addr);
+	return &pto[px];
+}
+
+static int dma_table_shadow(struct s390_domain *s390_domain,
+			    dma_addr_t dma_addr, size_t nr_pages,
+			    size_t *mapped_pages)
+{
+	struct s390_kvm_domain *kvm_dom = &s390_domain->kvm_dom;
+	unsigned long *entry, *gentry;
+	int rc = 0, rc2;
+
+	for (*mapped_pages = 0; *mapped_pages < nr_pages; (*mapped_pages)++) {
+		gentry = dma_walk_guest_cpu_trans(kvm_dom, dma_addr);
+		if (!gentry)
+			continue;
+		entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
+
+		if (!entry)
+			return -ENOMEM;
+
+		rc2 = dma_shadow_cpu_trans(kvm_dom, entry, gentry);
+		if (rc2 < 0)
+			return -EIO;
+
+		dma_addr += PAGE_SIZE;
+		rc += rc2;
+	}
+
+	return rc;
+}
+
+static int s390_kvm_iommu_update_trans(struct s390_domain *s390_domain,
+				       dma_addr_t dma_addr, size_t nr_pages,
+				       size_t *mapped)
+{
+	struct s390_domain_device *domain_device;
+	unsigned long irq_flags;
+	size_t mapped_pages;
+	int rc = 0;
+	u8 status;
+
+	mutex_lock(&s390_domain->kvm_dom.ioat_lock);
+	rc = dma_table_shadow(s390_domain, dma_addr, nr_pages, &mapped_pages);
+
+	/* If error or no new mappings, leave immediately without refresh */
+	if (rc <= 0)
+		goto exit;
+
+	spin_lock_irqsave(&s390_domain->list_lock, irq_flags);
+	list_for_each_entry(domain_device, &s390_domain->devices, list) {
+		rc = zpci_refresh_trans((u64) domain_device->zdev->fh << 32,
+					dma_addr, nr_pages * PAGE_SIZE,
+					&status);
+		if (rc) {
+			if (status == 0)
+				rc = -EINVAL;
+			else
+				rc = -EIO;
+		}
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, irq_flags);
+
+exit:
+	if (mapped)
+		*mapped = mapped_pages << PAGE_SHIFT;
+
+	mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+	return rc;
+}
+
+static int s390_kvm_iommu_map(struct iommu_domain *domain, unsigned long iova,
+			      phys_addr_t paddr, size_t size, int prot,
+			      gfp_t gfp)
+{
+	struct s390_domain *s390_domain = to_s390_domain(domain);
+	size_t nr_pages;
+
+	int rc = 0;
+
+	if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+		return -EINVAL;
+
+	/* Can only perform mapping when a guest IOTA is registered */
+	if (!s390_domain->kvm_dom.map_enabled)
+		return -EINVAL;
+
+	nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	if (!nr_pages)
+		return -EINVAL;
+
+	rc = s390_kvm_iommu_update_trans(s390_domain, iova, nr_pages, NULL);
+
+	return rc;
+}
+
+static int s390_kvm_iommu_map_pages(struct iommu_domain *domain,
+				    unsigned long iova, phys_addr_t paddr,
+				    size_t pgsize, size_t pgcount, int prot,
+				    gfp_t gfp, size_t *mapped)
+{
+	struct s390_domain *s390_domain = to_s390_domain(domain);
+	size_t nr_pages;
+
+	int rc = 0;
+
+	if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+		return -EINVAL;
+
+	/* Can only perform mapping when a guest IOTA is registered */
+	if (!s390_domain->kvm_dom.map_enabled)
+		return -EINVAL;
+
+	nr_pages = pgcount * (pgsize / PAGE_SIZE);
+	if (!nr_pages)
+		return -EINVAL;
+
+	rc = s390_kvm_iommu_update_trans(s390_domain, iova, nr_pages, mapped);
+
+	return rc;
+}
+
+static void free_pt_entry(struct s390_kvm_domain *kvm_dom, int st, int pt)
+{
+	if (!kvm_dom->pt[st][pt])
+		return;
+
+	kvm_dom->unpin((u64)kvm_dom->pt[st][pt]);
+}
+
+static void free_seg_entry(struct s390_kvm_domain *kvm_dom, int entry)
+{
+	int i, st, count = 0;
+
+	for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+		if (kvm_dom->seg[entry + i]) {
+			kvm_dom->unpin((u64)kvm_dom->seg[entry + i]);
+			count++;
+		}
+	}
+
+	if (count == 0)
+		return;
+
+	st = entry / ZPCI_TABLE_PAGES;
+	for (i = 0; i < ZPCI_TABLE_ENTRIES; i++)
+		free_pt_entry(kvm_dom, st, i);
+	kfree(kvm_dom->pt[st]);
+}
+
+static int s390_kvm_clear_ioat_tables(struct s390_domain *s390_domain)
+{
+	struct s390_kvm_domain *kvm_dom = &s390_domain->kvm_dom;
+	unsigned long *entry;
+	dma_addr_t dma_addr;
+	kvm_pfn_t pfn;
+	int i;
+
+	if (!kvm_dom->kvm || !kvm_dom->map_enabled)
+		return -EINVAL;
+
+	mutex_lock(&s390_domain->kvm_dom.ioat_lock);
+
+	/* Invalidate and unpin remaining guest pages */
+	for (dma_addr = s390_domain->domain.geometry.aperture_start;
+	     dma_addr < s390_domain->domain.geometry.aperture_end;
+	     dma_addr += PAGE_SIZE) {
+		entry = dma_walk_cpu_trans(s390_domain->dma_table, dma_addr);
+		if (entry && pt_entry_isvalid(*entry)) {
+			pfn = (*entry >> PAGE_SHIFT);
+			invalidate_pt_entry(entry);
+			kvm_dom->unpin(pfn);
+		}
+	}
+
+	/* Unpin all shadow tables */
+	for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+		kvm_dom->unpin((u64)kvm_dom->head[i] >> PAGE_SHIFT);
+		kvm_dom->head[i] = 0;
+	}
+
+	for (i = 0; i < ZPCI_TABLE_ENTRIES_PAGES; i += ZPCI_TABLE_PAGES)
+		free_seg_entry(kvm_dom, i);
+
+	kfree(kvm_dom->seg);
+	kfree(kvm_dom->pt);
+
+	mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+
+	kvm_dom->map_enabled = false;
+
+	return 0;
+}
+
+static void s390_kvm_domain_free(struct iommu_domain *domain)
+{
+	struct s390_domain *s390_domain = to_s390_domain(domain);
+
+	s390_kvm_clear_ioat_tables(s390_domain);
+
+	if (s390_domain->kvm_dom.kvm) {
+		symbol_put(gfn_to_page);
+		symbol_put(kvm_release_pfn_dirty);
+	}
+
+	s390_domain_free(domain);
+}
+
+int zpci_iommu_attach_kvm(struct zpci_dev *zdev, struct kvm *kvm)
+{
+	struct s390_domain *s390_domain = zdev->s390_domain;
+	struct iommu_domain *domain = &s390_domain->domain;
+	struct s390_domain_device *domain_device;
+	unsigned long flags;
+	int rc = 0;
+
+	if (domain->type != IOMMU_DOMAIN_KVM)
+		return -EINVAL;
+
+	if (s390_domain->kvm_dom.kvm != 0)
+		return -EINVAL;
+
+	spin_lock_irqsave(&s390_domain->list_lock, flags);
+	list_for_each_entry(domain_device, &s390_domain->devices, list) {
+		if (domain_device->zdev->kzdev->kvm != kvm) {
+			rc = -EINVAL;
+			break;
+		}
+		domain_device->zdev->kzdev->dom = domain;
+	}
+	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+
+	if (rc)
+		return rc;
+
+	s390_domain->kvm_dom.pin = symbol_get(gfn_to_page);
+	if (!s390_domain->kvm_dom.pin)
+		return -EINVAL;
+
+	s390_domain->kvm_dom.unpin = symbol_get(kvm_release_pfn_dirty);
+	if (!s390_domain->kvm_dom.unpin) {
+		symbol_put(gfn_to_page);
+		return -EINVAL;
+	}
+
+	s390_domain->kvm_dom.kvm = kvm;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(zpci_iommu_attach_kvm);
+
+int zpci_iommu_kvm_assign_iota(struct zpci_dev *zdev, u64 iota)
+{
+	struct s390_domain *s390_domain = zdev->s390_domain;
+	struct s390_kvm_domain *kvm_dom = &s390_domain->kvm_dom;
+	gpa_t gpa = (gpa_t)(iota & ZPCI_RTE_ADDR_MASK);
+	struct page *page;
+	struct kvm *kvm;
+	unsigned int idx;
+	void *iaddr;
+	int i, rc;
+
+	/* Ensure KVM associated and IOTA not already registered */
+	if (!kvm_dom->kvm || kvm_dom->map_enabled)
+		return -EINVAL;
+
+	/* Ensure supported type specified */
+	if ((iota & ZPCI_IOTA_RTTO_FLAG) != ZPCI_IOTA_RTTO_FLAG)
+		return -EINVAL;
+
+	kvm = kvm_dom->kvm;
+	mutex_lock(&s390_domain->kvm_dom.ioat_lock);
+	idx = srcu_read_lock(&kvm->srcu);
+	for (i = 0; i < ZPCI_TABLE_PAGES; i++) {
+		page = kvm_dom->pin(kvm, gpa_to_gfn(gpa));
+		if (is_error_page(page)) {
+			srcu_read_unlock(&kvm->srcu, idx);
+			rc = -EIO;
+			goto unpin;
+		}
+		iaddr = page_to_virt(page) + (gpa & ~PAGE_MASK);
+		kvm_dom->head[i] = (unsigned long *)iaddr;
+		gpa += PAGE_SIZE;
+	}
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	kvm_dom->seg = kcalloc(ZPCI_TABLE_ENTRIES_PAGES,
+			       sizeof(unsigned long *), GFP_KERNEL);
+	if (!kvm_dom->seg) {
+		rc = -ENOMEM;
+		goto unpin;
+	}
+	kvm_dom->pt = kcalloc(ZPCI_TABLE_ENTRIES, sizeof(unsigned long **),
+			      GFP_KERNEL);
+	if (!kvm_dom->pt)
+		goto free_seg;
+
+	kvm_dom->map_enabled = true;
+	mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+	return 0;
+
+free_seg:
+	kfree(kvm_dom->seg);
+	rc = -ENOMEM;
+unpin:
+	/* Unpin only the table pages that were successfully pinned */
+	while (--i >= 0) {
+		kvm_dom->unpin(page_to_pfn(virt_to_page(kvm_dom->head[i])));
+		kvm_dom->head[i] = NULL;
+	}
+	mutex_unlock(&s390_domain->kvm_dom.ioat_lock);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(zpci_iommu_kvm_assign_iota);
+
+int zpci_iommu_kvm_remove_iota(struct zpci_dev *zdev)
+{
+	struct s390_domain *s390_domain = zdev->s390_domain;
+
+	return s390_kvm_clear_ioat_tables(s390_domain);
+}
+EXPORT_SYMBOL_GPL(zpci_iommu_kvm_remove_iota);
+
+const struct iommu_domain_ops s390_kvm_domain_ops = {
+	.attach_dev	= s390_iommu_attach_device,
+	.detach_dev	= s390_iommu_detach_device,
+	/*
+	 * All iommu mapping and unmapping operations are handled via the map
+	 * ops.  A map over a given range will synchronize the host and guest
+	 * DMA tables, performing the necessary mappings / unmappings to
+	 * synchronize the table states.
+	 * Partial mapping failures do not require a rewind, the guest will
+	 * receive an indication that will trigger a global refresh of the
+	 * tables.
+	 */
+	.map		= s390_kvm_iommu_map,
+	.map_pages	= s390_kvm_iommu_map_pages,
+	.unmap		= NULL,
+	.unmap_pages	= NULL,
+	.iova_to_phys	= s390_iommu_iova_to_phys,
+	.free		= s390_kvm_domain_free,
+};
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 19/32] KVM: s390: pci: do initial setup for AEN interpretation
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (17 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 18/32] iommu/s390: add support for IOMMU_DOMAIN_KVM Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 20/32] KVM: s390: pci: enable host forwarding of Adapter Event Notifications Matthew Rosato
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Initial setup for Adapter Event Notification Interpretation for zPCI
passthrough devices.  Specifically, allocate a structure for forwarding of
adapter events and pass the address of this structure to firmware.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/pci.h      |   4 +
 arch/s390/include/asm/pci_insn.h |  12 +++
 arch/s390/kvm/interrupt.c        |  14 +++
 arch/s390/kvm/kvm-s390.c         |   9 ++
 arch/s390/kvm/pci.c              | 154 +++++++++++++++++++++++++++++++
 arch/s390/kvm/pci.h              |  42 +++++++++
 arch/s390/pci/pci.c              |   6 ++
 7 files changed, 241 insertions(+)
 create mode 100644 arch/s390/kvm/pci.h

diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 4faff673078b..1ae49330d1c8 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -9,6 +9,7 @@
 #include <asm-generic/pci.h>
 #include <asm/pci_clp.h>
 #include <asm/pci_debug.h>
+#include <asm/pci_insn.h>
 #include <asm/sclp.h>
 
 #define PCIBIOS_MIN_IO		0x1000
@@ -204,6 +205,9 @@ extern const struct attribute_group *zpci_attr_groups[];
 extern unsigned int s390_pci_force_floating __initdata;
 extern unsigned int s390_pci_no_rid;
 
+extern union zpci_sic_iib *zpci_aipb;
+extern struct airq_iv *zpci_aif_sbv;
+
 /* -----------------------------------------------------------------------------
   Prototypes
 ----------------------------------------------------------------------------- */
diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index 32759c407b8f..ad9000295c82 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -101,6 +101,7 @@ struct zpci_fib {
 /* Set Interruption Controls Operation Controls  */
 #define	SIC_IRQ_MODE_ALL		0
 #define	SIC_IRQ_MODE_SINGLE		1
+#define	SIC_SET_AENI_CONTROLS		2
 #define	SIC_IRQ_MODE_DIRECT		4
 #define	SIC_IRQ_MODE_D_ALL		16
 #define	SIC_IRQ_MODE_D_SINGLE		17
@@ -127,9 +128,20 @@ struct zpci_cdiib {
 	u64 : 64;
 } __packed __aligned(8);
 
+/* adapter interruption parameters block */
+struct zpci_aipb {
+	u64 faisb;
+	u64 gait;
+	u16 : 13;
+	u16 afi : 3;
+	u32 : 32;
+	u16 faal;
+} __packed __aligned(8);
+
 union zpci_sic_iib {
 	struct zpci_diib diib;
 	struct zpci_cdiib cdiib;
+	struct zpci_aipb aipb;
 };
 
 DECLARE_STATIC_KEY_FALSE(have_mio);
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 65e75ca2fc5d..17c7deb516d2 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -32,6 +32,7 @@
 #include "kvm-s390.h"
 #include "gaccess.h"
 #include "trace-s390.h"
+#include "pci.h"
 
 #define PFAULT_INIT 0x0600
 #define PFAULT_DONE 0x0680
@@ -3286,6 +3287,11 @@ void kvm_s390_gib_destroy(void)
 {
 	if (!gib)
 		return;
+	if (IS_ENABLED(CONFIG_VFIO_PCI) && sclp.has_aeni && aift) {
+		mutex_lock(&aift->aift_lock);
+		kvm_s390_pci_aen_exit();
+		mutex_unlock(&aift->aift_lock);
+	}
 	chsc_sgib(0);
 	unregister_adapter_interrupt(&gib_alert_irq);
 	free_page((unsigned long)gib);
@@ -3323,6 +3329,14 @@ int kvm_s390_gib_init(u8 nisc)
 		goto out_unreg_gal;
 	}
 
+	if (IS_ENABLED(CONFIG_VFIO_PCI) && sclp.has_aeni) {
+		if (kvm_s390_pci_aen_init(nisc)) {
+			pr_err("Initializing AEN for PCI failed\n");
+			rc = -EIO;
+			goto out_unreg_gal;
+		}
+	}
+
 	KVM_EVENT(3, "gib 0x%pK (nisc=%d) initialized", gib, gib->nisc);
 	goto out;
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2296b1ff1e02..d89cd16b57dd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -48,6 +48,7 @@
 #include <asm/fpu/api.h>
 #include "kvm-s390.h"
 #include "gaccess.h"
+#include "pci.h"
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -503,6 +504,14 @@ int kvm_arch_init(void *opaque)
 		goto out;
 	}
 
+	if (IS_ENABLED(CONFIG_VFIO_PCI)) {
+		rc = kvm_s390_pci_init();
+		if (rc) {
+			pr_err("Unable to allocate AIFT for PCI\n");
+			goto out;
+		}
+	}
+
 	rc = kvm_s390_gib_init(GAL_ISC);
 	if (rc)
 		goto out;
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 612faf87126d..1c42d25de697 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -10,6 +10,148 @@
 #include <linux/kvm_host.h>
 #include <linux/pci.h>
 #include <asm/kvm_pci.h>
+#include <asm/pci.h>
+#include <asm/pci_insn.h>
+#include "pci.h"
+
+struct zpci_aift *aift;
+
+static inline int __set_irq_noiib(u16 ctl, u8 isc)
+{
+	union zpci_sic_iib iib = {{0}};
+
+	return zpci_set_irq_ctrl(ctl, isc, &iib);
+}
+
+/* Caller must hold the aift lock before calling this function */
+void kvm_s390_pci_aen_exit(void)
+{
+	unsigned long flags;
+	struct kvm_zdev **gait_kzdev;
+
+	/*
+	 * The aipb contents remain registered with firmware for the life of
+	 * the host kernel; the information is preserved in zpci_aipb and
+	 * zpci_aif_sbv in case the KVM module is inserted again later.
+	 * Clear the AIFT information and free anything not registered with
+	 * underlying firmware.
+	 */
+	spin_lock_irqsave(&aift->gait_lock, flags);
+	gait_kzdev = aift->kzdev;
+	aift->gait = NULL;
+	aift->sbv = NULL;
+	aift->kzdev = NULL;
+	spin_unlock_irqrestore(&aift->gait_lock, flags);
+
+	kfree(gait_kzdev);
+}
+
+static int zpci_setup_aipb(u8 nisc)
+{
+	struct page *page;
+	int size, rc;
+
+	zpci_aipb = kzalloc(sizeof(union zpci_sic_iib), GFP_KERNEL);
+	if (!zpci_aipb)
+		return -ENOMEM;
+
+	aift->sbv = airq_iv_create(ZPCI_NR_DEVICES, AIRQ_IV_ALLOC, 0);
+	if (!aift->sbv) {
+		rc = -ENOMEM;
+		goto free_aipb;
+	}
+	zpci_aif_sbv = aift->sbv;
+	size = get_order(PAGE_ALIGN(ZPCI_NR_DEVICES *
+						sizeof(struct zpci_gaite)));
+	page = alloc_pages(GFP_KERNEL | __GFP_ZERO, size);
+	if (!page) {
+		rc = -ENOMEM;
+		goto free_sbv;
+	}
+	aift->gait = (struct zpci_gaite *)page_to_virt(page);
+
+	zpci_aipb->aipb.faisb = virt_to_phys(aift->sbv->vector);
+	zpci_aipb->aipb.gait = virt_to_phys(aift->gait);
+	zpci_aipb->aipb.afi = nisc;
+	zpci_aipb->aipb.faal = ZPCI_NR_DEVICES;
+
+	/* Setup Adapter Event Notification Interpretation */
+	if (zpci_set_irq_ctrl(SIC_SET_AENI_CONTROLS, 0, zpci_aipb)) {
+		rc = -EIO;
+		goto free_gait;
+	}
+
+	return 0;
+
+free_gait:
+	size = get_order(PAGE_ALIGN(ZPCI_NR_DEVICES *
+				    sizeof(struct zpci_gaite)));
+	free_pages((unsigned long)aift->gait, size);
+free_sbv:
+	airq_iv_release(aift->sbv);
+	zpci_aif_sbv = NULL;
+free_aipb:
+	kfree(zpci_aipb);
+	zpci_aipb = NULL;
+
+	return rc;
+}
+
+static int zpci_reset_aipb(u8 nisc)
+{
+	/*
+	 * AEN registration can only happen once per system boot.  If
+	 * an aipb already exists then AEN was already registered and
+	 * we can re-use the aipb contents.  This can only happen if
+	 * the KVM module was removed and re-inserted.
+	 */
+	if (zpci_aipb->aipb.faal != ZPCI_NR_DEVICES ||
+	    zpci_aipb->aipb.afi != nisc) {
+		return -EINVAL;
+	}
+	aift->sbv = zpci_aif_sbv;
+	aift->gait = (struct zpci_gaite *)phys_to_virt(zpci_aipb->aipb.gait);
+
+	return 0;
+}
+
+int kvm_s390_pci_aen_init(u8 nisc)
+{
+	int rc = 0;
+
+	/* If already enabled for AEN, bail out now */
+	if (aift->gait || aift->sbv)
+		return -EPERM;
+
+	mutex_lock(&aift->aift_lock);
+	aift->kzdev = kcalloc(ZPCI_NR_DEVICES, sizeof(struct kvm_zdev *),
+			      GFP_KERNEL);
+	if (!aift->kzdev) {
+		rc = -ENOMEM;
+		goto unlock;
+	}
+
+	if (!zpci_aipb)
+		rc = zpci_setup_aipb(nisc);
+	else
+		rc = zpci_reset_aipb(nisc);
+	if (rc)
+		goto free_zdev;
+
+	/* Enable floating IRQs */
+	if (__set_irq_noiib(SIC_IRQ_MODE_SINGLE, nisc)) {
+		rc = -EIO;
+		kvm_s390_pci_aen_exit();
+	}
+
+	goto unlock;
+
+free_zdev:
+	kfree(aift->kzdev);
+unlock:
+	mutex_unlock(&aift->aift_lock);
+	return rc;
+}
 
 int kvm_s390_pci_dev_open(struct zpci_dev *zdev)
 {
@@ -36,3 +178,15 @@ void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
 	kfree(kzdev);
 }
 EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_release);
+
+int kvm_s390_pci_init(void)
+{
+	aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
+	if (!aift)
+		return -ENOMEM;
+
+	spin_lock_init(&aift->gait_lock);
+	mutex_init(&aift->aift_lock);
+
+	return 0;
+}
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
new file mode 100644
index 000000000000..19609d7a53a7
--- /dev/null
+++ b/arch/s390/kvm/pci.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * s390 kvm PCI passthrough support
+ *
+ * Copyright IBM Corp. 2022
+ *
+ *    Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ */
+
+#ifndef __KVM_S390_PCI_H
+#define __KVM_S390_PCI_H
+
+#include <linux/pci.h>
+#include <linux/mutex.h>
+#include <asm/airq.h>
+#include <asm/kvm_pci.h>
+
+struct zpci_gaite {
+	u32 gisa;
+	u8 gisc;
+	u8 count;
+	u8 reserved;
+	u8 aisbo;
+	u64 aisb;
+};
+
+struct zpci_aift {
+	struct zpci_gaite *gait;
+	struct airq_iv *sbv;
+	struct kvm_zdev **kzdev;
+	spinlock_t gait_lock; /* Protects the gait, used during AEN forward */
+	struct mutex aift_lock; /* Protects the other structures in aift */
+};
+
+extern struct zpci_aift *aift;
+
+int kvm_s390_pci_aen_init(u8 nisc);
+void kvm_s390_pci_aen_exit(void);
+
+int kvm_s390_pci_init(void);
+
+#endif /* __KVM_S390_PCI_H */
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 04c16312ad54..13033717cd4e 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -61,6 +61,12 @@ DEFINE_STATIC_KEY_FALSE(have_mio);
 
 static struct kmem_cache *zdev_fmb_cache;
 
+/* AEN structures that must be preserved over KVM module re-insertion */
+union zpci_sic_iib *zpci_aipb;
+EXPORT_SYMBOL_GPL(zpci_aipb);
+struct airq_iv *zpci_aif_sbv;
+EXPORT_SYMBOL_GPL(zpci_aif_sbv);
+
 struct zpci_dev *get_zdev_by_fid(u32 fid)
 {
 	struct zpci_dev *tmp, *zdev = NULL;
-- 
2.27.0


* [PATCH v4 20/32] KVM: s390: pci: enable host forwarding of Adapter Event Notifications
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (18 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 19/32] KVM: s390: pci: do initial setup for AEN interpretation Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 21/32] KVM: s390: mechanism to enable guest zPCI Interpretation Matthew Rosato
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

In cases where interrupts are not forwarded to the guest via firmware,
KVM is responsible for ensuring delivery.  When an interrupt arrives
with the forwarding bit set, we must process the forwarding tables
until all interrupts are delivered.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/include/asm/tpi.h      | 13 ++++++
 arch/s390/kvm/interrupt.c        | 77 +++++++++++++++++++++++++++++++-
 arch/s390/kvm/kvm-s390.c         |  3 +-
 arch/s390/kvm/pci.h              | 10 +++++
 5 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a22c9266ea05..b468d3a2215e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -757,6 +757,7 @@ struct kvm_vm_stat {
 	u64 inject_pfault_done;
 	u64 inject_service_signal;
 	u64 inject_virtio;
+	u64 aen_forward;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/s390/include/asm/tpi.h b/arch/s390/include/asm/tpi.h
index 1ac538b8cbf5..f76e5fdff23a 100644
--- a/arch/s390/include/asm/tpi.h
+++ b/arch/s390/include/asm/tpi.h
@@ -19,6 +19,19 @@ struct tpi_info {
 	u32 :12;
 } __packed __aligned(4);
 
+/* I/O-Interruption Code as stored by TPI for an Adapter I/O */
+struct tpi_adapter_info {
+	u32 aism:8;
+	u32 :22;
+	u32 error:1;
+	u32 forward:1;
+	u32 reserved;
+	u32 adapter_IO:1;
+	u32 directed_irq:1;
+	u32 isc:3;
+	u32 :27;
+} __packed __aligned(4);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_S390_TPI_H */
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 17c7deb516d2..513b393d5d0d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -3271,11 +3271,86 @@ int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc)
 }
 EXPORT_SYMBOL_GPL(kvm_s390_gisc_unregister);
 
+static void aen_host_forward(unsigned long si)
+{
+	struct kvm_s390_gisa_interrupt *gi;
+	struct zpci_gaite *gaite;
+	struct kvm *kvm;
+
+	gaite = &aift->gait[si];
+	if (gaite->count == 0)
+		return;
+	if (gaite->aisb != 0)
+		set_bit_inv(gaite->aisbo, (unsigned long *)gaite->aisb);
+
+	kvm = kvm_s390_pci_si_to_kvm(aift, si);
+	if (!kvm)
+		return;
+	gi = &kvm->arch.gisa_int;
+
+	if (!(gi->origin->g1.simm & AIS_MODE_MASK(gaite->gisc)) ||
+	    !(gi->origin->g1.nimm & AIS_MODE_MASK(gaite->gisc))) {
+		gisa_set_ipm_gisc(gi->origin, gaite->gisc);
+		if (hrtimer_active(&gi->timer))
+			hrtimer_cancel(&gi->timer);
+		hrtimer_start(&gi->timer, 0, HRTIMER_MODE_REL);
+		kvm->stat.aen_forward++;
+	}
+}
+
+static void aen_process_gait(u8 isc)
+{
+	bool found = false, first = true;
+	union zpci_sic_iib iib = {{0}};
+	unsigned long si, flags;
+
+	spin_lock_irqsave(&aift->gait_lock, flags);
+
+	if (!aift->gait) {
+		spin_unlock_irqrestore(&aift->gait_lock, flags);
+		return;
+	}
+
+	for (si = 0;;) {
+		/* Scan adapter summary indicator bit vector */
+		si = airq_iv_scan(aift->sbv, si, airq_iv_end(aift->sbv));
+		if (si == -1UL) {
+			if (first || found) {
+				/* Re-enable interrupts. */
+				zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, isc,
+						  &iib);
+				first = found = false;
+			} else {
+				/* Interrupts on and all bits processed */
+				break;
+			}
+			found = false;
+			si = 0;
+			/* Scan again after re-enabling interrupts */
+			continue;
+		}
+		found = true;
+		aen_host_forward(si);
+	}
+
+	spin_unlock_irqrestore(&aift->gait_lock, flags);
+}
+
 static void gib_alert_irq_handler(struct airq_struct *airq,
 				  struct tpi_info *tpi_info)
 {
+	struct tpi_adapter_info *info = (struct tpi_adapter_info *)tpi_info;
+
 	inc_irq_stat(IRQIO_GAL);
-	process_gib_alert_list();
+
+	if (IS_ENABLED(CONFIG_VFIO_PCI) && (info->forward || info->error)) {
+		aen_process_gait(info->isc);
+		if (info->aism != 0)
+			process_gib_alert_list();
+	} else {
+		process_gib_alert_list();
+	}
 }
 
 static struct airq_struct gib_alert_irq = {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d89cd16b57dd..32e75f6f4e4d 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -65,7 +65,8 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	STATS_DESC_COUNTER(VM, inject_float_mchk),
 	STATS_DESC_COUNTER(VM, inject_pfault_done),
 	STATS_DESC_COUNTER(VM, inject_service_signal),
-	STATS_DESC_COUNTER(VM, inject_virtio)
+	STATS_DESC_COUNTER(VM, inject_virtio),
+	STATS_DESC_COUNTER(VM, aen_forward)
 };
 
 const struct kvm_stats_header kvm_vm_stats_header = {
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 19609d7a53a7..25cb1c787190 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -12,6 +12,7 @@
 
 #include <linux/pci.h>
 #include <linux/mutex.h>
+#include <linux/kvm_host.h>
 #include <asm/airq.h>
 #include <asm/kvm_pci.h>
 
@@ -34,6 +35,15 @@ struct zpci_aift {
 
 extern struct zpci_aift *aift;
 
+static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
+						 unsigned long si)
+{
+	if (!IS_ENABLED(CONFIG_VFIO_PCI) || !aift->kzdev ||
+	    !aift->kzdev[si])
+		return NULL;
+	return aift->kzdev[si]->kvm;
+}
+
 int kvm_s390_pci_aen_init(u8 nisc);
 void kvm_s390_pci_aen_exit(void);
 
-- 
2.27.0


* [PATCH v4 21/32] KVM: s390: mechanism to enable guest zPCI Interpretation
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (19 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 20/32] KVM: s390: pci: enable host forwarding of Adapter Event Notifications Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM Matthew Rosato
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

The guest must have access to certain facilities in order to allow
interpretive execution of zPCI instructions and adapter event
notifications.  However, there are some cases where a guest might
disable interpretation; therefore, provide a mechanism to defer
enabling the associated zPCI interpretation facilities until the guest
indicates it wishes to use them.

Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  4 ++++
 arch/s390/kvm/kvm-s390.c         | 41 ++++++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.h         | 10 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index b468d3a2215e..bf61ab05f98c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -252,7 +252,10 @@ struct kvm_s390_sie_block {
 #define ECB2_IEP	0x20
 #define ECB2_PFMFI	0x08
 #define ECB2_ESCA	0x04
+#define ECB2_ZPCI_LSI	0x02
 	__u8    ecb2;                   /* 0x0062 */
+#define ECB3_AISI	0x20
+#define ECB3_AISII	0x10
 #define ECB3_DEA 0x08
 #define ECB3_AES 0x04
 #define ECB3_RI  0x01
@@ -938,6 +941,7 @@ struct kvm_arch{
 	int use_cmma;
 	int use_pfmfi;
 	int use_skf;
+	int use_zpci_interp;
 	int user_cpu_state_ctrl;
 	int user_sigp;
 	int user_stsi;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 32e75f6f4e4d..d91b2547f0bf 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1029,6 +1029,45 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr)
 	return 0;
 }
 
+static void kvm_s390_vcpu_pci_setup(struct kvm_vcpu *vcpu)
+{
+	/* Only set the ECB bits after guest requests zPCI interpretation */
+	if (!vcpu->kvm->arch.use_zpci_interp)
+		return;
+
+	vcpu->arch.sie_block->ecb2 |= ECB2_ZPCI_LSI;
+	vcpu->arch.sie_block->ecb3 |= ECB3_AISII | ECB3_AISI;
+}
+
+void kvm_s390_vcpu_pci_enable_interp(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	/*
+	 * If host is configured for PCI and the necessary facilities are
+	 * available, turn on interpretation for the life of this guest
+	 */
+	if (!sclp.has_zpci_lsi || !sclp.has_aisii || !sclp.has_aeni ||
+	    !sclp.has_aisi || !IS_ENABLED(CONFIG_VFIO_PCI) ||
+	    !IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+		return;
+
+	mutex_lock(&kvm->lock);
+
+	kvm->arch.use_zpci_interp = 1;
+
+	kvm_s390_vcpu_block_all(kvm);
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		kvm_s390_vcpu_pci_setup(vcpu);
+		kvm_s390_sync_request(KVM_REQ_VSIE_RESTART, vcpu);
+	}
+
+	kvm_s390_vcpu_unblock_all(kvm);
+	mutex_unlock(&kvm->lock);
+}
+
 static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
 {
 	unsigned long cx;
@@ -3236,6 +3275,8 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
 
 	kvm_s390_vcpu_crypto_setup(vcpu);
 
+	kvm_s390_vcpu_pci_setup(vcpu);
+
 	mutex_lock(&vcpu->kvm->lock);
 	if (kvm_s390_pv_is_protected(vcpu->kvm)) {
 		rc = kvm_s390_pv_create_cpu(vcpu, &uvrc, &uvrrc);
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 098831e815e6..14bb2539f837 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -496,6 +496,16 @@ void kvm_s390_reinject_machine_check(struct kvm_vcpu *vcpu,
  */
 void kvm_s390_vcpu_crypto_reset_all(struct kvm *kvm);
 
+/**
+ * kvm_s390_vcpu_pci_enable_interp
+ *
+ * Set the associated PCI attributes for each vcpu to allow for zPCI Load/Store
+ * interpretation as well as adapter interruption forwarding.
+ *
+ * @kvm: the KVM guest
+ */
+void kvm_s390_vcpu_pci_enable_interp(struct kvm *kvm);
+
 /**
  * diag9c_forwarding_hz
  *
-- 
2.27.0


* [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (20 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 21/32] KVM: s390: mechanism to enable guest zPCI Interpretation Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 21:46   ` Jason Gunthorpe
  2022-03-14 19:44 ` [PATCH v4 23/32] KVM: s390: pci: provide routines for enabling/disabling interpretation Matthew Rosato
                   ` (10 subsequent siblings)
  32 siblings, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

These routines will be wired into a KVM ioctl, to be issued from
userspace to (dis)associate a specific zPCI device with the issuing
KVM.  This will create/delete a relationship between KVM, zPCI device
and the associated IOMMU domain for the device.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   2 +
 arch/s390/include/asm/kvm_pci.h  |   2 +
 arch/s390/kvm/kvm-s390.c         |   5 +
 arch/s390/kvm/pci.c              | 225 +++++++++++++++++++++++++++++++
 arch/s390/kvm/pci.h              |   5 +
 5 files changed, 239 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index bf61ab05f98c..bd171abbb8ef 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -965,6 +965,8 @@ struct kvm_arch{
 	DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
 	struct kvm_s390_gisa_interrupt gisa_int;
 	struct kvm_s390_pv pv;
+	struct list_head kzdev_list;
+	spinlock_t kzdev_list_lock;
 };
 
 #define KVM_HVA_ERR_BAD		(-1UL)
diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index ebc0da5d9ac1..47ce18b5bddd 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -21,6 +21,8 @@ struct kvm_zdev {
 	struct zpci_dev *zdev;
 	struct kvm *kvm;
 	struct iommu_domain *dom; /* Used to invoke IOMMU API for RPCIT */
+	struct notifier_block nb;
+	struct list_head entry;
 };
 
 int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d91b2547f0bf..84acaf59a7d3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2775,6 +2775,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm_s390_crypto_init(kvm);
 
+	if (IS_ENABLED(CONFIG_VFIO_PCI))
+		kvm_s390_pci_init_list(kvm);
+
 	mutex_init(&kvm->arch.float_int.ais_lock);
 	spin_lock_init(&kvm->arch.float_int.lock);
 	for (i = 0; i < FIRQ_LIST_COUNT; i++)
@@ -2860,6 +2863,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	if (!kvm_is_ucontrol(kvm))
 		gmap_remove(kvm->arch.gmap);
 	kvm_s390_destroy_adapters(kvm);
+	if (IS_ENABLED(CONFIG_VFIO_PCI))
+		kvm_s390_pci_clear_list(kvm);
 	kvm_s390_clear_float_irqs(kvm);
 	kvm_s390_vsie_destroy(kvm);
 	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 1c42d25de697..28fe95f13c33 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -9,6 +9,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/pci.h>
+#include <linux/vfio.h>
 #include <asm/kvm_pci.h>
 #include <asm/pci.h>
 #include <asm/pci_insn.h>
@@ -23,6 +24,22 @@ static inline int __set_irq_noiib(u16 ctl, u8 isc)
 	return zpci_set_irq_ctrl(ctl, isc, &iib);
 }
 
+static struct kvm_zdev *get_kzdev_by_fh(struct kvm *kvm, u32 fh)
+{
+	struct kvm_zdev *kzdev, *retval = NULL;
+
+	spin_lock(&kvm->arch.kzdev_list_lock);
+	list_for_each_entry(kzdev, &kvm->arch.kzdev_list, entry) {
+		if (kzdev->zdev->fh == fh) {
+			retval = kzdev;
+			break;
+		}
+	}
+	spin_unlock(&kvm->arch.kzdev_list_lock);
+
+	return retval;
+}
+
 /* Caller must hold the aift lock before calling this function */
 void kvm_s390_pci_aen_exit(void)
 {
@@ -153,6 +170,20 @@ int kvm_s390_pci_aen_init(u8 nisc)
 	return rc;
 }
 
+static int kvm_s390_pci_group_notifier(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct kvm_zdev *kzdev = container_of(nb, struct kvm_zdev, nb);
+
+	if (action == VFIO_GROUP_NOTIFY_SET_KVM) {
+		if (!data || !kzdev->zdev)
+			return NOTIFY_DONE;
+		kzdev->kvm = data;
+	}
+
+	return NOTIFY_OK;
+}
+
 int kvm_s390_pci_dev_open(struct zpci_dev *zdev)
 {
 	struct kvm_zdev *kzdev;
@@ -179,6 +210,200 @@ void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
 }
 EXPORT_SYMBOL_GPL(kvm_s390_pci_dev_release);
 
+static struct vfio_device *get_vdev(struct device *dev)
+{
+	struct vfio_device *(*fn)(struct device *dev);
+	struct vfio_device *vdev;
+
+	fn = symbol_get(vfio_device_get_from_dev);
+	if (!fn)
+		return NULL;
+
+	vdev = fn(dev);
+
+	symbol_put(vfio_device_get_from_dev);
+
+	return vdev;
+}
+
+static void put_vdev(struct vfio_device *vdev)
+{
+	void (*fn)(struct vfio_device *vdev);
+
+	fn = symbol_get(vfio_device_put);
+	if (!fn)
+		return;
+
+	fn(vdev);
+
+	symbol_put(vfio_device_put);
+}
+
+static int register_notifier(struct device *dev, struct notifier_block *nb)
+{
+	int (*fn)(struct device *dev, enum vfio_notify_type type,
+		  unsigned long *events, struct notifier_block *nb);
+	unsigned long events = VFIO_GROUP_NOTIFY_SET_KVM;
+	int rc;
+
+	fn = symbol_get(vfio_register_notifier);
+	if (!fn)
+		return -EINVAL;
+
+	rc = fn(dev, VFIO_GROUP_NOTIFY, &events, nb);
+
+	symbol_put(vfio_register_notifier);
+
+	return rc;
+}
+
+static int unregister_notifier(struct device *dev, struct notifier_block *nb)
+{
+	int (*fn)(struct device *dev, enum vfio_notify_type type,
+		  struct notifier_block *nb);
+	int rc;
+
+	fn = symbol_get(vfio_unregister_notifier);
+	if (!fn)
+		return -EINVAL;
+
+	rc = fn(dev, VFIO_GROUP_NOTIFY, nb);
+
+	symbol_put(vfio_unregister_notifier);
+
+	return rc;
+}
+
+int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
+{
+	struct vfio_device *vdev;
+	struct pci_dev *pdev;
+	int rc;
+
+	rc = kvm_s390_pci_dev_open(zdev);
+	if (rc)
+		return rc;
+
+	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+	if (!pdev) {
+		rc = -ENODEV;
+		goto exit_err;
+	}
+
+	vdev = get_vdev(&pdev->dev);
+	if (!vdev) {
+		pci_dev_put(pdev);
+		rc = -ENODEV;
+		goto exit_err;
+	}
+
+	zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
+
+	/*
+	 * At this point, a KVM should already be associated with this device,
+	 * so registering the notifier now should immediately trigger the
+	 * event.  We also want to know if the KVM association is later removed
+	 * to ensure proper cleanup happens.
+	 */
+	rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
+
+	put_vdev(vdev);
+	pci_dev_put(pdev);
+
+	/* Make sure the registered KVM matches the KVM issuing the ioctl */
+	if (rc || zdev->kzdev->kvm != kvm) {
+		rc = -ENODEV;
+		goto exit_err;
+	}
+
+	/* Must support KVM-managed IOMMU to proceed */
+	if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+		rc = zpci_iommu_attach_kvm(zdev, kvm);
+	else
+		rc = -EINVAL;
+
+	if (rc)
+		goto exit_err;
+
+	spin_lock(&kvm->arch.kzdev_list_lock);
+	list_add_tail(&zdev->kzdev->entry, &kvm->arch.kzdev_list);
+	spin_unlock(&kvm->arch.kzdev_list_lock);
+	return 0;
+
+exit_err:
+	kvm_s390_pci_dev_release(zdev);
+	return rc;
+}
+
+int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev)
+{
+	struct vfio_device *vdev;
+	struct pci_dev *pdev;
+	int rc = 0;
+
+	if (!zdev || !zdev->kzdev)
+		return -EINVAL;
+
+	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
+	if (!pdev) {
+		rc = -ENODEV;
+		goto exit_err;
+	}
+
+	vdev = get_vdev(&pdev->dev);
+	if (!vdev) {
+		pci_dev_put(pdev);
+		rc = -ENODEV;
+		goto exit_err;
+	}
+
+	spin_lock(&kvm->arch.kzdev_list_lock);
+	list_del(&zdev->kzdev->entry);
+	spin_unlock(&kvm->arch.kzdev_list_lock);
+
+	rc = unregister_notifier(vdev->dev, &zdev->kzdev->nb);
+
+	put_vdev(vdev);
+	pci_dev_put(pdev);
+
+exit_err:
+	kvm_s390_pci_dev_release(zdev);
+	return rc;
+}
+
+void kvm_s390_pci_init_list(struct kvm *kvm)
+{
+	spin_lock_init(&kvm->arch.kzdev_list_lock);
+	INIT_LIST_HEAD(&kvm->arch.kzdev_list);
+}
+
+void kvm_s390_pci_clear_list(struct kvm *kvm)
+{
+	struct kvm_zdev *tmp, *kzdev;
+	struct vfio_device *vdev;
+	struct pci_dev *pdev;
+	LIST_HEAD(remove);
+
+	spin_lock(&kvm->arch.kzdev_list_lock);
+	list_for_each_entry_safe(kzdev, tmp, &kvm->arch.kzdev_list, entry)
+		list_move_tail(&kzdev->entry, &remove);
+	spin_unlock(&kvm->arch.kzdev_list_lock);
+
+	list_for_each_entry_safe(kzdev, tmp, &remove, entry) {
+		pdev = pci_get_slot(kzdev->zdev->zbus->bus, kzdev->zdev->devfn);
+		if (pdev) {
+			vdev = get_vdev(&pdev->dev);
+			if (vdev) {
+				unregister_notifier(vdev->dev,
+						    &kzdev->nb);
+				put_vdev(vdev);
+			}
+			pci_dev_put(pdev);
+		}
+		kvm_s390_pci_dev_release(kzdev->zdev);
+	}
+}
+
 int kvm_s390_pci_init(void)
 {
 	aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 25cb1c787190..a95d9fdc91be 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -47,6 +47,11 @@ static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
 int kvm_s390_pci_aen_init(u8 nisc);
 void kvm_s390_pci_aen_exit(void);
 
+int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev);
+int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev);
+void kvm_s390_pci_init_list(struct kvm *kvm);
+void kvm_s390_pci_clear_list(struct kvm *kvm);
+
 int kvm_s390_pci_init(void);
 
 #endif /* __KVM_S390_PCI_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 23/32] KVM: s390: pci: provide routines for enabling/disabling interpretation
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (21 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 24/32] KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding Matthew Rosato
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

These routines will be wired into a kvm ioctl in order to respond to
requests to enable / disable a device for zPCI Load/Store interpretation.

The first time such a request is received, enable the necessary facilities
for the guest.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/kvm/pci.c | 86 +++++++++++++++++++++++++++++++++++++++++++++
 arch/s390/pci/pci.c |  3 ++
 2 files changed, 89 insertions(+)

diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 28fe95f13c33..df50dd6114c3 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -13,7 +13,9 @@
 #include <asm/kvm_pci.h>
 #include <asm/pci.h>
 #include <asm/pci_insn.h>
+#include <asm/sclp.h>
 #include "pci.h"
+#include "kvm-s390.h"
 
 struct zpci_aift *aift;
 
@@ -170,6 +172,87 @@ int kvm_s390_pci_aen_init(u8 nisc)
 	return rc;
 }
 
+static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
+{
+	u32 gisa;
+	int rc;
+
+	if (!zdev->kzdev || !zdev->kzdev->kvm)
+		return -EINVAL;
+
+	/*
+	 * If this is the first request to use an interpreted device, make the
+	 * necessary vcpu changes
+	 */
+	if (!zdev->kzdev->kvm->arch.use_zpci_interp)
+		kvm_s390_vcpu_pci_enable_interp(zdev->kzdev->kvm);
+
+	/*
+	 * In the event of a system reset in userspace, the GISA designation
+	 * may still be assigned because the device is still enabled.
+	 * Verify it's the same guest before proceeding.
+	 */
+	gisa = (u32)virt_to_phys(&zdev->kzdev->kvm->arch.sie_page2->gisa);
+	if (zdev->gisa != 0 && zdev->gisa != gisa)
+		return -EPERM;
+
+	if (zdev_enabled(zdev)) {
+		zdev->gisa = 0;
+		rc = zpci_disable_device(zdev);
+		if (rc)
+			return rc;
+	}
+
+	/*
+	 * Store information about the identity of the kvm guest allowed to
+	 * access this device via interpretation to be used by host CLP
+	 */
+	zdev->gisa = gisa;
+
+	rc = zpci_enable_device(zdev);
+	if (rc)
+		goto err;
+
+	/* Re-register the IOMMU that was already created */
+	rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+				virt_to_phys(zdev->dma_table));
+	if (rc)
+		goto err;
+
+	return rc;
+
+err:
+	zdev->gisa = 0;
+	return rc;
+}
+
+static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev)
+{
+	int rc;
+
+	if (zdev->gisa == 0)
+		return -EINVAL;
+
+	/* Remove the host CLP guest designation */
+	zdev->gisa = 0;
+
+	if (zdev_enabled(zdev)) {
+		rc = zpci_disable_device(zdev);
+		if (rc)
+			return rc;
+	}
+
+	rc = zpci_enable_device(zdev);
+	if (rc)
+		return rc;
+
+	/* Re-register the IOMMU that was already created */
+	rc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
+				virt_to_phys(zdev->dma_table));
+
+	return rc;
+}
+
 static int kvm_s390_pci_group_notifier(struct notifier_block *nb,
 				       unsigned long action, void *data)
 {
@@ -203,6 +286,9 @@ void kvm_s390_pci_dev_release(struct zpci_dev *zdev)
 {
 	struct kvm_zdev *kzdev;
 
+	if (zdev->gisa != 0)
+		kvm_s390_pci_interp_disable(zdev);
+
 	kzdev = zdev->kzdev;
 	WARN_ON(kzdev->zdev != zdev);
 	zdev->kzdev = 0;
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 13033717cd4e..5dbe49ec325e 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -147,6 +147,7 @@ int zpci_register_ioat(struct zpci_dev *zdev, u8 dmaas,
 		zpci_dbg(3, "reg ioat fid:%x, cc:%d, status:%d\n", zdev->fid, cc, status);
 	return cc;
 }
+EXPORT_SYMBOL_GPL(zpci_register_ioat);
 
 /* Modify PCI: Unregister I/O address translation parameters */
 int zpci_unregister_ioat(struct zpci_dev *zdev, u8 dmaas)
@@ -727,6 +728,7 @@ int zpci_enable_device(struct zpci_dev *zdev)
 		zpci_update_fh(zdev, fh);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(zpci_enable_device);
 
 int zpci_disable_device(struct zpci_dev *zdev)
 {
@@ -750,6 +752,7 @@ int zpci_disable_device(struct zpci_dev *zdev)
 	}
 	return rc;
 }
+EXPORT_SYMBOL_GPL(zpci_disable_device);
 
 /**
  * zpci_hot_reset_device - perform a reset of the given zPCI function
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 24/32] KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (22 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 23/32] KVM: s390: pci: provide routines for enabling/disabling interpretation Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 25/32] KVM: s390: pci: provide routines for enabling/disabling IOAT assist Matthew Rosato
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

These routines will be wired into a kvm ioctl in order to respond to
requests to enable / disable a device for Adapter Event Notifications /
Adapter Interruption Forwarding.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_pci.h |   2 +
 arch/s390/kvm/pci.c             | 201 +++++++++++++++++++++++++++++++-
 arch/s390/pci/pci_insn.c        |   1 +
 3 files changed, 203 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index 47ce18b5bddd..ed596880fb06 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -16,11 +16,13 @@
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
 #include <linux/pci.h>
+#include <asm/pci_insn.h>
 
 struct kvm_zdev {
 	struct zpci_dev *zdev;
 	struct kvm *kvm;
 	struct iommu_domain *dom; /* Used to invoke IOMMU API for RPCIT */
+	struct zpci_fib fib;
 	struct notifier_block nb;
 	struct list_head entry;
 };
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index df50dd6114c3..2287c1c6a3e5 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -13,6 +13,7 @@
 #include <asm/kvm_pci.h>
 #include <asm/pci.h>
 #include <asm/pci_insn.h>
+#include <asm/pci_io.h>
 #include <asm/sclp.h>
 #include "pci.h"
 #include "kvm-s390.h"
@@ -172,6 +173,200 @@ int kvm_s390_pci_aen_init(u8 nisc)
 	return rc;
 }
 
+/* Modify PCI: Register floating adapter interruption forwarding */
+static int kvm_zpci_set_airq(struct zpci_dev *zdev)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_REG_INT);
+	struct zpci_fib fib = {};
+	u8 status;
+
+	fib.fmt0.isc = zdev->kzdev->fib.fmt0.isc;
+	fib.fmt0.sum = 1;       /* enable summary notifications */
+	fib.fmt0.noi = airq_iv_end(zdev->aibv);
+	fib.fmt0.aibv = virt_to_phys(zdev->aibv->vector);
+	fib.fmt0.aibvo = 0;
+	fib.fmt0.aisb = virt_to_phys(aift->sbv->vector + (zdev->aisb / 64) * 8);
+	fib.fmt0.aisbo = zdev->aisb & 63;
+	fib.gd = zdev->gisa;
+
+	return zpci_mod_fc(req, &fib, &status) ? -EIO : 0;
+}
+
+/* Modify PCI: Unregister floating adapter interruption forwarding */
+static int kvm_zpci_clear_airq(struct zpci_dev *zdev)
+{
+	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_DEREG_INT);
+	struct zpci_fib fib = {};
+	u8 cc, status;
+
+	fib.gd = zdev->gisa;
+
+	cc = zpci_mod_fc(req, &fib, &status);
+	if (cc == 3 || (cc == 1 && status == 24))
+		/* Function already gone or IRQs already deregistered. */
+		cc = 0;
+
+	return cc ? -EIO : 0;
+}
+
+static int kvm_s390_pci_aif_enable(struct zpci_dev *zdev, struct zpci_fib *fib,
+				   bool assist)
+{
+	struct page *aibv_page, *aisb_page = NULL;
+	unsigned int msi_vecs, idx;
+	struct zpci_gaite *gaite;
+	unsigned long bit;
+	struct kvm *kvm;
+	phys_addr_t gaddr;
+	int rc = 0, gisc;
+
+	/*
+	 * Interrupt forwarding is only applicable if the device is already
+	 * enabled for interpretation
+	 */
+	if (zdev->gisa == 0)
+		return -EINVAL;
+
+	kvm = zdev->kzdev->kvm;
+	msi_vecs = min_t(unsigned int, fib->fmt0.noi, zdev->max_msi);
+
+	/* Get the associated forwarding ISC - if invalid, return the error */
+	gisc = kvm_s390_gisc_register(kvm, fib->fmt0.isc);
+	if (gisc < 0)
+		return gisc;
+
+	/* Replace AIBV address */
+	idx = srcu_read_lock(&kvm->srcu);
+	aibv_page = gfn_to_page(kvm, gpa_to_gfn((gpa_t)fib->fmt0.aibv));
+	srcu_read_unlock(&kvm->srcu, idx);
+	if (is_error_page(aibv_page)) {
+		rc = -EIO;
+		goto out;
+	}
+	gaddr = page_to_phys(aibv_page) + (fib->fmt0.aibv & ~PAGE_MASK);
+	fib->fmt0.aibv = gaddr;
+
+	/* Pin the guest AISB if one was specified */
+	if (fib->fmt0.sum == 1) {
+		idx = srcu_read_lock(&kvm->srcu);
+		aisb_page = gfn_to_page(kvm, gpa_to_gfn((gpa_t)fib->fmt0.aisb));
+		srcu_read_unlock(&kvm->srcu, idx);
+		if (is_error_page(aisb_page)) {
+			rc = -EIO;
+			goto unpin1;
+		}
+	}
+
+	/* AISB must be allocated before we can fill in GAITE */
+	mutex_lock(&aift->aift_lock);
+	bit = airq_iv_alloc_bit(aift->sbv);
+	if (bit == -1UL) {
+		rc = -EIO;
+		goto unpin2;
+	}
+	zdev->aisb = bit; /* store the summary bit number */
+	zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA |
+				    AIRQ_IV_BITLOCK |
+				    AIRQ_IV_GUESTVEC,
+				    phys_to_virt(fib->fmt0.aibv));
+
+	spin_lock_irq(&aift->gait_lock);
+	gaite = (struct zpci_gaite *)aift->gait + (zdev->aisb *
+						   sizeof(struct zpci_gaite));
+
+	/* If assist not requested, host will get all alerts */
+	if (assist)
+		gaite->gisa = (u32)virt_to_phys(&kvm->arch.sie_page2->gisa);
+	else
+		gaite->gisa = 0;
+
+	gaite->gisc = fib->fmt0.isc;
+	gaite->count++;
+	gaite->aisbo = fib->fmt0.aisbo;
+	gaite->aisb = virt_to_phys(page_address(aisb_page) + (fib->fmt0.aisb &
+							      ~PAGE_MASK));
+	aift->kzdev[zdev->aisb] = zdev->kzdev;
+	spin_unlock_irq(&aift->gait_lock);
+
+	/* Update guest FIB for re-issue */
+	fib->fmt0.aisbo = zdev->aisb & 63;
+	fib->fmt0.aisb = virt_to_phys(aift->sbv->vector + (zdev->aisb / 64) * 8);
+	fib->fmt0.isc = gisc;
+
+	/* Save some guest fib values in the host for later use */
+	zdev->kzdev->fib.fmt0.isc = fib->fmt0.isc;
+	zdev->kzdev->fib.fmt0.aibv = fib->fmt0.aibv;
+	mutex_unlock(&aift->aift_lock);
+
+	/* Issue the clp to setup the irq now */
+	rc = kvm_zpci_set_airq(zdev);
+	return rc;
+
+unpin2:
+	mutex_unlock(&aift->aift_lock);
+	if (fib->fmt0.sum == 1) {
+		gaddr = page_to_phys(aisb_page);
+		kvm_release_pfn_dirty(gaddr >> PAGE_SHIFT);
+	}
+unpin1:
+	kvm_release_pfn_dirty(fib->fmt0.aibv >> PAGE_SHIFT);
+out:
+	return rc;
+}
+
+static int kvm_s390_pci_aif_disable(struct zpci_dev *zdev, bool force)
+{
+	struct kvm_zdev *kzdev = zdev->kzdev;
+	struct zpci_gaite *gaite;
+	int rc;
+	u8 isc;
+
+	if (zdev->gisa == 0)
+		return -EINVAL;
+
+	mutex_lock(&aift->aift_lock);
+
+	/*
+	 * If the clear fails due to an error, leave now unless we know this
+	 * device is about to go away (force) -- In that case clear the GAITE
+	 * regardless.
+	 */
+	rc = kvm_zpci_clear_airq(zdev);
+	if (rc && !force)
+		goto out;
+
+	if (zdev->kzdev->fib.fmt0.aibv == 0)
+		goto out;
+	spin_lock_irq(&aift->gait_lock);
+	gaite = (struct zpci_gaite *)aift->gait + (zdev->aisb *
+						   sizeof(struct zpci_gaite));
+	isc = gaite->gisc;
+	gaite->count--;
+	if (gaite->count == 0) {
+		/* Release guest AIBV and AISB */
+		kvm_release_pfn_dirty(kzdev->fib.fmt0.aibv >> PAGE_SHIFT);
+		if (gaite->aisb != 0)
+			kvm_release_pfn_dirty(gaite->aisb >> PAGE_SHIFT);
+		/* Clear the GAIT entry */
+		gaite->aisb = 0;
+		gaite->gisc = 0;
+		gaite->aisbo = 0;
+		gaite->gisa = 0;
+		aift->kzdev[zdev->aisb] = 0;
+		/* Clear zdev info */
+		airq_iv_free_bit(aift->sbv, zdev->aisb);
+		airq_iv_release(zdev->aibv);
+		zdev->aisb = 0;
+		zdev->aibv = NULL;
+	}
+	spin_unlock_irq(&aift->gait_lock);
+	kvm_s390_gisc_unregister(kzdev->kvm, isc);
+	kzdev->fib.fmt0.isc = 0;
+	kzdev->fib.fmt0.aibv = 0;
+out:
+	mutex_unlock(&aift->aift_lock);
+
+	return rc;
+}
+
 static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
 {
 	u32 gisa;
@@ -226,13 +421,17 @@ static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
 	return rc;
 }
 
-static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev)
+static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev, bool force)
 {
 	int rc;
 
 	if (zdev->gisa == 0)
 		return -EINVAL;
 
+	/* Forwarding must be turned off before interpretation */
+	if (zdev->kzdev->fib.fmt0.aibv != 0)
+		kvm_s390_pci_aif_disable(zdev, force);
+
 	/* Remove the host CLP guest designation */
 	zdev->gisa = 0;
 
diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c
index ca6399d52767..f7d0e29bbf0b 100644
--- a/arch/s390/pci/pci_insn.c
+++ b/arch/s390/pci/pci_insn.c
@@ -59,6 +59,7 @@ u8 zpci_mod_fc(u64 req, struct zpci_fib *fib, u8 *status)
 
 	return cc;
 }
+EXPORT_SYMBOL_GPL(zpci_mod_fc);
 
 /* Refresh PCI Translations */
 static inline u8 __rpcit(u64 fn, u64 addr, u64 range, u8 *status)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 25/32] KVM: s390: pci: provide routines for enabling/disabling IOAT assist
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (23 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 24/32] KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 26/32] KVM: s390: pci: handle refresh of PCI translations Matthew Rosato
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

These routines will be wired into a kvm ioctl in order to respond to
requests to enable / disable a device for PCI I/O Address Translation
assistance via a KVM-managed IOMMU.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_pci.h |  2 ++
 arch/s390/kvm/pci.c             | 25 +++++++++++++++++++++++++
 arch/s390/kvm/pci.h             |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index ed596880fb06..e27dbede723c 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -30,6 +30,8 @@ struct kvm_zdev {
 int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
 void kvm_s390_pci_dev_release(struct zpci_dev *zdev);
 
+u8 kvm_s390_pci_get_dtsm(struct zpci_dev *zdev);
+
 int zpci_iommu_attach_kvm(struct zpci_dev *zdev, struct kvm *kvm);
 int zpci_iommu_kvm_assign_iota(struct zpci_dev *zdev, u64 iota);
 int zpci_iommu_kvm_remove_iota(struct zpci_dev *zdev);
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 2287c1c6a3e5..1a8b82220b29 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -367,6 +367,28 @@ static int kvm_s390_pci_aif_disable(struct zpci_dev *zdev, bool force)
 	return rc;
 }
 
+static int kvm_s390_pci_ioat_enable(struct zpci_dev *zdev, u64 iota)
+{
+	if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+		return zpci_iommu_kvm_assign_iota(zdev, iota);
+	else
+		return -EINVAL;
+}
+
+static int kvm_s390_pci_ioat_disable(struct zpci_dev *zdev)
+{
+	if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+		return zpci_iommu_kvm_remove_iota(zdev);
+	else
+		return -EINVAL;
+}
+
+u8 kvm_s390_pci_get_dtsm(struct zpci_dev *zdev)
+{
+	return (zdev->dtsm & KVM_S390_PCI_DTSM_MASK);
+}
+EXPORT_SYMBOL_GPL(kvm_s390_pci_get_dtsm);
+
 static int kvm_s390_pci_interp_enable(struct zpci_dev *zdev)
 {
 	u32 gisa;
@@ -432,6 +454,9 @@ static int kvm_s390_pci_interp_disable(struct zpci_dev *zdev, bool force)
 	if (zdev->kzdev->fib.fmt0.aibv != 0)
 		kvm_s390_pci_aif_disable(zdev, force);
 
+	/* If we are using the IOAT assist, disable it now */
+	kvm_s390_pci_ioat_disable(zdev);
+
 	/* Remove the host CLP guest designation */
 	zdev->gisa = 0;
 
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index a95d9fdc91be..867f04cae3a1 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -16,6 +16,8 @@
 #include <asm/airq.h>
 #include <asm/kvm_pci.h>
 
+#define KVM_S390_PCI_DTSM_MASK 0x40
+
 struct zpci_gaite {
 	u32 gisa;
 	u8 gisc;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 26/32] KVM: s390: pci: handle refresh of PCI translations
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (24 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 25/32] KVM: s390: pci: provide routines for enabling/disabling IOAT assist Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 27/32] KVM: s390: intercept the rpcit instruction Matthew Rosato
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Add a routine that will perform a shadow operation between a guest
and host IOAT.  A subsequent patch will invoke this in response to
an RPCIT instruction intercept.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/asm/kvm_pci.h |  1 +
 arch/s390/kvm/pci.c             | 31 ++++++++++++++++++++++++++++++-
 arch/s390/kvm/pci.h             |  3 +++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
index e27dbede723c..9578b5dafb45 100644
--- a/arch/s390/include/asm/kvm_pci.h
+++ b/arch/s390/include/asm/kvm_pci.h
@@ -25,6 +25,7 @@ struct kvm_zdev {
 	struct zpci_fib fib;
 	struct notifier_block nb;
 	struct list_head entry;
+	u64 rpcit_count;
 };
 
 int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 1a8b82220b29..40d2fadbfbd5 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/kvm_host.h>
+#include <linux/iommu.h>
 #include <linux/pci.h>
 #include <linux/vfio.h>
 #include <asm/kvm_pci.h>
@@ -173,6 +174,30 @@ int kvm_s390_pci_aen_init(u8 nisc)
 	return rc;
 }
 
+int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
+			       unsigned long start, unsigned long size)
+{
+	struct kvm_zdev *kzdev;
+	u32 fh = req >> 32;
+	int rc;
+
+	/* Make sure this is a valid device associated with this guest */
+	kzdev = get_kzdev_by_fh(vcpu->kvm, fh);
+	if (!kzdev)
+		return -EINVAL;
+
+	/*
+	 * The KVM-managed IOMMU map operation will synchronize the associated
+	 * guest IOAT tables with the host DMA tables.  A physical address is
+	 * not specified as it will be derived from pinned guest PTEs
+	 */
+	rc = iommu_map(kzdev->dom, start, 0, size, IOMMU_WRITE | IOMMU_READ);
+
+	kzdev->rpcit_count++;
+
+	return rc;
+}
+
 /* Modify PCI: Register floating adapter interruption forwarding */
 static int kvm_zpci_set_airq(struct zpci_dev *zdev)
 {
@@ -716,6 +741,8 @@ void kvm_s390_pci_clear_list(struct kvm *kvm)
 
 int kvm_s390_pci_init(void)
 {
+	int rc;
+
 	aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
 	if (!aift)
 		return -ENOMEM;
@@ -723,5 +750,7 @@ int kvm_s390_pci_init(void)
 	spin_lock_init(&aift->gait_lock);
 	mutex_init(&aift->aift_lock);
 
-	return 0;
+	rc = zpci_get_mdd(&aift->mdd);
+
+	return rc;
 }
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 867f04cae3a1..2cb1b27396c1 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -33,6 +33,7 @@ struct zpci_aift {
 	struct kvm_zdev **kzdev;
 	spinlock_t gait_lock; /* Protects the gait, used during AEN forward */
 	struct mutex aift_lock; /* Protects the other structures in aift */
+	u32 mdd;
 };
 
 extern struct zpci_aift *aift;
@@ -48,6 +49,8 @@ static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
 
 int kvm_s390_pci_aen_init(u8 nisc);
 void kvm_s390_pci_aen_exit(void);
+int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
+			       unsigned long start, unsigned long size);
 
 int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev);
 int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 27/32] KVM: s390: intercept the rpcit instruction
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (25 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 26/32] KVM: s390: pci: handle refresh of PCI translations Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 28/32] KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices Matthew Rosato
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

For faster handling of PCI translation refreshes, intercept in KVM
and call the associated handler.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/kvm/priv.c | 46 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 417154b314a6..546c99a0e0b6 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -29,6 +29,7 @@
 #include <asm/ap.h>
 #include "gaccess.h"
 #include "kvm-s390.h"
+#include "pci.h"
 #include "trace.h"
 
 static int handle_ri(struct kvm_vcpu *vcpu)
@@ -335,6 +336,49 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int handle_rpcit(struct kvm_vcpu *vcpu)
+{
+	int reg1, reg2;
+	int rc;
+
+	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
+		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
+
+	/* KVM can only handle if we support KVM-managed IOMMU */
+	if (!IS_ENABLED(CONFIG_S390_KVM_IOMMU))
+		return -EOPNOTSUPP;
+
+	kvm_s390_get_regs_rre(vcpu, &reg1, &reg2);
+
+	/* If the device has a SHM bit on, let userspace take care of this */
+	if (((vcpu->run->s.regs.gprs[reg1] >> 32) & aift->mdd) != 0)
+		return -EOPNOTSUPP;
+
+	rc = kvm_s390_pci_refresh_trans(vcpu, vcpu->run->s.regs.gprs[reg1],
+					vcpu->run->s.regs.gprs[reg2],
+					vcpu->run->s.regs.gprs[reg2 + 1]);
+
+	switch (rc) {
+	case 0:
+		kvm_s390_set_psw_cc(vcpu, 0);
+		break;
+	case -ENOMEM:
+		vcpu->run->s.regs.gprs[reg1] &= 0xffffffff00ffffffUL;
+		vcpu->run->s.regs.gprs[reg1] |= (u64)4 << 24;
+		kvm_s390_set_psw_cc(vcpu, 1);
+		break;
+	case -EIO:
+		vcpu->run->s.regs.gprs[reg1] &= 0xffffffff00ffffffUL;
+		vcpu->run->s.regs.gprs[reg1] |= (u64)16 << 24;
+		kvm_s390_set_psw_cc(vcpu, 1);
+		break;
+	default:
+		kvm_s390_set_psw_cc(vcpu, 3);
+	}
+
+	return 0;
+}
+
 #define SSKE_NQ 0x8
 #define SSKE_MR 0x4
 #define SSKE_MC 0x2
@@ -1275,6 +1319,8 @@ int kvm_s390_handle_b9(struct kvm_vcpu *vcpu)
 		return handle_essa(vcpu);
 	case 0xaf:
 		return handle_pfmf(vcpu);
+	case 0xd3:
+		return handle_rpcit(vcpu);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 28/32] KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (26 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 27/32] KVM: s390: intercept the rpcit instruction Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability Matthew Rosato
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

The KVM_S390_ZPCI_OP ioctl provides a series of operations that
can be invoked to manage hardware-assisted virtualization features
for s390x PCI passthrough.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst | 60 ++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c       | 26 ++++++++++++
 arch/s390/kvm/pci.c            | 77 ++++++++++++++++++++++++++++++++++
 arch/s390/kvm/pci.h            |  3 +-
 include/uapi/linux/kvm.h       | 43 +++++++++++++++++++
 5 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 9f3172376ec3..c642ff891cf2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5574,6 +5574,66 @@ enabled with ``arch_prctl()``, but this may change in the future.
 The offsets of the state save areas in struct kvm_xsave follow the contents
 of CPUID leaf 0xD on the host.
 
+4.134 KVM_S390_ZPCI_OP
+----------------------
+
+:Capability: KVM_CAP_S390_ZPCI_OP
+:Architectures: s390
+:Type: vm ioctl
+:Parameters: struct kvm_s390_zpci_op (in, out)
+:Returns: 0 on success, <0 on error
+
+Used to manage hardware-assisted virtualization features for zPCI devices.
+
+Parameters are specified via the following structure::
+
+  struct kvm_s390_zpci_op {
+	/* in */
+	__u32 fh;		/* target device */
+	__u8  op;		/* operation to perform */
+	__u8  pad[3];
+	union {
+		/* for KVM_S390_ZPCIOP_REG_INT */
+		struct {
+			__u64 ibv;	/* Guest addr of interrupt bit vector */
+			__u64 sb;	/* Guest addr of summary bit */
+			__u32 flags;
+			__u32 noi;	/* Number of interrupts */
+			__u8 isc;	/* Guest interrupt subclass */
+			__u8 sbo;	/* Offset of guest summary bit vector */
+			__u16 pad;
+		} reg_int;
+		/* for KVM_S390_ZPCIOP_REG_IOAT */
+		struct {
+			__u64 iota;	/* I/O Translation settings */
+		} reg_ioat;
+		__u8 reserved[64];
+	} u;
+	/* out */
+	__u32 newfh;		/* updated device handle */
+  };
+
+The type of operation is specified in the "op" field.
+KVM_S390_ZPCIOP_INIT is used to associate a zPCI function with this VM.
+Conversely, KVM_S390_ZPCIOP_END is used to terminate that association.
+KVM_S390_ZPCIOP_START_INTERP is used to enable interpretive execution
+for the specified zPCI function for this VM; KVM_S390_ZPCIOP_STOP_INTERP
+is used to subsequently disable interpretive execution.
+KVM_S390_ZPCIOP_REG_INT is used to register the VM for adapter interruption
+forwarding, which will allow firmware delivery of interrupts directly to
+the vm, with KVM providing a backup delivery mechanism;
+KVM_S390_ZPCIOP_DEREG_INT is used to subsequently disable interrupt forwarding.
+KVM_S390_ZPCIOP_REG_IOAT is used to enable KVM-managed IOMMU ops to begin
+synchronizing guest and host DMA tables; KVM_S390_ZPCIOP_DEREG_IOAT is used
+to subsequently disable IOMMU mapping.
+
+The target zPCI function must also be specified via the "fh" field.  For the
+KVM_S390_ZPCIOP_REG_INT operation, additional information to establish the
+interrupt forwarding must be provided via the "reg_int" struct.  For the
+KVM_S390_ZPCIOP_REG_IOAT operation, guest table format and location must be
+specified via the "reg_ioat" struct.
+
+The "reserved" field is meant for future extensions.
 
 5. The kvm_run structure
 ========================
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 84acaf59a7d3..613101ba29be 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -616,6 +616,15 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_PROTECTED:
 		r = is_prot_virt_host();
 		break;
+	case KVM_CAP_S390_ZPCI_OP:
+		if (IS_ENABLED(CONFIG_S390_KVM_IOMMU) && test_facility(69) &&
+		    test_facility(70) && test_facility(71) &&
+		    test_facility(72)) {
+			r = 1;
+		} else {
+			r = 0;
+		}
+		break;
 	default:
 		r = 0;
 	}
@@ -2532,6 +2541,23 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		}
 		break;
 	}
+	case KVM_S390_ZPCI_OP: {
+		struct kvm_s390_zpci_op args;
+
+		r = -EINVAL;
+		if (!IS_ENABLED(CONFIG_VFIO_PCI))
+			break;
+		if (copy_from_user(&args, argp, sizeof(args))) {
+			r = -EFAULT;
+			break;
+		}
+		r = kvm_s390_pci_zpci_op(kvm, &args);
+		if (r)
+			break;
+		if (copy_to_user(argp, &args, sizeof(args)))
+			r = -EFAULT;
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
diff --git a/arch/s390/kvm/pci.c b/arch/s390/kvm/pci.c
index 40d2fadbfbd5..15b581915cd7 100644
--- a/arch/s390/kvm/pci.c
+++ b/arch/s390/kvm/pci.c
@@ -739,6 +739,83 @@ void kvm_s390_pci_clear_list(struct kvm *kvm)
 	}
 }
 
+static int kvm_s390_pci_zpci_reg_int(struct zpci_dev *zdev,
+				     struct kvm_s390_zpci_op *args)
+{
+	struct zpci_fib fib = {};
+
+	fib.fmt0.aibv = args->u.reg_int.ibv;
+	fib.fmt0.isc = args->u.reg_int.isc;
+	fib.fmt0.noi = args->u.reg_int.noi;
+	if (args->u.reg_int.sb != 0) {
+		fib.fmt0.aisb = args->u.reg_int.sb;
+		fib.fmt0.aisbo = args->u.reg_int.sbo;
+		fib.fmt0.sum = 1;
+	} else {
+		fib.fmt0.aisb = 0;
+		fib.fmt0.aisbo = 0;
+		fib.fmt0.sum = 0;
+	}
+
+	if (args->u.reg_int.flags & KVM_S390_ZPCIOP_REGINT_HOST)
+		return kvm_s390_pci_aif_enable(zdev, &fib, true);
+	else
+		return kvm_s390_pci_aif_enable(zdev, &fib, false);
+}
+
+int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args)
+{
+	struct kvm_zdev *kzdev;
+	struct zpci_dev *zdev;
+	int r;
+
+	if (args->op == KVM_S390_ZPCIOP_INIT) {
+		zdev = get_zdev_by_fh(args->fh);
+		if (!zdev)
+			return -ENODEV;
+	} else {
+		kzdev = get_kzdev_by_fh(kvm, args->fh);
+		if (!kzdev || !kzdev->zdev)
+			return -ENODEV;
+		zdev = kzdev->zdev;
+	}
+
+	switch (args->op) {
+	case KVM_S390_ZPCIOP_INIT:
+		r = kvm_s390_pci_zpci_start(kvm, zdev);
+		break;
+	case KVM_S390_ZPCIOP_END:
+		r = kvm_s390_pci_zpci_stop(kvm, zdev);
+		break;
+	case KVM_S390_ZPCIOP_START_INTERP:
+		r = kvm_s390_pci_interp_enable(zdev);
+		break;
+	case KVM_S390_ZPCIOP_STOP_INTERP:
+		r = kvm_s390_pci_interp_disable(zdev, false);
+		break;
+	case KVM_S390_ZPCIOP_REG_INT:
+		r = kvm_s390_pci_zpci_reg_int(zdev, args);
+		break;
+	case KVM_S390_ZPCIOP_DEREG_INT:
+		r = kvm_s390_pci_aif_disable(zdev, false);
+		break;
+	case KVM_S390_ZPCIOP_REG_IOAT:
+		r = kvm_s390_pci_ioat_enable(zdev, args->u.reg_ioat.iota);
+		break;
+	case KVM_S390_ZPCIOP_DEREG_IOAT:
+		r = kvm_s390_pci_ioat_disable(zdev);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	/* On success, always return the current host function handle */
+	if (r == 0)
+		args->newfh = zdev->fh;
+
+	return r;
+}
+
 int kvm_s390_pci_init(void)
 {
 	int rc;
diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 2cb1b27396c1..c30b0bacca00 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -12,6 +12,7 @@
 
 #include <linux/pci.h>
 #include <linux/mutex.h>
+#include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <asm/airq.h>
 #include <asm/kvm_pci.h>
@@ -56,7 +57,7 @@ int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev);
 int kvm_s390_pci_zpci_stop(struct kvm *kvm, struct zpci_dev *zdev);
 void kvm_s390_pci_init_list(struct kvm *kvm);
 void kvm_s390_pci_clear_list(struct kvm *kvm);
-
+int kvm_s390_pci_zpci_op(struct kvm *kvm, struct kvm_s390_zpci_op *args);
 int kvm_s390_pci_init(void);
 
 #endif /* __KVM_S390_PCI_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 507ee1f2aa96..be8693ccc833 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1135,6 +1135,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
 #define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_S390_ZPCI_OP 211
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -2049,4 +2050,46 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_XSAVE2 */
 #define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
 
+/* Available with KVM_CAP_S390_ZPCI_OP */
+#define KVM_S390_ZPCI_OP	  _IOW(KVMIO,  0xd0, struct kvm_s390_zpci_op)
+
+struct kvm_s390_zpci_op {
+	/* in */
+	__u32 fh;		/* target device */
+	__u8  op;		/* operation to perform */
+	__u8  pad[3];
+	union {
+		/* for KVM_S390_ZPCIOP_REG_INT */
+		struct {
+			__u64 ibv;	/* Guest addr of interrupt bit vector */
+			__u64 sb;	/* Guest addr of summary bit */
+			__u32 flags;
+			__u32 noi;	/* Number of interrupts */
+			__u8 isc;	/* Guest interrupt subclass */
+			__u8 sbo;	/* Offset of guest summary bit vector */
+			__u16 pad;
+		} reg_int;
+		/* for KVM_S390_ZPCIOP_REG_IOAT */
+		struct {
+			__u64 iota;	/* I/O Translation settings */
+		} reg_ioat;
+		__u8 reserved[64];
+	} u;
+	/* out */
+	__u32 newfh;		/* updated device handle */
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_INIT		0
+#define KVM_S390_ZPCIOP_END		1
+#define KVM_S390_ZPCIOP_START_INTERP	2
+#define KVM_S390_ZPCIOP_STOP_INTERP	3
+#define KVM_S390_ZPCIOP_REG_INT		4
+#define KVM_S390_ZPCIOP_DEREG_INT	5
+#define KVM_S390_ZPCIOP_REG_IOAT	6
+#define KVM_S390_ZPCIOP_DEREG_IOAT	7
+
+/* flags for kvm_s390_zpci_op->u.reg_int.flags */
+#define KVM_S390_ZPCIOP_REGINT_HOST	(1 << 0)
+
 #endif /* __LINUX_KVM_H */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (27 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 28/32] KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 21:49   ` Jason Gunthorpe
  2022-03-14 19:44 ` [PATCH v4 30/32] KVM: s390: introduce CPU feature for zPCI Interpretation Matthew Rosato
                   ` (3 subsequent siblings)
  32 siblings, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

The DTSM, or designation type supported mask, indicates what IOAT formats
are available to the guest.  For an interpreted device, userspace will not
know what format(s) the IOAT assist supports, so pass it via the
capability chain.  Since the value belongs to the Query PCI Function Group
clp, let's extend the existing capability with a new version.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 drivers/vfio/pci/vfio_pci_zdev.c | 12 ++++++++++--
 include/uapi/linux/vfio_zdev.h   |  3 +++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 4a653ce480c7..aadd2b58b822 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -13,6 +13,7 @@
 #include <linux/vfio_zdev.h>
 #include <asm/pci_clp.h>
 #include <asm/pci_io.h>
+#include <asm/kvm_pci.h>
 
 #include <linux/vfio_pci_core.h>
 
@@ -44,16 +45,23 @@ static int zpci_group_cap(struct zpci_dev *zdev, struct vfio_info_cap *caps)
 {
 	struct vfio_device_info_cap_zpci_group cap = {
 		.header.id = VFIO_DEVICE_INFO_CAP_ZPCI_GROUP,
-		.header.version = 1,
+		.header.version = 2,
 		.dasm = zdev->dma_mask,
 		.msi_addr = zdev->msi_addr,
 		.flags = VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH,
 		.mui = zdev->fmb_update,
 		.noi = zdev->max_msi,
 		.maxstbl = ZPCI_MAX_WRITE_SIZE,
-		.version = zdev->version
+		.version = zdev->version,
+		.dtsm = 0
 	};
 
+	/* Some values are different for interpreted devices */
+	if (zdev->kzdev) {
+		cap.maxstbl = zdev->maxstbl;
+		cap.dtsm = kvm_s390_pci_get_dtsm(zdev);
+	}
+
 	return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
 }
 
diff --git a/include/uapi/linux/vfio_zdev.h b/include/uapi/linux/vfio_zdev.h
index 78c022af3d29..29351687e914 100644
--- a/include/uapi/linux/vfio_zdev.h
+++ b/include/uapi/linux/vfio_zdev.h
@@ -50,6 +50,9 @@ struct vfio_device_info_cap_zpci_group {
 	__u16 noi;		/* Maximum number of MSIs */
 	__u16 maxstbl;		/* Maximum Store Block Length */
 	__u8 version;		/* Supported PCI Version */
+	/* End of version 1 */
+	__u8 dtsm;		/* Supported IOAT Designations */
+	/* End of version 2 */
 };
 
 /**
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 30/32] KVM: s390: introduce CPU feature for zPCI Interpretation
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (28 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 31/32] MAINTAINERS: additional files related kvm s390 pci passthrough Matthew Rosato
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

KVM_S390_VM_CPU_FEAT_ZPCI_INTERP relays whether zPCI interpretive
execution is possible based on the available hardware facilities.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 arch/s390/include/uapi/asm/kvm.h | 1 +
 arch/s390/kvm/kvm-s390.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 7a6b14874d65..ed06458a871f 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
 #define KVM_S390_VM_CPU_FEAT_PFMFI	11
 #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
 #define KVM_S390_VM_CPU_FEAT_KSS	13
+#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP 14
 struct kvm_s390_vm_cpu_feat {
 	__u64 feat[16];
 };
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 613101ba29be..137ab8c09b82 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -434,6 +434,12 @@ static void kvm_s390_cpu_feat_init(void)
 	if (test_facility(151)) /* DFLTCC */
 		__insn32_query(INSN_DFLTCC, kvm_s390_available_subfunc.dfltcc);
 
+	/* zPCI Interpretation */
+	if (IS_ENABLED(CONFIG_VFIO_PCI) && IS_ENABLED(CONFIG_S390_KVM_IOMMU) &&
+	    test_facility(69) && test_facility(70) && test_facility(71) &&
+	    test_facility(72))
+		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ZPCI_INTERP);
+
 	if (MACHINE_HAS_ESOP)
 		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
 	/*
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 31/32] MAINTAINERS: additional files related kvm s390 pci passthrough
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (29 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 30/32] KVM: s390: introduce CPU feature for zPCI Interpretation Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:44 ` [PATCH v4 32/32] MAINTAINERS: update s390 IOMMU entry Matthew Rosato
  2022-03-14 19:52 ` [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Add entries from the s390 kvm subdirectory related to pci passthrough.

Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e127c2fb08a7..6c76eb66b10a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16928,6 +16928,8 @@ M:	Eric Farman <farman@linux.ibm.com>
 L:	linux-s390@vger.kernel.org
 L:	kvm@vger.kernel.org
 S:	Supported
+F:	arch/s390/include/asm/kvm_pci.h
+F:	arch/s390/kvm/pci*
 F:	drivers/vfio/pci/vfio_pci_zdev.c
 F:	include/uapi/linux/vfio_zdev.h
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v4 32/32] MAINTAINERS: update s390 IOMMU entry
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (30 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 31/32] MAINTAINERS: additional files related kvm s390 pci passthrough Matthew Rosato
@ 2022-03-14 19:44 ` Matthew Rosato
  2022-03-14 19:52 ` [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:44 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

Use wildcard to pick up new parts added by KVM domain support.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c76eb66b10a..d803f490eafb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16867,7 +16867,7 @@ M:	Gerald Schaefer <gerald.schaefer@linux.ibm.com>
 L:	linux-s390@vger.kernel.org
 S:	Supported
 W:	http://www.ibm.com/developerworks/linux/linux390/
-F:	drivers/iommu/s390-iommu.c
+F:	drivers/iommu/s390*
 
 S390 IUCV NETWORK LAYER
 M:	Alexandra Winter <wintera@linux.ibm.com>
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution
  2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
                   ` (31 preceding siblings ...)
  2022-03-14 19:44 ` [PATCH v4 32/32] MAINTAINERS: update s390 IOMMU entry Matthew Rosato
@ 2022-03-14 19:52 ` Matthew Rosato
  32 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-14 19:52 UTC (permalink / raw)
  To: linux-s390
  Cc: alex.williamson, cohuck, schnelle, farman, pmorel, borntraeger,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, jgg, kvm, linux-kernel, iommu, linux-doc

On 3/14/22 3:44 PM, Matthew Rosato wrote:
> Note: A few patches in this series are dependent on Baolu's IOMMU domain ops
> split, which is currently in the next branch of linux-iommu. This series
> applies on top:
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
> 
> Enable interpretive execution of zPCI instructions + adapter interruption
> forwarding for s390x KVM vfio-pci.  This is done by introducing a new IOMMU
> domain for s390x (KVM-managed), indicating via vfio that this IOMMU domain
> should be used instead of the default, with subsequent management of the
> hardware assists being handled via a new KVM ioctl for zPCI management.
> 
> By allowing interpretation of zPCI instructions and firmware delivery of
> interrupts to guests, we can significantly reduce the frequency of guest
> SIE exits for zPCI.  We then see additional gains by handling a hot-path
> instruction that can still intercept to the hypervisor (RPCIT) directly
> in kvm via the new IOMMU domain, whose map operations update the host
> DMA table with pinned guest entries over the specified range.
> 
>  From the perspective of guest configuration, you pass through zPCI devices
> in the same manner as before, with interpretation support being used by
> default if available in kernel+qemu.
> 
> Will reply with a link to the associated QEMU series.

QEMU series:
https://lore.kernel.org/kvm/20220314194920.58888-1-mjrosato@linux.ibm.com/


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-14 19:44 ` [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type Matthew Rosato
@ 2022-03-14 21:36   ` Jason Gunthorpe
  2022-03-15 10:49   ` Robin Murphy
  1 sibling, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-14 21:36 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Mon, Mar 14, 2022 at 03:44:33PM -0400, Matthew Rosato wrote:
> s390x will introduce an additional domain type that is used for
> managing IOMMU owned by KVM.  Define the type here and add an
> interface for allocating a specified type vs the default type.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>  drivers/iommu/iommu.c |  7 +++++++
>  include/linux/iommu.h | 12 ++++++++++++
>  2 files changed, 19 insertions(+)

I think the general idea is right, but I'm not keen on this as an
interface at all.

We are trying to build in iommufd a generic interface for an IOMMU
driver to expose IOMMU-device-specific domains such as this in a
general purpose way so all the platforms can get what they need.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 19:44 ` [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type Matthew Rosato
@ 2022-03-14 21:38   ` Jason Gunthorpe
  2022-03-15 13:49     ` Matthew Rosato
  2022-03-14 22:50   ` Alex Williamson
  1 sibling, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-14 21:38 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Mon, Mar 14, 2022 at 03:44:34PM -0400, Matthew Rosato wrote:

> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 9394aa9444c1..0bec97077d61 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -77,6 +77,7 @@ struct vfio_iommu {
>  	bool			nesting;
>  	bool			dirty_page_tracking;
>  	bool			container_open;
> +	bool			kvm;
>  	struct list_head	emulated_iommu_groups;
>  };
>  
> @@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		goto out_free_group;
>  
>  	ret = -EIO;
> -	domain->domain = iommu_domain_alloc(bus);
> +
> +	if (iommu->kvm)
> +		domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
> +	else
> +		domain->domain = iommu_domain_alloc(bus);
> +
>  	if (!domain->domain)
>  		goto out_free_domain;
>  
> @@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	case VFIO_TYPE1v2_IOMMU:
>  		iommu->v2 = true;
>  		break;
> +	case VFIO_KVM_IOMMU:
> +		iommu->kvm = true;
> +		break;

Same remark for this - but more: this is called KVM, but it doesn't
accept a kvm FD or anything else to link the domain to the KVM in
use.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM
  2022-03-14 19:44 ` [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM Matthew Rosato
@ 2022-03-14 21:46   ` Jason Gunthorpe
  2022-03-15 16:39     ` Matthew Rosato
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-14 21:46 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Mon, Mar 14, 2022 at 03:44:41PM -0400, Matthew Rosato wrote:
> +int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
> +{
> +	struct vfio_device *vdev;
> +	struct pci_dev *pdev;
> +	int rc;
> +
> +	rc = kvm_s390_pci_dev_open(zdev);
> +	if (rc)
> +		return rc;
> +
> +	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
> +	if (!pdev) {
> +		rc = -ENODEV;
> +		goto exit_err;
> +	}
> +
> +	vdev = get_vdev(&pdev->dev);
> +	if (!vdev) {
> +		pci_dev_put(pdev);
> +		rc = -ENODEV;
> +		goto exit_err;
> +	}
> +
> +	zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
> +
> +	/*
> +	 * At this point, a KVM should already be associated with this device,
> +	 * so registering the notifier now should immediately trigger the
> +	 * event.  We also want to know if the KVM association is later removed
> +	 * to ensure proper cleanup happens.
> +	 */
> +	rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
> +
> +	put_vdev(vdev);
> +	pci_dev_put(pdev);
> +
> +	/* Make sure the registered KVM matches the KVM issuing the ioctl */
> +	if (rc || zdev->kzdev->kvm != kvm) {
> +		rc = -ENODEV;
> +		goto exit_err;
> +	}
> +
> +	/* Must support KVM-managed IOMMU to proceed */
> +	if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
> +		rc = zpci_iommu_attach_kvm(zdev, kvm);
> +	else
> +		rc = -EINVAL;

This seems like kind of a strange API; shouldn't kvm be getting a
reference on the underlying iommu_domain and then calling into it to
get the mapping table, instead of pushing KVM-specific logic into the
iommu driver?

It would be nice if all the special kvm stuff could be more isolated
in kvm code.

I'm still a little unclear about why this is so complicated - can't
you get the iommu_domain from the group FD directly in KVM code as
power does?

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability
  2022-03-14 19:44 ` [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability Matthew Rosato
@ 2022-03-14 21:49   ` Jason Gunthorpe
  2022-03-15 14:39     ` Matthew Rosato
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-14 21:49 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Mon, Mar 14, 2022 at 03:44:48PM -0400, Matthew Rosato wrote:
> The DTSM, or designation type supported mask, indicates what IOAT formats
> are available to the guest.  For an interpreted device, userspace will not
> know what format(s) the IOAT assist supports, so pass it via the
> capability chain.  Since the value belongs to the Query PCI Function Group
> clp, let's extend the existing capability with a new version.

Why is this on the VFIO device?

Maybe I don't quite understand it right, but the IOAT is the
'userspace page table'?

That is something that should be modeled as a nested iommu domain.

Querying the formats and any control logic for this should be on the
iommu side not built into VFIO.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 19:44 ` [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type Matthew Rosato
  2022-03-14 21:38   ` Jason Gunthorpe
@ 2022-03-14 22:50   ` Alex Williamson
  2022-03-14 23:18     ` Jason Gunthorpe
  1 sibling, 1 reply; 66+ messages in thread
From: Alex Williamson @ 2022-03-14 22:50 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, cohuck, schnelle, farman, pmorel, borntraeger, hca,
	gor, gerald.schaefer, agordeev, svens, frankja, david, imbrenda,
	vneethv, oberpar, freude, thuth, pasic, joro, will, pbonzini,
	corbet, jgg, kvm, linux-kernel, iommu, linux-doc

On Mon, 14 Mar 2022 15:44:34 -0400
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> s390x will introduce a new IOMMU domain type where the mappings are
> managed by KVM rather than in response to userspace mapping ioctls.  Allow
> for specifying this type on the VFIO_SET_IOMMU ioctl and triggering the
> appropriate iommu interface for overriding the default domain.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 12 +++++++++++-
>  include/uapi/linux/vfio.h       |  6 ++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 9394aa9444c1..0bec97077d61 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -77,6 +77,7 @@ struct vfio_iommu {
>  	bool			nesting;
>  	bool			dirty_page_tracking;
>  	bool			container_open;
> +	bool			kvm;
>  	struct list_head	emulated_iommu_groups;
>  };
>  
> @@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  		goto out_free_group;
>  
>  	ret = -EIO;
> -	domain->domain = iommu_domain_alloc(bus);
> +
> +	if (iommu->kvm)
> +		domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
> +	else
> +		domain->domain = iommu_domain_alloc(bus);
> +
>  	if (!domain->domain)
>  		goto out_free_domain;
>  
> @@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>  	case VFIO_TYPE1v2_IOMMU:
>  		iommu->v2 = true;
>  		break;
> +	case VFIO_KVM_IOMMU:
> +		iommu->kvm = true;
> +		break;
>  	default:
>  		kfree(iommu);
>  		return ERR_PTR(-EINVAL);
> @@ -2637,6 +2646,7 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
>  	case VFIO_TYPE1_NESTING_IOMMU:
>  	case VFIO_UNMAP_ALL:
>  	case VFIO_UPDATE_VADDR:
> +	case VFIO_KVM_IOMMU:
>  		return 1;
>  	case VFIO_DMA_CC_IOMMU:
>  		if (!iommu)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ef33ea002b0b..666edb6957ac 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -52,6 +52,12 @@
>  /* Supports the vaddr flag for DMA map and unmap */
>  #define VFIO_UPDATE_VADDR		10
>  
> +/*
> + * The KVM_IOMMU type implies that the hypervisor will control the mappings
> + * rather than userspace
> + */
> +#define VFIO_KVM_IOMMU			11

Then why is this hosted in the type1 code that exposes a wide variety
of userspace interfaces?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 22:50   ` Alex Williamson
@ 2022-03-14 23:18     ` Jason Gunthorpe
  2022-03-15  7:57       ` Tian, Kevin
  2022-03-15 13:36       ` Matthew Rosato
  0 siblings, 2 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-14 23:18 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Matthew Rosato, linux-s390, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:

> > +/*
> > + * The KVM_IOMMU type implies that the hypervisor will control the mappings
> > + * rather than userspace
> > + */
> > +#define VFIO_KVM_IOMMU			11
> 
> Then why is this hosted in the type1 code that exposes a wide variety
> of userspace interfaces?  Thanks,

It is really badly named; this is the root level of a 2-stage nested
IO page table, and this approach needed a special flag to distinguish
the setup from the normal iommu_domain.

If we do try to stick this into VFIO it should probably use the
VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
that flag entirely as it was never fully implemented, was never used,
and isn't part of what we are proposing for IOMMU nesting on ARM
anyhow. (So far I've found nobody to explain what the plan here was..)

This is why I said the second level should be an explicit iommu_domain
all on its own that is explicitly coupled to the KVM to read the page
tables, if necessary.

But I'm not sure that reading the userspace io page tables with KVM is
even the best thing to do - the iommu driver already has the pinned
memory, it would be faster and more modular to traverse the io page
tables through the pfns in the root iommu_domain than by having KVM do
the translations. Lets see what Matthew says..

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 23:18     ` Jason Gunthorpe
@ 2022-03-15  7:57       ` Tian, Kevin
  2022-03-15 14:17         ` Matthew Rosato
  2022-03-15 13:36       ` Matthew Rosato
  1 sibling, 1 reply; 66+ messages in thread
From: Tian, Kevin @ 2022-03-15  7:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Matthew Rosato, linux-s390, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 15, 2022 7:18 AM
> 
> On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:
> 
> > > +/*
> > > + * The KVM_IOMMU type implies that the hypervisor will control the
> mappings
> > > + * rather than userspace
> > > + */
> > > +#define VFIO_KVM_IOMMU			11
> >
> > Then why is this hosted in the type1 code that exposes a wide variety
> > of userspace interfaces?  Thanks,
> 
> It is really badly named, this is the root level of a 2 stage nested
> IO page table, and this approach needed a special flag to distinguish
> the setup from the normal iommu_domain.
> 
> If we do try to stick this into VFIO it should probably use the
> VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
> that flag entirely as it was never fully implemented, was never used,
> and isn't part of what we are proposing for IOMMU nesting on ARM
> anyhow. (So far I've found nobody to explain what the plan here was..)
> 
> This is why I said the second level should be an explicit iommu_domain
> all on its own that is explicitly coupled to the KVM to read the page
> tables, if necessary.
> 
> But I'm not sure that reading the userspace io page tables with KVM is
> even the best thing to do - the iommu driver already has the pinned
> memory, it would be faster and more modular to traverse the io page
> tables through the pfns in the root iommu_domain than by having KVM do
> the translations. Lets see what Matthew says..
> 

Reading this thread, this sounds like an optimization of software
nesting. If that is the case, does it make more sense to complete the
basic form of software nesting first and then add this optimization?

The basic form would allow userspace to create a special domain
type which points to a user/guest page table (like hardware nesting)
but doesn't install the user page table into the IOMMU hardware (unlike
hardware nesting). When it receives an invalidate cmd from userspace,
the iommu driver walks the user page table (1st-level) and the parent
page table (2nd-level) to generate a shadow mapping for the
invalidated range in the non-nested hardware page table of this
special domain type.

Once that works, what this series does just changes how the
invalidate cmd is triggered. Previously the iommu driver received the
invalidate cmd from Qemu (via iommufd uAPI), while now it receives
the cmd from kvm (via iommufd kAPI) upon interception of RPCIT.
From this angle, once the connection between iommufd and the kvm fd
is established there is no direct talk between the iommu driver and
kvm.

Thanks
Kevin


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-14 19:44 ` [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type Matthew Rosato
  2022-03-14 21:36   ` Jason Gunthorpe
@ 2022-03-15 10:49   ` Robin Murphy
  2022-03-17  5:47     ` Tian, Kevin
  1 sibling, 1 reply; 66+ messages in thread
From: Robin Murphy @ 2022-03-15 10:49 UTC (permalink / raw)
  To: Matthew Rosato, linux-s390
  Cc: kvm, david, thuth, linux-kernel, vneethv, agordeev, imbrenda,
	will, frankja, corbet, linux-doc, pasic, jgg, gerald.schaefer,
	borntraeger, farman, gor, schnelle, hca, alex.williamson, freude,
	pmorel, cohuck, oberpar, iommu, svens, pbonzini,
	Jean-Philippe Brucker

On 2022-03-14 19:44, Matthew Rosato wrote:
> s390x will introduce an additional domain type that is used for
> managing IOMMU owned by KVM.  Define the type here and add an
> interface for allocating a specified type vs the default type.

I'm also not a huge fan of adding a new domain_alloc interface like
this; however, if it is justifiable, then please make it take struct
device rather than struct bus_type as an argument.

It also sounds like there may be a degree of conceptual overlap here 
with what Jean-Philippe is working on for sharing pagetables between KVM 
and SMMU for Android pKVM, so it's probably worth some thought over 
whether there's any scope for common interfaces in terms of actual 
implementation.

Thanks,
Robin.

> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   drivers/iommu/iommu.c |  7 +++++++
>   include/linux/iommu.h | 12 ++++++++++++
>   2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f2c45b85b9fc..8bb57e0e3945 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1976,6 +1976,13 @@ void iommu_domain_free(struct iommu_domain *domain)
>   }
>   EXPORT_SYMBOL_GPL(iommu_domain_free);
>   
> +struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
> +					     unsigned int t)
> +{
> +	return __iommu_domain_alloc(bus, t);
> +}
> +EXPORT_SYMBOL_GPL(iommu_domain_alloc_type);
> +
>   static int __iommu_attach_device(struct iommu_domain *domain,
>   				 struct device *dev)
>   {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9208eca4b0d1..b427bbb9f387 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -63,6 +63,7 @@ struct iommu_domain_geometry {
>   					      implementation              */
>   #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
>   #define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)  /* DMA-API uses flush queue    */
> +#define __IOMMU_DOMAIN_KVM	(1U << 4)  /* Domain is controlled by KVM */
>   
>   /*
>    * This are the possible domain-types
> @@ -77,6 +78,7 @@ struct iommu_domain_geometry {
>    *				  certain optimizations for these domains
>    *	IOMMU_DOMAIN_DMA_FQ	- As above, but definitely using batched TLB
>    *				  invalidation.
> + *	IOMMU_DOMAIN_KVM	- DMA mappings managed by KVM, used for VMs
>    */
>   #define IOMMU_DOMAIN_BLOCKED	(0U)
>   #define IOMMU_DOMAIN_IDENTITY	(__IOMMU_DOMAIN_PT)
> @@ -86,6 +88,8 @@ struct iommu_domain_geometry {
>   #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
>   				 __IOMMU_DOMAIN_DMA_API |	\
>   				 __IOMMU_DOMAIN_DMA_FQ)
> +#define IOMMU_DOMAIN_KVM	(__IOMMU_DOMAIN_PAGING |	\
> +				 __IOMMU_DOMAIN_KVM)
>   
>   struct iommu_domain {
>   	unsigned type;
> @@ -421,6 +425,8 @@ extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap);
>   extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
>   extern struct iommu_group *iommu_group_get_by_id(int id);
>   extern void iommu_domain_free(struct iommu_domain *domain);
> +extern struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
> +						    unsigned int t);
>   extern int iommu_attach_device(struct iommu_domain *domain,
>   			       struct device *dev);
>   extern void iommu_detach_device(struct iommu_domain *domain,
> @@ -708,6 +714,12 @@ static inline void iommu_domain_free(struct iommu_domain *domain)
>   {
>   }
>   
> +static inline struct iommu_domain *iommu_domain_alloc_type(struct bus_type *bus,
> +							   unsigned int t)
> +{
> +	return NULL;
> +}
> +
>   static inline int iommu_attach_device(struct iommu_domain *domain,
>   				      struct device *dev)
>   {


* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 23:18     ` Jason Gunthorpe
  2022-03-15  7:57       ` Tian, Kevin
@ 2022-03-15 13:36       ` Matthew Rosato
  2022-03-15 14:55         ` Jason Gunthorpe
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 13:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: linux-s390, cohuck, schnelle, farman, pmorel, borntraeger, hca,
	gor, gerald.schaefer, agordeev, svens, frankja, david, imbrenda,
	vneethv, oberpar, freude, thuth, pasic, joro, will, pbonzini,
	corbet, kvm, linux-kernel, iommu, linux-doc

On 3/14/22 7:18 PM, Jason Gunthorpe wrote:
> On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:
> 
>>> +/*
>>> + * The KVM_IOMMU type implies that the hypervisor will control the mappings
>>> + * rather than userspace
>>> + */
>>> +#define VFIO_KVM_IOMMU			11
>>
>> Then why is this hosted in the type1 code that exposes a wide variety
>> of userspace interfaces?  Thanks,
> 
> It is really badly named, this is the root level of a 2 stage nested
> IO page table, and this approach needed a special flag to distinguish
> the setup from the normal iommu_domain.

^^ Yes, this.

> 
> If we do try to stick this into VFIO it should probably use the
> VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
> that flag entirely as it was never fully implemented, was never used,
> and isn't part of what we are proposing for IOMMU nesting on ARM
> anyhow. (So far I've found nobody to explain what the plan here was..)
> 

I'm open to suggestions on how better to tie this into vfio.  The 
scenario basically plays out that:

1) the iommu will be domain_alloc'd once VFIO_SET_IOMMU is issued -- so 
at that time (or earlier) we have to make the decision on whether to use 
the standard IOMMU or this alternate KVM/nested IOMMU.

2) At the time VFIO_SET_IOMMU is received, we have not yet associated 
the vfio group with a KVM, so we can't (today) use this as an indicator 
to guess which IOMMU strategy to use.

3) Ideally, even if we changed QEMU vfio to make the KVM association 
earlier, it would be nice to still be able to indicate that we want to 
use the standard iommu/type1 despite a KVM association existing (e.g. 
backwards compatibility with older QEMU that lacks 'interpretation' 
support, nested virtualization scenarios).

> This is why I said the second level should be an explicit iommu_domain
> all on its own that is explicitly coupled to the KVM to read the page
> tables, if necessary.

Maybe I misunderstood this.  Are you proposing 2 layers of IOMMU that
interact with each other within host kernel space?

A second level runs the guest tables, pins the appropriate pieces from 
the guest to get the resulting phys_addr(s) which are then passed via 
iommu to a first level via map (or unmap)?

> 
> But I'm not sure that reading the userspace io page tables with KVM is
> even the best thing to do - the iommu driver already has the pinned
> memory, it would be faster and more modular to traverse the io page
> tables through the pfns in the root iommu_domain than by having KVM do
> the translations. Lets see what Matthew says..

OK, you lost me a bit here.  And this may be associated with the above.

So, what the current implementation is doing is reading the guest DMA 
tables (which we must pin the first time we access them) and then mapping 
the PTEs of the associated guest DMA entries into the associated host 
DMA table (so, again pin and place the address, or unpin and 
invalidate).  Basically we are shadowing the first level DMA table as a 
copy of the second level DMA table with the host address(es) of the 
pinned guest page(s).

I'm unclear where you are proposing the pinning be done if not by the 
iommu domain traversing the tables to perform the 'shadow' operation.





* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-14 21:38   ` Jason Gunthorpe
@ 2022-03-15 13:49     ` Matthew Rosato
  2022-03-15 14:38       ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 13:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On 3/14/22 5:38 PM, Jason Gunthorpe wrote:
> On Mon, Mar 14, 2022 at 03:44:34PM -0400, Matthew Rosato wrote:
> 
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 9394aa9444c1..0bec97077d61 100644
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -77,6 +77,7 @@ struct vfio_iommu {
>>   	bool			nesting;
>>   	bool			dirty_page_tracking;
>>   	bool			container_open;
>> +	bool			kvm;
>>   	struct list_head	emulated_iommu_groups;
>>   };
>>   
>> @@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>   		goto out_free_group;
>>   
>>   	ret = -EIO;
>> -	domain->domain = iommu_domain_alloc(bus);
>> +
>> +	if (iommu->kvm)
>> +		domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
>> +	else
>> +		domain->domain = iommu_domain_alloc(bus);
>> +
>>   	if (!domain->domain)
>>   		goto out_free_domain;
>>   
>> @@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>>   	case VFIO_TYPE1v2_IOMMU:
>>   		iommu->v2 = true;
>>   		break;
>> +	case VFIO_KVM_IOMMU:
>> +		iommu->kvm = true;
>> +		break;
> 
> Same remark for this - but more - this is called KVM but it doesn't
> accept a kvm FD or any thing else to link the domain to the KVM
> in-use.

Right...  The name is poor, but with the current design the KVM 
association comes shortly after.  To summarize, with this series, the 
following relevant steps occur:

1) VFIO_SET_IOMMU: Indicate we wish to use the alternate IOMMU domain
	-> At this point, the IOMMU will reject any maps (no KVM, no guest 
table anchor)
2) KVM ioctl "start":
	-> Register the KVM with the IOMMU domain
	-> At this point, IOMMU will still reject any maps (no guest table anchor)
3) KVM ioctl "register ioat"
	-> Register the guest DMA table head with the IOMMU domain
	-> now IOMMU maps are allowed

The rationale for splitting steps 1 and 2 are that VFIO_SET_IOMMU 
doesn't have a mechanism for specifying more than the type as an arg, 
no?  Otherwise yes, you could specify a kvm fd at this point and it 
would have some other advantages (e.g. skip notifier).  But we still 
can't use the IOMMU for mapping until step 3.

The rationale for splitting steps 2 and 3 are twofold:  1) during init, 
we simply don't know where the guest anchor will be when we allocate the 
domain and 2) because the guest can technically clear and re-initialize 
their DMA space during the life of the guest, moving the location of the 
table anchor.  We would receive another ioctl operation to unregister 
the guest table anchor and again reject any map operation until a new 
table location is provided.



* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15  7:57       ` Tian, Kevin
@ 2022-03-15 14:17         ` Matthew Rosato
  2022-03-15 17:01           ` Matthew Rosato
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 14:17 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Alex Williamson
  Cc: linux-s390, cohuck, schnelle, farman, pmorel, borntraeger, hca,
	gor, gerald.schaefer, agordeev, svens, frankja, david, imbrenda,
	vneethv, oberpar, freude, thuth, pasic, joro, will, pbonzini,
	corbet, kvm, linux-kernel, iommu, linux-doc

On 3/15/22 3:57 AM, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Tuesday, March 15, 2022 7:18 AM
>>
>> On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:
>>
>>>> +/*
>>>> + * The KVM_IOMMU type implies that the hypervisor will control the
>> mappings
>>>> + * rather than userspace
>>>> + */
>>>> +#define VFIO_KVM_IOMMU			11
>>>
>>> Then why is this hosted in the type1 code that exposes a wide variety
>>> of userspace interfaces?  Thanks,
>>
>> It is really badly named, this is the root level of a 2 stage nested
>> IO page table, and this approach needed a special flag to distinguish
>> the setup from the normal iommu_domain.
>>
>> If we do try to stick this into VFIO it should probably use the
>> VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
>> that flag entirely as it was never fully implemented, was never used,
>> and isn't part of what we are proposing for IOMMU nesting on ARM
>> anyhow. (So far I've found nobody to explain what the plan here was..)
>>
>> This is why I said the second level should be an explicit iommu_domain
>> all on its own that is explicitly coupled to the KVM to read the page
>> tables, if necessary.
>>
>> But I'm not sure that reading the userspace io page tables with KVM is
>> even the best thing to do - the iommu driver already has the pinned
>> memory, it would be faster and more modular to traverse the io page
>> tables through the pfns in the root iommu_domain than by having KVM do
>> the translations. Lets see what Matthew says..
>>
> 
> Reading this thread it's sort of like an optimization to software nesting.

Yes, we want to avoid breaking to userspace for a very frequent 
operation (RPCIT / updating shadow mappings)

> If that is the case does it make more sense to complete the basic form
> of software nesting first and then adds this optimization?
> 
> The basic form would allow the userspace to create a special domain
> type which points to a user/guest page table (like hardware nesting)
> but doesn't install the user page table to the IOMMU hardware (unlike
> hardware nesting). When receiving invalidate cmd from userspace
> the iommu driver walks the user page table (1st-level) and the parent
> page table (2nd-level) to generate a shadow mapping for the
> invalidated range in the non-nested hardware page table of this
> special domain type.
> 
> Once that works what this series does just changes the matter of
> how the invalidate cmd is triggered. Previously the iommu driver received
> the invalidate cmd from Qemu (via iommufd uAPI) while now it receives
> the cmd from kvm (via iommufd kAPI) upon interception of RPCIT.
>  From this angle once the connection between iommufd and kvm fd
> is established there is even no direct talk between iommu driver and
> kvm.

But something somewhere still needs to be responsible for 
pinning/unpinning of the guest table entries upon each RPCIT 
interception.  e.g. the RPCIT intercept can happen because the guest 
wants to invalidate some old mappings or has generated some new mappings 
over a range, so we must shadow the new mappings (by pinning the guest 
entries and placing them in the host hardware table / unpinning 
invalidated ones and clearing their entry in the host hardware table).




* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 13:49     ` Matthew Rosato
@ 2022-03-15 14:38       ` Jason Gunthorpe
  2022-03-15 16:29         ` Matthew Rosato
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-15 14:38 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Tue, Mar 15, 2022 at 09:49:01AM -0400, Matthew Rosato wrote:

> The rationale for splitting steps 1 and 2 are that VFIO_SET_IOMMU doesn't
> have a mechanism for specifying more than the type as an arg, no?  Otherwise
> yes, you could specify a kvm fd at this point and it would have some other
> advantages (e.g. skip notifier).  But we still can't use the IOMMU for
> mapping until step 3.

Stuff like this is why I'd be much happier if this could join our
iommufd project so we can have clean modeling of the multiple iommu_domains.

Jason


* Re: [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability
  2022-03-14 21:49   ` Jason Gunthorpe
@ 2022-03-15 14:39     ` Matthew Rosato
  2022-03-15 14:56       ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 14:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On 3/14/22 5:49 PM, Jason Gunthorpe wrote:
> On Mon, Mar 14, 2022 at 03:44:48PM -0400, Matthew Rosato wrote:
>> The DTSM, or designation type supported mask, indicates what IOAT formats
>> are available to the guest.  For an interpreted device, userspace will not
>> know what format(s) the IOAT assist supports, so pass it via the
>> capability chain.  Since the value belongs to the Query PCI Function Group
>> clp, let's extend the existing capability with a new version.
> 
> Why is this on the VFIO device?

Current vfio_pci_zdev support adds a series of capability chains to the 
VFIO_DEVICE_GET_INFO ioctl.  These capability chains are all related to 
output values associated with what are basically s390x query instructions.

The capability chain being modified by this patch is used to populate a 
response to the 'query this zPCI group' instruction.

> 
> Maybe I don't quite understand it right, but the IOAT is the
> 'userspace page table'?

IOAT = I/O Address Translation tables;  the head of which is called the 
IOTA (translation anchor).  But yes, this would generally refer to the 
guest DMA tables.

Specifically here we are only talking about the DTSM which is the 
formats that the guest is allowed to use for their address translation 
tables, because the hardware (or in our case the intermediary kvm iommu) 
can only operate on certain formats.

> 
> That is something that should be modeled as a nested iommu domain.
> 
> Querying the formats and any control logic for this should be on the
> iommu side not built into VFIO.

I agree that the DTSM is really controlled by what the IOMMU domain can 
support (e.g. what guest table formats it can actually operate on) and 
so the DTSM value should come from there vs out of KVM; but is there 
harm in including the query response data here along with the rest of 
the response information for 'query this zPCI group'?


* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 13:36       ` Matthew Rosato
@ 2022-03-15 14:55         ` Jason Gunthorpe
  2022-03-15 16:04           ` Matthew Rosato
  2022-03-18  7:01           ` Tian, Kevin
  0 siblings, 2 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-15 14:55 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Alex Williamson, linux-s390, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Tue, Mar 15, 2022 at 09:36:08AM -0400, Matthew Rosato wrote:
> > If we do try to stick this into VFIO it should probably use the
> > VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
> > that flag entirely as it was never fully implemented, was never used,
> > and isn't part of what we are proposing for IOMMU nesting on ARM
> > anyhow. (So far I've found nobody to explain what the plan here was..)
> > 
> 
> I'm open to suggestions on how better to tie this into vfio.  The scenario
> basically plays out that:

Ideally I would like it to follow the same 'user space page table'
design that Eric and Kevin are working on for HW iommu.

You have an 1st iommu_domain that maps and pins the entire guest physical
address space.

You have an nested iommu_domain that represents the user page table
(the ioat in your language I think)

When the guest says it wants to set a user page table then you create
the nested iommu_domain representing that user page table and pass in
the anchor (guest address of the root IOPTE) to the kernel to do the
work.

The rule for all other HW's is that the user space page table is
translated by the top level kernel page table. So when you traverse it
you fetch the CPU page storing the guest's IOPTE by doing an IOVA
translation through the first level page table - not through KVM.

Since the first level page table and the KVM GPA should be 1:1 this is
an equivalent operation.

> 1) the iommu will be domain_alloc'd once VFIO_SET_IOMMU is issued -- so at
> that time (or earlier) we have to make the decision on whether to use the
> standard IOMMU or this alternate KVM/nested IOMMU.

So in terms of iommufd I would see it this would be an iommufd 'create
a device specific iomm_domain' IOCTL and you can pass in a S390
specific data blob to make it into this special mode.

> > This is why I said the second level should be an explicit iommu_domain
> > all on its own that is explicitly coupled to the KVM to read the page
> > tables, if necessary.
> 
> Maybe I misunderstood this.  Are you proposing 2 layers of IOMMU that
> interact with each other within host kernel space?
> 
> A second level runs the guest tables, pins the appropriate pieces from the
> guest to get the resulting phys_addr(s) which are then passed via iommu to a
> first level via map (or unmap)?


The first level iommu_domain has the 'type1' map and unmap and pins
the pages. This is the 1:1 map with the GPA and ends up pinning all
guest memory because the point is you don't want to take a memory pin
on your performance path

The second level iommu_domain points to a single IO page table in GPA
and is created/destroyed whenever the guest traps to the hypervisor to
manipulate the anchor (ie the GPA of the guest IO page table).

> > But I'm not sure that reading the userspace io page tables with KVM is
> > even the best thing to do - the iommu driver already has the pinned
> > memory, it would be faster and more modular to traverse the io page
> > tables through the pfns in the root iommu_domain than by having KVM do
> > the translations. Lets see what Matthew says..
> 
> OK, you lost me a bit here.  And this may be associated with the above.
> 
> So, what the current implementation is doing is reading the guest DMA tables
> (which we must pin the first time we access them) and then map the PTEs of
> the associated guest DMA entries into the associated host DMA table (so,
> again pin and place the address, or unpin and invalidate).  Basically we are
> shadowing the first level DMA table as a copy of the second level DMA table
> with the host address(es) of the pinned guest page(s).

You can't pin/unpin in this path, there is no real way to handle error
and ulimit stuff here, plus it is really slow. I didn't notice any of
this in your patches, so what do you mean by 'pin' above?

To be like other IOMMU nesting drivers the pages should already be
pinned and stored in the 1st iommu_domain, lets say in an xarray. This
xarray is populated by type1 map/unmap system calls like any
iommu_domain.

A nested iommu_domain should create the real HW IO page table and
associate it with the real HW IOMMU and record the parent 1st level iommu_domain.

When you do the shadowing you use the xarray of the 1st level
iommu_domain to translate from GPA to host physical and there is no
pinning/etc involved. After walking the guest table and learning the
final vIOVA it is translated through the xarray to a CPU physical and
then programmed into the real HW IO page table.

There is no reason to use KVM to do any of this, and it is actively wrong
to place CPU pages from KVM into an IOPTE that did not come through
the type1 map/unmap calls that do all the proper validation and
accounting.

Jason


* Re: [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability
  2022-03-15 14:39     ` Matthew Rosato
@ 2022-03-15 14:56       ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-15 14:56 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On Tue, Mar 15, 2022 at 10:39:18AM -0400, Matthew Rosato wrote:
> > That is something that should be modeled as a nested iommu domain.
> > 
> > Querying the formats and any control logic for this should be on the
> > iommu side not built into VFIO.
> 
> I agree that the DTSM is really controlled by what the IOMMU domain can
> support (e.g. what guest table formats it can actually operate on) and so
> the DTSM value should come from there vs out of KVM; but is there harm in
> including the query response data here along with the rest of the response
> information for 'query this zPCI group'?

'Harm'? No, but I think it is wrong encapsulation and layering.

Jason


* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 14:55         ` Jason Gunthorpe
@ 2022-03-15 16:04           ` Matthew Rosato
  2022-03-15 17:18             ` Jason Gunthorpe
  2022-03-18  7:01           ` Tian, Kevin
  1 sibling, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 16:04 UTC (permalink / raw)
  To: Jason Gunthorpe, borntraeger
  Cc: Alex Williamson, linux-s390, cohuck, schnelle, farman, pmorel,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On 3/15/22 10:55 AM, Jason Gunthorpe wrote:
> On Tue, Mar 15, 2022 at 09:36:08AM -0400, Matthew Rosato wrote:
>>> If we do try to stick this into VFIO it should probably use the
>>> VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
>>> that flag entirely as it was never fully implemented, was never used,
>>> and isn't part of what we are proposing for IOMMU nesting on ARM
>>> anyhow. (So far I've found nobody to explain what the plan here was..)
>>>
>>
>> I'm open to suggestions on how better to tie this into vfio.  The scenario
>> basically plays out that:
> 
> Ideally I would like it to follow the same 'user space page table'
> design that Eric and Kevin are working on for HW iommu.

'[RFC v16 0/9] SMMUv3 Nested Stage Setup (IOMMU part)' ??

https://lore.kernel.org/linux-iommu/20211027104428.1059740-1-eric.auger@redhat.com/

> 
> You have an 1st iommu_domain that maps and pins the entire guest physical
> address space.

Ahh, I see.

@Christian would it be OK to pursue a model that pins all of guest 
memory upfront?

> 
> You have an nested iommu_domain that represents the user page table
> (the ioat in your language I think)

Yes

> 
> When the guest says it wants to set a user page table then you create
> the nested iommu_domain representing that user page table and pass in
> the anchor (guest address of the root IOPTE) to the kernel to do the
> work.
> 
> The rule for all other HW's is that the user space page table is
> translated by the top level kernel page table. So when you traverse it
> you fetch the CPU page storing the guest's IOPTE by doing an IOVA
> translation through the first level page table - not through KVM.
> 
> Since the first level page table and the KVM GPA should be 1:1 this is
> an equivalent operation.
> 
>> 1) the iommu will be domain_alloc'd once VFIO_SET_IOMMU is issued -- so at
>> that time (or earlier) we have to make the decision on whether to use the
>> standard IOMMU or this alternate KVM/nested IOMMU.
> 
> So in terms of iommufd I would see it this would be an iommufd 'create
> a device specific iomm_domain' IOCTL and you can pass in a S390
> specific data blob to make it into this special mode.
> 
>>> This is why I said the second level should be an explicit iommu_domain
>>> all on its own that is explicitly coupled to the KVM to read the page
>>> tables, if necessary.
>>
>> Maybe I misunderstood this.  Are you proposing 2 layers of IOMMU that
>> interact with each other within host kernel space?
>>
>> A second level runs the guest tables, pins the appropriate pieces from the
>> guest to get the resulting phys_addr(s) which are then passed via iommu to a
>> first level via map (or unmap)?
> 
> 
> The first level iommu_domain has the 'type1' map and unmap and pins
> the pages. This is the 1:1 map with the GPA and ends up pinning all
> guest memory because the point is you don't want to take a memory pin
> on your performance path
> 
> The second level iommu_domain points to a single IO page table in GPA
> and is created/destroyed whenever the guest traps to the hypervisor to
> manipulate the anchor (ie the GPA of the guest IO page table).
> 

That makes sense, thanks for clarifying.

>>> But I'm not sure that reading the userspace io page tables with KVM is
>>> even the best thing to do - the iommu driver already has the pinned
>>> memory, it would be faster and more modular to traverse the io page
>>> tables through the pfns in the root iommu_domain than by having KVM do
>>> the translations. Lets see what Matthew says..
>>
>> OK, you lost me a bit here.  And this may be associated with the above.
>>
>> So, what the current implementation is doing is reading the guest DMA tables
>> (which we must pin the first time we access them) and then map the PTEs of
>> the associated guest DMA entries into the associated host DMA table (so,
>> again pin and place the address, or unpin and invalidate).  Basically we are
>> shadowing the first level DMA table as a copy of the second level DMA table
>> with the host address(es) of the pinned guest page(s).
> 
> You can't pin/unpin in this path, there is no real way to handle error
> and ulimit stuff here, plus it is really slow. I didn't notice any of
> this in your patches, so what do you mean by 'pin' above?

patch 18 does a symbol_get for gfn_to_page (which will drive 
hva_to_pfn under the covers) and for kvm_release_pfn_dirty, and uses 
those symbols for pin/unpin.

pin/unpin errors in this series are reliant on the fact that RPCIT is 
architected to include a panic response to the guest of 'mappings failed 
for the specified range, go refresh your tables and make room', thus 
allowing this to work for pageable guests.

Agreed this would be unnecessary if we've already mapped all of guest 
memory via a 1st iommu domain.

> 
> To be like other IOMMU nesting drivers the pages should already be
> pinned and stored in the 1st iommu_domain, lets say in an xarray. This
> xarray is populated by type1 map/unmap system calls like any
> iommu_domain.
> 
> A nested iommu_domain should create the real HW IO page table and
> associate it with the real HW IOMMU and record the parent 1st level iommu_domain.
> 
> When you do the shadowing you use the xarray of the 1st level
> iommu_domain to translate from GPA to host physical and there is no
> pinning/etc involved. After walking the guest table and learning the
> final vIOVA it is translated through the xarray to a CPU physical and
> then programmed into the real HW IO page table.
> 
> There is no reason to use KVM to do any of this, and it is actively wrong
> to place CPU pages from KVM into an IOPTE that did not come through
> the type1 map/unmap calls that do all the proper validation and
> accounting.
> 
> Jason



* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 14:38       ` Jason Gunthorpe
@ 2022-03-15 16:29         ` Matthew Rosato
  2022-03-15 17:25           ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 16:29 UTC (permalink / raw)
  To: Jason Gunthorpe, borntraeger
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	hca, gor, gerald.schaefer, agordeev, svens, frankja, david,
	imbrenda, vneethv, oberpar, freude, thuth, pasic, joro, will,
	pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On 3/15/22 10:38 AM, Jason Gunthorpe wrote:
> On Tue, Mar 15, 2022 at 09:49:01AM -0400, Matthew Rosato wrote:
> 
>> The rationale for splitting steps 1 and 2 are that VFIO_SET_IOMMU doesn't
>> have a mechanism for specifying more than the type as an arg, no?  Otherwise
>> yes, you could specify a kvm fd at this point and it would have some other
>> advantages (e.g. skip notifier).  But we still can't use the IOMMU for
>> mapping until step 3.
> 
> Stuff like this is why I'd be much happier if this could join our
> iommufd project so we can have clean modeling of the multiple iommu_domains.
> 

I'd certainly be willing to collaborate so feel free to loop me in on 
the discussions; but I got the impression that iommufd is not close to 
ready (maybe I'm wrong?) -- if so I really don't want to completely 
delay this zPCI support behind it as it has a significant benefit for 
kvm guests on s390x :(


* Re: [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM
  2022-03-14 21:46   ` Jason Gunthorpe
@ 2022-03-15 16:39     ` Matthew Rosato
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 16:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, alex.williamson, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

On 3/14/22 5:46 PM, Jason Gunthorpe wrote:
> On Mon, Mar 14, 2022 at 03:44:41PM -0400, Matthew Rosato wrote:
>> +int kvm_s390_pci_zpci_start(struct kvm *kvm, struct zpci_dev *zdev)
>> +{
>> +	struct vfio_device *vdev;
>> +	struct pci_dev *pdev;
>> +	int rc;
>> +
>> +	rc = kvm_s390_pci_dev_open(zdev);
>> +	if (rc)
>> +		return rc;
>> +
>> +	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
>> +	if (!pdev) {
>> +		rc = -ENODEV;
>> +		goto exit_err;
>> +	}
>> +
>> +	vdev = get_vdev(&pdev->dev);
>> +	if (!vdev) {
>> +		pci_dev_put(pdev);
>> +		rc = -ENODEV;
>> +		goto exit_err;
>> +	}
>> +
>> +	zdev->kzdev->nb.notifier_call = kvm_s390_pci_group_notifier;
>> +
>> +	/*
>> +	 * At this point, a KVM should already be associated with this device,
>> +	 * so registering the notifier now should immediately trigger the
>> +	 * event.  We also want to know if the KVM association is later removed
>> +	 * to ensure proper cleanup happens.
>> +	 */
>> +	rc = register_notifier(vdev->dev, &zdev->kzdev->nb);
>> +
>> +	put_vdev(vdev);
>> +	pci_dev_put(pdev);
>> +
>> +	/* Make sure the registered KVM matches the KVM issuing the ioctl */
>> +	if (rc || zdev->kzdev->kvm != kvm) {
>> +		rc = -ENODEV;
>> +		goto exit_err;
>> +	}
>> +
>> +	/* Must support KVM-managed IOMMU to proceed */
>> +	if (IS_ENABLED(CONFIG_S390_KVM_IOMMU))
>> +		rc = zpci_iommu_attach_kvm(zdev, kvm);
>> +	else
>> +		rc = -EINVAL;
> 
> This seems like kind of a strange API, shouldn't kvm be getting a
> reference on the underlying iommu_domain and then calling into it to
> get the mapping table instead of pushing KVM specific logic into the
> iommu driver?
> 
> It would be nice if all the special kvm stuff could be more isolated in
> kvm code.
> 
> I'm still a little unclear about why this is so complicated - can't
> you get the iommu_domain from the group FD directly in KVM code as
> power does?

Yeah, I think I could do something like that using the vfio group fd 
like power does.

The reference to the kvm itself inside the iommu was being used for 
the pin/unpin operations, which would not be necessary if we switched to 
having the 1st-level iommu pin all of guest memory.




* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 14:17         ` Matthew Rosato
@ 2022-03-15 17:01           ` Matthew Rosato
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-15 17:01 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Alex Williamson
  Cc: linux-s390, cohuck, schnelle, farman, pmorel, borntraeger, hca,
	gor, gerald.schaefer, agordeev, svens, frankja, david, imbrenda,
	vneethv, oberpar, freude, thuth, pasic, joro, will, pbonzini,
	corbet, kvm, linux-kernel, iommu, linux-doc

On 3/15/22 10:17 AM, Matthew Rosato wrote:
> On 3/15/22 3:57 AM, Tian, Kevin wrote:
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>> Sent: Tuesday, March 15, 2022 7:18 AM
>>>
>>> On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:
>>>
>>>>> +/*
>>>>> + * The KVM_IOMMU type implies that the hypervisor will control the
>>> mappings
>>>>> + * rather than userspace
>>>>> + */
>>>>> +#define VFIO_KVM_IOMMU            11
>>>>
>>>> Then why is this hosted in the type1 code that exposes a wide variety
>>>> of userspace interfaces?  Thanks,
>>>
>>> It is really badly named, this is the root level of a 2 stage nested
>>> IO page table, and this approach needed a special flag to distinguish
>>> the setup from the normal iommu_domain.
>>>
>>> If we do try to stick this into VFIO it should probably use the
>>> VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
>>> that flag entirely as it was never fully implemented, was never used,
>>> and isn't part of what we are proposing for IOMMU nesting on ARM
>>> anyhow. (So far I've found nobody to explain what the plan here was..)
>>>
>>> This is why I said the second level should be an explicit iommu_domain
>>> all on its own that is explicitly coupled to the KVM to read the page
>>> tables, if necessary.
>>>
>>> But I'm not sure that reading the userspace io page tables with KVM is
>>> even the best thing to do - the iommu driver already has the pinned
>>> memory, it would be faster and more modular to traverse the io page
>>> tables through the pfns in the root iommu_domain than by having KVM do
>>> the translations. Lets see what Matthew says..
>>>
>>
>> Reading this thread, it's sort of like an optimization to software 
>> nesting.
> 
> Yes, we want to avoid breaking to userspace for a very frequent 
> operation (RPCIT / updating shadow mappings)
> 
>> If that is the case does it make more sense to complete the basic form
>> of software nesting first and then add this optimization?
>>
>> The basic form would allow the userspace to create a special domain
>> type which points to a user/guest page table (like hardware nesting)
>> but doesn't install the user page table to the IOMMU hardware (unlike
>> hardware nesting). When receiving an invalidate cmd from userspace, the
>> iommu driver walks the user page table (1st-level) and the parent
>> page table (2nd-level) to generate a shadow mapping for the
>> invalidated range in the non-nested hardware page table of this
>> special domain type.
>>
>> Once that works what this series does just changes the matter of
>> how the invalidate cmd is triggered. Previously iommu driver receives
>> invalidate cmd from Qemu (via iommufd uAPI) while now receiving
>> the cmd from kvm (via iommufd kAPI) upon interception of RPCIT.
>>  From this angle once the connection between iommufd and kvm fd
>> is established there is even no direct talk between iommu driver and
>> kvm.
> 
> But something somewhere still needs to be responsible for 
> pinning/unpinning of the guest table entries upon each RPCIT 
> interception.  e.g. the RPCIT intercept can happen because the guest 
> wants to invalidate some old mappings or has generated some new mappings 
> over a range, so we must shadow the new mappings (by pinning the guest 
> entries and placing them in the host hardware table / unpinning 
> invalidated ones and clearing their entry in the host hardware table).
> 

OK, this got clarified by Jason in another thread: what I was missing 
here was an assumption that the 1st-level has already mapped and pinned 
all of guest physical address space; in that case there's no need to 
invoke pin/unpin operations against a kvm from within the iommu domain 
(this series as-is does not pin all of the guest physical address space; 
it pins/unpins on-demand at RPCIT time).

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 16:04           ` Matthew Rosato
@ 2022-03-15 17:18             ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-15 17:18 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: borntraeger, Alex Williamson, linux-s390, cohuck, schnelle,
	farman, pmorel, hca, gor, gerald.schaefer, agordeev, svens,
	frankja, david, imbrenda, vneethv, oberpar, freude, thuth, pasic,
	joro, will, pbonzini, corbet, kvm, linux-kernel, iommu,
	linux-doc

On Tue, Mar 15, 2022 at 12:04:35PM -0400, Matthew Rosato wrote:

> > You can't pin/unpin in this path, there is no real way to handle error
> > and ulimit stuff here, plus it is really slow. I didn't notice any of
> > this in your patches, so what do you mean by 'pin' above?
> 
> patch 18 does a symbol_get for gfn_to_page (which will drive hva_to_pfn
> under the covers) and kvm_release_pfn_dirty, and uses those symbols for
> pin/unpin.

To be very clear, this is quite wrong.

It does not account for the memory pinned by the guest and a hostile
VM could pin more memory than the KVM process is allowed to - which is
a security hole.

It also uses the wrong kind of pin, DMA pinned pages must be
pin_user_page'd not get_page'd and undone with unpin_user_page() and
not put_page(). This allows the guest to pin ZONE_MOVABLE memory and
other things which cannot be DMA'd to, which will break the hypervisor
kernel. See David Hildenbrand's various series on COW bugs for an
overview why this is really security bad.

If you want to do dynamic pinning that is a different thing and
requires more CPU work in the shadowing operations. The modeling would
be similar except that the 1st stage iommu_domain would be this
'shared with KVM' domain people have been talking about - ie the page
table is not set with type 1 map/unmap but follows the KVM page table and
here it would be appropriate to use gfn_to_page/etc to access it.

However, if you do that then you do still have to take care of the
ulimit checks and you must teach kvm to use unpin_user_page/_dirty()
and related to be correct. This looks like a pretty big deal.

My advice is to start with the fully pinned case I described and
consider a KVM approach down the road.

[Also, I'm quite excited by this series you have, I think it shows
exactly how to fix POWER to work within the modern iommu framework,
they have the same basic problem]

Jason

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 16:29         ` Matthew Rosato
@ 2022-03-15 17:25           ` Jason Gunthorpe
  2022-03-17 18:51             ` Matthew Rosato
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-15 17:25 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: borntraeger, linux-s390, alex.williamson, cohuck, schnelle,
	farman, pmorel, hca, gor, gerald.schaefer, agordeev, svens,
	frankja, david, imbrenda, vneethv, oberpar, freude, thuth, pasic,
	joro, will, pbonzini, corbet, kvm, linux-kernel, iommu,
	linux-doc

On Tue, Mar 15, 2022 at 12:29:02PM -0400, Matthew Rosato wrote:
> On 3/15/22 10:38 AM, Jason Gunthorpe wrote:
> > On Tue, Mar 15, 2022 at 09:49:01AM -0400, Matthew Rosato wrote:
> > 
> > > The rationale for splitting steps 1 and 2 is that VFIO_SET_IOMMU doesn't
> > > have a mechanism for specifying more than the type as an arg, no?  Otherwise
> > > yes, you could specify a kvm fd at this point and it would have some other
> > > advantages (e.g. skip notifier).  But we still can't use the IOMMU for
> > > mapping until step 3.
> > 
> > Stuff like this is why I'd be much happier if this could join our
> > iommufd project so we can have clean modeling of the multiple iommu_domains.
> > 
> 
> I'd certainly be willing to collaborate so feel free to loop me in on the
> discussions; 

Sure, I have you on my list. I've been waiting for Eric to get a bit
further along on his ARM work so you have something appropriate to
look at.

In the mean time you can certainly work out the driver details as
you've been doing here and hacking through VFIO. The iommu_domain
logic is the big work item in this series, not the integration with
the uAPI.

> but I got the impression that iommufd is not close to ready (maybe
> I'm wrong?)

Well, quite a lot has been done already and I think we are getting
close to having something that can start moving forward, but yes it
will not be "tomorrow".

> -- if so I really don't want to completely delay this zPCI support
> behind it as it has a significant benefit for kvm guests on s390x :(

Every platform vendor is trying to do this exact same performance
optimization. s390 is doing it a bit differently, but as we've seen, not
fundamentally so compared to ARM and Intel IOMMUs with HW nesting.

I'm not sure that s390 should jump the queue with hacky uAPIs all over
the place. The ARM platform has been working on this for literal years
now.

Jason

* RE: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-15 10:49   ` Robin Murphy
@ 2022-03-17  5:47     ` Tian, Kevin
  2022-03-17 13:52       ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Tian, Kevin @ 2022-03-17  5:47 UTC (permalink / raw)
  To: Robin Murphy, Matthew Rosato, linux-s390
  Cc: kvm, david, farman, oberpar, vneethv, agordeev, imbrenda, will,
	Jean-Philippe Brucker, frankja, corbet, linux-doc, pasic, jgg,
	gerald.schaefer, borntraeger, thuth, gor, schnelle, hca,
	alex.williamson, freude, pmorel, cohuck, linux-kernel, iommu,
	svens, pbonzini, Zhao, Yan Y

> From: Robin Murphy
> Sent: Tuesday, March 15, 2022 6:49 PM
> 
> On 2022-03-14 19:44, Matthew Rosato wrote:
> > s390x will introduce an additional domain type that is used for
> > managing IOMMU owned by KVM.  Define the type here and add an
> > interface for allocating a specified type vs the default type.
> 
> I'm also not a huge fan of adding a new domain_alloc interface like
> this, however if it is justifiable, then please make it take struct
> device rather than struct bus_type as an argument.
> 
> It also sounds like there may be a degree of conceptual overlap here
> with what Jean-Philippe is working on for sharing pagetables between KVM
> and SMMU for Android pKVM, so it's probably worth some thought over
> whether there's any scope for common interfaces in terms of actual
> implementation.
> 

Same here. Yan Zhao is working on page table sharing between KVM
and VT-d. This is one important usage to build atop iommufd and
a set of common interfaces are definitely necessary here. 😊

Thanks
Kevin

* Re: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-17  5:47     ` Tian, Kevin
@ 2022-03-17 13:52       ` Jason Gunthorpe
  2022-03-18  2:23         ` Tian, Kevin
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-17 13:52 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Robin Murphy, Matthew Rosato, linux-s390, kvm, david, farman,
	oberpar, vneethv, agordeev, imbrenda, will,
	Jean-Philippe Brucker, frankja, corbet, linux-doc, pasic,
	gerald.schaefer, borntraeger, thuth, gor, schnelle, hca,
	alex.williamson, freude, pmorel, cohuck, linux-kernel, iommu,
	svens, pbonzini, Zhao, Yan Y

On Thu, Mar 17, 2022 at 05:47:36AM +0000, Tian, Kevin wrote:
> > From: Robin Murphy
> > Sent: Tuesday, March 15, 2022 6:49 PM
> > 
> > On 2022-03-14 19:44, Matthew Rosato wrote:
> > > s390x will introduce an additional domain type that is used for
> > > managing IOMMU owned by KVM.  Define the type here and add an
> > > interface for allocating a specified type vs the default type.
> > 
> > I'm also not a huge fan of adding a new domain_alloc interface like
> > this, however if it is justifiable, then please make it take struct
> > device rather than struct bus_type as an argument.
> > 
> > It also sounds like there may be a degree of conceptual overlap here
> > with what Jean-Philippe is working on for sharing pagetables between KVM
> > and SMMU for Android pKVM, so it's probably worth some thought over
> > whether there's any scope for common interfaces in terms of actual
> > implementation.
> 
> Same here. Yan Zhao is working on page table sharing between KVM
> and VT-d. This is one important usage to build atop iommufd and
> a set of common interfaces are definitely necessary here. 😊

I always thought 'page table sharing with KVM' is SVA - ie it requires
PRI in the IOMMU driver as the KVM page table is fully unpinned and
dynamic. This S390 case is not doing SVA/PRI.

Are people working on teaching KVM to DMA pin every page and avoid
having a dynamic page table? I'm surprised, a lot of stuff won't work,
e.g. write protect.

Jason

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 17:25           ` Jason Gunthorpe
@ 2022-03-17 18:51             ` Matthew Rosato
  0 siblings, 0 replies; 66+ messages in thread
From: Matthew Rosato @ 2022-03-17 18:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: borntraeger, linux-s390, alex.williamson, cohuck, schnelle,
	farman, pmorel, hca, gor, gerald.schaefer, agordeev, svens,
	frankja, david, imbrenda, vneethv, oberpar, freude, thuth, pasic,
	joro, will, pbonzini, corbet, kvm, linux-kernel, iommu,
	linux-doc

On 3/15/22 1:25 PM, Jason Gunthorpe wrote:
> On Tue, Mar 15, 2022 at 12:29:02PM -0400, Matthew Rosato wrote:
>> On 3/15/22 10:38 AM, Jason Gunthorpe wrote:
>>> On Tue, Mar 15, 2022 at 09:49:01AM -0400, Matthew Rosato wrote:
>>>
>>>> The rationale for splitting steps 1 and 2 is that VFIO_SET_IOMMU doesn't
>>>> have a mechanism for specifying more than the type as an arg, no?  Otherwise
>>>> yes, you could specify a kvm fd at this point and it would have some other
>>>> advantages (e.g. skip notifier).  But we still can't use the IOMMU for
>>>> mapping until step 3.
>>>
>>> Stuff like this is why I'd be much happier if this could join our
>>> iommufd project so we can have clean modeling of the multiple iommu_domains.
>>>
>>
>> I'd certainly be willing to collaborate so feel free to loop me in on the
>> discussions;
> 
> Sure, I have you on my list. I've been waiting for Eric to get a bit
> further along on his ARM work so you have something appropriate to
> look at.
> 
> In the mean time you can certainly work out the driver details as
> you've been doing here and hacking through VFIO. The iommu_domain
> logic is the big work item in this series, not the integration with
> the uAPI.
>

A subset of this series (enabling some s390x firmware-assist facilities) 
is not tied to the iommu and would still provide value while continuing 
to use vfio_iommu_type1 for all mapping -- so I think I'll look into a 
next version that shrinks down to that subset (+ re-visit the setup API).

Separate from that, I will continue looking at implementing the nested 
iommu_domain logic for s390, and continue to hack through VFIO for now. 
I'll use an RFC series when I have something more to look at, likely 
starting with the fully-pinned guest as you suggest; ultimately I'm 
interested in both scenarios (pinned kvm guest & dynamic pinning during 
shadow)

Thanks,
Matt

* RE: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-17 13:52       ` Jason Gunthorpe
@ 2022-03-18  2:23         ` Tian, Kevin
  2022-03-18 14:13           ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Tian, Kevin @ 2022-03-18  2:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Matthew Rosato, linux-s390, kvm, david, farman,
	oberpar, vneethv, agordeev, imbrenda, will,
	Jean-Philippe Brucker, frankja, corbet, linux-doc, pasic,
	gerald.schaefer, borntraeger, thuth, gor, schnelle, hca,
	alex.williamson, freude, pmorel, cohuck, linux-kernel, iommu,
	svens, pbonzini, Zhao, Yan Y

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 17, 2022 9:53 PM
> 
> On Thu, Mar 17, 2022 at 05:47:36AM +0000, Tian, Kevin wrote:
> > > From: Robin Murphy
> > > Sent: Tuesday, March 15, 2022 6:49 PM
> > >
> > > On 2022-03-14 19:44, Matthew Rosato wrote:
> > > > s390x will introduce an additional domain type that is used for
> > > > managing IOMMU owned by KVM.  Define the type here and add an
> > > > interface for allocating a specified type vs the default type.
> > >
> > > I'm also not a huge fan of adding a new domain_alloc interface like
> > > this, however if it is justifiable, then please make it take struct
> > > device rather than struct bus_type as an argument.
> > >
> > > It also sounds like there may be a degree of conceptual overlap here
> > > with what Jean-Philippe is working on for sharing pagetables between
> KVM
> > > and SMMU for Android pKVM, so it's probably worth some thought over
> > > whether there's any scope for common interfaces in terms of actual
> > > implementation.
> >
> > Same here. Yan Zhao is working on page table sharing between KVM
> > and VT-d. This is one important usage to build atop iommufd and
> > a set of common interfaces are definitely necessary here. 😊
> 
> I always thought 'page table sharing with KVM' is SVA - ie it requires
> PRI in the IOMMU driver as the KVM page table is fully unpinned and
> dynamic. This S390 case is not doing SVA/PRI
> 
> Are people working on teaching KVM to DMA pin every page and avoid
> having a dynamic page table? I'm surprised, a lot of stuff won't work,
> e.g. write protect.
> 

Yes, that is another major piece of work besides the iommufd work. And
it is not compatible with KVM features which rely on the dynamic
nature of EPT. Though it is a bit questionable whether it's worth
doing so just for saving memory footprint while losing other capabilities,
it is a requirement for some future security extensions in the Intel trusted
computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
today, thus some facilities can be reused. But I agree it is not a simple
task, thus we need to start discussion early to explore various gaps in
iommu and kvm.

Imagine many threads (dirty tracking, nested, KVM page table sharing,
etc.) will run in parallel soon after the new iommufd RFC is out. Lots of
fun ahead. 😊

Thanks
Kevin

* RE: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-15 14:55         ` Jason Gunthorpe
  2022-03-15 16:04           ` Matthew Rosato
@ 2022-03-18  7:01           ` Tian, Kevin
  2022-03-18 13:46             ` Jason Gunthorpe
  1 sibling, 1 reply; 66+ messages in thread
From: Tian, Kevin @ 2022-03-18  7:01 UTC (permalink / raw)
  To: Jason Gunthorpe, Matthew Rosato
  Cc: Alex Williamson, linux-s390, cohuck, schnelle, farman, pmorel,
	borntraeger, hca, gor, gerald.schaefer, agordeev, svens, frankja,
	david, imbrenda, vneethv, oberpar, freude, thuth, pasic, joro,
	will, pbonzini, corbet, kvm, linux-kernel, iommu, linux-doc

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 15, 2022 10:55 PM
> 
> The first level iommu_domain has the 'type1' map and unmap and pins
> the pages. This is the 1:1 map with the GPA and ends up pinning all
> guest memory because the point is you don't want to take a memory pin
> on your performance path
> 
> The second level iommu_domain points to a single IO page table in GPA
> and is created/destroyed whenever the guest traps to the hypervisor to
> manipulate the anchor (ie the GPA of the guest IO page table).
> 

Can we use consistent terms as used in iommufd and hardware, i.e.
with first-level/stage-1 referring to the child (GIOVA->GPA) which is
further nested on second-level/stage-2 as the parent (GPA->HPA)?

Otherwise all other explanations are agreed.

Thanks
Kevin

* Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-18  7:01           ` Tian, Kevin
@ 2022-03-18 13:46             ` Jason Gunthorpe
  2022-03-19  7:47               ` Tian, Kevin
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-18 13:46 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Matthew Rosato, Alex Williamson, linux-s390, cohuck, schnelle,
	farman, pmorel, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, frankja, david, imbrenda, vneethv, oberpar, freude, thuth,
	pasic, joro, will, pbonzini, corbet, kvm, linux-kernel, iommu,
	linux-doc

On Fri, Mar 18, 2022 at 07:01:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, March 15, 2022 10:55 PM
> > 
> > The first level iommu_domain has the 'type1' map and unmap and pins
> > the pages. This is the 1:1 map with the GPA and ends up pinning all
> > guest memory because the point is you don't want to take a memory pin
> > on your performance path
> > 
> > The second level iommu_domain points to a single IO page table in GPA
> > and is created/destroyed whenever the guest traps to the hypervisor to
> > manipulate the anchor (ie the GPA of the guest IO page table).
> > 
> 
> Can we use consistent terms as used in iommufd and hardware, i.e.
> with first-level/stage-1 referring to the child (GIOVA->GPA) which is
> further nested on second-level/stage-2 as the parent (GPA->HPA)?

Honestly I don't like injecting terms that only make sense for
virtualization into iommu/vfio land.

That area is intended to be general. If you use what it exposes for
virtualization, then great.

This is why I prefer to use 'user page table' when talking about the
GIOVA->GPA or Stage 1 map because it is a phrase independent of
virtualization or HW and clearly conveys what it is to the kernel and
its inherent order in the translation scheme.

The S1/S2 naming gets confusing because the HW people chose those names
so that S1 is the first translation a TLP sees and S2 is the second.

But from a software model, the S2 is the first domain created and the
first level of the translation tree, while the S1 is the second domain
created and the second level of the translation tree. ie the natural
SW numbers are backwards.

And I know Matthew isn't working on HW that has the S1/S2 HW naming :)

But yes, should try harder to have good names. Maybe it will be
clearer with code.

Jason

* Re: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-18  2:23         ` Tian, Kevin
@ 2022-03-18 14:13           ` Jason Gunthorpe
  2022-03-19  7:51             ` Tian, Kevin
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-18 14:13 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Robin Murphy, Matthew Rosato, linux-s390, kvm, david, farman,
	oberpar, vneethv, agordeev, imbrenda, will,
	Jean-Philippe Brucker, frankja, corbet, linux-doc, pasic,
	gerald.schaefer, borntraeger, thuth, gor, schnelle, hca,
	alex.williamson, freude, pmorel, cohuck, linux-kernel, iommu,
	svens, pbonzini, Zhao, Yan Y

On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:

> Yes, that is another major piece of work besides the iommufd work. And
> it is not compatible with KVM features which rely on the dynamic
> nature of EPT. Though it is a bit questionable whether it's worth
> doing so just for saving memory footprint while losing other capabilities,
> it is a requirement for some future security extensions in the Intel trusted
> computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
> today, thus some facilities can be reused. But I agree it is not a simple
> task, thus we need to start discussion early to explore various gaps in
> iommu and kvm.

Yikes. IMHO this might work better going the other way, have KVM
import the iommu_domain and use that as the KVM page table than vice
versa.

The semantics are a heck of a lot clearer, and it is really obvious
that a lot of KVM becomes disabled if you do this.

Jason

* RE: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
  2022-03-18 13:46             ` Jason Gunthorpe
@ 2022-03-19  7:47               ` Tian, Kevin
  0 siblings, 0 replies; 66+ messages in thread
From: Tian, Kevin @ 2022-03-19  7:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Rosato, Alex Williamson, linux-s390, cohuck, schnelle,
	farman, pmorel, borntraeger, hca, gor, gerald.schaefer, agordeev,
	svens, frankja, david, imbrenda, vneethv, oberpar, freude, thuth,
	pasic, joro, will, pbonzini, corbet, kvm, linux-kernel, iommu,
	linux-doc

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 18, 2022 9:46 PM
> 
> On Fri, Mar 18, 2022 at 07:01:19AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, March 15, 2022 10:55 PM
> > >
> > > The first level iommu_domain has the 'type1' map and unmap and pins
> > > the pages. This is the 1:1 map with the GPA and ends up pinning all
> > > guest memory because the point is you don't want to take a memory pin
> > > on your performance path
> > >
> > > The second level iommu_domain points to a single IO page table in GPA
> > > and is created/destroyed whenever the guest traps to the hypervisor to
> > > manipulate the anchor (ie the GPA of the guest IO page table).
> > >
> >
> > Can we use consistent terms as used in iommufd and hardware, i.e.
> > with first-level/stage-1 referring to the child (GIOVA->GPA) which is
> > further nested on second-level/stage-2 as the parent (GPA->HPA)?
> 
> Honestly I don't like injecting terms that only make sense for
> virtualization into iommu/vfio land.

1st/2nd-level or stage-1/2 are hardware terms not tied to virtualization.
GIOVA/GPA are just examples in this story.

> 
> That area is intended to be general. If you use what it exposes for
> virtualization, then great.
> 
> This is why I prefer to use 'user page table' when talking about the
> GIOVA->GPA or Stage 1 map because it is a phrase independent of
> virtualization or HW and clearly conveys what it is to the kernel and
> its inherent order in the translation scheme.

I fully agree with this point. The confusion only comes when you
start talking about first/second level in a way incompatible with
what iommu/vfio guys typically understand. 😊

> 
> The S1/S2 naming gets confusing because the HW people chose those names
> so that S1 is the first translation a TLP sees and S2 is the second.
> 
> But from a software model, the S2 is the first domain created and the
> first level of the translation tree, while the S1 is the second domain
> created and the second level of the translation tree. ie the natural
> SW numbers are backwards.

Yes, I got this point.

> 
> And I know Matthew isn't working on HW that has the S1/S2 HW naming :)
> 
> But yes, should try harder to have good names. Maybe it will be
> clearer with code.
> 

Yes, let's try to use 'user page table' where possible. But if the
levels/stages are unavoidable in a description I prefer staying with
the hardware terms, given the iommu driver has to deal with the
hardware naming of things.

Thanks
Kevin

* RE: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-18 14:13           ` Jason Gunthorpe
@ 2022-03-19  7:51             ` Tian, Kevin
  2022-03-21 14:07               ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Tian, Kevin @ 2022-03-19  7:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, Matthew Rosato, linux-s390, kvm, david, farman,
	oberpar, vneethv, agordeev, imbrenda, will,
	Jean-Philippe Brucker, frankja, corbet, linux-doc, pasic,
	gerald.schaefer, borntraeger, thuth, gor, schnelle, hca,
	alex.williamson, freude, pmorel, cohuck, linux-kernel, iommu,
	svens, pbonzini, Zhao, Yan Y

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 18, 2022 10:13 PM
> 
> On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> 
> > Yes, that is another major piece of work besides the iommufd work. And
> > it is not compatible with KVM features which rely on the dynamic
> > nature of EPT. Though it is a bit questionable whether it's worth
> > doing so just for saving memory footprint while losing other capabilities,
> > it is a requirement for some future security extensions in the Intel trusted
> > computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
> > today, thus some facilities can be reused. But I agree it is not a simple
> > task, thus we need to start discussion early to explore various gaps in
> > iommu and kvm.
> 
> Yikes. IMHO this might work better going the other way, have KVM
> import the iommu_domain and use that as the KVM page table than vice
> versa.
> 
> The semantics are a heck of a lot clearer, and it is really obvious
> that alot of KVM becomes disabled if you do this.
> 

This is an interesting angle to look at it. But given that pinning is already
required in KVM to support SEV/TDX even without an assigned device, those
restrictions have to be understood by the KVM MMU code, which makes a
KVM-managed page table under such restrictions closer to being sharable
with the IOMMU.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-19  7:51             ` Tian, Kevin
@ 2022-03-21 14:07               ` Jason Gunthorpe
  2022-03-22  7:30                 ` Tian, Kevin
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2022-03-21 14:07 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Robin Murphy, Matthew Rosato, linux-s390, kvm, david, farman,
	oberpar, vneethv, agordeev, imbrenda, will,
	Jean-Philippe Brucker, frankja, corbet, linux-doc, pasic,
	gerald.schaefer, borntraeger, thuth, gor, schnelle, hca,
	alex.williamson, freude, pmorel, cohuck, linux-kernel, iommu,
	svens, pbonzini, Zhao, Yan Y

On Sat, Mar 19, 2022 at 07:51:31AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, March 18, 2022 10:13 PM
> > 
> > On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> > 
> > > Yes, that is another major part work besides the iommufd work. And
> > > it is not compatible with KVM features which rely on the dynamic
> > > manner of EPT. Though It is a bit questionable whether it's worthy of
> > > doing so just for saving memory footprint while losing other capabilities,
> > > it is a requirement for some future security extension in Intel trusted
> > > computing architecture. And KVM has been pinning pages for SEV/TDX/etc.
> > > today thus some facilities can be reused. But I agree it is not a simple
> > > task thus we need start discussion early to explore various gaps in
> > > iommu and kvm.
> > 
> > Yikes. IMHO this might work better going the other way, have KVM
> > import the iommu_domain and use that as the KVM page table than vice
> > versa.
> > 
> > The semantics are a heck of a lot clearer, and it is really obvious
> > that alot of KVM becomes disabled if you do this.
> > 
> 
> This is an interesting angle to look at it. But given pinning is already
> required in KVM to support SEV/TDX even w/o assigned device, those
> restrictions have to be understood by KVM MMU code which makes
> a KVM-managed page table under such restrictions closer to be 
> sharable with IOMMU.

I thought the SEV/TDX stuff wasn't being done with pinning but via a
memfd in a special mode that does sort of pin under the covers, though it
is not necessarily a DMA pin. (It isn't even struct page memory, so I'm
not even sure what pin means there.)

Certainly, there is no inherent problem with SEV/TDX having movable
memory, and KVM could conceivably handle this - but the iommu cannot.

I would not draw an equivalence between SEV/TDX and the iommu, at least..

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type
  2022-03-21 14:07               ` Jason Gunthorpe
@ 2022-03-22  7:30                 ` Tian, Kevin
  0 siblings, 0 replies; 66+ messages in thread
From: Tian, Kevin @ 2022-03-22  7:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: david, thuth, linux-kernel, vneethv, agordeev, imbrenda, will,
	linux-s390, kvm, corbet, linux-doc, pasic, gerald.schaefer,
	borntraeger, farman, Zhao, Yan Y, gor, schnelle, hca,
	alex.williamson, freude, pbonzini, frankja, pmorel, cohuck,
	oberpar, iommu, svens, Jean-Philippe Brucker, Robin Murphy

> From: Jason Gunthorpe
> Sent: Monday, March 21, 2022 10:07 PM
> 
> On Sat, Mar 19, 2022 at 07:51:31AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, March 18, 2022 10:13 PM
> > >
> > > On Fri, Mar 18, 2022 at 02:23:57AM +0000, Tian, Kevin wrote:
> > >
> > > > Yes, that is another major part work besides the iommufd work. And
> > > > it is not compatible with KVM features which rely on the dynamic
> > > > manner of EPT. Though It is a bit questionable whether it's worthy of
> > > > doing so just for saving memory footprint while losing other capabilities,
> > > > it is a requirement for some future security extension in Intel trusted
> > > > computing architecture. And KVM has been pinning pages for
> SEV/TDX/etc.
> > > > today thus some facilities can be reused. But I agree it is not a simple
> > > > task thus we need start discussion early to explore various gaps in
> > > > iommu and kvm.
> > >
> > > Yikes. IMHO this might work better going the other way, have KVM
> > > import the iommu_domain and use that as the KVM page table than vice
> > > versa.
> > >
> > > The semantics are a heck of a lot clearer, and it is really obvious
> > > that alot of KVM becomes disabled if you do this.
> > >
> >
> > This is an interesting angle to look at it. But given pinning is already
> > required in KVM to support SEV/TDX even w/o assigned device, those
> > restrictions have to be understood by KVM MMU code which makes
> > a KVM-managed page table under such restrictions closer to be
> > sharable with IOMMU.
> 
> I thought the SEV/TDX stuff wasn't being done with pinning but via a
> memfd in a special mode that does sort of pin under the covers, but it
> is not necessarily a DMA pin. (it isn't even struct page memory, so
> I'm not even sure what pin means)
> 
> Certainly, there is no inherent problem with SEV/TDX having movable
> memory and KVM could concievably handle this - but iommu cannot.
> 
> I would not make an equivilance with SEV/TDX and iommu at least..
> 

Currently SEV does use a DMA pin, i.e. pin_user_pages() in sev_pin_memory().

I'm not sure whether it's a hardware limitation or just a software tradeoff
for simplicity, but having that code does imply that KVM has already
absorbed certain restrictions from that pinning.

But I agree they are not equivalent: e.g. pinning might apply only to
private/encrypted memory in SEV/TDX, while the iommu requires pinning the
entire guest memory (if the device has no IOPF support).

btw, no matter whether it's KVM importing the iommu domain or iommufd
importing the KVM page table, in the end the KVM MMU needs to explicitly
mark its page table as shared with the IOMMU and enable all kinds of
restrictions to support that sharing.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2022-03-22  7:31 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-14 19:44 [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 01/32] s390/sclp: detect the zPCI load/store interpretation facility Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 02/32] s390/sclp: detect the AISII facility Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 03/32] s390/sclp: detect the AENI facility Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 04/32] s390/sclp: detect the AISI facility Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 05/32] s390/airq: pass more TPI info to airq handlers Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 06/32] s390/airq: allow for airq structure that uses an input vector Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 07/32] s390/pci: externalize the SIC operation controls and routine Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 08/32] s390/pci: stash associated GISA designation Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 09/32] s390/pci: export some routines related to RPCIT processing Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 10/32] s390/pci: stash dtsm and maxstbl Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 11/32] s390/pci: add helper function to find device by handle Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 12/32] s390/pci: get SHM information from list pci Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 13/32] s390/pci: return status from zpci_refresh_trans Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 14/32] iommu: introduce iommu_domain_alloc_type and the KVM type Matthew Rosato
2022-03-14 21:36   ` Jason Gunthorpe
2022-03-15 10:49   ` Robin Murphy
2022-03-17  5:47     ` Tian, Kevin
2022-03-17 13:52       ` Jason Gunthorpe
2022-03-18  2:23         ` Tian, Kevin
2022-03-18 14:13           ` Jason Gunthorpe
2022-03-19  7:51             ` Tian, Kevin
2022-03-21 14:07               ` Jason Gunthorpe
2022-03-22  7:30                 ` Tian, Kevin
2022-03-14 19:44 ` [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type Matthew Rosato
2022-03-14 21:38   ` Jason Gunthorpe
2022-03-15 13:49     ` Matthew Rosato
2022-03-15 14:38       ` Jason Gunthorpe
2022-03-15 16:29         ` Matthew Rosato
2022-03-15 17:25           ` Jason Gunthorpe
2022-03-17 18:51             ` Matthew Rosato
2022-03-14 22:50   ` Alex Williamson
2022-03-14 23:18     ` Jason Gunthorpe
2022-03-15  7:57       ` Tian, Kevin
2022-03-15 14:17         ` Matthew Rosato
2022-03-15 17:01           ` Matthew Rosato
2022-03-15 13:36       ` Matthew Rosato
2022-03-15 14:55         ` Jason Gunthorpe
2022-03-15 16:04           ` Matthew Rosato
2022-03-15 17:18             ` Jason Gunthorpe
2022-03-18  7:01           ` Tian, Kevin
2022-03-18 13:46             ` Jason Gunthorpe
2022-03-19  7:47               ` Tian, Kevin
2022-03-14 19:44 ` [PATCH v4 16/32] vfio-pci/zdev: add function handle to clp base capability Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 17/32] KVM: s390: pci: add basic kvm_zdev structure Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 18/32] iommu/s390: add support for IOMMU_DOMAIN_KVM Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 19/32] KVM: s390: pci: do initial setup for AEN interpretation Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 20/32] KVM: s390: pci: enable host forwarding of Adapter Event Notifications Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 21/32] KVM: s390: mechanism to enable guest zPCI Interpretation Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 22/32] KVM: s390: pci: routines for (dis)associating zPCI devices with a KVM Matthew Rosato
2022-03-14 21:46   ` Jason Gunthorpe
2022-03-15 16:39     ` Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 23/32] KVM: s390: pci: provide routines for enabling/disabling interpretation Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 24/32] KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 25/32] KVM: s390: pci: provide routines for enabling/disabling IOAT assist Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 26/32] KVM: s390: pci: handle refresh of PCI translations Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 27/32] KVM: s390: intercept the rpcit instruction Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 28/32] KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 29/32] vfio-pci/zdev: add DTSM to clp group capability Matthew Rosato
2022-03-14 21:49   ` Jason Gunthorpe
2022-03-15 14:39     ` Matthew Rosato
2022-03-15 14:56       ` Jason Gunthorpe
2022-03-14 19:44 ` [PATCH v4 30/32] KVM: s390: introduce CPU feature for zPCI Interpretation Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 31/32] MAINTAINERS: additional files related kvm s390 pci passthrough Matthew Rosato
2022-03-14 19:44 ` [PATCH v4 32/32] MAINTAINERS: update s390 IOMMU entry Matthew Rosato
2022-03-14 19:52 ` [PATCH v4 00/32] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
