qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12] s390x/pci: zPCI interpretation support
@ 2021-12-07 21:04 Matthew Rosato
  2021-12-07 21:04 ` [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group Matthew Rosato
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Note:  The first 3 patches of this series are included as pre-reqs, but
should be pulled via a separate series.  Also, patch 5 is needed to
support 5.16+ linux header-sync and was already done by Paolo but not
merged yet so is thus included here as well. 

For QEMU, the majority of the work in enabling instruction interpretation
is handled via new VFIO ioctls to SET the appropriate interpretation and
interrupt forwarding modes, and to GET the function handle to use for
interpretive execution.  

This series implements these new ioctls, as well as adding a new, optional
'intercept' parameter to zpci to request interpretation support not be used
as well as an 'intassist' parameter to determine whether or not the
firmware assist will be used for interrupt delivery or whether the host
will be responsible for delivering all interrupts.

The ZPCI_INTERP CPU feature is added beginning with the z14 model to
enable this support.

As a consequence of implementing zPCI interpretation, ISM devices now
become eligible for passthrough (but only when zPCI interpretation is
available).

From the perspective of guest configuration, you passthrough zPCI devices
in the same manner as before, with intepretation support being used by
default if available in kernel+qemu.

Associated kernel series:
https://lkml.org/lkml/2021/12/7/1179

Matthew Rosato (11):
  s390x/pci: use a reserved ID for the default PCI group
  s390x/pci: don't use hard-coded dma range in reg_ioat
  s390x/pci: add supported DT information to clp response
  Update linux headers
  target/s390x: add zpci-interp to cpu models
  s390x/pci: enable for load/store intepretation
  s390x/pci: don't fence interpreted devices without MSI-X
  s390x/pci: enable adapter event notification for interpreted devices
  s390x/pci: use I/O Address Translation assist when interpreting
  s390x/pci: use dtsm provided from vfio capabilities for interpreted
    devices
  s390x/pci: let intercept devices have separate PCI groups

Paolo Bonzini (1):
  virtio-gpu: do not byteswap padding

 hw/s390x/s390-pci-bus.c                       | 121 +++++++++-
 hw/s390x/s390-pci-inst.c                      | 178 +++++++++++++-
 hw/s390x/s390-pci-vfio.c                      | 221 +++++++++++++++++-
 include/hw/s390x/s390-pci-bus.h               |  11 +-
 include/hw/s390x/s390-pci-clp.h               |   3 +-
 include/hw/s390x/s390-pci-inst.h              |   2 +-
 include/hw/s390x/s390-pci-vfio.h              |  45 ++++
 include/hw/virtio/virtio-gpu-bswap.h          |   1 -
 include/standard-headers/asm-x86/kvm_para.h   |   1 +
 include/standard-headers/drm/drm_fourcc.h     | 121 +++++++++-
 include/standard-headers/linux/ethtool.h      |  31 +++
 include/standard-headers/linux/fuse.h         |  15 +-
 include/standard-headers/linux/pci_regs.h     |   6 +
 include/standard-headers/linux/virtio_gpu.h   |  18 +-
 include/standard-headers/linux/virtio_ids.h   |  24 ++
 include/standard-headers/linux/virtio_mem.h   |   9 +-
 include/standard-headers/linux/virtio_vsock.h |   3 +-
 linux-headers/asm-arm64/unistd.h              |   1 +
 linux-headers/asm-generic/unistd.h            |  22 +-
 linux-headers/asm-mips/unistd_n32.h           |   2 +
 linux-headers/asm-mips/unistd_n64.h           |   2 +
 linux-headers/asm-mips/unistd_o32.h           |   2 +
 linux-headers/asm-powerpc/unistd_32.h         |   2 +
 linux-headers/asm-powerpc/unistd_64.h         |   2 +
 linux-headers/asm-s390/kvm.h                  |   1 +
 linux-headers/asm-s390/unistd_32.h            |   2 +
 linux-headers/asm-s390/unistd_64.h            |   2 +
 linux-headers/asm-x86/kvm.h                   |   5 +
 linux-headers/asm-x86/unistd_32.h             |   3 +
 linux-headers/asm-x86/unistd_64.h             |   3 +
 linux-headers/asm-x86/unistd_x32.h            |   3 +
 linux-headers/linux/kvm.h                     |  41 +++-
 linux-headers/linux/vfio.h                    |  22 ++
 linux-headers/linux/vfio_zdev.h               |  51 ++++
 target/s390x/cpu_features_def.h.inc           |   1 +
 target/s390x/gen-features.c                   |   2 +
 target/s390x/kvm/kvm.c                        |   1 +
 37 files changed, 928 insertions(+), 52 deletions(-)

-- 
2.27.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-08 10:30   ` Thomas Huth
  2021-12-07 21:04 ` [PATCH 02/12] s390x/pci: don't use hard-coded dma range in reg_ioat Matthew Rosato
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

The current default PCI group being used can technically collide with a
real group ID passed from a hostdev.  Let's instead use a group ID that
comes from a special pool (0xF0-0xFF) that is architected to be reserved
for simulated devices.

Fixes: 28dc86a072 ("s390x/pci: use a PCI Group structure")
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 include/hw/s390x/s390-pci-bus.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index aa891c178d..2727e7bdef 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -313,7 +313,7 @@ typedef struct ZpciFmb {
 } ZpciFmb;
 QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
 
-#define ZPCI_DEFAULT_FN_GRP 0x20
+#define ZPCI_DEFAULT_FN_GRP 0xFF
 typedef struct S390PCIGroup {
     ClpRspQueryPciGrp zpci_group;
     int id;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/12] s390x/pci: don't use hard-coded dma range in reg_ioat
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
  2021-12-07 21:04 ` [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-08 10:32   ` Thomas Huth
  2021-12-07 21:04 ` [PATCH 03/12] s390x/pci: add supported DT information to clp response Matthew Rosato
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Instead use the values from clp info, they will either be the hard-coded
values or what came from the host driver via vfio.

Fixes: 9670ee752727 ("s390x/pci: use a PCI Function structure")
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-inst.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 1c8ad91175..11b7f6bfa1 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -916,9 +916,10 @@ int pci_dereg_irqs(S390PCIBusDevice *pbdev)
     return 0;
 }
 
-static int reg_ioat(CPUS390XState *env, S390PCIIOMMU *iommu, ZpciFib fib,
+static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
                     uintptr_t ra)
 {
+    S390PCIIOMMU *iommu = pbdev->iommu;
     uint64_t pba = ldq_p(&fib.pba);
     uint64_t pal = ldq_p(&fib.pal);
     uint64_t g_iota = ldq_p(&fib.iota);
@@ -927,7 +928,7 @@ static int reg_ioat(CPUS390XState *env, S390PCIIOMMU *iommu, ZpciFib fib,
 
     pba &= ~0xfff;
     pal |= 0xfff;
-    if (pba > pal || pba < ZPCI_SDMA_ADDR || pal > ZPCI_EDMA_ADDR) {
+    if (pba > pal || pba < pbdev->zpci_fn.sdma || pal > pbdev->zpci_fn.edma) {
         s390_program_interrupt(env, PGM_OPERAND, ra);
         return -EINVAL;
     }
@@ -1125,7 +1126,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
         } else if (pbdev->iommu->enabled) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
-        } else if (reg_ioat(env, pbdev->iommu, fib, ra)) {
+        } else if (reg_ioat(env, pbdev, fib, ra)) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
         }
@@ -1150,7 +1151,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
             pci_dereg_ioat(pbdev->iommu);
-            if (reg_ioat(env, pbdev->iommu, fib, ra)) {
+            if (reg_ioat(env, pbdev, fib, ra)) {
                 cc = ZPCI_PCI_LS_ERR;
                 s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
             }
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/12] s390x/pci: add supported DT information to clp response
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
  2021-12-07 21:04 ` [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group Matthew Rosato
  2021-12-07 21:04 ` [PATCH 02/12] s390x/pci: don't use hard-coded dma range in reg_ioat Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-07 21:04 ` [PATCH 04/12] Update linux headers Matthew Rosato
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

The DTSM is a mask that specifies which I/O Address Translation designation
types are supported.  Today QEMU only supports DT=1.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c         | 1 +
 hw/s390x/s390-pci-inst.c        | 1 +
 hw/s390x/s390-pci-vfio.c        | 1 +
 include/hw/s390x/s390-pci-bus.h | 1 +
 include/hw/s390x/s390-pci-clp.h | 3 ++-
 5 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 1b51a72838..01b58ebc70 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -782,6 +782,7 @@ static void s390_pci_init_default_group(void)
     resgrp->i = 128;
     resgrp->maxstbl = 128;
     resgrp->version = 0;
+    resgrp->dtsm = ZPCI_DTSM;
 }
 
 static void set_pbdev_info(S390PCIBusDevice *pbdev)
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 11b7f6bfa1..0cef7fbace 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -329,6 +329,7 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
         stw_p(&resgrp->i, group->zpci_group.i);
         stw_p(&resgrp->maxstbl, group->zpci_group.maxstbl);
         resgrp->version = group->zpci_group.version;
+        resgrp->dtsm = group->zpci_group.dtsm;
         stw_p(&resgrp->hdr.rsp, CLP_RC_OK);
         break;
     }
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 2a153fa8c9..6f80a47e29 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -160,6 +160,7 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
         resgrp->i = cap->noi;
         resgrp->maxstbl = cap->maxstbl;
         resgrp->version = cap->version;
+        resgrp->dtsm = ZPCI_DTSM;
     }
 }
 
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 2727e7bdef..da3cde2bb4 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -37,6 +37,7 @@
 #define ZPCI_MAX_UID 0xffff
 #define UID_UNDEFINED 0
 #define UID_CHECKING_ENABLED 0x01
+#define ZPCI_DTSM 0x40
 
 OBJECT_DECLARE_SIMPLE_TYPE(S390pciState, S390_PCI_HOST_BRIDGE)
 OBJECT_DECLARE_SIMPLE_TYPE(S390PCIBus, S390_PCI_BUS)
diff --git a/include/hw/s390x/s390-pci-clp.h b/include/hw/s390x/s390-pci-clp.h
index 96b8e3f133..cc8c8662b8 100644
--- a/include/hw/s390x/s390-pci-clp.h
+++ b/include/hw/s390x/s390-pci-clp.h
@@ -163,7 +163,8 @@ typedef struct ClpRspQueryPciGrp {
     uint8_t fr;
     uint16_t maxstbl;
     uint16_t mui;
-    uint64_t reserved3;
+    uint8_t dtsm;
+    uint8_t reserved3[7];
     uint64_t dasm; /* dma address space mask */
     uint64_t msia; /* MSI address */
     uint64_t reserved4;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/12] Update linux headers
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (2 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 03/12] s390x/pci: add supported DT information to clp response Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-07 21:04 ` [PATCH 05/12] virtio-gpu: do not byteswap padding Matthew Rosato
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

This is a placeholder that pulls in 5.16-rc4 + unmerged kernel changes
required by this item.  A proper header sync can be done once the
associated kernel code merges.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 include/standard-headers/asm-x86/kvm_para.h   |   1 +
 include/standard-headers/drm/drm_fourcc.h     | 121 +++++++++++++++++-
 include/standard-headers/linux/ethtool.h      |  31 +++++
 include/standard-headers/linux/fuse.h         |  15 ++-
 include/standard-headers/linux/pci_regs.h     |   6 +
 include/standard-headers/linux/virtio_gpu.h   |  18 ++-
 include/standard-headers/linux/virtio_ids.h   |  24 ++++
 include/standard-headers/linux/virtio_mem.h   |   9 +-
 include/standard-headers/linux/virtio_vsock.h |   3 +-
 linux-headers/asm-arm64/unistd.h              |   1 +
 linux-headers/asm-generic/unistd.h            |  22 +++-
 linux-headers/asm-mips/unistd_n32.h           |   2 +
 linux-headers/asm-mips/unistd_n64.h           |   2 +
 linux-headers/asm-mips/unistd_o32.h           |   2 +
 linux-headers/asm-powerpc/unistd_32.h         |   2 +
 linux-headers/asm-powerpc/unistd_64.h         |   2 +
 linux-headers/asm-s390/kvm.h                  |   1 +
 linux-headers/asm-s390/unistd_32.h            |   2 +
 linux-headers/asm-s390/unistd_64.h            |   2 +
 linux-headers/asm-x86/kvm.h                   |   5 +
 linux-headers/asm-x86/unistd_32.h             |   3 +
 linux-headers/asm-x86/unistd_64.h             |   3 +
 linux-headers/asm-x86/unistd_x32.h            |   3 +
 linux-headers/linux/kvm.h                     |  41 +++++-
 linux-headers/linux/vfio.h                    |  22 ++++
 linux-headers/linux/vfio_zdev.h               |  51 ++++++++
 26 files changed, 370 insertions(+), 24 deletions(-)

diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h
index 204cfb8640..f0235e58a1 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -8,6 +8,7 @@
  * should be used to determine that a VM is running under KVM.
  */
 #define KVM_CPUID_SIGNATURE	0x40000000
+#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"
 
 /* This CPUID returns two feature bitmaps in eax, edx. Before enabling
  * a particular paravirtualization, the appropriate feature bit should
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 352b51fd0a..2c025cb4fe 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -103,6 +103,12 @@ extern "C" {
 /* 8 bpp Red */
 #define DRM_FORMAT_R8		fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
+/* 10 bpp Red */
+#define DRM_FORMAT_R10		fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 6:10 little endian */
+
+/* 12 bpp Red */
+#define DRM_FORMAT_R12		fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 4:12 little endian */
+
 /* 16 bpp Red */
 #define DRM_FORMAT_R16		fourcc_code('R', '1', '6', ' ') /* [15:0] R little endian */
 
@@ -372,6 +378,12 @@ extern "C" {
 
 #define DRM_FORMAT_RESERVED	      ((1ULL << 56) - 1)
 
+#define fourcc_mod_get_vendor(modifier) \
+	(((modifier) >> 56) & 0xff)
+
+#define fourcc_mod_is_vendor(modifier, vendor) \
+	(fourcc_mod_get_vendor(modifier) == DRM_FORMAT_MOD_VENDOR_## vendor)
+
 #define fourcc_mod_code(vendor, val) \
 	((((uint64_t)DRM_FORMAT_MOD_VENDOR_## vendor) << 56) | ((val) & 0x00ffffffffffffffULL))
 
@@ -899,9 +911,9 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
 
 /*
  * The top 4 bits (out of the 56 bits alloted for specifying vendor specific
- * modifiers) denote the category for modifiers. Currently we have only two
- * categories of modifiers ie AFBC and MISC. We can have a maximum of sixteen
- * different categories.
+ * modifiers) denote the category for modifiers. Currently we have three
+ * categories of modifiers ie AFBC, MISC and AFRC. We can have a maximum of
+ * sixteen different categories.
  */
 #define DRM_FORMAT_MOD_ARM_CODE(__type, __val) \
 	fourcc_mod_code(ARM, ((uint64_t)(__type) << 52) | ((__val) & 0x000fffffffffffffULL))
@@ -1016,6 +1028,109 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier)
  */
 #define AFBC_FORMAT_MOD_USM	(1ULL << 12)
 
+/*
+ * Arm Fixed-Rate Compression (AFRC) modifiers
+ *
+ * AFRC is a proprietary fixed rate image compression protocol and format,
+ * designed to provide guaranteed bandwidth and memory footprint
+ * reductions in graphics and media use-cases.
+ *
+ * AFRC buffers consist of one or more planes, with the same components
+ * and meaning as an uncompressed buffer using the same pixel format.
+ *
+ * Within each plane, the pixel/luma/chroma values are grouped into
+ * "coding unit" blocks which are individually compressed to a
+ * fixed size (in bytes). All coding units within a given plane of a buffer
+ * store the same number of values, and have the same compressed size.
+ *
+ * The coding unit size is configurable, allowing different rates of compression.
+ *
+ * The start of each AFRC buffer plane must be aligned to an alignment granule which
+ * depends on the coding unit size.
+ *
+ * Coding Unit Size   Plane Alignment
+ * ----------------   ---------------
+ * 16 bytes           1024 bytes
+ * 24 bytes           512  bytes
+ * 32 bytes           2048 bytes
+ *
+ * Coding units are grouped into paging tiles. AFRC buffer dimensions must be aligned
+ * to a multiple of the paging tile dimensions.
+ * The dimensions of each paging tile depend on whether the buffer is optimised for
+ * scanline (SCAN layout) or rotated (ROT layout) access.
+ *
+ * Layout   Paging Tile Width   Paging Tile Height
+ * ------   -----------------   ------------------
+ * SCAN     16 coding units     4 coding units
+ * ROT      8  coding units     8 coding units
+ *
+ * The dimensions of each coding unit depend on the number of components
+ * in the compressed plane and whether the buffer is optimised for
+ * scanline (SCAN layout) or rotated (ROT layout) access.
+ *
+ * Number of Components in Plane   Layout      Coding Unit Width   Coding Unit Height
+ * -----------------------------   ---------   -----------------   ------------------
+ * 1                               SCAN        16 samples          4 samples
+ * Example: 16x4 luma samples in a 'Y' plane
+ *          16x4 chroma 'V' values, in the 'V' plane of a fully-planar YUV buffer
+ * -----------------------------   ---------   -----------------   ------------------
+ * 1                               ROT         8 samples           8 samples
+ * Example: 8x8 luma samples in a 'Y' plane
+ *          8x8 chroma 'V' values, in the 'V' plane of a fully-planar YUV buffer
+ * -----------------------------   ---------   -----------------   ------------------
+ * 2                               DONT CARE   8 samples           4 samples
+ * Example: 8x4 chroma pairs in the 'UV' plane of a semi-planar YUV buffer
+ * -----------------------------   ---------   -----------------   ------------------
+ * 3                               DONT CARE   4 samples           4 samples
+ * Example: 4x4 pixels in an RGB buffer without alpha
+ * -----------------------------   ---------   -----------------   ------------------
+ * 4                               DONT CARE   4 samples           4 samples
+ * Example: 4x4 pixels in an RGB buffer with alpha
+ */
+
+#define DRM_FORMAT_MOD_ARM_TYPE_AFRC 0x02
+
+#define DRM_FORMAT_MOD_ARM_AFRC(__afrc_mode) \
+	DRM_FORMAT_MOD_ARM_CODE(DRM_FORMAT_MOD_ARM_TYPE_AFRC, __afrc_mode)
+
+/*
+ * AFRC coding unit size modifier.
+ *
+ * Indicates the number of bytes used to store each compressed coding unit for
+ * one or more planes in an AFRC encoded buffer. The coding unit size for chrominance
+ * is the same for both Cb and Cr, which may be stored in separate planes.
+ *
+ * AFRC_FORMAT_MOD_CU_SIZE_P0 indicates the number of bytes used to store
+ * each compressed coding unit in the first plane of the buffer. For RGBA buffers
+ * this is the only plane, while for semi-planar and fully-planar YUV buffers,
+ * this corresponds to the luma plane.
+ *
+ * AFRC_FORMAT_MOD_CU_SIZE_P12 indicates the number of bytes used to store
+ * each compressed coding unit in the second and third planes in the buffer.
+ * For semi-planar and fully-planar YUV buffers, this corresponds to the chroma plane(s).
+ *
+ * For single-plane buffers, AFRC_FORMAT_MOD_CU_SIZE_P0 must be specified
+ * and AFRC_FORMAT_MOD_CU_SIZE_P12 must be zero.
+ * For semi-planar and fully-planar buffers, both AFRC_FORMAT_MOD_CU_SIZE_P0 and
+ * AFRC_FORMAT_MOD_CU_SIZE_P12 must be specified.
+ */
+#define AFRC_FORMAT_MOD_CU_SIZE_MASK 0xf
+#define AFRC_FORMAT_MOD_CU_SIZE_16 (1ULL)
+#define AFRC_FORMAT_MOD_CU_SIZE_24 (2ULL)
+#define AFRC_FORMAT_MOD_CU_SIZE_32 (3ULL)
+
+#define AFRC_FORMAT_MOD_CU_SIZE_P0(__afrc_cu_size) (__afrc_cu_size)
+#define AFRC_FORMAT_MOD_CU_SIZE_P12(__afrc_cu_size) ((__afrc_cu_size) << 4)
+
+/*
+ * AFRC scanline memory layout.
+ *
+ * Indicates if the buffer uses the scanline-optimised layout
+ * for an AFRC encoded buffer, otherwise, it uses the rotation-optimised layout.
+ * The memory layout is the same for all planes.
+ */
+#define AFRC_FORMAT_MOD_LAYOUT_SCAN (1ULL << 8)
+
 /*
  * Arm 16x16 Block U-Interleaved modifier
  *
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 053d3fafdf..688eb8dc39 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -603,6 +603,7 @@ enum ethtool_link_ext_state {
 	ETHTOOL_LINK_EXT_STATE_CALIBRATION_FAILURE,
 	ETHTOOL_LINK_EXT_STATE_POWER_BUDGET_EXCEEDED,
 	ETHTOOL_LINK_EXT_STATE_OVERHEAT,
+	ETHTOOL_LINK_EXT_STATE_MODULE,
 };
 
 /* More information in addition to ETHTOOL_LINK_EXT_STATE_AUTONEG. */
@@ -639,6 +640,8 @@ enum ethtool_link_ext_substate_link_logical_mismatch {
 enum ethtool_link_ext_substate_bad_signal_integrity {
 	ETHTOOL_LINK_EXT_SUBSTATE_BSI_LARGE_NUMBER_OF_PHYSICAL_ERRORS = 1,
 	ETHTOOL_LINK_EXT_SUBSTATE_BSI_UNSUPPORTED_RATE,
+	ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_REFERENCE_CLOCK_LOST,
+	ETHTOOL_LINK_EXT_SUBSTATE_BSI_SERDES_ALOS,
 };
 
 /* More information in addition to ETHTOOL_LINK_EXT_STATE_CABLE_ISSUE. */
@@ -647,6 +650,11 @@ enum ethtool_link_ext_substate_cable_issue {
 	ETHTOOL_LINK_EXT_SUBSTATE_CI_CABLE_TEST_FAILURE,
 };
 
+/* More information in addition to ETHTOOL_LINK_EXT_STATE_MODULE. */
+enum ethtool_link_ext_substate_module {
+	ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY = 1,
+};
+
 #define ETH_GSTRING_LEN		32
 
 /**
@@ -704,6 +712,29 @@ enum ethtool_stringset {
 	ETH_SS_COUNT
 };
 
+/**
+ * enum ethtool_module_power_mode_policy - plug-in module power mode policy
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO: Module is transitioned by the host
+ *	to high power mode when the first port using it is put administratively
+ *	up and to low power mode when the last port using it is put
+ *	administratively down.
+ */
+enum ethtool_module_power_mode_policy {
+	ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
+	ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
+};
+
+/**
+ * enum ethtool_module_power_mode - plug-in module power mode
+ * @ETHTOOL_MODULE_POWER_MODE_LOW: Module is in low power mode.
+ * @ETHTOOL_MODULE_POWER_MODE_HIGH: Module is in high power mode.
+ */
+enum ethtool_module_power_mode {
+	ETHTOOL_MODULE_POWER_MODE_LOW = 1,
+	ETHTOOL_MODULE_POWER_MODE_HIGH,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index cce105bfba..e021060cee 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -181,6 +181,12 @@
  *  - add FUSE_OPEN_KILL_SUIDGID
  *  - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT
  *  - add FUSE_SETXATTR_ACL_KILL_SGID
+ *
+ *  7.34
+ *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FOPEN_NOFLUSH
  */
 
 #ifndef _LINUX_FUSE_H
@@ -212,7 +218,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 33
+#define FUSE_KERNEL_MINOR_VERSION 35
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -283,12 +289,14 @@ struct fuse_file_lock {
  * FOPEN_NONSEEKABLE: the file is not seekable
  * FOPEN_CACHE_DIR: allow caching this directory
  * FOPEN_STREAM: the file is stream-like (no file position at all)
+ * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
 #define FOPEN_NONSEEKABLE	(1 << 2)
 #define FOPEN_CACHE_DIR		(1 << 3)
 #define FOPEN_STREAM		(1 << 4)
+#define FOPEN_NOFLUSH		(1 << 5)
 
 /**
  * INIT request/reply flags
@@ -505,6 +513,7 @@ enum fuse_opcode {
 	FUSE_COPY_FILE_RANGE	= 47,
 	FUSE_SETUPMAPPING	= 48,
 	FUSE_REMOVEMAPPING	= 49,
+	FUSE_SYNCFS		= 50,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
@@ -967,4 +976,8 @@ struct fuse_removemapping_one {
 #define FUSE_REMOVEMAPPING_MAX_ENTRY   \
 		(PAGE_SIZE / sizeof(struct fuse_removemapping_one))
 
+struct fuse_syncfs_in {
+	uint64_t	padding;
+};
+
 #endif /* _LINUX_FUSE_H */
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index e709ae8235..ff6ccbc6ef 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -504,6 +504,12 @@
 #define  PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */
 #define  PCI_EXP_DEVCTL_RELAX_EN 0x0010 /* Enable relaxed ordering */
 #define  PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
+#define  PCI_EXP_DEVCTL_PAYLOAD_128B 0x0000 /* 128 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_256B 0x0020 /* 256 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_512B 0x0040 /* 512 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_1024B 0x0060 /* 1024 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_2048B 0x0080 /* 2048 Bytes */
+#define  PCI_EXP_DEVCTL_PAYLOAD_4096B 0x00a0 /* 4096 Bytes */
 #define  PCI_EXP_DEVCTL_EXT_TAG	0x0100	/* Extended Tag Field Enable */
 #define  PCI_EXP_DEVCTL_PHANTOM	0x0200	/* Phantom Functions Enable */
 #define  PCI_EXP_DEVCTL_AUX_PME	0x0400	/* Auxiliary Power PM Enable */
diff --git a/include/standard-headers/linux/virtio_gpu.h b/include/standard-headers/linux/virtio_gpu.h
index 1357e4774e..2da48d3d4c 100644
--- a/include/standard-headers/linux/virtio_gpu.h
+++ b/include/standard-headers/linux/virtio_gpu.h
@@ -59,6 +59,11 @@
  * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB       3
+/*
+ * VIRTIO_GPU_CMD_CREATE_CONTEXT with
+ * context_init and multiple timelines
+ */
+#define VIRTIO_GPU_F_CONTEXT_INIT        4
 
 enum virtio_gpu_ctrl_type {
 	VIRTIO_GPU_UNDEFINED = 0,
@@ -122,14 +127,20 @@ enum virtio_gpu_shm_id {
 	VIRTIO_GPU_SHM_ID_HOST_VISIBLE = 1
 };
 
-#define VIRTIO_GPU_FLAG_FENCE (1 << 0)
+#define VIRTIO_GPU_FLAG_FENCE         (1 << 0)
+/*
+ * If the following flag is set, then ring_idx contains the index
+ * of the command ring that needs to used when creating the fence
+ */
+#define VIRTIO_GPU_FLAG_INFO_RING_IDX (1 << 1)
 
 struct virtio_gpu_ctrl_hdr {
 	uint32_t type;
 	uint32_t flags;
 	uint64_t fence_id;
 	uint32_t ctx_id;
-	uint32_t padding;
+	uint8_t ring_idx;
+	uint8_t padding[3];
 };
 
 /* data passed in the cursor vq */
@@ -269,10 +280,11 @@ struct virtio_gpu_resource_create_3d {
 };
 
 /* VIRTIO_GPU_CMD_CTX_CREATE */
+#define VIRTIO_GPU_CONTEXT_INIT_CAPSET_ID_MASK 0x000000ff
 struct virtio_gpu_ctx_create {
 	struct virtio_gpu_ctrl_hdr hdr;
 	uint32_t nlen;
-	uint32_t padding;
+	uint32_t context_init;
 	char debug_name[64];
 };
 
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 4fe842c3a3..80d76b75bc 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -54,7 +54,31 @@
 #define VIRTIO_ID_SOUND			25 /* virtio sound */
 #define VIRTIO_ID_FS			26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM			27 /* virtio pmem */
+#define VIRTIO_ID_RPMB			28 /* virtio rpmb */
 #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
+#define VIRTIO_ID_VIDEO_ENCODER		30 /* virtio video encoder */
+#define VIRTIO_ID_VIDEO_DECODER		31 /* virtio video decoder */
+#define VIRTIO_ID_SCMI			32 /* virtio SCMI */
+#define VIRTIO_ID_NITRO_SEC_MOD		33 /* virtio nitro secure module*/
+#define VIRTIO_ID_I2C_ADAPTER		34 /* virtio i2c adapter */
+#define VIRTIO_ID_WATCHDOG		35 /* virtio watchdog */
+#define VIRTIO_ID_CAN			36 /* virtio can */
+#define VIRTIO_ID_DMABUF		37 /* virtio dmabuf */
+#define VIRTIO_ID_PARAM_SERV		38 /* virtio parameter server */
+#define VIRTIO_ID_AUDIO_POLICY		39 /* virtio audio policy */
 #define VIRTIO_ID_BT			40 /* virtio bluetooth */
+#define VIRTIO_ID_GPIO			41 /* virtio gpio */
+
+/*
+ * Virtio Transitional IDs
+ */
+
+#define VIRTIO_TRANS_ID_NET		1000 /* transitional virtio net */
+#define VIRTIO_TRANS_ID_BLOCK		1001 /* transitional virtio block */
+#define VIRTIO_TRANS_ID_BALLOON		1002 /* transitional virtio balloon */
+#define VIRTIO_TRANS_ID_CONSOLE		1003 /* transitional virtio console */
+#define VIRTIO_TRANS_ID_SCSI		1004 /* transitional virtio SCSI */
+#define VIRTIO_TRANS_ID_RNG		1005 /* transitional virtio rng */
+#define VIRTIO_TRANS_ID_9P		1009 /* transitional virtio 9p console */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_mem.h b/include/standard-headers/linux/virtio_mem.h
index 05e5ade75d..18c74c527c 100644
--- a/include/standard-headers/linux/virtio_mem.h
+++ b/include/standard-headers/linux/virtio_mem.h
@@ -68,9 +68,10 @@
  * explicitly triggered (VIRTIO_MEM_REQ_UNPLUG).
  *
  * There are no guarantees what will happen if unplugged memory is
- * read/written. Such memory should, in general, not be touched. E.g.,
- * even writing might succeed, but the values will simply be discarded at
- * random points in time.
+ * read/written. In general, unplugged memory should not be touched, because
+ * the resulting action is undefined. There is one exception: without
+ * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, unplugged memory inside the usable
+ * region can be read, to simplify creation of memory dumps.
  *
  * It can happen that the device cannot process a request, because it is
  * busy. The device driver has to retry later.
@@ -87,6 +88,8 @@
 
 /* node_id is an ACPI PXM and is valid */
 #define VIRTIO_MEM_F_ACPI_PXM		0
+/* unplugged memory must not be accessed */
+#define VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE	1
 
 
 /* --- virtio-mem: guest -> host requests --- */
diff --git a/include/standard-headers/linux/virtio_vsock.h b/include/standard-headers/linux/virtio_vsock.h
index 3a23488e42..467e751b17 100644
--- a/include/standard-headers/linux/virtio_vsock.h
+++ b/include/standard-headers/linux/virtio_vsock.h
@@ -97,7 +97,8 @@ enum virtio_vsock_shutdown {
 
 /* VIRTIO_VSOCK_OP_RW flags values */
 enum virtio_vsock_rw {
-	VIRTIO_VSOCK_SEQ_EOR = 1,
+	VIRTIO_VSOCK_SEQ_EOM = 1,
+	VIRTIO_VSOCK_SEQ_EOR = 2,
 };
 
 #endif /* _LINUX_VIRTIO_VSOCK_H */
diff --git a/linux-headers/asm-arm64/unistd.h b/linux-headers/asm-arm64/unistd.h
index f83a70e07d..ce2ee8f1e3 100644
--- a/linux-headers/asm-arm64/unistd.h
+++ b/linux-headers/asm-arm64/unistd.h
@@ -20,5 +20,6 @@
 #define __ARCH_WANT_SET_GET_RLIMIT
 #define __ARCH_WANT_TIME32_SYSCALLS
 #define __ARCH_WANT_SYS_CLONE3
+#define __ARCH_WANT_MEMFD_SECRET
 
 #include <asm-generic/unistd.h>
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index f211961ce1..4557a8b608 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -673,15 +673,15 @@ __SYSCALL(__NR_madvise, sys_madvise)
 #define __NR_remap_file_pages 234
 __SYSCALL(__NR_remap_file_pages, sys_remap_file_pages)
 #define __NR_mbind 235
-__SC_COMP(__NR_mbind, sys_mbind, compat_sys_mbind)
+__SYSCALL(__NR_mbind, sys_mbind)
 #define __NR_get_mempolicy 236
-__SC_COMP(__NR_get_mempolicy, sys_get_mempolicy, compat_sys_get_mempolicy)
+__SYSCALL(__NR_get_mempolicy, sys_get_mempolicy)
 #define __NR_set_mempolicy 237
-__SC_COMP(__NR_set_mempolicy, sys_set_mempolicy, compat_sys_set_mempolicy)
+__SYSCALL(__NR_set_mempolicy, sys_set_mempolicy)
 #define __NR_migrate_pages 238
-__SC_COMP(__NR_migrate_pages, sys_migrate_pages, compat_sys_migrate_pages)
+__SYSCALL(__NR_migrate_pages, sys_migrate_pages)
 #define __NR_move_pages 239
-__SC_COMP(__NR_move_pages, sys_move_pages, compat_sys_move_pages)
+__SYSCALL(__NR_move_pages, sys_move_pages)
 #endif
 
 #define __NR_rt_tgsigqueueinfo 240
@@ -873,8 +873,18 @@ __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
 #define __NR_landlock_restrict_self 446
 __SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
 
+#ifdef __ARCH_WANT_MEMFD_SECRET
+#define __NR_memfd_secret 447
+__SYSCALL(__NR_memfd_secret, sys_memfd_secret)
+#endif
+#define __NR_process_mrelease 448
+__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
+
+#define __NR_futex_waitv 449
+__SYSCALL(__NR_futex_waitv, sys_futex_waitv)
+
 #undef __NR_syscalls
-#define __NR_syscalls 447
+#define __NR_syscalls 450
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index 09cd297698..790309211a 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -376,5 +376,7 @@
 #define __NR_landlock_create_ruleset (__NR_Linux + 444)
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
+#define __NR_process_mrelease (__NR_Linux + 448)
+#define __NR_futex_waitv (__NR_Linux + 449)
 
 #endif /* _ASM_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index 780e0cead6..d68a3cc573 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -352,5 +352,7 @@
 #define __NR_landlock_create_ruleset (__NR_Linux + 444)
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
+#define __NR_process_mrelease (__NR_Linux + 448)
+#define __NR_futex_waitv (__NR_Linux + 449)
 
 #endif /* _ASM_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index 06a2b3b55e..18cfa981c7 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -422,5 +422,7 @@
 #define __NR_landlock_create_ruleset (__NR_Linux + 444)
 #define __NR_landlock_add_rule (__NR_Linux + 445)
 #define __NR_landlock_restrict_self (__NR_Linux + 446)
+#define __NR_process_mrelease (__NR_Linux + 448)
+#define __NR_futex_waitv (__NR_Linux + 449)
 
 #endif /* _ASM_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index cd5a8a41b2..c8a97e6f51 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -429,6 +429,8 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index 8458effa8d..5dd37251b8 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -401,6 +401,8 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index f053b8304a..d8259ff9a1 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
 #define KVM_S390_VM_CPU_FEAT_PFMFI	11
 #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
 #define KVM_S390_VM_CPU_FEAT_KSS	13
+#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP 14
 struct kvm_s390_vm_cpu_feat {
 	__u64 feat[16];
 };
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index 0c3cd299e4..8c60e40ab1 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -419,5 +419,7 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index 8dfc08b5e6..5793aa2a83 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -367,5 +367,7 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index a6c327f8ad..5a776a08f7 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -295,6 +295,7 @@ struct kvm_debug_exit_arch {
 #define KVM_GUESTDBG_USE_HW_BP		0x00020000
 #define KVM_GUESTDBG_INJECT_DB		0x00040000
 #define KVM_GUESTDBG_INJECT_BP		0x00080000
+#define KVM_GUESTDBG_BLOCKIRQ		0x00100000
 
 /* for KVM_SET_GUEST_DEBUG */
 struct kvm_guest_debug_arch {
@@ -503,4 +504,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index 66e96c0c68..9c9ffe312b 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -437,6 +437,9 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_memfd_secret 447
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index b8ff6f14ee..084f1eef9c 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -359,6 +359,9 @@
 #define __NR_landlock_create_ruleset 444
 #define __NR_landlock_add_rule 445
 #define __NR_landlock_restrict_self 446
+#define __NR_memfd_secret 447
+#define __NR_process_mrelease 448
+#define __NR_futex_waitv 449
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index 06a1097c15..a2441affc2 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -312,6 +312,9 @@
 #define __NR_landlock_create_ruleset (__X32_SYSCALL_BIT + 444)
 #define __NR_landlock_add_rule (__X32_SYSCALL_BIT + 445)
 #define __NR_landlock_restrict_self (__X32_SYSCALL_BIT + 446)
+#define __NR_memfd_secret (__X32_SYSCALL_BIT + 447)
+#define __NR_process_mrelease (__X32_SYSCALL_BIT + 448)
+#define __NR_futex_waitv (__X32_SYSCALL_BIT + 449)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bcaf66cc4d..f61f357899 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_RISCV_SBI        35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -397,13 +398,23 @@ struct kvm_run {
 		 * "ndata" is correct, that new fields are enumerated in "flags",
 		 * and that each flag enumerates fields that are 64-bit aligned
 		 * and sized (so that ndata+internal.data[] is valid/accurate).
+		 *
+		 * Space beyond the defined fields may be used to store arbitrary
+		 * debug information relating to the emulation failure. It is
+		 * accounted for in "ndata" but the format is unspecified and is
+		 * not represented in "flags". Any such information is *not* ABI!
 		 */
 		struct {
 			__u32 suberror;
 			__u32 ndata;
 			__u64 flags;
-			__u8  insn_size;
-			__u8  insn_bytes[15];
+			union {
+				struct {
+					__u8  insn_size;
+					__u8  insn_bytes[15];
+				};
+			};
+			/* Arbitrary debug data may follow. */
 		} emulation_failure;
 		/* KVM_EXIT_OSI */
 		struct {
@@ -469,6 +480,13 @@ struct kvm_run {
 		} msr;
 		/* KVM_EXIT_XEN */
 		struct kvm_xen_exit xen;
+		/* KVM_EXIT_RISCV_SBI */
+		struct {
+			unsigned long extension_id;
+			unsigned long function_id;
+			unsigned long args[6];
+			unsigned long ret[2];
+		} riscv_sbi;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1112,6 +1130,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1223,11 +1242,16 @@ struct kvm_irqfd {
 
 /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags.  */
 #define KVM_CLOCK_TSC_STABLE		2
+#define KVM_CLOCK_REALTIME		(1 << 2)
+#define KVM_CLOCK_HOST_TSC		(1 << 3)
 
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
 };
 
 /* For KVM_CAP_SW_TLB */
@@ -1965,7 +1989,9 @@ struct kvm_stats_header {
 #define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
 #define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
 #define KVM_STATS_TYPE_PEAK		(0x2 << KVM_STATS_TYPE_SHIFT)
-#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_PEAK
+#define KVM_STATS_TYPE_LINEAR_HIST	(0x3 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_LOG_HIST		(0x4 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_LOG_HIST
 
 #define KVM_STATS_UNIT_SHIFT		4
 #define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
@@ -1988,8 +2014,9 @@ struct kvm_stats_header {
  * @size: The number of data items for this stats.
  *        Every data item is of type __u64.
  * @offset: The offset of the stats to the start of stat structure in
- *          struture kvm or kvm_vcpu.
- * @unused: Unused field for future usage. Always 0 for now.
+ *          structure kvm or kvm_vcpu.
+ * @bucket_size: A parameter value used for histogram stats. It is only used
+ *		for linear histogram stats, specifying the size of the bucket;
  * @name: The name string for the stats. Its size is indicated by the
  *        &kvm_stats_header->name_size.
  */
@@ -1998,7 +2025,7 @@ struct kvm_stats_desc {
 	__s16 exponent;
 	__u16 size;
 	__u32 offset;
-	__u32 unused;
+	__u32 bucket_size;
 	char name[];
 };
 
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index e680594f27..96b18b872a 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -1002,6 +1002,28 @@ struct vfio_device_feature {
  */
 #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
 
+/*
+ * Provide support for enabling interpretation of zPCI instructions.  This
+ * feature is only valid for s390x PCI devices.  Data provided when setting
+ * and getting this feature is futher described in vfio_zdev.h
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_INTERP		(1)
+
+/*
+ * Provide support for enbaling adapter interruption forwarding for zPCI
+ * devices.  This feature is only valid for s390x PCI devices.  Data provided
+ * when setting and getting this feature is further described in vfio_zdev.h
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_AIF		(2)
+
+/*
+ * Provide support for enabling guest I/O address translation assistance for
+ * zPCI devices.  This feature is only valid for s390x PCI devices.  Data
+ * provided when setting and getting this feature is further described in
+ * vfio_zdev.h
+ */
+#define VFIO_DEVICE_FEATURE_ZPCI_IOAT		(3)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
diff --git a/linux-headers/linux/vfio_zdev.h b/linux-headers/linux/vfio_zdev.h
index b4309397b6..b4c2ba8e71 100644
--- a/linux-headers/linux/vfio_zdev.h
+++ b/linux-headers/linux/vfio_zdev.h
@@ -47,6 +47,9 @@ struct vfio_device_info_cap_zpci_group {
 	__u16 noi;		/* Maximum number of MSIs */
 	__u16 maxstbl;		/* Maximum Store Block Length */
 	__u8 version;		/* Supported PCI Version */
+	/* End of version 1 */
+	__u8 dtsm;		/* Supported IOAT Designations */
+	/* End of version 2 */
 };
 
 /**
@@ -75,4 +78,52 @@ struct vfio_device_info_cap_zpci_pfip {
 	__u8 pfip[];
 };
 
+/**
+ * VFIO_DEVICE_FEATURE_ZPCI_INTERP
+ *
+ * This feature is used for enabling zPCI instruction interpretation for a
+ * device.  No data is provided when setting this feature.  When getting
+ * this feature, the following structure is provided which details whether
+ * or not interpretation is active and provides the guest with host device
+ * information necessary to enable interpretation.
+ */
+struct vfio_device_zpci_interp {
+	__u64 flags;
+#define VFIO_DEVICE_ZPCI_FLAG_INTERP 1
+	__u32 fh;		/* Host device function handle */
+};
+
+/**
+ * VFIO_DEVICE_FEATURE_ZPCI_AIF
+ *
+ * This feature is used for enabling forwarding of adapter interrupts directly
+ * from firmware to the guest.  When setting this feature, the flags indicate
+ * whether to enable/disable the feature and the structure defined below is
+ * used to setup the forwarding structures.  When getting this feature, only
+ * the flags are used to indicate the current state.
+ */
+struct vfio_device_zpci_aif {
+	__u64 flags;
+#define VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT 1
+#define VFIO_DEVICE_ZPCI_FLAG_AIF_HOST 2
+	__u64 ibv;		/* Address of guest interrupt bit vector */
+	__u64 sb;		/* Address of guest summary bit */
+	__u32 noi;		/* Number of interrupts */
+	__u8 isc;		/* Guest interrupt subclass */
+	__u8 sbo;		/* Offset of guest summary bit vector */
+};
+
+/**
+ * VFIO_DEVICE_FEATURE_ZPCI_IOAT
+ *
+ * This feature is used for enabling guest I/O translation assistance for
+ * passthrough zPCI devices using instruction interpretation.  When setting
+ * this feature, the iota specifies a KVM guest I/O translation anchor.  When
+ * getting this feature, the most recently set anchor (or 0) is returned in
+ * iota.
+ */
+struct vfio_device_zpci_ioat {
+	__u64 iota;
+};
+
 #endif
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/12] virtio-gpu: do not byteswap padding
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (3 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 04/12] Update linux headers Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-07 21:04 ` [PATCH 06/12] target/s390x: add zpci-interp to cpu models Matthew Rosato
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, Philippe Mathieu-Daudé,
	cohuck, richard.henderson, thuth, qemu-devel, pasic,
	alex.williamson, mst, pbonzini, david, borntraeger,
	Alex Bennée

From: Paolo Bonzini <pbonzini@redhat.com>

In Linux 5.16, the padding of struct virtio_gpu_ctrl_hdr has become a
single-byte field followed by a uint8_t[3] array of padding bytes,
and virtio_gpu_ctrl_hdr_bswap does not compile anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
---
 include/hw/virtio/virtio-gpu-bswap.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/hw/virtio/virtio-gpu-bswap.h b/include/hw/virtio/virtio-gpu-bswap.h
index e2bee8f595..5faac0d8d5 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -24,7 +24,6 @@ virtio_gpu_ctrl_hdr_bswap(struct virtio_gpu_ctrl_hdr *hdr)
     le32_to_cpus(&hdr->flags);
     le64_to_cpus(&hdr->fence_id);
     le32_to_cpus(&hdr->ctx_id);
-    le32_to_cpus(&hdr->padding);
 }
 
 static inline void
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/12] target/s390x: add zpci-interp to cpu models
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (4 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 05/12] virtio-gpu: do not byteswap padding Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-08 10:16   ` Christian Borntraeger
  2021-12-07 21:04 ` [PATCH 07/12] s390x/pci: enable for load/store intepretation Matthew Rosato
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

The zpci-interp feature is used to specify whether zPCI interpretation is
to be used for this guest.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 target/s390x/cpu_features_def.h.inc | 1 +
 target/s390x/gen-features.c         | 2 ++
 target/s390x/kvm/kvm.c              | 1 +
 3 files changed, 4 insertions(+)

diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
index e86662bb3b..4ade3182aa 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: Conditional-external-interception f
 DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
 DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
 DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
+DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
 
 /* Features exposed via the PLO instruction. */
 DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general registers)")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 7cb1a6ec10..7005d22415 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
     S390_FEAT_HPMA2,
     S390_FEAT_SIE_KSS,
     S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+    S390_FEAT_ZPCI_INTERP,
 };
 
 #define full_GEN14_GA2 EmptyFeat
@@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
     S390_FEAT_GROUP_MSA_EXT_8,
     S390_FEAT_MULTIPLE_EPOCH,
     S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+    S390_FEAT_ZPCI_INTERP,
 };
 
 #define default_GEN14_GA2 EmptyFeat
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 5b1fdb55c4..b13d78f988 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2290,6 +2290,7 @@ static int kvm_to_feat[][2] = {
     { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
     { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
     { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
+    { KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
 };
 
 static int query_cpu_feat(S390FeatBitmap features)
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/12] s390x/pci: enable for load/store intepretation
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (5 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 06/12] target/s390x: add zpci-interp to cpu models Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-08 10:56   ` Thomas Huth
  2021-12-15  7:44   ` Pierre Morel
  2021-12-07 21:04 ` [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X Matthew Rosato
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Use the associated vfio feature ioctl to enable interpretation for devices
when requested.  As part of this process, we must use the host function
handle rather than a QEMU-generated one -- this is provided as part of the
ioctl payload.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c          | 69 +++++++++++++++++++++++++++++++-
 hw/s390x/s390-pci-inst.c         | 63 ++++++++++++++++++++++++++++-
 hw/s390x/s390-pci-vfio.c         | 55 +++++++++++++++++++++++++
 include/hw/s390x/s390-pci-bus.h  |  1 +
 include/hw/s390x/s390-pci-vfio.h | 15 +++++++
 5 files changed, 201 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 01b58ebc70..451bd32d92 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -971,12 +971,57 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
     }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+    uint32_t idx;
+    int rc;
+
+    rc = s390_pci_probe_interp(pbdev);
+    if (rc) {
+        return rc;
+    }
+
+    rc = s390_pci_update_passthrough_fh(pbdev);
+    if (rc) {
+        return rc;
+    }
+
+    /*
+     * The host device is in an enabled state, but the device must
+     * begin as disabled for the guest so mask off the enable bit
+     * from the passthrough handle.
+     */
+    pbdev->fh &= ~FH_MASK_ENABLE;
+
+    /* Next, see if the idx is already in-use */
+    idx = pbdev->fh & FH_MASK_INDEX;
+    if (pbdev->idx != idx) {
+        if (s390_pci_find_dev_by_idx(s, idx)) {
+            return -EINVAL;
+        }
+        /*
+         * Update the idx entry with the passed through idx
+         * If the relinquised idx is lower than next_idx, use it
+         * to replace next_idx
+         */
+        g_hash_table_remove(s->zpci_table, &pbdev->idx);
+        if (idx < s->next_idx) {
+            s->next_idx = idx;
+        }
+        pbdev->idx = idx;
+        g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+    }
+
+    return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
     PCIDevice *pdev = NULL;
     S390PCIBusDevice *pbdev = NULL;
+    int rc;
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
         PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1067,33 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         set_pbdev_info(pbdev);
 
         if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-            pbdev->fh |= FH_SHM_VFIO;
+            /*
+             * By default, interpretation is always requested; if the available
+             * facilities indicate it is not available, fallback to the
+             * intercept model.
+             */
+            if (pbdev->interp && !s390_has_feat(S390_FEAT_ZPCI_INTERP)) {
+                    DPRINTF("zPCI interpretation facilities missing.\n");
+                    pbdev->interp = false;
+                }
+            if (pbdev->interp) {
+                rc = s390_pci_interp_plug(s, pbdev);
+                if (rc) {
+                    error_setg(errp, "zpci interp plug failed: %d", rc);
+                    return;
+                }
+            }
             pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
             /* Fill in CLP information passed via the vfio region */
             s390_pci_get_clp_info(pbdev);
+            if (!pbdev->interp) {
+                /* Do vfio passthrough but intercept for I/O */
+                pbdev->fh |= FH_SHM_VFIO;
+            }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
+            /* Always intercept emulated devices */
+            pbdev->interp = false;
         }
 
         if (s390_pci_msix_init(pbdev)) {
@@ -1360,6 +1426,7 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
+    DEFINE_PROP_BOOL("interp", S390PCIBusDevice, interp, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 0cef7fbace..ba4017474e 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,7 @@
 #include "sysemu/hw_accel.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "hw/s390x/tod.h"
 
 #ifndef DEBUG_S390PCI_INST
@@ -156,6 +157,47 @@ out:
     return rc;
 }
 
+static int clp_enable_interp(S390PCIBusDevice *pbdev)
+{
+    int rc;
+
+    rc = s390_pci_set_interp(pbdev, true);
+    if (rc) {
+        DPRINTF("Failed to enable interpretation\n");
+        return rc;
+    }
+    rc = s390_pci_update_passthrough_fh(pbdev);
+    if (rc) {
+        DPRINTF("Failed to update passthrough fh\n");
+        return rc;
+    }
+    if (!(pbdev->fh & FH_MASK_ENABLE)) {
+        DPRINTF("Passthrough handle is not enabled\n");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int clp_disable_interp(S390PCIBusDevice *pbdev)
+{
+    int rc;
+
+    rc = s390_pci_set_interp(pbdev, false);
+    if (rc) {
+        DPRINTF("Failed to disable interpretation\n");
+        return rc;
+    }
+
+    rc = s390_pci_update_passthrough_fh(pbdev);
+    if (rc) {
+        DPRINTF("Failed to update passthrough fh\n");
+        return rc;
+    }
+
+    return 0;
+}
+
 int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
 {
     ClpReqHdr *reqh;
@@ -246,7 +288,19 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
                 goto out;
             }
 
-            pbdev->fh |= FH_MASK_ENABLE;
+            /*
+             * If interpretation is specified, attempt to enable this now and
+             * update with the host fh
+             */
+            if (pbdev->interp) {
+                if (clp_enable_interp(pbdev)) {
+                    stw_p(&ressetpci->hdr.rsp, CLP_RC_SETPCIFN_ERR);
+                    goto out;
+                }
+            } else {
+                pbdev->fh |= FH_MASK_ENABLE;
+            }
+
             pbdev->state = ZPCI_FS_ENABLED;
             stl_p(&ressetpci->fh, pbdev->fh);
             stw_p(&ressetpci->hdr.rsp, CLP_RC_OK);
@@ -257,6 +311,13 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
                 goto out;
             }
             device_legacy_reset(DEVICE(pbdev));
+            if (pbdev->interp) {
+                if (clp_disable_interp(pbdev)) {
+                    stw_p(&ressetpci->hdr.rsp, CLP_RC_SETPCIFN_ERR);
+                    goto out;
+                }
+            }
+            /* Mask off the enabled bit for interpreted devices too */
             pbdev->fh &= ~FH_MASK_ENABLE;
             pbdev->state = ZPCI_FS_DISABLED;
             stl_p(&ressetpci->fh, pbdev->fh);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6f80a47e29..78093aaac7 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -97,6 +97,61 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt)
     }
 }
 
+int s390_pci_probe_interp(S390PCIBusDevice *pbdev)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    struct vfio_device_feature feat = {
+        .argsz = sizeof(struct vfio_device_feature),
+        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_INTERP
+    };
+
+    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
+}
+
+int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    g_autofree struct vfio_device_feature *feat;
+    struct vfio_device_zpci_interp *data;
+    int size;
+
+    size = sizeof(*feat) + sizeof(*data);
+    feat = g_malloc0(size);
+    feat->argsz = size;
+    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_INTERP;
+
+    data = (struct vfio_device_zpci_interp *)&feat->data;
+    if (enable) {
+        data->flags = VFIO_DEVICE_ZPCI_FLAG_INTERP;
+    } else {
+        data->flags = 0;
+    }
+
+    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
+}
+
+int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    g_autofree struct vfio_device_feature *feat;
+    struct vfio_device_zpci_interp *data;
+    int size, rc;
+
+    size = sizeof(*feat) + sizeof(*data);
+    feat = g_malloc0(size);
+    feat->argsz = size;
+    feat->flags = VFIO_DEVICE_FEATURE_GET + VFIO_DEVICE_FEATURE_ZPCI_INTERP;
+
+    rc = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
+    if (rc) {
+        return rc;
+    }
+
+    data = (struct vfio_device_zpci_interp *)&feat->data;
+    pbdev->fh = data->fh;
+    return 0;
+}
+
 static void s390_pci_read_base(S390PCIBusDevice *pbdev,
                                struct vfio_device_info *info)
 {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index da3cde2bb4..a9843dfe97 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -350,6 +350,7 @@ struct S390PCIBusDevice {
     IndAddr *indicator;
     bool pci_unplug_request_processed;
     bool unplug_requested;
+    bool interp;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index ff708aef50..42533e38f7 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -20,6 +20,9 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
 S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
                                           S390PCIBusDevice *pbdev);
 void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
+int s390_pci_probe_interp(S390PCIBusDevice *pbdev);
+int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable);
+int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev);
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
@@ -33,6 +36,18 @@ static inline S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
 }
 static inline void s390_pci_end_dma_count(S390pciState *s,
                                           S390PCIDMACount *cnt) { }
+int s390_pci_probe_interp(S390PCIBusDevice *pbdev)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
+{
+    return -EINVAL;
+}
 static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
 #endif
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (6 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 07/12] s390x/pci: enable for load/store intepretation Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-08 11:04   ` Thomas Huth
  2021-12-15  6:26   ` Pierre Morel
  2021-12-07 21:04 ` [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices Matthew Rosato
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Lack of MSI-X support is not an issue for interpreted passthrough
devices, so let's let these in.  This will allow, for example, ISM
devices to be passed through -- but only when interpretation is
available and being used.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 451bd32d92..503326210a 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1096,7 +1096,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             pbdev->interp = false;
         }
 
-        if (s390_pci_msix_init(pbdev)) {
+        if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
             error_setg(errp, "MSI-X support is mandatory "
                        "in the S390 architecture");
             return;
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (7 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-08 11:29   ` Thomas Huth
  2021-12-07 21:04 ` [PATCH 10/12] s390x/pci: use I/O Address Translation assist when interpreting Matthew Rosato
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Use the associated vfio feature ioctl to enable adapter event notification
and forwarding for devices when requested.  This feature will be set up
with or without firmware assist based upon the 'intassist' setting.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c          | 24 +++++++--
 hw/s390x/s390-pci-inst.c         | 54 +++++++++++++++++++-
 hw/s390x/s390-pci-vfio.c         | 88 ++++++++++++++++++++++++++++++++
 include/hw/s390x/s390-pci-bus.h  |  1 +
 include/hw/s390x/s390-pci-vfio.h | 20 ++++++++
 5 files changed, 182 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 503326210a..1ae8792a26 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -189,7 +189,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
         rc = SCLP_RC_NO_ACTION_REQUIRED;
         break;
     default:
-        if (pbdev->summary_ind) {
+        if (pbdev->interp) {
+            /* Interpreted devices were using interrupt forwarding */
+            s390_pci_set_aif(pbdev, NULL, false, pbdev->intassist);
+        } else if (pbdev->summary_ind) {
             pci_dereg_irqs(pbdev);
         }
         if (pbdev->iommu->enabled) {
@@ -981,6 +984,11 @@ static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
         return rc;
     }
 
+    rc = s390_pci_probe_aif(pbdev);
+    if (rc) {
+        return rc;
+    }
+
     rc = s390_pci_update_passthrough_fh(pbdev);
     if (rc) {
         return rc;
@@ -1075,6 +1083,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             if (pbdev->interp && !s390_has_feat(S390_FEAT_ZPCI_INTERP)) {
                     DPRINTF("zPCI interpretation facilities missing.\n");
                     pbdev->interp = false;
+                    pbdev->intassist = false;
                 }
             if (pbdev->interp) {
                 rc = s390_pci_interp_plug(s, pbdev);
@@ -1089,11 +1098,13 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             if (!pbdev->interp) {
                 /* Do vfio passthrough but intercept for I/O */
                 pbdev->fh |= FH_SHM_VFIO;
+                pbdev->intassist = false;
             }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
             /* Always intercept emulated devices */
             pbdev->interp = false;
+            pbdev->intassist = false;
         }
 
         if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1243,7 +1254,10 @@ static void s390_pcihost_reset(DeviceState *dev)
     /* Process all pending unplug requests */
     QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
         if (pbdev->unplug_requested) {
-            if (pbdev->summary_ind) {
+            if (pbdev->interp) {
+                /* Interpreted devices were using interrupt forwarding */
+                s390_pci_set_aif(pbdev, NULL, false, pbdev->intassist);
+            } else if (pbdev->summary_ind) {
                 pci_dereg_irqs(pbdev);
             }
             if (pbdev->iommu->enabled) {
@@ -1381,7 +1395,10 @@ static void s390_pci_device_reset(DeviceState *dev)
         break;
     }
 
-    if (pbdev->summary_ind) {
+    if (pbdev->interp) {
+        /* Interpreted devices were using interrupt forwarding */
+        s390_pci_set_aif(pbdev, NULL, false, pbdev->intassist);
+    } else if (pbdev->summary_ind) {
         pci_dereg_irqs(pbdev);
     }
     if (pbdev->iommu->enabled) {
@@ -1427,6 +1444,7 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
     DEFINE_PROP_BOOL("interp", S390PCIBusDevice, interp, true),
+    DEFINE_PROP_BOOL("intassist", S390PCIBusDevice, intassist, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index ba4017474e..02093e82f9 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -1111,6 +1111,46 @@ static void fmb_update(void *opaque)
     timer_mod(pbdev->fmb_timer, t + DEFAULT_MUI);
 }
 
+static int mpcifc_reg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+    int rc;
+
+    /* Interpreted devices must also use interrupt forwarding */
+    rc = s390_pci_get_aif(pbdev, false, pbdev->intassist);
+    if (rc) {
+        DPRINTF("Bad interrupt forwarding state\n");
+        return rc;
+    }
+
+    rc = s390_pci_set_aif(pbdev, fib, true, pbdev->intassist);
+    if (rc) {
+        DPRINTF("Failed to enable interrupt forwarding\n");
+        return rc;
+    }
+
+    return 0;
+}
+
+static int mpcifc_dereg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+    int rc;
+
+    /* Interpreted devices were using interrupt forwarding */
+    rc = s390_pci_get_aif(pbdev, true, pbdev->intassist);
+    if (rc) {
+        DPRINTF("Bad interrupt forwarding state\n");
+        return rc;
+    }
+
+    rc = s390_pci_set_aif(pbdev, fib, false, pbdev->intassist);
+    if (rc) {
+        DPRINTF("Failed to disable interrupt forwarding\n");
+        return rc;
+    }
+
+    return 0;
+}
+
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
                         uintptr_t ra)
 {
@@ -1165,7 +1205,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 
     switch (oc) {
     case ZPCI_MOD_FC_REG_INT:
-        if (pbdev->summary_ind) {
+        if (pbdev->interp) {
+            if (mpcifc_reg_int_interp(pbdev, &fib)) {
+                cc = ZPCI_PCI_LS_ERR;
+                s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+            }
+        } else if (pbdev->summary_ind) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else if (reg_irqs(env, pbdev, fib)) {
@@ -1174,7 +1219,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
         }
         break;
     case ZPCI_MOD_FC_DEREG_INT:
-        if (!pbdev->summary_ind) {
+        if (pbdev->interp) {
+            if (mpcifc_dereg_int_interp(pbdev, &fib)) {
+                cc = ZPCI_PCI_LS_ERR;
+                s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+            }
+        } else if (!pbdev->summary_ind) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 78093aaac7..6f9271df87 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -152,6 +152,94 @@ int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
     return 0;
 }
 
+int s390_pci_probe_aif(S390PCIBusDevice *pbdev)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    struct vfio_device_feature feat = {
+        .argsz = sizeof(struct vfio_device_feature),
+        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_AIF
+    };
+
+    assert(vdev);
+
+    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
+}
+
+int s390_pci_set_aif(S390PCIBusDevice *pbdev, ZpciFib *fib, bool enable,
+                     bool assist)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    g_autofree struct vfio_device_feature *feat;
+    struct vfio_device_zpci_aif *data;
+    int size;
+
+    assert(vdev);
+
+    size = sizeof(*feat) + sizeof(*data);
+    feat = g_malloc0(size);
+    feat->argsz = size;
+    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_AIF;
+
+    data = (struct vfio_device_zpci_aif *)&feat->data;
+    if (enable) {
+        data->flags = VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT;
+        if (!pbdev->intassist) {
+            data->flags |= VFIO_DEVICE_ZPCI_FLAG_AIF_HOST;
+        }
+        /* Fill in the guest fib info */
+        data->ibv = fib->aibv;
+        data->sb = fib->aisb;
+        data->noi = FIB_DATA_NOI(fib->data);
+        data->isc = FIB_DATA_ISC(fib->data);
+        data->sbo = FIB_DATA_AISBO(fib->data);
+    } else {
+        data->flags = 0;
+    }
+
+    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
+}
+
+int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    g_autofree struct vfio_device_feature *feat;
+    struct vfio_device_zpci_aif *data;
+    int size, rc;
+
+    assert(vdev);
+
+    size = sizeof(*feat) + sizeof(*data);
+    feat = g_malloc0(size);
+    feat->argsz = size;
+    feat->flags = VFIO_DEVICE_FEATURE_GET + VFIO_DEVICE_FEATURE_ZPCI_AIF;
+
+    rc = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
+    if (rc) {
+        return rc;
+    }
+
+    /* Determine if current interrupt settings match the host */
+    data = (struct vfio_device_zpci_aif *)&feat->data;
+    if (enable && (!(data->flags & VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT))) {
+        rc = -EINVAL;
+    } else if (!enable && (data->flags & VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT)) {
+        rc = -EINVAL;
+    }
+
+    /*
+     * When enabled for interrupts, the assist and forced host-delivery are
+     * mututally exclusive
+     */
+    if (enable && assist && (data->flags & VFIO_DEVICE_ZPCI_FLAG_AIF_HOST)) {
+        rc = -EINVAL;
+    } else if (enable && (!assist) && (!(data->flags &
+                                         VFIO_DEVICE_ZPCI_FLAG_AIF_HOST))) {
+        rc = -EINVAL;
+    }
+
+    return rc;
+}
+
 static void s390_pci_read_base(S390PCIBusDevice *pbdev,
                                struct vfio_device_info *info)
 {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index a9843dfe97..9941ca0084 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -351,6 +351,7 @@ struct S390PCIBusDevice {
     bool pci_unplug_request_processed;
     bool unplug_requested;
     bool interp;
+    bool intassist;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index 42533e38f7..6cec38a863 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -13,6 +13,7 @@
 #define HW_S390_PCI_VFIO_H
 
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-inst.h"
 #include CONFIG_DEVICES
 
 #ifdef CONFIG_VFIO
@@ -23,6 +24,11 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
 int s390_pci_probe_interp(S390PCIBusDevice *pbdev);
 int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable);
 int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev);
+int s390_pci_probe_aif(S390PCIBusDevice *pbdev);
+int s390_pci_set_aif(S390PCIBusDevice *pbdev, ZpciFib *fib, bool enable,
+                     bool assist);
+int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist);
+
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
@@ -48,6 +54,20 @@ static inline int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
 {
     return -EINVAL;
 }
+static inline int s390_pci_probe_aif(S390PCIBusDevice *pbdev)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_set_aif(S390PCIBusDevice *pbdev, ZpciFib *fib,
+                                   bool enable, bool assist)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable,
+                                   bool assist)
+{
+    return -EINVAL;
+}
 static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
 #endif
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/12] s390x/pci: use I/O Address Translation assist when interpreting
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (8 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-16  8:03   ` Pierre Morel
  2021-12-07 21:04 ` [PATCH 11/12] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices Matthew Rosato
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Allow the underlying kvm host to handle the Refresh PCI Translation
instruction intercepts.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c          |  6 ++--
 hw/s390x/s390-pci-inst.c         | 51 ++++++++++++++++++++++++++++++--
 hw/s390x/s390-pci-vfio.c         | 33 +++++++++++++++++++++
 include/hw/s390x/s390-pci-inst.h |  2 +-
 include/hw/s390x/s390-pci-vfio.h | 10 +++++++
 5 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 1ae8792a26..ab442f17fb 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -196,7 +196,7 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
             pci_dereg_irqs(pbdev);
         }
         if (pbdev->iommu->enabled) {
-            pci_dereg_ioat(pbdev->iommu);
+            pci_dereg_ioat(pbdev);
         }
         pbdev->state = ZPCI_FS_STANDBY;
         rc = SCLP_RC_NORMAL_COMPLETION;
@@ -1261,7 +1261,7 @@ static void s390_pcihost_reset(DeviceState *dev)
                 pci_dereg_irqs(pbdev);
             }
             if (pbdev->iommu->enabled) {
-                pci_dereg_ioat(pbdev->iommu);
+                pci_dereg_ioat(pbdev);
             }
             pbdev->state = ZPCI_FS_STANDBY;
             s390_pci_perform_unplug(pbdev);
@@ -1402,7 +1402,7 @@ static void s390_pci_device_reset(DeviceState *dev)
         pci_dereg_irqs(pbdev);
     }
     if (pbdev->iommu->enabled) {
-        pci_dereg_ioat(pbdev->iommu);
+        pci_dereg_ioat(pbdev);
     }
 
     fmb_timer_free(pbdev);
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 02093e82f9..598e5a5d52 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -978,6 +978,24 @@ int pci_dereg_irqs(S390PCIBusDevice *pbdev)
     return 0;
 }
 
+static int reg_ioat_interp(S390PCIBusDevice *pbdev, uint64_t iota)
+{
+    int rc;
+
+    rc = s390_pci_probe_ioat(pbdev);
+    if (rc) {
+        return rc;
+    }
+
+    rc = s390_pci_set_ioat(pbdev, iota);
+    if (rc) {
+        return rc;
+    }
+
+    pbdev->iommu->enabled = true;
+    return 0;
+}
+
 static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
                     uintptr_t ra)
 {
@@ -995,6 +1013,16 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
         return -EINVAL;
     }
 
+    /* If this is an interpreted device, we must use the IOAT assist */
+    if (pbdev->interp) {
+        if (reg_ioat_interp(pbdev, g_iota)) {
+            error_report("failure starting ioat assist");
+            s390_program_interrupt(env, PGM_OPERAND, ra);
+            return -EINVAL;
+        }
+        return 0;
+    }
+
     /* currently we only support designation type 1 with translation */
     if (!(dt == ZPCI_IOTA_RTTO && t)) {
         error_report("unsupported ioat dt %d t %d", dt, t);
@@ -1011,8 +1039,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
     return 0;
 }
 
-void pci_dereg_ioat(S390PCIIOMMU *iommu)
+static void dereg_ioat_interp(S390PCIBusDevice *pbdev)
 {
+    if (s390_pci_probe_ioat(pbdev) != 0) {
+        return;
+    }
+
+    s390_pci_set_ioat(pbdev, 0);
+    pbdev->iommu->enabled = false;
+}
+
+void pci_dereg_ioat(S390PCIBusDevice *pbdev)
+{
+    S390PCIIOMMU *iommu = pbdev->iommu;
+
+    if (pbdev->interp) {
+        dereg_ioat_interp(pbdev);
+        return;
+    }
+
     s390_pci_iommu_disable(iommu);
     iommu->pba = 0;
     iommu->pal = 0;
@@ -1251,7 +1296,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
-            pci_dereg_ioat(pbdev->iommu);
+            pci_dereg_ioat(pbdev);
         }
         break;
     case ZPCI_MOD_FC_REREG_IOAT:
@@ -1262,7 +1307,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
-            pci_dereg_ioat(pbdev->iommu);
+            pci_dereg_ioat(pbdev);
             if (reg_ioat(env, pbdev, fib, ra)) {
                 cc = ZPCI_PCI_LS_ERR;
                 s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6f9271df87..6fc03a858a 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -240,6 +240,39 @@ int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist)
     return rc;
 }
 
+int s390_pci_probe_ioat(S390PCIBusDevice *pbdev)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    struct vfio_device_feature feat = {
+        .argsz = sizeof(struct vfio_device_feature),
+        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_IOAT
+    };
+
+    assert(vdev);
+
+    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
+}
+
+int s390_pci_set_ioat(S390PCIBusDevice *pbdev, uint64_t iota)
+{
+    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    g_autofree struct vfio_device_feature *feat;
+    struct vfio_device_zpci_ioat *data;
+    int size;
+
+    assert(vdev);
+
+    size = sizeof(*feat) + sizeof(*data);
+    feat = g_malloc0(size);
+    feat->argsz = size;
+    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_IOAT;
+
+    data = (struct vfio_device_zpci_ioat *)&feat->data;
+    data->iota = iota;
+
+    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
+}
+
 static void s390_pci_read_base(S390PCIBusDevice *pbdev,
                                struct vfio_device_info *info)
 {
diff --git a/include/hw/s390x/s390-pci-inst.h b/include/hw/s390x/s390-pci-inst.h
index a55c448aad..13566fb7b4 100644
--- a/include/hw/s390x/s390-pci-inst.h
+++ b/include/hw/s390x/s390-pci-inst.h
@@ -99,7 +99,7 @@ typedef struct ZpciFib {
 } QEMU_PACKED ZpciFib;
 
 int pci_dereg_irqs(S390PCIBusDevice *pbdev);
-void pci_dereg_ioat(S390PCIIOMMU *iommu);
+void pci_dereg_ioat(S390PCIBusDevice *pbdev);
 int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra);
 int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra);
 int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra);
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index 6cec38a863..e7a2d8ff77 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -28,6 +28,8 @@ int s390_pci_probe_aif(S390PCIBusDevice *pbdev);
 int s390_pci_set_aif(S390PCIBusDevice *pbdev, ZpciFib *fib, bool enable,
                      bool assist);
 int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist);
+int s390_pci_probe_ioat(S390PCIBusDevice *pbdev);
+int s390_pci_set_ioat(S390PCIBusDevice *pbdev, uint64_t iota);
 
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
@@ -68,6 +70,14 @@ static inline int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable,
 {
     return -EINVAL;
 }
+static inline int s390_pci_probe_ioat(S390PCIBusDevice *pbdev)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_set_ioat(S390PCIBusDevice *pbdev, uint64_t iota)
+{
+    return -EINVAL;
+}
 static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
 #endif
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/12] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (9 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 10/12] s390x/pci: use I/O Address Translation assist when interpreting Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-15  7:47   ` Pierre Morel
  2021-12-07 21:04 ` [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups Matthew Rosato
  2021-12-15  7:35 ` [PATCH 00/12] s390x/pci: zPCI interpretation support Pierre Morel
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

When using the IOAT assist via interpretation, we should advertise what
the host driver supports, not QEMU.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-vfio.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6fc03a858a..c9269683f5 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -336,7 +336,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
         resgrp->i = cap->noi;
         resgrp->maxstbl = cap->maxstbl;
         resgrp->version = cap->version;
-        resgrp->dtsm = ZPCI_DTSM;
+        if (hdr->version >= 2 && pbdev->interp) {
+            resgrp->dtsm = cap->dtsm;
+        } else {
+            resgrp->dtsm = ZPCI_DTSM;
+        }
     }
 }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (10 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 11/12] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices Matthew Rosato
@ 2021-12-07 21:04 ` Matthew Rosato
  2021-12-16  8:15   ` Pierre Morel
  2021-12-15  7:35 ` [PATCH 00/12] s390x/pci: zPCI interpretation support Pierre Morel
  12 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-07 21:04 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have separate groups from interpreted devices as some group
values may be different. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c         | 19 ++++++++++++++--
 hw/s390x/s390-pci-vfio.c        | 40 ++++++++++++++++++++++++++++++---
 include/hw/s390x/s390-pci-bus.h |  6 ++++-
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index ab442f17fb..8b0f3ef120 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -747,13 +747,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
     object_unref(OBJECT(iommu));
 }
 
-S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id, int host_id)
 {
     S390PCIGroup *group;
     S390pciState *s = s390_get_phb();
 
     group = g_new0(S390PCIGroup, 1);
     group->id = id;
+    group->host_id = host_id;
     QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
     return group;
 }
@@ -771,12 +772,25 @@ S390PCIGroup *s390_group_find(int id)
     return NULL;
 }
 
+S390PCIGroup *s390_group_find_host_sim(int host_id)
+{
+    S390PCIGroup *group;
+    S390pciState *s = s390_get_phb();
+
+    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+        if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
+            return group;
+        }
+    }
+    return NULL;
+}
+
 static void s390_pci_init_default_group(void)
 {
     S390PCIGroup *group;
     ClpRspQueryPciGrp *resgrp;
 
-    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+    group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
     resgrp = &group->zpci_group;
     resgrp->fr = 1;
     resgrp->dasm = 0;
@@ -824,6 +838,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
                                            NULL, g_free);
     s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
     s->bus_no = 0;
+    s->next_sim_grp = ZPCI_SIM_GRP_START;
     QTAILQ_INIT(&s->pending_sei);
     QTAILQ_INIT(&s->zpci_devs);
     QTAILQ_INIT(&s->zpci_dma_limit);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index c9269683f5..bdc5892287 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -305,13 +305,17 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 {
     struct vfio_info_cap_header *hdr;
     struct vfio_device_info_cap_zpci_group *cap;
+    S390pciState *s = s390_get_phb();
     ClpRspQueryPciGrp *resgrp;
     VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
 
     hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 
-    /* If capability not provided, just use the default group */
-    if (hdr == NULL) {
+    /*
+     * If capability not provided or the underlying hostdev is simulated, just
+     * use the default group.
+     */
+    if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
         trace_s390_pci_clp_cap(vpci->vbasedev.name,
                                VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
         pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
@@ -320,11 +324,41 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
     }
     cap = (void *) hdr;
 
+    /*
+     * For an intercept device, let's use an existing simulated group if one
+     * one was already created for other intercept devices in this group.
+     * If not, create a new simulated group if any are still available.
+     * If all else fails, just fall back on the default group.
+     */
+    if (!pbdev->interp) {
+        pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
+        if (pbdev->pci_group) {
+            /* Use existing simulated group */
+            pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
+            return;
+        } else {
+            if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
+                /* All out of simulated groups, use default */
+                trace_s390_pci_clp_cap(vpci->vbasedev.name,
+                                       VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+                pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
+                pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+                return;
+            } else {
+                /* We can assign a new simulated group */
+                pbdev->zpci_fn.pfgid = s->next_sim_grp;
+                s->next_sim_grp++;
+                /* Fall through to create the new sim group using CLP info */
+            }
+        }
+    }
+
     /* See if the PCI group is already defined, create if not */
     pbdev->pci_group = s390_group_find(pbdev->zpci_fn.pfgid);
 
     if (!pbdev->pci_group) {
-        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid);
+        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid,
+                                             pbdev->zpci_fn.pfgid);
 
         resgrp = &pbdev->pci_group->zpci_group;
         if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 9941ca0084..8664023d5d 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -315,13 +315,16 @@ typedef struct ZpciFmb {
 QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
 
 #define ZPCI_DEFAULT_FN_GRP 0xFF
+#define ZPCI_SIM_GRP_START 0xF0
 typedef struct S390PCIGroup {
     ClpRspQueryPciGrp zpci_group;
     int id;
+    int host_id;
     QTAILQ_ENTRY(S390PCIGroup) link;
 } S390PCIGroup;
-S390PCIGroup *s390_group_create(int id);
+S390PCIGroup *s390_group_create(int id, int host_id);
 S390PCIGroup *s390_group_find(int id);
+S390PCIGroup *s390_group_find_host_sim(int host_id);
 
 struct S390PCIBusDevice {
     DeviceState qdev;
@@ -370,6 +373,7 @@ struct S390pciState {
     QTAILQ_HEAD(, S390PCIBusDevice) zpci_devs;
     QTAILQ_HEAD(, S390PCIDMACount) zpci_dma_limit;
     QTAILQ_HEAD(, S390PCIGroup) zpci_groups;
+    uint8_t next_sim_grp;
 };
 
 S390pciState *s390_get_phb(void);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/12] target/s390x: add zpci-interp to cpu models
  2021-12-07 21:04 ` [PATCH 06/12] target/s390x: add zpci-interp to cpu models Matthew Rosato
@ 2021-12-08 10:16   ` Christian Borntraeger
  2021-12-08 18:00     ` Matthew Rosato
  0 siblings, 1 reply; 32+ messages in thread
From: Christian Borntraeger @ 2021-12-08 10:16 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david

Am 07.12.21 um 22:04 schrieb Matthew Rosato:
> The zpci-interp feature is used to specify whether zPCI interpretation is
> to be used for this guest.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   target/s390x/cpu_features_def.h.inc | 1 +
>   target/s390x/gen-features.c         | 2 ++
>   target/s390x/kvm/kvm.c              | 1 +
>   3 files changed, 4 insertions(+)
> 
> diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
> index e86662bb3b..4ade3182aa 100644
> --- a/target/s390x/cpu_features_def.h.inc
> +++ b/target/s390x/cpu_features_def.h.inc
> @@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: Conditional-external-interception f
>   DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
>   DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
>   DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
> +DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
>   
>   /* Features exposed via the PLO instruction. */
>   DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general registers)")
> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
> index 7cb1a6ec10..7005d22415 100644
> --- a/target/s390x/gen-features.c
> +++ b/target/s390x/gen-features.c
> @@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
>       S390_FEAT_HPMA2,
>       S390_FEAT_SIE_KSS,
>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
> +    S390_FEAT_ZPCI_INTERP,
>   };
>   
>   #define full_GEN14_GA2 EmptyFeat
> @@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
>       S390_FEAT_GROUP_MSA_EXT_8,
>       S390_FEAT_MULTIPLE_EPOCH,
>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
> +    S390_FEAT_ZPCI_INTERP,
>   };
>   

For the default model you need to be careful.
Is this in any way guest visible? then you definitely need to fence this
off for older QEMU versions so that when you migrate with older QEMUs
See the s390_cpudef_featoff_greater calls in  hw/s390x/s390-virtio-ccw.c

I know its more of a theoretical aspect, since PCI currently forbids migration
but we should try to have the cpu model consistent I guess.
>   #define default_GEN14_GA2 EmptyFeat
> diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
> index 5b1fdb55c4..b13d78f988 100644
> --- a/target/s390x/kvm/kvm.c
> +++ b/target/s390x/kvm/kvm.c
> @@ -2290,6 +2290,7 @@ static int kvm_to_feat[][2] = {
>       { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
>       { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
>       { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
> +    { KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
>   };
>   
>   static int query_cpu_feat(S390FeatBitmap features)
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group
  2021-12-07 21:04 ` [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group Matthew Rosato
@ 2021-12-08 10:30   ` Thomas Huth
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Huth @ 2021-12-08 10:30 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 07/12/2021 22.04, Matthew Rosato wrote:
> The current default PCI group being used can technically collide with a
> real group ID passed from a hostdev.  Let's instead use a group ID that
> comes from a special pool (0xF0-0xFF) that is architected to be reserved
> for simulated devices.

Maybe mention that this is not a problem for migration since zPCI currently 
can't be migrated anyway (as mentioned in the discussion of an earlier 
version of this patch)

Acked-by: Thomas Huth <thuth@redhat.com>


> Fixes: 28dc86a072 ("s390x/pci: use a PCI Group structure")
> Reviewed-by: Eric Farman <farman@linux.ibm.com>
> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   include/hw/s390x/s390-pci-bus.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
> index aa891c178d..2727e7bdef 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -313,7 +313,7 @@ typedef struct ZpciFmb {
>   } ZpciFmb;
>   QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
>   
> -#define ZPCI_DEFAULT_FN_GRP 0x20
> +#define ZPCI_DEFAULT_FN_GRP 0xFF
>   typedef struct S390PCIGroup {
>       ClpRspQueryPciGrp zpci_group;
>       int id;
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/12] s390x/pci: don't use hard-coded dma range in reg_ioat
  2021-12-07 21:04 ` [PATCH 02/12] s390x/pci: don't use hard-coded dma range in reg_ioat Matthew Rosato
@ 2021-12-08 10:32   ` Thomas Huth
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Huth @ 2021-12-08 10:32 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 07/12/2021 22.04, Matthew Rosato wrote:
> Instead use the values from clp info, they will either be the hard-coded
> values or what came from the host driver via vfio.
> 
> Fixes: 9670ee752727 ("s390x/pci: use a PCI Function structure")
> Reviewed-by: Eric Farman <farman@linux.ibm.com>
> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-inst.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)


Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/12] s390x/pci: enable for load/store intepretation
  2021-12-07 21:04 ` [PATCH 07/12] s390x/pci: enable for load/store intepretation Matthew Rosato
@ 2021-12-08 10:56   ` Thomas Huth
  2021-12-15  7:44   ` Pierre Morel
  1 sibling, 0 replies; 32+ messages in thread
From: Thomas Huth @ 2021-12-08 10:56 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 07/12/2021 22.04, Matthew Rosato wrote:
> Use the associated vfio feature ioctl to enable interpretation for devices
> when requested.  As part of this process, we must use the host function
> handle rather than a QEMU-generated one -- this is provided as part of the
> ioctl payload.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c          | 69 +++++++++++++++++++++++++++++++-
>   hw/s390x/s390-pci-inst.c         | 63 ++++++++++++++++++++++++++++-
>   hw/s390x/s390-pci-vfio.c         | 55 +++++++++++++++++++++++++
>   include/hw/s390x/s390-pci-bus.h  |  1 +
>   include/hw/s390x/s390-pci-vfio.h | 15 +++++++
>   5 files changed, 201 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 01b58ebc70..451bd32d92 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -971,12 +971,57 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>       }
>   }
>   
> +static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
> +{
> +    uint32_t idx;
> +    int rc;
> +
> +    rc = s390_pci_probe_interp(pbdev);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    rc = s390_pci_update_passthrough_fh(pbdev);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    /*
> +     * The host device is in an enabled state, but the device must
> +     * begin as disabled for the guest so mask off the enable bit
> +     * from the passthrough handle.
> +     */
> +    pbdev->fh &= ~FH_MASK_ENABLE;
> +
> +    /* Next, see if the idx is already in-use */
> +    idx = pbdev->fh & FH_MASK_INDEX;
> +    if (pbdev->idx != idx) {
> +        if (s390_pci_find_dev_by_idx(s, idx)) {
> +            return -EINVAL;
> +        }
> +        /*
> +         * Update the idx entry with the passed through idx
> +         * If the relinquised idx is lower than next_idx, use it

s/relinquised/relinquished/

> +         * to replace next_idx
> +         */
> +        g_hash_table_remove(s->zpci_table, &pbdev->idx);
> +        if (idx < s->next_idx) {
> +            s->next_idx = idx;
> +        }
> +        pbdev->idx = idx;
> +        g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
> +    }
> +
> +    return 0;
> +}
[...]
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 6f80a47e29..78093aaac7 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -97,6 +97,61 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt)
>       }
>   }
>   
> +int s390_pci_probe_interp(S390PCIBusDevice *pbdev)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    struct vfio_device_feature feat = {
> +        .argsz = sizeof(struct vfio_device_feature),
> +        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_INTERP
> +    };
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
> +}
> +
> +int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;

IIRC there have been compiler versions that complain if a g_autofree 
variable is initialized at the point of declaration, so you might need to 
add the "= g_malloc0(size)" here already.

> +    struct vfio_device_zpci_interp *data;
> +    int size;
> +
> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_INTERP;
> +
> +    data = (struct vfio_device_zpci_interp *)&feat->data;
> +    if (enable) {
> +        data->flags = VFIO_DEVICE_ZPCI_FLAG_INTERP;
> +    } else {
> +        data->flags = 0;
> +    }
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +}
> +
> +int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;

dito

> +    struct vfio_device_zpci_interp *data;
> +    int size, rc;
> +
> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_GET + VFIO_DEVICE_FEATURE_ZPCI_INTERP;
> +
> +    rc = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    data = (struct vfio_device_zpci_interp *)&feat->data;
> +    pbdev->fh = data->fh;
> +    return 0;
> +}

  Thomas



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X
  2021-12-07 21:04 ` [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X Matthew Rosato
@ 2021-12-08 11:04   ` Thomas Huth
  2021-12-15  6:26   ` Pierre Morel
  1 sibling, 0 replies; 32+ messages in thread
From: Thomas Huth @ 2021-12-08 11:04 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 07/12/2021 22.04, Matthew Rosato wrote:
> Lack of MSI-X support is not an issue for interpreted passthrough
> devices, so let's let these in.  This will allow, for example, ISM
> devices to be passed through -- but only when interpretation is
> available and being used.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 451bd32d92..503326210a 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -1096,7 +1096,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>               pbdev->interp = false;
>           }
>   
> -        if (s390_pci_msix_init(pbdev)) {
> +        if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
>               error_setg(errp, "MSI-X support is mandatory "
>                          "in the S390 architecture");
>               return;
> 

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices
  2021-12-07 21:04 ` [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices Matthew Rosato
@ 2021-12-08 11:29   ` Thomas Huth
  2021-12-08 19:09     ` Matthew Rosato
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Huth @ 2021-12-08 11:29 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 07/12/2021 22.04, Matthew Rosato wrote:
> Use the associated vfio feature ioctl to enable adapter event notification
> and forwarding for devices when requested.  This feature will be set up
> with or without firmware assist based upon the 'intassist' setting.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c          | 24 +++++++--
>   hw/s390x/s390-pci-inst.c         | 54 +++++++++++++++++++-
>   hw/s390x/s390-pci-vfio.c         | 88 ++++++++++++++++++++++++++++++++
>   include/hw/s390x/s390-pci-bus.h  |  1 +
>   include/hw/s390x/s390-pci-vfio.h | 20 ++++++++
>   5 files changed, 182 insertions(+), 5 deletions(-)
[...]
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 78093aaac7..6f9271df87 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -152,6 +152,94 @@ int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
>       return 0;
>   }
>   
> +int s390_pci_probe_aif(S390PCIBusDevice *pbdev)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);

Should this use VFIO_PCI() instead of container_of ?

> +    struct vfio_device_feature feat = {
> +        .argsz = sizeof(struct vfio_device_feature),
> +        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_AIF
> +    };
> +
> +    assert(vdev);

... then you could likely also drop the assert(), I think?

> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
> +}
> +
> +int s390_pci_set_aif(S390PCIBusDevice *pbdev, ZpciFib *fib, bool enable,
> +                     bool assist)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;
> +    struct vfio_device_zpci_aif *data;
> +    int size;
> +
> +    assert(vdev);

dito

> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_AIF;
> +
> +    data = (struct vfio_device_zpci_aif *)&feat->data;
> +    if (enable) {
> +        data->flags = VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT;
> +        if (!pbdev->intassist) {
> +            data->flags |= VFIO_DEVICE_ZPCI_FLAG_AIF_HOST;
> +        }
> +        /* Fill in the guest fib info */
> +        data->ibv = fib->aibv;
> +        data->sb = fib->aisb;
> +        data->noi = FIB_DATA_NOI(fib->data);
> +        data->isc = FIB_DATA_ISC(fib->data);
> +        data->sbo = FIB_DATA_AISBO(fib->data);
> +    } else {
> +        data->flags = 0;
> +    }
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +}
> +
> +int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;
> +    struct vfio_device_zpci_aif *data;
> +    int size, rc;
> +
> +    assert(vdev);

dito

> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_GET + VFIO_DEVICE_FEATURE_ZPCI_AIF;
> +
> +    rc = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    /* Determine if current interrupt settings match the host */
> +    data = (struct vfio_device_zpci_aif *)&feat->data;
> +    if (enable && (!(data->flags & VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT))) {
> +        rc = -EINVAL;
> +    } else if (!enable && (data->flags & VFIO_DEVICE_ZPCI_FLAG_AIF_FLOAT)) {
> +        rc = -EINVAL;
> +    }
> +
> +    /*
> +     * When enabled for interrupts, the assist and forced host-delivery are
> +     * mututally exclusive
> +     */
> +    if (enable && assist && (data->flags & VFIO_DEVICE_ZPCI_FLAG_AIF_HOST)) {
> +        rc = -EINVAL;
> +    } else if (enable && (!assist) && (!(data->flags &
> +                                         VFIO_DEVICE_ZPCI_FLAG_AIF_HOST))) {
> +        rc = -EINVAL;
> +    }
> +
> +    return rc;
> +}

  Thomas



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/12] target/s390x: add zpci-interp to cpu models
  2021-12-08 10:16   ` Christian Borntraeger
@ 2021-12-08 18:00     ` Matthew Rosato
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-08 18:00 UTC (permalink / raw)
  To: Christian Borntraeger, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david

On 12/8/21 5:16 AM, Christian Borntraeger wrote:
> Am 07.12.21 um 22:04 schrieb Matthew Rosato:
>> The zpci-interp feature is used to specify whether zPCI interpretation is
>> to be used for this guest.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   target/s390x/cpu_features_def.h.inc | 1 +
>>   target/s390x/gen-features.c         | 2 ++
>>   target/s390x/kvm/kvm.c              | 1 +
>>   3 files changed, 4 insertions(+)
>>
>> diff --git a/target/s390x/cpu_features_def.h.inc 
>> b/target/s390x/cpu_features_def.h.inc
>> index e86662bb3b..4ade3182aa 100644
>> --- a/target/s390x/cpu_features_def.h.inc
>> +++ b/target/s390x/cpu_features_def.h.inc
>> @@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: 
>> Conditional-external-interception f
>>   DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
>>   DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management 
>> facility")
>>   DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
>> +DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
>>   /* Features exposed via the PLO instruction. */
>>   DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in 
>> general registers)")
>> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
>> index 7cb1a6ec10..7005d22415 100644
>> --- a/target/s390x/gen-features.c
>> +++ b/target/s390x/gen-features.c
>> @@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
>>       S390_FEAT_HPMA2,
>>       S390_FEAT_SIE_KSS,
>>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
>> +    S390_FEAT_ZPCI_INTERP,
>>   };
>>   #define full_GEN14_GA2 EmptyFeat
>> @@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
>>       S390_FEAT_GROUP_MSA_EXT_8,
>>       S390_FEAT_MULTIPLE_EPOCH,
>>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
>> +    S390_FEAT_ZPCI_INTERP,
>>   };
> 
> For the default model you need to be careful.
> Is this in any way guest visible? then you definitely need to fence this
> off for older QEMU versions so that when you migrate with older QEMUs
> See the s390_cpudef_featoff_greater calls in  hw/s390x/s390-virtio-ccw.c
> 
> I know its more of a theoretical aspect, since PCI currently forbids 
> migration
> but we should try to have the cpu model consistent I guess.

Ah, good idea.  Thanks for the pointer.

>>   #define default_GEN14_GA2 EmptyFeat
>> diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
>> index 5b1fdb55c4..b13d78f988 100644
>> --- a/target/s390x/kvm/kvm.c
>> +++ b/target/s390x/kvm/kvm.c
>> @@ -2290,6 +2290,7 @@ static int kvm_to_feat[][2] = {
>>       { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
>>       { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
>>       { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
>> +    { KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
>>   };
>>   static int query_cpu_feat(S390FeatBitmap features)
>>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices
  2021-12-08 11:29   ` Thomas Huth
@ 2021-12-08 19:09     ` Matthew Rosato
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-08 19:09 UTC (permalink / raw)
  To: Thomas Huth, qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 12/8/21 6:29 AM, Thomas Huth wrote:
> On 07/12/2021 22.04, Matthew Rosato wrote:
>> Use the associated vfio feature ioctl to enable adapter event 
>> notification
>> and forwarding for devices when requested.  This feature will be set up
>> with or without firmware assist based upon the 'intassist' setting.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/s390x/s390-pci-bus.c          | 24 +++++++--
>>   hw/s390x/s390-pci-inst.c         | 54 +++++++++++++++++++-
>>   hw/s390x/s390-pci-vfio.c         | 88 ++++++++++++++++++++++++++++++++
>>   include/hw/s390x/s390-pci-bus.h  |  1 +
>>   include/hw/s390x/s390-pci-vfio.h | 20 ++++++++
>>   5 files changed, 182 insertions(+), 5 deletions(-)
> [...]
>> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
>> index 78093aaac7..6f9271df87 100644
>> --- a/hw/s390x/s390-pci-vfio.c
>> +++ b/hw/s390x/s390-pci-vfio.c
>> @@ -152,6 +152,94 @@ int 
>> s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
>>       return 0;
>>   }
>> +int s390_pci_probe_aif(S390PCIBusDevice *pbdev)
>> +{
>> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, 
>> pdev);
> 
> Should this use VFIO_PCI() instead of container_of ?
> 

Yes, VFIO_PCI(pbdev->pdev) should work.

>> +    struct vfio_device_feature feat = {
>> +        .argsz = sizeof(struct vfio_device_feature),
>> +        .flags = VFIO_DEVICE_FEATURE_PROBE + 
>> VFIO_DEVICE_FEATURE_ZPCI_AIF
>> +    };
>> +
>> +    assert(vdev);
> 
> ... then you could likely also drop the assert(), I think?

If I've understood qom.h correctly then yes you're right, if we use the 
instance checker VFIO_PCI() we should trigger an assert through 
object_dynamic_cast_assert() already if the pdev isn't a vfio-pci device 
-- I just verified that by trying to call VFIO_PCI() with something else 
and we indeed get an assert e.g. 'VFIO_PCI: Object 0x... is not an 
instance of type vfio-pci'

So I'll change these and get rid of the extra asserts, thanks.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X
  2021-12-07 21:04 ` [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X Matthew Rosato
  2021-12-08 11:04   ` Thomas Huth
@ 2021-12-15  6:26   ` Pierre Morel
  1 sibling, 0 replies; 32+ messages in thread
From: Pierre Morel @ 2021-12-15  6:26 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/7/21 22:04, Matthew Rosato wrote:
> Lack of MSI-X support is not an issue for interpreted passthrough
> devices, so let's let these in.  This will allow, for example, ISM
> devices to be passed through -- but only when interpretation is
> available and being used.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 451bd32d92..503326210a 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -1096,7 +1096,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>               pbdev->interp = false;
>           }
>   
> -        if (s390_pci_msix_init(pbdev)) {
> +        if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
>               error_setg(errp, "MSI-X support is mandatory "
>                          "in the S390 architecture");
>               return;
> 

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/12] s390x/pci: zPCI interpretation support
  2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
                   ` (11 preceding siblings ...)
  2021-12-07 21:04 ` [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups Matthew Rosato
@ 2021-12-15  7:35 ` Pierre Morel
  2021-12-15 15:53   ` Matthew Rosato
  12 siblings, 1 reply; 32+ messages in thread
From: Pierre Morel @ 2021-12-15  7:35 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/7/21 22:04, Matthew Rosato wrote:
> Note:  The first 3 patches of this series are included as pre-reqs, but
> should be pulled via a separate series.  Also, patch 5 is needed to
> support 5.16+ linux header-sync and was already done by Paolo but not
> merged yet so is thus included here as well.
> 
> For QEMU, the majority of the work in enabling instruction interpretation
> is handled via new VFIO ioctls to SET the appropriate interpretation and
> interrupt forwarding modes, and to GET the function handle to use for
> interpretive execution.
> 
> This series implements these new ioctls, as well as adding a new, optional
> 'intercept' parameter to zpci to request interpretation support not be used
> as well as an 'intassist' parameter to determine whether or not the
> firmware assist will be used for interrupt delivery or whether the host
> will be responsible for delivering all interrupts.

In which circumstances do we have an added value by not using interrupt 
delivered by firmware?


> 
> The ZPCI_INTERP CPU feature is added beginning with the z14 model to
> enable this support.
> 
> As a consequence of implementing zPCI interpretation, ISM devices now
> become eligible for passthrough (but only when zPCI interpretation is
> available).
> 
>  From the perspective of guest configuration, you passthrough zPCI devices
> in the same manner as before, with intepretation support being used by
> default if available in kernel+qemu.
> 
> Associated kernel series:
> https://lkml.org/lkml/2021/12/7/1179
> 
> Matthew Rosato (11):
>    s390x/pci: use a reserved ID for the default PCI group
>    s390x/pci: don't use hard-coded dma range in reg_ioat
>    s390x/pci: add supported DT information to clp response
>    Update linux headers
>    target/s390x: add zpci-interp to cpu models
>    s390x/pci: enable for load/store intepretation
>    s390x/pci: don't fence interpreted devices without MSI-X
>    s390x/pci: enable adapter event notification for interpreted devices
>    s390x/pci: use I/O Address Translation assist when interpreting
>    s390x/pci: use dtsm provided from vfio capabilities for interpreted
>      devices
>    s390x/pci: let intercept devices have separate PCI groups
> 
> Paolo Bonzini (1):
>    virtio-gpu: do not byteswap padding
> 
>   hw/s390x/s390-pci-bus.c                       | 121 +++++++++-
>   hw/s390x/s390-pci-inst.c                      | 178 +++++++++++++-
>   hw/s390x/s390-pci-vfio.c                      | 221 +++++++++++++++++-
>   include/hw/s390x/s390-pci-bus.h               |  11 +-
>   include/hw/s390x/s390-pci-clp.h               |   3 +-
>   include/hw/s390x/s390-pci-inst.h              |   2 +-
>   include/hw/s390x/s390-pci-vfio.h              |  45 ++++
>   include/hw/virtio/virtio-gpu-bswap.h          |   1 -
>   include/standard-headers/asm-x86/kvm_para.h   |   1 +
>   include/standard-headers/drm/drm_fourcc.h     | 121 +++++++++-
>   include/standard-headers/linux/ethtool.h      |  31 +++
>   include/standard-headers/linux/fuse.h         |  15 +-
>   include/standard-headers/linux/pci_regs.h     |   6 +
>   include/standard-headers/linux/virtio_gpu.h   |  18 +-
>   include/standard-headers/linux/virtio_ids.h   |  24 ++
>   include/standard-headers/linux/virtio_mem.h   |   9 +-
>   include/standard-headers/linux/virtio_vsock.h |   3 +-
>   linux-headers/asm-arm64/unistd.h              |   1 +
>   linux-headers/asm-generic/unistd.h            |  22 +-
>   linux-headers/asm-mips/unistd_n32.h           |   2 +
>   linux-headers/asm-mips/unistd_n64.h           |   2 +
>   linux-headers/asm-mips/unistd_o32.h           |   2 +
>   linux-headers/asm-powerpc/unistd_32.h         |   2 +
>   linux-headers/asm-powerpc/unistd_64.h         |   2 +
>   linux-headers/asm-s390/kvm.h                  |   1 +
>   linux-headers/asm-s390/unistd_32.h            |   2 +
>   linux-headers/asm-s390/unistd_64.h            |   2 +
>   linux-headers/asm-x86/kvm.h                   |   5 +
>   linux-headers/asm-x86/unistd_32.h             |   3 +
>   linux-headers/asm-x86/unistd_64.h             |   3 +
>   linux-headers/asm-x86/unistd_x32.h            |   3 +
>   linux-headers/linux/kvm.h                     |  41 +++-
>   linux-headers/linux/vfio.h                    |  22 ++
>   linux-headers/linux/vfio_zdev.h               |  51 ++++
>   target/s390x/cpu_features_def.h.inc           |   1 +
>   target/s390x/gen-features.c                   |   2 +
>   target/s390x/kvm/kvm.c                        |   1 +
>   37 files changed, 928 insertions(+), 52 deletions(-)
> 

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/12] s390x/pci: enable for load/store intepretation
  2021-12-07 21:04 ` [PATCH 07/12] s390x/pci: enable for load/store intepretation Matthew Rosato
  2021-12-08 10:56   ` Thomas Huth
@ 2021-12-15  7:44   ` Pierre Morel
  2021-12-15 16:32     ` Matthew Rosato
  1 sibling, 1 reply; 32+ messages in thread
From: Pierre Morel @ 2021-12-15  7:44 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/7/21 22:04, Matthew Rosato wrote:
> Use the associated vfio feature ioctl to enable interpretation for devices
> when requested.  As part of this process, we must use the host function
> handle rather than a QEMU-generated one -- this is provided as part of the
> ioctl payload.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c          | 69 +++++++++++++++++++++++++++++++-
>   hw/s390x/s390-pci-inst.c         | 63 ++++++++++++++++++++++++++++-
>   hw/s390x/s390-pci-vfio.c         | 55 +++++++++++++++++++++++++
>   include/hw/s390x/s390-pci-bus.h  |  1 +
>   include/hw/s390x/s390-pci-vfio.h | 15 +++++++
>   5 files changed, 201 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 01b58ebc70..451bd32d92 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -971,12 +971,57 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>       }
>   }
>   
> +static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
> +{
> +    uint32_t idx;
> +    int rc;
> +
> +    rc = s390_pci_probe_interp(pbdev);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    rc = s390_pci_update_passthrough_fh(pbdev);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    /*
> +     * The host device is in an enabled state, but the device must
> +     * begin as disabled for the guest so mask off the enable bit
> +     * from the passthrough handle.
> +     */

I think you should explain why the device must be seen disabled for the 
guest.
AFAIU it is because we need the guest to issue a CLP_ENABLE for us to 
activate interpretation.
This is due to the choice of activate/deactivate interpretation during 
device enable/disable.

Just curious: why not activate/deactivate interpretation on plug/unplug?

> +    pbdev->fh &= ~FH_MASK_ENABLE;
> +
> +    /* Next, see if the idx is already in-use */
> +    idx = pbdev->fh & FH_MASK_INDEX;
> +    if (pbdev->idx != idx) {
> +        if (s390_pci_find_dev_by_idx(s, idx)) {
> +            return -EINVAL;
> +        }
> +        /*
> +         * Update the idx entry with the passed through idx
> +         * If the relinquised idx is lower than next_idx, use it
> +         * to replace next_idx
> +         */
> +        g_hash_table_remove(s->zpci_table, &pbdev->idx);
> +        if (idx < s->next_idx) {
> +            s->next_idx = idx;
> +        }
> +        pbdev->idx = idx;
> +        g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
> +    }
> +
> +    return 0;
> +}
> +
>   static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                 Error **errp)
>   {
>       S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
>       PCIDevice *pdev = NULL;
>       S390PCIBusDevice *pbdev = NULL;
> +    int rc;
>   
>       if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>           PCIBridge *pb = PCI_BRIDGE(dev);
> @@ -1022,12 +1067,33 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>           set_pbdev_info(pbdev);
>   
>           if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
> -            pbdev->fh |= FH_SHM_VFIO;
> +            /*
> +             * By default, interpretation is always requested; if the available
> +             * facilities indicate it is not available, fallback to the
> +             * intercept model.
> +             */
> +            if (pbdev->interp && !s390_has_feat(S390_FEAT_ZPCI_INTERP)) {
> +                    DPRINTF("zPCI interpretation facilities missing.\n");
> +                    pbdev->interp = false;
> +                }
> +            if (pbdev->interp) {
> +                rc = s390_pci_interp_plug(s, pbdev);
> +                if (rc) {
> +                    error_setg(errp, "zpci interp plug failed: %d", rc);
> +                    return;
> +                }
> +            }
>               pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
>               /* Fill in CLP information passed via the vfio region */
>               s390_pci_get_clp_info(pbdev);
> +            if (!pbdev->interp) {
> +                /* Do vfio passthrough but intercept for I/O */
> +                pbdev->fh |= FH_SHM_VFIO;
> +            }
>           } else {
>               pbdev->fh |= FH_SHM_EMUL;
> +            /* Always intercept emulated devices */
> +            pbdev->interp = false;
>           }
>   
>           if (s390_pci_msix_init(pbdev)) {
> @@ -1360,6 +1426,7 @@ static Property s390_pci_device_properties[] = {
>       DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
>       DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
>       DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
> +    DEFINE_PROP_BOOL("interp", S390PCIBusDevice, interp, true),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index 0cef7fbace..ba4017474e 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -18,6 +18,7 @@
>   #include "sysemu/hw_accel.h"
>   #include "hw/s390x/s390-pci-inst.h"
>   #include "hw/s390x/s390-pci-bus.h"
> +#include "hw/s390x/s390-pci-vfio.h"
>   #include "hw/s390x/tod.h"
>   
>   #ifndef DEBUG_S390PCI_INST
> @@ -156,6 +157,47 @@ out:
>       return rc;
>   }
>   
> +static int clp_enable_interp(S390PCIBusDevice *pbdev)
> +{
> +    int rc;
> +
> +    rc = s390_pci_set_interp(pbdev, true);
> +    if (rc) {
> +        DPRINTF("Failed to enable interpretation\n");
> +        return rc;
> +    }
> +    rc = s390_pci_update_passthrough_fh(pbdev);
> +    if (rc) {
> +        DPRINTF("Failed to update passthrough fh\n");
> +        return rc;
> +    }
> +    if (!(pbdev->fh & FH_MASK_ENABLE)) {
> +        DPRINTF("Passthrough handle is not enabled\n");
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int clp_disable_interp(S390PCIBusDevice *pbdev)
> +{
> +    int rc;
> +
> +    rc = s390_pci_set_interp(pbdev, false);
> +    if (rc) {
> +        DPRINTF("Failed to disable interpretation\n");
> +        return rc;
> +    }
> +
> +    rc = s390_pci_update_passthrough_fh(pbdev);
> +    if (rc) {
> +        DPRINTF("Failed to update passthrough fh\n");
> +        return rc;
> +    }
> +
> +    return 0;
> +}
> +
>   int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
>   {
>       ClpReqHdr *reqh;
> @@ -246,7 +288,19 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
>                   goto out;
>               }
>   
> -            pbdev->fh |= FH_MASK_ENABLE;
> +            /*
> +             * If interpretation is specified, attempt to enable this now and
> +             * update with the host fh
> +             */
> +            if (pbdev->interp) {
> +                if (clp_enable_interp(pbdev)) {
> +                    stw_p(&ressetpci->hdr.rsp, CLP_RC_SETPCIFN_ERR);
> +                    goto out;
> +                }
> +            } else {
> +                pbdev->fh |= FH_MASK_ENABLE;
> +            }
> +
>               pbdev->state = ZPCI_FS_ENABLED;
>               stl_p(&ressetpci->fh, pbdev->fh);
>               stw_p(&ressetpci->hdr.rsp, CLP_RC_OK);
> @@ -257,6 +311,13 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
>                   goto out;
>               }
>               device_legacy_reset(DEVICE(pbdev));
> +            if (pbdev->interp) {
> +                if (clp_disable_interp(pbdev)) {
> +                    stw_p(&ressetpci->hdr.rsp, CLP_RC_SETPCIFN_ERR);
> +                    goto out;
> +                }
> +            }
> +            /* Mask off the enabled bit for interpreted devices too */
>               pbdev->fh &= ~FH_MASK_ENABLE;
>               pbdev->state = ZPCI_FS_DISABLED;
>               stl_p(&ressetpci->fh, pbdev->fh);
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 6f80a47e29..78093aaac7 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -97,6 +97,61 @@ void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt)
>       }
>   }
>   
> +int s390_pci_probe_interp(S390PCIBusDevice *pbdev)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    struct vfio_device_feature feat = {
> +        .argsz = sizeof(struct vfio_device_feature),
> +        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_INTERP
> +    };
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
> +}
> +
> +int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;
> +    struct vfio_device_zpci_interp *data;
> +    int size;
> +
> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_INTERP;

I personaly do not like the use of "+" instead of "|" to act on bitmap 
or flags. But... yes, my opinion.

> +
> +    data = (struct vfio_device_zpci_interp *)&feat->data;
> +    if (enable) {
> +        data->flags = VFIO_DEVICE_ZPCI_FLAG_INTERP;
> +    } else {
> +        data->flags = 0;
> +    }
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +}
> +
> +int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;
> +    struct vfio_device_zpci_interp *data;
> +    int size, rc;
> +
> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_GET + VFIO_DEVICE_FEATURE_ZPCI_INTERP;
> +
> +    rc = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    data = (struct vfio_device_zpci_interp *)&feat->data;
> +    pbdev->fh = data->fh;
> +    return 0;
> +}
> +
>   static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>                                  struct vfio_device_info *info)
>   {
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
> index da3cde2bb4..a9843dfe97 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -350,6 +350,7 @@ struct S390PCIBusDevice {
>       IndAddr *indicator;
>       bool pci_unplug_request_processed;
>       bool unplug_requested;
> +    bool interp;
>       QTAILQ_ENTRY(S390PCIBusDevice) link;
>   };
>   
> diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
> index ff708aef50..42533e38f7 100644
> --- a/include/hw/s390x/s390-pci-vfio.h
> +++ b/include/hw/s390x/s390-pci-vfio.h
> @@ -20,6 +20,9 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
>   S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
>                                             S390PCIBusDevice *pbdev);
>   void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
> +int s390_pci_probe_interp(S390PCIBusDevice *pbdev);
> +int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable);
> +int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev);
>   void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
>   #else
>   static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
> @@ -33,6 +36,18 @@ static inline S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
>   }
>   static inline void s390_pci_end_dma_count(S390pciState *s,
>                                             S390PCIDMACount *cnt) { }
> +int s390_pci_probe_interp(S390PCIBusDevice *pbdev)
> +{
> +    return -EINVAL;
> +}
> +static inline int s390_pci_set_interp(S390PCIBusDevice *pbdev, bool enable)
> +{
> +    return -EINVAL;
> +}
> +static inline int s390_pci_update_passthrough_fh(S390PCIBusDevice *pbdev)
> +{
> +    return -EINVAL;
> +}
>   static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
>   #endif
>   
> 

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/12] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices
  2021-12-07 21:04 ` [PATCH 11/12] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices Matthew Rosato
@ 2021-12-15  7:47   ` Pierre Morel
  0 siblings, 0 replies; 32+ messages in thread
From: Pierre Morel @ 2021-12-15  7:47 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/7/21 22:04, Matthew Rosato wrote:
> When using the IOAT assist via interpretation, we should advertise what
> the host driver supports, not QEMU.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-vfio.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 6fc03a858a..c9269683f5 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -336,7 +336,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>           resgrp->i = cap->noi;
>           resgrp->maxstbl = cap->maxstbl;
>           resgrp->version = cap->version;
> -        resgrp->dtsm = ZPCI_DTSM;
> +        if (hdr->version >= 2 && pbdev->interp) {
> +            resgrp->dtsm = cap->dtsm;
> +        } else {
> +            resgrp->dtsm = ZPCI_DTSM;
> +        }
>       }
>   }
>   
> 
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/12] s390x/pci: zPCI interpretation support
  2021-12-15  7:35 ` [PATCH 00/12] s390x/pci: zPCI interpretation support Pierre Morel
@ 2021-12-15 15:53   ` Matthew Rosato
  2021-12-17  9:17     ` Christian Borntraeger
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-15 15:53 UTC (permalink / raw)
  To: Pierre Morel, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 12/15/21 2:35 AM, Pierre Morel wrote:
> 
> 
> On 12/7/21 22:04, Matthew Rosato wrote:
>> Note:  The first 3 patches of this series are included as pre-reqs, but
>> should be pulled via a separate series.  Also, patch 5 is needed to
>> support 5.16+ linux header-sync and was already done by Paolo but not
>> merged yet so is thus included here as well.
>>
>> For QEMU, the majority of the work in enabling instruction interpretation
>> is handled via new VFIO ioctls to SET the appropriate interpretation and
>> interrupt forwarding modes, and to GET the function handle to use for
>> interpretive execution.
>>
>> This series implements these new ioctls, as well as adding a new, 
>> optional
>> 'intercept' parameter to zpci to request interpretation support not be 
>> used
>> as well as an 'intassist' parameter to determine whether or not the
>> firmware assist will be used for interrupt delivery or whether the host
>> will be responsible for delivering all interrupts.
> 
> In which circumstances do we have an added value by not using interrupt 
> delivered by firmware?
> 

Disabling it can be a tool to debug and assist in problem determination, 
but that's about the only scenario I can think of where you would 
intentionally want to disable intassist.  Perhaps then it's not worth 
leaving in place.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/12] s390x/pci: enable for load/store intepretation
  2021-12-15  7:44   ` Pierre Morel
@ 2021-12-15 16:32     ` Matthew Rosato
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Rosato @ 2021-12-15 16:32 UTC (permalink / raw)
  To: Pierre Morel, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 12/15/21 2:44 AM, Pierre Morel wrote:
> 
> 
> On 12/7/21 22:04, Matthew Rosato wrote:
>> Use the associated vfio feature ioctl to enable interpretation for 
>> devices
>> when requested.  As part of this process, we must use the host function
>> handle rather than a QEMU-generated one -- this is provided as part of 
>> the
>> ioctl payload.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/s390x/s390-pci-bus.c          | 69 +++++++++++++++++++++++++++++++-
>>   hw/s390x/s390-pci-inst.c         | 63 ++++++++++++++++++++++++++++-
>>   hw/s390x/s390-pci-vfio.c         | 55 +++++++++++++++++++++++++
>>   include/hw/s390x/s390-pci-bus.h  |  1 +
>>   include/hw/s390x/s390-pci-vfio.h | 15 +++++++
>>   5 files changed, 201 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index 01b58ebc70..451bd32d92 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -971,12 +971,57 @@ static void 
>> s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>>       }
>>   }
>> +static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice 
>> *pbdev)
>> +{
>> +    uint32_t idx;
>> +    int rc;
>> +
>> +    rc = s390_pci_probe_interp(pbdev);
>> +    if (rc) {
>> +        return rc;
>> +    }
>> +
>> +    rc = s390_pci_update_passthrough_fh(pbdev);
>> +    if (rc) {
>> +        return rc;
>> +    }
>> +
>> +    /*
>> +     * The host device is in an enabled state, but the device must
>> +     * begin as disabled for the guest so mask off the enable bit
>> +     * from the passthrough handle.
>> +     */
> 
> I think you should explain why the device must be seen disabled for the 
> guest.
> AFAIU it is because we need the guest to issue a CLP_ENABLE for us to 
> activate interpretation.
> This is due to the choice of activate/deactivate interpretation during 
> device enable/disable.
> 

Not completely.  While it is true that we need the guest to issue the 
CLP_ENABLE to activate interpretation with the way this is currently 
implemented, existing code also starts all plugged devices with a QEMU 
internal state of

pbdev->state = ZPCI_FS_DISABLED;

and a disabled pbdev->fh, thus expecting a subsequent CLP ENABLE from 
the guest when/if it decides to use the device.

If we were to set the handle to an enabled state here, we must also 
udpate the pbdev state, introducing a difference in how we present 
intercept/emulated devices vs interpreted devices during plug.  I don't 
think we want to do that -- but I can improve this comment here to try 
and better explain that all devices begin plugged as disabled to the guest


> Just curious: why not activate/deactivate interpretation on plug/unplug?
> 

I think it would be possible to do so (while still showing a disabled 
handle initially), but my thinking is that tying these actions to guest 
CLP disable/enable will also be useful later when looking at trying to 
better-handle error events from the guest e.g. hot reset.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/12] s390x/pci: use I/O Address Translation assist when interpreting
  2021-12-07 21:04 ` [PATCH 10/12] s390x/pci: use I/O Address Translation assist when interpreting Matthew Rosato
@ 2021-12-16  8:03   ` Pierre Morel
  0 siblings, 0 replies; 32+ messages in thread
From: Pierre Morel @ 2021-12-16  8:03 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/7/21 22:04, Matthew Rosato wrote:
> Allow the underlying kvm host to handle the Refresh PCI Translation
> instruction intercepts.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c          |  6 ++--
>   hw/s390x/s390-pci-inst.c         | 51 ++++++++++++++++++++++++++++++--
>   hw/s390x/s390-pci-vfio.c         | 33 +++++++++++++++++++++
>   include/hw/s390x/s390-pci-inst.h |  2 +-
>   include/hw/s390x/s390-pci-vfio.h | 10 +++++++
>   5 files changed, 95 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 1ae8792a26..ab442f17fb 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -196,7 +196,7 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>               pci_dereg_irqs(pbdev);
>           }
>           if (pbdev->iommu->enabled) {
> -            pci_dereg_ioat(pbdev->iommu);
> +            pci_dereg_ioat(pbdev);
>           }
>           pbdev->state = ZPCI_FS_STANDBY;
>           rc = SCLP_RC_NORMAL_COMPLETION;
> @@ -1261,7 +1261,7 @@ static void s390_pcihost_reset(DeviceState *dev)
>                   pci_dereg_irqs(pbdev);
>               }
>               if (pbdev->iommu->enabled) {
> -                pci_dereg_ioat(pbdev->iommu);
> +                pci_dereg_ioat(pbdev);
>               }
>               pbdev->state = ZPCI_FS_STANDBY;
>               s390_pci_perform_unplug(pbdev);
> @@ -1402,7 +1402,7 @@ static void s390_pci_device_reset(DeviceState *dev)
>           pci_dereg_irqs(pbdev);
>       }
>       if (pbdev->iommu->enabled) {
> -        pci_dereg_ioat(pbdev->iommu);
> +        pci_dereg_ioat(pbdev);
>       }
>   
>       fmb_timer_free(pbdev);
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index 02093e82f9..598e5a5d52 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -978,6 +978,24 @@ int pci_dereg_irqs(S390PCIBusDevice *pbdev)
>       return 0;
>   }
>   
> +static int reg_ioat_interp(S390PCIBusDevice *pbdev, uint64_t iota)
> +{
> +    int rc;
> +
> +    rc = s390_pci_probe_ioat(pbdev);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    rc = s390_pci_set_ioat(pbdev, iota);
> +    if (rc) {
> +        return rc;
> +    }
> +
> +    pbdev->iommu->enabled = true;
> +    return 0;
> +}
> +
>   static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
>                       uintptr_t ra)
>   {
> @@ -995,6 +1013,16 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
>           return -EINVAL;
>       }
>   
> +    /* If this is an interpreted device, we must use the IOAT assist */
> +    if (pbdev->interp) {
> +        if (reg_ioat_interp(pbdev, g_iota)) {
> +            error_report("failure starting ioat assist");
> +            s390_program_interrupt(env, PGM_OPERAND, ra);
> +            return -EINVAL;
> +        }
> +        return 0;
> +    }
> +
>       /* currently we only support designation type 1 with translation */
>       if (!(dt == ZPCI_IOTA_RTTO && t)) {
>           error_report("unsupported ioat dt %d t %d", dt, t);
> @@ -1011,8 +1039,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
>       return 0;
>   }
>   
> -void pci_dereg_ioat(S390PCIIOMMU *iommu)
> +static void dereg_ioat_interp(S390PCIBusDevice *pbdev)
>   {
> +    if (s390_pci_probe_ioat(pbdev) != 0) {
> +        return;
> +    }
> +
> +    s390_pci_set_ioat(pbdev, 0);
> +    pbdev->iommu->enabled = false;
> +}
> +
> +void pci_dereg_ioat(S390PCIBusDevice *pbdev)
> +{
> +    S390PCIIOMMU *iommu = pbdev->iommu;
> +
> +    if (pbdev->interp) {
> +        dereg_ioat_interp(pbdev);
> +        return;
> +    }
> +
>       s390_pci_iommu_disable(iommu);
>       iommu->pba = 0;
>       iommu->pal = 0;
> @@ -1251,7 +1296,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
>               cc = ZPCI_PCI_LS_ERR;
>               s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
>           } else {
> -            pci_dereg_ioat(pbdev->iommu);
> +            pci_dereg_ioat(pbdev);
>           }
>           break;
>       case ZPCI_MOD_FC_REREG_IOAT:
> @@ -1262,7 +1307,7 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
>               cc = ZPCI_PCI_LS_ERR;
>               s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
>           } else {
> -            pci_dereg_ioat(pbdev->iommu);
> +            pci_dereg_ioat(pbdev);
>               if (reg_ioat(env, pbdev, fib, ra)) {
>                   cc = ZPCI_PCI_LS_ERR;
>                   s390_set_status_code(env, r1, ZPCI_MOD_ST_INSUF_RES);
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 6f9271df87..6fc03a858a 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -240,6 +240,39 @@ int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist)
>       return rc;
>   }
>   
> +int s390_pci_probe_ioat(S390PCIBusDevice *pbdev)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    struct vfio_device_feature feat = {
> +        .argsz = sizeof(struct vfio_device_feature),
> +        .flags = VFIO_DEVICE_FEATURE_PROBE + VFIO_DEVICE_FEATURE_ZPCI_IOAT
> +    };
> +
> +    assert(vdev);
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, &feat);
> +}
> +
> +int s390_pci_set_ioat(S390PCIBusDevice *pbdev, uint64_t iota)
> +{
> +    VFIOPCIDevice *vdev = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +    g_autofree struct vfio_device_feature *feat;
> +    struct vfio_device_zpci_ioat *data;
> +    int size;
> +
> +    assert(vdev);

Same comment as Thomas in the previous patch.

> +
> +    size = sizeof(*feat) + sizeof(*data);
> +    feat = g_malloc0(size);
> +    feat->argsz = size;
> +    feat->flags = VFIO_DEVICE_FEATURE_SET + VFIO_DEVICE_FEATURE_ZPCI_IOAT;
> +
> +    data = (struct vfio_device_zpci_ioat *)&feat->data;
> +    data->iota = iota;
> +
> +    return ioctl(vdev->vbasedev.fd, VFIO_DEVICE_FEATURE, feat);
> +}
> +
>   static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>                                  struct vfio_device_info *info)
>   {
> diff --git a/include/hw/s390x/s390-pci-inst.h b/include/hw/s390x/s390-pci-inst.h
> index a55c448aad..13566fb7b4 100644
> --- a/include/hw/s390x/s390-pci-inst.h
> +++ b/include/hw/s390x/s390-pci-inst.h
> @@ -99,7 +99,7 @@ typedef struct ZpciFib {
>   } QEMU_PACKED ZpciFib;
>   
>   int pci_dereg_irqs(S390PCIBusDevice *pbdev);
> -void pci_dereg_ioat(S390PCIIOMMU *iommu);
> +void pci_dereg_ioat(S390PCIBusDevice *pbdev);
>   int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra);
>   int pcilg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra);
>   int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra);
> diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
> index 6cec38a863..e7a2d8ff77 100644
> --- a/include/hw/s390x/s390-pci-vfio.h
> +++ b/include/hw/s390x/s390-pci-vfio.h
> @@ -28,6 +28,8 @@ int s390_pci_probe_aif(S390PCIBusDevice *pbdev);
>   int s390_pci_set_aif(S390PCIBusDevice *pbdev, ZpciFib *fib, bool enable,
>                        bool assist);
>   int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable, bool assist);
> +int s390_pci_probe_ioat(S390PCIBusDevice *pbdev);
> +int s390_pci_set_ioat(S390PCIBusDevice *pbdev, uint64_t iota);
>   
>   void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
>   #else
> @@ -68,6 +70,14 @@ static inline int s390_pci_get_aif(S390PCIBusDevice *pbdev, bool enable,
>   {
>       return -EINVAL;
>   }
> +static inline int s390_pci_probe_ioat(S390PCIBusDevice *pbdev)
> +{
> +    return -EINVAL;
> +}
> +static inline int s390_pci_set_ioat(S390PCIBusDevice *pbdev, uint64_t iota)
> +{
> +    return -EINVAL;
> +}
>   static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
>   #endif
>   
> 

LGTM
With the change

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>


-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups
  2021-12-07 21:04 ` [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups Matthew Rosato
@ 2021-12-16  8:15   ` Pierre Morel
  2021-12-16 15:16     ` Matthew Rosato
  0 siblings, 1 reply; 32+ messages in thread
From: Pierre Morel @ 2021-12-16  8:15 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/7/21 22:04, Matthew Rosato wrote:
> Let's use the reserved pool of simulated PCI groups to allow intercept
> devices to have separate groups from interpreted devices as some group
> values may be different. If we run out of simulated PCI groups, subsequent
> intercept devices just get the default group.
> Furthermore, if we encounter any PCI groups from hostdevs that are marked
> as simulated, let's just assign them to the default group to avoid
> conflicts between host simulated groups and our own simulated groups.

I have a problem here.
We will have the same hardware viewed by 2 different VFIO implementation 
(interpretation vs interception) reporting different groups ID.

The alternative is to have them reporting same group ID with different 
values.

I fear both are wrong.

On the other hand, should we have a difference in the QEMU command line 
between intercepted and interpreted devices for default values.
If not why not give up the host values so that in an hypothetical future 
migration we are clean with the GID ?

I am not sure of this, just want to open a little discussion on this.

For example what could go wrong to keep the host values returned by the CAP?


> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c         | 19 ++++++++++++++--
>   hw/s390x/s390-pci-vfio.c        | 40 ++++++++++++++++++++++++++++++---
>   include/hw/s390x/s390-pci-bus.h |  6 ++++-
>   3 files changed, 59 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index ab442f17fb..8b0f3ef120 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -747,13 +747,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
>       object_unref(OBJECT(iommu));
>   }
>   
> -S390PCIGroup *s390_group_create(int id)
> +S390PCIGroup *s390_group_create(int id, int host_id)
>   {
>       S390PCIGroup *group;
>       S390pciState *s = s390_get_phb();
>   
>       group = g_new0(S390PCIGroup, 1);
>       group->id = id;
> +    group->host_id = host_id;
>       QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
>       return group;
>   }
> @@ -771,12 +772,25 @@ S390PCIGroup *s390_group_find(int id)
>       return NULL;
>   }
>   
> +S390PCIGroup *s390_group_find_host_sim(int host_id)
> +{
> +    S390PCIGroup *group;
> +    S390pciState *s = s390_get_phb();
> +
> +    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
> +        if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
> +            return group;
> +        }
> +    }
> +    return NULL;
> +}
> +
>   static void s390_pci_init_default_group(void)
>   {
>       S390PCIGroup *group;
>       ClpRspQueryPciGrp *resgrp;
>   
> -    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
> +    group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
>       resgrp = &group->zpci_group;
>       resgrp->fr = 1;
>       resgrp->dasm = 0;
> @@ -824,6 +838,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
>                                              NULL, g_free);
>       s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
>       s->bus_no = 0;
> +    s->next_sim_grp = ZPCI_SIM_GRP_START;
>       QTAILQ_INIT(&s->pending_sei);
>       QTAILQ_INIT(&s->zpci_devs);
>       QTAILQ_INIT(&s->zpci_dma_limit);
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index c9269683f5..bdc5892287 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -305,13 +305,17 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>   {
>       struct vfio_info_cap_header *hdr;
>       struct vfio_device_info_cap_zpci_group *cap;
> +    S390pciState *s = s390_get_phb();
>       ClpRspQueryPciGrp *resgrp;
>       VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
>   
>       hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>   
> -    /* If capability not provided, just use the default group */
> -    if (hdr == NULL) {
> +    /*
> +     * If capability not provided or the underlying hostdev is simulated, just
> +     * use the default group.
> +     */
> +    if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
>           trace_s390_pci_clp_cap(vpci->vbasedev.name,
>                                  VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>           pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
> @@ -320,11 +324,41 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>       }
>       cap = (void *) hdr;
>   
> +    /*
> +     * For an intercept device, let's use an existing simulated group if one
> +     * one was already created for other intercept devices in this group.
> +     * If not, create a new simulated group if any are still available.
> +     * If all else fails, just fall back on the default group.
> +     */
> +    if (!pbdev->interp) {
> +        pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
> +        if (pbdev->pci_group) {
> +            /* Use existing simulated group */
> +            pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
> +            return;
> +        } else {
> +            if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
> +                /* All out of simulated groups, use default */
> +                trace_s390_pci_clp_cap(vpci->vbasedev.name,
> +                                       VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
> +                pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
> +                pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
> +                return;
> +            } else {
> +                /* We can assign a new simulated group */
> +                pbdev->zpci_fn.pfgid = s->next_sim_grp;
> +                s->next_sim_grp++;
> +                /* Fall through to create the new sim group using CLP info */
> +            }
> +        }
> +    }
> +
>       /* See if the PCI group is already defined, create if not */
>       pbdev->pci_group = s390_group_find(pbdev->zpci_fn.pfgid);
>   
>       if (!pbdev->pci_group) {
> -        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid);
> +        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid,
> +                                             pbdev->zpci_fn.pfgid);
>   
>           resgrp = &pbdev->pci_group->zpci_group;
>           if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
> index 9941ca0084..8664023d5d 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -315,13 +315,16 @@ typedef struct ZpciFmb {
>   QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
>   
>   #define ZPCI_DEFAULT_FN_GRP 0xFF
> +#define ZPCI_SIM_GRP_START 0xF0
>   typedef struct S390PCIGroup {
>       ClpRspQueryPciGrp zpci_group;
>       int id;
> +    int host_id;
>       QTAILQ_ENTRY(S390PCIGroup) link;
>   } S390PCIGroup;
> -S390PCIGroup *s390_group_create(int id);
> +S390PCIGroup *s390_group_create(int id, int host_id);
>   S390PCIGroup *s390_group_find(int id);
> +S390PCIGroup *s390_group_find_host_sim(int host_id);
>   
>   struct S390PCIBusDevice {
>       DeviceState qdev;
> @@ -370,6 +373,7 @@ struct S390pciState {
>       QTAILQ_HEAD(, S390PCIBusDevice) zpci_devs;
>       QTAILQ_HEAD(, S390PCIDMACount) zpci_dma_limit;
>       QTAILQ_HEAD(, S390PCIGroup) zpci_groups;
> +    uint8_t next_sim_grp;
>   };
>   
>   S390pciState *s390_get_phb(void);
> 

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups
  2021-12-16  8:15   ` Pierre Morel
@ 2021-12-16 15:16     ` Matthew Rosato
  2021-12-17  9:56       ` Pierre Morel
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Rosato @ 2021-12-16 15:16 UTC (permalink / raw)
  To: Pierre Morel, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

On 12/16/21 3:15 AM, Pierre Morel wrote:
> 
> 
> On 12/7/21 22:04, Matthew Rosato wrote:
>> Let's use the reserved pool of simulated PCI groups to allow intercept
>> devices to have separate groups from interpreted devices as some group
>> values may be different. If we run out of simulated PCI groups, 
>> subsequent
>> intercept devices just get the default group.
>> Furthermore, if we encounter any PCI groups from hostdevs that are marked
>> as simulated, let's just assign them to the default group to avoid
>> conflicts between host simulated groups and our own simulated groups.
> 
> I have a problem here.
> We will have the same hardware viewed by 2 different VFIO implementation 
> (interpretation vs interception) reporting different groups ID.

Yes -- To be clear, this patch proposes that the interpreted device will 
continue to report the passthrough group ID and the intercept device 
will use a simulated group ID.

> 
> The alternative is to have them reporting same group ID with different 
> values.
>

I don't think we can do this.  For starters, we would have to throw out 
the group tracking we do in QEMU; but for all we know the guest could be 
doing similar tracking -- the implication of the group ID is that 
everyone shares the same values so I don't think we can get away with 
reporting different values for 2 members of the same group.

I think the other alternative is rather to always do something like...

1) host reports its value via vfio capabilities as 'this is what an 
interpreted device can use'
2) QEMU must accept those values as-is OR reduce them to some subset of 
what both interpretation and intercept can support, and report only 
those values for all devices in the group.  (More on this further down)


> I fear both are wrong.
> 
> On the other hand, should we have a difference in the QEMU command line 
> between intercepted and interpreted devices for default values.

I'm not sure I follow what you suggest here.  Even if we somehow 
provided a command-line means for specifying some of these values, they 
would still be presented to the guest via clp and if the guest has 2 
devices in the same group the clp results had better be the same.

> If not why not give up the host values so that in an hypothetical future 
> migration we are clean with the GID ?
> 

Well, the interpreted device will use the passthrough group ID so in a 
hypothetical future migration scenario we should be good there.

And simulated devices will still use the default group, so we should 
also be OK there.

This really changes the behavior for 2 other classes of device:

1) Intercept passthrough devices -- Yes, I agree that doing this is a 
bit weird.  But my thinking was that these devices should be the 
exception case rather than the norm moving forward, and it would clearly 
dilineate the different in Q PCI FNGRP values.

2) nested simulated devices -- These aren't using real GIDs anyway and I 
would expect them to also be using the default group already -- forcing 
these to the default group was basically to make sure they didn't 
conflict with the simulated groups being created for intercept devices 
above.

> I am not sure of this, just want to open a little discussion on this.

FWIW, I'm not 100% on this either, so a better idea is welcome.  One 
thing I don't like, for example, is that we only have 16 simulated 
groups to work with, and for example we might find it useful later to 
split simulated devices into different groups based on type.

> 
> For example what could go wrong to keep the host values returned by the 
> CAP?

As-is, we risk advertising the wrong maxstbl and dtsm value for some 
devices in the group, depending on which device is plugged first. 
Imagine you have 2 devices on group 5; one will be interpreted and the 
other intercepted.

If the interpreted device plugs first, we will use the passthrough 
maxstbl and dtsm for all devices in the group; so the intercept device 
gets these values too.

If the intercept device plugs first, we will use the QEMU value for DTSM 
and the smaller maxstbl requried for intercept passthrough.  So the 
interpreted device gets these values too.

Worth noting, we could have more of these differences later -- But if we 
want to avoid splitting the group, then we I think we have to circle 
back to my 'alternative idea' above and provide equivalent support or 
toleration for intercept devices so that we can report a single group 
value that both types can support.

So insofar as dealing with the differences today...  maxstbl is pretty 
easy, we can just tolerate supporting the larger maxstbl in QEMU by 
adding logic to break up the I/O in pcistb_service_call.  We might have 
to provide 2 different maxstbl values over vfio capabilities however 
(what interpretation can support vs what kernel api supports for 
intercept as this could change between host kernel versions)

DTSM is a little trickier.  We are actually OK today because both 
intercept and interpreted devices will report the same value anyway, but 
that could change in the future.  Maybe here QEMU must report

dtsm = (QEMU_SUPPORT_MASK & HOST_SUPPORT_MASK);

So basically: ensure that only what both QEMU intercept and passthrough 
supports is advertised via the clp.  If we want to support a new type 
later, then we must either support it in both kvm and QEMU to enable it 
for the guest (or disallow intercept devices on that group, or provide 
some means of forcing an intercept device to the default group, etc)

If we do the above, then I think we can drop the idea of using simulated 
groups for intercpet passthrough devices.  What do you think?

> 
> 
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/s390x/s390-pci-bus.c         | 19 ++++++++++++++--
>>   hw/s390x/s390-pci-vfio.c        | 40 ++++++++++++++++++++++++++++++---
>>   include/hw/s390x/s390-pci-bus.h |  6 ++++-
>>   3 files changed, 59 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index ab442f17fb..8b0f3ef120 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -747,13 +747,14 @@ static void s390_pci_iommu_free(S390pciState *s, 
>> PCIBus *bus, int32_t devfn)
>>       object_unref(OBJECT(iommu));
>>   }
>> -S390PCIGroup *s390_group_create(int id)
>> +S390PCIGroup *s390_group_create(int id, int host_id)
>>   {
>>       S390PCIGroup *group;
>>       S390pciState *s = s390_get_phb();
>>       group = g_new0(S390PCIGroup, 1);
>>       group->id = id;
>> +    group->host_id = host_id;
>>       QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
>>       return group;
>>   }
>> @@ -771,12 +772,25 @@ S390PCIGroup *s390_group_find(int id)
>>       return NULL;
>>   }
>> +S390PCIGroup *s390_group_find_host_sim(int host_id)
>> +{
>> +    S390PCIGroup *group;
>> +    S390pciState *s = s390_get_phb();
>> +
>> +    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
>> +        if (group->id >= ZPCI_SIM_GRP_START && group->host_id == 
>> host_id) {
>> +            return group;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>>   static void s390_pci_init_default_group(void)
>>   {
>>       S390PCIGroup *group;
>>       ClpRspQueryPciGrp *resgrp;
>> -    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
>> +    group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
>>       resgrp = &group->zpci_group;
>>       resgrp->fr = 1;
>>       resgrp->dasm = 0;
>> @@ -824,6 +838,7 @@ static void s390_pcihost_realize(DeviceState *dev, 
>> Error **errp)
>>                                              NULL, g_free);
>>       s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, 
>> NULL, NULL);
>>       s->bus_no = 0;
>> +    s->next_sim_grp = ZPCI_SIM_GRP_START;
>>       QTAILQ_INIT(&s->pending_sei);
>>       QTAILQ_INIT(&s->zpci_devs);
>>       QTAILQ_INIT(&s->zpci_dma_limit);
>> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
>> index c9269683f5..bdc5892287 100644
>> --- a/hw/s390x/s390-pci-vfio.c
>> +++ b/hw/s390x/s390-pci-vfio.c
>> @@ -305,13 +305,17 @@ static void s390_pci_read_group(S390PCIBusDevice 
>> *pbdev,
>>   {
>>       struct vfio_info_cap_header *hdr;
>>       struct vfio_device_info_cap_zpci_group *cap;
>> +    S390pciState *s = s390_get_phb();
>>       ClpRspQueryPciGrp *resgrp;
>>       VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, 
>> pdev);
>>       hdr = vfio_get_device_info_cap(info, 
>> VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>> -    /* If capability not provided, just use the default group */
>> -    if (hdr == NULL) {
>> +    /*
>> +     * If capability not provided or the underlying hostdev is 
>> simulated, just
>> +     * use the default group.
>> +     */
>> +    if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
>>           trace_s390_pci_clp_cap(vpci->vbasedev.name,
>>                                  VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>>           pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
>> @@ -320,11 +324,41 @@ static void s390_pci_read_group(S390PCIBusDevice 
>> *pbdev,
>>       }
>>       cap = (void *) hdr;
>> +    /*
>> +     * For an intercept device, let's use an existing simulated group 
>> if one
>> +     * one was already created for other intercept devices in this 
>> group.
>> +     * If not, create a new simulated group if any are still available.
>> +     * If all else fails, just fall back on the default group.
>> +     */
>> +    if (!pbdev->interp) {
>> +        pbdev->pci_group = 
>> s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
>> +        if (pbdev->pci_group) {
>> +            /* Use existing simulated group */
>> +            pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
>> +            return;
>> +        } else {
>> +            if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
>> +                /* All out of simulated groups, use default */
>> +                trace_s390_pci_clp_cap(vpci->vbasedev.name,
>> +                                       VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>> +                pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
>> +                pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
>> +                return;
>> +            } else {
>> +                /* We can assign a new simulated group */
>> +                pbdev->zpci_fn.pfgid = s->next_sim_grp;
>> +                s->next_sim_grp++;
>> +                /* Fall through to create the new sim group using CLP 
>> info */
>> +            }
>> +        }
>> +    }
>> +
>>       /* See if the PCI group is already defined, create if not */
>>       pbdev->pci_group = s390_group_find(pbdev->zpci_fn.pfgid);
>>       if (!pbdev->pci_group) {
>> -        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid);
>> +        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid,
>> +                                             pbdev->zpci_fn.pfgid);
>>           resgrp = &pbdev->pci_group->zpci_group;
>>           if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
>> diff --git a/include/hw/s390x/s390-pci-bus.h 
>> b/include/hw/s390x/s390-pci-bus.h
>> index 9941ca0084..8664023d5d 100644
>> --- a/include/hw/s390x/s390-pci-bus.h
>> +++ b/include/hw/s390x/s390-pci-bus.h
>> @@ -315,13 +315,16 @@ typedef struct ZpciFmb {
>>   QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in 
>> ZpciFmb");
>>   #define ZPCI_DEFAULT_FN_GRP 0xFF
>> +#define ZPCI_SIM_GRP_START 0xF0
>>   typedef struct S390PCIGroup {
>>       ClpRspQueryPciGrp zpci_group;
>>       int id;
>> +    int host_id;
>>       QTAILQ_ENTRY(S390PCIGroup) link;
>>   } S390PCIGroup;
>> -S390PCIGroup *s390_group_create(int id);
>> +S390PCIGroup *s390_group_create(int id, int host_id);
>>   S390PCIGroup *s390_group_find(int id);
>> +S390PCIGroup *s390_group_find_host_sim(int host_id);
>>   struct S390PCIBusDevice {
>>       DeviceState qdev;
>> @@ -370,6 +373,7 @@ struct S390pciState {
>>       QTAILQ_HEAD(, S390PCIBusDevice) zpci_devs;
>>       QTAILQ_HEAD(, S390PCIDMACount) zpci_dma_limit;
>>       QTAILQ_HEAD(, S390PCIGroup) zpci_groups;
>> +    uint8_t next_sim_grp;
>>   };
>>   S390pciState *s390_get_phb(void);
>>
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/12] s390x/pci: zPCI interpretation support
  2021-12-15 15:53   ` Matthew Rosato
@ 2021-12-17  9:17     ` Christian Borntraeger
  0 siblings, 0 replies; 32+ messages in thread
From: Christian Borntraeger @ 2021-12-17  9:17 UTC (permalink / raw)
  To: Matthew Rosato, Pierre Morel, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david



Am 15.12.21 um 16:53 schrieb Matthew Rosato:
> On 12/15/21 2:35 AM, Pierre Morel wrote:
>>
>>
>> On 12/7/21 22:04, Matthew Rosato wrote:
>>> Note:  The first 3 patches of this series are included as pre-reqs, but
>>> should be pulled via a separate series.  Also, patch 5 is needed to
>>> support 5.16+ linux header-sync and was already done by Paolo but not
>>> merged yet so is thus included here as well.
>>>
>>> For QEMU, the majority of the work in enabling instruction interpretation
>>> is handled via new VFIO ioctls to SET the appropriate interpretation and
>>> interrupt forwarding modes, and to GET the function handle to use for
>>> interpretive execution.
>>>
>>> This series implements these new ioctls, as well as adding a new, optional
>>> 'intercept' parameter to zpci to request interpretation support not be used
>>> as well as an 'intassist' parameter to determine whether or not the
>>> firmware assist will be used for interrupt delivery or whether the host
>>> will be responsible for delivering all interrupts.
>>
>> In which circumstances do we have an added value by not using interrupt delivered by firmware?
>>
> 
> Disabling it can be a tool to debug and assist in problem determination, but that's about the only scenario I can think of where you would intentionally want to disable intassist.  Perhaps then it's not worth leaving in place.

I would leave it in in case we run into problems. Things like the nomio parameter for the kernel have proven to be useful.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups
  2021-12-16 15:16     ` Matthew Rosato
@ 2021-12-17  9:56       ` Pierre Morel
  0 siblings, 0 replies; 32+ messages in thread
From: Pierre Morel @ 2021-12-17  9:56 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 12/16/21 16:16, Matthew Rosato wrote:
> On 12/16/21 3:15 AM, Pierre Morel wrote:
>>
>>
>> On 12/7/21 22:04, Matthew Rosato wrote:
>>> Let's use the reserved pool of simulated PCI groups to allow intercept
>>> devices to have separate groups from interpreted devices as some group
>>> values may be different. If we run out of simulated PCI groups, 
>>> subsequent
>>> intercept devices just get the default group.
>>> Furthermore, if we encounter any PCI groups from hostdevs that are 
>>> marked
>>> as simulated, let's just assign them to the default group to avoid
>>> conflicts between host simulated groups and our own simulated groups.
>>
>> I have a problem here.
>> We will have the same hardware viewed by 2 different VFIO 
>> implementation (interpretation vs interception) reporting different 
>> groups ID.
> 
> Yes -- To be clear, this patch proposes that the interpreted device will 
> continue to report the passthrough group ID and the intercept device 
> will use a simulated group ID.
> 
>>
>> The alternative is to have them reporting same group ID with different 
>> values.
>>
> 
> I don't think we can do this.  For starters, we would have to throw out 
> the group tracking we do in QEMU; but for all we know the guest could be 
> doing similar tracking -- the implication of the group ID is that 
> everyone shares the same values so I don't think we can get away with 
> reporting different values for 2 members of the same group.
> 
> I think the other alternative is rather to always do something like...
> 
> 1) host reports its value via vfio capabilities as 'this is what an 
> interpreted device can use'
> 2) QEMU must accept those values as-is OR reduce them to some subset of 
> what both interpretation and intercept can support, and report only 
> those values for all devices in the group.  (More on this further down)
> 
> 
>> I fear both are wrong.
>>
>> On the other hand, should we have a difference in the QEMU command 
>> line between intercepted and interpreted devices for default values.
> 
> I'm not sure I follow what you suggest here.  Even if we somehow 
> provided a command-line means for specifying some of these values, they 
> would still be presented to the guest via clp and if the guest has 2 
> devices in the same group the clp results had better be the same.
> 
>> If not why not give up the host values so that in an hypothetical 
>> future migration we are clean with the GID ?
>>
> 
> Well, the interpreted device will use the passthrough group ID so in a 
> hypothetical future migration scenario we should be good there.

OK, sorry, I must be more clear.
I try to reformulate.
If have the same hardware and use VFIO, shouldn't we have the same 
hardawre description for the guest whenever the admin chose interception 
or interpretation?

If the answer is we do not care, what it may be, then it is OK.
Afterall interpretation should be the priviledged configuration, we can 
accept that there can be drawback using interception.

so I think I worry too much on this and having different groups is fine.

> 
> And simulated devices will still use the default group, so we should 
> also be OK there.
> 
> This really changes the behavior for 2 other classes of device:
> 
> 1) Intercept passthrough devices -- Yes, I agree that doing this is a 
> bit weird.  But my thinking was that these devices should be the 
> exception case rather than the norm moving forward, and it would clearly 
> dilineate the different in Q PCI FNGRP values.
> 
> 2) nested simulated devices -- These aren't using real GIDs anyway and I 
> would expect them to also be using the default group already -- forcing 
> these to the default group was basically to make sure they didn't 
> conflict with the simulated groups being created for intercept devices 
> above.
> 
>> I am not sure of this, just want to open a little discussion on this.
> 
> FWIW, I'm not 100% on this either, so a better idea is welcome.  One 
> thing I don't like, for example, is that we only have 16 simulated 
> groups to work with, and for example we might find it useful later to 
> split simulated devices into different groups based on type.
> 
>>
>> For example what could go wrong to keep the host values returned by 
>> the CAP?
> 
> As-is, we risk advertising the wrong maxstbl and dtsm value for some 
> devices in the group, depending on which device is plugged first. 
> Imagine you have 2 devices on group 5; one will be interpreted and the 
> other intercepted.
> 
> If the interpreted device plugs first, we will use the passthrough 
> maxstbl and dtsm for all devices in the group; so the intercept device 
> gets these values too.
> 
> If the intercept device plugs first, we will use the QEMU value for DTSM 
> and the smaller maxstbl requried for intercept passthrough.  So the 
> interpreted device gets these values too.
> 
> Worth noting, we could have more of these differences later -- But if we 
> want to avoid splitting the group, then we I think we have to circle 
> back to my 'alternative idea' above and provide equivalent support or 
> toleration for intercept devices so that we can report a single group 
> value that both types can support.
> 
> So insofar as dealing with the differences today...  maxstbl is pretty 
> easy, we can just tolerate supporting the larger maxstbl in QEMU by 
> adding logic to break up the I/O in pcistb_service_call.  We might have 
> to provide 2 different maxstbl values over vfio capabilities however 
> (what interpretation can support vs what kernel api supports for 
> intercept as this could change between host kernel versions)
> 
> DTSM is a little trickier.  We are actually OK today because both 
> intercept and interpreted devices will report the same value anyway, but 
> that could change in the future.  Maybe here QEMU must report
> 
> dtsm = (QEMU_SUPPORT_MASK & HOST_SUPPORT_MASK);
> 
> So basically: ensure that only what both QEMU intercept and passthrough 
> supports is advertised via the clp.  If we want to support a new type 
> later, then we must either support it in both kvm and QEMU to enable it 
> for the guest (or disallow intercept devices on that group, or provide 
> some means of forcing an intercept device to the default group, etc)
> 
> If we do the above, then I think we can drop the idea of using simulated 
> groups for intercpet passthrough devices.  What do you think?


Doing this, if we have a device with a larger DTSM type we need to 
update both QEMU and KVM to take full adventage on it.

So I think having two group as you propose is simpler.
And my concern was stupid, as they do not have the same group this is 
not the same hardware.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>


> 
>>
>>
>>>
>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>> ---
>>>   hw/s390x/s390-pci-bus.c         | 19 ++++++++++++++--
>>>   hw/s390x/s390-pci-vfio.c        | 40 ++++++++++++++++++++++++++++++---
>>>   include/hw/s390x/s390-pci-bus.h |  6 ++++-
>>>   3 files changed, 59 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> index ab442f17fb..8b0f3ef120 100644
>>> --- a/hw/s390x/s390-pci-bus.c
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -747,13 +747,14 @@ static void s390_pci_iommu_free(S390pciState 
>>> *s, PCIBus *bus, int32_t devfn)
>>>       object_unref(OBJECT(iommu));
>>>   }
>>> -S390PCIGroup *s390_group_create(int id)
>>> +S390PCIGroup *s390_group_create(int id, int host_id)
>>>   {
>>>       S390PCIGroup *group;
>>>       S390pciState *s = s390_get_phb();
>>>       group = g_new0(S390PCIGroup, 1);
>>>       group->id = id;
>>> +    group->host_id = host_id;
>>>       QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
>>>       return group;
>>>   }
>>> @@ -771,12 +772,25 @@ S390PCIGroup *s390_group_find(int id)
>>>       return NULL;
>>>   }
>>> +S390PCIGroup *s390_group_find_host_sim(int host_id)
>>> +{
>>> +    S390PCIGroup *group;
>>> +    S390pciState *s = s390_get_phb();
>>> +
>>> +    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
>>> +        if (group->id >= ZPCI_SIM_GRP_START && group->host_id == 
>>> host_id) {
>>> +            return group;
>>> +        }
>>> +    }
>>> +    return NULL;
>>> +}
>>> +
>>>   static void s390_pci_init_default_group(void)
>>>   {
>>>       S390PCIGroup *group;
>>>       ClpRspQueryPciGrp *resgrp;
>>> -    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
>>> +    group = s390_group_create(ZPCI_DEFAULT_FN_GRP, 
>>> ZPCI_DEFAULT_FN_GRP);
>>>       resgrp = &group->zpci_group;
>>>       resgrp->fr = 1;
>>>       resgrp->dasm = 0;
>>> @@ -824,6 +838,7 @@ static void s390_pcihost_realize(DeviceState 
>>> *dev, Error **errp)
>>>                                              NULL, g_free);
>>>       s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, 
>>> NULL, NULL);
>>>       s->bus_no = 0;
>>> +    s->next_sim_grp = ZPCI_SIM_GRP_START;
>>>       QTAILQ_INIT(&s->pending_sei);
>>>       QTAILQ_INIT(&s->zpci_devs);
>>>       QTAILQ_INIT(&s->zpci_dma_limit);
>>> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
>>> index c9269683f5..bdc5892287 100644
>>> --- a/hw/s390x/s390-pci-vfio.c
>>> +++ b/hw/s390x/s390-pci-vfio.c
>>> @@ -305,13 +305,17 @@ static void 
>>> s390_pci_read_group(S390PCIBusDevice *pbdev,
>>>   {
>>>       struct vfio_info_cap_header *hdr;
>>>       struct vfio_device_info_cap_zpci_group *cap;
>>> +    S390pciState *s = s390_get_phb();
>>>       ClpRspQueryPciGrp *resgrp;
>>>       VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, 
>>> pdev);
>>>       hdr = vfio_get_device_info_cap(info, 
>>> VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>>> -    /* If capability not provided, just use the default group */
>>> -    if (hdr == NULL) {
>>> +    /*
>>> +     * If capability not provided or the underlying hostdev is 
>>> simulated, just
>>> +     * use the default group.
>>> +     */
>>> +    if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
>>>           trace_s390_pci_clp_cap(vpci->vbasedev.name,
>>>                                  VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>>>           pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
>>> @@ -320,11 +324,41 @@ static void 
>>> s390_pci_read_group(S390PCIBusDevice *pbdev,
>>>       }
>>>       cap = (void *) hdr;
>>> +    /*
>>> +     * For an intercept device, let's use an existing simulated 
>>> group if one
>>> +     * one was already created for other intercept devices in this 
>>> group.
>>> +     * If not, create a new simulated group if any are still available.
>>> +     * If all else fails, just fall back on the default group.
>>> +     */
>>> +    if (!pbdev->interp) {
>>> +        pbdev->pci_group = 
>>> s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
>>> +        if (pbdev->pci_group) {
>>> +            /* Use existing simulated group */
>>> +            pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
>>> +            return;
>>> +        } else {
>>> +            if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
>>> +                /* All out of simulated groups, use default */
>>> +                trace_s390_pci_clp_cap(vpci->vbasedev.name,
>>> +                                       
>>> VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
>>> +                pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
>>> +                pbdev->pci_group = 
>>> s390_group_find(ZPCI_DEFAULT_FN_GRP);
>>> +                return;
>>> +            } else {
>>> +                /* We can assign a new simulated group */
>>> +                pbdev->zpci_fn.pfgid = s->next_sim_grp;
>>> +                s->next_sim_grp++;
>>> +                /* Fall through to create the new sim group using 
>>> CLP info */
>>> +            }
>>> +        }
>>> +    }
>>> +
>>>       /* See if the PCI group is already defined, create if not */
>>>       pbdev->pci_group = s390_group_find(pbdev->zpci_fn.pfgid);
>>>       if (!pbdev->pci_group) {
>>> -        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid);
>>> +        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid,
>>> +                                             pbdev->zpci_fn.pfgid);
>>>           resgrp = &pbdev->pci_group->zpci_group;
>>>           if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
>>> diff --git a/include/hw/s390x/s390-pci-bus.h 
>>> b/include/hw/s390x/s390-pci-bus.h
>>> index 9941ca0084..8664023d5d 100644
>>> --- a/include/hw/s390x/s390-pci-bus.h
>>> +++ b/include/hw/s390x/s390-pci-bus.h
>>> @@ -315,13 +315,16 @@ typedef struct ZpciFmb {
>>>   QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in 
>>> ZpciFmb");
>>>   #define ZPCI_DEFAULT_FN_GRP 0xFF
>>> +#define ZPCI_SIM_GRP_START 0xF0
>>>   typedef struct S390PCIGroup {
>>>       ClpRspQueryPciGrp zpci_group;
>>>       int id;
>>> +    int host_id;
>>>       QTAILQ_ENTRY(S390PCIGroup) link;
>>>   } S390PCIGroup;
>>> -S390PCIGroup *s390_group_create(int id);
>>> +S390PCIGroup *s390_group_create(int id, int host_id);
>>>   S390PCIGroup *s390_group_find(int id);
>>> +S390PCIGroup *s390_group_find_host_sim(int host_id);
>>>   struct S390PCIBusDevice {
>>>       DeviceState qdev;
>>> @@ -370,6 +373,7 @@ struct S390pciState {
>>>       QTAILQ_HEAD(, S390PCIBusDevice) zpci_devs;
>>>       QTAILQ_HEAD(, S390PCIDMACount) zpci_dma_limit;
>>>       QTAILQ_HEAD(, S390PCIGroup) zpci_groups;
>>> +    uint8_t next_sim_grp;
>>>   };
>>>   S390pciState *s390_get_phb(void);
>>>
>>
> 

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-12-17  9:57 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07 21:04 [PATCH 00/12] s390x/pci: zPCI interpretation support Matthew Rosato
2021-12-07 21:04 ` [PATCH 01/12] s390x/pci: use a reserved ID for the default PCI group Matthew Rosato
2021-12-08 10:30   ` Thomas Huth
2021-12-07 21:04 ` [PATCH 02/12] s390x/pci: don't use hard-coded dma range in reg_ioat Matthew Rosato
2021-12-08 10:32   ` Thomas Huth
2021-12-07 21:04 ` [PATCH 03/12] s390x/pci: add supported DT information to clp response Matthew Rosato
2021-12-07 21:04 ` [PATCH 04/12] Update linux headers Matthew Rosato
2021-12-07 21:04 ` [PATCH 05/12] virtio-gpu: do not byteswap padding Matthew Rosato
2021-12-07 21:04 ` [PATCH 06/12] target/s390x: add zpci-interp to cpu models Matthew Rosato
2021-12-08 10:16   ` Christian Borntraeger
2021-12-08 18:00     ` Matthew Rosato
2021-12-07 21:04 ` [PATCH 07/12] s390x/pci: enable for load/store intepretation Matthew Rosato
2021-12-08 10:56   ` Thomas Huth
2021-12-15  7:44   ` Pierre Morel
2021-12-15 16:32     ` Matthew Rosato
2021-12-07 21:04 ` [PATCH 08/12] s390x/pci: don't fence interpreted devices without MSI-X Matthew Rosato
2021-12-08 11:04   ` Thomas Huth
2021-12-15  6:26   ` Pierre Morel
2021-12-07 21:04 ` [PATCH 09/12] s390x/pci: enable adapter event notification for interpreted devices Matthew Rosato
2021-12-08 11:29   ` Thomas Huth
2021-12-08 19:09     ` Matthew Rosato
2021-12-07 21:04 ` [PATCH 10/12] s390x/pci: use I/O Address Translation assist when interpreting Matthew Rosato
2021-12-16  8:03   ` Pierre Morel
2021-12-07 21:04 ` [PATCH 11/12] s390x/pci: use dtsm provided from vfio capabilities for interpreted devices Matthew Rosato
2021-12-15  7:47   ` Pierre Morel
2021-12-07 21:04 ` [PATCH 12/12] s390x/pci: let intercept devices have separate PCI groups Matthew Rosato
2021-12-16  8:15   ` Pierre Morel
2021-12-16 15:16     ` Matthew Rosato
2021-12-17  9:56       ` Pierre Morel
2021-12-15  7:35 ` [PATCH 00/12] s390x/pci: zPCI interpretation support Pierre Morel
2021-12-15 15:53   ` Matthew Rosato
2021-12-17  9:17     ` Christian Borntraeger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).