* [PATCH v5 0/9] s390x/pci: zPCI interpretation support
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

For QEMU, the majority of the work in enabling instruction interpretation
is handled via SHM bit settings (which indicate to firmware whether or not
interpretive execution facilities are to be used).  In addition, a new KVM
ioctl is used to set up firmware-interpreted forwarding of Adapter Event
Notifications.
                                                                                
This series also adds a new, optional 'interpret' parameter to zpci, which
can be used to disable interpretation support (interpret=off), as well as
a 'forwarding_assist' parameter that determines whether the firmware
assist is used for adapter event delivery (the default when
interpretation is in use) or whether the host is responsible for
delivering all adapter event notifications (forwarding_assist=off).
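
As a rough illustration of how the forwarding_assist setting could feed
into the new ioctl, the sketch below fills a register-AEN request.  The
structure is a local mirror of struct kvm_s390_zpci_op from the
linux/kvm.h hunk in patch 1, rewritten with <stdint.h> types so it builds
without the unmerged kernel headers.  The names zpci_op_sketch and
zpci_build_reg_aen are hypothetical, and the mapping of
forwarding_assist=off to the KVM_S390_ZPCIOP_REGAEN_HOST flag is an
assumption drawn from the flag name, not something this cover letter
states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_zpci_op (see the kvm.h hunk in patch 1). */
struct zpci_op_sketch {
    uint32_t fh;        /* target device (function handle) */
    uint8_t  op;        /* operation to perform */
    uint8_t  pad[3];
    union {
        struct {
            uint64_t ibv;   /* guest addr of interrupt bit vector */
            uint64_t sb;    /* guest addr of summary bit */
            uint32_t flags;
            uint32_t noi;   /* number of interrupts */
            uint8_t  isc;   /* guest interrupt subclass */
            uint8_t  sbo;   /* offset of guest summary bit vector */
            uint16_t pad;
        } reg_aen;
        uint64_t reserved[8];
    } u;
};

/* Values copied from the same hunk. */
#define ZPCIOP_REG_AEN      0
#define ZPCIOP_DEREG_AEN    1
#define ZPCIOP_REGAEN_HOST  (1u << 0)

/*
 * Hypothetical helper: prepare a REG_AEN request for one device.
 * Assumption: the HOST flag asks the host to deliver every adapter
 * event notification itself, i.e. it corresponds to
 * forwarding_assist=off.
 */
void zpci_build_reg_aen(struct zpci_op_sketch *req, uint32_t fh,
                        uint64_t ibv, uint64_t sb, uint32_t noi,
                        uint8_t isc, uint8_t sbo, bool forwarding_assist)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));   /* also clears pad and reserved */
    req->fh = fh;
    req->op = ZPCIOP_REG_AEN;
    req->u.reg_aen.ibv = ibv;
    req->u.reg_aen.sb = sb;
    req->u.reg_aen.noi = noi;
    req->u.reg_aen.isc = isc;
    req->u.reg_aen.sbo = sbo;
    req->u.reg_aen.flags = forwarding_assist ? 0 : ZPCIOP_REGAEN_HOST;
}
```

The filled request would then be handed to the KVM_S390_ZPCI_OP ioctl
defined in patch 1; that call is left out here because it needs an s390
host with the matching kernel series applied.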
                                                                                
The ZPCI_INTERP CPU feature is added beginning with the z14 model to            
enable this support.                                                            
                                                                                
As a consequence of implementing zPCI interpretation, ISM devices now           
become eligible for passthrough (but only when zPCI interpretation is           
available).                                                                     
                                                                                
From the perspective of guest configuration, you pass through zPCI devices
in the same manner as before, with interpretation support being used by
default if it is available in both the kernel and QEMU.

I will reply with a link to the associated kernel series.
                       
Changelog v4->v5:
- Update to match latest interface from kernel code.  Major changes are:
  1) we no longer issue any ioctls to set a device to interpreted mode;
  rather, this will be done automatically if supported by the host kernel
  at the time the vfio group is associated with the KVM.  Then, the SHM
  bit setting will indicate whether or not interpretation is actually
  used.
  2) the RPCIT enhancements (IOMMU changes) are removed from this series,
  so the code associated with indicating a desired IOMMU is also
  removed.  With this series s390x-pci will continue to use only type1
  IOMMU for now.
- Refresh the linux headers sync.  Added a patch to tolerate some vfio
  uapi renames that will happen in 5.18 (this can be discarded if there
  is something else underway to address this)

Matthew Rosato (9):
  Update linux headers
  vfio: tolerate migration protocol v1 uapi renames
  target/s390x: add zpci-interp to cpu models
  s390x/pci: add routine to get host function handle from CLP info
  s390x/pci: enable for load/store intepretation
  s390x/pci: don't fence interpreted devices without MSI-X
  s390x/pci: enable adapter event notification for interpreted devices
  s390x/pci: let intercept devices have separate PCI groups
  s390x/pci: reflect proper maxstbl for groups of interpreted devices

 hw/s390x/meson.build                          |   1 +
 hw/s390x/s390-pci-bus.c                       | 107 ++++-
 hw/s390x/s390-pci-inst.c                      |  52 ++-
 hw/s390x/s390-pci-kvm.c                       |  51 +++
 hw/s390x/s390-pci-vfio.c                      | 129 +++++-
 hw/s390x/s390-virtio-ccw.c                    |   1 +
 hw/vfio/common.c                              |   2 +-
 hw/vfio/migration.c                           |  19 +-
 include/hw/s390x/s390-pci-bus.h               |   8 +-
 include/hw/s390x/s390-pci-kvm.h               |  38 ++
 include/hw/s390x/s390-pci-vfio.h              |   6 +
 .../linux/input-event-codes.h                 |   4 +-
 .../standard-headers/linux/virtio_config.h    |   6 +
 .../standard-headers/linux/virtio_crypto.h    |  82 +++-
 linux-headers/asm-arm64/kvm.h                 |  16 +
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-s390/kvm.h                  |   1 +
 linux-headers/linux/kvm.h                     |  50 ++-
 linux-headers/linux/psci.h                    |   4 +
 linux-headers/linux/userfaultfd.h             |   8 +-
 linux-headers/linux/vfio.h                    | 406 +++++++++---------
 linux-headers/linux/vfio_zdev.h               |   7 +
 linux-headers/linux/vhost.h                   |   7 +
 target/s390x/cpu_features_def.h.inc           |   1 +
 target/s390x/gen-features.c                   |   2 +
 target/s390x/kvm/kvm.c                        |   8 +
 target/s390x/kvm/kvm_s390x.h                  |   1 +
 28 files changed, 763 insertions(+), 258 deletions(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h

-- 
2.27.0




* [PATCH v5 1/9] Update linux headers
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

This is a placeholder that pulls in 5.18-rc1 plus the unmerged kernel
changes required by this series.  A proper header sync can be done once
the associated kernel code merges.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 .../linux/input-event-codes.h                 |   4 +-
 .../standard-headers/linux/virtio_config.h    |   6 +
 .../standard-headers/linux/virtio_crypto.h    |  82 +++-
 linux-headers/asm-arm64/kvm.h                 |  16 +
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-s390/kvm.h                  |   1 +
 linux-headers/linux/kvm.h                     |  50 ++-
 linux-headers/linux/psci.h                    |   4 +
 linux-headers/linux/userfaultfd.h             |   8 +-
 linux-headers/linux/vfio.h                    | 406 +++++++++---------
 linux-headers/linux/vfio_zdev.h               |   7 +
 linux-headers/linux/vhost.h                   |   7 +
 13 files changed, 376 insertions(+), 219 deletions(-)

diff --git a/include/standard-headers/linux/input-event-codes.h b/include/standard-headers/linux/input-event-codes.h
index b5e86b40ab..e36c01003a 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -278,7 +278,8 @@
 #define KEY_PAUSECD		201
 #define KEY_PROG3		202
 #define KEY_PROG4		203
-#define KEY_DASHBOARD		204	/* AL Dashboard */
+#define KEY_ALL_APPLICATIONS	204	/* AC Desktop Show All Applications */
+#define KEY_DASHBOARD		KEY_ALL_APPLICATIONS
 #define KEY_SUSPEND		205
 #define KEY_CLOSE		206	/* AC Close */
 #define KEY_PLAY		207
@@ -612,6 +613,7 @@
 #define KEY_ASSISTANT		0x247	/* AL Context-aware desktop assistant */
 #define KEY_KBD_LAYOUT_NEXT	0x248	/* AC Next Keyboard Layout Select */
 #define KEY_EMOJI_PICKER	0x249	/* Show/hide emoji picker (HUTRR101) */
+#define KEY_DICTATE		0x24a	/* Start or Stop Voice Dictation Session (HUTRR99) */
 
 #define KEY_BRIGHTNESS_MIN		0x250	/* Set Brightness to Minimum */
 #define KEY_BRIGHTNESS_MAX		0x251	/* Set Brightness to Maximum */
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 22e3a85f67..7acd8d4abc 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -80,6 +80,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED		34
 
+/*
+ * Inorder feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER		35
+
 /*
  * This feature indicates that memory accesses by the driver and the
  * device are ordered in a way described by the platform.
diff --git a/include/standard-headers/linux/virtio_crypto.h b/include/standard-headers/linux/virtio_crypto.h
index 5ff0b4ee59..68066dafb6 100644
--- a/include/standard-headers/linux/virtio_crypto.h
+++ b/include/standard-headers/linux/virtio_crypto.h
@@ -37,6 +37,7 @@
 #define VIRTIO_CRYPTO_SERVICE_HASH   1
 #define VIRTIO_CRYPTO_SERVICE_MAC    2
 #define VIRTIO_CRYPTO_SERVICE_AEAD   3
+#define VIRTIO_CRYPTO_SERVICE_AKCIPHER 4
 
 #define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
 
@@ -57,6 +58,10 @@ struct virtio_crypto_ctrl_header {
 	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
 #define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
 	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
+#define VIRTIO_CRYPTO_AKCIPHER_CREATE_SESSION \
+	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x04)
+#define VIRTIO_CRYPTO_AKCIPHER_DESTROY_SESSION \
+	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x05)
 	uint32_t opcode;
 	uint32_t algo;
 	uint32_t flag;
@@ -180,6 +185,58 @@ struct virtio_crypto_aead_create_session_req {
 	uint8_t padding[32];
 };
 
+struct virtio_crypto_rsa_session_para {
+#define VIRTIO_CRYPTO_RSA_RAW_PADDING   0
+#define VIRTIO_CRYPTO_RSA_PKCS1_PADDING 1
+	uint32_t padding_algo;
+
+#define VIRTIO_CRYPTO_RSA_NO_HASH   0
+#define VIRTIO_CRYPTO_RSA_MD2       1
+#define VIRTIO_CRYPTO_RSA_MD3       2
+#define VIRTIO_CRYPTO_RSA_MD4       3
+#define VIRTIO_CRYPTO_RSA_MD5       4
+#define VIRTIO_CRYPTO_RSA_SHA1      5
+#define VIRTIO_CRYPTO_RSA_SHA256    6
+#define VIRTIO_CRYPTO_RSA_SHA384    7
+#define VIRTIO_CRYPTO_RSA_SHA512    8
+#define VIRTIO_CRYPTO_RSA_SHA224    9
+	uint32_t hash_algo;
+};
+
+struct virtio_crypto_ecdsa_session_para {
+#define VIRTIO_CRYPTO_CURVE_UNKNOWN   0
+#define VIRTIO_CRYPTO_CURVE_NIST_P192 1
+#define VIRTIO_CRYPTO_CURVE_NIST_P224 2
+#define VIRTIO_CRYPTO_CURVE_NIST_P256 3
+#define VIRTIO_CRYPTO_CURVE_NIST_P384 4
+#define VIRTIO_CRYPTO_CURVE_NIST_P521 5
+	uint32_t curve_id;
+	uint32_t padding;
+};
+
+struct virtio_crypto_akcipher_session_para {
+#define VIRTIO_CRYPTO_NO_AKCIPHER    0
+#define VIRTIO_CRYPTO_AKCIPHER_RSA   1
+#define VIRTIO_CRYPTO_AKCIPHER_DSA   2
+#define VIRTIO_CRYPTO_AKCIPHER_ECDSA 3
+	uint32_t algo;
+
+#define VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC  1
+#define VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE 2
+	uint32_t keytype;
+	uint32_t keylen;
+
+	union {
+		struct virtio_crypto_rsa_session_para rsa;
+		struct virtio_crypto_ecdsa_session_para ecdsa;
+	} u;
+};
+
+struct virtio_crypto_akcipher_create_session_req {
+	struct virtio_crypto_akcipher_session_para para;
+	uint8_t padding[36];
+};
+
 struct virtio_crypto_alg_chain_session_para {
 #define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER  1
 #define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH  2
@@ -247,6 +304,8 @@ struct virtio_crypto_op_ctrl_req {
 			mac_create_session;
 		struct virtio_crypto_aead_create_session_req
 			aead_create_session;
+		struct virtio_crypto_akcipher_create_session_req
+			akcipher_create_session;
 		struct virtio_crypto_destroy_session_req
 			destroy_session;
 		uint8_t padding[56];
@@ -266,6 +325,14 @@ struct virtio_crypto_op_header {
 	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
 #define VIRTIO_CRYPTO_AEAD_DECRYPT \
 	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
+#define VIRTIO_CRYPTO_AKCIPHER_ENCRYPT \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x00)
+#define VIRTIO_CRYPTO_AKCIPHER_DECRYPT \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x01)
+#define VIRTIO_CRYPTO_AKCIPHER_SIGN \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x02)
+#define VIRTIO_CRYPTO_AKCIPHER_VERIFY \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x03)
 	uint32_t opcode;
 	/* algo should be service-specific algorithms */
 	uint32_t algo;
@@ -390,6 +457,16 @@ struct virtio_crypto_aead_data_req {
 	uint8_t padding[32];
 };
 
+struct virtio_crypto_akcipher_para {
+	uint32_t src_data_len;
+	uint32_t dst_data_len;
+};
+
+struct virtio_crypto_akcipher_data_req {
+	struct virtio_crypto_akcipher_para para;
+	uint8_t padding[40];
+};
+
 /* The request of the data virtqueue's packet */
 struct virtio_crypto_op_data_req {
 	struct virtio_crypto_op_header header;
@@ -399,6 +476,7 @@ struct virtio_crypto_op_data_req {
 		struct virtio_crypto_hash_data_req hash_req;
 		struct virtio_crypto_mac_data_req mac_req;
 		struct virtio_crypto_aead_data_req aead_req;
+		struct virtio_crypto_akcipher_data_req akcipher_req;
 		uint8_t padding[48];
 	} u;
 };
@@ -408,6 +486,8 @@ struct virtio_crypto_op_data_req {
 #define VIRTIO_CRYPTO_BADMSG    2
 #define VIRTIO_CRYPTO_NOTSUPP   3
 #define VIRTIO_CRYPTO_INVSESS   4 /* Invalid session id */
+#define VIRTIO_CRYPTO_NOSPC     5 /* no free session ID */
+#define VIRTIO_CRYPTO_KEY_REJECTED 6 /* Signature verification failed */
 
 /* The accelerator hardware is ready */
 #define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
@@ -438,7 +518,7 @@ struct virtio_crypto_config {
 	uint32_t max_cipher_key_len;
 	/* Maximum length of authenticated key */
 	uint32_t max_auth_key_len;
-	uint32_t reserve;
+	uint32_t akcipher_algo;
 	/* Maximum size of each crypto request's content */
 	uint64_t max_size;
 };
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 3d2ce9912d..5c28a9737a 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -281,6 +281,11 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED     	(1U << 4)
 
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3	KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_AVAIL		0
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_AVAIL		1
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_REQUIRED	2
+
 /* SVE registers */
 #define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
 
@@ -362,6 +367,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
 #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
@@ -411,6 +417,16 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* arm64-specific kvm_run::system_event flags */
+/*
+ * Reset caused by a PSCI v1.1 SYSTEM_RESET2 call.
+ * Valid only when the system event has a type of KVM_SYSTEM_EVENT_RESET.
+ */
+#define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2	(1ULL << 0)
+
+/* run->fail_entry.hardware_entry_failure_reason codes. */
+#define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED	(1ULL << 0)
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
index 1567a3294c..6c1aa92a92 100644
--- a/linux-headers/asm-generic/mman-common.h
+++ b/linux-headers/asm-generic/mman-common.h
@@ -75,6 +75,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/linux-headers/asm-mips/mman.h b/linux-headers/asm-mips/mman.h
index 40b210c65a..1be428663c 100644
--- a/linux-headers/asm-mips/mman.h
+++ b/linux-headers/asm-mips/mman.h
@@ -101,6 +101,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index f053b8304a..d8259ff9a1 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
 #define KVM_S390_VM_CPU_FEAT_PFMFI	11
 #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
 #define KVM_S390_VM_CPU_FEAT_KSS	13
+#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP 14
 struct kvm_s390_vm_cpu_feat {
 	__u64 feat[16];
 };
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index d232feaae9..f71befac09 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -562,9 +562,12 @@ struct kvm_s390_mem_op {
 	__u32 op;		/* type of operation */
 	__u64 buf;		/* buffer in userspace */
 	union {
-		__u8 ar;	/* the access register number */
+		struct {
+			__u8 ar;	/* the access register number */
+			__u8 key;	/* access key, ignored if flag unset */
+		};
 		__u32 sida_offset; /* offset into the sida */
-		__u8 reserved[32]; /* should be set to 0 */
+		__u8 reserved[32]; /* ignored */
 	};
 };
 /* types for kvm_s390_mem_op->op */
@@ -572,9 +575,12 @@ struct kvm_s390_mem_op {
 #define KVM_S390_MEMOP_LOGICAL_WRITE	1
 #define KVM_S390_MEMOP_SIDA_READ	2
 #define KVM_S390_MEMOP_SIDA_WRITE	3
+#define KVM_S390_MEMOP_ABSOLUTE_READ	4
+#define KVM_S390_MEMOP_ABSOLUTE_WRITE	5
 /* flags for kvm_s390_mem_op->flags */
 #define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
+#define KVM_S390_MEMOP_F_SKEY_PROTECTION	(1ULL << 2)
 
 /* for KVM_INTERRUPT */
 struct kvm_interrupt {
@@ -1134,6 +1140,11 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
+#define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_S390_MEM_OP_EXTENSION 211
+#define KVM_CAP_PMU_CAPABILITY 212
+#define KVM_CAP_DISABLE_QUIRKS2 213
+#define KVM_CAP_S390_ZPCI_OP 214
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1624,9 +1635,6 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
-/* Available with KVM_CAP_XSAVE2 */
-#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
-
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
@@ -1973,6 +1981,8 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
 
+#define KVM_PMU_CAP_DISABLE                    (1 << 0)
+
 /**
  * struct kvm_stats_header - Header of per vm/vcpu binary statistics data.
  * @flags: Some extra information for header, always 0 for now.
@@ -2051,4 +2061,34 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_XSAVE2 */
 #define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
 
+/* Available with KVM_CAP_S390_ZPCI_OP */
+#define KVM_S390_ZPCI_OP	  _IOW(KVMIO,  0xd0, struct kvm_s390_zpci_op)
+
+struct kvm_s390_zpci_op {
+	/* in */
+	__u32 fh;		/* target device */
+	__u8  op;		/* operation to perform */
+	__u8  pad[3];
+	union {
+		/* for KVM_S390_ZPCIOP_REG_AEN */
+		struct {
+			__u64 ibv;	/* Guest addr of interrupt bit vector */
+			__u64 sb;	/* Guest addr of summary bit */
+			__u32 flags;
+			__u32 noi;	/* Number of interrupts */
+			__u8 isc;	/* Guest interrupt subclass */
+			__u8 sbo;	/* Offset of guest summary bit vector */
+			__u16 pad;
+		} reg_aen;
+		__u64 reserved[8];
+	} u;
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_REG_AEN		0
+#define KVM_S390_ZPCIOP_DEREG_AEN	1
+
+/* flags for kvm_s390_zpci_op->u.reg_aen.flags */
+#define KVM_S390_ZPCIOP_REGAEN_HOST	(1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/linux-headers/linux/psci.h b/linux-headers/linux/psci.h
index a6772d508b..213b2a0f70 100644
--- a/linux-headers/linux/psci.h
+++ b/linux-headers/linux/psci.h
@@ -82,6 +82,10 @@
 #define PSCI_0_2_TOS_UP_NO_MIGRATE		1
 #define PSCI_0_2_TOS_MP				2
 
+/* PSCI v1.1 reset type encoding for SYSTEM_RESET2 */
+#define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET	0
+#define PSCI_1_1_RESET_TYPE_VENDOR_START	0x80000000U
+
 /* PSCI version decoding (independent of PSCI version) */
 #define PSCI_VERSION_MAJOR_SHIFT		16
 #define PSCI_VERSION_MINOR_MASK			\
diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 8479af5f4c..769b8379e4 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_EXACT_ADDRESS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -189,6 +190,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
 	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+	 *
+	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
+	 * faults would be provided and the offset within the page would not be
+	 * masked.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index e680594f27..e9f7795c39 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -323,7 +323,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX                    (1)
 #define VFIO_REGION_TYPE_CCW			(2)
-#define VFIO_REGION_TYPE_MIGRATION              (3)
+#define VFIO_REGION_TYPE_MIGRATION_DEPRECATED   (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -405,225 +405,29 @@ struct vfio_region_gfx_edid {
 #define VFIO_REGION_SUBTYPE_CCW_CRW		(3)
 
 /* sub-types for VFIO_REGION_TYPE_MIGRATION */
-#define VFIO_REGION_SUBTYPE_MIGRATION           (1)
-
-/*
- * The structure vfio_device_migration_info is placed at the 0th offset of
- * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
- * migration information. Field accesses from this structure are only supported
- * at their native width and alignment. Otherwise, the result is undefined and
- * vendor drivers should return an error.
- *
- * device_state: (read/write)
- *      - The user application writes to this field to inform the vendor driver
- *        about the device state to be transitioned to.
- *      - The vendor driver should take the necessary actions to change the
- *        device state. After successful transition to a given state, the
- *        vendor driver should return success on write(device_state, state)
- *        system call. If the device state transition fails, the vendor driver
- *        should return an appropriate -errno for the fault condition.
- *      - On the user application side, if the device state transition fails,
- *	  that is, if write(device_state, state) returns an error, read
- *	  device_state again to determine the current state of the device from
- *	  the vendor driver.
- *      - The vendor driver should return previous state of the device unless
- *        the vendor driver has encountered an internal error, in which case
- *        the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
- *      - The user application must use the device reset ioctl to recover the
- *        device from VFIO_DEVICE_STATE_ERROR state. If the device is
- *        indicated to be in a valid device state by reading device_state, the
- *        user application may attempt to transition the device to any valid
- *        state reachable from the current state or terminate itself.
- *
- *      device_state consists of 3 bits:
- *      - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
- *        it indicates the _STOP state. When the device state is changed to
- *        _STOP, driver should stop the device before write() returns.
- *      - If bit 1 is set, it indicates the _SAVING state, which means that the
- *        driver should start gathering device state information that will be
- *        provided to the VFIO user application to save the device's state.
- *      - If bit 2 is set, it indicates the _RESUMING state, which means that
- *        the driver should prepare to resume the device. Data provided through
- *        the migration region should be used to resume the device.
- *      Bits 3 - 31 are reserved for future use. To preserve them, the user
- *      application should perform a read-modify-write operation on this
- *      field when modifying the specified bits.
- *
- *  +------- _RESUMING
- *  |+------ _SAVING
- *  ||+----- _RUNNING
- *  |||
- *  000b => Device Stopped, not saving or resuming
- *  001b => Device running, which is the default state
- *  010b => Stop the device & save the device state, stop-and-copy state
- *  011b => Device running and save the device state, pre-copy state
- *  100b => Device stopped and the device state is resuming
- *  101b => Invalid state
- *  110b => Error state
- *  111b => Invalid state
- *
- * State transitions:
- *
- *              _RESUMING  _RUNNING    Pre-copy    Stop-and-copy   _STOP
- *                (100b)     (001b)     (011b)        (010b)       (000b)
- * 0. Running or default state
- *                             |
- *
- * 1. Normal Shutdown (optional)
- *                             |------------------------------------->|
- *
- * 2. Save the state or suspend
- *                             |------------------------->|---------->|
- *
- * 3. Save the state during live migration
- *                             |----------->|------------>|---------->|
- *
- * 4. Resuming
- *                  |<---------|
- *
- * 5. Resumed
- *                  |--------->|
- *
- * 0. Default state of VFIO device is _RUNNING when the user application starts.
- * 1. During normal shutdown of the user application, the user application may
- *    optionally change the VFIO device state from _RUNNING to _STOP. This
- *    transition is optional. The vendor driver must support this transition but
- *    must not require it.
- * 2. When the user application saves state or suspends the application, the
- *    device state transitions from _RUNNING to stop-and-copy and then to _STOP.
- *    On state transition from _RUNNING to stop-and-copy, driver must stop the
- *    device, save the device state and send it to the application through the
- *    migration region. The sequence to be followed for such transition is given
- *    below.
- * 3. In live migration of user application, the state transitions from _RUNNING
- *    to pre-copy, to stop-and-copy, and to _STOP.
- *    On state transition from _RUNNING to pre-copy, the driver should start
- *    gathering the device state while the application is still running and send
- *    the device state data to application through the migration region.
- *    On state transition from pre-copy to stop-and-copy, the driver must stop
- *    the device, save the device state and send it to the user application
- *    through the migration region.
- *    Vendor drivers must support the pre-copy state even for implementations
- *    where no data is provided to the user before the stop-and-copy state. The
- *    user must not be required to consume all migration data before the device
- *    transitions to a new state, including the stop-and-copy state.
- *    The sequence to be followed for above two transitions is given below.
- * 4. To start the resuming phase, the device state should be transitioned from
- *    the _RUNNING to the _RESUMING state.
- *    In the _RESUMING state, the driver should use the device state data
- *    received through the migration region to resume the device.
- * 5. After providing saved device data to the driver, the application should
- *    change the state from _RESUMING to _RUNNING.
- *
- * reserved:
- *      Reads on this field return zero and writes are ignored.
- *
- * pending_bytes: (read only)
- *      The number of pending bytes still to be migrated from the vendor driver.
- *
- * data_offset: (read only)
- *      The user application should read data_offset field from the migration
- *      region. The user application should read the device data from this
- *      offset within the migration region during the _SAVING state or write
- *      the device data during the _RESUMING state. See below for details of
- *      sequence to be followed.
- *
- * data_size: (read/write)
- *      The user application should read data_size to get the size in bytes of
- *      the data copied in the migration region during the _SAVING state and
- *      write the size in bytes of the data copied in the migration region
- *      during the _RESUMING state.
- *
- * The format of the migration region is as follows:
- *  ------------------------------------------------------------------
- * |vfio_device_migration_info|    data section                      |
- * |                          |     ///////////////////////////////  |
- * ------------------------------------------------------------------
- *   ^                              ^
- *  offset 0-trapped part        data_offset
- *
- * The structure vfio_device_migration_info is always followed by the data
- * section in the region, so data_offset will always be nonzero. The offset
- * from where the data is copied is decided by the kernel driver. The data
- * section can be trapped, mmapped, or partitioned, depending on how the kernel
- * driver defines the data section. The data section partition can be defined
- * as mapped by the sparse mmap capability. If mmapped, data_offset must be
- * page aligned, whereas initial section which contains the
- * vfio_device_migration_info structure, might not end at the offset, which is
- * page aligned. The user is not required to access through mmap regardless
- * of the capabilities of the region mmap.
- * The vendor driver should determine whether and how to partition the data
- * section. The vendor driver should return data_offset accordingly.
- *
- * The sequence to be followed while in pre-copy state and stop-and-copy state
- * is as follows:
- * a. Read pending_bytes, indicating the start of a new iteration to get device
- *    data. Repeated read on pending_bytes at this stage should have no side
- *    effects.
- *    If pending_bytes == 0, the user application should not iterate to get data
- *    for that device.
- *    If pending_bytes > 0, perform the following steps.
- * b. Read data_offset, indicating that the vendor driver should make data
- *    available through the data section. The vendor driver should return this
- *    read operation only after data is available from (region + data_offset)
- *    to (region + data_offset + data_size).
- * c. Read data_size, which is the amount of data in bytes available through
- *    the migration region.
- *    Read on data_offset and data_size should return the offset and size of
- *    the current buffer if the user application reads data_offset and
- *    data_size more than once here.
- * d. Read data_size bytes of data from (region + data_offset) from the
- *    migration region.
- * e. Process the data.
- * f. Read pending_bytes, which indicates that the data from the previous
- *    iteration has been read. If pending_bytes > 0, go to step b.
- *
- * The user application can transition from the _SAVING|_RUNNING
- * (pre-copy state) to the _SAVING (stop-and-copy) state regardless of the
- * number of pending bytes. The user application should iterate in _SAVING
- * (stop-and-copy) until pending_bytes is 0.
- *
- * The sequence to be followed while _RESUMING device state is as follows:
- * While data for this device is available, repeat the following steps:
- * a. Read data_offset from where the user application should write data.
- * b. Write migration data starting at the migration region + data_offset for
- *    the length determined by data_size from the migration source.
- * c. Write data_size, which indicates to the vendor driver that data is
- *    written in the migration region. Vendor driver must return this write
- *    operations on consuming data. Vendor driver should apply the
- *    user-provided migration region data to the device resume state.
- *
- * If an error occurs during the above sequences, the vendor driver can return
- * an error code for next read() or write() operation, which will terminate the
- * loop. The user application should then take the next necessary action, for
- * example, failing migration or terminating the user application.
- *
- * For the user application, data is opaque. The user application should write
- * data in the same order as the data is received and the data should be of
- * same transaction size at the source.
- */
+#define VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED (1)
 
 struct vfio_device_migration_info {
 	__u32 device_state;         /* VFIO device state */
-#define VFIO_DEVICE_STATE_STOP      (0)
-#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
-#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
-#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
-#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
-				     VFIO_DEVICE_STATE_SAVING |  \
-				     VFIO_DEVICE_STATE_RESUMING)
+#define VFIO_DEVICE_STATE_V1_STOP      (0)
+#define VFIO_DEVICE_STATE_V1_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_V1_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_V1_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_V1_RUNNING | \
+				     VFIO_DEVICE_STATE_V1_SAVING |  \
+				     VFIO_DEVICE_STATE_V1_RESUMING)
 
 #define VFIO_DEVICE_STATE_VALID(state) \
-	(state & VFIO_DEVICE_STATE_RESUMING ? \
-	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
+	(state & VFIO_DEVICE_STATE_V1_RESUMING ? \
+	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_V1_RESUMING : 1)
 
 #define VFIO_DEVICE_STATE_IS_ERROR(state) \
-	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_SAVING | \
-					      VFIO_DEVICE_STATE_RESUMING))
+	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_V1_SAVING | \
+					      VFIO_DEVICE_STATE_V1_RESUMING))
 
 #define VFIO_DEVICE_STATE_SET_ERROR(state) \
-	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_SATE_SAVING | \
-					     VFIO_DEVICE_STATE_RESUMING)
+	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_STATE_V1_SAVING | \
+					     VFIO_DEVICE_STATE_V1_RESUMING)
 
 	__u32 reserved;
 	__u64 pending_bytes;
@@ -1002,6 +806,186 @@ struct vfio_device_feature {
  */
 #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
 
+/*
+ * Indicates the device can support the migration API through
+ * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. If this GET succeeds, the RUNNING and
+ * ERROR states are always supported. Support for additional states is
+ * indicated via the flags field; at least VFIO_MIGRATION_STOP_COPY must be
+ * set.
+ *
+ * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
+ * RESUMING are supported.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
+ * is supported in addition to the STOP_COPY states.
+ *
+ * Other combinations of flags have behavior to be defined in the future.
+ */
+struct vfio_device_feature_migration {
+	__aligned_u64 flags;
+#define VFIO_MIGRATION_STOP_COPY	(1 << 0)
+#define VFIO_MIGRATION_P2P		(1 << 1)
+};
+#define VFIO_DEVICE_FEATURE_MIGRATION 1
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, execute a migration state change on the VFIO
+ * device. The new state is supplied in device_state, see enum
+ * vfio_device_mig_state for details.
+ *
+ * The kernel migration driver must fully transition the device to the new state
+ * value before the operation returns to the user.
+ *
+ * The kernel migration driver must not generate asynchronous device state
+ * transitions outside of manipulation by the user or the VFIO_DEVICE_RESET
+ * ioctl as described above.
+ *
+ * If this function fails then current device_state may be the original
+ * operating state or some other state along the combination transition path.
+ * The user can then decide if it should execute a VFIO_DEVICE_RESET, attempt
+ * to return to the original state, or attempt to return to some other state
+ * such as RUNNING or STOP.
+ *
+ * If the new_state starts a new data transfer session then the FD associated
+ * with that session is returned in data_fd. The user is responsible to close
+ * this FD when it is finished. The user must consider the migration data stream
+ * carried over the FD to be opaque and must preserve the byte order of the
+ * stream. The user is not required to preserve buffer segmentation when writing
+ * the data stream during the RESUMING operation.
+ *
+ * Upon VFIO_DEVICE_FEATURE_GET, get the current migration state of the VFIO
+ * device, data_fd will be -1.
+ */
+struct vfio_device_feature_mig_state {
+	__u32 device_state; /* From enum vfio_device_mig_state */
+	__s32 data_fd;
+};
+#define VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE 2
+
+/*
+ * The device migration Finite State Machine is described by the enum
+ * vfio_device_mig_state. Some of the FSM arcs will create a migration data
+ * transfer session by returning a FD, in this case the migration data will
+ * flow over the FD using read() and write() as discussed below.
+ *
+ * There are 5 states to support VFIO_MIGRATION_STOP_COPY:
+ *  RUNNING - The device is running normally
+ *  STOP - The device does not change the internal or external state
+ *  STOP_COPY - The device internal state can be read out
+ *  RESUMING - The device is stopped and is loading a new internal state
+ *  ERROR - The device has failed and must be reset
+ *
+ * And 1 optional state to support VFIO_MIGRATION_P2P:
+ *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ *
+ * The FSM takes actions on the arcs between FSM states. The driver implements
+ * the following behavior for the FSM arcs:
+ *
+ * RUNNING_P2P -> STOP
+ * STOP_COPY -> STOP
+ *   While in STOP the device must stop the operation of the device. The device
+ *   must not generate interrupts, DMA, or any other change to external state.
+ *   It must not change its internal state. When stopped the device and kernel
+ *   migration driver must accept and respond to interaction to support external
+ *   subsystems in the STOP state, for example PCI MSI-X and PCI config space.
+ *   Failure by the user to restrict device access while in STOP must not result
+ *   in error conditions outside the user context (ex. host system faults).
+ *
+ *   The STOP_COPY arc will terminate a data transfer session.
+ *
+ * RESUMING -> STOP
+ *   Leaving RESUMING terminates a data transfer session and indicates the
+ *   device should complete processing of the data delivered by write(). The
+ *   kernel migration driver should complete the incorporation of data written
+ *   to the data transfer FD into the device internal state and perform
+ *   final validity and consistency checking of the new device state. If the
+ *   user provided data is found to be incomplete, inconsistent, or otherwise
+ *   invalid, the migration driver must fail the SET_STATE ioctl and
+ *   optionally go to the ERROR state as described below.
+ *
+ *   While in STOP the device has the same behavior as other STOP states
+ *   described above.
+ *
+ *   To abort a RESUMING session the device must be reset.
+ *
+ * RUNNING_P2P -> RUNNING
+ *   While in RUNNING the device is fully operational, the device may generate
+ *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
+ *   and the device may advance its internal state.
+ *
+ * RUNNING -> RUNNING_P2P
+ * STOP -> RUNNING_P2P
+ *   While in RUNNING_P2P the device is partially running in the P2P quiescent
+ *   state defined below.
+ *
+ * STOP -> STOP_COPY
+ *   This arc begins the process of saving the device state and will return a
+ *   new data_fd.
+ *
+ *   While in the STOP_COPY state the device has the same behavior as STOP
+ *   with the addition that the data transfer session continues to stream the
+ *   migration state. End of stream on the FD indicates the entire device
+ *   state has been transferred.
+ *
+ *   The user should take steps to restrict access to vfio device regions while
+ *   the device is in STOP_COPY or risk corruption of the device migration data
+ *   stream.
+ *
+ * STOP -> RESUMING
+ *   Entering the RESUMING state starts a process of restoring the device state
+ *   and will return a new data_fd. The data stream fed into the data_fd should
+ *   be taken from the data transfer output of a single FD during saving from
+ *   a compatible device. The migration driver may alter/reset the internal
+ *   device state for this arc if required to prepare the device to receive the
+ *   migration data.
+ *
+ * any -> ERROR
+ *   ERROR cannot be specified as a device state, however any transition request
+ *   can be failed with an errno return and may then move the device_state into
+ *   ERROR. In this case the device was unable to execute the requested arc and
+ *   was also unable to restore the device to any valid device_state.
+ *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
+ *   device_state back to RUNNING.
+ *
+ * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
+ * state for the device for the purposes of managing multiple devices within a
+ * user context where peer-to-peer DMA between devices may be active. The
+ * RUNNING_P2P states must prevent the device from initiating
+ * any new P2P DMA transactions. If the device can identify P2P transactions
+ * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
+ * driver must complete any such outstanding operations prior to completing the
+ * FSM arc into a P2P state. For the purpose of specification the states
+ * behave as though the device was fully running if not supported. Like while in
+ * STOP or STOP_COPY the user must not touch the device, otherwise the state
+ * can be exited.
+ *
+ * The remaining possible transitions are interpreted as combinations of the
+ * above FSM arcs. As there are multiple paths through the FSM arcs the path
+ * should be selected based on the following rules:
+ *   - Select the shortest path.
+ * Refer to vfio_mig_get_next_state() for the result of the algorithm.
+ *
+ * The automatic transit through the FSM arcs that make up the combination
+ * transition is invisible to the user. When working with combination arcs the
+ * user may see any step along the path in the device_state if SET_STATE
+ * fails. When handling these types of errors users should anticipate future
+ * revisions of this protocol using new states and those states becoming
+ * visible in this case.
+ *
+ * The optional states cannot be used with SET_STATE if the device does not
+ * support them. The user can discover if these states are supported by using
+ * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
+ * avoid knowing about these optional states if the kernel driver supports them.
+ */
+enum vfio_device_mig_state {
+	VFIO_DEVICE_STATE_ERROR = 0,
+	VFIO_DEVICE_STATE_STOP = 1,
+	VFIO_DEVICE_STATE_RUNNING = 2,
+	VFIO_DEVICE_STATE_STOP_COPY = 3,
+	VFIO_DEVICE_STATE_RESUMING = 4,
+	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+};
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
diff --git a/linux-headers/linux/vfio_zdev.h b/linux-headers/linux/vfio_zdev.h
index b4309397b6..77f2aff1f2 100644
--- a/linux-headers/linux/vfio_zdev.h
+++ b/linux-headers/linux/vfio_zdev.h
@@ -29,6 +29,9 @@ struct vfio_device_info_cap_zpci_base {
 	__u16 fmb_length;	/* Measurement Block Length (in bytes) */
 	__u8 pft;		/* PCI Function Type */
 	__u8 gid;		/* PCI function group ID */
+	/* End of version 1 */
+	__u32 fh;		/* PCI function handle */
+	/* End of version 2 */
 };
 
 /**
@@ -47,6 +50,10 @@ struct vfio_device_info_cap_zpci_group {
 	__u16 noi;		/* Maximum number of MSIs */
 	__u16 maxstbl;		/* Maximum Store Block Length */
 	__u8 version;		/* Supported PCI Version */
+	/* End of version 1 */
+	__u8 reserved;
+	__u16 imaxstbl;		/* Maximum Interpreted Store Block Length */
+	/* End of version 2 */
 };
 
 /**
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index c998860d7b..5d99e7c242 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -150,4 +150,11 @@
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
+
+/* Get the config size */
+#define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x79, __u32)
+
+/* Get the count of all virtqueues */
+#define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
+
 #endif
-- 
2.27.0
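
Note on the vfio.h hunk above: the new comment says that transitions not listed as direct FSM arcs are treated as combinations of arcs, chosen by the "select the shortest path" rule, and points to vfio_mig_get_next_state() for the result. The following is only a minimal sketch of that rule, not the kernel's implementation. It assumes the device supports RUNNING_P2P, uses only the arcs listed in the comment, and the names mig_state, arc and next_hop are made up for illustration.

```c
#include <assert.h>

/* Values copied from enum vfio_device_mig_state in the patch. */
enum mig_state {
	MIG_ERROR = 0,
	MIG_STOP = 1,
	MIG_RUNNING = 2,
	MIG_STOP_COPY = 3,
	MIG_RESUMING = 4,
	MIG_RUNNING_P2P = 5,
	MIG_NR_STATES
};

/* Direct arcs named in the header comment: arc[from][to] != 0. */
static const unsigned char arc[MIG_NR_STATES][MIG_NR_STATES] = {
	[MIG_RUNNING_P2P] = { [MIG_STOP] = 1, [MIG_RUNNING] = 1 },
	[MIG_STOP_COPY]   = { [MIG_STOP] = 1 },
	[MIG_RESUMING]    = { [MIG_STOP] = 1 },
	[MIG_RUNNING]     = { [MIG_RUNNING_P2P] = 1 },
	[MIG_STOP]        = { [MIG_RUNNING_P2P] = 1, [MIG_STOP_COPY] = 1,
			      [MIG_RESUMING] = 1 },
};

/*
 * Hypothetical helper: breadth-first search over the arcs above.
 * Returns the first hop on a shortest path from cur to target,
 * cur itself if no transition is needed, or MIG_ERROR if target
 * cannot be reached (ERROR has no incoming arcs a user can request).
 */
static enum mig_state next_hop(enum mig_state cur, enum mig_state target)
{
	int first[MIG_NR_STATES];	/* first hop taken to reach each state */
	int queue[MIG_NR_STATES];	/* each state is queued at most once */
	int head = 0, tail = 0;
	int i;

	assert((int)cur >= 0 && (int)cur < MIG_NR_STATES);
	assert((int)target >= 0 && (int)target < MIG_NR_STATES);

	if (cur == target)
		return cur;

	for (i = 0; i < MIG_NR_STATES; i++)
		first[i] = -1;		/* -1 marks "not visited yet" */
	first[cur] = cur;
	queue[tail++] = cur;

	while (head < tail) {
		int s = queue[head++];
		int n;

		for (n = 0; n < MIG_NR_STATES; n++) {
			if (!arc[s][n] || first[n] != -1)
				continue;
			/* Neighbours of cur are their own first hop. */
			first[n] = (s == (int)cur) ? n : first[s];
			if (n == (int)target)
				return (enum mig_state)first[n];
			queue[tail++] = n;
		}
	}
	return MIG_ERROR;
}
```

With these arcs, a request to go from RUNNING to STOP_COPY first steps to RUNNING_P2P, then STOP, then STOP_COPY. A device that does not advertise VFIO_MIGRATION_P2P would need a different arc table, which this sketch does not model.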




* [PATCH v5 1/9] Update linux headers
@ 2022-04-04 18:17   ` Matthew Rosato
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman, pmorel,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

This is a placeholder that pulls in 5.18-rc1 plus the unmerged kernel
changes required by this series.  A proper header sync can be done once
the associated kernel code merges.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
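
Illustration only, not part of the commit: one of the unmerged kernel changes pulled in here is the KVM_S390_ZPCI_OP ioctl and its struct kvm_s390_zpci_op argument (see the linux-headers/linux/kvm.h hunk below). The sketch shows how a caller might fill that argument for KVM_S390_ZPCIOP_REG_AEN. To build without the unmerged headers it uses a local mirror of the struct. The names zpci_op_sketch and make_reg_aen are hypothetical, and reading KVM_S390_ZPCIOP_REGAEN_HOST as "request host delivery" is an assumption based on the flag name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_zpci_op as laid out in the patch. */
struct zpci_op_sketch {
	uint32_t fh;		/* target device */
	uint8_t  op;		/* operation to perform */
	uint8_t  pad[3];
	union {
		struct {
			uint64_t ibv;	/* Guest addr of interrupt bit vector */
			uint64_t sb;	/* Guest addr of summary bit */
			uint32_t flags;
			uint32_t noi;	/* Number of interrupts */
			uint8_t  isc;	/* Guest interrupt subclass */
			uint8_t  sbo;	/* Offset of guest summary bit vector */
			uint16_t pad;
		} reg_aen;
		uint64_t reserved[8];
	} u;
};

/* Same values as the defines added by the patch. */
#define ZPCIOP_REG_AEN		0
#define ZPCIOP_DEREG_AEN	1
#define ZPCIOP_REGAEN_HOST	(1 << 0)

/*
 * Hypothetical helper: build a zeroed REG_AEN request so that the pad
 * and reserved bytes never carry stack garbage into the kernel.
 */
static struct zpci_op_sketch make_reg_aen(uint32_t fh, uint64_t ibv,
					  uint64_t sb, uint32_t noi,
					  uint8_t isc, uint8_t sbo,
					  int host_delivery)
{
	struct zpci_op_sketch args;

	memset(&args, 0, sizeof(args));
	args.fh = fh;
	args.op = ZPCIOP_REG_AEN;
	args.u.reg_aen.ibv = ibv;
	args.u.reg_aen.sb = sb;
	args.u.reg_aen.noi = noi;
	args.u.reg_aen.isc = isc;
	args.u.reg_aen.sbo = sbo;
	/* Assumption: the HOST flag asks the host to deliver the events. */
	args.u.reg_aen.flags = host_delivery ? ZPCIOP_REGAEN_HOST : 0;
	return args;
}
```

With the real headers in place, the filled structure would be passed to ioctl() with the KVM_S390_ZPCI_OP request once KVM_CAP_S390_ZPCI_OP has been checked. Which file descriptor the ioctl targets is defined by the kernel side of the series and is not shown here.
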
 .../linux/input-event-codes.h                 |   4 +-
 .../standard-headers/linux/virtio_config.h    |   6 +
 .../standard-headers/linux/virtio_crypto.h    |  82 +++-
 linux-headers/asm-arm64/kvm.h                 |  16 +
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-s390/kvm.h                  |   1 +
 linux-headers/linux/kvm.h                     |  50 ++-
 linux-headers/linux/psci.h                    |   4 +
 linux-headers/linux/userfaultfd.h             |   8 +-
 linux-headers/linux/vfio.h                    | 406 +++++++++---------
 linux-headers/linux/vfio_zdev.h               |   7 +
 linux-headers/linux/vhost.h                   |   7 +
 13 files changed, 376 insertions(+), 219 deletions(-)

diff --git a/include/standard-headers/linux/input-event-codes.h b/include/standard-headers/linux/input-event-codes.h
index b5e86b40ab..e36c01003a 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -278,7 +278,8 @@
 #define KEY_PAUSECD		201
 #define KEY_PROG3		202
 #define KEY_PROG4		203
-#define KEY_DASHBOARD		204	/* AL Dashboard */
+#define KEY_ALL_APPLICATIONS	204	/* AC Desktop Show All Applications */
+#define KEY_DASHBOARD		KEY_ALL_APPLICATIONS
 #define KEY_SUSPEND		205
 #define KEY_CLOSE		206	/* AC Close */
 #define KEY_PLAY		207
@@ -612,6 +613,7 @@
 #define KEY_ASSISTANT		0x247	/* AL Context-aware desktop assistant */
 #define KEY_KBD_LAYOUT_NEXT	0x248	/* AC Next Keyboard Layout Select */
 #define KEY_EMOJI_PICKER	0x249	/* Show/hide emoji picker (HUTRR101) */
+#define KEY_DICTATE		0x24a	/* Start or Stop Voice Dictation Session (HUTRR99) */
 
 #define KEY_BRIGHTNESS_MIN		0x250	/* Set Brightness to Minimum */
 #define KEY_BRIGHTNESS_MAX		0x251	/* Set Brightness to Maximum */
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 22e3a85f67..7acd8d4abc 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -80,6 +80,12 @@
 /* This feature indicates support for the packed virtqueue layout. */
 #define VIRTIO_F_RING_PACKED		34
 
+/*
+ * Inorder feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER		35
+
 /*
  * This feature indicates that memory accesses by the driver and the
  * device are ordered in a way described by the platform.
diff --git a/include/standard-headers/linux/virtio_crypto.h b/include/standard-headers/linux/virtio_crypto.h
index 5ff0b4ee59..68066dafb6 100644
--- a/include/standard-headers/linux/virtio_crypto.h
+++ b/include/standard-headers/linux/virtio_crypto.h
@@ -37,6 +37,7 @@
 #define VIRTIO_CRYPTO_SERVICE_HASH   1
 #define VIRTIO_CRYPTO_SERVICE_MAC    2
 #define VIRTIO_CRYPTO_SERVICE_AEAD   3
+#define VIRTIO_CRYPTO_SERVICE_AKCIPHER 4
 
 #define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
 
@@ -57,6 +58,10 @@ struct virtio_crypto_ctrl_header {
 	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
 #define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
 	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
+#define VIRTIO_CRYPTO_AKCIPHER_CREATE_SESSION \
+	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x04)
+#define VIRTIO_CRYPTO_AKCIPHER_DESTROY_SESSION \
+	   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x05)
 	uint32_t opcode;
 	uint32_t algo;
 	uint32_t flag;
@@ -180,6 +185,58 @@ struct virtio_crypto_aead_create_session_req {
 	uint8_t padding[32];
 };
 
+struct virtio_crypto_rsa_session_para {
+#define VIRTIO_CRYPTO_RSA_RAW_PADDING   0
+#define VIRTIO_CRYPTO_RSA_PKCS1_PADDING 1
+	uint32_t padding_algo;
+
+#define VIRTIO_CRYPTO_RSA_NO_HASH   0
+#define VIRTIO_CRYPTO_RSA_MD2       1
+#define VIRTIO_CRYPTO_RSA_MD3       2
+#define VIRTIO_CRYPTO_RSA_MD4       3
+#define VIRTIO_CRYPTO_RSA_MD5       4
+#define VIRTIO_CRYPTO_RSA_SHA1      5
+#define VIRTIO_CRYPTO_RSA_SHA256    6
+#define VIRTIO_CRYPTO_RSA_SHA384    7
+#define VIRTIO_CRYPTO_RSA_SHA512    8
+#define VIRTIO_CRYPTO_RSA_SHA224    9
+	uint32_t hash_algo;
+};
+
+struct virtio_crypto_ecdsa_session_para {
+#define VIRTIO_CRYPTO_CURVE_UNKNOWN   0
+#define VIRTIO_CRYPTO_CURVE_NIST_P192 1
+#define VIRTIO_CRYPTO_CURVE_NIST_P224 2
+#define VIRTIO_CRYPTO_CURVE_NIST_P256 3
+#define VIRTIO_CRYPTO_CURVE_NIST_P384 4
+#define VIRTIO_CRYPTO_CURVE_NIST_P521 5
+	uint32_t curve_id;
+	uint32_t padding;
+};
+
+struct virtio_crypto_akcipher_session_para {
+#define VIRTIO_CRYPTO_NO_AKCIPHER    0
+#define VIRTIO_CRYPTO_AKCIPHER_RSA   1
+#define VIRTIO_CRYPTO_AKCIPHER_DSA   2
+#define VIRTIO_CRYPTO_AKCIPHER_ECDSA 3
+	uint32_t algo;
+
+#define VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC  1
+#define VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE 2
+	uint32_t keytype;
+	uint32_t keylen;
+
+	union {
+		struct virtio_crypto_rsa_session_para rsa;
+		struct virtio_crypto_ecdsa_session_para ecdsa;
+	} u;
+};
+
+struct virtio_crypto_akcipher_create_session_req {
+	struct virtio_crypto_akcipher_session_para para;
+	uint8_t padding[36];
+};
+
 struct virtio_crypto_alg_chain_session_para {
 #define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER  1
 #define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH  2
@@ -247,6 +304,8 @@ struct virtio_crypto_op_ctrl_req {
 			mac_create_session;
 		struct virtio_crypto_aead_create_session_req
 			aead_create_session;
+		struct virtio_crypto_akcipher_create_session_req
+			akcipher_create_session;
 		struct virtio_crypto_destroy_session_req
 			destroy_session;
 		uint8_t padding[56];
@@ -266,6 +325,14 @@ struct virtio_crypto_op_header {
 	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
 #define VIRTIO_CRYPTO_AEAD_DECRYPT \
 	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
+#define VIRTIO_CRYPTO_AKCIPHER_ENCRYPT \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x00)
+#define VIRTIO_CRYPTO_AKCIPHER_DECRYPT \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x01)
+#define VIRTIO_CRYPTO_AKCIPHER_SIGN \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x02)
+#define VIRTIO_CRYPTO_AKCIPHER_VERIFY \
+	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x03)
 	uint32_t opcode;
 	/* algo should be service-specific algorithms */
 	uint32_t algo;
@@ -390,6 +457,16 @@ struct virtio_crypto_aead_data_req {
 	uint8_t padding[32];
 };
 
+struct virtio_crypto_akcipher_para {
+	uint32_t src_data_len;
+	uint32_t dst_data_len;
+};
+
+struct virtio_crypto_akcipher_data_req {
+	struct virtio_crypto_akcipher_para para;
+	uint8_t padding[40];
+};
+
 /* The request of the data virtqueue's packet */
 struct virtio_crypto_op_data_req {
 	struct virtio_crypto_op_header header;
@@ -399,6 +476,7 @@ struct virtio_crypto_op_data_req {
 		struct virtio_crypto_hash_data_req hash_req;
 		struct virtio_crypto_mac_data_req mac_req;
 		struct virtio_crypto_aead_data_req aead_req;
+		struct virtio_crypto_akcipher_data_req akcipher_req;
 		uint8_t padding[48];
 	} u;
 };
@@ -408,6 +486,8 @@ struct virtio_crypto_op_data_req {
 #define VIRTIO_CRYPTO_BADMSG    2
 #define VIRTIO_CRYPTO_NOTSUPP   3
 #define VIRTIO_CRYPTO_INVSESS   4 /* Invalid session id */
+#define VIRTIO_CRYPTO_NOSPC     5 /* no free session ID */
+#define VIRTIO_CRYPTO_KEY_REJECTED 6 /* Signature verification failed */
 
 /* The accelerator hardware is ready */
 #define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
@@ -438,7 +518,7 @@ struct virtio_crypto_config {
 	uint32_t max_cipher_key_len;
 	/* Maximum length of authenticated key */
 	uint32_t max_auth_key_len;
-	uint32_t reserve;
+	uint32_t akcipher_algo;
 	/* Maximum size of each crypto request's content */
 	uint64_t max_size;
 };
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 3d2ce9912d..5c28a9737a 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -281,6 +281,11 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED     	(1U << 4)
 
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3	KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_AVAIL		0
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_AVAIL		1
+#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3_NOT_REQUIRED	2
+
 /* SVE registers */
 #define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
 
@@ -362,6 +367,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
 #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
@@ -411,6 +417,16 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* arm64-specific kvm_run::system_event flags */
+/*
+ * Reset caused by a PSCI v1.1 SYSTEM_RESET2 call.
+ * Valid only when the system event has a type of KVM_SYSTEM_EVENT_RESET.
+ */
+#define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2	(1ULL << 0)
+
+/* run->fail_entry.hardware_entry_failure_reason codes. */
+#define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED	(1ULL << 0)
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h
index 1567a3294c..6c1aa92a92 100644
--- a/linux-headers/asm-generic/mman-common.h
+++ b/linux-headers/asm-generic/mman-common.h
@@ -75,6 +75,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/linux-headers/asm-mips/mman.h b/linux-headers/asm-mips/mman.h
index 40b210c65a..1be428663c 100644
--- a/linux-headers/asm-mips/mman.h
+++ b/linux-headers/asm-mips/mman.h
@@ -101,6 +101,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index f053b8304a..d8259ff9a1 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -130,6 +130,7 @@ struct kvm_s390_vm_cpu_machine {
 #define KVM_S390_VM_CPU_FEAT_PFMFI	11
 #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
 #define KVM_S390_VM_CPU_FEAT_KSS	13
+#define KVM_S390_VM_CPU_FEAT_ZPCI_INTERP 14
 struct kvm_s390_vm_cpu_feat {
 	__u64 feat[16];
 };
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index d232feaae9..f71befac09 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -562,9 +562,12 @@ struct kvm_s390_mem_op {
 	__u32 op;		/* type of operation */
 	__u64 buf;		/* buffer in userspace */
 	union {
-		__u8 ar;	/* the access register number */
+		struct {
+			__u8 ar;	/* the access register number */
+			__u8 key;	/* access key, ignored if flag unset */
+		};
 		__u32 sida_offset; /* offset into the sida */
-		__u8 reserved[32]; /* should be set to 0 */
+		__u8 reserved[32]; /* ignored */
 	};
 };
 /* types for kvm_s390_mem_op->op */
@@ -572,9 +575,12 @@ struct kvm_s390_mem_op {
 #define KVM_S390_MEMOP_LOGICAL_WRITE	1
 #define KVM_S390_MEMOP_SIDA_READ	2
 #define KVM_S390_MEMOP_SIDA_WRITE	3
+#define KVM_S390_MEMOP_ABSOLUTE_READ	4
+#define KVM_S390_MEMOP_ABSOLUTE_WRITE	5
 /* flags for kvm_s390_mem_op->flags */
 #define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
+#define KVM_S390_MEMOP_F_SKEY_PROTECTION	(1ULL << 2)
 
 /* for KVM_INTERRUPT */
 struct kvm_interrupt {
@@ -1134,6 +1140,11 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
+#define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_S390_MEM_OP_EXTENSION 211
+#define KVM_CAP_PMU_CAPABILITY 212
+#define KVM_CAP_DISABLE_QUIRKS2 213
+#define KVM_CAP_S390_ZPCI_OP 214
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1624,9 +1635,6 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
-/* Available with KVM_CAP_XSAVE2 */
-#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
-
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
@@ -1973,6 +1981,8 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
 
+#define KVM_PMU_CAP_DISABLE                    (1 << 0)
+
 /**
  * struct kvm_stats_header - Header of per vm/vcpu binary statistics data.
  * @flags: Some extra information for header, always 0 for now.
@@ -2051,4 +2061,34 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_XSAVE2 */
 #define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
 
+/* Available with KVM_CAP_S390_ZPCI_OP */
+#define KVM_S390_ZPCI_OP	  _IOW(KVMIO,  0xd0, struct kvm_s390_zpci_op)
+
+struct kvm_s390_zpci_op {
+	/* in */
+	__u32 fh;		/* target device */
+	__u8  op;		/* operation to perform */
+	__u8  pad[3];
+	union {
+		/* for KVM_S390_ZPCIOP_REG_AEN */
+		struct {
+			__u64 ibv;	/* Guest addr of interrupt bit vector */
+			__u64 sb;	/* Guest addr of summary bit */
+			__u32 flags;
+			__u32 noi;	/* Number of interrupts */
+			__u8 isc;	/* Guest interrupt subclass */
+			__u8 sbo;	/* Offset of guest summary bit vector */
+			__u16 pad;
+		} reg_aen;
+		__u64 reserved[8];
+	} u;
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_REG_AEN		0
+#define KVM_S390_ZPCIOP_DEREG_AEN	1
+
+/* flags for kvm_s390_zpci_op->u.reg_aen.flags */
+#define KVM_S390_ZPCIOP_REGAEN_HOST	(1 << 0)
+
 #endif /* __LINUX_KVM_H */
diff --git a/linux-headers/linux/psci.h b/linux-headers/linux/psci.h
index a6772d508b..213b2a0f70 100644
--- a/linux-headers/linux/psci.h
+++ b/linux-headers/linux/psci.h
@@ -82,6 +82,10 @@
 #define PSCI_0_2_TOS_UP_NO_MIGRATE		1
 #define PSCI_0_2_TOS_MP				2
 
+/* PSCI v1.1 reset type encoding for SYSTEM_RESET2 */
+#define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET	0
+#define PSCI_1_1_RESET_TYPE_VENDOR_START	0x80000000U
+
 /* PSCI version decoding (independent of PSCI version) */
 #define PSCI_VERSION_MAJOR_SHIFT		16
 #define PSCI_VERSION_MINOR_MASK			\
diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 8479af5f4c..769b8379e4 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_EXACT_ADDRESS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -189,6 +190,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
 	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+	 *
+	 * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
+	 * faults would be provided and the offset within the page would not be
+	 * masked.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index e680594f27..e9f7795c39 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -323,7 +323,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX                    (1)
 #define VFIO_REGION_TYPE_CCW			(2)
-#define VFIO_REGION_TYPE_MIGRATION              (3)
+#define VFIO_REGION_TYPE_MIGRATION_DEPRECATED   (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -405,225 +405,29 @@ struct vfio_region_gfx_edid {
 #define VFIO_REGION_SUBTYPE_CCW_CRW		(3)
 
 /* sub-types for VFIO_REGION_TYPE_MIGRATION */
-#define VFIO_REGION_SUBTYPE_MIGRATION           (1)
-
-/*
- * The structure vfio_device_migration_info is placed at the 0th offset of
- * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
- * migration information. Field accesses from this structure are only supported
- * at their native width and alignment. Otherwise, the result is undefined and
- * vendor drivers should return an error.
- *
- * device_state: (read/write)
- *      - The user application writes to this field to inform the vendor driver
- *        about the device state to be transitioned to.
- *      - The vendor driver should take the necessary actions to change the
- *        device state. After successful transition to a given state, the
- *        vendor driver should return success on write(device_state, state)
- *        system call. If the device state transition fails, the vendor driver
- *        should return an appropriate -errno for the fault condition.
- *      - On the user application side, if the device state transition fails,
- *	  that is, if write(device_state, state) returns an error, read
- *	  device_state again to determine the current state of the device from
- *	  the vendor driver.
- *      - The vendor driver should return previous state of the device unless
- *        the vendor driver has encountered an internal error, in which case
- *        the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
- *      - The user application must use the device reset ioctl to recover the
- *        device from VFIO_DEVICE_STATE_ERROR state. If the device is
- *        indicated to be in a valid device state by reading device_state, the
- *        user application may attempt to transition the device to any valid
- *        state reachable from the current state or terminate itself.
- *
- *      device_state consists of 3 bits:
- *      - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
- *        it indicates the _STOP state. When the device state is changed to
- *        _STOP, driver should stop the device before write() returns.
- *      - If bit 1 is set, it indicates the _SAVING state, which means that the
- *        driver should start gathering device state information that will be
- *        provided to the VFIO user application to save the device's state.
- *      - If bit 2 is set, it indicates the _RESUMING state, which means that
- *        the driver should prepare to resume the device. Data provided through
- *        the migration region should be used to resume the device.
- *      Bits 3 - 31 are reserved for future use. To preserve them, the user
- *      application should perform a read-modify-write operation on this
- *      field when modifying the specified bits.
- *
- *  +------- _RESUMING
- *  |+------ _SAVING
- *  ||+----- _RUNNING
- *  |||
- *  000b => Device Stopped, not saving or resuming
- *  001b => Device running, which is the default state
- *  010b => Stop the device & save the device state, stop-and-copy state
- *  011b => Device running and save the device state, pre-copy state
- *  100b => Device stopped and the device state is resuming
- *  101b => Invalid state
- *  110b => Error state
- *  111b => Invalid state
- *
- * State transitions:
- *
- *              _RESUMING  _RUNNING    Pre-copy    Stop-and-copy   _STOP
- *                (100b)     (001b)     (011b)        (010b)       (000b)
- * 0. Running or default state
- *                             |
- *
- * 1. Normal Shutdown (optional)
- *                             |------------------------------------->|
- *
- * 2. Save the state or suspend
- *                             |------------------------->|---------->|
- *
- * 3. Save the state during live migration
- *                             |----------->|------------>|---------->|
- *
- * 4. Resuming
- *                  |<---------|
- *
- * 5. Resumed
- *                  |--------->|
- *
- * 0. Default state of VFIO device is _RUNNING when the user application starts.
- * 1. During normal shutdown of the user application, the user application may
- *    optionally change the VFIO device state from _RUNNING to _STOP. This
- *    transition is optional. The vendor driver must support this transition but
- *    must not require it.
- * 2. When the user application saves state or suspends the application, the
- *    device state transitions from _RUNNING to stop-and-copy and then to _STOP.
- *    On state transition from _RUNNING to stop-and-copy, driver must stop the
- *    device, save the device state and send it to the application through the
- *    migration region. The sequence to be followed for such transition is given
- *    below.
- * 3. In live migration of user application, the state transitions from _RUNNING
- *    to pre-copy, to stop-and-copy, and to _STOP.
- *    On state transition from _RUNNING to pre-copy, the driver should start
- *    gathering the device state while the application is still running and send
- *    the device state data to application through the migration region.
- *    On state transition from pre-copy to stop-and-copy, the driver must stop
- *    the device, save the device state and send it to the user application
- *    through the migration region.
- *    Vendor drivers must support the pre-copy state even for implementations
- *    where no data is provided to the user before the stop-and-copy state. The
- *    user must not be required to consume all migration data before the device
- *    transitions to a new state, including the stop-and-copy state.
- *    The sequence to be followed for above two transitions is given below.
- * 4. To start the resuming phase, the device state should be transitioned from
- *    the _RUNNING to the _RESUMING state.
- *    In the _RESUMING state, the driver should use the device state data
- *    received through the migration region to resume the device.
- * 5. After providing saved device data to the driver, the application should
- *    change the state from _RESUMING to _RUNNING.
- *
- * reserved:
- *      Reads on this field return zero and writes are ignored.
- *
- * pending_bytes: (read only)
- *      The number of pending bytes still to be migrated from the vendor driver.
- *
- * data_offset: (read only)
- *      The user application should read data_offset field from the migration
- *      region. The user application should read the device data from this
- *      offset within the migration region during the _SAVING state or write
- *      the device data during the _RESUMING state. See below for details of
- *      sequence to be followed.
- *
- * data_size: (read/write)
- *      The user application should read data_size to get the size in bytes of
- *      the data copied in the migration region during the _SAVING state and
- *      write the size in bytes of the data copied in the migration region
- *      during the _RESUMING state.
- *
- * The format of the migration region is as follows:
- *  ------------------------------------------------------------------
- * |vfio_device_migration_info|    data section                      |
- * |                          |     ///////////////////////////////  |
- * ------------------------------------------------------------------
- *   ^                              ^
- *  offset 0-trapped part        data_offset
- *
- * The structure vfio_device_migration_info is always followed by the data
- * section in the region, so data_offset will always be nonzero. The offset
- * from where the data is copied is decided by the kernel driver. The data
- * section can be trapped, mmapped, or partitioned, depending on how the kernel
- * driver defines the data section. The data section partition can be defined
- * as mapped by the sparse mmap capability. If mmapped, data_offset must be
- * page aligned, whereas initial section which contains the
- * vfio_device_migration_info structure, might not end at the offset, which is
- * page aligned. The user is not required to access through mmap regardless
- * of the capabilities of the region mmap.
- * The vendor driver should determine whether and how to partition the data
- * section. The vendor driver should return data_offset accordingly.
- *
- * The sequence to be followed while in pre-copy state and stop-and-copy state
- * is as follows:
- * a. Read pending_bytes, indicating the start of a new iteration to get device
- *    data. Repeated read on pending_bytes at this stage should have no side
- *    effects.
- *    If pending_bytes == 0, the user application should not iterate to get data
- *    for that device.
- *    If pending_bytes > 0, perform the following steps.
- * b. Read data_offset, indicating that the vendor driver should make data
- *    available through the data section. The vendor driver should return this
- *    read operation only after data is available from (region + data_offset)
- *    to (region + data_offset + data_size).
- * c. Read data_size, which is the amount of data in bytes available through
- *    the migration region.
- *    Read on data_offset and data_size should return the offset and size of
- *    the current buffer if the user application reads data_offset and
- *    data_size more than once here.
- * d. Read data_size bytes of data from (region + data_offset) from the
- *    migration region.
- * e. Process the data.
- * f. Read pending_bytes, which indicates that the data from the previous
- *    iteration has been read. If pending_bytes > 0, go to step b.
- *
- * The user application can transition from the _SAVING|_RUNNING
- * (pre-copy state) to the _SAVING (stop-and-copy) state regardless of the
- * number of pending bytes. The user application should iterate in _SAVING
- * (stop-and-copy) until pending_bytes is 0.
- *
- * The sequence to be followed while _RESUMING device state is as follows:
- * While data for this device is available, repeat the following steps:
- * a. Read data_offset from where the user application should write data.
- * b. Write migration data starting at the migration region + data_offset for
- *    the length determined by data_size from the migration source.
- * c. Write data_size, which indicates to the vendor driver that data is
- *    written in the migration region. Vendor driver must return this write
- *    operations on consuming data. Vendor driver should apply the
- *    user-provided migration region data to the device resume state.
- *
- * If an error occurs during the above sequences, the vendor driver can return
- * an error code for next read() or write() operation, which will terminate the
- * loop. The user application should then take the next necessary action, for
- * example, failing migration or terminating the user application.
- *
- * For the user application, data is opaque. The user application should write
- * data in the same order as the data is received and the data should be of
- * same transaction size at the source.
- */
+#define VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED (1)
 
 struct vfio_device_migration_info {
 	__u32 device_state;         /* VFIO device state */
-#define VFIO_DEVICE_STATE_STOP      (0)
-#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
-#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
-#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
-#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
-				     VFIO_DEVICE_STATE_SAVING |  \
-				     VFIO_DEVICE_STATE_RESUMING)
+#define VFIO_DEVICE_STATE_V1_STOP      (0)
+#define VFIO_DEVICE_STATE_V1_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_V1_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_V1_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_V1_RUNNING | \
+				     VFIO_DEVICE_STATE_V1_SAVING |  \
+				     VFIO_DEVICE_STATE_V1_RESUMING)
 
 #define VFIO_DEVICE_STATE_VALID(state) \
-	(state & VFIO_DEVICE_STATE_RESUMING ? \
-	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
+	(state & VFIO_DEVICE_STATE_V1_RESUMING ? \
+	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_V1_RESUMING : 1)
 
 #define VFIO_DEVICE_STATE_IS_ERROR(state) \
-	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_SAVING | \
-					      VFIO_DEVICE_STATE_RESUMING))
+	((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_V1_SAVING | \
+					      VFIO_DEVICE_STATE_V1_RESUMING))
 
 #define VFIO_DEVICE_STATE_SET_ERROR(state) \
-	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_SATE_SAVING | \
-					     VFIO_DEVICE_STATE_RESUMING)
+	((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_STATE_V1_SAVING | \
+					     VFIO_DEVICE_STATE_V1_RESUMING)
 
 	__u32 reserved;
 	__u64 pending_bytes;
@@ -1002,6 +806,186 @@ struct vfio_device_feature {
  */
 #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
 
+/*
+ * Indicates the device can support the migration API through
+ * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. If this GET succeeds, the RUNNING and
+ * ERROR states are always supported. Support for additional states is
+ * indicated via the flags field; at least VFIO_MIGRATION_STOP_COPY must be
+ * set.
+ *
+ * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
+ * RESUMING are supported.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
+ * is supported in addition to the STOP_COPY states.
+ *
+ * Other combinations of flags have behavior to be defined in the future.
+ */
+struct vfio_device_feature_migration {
+	__aligned_u64 flags;
+#define VFIO_MIGRATION_STOP_COPY	(1 << 0)
+#define VFIO_MIGRATION_P2P		(1 << 1)
+};
+#define VFIO_DEVICE_FEATURE_MIGRATION 1
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, execute a migration state change on the VFIO
+ * device. The new state is supplied in device_state, see enum
+ * vfio_device_mig_state for details
+ *
+ * The kernel migration driver must fully transition the device to the new state
+ * value before the operation returns to the user.
+ *
+ * The kernel migration driver must not generate asynchronous device state
+ * transitions outside of manipulation by the user or the VFIO_DEVICE_RESET
+ * ioctl as described above.
+ *
+ * If this function fails then current device_state may be the original
+ * operating state or some other state along the combination transition path.
+ * The user can then decide if it should execute a VFIO_DEVICE_RESET, attempt
+ * to return to the original state, or attempt to return to some other state
+ * such as RUNNING or STOP.
+ *
+ * If the new_state starts a new data transfer session then the FD associated
+ * with that session is returned in data_fd. The user is responsible to close
+ * this FD when it is finished. The user must consider the migration data stream
+ * carried over the FD to be opaque and must preserve the byte order of the
+ * stream. The user is not required to preserve buffer segmentation when writing
+ * the data stream during the RESUMING operation.
+ *
+ * Upon VFIO_DEVICE_FEATURE_GET, get the current migration state of the VFIO
+ * device, data_fd will be -1.
+ */
+struct vfio_device_feature_mig_state {
+	__u32 device_state; /* From enum vfio_device_mig_state */
+	__s32 data_fd;
+};
+#define VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE 2
+
+/*
+ * The device migration Finite State Machine is described by the enum
+ * vfio_device_mig_state. Some of the FSM arcs will create a migration data
+ * transfer session by returning a FD, in this case the migration data will
+ * flow over the FD using read() and write() as discussed below.
+ *
+ * There are 5 states to support VFIO_MIGRATION_STOP_COPY:
+ *  RUNNING - The device is running normally
+ *  STOP - The device does not change the internal or external state
+ *  STOP_COPY - The device internal state can be read out
+ *  RESUMING - The device is stopped and is loading a new internal state
+ *  ERROR - The device has failed and must be reset
+ *
+ * And 1 optional state to support VFIO_MIGRATION_P2P:
+ *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ *
+ * The FSM takes actions on the arcs between FSM states. The driver implements
+ * the following behavior for the FSM arcs:
+ *
+ * RUNNING_P2P -> STOP
+ * STOP_COPY -> STOP
+ *   While in STOP the device must stop the operation of the device. The device
+ *   must not generate interrupts, DMA, or any other change to external state.
+ *   It must not change its internal state. When stopped the device and kernel
+ *   migration driver must accept and respond to interaction to support external
+ *   subsystems in the STOP state, for example PCI MSI-X and PCI config space.
+ *   Failure by the user to restrict device access while in STOP must not result
+ *   in error conditions outside the user context (ex. host system faults).
+ *
+ *   The STOP_COPY arc will terminate a data transfer session.
+ *
+ * RESUMING -> STOP
+ *   Leaving RESUMING terminates a data transfer session and indicates the
+ *   device should complete processing of the data delivered by write(). The
+ *   kernel migration driver should complete the incorporation of data written
+ *   to the data transfer FD into the device internal state and perform
+ *   final validity and consistency checking of the new device state. If the
+ *   user provided data is found to be incomplete, inconsistent, or otherwise
+ *   invalid, the migration driver must fail the SET_STATE ioctl and
+ *   optionally go to the ERROR state as described below.
+ *
+ *   While in STOP the device has the same behavior as other STOP states
+ *   described above.
+ *
+ *   To abort a RESUMING session the device must be reset.
+ *
+ * RUNNING_P2P -> RUNNING
+ *   While in RUNNING the device is fully operational, the device may generate
+ *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
+ *   and the device may advance its internal state.
+ *
+ * RUNNING -> RUNNING_P2P
+ * STOP -> RUNNING_P2P
+ *   While in RUNNING_P2P the device is partially running in the P2P quiescent
+ *   state defined below.
+ *
+ * STOP -> STOP_COPY
+ *   This arc begins the process of saving the device state and will return a
+ *   new data_fd.
+ *
+ *   While in the STOP_COPY state the device has the same behavior as STOP
+ *   with the addition that the data transfer session continues to stream the
+ *   migration state. End of stream on the FD indicates the entire device
+ *   state has been transferred.
+ *
+ *   The user should take steps to restrict access to vfio device regions while
+ *   the device is in STOP_COPY or risk corruption of the device migration data
+ *   stream.
+ *
+ * STOP -> RESUMING
+ *   Entering the RESUMING state starts a process of restoring the device state
+ *   and will return a new data_fd. The data stream fed into the data_fd should
+ *   be taken from the data transfer output of a single FD during saving from
+ *   a compatible device. The migration driver may alter/reset the internal
+ *   device state for this arc if required to prepare the device to receive the
+ *   migration data.
+ *
+ * any -> ERROR
+ *   ERROR cannot be specified as a device state, however any transition request
+ *   can be failed with an errno return and may then move the device_state into
+ *   ERROR. In this case the device was unable to execute the requested arc and
+ *   was also unable to restore the device to any valid device_state.
+ *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
+ *   device_state back to RUNNING.
+ *
+ * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
+ * state for the device for the purposes of managing multiple devices within a
+ * user context where peer-to-peer DMA between devices may be active. The
+ * RUNNING_P2P states must prevent the device from initiating
+ * any new P2P DMA transactions. If the device can identify P2P transactions
+ * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
+ * driver must complete any such outstanding operations prior to completing the
+ * FSM arc into a P2P state. For the purpose of specification the states
+ * behave as though the device was fully running if not supported. Like while in
+ * STOP or STOP_COPY the user must not touch the device, otherwise the state
+ * can be exited.
+ *
+ * The remaining possible transitions are interpreted as combinations of the
+ * above FSM arcs. As there are multiple paths through the FSM arcs the path
+ * should be selected based on the following rules:
+ *   - Select the shortest path.
+ * Refer to vfio_mig_get_next_state() for the result of the algorithm.
+ *
+ * The automatic transit through the FSM arcs that make up the combination
+ * transition is invisible to the user. When working with combination arcs the
+ * user may see any step along the path in the device_state if SET_STATE
+ * fails. When handling these types of errors users should anticipate future
+ * revisions of this protocol using new states and those states becoming
+ * visible in this case.
+ *
+ * The optional states cannot be used with SET_STATE if the device does not
+ * support them. The user can discover if these states are supported by using
+ * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
+ * avoid knowing about these optional states if the kernel driver supports them.
+ */
+enum vfio_device_mig_state {
+	VFIO_DEVICE_STATE_ERROR = 0,
+	VFIO_DEVICE_STATE_STOP = 1,
+	VFIO_DEVICE_STATE_RUNNING = 2,
+	VFIO_DEVICE_STATE_STOP_COPY = 3,
+	VFIO_DEVICE_STATE_RESUMING = 4,
+	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+};
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
diff --git a/linux-headers/linux/vfio_zdev.h b/linux-headers/linux/vfio_zdev.h
index b4309397b6..77f2aff1f2 100644
--- a/linux-headers/linux/vfio_zdev.h
+++ b/linux-headers/linux/vfio_zdev.h
@@ -29,6 +29,9 @@ struct vfio_device_info_cap_zpci_base {
 	__u16 fmb_length;	/* Measurement Block Length (in bytes) */
 	__u8 pft;		/* PCI Function Type */
 	__u8 gid;		/* PCI function group ID */
+	/* End of version 1 */
+	__u32 fh;		/* PCI function handle */
+	/* End of version 2 */
 };
 
 /**
@@ -47,6 +50,10 @@ struct vfio_device_info_cap_zpci_group {
 	__u16 noi;		/* Maximum number of MSIs */
 	__u16 maxstbl;		/* Maximum Store Block Length */
 	__u8 version;		/* Supported PCI Version */
+	/* End of version 1 */
+	__u8 reserved;
+	__u16 imaxstbl;		/* Maximum Interpreted Store Block Length */
+	/* End of version 2 */
 };
 
 /**
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index c998860d7b..5d99e7c242 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -150,4 +150,11 @@
 /* Get the valid iova range */
 #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
 					     struct vhost_vdpa_iova_range)
+
+/* Get the config size */
+#define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x79, __u32)
+
+/* Get the count of all virtqueues */
+#define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
+
 #endif
-- 
2.27.0
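
Note on the migration v2 comment in the vfio.h hunk above: the rule
"select the shortest path" for combination transitions can be sketched
as a breadth-first search over the direct arcs that the comment lists.
This is only an illustration under assumptions. The names
enum mig_state, mig_arc and mig_next_hop() are hypothetical, the sketch
assumes a device that supports RUNNING_P2P, and it is not the kernel's
vfio_mig_get_next_state(), which the comment names as the authoritative
result.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of enum vfio_device_mig_state; values match the header. */
enum mig_state {
    MIG_ERROR = 0,
    MIG_STOP = 1,
    MIG_RUNNING = 2,
    MIG_STOP_COPY = 3,
    MIG_RESUMING = 4,
    MIG_RUNNING_P2P = 5,
    MIG_NR_STATES
};

/* Direct FSM arcs named in the header comment, indexed [from][to]. */
static const bool mig_arc[MIG_NR_STATES][MIG_NR_STATES] = {
    [MIG_RUNNING_P2P][MIG_STOP]    = true,
    [MIG_STOP_COPY][MIG_STOP]      = true,
    [MIG_RESUMING][MIG_STOP]       = true,
    [MIG_RUNNING_P2P][MIG_RUNNING] = true,
    [MIG_RUNNING][MIG_RUNNING_P2P] = true,
    [MIG_STOP][MIG_RUNNING_P2P]    = true,
    [MIG_STOP][MIG_STOP_COPY]      = true,
    [MIG_STOP][MIG_RESUMING]       = true,
};

/*
 * Hypothetical helper: return the first hop on a shortest path from cur
 * to target, cur itself if already there, or MIG_ERROR if the target is
 * not reachable through the arcs above (ERROR has no arcs in or out).
 */
static enum mig_state mig_next_hop(enum mig_state cur, enum mig_state target)
{
    int first_hop[MIG_NR_STATES];
    int queue[MIG_NR_STATES];
    int head = 0, tail = 0;

    assert((int)cur >= 0 && (int)cur < MIG_NR_STATES);
    assert((int)target >= 0 && (int)target < MIG_NR_STATES);

    if (cur == target) {
        return cur;
    }
    for (int s = 0; s < MIG_NR_STATES; s++) {
        first_hop[s] = -1;
    }
    /* Seed the search with the direct successors of cur. */
    for (int s = 0; s < MIG_NR_STATES; s++) {
        if (mig_arc[cur][s]) {
            first_hop[s] = s;
            queue[tail++] = s;
        }
    }
    /* Breadth-first search; each state is queued at most once. */
    while (head < tail) {
        int u = queue[head++];

        if (u == (int)target) {
            return (enum mig_state)first_hop[u];
        }
        for (int v = 0; v < MIG_NR_STATES; v++) {
            if (mig_arc[u][v] && first_hop[v] < 0 && v != (int)cur) {
                first_hop[v] = first_hop[u];
                queue[tail++] = v;
            }
        }
    }
    return MIG_ERROR;
}
```

Under these assumptions, a request to move from RUNNING to STOP_COPY
steps through RUNNING_P2P and STOP before reaching STOP_COPY, so the
caller would apply mig_next_hop() repeatedly until it returns the
target.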

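Note on the new KVM_S390_ZPCI_OP definitions in the kvm.h hunk above:
the following sketch shows how the argument block for the two operation
types might be filled in. The struct and constants are a local mirror of
the hunk, written with <stdint.h> types so the example builds on its
own. The helpers zpci_make_reg_aen() and zpci_make_dereg_aen() are
hypothetical, and reading KVM_S390_ZPCIOP_REGAEN_HOST as "host delivers
the notifications" is an assumption based on the cover letter. The
ioctl itself is not issued here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_zpci_op from linux-headers/linux/kvm.h. */
struct kvm_s390_zpci_op {
    uint32_t fh;            /* target device */
    uint8_t  op;            /* operation to perform */
    uint8_t  pad[3];
    union {
        struct {
            uint64_t ibv;   /* guest addr of interrupt bit vector */
            uint64_t sb;    /* guest addr of summary bit */
            uint32_t flags;
            uint32_t noi;   /* number of interrupts */
            uint8_t  isc;   /* guest interrupt subclass */
            uint8_t  sbo;   /* offset of guest summary bit vector */
            uint16_t pad;
        } reg_aen;
        uint64_t reserved[8];
    } u;
};

#define KVM_S390_ZPCIOP_REG_AEN     0
#define KVM_S390_ZPCIOP_DEREG_AEN   1
#define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0)

/* 8 header bytes (fh, op, pad) followed by the 64-byte union. */
static_assert(sizeof(struct kvm_s390_zpci_op) == 72,
              "unexpected kvm_s390_zpci_op layout");

/* Hypothetical helper: build a REG_AEN request with padding zeroed. */
static struct kvm_s390_zpci_op zpci_make_reg_aen(uint32_t fh, uint64_t ibv,
                                                 uint64_t sb, uint32_t noi,
                                                 uint8_t isc, uint8_t sbo,
                                                 int host_delivery)
{
    struct kvm_s390_zpci_op args;

    memset(&args, 0, sizeof(args));
    args.fh = fh;
    args.op = KVM_S390_ZPCIOP_REG_AEN;
    args.u.reg_aen.ibv = ibv;
    args.u.reg_aen.sb = sb;
    args.u.reg_aen.noi = noi;
    args.u.reg_aen.isc = isc;
    args.u.reg_aen.sbo = sbo;
    if (host_delivery) {
        /* Assumption: flag requests host-side delivery of the events. */
        args.u.reg_aen.flags |= KVM_S390_ZPCIOP_REGAEN_HOST;
    }
    return args;
}

/* Hypothetical helper: DEREG_AEN only needs the function handle. */
static struct kvm_s390_zpci_op zpci_make_dereg_aen(uint32_t fh)
{
    struct kvm_s390_zpci_op args;

    memset(&args, 0, sizeof(args));
    args.fh = fh;
    args.op = KVM_S390_ZPCIOP_DEREG_AEN;
    return args;
}
```

A caller would presumably pass a pointer to the resulting structure to
ioctl() with the KVM_S390_ZPCI_OP request after checking for
KVM_CAP_S390_ZPCI_OP. Which file descriptor the request is issued on is
not stated in this hunk.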


* [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

The v1 uapi is deprecated and will be replaced by v2 at some point;
this patch just tolerates the renaming of uapi fields to reflect
v1 / deprecated status.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/vfio/common.c    |  2 +-
 hw/vfio/migration.c | 19 +++++++++++--------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f5..7b1e12fb69 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -380,7 +380,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
                 return false;
             }
 
-            if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
+            if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
                 (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
                 continue;
             } else {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ff6b45de6b..e109cee551 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -432,7 +432,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
     }
 
     ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-                                   VFIO_DEVICE_STATE_SAVING);
+                                   VFIO_DEVICE_STATE_V1_SAVING);
     if (ret) {
         error_report("%s: Failed to set state SAVING", vbasedev->name);
         return ret;
@@ -532,7 +532,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     int ret;
 
     ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_RUNNING,
-                                   VFIO_DEVICE_STATE_SAVING);
+                                   VFIO_DEVICE_STATE_V1_SAVING);
     if (ret) {
         error_report("%s: Failed to set state STOP and SAVING",
                      vbasedev->name);
@@ -569,7 +569,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
         return ret;
     }
 
-    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_SAVING, 0);
+    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
     if (ret) {
         error_report("%s: Failed to set state STOPPED", vbasedev->name);
         return ret;
@@ -730,7 +730,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
          * start saving data.
          */
         if (state == RUN_STATE_SAVE_VM) {
-            value = VFIO_DEVICE_STATE_SAVING;
+            value = VFIO_DEVICE_STATE_V1_SAVING;
         } else {
             value = 0;
         }
@@ -768,8 +768,9 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
     case MIGRATION_STATUS_FAILED:
         bytes_transferred = 0;
         ret = vfio_migration_set_state(vbasedev,
-                      ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
-                      VFIO_DEVICE_STATE_RUNNING);
+                                       ~(VFIO_DEVICE_STATE_V1_SAVING |
+                                         VFIO_DEVICE_STATE_RESUMING),
+                                       VFIO_DEVICE_STATE_RUNNING);
         if (ret) {
             error_report("%s: Failed to set state RUNNING", vbasedev->name);
         }
@@ -864,8 +865,10 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
         goto add_blocker;
     }
 
-    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
-                                   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+    ret = vfio_get_dev_region_info(vbasedev,
+                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+                                   &info);
     if (ret) {
         goto add_blocker;
     }
-- 
2.27.0
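
Note on the v1 state encoding that this patch keeps using: the sketch
below restates the renamed v1 bit flags and the VALID / IS_ERROR macros
from patch 1/9 as small functions. It also illustrates a point that
appears to follow from reading the two patches together: the updated
header defines a v2 enum in which VFIO_DEVICE_STATE_RUNNING is 2, so a
call site that still uses the unprefixed name would seem to test the v1
SAVING bit rather than the v1 RUNNING bit. The definitions are copied
from the header hunks; the function names v1_state_valid(),
v1_state_is_error(), v1_is_precopy() and precopy_check_as_posted() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* v1 bit flags, copied from the renamed definitions in patch 1/9. */
#define VFIO_DEVICE_STATE_V1_STOP      (0)
#define VFIO_DEVICE_STATE_V1_RUNNING   (1 << 0)
#define VFIO_DEVICE_STATE_V1_SAVING    (1 << 1)
#define VFIO_DEVICE_STATE_V1_RESUMING  (1 << 2)
#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_V1_RUNNING | \
                                     VFIO_DEVICE_STATE_V1_SAVING |  \
                                     VFIO_DEVICE_STATE_V1_RESUMING)

/* v2 states, also copied from patch 1/9. */
enum vfio_device_mig_state {
    VFIO_DEVICE_STATE_ERROR = 0,
    VFIO_DEVICE_STATE_STOP = 1,
    VFIO_DEVICE_STATE_RUNNING = 2,
    VFIO_DEVICE_STATE_STOP_COPY = 3,
    VFIO_DEVICE_STATE_RESUMING = 4,
    VFIO_DEVICE_STATE_RUNNING_P2P = 5,
};

/* The unprefixed v2 name has the same value as the v1 SAVING bit. */
static_assert(VFIO_DEVICE_STATE_RUNNING == VFIO_DEVICE_STATE_V1_SAVING,
              "v2 RUNNING aliases the v1 SAVING bit");

/* Same test as VFIO_DEVICE_STATE_VALID(): RESUMING excludes other bits. */
static bool v1_state_valid(uint32_t state)
{
    if (state & VFIO_DEVICE_STATE_V1_RESUMING) {
        return (state & VFIO_DEVICE_STATE_MASK) ==
               VFIO_DEVICE_STATE_V1_RESUMING;
    }
    return true;
}

/* Same test as VFIO_DEVICE_STATE_IS_ERROR(): 110b marks the error state. */
static bool v1_state_is_error(uint32_t state)
{
    return (state & VFIO_DEVICE_STATE_MASK) ==
           (VFIO_DEVICE_STATE_V1_SAVING | VFIO_DEVICE_STATE_V1_RESUMING);
}

/* Pre-copy in v1 terms: SAVING and RUNNING both set (011b). */
static bool v1_is_precopy(uint32_t state)
{
    return (state & VFIO_DEVICE_STATE_V1_SAVING) &&
           (state & VFIO_DEVICE_STATE_V1_RUNNING);
}

/* The hw/vfio/common.c check as it reads after this patch is applied. */
static bool precopy_check_as_posted(uint32_t state)
{
    return (state & VFIO_DEVICE_STATE_V1_SAVING) &&
           (state & VFIO_DEVICE_STATE_RUNNING);
}
```

With these definitions, a device in the v1 stop-and-copy state (010b) is
not pre-copy according to v1_is_precopy(), while
precopy_check_as_posted() accepts it. If that reading is right, the
remaining unprefixed uses of RUNNING would need the V1_ prefix as well.
RESUMING happens to have the value 4 in both encodings.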




* [PATCH v5 3/9] target/s390x: add zpci-interp to cpu models
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

The zpci-interp feature is used to specify whether zPCI interpretation is
to be used for this guest.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-virtio-ccw.c          | 1 +
 target/s390x/cpu_features_def.h.inc | 1 +
 target/s390x/gen-features.c         | 2 ++
 target/s390x/kvm/kvm.c              | 1 +
 4 files changed, 5 insertions(+)
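
To make the compat handling concrete: the feature is added to the z14
GA1 full and default models, while the 6.2 machine type switches it
off again so that older machine types keep their previous CPU model.
A rough sketch of that idea follows. It assumes that
s390_cpudef_featoff_greater() clears the feature from the default set
of every model at or after the given generation and GA level; all
demo_* names and the table contents are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FEAT_ZPCI_INTERP (1u << 0)   /* hypothetical feature bit */

struct demo_cpudef {
    uint8_t gen;            /* hardware generation, e.g. 14 for z14 */
    uint8_t ga;             /* GA level within the generation */
    uint32_t default_feat;  /* default feature bitmap */
};

/* Hypothetical model table, ordered from oldest to newest. */
static struct demo_cpudef demo_defs[] = {
    { 13, 2, 0 },
    { 14, 1, DEMO_FEAT_ZPCI_INTERP },
    { 14, 2, DEMO_FEAT_ZPCI_INTERP },
    { 15, 1, DEMO_FEAT_ZPCI_INTERP },
};

/* Clear feat from the defaults of every model at or after (gen, ga). */
static void demo_featoff_greater(uint8_t gen, uint8_t ga, uint32_t feat)
{
    for (size_t i = 0; i < sizeof(demo_defs) / sizeof(demo_defs[0]); i++) {
        struct demo_cpudef *d = &demo_defs[i];

        if (d->gen > gen || (d->gen == gen && d->ga >= ga)) {
            d->default_feat &= ~feat;
        }
    }
}

/* Report whether model idx still has feat in its default set. */
static bool demo_has_default(size_t idx, uint32_t feat)
{
    return (demo_defs[idx].default_feat & feat) != 0;
}
```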

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 90480e7cf9..b190234308 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -805,6 +805,7 @@ static void ccw_machine_6_2_instance_options(MachineState *machine)
     static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
 
     ccw_machine_7_0_instance_options(machine);
+    s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
     s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
 }
 
diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
index e86662bb3b..4ade3182aa 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: Conditional-external-interception f
 DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
 DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
 DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
+DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
 
 /* Features exposed via the PLO instruction. */
 DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general registers)")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 22846121c4..9db6bd545e 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
     S390_FEAT_HPMA2,
     S390_FEAT_SIE_KSS,
     S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+    S390_FEAT_ZPCI_INTERP,
 };
 
 #define full_GEN14_GA2 EmptyFeat
@@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
     S390_FEAT_GROUP_MSA_EXT_8,
     S390_FEAT_MULTIPLE_EPOCH,
     S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
+    S390_FEAT_ZPCI_INTERP,
 };
 
 #define default_GEN14_GA2 EmptyFeat
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 6acf14d5ec..0357bfda89 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2294,6 +2294,7 @@ static int kvm_to_feat[][2] = {
     { KVM_S390_VM_CPU_FEAT_PFMFI, S390_FEAT_SIE_PFMFI},
     { KVM_S390_VM_CPU_FEAT_SIGPIF, S390_FEAT_SIE_SIGPIF},
     { KVM_S390_VM_CPU_FEAT_KSS, S390_FEAT_SIE_KSS},
+    { KVM_S390_VM_CPU_FEAT_ZPCI_INTERP, S390_FEAT_ZPCI_INTERP },
 };
 
 static int query_cpu_feat(S390FeatBitmap features)
-- 
2.27.0



* [PATCH v5 4/9] s390x/pci: add routine to get host function handle from CLP info
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

In order to interface with the underlying host zPCI device, we need
to know its function handle.  Add a routine to grab this from the
vfio CLP capabilities chain.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-vfio.c         | 83 ++++++++++++++++++++++++++------
 include/hw/s390x/s390-pci-vfio.h |  6 +++
 2 files changed, 73 insertions(+), 16 deletions(-)
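
As a reading aid, the lookup described above amounts to walking a
capability chain and checking the capability version before trusting
the handle field. The sketch below uses simplified stand-in
structures (demo_cap_header, demo_cap_base) whose layout is an
assumption for this example; the real vfio structures carry more
fields. It also assumes that each "next" value is an offset from the
start of the info buffer and that 0 ends the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the vfio structures; layouts are assumptions. */
struct demo_cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;      /* offset of the next capability, 0 = end of chain */
};

struct demo_cap_base {
    struct demo_cap_header header;
    uint32_t fh;        /* only meaningful when header.version >= 2 */
};

/*
 * Walk the chain that starts at first_off inside buf and copy the
 * function handle of capability id into *fh.  Returns false if the
 * capability is missing, too old, or does not fit in the buffer.
 */
static bool demo_get_host_fh(const uint8_t *buf, size_t len,
                             uint32_t first_off, uint16_t id, uint32_t *fh)
{
    size_t off = first_off;

    assert(buf && fh);
    while (off != 0 && off + sizeof(struct demo_cap_header) <= len) {
        struct demo_cap_header hdr;

        /* memcpy avoids unaligned access into the raw buffer. */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id) {
            struct demo_cap_base cap;

            /* Older capability versions do not carry the handle. */
            if (hdr.version < 2 || off + sizeof(cap) > len) {
                return false;
            }
            memcpy(&cap, buf + off, sizeof(cap));
            *fh = cap.fh;
            return true;
        }
        off = hdr.next;
    }
    return false;
}
```

The sketch does not guard against a chain that loops back on itself;
a production walker would want such a check.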

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 6f80a47e29..4bf0a7e22d 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -124,6 +124,27 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
     pbdev->zpci_fn.pft = 0;
 }
 
+static bool get_host_fh(S390PCIBusDevice *pbdev, struct vfio_device_info *info,
+                        uint32_t *fh)
+{
+    struct vfio_info_cap_header *hdr;
+    struct vfio_device_info_cap_zpci_base *cap;
+    VFIOPCIDevice *vpci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+
+    hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+
+    /* Can only get the host fh with version 2 or greater */
+    if (hdr == NULL || hdr->version < 2) {
+        trace_s390_pci_clp_cap(vpci->vbasedev.name,
+                               VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
+        return false;
+    }
+    cap = (void *) hdr;
+
+    *fh = cap->fh;
+    return true;
+}
+
 static void s390_pci_read_group(S390PCIBusDevice *pbdev,
                                 struct vfio_device_info *info)
 {
@@ -217,25 +238,13 @@ static void s390_pci_read_pfip(S390PCIBusDevice *pbdev,
     memcpy(pbdev->zpci_fn.pfip, cap->pfip, CLP_PFIP_NR_SEGMENTS);
 }
 
-/*
- * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
- * capabilities that contain information about CLP features provided by the
- * underlying host.
- * On entry, defaults have already been placed into the guest CLP response
- * buffers.  On exit, defaults will have been overwritten for any CLP features
- * found in the capability chain; defaults will remain for any CLP features not
- * found in the chain.
- */
-void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev,
+                                                uint32_t argsz)
 {
-    g_autofree struct vfio_device_info *info = NULL;
+    struct vfio_device_info *info = g_malloc0(argsz);
     VFIOPCIDevice *vfio_pci;
-    uint32_t argsz;
     int fd;
 
-    argsz = sizeof(*info);
-    info = g_malloc0(argsz);
-
     vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
     fd = vfio_pci->vbasedev.fd;
 
@@ -250,7 +259,8 @@ retry:
 
     if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) {
         trace_s390_pci_clp_dev_info(vfio_pci->vbasedev.name);
-        return;
+        g_free(info);
+        return NULL;
     }
 
     if (info->argsz > argsz) {
@@ -259,6 +269,47 @@ retry:
         goto retry;
     }
 
+    return info;
+}
+
+/*
+ * Get the host function handle from the vfio CLP capabilities chain.  Returns
+ * true if a fh value was placed into the provided buffer.  Returns false
+ * if a fh could not be obtained (ioctl failed or capability version does
+ * not include the fh)
+ */
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
+{
+    g_autofree struct vfio_device_info *info = NULL;
+
+    assert(fh);
+
+    info = get_device_info(pbdev, sizeof(*info));
+    if (!info) {
+        return false;
+    }
+
+    return get_host_fh(pbdev, info, fh);
+}
+
+/*
+ * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
+ * capabilities that contain information about CLP features provided by the
+ * underlying host.
+ * On entry, defaults have already been placed into the guest CLP response
+ * buffers.  On exit, defaults will have been overwritten for any CLP features
+ * found in the capability chain; defaults will remain for any CLP features not
+ * found in the chain.
+ */
+void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
+{
+    g_autofree struct vfio_device_info *info = NULL;
+
+    info = get_device_info(pbdev, sizeof(*info));
+    if (!info) {
+        return;
+    }
+
     /*
      * Find the CLP features provided and fill in the guest CLP responses.
      * Always call s390_pci_read_base first as information from this could
diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
index ff708aef50..0c2e4b5175 100644
--- a/include/hw/s390x/s390-pci-vfio.h
+++ b/include/hw/s390x/s390-pci-vfio.h
@@ -20,6 +20,7 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
 S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
                                           S390PCIBusDevice *pbdev);
 void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
+bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
 void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
@@ -33,6 +34,11 @@ static inline S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
 }
 static inline void s390_pci_end_dma_count(S390pciState *s,
                                           S390PCIDMACount *cnt) { }
+static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev,
+                                        uint32_t *fh)
+{
+    return false;
+}
 static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
 #endif
 
-- 
2.27.0
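
The get_device_info() helper factored out above keeps the existing
"ask, grow, ask again" pattern: the caller states its buffer size in
argsz, and if the provider reports that it needs more room the buffer
is reallocated and the query repeated. A self-contained sketch of
that pattern follows. The demo_* names are hypothetical, the callback
stands in for ioctl(fd, VFIO_DEVICE_GET_INFO, info), and plain
calloc/realloc/free are used here instead of the GLib allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_info {
    uint32_t argsz;     /* in: buffer size, out: size the provider needs */
    uint32_t flags;
};

/* Hypothetical provider callback; returns 0 on success. */
typedef int (*demo_query_fn)(struct demo_info *info);

/* Returns a heap buffer the caller must free(), or NULL on failure. */
static struct demo_info *demo_get_info(demo_query_fn query)
{
    uint32_t argsz = sizeof(struct demo_info);
    struct demo_info *info = calloc(1, argsz);

    if (!info) {
        return NULL;
    }

    for (;;) {
        struct demo_info *bigger;

        info->argsz = argsz;
        if (query(info)) {
            free(info);         /* do not leak the buffer on error */
            return NULL;
        }
        if (info->argsz <= argsz) {
            return info;        /* everything fit */
        }
        /* Provider needs more room: grow the buffer and ask again. */
        argsz = info->argsz;
        bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
            return NULL;
        }
        info = bigger;
    }
}

/* Example provider that needs 64 bytes and sets a flag once it has them. */
static int demo_query_64(struct demo_info *info)
{
    if (info->argsz >= 64) {
        info->flags = 1;
    }
    info->argsz = 64;
    return 0;
}

/* Example provider that always fails. */
static int demo_query_fail(struct demo_info *info)
{
    (void)info;
    return -1;
}
```

Returning NULL on failure is what lets both callers in the patch share
one error check before touching the capability chain.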



* [PATCH v5 5/9] s390x/pci: enable for load/store interpretation
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

If the appropriate CPU facility is available as well as the necessary
ZPCI_OP ioctl, then the underlying KVM host will enable load/store
interpretation for any guest device without a SHM bit in the guest
function handle.  For a device that will be using interpretation
support, ensure the guest function handle matches the host function
handle; this value is re-checked every time the guest issues a SET PCI FN
to enable the guest device as it is the only opportunity to reflect
function handle changes.

By default, unless interpret=off is specified, interpretation support will
always be assumed and exploited if the necessary ioctl and features are
available on the host kernel.  When these are unavailable, we will silently
revert to the interception model; this allows existing guest configurations
to work unmodified on hosts with and without zPCI interpretation support,
allowing QEMU to choose the best support model available.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/meson.build            |  1 +
 hw/s390x/s390-pci-bus.c         | 66 ++++++++++++++++++++++++++++++++-
 hw/s390x/s390-pci-inst.c        | 12 ++++++
 hw/s390x/s390-pci-kvm.c         | 21 +++++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 24 ++++++++++++
 target/s390x/kvm/kvm.c          |  7 ++++
 target/s390x/kvm/kvm_s390x.h    |  1 +
 8 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h
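
The plug-time decision described in the commit message can be
summarized in a small sketch. This is an illustration under stated
assumptions, not the code from the patch: the demo_* names are
hypothetical, demo_interp_allowed() stands in for
s390_pci_kvm_interp_allowed(), and plug_ok stands in for the result
of s390_pci_interp_plug().

```c
#include <assert.h>
#include <stdbool.h>

enum demo_mode {
    DEMO_MODE_INTERPRET,    /* firmware interprets guest PCI load/store */
    DEMO_MODE_INTERCEPT,    /* host intercepts the instructions */
    DEMO_MODE_ERROR,        /* plug fails */
};

struct demo_host {
    bool has_cpu_feat;      /* zpci-interp CPU feature is present */
    bool has_kvm_cap;       /* KVM reports the zPCI op capability */
};

/* Both the CPU feature and the KVM capability are required. */
static bool demo_interp_allowed(const struct demo_host *h)
{
    return h->has_cpu_feat && h->has_kvm_cap;
}

static enum demo_mode demo_pick_mode(const struct demo_host *h, bool is_vfio,
                                     bool interp_requested, bool plug_ok)
{
    assert(h);

    if (!is_vfio) {
        return DEMO_MODE_INTERCEPT;     /* emulated devices: always */
    }
    if (!interp_requested) {
        return DEMO_MODE_INTERCEPT;     /* interpret=off */
    }
    if (!demo_interp_allowed(h)) {
        return DEMO_MODE_INTERCEPT;     /* silent fallback */
    }
    return plug_ok ? DEMO_MODE_INTERPRET : DEMO_MODE_ERROR;
}
```

Note that a missing facility leads to a quiet fallback, while a failed
plug in interpretation mode is reported as an error.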

diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index 28484256ec..6e6e47fcda 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -23,6 +23,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   's390-skeys-kvm.c',
   's390-stattrib-kvm.c',
   'pv.c',
+  's390-pci-kvm.c',
 ))
 s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tod-tcg.c',
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 4b2bdd94b3..156051e6e9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -16,6 +16,7 @@
 #include "qapi/visitor.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -971,12 +972,51 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
     }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+    uint32_t idx, fh;
+
+    if (!s390_pci_get_host_fh(pbdev, &fh)) {
+        return -EPERM;
+    }
+
+    /*
+     * The host device is already in an enabled state, but we always present
+     * the initial device state to the guest as disabled (ZPCI_FS_DISABLED).
+     * Therefore, mask off the enable bit from the passthrough handle until
+     * the guest issues a CLP SET PCI FN later to enable the device.
+     */
+    pbdev->fh = fh & ~FH_MASK_ENABLE;
+
+    /* Next, see if the idx is already in-use */
+    idx = pbdev->fh & FH_MASK_INDEX;
+    if (pbdev->idx != idx) {
+        if (s390_pci_find_dev_by_idx(s, idx)) {
+            return -EINVAL;
+        }
+        /*
+         * Update the idx entry with the passed through idx
+         * If the relinquished idx is lower than next_idx, use it
+         * to replace next_idx
+         */
+        g_hash_table_remove(s->zpci_table, &pbdev->idx);
+        if (idx < s->next_idx) {
+            s->next_idx = idx;
+        }
+        pbdev->idx = idx;
+        g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+    }
+
+    return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
     PCIDevice *pdev = NULL;
     S390PCIBusDevice *pbdev = NULL;
+    int rc;
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
         PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1062,35 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         set_pbdev_info(pbdev);
 
         if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-            pbdev->fh |= FH_SHM_VFIO;
+            /*
+             * By default, interpretation is always requested; if the available
+             * facilities indicate it is not available, fall back to the
+             * interception model.
+             */
+            if (pbdev->interp) {
+                if (s390_pci_kvm_interp_allowed()) {
+                    rc = s390_pci_interp_plug(s, pbdev);
+                    if (rc) {
+                        error_setg(errp, "Plug failed for zPCI device in "
+                                   "interpretation mode: %d", rc);
+                        return;
+                    }
+                } else {
+                    DPRINTF("zPCI interpretation facilities missing.\n");
+                    pbdev->interp = false;
+                }
+            }
             pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
             /* Fill in CLP information passed via the vfio region */
             s390_pci_get_clp_info(pbdev);
+            if (!pbdev->interp) {
+                /* Do vfio passthrough but intercept for I/O */
+                pbdev->fh |= FH_SHM_VFIO;
+            }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
+            /* Always intercept emulated devices */
+            pbdev->interp = false;
         }
 
         if (s390_pci_msix_init(pbdev)) {
@@ -1360,6 +1423,7 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
+    DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 6d400d4147..c898c8abe9 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,8 @@
 #include "sysemu/hw_accel.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-kvm.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "hw/s390x/tod.h"
 
 #ifndef DEBUG_S390PCI_INST
@@ -246,6 +248,16 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
                 goto out;
             }
 
+            /*
+             * Take this opportunity to make sure we still have an accurate
+             * host fh.  It's possible part of the handle changed while the
+             * device was disabled to the guest (e.g. vfio hot reset for
+             * ISM during plug)
+             */
+            if (pbdev->interp) {
+                /* Take this opportunity to make sure we are sync'd with host */
+                s390_pci_get_host_fh(pbdev, &pbdev->fh);
+            }
             pbdev->fh |= FH_MASK_ENABLE;
             pbdev->state = ZPCI_FS_ENABLED;
             stl_p(&ressetpci->fh, pbdev->fh);
diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
new file mode 100644
index 0000000000..8bfce9ef18
--- /dev/null
+++ b/hw/s390x/s390-pci-kvm.c
@@ -0,0 +1,21 @@
+/*
+ * s390 zPCI KVM interfaces
+ *
+ * Copyright 2022 IBM Corp.
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "kvm/kvm_s390x.h"
+#include "hw/s390x/s390-pci-kvm.h"
+#include "cpu_models.h"
+
+bool s390_pci_kvm_interp_allowed(void)
+{
+    return s390_has_feat(S390_FEAT_ZPCI_INTERP) && kvm_s390_get_zpci_op();
+}
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index da3cde2bb4..a9843dfe97 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -350,6 +350,7 @@ struct S390PCIBusDevice {
     IndAddr *indicator;
     bool pci_unplug_request_processed;
     bool unplug_requested;
+    bool interp;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-kvm.h b/include/hw/s390x/s390-pci-kvm.h
new file mode 100644
index 0000000000..80a2e7d0ca
--- /dev/null
+++ b/include/hw/s390x/s390-pci-kvm.h
@@ -0,0 +1,24 @@
+/*
+ * s390 PCI KVM interfaces
+ *
+ * Copyright 2022 IBM Corp.
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390_PCI_KVM_H
+#define HW_S390_PCI_KVM_H
+
+#ifdef CONFIG_KVM
+bool s390_pci_kvm_interp_allowed(void);
+#else
+static inline bool s390_pci_kvm_interp_allowed(void)
+{
+    return false;
+}
+#endif
+
+#endif
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 0357bfda89..288fbd1d75 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -157,6 +157,7 @@ static int cap_ri;
 static int cap_hpage_1m;
 static int cap_vcpu_resets;
 static int cap_protected;
+static int cap_zpci_op;
 
 static int active_cmma;
 
@@ -358,6 +359,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
     cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
     cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
+    cap_zpci_op = kvm_check_extension(s, KVM_CAP_S390_ZPCI_OP);
 
     kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
     kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
@@ -2567,3 +2569,8 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+int kvm_s390_get_zpci_op(void)
+{
+    return cap_zpci_op;
+}
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index 05a5e1e6f4..aaae8570de 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -27,6 +27,7 @@ void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
 int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
 int kvm_s390_get_hpage_1m(void);
 int kvm_s390_get_ri(void);
+int kvm_s390_get_zpci_op(void);
 int kvm_s390_get_clock(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_get_clock_ext(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_set_clock(uint8_t tod_high, uint64_t tod_clock);
-- 
2.27.0




* [PATCH v5 5/9] s390x/pci: enable for load/store interpretation
@ 2022-04-04 18:17   ` Matthew Rosato
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman, pmorel,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

If the appropriate CPU facility is available as well as the necessary
ZPCI_OP ioctl, then the underlying KVM host will enable load/store
interpretation for any guest device without a SHM bit in the guest
function handle.  For a device that will be using interpretation
support, ensure the guest function handle matches the host function
handle; this value is re-checked every time the guest issues a SET PCI FN
to enable the guest device as it is the only opportunity to reflect
function handle changes.
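
As a rough illustration of the handle handling described above (this is
not part of the patch): the sketch below assumes a function handle whose
top bit is the enable bit and whose low 16 bits are the index.  The
constant values and the helper names are assumptions made for the
example only.

```c
#include <stdint.h>

/* Assumed function-handle layout; values are illustrative only. */
#define FH_MASK_ENABLE 0x80000000u
#define FH_MASK_INDEX  0x0000ffffu

/*
 * Handle shown to the guest at plug time: the host handle with the
 * enable bit cleared, since the guest always starts out disabled.
 */
static inline uint32_t guest_fh_at_plug(uint32_t host_fh)
{
    return host_fh & ~FH_MASK_ENABLE;
}

/* Index portion of a handle (hypothetical helper). */
static inline uint32_t fh_index(uint32_t fh)
{
    return fh & FH_MASK_INDEX;
}

/*
 * Handle returned when the guest enables the device: start from the
 * freshly re-read host handle and set the enable bit.
 */
static inline uint32_t guest_fh_on_enable(uint32_t current_host_fh)
{
    return current_host_fh | FH_MASK_ENABLE;
}
```

Re-reading the host handle at enable time is what lets a change made
while the device was disabled to the guest show up in the guest handle.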

By default, unless interpret=off is specified, interpretation support will
always be assumed and exploited if the necessary ioctl and features are
available on the host kernel.  When these are unavailable, we will silently
revert to the interception model; this allows existing guest configurations
to work unmodified on hosts with and without zPCI interpretation support,
allowing QEMU to choose the best support model available.
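
The fallback rule above can be summarized as a small decision function.
This is only a sketch under stated assumptions: the enum and function
names are hypothetical, and the two boolean inputs stand in for the CPU
feature check and the KVM capability check.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum zpci_mode {
    ZPCI_MODE_INTERCEPT,
    ZPCI_MODE_INTERPRET
};

/*
 * Interpretation is used only for vfio passthrough devices, only when
 * the user has not set interpret=off, and only when both the CPU
 * feature and the KVM capability are present.  Everything else falls
 * back to interception.
 */
static inline enum zpci_mode zpci_choose_mode(bool is_vfio,
                                              bool interpret_prop,
                                              bool has_cpu_feat,
                                              bool has_kvm_cap)
{
    if (is_vfio && interpret_prop && has_cpu_feat && has_kvm_cap) {
        return ZPCI_MODE_INTERPRET;
    }
    return ZPCI_MODE_INTERCEPT;
}
```

Because the fallback is silent, the same guest definition yields
ZPCI_MODE_INTERPRET on a capable host and ZPCI_MODE_INTERCEPT elsewhere.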

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/meson.build            |  1 +
 hw/s390x/s390-pci-bus.c         | 66 ++++++++++++++++++++++++++++++++-
 hw/s390x/s390-pci-inst.c        | 12 ++++++
 hw/s390x/s390-pci-kvm.c         | 21 +++++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 24 ++++++++++++
 target/s390x/kvm/kvm.c          |  7 ++++
 target/s390x/kvm/kvm_s390x.h    |  1 +
 8 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 hw/s390x/s390-pci-kvm.c
 create mode 100644 include/hw/s390x/s390-pci-kvm.h

diff --git a/hw/s390x/meson.build b/hw/s390x/meson.build
index 28484256ec..6e6e47fcda 100644
--- a/hw/s390x/meson.build
+++ b/hw/s390x/meson.build
@@ -23,6 +23,7 @@ s390x_ss.add(when: 'CONFIG_KVM', if_true: files(
   's390-skeys-kvm.c',
   's390-stattrib-kvm.c',
   'pv.c',
+  's390-pci-kvm.c',
 ))
 s390x_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tod-tcg.c',
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 4b2bdd94b3..156051e6e9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -16,6 +16,7 @@
 #include "qapi/visitor.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
@@ -971,12 +972,51 @@ static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
     }
 }
 
+static int s390_pci_interp_plug(S390pciState *s, S390PCIBusDevice *pbdev)
+{
+    uint32_t idx, fh;
+
+    if (!s390_pci_get_host_fh(pbdev, &fh)) {
+        return -EPERM;
+    }
+
+    /*
+     * The host device is already in an enabled state, but we always present
+     * the initial device state to the guest as disabled (ZPCI_FS_DISABLED).
+     * Therefore, mask off the enable bit from the passthrough handle until
+     * the guest issues a CLP SET PCI FN later to enable the device.
+     */
+    pbdev->fh = fh & ~FH_MASK_ENABLE;
+
+    /* Next, see if the idx is already in-use */
+    idx = pbdev->fh & FH_MASK_INDEX;
+    if (pbdev->idx != idx) {
+        if (s390_pci_find_dev_by_idx(s, idx)) {
+            return -EINVAL;
+        }
+        /*
+         * Update the idx entry with the passed through idx
+         * If the relinquished idx is lower than next_idx, use it
+         * to replace next_idx
+         */
+        g_hash_table_remove(s->zpci_table, &pbdev->idx);
+        if (idx < s->next_idx) {
+            s->next_idx = idx;
+        }
+        pbdev->idx = idx;
+        g_hash_table_insert(s->zpci_table, &pbdev->idx, pbdev);
+    }
+
+    return 0;
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
     S390pciState *s = S390_PCI_HOST_BRIDGE(hotplug_dev);
     PCIDevice *pdev = NULL;
     S390PCIBusDevice *pbdev = NULL;
+    int rc;
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
         PCIBridge *pb = PCI_BRIDGE(dev);
@@ -1022,12 +1062,35 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         set_pbdev_info(pbdev);
 
         if (object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
-            pbdev->fh |= FH_SHM_VFIO;
+            /*
+             * By default, interpretation is always requested; if the available
+             * facilities indicate it is not available, fallback to the
+             * interception model.
+             */
+            if (pbdev->interp) {
+                if (s390_pci_kvm_interp_allowed()) {
+                    rc = s390_pci_interp_plug(s, pbdev);
+                    if (rc) {
+                        error_setg(errp, "Plug failed for zPCI device in "
+                                   "interpretation mode: %d", rc);
+                        return;
+                    }
+                } else {
+                    DPRINTF("zPCI interpretation facilities missing.\n");
+                    pbdev->interp = false;
+                }
+            }
             pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
             /* Fill in CLP information passed via the vfio region */
             s390_pci_get_clp_info(pbdev);
+            if (!pbdev->interp) {
+                /* Do vfio passthrough but intercept for I/O */
+                pbdev->fh |= FH_SHM_VFIO;
+            }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
+            /* Always intercept emulated devices */
+            pbdev->interp = false;
         }
 
         if (s390_pci_msix_init(pbdev)) {
@@ -1360,6 +1423,7 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
+    DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 6d400d4147..c898c8abe9 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,8 @@
 #include "sysemu/hw_accel.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-kvm.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "hw/s390x/tod.h"
 
 #ifndef DEBUG_S390PCI_INST
@@ -246,6 +248,16 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
                 goto out;
             }
 
+            /*
+             * Take this opportunity to make sure we still have an accurate
+             * host fh.  It's possible part of the handle changed while the
+             * device was disabled to the guest (e.g. vfio hot reset for
+             * ISM during plug)
+             */
+            if (pbdev->interp) {
+                /* Take this opportunity to make sure we are sync'd with host */
+                s390_pci_get_host_fh(pbdev, &pbdev->fh);
+            }
             pbdev->fh |= FH_MASK_ENABLE;
             pbdev->state = ZPCI_FS_ENABLED;
             stl_p(&ressetpci->fh, pbdev->fh);
diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
new file mode 100644
index 0000000000..8bfce9ef18
--- /dev/null
+++ b/hw/s390x/s390-pci-kvm.c
@@ -0,0 +1,21 @@
+/*
+ * s390 zPCI KVM interfaces
+ *
+ * Copyright 2022 IBM Corp.
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "kvm/kvm_s390x.h"
+#include "hw/s390x/s390-pci-kvm.h"
+#include "cpu_models.h"
+
+bool s390_pci_kvm_interp_allowed(void)
+{
+    return s390_has_feat(S390_FEAT_ZPCI_INTERP) && kvm_s390_get_zpci_op();
+}
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index da3cde2bb4..a9843dfe97 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -350,6 +350,7 @@ struct S390PCIBusDevice {
     IndAddr *indicator;
     bool pci_unplug_request_processed;
     bool unplug_requested;
+    bool interp;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-kvm.h b/include/hw/s390x/s390-pci-kvm.h
new file mode 100644
index 0000000000..80a2e7d0ca
--- /dev/null
+++ b/include/hw/s390x/s390-pci-kvm.h
@@ -0,0 +1,24 @@
+/*
+ * s390 PCI KVM interfaces
+ *
+ * Copyright 2022 IBM Corp.
+ * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390_PCI_KVM_H
+#define HW_S390_PCI_KVM_H
+
+#ifdef CONFIG_KVM
+bool s390_pci_kvm_interp_allowed(void);
+#else
+static inline bool s390_pci_kvm_interp_allowed(void)
+{
+    return false;
+}
+#endif
+
+#endif
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 0357bfda89..288fbd1d75 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -157,6 +157,7 @@ static int cap_ri;
 static int cap_hpage_1m;
 static int cap_vcpu_resets;
 static int cap_protected;
+static int cap_zpci_op;
 
 static int active_cmma;
 
@@ -358,6 +359,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
     cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS);
     cap_protected = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
+    cap_zpci_op = kvm_check_extension(s, KVM_CAP_S390_ZPCI_OP);
 
     kvm_vm_enable_cap(s, KVM_CAP_S390_USER_SIGP, 0);
     kvm_vm_enable_cap(s, KVM_CAP_S390_VECTOR_REGISTERS, 0);
@@ -2567,3 +2569,8 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+int kvm_s390_get_zpci_op(void)
+{
+    return cap_zpci_op;
+}
diff --git a/target/s390x/kvm/kvm_s390x.h b/target/s390x/kvm/kvm_s390x.h
index 05a5e1e6f4..aaae8570de 100644
--- a/target/s390x/kvm/kvm_s390x.h
+++ b/target/s390x/kvm/kvm_s390x.h
@@ -27,6 +27,7 @@ void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
 int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
 int kvm_s390_get_hpage_1m(void);
 int kvm_s390_get_ri(void);
+int kvm_s390_get_zpci_op(void);
 int kvm_s390_get_clock(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_get_clock_ext(uint8_t *tod_high, uint64_t *tod_clock);
 int kvm_s390_set_clock(uint8_t tod_high, uint64_t tod_clock);
-- 
2.27.0



* [PATCH v5 6/9] s390x/pci: don't fence interpreted devices without MSI-X
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Lack of MSI-X support is not an issue for interpreted passthrough
devices, so let's let these in.  This will allow, for example, ISM
devices to be passed through -- but only when interpretation is
available and being used.
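
For illustration only, the relaxed check amounts to the predicate
below.  The function name is hypothetical; the inputs stand in for a
failed MSI-X setup and the device's interpretation state.

```c
#include <stdbool.h>

/*
 * A device is rejected at plug time only when MSI-X setup failed AND
 * the device is not interpreted.  Interpreted devices without MSI-X
 * are allowed through.
 */
static inline bool zpci_plug_rejected(bool msix_init_failed, bool interp)
{
    return msix_init_failed && !interp;
}
```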

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 156051e6e9..9c02d31250 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1093,7 +1093,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             pbdev->interp = false;
         }
 
-        if (s390_pci_msix_init(pbdev)) {
+        if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
             error_setg(errp, "MSI-X support is mandatory "
                        "in the S390 architecture");
             return;
-- 
2.27.0




* [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Use the associated kvm ioctl operation to enable adapter event notification
and forwarding for devices when requested.  This feature will be set up
with or without firmware assist based upon the 'forwarding_assist' setting.
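
A minimal sketch of the two decisions this patch makes, assuming a
stand-in flag value and hypothetical helper names (the real flag comes
from the KVM uapi header and is not reproduced here): how the
'forwarding_assist' setting maps to the registration flags, and which
teardown path is taken on reset or deconfigure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the host-delivery flag; the value is an assumption. */
#define SKETCH_REGAEN_HOST (1u << 0)
/* Assumed position of the enable bit in the function handle. */
#define FH_MASK_ENABLE 0x80000000u

/* With the assist no flag is set; without it, request host delivery. */
static inline uint32_t aen_flags(bool forwarding_assist)
{
    return forwarding_assist ? 0 : SKETCH_REGAEN_HOST;
}

enum irq_teardown {
    TEARDOWN_NONE,
    TEARDOWN_AIF_DISABLE,   /* interpreted device: disable forwarding */
    TEARDOWN_DEREG_IRQS     /* intercepted device: deregister IRQs */
};

/*
 * An enabled interpreted device was using interrupt forwarding, so
 * that is what gets torn down; otherwise fall back to the existing
 * summary-indicator check.
 */
static inline enum irq_teardown choose_teardown(bool interp, uint32_t fh,
                                                bool summary_ind)
{
    if (interp && (fh & FH_MASK_ENABLE)) {
        return TEARDOWN_AIF_DISABLE;
    } else if (summary_ind) {
        return TEARDOWN_DEREG_IRQS;
    }
    return TEARDOWN_NONE;
}
```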

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
 hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
 include/hw/s390x/s390-pci-bus.h |  1 +
 include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
 5 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 9c02d31250..47918d2ce9 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
         rc = SCLP_RC_NO_ACTION_REQUIRED;
         break;
     default:
-        if (pbdev->summary_ind) {
+        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+            /* Interpreted devices were using interrupt forwarding */
+            s390_pci_kvm_aif_disable(pbdev);
+        } else if (pbdev->summary_ind) {
             pci_dereg_irqs(pbdev);
         }
         if (pbdev->iommu->enabled) {
@@ -1078,6 +1081,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                 } else {
                     DPRINTF("zPCI interpretation facilities missing.\n");
                     pbdev->interp = false;
+                    pbdev->forwarding_assist = false;
                 }
             }
             pbdev->iommu->dma_limit = s390_pci_start_dma_count(s, pbdev);
@@ -1086,11 +1090,13 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
             if (!pbdev->interp) {
                 /* Do vfio passthrough but intercept for I/O */
                 pbdev->fh |= FH_SHM_VFIO;
+                pbdev->forwarding_assist = false;
             }
         } else {
             pbdev->fh |= FH_SHM_EMUL;
             /* Always intercept emulated devices */
             pbdev->interp = false;
+            pbdev->forwarding_assist = false;
         }
 
         if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1240,7 +1246,10 @@ static void s390_pcihost_reset(DeviceState *dev)
     /* Process all pending unplug requests */
     QTAILQ_FOREACH_SAFE(pbdev, &s->zpci_devs, link, next) {
         if (pbdev->unplug_requested) {
-            if (pbdev->summary_ind) {
+            if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+                /* Interpreted devices were using interrupt forwarding */
+                s390_pci_kvm_aif_disable(pbdev);
+            } else if (pbdev->summary_ind) {
                 pci_dereg_irqs(pbdev);
             }
             if (pbdev->iommu->enabled) {
@@ -1378,7 +1387,10 @@ static void s390_pci_device_reset(DeviceState *dev)
         break;
     }
 
-    if (pbdev->summary_ind) {
+    if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
+        /* Interpreted devices were using interrupt forwarding */
+        s390_pci_kvm_aif_disable(pbdev);
+    } else if (pbdev->summary_ind) {
         pci_dereg_irqs(pbdev);
     }
     if (pbdev->iommu->enabled) {
@@ -1424,6 +1436,8 @@ static Property s390_pci_device_properties[] = {
     DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
     DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
     DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
+    DEFINE_PROP_BOOL("forwarding_assist", S390PCIBusDevice, forwarding_assist,
+                     true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index c898c8abe9..c3a34da73d 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -1062,6 +1062,32 @@ static void fmb_update(void *opaque)
     timer_mod(pbdev->fmb_timer, t + pbdev->pci_group->zpci_group.mui);
 }
 
+static int mpcifc_reg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+    int rc;
+
+    rc = s390_pci_kvm_aif_enable(pbdev, fib, pbdev->forwarding_assist);
+    if (rc) {
+        DPRINTF("Failed to enable interrupt forwarding\n");
+        return rc;
+    }
+
+    return 0;
+}
+
+static int mpcifc_dereg_int_interp(S390PCIBusDevice *pbdev, ZpciFib *fib)
+{
+    int rc;
+
+    rc = s390_pci_kvm_aif_disable(pbdev);
+    if (rc) {
+        DPRINTF("Failed to disable interrupt forwarding\n");
+        return rc;
+    }
+
+    return 0;
+}
+
 int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
                         uintptr_t ra)
 {
@@ -1116,7 +1142,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
 
     switch (oc) {
     case ZPCI_MOD_FC_REG_INT:
-        if (pbdev->summary_ind) {
+        if (pbdev->interp) {
+            if (mpcifc_reg_int_interp(pbdev, &fib)) {
+                cc = ZPCI_PCI_LS_ERR;
+                s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+            }
+        } else if (pbdev->summary_ind) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else if (reg_irqs(env, pbdev, fib)) {
@@ -1125,7 +1156,12 @@ int mpcifc_service_call(S390CPU *cpu, uint8_t r1, uint64_t fiba, uint8_t ar,
         }
         break;
     case ZPCI_MOD_FC_DEREG_INT:
-        if (!pbdev->summary_ind) {
+        if (pbdev->interp) {
+            if (mpcifc_dereg_int_interp(pbdev, &fib)) {
+                cc = ZPCI_PCI_LS_ERR;
+                s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
+            }
+        } else if (!pbdev->summary_ind) {
             cc = ZPCI_PCI_LS_ERR;
             s390_set_status_code(env, r1, ZPCI_MOD_ST_SEQUENCE);
         } else {
diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index 8bfce9ef18..cb20b8dcb9 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -11,11 +11,41 @@
 
 #include "qemu/osdep.h"
 
+#include <linux/kvm.h>
+
 #include "kvm/kvm_s390x.h"
+#include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-kvm.h"
+#include "hw/s390x/s390-pci-inst.h"
 #include "cpu_models.h"
 
 bool s390_pci_kvm_interp_allowed(void)
 {
     return s390_has_feat(S390_FEAT_ZPCI_INTERP) && kvm_s390_get_zpci_op();
 }
+
+int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist)
+{
+    struct kvm_s390_zpci_op args = {
+        .fh = pbdev->fh,
+        .op = KVM_S390_ZPCIOP_REG_AEN,
+        .u.reg_aen.ibv = fib->aibv,
+        .u.reg_aen.sb = fib->aisb,
+        .u.reg_aen.noi = FIB_DATA_NOI(fib->data),
+        .u.reg_aen.isc = FIB_DATA_ISC(fib->data),
+        .u.reg_aen.sbo = FIB_DATA_AISBO(fib->data),
+        .u.reg_aen.flags = (assist) ? 0 : KVM_S390_ZPCIOP_REGAEN_HOST
+    };
+
+    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+}
+
+int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
+{
+    struct kvm_s390_zpci_op args = {
+        .fh = pbdev->fh,
+        .op = KVM_S390_ZPCIOP_DEREG_AEN
+    };
+
+    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, &args);
+}
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index a9843dfe97..5b09f0cf2f 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -351,6 +351,7 @@ struct S390PCIBusDevice {
     bool pci_unplug_request_processed;
     bool unplug_requested;
     bool interp;
+    bool forwarding_assist;
     QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
diff --git a/include/hw/s390x/s390-pci-kvm.h b/include/hw/s390x/s390-pci-kvm.h
index 80a2e7d0ca..933814a402 100644
--- a/include/hw/s390x/s390-pci-kvm.h
+++ b/include/hw/s390x/s390-pci-kvm.h
@@ -12,13 +12,27 @@
 #ifndef HW_S390_PCI_KVM_H
 #define HW_S390_PCI_KVM_H
 
+#include "hw/s390x/s390-pci-bus.h"
+#include "hw/s390x/s390-pci-inst.h"
+
 #ifdef CONFIG_KVM
 bool s390_pci_kvm_interp_allowed(void);
+int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist);
+int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev);
 #else
 static inline bool s390_pci_kvm_interp_allowed(void)
 {
     return false;
 }
+static inline int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib,
+                                          bool assist)
+{
+    return -EINVAL;
+}
+static inline int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
+{
+    return -EINVAL;
+}
 #endif
 
 #endif
-- 
2.27.0




* [PATCH v5 8/9] s390x/pci: let intercept devices have separate PCI groups
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

Let's use the reserved pool of simulated PCI groups to allow intercept
devices to have separate groups from interpreted devices as some group
values may be different. If we run out of simulated PCI groups, subsequent
intercept devices just get the default group.
Furthermore, if we encounter any PCI groups from hostdevs that are marked
as simulated, let's just assign them to the default group to avoid
conflicts between host simulated groups and our own simulated groups.

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-bus.c         | 19 ++++++++++++++--
 hw/s390x/s390-pci-vfio.c        | 40 ++++++++++++++++++++++++++++++---
 include/hw/s390x/s390-pci-bus.h |  6 ++++-
 3 files changed, 59 insertions(+), 6 deletions(-)
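
The allocation policy described in the commit message above can be
sketched in isolation as follows. This is a minimal, standalone model
and not the QEMU code: struct grp, struct grp_state, grp_state_init(),
find_host_sim() and assign_group() are hypothetical names, the group
list is a fixed array instead of a QTAILQ, and only simulated groups
are recorded. The constants mirror ZPCI_SIM_GRP_START and
ZPCI_DEFAULT_FN_GRP from the diff below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_GRP_START  0xF0  /* mirrors ZPCI_SIM_GRP_START in the patch */
#define DEFAULT_GRP    0xFF  /* mirrors ZPCI_DEFAULT_FN_GRP in the patch */
#define MAX_SIM_GROUPS (DEFAULT_GRP - SIM_GRP_START)  /* 0xF0..0xFE */

/* Hypothetical record: guest-visible ID plus the host group it stands for */
struct grp {
    int id;
    int host_id;
};

/* Hypothetical stand-in for the relevant parts of S390pciState */
struct grp_state {
    struct grp groups[MAX_SIM_GROUPS];
    size_t count;
    uint8_t next_sim_grp;
};

static void grp_state_init(struct grp_state *s)
{
    s->count = 0;
    s->next_sim_grp = SIM_GRP_START;
}

/* Find the simulated group already created for this host group, if any */
static const struct grp *find_host_sim(const struct grp_state *s, int host_id)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->groups[i].id >= SIM_GRP_START &&
            s->groups[i].host_id == host_id) {
            return &s->groups[i];
        }
    }
    return NULL;
}

/* Return the guest-visible group ID for a device in host group host_gid */
static int assign_group(struct grp_state *s, int host_gid, bool interp)
{
    const struct grp *g;
    int id;

    /* Host group is itself simulated: avoid conflicts, use the default */
    if (host_gid >= SIM_GRP_START) {
        return DEFAULT_GRP;
    }
    /* Interpreted devices keep the host group ID */
    if (interp) {
        return host_gid;
    }
    /* Intercept device: reuse a simulated group for this host group */
    g = find_host_sim(s, host_gid);
    if (g) {
        return g->id;
    }
    /* Pool exhausted: fall back on the default group */
    if (s->next_sim_grp == DEFAULT_GRP) {
        return DEFAULT_GRP;
    }
    /* Otherwise hand out the next free simulated group */
    id = s->next_sim_grp++;
    s->groups[s->count].id = id;
    s->groups[s->count].host_id = host_gid;
    s->count++;
    return id;
}
```

With this model, two intercept devices from the same host group share
one simulated ID, and the pool covers the fifteen values 0xF0 through
0xFE before the default group is used.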

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 47918d2ce9..a222a8f4f7 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -748,13 +748,14 @@ static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
     object_unref(OBJECT(iommu));
 }
 
-S390PCIGroup *s390_group_create(int id)
+S390PCIGroup *s390_group_create(int id, int host_id)
 {
     S390PCIGroup *group;
     S390pciState *s = s390_get_phb();
 
     group = g_new0(S390PCIGroup, 1);
     group->id = id;
+    group->host_id = host_id;
     QTAILQ_INSERT_TAIL(&s->zpci_groups, group, link);
     return group;
 }
@@ -772,12 +773,25 @@ S390PCIGroup *s390_group_find(int id)
     return NULL;
 }
 
+S390PCIGroup *s390_group_find_host_sim(int host_id)
+{
+    S390PCIGroup *group;
+    S390pciState *s = s390_get_phb();
+
+    QTAILQ_FOREACH(group, &s->zpci_groups, link) {
+        if (group->id >= ZPCI_SIM_GRP_START && group->host_id == host_id) {
+            return group;
+        }
+    }
+    return NULL;
+}
+
 static void s390_pci_init_default_group(void)
 {
     S390PCIGroup *group;
     ClpRspQueryPciGrp *resgrp;
 
-    group = s390_group_create(ZPCI_DEFAULT_FN_GRP);
+    group = s390_group_create(ZPCI_DEFAULT_FN_GRP, ZPCI_DEFAULT_FN_GRP);
     resgrp = &group->zpci_group;
     resgrp->fr = 1;
     resgrp->dasm = 0;
@@ -825,6 +839,7 @@ static void s390_pcihost_realize(DeviceState *dev, Error **errp)
                                            NULL, g_free);
     s->zpci_table = g_hash_table_new_full(g_int_hash, g_int_equal, NULL, NULL);
     s->bus_no = 0;
+    s->next_sim_grp = ZPCI_SIM_GRP_START;
     QTAILQ_INIT(&s->pending_sei);
     QTAILQ_INIT(&s->zpci_devs);
     QTAILQ_INIT(&s->zpci_dma_limit);
diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 4bf0a7e22d..985980f021 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -150,13 +150,18 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 {
     struct vfio_info_cap_header *hdr;
     struct vfio_device_info_cap_zpci_group *cap;
+    S390pciState *s = s390_get_phb();
     ClpRspQueryPciGrp *resgrp;
     VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
+    uint8_t start_gid = pbdev->zpci_fn.pfgid;
 
     hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
 
-    /* If capability not provided, just use the default group */
-    if (hdr == NULL) {
+    /*
+     * If capability not provided or the underlying hostdev is simulated, just
+     * use the default group.
+     */
+    if (hdr == NULL || pbdev->zpci_fn.pfgid >= ZPCI_SIM_GRP_START) {
         trace_s390_pci_clp_cap(vpci->vbasedev.name,
                                VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
         pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
@@ -165,11 +170,40 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
     }
     cap = (void *) hdr;
 
+    /*
+     * For an intercept device, let's use an existing simulated group if one
+     * was already created for other intercept devices in this group.
+     * If not, create a new simulated group if any are still available.
+     * If all else fails, just fall back on the default group.
+     */
+    if (!pbdev->interp) {
+        pbdev->pci_group = s390_group_find_host_sim(pbdev->zpci_fn.pfgid);
+        if (pbdev->pci_group) {
+            /* Use existing simulated group */
+            pbdev->zpci_fn.pfgid = pbdev->pci_group->id;
+            return;
+        } else {
+            if (s->next_sim_grp == ZPCI_DEFAULT_FN_GRP) {
+                /* All out of simulated groups, use default */
+                trace_s390_pci_clp_cap(vpci->vbasedev.name,
+                                       VFIO_DEVICE_INFO_CAP_ZPCI_GROUP);
+                pbdev->zpci_fn.pfgid = ZPCI_DEFAULT_FN_GRP;
+                pbdev->pci_group = s390_group_find(ZPCI_DEFAULT_FN_GRP);
+                return;
+            } else {
+                /* We can assign a new simulated group */
+                pbdev->zpci_fn.pfgid = s->next_sim_grp;
+                s->next_sim_grp++;
+                /* Fall through to create the new sim group using CLP info */
+            }
+        }
+    }
+
     /* See if the PCI group is already defined, create if not */
     pbdev->pci_group = s390_group_find(pbdev->zpci_fn.pfgid);
 
     if (!pbdev->pci_group) {
-        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid);
+        pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid, start_gid);
 
         resgrp = &pbdev->pci_group->zpci_group;
         if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 5b09f0cf2f..0605fcea24 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -315,13 +315,16 @@ typedef struct ZpciFmb {
 QEMU_BUILD_BUG_MSG(offsetof(ZpciFmb, fmt0) != 48, "padding in ZpciFmb");
 
 #define ZPCI_DEFAULT_FN_GRP 0xFF
+#define ZPCI_SIM_GRP_START 0xF0
 typedef struct S390PCIGroup {
     ClpRspQueryPciGrp zpci_group;
     int id;
+    int host_id;
     QTAILQ_ENTRY(S390PCIGroup) link;
 } S390PCIGroup;
-S390PCIGroup *s390_group_create(int id);
+S390PCIGroup *s390_group_create(int id, int host_id);
 S390PCIGroup *s390_group_find(int id);
+S390PCIGroup *s390_group_find_host_sim(int host_id);
 
 struct S390PCIBusDevice {
     DeviceState qdev;
@@ -370,6 +373,7 @@ struct S390pciState {
     QTAILQ_HEAD(, S390PCIBusDevice) zpci_devs;
     QTAILQ_HEAD(, S390PCIDMACount) zpci_dma_limit;
     QTAILQ_HEAD(, S390PCIGroup) zpci_groups;
+    uint8_t next_sim_grp;
 };
 
 S390pciState *s390_get_phb(void);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v5 9/9] s390x/pci: reflect proper maxstbl for groups of interpreted devices
  2022-04-04 18:17 ` Matthew Rosato
@ 2022-04-04 18:17   ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-04 18:17 UTC (permalink / raw)
  To: qemu-s390x
  Cc: farman, kvm, pmorel, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger

The maximum supported store block length might be different depending
on whether the instruction is interpretively executed (firmware-reported
maximum) or handled via userspace intercept (host kernel API maximum).
Choose the best available value during group creation.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-vfio.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
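
The selection described above can be sketched on its own as follows.
This is a minimal illustration, not the QEMU code: struct group_cap and
pick_maxstbl() are hypothetical names, the field widths are assumptions,
and the version field stands in for the capability header version that
the diff below checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the zPCI group capability fields used here */
struct group_cap {
    uint16_t version;   /* capability header version */
    uint16_t maxstbl;   /* host kernel API maximum */
    uint16_t imaxstbl;  /* firmware-reported maximum, version >= 2 only */
};

/* Choose the store block maximum to report for a newly created group */
static uint16_t pick_maxstbl(const struct group_cap *cap, bool interp)
{
    /* Interpreted devices use the firmware value when it is provided */
    if (interp && cap->version >= 2) {
        return cap->imaxstbl;
    }
    /* Intercept devices, or an older capability, use the host API value */
    return cap->maxstbl;
}
```

The version check matters because an older capability does not carry
the firmware-reported field at all, so reading it would be meaningless.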

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 985980f021..212dd053f7 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
         resgrp->msia = cap->msi_addr;
         resgrp->mui = cap->mui;
         resgrp->i = cap->noi;
-        resgrp->maxstbl = cap->maxstbl;
+        if (pbdev->interp && hdr->version >= 2) {
+            resgrp->maxstbl = cap->imaxstbl;
+        } else {
+            resgrp->maxstbl = cap->maxstbl;
+        }
         resgrp->version = cap->version;
         resgrp->dtsm = ZPCI_DTSM;
     }
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames
  2022-04-04 18:17   ` Matthew Rosato
@ 2022-04-12 15:50     ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-12 15:50 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/4/22 20:17, Matthew Rosato wrote:
> The v1 uapi is deprecated and will be replaced by v2 at some point;
> this patch just tolerates the renaming of uapi fields to reflect
> v1 / deprecated status.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/vfio/common.c    |  2 +-
>   hw/vfio/migration.c | 19 +++++++++++--------
>   2 files changed, 12 insertions(+), 9 deletions(-)


I do not understand why you need this patch in this series.
Shouldn't it be separate?

> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 080046e3f5..7b1e12fb69 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -380,7 +380,7 @@ static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
>                   return false;
>               }
>   
> -            if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
> +            if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
>                   (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
>                   continue;
>               } else {
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index ff6b45de6b..e109cee551 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -432,7 +432,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>       }
>   
>       ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
> -                                   VFIO_DEVICE_STATE_SAVING);
> +                                   VFIO_DEVICE_STATE_V1_SAVING);
>       if (ret) {
>           error_report("%s: Failed to set state SAVING", vbasedev->name);
>           return ret;
> @@ -532,7 +532,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>       int ret;
>   
>       ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_RUNNING,
> -                                   VFIO_DEVICE_STATE_SAVING);
> +                                   VFIO_DEVICE_STATE_V1_SAVING);
>       if (ret) {
>           error_report("%s: Failed to set state STOP and SAVING",
>                        vbasedev->name);
> @@ -569,7 +569,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>           return ret;
>       }
>   
> -    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_SAVING, 0);
> +    ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
>       if (ret) {
>           error_report("%s: Failed to set state STOPPED", vbasedev->name);
>           return ret;
> @@ -730,7 +730,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>            * start saving data.
>            */
>           if (state == RUN_STATE_SAVE_VM) {
> -            value = VFIO_DEVICE_STATE_SAVING;
> +            value = VFIO_DEVICE_STATE_V1_SAVING;
>           } else {
>               value = 0;
>           }
> @@ -768,8 +768,9 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
>       case MIGRATION_STATUS_FAILED:
>           bytes_transferred = 0;
>           ret = vfio_migration_set_state(vbasedev,
> -                      ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
> -                      VFIO_DEVICE_STATE_RUNNING);
> +                                       ~(VFIO_DEVICE_STATE_V1_SAVING |
> +                                         VFIO_DEVICE_STATE_RESUMING),
> +                                       VFIO_DEVICE_STATE_RUNNING);
>           if (ret) {
>               error_report("%s: Failed to set state RUNNING", vbasedev->name);
>           }
> @@ -864,8 +865,10 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>           goto add_blocker;
>       }
>   
> -    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
> -                                   VFIO_REGION_SUBTYPE_MIGRATION, &info);
> +    ret = vfio_get_dev_region_info(vbasedev,
> +                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> +                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> +                                   &info);
>       if (ret) {
>           goto add_blocker;
>       }
> 

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames
  2022-04-12 15:50     ` Pierre Morel
@ 2022-04-12 16:07       ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-12 16:07 UTC (permalink / raw)
  To: Pierre Morel, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

On 4/12/22 11:50 AM, Pierre Morel wrote:
> 
> 
> On 4/4/22 20:17, Matthew Rosato wrote:
>> The v1 uapi is deprecated and will be replaced by v2 at some point;
>> this patch just tolerates the renaming of uapi fields to reflect
>> v1 / deprecated status.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/vfio/common.c    |  2 +-
>>   hw/vfio/migration.c | 19 +++++++++++--------
>>   2 files changed, 12 insertions(+), 9 deletions(-)
> 
> 
> I do not understand why you need this patch in this series.
> Shouldn't it be separate?

This patch is included because of the patch 1 kernel header sync, which 
pulls in uapi headers from kernel version 5.18-rc1 + my unmerged kernel 
uapi changes.

This patch is unnecessary without a header sync (and in fact would break 
QEMU compile), and is unrelated to the rest of the series -- but QEMU 
will not compile without it once you update linux uapi headers to 
5.18-rc1 (or greater) due to the v1 uapi for vfio migration being 
deprecated [1].  This means that ANY series that does a linux header 
sync starting from here on will need something like this patch to go 
along with the header sync (or a series that replaces v1 usage with v2?).

If this patch looks good, it could be included whenever a header sync is 
next needed; it doesn't necessarily have to go with this series.

[1] https://www.spinics.net/lists/kernel/msg4288200.html
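
To illustrate the build dependency described above: code that uses the
renamed identifier only compiles against headers that define it. One
generic way to tolerate both header generations is a preprocessor
fallback, sketched below. This is a hypothetical alternative for
illustration only. It is not what the patch does and was not proposed
in this thread. The first #define stands in for an older header, its
value is illustrative, and is_saving() is a made-up helper.

```c
/* Stand-in for an older uapi header that only provides the old name */
#define VFIO_DEVICE_STATE_SAVING (1 << 1)

/* Hypothetical fallback: map the new name to the old one when missing */
#ifndef VFIO_DEVICE_STATE_V1_SAVING
#define VFIO_DEVICE_STATE_V1_SAVING VFIO_DEVICE_STATE_SAVING
#endif

/* Callers can then use the new name regardless of header version */
static int is_saving(unsigned int device_state)
{
    return (device_state & VFIO_DEVICE_STATE_V1_SAVING) != 0;
}
```

Since QEMU carries its own synced copy of the kernel headers, renaming
the users together with the header sync, as this patch does, avoids
the need for such a shim.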

> 
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 080046e3f5..7b1e12fb69 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -380,7 +380,7 @@ static bool 
>> vfio_devices_all_running_and_saving(VFIOContainer *container)
>>                   return false;
>>               }
>> -            if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
>> +            if ((migration->device_state & 
>> VFIO_DEVICE_STATE_V1_SAVING) &&
>>                   (migration->device_state & 
>> VFIO_DEVICE_STATE_RUNNING)) {
>>                   continue;
>>               } else {
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index ff6b45de6b..e109cee551 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -432,7 +432,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>>       }
>>       ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
>> -                                   VFIO_DEVICE_STATE_SAVING);
>> +                                   VFIO_DEVICE_STATE_V1_SAVING);
>>       if (ret) {
>>           error_report("%s: Failed to set state SAVING", vbasedev->name);
>>           return ret;
>> @@ -532,7 +532,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
>> void *opaque)
>>       int ret;
>>       ret = vfio_migration_set_state(vbasedev, 
>> ~VFIO_DEVICE_STATE_RUNNING,
>> -                                   VFIO_DEVICE_STATE_SAVING);
>> +                                   VFIO_DEVICE_STATE_V1_SAVING);
>>       if (ret) {
>>           error_report("%s: Failed to set state STOP and SAVING",
>>                        vbasedev->name);
>> @@ -569,7 +569,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
>> void *opaque)
>>           return ret;
>>       }
>> -    ret = vfio_migration_set_state(vbasedev, 
>> ~VFIO_DEVICE_STATE_SAVING, 0);
>> +    ret = vfio_migration_set_state(vbasedev, 
>> ~VFIO_DEVICE_STATE_V1_SAVING, 0);
>>       if (ret) {
>>           error_report("%s: Failed to set state STOPPED", 
>> vbasedev->name);
>>           return ret;
>> @@ -730,7 +730,7 @@ static void vfio_vmstate_change(void *opaque, bool 
>> running, RunState state)
>>            * start saving data.
>>            */
>>           if (state == RUN_STATE_SAVE_VM) {
>> -            value = VFIO_DEVICE_STATE_SAVING;
>> +            value = VFIO_DEVICE_STATE_V1_SAVING;
>>           } else {
>>               value = 0;
>>           }
>> @@ -768,8 +768,9 @@ static void vfio_migration_state_notifier(Notifier 
>> *notifier, void *data)
>>       case MIGRATION_STATUS_FAILED:
>>           bytes_transferred = 0;
>>           ret = vfio_migration_set_state(vbasedev,
>> -                      ~(VFIO_DEVICE_STATE_SAVING | 
>> VFIO_DEVICE_STATE_RESUMING),
>> -                      VFIO_DEVICE_STATE_RUNNING);
>> +                                       ~(VFIO_DEVICE_STATE_V1_SAVING |
>> +                                         VFIO_DEVICE_STATE_RESUMING),
>> +                                       VFIO_DEVICE_STATE_RUNNING);
>>           if (ret) {
>>               error_report("%s: Failed to set state RUNNING", 
>> vbasedev->name);
>>           }
>> @@ -864,8 +865,10 @@ int vfio_migration_probe(VFIODevice *vbasedev, 
>> Error **errp)
>>           goto add_blocker;
>>       }
>> -    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
>> -                                   VFIO_REGION_SUBTYPE_MIGRATION, 
>> &info);
>> +    ret = vfio_get_dev_region_info(vbasedev,
>> +                                   
>> VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> +                                   
>> VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> +                                   &info);
>>       if (ret) {
>>           goto add_blocker;
>>       }
>>
> 
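
For readers following the renames: the call sites quoted above all pass a
(mask, value) pair to vfio_migration_set_state(). Below is a minimal
standalone sketch of that convention, assuming the helper keeps the bits
selected by mask and then ORs in value, which is what the error strings
next to each call suggest. The DEMO_ names and bit positions are
placeholders for illustration, not the definitions from linux/vfio.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the v1 device_state bits; the real values
 * come from the synced linux/vfio.h header, not from this sketch. */
#define DEMO_STATE_RUNNING   (1u << 0)
#define DEMO_STATE_V1_SAVING (1u << 1)
#define DEMO_STATE_RESUMING  (1u << 2)
#define DEMO_STATE_MASK      (DEMO_STATE_RUNNING | \
                              DEMO_STATE_V1_SAVING | \
                              DEMO_STATE_RESUMING)

/* Assumed update rule: keep the bits selected by mask, then OR in value. */
static uint32_t demo_next_state(uint32_t current, uint32_t mask,
                                uint32_t value)
{
    return (current & mask) | value;
}
```

Under that assumption, (DEMO_STATE_MASK, DEMO_STATE_V1_SAVING) adds SAVING
to a running device, (~DEMO_STATE_RUNNING, DEMO_STATE_V1_SAVING) yields
"stop and saving", and (~DEMO_STATE_V1_SAVING, 0) yields "stopped", which
matches the messages in the quoted error_report() calls.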


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames
  2022-04-12 16:07       ` Matthew Rosato
  (?)
@ 2022-04-19 15:44       ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-19 15:44 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 4/12/22 18:07, Matthew Rosato wrote:
> On 4/12/22 11:50 AM, Pierre Morel wrote:
>>
>>
>> On 4/4/22 20:17, Matthew Rosato wrote:
>>> The v1 uapi is deprecated and will be replaced by v2 at some point;
>>> this patch just tolerates the renaming of uapi fields to reflect
>>> v1 / deprecated status.
>>>
>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>> ---
>>>   hw/vfio/common.c    |  2 +-
>>>   hw/vfio/migration.c | 19 +++++++++++--------
>>>   2 files changed, 12 insertions(+), 9 deletions(-)
>>
>>
>> I do not understand why you need this patch in this series.
>> Shouldn't it be separate?
> 
> This patch is included because of the patch 1 kernel header sync, which 
> pulls in uapi headers from kernel version 5.18-rc1 + my unmerged kernel 
> uapi changes.
> 
> This patch is unnecessary without a header sync (and in fact would break 
> QEMU compile), and is unrelated to the rest of the series -- but QEMU 
> will not compile without it once you update linux uapi headers to 
> 5.18-rc1 (or greater) due to the v1 uapi for vfio migration being 
> deprecated [1].  This means that ANY series that does a linux header 
> sync starting from here on will need something like this patch to go 
> along with the header sync (or a series that replaces v1 usage with v2?).
> 
> If this patch looks good it could be included whenever a header sync is 
> next needed, doesn't necessarily have to be with this series.
> 
> [1] https://www.spinics.net/lists/kernel/msg4288200.html
> 

arrg, seems I will need it too then.
Thanks,

Pierre


-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 4/9] s390x/pci: add routine to get host function handle from CLP info
  2022-04-04 18:17   ` Matthew Rosato
@ 2022-04-19 19:15     ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-19 19:15 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/4/22 20:17, Matthew Rosato wrote:
> In order to interface with the underlying host zPCI device, we need
> to know its function handle.  Add a routine to grab this from the
> vfio CLP capabilities chain.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>


> ---
>   hw/s390x/s390-pci-vfio.c         | 83 ++++++++++++++++++++++++++------
>   include/hw/s390x/s390-pci-vfio.h |  6 +++
>   2 files changed, 73 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 6f80a47e29..4bf0a7e22d 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -124,6 +124,27 @@ static void s390_pci_read_base(S390PCIBusDevice *pbdev,
>       pbdev->zpci_fn.pft = 0;
>   }
>   
> +static bool get_host_fh(S390PCIBusDevice *pbdev, struct vfio_device_info *info,
> +                        uint32_t *fh)
> +{
> +    struct vfio_info_cap_header *hdr;
> +    struct vfio_device_info_cap_zpci_base *cap;
> +    VFIOPCIDevice *vpci =  container_of(pbdev->pdev, VFIOPCIDevice, pdev);
> +
> +    hdr = vfio_get_device_info_cap(info, VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
> +
> +    /* Can only get the host fh with version 2 or greater */
> +    if (hdr == NULL || hdr->version < 2) {
> +        trace_s390_pci_clp_cap(vpci->vbasedev.name,
> +                               VFIO_DEVICE_INFO_CAP_ZPCI_BASE);
> +        return false;
> +    }
> +    cap = (void *) hdr;
> +
> +    *fh = cap->fh;
> +    return true;
> +}
> +
>   static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>                                   struct vfio_device_info *info)
>   {
> @@ -217,25 +238,13 @@ static void s390_pci_read_pfip(S390PCIBusDevice *pbdev,
>       memcpy(pbdev->zpci_fn.pfip, cap->pfip, CLP_PFIP_NR_SEGMENTS);
>   }
>   
> -/*
> - * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
> - * capabilities that contain information about CLP features provided by the
> - * underlying host.
> - * On entry, defaults have already been placed into the guest CLP response
> - * buffers.  On exit, defaults will have been overwritten for any CLP features
> - * found in the capability chain; defaults will remain for any CLP features not
> - * found in the chain.
> - */
> -void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
> +static struct vfio_device_info *get_device_info(S390PCIBusDevice *pbdev,
> +                                                uint32_t argsz)
>   {
> -    g_autofree struct vfio_device_info *info = NULL;
> +    struct vfio_device_info *info = g_malloc0(argsz);
>       VFIOPCIDevice *vfio_pci;
> -    uint32_t argsz;
>       int fd;
>   
> -    argsz = sizeof(*info);
> -    info = g_malloc0(argsz);
> -
>       vfio_pci = container_of(pbdev->pdev, VFIOPCIDevice, pdev);
>       fd = vfio_pci->vbasedev.fd;
>   
> @@ -250,7 +259,8 @@ retry:
>   
>       if (ioctl(fd, VFIO_DEVICE_GET_INFO, info)) {
>           trace_s390_pci_clp_dev_info(vfio_pci->vbasedev.name);
> -        return;
> +        free(info);
> +        return NULL;
>       }
>   
>       if (info->argsz > argsz) {
> @@ -259,6 +269,47 @@ retry:
>           goto retry;
>       }
>   
> +    return info;
> +}
> +
> +/*
> + * Get the host function handle from the vfio CLP capabilities chain.  Returns
> + * true if a fh value was placed into the provided buffer.  Returns false
> + * if a fh could not be obtained (ioctl failed or capabilitiy version does
> + * not include the fh)
> + */
> +bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh)
> +{
> +    g_autofree struct vfio_device_info *info = NULL;
> +
> +    assert(fh);
> +
> +    info = get_device_info(pbdev, sizeof(*info));
> +    if (!info) {
> +        return false;
> +    }
> +
> +    return get_host_fh(pbdev, info, fh);
> +}
> +
> +/*
> + * This function will issue the VFIO_DEVICE_GET_INFO ioctl and look for
> + * capabilities that contain information about CLP features provided by the
> + * underlying host.
> + * On entry, defaults have already been placed into the guest CLP response
> + * buffers.  On exit, defaults will have been overwritten for any CLP features
> + * found in the capability chain; defaults will remain for any CLP features not
> + * found in the chain.
> + */
> +void s390_pci_get_clp_info(S390PCIBusDevice *pbdev)
> +{
> +    g_autofree struct vfio_device_info *info = NULL;
> +
> +    info = get_device_info(pbdev, sizeof(*info));
> +    if (!info) {
> +        return;
> +    }
> +
>       /*
>        * Find the CLP features provided and fill in the guest CLP responses.
>        * Always call s390_pci_read_base first as information from this could
> diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
> index ff708aef50..0c2e4b5175 100644
> --- a/include/hw/s390x/s390-pci-vfio.h
> +++ b/include/hw/s390x/s390-pci-vfio.h
> @@ -20,6 +20,7 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
>   S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
>                                             S390PCIBusDevice *pbdev);
>   void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
> +bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
>   void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
>   #else
>   static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
> @@ -33,6 +34,11 @@ static inline S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
>   }
>   static inline void s390_pci_end_dma_count(S390pciState *s,
>                                             S390PCIDMACount *cnt) { }
> +static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev,
> +                                        unsigned int *fh)
> +{
> +    return false;
> +}
>   static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
>   #endif
>   
> 
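
The get_device_info() helper in the quoted patch follows the usual
"ask, grow, ask again" pattern around VFIO_DEVICE_GET_INFO. Below is a
minimal standalone sketch of that pattern, assuming the query reports the
size it needs through argsz. demo_info and demo_query() are hypothetical
stand-ins for struct vfio_device_info and the ioctl so the loop can be
compiled on its own; plain calloc/realloc/free are used instead of the
GLib allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_device_info. */
struct demo_info {
    uint32_t argsz;   /* in: buffer size, out: size the answer needs */
    uint32_t flags;   /* set to 1 once the answer fit in the buffer */
};

/* Hypothetical stand-in for ioctl(fd, VFIO_DEVICE_GET_INFO, info).
 * 'needed' is the size the full answer requires; 0 simulates a failure. */
static int demo_query(struct demo_info *info, uint32_t needed)
{
    if (needed == 0) {
        return -1;
    }
    if (needed > info->argsz) {
        info->argsz = needed;   /* buffer too small: report required size */
    } else {
        info->flags = 1;        /* buffer large enough: answer delivered */
    }
    return 0;
}

/* Returns a buffer the caller must free(), or NULL on failure. */
static struct demo_info *demo_get_info(uint32_t needed)
{
    uint32_t argsz = sizeof(struct demo_info);
    struct demo_info *info = calloc(1, argsz);

    while (info) {
        struct demo_info *bigger;

        info->argsz = argsz;
        if (demo_query(info, needed)) {
            free(info);          /* release the buffer on the error path */
            return NULL;
        }
        if (info->argsz <= argsz) {
            return info;         /* the answer fit */
        }
        argsz = info->argsz;     /* grow to the reported size and retry */
        bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
        }
        info = bigger;
    }
    return NULL;
}
```

As in the patch, ownership of the returned buffer passes to the caller,
which is why both s390_pci_get_host_fh() and s390_pci_get_clp_info() hold
the result in a g_autofree pointer.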

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 5/9] s390x/pci: enable for load/store intepretation
  2022-04-04 18:17   ` Matthew Rosato
@ 2022-04-19 19:47     ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-19 19:47 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/4/22 20:17, Matthew Rosato wrote:
> If the appropriate CPU facility is available as well as the necessary
> ZPCI_OP ioctl, then the underlying KVM host will enable load/store
> interpretation for any guest device without a SHM bit in the guest
> function handle.  For a device that will be using interpretation
> support, ensure the guest function handle matches the host function
> handle; this value is re-checked every time the guest issues a SET PCI FN
> to enable the guest device as it is the only opportunity to reflect
> function handle changes.
> 
> By default, unless interpret=off is specified, interpretation support will
> always be assumed and exploited if the necessary ioctl and features are
> available on the host kernel.  When these are unavailable, we will silently
> revert to the interception model; this allows existing guest configurations
> to work unmodified on hosts with and without zPCI interpretation support,
> allowing QEMU to choose the best support model available.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/meson.build            |  1 +
>   hw/s390x/s390-pci-bus.c         | 66 ++++++++++++++++++++++++++++++++-
>   hw/s390x/s390-pci-inst.c        | 12 ++++++
>   hw/s390x/s390-pci-kvm.c         | 21 +++++++++++
>   include/hw/s390x/s390-pci-bus.h |  1 +
>   include/hw/s390x/s390-pci-kvm.h | 24 ++++++++++++
>   target/s390x/kvm/kvm.c          |  7 ++++
>   target/s390x/kvm/kvm_s390x.h    |  1 +
>   8 files changed, 132 insertions(+), 1 deletion(-)
>   create mode 100644 hw/s390x/s390-pci-kvm.c
>   create mode 100644 include/hw/s390x/s390-pci-kvm.h
> 

...snip...

>           if (s390_pci_msix_init(pbdev)) {
> @@ -1360,6 +1423,7 @@ static Property s390_pci_device_properties[] = {
>       DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
>       DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
>       DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
> +    DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
> index 6d400d4147..c898c8abe9 100644
> --- a/hw/s390x/s390-pci-inst.c
> +++ b/hw/s390x/s390-pci-inst.c
> @@ -18,6 +18,8 @@
>   #include "sysemu/hw_accel.h"
>   #include "hw/s390x/s390-pci-inst.h"
>   #include "hw/s390x/s390-pci-bus.h"
> +#include "hw/s390x/s390-pci-kvm.h"
> +#include "hw/s390x/s390-pci-vfio.h"
>   #include "hw/s390x/tod.h"
>   
>   #ifndef DEBUG_S390PCI_INST
> @@ -246,6 +248,16 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
>                   goto out;
>               }
>   
> +            /*
> +             * Take this opportunity to make sure we still have an accurate
> +             * host fh.  It's possible part of the handle changed while the
> +             * device was disabled to the guest (e.g. vfio hot reset for
> +             * ISM during plug)
> +             */
> +            if (pbdev->interp) {
> +                /* Take this opportunity to make sure we are sync'd with host */
> +                s390_pci_get_host_fh(pbdev, &pbdev->fh);
> +            }
>               pbdev->fh |= FH_MASK_ENABLE;

Are we sure here that the PCI device is always enabled?
Shouldn't we check?
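
To make the question concrete, here is a small standalone sketch of what
the quoted hunk does as I read it: the return value of the refresh is
ignored and the enable bit is ORed in afterwards, so the resulting handle
carries the enable bit whether or not the host handle could be re-read.
DEMO_FH_MASK_ENABLE and both helpers are hypothetical stand-ins, not the
QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FH_MASK_ENABLE 0x80000000u  /* hypothetical enable bit */

/* Hypothetical stand-in for s390_pci_get_host_fh(): writes *fh only when
 * the host handle could be obtained. */
static bool demo_get_host_fh(bool host_ok, uint32_t host_fh, uint32_t *fh)
{
    if (!host_ok) {
        return false;
    }
    *fh = host_fh;
    return true;
}

/* Mirrors the quoted flow: optional refresh, then set the enable bit. */
static uint32_t demo_enable(uint32_t guest_fh, bool interp, bool host_ok,
                            uint32_t host_fh)
{
    if (interp) {
        /* Result intentionally ignored, as in the quoted hunk. */
        (void)demo_get_host_fh(host_ok, host_fh, &guest_fh);
    }
    return guest_fh | DEMO_FH_MASK_ENABLE;
}
```

In this sketch a failed refresh silently keeps the old handle; checking
the return value would be the place to react to that.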


>               pbdev->state = ZPCI_FS_ENABLED;
>               stl_p(&ressetpci->fh, pbdev->fh);
> diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
> new file mode 100644
> index 0000000000..8bfce9ef18
> --- /dev/null
> +++ b/hw/s390x/s390-pci-kvm.c
> @@ -0,0 +1,21 @@
> +/*
> + * s390 zPCI KVM interfaces
> + *
> + * Copyright 2022 IBM Corp.
> + * Author(s): Matthew Rosato <mjrosato@linux.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "kvm/kvm_s390x.h"
> +#include "hw/s390x/s390-pci-kvm.h"
> +#include "cpu_models.h"
> +
> +bool s390_pci_kvm_interp_allowed(void)
> +{
> +    return s390_has_feat(S390_FEAT_ZPCI_INTERP) && kvm_s390_get_zpci_op();
> +}

zPCI is not currently supported under PV, but I do not see what would
prevent it from being enabled in the future.
As the QEMU zPCI code is not PV compatible, I would like to add a
check for PV:

... && !s390_is_pv()
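
Spelled out, the predicate with the extra PV term could look roughly like
the standalone sketch below. The demo_host fields are hypothetical
stand-ins for s390_has_feat(S390_FEAT_ZPCI_INTERP), kvm_s390_get_zpci_op()
and s390_is_pv(), so that the logic can be compiled outside of QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the host/guest state the real helpers
 * would report. */
struct demo_host {
    bool has_zpci_interp_feat;  /* stands in for s390_has_feat(...) */
    bool has_zpci_op;           /* stands in for kvm_s390_get_zpci_op() */
    bool is_pv;                 /* stands in for s390_is_pv() */
};

/* Interpretation is allowed only when both facilities are present and
 * the guest is not running in protected virtualization mode. */
static bool demo_interp_allowed(const struct demo_host *h)
{
    return h->has_zpci_interp_feat && h->has_zpci_op && !h->is_pv;
}
```

With this shape a PV guest falls back to the interception model in the
same way as a host that lacks the feature or the ioctl.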



> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
> index da3cde2bb4..a9843dfe97 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -350,6 +350,7 @@ struct S390PCIBusDevice {
>       IndAddr *indicator;
>       bool pci_unplug_request_processed;
>       bool unplug_requested;
> +    bool interp;
>       QTAILQ_ENTRY(S390PCIBusDevice) link;
>   };
>   

...snip...

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 9/9] s390x/pci: reflect proper maxstbl for groups of interpreted devices
  2022-04-04 18:17   ` Matthew Rosato
@ 2022-04-19 19:49     ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-19 19:49 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/4/22 20:17, Matthew Rosato wrote:
> The maximum supported store block length might be different depending
> on whether the instruction is interpretively executed (firmware-reported
> maximum) or handled via userspace intercept (host kernel API maximum).
> Choose the best available value during group creation.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>

> ---
>   hw/s390x/s390-pci-vfio.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 985980f021..212dd053f7 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>           resgrp->msia = cap->msi_addr;
>           resgrp->mui = cap->mui;
>           resgrp->i = cap->noi;
> -        resgrp->maxstbl = cap->maxstbl;
> +        if (pbdev->interp && hdr->version >= 2) {
> +            resgrp->maxstbl = cap->imaxstbl;
> +        } else {
> +            resgrp->maxstbl = cap->maxstbl;
> +        }
>           resgrp->version = cap->version;
>           resgrp->dtsm = ZPCI_DTSM;
>       }
> 
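
For reference, the selection in the hunk above can be read in isolation as
the small sketch below. The struct is a hypothetical, trimmed-down view of
the group capability (in the patch the version comes from the capability
header), and the field widths are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the host-reported group capability. */
struct group_cap {
    uint32_t version;   /* capability version; imaxstbl is valid from 2 on */
    uint16_t maxstbl;   /* maximum when handled via userspace intercept */
    uint16_t imaxstbl;  /* firmware-reported maximum for interpretation */
};

/* Pick the store block maximum that matches how the device is handled. */
static uint16_t pick_maxstbl(const struct group_cap *cap, bool interp)
{
    assert(cap != NULL);
    if (interp && cap->version >= 2) {
        return cap->imaxstbl;
    }
    return cap->maxstbl;
}
```

An interpreted device on a host that only reports a version 1 capability
therefore still gets the intercept maximum.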

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 5/9] s390x/pci: enable for load/store intepretation
  2022-04-19 19:47     ` Pierre Morel
@ 2022-04-20 15:12       ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-20 15:12 UTC (permalink / raw)
  To: Pierre Morel, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

On 4/19/22 3:47 PM, Pierre Morel wrote:
> 
> 
> On 4/4/22 20:17, Matthew Rosato wrote:
>> If the appropriate CPU facilty is available as well as the necessary
>> ZPCI_OP ioctl, then the underlying KVM host will enable load/store
>> intepretation for any guest device without a SHM bit in the guest
>> function handle.  For a device that will be using interpretation
>> support, ensure the guest function handle matches the host function
>> handle; this value is re-checked every time the guest issues a SET PCI FN
>> to enable the guest device as it is the only opportunity to reflect
>> function handle changes.
>>
>> By default, unless interpret=off is specified, interpretation support will
>> always be assumed and exploited if the necessary ioctl and features are
>> available on the host kernel.  When these are unavailable, we will silently
>> revert to the interception model; this allows existing guest configurations
>> to work unmodified on hosts with and without zPCI interpretation support,
>> allowing QEMU to choose the best support model available.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/s390x/meson.build            |  1 +
>>   hw/s390x/s390-pci-bus.c         | 66 ++++++++++++++++++++++++++++++++-
>>   hw/s390x/s390-pci-inst.c        | 12 ++++++
>>   hw/s390x/s390-pci-kvm.c         | 21 +++++++++++
>>   include/hw/s390x/s390-pci-bus.h |  1 +
>>   include/hw/s390x/s390-pci-kvm.h | 24 ++++++++++++
>>   target/s390x/kvm/kvm.c          |  7 ++++
>>   target/s390x/kvm/kvm_s390x.h    |  1 +
>>   8 files changed, 132 insertions(+), 1 deletion(-)
>>   create mode 100644 hw/s390x/s390-pci-kvm.c
>>   create mode 100644 include/hw/s390x/s390-pci-kvm.h
>>
> 
> ...snip...
> 
>>           if (s390_pci_msix_init(pbdev)) {
>> @@ -1360,6 +1423,7 @@ static Property s390_pci_device_properties[] = {
>>       DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
>>       DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
>>       DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
>> +    DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
>>       DEFINE_PROP_END_OF_LIST(),
>>   };
>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>> index 6d400d4147..c898c8abe9 100644
>> --- a/hw/s390x/s390-pci-inst.c
>> +++ b/hw/s390x/s390-pci-inst.c
>> @@ -18,6 +18,8 @@
>>   #include "sysemu/hw_accel.h"
>>   #include "hw/s390x/s390-pci-inst.h"
>>   #include "hw/s390x/s390-pci-bus.h"
>> +#include "hw/s390x/s390-pci-kvm.h"
>> +#include "hw/s390x/s390-pci-vfio.h"
>>   #include "hw/s390x/tod.h"
>>   #ifndef DEBUG_S390PCI_INST
>> @@ -246,6 +248,16 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
>>                   goto out;
>>               }
>> +            /*
>> +             * Take this opportunity to make sure we still have an accurate
>> +             * host fh.  It's possible part of the handle changed while the
>> +             * device was disabled to the guest (e.g. vfio hot reset for
>> +             * ISM during plug)
>> +             */
>> +            if (pbdev->interp) {
>> +                /* Take this opportunity to make sure we are sync'd with host */
>> +                s390_pci_get_host_fh(pbdev, &pbdev->fh);
>> +            }
>>               pbdev->fh |= FH_MASK_ENABLE;
> 
> Are we sure here that the PCI device is always enabled?
> Shouldn't we check?

I guess you mean the host device?  Interesting thought.

So, to be clear, the idea on setting FH_MASK_ENABLE here is that we are 
handling a guest CLP SET PCI FN enable so the guest fh should always 
have FH_MASK_ENABLE set if we return CLP_RC_OK to the guest.

But for interpretation, if we find the host function is disabled, I 
suppose we could return an error on the guest CLP (not sure which error 
yet); otherwise, if we return the force-enabled handle and CLP_RC_OK as 
we do here then the guest will just get errors attempting to use it.
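
To make the force-enable case concrete, here is a small sketch of what the
hunk does to the handle. The FH_MASK_ENABLE value (top bit of the 32-bit
function handle) is an assumption here, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the enable bit is the top bit of the function handle. */
#define FH_MASK_ENABLE 0x80000000u

/* True if the handle has its enable bit set. */
static bool fh_enabled(uint32_t fh)
{
    return (fh & FH_MASK_ENABLE) != 0;
}

/*
 * Behaviour of the patch as posted: the host handle is copied and the
 * enable bit is then forced on, whatever state the host function is in.
 */
static uint32_t guest_fh_after_enable(uint32_t host_fh)
{
    uint32_t fh = host_fh | FH_MASK_ENABLE;

    assert(fh_enabled(fh));
    return fh;
}
```

If the host handle comes back without the enable bit, the guest still
receives an enabled-looking handle, which is the mismatch discussed above.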




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 5/9] s390x/pci: enable for load/store intepretation
  2022-04-20 15:12       ` Matthew Rosato
@ 2022-04-22  9:27         ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-22  9:27 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/20/22 17:12, Matthew Rosato wrote:
> On 4/19/22 3:47 PM, Pierre Morel wrote:
>>
>>
>> On 4/4/22 20:17, Matthew Rosato wrote:
>>> If the appropriate CPU facilty is available as well as the necessary
>>> ZPCI_OP ioctl, then the underlying KVM host will enable load/store
>>> intepretation for any guest device without a SHM bit in the guest
>>> function handle.  For a device that will be using interpretation
>>> support, ensure the guest function handle matches the host function
>>> handle; this value is re-checked every time the guest issues a SET PCI FN
>>> to enable the guest device as it is the only opportunity to reflect
>>> function handle changes.
>>>
>>> By default, unless interpret=off is specified, interpretation support will
>>> always be assumed and exploited if the necessary ioctl and features are
>>> available on the host kernel.  When these are unavailable, we will silently
>>> revert to the interception model; this allows existing guest configurations
>>> to work unmodified on hosts with and without zPCI interpretation support,
>>> allowing QEMU to choose the best support model available.
>>>
>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>> ---
>>>   hw/s390x/meson.build            |  1 +
>>>   hw/s390x/s390-pci-bus.c         | 66 ++++++++++++++++++++++++++++++++-
>>>   hw/s390x/s390-pci-inst.c        | 12 ++++++
>>>   hw/s390x/s390-pci-kvm.c         | 21 +++++++++++
>>>   include/hw/s390x/s390-pci-bus.h |  1 +
>>>   include/hw/s390x/s390-pci-kvm.h | 24 ++++++++++++
>>>   target/s390x/kvm/kvm.c          |  7 ++++
>>>   target/s390x/kvm/kvm_s390x.h    |  1 +
>>>   8 files changed, 132 insertions(+), 1 deletion(-)
>>>   create mode 100644 hw/s390x/s390-pci-kvm.c
>>>   create mode 100644 include/hw/s390x/s390-pci-kvm.h
>>>
>>
>> ...snip...
>>
>>>           if (s390_pci_msix_init(pbdev)) {
>>> @@ -1360,6 +1423,7 @@ static Property s390_pci_device_properties[] = {
>>>       DEFINE_PROP_UINT16("uid", S390PCIBusDevice, uid, UID_UNDEFINED),
>>>       DEFINE_PROP_S390_PCI_FID("fid", S390PCIBusDevice, fid),
>>>       DEFINE_PROP_STRING("target", S390PCIBusDevice, target),
>>> +    DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>>> index 6d400d4147..c898c8abe9 100644
>>> --- a/hw/s390x/s390-pci-inst.c
>>> +++ b/hw/s390x/s390-pci-inst.c
>>> @@ -18,6 +18,8 @@
>>>   #include "sysemu/hw_accel.h"
>>>   #include "hw/s390x/s390-pci-inst.h"
>>>   #include "hw/s390x/s390-pci-bus.h"
>>> +#include "hw/s390x/s390-pci-kvm.h"
>>> +#include "hw/s390x/s390-pci-vfio.h"
>>>   #include "hw/s390x/tod.h"
>>>   #ifndef DEBUG_S390PCI_INST
>>> @@ -246,6 +248,16 @@ int clp_service_call(S390CPU *cpu, uint8_t r2, uintptr_t ra)
>>>                   goto out;
>>>               }
>>> +            /*
>>> +             * Take this opportunity to make sure we still have an accurate
>>> +             * host fh.  It's possible part of the handle changed while the
>>> +             * device was disabled to the guest (e.g. vfio hot reset for
>>> +             * ISM during plug)
>>> +             */
>>> +            if (pbdev->interp) {
>>> +                /* Take this opportunity to make sure we are sync'd with host */
>>> +                s390_pci_get_host_fh(pbdev, &pbdev->fh);

Here we should check the return value and, AFAIU, assume that the device
has disappeared if it returned false.

>>> +            }
>>>               pbdev->fh |= FH_MASK_ENABLE;
>>
>> Are we sure here that the PCI device is always enabled?
>> Shouldn't we check?
> 
> I guess you mean the host device?  Interesting thought.
> 
> So, to be clear, the idea on setting FH_MASK_ENABLE here is that we are 
> handling a guest CLP SET PCI FN enable so the guest fh should always 
> have FH_MASK_ENABLE set if we return CLP_RC_OK to the guest.
> 
> But for interpretation, if we find the host function is disabled, I 
> suppose we could return an error on the guest CLP (not sure which error 
> yet); otherwise, if we return the force-enabled handle and CLP_RC_OK as 
> we do here then the guest will just get errors attempting to use it.

Hmm, in this case couldn't we end up in a loop of
clp enable -> error -> clp disable -> clp enable -> error ...

I think we should return an error if what the guest asked for could not 
be done.
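
As a rough sketch of what I mean, with everything reduced to stand-ins: the
struct, the result codes and get_host_fh() below are hypothetical (the
latter only mimics s390_pci_get_host_fh() returning false on failure), the
FH_MASK_ENABLE value is assumed, and which CLP response code to use is
still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FH_MASK_ENABLE 0x80000000u  /* assumed value of the enable bit */

/* Hypothetical result codes; the real CLP response codes are undecided. */
enum enable_result {
    ENABLE_OK,
    ENABLE_ERR_NO_DEVICE,
    ENABLE_ERR_HOST_DISABLED,
};

/* Hypothetical, reduced device state. */
struct dev {
    bool interp;        /* device uses interpretation */
    uint32_t fh;        /* guest-visible function handle */
    bool host_present;  /* host function can still be queried */
    uint32_t host_fh;   /* handle the host currently reports */
};

/* Stand-in for s390_pci_get_host_fh(): false if the handle can't be read. */
static bool get_host_fh(const struct dev *d, uint32_t *fh)
{
    if (!d->host_present) {
        return false;
    }
    *fh = d->host_fh;
    return true;
}

/* Guest asked to enable the function: fail instead of forcing the bit. */
static enum enable_result handle_guest_enable(struct dev *d)
{
    assert(d != NULL);
    if (d->interp) {
        uint32_t fh;

        if (!get_host_fh(d, &fh)) {
            return ENABLE_ERR_NO_DEVICE;      /* device disappeared */
        }
        if (!(fh & FH_MASK_ENABLE)) {
            return ENABLE_ERR_HOST_DISABLED;  /* cannot honour the request */
        }
        d->fh = fh;
    }
    d->fh |= FH_MASK_ENABLE;
    return ENABLE_OK;
}
```

The guest handle is only updated on success, so a failed enable leaves the
guest with a disabled handle rather than one that errors on first use.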


> 
> 
> 

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-04-04 18:17   ` Matthew Rosato
@ 2022-04-22  9:39     ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-22  9:39 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/4/22 20:17, Matthew Rosato wrote:
> Use the associated kvm ioctl operation to enable adapter event notification
> and forwarding for devices when requested.  This feature will be set up
> with or without firmware assist based upon the 'forwarding_assist' setting.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>   hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
>   hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>   include/hw/s390x/s390-pci-bus.h |  1 +
>   include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>   5 files changed, 100 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 9c02d31250..47918d2ce9 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>           rc = SCLP_RC_NO_ACTION_REQUIRED;
>           break;
>       default:
> -        if (pbdev->summary_ind) {
> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
> +            /* Interpreted devices were using interrupt forwarding */
> +            s390_pci_kvm_aif_disable(pbdev);

Same remark as for the kernel part.
The VFIO device is already initialized and the action is on this device.
Shouldn't we use the VFIO device interface instead of the KVM interface?


regards,
Pierre


-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 9/9] s390x/pci: reflect proper maxstbl for groups of interpreted devices
  2022-04-04 18:17   ` Matthew Rosato
@ 2022-04-22  9:43     ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-04-22  9:43 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm



On 4/4/22 20:17, Matthew Rosato wrote:
> The maximum supported store block length might be different depending
> on whether the instruction is interpretively executed (firmware-reported
> maximum) or handled via userspace intercept (host kernel API maximum).
> Choose the best available value during group creation.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>

> ---
>   hw/s390x/s390-pci-vfio.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 985980f021..212dd053f7 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>           resgrp->msia = cap->msi_addr;
>           resgrp->mui = cap->mui;
>           resgrp->i = cap->noi;
> -        resgrp->maxstbl = cap->maxstbl;
> +        if (pbdev->interp && hdr->version >= 2) {
> +            resgrp->maxstbl = cap->imaxstbl;
> +        } else {
> +            resgrp->maxstbl = cap->maxstbl;
> +        }
>           resgrp->version = cap->version;
>           resgrp->dtsm = ZPCI_DTSM;
>       }
> 

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-04-22  9:39     ` Pierre Morel
@ 2022-04-22 12:10       ` Matthew Rosato
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-04-22 12:10 UTC (permalink / raw)
  To: Pierre Morel, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, thuth, farman,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

On 4/22/22 5:39 AM, Pierre Morel wrote:
> 
> 
> On 4/4/22 20:17, Matthew Rosato wrote:
>> Use the associated kvm ioctl operation to enable adapter event notification
>> and forwarding for devices when requested.  This feature will be set up
>> with or without firmware assist based upon the 'forwarding_assist' setting.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>>   hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
>>   hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>>   include/hw/s390x/s390-pci-bus.h |  1 +
>>   include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>>   5 files changed, 100 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index 9c02d31250..47918d2ce9 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>           rc = SCLP_RC_NO_ACTION_REQUIRED;
>>           break;
>>       default:
>> -        if (pbdev->summary_ind) {
>> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
>> +            /* Interpreted devices were using interrupt forwarding */
>> +            s390_pci_kvm_aif_disable(pbdev);
> 
> Same remark as for the kernel part.
> The VFIO device is already initialized and the action is on this device, 
> Shouldn't we use the VFIO device interface instead of the KVM interface?
> 

I don't necessarily disagree, but in v3 of the kernel series I was told 
not to use VFIO ioctls to accomplish tasks that are unique to KVM (e.g. 
AEN interpretation) and to instead use a KVM ioctl.

VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the
kernel series (e.g. we don't see any of the config space notifiers
because of instruction interpretation).  As far as I can figure, we
could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS
directly for an interpreted device, but I think we would also need
s390-specific changes to VFIO_DEVICE_SET_IRQS to accommodate this (e.g.
maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then
specify the AEN information in vfio_irq_set.data, or something else I
haven't thought of yet).  I can try to look at this some more and see if
I get a good idea.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-04-22 12:10       ` Matthew Rosato
@ 2022-05-02  7:48         ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-05-02  7:48 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x, alex.williamson
  Cc: schnelle, cohuck, thuth, farman, richard.henderson, david, pasic,
	borntraeger, mst, pbonzini, qemu-devel, kvm



On 4/22/22 14:10, Matthew Rosato wrote:
> On 4/22/22 5:39 AM, Pierre Morel wrote:
>>
>>
>> On 4/4/22 20:17, Matthew Rosato wrote:
>>> Use the associated kvm ioctl operation to enable adapter event 
>>> notification
>>> and forwarding for devices when requested.  This feature will be set up
>>> with or without firmware assist based upon the 'forwarding_assist' 
>>> setting.
>>>
>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>> ---
>>>   hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>>>   hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
>>>   hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>>>   include/hw/s390x/s390-pci-bus.h |  1 +
>>>   include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>>>   5 files changed, 100 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> index 9c02d31250..47918d2ce9 100644
>>> --- a/hw/s390x/s390-pci-bus.c
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>>           rc = SCLP_RC_NO_ACTION_REQUIRED;
>>>           break;
>>>       default:
>>> -        if (pbdev->summary_ind) {
>>> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
>>> +            /* Interpreted devices were using interrupt forwarding */
>>> +            s390_pci_kvm_aif_disable(pbdev);
>>
>> Same remark as for the kernel part.
>> The VFIO device is already initialized and the action is on this 
>> device, Shouldn't we use the VFIO device interface instead of the KVM 
>> interface?
>>
> 
> I don't necessarily disagree, but in v3 of the kernel series I was told 
> not to use VFIO ioctls to accomplish tasks that are unique to KVM (e.g. 
> AEN interpretation) and to instead use a KVM ioctl.
> 
> VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the 
> kernel series (e.g. we don't see any of the config space notifiers 
> because of instruction interpretation) -- as far as I can figure we 
> could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS 
> directly for an interpreted device, but I think would also need 
> s390-specific changes to VFIO_DEVICE_SET_IRQS accommodate this (e.g. 
> maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then 
> specify the aen information in vfio_irq_set.data -- or something else I 

Hi,

Yes, an extension like this in VFIO_DEVICE_SET_IRQS is what I think should be done.

> haven't though of yet) -- I can try to look at this some more and see if 
> I get a good idea.



I understood that the request concerned the IOMMU, but I may be wrong.
In my opinion, the handling of AEN is not specific to KVM but specific 
to the device. For example, the code should be the same if Z ever decides 
to use Xen or another hypervisor, except for the GISA part, but this part 
is already implemented in KVM in a way that can be used from a device, as 
in VFIO AP.

@Alex, what do you think?

Regards,
Pierre


-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-05-02  7:48         ` Pierre Morel
@ 2022-05-02  9:19           ` Niklas Schnelle
  -1 siblings, 0 replies; 55+ messages in thread
From: Niklas Schnelle @ 2022-05-02  9:19 UTC (permalink / raw)
  To: Pierre Morel, Matthew Rosato, qemu-s390x, alex.williamson
  Cc: cohuck, thuth, farman, richard.henderson, david, pasic,
	borntraeger, mst, pbonzini, qemu-devel, kvm

On Mon, 2022-05-02 at 09:48 +0200, Pierre Morel wrote:
> 
> On 4/22/22 14:10, Matthew Rosato wrote:
> > On 4/22/22 5:39 AM, Pierre Morel wrote:
> > > 
> > > On 4/4/22 20:17, Matthew Rosato wrote:
> > > > Use the associated kvm ioctl operation to enable adapter event 
> > > > notification
> > > > and forwarding for devices when requested.  This feature will be set up
> > > > with or without firmware assist based upon the 'forwarding_assist' 
> > > > setting.
> > > > 
> > > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > > > ---
> > > >   hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
> > > >   hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
> > > >   hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
> > > >   include/hw/s390x/s390-pci-bus.h |  1 +
> > > >   include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
> > > >   5 files changed, 100 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> > > > index 9c02d31250..47918d2ce9 100644
> > > > --- a/hw/s390x/s390-pci-bus.c
> > > > +++ b/hw/s390x/s390-pci-bus.c
> > > > @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
> > > >           rc = SCLP_RC_NO_ACTION_REQUIRED;
> > > >           break;
> > > >       default:
> > > > -        if (pbdev->summary_ind) {
> > > > +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
> > > > +            /* Interpreted devices were using interrupt forwarding */
> > > > +            s390_pci_kvm_aif_disable(pbdev);
> > > 
> > > Same remark as for the kernel part.
> > > The VFIO device is already initialized and the action is on this 
> > > device, Shouldn't we use the VFIO device interface instead of the KVM 
> > > interface?
> > > 
> > 
> > I don't necessarily disagree, but in v3 of the kernel series I was told 
> > not to use VFIO ioctls to accomplish tasks that are unique to KVM (e.g. 
> > AEN interpretation) and to instead use a KVM ioctl.
> > 
> > VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the 
> > kernel series (e.g. we don't see any of the config space notifiers 
> > because of instruction interpretation) -- as far as I can figure we 
> > could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS 
> > directly for an interpreted device, but I think would also need 
> > s390-specific changes to VFIO_DEVICE_SET_IRQS accommodate this (e.g. 
> > maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then 
> > specify the aen information in vfio_irq_set.data -- or something else I 
> 
> Hi,
> 
> yes this in VFIO_DEVICE_SET_IRQS is what I think should be done.
> 
> > haven't though of yet) -- I can try to look at this some more and see if 
> > I get a good idea.
> 
> 
> I understood that the demand was concerning the IOMMU but I may be wrong.
> For my opinion, the handling of AEN is not specific to KVM but specific 
> to the device, for example the code should be the same if Z ever decide 
> to use XEN or another hypervizor, except for the GISA part but this part 
> is already implemented in KVM in a way it can be used from a device like 
> in VFIO AP.
> 
> @Alex, what do you think?
> 
> Regards,
> Pierre
> 

As I understand it, the question isn't whether it is specific to KVM but
rather whether it is specific to virtualization. As vfio-pci is also used
for non-virtualization purposes such as DPDK/SPDK or a fully
emulating QEMU, it should only be in VFIO if it is relevant for these
kinds of user-space PCI accesses too. I'm not an AEN expert, but as I
understand it, this forwards interrupts into a SIE context, which
only makes sense for virtualization, not for general user-space PCI.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-05-02  9:19           ` Niklas Schnelle
@ 2022-05-02 11:30             ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-05-02 11:30 UTC (permalink / raw)
  To: Niklas Schnelle, Matthew Rosato, qemu-s390x, alex.williamson
  Cc: cohuck, thuth, farman, richard.henderson, david, pasic,
	borntraeger, mst, pbonzini, qemu-devel, kvm



On 5/2/22 11:19, Niklas Schnelle wrote:
> On Mon, 2022-05-02 at 09:48 +0200, Pierre Morel wrote:
>>
>> On 4/22/22 14:10, Matthew Rosato wrote:
>>> On 4/22/22 5:39 AM, Pierre Morel wrote:
>>>>
>>>> On 4/4/22 20:17, Matthew Rosato wrote:
>>>>> Use the associated kvm ioctl operation to enable adapter event
>>>>> notification
>>>>> and forwarding for devices when requested.  This feature will be set up
>>>>> with or without firmware assist based upon the 'forwarding_assist'
>>>>> setting.
>>>>>
>>>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>>>> ---
>>>>>    hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>>>>>    hw/s390x/s390-pci-inst.c        | 40 +++++++++++++++++++++++++++++++--
>>>>>    hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>>>>>    include/hw/s390x/s390-pci-bus.h |  1 +
>>>>>    include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>>>>>    5 files changed, 100 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>> index 9c02d31250..47918d2ce9 100644
>>>>> --- a/hw/s390x/s390-pci-bus.c
>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>>>>            rc = SCLP_RC_NO_ACTION_REQUIRED;
>>>>>            break;
>>>>>        default:
>>>>> -        if (pbdev->summary_ind) {
>>>>> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
>>>>> +            /* Interpreted devices were using interrupt forwarding */
>>>>> +            s390_pci_kvm_aif_disable(pbdev);
>>>>
>>>> Same remark as for the kernel part.
>>>> The VFIO device is already initialized and the action is on this
>>>> device, Shouldn't we use the VFIO device interface instead of the KVM
>>>> interface?
>>>>
>>>
>>> I don't necessarily disagree, but in v3 of the kernel series I was told
>>> not to use VFIO ioctls to accomplish tasks that are unique to KVM (e.g.
>>> AEN interpretation) and to instead use a KVM ioctl.
>>>
>>> VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the
>>> kernel series (e.g. we don't see any of the config space notifiers
>>> because of instruction interpretation) -- as far as I can figure we
>>> could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS
>>> directly for an interpreted device, but I think would also need
>>> s390-specific changes to VFIO_DEVICE_SET_IRQS accommodate this (e.g.
>>> maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then
>>> specify the aen information in vfio_irq_set.data -- or something else I
>>
>> Hi,
>>
>> yes this in VFIO_DEVICE_SET_IRQS is what I think should be done.
>>
>>> haven't though of yet) -- I can try to look at this some more and see if
>>> I get a good idea.
>>
>>
>> I understood that the demand was concerning the IOMMU but I may be wrong.
>> For my opinion, the handling of AEN is not specific to KVM but specific
>> to the device, for example the code should be the same if Z ever decide
>> to use XEN or another hypervizor, except for the GISA part but this part
>> is already implemented in KVM in a way it can be used from a device like
>> in VFIO AP.
>>
>> @Alex, what do you think?
>>
>> Regards,
>> Pierre
>>
> 
> As I understand it the question isn't if it is specific to KVM but
> rather if it is specific to virtualization. As vfio-pci is also used
> for non virtualization purposes such as with DPDK/SPDK or a fully
> emulating QEMU, it should only be in VFIO if it is relevant for these
> kinds of user-space PCI accesses too. I'm not an AEN expert but as I
> understand it, this does forwarding interrupts into a SIE context which
> only makes sense for virtualization not for general user-space PCI.
> 

Being in the VFIO kernel part does not mean that this part should be 
called by every user of VFIO in userland.
That is why I proposed an extension rather than using the 
current implementation of VFIO_DEVICE_SET_IRQS as is.

The reason behind this is that the AEN hardware handling is device 
specific: we need the Function Handle to program AEN.

If the API goes through KVM, which is device agnostic, the implementation 
in KVM has to search through the system to find the device being handled 
in order to apply AEN to it.

This is not the logical way for me, and it is a potential source of 
problems for future extensions.
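
The difference between the two entry points can be sketched as below.
This is a minimal user-space model, not kernel code: struct sketch_zdev
and both helper functions are hypothetical names invented for the
example, and the device list stands in for whatever bookkeeping the
kernel actually keeps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a zPCI device known to the host. */
struct sketch_zdev {
    uint32_t fh;          /* function handle */
    int      aen_enabled; /* 1 once AEN forwarding has been set up */
};

/*
 * Device-agnostic entry point: the caller only supplies a function
 * handle, so the device has to be searched for first.
 */
int sketch_enable_aen_by_fh(struct sketch_zdev *devs, size_t n, uint32_t fh)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].fh == fh) {
            devs[i].aen_enabled = 1;
            return 0;
        }
    }
    return -1; /* no device with that function handle */
}

/*
 * Device-scoped entry point: the call arrives on the device itself,
 * so no lookup is needed.
 */
int sketch_enable_aen_on_device(struct sketch_zdev *zdev)
{
    assert(zdev != NULL);
    zdev->aen_enabled = 1;
    return 0;
}
```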

Regards,
Pierre

-- 
Pierre Morel
IBM Lab Boeblingen

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-05-02 11:30             ` Pierre Morel
  (?)
@ 2022-05-02 19:57             ` Matthew Rosato
  2022-05-03 14:53               ` Pierre Morel
  -1 siblings, 1 reply; 55+ messages in thread
From: Matthew Rosato @ 2022-05-02 19:57 UTC (permalink / raw)
  To: Pierre Morel, Niklas Schnelle, qemu-s390x, alex.williamson
  Cc: cohuck, thuth, farman, richard.henderson, david, pasic,
	borntraeger, mst, pbonzini, qemu-devel, kvm

On 5/2/22 7:30 AM, Pierre Morel wrote:
> 
> 
> On 5/2/22 11:19, Niklas Schnelle wrote:
>> On Mon, 2022-05-02 at 09:48 +0200, Pierre Morel wrote:
>>>
>>> On 4/22/22 14:10, Matthew Rosato wrote:
>>>> On 4/22/22 5:39 AM, Pierre Morel wrote:
>>>>>
>>>>> On 4/4/22 20:17, Matthew Rosato wrote:
>>>>>> Use the associated kvm ioctl operation to enable adapter event
>>>>>> notification
>>>>>> and forwarding for devices when requested.  This feature will be 
>>>>>> set up
>>>>>> with or without firmware assist based upon the 'forwarding_assist'
>>>>>> setting.
>>>>>>
>>>>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>>>>> ---
>>>>>>    hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>>>>>>    hw/s390x/s390-pci-inst.c        | 40 
>>>>>> +++++++++++++++++++++++++++++++--
>>>>>>    hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>>>>>>    include/hw/s390x/s390-pci-bus.h |  1 +
>>>>>>    include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>>>>>>    5 files changed, 100 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>>> index 9c02d31250..47918d2ce9 100644
>>>>>> --- a/hw/s390x/s390-pci-bus.c
>>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>>> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>>>>>            rc = SCLP_RC_NO_ACTION_REQUIRED;
>>>>>>            break;
>>>>>>        default:
>>>>>> -        if (pbdev->summary_ind) {
>>>>>> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
>>>>>> +            /* Interpreted devices were using interrupt 
>>>>>> forwarding */
>>>>>> +            s390_pci_kvm_aif_disable(pbdev);
>>>>>
>>>>> Same remark as for the kernel part.
>>>>> The VFIO device is already initialized and the action is on this
>>>>> device, Shouldn't we use the VFIO device interface instead of the KVM
>>>>> interface?
>>>>>
>>>>
>>>> I don't necessarily disagree, but in v3 of the kernel series I was told
>>>> not to use VFIO ioctls to accomplish tasks that are unique to KVM (e.g.
>>>> AEN interpretation) and to instead use a KVM ioctl.
>>>>
>>>> VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the
>>>> kernel series (e.g. we don't see any of the config space notifiers
>>>> because of instruction interpretation) -- as far as I can figure we
>>>> could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS
>>>> directly for an interpreted device, but I think would also need
>>>> s390-specific changes to VFIO_DEVICE_SET_IRQS accommodate this (e.g.
>>>> maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then
>>>> specify the aen information in vfio_irq_set.data -- or something else I
>>>
>>> Hi,
>>>
>>> yes this in VFIO_DEVICE_SET_IRQS is what I think should be done.
>>>
>>>> haven't though of yet) -- I can try to look at this some more and 
>>>> see if
>>>> I get a good idea.
>>>
>>>
>>> I understood that the demand was concerning the IOMMU but I may be 
>>> wrong.

The IOMMU was an issue, but the request to move the ioctl out of vfio to 
kvm was specifically because these ioctl operations were only relevant 
for VMs and are not applicable to vfio use cases outside of virtualization.

https://lore.kernel.org/kvm/20220208185141.GH4160@nvidia.com/

>>> For my opinion, the handling of AEN is not specific to KVM but specific
>>> to the device, for example the code should be the same if Z ever decide
>>> to use XEN or another hypervizor, except for the GISA part but this part
>>> is already implemented in KVM in a way it can be used from a device like
>>> in VFIO AP.


Fundamentally, these operations are valid only when you have _both_ a 
virtual machine and a vfio device.  (Yes, you could swap in a new 
hypervisor with a new GISA implementation, but in the end the 
hypervisor must still provide the GISA designation for this to work.)

If fh lookup is a concern, one idea that Jason floated was passing the 
vfio device fd as an argument to the kvm ioctl (so pass this down on a 
kvm ioctl from userspace instead of a fh) and then using a new vfio 
external API to get the relevant device from the provided fd.

https://lore.kernel.org/kvm/20220208195117.GI4160@nvidia.com/

>>>
>>> @Alex, what do you think?
>>>
>>> Regards,
>>> Pierre
>>>
>>
>> As I understand it the question isn't if it is specific to KVM but
>> rather if it is specific to virtualization. As vfio-pci is also used
>> for non virtualization purposes such as with DPDK/SPDK or a fully
>> emulating QEMU, it should only be in VFIO if it is relevant for these
>> kinds of user-space PCI accesses too. I'm not an AEN expert but as I
>> understand it, this does forwarding interrupts into a SIE context which
>> only makes sense for virtualization not for general user-space PCI.

Right, AEN forwarding is only relevant for virtual machines.

>>
> 
> Being in VFIO kernel part does not mean that this part should be called 
> from any user of VFIO in userland.
> That is a reason why I did propose an extension and not using the 
> current implementation of VFIO_DEVICE_SET_IRQS as is.
> 
> The reason behind is that the AEN hardware handling is device specific: 
> we need the Function Handle to program AEN.

You also need the GISA designation, which is provided by kvm; without it 
you can't program AEN either.  So you ultimately need both a function handle 
that is 'owned' by the device (vfio device fd) and the GISA designation 
that is 'owned' by kvm (kvm fd).  So there are 2 different "owning" fds 
involved.

> 
> If the API is through KVM which is device agnostic the implementation in 
> KVM has to search through the system to find the device being handled to 
> apply AEN on it.

See comment above about instead passing the vfio device fd.

> 
> This not the logical way for me and it is a potential source of problems 
> for future extensions.
> 





* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-05-02 19:57             ` Matthew Rosato
@ 2022-05-03 14:53               ` Pierre Morel
  2022-05-04 14:20                 ` Matthew Rosato
  0 siblings, 1 reply; 55+ messages in thread
From: Pierre Morel @ 2022-05-03 14:53 UTC (permalink / raw)
  To: Matthew Rosato, Niklas Schnelle, qemu-s390x, alex.williamson
  Cc: cohuck, thuth, farman, richard.henderson, david, pasic,
	borntraeger, mst, pbonzini, qemu-devel, kvm



On 5/2/22 21:57, Matthew Rosato wrote:
> On 5/2/22 7:30 AM, Pierre Morel wrote:
>>
>>
>> On 5/2/22 11:19, Niklas Schnelle wrote:
>>> On Mon, 2022-05-02 at 09:48 +0200, Pierre Morel wrote:
>>>>
>>>> On 4/22/22 14:10, Matthew Rosato wrote:
>>>>> On 4/22/22 5:39 AM, Pierre Morel wrote:
>>>>>>
>>>>>> On 4/4/22 20:17, Matthew Rosato wrote:
>>>>>>> Use the associated kvm ioctl operation to enable adapter event
>>>>>>> notification
>>>>>>> and forwarding for devices when requested.  This feature will be 
>>>>>>> set up
>>>>>>> with or without firmware assist based upon the 'forwarding_assist'
>>>>>>> setting.
>>>>>>>
>>>>>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>>>>>> ---
>>>>>>>    hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>>>>>>>    hw/s390x/s390-pci-inst.c        | 40 
>>>>>>> +++++++++++++++++++++++++++++++--
>>>>>>>    hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>>>>>>>    include/hw/s390x/s390-pci-bus.h |  1 +
>>>>>>>    include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>>>>>>>    5 files changed, 100 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>>>> index 9c02d31250..47918d2ce9 100644
>>>>>>> --- a/hw/s390x/s390-pci-bus.c
>>>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>>>> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>>>>>>            rc = SCLP_RC_NO_ACTION_REQUIRED;
>>>>>>>            break;
>>>>>>>        default:
>>>>>>> -        if (pbdev->summary_ind) {
>>>>>>> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
>>>>>>> +            /* Interpreted devices were using interrupt 
>>>>>>> forwarding */
>>>>>>> +            s390_pci_kvm_aif_disable(pbdev);
>>>>>>
>>>>>> Same remark as for the kernel part.
>>>>>> The VFIO device is already initialized and the action is on this
>>>>>> device, Shouldn't we use the VFIO device interface instead of the KVM
>>>>>> interface?
>>>>>>
>>>>>
>>>>> I don't necessarily disagree, but in v3 of the kernel series I was 
>>>>> told
>>>>> not to use VFIO ioctls to accomplish tasks that are unique to KVM 
>>>>> (e.g.
>>>>> AEN interpretation) and to instead use a KVM ioctl.
>>>>>
>>>>> VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the
>>>>> kernel series (e.g. we don't see any of the config space notifiers
>>>>> because of instruction interpretation) -- as far as I can figure we
>>>>> could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS
>>>>> directly for an interpreted device, but I think would also need
>>>>> s390-specific changes to VFIO_DEVICE_SET_IRQS accommodate this (e.g.
>>>>> maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then
>>>>> specify the aen information in vfio_irq_set.data -- or something 
>>>>> else I
>>>>
>>>> Hi,
>>>>
>>>> yes this in VFIO_DEVICE_SET_IRQS is what I think should be done.
>>>>
>>>>> haven't though of yet) -- I can try to look at this some more and 
>>>>> see if
>>>>> I get a good idea.
>>>>
>>>>
>>>> I understood that the demand was concerning the IOMMU but I may be 
>>>> wrong.
> 
> The IOMMU was an issue, but the request to move the ioctl out of vfio to 
> kvm was specifically because these ioctl operations were only relevant 
> for VMs and are not applicable to vfio uses cases outside of 
> virtualization.
> 
> https://lore.kernel.org/kvm/20220208185141.GH4160@nvidia.com/

I absolutely agree that KVM-specific handling should go through the KVM fd.
But as I say below, AEN is not KVM specific but device specific.
Instruction interpretation is KVM specific.
see later---v

> 
>>>> For my opinion, the handling of AEN is not specific to KVM but specific
>>>> to the device, for example the code should be the same if Z ever decide
>>>> to use XEN or another hypervizor, except for the GISA part but this 
>>>> part
>>>> is already implemented in KVM in a way it can be used from a device 
>>>> like
>>>> in VFIO AP.
> 
> 
> Fundamentally, these operations are valid only when you have _both_ a 
> virtual machine and vfio device.  (Yes, you could swap in a new 
> hypervisor with a new GISA implementation, but at the end of it the 
> hypervisor must still provide the GISA designation for this to work)
> 
> If fh lookup is a concern, one idea that Jason floated was passing the 
> vfio device fd as an argument to the kvm ioctl (so pass this down on a 
> kvm ioctl from userspace instead of a fh) and then using a new vfio 
> external API to get the relevant device from the provided fd.
> 
> https://lore.kernel.org/kvm/20220208195117.GI4160@nvidia.com/

^------
This looks like a wrong architecture to me.

If something is used to virtualize the I/O of a device it should go 
through the device VFIO fd.

If we need a new VFIO external API, why not use an extension of 
VFIO_DEVICE_SET_IRQS and use the VFIO device directly to set up interrupts?

see following ----v

> 
>>>>
>>>> @Alex, what do you think?
>>>>
>>>> Regards,
>>>> Pierre
>>>>
>>>
>>> As I understand it the question isn't if it is specific to KVM but
>>> rather if it is specific to virtualization. As vfio-pci is also used
>>> for non virtualization purposes such as with DPDK/SPDK or a fully
>>> emulating QEMU, it should only be in VFIO if it is relevant for these
>>> kinds of user-space PCI accesses too. I'm not an AEN expert but as I
>>> understand it, this does forwarding interrupts into a SIE context which
>>> only makes sense for virtualization not for general user-space PCI.
> 
> Right, AEN forwarding is only relevant for virtual machines.
> 
>>>
>>
>> Being in VFIO kernel part does not mean that this part should be 
>> called from any user of VFIO in userland.
>> That is a reason why I did propose an extension and not using the 
>> current implementation of VFIO_DEVICE_SET_IRQS as is.
>>
>> The reason behind is that the AEN hardware handling is device 
>> specific: we need the Function Handle to program AEN.
> 
> You also need the GISA designation which is provided by the kvm or you 
> also can't program AEN.  So you ultimately need both a function handle 
> that is 'owned' by the device (vfio device fd) and the GISA designation 
> that is 'owned' by kvm (kvm fd).  So there are 2 different "owning" fds 
> involved.

Yes, GISA is a host structure, not device specific but guest specific, and 
it exists very early during guest creation, so there should be no problem 
retrieving it from a VFIO device ioctl.

> 
>>
>> If the API is through KVM which is device agnostic the implementation 
>> in KVM has to search through the system to find the device being 
>> handled to apply AEN on it.
> 
> See comment above about instead passing the vfio device fd.
> 
>>
>> This not the logical way for me and it is a potential source of 
>> problems for future extensions.
>>
> 
> 
>
^------

There are three different things to modify for the Z-guest to use VFIO:
- IOMMU
- device IRQ
- instruction interpretation, feature negotiation

In my opinion only the last one should go directly through the KVM fd.

This should be possible for all architectures.
If it is not possible for Z, the failing path must be adapted; it should 
not go through another path.

Giving the right IRQ information to the host can be done with a 
dedicated IOCTL through the VFIO device fd, just like we need an 
extension in the other direction to retrieve the Z specific capabilities.

I am quite sure that other architectures will need some specificity too 
for the interrupt or IOMMU handling in the future with increasing 
implementation of virtualization in the firmware.

Having a dedicated IOCTL command means it can be called from QEMU for 
guest virtualization only and then left unused by other userland access.


Regards,
Pierre


-- 
Pierre Morel
IBM Lab Boeblingen


* Re: [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices
  2022-05-03 14:53               ` Pierre Morel
@ 2022-05-04 14:20                 ` Matthew Rosato
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Rosato @ 2022-05-04 14:20 UTC (permalink / raw)
  To: Pierre Morel, Niklas Schnelle, qemu-s390x, alex.williamson
  Cc: cohuck, thuth, farman, richard.henderson, david, pasic,
	borntraeger, mst, pbonzini, qemu-devel, kvm

On 5/3/22 10:53 AM, Pierre Morel wrote:
> 
> 
> On 5/2/22 21:57, Matthew Rosato wrote:
>> On 5/2/22 7:30 AM, Pierre Morel wrote:
>>>
>>>
>>> On 5/2/22 11:19, Niklas Schnelle wrote:
>>>> On Mon, 2022-05-02 at 09:48 +0200, Pierre Morel wrote:
>>>>>
>>>>> On 4/22/22 14:10, Matthew Rosato wrote:
>>>>>> On 4/22/22 5:39 AM, Pierre Morel wrote:
>>>>>>>
>>>>>>> On 4/4/22 20:17, Matthew Rosato wrote:
>>>>>>>> Use the associated kvm ioctl operation to enable adapter event
>>>>>>>> notification
>>>>>>>> and forwarding for devices when requested.  This feature will be 
>>>>>>>> set up
>>>>>>>> with or without firmware assist based upon the 'forwarding_assist'
>>>>>>>> setting.
>>>>>>>>
>>>>>>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>>>>>>> ---
>>>>>>>>    hw/s390x/s390-pci-bus.c         | 20 ++++++++++++++---
>>>>>>>>    hw/s390x/s390-pci-inst.c        | 40 
>>>>>>>> +++++++++++++++++++++++++++++++--
>>>>>>>>    hw/s390x/s390-pci-kvm.c         | 30 +++++++++++++++++++++++++
>>>>>>>>    include/hw/s390x/s390-pci-bus.h |  1 +
>>>>>>>>    include/hw/s390x/s390-pci-kvm.h | 14 ++++++++++++
>>>>>>>>    5 files changed, 100 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>>>>> index 9c02d31250..47918d2ce9 100644
>>>>>>>> --- a/hw/s390x/s390-pci-bus.c
>>>>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>>>>> @@ -190,7 +190,10 @@ void s390_pci_sclp_deconfigure(SCCB *sccb)
>>>>>>>>            rc = SCLP_RC_NO_ACTION_REQUIRED;
>>>>>>>>            break;
>>>>>>>>        default:
>>>>>>>> -        if (pbdev->summary_ind) {
>>>>>>>> +        if (pbdev->interp && (pbdev->fh & FH_MASK_ENABLE)) {
>>>>>>>> +            /* Interpreted devices were using interrupt 
>>>>>>>> forwarding */
>>>>>>>> +            s390_pci_kvm_aif_disable(pbdev);
>>>>>>>
>>>>>>> Same remark as for the kernel part.
>>>>>>> The VFIO device is already initialized and the action is on this
>>>>>>> device, Shouldn't we use the VFIO device interface instead of the 
>>>>>>> KVM
>>>>>>> interface?
>>>>>>>
>>>>>>
>>>>>> I don't necessarily disagree, but in v3 of the kernel series I was 
>>>>>> told
>>>>>> not to use VFIO ioctls to accomplish tasks that are unique to KVM 
>>>>>> (e.g.
>>>>>> AEN interpretation) and to instead use a KVM ioctl.
>>>>>>
>>>>>> VFIO_DEVICE_SET_IRQS won't work as-is for reasons described in the
>>>>>> kernel series (e.g. we don't see any of the config space notifiers
>>>>>> because of instruction interpretation) -- as far as I can figure we
>>>>>> could add our own s390 code to QEMU to issue VFIO_DEVICE_SET_IRQS
>>>>>> directly for an interpreted device, but I think would also need
>>>>>> s390-specific changes to VFIO_DEVICE_SET_IRQS accommodate this (e.g.
>>>>>> maybe something like a VFIO_IRQ_SET_DATA_S390AEN where we can then
>>>>>> specify the aen information in vfio_irq_set.data -- or something 
>>>>>> else I
>>>>>
>>>>> Hi,
>>>>>
>>>>> yes this in VFIO_DEVICE_SET_IRQS is what I think should be done.
>>>>>
>>>>>> haven't though of yet) -- I can try to look at this some more and 
>>>>>> see if
>>>>>> I get a good idea.
>>>>>
>>>>>
>>>>> I understood that the demand was concerning the IOMMU but I may be 
>>>>> wrong.
>>
>> The IOMMU was an issue, but the request to move the ioctl out of vfio 
>> to kvm was specifically because these ioctl operations were only 
>> relevant for VMs and are not applicable to vfio uses cases outside of 
>> virtualization.
>>
>> https://lore.kernel.org/kvm/20220208185141.GH4160@nvidia.com/
> 
> I absolutely agree that KVM specific handling should go through KVM fd.
> But as I say here under, AEN is not KVM specific but device specific.
> Instruction interpretation is KVM specific.
> see later---v
> 
>>
>>>>> For my opinion, the handling of AEN is not specific to KVM but 
>>>>> specific
>>>>> to the device, for example the code should be the same if Z ever 
>>>>> decide
>>>>> to use XEN or another hypervizor, except for the GISA part but this 
>>>>> part
>>>>> is already implemented in KVM in a way it can be used from a device 
>>>>> like
>>>>> in VFIO AP.
>>
>>
>> Fundamentally, these operations are valid only when you have _both_ a 
>> virtual machine and vfio device.  (Yes, you could swap in a new 
>> hypervisor with a new GISA implementation, but at the end of it the 
>> hypervisor must still provide the GISA designation for this to work)
>>
>> If fh lookup is a concern, one idea that Jason floated was passing the 
>> vfio device fd as an argument to the kvm ioctl (so pass this down on a 
>> kvm ioctl from userspace instead of a fh) and then using a new vfio 
>> external API to get the relevant device from the provided fd.
>>
>> https://lore.kernel.org/kvm/20220208195117.GI4160@nvidia.com/
> 
> ^------
> This looks like a wrong architecture to me.
> 
> If something is used to virtualize the I/O of a device it should go 
> through the device VFIO fd.
> 
> If we need a new VFIO external API why not using an extension of the 
> VFIO_DEVICE_SET_IRQS and use directly the VFIO device to setup interrupts?
> 
> see following ----v
> 
>>
>>>>>
>>>>> @Alex, what do you think?
>>>>>
>>>>> Regards,
>>>>> Pierre
>>>>>
>>>>
>>>> As I understand it the question isn't if it is specific to KVM but
>>>> rather if it is specific to virtualization. As vfio-pci is also used
>>>> for non virtualization purposes such as with DPDK/SPDK or a fully
>>>> emulating QEMU, it should only be in VFIO if it is relevant for these
>>>> kinds of user-space PCI accesses too. I'm not an AEN expert but as I
>>>> understand it, this does forwarding interrupts into a SIE context which
>>>> only makes sense for virtualization not for general user-space PCI.
>>
>> Right, AEN forwarding is only relevant for virtual machines.
>>
>>>>
>>>
>>> Being in VFIO kernel part does not mean that this part should be 
>>> called from any user of VFIO in userland.
>>> That is a reason why I did propose an extension and not using the 
>>> current implementation of VFIO_DEVICE_SET_IRQS as is.
>>>
>>> The reason behind is that the AEN hardware handling is device 
>>> specific: we need the Function Handle to program AEN.
>>
>> You also need the GISA designation which is provided by the kvm or you 
>> also can't program AEN.  So you ultimately need both a function handle 
>> that is 'owned' by the device (vfio device fd) and the GISA 
>> designation that is 'owned' by kvm (kvm fd).  So there are 2 different 
>> "owning" fds involved.
> 
> Yes GISA is a host structure, not device specific but guest specific and 
> exist very soon during the guest creation, there should be no problem to 
> retrieve it from a VFIO device IOTCL.
> 
>>
>>>
>>> If the API is through KVM which is device agnostic the implementation 
>>> in KVM has to search through the system to find the device being 
>>> handled to apply AEN on it.
>>
>> See comment above about instead passing the vfio device fd.
>>
>>>
>>> This not the logical way for me and it is a potential source of 
>>> problems for future extensions.
>>>
>>
>>
>>
> ^------
> 
> There are three different things to modify for the Z-guest to use VFIO:
> - IOMMU
> - device IRQ
> - instruction interpretation, feature negociation
> 
> For my opinion only the last one should go directly through the KVM fd.
> 
> This should be possible for all architectures.
> If it is not possible for Z, the failing path must be adapted it should 
> not go through another path.
> 
> Giving the right IRQ information to the host can be done with a 
> dedicated IOCTL through the VFIO device fd, just like we need an 
> extension in the other direction to retrieve the Z specific capabilities.
> 
> I am quite sure that other architectures will need some specificity too 
> for the interrupt or IOMMU handling in the future with increasing 
> implementation of virtualization in the firmware.
> 
> Having a dedicated IOCTL command means it can be called from QEMU and 
> for guest virtualizuation only then let unused for other userland access.
> 

Another approach (that I admittedly don't have all the details worked 
out on yet) would be to do something like add a new type of 
kvm_irq_routing_entry that can be used specifically for AEN.  Then we 
can establish this route with the following info:

struct kvm_irq_routing_s390_aen {
        __u64 ind_addr;
        __u64 summary_addr;
        __u32 fd; /* vfio device fd */
        __u32 noi;
        __u8  isc;
        __u8  sbo;
};

The vfio device fd is required as it would then be used to get the 
associated zdev and thus its fh, which we need when we go to activate 
AEN (mpcifc).  Our existing adapter-based routes 
(kvm_irq_routing_s390_adapter) lack this association and I can't think 
of a way around that besides introducing a different route type.

During interrupt.c:kvm_set_routing_entry() we can stash the guest info + 
fd.  Then during irq_bypass_{add,del}_producer for this new route type 
we can find the zpci->fh via the vfio device fd and then actually 
(un)pin guest addresses / issue the host mpcifc using the routing info. 
vfio will trigger irq_bypass_{add,del}_producer when the virq is 
enabled/disabled.

From QEMU though, it gets a bit weird -- since we enable load/store 
interpretation, we will never see any of the config space writes from 
the guest.  So, vfio MSI notifiers never get triggered to call e.g. 
kvm_irqchip_add_msi_route -- we would have to do this ourselves in 
s390x-pci, specifying the new route type above.  And then also possibly 
issue our own VFIO_DEVICE_SET_IRQS since again, it will never get 
tripped via a vfio notifier.

Or alternatively, we can intentionally trigger the MSI notifiers from 
s390x-pci code (looks like spapr does this via spapr_msi_setmsg) for the 
number of vectors the guest specifies on the mpcifc to create the 
necessary virq(s) and drive the subsequent VFIO_DEVICE_SET_IRQS call. 
Actually, we might have to do that anyway to satisfy 
VFIO_DEVICE_SET_IRQS expectations in the kernel.  And, using the above 
structure, we probably only need to create a single virq since it 
contains all of the routes in one payload + once AEN is established we 
are always delivering interrupts to the guest via GISA, not over eventfd.
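
To make the route payload a bit more concrete, here is a rough sketch of 
how userspace might fill it in.  Only the structure layout comes from the 
proposal above; the uintN_t types stand in for the kernel's __uN types, 
the field comments are my reading of the names, and aen_route_fill() with 
its parameters is a hypothetical helper for illustration, not existing 
QEMU or kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout as proposed above; uintN_t stands in for the kernel __uN types. */
struct kvm_irq_routing_s390_aen {
    uint64_t ind_addr;      /* presumably: guest interrupt bit vector address */
    uint64_t summary_addr;  /* presumably: guest summary bit address */
    uint32_t fd;            /* vfio device fd */
    uint32_t noi;           /* presumably: number of interrupts */
    uint8_t  isc;           /* presumably: interruption subclass */
    uint8_t  sbo;           /* presumably: summary bit offset */
};

/*
 * Hypothetical helper: build one AEN route from guest-supplied values.
 * Returns 0 on success, -1 if there is no device fd to resolve the fh
 * from or no interrupts to route.
 */
static int aen_route_fill(struct kvm_irq_routing_s390_aen *route,
                          int vfio_device_fd, uint64_t ind_addr,
                          uint64_t summary_addr, uint32_t noi,
                          uint8_t isc, uint8_t sbo)
{
    assert(route != NULL);

    if (vfio_device_fd < 0 || noi == 0) {
        return -1;
    }

    memset(route, 0, sizeof(*route));
    route->ind_addr = ind_addr;
    route->summary_addr = summary_addr;
    route->fd = (uint32_t)vfio_device_fd;
    route->noi = noi;
    route->isc = isc;
    route->sbo = sbo;
    return 0;
}
```

The point of carrying the fd in the payload is that a single route then 
holds everything needed on both sides: the guest-side addresses and the 
handle from which the kernel can look up the zdev.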



* Re: [PATCH v5 9/9] s390x/pci: reflect proper maxstbl for groups of interpreted devices
  2022-04-04 18:17   ` Matthew Rosato
                     ` (2 preceding siblings ...)
  (?)
@ 2022-05-06  9:03   ` Pierre Morel
  -1 siblings, 0 replies; 55+ messages in thread
From: Pierre Morel @ 2022-05-06  9:03 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: farman, kvm, schnelle, cohuck, richard.henderson, thuth,
	qemu-devel, pasic, alex.williamson, mst, pbonzini, david,
	borntraeger



On 4/4/22 20:17, Matthew Rosato wrote:
> The maximum supported store block length might be different depending
> on whether the instruction is interpretively executed (firmware-reported
> maximum) or handled via userspace intercept (host kernel API maximum).
> Choose the best available value during group creation.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>


> ---
>   hw/s390x/s390-pci-vfio.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
> index 985980f021..212dd053f7 100644
> --- a/hw/s390x/s390-pci-vfio.c
> +++ b/hw/s390x/s390-pci-vfio.c
> @@ -213,7 +213,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
>           resgrp->msia = cap->msi_addr;
>           resgrp->mui = cap->mui;
>           resgrp->i = cap->noi;
> -        resgrp->maxstbl = cap->maxstbl;
> +        if (pbdev->interp && hdr->version >= 2) {
> +            resgrp->maxstbl = cap->imaxstbl;
> +        } else {
> +            resgrp->maxstbl = cap->maxstbl;
> +        }
>           resgrp->version = cap->version;
>           resgrp->dtsm = ZPCI_DTSM;
>       }
> 

-- 
Pierre Morel
IBM Lab Boeblingen


* Re: [PATCH v5 3/9] target/s390x: add zpci-interp to cpu models
  2022-04-04 18:17   ` Matthew Rosato
  (?)
@ 2022-05-18  8:01   ` Thomas Huth
  2022-05-18  8:02     ` Thomas Huth
  -1 siblings, 1 reply; 55+ messages in thread
From: Thomas Huth @ 2022-05-18  8:01 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x, david
  Cc: alex.williamson, schnelle, cohuck, farman, pmorel,
	richard.henderson, pasic, borntraeger, mst, pbonzini, qemu-devel,
	kvm

On 04/04/2022 20.17, Matthew Rosato wrote:
> The zpci-interp feature is used to specify whether zPCI interpretation is
> to be used for this guest.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-virtio-ccw.c          | 1 +
>   target/s390x/cpu_features_def.h.inc | 1 +
>   target/s390x/gen-features.c         | 2 ++
>   target/s390x/kvm/kvm.c              | 1 +
>   4 files changed, 5 insertions(+)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 90480e7cf9..b190234308 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -805,6 +805,7 @@ static void ccw_machine_6_2_instance_options(MachineState *machine)
>       static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
>   
>       ccw_machine_7_0_instance_options(machine);
> +    s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
>       s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
>   }
>   
> diff --git a/target/s390x/cpu_features_def.h.inc b/target/s390x/cpu_features_def.h.inc
> index e86662bb3b..4ade3182aa 100644
> --- a/target/s390x/cpu_features_def.h.inc
> +++ b/target/s390x/cpu_features_def.h.inc
> @@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: Conditional-external-interception f
>   DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
>   DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
>   DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
> +DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
>   
>   /* Features exposed via the PLO instruction. */
>   DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in general registers)")
> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
> index 22846121c4..9db6bd545e 100644
> --- a/target/s390x/gen-features.c
> +++ b/target/s390x/gen-features.c
> @@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
>       S390_FEAT_HPMA2,
>       S390_FEAT_SIE_KSS,
>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
> +    S390_FEAT_ZPCI_INTERP,
>   };
>   
>   #define full_GEN14_GA2 EmptyFeat
> @@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
>       S390_FEAT_GROUP_MSA_EXT_8,
>       S390_FEAT_MULTIPLE_EPOCH,
>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
> +    S390_FEAT_ZPCI_INTERP,
>   };

If you add something to the default model, I think you also need to add some 
compatibility handling to the machine types. See e.g. commit 84176c7906f as 
an example.

  Thomas



* Re: [PATCH v5 3/9] target/s390x: add zpci-interp to cpu models
  2022-05-18  8:01   ` Thomas Huth
@ 2022-05-18  8:02     ` Thomas Huth
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Huth @ 2022-05-18  8:02 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x, david
  Cc: alex.williamson, schnelle, cohuck, farman, pmorel,
	richard.henderson, pasic, borntraeger, mst, pbonzini, qemu-devel,
	kvm

On 18/05/2022 10.01, Thomas Huth wrote:
> On 04/04/2022 20.17, Matthew Rosato wrote:
>> The zpci-interp feature is used to specify whether zPCI interpretation is
>> to be used for this guest.
>>
>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>> ---
>>   hw/s390x/s390-virtio-ccw.c          | 1 +
>>   target/s390x/cpu_features_def.h.inc | 1 +
>>   target/s390x/gen-features.c         | 2 ++
>>   target/s390x/kvm/kvm.c              | 1 +
>>   4 files changed, 5 insertions(+)
>>
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index 90480e7cf9..b190234308 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -805,6 +805,7 @@ static void 
>> ccw_machine_6_2_instance_options(MachineState *machine)
>>       static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
>>       ccw_machine_7_0_instance_options(machine);
>> +    s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
>>       s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
>>   }
>> diff --git a/target/s390x/cpu_features_def.h.inc 
>> b/target/s390x/cpu_features_def.h.inc
>> index e86662bb3b..4ade3182aa 100644
>> --- a/target/s390x/cpu_features_def.h.inc
>> +++ b/target/s390x/cpu_features_def.h.inc
>> @@ -146,6 +146,7 @@ DEF_FEAT(SIE_CEI, "cei", SCLP_CPU, 43, "SIE: 
>> Conditional-external-interception f
>>   DEF_FEAT(DAT_ENH_2, "dateh2", MISC, 0, "DAT-enhancement facility 2")
>>   DEF_FEAT(CMM, "cmm", MISC, 0, "Collaborative-memory-management facility")
>>   DEF_FEAT(AP, "ap", MISC, 0, "AP instructions installed")
>> +DEF_FEAT(ZPCI_INTERP, "zpci-interp", MISC, 0, "zPCI interpretation")
>>   /* Features exposed via the PLO instruction. */
>>   DEF_FEAT(PLO_CL, "plo-cl", PLO, 0, "PLO Compare and load (32 bit in 
>> general registers)")
>> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
>> index 22846121c4..9db6bd545e 100644
>> --- a/target/s390x/gen-features.c
>> +++ b/target/s390x/gen-features.c
>> @@ -554,6 +554,7 @@ static uint16_t full_GEN14_GA1[] = {
>>       S390_FEAT_HPMA2,
>>       S390_FEAT_SIE_KSS,
>>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
>> +    S390_FEAT_ZPCI_INTERP,
>>   };
>>   #define full_GEN14_GA2 EmptyFeat
>> @@ -650,6 +651,7 @@ static uint16_t default_GEN14_GA1[] = {
>>       S390_FEAT_GROUP_MSA_EXT_8,
>>       S390_FEAT_MULTIPLE_EPOCH,
>>       S390_FEAT_GROUP_MULTIPLE_EPOCH_PTFF,
>> +    S390_FEAT_ZPCI_INTERP,
>>   };
> 
> If you add something to the default model, I think you also need to add some 
> compatibility handling to the machine types. See e.g. commit 84176c7906f as 
> an example.

Ah, never mind, it's there some lines earlier in the patch ... I guess I did 
not have enough coffee yet today...

  Thomas




* Re: [PATCH v5 3/9] target/s390x: add zpci-interp to cpu models
  2022-04-04 18:17   ` Matthew Rosato
  (?)
  (?)
@ 2022-05-18  8:05   ` Thomas Huth
  -1 siblings, 0 replies; 55+ messages in thread
From: Thomas Huth @ 2022-05-18  8:05 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, farman, pmorel,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

On 04/04/2022 20.17, Matthew Rosato wrote:
> The zpci-interp feature is used to specify whether zPCI interpretation is
> to be used for this guest.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-virtio-ccw.c          | 1 +
>   target/s390x/cpu_features_def.h.inc | 1 +
>   target/s390x/gen-features.c         | 2 ++
>   target/s390x/kvm/kvm.c              | 1 +
>   4 files changed, 5 insertions(+)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 90480e7cf9..b190234308 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -805,6 +805,7 @@ static void ccw_machine_6_2_instance_options(MachineState *machine)
>       static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
>   
>       ccw_machine_7_0_instance_options(machine);
> +    s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP);
>       s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
>   }

This needs to be moved into ccw_machine_7_0_instance_options() now that 7.0 
has been released without this feature.
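
A toy model of why the placement matters, assuming the usual chaining 
where each older machine type calls the next newer one's instance options 
first.  Everything here is a simplified stand-in: the flag replaces the 
CPU model default, featoff_zpci_interp() replaces the 
s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP) call, and the 
machine_*_instance_options() names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for "zpci-interp is in the default CPU model". */
static bool zpci_interp_default = true;

/* Stand-in for s390_cpudef_featoff_greater(14, 1, S390_FEAT_ZPCI_INTERP). */
static void featoff_zpci_interp(void)
{
    zpci_interp_default = false;
}

/* Newest machine type: keeps the new default. */
static void machine_7_1_instance_options(void)
{
}

/* 7.0 was released without the feature, so the compat switch lives here. */
static void machine_7_0_instance_options(void)
{
    machine_7_1_instance_options();
    featoff_zpci_interp();
}

/* Older types inherit the switch by chaining through 7.0. */
static void machine_6_2_instance_options(void)
{
    machine_7_0_instance_options();
}
```

With the call left in the 6.2 function instead, a 7.0 machine would pick 
up the new default even though 7.0 shipped without it.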

  Thomas



* Re: [PATCH v5 4/9] s390x/pci: add routine to get host function handle from CLP info
  2022-04-04 18:17   ` Matthew Rosato
  (?)
  (?)
@ 2022-05-18  8:13   ` Thomas Huth
  -1 siblings, 0 replies; 55+ messages in thread
From: Thomas Huth @ 2022-05-18  8:13 UTC (permalink / raw)
  To: Matthew Rosato, qemu-s390x
  Cc: alex.williamson, schnelle, cohuck, farman, pmorel,
	richard.henderson, david, pasic, borntraeger, mst, pbonzini,
	qemu-devel, kvm

On 04/04/2022 20.17, Matthew Rosato wrote:
> In order to interface with the underlying host zPCI device, we need
> to know it's function handle.  Add a routine to grab this from the
> vfio CLP capabilities chain.
> 
> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
>   hw/s390x/s390-pci-vfio.c         | 83 ++++++++++++++++++++++++++------
>   include/hw/s390x/s390-pci-vfio.h |  6 +++
>   2 files changed, 73 insertions(+), 16 deletions(-)
[...]
> diff --git a/include/hw/s390x/s390-pci-vfio.h b/include/hw/s390x/s390-pci-vfio.h
> index ff708aef50..0c2e4b5175 100644
> --- a/include/hw/s390x/s390-pci-vfio.h
> +++ b/include/hw/s390x/s390-pci-vfio.h
> @@ -20,6 +20,7 @@ bool s390_pci_update_dma_avail(int fd, unsigned int *avail);
>   S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
>                                             S390PCIBusDevice *pbdev);
>   void s390_pci_end_dma_count(S390pciState *s, S390PCIDMACount *cnt);
> +bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev, uint32_t *fh);
>   void s390_pci_get_clp_info(S390PCIBusDevice *pbdev);
>   #else
>   static inline bool s390_pci_update_dma_avail(int fd, unsigned int *avail)
> @@ -33,6 +34,11 @@ static inline S390PCIDMACount *s390_pci_start_dma_count(S390pciState *s,
>   }
>   static inline void s390_pci_end_dma_count(S390pciState *s,
>                                             S390PCIDMACount *cnt) { }
> +static inline bool s390_pci_get_host_fh(S390PCIBusDevice *pbdev,
> +                                        unsigned int *fh)

This prototype does not match the one before the #else; please replace
"unsigned int" with "uint32_t".

  Thomas

> +{
> +    return false;
> +}
>   static inline void s390_pci_get_clp_info(S390PCIBusDevice *pbdev) { }
>   #endif
>   
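
A minimal sketch of the header pattern under review, with the stub's signature matching the real prototype. The type DemoBusDevice, the function demo_get_host_fh and the macro DEMO_HAVE_VFIO are hypothetical stand-ins, not the QEMU definitions. On many platforms uint32_t happens to be a typedef for unsigned int, so a mismatch may compile unnoticed, but that is not guaranteed everywhere, which is why the two declarations should use the same type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device type; not the QEMU struct. */
typedef struct DemoBusDevice {
    uint32_t host_fh;
} DemoBusDevice;

#ifdef DEMO_HAVE_VFIO /* hypothetical guard standing in for the real one */
/* Real implementation lives elsewhere when the backend is available. */
bool demo_get_host_fh(DemoBusDevice *dev, uint32_t *fh);
#else
/* Stub with the identical signature: reports failure, leaves *fh alone. */
static inline bool demo_get_host_fh(DemoBusDevice *dev, uint32_t *fh)
{
    (void)dev;
    (void)fh;
    return false;
}
#endif
```

With both branches declaring uint32_t *fh, callers compile the same way whether or not the backend is built in.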



Thread overview: 55+ messages
2022-04-04 18:17 [PATCH v5 0/9] s390x/pci: zPCI interpretation support Matthew Rosato
2022-04-04 18:17 ` Matthew Rosato
2022-04-04 18:17 ` [PATCH v5 1/9] Update linux headers Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-04 18:17 ` [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-12 15:50   ` Pierre Morel
2022-04-12 15:50     ` Pierre Morel
2022-04-12 16:07     ` Matthew Rosato
2022-04-12 16:07       ` Matthew Rosato
2022-04-19 15:44       ` Pierre Morel
2022-04-04 18:17 ` [PATCH v5 3/9] target/s390x: add zpci-interp to cpu models Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-05-18  8:01   ` Thomas Huth
2022-05-18  8:02     ` Thomas Huth
2022-05-18  8:05   ` Thomas Huth
2022-04-04 18:17 ` [PATCH v5 4/9] s390x/pci: add routine to get host function handle from CLP info Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-19 19:15   ` Pierre Morel
2022-04-19 19:15     ` Pierre Morel
2022-05-18  8:13   ` Thomas Huth
2022-04-04 18:17 ` [PATCH v5 5/9] s390x/pci: enable for load/store intepretation Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-19 19:47   ` Pierre Morel
2022-04-19 19:47     ` Pierre Morel
2022-04-20 15:12     ` Matthew Rosato
2022-04-20 15:12       ` Matthew Rosato
2022-04-22  9:27       ` Pierre Morel
2022-04-22  9:27         ` Pierre Morel
2022-04-04 18:17 ` [PATCH v5 6/9] s390x/pci: don't fence interpreted devices without MSI-X Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-04 18:17 ` [PATCH v5 7/9] s390x/pci: enable adapter event notification for interpreted devices Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-22  9:39   ` Pierre Morel
2022-04-22  9:39     ` Pierre Morel
2022-04-22 12:10     ` Matthew Rosato
2022-04-22 12:10       ` Matthew Rosato
2022-05-02  7:48       ` Pierre Morel
2022-05-02  7:48         ` Pierre Morel
2022-05-02  9:19         ` Niklas Schnelle
2022-05-02  9:19           ` Niklas Schnelle
2022-05-02 11:30           ` Pierre Morel
2022-05-02 11:30             ` Pierre Morel
2022-05-02 19:57             ` Matthew Rosato
2022-05-03 14:53               ` Pierre Morel
2022-05-04 14:20                 ` Matthew Rosato
2022-04-04 18:17 ` [PATCH v5 8/9] s390x/pci: let intercept devices have separate PCI groups Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-04 18:17 ` [PATCH v5 9/9] s390x/pci: reflect proper maxstbl for groups of interpreted devices Matthew Rosato
2022-04-04 18:17   ` Matthew Rosato
2022-04-19 19:49   ` Pierre Morel
2022-04-19 19:49     ` Pierre Morel
2022-04-22  9:43   ` Pierre Morel
2022-04-22  9:43     ` Pierre Morel
2022-05-06  9:03   ` Pierre Morel
