kvm.vger.kernel.org archive mirror
* [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support
@ 2020-10-22 17:11 Tony Krowiak
  2020-10-22 17:11 ` [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
                   ` (13 more replies)
  0 siblings, 14 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:11 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in several
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains cannot be added to or removed
   from a running guest. In order to modify a guest's AP configuration,
   the guest must be terminated; only then can AP resources be assigned
   to or unassigned from the guest's matrix mdev. The new AP 
   configuration becomes available to the guest when it is subsequently
   restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
   be modified by a root user without any restrictions. A change to
   either mask can result in AP queue devices being unbound from the
   vfio_ap device driver and bound to a zcrypt device driver even if a
   guest is using the queues, thus giving the host access to the guest's
   private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
   adapters and APQIs of the domains assigned to a matrix mdev must
   reference an AP queue device bound to the vfio_ap device driver. The
   AP architecture allows assignment of AP resources that are not
   available to the system, so this artificial restriction is not 
   compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
   host after a KVM guest is started. For example, a new domain can be
   dynamically added to the configuration profile via the SE or an HMC
   connected to a DPM-enabled LPAR. Likewise, AP adapters can be
   dynamically configured (online state) and deconfigured (standby state)
   using the SE, an SCLP command or an HMC connected to a DPM-enabled
   LPAR. This can result in inadvertent sharing of AP queues between the
   guest and host.

5. A root user can manually unbind an AP queue device representing a
   queue in use by a KVM guest via the vfio_ap device driver's sysfs
   unbind attribute. In this case, the guest will be using a queue that
   is not bound to the driver, which violates the device model.
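For readers unfamiliar with the terminology in point 3, the following is a minimal userspace sketch, not kernel code, of how the APQNs of a matrix mdev follow from the Cartesian product of its adapter mask (APM) and domain mask (AQM). It assumes 256-bit masks numbered MSB-first, the way the kernel's set_bit_inv()/test_bit_inv() helpers number them, and an APQN that packs the APID into the high byte and the APQI into the low byte, as the AP_MKQID macro does. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 256 possible APIDs and APQIs, each mask held as 32 bytes. */
#define AP_MASK_BITS 256
#define AP_MASK_BYTES (AP_MASK_BITS / 8)

/* Set bit nr using MSB-0 numbering (bit 0 is the leftmost bit of byte 0). */
static void mask_set(unsigned char *mask, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	mask[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

/* Test bit nr using the same MSB-0 numbering. */
static int mask_test(const unsigned char *mask, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	return (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* Pack an APID/APQI pair: APID in the high byte, APQI in the low byte. */
static unsigned int mk_apqn(unsigned int apid, unsigned int apqi)
{
	return (apid & 0xffu) << 8 | (apqi & 0xffu);
}

/*
 * Store the APQNs of (apm x aqm) into out, ordered by APID then APQI.
 * Stops storing once max entries are written; returns the number stored.
 */
static size_t apqns_from_matrix(const unsigned char *apm,
				const unsigned char *aqm,
				unsigned int *out, size_t max)
{
	size_t n = 0;

	for (unsigned int apid = 0; apid < AP_MASK_BITS; apid++) {
		if (!mask_test(apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < AP_MASK_BITS; apqi++) {
			if (mask_test(aqm, apqi) && n < max)
				out[n++] = mk_apqn(apid, apqi);
		}
	}
	return n;
}
```

Assigning adapters 3 and 5 and domains 0x04 and 0x47 yields four APQNs (0x0304, 0x0347, 0x0504, 0x0547); under the current design each of them must reference a queue bound to vfio_ap.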

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making edits to the AP bus's
   /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
   ownership of an APQN from the vfio_ap device driver to a zcrypt driver
   while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
   domains for a KVM guest using the matrix mdev via its sysfs
   assign/unassign attributes.

3. Allow assignment of an AP adapter or domain to a matrix mdev even if
   it results in assignment of an APQN that does not reference an AP
   queue device bound to the vfio_ap device driver, as long as the APQN
   is not reserved for use by the default zcrypt drivers. This is also
   known as over-provisioning of AP resources. Allowing over-provisioning
   of AP resources better models the architecture, which does not
   preclude assigning AP resources that are not yet available in the
   system. Such APQNs, however, will not be assigned to the guest using
   the matrix mdev; only APQNs referencing AP queue devices bound to the
   vfio_ap device driver will actually get assigned to the guest.

4. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Preventing a root user from configuring these masks, either inadvertently
or maliciously, such that a queue is shared between the host and a guest
is not only possible, it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
fewest negative side effects is to prevent the situation at the
source.
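The check described above can be reduced to a simple test. Reserving the APQNs in (apmask x aqmask) for the zcrypt drivers takes an APQN away from a matrix mdev exactly when the mdev's APM intersects the reserved adapters and its AQM intersects the reserved domains, because two Cartesian products overlap only if both factors overlap. The sketch below is a simplified userspace illustration under that reading; the names are hypothetical and it is not the in-use callback added by this series.

```c
#include <assert.h>
#include <stdbool.h>

#define AP_MASK_BYTES 32 /* assumption: 256-bit APM/AQM masks */

/* True if the two masks have at least one set bit in common. */
static bool masks_intersect(const unsigned char *a, const unsigned char *b)
{
	for (int i = 0; i < AP_MASK_BYTES; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * Hypothetical check: would reserving (zcrypt_apm x zcrypt_aqm) for the
 * zcrypt drivers take away an APQN in (mdev_apm x mdev_aqm)? The two
 * Cartesian products overlap exactly when both pairs of masks intersect.
 */
static bool apqns_in_use(const unsigned char *zcrypt_apm,
			 const unsigned char *zcrypt_aqm,
			 const unsigned char *mdev_apm,
			 const unsigned char *mdev_aqm)
{
	assert(zcrypt_apm && zcrypt_aqm && mdev_apm && mdev_aqm);
	return masks_intersect(zcrypt_apm, mdev_apm) &&
	       masks_intersect(zcrypt_aqm, mdev_aqm);
}
```

In the scheme described above, a write to apmask or aqmask would be refused if this test returned true for any matrix mdev; evaluating two mask intersections avoids enumerating every APQN.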

2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces eliminates the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.

3. Rationale for allowing over-provisioning of AP resources:
----------------------------------------------------------- 
Allowing assignment of AP resources that are not yet available to a
matrix mdev, and ultimately to a guest, better models the AP
architecture, which does not preclude assignment of unavailable AP
resources. If a queue subsequently becomes available while a guest is
using the matrix mdev to which its APQN is assigned, the guest will be
given access to it. If an APQN is dynamically unassigned from the
underlying host system, it will automatically become unavailable to the
guest.
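As a rough illustration of keeping unavailable APQNs away from the guest while still allowing them in the mdev, the following sketch derives a guest adapter mask by APID filtering, in the manner the v10-v11 change log describes: an adapter is plugged into the guest only if every APQN it contributes references a queue bound to vfio_ap. This is a hypothetical userspace model; the predicate, structure and function names are assumptions, and the actual patches may filter differently (for example, also by domain).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AP_MASK_BITS 256
#define AP_MASK_BYTES (AP_MASK_BITS / 8)

/* MSB-0 bit helpers (bit 0 is the leftmost bit of byte 0). */
static bool mask_test(const unsigned char *m, unsigned int nr)
{
	return (m[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

static void mask_clear(unsigned char *m, unsigned int nr)
{
	m[nr / 8] &= (unsigned char)~(0x80u >> (nr % 8));
}

/* Hypothetical predicate: is the queue with this APQN bound to vfio_ap? */
typedef bool (*apqn_bound_fn)(unsigned int apqn, void *ctx);

/* Example predicate context: a plain list of bound APQNs. */
struct apqn_list {
	const unsigned int *apqns;
	size_t count;
};

static bool apqn_in_list(unsigned int apqn, void *ctx)
{
	const struct apqn_list *l = ctx;

	for (size_t i = 0; i < l->count; i++)
		if (l->apqns[i] == apqn)
			return true;
	return false;
}

/*
 * Start from the mdev's APM and drop any adapter for which at least one
 * assigned APQN (apid, apqi) does not reference a bound queue.
 */
static void filter_guest_apm(const unsigned char *mdev_apm,
			     const unsigned char *mdev_aqm,
			     apqn_bound_fn bound, void *ctx,
			     unsigned char *guest_apm)
{
	assert(bound != NULL);
	memcpy(guest_apm, mdev_apm, AP_MASK_BYTES);
	for (unsigned int apid = 0; apid < AP_MASK_BITS; apid++) {
		if (!mask_test(mdev_apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < AP_MASK_BITS; apqi++) {
			if (mask_test(mdev_aqm, apqi) &&
			    !bound((apid << 8) | apqi, ctx)) {
				mask_clear(guest_apm, apid);
				break;
			}
		}
	}
}
```

With adapters 2 and 3 and domains 0 and 1 assigned, but only 0x0200, 0x0201 and 0x0300 bound, the guest gets adapter 2 only; re-running the filter after 0x0301 is bound would plug adapter 3 as well.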

Change log v10-v11:
------------------
* The matrix mdev's configuration is now filtered by APID so that if any
  APQN assigned to the mdev is not bound to the vfio_ap device driver,
  the adapter will not get plugged into the KVM guest on startup, or when
  a new adapter is assigned to the mdev.

* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle 
  probe/remove).

* Added patch 1 to remove the call that disables IRQs after a queue
  reset, because the reset already disables interrupts for the queue.

* Now using filtering code to update the KVM guest's matrix when
  notified that AP bus scan has completed.

* Fixed an issue with a probe/remove not initiated by a configuration
  change occurring within a config change.


Change log v9-v10:
-----------------
* Updated the documentation in vfio-ap.rst to include information about the
  AP dynamic configuration support

Change log v8-v9:
----------------
* Fixed errors flagged by the kernel test robot

* Fixed an issue with the guest losing queues when a new queue is probed
  due to a manual bind operation.

Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
  drivers would result in taking a queue away from a KVM guest, to give
  the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
  s390/ap: introduce new ap function ap_get_qdev() - to retrieve
  struct ap_queue representing an AP queue device. This patch is not a
  part of this series but is a prerequisite for this series. 

Change log v6-v7:
----------------
* Added callbacks to AP bus:
  - on_config_changed: Notifies implementing drivers that
    the AP configuration has changed since the last AP device scan.
  - on_scan_complete: Notifies implementing drivers that the device scan
    has completed.
  - implemented on_config_changed and on_scan_complete callbacks for
    vfio_ap device driver.
  - updated vfio_ap device driver's probe and remove callbacks to handle
    dynamic changes to the AP device model. 
* Added code to filter APQNs when assigning AP resources to a KVM guest's
  CRYCB

Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the ap_perms_mutex
  lock taken in the apmask_store and aqmask_store functions was not
  being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2. 

Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from 
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic 
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.


Tony Krowiak (14):
  s390/vfio-ap: No need to disable IRQ after queue reset
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/zcrypt: driver callback to indicate resource in use
  s390/vfio-ap: implement in-use callback for vfio_ap driver
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: sysfs attribute to display the guest's matrix
  s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: Notify driver on config changed and scan complete
    callbacks
  s390/vfio-ap: handle host AP config change notification
  s390/vfio-ap: handle AP bus scan completed notification
  s390/vfio-ap: update docs to include dynamic config support

 Documentation/s390/vfio-ap.rst        |  362 ++++++--
 drivers/s390/crypto/ap_bus.c          |  236 +++++-
 drivers/s390/crypto/ap_bus.h          |   16 +
 drivers/s390/crypto/vfio_ap_drv.c     |   52 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 1091 +++++++++++++++++++------
 drivers/s390/crypto/vfio_ap_private.h |   29 +-
 6 files changed, 1384 insertions(+), 402 deletions(-)

-- 
2.21.1



* [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
@ 2020-10-22 17:11 ` Tony Krowiak
  2020-10-22 19:44   ` kernel test robot
  2020-10-27  6:48   ` Halil Pasic
  2020-10-22 17:11 ` [PATCH v11 02/14] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:11 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is unnecessary because resetting a queue
disables its interrupts, so the call will be removed.

Since interrupt processing may have been enabled by the guest, it may also
be necessary to clean up the resources used for interrupt processing. Part
of the cleanup operation requires a reference to KVM, so a check is also
being added to ensure the reference to KVM exists. The check is needed
because the release callback, invoked when userspace closes the mdev fd,
removes the reference to KVM. When the remove callback, called when the
mdev is removed from sysfs, is subsequently invoked, there will be no
reference to KVM when the cleanup is performed.
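To make the ordering problem concrete, here is a hypothetical userspace model, not the driver's code: the release path drops the KVM reference, and a later cleanup from the remove path must not dereference it. The structures stand in for struct kvm, struct ap_matrix_mdev and struct vfio_ap_queue, and the counter stands in for the GISC register/unregister bookkeeping; the guard mirrors the condition this patch adds to vfio_ap_free_aqic_resources().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ISC_INVALID 0xff /* stands in for VFIO_AP_ISC_INVALID */

/* Hypothetical stand-ins for struct kvm, ap_matrix_mdev, vfio_ap_queue. */
struct kvm_model {
	int gisc_registrations;   /* models GISC register/unregister calls */
};

struct mdev_model {
	struct kvm_model *kvm;    /* cleared by the release callback */
};

struct queue_model {
	struct mdev_model *matrix_mdev;
	unsigned char saved_isc;
};

/*
 * Guarded cleanup: unregister the guest ISC only if the mdev still holds
 * a KVM reference. Returns true if an unregistration happened. This model
 * then marks the ISC invalid either way.
 */
static bool free_aqic_resources(struct queue_model *q)
{
	bool unregistered = false;

	assert(q != NULL);
	if (q->saved_isc != ISC_INVALID && q->matrix_mdev &&
	    q->matrix_mdev->kvm) {
		q->matrix_mdev->kvm->gisc_registrations--;
		unregistered = true;
	}
	q->saved_isc = ISC_INVALID;
	return unregistered;
}
```

Without the `q->matrix_mdev->kvm` test, the second cleanup in the sequence (release, then remove) would dereference a NULL pointer.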

This patch also does a bit of refactoring, because the remove callback,
implemented in vfio_ap_drv.c, disables the queue after resetting it.
Instead of having the remove callback call into vfio_ap_ops.c to clean up
the resources used for interrupt processing, let's move the probe and
remove callbacks into the vfio_ap_ops.c file to keep all code related to
managing queues in a single file.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     | 45 +------------------
 drivers/s390/crypto/vfio_ap_ops.c     | 63 +++++++++++++++++++--------
 drivers/s390/crypto/vfio_ap_private.h |  7 +--
 3 files changed, 52 insertions(+), 63 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..73bd073fd5d3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
-/**
- * vfio_ap_queue_dev_probe:
- *
- * Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
-	struct vfio_ap_queue *q;
-
-	q = kzalloc(sizeof(*q), GFP_KERNEL);
-	if (!q)
-		return -ENOMEM;
-	dev_set_drvdata(&apdev->device, q);
-	q->apqn = to_ap_queue(&apdev->device)->qid;
-	q->saved_isc = VFIO_AP_ISC_INVALID;
-	return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove:
- *
- * Takes the matrix lock to avoid actions on this device while removing
- * Free the associated vfio_ap_queue structure
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
-	struct vfio_ap_queue *q;
-	int apid, apqi;
-
-	mutex_lock(&matrix_dev->lock);
-	q = dev_get_drvdata(&apdev->device);
-	dev_set_drvdata(&apdev->device, NULL);
-	apid = AP_QID_CARD(q->apqn);
-	apqi = AP_QID_QUEUE(q->apqn);
-	vfio_ap_mdev_reset_queue(apid, apqi, 1);
-	vfio_ap_irq_disable(q);
-	kfree(q);
-	mutex_unlock(&matrix_dev->lock);
-}
-
 static void vfio_ap_matrix_dev_release(struct device *dev)
 {
 	struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
@@ -186,8 +145,8 @@ static int __init vfio_ap_init(void)
 		return ret;
 
 	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
-	vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
-	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
+	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..c471832f0a30 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn)
  */
 static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
 {
-	if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
+	if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev &&
+	    q->matrix_mdev->kvm)
 		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
 	if (q->saved_pfn && q->matrix_mdev)
 		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
@@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
  * Returns if ap_aqic function failed with invalid, deconfigured or
  * checkstopped AP.
  */
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
+static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 {
 	struct ap_qirq_ctrl aqic_gisa = {};
 	struct ap_queue_status status;
@@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	if (!q)
 		goto out_unlock;
 
+	q->matrix_mdev = matrix_mdev;
 	status = vcpu->run->s.regs.gprs[1];
 
 	/* If IR bit(16) is set we enable the interrupt */
@@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static void vfio_ap_irq_disable_apqn(int apqn)
-{
-	struct device *dev;
-	struct vfio_ap_queue *q;
-
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				 &apqn, match_apqn);
-	if (dev) {
-		q = dev_get_drvdata(dev);
-		vfio_ap_irq_disable(q);
-		put_device(dev);
-	}
-}
-
 int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
 			     unsigned int retry)
 {
@@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 {
 	int ret;
 	int rc = 0;
+	struct vfio_ap_queue *q;
 	unsigned long apid, apqi;
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
@@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 			 */
 			if (ret)
 				rc = ret;
-			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
+			q = vfio_ap_get_queue(matrix_mdev,
+					      AP_MKQID(apid, apqi));
+			if (q)
+				vfio_ap_free_aqic_resources(q);
 		}
 	}
 
@@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void)
 {
 	mdev_unregister_device(&matrix_dev->device);
 }
+
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+	struct vfio_ap_queue *q;
+	struct ap_queue *queue;
+
+	queue = to_ap_queue(&apdev->device);
+
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		return -ENOMEM;
+
+	dev_set_drvdata(&queue->ap_dev.device, q);
+	q->apqn = queue->qid;
+	q->saved_isc = VFIO_AP_ISC_INVALID;
+
+	return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+	struct vfio_ap_queue *q;
+	struct ap_queue *queue;
+	int apid, apqi;
+
+	queue = to_ap_queue(&apdev->device);
+
+	mutex_lock(&matrix_dev->lock);
+	q = dev_get_drvdata(&queue->ap_dev.device);
+	dev_set_drvdata(&queue->ap_dev.device, NULL);
+	apid = AP_QID_CARD(q->apqn);
+	apqi = AP_QID_QUEUE(q->apqn);
+	vfio_ap_mdev_reset_queue(apid, apqi, 1);
+	vfio_ap_free_aqic_resources(q);
+	kfree(q);
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index f46dde56b464..d9003de4fbad 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -90,8 +90,6 @@ struct ap_matrix_mdev {
 
 extern int vfio_ap_mdev_register(void);
 extern void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-			     unsigned int retry);
 
 struct vfio_ap_queue {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -100,5 +98,8 @@ struct vfio_ap_queue {
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
 };
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
+
+int vfio_ap_mdev_probe_queue(struct ap_device *queue);
+void vfio_ap_mdev_remove_queue(struct ap_device *queue);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v11 02/14] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
  2020-10-22 17:11 ` [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
@ 2020-10-22 17:11 ` Tony Krowiak
  2020-10-27  7:01   ` Halil Pasic
  2020-10-22 17:11 ` [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:11 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is much more efficient than looping over
the list of devices attached to the AP bus, potentially by several orders
of magnitude when many queue devices are attached.
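The difference between the two lookups can be sketched in userspace C. The model below is an illustration under stated assumptions, not the AP bus's hashtable code: it chains queue devices into 256 buckets keyed by APQN, so a lookup walks one short bucket, whereas the replaced approach scans every device on the bus. The bucket function and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a queue device; not the kernel's struct ap_queue. */
struct qdev {
	unsigned int apqn;
	struct qdev *next_in_bucket;
};

#define QTABLE_BITS 8
#define QTABLE_SIZE (1u << QTABLE_BITS)

struct qtable {
	struct qdev *buckets[QTABLE_SIZE];
};

/* Simple bucket choice: fold the APID and APQI bytes together. */
static unsigned int qtable_bucket(unsigned int apqn)
{
	return ((apqn >> 8) ^ apqn) & (QTABLE_SIZE - 1);
}

/* Push q onto the head of its bucket's chain. */
static void qtable_add(struct qtable *t, struct qdev *q)
{
	unsigned int b;

	assert(q != NULL);
	b = qtable_bucket(q->apqn);
	q->next_in_bucket = t->buckets[b];
	t->buckets[b] = q;
}

/* Walks only the one bucket that can contain apqn. */
static struct qdev *qtable_find(const struct qtable *t, unsigned int apqn)
{
	for (struct qdev *q = t->buckets[qtable_bucket(apqn)]; q;
	     q = q->next_in_bucket)
		if (q->apqn == apqn)
			return q;
	return NULL;
}

/* The approach being replaced: scan every device on the bus. */
static struct qdev *list_find(struct qdev *devs, size_t n, unsigned int apqn)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].apqn == apqn)
			return &devs[i];
	return NULL;
}
```

Both functions return the same device; the hashed lookup's cost depends on the length of a single chain (0x0304 and 0x0403 collide in this bucket function, for example) rather than on the total number of queue devices.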

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 35 +++++++++++++------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c471832f0a30..049b97d7444c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,43 +26,36 @@
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 
-static int match_apqn(struct device *dev, const void *data)
-{
-	struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
-	return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
 /**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
  * @matrix_mdev: the associated mediated matrix
  * @apqn: The queue APQN
  *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
+ * Retrieve a queue with a specific APQN from the AP queue devices attached to
+ * the AP bus.
  *
- * Returns the pointer to the associated vfio_ap_queue
+ * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
  */
 static struct vfio_ap_queue *vfio_ap_get_queue(
 					struct ap_matrix_mdev *matrix_mdev,
-					int apqn)
+					unsigned long apqn)
 {
-	struct vfio_ap_queue *q;
-	struct device *dev;
+	struct ap_queue *queue;
+	struct vfio_ap_queue *q = NULL;
 
 	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
 		return NULL;
 	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
 		return NULL;
 
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				 &apqn, match_apqn);
-	if (!dev)
+	queue = ap_get_qdev(apqn);
+	if (!queue)
 		return NULL;
-	q = dev_get_drvdata(dev);
-	q->matrix_mdev = matrix_mdev;
-	put_device(dev);
+
+	if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+		q = dev_get_drvdata(&queue->ap_dev.device);
+
+	put_device(&queue->ap_dev.device);
 
 	return q;
 }
-- 
2.21.1



* [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
  2020-10-22 17:11 ` [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
  2020-10-22 17:11 ` [PATCH v11 02/14] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
@ 2020-10-22 17:11 ` Tony Krowiak
  2020-10-27  9:33   ` Halil Pasic
  2020-10-22 17:11 ` [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:11 UTC
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue is assigned. The idea is to
facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

   * When the queue device is probed, if its APQN is assigned to a matrix
     mdev, the structures representing the queue device and the matrix mdev
     will be linked.

   * When an adapter or domain is assigned to a matrix mdev, for each new
     APQN assigned that references a queue device bound to the vfio_ap
     device driver, the structures representing the queue device and the
     matrix mdev will be linked.

The links will be removed as follows:

   * When the queue device is removed, if its APQN is assigned to a matrix
     mdev, the structures representing the queue device and the matrix mdev
     will be unlinked.

   * When an adapter or domain is unassigned from a matrix mdev, for each
     APQN unassigned that references a queue device bound to the vfio_ap
     device driver, the structures representing the queue device and the
     matrix mdev will be unlinked.
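The adapter half of these rules can be modeled as follows. This is a hypothetical userspace sketch: struct mdev_model and struct queue_model stand in for struct ap_matrix_mdev and struct vfio_ap_queue, a plain pointer stands in for the qtable hash linkage, and the bus is an array. Domain assignment would be symmetric, iterating over the APM instead of the AQM.

```c
#include <assert.h>
#include <stddef.h>

#define AP_MASK_BYTES 32 /* assumption: 256-bit APM/AQM masks */

struct mdev_model {
	unsigned char apm[AP_MASK_BYTES];
	unsigned char aqm[AP_MASK_BYTES];
};

struct queue_model {
	unsigned int apqn;        /* APID in high byte, APQI in low byte */
	int bound;                /* nonzero if bound to vfio_ap */
	struct mdev_model *mdev;  /* link to the mdev, or NULL */
};

/* MSB-0 bit helpers (bit 0 is the leftmost bit of byte 0). */
static void set_bit_msb0(unsigned char *m, unsigned int nr)
{
	assert(nr < AP_MASK_BYTES * 8);
	m[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

static void clear_bit_msb0(unsigned char *m, unsigned int nr)
{
	assert(nr < AP_MASK_BYTES * 8);
	m[nr / 8] &= (unsigned char)~(0x80u >> (nr % 8));
}

static int test_bit_msb0(const unsigned char *m, unsigned int nr)
{
	return (m[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* Assign an adapter, then link each bound queue whose APQN is now assigned. */
static void assign_adapter(struct mdev_model *m, struct queue_model *qs,
			   size_t n, unsigned int apid)
{
	set_bit_msb0(m->apm, apid);
	for (size_t i = 0; i < n; i++) {
		unsigned int qapid = qs[i].apqn >> 8;
		unsigned int qapqi = qs[i].apqn & 0xff;

		if (qs[i].bound && qapid == apid &&
		    test_bit_msb0(m->aqm, qapqi))
			qs[i].mdev = m;
	}
}

/* Unassign an adapter and unlink every queue of that adapter from m. */
static void unassign_adapter(struct mdev_model *m, struct queue_model *qs,
			     size_t n, unsigned int apid)
{
	clear_bit_msb0(m->apm, apid);
	for (size_t i = 0; i < n; i++)
		if (qs[i].mdev == m && (qs[i].apqn >> 8) == apid)
			qs[i].mdev = NULL;
}
```

Only queues that are both assigned and bound get linked; an assigned APQN without a bound queue device stays unlinked until a probe links it.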

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 146 +++++++++++++++++++++++---
 drivers/s390/crypto/vfio_ap_private.h |   3 +
 2 files changed, 135 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 049b97d7444c..1357f8f8b7e4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 
 /**
  * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
- * @matrix_mdev: the associated mediated matrix
  * @apqn: The queue APQN
  *
  * Retrieve a queue with a specific APQN from the AP queue devices attached to
@@ -36,18 +35,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
  *
  * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
  */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-					struct ap_matrix_mdev *matrix_mdev,
-					unsigned long apqn)
+static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
 {
 	struct ap_queue *queue;
 	struct vfio_ap_queue *q = NULL;
 
-	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-		return NULL;
-	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
-		return NULL;
-
 	queue = ap_get_qdev(apqn);
 	if (!queue)
 		return NULL;
@@ -60,6 +52,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
 	return q;
 }
 
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
+{
+	struct vfio_ap_queue *q;
+
+	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+		if (q && (q->apqn == apqn))
+			return q;
+	}
+
+	return NULL;
+}
+
 /**
  * vfio_ap_wait_for_irqclear
  * @apqn: The AP Queue number
@@ -171,7 +176,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 		  status.response_code);
 end_free:
 	vfio_ap_free_aqic_resources(q);
-	q->matrix_mdev = NULL;
 	return status;
 }
 
@@ -284,14 +288,14 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 
 	if (!vcpu->kvm->arch.crypto.pqap_hook)
 		goto out_unlock;
+
 	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
 				   struct ap_matrix_mdev, pqap_hook);
 
-	q = vfio_ap_get_queue(matrix_mdev, apqn);
+	q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
 	if (!q)
 		goto out_unlock;
 
-	q->matrix_mdev = matrix_mdev;
 	status = vcpu->run->s.regs.gprs[1];
 
 	/* If IR bit(16) is set we enable the interrupt */
@@ -331,6 +335,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
 	matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -559,6 +564,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 	return 0;
 }
 
+enum qlink_type {
+	LINK_APID,
+	LINK_APQI,
+	UNLINK_APID,
+	UNLINK_APQI,
+};
+
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+				    unsigned long apid, unsigned long apqi)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+	if (q) {
+		q->matrix_mdev = matrix_mdev;
+		hash_add(matrix_mdev->qtable,
+			 &q->mdev_qnode, q->apqn);
+	}
+}
+
+static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+	if (q) {
+		q->matrix_mdev = NULL;
+		hash_del(&q->mdev_qnode);
+	}
+}
+
+/**
+ * vfio_ap_mdev_link_queues
+ *
+ * @matrix_mdev: The matrix mdev to link.
+ * @type:	 The type of @qlink_id.
+ * @qlink_id:	 The APID or APQI of the queues to link.
+ *
+ * Sets or clears the links between the queues with the specified @qlink_id
+ * and the @matrix_mdev:
+ *     @type == LINK_APID: Set the links between the @matrix_mdev and the
+ *                         queues with the specified @qlink_id (APID)
+ *     @type == LINK_APQI: Set the links between the @matrix_mdev and the
+ *                         queues with the specified @qlink_id (APQI)
+ *     @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
+ *                           queues with the specified @qlink_id (APID)
+ *     @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
+ *                           queues with the specified @qlink_id (APQI)
+ */
+static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
+				     enum qlink_type type,
+				     unsigned long qlink_id)
+{
+	unsigned long id;
+
+	switch (type) {
+	case LINK_APID:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+				     matrix_mdev->matrix.aqm_max + 1)
+			vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
+		break;
+	case UNLINK_APID:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+				     matrix_mdev->matrix.aqm_max + 1)
+			vfio_ap_mdev_unlink_queue(qlink_id, id);
+		break;
+	case LINK_APQI:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+				     matrix_mdev->matrix.apm_max + 1)
+			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
+		break;
+	case UNLINK_APQI:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+				     matrix_mdev->matrix.apm_max + 1)
+			vfio_ap_mdev_unlink_queue(id, qlink_id);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 /**
  * assign_adapter_store
  *
@@ -628,6 +714,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (ret)
 		goto share_err;
 
+	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
 	ret = count;
 	goto done;
 
@@ -679,6 +766,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -769,6 +857,7 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (ret)
 		goto share_err;
 
+	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
 	ret = count;
 	goto done;
 
@@ -821,6 +910,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -1159,8 +1249,8 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 			 */
 			if (ret)
 				rc = ret;
-			q = vfio_ap_get_queue(matrix_mdev,
-					      AP_MKQID(apid, apqi));
+			q = vfio_ap_mdev_get_queue(matrix_mdev,
+						   AP_MKQID(apid, apqi));
 			if (q)
 				vfio_ap_free_aqic_resources(q);
 		}
@@ -1288,6 +1378,29 @@ void vfio_ap_mdev_unregister(void)
 	mdev_unregister_device(&matrix_dev->device);
 }
 
+/**
+ * vfio_ap_queue_link_mdev
+ *
+ * @q: The queue to link with the matrix mdev.
+ *
+ * Links @q with the matrix mdev to which the queue's APQN is assigned.
+ */
+static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
+{
+	unsigned long apid = AP_QID_CARD(q->apqn);
+	unsigned long apqi = AP_QID_QUEUE(q->apqn);
+	struct ap_matrix_mdev *matrix_mdev;
+
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
+		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+			q->matrix_mdev = matrix_mdev;
+			hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
+			break;
+		}
+	}
+}
+
 int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 {
 	struct vfio_ap_queue *q;
@@ -1299,9 +1412,12 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 	if (!q)
 		return -ENOMEM;
 
+	mutex_lock(&matrix_dev->lock);
 	dev_set_drvdata(&queue->ap_dev.device, q);
 	q->apqn = queue->qid;
 	q->saved_isc = VFIO_AP_ISC_INVALID;
+	vfio_ap_queue_link_mdev(q);
+	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
@@ -1321,6 +1437,8 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 	apqi = AP_QID_QUEUE(q->apqn);
 	vfio_ap_mdev_reset_queue(apid, apqi, 1);
 	vfio_ap_free_aqic_resources(q);
+	if (q->matrix_mdev)
+		hash_del(&q->mdev_qnode);
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index d9003de4fbad..4e5cc72fc0db 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -18,6 +18,7 @@
 #include <linux/delay.h>
 #include <linux/mutex.h>
 #include <linux/kvm_host.h>
+#include <linux/hashtable.h>
 
 #include "ap_bus.h"
 
@@ -86,6 +87,7 @@ struct ap_matrix_mdev {
 	struct kvm *kvm;
 	struct kvm_s390_module_hook pqap_hook;
 	struct mdev_device *mdev;
+	DECLARE_HASHTABLE(qtable, 8);
 };
 
 extern int vfio_ap_mdev_register(void);
@@ -97,6 +99,7 @@ struct vfio_ap_queue {
 	int	apqn;
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
+	struct hlist_node mdev_qnode;
 };
 
 int vfio_ap_mdev_probe_queue(struct ap_device *queue);
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (2 preceding siblings ...)
  2020-10-22 17:11 ` [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
@ 2020-10-22 17:11 ` Tony Krowiak
  2020-10-27 13:01   ` Halil Pasic
  2020-10-27 16:55   ` Harald Freudenberger
  2020-10-22 17:12 ` [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:11 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a
device-busy error (-EBUSY).

In this patch, only non-default drivers are queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This enforces the proper procedure for
removing AP resources intended for guest usage: first unassign them
from the matrix mdev, then unbind them from the vfio_ap device driver.
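The commit-time check described above can be sketched in user-space C as
follows. This is an illustration only, under simplifying assumptions: the
masks are 64-bit integers with plain bit numbering (the AP bus uses 256-bit
bitmaps with inverted, MSB-first bit order), module reference counting is
omitted, and every sketch_* name is hypothetical. The key idea mirrors
apmask_commit(): only adapters whose bits are newly set can lose queues to
the default drivers, and the queues at stake are those newly set adapters
crossed with the current aqmask.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ap_driver (64-bit masks, plain bit order). */
struct sketch_ap_driver {
	bool is_default;			/* e.g. the zcrypt drivers */
	bool (*in_use)(uint64_t apm, uint64_t aqm);
};

/* Example non-default driver: a guest owns queue 04.0005. */
static bool sketch_guest_in_use(uint64_t apm, uint64_t aqm)
{
	return (apm & (UINT64_C(1) << 4)) && (aqm & (UINT64_C(1) << 5));
}

/*
 * Commit new_apm only if no non-default driver reports that the queues it
 * would lose (newly set adapters x current aqmask) are in use.
 */
static int sketch_apmask_commit(uint64_t *apm, uint64_t new_apm, uint64_t aqm,
				const struct sketch_ap_driver *drvs, size_t ndrvs)
{
	uint64_t newly_set = new_apm & ~*apm;

	for (size_t i = 0; newly_set && i < ndrvs; i++) {
		if (drvs[i].is_default || !drvs[i].in_use)
			continue;	/* default drivers are never queried */
		if (drvs[i].in_use(newly_set, aqm))
			return -EBUSY;	/* reject; the old mask stays in effect */
	}
	*apm = new_apm;
	return 0;
}
```

Clearing bits never triggers a query in this sketch, because clearing a bit
only moves queues away from the default drivers.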

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 142 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 485cbfcbf06e..998e61cd86d9 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -35,6 +35,7 @@
 #include <linux/mod_devicetable.h>
 #include <linux/debugfs.h>
 #include <linux/ctype.h>
+#include <linux/module.h>
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -893,6 +894,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
 	return 0;
 }
 
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+			       unsigned long *newmap)
+{
+	unsigned long size;
+	int rc;
+
+	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+	if (*str == '+' || *str == '-') {
+		memcpy(newmap, bitmap, size);
+		rc = modify_bitmap(str, newmap, bits);
+	} else {
+		memset(newmap, 0, size);
+		rc = hex2bitmap(str, newmap, bits);
+	}
+	return rc;
+}
+
 int ap_parse_mask_str(const char *str,
 		      unsigned long *bitmap, int bits,
 		      struct mutex *lock)
@@ -912,14 +930,7 @@ int ap_parse_mask_str(const char *str,
 		kfree(newmap);
 		return -ERESTARTSYS;
 	}
-
-	if (*str == '+' || *str == '-') {
-		memcpy(newmap, bitmap, size);
-		rc = modify_bitmap(str, newmap, bits);
-	} else {
-		memset(newmap, 0, size);
-		rc = hex2bitmap(str, newmap, bits);
-	}
+	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
 	if (rc == 0)
 		memcpy(bitmap, newmap, size);
 	mutex_unlock(lock);
@@ -1111,12 +1122,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newapm = (unsigned long *)data;
+
+	/*
+	 * No need to verify whether the driver is using the queues if it is the
+	 * default driver.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/* The non-default driver's module must be loaded */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use)
+		if (ap_drv->in_use(newapm, ap_perms.aqm))
+			rc = -EBUSY;
+
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+	/*
+	 * Check if any bits in the apmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_card_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	DECLARE_BITMAP(newapm, AP_DEVICES);
+
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
+	if (rc)
+		goto done;
 
-	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+	rc = apmask_commit(newapm);
+
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
@@ -1142,12 +1211,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+static int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newaqm = (unsigned long *)data;
+
+	/*
+	 * There is no need to verify whether the driver is using the queues
+	 * if it is the default driver; only non-default drivers are
+	 * queried.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/* The non-default driver's module must be loaded */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use)
+		if (ap_drv->in_use(ap_perms.apm, newaqm))
+			rc = -EBUSY;
+
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+	/*
+	 * Check if any bits in the aqmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_queue_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	DECLARE_BITMAP(newaqm, AP_DOMAINS);
 
-	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
+	if (rc)
+		goto done;
+
+	rc = aqmask_commit(newaqm);
+
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 5029b80132aa..6ce154d924d3 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -145,6 +145,7 @@ struct ap_driver {
 
 	int (*probe)(struct ap_device *);
 	void (*remove)(struct ap_device *);
+	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
 struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
 			       int comp_device_type, unsigned int functions);
 
+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
 struct ap_perms {
 	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
 	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
-- 
2.21.1


* [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (3 preceding siblings ...)
  2020-10-22 17:11 ` [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-27 13:27   ` Halil Pasic
  2020-10-22 17:12 ` [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB Tony Krowiak
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.
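The in-use rule reduces to a bitmap intersection test, which the following
user-space sketch illustrates. It assumes plain (non-inverted) bit numbering
and uses hypothetical sketch_* names; it mirrors the loop in
vfio_ap_mdev_verify_no_sharing() when called with a NULL matrix_mdev. The
APQNs in apm x aqm are in use if some mdev has at least one of the APIDs in
apm and at least one of the APQIs in aqm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AP_DEVICES 256
#define SKETCH_AP_DOMAINS 256
#define SKETCH_WORDS(bits) (((bits) + 63) / 64)

/* Hypothetical stand-in for the apm/aqm part of struct ap_matrix. */
struct sketch_matrix {
	uint64_t apm[SKETCH_WORDS(SKETCH_AP_DEVICES)];
	uint64_t aqm[SKETCH_WORDS(SKETCH_AP_DOMAINS)];
};

static void sketch_set_bit(uint64_t *map, unsigned int bit)
{
	map[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* True if a AND b has at least one bit set. */
static bool sketch_intersects(const uint64_t *a, const uint64_t *b, size_t words)
{
	for (size_t i = 0; i < words; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/* True if any APQN in apm x aqm is assigned to one of the mdevs. */
static bool sketch_resource_in_use(const uint64_t *apm, const uint64_t *aqm,
				   const struct sketch_matrix *mdevs, size_t nmdevs)
{
	for (size_t i = 0; i < nmdevs; i++) {
		if (!sketch_intersects(apm, mdevs[i].apm,
				       SKETCH_WORDS(SKETCH_AP_DEVICES)))
			continue;
		if (sketch_intersects(aqm, mdevs[i].aqm,
				      SKETCH_WORDS(SKETCH_AP_DOMAINS)))
			return true;
	}
	return false;
}
```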

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 78 +++++++++++++++++++--------
 drivers/s390/crypto/vfio_ap_private.h |  2 +
 3 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
 	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
 	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
 	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1357f8f8b7e4..9e9fad560859 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+			 "already assigned to %s"
+
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+					 unsigned long *apm,
+					 unsigned long *aqm)
+{
+	unsigned long apid, apqi;
+
+	for_each_set_bit_inv(apid, apm, AP_DEVICES)
+		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
+}
+
 /**
  * vfio_ap_mdev_verify_no_sharing
  *
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
+ * Verifies that each APQN derived from the cross product of the AP adapter IDs
+ * and AP queue indexes comprising an AP matrix is not assigned to a
  * mediated device. AP queue sharing is not allowed.
  *
- * @matrix_mdev: the mediated matrix device
+ * @matrix_mdev: the mediated matrix device to which the APQNs being verified
+ *		 are assigned. If the value is not NULL, then verification will
+ *		 proceed for all other matrix mediated devices; otherwise, all
+ *		 matrix mediated devices will be verified.
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
  *
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * Returns 0 if no APQNs are shared; otherwise, returns -EADDRINUSE if one
+ * or more APQNs are shared.
  */
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+					  unsigned long *mdev_apm,
+					  unsigned long *mdev_aqm)
 {
 	struct ap_matrix_mdev *lstdev;
 	DECLARE_BITMAP(apm, AP_DEVICES);
@@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 		 * We work on full longs, as we can only exclude the leftover
 		 * bits in non-inverse order. The leftover is all zeros.
 		 */
-		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-				lstdev->matrix.apm, AP_DEVICES))
+		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
 			continue;
 
-		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
-				lstdev->matrix.aqm, AP_DOMAINS))
+		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
 			continue;
 
+		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
+					     apm, aqm);
+
 		return -EADDRINUSE;
 	}
 
@@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 {
 	int ret;
 	unsigned long apid;
+	DECLARE_BITMAP(apm, AP_DEVICES);
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
@@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (ret)
 		goto done;
 
-	set_bit_inv(apid, matrix_mdev->matrix.apm);
+	memset(apm, 0, sizeof(apm));
+	set_bit_inv(apid, apm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
+					     matrix_mdev->matrix.aqm);
 	if (ret)
-		goto share_err;
+		goto done;
 
+	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
 	ret = count;
-	goto done;
 
-share_err:
-	clear_bit_inv(apid, matrix_mdev->matrix.apm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -831,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
 {
 	int ret;
 	unsigned long apqi;
+	DECLARE_BITMAP(aqm, AP_DOMAINS);
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
@@ -851,18 +876,18 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (ret)
 		goto done;
 
-	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
+	memset(aqm, 0, sizeof(aqm));
+	set_bit_inv(apqi, aqm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
+					     matrix_mdev->matrix.apm, aqm);
 	if (ret)
-		goto share_err;
+		goto done;
 
+	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
 	ret = count;
-	goto done;
 
-share_err:
-	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -1442,3 +1467,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
+
+bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+	bool in_use;
+
+	mutex_lock(&matrix_dev->lock);
+	in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
+	mutex_unlock(&matrix_dev->lock);
+
+	return in_use;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4e5cc72fc0db..c1d8b5507610 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -105,4 +105,6 @@ struct vfio_ap_queue {
 int vfio_ap_mdev_probe_queue(struct ap_device *queue);
 void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
+bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1


* [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (4 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-28  8:11   ` Halil Pasic
  2020-10-22 17:12 ` [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.
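The relationship between the assigned matrix, the shadow APCB and the
guest's real APCB can be sketched as below. This is a simplified model,
not the driver code: the masks are single 64-bit values, guest_apcb is a
hypothetical stand-in for the APCB that kvm_arch_crypto_set_masks() writes,
and all sketch_* names are hypothetical. It shows that the guest sees a
snapshot taken at commit time; later edits to the assigned matrix do not
reach the guest until the shadow is recomputed and committed again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct ap_matrix. */
struct sketch_matrix {
	uint64_t apm;	/* adapters */
	uint64_t aqm;	/* usage domains */
	uint64_t adm;	/* control domains */
};

struct sketch_mdev {
	struct sketch_matrix matrix;		/* what the admin assigned */
	struct sketch_matrix shadow_apcb;	/* what the guest should see */
	bool has_crycb;				/* a guest with a CRYCB is attached */
	struct sketch_matrix guest_apcb;	/* stand-in for the guest's APCB */
};

/* Stand-in for kvm_arch_crypto_set_masks(): push the shadow to the guest. */
static void sketch_commit_shadow_apcb(struct sketch_mdev *m)
{
	m->guest_apcb = m->shadow_apcb;
}

/* Mirrors the group-notifier path: copy matrix into shadow, then commit. */
static bool sketch_set_kvm(struct sketch_mdev *m)
{
	if (!m->has_crycb)
		return false;
	memcpy(&m->shadow_apcb, &m->matrix, sizeof(m->shadow_apcb));
	sketch_commit_shadow_apcb(m);
	return true;
}
```

Keeping a separate shadow lets the driver change what the guest sees (for
example, filtering adapters whose queues are not bound) without touching
what the administrator assigned to the mdev.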

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 24 +++++++++++++++++++-----
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9e9fad560859..9791761aa7fd 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
 	matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+				  matrix_mdev->shadow_apcb.apm,
+				  matrix_mdev->shadow_apcb.aqm,
+				  matrix_mdev->shadow_apcb.adm);
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
 	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	if (ret)
 		return NOTIFY_DONE;
 
-	/* If there is no CRYCB pointer, then we can't copy the masks */
-	if (!matrix_mdev->kvm->arch.crypto.crycbd)
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
 		return NOTIFY_DONE;
 
-	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
-				  matrix_mdev->matrix.aqm,
-				  matrix_mdev->matrix.adm);
+	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+	       sizeof(matrix_mdev->shadow_apcb));
+	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
 	return NOTIFY_OK;
 }
@@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 		kvm_put_kvm(matrix_mdev->kvm);
 		matrix_mdev->kvm = NULL;
 	}
+
 	mutex_unlock(&matrix_dev->lock);
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index c1d8b5507610..fc8634cee485 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
  * @list:	allows the ap_matrix_mdev struct to be added to a list
  * @matrix:	the adapters, usage domains and control domains assigned to the
  *		mediated matrix device.
+ * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
  * @group_notifier: notifier block used for specifying callback function for
  *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
  * @kvm:	the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
 struct ap_matrix_mdev {
 	struct list_head node;
 	struct ap_matrix matrix;
+	struct ap_matrix shadow_apcb;
 	struct notifier_block group_notifier;
 	struct notifier_block iommu_notifier;
 	struct kvm *kvm;
-- 
2.21.1


* [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (5 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-28  8:17   ` Halil Pasic
  2020-10-22 17:12 ` [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device Tony Krowiak
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of a guest
using the matrix mdev. For a matrix mdev denoted by $uuid, the matrix of a
guest using the matrix mdev (i.e., the APQNs configured in its APCB) can be
displayed as follows:

   cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix

If a guest is not using the matrix mdev when the guest_matrix attribute is
read, the read fails with ENODEV.
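vfio_ap_mdev_matrix_show() emits one of three line forms: "AA.QQQQ" when
both adapters and domains are present, "AA." when only adapters are
present, and ".QQQQ" when only domains are present (all hex). A user-space
consumer of guest_matrix could parse those lines as in this sketch. The
function name is hypothetical, and -1 marks a field that is absent from the
line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one guest_matrix/matrix line. Sets *apid and *apqi, using -1 for an
 * absent field. Returns false if the line is malformed.
 */
static bool sketch_parse_matrix_line(const char *line, long *apid, long *apqi)
{
	const char *dot = strchr(line, '.');
	char *end;

	if (!dot)
		return false;
	*apid = -1;
	*apqi = -1;
	if (dot != line) {			/* APID precedes the dot */
		*apid = strtol(line, &end, 16);
		if (end != dot)
			return false;
	}
	if (dot[1] != '\0' && dot[1] != '\n') {	/* APQI follows the dot */
		*apqi = strtol(dot + 1, &end, 16);
		if (*end != '\0' && *end != '\n')
			return false;
	}
	return *apid != -1 || *apqi != -1;	/* a lone "." is not valid */
}
```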

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 54 +++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9791761aa7fd..7bad70d7bcef 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1073,29 +1073,24 @@ static ssize_t control_domains_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(control_domains);
 
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
-			   char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
 {
-	struct mdev_device *mdev = mdev_from_dev(dev);
-	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	char *bufpos = buf;
 	unsigned long apid;
 	unsigned long apqi;
 	unsigned long apid1;
 	unsigned long apqi1;
-	unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
-	unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+	unsigned long napm_bits = matrix->apm_max + 1;
+	unsigned long naqm_bits = matrix->aqm_max + 1;
 	int nchars = 0;
 	int n;
 
-	apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
-	apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
-	mutex_lock(&matrix_dev->lock);
+	apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+	apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
 
 	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
-		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
-			for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+			for_each_set_bit_inv(apqi, matrix->aqm,
 					     naqm_bits) {
 				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
 					    apqi);
@@ -1104,25 +1099,55 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
 			}
 		}
 	} else if (apid1 < napm_bits) {
-		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
 			n = sprintf(bufpos, "%02lx.\n", apid);
 			bufpos += n;
 			nchars += n;
 		}
 	} else if (apqi1 < naqm_bits) {
-		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+		for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
 			n = sprintf(bufpos, ".%04lx\n", apqi);
 			bufpos += n;
 			nchars += n;
 		}
 	}
 
+	return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	ssize_t nchars;
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+	mutex_lock(&matrix_dev->lock);
+	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
 	mutex_unlock(&matrix_dev->lock);
 
 	return nchars;
 }
 static DEVICE_ATTR_RO(matrix);
 
+static ssize_t guest_matrix_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	ssize_t nchars;
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+		return -ENODEV;
+
+	mutex_lock(&matrix_dev->lock);
+	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+	mutex_unlock(&matrix_dev->lock);
+
+	return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_assign_adapter.attr,
 	&dev_attr_unassign_adapter.attr,
@@ -1132,6 +1157,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_unassign_control_domain.attr,
 	&dev_attr_control_domains.attr,
 	&dev_attr_matrix.attr,
+	&dev_attr_guest_matrix.attr,
 	NULL,
 };
 
-- 
2.21.1


* [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (6 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-22 20:30   ` kernel test robot
  2020-10-28 13:57   ` Halil Pasic
  2020-10-22 17:12 ` [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

In response to the probe or remove of a queue device, if a KVM guest is
using the matrix mdev to which the APQN of the queue device is assigned,
the vfio_ap device driver must respond accordingly. In an ideal world, the
queue device being probed would be hot plugged into the guest. Likewise,
the queue corresponding to the queue device being removed would
be hot unplugged from the guest. Unfortunately, the AP architecture
precludes plugging or unplugging individual queues. We must also
consider the fact that the Linux device model precludes us from passing a
queue device through to a KVM guest that is not bound to the driver
facilitating the pass-through. Consequently, we are left with the choice of
plugging/unplugging the adapter or the domain. In the latter case, this
would result in taking access to the domain away for each adapter the
guest is using. In either case, the operation will alter a KVM guest's
access to one or more queues, so let's plug/unplug the adapter on
bind/unbind of the queue device since this corresponds to the hardware
entity that may be physically plugged/unplugged - i.e., a domain is not
a piece of hardware.

Example:
=======
Queue devices bound to vfio_ap device driver:
   04.0004
   04.0047
   04.0054

   05.0004
   05.0047

Adapters and domains assigned to matrix mdev:
   Adapters  Domains  -> Queues
   04        0004        04.0004
   05        0047        04.0047
             0054        04.0054
                         05.0004
                         05.0047
                         05.0054

KVM guest matrix at startup:
   Adapters  Domains  -> Queues
   04        0004        04.0004
             0047        04.0047
             0054        04.0054

   Adapter 05 is filtered because queue 05.0054 is not bound.

KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
   Adapters  Domains  -> Queues
   04        0004        04.0004
   05        0047        04.0047
             0054        04.0054
                         05.0004
                         05.0047
                         05.0054

   All queues assigned to the matrix mdev are now bound.

KVM guest matrix after queue 04.0004 is unbound:

   Adapters  Domains  -> Queues
   05        0004        05.0004
             0047        05.0047
             0054        05.0054

   Adapter 04 is filtered because 04.0004 is no longer bound.
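The adapter-filtering rule in the example can be expressed as a short
user-space sketch: an assigned adapter stays in the guest's matrix only if
every queue formed with the assigned domains is bound to vfio_ap. This is
an illustration under stated assumptions. The boolean arrays and all
sketch_* names are hypothetical, and the sketch omits the additional step
in vfio_ap_mdev_filter_guest_matrix() that drops APIDs and APQIs not
present in the host's AP configuration.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_IDS 256

/* Hypothetical stand-in for the apm/aqm masks of an ap_matrix. */
struct sketch_cfg {
	bool apm[SKETCH_MAX_IDS];	/* adapters */
	bool aqm[SKETCH_MAX_IDS];	/* domains */
};

/* sketch_bound[apid][apqi]: queue apid.apqi is bound to vfio_ap. */
static bool sketch_bound[SKETCH_MAX_IDS][SKETCH_MAX_IDS];

/* Copy the mdev's matrix, then drop any adapter with an unbound queue. */
static void sketch_filter_adapters(const struct sketch_cfg *mdev,
				   struct sketch_cfg *guest)
{
	*guest = *mdev;
	for (int apid = 0; apid < SKETCH_MAX_IDS; apid++) {
		if (!mdev->apm[apid])
			continue;
		for (int apqi = 0; apqi < SKETCH_MAX_IDS; apqi++) {
			if (mdev->aqm[apqi] && !sketch_bound[apid][apqi]) {
				guest->apm[apid] = false;
				break;
			}
		}
	}
}
```

Replaying the example: with 05.0054 unbound, adapter 05 is filtered;
binding 05.0054 restores it; unbinding 04.0004 then filters adapter 04
while all three domains remain in the guest's matrix.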

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 158 +++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 7bad70d7bcef..5b34bc8fca31 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
+{
+	bitmap_clear(matrix->apm, 0, AP_DEVICES);
+	bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
+	bitmap_clear(matrix->adm, 0, AP_DOMAINS);
+}
+
 static void vfio_ap_matrix_init(struct ap_config_info *info,
 				struct ap_matrix *matrix)
 {
@@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
+					struct ap_matrix *matrix2)
+{
+	return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
+		bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
+		bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
+}
+
+/**
+ * vfio_ap_mdev_filter_matrix
+ *
+ * Filters the matrix of adapters, domains, and control domains assigned to
+ * a matrix mdev's AP configuration and stores the result in the shadow copy of
+ * the APCB used to supply a KVM guest's AP configuration.
+ *
+ * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ *
+ * Returns true if filtering has changed the shadow copy of the APCB used
+ * to supply a KVM guest's AP configuration; otherwise, returns false.
+ */
+static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
+{
+	struct ap_matrix shadow_apcb;
+	unsigned long apid, apqi, apqn;
+
+	memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		/*
+		 * If the APID is not assigned to the host AP configuration,
+		 * we can not assign it to the guest's AP configuration
+		 */
+		if (!test_bit_inv(apid,
+				  (unsigned long *)matrix_dev->info.apm)) {
+			clear_bit_inv(apid, shadow_apcb.apm);
+			continue;
+		}
+
+		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+				     AP_DOMAINS) {
+			/*
+			 * If the APQI is not assigned to the host AP
+			 * configuration, then it can not be assigned to the
+			 * guest's AP configuration
+			 */
+			if (!test_bit_inv(apqi, (unsigned long *)
+					  matrix_dev->info.aqm)) {
+				clear_bit_inv(apqi, shadow_apcb.aqm);
+				continue;
+			}
+
+			/*
+			 * If the APQN is not bound to the vfio_ap device
+			 * driver, then we can't assign it to the guest's
+			 * AP configuration. The AP architecture won't
+			 * allow filtering of a single APQN, so let's filter
+			 * the APID.
+			 */
+			apqn = AP_MKQID(apid, apqi);
+			if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+				clear_bit_inv(apid, shadow_apcb.apm);
+				break;
+			}
+		}
+
+		/*
+		 * If all APIDs have been cleared, then clear the APQIs from the
+		 * shadow APCB and quit filtering.
+		 */
+		if (bitmap_empty(shadow_apcb.apm, AP_DEVICES)) {
+			if (!bitmap_empty(shadow_apcb.aqm, AP_DOMAINS))
+				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
+
+			break;
+		}
+
+		/*
+		 * If all APQIs have been cleared, then clear the APIDs from the
+		 * shadow APCB and quit filtering.
+		 */
+		if (bitmap_empty(shadow_apcb.aqm, AP_DOMAINS)) {
+			if (!bitmap_empty(shadow_apcb.apm, AP_DEVICES))
+				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
+
+			break;
+		}
+	}
+
+	if (vfio_ap_mdev_matrixes_equal(&matrix_mdev->shadow_apcb,
+					&shadow_apcb))
+		return false;
+
+	memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
+	       sizeof(struct ap_matrix));
+
+	return true;
+}
+
 enum qlink_type {
 	LINK_APID,
 	LINK_APQI,
@@ -1256,9 +1361,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
 		return NOTIFY_DONE;
 
-	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
-	       sizeof(matrix_mdev->shadow_apcb));
-	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
 	return NOTIFY_OK;
 }
@@ -1369,6 +1473,18 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 		matrix_mdev->kvm = NULL;
 	}
 
+	/*
+	 * The shadow_apcb must be cleared.
+	 *
+	 * The shadow_apcb is committed to the guest only if the masks resulting
+	 * from filtering the matrix_mdev->matrix differ from the masks in the
+	 * shadow_apcb. Consequently, if we don't clear the masks here and a
+	 * guest is subsequently started, the filtering may not result in a
+	 * change to the shadow_apcb, so it will not get committed to the guest;
+	 * in that case, the guest will be left without any queues.
+	 */
+	vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
+
 	mutex_unlock(&matrix_dev->lock);
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
@@ -1466,6 +1582,16 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
 	}
 }
 
+static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
+{
+
+	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+		return;
+
+	if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
+		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+}
+
 int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 {
 	struct vfio_ap_queue *q;
@@ -1482,11 +1608,36 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 	q->apqn = queue->qid;
 	q->saved_isc = VFIO_AP_ISC_INVALID;
 	vfio_ap_queue_link_mdev(q);
+	vfio_ap_mdev_hot_plug_queue(q);
 	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
 
+void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
+{
+	unsigned long apid = AP_QID_CARD(q->apqn);
+
+	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+		return;
+
+	/*
+	 * If the APID is assigned to the guest, then let's
+	 * go ahead and unplug the adapter since the
+	 * architecture does not provide a means to unplug
+	 * an individual queue.
+	 */
+	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
+		clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
+
+		if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
+			bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
+				     AP_DOMAINS);
+
+		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+	}
+}
+
 void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 {
 	struct vfio_ap_queue *q;
@@ -1497,6 +1648,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 
 	mutex_lock(&matrix_dev->lock);
 	q = dev_get_drvdata(&queue->ap_dev.device);
+	vfio_ap_mdev_hot_unplug_queue(q);
 	dev_set_drvdata(&queue->ap_dev.device, NULL);
 	apid = AP_QID_CARD(q->apqn);
 	apqi = AP_QID_QUEUE(q->apqn);
-- 
2.21.1



* [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (7 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-28 15:03   ` Halil Pasic
  2020-10-22 17:12 ` [PATCH v11 10/14] s390/vfio-ap: allow hot plug/unplug of AP resources using " Tony Krowiak
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The current implementation does not allow assignment of an AP adapter or
domain to an mdev device unless each APQN resulting from the assignment
references an AP queue device that is bound to the vfio_ap device driver.
This patch allows assignment of AP resources to the matrix mdev as long as
the APQNs resulting from the assignment:
   1. Are not reserved by the AP bus for use by the zcrypt device drivers.
   2. Are not assigned to another matrix mdev.

The rationale behind this is twofold:
   1. The AP architecture does not preclude assigning APQNs that are not
      available to the system to an AP configuration.
   2. APQNs that do not reference a queue device bound to the vfio_ap
      device driver will not be assigned to the guest's CRYCB, so the
      guest will not get access to queues not bound to the vfio_ap driver.
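The two remaining checks can be modelled as follows. This is a hypothetical userspace sketch with invented names; it assumes that, as with the AP bus's apmask/aqmask, an APQN is reserved for the default (zcrypt) drivers only when both its APID bit and its APQI bit are set, and that a sharing conflict is reported as -EADDRINUSE. In the patch these checks are performed by ap_apqn_in_matrix_owned_by_def_drv() and vfio_ap_mdev_verify_no_sharing(); whether a queue device is bound to vfio_ap is deliberately no longer checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the driver uses 256-bit bitmaps instead. */
#define NAPID 16
#define NAPQI 16

struct model_matrix {
	bool apm[NAPID];	/* adapters (APIDs) */
	bool aqm[NAPQI];	/* domains (APQIs) */
};

/* Assumed apmask/aqmask semantics: reserved only if both bits are set. */
static bool model_owned_by_def_drv(const struct model_matrix *perms,
				   int apid, int apqi)
{
	return perms->apm[apid] && perms->aqm[apqi];
}

/*
 * Validate the matrix an mdev would have after an assignment against the
 * host's apmask/aqmask (perms) and the matrixes of all other mdevs.
 */
static int model_validate_masks(const struct model_matrix *candidate,
				const struct model_matrix *perms,
				const struct model_matrix *others,
				size_t nothers)
{
	assert(candidate && perms && (others || nothers == 0));

	/* 1. No APQN in the Cartesian product may be reserved for zcrypt. */
	for (int apid = 0; apid < NAPID; apid++)
		for (int apqi = 0; apqi < NAPQI; apqi++)
			if (candidate->apm[apid] && candidate->aqm[apqi] &&
			    model_owned_by_def_drv(perms, apid, apqi))
				return -EADDRNOTAVAIL;

	/*
	 * 2. No APQN may belong to another mdev. Two matrixes share an APQN
	 *    exactly when they have at least one APID and one APQI in common.
	 */
	for (size_t i = 0; i < nothers; i++) {
		bool apid_common = false, apqi_common = false;

		for (int apid = 0; apid < NAPID; apid++)
			apid_common |= candidate->apm[apid] && others[i].apm[apid];
		for (int apqi = 0; apqi < NAPQI; apqi++)
			apqi_common |= candidate->aqm[apqi] && others[i].aqm[apqi];
		if (apid_common && apqi_common)
			return -EADDRINUSE;
	}

	/* Queue binding is deliberately not checked. */
	return 0;
}
```

Because binding is no longer part of the check, an APQN such as 03.0002 can be assigned even if no queue device for it exists yet; it simply stays out of the guest's CRYCB until its queue is bound.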

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 197 ++++--------------------------
 1 file changed, 26 insertions(+), 171 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 5b34bc8fca31..c2c6dcec8829 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -427,122 +427,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
 	NULL,
 };
 
-struct vfio_ap_queue_reserved {
-	unsigned long *apid;
-	unsigned long *apqi;
-	bool reserved;
-};
-
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
-{
-	struct vfio_ap_queue_reserved *qres = data;
-	struct ap_queue *ap_queue = to_ap_queue(dev);
-	ap_qid_t qid;
-	unsigned long id;
-
-	if (qres->apid && qres->apqi) {
-		qid = AP_MKQID(*qres->apid, *qres->apqi);
-		if (qid == ap_queue->qid)
-			qres->reserved = true;
-	} else if (qres->apid && !qres->apqi) {
-		id = AP_QID_CARD(ap_queue->qid);
-		if (id == *qres->apid)
-			qres->reserved = true;
-	} else if (!qres->apid && qres->apqi) {
-		id = AP_QID_QUEUE(ap_queue->qid);
-		if (id == *qres->apqi)
-			qres->reserved = true;
-	} else {
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-					 unsigned long *apqi)
-{
-	int ret;
-	struct vfio_ap_queue_reserved qres;
-
-	qres.apid = apid;
-	qres.apqi = apqi;
-	qres.reserved = false;
-
-	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				     &qres, vfio_ap_has_queue);
-	if (ret)
-		return ret;
-
-	if (qres.reserved)
-		return 0;
-
-	return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apid)
-{
-	int ret;
-	unsigned long apqi;
-	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(&apid, NULL);
-
-	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 #define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
 			 "already assigned to %s"
 
@@ -608,6 +492,16 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
+				       unsigned long *mdev_apm,
+				       unsigned long *mdev_aqm)
+{
+	if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
+		return -EADDRNOTAVAIL;
+
+	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
+}
+
 static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
 					struct ap_matrix *matrix2)
 {
@@ -840,33 +734,21 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (apid > matrix_mdev->matrix.apm_max)
 		return -ENODEV;
 
-	/*
-	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
-	 * number (APID). The bits in the mask, from most significant to least
-	 * significant bit, correspond to APIDs 0-255.
-	 */
-	mutex_lock(&matrix_dev->lock);
-
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
-	if (ret)
-		goto done;
-
 	memset(apm, 0, sizeof(apm));
 	set_bit_inv(apid, apm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
-					     matrix_mdev->matrix.aqm);
-	if (ret)
-		goto done;
-
+	mutex_lock(&matrix_dev->lock);
+	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
+					  matrix_mdev->matrix.aqm);
+	if (ret) {
+		mutex_unlock(&matrix_dev->lock);
+		return ret;
+	}
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
-	ret = count;
-
-done:
 	mutex_unlock(&matrix_dev->lock);
 
-	return ret;
+	return count;
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
@@ -916,26 +798,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apqi)
-{
-	int ret;
-	unsigned long apid;
-	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 /**
  * assign_domain_store
  *
@@ -989,28 +851,21 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (apqi > max_apqi)
 		return -ENODEV;
 
-	mutex_lock(&matrix_dev->lock);
-
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
-	if (ret)
-		goto done;
-
 	memset(aqm, 0, sizeof(aqm));
 	set_bit_inv(apqi, aqm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
-					     matrix_mdev->matrix.apm, aqm);
-	if (ret)
-		goto done;
-
+	mutex_lock(&matrix_dev->lock);
+	ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
+					  aqm);
+	if (ret) {
+		mutex_unlock(&matrix_dev->lock);
+		return ret;
+	}
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
-	ret = count;
-
-done:
 	mutex_unlock(&matrix_dev->lock);
 
-	return ret;
+	return count;
 }
 static DEVICE_ATTR_WO(assign_domain);
 
-- 
2.21.1



* [PATCH v11 10/14] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (8 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-22 17:12 ` [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's hot plug/unplug adapters, domains and control domains assigned to or
unassigned from an AP matrix mdev device while it is in use by a guest per
the following:

* Hot plug AP adapter:

  When the APID of an adapter is assigned to a matrix mdev in use by a KVM
  guest, the adapter will be hot plugged into the KVM guest as long as each
  APQN derived from the Cartesian product of the APID being assigned and
  the APQIs already assigned to the matrix mdev references a queue device
  bound to the vfio_ap device driver.

* Hot unplug adapter:

  When the APID of an adapter is unassigned from a matrix mdev in use by a
  KVM guest, the adapter will be hot unplugged from the KVM guest.

* Hot plug domain:

  When the APQI of a domain is assigned to a matrix mdev in use by a KVM
  guest, the domain will be hot plugged into the KVM guest as long as each
  APQN derived from the Cartesian product of the APQI being assigned and
  the APIDs already assigned to the matrix mdev references a queue device
  bound to the vfio_ap device driver.

* Hot unplug domain:

  When the APQI of a domain is unassigned from a matrix mdev in use by a
  KVM guest, the domain will be hot unplugged from the KVM guest.

* Hot plug control domain:

  When the domain number of a control domain is assigned to a matrix mdev
  in use by a KVM guest, the control domain will be hot plugged into the
  KVM guest. The AP architecture ensures a guest only gets access to a
  control domain that is in the host's AP configuration, so there is no
  risk in hot plugging it; if the control domain is not yet in the host
  configuration, it will automatically become available to the guest when
  it is added.

* Hot unplug control domain:

  When the domain number of a control domain is unassigned from a matrix
  mdev in use by a KVM guest, the control domain will be hot unplugged
  from the KVM guest.
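The adapter unplug and control domain plug paths can be sketched in the same hypothetical userspace style. `model_unassign_guest_apid` loosely mirrors vfio_ap_mdev_unassign_guest_apid() and `model_assign_guest_cdom` mirrors vfio_ap_mdev_assign_guest_cdom() from the diff below; the names, the `guest_running` flag (standing in for vfio_ap_mdev_has_crycb()) and the bool arrays are assumptions of the sketch. A `true` return marks where the driver would call vfio_ap_mdev_commit_shadow_apcb().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, scaled-down shadow APCB; the driver uses 256-bit bitmaps. */
#define NAPID 16
#define NAPQI 16

struct model_apcb {
	bool apm[NAPID];	/* adapters */
	bool aqm[NAPQI];	/* usage domains */
	bool adm[NAPQI];	/* control domains */
};

static bool all_clear(const bool *bits, int n)
{
	for (int i = 0; i < n; i++)
		if (bits[i])
			return false;
	return true;
}

/* Returns true when the shadow APCB changed and must be committed. */
static bool model_unassign_guest_apid(struct model_apcb *shadow,
				      bool guest_running, int apid)
{
	assert(apid >= 0 && apid < NAPID);
	if (!guest_running || !shadow->apm[apid])
		return false;

	shadow->apm[apid] = false;
	/* No adapters left means no queues, so drop the domains as well. */
	if (all_clear(shadow->apm, NAPID))
		memset(shadow->aqm, 0, sizeof(shadow->aqm));
	return true;
}

/* Control domains are not filtered: they go straight into the shadow APCB. */
static bool model_assign_guest_cdom(struct model_apcb *shadow,
				    bool guest_running, int domid)
{
	assert(domid >= 0 && domid < NAPQI);
	if (!guest_running || shadow->adm[domid])
		return false;

	shadow->adm[domid] = true;
	return true;
}
```

Returning a "changed" flag instead of committing directly lets each sysfs store function commit the shadow APCB at most once, and only when the guest's view actually differs.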

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 148 ++++++++++++++++++++++++------
 1 file changed, 119 insertions(+), 29 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c2c6dcec8829..dae1fba41941 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -517,12 +517,18 @@ static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
  * a matrix mdev's AP configuration and stores the result in the shadow copy of
  * the APCB used to supply a KVM guest's AP configuration.
  *
+ * Note: Filtering is applied only to adapters and domains. Changes to control
+ *	 domains will always be reflected in the shadow APCB.
+ *
  * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ * @filter_apid:  indicates whether APIDs (true) or APQIs (false) shall be
+ *		  filtered
  *
  * Returns true if filtering has changed the shadow copy of the APCB used
  * to supply a KVM guest's AP configuration; otherwise, returns false.
  */
-static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
+					    bool filter_apid)
 {
 	struct ap_matrix shadow_apcb;
 	unsigned long apid, apqi, apqn;
@@ -561,9 +567,15 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
 			 * the APID.
 			 */
 			apqn = AP_MKQID(apid, apqi);
+
 			if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
-				clear_bit_inv(apid, shadow_apcb.apm);
-				break;
+				if (filter_apid) {
+					clear_bit_inv(apid, shadow_apcb.apm);
+					break;
+				}
+
+				clear_bit_inv(apqi, shadow_apcb.aqm);
+				continue;
 			}
 		}
 
@@ -723,10 +735,6 @@ static ssize_t assign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -746,12 +754,44 @@ static ssize_t assign_adapter_store(struct device *dev,
 	}
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
+
+	if (vfio_ap_mdev_has_crycb(matrix_mdev))
+		if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
+			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
+static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long apid)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
+			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+			/*
+			 * If there are no APIDs assigned to the guest, then
+			 * the guest will not have access to any queues, so
+			 * let's also go ahead and unassign the APQIs. Keeping
+			 * them around may yield unpredictable results during
+			 * a probe that is not related to a host AP
+			 * configuration change (i.e., an AP adapter is
+			 * configured online).
+			 */
+			if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
+					 AP_DEVICES))
+				bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
+					     AP_DOMAINS);
+
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * unassign_adapter_store
  *
@@ -778,10 +818,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -792,6 +828,9 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
+
+	if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -841,10 +880,6 @@ static ssize_t assign_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-	/* If the guest is running, disallow assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -863,12 +898,43 @@ static ssize_t assign_domain_store(struct device *dev,
 	}
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
+
+	if (vfio_ap_mdev_has_crycb(matrix_mdev))
+		if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, false))
+			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_domain);
 
+static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long apqi)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+			clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+			/*
+			 * If there are no APQIs assigned to the guest, then
+			 * the guest will not have access to any queues, so
+			 * let's also go ahead and unassign the APIDs. Keeping
+			 * them around may yield unpredictable results during
+			 * a probe that is not related to a host AP
+			 * configuration change (i.e., an AP adapter is
+			 * configured online).
+			 */
+			if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
+					 AP_DOMAINS))
+				bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
+					     AP_DEVICES);
+
+			return true;
+		}
+	}
+
+	return false;
+}
 
 /**
  * unassign_domain_store
@@ -896,10 +962,6 @@ static ssize_t unassign_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -910,12 +972,29 @@ static ssize_t unassign_domain_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
+
+	if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
+static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
+					   unsigned long domid)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+			set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * assign_control_domain_store
  *
@@ -941,10 +1020,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &id);
 	if (ret)
 		return ret;
@@ -959,12 +1034,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	 */
 	mutex_lock(&matrix_dev->lock);
 	set_bit_inv(id, matrix_mdev->matrix.adm);
+	if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_control_domain);
 
+static bool
+vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
+				 unsigned long domid)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+			clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * unassign_control_domain_store
  *
@@ -991,10 +1083,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
 
-	/* If the guest is running, disallow un-assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &domid);
 	if (ret)
 		return ret;
@@ -1003,6 +1091,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv(domid, matrix_mdev->matrix.adm);
+	if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -1216,7 +1306,7 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
 		return NOTIFY_DONE;
 
-	if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
+	if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
 		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
 	return NOTIFY_OK;
@@ -1443,7 +1533,7 @@ static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
 	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
 		return;
 
-	if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
+	if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev, true))
 		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
 }
 
-- 
2.21.1



* [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (9 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 10/14] s390/vfio-ap: allow hot plug/unplug of AP resources using " Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-27 17:28   ` Harald Freudenberger
  2020-10-22 17:12 ` [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification Tony Krowiak
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

  void (*on_config_changed)(struct ap_config_info *new_config_info,
                            struct ap_config_info *old_config_info);

     This callback is invoked at the start of the AP bus scan
     function when it determines that the host AP configuration information
     has changed since the previous scan. This is done by keeping a copy
     of the previous QCI info struct, fetching the current one and
     comparing the two. If there is any difference, the callback is
     invoked.

     Note that when the AP bus scan detects that AP adapters, domains or
     control domains have been removed from the host's AP configuration, it
     will remove the associated devices from the AP bus subsystem's device
     model. This callback gives the device driver a chance to respond to
     the removal of the AP devices from the host configuration prior to
     calling the device driver's remove callback. The primary purpose of
     this callback is to allow the vfio_ap driver to do a bulk unplug of
     all affected adapters, domains and control domains from affected
     guests rather than unplugging them one at a time when the remove
     callback is invoked.

  void (*on_scan_complete)(struct ap_config_info *new_config_info,
                           struct ap_config_info *old_config_info);

     The on_scan_complete callback is invoked after the AP bus scan is
     complete if the host AP configuration data has changed.

     Note that when the AP bus scan detects that adapters, domains or
     control domains have been added to the host's configuration, it will
     create new devices in the AP bus subsystem's device model. The primary
     purpose of this callback is to allow the vfio_ap driver to do a bulk
     plug of all affected adapters, domains and control domains into
     affected guests rather than plugging them one at a time when the
     probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
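A minimal userspace model of the change detection and the two notifications might look like this. All names are hypothetical; the patch implements the idea with ap_config_changed(), notify_config_changed() and notify_scan_complete(), which additionally hold ap_config_lock, iterate the drivers with bus_for_each_drv() and pin each driver module with try_module_get(). None of that is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ap_config_info: 256-bit masks. */
struct model_qci {
	unsigned char apm[32];
	unsigned char aqm[32];
	unsigned char adm[32];
};

typedef void (*model_cb)(const struct model_qci *cur,
			 const struct model_qci *old);

struct model_driver {
	model_cb on_config_changed;	/* may be NULL */
	model_cb on_scan_complete;	/* may be NULL */
};

static struct model_qci qci_cur, qci_old;

/* Keep the previous QCI data, store the newly fetched one, compare them. */
static bool model_config_changed(const struct model_qci *fetched)
{
	memcpy(&qci_old, &qci_cur, sizeof(qci_old));
	memcpy(&qci_cur, fetched, sizeof(qci_cur));
	return memcmp(&qci_cur, &qci_old, sizeof(qci_cur)) != 0;
}

static void model_scan_bus(const struct model_qci *fetched,
			   const struct model_driver *drvs, size_t ndrvs)
{
	bool changed;

	assert(fetched && (drvs || ndrvs == 0));
	changed = model_config_changed(fetched);

	/* Before devices are removed: lets a driver bulk-unplug resources. */
	for (size_t i = 0; changed && i < ndrvs; i++)
		if (drvs[i].on_config_changed)
			drvs[i].on_config_changed(&qci_cur, &qci_old);

	/* ... per-adapter scan would remove stale and create new devices ... */

	/* After new devices exist: lets a driver bulk-plug resources. */
	for (size_t i = 0; changed && i < ndrvs; i++)
		if (drvs[i].on_scan_complete)
			drvs[i].on_scan_complete(&qci_cur, &qci_old);
}

/* Example driver that only counts notifications. */
static int n_changed, n_complete;

static void count_changed(const struct model_qci *cur,
			  const struct model_qci *old)
{
	(void)cur;
	(void)old;
	n_changed++;
}

static void count_complete(const struct model_qci *cur,
			   const struct model_qci *old)
{
	(void)cur;
	(void)old;
	n_complete++;
}
```

A scan that fetches unchanged QCI data notifies nobody; a scan that sees, for example, a new adapter bit calls on_config_changed() before and on_scan_complete() after the per-adapter work, each exactly once per driver.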

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.c          | 88 ++++++++++++++++++++++++++-
 drivers/s390/crypto/ap_bus.h          | 12 ++++
 drivers/s390/crypto/vfio_ap_drv.c     |  2 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 11 ++--
 drivers/s390/crypto/vfio_ap_private.h |  2 +-
 5 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 998e61cd86d9..5b94956ef6bc 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -73,8 +73,10 @@ struct ap_perms ap_perms;
 EXPORT_SYMBOL(ap_perms);
 DEFINE_MUTEX(ap_perms_mutex);
 EXPORT_SYMBOL(ap_perms_mutex);
+DEFINE_MUTEX(ap_config_lock);
 
 static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
 
 /*
  * AP bus related debug feature things.
@@ -1420,6 +1422,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
 		&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
 }
 
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+
+	if (try_module_get(drv->owner)) {
+		if (ap_drv->on_config_changed)
+			ap_drv->on_config_changed(ap_qci_info,
+						  ap_qci_info_old);
+		module_put(drv->owner);
+	}
+
+	return 0;
+}
+
+/* Notify all drivers about an qci config change */
+static inline void notify_config_changed(void)
+{
+	bus_for_each_drv(&ap_bus_type, NULL, NULL,
+			 __drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+
+	if (try_module_get(drv->owner)) {
+		if (ap_drv->on_scan_complete)
+			ap_drv->on_scan_complete(ap_qci_info,
+						 ap_qci_info_old);
+		module_put(drv->owner);
+	}
+
+	return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+	bus_for_each_drv(&ap_bus_type, NULL, NULL,
+			 __drv_notify_scan_complete);
+}
+
+
+
 /*
  * Helper function for ap_scan_bus().
  * Remove card device and associated queue devices.
@@ -1696,15 +1744,45 @@ static inline void ap_scan_adapter(int ap)
 	put_device(&ac->ap_dev.device);
 }
 
+static int ap_config_changed(void)
+{
+	int cfg_chg = 0;
+
+	if (ap_qci_info) {
+		if (!ap_qci_info_old) {
+			ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
+						  GFP_KERNEL);
+			if (!ap_qci_info_old)
+				return 0;
+		} else {
+			memcpy(ap_qci_info_old, ap_qci_info,
+			       sizeof(struct ap_config_info));
+		}
+		ap_fetch_qci_info(ap_qci_info);
+		cfg_chg = memcmp(ap_qci_info,
+				 ap_qci_info_old,
+				 sizeof(struct ap_config_info)) != 0;
+	}
+
+	return cfg_chg;
+}
+
 /**
  * ap_scan_bus(): Scan the AP bus for new devices
  * Runs periodically, workqueue timer (ap_config_time)
  */
 static void ap_scan_bus(struct work_struct *unused)
 {
-	int ap;
+	int ap, config_changed = 0;
+
+	mutex_lock(&ap_config_lock);
 
-	ap_fetch_qci_info(ap_qci_info);
+	/* config change notify */
+	config_changed = ap_config_changed();
+	if (config_changed)
+		notify_config_changed();
+	memcpy(ap_qci_info_old, ap_qci_info,
+	       sizeof(struct ap_config_info));
 	ap_select_domain();
 
 	AP_DBF_DBG("%s running\n", __func__);
@@ -1713,6 +1791,12 @@ static void ap_scan_bus(struct work_struct *unused)
 	for (ap = 0; ap <= ap_max_adapter_id; ap++)
 		ap_scan_adapter(ap);
 
+	/* scan complete notify */
+	if (config_changed)
+		notify_scan_complete();
+
+	mutex_unlock(&ap_config_lock);
+
 	/* check if there is at least one queue available with default domain */
 	if (ap_domain_index >= 0) {
 		struct device *dev =
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 6ce154d924d3..c021ea5121a9 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -146,6 +146,18 @@ struct ap_driver {
 	int (*probe)(struct ap_device *);
 	void (*remove)(struct ap_device *);
 	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
+	/*
+	 * Called at the start of the ap bus scan function when
+	 * the crypto config information (qci) has changed.
+	 */
+	void (*on_config_changed)(struct ap_config_info *new_config_info,
+				  struct ap_config_info *old_config_info);
+	/*
+	 * Called at the end of the ap bus scan function when
+	 * the crypto config information (qci) has changed.
+	 */
+	void (*on_scan_complete)(struct ap_config_info *new_config_info,
+				 struct ap_config_info *old_config_info);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 8934471b7944..f06e19754de3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -87,7 +87,7 @@ static int vfio_ap_matrix_dev_create(void)
 
 	/* Fill in config info via PQAP(QCI), if available */
 	if (test_facility(12)) {
-		ret = ap_qci(&matrix_dev->info);
+		ret = ap_qci(&matrix_dev->config_info);
 		if (ret)
 			goto matrix_alloc_err;
 	}
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index dae1fba41941..c4ea80ec8599 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -354,8 +354,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	}
 
 	matrix_mdev->mdev = mdev;
-	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
-	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
+	vfio_ap_matrix_init(&matrix_dev->config_info,
+			    &matrix_mdev->shadow_apcb);
 	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -540,8 +541,8 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
 		 * If the APID is not assigned to the host AP configuration,
 		 * we can not assign it to the guest's AP configuration
 		 */
-		if (!test_bit_inv(apid,
-				  (unsigned long *)matrix_dev->info.apm)) {
+		if (!test_bit_inv(apid, (unsigned long *)
+				  matrix_dev->config_info.apm)) {
 			clear_bit_inv(apid, shadow_apcb.apm);
 			continue;
 		}
@@ -554,7 +555,7 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
 			 * guest's AP configuration
 			 */
 			if (!test_bit_inv(apqi, (unsigned long *)
-					  matrix_dev->info.aqm)) {
+					  matrix_dev->config_info.aqm)) {
 				clear_bit_inv(apqi, shadow_apcb.aqm);
 				continue;
 			}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index fc8634cee485..5065f0367ea2 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -40,7 +40,7 @@
 struct ap_matrix_dev {
 	struct device device;
 	atomic_t available_instances;
-	struct ap_config_info info;
+	struct ap_config_info config_info;
 	struct list_head mdev_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
-- 
2.21.1



* [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (10 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-22 21:17   ` kernel test robot
  2020-11-03  9:48   ` kernel test robot
  2020-10-22 17:12 ` [PATCH v11 13/14] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
  2020-10-22 17:12 ` [PATCH v11 14/14] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
  13 siblings, 2 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The motivation for config change notification is to enable the vfio_ap
device driver to handle hot plug/unplug of AP queues for a KVM guest as a
bulk operation. For example, if a new APID is dynamically assigned to the
host configuration, then a queue device will be created for each APQN that
can be formulated from the new APID and all APQIs already assigned to the
host configuration. Each of these new queue devices will be bound to its
respective driver one at a time, as it is created. In the case of the
vfio_ap driver, if the APQN of the queue device being bound to the driver
is assigned to a matrix mdev in use by a KVM guest, the queue will be hot
plugged into the guest if possible. Given that the AP architecture allows
for 256 adapters and 256 domains (up to 65,536 APQNs), the vfio_ap
driver's probe/remove callbacks may be invoked a very large number of
times when the host configuration changes. Keep in mind that in order to
plug/unplug an AP queue for a guest, the guest's VCPUs must be suspended,
the guest's AP configuration must be updated and the VCPUs must then be
resumed. Doing this each time the probe or remove callback is invoked,
with hundreds or thousands of queues to be probed or removed, would be
very inefficient and could have a large impact on guest performance. The
config change notification allows the driver to make all of the changes
to the guest in a single operation.

This patch implements the vfio_ap driver's handler for the
on_config_changed callback, which the AP bus invokes to notify the AP
device drivers that the host AP configuration has changed (i.e., adapters,
domains and/or control domains have been added to or removed from the
host AP configuration).

Adapters added to host configuration:
* The APIDs of the adapters added will be stored in a bitmap contained
  within the struct representing the matrix device which is the parent
  device of all matrix mediated devices.
* When a queue is probed, if the APID of the queue being probed is
  contained in the bitmap of adapters added, the queue hot plug operation
  will be skipped until the AP bus notifies the driver that its scan
  operation has completed.

Domains added to host configuration:
* The APQIs of the domains added will be stored in a bitmap contained
  within the struct representing the matrix device which is the parent
  device of all matrix mediated devices.
* When a queue is probed, if the APQI of the queue being probed is
  contained in the bitmap of domains added, the queue hot plug operation
  will be skipped until the AP bus notifies the driver that its scan
  operation has completed.
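A minimal user-space sketch of how the added APIDs/APQIs are derived and
how the probe-time deferral check uses them, assuming 256-bit masks with
the MSB-first bit numbering that test_bit_inv() uses on s390. mask256,
mask_set_inv(), mask_test_inv(), mask_andnot() and defer_hot_plug() are
hypothetical helpers standing in for DECLARE_BITMAP, set_bit_inv(),
test_bit_inv() and bitmap_andnot(); the logic follows
vfio_ap_mdev_on_cfg_add() and the check added to
vfio_ap_mdev_hot_plug_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_BITS 256			/* AP_DEVICES == AP_DOMAINS == 256 */
#define MASK_WORDS (MASK_BITS / 64)

/* 256-bit mask, bit 0 = most significant bit of word 0 (MSB-first). */
typedef struct {
	uint64_t w[MASK_WORDS];
} mask256;

static void mask_set_inv(unsigned int nr, mask256 *m)
{
	assert(nr < MASK_BITS);
	m->w[nr / 64] |= UINT64_C(1) << (63 - nr % 64);
}

static bool mask_test_inv(unsigned int nr, const mask256 *m)
{
	assert(nr < MASK_BITS);
	return (m->w[nr / 64] >> (63 - nr % 64)) & 1;
}

/* Bits set in a but not in b, like bitmap_andnot(dst, a, b, 256). */
static mask256 mask_andnot(const mask256 *a, const mask256 *b)
{
	mask256 r;

	for (int i = 0; i < MASK_WORDS; i++)
		r.w[i] = a->w[i] & ~b->w[i];
	return r;
}

/*
 * Probe-time check: a queue whose adapter or domain was just added to the
 * host configuration is not hot plugged now; it waits for scan complete.
 */
static bool defer_hot_plug(unsigned int apid, unsigned int apqi,
			   const mask256 *ap_add, const mask256 *aq_add)
{
	return mask_test_inv(apid, ap_add) || mask_test_inv(apqi, aq_add);
}
```

For example, if the previous host APM holds adapter 04 and the current one
holds 04 and 05, then ap_add = current & ~previous contains only 05, so
probing queue 05.0047 is deferred while probing 04.0047 is not.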

Control domains added to the host configuration:
* Since control domains are not devices in the Linux device model, there is
  no concern with whether they are bound to a device driver.
* The AP architecture will mask off control domains not in the host AP
  configuration from the guest, so there is also no concern about a guest
  changing a domain to which it is not authorized.

Adapters removed from configuration:
* Each adapter removed from the host configuration will be hot unplugged
  from each guest using it.
* Each queue device with the APID identifying an adapter removed from
  the host AP configuration will be unlinked from the matrix mdev to which
  the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
  device is not linked to the matrix mdev, the hot unplug operation will
  be skipped until the vfio_ap driver is notified that the AP bus scan
  has completed.

Domains removed from configuration:
* Each domain removed from the host configuration will be hot unplugged
  from each guest using it.
* Each queue device with the APQI identifying a domain removed from
  the host AP configuration will be unlinked from the matrix mdev to which
  the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
  device is not linked to the matrix mdev, the hot unplug operation will
  be skipped until the vfio_ap driver is notified that the AP bus scan
  has completed.
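The removal side can be sketched the same way. This user-space model
follows the logic of vfio_ap_mdev_unassign_apids() in the diff below; the
types and unassign_removed_apids() are hypothetical, and MSB-first bit
numbering is assumed as above. Removed APIDs are those in the previous
host APM but not the current one; they are dropped from the shadow APCB,
and if no APID is left the APQIs are cleared as well, because an APQN
needs both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MASK_WORDS 4			/* 256 bits as 4 x 64-bit words */

typedef struct {
	uint64_t w[MASK_WORDS];
} mask256;

/* Hypothetical stand-in for the APM/AQM of a matrix mdev's shadow APCB. */
struct shadow_apcb {
	mask256 apm;
	mask256 aqm;
};

static bool mask_empty(const mask256 *m)
{
	for (int i = 0; i < MASK_WORDS; i++)
		if (m->w[i])
			return false;
	return true;
}

/*
 * Drop from the shadow APM every APID that was in the previous host APM
 * but is not in the current one. If no APID is left, drop all APQIs too.
 * Returns true if the shadow APCB changed (the guest must be updated).
 */
static bool unassign_removed_apids(struct shadow_apcb *s,
				   const mask256 *prev_apm,
				   const mask256 *cur_apm)
{
	mask256 filtered;
	bool changed = false;

	for (int i = 0; i < MASK_WORDS; i++) {
		uint64_t removed = prev_apm->w[i] & ~cur_apm->w[i];

		filtered.w[i] = s->apm.w[i] & ~removed;
		if (filtered.w[i] != s->apm.w[i])
			changed = true;
	}
	if (!changed)
		return false;

	s->apm = filtered;	/* copies exactly one 256-bit mask */
	if (mask_empty(&s->apm))
		memset(&s->aqm, 0, sizeof(s->aqm));
	return true;
}
```

Copying with struct assignment here (bitmap_copy() in kernel code) ties
the size of the copy to the mask itself. The domain case is symmetric,
with the roles of APM and AQM swapped.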

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |   3 +
 drivers/s390/crypto/vfio_ap_ops.c     | 223 +++++++++++++++++++++++++-
 drivers/s390/crypto/vfio_ap_private.h |  11 ++
 3 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index f06e19754de3..d7aa5543afef 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -90,6 +90,8 @@ static int vfio_ap_matrix_dev_create(void)
 		ret = ap_qci(&matrix_dev->config_info);
 		if (ret)
 			goto matrix_alloc_err;
+		memcpy(&matrix_dev->config_info_prev, &matrix_dev->config_info,
+		       sizeof(struct ap_config_info));
 	}
 
 	mutex_init(&matrix_dev->lock);
@@ -149,6 +151,7 @@ static int __init vfio_ap_init(void)
 	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
 	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
+	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
 	if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c4ea80ec8599..075096adbfd3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1530,8 +1530,13 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
 
 static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
 {
+	unsigned long apid = AP_QID_CARD(q->apqn);
+	unsigned long apqi = AP_QID_QUEUE(q->apqn);
 
-	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+	if ((q->matrix_mdev == NULL) ||
+	    !vfio_ap_mdev_has_crycb(q->matrix_mdev) ||
+	    test_bit_inv(apid, matrix_dev->ap_add) ||
+	    test_bit_inv(apqi, matrix_dev->aq_add))
 		return;
 
 	if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev, true))
@@ -1616,3 +1621,219 @@ bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
 
 	return in_use;
 }
+
+/**
+ * vfio_ap_mdev_unassign_apids
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apid_rem: The bitmap specifying the APIDs of the adapters removed from
+ *	      the host's AP configuration
+ *
+ * Unassigns each APID specified in @apid_rem that is assigned to the
+ * shadow APCB. Returns true if at least one APID is unassigned; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long *apid_rem)
+{
+	DECLARE_BITMAP(shadow_apm, AP_DEVICES);
+
+	/*
+	 * Get the result of filtering the APIDs removed from the host AP
+	 * configuration out of the shadow APCB
+	 */
+	bitmap_andnot(shadow_apm, matrix_mdev->shadow_apcb.apm, apid_rem,
+		      AP_DEVICES);
+
+	/*
+	 * If filtering removed any APIDs from the shadow APCB, then let's go
+	 * ahead and update the shadow APCB accordingly
+	 */
+	if (!bitmap_equal(matrix_mdev->shadow_apcb.apm, shadow_apm,
+			  AP_DEVICES)) {
+		memcpy(matrix_mdev->shadow_apcb.apm, shadow_apm,
+		       sizeof(shadow_apm));
+
+		/*
+		 * If all APIDs have been filtered from the shadow APCB, then
+		 * let's also filter all of the APQIs. You need both APIDs and
+		 * APQIs to identify the APQNs of the queues to assign to a
+		 * guest.
+		 */
+		if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
+			bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
+				     AP_DOMAINS);
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * vfio_ap_mdev_unlink_apids
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apid_rem: The bitmap specifying the APIDs of the adapters removed from
+ *	      the host's AP configuration
+ *
+ * Unlinks @matrix_mdev from each queue assigned to @matrix_mdev whose APQN
+ * contains an APID specified in @apid_rem.
+ */
+static void vfio_ap_mdev_unlink_apids(struct ap_matrix_mdev *matrix_mdev,
+				      unsigned long *apid_rem)
+{
+	int bkt, apid;
+	struct vfio_ap_queue *q;
+
+	hash_for_each(matrix_mdev->qtable, bkt, q, mdev_qnode) {
+		apid = AP_QID_CARD(q->apqn);
+		if (test_bit_inv(apid, apid_rem)) {
+			q->matrix_mdev = NULL;
+			hash_del(&q->mdev_qnode);
+		}
+	}
+}
+
+/**
+ * vfio_ap_mdev_unassign_apqis
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apqi_rem: The bitmap specifying the APQIs of the domains removed from
+ *	      the host's AP configuration
+ *
+ * Unassigns each APQI specified in @apqi_rem that is assigned to the
+ * shadow APCB. Returns true if at least one APQI is unassigned; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long *apqi_rem)
+{
+	DECLARE_BITMAP(shadow_aqm, AP_DOMAINS);
+
+	/*
+	 * Get the result of filtering the APQIs removed from the host AP
+	 * configuration out of the shadow APCB
+	 */
+	bitmap_andnot(shadow_aqm, matrix_mdev->shadow_apcb.aqm, apqi_rem,
+		      AP_DOMAINS);
+
+	/*
+	 * If filtering removed any APQIs from the shadow APCB, then let's go
+	 * ahead and update the shadow APCB accordingly
+	 */
+	if (!bitmap_equal(matrix_mdev->shadow_apcb.aqm, shadow_aqm,
+			  AP_DOMAINS)) {
+		memcpy(matrix_mdev->shadow_apcb.aqm, shadow_aqm,
+		       sizeof(shadow_aqm));
+
+		/*
+		 * If all APQIs have been filtered from the shadow APCB, then
+		 * let's also filter all of the APIDs. You need both APIDs and
+		 * APQIs to identify the APQNs of the queues to assign to a
+		 * guest.
+		 */
+		if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
+			bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
+				     AP_DEVICES);
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * vfio_ap_mdev_unlink_apqis
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apqi_rem: The bitmap specifying the APQIs of the domains removed from
+ *	      the host's AP configuration
+ *
+ * Unlinks @matrix_mdev from each queue assigned to @matrix_mdev whose APQN
+ * contains an APQI specified in @apqi_rem.
+ */
+static void vfio_ap_mdev_unlink_apqis(struct ap_matrix_mdev *matrix_mdev,
+				      unsigned long *apqi_rem)
+{
+	int bkt, apqi;
+	struct vfio_ap_queue *q;
+
+	hash_for_each(matrix_mdev->qtable, bkt, q, mdev_qnode) {
+		apqi = AP_QID_QUEUE(q->apqn);
+		if (test_bit_inv(apqi, apqi_rem)) {
+			q->matrix_mdev = NULL;
+			hash_del(&q->mdev_qnode);
+		}
+	}
+}
+
+static void vfio_ap_mdev_on_cfg_remove(void)
+{
+	bool unassigned = false;
+	int ap_remove, aq_remove;
+	struct ap_matrix_mdev *matrix_mdev;
+	DECLARE_BITMAP(apid_rem, AP_DEVICES);
+	DECLARE_BITMAP(apqi_rem, AP_DOMAINS);
+	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+	ap_remove = bitmap_andnot(apid_rem, prev_apm, cur_apm, AP_DEVICES);
+	aq_remove = bitmap_andnot(apqi_rem, prev_aqm, cur_aqm, AP_DOMAINS);
+
+	if (!ap_remove && !aq_remove)
+		return;
+
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+			continue;
+
+		if (ap_remove) {
+			if (vfio_ap_mdev_unassign_apids(matrix_mdev, apid_rem))
+				unassigned = true;
+			vfio_ap_mdev_unlink_apids(matrix_mdev, apid_rem);
+		}
+
+		if (aq_remove) {
+			if (vfio_ap_mdev_unassign_apqis(matrix_mdev, apqi_rem))
+				unassigned = true;
+			vfio_ap_mdev_unlink_apqis(matrix_mdev, apqi_rem);
+		}
+	}
+}
+
+void vfio_ap_mdev_on_cfg_add(void)
+{
+	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+
+	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+	bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
+	bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
+}
+
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+			    struct ap_config_info *old_config_info)
+{
+	mutex_lock(&matrix_dev->lock);
+	memcpy(&matrix_dev->config_info, new_config_info,
+	       sizeof(struct ap_config_info));
+	memcpy(&matrix_dev->config_info_prev, old_config_info,
+	       sizeof(struct ap_config_info));
+
+	vfio_ap_mdev_on_cfg_remove();
+	vfio_ap_mdev_on_cfg_add();
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 5065f0367ea2..64f1f5b820f6 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -36,14 +36,21 @@
  *		driver, be it using @mdev_list or writing the state of a
  *		single ap_matrix_mdev device. It's quite coarse but we don't
  *		expect much contention.
+ * @ap_add:	a bitmap specifying the APIDs added to the host AP configuration
+ *		as notified by the AP bus via the on_config_changed callback.
+ * @aq_add:	a bitmap specifying the APQIs added to the host AP configuration
+ *		as notified by the AP bus via the on_config_changed callback.
  */
 struct ap_matrix_dev {
 	struct device device;
 	atomic_t available_instances;
 	struct ap_config_info config_info;
+	struct ap_config_info config_info_prev;
 	struct list_head mdev_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
+	DECLARE_BITMAP(ap_add, AP_DEVICES);
+	DECLARE_BITMAP(aq_add, AP_DEVICES);
 };
 
 extern struct ap_matrix_dev *matrix_dev;
@@ -90,6 +97,8 @@ struct ap_matrix_mdev {
 	struct kvm_s390_module_hook pqap_hook;
 	struct mdev_device *mdev;
 	DECLARE_HASHTABLE(qtable, 8);
+	DECLARE_BITMAP(ap_add, AP_DEVICES);
+	DECLARE_BITMAP(aq_add, AP_DEVICES);
 };
 
 extern int vfio_ap_mdev_register(void);
@@ -108,5 +117,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *queue);
 void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
 bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+			    struct ap_config_info *old_config_info);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v11 13/14] s390/vfio-ap: handle AP bus scan completed notification
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (11 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  2020-10-22 17:12 ` [PATCH v11 14/14] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
  13 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Implements the driver callback invoked by the AP bus when the AP bus
scan has completed. Since this callback is invoked after binding the newly
added devices to their respective device drivers, the vfio_ap driver will
attempt to hot plug the adapters, domains and control domains into each
guest using the matrix mdev to which they are assigned. Keep in mind that
an adapter or domain can be plugged in only if each APQN with the APID of
the adapter or the APQI of the domain references a queue device bound
to the vfio_ap device driver. Consequently, not all newly added adapters
and domains will necessarily get hot plugged.

When the vfio_ap driver is notified that the AP bus scan has completed,
the same filtering operation used when the guest is started is applied
again to the APQNs of each matrix mediated device to which any of the
newly added APIDs and/or APQIs are assigned.
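Which matrix mdevs get re-filtered can be sketched as below. This is a
hypothetical user-space model of the selection loop in
vfio_ap_on_scan_complete(): struct mdev, its has_kvm flag (standing in for
vfio_ap_mdev_has_crycb()) and select_for_refilter() are assumptions, and
the actual re-filtering and commit of the shadow APCB are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MASK_WORDS 4			/* 256 bits as 4 x 64-bit words */

typedef struct {
	uint64_t w[MASK_WORDS];
} mask256;

/* Hypothetical, trimmed-down matrix mdev: assigned masks + guest state. */
struct mdev {
	mask256 apm;			/* adapters assigned via sysfs */
	mask256 aqm;			/* domains assigned via sysfs */
	bool has_kvm;			/* in use by a KVM guest (has a CRYCB) */
};

static bool mask_intersects(const mask256 *a, const mask256 *b)
{
	for (int i = 0; i < MASK_WORDS; i++)
		if (a->w[i] & b->w[i])
			return true;
	return false;
}

/*
 * Marks in refilter[] each mdev that is in use by a guest and has at
 * least one newly added APID or APQI assigned. The added masks are then
 * cleared for the next configuration change. Returns the number marked.
 */
static size_t select_for_refilter(const struct mdev *m, size_t n,
				  mask256 *ap_add, mask256 *aq_add,
				  bool *refilter)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		refilter[i] = m[i].has_kvm &&
			      (mask_intersects(&m[i].apm, ap_add) ||
			       mask_intersects(&m[i].aqm, aq_add));
		if (refilter[i])
			count++;
	}
	memset(ap_add, 0, sizeof(*ap_add));
	memset(aq_add, 0, sizeof(*aq_add));
	return count;
}
```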

To recap the filtering process employed:

For each APQN formulated from the Cartesian product of the APIDs and APQIs
assigned to the matrix mdev, if the APQN does not reference a queue device
bound to the vfio_ap device driver, the APID will not be hot plugged into
the guest. If any APIDs are left after filtering, all of the queues
referenced by the APQNs formulated from the remaining APIDs and the APQIs
assigned to the matrix mdev will be hot plugged into the guest.

Control domains will not be filtered and will always be hot plugged.

Example:
========
    Queue devices bound to vfio_ap device driver:
       04.0004
       04.0047
       04.0054

       05.0005
       05.0047

    Adapters and domains assigned to matrix mdev:
       Adapters  Domains  -> Queues
       04        0004        04.0004
       05        0047        04.0047
                 0054        04.0054
                             05.0004
                             05.0047
                             05.0054

    KVM guest matrix after filtering:
       Adapters  Domains  -> Queues
       04        0004        04.0004
                 0047        04.0047
                 0054        04.0054

       Adapter 05 is filtered because queues 05.0004 and 05.0054 are not
       bound to the vfio_ap device driver.
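The APID filtering in the example can be checked with a small user-space
sketch. struct apqn, is_bound() and filter_apids() are hypothetical; the
driver itself filters bitmaps in vfio_ap_mdev_filter_guest_matrix(), which
also drops APIDs and APQIs that are not in the host AP configuration, a
step omitted here. Queue IDs are written in hex, as in the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical APQN representation: adapter ID and queue (domain) index. */
struct apqn {
	unsigned int apid;
	unsigned int apqi;
};

static bool is_bound(const struct apqn *bound, size_t nbound,
		     unsigned int apid, unsigned int apqi)
{
	for (size_t i = 0; i < nbound; i++)
		if (bound[i].apid == apid && bound[i].apqi == apqi)
			return true;
	return false;
}

/*
 * Keep an APID only if every APQN (apid, apqi) formed with the assigned
 * APQIs references a bound queue. Writes the surviving APIDs to
 * out_apids and returns how many there are.
 */
static size_t filter_apids(const unsigned int *apids, size_t napids,
			   const unsigned int *apqis, size_t napqis,
			   const struct apqn *bound, size_t nbound,
			   unsigned int *out_apids)
{
	size_t kept = 0;

	for (size_t a = 0; a < napids; a++) {
		bool all_bound = true;

		for (size_t q = 0; q < napqis && all_bound; q++)
			all_bound = is_bound(bound, nbound, apids[a], apqis[q]);
		if (all_bound)
			out_apids[kept++] = apids[a];
	}
	return kept;
}
```

Fed the bound queues and assignments from the example, only adapter 04
survives, matching the KVM guest matrix shown above.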

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 26 ++++++++++++++++++++++++++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 29 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index d7aa5543afef..357481e80b0a 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -152,6 +152,7 @@ static int __init vfio_ap_init(void)
 	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
 	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
+	vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
 	if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 075096adbfd3..824f936364ba 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1837,3 +1837,29 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
 	vfio_ap_mdev_on_cfg_add();
 	mutex_unlock(&matrix_dev->lock);
 }
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+			      struct ap_config_info *old_config_info)
+{
+	struct ap_matrix_mdev *matrix_mdev;
+
+	mutex_lock(&matrix_dev->lock);
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+			continue;
+
+		if (!bitmap_intersects(matrix_mdev->matrix.apm,
+				       matrix_dev->ap_add, AP_DEVICES) &&
+		    !bitmap_intersects(matrix_mdev->matrix.aqm,
+				       matrix_dev->aq_add, AP_DOMAINS))
+			continue;
+
+		if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev,
+						     true))
+			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+
+	bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
+	bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 64f1f5b820f6..d82d1e62cb2f 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -119,5 +119,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
 void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
 			    struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+			      struct ap_config_info *old_config_info);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v11 14/14] s390/vfio-ap: update docs to include dynamic config support
  2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (12 preceding siblings ...)
  2020-10-22 17:12 ` [PATCH v11 13/14] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
@ 2020-10-22 17:12 ` Tony Krowiak
  13 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-22 17:12 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
 1 file changed, 285 insertions(+), 77 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..888e15dbefc0 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
 1. The administrator loads the vfio_ap device driver
 2. The vfio-ap driver during its initialization will register a single 'matrix'
    device with the device core. This will serve as the parent device for
-   all mediated matrix devices used to configure an AP matrix for a guest.
+   all matrix mediated devices used to configure an AP matrix for a guest.
 3. The /sys/devices/vfio_ap/matrix device is created by the device core
 4. The vfio_ap device driver will register with the AP bus for AP queue devices
    of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +269,7 @@ The process for reserving an AP queue for use by a KVM guest is:
    default zcrypt cex4queue driver.
 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
    it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type matrix mediated device to be
    used by a guest
 10. The administrator assigns the adapters, usage domains and control domains
     to be exclusively used by a guest.
@@ -279,14 +279,14 @@ Set up the VFIO mediated device interfaces
 The VFIO AP device driver utilizes the common interface of the VFIO mediated
 device core driver to:
 
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a matrix mediated device to and
   remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a matrix mediated device
+* Add a matrix mediated device to and remove it from the AP mediated bus driver
+* Add a matrix mediated device to and remove it from an IOMMU group
 
 The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP matrix mediated device driver::
 
    +-------------+
    |             |
@@ -351,29 +351,37 @@ matrix device.
     This attribute group identifies the user-defined sysfs attributes of the
     mediated device. When a device is registered with the VFIO mediated device
     framework, the sysfs attribute files identified in the 'mdev_attr_groups'
-    structure will be created in the mediated matrix device's directory. The
-    sysfs attributes for a mediated matrix device are:
+    structure will be created in the matrix mediated device's directory. The
+    sysfs attributes for a matrix mediated device are:
 
     assign_adapter / unassign_adapter:
       Write-only attributes for assigning/unassigning an AP adapter to/from the
-      mediated matrix device. To assign/unassign an adapter, the APID of the
+      matrix mediated device. To assign/unassign an adapter, the APID of the
       adapter is echoed to the respective attribute file.
     assign_domain / unassign_domain:
       Write-only attributes for assigning/unassigning an AP usage domain to/from
-      the mediated matrix device. To assign/unassign a domain, the domain
+      the matrix mediated device. To assign/unassign a domain, the domain
       number of the usage domain is echoed to the respective attribute
       file.
     matrix:
-      A read-only file for displaying the APQNs derived from the cross product
-      of the adapter and domain numbers assigned to the mediated matrix device.
+      A read-only file for displaying the APQNs derived from the Cartesian
+      product of the adapter and domain numbers assigned to the matrix mediated
+      device.
+    guest_matrix:
+      A read-only file for displaying the APQNs derived from the Cartesian
+      product of the adapter and domain numbers assigned to the APM and AQM
+      fields, respectively, of the KVM guest's CRYCB. This will differ from the
+      matrix if any APQNs assigned to the matrix mediated device do not
+      reference a queue device bound to the vfio_ap device driver (e.g., the
+      queue is not in the host's AP configuration).
     assign_control_domain / unassign_control_domain:
       Write-only attributes for assigning/unassigning an AP control domain
-      to/from the mediated matrix device. To assign/unassign a control domain,
+      to/from the matrix mediated device. To assign/unassign a control domain,
       the ID of the domain to be assigned/unassigned is echoed to the respective
       attribute file.
     control_domains:
       A read-only file for displaying the control domain numbers assigned to the
-      mediated matrix device.
+      matrix mediated device.
 
 * functions:
 
@@ -385,7 +393,7 @@ matrix device.
       domains assigned via the corresponding sysfs attributes files
 
   remove:
-    deallocates the mediated matrix device's ap_matrix_mdev structure. This will
+    deallocates the matrix mediated device's ap_matrix_mdev structure. This will
     be allowed only if a running guest is not using the mdev.
 
 * callback interfaces
@@ -397,7 +405,7 @@ matrix device.
     for the mdev matrix device to the MDEV bus. Access to the KVM structure used
     to configure the KVM guest is provided via this callback. The KVM structure,
     is used to configure the guest's access to the AP matrix defined via the
-    mediated matrix device's sysfs attribute files.
+    matrix mediated device's sysfs attribute files.
   release:
     unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
     mdev matrix device and deconfigures the guest's AP matrix.
@@ -410,11 +418,49 @@ function is called when QEMU connects to KVM. The guest's AP matrix is
 configured via it's CRYCB by:
 
 * Setting the bits in the APM corresponding to the APIDs assigned to the
-  mediated matrix device via its 'assign_adapter' interface.
+  matrix mediated device via its 'assign_adapter' interface.
 * Setting the bits in the AQM corresponding to the domains assigned to the
-  mediated matrix device via its 'assign_domain' interface.
+  matrix mediated device via its 'assign_domain' interface.
 * Setting the bits in the ADM corresponding to the domain dIDs assigned to the
-  mediated matrix device via its 'assign_control_domains' interface.
+  matrix mediated device via its 'assign_control_domain' interface.
+
+The linux device model precludes passing a device through to a KVM guest that
+is not bound to the device driver facilitating its pass-through. Consequently,
+an APQN that does not reference a queue device bound to the vfio_ap device
+driver will not be assigned to a KVM guest's CRYCB. The AP architecture,
+however, does not provide a means to filter individual APQNs from the guest's
+CRYCB, so the following logic is employed to filter them:
+
+* Filter the APQNs assigned to the matrix mediated device by APID.
+
+  To filter APQNs by APID, each APQN derived from the Cartesian product of the
+  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
+  examined. If any APQN with a given APID does not reference a queue device
+  bound to the vfio_ap device driver, that adapter will not be plugged into the
+  guest (i.e., the bit corresponding to its APID will not be set in the APM of
+  the guest's CRYCB).
+
+  If at least one adapter is plugged into the guest, then all domains assigned
+  to the mdev will also be plugged into the guest (i.e., the bits corresponding
+  to the APQIs of the domains assigned to the mdev will be set in the AQM field
+  of the guest's CRYCB).
+
+* Filter the APQNs assigned to the matrix mediated device by APQI.
+
+  The APQNs will be filtered by APQI if filtering by APID does not result in any
+  adapters or domains getting plugged into the guest.
+
+  To filter APQNs by APQI, each APQN derived from the Cartesian product of the
+  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
+  examined. If any APQN with a given APQI does not reference a queue device
+  bound to the vfio_ap device driver, that domain will not be plugged into the
+  guest (i.e., the bit corresponding to its APQI will not be set in the AQM of
+  the guest's CRYCB).
+
+  If at least one domain is plugged into the guest, then all adapters assigned
+  to the mdev will also be plugged into the guest (i.e., the bits corresponding
+  to the APIDs of the adapters assigned to the mdev will be set in the APM field
+  of the guest's CRYCB).
 
 The CPU model features for AP
 -----------------------------
@@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
    can be made available to the guest only if it is available on the host (i.e.,
    facility bit 12 is set).
 
+4. apqi: Indicates AP queue interrupts are available on the guest. This facility
+   can be made available to the guest only if it is available on the host (i.e.,
+   facility bit 65 is set).
+
 Note: If the user chooses to specify a CPU model different than the 'host'
 model to QEMU, the CPU model features and facilities need to be turned on
 explicitly; for example::
@@ -444,7 +494,7 @@ explicitly; for example::
 A guest can be precluded from using AP features/facilities by turning them off
 explicitly; for example::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
 
 Note: If the APFT facility is turned off (apft=off) for the guest, the guest
 will not see any AP devices. The zcrypt device drivers that register for type 10
@@ -530,40 +580,56 @@ These are the steps:
 
 2. Secure the AP queues to be used by the three guests so that the host can not
    access them. To secure them, there are two sysfs files that specify
-   bitmasks marking a subset of the APQN range as 'usable by the default AP
-   queue device drivers' or 'not usable by the default device drivers' and thus
-   available for use by the vfio_ap device driver'. The location of the sysfs
-   files containing the masks are::
+   bitmasks marking a subset of the APQN range as usable only by the default AP
+   queue device drivers. All remaining APQNs are available for use by any other
+   device driver. The vfio_ap device driver is currently the only non-default
+   device driver. The sysfs files containing the masks are located at::
 
      /sys/bus/ap/apmask
      /sys/bus/ap/aqmask
 
    The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
-   (APID). Each bit in the mask, from left to right (i.e., from most significant
-   to least significant bit in big endian order), corresponds to an APID from
-   0-255. If a bit is set, the APID is marked as usable only by the default AP
-   queue device drivers; otherwise, the APID is usable by the vfio_ap
-   device driver.
+   (APID). Each bit in the mask, from left to right, corresponds to an APID from
+   0-255. If a bit is set, the APID is marked as available to the default AP
+   queue device drivers.
 
    The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
-   (APQI). Each bit in the mask, from left to right (i.e., from most significant
-   to least significant bit in big endian order), corresponds to an APQI from
-   0-255. If a bit is set, the APQI is marked as usable only by the default AP
-   queue device drivers; otherwise, the APQI is usable by the vfio_ap device
-   driver.
+   (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
+   0-255. If a bit is set, the APQI is marked as available to the default AP
+   queue device drivers.
+
+   The Cartesian product of the APIDs corresponding to the bits set in the
+   apmask and the APQIs corresponding to the bits set in the aqmask comprises
+   the subset of APQNs that can be used only by the host default device drivers.
+   All other APQNs are available to the non-default device drivers such as the
+   vfio_ap driver.
+
+   Take, for example, the following masks::
+
+      apmask:
+      0x7d00000000000000000000000000000000000000000000000000000000000000
+
+      aqmask:
+      0x8000000000000000000000000000000000000000000000000000000000000000
+
+   The masks indicate:
 
-   Take, for example, the following mask::
+   * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
+     device drivers.
 
-      0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+   * Domain 0 is available for use by the host default device drivers.
 
-    It indicates:
+   * The subset of APQNs available for use only by the default host device
+     drivers are:
 
-      1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
-      belong to the vfio_ap device driver's pool.
+     (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
+
+   * All other APQNs are available for use by the non-default device drivers.
 
    The APQN of each AP queue device assigned to the linux host is checked by the
-   AP bus against the set of APQNs derived from the cross product of APIDs
-   and APQIs marked as usable only by the default AP queue device drivers. If a
+   AP bus against the set of APQNs derived from the Cartesian product of APIDs
+   and APQIs marked as available to the default AP queue device drivers. If a
    match is detected,  only the default AP queue device drivers will be probed;
    otherwise, the vfio_ap device driver will be probed.
 
@@ -627,6 +693,16 @@ These are the steps:
 	    default drivers pool:    adapter 0-15, domain 1
 	    alternate drivers pool:  adapter 16-255, domains 0, 2-255
 
+   Note ***:
+   Changing a mask such that one or more APQNs would be taken from a matrix
+   mediated device (see below) fails with an error (EADDRINUSE). The error
+   is logged to the kernel ring buffer, which can be viewed with the 'dmesg'
+   command. The output identifies each APQN flagged as 'in use' and the matrix
+   mediated device to which it is assigned; for example::
+
+      Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
+      Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
+
 Securing the APQNs for our example
 ----------------------------------
    To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
@@ -684,7 +760,7 @@ Securing the APQNs for our example
 
      /sys/devices/vfio_ap/matrix/
      --- [mdev_supported_types]
-     ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+     ------ [vfio_ap-passthrough] (passthrough matrix mediated device type)
      --------- create
      --------- [devices]
 
@@ -775,17 +851,18 @@ Securing the APQNs for our example
      higher than the maximum is specified, the operation will terminate with
      an error (ENODEV).
 
-   * All APQNs that can be derived from the adapter ID and the IDs of
-     the previously assigned domains must be bound to the vfio_ap device
-     driver. If no domains have yet been assigned, then there must be at least
-     one APQN with the specified APID bound to the vfio_ap driver. If no such
-     APQNs are bound to the driver, the operation will terminate with an
-     error (EADDRNOTAVAIL).
+   * All APQNs that can be derived from the Cartesian product of the APID of the
+     adapter being assigned and the APQIs of the previously assigned domains
+     must be available to the vfio_ap device driver as specified in the sysfs
+     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
+     is reserved for use by the host default device drivers, the operation will
+     terminate with an error (EADDRNOTAVAIL).
 
-     No APQN that can be derived from the adapter ID and the IDs of the
-     previously assigned domains can be assigned to another mediated matrix
-     device. If an APQN is assigned to another mediated matrix device, the
-     operation will terminate with an error (EADDRINUSE).
+   * No APQN that can be derived from the Cartesian product of the APID of the
+     adapter being assigned and the APQIs of the previously assigned domains can
+     be assigned to another matrix mediated device. If even one APQN is assigned
+     to another matrix mediated device, the operation will terminate with an
+     error (EADDRINUSE).
 
    In order to successfully assign a domain:
 
@@ -794,17 +871,18 @@ Securing the APQNs for our example
      higher than the maximum is specified, the operation will terminate with
      an error (ENODEV).
 
-   * All APQNs that can be derived from the domain ID and the IDs of
-     the previously assigned adapters must be bound to the vfio_ap device
-     driver. If no domains have yet been assigned, then there must be at least
-     one APQN with the specified APQI bound to the vfio_ap driver. If no such
-     APQNs are bound to the driver, the operation will terminate with an
-     error (EADDRNOTAVAIL).
+   * All APQNs that can be derived from the Cartesian product of the APQI of the
+     domain being assigned and the APIDs of the previously assigned adapters
+     must be available to the vfio_ap device driver as specified in the sysfs
+     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
+     is reserved for use by the host default device drivers, the operation will
+     terminate with an error (EADDRNOTAVAIL).
 
-     No APQN that can be derived from the domain ID and the IDs of the
-     previously assigned adapters can be assigned to another mediated matrix
-     device. If an APQN is assigned to another mediated matrix device, the
-     operation will terminate with an error (EADDRINUSE).
+   * No APQN that can be derived from the Cartesian product of the APQI of the
+     domain being assigned and the APIDs of the previously assigned adapters can
+     be assigned to another matrix mediated device. If even one APQN is assigned
+     to another matrix mediated device, the operation will terminate with an
+     error (EADDRINUSE).
 
    In order to successfully assign a control domain, the domain number
    specified must represent a value from 0 up to the maximum domain number
@@ -813,22 +891,22 @@ Securing the APQNs for our example
 
 5. Start Guest1::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
 
 7. Start Guest2::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
 
 7. Start Guest3::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
 
-When the guest is shut down, the mediated matrix devices may be removed.
+When the guest is shut down, the matrix mediated devices may be removed.
 
-Using our example again, to remove the mediated matrix device $uuid1::
+Using our example again, to remove the matrix mediated device $uuid1::
 
    /sys/devices/vfio_ap/matrix/
       --- [mdev_supported_types]
@@ -851,16 +929,146 @@ remove it if no guest will use it during the remaining lifetime of the linux
 host. If the mdev matrix device is removed, one may want to also reconfigure
 the pool of adapters and queues reserved for use by the default drivers.
 
+Hot plug support
+================
+An adapter, domain or control domain may be hot plugged into a running KVM
+guest by assigning it to the matrix mediated device being used by the guest.
+Control domains will always be hot plugged; however, an adapter or domain will
+be hot plugged only if each new APQN resulting from its assignment
+references a queue device bound to the vfio_ap device driver as described
+below.
+
+When an adapter is assigned to a matrix mediated device in use by a KVM guest:
+
+* If no domains have yet been plugged into the KVM guest:
+
+  Hot plug the adapter and every domain previously assigned to the mdev if each
+  APQN derived from the Cartesian product of the APID of the adapter being
+  assigned and the APQIs of the domains previously assigned references a queue
+  device bound to the vfio_ap device driver.
+
+* If one or more domains have previously been plugged into the guest:
+
+  Hot plug the adapter if each APQN derived from the Cartesian product of the
+  APID of the adapter being assigned and the APQIs of the domains already
+  plugged into the guest references a queue device bound to the vfio_ap device
+  driver.
+
+When a domain is assigned to a matrix mediated device in use by a KVM guest:
+
+* If no adapters have yet been plugged into the KVM guest:
+
+  Hot plug the domain and every adapter previously assigned to the mdev if each
+  APQN derived from the Cartesian product of the APIDs of the adapters
+  previously assigned and the APQI of the domain being assigned references a
+  queue device bound to the vfio_ap device driver.
+
+* If one or more adapters have previously been plugged into the guest:
+
+  Hot plug the domain if each APQN derived from the Cartesian product of the
+  APIDs of the adapters already plugged into the guest and the APQI of the
+  domain being assigned references a queue device bound to the vfio_ap device
+  driver.
+
+Over-provisioning of AP queues for a KVM guest
+==============================================
+Over-provisioning is defined herein as the assignment of adapters or domains to
+a matrix mediated device that do not reference AP devices in the host's AP
+configuration. The intent is that when such an adapter or domain becomes
+available, it will be automatically hot plugged into the KVM guest using
+the matrix mediated device to which it is assigned, as long as each new APQN
+resulting from plugging it in references a queue device bound to the vfio_ap
+device driver.
+
 Limitations
 ===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
-  to the default drivers pool of a queue that is still assigned to a mediated
-  device in use by a guest. It is incumbent upon the administrator to
-  ensure there is no mediated device in use by a guest to which the APQN is
-  assigned lest the host be given access to the private data of the AP queue
-  device such as a private key configured specifically for the guest.
+Live guest migration is not supported for guests using AP devices without
+intervention by a system administrator. Before a KVM guest can be migrated,
+the matrix mediated device must be removed from the guest. Because the mdev
+cannot be removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove)
+while it is in use by a KVM guest, it must first be hot unplugged. If the guest
+is being emulated by QEMU, its mdev can be hot unplugged in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
+   either of the following commands:
+
+      virsh detach-device <guest-name> <path-to-device-xml>
+
+      For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
+      the guest named 'my-guest':
+
+         virsh detach-device my-guest ~/config/my-guest-hostdev.xml
+
+            The contents of my-guest-hostdev.xml:
+
+            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+              <source>
+                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+              </source>
+            </hostdev>
+
+
+      virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
+
+      For example, to hot unplug the matrix mediated device identified on the
+      qemu command line with 'id=hostdev0' from the guest named 'my-guest':
+
+         virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
+
+2. A matrix mediated device can be hot unplugged by attaching the qemu monitor
+   to the guest and using the following qemu monitor command:
+
+      (qemu) device_del <device-id>
+
+      For example, to hot unplug the matrix mediated device that was specified
+      on the qemu command line with 'id=hostdev0' when the guest was started:
+
+         (qemu) device_del hostdev0
+
+After live migration of the KVM guest completes, an AP configuration can be
+restored to the KVM guest by hot plugging a matrix mediated device on the target
+system into the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
+   device into the guest via either of the following virsh commands:
+
+   virsh attach-device <guest-name> <path-to-device-xml>
+
+      For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
+      the guest named 'my-guest':
+
+         virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+
+            The contents of my-guest-hostdev.xml:
+
+            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+              <source>
+                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+              </source>
+            </hostdev>
+
+
+   virsh qemu-monitor-command <guest-name> --hmp \
+   "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+
+      For example, to hot plug the matrix mediated device
+      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
+      device-id hostdev0:
+
+      virsh qemu-monitor-command my-guest --hmp \
+      "device_add vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,id=hostdev0"
+
+2. A matrix mediated device can be hot plugged by attaching the qemu monitor
+   to the guest and using the following qemu monitor command:
+
+      (qemu) device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>
 
-* Dynamically modifying the AP matrix for a running guest (which would amount to
-  hot(un)plug of AP devices for the guest) is currently not supported
+      For example, to hot plug the matrix mediated device
+      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
+      hostdev0:
 
-* Live guest migration is not supported for guests using AP devices.
+         (qemu) device_add vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,id=hostdev0
-- 
2.21.1
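The APID-then-APQI filtering described in the documentation patch above can be modeled in a few dozen lines of user-space C. This is a sketch under stated assumptions, not the vfio_ap driver's code. The struct names `ap_matrix` and `ap_bound` and the helper functions are hypothetical. The 256-bit APM/AQM fields are represented as plain `bool` arrays rather than in the CRYCB's bit layout. "Bound to vfio_ap" is a lookup table that the caller fills in. The sketch also assumes the APQI pass runs only when the APID pass plugs no adapter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define AP_IDS 256 /* APIDs and APQIs both range over 0..255 */

/* Hypothetical model: one bool per adapter (APM) and per domain (AQM). */
struct ap_matrix {
	bool apm[AP_IDS];
	bool aqm[AP_IDS];
};

/* Hypothetical: q[apid][apqi] is true if that queue is bound to vfio_ap. */
struct ap_bound {
	bool q[AP_IDS][AP_IDS];
};

/* True if (apid, d) is bound for every domain d assigned to the mdev. */
static bool apid_usable(const struct ap_matrix *mdev,
			const struct ap_bound *b, int apid)
{
	for (int apqi = 0; apqi < AP_IDS; apqi++)
		if (mdev->aqm[apqi] && !b->q[apid][apqi])
			return false;
	return true;
}

/* True if (a, apqi) is bound for every adapter a assigned to the mdev. */
static bool apqi_usable(const struct ap_matrix *mdev,
			const struct ap_bound *b, int apqi)
{
	for (int apid = 0; apid < AP_IDS; apid++)
		if (mdev->apm[apid] && !b->q[apid][apqi])
			return false;
	return true;
}

/* Derive the guest's APM/AQM from the mdev's assignments. */
static void filter_guest_matrix(const struct ap_matrix *mdev,
				const struct ap_bound *b,
				struct ap_matrix *guest)
{
	bool any = false;

	assert(mdev && b && guest);
	memset(guest, 0, sizeof(*guest));

	/* Pass 1: keep only adapters whose APQNs are all bound. */
	for (int apid = 0; apid < AP_IDS; apid++) {
		if (mdev->apm[apid] && apid_usable(mdev, b, apid)) {
			guest->apm[apid] = true;
			any = true;
		}
	}
	if (any) {
		/* At least one adapter plugged: plug every assigned domain. */
		memcpy(guest->aqm, mdev->aqm, sizeof(guest->aqm));
		return;
	}

	/* Pass 2: no adapter survived, so filter by domain instead. */
	for (int apqi = 0; apqi < AP_IDS; apqi++) {
		if (mdev->aqm[apqi] && apqi_usable(mdev, b, apqi)) {
			guest->aqm[apqi] = true;
			any = true;
		}
	}
	if (any) {
		/* At least one domain plugged: plug every assigned adapter. */
		memcpy(guest->apm, mdev->apm, sizeof(guest->apm));
	}
}
```

Consider a case where adapters 5 and 6 and domains 4 and 0x47 are assigned, and queue 06.0047 is the only one not bound. The first pass plugs adapter 5 together with both domains, and adapter 6 is filtered out. If 05.0047 is also unbound, neither adapter survives the first pass. The second pass then plugs domain 4 with both adapters.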

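The apmask/aqmask example in the patch above (apmask 0x7d00..., aqmask 0x8000...) can be checked mechanically. The hypothetical helpers below do two things. The first tests bit nr of a mask written as a hex string, counting bit 0 as the leftmost bit as the documentation describes. The second lists the Cartesian product of the set APIDs and set APQIs, which is the set of APQNs reserved for the host default drivers. The sketch assumes the string contains only hex digits after an optional 0x prefix, and treats missing trailing digits as zero. Parsing any other input syntax the sysfs attributes may accept is out of scope.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define AP_MASK_BITS 256

/* Test bit nr of a hex mask string; bit 0 is the leftmost bit. */
static bool ap_mask_test(const char *mask, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	if (mask[0] == '0' && (mask[1] == 'x' || mask[1] == 'X'))
		mask += 2;

	size_t digit = nr / 4; /* each hex digit covers four bits */
	if (digit >= strlen(mask))
		return false;  /* missing trailing digits count as zero */

	char c = (char)tolower((unsigned char)mask[digit]);
	unsigned int val = isdigit((unsigned char)c) ?
			   (unsigned int)(c - '0') :
			   (unsigned int)(c - 'a' + 10);
	/* The leftmost bit of the digit is its most significant bit. */
	return (val >> (3 - nr % 4)) & 1;
}

/*
 * Store up to max APQNs (apid, apqi) whose APID bit is set in apmask and
 * whose APQI bit is set in aqmask. Returns the number stored.
 */
static size_t host_only_apqns(const char *apmask, const char *aqmask,
			      unsigned int out[][2], size_t max)
{
	size_t n = 0;

	for (unsigned int apid = 0; apid < AP_MASK_BITS; apid++) {
		if (!ap_mask_test(apmask, apid))
			continue;
		for (unsigned int apqi = 0; apqi < AP_MASK_BITS; apqi++) {
			if (ap_mask_test(aqmask, apqi) && n < max) {
				out[n][0] = apid;
				out[n][1] = apqi;
				n++;
			}
		}
	}
	return n;
}
```

Applied to the two masks from the example, it yields the six APQNs (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0), in that order. It also shows that APIDs 0 and 6 are not reserved for the host.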


* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-22 17:11 ` [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
@ 2020-10-22 19:44   ` kernel test robot
  2020-10-26 16:57     ` Tony Krowiak
  2020-10-27  6:48   ` Halil Pasic
  1 sibling, 1 reply; 68+ messages in thread
From: kernel test robot @ 2020-10-22 19:44 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede

[-- Attachment #1: Type: text/plain, Size: 3942 bytes --]

Hi Tony,

I love your patch! Perhaps something to improve:

[auto build test WARNING on s390/features]
[also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allyesconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/572c94c40a76754d49f07e4e383097d2db132f8c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
        git checkout 572c94c40a76754d49f07e4e383097d2db132f8c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/s390/crypto/vfio_ap_ops.c:1119:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
    1119 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~

vim +/vfio_ap_mdev_reset_queue +1119 drivers/s390/crypto/vfio_ap_ops.c

258287c994de8f Tony Krowiak 2018-09-25  1118  
ec89b55e3bce7c Pierre Morel 2019-05-21 @1119  int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
46a7263d4746a2 Tony Krowiak 2018-09-25  1120  			     unsigned int retry)
46a7263d4746a2 Tony Krowiak 2018-09-25  1121  {
46a7263d4746a2 Tony Krowiak 2018-09-25  1122  	struct ap_queue_status status;
ec89b55e3bce7c Pierre Morel 2019-05-21  1123  	int retry2 = 2;
ec89b55e3bce7c Pierre Morel 2019-05-21  1124  	int apqn = AP_MKQID(apid, apqi);
46a7263d4746a2 Tony Krowiak 2018-09-25  1125  
46a7263d4746a2 Tony Krowiak 2018-09-25  1126  	do {
ec89b55e3bce7c Pierre Morel 2019-05-21  1127  		status = ap_zapq(apqn);
46a7263d4746a2 Tony Krowiak 2018-09-25  1128  		switch (status.response_code) {
46a7263d4746a2 Tony Krowiak 2018-09-25  1129  		case AP_RESPONSE_NORMAL:
ec89b55e3bce7c Pierre Morel 2019-05-21  1130  			while (!status.queue_empty && retry2--) {
ec89b55e3bce7c Pierre Morel 2019-05-21  1131  				msleep(20);
ec89b55e3bce7c Pierre Morel 2019-05-21  1132  				status = ap_tapq(apqn, NULL);
ec89b55e3bce7c Pierre Morel 2019-05-21  1133  			}
024cdcdbf3cf99 Halil Pasic  2019-09-03  1134  			WARN_ON_ONCE(retry2 <= 0);
46a7263d4746a2 Tony Krowiak 2018-09-25  1135  			return 0;
46a7263d4746a2 Tony Krowiak 2018-09-25  1136  		case AP_RESPONSE_RESET_IN_PROGRESS:
46a7263d4746a2 Tony Krowiak 2018-09-25  1137  		case AP_RESPONSE_BUSY:
46a7263d4746a2 Tony Krowiak 2018-09-25  1138  			msleep(20);
46a7263d4746a2 Tony Krowiak 2018-09-25  1139  			break;
46a7263d4746a2 Tony Krowiak 2018-09-25  1140  		default:
46a7263d4746a2 Tony Krowiak 2018-09-25  1141  			/* things are really broken, give up */
46a7263d4746a2 Tony Krowiak 2018-09-25  1142  			return -EIO;
46a7263d4746a2 Tony Krowiak 2018-09-25  1143  		}
46a7263d4746a2 Tony Krowiak 2018-09-25  1144  	} while (retry--);
46a7263d4746a2 Tony Krowiak 2018-09-25  1145  
46a7263d4746a2 Tony Krowiak 2018-09-25  1146  	return -EBUSY;
46a7263d4746a2 Tony Krowiak 2018-09-25  1147  }
46a7263d4746a2 Tony Krowiak 2018-09-25  1148  
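GCC emits `-Wmissing-prototypes` when a function with external linkage is defined without a prior declaration in scope. There are two usual remedies:

* Make the function `static` if it is only used in `vfio_ap_ops.c`.
* Declare it in a header that the defining file includes.

Which one fits `vfio_ap_mdev_reset_queue` depends on whether other files call it, and this report does not show that. The minimal sketch below illustrates both forms. Its helper names are hypothetical, and the APQN packing (APID in the high byte, APQI in the low byte) is only for illustration. Compiled with `gcc -Wmissing-prototypes -c`, it should produce neither warning.

```c
#include <assert.h>

/*
 * Fix 1: a helper used only inside one .c file gets internal linkage.
 * -Wmissing-prototypes only checks functions with external linkage, so a
 * static definition needs no separate prototype.
 */
static int example_make_apqn(unsigned int apid, unsigned int apqi)
{
	assert(apid < 256 && apqi < 256);
	return (int)((apid << 8) | apqi); /* APID in the high byte */
}

/*
 * Fix 2: a helper called from other files keeps external linkage, and its
 * prototype is declared before the definition. The declaration would
 * normally live in a header included by both the defining file and its
 * callers; it is inlined here so the sketch is self-contained.
 */
int example_apqn_card(int apqn);

int example_apqn_card(int apqn)
{
	return (apqn >> 8) & 0xff;
}
```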

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 63274 bytes --]


* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-10-22 17:12 ` [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device Tony Krowiak
@ 2020-10-22 20:30   ` kernel test robot
  2020-10-26 17:04     ` Tony Krowiak
  2020-10-28 13:57   ` Halil Pasic
  1 sibling, 1 reply; 68+ messages in thread
From: kernel test robot @ 2020-10-22 20:30 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede

[-- Attachment #1: Type: text/plain, Size: 3020 bytes --]

Hi Tony,

I love your patch! Perhaps something to improve:

[auto build test WARNING on s390/features]
[also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allyesconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/aea9ab29b77facc3bb09415ebe464fd6a22ec22e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
        git checkout aea9ab29b77facc3bb09415ebe464fd6a22ec22e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/s390/crypto/vfio_ap_ops.c:1370:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
    1370 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/s390/crypto/vfio_ap_ops.c:1617:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
    1617 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/vfio_ap_mdev_hot_unplug_queue +1617 drivers/s390/crypto/vfio_ap_ops.c

  1616	
> 1617	void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
  1618	{
  1619		unsigned long apid = AP_QID_CARD(q->apqn);
  1620	
  1621		if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
  1622			return;
  1623	
  1624		/*
  1625		 * If the APID is assigned to the guest, then let's
  1626		 * go ahead and unplug the adapter since the
  1627		 * architecture does not provide a means to unplug
  1628		 * an individual queue.
  1629		 */
  1630		if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
  1631			clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
  1632	
  1633			if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
  1634				bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
  1635					     AP_DOMAINS);
  1636	
  1637			vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
  1638		}
  1639	}
  1640	
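The function in the listing above unplugs a whole adapter, because the architecture provides no means to unplug an individual queue. It also clears the AQM once no adapter remains. The user-space sketch below models only that bit manipulation, under several assumptions:

* The 256-bit masks are arrays of four `uint64_t` words with MSB-first ("inverted") bit numbering, modeled after s390's `test_bit_inv()`/`clear_bit_inv()`.
* The `*_sketch` helpers and `struct shadow_apcb` layout are hypothetical stand-ins.
* Committing the result to the guest's CRYCB (`vfio_ap_mdev_commit_shadow_apcb()` in the listing) is reduced to a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AP_BITS  256
#define AP_WORDS (AP_BITS / 64)

/* Hypothetical model of the shadow APCB: 256-bit APM and AQM. */
struct shadow_apcb {
	uint64_t apm[AP_WORDS];
	uint64_t aqm[AP_WORDS];
};

/* MSB-first numbering: bit 0 is the leftmost bit of word 0. */
static uint64_t inv_mask(unsigned int nr)
{
	return 1ULL << (63 - nr % 64);
}

static bool test_bit_inv_sketch(unsigned int nr, const uint64_t *map)
{
	assert(nr < AP_BITS);
	return map[nr / 64] & inv_mask(nr);
}

static void set_bit_inv_sketch(unsigned int nr, uint64_t *map)
{
	assert(nr < AP_BITS);
	map[nr / 64] |= inv_mask(nr);
}

static void clear_bit_inv_sketch(unsigned int nr, uint64_t *map)
{
	assert(nr < AP_BITS);
	map[nr / 64] &= ~inv_mask(nr);
}

static bool map_empty(const uint64_t *map)
{
	for (int i = 0; i < AP_WORDS; i++)
		if (map[i])
			return false;
	return true;
}

/*
 * Unplug the queue identified by apqn (APID in bits 8-15). The whole
 * adapter is removed; if no adapter is left, the domains are cleared too.
 * Returns true if the APCB changed and would need to be committed.
 */
static bool hot_unplug_queue_sketch(struct shadow_apcb *apcb,
				    unsigned int apqn)
{
	unsigned int apid = (apqn >> 8) & 0xff;

	if (!test_bit_inv_sketch(apid, apcb->apm))
		return false;

	clear_bit_inv_sketch(apid, apcb->apm);
	if (map_empty(apcb->apm)) {
		for (int i = 0; i < AP_WORDS; i++)
			apcb->aqm[i] = 0;
	}
	return true;
}
```

Unplugging queue 05.0004 therefore also removes every other queue on adapter 5 from the guest, while adapter 6 and domain 4 stay plugged. The AQM is cleared only after the last adapter is gone.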

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 63274 bytes --]


* Re: [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification
  2020-10-22 17:12 ` [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification Tony Krowiak
@ 2020-10-22 21:17   ` kernel test robot
  2020-10-26 17:07     ` Tony Krowiak
  2020-10-26 17:21     ` Tony Krowiak
  2020-11-03  9:48   ` kernel test robot
  1 sibling, 2 replies; 68+ messages in thread
From: kernel test robot @ 2020-10-22 21:17 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede

[-- Attachment #1: Type: text/plain, Size: 5265 bytes --]

Hi Tony,

I love your patch! Perhaps something to improve:

[auto build test WARNING on s390/features]
[also build test WARNING on linus/master next-20201022]
[cannot apply to kvms390/next linux/master v5.9]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allyesconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/32786ef6d4ba3703d993a8894ea1d763785fd3a4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
        git checkout 32786ef6d4ba3703d993a8894ea1d763785fd3a4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/s390/crypto/vfio_ap_ops.c:1316:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
    1316 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/s390/crypto/vfio_ap_ops.c:1568:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
    1568 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/s390/crypto/vfio_ap_ops.c: In function 'vfio_ap_mdev_on_cfg_remove':
>> drivers/s390/crypto/vfio_ap_ops.c:1777:7: warning: variable 'unassigned' set but not used [-Wunused-but-set-variable]
    1777 |  bool unassigned = false;
         |       ^~~~~~~~~~
   drivers/s390/crypto/vfio_ap_ops.c: At top level:
>> drivers/s390/crypto/vfio_ap_ops.c:1813:6: warning: no previous prototype for 'vfio_ap_mdev_on_cfg_add' [-Wmissing-prototypes]
    1813 | void vfio_ap_mdev_on_cfg_add(void)
         |      ^~~~~~~~~~~~~~~~~~~~~~~
   In file included from drivers/s390/crypto/vfio_ap_ops.c:11:
   In function 'memcpy',
       inlined from 'vfio_ap_mdev_unassign_apids' at drivers/s390/crypto/vfio_ap_ops.c:1655:3,
       inlined from 'vfio_ap_mdev_on_cfg_remove' at drivers/s390/crypto/vfio_ap_ops.c:1800:8,
       inlined from 'vfio_ap_on_cfg_changed' at drivers/s390/crypto/vfio_ap_ops.c:1836:2:
   include/linux/string.h:402:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
     402 |    __read_overflow2();
         |    ^~~~~~~~~~~~~~~~~~

vim +/unassigned +1777 drivers/s390/crypto/vfio_ap_ops.c

  1774	
  1775	static void vfio_ap_mdev_on_cfg_remove(void)
  1776	{
> 1777		bool unassigned = false;
  1778		int ap_remove, aq_remove;
  1779		struct ap_matrix_mdev *matrix_mdev;
  1780		DECLARE_BITMAP(apid_rem, AP_DEVICES);
  1781		DECLARE_BITMAP(apqi_rem, AP_DOMAINS);
  1782		unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
  1783	
  1784		cur_apm = (unsigned long *)matrix_dev->config_info.apm;
  1785		cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
  1786		prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
  1787		prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
  1788	
  1789		ap_remove = bitmap_andnot(apid_rem, prev_apm, cur_apm, AP_DEVICES);
  1790		aq_remove = bitmap_andnot(apqi_rem, prev_aqm, cur_aqm, AP_DOMAINS);
  1791	
  1792		if (!ap_remove && !aq_remove)
  1793			return;
  1794	
  1795		list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
  1796			if (!vfio_ap_mdev_has_crycb(matrix_mdev))
  1797				continue;
  1798	
  1799			if (ap_remove) {
  1800				if (vfio_ap_mdev_unassign_apids(matrix_mdev, apid_rem))
  1801					unassigned = true;
  1802				vfio_ap_mdev_unlink_apids(matrix_mdev, apid_rem);
  1803			}
  1804	
  1805			if (aq_remove) {
  1806				if (vfio_ap_mdev_unassign_apqis(matrix_mdev, apqi_rem))
  1807					unassigned = true;
  1808				vfio_ap_mdev_unlink_apqis(matrix_mdev, apqi_rem);
  1809			}
  1810		}
  1811	}
  1812	
> 1813	void vfio_ap_mdev_on_cfg_add(void)
  1814	{
  1815		unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
  1816	
  1817		cur_apm = (unsigned long *)matrix_dev->config_info.apm;
  1818		cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
  1819	
  1820		prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
  1821		prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
  1822	
  1823		bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
  1824		bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
  1825	}
  1826	
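The two functions above compute set differences between the previous and current host AP configuration: removed = prev & ~cur, and added = cur & ~prev. The kernel's bitmap_andnot() also reports whether the result is non-empty, which is what the early return in vfio_ap_mdev_on_cfg_remove() relies on. The following is a userspace model of that computation with hypothetical demo_* names. It uses plain (non-inverted) bit numbering, because the difference does not depend on bit order.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BPL (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS(nbits) (((nbits) + DEMO_BPL - 1) / DEMO_BPL)
#define DEMO_NBITS 256 /* assumed, standing in for AP_DEVICES/AP_DOMAINS */

static void demo_set_bit(size_t nr, unsigned long *map)
{
	map[nr / DEMO_BPL] |= 1UL << (nr % DEMO_BPL);
}

static bool demo_test_bit(size_t nr, const unsigned long *map)
{
	return (map[nr / DEMO_BPL] >> (nr % DEMO_BPL)) & 1UL;
}

/* dst = a & ~b; returns true if any bit is set in dst. */
static bool demo_andnot(unsigned long *dst, const unsigned long *a,
			const unsigned long *b, size_t nlongs)
{
	unsigned long any = 0;

	for (size_t i = 0; i < nlongs; i++) {
		dst[i] = a[i] & ~b[i];
		any |= dst[i];
	}
	return any != 0;
}
```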

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 63274 bytes --]


* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-22 19:44   ` kernel test robot
@ 2020-10-26 16:57     ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-26 16:57 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede



On 10/22/20 3:44 PM, kernel test robot wrote:
> Hi Tony,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on s390/features]
> [also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-allyesconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/0day-ci/linux/commit/572c94c40a76754d49f07e4e383097d2db132f8c
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
>          git checkout 572c94c40a76754d49f07e4e383097d2db132f8c
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
>>> drivers/s390/crypto/vfio_ap_ops.c:1119:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
>      1119 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>           |     ^~~~~~~~~~~~~~~~~~~~~~~~

This function needs to be made static because it is no longer defined in 
the header file.
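For reference, -Wmissing-prototypes (enabled in W=1 kernel builds) warns about a function that has external linkage but no previous declaration. Once the prototype is removed from the header, marking the definition static gives it internal linkage and silences the warning. Below is a minimal sketch with a hypothetical function name. The APQN packing (adapter number in bits 8-15, domain in bits 0-7) is an assumption made only for illustration.

```c
#include <assert.h>

/*
 * Before (warns under -Wmissing-prototypes, because no header declares it):
 *
 *     int demo_make_apqn(unsigned int apid, unsigned int apqi) { ... }
 *
 * After: internal linkage, so no prior prototype is required.
 */
static int demo_make_apqn(unsigned int apid, unsigned int apqi)
{
	/* Assumed layout: adapter number in the high byte, domain in the low byte. */
	return (int)(((apid & 0xffu) << 8) | (apqi & 0xffu));
}
```

Compiling the "before" variant with `gcc -Wmissing-prototypes -c` should reproduce the warning. The static version compiles cleanly as long as it is used somewhere in the file; otherwise -Wunused-function fires instead.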

>
> vim +/vfio_ap_mdev_reset_queue +1119 drivers/s390/crypto/vfio_ap_ops.c
>
> 258287c994de8f Tony Krowiak 2018-09-25  1118
> ec89b55e3bce7c Pierre Morel 2019-05-21 @1119  int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1120  			     unsigned int retry)
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1121  {
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1122  	struct ap_queue_status status;
> ec89b55e3bce7c Pierre Morel 2019-05-21  1123  	int retry2 = 2;
> ec89b55e3bce7c Pierre Morel 2019-05-21  1124  	int apqn = AP_MKQID(apid, apqi);
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1125
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1126  	do {
> ec89b55e3bce7c Pierre Morel 2019-05-21  1127  		status = ap_zapq(apqn);
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1128  		switch (status.response_code) {
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1129  		case AP_RESPONSE_NORMAL:
> ec89b55e3bce7c Pierre Morel 2019-05-21  1130  			while (!status.queue_empty && retry2--) {
> ec89b55e3bce7c Pierre Morel 2019-05-21  1131  				msleep(20);
> ec89b55e3bce7c Pierre Morel 2019-05-21  1132  				status = ap_tapq(apqn, NULL);
> ec89b55e3bce7c Pierre Morel 2019-05-21  1133  			}
> 024cdcdbf3cf99 Halil Pasic  2019-09-03  1134  			WARN_ON_ONCE(retry2 <= 0);
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1135  			return 0;
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1136  		case AP_RESPONSE_RESET_IN_PROGRESS:
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1137  		case AP_RESPONSE_BUSY:
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1138  			msleep(20);
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1139  			break;
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1140  		default:
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1141  			/* things are really broken, give up */
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1142  			return -EIO;
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1143  		}
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1144  	} while (retry--);
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1145
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1146  	return -EBUSY;
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1147  }
> 46a7263d4746a2 Tony Krowiak 2018-09-25  1148
>



* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-10-22 20:30   ` kernel test robot
@ 2020-10-26 17:04     ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-26 17:04 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede



On 10/22/20 4:30 PM, kernel test robot wrote:
> Hi Tony,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on s390/features]
> [also build test WARNING on linus/master kvms390/next linux/master v5.9 next-20201022]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-allyesconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/0day-ci/linux/commit/aea9ab29b77facc3bb09415ebe464fd6a22ec22e
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
>          git checkout aea9ab29b77facc3bb09415ebe464fd6a22ec22e
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
>     drivers/s390/crypto/vfio_ap_ops.c:1370:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
>      1370 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>           |     ^~~~~~~~~~~~~~~~~~~~~~~~

My mistake; this needs to be a static function.

>>> drivers/s390/crypto/vfio_ap_ops.c:1617:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
>      1617 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ditto here, but this was reported for patch 01/14 and will be fixed there.

>
> vim +/vfio_ap_mdev_hot_unplug_queue +1617 drivers/s390/crypto/vfio_ap_ops.c
>
>    1616	
>> 1617	void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>    1618	{
>    1619		unsigned long apid = AP_QID_CARD(q->apqn);
>    1620	
>    1621		if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>    1622			return;
>    1623	
>    1624		/*
>    1625		 * If the APID is assigned to the guest, then let's
>    1626		 * go ahead and unplug the adapter since the
>    1627		 * architecture does not provide a means to unplug
>    1628		 * an individual queue.
>    1629		 */
>    1630		if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
>    1631			clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
>    1632	
>    1633			if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>    1634				bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
>    1635					     AP_DOMAINS);
>    1636	
>    1637			vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
>    1638		}
>    1639	}
>    1640	
>



* Re: [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification
  2020-10-22 21:17   ` kernel test robot
@ 2020-10-26 17:07     ` Tony Krowiak
  2020-10-26 17:21     ` Tony Krowiak
  1 sibling, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-26 17:07 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede



On 10/22/20 5:17 PM, kernel test robot wrote:
> Hi Tony,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on s390/features]
> [also build test WARNING on linus/master next-20201022]
> [cannot apply to kvms390/next linux/master v5.9]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-allyesconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/0day-ci/linux/commit/32786ef6d4ba3703d993a8894ea1d763785fd3a4
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
>          git checkout 32786ef6d4ba3703d993a8894ea1d763785fd3a4
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
>     drivers/s390/crypto/vfio_ap_ops.c:1316:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
>      1316 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>           |     ^~~~~~~~~~~~~~~~~~~~~~~~

This was also reported for patch 01/14 and will be fixed there.

>     drivers/s390/crypto/vfio_ap_ops.c:1568:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
>      1568 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This was also reported for patch 08/14 and will be fixed there.

>     drivers/s390/crypto/vfio_ap_ops.c: In function 'vfio_ap_mdev_on_cfg_remove':
>>> drivers/s390/crypto/vfio_ap_ops.c:1777:7: warning: variable 'unassigned' set but not used [-Wunused-but-set-variable]
>      1777 |  bool unassigned = false;
>           |       ^~~~~~~~~~

This will be removed.

>     drivers/s390/crypto/vfio_ap_ops.c: At top level:
>>> drivers/s390/crypto/vfio_ap_ops.c:1813:6: warning: no previous prototype for 'vfio_ap_mdev_on_cfg_add' [-Wmissing-prototypes]
>      1813 | void vfio_ap_mdev_on_cfg_add(void)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~
>     In file included from drivers/s390/crypto/vfio_ap_ops.c:11:
>     In function 'memcpy',
>         inlined from 'vfio_ap_mdev_unassign_apids' at drivers/s390/crypto/vfio_ap_ops.c:1655:3,
>         inlined from 'vfio_ap_mdev_on_cfg_remove' at drivers/s390/crypto/vfio_ap_ops.c:1800:8,
>         inlined from 'vfio_ap_on_cfg_changed' at drivers/s390/crypto/vfio_ap_ops.c:1836:2:
>     include/linux/string.h:402:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
>       402 |    __read_overflow2();
>           |    ^~~~~~~~~~~~~~~~~~
>
> vim +/unassigned +1777 drivers/s390/crypto/vfio_ap_ops.c
>
>    1774	
>    1775	static void vfio_ap_mdev_on_cfg_remove(void)
>    1776	{
>> 1777		bool unassigned = false;
>    1778		int ap_remove, aq_remove;
>    1779		struct ap_matrix_mdev *matrix_mdev;
>    1780		DECLARE_BITMAP(apid_rem, AP_DEVICES);
>    1781		DECLARE_BITMAP(apqi_rem, AP_DOMAINS);
>    1782		unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
>    1783	
>    1784		cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>    1785		cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>    1786		prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>    1787		prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>    1788	
>    1789		ap_remove = bitmap_andnot(apid_rem, prev_apm, cur_apm, AP_DEVICES);
>    1790		aq_remove = bitmap_andnot(apqi_rem, prev_aqm, cur_aqm, AP_DOMAINS);
>    1791	
>    1792		if (!ap_remove && !aq_remove)
>    1793			return;
>    1794	
>    1795		list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
>    1796			if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>    1797				continue;
>    1798	
>    1799			if (ap_remove) {
>    1800				if (vfio_ap_mdev_unassign_apids(matrix_mdev, apid_rem))
>    1801					unassigned = true;
>    1802				vfio_ap_mdev_unlink_apids(matrix_mdev, apid_rem);
>    1803			}
>    1804	
>    1805			if (aq_remove) {
>    1806				if (vfio_ap_mdev_unassign_apqis(matrix_mdev, apqi_rem))
>    1807					unassigned = true;
>    1808				vfio_ap_mdev_unlink_apqis(matrix_mdev, apqi_rem);
>    1809			}
>    1810		}
>    1811	}
>    1812	
>> 1813	void vfio_ap_mdev_on_cfg_add(void)
>    1814	{
>    1815		unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
>    1816	
>    1817		cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>    1818		cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>    1819	
>    1820		prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>    1821		prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>    1822	
>    1823		bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
>    1824		bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
>    1825	}
>    1826	
>



* Re: [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification
  2020-10-22 21:17   ` kernel test robot
  2020-10-26 17:07     ` Tony Krowiak
@ 2020-10-26 17:21     ` Tony Krowiak
  1 sibling, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-26 17:21 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede



On 10/22/20 5:17 PM, kernel test robot wrote:
> Hi Tony,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on s390/features]
> [also build test WARNING on linus/master next-20201022]
> [cannot apply to kvms390/next linux/master v5.9]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-allyesconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/0day-ci/linux/commit/32786ef6d4ba3703d993a8894ea1d763785fd3a4
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
>          git checkout 32786ef6d4ba3703d993a8894ea1d763785fd3a4
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
>     drivers/s390/crypto/vfio_ap_ops.c:1316:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
>      1316 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>           |     ^~~~~~~~~~~~~~~~~~~~~~~~
>     drivers/s390/crypto/vfio_ap_ops.c:1568:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
>      1568 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     drivers/s390/crypto/vfio_ap_ops.c: In function 'vfio_ap_mdev_on_cfg_remove':
>>> drivers/s390/crypto/vfio_ap_ops.c:1777:7: warning: variable 'unassigned' set but not used [-Wunused-but-set-variable]
>      1777 |  bool unassigned = false;
>           |       ^~~~~~~~~~
>     drivers/s390/crypto/vfio_ap_ops.c: At top level:
>>> drivers/s390/crypto/vfio_ap_ops.c:1813:6: warning: no previous prototype for 'vfio_ap_mdev_on_cfg_add' [-Wmissing-prototypes]
>      1813 | void vfio_ap_mdev_on_cfg_add(void)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~

Needs to be static, will fix.

>     In file included from drivers/s390/crypto/vfio_ap_ops.c:11:
>     In function 'memcpy',
>         inlined from 'vfio_ap_mdev_unassign_apids' at drivers/s390/crypto/vfio_ap_ops.c:1655:3,
>         inlined from 'vfio_ap_mdev_on_cfg_remove' at drivers/s390/crypto/vfio_ap_ops.c:1800:8,
>         inlined from 'vfio_ap_on_cfg_changed' at drivers/s390/crypto/vfio_ap_ops.c:1836:2:
>     include/linux/string.h:402:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
>       402 |    __read_overflow2();
>           |    ^~~~~~~~~~~~~~~~~~

Will replace memcpy with bitmap_copy.
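The FORTIFY_SOURCE check fired because memcpy() was asked to read more bytes than the compiler knows the source object holds. The exact call at vfio_ap_ops.c:1655 is not shown in the report. The kernel's bitmap_copy(dst, src, nbits) avoids this by deriving the length from the bit count, BITS_TO_LONGS(nbits) * sizeof(unsigned long), so it stays within any bitmap declared with DECLARE_BITMAP(name, nbits). The following userspace stand-in uses hypothetical demo_* names.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(nbits) \
	(((nbits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
/* Userspace stand-in for DECLARE_BITMAP(name, nbits). */
#define DEMO_DECLARE_BITMAP(name, nbits) \
	unsigned long name[DEMO_BITS_TO_LONGS(nbits)]

/* Copy exactly the words that back an nbits-wide bitmap, no more. */
static void demo_bitmap_copy(unsigned long *dst, const unsigned long *src,
			     unsigned int nbits)
{
	memcpy(dst, src, DEMO_BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}
```

Because source and destination are both sized from the same bit count, the copy length cannot exceed either object, which is the property the fortified memcpy() was checking.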

>
> vim +/unassigned +1777 drivers/s390/crypto/vfio_ap_ops.c
>
>    1774	
>    1775	static void vfio_ap_mdev_on_cfg_remove(void)
>    1776	{
>> 1777		bool unassigned = false;
>    1778		int ap_remove, aq_remove;
>    1779		struct ap_matrix_mdev *matrix_mdev;
>    1780		DECLARE_BITMAP(apid_rem, AP_DEVICES);
>    1781		DECLARE_BITMAP(apqi_rem, AP_DOMAINS);
>    1782		unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
>    1783	
>    1784		cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>    1785		cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>    1786		prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>    1787		prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>    1788	
>    1789		ap_remove = bitmap_andnot(apid_rem, prev_apm, cur_apm, AP_DEVICES);
>    1790		aq_remove = bitmap_andnot(apqi_rem, prev_aqm, cur_aqm, AP_DOMAINS);
>    1791	
>    1792		if (!ap_remove && !aq_remove)
>    1793			return;
>    1794	
>    1795		list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
>    1796			if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>    1797				continue;
>    1798	
>    1799			if (ap_remove) {
>    1800				if (vfio_ap_mdev_unassign_apids(matrix_mdev, apid_rem))
>    1801					unassigned = true;
>    1802				vfio_ap_mdev_unlink_apids(matrix_mdev, apid_rem);
>    1803			}
>    1804	
>    1805			if (aq_remove) {
>    1806				if (vfio_ap_mdev_unassign_apqis(matrix_mdev, apqi_rem))
>    1807					unassigned = true;
>    1808				vfio_ap_mdev_unlink_apqis(matrix_mdev, apqi_rem);
>    1809			}
>    1810		}
>    1811	}
>    1812	
>> 1813	void vfio_ap_mdev_on_cfg_add(void)
>    1814	{
>    1815		unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
>    1816	
>    1817		cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>    1818		cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>    1819	
>    1820		prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>    1821		prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>    1822	
>    1823		bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
>    1824		bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
>    1825	}
>    1826	
>



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-22 17:11 ` [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
  2020-10-22 19:44   ` kernel test robot
@ 2020-10-27  6:48   ` Halil Pasic
  2020-10-29 23:29     ` Tony Krowiak
  1 sibling, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-27  6:48 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:11:56 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The queues assigned to a matrix mediated device are currently reset when:
> 
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.

What about the situation when vfio_ap_mdev_group_notifier() is called to
tell us that our pointer to KVM is about to become invalid? Do we need to
clean up the IRQ stuff there?

> 
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.

Makes sense.

> 
> Since interrupt processing may have been enabled by the guest, it may also
> be necessary to clean up the resources used for interrupt processing. Part
> of the cleanup operation requires a reference to KVM, so a check is also
> being added to ensure the reference to KVM exists. This is needed because
> the release callback - invoked when userspace closes the mdev fd - removes
> the reference to KVM. When the remove callback - called when the mdev is
> removed from sysfs - is subsequently invoked, there will be no reference to
> KVM when the cleanup is performed.

Please see below in the code.

> 
> This patch will also do a bit of refactoring due to the fact that the
> remove callback, implemented in vfio_ap_drv.c, disables the queue after
> resetting it. Instead of the remove callback making a call into the
> vfio_ap_ops.c to clean up the resources used for interrupt processing,
> let's move the probe and remove callbacks into the vfio_ap_ops.c
> file to keep all code related to managing queues in a single file.
>

It would have been helpful to split out the refactoring as a separate
patch. As it stands, the moved code is harder to review because it is
intermingled with the changes that are meant to alter behavior.
 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     | 45 +------------------
>  drivers/s390/crypto/vfio_ap_ops.c     | 63 +++++++++++++++++++--------
>  drivers/s390/crypto/vfio_ap_private.h |  7 +--
>  3 files changed, 52 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index be2520cc010b..73bd073fd5d3 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
>  
>  MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>  
> -/**
> - * vfio_ap_queue_dev_probe:
> - *
> - * Allocate a vfio_ap_queue structure and associate it
> - * with the device as driver_data.
> - */
> -static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
> -{
> -	struct vfio_ap_queue *q;
> -
> -	q = kzalloc(sizeof(*q), GFP_KERNEL);
> -	if (!q)
> -		return -ENOMEM;
> -	dev_set_drvdata(&apdev->device, q);
> -	q->apqn = to_ap_queue(&apdev->device)->qid;
> -	q->saved_isc = VFIO_AP_ISC_INVALID;
> -	return 0;
> -}
> -
> -/**
> - * vfio_ap_queue_dev_remove:
> - *
> - * Takes the matrix lock to avoid actions on this device while removing
> - * Free the associated vfio_ap_queue structure
> - */
> -static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> -{
> -	struct vfio_ap_queue *q;
> -	int apid, apqi;
> -
> -	mutex_lock(&matrix_dev->lock);
> -	q = dev_get_drvdata(&apdev->device);
> -	dev_set_drvdata(&apdev->device, NULL);
> -	apid = AP_QID_CARD(q->apqn);
> -	apqi = AP_QID_QUEUE(q->apqn);
> -	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> -	vfio_ap_irq_disable(q);
> -	kfree(q);
> -	mutex_unlock(&matrix_dev->lock);
> -}
> -
>  static void vfio_ap_matrix_dev_release(struct device *dev)
>  {
>  	struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
> @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void)
>  		return ret;
>  
>  	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
> -	vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
> -	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
> +	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
> +	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>  	vfio_ap_drv.ids = ap_queue_ids;
>  
>  	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..c471832f0a30 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn)
>   */
>  static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>  {
> -	if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
> +	if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev &&
> +	    q->matrix_mdev->kvm)

Here is the check that the kvm reference exists, which you mentioned in
the cover letter. You make only the gisc_unregister depend on it, because
that's what is going to explode.

But I'm actually wondering if "KVM is gone but we still haven't cleaned
up our aqic resources" is a valid state. I argue that it is not. The two
resources we manage are the gisc registration and the pinned page. I
argue that it makes no sense to keep what was the guest's page pinned
if there is no guest associated (we don't have KVM).

I assume the cleanup is supposed to be atomic from the perspective of
other threads/contexts, so I expect the cleanup either to have been fully
done or not to have entered the critical section at all.

So !kvm && (q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn) is a
bug. Isn't it?

In that sense this change would only hide the actual problem.

Is the scenario we are talking about something that can happen, or is
this just about programming defensively?

In any case, I don't think this is a good idea. We can be defensive
about it, but we have to do it differently.
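The invariant argued above can be written down as a predicate. The following is a minimal userspace sketch, not driver code: struct model_queue, struct model_mdev and MODEL_ISC_INVALID are hypothetical stand-ins for struct vfio_ap_queue, struct ap_matrix_mdev and VFIO_AP_ISC_INVALID. A defensive variant of vfio_ap_free_aqic_resources() could evaluate such a predicate and complain (in the kernel, for instance, with WARN_ON_ONCE()) instead of silently skipping only the GISC unregistration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the driver's structures. */
#define MODEL_ISC_INVALID 0xff

struct model_mdev {
	void *kvm;			/* stands in for struct kvm * */
};

struct model_queue {
	struct model_mdev *matrix_mdev;
	unsigned char saved_isc;	/* registered guest ISC, or MODEL_ISC_INVALID */
	unsigned long saved_pfn;	/* pinned guest page, or 0 */
};

/* Does the queue still own a GISC registration or a pinned page? */
static bool model_holds_aqic_resources(const struct model_queue *q)
{
	return q->saved_isc != MODEL_ISC_INVALID || q->saved_pfn != 0;
}

/*
 * The invariant: aqic resources may only be held while a KVM is associated.
 * A false return means the "!kvm && resources held" bug state was reached.
 */
static bool model_aqic_state_consistent(const struct model_queue *q)
{
	bool have_kvm = q->matrix_mdev && q->matrix_mdev->kvm;

	return have_kvm || !model_holds_aqic_resources(q);
}
```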


>  		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
>  	if (q->saved_pfn && q->matrix_mdev)
>  		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
> @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>   * Returns if ap_aqic function failed with invalid, deconfigured or
>   * checkstopped AP.
>   */
> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> +static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>  {
>  	struct ap_qirq_ctrl aqic_gisa = {};
>  	struct ap_queue_status status;
> @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	if (!q)
>  		goto out_unlock;
>  
> +	q->matrix_mdev = matrix_mdev;

What is the purpose of this? Doesn't the preceding vfio_ap_get_queue()
already set q->matrix_mdev?

>  	status = vcpu->run->s.regs.gprs[1];
>  
>  	/* If IR bit(16) is set we enable the interrupt */
> @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	return NOTIFY_OK;
>  }
>  
> -static void vfio_ap_irq_disable_apqn(int apqn)
> -{
> -	struct device *dev;
> -	struct vfio_ap_queue *q;
> -
> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				 &apqn, match_apqn);
> -	if (dev) {
> -		q = dev_get_drvdata(dev);
> -		vfio_ap_irq_disable(q);
> -		put_device(dev);
> -	}
> -}
> -
>  int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>  			     unsigned int retry)
>  {
> @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>  {
>  	int ret;
>  	int rc = 0;
> +	struct vfio_ap_queue *q;
>  	unsigned long apid, apqi;
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>  			 */
>  			if (ret)
>  				rc = ret;
> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> +			q = vfio_ap_get_queue(matrix_mdev,
> +					      AP_MKQID(apid, apqi));
> +			if (q)
> +				vfio_ap_free_aqic_resources(q);

Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
think so. I mean, does the current code (and vfio_ap_mdev_reset_queue()
in particular) guarantee that the reset is actually done when we arrive
here? BTW, I think we have a similar problem with the current code as
well.

Under what circumstances do we expect !q? If we don't, then we need to
complain one way or another.

I believe that each time we call vfio_ap_mdev_reset_queue(), we will
also want to call vfio_ap_free_aqic_resources(q) to clean up our aqic
resources associated with the queue -- if any. So I would really prefer
having a function that does both.
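A minimal sketch of such a combined helper, written as a userspace model with hypothetical names: model_reset_queue() and model_free_aqic_resources() stand in for vfio_ap_mdev_reset_queue() and vfio_ap_free_aqic_resources(), and it is assumed that the reset function can report whether the reset actually completed. A missing queue is reported to the caller rather than ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; not the driver's structures or functions. */
#define MODEL_ISC_INVALID 0xff

struct model_queue {
	int apqn;
	int reset_done;			/* set once the reset is known to be complete */
	unsigned char saved_isc;	/* registered guest ISC, or MODEL_ISC_INVALID */
	unsigned long saved_pfn;	/* pinned guest page, or 0 */
};

/* Stand-in for vfio_ap_mdev_reset_queue(); assumed to return 0 only on a completed reset. */
static int model_reset_queue(struct model_queue *q)
{
	q->reset_done = 1;
	return 0;
}

/* Stand-in for vfio_ap_free_aqic_resources(): drop GISC registration and pinned page. */
static void model_free_aqic_resources(struct model_queue *q)
{
	q->saved_isc = MODEL_ISC_INVALID;
	q->saved_pfn = 0;
}

/* Reset the queue and release its aqic resources only after a confirmed reset. */
static int model_reset_queue_and_cleanup(struct model_queue *q)
{
	int ret;

	if (!q)
		return -ENODEV;		/* an unexpected missing queue is reported, not ignored */

	ret = model_reset_queue(q);
	if (ret)
		return ret;		/* reset not confirmed: keep the resources */

	model_free_aqic_resources(q);
	return 0;
}
```

The design choice is simply that cleanup is skipped whenever the reset is not confirmed, which addresses the ordering concern raised above.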

>  		}
>  	}
>  
> @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void)
>  {
>  	mdev_unregister_device(&matrix_dev->device);
>  }
> +
> +int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> +{
> +	struct vfio_ap_queue *q;
> +	struct ap_queue *queue;
> +
> +	queue = to_ap_queue(&apdev->device);
> +
> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
> +	if (!q)
> +		return -ENOMEM;
> +
> +	dev_set_drvdata(&queue->ap_dev.device, q);
> +	q->apqn = queue->qid;
> +	q->saved_isc = VFIO_AP_ISC_INVALID;
> +
> +	return 0;
> +}
> +
> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> +{
> +	struct vfio_ap_queue *q;
> +	struct ap_queue *queue;
> +	int apid, apqi;
> +
> +	queue = to_ap_queue(&apdev->device);

What is the benefit of rewriting this? You introduced
queue just to do queue->ap_dev to get to the apdev you
have in hand in the first place.

> +
> +	mutex_lock(&matrix_dev->lock);
> +	q = dev_get_drvdata(&queue->ap_dev.device);
> +	dev_set_drvdata(&queue->ap_dev.device, NULL);
> +	apid = AP_QID_CARD(q->apqn);
> +	apqi = AP_QID_QUEUE(q->apqn);
> +	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> +	vfio_ap_free_aqic_resources(q);
> +	kfree(q);
> +	mutex_unlock(&matrix_dev->lock);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index f46dde56b464..d9003de4fbad 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -90,8 +90,6 @@ struct ap_matrix_mdev {
>  
>  extern int vfio_ap_mdev_register(void);
>  extern void vfio_ap_mdev_unregister(void);
> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> -			     unsigned int retry);
>  
>  struct vfio_ap_queue {
>  	struct ap_matrix_mdev *matrix_mdev;
> @@ -100,5 +98,8 @@ struct vfio_ap_queue {
>  #define VFIO_AP_ISC_INVALID 0xff
>  	unsigned char saved_isc;
>  };
> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
> +
> +int vfio_ap_mdev_probe_queue(struct ap_device *queue);
> +void vfio_ap_mdev_remove_queue(struct ap_device *queue);
> +
>  #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v11 02/14] 390/vfio-ap: use new AP bus interface to search for queue devices
  2020-10-22 17:11 ` [PATCH v11 02/14] 390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
@ 2020-10-27  7:01   ` Halil Pasic
  2020-11-02 21:57     ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-27  7:01 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:11:57 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> This patch refactors the vfio_ap device driver to use the AP bus's
> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
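For reference, the key of the hashtable described in the commit message, the APQN, packs an adapter ID and a queue index into a single number. The sketch below assumes the layout of AP_MKQID(), AP_QID_CARD() and AP_QID_QUEUE() in drivers/s390/crypto/ap_bus.h at the time of this series (adapter ID in bits 8-15, queue index in bits 0-7). The MODEL_ macros and model_format_apqn() are hypothetical, and the output format mirrors the "%02lx.%04lx" used by MDEV_SHARING_ERR in patch 05.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed APQN layout: adapter (card) ID in bits 8-15, queue index in bits 0-7. */
#define MODEL_MKQID(card, queue) ((((card) & 0xffUL) << 8) | ((queue) & 0xffUL))
#define MODEL_QID_CARD(qid)      (((qid) >> 8) & 0xffUL)
#define MODEL_QID_QUEUE(qid)     ((qid) & 0xffUL)

/* Format an APQN as "adapter.domain", the way this series' log messages do. */
static void model_format_apqn(unsigned long apqn, char *buf, size_t len)
{
	snprintf(buf, len, "%02lx.%04lx", MODEL_QID_CARD(apqn), MODEL_QID_QUEUE(apqn));
}
```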

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 35 +++++++++++++------------------
>  1 file changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index c471832f0a30..049b97d7444c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,43 +26,36 @@
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
> -static int match_apqn(struct device *dev, const void *data)
> -{
> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
>  /**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>   * @matrix_mdev: the associated mediated matrix
>   * @apqn: The queue APQN
>   *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> + * the AP bus.
>   *
> - * Returns the pointer to the associated vfio_ap_queue
> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
>  static struct vfio_ap_queue *vfio_ap_get_queue(
>  					struct ap_matrix_mdev *matrix_mdev,
> -					int apqn)
> +					unsigned long apqn)
>  {
> -	struct vfio_ap_queue *q;
> -	struct device *dev;
> +	struct ap_queue *queue;
> +	struct vfio_ap_queue *q = NULL;
>  
>  	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>  		return NULL;
>  	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>  		return NULL;
>  
> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				 &apqn, match_apqn);
> -	if (!dev)
> +	queue = ap_get_qdev(apqn);
> +	if (!queue)
>  		return NULL;
> -	q = dev_get_drvdata(dev);
> -	q->matrix_mdev = matrix_mdev;
> -	put_device(dev);
> +
> +	if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
> +		q = dev_get_drvdata(&queue->ap_dev.device);
> +

This needs to be called with the vfio_ap lock held, right? Otherwise the
queue could get unbound while we are working with it as a vfio_ap_queue...
Nothing new, but it might be worth documenting.

> +	put_device(&queue->ap_dev.device);
>  
>  	return q;
>  }



* Re: [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-10-22 17:11 ` [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
@ 2020-10-27  9:33   ` Halil Pasic
  0 siblings, 0 replies; 68+ messages in thread
From: Halil Pasic @ 2020-10-27  9:33 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:11:58 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>    * When the queue device is probed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be linked.
> 
>    * When an adapter or domain is assigned to a matrix mdev, for each new
>      APQN assigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>    * When the queue device is removed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be unlinked.
> 
>    * When an adapter or domain is unassigned from a matrix mdev, for each
>      APQN unassigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be unlinked.
> 

I would prefer if the changes to the q->matrix_mdev link were restricted
to this patch. Patches 1 and 2 do some of that as well. See my
comments in the code.

> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 146 +++++++++++++++++++++++---
>  drivers/s390/crypto/vfio_ap_private.h |   3 +
>  2 files changed, 135 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 049b97d7444c..1357f8f8b7e4 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -28,7 +28,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
>  /**
>   * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> - * @matrix_mdev: the associated mediated matrix
>   * @apqn: The queue APQN
>   *
>   * Retrieve a queue with a specific APQN from the AP queue devices attached to
> @@ -36,18 +35,11 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>   *
>   * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> -					struct ap_matrix_mdev *matrix_mdev,
> -					unsigned long apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>  {
>  	struct ap_queue *queue;
>  	struct vfio_ap_queue *q = NULL;
>  
> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> -		return NULL;
> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> -		return NULL;
> -
>  	queue = ap_get_qdev(apqn);
>  	if (!queue)
>  		return NULL;

Patch 2 removed
	q->matrix_mdev = matrix_mdev;
because patch 1 made it redundant. But patch 1 should not have made it
redundant in the first place.

It should be removed in this patch.

> @@ -60,6 +52,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
>  	return q;
>  }
>  
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> +		if (q && (q->apqn == apqn))
> +			return q;
> +	}
> +
> +	return NULL;
> +}
> +
>  /**
>   * vfio_ap_wait_for_irqclear
>   * @apqn: The AP Queue number
> @@ -171,7 +176,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>  		  status.response_code);
>  end_free:
>  	vfio_ap_free_aqic_resources(q);
> -	q->matrix_mdev = NULL;
>  	return status;
>  }
>  
> @@ -284,14 +288,14 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  
>  	if (!vcpu->kvm->arch.crypto.pqap_hook)
>  		goto out_unlock;
> +
>  	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>  				   struct ap_matrix_mdev, pqap_hook);
>  
> -	q = vfio_ap_get_queue(matrix_mdev, apqn);
> +	q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>  	if (!q)
>  		goto out_unlock;
>  
> -	q->matrix_mdev = matrix_mdev;

This was unnecessarily added in patch 1, now it's removed.

>  	status = vcpu->run->s.regs.gprs[1];
>  
>  	/* If IR bit(16) is set we enable the interrupt */
> @@ -331,6 +335,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>  	matrix_mdev->mdev = mdev;
>  	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> +	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
>  	matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -559,6 +564,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>  	return 0;
>  }
>  
> +enum qlink_type {
> +	LINK_APID,
> +	LINK_APQI,
> +	UNLINK_APID,
> +	UNLINK_APQI,
> +};
> +
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> +				    unsigned long apid, unsigned long apqi)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +	if (q) {
> +		q->matrix_mdev = matrix_mdev;
> +		hash_add(matrix_mdev->qtable,
> +			 &q->mdev_qnode, q->apqn);
> +	}
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +	if (q) {
> +		q->matrix_mdev = NULL;
> +		hash_del(&q->mdev_qnode);
> +	}
> +}
> +
> +/**
> + * vfio_ap_mdev_link_queues
> + *
> + * @matrix_mdev: The matrix mdev to link.
> + * @type:	 The type of @qlink_id.
> + * @qlink_id:	 The APID or APQI of the queues to link.
> + *
> + * Sets or clears the links between the queues with the specified @qlink_id
> + * and the @matrix_mdev:
> + *     @type == LINK_APID: Set the links between the @matrix_mdev and the
> + *                         queues with the specified @qlink_id (APID)
> + *     @type == LINK_APQI: Set the links between the @matrix_mdev and the
> + *                         queues with the specified @qlink_id (APQI)
> + *     @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
> + *                           queues with the specified @qlink_id (APID)
> + *     @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
> + *                           queues with the specified @qlink_id (APQI)
> + */
> +static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
> +				     enum qlink_type type,
> +				     unsigned long qlink_id)

I believe Connie wanted this changed, and IMHO she is right: this
parameter does not specify the type of link (the type of the link is
always the same) but determines what action needs to be taken. The enum
name qlink_type reads like it's the type of the qlink, but as your doc
says, it just tells you what qlink_id is.

If apids and apqis had their own type-checked distinct type, the type of qlink_id
would be the union of those two...
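One way to read that suggestion, sketched as a userspace model with hypothetical types: struct apid and struct apqi wrap the IDs so the compiler rejects mixing them up, the action (set or clear) is a separate parameter, and there is one function per ID kind instead of a four-way switch. Plain bool arrays stand in for the apm/aqm bitmaps and the qtable links. Note that in the quoted patch the UNLINK_APQI case calls vfio_ap_mdev_link_queue(); in the sketch, both ID kinds clear links when asked to.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IDS 4	/* tiny matrix for illustration; the real masks have 256 bits */

/* Hypothetical distinct ID types: passing an APQI where an APID is expected won't compile. */
struct apid { unsigned long v; };
struct apqi { unsigned long v; };

/* The action is its own parameter, separate from which kind of ID it applies to. */
enum qlink_action { QLINK_SET, QLINK_CLEAR };

struct model_mdev {
	bool apm[MODEL_IDS];			/* assigned adapters */
	bool aqm[MODEL_IDS];			/* assigned domains */
	bool link[MODEL_IDS][MODEL_IDS];	/* queue <-> mdev link per APQN */
};

/* assign/unassign_adapter: touch every APQN formed with the assigned domains. */
static void model_update_links_apid(struct model_mdev *m, struct apid apid,
				    enum qlink_action action)
{
	for (unsigned long apqi = 0; apqi < MODEL_IDS; apqi++)
		if (m->aqm[apqi])
			m->link[apid.v][apqi] = (action == QLINK_SET);
}

/* assign/unassign_domain: touch every APQN formed with the assigned adapters. */
static void model_update_links_apqi(struct model_mdev *m, struct apqi apqi,
				    enum qlink_action action)
{
	for (unsigned long apid = 0; apid < MODEL_IDS; apid++)
		if (m->apm[apid])
			m->link[apid][apqi.v] = (action == QLINK_SET);
}
```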

> +{
> +	unsigned long id;
> +
> +	switch (type) {

Since each of these cases is used at exactly one place, maybe it would
be simpler to just inline them where they are needed. Or are these going
to be used in other situations as well?

> +	case LINK_APID:

assign_adapter

> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
> +		break;
> +	case UNLINK_APID:

unassign_adapter

> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1)
> +			vfio_ap_mdev_unlink_queue(qlink_id, id);
> +		break;
> +	case LINK_APQI:

assign_domain

> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +				     matrix_mdev->matrix.apm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> +		break;
> +	case UNLINK_APQI:

unassign_domain

> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +				     matrix_mdev->matrix.apm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +	}
> +}
> +
>  /**
>   * assign_adapter_store
>   *
> @@ -628,6 +714,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	if (ret)
>  		goto share_err;
>  
> +	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>  	ret = count;
>  	goto done;
>  
> @@ -679,6 +766,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> +	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -769,6 +857,7 @@ static ssize_t assign_domain_store(struct device *dev,
>  	if (ret)
>  		goto share_err;
>  
> +	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>  	ret = count;
>  	goto done;
>  
> @@ -821,6 +910,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> +	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -1159,8 +1249,8 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>  			 */
>  			if (ret)
>  				rc = ret;
> -			q = vfio_ap_get_queue(matrix_mdev,
> -					      AP_MKQID(apid, apqi));
> +			q = vfio_ap_mdev_get_queue(matrix_mdev,
> +						   AP_MKQID(apid, apqi));
>  			if (q)
>  				vfio_ap_free_aqic_resources(q);
>  		}
> @@ -1288,6 +1378,29 @@ void vfio_ap_mdev_unregister(void)
>  	mdev_unregister_device(&matrix_dev->device);
>  }
>  
> +/**
> + * vfio_ap_queue_link_mdev
> + *
> + * @q: The queue to link with the matrix mdev.
> + *
> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
> + */
> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
> +{
> +	unsigned long apid = AP_QID_CARD(q->apqn);
> +	unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> +		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
> +		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
> +			q->matrix_mdev = matrix_mdev;
> +			hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
> +			break;
> +		}
> +	}
> +}
> +
>  int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1299,9 +1412,12 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  	if (!q)
>  		return -ENOMEM;
>  
> +	mutex_lock(&matrix_dev->lock);
>  	dev_set_drvdata(&queue->ap_dev.device, q);
>  	q->apqn = queue->qid;
>  	q->saved_isc = VFIO_AP_ISC_INVALID;
> +	vfio_ap_queue_link_mdev(q);
> +	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
>  }
> @@ -1321,6 +1437,8 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  	apqi = AP_QID_QUEUE(q->apqn);
>  	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>  	vfio_ap_free_aqic_resources(q);
> +	if (q->matrix_mdev)
> +		hash_del(&q->mdev_qnode);
>  	kfree(q);
>  	mutex_unlock(&matrix_dev->lock);
>  }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index d9003de4fbad..4e5cc72fc0db 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -18,6 +18,7 @@
>  #include <linux/delay.h>
>  #include <linux/mutex.h>
>  #include <linux/kvm_host.h>
> +#include <linux/hashtable.h>
>  
>  #include "ap_bus.h"
>  
> @@ -86,6 +87,7 @@ struct ap_matrix_mdev {
>  	struct kvm *kvm;
>  	struct kvm_s390_module_hook pqap_hook;
>  	struct mdev_device *mdev;
> +	DECLARE_HASHTABLE(qtable, 8);

I'm not sure about the benefit of this hashtable if the bus is supposed
to give us O(1) queue lookup based on APQN. I guess it's also easier to
right-size the hashtable in the bus than for each mdev.

Don't get me wrong, I'm willing to accept these hashtables.

Another thing I'm thinking about is how do we want to deal later with
resources filtered because one of the required queues is missing. Does
it make sense to maintain the link for those? I will have to study the
following patches and return to this one later.

Regards,
Halil


>  };
>  
>  extern int vfio_ap_mdev_register(void);
> @@ -97,6 +99,7 @@ struct vfio_ap_queue {
>  	int	apqn;
>  #define VFIO_AP_ISC_INVALID 0xff
>  	unsigned char saved_isc;
> +	struct hlist_node mdev_qnode;
>  };
>  
>  int vfio_ap_mdev_probe_queue(struct ap_device *queue);



* Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use
  2020-10-22 17:11 ` [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
@ 2020-10-27 13:01   ` Halil Pasic
  2020-10-27 16:55   ` Harald Freudenberger
  1 sibling, 0 replies; 68+ messages in thread
From: Halil Pasic @ 2020-10-27 13:01 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:11:59 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.

As discussed last time, there seems to be nothing that would prevent
a resource from becoming in use between the in_use() callback returning
false and the resource being removed as a result of
ap_bus_revise_bindings().

Another thing that may be of interest is that we now hold the
ap_perms_mutex for the in_use() checks. The ap_perms_mutex is used
in ap_device_probe(), and I don't quite understand some of its usages
in zcrypt_api.c. My feeling is that the extra pressure on that lock
should not be a problem, unless in_use() were to not return because
of some deadlock.

With all that said, if Harald is fine with it, so am I.

Acked-by: Halil Pasic <pasic@linux.ibm.com>

> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>


* Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-10-22 17:12 ` [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
@ 2020-10-27 13:27   ` Halil Pasic
  2020-11-13 17:14     ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-27 13:27 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:12:00 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c     | 78 +++++++++++++++++++--------
>  drivers/s390/crypto/vfio_ap_private.h |  2 +
>  3 files changed, 60 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 73bd073fd5d3..8934471b7944 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
>  	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>  	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
>  	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
> +	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>  	vfio_ap_drv.ids = ap_queue_ids;
>  
>  	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1357f8f8b7e4..9e9fad560859 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>  	return 0;
>  }
>  
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> +			 "already assigned to %s"
> +
> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> +					 unsigned long *apm,
> +					 unsigned long *aqm)
> +{
> +	unsigned long apid, apqi;
> +
> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> +			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);

Isn't the error log level rather severe for this? For my taste, even
the warning level would be too severe.

> +}
> +
>  /**
>   * vfio_ap_mdev_verify_no_sharing
>   *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> + * Verifies that each APQN derived from the cross product of the AP adapter IDs
> + * and AP queue indexes comprising an AP matrix is not assigned to a
>   * mediated device. AP queue sharing is not allowed.
>   *
> - * @matrix_mdev: the mediated matrix device
> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> + *		 are assigned. If the value is not NULL, then verification will
> + *		 proceed for all other matrix mediated devices; otherwise, all
> + *		 matrix mediated devices will be verified.
> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>   *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * Returns 0 if no APQNs are not shared, otherwise; returns -EADDRINUSE if one
> + * or more APQNs are shared.
>   */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> +					  unsigned long *mdev_apm,
> +					  unsigned long *mdev_aqm)
>  {
>  	struct ap_matrix_mdev *lstdev;
>  	DECLARE_BITMAP(apm, AP_DEVICES);
> @@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>  		 * We work on full longs, as we can only exclude the leftover
>  		 * bits in non-inverse order. The leftover is all zeros.
>  		 */
> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> -				lstdev->matrix.apm, AP_DEVICES))
> +		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>  			continue;
>  
> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> -				lstdev->matrix.aqm, AP_DOMAINS))
> +		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>  			continue;
>  
> +		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
> +					     apm, aqm);
> +
>  		return -EADDRINUSE;
>  	}
>  
> @@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  {
>  	int ret;
>  	unsigned long apid;
> +	DECLARE_BITMAP(apm, AP_DEVICES);
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	if (ret)
>  		goto done;
>  
> -	set_bit_inv(apid, matrix_mdev->matrix.apm);
> +	memset(apm, 0, sizeof(apm));
> +	set_bit_inv(apid, apm);
>  
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> +	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> +					     matrix_mdev->matrix.aqm);

What is the benefit of using a copy here? I mean we have the vfio_ap lock
so nobody can see the bit we speculatively flipped.

I've also pointed out in the previous patch that in_use() isn't
perfectly reliable (at least in theory) because of a race.

Otherwise looks good to me!
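The flip-verify-rollback sequence referred to above can be sketched as follows. This is a userspace model, not driver code: struct model_mdev, model_shares_apqn() and model_assign_adapter() are hypothetical, plain bool arrays replace the inverted-bit bitmaps, and the sharing rule mirrors vfio_ap_mdev_verify_no_sharing() (two mdevs share an APQN only if both their adapter masks and their domain masks intersect). It assumes the caller holds the lock that serializes assignments (matrix_dev->lock in the driver) across the whole sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_IDS 8	/* tiny masks for illustration; the real ones have 256 bits */

/* Hypothetical model of a matrix mdev's adapter (apm) and domain (aqm) masks. */
struct model_mdev {
	bool apm[MODEL_IDS];
	bool aqm[MODEL_IDS];
};

/* Two mdevs share an APQN only if both their apm and their aqm intersect. */
static bool model_shares_apqn(const struct model_mdev *a, const struct model_mdev *b)
{
	bool apm_hit = false, aqm_hit = false;

	for (int i = 0; i < MODEL_IDS; i++) {
		apm_hit = apm_hit || (a->apm[i] && b->apm[i]);
		aqm_hit = aqm_hit || (a->aqm[i] && b->aqm[i]);
	}
	return apm_hit && aqm_hit;
}

/*
 * Flip the bit in place, verify, and restore the previous value on conflict.
 * With the assignment lock held for the whole sequence, nobody can observe
 * the speculatively set bit, so no separate copy of the mask is needed.
 */
static int model_assign_adapter(struct model_mdev *self,
				const struct model_mdev *other, int apid)
{
	bool was_set = self->apm[apid];

	self->apm[apid] = true;
	if (model_shares_apqn(self, other)) {
		self->apm[apid] = was_set;
		return -EADDRINUSE;
	}
	return 0;
}
```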

>  	if (ret)
> -		goto share_err;
> +		goto done;
>  
> +	set_bit_inv(apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>  	ret = count;
> -	goto done;
>  
> -share_err:
> -	clear_bit_inv(apid, matrix_mdev->matrix.apm);
>  done:
>  	mutex_unlock(&matrix_dev->lock);
>  
> @@ -831,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
>  {
>  	int ret;
>  	unsigned long apqi;
> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
> @@ -851,18 +876,18 @@ static ssize_t assign_domain_store(struct device *dev,
>  	if (ret)
>  		goto done;
>  
> -	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
> +	memset(aqm, 0, sizeof(aqm));
> +	set_bit_inv(apqi, aqm);
>  
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> +	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
> +					     matrix_mdev->matrix.apm, aqm);
>  	if (ret)
> -		goto share_err;
> +		goto done;
>  
> +	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>  	ret = count;
> -	goto done;
>  
> -share_err:
> -	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>  done:
>  	mutex_unlock(&matrix_dev->lock);
>  
> @@ -1442,3 +1467,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  	kfree(q);
>  	mutex_unlock(&matrix_dev->lock);
>  }
> +
> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> +{
> +	bool in_use;
> +
> +	mutex_lock(&matrix_dev->lock);
> +	in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
> +	mutex_unlock(&matrix_dev->lock);
> +
> +	return in_use;
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 4e5cc72fc0db..c1d8b5507610 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -105,4 +105,6 @@ struct vfio_ap_queue {
>  int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>  void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>  
> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
> +
>  #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use
  2020-10-22 17:11 ` [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
  2020-10-27 13:01   ` Halil Pasic
@ 2020-10-27 16:55   ` Harald Freudenberger
  2020-11-13 21:30     ` Tony Krowiak
  1 sibling, 1 reply; 68+ messages in thread
From: Harald Freudenberger @ 2020-10-27 16:55 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: borntraeger, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, hca, gor

On 22.10.20 19:11, Tony Krowiak wrote:
> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
>
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
>
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 485cbfcbf06e..998e61cd86d9 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -35,6 +35,7 @@
>  #include <linux/mod_devicetable.h>
>  #include <linux/debugfs.h>
>  #include <linux/ctype.h>
> +#include <linux/module.h>
>  
>  #include "ap_bus.h"
>  #include "ap_debug.h"
> @@ -893,6 +894,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
>  	return 0;
>  }
>  
> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
> +			       unsigned long *newmap)
> +{
> +	unsigned long size;
> +	int rc;
> +
> +	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
> +	if (*str == '+' || *str == '-') {
> +		memcpy(newmap, bitmap, size);
> +		rc = modify_bitmap(str, newmap, bits);
> +	} else {
> +		memset(newmap, 0, size);
> +		rc = hex2bitmap(str, newmap, bits);
> +	}
> +	return rc;
> +}
> +
>  int ap_parse_mask_str(const char *str,
>  		      unsigned long *bitmap, int bits,
>  		      struct mutex *lock)
> @@ -912,14 +930,7 @@ int ap_parse_mask_str(const char *str,
>  		kfree(newmap);
>  		return -ERESTARTSYS;
>  	}
> -
> -	if (*str == '+' || *str == '-') {
> -		memcpy(newmap, bitmap, size);
> -		rc = modify_bitmap(str, newmap, bits);
> -	} else {
> -		memset(newmap, 0, size);
> -		rc = hex2bitmap(str, newmap, bits);
> -	}
> +	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
>  	if (rc == 0)
>  		memcpy(bitmap, newmap, size);
>  	mutex_unlock(lock);
> @@ -1111,12 +1122,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>  
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newapm = (unsigned long *)data;
> +
> +	/*
> +	 * No need to verify whether the driver is using the queues if it is the
> +	 * default driver.
> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
Can you please update this comment? It should be something like
/* increase the driver's module refcounter to be sure it is not
   going away when we invoke the callback function. */

> +	if (!try_module_get(drv->owner))
> +		return 0;
> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> +			rc = -EBUSY;
> +
And here: /* release driver's module */ or similar
> +	module_put(drv->owner);
> +
> +	return rc;
> +}
> +
> +static int apmask_commit(unsigned long *newapm)
> +{
> +	int rc;
> +	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
> +
> +	/*
> +	 * Check if any bits in the apmask have been set which will
> +	 * result in queues being removed from non-default drivers
> +	 */
> +	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> +				      __verify_card_reservations);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	memcpy(ap_perms.apm, newapm, APMASKSIZE);
> +
> +	return 0;
> +}
> +
>  static ssize_t apmask_store(struct bus_type *bus, const char *buf,
>  			    size_t count)
>  {
>  	int rc;
> +	DECLARE_BITMAP(newapm, AP_DEVICES);
> +
> +	if (mutex_lock_interruptible(&ap_perms_mutex))
> +		return -ERESTARTSYS;
> +
> +	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
> +	if (rc)
> +		goto done;
>  
> -	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
> +	rc = apmask_commit(newapm);
> +
> +done:
> +	mutex_unlock(&ap_perms_mutex);
>  	if (rc)
>  		return rc;
>  
> @@ -1142,12 +1211,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>  
> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newaqm = (unsigned long *)data;
> +
> +	/*
> +	 * If the reserved bits do not identify queues reserved for use by the
> +	 * non-default driver, there is no need to verify the driver is using
> +	 * the queues.
> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
Same here.
> +	if (!try_module_get(drv->owner))
> +		return 0;
> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(ap_perms.apm, newaqm))
> +			rc = -EBUSY;
> +
and here
> +	module_put(drv->owner);
> +
> +	return rc;
> +}
> +
> +static int aqmask_commit(unsigned long *newaqm)
> +{
> +	int rc;
> +	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
> +
> +	/*
> +	 * Check if any bits in the aqmask have been set which will
> +	 * result in queues being removed from non-default drivers
> +	 */
> +	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> +				      __verify_queue_reservations);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
> +
> +	return 0;
> +}
> +
>  static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
>  			    size_t count)
>  {
>  	int rc;
> +	DECLARE_BITMAP(newaqm, AP_DOMAINS);
>  
> -	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
> +	if (mutex_lock_interruptible(&ap_perms_mutex))
> +		return -ERESTARTSYS;
> +
> +	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
> +	if (rc)
> +		goto done;
> +
> +	rc = aqmask_commit(newaqm);
> +
> +done:
> +	mutex_unlock(&ap_perms_mutex);
>  	if (rc)
>  		return rc;
>  
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 5029b80132aa..6ce154d924d3 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -145,6 +145,7 @@ struct ap_driver {
>  
>  	int (*probe)(struct ap_device *);
>  	void (*remove)(struct ap_device *);
> +	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>  };
>  
>  #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> @@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
>  struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
>  			       int comp_device_type, unsigned int functions);
>  
> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
> +
>  struct ap_perms {
>  	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
>  	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
I still don't like this code. That's because of what it is doing - not because of the code quality.
And Halil, you are right. It adds more pressure to the mutex used for locking the apmask
and aqmask (and also the zcrypt multiple device drivers support code).
I am very concerned about the in_use callback, which is called with the ap_perms_mutex
held AND during bus_for_each_drv (so holding the overall AP bus mutex) and then dives
into vfio_ap, which takes yet another mutex to protect the vfio structs.
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
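The following user-space sketch shows the check that apmask_commit() and __verify_card_reservations() perform. It leaves out ap_perms_mutex, module reference counting and the bus iteration that Harald is worried about. The single-word masks, struct sketch_driver and demo_in_use() are hypothetical simplifications. Bits are numbered LSB-first rather than with the AP bus's inverted numbering.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified: one 64-bit word per mask instead of BITS_TO_LONGS(AP_DEVICES). */
typedef unsigned long long mask_t;

struct sketch_driver {
	bool is_default;			/* models AP_DRIVER_FLAG_DEFAULT */
	bool (*in_use)(mask_t apm, mask_t aqm);	/* optional callback */
};

/* Hypothetical non-default driver that is currently using adapter 3. */
static bool demo_in_use(mask_t apm, mask_t aqm)
{
	(void)aqm;
	return (apm >> 3) & 1;
}

/* Returns 0 and updates *cur_apm, or -EBUSY and leaves it untouched. */
static int sketch_apmask_commit(mask_t *cur_apm, mask_t new_apm, mask_t aqm,
				const struct sketch_driver *drv, size_t ndrv)
{
	/* Newly set bits move adapters away from non-default drivers. */
	mask_t reserved = new_apm & ~*cur_apm;

	if (reserved) {
		for (size_t i = 0; i < ndrv; i++) {
			if (drv[i].is_default || !drv[i].in_use)
				continue;
			if (drv[i].in_use(reserved, aqm))
				return -EBUSY;
		}
	}
	*cur_apm = new_apm;
	return 0;
}
```

Only newly set bits are checked. Clearing bits hands adapters back to non-default drivers and is committed unconditionally.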


* Re: [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-10-22 17:12 ` [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
@ 2020-10-27 17:28   ` Harald Freudenberger
  2020-11-13 20:58     ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Harald Freudenberger @ 2020-10-27 17:28 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: borntraeger, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, hca, gor

On 22.10.20 19:12, Tony Krowiak wrote:
> This patch introduces an extension to the ap bus to notify device drivers
> when the host AP configuration changes - i.e., adapters, domains or
> control domains are added or removed. To that end, two new callbacks are
> introduced for AP device drivers:
>
>   void (*on_config_changed)(struct ap_config_info *new_config_info,
>                             struct ap_config_info *old_config_info);
>
>      This callback is invoked at the start of the AP bus scan
>      function when it determines that the host AP configuration information
>      has changed since the previous scan. This is done by storing
>      an old and current QCI info struct and comparing them. If there is any
>      difference, the callback is invoked.
>
>      Note that when the AP bus scan detects that AP adapters, domains or
>      control domains have been removed from the host's AP configuration, it
>      will remove the associated devices from the AP bus subsystem's device
>      model. This callback gives the device driver a chance to respond to
>      the removal of the AP devices from the host configuration prior to
>      calling the device driver's remove callback. The primary purpose of
>      this callback is to allow the vfio_ap driver to do a bulk unplug of
>      all affected adapters, domains and control domains from affected
>      guests rather than unplugging them one at a time when the remove
>      callback is invoked.
>
>   void (*on_scan_complete)(struct ap_config_info *new_config_info,
>                            struct ap_config_info *old_config_info);
>
>      The on_scan_complete callback is invoked after the ap bus scan is
>      complete if the host AP configuration data has changed.
>
>      Note that when the AP bus scan detects that adapters, domains or
>      control domains have been added to the host's configuration, it will
>      create new devices in the AP bus subsystem's device model. The primary
>      purpose of this callback is to allow the vfio_ap driver to do a bulk
>      plug of all affected adapters, domains and control domains into
>      affected guests rather than plugging them one at a time when the
>      probe callback is invoked.
>
> Please note that changes to the apmask and aqmask do not trigger
> these two callbacks since the bus scan function is not invoked by changes
> to those masks.
>
> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Did I really sign off on this? I know I saw this code, but ...
First of all, please separate the ap bus changes from the vfio_ap driver changes.
This makes backports and code change history much easier.
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/ap_bus.c          | 88 ++++++++++++++++++++++++++-
>  drivers/s390/crypto/ap_bus.h          | 12 ++++
>  drivers/s390/crypto/vfio_ap_drv.c     |  2 +-
>  drivers/s390/crypto/vfio_ap_ops.c     | 11 ++--
>  drivers/s390/crypto/vfio_ap_private.h |  2 +-
>  5 files changed, 106 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 998e61cd86d9..5b94956ef6bc 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -73,8 +73,10 @@ struct ap_perms ap_perms;
>  EXPORT_SYMBOL(ap_perms);
>  DEFINE_MUTEX(ap_perms_mutex);
>  EXPORT_SYMBOL(ap_perms_mutex);
> +DEFINE_MUTEX(ap_config_lock);
This mutex is unnecessary, but see details below.
>  
>  static struct ap_config_info *ap_qci_info;
> +static struct ap_config_info *ap_qci_info_old;
>  
>  /*
>   * AP bus related debug feature things.
> @@ -1420,6 +1422,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
>  		&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
>  }
>  
> +/* Helper function for notify_config_changed */
> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
> +{
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> +	if (try_module_get(drv->owner)) {
> +		if (ap_drv->on_config_changed)
> +			ap_drv->on_config_changed(ap_qci_info,
> +						  ap_qci_info_old);
> +		module_put(drv->owner);
> +	}
> +
> +	return 0;
> +}
> +
> +/* Notify all drivers about an qci config change */
> +static inline void notify_config_changed(void)
> +{
> +	bus_for_each_drv(&ap_bus_type, NULL, NULL,
> +			 __drv_notify_config_changed);
> +}
> +
> +/* Helper function for notify_scan_complete */
> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
> +{
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> +	if (try_module_get(drv->owner)) {
> +		if (ap_drv->on_scan_complete)
> +			ap_drv->on_scan_complete(ap_qci_info,
> +						 ap_qci_info_old);
> +		module_put(drv->owner);
> +	}
> +
> +	return 0;
> +}
> +
> +/* Notify all drivers about bus scan complete */
> +static inline void notify_scan_complete(void)
> +{
> +	bus_for_each_drv(&ap_bus_type, NULL, NULL,
> +			 __drv_notify_scan_complete);
> +}
> +
> +
> +
>  /*
>   * Helper function for ap_scan_bus().
>   * Remove card device and associated queue devices.
> @@ -1696,15 +1744,45 @@ static inline void ap_scan_adapter(int ap)
>  	put_device(&ac->ap_dev.device);
>  }
>  
> +static int ap_config_changed(void)
I don't like the name here. This function effectively fetches the qci info
and then compares the new qci info with the previous one. So it is really the new
ap_get_configuration(), which returns bool true (config changed) or
false (old and current config are identical).
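Here is a minimal user-space sketch of the suggested shape. It assumes a hypothetical fetch_qci() in place of ap_fetch_qci_info() and a trimmed-down struct cfg_info. get_configuration() saves the previous QCI data, fetches the current data, and returns true only if the two differ. The scan invokes the optional driver callbacks only in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down, hypothetical stand-in for struct ap_config_info. */
struct cfg_info {
	unsigned char apm[32];	/* adapter mask */
	unsigned char aqm[32];	/* usage domain mask */
	unsigned char adm[32];	/* control domain mask */
};

static struct cfg_info qci_info, qci_info_old;

/* Hypothetical "hardware" state read by fetch_qci() instead of PQAP(QCI). */
static struct cfg_info hw_cfg;

static void fetch_qci(struct cfg_info *info)
{
	*info = hw_cfg;
}

/* Save the previous config, fetch the current one, report whether it changed. */
static bool get_configuration(void)
{
	qci_info_old = qci_info;
	fetch_qci(&qci_info);
	return memcmp(&qci_info, &qci_info_old, sizeof(qci_info)) != 0;
}

struct sketch_driver {
	void (*on_config_changed)(const struct cfg_info *new_cfg,
				  const struct cfg_info *old_cfg);
	void (*on_scan_complete)(const struct cfg_info *new_cfg,
				 const struct cfg_info *old_cfg);
};

/* Notify drivers before and after the scan, but only if the config changed. */
static void scan_bus(const struct sketch_driver *drv, size_t ndrv)
{
	bool changed = get_configuration();
	size_t i;

	assert(drv != NULL || ndrv == 0);
	for (i = 0; changed && i < ndrv; i++)
		if (drv[i].on_config_changed)
			drv[i].on_config_changed(&qci_info, &qci_info_old);

	/* ... the per-adapter scan would run here ... */

	for (i = 0; changed && i < ndrv; i++)
		if (drv[i].on_scan_complete)
			drv[i].on_scan_complete(&qci_info, &qci_info_old);
}

/* Hypothetical driver callbacks that just count invocations. */
static int n_changed, n_complete;

static void count_changed(const struct cfg_info *n, const struct cfg_info *o)
{
	(void)n;
	(void)o;
	n_changed++;
}

static void count_complete(const struct cfg_info *n, const struct cfg_info *o)
{
	(void)n;
	(void)o;
	n_complete++;
}
```

In this sketch, qci_info_old is only overwritten at the start of the next scan. As a result, both callbacks see the same old/new pair.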
> +{
> +	int cfg_chg = 0;
> +
> +	if (ap_qci_info) {
> +		if (!ap_qci_info_old) {
> +			ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
> +						  GFP_KERNEL);
> +			if (!ap_qci_info_old)
> +				return 0;
> +		} else {
> +			memcpy(ap_qci_info_old, ap_qci_info,
> +			       sizeof(struct ap_config_info));
> +		}
> +		ap_fetch_qci_info(ap_qci_info);
> +		cfg_chg = memcmp(ap_qci_info,
> +				 ap_qci_info_old,
> +				 sizeof(struct ap_config_info)) != 0;
> +	}
> +
> +	return cfg_chg;
> +}
> +
>  /**
>   * ap_scan_bus(): Scan the AP bus for new devices
>   * Runs periodically, workqueue timer (ap_config_time)
>   */
>  static void ap_scan_bus(struct work_struct *unused)
>  {
> -	int ap;
> +	int ap, config_changed = 0;
> +
> +	mutex_lock(&ap_config_lock);
This mutex more or less surrounds the whole ap_scan_bus function.
The ap_scan_bus function is only called via a workqueue, which
makes sure there is only one invocation at any point in time. So the
mutex is not needed.
>  
> -	ap_fetch_qci_info(ap_qci_info);
> +	/* config change notify */
> +	config_changed = ap_config_changed();
> +	if (config_changed)
> +		notify_config_changed();
> +	memcpy(ap_qci_info_old, ap_qci_info,
> +	       sizeof(struct ap_config_info));
>  	ap_select_domain();
>  
>  	AP_DBF_DBG("%s running\n", __func__);
> @@ -1713,6 +1791,12 @@ static void ap_scan_bus(struct work_struct *unused)
>  	for (ap = 0; ap <= ap_max_adapter_id; ap++)
>  		ap_scan_adapter(ap);
>  
> +	/* scan complete notify */
> +	if (config_changed)
> +		notify_scan_complete();
> +
> +	mutex_unlock(&ap_config_lock);
> +
>  	/* check if there is at least one queue available with default domain */
>  	if (ap_domain_index >= 0) {
>  		struct device *dev =
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 6ce154d924d3..c021ea5121a9 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -146,6 +146,18 @@ struct ap_driver {
>  	int (*probe)(struct ap_device *);
>  	void (*remove)(struct ap_device *);
>  	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
> +	/*
> +	 * Called at the start of the ap bus scan function when
> +	 * the crypto config information (qci) has changed.
> +	 */
> +	void (*on_config_changed)(struct ap_config_info *new_config_info,
> +				  struct ap_config_info *old_config_info);
> +	/*
> +	 * Called at the end of the ap bus scan function when
> +	 * the crypto config information (qci) has changed.
> +	 */
> +	void (*on_scan_complete)(struct ap_config_info *new_config_info,
> +				 struct ap_config_info *old_config_info);
>  };
>  
>  #define to_ap_drv(x) container_of((x), struct ap_driver, driver)

The rest of this patch is vfio-related and should be in a separate patch.

Please note: the ap bus scan function actively destroys card and associated queue
devices when TAPQ reports that the function bits have changed (e.g. from
EP11 mode to CCA mode) or the type has changed (e.g. from CEX6 to CEX7).
This does not come with a change in the qci apm or adm bitfields!

> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 8934471b7944..f06e19754de3 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -87,7 +87,7 @@ static int vfio_ap_matrix_dev_create(void)
>  
>  	/* Fill in config info via PQAP(QCI), if available */
>  	if (test_facility(12)) {
> -		ret = ap_qci(&matrix_dev->info);
> +		ret = ap_qci(&matrix_dev->config_info);
>  		if (ret)
>  			goto matrix_alloc_err;
>  	}
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index dae1fba41941..c4ea80ec8599 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -354,8 +354,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  	}
>  
>  	matrix_mdev->mdev = mdev;
> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
> +	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
> +	vfio_ap_matrix_init(&matrix_dev->config_info,
> +			    &matrix_mdev->shadow_apcb);
>  	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
> @@ -540,8 +541,8 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
>  		 * If the APID is not assigned to the host AP configuration,
>  		 * we can not assign it to the guest's AP configuration
>  		 */
> -		if (!test_bit_inv(apid,
> -				  (unsigned long *)matrix_dev->info.apm)) {
> +		if (!test_bit_inv(apid, (unsigned long *)
> +				  matrix_dev->config_info.apm)) {
>  			clear_bit_inv(apid, shadow_apcb.apm);
>  			continue;
>  		}
> @@ -554,7 +555,7 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
>  			 * guest's AP configuration
>  			 */
>  			if (!test_bit_inv(apqi, (unsigned long *)
> -					  matrix_dev->info.aqm)) {
> +					  matrix_dev->config_info.aqm)) {
>  				clear_bit_inv(apqi, shadow_apcb.aqm);
>  				continue;
>  			}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index fc8634cee485..5065f0367ea2 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -40,7 +40,7 @@
>  struct ap_matrix_dev {
>  	struct device device;
>  	atomic_t available_instances;
> -	struct ap_config_info info;
> +	struct ap_config_info config_info;
>  	struct list_head mdev_list;
>  	struct mutex lock;
>  	struct ap_driver  *vfio_ap_drv;


* Re: [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB
  2020-10-22 17:12 ` [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB Tony Krowiak
@ 2020-10-28  8:11   ` Halil Pasic
  2020-11-13 17:18     ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-28  8:11 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:12:01 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 24 +++++++++++++++++++-----
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 9e9fad560859..9791761aa7fd 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>  	matrix->adm_max = info->apxa ? info->Nd : 15;
>  }
>  
> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
> +}
> +
> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +				  matrix_mdev->shadow_apcb.apm,
> +				  matrix_mdev->shadow_apcb.aqm,
> +				  matrix_mdev->shadow_apcb.adm);
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>  	struct ap_matrix_mdev *matrix_mdev;
> @@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>  	matrix_mdev->mdev = mdev;
>  	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> +	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>  	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
> @@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	if (ret)
>  		return NOTIFY_DONE;
>  
> -	/* If there is no CRYCB pointer, then we can't copy the masks */
> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>  		return NOTIFY_DONE;
>  
> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -				  matrix_mdev->matrix.aqm,
> -				  matrix_mdev->matrix.adm);
> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> +	       sizeof(matrix_mdev->shadow_apcb));
> +	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  
>  	return NOTIFY_OK;
>  }
> @@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>  		kvm_put_kvm(matrix_mdev->kvm);
>  		matrix_mdev->kvm = NULL;
>  	}
> +

Unrelated change.

Otherwise patch looks OK.

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

>  	mutex_unlock(&matrix_dev->lock);
>  
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index c1d8b5507610..fc8634cee485 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -75,6 +75,7 @@ struct ap_matrix {
>   * @list:	allows the ap_matrix_mdev struct to be added to a list
>   * @matrix:	the adapters, usage domains and control domains assigned to the
>   *		mediated matrix device.
> + * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
>   * @group_notifier: notifier block used for specifying callback function for
>   *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
>   * @kvm:	the struct holding guest's state
> @@ -82,6 +83,7 @@ struct ap_matrix {
>  struct ap_matrix_mdev {
>  	struct list_head node;
>  	struct ap_matrix matrix;
> +	struct ap_matrix shadow_apcb;
>  	struct notifier_block group_notifier;
>  	struct notifier_block iommu_notifier;
>  	struct kvm *kvm;



* Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-10-22 17:12 ` [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
@ 2020-10-28  8:17   ` Halil Pasic
  2020-11-13 17:27     ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-28  8:17 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:12:02 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> +static ssize_t guest_matrix_show(struct device *dev,
> +				 struct device_attribute *attr, char *buf)
> +{
> +	ssize_t nchars;
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> +		return -ENODEV;

I'm wondering: would it make sense to have guest_matrix display the
would-be guest matrix when we don't have a KVM? With the filtering in
place, "what guest matrix would my assigned matrix result in right now
if I were to hook up my vfio_ap_mdev to a guest?" seems a legitimate
question.


> +
> +	mutex_lock(&matrix_dev->lock);
> +	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
> +	mutex_unlock(&matrix_dev->lock);
> +
> +	return nchars;
> +}
> +static DEVICE_ATTR_RO(guest_matrix);


* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-10-22 17:12 ` [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device Tony Krowiak
  2020-10-22 20:30   ` kernel test robot
@ 2020-10-28 13:57   ` Halil Pasic
  2020-11-03 22:49     ` Tony Krowiak
  1 sibling, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-28 13:57 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:12:03 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> In response to the probe or remove of a queue device, if a KVM guest is
> using the matrix mdev to which the APQN of the queue device is assigned,
> the vfio_ap device driver must respond accordingly. In an ideal world, the
> queue device being probed would be hot plugged into the guest. Likewise,
> the queue corresponding to the queue device being removed would
> be hot unplugged from the guest. Unfortunately, the AP architecture
> precludes plugging or unplugging individual queues. We must also
> consider the fact that the linux device model precludes us from passing a
> queue device through to a KVM guest that is not bound to the driver
> facilitating the pass-through. Consequently, we are left with the choice of
> plugging/unplugging the adapter or the domain. In the latter case, this
> would result in taking access to the domain away for each adapter the
> guest is using. In either case, the operation will alter a KVM guest's
> access to one or more queues, so let's plug/unplug the adapter on
> bind/unbind of the queue device since this corresponds to the hardware
> entity that may be physically plugged/unplugged - i.e., a domain is not
> a piece of hardware.
> 
> Example:
> =======
> Queue devices bound to vfio_ap device driver:
>    04.0004
>    04.0047
>    04.0054
> 
>    05.0004
>    05.0047
> 
> Adapters and domains assigned to matrix mdev:
>    Adapters  Domains  -> Queues
>    04        0004        04.0004
>    05        0047        04.0047
>              0054        04.0054
>                          05.0004
>                          05.0047
>                          05.0054
> 
> KVM guest matrix at its startup:
>    Adapters  Domains  -> Queues
>    04        0004        04.0004
>              0047        04.0047
>              0054        04.0054
> 
>    Adapter 05 is filtered because queue 05.0054 is not bound.
> 
> KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
>    Adapters  Domains  -> Queues
>    04        0004        04.0004
>    05        0047        04.0047
>              0054        04.0054
>                          05.0004
>                          05.0047
>                          05.0054
> 
>    All queues assigned to the matrix mdev are now bound.
> 
> KVM guest matrix after queue 04.0004 is unbound:
> 
>    Adapters  Domains  -> Queues
>    05        0004        05.0004
>              0047        05.0047
>              0054        05.0054
> 
>    Adapter 04 is filtered because 04.0004 is no longer bound.
> 
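The filtering rule illustrated by this example can be modelled with a small user-space sketch. It makes several simplifications:

- A hypothetical is_bound() lookup stands in for vfio_ap_mdev_get_queue().
- Plain arrays of IDs replace the AP bitmaps.
- The host-configuration checks and control domains are ignored.

Under this model, an assigned adapter is passed to the guest only if the queue for every assigned domain is bound to vfio_ap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IDS 256

/* bound[apid][apqi] models whether queue apid.apqi is bound to vfio_ap. */
static bool bound[MAX_IDS][MAX_IDS];

/* Hypothetical stand-in for looking up a bound queue. */
static bool is_bound(unsigned int apid, unsigned int apqi)
{
	assert(apid < MAX_IDS && apqi < MAX_IDS);
	return bound[apid][apqi];
}

/*
 * Copy into guest_apids every assigned APID whose queues are bound for all
 * assigned APQIs. Returns the number of adapters kept for the guest.
 */
static size_t filter_adapters(const unsigned int *apids, size_t napids,
			      const unsigned int *apqis, size_t napqis,
			      unsigned int *guest_apids)
{
	size_t kept = 0;

	for (size_t i = 0; i < napids; i++) {
		bool all_bound = true;

		for (size_t j = 0; j < napqis && all_bound; j++)
			all_bound = is_bound(apids[i], apqis[j]);
		/* A single APQN cannot be hidden, so the whole adapter is dropped. */
		if (all_bound)
			guest_apids[kept++] = apids[i];
	}
	return kept;
}
```

Consider adapters 04 and 05 with domains 0004, 0047 and 0054 assigned. Adapter 05 stays filtered until 05.0004, 05.0047 and 05.0054 are all bound. Unbinding 04.0004 drops adapter 04.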
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 158 +++++++++++++++++++++++++++++-
>  1 file changed, 155 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 7bad70d7bcef..5b34bc8fca31 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
> +{
> +	bitmap_clear(matrix->apm, 0, AP_DEVICES);
> +	bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
> +	bitmap_clear(matrix->adm, 0, AP_DOMAINS);
> +}
> +
>  static void vfio_ap_matrix_init(struct ap_config_info *info,
>  				struct ap_matrix *matrix)
>  {
> @@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>  	return 0;
>  }
>  
> +static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
> +					struct ap_matrix *matrix2)
> +{
> +	return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
> +		bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
> +		bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
> +}
> +
> +/**
> + * vfio_ap_mdev_filter_matrix
> + *
> + * Filters the matrix of adapters, domains, and control domains assigned to
> + * a matrix mdev's AP configuration and stores the result in the shadow copy of
> + * the APCB used to supply a KVM guest's AP configuration.
> + *
> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
> + *
> + * Returns true if filtering has changed the shadow copy of the APCB used
> + * to supply a KVM guest's AP configuration; otherwise, returns false.
> + */
> +static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	struct ap_matrix shadow_apcb;
> +	unsigned long apid, apqi, apqn;
> +
> +	memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		/*
> +		 * If the APID is not assigned to the host AP configuration,
> +		 * we can not assign it to the guest's AP configuration
> +		 */
> +		if (!test_bit_inv(apid,
> +				  (unsigned long *)matrix_dev->info.apm)) {
> +			clear_bit_inv(apid, shadow_apcb.apm);
> +			continue;
> +		}
> +
> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> +				     AP_DOMAINS) {
> +			/*
> +			 * If the APQI is not assigned to the host AP
> +			 * configuration, then it can not be assigned to the
> +			 * guest's AP configuration
> +			 */
> +			if (!test_bit_inv(apqi, (unsigned long *)
> +					  matrix_dev->info.aqm)) {
> +				clear_bit_inv(apqi, shadow_apcb.aqm);
> +				continue;
> +			}
> +
> +			/*
> +			 * If the APQN is not bound to the vfio_ap device
> +			 * driver, then we can't assign it to the guest's
> +			 * AP configuration. The AP architecture won't
> +			 * allow filtering of a single APQN, so let's filter
> +			 * the APID.
> +			 */
> +			apqn = AP_MKQID(apid, apqi);
> +			if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
> +				clear_bit_inv(apid, shadow_apcb.apm);
> +				break;
> +			}
> +		}
> +
> +		/*
> +		 * If all APIDs have been cleared, then clear the APQIs from the
> +		 * shadow APCB and quit filtering.
> +		 */
> +		if (bitmap_empty(shadow_apcb.apm, AP_DEVICES)) {
> +			if (!bitmap_empty(shadow_apcb.aqm, AP_DOMAINS))
> +				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
> +
> +			break;
> +		}
> +
> +		/*
> +		 * If all APQIs have been cleared, then clear the APIDs from the
> +		 * shadow APCB and quit filtering.
> +		 */
> +		if (bitmap_empty(shadow_apcb.aqm, AP_DOMAINS)) {
> +			if (!bitmap_empty(shadow_apcb.apm, AP_DEVICES))
> +				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
> +
> +			break;
> +		}

Do we do this to get the "no queues, but bits set" output in show? We could
get rid of some code if we were to not z

> +	}
> +
> +	if (vfio_ap_mdev_matrixes_equal(&matrix_mdev->shadow_apcb,
> +					&shadow_apcb))
> +		return false;
> +
> +	memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
> +	       sizeof(struct ap_matrix));
> +
> +	return true;
> +}
> +
>  enum qlink_type {
>  	LINK_APID,
>  	LINK_APQI,
> @@ -1256,9 +1361,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>  		return NOTIFY_DONE;
>  
> -	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> -	       sizeof(matrix_mdev->shadow_apcb));
> -	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  
>  	return NOTIFY_OK;
>  }
> @@ -1369,6 +1473,18 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>  		matrix_mdev->kvm = NULL;
>  	}
>  
> +	/*
> +	 * The shadow_apcb must be cleared.
> +	 *
> +	 * The shadow_apcb is committed to the guest only if the masks resulting
> +	 * from filtering the matrix_mdev->matrix differs from the masks in the
> +	 * shadow_apcb. Consequently, if we don't clear the masks here and a
> +	 * guest is subsequently started, the filtering may not result in a
> +	 * change to the shadow_apcb which will not get committed to the guest;
> +	 * in that case, the guest will be left without any queues.
> +	 */
> +	vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
> +
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> @@ -1466,6 +1582,16 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
>  	}
>  }
>  
> +static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
> +{
> +
> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> +		return;
> +
> +	if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
> +		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);

Here we do more work than necessary. At this point we know that
we either put the APID of the queue in the shadow_apcb or do nothing. To
decide whether we have to put the APID in the shadow_apcb, we need to
check, for each APQN in the Cartesian product of the APID with
shadow_apcb.aqm, whether the queue it identifies is bound to the vfio_ap
driver. vfio_ap_mdev_filter_guest_matrix(), on the other hand, does a
lookup for each assigned APQN.
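A minimal sketch of the cheaper check described above, assuming simplified, hypothetical models of the shadow APCB and of the set of bound queues (`struct shadow_apcb`, `struct bound_queues`, `mark_bound()` and `should_plug_apid()` are illustrations, not vfio_ap code). It assumes the caller has already verified that the APID is assigned to the mdev and to the host AP configuration, and it ignores the case where the shadow AQM was emptied by earlier filtering.

```c
#include <assert.h>
#include <stdbool.h>

#define AP_DEVICES 256
#define AP_DOMAINS 256

/* Bit 0 is the leftmost (most significant) bit, as in the AP masks. */
static bool test_bit_inv(unsigned int nr, const unsigned char *mask)
{
	return mask[nr / 8] & (0x80 >> (nr % 8));
}

static void set_bit_inv(unsigned int nr, unsigned char *mask)
{
	mask[nr / 8] |= (unsigned char)(0x80 >> (nr % 8));
}

/* Hypothetical, simplified model of the shadow APCB's APM and AQM. */
struct shadow_apcb {
	unsigned char apm[AP_DEVICES / 8];
	unsigned char aqm[AP_DOMAINS / 8];
};

/* Hypothetical record of which APQNs are bound to vfio_ap. */
struct bound_queues {
	unsigned char aqm[AP_DEVICES][AP_DOMAINS / 8];
};

static void mark_bound(struct bound_queues *b, unsigned int apid,
		       unsigned int apqi)
{
	set_bit_inv(apqi, b->aqm[apid]);
}

/*
 * A newly bound queue can only cause its APID to be added to the shadow
 * APM. That is allowed only if every APQN formed from @apid and an APQI
 * already in the shadow AQM is bound; no other APQNs are looked up.
 */
static bool should_plug_apid(const struct shadow_apcb *apcb,
			     const struct bound_queues *b, unsigned int apid)
{
	unsigned int apqi;

	assert(apid < AP_DEVICES);
	if (test_bit_inv(apid, apcb->apm))
		return false;	/* already in the guest: nothing to do */

	for (apqi = 0; apqi < AP_DOMAINS; apqi++)
		if (test_bit_inv(apqi, apcb->aqm) &&
		    !test_bit_inv(apqi, b->aqm[apid]))
			return false;	/* (apid, apqi) not bound: filter APID */

	return true;
}
```

With a reduced version of the commit message's example (adapter 05 plugged, domains 0004 and 0047), binding 04.0004 alone does not let adapter 04 be plugged; once 04.0047 is bound too, it does. Only the APQNs of one adapter are examined, rather than every assigned APQN.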

> +}
> +
>  int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1482,11 +1608,36 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  	q->apqn = queue->qid;
>  	q->saved_isc = VFIO_AP_ISC_INVALID;
>  	vfio_ap_queue_link_mdev(q);
> +	vfio_ap_mdev_hot_plug_queue(q);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
>  }
>  
> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> +{
> +	unsigned long apid = AP_QID_CARD(q->apqn);
> +
> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> +		return;
> +
> +	/*
> +	 * If the APID is assigned to the guest, then let's
> +	 * go ahead and unplug the adapter since the
> +	 * architecture does not provide a means to unplug
> +	 * an individual queue.
> +	 */
> +	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
> +		clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);

Shouldn't we check aqm as well? I mean, the queue's aqm bit may already be
clear at this point because of info->aqm. If that bit is clear, we don't
have to remove the apm bit.
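A minimal sketch of the extra AQM check asked about above, again on a simplified, hypothetical shadow APCB model (`hot_unplug_queue()`, `mask_empty()` and the bit helpers are illustrations, not driver functions). It assumes the APQI of the departing queue is available alongside its APID; the empty-APM handling mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AP_DEVICES 256
#define AP_DOMAINS 256

/* Bit 0 is the leftmost (most significant) bit, as in the AP masks. */
static bool test_bit_inv(unsigned int nr, const unsigned char *mask)
{
	return mask[nr / 8] & (0x80 >> (nr % 8));
}

static void set_bit_inv(unsigned int nr, unsigned char *mask)
{
	mask[nr / 8] |= (unsigned char)(0x80 >> (nr % 8));
}

static void clear_bit_inv(unsigned int nr, unsigned char *mask)
{
	mask[nr / 8] &= (unsigned char)~(0x80 >> (nr % 8));
}

static bool mask_empty(const unsigned char *mask, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (mask[i])
			return false;
	return true;
}

/* Hypothetical, simplified model of the shadow APCB's APM and AQM. */
struct shadow_apcb {
	unsigned char apm[AP_DEVICES / 8];
	unsigned char aqm[AP_DOMAINS / 8];
};

/*
 * Unplug the adapter of queue (apid, apqi) from the guest, but only if the
 * guest can actually see that APQN, i.e. both its APM and AQM bits are set.
 * Returns true if the shadow APCB changed and must be committed.
 */
static bool hot_unplug_queue(struct shadow_apcb *apcb, unsigned int apid,
			     unsigned int apqi)
{
	assert(apid < AP_DEVICES && apqi < AP_DOMAINS);
	if (!test_bit_inv(apid, apcb->apm) || !test_bit_inv(apqi, apcb->aqm))
		return false;

	clear_bit_inv(apid, apcb->apm);
	/* No adapters left: clear the domains too, as the patch does. */
	if (mask_empty(apcb->apm, sizeof(apcb->apm)))
		memset(apcb->aqm, 0, sizeof(apcb->aqm));
	return true;
}
```

Here an unbound queue whose APQI is not in the guest's AQM leaves the APM untouched, so nothing needs to be committed.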

> +
> +		if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> +			bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
> +				     AP_DOMAINS);
> +
> +		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> +	}
> +}
> +
>  void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1497,6 +1648,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  
>  	mutex_lock(&matrix_dev->lock);
>  	q = dev_get_drvdata(&queue->ap_dev.device);
> +	vfio_ap_mdev_hot_unplug_queue(q);

Phew, this is ugly. In an ideal world the guest would be guaranteed not to
get any writes to the notifier byte after it has seen that the queue is
gone (or the interrupts were disabled).

The reset below might come too late, as the vCPUs may resume guest
execution immediately.

I don't have a good solution for this with the tools currently at
our disposal. We could simulate an external reset for the queue before
the update to the APCB, or just disable the interrupts. These are ugly
in their own way.

Switching to emulation mode might be something for the future, but right
now it is also ugly.

Any thoughts? Am I just dreaming up a problem here?

Regards,
Halil


>  	dev_set_drvdata(&queue->ap_dev.device, NULL);
>  	apid = AP_QID_CARD(q->apqn);
>  	apqi = AP_QID_QUEUE(q->apqn);



* Re: [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-10-22 17:12 ` [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
@ 2020-10-28 15:03   ` Halil Pasic
  0 siblings, 0 replies; 68+ messages in thread
From: Halil Pasic @ 2020-10-28 15:03 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 22 Oct 2020 13:12:04 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
> +				       unsigned long *mdev_apm,
> +				       unsigned long *mdev_aqm)
> +{
> +	if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
> +		return -EADDRNOTAVAIL;
> +
> +	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
> +}
> +
>  static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
>  					struct ap_matrix *matrix2)
>  {
> @@ -840,33 +734,21 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	if (apid > matrix_mdev->matrix.apm_max)
>  		return -ENODEV;
>  
> -	/*
> -	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
> -	 * number (APID). The bits in the mask, from most significant to least
> -	 * significant bit, correspond to APIDs 0-255.
> -	 */
> -	mutex_lock(&matrix_dev->lock);
> -
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> -	if (ret)
> -		goto done;
> -
>  	memset(apm, 0, sizeof(apm));
>  	set_bit_inv(apid, apm);
>  
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> -					     matrix_mdev->matrix.aqm);
> -	if (ret)
> -		goto done;
> -
> +	mutex_lock(&matrix_dev->lock);
> +	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
> +					  matrix_mdev->matrix.aqm);

Is this a potential deadlock?

Consider the following scenario:
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use(), which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv(),
   which tries to take ap_perms_mutex

BANG!

I think using mutex_trylock(&matrix_dev->lock) and bailing out with -EBUSY
if we don't manage to acquire the lock would be a good idea anyway, to
prevent a bunch of mdev management operations from piling up on the mutex
and starving in_use().
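A user-space sketch of the trylock pattern suggested above, using POSIX `pthread_mutex_trylock()` in place of the kernel's `mutex_trylock()` (note the inverted convention: pthread returns 0 on success, the kernel function returns 1 when it got the lock). `assign_store_sketch()` and `assignments` are hypothetical. As the "anyway" indicates, this addresses operations piling up on the mutex; on its own it does not remove the lock-ordering problem in steps 1-4, since a store that does win the trylock can still go on to wait for ap_perms_mutex.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

/* User-space stand-in for matrix_dev->lock. */
static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical counter standing in for the state a store would change. */
static unsigned int assignments;

/*
 * Sketch of an mdev management store that refuses to queue up on the
 * lock: if the lock is held, it returns -EBUSY right away so the user can
 * retry, instead of piling up behind the lock.
 */
static ssize_t assign_store_sketch(size_t count)
{
	if (pthread_mutex_trylock(&matrix_lock) != 0)
		return -EBUSY;

	/* ... validate and update the matrix here ... */
	assignments++;

	pthread_mutex_unlock(&matrix_lock);
	return (ssize_t)count;
}
```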

Regards,
Halil

 
> +	if (ret) {
> +		mutex_unlock(&matrix_dev->lock);
> +		return ret;
> +	}
>  	set_bit_inv(apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
> -	ret = count;
> -
> -done:
>  	mutex_unlock(&matrix_dev->lock);
>  
> -	return ret;
> +	return count;


* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-27  6:48   ` Halil Pasic
@ 2020-10-29 23:29     ` Tony Krowiak
  2020-10-30 16:13       ` Tony Krowiak
                         ` (4 more replies)
  0 siblings, 5 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-29 23:29 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/27/20 2:48 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:11:56 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The queues assigned to a matrix mediated device are currently reset when:
>>
>> * The VFIO_DEVICE_RESET ioctl is invoked
>> * The mdev fd is closed by userspace (QEMU)
>> * The mdev is removed from sysfs.
> What about the situation when vfio_ap_mdev_group_notifier() is called to
> tell us that our pointer to KVM is about to become invalid? Do we need to
> clean up the IRQ stuff there?

After reading this question, I decided to do some tracing using
printks and learned that the vfio_ap_mdev_group_notifier()
function does not get called when the guest is shut down. This is
because the vfio_ap_mdev_release() function, which is called
before the KVM pointer is invalidated, unregisters the group notifier.

I took a look at some of the other drivers that register a group
notifier in the mdev_parent_ops.open callback, and each one unregisters
the notifier in the mdev_parent_ops.release callback.

So, to answer your question, there is no need to clean up the IRQ
stuff in the vfio_ap_mdev_group_notifier() function, since it will
not get called when the KVM pointer is invalidated. The cleanup
should be done in the vfio_ap_mdev_release() function, which gets
called when the mdev fd is closed.

>
>> Immediately after the reset of a queue, a call is made to disable
>> interrupts for the queue. This is entirely unnecessary because the reset of
>> a queue disables interrupts, so this will be removed.
> Makes sense.
>
>> Since interrupt processing may have been enabled by the guest, it may also
>> be necessary to clean up the resources used for interrupt processing. Part
>> of the cleanup operation requires a reference to KVM, so a check is also
>> being added to ensure the reference to KVM exists. The reason is because
>> the release callback - invoked when userspace closes the mdev fd - removes
>> the reference to KVM. When the remove callback - called when the mdev is
>> removed from sysfs - is subsequently invoked, there will be no reference to
>> KVM when the cleanup is performed.
> Please see below in the code.
>
>> This patch will also do a bit of refactoring due to the fact that the
>> remove callback, implemented in vfio_ap_drv.c, disables the queue after
>> resetting it. Instead of the remove callback making a call into the
>> vfio_ap_ops.c to clean up the resources used for interrupt processing,
>> let's move the probe and remove callbacks into the vfio_ap_ops.c
>> file to keep all code related to managing queues in a single file.
>>
> It would have been helpful to split out the refactoring as a separate
> patch. As it is, the code that got moved is harder to review, because
> it is intermingled with the changes that are intended to change behavior.

I suppose I can do that.

>   
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     | 45 +------------------
>>   drivers/s390/crypto/vfio_ap_ops.c     | 63 +++++++++++++++++++--------
>>   drivers/s390/crypto/vfio_ap_private.h |  7 +--
>>   3 files changed, 52 insertions(+), 63 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index be2520cc010b..73bd073fd5d3 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
>>   
>>   MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>>   
>> -/**
>> - * vfio_ap_queue_dev_probe:
>> - *
>> - * Allocate a vfio_ap_queue structure and associate it
>> - * with the device as driver_data.
>> - */
>> -static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>> -{
>> -	struct vfio_ap_queue *q;
>> -
>> -	q = kzalloc(sizeof(*q), GFP_KERNEL);
>> -	if (!q)
>> -		return -ENOMEM;
>> -	dev_set_drvdata(&apdev->device, q);
>> -	q->apqn = to_ap_queue(&apdev->device)->qid;
>> -	q->saved_isc = VFIO_AP_ISC_INVALID;
>> -	return 0;
>> -}
>> -
>> -/**
>> - * vfio_ap_queue_dev_remove:
>> - *
>> - * Takes the matrix lock to avoid actions on this device while removing
>> - * Free the associated vfio_ap_queue structure
>> - */
>> -static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>> -{
>> -	struct vfio_ap_queue *q;
>> -	int apid, apqi;
>> -
>> -	mutex_lock(&matrix_dev->lock);
>> -	q = dev_get_drvdata(&apdev->device);
>> -	dev_set_drvdata(&apdev->device, NULL);
>> -	apid = AP_QID_CARD(q->apqn);
>> -	apqi = AP_QID_QUEUE(q->apqn);
>> -	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> -	vfio_ap_irq_disable(q);
>> -	kfree(q);
>> -	mutex_unlock(&matrix_dev->lock);
>> -}
>> -
>>   static void vfio_ap_matrix_dev_release(struct device *dev)
>>   {
>>   	struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
>> @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void)
>>   		return ret;
>>   
>>   	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>> -	vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
>> -	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
>> +	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
>> +	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>>   	vfio_ap_drv.ids = ap_queue_ids;
>>   
>>   	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index e0bde8518745..c471832f0a30 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn)
>>    */
>>   static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>>   {
>> -	if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
>> +	if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev &&
>> +	    q->matrix_mdev->kvm)
> Here is the check that the kvm reference exists, you mentioned in the
> cover letter. You make only the gisc_unregister depend on it, because
> that's what is going to explode.
>
> But I'm actually wondering if "KVM is gone but we still haven't cleaned
> up our aqic resources" is a valid state. I argue that it is not. The two
> resources we manage are the gisc registration and the pinned page. I
> argue that it makes no sense to keep what was the guest's page pinned
> if there is no guest associated (we don't have KVM).
>
> I assume the cleanup is supposed to be atomic from the perspective of
> other threads/contexts, so I expect the cleanup either to have been fully
> done or not to have entered the critical section at all.
>
> So !kvm && (q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn) is a
> bug. Isn't it?
>
> In that sense this change would only hide the actual problem.
>
> Is the scenario we are talking about something that can happen, or is
> this just about programming defensively?
>
> In any case, I don't think this is a good idea. We can be defensive
> about it, but we have to do it differently.
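A minimal sketch of being defensive "differently": rather than making only the GISC unregistration conditional on a KVM reference, detect and report the inconsistent state. The structures are hypothetical stand-ins reduced to the fields discussed (`matrix_mdev`, `kvm`, `saved_isc`, `saved_pfn`), and `aqic_outlived_kvm()` is an illustration, not a proposed driver function. In the kernel, such a report could presumably use something like WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define VFIO_AP_ISC_INVALID 0xff

/* Simplified, hypothetical stand-ins for the driver structures. */
struct kvm_stub { int unused; };
struct mdev_stub { struct kvm_stub *kvm; };
struct queue_stub {
	struct mdev_stub *matrix_mdev;
	unsigned char saved_isc;
	unsigned long saved_pfn;
};

/* Does the queue still hold a GISC registration or a pinned page? */
static bool aqic_resources_held(const struct queue_stub *q)
{
	return q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn != 0;
}

/*
 * Flag the state argued above to be a bug: the mdev has lost its KVM
 * reference while AQIC resources are still held. It is reported instead
 * of silently skipping the GISC unregistration.
 */
static bool aqic_outlived_kvm(const struct queue_stub *q)
{
	bool bug;

	assert(q != NULL);
	bug = q->matrix_mdev && !q->matrix_mdev->kvm && aqic_resources_held(q);
	if (bug)
		fprintf(stderr, "AQIC resources outlived the KVM reference\n");
	return bug;
}
```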
>
>
>>   		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
>>   	if (q->saved_pfn && q->matrix_mdev)
>>   		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
>> @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>>    * Returns if ap_aqic function failed with invalid, deconfigured or
>>    * checkstopped AP.
>>    */
>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>> +static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>>   {
>>   	struct ap_qirq_ctrl aqic_gisa = {};
>>   	struct ap_queue_status status;
>> @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	if (!q)
>>   		goto out_unlock;
>>   
>> +	q->matrix_mdev = matrix_mdev;
> What is the purpose of this? Doesn't the preceding vfio_ap_get_queue()
> already set q->matrix_mdev?

You are correct, it shall be removed.

>
>>   	status = vcpu->run->s.regs.gprs[1];
>>   
>>   	/* If IR bit(16) is set we enable the interrupt */
>> @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	return NOTIFY_OK;
>>   }
>>   
>> -static void vfio_ap_irq_disable_apqn(int apqn)
>> -{
>> -	struct device *dev;
>> -	struct vfio_ap_queue *q;
>> -
>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -				 &apqn, match_apqn);
>> -	if (dev) {
>> -		q = dev_get_drvdata(dev);
>> -		vfio_ap_irq_disable(q);
>> -		put_device(dev);
>> -	}
>> -}
>> -
>>   int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>   			     unsigned int retry)
>>   {
>> @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>   {
>>   	int ret;
>>   	int rc = 0;
>> +	struct vfio_ap_queue *q;
>>   	unsigned long apid, apqi;
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>   			 */
>>   			if (ret)
>>   				rc = ret;
>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>> +			q = vfio_ap_get_queue(matrix_mdev,
>> +					      AP_MKQID(apid, apqi));
>> +			if (q)
>> +				vfio_ap_free_aqic_resources(q);
> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
> think so. I mean, does the current code (and vfio_ap_mdev_reset_queue()
> in particular) guarantee that the reset is actually done when we arrive
> here? BTW, I think we have a similar problem with the current code as
> well.

If the return code from the vfio_ap_mdev_reset_queue() function
is zero, then yes, we are guaranteed the reset was done and the
queue is empty. The function returns a non-zero return code if
the reset fails or does not complete within a given
amount of time, so maybe we shouldn't free AQIC resources when
we get a non-zero return code from the reset function?

There are three occasions when the vfio_ap_mdev_reset_queues()
is called:
1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace
     (i.e., when the guest is started)
2. When the mdev fd is closed (vfio_ap_mdev_release())
3. When the mdev is removed (vfio_ap_mdev_remove())

The IRQ resources are initialized when the PQAP(AQIC)
is intercepted to enable interrupts. This would occur after
the guest boots and the AP bus initializes. So, 1 would
presumably occur before that happens. I couldn't find
anywhere in the AP bus or zcrypt code where a PQAP(AQIC)
is executed to disable interrupts, so my assumption is
that IRQ disablement is accomplished by a reset on
the guest. I'll have to ask Harald about that. So, 2 would
occur when the guest is about to terminate and 3
would occur only after the guest is terminated. In any
case, it seems that IRQ resources should be cleaned up.
Maybe it would be more appropriate to do that in the
vfio_ap_mdev_release() and vfio_ap_mdev_remove()
functions themselves?

>
> Under what circumstances do we expect !q? If we don't, then we need to
> complain one way or another.

In the current code (i.e., prior to introducing the subsequent hot
plug patches), an APQN can not be assigned to an mdev unless it
references a queue device bound to the vfio_ap device driver; however,
there is nothing preventing a queue device from getting unbound
while the guest is running (one of the problems mostly resolved by this
series). In that case, q would be NULL.

>
> I believe that each time we call vfio_ap_mdev_reset_queue(), we will
> also want to call vfio_ap_free_aqic_resources(q) to clean up our aqic
> resources associated with the queue -- if any. So I would really prefer
> having a function that does both.

As stated above, I don't believe PQAP(AQIC) is ever called by
the AP bus or zcrypt to disable IRQs, but I could be wrong about
that, so I'll verify with Harald. If that is the case, then it would
make sense to free IRQ resources when a queue reset completes.
I can add a function that does both and call it instead of
vfio_ap_mdev_reset_queue(). What do you say?
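A minimal sketch of the combined helper discussed above, with the reset and the AQIC cleanup replaced by hypothetical stubs (`reset_queue_and_free_aqic()` and the `*_stub` names are not driver code). It follows the suggestion that AQIC resources be freed only when the reset returned zero, i.e. when the queue is known to be reset and empty.

```c
#include <assert.h>
#include <stddef.h>

#define VFIO_AP_ISC_INVALID 0xff

/* Simplified, hypothetical stand-in for struct vfio_ap_queue. */
struct queue_stub {
	unsigned char saved_isc;
	unsigned long saved_pfn;
	int reset_rc;	/* what the simulated reset will return */
};

/* Stand-in for vfio_ap_mdev_reset_queue(): 0 means the reset completed. */
static int reset_queue_stub(struct queue_stub *q)
{
	return q->reset_rc;
}

/* Stand-in for vfio_ap_free_aqic_resources(). */
static void free_aqic_stub(struct queue_stub *q)
{
	q->saved_isc = VFIO_AP_ISC_INVALID;
	q->saved_pfn = 0;
}

/*
 * Reset the queue and, only if the reset is known to have completed,
 * release its AQIC resources. On failure the resources are kept and the
 * error is returned; what to do with them then is left open here.
 */
static int reset_queue_and_free_aqic(struct queue_stub *q)
{
	int rc;

	assert(q != NULL);
	rc = reset_queue_stub(q);
	if (rc == 0)
		free_aqic_stub(q);
	return rc;
}
```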

>
>>   		}
>>   	}
>>   
>> @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void)
>>   {
>>   	mdev_unregister_device(&matrix_dev->device);
>>   }
>> +
>> +int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>> +{
>> +	struct vfio_ap_queue *q;
>> +	struct ap_queue *queue;
>> +
>> +	queue = to_ap_queue(&apdev->device);
>> +
>> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
>> +	if (!q)
>> +		return -ENOMEM;
>> +
>> +	dev_set_drvdata(&queue->ap_dev.device, q);
>> +	q->apqn = queue->qid;
>> +	q->saved_isc = VFIO_AP_ISC_INVALID;
>> +
>> +	return 0;
>> +}
>> +
>> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>> +{
>> +	struct vfio_ap_queue *q;
>> +	struct ap_queue *queue;
>> +	int apid, apqi;
>> +
>> +	queue = to_ap_queue(&apdev->device);
> What is the benefit of rewriting this? You introduced
> queue just to do queue->ap_dev to get to the apdev you
> have in hand in the first place.

I'm not quite sure what you're asking. This function is
the callback specified via the remove field of the struct
ap_driver when the vfio_ap device driver is registered with the
AP bus. That callback takes a struct ap_device as a parameter.
What am I missing here?
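A minimal sketch of the round trip the review question is about, using simplified, hypothetical versions of the AP bus structures and a plain container_of(). It assumes, as the kernel's to_ap_queue() does to our understanding, that the conversion is a container_of() on ap_dev.device; `to_ap_queue_sketch()` is an illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical versions of the AP bus structures. */
struct device { int id; };
struct ap_device { struct device device; };
struct ap_queue {
	struct ap_device ap_dev;
	unsigned int qid;
};

/* Same idea as the kernel's container_of(), without its type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* From the embedded struct device back to the enclosing ap_queue. */
static struct ap_queue *to_ap_queue_sketch(struct device *dev)
{
	assert(dev != NULL);
	return container_of(dev, struct ap_queue, ap_dev.device);
}
```

Because the round trip lands back on the same object, `&queue->ap_dev.device` is simply `&apdev->device`, so dev_get_drvdata()/dev_set_drvdata() can be called on `&apdev->device` directly; `queue` is only needed where `queue->qid` is read.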

>
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	q = dev_get_drvdata(&queue->ap_dev.device);
>> +	dev_set_drvdata(&queue->ap_dev.device, NULL);
>> +	apid = AP_QID_CARD(q->apqn);
>> +	apqi = AP_QID_QUEUE(q->apqn);
>> +	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> +	vfio_ap_free_aqic_resources(q);
>> +	kfree(q);
>> +	mutex_unlock(&matrix_dev->lock);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index f46dde56b464..d9003de4fbad 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -90,8 +90,6 @@ struct ap_matrix_mdev {
>>   
>>   extern int vfio_ap_mdev_register(void);
>>   extern void vfio_ap_mdev_unregister(void);
>> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>> -			     unsigned int retry);
>>   
>>   struct vfio_ap_queue {
>>   	struct ap_matrix_mdev *matrix_mdev;
>> @@ -100,5 +98,8 @@ struct vfio_ap_queue {
>>   #define VFIO_AP_ISC_INVALID 0xff
>>   	unsigned char saved_isc;
>>   };
>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
>> +
>> +int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>> +void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>> +
>>   #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-29 23:29     ` Tony Krowiak
@ 2020-10-30 16:13       ` Tony Krowiak
  2020-10-30 17:27       ` Halil Pasic
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-30 16:13 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/29/20 7:29 PM, Tony Krowiak wrote:
>
>
> On 10/27/20 2:48 AM, Halil Pasic wrote:
>> On Thu, 22 Oct 2020 13:11:56 -0400
>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>
>>> The queues assigned to a matrix mediated device are currently reset when:
>>>
>>> * The VFIO_DEVICE_RESET ioctl is invoked
>>> * The mdev fd is closed by userspace (QEMU)
>>> * The mdev is removed from sysfs.
>> What about the situation when vfio_ap_mdev_group_notifier() is called to
>> tell us that our pointer to KVM is about to become invalid? Do we need to
>> clean up the IRQ stuff there?
>
> After reading this question, I decided to do some tracing using
> printks and learned that the vfio_ap_mdev_group_notifier()
> function does not get called when the guest is shut down. This is
> because the vfio_ap_mdev_release() function, which is called
> before the KVM pointer is invalidated, unregisters the group notifier.
>
> I took a look at some of the other drivers that register a group
> notifier in the mdev_parent_ops.open callback, and each one unregisters
> the notifier in the mdev_parent_ops.release callback.
>
> So, to answer your question, there is no need to clean up the IRQ
> stuff in the vfio_ap_mdev_group_notifier() function, since it will
> not get called when the KVM pointer is invalidated. The cleanup
> should be done in the vfio_ap_mdev_release() function, which gets
> called when the mdev fd is closed.
>
>>
>>> Immediately after the reset of a queue, a call is made to disable
>>> interrupts for the queue. This is entirely unnecessary because the reset of
>>> a queue disables interrupts, so this will be removed.
>> Makes sense.
>>
>>> Since interrupt processing may have been enabled by the guest, it may also
>>> be necessary to clean up the resources used for interrupt processing. Part
>>> of the cleanup operation requires a reference to KVM, so a check is also
>>> being added to ensure the reference to KVM exists. The reason is because
>>> the release callback - invoked when userspace closes the mdev fd - removes
>>> the reference to KVM. When the remove callback - called when the mdev is
>>> removed from sysfs - is subsequently invoked, there will be no reference to
>>> KVM when the cleanup is performed.
>> Please see below in the code.
>>
>>> This patch will also do a bit of refactoring due to the fact that the
>>> remove callback, implemented in vfio_ap_drv.c, disables the queue after
>>> resetting it. Instead of the remove callback making a call into the
>>> vfio_ap_ops.c to clean up the resources used for interrupt processing,
>>> let's move the probe and remove callbacks into the vfio_ap_ops.c
>>> file to keep all code related to managing queues in a single file.
>>>
>> It would have been helpful to split out the refactoring as a separate
>> patch. As it is, the code that got moved is harder to review, because
>> it is intermingled with the changes that are intended to change behavior.
>
> I suppose I can do that.
>
>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_drv.c     | 45 +------------------
>>>   drivers/s390/crypto/vfio_ap_ops.c     | 63 +++++++++++++++++++--------
>>>   drivers/s390/crypto/vfio_ap_private.h |  7 +--
>>>   3 files changed, 52 insertions(+), 63 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>>> index be2520cc010b..73bd073fd5d3 100644
>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>> @@ -43,47 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
>>>
>>>   MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>>>
>>> -/**
>>> - * vfio_ap_queue_dev_probe:
>>> - *
>>> - * Allocate a vfio_ap_queue structure and associate it
>>> - * with the device as driver_data.
>>> - */
>>> -static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>>> -{
>>> -    struct vfio_ap_queue *q;
>>> -
>>> -    q = kzalloc(sizeof(*q), GFP_KERNEL);
>>> -    if (!q)
>>> -        return -ENOMEM;
>>> -    dev_set_drvdata(&apdev->device, q);
>>> -    q->apqn = to_ap_queue(&apdev->device)->qid;
>>> -    q->saved_isc = VFIO_AP_ISC_INVALID;
>>> -    return 0;
>>> -}
>>> -
>>> -/**
>>> - * vfio_ap_queue_dev_remove:
>>> - *
>>> - * Takes the matrix lock to avoid actions on this device while removing
>>> - * Free the associated vfio_ap_queue structure
>>> - */
>>> -static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>>> -{
>>> -    struct vfio_ap_queue *q;
>>> -    int apid, apqi;
>>> -
>>> -    mutex_lock(&matrix_dev->lock);
>>> -    q = dev_get_drvdata(&apdev->device);
>>> -    dev_set_drvdata(&apdev->device, NULL);
>>> -    apid = AP_QID_CARD(q->apqn);
>>> -    apqi = AP_QID_QUEUE(q->apqn);
>>> -    vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>> -    vfio_ap_irq_disable(q);
>>> -    kfree(q);
>>> -    mutex_unlock(&matrix_dev->lock);
>>> -}
>>> -
>>>   static void vfio_ap_matrix_dev_release(struct device *dev)
>>>   {
>>>       struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
>>> @@ -186,8 +145,8 @@ static int __init vfio_ap_init(void)
>>>           return ret;
>>>
>>>       memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>>> -    vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
>>> -    vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
>>> +    vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
>>> +    vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>>>       vfio_ap_drv.ids = ap_queue_ids;
>>>
>>>       ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index e0bde8518745..c471832f0a30 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -119,7 +119,8 @@ static void vfio_ap_wait_for_irqclear(int apqn)
>>>    */
>>>   static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>>>   {
>>> -    if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
>>> +    if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev &&
>>> +        q->matrix_mdev->kvm)
>> Here is the check that the kvm reference exists, you mentioned in the
>> cover letter. You make only the gisc_unregister depend on it, because
>> that's what is going to explode.
>>
>> But I'm actually wondering if "KVM is gone but we still haven't cleaned
>> up our aqic resources" is a valid state. I argue that it is not. The two
>> resources we manage are the gisc registration and the pinned page. I
>> argue that it makes no sense to keep what was the guest's page pinned
>> if there is no guest associated (we don't have KVM).
>>
>> I assume the cleanup is supposed to be atomic from the perspective of
>> other threads/contexts, so I expect the cleanup either to have been fully
>> done or not to have entered the critical section at all.
>>
>> So !kvm && (q->saved_isc != VFIO_AP_ISC_INVALID || q->saved_pfn) is a
>> bug. Isn't it?
>>
>> In that sense this change would only hide the actual problem.
>>
>> Is the scenario we are talking about something that can happen, or is
>> this just about programming defensively?
>>
>> In any case, I don't think this is a good idea. We can be defensive
>> about it, but we have to do it differently.
>>
>>
>>> kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
>>>       if (q->saved_pfn && q->matrix_mdev)
>>> vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
>>> @@ -144,7 +145,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
>>>    * Returns if ap_aqic function failed with invalid, deconfigured or
>>>    * checkstopped AP.
>>>    */
>>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>>> +static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>>>   {
>>>       struct ap_qirq_ctrl aqic_gisa = {};
>>>       struct ap_queue_status status;
>>> @@ -297,6 +298,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>>       if (!q)
>>>           goto out_unlock;
>>>   +    q->matrix_mdev = matrix_mdev;
>> What is the purpose of this? Doesn't the preceding vfio_ap_get_queue()
>> already set q->matrix_mdev?
>
> You are correct, it shall be removed.
>
>>
>>>       status = vcpu->run->s.regs.gprs[1];
>>>         /* If IR bit(16) is set we enable the interrupt */
>>> @@ -1114,20 +1116,6 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>       return NOTIFY_OK;
>>>   }
>>>   -static void vfio_ap_irq_disable_apqn(int apqn)
>>> -{
>>> -    struct device *dev;
>>> -    struct vfio_ap_queue *q;
>>> -
>>> -    dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>> -                 &apqn, match_apqn);
>>> -    if (dev) {
>>> -        q = dev_get_drvdata(dev);
>>> -        vfio_ap_irq_disable(q);
>>> -        put_device(dev);
>>> -    }
>>> -}
>>> -
>>>   int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>>                    unsigned int retry)
>>>   {
>>> @@ -1162,6 +1150,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>   {
>>>       int ret;
>>>       int rc = 0;
>>> +    struct vfio_ap_queue *q;
>>>       unsigned long apid, apqi;
>>>       struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>   @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>                */
>>>               if (ret)
>>>                   rc = ret;
>>> -            vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>> +            q = vfio_ap_get_queue(matrix_mdev,
>>> +                          AP_MKQID(apid, apqi));
>>> +            if (q)
>>> +                vfio_ap_free_aqic_resources(q);
>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue()
>> in particular guarantee that the reset is actually done when we arrive
>> here)? BTW, I think we have a similar problem with the current code as
>> well.
>
> If the return code from the vfio_ap_mdev_reset_queue() function
> is zero, then yes, we are guaranteed the reset was done and the
> queue is empty.  The function returns a non-zero return code if
> the reset fails or the queue the reset did not complete within a given
> amount of time, so maybe we shouldn't free AQIC resources when
> we get a non-zero return code from the reset function?
>
> There are three occasions when the vfio_ap_mdev_reset_queues()
> is called:
> 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace
>     (i.e., when the guest is started)
> 2. When the mdev fd is closed (vfio_ap_mdev_release())
> 3. When the mdev is removed (vfio_ap_mdev_remove())
>
> The IRQ resources are initialized when the PQAP(AQIC)
> is intercepted to enable interrupts. This would occur after
> the guest boots and the AP bus initializes. So, 1 would
> presumably occur before that happens. I couldn't find
> anywhere in the AP bus or zcrypt code where a PQAP(AQIC)
> is executed to disable interrupts, so my assumption is
> that IRQ disablement is accomplished by a reset on
> the guest. I'll have to ask Harald about that. So, 2 would
> occur when the guest is about to terminate and 3
> would occur only after the guest is terminated. In any
> case, it seems that IRQ resources should be cleaned up.
> Maybe it would be more appropriate to do that in the
> vfio_ap_mdev_release() and vfio_ap_mdev_remove()
> functions themselves?

After further review, I've come to the conclusion that it makes
sense to clean up the IRQ resources only in the vfio_ap_mdev_release()
function, for the following reasons:
1. The KVM pointer should still be available because it is not invalidated
     until after the release callback is invoked.
2. The release callback is part of normal guest termination, so interrupt
     processing will presumably no longer be necessary for the guest.
3. The zcrypt drivers on the guest do not disable interrupt processing
     via the PQAP(AQIC) instruction (I verified this with Harald), so there is
     no opportunity to clean up IRQ resources on interception of IRQ disablement.
4. It makes no sense to clean up IRQ resources in the vfio_ap_mdev_remove()
     function because that function disallows removal of the mdev while the KVM
     pointer is still valid, since it is assumed the guest is still running at
     that point.
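A user-space model of the ordering proposed above may make it easier to review. This is only a sketch: struct kvm, struct ap_matrix_mdev, the counters and mdev_release() are hypothetical stand-ins, not the vfio_ap code. What it illustrates is that the AQIC cleanup runs while matrix_mdev->kvm is still set, and only afterwards is the notifier unregistered and the KVM reference dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VFIO_AP_ISC_INVALID 0xff
#define NQUEUES 4

/* Hypothetical stand-ins for the kernel objects discussed above. */
struct kvm {
	int refcount;
	int gisc_registrations;	/* GISCs registered on behalf of the guest */
};

struct vfio_ap_queue {
	unsigned char saved_isc;	/* VFIO_AP_ISC_INVALID: no GISC registered */
	unsigned long saved_pfn;	/* 0: no guest page pinned */
};

struct ap_matrix_mdev {
	struct kvm *kvm;
	bool notifier_registered;
	struct vfio_ap_queue *queues[NQUEUES];	/* NULL: no queue object */
};

static int pinned_pages;	/* pages "pinned" in this simulation */

/* Releases the GISC registration and the pinned page; needs a valid kvm. */
static void free_aqic_resources(struct kvm *kvm, struct vfio_ap_queue *q)
{
	if (q->saved_isc != VFIO_AP_ISC_INVALID) {
		kvm->gisc_registrations--;
		q->saved_isc = VFIO_AP_ISC_INVALID;
	}
	if (q->saved_pfn) {
		pinned_pages--;
		q->saved_pfn = 0;
	}
}

/* Proposed release order: AQIC cleanup first, then drop the KVM reference. */
static void mdev_release(struct ap_matrix_mdev *m)
{
	if (m->kvm) {
		for (int i = 0; i < NQUEUES; i++)
			if (m->queues[i])
				free_aqic_resources(m->kvm, m->queues[i]);
	}

	m->notifier_registered = false;	/* the notifier cannot fire after this */

	if (m->kvm) {
		m->kvm->refcount--;
		m->kvm = NULL;
	}
}
```

Because the notifier is unregistered inside release, a "KVM going away" notification would never reach the driver in this model, which matches the tracing results reported later in the thread.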


>
>>
>> Under what circumstances do we expect !q? If we don't, then we need to
>> complain one way or another.
>
> In the current code (i.e., prior to introducing the subsequent hot
> plug patches), an APQN can not be assigned to an mdev unless it
> references a queue device bound to the vfio_ap device driver; however,
> there is nothing preventing a queue device from getting unbound
> while the guest is running (one of the problems mostly resolved by this
> series). In that case, q would be NULL.
>
>>
>> I believe that each time we call vfio_ap_mdev_reset_queue(), we will
>> also want to call vfio_ap_free_aqic_resources(q) to clean up our aqic
>> resources associated with the queue -- if any. So I would really prefer
>> having a function that does both.
>
> As stated above, I don't believe PQAP(AQIC) is ever called by
> the AP bus or zcrypt to disable IRQs, but I could be wrong about
> that so I'll verify with Harald. If that is the case, then it would
> make sense to free IRQ resources when a queue completes.
> I can either add a function that does both and call it instead of
> vfio_ap_mdev_reset_queue(). What say you?
>
>>
>>>           }
>>>       }
>>>   @@ -1302,3 +1294,40 @@ void vfio_ap_mdev_unregister(void)
>>>   {
>>>       mdev_unregister_device(&matrix_dev->device);
>>>   }
>>> +
>>> +int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>> +{
>>> +    struct vfio_ap_queue *q;
>>> +    struct ap_queue *queue;
>>> +
>>> +    queue = to_ap_queue(&apdev->device);
>>> +
>>> +    q = kzalloc(sizeof(*q), GFP_KERNEL);
>>> +    if (!q)
>>> +        return -ENOMEM;
>>> +
>>> +    dev_set_drvdata(&queue->ap_dev.device, q);
>>> +    q->apqn = queue->qid;
>>> +    q->saved_isc = VFIO_AP_ISC_INVALID;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>> +{
>>> +    struct vfio_ap_queue *q;
>>> +    struct ap_queue *queue;
>>> +    int apid, apqi;
>>> +
>>> +    queue = to_ap_queue(&apdev->device);
>> What is the benefit of rewriting this? You introduced
>> queue just to do queue->ap_dev to get to the apdev you
>> have in hand in the first place.
>
> I'm not quite sure what you're asking. This function is
> the callback function specified via the function pointer
> specified via the remove field of the struct ap_driver
> when the vfio_ap device driver is registered with the
> AP bus. That callback function takes a struct ap_device
> as a parameter. What am I missing here?
>
>>
>>> +
>>> +    mutex_lock(&matrix_dev->lock);
>>> +    q = dev_get_drvdata(&queue->ap_dev.device);
>>> +    dev_set_drvdata(&queue->ap_dev.device, NULL);
>>> +    apid = AP_QID_CARD(q->apqn);
>>> +    apqi = AP_QID_QUEUE(q->apqn);
>>> +    vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>> +    vfio_ap_free_aqic_resources(q);
>>> +    kfree(q);
>>> +    mutex_unlock(&matrix_dev->lock);
>>> +}
>>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>>> index f46dde56b464..d9003de4fbad 100644
>>> --- a/drivers/s390/crypto/vfio_ap_private.h
>>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>>> @@ -90,8 +90,6 @@ struct ap_matrix_mdev {
>>>     extern int vfio_ap_mdev_register(void);
>>>   extern void vfio_ap_mdev_unregister(void);
>>> -int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>> -                 unsigned int retry);
>>>     struct vfio_ap_queue {
>>>       struct ap_matrix_mdev *matrix_mdev;
>>> @@ -100,5 +98,8 @@ struct vfio_ap_queue {
>>>   #define VFIO_AP_ISC_INVALID 0xff
>>>       unsigned char saved_isc;
>>>   };
>>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
>>> +
>>> +int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>>> +void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>>> +
>>>   #endif /* _VFIO_AP_PRIVATE_H_ */
>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-29 23:29     ` Tony Krowiak
  2020-10-30 16:13       ` Tony Krowiak
@ 2020-10-30 17:27       ` Halil Pasic
  2020-10-30 20:45         ` Tony Krowiak
  2020-10-30 17:42       ` Halil Pasic
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-30 17:27 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 10/27/20 2:48 AM, Halil Pasic wrote:
> > On Thu, 22 Oct 2020 13:11:56 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> The queues assigned to a matrix mediated device are currently reset when:
> >>
> >> * The VFIO_DEVICE_RESET ioctl is invoked
> >> * The mdev fd is closed by userspace (QEMU)
> >> * The mdev is removed from sysfs.  
> > What about the situation when vfio_ap_mdev_group_notifier() is called to
> > tell us that our pointer to KVM is about to become invalid? Do we need to
> > clean up the IRQ stuff there?  
> 
> After reading this question, I decided to do some tracing using
> printk's and learned that the vfio_ap_mdev_group_notifier()
> function does not get called when the guest is shutdown. The reason
> for this is because the vfio_ap_mdev_release() function, which is called
> before the KVM pointer is invalidated, unregisters the group notifier.
> 
> I took a look at some of the other drivers that register a group
> notifier in the mdev_parent_ops.open callback and each unregistered
> the notifier in the mdev_parent_ops.release callback.
> 
> So, to answer your question, there is no need to cleanup the IRQ
> stuff in the vfio_ap_mdev_group_notifier() function since it will
> not get called when the KVM pointer is invalidated. The cleanup
> should be done in the vfio_ap_mdev_release() function that gets
> called when the mdev fd is closed.

You say that if vfio_ap_mdev_group_notifier() is called to tell us
that KVM is going away, then it is a bug?

If that is the case, I would like that reflected in the code! By that I
mean logging an error at least (if not BUG_ON).

Regards,
Halil


* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-29 23:29     ` Tony Krowiak
  2020-10-30 16:13       ` Tony Krowiak
  2020-10-30 17:27       ` Halil Pasic
@ 2020-10-30 17:42       ` Halil Pasic
  2020-10-30 20:37         ` Tony Krowiak
  2020-10-30 17:54       ` Halil Pasic
  2020-10-30 17:56       ` Halil Pasic
  4 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-30 17:42 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >>   			 */
> >>   			if (ret)
> >>   				rc = ret;
> >> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >> +			q = vfio_ap_get_queue(matrix_mdev,
> >> +					      AP_MKQID(apid, apqi));
> >> +			if (q)
> >> +				vfio_ap_free_aqic_resources(q);  
> > Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
> > think so. I mean does the current code (and vfio_ap_mdev_reset_queue()
> > in particular guarantee that the reset is actually done when we arrive
> > here)? BTW, I think we have a similar problem with the current code as
> > well.  
> 
> If the return code from the vfio_ap_mdev_reset_queue() function
> is zero, then yes, we are guaranteed the reset was done and the
> queue is empty.

I've read up on this and I disagree. We should discuss this offline.

>  The function returns a non-zero return code if
> the reset fails or the queue the reset did not complete within a given
> amount of time, so maybe we shouldn't free AQIC resources when
> we get a non-zero return code from the reset function?
> 

If the queue is gone, or broken, it won't produce interrupts or poke the
notifier bit, and we should clean up the AQIC resources.


> There are three occasions when the vfio_ap_mdev_reset_queues()
> is called:
> 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace
>      (i.e., when the guest is started)
> 2. When the mdev fd is closed (vfio_ap_mdev_release())
> 3. When the mdev is removed (vfio_ap_mdev_remove())
> 
> The IRQ resources are initialized when the PQAP(AQIC)
> is intercepted to enable interrupts. This would occur after
> the guest boots and the AP bus initializes. So, 1 would
> presumably occur before that happens. I couldn't find
> anywhere in the AP bus or zcrypt code where a PQAP(AQIC)
> is executed to disable interrupts, so my assumption is
> that IRQ disablement is accomplished by a reset on
> the guest. I'll have to ask Harald about that. So, 2 would
> occur when the guest is about to terminate and 3
> would occur only after the guest is terminated. In any
> case, it seems that IRQ resources should be cleaned up.
> Maybe it would be more appropriate to do that in the
> vfio_ap_mdev_release() and vfio_ap_mdev_remove()
> functions themselves?

I'm a bit confused. But I think you are wrong. What happens when the
guest reIPLs? I guess the subsystem reset should also do the
VFIO_DEVICE_RESET ioctl, and that has to reset the queues and disable
the interrupts. Or?

Regards,
Halil



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-29 23:29     ` Tony Krowiak
                         ` (2 preceding siblings ...)
  2020-10-30 17:42       ` Halil Pasic
@ 2020-10-30 17:54       ` Halil Pasic
  2020-10-30 20:53         ` Tony Krowiak
  2020-10-30 17:56       ` Halil Pasic
  4 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-30 17:54 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >>   			 */
> >>   			if (ret)
> >>   				rc = ret;
> >> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >> +			q = vfio_ap_get_queue(matrix_mdev,
> >> +					      AP_MKQID(apid, apqi));
> >> +			if (q)
> >> +				vfio_ap_free_aqic_resources(q);  

[..]

> >
> > Under what circumstances do we expect !q? If we don't, then we need to
> > complain one way or another.  
> 
> In the current code (i.e., prior to introducing the subsequent hot
> plug patches), an APQN can not be assigned to an mdev unless it
> references a queue device bound to the vfio_ap device driver; however,
> there is nothing preventing a queue device from getting unbound
> while the guest is running (one of the problems mostly resolved by this
> series). In that case, q would be NULL.

But if the queue does not belong to us any more, it does not make sense
to call vfio_ap_mdev_reset_queue() on its APQN, or?

I think we should have

if (!q)
	continue;

at the very beginning of the loop body, or else we should make sure that
q is not NULL.
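The loop shape suggested above can be sketched as follows, with the NULL check first so that neither the reset nor the AQIC cleanup touches an APQN that no longer belongs to vfio_ap. The 2x2 queues array, reset_queue() and free_aqic_resources() are hypothetical user-space stand-ins, not the vfio_ap functions.

```c
#include <assert.h>
#include <stddef.h>

#define NAPID 2
#define NAPQI 2

/* Hypothetical model: one slot per APQN assigned to the mdev; NULL means
 * no vfio_ap_queue object exists for that APQN (queue not ours). */
struct vfio_ap_queue {
	int resets;
	int aqic_freed;
};

static struct vfio_ap_queue *queues[NAPID][NAPQI];

static int reset_queue(struct vfio_ap_queue *q)
{
	q->resets++;
	return 0;
}

static void free_aqic_resources(struct vfio_ap_queue *q)
{
	q->aqic_freed = 1;
}

static int reset_queues(void)
{
	int rc = 0;

	for (int apid = 0; apid < NAPID; apid++) {
		for (int apqi = 0; apqi < NAPQI; apqi++) {
			struct vfio_ap_queue *q = queues[apid][apqi];
			int ret;

			/* The queue does not belong to vfio_ap: leave it alone. */
			if (!q)
				continue;

			ret = reset_queue(q);
			if (ret)
				rc = ret;	/* remember the failure, keep going */
			free_aqic_resources(q);
		}
	}
	return rc;
}
```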



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-29 23:29     ` Tony Krowiak
                         ` (3 preceding siblings ...)
  2020-10-30 17:54       ` Halil Pasic
@ 2020-10-30 17:56       ` Halil Pasic
  2020-10-30 21:17         ` Tony Krowiak
  4 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-30 17:56 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> >> +{
> >> +	struct vfio_ap_queue *q;
> >> +	struct ap_queue *queue;
> >> +	int apid, apqi;
> >> +
> >> +	queue = to_ap_queue(&apdev->device);  
> > What is the benefit of rewriting this? You introduced
> > queue just to do queue->ap_dev to get to the apdev you
> > have in hand in the first place.  
> 
> I'm not quite sure what you're asking. This function is
> the callback function specified via the function pointer
> specified via the remove field of the struct ap_driver
> when the vfio_ap device driver is registered with the
> AP bus. That callback function takes a struct ap_device
> as a parameter. What am I missing here?

Please compare the removed function vfio_ap_queue_dev_remove() with the
added function vfio_ap_mdev_remove_queue() line by line. It should
become clear.
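The point of the line-by-line comparison can be shown in a small user-space sketch: the remove callback already receives the struct ap_device, so the drvdata can be read from &apdev->device directly, as the removed vfio_ap_queue_dev_remove() did, without a detour through to_ap_queue() and queue->ap_dev. The structs and the dev_get_drvdata()/dev_set_drvdata() helpers below are stand-ins mirroring the kernel helpers of the same name; remove_queue() is a hypothetical name, and locking, reset and AQIC cleanup are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for struct device / struct ap_device. */
struct device {
	void *driver_data;
};

struct ap_device {
	struct device device;
};

struct vfio_ap_queue {
	int apqn;
};

static void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

static void dev_set_drvdata(struct device *dev, void *data)
{
	dev->driver_data = data;
}

/* Works on the ap_device it is handed; no to_ap_queue() round trip. */
static void remove_queue(struct ap_device *apdev)
{
	struct vfio_ap_queue *q = dev_get_drvdata(&apdev->device);

	dev_set_drvdata(&apdev->device, NULL);
	/* matrix_dev->lock, queue reset and AQIC cleanup would go here */
	free(q);
}
```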

Regards,
Halil


* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-30 17:42       ` Halil Pasic
@ 2020-10-30 20:37         ` Tony Krowiak
  2020-10-31  3:43           ` Halil Pasic
  0 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-30 20:37 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/30/20 1:42 PM, Halil Pasic wrote:
> On Thu, 29 Oct 2020 19:29:35 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>>    			 */
>>>>    			if (ret)
>>>>    				rc = ret;
>>>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>>> +			q = vfio_ap_get_queue(matrix_mdev,
>>>> +					      AP_MKQID(apid, apqi));
>>>> +			if (q)
>>>> +				vfio_ap_free_aqic_resources(q);
>>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
>>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue()
>>> in particular guarantee that the reset is actually done when we arrive
>>> here)? BTW, I think we have a similar problem with the current code as
>>> well.
>> If the return code from the vfio_ap_mdev_reset_queue() function
>> is zero, then yes, we are guaranteed the reset was done and the
>> queue is empty.
> I've read up on this and I disagree. We should discuss this offline.

Maybe you are confusing things here; my statement is specific to the return
code from the vfio_ap_mdev_reset_queue() function, not the response code
from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue()
function issues the PQAP(ZAPQ) instruction and, if the status response code
is 0 indicating the reset was successfully initiated, it waits for the
queue to empty. When the queue is empty, it returns 0 to indicate
the queue is reset. If the queue does not become empty after a period of time,
it issues a warning (WARN_ON_ONCE) and still returns 0. In that case, I suppose
there is no guarantee the reset was done, so maybe a change needs to be
made there, such as returning a non-zero return code.
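The change floated above might look roughly like this user-space model of the wait-for-empty step after PQAP(ZAPQ). It is a sketch under assumptions: fake_queue, queue_empty() and wait_for_reset() are hypothetical, the poll budget stands in for the retry count, and -EBUSY is just one possible choice of non-zero code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a queue draining after PQAP(ZAPQ). */
struct fake_queue {
	int polls_until_empty;	/* -1: the queue never empties */
};

static bool queue_empty(struct fake_queue *q)
{
	if (q->polls_until_empty < 0)
		return false;
	if (q->polls_until_empty == 0)
		return true;
	q->polls_until_empty--;
	return false;
}

/*
 * Returns 0 only when the queue was observed empty; otherwise warns and
 * returns -EBUSY, so callers can tell the reset is not known to be complete.
 */
static int wait_for_reset(struct fake_queue *q, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (queue_empty(q))
			return 0;
		/* the kernel code sleeps between polls (e.g. msleep()) */
	}
	/* stands in for WARN_ON_ONCE() */
	fprintf(stderr, "queue did not empty after %d polls\n", max_polls);
	return -EBUSY;
}
```

With a distinct return code, a caller such as vfio_ap_mdev_reset_queues() could decide explicitly what to do with the AQIC resources of a queue whose reset is not known to have completed.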

>
>>    The function returns a non-zero return code if
>> the reset fails or the queue the reset did not complete within a given
>> amount of time, so maybe we shouldn't free AQIC resources when
>> we get a non-zero return code from the reset function?
>>
> If the queue is gone, or broken, it won't produce interrupts or poke the
> notifier bit, and we should clean up the AQIC resources.

True, which is what the code provided by this patch does; however,
the AQIC resources should be cleaned up only if the KVM pointer is
not NULL for reasons discussed elsewhere.
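A combined helper along the lines discussed in this thread (reset, then free the AQIC resources whatever the reset reported) might be modelled as follows. This is a user-space sketch: the types are stand-ins, the reset is faked through the hypothetical reset_rc field, clearing saved_pfn stands in for vfio_unpin_pages(), and skipping the cleanup when kvm is NULL reflects the position stated above rather than settled behavior.

```c
#include <assert.h>
#include <stddef.h>

#define VFIO_AP_ISC_INVALID 0xff

/* Hypothetical stand-ins for the structures discussed in the thread. */
struct kvm {
	int gisc_registrations;
};

struct ap_matrix_mdev {
	struct kvm *kvm;
};

struct vfio_ap_queue {
	struct ap_matrix_mdev *matrix_mdev;
	unsigned char saved_isc;
	unsigned long saved_pfn;
	int reset_rc;		/* result the simulated reset reports */
};

static void free_aqic_resources(struct vfio_ap_queue *q)
{
	/* Both resources belong to the guest, so require a valid kvm. */
	if (!q->matrix_mdev || !q->matrix_mdev->kvm)
		return;
	if (q->saved_isc != VFIO_AP_ISC_INVALID) {
		q->matrix_mdev->kvm->gisc_registrations--;
		q->saved_isc = VFIO_AP_ISC_INVALID;
	}
	q->saved_pfn = 0;	/* stands in for vfio_unpin_pages() */
}

/*
 * Reset the queue, then free the AQIC resources regardless of the result:
 * a queue that is gone or broken raises no further interrupts.
 */
static int reset_queue_and_free_aqic(struct vfio_ap_queue *q)
{
	int rc = q->reset_rc;	/* the real reset would happen here */

	free_aqic_resources(q);
	return rc;
}
```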

>
>
>> There are three occasions when the vfio_ap_mdev_reset_queues()
>> is called:
>> 1. When the VFIO_DEVICE_RESET ioctl is invoked from userspace
>>       (i.e., when the guest is started)
>> 2. When the mdev fd is closed (vfio_ap_mdev_release())
>> 3. When the mdev is removed (vfio_ap_mdev_remove())
>>
>> The IRQ resources are initialized when the PQAP(AQIC)
>> is intercepted to enable interrupts. This would occur after
>> the guest boots and the AP bus initializes. So, 1 would
>> presumably occur before that happens. I couldn't find
>> anywhere in the AP bus or zcrypt code where a PQAP(AQIC)
>> is executed to disable interrupts, so my assumption is
>> that IRQ disablement is accomplished by a reset on
>> the guest. I'll have to ask Harald about that. So, 2 would
>> occur when the guest is about to terminate and 3
>> would occur only after the guest is terminated. In any
>> case, it seems that IRQ resources should be cleaned up.
>> Maybe it would be more appropriate to do that in the
>> vfio_ap_mdev_release() and vfio_ap_mdev_remove()
>> functions themselves?
> I'm a bit confused. But I think you are wrong. What happens when the
> guest reIPLs? I guess the subsystem reset should also do the
> VFIO_DEVICE_RESET ioctl, and that has to reset the queues and disable
> the interrupts. Or?

What did I say that is wrong? I think you are referring
to my statement about the VFIO_DEVICE_RESET ioctl.
I am not knowledgeable about all of the circumstances
under which the VFIO_DEVICE_RESET ioctl is invoked,
but I know for a fact that it is invoked when the guest is
started as I've verified that via tracing. On the other hand,
I suspect you are correct in assuming it is also invoked on
a subsystem reset from the guest, so that also argues for
cleaning up the IRQ resources after a reset as long as
the KVM pointer is valid.

>
> Regards,
> Halil
>



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-30 17:27       ` Halil Pasic
@ 2020-10-30 20:45         ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-30 20:45 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/30/20 1:27 PM, Halil Pasic wrote:
> On Thu, 29 Oct 2020 19:29:35 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 10/27/20 2:48 AM, Halil Pasic wrote:
>>> On Thu, 22 Oct 2020 13:11:56 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> The queues assigned to a matrix mediated device are currently reset when:
>>>>
>>>> * The VFIO_DEVICE_RESET ioctl is invoked
>>>> * The mdev fd is closed by userspace (QEMU)
>>>> * The mdev is removed from sysfs.
>>> What about the situation when vfio_ap_mdev_group_notifier() is called to
>>> tell us that our pointer to KVM is about to become invalid? Do we need to
>>> clean up the IRQ stuff there?
>> After reading this question, I decided to do some tracing using
>> printk's and learned that the vfio_ap_mdev_group_notifier()
>> function does not get called when the guest is shutdown. The reason
>> for this is because the vfio_ap_mdev_release() function, which is called
>> before the KVM pointer is invalidated, unregisters the group notifier.
>>
>> I took a look at some of the other drivers that register a group
>> notifier in the mdev_parent_ops.open callback and each unregistered
>> the notifier in the mdev_parent_ops.release callback.
>>
>> So, to answer your question, there is no need to cleanup the IRQ
>> stuff in the vfio_ap_mdev_group_notifier() function since it will
>> not get called when the KVM pointer is invalidated. The cleanup
>> should be done in the vfio_ap_mdev_release() function that gets
>> called when the mdev fd is closed.
> You say if vfio_ap_mdev_group_notifier() is called to tell us
> that KVM going away, then it is a bug?

If the notifier gets called after the notifier is unregistered then
yes, I would say that is a bug; however, my tracing showed that
the notifier does not get called precisely because it is unregistered
in the release callback.

>
> If that is the case, I would like that reflected in the code! By that I
> mean at logging an error at least (if not BUG_ON).

I do not know whether or not there are other circumstances under
which the notifier can get invoked before the release callback to
signal that the KVM pointer has been invalidated, so
I don't think this would be appropriate. I think we should just
process the call by setting the matrix_mdev->kvm pointer to
NULL and decrementing the reference count on kvm.
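The two positions above (log an error vs. simply clear the pointer and drop the reference) can be combined in one handler, sketched here in user space. The types and handle_set_kvm() are hypothetical; kvm == NULL models the "KVM pointer is about to become invalid" notification, fprintf() stands in for pr_err() or WARN_ON_ONCE(), and aqic_resources_held is an assumed bookkeeping counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the real vfio_ap structures. */
struct kvm {
	int refcount;
};

struct ap_matrix_mdev {
	struct kvm *kvm;
	int aqic_resources_held;	/* queues with a GISC or pinned page */
};

/*
 * Models a "set KVM" notification. Returns true when KVM goes away while
 * AQIC resources are still held, i.e. the case worth reporting.
 */
static bool handle_set_kvm(struct ap_matrix_mdev *m, struct kvm *kvm)
{
	bool unexpected = false;

	if (kvm) {
		kvm->refcount++;
		m->kvm = kvm;
		return false;
	}

	if (m->aqic_resources_held) {
		fprintf(stderr, "KVM going away with %d queue(s) holding AQIC resources\n",
			m->aqic_resources_held);
		unexpected = true;
	}

	/* Either way: forget the pointer and drop our reference. */
	if (m->kvm) {
		m->kvm->refcount--;
		m->kvm = NULL;
	}
	return unexpected;
}
```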

Maybe someone from the VFIO team can provide some better
insight.

>
> Regards,
> Halil



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-30 17:54       ` Halil Pasic
@ 2020-10-30 20:53         ` Tony Krowiak
  2020-10-30 21:13           ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-10-30 20:53 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/30/20 1:54 PM, Halil Pasic wrote:
> On Thu, 29 Oct 2020 19:29:35 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>>    			 */
>>>>    			if (ret)
>>>>    				rc = ret;
>>>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>>> +			q = vfio_ap_get_queue(matrix_mdev,
>>>> +					      AP_MKQID(apid, apqi));
>>>> +			if (q)
>>>> +				vfio_ap_free_aqic_resources(q);
> [..]
>
>>> Under what circumstances do we expect !q? If we don't, then we need to
>>> complain one way or another.
>> In the current code (i.e., prior to introducing the subsequent hot
>> plug patches), an APQN can not be assigned to an mdev unless it
>> references a queue device bound to the vfio_ap device driver; however,
>> there is nothing preventing a queue device from getting unbound
>> while the guest is running (one of the problems mostly resolved by this
>> series). In that case, q would be NULL.
> But if the queue does not belong to us any more it does not make sense
> call vfio_ap_mdev_reset_queue() on it's APQN, or?

This is precisely why we prevent a queue from being taken away
from vfio_ap (the in-use callback) when its APQN is assigned to an
mdev in this patch series. On the other hand, this is a very good
point.

>
> I think we should have
>
> if(!q)
> 	continue;
> at the very beginning of the loop body, or we want to be sure that q is
> not null.

I agree, I'll go ahead and make this change.




* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-30 20:53         ` Tony Krowiak
@ 2020-10-30 21:13           ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-30 21:13 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/30/20 4:53 PM, Tony Krowiak wrote:
>
>
> On 10/30/20 1:54 PM, Halil Pasic wrote:
>> On Thu, 29 Oct 2020 19:29:35 -0400
>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>
>>>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>>>                 */
>>>>>                if (ret)
>>>>>                    rc = ret;
>>>>> -            vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>>>> +            q = vfio_ap_get_queue(matrix_mdev,
>>>>> +                          AP_MKQID(apid, apqi));
>>>>> +            if (q)
>>>>> +                vfio_ap_free_aqic_resources(q);
>> [..]
>>
>>>> Under what circumstances do we expect !q? If we don't, then we need to
>>>> complain one way or another.
>>> In the current code (i.e., prior to introducing the subsequent hot
>>> plug patches), an APQN can not be assigned to an mdev unless it
>>> references a queue device bound to the vfio_ap device driver; however,
>>> there is nothing preventing a queue device from getting unbound
>>> while the guest is running (one of the problems mostly resolved by this
>>> series). In that case, q would be NULL.
>> But if the queue does not belong to us any more it does not make sense
>> call vfio_ap_mdev_reset_queue() on it's APQN, or?
>
> This is precisely why we prevent a queue from being taken away
> from vfio_ap (the in-use callback) when its APQN is assigned to an
> mdev in this patch series. On the other hand, this is a very good
> point.
>
>>
>> I think we should have
>>
>> if(!q)
>>     continue;
>> at the very beginning of the loop body, or we want to be sure that q is
>> not null.
>
> I agree, I'll go ahead and make this change.

After thinking about this a bit more, I don't think it makes sense to make
this change in this patch. For the current implementation, it is incumbent
upon the system administrator to ensure that a queue device is not unbound
from the vfio_ap device driver if its APQN is assigned to an mdev, so the
assumption here is that any APQN assigned to the mdev is (or was) bound to
the vfio_ap driver. If it was erroneously unbound while in use by a guest,
then both the guest and possibly the zcrypt driver will have simultaneous
access (one of the things fixed by this patch series). In that case, I think
the queue ought to be reset regardless of whether it is bound to vfio_ap or not.

Having said that, I think it makes sense to make the change you recommend
in patch 03/14. In that patch, the vfio_ap_queue object is retrieved from the
matrix_mdev. Since these queue objects are linked only when the queue
device is probed and unlinked when the queue device is removed, and
a queue device can not get bound to another driver while its APQN is assigned
to an mdev, it would make perfect sense to forego reset of a queue when
its APQN is assigned to an mdev.




* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-30 17:56       ` Halil Pasic
@ 2020-10-30 21:17         ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-10-30 21:17 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/30/20 1:56 PM, Halil Pasic wrote:
> On Thu, 29 Oct 2020 19:29:35 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>>> +void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>>> +{
>>>> +	struct vfio_ap_queue *q;
>>>> +	struct ap_queue *queue;
>>>> +	int apid, apqi;
>>>> +
>>>> +	queue = to_ap_queue(&apdev->device);
>>> What is the benefit of rewriting this? You introduced
>>> queue just to do queue->ap_dev to get to the apdev you
>>> have in hand in the first place.
>> I'm not quite sure what you're asking. This function is
>> the callback function specified via the function pointer
>> specified via the remove field of the struct ap_driver
>> when the vfio_ap device driver is registered with the
>> AP bus. That callback function takes a struct ap_device
>> as a parameter. What am I missing here?
> Please compare the removed function vfio_ap_queue_dev_remove() with the
> added function vfio_ap_mdev_remove_queue() line by line. It should
> become clear.

Got it. You are one sharp cookie, I'll fix this.

>
> Regards,
> Halil



* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-30 20:37         ` Tony Krowiak
@ 2020-10-31  3:43           ` Halil Pasic
  2020-11-02 14:35             ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-10-31  3:43 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Fri, 30 Oct 2020 16:37:04 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 10/30/20 1:42 PM, Halil Pasic wrote:
> > On Thu, 29 Oct 2020 19:29:35 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >>>>    			 */
> >>>>    			if (ret)
> >>>>    				rc = ret;
> >>>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >>>> +			q = vfio_ap_get_queue(matrix_mdev,
> >>>> +					      AP_MKQID(apid, apqi));
> >>>> +			if (q)
> >>>> +				vfio_ap_free_aqic_resources(q);  
> >>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
> >>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue()
> >>> in particular guarantee that the reset is actually done when we arrive
> >>> here)? BTW, I think we have a similar problem with the current code as
> >>> well.  
> >> If the return code from the vfio_ap_mdev_reset_queue() function
> >> is zero, then yes, we are guaranteed the reset was done and the
> >> queue is empty.  
> > I've read up on this and I disagree. We should discuss this offline.  
> 
> Maybe you are confusing things here; my statement is specific to the return
> code from the vfio_ap_mdev_reset_queue() function, not the response code
> from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue()
> function issues the PQAP(ZAPQ) instruction and if the status response code
> is 0 indicating the reset was successfully initiated, it waits for the
> queue to empty. When the queue is empty, it returns 0 to indicate
> the queue is reset. If the queue does not become empty after a period of
> time, it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I
> suppose there is no guarantee the reset was done, so maybe a change needs
> to be made there, such as a non-zero return code.
>

I overlooked the wait for empty. Maybe that return 0 played a part in
it. I now remember insisting on having the wait code added when the
interrupt support was in the making. Sorry!

If we have given up because we ran out of retries, we are in trouble anyway.
 
> >  
> >>    The function returns a non-zero return code if
> >> the reset fails or the reset did not complete within a given
> >> amount of time, so maybe we shouldn't free AQIC resources when
> >> we get a non-zero return code from the reset function?
> >>  
> > If the queue is gone, or broken, it won't produce interrupts or poke the
> > notifier bit, and we should clean up the AQIC resources.  
> 
> True, which is what the code provided by this patch does; however,
> the AQIC resources should be cleaned up only if the KVM pointer is
> not NULL for reasons discussed elsewhere.

Yes, but these should be cleaned up before the KVM pointer becomes
null. We don't want to keep the page with the notifier byte pinned
forever, or?
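Tony's suggestion above, that a timeout while waiting for the queue to drain should produce a non-zero return code instead of 0, can be sketched as follows. This is a hypothetical userspace model: `sim_queue`, `sim_zapq()` and `reset_queue_and_wait()` are invented names, and the simulated queue replaces the real PQAP(ZAPQ) instruction and status polling.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the outcome of issuing PQAP(ZAPQ). */
enum zapq_rc { ZAPQ_OK = 0, ZAPQ_FAILED = 1 };

/* Simulated queue: ZAPQ starts the reset; the queue drains after
 * 'drain_polls' polls (a negative value means it never drains). */
struct sim_queue {
	int drain_polls;
	int polls;
};

static enum zapq_rc sim_zapq(struct sim_queue *q)
{
	q->polls = 0;
	return ZAPQ_OK;
}

static int sim_queue_empty(struct sim_queue *q)
{
	q->polls++;
	return q->drain_polls >= 0 && q->polls >= q->drain_polls;
}

/* Returns 0 only if the queue was observed empty; -EBUSY if it did not
 * drain within max_polls; -EIO if the reset could not be initiated. */
static int reset_queue_and_wait(struct sim_queue *q, int max_polls)
{
	if (sim_zapq(q) != ZAPQ_OK)
		return -EIO;
	for (int i = 0; i < max_polls; i++) {
		if (sim_queue_empty(q))
			return 0;
		/* a real driver would sleep between polls */
	}
	return -EBUSY;
}
```

With this contract a caller can tell "reset done" apart from "gave up waiting", which is the distinction the question about freeing AQIC resources hinges on.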


* Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-10-31  3:43           ` Halil Pasic
@ 2020-11-02 14:35             ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-02 14:35 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/30/20 11:43 PM, Halil Pasic wrote:
> On Fri, 30 Oct 2020 16:37:04 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 10/30/20 1:42 PM, Halil Pasic wrote:
>>> On Thu, 29 Oct 2020 19:29:35 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>>>> @@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>>>>     			 */
>>>>>>     			if (ret)
>>>>>>     				rc = ret;
>>>>>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>>>>> +			q = vfio_ap_get_queue(matrix_mdev,
>>>>>> +					      AP_MKQID(apid, apqi));
>>>>>> +			if (q)
>>>>>> +				vfio_ap_free_aqic_resources(q);
>>>>> Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
>>>>> think so. I mean does the current code (and vfio_ap_mdev_reset_queue()
>>>>> in particular guarantee that the reset is actually done when we arrive
>>>>> here)? BTW, I think we have a similar problem with the current code as
>>>>> well.
>>>> If the return code from the vfio_ap_mdev_reset_queue() function
>>>> is zero, then yes, we are guaranteed the reset was done and the
>>>> queue is empty.
>>> I've read up on this and I disagree. We should discuss this offline.
>> Maybe you are confusing things here; my statement is specific to the return
>> code from the vfio_ap_mdev_reset_queue() function, not the response code
>> from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue()
>> function issues the PQAP(ZAPQ) instruction and if the status response code
>> is 0 indicating the reset was successfully initiated, it waits for the
>> queue to empty. When the queue is empty, it returns 0 to indicate
>> the queue is reset. If the queue does not become empty after a period of
>> time, it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I
>> suppose there is no guarantee the reset was done, so maybe a change needs
>> to be made there, such as a non-zero return code.
>>
> I overlooked the wait for empty. Maybe that return 0 played a part in
> it. I now remember insisting on having the wait code added when the
> interrupt support was in the making. Sorry!
>
> If we have given up because we ran out of retries, we are in trouble anyway.
>   
>>>   
>>>>     The function returns a non-zero return code if
>>>> the reset fails or the reset did not complete within a given
>>>> amount of time, so maybe we shouldn't free AQIC resources when
>>>> we get a non-zero return code from the reset function?
>>>>   
>>> If the queue is gone, or broken, it won't produce interrupts or poke the
>>> notifier bit, and we should clean up the AQIC resources.
>> True, which is what the code provided by this patch does; however,
>> the AQIC resources should be cleaned up only if the KVM pointer is
>> not NULL for reasons discussed elsewhere.
> Yes, but these should be cleaned up before the KVM pointer becomes
> null. We don't want to keep the page with the notifier byte pinned
> forever, or?

No, we do not want to keep the page forever. I probably should
have been clearer. There are times we do a reset - e.g., on remove
of the mdev - at which time there should be no KVM pointer, or
else the remove will not be allowed. Of course, we won't do the
reset either, so I guess you can ignore my comment. If there is
no KVM pointer yet a page remains pinned, something bad
happened.
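The ordering Halil asks for, releasing the AQIC resources while the KVM pointer is still set and only then clearing it, can be modeled as below. This is a hypothetical sketch: `mdev_model`, `free_aqic_resources()` and `release_mdev()` are invented names, and a single flag stands in for the pinned notifier-byte page. Following the thread, cleanup is modeled as acting only while the KVM pointer is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state discussed above; not the driver's structs. */
struct kvm_model { int id; };

struct mdev_model {
	struct kvm_model *kvm;     /* guest using the mdev, or NULL */
	bool nib_page_pinned;      /* page holding the notifier byte is pinned */
};

static void free_aqic_resources(struct mdev_model *m)
{
	/* Per the thread, cleanup is only done while the KVM pointer is set. */
	if (!m->kvm)
		return;
	m->nib_page_pinned = false;
}

static void release_mdev(struct mdev_model *m)
{
	if (m->kvm) {
		free_aqic_resources(m);   /* first: unpin while KVM is known */
		m->kvm = NULL;            /* then: drop the KVM pointer */
	}
	/* invariant: no KVM pointer implies no pinned page */
	assert(m->kvm != NULL || !m->nib_page_pinned);
}
```

Reversing the two steps leaves the page pinned with no KVM pointer, which is the "something bad happened" state described above.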




* Re: [PATCH v11 02/14] 390/vfio-ap: use new AP bus interface to search for queue devices
  2020-10-27  7:01   ` Halil Pasic
@ 2020-11-02 21:57     ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-02 21:57 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/27/20 3:01 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:11:57 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> This patch refactors the vfio_ap device driver to use the AP bus's
>> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
>> information about a queue that is bound to the vfio_ap device driver.
>> The bus's ap_get_qdev() function retrieves the queue device from a
>> hashtable keyed by APQN. This is much more efficient than looping over
>> the list of devices attached to the AP bus by several orders of
>> magnitude.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

Thank you for your review.

>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 35 +++++++++++++------------------
>>   1 file changed, 14 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index c471832f0a30..049b97d7444c 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -26,43 +26,36 @@
>>   
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>   
>> -static int match_apqn(struct device *dev, const void *data)
>> -{
>> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
>> -
>> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
>> -}
>> -
>>   /**
>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>>    * @matrix_mdev: the associated mediated matrix
>>    * @apqn: The queue APQN
>>    *
>> - * Retrieve a queue with a specific APQN from the list of the
>> - * devices of the vfio_ap_drv.
>> - * Verify that the APID and the APQI are set in the matrix.
>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>> + * the AP bus.
>>    *
>> - * Returns the pointer to the associated vfio_ap_queue
>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>>    */
>>   static struct vfio_ap_queue *vfio_ap_get_queue(
>>   					struct ap_matrix_mdev *matrix_mdev,
>> -					int apqn)
>> +					unsigned long apqn)
>>   {
>> -	struct vfio_ap_queue *q;
>> -	struct device *dev;
>> +	struct ap_queue *queue;
>> +	struct vfio_ap_queue *q = NULL;
>>   
>>   	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>>   		return NULL;
>>   	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>>   		return NULL;
>>   
>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -				 &apqn, match_apqn);
>> -	if (!dev)
>> +	queue = ap_get_qdev(apqn);
>> +	if (!queue)
>>   		return NULL;
>> -	q = dev_get_drvdata(dev);
>> -	q->matrix_mdev = matrix_mdev;
>> -	put_device(dev);
>> +
>> +	if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
>> +		q = dev_get_drvdata(&queue->ap_dev.device);
>> +
> Needs to be called with the vfio_ap lock held, right? Otherwise the queue could
> get unbound while we are working with it as a vfio_ap_queue... Nothing
> new, but might be worth documenting.

This is always called with the vfio_ap lock held.

>
>> +	put_device(&queue->ap_dev.device);
>>   
>>   	return q;
>>   }
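To make the efficiency argument in the commit message concrete, here is a small userspace sketch of looking up a queue in a hashtable keyed by APQN instead of scanning every device. It assumes the 16-bit APQN layout used by the AP bus's AP_MKQID()/AP_QID_CARD()/AP_QID_QUEUE() macros (card in the high byte, queue index in the low byte); the helper names, the bucket count and `struct qdev` are hypothetical, and reference counting and locking are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed APQN layout: 8-bit card (APID) in the high byte, 8-bit queue
 * index (APQI) in the low byte. */
static inline unsigned int mkqid(unsigned int card, unsigned int queue)
{
	return ((card & 0xff) << 8) | (queue & 0xff);
}
static inline unsigned int qid_card(unsigned int qid) { return (qid >> 8) & 0xff; }
static inline unsigned int qid_queue(unsigned int qid) { return qid & 0xff; }

#define NR_BUCKETS 16   /* hypothetical; kept small for illustration */

struct qdev {
	unsigned int apqn;
	struct qdev *next;  /* bucket chain */
};

static struct qdev *buckets[NR_BUCKETS];

static void qdev_add(struct qdev *q)
{
	unsigned int b = q->apqn % NR_BUCKETS;

	q->next = buckets[b];
	buckets[b] = q;
}

/* Walks the chain of one bucket instead of every device on the bus. */
static struct qdev *qdev_lookup(unsigned int apqn)
{
	for (struct qdev *q = buckets[apqn % NR_BUCKETS]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

The real ap_get_qdev() also takes a device reference, which is why the patch calls put_device() afterwards; the sketch leaves that out.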



* Re: [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification
  2020-10-22 17:12 ` [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification Tony Krowiak
  2020-10-22 21:17   ` kernel test robot
@ 2020-11-03  9:48   ` kernel test robot
  2020-11-13 21:06     ` Tony Krowiak
  1 sibling, 1 reply; 68+ messages in thread
From: kernel test robot @ 2020-11-03  9:48 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede

[-- Attachment #1: Type: text/plain, Size: 4423 bytes --]

Hi Tony,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on linus/master v5.10-rc2 next-20201103]
[cannot apply to kvms390/next linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allmodconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/32786ef6d4ba3703d993a8894ea1d763785fd3a4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
        git checkout 32786ef6d4ba3703d993a8894ea1d763785fd3a4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/s390/crypto/vfio_ap_ops.c:1316:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
    1316 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/s390/crypto/vfio_ap_ops.c:1568:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
    1568 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/s390/crypto/vfio_ap_ops.c: In function 'vfio_ap_mdev_on_cfg_remove':
   drivers/s390/crypto/vfio_ap_ops.c:1777:7: warning: variable 'unassigned' set but not used [-Wunused-but-set-variable]
    1777 |  bool unassigned = false;
         |       ^~~~~~~~~~
   drivers/s390/crypto/vfio_ap_ops.c: At top level:
   drivers/s390/crypto/vfio_ap_ops.c:1813:6: warning: no previous prototype for 'vfio_ap_mdev_on_cfg_add' [-Wmissing-prototypes]
    1813 | void vfio_ap_mdev_on_cfg_add(void)
         |      ^~~~~~~~~~~~~~~~~~~~~~~
   In file included from drivers/s390/crypto/vfio_ap_ops.c:11:
   In function 'memcpy',
       inlined from 'vfio_ap_mdev_unassign_apids' at drivers/s390/crypto/vfio_ap_ops.c:1655:3,
       inlined from 'vfio_ap_mdev_on_cfg_remove' at drivers/s390/crypto/vfio_ap_ops.c:1800:8,
       inlined from 'vfio_ap_on_cfg_changed' at drivers/s390/crypto/vfio_ap_ops.c:1836:2:
>> include/linux/string.h:402:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
     402 |    __read_overflow2();
         |    ^~~~~~~~~~~~~~~~~~

vim +/__read_overflow2 +402 include/linux/string.h

6974f0c4555e285 Daniel Micay  2017-07-12  393  
6974f0c4555e285 Daniel Micay  2017-07-12  394  __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
6974f0c4555e285 Daniel Micay  2017-07-12  395  {
6974f0c4555e285 Daniel Micay  2017-07-12  396  	size_t p_size = __builtin_object_size(p, 0);
6974f0c4555e285 Daniel Micay  2017-07-12  397  	size_t q_size = __builtin_object_size(q, 0);
6974f0c4555e285 Daniel Micay  2017-07-12  398  	if (__builtin_constant_p(size)) {
6974f0c4555e285 Daniel Micay  2017-07-12  399  		if (p_size < size)
6974f0c4555e285 Daniel Micay  2017-07-12  400  			__write_overflow();
6974f0c4555e285 Daniel Micay  2017-07-12  401  		if (q_size < size)
6974f0c4555e285 Daniel Micay  2017-07-12 @402  			__read_overflow2();
6974f0c4555e285 Daniel Micay  2017-07-12  403  	}
6974f0c4555e285 Daniel Micay  2017-07-12  404  	if (p_size < size || q_size < size)
6974f0c4555e285 Daniel Micay  2017-07-12  405  		fortify_panic(__func__);
47227d27e2fcb01 Daniel Axtens 2020-06-03  406  	return __underlying_memcpy(p, q, size);
6974f0c4555e285 Daniel Micay  2017-07-12  407  }
6974f0c4555e285 Daniel Micay  2017-07-12  408  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 63158 bytes --]
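The report does not show the offending memcpy() in vfio_ap_mdev_unassign_apids(), so the exact cause is not known from this thread. What the fortified memcpy() rejects is a constant size larger than the source object. One defensive pattern, shown here as a hypothetical helper (`COPY_OBJ` is not from the patch or the kernel), is to copy between objects whose sizes are checked at compile time:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_APIDS 256
#define BITS_PER_LONG_ (CHAR_BIT * sizeof(unsigned long))
#define LONGS_FOR(bits) (((bits) + BITS_PER_LONG_ - 1) / BITS_PER_LONG_)

/* Copy one object over another; refuses to compile if the two objects
 * differ in size (C11 _Static_assert inside the block). */
#define COPY_OBJ(dst, src)                                           \
	do {                                                         \
		_Static_assert(sizeof(dst) == sizeof(src),           \
			       "COPY_OBJ: size mismatch");           \
		memcpy(&(dst), &(src), sizeof(dst));                 \
	} while (0)

struct matrix_model {
	unsigned long apm[LONGS_FOR(NR_APIDS)];
};
```

Passing a smaller source (for example a single `unsigned long`) to COPY_OBJ fails at compile time with a readable message instead of the `__read_overflow2` error.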


* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-10-28 13:57   ` Halil Pasic
@ 2020-11-03 22:49     ` Tony Krowiak
  2020-11-04 12:52       ` Halil Pasic
  2020-11-04 13:23       ` Halil Pasic
  0 siblings, 2 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-03 22:49 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/28/20 9:57 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:03 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> In response to the probe or remove of a queue device, if a KVM guest is
>> using the matrix mdev to which the APQN of the queue device is assigned,
>> the vfio_ap device driver must respond accordingly. In an ideal world, the
>> queue device being probed would be hot plugged into the guest. Likewise,
>> the queue corresponding to the queue device being removed would
>> be hot unplugged from the guest. Unfortunately, the AP architecture
>> precludes plugging or unplugging individual queues. We must also
>> consider the fact that the linux device model precludes us from passing a
>> queue device through to a KVM guest that is not bound to the driver
>> facilitating the pass-through. Consequently, we are left with the choice of
>> plugging/unplugging the adapter or the domain. In the latter case, this
>> would result in taking access to the domain away for each adapter the
>> guest is using. In either case, the operation will alter a KVM guest's
>> access to one or more queues, so let's plug/unplug the adapter on
>> bind/unbind of the queue device since this corresponds to the hardware
>> entity that may be physically plugged/unplugged - i.e., a domain is not
>> a piece of hardware.
>>
>> Example:
>> =======
>> Queue devices bound to vfio_ap device driver:
>>     04.0004
>>     04.0047
>>     04.0054
>>
>>     05.0004
>>     05.0047
>>
>> Adapters and domains assigned to matrix mdev:
>>     Adapters  Domains  -> Queues
>>     04        0004        04.0004
>>     05        0047        04.0047
>>               0054        04.0054
>>                           05.0004
>>                           05.0047
>>                           05.0054
>>
>> KVM guest matrix at its startup:
>>     Adapters  Domains  -> Queues
>>     04        0004        04.0004
>>               0047        04.0047
>>               0054        04.0054
>>
>>     Adapter 05 is filtered because queue 05.0054 is not bound.
>>
>> KVM guest matrix after queue 05.0054 is bound to the vfio_ap driver:
>>     Adapters  Domains  -> Queues
>>     04        0004        04.0004
>>     05        0047        04.0047
>>               0054        04.0054
>>                           05.0004
>>                           05.0047
>>                           05.0054
>>
>>     All queues assigned to the matrix mdev are now bound.
>>
>> KVM guest matrix after queue 04.0004 is unbound:
>>
>>     Adapters  Domains  -> Queues
>>     05        0004        05.0004
>>               0047        05.0047
>>               0054        05.0054
>>
>>     Adapter 04 is filtered because 04.0004 is no longer bound.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 158 +++++++++++++++++++++++++++++-
>>   1 file changed, 155 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 7bad70d7bcef..5b34bc8fca31 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -312,6 +312,13 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	return 0;
>>   }
>>   
>> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
>> +{
>> +	bitmap_clear(matrix->apm, 0, AP_DEVICES);
>> +	bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
>> +	bitmap_clear(matrix->adm, 0, AP_DOMAINS);
>> +}
>> +
>>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>>   				struct ap_matrix *matrix)
>>   {
>> @@ -601,6 +608,104 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>>   	return 0;
>>   }
>>   
>> +static bool vfio_ap_mdev_matrixes_equal(struct ap_matrix *matrix1,
>> +					struct ap_matrix *matrix2)
>> +{
>> +	return (bitmap_equal(matrix1->apm, matrix2->apm, AP_DEVICES) &&
>> +		bitmap_equal(matrix1->aqm, matrix2->aqm, AP_DOMAINS) &&
>> +		bitmap_equal(matrix1->adm, matrix2->adm, AP_DOMAINS));
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_filter_matrix
>> + *
>> + * Filters the matrix of adapters, domains, and control domains assigned to
>> + * a matrix mdev's AP configuration and stores the result in the shadow copy of
>> + * the APCB used to supply a KVM guest's AP configuration.
>> + *
>> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
>> + *
>> + * Returns true if filtering has changed the shadow copy of the APCB used
>> + * to supply a KVM guest's AP configuration; otherwise, returns false.
>> + */
>> +static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	struct ap_matrix shadow_apcb;
>> +	unsigned long apid, apqi, apqn;
>> +
>> +	memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +		/*
>> +		 * If the APID is not assigned to the host AP configuration,
>> +		 * we can not assign it to the guest's AP configuration
>> +		 */
>> +		if (!test_bit_inv(apid,
>> +				  (unsigned long *)matrix_dev->info.apm)) {
>> +			clear_bit_inv(apid, shadow_apcb.apm);
>> +			continue;
>> +		}
>> +
>> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> +				     AP_DOMAINS) {
>> +			/*
>> +			 * If the APQI is not assigned to the host AP
>> +			 * configuration, then it can not be assigned to the
>> +			 * guest's AP configuration
>> +			 */
>> +			if (!test_bit_inv(apqi, (unsigned long *)
>> +					  matrix_dev->info.aqm)) {
>> +				clear_bit_inv(apqi, shadow_apcb.aqm);
>> +				continue;
>> +			}
>> +
>> +			/*
>> +			 * If the APQN is not bound to the vfio_ap device
>> +			 * driver, then we can't assign it to the guest's
>> +			 * AP configuration. The AP architecture won't
>> +			 * allow filtering of a single APQN, so let's filter
>> +			 * the APID.
>> +			 */
>> +			apqn = AP_MKQID(apid, apqi);
>> +			if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
>> +				clear_bit_inv(apid, shadow_apcb.apm);
>> +				break;
>> +			}
>> +		}
>> +
>> +		/*
>> +		 * If all APIDs have been cleared, then clear the APQIs from the
>> +		 * shadow APCB and quit filtering.
>> +		 */
>> +		if (bitmap_empty(shadow_apcb.apm, AP_DEVICES)) {
>> +			if (!bitmap_empty(shadow_apcb.aqm, AP_DOMAINS))
>> +				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
>> +
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * If all APQIs have been cleared, then clear the APIDs from the
>> +		 * shadow APCB and quit filtering.
>> +		 */
>> +		if (bitmap_empty(shadow_apcb.aqm, AP_DOMAINS)) {
>> +			if (!bitmap_empty(shadow_apcb.apm, AP_DEVICES))
>> +				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
>> +
>> +			break;
>> +		}
> We do this to show the no queues but bits set output in show? We could
> get rid of some code if we were to not z

I'm not sure what you are saying/asking here. The reason for this
is that there is no point in setting bits in the APCB if no queues
will be made available to the guest, which is the case if the APM or
AQM is cleared.
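A userspace model of the filtering rules in vfio_ap_mdev_filter_guest_matrix() may help check the behavior described in the commit message. This is a sketch with hypothetical names: plain bool arrays stand in for the kernel's inverted-bit bitmaps, `host` and `bound` stand in for the host AP configuration and the set of APQNs bound to vfio_ap, and control domains (ADM) are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_APIDS 256
#define NR_APQIS 256

struct matrix_model {
	bool apm[NR_APIDS];
	bool aqm[NR_APQIS];
};

static struct matrix_model host;            /* host AP configuration */
static bool bound[NR_APIDS][NR_APQIS];      /* APQNs bound to vfio_ap */

static bool any_set(const bool *bits, int n)
{
	for (int i = 0; i < n; i++)
		if (bits[i])
			return true;
	return false;
}

/* Derive the guest matrix from the mdev's assigned matrix. */
static void filter_guest_matrix(const struct matrix_model *assigned,
				struct matrix_model *guest)
{
	*guest = *assigned;

	for (int apid = 0; apid < NR_APIDS; apid++) {
		if (!assigned->apm[apid])
			continue;
		if (!host.apm[apid]) {
			guest->apm[apid] = false;
			continue;
		}
		for (int apqi = 0; apqi < NR_APQIS; apqi++) {
			if (!assigned->aqm[apqi])
				continue;
			if (!host.aqm[apqi]) {
				guest->aqm[apqi] = false;
				continue;
			}
			/* one unbound APQN filters the whole adapter */
			if (!bound[apid][apqi]) {
				guest->apm[apid] = false;
				break;
			}
		}
		/* no adapters or no domains means no queues: clear both */
		if (!any_set(guest->apm, NR_APIDS) ||
		    !any_set(guest->aqm, NR_APQIS)) {
			memset(guest, 0, sizeof(*guest));
			break;
		}
	}
}
```

With adapters 04 and 05 and domains 0004, 0047 and 0054 assigned, and bound queues 04.0004, 04.0047, 04.0054, 05.0004 and 05.0047, only adapter 04 reaches the guest; binding 05.0054 adds adapter 05, and unbinding 04.0004 then removes adapter 04.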

>
>> +	}
>> +
>> +	if (vfio_ap_mdev_matrixes_equal(&matrix_mdev->shadow_apcb,
>> +					&shadow_apcb))
>> +		return false;
>> +
>> +	memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
>> +	       sizeof(struct ap_matrix));
>> +
>> +	return true;
>> +}
>> +
>>   enum qlink_type {
>>   	LINK_APID,
>>   	LINK_APQI,
>> @@ -1256,9 +1361,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>   		return NOTIFY_DONE;
>>   
>> -	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> -	       sizeof(matrix_mdev->shadow_apcb));
>> -	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   
>>   	return NOTIFY_OK;
>>   }
>> @@ -1369,6 +1473,18 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>   		matrix_mdev->kvm = NULL;
>>   	}
>>   
>> +	/*
>> +	 * The shadow_apcb must be cleared.
>> +	 *
>> +	 * The shadow_apcb is committed to the guest only if the masks resulting
>> +	 * from filtering the matrix_mdev->matrix differs from the masks in the
>> +	 * shadow_apcb. Consequently, if we don't clear the masks here and a
>> +	 * guest is subsequently started, the filtering may not result in a
>> +	 * change to the shadow_apcb which will not get committed to the guest;
>> +	 * in that case, the guest will be left without any queues.
>> +	 */
>> +	vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
>> +
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> @@ -1466,6 +1582,16 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
>>   	}
>>   }
>>   
>> +static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
>> +{
>> +
>> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>> +		return;
>> +
>> +	if (vfio_ap_mdev_filter_guest_matrix(q->matrix_mdev))
>> +		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> Here we do more work than necessary. At this point we know that
> we either put the APID of the queue in the shadow_apcb or do nothing. To
> decide if we have to put the APID in the shadow apcb we need to
> check for the cartesian product of shadow_apcb.aqm with the APID, if the
> queues identified by those APQNs are bound to the vfio_ap driver. The
> vfio_ap_mdev_filter_guest_matrix() is going to do a lookup for each
> assigned APQN.

That is true, and I believe that is what was done in the previous
iteration. In the interest of keeping things simple and consistent across
the various interfaces that require filtering, I decided to use one
function instead of duplicating functionality in multiple places. Let me
think on this; maybe I can come up with a way to kill many birds with one
stone. The question is, how often is this type of thing going to happen
(i.e., is performance really an issue here)?
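The narrower check Halil describes, testing only the APQNs formed by the probed queue's APID and the domains already in the shadow APCB, could look roughly like this. It is a hypothetical userspace sketch (`shadow_aqm`, `bound` and `can_plug_adapter()` are invented names); it assumes the APID is already known to be assigned to the mdev and present in the host configuration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_APIDS 256
#define NR_APQIS 256

/* Hypothetical model state (not the driver's data structures). */
static bool shadow_aqm[NR_APQIS];          /* domains currently given to the guest */
static bool bound[NR_APIDS][NR_APQIS];     /* APQNs bound to vfio_ap */

/* When a queue on adapter 'apid' is probed, the only possible change is
 * adding 'apid' to the guest, which is allowed iff every APQN (apid, apqi)
 * with apqi in the shadow AQM is bound. At most NR_APQIS checks, instead
 * of re-filtering the whole assigned matrix. */
static bool can_plug_adapter(unsigned int apid)
{
	for (unsigned int apqi = 0; apqi < NR_APQIS; apqi++)
		if (shadow_aqm[apqi] && !bound[apid][apqi])
			return false;
	return true;
}
```

One caveat: when the shadow APCB is empty (the filter clears the AQM once no adapters remain), this check passes trivially but the domains would also have to be restored, so that case still needs the full filter. This is one argument for keeping a single filtering function.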

>
>> +}
>> +
>>   int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>   {
>>   	struct vfio_ap_queue *q;
>> @@ -1482,11 +1608,36 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>   	q->apqn = queue->qid;
>>   	q->saved_isc = VFIO_AP_ISC_INVALID;
>>   	vfio_ap_queue_link_mdev(q);
>> +	vfio_ap_mdev_hot_plug_queue(q);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return 0;
>>   }
>>   
>> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>> +{
>> +	unsigned long apid = AP_QID_CARD(q->apqn);
>> +
>> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>> +		return;
>> +
>> +	/*
>> +	 * If the APID is assigned to the guest, then let's
>> +	 * go ahead and unplug the adapter since the
>> +	 * architecture does not provide a means to unplug
>> +	 * an individual queue.
>> +	 */
>> +	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
>> +		clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
> Shouldn't we check aqm as well? I mean it may be clear at this point
> because of info->aqm. If the bit is clear, we don't have to remove
> the apm bit.

The rule we agreed upon is that if a queue is removed, we unplug
the card because we can't unplug an individual queue, so this code
is consistent with the stated rule. Typically, a queue is unplugged
because the adapter has been deconfigured or is broken, which means
that all queues for that adapter will be removed in succession. On the
other hand, that situation would be handled when the last queue is
removed if we check the AQM, so I'm not averse to making that
check if you insist. Of course, if the queue is manually unbound from
the vfio driver, what you are asking for makes sense, I suppose. I'll have
to think about this one some more, but feel free to respond to this.

>
>> +
>> +		if (bitmap_empty(q->matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> +			bitmap_clear(q->matrix_mdev->shadow_apcb.aqm, 0,
>> +				     AP_DOMAINS);
>> +
>> +		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
>> +	}
>> +}
>> +
>>   void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>   {
>>   	struct vfio_ap_queue *q;
>> @@ -1497,6 +1648,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	q = dev_get_drvdata(&queue->ap_dev.device);
>> +	vfio_ap_mdev_hot_unplug_queue(q);
> Puh this is ugly. In an ideal world the guest would be guaranteed to not
> get any writes to the notifier byte after it has seen that the queue is
> gone (or the interrupts were disabled).
>
> The reset below might be too late as the vcpus may go back immediately.
>
> I don't have a good solution for this with the tools currently at
> our disposal. We could simulate an external reset for the queue before
> the update to the APCB, or just disable the interrupts. These are ugly
> in their own way.
>
> Switching to emulation mode might be something for the future, but right
> now it is also ugly.
>
> Any thoughts? Am I just dreaming up a problem here?

I realize that we shouldn't make any assumptions about the OS
running on the guest, but in the world as it exists today the only
OS supported as a guest of linux is linux. Consequently, when
the adapter is unplugged from the guest, the zcrypt driver will
reset the queues associated with it, thus disabling interrupts.
Unfortunately, we can't control the flow between the host and the
guest, so there is a possibility the vfio driver could be first to
reset the queue in question. I'm not sure this is a problem,
because either the vfio driver or the zcrypt driver on the guest
will get a response code indicating a reset is in progress, at which
time it will wait for completion before trying again.

The bottom line is, in my opinion you are dreaming up a problem
here. Ultimately, when an adapter is removed from the guest,
the guest will no longer access any queue on that adapter and
the adapter will be reset before giving it back to the host.
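Tony's argument rests on the second of two concurrent resets seeing "reset in progress" and retrying later. A toy simulation of that interaction is below. Everything here is hypothetical (`sim_queue`, `sim_zapq()`, `reset_with_retry()` and the two-step reset duration), and waiting for the reset to complete after a successful ZAPQ is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical response codes modeled on the discussion above. */
enum rc { RC_NORMAL, RC_RESET_IN_PROGRESS };

/* Simulated queue: a started reset stays in progress for 'busy' ticks. */
struct sim_queue { int busy; };

static enum rc sim_zapq(struct sim_queue *q)
{
	if (q->busy > 0)
		return RC_RESET_IN_PROGRESS;   /* someone else's reset is running */
	q->busy = 2;                        /* start our own reset */
	return RC_NORMAL;
}

static void sim_tick(struct sim_queue *q)
{
	if (q->busy > 0)
		q->busy--;
}

/* Retry while another party's reset is in progress; returns the number of
 * retries needed, or -1 if max_tries was exhausted. */
static int reset_with_retry(struct sim_queue *q, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (sim_zapq(q) == RC_NORMAL)
			return i;
		sim_tick(q);   /* stands in for sleeping before retrying */
	}
	return -1;
}
```

Whichever side resets first (the guest's zcrypt driver or the host's vfio_ap driver), the other one backs off until the first reset has finished rather than interleaving with it.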

>
> Regards,
> Halil
>
>
>>   	dev_set_drvdata(&queue->ap_dev.device, NULL);
>>   	apid = AP_QID_CARD(q->apqn);
>>   	apqi = AP_QID_QUEUE(q->apqn);



* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-11-03 22:49     ` Tony Krowiak
@ 2020-11-04 12:52       ` Halil Pasic
  2020-11-04 21:20         ` Tony Krowiak
  2020-11-04 13:23       ` Halil Pasic
  1 sibling, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-11-04 12:52 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 3 Nov 2020 17:49:21 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >>   
> >> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> >> +{
> >> +	unsigned long apid = AP_QID_CARD(q->apqn);
> >> +
> >> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> >> +		return;
> >> +
> >> +	/*
> >> +	 * If the APID is assigned to the guest, then let's
> >> +	 * go ahead and unplug the adapter since the
> >> +	 * architecture does not provide a means to unplug
> >> +	 * an individual queue.
> >> +	 */
> >> +	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
> >> +		clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);  
> > Shouldn't we check aqm as well? I mean it may be clear at this point
> > because of info->aqm. If the bit is clear, we don't have to remove
> > the apm bit.  
> 
> The rule we agreed upon is that if a queue is removed, we unplug
> the card because we can't unplug an individual queue, so this code
> is consistent with the stated rule.

All I'm asking for is to verify that the queue is actually plugged. The
queue is actually plugged iff 
test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi,
q->matrix_mdev->shadow_apcb.aqm).

There is no point in unplugging the whole card, if the queue removed is
unplugged in the first place.
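The condition Halil asks for (a queue counts as plugged only if its APID bit is set in the shadow APCB's APM *and* its APQI bit is set in the AQM) can be sketched as a small userspace model. This is an illustration, not the driver code: struct model_apcb and the model_*_bit_inv() helpers are hypothetical stand-ins for shadow_apcb and the s390 test_bit_inv()/clear_bit_inv() helpers, which number bits starting from the most significant bit of each word.

```c
#include <stdbool.h>

/* Hypothetical model sizes: 256 APIDs and 256 APQIs, 64 bits per word. */
#define MODEL_AP_BITS 256
#define MODEL_WORDS (MODEL_AP_BITS / 64)

/* Simplified, hypothetical stand-in for the shadow APCB masks. */
struct model_apcb {
	unsigned long long apm[MODEL_WORDS];
	unsigned long long aqm[MODEL_WORDS];
};

/* MSB-first ("inverted") numbering: bit 0 is the top bit of word 0. */
static unsigned long long model_mask_inv(unsigned long nr)
{
	return 1ULL << (63 - nr % 64);
}

static bool model_test_bit_inv(unsigned long nr, const unsigned long long *map)
{
	return (map[nr / 64] & model_mask_inv(nr)) != 0;
}

static void model_set_bit_inv(unsigned long nr, unsigned long long *map)
{
	map[nr / 64] |= model_mask_inv(nr);
}

static void model_clear_bit_inv(unsigned long nr, unsigned long long *map)
{
	map[nr / 64] &= ~model_mask_inv(nr);
}

/* A queue is plugged into the guest only if both its APID and APQI are set. */
static bool model_queue_plugged(const struct model_apcb *apcb,
				unsigned long apid, unsigned long apqi)
{
	return model_test_bit_inv(apid, apcb->apm) &&
	       model_test_bit_inv(apqi, apcb->aqm);
}

/*
 * Unplug the whole adapter (a single queue cannot be unplugged), but only
 * if the removed queue was actually plugged. Returns true if the shadow
 * APCB changed and would therefore have to be committed to the guest.
 */
static bool model_hot_unplug_queue(struct model_apcb *apcb,
				   unsigned long apid, unsigned long apqi)
{
	if (!model_queue_plugged(apcb, apid, apqi))
		return false;
	model_clear_bit_inv(apid, apcb->apm);
	return true;
}
```

With adapter 05 and domains 0004 and 0047 plugged, removing 05.0054 (never plugged, because APQI 0054 is absent from the AQM) leaves the APCB untouched, while removing 05.0004 clears adapter 05.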

> Typically, a queue is unplugged
> because the adapter has been deconfigured or is broken which means
> that all queues for that adapter will be removed in succession. On the
> other hand, that situation would be handled when the last queue is
> removed if we check the AQM, so I'm not averse to making that
> check if you insist. 

I don't agree. Let's detail your scenario. We have a nicely
operating card which is passed through as a whole to our guest. It
breaks, and the AP bus decides to deconstruct the queues.
The very first queue removed would already unplug the card, because
both the apm and the aqm bits are set at this point. Subsequent removals
then see that the apm bit is cleared. Actually, IMHO everything works
just as it would without the extra check on aqm (in this scenario).

It would make reasoning about the code much easier for me, so, sorry, I do
insist.

> Of course, if the queue is manually unbound from
> the vfio driver, what you are asking for makes sense I suppose. I'll have
> to think about this one some more, but feel free to respond to this.

I'm not sure the situation where the queue's ->matrix_mdev pointer is set
but the apqi is not in the shadow_apcb can actually happen (races not
considered). But I'm sure the code suggests it can, because
vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
which governs whether the apm or the aqm bit should be removed. And
vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
assign_domain_store(), and I don't see subsequent unlink operations that would
sever q->matrix_mdev.

Another case where the aqm may get filtered in
vfio_ap_mdev_filter_guest_matrix() is when the info->aqm bit is not set, as
I've mentioned in my previous mail. If that cannot happen, we should turn
that into an assert.

Actually, if you are convinced that the apqi bit is always set in
q->matrix_mdev->shadow_apcb.aqm, I would agree to turning that into an
assertion instead of a condition. Then, if I'm not completely convinced, I
could at least try to trigger the assert :).

Regards,
Halil

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-11-03 22:49     ` Tony Krowiak
  2020-11-04 12:52       ` Halil Pasic
@ 2020-11-04 13:23       ` Halil Pasic
  1 sibling, 0 replies; 68+ messages in thread
From: Halil Pasic @ 2020-11-04 13:23 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 3 Nov 2020 17:49:21 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> > We do this to show the no queues but bits set output in show? We could
> > get rid of some code if we were to not z  

I managed to delete "eroize" from "zeroize".

> 
> I'm not sure what you are saying/asking here. The reason for this
> is because there is no point in setting bits in the APCB if no queues
> will be made available to the guest which is the case if the APM or
> AQM are cleared.

Exactly my train of thought! There is no point doing work (here
zeroizing) that has no effect.

Also, I'm leaning towards incremental updates to the shadow_apcb (instead
of basically recomputing it from scratch each time). One thing I'm
particularly worried about is that, because of the third argument of
vfio_ap_mdev_filter_guest_matrix(), called filter_apid, we could end up
with a different filtering decision than previously. For example, we
decide to filter the card on removal of a single queue, but then somebody
does an assign domain, and suddenly we unplug the domain and plug the
card. With incremental changes to the shadow_apcb, we could do less work
(revise only what needs to be revised), and it would be more
straightforward to reason about the absence of inconsistent filtering.

Regards,
Halil

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-11-04 12:52       ` Halil Pasic
@ 2020-11-04 21:20         ` Tony Krowiak
  2020-11-05 12:27           ` Halil Pasic
  0 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-11-04 21:20 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 11/4/20 7:52 AM, Halil Pasic wrote:
> On Tue, 3 Nov 2020 17:49:21 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>>>    
>>>> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>>>> +{
>>>> +	unsigned long apid = AP_QID_CARD(q->apqn);
>>>> +
>>>> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
>>>> +		return;
>>>> +
>>>> +	/*
>>>> +	 * If the APID is assigned to the guest, then let's
>>>> +	 * go ahead and unplug the adapter since the
>>>> +	 * architecture does not provide a means to unplug
>>>> +	 * an individual queue.
>>>> +	 */
>>>> +	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm)) {
>>>> +		clear_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm);
>>> Shouldn't we check aqm as well? I mean it may be clear at this point
>>> because of info->aqm. If the bit is clear, we don't have to remove
>>> the apm bit.
>> The rule we agreed upon is that if a queue is removed, we unplug
>> the card because we can't unplug an individual queue, so this code
>> is consistent with the stated rule.
> All I'm asking for is to verify that the queue is actually plugged. The
> queue is actually plugged iff
> test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi,
> q->matrix_mdev->shadow_apcb.aqm).
>
> There is no point in unplugging the whole card, if the queue removed is
> unplugged in the first place.

No problem, I can make that change.

>
>> Typically, a queue is unplugged
>> because the adapter has been deconfigured or is broken which means
>> that all queues for that adapter will be removed in succession. On the
>> other hand, that situation would be handled when the last queue is
>> removed if we check the AQM, so I'm not averse to making that
>> check if you insist.
> I don't agree. Let's detail your scenario. We have a nicely
> operating card which is passed through as a whole to our guest. It
> breaks, and the AP bus decides to deconstruct the queues.
> The very first queue removed would already unplug the card, because
> both the apm and the aqm bits are set at this point. Subsequent removals
> then see that the apm bit is cleared. Actually, IMHO everything works
> just as it would without the extra check on aqm (in this scenario).
>
> It would make reasoning about the code much easier for me, so, sorry, I do
> insist.

As you said, it works as-is in the scenario you pointed out:)
Whether it makes it any easier to understand the code is in
the eyes of the beholder (for example, I disagree),
but I'm willing to make the change, it's not a big deal.

>
>> Of course, if the queue is manually unbound from
>> the vfio driver, what you are asking for makes sense I suppose. I'll have
>> to think about this one some more, but feel free to respond to this.
> I'm not sure the situation where the queue's ->matrix_mdev pointer is set
> but the apqi is not in the shadow_apcb can actually happen (races not
> considered).

Of course it can, for example:

1. No queues bound to vfio driver

2. APQN 04.0004 assigned to matrix mdev

3. Guest started:
     a. No bits set in shadow_apcb because no queues are bound to vfio

4. queue device 04.0004 is bound to the driver
     a. q->matrix_mdev is set because 04.0004 is assigned to matrix mdev
     b. apqi 0004 is not in shadow_apcb (see 3a.)


> But I'm sure the code is suggesting it can, because
> vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
> which governs whether the apm or the aqm bit should be removed. And
> vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
> assign_domain_store(), and I don't see subsequent unlink operations that would
> sever q->matrix_mdev.

I think you may be conflating two different things. The q in q->matrix_mdev
represents a queue device bound to the driver. The link to matrix_mdev
indicates the APQN of the queue device is assigned to the matrix_mdev.
When a new domain is assigned to matrix_mdev, we know that
all APQNs currently assigned to the shadow_apcb are bound to the vfio
driver because of previous filtering, so we are only concerned with those
APQNs with the APQI of the new domain being assigned.

1. Queues bound to vfio_ap:
     04.0004
     04.0047
2. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
3. shadow_apcb:
     04.0004
     04.0047
4. Assign domain 0054 to matrix_mdev
5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
6. no change to shadow_apcb:
     04.0004
     04.0047

Or:

1. Queues bound to vfio_ap:
     04.0004
     04.0047
     04.0054
2. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
3. shadow_apcb:
     04.0004
     04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev
     04.0004
     04.0047
     04.0054
6. APQI 0054 does not get filtered because 04.0054 is bound to vfio_ap
7. shadow_apcb after filtering:
     04.0004
     04.0047
     04.0054

I'm not sure why you are bringing up unlinking in the context of assigning
a new domain. Unlinking only occurs when an APID or APQI is unassigned.

>
> Another case where the aqm may get filtered in
> vfio_ap_mdev_filter_guest_matrix() is when the info->aqm bit is not set, as
> I've mentioned in my previous mail. If that cannot happen, we should turn
> that into an assert.

In an earlier email of yours, you brought up the scenario whereby
a queue is probed not because of a change in the QCI info,
but because an unbound queue is bound; for instance manually.
I made a change to account for that so consider the following
scenario:

1. APQI 0004 removed from info->aqm
2. AP bus notifies vfio_ap that AP configuration has changed
3. vfio_ap removes APQI 0004 from shadow_apcb
4. Userspace binds queue 04.0004 to vfio_ap
5. Filtering code filters 0004 because it has been removed
     from info->aqm
6. AP bus notifies vfio_ap scan is over

>
> Actually, if you are convinced that the apqi bit is always set in
> q->matrix_mdev->shadow_apcb.aqm, I would agree to turning that into an
> assertion instead of a condition. Then, if I'm not completely convinced, I
> could at least try to trigger the assert :).
>
> Regards,
> Halil


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-11-04 21:20         ` Tony Krowiak
@ 2020-11-05 12:27           ` Halil Pasic
  2020-11-13 20:36             ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-11-05 12:27 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Wed, 4 Nov 2020 16:20:26 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> > But I'm sure the code is suggesting it can, because
> > vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
> > which governs whether the apm or the aqm bit should be removed. And
> > vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
> > assign_domain_store(), and I don't see subsequent unlink operations that would
> > sever q->matrix_mdev.
> 
> I think you may be conflating two different things. The q in q->matrix_mdev
> represents a queue device bound to the driver. The link to matrix_mdev
> indicates the APQN of the queue device is assigned to the matrix_mdev.
> When a new domain is assigned to matrix_mdev, we know that
> all APQNs currently assigned to the shadow_apcb are bound to the vfio
> driver because of previous filtering, so we are only concerned with those
> APQNs with the APQI of the new domain being assigned.
> 
> 1. Queues bound to vfio_ap:
>      04.0004
>      04.0047
> 2. APQNs assigned to matrix_mdev:
>      04.0004
>      04.0047
> 3. shadow_apcb:
>      04.0004
>      04.0047
> 4. Assign domain 0054 to matrix_mdev
> 5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
> 6. no change to shadow_apcb:
>      04.0004
>      04.0047

Let me please expand on your example. For reference see the filtering
code after the example.

1. Queues bound to vfio_ap:
     04.0004
     04.0047
     05.0004
     05.0047
     05.0054
2. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
3. shadow_apcb:
     04.0004
     04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0054
5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
6. no change to shadow_apcb:
     04.0004
     04.0047
7. assign adapter 05
8. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0054 
     05.0004
     05.0047
     05.0054
9. shadow_apcb changes to:
     05.0004
     05.0047
     05.0054
because now vfio_ap_mdev_filter_guest_matrix() is called with filter_apid=true
10. assign domain 0053
11. APQNs assigned to matrix_mdev:
     04.0004
     04.0047
     04.0053     
     04.0054 
     05.0004
     05.0047
     05.0053
     05.0054
11. shadow_apcb changes to 
     04.0004
     04.0047
     05.0004
     05.0047
     because now filter_guest_matrix() is called with filter_apid=false
     and apqis 0053 and 0054 get filtered
12. 05.0054 gets removed (unbound)
13. with your current code we unplug adapter 05 from shadow_apcb
    despite the fact that 05.0054 was not in the shadow_apcb in
    the first place
14. unassign adapter 05
15. unassign domain 0053
16. APQNs assigned to matrix_mdev:     
     04.0004
     04.0047
     04.0054
17. shadow apcb is
    04.0004
    04.0047
16. assign adapter 05
15. APQNs assigned to matrix_mdev:     
     04.0004
     04.0047
     04.0054
     05.0004
     05.0047     
     05.0054
16. shadow_apcb changes to 
     <empty>
     because now filter_guest_matrix() is called with filter_apid=true
     and apid 04 gets filtered because queue 04.0054 is not bound
     and apid 05 gets filtered because queue 05.0054 is no longer bound

static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev, 
                                            bool filter_apid)                   
{                                                                               
        struct ap_matrix shadow_apcb;                                           
        unsigned long apid, apqi, apqn;                                         
                                                                                
        memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));   
                                                                                
        for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {       
                /*                                                              
                 * If the APID is not assigned to the host AP configuration,    
                 * we can not assign it to the guest's AP configuration         
                 */                                                             
                if (!test_bit_inv(apid, (unsigned long *)                       
                                  matrix_dev->config_info.apm)) {               
                        clear_bit_inv(apid, shadow_apcb.apm);                   
                        continue;                                               
                }                                                               
                                                                                
                for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,             
                                     AP_DOMAINS) {                              
                        /*                                                      
                         * If the APQI is not assigned to the host AP           
                         * configuration, then it can not be assigned to the    
                         * guest's AP configuration                             
                         */                                                     
                        if (!test_bit_inv(apqi, (unsigned long *)               
                                          matrix_dev->config_info.aqm)) {       
                                clear_bit_inv(apqi, shadow_apcb.aqm);           
                                continue;                                       
                        }                                                       
                                                                                
                        /*                                                      
                         * If the APQN is not bound to the vfio_ap device       
                         * driver, then we can't assign it to the guest's       
                         * AP configuration. The AP architecture won't          
                         * allow filtering of a single APQN, so let's filter    
                         * the APID.                                            
                         */                                                     
                        apqn = AP_MKQID(apid, apqi);                            
                                                                                
                        if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {       
                                if (filter_apid) {                              
                                        clear_bit_inv(apid, shadow_apcb.apm);   
                                        break;                                  
                                }                                               
                                                                                
                                clear_bit_inv(apqi, shadow_apcb.aqm);           
                                continue;                                       
                        }                                                       
                }
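The walk-through can be replayed against a small userspace model of the loop above. This is a sketch under stated assumptions, not the driver: struct model_matrix and model_filter() are hypothetical, plain bool arrays stand in for the APM/AQM bitmaps, every APID and APQI is assumed to be in the host AP configuration (so the config_info checks are omitted), and bound[apid][apqi] stands in for vfio_ap_mdev_get_queue() finding a bound queue.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model sizes, large enough for APIDs 04/05 and APQIs up to 0x5f. */
#define NAPID 8
#define NAPQI 0x60

/* Hypothetical stand-in for struct ap_matrix: one flag per APID / APQI. */
struct model_matrix {
	bool apm[NAPID];
	bool aqm[NAPQI];
};

/*
 * Models only the binding check of vfio_ap_mdev_filter_guest_matrix():
 * start from the assigned matrix and, for every assigned APQN whose queue
 * is not bound, drop either the whole adapter (filter_apid) or the whole
 * domain (!filter_apid).
 */
static void model_filter(const struct model_matrix *assigned,
			 bool bound[NAPID][NAPQI], bool filter_apid,
			 struct model_matrix *shadow)
{
	memcpy(shadow, assigned, sizeof(*shadow));

	for (int apid = 0; apid < NAPID; apid++) {
		if (!assigned->apm[apid])
			continue;
		for (int apqi = 0; apqi < NAPQI; apqi++) {
			if (!assigned->aqm[apqi] || bound[apid][apqi])
				continue;
			if (filter_apid) {
				shadow->apm[apid] = false;
				break;
			}
			shadow->aqm[apqi] = false;
		}
	}
}
```

Feeding the model the assignments from the scenario reproduces the shadow_apcb contents listed there, including the empty result once 05.0054 has been unbound and adapter 05 is re-assigned with filter_apid=true.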

I realize this scenario (to play it through to the end) requires a
manually unbound queue (more precisely, a queue missing neither because
of the host AP config nor because of a[pq]mask), but just one 'hole' suffices.

I'm afraid that I might be bitching around here, because last time it was me
who downplayed the effects of such 'holes'.

Nevertheless, I would like to ask you to verify the scenario I've
sketched, or complain if I've gotten something wrong.

Regarding solutions to the problem. It makes no sense to talk about a
solution, before agreeing on the existence of the problem. Nevertheless
I will write down two sentences, mostly as a reminder to myself, for the
case we do agree on the existence of the problem. The simplest approach
is to always filter by apid. That way we get a quirky adapter unplug
right at step 4, but it won't create the complicated mess we have in
the rest of the points. Another idea is to restrict the overprovisioning
of domains. Basically we would make the step 4 fail because we detected
a 'hole'. But this idea has its own problems, and in some scenarios
it does boil down to the unplug the adapter rule. 

[..]

> 
> I'm not sure why you are bringing up unlinking in the context of assigning
> a new domain. Unlinking only occurs when an APID or APQI is unassigned.

Are you certain? What about vfio_ap_mdev_on_cfg_remove()? I believe it
unplugs from the shadow_apcb, but it does not change the
assignments to the matrix_mdev. We do that so we know in remove that the
queue was already cleaned up, and does not need more cleanup.

Regards,
Halil


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-10-27 13:27   ` Halil Pasic
@ 2020-11-13 17:14     ` Tony Krowiak
  2020-11-13 23:47       ` Halil Pasic
  0 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 17:14 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/27/20 9:27 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:00 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Let's implement the callback to indicate when an APQN
>> is in use by the vfio_ap device driver. The callback is
>> invoked whenever a change to the apmask or aqmask would
>> result in one or more queue devices being removed from the driver. The
>> vfio_ap device driver will indicate a resource is in use
>> if the APQN of any of the queue devices to be removed is assigned to
>> any of the matrix mdevs under the driver's control.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     |  1 +
>>   drivers/s390/crypto/vfio_ap_ops.c     | 78 +++++++++++++++++++--------
>>   drivers/s390/crypto/vfio_ap_private.h |  2 +
>>   3 files changed, 60 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index 73bd073fd5d3..8934471b7944 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
>>   	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>>   	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
>>   	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>> +	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>>   	vfio_ap_drv.ids = ap_queue_ids;
>>   
>>   	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 1357f8f8b7e4..9e9fad560859 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -522,18 +522,40 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>>   	return 0;
>>   }
>>   
>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>> +			 "already assigned to %s"
>> +
>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>> +					 unsigned long *apm,
>> +					 unsigned long *aqm)
>> +{
>> +	unsigned long apid, apqi;
>> +
>> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
>> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>> +			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> Isn't the error log level rather severe for this? For my taste, even a
> warning would be too severe for this.

The user only sees a EADDRINUSE returned from the sysfs interface,
so Conny asked if I could log a message to indicate which APQNs are
in use by which mdev. I can change this to an info message, but it
will be missed if the log level is set higher. Maybe Conny can put in
her two cents here since she asked for this.

>
>> +}
>> +
>>   /**
>>    * vfio_ap_mdev_verify_no_sharing
>>    *
>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>> - * and AP queue indexes comprising the AP matrix are not configured for another
>> + * Verifies that each APQN derived from the cross product of the AP adapter IDs
>> + * and AP queue indexes comprising an AP matrix is not assigned to a
>>    * mediated device. AP queue sharing is not allowed.
>>    *
>> - * @matrix_mdev: the mediated matrix device
>> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
>> + *		 are assigned. If the value is not NULL, then verification will
>> + *		 proceed for all other matrix mediated devices; otherwise, all
>> + *		 matrix mediated devices will be verified.
>> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
>> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>>    *
>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> + * Returns 0 if no APQNs are shared; otherwise, returns -EADDRINUSE if one
>> + * or more APQNs are shared.
>>    */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>> +					  unsigned long *mdev_apm,
>> +					  unsigned long *mdev_aqm)
>>   {
>>   	struct ap_matrix_mdev *lstdev;
>>   	DECLARE_BITMAP(apm, AP_DEVICES);
>> @@ -550,14 +572,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>   		 * We work on full longs, as we can only exclude the leftover
>>   		 * bits in non-inverse order. The leftover is all zeros.
>>   		 */
>> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> -				lstdev->matrix.apm, AP_DEVICES))
>> +		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>>   			continue;
>>   
>> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> -				lstdev->matrix.aqm, AP_DOMAINS))
>> +		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>>   			continue;
>>   
>> +		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
>> +					     apm, aqm);
>> +
>>   		return -EADDRINUSE;
>>   	}
>>   
>> @@ -683,6 +706,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   {
>>   	int ret;
>>   	unsigned long apid;
>> +	DECLARE_BITMAP(apm, AP_DEVICES);
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	if (ret)
>>   		goto done;
>>   
>> -	set_bit_inv(apid, matrix_mdev->matrix.apm);
>> +	memset(apm, 0, sizeof(apm));
>> +	set_bit_inv(apid, apm);
>>   
>> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> +	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
>> +					     matrix_mdev->matrix.aqm);
> What is the benefit of using a copy here? I mean we have the vfio_ap lock
> so nobody can see the bit we speculatively flipped.

The vfio_ap_mdev_verify_no_sharing() function definition was changed
so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
function rather than duplicating that code for the in_use callback. The
in-use callback is invoked by the AP bus which has no concept of
a mediated device, so I made this change to accommodate that fact.
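The reworked check describes the APQNs to verify by a pair of masks instead of by an mdev, so the AP bus can pass its apm/aqm straight through with matrix_mdev == NULL. Because an mdev's APQNs are the cross product of its APIDs and APQIs, two sets of APQNs overlap exactly when both the APID masks and the APQI masks intersect. A minimal userspace sketch, assuming 64-bit masks in ordinary (non-inverted) bit order and a plain array in place of the driver's list of mdevs; struct model_mdev and model_verify_no_sharing() are hypothetical names:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mdev: bit n set means APID/APQI n is assigned. */
struct model_mdev {
	uint64_t apm;
	uint64_t aqm;
};

/*
 * Return -EADDRINUSE if any APQN in apm x aqm is already assigned to an mdev
 * other than self. Pass self == NULL to check every mdev, as an in_use-style
 * callback coming from the AP bus would.
 */
static int model_verify_no_sharing(const struct model_mdev *self,
				   const struct model_mdev *mdevs, size_t n,
				   uint64_t apm, uint64_t aqm)
{
	for (size_t i = 0; i < n; i++) {
		if (&mdevs[i] == self)
			continue;
		/* (A1 x D1) and (A2 x D2) overlap iff A1&A2 and D1&D2 are both non-empty */
		if (!(apm & mdevs[i].apm) || !(aqm & mdevs[i].aqm))
			continue;
		return -EADDRINUSE;
	}
	return 0;
}
```

For example, if one mdev holds 04.0004 and another holds 05.0004, assigning adapter 05 to the first one fails with -EADDRINUSE because 05.0004 would be shared.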

>
> I've also pointed out in the previous patch that in_use() isn't
> perfectly reliable (at least in theory) because of a race.

We discussed that privately and determined that the sysfs assignment
interfaces will use mutex_trylock() to avoid races.
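One plausible shape for "use mutex_trylock() to avoid races" is a store handler that fails fast with -EBUSY when the lock is contended, rather than blocking, so that userspace can retry. The sketch below is only a userspace analog built on POSIX pthread_mutex_trylock(); model_matrix_lock and model_assign_store() are hypothetical, and the series itself may structure this differently. Note the opposite return conventions: the kernel's mutex_trylock() returns 1 when it acquired the lock and 0 on contention, whereas pthread_mutex_trylock() returns 0 on success and EBUSY when the mutex is already locked.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical stand-in for matrix_dev->lock. */
static pthread_mutex_t model_matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical analog of an assign_*_store() handler: if another path
 * (for example the AP bus calling in_use()) holds the lock, return
 * -EBUSY immediately instead of waiting for it.
 */
static ssize_t model_assign_store(size_t count)
{
	if (pthread_mutex_trylock(&model_matrix_lock) != 0)
		return -EBUSY;

	/* ... verify no sharing, update the matrix, link the queues ... */

	pthread_mutex_unlock(&model_matrix_lock);
	return (ssize_t)count;
}
```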

>
> Otherwise looks good to me!
>
>>   	if (ret)
>> -		goto share_err;
>> +		goto done;
>>   
>> +	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>>   	ret = count;
>> -	goto done;
>>   
>> -share_err:
>> -	clear_bit_inv(apid, matrix_mdev->matrix.apm);
>>   done:
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>> @@ -831,6 +855,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>   {
>>   	int ret;
>>   	unsigned long apqi;
>> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>> @@ -851,18 +876,18 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	if (ret)
>>   		goto done;
>>   
>> -	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> +	memset(aqm, 0, sizeof(aqm));
>> +	set_bit_inv(apqi, aqm);
>>   
>> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>> +	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
>> +					     matrix_mdev->matrix.apm, aqm);
>>   	if (ret)
>> -		goto share_err;
>> +		goto done;
>>   
>> +	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>>   	ret = count;
>> -	goto done;
>>   
>> -share_err:
>> -	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   done:
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>> @@ -1442,3 +1467,14 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>   	kfree(q);
>>   	mutex_unlock(&matrix_dev->lock);
>>   }
>> +
>> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>> +{
>> +	bool in_use;
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
>> +	mutex_unlock(&matrix_dev->lock);
>> +
>> +	return in_use;
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 4e5cc72fc0db..c1d8b5507610 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -105,4 +105,6 @@ struct vfio_ap_queue {
>>   int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>>   void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>>   
>> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>> +
>>   #endif /* _VFIO_AP_PRIVATE_H_ */


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB
  2020-10-28  8:11   ` Halil Pasic
@ 2020-11-13 17:18     ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 17:18 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/28/20 4:11 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:01 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The APCB is a field within the CRYCB that provides the AP configuration
>> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
>> maintain it for the lifespan of the guest.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 24 +++++++++++++++++++-----
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 9e9fad560859..9791761aa7fd 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -320,6 +320,19 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>>   	matrix->adm_max = info->apxa ? info->Nd : 15;
>>   }
>>   
>> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
>> +}
>> +
>> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>> +				  matrix_mdev->shadow_apcb.apm,
>> +				  matrix_mdev->shadow_apcb.aqm,
>> +				  matrix_mdev->shadow_apcb.adm);
>> +}
>> +
>>   static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   {
>>   	struct ap_matrix_mdev *matrix_mdev;
>> @@ -335,6 +348,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   
>>   	matrix_mdev->mdev = mdev;
>>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>>   	hash_init(matrix_mdev->qtable);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>>   	matrix_mdev->pqap_hook.hook = handle_pqap;
>> @@ -1213,13 +1227,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	if (ret)
>>   		return NOTIFY_DONE;
>>   
>> -	/* If there is no CRYCB pointer, then we can't copy the masks */
>> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>   		return NOTIFY_DONE;
>>   
>> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>> -				  matrix_mdev->matrix.aqm,
>> -				  matrix_mdev->matrix.adm);
>> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> +	       sizeof(matrix_mdev->shadow_apcb));
>> +	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   
>>   	return NOTIFY_OK;
>>   }
>> @@ -1329,6 +1342,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>   		kvm_put_kvm(matrix_mdev->kvm);
>>   		matrix_mdev->kvm = NULL;
>>   	}
>> +
> Unrelated change.
>
> Otherwise patch looks OK.
>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

I'll fix it. Thanks for your review.

>
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index c1d8b5507610..fc8634cee485 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -75,6 +75,7 @@ struct ap_matrix {
>>    * @list:	allows the ap_matrix_mdev struct to be added to a list
>>    * @matrix:	the adapters, usage domains and control domains assigned to the
>>    *		mediated matrix device.
>> + * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
>>    * @group_notifier: notifier block used for specifying callback function for
>>    *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
>>    * @kvm:	the struct holding guest's state
>> @@ -82,6 +83,7 @@ struct ap_matrix {
>>   struct ap_matrix_mdev {
>>   	struct list_head node;
>>   	struct ap_matrix matrix;
>> +	struct ap_matrix shadow_apcb;
>>   	struct notifier_block group_notifier;
>>   	struct notifier_block iommu_notifier;
>>   	struct kvm *kvm;
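This patch keeps the mdev's assignments and the guest-visible APCB as separate copies. The shadow is pushed to the guest only when it is committed. The plain C sketch below illustrates that pattern. Everything in it is hypothetical: sketch_guest stands in for the KVM guest and its CRYCB, and sketch_commit_shadow_apcb() stands in for kvm_arch_crypto_set_masks(). Only the control flow mirrors the group notifier hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ap_matrix. */
struct sketch_matrix {
	uint64_t apm[4], aqm[4], adm[4];
};

/* Hypothetical stand-in for a KVM guest with (or without) a CRYCB. */
struct sketch_guest {
	bool has_crycb;             /* plays the role of kvm->arch.crypto.crycbd */
	struct sketch_matrix apcb;  /* masks the guest currently sees */
	unsigned int commits;       /* number of times masks were pushed */
};

struct sketch_mdev {
	struct sketch_matrix matrix;      /* assignments made via sysfs */
	struct sketch_matrix shadow_apcb; /* guest-visible copy */
	struct sketch_guest *kvm;         /* NULL until a guest is attached */
};

static bool sketch_has_crycb(const struct sketch_mdev *m)
{
	return m->kvm && m->kvm->has_crycb;
}

/* Stand-in for kvm_arch_crypto_set_masks(): push the shadow to the guest. */
static void sketch_commit_shadow_apcb(struct sketch_mdev *m)
{
	assert(sketch_has_crycb(m));
	m->kvm->apcb = m->shadow_apcb;
	m->kvm->commits++;
}

/* Mirrors the group notifier: attach, copy matrix into shadow, commit. */
static bool sketch_set_kvm(struct sketch_mdev *m, struct sketch_guest *kvm)
{
	m->kvm = kvm;
	if (!sketch_has_crycb(m))
		return false;
	memcpy(&m->shadow_apcb, &m->matrix, sizeof(m->shadow_apcb));
	sketch_commit_shadow_apcb(m);
	return true;
}
```

Keeping the two copies apart is what lets hot plug and unplug (patch 08 in this series) change what the guest sees without altering the mdev's assignments.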



* Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-10-28  8:17   ` Halil Pasic
@ 2020-11-13 17:27     ` Tony Krowiak
  2020-11-13 23:12       ` Halil Pasic
  0 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 17:27 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 10/28/20 4:17 AM, Halil Pasic wrote:
> On Thu, 22 Oct 2020 13:12:02 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> +static ssize_t guest_matrix_show(struct device *dev,
>> +				 struct device_attribute *attr, char *buf)
>> +{
>> +	ssize_t nchars;
>> +	struct mdev_device *mdev = mdev_from_dev(dev);
>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		return -ENODEV;
> I'm wondering, would it make sense to have guest_matrix display the
> would-be guest matrix when we don't have a KVM? With the filtering in
> place, the question of which guest_matrix my (assigned) matrix would
> result in right now, if I were to hook up my vfio_ap_mdev to a guest,
> seems a legitimate one.

A couple of thoughts here:
* The ENODEV informs the user that there is no guest running
    which makes sense to me given this interface displays the
    guest matrix. The alternative, which I considered, was to
    display an empty matrix (i.e., nothing).
* This would be a pretty drastic change to the design because
    the shadow_apcb - which is what is displayed via this interface - is
    only updated when the guest is started and while it is running (i.e.,
    hot plug of new adapters/domains). Making this change would
    require changing that entire design concept which I am reluctant
    to do at this point in the game.


>
>
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
>> +	mutex_unlock(&matrix_dev->lock);
>> +
>> +	return nchars;
>> +}
>> +static DEVICE_ATTR_RO(guest_matrix);
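The following is a minimal sketch of what a show function like guest_matrix_show() produces from the shadow APCB masks. It assumes one "APID.APQI" line per APQN in the Cartesian product, in the two-hex-digit/four-hex-digit form used in the scenarios later in this thread (for example 04.0047). The function name and the plain LSB-first bit layout are illustrative assumptions. vfio_ap_mdev_matrix_show() itself is not shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_IDS 256

static bool sketch_test_id(const uint64_t *mask, unsigned int id)
{
	assert(id < SKETCH_IDS);
	return (mask[id / 64] >> (id % 64)) & 1;
}

/*
 * Hypothetical guest_matrix formatter: returns -ENODEV when no guest is
 * attached (as the patch does), -ENOSPC if buf is too small, otherwise the
 * number of characters written.
 */
static long sketch_guest_matrix_show(bool has_guest, const uint64_t *apm,
				     const uint64_t *aqm, char *buf, size_t len)
{
	size_t n = 0;

	if (!has_guest)
		return -ENODEV;
	assert(len > 0);
	memset(buf, 0, len);
	for (unsigned int apid = 0; apid < SKETCH_IDS; apid++) {
		if (!sketch_test_id(apm, apid))
			continue;
		for (unsigned int apqi = 0; apqi < SKETCH_IDS; apqi++) {
			int w;

			if (!sketch_test_id(aqm, apqi))
				continue;
			w = snprintf(buf + n, len - n, "%02x.%04x\n", apid, apqi);
			if (w < 0 || (size_t)w >= len - n)
				return -ENOSPC;
			n += (size_t)w;
		}
	}
	return (long)n;
}
```

With adapter 04 and domains 0004 and 0047 in the shadow, this writes the two lines "04.0004" and "04.0047".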



* Re: [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device
  2020-11-05 12:27           ` Halil Pasic
@ 2020-11-13 20:36             ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 20:36 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 11/5/20 7:27 AM, Halil Pasic wrote:
> On Wed, 4 Nov 2020 16:20:26 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>> But I'm sure the code is suggesting it can, because
>>> vfio_ap_mdev_filter_guest_matrix() has a third parameter called filter_apid,
>>> which governs whether the apm or the aqm bit should be removed. And
>>> vfio_ap_mdev_filter_guest_matrix() does get called with filter_apid=false in
>>> assign_domain_store() and I don't see subsequent unlink operations that would
>>> sever q->matrix_mdev.
>> I think you may be conflating two different things. The q in q->matrix_mdev
>> represents a queue device bound to the driver. The link to matrix_mdev
>> indicates the APQN of the queue device is assigned to the matrix_mdev.
>> When a new domain is assigned to matrix_mdev, we know that
>> all APQNs currently assigned to the shadow_apcb are bound to the vfio_ap
>> driver because of previous filtering, so we are only concerned with the
>> APQNs containing the APQI of the new domain being assigned.
>>
>> 1. Queues bound to vfio_ap:
>>       04.0004
>>       04.0047
>> 2. APQNs assigned to matrix_mdev:
>>       04.0004
>>       04.0047
>> 3. shadow_apcb:
>>       04.0004
>>       04.0047
>> 4. Assign domain 0054 to matrix_mdev
>> 5. APQI 0054 gets filtered because 04.0054 not bound to vfio_ap
>> 6. no change to shadow_apcb:
>>       04.0004
>>       04.0047
> Let me please expand on your example. For reference see the filtering
> code after the example.

Since our face-to-face discussion, I've made changes which
affect the scenario you laid out. The following shows the difference
in results using your scenario. Let me know what you think.

1. Queues bound to vfio_ap:
      04.0004
      04.0047
      05.0004
      05.0047
      05.0054
2. APQNs assigned to matrix_mdev:
      04.0004
      04.0047
3. shadow_apcb:
      04.0004
      04.0047
4. Assign domain 0054 to matrix_mdev
5. APQNs assigned to matrix_mdev:
      04.0004
      04.0047
      04.0054
6. APQI 0054 gets filtered because 04.0054 is not bound to vfio_ap
7. No change to shadow_apcb:
      04.0004
      04.0047
8. Assign adapter 05
9. APQNs assigned to matrix_mdev:
      04.0004
      04.0047
      04.0054
      05.0004
      05.0047
      05.0054
10. shadow_apcb changes to:
      04.0004
      04.0047
      05.0004
      05.0047
      because adapter 05 is checked against the APQIs in the
      shadow_apcb (0004, 0047), and since 05.0004 and
      05.0047 are bound to the driver, adapter 05 is
      hot plugged.
11. Assign domain 0053
12. APQNs assigned to matrix_mdev:
      04.0004
      04.0047
      04.0053
      04.0054
      05.0004
      05.0047
      05.0053
      05.0054
13. shadow_apcb remains:
      04.0004
      04.0047
      05.0004
      05.0047
      because domain 0053 is checked against the adapters assigned to
      shadow_apcb and rejected, since neither 04.0053 nor 05.0053
      is bound to the vfio_ap driver.
14. 05.0054 gets removed (unbound)
15. Nothing is removed because 05.0054 is not assigned to shadow_apcb
16. Unassign adapter 05
17. Unassign domain 0053
18. APQNs assigned to matrix_mdev:
      04.0004
      04.0047
      04.0054
19. shadow_apcb is:
     04.0004
     04.0047
20. Assign adapter 05
21. APQNs assigned to matrix_mdev:
      04.0004
      04.0047
      04.0054
      05.0004
      05.0047
      05.0054
22. shadow_apcb changes to:
     04.0004
     04.0047
     05.0004
     05.0047
     because adapter 05 is checked against the APQIs (0004, 0047)
     in shadow_apcb, and since queues 05.0004 and 05.0047
     are bound to vfio_ap, the adapter is hot plugged
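The hot-plug decisions in the scenario above can be modelled in a few lines: adapter 05 is accepted, and domain 0054 is rejected. In the sketch below, an adapter is plugged only if every APQN formed from its APID and the APQIs already in the shadow APCB is bound, and a domain is plugged under the mirror-image rule. This is a sketch under stated assumptions, not the driver's code. sketch_bound, sketch_adapter_plug_ok() and sketch_domain_plug_ok() are hypothetical names. The check against the host AP configuration is omitted. How the driver treats an empty shadow (where this check passes vacuously) is not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IDS 256

/* One bit per APQN (APID << 8 | APQI); a set bit means "bound to vfio_ap". */
struct sketch_bound {
	uint64_t bits[SKETCH_IDS * SKETCH_IDS / 64];
};

static unsigned int sketch_qid(unsigned int apid, unsigned int apqi)
{
	assert(apid < SKETCH_IDS && apqi < SKETCH_IDS);
	return (apid << 8) | apqi;
}

static void sketch_bind(struct sketch_bound *b, unsigned int apid,
			unsigned int apqi)
{
	unsigned int q = sketch_qid(apid, apqi);

	b->bits[q / 64] |= UINT64_C(1) << (q % 64);
}

static bool sketch_is_bound(const struct sketch_bound *b, unsigned int apid,
			    unsigned int apqi)
{
	unsigned int q = sketch_qid(apid, apqi);

	return (b->bits[q / 64] >> (q % 64)) & 1;
}

/* Adapter assignment: plug only if APID x (shadow APQIs) is fully bound. */
static bool sketch_adapter_plug_ok(const struct sketch_bound *b,
				   unsigned int apid, const bool *shadow_aqm)
{
	for (unsigned int apqi = 0; apqi < SKETCH_IDS; apqi++)
		if (shadow_aqm[apqi] && !sketch_is_bound(b, apid, apqi))
			return false;
	return true;	/* also true, vacuously, if the shadow has no APQIs */
}

/* Domain assignment: the same check with APID and APQI swapped. */
static bool sketch_domain_plug_ok(const struct sketch_bound *b,
				  unsigned int apqi, const bool *shadow_apm)
{
	for (unsigned int apid = 0; apid < SKETCH_IDS; apid++)
		if (shadow_apm[apid] && !sketch_is_bound(b, apid, apqi))
			return false;
	return true;
}
```

Because only the shadow's APIDs or APQIs are consulted, an unbound APQN that never made it into the shadow (such as 04.0054) cannot veto a later adapter assignment.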

>
> 1. Queues bound to vfio_ap:
>       04.0004
>       04.0047
>       05.0004
>       05.0047
>       05.0054
> 2. APQNs assigned to matrix_mdev:
>       04.0004
>       04.0047
> 3. shadow_apcb:
>       04.0004
>       04.0047
> 4. Assign domain 0054 to matrix_mdev
> 5. APQNs assigned to matrix_mdev:
>       04.0004
>       04.0047
>       04.0054
> 6. APQI 0054 gets filtered because 04.0054 is not bound to vfio_ap
> 7. No change to shadow_apcb:
>       04.0004
>       04.0047
> 8. Assign adapter 05
> 9. APQNs assigned to matrix_mdev:
>       04.0004
>       04.0047
>       04.0054
>       05.0004
>       05.0047
>       05.0054
> 10. shadow_apcb changes to:
>       05.0004
>       05.0047
>       05.0054
>       because now vfio_ap_mdev_filter_guest_matrix() is called with
>       filter_apid=true
> 11. Assign domain 0053
> 12. APQNs assigned to matrix_mdev:
>       04.0004
>       04.0047
>       04.0053
>       04.0054
>       05.0004
>       05.0047
>       05.0053
>       05.0054
> 13. shadow_apcb changes to:
>       04.0004
>       04.0047
>       05.0004
>       05.0047
>       because now filter_guest_matrix() is called with filter_apid=false
>       and APQIs 0053 and 0054 get filtered
> 14. 05.0054 gets removed (unbound)
> 15. With your current code, we unplug adapter 05 from shadow_apcb
>      despite the fact that 05.0054 was not in the shadow_apcb in
>      the first place
> 16. Unassign adapter 05
> 17. Unassign domain 0053
> 18. APQNs assigned to matrix_mdev:
>       04.0004
>       04.0047
>       04.0054
> 19. shadow_apcb is:
>      04.0004
>      04.0047
> 20. Assign adapter 05
> 21. APQNs assigned to matrix_mdev:
>       04.0004
>       04.0047
>       04.0054
>       05.0004
>       05.0047
>       05.0054
> 22. shadow_apcb changes to:
>       <empty>
>       because now filter_guest_matrix() is called with filter_apid=true:
>       APID 04 gets filtered because queue 04.0054 is not bound,
>       and APID 05 gets filtered because queue 05.0054 is not bound
>
> static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
>                                              bool filter_apid)
> {
>          struct ap_matrix shadow_apcb;
>          unsigned long apid, apqi, apqn;
>                                                                                  
>          memcpy(&shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
>                                                                                  
>          for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>                  /*
>                   * If the APID is not assigned to the host AP configuration,
>                   * we can not assign it to the guest's AP configuration
>                   */
>                  if (!test_bit_inv(apid, (unsigned long *)
>                                    matrix_dev->config_info.apm)) {
>                          clear_bit_inv(apid, shadow_apcb.apm);
>                          continue;
>                  }
>                                                                                  
>                  for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>                                       AP_DOMAINS) {
>                          /*
>                           * If the APQI is not assigned to the host AP
>                           * configuration, then it can not be assigned to the
>                           * guest's AP configuration
>                           */
>                          if (!test_bit_inv(apqi, (unsigned long *)
>                                            matrix_dev->config_info.aqm)) {
>                                  clear_bit_inv(apqi, shadow_apcb.aqm);
>                                  continue;
>                          }
>                                                                                  
>                          /*
>                           * If the APQN is not bound to the vfio_ap device
>                           * driver, then we can't assign it to the guest's
>                           * AP configuration. The AP architecture won't
>                           * allow filtering of a single APQN, so let's filter
>                           * the APID.
>                           */
>                          apqn = AP_MKQID(apid, apqi);
>                                                                                  
>                          if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
>                                  if (filter_apid) {
>                                          clear_bit_inv(apid, shadow_apcb.apm);
>                                          break;
>                                  }
>                                                                                  
>                                  clear_bit_inv(apqi, shadow_apcb.aqm);
>                                  continue;
>                          }
>                  }
>
> I realize this scenario (to play through to the end) requires a
> manually unbound queue (more precisely, a queue that is missing for a
> reason other than the host AP config or the a[pq]mask), but just one
> 'hole' suffices.
>
> I'm afraid that I might be nitpicking here, because last time it was me
> who downplayed the effects of such 'holes'.
>
> Nevertheless, I would like to ask you to verify the scenario I've
> sketched, or complain if I've gotten something wrong.

Your scenario looks correct and convinced me to change
the filtering logic giving the results I pointed out above.
It was a good catch on your part, so I thank you for
the review.

>
> Regarding solutions to the problem: it makes no sense to talk about a
> solution before agreeing on the existence of the problem. Nevertheless,
> I will write down two sentences, mostly as a reminder to myself, in
> case we do agree on the existence of the problem. The simplest approach
> is to always filter by APID. That way we get a quirky adapter unplug
> right at step 4, but it won't create the complicated mess we have in
> the rest of the steps. Another idea is to restrict the overprovisioning
> of domains. Basically, we would make step 4 fail because we detected
> a 'hole'. But this idea has its own problems, and in some scenarios
> it does boil down to the "unplug the adapter" rule.

I made the following changes that I believe rectify this problem:
1. On guest startup, the shadow_apcb is initialized by filtering the
     mdev's matrix by APID (i.e., if an APQN derived from a particular
     APID and the APQIs assigned to the mdev's matrix does not
     reference a queue device bound to vfio_ap, that APID is not
     assigned to the shadow_apcb).
2. On adapter assignment, the adapter will be hot plugged only if
     each APQN derived from the APID being assigned and the APQIs
     assigned to the shadow_apcb references a queue bound to vfio_ap.
3. On adapter unassignment, if the APID is set in the shadow_apcb,
     the adapter will be hot unplugged.
4. On domain assignment, the domain will be hot plugged only if
     each APQN derived from the APQI being assigned and the APIDs
     assigned to the shadow_apcb references a queue bound to vfio_ap.
5. On domain unassignment, if the APQI is set in the shadow_apcb,
     the domain will be hot unplugged.
6. On probe:
     a. For the queue's APID, the same logic as #2 will be used.
     b. For the queue's APQI, the same logic as #4 will be used.
7. On remove, if the APQN of the queue being unbound is assigned
     to the shadow_apcb, the adapter will be hot unplugged.
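Rules 1 and 7 above can be sketched in a small userspace model. The names sketch_bound, sketch_init_shadow() and sketch_on_remove() are hypothetical. Masks are plain bool arrays indexed by ID. The driver's additional check against the host AP configuration (config_info) is deliberately left out, so this illustrates the rules as stated rather than the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IDS 256

/* One bit per APQN (APID << 8 | APQI); a set bit means "bound to vfio_ap". */
struct sketch_bound {
	uint64_t bits[SKETCH_IDS * SKETCH_IDS / 64];
};

static void sketch_bind(struct sketch_bound *b, unsigned int apid,
			unsigned int apqi)
{
	unsigned int q;

	assert(apid < SKETCH_IDS && apqi < SKETCH_IDS);
	q = (apid << 8) | apqi;
	b->bits[q / 64] |= UINT64_C(1) << (q % 64);
}

static bool sketch_is_bound(const struct sketch_bound *b, unsigned int apid,
			    unsigned int apqi)
{
	unsigned int q;

	assert(apid < SKETCH_IDS && apqi < SKETCH_IDS);
	q = (apid << 8) | apqi;
	return (b->bits[q / 64] >> (q % 64)) & 1;
}

/*
 * Rule 1 (guest startup): copy the mdev's masks into the shadow, then drop
 * every APID that has at least one unbound APQN with an assigned APQI.
 */
static void sketch_init_shadow(const struct sketch_bound *b,
			       const bool *apm, const bool *aqm,
			       bool *shadow_apm, bool *shadow_aqm)
{
	for (unsigned int id = 0; id < SKETCH_IDS; id++) {
		shadow_apm[id] = apm[id];
		shadow_aqm[id] = aqm[id];
	}
	for (unsigned int apid = 0; apid < SKETCH_IDS; apid++) {
		if (!apm[apid])
			continue;
		for (unsigned int apqi = 0; apqi < SKETCH_IDS; apqi++) {
			if (aqm[apqi] && !sketch_is_bound(b, apid, apqi)) {
				shadow_apm[apid] = false;
				break;
			}
		}
	}
}

/*
 * Rule 7 (queue removed): unplug the adapter only if the removed APQN is
 * actually part of the shadow. Returns true if an unplug happened.
 */
static bool sketch_on_remove(bool *shadow_apm, const bool *shadow_aqm,
			     unsigned int apid, unsigned int apqi)
{
	assert(apid < SKETCH_IDS && apqi < SKETCH_IDS);
	if (!shadow_apm[apid] || !shadow_aqm[apqi])
		return false;
	shadow_apm[apid] = false;
	return true;
}
```

Under rule 7, unbinding an APQN whose APQI was never plugged (05.0054 in the scenario) leaves the shadow untouched. That is the behaviour Halil's example asked for.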

>   
>
> [..]
>
>> I'm not sure why you are bringing up unlinking in the context of assigning
>> a new domain. Unlinking only occurs when an APID or APQI is unassigned.
> Are you certain? What about vfio_ap_mdev_on_cfg_remove()? I believe it
> unplugs from the shadow_apcb, but it does not change the
> assignments to the matrix_mdev. We do that so we know in remove that the
> queue was already cleaned up, and does not need more cleanup.

What you say is true; however, that is not related to my comment.
I asked why you were bringing up unlinking in the context of
assigning a new domain. The point you just made above has to
do with unassigning adapters/domains.

>
> Regards,
> Halil
>



* Re: [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-10-27 17:28   ` Harald Freudenberger
@ 2020-11-13 20:58     ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 20:58 UTC (permalink / raw)
  To: Harald Freudenberger, linux-s390, linux-kernel, kvm
  Cc: borntraeger, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, hca, gor



On 10/27/20 1:28 PM, Harald Freudenberger wrote:
> On 22.10.20 19:12, Tony Krowiak wrote:
>> This patch introduces an extension to the AP bus to notify device drivers
>> when the host AP configuration changes - i.e., adapters, domains or
>> control domains are added or removed. To that end, two new callbacks are
>> introduced for AP device drivers:
>>
>>    void (*on_config_changed)(struct ap_config_info *new_config_info,
>>                              struct ap_config_info *old_config_info);
>>
>>       This callback is invoked at the start of the AP bus scan
>>       function when it determines that the host AP configuration information
>>       has changed since the previous scan. This is done by storing
>>       an old and current QCI info struct and comparing them. If there is any
>>       difference, the callback is invoked.
>>
>>       Note that when the AP bus scan detects that AP adapters, domains or
>>       control domains have been removed from the host's AP configuration, it
>>       will remove the associated devices from the AP bus subsystem's device
>>       model. This callback gives the device driver a chance to respond to
>>       the removal of the AP devices from the host configuration prior to
>>       calling the device driver's remove callback. The primary purpose of
>>       this callback is to allow the vfio_ap driver to do a bulk unplug of
>>       all affected adapters, domains and control domains from affected
>>       guests rather than unplugging them one at a time when the remove
>>       callback is invoked.
>>
>>    void (*on_scan_complete)(struct ap_config_info *new_config_info,
>>                             struct ap_config_info *old_config_info);
>>
>>       The on_scan_complete callback is invoked after the ap bus scan is
>>       complete if the host AP configuration data has changed.
>>
>>       Note that when the AP bus scan detects that adapters, domains or
>>       control domains have been added to the host's configuration, it will
>>       create new devices in the AP bus subsystem's device model. The primary
>>       purpose of this callback is to allow the vfio_ap driver to do a bulk
>>       plug of all affected adapters, domains and control domains into
>>       affected guests rather than plugging them one at a time when the
>>       probe callback is invoked.
>>
>> Please note that changes to the apmask and aqmask do not trigger
>> these two callbacks since the bus scan function is not invoked by changes
>> to those masks.
>>
>> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
> Did I really sign off on this? I know I saw this code, but ...

Good question, but I would not have introduced this myself. It's been
so long since this patch was created that I don't recall all of the details,
but I vaguely remember maybe getting an early version of this code
from you, although I could be wrong. I recognize the last comment in
the description as being yours. I will remove the Signed-off-by if you
prefer.

> First of all, please separate the ap bus changes from the vfio_ap driver changes.
> This makes backports and code change history much easier.

The problem is if I remove the vfio_ap driver changes, then this patch
will not build. I've been told in the past that this is a no-no.

>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/ap_bus.c          | 88 ++++++++++++++++++++++++++-
>>   drivers/s390/crypto/ap_bus.h          | 12 ++++
>>   drivers/s390/crypto/vfio_ap_drv.c     |  2 +-
>>   drivers/s390/crypto/vfio_ap_ops.c     | 11 ++--
>>   drivers/s390/crypto/vfio_ap_private.h |  2 +-
>>   5 files changed, 106 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 998e61cd86d9..5b94956ef6bc 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -73,8 +73,10 @@ struct ap_perms ap_perms;
>>   EXPORT_SYMBOL(ap_perms);
>>   DEFINE_MUTEX(ap_perms_mutex);
>>   EXPORT_SYMBOL(ap_perms_mutex);
>> +DEFINE_MUTEX(ap_config_lock);
> This mutex is unnecessary, but see details below.
>>   
>>   static struct ap_config_info *ap_qci_info;
>> +static struct ap_config_info *ap_qci_info_old;
>>   
>>   /*
>>    * AP bus related debug feature things.
>> @@ -1420,6 +1422,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
>>   		&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
>>   }
>>   
>> +/* Helper function for notify_config_changed */
>> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
>> +{
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +
>> +	if (try_module_get(drv->owner)) {
>> +		if (ap_drv->on_config_changed)
>> +			ap_drv->on_config_changed(ap_qci_info,
>> +						  ap_qci_info_old);
>> +		module_put(drv->owner);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/* Notify all drivers about an qci config change */
>> +static inline void notify_config_changed(void)
>> +{
>> +	bus_for_each_drv(&ap_bus_type, NULL, NULL,
>> +			 __drv_notify_config_changed);
>> +}
>> +
>> +/* Helper function for notify_scan_complete */
>> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
>> +{
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +
>> +	if (try_module_get(drv->owner)) {
>> +		if (ap_drv->on_scan_complete)
>> +			ap_drv->on_scan_complete(ap_qci_info,
>> +						 ap_qci_info_old);
>> +		module_put(drv->owner);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/* Notify all drivers about bus scan complete */
>> +static inline void notify_scan_complete(void)
>> +{
>> +	bus_for_each_drv(&ap_bus_type, NULL, NULL,
>> +			 __drv_notify_scan_complete);
>> +}
>> +
>> +
>> +
>>   /*
>>    * Helper function for ap_scan_bus().
>>    * Remove card device and associated queue devices.
>> @@ -1696,15 +1744,45 @@ static inline void ap_scan_adapter(int ap)
>>   	put_device(&ac->ap_dev.device);
>>   }
>>   
>> +static int ap_config_changed(void)
> I don't like the name here. This function effectively fetches the qci info
> and then compares the new qci info with the previous one. So it is really a
> new ap_get_configuration() which returns true (config changed) or
> false (old and current config are the very same).

Okay, so I think what you are saying there is you prefer the name
ap_get_configuration()?
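Following Harald's suggestion, a function that both fetches and compares could look roughly like the sketch below. The name ap_get_configuration() comes from the review. sketch_qci, sketch_hw and sketch_fetch_qci() are hypothetical userspace stand-ins for struct ap_config_info, the PQAP(QCI) result and ap_fetch_qci_info(). This is not the AP bus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ap_config_info: just the three masks. */
struct sketch_qci {
	uint8_t apm[32];
	uint8_t aqm[32];
	uint8_t adm[32];
};

/* Simulated firmware state, standing in for what PQAP(QCI) would return. */
static struct sketch_qci sketch_hw;

/* Hypothetical stand-in for ap_fetch_qci_info(). */
static void sketch_fetch_qci(struct sketch_qci *out)
{
	*out = sketch_hw;
}

/*
 * Hypothetical ap_get_configuration(): save the current info as the old
 * info, fetch fresh info and report whether anything changed. Both buffers
 * are owned by the caller, so there is no allocation that can fail here.
 */
static bool sketch_get_configuration(struct sketch_qci *cur,
				     struct sketch_qci *old)
{
	assert(cur && old);
	memcpy(old, cur, sizeof(*old));
	sketch_fetch_qci(cur);
	return memcmp(cur, old, sizeof(*cur)) != 0;
}
```

A scan function would then invoke the on_config_changed notification only when this returns true. Keeping both buffers caller-owned means the function itself has no allocation-failure path to handle.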

>> +{
>> +	int cfg_chg = 0;
>> +
>> +	if (ap_qci_info) {
>> +		if (!ap_qci_info_old) {
>> +			ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
>> +						  GFP_KERNEL);
>> +			if (!ap_qci_info_old)
>> +				return 0;
>> +		} else {
>> +			memcpy(ap_qci_info_old, ap_qci_info,
>> +			       sizeof(struct ap_config_info));
>> +		}
>> +		ap_fetch_qci_info(ap_qci_info);
>> +		cfg_chg = memcmp(ap_qci_info,
>> +				 ap_qci_info_old,
>> +				 sizeof(struct ap_config_info)) != 0;
>> +	}
>> +
>> +	return cfg_chg;
>> +}
>> +
>>   /**
>>    * ap_scan_bus(): Scan the AP bus for new devices
>>    * Runs periodically, workqueue timer (ap_config_time)
>>    */
>>   static void ap_scan_bus(struct work_struct *unused)
>>   {
>> -	int ap;
>> +	int ap, config_changed = 0;
>> +
>> +	mutex_lock(&ap_config_lock);
> This mutex more or less surrounds the ap_scan_bus function.
> The ap_scan_bus function is only called via a workqueue, which
> makes sure there is only one invocation at a time. So the mutex
> is not needed.

Makes sense, I'll remove it.

>>   
>> -	ap_fetch_qci_info(ap_qci_info);
>> +	/* config change notify */
>> +	config_changed = ap_config_changed();
>> +	if (config_changed)
>> +		notify_config_changed();
>> +	memcpy(ap_qci_info_old, ap_qci_info,
>> +	       sizeof(struct ap_config_info));
>>   	ap_select_domain();
>>   
>>   	AP_DBF_DBG("%s running\n", __func__);
>> @@ -1713,6 +1791,12 @@ static void ap_scan_bus(struct work_struct *unused)
>>   	for (ap = 0; ap <= ap_max_adapter_id; ap++)
>>   		ap_scan_adapter(ap);
>>   
>> +	/* scan complete notify */
>> +	if (config_changed)
>> +		notify_scan_complete();
>> +
>> +	mutex_unlock(&ap_config_lock);
>> +
>>   	/* check if there is at least one queue available with default domain */
>>   	if (ap_domain_index >= 0) {
>>   		struct device *dev =
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 6ce154d924d3..c021ea5121a9 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -146,6 +146,18 @@ struct ap_driver {
>>   	int (*probe)(struct ap_device *);
>>   	void (*remove)(struct ap_device *);
>>   	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>> +	/*
>> +	 * Called at the start of the ap bus scan function when
>> +	 * the crypto config information (qci) has changed.
>> +	 */
>> +	void (*on_config_changed)(struct ap_config_info *new_config_info,
>> +				  struct ap_config_info *old_config_info);
>> +	/*
>> +	 * Called at the end of the ap bus scan function when
>> +	 * the crypto config information (qci) has changed.
>> +	 */
>> +	void (*on_scan_complete)(struct ap_config_info *new_config_info,
>> +				 struct ap_config_info *old_config_info);
>>   };
>>   
>>   #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> Rest of this patch is vfio related and should be in a separate patch.

As stated above, if I remove the vfio-related changes then this patch will
not build which I've been told in the past is a no-no.

>
> Please note: the ap bus scan function actively destroys card and associated queue
> devices when the TAPQ invocation reports that the function bits have changed (e.g. from
> EP11 mode to CCA mode) or that the type has changed (e.g. from CEX6 to CEX7).
> This does not come with a change in the qci apm or adm bitfields!

Yes, I am aware of that and have coded the vfio_ap driver's probe and
remove functions accordingly.

>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index 8934471b7944..f06e19754de3 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -87,7 +87,7 @@ static int vfio_ap_matrix_dev_create(void)
>>   
>>   	/* Fill in config info via PQAP(QCI), if available */
>>   	if (test_facility(12)) {
>> -		ret = ap_qci(&matrix_dev->info);
>> +		ret = ap_qci(&matrix_dev->config_info);
>>   		if (ret)
>>   			goto matrix_alloc_err;
>>   	}
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index dae1fba41941..c4ea80ec8599 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -354,8 +354,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   	}
>>   
>>   	matrix_mdev->mdev = mdev;
>> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info,
>> +			    &matrix_mdev->shadow_apcb);
>>   	hash_init(matrix_mdev->qtable);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>>   	matrix_mdev->pqap_hook.hook = handle_pqap;
>> @@ -540,8 +541,8 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
>>   		 * If the APID is not assigned to the host AP configuration,
>>   		 * we can not assign it to the guest's AP configuration
>>   		 */
>> -		if (!test_bit_inv(apid,
>> -				  (unsigned long *)matrix_dev->info.apm)) {
>> +		if (!test_bit_inv(apid, (unsigned long *)
>> +				  matrix_dev->config_info.apm)) {
>>   			clear_bit_inv(apid, shadow_apcb.apm);
>>   			continue;
>>   		}
>> @@ -554,7 +555,7 @@ static int vfio_ap_mdev_filter_guest_matrix(struct ap_matrix_mdev *matrix_mdev,
>>   			 * guest's AP configuration
>>   			 */
>>   			if (!test_bit_inv(apqi, (unsigned long *)
>> -					  matrix_dev->info.aqm)) {
>> +					  matrix_dev->config_info.aqm)) {
>>   				clear_bit_inv(apqi, shadow_apcb.aqm);
>>   				continue;
>>   			}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index fc8634cee485..5065f0367ea2 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -40,7 +40,7 @@
>>   struct ap_matrix_dev {
>>   	struct device device;
>>   	atomic_t available_instances;
>> -	struct ap_config_info info;
>> +	struct ap_config_info config_info;
>>   	struct list_head mdev_list;
>>   	struct mutex lock;
>>   	struct ap_driver  *vfio_ap_drv;



* Re: [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification
  2020-11-03  9:48   ` kernel test robot
@ 2020-11-13 21:06     ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 21:06 UTC (permalink / raw)
  To: kernel test robot, linux-s390, linux-kernel, kvm
  Cc: kbuild-all, freude, borntraeger, cohuck, mjrosato, pasic,
	alex.williamson, kwankhede

Fixed the errors.

On 11/3/20 4:48 AM, kernel test robot wrote:
> Hi Tony,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on s390/features]
> [also build test ERROR on linus/master v5.10-rc2 next-20201103]
> [cannot apply to kvms390/next linux/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-allmodconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/0day-ci/linux/commit/32786ef6d4ba3703d993a8894ea1d763785fd3a4
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Tony-Krowiak/s390-vfio-ap-dynamic-configuration-support/20201023-011543
>          git checkout 32786ef6d4ba3703d993a8894ea1d763785fd3a4
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>     drivers/s390/crypto/vfio_ap_ops.c:1316:5: warning: no previous prototype for 'vfio_ap_mdev_reset_queue' [-Wmissing-prototypes]
>      1316 | int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>           |     ^~~~~~~~~~~~~~~~~~~~~~~~
>     drivers/s390/crypto/vfio_ap_ops.c:1568:6: warning: no previous prototype for 'vfio_ap_mdev_hot_unplug_queue' [-Wmissing-prototypes]
>      1568 | void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     drivers/s390/crypto/vfio_ap_ops.c: In function 'vfio_ap_mdev_on_cfg_remove':
>     drivers/s390/crypto/vfio_ap_ops.c:1777:7: warning: variable 'unassigned' set but not used [-Wunused-but-set-variable]
>      1777 |  bool unassigned = false;
>           |       ^~~~~~~~~~
>     drivers/s390/crypto/vfio_ap_ops.c: At top level:
>     drivers/s390/crypto/vfio_ap_ops.c:1813:6: warning: no previous prototype for 'vfio_ap_mdev_on_cfg_add' [-Wmissing-prototypes]
>      1813 | void vfio_ap_mdev_on_cfg_add(void)
>           |      ^~~~~~~~~~~~~~~~~~~~~~~
>     In file included from drivers/s390/crypto/vfio_ap_ops.c:11:
>     In function 'memcpy',
>         inlined from 'vfio_ap_mdev_unassign_apids' at drivers/s390/crypto/vfio_ap_ops.c:1655:3,
>         inlined from 'vfio_ap_mdev_on_cfg_remove' at drivers/s390/crypto/vfio_ap_ops.c:1800:8,
>         inlined from 'vfio_ap_on_cfg_changed' at drivers/s390/crypto/vfio_ap_ops.c:1836:2:
>>> include/linux/string.h:402:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
>       402 |    __read_overflow2();
>           |    ^~~~~~~~~~~~~~~~~~
>
> vim +/__read_overflow2 +402 include/linux/string.h
>
> 6974f0c4555e285 Daniel Micay  2017-07-12  393
> 6974f0c4555e285 Daniel Micay  2017-07-12  394  __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
> 6974f0c4555e285 Daniel Micay  2017-07-12  395  {
> 6974f0c4555e285 Daniel Micay  2017-07-12  396  	size_t p_size = __builtin_object_size(p, 0);
> 6974f0c4555e285 Daniel Micay  2017-07-12  397  	size_t q_size = __builtin_object_size(q, 0);
> 6974f0c4555e285 Daniel Micay  2017-07-12  398  	if (__builtin_constant_p(size)) {
> 6974f0c4555e285 Daniel Micay  2017-07-12  399  		if (p_size < size)
> 6974f0c4555e285 Daniel Micay  2017-07-12  400  			__write_overflow();
> 6974f0c4555e285 Daniel Micay  2017-07-12  401  		if (q_size < size)
> 6974f0c4555e285 Daniel Micay  2017-07-12 @402  			__read_overflow2();
> 6974f0c4555e285 Daniel Micay  2017-07-12  403  	}
> 6974f0c4555e285 Daniel Micay  2017-07-12  404  	if (p_size < size || q_size < size)
> 6974f0c4555e285 Daniel Micay  2017-07-12  405  		fortify_panic(__func__);
> 47227d27e2fcb01 Daniel Axtens 2020-06-03  406  	return __underlying_memcpy(p, q, size);
> 6974f0c4555e285 Daniel Micay  2017-07-12  407  }
> 6974f0c4555e285 Daniel Micay  2017-07-12  408
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
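The -Wmissing-prototypes warnings in this report mean that a non-static function is defined without a declaration in scope. The usual fixes are to make the function static if it is only used in its own file, or to declare it in a header shared with its callers. Below is a minimal userspace sketch with hypothetical names (example_reset_queue(), example_hot_unplug()). This thread does not say which remedy the fix actually used.

```c
/* Remedy 1: a helper used only in this file is made static, so the
 * compiler does not expect a prototype from a header. */
static int example_reset_queue(unsigned int apid, unsigned int apqi)
{
	/* Hypothetical range check standing in for a real queue reset. */
	return (apid < 256 && apqi < 256) ? 0 : -1;
}

/* Remedy 2: a function called from other files gets a declaration that
 * precedes its definition. In a real driver the declaration would sit
 * in a shared header such as vfio_ap_private.h. */
void example_hot_unplug(int *queue_refcount);

void example_hot_unplug(int *queue_refcount)
{
	/* Hypothetical body: drop one reference, never going below zero. */
	if (*queue_refcount > 0)
		(*queue_refcount)--;
}
```

The -Wunused-but-set-variable warning for 'unassigned' goes away once the variable is either read or removed.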



* Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use
  2020-10-27 16:55   ` Harald Freudenberger
@ 2020-11-13 21:30     ` Tony Krowiak
  2020-11-14  0:00       ` Halil Pasic
  0 siblings, 1 reply; 68+ messages in thread
From: Tony Krowiak @ 2020-11-13 21:30 UTC (permalink / raw)
  To: Harald Freudenberger, linux-s390, linux-kernel, kvm
  Cc: borntraeger, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, hca, gor



On 10/27/20 12:55 PM, Harald Freudenberger wrote:
> On 22.10.20 19:11, Tony Krowiak wrote:
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The callback
>> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver facilitates pass-through of an AP queue to a
>> guest. The idea here is that a guest may be administered by a different
>> sysadmin than the host and we don't want AP resources to unexpectedly
>> disappear from a guest's AP configuration (i.e., adapters and domains
>> assigned to the matrix mdev). This will enforce the proper procedure for
>> removing AP resources intended for guest usage which is to
>> first unassign them from the matrix mdev, then unbind them from the
>> vfio_ap device driver.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>>   drivers/s390/crypto/ap_bus.h |   4 +
>>   2 files changed, 142 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 485cbfcbf06e..998e61cd86d9 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -35,6 +35,7 @@
>>   #include <linux/mod_devicetable.h>
>>   #include <linux/debugfs.h>
>>   #include <linux/ctype.h>
>> +#include <linux/module.h>
>>   
>>   #include "ap_bus.h"
>>   #include "ap_debug.h"
>> @@ -893,6 +894,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
>>   	return 0;
>>   }
>>   
>> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
>> +			       unsigned long *newmap)
>> +{
>> +	unsigned long size;
>> +	int rc;
>> +
>> +	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
>> +	if (*str == '+' || *str == '-') {
>> +		memcpy(newmap, bitmap, size);
>> +		rc = modify_bitmap(str, newmap, bits);
>> +	} else {
>> +		memset(newmap, 0, size);
>> +		rc = hex2bitmap(str, newmap, bits);
>> +	}
>> +	return rc;
>> +}
>> +
>>   int ap_parse_mask_str(const char *str,
>>   		      unsigned long *bitmap, int bits,
>>   		      struct mutex *lock)
>> @@ -912,14 +930,7 @@ int ap_parse_mask_str(const char *str,
>>   		kfree(newmap);
>>   		return -ERESTARTSYS;
>>   	}
>> -
>> -	if (*str == '+' || *str == '-') {
>> -		memcpy(newmap, bitmap, size);
>> -		rc = modify_bitmap(str, newmap, bits);
>> -	} else {
>> -		memset(newmap, 0, size);
>> -		rc = hex2bitmap(str, newmap, bits);
>> -	}
>> +	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
>>   	if (rc == 0)
>>   		memcpy(bitmap, newmap, size);
>>   	mutex_unlock(lock);
>> @@ -1111,12 +1122,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>>   	return rc;
>>   }
>>   
>> +static int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newapm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * No need to verify whether the driver is using the queues if it is the
>> +	 * default driver.
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
> Can you please update this comment? It should be something like
> /* increase the driver's module refcounter to be sure it is not
>     going away when we invoke the callback function. */

Will do.

>
>> +	if (!try_module_get(drv->owner))
>> +		return 0;
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
>> +			rc = -EBUSY;
>> +
> And here: /* release driver's module */ or simmilar

Okay

>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}
>> +
>> +static int apmask_commit(unsigned long *newapm)
>> +{
>> +	int rc;
>> +	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
>> +
>> +	/*
>> +	 * Check if any bits in the apmask have been set which will
>> +	 * result in queues being removed from non-default drivers
>> +	 */
>> +	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
>> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> +				      __verify_card_reservations);
>> +		if (rc)
>> +			return rc;
>> +	}
>> +
>> +	memcpy(ap_perms.apm, newapm, APMASKSIZE);
>> +
>> +	return 0;
>> +}
>> +
>>   static ssize_t apmask_store(struct bus_type *bus, const char *buf,
>>   			    size_t count)
>>   {
>>   	int rc;
>> +	DECLARE_BITMAP(newapm, AP_DEVICES);
>> +
>> +	if (mutex_lock_interruptible(&ap_perms_mutex))
>> +		return -ERESTARTSYS;
>> +
>> +	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
>> +	if (rc)
>> +		goto done;
>>   
>> -	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
>> +	rc = apmask_commit(newapm);
>> +
>> +done:
>> +	mutex_unlock(&ap_perms_mutex);
>>   	if (rc)
>>   		return rc;
>>   
>> @@ -1142,12 +1211,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
>>   	return rc;
>>   }
>>   
>> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newaqm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * If the reserved bits do not identify queues reserved for use by the
>> +	 * non-default driver, there is no need to verify the driver is using
>> +	 * the queues.
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
> Same here.

Okay

>> +	if (!try_module_get(drv->owner))
>> +		return 0;
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(ap_perms.apm, newaqm))
>> +			rc = -EBUSY;
>> +
> and here

Okay

>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}
>> +
>> +static int aqmask_commit(unsigned long *newaqm)
>> +{
>> +	int rc;
>> +	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
>> +
>> +	/*
>> +	 * Check if any bits in the aqmask have been set which will
>> +	 * result in queues being removed from non-default drivers
>> +	 */
>> +	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
>> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> +				      __verify_queue_reservations);
>> +		if (rc)
>> +			return rc;
>> +	}
>> +
>> +	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
>> +
>> +	return 0;
>> +}
>> +
>>   static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
>>   			    size_t count)
>>   {
>>   	int rc;
>> +	DECLARE_BITMAP(newaqm, AP_DOMAINS);
>>   
>> -	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
>> +	if (mutex_lock_interruptible(&ap_perms_mutex))
>> +		return -ERESTARTSYS;
>> +
>> +	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
>> +	if (rc)
>> +		goto done;
>> +
>> +	rc = aqmask_commit(newaqm);
>> +
>> +done:
>> +	mutex_unlock(&ap_perms_mutex);
>>   	if (rc)
>>   		return rc;
>>   
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 5029b80132aa..6ce154d924d3 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -145,6 +145,7 @@ struct ap_driver {
>>   
>>   	int (*probe)(struct ap_device *);
>>   	void (*remove)(struct ap_device *);
>> +	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>>   };
>>   
>>   #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
>> @@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
>>   struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
>>   			       int comp_device_type, unsigned int functions);
>>   
>> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
>> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
>> +
>>   struct ap_perms {
>>   	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
>>   	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
> I still don't like this code. That's because of what it is doing - not because of the code quality.
> And Halil, you are right. It is adding more pressure to the mutex used for locking the apmask
> and aqmask stuff (and the zcrypt multiple device drivers support code also).
> I am very concerned about the in_use callback which is called with the ap_perms_mutex
> held AND during bus_for_each_drv (so holding the overall AP BUS mutex) and then diving
> into the vfio_ap ... with yet another mutex to protect the vfio structs.
> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>

Thank you for your review. Maybe you ought to bring these concerns up with
our crypto architect. Halil came up with a solution for the potential
deadlock situation: the sysfs assignment interfaces, which call into the AP
bus to check permissions (and therefore also lock ap_perms), will use
mutex_trylock(). If mutex_trylock() fails, the assignment function returns
-EBUSY. This should resolve the potential deadlock.




* Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-11-13 17:27     ` Tony Krowiak
@ 2020-11-13 23:12       ` Halil Pasic
  2020-11-19 18:15         ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-11-13 23:12 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Fri, 13 Nov 2020 12:27:32 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 10/28/20 4:17 AM, Halil Pasic wrote:
> > On Thu, 22 Oct 2020 13:12:02 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >
> >> +static ssize_t guest_matrix_show(struct device *dev,
> >> +				 struct device_attribute *attr, char *buf)
> >> +{
> >> +	ssize_t nchars;
> >> +	struct mdev_device *mdev = mdev_from_dev(dev);
> >> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> >> +		return -ENODEV;
> > I'm wondering, would it make sense to have guest_matrix display the would
> > be guest matrix when we don't have a KVM? With the filtering in
> > place, the question in what guest_matrix would my (assign) matrix result
> > right now if I were to hook up my vfio_ap_mdev to a guest seems a
> > legitimate one.
> 
> A couple of thoughts here:
> * The ENODEV informs the user that there is no guest running
>     which makes sense to me given this interface displays the
>     guest matrix. The alternative, which I considered, was to
>     display an empty matrix (i.e., nothing).
> * This would be a pretty drastic change to the design because
>     the shadow_apcb - which is what is displayed via this interface - is
>     only updated when the guest is started and while it is running (i.e.,
>     hot plug of new adapters/domains). Making this change would
>     require changing that entire design concept which I am reluctant
>     to do at this point in the game.
> 
> 

No problem. My thinking was that, because we can do the
assign/unassign ops also for the running guest, we also have
the code to maintain the shadow_apcb. In this series that code
is conditional on vfio_ap_mdev_has_crycb().
E.g. 

static ssize_t assign_adapter_store(struct device *dev,                         
                                    struct device_attribute *attr,              
                                    const char *buf, size_t count)              
{                                                                               
[..]                                                                                
        if (vfio_ap_mdev_has_crycb(matrix_mdev))                                
                if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))        
                        vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);

If one were to move the vfio_ap_mdev_has_crycb() check into
vfio_ap_mdev_commit_shadow_apcb(), we would have an always up-to-date
shadow_apcb that we could display.

I don't feel strongly about this. It was just an idea: if the result
of the filtering is surprising, currently the only way to see it, without
knowing the algorithm and possibly the state and history of the
system, is to actually start a guest.
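The suggestion can be sketched as follows: always refresh the shadow matrix on assignment, and let the commit step (not the caller) decide whether there is a guest to push it to. The names (struct example_mdev, example_filter(), example_commit()) are hypothetical and the masks are reduced to one word each. Filtering is also simplified to intersecting with the host configuration, whereas the real vfio_ap filter works per APQN.

```c
#include <stdbool.h>
#include <stddef.h>

struct example_crycb {			/* stand-in for the guest's APCB */
	unsigned long apm, aqm;
};

struct example_mdev {
	unsigned long apm, aqm;			/* what the admin assigned */
	unsigned long shadow_apm, shadow_aqm;	/* what the guest would get */
	struct example_crycb *crycb;		/* NULL when no KVM guest */
};

/* Simplified filter: keep only resources present in the host config.
 * Returns true if the shadow matrix changed. */
static bool example_filter(struct example_mdev *m,
			   unsigned long host_apm, unsigned long host_aqm)
{
	unsigned long apm = m->apm & host_apm;
	unsigned long aqm = m->aqm & host_aqm;
	bool changed = apm != m->shadow_apm || aqm != m->shadow_aqm;

	m->shadow_apm = apm;
	m->shadow_aqm = aqm;
	return changed;
}

/* The "is there a guest?" check lives here, so callers never skip the
 * filtering step and the shadow matrix is always current. */
static void example_commit(struct example_mdev *m)
{
	if (m->crycb == NULL)
		return;
	m->crycb->apm = m->shadow_apm;
	m->crycb->aqm = m->shadow_aqm;
}

static void example_assign_adapter(struct example_mdev *m, unsigned int apid,
				   unsigned long host_apm, unsigned long host_aqm)
{
	m->apm |= 1UL << apid;
	if (example_filter(m, host_apm, host_aqm))
		example_commit(m);
}
```

With the check inside the commit step, a guest_matrix_show()-style handler could print the shadow masks even when no guest is running.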

Regards,
Halil



* Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-11-13 17:14     ` Tony Krowiak
@ 2020-11-13 23:47       ` Halil Pasic
  2020-11-16 16:58         ` Tony Krowiak
  2020-11-23 17:03         ` Cornelia Huck
  0 siblings, 2 replies; 68+ messages in thread
From: Halil Pasic @ 2020-11-13 23:47 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Fri, 13 Nov 2020 12:14:22 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:
[..]
> >>   }
> >>   
> >> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> >> +			 "already assigned to %s"
> >> +
> >> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> >> +					 unsigned long *apm,
> >> +					 unsigned long *aqm)
> >> +{
> >> +	unsigned long apid, apqi;
> >> +
> >> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
> >> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> >> +			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> > Isn't error rather severe for this? For my taste even warning would be
> > severe for this.
> 
> The user only sees a EADDRINUSE returned from the sysfs interface,
> so Conny asked if I could log a message to indicate which APQNs are
> in use by which mdev. I can change this to an info message, but it
> will be missed if the log level is set higher. Maybe Conny can put in
> her two cents here since she asked for this.
> 

I'm looking forward to Conny's opinion. :)

[..]
> >>   
> >> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
> >>   	if (ret)
> >>   		goto done;
> >>   
> >> -	set_bit_inv(apid, matrix_mdev->matrix.apm);
> >> +	memset(apm, 0, sizeof(apm));
> >> +	set_bit_inv(apid, apm);
> >>   
> >> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> >> +	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> >> +					     matrix_mdev->matrix.aqm);
> > What is the benefit of using a copy here? I mean we have the vfio_ap lock
> > so nobody can see the bit we speculatively flipped.
> 
> The vfio_ap_mdev_verify_no_sharing() function definition was changed
> so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
> function rather than duplicating that code for the in_use callback. The
> in-use callback is invoked by the AP bus which has no concept of
> a mediated device, so I made this change to accommodate that fact.

Seems I was not clear enough with my question. Here you pass a local
apm which has every bit 0 except the one corresponding to the
adapter we are trying to assign. The matrix.apm actually may have
more apm bits set. What we used to do is set the matrix.apm bit,
verify, and clear it if verification fails. I think that
would still work.

The computational complexity is currently the same. For
some reason unknown to me ap_apqn_in_matrix_owned_by_def_drv() uses loops
instead of using bitmap operations. But it won't do any less work
if the apm argument is sparse. The same is true if bitmap ops are used.
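For reference, here is the bitmap-operation form of such a check. Two sets of APQNs, each the Cartesian product of an adapter mask and a domain mask, share at least one APQN exactly when the adapter masks intersect and the domain masks intersect. The following userspace sketch compares the two formulations. The helper names are hypothetical, and 256-bit masks are an assumption.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define EX_MASK_BITS 256u	/* assumed width of both APM and AQM */
#define EX_MASK_LONGS ((EX_MASK_BITS + EX_BITS_PER_LONG - 1) / EX_BITS_PER_LONG)

static void ex_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / EX_BITS_PER_LONG] |= 1UL << (nr % EX_BITS_PER_LONG);
}

static bool ex_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / EX_BITS_PER_LONG] >> (nr % EX_BITS_PER_LONG)) & 1UL;
}

/* Word-at-a-time test, like the kernel's bitmap_intersects(). */
static bool ex_intersects(const unsigned long *a, const unsigned long *b)
{
	for (size_t i = 0; i < EX_MASK_LONGS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/* Bitmap-op form: apm1 x aqm1 and apm2 x aqm2 share an APQN iff both
 * the adapter masks and the domain masks intersect. */
static bool ex_overlap_bitmap(const unsigned long *apm1, const unsigned long *aqm1,
			      const unsigned long *apm2, const unsigned long *aqm2)
{
	return ex_intersects(apm1, apm2) && ex_intersects(aqm1, aqm2);
}

/* Loop form: walk the APQNs (apid, apqi) bit by bit. */
static bool ex_overlap_loops(const unsigned long *apm1, const unsigned long *aqm1,
			     const unsigned long *apm2, const unsigned long *aqm2)
{
	for (unsigned int apid = 0; apid < EX_MASK_BITS; apid++) {
		if (!ex_test_bit(apid, apm1) || !ex_test_bit(apid, apm2))
			continue;
		for (unsigned int apqi = 0; apqi < EX_MASK_BITS; apqi++)
			if (ex_test_bit(apqi, aqm1) && ex_test_bit(apqi, aqm2))
				return true;
	}
	return false;
}
```

The s390 code numbers bits MSB-first (set_bit_inv() and friends). That changes where a bit lives within a word, but not the intersection logic, as long as both masks use the same numbering.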

What you do here is not wrong: if the invariants are maintained, as they
should be, performing the check with the other bits set in the apm is
superfluous. But as I said before, it isn't actually extra work, and
if there were a bug, it could help us detect it (because an assignment
that should have worked would fail).

Preparing the local apm isn't much extra work either, but I still
don't understand the change. Why can't you pass in matrix.apm
after set_bit_inv(apid, ...) as we used to do before?

Again, no big deal, but I just prefer to understand the whys.

> 
> >
> > I've also pointed out in the previous patch that in_use() isn't
> > perfectly reliable (at least in theory) because of a race.
> 
> We discussed that privately and determined that the sysfs assignment
> interfaces will use mutex_trylock() to avoid races.

I don't think what we discussed is going to fix the race I'm referring
to here. But I do look forward to v12.

Regards,
Halil


* Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use
  2020-11-13 21:30     ` Tony Krowiak
@ 2020-11-14  0:00       ` Halil Pasic
  2020-11-16 16:23         ` Tony Krowiak
  0 siblings, 1 reply; 68+ messages in thread
From: Halil Pasic @ 2020-11-14  0:00 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: Harald Freudenberger, linux-s390, linux-kernel, kvm, borntraeger,
	cohuck, mjrosato, alex.williamson, kwankhede, fiuczy, frankja,
	david, hca, gor

On Fri, 13 Nov 2020 16:30:31 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> We will be using the mutex_trylock() function in our sysfs assignment
> interfaces which make the call to the AP bus to check permissions (which
> also locks ap_perms). If the mutex_trylock() fails, we return from the
> assignment function with -EBUSY. This should resolve that potential
> deadlock issue.

It resolves the deadlock issue only if in_use() also uses
mutex_trylock(). But if in_use() doesn't get the lock, it needs to
back off (and so does its client code), i.e. a boolean return value
won't do.

Regards,
Halil


* Re: [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use
  2020-11-14  0:00       ` Halil Pasic
@ 2020-11-16 16:23         ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-16 16:23 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Harald Freudenberger, linux-s390, linux-kernel, kvm, borntraeger,
	cohuck, mjrosato, alex.williamson, kwankhede, fiuczy, frankja,
	david, hca, gor



On 11/13/20 7:00 PM, Halil Pasic wrote:
> On Fri, 13 Nov 2020 16:30:31 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> We will be using the mutex_trylock() function in our sysfs assignment
>> interfaces which make the call to the AP bus to check permissions (which
>> also locks ap_perms). If the mutex_trylock() fails, we return from the
>> assignment function with -EBUSY. This should resolve that potential
>> deadlock issue.
> It resolves the deadlock issue only if in_use() is also doing
> mutex_trylock(), but the if in_use doesn't take the lock it
> needs to back off (and so does it's client code) i.e. a boolean as
> return value won't do.

Makes sense. I'll change the in_use callback to return an int and use
mutex_trylock() for the vfio_ap_mdev_in_use() function. If the lock
can not be obtained, the function will return -EBUSY.
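Here is a sketch of the revised contract, with hypothetical names and one-word masks. The callback returns a negative errno when it cannot take the vfio_ap lock, 1 when the APQNs are assigned, and 0 otherwise. The bus-side caller propagates the error instead of treating it as "not in use". The new_apm & ~old step mirrors what apmask_commit() in the patch does with bitmap_andnot(). The 1/0 encoding is an assumption, since the message only specifies -EBUSY for the lock failure.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for the vfio_ap driver lock. */
static pthread_mutex_t ex_vfio_lock = PTHREAD_MUTEX_INITIALIZER;
/* Assignments of a single mdev, for brevity. */
static unsigned long ex_mdev_apm, ex_mdev_aqm;

/*
 * Hypothetical in_use callback: <0 if the lock could not be taken,
 * 1 if any APQN in apm x aqm is assigned, 0 otherwise.
 */
static int ex_in_use(unsigned long apm, unsigned long aqm)
{
	int in_use;

	if (pthread_mutex_trylock(&ex_vfio_lock) != 0)
		return -EBUSY;	/* could not check: caller must back off */
	in_use = (ex_mdev_apm & apm) && (ex_mdev_aqm & aqm);
	pthread_mutex_unlock(&ex_vfio_lock);
	return in_use;
}

/*
 * Hypothetical bus side, assumed to run with ap_perms_mutex held (not
 * modelled here). Only adapters newly handed to the default drivers
 * need checking.
 */
static int ex_apmask_commit(unsigned long *apmask, unsigned long new_apm,
			    unsigned long aqmask)
{
	unsigned long newly_set = new_apm & ~*apmask;

	if (newly_set) {
		int rc = ex_in_use(newly_set, aqmask);

		if (rc < 0)
			return rc;	/* propagate, do not guess */
		if (rc > 0)
			return -EBUSY;	/* still assigned, as in the v11 patch */
	}
	*apmask = new_apm;
	return 0;
}
```

Because the callback only tries the lock, an apmask write that races with an assignment fails with -EBUSY instead of waiting. It therefore cannot deadlock against a path that holds the vfio_ap lock and waits for ap_perms_mutex. An int return lets "could not check" stay distinct from both "in use" and "not in use", which a bool cannot express.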

>
> Regards,
> Halil



* Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-11-13 23:47       ` Halil Pasic
@ 2020-11-16 16:58         ` Tony Krowiak
  2020-11-23 17:03         ` Cornelia Huck
  1 sibling, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-16 16:58 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 11/13/20 6:47 PM, Halil Pasic wrote:
> On Fri, 13 Nov 2020 12:14:22 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> [..]
>>>>    }
>>>>    
>>>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>>>> +			 "already assigned to %s"
>>>> +
>>>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>>>> +					 unsigned long *apm,
>>>> +					 unsigned long *aqm)
>>>> +{
>>>> +	unsigned long apid, apqi;
>>>> +
>>>> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
>>>> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>>>> +			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
>>> Isn't error rather severe for this? For my taste even warning would be
>>> severe for this.
>> The user only sees a EADDRINUSE returned from the sysfs interface,
>> so Conny asked if I could log a message to indicate which APQNs are
>> in use by which mdev. I can change this to an info message, but it
>> will be missed if the log level is set higher. Maybe Conny can put in
>> her two cents here since she asked for this.
>>
> I'm looking forward to Conny's opinion. :)
>
> [..]
>>>>    
>>>> @@ -708,18 +732,18 @@ static ssize_t assign_adapter_store(struct device *dev,
>>>>    	if (ret)
>>>>    		goto done;
>>>>    
>>>> -	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>>> +	memset(apm, 0, sizeof(apm));
>>>> +	set_bit_inv(apid, apm);
>>>>    
>>>> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
>>>> +	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
>>>> +					     matrix_mdev->matrix.aqm);
>>> What is the benefit of using a copy here? I mean we have the vfio_ap lock
>>> so nobody can see the bit we speculatively flipped.
>> The vfio_ap_mdev_verify_no_sharing() function definition was changed
>> so that it can also be re-used by the vfio_ap_mdev_resource_in_use()
>> function rather than duplicating that code for the in_use callback. The
>> in-use callback is invoked by the AP bus which has no concept of
>> a mediated device, so I made this change to accommodate that fact.
> Seems I was not clear enough with my question. Here you pass a local
> apm which has the every bit 0 except the one corresponding to the
> adapter we are trying to assign. The matrix.apm actually may have
> more apm bits set. What we used to do, is set the matrix.apm bit,
> verify, and clear it if verification fails. I think that
> would still work.
>
> The computational complexity is currently the same. For
> some reason unknown to me ap_apqn_in_matrix_owned_by_def_drv() uses loops
> instead of using bitmap operations. But it won't do any less work
> if the apm argument is sparse. Same is true bitmap ops are used.
>
> What you do here is not wrong, because if the invariants, which should
> be maintained, are maintained, performing the check with the other
> bits set in the apm is superfluous. But as I said before, actually
> it ain't extra work, and if there was a bug, it could help us detect
> it (because the assignment, that should have worked would fail).
>
> Preparing the local apm isn't much extra work either, but I still
> don't understand the change. Why can't you pass in matrix.apm
> after set_bit_inv(apid, ...) like we use to do before?
>
> Again, no big deal, but I just prefer to understand the whys.

I think you misunderstood what I was saying, probably because
I didn't explain it very thoroughly or clearly. The change was not
made to reduce the amount of work done in the
vfio_ap_mdev_verify_no_sharing() function.


If the assignment functions were the only ones to call the
vfio_ap_mdev_verify_no_sharing() function, then you'd be correct;
there would be no good reason not to set the apid in the
matrix_mdev->matrix.apm/aqm as we used to. The modification
was made to accommodate the vfio_ap_mdev_resource_in_use() function.

The vfio_ap_mdev_resource_in_use() function is invoked by the
AP bus when a change is made to the apmask/aqmask that
will result in taking queues away from vfio_ap. This function
needs to verify that the affected APQNs are not assigned to
any matrix mdev. Rather than write a new function that duplicates
the logic in the vfio_ap_mdev_verify_no_sharing() function, I merely
changed the signature to take the apm/aqm specifying the APQNs to
verify rather than obtaining them from the matrix_mdev. This is
because the bitmaps passed to the in_use callback are not specific
to a particular matrix_mdev, as they are with the assignment
interfaces. Making this change allowed the
vfio_ap_mdev_verify_no_sharing() function to be used by both the
assignment functions and the in_use callback.

I suppose another option
would have been to create a phony matrix_mdev in the in_use
callback and copy the masks passed in to the function to the
phony matrix_mdev's apm/aqm. That would have eliminated
the need to change the signature of the vfio_ap_mdev_verify_no_sharing()
function, but I'm not sure it is worth the effort at this point.
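Here is the shape of that refactoring as a userspace sketch, with hypothetical names and one-word masks. The sharing check takes the masks to test instead of an mdev. The assignment path can therefore pass its own masks and skip itself, while the in_use path, which has no mdev, passes the AP bus masks directly. The assignment path here uses the set-verify-rollback style Halil refers to, which works with the same helper.

```c
#include <errno.h>
#include <stddef.h>

struct ex_mdev {
	unsigned long apm, aqm;
};

#define EX_MAX_MDEVS 4
static struct ex_mdev ex_mdevs[EX_MAX_MDEVS];
static size_t ex_nr_mdevs;

/*
 * Return -EADDRINUSE if any APQN in apm x aqm is already assigned to an
 * mdev other than @skip (which may be NULL).
 */
static int ex_verify_no_sharing(const struct ex_mdev *skip,
				unsigned long apm, unsigned long aqm)
{
	for (size_t i = 0; i < ex_nr_mdevs; i++) {
		const struct ex_mdev *m = &ex_mdevs[i];

		if (m == skip)
			continue;
		if ((m->apm & apm) && (m->aqm & aqm))
			return -EADDRINUSE;
	}
	return 0;
}

/* Assignment: set the bit, verify the whole matrix, roll back on conflict. */
static int ex_assign_adapter(struct ex_mdev *mdev, unsigned int apid)
{
	unsigned long old_apm = mdev->apm;
	int rc;

	mdev->apm |= 1UL << apid;
	rc = ex_verify_no_sharing(mdev, mdev->apm, mdev->aqm);
	if (rc)
		mdev->apm = old_apm;	/* restore exactly what was there */
	return rc;
}

/* in_use: no mdev is involved, just the masks handed over by the bus. */
static int ex_in_use(unsigned long apm, unsigned long aqm)
{
	return ex_verify_no_sharing(NULL, apm, aqm) ? 1 : 0;
}
```

Passing a mask that contains only the new bit, as v11 does, gives the same answer as long as the mdev's existing assignments are already conflict-free. Passing the full mask re-checks those assignments as well, which is the extra bug detection mentioned earlier in the thread.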

>>> I've also pointed out in the previous patch that in_use() isn't
>>> perfectly reliable (at least in theory) because of a race.
>> We discussed that privately and determined that the sysfs assignment
>> interfaces will use mutex_trylock() to avoid races.
> I don't think, what we discussed is going to fix the race I'm referring
> to here. But I do look forward to v12.
>
> Regards,
> Halil



* Re: [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-11-13 23:12       ` Halil Pasic
@ 2020-11-19 18:15         ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-19 18:15 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 11/13/20 6:12 PM, Halil Pasic wrote:
> On Fri, 13 Nov 2020 12:27:32 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 10/28/20 4:17 AM, Halil Pasic wrote:
>>> On Thu, 22 Oct 2020 13:12:02 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>
>>>> +static ssize_t guest_matrix_show(struct device *dev,
>>>> +				 struct device_attribute *attr, char *buf)
>>>> +{
>>>> +	ssize_t nchars;
>>>> +	struct mdev_device *mdev = mdev_from_dev(dev);
>>>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>> +
>>>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>>> +		return -ENODEV;
>>> I'm wondering, would it make sense to have guest_matrix display the would
>>> be guest matrix when we don't have a KVM? With the filtering in
>>> place, the question in what guest_matrix would my (assign) matrix result
>>> right now if I were to hook up my vfio_ap_mdev to a guest seems a
>>> legitimate one.
>> A couple of thoughts here:
>> * The ENODEV informs the user that there is no guest running
>>      which makes sense to me given this interface displays the
>>      guest matrix. The alternative, which I considered, was to
>>      display an empty matrix (i.e., nothing).
>> * This would be a pretty drastic change to the design because
>>      the shadow_apcb - which is what is displayed via this interface - is
>>      only updated when the guest is started and while it is running (i.e.,
>>      hot plug of new adapters/domains). Making this change would
>>      require changing that entire design concept which I am reluctant
>>      to do at this point in the game.
>>
>>
> No problem. My thinking was, that, because we can do the
> assign/unassing ops also for the running guest, that we also have
> the code to do the maintenance on the shadow_apcb. In this
> series this code is conditional with respect to vfio_ap_mdev_has_crycb().
> E.g.
>
> static ssize_t assign_adapter_store(struct device *dev,
>                                      struct device_attribute *attr,
>                                      const char *buf, size_t count)
> {
> [..]
>          if (vfio_ap_mdev_has_crycb(matrix_mdev))
>                  if (vfio_ap_mdev_filter_guest_matrix(matrix_mdev, true))
>                          vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>
> If one were to move the
> vfio_ap_mdev_has_crycb() check into vfio_ap_mdev_commit_shadow_apcb()
> then we would have an always up-to-date shadow_apcb that we could display.
>
> I don't feel strongly about this. Was just an idea, because if the result
> of the filtering is surprising, currently the only way to see it, without
> knowing the algorithm, and possibly the state, and the history of the
> system, is to actually start a guest.

Okay, I can buy this and will make the change.

>
> Regards,
> Halil
>
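A minimal user-space sketch of the change Halil proposes, moving the vfio_ap_mdev_has_crycb() check into vfio_ap_mdev_commit_shadow_apcb() so that shadow_apcb is always maintained and can be displayed even without a guest. This is a hypothetical model, not the driver code. struct matrix_mdev_model, filter_guest_matrix(), commit_shadow_apcb(), assign_adapter(), assign_domain() and guest_matrix_show() are illustrative names. Masks are single words with plain (not s390 MSB-0) bit numbering. The filter is reduced to intersecting the assigned masks with the host's available ones, which is far simpler than the real filtering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model: one word per mask, plain bit numbering. */
struct apcb_model {
	unsigned long apm;	/* adapter mask */
	unsigned long aqm;	/* domain (usage) mask */
};

struct matrix_mdev_model {
	struct apcb_model matrix;	/* resources assigned via sysfs */
	struct apcb_model shadow_apcb;	/* filtered would-be guest matrix */
	unsigned long host_apm;		/* adapters available on the host */
	unsigned long host_aqm;		/* domains available on the host */
	bool has_kvm;			/* stands in for vfio_ap_mdev_has_crycb() */
	struct apcb_model guest_crycb;	/* what a running guest would see */
	int commits;			/* number of updates pushed to the guest */
};

/* Recompute shadow_apcb and return true if it changed. Intersecting with
 * the host masks is only a stand-in for the real filtering logic. */
static bool filter_guest_matrix(struct matrix_mdev_model *m)
{
	struct apcb_model next = {
		.apm = m->matrix.apm & m->host_apm,
		.aqm = m->matrix.aqm & m->host_aqm,
	};
	bool changed = memcmp(&next, &m->shadow_apcb, sizeof(next)) != 0;

	m->shadow_apcb = next;
	return changed;
}

/* The KVM check lives here, so callers no longer need to guard on it. */
static void commit_shadow_apcb(struct matrix_mdev_model *m)
{
	if (!m->has_kvm)
		return;
	m->guest_crycb = m->shadow_apcb;
	m->commits++;
}

static void assign_adapter(struct matrix_mdev_model *m, unsigned int apid)
{
	assert(apid < 8 * sizeof(unsigned long));
	m->matrix.apm |= 1UL << apid;
	if (filter_guest_matrix(m))	/* shadow_apcb is always kept current */
		commit_shadow_apcb(m);
}

static void assign_domain(struct matrix_mdev_model *m, unsigned int apqi)
{
	assert(apqi < 8 * sizeof(unsigned long));
	m->matrix.aqm |= 1UL << apqi;
	if (filter_guest_matrix(m))
		commit_shadow_apcb(m);
}

/* No -ENODEV without a guest: report the would-be guest matrix instead. */
static int guest_matrix_show(const struct matrix_mdev_model *m,
			     char *buf, size_t len)
{
	return snprintf(buf, len, "apm=0x%lx aqm=0x%lx\n",
			m->shadow_apcb.apm, m->shadow_apcb.aqm);
}
```

With the check moved into the commit step, assigning adapter 1 on a host that only offers adapters 0 and 2 leaves the shadow unchanged, and guest_matrix_show() reveals that filtering result before any guest is started. Only commit_shadow_apcb() decides whether anything reaches a running guest.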



* Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-11-13 23:47       ` Halil Pasic
  2020-11-16 16:58         ` Tony Krowiak
@ 2020-11-23 17:03         ` Cornelia Huck
  2020-11-23 19:23           ` Tony Krowiak
  1 sibling, 1 reply; 68+ messages in thread
From: Cornelia Huck @ 2020-11-23 17:03 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Tony Krowiak, linux-s390, linux-kernel, kvm, freude, borntraeger,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Sat, 14 Nov 2020 00:47:22 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 13 Nov 2020 12:14:22 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> [..]
> > >>   }
> > >>   
> > >> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> > >> +			 "already assigned to %s"
> > >> +
> > >> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> > >> +					 unsigned long *apm,
> > >> +					 unsigned long *aqm)
> > >> +{
> > >> +	unsigned long apid, apqi;
> > >> +
> > >> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
> > >> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> > >> +			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);  
> > > Isn't error rather severe for this? For my taste even warning would be
> > > severe for this.  
> > 
> > The user only sees an EADDRINUSE returned from the sysfs interface,
> > so Conny asked if I could log a message to indicate which APQNs are
> > in use by which mdev. I can change this to an info message, but it
> > will be missed if the log level is set higher. Maybe Conny can put in
> > her two cents here since she asked for this.
> >   
> 
> I'm looking forward to Conny's opinion. :)

(only just saw this; -ETOOMANYEMAILS)

It is probably not an error in the sense of "things are broken, this
cannot work"; but I'd consider this at least a warning "this does not
work as you intended".



* Re: [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-11-23 17:03         ` Cornelia Huck
@ 2020-11-23 19:23           ` Tony Krowiak
  0 siblings, 0 replies; 68+ messages in thread
From: Tony Krowiak @ 2020-11-23 19:23 UTC (permalink / raw)
  To: Cornelia Huck, Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	alex.williamson, kwankhede, fiuczy, frankja, david, hca, gor



On 11/23/20 12:03 PM, Cornelia Huck wrote:
> On Sat, 14 Nov 2020 00:47:22 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>> On Fri, 13 Nov 2020 12:14:22 -0500
>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>> [..]
>>>>>    }
>>>>>    
>>>>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>>>>> +			 "already assigned to %s"
>>>>> +
>>>>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>>>>> +					 unsigned long *apm,
>>>>> +					 unsigned long *aqm)
>>>>> +{
>>>>> +	unsigned long apid, apqi;
>>>>> +
>>>>> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
>>>>> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>>>>> +			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
>>>> Isn't error rather severe for this? For my taste even warning would be
>>>> severe for this.
>>> The user only sees an EADDRINUSE returned from the sysfs interface,
>>> so Conny asked if I could log a message to indicate which APQNs are
>>> in use by which mdev. I can change this to an info message, but it
>>> will be missed if the log level is set higher. Maybe Conny can put in
>>> her two cents here since she asked for this.
>>>    
>> I'm looking forward to Conny's opinion. :)
> (only just saw this; -ETOOMANYEMAILS)
>
> It is probably not an error in the sense of "things are broken, this
> cannot work"; but I'd consider this at least a warning "this does not
> work as you intended".

Okay then, I'll make it a warning.

>
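The agreed change amounts to logging each conflicting APQN at warning level (pr_warn() rather than pr_err() in the kernel). Because for_each_set_bit_inv() walks the masks with s390's MSB-0 bit numbering, the sketch below models that in user space. model_test_bit_inv() and log_sharing_warn() are hypothetical stand-ins, and the messages are collected into a buffer with a "warning: " prefix in place of the kernel log level. One line is emitted per APQN in the Cartesian product of the adapter and domain masks.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
			 "already assigned to %s"

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* MSB-0 bit test: bit 0 is the most significant bit of bitmap[0],
 * mirroring the s390 test_bit_inv() convention. */
static int model_test_bit_inv(unsigned long nr, const unsigned long *bitmap)
{
	unsigned long word = bitmap[nr / MODEL_BITS_PER_LONG];
	unsigned long shift = MODEL_BITS_PER_LONG - 1 - nr % MODEL_BITS_PER_LONG;

	return (int)((word >> shift) & 1UL);
}

/* Append one warning line per (apid, apqi) pair set in apm x aqm and
 * return the number of conflicting APQNs. Output is truncated, never
 * overflowed, if out is too small. */
static size_t log_sharing_warn(const char *mdev_name,
			       const unsigned long *apm, unsigned long n_apids,
			       const unsigned long *aqm, unsigned long n_apqis,
			       char *out, size_t out_len)
{
	size_t used = 0, count = 0;

	assert(out_len > 0);
	out[0] = '\0';
	for (unsigned long apid = 0; apid < n_apids; apid++) {
		if (!model_test_bit_inv(apid, apm))
			continue;
		for (unsigned long apqi = 0; apqi < n_apqis; apqi++) {
			if (!model_test_bit_inv(apqi, aqm))
				continue;
			count++;
			if (used < out_len) {
				int n = snprintf(out + used, out_len - used,
						 "warning: " MDEV_SHARING_ERR "\n",
						 apid, apqi, mdev_name);
				if (n > 0)
					used += ((size_t)n < out_len - used) ?
						(size_t)n : out_len - used;
			}
		}
	}
	return count;
}
```

For example, an adapter mask with only APID 3 set and a domain mask with APQIs 4 and 5 set yields two lines, for queues 03.0004 and 03.0005.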



end of thread, other threads:[~2020-11-23 19:23 UTC | newest]

Thread overview: 68+ messages
2020-10-22 17:11 [PATCH v11 00/14] s390/vfio-ap: dynamic configuration support Tony Krowiak
2020-10-22 17:11 ` [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
2020-10-22 19:44   ` kernel test robot
2020-10-26 16:57     ` Tony Krowiak
2020-10-27  6:48   ` Halil Pasic
2020-10-29 23:29     ` Tony Krowiak
2020-10-30 16:13       ` Tony Krowiak
2020-10-30 17:27       ` Halil Pasic
2020-10-30 20:45         ` Tony Krowiak
2020-10-30 17:42       ` Halil Pasic
2020-10-30 20:37         ` Tony Krowiak
2020-10-31  3:43           ` Halil Pasic
2020-11-02 14:35             ` Tony Krowiak
2020-10-30 17:54       ` Halil Pasic
2020-10-30 20:53         ` Tony Krowiak
2020-10-30 21:13           ` Tony Krowiak
2020-10-30 17:56       ` Halil Pasic
2020-10-30 21:17         ` Tony Krowiak
2020-10-22 17:11 ` [PATCH v11 02/14] 390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
2020-10-27  7:01   ` Halil Pasic
2020-11-02 21:57     ` Tony Krowiak
2020-10-22 17:11 ` [PATCH v11 03/14] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
2020-10-27  9:33   ` Halil Pasic
2020-10-22 17:11 ` [PATCH v11 04/14] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
2020-10-27 13:01   ` Halil Pasic
2020-10-27 16:55   ` Harald Freudenberger
2020-11-13 21:30     ` Tony Krowiak
2020-11-14  0:00       ` Halil Pasic
2020-11-16 16:23         ` Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 05/14] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
2020-10-27 13:27   ` Halil Pasic
2020-11-13 17:14     ` Tony Krowiak
2020-11-13 23:47       ` Halil Pasic
2020-11-16 16:58         ` Tony Krowiak
2020-11-23 17:03         ` Cornelia Huck
2020-11-23 19:23           ` Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 06/14] s390/vfio-ap: introduce shadow APCB Tony Krowiak
2020-10-28  8:11   ` Halil Pasic
2020-11-13 17:18     ` Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 07/14] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
2020-10-28  8:17   ` Halil Pasic
2020-11-13 17:27     ` Tony Krowiak
2020-11-13 23:12       ` Halil Pasic
2020-11-19 18:15         ` Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 08/14] s390/vfio-ap: hot plug/unplug queues on bind/unbind of queue device Tony Krowiak
2020-10-22 20:30   ` kernel test robot
2020-10-26 17:04     ` Tony Krowiak
2020-10-28 13:57   ` Halil Pasic
2020-11-03 22:49     ` Tony Krowiak
2020-11-04 12:52       ` Halil Pasic
2020-11-04 21:20         ` Tony Krowiak
2020-11-05 12:27           ` Halil Pasic
2020-11-13 20:36             ` Tony Krowiak
2020-11-04 13:23       ` Halil Pasic
2020-10-22 17:12 ` [PATCH v11 09/14] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
2020-10-28 15:03   ` Halil Pasic
2020-10-22 17:12 ` [PATCH v11 10/14] s390/vfio-ap: allow hot plug/unplug of AP resources using " Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 11/14] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
2020-10-27 17:28   ` Harald Freudenberger
2020-11-13 20:58     ` Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 12/14] s390/vfio-ap: handle host AP config change notification Tony Krowiak
2020-10-22 21:17   ` kernel test robot
2020-10-26 17:07     ` Tony Krowiak
2020-10-26 17:21     ` Tony Krowiak
2020-11-03  9:48   ` kernel test robot
2020-11-13 21:06     ` Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 13/14] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
2020-10-22 17:12 ` [PATCH v11 14/14] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
