linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control
@ 2019-02-22 15:29 Pierre Morel
  2019-02-22 15:29 ` [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
                   ` (7 more replies)
  0 siblings, 8 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

This patch series implements PQAP/AQIC interception in KVM.

To implement this we need to add a new structure, vfio_ap_queue, to be
able to retrieve the mediated device associated with a queue and the
values needed to register/unregister the interrupt structures:
 - APQN: to be able to issue the commands and search for queue structures
 - NIB : to unpin the NIB on clear IRQ
 - ISC : to unregister with the GIB interface
 - MATRIX: a pointer to the matrix mediated device
 - LIST: the list_head to handle the vfio_ap_queue life cycle

Having this structure and the list management greatly eases the handling
of the AP queues and reduces the size of the vfio_ap driver by
more than 150 lines in comparison with the previous version.


0) Queue life cycle

vfio_ap_queues are created on probe

We define one bucket on the matrix device to store the free vfio_ap_queues,
i.e. the queues not assigned to any matrix mediated device.

We define one bucket on each matrix mediated device to hold the
vfio_ap_queues belonging to it.

vfio_ap_queues are deleted on remove

This makes the search for a queue easy, makes an inconsistent assignment
obvious to detect (the queue is not available), and simplifies assignment.
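
The two kinds of buckets described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions: the
names struct queue, struct bucket, bucket_find and bucket_move are
hypothetical, and a singly linked list stands in for the kernel list_head
used by the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the vfio_ap_queue of this series. */
struct queue {
	int apqn;
	struct queue *next;
};

/* A bucket is a list head: one free bucket, plus one bucket per mdev. */
struct bucket {
	struct queue *first;
};

/* Return the queue with the given APQN in bucket b, or NULL if absent. */
static struct queue *bucket_find(struct bucket *b, int apqn)
{
	struct queue *q;

	assert(b != NULL);
	for (q = b->first; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}

/*
 * Unlink the queue with the given APQN from src and push it onto dst.
 * Returns 0 on success, -1 if the queue is not in src (not available).
 */
static int bucket_move(struct bucket *src, struct bucket *dst, int apqn)
{
	struct queue **pp;

	assert(src != NULL && dst != NULL);
	for (pp = &src->first; *pp; pp = &(*pp)->next) {
		if ((*pp)->apqn == apqn) {
			struct queue *q = *pp;

			*pp = q->next;		/* unlink from src */
			q->next = dst->first;	/* push onto dst */
			dst->first = q;
			return 0;
		}
	}
	return -1;
}
```

A failed bucket_move() from the free bucket is the "queue is not
available" case: the queue is either unknown or already owned by another
mediated device.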


1) Phase 1, probe and remove from vfio_ap_queue

The vfio_ap_queue structures are dynamically allocated and set up
when a queue is probed by the ap_vfio_driver.
The vfio_ap_queue is linked to the ap_queue device as the driver data.

The new vfio_ap_queue is put on a free_list belonging to the
matrix device.

The vfio_ap_queues are freed during remove.


2) Phase 2, assignment of vfio_ap_queue to a mediated device

When an APID is assigned we look for the APQIs already assigned to
the matrix mediated device and associate all the queues with
APQN = (APID,APQI) to the mediated device by adding them to
the mediated device queue list.
We do the same when an APQI is assigned.

If any queue with a matching APQN cannot be found on the matrix
device free list, it means it is already associated with another matrix
mediated device and no queue is added to the matrix mediated device.
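
The all-or-nothing rule above can be sketched as follows. This is a
simplified model, not the code of the series: the table of slots, the
owner field (0 meaning "on the free list") and the names mk_apqn,
find_free and assign_adapter are all hypothetical, and the 8-bit/8-bit
packing of APID and APQI is an assumption about how the APQN is formed.
The series itself moves queues to a local list and rewinds on error; the
sketch uses a check-then-commit pass to reach the same result.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAINS 16	/* arbitrary bound for this sketch */

/* Assumed packing: APID in the high byte, APQI in the low byte. */
static int mk_apqn(int apid, int apqi)
{
	return ((apid & 0xff) << 8) | (apqi & 0xff);
}

/* Hypothetical model of a probed queue; owner 0 means "free". */
struct queue_slot {
	int apqn;
	int owner;
};

/* Return the free slot with the given APQN, or NULL if none exists. */
static struct queue_slot *find_free(struct queue_slot *tbl, size_t n, int apqn)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].apqn == apqn && tbl[i].owner == 0)
			return &tbl[i];
	return NULL;
}

/*
 * Assign adapter apid to owner for every domain listed in apqis[].
 * Either every (apid, apqi) queue is taken from the free set, or none is.
 */
static bool assign_adapter(struct queue_slot *tbl, size_t n, int owner,
			   int apid, const int *apqis, size_t napqi)
{
	struct queue_slot *picked[MAX_DOMAINS];
	size_t i;

	if (napqi > MAX_DOMAINS)
		return false;
	/* First pass: check availability without changing anything. */
	for (i = 0; i < napqi; i++) {
		picked[i] = find_free(tbl, n, mk_apqn(apid, apqis[i]));
		if (!picked[i])
			return false;
	}
	/* Second pass: every queue was free, so commit the assignment. */
	for (i = 0; i < napqi; i++)
		picked[i]->owner = owner;
	return true;
}
```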

3) Phase 3, starting the guest

When the VFIO device is opened the PQAP callback and a pointer to
the matrix mediated device are set inside KVM during the open callback.

When the device is closed or if a queue is removed, the vfio_ap_queue is
dissociated from the mediated device.


4) Phase 3 intercepting the PQAP/AQIC instruction

On interception of the PQAP/AQIC instruction, the interception code
makes sure the pqap_hook is initialized and allowed to be called
and calls it.
Otherwise it reports the usual -EOPNOTSUPP return code to let
QEMU handle the fault.
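
The order of the checks made on interception can be sketched as a pure
decision function. It follows the checks in handle_pqap() from patch 1/7,
but struct pqap_ctx, enum pqap_outcome and pqap_dispatch are hypothetical
names for this userspace illustration, and the booleans stand in for the
real vcpu state.

```c
#include <stdbool.h>
#include <stdint.h>

enum pqap_outcome {
	PQAP_TO_USERSPACE,	/* -EOPNOTSUPP: let QEMU handle it */
	PQAP_PRIVILEGED_OP,	/* inject a privileged-operation exception */
	PQAP_SPECIFICATION,	/* inject a specification exception */
	PQAP_CALL_HOOK,		/* hand over to the registered pqap_hook */
	PQAP_NO_QUEUE,		/* answer with response code 0x01 */
};

/* Hypothetical stand-in for the vcpu state consulted by the handler. */
struct pqap_ctx {
	uint64_t gpr0;
	bool ap_available;	/* AP instructions available on the host */
	bool guest_ap_enabled;	/* ECA_APIE set for the guest */
	bool problem_state;	/* PSW_MASK_PSTATE set */
	bool facility_65;	/* AQIC facility offered to the guest */
	bool hook_registered;	/* pqap_hook is not NULL */
};

/* The function code is taken from register 0, as in patch 1/7. */
static uint8_t pqap_function_code(uint64_t gpr0)
{
	return (uint8_t)(gpr0 >> 24);
}

static enum pqap_outcome pqap_dispatch(const struct pqap_ctx *c)
{
	if (!c->ap_available || !c->guest_ap_enabled)
		return PQAP_TO_USERSPACE;
	if (pqap_function_code(c->gpr0) != 0x03)	/* not AQIC */
		return PQAP_TO_USERSPACE;
	if (c->problem_state)
		return PQAP_PRIVILEGED_OP;
	if (!c->facility_65)
		return PQAP_SPECIFICATION;
	if (c->hook_registered)
		return PQAP_CALL_HOOK;
	return PQAP_NO_QUEUE;
}
```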
  
The pqap callback searches for the queue associated with the APQN
stored in register 0, setting the response code to "illegal APQN"
if the vfio_ap_queue cannot be found.

Depending on the "i" bit of register 1, the pqap callback
sets up or clears the interruption by issuing the host format PQAP/AQIC
instruction.
When setting up the interruption it uses the NIB and the guest ISC
provided by the guest and the host ISC provided by the registration
with the GIB code, pins the NIB, and stores the ISC and NIB inside
the vfio_ap_queue structure.
When clearing the interruption it retrieves the host ISC to unregister
with the GIB code and unpins the NIB.

We take care when enabling GISA that the guest may have issued a
reset and will not need to disable the interruptions before
re-enabling interruptions.


5) Phase 4 clean dissociation from the mediated device on remove

When the AP device is removed, the remove callback is called.
To be sure that the guest will not access the queue anymore
we clear the APID CRYCB bit.
Clearing the APID, rather than the APQI, is chosen because the architecture
specifies that only the APID can be dynamically changed outside IPL.


6) Associated QEMU patch

There is a QEMU patch which is needed to enable the PQAP/AQIC
facility in the guest.

Posted in qemu-devel@nongnu.org as:
Message-Id: <1550146494-21085-1-git-send-email-pmorel@linux.ibm.com>


Pierre Morel (7):
  s390: ap: kvm: add PQAP interception for AQIC
  s390: ap: new vfio_ap_queue structure
  s390: ap: associate a ap_vfio_queue and a matrix mdev
  vfio: ap: register IOMMU VFIO notifier
  s390: ap: implement PAPQ AQIC interception in kernel
  s390: ap: Cleanup on removing the AP device
  s390: ap: kvm: Enable PQAP/AQIC facility for the guest

 arch/s390/include/asm/kvm_host.h      |   2 +
 arch/s390/kvm/priv.c                  |  52 +++
 arch/s390/tools/gen_facilities.c      |   1 +
 drivers/s390/crypto/ap_bus.h          |   1 +
 drivers/s390/crypto/vfio_ap_drv.c     |  62 +++-
 drivers/s390/crypto/vfio_ap_ops.c     | 630 +++++++++++++++++++++++-----------
 drivers/s390/crypto/vfio_ap_private.h |  15 +
 7 files changed, 563 insertions(+), 200 deletions(-)

-- 
2.7.4

Changelog:
- Associating the vfio_queues during APID/APQI assign
  (Tony)
- Dissociating the vfio_queues during APID/APQI unassign
  (Tony)
- Taking care that the guest can directly disable the interrupt
  by using a RESET
  (Halil)
- Remove the patch creating the matrix bus to accelerate its
  integration in Linux stable
  (Christian)


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
@ 2019-02-22 15:29 ` Pierre Morel
  2019-02-25 18:36   ` Tony Krowiak
  2019-02-22 15:29 ` [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

We prepare the interception of the PQAP/AQIC instruction for
the case the AQIC facility is enabled in the guest.

We add a callback inside the KVM arch structure for s390 for
a VFIO driver to handle a specific response to the PQAP
instruction with the AQIC command.

We inject the correct exceptions from inside KVM for the case the
callback is not initialized, which happens when the vfio_ap driver
is not loaded.

If the callback has been set up we call it.
If not, we set up an answer indicating that no queue is available
for the guest.

We consider it the responsibility of the driver to always initialize
the PQAP callback if it defines queues by initializing the CRYCB for
a guest.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/kvm/priv.c             | 52 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index c5f5156..49cc8b0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -719,6 +719,7 @@ struct kvm_s390_cpu_model {
 
 struct kvm_s390_crypto {
 	struct kvm_s390_crypto_cb *crycb;
+	int (*pqap_hook)(struct kvm_vcpu *vcpu);
 	__u32 crycbd;
 	__u8 aes_kw;
 	__u8 dea_kw;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 8679bd7..3448abd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -27,6 +27,7 @@
 #include <asm/io.h>
 #include <asm/ptrace.h>
 #include <asm/sclp.h>
+#include <asm/ap.h>
 #include "gaccess.h"
 #include "kvm-s390.h"
 #include "trace.h"
@@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * handle_pqap: Handling pqap interception
+ * @vcpu: the vcpu that issued the pqap instruction
+ *
+ * We now support PQAP/AQIC instructions and we need to correctly
+ * answer the guest even if no dedicated driver's hook is available.
+ *
+ * The intercepting code calls a dedicated callback for this instruction
+ * if a driver did register one in the CRYPTO satellite of the
+ * SIE block.
+ *
+ * For PQAP/AQIC instructions only, verify privilege and specifications.
+ *
+ * If no callback available, the queues are not available, return this to
+ * the caller.
+ * Else return the value returned by the callback.
+ */
+static int handle_pqap(struct kvm_vcpu *vcpu)
+{
+	uint8_t fc;
+	struct ap_queue_status status = {};
+
+	/* Verify that the AP instructions are available */
+	if (!ap_instructions_available())
+		return -EOPNOTSUPP;
+	/* Verify that the guest is allowed to use AP instructions */
+	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
+		return -EOPNOTSUPP;
+	/* Verify that the function code is AQIC */
+	fc = vcpu->run->s.regs.gprs[0] >> 24;
+	if (fc != 0x03)
+		return -EOPNOTSUPP;
+
+	/* PQAP instructions are allowed for guest kernel only */
+	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
+		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
+	/* AQIC instruction is allowed only if facility 65 is available */
+	if (!test_kvm_facility(vcpu->kvm, 65))
+		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+	/* Verify that the hook callback is registered and call it */
+	if (vcpu->kvm->arch.crypto.pqap_hook)
+		return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
+
+	/* PQAP/AQIC instructions are authorized but there is no queue */
+	status.response_code = 0x01;
+	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
+	return 0;
+}
+
 static int handle_stfl(struct kvm_vcpu *vcpu)
 {
 	int rc;
@@ -878,6 +928,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
 		return handle_sthyi(vcpu);
 	case 0x7d:
 		return handle_stsi(vcpu);
+	case 0xaf:
+		return handle_pqap(vcpu);
 	case 0xb1:
 		return handle_stfl(vcpu);
 	case 0xb2:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
  2019-02-22 15:29 ` [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
@ 2019-02-22 15:29 ` Pierre Morel
  2019-02-26 16:10   ` Tony Krowiak
  2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

The AP interruptions are assigned on a per-queue basis and
the GISA structure is handled on a per-VM basis, so
we need to add a structure we can retrieve from both sides,
holding the information we need to handle PQAP/AQIC interception
and set up the GISA.

Since we can not add more information to the ap_device
we add a new structure vfio_ap_queue, to hold per queue
information useful to handle interruptions and set it as
driver's data of the standard ap_queue device.

Usually, the device and the mediated device are linked together
but in the vfio_ap driver design we have a bunch of "sub" devices
(the ap_queue devices) belonging to the mediated device.

Linking these structures to the mediated device they are assigned to,
with the help of the vfio_ap_queue structure, will help us to
retrieve the AP devices associated with the mediated devices
during the mediated device operations.

------------    -------------
| AP queue |--> | AP_vfio_q |<----
------------    ------^------    |    ---------------
                      |          <--->| matrix_mdev |
------------    ------v------    |    ---------------
| AP queue |--> | AP_vfio_q |-----
------------    -------------

The vfio_ap_queue device will hold the following entries:
- apqn: AP queue number (defined here)
- isc : Interrupt subclass (defined later)
- nib : notification information byte (defined later)
- list: a list_head entry allowing to link this structure to a
	matrix mediated device it is assigned to.

The vfio_ap_queue structure is allocated when the vfio_ap_driver
is probed and added as driver data to the ap_queue device.
It is freed on remove.

The structure is linked to the matrix_dev host device at the
probe of the device building some kind of free list for the
matrix mediated devices.

When the vfio_ap_queue is associated with a matrix mediated device,
it is linked to this matrix mediated device
and unlinked when dissociated.

This patch and the next 3 can be squashed together in the
final release of this series.
Until then I keep them separate to ease the review.

So please do not complain about unused functions or about
squashing the patches together, this will be resolved during
the last iteration.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     | 27 ++++++++++++++++++++++++++-
 drivers/s390/crypto/vfio_ap_private.h |  9 +++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index e9824c3..eca0ffc 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -40,14 +40,38 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
+/**
+ * vfio_ap_queue_dev_probe:
+ *
+ * Allocate a vfio_ap_queue structure and associate it
+ * with the device as driver_data.
+ */
 static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 {
+	struct vfio_ap_queue *q;
+
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		return -ENOMEM;
+	dev_set_drvdata(&apdev->device, q);
+	q->apqn = to_ap_queue(&apdev->device)->qid;
+	INIT_LIST_HEAD(&q->list);
+	list_add(&q->list, &matrix_dev->free_list);
 	return 0;
 }
 
+/**
+ * vfio_ap_queue_dev_remove:
+ *
+ * Free the associated vfio_ap_queue structure
+ */
 static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 {
-	/* Nothing to do yet */
+	struct vfio_ap_queue *q;
+
+	q = dev_get_drvdata(&apdev->device);
+	list_del(&q->list);
+	kfree(q);
 }
 
 static void vfio_ap_matrix_dev_release(struct device *dev)
@@ -107,6 +131,7 @@ static int vfio_ap_matrix_dev_create(void)
 	matrix_dev->device.bus = &matrix_bus;
 	matrix_dev->device.release = vfio_ap_matrix_dev_release;
 	matrix_dev->vfio_ap_drv = &vfio_ap_drv;
+	INIT_LIST_HEAD(&matrix_dev->free_list);
 
 	ret = device_register(&matrix_dev->device);
 	if (ret)
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 76b7f98..2760178 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -39,6 +39,7 @@ struct ap_matrix_dev {
 	atomic_t available_instances;
 	struct ap_config_info info;
 	struct list_head mdev_list;
+	struct list_head free_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
 };
@@ -81,9 +82,17 @@ struct ap_matrix_mdev {
 	struct ap_matrix matrix;
 	struct notifier_block group_notifier;
 	struct kvm *kvm;
+	struct list_head qlist;
 };
 
 extern int vfio_ap_mdev_register(void);
 extern void vfio_ap_mdev_unregister(void);
 
+struct vfio_ap_queue {
+	struct list_head list;
+	struct ap_matrix_mdev *matrix_mdev;
+	unsigned long nib;
+	int	apqn;
+	unsigned char isc;
+};
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
  2019-02-22 15:29 ` [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
  2019-02-22 15:29 ` [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
@ 2019-02-22 15:29 ` Pierre Morel
  2019-02-26 18:14   ` Tony Krowiak
                     ` (3 more replies)
  2019-02-22 15:29 ` [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
                   ` (4 subsequent siblings)
  7 siblings, 4 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

We need to associate the ap_vfio_queue, which will hold the
per-queue information for interrupts, with a matrix mediated device,
which holds the configuration and the way to the CRYCB.

Let's do this when assigning an APID or an APQI to the mediated device
and clear the relation when unassigning.

Queuing the devices on a list of free devices and testing the
matrix_mdev pointer to the associated matrix allows us to know
whether the queue is associated with the matrix device and whether
or not it is associated with a mediated device.

When resetting an AP queue we must wait until there are no more
messages in the message queue before considering the queue to be
in a clean state.

Let's do it and wait until the status response code indicates the
queue is empty after issuing a PQAP/ZAPQ instruction.
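
The waiting step can be pictured as a bounded polling loop. The sketch
below is a userspace illustration only: wait_queue_empty, the callback
types, and the countdown_empty and no_sleep helpers are hypothetical
names, with the callbacks standing in for the status test and the sleep
used by the driver.

```c
#include <stdbool.h>

/* Hypothetical probe: returns true once the queue reports "empty". */
typedef bool (*queue_empty_fn)(void *ctx);
/* Hypothetical delay hook, so the sketch does not depend on a real sleep. */
typedef void (*sleep_fn)(unsigned int ms);

/*
 * Poll at most max_polls times, sleeping ms milliseconds between polls.
 * Returns the number of unsuccessful polls before the queue was seen
 * empty, or -1 if it was still not empty after max_polls polls.
 */
static int wait_queue_empty(queue_empty_fn is_empty, void *ctx,
			    sleep_fn nap, int max_polls, unsigned int ms)
{
	for (int polls = 0; polls < max_polls; polls++) {
		if (is_empty(ctx))
			return polls;
		nap(ms);
	}
	return -1;
}

/* Fake probe for demonstration: ctx counts the messages still pending. */
static bool countdown_empty(void *ctx)
{
	int *left = ctx;

	if (*left == 0)
		return true;
	(*left)--;
	return false;
}

/* Fake delay: does nothing, so the demonstration runs instantly. */
static void no_sleep(unsigned int ms)
{
	(void)ms;
}
```

Bounding the number of polls keeps a queue that never drains from
blocking the caller forever; the caller can then report the queue as
not empty.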

Being at work on the reset function, let's simplify
vfio_ap_mdev_reset_queue and vfio_ap_mdev_reset_queues by using the
vfio_ap_queue structure as parameter.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 385 +++++++++++++++++++-------------------
 1 file changed, 189 insertions(+), 196 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 900b9cf..172d6eb 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -24,6 +24,57 @@
 #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
 #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
 
+/**
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
+ * @apqn: The queue APQN
+ *
+ * Retrieve a queue with a specific APQN from the list of the
+ * devices associated with a list.
+ *
+ * Returns the pointer to the associated vfio_ap_queue
+ */
+struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
+{
+	struct vfio_ap_queue *q;
+
+	list_for_each_entry(q, l, list)
+		if (q->apqn == apqn)
+			return q;
+	return NULL;
+}
+
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
+{
+	struct ap_queue_status status;
+	int retry = 20;
+
+	do {
+		status = ap_zapq(q->apqn);
+		switch (status.response_code) {
+		case AP_RESPONSE_NORMAL:
+			while (!status.queue_empty && retry--) {
+				msleep(20);
+				status = ap_tapq(q->apqn, NULL);
+			}
+			if (retry <= 0)
+				pr_warn("%s: queue 0x%04x not empty\n",
+					__func__, q->apqn);
+			return 0;
+		case AP_RESPONSE_RESET_IN_PROGRESS:
+		case AP_RESPONSE_BUSY:
+			msleep(20);
+			break;
+		default:
+			/* things are really broken, give up */
+			pr_warn("%s: zapq error %02x on apqn 0x%04x\n",
+				__func__, status.response_code, q->apqn);
+			return -EIO;
+		}
+	} while (retry--);
+
+	return -EBUSY;
+}
+
 static void vfio_ap_matrix_init(struct ap_config_info *info,
 				struct ap_matrix *matrix)
 {
@@ -45,6 +96,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 		return -ENOMEM;
 	}
 
+	INIT_LIST_HEAD(&matrix_mdev->qlist);
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	mutex_lock(&matrix_dev->lock);
@@ -113,162 +165,160 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
 	NULL,
 };
 
-struct vfio_ap_queue_reserved {
-	unsigned long *apid;
-	unsigned long *apqi;
-	bool reserved;
-};
+static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
+	if (!q)
+		return;
+	q->matrix_mdev = NULL;
+	vfio_ap_mdev_reset_queue(q);
+	list_move(&q->list, &matrix_dev->free_list);
+}
 
 /**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
+ * vfio_ap_put_all_domains:
  *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
+ * @matrix_mdev: the matrix mediated device from which we want to dissociate
+ *		 all queues with a given apid.
+ * @apid:	 The apid which, combined with each APQI defined for the
+ *		 mediated device, identifies an AP queue.
  *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
+ * We remove the queue from the list of queues associated with the
+ * mediated device and put them back to the free list of the matrix
+ * device and clear the matrix_mdev pointer.
  */
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
+				    int apid)
 {
-	struct vfio_ap_queue_reserved *qres = data;
-	struct ap_queue *ap_queue = to_ap_queue(dev);
-	ap_qid_t qid;
-	unsigned long id;
+	int apqi, apqn;
 
-	if (qres->apid && qres->apqi) {
-		qid = AP_MKQID(*qres->apid, *qres->apqi);
-		if (qid == ap_queue->qid)
-			qres->reserved = true;
-	} else if (qres->apid && !qres->apqi) {
-		id = AP_QID_CARD(ap_queue->qid);
-		if (id == *qres->apid)
-			qres->reserved = true;
-	} else if (!qres->apid && qres->apqi) {
-		id = AP_QID_QUEUE(ap_queue->qid);
-		if (id == *qres->apqi)
-			qres->reserved = true;
-	} else {
-		return -EINVAL;
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+		apqn = AP_MKQID(apid, apqi);
+		vfio_ap_free_queue(apqn, matrix_mdev);
 	}
-
-	return 0;
 }
 
 /**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
+ * vfio_ap_put_all_cards:
  *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
+ * @matrix_mdev: the matrix mediated device from which we want to dissociate
+ *		 all queues with a given apqi.
+ * @apqi:	 The apqi which, combined with each APID defined for the
+ *		 mediated device, identifies an AP queue.
  *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
+ * We remove the queue from the list of queues associated with the
+ * mediated device and put them back to the free list of the matrix
+ * device and clear the matrix_mdev pointer.
  */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-					 unsigned long *apqi)
+static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
 {
-	int ret;
-	struct vfio_ap_queue_reserved qres;
+	int apid, apqn;
 
-	qres.apid = apid;
-	qres.apqi = apqi;
-	qres.reserved = false;
-
-	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				     &qres, vfio_ap_has_queue);
-	if (ret)
-		return ret;
-
-	if (qres.reserved)
-		return 0;
-
-	return -EADDRNOTAVAIL;
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		apqn = AP_MKQID(apid, apqi);
+		vfio_ap_free_queue(apqn, matrix_mdev);
+	}
 }
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apid)
+static void move_and_set(struct list_head *src, struct list_head *dst,
+			 struct ap_matrix_mdev *matrix_mdev)
 {
-	int ret;
-	unsigned long apqi;
-	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(&apid, NULL);
+	struct vfio_ap_queue *q, *qtmp;
 
-	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
+	list_for_each_entry_safe(q, qtmp, src, list) {
+		list_move(&q->list, dst);
+		q->matrix_mdev = matrix_mdev;
 	}
-
+}
+/**
+ * vfio_ap_get_all_domains:
+ *
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ *		 all available queues with a given apid.
+ * @apid:	 The apid which, combined with each APQI defined for the
+ *		 mediated device, identifies an AP queue.
+ *
+ * We define a local list to put all queues we find on the matrix device
+ * free list when associating the apid with all already defined apqi for
+ * this matrix mediated device.
+ *
+ * If we can get all the devices we roll them to the mediated device list
+ * If we get errors we unroll them to the free list.
+ */
+static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
+{
+	int apqi, apqn;
+	int ret = 0;
+	struct vfio_ap_queue *q;
+	struct list_head q_list;
+
+	INIT_LIST_HEAD(&q_list);
+
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+		apqn = AP_MKQID(apid, apqi);
+		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
+		if (!q) {
+			ret = -EADDRNOTAVAIL;
+			goto rewind;
+		}
+		if (q->matrix_mdev) {
+			ret = -EADDRINUSE;
+			goto rewind;
+		}
+		list_move(&q->list, &q_list);
+	}
+	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
 	return 0;
+rewind:
+	move_and_set(&q_list, &matrix_dev->free_list, NULL);
+	return ret;
 }
-
 /**
- * vfio_ap_mdev_verify_no_sharing
+ * vfio_ap_get_all_cards:
  *
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
- * mediated device. AP queue sharing is not allowed.
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ *		 all available queues with a given apqi.
+ * @apqi:	 The apqi which associated with all defined APID of the
+ *		 mediated device will define a AP queue.
  *
- * @matrix_mdev: the mediated matrix device
+ * We define a local list to put all queues we find on the matrix device
+ * free list when associating the apqi with all already defined apid for
+ * this matrix mediated device.
  *
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * If we can get all the devices we roll them to the mediated device list
+ * If we get errors we unroll them to the free list.
  */
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
 {
-	struct ap_matrix_mdev *lstdev;
-	DECLARE_BITMAP(apm, AP_DEVICES);
-	DECLARE_BITMAP(aqm, AP_DOMAINS);
-
-	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
-		if (matrix_mdev == lstdev)
-			continue;
-
-		memset(apm, 0, sizeof(apm));
-		memset(aqm, 0, sizeof(aqm));
-
-		/*
-		 * We work on full longs, as we can only exclude the leftover
-		 * bits in non-inverse order. The leftover is all zeros.
-		 */
-		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-				lstdev->matrix.apm, AP_DEVICES))
-			continue;
-
-		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
-				lstdev->matrix.aqm, AP_DOMAINS))
-			continue;
-
-		return -EADDRINUSE;
+	int apid, apqn;
+	int ret = 0;
+	struct vfio_ap_queue *q;
+	struct list_head q_list;
+	struct ap_matrix_mdev *tmp = NULL;
+
+	INIT_LIST_HEAD(&q_list);
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		apqn = AP_MKQID(apid, apqi);
+		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
+		if (!q) {
+			ret = -EADDRNOTAVAIL;
+			goto rewind;
+		}
+		if (q->matrix_mdev) {
+			ret = -EADDRINUSE;
+			goto rewind;
+		}
+		list_move(&q->list, &q_list);
 	}
-
+	tmp = matrix_mdev;
+	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
 	return 0;
+rewind:
+	move_and_set(&q_list, &matrix_dev->free_list, NULL);
+	return ret;
 }
 
 /**
@@ -330,21 +380,15 @@ static ssize_t assign_adapter_store(struct device *dev,
 	 */
 	mutex_lock(&matrix_dev->lock);
 
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
+	ret = vfio_ap_get_all_domains(matrix_mdev, apid);
 	if (ret)
 		goto done;
 
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
-	if (ret)
-		goto share_err;
-
 	ret = count;
 	goto done;
 
-share_err:
-	clear_bit_inv(apid, matrix_mdev->matrix.apm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -391,32 +435,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+	vfio_ap_put_all_domains(matrix_mdev, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apqi)
-{
-	int ret;
-	unsigned long apid;
-	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 /**
  * assign_domain_store
  *
@@ -471,21 +496,15 @@ static ssize_t assign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
+	ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
 	if (ret)
 		goto done;
 
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
-	if (ret)
-		goto share_err;
-
 	ret = count;
 	goto done;
 
-share_err:
-	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -533,6 +552,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+	vfio_ap_put_all_cards(matrix_mdev, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -790,49 +810,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-				    unsigned int retry)
-{
-	struct ap_queue_status status;
-
-	do {
-		status = ap_zapq(AP_MKQID(apid, apqi));
-		switch (status.response_code) {
-		case AP_RESPONSE_NORMAL:
-			return 0;
-		case AP_RESPONSE_RESET_IN_PROGRESS:
-		case AP_RESPONSE_BUSY:
-			msleep(20);
-			break;
-		default:
-			/* things are really broken, give up */
-			return -EIO;
-		}
-	} while (retry--);
-
-	return -EBUSY;
-}
-
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 {
 	int ret;
 	int rc = 0;
-	unsigned long apid, apqi;
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+	struct vfio_ap_queue *q;
 
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
-			     matrix_mdev->matrix.apm_max + 1) {
-		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
-				     matrix_mdev->matrix.aqm_max + 1) {
-			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
-			/*
-			 * Regardless whether a queue turns out to be busy, or
-			 * is not operational, we need to continue resetting
-			 * the remaining queues.
-			 */
-			if (ret)
-				rc = ret;
-		}
+	list_for_each_entry(q, &matrix_mdev->qlist, list) {
+		ret = vfio_ap_mdev_reset_queue(q);
+		/*
+		 * Regardless whether a queue turns out to be busy, or
+		 * is not operational, we need to continue resetting
+		 * the remaining queues but notice the last error code.
+		 */
+		if (ret)
+			rc = ret;
 	}
 
 	return rc;
-- 
2.7.4
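
The list-based vfio_ap_mdev_reset_queues() above keeps resetting the
remaining queues after a failure and returns the last error it saw. A
minimal userspace sketch of that pattern follows. struct fake_queue,
fake_reset_queue() and fake_reset_queues() are hypothetical stand-ins
for illustration only, not the kernel types or the real ap_zapq() path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfio_ap_queue; not the kernel type. */
struct fake_queue {
	int apqn;
	int reset_result;	/* 0 on success, negative errno otherwise */
	int reset_done;		/* set once a reset was attempted */
};

/* Hypothetical stand-in for vfio_ap_mdev_reset_queue(). */
int fake_reset_queue(struct fake_queue *q)
{
	q->reset_done = 1;
	return q->reset_result;
}

/*
 * Reset every queue, even when an earlier one failed, and return the
 * last non-zero error code (0 if all resets succeeded).
 */
int fake_reset_queues(struct fake_queue *queues, size_t n)
{
	int rc = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fake_reset_queue(&queues[i]);

		if (ret)
			rc = ret;
	}
	return rc;
}
```

With three queues whose resets return 0, -EIO and -EBUSY, all three are
still attempted and the caller sees -EBUSY, the last error.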



* [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (2 preceding siblings ...)
  2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
@ 2019-02-22 15:29 ` Pierre Morel
  2019-02-27  9:42   ` Cornelia Huck
  2019-02-28  8:23   ` Christian Borntraeger
  2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

To use the VFIO interface for pinning and unpinning the
memory of the mediated device, we need to register
a notifier for IOMMU events.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 53 ++++++++++++++++++++++++++++++++---
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 172d6eb..1b5130a 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -748,6 +748,36 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 };
 
 /**
+ * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
+ *
+ * @nb: The notifier block
+ * @action: Action to be taken (VFIO_IOMMU_NOTIFY_DMA_UNMAP)
+ * @data: the specific unmap structure for vfio_iommu_type1
+ *
+ * Unpins the guest IOVA (the NIB guest address we pinned before).
+ * Returns NOTIFY_OK after unpinning on an UNMAP request,
+ * otherwise returns NOTIFY_DONE.
+ */
+static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct ap_matrix_mdev *matrix_mdev;
+
+	matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier);
+
+	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
+		struct vfio_iommu_type1_dma_unmap *unmap = data;
+		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
+
+		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
+		return NOTIFY_OK;
+	}
+
+	return NOTIFY_DONE;
+}
+
+
+/**
  * vfio_ap_mdev_set_kvm
  *
  * @matrix_mdev: a mediated matrix device
@@ -846,12 +876,25 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 
 	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				     &events, &matrix_mdev->group_notifier);
-	if (ret) {
-		module_put(THIS_MODULE);
-		return ret;
-	}
+	if (ret)
+		goto err_group;
+
+	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
+	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+
+	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				     &events, &matrix_mdev->iommu_notifier);
+	if (ret)
+		goto err_iommu;
 
 	return 0;
+
+err_iommu:
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+				 &matrix_mdev->group_notifier);
+err_group:
+	module_put(THIS_MODULE);
+	return ret;
 }
 
 static void vfio_ap_mdev_release(struct mdev_device *mdev)
@@ -864,6 +907,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 	vfio_ap_mdev_reset_queues(mdev);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				 &matrix_mdev->iommu_notifier);
 	matrix_mdev->kvm = NULL;
 	module_put(THIS_MODULE);
 }
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 2760178..e535735 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -81,8 +81,10 @@ struct ap_matrix_mdev {
 	struct list_head node;
 	struct ap_matrix matrix;
 	struct notifier_block group_notifier;
+	struct notifier_block iommu_notifier;
 	struct kvm *kvm;
 	struct list_head qlist;
+	struct mdev_device *mdev;
 };
 
 extern int vfio_ap_mdev_register(void);
-- 
2.7.4
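
The reworked vfio_ap_mdev_open() in this patch unwinds in reverse order
through the err_iommu and err_group labels. A minimal userspace sketch
of that goto-based unwinding follows, assuming a hypothetical struct
fake_mdev whose fields stand in for the module reference and the two
notifier registrations. None of these names exist in the kernel.

```c
#include <errno.h>

/* Hypothetical state standing in for the real registrations. */
struct fake_mdev {
	int module_refs;
	int group_registered;
	int iommu_registered;
	int fail_group;		/* make the first registration fail */
	int fail_iommu;		/* make the second registration fail */
};

int fake_open(struct fake_mdev *m)
{
	int ret;

	m->module_refs++;			/* models try_module_get() */

	ret = m->fail_group ? -EINVAL : 0;	/* group notifier */
	if (ret)
		goto err_group;
	m->group_registered = 1;

	ret = m->fail_iommu ? -EINVAL : 0;	/* IOMMU notifier */
	if (ret)
		goto err_iommu;
	m->iommu_registered = 1;

	return 0;

err_iommu:
	m->group_registered = 0;		/* undo the first registration */
err_group:
	m->module_refs--;			/* models module_put() */
	return ret;
}
```

Each label undoes only what succeeded before the failing step, so a
failure of the second registration drops the first one and the module
reference, while a failure of the first drops only the reference.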



* [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (3 preceding siblings ...)
  2019-02-22 15:29 ` [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
@ 2019-02-22 15:29 ` Pierre Morel
  2019-02-26 18:23   ` Tony Krowiak
                     ` (3 more replies)
  2019-02-22 15:29 ` [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
                   ` (2 subsequent siblings)
  7 siblings, 4 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

We register the AP PQAP instruction hook when the mediated
device is opened and unregister it on release.

In the AP PQAP instruction hook, if we receive a request to
enable IRQs,
- we retrieve the vfio_ap_queue based on the APQN we receive
  in REG0,
- we retrieve the page of the guest address (NIB) from
  register REG2,
- we use the mediated device to access the VFIO pinning infrastructure
  and pin the page of the guest address,
- we retrieve the pointer to KVM to register the guest ISC
  and retrieve the host ISC,
- finally we activate GISA.

If we receive a request to disable IRQs,
- we deactivate GISA,
- we unregister from the GIB,
- we unpin the NIB.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h      |   1 +
 drivers/s390/crypto/ap_bus.h          |   1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 199 +++++++++++++++++++++++++++++++++-
 drivers/s390/crypto/vfio_ap_private.h |   1 +
 4 files changed, 199 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 49cc8b0..5f3bb8c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
 struct kvm_s390_crypto {
 	struct kvm_s390_crypto_cb *crycb;
 	int (*pqap_hook)(struct kvm_vcpu *vcpu);
+	void *vfio_private;
 	__u32 crycbd;
 	__u8 aes_kw;
 	__u8 dea_kw;
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index bfc66e4..323f2aa 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
 #define AP_RESPONSE_BUSY		0x05
 #define AP_RESPONSE_INVALID_ADDRESS	0x06
 #define AP_RESPONSE_OTHERWISE_CHANGED	0x07
+#define AP_RESPONSE_INVALID_GISA	0x08
 #define AP_RESPONSE_Q_FULL		0x10
 #define AP_RESPONSE_NO_PENDING_REPLY	0x10
 #define AP_RESPONSE_INDEX_TOO_BIG	0x11
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1b5130a..0196065 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -43,7 +43,7 @@ struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
 	return NULL;
 }
 
-static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
+int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
 {
 	struct ap_queue_status status;
 	int retry = 20;
@@ -75,6 +75,27 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
 	return -EBUSY;
 }
 
+/**
+ * vfio_ap_free_irq:
+ * @q: The vfio_ap_queue
+ *
+ * Unpin the guest NIB
+ * Unregister the ISC from the GIB alert
+ * Clear the vfio_ap_queue intern fields
+ */
+static void vfio_ap_free_irq(struct vfio_ap_queue *q)
+{
+	if (!q)
+		return;
+	if (q->g_pfn)
+		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
+	if (q->isc)
+		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);
+	q->nib = 0;
+	q->isc = 0;
+	q->g_pfn = 0;
+}
+
 static void vfio_ap_matrix_init(struct ap_config_info *info,
 				struct ap_matrix *matrix)
 {
@@ -97,6 +118,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	}
 
 	INIT_LIST_HEAD(&matrix_mdev->qlist);
+	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	mutex_lock(&matrix_dev->lock);
@@ -109,10 +131,16 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+	struct vfio_ap_queue *q, *qtmp;
 
 	if (matrix_mdev->kvm)
 		return -EBUSY;
 
+	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
+		q->matrix_mdev = NULL;
+		vfio_ap_mdev_reset_queue(q);
+		list_move(&q->list, &matrix_dev->free_list);
+	}
 	mutex_lock(&matrix_dev->lock);
 	list_del(&matrix_mdev->node);
 	mutex_unlock(&matrix_dev->lock);
@@ -748,6 +776,161 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 };
 
 /**
+ * vfio_ap_clrirq: Disable Interruption for an APQN
+ *
+ * @dev: the device associated with the ap_queue
+ * @q:   the vfio_ap_queue holding AQIC parameters
+ *
+ * Issue the host side PQAP/AQIC
+ * On success: unpin the NIB saved in *q and unregister from GIB
+ * interface
+ *
+ * Return the ap_queue_status returned by the ap_aqic()
+ */
+static struct ap_queue_status vfio_ap_clrirq(struct vfio_ap_queue *q)
+{
+	struct ap_qirq_ctrl aqic_gisa = {};
+	struct ap_queue_status status;
+
+	status = ap_aqic(q->apqn, aqic_gisa, NULL);
+	if (!status.response_code)
+		vfio_ap_free_irq(q);
+
+	return status;
+}
+
+/**
+ * vfio_ap_setirq: Enable Interruption for an APQN
+ *
+ * @dev: the device associated with the ap_queue
+ * @q:   the vfio_ap_queue holding AQIC parameters
+ *
+ * Pin the NIB saved in *q
+ * Register the guest ISC to GIB interface and retrieve the
+ * host ISC to issue the host side PQAP/AQIC
+ *
+ * Response.status may be set to the following response codes on error:
+ * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
+ * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervisor GISA internal error
+ *
+ * Otherwise return the ap_queue_status returned by the ap_aqic()
+ */
+static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
+{
+	struct ap_qirq_ctrl aqic_gisa = {};
+	struct ap_queue_status status = {};
+	struct kvm_s390_gisa *gisa;
+	struct kvm *kvm;
+	unsigned long g_pfn, h_nib, h_pfn;
+	int ret;
+
+	kvm = q->matrix_mdev->kvm;
+	gisa = kvm->arch.gisa_int.origin;
+
+	g_pfn = q->nib >> PAGE_SHIFT;
+	ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1,
+			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
+	switch (ret) {
+	case 1:
+		break;
+	case -EINVAL:
+	case -E2BIG:
+		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
+		/* Fallthrough */
+	default:
+		return status;
+	}
+
+	h_nib = (h_pfn << PAGE_SHIFT) | (q->nib & ~PAGE_MASK);
+	aqic_gisa.gisc = q->isc;
+	aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->isc);
+	aqic_gisa.ir = 1;
+	aqic_gisa.gisa = gisa->next_alert >> 4;
+
+	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
+	switch (status.response_code) {
+	case AP_RESPONSE_NORMAL:
+		if (q->g_pfn)
+			vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
+					 &q->g_pfn, 1);
+		q->g_pfn = g_pfn;
+		break;
+	case AP_RESPONSE_OTHERWISE_CHANGED:
+		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1);
+		break;
+	case AP_RESPONSE_INVALID_GISA:
+		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
+	default:	/* Fall Through */
+		pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
+			status.response_code);
+		vfio_ap_free_irq(q);
+		break;
+	}
+
+	return status;
+}
+
+/**
+ * handle_pqap: PQAP instruction callback
+ *
+ * @vcpu: The vcpu on which we received the PQAP instruction
+ *
+ * Get the general register contents to initialize internal variables.
+ * REG[0]: APQN
+ * REG[1]: IR and ISC
+ * REG[2]: NIB
+ *
+ * Response.status may be set to the following response codes:
+ * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
+ * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
+ * - AP_RESPONSE_NORMAL (0) : in case of success
+ *   Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible RC.
+ *
+ * Return 0 if we could handle the request inside KVM,
+ * otherwise return -EOPNOTSUPP to let QEMU handle the fault.
+ */
+static int handle_pqap(struct kvm_vcpu *vcpu)
+{
+	uint64_t status;
+	uint16_t apqn;
+	struct vfio_ap_queue *q;
+	struct ap_queue_status qstatus = {};
+	struct ap_matrix_mdev *matrix_mdev;
+
+	/* If we do not use the AIV facility just go to userland */
+	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
+		return -EOPNOTSUPP;
+
+	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
+	matrix_mdev = vcpu->kvm->arch.crypto.vfio_private;
+	if (!matrix_mdev)
+		return -EOPNOTSUPP;
+	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
+	if (!q) {
+		qstatus.response_code = AP_RESPONSE_Q_NOT_AVAIL;
+		goto out;
+	}
+
+	status = vcpu->run->s.regs.gprs[1];
+
+	/* If IR bit(16) is set we enable the interrupt */
+	if ((status >> (63 - 16)) & 0x01) {
+		q->isc = status & 0x07;
+		q->nib = vcpu->run->s.regs.gprs[2];
+		qstatus = vfio_ap_setirq(q);
+		if (qstatus.response_code) {
+			q->nib = 0;
+			q->isc = 0;
+		}
+	} else
+		qstatus = vfio_ap_clrirq(q);
+
+out:
+	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
+	return 0;
+}
+
+ /*
  * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
  *
  * @nb: The notifier block
@@ -767,9 +950,10 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 
 	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
 		struct vfio_iommu_type1_dma_unmap *unmap = data;
-		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
+		unsigned long pfn = unmap->iova >> PAGE_SHIFT;
 
-		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
+		if (matrix_mdev->mdev)
+			vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
 		return NOTIFY_OK;
 	}
 
@@ -879,6 +1063,11 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 	if (ret)
 		goto err_group;
 
+	if (!matrix_mdev->kvm) {
+		ret = -ENODEV;
+		goto err_iommu;
+	}
+
 	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
 	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
 
@@ -887,6 +1076,8 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 	if (ret)
 		goto err_iommu;
 
+	matrix_mdev->kvm->arch.crypto.pqap_hook = handle_pqap;
+	matrix_mdev->kvm->arch.crypto.vfio_private = matrix_mdev;
 	return 0;
 
 err_iommu:
@@ -905,6 +1096,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
 
 	vfio_ap_mdev_reset_queues(mdev);
+	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+	matrix_mdev->kvm->arch.crypto.vfio_private = NULL;
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index e535735..e2fd2c0 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -94,6 +94,7 @@ struct vfio_ap_queue {
 	struct list_head list;
 	struct ap_matrix_mdev *matrix_mdev;
 	unsigned long nib;
+	unsigned long g_pfn;
 	int	apqn;
 	unsigned char isc;
 };
-- 
2.7.4
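
The register handling in handle_pqap() and the host NIB computation in
vfio_ap_setirq() above can be sketched in userspace as below. This is
an illustration under stated assumptions: struct aqic_request,
decode_aqic(), host_nib() and the SKETCH_* macros are hypothetical
names, and a 4 KiB page size is assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* Hypothetical container for the decoded PQAP/AQIC operands. */
struct aqic_request {
	uint16_t apqn;	/* low 16 bits of general register 0 */
	int      ir;	/* interruption request bit of register 1 */
	uint8_t  isc;	/* interruption subclass, low 3 bits of register 1 */
	uint64_t nib;	/* guest notification indicator byte address */
};

struct aqic_request decode_aqic(uint64_t gpr0, uint64_t gpr1, uint64_t gpr2)
{
	struct aqic_request req;

	req.apqn = gpr0 & 0xffff;
	/* Bit 16 in MSB-0 numbering is bit 47 counted from the LSB. */
	req.ir = (gpr1 >> (63 - 16)) & 0x01;
	req.isc = gpr1 & 0x07;
	req.nib = gpr2;
	return req;
}

/* Combine the pinned host page frame with the in-page offset of the NIB. */
uint64_t host_nib(uint64_t h_pfn, uint64_t guest_nib)
{
	return (h_pfn << SKETCH_PAGE_SHIFT) | (guest_nib & ~SKETCH_PAGE_MASK);
}
```

Only the page frame changes between guest and host. The offset of the
NIB inside its page is carried over unchanged, which is why pinning a
single page is enough.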



* [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (4 preceding siblings ...)
  2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
@ 2019-02-22 15:29 ` Pierre Morel
  2019-02-26 18:27   ` Tony Krowiak
  2019-03-08 22:43   ` Tony Krowiak
  2019-02-22 15:30 ` [PATCH v4 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest Pierre Morel
  2019-02-28 15:08 ` [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Halil Pasic
  7 siblings, 2 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:29 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

When the device is removed, we must make sure to
clear the interruption and reset the AP device.

We also need to clear the CRYCB of the guest.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     | 35 +++++++++++++++++++++++++++++++++++
 drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
 drivers/s390/crypto/vfio_ap_private.h |  3 +++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index eca0ffc..e5d91ff 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -5,6 +5,7 @@
  * Copyright IBM Corp. 2018
  *
  * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
+ *	      Pierre Morel <pmorel@linux.ibm.com>
  */
 
 #include <linux/module.h>
@@ -12,6 +13,8 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <asm/facility.h>
+#include <linux/bitops.h>
+#include <linux/kvm_host.h>
 #include "vfio_ap_private.h"
 
 #define VFIO_AP_ROOT_NAME "vfio_ap"
@@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 }
 
 /**
+ * vfio_ap_update_crycb
+ * @q: A pointer to the queue being removed
+ *
+ * We clear the APID of the queue, making this queue unusable for the guest.
+ * After this function we can reset the queue without fearing a race with
+ * the guest accessing the queue again.
+ * We do not fear a race with the host as we still hold the device.
+ */
+static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
+{
+	struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
+
+	if (!matrix_mdev)
+		return;
+
+	clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
+
+	if (!matrix_mdev->kvm)
+		return;
+
+	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+				  matrix_mdev->matrix.apm,
+				  matrix_mdev->matrix.aqm,
+				  matrix_mdev->matrix.adm);
+}
+
+/**
  * vfio_ap_queue_dev_remove:
  *
  * Free the associated vfio_ap_queue structure
@@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 	struct vfio_ap_queue *q;
 
 	q = dev_get_drvdata(&apdev->device);
+	if (!q)
+		return;
+
+	vfio_ap_update_crycb(q);
+	vfio_ap_mdev_reset_queue(q);
 	list_del(&q->list);
 	kfree(q);
 }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 0196065..5b9bb33 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -59,6 +59,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
 			if (retry <= 0)
 				pr_warn("%s: queue 0x%04x not empty\n",
 					__func__, q->apqn);
+			vfio_ap_free_irq(q);
 			return 0;
 		case AP_RESPONSE_RESET_IN_PROGRESS:
 		case AP_RESPONSE_BUSY:
@@ -83,7 +84,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
  * Unregister the ISC from the GIB alert
  * Clear the vfio_ap_queue intern fields
  */
-static void vfio_ap_free_irq(struct vfio_ap_queue *q)
+void vfio_ap_free_irq(struct vfio_ap_queue *q)
 {
 	if (!q)
 		return;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index e2fd2c0..cc18215 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -4,6 +4,7 @@
  *
  * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
  *	      Halil Pasic <pasic@linux.ibm.com>
+ *	      Pierre Morel <pmorel@linux.ibm.com>
  *
  * Copyright IBM Corp. 2018
  */
@@ -98,4 +99,6 @@ struct vfio_ap_queue {
 	int	apqn;
 	unsigned char isc;
 };
+void vfio_ap_free_irq(struct vfio_ap_queue *q);
+int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.7.4
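
vfio_ap_update_crycb() above clears the adapter bit of the removed
queue in the APM, which uses MSB-0 ("inverted") bit numbering. A
minimal userspace sketch of that numbering follows. The sketch_*
helpers are hypothetical, 64-bit mask words are assumed, and the APQN
layout (card << 8) | queue is an assumption about what AP_QID_CARD()
extracts.

```c
#include <stdint.h>

#define SKETCH_BITS_PER_WORD 64

/* Assumption: the adapter (card) number is the high byte of the APQN. */
unsigned int sketch_qid_card(unsigned int apqn)
{
	return (apqn >> 8) & 0xff;
}

/* MSB-0 numbering: bit 0 is the leftmost bit of the first word. */
uint64_t sketch_inv_mask(unsigned int nr)
{
	return UINT64_C(1) << (SKETCH_BITS_PER_WORD - 1 - nr % SKETCH_BITS_PER_WORD);
}

void sketch_set_bit_inv(unsigned int nr, uint64_t *map)
{
	map[nr / SKETCH_BITS_PER_WORD] |= sketch_inv_mask(nr);
}

void sketch_clear_bit_inv(unsigned int nr, uint64_t *map)
{
	map[nr / SKETCH_BITS_PER_WORD] &= ~sketch_inv_mask(nr);
}

int sketch_test_bit_inv(unsigned int nr, const uint64_t *map)
{
	return (map[nr / SKETCH_BITS_PER_WORD] & sketch_inv_mask(nr)) != 0;
}
```

Clearing the bit for sketch_qid_card(apqn) removes the whole adapter
from the mask, so every domain on that adapter becomes unavailable,
not only the one queue being removed.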



* [PATCH v4 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (5 preceding siblings ...)
  2019-02-22 15:29 ` [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
@ 2019-02-22 15:30 ` Pierre Morel
  2019-02-28 15:08 ` [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Halil Pasic
  7 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-22 15:30 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

The AP Queue Interruption Control (AQIC) facility lets
the guest control interruptions for
the Cryptographic Adjunct Processor queues.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 arch/s390/tools/gen_facilities.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index fd788e0..18d317d 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -108,6 +108,7 @@ static struct facility_def facility_defs[] = {
 		.bits = (int[]){
 			12, /* AP Query Configuration Information */
 			15, /* AP Facilities Test */
+			65, /* AP Queue Interruption Control */
 			156, /* etoken facility */
 			-1  /* END */
 		}
-- 
2.7.4
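
Facility 65 added above is a bit number in the facility list, which is
numbered from the most significant bit of the first byte. A minimal
userspace sketch of that lookup follows. sketch_test_facility() and
sketch_set_facility() are hypothetical helpers written for this
illustration and are not the kernel's own functions.

```c
#include <stdint.h>

/* Facility nr lives in byte nr / 8, counted from that byte's MSB. */
int sketch_test_facility(const uint8_t *facilities, unsigned int nr)
{
	return (facilities[nr >> 3] & (0x80 >> (nr & 7))) != 0;
}

void sketch_set_facility(uint8_t *facilities, unsigned int nr)
{
	facilities[nr >> 3] |= 0x80 >> (nr & 7);
}
```

For facility 65 this selects byte 8 and mask 0x40, the second bit from
the left in that byte.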



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-22 15:29 ` [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
@ 2019-02-25 18:36   ` Tony Krowiak
  2019-02-26 11:47     ` Pierre Morel
  2019-02-28  8:31     ` Christian Borntraeger
  0 siblings, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-25 18:36 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> We prepare the interception of the PQAP/AQIC instruction for
> the case the AQIC facility is enabled in the guest.
> 
> We add a callback inside the KVM arch structure for s390 for
> a VFIO driver to handle a specific response to the PQAP
> instruction with the AQIC command.
> 
> We inject the correct exceptions from inside KVM for the case the
> callback is not initialized, which happens when the vfio_ap driver
> is not loaded.
> 
> If the callback has been setup we call it.
> If not we setup an answer considering that no queue is available
> for the guest when no callback has been setup.
> 
> We do consider the responsability of the driver to always initialize
> the PQAP callback if it defines queues by initializing the CRYCB for
> a guest.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h |  1 +
>   arch/s390/kvm/priv.c             | 52 ++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 53 insertions(+)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index c5f5156..49cc8b0 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -719,6 +719,7 @@ struct kvm_s390_cpu_model {
>   
>   struct kvm_s390_crypto {
>   	struct kvm_s390_crypto_cb *crycb;
> +	int (*pqap_hook)(struct kvm_vcpu *vcpu);
>   	__u32 crycbd;
>   	__u8 aes_kw;
>   	__u8 dea_kw;
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 8679bd7..3448abd 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -27,6 +27,7 @@
>   #include <asm/io.h>
>   #include <asm/ptrace.h>
>   #include <asm/sclp.h>
> +#include <asm/ap.h>
>   #include "gaccess.h"
>   #include "kvm-s390.h"
>   #include "trace.h"
> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>   	}
>   }
>   
> +/*
> + * handle_pqap: Handling pqap interception
> + * @vcpu: the vcpu having issue the pqap instruction
> + *
> + * We now support PQAP/AQIC instructions and we need to correctly
> + * answer the guest even if no dedicated driver's hook is available.
> + *
> + * The intercepting code calls a dedicated callback for this instruction
> + * if a driver did register one in the CRYPTO satellite of the
> + * SIE block.
> + *
> + * For PQAP/AQIC instructions only, verify privilege and specifications.
> + *
> + * If no callback available, the queues are not available, return this to
> + * the caller.
> + * Else return the value returned by the callback.
> + */
> +static int handle_pqap(struct kvm_vcpu *vcpu)
> +{
> +	uint8_t fc;
> +	struct ap_queue_status status = {};
> +
> +	/* Verify that the AP instruction are available */
> +	if (!ap_instructions_available())
> +		return -EOPNOTSUPP;

How can the guest even execute an AP instruction if the AP instructions
are not available? If the AP instructions are not available on the host,
they will not be available on the guest (i.e., CPU model feature
S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
here given QEMU may not be the only client.

> +	/* Verify that the guest is allowed to use AP instructions */
> +	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> +		return -EOPNOTSUPP;
> +	/* Verify that the function code is AQIC */
> +	fc = vcpu->run->s.regs.gprs[0] >> 24;
> +	if (fc != 0x03)
> +		return -EOPNOTSUPP;

You must have missed my suggestion to move this to the
vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:

Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>

You previously stated:

    "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap driver is
     not loaded. However now that the guest officially get the PQAP/AQIC
     instruction we need to handle the specification and operation
     exceptions inside KVM _before_ testing and even calling the driver
     hook.

     I will make the changes in the next iteration."

I don't know what any of the above has to do with checking FC=0x03? If
that check is moved to the pqap handler hook, it can just as well return
-EOPNOTSUPP. In fact, down below you do this:

	return vcpu->kvm->arch.crypto.pqap_hook(vcpu);

If the FC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
like above. None of this is critical, but the parsing of the register
values for the PQAP(AQIC) function ought to be done in the code that
handles the PQAP instruction IMHO.

> +
> +	/* PQAP instructions are allowed for guest kernel only */
> +	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> +		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> +	/* AQIC instruction is allowed only if facility 65 is available */
> +	if (!test_kvm_facility(vcpu->kvm, 65))
> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +	/* Verify that the hook callback is registered and call it */
> +	if (vcpu->kvm->arch.crypto.pqap_hook)
> +		return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
> +
> +	/* PQAP/AQIC instructions are authorized but there is no queue */
> +	status.response_code = 0x01;
> +	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> +	return 0;
> +}
> +
>   static int handle_stfl(struct kvm_vcpu *vcpu)
>   {
>   	int rc;
> @@ -878,6 +928,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
>   		return handle_sthyi(vcpu);
>   	case 0x7d:
>   		return handle_stsi(vcpu);
> +	case 0xaf:
> +		return handle_pqap(vcpu);
>   	case 0xb1:
>   		return handle_stfl(vcpu);
>   	case 0xb2:
> 



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-25 18:36   ` Tony Krowiak
@ 2019-02-26 11:47     ` Pierre Morel
  2019-02-26 15:47       ` Tony Krowiak
  2019-02-28  8:31     ` Christian Borntraeger
  1 sibling, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-26 11:47 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 25/02/2019 19:36, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> We prepare the interception of the PQAP/AQIC instruction for
>> the case the AQIC facility is enabled in the guest.
>>
>> We add a callback inside the KVM arch structure for s390 for
>> a VFIO driver to handle a specific response to the PQAP
>> instruction with the AQIC command.
>>
>> We inject the correct exceptions from inside KVM for the case the
>> callback is not initialized, which happens when the vfio_ap driver
>> is not loaded.
>>
>> If the callback has been setup we call it.
>> If not we setup an answer considering that no queue is available
>> for the guest when no callback has been setup.
>>
>> We do consider the responsability of the driver to always initialize
>> the PQAP callback if it defines queues by initializing the CRYCB for
>> a guest.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>

...snip...

>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>       }
>>   }
>> +/*
>> + * handle_pqap: Handling pqap interception
>> + * @vcpu: the vcpu having issue the pqap instruction
>> + *
>> + * We now support PQAP/AQIC instructions and we need to correctly
>> + * answer the guest even if no dedicated driver's hook is available.
>> + *
>> + * The intercepting code calls a dedicated callback for this instruction
>> + * if a driver did register one in the CRYPTO satellite of the
>> + * SIE block.
>> + *
>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>> + *
>> + * If no callback available, the queues are not available, return 
>> this to
>> + * the caller.
>> + * Else return the value returned by the callback.
>> + */
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +    uint8_t fc;
>> +    struct ap_queue_status status = {};
>> +
>> +    /* Verify that the AP instruction are available */
>> +    if (!ap_instructions_available())
>> +        return -EOPNOTSUPP;
> 
> How can the guest even execute an AP instruction if the AP instructions
> are not available? If the AP instructions are not available on the host,
> they will not be available on the guest (i.e., CPU model feature
> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
> here given QEMU may not be the only client.
> 
>> +    /* Verify that the guest is allowed to use AP instructions */
>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>> +        return -EOPNOTSUPP;
>> +    /* Verify that the function code is AQIC */
>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>> +    if (fc != 0x03)
>> +        return -EOPNOTSUPP;
> 
> You must have missed my suggestion to move this to the
> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:

Please consider what happens if the vfio_ap module is not loaded.

> 
> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
> 
> You previously stated:
> 
>     "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap driver is
>      not loaded. However now that the guest officially get the PQAP/AQIC
>      instruction we need to handle the specification and operation
>      exceptions inside KVM _before_ testing and even calling the driver
>      hook.
> 
>      I will make the changes in the next iteration."

Still seems right to me, and is done in this patch.
Isn't it?

> 
> I don't know what any of the above has to do with checking FC=0x03? If
> that check is moved to the pqap handler hook, it can just as well return
> -EOPNOTSUPP. In fact, down below you do this:
> 
>      return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
> 
> If the RC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
> like above. None of this is critical, but the parsing of the register
> values for the PQAP(AQIC) function ought to be done in the code that
> handles the PQAP instruction IMHO.


This interception code must handle the PQAP/AQIC instruction when the 
hook is not used and should not modify the handling for other PQAP 
instructions.
We cannot move into the hook anything that must always be done.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-26 11:47     ` Pierre Morel
@ 2019-02-26 15:47       ` Tony Krowiak
  2019-02-27  8:09         ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-02-26 15:47 UTC (permalink / raw)
  To: pmorel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/26/19 6:47 AM, Pierre Morel wrote:
> On 25/02/2019 19:36, Tony Krowiak wrote:
>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>> We prepare the interception of the PQAP/AQIC instruction for
>>> the case the AQIC facility is enabled in the guest.
>>>
>>> We add a callback inside the KVM arch structure for s390 for
>>> a VFIO driver to handle a specific response to the PQAP
>>> instruction with the AQIC command.
>>>
>>> We inject the correct exceptions from inside KVM for the case the
>>> callback is not initialized, which happens when the vfio_ap driver
>>> is not loaded.
>>>
>>> If the callback has been setup we call it.
>>> If not we setup an answer considering that no queue is available
>>> for the guest when no callback has been setup.
>>>
>>> We do consider the responsability of the driver to always initialize
>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>> a guest.
>>>
>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> 
> ...snip...
> 
>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>       }
>>>   }
>>> +/*
>>> + * handle_pqap: Handling pqap interception
>>> + * @vcpu: the vcpu having issue the pqap instruction
>>> + *
>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>> + * answer the guest even if no dedicated driver's hook is available.
>>> + *
>>> + * The intercepting code calls a dedicated callback for this instruction
>>> + * if a driver did register one in the CRYPTO satellite of the
>>> + * SIE block.
>>> + *
>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>> + *
>>> + * If no callback available, the queues are not available, return this to
>>> + * the caller.
>>> + * Else return the value returned by the callback.
>>> + */
>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>> +{
>>> +    uint8_t fc;
>>> +    struct ap_queue_status status = {};
>>> +
>>> +    /* Verify that the AP instruction are available */
>>> +    if (!ap_instructions_available())
>>> +        return -EOPNOTSUPP;
>>
>> How can the guest even execute an AP instruction if the AP instructions
>> are not available? If the AP instructions are not available on the host,
>> they will not be available on the guest (i.e., CPU model feature
>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>> here given QEMU may not be the only client.
>>
>>> +    /* Verify that the guest is allowed to use AP instructions */
>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>> +        return -EOPNOTSUPP;
>>> +    /* Verify that the function code is AQIC */
>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>> +    if (fc != 0x03)
>>> +        return -EOPNOTSUPP;
>>
>> You must have missed my suggestion to move this to the
>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
> 
> Please consider what happens if the vfio_ap module is not loaded.

I have considered it and even verified my expectations empirically. If
the vfio_ap module is not loaded, you will not be able to create an mdev 
device. If you don't have an mdev device, you will not be able to
start a guest with a vfio-ap device. If you start a guest without a
vfio-ap device, but enable AP instructions for the guest, there will be
no AP devices attached to the guest. Without any AP devices attached,
the PQAP(AQIC) instruction will never get executed. Even if the
PQAP(AQIC) instruction is executed for some unknown reason, it will
fail with response code 0x01, AP-queue number not valid.


> 
>>
>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
>>
>> You previously stated:
>>
>>     "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap driver is
>>      not loaded. However now that the guest officially get the PQAP/AQIC
>>      instruction we need to handle the specification and operation
>>      exceptions inside KVM _before_ testing and even calling the driver
>>      hook.
>>
>>      I will make the changes in the next iteration."
> 
> Still seems right to me, and is done in this patch.
> Isn't it?

I don't think it's a matter of right and wrong, it's a matter of what
makes sense. IMHO, you want to make things easy if other PQAP functions
are intercepted at some point. There should be a switch statement in
the pqap hook code with a case statement for each PQAP function
supported by the hook. To plug in a new PQAP function handler, it will
be a simple matter of writing the handler function and calling it from
the case statement, like this:

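static int handle_pqap(struct kvm_vcpu *vcpu)
{
	int ret;
	uint8_t fc;

	/* Extract the PQAP function code from GPR0 */
	fc = vcpu->run->s.regs.gprs[0] >> 24;

	switch (fc) {
	case 0x03:	/* AQIC */
		ret = handle_pqap_aqic(vcpu);
		break;	/* do not fall through to the default case */
	default:
		ret = -EOPNOTSUPP;
	}

	return ret;
}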

That function belongs in the pqap hook. I see no reason whatsoever to
check the function code here. If there is no hook, then you will fall
through to the instruction below:

status.response_code = 0x01;
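
A stand-alone sketch of the split being proposed here, written as plain
userspace C with mocked types rather than the real KVM structures. Every
name prefixed with mock_ is hypothetical, and returning 0 after setting
response code 0x01 is an assumption made for the illustration. It shows
only the dispatch (hook present: the hook decodes the function code; hook
absent: response code 0x01), not the privilege and specification checks
discussed elsewhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PQAP_FC_AQIC 0x03

/* Mocked stand-in for the vcpu state; not the kernel's struct kvm_vcpu. */
struct mock_vcpu {
	uint64_t gpr0;                        /* guest general register 0 */
	int (*pqap_hook)(struct mock_vcpu *); /* NULL when no driver registered one */
	uint8_t response_code;                /* status reported back to the guest */
};

/* Hypothetical AQIC handler: pretend the interrupt setup succeeded. */
static int mock_handle_pqap_aqic(struct mock_vcpu *vcpu)
{
	vcpu->response_code = 0x00;
	return 0;
}

/* Driver side: decode the function code and dispatch on it. */
static int mock_pqap_hook(struct mock_vcpu *vcpu)
{
	uint8_t fc = (uint8_t)(vcpu->gpr0 >> 24);

	switch (fc) {
	case MOCK_PQAP_FC_AQIC:
		return mock_handle_pqap_aqic(vcpu);
	default:
		return -EOPNOTSUPP;
	}
}

/* Interception side: call the hook if there is one, else report "no queue". */
static int mock_handle_pqap(struct mock_vcpu *vcpu)
{
	assert(vcpu != NULL);

	if (vcpu->pqap_hook)
		return vcpu->pqap_hook(vcpu);

	/* No hook registered: no queue is available to the guest. */
	vcpu->response_code = 0x01;
	return 0;
}
```

With the hook installed, an unsupported function code yields -EOPNOTSUPP
from the hook itself, which is the behaviour argued for above.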

> 
>>
>> I don't know what any of the above has to do with checking FC=0x03? If
>> that check is moved to the pqap handler hook, it can just as well return
>> -EOPNOTSUPP. In fact, down below you do this:
>>
>>      return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
>>
>> If the RC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
>> like above. None of this is critical, but the parsing of the register
>> values for the PQAP(AQIC) function ought to be done in the code that
>> handles the PQAP instruction IMHO.
> 
> 
> This interception code must handle the PQAP/AQIC instruction when the 
> hook is not used and should not modify the handling for other PQAP 
> instructions.
> We can not move anything inside the hook that must be always done.

What you are saying here makes no sense. If the check for the function
code is moved into the pqap hook and fc != 0x03, the result will be
exactly the same; the hook will return -EOPNOTSUPP.

> 
> Regards,
> Pierre
> 



* Re: [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure
  2019-02-22 15:29 ` [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
@ 2019-02-26 16:10   ` Tony Krowiak
  2019-02-27  8:40     ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-02-26 16:10 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> The AP interruptions are assigned on a queue basis and
> the GISA structure is handled on a VM basis, so that
> we need to add a structure we can retrieve from both side
> holding the information we need to handle PQAP/AQIC interception
> and setup the GISA.
> 
> Since we can not add more information to the ap_device
> we add a new structure vfio_ap_queue, to hold per queue
> information useful to handle interruptions and set it as
> driver's data of the standard ap_queue device.
> 
> Usually, the device and the mediated device are linked together
> but in the vfio_ap driver design we have a bunch of "sub" devices
> (the ap_queue devices) belonging to the mediated device.
> 
> Linking these structure to the mediated device it is assigned to,
> with the help of the vfio_ap_queue structure will help us to
> retrieve the AP devices associated with the mediated devices
> during the mediated device operations.
> 
> ------------    -------------
> | AP queue |--> | AP_vfio_q |<----
> ------------    ------^------    |    ---------------
>                        |          <--->| matrix_mdev |
> ------------    ------v------    |    ---------------
> | AP queue |--> | AP_vfio_q |-----
> ------------    -------------
> 
> The vfio_ap_queue device will hold the following entries:
> - apqn: AP queue number (defined here)
> - isc : Interrupt subclass (defined later)
> - nib : notification information byte (defined later)
> - list: a list_head entry allowing to link this structure to a
> 	matrix mediated device it is assigned to.
> 
> The vfio_ap_queue structure is allocated when the vfio_ap_driver
> is probed and added as driver data to the ap_queue device.
> It is free on remove.
> 
> The structure is linked to the matrix_dev host device at the
> probe of the device building some kind of free list for the
> matrix mediated devices.
> 
> When the vfio_queue is associated to a matrix mediated device,
> the vfio_ap_queue device is linked to this matrix mediated device
> and unlinked when dissociated.
> 
> This patch and the 3 next can be squashed together on the
> final release of this series.
> until then I separate them to ease the review.
> 
> So please do not complain about unused functions or about
> squashing the patches together, this will be resolved during
> the last iteration.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   drivers/s390/crypto/vfio_ap_drv.c     | 27 ++++++++++++++++++++++++++-
>   drivers/s390/crypto/vfio_ap_private.h |  9 +++++++++
>   2 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index e9824c3..eca0ffc 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -40,14 +40,38 @@ static struct ap_device_id ap_queue_ids[] = {
>   
>   MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>   
> +/**
> + * vfio_ap_queue_dev_probe:
> + *
> + * Allocate a vfio_ap_queue structure and associate it
> + * with the device as driver_data.
> + */
>   static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>   {
> +	struct vfio_ap_queue *q;
> +
> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
> +	if (!q)
> +		return -ENOMEM;
> +	dev_set_drvdata(&apdev->device, q);
> +	q->apqn = to_ap_queue(&apdev->device)->qid;
> +	INIT_LIST_HEAD(&q->list);
> +	list_add(&q->list, &matrix_dev->free_list);
>   	return 0;
>   }
>   
> +/**
> + * vfio_ap_queue_dev_remove:
> + *
> + * Free the associated vfio_ap_queue structure
> + */
>   static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>   {
> -	/* Nothing to do yet */
> +	struct vfio_ap_queue *q;
> +
> +	q = dev_get_drvdata(&apdev->device);
> +	list_del(&q->list);
> +	kfree(q);
>   }
>   
>   static void vfio_ap_matrix_dev_release(struct device *dev)
> @@ -107,6 +131,7 @@ static int vfio_ap_matrix_dev_create(void)
>   	matrix_dev->device.bus = &matrix_bus;
>   	matrix_dev->device.release = vfio_ap_matrix_dev_release;
>   	matrix_dev->vfio_ap_drv = &vfio_ap_drv;
> +	INIT_LIST_HEAD(&matrix_dev->free_list);
>   
>   	ret = device_register(&matrix_dev->device);
>   	if (ret)
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 76b7f98..2760178 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -39,6 +39,7 @@ struct ap_matrix_dev {
>   	atomic_t available_instances;
>   	struct ap_config_info info;
>   	struct list_head mdev_list;
> +	struct list_head free_list;
>   	struct mutex lock;
>   	struct ap_driver  *vfio_ap_drv;
>   };
> @@ -81,9 +82,17 @@ struct ap_matrix_mdev {
>   	struct ap_matrix matrix;
>   	struct notifier_block group_notifier;
>   	struct kvm *kvm;
> +	struct list_head qlist;

I do not see much value in maintaining two lists at the
expense of complicating the code and introducing additional
processing (i.e., list rewinds etc.). IMHO, the only thing it buys
us is being able to pass a smaller list to the vfio_ap_get_queue()
function to traverse. That function can traverse the list in
struct ap_matrix_dev to find a queue. I understand what you are
doing here in pulling vfio_ap_queue structs from the free_list
to add them to qlist for the mdev when adapter/domain assignment
takes place, but you are now maintaining links to the vfio_ap_queue
in multiple places: as drvdata as well as in two lists. I think this
is overdesigned.
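
A minimal userspace sketch of the single-list alternative, assuming one
list of queues kept on the matrix device and an owner pointer per queue
in place of a second list. All mock_ names are hypothetical and the
hand-rolled linked list stands in for the kernel's list_head; this is an
illustration of the idea, not a proposed implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mocked mediated device; only its identity matters here. */
struct mock_mdev {
	int id;
};

/* Mocked queue: one list membership plus an ownership pointer. */
struct mock_queue {
	int apqn;
	struct mock_mdev *owner;   /* NULL while the queue is unassigned */
	struct mock_queue *next;   /* the single list kept on the matrix device */
};

/* Walk the one list and return the queue with the given APQN, or NULL. */
static struct mock_queue *mock_find_queue(struct mock_queue *head, int apqn)
{
	for (struct mock_queue *q = head; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}

/* Assign a queue to an mdev by setting its owner; nothing is moved. */
static int mock_assign_queue(struct mock_queue *head, int apqn,
			     struct mock_mdev *mdev)
{
	struct mock_queue *q;

	assert(mdev != NULL);

	q = mock_find_queue(head, apqn);
	if (!q)
		return -EADDRNOTAVAIL;	/* queue not bound to the driver */
	if (q->owner && q->owner != mdev)
		return -EADDRINUSE;	/* already owned by another mdev */
	q->owner = mdev;
	return 0;
}

/* Release a queue by clearing its owner, if this mdev holds it. */
static void mock_release_queue(struct mock_queue *head, int apqn,
			       struct mock_mdev *mdev)
{
	struct mock_queue *q = mock_find_queue(head, apqn);

	if (q && q->owner == mdev)
		q->owner = NULL;
}
```

Because assignment only writes a pointer, an all-or-nothing assignment of
several queues could check every queue first and set the owners
afterwards, which would avoid moving entries between lists and rewinding
them on error.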

>   };
>   
>   extern int vfio_ap_mdev_register(void);
>   extern void vfio_ap_mdev_unregister(void);
>   
> +struct vfio_ap_queue {
> +	struct list_head list;
> +	struct ap_matrix_mdev *matrix_mdev;
> +	unsigned long nib;
> +	int	apqn;
> +	unsigned char isc;
> +};
>   #endif /* _VFIO_AP_PRIVATE_H_ */
> 



* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
@ 2019-02-26 18:14   ` Tony Krowiak
  2019-02-27  9:29     ` Pierre Morel
  2019-02-27  9:32   ` Cornelia Huck
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-02-26 18:14 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> We need to associate the ap_vfio_queue, which will hold the
> per queue information for interrupt with a matrix mediated device
> which hold the configuration and the way to the CRYCB.
> 
> Let's do this when assigning a APID or a APQI to the mediated device
> and clear the relation when unassigning.
> 
> Queuing the devices on a list of free devices and testing the
> matrix_mdev pointer to the associated matrix allow us to know
> if the queue is associated to the matrix device and associated
> or not to a mediated device.
> 
> When resetting an AP queue we must wait until there are no more
> messages in the message queue before considering the queue is really
> in a clean state.
> 
> Let's do it and wait until the status response code indicate the
> queue is empty after issuing a PAPQ/ZAPQ instruction.
> 
> Being at work on the reset function, let's simplify
> vfio_ap_mdev_reset_queue and vfio_ap_mdev_reset_queues by using the
> vfio_ap_queue structure as parameter.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   drivers/s390/crypto/vfio_ap_ops.c | 385 +++++++++++++++++++-------------------
>   1 file changed, 189 insertions(+), 196 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 900b9cf..172d6eb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -24,6 +24,57 @@
>   #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
>   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>   
> +/**
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * @apqn: The queue APQN
> + *
> + * Retrieve a queue with a specific APQN from the list of the
> + * devices associated with a list.
> + *
> + * Returns the pointer to the associated vfio_ap_queue
> + */
> +struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	list_for_each_entry(q, l, list)
> +		if (q->apqn == apqn)
> +			return q;
> +	return NULL;
> +}
> +
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +{
> +	struct ap_queue_status status;
> +	int retry = 20;
> +
> +	do {
> +		status = ap_zapq(q->apqn);
> +		switch (status.response_code) {
> +		case AP_RESPONSE_NORMAL:
> +			while (!status.queue_empty && retry--) {
> +				msleep(20);
> +				status = ap_tapq(q->apqn, NULL);
> +			}

I am not sure the above is necessary. I have an email out to the author
of the architecture doc to verify.

> +			if (retry <= 0)
> +				pr_warn("%s: queue 0x%04x not empty\n",
> +					__func__, q->apqn);
> +			return 0;
> +		case AP_RESPONSE_RESET_IN_PROGRESS:
> +		case AP_RESPONSE_BUSY:
> +			msleep(20);
> +			break;
> +		default:
> +			/* things are really broken, give up */
> +			pr_warn("%s: zapq error %02x on apqn 0x%04x\n",
> +				__func__, status.response_code, q->apqn);
> +			return -EIO;
> +		}
> +	} while (retry--);
> +
> +	return -EBUSY;
> +}
> +
>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>   				struct ap_matrix *matrix)
>   {
> @@ -45,6 +96,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>   		return -ENOMEM;
>   	}
>   
> +	INIT_LIST_HEAD(&matrix_mdev->qlist);
>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>   	mdev_set_drvdata(mdev, matrix_mdev);
>   	mutex_lock(&matrix_dev->lock);
> @@ -113,162 +165,160 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>   	NULL,
>   };
>   
> -struct vfio_ap_queue_reserved {
> -	unsigned long *apid;
> -	unsigned long *apqi;
> -	bool reserved;
> -};
> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> +	if (!q)
> +		return;
> +	q->matrix_mdev = NULL;
> +	vfio_ap_mdev_reset_queue(q);
> +	list_move(&q->list, &matrix_dev->free_list);
> +}
>   
>   /**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> + * vfio_ap_put_all_domains

>    *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - *   as reserved if the APID and APQI fields for the AP queue device matches
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apid:	 The apid which associated with all defined APQI of the
> + *		 mediated device will define a AP queue.
>    *
> - * - If @data contains only an apid value, @data will be flagged as
> - *   reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - *   reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
>    */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
> +				    int apid)

I would prefer this be named:

	vfio_ap_mdev_free_queues_with_apid()

get/put is typically used to increment/decrement reference counters.
What you are doing in this function is freeing all queues connected to
the specified card.

>   {
> -	struct vfio_ap_queue_reserved *qres = data;
> -	struct ap_queue *ap_queue = to_ap_queue(dev);
> -	ap_qid_t qid;
> -	unsigned long id;
> +	int apqi, apqn;
>   
> -	if (qres->apid && qres->apqi) {
> -		qid = AP_MKQID(*qres->apid, *qres->apqi);
> -		if (qid == ap_queue->qid)
> -			qres->reserved = true;
> -	} else if (qres->apid && !qres->apqi) {
> -		id = AP_QID_CARD(ap_queue->qid);
> -		if (id == *qres->apid)
> -			qres->reserved = true;
> -	} else if (!qres->apid && qres->apqi) {
> -		id = AP_QID_QUEUE(ap_queue->qid);
> -		if (id == *qres->apqi)
> -			qres->reserved = true;
> -	} else {
> -		return -EINVAL;
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +		vfio_ap_free_queue(apqn, matrix_mdev);
>   	}

Maybe you should clear the bit corresponding to apid from the APM here?

> -
> -	return 0;
>   }
>   
>   /**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> - *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> + * vfio_ap_put_all_cards:
>    *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
> - *   @apqi
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
>    *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - *   to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - *   to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
>    */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> -					 unsigned long *apqi)
> +static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)

I don't like this name for the same reasons I stated above for
put_all_domains. I prefer something like:

	vfio_ap_mdev_free_queues_with_apqi()

>   {
> -	int ret;
> -	struct vfio_ap_queue_reserved qres;
> +	int apid, apqn;
>   
> -	qres.apid = apid;
> -	qres.apqi = apqi;
> -	qres.reserved = false;
> -
> -	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				     &qres, vfio_ap_has_queue);
> -	if (ret)
> -		return ret;
> -
> -	if (qres.reserved)
> -		return 0;
> -
> -	return -EADDRNOTAVAIL;
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +		vfio_ap_free_queue(apqn, matrix_mdev);
> +	}

Maybe clear the apqi from the AQM here?

>   }
>   
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> -					     unsigned long apid)
> +static void move_and_set(struct list_head *src, struct list_head *dst,
> +			 struct ap_matrix_mdev *matrix_mdev)
>   {
> -	int ret;
> -	unsigned long apqi;
> -	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> -
> -	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> -		return vfio_ap_verify_queue_reserved(&apid, NULL);
> +	struct vfio_ap_queue *q, *qtmp;
>   
> -	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> -		if (ret)
> -			return ret;
> +	list_for_each_entry_safe(q, qtmp, src, list) {
> +		list_move(&q->list, dst);
> +		q->matrix_mdev = matrix_mdev;
>   	}
> -
> +}
> +/**
> + * vfio_ap_get_all_domains:
> + *
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
> + *
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> + *
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> + */
> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)

I'd prefer to change this name to something like:

	vfio_ap_mdev_get_queues_with_apid()



> +{
> +	int apqi, apqn;
> +	int ret = 0;
> +	struct vfio_ap_queue *q;
> +	struct list_head q_list;
> +
> +	INIT_LIST_HEAD(&q_list);
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
> +		if (!q) {
> +			ret = -EADDRNOTAVAIL;
> +			goto rewind;
> +		}
> +		if (q->matrix_mdev) {
> +			ret = -EADDRINUSE;
> +			goto rewind;
> +		}
> +		list_move(&q->list, &q_list);

IMHO, all of the list moving and rewinding is overcomplicated and
not necessary if you simply maintain one list of queues in the
matrix_mdev.

> +	}
> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);

Maybe set the bit for the apid in the APM here instead of in the
calling function?

>   	return 0;
> +rewind:
> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);
> +	return ret;
>   }
> -
>   /**
> - * vfio_ap_mdev_verify_no_sharing
> + * vfio_ap_get_all_cards:
>    *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
>    *
> - * @matrix_mdev: the mediated matrix device
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
>    *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
>    */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>   {

I'd prefer to change this name to something like:

	vfio_ap_mdev_get_queues_with_apqi()

> -	struct ap_matrix_mdev *lstdev;
> -	DECLARE_BITMAP(apm, AP_DEVICES);
> -	DECLARE_BITMAP(aqm, AP_DOMAINS);
> -
> -	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> -		if (matrix_mdev == lstdev)
> -			continue;
> -
> -		memset(apm, 0, sizeof(apm));
> -		memset(aqm, 0, sizeof(aqm));
> -
> -		/*
> -		 * We work on full longs, as we can only exclude the leftover
> -		 * bits in non-inverse order. The leftover is all zeros.
> -		 */
> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> -				lstdev->matrix.apm, AP_DEVICES))
> -			continue;
> -
> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> -				lstdev->matrix.aqm, AP_DOMAINS))
> -			continue;
> -
> -		return -EADDRINUSE;
> +	int apid, apqn;
> +	int ret = 0;
> +	struct vfio_ap_queue *q;
> +	struct list_head q_list;
> +	struct ap_matrix_mdev *tmp = NULL;
> +
> +	INIT_LIST_HEAD(&q_list);
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
> +		if (!q) {
> +			ret = -EADDRNOTAVAIL;
> +			goto rewind;
> +		}
> +		if (q->matrix_mdev) {
> +			ret = -EADDRINUSE;
> +			goto rewind;
> +		}
> +		list_move(&q->list, &q_list);

IMHO, all of the list moving and rewinding is overcomplicated and
not necessary if you simply maintain one list of queues in the
matrix_mdev.

>   	}
> -
> +	tmp = matrix_mdev;
> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
>   	return 0;
> +rewind:
> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);
> +	return ret;
>   }
>   
>   /**
> @@ -330,21 +380,15 @@ static ssize_t assign_adapter_store(struct device *dev,
>   	 */
>   	mutex_lock(&matrix_dev->lock);
>   
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> +	ret = vfio_ap_get_all_domains(matrix_mdev, apid);
>   	if (ret)
>   		goto done;
>   
>   	set_bit_inv(apid, matrix_mdev->matrix.apm);
>   
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> -	if (ret)
> -		goto share_err;
> -
>   	ret = count;
>   	goto done;
>   
> -share_err:
> -	clear_bit_inv(apid, matrix_mdev->matrix.apm);
>   done:
>   	mutex_unlock(&matrix_dev->lock);
>   
> @@ -391,32 +435,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
>   
>   	mutex_lock(&matrix_dev->lock);
>   	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);

Maybe clear the bit in the vfio_ap_put_all_domains() function (which, as
I said above, should be named vfio_ap_mdev_free_queues_with_apid())?

> +	vfio_ap_put_all_domains(matrix_mdev, apid);
>   	mutex_unlock(&matrix_dev->lock);
>   
>   	return count;
>   }
>   static DEVICE_ATTR_WO(unassign_adapter);
>   
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> -					     unsigned long apqi)
> -{
> -	int ret;
> -	unsigned long apid;
> -	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> -	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> -		return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> -	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	return 0;
> -}
> -
>   /**
>    * assign_domain_store
>    *
> @@ -471,21 +496,15 @@ static ssize_t assign_domain_store(struct device *dev,
>   
>   	mutex_lock(&matrix_dev->lock);
>   
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> +	ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
>   	if (ret)
>   		goto done;
>   
>   	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>   
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> -	if (ret)
> -		goto share_err;
> -
>   	ret = count;
>   	goto done;
>   
> -share_err:
> -	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>   done:
>   	mutex_unlock(&matrix_dev->lock);
>   
> @@ -533,6 +552,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>   
>   	mutex_lock(&matrix_dev->lock);
>   	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);

Maybe clear the apqi in the vfio_ap_put_all_cards() function (which I
suggested should be called vfio_ap_mdev_free_queues_with_apqi()).
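
A small userspace sketch of what moving the bit clearing into the helper
could look like. The 64-bit masks, the mock_ names and the freed counter
are assumptions made for the illustration; the real masks are larger
bitmaps handled with the *_bit_inv() helpers, which number bits from the
most significant end, and the sketch mirrors only that numbering.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MAX_IDS 64

/* Mocked matrix: one 64-bit mask each for adapters and domains. */
struct mock_matrix {
	uint64_t apm;	/* adapter mask, one bit per APID */
	uint64_t aqm;	/* domain mask, one bit per APQI */
	int freed;	/* queues handed back, counted for illustration only */
};

/* MSB-first bit numbering, in the spirit of the *_bit_inv() helpers. */
static uint64_t mock_bit_inv(unsigned int nr)
{
	assert(nr < MOCK_MAX_IDS);
	return UINT64_C(1) << (MOCK_MAX_IDS - 1 - nr);
}

/*
 * Release every queue (apid, apqi) for the adapters currently assigned,
 * then clear the apqi from the domain mask in the same place, so the
 * caller no longer has to keep the two steps in sync.
 */
static void mock_free_queues_with_apqi(struct mock_matrix *m, unsigned int apqi)
{
	for (unsigned int apid = 0; apid < MOCK_MAX_IDS; apid++)
		if (m->apm & mock_bit_inv(apid))
			m->freed++;	/* stand-in for releasing one queue */

	m->aqm &= ~mock_bit_inv(apqi);
}
```

The unassign path would then reduce to taking the lock and calling the
helper, with the mask update and the queue release kept together.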

> +	vfio_ap_put_all_cards(matrix_mdev, apqi);
>   	mutex_unlock(&matrix_dev->lock);
>   
>   	return count;
> @@ -790,49 +810,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   	return NOTIFY_OK;
>   }
>   
> -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> -				    unsigned int retry)
> -{
> -	struct ap_queue_status status;
> -
> -	do {
> -		status = ap_zapq(AP_MKQID(apid, apqi));
> -		switch (status.response_code) {
> -		case AP_RESPONSE_NORMAL:
> -			return 0;
> -		case AP_RESPONSE_RESET_IN_PROGRESS:
> -		case AP_RESPONSE_BUSY:
> -			msleep(20);
> -			break;
> -		default:
> -			/* things are really broken, give up */
> -			return -EIO;
> -		}
> -	} while (retry--);
> -
> -	return -EBUSY;
> -}
> -
>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>   {
>   	int ret;
>   	int rc = 0;
> -	unsigned long apid, apqi;
>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	struct vfio_ap_queue *q;
>   
> -	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> -			     matrix_mdev->matrix.apm_max + 1) {
> -		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> -				     matrix_mdev->matrix.aqm_max + 1) {
> -			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> -			/*
> -			 * Regardless whether a queue turns out to be busy, or
> -			 * is not operational, we need to continue resetting
> -			 * the remaining queues.
> -			 */
> -			if (ret)
> -				rc = ret;
> -		}
> +	list_for_each_entry(q, &matrix_mdev->qlist, list) {
> +		ret = vfio_ap_mdev_reset_queue(q);
> +		/*
> +		 * Regardless whether a queue turns out to be busy, or
> +		 * is not operational, we need to continue resetting
> +		 * the remaining queues but notice the last error code.
> +		 */
> +		if (ret)
> +			rc = ret;
>   	}
>   
>   	return rc;
> 



* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
@ 2019-02-26 18:23   ` Tony Krowiak
  2019-02-27  9:54     ` Pierre Morel
  2019-02-27 18:18   ` Tony Krowiak
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-02-26 18:23 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> We register the AP PQAP instruction hook during the open
> of the mediated device. And unregister it on release.
> 
> In the AP PQAP instruction hook, if we receive a demand to
> enable IRQs,
> - we retrieve the vfio_ap_queue based on the APQN we receive
>    in REG1,
> - we retrieve the page of the guest address, (NIB), from
>    register REG2
> - we use the mediated device to access the VFIO pinning infrastructure
>    to pin the page of the guest address,
> - we retrieve the pointer to KVM to register the guest ISC
>    and retrieve the host ISC
> - finally we activate GISA
> 
> If we receive a demand to disable IRQs,
> - we deactivate GISA
> - unregister from the GIB
> - unpin the NIB
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h      |   1 +
>   drivers/s390/crypto/ap_bus.h          |   1 +
>   drivers/s390/crypto/vfio_ap_ops.c     | 199 +++++++++++++++++++++++++++++++++-
>   drivers/s390/crypto/vfio_ap_private.h |   1 +
>   4 files changed, 199 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 49cc8b0..5f3bb8c 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
>   struct kvm_s390_crypto {
>   	struct kvm_s390_crypto_cb *crycb;
>   	int (*pqap_hook)(struct kvm_vcpu *vcpu);
> +	void *vfio_private;
>   	__u32 crycbd;
>   	__u8 aes_kw;
>   	__u8 dea_kw;
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index bfc66e4..323f2aa 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
>   #define AP_RESPONSE_BUSY		0x05
>   #define AP_RESPONSE_INVALID_ADDRESS	0x06
>   #define AP_RESPONSE_OTHERWISE_CHANGED	0x07
> +#define AP_RESPONSE_INVALID_GISA	0x08
>   #define AP_RESPONSE_Q_FULL		0x10
>   #define AP_RESPONSE_NO_PENDING_REPLY	0x10
>   #define AP_RESPONSE_INDEX_TOO_BIG	0x11
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1b5130a..0196065 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -43,7 +43,7 @@ struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
>   	return NULL;
>   }
>   
> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>   {
>   	struct ap_queue_status status;
>   	int retry = 20;
> @@ -75,6 +75,27 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>   	return -EBUSY;
>   }
>   
> +/**
> + * vfio_ap_free_irq:
> + * @q: The vfio_ap_queue
> + *
> + * Unpin the guest NIB
> + * Unregister the ISC from the GIB alert
> + * Clear the vfio_ap_queue intern fields
> + */
> +static void vfio_ap_free_irq(struct vfio_ap_queue *q)
> +{
> +	if (!q)
> +		return;
> +	if (q->g_pfn)
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
> +	if (q->isc)
> +		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);
> +	q->nib = 0;
> +	q->isc = 0;
> +	q->g_pfn = 0;
> +}
> +
>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>   				struct ap_matrix *matrix)
>   {
> @@ -97,6 +118,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>   	}
>   
>   	INIT_LIST_HEAD(&matrix_mdev->qlist);
> +	matrix_mdev->mdev = mdev;
>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>   	mdev_set_drvdata(mdev, matrix_mdev);
>   	mutex_lock(&matrix_dev->lock);
> @@ -109,10 +131,16 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>   static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>   {
>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	struct vfio_ap_queue *q, *qtmp;
>   
>   	if (matrix_mdev->kvm)
>   		return -EBUSY;
>   
> +	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
> +		q->matrix_mdev = NULL;
> +		vfio_ap_mdev_reset_queue(q);
> +		list_move(&q->list, &matrix_dev->free_list);
> +	}
>   	mutex_lock(&matrix_dev->lock);
>   	list_del(&matrix_mdev->node);
>   	mutex_unlock(&matrix_dev->lock);
> @@ -748,6 +776,161 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>   };
>   
>   /**
> + * vfio_ap_clrirq: Disable Interruption for a APQN
> + *
> + * @dev: the device associated with the ap_queue
> + * @q:   the vfio_ap_queue holding AQIC parameters
> + *
> + * Issue the host side PQAP/AQIC
> + * On success: unpin the NIB saved in *q and unregister from GIB
> + * interface
> + *
> + * Return the ap_queue_status returned by the ap_aqic()
> + */
> +static struct ap_queue_status vfio_ap_clrirq(struct vfio_ap_queue *q)
> +{
> +	struct ap_qirq_ctrl aqic_gisa = {};
> +	struct ap_queue_status status;
> +
> +	status = ap_aqic(q->apqn, aqic_gisa, NULL);
> +	if (!status.response_code)
> +		vfio_ap_free_irq(q);
> +
> +	return status;
> +}
> +
> +/**
> + * vfio_ap_setirq: Enable Interruption for a APQN
> + *
> + * @dev: the device associated with the ap_queue
> + * @q:   the vfio_ap_queue holding AQIC parameters
> + *
> + * Pin the NIB saved in *q
> + * Register the guest ISC to GIB interface and retrieve the
> + * host ISC to issue the host side PQAP/AQIC
> + *
> + * Response.status may be set to following Response Code in case of error:
> + * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
> + * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervizor GISA internal error
> + *
> + * Otherwise return the ap_queue_status returned by the ap_aqic()
> + */
> +static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
> +{
> +	struct ap_qirq_ctrl aqic_gisa = {};
> +	struct ap_queue_status status = {};
> +	struct kvm_s390_gisa *gisa;
> +	struct kvm *kvm;
> +	unsigned long g_pfn, h_nib, h_pfn;
> +	int ret;
> +
> +	kvm = q->matrix_mdev->kvm;
> +	gisa = kvm->arch.gisa_int.origin;
> +
> +	g_pfn = q->nib >> PAGE_SHIFT;
> +	ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1,
> +			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
> +	switch (ret) {
> +	case 1:
> +		break;
> +	case -EINVAL:
> +	case -E2BIG:
> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> +		/* Fallthrough */
> +	default:
> +		return status;
> +	}
> +
> +	h_nib = (h_pfn << PAGE_SHIFT) | (q->nib & ~PAGE_MASK);
> +	aqic_gisa.gisc = q->isc;
> +	aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->isc);
> +	aqic_gisa.ir = 1;
> +	aqic_gisa.gisa = gisa->next_alert >> 4;
> +
> +	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
> +	switch (status.response_code) {
> +	case AP_RESPONSE_NORMAL:
> +		if (q->g_pfn)
> +			vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
> +					 &q->g_pfn, 1);
> +		q->g_pfn = g_pfn;
> +		break;
> +	case AP_RESPONSE_OTHERWISE_CHANGED:
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1);
> +		break;
> +	case AP_RESPONSE_INVALID_GISA:
> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> +	default:	/* Fall Through */
> +		pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
> +			status.response_code);
> +		vfio_ap_free_irq(q);
> +		break;
> +	}
> +
> +	return status;
> +}
> +
> +/**
> + * handle_pqap: PQAP instruction callback
> + *
> + * @vcpu: The vcpu on which we received the PQAP instruction
> + *
> + * Get the general register contents to initialize internal variables.
> + * REG[0]: APQN
> + * REG[1]: IR and ISC
> + * REG[2]: NIB
> + *
> + * Response.status may be set to following Response Code:
> + * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
> + * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
> + * - AP_RESPONSE_NORMAL (0) : in case of successs
> + *   Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible RC.
> + *
> + * Return 0 if we could handle the request inside KVM.
> + * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
> + */
> +static int handle_pqap(struct kvm_vcpu *vcpu)

Change this function name to handle_pqap_aqic

> +{
> +	uint64_t status;
> +	uint16_t apqn;
> +	struct vfio_ap_queue *q;
> +	struct ap_queue_status qstatus = {};
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	/* If we do not use the AIV facility just go to userland */
> +	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> +		return -EOPNOTSUPP;
> +
> +	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> +	matrix_mdev = vcpu->kvm->arch.crypto.vfio_private;
> +	if (!matrix_mdev)
> +		return -EOPNOTSUPP;
> +	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> +	if (!q) {
> +		qstatus.response_code = AP_RESPONSE_Q_NOT_AVAIL;
> +		goto out;
> +	}
> +
> +	status = vcpu->run->s.regs.gprs[1];
> +
> +	/* If IR bit(16) is set we enable the interrupt */
> +	if ((status >> (63 - 16)) & 0x01) {
> +		q->isc = status & 0x07;
> +		q->nib = vcpu->run->s.regs.gprs[2];
> +		qstatus = vfio_ap_setirq(q);
> +		if (qstatus.response_code) {
> +			q->nib = 0;
> +			q->isc = 0;
> +		}
> +	} else
> +		qstatus = vfio_ap_clrirq(q);
> +
> +out:
> +	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
> +	return 0;
> +}

Add this function:

static int handle_pqap(struct kvm_vcpu *vcpu)
{
	int ret;
	uint8_t fc;

	fc = vcpu->run->s.regs.gprs[0] >> 24;
	switch (fc) {
	case 0x03:
		ret = handle_pqap_aqic(vcpu);
		break;
	default:
		ret = -EOPNOTSUPP;
		break;
	}

	return ret;
}
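
For illustration only, here is a minimal user-space sketch of the GPR0
decoding this dispatch relies on. It follows the expressions used in
this thread (`gprs[0] >> 24` stored in a uint8_t for the function code,
`gprs[0] & 0xffff` for the APQN). The names pqap_fc(), pqap_apqn(),
pqap_is_aqic() and PQAP_FC_AQIC are hypothetical and not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the patch uses the bare constant 0x03 for AQIC. */
#define PQAP_FC_AQIC 0x03

/*
 * Function code as extracted in the thread: shift GPR0 right by 24 and
 * keep the low 8 bits (the truncation done by storing into a uint8_t).
 */
static inline uint8_t pqap_fc(uint64_t gpr0)
{
	return (uint8_t)(gpr0 >> 24);
}

/* APQN: the low 16 bits of GPR0, as in "gprs[0] & 0xffff". */
static inline uint16_t pqap_apqn(uint64_t gpr0)
{
	return (uint16_t)(gpr0 & 0xffff);
}

/* Non-zero when the function code selects AQIC, zero otherwise. */
static inline int pqap_is_aqic(uint64_t gpr0)
{
	return pqap_fc(gpr0) == PQAP_FC_AQIC;
}
```

With such helpers the switch above would dispatch on pqap_fc(gprs[0])
and anything other than 0x03 would still end in -EOPNOTSUPP.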

> +
> + /*
>    * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>    *
>    * @nb: The notifier block
> @@ -767,9 +950,10 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>   
>   	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>   		struct vfio_iommu_type1_dma_unmap *unmap = data;
> -		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
> +		unsigned long pfn = unmap->iova >> PAGE_SHIFT;
>   
> -		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
> +		if (matrix_mdev->mdev)
> +			vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
>   		return NOTIFY_OK;
>   	}
>   
> @@ -879,6 +1063,11 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>   	if (ret)
>   		goto err_group;
>   
> +	if (!matrix_mdev->kvm) {
> +		ret = -ENODEV;
> +		goto err_iommu;
> +	}
> +
>   	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>   
> @@ -887,6 +1076,8 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>   	if (ret)
>   		goto err_iommu;
>   
> +	matrix_mdev->kvm->arch.crypto.pqap_hook = handle_pqap;
> +	matrix_mdev->kvm->arch.crypto.vfio_private = matrix_mdev;

I do not see this used anywhere, why do we need it?

>   	return 0;
>   
>   err_iommu:
> @@ -905,6 +1096,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>   
>   	vfio_ap_mdev_reset_queues(mdev);
> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> +	matrix_mdev->kvm->arch.crypto.vfio_private = NULL;

Ditto

>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>   				 &matrix_mdev->group_notifier);
>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index e535735..e2fd2c0 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -94,6 +94,7 @@ struct vfio_ap_queue {
>   	struct list_head list;
>   	struct ap_matrix_mdev *matrix_mdev;
>   	unsigned long nib;
> +	unsigned long g_pfn;

Can't this be calculated from the nib?
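
As a rough sketch of that calculation, assuming 4 KiB pages: the guest
frame number is the NIB address shifted by the page shift, and the host
NIB address is the pinned host frame plus the offset inside the page.
The SKETCH_* macros and the helper names below are made up for this
example; the kernel provides PAGE_SHIFT and PAGE_MASK itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. These stand in for the kernel's own macros. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Guest page frame number of the NIB: same expression as in vfio_ap_setirq(). */
static inline unsigned long nib_to_g_pfn(unsigned long nib)
{
	return nib >> SKETCH_PAGE_SHIFT;
}

/* Host NIB address: pinned host frame plus the byte offset within the page. */
static inline unsigned long nib_to_h_nib(unsigned long nib, unsigned long h_pfn)
{
	return (h_pfn << SKETCH_PAGE_SHIFT) | (nib & ~SKETCH_PAGE_MASK);
}
```

One caveat visible in the patch itself: q->g_pfn is also tested for
non-zero before unpinning, so dropping the field would need another way
to record that a page is currently pinned.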

>   	int	apqn;
>   	unsigned char isc;
>   };
> 



* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-02-22 15:29 ` [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
@ 2019-02-26 18:27   ` Tony Krowiak
  2019-02-27  9:58     ` Pierre Morel
  2019-03-04 13:02     ` Cornelia Huck
  2019-03-08 22:43   ` Tony Krowiak
  1 sibling, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-26 18:27 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> When the device is removed, we must make sure to
> clear the interruption and reset the AP device.
> 
> We also need to clear the CRYCB of the guest.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   drivers/s390/crypto/vfio_ap_drv.c     | 35 +++++++++++++++++++++++++++++++++++
>   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
>   drivers/s390/crypto/vfio_ap_private.h |  3 +++
>   3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index eca0ffc..e5d91ff 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -5,6 +5,7 @@
>    * Copyright IBM Corp. 2018
>    *
>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
> + *	      Pierre Morel <pmorel@linux.ibm.com>
>    */
>   
>   #include <linux/module.h>
> @@ -12,6 +13,8 @@
>   #include <linux/slab.h>
>   #include <linux/string.h>
>   #include <asm/facility.h>
> +#include <linux/bitops.h>
> +#include <linux/kvm_host.h>
>   #include "vfio_ap_private.h"
>   
>   #define VFIO_AP_ROOT_NAME "vfio_ap"
> @@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>   }
>   
>   /**
> + * vfio_ap_update_crycb
> + * @q: A pointer to the queue being removed
> + *
> + * We clear the APID of the queue, making this queue unusable for the guest.
> + * After this function we can reset the queue without to fear a race with
> + * the guest to access the queue again.
> + * We do not fear race with the host as we still get the device.
> + */
> +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
> +{
> +	struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
> +
> +	if (!matrix_mdev)
> +		return;
> +
> +	clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
> +
> +	if (!matrix_mdev->kvm)
> +		return;
> +
> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +				  matrix_mdev->matrix.apm,
> +				  matrix_mdev->matrix.aqm,
> +				  matrix_mdev->matrix.adm);
> +}
> +
> +/**
>    * vfio_ap_queue_dev_remove:
>    *
>    * Free the associated vfio_ap_queue structure
> @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>   	struct vfio_ap_queue *q;
>   
>   	q = dev_get_drvdata(&apdev->device);
> +	if (!q)
> +		return;
> +
> +	vfio_ap_update_crycb(q);
> +	vfio_ap_mdev_reset_queue(q);

The reset is unnecessary: once the card is removed from the CRYCB, the
ZAPQ may fail because the queue may no longer exist. Besides, once the
card is removed from the guest's CRYCB, the bus running in the guest
will do a reset.

>   	list_del(&q->list);
>   	kfree(q);
>   }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 0196065..5b9bb33 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -59,6 +59,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>   			if (retry <= 0)
>   				pr_warn("%s: queue 0x%04x not empty\n",
>   					__func__, q->apqn);
> +			vfio_ap_free_irq(q);
>   			return 0;
>   		case AP_RESPONSE_RESET_IN_PROGRESS:
>   		case AP_RESPONSE_BUSY:
> @@ -83,7 +84,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>    * Unregister the ISC from the GIB alert
>    * Clear the vfio_ap_queue intern fields
>    */
> -static void vfio_ap_free_irq(struct vfio_ap_queue *q)
> +void vfio_ap_free_irq(struct vfio_ap_queue *q)
>   {
>   	if (!q)
>   		return;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index e2fd2c0..cc18215 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -4,6 +4,7 @@
>    *
>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
>    *	      Halil Pasic <pasic@linux.ibm.com>
> + *	      Pierre Morel <pmorel@linux.ibm.com>
>    *
>    * Copyright IBM Corp. 2018
>    */
> @@ -98,4 +99,6 @@ struct vfio_ap_queue {
>   	int	apqn;
>   	unsigned char isc;
>   };
> +void vfio_ap_free_irq(struct vfio_ap_queue *q);
> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
>   #endif /* _VFIO_AP_PRIVATE_H_ */
> 



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-26 15:47       ` Tony Krowiak
@ 2019-02-27  8:09         ` Pierre Morel
  2019-02-27  9:13           ` Cornelia Huck
  2019-02-27 18:00           ` Tony Krowiak
  0 siblings, 2 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-27  8:09 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 26/02/2019 16:47, Tony Krowiak wrote:
> On 2/26/19 6:47 AM, Pierre Morel wrote:
>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>> the case the AQIC facility is enabled in the guest.
>>>>
>>>> We add a callback inside the KVM arch structure for s390 for
>>>> a VFIO driver to handle a specific response to the PQAP
>>>> instruction with the AQIC command.
>>>>
>>>> We inject the correct exceptions from inside KVM for the case the
>>>> callback is not initialized, which happens when the vfio_ap driver
>>>> is not loaded.
>>>>
>>>> If the callback has been setup we call it.
>>>> If not we setup an answer considering that no queue is available
>>>> for the guest when no callback has been setup.
>>>>
>>>> We do consider the responsability of the driver to always initialize
>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>> a guest.
>>>>
>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>
>> ...snip...
>>
>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>       }
>>>>   }
>>>> +/*
>>>> + * handle_pqap: Handling pqap interception
>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>> + *
>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>> + *
>>>> + * The intercepting code calls a dedicated callback for this 
>>>> instruction
>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>> + * SIE block.
>>>> + *
>>>> + * For PQAP/AQIC instructions only, verify privilege and 
>>>> specifications.
>>>> + *
>>>> + * If no callback available, the queues are not available, return 
>>>> this to
>>>> + * the caller.
>>>> + * Else return the value returned by the callback.
>>>> + */
>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +    uint8_t fc;
>>>> +    struct ap_queue_status status = {};
>>>> +
>>>> +    /* Verify that the AP instruction are available */
>>>> +    if (!ap_instructions_available())
>>>> +        return -EOPNOTSUPP;
>>>
>>> How can the guest even execute an AP instruction if the AP instructions
>>> are not available? If the AP instructions are not available on the host,
>>> they will not be available on the guest (i.e., CPU model feature
>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>> here given QEMU may not be the only client.
>>>
>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>> +        return -EOPNOTSUPP;
>>>> +    /* Verify that the function code is AQIC */
>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>> +    if (fc != 0x03)
>>>> +        return -EOPNOTSUPP;
>>>
>>> You must have missed my suggestion to move this to the
>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>
>> Please consider what happen if the vfio_ap module is not loaded.
> 
> I have considered it and even verified my expectations empirically. If
> the vfio_ap module is not loaded, you will not be able to create an mdev 
> device.

OK, now please consider that another userland tool, not QEMU, uses KVM.

> If you don't have an mdev device, you will not be able to
> start a guest with a vfio-ap device. If you start a guest without a
> vfio-ap device, but enable AP instructions for the guest, there will be
> no AP devices attached to the guest. Without any AP devices attached,
> the PQAP(AQIC) instructions will not ever get executed.

This is not right. The instruction will be executed, eventually, after 
decoding.

> Even if for some
> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
> reason, it will fail with response code 0x01, AP-queue number not valid.

No, before accessing the AP queue the instruction will be decoded and, 
depending on the installed microcode, it will fail with
- OPERATION exception if the microcode is not installed
- PRIVILEGED OPERATION exception if the instruction is issued from 
  userland (problem state)
- SPECIFICATION exception if the instruction does not respect the usage 
  specification

Only then will it be interpreted by the microcode and access the queue, 
and only then will it fail with RC 0x01, AP queue not valid.

In the case of KVM, we intercept the instruction because it is issued by 
the guest and we set the AQIC facility on to force interception.

KVM does all the decode steps I mentioned above for us, whether or not 
there is a pqap hook to call to simulate the AP queue access.

That done, the AP queue virtualisation can be called; this is done by 
calling the hook.
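
The ordering described above can be modelled with a small user-space
sketch. It is only a model of the sequence of checks, not the KVM code:
the enum, the struct pqap_ctx fields and pqap_intercept() are all
hypothetical names, and the "no hook" case is mapped to response code
0x01 as discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible results of the modelled interception, in the order they are checked. */
enum pqap_outcome {
	PQAP_OPERATION_EXCEPTION,     /* facility/microcode not installed */
	PQAP_PRIVILEGED_OPERATION,    /* issued from problem state */
	PQAP_SPECIFICATION_EXCEPTION, /* usage specification not respected */
	PQAP_RC_QUEUE_NOT_VALID,      /* no hook: answer with response code 0x01 */
	PQAP_HOOK_CALLED,             /* decode passed and a hook is registered */
};

/* Hypothetical inputs to the decode steps. */
struct pqap_ctx {
	bool facility_installed;
	bool problem_state;
	bool spec_ok;
	bool hook_registered;
};

/* Decode checks come first; the hook is only considered afterwards. */
static enum pqap_outcome pqap_intercept(const struct pqap_ctx *c)
{
	assert(c != NULL);

	if (!c->facility_installed)
		return PQAP_OPERATION_EXCEPTION;
	if (c->problem_state)
		return PQAP_PRIVILEGED_OPERATION;
	if (!c->spec_ok)
		return PQAP_SPECIFICATION_EXCEPTION;
	if (!c->hook_registered)
		return PQAP_RC_QUEUE_NOT_VALID;
	return PQAP_HOOK_CALLED;
}
```

In this model the decode-time exceptions are reported identically with
and without a registered hook, which is the point being argued here.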

> 
> 
>>
>>>
>>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
>>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
>>>
>>> You previously stated:
>>>
>>>     "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap 
>>> driver is
>>>      not loaded. However now that the guest officially get the PQAP/AQIC
>>>      instruction we need to handle the specification and operation
>>>      exceptions inside KVM _before_ testing and even calling the driver
>>>      hook.
>>>
>>>      I will make the changes in the next iteration."
>>
>> Still seems right to me, and is done is this patch.
>> Isn't it?
> 
> I don't think it's a matter of right and wrong, it's a matter of what
> makes sense. IMHO, you want to make things easy if other PQAP functions
> are intercepted at some time. In my opinion, there should be a switch
> statement in the pqap hook code with a case statement for each PQAP
> function supported by the hook. To plug in a new PQAP function handler,
> it will be a simple matter of writing the handler function and calling
> it from the case statement, like this:
> 
> static int handle_pqap(struct kvm_vcpu *vcpu)
> {
>      int ret;
>      uint8_t fc;
> 
>      fc = vcpu->run->s.regs.gprs[0] >> 24;
> 
>      switch (fc) {
>      case 0x03:
>          ret = handle_pqap_aqic(vcpu);
>          break;
>      default:
>          ret = -EOPNOTSUPP;
>          break;
>      }
> 
>      return ret;
> }
> 
> That function belongs in the pqap hook. I see no reason whatsoever to
> check the function code here. If there is no hook, then you will fall
> through to the instruction below:
> 
> status.response_code = 0x01;

See my answer above: what you are describing is the execution of the 
instruction, but exceptions can occur while the instruction is being 
decoded.

> 
>>
>>>
>>> I don't know what any of the above has to do with checking FC=0x03? If
>>> that check is moved to the pqap handler hook, it can just as well return
>>> -EOPNOTSUPP. In fact, down below you do this:
>>>
>>>      return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
>>>
>>> If the RC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
>>> like above. None of this is critical, but the parsing of the register
>>> values for the PQAP(AQIC) function ought to be done in the code that
>>> handles the PQAP instruction IMHO.
>>
>>
>> This interception code must handle the PQAP/AQIC instruction when the 
>> hook is not used and should not modify the handling for other PQAP 
>> instructions.
>> We can not move anything inside the hook that must be always done.
> 
> What you are saying here makes no sense. If the check for the function
> code is moved into the pqap hook and fc != 0x03, the result will be
> exactly the same; the hook will return -EOPNOTSUPP.

Again, please consider that the hook may not be initialized.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure
  2019-02-26 16:10   ` Tony Krowiak
@ 2019-02-27  8:40     ` Pierre Morel
  2019-02-27 20:35       ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-27  8:40 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 26/02/2019 17:10, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> The AP interruptions are assigned on a queue basis and
>> the GISA structure is handled on a VM basis, so that
>> we need to add a structure we can retrieve from both side
>> holding the information we need to handle PQAP/AQIC interception
>> and setup the GISA.
>>
>> Since we can not add more information to the ap_device
>> we add a new structure vfio_ap_queue, to hold per queue
>> information useful to handle interruptions and set it as
>> driver's data of the standard ap_queue device.
>>
>> Usually, the device and the mediated device are linked together
>> but in the vfio_ap driver design we have a bunch of "sub" devices
>> (the ap_queue devices) belonging to the mediated device.
>>
>> Linking these structure to the mediated device it is assigned to,
>> with the help of the vfio_ap_queue structure will help us to
>> retrieve the AP devices associated with the mediated devices
>> during the mediated device operations.
>>
>> ------------    -------------
>> | AP queue |--> | AP_vfio_q |<----
>> ------------    ------^------    |    ---------------
>>                        |          <--->| matrix_mdev |
>> ------------    ------v------    |    ---------------
>> | AP queue |--> | AP_vfio_q |-----
>> ------------    -------------
>>
>> The vfio_ap_queue device will hold the following entries:
>> - apqn: AP queue number (defined here)
>> - isc : Interrupt subclass (defined later)
>> - nib : notification information byte (defined later)
>> - list: a list_head entry allowing to link this structure to a
>>     matrix mediated device it is assigned to.
>>
>> The vfio_ap_queue structure is allocated when the vfio_ap_driver
>> is probed and added as driver data to the ap_queue device.
>> It is free on remove.
>>
>> The structure is linked to the matrix_dev host device at the
>> probe of the device building some kind of free list for the
>> matrix mediated devices.
>>
>> When the vfio_queue is associated to a matrix mediated device,
>> the vfio_ap_queue device is linked to this matrix mediated device
>> and unlinked when dissociated.
>>
>> This patch and the 3 next can be squashed together on the
>> final release of this series.
>> until then I separate them to ease the review.
>>
>> So please do not complain about unused functions or about
>> squashing the patches together, this will be resolved during
>> the last iteration.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     | 27 ++++++++++++++++++++++++++-
>>   drivers/s390/crypto/vfio_ap_private.h |  9 +++++++++
>>   2 files changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
>> b/drivers/s390/crypto/vfio_ap_drv.c
>> index e9824c3..eca0ffc 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -40,14 +40,38 @@ static struct ap_device_id ap_queue_ids[] = {
>>   MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>> +/**
>> + * vfio_ap_queue_dev_probe:
>> + *
>> + * Allocate a vfio_ap_queue structure and associate it
>> + * with the device as driver_data.
>> + */
>>   static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>>   {
>> +    struct vfio_ap_queue *q;
>> +
>> +    q = kzalloc(sizeof(*q), GFP_KERNEL);
>> +    if (!q)
>> +        return -ENOMEM;
>> +    dev_set_drvdata(&apdev->device, q);
>> +    q->apqn = to_ap_queue(&apdev->device)->qid;
>> +    INIT_LIST_HEAD(&q->list);
>> +    list_add(&q->list, &matrix_dev->free_list);
>>       return 0;
>>   }
>> +/**
>> + * vfio_ap_queue_dev_remove:
>> + *
>> + * Free the associated vfio_ap_queue structure
>> + */
>>   static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>>   {
>> -    /* Nothing to do yet */
>> +    struct vfio_ap_queue *q;
>> +
>> +    q = dev_get_drvdata(&apdev->device);
>> +    list_del(&q->list);
>> +    kfree(q);
>>   }
>>   static void vfio_ap_matrix_dev_release(struct device *dev)
>> @@ -107,6 +131,7 @@ static int vfio_ap_matrix_dev_create(void)
>>       matrix_dev->device.bus = &matrix_bus;
>>       matrix_dev->device.release = vfio_ap_matrix_dev_release;
>>       matrix_dev->vfio_ap_drv = &vfio_ap_drv;
>> +    INIT_LIST_HEAD(&matrix_dev->free_list);
>>       ret = device_register(&matrix_dev->device);
>>       if (ret)
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
>> b/drivers/s390/crypto/vfio_ap_private.h
>> index 76b7f98..2760178 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -39,6 +39,7 @@ struct ap_matrix_dev {
>>       atomic_t available_instances;
>>       struct ap_config_info info;
>>       struct list_head mdev_list;
>> +    struct list_head free_list;
>>       struct mutex lock;
>>       struct ap_driver  *vfio_ap_drv;
>>   };
>> @@ -81,9 +82,17 @@ struct ap_matrix_mdev {
>>       struct ap_matrix matrix;
>>       struct notifier_block group_notifier;
>>       struct kvm *kvm;
>> +    struct list_head qlist;
> 
> I do not see much value in maintaining two lists at the
> expense of complicating the code and introducing additional
> processing (i.e., list rewinds etc.). IMHO, the only thing it buys
> us is being able to pass a smaller list to the vfio_ap_get_queue()
> function to traverse. That function can traverse the list in
> struct ap_matrix_dev to find a queue. I understand what you are
> doing here in pulling vfio_ap_queue structs from the free_list
> to add them to qlist for the mdev when adapter/domain assignment
> takes place, but you are now maintaining links to the vfio_ap_queue
> in multiple places; as drvdata as well as two lists. I think this
> is over designing.

This is not quite accurate: the drvdata lets us retrieve the 
vfio_ap_queue structure from the AP device, which is global to all AP 
devices and not related to a mediated device.

The vfio_ap_queue structure is linked to only one of the lists it can 
belong to at any time: the free_list or the mediated device list.

The complication of the code you mention saves us almost 200 LOC; 
some may see it as a simplification ;) .
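
To illustrate the single-membership point, here is a minimal user-space
sketch, not the driver code. The list helpers only imitate the kernel's
struct list_head API, and struct demo_queue, the APQN value and all
function names are hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on struct list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *h)
{
    h->prev = h;
    h->next = h;
}

/* Unlink a node and leave it pointing at itself. */
static void list_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n;
    n->next = n;
}

/* Insert a node right after the list head. */
static void list_push(struct list_node *n, struct list_node *head)
{
    assert(n && head);
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Moving is unlink plus insert, so the node is never on two lists. */
static void list_move_to(struct list_node *n, struct list_node *head)
{
    list_unlink(n);
    list_push(n, head);
}

static int list_count(const struct list_node *head)
{
    int n = 0;

    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Hypothetical stand-in for struct vfio_ap_queue. */
struct demo_queue {
    int apqn;
    struct list_node list;  /* on the free list or on one mdev list */
};

/* Returns 1 if the queue is on exactly one list after every step. */
static int demo_single_membership(void)
{
    struct list_node free_list, qlist;
    struct demo_queue q = { .apqn = 0x0304 };  /* arbitrary value */

    list_init(&free_list);
    list_init(&qlist);
    list_init(&q.list);

    list_push(&q.list, &free_list);            /* probe: queue starts free */
    if (list_count(&free_list) != 1 || list_count(&qlist) != 0)
        return 0;

    list_move_to(&q.list, &qlist);             /* assignment to the mdev */
    if (list_count(&free_list) != 0 || list_count(&qlist) != 1)
        return 0;

    list_move_to(&q.list, &free_list);         /* unassignment */
    return list_count(&free_list) == 1 && list_count(&qlist) == 0;
}
```

Because a queue embeds a single list node, it cannot be linked into both
lists at once; the drvdata pointer is a separate, list-independent way to
reach the same structure.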

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-27  8:09         ` Pierre Morel
@ 2019-02-27  9:13           ` Cornelia Huck
  2019-02-27 10:16             ` Pierre Morel
  2019-02-27 18:00           ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2019-02-27  9:13 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Tony Krowiak, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, pasic, david, schwidefsky,
	heiko.carstens, freude, mimu

On Wed, 27 Feb 2019 09:09:09 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 26/02/2019 16:47, Tony Krowiak wrote:
> > On 2/26/19 6:47 AM, Pierre Morel wrote:  
> >> On 25/02/2019 19:36, Tony Krowiak wrote:  
> >>> On 2/22/19 10:29 AM, Pierre Morel wrote:  
> >>>> We prepare the interception of the PQAP/AQIC instruction for
> >>>> the case the AQIC facility is enabled in the guest.
> >>>>
> >>>> We add a callback inside the KVM arch structure for s390 for
> >>>> a VFIO driver to handle a specific response to the PQAP
> >>>> instruction with the AQIC command.
> >>>>
> >>>> We inject the correct exceptions from inside KVM for the case the
> >>>> callback is not initialized, which happens when the vfio_ap driver
> >>>> is not loaded.
> >>>>
> >>>> If the callback has been setup we call it.
> >>>> If not we setup an answer considering that no queue is available
> >>>> for the guest when no callback has been setup.
> >>>>
> >>>> We consider it the responsibility of the driver to always initialize
> >>>> the PQAP callback if it defines queues by initializing the CRYCB for
> >>>> a guest.
> >>>>
> >>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>  
> >>
> >> ...snip...
> >>  
> >>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> >>>>       }
> >>>>   }
> >>>> +/*
> >>>> + * handle_pqap: Handling pqap interception
> >>>> + * @vcpu: the vcpu having issue the pqap instruction
> >>>> + *
> >>>> + * We now support PQAP/AQIC instructions and we need to correctly
> >>>> + * answer the guest even if no dedicated driver's hook is available.
> >>>> + *
> >>>> + * The intercepting code calls a dedicated callback for this 
> >>>> instruction
> >>>> + * if a driver did register one in the CRYPTO satellite of the
> >>>> + * SIE block.
> >>>> + *
> >>>> + * For PQAP/AQIC instructions only, verify privilege and 
> >>>> specifications.
> >>>> + *
> >>>> + * If no callback available, the queues are not available, return 
> >>>> this to
> >>>> + * the caller.
> >>>> + * Else return the value returned by the callback.
> >>>> + */
> >>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +    uint8_t fc;
> >>>> +    struct ap_queue_status status = {};
> >>>> +
> >>>> +    /* Verify that the AP instruction are available */
> >>>> +    if (!ap_instructions_available())
> >>>> +        return -EOPNOTSUPP;  
> >>>
> >>> How can the guest even execute an AP instruction if the AP instructions
> >>> are not available? If the AP instructions are not available on the host,
> >>> they will not be available on the guest (i.e., CPU model feature
> >>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
> >>> here given QEMU may not be the only client.
> >>>  
> >>>> +    /* Verify that the guest is allowed to use AP instructions */
> >>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> >>>> +        return -EOPNOTSUPP;
> >>>> +    /* Verify that the function code is AQIC */
> >>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
> >>>> +    if (fc != 0x03)
> >>>> +        return -EOPNOTSUPP;  
> >>>
> >>> You must have missed my suggestion to move this to the
> >>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:  
> >>
> >> Please consider what happen if the vfio_ap module is not loaded.  
> > 
> > I have considered it and even verified my expectations empirically. If
> > the vfio_ap module is not loaded, you will not be able to create an mdev 
> > device.  
> 
> OK, now please consider that another userland tool, not QEMU uses KVM.
> 
> > If you don't have an mdev device, you will not be able to
> > start a guest with a vfio-ap device. If you start a guest without a
> > vfio-ap device, but enable AP instructions for the guest, there will be
> > no AP devices attached to the guest. Without any AP devices attached,
> > the PQAP(AQIC) instructions will not ever get executed.  
> 
> This is not right. The instruction will be executed, eventually, after 
> decoding.

A sane guest will not issue PQAP(AQIC) if it doesn't have ap
capabilities, but there's nothing that keeps a guest from issuing that
instruction regardless.

However, is this instruction always intercepted and never handled by
the SIE itself, even if the guest was not configured for ap? By which
criteria do we enable interception?

> 
> > Even if for some
> > unknown reason the PQAP(AQIC) instruction is executed - for some unknown
> > reason, it will fail with response code 0x01, AP-queue number not valid.  
> 
> No, before accessing the AP-queue the instruction will be decoded and 
> depending on the installed micro-code it will fail with
> - OPERATION EXCEPTION if the micro-code is not installed
> - PRIVILEGED OPERATION if the instruction is issued from userland 
> (problem state)
> - SPECIFICATION exception if the instruction does not respect the usage 
> specification

So, all of these happen prior to checking the function code?

> 
> then it will be interpreted by the microcode and access the queue and 
> only then it will fail with RC 0x01, AP queue not valid.
> 
> In the case of KVM, we intercept the instruction because it is issued by 
> the guest and we set the AQIC facility on to force interception.

Will we set that facility even if no vfio-ap device is configured?

> 
> KVM does all the decode steps I mentioned above for us, whether or not 
> there is a pqap hook to be called to simulate the AP queue access.
> 
> That done, the AP queue virtualisation can be called, this is done by 
> calling the hook.
> 
> > 
> >   
> >>  
> >>>
> >>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
> >>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
> >>>
> >>> You previously stated:
> >>>
> >>>     "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap 
> >>> driver is
> >>>      not loaded. However now that the guest officially get the PQAP/AQIC
> >>>      instruction we need to handle the specification and operation
> >>>      exceptions inside KVM _before_ testing and even calling the driver
> >>>      hook.
> >>>
> >>>      I will make the changes in the next iteration."  
> >>
> >> Still seems right to me, and is done in this patch.
> >> Isn't it?  
> > 
> > I don't think it's a matter of right and wrong, it's a matter of what
> > makes sense. IMHO, you want to make things easy if other PQAP functions
> > are intercepted at some time. In my opinion, there should be a switch
> > statement in the pqap hook code with a case statement for each PQAP
> > function supported by the hook. To plug in a new PQAP function handler,
> > it will be a simple matter of writing the handler function and calling
> > it from the case statement, like this:
> > 
> > static int handle_pqap(struct kvm_vcpu *vcpu)
> > {
> >      int ret;
> >      uint8_t fc;
> > 
> >      fc = vcpu->run->s.regs.gprs[0] >> 24;
> > 
> >      switch (fc) {
> >      case 0x03:
> >          ret = handle_pqap_aqic(vcpu);
> >          break;
> >      default:
> >          ret = -EOPNOTSUPP;
> >      }
> > 
> >      return ret;
> > }
> > 
> > That function belongs in the pqap hook. I see no reason whatsoever to
> > check the function code here. If there is no hook, then you will fall
> > through to the instruction below:
> > 
> > status.response_code = 0x01;  
> 
> See answer above, what you are speaking about is the execution of the 
> instruction, but there can be exceptions during the decode of the 
> instruction.

If e.g. calling that instruction from userspace always creates a priv
op exception, that should be checked prior to even looking at the
function code. Same with other exceptions. From my no-docs point of
view, it makes sense to have those common checks in handle_pqap() and
use the switch/case to call handler functions for the individual
function codes...
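
A rough user-space sketch of that split, with the common checks done
before the function code is looked at. This is not the KVM code:
struct demo_vcpu, the DEMO_* values and the relative order of the two
common checks are assumptions made for the example; only the function
code position (gprs[0] >> 24) and the AQIC value 0x03 come from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_FC_AQIC 0x03   /* PQAP function code for AQIC, as in the patch */

/* Hypothetical, simplified view of what the interception inspects. */
struct demo_vcpu {
    uint64_t gpr0;          /* function code taken from gpr0 >> 24 */
    int problem_state;      /* non-zero: instruction issued from user space */
    int ap_enabled;         /* stands in for the ECA_APIE check */
};

/* Hypothetical result codes for the sketch. */
enum demo_result {
    DEMO_HANDLED = 0,
    DEMO_PRIV_EXCEPTION = 1,
};

/* Per-function handler; a real one would call the driver hook. */
static int demo_handle_aqic(struct demo_vcpu *vcpu)
{
    (void)vcpu;
    return DEMO_HANDLED;
}

static int demo_handle_pqap(struct demo_vcpu *vcpu)
{
    uint8_t fc;

    assert(vcpu);

    /* Common checks: they apply whatever the function code is. */
    if (!vcpu->ap_enabled)
        return -EOPNOTSUPP;
    if (vcpu->problem_state)
        return DEMO_PRIV_EXCEPTION;

    /* Only now dispatch on the function code. */
    fc = (uint8_t)(vcpu->gpr0 >> 24);
    switch (fc) {
    case DEMO_FC_AQIC:
        return demo_handle_aqic(vcpu);
    default:
        return -EOPNOTSUPP;
    }
}
```

With this shape, adding another function code means adding one case and
one handler, while the checks that do not depend on the function code
stay in a single place.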

> 
> >   
> >>  
> >>>
> >>> I don't know what any of the above has to do with checking FC=0x03? If
> >>> that check is moved to the pqap handler hook, it can just as well return
> >>> -EOPNOTSUPP. In fact, down below you do this:
> >>>
> >>>      return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
> >>>
> >>> If the FC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
> >>> like above. None of this is critical, but the parsing of the register
> >>> values for the PQAP(AQIC) function ought to be done in the code that
> >>> handles the PQAP instruction IMHO.  
> >>
> >>
> >> This interception code must handle the PQAP/AQIC instruction when the 
> >> hook is not used and should not modify the handling for other PQAP 
> >> instructions.
> >> We can not move anything inside the hook that must be always done.  
> > 
> > What you are saying here makes no sense. If the check for the function
> > code is moved into the pqap hook and fc != 0x03, the result will be
> > exactly the same; the hook will return -EOPNOTSUPP.  
> 
> again please consider that the hook may not be initialized.

I agree.


* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-26 18:14   ` Tony Krowiak
@ 2019-02-27  9:29     ` Pierre Morel
  2019-02-27 20:14       ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-27  9:29 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 26/02/2019 19:14, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> We need to associate the ap_vfio_queue, which will hold the
>> per queue information for interrupt with a matrix mediated device
>> which hold the configuration and the way to the CRYCB.
>>
>> Let's do this when assigning a APID or a APQI to the mediated device
>> and clear the relation when unassigning.
>>
>> Queuing the devices on a list of free devices and testing the
>> matrix_mdev pointer to the associated matrix allow us to know
>> if the queue is associated to the matrix device and associated
>> or not to a mediated device.
>>
>> When resetting an AP queue we must wait until there are no more
>> messages in the message queue before considering the queue is really
>> in a clean state.
>>
>> Let's do it and wait until the status response code indicates the
>> queue is empty after issuing a PQAP/ZAPQ instruction.
>>
>> Being at work on the reset function, let's simplify
>> vfio_ap_mdev_reset_queue and vfio_ap_mdev_reset_queues by using the
>> vfio_ap_queue structure as parameter.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 385 
>> +++++++++++++++++++-------------------
>>   1 file changed, 189 insertions(+), 196 deletions(-)

...snip...

>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>> +{
>> +    struct ap_queue_status status;
>> +    int retry = 20;
>> +
>> +    do {
>> +        status = ap_zapq(q->apqn);
>> +        switch (status.response_code) {
>> +        case AP_RESPONSE_NORMAL:
>> +            while (!status.queue_empty && retry--) {
>> +                msleep(20);
>> +                status = ap_tapq(q->apqn, NULL);
>> +            }
> 
> I am not sure the above is necessary. I have an email out to the author
> of the architecture doc to verify.

I do not know what question you asked, but the documentation is very clear 
on the reset behavior: a queue is completely reset only after the RC 
of reset/zapq is 0 and the queue_empty bit is set.
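
As a sketch of that rule, assuming a toy queue model in user space
(demo_zapq, demo_tapq and struct demo_queue are hypothetical stand-ins,
not the ap_zapq/ap_tapq helpers), the reset counts as complete only when
both conditions hold, and the polling is bounded:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced queue status: only the two fields that matter. */
struct demo_status {
    unsigned int response_code;  /* 0 means normal completion */
    bool queue_empty;
};

/* Toy queue model: it drains one pending message per status poll. */
struct demo_queue {
    int pending;
};

/* Stand-in for the reset: accepted at once, queue may still be draining. */
static struct demo_status demo_zapq(struct demo_queue *q)
{
    struct demo_status s = { 0, q->pending == 0 };

    return s;
}

/* Stand-in for the status test: one message drains per call. */
static struct demo_status demo_tapq(struct demo_queue *q)
{
    if (q->pending > 0)
        q->pending--;

    struct demo_status s = { 0, q->pending == 0 };

    return s;
}

/*
 * Returns true once the response code is 0 and the queue is empty,
 * false if the reset was refused or max_polls was exhausted first.
 */
static bool demo_reset_queue(struct demo_queue *q, int max_polls)
{
    struct demo_status s;

    assert(q && max_polls >= 0);

    s = demo_zapq(q);
    if (s.response_code != 0)
        return false;

    while (!s.queue_empty && max_polls-- > 0) {
        /* the driver would sleep here, msleep(20) in the patch */
        s = demo_tapq(q);
    }

    return s.response_code == 0 && s.queue_empty;
}
```

Returning the final status rather than testing the retry counter avoids
reporting a failure when the queue drains on the very last poll.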

> 
>> +            if (retry <= 0)
>> +                pr_warn("%s: queue 0x%04x not empty\n",

...snip...

>> + * @matrix_mdev: the matrix mediated device for which we want to 
>> associate
>> + *         all available queues with a given apqi.
>> + * @apid:     The apid which associated with all defined APQI of the
>> + *         mediated device will define a AP queue.
>>    *
>> - * - If @data contains only an apid value, @data will be flagged as
>> - *   reserved if the APID field in the AP queue device matches
>> - *
>> - * - If @data contains only an apqi value, @data will be flagged as
>> - *   reserved if the APQI field in the AP queue device matches
>> - *
>> - * Returns 0 to indicate the input to function succeeded. Returns 
>> -EINVAL if
>> - * @data does not contain either an apid or apqi.
>> + * We remove the queue from the list of queues associated with the
>> + * mediated device and put them back to the free list of the matrix
>> + * device and clear the matrix_mdev pointer.
>>    */
>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
>> +                    int apid)
> 
> I would prefer this be named:
> 
>      vfio_ap_mdev_free_queues_with_apid()
> 
> get/put is typically used to increment/decrement reference counters.
> What you are doing in this function is freeing all queues connected to 
> the specified card.

OK, I can change this function name and the further one you mentioned.

> 
>>   {
>> -    struct vfio_ap_queue_reserved *qres = data;
>> -    struct ap_queue *ap_queue = to_ap_queue(dev);
>> -    ap_qid_t qid;
>> -    unsigned long id;
>> +    int apqi, apqn;
>> -    if (qres->apid && qres->apqi) {
>> -        qid = AP_MKQID(*qres->apid, *qres->apqi);
>> -        if (qid == ap_queue->qid)
>> -            qres->reserved = true;
>> -    } else if (qres->apid && !qres->apqi) {
>> -        id = AP_QID_CARD(ap_queue->qid);
>> -        if (id == *qres->apid)
>> -            qres->reserved = true;
>> -    } else if (!qres->apid && qres->apqi) {
>> -        id = AP_QID_QUEUE(ap_queue->qid);
>> -        if (id == *qres->apqi)
>> -            qres->reserved = true;
>> -    } else {
>> -        return -EINVAL;
>> +    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +        apqn = AP_MKQID(apid, apqi);
>> +        vfio_ap_free_queue(apqn, matrix_mdev);
>>       }
> 
> Maybe you should clear the bit corresponding to apid from the APM here?

I do not think so, this is pure list handling, the APM bit is already 
cleared in the unassign_adapter_store function.

I have answered only once for all the comments on naming and bit masks, 
but I will treat the others the same way.
Thanks for the comments.

Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
  2019-02-26 18:14   ` Tony Krowiak
@ 2019-02-27  9:32   ` Cornelia Huck
  2019-02-27 10:21     ` Pierre Morel
  2019-02-27 10:44     ` Pierre Morel
  2019-02-27 20:53   ` Tony Krowiak
  2019-03-04  2:09   ` Halil Pasic
  3 siblings, 2 replies; 79+ messages in thread
From: Cornelia Huck @ 2019-02-27  9:32 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On Fri, 22 Feb 2019 16:29:56 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> We need to associate the ap_vfio_queue, which will hold the
> per queue information for interrupt with a matrix mediated device
> which hold the configuration and the way to the CRYCB.
> 
> Let's do this when assigning a APID or a APQI to the mediated device
> and clear the relation when unassigning.
> 
> Queuing the devices on a list of free devices and testing the
> matrix_mdev pointer to the associated matrix allow us to know
> if the queue is associated to the matrix device and associated
> or not to a mediated device.
> 
> When resetting an AP queue we must wait until there are no more
> messages in the message queue before considering the queue is really
> in a clean state.
> 
> Let's do it and wait until the status response code indicates the
> queue is empty after issuing a PQAP/ZAPQ instruction.

I'm a bit confused about the context where that list moving etc. is
supposed to take place.

When are we assigning/deassigning? Is there even supposed to be any
activity that we need to zap on the queues?

Do we need any serialization/locking on the lists?

> 
> Being at work on the reset function, let's simplify
> vfio_ap_mdev_reset_queue and vfio_ap_mdev_reset_queues by using the
> vfio_ap_queue structure as parameter.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 385 +++++++++++++++++++-------------------
>  1 file changed, 189 insertions(+), 196 deletions(-)


* Re: [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-22 15:29 ` [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
@ 2019-02-27  9:42   ` Cornelia Huck
  2019-02-27 10:22     ` Pierre Morel
  2019-02-28  8:23   ` Christian Borntraeger
  1 sibling, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2019-02-27  9:42 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On Fri, 22 Feb 2019 16:29:57 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> To be able to use the VFIO interface to facilitate the
> mediated device memory pining/unpining we need to register

s/pining/pinning/ (unless it's pining for the fjords :)

> a notifier for IOMMU.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 53 ++++++++++++++++++++++++++++++++---
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 172d6eb..1b5130a 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -748,6 +748,36 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>  };
>  
>  /**
> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
> + *
> + * @nb: The notifier block
> + * @action: Action to be taken (VFIO_IOMMU_NOTIFY_DMA_UNMAP)

I'd drop this annotation; you only do something for UNMAP but nothing
prevents the caller from passing in something else :)

> + * @data: the specific unmap structure for vfio_iommu_type1

"data associated with the request" ?

(same reasoning as above)

> + *
> + * Unpins the guest IOVA. (The NIB guest address we pinned before).
> + * Return NOTIFY_OK after unpining on a UNMAP request.
> + * otherwise, returns NOTIFY_DONE .

"For an UNMAP request, unpin the guest IOVA (the NIB guest address we
pinned before). Other requests are ignored."

?

> + */

Looks sane to me.

With comments changed,
Reviewed-by: Cornelia Huck <cohuck@redhat.com>


* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-26 18:23   ` Tony Krowiak
@ 2019-02-27  9:54     ` Pierre Morel
  2019-02-27 18:17       ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-27  9:54 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 26/02/2019 19:23, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> We register the AP PQAP instruction hook during the open
>> of the mediated device. And unregister it on release.
>>
>> In the AP PQAP instruction hook, if we receive a demand to
>> enable IRQs,
>> - we retrieve the vfio_ap_queue based on the APQN we receive
>>    in REG1,
>> - we retrieve the page of the guest address, (NIB), from
>>    register REG2
>> - we retrieve the mediated device to use the VFIO pinning infrastructure
>>    to pin the page of the guest address,
>> - we retrieve the pointer to KVM to register the guest ISC
>>    and retrieve the host ISC
>> - finally we activate GISA
>>
>> If we receive a demand to disable IRQs,
>> - we deactivate GISA
>> - unregister from the GIB
>> - unpin the NIB
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h      |   1 +
>>   drivers/s390/crypto/ap_bus.h          |   1 +
>>   drivers/s390/crypto/vfio_ap_ops.c     | 199 
>> +++++++++++++++++++++++++++++++++-
>>   drivers/s390/crypto/vfio_ap_private.h |   1 +
>>   4 files changed, 199 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h 
>> b/arch/s390/include/asm/kvm_host.h
>> index 49cc8b0..5f3bb8c 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
>>   struct kvm_s390_crypto {
>>       struct kvm_s390_crypto_cb *crycb;
>>       int (*pqap_hook)(struct kvm_vcpu *vcpu);
>> +    void *vfio_private;

...snip...


>> + *
>> + * Return 0 if we could handle the request inside KVM.
>> + * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
>> + */
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
> 
> Change this function name to handle_pqap_aqic

Since we only intercept AQIC, why not.

> 
>> +{
>> +}
> 
> Add this function:
> 
> static int handle_pqap(struct kvm_vcpu *vcpu)
> {
>      int ret;
>      uint8_t fc;
> 
>      fc = vcpu->run->s.regs.gprs[0] >> 24;
>      switch (fc) {
>      case 0x03:
>          ret = handle_pqap_aqic(vcpu);
>          break;
>      default:
>          ret = -EOPNOTSUPP;
>          break;
>      }
> 
>      return ret;
> }

It is of no use for now, we only intercept AQIC, why introduce this now?

We can introduce a trampoline when we intercept TAPQ. If we do.


> 
>> +
>> + /*
>>    * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>>    *
>>    * @nb: The notifier block
>> @@ -767,9 +950,10 @@ static int vfio_ap_mdev_iommu_notifier(struct 
>> notifier_block *nb,
>>       if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>>           struct vfio_iommu_type1_dma_unmap *unmap = data;
>> -        unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
>> +        unsigned long pfn = unmap->iova >> PAGE_SHIFT;
>> -        vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
>> +        if (matrix_mdev->mdev)
>> +            vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
>>           return NOTIFY_OK;
>>       }
>> @@ -879,6 +1063,11 @@ static int vfio_ap_mdev_open(struct mdev_device 
>> *mdev)
>>       if (ret)
>>           goto err_group;
>> +    if (!matrix_mdev->kvm) {
>> +        ret = -ENODEV;
>> +        goto err_iommu;
>> +    }
>> +
>>       matrix_mdev->iommu_notifier.notifier_call = 
>> vfio_ap_mdev_iommu_notifier;
>>       events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>> @@ -887,6 +1076,8 @@ static int vfio_ap_mdev_open(struct mdev_device 
>> *mdev)
>>       if (ret)
>>           goto err_iommu;
>> +    matrix_mdev->kvm->arch.crypto.pqap_hook = handle_pqap;
>> +    matrix_mdev->kvm->arch.crypto.vfio_private = matrix_mdev;
> 
> I do not see this used anywhere, why do we need it?

In handle_pqap, to retrieve the associated mediated device.

> 
>>       return 0;
>>   err_iommu:
>> @@ -905,6 +1096,8 @@ static void vfio_ap_mdev_release(struct 
>> mdev_device *mdev)
>>           kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>       vfio_ap_mdev_reset_queues(mdev);
>> +    matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>> +    matrix_mdev->kvm->arch.crypto.vfio_private = NULL;
> 
> Ditto

ditto

> 
>>       vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>                    &matrix_mdev->group_notifier);
>>       vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
>> b/drivers/s390/crypto/vfio_ap_private.h
>> index e535735..e2fd2c0 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -94,6 +94,7 @@ struct vfio_ap_queue {
>>       struct list_head list;
>>       struct ap_matrix_mdev *matrix_mdev;
>>       unsigned long nib;
>> +    unsigned long g_pfn;
> 
> Can't this be calculated from the nib?

It is.
It is initialized during IRQ enabling with the currently pinned NIB, 
while nib is initialized with the NIB to be used.

This allows us to unpin the previously pinned NIB when the guest resets 
the queue, which automatically disables interrupts, because in this case 
the guest will not explicitly disable IRQs by using AQIC.
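
A small user-space sketch of that bookkeeping, under the assumption that
a stale pin is dropped when interrupts are enabled again. It is not the
driver code: struct demo_queue, the helpers, the page shift of 12 and
the instrumentation fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12  /* assumed 4K pages for the example */

/* Hypothetical stand-in for the per-queue interrupt state. */
struct demo_queue {
    unsigned long nib;            /* NIB address for the coming enable */
    unsigned long g_pfn;          /* frame currently pinned, if pinned */
    bool pinned;
    int unpin_count;              /* instrumentation for the example */
    unsigned long last_unpinned;  /* instrumentation for the example */
};

/* Drop the pin recorded in g_pfn, if any. */
static void demo_unpin(struct demo_queue *q)
{
    if (!q->pinned)
        return;
    q->last_unpinned = q->g_pfn;
    q->unpin_count++;
    q->pinned = false;
}

/*
 * Enable interrupts with a new NIB. A guest-side queue reset disables
 * interrupts without any AQIC disable request, so a pin from an
 * earlier enable may still exist and is released via g_pfn first.
 */
static void demo_enable_irq(struct demo_queue *q, unsigned long nib)
{
    assert(q);
    q->nib = nib;
    demo_unpin(q);
    q->g_pfn = q->nib >> DEMO_PAGE_SHIFT;
    q->pinned = true;
}
```

Keeping g_pfn apart from nib is what makes the old frame recoverable
once nib has been overwritten with the new address.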


Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-02-26 18:27   ` Tony Krowiak
@ 2019-02-27  9:58     ` Pierre Morel
  2019-03-04 13:02     ` Cornelia Huck
  1 sibling, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-27  9:58 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 26/02/2019 19:27, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> When the device is removed, we must make sure to
>> clear the interruption and reset the AP device.
>>
>> We also need to clear the CRYCB of the guest.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     | 35 
>> +++++++++++++++++++++++++++++++++++
>>   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
>>   drivers/s390/crypto/vfio_ap_private.h |  3 +++
>>   3 files changed, 40 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
>> b/drivers/s390/crypto/vfio_ap_drv.c
>> index eca0ffc..e5d91ff 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -5,6 +5,7 @@
>>    * Copyright IBM Corp. 2018
>>    *
>>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
>> + *          Pierre Morel <pmorel@linux.ibm.com>
>>    */
>>   #include <linux/module.h>
>> @@ -12,6 +13,8 @@
>>   #include <linux/slab.h>
>>   #include <linux/string.h>
>>   #include <asm/facility.h>
>> +#include <linux/bitops.h>
>> +#include <linux/kvm_host.h>
>>   #include "vfio_ap_private.h"
>>   #define VFIO_AP_ROOT_NAME "vfio_ap"
>> @@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device 
>> *apdev)
>>   }
>>   /**
>> + * vfio_ap_update_crycb
>> + * @q: A pointer to the queue being removed
>> + *
>> + * We clear the APID of the queue, making this queue unusable for the 
>> guest.
>> + * After this function we can reset the queue without fearing a race 
>> with
>> + * the guest accessing the queue again.
>> + * We do not fear race with the host as we still get the device.
>> + */
>> +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
>> +{
>> +    struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
>> +
>> +    if (!matrix_mdev)
>> +        return;
>> +
>> +    clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
>> +
>> +    if (!matrix_mdev->kvm)
>> +        return;
>> +
>> +    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>> +                  matrix_mdev->matrix.apm,
>> +                  matrix_mdev->matrix.aqm,
>> +                  matrix_mdev->matrix.adm);
>> +}
>> +
>> +/**
>>    * vfio_ap_queue_dev_remove:
>>    *
>>    * Free the associated vfio_ap_queue structure
>> @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct 
>> ap_device *apdev)
>>       struct vfio_ap_queue *q;
>>       q = dev_get_drvdata(&apdev->device);
>> +    if (!q)
>> +        return;
>> +
>> +    vfio_ap_update_crycb(q);
>> +    vfio_ap_mdev_reset_queue(q);
> 
> The reset is unnecessary because once the card is removed from the
> CRYCB, the ZAPQ may fail because the queue may not exist anymore.

The code here is run inside the host. So the queue is still available 
for ZAPQ.


> Besides, once the card is removed from the guest's CRYCB, the bus
> running in the guest will do a reset.

I already answered this:
- The AP bus resets the queue before calling the driver probe
- which means that the guest can still access the queue after the reset 
done by the bus.
- we need to first clear the CRYCB to remove the queue from the guest 
before we reset the queue.


Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-27  9:13           ` Cornelia Huck
@ 2019-02-27 10:16             ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-27 10:16 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Tony Krowiak, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, pasic, david, schwidefsky,
	heiko.carstens, freude, mimu

On 27/02/2019 10:13, Cornelia Huck wrote:
> On Wed, 27 Feb 2019 09:09:09 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>
>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>> instruction with the AQIC command.
>>>>>>
>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>> is not loaded.
>>>>>>
>>>>>> If the callback has been setup we call it.
>>>>>> If not we setup an answer considering that no queue is available
>>>>>> for the guest when no callback has been setup.
>>>>>>
>>>>>> We consider it the responsibility of the driver to always initialize
>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>> a guest.
>>>>>>
>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>
>>>> ...snip...
>>>>   
>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>        }
>>>>>>    }
>>>>>> +/*
>>>>>> + * handle_pqap: Handling pqap interception
>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>> + *
>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>> + *
>>>>>> + * The intercepting code calls a dedicated callback for this
>>>>>> instruction
>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>> + * SIE block.
>>>>>> + *
>>>>>> + * For PQAP/AQIC instructions only, verify privilege and
>>>>>> specifications.
>>>>>> + *
>>>>>> + * If no callback available, the queues are not available, return
>>>>>> this to
>>>>>> + * the caller.
>>>>>> + * Else return the value returned by the callback.
>>>>>> + */
>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    uint8_t fc;
>>>>>> +    struct ap_queue_status status = {};
>>>>>> +
>>>>>> +    /* Verify that the AP instruction are available */
>>>>>> +    if (!ap_instructions_available())
>>>>>> +        return -EOPNOTSUPP;
>>>>>
>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>> are not available? If the AP instructions are not available on the host,
>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>> here given QEMU may not be the only client.
>>>>>   
>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    /* Verify that the function code is AQIC */
>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>> +    if (fc != 0x03)
>>>>>> +        return -EOPNOTSUPP;
>>>>>
>>>>> You must have missed my suggestion to move this to the
>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>
>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>
>>> I have considered it and even verified my expectations empirically. If
>>> the vfio_ap module is not loaded, you will not be able to create an mdev
>>> device.
>>
>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>
>>> If you don't have an mdev device, you will not be able to
>>> start a guest with a vfio-ap device. If you start a guest without a
>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>> no AP devices attached to the guest. Without any AP devices attached,
>>> the PQAP(AQIC) instructions will not ever get executed.
>>
>> This is not right. The instruction will be executed, eventually, after
>> decoding.
> 
> A sane guest will not issue PQAP(AQIC) if it doesn't have ap
> capabilities, but there's nothing that keeps a guest from issuing that
> instruction regardless.
> 
> However, is this instruction always intercepted and never handled by
> the SIE itself, even if the guest was not configured for ap? By which
> criteria do we enable interception?

It is always intercepted, whatever ECA.28 is.
The instruction itself is allowed through facility 65.

> 
>>
>>> Even if for some
>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>
>> No, before accessing the AP-queue the instruction will be decoded and
>> depending on the installed micro-code it will fail with
>> - OPERATION EXCEPTION if the micro-code is not installed
>> - PRIVILEGED OPERATION if the instruction is issued from userland
>> (problem state)
>> - SPECIFICATION exception if the instruction does not respect the usage
>> specification
> 
> So, all of these happen prior to checking the function code?

Yes, this is the order of checks AFAIK
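
The ordering discussed above (operation, privilege and specification checks first, then the function code, and only then the driver hook) can be sketched roughly as follows. This is a minimal user-space mock, not the KVM code: `struct mock_guest`, `enum pqap_result`, `mock_hook` and `mock_handle_pqap` are hypothetical names introduced only for illustration, and the real handler would inject program interrupts or return -EOPNOTSUPP instead of returning an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes standing in for injected exceptions and return codes. */
enum pqap_result {
	PQAP_OPERATION_EXCEPTION,
	PQAP_PRIVILEGED_OPERATION,
	PQAP_SPECIFICATION_EXCEPTION,
	PQAP_UNSUPPORTED_FC,	/* comparable to returning -EOPNOTSUPP */
	PQAP_NO_QUEUE,		/* comparable to response code 0x01 */
	PQAP_HANDLED_BY_HOOK,
};

/* Hypothetical, simplified stand-in for the guest state KVM would inspect. */
struct mock_guest {
	bool ap_available;	/* AP instructions usable by the guest */
	bool problem_state;	/* instruction issued from guest user space */
	bool spec_ok;		/* operands respect the specification */
	uint64_t gpr0;		/* function code taken as (gpr0 >> 24) & 0xff */
	int (*pqap_hook)(struct mock_guest *g);	/* NULL without a driver */
};

/* Trivial hook used to show the "driver is loaded" path. */
static int mock_hook(struct mock_guest *g)
{
	(void)g;
	return 0;
}

static enum pqap_result mock_handle_pqap(struct mock_guest *g)
{
	uint8_t fc;

	assert(g != NULL);

	/* Common decode checks come first, independent of any hook. */
	if (!g->ap_available)
		return PQAP_OPERATION_EXCEPTION;
	if (g->problem_state)
		return PQAP_PRIVILEGED_OPERATION;
	if (!g->spec_ok)
		return PQAP_SPECIFICATION_EXCEPTION;

	/* Only now look at the function code (0x03 is AQIC). */
	fc = (uint8_t)(g->gpr0 >> 24);
	if (fc != 0x03)
		return PQAP_UNSUPPORTED_FC;

	/* Without a registered hook, report that no queue is available. */
	if (!g->pqap_hook)
		return PQAP_NO_QUEUE;

	g->pqap_hook(g);
	return PQAP_HANDLED_BY_HOOK;
}
```

In this sketch the checks that do not depend on the driver live in the common handler, so they still apply when `pqap_hook` is NULL; whether the function-code test belongs there or in the hook is exactly the point under debate in this thread.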

> 
>>
>> then it will be interpreted by the microcode and access the queue and
>> only then it will fail with RC 0x01, AP queue not valid.
>>
>> In the case of KVM, we intercept the instruction because it is issued by
>> the guest and we set the AQIC facility on to force interception.
> 
> Will we set that facility even if no vfio-ap device is configured?


Yes we do.


> 
>>
>> KVM does all the decode steps I mentioned above for us, whether or not
>> there is a pqap hook to be called to simulate the AP queue access.
>>
>> That done, the AP queue virtualisation can be called, this is done by
>> calling the hook.
>>
>>>
>>>    
>>>>   
>>>>>
>>>>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
>>>>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
>>>>>
>>>>> You previously stated:
>>>>>
>>>>>      "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap
>>>>> driver is
>>>>>       not loaded. However now that the guest officially get the PQAP/AQIC
>>>>>       instruction we need to handle the specification and operation
>>>>>       exceptions inside KVM _before_ testing and even calling the driver
>>>>>       hook.
>>>>>
>>>>>       I will make the changes in the next iteration."
>>>>
>>>> Still seems right to me, and is done is this patch.
>>>> Isn't it?
>>>
>>> I don't think it's a matter of right and wrong, it's a matter of what
>>> makes sense. IMHO, you want to make things easy if other PQAP functions
>>> are intercepted at some time. In my opinion, there should be a switch
>>> statement in the pqap hook code with a case statement for each PQAP
>>> function supported by the hook. To plug in a new PQAP function handler,
>>> it will be a simple matter of writing the handler function and calling
>>> it from the case statement, like this:
>>>
>>> static int handle_pqap(struct kvm_vcpu *vcpu)
>>> {
>>>       int ret;
>>>       uint8_t fc;
>>>
>>>       fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>
>>>       switch (fc) {
>>>       case 0x03:
>>>           ret = handle_pqap_aqic(vcpu);
>>>           break;
>>>       default:
>>>           ret = -EOPNOTSUPP;
>>>           break;
>>>       }
>>>
>>>       return ret;
>>> }
>>>
>>> That function belongs in the pqap hook. I see no reason whatsoever to
>>> check the function code here. If there is no hook, then you will fall
>>> through to the instruction below:
>>>
>>> status.response_code = 0x01;
>>
>> See answer above, what you are speaking about is the execution of the
>> instruction, but there can be exceptions during the decode of the
>> instruction.
> 
> If e.g. calling that instruction from userspace always creates a priv
> op exception, that should be checked prior to even looking at the
> function code. Same with other exceptions. From my no-docs point of
> view, it makes sense to have those common checks in handle_pqap() and
> use the switch/case to call handler functions for the individual
> function codes...
> 
>>
>>>    
>>>>   
>>>>>
>>>>> I don't know what any of the above has to do with checking FC=0x03? If
>>>>> that check is moved to the pqap handler hook, it can just as well return
>>>>> -EOPNOTSUPP. In fact, down below you do this:
>>>>>
>>>>>       return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
>>>>>
>>>>> If the FC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
>>>>> like above. None of this is critical, but the parsing of the register
>>>>> values for the PQAP(AQIC) function ought to be done in the code that
>>>>> handles the PQAP instruction IMHO.
>>>>
>>>>
>>>> This interception code must handle the PQAP/AQIC instruction when the
>>>> hook is not used and should not modify the handling for other PQAP
>>>> instructions.
>>>> We can not move anything inside the hook that must be always done.
>>>
>>> What you are saying here makes no sense. If the check for the function
>>> code is moved into the pqap hook and fc != 0x03, the result will be
>>> exactly the same; the hook will return -EOPNOTSUPP.
>>
>> again please consider that the hook may not be initialized.
> 
> I agree.
> 

Thanks for the comments.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-27  9:32   ` Cornelia Huck
@ 2019-02-27 10:21     ` Pierre Morel
  2019-02-27 10:44     ` Pierre Morel
  1 sibling, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-27 10:21 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 27/02/2019 10:32, Cornelia Huck wrote:
> On Fri, 22 Feb 2019 16:29:56 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> We need to associate the ap_vfio_queue, which will hold the
>> per queue information for interrupt with a matrix mediated device
>> which hold the configuration and the way to the CRYCB.
>>
>> Let's do this when assigning a APID or a APQI to the mediated device
>> and clear the relation when unassigning.
>>
>> Queuing the devices on a list of free devices and testing the
>> matrix_mdev pointer to the associated matrix allow us to know
>> if the queue is associated to the matrix device and associated
>> or not to a mediated device.
>>
>> When resetting an AP queue we must wait until there are no more
>> messages in the message queue before considering the queue is really
>> in a clean state.
>>
>> Let's do it and wait until the status response code indicate the
>> queue is empty after issuing a PAPQ/ZAPQ instruction.
> 
> I'm a bit confused about the context where that list moving etc. is
> supposed to take place.
> 
> When are we assigning/deassigning? Is there even supposed to be any
> activity that we need to zap on the queues?
> 
> Do we need any serialization/locking on the lists?

Did I really forget this!?

Yes, thanks.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-27  9:42   ` Cornelia Huck
@ 2019-02-27 10:22     ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-27 10:22 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 27/02/2019 10:42, Cornelia Huck wrote:
> On Fri, 22 Feb 2019 16:29:57 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> To be able to use the VFIO interface to facilitate the
>> mediated device memory pining/unpining we need to register
> 
> s/pining/pinning/ (unless it's pining for the fjords :)
> 
>> a notifier for IOMMU.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 53 ++++++++++++++++++++++++++++++++---
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 51 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 172d6eb..1b5130a 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -748,6 +748,36 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>>   };
>>   
>>   /**
>> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>> + *
>> + * @nb: The notifier block
>> + * @action: Action to be taken (VFIO_IOMMU_NOTIFY_DMA_UNMAP)
> 
> I'd drop this annotation; you only do something for UNMAP but nothing
> prevents the caller from passing in something else :)
> 
>> + * @data: the specific unmap structure for vfio_iommu_type1
> 
> "data associated with the request" ?
> 
> (same reasoning as above)
> 
>> + *
>> + * Unpins the guest IOVA. (The NIB guest address we pinned before).
>> + * Return NOTIFY_OK after unpining on a UNMAP request.
>> + * otherwise, returns NOTIFY_DONE .
> 
> "For an UNMAP request, unpin the guest IOVA (the NIB guest address we
> pinned before). Other requests are ignored."
> 
> ?
> 
>> + */
> 
> Looks sane to me.
> 
> With comments changed,
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> 

Thanks.
Will do the changes.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-27  9:32   ` Cornelia Huck
  2019-02-27 10:21     ` Pierre Morel
@ 2019-02-27 10:44     ` Pierre Morel
  1 sibling, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-27 10:44 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 27/02/2019 10:32, Cornelia Huck wrote:
> On Fri, 22 Feb 2019 16:29:56 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> We need to associate the ap_vfio_queue, which will hold the
>> per queue information for interrupt with a matrix mediated device
>> which hold the configuration and the way to the CRYCB.
>>
>> Let's do this when assigning a APID or a APQI to the mediated device
>> and clear the relation when unassigning.
>>
>> Queuing the devices on a list of free devices and testing the
>> matrix_mdev pointer to the associated matrix allow us to know
>> if the queue is associated to the matrix device and associated
>> or not to a mediated device.
>>
>> When resetting an AP queue we must wait until there are no more
>> messages in the message queue before considering the queue is really
>> in a clean state.
>>
>> Let's do it and wait until the status response code indicate the
>> queue is empty after issuing a PAPQ/ZAPQ instruction.
> 
> I'm a bit confused about the context where that list moving etc. is
> supposed to take place.

You are confused because... it is confusing.

> 
> When are we assigning/deassigning? Is there even supposed to be any
> activity that we need to zap on the queues?

No, I mixed two functionalities here. That is not right.

I think I must:

- separate out the simplification of the reset: either move those chunks
to the previous patch, since the simplification comes with the use of the
lists, or move them to a separate patch.

- make the commit message less confusing :)
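
The reset behaviour described in the quoted commit message (after issuing the reset, keep polling until the status reports an empty queue, and give up after a bounded number of retries) could look roughly like the sketch below. It is a user-space mock under stated assumptions: `struct mock_status`, `struct mock_ap_queue`, `mock_zapq` and `mock_reset_queue` are hypothetical names, the mock drains one message per poll, and the retry count of 20 and the -EBUSY result are taken from the patch context shown elsewhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status word: only the field this sketch needs. */
struct mock_status {
	bool queue_empty;
};

/* Hypothetical queue: counts messages still in flight. */
struct mock_ap_queue {
	int pending;
};

/* Mock of a reset/poll step: each call drains one pending message. */
static struct mock_status mock_zapq(struct mock_ap_queue *q)
{
	struct mock_status status;

	if (q->pending > 0)
		q->pending--;
	status.queue_empty = (q->pending == 0);
	return status;
}

/*
 * Poll until the queue reports empty, or give up after 20 attempts.
 * A real driver would sleep between polls; the mock does not.
 */
static int mock_reset_queue(struct mock_ap_queue *q)
{
	int retry = 20;

	assert(q != NULL);

	while (retry--) {
		if (mock_zapq(q).queue_empty)
			return 0;
	}
	return -EBUSY;
}
```

The bounded loop is the design point: the caller gets either a queue known to be clean or an explicit -EBUSY, never an unbounded wait.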


Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-27  8:09         ` Pierre Morel
  2019-02-27  9:13           ` Cornelia Huck
@ 2019-02-27 18:00           ` Tony Krowiak
  2019-02-28  9:42             ` Christian Borntraeger
  1 sibling, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-02-27 18:00 UTC (permalink / raw)
  To: pmorel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/27/19 3:09 AM, Pierre Morel wrote:
> On 26/02/2019 16:47, Tony Krowiak wrote:
>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>> the case the AQIC facility is enabled in the guest.
>>>>>
>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>> instruction with the AQIC command.
>>>>>
>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>> is not loaded.
>>>>>
>>>>> If the callback has been setup we call it.
>>>>> If not we setup an answer considering that no queue is available
>>>>> for the guest when no callback has been setup.
>>>>>
>>>>> We consider it the responsibility of the driver to always initialize
>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>> a guest.
>>>>>
>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>
>>> ...snip...
>>>
>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>       }
>>>>>   }
>>>>> +/*
>>>>> + * handle_pqap: Handling pqap interception
>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>> + *
>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>> + *
>>>>> + * The intercepting code calls a dedicated callback for this 
>>>>> instruction
>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>> + * SIE block.
>>>>> + *
>>>>> + * For PQAP/AQIC instructions only, verify privilege and 
>>>>> specifications.
>>>>> + *
>>>>> + * If no callback available, the queues are not available, return 
>>>>> this to
>>>>> + * the caller.
>>>>> + * Else return the value returned by the callback.
>>>>> + */
>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +    uint8_t fc;
>>>>> +    struct ap_queue_status status = {};
>>>>> +
>>>>> +    /* Verify that the AP instruction are available */
>>>>> +    if (!ap_instructions_available())
>>>>> +        return -EOPNOTSUPP;
>>>>
>>>> How can the guest even execute an AP instruction if the AP instructions
>>>> are not available? If the AP instructions are not available on the 
>>>> host,
>>>> they will not be available on the guest (i.e., CPU model feature
>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>> here given QEMU may not be the only client.
>>>>
>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>> +        return -EOPNOTSUPP;
>>>>> +    /* Verify that the function code is AQIC */
>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>> +    if (fc != 0x03)
>>>>> +        return -EOPNOTSUPP;
>>>>
>>>> You must have missed my suggestion to move this to the
>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>
>>> Please consider what happen if the vfio_ap module is not loaded.
>>
>> I have considered it and even verified my expectations empirically. If
>> the vfio_ap module is not loaded, you will not be able to create an 
>> mdev device.
> 
> OK, now please consider that another userland tool, not QEMU uses KVM.

What does that have to do with loading the vfio_ap module? Without the
vfio_ap module, there will be no AP devices for the guest. What are you
suggesting here?

> 
>> If you don't have an mdev device, you will not be able to
>> start a guest with a vfio-ap device. If you start a guest without a
>> vfio-ap device, but enable AP instructions for the guest, there will be
>> no AP devices attached to the guest. Without any AP devices attached,
>> the PQAP(AQIC) instructions will not ever get executed.
> 
> This is not right. The instruction will be executed, eventually, after 
> decoding.

Please explain why the PQAP(AQIC) instruction will be executed on a
guest without any devices? Point me to the code in the AP bus where
PQAP(AQIC) is executed without a queue?

> 
>> Even if for some
>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>> reason, it will fail with response code 0x01, AP-queue number not valid.
> 
> No, before accessing the AP-queue the instruction will be decoded and 
> depending on the installed micro-code it will fail with
> - OPERATION EXCEPTION if the micro-code is not installed
> - PRIVILEGED OPERATION if the instruction is issued from userland
> (problem state)
> - SPECIFICATION exception if the instruction does not respect the usage
> specification
> 
> then it will be interpreted by the microcode and access the queue and 
> only then it will fail with RC 0x01, AP queue not valid.
> 
> In the case of KVM, we intercept the instruction because it is issued by 
> the guest and we set the AQIC facility on to force interception.
> 
> KVM does all the decode steps I mentioned above for us, whether or not
> there is a pqap hook to be called to simulate the AP queue access.
> 
> That done, the AP queue virtualisation can be called, this is done by 
> calling the hook.

Okay, let's go back to the genesis of this discussion; namely, my
suggestion about moving the fc == 0x03 check into the hook code. If
the vfio_ap module is not loaded, there will be no hook code. In that
case, the check for the hook will fail and ultimately response code
0x01 will be set in the status word (which may not be the right thing
to do?). You have not stated a single good reason for keeping this
check, but I'm done with this silly argument. It certainly doesn't
hurt anything.

> 
>>
>>
>>>
>>>>
>>>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
>>>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
>>>>
>>>> You previously stated:
>>>>
>>>>     "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap 
>>>> driver is
>>>>      not loaded. However now that the guest officially get the 
>>>> PQAP/AQIC
>>>>      instruction we need to handle the specification and operation
>>>>      exceptions inside KVM _before_ testing and even calling the driver
>>>>      hook.
>>>>
>>>>      I will make the changes in the next iteration."
>>>
>>> Still seems right to me, and is done is this patch.
>>> Isn't it?
>>
>> I don't think it's a matter of right and wrong, it's a matter of what
>> makes sense. IMHO, you want to make things easy if other PQAP functions
>> are intercepted at some time. In my opinion, there should be a switch
>> statement in the pqap hook code with a case statement for each PQAP
>> function supported by the hook. To plug in a new PQAP function handler,
>> it will be a simple matter of writing the handler function and calling
>> it from the case statement, like this:
>>
>> static int handle_pqap(struct kvm_vcpu *vcpu)
>> {
>>      int ret;
>>      uint8_t fc;
>>
>>      fc = vcpu->run->s.regs.gprs[0] >> 24;
>>
>>      switch (fc) {
>>      case 0x03:
>>          ret = handle_pqap_aqic(vcpu);
>>          break;
>>      default:
>>          ret = -EOPNOTSUPP;
>>          break;
>>      }
>>
>>      return ret;
>> }
>>
>> That function belongs in the pqap hook. I see no reason whatsoever to
>> check the function code here. If there is no hook, then you will fall
>> through to the instruction below:
>>
>> status.response_code = 0x01;
> 
> See answer above, what you are speaking about is the execution of the 
> instruction, but there can be exceptions during the decode of the 
> instruction.

What are you talking about, "decode of the instruction"?

> 
>>
>>>
>>>>
>>>> I don't know what any of the above has to do with checking FC=0x03? If
>>>> that check is moved to the pqap handler hook, it can just as well 
>>>> return
>>>> -EOPNOTSUPP. In fact, down below you do this:
>>>>
>>>>      return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
>>>>
>>>> If the FC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
>>>> like above. None of this is critical, but the parsing of the register
>>>> values for the PQAP(AQIC) function ought to be done in the code that
>>>> handles the PQAP instruction IMHO.
>>>
>>>
>>> This interception code must handle the PQAP/AQIC instruction when the 
>>> hook is not used and should not modify the handling for other PQAP 
>>> instructions.
>>> We can not move anything inside the hook that must be always done.
>>
>> What you are saying here makes no sense. If the check for the function
>> code is moved into the pqap hook and fc != 0x03, the result will be
>> exactly the same; the hook will return -EOPNOTSUPP.
> 
> again please consider that the hook may not be initialized.


So what? Then maybe the code at the end of the function is wrong:

/* PQAP/AQIC instructions are authorized but there is no queue */
status.response_code = 0x01;
memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
return 0;

Why does this make sense? What if the APQN is valid? You don't even know
whether it is or not. The only reason you would even reach this
instruction is if the pqap hook is not initialized. Wouldn't it make
more sense to just return -EOPNOTSUPP here? If there is no hook, then
it is not supported.

> 
> Regards,
> Pierre
> 



* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-27  9:54     ` Pierre Morel
@ 2019-02-27 18:17       ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-27 18:17 UTC (permalink / raw)
  To: pmorel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/27/19 4:54 AM, Pierre Morel wrote:
> On 26/02/2019 19:23, Tony Krowiak wrote:
>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>> We register the AP PQAP instruction hook during the open
>>> of the mediated device. And unregister it on release.
>>>
>>> In the AP PQAP instruction hook, if we receive a demand to
>>> enable IRQs,
>>> - we retrieve the vfio_ap_queue based on the APQN we receive
>>>    in REG1,
>>> - we retrieve the page of the guest address, (NIB), from
>>>    register REG2
>>> - we use the mediated device and the VFIO pinning infrastructure
>>>    to pin the page of the guest address,
>>> - we retrieve the pointer to KVM to register the guest ISC
>>>    and retrieve the host ISC
>>> - finally we activate GISA
>>>
>>> If we receive a demand to disable IRQs,
>>> - we deactivate GISA
>>> - unregister from the GIB
>>> - unpin the NIB
>>>
>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>> ---
>>>   arch/s390/include/asm/kvm_host.h      |   1 +
>>>   drivers/s390/crypto/ap_bus.h          |   1 +
>>>   drivers/s390/crypto/vfio_ap_ops.c     | 199 
>>> +++++++++++++++++++++++++++++++++-
>>>   drivers/s390/crypto/vfio_ap_private.h |   1 +
>>>   4 files changed, 199 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/s390/include/asm/kvm_host.h 
>>> b/arch/s390/include/asm/kvm_host.h
>>> index 49cc8b0..5f3bb8c 100644
>>> --- a/arch/s390/include/asm/kvm_host.h
>>> +++ b/arch/s390/include/asm/kvm_host.h
>>> @@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
>>>   struct kvm_s390_crypto {
>>>       struct kvm_s390_crypto_cb *crycb;
>>>       int (*pqap_hook)(struct kvm_vcpu *vcpu);
>>> +    void *vfio_private;
> 
> ...snip...
> 
> 
>>> + *
>>> + * Return 0 if we could handle the request inside KVM.
>>> + * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
>>> + */
>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>
>> Change this function name to handle_pqap_aqic
> 
> Since we only intercept AQIC, why not.
> 
>>
>>> +{
>>> +}
>>
>> Add this function:
>>
>> static int handle_pqap(struct kvm_vcpu *vcpu)
>> {
>>      int ret;
>>      uint8_t fc;
>>
>>      fc = vcpu->run->s.regs.gprs[0] >> 24;
>>      switch (fc) {
>>      case 0x03:
>>          ret = handle_pqap_aqic(vcpu);
>>          break;
>>      default:
>>          ret = -EOPNOTSUPP;
>>          break;
>>      }
>>
>>      return ret;
>> }
> 
> It is of no use for now, we only intercept AQIC, why introduce this now?
> 
> We can introduce a trampoline when we intercept TAPQ. If we do.

It simplifies adding additional functions down the road, makes the
code much clearer and there is no good reason not to do it.

> 
> 
>>
>>> +
>>> + /*
>>>    * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>>>    *
>>>    * @nb: The notifier block
>>> @@ -767,9 +950,10 @@ static int vfio_ap_mdev_iommu_notifier(struct 
>>> notifier_block *nb,
>>>       if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>>>           struct vfio_iommu_type1_dma_unmap *unmap = data;
>>> -        unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
>>> +        unsigned long pfn = unmap->iova >> PAGE_SHIFT;
>>> -        vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
>>> +        if (matrix_mdev->mdev)
>>> +            vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
>>>           return NOTIFY_OK;
>>>       }
>>> @@ -879,6 +1063,11 @@ static int vfio_ap_mdev_open(struct mdev_device 
>>> *mdev)
>>>       if (ret)
>>>           goto err_group;
>>> +    if (!matrix_mdev->kvm) {
>>> +        ret = -ENODEV;
>>> +        goto err_iommu;
>>> +    }
>>> +
>>>       matrix_mdev->iommu_notifier.notifier_call = 
>>> vfio_ap_mdev_iommu_notifier;
>>>       events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>>> @@ -887,6 +1076,8 @@ static int vfio_ap_mdev_open(struct mdev_device 
>>> *mdev)
>>>       if (ret)
>>>           goto err_iommu;
>>> +    matrix_mdev->kvm->arch.crypto.pqap_hook = handle_pqap;
>>> +    matrix_mdev->kvm->arch.crypto.vfio_private = matrix_mdev;
>>
>> I do not see this used anywhere, why do we need it?
> 
> In handle_pqap to retrieve the associated mediated device

I don't think this is necessary and IMHO is indicative of a
design flaw. If all vfio_ap_queue objects identifying queues
bound to the vfio_ap driver were maintained in a single list
(i.e., not moved back and forth from the free_list to the qlist)
then there would be no need for this vfio_private field. See
my comments in response to patch 5/7 for the reasons.

> 
>>
>>>       return 0;
>>>   err_iommu:
>>> @@ -905,6 +1096,8 @@ static void vfio_ap_mdev_release(struct 
>>> mdev_device *mdev)
>>>           kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>>       vfio_ap_mdev_reset_queues(mdev);
>>> +    matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>>> +    matrix_mdev->kvm->arch.crypto.vfio_private = NULL;
>>
>> Ditto
> 
> ditto
> 
>>
>>>       vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>>                    &matrix_mdev->group_notifier);
>>>       vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>>> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
>>> b/drivers/s390/crypto/vfio_ap_private.h
>>> index e535735..e2fd2c0 100644
>>> --- a/drivers/s390/crypto/vfio_ap_private.h
>>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>>> @@ -94,6 +94,7 @@ struct vfio_ap_queue {
>>>       struct list_head list;
>>>       struct ap_matrix_mdev *matrix_mdev;
>>>       unsigned long nib;
>>> +    unsigned long g_pfn;
>>
>> Can't this be calculated from the nib?
> 
> It is.
> It is initialized during IRQ enabling with the currently pinned NIB,
> while nib is initialized with the NIB to be used.
> 
> This allows unpinning the previously pinned NIB when the guest resets
> the queue, which automatically disables interrupts, because in that case
> the guest will not explicitly disable the IRQ by using AQIC.

I'm sorry, I don't understand the point you are making.
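
One possible reading of the quoted explanation: nib records the NIB address the guest asked for, while g_pfn records the page that is actually pinned right now, so a later enable can drop a stale pin left behind by a guest-initiated queue reset. The sketch below illustrates that reading only. `struct mock_queue`, `mock_unpin`, `mock_enable_irq` and the `pins` counter are hypothetical stand-ins (the counter replaces the VFIO pin/unpin calls), and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical per-queue state, reduced to what the sketch needs. */
struct mock_queue {
	uint64_t nib;	/* guest address of the NIB to be used */
	uint64_t g_pfn;	/* guest page frame currently pinned, 0 if none */
	int pins;	/* stand-in for the number of pages pinned via VFIO */
};

/* Drop the currently recorded pin, if any. */
static void mock_unpin(struct mock_queue *q)
{
	assert(q != NULL);

	/* Like the patch, a g_pfn of 0 is treated as "nothing pinned". */
	if (q->g_pfn) {
		q->pins--;
		q->g_pfn = 0;
	}
}

/* Enable interrupts for a new NIB, releasing any stale pin first. */
static void mock_enable_irq(struct mock_queue *q, uint64_t nib)
{
	assert(q != NULL);

	/*
	 * A guest queue reset disables interrupts without an AQIC call,
	 * so the previous page may still be pinned at this point.
	 */
	mock_unpin(q);

	q->nib = nib;
	q->g_pfn = nib >> MOCK_PAGE_SHIFT;
	q->pins++;
}
```

Under this reading g_pfn can indeed be computed from nib at enable time, but keeping it separately remembers which page to release once nib has been overwritten by a later request.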

> 
> 
> Regards,
> Pierre
> 
> 



* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
  2019-02-26 18:23   ` Tony Krowiak
@ 2019-02-27 18:18   ` Tony Krowiak
  2019-02-28 20:20   ` Christian Borntraeger
  2019-03-04  1:57   ` Halil Pasic
  3 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-27 18:18 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> We register the AP PQAP instruction hook during the open
> of the mediated device. And unregister it on release.
> 
> In the AP PQAP instruction hook, if we receive a demand to
> enable IRQs,
> - we retrieve the vfio_ap_queue based on the APQN we receive
>    in REG1,
> - we retrieve the page of the guest address, (NIB), from
>    register REG2
> - we use the mediated device and the VFIO pinning infrastructure
>    to pin the page of the guest address,
> - we retrieve the pointer to KVM to register the guest ISC
>    and retrieve the host ISC
> - finally we activate GISA
> 
> If we receive a demand to disable IRQs,
> - we deactivate GISA
> - unregister from the GIB
> - unpin the NIB
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   arch/s390/include/asm/kvm_host.h      |   1 +
>   drivers/s390/crypto/ap_bus.h          |   1 +
>   drivers/s390/crypto/vfio_ap_ops.c     | 199 +++++++++++++++++++++++++++++++++-
>   drivers/s390/crypto/vfio_ap_private.h |   1 +
>   4 files changed, 199 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 49cc8b0..5f3bb8c 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
>   struct kvm_s390_crypto {
>   	struct kvm_s390_crypto_cb *crycb;
>   	int (*pqap_hook)(struct kvm_vcpu *vcpu);
> +	void *vfio_private;
>   	__u32 crycbd;
>   	__u8 aes_kw;
>   	__u8 dea_kw;
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index bfc66e4..323f2aa 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
>   #define AP_RESPONSE_BUSY		0x05
>   #define AP_RESPONSE_INVALID_ADDRESS	0x06
>   #define AP_RESPONSE_OTHERWISE_CHANGED	0x07
> +#define AP_RESPONSE_INVALID_GISA	0x08
>   #define AP_RESPONSE_Q_FULL		0x10
>   #define AP_RESPONSE_NO_PENDING_REPLY	0x10
>   #define AP_RESPONSE_INDEX_TOO_BIG	0x11
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1b5130a..0196065 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -43,7 +43,7 @@ struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
>   	return NULL;
>   }
>   
> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>   {
>   	struct ap_queue_status status;
>   	int retry = 20;
> @@ -75,6 +75,27 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>   	return -EBUSY;
>   }
>   
> +/**
> + * vfio_ap_free_irq:
> + * @q: The vfio_ap_queue
> + *
> + * Unpin the guest NIB
> + * Unregister the ISC from the GIB alert
> + * Clear the vfio_ap_queue intern fields
> + */
> +static void vfio_ap_free_irq(struct vfio_ap_queue *q)
> +{
> +	if (!q)
> +		return;
> +	if (q->g_pfn)
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
> +	if (q->isc)
> +		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);
> +	q->nib = 0;
> +	q->isc = 0;
> +	q->g_pfn = 0;
> +}
> +
>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>   				struct ap_matrix *matrix)
>   {
> @@ -97,6 +118,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>   	}
>   
>   	INIT_LIST_HEAD(&matrix_mdev->qlist);
> +	matrix_mdev->mdev = mdev;
>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>   	mdev_set_drvdata(mdev, matrix_mdev);
>   	mutex_lock(&matrix_dev->lock);
> @@ -109,10 +131,16 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>   static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>   {
>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	struct vfio_ap_queue *q, *qtmp;
>   
>   	if (matrix_mdev->kvm)
>   		return -EBUSY;
>   
> +	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
> +		q->matrix_mdev = NULL;
> +		vfio_ap_mdev_reset_queue(q);
> +		list_move(&q->list, &matrix_dev->free_list);
> +	}
>   	mutex_lock(&matrix_dev->lock);
>   	list_del(&matrix_mdev->node);
>   	mutex_unlock(&matrix_dev->lock);
> @@ -748,6 +776,161 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>   };
>   
>   /**
> + * vfio_ap_clrirq: Disable Interruption for a APQN
> + *
> + * @dev: the device associated with the ap_queue
> + * @q:   the vfio_ap_queue holding AQIC parameters
> + *
> + * Issue the host side PQAP/AQIC
> + * On success: unpin the NIB saved in *q and unregister from GIB
> + * interface
> + *
> + * Return the ap_queue_status returned by the ap_aqic()
> + */
> +static struct ap_queue_status vfio_ap_clrirq(struct vfio_ap_queue *q)
> +{
> +	struct ap_qirq_ctrl aqic_gisa = {};
> +	struct ap_queue_status status;
> +
> +	status = ap_aqic(q->apqn, aqic_gisa, NULL);
> +	if (!status.response_code)
> +		vfio_ap_free_irq(q);
> +
> +	return status;
> +}
> +
> +/**
> + * vfio_ap_setirq: Enable Interruption for a APQN
> + *
> + * @dev: the device associated with the ap_queue
> + * @q:   the vfio_ap_queue holding AQIC parameters
> + *
> + * Pin the NIB saved in *q
> + * Register the guest ISC to GIB interface and retrieve the
> + * host ISC to issue the host side PQAP/AQIC
> + *
> + * Response.status may be set to following Response Code in case of error:
> + * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
> + * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervizor GISA internal error
> + *
> + * Otherwise return the ap_queue_status returned by the ap_aqic()
> + */
> +static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
> +{
> +	struct ap_qirq_ctrl aqic_gisa = {};
> +	struct ap_queue_status status = {};
> +	struct kvm_s390_gisa *gisa;
> +	struct kvm *kvm;
> +	unsigned long g_pfn, h_nib, h_pfn;
> +	int ret;
> +
> +	kvm = q->matrix_mdev->kvm;
> +	gisa = kvm->arch.gisa_int.origin;
> +
> +	g_pfn = q->nib >> PAGE_SHIFT;
> +	ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1,
> +			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
> +	switch (ret) {
> +	case 1:
> +		break;
> +	case -EINVAL:
> +	case -E2BIG:
> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> +		/* Fallthrough */
> +	default:
> +		return status;
> +	}
> +
> +	h_nib = (h_pfn << PAGE_SHIFT) | (q->nib & ~PAGE_MASK);
> +	aqic_gisa.gisc = q->isc;
> +	aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->isc);
> +	aqic_gisa.ir = 1;
> +	aqic_gisa.gisa = gisa->next_alert >> 4;
> +
> +	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
> +	switch (status.response_code) {
> +	case AP_RESPONSE_NORMAL:
> +		if (q->g_pfn)
> +			vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
> +					 &q->g_pfn, 1);
> +		q->g_pfn = g_pfn;
> +		break;
> +	case AP_RESPONSE_OTHERWISE_CHANGED:
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1);
> +		break;
> +	case AP_RESPONSE_INVALID_GISA:
> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> +	default:	/* Fall Through */
> +		pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
> +			status.response_code);
> +		vfio_ap_free_irq(q);
> +		break;
> +	}
> +
> +	return status;
> +}
> +
> +/**
> + * handle_pqap: PQAP instruction callback
> + *
> + * @vcpu: The vcpu on which we received the PQAP instruction
> + *
> + * Get the general register contents to initialize internal variables.
> + * REG[0]: APQN
> + * REG[1]: IR and ISC
> + * REG[2]: NIB
> + *
> + * Response.status may be set to following Response Code:
> + * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
> + * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
> + * - AP_RESPONSE_NORMAL (0) : in case of successs
> + *   Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible RC.
> + *
> + * Return 0 if we could handle the request inside KVM.
> + * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
> + */
> +static int handle_pqap(struct kvm_vcpu *vcpu)
> +{
> +	uint64_t status;
> +	uint16_t apqn;
> +	struct vfio_ap_queue *q;
> +	struct ap_queue_status qstatus = {};
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	/* If we do not use the AIV facility just go to userland */
> +	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> +		return -EOPNOTSUPP;
> +
> +	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> +	matrix_mdev = vcpu->kvm->arch.crypto.vfio_private;

It looks to me like we have added a new field to the
kvm_s390_crypto structure because of the decision to store a list
of queues in the matrix_mdev device. The reason I say this is
because I see that we need a matrix_mdev device in order to get
the queue using the vfio_ap_get_queue() function. If we maintained a
list of all queues bound to the vfio_ap driver in the matrix_dev
structure, then we wouldn't need to store a reference to
the matrix_mdev in the kvm_s390_crypto structure. IMHO, this is
indicative of a design flaw.
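
To make the alternative concrete, here is a minimal userspace sketch, not
kernel code. Every name in it (the model_* types and
model_get_queue_for_kvm()) is hypothetical and only stands in for the
structures discussed in this thread. It assumes one list of all queues
hanging off the matrix device, with each queue carrying a back-pointer to
its mdev, so the interception handler can find the queue from the APQN
and the guest's kvm pointer alone:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures; all names are illustrative. */
struct model_kvm { int id; };

struct model_mdev {
	struct model_kvm *kvm;		/* guest using this mdev, NULL if none */
};

struct model_queue {
	struct model_queue *next;	/* single list owned by the matrix device */
	struct model_mdev *matrix_mdev;	/* NULL while the queue is unassigned */
	int apqn;
};

struct model_matrix_dev {
	struct model_queue *queues;	/* every queue bound to the driver */
};

/* Find a queue by APQN, but only if it is assigned to the given guest. */
static struct model_queue *
model_get_queue_for_kvm(struct model_matrix_dev *dev, struct model_kvm *kvm,
			int apqn)
{
	struct model_queue *q;

	assert(dev != NULL);
	for (q = dev->queues; q; q = q->next) {
		if (q->apqn != apqn)
			continue;
		if (q->matrix_mdev && q->matrix_mdev->kvm == kvm)
			return q;
		return NULL;	/* APQN exists but is not owned by this guest */
	}
	return NULL;
}
```

With a lookup shaped like this, the handler would only need vcpu->kvm and
the APQN, which is why I think the extra pointer in kvm_s390_crypto could
go away. Locking is left out of the sketch entirely.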

> +	if (!matrix_mdev)
> +		return -EOPNOTSUPP;
> +	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> +	if (!q) {
> +		qstatus.response_code = AP_RESPONSE_Q_NOT_AVAIL;
> +		goto out;
> +	}
> +
> +	status = vcpu->run->s.regs.gprs[1];
> +
> +	/* If IR bit(16) is set we enable the interrupt */
> +	if ((status >> (63 - 16)) & 0x01) {
> +		q->isc = status & 0x07;
> +		q->nib = vcpu->run->s.regs.gprs[2];
> +		qstatus = vfio_ap_setirq(q);
> +		if (qstatus.response_code) {
> +			q->nib = 0;
> +			q->isc = 0;
> +		}
> +	} else
> +		qstatus = vfio_ap_clrirq(q);
> +
> +out:
> +	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
> +	return 0;
> +}
> +
> + /*
>    * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>    *
>    * @nb: The notifier block
> @@ -767,9 +950,10 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>   
>   	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>   		struct vfio_iommu_type1_dma_unmap *unmap = data;
> -		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
> +		unsigned long pfn = unmap->iova >> PAGE_SHIFT;
>   
> -		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
> +		if (matrix_mdev->mdev)
> +			vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
>   		return NOTIFY_OK;
>   	}
>   
> @@ -879,6 +1063,11 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>   	if (ret)
>   		goto err_group;
>   
> +	if (!matrix_mdev->kvm) {
> +		ret = -ENODEV;
> +		goto err_iommu;
> +	}
> +
>   	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>   
> @@ -887,6 +1076,8 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>   	if (ret)
>   		goto err_iommu;
>   
> +	matrix_mdev->kvm->arch.crypto.pqap_hook = handle_pqap;
> +	matrix_mdev->kvm->arch.crypto.vfio_private = matrix_mdev;
>   	return 0;
>   
>   err_iommu:
> @@ -905,6 +1096,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>   
>   	vfio_ap_mdev_reset_queues(mdev);
> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> +	matrix_mdev->kvm->arch.crypto.vfio_private = NULL;
>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>   				 &matrix_mdev->group_notifier);
>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index e535735..e2fd2c0 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -94,6 +94,7 @@ struct vfio_ap_queue {
>   	struct list_head list;
>   	struct ap_matrix_mdev *matrix_mdev;
>   	unsigned long nib;
> +	unsigned long g_pfn;
>   	int	apqn;
>   	unsigned char isc;
>   };
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-27  9:29     ` Pierre Morel
@ 2019-02-27 20:14       ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-27 20:14 UTC (permalink / raw)
  To: pmorel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/27/19 4:29 AM, Pierre Morel wrote:
> On 26/02/2019 19:14, Tony Krowiak wrote:
>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>> We need to associate the ap_vfio_queue, which will hold the
>>> per queue information for interrupt with a matrix mediated device
>>> which hold the configuration and the way to the CRYCB.
>>>
>>> Let's do this when assigning a APID or a APQI to the mediated device
>>> and clear the relation when unassigning.
>>>
>>> Queuing the devices on a list of free devices and testing the
>>> matrix_mdev pointer to the associated matrix allow us to know
>>> if the queue is associated to the matrix device and associated
>>> or not to a mediated device.
>>>
>>> When resetting an AP queue we must wait until there are no more
>>> messages in the message queue before considering the queue is really
>>> in a clean state.
>>>
>>> Let's do it and wait until the status response code indicate the
>>> queue is empty after issuing a PAPQ/ZAPQ instruction.
>>>
>>> Being at work on the reset function, let's simplify
>>> vfio_ap_mdev_reset_queue and vfio_ap_mdev_reset_queues by using the
>>> vfio_ap_queue structure as parameter.
>>>
>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_ops.c | 385 
>>> +++++++++++++++++++-------------------
>>>   1 file changed, 189 insertions(+), 196 deletions(-)
> 
> ...snip...
> 
>>> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>> +{
>>> +    struct ap_queue_status status;
>>> +    int retry = 20;
>>> +
>>> +    do {
>>> +        status = ap_zapq(q->apqn);
>>> +        switch (status.response_code) {
>>> +        case AP_RESPONSE_NORMAL:
>>> +            while (!status.queue_empty && retry--) {
>>> +                msleep(20);
>>> +                status = ap_tapq(q->apqn, NULL);
>>> +            }
>>
>> I am not sure the above is necessary. I have an email out to the author
>> of the architecture doc to verify.
> 
> I do not know the question you asked but the documentation is very clear 
> on the reset behavior: a queue is completely reseted only after the RC 
> of reset/zapq is 0 and the queue_empty bit is set.

You may want to check your email once in a while. I copied you on the
email I sent to the doc author. What you say is true and you may very
well be right, but I found the doc to be confusing in the way it was
worded. I would like to get confirmation of the need for this. Notice
that I started my sentence off with I AM NOT SURE, so I clearly wasn't
saying it is definitely not necessary.
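
For reference, this is how I read the behavior you describe, written as a
small userspace model rather than kernel code. The model_* names and the
fake queue that drains one message per status test are assumptions made
purely for illustration; the real code uses ap_zapq(), ap_tapq() and
msleep():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for struct ap_queue_status. */
struct model_status {
	unsigned int response_code;	/* 0 means the instruction was accepted */
	bool queue_empty;
};

/* Fake hardware queue: pending messages drain one per status test. */
struct model_hw_queue {
	int pending;
};

static struct model_status model_zapq(struct model_hw_queue *hw)
{
	struct model_status st = { 0, hw->pending == 0 };

	return st;
}

static struct model_status model_tapq(struct model_hw_queue *hw)
{
	struct model_status st;

	if (hw->pending > 0)
		hw->pending--;
	st.response_code = 0;
	st.queue_empty = (hw->pending == 0);
	return st;
}

/* Returns 0 once RC is 0 and the queue reports empty, -1 if polls run out. */
static int model_reset_queue(struct model_hw_queue *hw, int max_polls)
{
	struct model_status st;

	assert(hw != NULL);
	st = model_zapq(hw);
	if (st.response_code != 0)
		return -1;
	while (!st.queue_empty && max_polls-- > 0)
		st = model_tapq(hw);	/* the kernel code sleeps between polls */
	return st.queue_empty ? 0 : -1;
}
```

The point of the model is only that a response code of 0 from the reset is
not treated as completion until queue_empty is also set. Whether the
architecture really requires that is what I am waiting to have confirmed.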

> 
>>
>>> +            if (retry <= 0)
>>> +                pr_warn("%s: queue 0x%04x not empty\n",
> 
> ...snip...
> 
>>> + * @matrix_mdev: the matrix mediated device for which we want to 
>>> associate
>>> + *         all available queues with a given apqi.
>>> + * @apid:     The apid which associated with all defined APQI of the
>>> + *         mediated device will define a AP queue.
>>>    *
>>> - * - If @data contains only an apid value, @data will be flagged as
>>> - *   reserved if the APID field in the AP queue device matches
>>> - *
>>> - * - If @data contains only an apqi value, @data will be flagged as
>>> - *   reserved if the APQI field in the AP queue device matches
>>> - *
>>> - * Returns 0 to indicate the input to function succeeded. Returns 
>>> -EINVAL if
>>> - * @data does not contain either an apid or apqi.
>>> + * We remove the queue from the list of queues associated with the
>>> + * mediated device and put them back to the free list of the matrix
>>> + * device and clear the matrix_mdev pointer.
>>>    */
>>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>>> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
>>> +                    int apid)
>>
>> I would prefer this be named:
>>
>>      vfio_ap_mdev_free_queues_with_apid()
>>
>> get/put is typically used to increment/decrement reference counters.
>> What you are doing in this function freeing all queues connected to 
>> specified card.
> 
> OK, I can change this function name and the further one you mentioned.
> 
>>
>>>   {
>>> -    struct vfio_ap_queue_reserved *qres = data;
>>> -    struct ap_queue *ap_queue = to_ap_queue(dev);
>>> -    ap_qid_t qid;
>>> -    unsigned long id;
>>> +    int apqi, apqn;
>>> -    if (qres->apid && qres->apqi) {
>>> -        qid = AP_MKQID(*qres->apid, *qres->apqi);
>>> -        if (qid == ap_queue->qid)
>>> -            qres->reserved = true;
>>> -    } else if (qres->apid && !qres->apqi) {
>>> -        id = AP_QID_CARD(ap_queue->qid);
>>> -        if (id == *qres->apid)
>>> -            qres->reserved = true;
>>> -    } else if (!qres->apid && qres->apqi) {
>>> -        id = AP_QID_QUEUE(ap_queue->qid);
>>> -        if (id == *qres->apqi)
>>> -            qres->reserved = true;
>>> -    } else {
>>> -        return -EINVAL;
>>> +    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>>> +        apqn = AP_MKQID(apid, apqi);
>>> +        vfio_ap_free_queue(apqn, matrix_mdev);
>>>       }
>>
>> Maybe you should clear the bit corresponding to apid from the APM here?
> 
> I do not think so, this is pure list handling, the APM bit is already 
> cleared in the unassign_adapter_store function.
> 
> I only answered once for all comments on naming and bit mask but will 
> treat them the same way.
> Thanks for comments.
> 
> Regards,
> Pierre
> 
> 
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure
  2019-02-27  8:40     ` Pierre Morel
@ 2019-02-27 20:35       ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-27 20:35 UTC (permalink / raw)
  To: pmorel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/27/19 3:40 AM, Pierre Morel wrote:
> On 26/02/2019 17:10, Tony Krowiak wrote:
>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>> The AP interruptions are assigned on a queue basis and
>>> the GISA structure is handled on a VM basis, so that
>>> we need to add a structure we can retrieve from both side
>>> holding the information we need to handle PQAP/AQIC interception
>>> and setup the GISA.
>>>
>>> Since we can not add more information to the ap_device
>>> we add a new structure vfio_ap_queue, to hold per queue
>>> information useful to handle interruptions and set it as
>>> driver's data of the standard ap_queue device.
>>>
>>> Usually, the device and the mediated device are linked together
>>> but in the vfio_ap driver design we have a bunch of "sub" devices
>>> (the ap_queue devices) belonging to the mediated device.
>>>
>>> Linking these structure to the mediated device it is assigned to,
>>> with the help of the vfio_ap_queue structure will help us to
>>> retrieve the AP devices associated with the mediated devices
>>> during the mediated device operations.
>>>
>>> ------------    -------------
>>> | AP queue |--> | AP_vfio_q |<----
>>> ------------    ------^------    |    ---------------
>>>                        |          <--->| matrix_mdev |
>>> ------------    ------v------    |    ---------------
>>> | AP queue |--> | AP_vfio_q |-----
>>> ------------    -------------
>>>
>>> The vfio_ap_queue device will hold the following entries:
>>> - apqn: AP queue number (defined here)
>>> - isc : Interrupt subclass (defined later)
>>> - nib : notification information byte (defined later)
>>> - list: a list_head entry allowing to link this structure to a
>>>     matrix mediated device it is assigned to.
>>>
>>> The vfio_ap_queue structure is allocated when the vfio_ap_driver
>>> is probed and added as driver data to the ap_queue device.
>>> It is free on remove.
>>>
>>> The structure is linked to the matrix_dev host device at the
>>> probe of the device building some kind of free list for the
>>> matrix mediated devices.
>>>
>>> When the vfio_queue is associated to a matrix mediated device,
>>> the vfio_ap_queue device is linked to this matrix mediated device
>>> and unlinked when dissociated.
>>>
>>> This patch and the 3 next can be squashed together on the
>>> final release of this series.
>>> until then I separate them to ease the review.
>>>
>>> So please do not complain about unused functions or about
>>> squashing the patches together, this will be resolved during
>>> the last iteration.
>>>
>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_drv.c     | 27 ++++++++++++++++++++++++++-
>>>   drivers/s390/crypto/vfio_ap_private.h |  9 +++++++++
>>>   2 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
>>> b/drivers/s390/crypto/vfio_ap_drv.c
>>> index e9824c3..eca0ffc 100644
>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>> @@ -40,14 +40,38 @@ static struct ap_device_id ap_queue_ids[] = {
>>>   MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>>> +/**
>>> + * vfio_ap_queue_dev_probe:
>>> + *
>>> + * Allocate a vfio_ap_queue structure and associate it
>>> + * with the device as driver_data.
>>> + */
>>>   static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>>>   {
>>> +    struct vfio_ap_queue *q;
>>> +
>>> +    q = kzalloc(sizeof(*q), GFP_KERNEL);
>>> +    if (!q)
>>> +        return -ENOMEM;
>>> +    dev_set_drvdata(&apdev->device, q);
>>> +    q->apqn = to_ap_queue(&apdev->device)->qid;
>>> +    INIT_LIST_HEAD(&q->list);
>>> +    list_add(&q->list, &matrix_dev->free_list);
>>>       return 0;
>>>   }
>>> +/**
>>> + * vfio_ap_queue_dev_remove:
>>> + *
>>> + * Free the associated vfio_ap_queue structure
>>> + */
>>>   static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>>>   {
>>> -    /* Nothing to do yet */
>>> +    struct vfio_ap_queue *q;
>>> +
>>> +    q = dev_get_drvdata(&apdev->device);
>>> +    list_del(&q->list);
>>> +    kfree(q);
>>>   }
>>>   static void vfio_ap_matrix_dev_release(struct device *dev)
>>> @@ -107,6 +131,7 @@ static int vfio_ap_matrix_dev_create(void)
>>>       matrix_dev->device.bus = &matrix_bus;
>>>       matrix_dev->device.release = vfio_ap_matrix_dev_release;
>>>       matrix_dev->vfio_ap_drv = &vfio_ap_drv;
>>> +    INIT_LIST_HEAD(&matrix_dev->free_list);
>>>       ret = device_register(&matrix_dev->device);
>>>       if (ret)
>>> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
>>> b/drivers/s390/crypto/vfio_ap_private.h
>>> index 76b7f98..2760178 100644
>>> --- a/drivers/s390/crypto/vfio_ap_private.h
>>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>>> @@ -39,6 +39,7 @@ struct ap_matrix_dev {
>>>       atomic_t available_instances;
>>>       struct ap_config_info info;
>>>       struct list_head mdev_list;
>>> +    struct list_head free_list;
>>>       struct mutex lock;
>>>       struct ap_driver  *vfio_ap_drv;
>>>   };

>>> @@ -81,9 +82,17 @@ struct ap_matrix_mdev {
>>>       struct ap_matrix matrix;
>>>       struct notifier_block group_notifier;
>>>       struct kvm *kvm;

>>> +    struct list_head qlist;
>>
>> I do not see much value in maintaining two lists of at the
>> expense of complicating the code and introducing additional
>> processing (i.e., list rewinds etc.). IMHO, the only think it buys
>> us is being able to pass a smaller list to the vfio_ap_get_queue()
>> function to traverse. That function can traverse the list in
>> struct ap_matrix_dev to find a queue. I understand what you are
>> doing here in pulling vfio_ap_queue structs from the free_list
>> to add them to qlist for the mdev when adapter/domain assignment
>> takes place, but you are now maintaining links to the vfio_ap_queue
>> in multiple places; as drvdata as well as two lists. I think this
>> is over designing.
> 
> This is not completely exact, the drvdata allows to retrieve the 
> vfio_ap_queue structure from the AP device, which is global to all AP 
> devices, and not related to a mediated device.

The vfio_ap_queue structure has a field (matrix_mdev) which holds a
reference to the mediated device, so you are effectively maintaining the
relationship between the queue and the mdev device in two places; the
mdev device's qlist and the vfio_ap_queue structure's matrix_mdev. All
of the code to maintain and manipulate these lists is entirely
unnecessary.

In the previous version of this patch series, you provided
a function that used the driver_find_device() function to get a 
vfio_ap_queue struct by matching on its APQN. That is the only thing you
need to be able to provide every bit of the new function you've
introduced in this series without introducing all of this unnecessary
list manipulation that complicates things and doesn't add much value.

Let's take for example the vfio_ap_get_all_domains() function you
introduce in patch 3/7. Your claim below is that you win 200 LOCs
with this new list design. I can save you additional lines of code
by eliminating the lists. If you reinstate the function that uses the
driver_find_device() from your previous version of these patches,
all you need to do is call that function. You'll have the queue, without
consulting this new free_list. You can then remove the move_and_set
function and all of the list moving and rewinding. That saves you even
more LOCs and simplifies the implementation.
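
As a rough illustration of the lookup I have in mind, here is a userspace
model of the match-callback pattern that driver_find_device() follows. It
is a sketch under assumptions, not the function from your previous series:
model_find_device(), model_match_apqn() and model_get_queue() are made-up
names, and an array stands in for the driver's device list:

```c
#include <assert.h>
#include <stddef.h>

struct model_device {
	int qid;		/* APQN of the AP queue device */
	void *drvdata;		/* would point at the vfio_ap_queue */
};

/* Illustrative stand-in for the set of devices bound to the driver. */
struct model_driver {
	struct model_device *devs;
	size_t ndevs;
};

typedef int (*model_match_fn)(struct model_device *dev, void *data);

/* Return the first device for which match() is non-zero, or NULL. */
static struct model_device *
model_find_device(struct model_driver *drv, void *data, model_match_fn match)
{
	size_t i;

	assert(drv != NULL && match != NULL);
	for (i = 0; i < drv->ndevs; i++)
		if (match(&drv->devs[i], data))
			return &drv->devs[i];
	return NULL;
}

static int model_match_apqn(struct model_device *dev, void *data)
{
	return dev->qid == *(int *)data;
}

/* Look up the per-queue data by APQN without consulting a private list. */
static void *model_get_queue(struct model_driver *drv, int apqn)
{
	struct model_device *dev;

	dev = model_find_device(drv, &apqn, model_match_apqn);
	return dev ? dev->drvdata : NULL;
}
```

One difference from the model: the real driver_find_device() returns the
device with a reference held, so the caller has to drop it with
put_device() once it has fetched the driver data.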

> 
> The vfio_ap_queue structure is only linked to one of the different lists 
> it can be linked to: the free_list or the mediated device list.
> 
> The complication of the code you mention make us win almost 200 LOCs, 
> some may see it as a simplification ;) .

See my comments above.

> 
> Regards,
> Pierre
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
  2019-02-26 18:14   ` Tony Krowiak
  2019-02-27  9:32   ` Cornelia Huck
@ 2019-02-27 20:53   ` Tony Krowiak
  2019-03-04  2:09   ` Halil Pasic
  3 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-27 20:53 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> We need to associate the ap_vfio_queue, which will hold the
> per queue information for interrupt with a matrix mediated device
> which hold the configuration and the way to the CRYCB.
> 
> Let's do this when assigning a APID or a APQI to the mediated device
> and clear the relation when unassigning.
> 
> Queuing the devices on a list of free devices and testing the
> matrix_mdev pointer to the associated matrix allow us to know
> if the queue is associated to the matrix device and associated
> or not to a mediated device.
> 
> When resetting an AP queue we must wait until there are no more
> messages in the message queue before considering the queue is really
> in a clean state.
> 
> Let's do it and wait until the status response code indicate the
> queue is empty after issuing a PAPQ/ZAPQ instruction.
> 
> Being at work on the reset function, let's simplify
> vfio_ap_mdev_reset_queue and vfio_ap_mdev_reset_queues by using the
> vfio_ap_queue structure as parameter.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   drivers/s390/crypto/vfio_ap_ops.c | 385 +++++++++++++++++++-------------------
>   1 file changed, 189 insertions(+), 196 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 900b9cf..172d6eb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -24,6 +24,57 @@
>   #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
>   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>   
> +/**
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> + * @apqn: The queue APQN
> + *
> + * Retrieve a queue with a specific APQN from the list of the
> + * devices associated with a list.
> + *
> + * Returns the pointer to the associated vfio_ap_queue
> + */
> +struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	list_for_each_entry(q, l, list)
> +		if (q->apqn == apqn)
> +			return q;
> +	return NULL;
> +}

I think you can simplify this patch as well as save a number of LOCs by
restoring your previous version of this function that used the
driver_find_device() function to retrieve the queue with a specific
APQN. Please see the rest of my comments for further clarification.

> +
> +static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +{
> +	struct ap_queue_status status;
> +	int retry = 20;
> +
> +	do {
> +		status = ap_zapq(q->apqn);
> +		switch (status.response_code) {
> +		case AP_RESPONSE_NORMAL:
> +			while (!status.queue_empty && retry--) {
> +				msleep(20);
> +				status = ap_tapq(q->apqn, NULL);
> +			}
> +			if (retry <= 0)
> +				pr_warn("%s: queue 0x%04x not empty\n",
> +					__func__, q->apqn);
> +			return 0;
> +		case AP_RESPONSE_RESET_IN_PROGRESS:
> +		case AP_RESPONSE_BUSY:
> +			msleep(20);
> +			break;
> +		default:
> +			/* things are really broken, give up */
> +			pr_warn("%s: zapq error %02x on apqn 0x%04x\n",
> +				__func__, status.response_code, q->apqn);
> +			return -EIO;
> +		}
> +	} while (retry--);
> +
> +	return -EBUSY;
> +}
> +
>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>   				struct ap_matrix *matrix)
>   {
> @@ -45,6 +96,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>   		return -ENOMEM;
>   	}
>   
> +	INIT_LIST_HEAD(&matrix_mdev->qlist);
>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>   	mdev_set_drvdata(mdev, matrix_mdev);
>   	mutex_lock(&matrix_dev->lock);
> @@ -113,162 +165,160 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>   	NULL,
>   };
>   
> -struct vfio_ap_queue_reserved {
> -	unsigned long *apid;
> -	unsigned long *apqi;
> -	bool reserved;
> -};
> +static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);

If you restore the previous version of vfio_ap_get_queue(), we don't 
need the qlist to retrieve the q. The apqn is sufficient.

> +	if (!q)
> +		return;
> +	q->matrix_mdev = NULL;
> +	vfio_ap_mdev_reset_queue(q);
> +	list_move(&q->list, &matrix_dev->free_list);

If we get rid of the qlist and free_list, then we don't need the
list_move function.
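
A minimal userspace sketch of what freeing could then reduce to, assuming
the matrix_mdev back-pointer is the only record of ownership. All names
are hypothetical, MODEL_MKQID only assumes the adapter ID sits in the high
byte of the APQN, and a plain bitmask replaces the inverted-bit AQM that
for_each_set_bit_inv() walks in the driver:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_DOMAINS 16
/* Assumed APQN layout for this model: adapter ID in the high byte. */
#define MODEL_MKQID(apid, apqi) ((((apid) & 0xff) << 8) | ((apqi) & 0xff))

struct model_mdev {
	unsigned int aqm;		/* bit n set: domain n is assigned */
};

struct model_queue {
	int apqn;
	struct model_mdev *matrix_mdev;	/* NULL means the queue is free */
};

/* Stand-in for a driver-wide lookup by APQN. */
static struct model_queue *
model_lookup(struct model_queue *qs, size_t n, int apqn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (qs[i].apqn == apqn)
			return &qs[i];
	return NULL;
}

/* Release every queue of adapter apid that belongs to mdev. */
static int model_free_queues_with_apid(struct model_queue *qs, size_t n,
				       struct model_mdev *mdev, int apid)
{
	int apqi, freed = 0;

	assert(qs != NULL && mdev != NULL);
	for (apqi = 0; apqi < MODEL_DOMAINS; apqi++) {
		struct model_queue *q;

		if (!(mdev->aqm & (1u << apqi)))
			continue;
		q = model_lookup(qs, n, MODEL_MKQID(apid, apqi));
		if (!q || q->matrix_mdev != mdev)
			continue;
		q->matrix_mdev = NULL;	/* the driver would also reset the queue */
		freed++;
	}
	return freed;
}
```

Nothing is moved between lists here; clearing the pointer is what marks
the queue as free again.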

> +}
>   
>   /**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> + * vfio_ap_put_all_domains:
>    *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - *   as reserved if the APID and APQI fields for the AP queue device matches
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apid:	 The apid which associated with all defined APQI of the
> + *		 mediated device will define a AP queue.
>    *
> - * - If @data contains only an apid value, @data will be flagged as
> - *   reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - *   reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
>    */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
> +				    int apid)
>   {
> -	struct vfio_ap_queue_reserved *qres = data;
> -	struct ap_queue *ap_queue = to_ap_queue(dev);
> -	ap_qid_t qid;
> -	unsigned long id;
> +	int apqi, apqn;
>   
> -	if (qres->apid && qres->apqi) {
> -		qid = AP_MKQID(*qres->apid, *qres->apqi);
> -		if (qid == ap_queue->qid)
> -			qres->reserved = true;
> -	} else if (qres->apid && !qres->apqi) {
> -		id = AP_QID_CARD(ap_queue->qid);
> -		if (id == *qres->apid)
> -			qres->reserved = true;
> -	} else if (!qres->apid && qres->apqi) {
> -		id = AP_QID_QUEUE(ap_queue->qid);
> -		if (id == *qres->apqi)
> -			qres->reserved = true;
> -	} else {
> -		return -EINVAL;
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +		vfio_ap_free_queue(apqn, matrix_mdev);
>   	}
> -
> -	return 0;
>   }
>   
>   /**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> - *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> + * vfio_ap_put_all_cards:
>    *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
> - *   @apqi
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
>    *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - *   to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - *   to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> + * We remove the queue from the list of queues associated with the
> + * mediated device and put them back to the free list of the matrix
> + * device and clear the matrix_mdev pointer.
>    */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> -					 unsigned long *apqi)
> +static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>   {
> -	int ret;
> -	struct vfio_ap_queue_reserved qres;
> +	int apid, apqn;
>   
> -	qres.apid = apid;
> -	qres.apqi = apqi;
> -	qres.reserved = false;
> -
> -	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				     &qres, vfio_ap_has_queue);
> -	if (ret)
> -		return ret;
> -
> -	if (qres.reserved)
> -		return 0;
> -
> -	return -EADDRNOTAVAIL;
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +		vfio_ap_free_queue(apqn, matrix_mdev);
> +	}
>   }
>   
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> -					     unsigned long apid)
> +static void move_and_set(struct list_head *src, struct list_head *dst,
> +			 struct ap_matrix_mdev *matrix_mdev)
>   {
> -	int ret;
> -	unsigned long apqi;
> -	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> -
> -	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> -		return vfio_ap_verify_queue_reserved(&apid, NULL);
> +	struct vfio_ap_queue *q, *qtmp;
>   
> -	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> -		if (ret)
> -			return ret;
> +	list_for_each_entry_safe(q, qtmp, src, list) {
> +		list_move(&q->list, dst);
> +		q->matrix_mdev = matrix_mdev;
>   	}

If we get rid of the lists, this function becomes unnecessary.

> -
> +}
> +/**
> + * vfio_ap_get_all_domains:
> + *
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
> + *
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
> + *
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
> + */
> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
> +{
> +	int apqi, apqn;
> +	int ret = 0;
> +	struct vfio_ap_queue *q;
> +	struct list_head q_list;
> +
> +	INIT_LIST_HEAD(&q_list);
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
> +		if (!q) {
> +			ret = -EADDRNOTAVAIL;
> +			goto rewind;

If we get rid of the lists, there is no need to rewind

> +		}
> +		if (q->matrix_mdev) {
> +			ret = -EADDRINUSE;
> +			goto rewind;

If we get rid of the lists, there is no need to rewind

> +		}
> +		list_move(&q->list, &q_list);

If we get rid of the lists, there is no need to move the queue to
q_list.

> +	}
> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);

If we get rid of the lists, this call becomes unnecessary.

>   	return 0;
> +rewind:
> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);

If we get rid of the lists, this call becomes unnecessary.

> +	return ret;
>   }
> -
>   /**
> - * vfio_ap_mdev_verify_no_sharing
> + * vfio_ap_get_all_cards:
>    *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
>    *
> - * @matrix_mdev: the mediated matrix device
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
>    *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
>    */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>   {
> -	struct ap_matrix_mdev *lstdev;
> -	DECLARE_BITMAP(apm, AP_DEVICES);
> -	DECLARE_BITMAP(aqm, AP_DOMAINS);
> -
> -	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> -		if (matrix_mdev == lstdev)
> -			continue;
> -
> -		memset(apm, 0, sizeof(apm));
> -		memset(aqm, 0, sizeof(aqm));
> -
> -		/*
> -		 * We work on full longs, as we can only exclude the leftover
> -		 * bits in non-inverse order. The leftover is all zeros.
> -		 */
> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> -				lstdev->matrix.apm, AP_DEVICES))
> -			continue;
> -
> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> -				lstdev->matrix.aqm, AP_DOMAINS))
> -			continue;
> -
> -		return -EADDRINUSE;
> +	int apid, apqn;
> +	int ret = 0;
> +	struct vfio_ap_queue *q;
> +	struct list_head q_list;
> +	struct ap_matrix_mdev *tmp = NULL;
> +
> +	INIT_LIST_HEAD(&q_list);
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
> +		if (!q) {
> +			ret = -EADDRNOTAVAIL;
> +			goto rewind;

No lists, no rewind necessary

> +		}
> +		if (q->matrix_mdev) {
> +			ret = -EADDRINUSE;
> +			goto rewind;

No lists, no rewind necessary

> +		}
> +		list_move(&q->list, &q_list);

No lists, no need to move the queue to q_list

>   	}
> -
> +	tmp = matrix_mdev;
> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);

No lists, no need to move and set.

>   	return 0;
> +rewind:
> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);

No lists, no need to move and set.

> +	return ret;
>   }
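
One possible reading of the "no lists" remarks above, as a standalone sketch only: if each queue simply records its owner, an assignment can check every affected queue first and commit afterwards, so nothing has to be rewound on error. Everything here (queue_sketch, mdev_sketch, the flat queues table, the tiny NR_CARDS/NR_DOMAINS sizes) is a hypothetical stand-in, not the vfio_ap data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_CARDS   4	/* hypothetical, tiny stand-in for AP_DEVICES */
#define NR_DOMAINS 4	/* hypothetical, tiny stand-in for AP_DOMAINS */

/* Hypothetical stand-in for a matrix mediated device. */
struct mdev_sketch {
	unsigned char apm[NR_CARDS];	/* 1 = adapter assigned */
	unsigned char aqm[NR_DOMAINS];	/* 1 = domain assigned */
};

/* Hypothetical per-queue state: no list_head, only an owner pointer. */
struct queue_sketch {
	int bound;			/* 1 = queue bound to the driver */
	struct mdev_sketch *owner;	/* NULL = not assigned to any mdev */
};

static struct queue_sketch queues[NR_CARDS][NR_DOMAINS];

/* Assign domain @apqi to @m for every adapter already assigned to @m. */
static int assign_domain_sketch(struct mdev_sketch *m, int apqi)
{
	int apid;

	assert(m && apqi >= 0 && apqi < NR_DOMAINS);

	/* Pass 1: look only, change nothing, so an error needs no undo. */
	for (apid = 0; apid < NR_CARDS; apid++) {
		if (!m->apm[apid])
			continue;
		if (!queues[apid][apqi].bound)
			return -EADDRNOTAVAIL;
		if (queues[apid][apqi].owner)
			return -EADDRINUSE;
	}

	/* Pass 2: every check passed, so committing cannot fail. */
	for (apid = 0; apid < NR_CARDS; apid++)
		if (m->apm[apid])
			queues[apid][apqi].owner = m;
	m->aqm[apqi] = 1;

	return 0;
}
```

The error codes mirror the ones used in the quoted patch; whether the real driver would want a two-pass check like this is an assumption.
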
>   
>   /**
> @@ -330,21 +380,15 @@ static ssize_t assign_adapter_store(struct device *dev,
>   	 */
>   	mutex_lock(&matrix_dev->lock);
>   
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> +	ret = vfio_ap_get_all_domains(matrix_mdev, apid);
>   	if (ret)
>   		goto done;
>   
>   	set_bit_inv(apid, matrix_mdev->matrix.apm);
>   
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> -	if (ret)
> -		goto share_err;
> -
>   	ret = count;
>   	goto done;
>   
> -share_err:
> -	clear_bit_inv(apid, matrix_mdev->matrix.apm);
>   done:
>   	mutex_unlock(&matrix_dev->lock);
>   
> @@ -391,32 +435,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
>   
>   	mutex_lock(&matrix_dev->lock);
>   	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> +	vfio_ap_put_all_domains(matrix_mdev, apid);
>   	mutex_unlock(&matrix_dev->lock);
>   
>   	return count;
>   }
>   static DEVICE_ATTR_WO(unassign_adapter);
>   
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> -					     unsigned long apqi)
> -{
> -	int ret;
> -	unsigned long apid;
> -	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> -	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> -		return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> -	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	return 0;
> -}
> -
>   /**
>    * assign_domain_store
>    *
> @@ -471,21 +496,15 @@ static ssize_t assign_domain_store(struct device *dev,
>   
>   	mutex_lock(&matrix_dev->lock);
>   
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> +	ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
>   	if (ret)
>   		goto done;
>   
>   	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>   
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
> -	if (ret)
> -		goto share_err;
> -
>   	ret = count;
>   	goto done;
>   
> -share_err:
> -	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
>   done:
>   	mutex_unlock(&matrix_dev->lock);
>   
> @@ -533,6 +552,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>   
>   	mutex_lock(&matrix_dev->lock);
>   	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> +	vfio_ap_put_all_cards(matrix_mdev, apqi);
>   	mutex_unlock(&matrix_dev->lock);
>   
>   	return count;
> @@ -790,49 +810,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   	return NOTIFY_OK;
>   }
>   
> -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> -				    unsigned int retry)
> -{
> -	struct ap_queue_status status;
> -
> -	do {
> -		status = ap_zapq(AP_MKQID(apid, apqi));
> -		switch (status.response_code) {
> -		case AP_RESPONSE_NORMAL:
> -			return 0;
> -		case AP_RESPONSE_RESET_IN_PROGRESS:
> -		case AP_RESPONSE_BUSY:
> -			msleep(20);
> -			break;
> -		default:
> -			/* things are really broken, give up */
> -			return -EIO;
> -		}
> -	} while (retry--);
> -
> -	return -EBUSY;
> -}
> -
>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>   {
>   	int ret;
>   	int rc = 0;
> -	unsigned long apid, apqi;
>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	struct vfio_ap_queue *q;
>   
> -	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> -			     matrix_mdev->matrix.apm_max + 1) {
> -		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> -				     matrix_mdev->matrix.aqm_max + 1) {
> -			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> -			/*
> -			 * Regardless whether a queue turns out to be busy, or
> -			 * is not operational, we need to continue resetting
> -			 * the remaining queues.
> -			 */
> -			if (ret)
> -				rc = ret;
> -		}
> +	list_for_each_entry(q, &matrix_mdev->qlist, list) {
> +		ret = vfio_ap_mdev_reset_queue(q);
> +		/*
> +		 * Regardless whether a queue turns out to be busy, or
> +		 * is not operational, we need to continue resetting
> +		 * the remaining queues but notice the last error code.
> +		 */
> +		if (ret)
> +			rc = ret;
>   	}

There is no need for this change. Without the lists, we can keep the
code as-is above. Having the list buys us absolutely nothing here.
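
For illustration, the shape of the existing loop (walk the APID x APQI cross product, keep going on failure, remember an error code) can be written as a small standalone sketch. The byte-array masks, reset_fn, demo_reset and the NR_* sizes are hypothetical stand-ins for the inverted bitmaps and ap_zapq()-based reset in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_CARDS   4	/* hypothetical, tiny stand-in for AP_DEVICES */
#define NR_DOMAINS 4	/* hypothetical, tiny stand-in for AP_DOMAINS */

/* Hypothetical per-queue reset callback, returns 0 or a negative errno. */
typedef int (*reset_fn)(int apid, int apqi);

/*
 * Reset every queue in the cross product of the two masks. A failing
 * queue does not stop the walk; the last non-zero return code is kept.
 */
static int reset_all_sketch(const unsigned char *apm,
			    const unsigned char *aqm, reset_fn reset)
{
	int apid, apqi, ret, rc = 0;

	assert(apm && aqm && reset);

	for (apid = 0; apid < NR_CARDS; apid++) {
		if (!apm[apid])
			continue;
		for (apqi = 0; apqi < NR_DOMAINS; apqi++) {
			if (!aqm[apqi])
				continue;
			ret = reset(apid, apqi);
			if (ret)
				rc = ret;
		}
	}

	return rc;
}

/* Demo callback: counts calls and pretends queue (0, 1) stays busy. */
static int demo_reset_calls;

static int demo_reset(int apid, int apqi)
{
	demo_reset_calls++;
	return (apid == 0 && apqi == 1) ? -EBUSY : 0;
}
```

With adapters 0 and 2 and domains 1 and 2 set, demo_reset is called for all four queues even though the first one reports busy.
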

>   
>   	return rc;
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-22 15:29 ` [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
  2019-02-27  9:42   ` Cornelia Huck
@ 2019-02-28  8:23   ` Christian Borntraeger
  2019-02-28  8:48     ` Pierre Morel
  1 sibling, 1 reply; 79+ messages in thread
From: Christian Borntraeger @ 2019-02-28  8:23 UTC (permalink / raw)
  To: Pierre Morel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

On 22.02.2019 16:29, Pierre Morel wrote:
> To be able to use the VFIO interface to facilitate the
> mediated device memory pining/unpining we need to register
> a notifier for IOMMU.

You might want to add that while we start to pin one guest page for the
interrupt indicator byte in the next patch, this is still ok with ballooning
as this page will never be used by the guest virtio-balloon driver. So the
pinned page will never be freed. And even if a broken guest does so, that would
not impact the host, as the original page is still under the control of vfio.

> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 53 ++++++++++++++++++++++++++++++++---
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 172d6eb..1b5130a 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -748,6 +748,36 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>  };
>  
>  /**
> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
> + *
> + * @nb: The notifier block
> + * @action: Action to be taken (VFIO_IOMMU_NOTIFY_DMA_UNMAP)
> + * @data: the specific unmap structure for vfio_iommu_type1
> + *
> + * Unpins the guest IOVA. (The NIB guest address we pinned before).
> + * Return NOTIFY_OK after unpining on a UNMAP request.
> + * otherwise, returns NOTIFY_DONE .
> + */
> +static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
> +				       unsigned long action, void *data)
> +{
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier);
> +
> +	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
> +		struct vfio_iommu_type1_dma_unmap *unmap = data;
> +		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
> +
> +		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
> +		return NOTIFY_OK;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +
> +/**
>   * vfio_ap_mdev_set_kvm
>   *
>   * @matrix_mdev: a mediated matrix device
> @@ -846,12 +876,25 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>  
>  	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>  				     &events, &matrix_mdev->group_notifier);
> -	if (ret) {
> -		module_put(THIS_MODULE);
> -		return ret;
> -	}
> +	if (ret)
> +		goto err_group;
> +
> +	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
> +	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> +
> +	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +				     &events, &matrix_mdev->iommu_notifier);
> +	if (ret)
> +		goto err_iommu;
>  
>  	return 0;
> +
> +err_iommu:
> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> +				 &matrix_mdev->group_notifier);
> +err_group:
> +	module_put(THIS_MODULE);
> +	return ret;
>  }
>  
>  static void vfio_ap_mdev_release(struct mdev_device *mdev)
> @@ -864,6 +907,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>  	vfio_ap_mdev_reset_queues(mdev);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>  				 &matrix_mdev->group_notifier);
> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +				 &matrix_mdev->iommu_notifier);
>  	matrix_mdev->kvm = NULL;
>  	module_put(THIS_MODULE);
>  }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 2760178..e535735 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -81,8 +81,10 @@ struct ap_matrix_mdev {
>  	struct list_head node;
>  	struct ap_matrix matrix;
>  	struct notifier_block group_notifier;
> +	struct notifier_block iommu_notifier;
>  	struct kvm *kvm;
>  	struct list_head qlist;
> +	struct mdev_device *mdev;
>  };
>  
>  extern int vfio_ap_mdev_register(void);
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-25 18:36   ` Tony Krowiak
  2019-02-26 11:47     ` Pierre Morel
@ 2019-02-28  8:31     ` Christian Borntraeger
  1 sibling, 0 replies; 79+ messages in thread
From: Christian Borntraeger @ 2019-02-28  8:31 UTC (permalink / raw)
  To: Tony Krowiak, Pierre Morel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu



On 25.02.2019 19:36, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> We prepare the interception of the PQAP/AQIC instruction for
>> the case the AQIC facility is enabled in the guest.
>>
>> We add a callback inside the KVM arch structure for s390 for
>> a VFIO driver to handle a specific response to the PQAP
>> instruction with the AQIC command.
>>
>> We inject the correct exceptions from inside KVM for the case the
>> callback is not initialized, which happens when the vfio_ap driver
>> is not loaded.
>>
>> If the callback has been setup we call it.
>> If not we setup an answer considering that no queue is available
>> for the guest when no callback has been setup.
>>
>> We do consider the responsability of the driver to always initialize
>> the PQAP callback if it defines queues by initializing the CRYCB for
>> a guest.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h |  1 +
>>   arch/s390/kvm/priv.c             | 52 ++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 53 insertions(+)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index c5f5156..49cc8b0 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -719,6 +719,7 @@ struct kvm_s390_cpu_model {
>>     struct kvm_s390_crypto {
>>       struct kvm_s390_crypto_cb *crycb;
>> +    int (*pqap_hook)(struct kvm_vcpu *vcpu);
>>       __u32 crycbd;
>>       __u8 aes_kw;
>>       __u8 dea_kw;
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 8679bd7..3448abd 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -27,6 +27,7 @@
>>   #include <asm/io.h>
>>   #include <asm/ptrace.h>
>>   #include <asm/sclp.h>
>> +#include <asm/ap.h>
>>   #include "gaccess.h"
>>   #include "kvm-s390.h"
>>   #include "trace.h"
>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>       }
>>   }
>>   +/*
>> + * handle_pqap: Handling pqap interception
>> + * @vcpu: the vcpu having issue the pqap instruction
>> + *
>> + * We now support PQAP/AQIC instructions and we need to correctly
>> + * answer the guest even if no dedicated driver's hook is available.
>> + *
>> + * The intercepting code calls a dedicated callback for this instruction
>> + * if a driver did register one in the CRYPTO satellite of the
>> + * SIE block.
>> + *
>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>> + *
>> + * If no callback available, the queues are not available, return this to
>> + * the caller.
>> + * Else return the value returned by the callback.
>> + */
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +    uint8_t fc;
>> +    struct ap_queue_status status = {};
>> +
>> +    /* Verify that the AP instruction are available */
>> +    if (!ap_instructions_available())
>> +        return -EOPNOTSUPP;
> 
> How can the guest even execute an AP instruction if the AP instructions
> are not available? If the AP instructions are not available on the host,
> they will not be available on the guest (i.e., CPU model feature
> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
> here given QEMU may not be the only client.

The guest can always issue that instruction, even without the facility bit,
and we will very likely get an instruction intercept.
I think the checks below would also catch this, but it certainly does not
hurt?
> 
>> +    /* Verify that the guest is allowed to use AP instructions */
>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>> +        return -EOPNOTSUPP;
>> +    /* Verify that the function code is AQIC */
>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>> +    if (fc != 0x03)
>> +        return -EOPNOTSUPP;


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-28  8:23   ` Christian Borntraeger
@ 2019-02-28  8:48     ` Pierre Morel
  2019-02-28 16:55       ` Halil Pasic
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-28  8:48 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

On 28/02/2019 09:23, Christian Borntraeger wrote:
> On 22.02.2019 16:29, Pierre Morel wrote:
>> To be able to use the VFIO interface to facilitate the
>> mediated device memory pining/unpining we need to register
>> a notifier for IOMMU.
> 
> You might want to add that while we start to pin one guest page for the
> interrupt indicator byte in the next patch, this is still ok with ballooning
> as this page will never be used by the guest virtio-balloon driver. So the
> pinned page will never be freed. And even a broken guest does so, that would
> not impact the host as the original page is still in control by vfio.
> 

Thanks, I'll do that.

Regards,
Pierre

>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 53 ++++++++++++++++++++++++++++++++---
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 51 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 172d6eb..1b5130a 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -748,6 +748,36 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>>   };
>>   
>>   /**
>> + * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
>> + *
>> + * @nb: The notifier block
>> + * @action: Action to be taken (VFIO_IOMMU_NOTIFY_DMA_UNMAP)
>> + * @data: the specific unmap structure for vfio_iommu_type1
>> + *
>> + * Unpins the guest IOVA. (The NIB guest address we pinned before).
>> + * Return NOTIFY_OK after unpining on a UNMAP request.
>> + * otherwise, returns NOTIFY_DONE .
>> + */
>> +static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>> +				       unsigned long action, void *data)
>> +{
>> +	struct ap_matrix_mdev *matrix_mdev;
>> +
>> +	matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier);
>> +
>> +	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
>> +		struct vfio_iommu_type1_dma_unmap *unmap = data;
>> +		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
>> +
>> +		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
>> +		return NOTIFY_OK;
>> +	}
>> +
>> +	return NOTIFY_DONE;
>> +}
>> +
>> +
>> +/**
>>    * vfio_ap_mdev_set_kvm
>>    *
>>    * @matrix_mdev: a mediated matrix device
>> @@ -846,12 +876,25 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>   
>>   	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>   				     &events, &matrix_mdev->group_notifier);
>> -	if (ret) {
>> -		module_put(THIS_MODULE);
>> -		return ret;
>> -	}
>> +	if (ret)
>> +		goto err_group;
>> +
>> +	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>> +	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>> +
>> +	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> +				     &events, &matrix_mdev->iommu_notifier);
>> +	if (ret)
>> +		goto err_iommu;
>>   
>>   	return 0;
>> +
>> +err_iommu:
>> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>> +				 &matrix_mdev->group_notifier);
>> +err_group:
>> +	module_put(THIS_MODULE);
>> +	return ret;
>>   }
>>   
>>   static void vfio_ap_mdev_release(struct mdev_device *mdev)
>> @@ -864,6 +907,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>   	vfio_ap_mdev_reset_queues(mdev);
>>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>   				 &matrix_mdev->group_notifier);
>> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> +				 &matrix_mdev->iommu_notifier);
>>   	matrix_mdev->kvm = NULL;
>>   	module_put(THIS_MODULE);
>>   }
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 2760178..e535735 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -81,8 +81,10 @@ struct ap_matrix_mdev {
>>   	struct list_head node;
>>   	struct ap_matrix matrix;
>>   	struct notifier_block group_notifier;
>> +	struct notifier_block iommu_notifier;
>>   	struct kvm *kvm;
>>   	struct list_head qlist;
>> +	struct mdev_device *mdev;
>>   };
>>   
>>   extern int vfio_ap_mdev_register(void);
>>


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-27 18:00           ` Tony Krowiak
@ 2019-02-28  9:42             ` Christian Borntraeger
  2019-02-28 11:03               ` Christian Borntraeger
                                 ` (3 more replies)
  0 siblings, 4 replies; 79+ messages in thread
From: Christian Borntraeger @ 2019-02-28  9:42 UTC (permalink / raw)
  To: Tony Krowiak, pmorel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu



On 27.02.2019 19:00, Tony Krowiak wrote:
> On 2/27/19 3:09 AM, Pierre Morel wrote:
>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>
>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>> instruction with the AQIC command.
>>>>>>
>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>> is not loaded.
>>>>>>
>>>>>> If the callback has been setup we call it.
>>>>>> If not we setup an answer considering that no queue is available
>>>>>> for the guest when no callback has been setup.
>>>>>>
>>>>>> We do consider the responsability of the driver to always initialize
>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>> a guest.
>>>>>>
>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>
>>>> ...snip...
>>>>
>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>       }
>>>>>>   }
>>>>>> +/*
>>>>>> + * handle_pqap: Handling pqap interception
>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>> + *
>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>> + *
>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>> + * SIE block.
>>>>>> + *
>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>> + *
>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>> + * the caller.
>>>>>> + * Else return the value returned by the callback.
>>>>>> + */
>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    uint8_t fc;
>>>>>> +    struct ap_queue_status status = {};
>>>>>> +
>>>>>> +    /* Verify that the AP instruction are available */
>>>>>> +    if (!ap_instructions_available())
>>>>>> +        return -EOPNOTSUPP;
>>>>>
>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>> are not available? If the AP instructions are not available on the host,
>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>> here given QEMU may not be the only client.
>>>>>
>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>> +        return -EOPNOTSUPP;
>>>>>> +    /* Verify that the function code is AQIC */
>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>> +    if (fc != 0x03)
>>>>>> +        return -EOPNOTSUPP;
>>>>>
>>>>> You must have missed my suggestion to move this to the
>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>
>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>
>>> I have considered it and even verified my expectations empirically. If
>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>
>> OK, now please consider that another userland tool, not QEMU uses KVM.
> 
> What does that have to do with loading the vfio_ap module? Without the
> vfio_ap module, there will be no AP devices for the guest. What are you
> suggesting here?
> 
>>
>>> If you don't have an mdev device, you will not be able to
>>> start a guest with a vfio-ap device. If you start a guest without a
>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>> no AP devices attached to the guest. Without any AP devices attached,
>>> the PQAP(AQIC) instructions will not ever get executed.
>>
>> This is not right. The instruction will be executed, eventually, after decoding.
> 
> Please explain why the PQAP(AQIC) instruction will be executed on a
> guest without any devices? Point me to the code in the AP bus where
> PQAP(AQIC) is executed without a queue?

The host must be prepared to handle malicious and broken guests. So if
a guest does PQAP, we must handle that gracefully (e.g. by injecting an
exception)

> 
>>
>>> Even if for some
>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>
>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>> - OPERATION EXCEPTION if the micro-code is not installed
>> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
>> - SPECIFICATION exception if the instruction do not respect the usage specification
>>
>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>
>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>
>> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
>>
>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
> 
> Okay, let's go back to the genesis of this discussion; namely, my
> suggestion about moving the fc == 0x03 check into the hook code. If
> the vfio_ap module is not loaded, there will be no hook code. In that
> case, the check for the hook will fail and ultimately response code
> 0x01 will be set in the status word (which may not be the right thing
> to do?). You have not stated a single good reason for keeping this
> check, but I'm done with this silly argument. It certainly doesn't
> hurt anything.

The instruction handler must handle the basic checks for the
instruction itself as outlined above.

Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
The we should pass along everything to QEMU, but this is already done with the
ECA_APIE check, correct?

Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
in QEMU and we have enabled the AP instructions interpretion?
If yes then this has some implication:

1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
right away with a specification exception if it is not set. I do not want the hook to mess with
the kvm cpu model. @Pierre, it would be good to actually check test_kvm_facility(vcpu->kvm, 65).
3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
hook. As long as we have only fc==3 this does not matter.

Correct?

> 
>>
>>>
>>>
>>>>
>>>>>
>>>>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
>>>>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
>>>>>
>>>>> You previously stated:
>>>>>
>>>>>     "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap driver is
>>>>>      not loaded. However now that the guest officially get the PQAP/AQIC
>>>>>      instruction we need to handle the specification and operation
>>>>>      exceptions inside KVM _before_ testing and even calling the driver
>>>>>      hook.
>>>>>
>>>>>      I will make the changes in the next iteration."
>>>>
>>>> Still seems right to me, and is done is this patch.
>>>> Isn't it?
>>>
>>> I don't think it's a matter of right and wrong, it's a matter of what
>>> makes sense. IMHO, you want to make things easy if other PQAP functions
>>> are intercepted at some time. In my opinion, there should be a switch
>>> statement in the pqap hook code with a case statement for each PQAP
>>> function supported by the hook. To plug in a new PQAP function handler,
>>> it will be a simple matter of writing the handler function and calling
>>> it from the case statement, like this:
>>>
>>> static int handle_pqap(struct kvm_vcpu *vcpu)
>>> {
>>>      int ret;
>>>      uint8_t fc;
>>>
>>>      fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>
>>>      switch (fc) {
>>>      case 0x03:
>>>          ret = handle_pqap_aqic(vcpu);
>>>          break;
>>>      default:
>>>          ret = -EOPNOTSUPP;
>>>      }
>>>
>>>      return ret;
>>> }
>>>
>>> That function belongs in the pqap hook. I see no reaason whatsoever to
>>> check the function code here. If there is no hook, then you will fall
>>> through to the instruction below:
>>>
>>> status.response_code = 0x01;
>>
>> See answer above, what you are speaking about is the execution of the instruction, but there can be exceptions during the decode of the instruction.
> 
> What are you talking about, "decode of the instruction".

I think Pierre is talking about the KVM instruction decoder
(see handle_instruction in intercept.c, which calls handle_b2,
which in turn calls handle_pqap).
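
To make that call chain concrete, here is a rough standalone sketch of a
two-level opcode dispatch. It is not the code from intercept.c: every dec_*
name is made up for illustration, the struct is a stand-in for the real vcpu,
and the return values are placeholders. The only architectural fact assumed
is that PQAP is opcode 0xb2af.

```c
#include <stdint.h>

#define DEC_EOPNOTSUPP		(-95)	/* placeholder for -EOPNOTSUPP */
#define DEC_PQAP_REACHED	1	/* placeholder: the PQAP handler ran */

/* Hypothetical stand-in for the vcpu: only the intercepted opcode. */
struct dec_vcpu {
	uint16_t ipa;	/* first halfword of the instruction, e.g. 0xb2af */
};

/* Leaf handler; the real one would do the checks discussed in this thread. */
static int dec_handle_pqap(struct dec_vcpu *vcpu)
{
	(void)vcpu;
	return DEC_PQAP_REACHED;
}

/* Second level: dispatch on the low byte of a 0xb2xx opcode. */
static int dec_handle_b2(struct dec_vcpu *vcpu)
{
	switch (vcpu->ipa & 0x00ff) {
	case 0xaf:	/* PQAP */
		return dec_handle_pqap(vcpu);
	default:
		return DEC_EOPNOTSUPP;
	}
}

/* First level: dispatch on the high byte of the opcode. */
static int dec_handle_instruction(struct dec_vcpu *vcpu)
{
	switch (vcpu->ipa >> 8) {
	case 0xb2:
		return dec_handle_b2(vcpu);
	default:
		return DEC_EOPNOTSUPP;
	}
}
```

The point is only that handle_pqap is reached through the generic decoder
whether or not a driver hook exists, so the instruction-level checks have a
natural home there.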

>>
>>>
>>>>
>>>>>
>>>>> I don't know what any of the above has to do with checking FC=0x03? If
>>>>> that check is moved to the pqap handler hook, it can just as well return
>>>>> -EOPNOTSUPP. In fact, down below you do this:
>>>>>
>>>>>      return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
>>>>>
>>>>> If the RC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
>>>>> like above. None of this is critical, but the parsing of the register
>>>>> values for the PQAP(AQIC) function ought to be done in the code that
>>>>> handles the PQAP instruction IMHO.
>>>>
>>>>
>>>> This interception code must handle the PQAP/AQIC instruction when the hook is not used and should not modify the handling for other PQAP instructions.
>>>> We can not move anything inside the hook that must be always done.
>>>
>>> What you are saying here makes no sense. If the check for the function
>>> code is moved into the pqap hook and fc != 0x03, the result will be
>>> exactly the same; the hook will return -EOPNOTSUPP.
>>
>> again please consider that the hook may not be initialized.
> 
> 
> So what? Then maybe the code at the end of the function is wrong:
> 
> /* PQAP/AQIC instructions are authorized but there is no queue */
> status.response_code = 0x01;
> memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> return 0;
> 
> Why does this make sense? What if the APQN is valid? You don't even know
> whether it is or not. The only reason you would even reach this
> instruction is if the pqap hook is not initialized. Wouldn't it make
> more sense to just return -EOPNOTSUPP here? If there is no hook, then
> it is not supported.
> 
>>
>> Regards,
>> Pierre
>>
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28  9:42             ` Christian Borntraeger
@ 2019-02-28 11:03               ` Christian Borntraeger
  2019-02-28 11:22                 ` Cornelia Huck
                                   ` (2 more replies)
  2019-02-28 12:39               ` Halil Pasic
                                 ` (2 subsequent siblings)
  3 siblings, 3 replies; 79+ messages in thread
From: Christian Borntraeger @ 2019-02-28 11:03 UTC (permalink / raw)
  To: Tony Krowiak, pmorel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu



On 28.02.2019 10:42, Christian Borntraeger wrote:
[...]
>> Okay, let's go back to the genesis of this discussion; namely, my
>> suggestion about moving the fc == 0x03 check into the hook code. If
>> the vfio_ap module is not loaded, there will be no hook code. In that
>> case, the check for the hook will fail and ultimately response code
>> 0x01 will be set in the status word (which may not be the right thing
>> to do?). You have not stated a single good reason for keeping this
>> check, but I'm done with this silly argument. It certainly doesn't
>> hurt anything.
> 
> The instruction handler must handle the basic checks for the
> instruction itself as outlined above.
> 
> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
> The we should pass along everything to QEMU, but this is already done with the
> ECA_APIE check, correct?
> 
> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
> in QEMU and we have enabled the AP instructions interpretion?
> If yes then this has some implication:
> 
> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
> right away with a specification exception. I do not want the hook to mess with
> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
> hook. As long as we have only fc==3 this does not matter.
> 
> Correct?

Thinking more about that, I think we should inject a specification exception for all
unknown FCs (!= 0x3). That would also qualify for keeping it in the instruction handler.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 11:03               ` Christian Borntraeger
@ 2019-02-28 11:22                 ` Cornelia Huck
  2019-02-28 13:16                   ` Pierre Morel
  2019-02-28 13:10                 ` Pierre Morel
  2019-02-28 15:36                 ` Tony Krowiak
  2 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2019-02-28 11:22 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Tony Krowiak, pmorel, alex.williamson, linux-kernel, linux-s390,
	kvm, frankja, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

On Thu, 28 Feb 2019 12:03:38 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 28.02.2019 10:42, Christian Borntraeger wrote:
> [...]
> >> Okay, let's go back to the genesis of this discussion; namely, my
> >> suggestion about moving the fc == 0x03 check into the hook code. If
> >> the vfio_ap module is not loaded, there will be no hook code. In that
> >> case, the check for the hook will fail and ultimately response code
> >> 0x01 will be set in the status word (which may not be the right thing
> >> to do?). You have not stated a single good reason for keeping this
> >> check, but I'm done with this silly argument. It certainly doesn't
> >> hurt anything.  
> > 
> > The instruction handler must handle the basic checks for the
> > instruction itself as outlined above.
> > 
> > Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
> > The we should pass along everything to QEMU, but this is already done with the
> > ECA_APIE check, correct?
> > 
> > Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
> > in QEMU and we have enabled the AP instructions interpretion?
> > If yes then this has some implication:
> > 
> > 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> > 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
> > right away with a specification exception. I do not want the hook to mess with
> > the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
> > 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
> > hook. As long as we have only fc==3 this does not matter.
> > 
> > Correct?  
> 
> Thinking more about that, I think we should inject a specification exception for all
> unknown FCc != 0x3. That would also qualify for keeping it in the instruction handler.
> 

So, to summarize, the function should do:
- Is userspace supposed to emulate everything (!ECA_APIE)? Return
  -EOPNOTSUPP to hand control to it.
- We are now interpreting the instruction in KVM. Do common checks
  (PSTATE etc.) and inject exceptions, if needed.
- Now look at the fc; if there's a handler for it, call that; if not
  (case does not attempt to call a specific handler, or no handler
  registered), inject a specification exception. (Do we want pre-checks
  like for facility 65 here, or in the handler?)

That response code 0x01 thingy probably needs to go into the specific
handler function, if anywhere (don't know the semantics, sorry).
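
The flow summarized above could look roughly like the following standalone
sketch. It is an illustration under stated assumptions, not a proposal for
the actual patch: struct flow_vcpu, the flow_* names and the enum values are
invented here, and the real code would return -EOPNOTSUPP or call the KVM
program-interrupt injection helpers instead of returning an enum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented outcome codes standing in for return values and injections. */
enum flow_result {
	FLOW_TO_USERSPACE,	/* stands in for returning -EOPNOTSUPP */
	FLOW_PGM_PRIVILEGED,	/* privileged-operation exception injected */
	FLOW_PGM_SPECIFICATION,	/* specification exception injected */
	FLOW_HANDLED,		/* a per-fc handler ran */
};

/* Hypothetical stand-in for the vcpu state the checks look at. */
struct flow_vcpu {
	bool apie;		/* ECA_APIE set: KVM interprets AP instructions */
	bool problem_state;	/* guest issued the instruction in problem state */
	uint64_t gpr0;		/* guest general register 0 */
};

typedef enum flow_result (*flow_fc_handler)(struct flow_vcpu *vcpu);

/* Example per-fc handler; a real one would do the AQIC work. */
static enum flow_result flow_handle_aqic(struct flow_vcpu *vcpu)
{
	(void)vcpu;
	return FLOW_HANDLED;
}

static enum flow_result flow_handle_pqap(struct flow_vcpu *vcpu,
					 flow_fc_handler aqic)
{
	uint8_t fc;

	/* Step 1: userspace emulates everything when APIE is off. */
	if (!vcpu->apie)
		return FLOW_TO_USERSPACE;

	/* Step 2: common checks (only the PSTATE check is modeled here). */
	if (vcpu->problem_state)
		return FLOW_PGM_PRIVILEGED;

	/* Step 3: dispatch on the function code, as the patch extracts it. */
	fc = (vcpu->gpr0 >> 24) & 0xff;
	switch (fc) {
	case 0x03:
		if (aqic)
			return aqic(vcpu);
		break;		/* no handler registered */
	default:
		break;		/* unknown function code */
	}
	return FLOW_PGM_SPECIFICATION;
}
```

Passing the handler as a parameter is just a way to model "handler
registered or not" in a self-contained example; where the facility 65
pre-check belongs is left open here, as in the question above.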

Question: Will the handlers for the individual fcs need to generate
different exceptions on their own? I.e., do they need to do injections
themselves, or should the calling function possibly inject an exception
on error?

(Are there more possible fcs than 0x3 and whatever the other
subfunction was?)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28  9:42             ` Christian Borntraeger
  2019-02-28 11:03               ` Christian Borntraeger
@ 2019-02-28 12:39               ` Halil Pasic
  2019-02-28 14:12                 ` Pierre Morel
  2019-02-28 15:43                 ` Tony Krowiak
  2019-02-28 13:23               ` Pierre Morel
  2019-02-28 15:35               ` Tony Krowiak
  3 siblings, 2 replies; 79+ messages in thread
From: Halil Pasic @ 2019-02-28 12:39 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Tony Krowiak, pmorel, alex.williamson, cohuck, linux-kernel,
	linux-s390, kvm, frankja, david, schwidefsky, heiko.carstens,
	freude, mimu

On Thu, 28 Feb 2019 10:42:23 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> 
> 
> On 27.02.2019 19:00, Tony Krowiak wrote:
> > On 2/27/19 3:09 AM, Pierre Morel wrote:
> >> On 26/02/2019 16:47, Tony Krowiak wrote:
> >>> On 2/26/19 6:47 AM, Pierre Morel wrote:
> >>>> On 25/02/2019 19:36, Tony Krowiak wrote:
> >>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
> >>>>>> We prepare the interception of the PQAP/AQIC instruction for
> >>>>>> the case the AQIC facility is enabled in the guest.
> >>>>>>
> >>>>>> We add a callback inside the KVM arch structure for s390 for
> >>>>>> a VFIO driver to handle a specific response to the PQAP
> >>>>>> instruction with the AQIC command.
> >>>>>>
> >>>>>> We inject the correct exceptions from inside KVM for the case the
> >>>>>> callback is not initialized, which happens when the vfio_ap driver
> >>>>>> is not loaded.
> >>>>>>
> >>>>>> If the callback has been setup we call it.
> >>>>>> If not we setup an answer considering that no queue is available
> >>>>>> for the guest when no callback has been setup.
> >>>>>>
> >>>>>> We do consider the responsability of the driver to always initialize
> >>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
> >>>>>> a guest.
> >>>>>>
> >>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> >>>>
> >>>> ...snip...
> >>>>
> >>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> >>>>>>       }
> >>>>>>   }
> >>>>>> +/*
> >>>>>> + * handle_pqap: Handling pqap interception
> >>>>>> + * @vcpu: the vcpu having issue the pqap instruction
> >>>>>> + *
> >>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
> >>>>>> + * answer the guest even if no dedicated driver's hook is available.
> >>>>>> + *
> >>>>>> + * The intercepting code calls a dedicated callback for this instruction
> >>>>>> + * if a driver did register one in the CRYPTO satellite of the
> >>>>>> + * SIE block.
> >>>>>> + *
> >>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
> >>>>>> + *
> >>>>>> + * If no callback available, the queues are not available, return this to
> >>>>>> + * the caller.
> >>>>>> + * Else return the value returned by the callback.
> >>>>>> + */
> >>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
> >>>>>> +{
> >>>>>> +    uint8_t fc;
> >>>>>> +    struct ap_queue_status status = {};
> >>>>>> +
> >>>>>> +    /* Verify that the AP instruction are available */
> >>>>>> +    if (!ap_instructions_available())
> >>>>>> +        return -EOPNOTSUPP;
> >>>>>
> >>>>> How can the guest even execute an AP instruction if the AP instructions
> >>>>> are not available? If the AP instructions are not available on the host,
> >>>>> they will not be available on the guest (i.e., CPU model feature
> >>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
> >>>>> here given QEMU may not be the only client.
> >>>>>
> >>>>>> +    /* Verify that the guest is allowed to use AP instructions */
> >>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> >>>>>> +        return -EOPNOTSUPP;
> >>>>>> +    /* Verify that the function code is AQIC */
> >>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
> >>>>>> +    if (fc != 0x03)
> >>>>>> +        return -EOPNOTSUPP;
> >>>>>
> >>>>> You must have missed my suggestion to move this to the
> >>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
> >>>>
> >>>> Please consider what happen if the vfio_ap module is not loaded.
> >>>
> >>> I have considered it and even verified my expectations empirically. If
> >>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
> >>
> >> OK, now please consider that another userland tool, not QEMU uses KVM.
> > 
> > What does that have to do with loading the vfio_ap module? Without the
> > vfio_ap module, there will be no AP devices for the guest. What are you
> > suggesting here?
> > 
> >>
> >>> If you don't have an mdev device, you will not be able to
> >>> start a guest with a vfio-ap device. If you start a guest without a
> >>> vfio-ap device, but enable AP instructions for the guest, there will be
> >>> no AP devices attached to the guest. Without any AP devices attached,
> >>> the PQAP(AQIC) instructions will not ever get executed.
> >>
> >> This is not right. The instruction will be executed, eventually, after decoding.
> > 
> > Please explain why the PQAP(AQIC) instruction will be executed on a
> > guest without any devices? Point me to the code in the AP bus where
> > PQAP(AQIC) is executed without a queue?
> 
> The host must be prepared to handle malicous and broken guests. So if
> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
> exception)
> 

Nod.

> > 
> >>
> >>> Even if for some
> >>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
> >>> reason, it will fail with response code 0x01, AP-queue number not valid.
> >>
> >> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
> >> - OPERATION EXCEPTION if the micro-code is not installed
> >> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
> >> - SPECIFICATION exception if the instruction do not respect the usage specification
> >>
> >> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
> >>
> >> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
> >>
> >> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
> >>
> >> That done, the AP queue virtualisation can be called, this is done by calling the hook.
> > 
> > Okay, let's go back to the genesis of this discussion; namely, my
> > suggestion about moving the fc == 0x03 check into the hook code. If
> > the vfio_ap module is not loaded, there will be no hook code. In that
> > case, the check for the hook will fail and ultimately response code
> > 0x01 will be set in the status word (which may not be the right thing
> > to do?). You have not stated a single good reason for keeping this
> > check, but I'm done with this silly argument. It certainly doesn't
> > hurt anything.
> 
> The instruction handler must handle the basic checks for the
> instruction itself as outlined above.

Nod.

> 
> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
> The we should pass along everything to QEMU, but this is already done with the
> ECA_APIE check, correct?

Nod. 

> 
> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
> in QEMU and we have enabled the AP instructions interpretion?

At least the intention is to not emulate. ECA_APIE is an effective
control though...

> If yes then this has some implication:
> 
> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).

Not necessarily true. TAPQ can be intercepted as well (APFT depends on
IC.3). But for now we don't care about that.

> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
> right away with a specification exception. I do not want the hook to mess with
> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))

As far as I can tell he already does test_kvm_facility(vcpu->kvm, 65). I
agree we need a spec exception if guest does not have facility 65, but
does have ap instructions.

> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
> hook. As long as we have only fc==3 this does not matter.
> 

I guess Tony's point is that we may have fc == 0 that is TAPQ in the
APFT flavor. IMHO we don't need to care about that at the moment. 

> Correct?

IMHO mostly.

I also think doing the facility checks in kvm is easier, and I think this is
something we can change later if needed without any major trouble.

There are a couple of things I would do differently than Pierre does:
1) Do the PGM_PRIVILEGED_OP before the fc == 3 check.

2) Do the test_kvm_facility(vcpu->kvm, 65) check in the context of fc ==
3. I.e. decide if this hook is about pqap or just about pqap aqic and
make the code convey that decision to its reader.

3) I would most probably test if the queue is available by looking at the
masks in CRYCB here. If not AP_RESPONSE_Q_NOT_AVAIL is what we need.

4) If we have APIE and queues authorized by the CRYCB (i.e. we have a
vfio_ap module loaded and an mdev associated with the kvm), the callback
not being set (!(vcpu->kvm->arch.crypto.pqap_hook)) is a BUG! In that case
lying that the queue is not available does not seem right. BTW this is
something Pierre quietly changed since the last version (I can't recall
a mention in the change log or somebody asking for this). If we want to
be very pedantic about this bug scenario our best bet is probably
response code 6.
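
To illustrate the ordering in points 1) to 4), here is a minimal standalone
sketch. Everything named q_* is hypothetical: the struct is a stand-in for
the vcpu/CRYCB state, the masks are assumed to be 256-bit arrays numbered
from the most significant bit, and the APQN is assumed to sit in the low 16
bits of GR0 with the adapter in the high byte and the domain in the low
byte. What happens for fc != 3 is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define Q_RC_NOT_AVAIL	0x01	/* queue not available */
#define Q_RC_HOOK_BUG	0x06	/* response code 6, as proposed in point 4 */

enum q_outcome {
	Q_PGM_PRIVILEGED,	/* privileged-operation exception */
	Q_NOT_AQIC,		/* fc != 3: not covered by this sketch */
	Q_PGM_SPECIFICATION,	/* specification exception */
	Q_STATUS_SET,		/* a response code was stored in *rc */
	Q_HOOK_CALLED,		/* the driver callback would run */
};

/* Hypothetical stand-in for the guest state the checks need. */
struct q_guest {
	bool problem_state;
	bool facility_65;	/* result of the facility 65 test */
	bool hook_registered;	/* pqap_hook != NULL */
	uint64_t gpr0;
	uint8_t apm[32];	/* adapter mask, bit 0 is the MSB of byte 0 */
	uint8_t aqm[32];	/* domain (queue) mask, same bit numbering */
};

/* MSB-first bit test, in the spirit of the kernel's inverted bit ops. */
static bool q_test_bit_inv(const uint8_t *mask, unsigned int nr)
{
	return (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

static enum q_outcome q_handle_pqap(const struct q_guest *g, uint8_t *rc)
{
	uint8_t fc = (g->gpr0 >> 24) & 0xff;
	unsigned int apid = (g->gpr0 >> 8) & 0xff;	/* assumed layout */
	unsigned int apqi = g->gpr0 & 0xff;		/* assumed layout */

	*rc = 0;
	if (g->problem_state)			/* 1) before looking at fc */
		return Q_PGM_PRIVILEGED;
	if (fc != 0x03)
		return Q_NOT_AQIC;
	if (!g->facility_65)			/* 2) only in the fc == 3 path */
		return Q_PGM_SPECIFICATION;
	if (!q_test_bit_inv(g->apm, apid) ||	/* 3) queue not authorized */
	    !q_test_bit_inv(g->aqm, apqi)) {
		*rc = Q_RC_NOT_AVAIL;
		return Q_STATUS_SET;
	}
	if (!g->hook_registered) {		/* 4) authorized but no hook */
		*rc = Q_RC_HOOK_BUG;
		return Q_STATUS_SET;
	}
	return Q_HOOK_CALLED;
}
```

With this ordering, response code 0x01 is only reported when the masks
really say the queue is not there, and the missing-hook case stays
distinguishable from it.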

Regards,
Halil

[..]


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 11:03               ` Christian Borntraeger
  2019-02-28 11:22                 ` Cornelia Huck
@ 2019-02-28 13:10                 ` Pierre Morel
  2019-02-28 15:36                 ` Tony Krowiak
  2 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 13:10 UTC (permalink / raw)
  To: Christian Borntraeger, Tony Krowiak
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 12:03, Christian Borntraeger wrote:
> 
> 
> On 28.02.2019 10:42, Christian Borntraeger wrote:
> [...]
>>> Okay, let's go back to the genesis of this discussion; namely, my
>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>> case, the check for the hook will fail and ultimately response code
>>> 0x01 will be set in the status word (which may not be the right thing
>>> to do?). You have not stated a single good reason for keeping this
>>> check, but I'm done with this silly argument. It certainly doesn't
>>> hurt anything.
>>
>> The instruction handler must handle the basic checks for the
>> instruction itself as outlined above.
>>
>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
>> The we should pass along everything to QEMU, but this is already done with the
>> ECA_APIE check, correct?
>>
>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
>> in QEMU and we have enabled the AP instructions interpretion?
>> If yes then this has some implication:
>>
>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>> right away with a specification exception. I do not want the hook to mess with
>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
>> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
>> hook. As long as we have only fc==3 this does not matter.
>>
>> Correct?
> 
> Thinking more about that, I think we should inject a specification exception for all
> unknown FCc != 0x3. That would also qualify for keeping it in the instruction handler.
> 

Maybe return a privileged operation exception if issued from the guest's
problem state, but generally I agree with the idea of handling all PQAP
functions here.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 11:22                 ` Cornelia Huck
@ 2019-02-28 13:16                   ` Pierre Morel
  2019-02-28 13:52                     ` Cornelia Huck
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 13:16 UTC (permalink / raw)
  To: Cornelia Huck, Christian Borntraeger
  Cc: Tony Krowiak, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 12:22, Cornelia Huck wrote:
> On Thu, 28 Feb 2019 12:03:38 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 28.02.2019 10:42, Christian Borntraeger wrote:
>> [...]
>>>> Okay, let's go back to the genesis of this discussion; namely, my
>>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>>> case, the check for the hook will fail and ultimately response code
>>>> 0x01 will be set in the status word (which may not be the right thing
>>>> to do?). You have not stated a single good reason for keeping this
>>>> check, but I'm done with this silly argument. It certainly doesn't
>>>> hurt anything.
>>>
>>> The instruction handler must handle the basic checks for the
>>> instruction itself as outlined above.
>>>
>>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
>>> The we should pass along everything to QEMU, but this is already done with the
>>> ECA_APIE check, correct?
>>>
>>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
>>> in QEMU and we have enabled the AP instructions interpretion?
>>> If yes then this has some implication:
>>>
>>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>>> right away with a specification exception. I do not want the hook to mess with
>>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
>>> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
>>> hook. As long as we have only fc==3 this does not matter.
>>>
>>> Correct?
>>
>> Thinking more about that, I think we should inject a specification exception for all
>> unknown FCc != 0x3. That would also qualify for keeping it in the instruction handler.
>>
> 
> So, to summarize, the function should do:
> - Is userspace supposed to emulate everything (!ECA_APIE)? Return
>    -EOPNOTSUPP to hand control to it.
> - We are now interpreting the instruction in KVM. Do common checks
>    (PSTATE etc.) and inject exceptions, if needed.
> - Now look at the fc; if there's a handler for it, call that; if not
>    (case does not attempt to call a specific handler, or no handler
>    registered), inject a specification exception. (Do we want pre-checks
>    like for facility 65 here, or in the handler?)
> 
> That response code 0x01 thingy probably needs to go into the specific
> handler function, if anywhere (don't know the semantics, sorry).

What do you mean by specific handler function?

If you mean a switch on the FC with calls to static functions, I agree;
if you mean a jump into a hook, I do not agree.


> 
> Question: Will the handlers for the individual fcs need to generate
> different exceptions on their own? I.e., do they need to do injections
> themselves, or should the calling function possibly inject an exception
> on error?

There are some specificities.

> 
> (Are there more possible fcs than 0x3 and whatever the other
> subfunction was?)
> 

Yes, at least 5 different FCs are implemented in the Linux kernel today,
AFAIK.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28  9:42             ` Christian Borntraeger
  2019-02-28 11:03               ` Christian Borntraeger
  2019-02-28 12:39               ` Halil Pasic
@ 2019-02-28 13:23               ` Pierre Morel
  2019-02-28 13:44                 ` Christian Borntraeger
  2019-02-28 15:35               ` Tony Krowiak
  3 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 13:23 UTC (permalink / raw)
  To: Christian Borntraeger, Tony Krowiak
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 10:42, Christian Borntraeger wrote:
> 
> 
> On 27.02.2019 19:00, Tony Krowiak wrote:
>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>
>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>> instruction with the AQIC command.
>>>>>>>
>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>> is not loaded.
>>>>>>>
>>>>>>> If the callback has been setup we call it.
>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>> for the guest when no callback has been setup.
>>>>>>>
>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>> a guest.
>>>>>>>
>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>
>>>>> ...snip...
>>>>>
>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>        }
>>>>>>>    }
>>>>>>> +/*
>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>> + *
>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>> + *
>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>> + * SIE block.
>>>>>>> + *
>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>> + *
>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>> + * the caller.
>>>>>>> + * Else return the value returned by the callback.
>>>>>>> + */
>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>> +{
>>>>>>> +    uint8_t fc;
>>>>>>> +    struct ap_queue_status status = {};
>>>>>>> +
>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>> +    if (!ap_instructions_available())
>>>>>>> +        return -EOPNOTSUPP;
>>>>>>
>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>> here given QEMU may not be the only client.
>>>>>>
>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>> +        return -EOPNOTSUPP;
>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>> +    if (fc != 0x03)
>>>>>>> +        return -EOPNOTSUPP;
>>>>>>
>>>>>> You must have missed my suggestion to move this to the
>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>
>>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>>
>>>> I have considered it and even verified my expectations empirically. If
>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>
>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>
>> What does that have to do with loading the vfio_ap module? Without the
>> vfio_ap module, there will be no AP devices for the guest. What are you
>> suggesting here?
>>
>>>
>>>> If you don't have an mdev device, you will not be able to
>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>
>>> This is not right. The instruction will be executed, eventually, after decoding.
>>
>> Please explain why the PQAP(AQIC) instruction will be executed on a
>> guest without any devices? Point me to the code in the AP bus where
>> PQAP(AQIC) is executed without a queue?
> 
> The host must be prepared to handle malicous and broken guests. So if
> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
> exception)
> 
>>
>>>
>>>> Even if for some
>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>
>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>>> - OPERATION EXCEPTION if the micro-code is not installed
>>> - PRIVILEGED OPERATION if the instruction is issued from userland (problem state)
>>> - SPECIFICATION exception if the instruction does not respect the usage specification
>>>
>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>>
>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>
>>> KVM does all the decode steps I mentioned above for us, whether or not there is a pqap hook to call to simulate the AP queue access.
>>>
>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
>>
>> Okay, let's go back to the genesis of this discussion; namely, my
>> suggestion about moving the fc == 0x03 check into the hook code. If
>> the vfio_ap module is not loaded, there will be no hook code. In that
>> case, the check for the hook will fail and ultimately response code
>> 0x01 will be set in the status word (which may not be the right thing
>> to do?). You have not stated a single good reason for keeping this
>> check, but I'm done with this silly argument. It certainly doesn't
>> hurt anything.
> 
> The instruction handler must handle the basic checks for the
> instruction itself as outlined above.
> 
> Do we want to allow QEMU to fully emulate everything (the ECA_APIE case being off)?
> Then we should pass along everything to QEMU, but this is already done with the
> ECA_APIE check, correct?
> 
> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
> in QEMU and we have enabled the AP instructions interpretation?
> If yes then this has some implications:
> 
> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
> right away with a specification exception. I do not want the hook to mess with
> the kvm cpu model. @Pierre, it would be good to actually check test_kvm_facility(vcpu->kvm, 65).


Currently the check test_kvm_facility(vcpu->kvm, 65) is done in the
instruction handler. What do you mean here?

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 13:23               ` Pierre Morel
@ 2019-02-28 13:44                 ` Christian Borntraeger
  2019-02-28 13:47                   ` Pierre Morel
  2019-02-28 15:45                   ` Tony Krowiak
  0 siblings, 2 replies; 79+ messages in thread
From: Christian Borntraeger @ 2019-02-28 13:44 UTC (permalink / raw)
  To: pmorel, Tony Krowiak
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu



On 28.02.2019 14:23, Pierre Morel wrote:
> On 28/02/2019 10:42, Christian Borntraeger wrote:
>>
>>
>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>
>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>> instruction with the AQIC command.
>>>>>>>>
>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>> is not loaded.
>>>>>>>>
>>>>>>>> If the callback has been setup we call it.
>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>
>>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>> a guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>
>>>>>> ...snip...
>>>>>>
>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>        }
>>>>>>>>    }
>>>>>>>> +/*
>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>> + *
>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>> + *
>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>> + * SIE block.
>>>>>>>> + *
>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>> + *
>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>> + * the caller.
>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>> + */
>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>> +{
>>>>>>>> +    uint8_t fc;
>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>> +
>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>> here given QEMU may not be the only client.
>>>>>>>
>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>> +    if (fc != 0x03)
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> You must have missed my suggestion to move this to the
>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>
>>>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>>>
>>>>> I have considered it and even verified my expectations empirically. If
>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>
>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>
>>> What does that have to do with loading the vfio_ap module? Without the
>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>> suggesting here?
>>>
>>>>
>>>>> If you don't have an mdev device, you will not be able to
>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>
>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>
>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>> guest without any devices? Point me to the code in the AP bus where
>>> PQAP(AQIC) is executed without a queue?
>>
>> The host must be prepared to handle malicous and broken guests. So if
>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>> exception)
>>
>>>
>>>>
>>>>> Even if for some
>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>
>>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
>>>> - SPECIFICATION exception if the instruction do not respect the usage specification
>>>>
>>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>>>
>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>
>>>> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
>>>>
>>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
>>>
>>> Okay, let's go back to the genesis of this discussion; namely, my
>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>> case, the check for the hook will fail and ultimately response code
>>> 0x01 will be set in the status word (which may not be the right thing
>>> to do?). You have not stated a single good reason for keeping this
>>> check, but I'm done with this silly argument. It certainly doesn't
>>> hurt anything.
>>
>> The instruction handler must handle the basic checks for the
>> instruction itself as outlined above.
>>
>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
>> The we should pass along everything to QEMU, but this is already done with the
>> ECA_APIE check, correct?
>>
>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
>> in QEMU and we have enabled the AP instructions interpretion?
>> If yes then this has some implication:
>>
>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>> right away with a specification exception. I do not want the hook to mess with
>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
> 
> 
> Currently the check test_kvm_facility(vcpu->kvm, 65) is done in the instruction handler, what do you mean here?

Found it. I think we should couple the check for facility 65 to fc==3. Otherwise both things are somewhat
disconnected when reviewing.
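
A minimal sketch of that coupling, written as a standalone user-space model
rather than kernel code. The struct, enum and function names (model_vcpu,
model_check_pqap, ...) are hypothetical; facility_65 merely stands in for
test_kvm_facility(vcpu->kvm, 65), and the default case mirrors the
-EOPNOTSUPP return of the v4 patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vcpu state touched in the quoted patch. */
struct model_vcpu {
	uint64_t gpr0;     /* guest general register 0 */
	bool facility_65;  /* models test_kvm_facility(vcpu->kvm, 65) */
};

/* Hypothetical outcomes; the real handler returns errno values or injects. */
enum model_result {
	MODEL_CALL_AQIC,      /* continue into the AQIC handling */
	MODEL_SPECIFICATION,  /* a specification exception would be injected */
	MODEL_EOPNOTSUPP,     /* function code not handled here */
};

/* Same extraction as the patch: fc = gprs[0] >> 24, truncated to 8 bits. */
static uint8_t model_pqap_fc(const struct model_vcpu *vcpu)
{
	return (uint8_t)(vcpu->gpr0 >> 24);
}

enum model_result model_check_pqap(const struct model_vcpu *vcpu)
{
	assert(vcpu != NULL);

	switch (model_pqap_fc(vcpu)) {
	case 0x03:
		/* The facility test sits right next to the fc it guards. */
		if (!vcpu->facility_65)
			return MODEL_SPECIFICATION;
		return MODEL_CALL_AQIC;
	default:
		return MODEL_EOPNOTSUPP;
	}
}
```

With this shape the facility test can only be reached through the fc == 0x03
case, so a reviewer sees both conditions together.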



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 13:44                 ` Christian Borntraeger
@ 2019-02-28 13:47                   ` Pierre Morel
  2019-02-28 14:07                     ` Halil Pasic
  2019-02-28 15:45                   ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 13:47 UTC (permalink / raw)
  To: Christian Borntraeger, Tony Krowiak
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 14:44, Christian Borntraeger wrote:
> 
> 
> On 28.02.2019 14:23, Pierre Morel wrote:
>> On 28/02/2019 10:42, Christian Borntraeger wrote:
>>>
>>>
>>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>>
>>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>>> instruction with the AQIC command.
>>>>>>>>>
>>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>>> is not loaded.
>>>>>>>>>
>>>>>>>>> If the callback has been setup we call it.
>>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>>
>>>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>>> a guest.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>>
>>>>>>> ...snip...
>>>>>>>
>>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> +/*
>>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>>> + *
>>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>>> + *
>>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>>> + * SIE block.
>>>>>>>>> + *
>>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>>> + *
>>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>>> + * the caller.
>>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>>> + */
>>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>>> +{
>>>>>>>>> +    uint8_t fc;
>>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>>> +
>>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>
>>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>>> here given QEMU may not be the only client.
>>>>>>>>
>>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>>> +    if (fc != 0x03)
>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>
>>>>>>>> You must have missed my suggestion to move this to the
>>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>>
>>>>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>>>>
>>>>>> I have considered it and even verified my expectations empirically. If
>>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>>
>>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>>
>>>> What does that have to do with loading the vfio_ap module? Without the
>>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>>> suggesting here?
>>>>
>>>>>
>>>>>> If you don't have an mdev device, you will not be able to
>>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>>
>>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>>
>>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>>> guest without any devices? Point me to the code in the AP bus where
>>>> PQAP(AQIC) is executed without a queue?
>>>
>>> The host must be prepared to handle malicous and broken guests. So if
>>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>>> exception)
>>>
>>>>
>>>>>
>>>>>> Even if for some
>>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>>
>>>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>>> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
>>>>> - SPECIFICATION exception if the instruction do not respect the usage specification
>>>>>
>>>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>>>>
>>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>>
>>>>> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
>>>>>
>>>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
>>>>
>>>> Okay, let's go back to the genesis of this discussion; namely, my
>>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>>> case, the check for the hook will fail and ultimately response code
>>>> 0x01 will be set in the status word (which may not be the right thing
>>>> to do?). You have not stated a single good reason for keeping this
>>>> check, but I'm done with this silly argument. It certainly doesn't
>>>> hurt anything.
>>>
>>> The instruction handler must handle the basic checks for the
>>> instruction itself as outlined above.
>>>
>>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
>>> The we should pass along everything to QEMU, but this is already done with the
>>> ECA_APIE check, correct?
>>>
>>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
>>> in QEMU and we have enabled the AP instructions interpretion?
>>> If yes then this has some implication:
>>>
>>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>>> right away with a specification exception. I do not want the hook to mess with
>>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
>>
>>
>> Currently the check test_kvm_facility(vcpu->kvm, 65) is done in the instruction handler, what do you mean here?
> 
> Found it. I think we should couple the check for facility 65 to fc==3. Otherwise both things are somewhat
> disconnected when reviewing.
> 

Right.
In the next version I will go the way you proposed anyway and handle all
PQAP functions separately (switch/dedicated functions).
With this, I will have to move each check to the right place.

Thanks for the comments.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 13:16                   ` Pierre Morel
@ 2019-02-28 13:52                     ` Cornelia Huck
  2019-02-28 14:14                       ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2019-02-28 13:52 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson,
	linux-kernel, linux-s390, kvm, frankja, pasic, david,
	schwidefsky, heiko.carstens, freude, mimu

On Thu, 28 Feb 2019 14:16:09 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 28/02/2019 12:22, Cornelia Huck wrote:

> > So, to summarize, the function should do:
> > - Is userspace supposed to emulate everything (!ECA_APIE)? Return
> >    -EOPNOTSUPP to hand control to it.
> > - We are now interpreting the instruction in KVM. Do common checks
> >    (PSTATE etc.) and inject exceptions, if needed.
> > - Now look at the fc; if there's a handler for it, call that; if not
> >    (case does not attempt to call a specific handler, or no handler
> >    registered), inject a specification exception. (Do we want pre-checks
> >    like for facility 65 here, or in the handler?)
> > 
> > That response code 0x01 thingy probably needs to go into the specific
> > handler function, if anywhere (don't know the semantics, sorry).  
> 
> What do you mean with specific handler function?
> 
> If you mean a switch around the FC with static function's call, I agree, 
> if you mean a jump into a hook I do not agree.

Ah, ok; so each case (that we want to handle) should call into a
subhandler that does
{
	(... check things like facilities ...)
	if (!specific_hook)
		inject_specif_excp_and_return();
	ret = specific_hook();
	if (ret)
		set_resp_code_0x01(); // or in specific_hook()?
}

?
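
To make the question concrete, here is a rough standalone model of such a
subhandler; it is not kernel code. All names (sketch_vcpu, sketch_handle_aqic,
the two stand-in hooks) are hypothetical, and setting response code 0x01 in
the caller rather than in the hook is just one of the two options raised
above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes of handling one intercepted PQAP. */
enum sketch_outcome {
	SKETCH_DONE,           /* hook ran and reported success */
	SKETCH_SPECIFICATION,  /* a specification exception would be injected */
	SKETCH_RC_01,          /* response code 0x01 stored in the status */
};

/* Hypothetical stand-in for the per-guest state. */
struct sketch_vcpu {
	bool facility_65;                            /* guest has facility 65 */
	int (*aqic_hook)(struct sketch_vcpu *vcpu);  /* NULL if no driver registered */
	uint8_t response_code;
};

/* Stand-in hooks for illustration only. */
int sketch_hook_ok(struct sketch_vcpu *vcpu)
{
	(void)vcpu;
	return 0;
}

int sketch_hook_fail(struct sketch_vcpu *vcpu)
{
	(void)vcpu;
	return -1;
}

/* Subhandler for fc == 0x03: pre-checks, then the hook, then the RC. */
static enum sketch_outcome sketch_handle_aqic(struct sketch_vcpu *vcpu)
{
	/* (... check things like facilities ...) */
	if (!vcpu->facility_65)
		return SKETCH_SPECIFICATION;
	if (!vcpu->aqic_hook)
		return SKETCH_SPECIFICATION;
	if (vcpu->aqic_hook(vcpu)) {
		vcpu->response_code = 0x01;
		return SKETCH_RC_01;
	}
	return SKETCH_DONE;
}

/* Dispatcher: one case per function code we want to handle. */
enum sketch_outcome sketch_handle_pqap(struct sketch_vcpu *vcpu, uint8_t fc)
{
	assert(vcpu != NULL);

	switch (fc) {
	case 0x03:
		return sketch_handle_aqic(vcpu);
	default:
		/* No case for this fc: specification exception. */
		return SKETCH_SPECIFICATION;
	}
}
```

The dispatcher stays free of per-function details, and anything specific to
one fc (facility checks, response codes) lives in its own subhandler.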
 
> > 
> > Question: Will the handlers for the individual fcs need to generate
> > different exceptions on their own? I.e., do they need to do injections
> > themselves, or should the calling function possibly inject an exception
> > on error?  
> 
> There are some specificities.

Ok, that should probably be done in the subhandlers?

(I hope I don't muddy the waters too much; but basically, I'm poking
around with a stick in the dark :)


* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 13:47                   ` Pierre Morel
@ 2019-02-28 14:07                     ` Halil Pasic
  2019-02-28 14:13                       ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2019-02-28 14:07 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson, cohuck,
	linux-kernel, linux-s390, kvm, frankja, david, schwidefsky,
	heiko.carstens, freude, mimu

On Thu, 28 Feb 2019 14:47:35 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 28/02/2019 14:44, Christian Borntraeger wrote:
> > 
> > 
> > On 28.02.2019 14:23, Pierre Morel wrote:
> >> On 28/02/2019 10:42, Christian Borntraeger wrote:
> >>>
> >>>
> >>> On 27.02.2019 19:00, Tony Krowiak wrote:
> >>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
> >>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
> >>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
> >>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
> >>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
> >>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
> >>>>>>>>> the case the AQIC facility is enabled in the guest.
> >>>>>>>>>
> >>>>>>>>> We add a callback inside the KVM arch structure for s390 for
> >>>>>>>>> a VFIO driver to handle a specific response to the PQAP
> >>>>>>>>> instruction with the AQIC command.
> >>>>>>>>>
> >>>>>>>>> We inject the correct exceptions from inside KVM for the case the
> >>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
> >>>>>>>>> is not loaded.
> >>>>>>>>>
> >>>>>>>>> If the callback has been setup we call it.
> >>>>>>>>> If not we setup an answer considering that no queue is available
> >>>>>>>>> for the guest when no callback has been setup.
> >>>>>>>>>
> >>>>>>>>> We do consider the responsability of the driver to always initialize
> >>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
> >>>>>>>>> a guest.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> >>>>>>>
> >>>>>>> ...snip...
> >>>>>>>
> >>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
> >>>>>>>>>         }
> >>>>>>>>>     }
> >>>>>>>>> +/*
> >>>>>>>>> + * handle_pqap: Handling pqap interception
> >>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
> >>>>>>>>> + *
> >>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
> >>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
> >>>>>>>>> + *
> >>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
> >>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
> >>>>>>>>> + * SIE block.
> >>>>>>>>> + *
> >>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
> >>>>>>>>> + *
> >>>>>>>>> + * If no callback available, the queues are not available, return this to
> >>>>>>>>> + * the caller.
> >>>>>>>>> + * Else return the value returned by the callback.
> >>>>>>>>> + */
> >>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
> >>>>>>>>> +{
> >>>>>>>>> +    uint8_t fc;
> >>>>>>>>> +    struct ap_queue_status status = {};
> >>>>>>>>> +
> >>>>>>>>> +    /* Verify that the AP instruction are available */
> >>>>>>>>> +    if (!ap_instructions_available())
> >>>>>>>>> +        return -EOPNOTSUPP;
> >>>>>>>>
> >>>>>>>> How can the guest even execute an AP instruction if the AP instructions
> >>>>>>>> are not available? If the AP instructions are not available on the host,
> >>>>>>>> they will not be available on the guest (i.e., CPU model feature
> >>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
> >>>>>>>> here given QEMU may not be the only client.
> >>>>>>>>
> >>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
> >>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> >>>>>>>>> +        return -EOPNOTSUPP;
> >>>>>>>>> +    /* Verify that the function code is AQIC */
> >>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
> >>>>>>>>> +    if (fc != 0x03)
> >>>>>>>>> +        return -EOPNOTSUPP;
> >>>>>>>>
> >>>>>>>> You must have missed my suggestion to move this to the
> >>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
> >>>>>>>
> >>>>>>> Please consider what happen if the vfio_ap module is not loaded.
> >>>>>>
> >>>>>> I have considered it and even verified my expectations empirically. If
> >>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
> >>>>>
> >>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
> >>>>
> >>>> What does that have to do with loading the vfio_ap module? Without the
> >>>> vfio_ap module, there will be no AP devices for the guest. What are you
> >>>> suggesting here?
> >>>>
> >>>>>
> >>>>>> If you don't have an mdev device, you will not be able to
> >>>>>> start a guest with a vfio-ap device. If you start a guest without a
> >>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
> >>>>>> no AP devices attached to the guest. Without any AP devices attached,
> >>>>>> the PQAP(AQIC) instructions will not ever get executed.
> >>>>>
> >>>>> This is not right. The instruction will be executed, eventually, after decoding.
> >>>>
> >>>> Please explain why the PQAP(AQIC) instruction will be executed on a
> >>>> guest without any devices? Point me to the code in the AP bus where
> >>>> PQAP(AQIC) is executed without a queue?
> >>>
> >>> The host must be prepared to handle malicous and broken guests. So if
> >>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
> >>> exception)
> >>>
> >>>>
> >>>>>
> >>>>>> Even if for some
> >>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
> >>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
> >>>>>
> >>>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
> >>>>> - OPERATION EXCEPTION if the micro-code is not installed
> >>>>> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
> >>>>> - SPECIFICATION exception if the instruction do not respect the usage specification
> >>>>>
> >>>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
> >>>>>
> >>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
> >>>>>
> >>>>> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
> >>>>>
> >>>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
> >>>>
> >>>> Okay, let's go back to the genesis of this discussion; namely, my
> >>>> suggestion about moving the fc == 0x03 check into the hook code. If
> >>>> the vfio_ap module is not loaded, there will be no hook code. In that
> >>>> case, the check for the hook will fail and ultimately response code
> >>>> 0x01 will be set in the status word (which may not be the right thing
> >>>> to do?). You have not stated a single good reason for keeping this
> >>>> check, but I'm done with this silly argument. It certainly doesn't
> >>>> hurt anything.
> >>>
> >>> The instruction handler must handle the basic checks for the
> >>> instruction itself as outlined above.
> >>>
> >>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
> >>> The we should pass along everything to QEMU, but this is already done with the
> >>> ECA_APIE check, correct?
> >>>
> >>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
> >>> in QEMU and we have enabled the AP instructions interpretion?
> >>> If yes then this has some implication:
> >>>
> >>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> >>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
> >>> right away with a specification exception. I do not want the hook to mess with
> >>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
> >>
> >>
> >> Currently the check test_kvm_facility(vcpu->kvm, 65) is done in the instruction handler, what do you mean here?
> > 
> > Found it. I think we should couple the check for facility 65 to fc==3. Otherwise both things are somewhat
> > disconnected when reviewing.
> > 
> 
> Right.
> In the next version I will go the way you proposed anyway and handle all 
> PQAP functions separatly (switch/dedicated functions).

Sorry, what did Christian propose? I've lost you. Christian's initial
analysis assumed, AFAIU, that we only have or care for fc == 3.

BTW have you seen my response to Christian's analysis and the changes I
proposed?

Regards,
Halil

> With this, I will have to split the checks to the right place.
> 
> Thanks for the comments.
> 
> Regards,
> Pierre
> 
> 



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 12:39               ` Halil Pasic
@ 2019-02-28 14:12                 ` Pierre Morel
  2019-02-28 16:51                   ` Halil Pasic
  2019-02-28 15:43                 ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 14:12 UTC (permalink / raw)
  To: Halil Pasic, Christian Borntraeger
  Cc: Tony Krowiak, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, david, schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 13:39, Halil Pasic wrote:
> On Thu, 28 Feb 2019 10:42:23 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>>
>>
>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>
>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>> instruction with the AQIC command.
>>>>>>>>
>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>> is not loaded.
>>>>>>>>
>>>>>>>> If the callback has been setup we call it.
>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>
>>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>> a guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>
>>>>>> ...snip...
>>>>>>
>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>        }
>>>>>>>>    }
>>>>>>>> +/*
>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>> + *
>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>> + *
>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>> + * SIE block.
>>>>>>>> + *
>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>> + *
>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>> + * the caller.
>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>> + */
>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>> +{
>>>>>>>> +    uint8_t fc;
>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>> +
>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>> here given QEMU may not be the only client.
>>>>>>>
>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>> +    if (fc != 0x03)
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> You must have missed my suggestion to move this to the
>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>
>>>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>>>
>>>>> I have considered it and even verified my expectations empirically. If
>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>
>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>
>>> What does that have to do with loading the vfio_ap module? Without the
>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>> suggesting here?
>>>
>>>>
>>>>> If you don't have an mdev device, you will not be able to
>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>
>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>
>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>> guest without any devices? Point me to the code in the AP bus where
>>> PQAP(AQIC) is executed without a queue?
>>
>> The host must be prepared to handle malicous and broken guests. So if
>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>> exception)
>>
> 
> Nod.
> 
>>>
>>>>
>>>>> Even if for some
>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>
>>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
>>>> - SPECIFICATION exception if the instruction do not respect the usage specification
>>>>
>>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>>>
>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>
>>>> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
>>>>
>>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
>>>
>>> Okay, let's go back to the genesis of this discussion; namely, my
>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>> case, the check for the hook will fail and ultimately response code
>>> 0x01 will be set in the status word (which may not be the right thing
>>> to do?). You have not stated a single good reason for keeping this
>>> check, but I'm done with this silly argument. It certainly doesn't
>>> hurt anything.
>>
>> The instruction handler must handle the basic checks for the
>> instruction itself as outlined above.
> 
> Nod.
> 
>>
>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
>> The we should pass along everything to QEMU, but this is already done with the
>> ECA_APIE check, correct?
> 
> Nod.
> 
>>
>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
>> in QEMU and we have enabled the AP instructions interpretion?
> 
> At least the intention is to not emulate. ECA_APIE is an effective
> control though...
> 
>> If yes then this has some implication:
>>
>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> 
> Not necessarily true. TAPQ can be intercepted as well (APFT depends
> IC.3). But for now we don't care about that.
> 
>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>> right away with a specification exception. I do not want the hook to mess with
>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
> 
> As far as I can tell he already does test_kvm_facility(vcpu->kvm, 65). I
> agree we need a spec exception if guest does not have facility 65, but
> does have ap instructions.
> 
>> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
>> hook. As long as we have only fc==3 this does not matter.
>>
> 
> I guess Tony's point is that we may have fc == 0 that is TAPQ in the
> APFT flavor. IMHO we don't need to care about that at the moment.
> 
>> Correct?
> 
> IMHO mostly.
> 
> I also think doing the facility checks in KVM is easier, and this is
> something we can change later if needed without any major trouble.
> 
> There are a couple of things I would do differently than Pierre does:
> 1) Do the PGM_PRIVILEGED_OP before the fc == 3 check.

The idea was not to modify the existing behavior for fc != 3.

However, Christian already proposed handling all function codes. With that
approach, this must be done as you say.

> 
> 2) Do the test_kvm_facility(vcpu->kvm, 65) check in the context of fc ==
> 3. I.e. decide if this hook is about pqap or just about pqap aqic and
> make the code convey that decision to its reader.
> 
> 3) I would most probably test if the queue is available by looking at the
> masks in CRYCB here. If not AP_RESPONSE_Q_NOT_AVAIL is what we need.

I do not agree with this: checking queue availability is typically the
responsibility of the part in charge of the virtualization, that is, the
vfio_ap driver.

> 
> 4) If we have APIE and queues authorized by the CRYCB (i.e. we have a
> vfio_ap module loaded and an mdev associated with the kvm) the callback
> not set (!(vcpu->kvm->arch.crypto.pqap_hook)) is a BUG!

I do not agree with this either. The maintainers ;) will not allow it.

> In that case
> lying that the queue is not available does not seem right. BTW this is
> something Pierre changed since the last version quietly (I can't recall
> a mention in the change log or somebody asking for this). If we want to
> be very pedantic about this bug scenario our best bet is probably
> response code 6.


RC 06 means "Invalid address of AP-queue notification byte".

So you must have been thinking of another code, or I do not understand at all
what you mean.
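
For reference, here is a small standalone sketch of the response codes
mentioned in this thread. It is not taken from the patch: the values 0x01
and 0x06 are the ones discussed above, while the SKETCH_* names are only
modelled on the kernel's AP_RESPONSE_* constants and are an assumption to
be checked against asm/ap.h.

```c
#include <stdint.h>

/*
 * Illustration only. The SKETCH_* names are hypothetical; only the
 * numeric values 0x01 and 0x06 are taken from this discussion.
 */
enum sketch_ap_response {
	SKETCH_AP_RESPONSE_NORMAL          = 0x00,
	SKETCH_AP_RESPONSE_Q_NOT_AVAIL     = 0x01,
	SKETCH_AP_RESPONSE_INVALID_ADDRESS = 0x06,
};

/* Map a response code to the wording used in this thread. */
static const char *sketch_ap_response_str(uint8_t rc)
{
	switch (rc) {
	case SKETCH_AP_RESPONSE_NORMAL:
		return "normal completion";
	case SKETCH_AP_RESPONSE_Q_NOT_AVAIL:
		return "AP-queue number not valid";
	case SKETCH_AP_RESPONSE_INVALID_ADDRESS:
		return "invalid address of AP-queue notification byte";
	default:
		return "not covered by this sketch";
	}
}
```

With this mapping, RC 06 describes a bad NIB address, which is why it does
not look like a natural fit for a missing callback.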

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 14:07                     ` Halil Pasic
@ 2019-02-28 14:13                       ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 14:13 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson, cohuck,
	linux-kernel, linux-s390, kvm, frankja, david, schwidefsky,
	heiko.carstens, freude, mimu

On 28/02/2019 15:07, Halil Pasic wrote:
> On Thu, 28 Feb 2019 14:47:35 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 28/02/2019 14:44, Christian Borntraeger wrote:
>>>
>>>
>>> On 28.02.2019 14:23, Pierre Morel wrote:
>>>> On 28/02/2019 10:42, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>>>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>>>>
>>>>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>>>>> instruction with the AQIC command.
>>>>>>>>>>>
>>>>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>>>>> is not loaded.
>>>>>>>>>>>
>>>>>>>>>>> If the callback has been setup we call it.
>>>>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>>>>
>>>>>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>>>>> a guest.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>>>>
>>>>>>>>> ...snip...
>>>>>>>>>
>>>>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>>>>          }
>>>>>>>>>>>      }
>>>>>>>>>>> +/*
>>>>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>>>>> + *
>>>>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>>>>> + *
>>>>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>>>>> + * SIE block.
>>>>>>>>>>> + *
>>>>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>>>>> + *
>>>>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>>>>> + * the caller.
>>>>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>>>>> + */
>>>>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>>>>> +{
>>>>>>>>>>> +    uint8_t fc;
>>>>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>>>>> +
>>>>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>>>
>>>>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>>>>> here given QEMU may not be the only client.
>>>>>>>>>>
>>>>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>>>>> +    if (fc != 0x03)
>>>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>>>
>>>>>>>>>> You must have missed my suggestion to move this to the
>>>>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>>>>
>>>>>>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>>>>>>
>>>>>>>> I have considered it and even verified my expectations empirically. If
>>>>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>>>>
>>>>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>>>>
>>>>>> What does that have to do with loading the vfio_ap module? Without the
>>>>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>>>>> suggesting here?
>>>>>>
>>>>>>>
>>>>>>>> If you don't have an mdev device, you will not be able to
>>>>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>>>>
>>>>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>>>>
>>>>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>>>>> guest without any devices? Point me to the code in the AP bus where
>>>>>> PQAP(AQIC) is executed without a queue?
>>>>>
>>>>> The host must be prepared to handle malicous and broken guests. So if
>>>>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>>>>> exception)
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Even if for some
>>>>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>>>>
>>>>>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>>>>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>>>>> - PRIVILEDGE OPERATION if the instruction is issued from userland (programm state)
>>>>>>> - SPECIFICATION exception if the instruction do not respect the usage specification
>>>>>>>
>>>>>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>>>>>>
>>>>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>>>>
>>>>>>> KVM do for us all the decode steps I mention here above, if there is or not a pqap hook to be call to simulate the QP queue access.
>>>>>>>
>>>>>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
>>>>>>
>>>>>> Okay, let's go back to the genesis of this discussion; namely, my
>>>>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>>>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>>>>> case, the check for the hook will fail and ultimately response code
>>>>>> 0x01 will be set in the status word (which may not be the right thing
>>>>>> to do?). You have not stated a single good reason for keeping this
>>>>>> check, but I'm done with this silly argument. It certainly doesn't
>>>>>> hurt anything.
>>>>>
>>>>> The instruction handler must handle the basic checks for the
>>>>> instruction itself as outlined above.
>>>>>
>>>>> Do we want to allow QEMU to fully emulate everything (the  ECA_APIE case being off)?
>>>>> The we should pass along everything to QEMU, but this is already done with the
>>>>> ECA_APIE check, correct?
>>>>>
>>>>> Do we agree that when we are beyond the ECA_APIE check, that we do not emulate
>>>>> in QEMU and we have enabled the AP instructions interpretion?
>>>>> If yes then this has some implication:
>>>>>
>>>>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>>>>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>>>>> right away with a specification exception. I do not want the hook to mess with
>>>>> the kvm cpu model. @Pierre would be good to actually check test_kvm_facility(vcpu->kvm, 65))
>>>>
>>>>
>>>> Currently the check test_kvm_facility(vcpu->kvm, 65) is done in the instruction handler, what do you mean here?
>>>
>>> Found it. I think we should couple the check for 64 to fc==3. Otherwise both things are somewhat
>>> disconnected when reviewing.
>>>
>>
>> Right.
>> In the next version I will go the way you proposed anyway and handle all
>> PQAP functions separatly (switch/dedicated functions).
> 
> Sorry, what did Christian propose? I've lost you. Christian's initial
> analysis assumed, AFAIU, that we only have or care for fc == 3.
> 
> BTW, have you seen my response to Christian's analysis and the changes I
> proposed?

Yes, just pushed the send button. :)

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 13:52                     ` Cornelia Huck
@ 2019-02-28 14:14                       ` Pierre Morel
  2019-03-01 12:03                         ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-02-28 14:14 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson,
	linux-kernel, linux-s390, kvm, frankja, pasic, david,
	schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 14:52, Cornelia Huck wrote:
> On Thu, 28 Feb 2019 14:16:09 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 28/02/2019 12:22, Cornelia Huck wrote:
> 
>>> So, to summarize, the function should do:
>>> - Is userspace supposed to emulate everything (!ECA_APIE)? Return
>>>     -EOPNOTSUPP to hand control to it.
>>> - We are now interpreting the instruction in KVM. Do common checks
>>>     (PSTATE etc.) and inject exceptions, if needed.
>>> - Now look at the fc; if there's a handler for it, call that; if not
>>>     (case does not attempt to call a specific handler, or no handler
>>>     registered), inject a specification exception. (Do we want pre-checks
>>>     like for facility 65 here, or in the handler?)
>>>
>>> That response code 0x01 thingy probably needs to go into the specific
>>> handler function, if anywhere (don't know the semantics, sorry).
>>
>> What do you mean with specific handler function?
>>
>> If you mean a switch around the FC with static function's call, I agree,
>> if you mean a jump into a hook I do not agree.
> 
> Ah, ok; so each case (that we want to handle) should call into a
> subhandler that does
> {
> 	(... check things like facilities ...)
> 	if (!specific_hook)
> 		inject_specif_excp_and_return();
> 	ret = specific_hook();
> 	if (ret)
> 		set_resp_code_0x01(); // or in specific_hook()?
> }
> 
> ?

Yes, something in this direction.
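
To make the shape concrete, here is a minimal, self-contained sketch that
follows the pseudocode above. It is not the patch and not kernel code:
struct fake_vcpu, enum pqap_result and all sketch_* / handle_*_sketch names
are stand-ins invented for illustration, and injecting a specification
exception when no hook is registered is just what the pseudocode suggests,
not a settled decision.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical outcomes; real code would inject exceptions or return errnos. */
enum pqap_result {
	PQAP_TO_USERSPACE,	/* models returning -EOPNOTSUPP */
	PQAP_PRIVILEGED_OP,	/* models a privileged-operation exception */
	PQAP_SPECIFICATION,	/* models a specification exception */
	PQAP_HANDLED,		/* the hook ran, response code is set */
};

/* Hypothetical stand-in for the vcpu state the handler looks at. */
struct fake_vcpu {
	int apie;		/* models ECA_APIE */
	int problem_state;	/* models the PSW problem-state bit */
	int has_facility_65;	/* models test_kvm_facility(kvm, 65) */
	uint64_t gpr0;		/* general register 0 */
	uint8_t response_code;	/* models the AP queue status */
	int (*aqic_hook)(struct fake_vcpu *vcpu);	/* NULL: no driver */
};

/* Example hook that always fails, as if the queue were unknown. */
static int sketch_hook_no_queue(struct fake_vcpu *vcpu)
{
	(void)vcpu;
	return -1;
}

/* Subhandler for fc == 3: facility check, then the optional hook. */
static enum pqap_result handle_aqic_sketch(struct fake_vcpu *vcpu)
{
	if (!vcpu->has_facility_65)
		return PQAP_SPECIFICATION;
	if (!vcpu->aqic_hook)
		return PQAP_SPECIFICATION;
	if (vcpu->aqic_hook(vcpu))
		vcpu->response_code = 0x01;
	return PQAP_HANDLED;
}

/* Common checks first, then a switch over the function code. */
static enum pqap_result handle_pqap_sketch(struct fake_vcpu *vcpu)
{
	uint8_t fc;

	if (!vcpu->apie)
		return PQAP_TO_USERSPACE;
	if (vcpu->problem_state)
		return PQAP_PRIVILEGED_OP;

	fc = (vcpu->gpr0 >> 24) & 0xff;
	switch (fc) {
	case 0x03:
		return handle_aqic_sketch(vcpu);
	default:
		return PQAP_SPECIFICATION;
	}
}
```

The point of the layout is that the facility 65 check sits next to the
fc == 3 case, so the two are not disconnected when reviewing.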

>   
>>>
>>> Question: Will the handlers for the individual fcs need to generate
>>> different exceptions on their own? I.e., do they need to do injections
>>> themselves, or should the calling function possibly inject an exception
>>> on error?
>>
>> There are some specificities.
> 
> Ok, should probably done in the subhandlers?
> 
> (I hope I don't muddy the waters too much; but basically, I'm poking
> around with a stick in the dark :)
> 

No problem, it is OK.
My first idea was to make only the changes associated with PQAP/AQIC.
We should already have done this for all PQAP functions, so it is decided
that we will do it now, as Christian proposed.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control
  2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (6 preceding siblings ...)
  2019-02-22 15:30 ` [PATCH v4 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest Pierre Morel
@ 2019-02-28 15:08 ` Halil Pasic
  2019-03-01  9:40   ` Pierre Morel
  7 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2019-02-28 15:08 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On Fri, 22 Feb 2019 16:29:53 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> This patch series implements PQAP/AQIC interception in KVM.
> 
> To implement this we need to add a new structure, vfio_ap_queue, to be
> able to retrieve the mediated device associated with a queue and specific
> values needed to register/unregister the interrupt structures:
>  - APQN: to be able to issue the commands and search for queue structures
>  - NIB : to unpin the NIB on clear IRQ
>  - ISC : to unregister with the GIB interface
>  - MATRIX: a pointer to the matrix mediated device
>  - LIST: the list_head to handle the vfio_queue life cycle
> 
> Having this structure and the list management greatly eases the handling
> of the AP queues and reduces the LOC needed in the vfio_ap driver by
> more than 150 lines in comparison with the previous version.
> 
> 
> 0) Queues life cycle
> 
> vfio_ap_queues are created on probe
> 
> We define one bucket on the matrix device to store the free vfio_ap_queues,
> the queues not assigned to any matrix mediated device.
> 
> We define one bucket on each matrix mediated device to hold the
> vfio_ap_queues belonging to it.
> 
> vfio_ap_queues are deleted on remove
> 
> This makes the search for a queue easy and the detection of assignment
> incoherency obvious (the queue is not available) and simplifies assignment.
> 
> 
> 1) Phase 1, probe and remove from vfio_ap_queue
> 
> The vfio_ap_queue structures are dynamically allocated and setup
> when a queue is probed by the ap_vfio_driver.
> The vfio_ap_queue is linked to the ap_queue device as the driver data.
> 
> The new vfio_ap_queue is put on a free_list belonging to the
> matrix device.
> 
> The vfio_ap_queues are freed during remove.
> 
> 
> 2) Phase 2, assignment of vfio_ap_queue to a mediated device
> 
> When an APID is assigned we look for APQIs already assigned to
> the matrix mediated device and associate all the queues with the
> APQN = (APID,APQI) to the mediated device by adding them to
> the mediated device queue list.
> We do the same when an APQI is assigned.
> 
> If any queue with a matching APQN can not be found on the matrix
> device free list it means it is already associated to another matrix
> mediated device and no queue is added to the matrix mediated device.
> 
> 3) Phase 3, starting the guest
> 
> When the VFIO device is opened the PQAP callback and a pointer to
> the matrix mediated device are set inside KVM during the open callback.
> 
> When the device is closed or if a queue is removed, the vfio_ap_queue is
> dissociated from the mediated device.
> 
> 
> 4) Phase 4, intercepting the PQAP/AQIC instruction
> 
> On interception of the PQAP/AQIC instruction, the interception code
> makes sure the pqap_hook is initialized and allowed to be called
> and calls it.
> Otherwise it reports the usual -EOPNOTSUPP return code to let
> QEMU handle the fault.
> 
> The pqap callback searches for the queue associated with the APQN
> stored in register 0, setting the code to "illegal APQN"
> if the vfio_ap_queue cannot be found.
> 
> Depending on the "i" bit of register 1, the pqap callback
> sets up or clears the interruption by calling the host format PQAP/AQIC
> instruction.
> When setting up the interruption it uses the NIB and the guest ISC
> provided by the guest and the host ISC provided by the registration
> to the GIB code, pins the NIB and also stores the ISC and NIB inside
> the vfio_ap_queue structure.
> When clearing the interrupt it retrieves the host ISC to unregister
> with the GIB code and unpins the NIB.
> 
> We take care when enabling GISA that the guest may have issued a
> reset and will not need to disable the interruptions before
> re-enabling interruptions.

Please let us know what guarantees that we will disable the
interruptions we previously enabled using AQIC (and generally facilitate
proper cleanup) *before* kvm_s390_gisa_destroy() makes the GISA, and
with it the IPM, go away!

Please note that IMHO this needs to be guaranteed by the kernel
regardless of what userspace (QEMU) or the guest does.

(I've asked this question before during our internal review, but I could
not find the answer, if there was one, after going through my mails.)
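
To illustrate the ordering I am asking about, here is a minimal model in
plain C. It is only a sketch under assumptions: struct sketch_queue,
struct sketch_vm and both functions are hypothetical and do not correspond
to the vfio_ap or KVM data structures; the comment marks where the real
per-queue cleanup would have to happen.

```c
#include <stddef.h>

/* Hypothetical model of a queue whose interrupt was enabled via AQIC. */
struct sketch_queue {
	int irq_enabled;
	struct sketch_queue *next;
};

/* Hypothetical model of a guest: its queues and the GISA lifetime. */
struct sketch_vm {
	struct sketch_queue *queues;
	int gisa_alive;
};

/* Disable every interrupt that is still enabled; return how many. */
static int sketch_disable_all_irqs(struct sketch_vm *vm)
{
	struct sketch_queue *q;
	int disabled = 0;

	for (q = vm->queues; q; q = q->next) {
		if (!q->irq_enabled)
			continue;
		/* Real code would disable via AQIC, unpin the NIB, drop the ISC. */
		q->irq_enabled = 0;
		disabled++;
	}
	return disabled;
}

/*
 * Teardown in the order asked for: interrupts are disabled first,
 * and only then does the GISA go away, whatever the guest did.
 */
static int sketch_destroy_vm(struct sketch_vm *vm)
{
	int disabled = sketch_disable_all_irqs(vm);

	vm->gisa_alive = 0;
	return disabled;
}
```

The property I care about is that the kernel enforces this order by
itself, without relying on QEMU or the guest to clean up first.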

Regards,
Halil



* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28  9:42             ` Christian Borntraeger
                                 ` (2 preceding siblings ...)
  2019-02-28 13:23               ` Pierre Morel
@ 2019-02-28 15:35               ` Tony Krowiak
  2019-03-01  8:42                 ` Christian Borntraeger
  3 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-02-28 15:35 UTC (permalink / raw)
  To: Christian Borntraeger, pmorel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/28/19 4:42 AM, Christian Borntraeger wrote:
> 
> 
> On 27.02.2019 19:00, Tony Krowiak wrote:
>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>
>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>> instruction with the AQIC command.
>>>>>>>
>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>> is not loaded.
>>>>>>>
>>>>>>> If the callback has been setup we call it.
>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>> for the guest when no callback has been setup.
>>>>>>>
>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>> a guest.
>>>>>>>
>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>
>>>>> ...snip...
>>>>>
>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>        }
>>>>>>>    }
>>>>>>> +/*
>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>> + *
>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>> + *
>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>> + * SIE block.
>>>>>>> + *
>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>> + *
>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>> + * the caller.
>>>>>>> + * Else return the value returned by the callback.
>>>>>>> + */
>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>> +{
>>>>>>> +    uint8_t fc;
>>>>>>> +    struct ap_queue_status status = {};
>>>>>>> +
>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>> +    if (!ap_instructions_available())
>>>>>>> +        return -EOPNOTSUPP;
>>>>>>
>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>> here given QEMU may not be the only client.
>>>>>>
>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>> +        return -EOPNOTSUPP;
>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>> +    if (fc != 0x03)
>>>>>>> +        return -EOPNOTSUPP;
>>>>>>
>>>>>> You must have missed my suggestion to move this to the
>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>
>>>>> Please consider what happen if the vfio_ap module is not loaded.
>>>>
>>>> I have considered it and even verified my expectations empirically. If
>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>
>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>
>> What does that have to do with loading the vfio_ap module? Without the
>> vfio_ap module, there will be no AP devices for the guest. What are you
>> suggesting here?
>>
>>>
>>>> If you don't have an mdev device, you will not be able to
>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>
>>> This is not right. The instruction will be executed, eventually, after decoding.
>>
>> Please explain why the PQAP(AQIC) instruction will be executed on a
>> guest without any devices? Point me to the code in the AP bus where
>> PQAP(AQIC) is executed without a queue?
> 
> The host must be prepared to handle malicious and broken guests. So if
> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
> exception)

I agree, but the context of this discussion is whether it is
more appropriate to check fc == 0x03 in this function or the pqap
hook. If there is no vfio_ap module, which Pierre asked me to consider,
then there will be no hook initialized. Again, nothing Pierre has
stated has convinced me that the fc check belongs here, although there
is no harm in doing so. In fact, a malicious guest can issue PQAP(AQIC)
with fc=0x03, so none of the arguments above makes sense in this
context.

> 
>>
>>>
>>>> Even if for some
>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>
>>> No, before accessing the AP-queue the instruction will be decoded and, depending on the installed micro-code, it will fail with
>>> - OPERATION EXCEPTION if the micro-code is not installed
>>> - PRIVILEGED OPERATION if the instruction is issued from userland (problem state)
>>> - SPECIFICATION exception if the instruction does not respect the usage specification
>>>
>>> Then it will be interpreted by the microcode and access the queue, and only then will it fail with RC 0x01, AP queue not valid.
>>>
>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>
>>> KVM does all the decode steps I mentioned above for us, whether or not there is a pqap hook to be called to simulate the AP queue access.
>>>
>>> That done, the AP queue virtualisation can be called; this is done by calling the hook.
>>
>> Okay, let's go back to the genesis of this discussion; namely, my
>> suggestion about moving the fc == 0x03 check into the hook code. If
>> the vfio_ap module is not loaded, there will be no hook code. In that
>> case, the check for the hook will fail and ultimately response code
>> 0x01 will be set in the status word (which may not be the right thing
>> to do?). You have not stated a single good reason for keeping this
>> check, but I'm done with this silly argument. It certainly doesn't
>> hurt anything.
> 
> The instruction handler must handle the basic checks for the
> instruction itself as outlined above.

The pqap hook IS the instruction handler. Everything up to the point of
calling the hook is validation code. Since the pqap hook is the entity
that actually handles the instruction, it seems logical to me that it
would be the place to parse the sub-function (i.e., fc) of the
PQAP instruction. That is the essence of my argument. I've stated
several sound reasons for asking for the change. Having said that, if
you all feel strongly that it belongs here, no harm done. As the Beatles
once said, let it be.

> 
> Do we want to allow QEMU to fully emulate everything (the ECA_APIE case being off)?
> Then we should pass along everything to QEMU, but this is already done with the
> ECA_APIE check, correct?
> 
> Do we agree that when we are beyond the ECA_APIE check, we do not emulate
> in QEMU and we have enabled the AP instructions interpretation?
> If yes then this has some implications:
> 
> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
> right away with a specification exception. I do not want the hook to mess with
> the kvm cpu model. @Pierre it would be good to actually check test_kvm_facility(vcpu->kvm, 65)
> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
> hook. As long as we have only fc==3 this does not matter.
> 
> Correct?

That all sounds correct assuming we will never do interception of other
AP instructions in KVM.


> 
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Message ID <342ffd56-b73a-b1f4-004d-de2c4aeef729@linux.ibm.com>
>>>>>> Message ID <e04f0c8b-2fd9-1846-334a-faa48e0e051e@linux.ibm.com>
>>>>>>
>>>>>> You previously stated:
>>>>>>
>>>>>>      "QEMU and KVM can both accept PQAP/AQIC even if the vfio_ap driver is
>>>>>>       not loaded. However now that the guest officially get the PQAP/AQIC
>>>>>>       instruction we need to handle the specification and operation
>>>>>>       exceptions inside KVM _before_ testing and even calling the driver
>>>>>>       hook.
>>>>>>
>>>>>>       I will make the changes in the next iteration."
>>>>>
>>>>> Still seems right to me, and is done in this patch.
>>>>> Isn't it?
>>>>
>>>> I don't think it's a matter of right and wrong, it's a matter of what
>>>> makes sense. IMHO, you want to make things easy if other PQAP functions
>>>> are intercepted at some time. In my opinion, there should be a switch
>>>> statement in the pqap hook code with a case statement for each PQAP
>>>> function supported by the hook. To plug in a new PQAP function handler,
>>>> it will be a simple matter of writing the handler function and calling
>>>> it from the case statement, like this:
>>>>
>>>> static int handle_pqap(struct kvm_vcpu *vcpu)
>>>> {
>>>>       int ret;
>>>>       uint8_t fc;
>>>>
>>>>       fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>
>>>>       switch (fc) {
>>>>       case 0x03:
>>>>           ret = handle_pqap_aqic(vcpu);
>>>>           break;
>>>>       default:
>>>>           ret = -EOPNOTSUPP;
>>>>           break;
>>>>       }
>>>>
>>>>       return ret;
>>>> }
>>>>
>>>> That function belongs in the pqap hook. I see no reason whatsoever to
>>>> check the function code here. If there is no hook, then you will fall
>>>> through to the instruction below:
>>>>
>>>> status.response_code = 0x01;
>>>
>>> See answer above, what you are speaking about is the execution of the instruction, but there can be exceptions during the decode of the instruction.
>>
>> What are you talking about, "decode of the instruction"?
> 
> I think Pierre is talking about the KVM instruction decoder.
> (see handle_instruction in intercept.c that will then call handle_b2
> and then call handle_pqap).

I think this debate has gone on far too long for such a minor
suggestion. If Pierre wants to keep the check for fc here, so be
it. I've wasted way too much time on it.

> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>> I don't know what any of the above has to do with checking FC=0x03? If
>>>>>> that check is moved to the pqap handler hook, it can just as well return
>>>>>> -EOPNOTSUPP. In fact, down below you do this:
>>>>>>
>>>>>>       return vcpu->kvm->arch.crypto.pqap_hook(vcpu);
>>>>>>
>>>>>> If the FC=0x03 check fails in the hook, it will return -EOPNOTSUPP just
>>>>>> like above. None of this is critical, but the parsing of the register
>>>>>> values for the PQAP(AQIC) function ought to be done in the code that
>>>>>> handles the PQAP instruction IMHO.
>>>>>
>>>>>
>>>>> This interception code must handle the PQAP/AQIC instruction when the hook is not used and should not modify the handling for other PQAP instructions.
>>>>> We cannot move into the hook anything that must always be done.
>>>>
>>>> What you are saying here makes no sense. If the check for the function
>>>> code is moved into the pqap hook and fc != 0x03, the result will be
>>>> exactly the same; the hook will return -EOPNOTSUPP.
>>>
>>> again please consider that the hook may not be initialized.
>>
>>
>> So what? Then maybe the code at the end of the function is wrong:
>>
>> /* PQAP/AQIC instructions are authorized but there is no queue */
>> status.response_code = 0x01;
>> memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>> return 0;
>>
>> Why does this make sense? What if the APQN is valid? You don't even know
>> whether it is or not. The only reason you would even reach this
>> instruction is if the pqap hook is not initialized. Wouldn't it make
>> more sense to just return -EOPNOTSUPP here? If there is no hook, then
>> it is not supported.
>>
>>>
>>> Regards,
>>> Pierre
>>>
>>
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 11:03               ` Christian Borntraeger
  2019-02-28 11:22                 ` Cornelia Huck
  2019-02-28 13:10                 ` Pierre Morel
@ 2019-02-28 15:36                 ` Tony Krowiak
  2 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-28 15:36 UTC (permalink / raw)
  To: Christian Borntraeger, pmorel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/28/19 6:03 AM, Christian Borntraeger wrote:
> 
> 
> On 28.02.2019 10:42, Christian Borntraeger wrote:
> [...]
>>> Okay, let's go back to the genesis of this discussion; namely, my
>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>> case, the check for the hook will fail and ultimately response code
>>> 0x01 will be set in the status word (which may not be the right thing
>>> to do?). You have not stated a single good reason for keeping this
>>> check, but I'm done with this silly argument. It certainly doesn't
>>> hurt anything.
>>
>> The instruction handler must handle the basic checks for the
>> instruction itself as outlined above.
>>
>> Do we want to allow QEMU to fully emulate everything (the ECA_APIE case being off)?
>> Then we should pass along everything to QEMU, but this is already done with the
>> ECA_APIE check, correct?
>>
>> Do we agree that when we are beyond the ECA_APIE check, we do not emulate
>> in QEMU and we have enabled the AP instructions interpretation?
>> If yes then this has some implications:
>>
>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>> right away with a specification exception. I do not want the hook to mess with
>> the kvm cpu model. @Pierre it would be good to actually check test_kvm_facility(vcpu->kvm, 65)
>> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
>> hook. As long as we have only fc==3 this does not matter.
>>
>> Correct?
> 
> Thinking more about that, I think we should inject a specification exception for all
> unknown FCs (i.e. FC != 0x3). That would also qualify for keeping it in the instruction handler.

Sure, let's do it.
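
To make sure we mean the same thing, here is a rough user-space model of
the check order as I understand the proposal. It is only a sketch, not
the kernel code: the enum, the function names and the boolean parameters
are made up for illustration, and I am assuming that "APIE off" keeps
meaning "leave the instruction to userspace" (the -EOPNOTSUPP path in the
patch). Only the fc extraction (gprs[0] >> 24 into a uint8_t) is taken
from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of the pre-hook checks; not kernel identifiers. */
enum pqap_outcome {
	PQAP_TO_USERSPACE,	/* APIE off: leave the instruction to userspace */
	PQAP_SPECIFICATION,	/* inject a specification exception */
	PQAP_CALL_HOOK,		/* basic checks passed, hand over to the hook */
};

/* Same extraction as in the patch: fc = gprs[0] >> 24, truncated to 8 bits. */
static uint8_t pqap_fc(uint64_t gpr0)
{
	return (uint8_t)(gpr0 >> 24);
}

/*
 * Model of the instruction handler up to the point where the hook
 * would be called. apie and facility_65 stand in for the ECA_APIE
 * test and test_kvm_facility(vcpu->kvm, 65).
 */
static enum pqap_outcome pqap_precheck(uint64_t gpr0, bool apie,
				       bool facility_65)
{
	if (!apie)
		return PQAP_TO_USERSPACE;

	/* Every function code other than AQIC (0x03) is rejected here. */
	if (pqap_fc(gpr0) != 0x03)
		return PQAP_SPECIFICATION;

	/* The facility 65 check is coupled to fc == 0x03. */
	if (!facility_65)
		return PQAP_SPECIFICATION;

	return PQAP_CALL_HOOK;
}
```

With this shape the hook only ever sees fc == 0x03 from a guest that has
facility 65, which is why the fc check ends up in the instruction
handler rather than in the hook.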

> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 12:39               ` Halil Pasic
  2019-02-28 14:12                 ` Pierre Morel
@ 2019-02-28 15:43                 ` Tony Krowiak
  1 sibling, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-28 15:43 UTC (permalink / raw)
  To: Halil Pasic, Christian Borntraeger
  Cc: pmorel, alex.williamson, cohuck, linux-kernel, linux-s390, kvm,
	frankja, david, schwidefsky, heiko.carstens, freude, mimu

On 2/28/19 7:39 AM, Halil Pasic wrote:
> On Thu, 28 Feb 2019 10:42:23 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>>
>>
>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>
>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>> instruction with the AQIC command.
>>>>>>>>
>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>> is not loaded.
>>>>>>>>
>>>>>>>> If the callback has been setup we call it.
>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>
>>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>> a guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>
>>>>>> ...snip...
>>>>>>
>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>        }
>>>>>>>>    }
>>>>>>>> +/*
>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>> + *
>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>> + *
>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>> + * SIE block.
>>>>>>>> + *
>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>> + *
>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>> + * the caller.
>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>> + */
>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>> +{
>>>>>>>> +    uint8_t fc;
>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>> +
>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>> here given QEMU may not be the only client.
>>>>>>>
>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>> +    if (fc != 0x03)
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> You must have missed my suggestion to move this to the
>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>
>>>>>> Please consider what happens if the vfio_ap module is not loaded.
>>>>>
>>>>> I have considered it and even verified my expectations empirically. If
>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>
>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>
>>> What does that have to do with loading the vfio_ap module? Without the
>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>> suggesting here?
>>>
>>>>
>>>>> If you don't have an mdev device, you will not be able to
>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>
>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>
>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>> guest without any devices? Point me to the code in the AP bus where
>>> PQAP(AQIC) is executed without a queue?
>>
>> The host must be prepared to handle malicious and broken guests. So if
>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>> exception)
>>
> 
> Nod.
> 
>>>
>>>>
>>>>> Even if for some
>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>
>>>> No, before accessing the AP-queue the instruction will be decoded and, depending on the installed micro-code, it will fail with
>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>> - PRIVILEGED OPERATION if the instruction is issued from userland (problem state)
>>>> - SPECIFICATION exception if the instruction does not respect the usage specification
>>>>
>>>> Then it will be interpreted by the microcode and access the queue, and only then will it fail with RC 0x01, AP queue not valid.
>>>>
>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>
>>>> KVM does all the decode steps I mentioned above for us, whether or not there is a pqap hook to be called to simulate the AP queue access.
>>>>
>>>> That done, the AP queue virtualisation can be called; this is done by calling the hook.
>>>
>>> Okay, let's go back to the genesis of this discussion; namely, my
>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>> case, the check for the hook will fail and ultimately response code
>>> 0x01 will be set in the status word (which may not be the right thing
>>> to do?). You have not stated a single good reason for keeping this
>>> check, but I'm done with this silly argument. It certainly doesn't
>>> hurt anything.
>>
>> The instruction handler must handle the basic checks for the
>> instruction itself as outlined above.
> 
> Nod.
> 
>>
>> Do we want to allow QEMU to fully emulate everything (the ECA_APIE case being off)?
>> Then we should pass along everything to QEMU, but this is already done with the
>> ECA_APIE check, correct?
> 
> Nod.
> 
>>
>> Do we agree that when we are beyond the ECA_APIE check, we do not emulate
>> in QEMU and we have enabled the AP instructions interpretation?
> 
> At least the intention is to not emulate. ECA_APIE is an effective
> control though...
> 
>> If yes then this has some implications:
>>
>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
> 
> Not necessarily true. TAPQ can be intercepted as well (APFT depends
> IC.3). But for now we don't care about that.
> 
>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>> right away with a specification exception. I do not want the hook to mess with
>> the kvm cpu model. @Pierre it would be good to actually check test_kvm_facility(vcpu->kvm, 65)
> 
> As far as I can tell he already does test_kvm_facility(vcpu->kvm, 65). I
> agree we need a spec exception if guest does not have facility 65, but
> does have ap instructions.
> 
>> 3. What shall we do when fc == 0x3? We can certainly do the check here OR in the
>> hook. As long as we have only fc==3 this does not matter.
>>
> 
> I guess Tony's point is that we may have fc == 0 that is TAPQ in the
> APFT flavor. IMHO we don't need to care about that at the moment.
> 
>> Correct?
> 
> IMHO mostly.
> 
> I also think doing the facility checks in kvm is easier, and I think this is
> something we can change later if needed without any major trouble.
> 
> There are a couple of things I would do differently than Pierre does:
> 1) Do the PGM_PRIVILEGED_OP before the fc == 3 check.
> 
> 2) Do the test_kvm_facility(vcpu->kvm, 65) check in the context of fc ==
> 3. I.e. decide if this hook is about pqap or just about pqap aqic and
> make the code convey that decision to its reader.
> 
> 3) I would most probably test if the queue is available by looking at the
> masks in CRYCB here. If not AP_RESPONSE_Q_NOT_AVAIL is what we need.
> 
> 4) If we have APIE and queues authorized by the CRYCB (i.e. we have a
> vfio_ap module loaded and an mdev associated with the kvm) the callback
> not set (!(vcpu->kvm->arch.crypto.pqap_hook)) is a BUG! In that case
> lying that the queue is not available does not seem right. BTW this is
> something Pierre changed since the last version quietly (I can't recall
> a mention in the change log or somebody asking for this). If we want to
> be very pedantic about this bug scenario our best bet is probably
> response code 6.
> 

Agreed
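
For point 3), a user-space sketch of what "looking at the masks" could
amount to. Everything below is illustrative: the struct, the helper names
and the SKETCH_ constants are hypothetical stand-ins, and I am assuming
256-bit adapter and domain masks with MSB-0 bit numbering and an APQN
built as (adapter << 8) | domain. The real code would use the CRYCB/APCB
definitions from the kernel, not these.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AP response codes discussed above. */
#define SKETCH_RESPONSE_NORMAL      0x00
#define SKETCH_RESPONSE_Q_NOT_AVAIL 0x01

/* Hypothetical, simplified view of the adapter and domain masks. */
struct sketch_apcb {
	uint8_t apm[32];	/* adapter mask, 256 bits */
	uint8_t aqm[32];	/* domain (queue) mask, 256 bits */
};

/* Test bit nr with MSB-0 numbering: bit 0 is the top bit of byte 0. */
static bool sketch_test_bit_msb0(const uint8_t *mask, unsigned int nr)
{
	return (mask[nr / 8] >> (7 - nr % 8)) & 1;
}

/*
 * Return Q_NOT_AVAIL unless both the adapter and the domain of the
 * APQN are authorized by the masks.
 */
static uint8_t sketch_queue_response(const struct sketch_apcb *apcb,
				     uint16_t apqn)
{
	unsigned int adapter = apqn >> 8;
	unsigned int domain = apqn & 0xff;

	if (!sketch_test_bit_msb0(apcb->apm, adapter) ||
	    !sketch_test_bit_msb0(apcb->aqm, domain))
		return SKETCH_RESPONSE_Q_NOT_AVAIL;

	return SKETCH_RESPONSE_NORMAL;
}
```

Whether such a test lives in KVM or in the vfio_ap hook is exactly the
open question; the sketch only shows that the test itself is cheap.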

> Regards,
> Halil
> 
> [..]
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 13:44                 ` Christian Borntraeger
  2019-02-28 13:47                   ` Pierre Morel
@ 2019-02-28 15:45                   ` Tony Krowiak
  1 sibling, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-02-28 15:45 UTC (permalink / raw)
  To: Christian Borntraeger, pmorel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/28/19 8:44 AM, Christian Borntraeger wrote:
> 
> 
> On 28.02.2019 14:23, Pierre Morel wrote:
>> On 28/02/2019 10:42, Christian Borntraeger wrote:
>>>
>>>
>>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>>
>>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>>> instruction with the AQIC command.
>>>>>>>>>
>>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>>> is not loaded.
>>>>>>>>>
>>>>>>>>> If the callback has been setup we call it.
>>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>>
>>>>>>>>> We do consider the responsability of the driver to always initialize
>>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>>> a guest.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>>
>>>>>>> ...snip...
>>>>>>>
>>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> +/*
>>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>>> + *
>>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>>> + *
>>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>>> + * SIE block.
>>>>>>>>> + *
>>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>>> + *
>>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>>> + * the caller.
>>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>>> + */
>>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>>> +{
>>>>>>>>> +    uint8_t fc;
>>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>>> +
>>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>
>>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>>> here given QEMU may not be the only client.
>>>>>>>>
>>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>>> +    if (fc != 0x03)
>>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>>
>>>>>>>> You must have missed my suggestion to move this to the
>>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>>
>>>>>>> Please consider what happens if the vfio_ap module is not loaded.
>>>>>>
>>>>>> I have considered it and even verified my expectations empirically. If
>>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>>
>>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>>
>>>> What does that have to do with loading the vfio_ap module? Without the
>>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>>> suggesting here?
>>>>
>>>>>
>>>>>> If you don't have an mdev device, you will not be able to
>>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>>
>>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>>
>>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>>> guest without any devices? Point me to the code in the AP bus where
>>>> PQAP(AQIC) is executed without a queue?
>>>
>>> The host must be prepared to handle malicious and broken guests. So if
>>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>>> exception)
>>>
>>>>
>>>>>
>>>>>> Even if for some
>>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>>
>>>>> No, before accessing the AP-queue the instruction will be decoded and, depending on the installed micro-code, it will fail with
>>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>>> - PRIVILEGED OPERATION if the instruction is issued from userland (problem state)
>>>>> - SPECIFICATION exception if the instruction does not respect the usage specification
>>>>>
>>>>> Then it will be interpreted by the microcode and access the queue, and only then will it fail with RC 0x01, AP queue not valid.
>>>>>
>>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>>
>>>>> KVM does all the decode steps I mentioned above for us, whether or not there is a pqap hook to be called to simulate the AP queue access.
>>>>>
>>>>> That done, the AP queue virtualisation can be called; this is done by calling the hook.
>>>>
>>>> Okay, let's go back to the genesis of this discussion; namely, my
>>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>>> case, the check for the hook will fail and ultimately response code
>>>> 0x01 will be set in the status word (which may not be the right thing
>>>> to do?). You have not stated a single good reason for keeping this
>>>> check, but I'm done with this silly argument. It certainly doesn't
>>>> hurt anything.
>>>
>>> The instruction handler must handle the basic checks for the
>>> instruction itself as outlined above.
>>>
>>> Do we want to allow QEMU to fully emulate everything (the ECA_APIE case being off)?
>>> Then we should pass along everything to QEMU, but this is already done with the
>>> ECA_APIE check, correct?
>>>
>>> Do we agree that when we are beyond the ECA_APIE check, we do not emulate
>>> in QEMU and we have enabled the AP instructions interpretation?
>>> If yes then this has some implications:
>>>
>>> 1. ECA is on and we should only get PQAP interception for specific FC (namely 3).
>>> 2. What we certainly should check is the facility bit of the guest (65) and reject fc==3
>>> right away with a specification exception. I do not want the hook to mess with
>>> the kvm cpu model. @Pierre it would be good to actually check test_kvm_facility(vcpu->kvm, 65)
>>
>>
>> Currently the check test_kvm_facility(vcpu->kvm, 65) is done in the instruction handler, what do you mean here?
> 
> Found it. I think we should couple the check for 64 to fc==3. Otherwise both things are somewhat
> disconnected when reviewing.

I think you meant facility bit 65.

> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 14:12                 ` Pierre Morel
@ 2019-02-28 16:51                   ` Halil Pasic
  2019-03-01 12:10                     ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2019-02-28 16:51 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson, cohuck,
	linux-kernel, linux-s390, kvm, frankja, david, schwidefsky,
	heiko.carstens, freude, mimu

On Thu, 28 Feb 2019 15:12:16 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 28/02/2019 13:39, Halil Pasic wrote:
> > On Thu, 28 Feb 2019 10:42:23 +0100
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
[..]
> >> Correct?
> > 
> > IMHO mostly.
> > 
> > I also think doing the facility checks in kvm is easier, and I think this is
> > something we can change later if needed without any major trouble.
> > 
> > There are a couple of things I would do differently than Pierre does:
> > 1) Do the PGM_PRIVILEGED_OP before the fc == 3 check.
> 
> Idea was not to modify existing behavior for fc != 3
> 
> Also Christian already proposed to handle all FC codes. So in this idea, 
> this must be done as you say.
> 
> > 
> > 2) Do the test_kvm_facility(vcpu->kvm, 65) check in the context of fc ==
> > 3. I.e. decide if this hook is about pqap or just about pqap aqic and
> > make the code convey that decision to its reader.
> > 
> > 3) I would most probably test if the queue is available by looking at the
> > masks in CRYCB here. If not AP_RESPONSE_Q_NOT_AVAIL is what we need.
> 
> This I do not agree with, it is typically the responsibility of the part 
> in charge of the virtualization to do this, also the vfio_driver.
> 

See 4) regarding the details. My guess is you disagree with checking
CRYCB explicitly but don't disagree with AP_RESPONSE_Q_NOT_AVAIL if APCB
does not authorize the queue. Your idea was to infer APCB all zero from
the fact that pqap_hook is NULL.

If my assumption is right, then yes we can have an implicit coarse check
here and a fine grained check in the client code (vfio_ap).

> > 
> > 4) If we have APIE and queues authorized by the CRYCB (i.e. we have a
> > vfio_ap module loaded and an mdev associated with the kvm) the callback
> > not set (!(vcpu->kvm->arch.crypto.pqap_hook)) is a BUG!
> 
> I do not agree with this either, the maintainers ;) will not allow this.

After an offline discussion we came to the conclusion that I did not
understand your code.

Your train of thought was:

!(vcpu->kvm->arch.crypto.pqap_hook) _implies_ APCB all zero (i.e. the
masks in the CRYCB).

This is *why* you respond with AP_RESPONSE_Q_NOT_AVAIL.

However if that is the case I would like that spelled out in a code
comment at least. Furthermore setting pqap_hook and APCB needs to happen
in the right sequence. That means client code (vfio_ap) may only set APCB
after the pqap_hook has been set. Currently we have a race there (as
you first do kvm_arch_crypto_set_masks and only then set
kvm->arch.crypto.pqap_hook). Furthermore I guess
kvm->arch.crypto.pqap_hook needs to be set with the kvm lock held, which
does not seem to be the case.
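
To spell out the sequence I have in mind, here is a small user-space
model. It is a sketch only: the struct and all sketch_ names are
hypothetical, a single boolean stands in for "APCB is not all zero", and
the locking is reduced to comments because the point here is just the
ordering. The teardown order is my assumption of what the mirror image
would have to look like.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the per-VM crypto state. */
struct sketch_crypto {
	int (*pqap_hook)(void);	/* NULL while no client is attached */
	bool apcb_nonzero;	/* true once queues are authorized */
};

/* Placeholder hook; a real hook would handle PQAP/AQIC. */
static int sketch_hook(void)
{
	return 0;
}

/* Attach: publish the hook first, authorize queues second. */
static void sketch_attach(struct sketch_crypto *c)
{
	/* a real implementation would take the kvm lock here */
	c->pqap_hook = sketch_hook;
	c->apcb_nonzero = true;
	/* ... and release it here */
}

/* Detach (assumed mirror image): revoke queues first, hook second. */
static void sketch_detach(struct sketch_crypto *c)
{
	c->apcb_nonzero = false;
	c->pqap_hook = NULL;
}

/* The property the handler relies on: no hook implies APCB all zero. */
static bool sketch_invariant_holds(const struct sketch_crypto *c)
{
	return c->pqap_hook != NULL || !c->apcb_nonzero;
}
```

With the order reversed in sketch_attach() there is a window in which
the invariant is false, and in that window answering
AP_RESPONSE_Q_NOT_AVAIL for a NULL hook would be wrong.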

> 
> > In that case
> > lying that the queue is not available does not seem right. BTW this
> > is something Pierre changed since the last version quietly (I can't
> > recall a mention in the change log or somebody asking for this). If
> > we want to be very pedantic about this bug scenario our best bet is
> > probably response code 6.
> 
> 
> RC 06 means "Invalid address of AP-queue notification byte"
> 
> So you must have been thinking of another code, or I do not understand at
> all what you mean.
> 

I did not assume you decided to ignore the possibility of a programming
error (which you at least technically did commit yourself) for what I
described as a BUG.

My train of thought was, if we are very pedantic we can make things work
with degraded functionality in that case. I.e. without AP interrupts.
For that we need to tell the guest something like: yes your queue is
fine and there and all that, but the AQIC interrupt setup did not work. And
RC 06 is the only RC I see being suitable to convey that.

Whether to detect and handle the case where the client code does not hold
up its end of the bargain, or to just ignore the possibility, is a design
decision. But at least you should spell out your expectations of the
client code.

Regards,
Halil


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-28  8:48     ` Pierre Morel
@ 2019-02-28 16:55       ` Halil Pasic
  2019-03-01  7:51         ` Christian Borntraeger
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2019-02-28 16:55 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Christian Borntraeger, alex.williamson, cohuck, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On Thu, 28 Feb 2019 09:48:39 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 28/02/2019 09:23, Christian Borntraeger wrote:
> > On 22.02.2019 16:29, Pierre Morel wrote:
> >> To be able to use the VFIO interface to facilitate the
> >> mediated device memory pinning/unpinning we need to register
> >> a notifier for IOMMU.
> > 
> > You might want to add that while we start to pin one guest page for the
> > interrupt indicator byte in the next patch, this is still ok with ballooning
> > as this page will never be used by the guest virtio-balloon driver. So the
> > pinned page will never be freed. And even if a broken guest does so, that would
> > not impact the host as the original page is still under the control of vfio.
> > 
> 
> Thanks, I'll do.
> 

I recall a comment in qemu that says vfio-ap does not pin any pages.
That one needs to be fixed up as well.

Regards,
Halil

[..]


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
  2019-02-26 18:23   ` Tony Krowiak
  2019-02-27 18:18   ` Tony Krowiak
@ 2019-02-28 20:20   ` Christian Borntraeger
  2019-03-01  9:35     ` Pierre Morel
  2019-03-04  1:57   ` Halil Pasic
  3 siblings, 1 reply; 79+ messages in thread
From: Christian Borntraeger @ 2019-02-28 20:20 UTC (permalink / raw)
  To: Pierre Morel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu



On 22.02.2019 16:29, Pierre Morel wrote:
> We register the AP PQAP instruction hook during the open
> of the mediated device. And unregister it on release.
> 
> In the AP PQAP instruction hook, if we receive a demand to
> enable IRQs,
> - we retrieve the vfio_ap_queue based on the APQN we receive
>   in REG1,
> - we retrieve the page of the guest address, (NIB), from
>   register REG2
> - we use the mediated device and the VFIO pinning infrastructure
>   to pin the page of the guest address,
> - we retrieve the pointer to KVM to register the guest ISC
>   and retrieve the host ISC
> - finally we activate GISA
> 
> If we receive a demand to disable IRQs,
> - we deactivate GISA
> - unregister from the GIB
> - unpin the NIB
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h      |   1 +
>  drivers/s390/crypto/ap_bus.h          |   1 +
>  drivers/s390/crypto/vfio_ap_ops.c     | 199 +++++++++++++++++++++++++++++++++-
>  drivers/s390/crypto/vfio_ap_private.h |   1 +
>  4 files changed, 199 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 49cc8b0..5f3bb8c 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
>  struct kvm_s390_crypto {
>  	struct kvm_s390_crypto_cb *crycb;
>  	int (*pqap_hook)(struct kvm_vcpu *vcpu);
> +	void *vfio_private;
>  	__u32 crycbd;
>  	__u8 aes_kw;
>  	__u8 dea_kw;
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index bfc66e4..323f2aa 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
>  #define AP_RESPONSE_BUSY		0x05
>  #define AP_RESPONSE_INVALID_ADDRESS	0x06
>  #define AP_RESPONSE_OTHERWISE_CHANGED	0x07
> +#define AP_RESPONSE_INVALID_GISA	0x08
>  #define AP_RESPONSE_Q_FULL		0x10
>  #define AP_RESPONSE_NO_PENDING_REPLY	0x10
>  #define AP_RESPONSE_INDEX_TOO_BIG	0x11
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1b5130a..0196065 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -43,7 +43,7 @@ struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
>  	return NULL;
>  }
>  
> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>  {
>  	struct ap_queue_status status;
>  	int retry = 20;
> @@ -75,6 +75,27 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>  	return -EBUSY;
>  }
>  
> +/**
> + * vfio_ap_free_irq:
> + * @q: The vfio_ap_queue
> + *
> + * Unpin the guest NIB
> + * Unregister the ISC from the GIB alert
> + * Clear the vfio_ap_queue intern fields
> + */
> +static void vfio_ap_free_irq(struct vfio_ap_queue *q)
> +{
> +	if (!q)
> +		return;
> +	if (q->g_pfn)
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
> +	if (q->isc)
> +		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);
> +	q->nib = 0;
> +	q->isc = 0;
> +	q->g_pfn = 0;
> +}

Pierre, unless there is some magic, I think we need to free the gisa stuff before kvm exit.

Imagine a malicious userspace that sets up everything fine, but then closes all kvm file
descriptors but not the vfio file descriptor. This might result in random access to the
memory that contained the gisa, potentially resulting in random memory overwrites.

The problem is that kvm_destroy_vm calls kvm_arch_destroy_vm(kvm) before it calls
kvm_destroy_devices(kvm). So we already free the gisa before we do the unregister call.

What about calling kvm_get_kvm/kvm_put_kvm from some of the callbacks in the right places?

Debugging random memory overwrites is a PITA, so we either should document why it cannot
happen (even with malicious userspace) or simply fix the refcounting.
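
The refcounting idea can be sketched with a userspace model. All names
here (vm_model, queue_model and the helpers) are hypothetical; in the
kernel the reference would be taken and dropped with kvm_get_kvm() and
kvm_put_kvm(), and where exactly to do that is still open. The sketch
takes the reference when interrupts are enabled for a queue, which is
only one possible placement. It shows that the teardown, and with it the
freeing of the gisa, cannot run while a queue still holds a reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the VM: a use count plus the gisa lifetime. */
struct vm_model {
	int users;		/* reference count */
	bool gisa_valid;	/* false once the arch teardown has run */
};

/* Hypothetical model of a queue with interrupts enabled. */
struct queue_model {
	struct vm_model *vm;	/* NULL when no interrupt is registered */
	bool isc_registered;
};

static void vm_get(struct vm_model *vm)
{
	vm->users++;
}

/* Drop a reference; the last put runs the teardown. Returns true then. */
static bool vm_put(struct vm_model *vm)
{
	if (--vm->users > 0)
		return false;
	vm->gisa_valid = false;	/* stands in for the arch destroy path */
	return true;
}

/* Enabling interrupts pins the VM so the gisa outlives the registration. */
static void queue_enable_irq(struct queue_model *q, struct vm_model *vm)
{
	vm_get(vm);
	q->vm = vm;
	q->isc_registered = true;
}

/*
 * Unregister first, drop the reference afterwards.
 * Returns true if the gisa was still valid at unregister time.
 */
static bool queue_free_irq(struct queue_model *q)
{
	bool gisa_was_valid;

	if (!q->vm)
		return true;
	gisa_was_valid = q->vm->gisa_valid;
	q->isc_registered = false;
	vm_put(q->vm);
	q->vm = NULL;
	return gisa_was_valid;
}
```

In this model, a userspace that closes all kvm file descriptors only
drops its own reference; the teardown is deferred until queue_free_irq()
has unregistered and dropped the last one.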


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier
  2019-02-28 16:55       ` Halil Pasic
@ 2019-03-01  7:51         ` Christian Borntraeger
  0 siblings, 0 replies; 79+ messages in thread
From: Christian Borntraeger @ 2019-03-01  7:51 UTC (permalink / raw)
  To: Halil Pasic, Pierre Morel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, david, schwidefsky, heiko.carstens, freude, mimu



On 28.02.2019 17:55, Halil Pasic wrote:
> On Thu, 28 Feb 2019 09:48:39 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 28/02/2019 09:23, Christian Borntraeger wrote:
>>> On 22.02.2019 16:29, Pierre Morel wrote:
>>>> To be able to use the VFIO interface to facilitate the
>>>> mediated device memory pinning/unpinning we need to register
>>>> a notifier for IOMMU.
>>>
>>> You might want to add that while we start to pin one guest page for the
>>> interrupt indicator byte in the next patch, this is still ok with ballooning
>>> as this page will never be used by the guest virtio-balloon driver. So the
>>> pinned page will never be freed. And even if a broken guest does so, that would
>>> not impact the host as the original page is still under the control of vfio.
>>>
>>
>> Thanks, I'll do.
>>
> 
> I recall a comment in qemu that says vfio-ap does not pin any pages.
> That one needs to be fixed up as well.


Yes, something along the lines that we do pin the interrupt indicator pages
but those do not change regularly and we stay in lockstep with the guest.
At the same time the guest driver will keep that page allocated so virtio-balloon
will not take it.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 15:35               ` Tony Krowiak
@ 2019-03-01  8:42                 ` Christian Borntraeger
  0 siblings, 0 replies; 79+ messages in thread
From: Christian Borntraeger @ 2019-03-01  8:42 UTC (permalink / raw)
  To: Tony Krowiak, pmorel
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu



On 28.02.2019 16:35, Tony Krowiak wrote:
> On 2/28/19 4:42 AM, Christian Borntraeger wrote:
>>
>>
>> On 27.02.2019 19:00, Tony Krowiak wrote:
>>> On 2/27/19 3:09 AM, Pierre Morel wrote:
>>>> On 26/02/2019 16:47, Tony Krowiak wrote:
>>>>> On 2/26/19 6:47 AM, Pierre Morel wrote:
>>>>>> On 25/02/2019 19:36, Tony Krowiak wrote:
>>>>>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>>>>>> We prepare the interception of the PQAP/AQIC instruction for
>>>>>>>> the case the AQIC facility is enabled in the guest.
>>>>>>>>
>>>>>>>> We add a callback inside the KVM arch structure for s390 for
>>>>>>>> a VFIO driver to handle a specific response to the PQAP
>>>>>>>> instruction with the AQIC command.
>>>>>>>>
>>>>>>>> We inject the correct exceptions from inside KVM for the case the
>>>>>>>> callback is not initialized, which happens when the vfio_ap driver
>>>>>>>> is not loaded.
>>>>>>>>
>>>>>>>> If the callback has been setup we call it.
>>>>>>>> If not we setup an answer considering that no queue is available
>>>>>>>> for the guest when no callback has been setup.
>>>>>>>>
>>>>>>>> We consider it the responsibility of the driver to always initialize
>>>>>>>> the PQAP callback if it defines queues by initializing the CRYCB for
>>>>>>>> a guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>>>
>>>>>> ...snip...
>>>>>>
>>>>>>>> @@ -592,6 +593,55 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>>>>>>>        }
>>>>>>>>    }
>>>>>>>> +/*
>>>>>>>> + * handle_pqap: Handling pqap interception
>>>>>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>>>>>> + *
>>>>>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>>>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>>>>>> + *
>>>>>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>>>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>>>>>> + * SIE block.
>>>>>>>> + *
>>>>>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>>>>>> + *
>>>>>>>> + * If no callback available, the queues are not available, return this to
>>>>>>>> + * the caller.
>>>>>>>> + * Else return the value returned by the callback.
>>>>>>>> + */
>>>>>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>>>>> +{
>>>>>>>> +    uint8_t fc;
>>>>>>>> +    struct ap_queue_status status = {};
>>>>>>>> +
>>>>>>>> +    /* Verify that the AP instruction are available */
>>>>>>>> +    if (!ap_instructions_available())
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> How can the guest even execute an AP instruction if the AP instructions
>>>>>>> are not available? If the AP instructions are not available on the host,
>>>>>>> they will not be available on the guest (i.e., CPU model feature
>>>>>>> S390_FEAT_AP will not be set). I suppose it doesn't hurt to check this
>>>>>>> here given QEMU may not be the only client.
>>>>>>>
>>>>>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>>>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>> +    /* Verify that the function code is AQIC */
>>>>>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>>>>>> +    if (fc != 0x03)
>>>>>>>> +        return -EOPNOTSUPP;
>>>>>>>
>>>>>>> You must have missed my suggestion to move this to the
>>>>>>> vcpu->kvm->arch.crypto.pqap_hook(vcpu) in the following responses:
>>>>>>
>>>>>> Please consider what happens if the vfio_ap module is not loaded.
>>>>>
>>>>> I have considered it and even verified my expectations empirically. If
>>>>> the vfio_ap module is not loaded, you will not be able to create an mdev device.
>>>>
>>>> OK, now please consider that another userland tool, not QEMU uses KVM.
>>>
>>> What does that have to do with loading the vfio_ap module? Without the
>>> vfio_ap module, there will be no AP devices for the guest. What are you
>>> suggesting here?
>>>
>>>>
>>>>> If you don't have an mdev device, you will not be able to
>>>>> start a guest with a vfio-ap device. If you start a guest without a
>>>>> vfio-ap device, but enable AP instructions for the guest, there will be
>>>>> no AP devices attached to the guest. Without any AP devices attached,
>>>>> the PQAP(AQIC) instructions will not ever get executed.
>>>>
>>>> This is not right. The instruction will be executed, eventually, after decoding.
>>>
>>> Please explain why the PQAP(AQIC) instruction will be executed on a
>>> guest without any devices? Point me to the code in the AP bus where
>>> PQAP(AQIC) is executed without a queue?
>>
>> The host must be prepared to handle malicious and broken guests. So if
>> a guest does PQAP, we must handle that gracefully (e.g. by injecting an
>> exception)
> 
> I agree, but the context of this discussion is whether it is
> more appropriate to check fc == 0x03 in this function or the pqap
> hook. If there is no vfio_ap module, which Pierre asked me to consider,
> then there will be no hook initialized. Again, nothing Pierre has
> stated has convinced me that the fc check belongs here, although there
> is no harm in doing so. In fact, a malicious guest can issue PQAP(AQIC)
> with fc=0x03, so none of the arguments above makes sense in this
> context.
> 
>>
>>>
>>>>
>>>>> Even if for some
>>>>> unknown reason the PQAP(AQIC) instruction is executed - for some unknown
>>>>> reason, it will fail with response code 0x01, AP-queue number not valid.
>>>>
>>>> No, before accessing the AP-queue the instruction will be decoded and depending on the installed micro-code it will fail with
>>>> - OPERATION EXCEPTION if the micro-code is not installed
>>>> - PRIVILEGED OPERATION if the instruction is issued from userland (problem state)
>>>> - SPECIFICATION exception if the instruction does not respect the usage specification
>>>>
>>>> then it will be interpreted by the microcode and access the queue and only then it will fail with RC 0x01, AP queue not valid.
>>>>
>>>> In the case of KVM, we intercept the instruction because it is issued by the guest and we set the AQIC facility on to force interception.
>>>>
>>>> KVM does all the decode steps I mention above for us, whether or not there is a pqap hook to be called to simulate the AP queue access.
>>>>
>>>> That done, the AP queue virtualisation can be called, this is done by calling the hook.
>>>
>>> Okay, let's go back to the genesis of this discussion; namely, my
>>> suggestion about moving the fc == 0x03 check into the hook code. If
>>> the vfio_ap module is not loaded, there will be no hook code. In that
>>> case, the check for the hook will fail and ultimately response code
>>> 0x01 will be set in the status word (which may not be the right thing
>>> to do?). You have not stated a single good reason for keeping this
>>> check, but I'm done with this silly argument. It certainly doesn't
>>> hurt anything.
>>
>> The instruction handler must handle the basic checks for the
>> instruction itself as outlined above.
> 
> The pqap hook IS the instruction handler.

In the kvm sense handle_pqap is the instruction handler.
But can we stop that discussion NOW?
There are things that can be done in both places. As long as the overall code
produces the right result it really does not matter where we do the checks.

This discussion distracts the attention from more important issues, for example
the question about how do we guarantee the de-registration of the interrupt
indicator byte when the kvm guest goes away.

>> I think Pierre is talking about the KVM instruction decoder.
>> (see handle_instruction in intercept.c that will then call handle_b2
>> and then call handle_pqap).
> 
> I think this debate has gone on far too long for such a minor
> suggestion. If Pierre wants to keep the check for fc here, so be
> it. I've wasted waaaaaay too much time on it.

Absolutely. Let's stop here and focus on the real things. I think we are
pretty close but we need to tackle some issues.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-28 20:20   ` Christian Borntraeger
@ 2019-03-01  9:35     ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-01  9:35 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

On 28/02/2019 21:20, Christian Borntraeger wrote:
> 
> 
> On 22.02.2019 16:29, Pierre Morel wrote:
>> We register the AP PQAP instruction hook during the open
>> of the mediated device. And unregister it on release.
>>
>> In the AP PQAP instruction hook, if we receive a demand to
>> enable IRQs,
>> - we retrieve the vfio_ap_queue based on the APQN we receive
>>    in REG1,
>> - we retrieve the page of the guest address, (NIB), from
>>    register REG2
>> - we use the mediated device and the VFIO pinning infrastructure
>>    to pin the page of the guest address,
>> - we retrieve the pointer to KVM to register the guest ISC
>>    and retrieve the host ISC
>> - finally we activate GISA
>>
>> If we receive a demand to disable IRQs,
>> - we deactivate GISA
>> - unregister from the GIB
>> - unpin the NIB
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h      |   1 +
>>   drivers/s390/crypto/ap_bus.h          |   1 +
>>   drivers/s390/crypto/vfio_ap_ops.c     | 199 +++++++++++++++++++++++++++++++++-
>>   drivers/s390/crypto/vfio_ap_private.h |   1 +
>>   4 files changed, 199 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 49cc8b0..5f3bb8c 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -720,6 +720,7 @@ struct kvm_s390_cpu_model {
>>   struct kvm_s390_crypto {
>>   	struct kvm_s390_crypto_cb *crycb;
>>   	int (*pqap_hook)(struct kvm_vcpu *vcpu);
>> +	void *vfio_private;
>>   	__u32 crycbd;
>>   	__u8 aes_kw;
>>   	__u8 dea_kw;
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index bfc66e4..323f2aa 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
>>   #define AP_RESPONSE_BUSY		0x05
>>   #define AP_RESPONSE_INVALID_ADDRESS	0x06
>>   #define AP_RESPONSE_OTHERWISE_CHANGED	0x07
>> +#define AP_RESPONSE_INVALID_GISA	0x08
>>   #define AP_RESPONSE_Q_FULL		0x10
>>   #define AP_RESPONSE_NO_PENDING_REPLY	0x10
>>   #define AP_RESPONSE_INDEX_TOO_BIG	0x11
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 1b5130a..0196065 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -43,7 +43,7 @@ struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
>>   	return NULL;
>>   }
>>   
>> -static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>   {
>>   	struct ap_queue_status status;
>>   	int retry = 20;
>> @@ -75,6 +75,27 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>   	return -EBUSY;
>>   }
>>   
>> +/**
>> + * vfio_ap_free_irq:
>> + * @q: The vfio_ap_queue
>> + *
>> + * Unpin the guest NIB
>> + * Unregister the ISC from the GIB alert
>> + * Clear the vfio_ap_queue intern fields
>> + */
>> +static void vfio_ap_free_irq(struct vfio_ap_queue *q)
>> +{
>> +	if (!q)
>> +		return;
>> +	if (q->g_pfn)
>> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
>> +	if (q->isc)
>> +		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);
>> +	q->nib = 0;
>> +	q->isc = 0;
>> +	q->g_pfn = 0;
>> +}
> 
> Pierre, unless there is some magic, I think we need to free the gisa stuff before kvm exit.
> 
> Imagine a malicious userspace that sets up everything fine, but then closes all kvm file
> descriptors but not the vfio file descriptor. This might result in random access to the
> memory that contained the gisa, potentially resulting in random memory overwrites.
> 
> The problem is that kvm_destroy_vm calls kvm_arch_destroy_vm(kvm) before it calls
> kvm_destroy_devices(kvm). So we already free the gisa before we do the unregister call.
> 
> What about calling kvm_get_kvm/kvm_put_kvm from some of the callbacks in the right places?
> 
> Debugging random memory overwrites is a PITA, so we either should document why it cannot
> happen (even with malicious userspace) or simply fix the refcounting.
> 

OK, understood.
I think we can do something simple by using kvm_get/kvm_put, as you 
suggested, in the vfio KVM notifier to ensure the order of the calls and 
also unregister the GISA at this moment.
I will investigate in this direction.

Thanks
Pierre




-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control
  2019-02-28 15:08 ` [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Halil Pasic
@ 2019-03-01  9:40   ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-01  9:40 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 28/02/2019 16:08, Halil Pasic wrote:
> On Fri, 22 Feb 2019 16:29:53 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> This patch implements PQAP/AQIC interception in KVM.
>>
>> To implement this we need to add a new structure, vfio_ap_queue, to be
>> able to retrieve the mediated device associated with a queue and specific
>> values needed to register/unregister the interrupt structures:
>>   - APQN: to be able to issue the commands and search for queue structures
>>   - NIB : to unpin the NIB on clear IRQ
>>   - ISC : to unregister with the GIB interface
>>   - MATRIX: a pointer to the matrix mediated device
>>   - LIST: the list_head to handle the vfio_queue life cycle
>>
>> Having this structure and the list management greatly eases the handling
>> of the AP queues and reduces the LOCs needed in the vfio_ap driver by
>> more than 150 lines in comparison with the previous version.
>>
>>
>> 0) Queues life cycle
>>
>> vfio_ap_queues are created on probe
>>
>> We define one bucket on the matrix device to store the free vfio_ap_queues,
>> i.e. the queues not assigned to any matrix mediated device.
>>
>> We define one bucket on each matrix mediated device to hold the
>> vfio_ap_queues belonging to it.
>>
>> vfio_ap_queues are deleted on remove
>>
>> This makes the search for a queue easy and the detection of assignment
>> inconsistencies obvious (the queue is not available) and simplifies assignment.
>>
>>
>> 1) Phase 1, probe and remove from vfio_ap_queue
>>
>> The vfio_ap_queue structures are dynamically allocated and setup
>> when a queue is probed by the ap_vfio_driver.
>> The vfio_ap_queue is linked to the ap_queue device as the driver data.
>>
>> The new vfio_ap_queue is put on a free_list belonging to the
>> matrix device.
>>
>> The vfio_ap_queues are freed during remove.
>>
>>
>> 2) Phase 2, assignment of vfio_ap_queue to a mediated device
>>
>> When an APID is assigned we look for the APQIs already assigned to
>> the matrix mediated device and associate all the queues with the
>> APQN = (APID,APQI) to the mediated device by adding them to
>> the mediated device queue list.
>> We do the same when an APQI is assigned.
>>
>> If any queue with a matching APQN cannot be found on the matrix
>> device free list it means it is already associated with another matrix
>> mediated device and no queue is added to the matrix mediated device.
>>
>> 3) Phase 3, starting the guest
>>
>> When the VFIO device is opened the PQAP callback and a pointer to
>> the matrix mediated device are set inside KVM during the open callback.
>>
>> When the device is closed or if a queue is removed, the vfio_ap_queue is
>> dissociated from the mediated device.
>>
>>
>> 4) Phase 4, intercepting the PQAP/AQIC instruction
>>
>> On interception of the PQAP/AQIC instruction, the interception code
>> makes sure the pqap_hook is initialized and allowed to be called
>> and calls it.
>> Otherwise it reports the usual -EOPNOTSUPP return code to let
>> QEMU handle the fault.
>>
>> The pqap callback searches for the queue associated with the APQN
>> stored in the register 0, setting the code to "illegal APQN"
>> if the vfio_ap_queue can not be found.
>>
>> Depending on the "i" bit of the register 1, the pqap callback
>> sets up or clears the interruption by calling the host format PQAP/AQIC
>> instruction.
>> When setting up the interruption it uses the NIB and the guest ISC
>> provided by the guest and the host ISC provided by the registration
>> to the GIB code, pins the NIB and also stores ISC and NIB inside
>> the vfio_ap_queue structure.
>> When clearing the interrupt it retrieves the host ISC to unregister
>> with the GIB code and unpins the NIB.
>>
>> We take care when enabling GISA that the guest may have issued a
>> reset and will not need to disable the interruptions before
>> re-enabling interruptions.
> 
> Please let us know what guarantees that we will disable the
> interruptions we previously enabled using AQIC (and generally facilitate
> proper cleanup) *before* kvm_s390_gisa_destroy() makes the gisa and
> with that the IPM go away!
> 
> Please note that IMHO this needs to be guaranteed by the kernel
> regardless of what userspace (QEMU) or the guest does.
> 
> (I've asked this question before during our internal review but I could
> not find the answer if there was one after going through my mails.)
> 
> Regards,
> Halil
> 

You are right.
I will investigate this too.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 14:14                       ` Pierre Morel
@ 2019-03-01 12:03                         ` Pierre Morel
  2019-03-01 12:05                           ` Christian Borntraeger
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-03-01 12:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson,
	linux-kernel, linux-s390, kvm, frankja, pasic, david,
	schwidefsky, heiko.carstens, freude, mimu

On 28/02/2019 15:14, Pierre Morel wrote:
> On 28/02/2019 14:52, Cornelia Huck wrote:
>> On Thu, 28 Feb 2019 14:16:09 +0100
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>
>>> On 28/02/2019 12:22, Cornelia Huck wrote:
>>
>>>> So, to summarize, the function should do:
>>>> - Is userspace supposed to emulate everything (!ECA_APIE)? Return
>>>>     -EOPNOTSUPP to hand control to it.
>>>> - We are now interpreting the instruction in KVM. Do common checks
>>>>     (PSTATE etc.) and inject exceptions, if needed.
>>>> - Now look at the fc; if there's a handler for it, call that; if not
>>>>     (case does not attempt to call a specific handler, or no handler
>>>>     registered), inject a specification exception. (Do we want
>>>>     pre-checks like for facility 65 here, or in the handler?)
>>>>
>>>> That response code 0x01 thingy probably needs to go into the specific
>>>> handler function, if anywhere (don't know the semantics, sorry).
>>>
>>> What do you mean with specific handler function?
>>>
>>> If you mean a switch around the FC with static function calls, I agree,
>>> if you mean a jump into a hook I do not agree.
>>
>> Ah, ok; so each case (that we want to handle) should call into a
>> subhandler that does
>> {
>>     (... check things like facilities ...)
>>     if (!specific_hook)
>>         inject_specif_excp_and_return();
>>     ret = specific_hook();
>>     if (ret)
>>         set_resp_code_0x01(); // or in specific_hook()?
>> }
>>
>> ?
> 
> Yes something in this direction.

Sorry, after reflection, no, we do not want to change the previous 
behavior so we only handle the AQIC case.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-01 12:03                         ` Pierre Morel
@ 2019-03-01 12:05                           ` Christian Borntraeger
  2019-03-01 12:36                             ` Cornelia Huck
  0 siblings, 1 reply; 79+ messages in thread
From: Christian Borntraeger @ 2019-03-01 12:05 UTC (permalink / raw)
  To: pmorel, Cornelia Huck
  Cc: Tony Krowiak, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, pasic, david, schwidefsky, heiko.carstens, freude, mimu



On 01.03.2019 13:03, Pierre Morel wrote:
> On 28/02/2019 15:14, Pierre Morel wrote:
>> On 28/02/2019 14:52, Cornelia Huck wrote:
>>> On Thu, 28 Feb 2019 14:16:09 +0100
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>>>> On 28/02/2019 12:22, Cornelia Huck wrote:
>>>
>>>>> So, to summarize, the function should do:
>>>>> - Is userspace supposed to emulate everything (!ECA_APIE)? Return
>>>>>     -EOPNOTSUPP to hand control to it.
>>>>> - We are now interpreting the instruction in KVM. Do common checks
>>>>>     (PSTATE etc.) and inject exceptions, if needed.
>>>>> - Now look at the fc; if there's a handler for it, call that; if not
>>>>>     (case does not attempt to call a specific handler, or no handler
>>>>>     registered), inject a specification exception. (Do we want pre-checks
>>>>>     like for facility 65 here, or in the handler?)
>>>>>
>>>>> That response code 0x01 thingy probably needs to go into the specific
>>>>> handler function, if anywhere (don't know the semantics, sorry).
>>>>
>>>> What do you mean with specific handler function?
>>>>
>>>> If you mean a switch around the FC with static function calls, I agree,
>>>> if you mean a jump into a hook I do not agree.
>>>
>>> Ah, ok; so each case (that we want to handle) should call into a
>>> subhandler that does
>>> {
>>>     (... check things like facilities ...)
>>>     if (!specific_hook)
>>>         inject_specif_excp_and_return();
>>>     ret = specific_hook();
>>>     if (ret)
>>>         set_resp_code_0x01(); // or in specific_hook()?
>>> }
>>>
>>> ?
>>
>> Yes something in this direction.
> 
> Sorry, after reflection, no, we do not want to change the previous behavior so we only handle the AQIC case.

I think what you wanted to say is the following:
Today (without the patch set) we will answer PQAP with an exception.
With this patch set we want to handle FC==3, but nothing else. So for anything FC!=3 we
will continue to return an exception?

Correct?
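
The split described above can be sketched in userspace C. This is a
model under stated assumptions, not the patch code: pqap_fc and
model_handle_pqap are hypothetical names, and the only parts taken from
the patch are the extraction of the function code from general register
0 (shift right by 24, truncated to 8 bits) and the -EOPNOTSUPP return
for anything other than AQIC (FC 0x03).

```c
#include <errno.h>
#include <stdint.h>

#define PQAP_FC_AQIC	0x03

/*
 * Extract the PQAP function code from general register 0.
 * Mirrors "uint8_t fc = gprs[0] >> 24" from the patch: the cast keeps
 * only the low 8 bits of the shifted value.
 */
static uint8_t pqap_fc(uint64_t gr0)
{
	return (uint8_t)(gr0 >> 24);
}

/*
 * Hypothetical model of the dispatch:
 * - FC != 3: behaviour unchanged, -EOPNOTSUPP hands the instruction
 *   back so the guest keeps getting an exception.
 * - FC == 3: AQIC is handled; 0 stands in for "go on to the hook".
 */
static int model_handle_pqap(uint64_t gr0)
{
	if (pqap_fc(gr0) != PQAP_FC_AQIC)
		return -EOPNOTSUPP;
	return 0;
}
```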


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-02-28 16:51                   ` Halil Pasic
@ 2019-03-01 12:10                     ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-01 12:10 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Christian Borntraeger, Tony Krowiak, alex.williamson, cohuck,
	linux-kernel, linux-s390, kvm, frankja, david, schwidefsky,
	heiko.carstens, freude, mimu

On 28/02/2019 17:51, Halil Pasic wrote:
> On Thu, 28 Feb 2019 15:12:16 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 28/02/2019 13:39, Halil Pasic wrote:
>>> On Thu, 28 Feb 2019 10:42:23 +0100
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> [..]
>>>> Correct?
>>>
>>> IMHO mostly.
>>>
>>> I also find doing the facility checks in kvm easier, and I think this is
>>> something we can change later if needed without any major trouble.
>>>
>>> There are a couple of things I would do differently than Pierre does:
>>> 1) Do the PGM_PRIVILEGED_OP before the fc == 3 check.
>>
>> Idea was not to modify existing behavior for fc != 3
>>
>> Also Christian already proposed to handle all FC codes. So in this idea,
>> this must be done as you say.
>>
>>>
>>> 2) Do the test_kvm_facility(vcpu->kvm, 65) check in the context of fc ==
>>> 3. I.e. decide if this hook is about pqap or just about pqap aqic and
>>> make the code convey that decision to its reader.
>>>
>>> 3) I would most probably test if the queue is available by looking at the
>>> masks in CRYCB here. If not AP_RESPONSE_Q_NOT_AVAIL is what we need.
>>
>> This I do not agree with, it is typically the responsibility of the part
>> in charge of the virtualization to do this, also the vfio_driver.
>>
> 
> See at 4) regarding the details. My guess is you disagree with checking
> CRYCB explicitly but don't digress with AP_RESPONSE_Q_NOT_AVAIL if APCB
> does not authorize the queue. Your idea was to infer APCB all zero from
> the fact that pqap_hook is NULL.
> 
> If my assumption is right, then yes we can have an implicit coarse check
> here and a fine grained check in the client code (vfio_ap).




> 
>>>
>>> 4) If we have APIE and queues authorized by the CRYCB (i.e. we have a
>>> vfio_ap module loaded an an mdev associated with the kvm) the callback
>>> not set (!(vcpu->kvm->arch.crypto.pqap_hook)) is a BUG!
>>
>> I do not agree with this either, the maintainers ;) will not allow this.
> 
> After an offline discussion we came to the conclusion that I did not
> understand your code.
> 
> Your train of thought was:
> 
> !(vcpu->kvm->arch.crypto.pqap_hook) _implies_ APCB all zero (i.e. the
> masks in the CRYCB
> 
> This is *why* you respond with AP_RESPONSE_Q_NOT_AVAIL.
> 
> However if that is the case I would like that spelled out in a code
> comment at least. Furthermore setting pqap_hook and APCB needs to happen
> in the right sequence. Means client code (vfio_ap) may only set APCB
> after the qpap_hook has been set. Currently we have a race there (as
> you first do  kvm_arch_crypto_set_masks and only then
> kvm->arch.crypto.pqap_hook. Furthermore I guess
> kvm->arch.crypto.pqap_hook needs to be set with the kvm lock held, which
> does not seem to be the case.


Yes, that is right.
This part (setting/resetting the hook and the CRYCB) will be modified for the
reason you mention and also to correctly handle the order of releasing
KVM and VFIO, as you and Christian mentioned.
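
To make the ordering constraint concrete, here is a small user-space model
(not kernel code). It assumes a toy structure with a single mask word and a
hook pointer; toy_crypto, toy_attach, toy_detach and toy_hook are made-up
names, the detach order is an assumption of this sketch that simply mirrors
the attach order, and the locking the real code needs is only indicated in
comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one mask word stands in for the APCB masks in the CRYCB. */
struct toy_crypto {
	int (*pqap_hook)(void);	/* NULL is read as "no queue authorized" */
	unsigned long apm;	/* non-zero: at least one queue authorized */
};

/* Invariant discussed in the thread: a non-zero APCB implies a hook. */
static bool toy_invariant_holds(const struct toy_crypto *c)
{
	return c->apm == 0 || c->pqap_hook != NULL;
}

/* Attach: install the (non-NULL) hook first, only then authorize queues. */
static void toy_attach(struct toy_crypto *c, int (*hook)(void),
		       unsigned long apm)
{
	/* real code would hold the appropriate lock around both steps */
	c->pqap_hook = hook;
	assert(toy_invariant_holds(c));
	c->apm = apm;
	assert(toy_invariant_holds(c));
}

/* Detach: revoke the queues first, only then drop the hook. */
static void toy_detach(struct toy_crypto *c)
{
	c->apm = 0;
	assert(toy_invariant_holds(c));
	c->pqap_hook = NULL;
	assert(toy_invariant_holds(c));
}

/* Placeholder hook so the model has something to install. */
static int toy_hook(void)
{
	return 0;
}
```

With this order there is no window in which the masks are set while the
hook is still NULL, so inferring "APCB all zero" from a NULL hook stays
valid.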




> 
>>
>>> In that case
>>> lying that the queue is not available does not seem right. BTW this
>>> is something Pierre changed since the last version quietly (I can't
>>> recall a mention in the change log or somebody asking for this). If
>>> we want to be very pedantic about this bug scenario our best bet is
>>> probably response code 6.
>>
>>
>> RC 06 means "Invalid address of AP-queue notification byte"
>>
>> So you must have think about another code or I do not understand at
>> all what you mean.
>>
> 
> I did not assume you decided to ignore the possibility of a programming
> error (which you at least technically did commit yourself) for what I
> described as a BUG.
> 
> My train of thought was, if we are very pedantic we can make things work
> with degraded functionality in that case. I.e. without AP interrupts.
> For that we need to tell the guest something like: yes your queue is
> fine and there and all that but AQCI setup interrupts did not work. And
> RC 06 is the only RC I see being suitable to convey that.
> 
> Detect and handle if the client code does not hold up their end of the
> bargain or just ignore the possibility is a design decision. But at least
> you should spell out your expectations against the client code.
> 
> Regards,
> Halil
> 

I prefer to document in a comment the obligation for the vfio_driver to
register the callback rather than add code complexity, which will
eventually go deeper and deeper.


Thanks,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-01 12:05                           ` Christian Borntraeger
@ 2019-03-01 12:36                             ` Cornelia Huck
  2019-03-01 15:32                               ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2019-03-01 12:36 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: pmorel, Tony Krowiak, alex.williamson, linux-kernel, linux-s390,
	kvm, frankja, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

On Fri, 1 Mar 2019 13:05:54 +0100
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 01.03.2019 13:03, Pierre Morel wrote:
> > On 28/02/2019 15:14, Pierre Morel wrote:  
> >> On 28/02/2019 14:52, Cornelia Huck wrote:  
> >>> On Thu, 28 Feb 2019 14:16:09 +0100
> >>> Pierre Morel <pmorel@linux.ibm.com> wrote:
> >>>  
> >>>> On 28/02/2019 12:22, Cornelia Huck wrote:  
> >>>  
> >>>>> So, to summarize, the function should do:
> >>>>> - Is userspace supposed to emulate everything (!ECA_APIE)? Return
> >>>>>     -EOPNOTSUPP to hand control to it.
> >>>>> - We are now interpreting the instruction in KVM. Do common checks
> >>>>>     (PSTATE etc.) and inject exceptions, if needed.
> >>>>> - Now look at the fc; if there's a handler for it, call that; if not
> >>>>>     (case does not attempt to call a specific handler, or no handler
> >>>>>     registered), inject a specification exception. (Do we want pre-checks
> >>>>>     like for facility 65 here, or in the handler?)
> >>>>>
> >>>>> That response code 0x01 thingy probably needs to go into the specific
> >>>>> handler function, if anywhere (don't know the semantics, sorry).  
> >>>>
> >>>> What do you mean with specific handler function?
> >>>>
> >>>> If you mean a switch around the FC with static function's call, I agree,
> >>>> if you mean a jump into a hook I do not agree.  
> >>>
> >>> Ah, ok; so each case (that we want to handle) should call into a
> >>> subhandler that does
> >>> {
> >>>     (... check things like facilities ...)
> >>>     if (!specific_hook)
> >>>         inject_specif_excp_and_return();
> >>>     ret = specific_hook();
> >>>     if (ret)
> >>>         set_resp_code_0x01(); // or in specific_hook()?
> >>> }
> >>>
> >>> ?  
> >>
> >> Yes something in this direction.  
> > 
> > Sorry, after reflection, no, we do not want to change the previous behavior so we only handle the AQIC case.  
> 
> I think what you wanted to say is the following:
> Today (without the patch set) we will answer PQAP with an exception.
> With this patch set we want to handle FC==3, but nothing else. So for anything FC!=3 we
> will continue to return an exception?
> 
> Correct?
> 

That sounds reasonable; but I don't see how this conflicts with my
proposal? Just don't introduce a subfunction for fc != 3...

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-01 12:36                             ` Cornelia Huck
@ 2019-03-01 15:32                               ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-01 15:32 UTC (permalink / raw)
  To: Cornelia Huck, Christian Borntraeger
  Cc: Tony Krowiak, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 01/03/2019 13:36, Cornelia Huck wrote:
> On Fri, 1 Mar 2019 13:05:54 +0100
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 01.03.2019 13:03, Pierre Morel wrote:
>>> On 28/02/2019 15:14, Pierre Morel wrote:
>>>> On 28/02/2019 14:52, Cornelia Huck wrote:
>>>>> On Thu, 28 Feb 2019 14:16:09 +0100
>>>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>>>   
>>>>>> On 28/02/2019 12:22, Cornelia Huck wrote:
>>>>>   
>>>>>>> So, to summarize, the function should do:
>>>>>>> - Is userspace supposed to emulate everything (!ECA_APIE)? Return
>>>>>>>      -EOPNOTSUPP to hand control to it.
>>>>>>> - We are now interpreting the instruction in KVM. Do common checks
>>>>>>>      (PSTATE etc.) and inject exceptions, if needed.
>>>>>>> - Now look at the fc; if there's a handler for it, call that; if not
>>>>>>>      (case does not attempt to call a specific handler, or no handler
>>>>>>>      registered), inject a specification exception. (Do we want pre-checks
>>>>>>>      like for facility 65 here, or in the handler?)
>>>>>>>
>>>>>>> That response code 0x01 thingy probably needs to go into the specific
>>>>>>> handler function, if anywhere (don't know the semantics, sorry).
>>>>>>
>>>>>> What do you mean with specific handler function?
>>>>>>
>>>>>> If you mean a switch around the FC with static function's call, I agree,
>>>>>> if you mean a jump into a hook I do not agree.
>>>>>
>>>>> Ah, ok; so each case (that we want to handle) should call into a
>>>>> subhandler that does
>>>>> {
>>>>>      (... check things like facilities ...)
>>>>>      if (!specific_hook)
>>>>>          inject_specif_excp_and_return();
>>>>>      ret = specific_hook();
>>>>>      if (ret)
>>>>>          set_resp_code_0x01(); // or in specific_hook()?
>>>>> }
>>>>>
>>>>> ?
>>>>
>>>> Yes something in this direction.
>>>
>>> Sorry, after reflection, no, we do not want to change the previous behavior so we only handle the AQIC case.
>>
>> I think what you wanted to say is the following:
>> Today (without the patch set) we will answer PQAP with an exception.
>> With this patch set we want to handle FC==3, but nothing else. So for anything FC!=3 we
>> will continue to return an exception?
>>
>> Correct?

Yes, correct.
Thanks for the much more precise explanation.

>>
> 
> That sounds reasonable; but I don't see how this conflicts with my
> proposal? Just don't introduce a subfunction for fc != 3...
> 

Correct too, it does not conflict; as you said, it is just a matter of not
introducing subfunctions.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
                     ` (2 preceding siblings ...)
  2019-02-28 20:20   ` Christian Borntraeger
@ 2019-03-04  1:57   ` Halil Pasic
  2019-03-04  9:47     ` Pierre Morel
  3 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2019-03-04  1:57 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On Fri, 22 Feb 2019 16:29:58 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> We register the AP PQAP instruction hook during the open
> of the mediated device. And unregister it on release.
> 
> In the AP PQAP instruction hook, if we receive a demand to
> enable IRQs,
> - we retrieve the vfio_ap_queue based on the APQN we receive
>   in REG1,
> - we retrieve the page of the guest address, (NIB), from
>   register REG2
> - we the mediated device to use the VFIO pinning infratrsucture
>   to pin the page of the guest address,
> - we retrieve the pointer to KVM to register the guest ISC
>   and retrieve the host ISC
> - finaly we activate GISA
> 
> If we receive a demand to disable IRQs,
> - we deactivate GISA
> - unregister from the GIB
> - unping the NIB
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
[..]
> + */
> +static void vfio_ap_free_irq(struct vfio_ap_queue *q)
> +{
> +	if (!q)
> +		return;
> +	if (q->g_pfn)
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
> +	if (q->isc)
> +		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);

Ain't isc 0 a perfectly legit isc?

> +	q->nib = 0;
> +	q->isc = 0;
> +	q->g_pfn = 0;
> +}
> +
[..]
> @@ -109,10 +131,16 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>  {
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	struct vfio_ap_queue *q, *qtmp;
>  
>  	if (matrix_mdev->kvm)
>  		return -EBUSY;
>  
> +	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
> +		q->matrix_mdev = NULL;
> +		vfio_ap_mdev_reset_queue(q);
> +		list_move(&q->list, &matrix_dev->free_list);

How about matrix_dev->lock? I guess you should protect free_list with
it. If not, maybe a code comment would help readers not stumble over this.

> +	}
>  	mutex_lock(&matrix_dev->lock);
>  	list_del(&matrix_mdev->node);
>  	mutex_unlock(&matrix_dev->lock);

[..]

> +/**
> + * vfio_ap_setirq: Enable Interruption for a APQN
> + *
> + * @dev: the device associated with the ap_queue
> + * @q:   the vfio_ap_queue holding AQIC parameters
> + *
> + * Pin the NIB saved in *q
> + * Register the guest ISC to GIB interface and retrieve the
> + * host ISC to issue the host side PQAP/AQIC
> + *
> + * Response.status may be set to following Response Code in case of error:
> + * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
> + * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervizor GISA internal error
> + *
> + * Otherwise return the ap_queue_status returned by the ap_aqic()
> + */
> +static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
> +{
> +	struct ap_qirq_ctrl aqic_gisa = {};
> +	struct ap_queue_status status = {};
> +	struct kvm_s390_gisa *gisa;
> +	struct kvm *kvm;
> +	unsigned long g_pfn, h_nib, h_pfn;
> +	int ret;
> +
> +	kvm = q->matrix_mdev->kvm;
> +	gisa = kvm->arch.gisa_int.origin;
> +
> +	g_pfn = q->nib >> PAGE_SHIFT;
> +	ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1,
> +			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
> +	switch (ret) {
> +	case 1:
> +		break;
> +	case -EINVAL:
> +	case -E2BIG:
> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> +		/* Fallthrough */
> +	default:
> +		return status;
> +	}
> +
> +	h_nib = (h_pfn << PAGE_SHIFT) | (q->nib & ~PAGE_MASK);
> +	aqic_gisa.gisc = q->isc;
> +	aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->isc);
> +	aqic_gisa.ir = 1;
> +	aqic_gisa.gisa = gisa->next_alert >> 4;
> +
> +	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
> +	switch (status.response_code) {
> +	case AP_RESPONSE_NORMAL:
> +		if (q->g_pfn)
> +			vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
> +					 &q->g_pfn, 1);

Shouldn't you call kvm_s390_gisc_unregister() here?

> +		q->g_pfn = g_pfn;
> +		break;
> +	case AP_RESPONSE_OTHERWISE_CHANGED:
> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1);

and here.

> +		break;
> +	case AP_RESPONSE_INVALID_GISA:
> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> +	default:	/* Fall Through */
> +		pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
> +			status.response_code);
> +		vfio_ap_free_irq(q);

This guy won't unpin g_pfn but only q->g_pfn if not zero :/

> +		break;
> +	}
> +
> +	return status;
> +}
> +
> +/**
> + * handle_pqap: PQAP instruction callback
> + *
> + * @vcpu: The vcpu on which we received the PQAP instruction
> + *
> + * Get the general register contents to initialize internal variables.
> + * REG[0]: APQN
> + * REG[1]: IR and ISC
> + * REG[2]: NIB
> + *
> + * Response.status may be set to following Response Code:
> + * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
> + * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
> + * - AP_RESPONSE_NORMAL (0) : in case of successs
> + *   Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible
> RC.
> + *
> + * Return 0 if we could handle the request inside KVM.
> + * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
> + */
> +static int handle_pqap(struct kvm_vcpu *vcpu)
> +{
> +	uint64_t status;
> +	uint16_t apqn;
> +	struct vfio_ap_queue *q;
> +	struct ap_queue_status qstatus = {};
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	/* If we do not use the AIV facility just go to userland */
> +	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> +		return -EOPNOTSUPP;
> +
> +	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> +	matrix_mdev = vcpu->kvm->arch.crypto.vfio_private;
> +	if (!matrix_mdev)
> +		return -EOPNOTSUPP;
> +	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);

This get is not a 'refcount affecting get' any more...

> +	if (!q) {
> +		qstatus.response_code = AP_RESPONSE_Q_NOT_AVAIL;
> +		goto out;
> +	}
> +
> +	status = vcpu->run->s.regs.gprs[1];
> +
> +	/* If IR bit(16) is set we enable the interrupt */
> +	if ((status >> (63 - 16)) & 0x01) {
> +		q->isc = status & 0x07;
> +		q->nib = vcpu->run->s.regs.gprs[2];

... and I don't see what should prevent a potential use after free here.

Regards,
Halil

> +		qstatus = vfio_ap_setirq(q);
> +		if (qstatus.response_code) {
> +			q->nib = 0;
> +			q->isc = 0;
> +		}
> +	} else
> +		qstatus = vfio_ap_clrirq(q);
> +
> +out:
> +	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
> +	return 0;
> +}

[..]


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
                     ` (2 preceding siblings ...)
  2019-02-27 20:53   ` Tony Krowiak
@ 2019-03-04  2:09   ` Halil Pasic
  2019-03-04 10:19     ` Pierre Morel
                       ` (2 more replies)
  3 siblings, 3 replies; 79+ messages in thread
From: Halil Pasic @ 2019-03-04  2:09 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On Fri, 22 Feb 2019 16:29:56 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> We need to associate the ap_vfio_queue, which will hold the
> per queue information for interrupt with a matrix mediated device
> which hold the configuration and the way to the CRYCB.
[..]
> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
> +{
> +	int apqi, apqn;
> +	int ret = 0;
> +	struct vfio_ap_queue *q;
> +	struct list_head q_list;
> +
> +	INIT_LIST_HEAD(&q_list);
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
> +		if (!q) {
> +			ret = -EADDRNOTAVAIL;
> +			goto rewind;
> +		}
> +		if (q->matrix_mdev) {
> +			ret = -EADDRINUSE;

You tried to get the q from matrix_dev->free_list, thus, modulo races,
q->matrix_mdev should be NULL. This change breaks the error codes in the
sense that it becomes impossible to provoke EADDRINUSE (the proper
error code for a queue taken by another matrix_mdev).

> +			goto rewind;
> +		}
> +		list_move(&q->list, &q_list);
> +	}
> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
>  	return 0;
> +rewind:
> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);
> +	return ret;
>  }
> -
>  /**
> - * vfio_ap_mdev_verify_no_sharing
> + * vfio_ap_get_all_cards:
>   *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * @matrix_mdev: the matrix mediated device for which we want to associate
> + *		 all available queues with a given apqi.
> + * @apqi:	 The apqi which associated with all defined APID of the
> + *		 mediated device will define a AP queue.
>   *
> - * @matrix_mdev: the mediated matrix device
> + * We define a local list to put all queues we find on the matrix device
> + * free list when associating the apqi with all already defined apid for
> + * this matrix mediated device.
>   *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * If we can get all the devices we roll them to the mediated device list
> + * If we get errors we unroll them to the free list.
>   */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>  {
> -	struct ap_matrix_mdev *lstdev;
> -	DECLARE_BITMAP(apm, AP_DEVICES);
> -	DECLARE_BITMAP(aqm, AP_DOMAINS);
> -
> -	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> -		if (matrix_mdev == lstdev)
> -			continue;
> -
> -		memset(apm, 0, sizeof(apm));
> -		memset(aqm, 0, sizeof(aqm));
> -
> -		/*
> -		 * We work on full longs, as we can only exclude the leftover
> -		 * bits in non-inverse order. The leftover is all zeros.
> -		 */
> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> -				lstdev->matrix.apm, AP_DEVICES))
> -			continue;
> -
> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> -				lstdev->matrix.aqm, AP_DOMAINS))
> -			continue;
> -
> -		return -EADDRINUSE;
> +	int apid, apqn;
> +	int ret = 0;
> +	struct vfio_ap_queue *q;
> +	struct list_head q_list;
> +	struct ap_matrix_mdev *tmp = NULL;
> +
> +	INIT_LIST_HEAD(&q_list);
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
> +		if (!q) {
> +			ret = -EADDRNOTAVAIL;
> +			goto rewind;
> +		}
> +		if (q->matrix_mdev) {
> +			ret = -EADDRINUSE;

Same here!

Regards,
Halil

> +			goto rewind;
> +		}
> +		list_move(&q->list, &q_list);
>  	}

[..]


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-03-04  1:57   ` Halil Pasic
@ 2019-03-04  9:47     ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-04  9:47 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 04/03/2019 02:57, Halil Pasic wrote:
> On Fri, 22 Feb 2019 16:29:58 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> We register the AP PQAP instruction hook during the open
>> of the mediated device. And unregister it on release.
>>
>> In the AP PQAP instruction hook, if we receive a demand to
>> enable IRQs,
>> - we retrieve the vfio_ap_queue based on the APQN we receive
>>    in REG1,
>> - we retrieve the page of the guest address, (NIB), from
>>    register REG2
>> - we the mediated device to use the VFIO pinning infratrsucture
>>    to pin the page of the guest address,
>> - we retrieve the pointer to KVM to register the guest ISC
>>    and retrieve the host ISC
>> - finaly we activate GISA
>>
>> If we receive a demand to disable IRQs,
>> - we deactivate GISA
>> - unregister from the GIB
>> - unping the NIB
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
> [..]
>> + */
>> +static void vfio_ap_free_irq(struct vfio_ap_queue *q)
>> +{
>> +	if (!q)
>> +		return;
>> +	if (q->g_pfn)
>> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->g_pfn, 1);
>> +	if (q->isc)
>> +		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->isc);
> 
> Ain't isc 0 a perfectly legit isc?

Exactly. Even if the GIB interface always gives 5 back, I should initialize
the ISC to an invalid value, i.e. one > 7.
Will do, thanks.
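
A minimal user-space sketch of that idea, assuming a sentinel outside the
valid range 0..7. TOY_ISC_INVALID, struct toy_irq_state and the helpers are
hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ISC_MAX	7	/* valid ISCs are 0..7, so 0 is legitimate */
#define TOY_ISC_INVALID	0xff	/* hypothetical "nothing registered" marker */

/* Only the field relevant to the discussion is modeled here. */
struct toy_irq_state {
	uint8_t isc;
};

/* Start out with no ISC registered. */
static void toy_irq_state_init(struct toy_irq_state *s)
{
	s->isc = TOY_ISC_INVALID;
}

/* True if an ISC (including ISC 0) has been registered. */
static bool toy_isc_registered(const struct toy_irq_state *s)
{
	return s->isc <= TOY_ISC_MAX;
}

/* Forget the ISC again; returns whether there was one to unregister. */
static bool toy_irq_state_clear(struct toy_irq_state *s)
{
	bool was_registered = toy_isc_registered(s);

	s->isc = TOY_ISC_INVALID;
	return was_registered;
}
```

The cleanup path can then test toy_isc_registered() instead of "isc != 0",
so a guest that picked ISC 0 still gets unregistered.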

> 
>> +	q->nib = 0;
>> +	q->isc = 0;
>> +	q->g_pfn = 0;
>> +}
>> +
> [..]
>> @@ -109,10 +131,16 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>>   {
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +	struct vfio_ap_queue *q, *qtmp;
>>   
>>   	if (matrix_mdev->kvm)
>>   		return -EBUSY;
>>   
>> +	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
>> +		q->matrix_mdev = NULL;
>> +		vfio_ap_mdev_reset_queue(q);
>> +		list_move(&q->list, &matrix_dev->free_list);
> 
> How about matrix_dev->lock? I guess you should protect free_list with
> it. If not maybe a code comment would help not stumble over this.

Conny already commented that I forgot the locks.
I need a lock there and in the interception. Maybe more, I will check.

> 
>> +	}
>>   	mutex_lock(&matrix_dev->lock);
>>   	list_del(&matrix_mdev->node);
>>   	mutex_unlock(&matrix_dev->lock);
> 
> [..]
> 
>> +/**
>> + * vfio_ap_setirq: Enable Interruption for a APQN
>> + *
>> + * @dev: the device associated with the ap_queue
>> + * @q:   the vfio_ap_queue holding AQIC parameters
>> + *
>> + * Pin the NIB saved in *q
>> + * Register the guest ISC to GIB interface and retrieve the
>> + * host ISC to issue the host side PQAP/AQIC
>> + *
>> + * Response.status may be set to following Response Code in case of error:
>> + * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
>> + * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervizor GISA internal error
>> + *
>> + * Otherwise return the ap_queue_status returned by the ap_aqic()
>> + */
>> +static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
>> +{
>> +	struct ap_qirq_ctrl aqic_gisa = {};
>> +	struct ap_queue_status status = {};
>> +	struct kvm_s390_gisa *gisa;
>> +	struct kvm *kvm;
>> +	unsigned long g_pfn, h_nib, h_pfn;
>> +	int ret;
>> +
>> +	kvm = q->matrix_mdev->kvm;
>> +	gisa = kvm->arch.gisa_int.origin;
>> +
>> +	g_pfn = q->nib >> PAGE_SHIFT;
>> +	ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1,
>> +			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
>> +	switch (ret) {
>> +	case 1:
>> +		break;
>> +	case -EINVAL:
>> +	case -E2BIG:
>> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
>> +		/* Fallthrough */
>> +	default:
>> +		return status;
>> +	}
>> +
>> +	h_nib = (h_pfn << PAGE_SHIFT) | (q->nib & ~PAGE_MASK);
>> +	aqic_gisa.gisc = q->isc;
>> +	aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->isc);
>> +	aqic_gisa.ir = 1;
>> +	aqic_gisa.gisa = gisa->next_alert >> 4;
>> +
>> +	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
>> +	switch (status.response_code) {
>> +	case AP_RESPONSE_NORMAL:
>> +		if (q->g_pfn)
>> +			vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
>> +					 &q->g_pfn, 1);
> 
> Shouldn't you call kvm_s390_gisc_unregister() here.

I should.


> 
>> +		q->g_pfn = g_pfn;
>> +		break;
>> +	case AP_RESPONSE_OTHERWISE_CHANGED:
>> +		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &g_pfn, 1);
> 
> and here.

too

> 
>> +		break;
>> +	case AP_RESPONSE_INVALID_GISA:
>> +		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
>> +	default:	/* Fall Through */
>> +		pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
>> +			status.response_code);
>> +		vfio_ap_free_irq(q);
> 
> This guy won't unpin g_pfn but only q->g_pfn if not zero :/

OK, seems I have to rework this all.
Thanks.
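
One possible shape for the reworked bookkeeping, as a user-space model only.
It assumes that a failed request should undo just the pin taken by that
request and leave the previous state alone; this is an assumption of the
sketch, not something decided in this thread. toy_pin, toy_unpin,
toy_set_irq and the pin counter are hypothetical stand-ins for the VFIO
pinning calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy queue state: which guest page is currently pinned, if any. */
struct toy_queue {
	unsigned long pinned_pfn;
	bool has_pin;
};

/* Counter standing in for the real pin/unpin calls. */
static int toy_pin_count;

static void toy_pin(unsigned long pfn)
{
	(void)pfn;
	toy_pin_count++;
}

static void toy_unpin(unsigned long pfn)
{
	(void)pfn;
	toy_pin_count--;
}

/*
 * Enable interrupts with a new NIB page. hw_ok models whether the
 * hardware accepted the request.
 */
static bool toy_set_irq(struct toy_queue *q, unsigned long new_pfn, bool hw_ok)
{
	toy_pin(new_pfn);
	if (!hw_ok) {
		/* undo exactly what this call pinned, not q->pinned_pfn */
		toy_unpin(new_pfn);
		return false;
	}
	/* success: release the page of the previous setup, if any */
	if (q->has_pin)
		toy_unpin(q->pinned_pfn);
	q->pinned_pfn = new_pfn;
	q->has_pin = true;
	return true;
}
```

The same pairing would apply to the GISC registration: whatever a call
registers, the error paths of that same call unregister.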

> 
>> +		break;
>> +	}
>> +
>> +	return status;
>> +}
>> +
>> +/**
>> + * handle_pqap: PQAP instruction callback
>> + *
>> + * @vcpu: The vcpu on which we received the PQAP instruction
>> + *
>> + * Get the general register contents to initialize internal variables.
>> + * REG[0]: APQN
>> + * REG[1]: IR and ISC
>> + * REG[2]: NIB
>> + *
>> + * Response.status may be set to following Response Code:
>> + * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
>> + * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
>> + * - AP_RESPONSE_NORMAL (0) : in case of successs
>> + *   Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible
>> RC.
>> + *
>> + * Return 0 if we could handle the request inside KVM.
>> + * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
>> + */
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +	uint64_t status;
>> +	uint16_t apqn;
>> +	struct vfio_ap_queue *q;
>> +	struct ap_queue_status qstatus = {};
>> +	struct ap_matrix_mdev *matrix_mdev;
>> +
>> +	/* If we do not use the AIV facility just go to userland */
>> +	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
>> +		return -EOPNOTSUPP;
>> +
>> +	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>> +	matrix_mdev = vcpu->kvm->arch.crypto.vfio_private;
>> +	if (!matrix_mdev)
>> +		return -EOPNOTSUPP;
>> +	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
> 
> This get is not a 'refcount affecting get' any more...
> 
>> +	if (!q) {
>> +		qstatus.response_code = AP_RESPONSE_Q_NOT_AVAIL;
>> +		goto out;
>> +	}
>> +
>> +	status = vcpu->run->s.regs.gprs[1];
>> +
>> +	/* If IR bit(16) is set we enable the interrupt */
>> +	if ((status >> (63 - 16)) & 0x01) {
>> +		q->isc = status & 0x07;
>> +		q->nib = vcpu->run->s.regs.gprs[2];
> 
> ... and I don't see what should prevent a potential use after free here.

I think this will be corrected when I add the lock I forgot in handle_pqap.


Thanks for the comments,

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-03-04  2:09   ` Halil Pasic
@ 2019-03-04 10:19     ` Pierre Morel
  2019-03-05 22:17     ` Tony Krowiak
  2019-03-12 21:39     ` Tony Krowiak
  2 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-04 10:19 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 04/03/2019 03:09, Halil Pasic wrote:
> On Fri, 22 Feb 2019 16:29:56 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> We need to associate the ap_vfio_queue, which will hold the
>> per queue information for interrupt with a matrix mediated device
>> which hold the configuration and the way to the CRYCB.
> [..]
>> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
>> +{
>> +	int apqi, apqn;
>> +	int ret = 0;
>> +	struct vfio_ap_queue *q;
>> +	struct list_head q_list;
>> +
>> +	INIT_LIST_HEAD(&q_list);
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
>> +		if (!q) {
>> +			ret = -EADDRNOTAVAIL;
>> +			goto rewind;
>> +		}
>> +		if (q->matrix_mdev) {
>> +			ret = -EADDRINUSE;
> 
> You tried to get the q from matrix_dev->free_list thus modulo races
> q->matrix_mdev should be 0. This change breaks the error codes in a
> sense that it becomes impossible to provoke EADDRINUSE (the proper
> error code for taken by another matrix_mdev).
> 

Right.
I will change this.
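
For illustration, a lookup that can still tell the two error cases apart
might be modeled like this in user space. It assumes a single table of all
known queues with an owner pointer, rather than separate free and assigned
lists; toy_queue and toy_claim are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy queue: an APQN plus the mediated device that owns it (or NULL). */
struct toy_queue {
	int apqn;
	const void *owner;
};

/*
 * Claim the queue with the given APQN for mdev.
 * Returns 0 on success, -EADDRNOTAVAIL if no such queue is known,
 * -EADDRINUSE if another mediated device already owns it.
 */
static int toy_claim(struct toy_queue *qs, size_t n, int apqn,
		     const void *mdev)
{
	for (size_t i = 0; i < n; i++) {
		if (qs[i].apqn != apqn)
			continue;
		if (qs[i].owner && qs[i].owner != mdev)
			return -EADDRINUSE;
		qs[i].owner = mdev;
		return 0;
	}
	return -EADDRNOTAVAIL;
}
```

Searching only the free list cannot produce -EADDRINUSE, because a queue
owned by another device is by construction not on that list.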

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-02-26 18:27   ` Tony Krowiak
  2019-02-27  9:58     ` Pierre Morel
@ 2019-03-04 13:02     ` Cornelia Huck
  1 sibling, 0 replies; 79+ messages in thread
From: Cornelia Huck @ 2019-03-04 13:02 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: Pierre Morel, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, pasic, david, schwidefsky,
	heiko.carstens, freude, mimu

On Tue, 26 Feb 2019 13:27:57 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 2/22/19 10:29 AM, Pierre Morel wrote:
> > When the device is remove, we must make sure to
> > clear the interruption and reset the AP device.
> > 
> > We also need to clear the CRYCB of the guest.
> > 
> > Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> > ---
> >   drivers/s390/crypto/vfio_ap_drv.c     | 35 +++++++++++++++++++++++++++++++++++
> >   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
> >   drivers/s390/crypto/vfio_ap_private.h |  3 +++
> >   3 files changed, 40 insertions(+), 1 deletion(-)
(...)
> >   /**
> > + * vfio_ap_update_crycb
> > + * @q: A pointer to the queue being removed
> > + *
> > + * We clear the APID of the queue, making this queue unusable for the guest.
> > + * After this function we can reset the queue without to fear a race with
> > + * the guest to access the queue again.
> > + * We do not fear race with the host as we still get the device.
> > + */
> > +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
> > +{
> > +	struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
> > +
> > +	if (!matrix_mdev)
> > +		return;
> > +
> > +	clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
> > +
> > +	if (!matrix_mdev->kvm)
> > +		return;
> > +
> > +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> > +				  matrix_mdev->matrix.apm,
> > +				  matrix_mdev->matrix.aqm,
> > +				  matrix_mdev->matrix.adm);
> > +}
> > +
> > +/**
> >    * vfio_ap_queue_dev_remove:
> >    *
> >    * Free the associated vfio_ap_queue structure
> > @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> >   	struct vfio_ap_queue *q;
> >   
> >   	q = dev_get_drvdata(&apdev->device);
> > +	if (!q)
> > +		return;
> > +
> > +	vfio_ap_update_crycb(q);
> > +	vfio_ap_mdev_reset_queue(q);  
> 
> The reset is unnecessary because once the card is removed from the
> CRYCB, the ZAPQ may fail because the queue may not exist anymore.
> Besides, once the card is removed from the guest's CRYCB, the bus
> running in the guest will do a reset.

You cannot rely on whatever a sane guest would do; any needed cleanup
needs to be done by the host.

(No idea what actually needs to be done here :)

> 
> >   	list_del(&q->list);
> >   	kfree(q);
> >   }


* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-03-04  2:09   ` Halil Pasic
  2019-03-04 10:19     ` Pierre Morel
@ 2019-03-05 22:17     ` Tony Krowiak
  2019-03-12 21:39     ` Tony Krowiak
  2 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2019-03-05 22:17 UTC (permalink / raw)
  To: Halil Pasic, Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, david, schwidefsky, heiko.carstens, freude, mimu

On 3/3/19 9:09 PM, Halil Pasic wrote:
> On Fri, 22 Feb 2019 16:29:56 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> We need to associate the ap_vfio_queue, which will hold the
>> per queue information for interrupt with a matrix mediated device
>> which hold the configuration and the way to the CRYCB.
> [..]
>> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
>> +{
>> +	int apqi, apqn;
>> +	int ret = 0;
>> +	struct vfio_ap_queue *q;
>> +	struct list_head q_list;
>> +
>> +	INIT_LIST_HEAD(&q_list);
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
>> +		if (!q) {
>> +			ret = -EADDRNOTAVAIL;
>> +			goto rewind;
>> +		}
>> +		if (q->matrix_mdev) {
>> +			ret = -EADDRINUSE;
> 
> You tried to get the q from matrix_dev->free_list thus modulo races
> q->matrix_mdev should be 0. This change breaks the error codes in a
> sense that it becomes impossible to provoke EADDRINUSE (the proper
> error code for taken by another matrix_mdev).

I don't understand what you are saying here. AFAIU, the idea here is to
pull the q from the free list. If there is no q for the apqn on the free
list, then that would indicate the queue has not been bound to a driver
in which case the appropriate rc is EADDRNOTAVAIL. If the queue has
been bound, then a check is done to see whether the queue has been
associated with an mdev device. If so, the rc is -EADDRINUSE, which is
also appropriate. What am I missing?
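
The two readings above can be compared in a small userspace sketch. It
assumes, as Halil's comment implies, that vfio_ap_get_queue() searches only
the list it is handed and that queues assigned to an mdev have already been
moved off the free list. All struct and function names below are hypothetical
stand-ins for illustration, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct mdev { int id; };

struct queue {
	int apqn;
	struct mdev *owner;   /* NULL while the queue sits on the free list */
	struct queue *next;
};

/* Look up an APQN on one singly linked list; NULL if it is not there. */
static struct queue *find_queue(struct queue *head, int apqn)
{
	assert(apqn >= 0 && apqn <= 0xffff); /* APQNs are 16 bit in this sketch */
	for (struct queue *q = head; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}

/*
 * Lookup restricted to the free list, as in the quoted hunk: a queue
 * owned by another mdev is not on this list, so it is reported as
 * -EADDRNOTAVAIL and the -EADDRINUSE branch is not reached.
 */
static int claim_from_free_list(struct queue *free_list, int apqn)
{
	struct queue *q = find_queue(free_list, apqn);

	if (!q)
		return -EADDRNOTAVAIL;
	if (q->owner)
		return -EADDRINUSE;
	return 0;
}

/*
 * Lookup that also consults the assigned queues, so that a queue taken
 * by another mdev can be told apart from one that is not bound at all.
 */
static int claim_checking_owners(struct queue *free_list,
				 struct queue *assigned, int apqn)
{
	if (find_queue(free_list, apqn))
		return 0;
	if (find_queue(assigned, apqn))
		return -EADDRINUSE;
	return -EADDRNOTAVAIL;
}
```

Under these assumptions, the first variant cannot distinguish "taken by
another mdev" from "not bound to the driver"; the second can, at the cost of
a second lookup.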

> 
>> +			goto rewind;
>> +		}
>> +		list_move(&q->list, &q_list);
>> +	}
>> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
>>   	return 0;
>> +rewind:
>> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);
>> +	return ret;
>>   }
>> -
>>   /**
>> - * vfio_ap_mdev_verify_no_sharing
>> + * vfio_ap_get_all_cards:
>>    *
>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>> - * and AP queue indexes comprising the AP matrix are not configured for another
>> - * mediated device. AP queue sharing is not allowed.
>> + * @matrix_mdev: the matrix mediated device for which we want to associate
>> + *		 all available queues with a given apqi.
>> + * @apqi:	 The apqi which associated with all defined APID of the
>> + *		 mediated device will define a AP queue.
>>    *
>> - * @matrix_mdev: the mediated matrix device
>> + * We define a local list to put all queues we find on the matrix device
>> + * free list when associating the apqi with all already defined apid for
>> + * this matrix mediated device.
>>    *
>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> + * If we can get all the devices we roll them to the mediated device list
>> + * If we get errors we unroll them to the free list.
>>    */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>>   {
>> -	struct ap_matrix_mdev *lstdev;
>> -	DECLARE_BITMAP(apm, AP_DEVICES);
>> -	DECLARE_BITMAP(aqm, AP_DOMAINS);
>> -
>> -	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>> -		if (matrix_mdev == lstdev)
>> -			continue;
>> -
>> -		memset(apm, 0, sizeof(apm));
>> -		memset(aqm, 0, sizeof(aqm));
>> -
>> -		/*
>> -		 * We work on full longs, as we can only exclude the leftover
>> -		 * bits in non-inverse order. The leftover is all zeros.
>> -		 */
>> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> -				lstdev->matrix.apm, AP_DEVICES))
>> -			continue;
>> -
>> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> -				lstdev->matrix.aqm, AP_DOMAINS))
>> -			continue;
>> -
>> -		return -EADDRINUSE;
>> +	int apid, apqn;
>> +	int ret = 0;
>> +	struct vfio_ap_queue *q;
>> +	struct list_head q_list;
>> +	struct ap_matrix_mdev *tmp = NULL;
>> +
>> +	INIT_LIST_HEAD(&q_list);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
>> +		if (!q) {
>> +			ret = -EADDRNOTAVAIL;
>> +			goto rewind;
>> +		}
>> +		if (q->matrix_mdev) {
>> +			ret = -EADDRINUSE;
> 
> Same here!
> 
> Regards,
> Halil
> 
>> +			goto rewind;
>> +		}
>> +		list_move(&q->list, &q_list);
>>   	}
> 
> [..]
> 



* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-02-22 15:29 ` [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
  2019-02-26 18:27   ` Tony Krowiak
@ 2019-03-08 22:43   ` Tony Krowiak
  2019-03-11  8:31     ` Pierre Morel
  1 sibling, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-03-08 22:43 UTC (permalink / raw)
  To: Pierre Morel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 2/22/19 10:29 AM, Pierre Morel wrote:
> When the device is remove, we must make sure to
> clear the interruption and reset the AP device.
> 
> We also need to clear the CRYCB of the guest.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>   drivers/s390/crypto/vfio_ap_drv.c     | 35 +++++++++++++++++++++++++++++++++++
>   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
>   drivers/s390/crypto/vfio_ap_private.h |  3 +++
>   3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index eca0ffc..e5d91ff 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -5,6 +5,7 @@
>    * Copyright IBM Corp. 2018
>    *
>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
> + *	      Pierre Morel <pmorel@linux.ibm.com>
>    */
>   
>   #include <linux/module.h>
> @@ -12,6 +13,8 @@
>   #include <linux/slab.h>
>   #include <linux/string.h>
>   #include <asm/facility.h>
> +#include <linux/bitops.h>
> +#include <linux/kvm_host.h>
>   #include "vfio_ap_private.h"
>   
>   #define VFIO_AP_ROOT_NAME "vfio_ap"
> @@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>   }
>   
>   /**
> + * vfio_ap_update_crycb
> + * @q: A pointer to the queue being removed
> + *
> + * We clear the APID of the queue, making this queue unusable for the guest.
> + * After this function we can reset the queue without to fear a race with
> + * the guest to access the queue again.
> + * We do not fear race with the host as we still get the device.
> + */
> +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
> +{
> +	struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
> +
> +	if (!matrix_mdev)
> +		return;
> +
> +	clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
> +
> +	if (!matrix_mdev->kvm)
> +		return;
> +
> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +				  matrix_mdev->matrix.apm,
> +				  matrix_mdev->matrix.aqm,
> +				  matrix_mdev->matrix.adm);
> +}
> +
> +/**
>    * vfio_ap_queue_dev_remove:
>    *
>    * Free the associated vfio_ap_queue structure
> @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>   	struct vfio_ap_queue *q;
>   
>   	q = dev_get_drvdata(&apdev->device);
> +	if (!q)
> +		return;
> +
> +	vfio_ap_update_crycb(q);
> +	vfio_ap_mdev_reset_queue(q);

Since the bit corresponding to the APID is cleared in the
vfio_ap_update_crycb() above, shouldn't all queues on that
card also be reset?

>   	list_del(&q->list);
>   	kfree(q);
>   }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 0196065..5b9bb33 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -59,6 +59,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>   			if (retry <= 0)
>   				pr_warn("%s: queue 0x%04x not empty\n",
>   					__func__, q->apqn);
> +			vfio_ap_free_irq(q);

Shouldn't this be done for the response codes that terminate this loop
such as those caught by the default case?

>   			return 0;
>   		case AP_RESPONSE_RESET_IN_PROGRESS:
>   		case AP_RESPONSE_BUSY:
> @@ -83,7 +84,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>    * Unregister the ISC from the GIB alert
>    * Clear the vfio_ap_queue intern fields
>    */
> -static void vfio_ap_free_irq(struct vfio_ap_queue *q)
> +void vfio_ap_free_irq(struct vfio_ap_queue *q)
>   {
>   	if (!q)
>   		return;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index e2fd2c0..cc18215 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -4,6 +4,7 @@
>    *
>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
>    *	      Halil Pasic <pasic@linux.ibm.com>
> + *	      Pierre Morel <pmorel@linux.ibm.com>
>    *
>    * Copyright IBM Corp. 2018
>    */
> @@ -98,4 +99,6 @@ struct vfio_ap_queue {
>   	int	apqn;
>   	unsigned char isc;
>   };
> +void vfio_ap_free_irq(struct vfio_ap_queue *q);
> +int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
>   #endif /* _VFIO_AP_PRIVATE_H_ */
> 



* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-03-08 22:43   ` Tony Krowiak
@ 2019-03-11  8:31     ` Pierre Morel
  2019-03-12 21:53       ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Pierre Morel @ 2019-03-11  8:31 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 08/03/2019 23:43, Tony Krowiak wrote:
> On 2/22/19 10:29 AM, Pierre Morel wrote:
>> When the device is remove, we must make sure to
>> clear the interruption and reset the AP device.
>>
>> We also need to clear the CRYCB of the guest.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     | 35 
>> +++++++++++++++++++++++++++++++++++
>>   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
>>   drivers/s390/crypto/vfio_ap_private.h |  3 +++
>>   3 files changed, 40 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
>> b/drivers/s390/crypto/vfio_ap_drv.c
>> index eca0ffc..e5d91ff 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -5,6 +5,7 @@
>>    * Copyright IBM Corp. 2018
>>    *
>>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
>> + *          Pierre Morel <pmorel@linux.ibm.com>
>>    */
>>   #include <linux/module.h>
>> @@ -12,6 +13,8 @@
>>   #include <linux/slab.h>
>>   #include <linux/string.h>
>>   #include <asm/facility.h>
>> +#include <linux/bitops.h>
>> +#include <linux/kvm_host.h>
>>   #include "vfio_ap_private.h"
>>   #define VFIO_AP_ROOT_NAME "vfio_ap"
>> @@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device 
>> *apdev)
>>   }
>>   /**
>> + * vfio_ap_update_crycb
>> + * @q: A pointer to the queue being removed
>> + *
>> + * We clear the APID of the queue, making this queue unusable for the 
>> guest.
>> + * After this function we can reset the queue without to fear a race 
>> with
>> + * the guest to access the queue again.
>> + * We do not fear race with the host as we still get the device.
>> + */
>> +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
>> +{
>> +    struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
>> +
>> +    if (!matrix_mdev)
>> +        return;
>> +
>> +    clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
>> +
>> +    if (!matrix_mdev->kvm)
>> +        return;
>> +
>> +    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>> +                  matrix_mdev->matrix.apm,
>> +                  matrix_mdev->matrix.aqm,
>> +                  matrix_mdev->matrix.adm);
>> +}
>> +
>> +/**
>>    * vfio_ap_queue_dev_remove:
>>    *
>>    * Free the associated vfio_ap_queue structure
>> @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct 
>> ap_device *apdev)
>>       struct vfio_ap_queue *q;
>>       q = dev_get_drvdata(&apdev->device);
>> +    if (!q)
>> +        return;
>> +
>> +    vfio_ap_update_crycb(q);
>> +    vfio_ap_mdev_reset_queue(q);
> 
> Since the bit corresponding to the APID is cleared in the
> vfio_ap_update_crycb() above, shouldn't all queues on that
> card also be reset?

I do not think so.
The remove function will be called in a loop for all queues by the bus.
No need to clear all queues.


> 
>>       list_del(&q->list);
>>       kfree(q);
>>   }
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index 0196065..5b9bb33 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -59,6 +59,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>               if (retry <= 0)
>>                   pr_warn("%s: queue 0x%04x not empty\n",
>>                       __func__, q->apqn);
>> +            vfio_ap_free_irq(q);
> 
> Shouldn't this be done for the response codes that terminate this loop
> such as those caught by the default case?

I do not think so: the error code is returned, and the caller may want to
reset the queue again.
I think that doing the free inside the call to reset is not right.
I will investigate in this direction.
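
One possible shape for that separation, as a minimal userspace sketch and not
the driver's code: the reset only reports its status, and the caller decides
when to release the interrupt resources. The struct, the helper names and the
choice to free the IRQ even after a failed reset are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queue; not the driver's struct vfio_ap_queue. */
struct queue {
	int busy_polls;      /* reset attempts that will still report "busy" */
	bool irq_registered; /* interrupt resources currently held */
	int irq_frees;       /* number of times the IRQ was released */
};

/* One reset attempt: reports -EBUSY until the simulated device settles. */
static int reset_queue_once(struct queue *q)
{
	if (q->busy_polls > 0) {
		q->busy_polls--;
		return -EBUSY;
	}
	return 0;
}

/* Release interrupt resources; a no-op when nothing is registered. */
static void free_irq_once(struct queue *q)
{
	if (!q->irq_registered)
		return;
	q->irq_registered = false;
	q->irq_frees++;
}

/*
 * Caller-side cleanup: retry the reset a bounded number of times, then
 * release the IRQ exactly once, whatever the final reset status was.
 */
static int remove_queue(struct queue *q, int max_tries)
{
	int rc = -EBUSY;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries && rc == -EBUSY; i++)
		rc = reset_queue_once(q);
	free_irq_once(q);
	return rc;
}
```

With this split, a caller that wants to retry the reset can do so without the
interrupt state changing underneath it.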

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-03-04  2:09   ` Halil Pasic
  2019-03-04 10:19     ` Pierre Morel
  2019-03-05 22:17     ` Tony Krowiak
@ 2019-03-12 21:39     ` Tony Krowiak
  2019-03-13 10:19       ` Pierre Morel
  2 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-03-12 21:39 UTC (permalink / raw)
  To: Halil Pasic, Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, david, schwidefsky, heiko.carstens, freude, mimu

On 3/3/19 9:09 PM, Halil Pasic wrote:
> On Fri, 22 Feb 2019 16:29:56 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> We need to associate the ap_vfio_queue, which will hold the
>> per queue information for interrupt with a matrix mediated device
>> which hold the configuration and the way to the CRYCB.
> [..]
>> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
>> +{
>> +	int apqi, apqn;
>> +	int ret = 0;
>> +	struct vfio_ap_queue *q;
>> +	struct list_head q_list;
>> +
>> +	INIT_LIST_HEAD(&q_list);
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
>> +		if (!q) {
>> +			ret = -EADDRNOTAVAIL;
>> +			goto rewind;
>> +		}
>> +		if (q->matrix_mdev) {
>> +			ret = -EADDRINUSE;
> 
> You tried to get the q from matrix_dev->free_list thus modulo races
> q->matrix_mdev should be 0. This change breaks the error codes in a
> sense that it becomes impossible to provoke EADDRINUSE (the proper
> error code for taken by another matrix_mdev).

It is necessary to determine if the queue is in use by another mdev, so
it will still be necessary to traverse all of the matrix_mdev structs to
see if q is in matrix_mdev->qlist. It seems that maintaining the qlist
does not buy us much.

> 
>> +			goto rewind;
>> +		}
>> +		list_move(&q->list, &q_list);
>> +	}
>> +	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
>>   	return 0;
>> +rewind:
>> +	move_and_set(&q_list, &matrix_dev->free_list, NULL);
>> +	return ret;
>>   }
>> -
>>   /**
>> - * vfio_ap_mdev_verify_no_sharing
>> + * vfio_ap_get_all_cards:
>>    *
>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>> - * and AP queue indexes comprising the AP matrix are not configured for another
>> - * mediated device. AP queue sharing is not allowed.
>> + * @matrix_mdev: the matrix mediated device for which we want to associate
>> + *		 all available queues with a given apqi.
>> + * @apqi:	 The apqi which associated with all defined APID of the
>> + *		 mediated device will define a AP queue.
>>    *
>> - * @matrix_mdev: the mediated matrix device
>> + * We define a local list to put all queues we find on the matrix device
>> + * free list when associating the apqi with all already defined apid for
>> + * this matrix mediated device.
>>    *
>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> + * If we can get all the devices we roll them to the mediated device list
>> + * If we get errors we unroll them to the free list.
>>    */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
>>   {
>> -	struct ap_matrix_mdev *lstdev;
>> -	DECLARE_BITMAP(apm, AP_DEVICES);
>> -	DECLARE_BITMAP(aqm, AP_DOMAINS);
>> -
>> -	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>> -		if (matrix_mdev == lstdev)
>> -			continue;
>> -
>> -		memset(apm, 0, sizeof(apm));
>> -		memset(aqm, 0, sizeof(aqm));
>> -
>> -		/*
>> -		 * We work on full longs, as we can only exclude the leftover
>> -		 * bits in non-inverse order. The leftover is all zeros.
>> -		 */
>> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> -				lstdev->matrix.apm, AP_DEVICES))
>> -			continue;
>> -
>> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> -				lstdev->matrix.aqm, AP_DOMAINS))
>> -			continue;
>> -
>> -		return -EADDRINUSE;
>> +	int apid, apqn;
>> +	int ret = 0;
>> +	struct vfio_ap_queue *q;
>> +	struct list_head q_list;
>> +	struct ap_matrix_mdev *tmp = NULL;
>> +
>> +	INIT_LIST_HEAD(&q_list);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
>> +		if (!q) {
>> +			ret = -EADDRNOTAVAIL;
>> +			goto rewind;
>> +		}
>> +		if (q->matrix_mdev) {
>> +			ret = -EADDRINUSE;
> 
> Same here!
> 
> Regards,
> Halil
> 
>> +			goto rewind;
>> +		}
>> +		list_move(&q->list, &q_list);
>>   	}
> 
> [..]
> 



* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-03-11  8:31     ` Pierre Morel
@ 2019-03-12 21:53       ` Tony Krowiak
  2019-03-13 10:15         ` Pierre Morel
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2019-03-12 21:53 UTC (permalink / raw)
  To: pmorel, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 3/11/19 4:31 AM, Pierre Morel wrote:
> On 08/03/2019 23:43, Tony Krowiak wrote:
>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>> When the device is remove, we must make sure to
>>> clear the interruption and reset the AP device.
>>>
>>> We also need to clear the CRYCB of the guest.
>>>
>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>> ---
>>>   drivers/s390/crypto/vfio_ap_drv.c     | 35 
>>> +++++++++++++++++++++++++++++++++++
>>>   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
>>>   drivers/s390/crypto/vfio_ap_private.h |  3 +++
>>>   3 files changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
>>> b/drivers/s390/crypto/vfio_ap_drv.c
>>> index eca0ffc..e5d91ff 100644
>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>> @@ -5,6 +5,7 @@
>>>    * Copyright IBM Corp. 2018
>>>    *
>>>    * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
>>> + *          Pierre Morel <pmorel@linux.ibm.com>
>>>    */
>>>   #include <linux/module.h>
>>> @@ -12,6 +13,8 @@
>>>   #include <linux/slab.h>
>>>   #include <linux/string.h>
>>>   #include <asm/facility.h>
>>> +#include <linux/bitops.h>
>>> +#include <linux/kvm_host.h>
>>>   #include "vfio_ap_private.h"
>>>   #define VFIO_AP_ROOT_NAME "vfio_ap"
>>> @@ -61,6 +64,33 @@ static int vfio_ap_queue_dev_probe(struct 
>>> ap_device *apdev)
>>>   }
>>>   /**
>>> + * vfio_ap_update_crycb
>>> + * @q: A pointer to the queue being removed
>>> + *
>>> + * We clear the APID of the queue, making this queue unusable for 
>>> the guest.
>>> + * After this function we can reset the queue without to fear a race 
>>> with
>>> + * the guest to access the queue again.
>>> + * We do not fear race with the host as we still get the device.
>>> + */
>>> +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
>>> +{
>>> +    struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
>>> +
>>> +    if (!matrix_mdev)
>>> +        return;
>>> +

You should probably check whether the APID has been cleared before
proceeding. Take the case where an AP with multiple queues is removed
from the configuration via the SE or SCLP. The AP bus is going to invoke
the vfio_ap_queue_dev_remove() function for each of the queues. The APID
will get cleared on the first remove, so it is not only unnecessary to
clear it on subsequent removes, it is kind of nasty to keep resetting
the masks in the guest's CRYCB (below) each time the remove callback is
invoked.
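
A minimal userspace sketch of such a check, assuming the APM uses the MSB-0
("inverted") bit numbering that the *_inv bit helpers operate on. The struct,
the helper names and the update counter are hypothetical; the counter merely
stands in for the call to kvm_arch_crypto_set_masks().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define AP_DEVICES 256

/* Hypothetical stand-in for the mdev matrix; only the APM is modelled. */
struct matrix {
	unsigned long apm[AP_DEVICES / BITS_PER_LONG];
	int mask_updates; /* counts simulated CRYCB mask updates */
};

/* Mask for bit nr within its long, counting from the most significant bit. */
static unsigned long inv_mask(unsigned int nr)
{
	return 1UL << (BITS_PER_LONG - 1 - nr % BITS_PER_LONG);
}

/* Mark an adapter as assigned. */
static void set_apid(struct matrix *m, unsigned int apid)
{
	assert(apid < AP_DEVICES);
	m->apm[apid / BITS_PER_LONG] |= inv_mask(apid);
}

/*
 * Clear the APID and update the masks only if the bit was still set.
 * Returns true when an update was actually performed.
 */
static bool remove_apid(struct matrix *m, unsigned int apid)
{
	assert(apid < AP_DEVICES);

	unsigned long *word = &m->apm[apid / BITS_PER_LONG];
	unsigned long mask = inv_mask(apid);

	if (!(*word & mask))
		return false;  /* already cleared by an earlier remove */
	*word &= ~mask;
	m->mask_updates++;     /* stands in for the CRYCB mask update */
	return true;
}
```

Calling remove_apid() once per queue of the same card then performs a single
mask update instead of one per queue.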

>>> +    clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
>>> +
>>> +    if (!matrix_mdev->kvm)
>>> +        return;
>>> +
>>> +    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>> +                  matrix_mdev->matrix.apm,
>>> +                  matrix_mdev->matrix.aqm,
>>> +                  matrix_mdev->matrix.adm);
>>> +}
>>> +
>>> +/**
>>>    * vfio_ap_queue_dev_remove:
>>>    *
>>>    * Free the associated vfio_ap_queue structure
>>> @@ -70,6 +100,11 @@ static void vfio_ap_queue_dev_remove(struct 
>>> ap_device *apdev)
>>>       struct vfio_ap_queue *q;
>>>       q = dev_get_drvdata(&apdev->device);
>>> +    if (!q)
>>> +        return;
>>> +
>>> +    vfio_ap_update_crycb(q);
>>> +    vfio_ap_mdev_reset_queue(q);
>>
>> Since the bit corresponding to the APID is cleared in the
>> vfio_ap_update_crycb() above, shouldn't all queues on that
>> card also be reset?
> 
> I do not think so.
> The remove function will be called in a loop for all queues by the bus.
> No need to clear all queues.
> 
> 
>>
>>>       list_del(&q->list);
>>>       kfree(q);
>>>   }
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>> index 0196065..5b9bb33 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -59,6 +59,7 @@ int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
>>>               if (retry <= 0)
>>>                   pr_warn("%s: queue 0x%04x not empty\n",
>>>                       __func__, q->apqn);
>>> +            vfio_ap_free_irq(q);
>>
>> Shouldn't this be done for the response codes that terminate this loop
>> such as those caught by the default case?
> 
> I do not think so, the error code is returned and the caller may want to 
> reset the queue again.
> I think that doing the free inside the call to reset is not right.
> I will investigate in this direction.
> 
> Regards,
> Pierre
> 



* Re: [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device
  2019-03-12 21:53       ` Tony Krowiak
@ 2019-03-13 10:15         ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-13 10:15 UTC (permalink / raw)
  To: Tony Krowiak, borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	pasic, david, schwidefsky, heiko.carstens, freude, mimu

On 12/03/2019 22:53, Tony Krowiak wrote:
> On 3/11/19 4:31 AM, Pierre Morel wrote:
>> On 08/03/2019 23:43, Tony Krowiak wrote:
>>> On 2/22/19 10:29 AM, Pierre Morel wrote:
>>>> When the device is remove, we must make sure to
>>>> clear the interruption and reset the AP device.
>>>>
>>>> We also need to clear the CRYCB of the guest.
>>>>
>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>> ---
>>>>   drivers/s390/crypto/vfio_ap_drv.c     | 35 
>>>> +++++++++++++++++++++++++++++++++++
>>>>   drivers/s390/crypto/vfio_ap_ops.c     |  3 ++-
>>>>   drivers/s390/crypto/vfio_ap_private.h |  3 +++
>>>>   3 files changed, 40 insertions(+), 1 deletion(-)
>>>>

...snip...

>>>> + * vfio_ap_update_crycb
>>>> + * @q: A pointer to the queue being removed
>>>> + *
>>>> + * We clear the APID of the queue, making this queue unusable for 
>>>> the guest.
>>>> + * After this function we can reset the queue without to fear a 
>>>> race with
>>>> + * the guest to access the queue again.
>>>> + * We do not fear race with the host as we still get the device.
>>>> + */
>>>> +static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
>>>> +{
>>>> +    struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
>>>> +
>>>> +    if (!matrix_mdev)
>>>> +        return;
>>>> +
> 
> You should probably check whether the APID has been cleared before
> proceeding. Take the case where an AP with multiple queues is removed
> from the configuration via the SE or SCLP. The AP bus is going to invoke
> the vfio_ap_queue_dev_remove() function for each of the queues. The APID
> will get cleared on the first remove, so it is not only unnecessary to
> clear it on subsequent removes, it is kind of nasty to keep resetting
> the masks in the guest's CRYCB (below) each time the remove callback is
> invoked.

This cannot happen.
The APM can only be cleared while the matrix is not associated with KVM.

That case is tested for, and the masks are then left unchanged.

> 
>>>> +    clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
>>>> +
>>>> +    if (!matrix_mdev->kvm)
>>>> +        return;
>>>> +
>>>> +    kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>>> +                  matrix_mdev->matrix.apm,
>>>> +                  matrix_mdev->matrix.aqm,
>>>> +                  matrix_mdev->matrix.adm);
>>>> +}




-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



* Re: [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev
  2019-03-12 21:39     ` Tony Krowiak
@ 2019-03-13 10:19       ` Pierre Morel
  0 siblings, 0 replies; 79+ messages in thread
From: Pierre Morel @ 2019-03-13 10:19 UTC (permalink / raw)
  To: Tony Krowiak, Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, david, schwidefsky, heiko.carstens, freude, mimu

On 12/03/2019 22:39, Tony Krowiak wrote:
> On 3/3/19 9:09 PM, Halil Pasic wrote:
>> On Fri, 22 Feb 2019 16:29:56 +0100
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>
>>> We need to associate the ap_vfio_queue, which will hold the
>>> per queue information for interrupt with a matrix mediated device
>>> which hold the configuration and the way to the CRYCB.
>> [..]
>>> +static int vfio_ap_get_all_domains(struct ap_matrix_mdev 
>>> *matrix_mdev, int apid)
>>> +{
>>> +    int apqi, apqn;
>>> +    int ret = 0;
>>> +    struct vfio_ap_queue *q;
>>> +    struct list_head q_list;
>>> +
>>> +    INIT_LIST_HEAD(&q_list);
>>> +
>>> +    for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>>> +        apqn = AP_MKQID(apid, apqi);
>>> +        q = vfio_ap_get_queue(apqn, &matrix_dev->free_list);
>>> +        if (!q) {
>>> +            ret = -EADDRNOTAVAIL;
>>> +            goto rewind;
>>> +        }
>>> +        if (q->matrix_mdev) {
>>> +            ret = -EADDRINUSE;
>>
>> You tried to get the q from matrix_dev->free_list thus modulo races
>> q->matrix_mdev should be 0. This change breaks the error codes in a
>> sense that it becomes impossible to provoke EADDRINUSE (the proper
>> error code for taken by another matrix_mdev).
> 
> It is necessary to determine if the queue is in use by another mdev, so
> it will still be necessary to traverse all of the matrix_mdev structs to
> see if q is in matrix_mdev->qlist. It seems that maintaining the qlist
> does not buy us much.
> 

Tony, Halil already pointed out this issue and I already answered.
Please, no need to duplicate the remarks.

Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



end of thread, other threads:[~2019-03-13 10:20 UTC | newest]

Thread overview: 79+ messages
2019-02-22 15:29 [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
2019-02-22 15:29 ` [PATCH v4 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
2019-02-25 18:36   ` Tony Krowiak
2019-02-26 11:47     ` Pierre Morel
2019-02-26 15:47       ` Tony Krowiak
2019-02-27  8:09         ` Pierre Morel
2019-02-27  9:13           ` Cornelia Huck
2019-02-27 10:16             ` Pierre Morel
2019-02-27 18:00           ` Tony Krowiak
2019-02-28  9:42             ` Christian Borntraeger
2019-02-28 11:03               ` Christian Borntraeger
2019-02-28 11:22                 ` Cornelia Huck
2019-02-28 13:16                   ` Pierre Morel
2019-02-28 13:52                     ` Cornelia Huck
2019-02-28 14:14                       ` Pierre Morel
2019-03-01 12:03                         ` Pierre Morel
2019-03-01 12:05                           ` Christian Borntraeger
2019-03-01 12:36                             ` Cornelia Huck
2019-03-01 15:32                               ` Pierre Morel
2019-02-28 13:10                 ` Pierre Morel
2019-02-28 15:36                 ` Tony Krowiak
2019-02-28 12:39               ` Halil Pasic
2019-02-28 14:12                 ` Pierre Morel
2019-02-28 16:51                   ` Halil Pasic
2019-03-01 12:10                     ` Pierre Morel
2019-02-28 15:43                 ` Tony Krowiak
2019-02-28 13:23               ` Pierre Morel
2019-02-28 13:44                 ` Christian Borntraeger
2019-02-28 13:47                   ` Pierre Morel
2019-02-28 14:07                     ` Halil Pasic
2019-02-28 14:13                       ` Pierre Morel
2019-02-28 15:45                   ` Tony Krowiak
2019-02-28 15:35               ` Tony Krowiak
2019-03-01  8:42                 ` Christian Borntraeger
2019-02-28  8:31     ` Christian Borntraeger
2019-02-22 15:29 ` [PATCH v4 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
2019-02-26 16:10   ` Tony Krowiak
2019-02-27  8:40     ` Pierre Morel
2019-02-27 20:35       ` Tony Krowiak
2019-02-22 15:29 ` [PATCH v4 3/7] s390: ap: associate a ap_vfio_queue and a matrix mdev Pierre Morel
2019-02-26 18:14   ` Tony Krowiak
2019-02-27  9:29     ` Pierre Morel
2019-02-27 20:14       ` Tony Krowiak
2019-02-27  9:32   ` Cornelia Huck
2019-02-27 10:21     ` Pierre Morel
2019-02-27 10:44     ` Pierre Morel
2019-02-27 20:53   ` Tony Krowiak
2019-03-04  2:09   ` Halil Pasic
2019-03-04 10:19     ` Pierre Morel
2019-03-05 22:17     ` Tony Krowiak
2019-03-12 21:39     ` Tony Krowiak
2019-03-13 10:19       ` Pierre Morel
2019-02-22 15:29 ` [PATCH v4 4/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
2019-02-27  9:42   ` Cornelia Huck
2019-02-27 10:22     ` Pierre Morel
2019-02-28  8:23   ` Christian Borntraeger
2019-02-28  8:48     ` Pierre Morel
2019-02-28 16:55       ` Halil Pasic
2019-03-01  7:51         ` Christian Borntraeger
2019-02-22 15:29 ` [PATCH v4 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
2019-02-26 18:23   ` Tony Krowiak
2019-02-27  9:54     ` Pierre Morel
2019-02-27 18:17       ` Tony Krowiak
2019-02-27 18:18   ` Tony Krowiak
2019-02-28 20:20   ` Christian Borntraeger
2019-03-01  9:35     ` Pierre Morel
2019-03-04  1:57   ` Halil Pasic
2019-03-04  9:47     ` Pierre Morel
2019-02-22 15:29 ` [PATCH v4 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
2019-02-26 18:27   ` Tony Krowiak
2019-02-27  9:58     ` Pierre Morel
2019-03-04 13:02     ` Cornelia Huck
2019-03-08 22:43   ` Tony Krowiak
2019-03-11  8:31     ` Pierre Morel
2019-03-12 21:53       ` Tony Krowiak
2019-03-13 10:15         ` Pierre Morel
2019-02-22 15:30 ` [PATCH v4 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest Pierre Morel
2019-02-28 15:08 ` [PATCH v4 0/7] vfio: ap: AP Queue Interrupt Control Halil Pasic
2019-03-01  9:40   ` Pierre Morel
