linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control
@ 2019-03-13 16:04 Pierre Morel
  2019-03-13 16:04 ` [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:04 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

This patch implement PQAP/AQIC interception in KVM.

To implement this we need to add a new structure, vfio_ap_queue,to be
able to retrieve the mediated device associated with a queue and specific
values needed to register/unregister the interrupt structures:
 - APQN: to be able to issue the commands and search for queue structures
 - NIB : to unpin the NIB on clear IRQ
 - ISC : to unregister with the GIB interface
 - MATRIX: a pointer to the matrix mediated device
 - LIST: the list_head to handle the vfio_queue life cycle

Having this structure and the list management greatly ease the handling
of the AP queues and diminues the LOCs needed in the vfio_ap driver by
more than 150 lines in comparison with the previous version.


0) Queues life cycle

vfio_ap_queues are created on probe

We define one bucket on the matrix device to store the free vfio_ap_queues,
the queues not assign to any matrix mediated device.

We define one bucket on each matrix mediated device to hold the
vfio_ap_queues belonging to it.

vfio_ap_queues are deleted on remove

This makes the search for a queue easy and the detection of assignent
incoherency obvious (the queue is not avilable) and simplifies assignment.


1) Phase 1, probe and remove from vfio_ap_queue

The vfio_ap_queue structures are dynamically allocated and setup
when a queue is probed by the ap_vfio_driver.
The vfio_ap_queue is linked to the ap_queue device as the driver data.

The new The vfio_ap_queue is put on a free_list belonging to the
matrix device.

The vfio_ap_queue are free during remove.


2) Phase 2, assignment of vfio_ap_queue to a mediated device

When a APID is assigned we look for APQI already assigned to
the matrix mediated device and associate all the queue with the
APQN = (APID,APQI) to the mediated device by adding them to
the mediated device queue list.
We do the same when a APQI is assigned.

If any queue with a matching APQN can not be found on the matrix
device free list it means it is already associated to another matrix
mediated device and no queue is added to the matrix mediated device.

3) Phase 3, starting the guest

When the VFIO device is opened the PQAP callback and a pointer to
the matrix mediated device are set inside KVM during the open callback.

When the device is closed or if a queue is removed, the vfio_ap_queue is
dissociated from the mediated device.


4) Phase 3 intercepting the PQAP/AQIC instruction

On interception of the PQAP/AQIC instruction, the interception code
makes sure the pqap_hook is initialized and allowed to be called
and call it.
Otherwise it reports the usual -EOPNOTSUPP return code to let
QEMU handle the fault.
  
the pqap callback search for the queue asociated with the APQN
stored in the register 0, setting the code to "illegal APQN"
if the vfio_ap_queue can not be found.

Depending on the "i" bit of the register 1, the pqap callback
setup or clear the interruption by calling the host format PQAP/AQIC
instruction.
When seting up the interruption it uses the NIB and the guest ISC
provided by the guest and the host ISC provided by the registration
to the GIB code, pin the NIB and also stores ISC and NIB inside
the vfio_ap_queue structure.
When clearing the interrupt it retrieves the host ISC to unregister
with the GIB code and unpin the NIB.

We take care when enabling GISA that the guest may have issued a
reset and will not need to disable the interuptions before
re-enabling interruptions.

To make sure that the module holding the callback does not disapear
we use a module reference counting in the structure containing the
callback.


5) Phase 4 clean dissociation from the mediated device on remove

On removing of the AP device the remove callback is called.
To be sure that the guest will not access the queue anymore
we clear the APID CRYCB bit.
Cleaning the APID, over the APQI, is chosen because the architecture
specifies that only the APID can be dynamically changed outside IPL.

To be sure that the IRQ is cleared before the GISA is free we use
the KVM reference counting, raise it in open, lower it on release.


6) Associated QEMU patch

There is a QEMU patch which is needed to enable the PQAP/AQIC
facility in the guest.

Posted in qemu-devel@nongnu.org as:
Message-Id: <1550146494-21085-1-git-send-email-pmorel@linux.ibm.com>



Pierre Morel (7):
  s390: ap: kvm: add PQAP interception for AQIC
  s390: ap: new vfio_ap_queue structure
  vfio: ap: register IOMMU VFIO notifier
  s390: ap: setup relation betwen KVM and mediated device
  s390: ap: implement PAPQ AQIC interception in kernel
  s390: ap: Cleanup on removing the AP device
  s390: ap: kvm: Enable PQAP/AQIC facility for the guest

 arch/s390/include/asm/kvm_host.h      |   8 +
 arch/s390/kvm/priv.c                  |  62 +++
 arch/s390/tools/gen_facilities.c      |   1 +
 drivers/s390/crypto/ap_bus.h          |   1 +
 drivers/s390/crypto/vfio_ap_drv.c     |  69 +++-
 drivers/s390/crypto/vfio_ap_ops.c     | 727 +++++++++++++++++++++++-----------
 drivers/s390/crypto/vfio_ap_private.h |  20 +
 7 files changed, 658 insertions(+), 230 deletions(-)

-- 
2.7.4

Changelog from v4:
- Add forgotten locking for vfio_get_queue() in pqap callback
  (Conny / Halil)
- Add KVM reference counting to make sure GISA is free after IRQ
  (Christian / Halil)
- Take care that ISC = 0 is a valid ISC
  (Halil)
- Integrate the PQAP call back in a structure with module owner
  reference counting to make sure the callback does not disappear.
- Restrict functionality to always open KVM before opening the
  VFIO device.
- Search all devices in the vfio_ap driver list when associating
  a queue to a mediated device
  (Halil / Tony)
- Get vfio_ap_free_irq() out of vfio_ap_mdev_reset_queue() to call
  it always, whatever the result of the reset.
  (Tony)

Changelog from v3:
- Associating the vfio_queues during APID/APQI assign
  (Tony)
- Dissociating the vfio_queues during APID/APQI unassign
  (Tony)
- Taking care that the guest can directly disable the interrupt
  by using a RESET
  (Halil)
- Remove the patch creating the matrix bus to accelerate its
  integration in Linux stable
  (Christian)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
@ 2019-03-13 16:04 ` Pierre Morel
  2019-03-15 10:20   ` Cornelia Huck
  2019-03-13 16:04 ` [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:04 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

We prepare the interception of the PQAP/AQIC instruction for
the case the AQIC facility is enabled in the guest.

We add a callback inside the KVM arch structure for s390 for
a VFIO driver to handle a specific response to the PQAP
instruction with the AQIC command and only this command.
The preceding behavior for other commands should not change.

We inject the correct exceptions from inside KVM for the case the
callback is not initialized, which happens when the vfio_ap driver
is not loaded.

It is the duty of the vfio_driver to setup a pqap callback inside
the crypto structure.
If the callback has been setup we call it.
If not we setup an answer considering that no queue is available
for the guest when no callback has been setup.

We do consider the responsability of the driver to always initialize
the PQAP callback if it defines queues by initializing the CRYCB for
a guest.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h      |  8 +++++
 arch/s390/kvm/priv.c                  | 62 +++++++++++++++++++++++++++++++++++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 72 insertions(+)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a496276..624460b 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -18,6 +18,7 @@
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
 #include <linux/seqlock.h>
+#include <linux/module.h>
 #include <asm/debug.h>
 #include <asm/cpu.h>
 #include <asm/fpu/api.h>
@@ -721,8 +722,15 @@ struct kvm_s390_cpu_model {
 	unsigned short ibc;
 };
 
+struct kvm_s390_module_hook {
+	int (*hook)(struct kvm_vcpu *vcpu);
+	void *data;
+	struct module *owner;
+};
+
 struct kvm_s390_crypto {
 	struct kvm_s390_crypto_cb *crycb;
+	struct kvm_s390_module_hook *pqap_hook;
 	__u32 crycbd;
 	__u8 aes_kw;
 	__u8 dea_kw;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 8679bd7..72f683a 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -27,6 +27,7 @@
 #include <asm/io.h>
 #include <asm/ptrace.h>
 #include <asm/sclp.h>
+#include <asm/ap.h>
 #include "gaccess.h"
 #include "kvm-s390.h"
 #include "trace.h"
@@ -592,6 +593,65 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * handle_pqap: Handling pqap interception
+ * @vcpu: the vcpu having issue the pqap instruction
+ *
+ * We now support PQAP/AQIC instructions and we need to correctly
+ * answer the guest even if no dedicated driver's hook is available.
+ *
+ * The intercepting code calls a dedicated callback for this instruction
+ * if a driver did register one in the CRYPTO satellite of the
+ * SIE block.
+ *
+ * For PQAP/AQIC instructions only, verify privilege and specifications.
+ *
+ * If no callback available, the queues are not available, return this to
+ * the caller.
+ * Else return the value returned by the callback.
+ */
+static int handle_pqap(struct kvm_vcpu *vcpu)
+{
+	uint8_t fc;
+	struct ap_queue_status status = {};
+	int ret;
+	/* Verify that the AP instruction are available */
+	if (!ap_instructions_available())
+		return -EOPNOTSUPP;
+	/* Verify that the guest is allowed to use AP instructions */
+	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
+		return -EOPNOTSUPP;
+	/* Verify that the function code is AQIC */
+	fc = vcpu->run->s.regs.gprs[0] >> 24;
+	/* We do not want to change the behavior we had before this patch*/
+	if (fc != 0x03)
+		return -EOPNOTSUPP;
+
+	/* PQAP instructions are allowed for guest kernel only */
+	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
+		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
+	/* AQIC instruction is allowed only if facility 65 is available */
+	if (!test_kvm_facility(vcpu->kvm, 65))
+		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+	/* Verify that the hook callback is registered and call it */
+	if (vcpu->kvm->arch.crypto.pqap_hook) {
+		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
+			return -EOPNOTSUPP;
+		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
+		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
+		return ret;
+	}
+	/*
+	 * It is the duty of the vfio_driver to register a hook
+	 * If it does not and we get an exception on AQIC we must
+	 * guess that there is no vfio_ap_driver at all and no one
+	 * to handle the guests's CRYCB and the CRYCB is empty.
+	 */
+	status.response_code = 0x01;
+	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
+	return 0;
+}
+
 static int handle_stfl(struct kvm_vcpu *vcpu)
 {
 	int rc;
@@ -878,6 +938,8 @@ int kvm_s390_handle_b2(struct kvm_vcpu *vcpu)
 		return handle_sthyi(vcpu);
 	case 0x7d:
 		return handle_stsi(vcpu);
+	case 0xaf:
+		return handle_pqap(vcpu);
 	case 0xb1:
 		return handle_stfl(vcpu);
 	case 0xb2:
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 76b7f98..a910be1 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -16,6 +16,7 @@
 #include <linux/mdev.h>
 #include <linux/delay.h>
 #include <linux/mutex.h>
+#include <linux/kvm_host.h>
 
 #include "ap_bus.h"
 
@@ -81,6 +82,7 @@ struct ap_matrix_mdev {
 	struct ap_matrix matrix;
 	struct notifier_block group_notifier;
 	struct kvm *kvm;
+	struct kvm_s390_module_hook pqap_hook;
 };
 
 extern int vfio_ap_mdev_register(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
  2019-03-13 16:04 ` [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
@ 2019-03-13 16:04 ` Pierre Morel
  2019-03-15 10:33   ` Cornelia Huck
  2019-03-13 16:05 ` [PATCH v5 3/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:04 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

The AP interruptions are assigned on a queue basis and
the GISA structure is handled on a VM basis, so that
we need to add a structure we can retrieve from both side
holding the information we need to handle PQAP/AQIC interception
and setup the GISA.

Since we can not add more information to the ap_device
we add a new structure vfio_ap_queue, to hold per queue
information useful to handle interruptions and set it as
driver's data of the standard ap_queue device.

Usually, the device and the mediated device are linked together
but in the vfio_ap driver design we have a bunch of "sub" devices
(the ap_queue devices) belonging to the mediated device.

Linking these structure to the mediated device it is assigned to,
with the help of the vfio_ap_queue structure will help us to
retrieve the AP devices associated with the mediated devices
during the mediated device operations.

------------    -------------
| AP queue |--> | AP_vfio_q |<----
------------    ------^------    |    ---------------
                      |          <--->| matrix_mdev |
------------    ------v------    |    ---------------
| AP queue |--> | AP_vfio_q |-----
------------    -------------

The vfio_ap_queue device will hold the following entries:
- apqn: AP queue number (defined here)
- isc : Interrupt subclass (defined later)
- nib : notification information byte (defined later)
- list: a list_head entry allowing to link this structure to a
	matrix mediated device it is assigned to.

The vfio_ap_queue structure is allocated when the vfio_ap_driver
is probed and added as driver data to the ap_queue device.
It is free on remove.

The structure is linked to the matrix_dev host device at the
probe of the device building some kind of free list for the
matrix mediated devices.

When the vfio_queue is associated to a matrix mediated device,
during assign_adapter or assign_domain,
the vfio_ap_queue device is linked to this matrix mediated device
and unlinked when dissociated.

Queuing the devices on a list of free devices and testing the
matrix_mdev pointer to the associated matrix allow us to know
if the queue is associated to the matrix device and associated
or not to a mediated device.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  31 ++-
 drivers/s390/crypto/vfio_ap_ops.c     | 390 +++++++++++++++++-----------------
 drivers/s390/crypto/vfio_ap_private.h |   7 +
 3 files changed, 234 insertions(+), 194 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index e9824c3..df6f21a 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
+/**
+ * vfio_ap_queue_dev_probe:
+ *
+ * Allocate a vfio_ap_queue structure and associate it
+ * with the device as driver_data.
+ */
 static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 {
+	struct vfio_ap_queue *q;
+
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		return -ENOMEM;
+	dev_set_drvdata(&apdev->device, q);
+	q->apqn = to_ap_queue(&apdev->device)->qid;
+	INIT_LIST_HEAD(&q->list);
+	mutex_lock(&matrix_dev->lock);
+	list_add(&q->list, &matrix_dev->free_list);
+	mutex_unlock(&matrix_dev->lock);
 	return 0;
 }
 
+/**
+ * vfio_ap_queue_dev_remove:
+ *
+ * Free the associated vfio_ap_queue structure
+ */
 static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 {
-	/* Nothing to do yet */
+	struct vfio_ap_queue *q;
+
+	q = dev_get_drvdata(&apdev->device);
+	mutex_lock(&matrix_dev->lock);
+	list_del(&q->list);
+	mutex_unlock(&matrix_dev->lock);
+	kfree(q);
 }
 
 static void vfio_ap_matrix_dev_release(struct device *dev)
@@ -107,6 +135,7 @@ static int vfio_ap_matrix_dev_create(void)
 	matrix_dev->device.bus = &matrix_bus;
 	matrix_dev->device.release = vfio_ap_matrix_dev_release;
 	matrix_dev->vfio_ap_drv = &vfio_ap_drv;
+	INIT_LIST_HEAD(&matrix_dev->free_list);
 
 	ret = device_register(&matrix_dev->device);
 	if (ret)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 900b9cf..485595c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -24,6 +24,48 @@
 #define VFIO_AP_MDEV_TYPE_HWVIRT "passthrough"
 #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
 
+/**
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
+ * @apqn: The queue APQN
+ *
+ * Retrieve a queue with a specific APQN from the list of the
+ * devices associated with a list.
+ *
+ * Returns the pointer to the associated vfio_ap_queue
+ */
+struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
+{
+	struct vfio_ap_queue *q;
+
+	list_for_each_entry(q, l, list)
+		if (q->apqn == apqn)
+			return q;
+	return NULL;
+}
+
+static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
+{
+	struct ap_queue_status status;
+	int retry = 1;
+
+	do {
+		status = ap_zapq(q->apqn);
+		switch (status.response_code) {
+		case AP_RESPONSE_NORMAL:
+			return 0;
+		case AP_RESPONSE_RESET_IN_PROGRESS:
+		case AP_RESPONSE_BUSY:
+			msleep(20);
+			break;
+		default:
+			/* things are really broken, give up */
+			return -EIO;
+		}
+	} while (retry--);
+
+	return -EBUSY;
+}
+
 static void vfio_ap_matrix_init(struct ap_config_info *info,
 				struct ap_matrix *matrix)
 {
@@ -45,6 +87,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 		return -ENOMEM;
 	}
 
+	INIT_LIST_HEAD(&matrix_mdev->qlist);
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	mutex_lock(&matrix_dev->lock);
@@ -113,162 +156,178 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
 	NULL,
 };
 
-struct vfio_ap_queue_reserved {
-	unsigned long *apid;
-	unsigned long *apqi;
-	bool reserved;
-};
+static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
+	if (!q)
+		return;
+	q->matrix_mdev = NULL;
+	vfio_ap_mdev_reset_queue(q);
+	list_move(&q->list, &matrix_dev->free_list);
+}
 
 /**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
+ * vfio_ap_put_all_domains:
  *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ *		 all available queues with a given apqi.
+ * @apid:	 The apid which associated with all defined APQI of the
+ *		 mediated device will define a AP queue.
  *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
+ * We remove the queue from the list of queues associated with the
+ * mediated device and put them back to the free list of the matrix
+ * device and clear the matrix_mdev pointer.
  */
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_put_all_domains(struct ap_matrix_mdev *matrix_mdev,
+				    int apid)
 {
-	struct vfio_ap_queue_reserved *qres = data;
-	struct ap_queue *ap_queue = to_ap_queue(dev);
-	ap_qid_t qid;
-	unsigned long id;
+	int apqi, apqn;
 
-	if (qres->apid && qres->apqi) {
-		qid = AP_MKQID(*qres->apid, *qres->apqi);
-		if (qid == ap_queue->qid)
-			qres->reserved = true;
-	} else if (qres->apid && !qres->apqi) {
-		id = AP_QID_CARD(ap_queue->qid);
-		if (id == *qres->apid)
-			qres->reserved = true;
-	} else if (!qres->apid && qres->apqi) {
-		id = AP_QID_QUEUE(ap_queue->qid);
-		if (id == *qres->apqi)
-			qres->reserved = true;
-	} else {
-		return -EINVAL;
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+		apqn = AP_MKQID(apid, apqi);
+		vfio_ap_free_queue(apqn, matrix_mdev);
 	}
-
-	return 0;
 }
 
 /**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
+ * vfio_ap_put_all_cards:
  *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ *		 all available queues with a given apqi.
+ * @apqi:	 The apqi which associated with all defined APID of the
+ *		 mediated device will define a AP queue.
  *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
+ * We remove the queue from the list of queues associated with the
+ * mediated device and put them back to the free list of the matrix
+ * device and clear the matrix_mdev pointer.
  */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-					 unsigned long *apqi)
+static void vfio_ap_put_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
 {
-	int ret;
-	struct vfio_ap_queue_reserved qres;
+	int apid, apqn;
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		apqn = AP_MKQID(apid, apqi);
+		vfio_ap_free_queue(apqn, matrix_mdev);
+	}
+}
 
-	qres.apid = apid;
-	qres.apqi = apqi;
-	qres.reserved = false;
+static void move_and_set(struct list_head *src, struct list_head *dst,
+			 struct ap_matrix_mdev *matrix_mdev)
+{
+	struct vfio_ap_queue *q, *qtmp;
 
-	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				     &qres, vfio_ap_has_queue);
-	if (ret)
-		return ret;
+	list_for_each_entry_safe(q, qtmp, src, list) {
+		list_move(&q->list, dst);
+		q->matrix_mdev = matrix_mdev;
+	}
+}
 
-	if (qres.reserved)
-		return 0;
+static int vfio_ap_queue_match(struct device *dev, void *data)
+{
+	struct ap_queue *ap;
 
-	return -EADDRNOTAVAIL;
+	ap = to_ap_queue(dev);
+	return ap->qid == *(int *)data;
 }
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apid)
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
-	int ret;
-	unsigned long apqi;
-	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
+	struct device *dev;
 
-	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(&apid, NULL);
+	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
+				 &apqn, vfio_ap_queue_match);
+	return dev ? dev_get_drvdata(dev) : NULL;
+}
 
-	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
+/**
+ * vfio_ap_get_all_domains:
+ *
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ *		 all available queues with a given apqi.
+ * @apqi:	 The apqi which associated with all defined APID of the
+ *		 mediated device will define a AP queue.
+ *
+ * We define a local list to put all queues we find on the matrix driver
+ * device list when associating the apqi with all already defined apid for
+ * this matrix mediated device.
+ *
+ * If we can get all the devices we roll them to the mediated device list
+ * If we get errors we unroll them to the free list.
+ */
+static int vfio_ap_get_all_domains(struct ap_matrix_mdev *matrix_mdev, int apid)
+{
+	int apqi, apqn;
+	int ret = 0;
+	struct vfio_ap_queue *q;
+	struct list_head q_list;
+
+	INIT_LIST_HEAD(&q_list);
+
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+		apqn = AP_MKQID(apid, apqi);
+		q = vfio_ap_find_queue(apqn);
+		if (!q) {
+			ret = -EADDRNOTAVAIL;
+			goto rewind;
+		}
+		if (q->matrix_mdev) {
+			ret = -EADDRINUSE;
+			goto rewind;
+		}
+		list_move(&q->list, &q_list);
 	}
-
+	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
 	return 0;
+rewind:
+	move_and_set(&q_list, &matrix_dev->free_list, NULL);
+	return ret;
 }
-
 /**
- * vfio_ap_mdev_verify_no_sharing
+ * vfio_ap_get_all_cards:
  *
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
- * mediated device. AP queue sharing is not allowed.
+ * @matrix_mdev: the matrix mediated device for which we want to associate
+ *		 all available queues with a given apqi.
+ * @apqi:	 The apqi which associated with all defined APID of the
+ *		 mediated device will define a AP queue.
  *
- * @matrix_mdev: the mediated matrix device
+ * We define a local list to put all queues we find on the matrix device
+ * free list when associating the apqi with all already defined apid for
+ * this matrix mediated device.
  *
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * If we can get all the devices we roll them to the mediated device list
+ * If we get errors we unroll them to the free list.
  */
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_get_all_cards(struct ap_matrix_mdev *matrix_mdev, int apqi)
 {
-	struct ap_matrix_mdev *lstdev;
-	DECLARE_BITMAP(apm, AP_DEVICES);
-	DECLARE_BITMAP(aqm, AP_DOMAINS);
-
-	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
-		if (matrix_mdev == lstdev)
-			continue;
-
-		memset(apm, 0, sizeof(apm));
-		memset(aqm, 0, sizeof(aqm));
-
-		/*
-		 * We work on full longs, as we can only exclude the leftover
-		 * bits in non-inverse order. The leftover is all zeros.
-		 */
-		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-				lstdev->matrix.apm, AP_DEVICES))
-			continue;
-
-		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
-				lstdev->matrix.aqm, AP_DOMAINS))
-			continue;
-
-		return -EADDRINUSE;
+	int apid, apqn;
+	int ret = 0;
+	struct vfio_ap_queue *q;
+	struct list_head q_list;
+	struct ap_matrix_mdev *tmp = NULL;
+
+	INIT_LIST_HEAD(&q_list);
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		apqn = AP_MKQID(apid, apqi);
+		q = vfio_ap_find_queue(apqn);
+		if (!q) {
+			ret = -EADDRNOTAVAIL;
+			goto rewind;
+		}
+		if (q->matrix_mdev) {
+			ret = -EADDRINUSE;
+			goto rewind;
+		}
+		list_move(&q->list, &q_list);
 	}
-
+	tmp = matrix_mdev;
+	move_and_set(&q_list, &matrix_mdev->qlist, matrix_mdev);
 	return 0;
+rewind:
+	move_and_set(&q_list, &matrix_dev->free_list, NULL);
+	return ret;
 }
 
 /**
@@ -330,21 +389,15 @@ static ssize_t assign_adapter_store(struct device *dev,
 	 */
 	mutex_lock(&matrix_dev->lock);
 
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
+	ret = vfio_ap_get_all_domains(matrix_mdev, apid);
 	if (ret)
 		goto done;
 
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
-	if (ret)
-		goto share_err;
-
 	ret = count;
 	goto done;
 
-share_err:
-	clear_bit_inv(apid, matrix_mdev->matrix.apm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -391,32 +444,13 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+	vfio_ap_put_all_domains(matrix_mdev, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apqi)
-{
-	int ret;
-	unsigned long apid;
-	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 /**
  * assign_domain_store
  *
@@ -471,21 +505,15 @@ static ssize_t assign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
+	ret = vfio_ap_get_all_cards(matrix_mdev, apqi);
 	if (ret)
 		goto done;
 
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
-	if (ret)
-		goto share_err;
-
 	ret = count;
 	goto done;
 
-share_err:
-	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -533,6 +561,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+	vfio_ap_put_all_cards(matrix_mdev, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -790,49 +819,22 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-				    unsigned int retry)
-{
-	struct ap_queue_status status;
-
-	do {
-		status = ap_zapq(AP_MKQID(apid, apqi));
-		switch (status.response_code) {
-		case AP_RESPONSE_NORMAL:
-			return 0;
-		case AP_RESPONSE_RESET_IN_PROGRESS:
-		case AP_RESPONSE_BUSY:
-			msleep(20);
-			break;
-		default:
-			/* things are really broken, give up */
-			return -EIO;
-		}
-	} while (retry--);
-
-	return -EBUSY;
-}
-
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 {
 	int ret;
 	int rc = 0;
-	unsigned long apid, apqi;
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+	struct vfio_ap_queue *q;
 
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
-			     matrix_mdev->matrix.apm_max + 1) {
-		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
-				     matrix_mdev->matrix.aqm_max + 1) {
-			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
-			/*
-			 * Regardless whether a queue turns out to be busy, or
-			 * is not operational, we need to continue resetting
-			 * the remaining queues.
-			 */
-			if (ret)
-				rc = ret;
-		}
+	list_for_each_entry(q, &matrix_mdev->qlist, list) {
+		ret = vfio_ap_mdev_reset_queue(q);
+		/*
+		 * Regardless whether a queue turns out to be busy, or
+		 * is not operational, we need to continue resetting
+		 * the remaining queues but notice the last error code.
+		 */
+		if (ret)
+			rc = ret;
 	}
 
 	return rc;
@@ -868,10 +870,10 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 	if (matrix_mdev->kvm)
 		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
 
+	matrix_mdev->kvm = NULL;
 	vfio_ap_mdev_reset_queues(mdev);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
-	matrix_mdev->kvm = NULL;
 	module_put(THIS_MODULE);
 }
 
@@ -905,7 +907,9 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
 		ret = vfio_ap_mdev_get_device_info(arg);
 		break;
 	case VFIO_DEVICE_RESET:
+		mutex_lock(&matrix_dev->lock);
 		ret = vfio_ap_mdev_reset_queues(mdev);
+		mutex_unlock(&matrix_dev->lock);
 		break;
 	default:
 		ret = -EOPNOTSUPP;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index a910be1..3e6940c 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -40,6 +40,7 @@ struct ap_matrix_dev {
 	atomic_t available_instances;
 	struct ap_config_info info;
 	struct list_head mdev_list;
+	struct list_head free_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
 };
@@ -83,9 +84,15 @@ struct ap_matrix_mdev {
 	struct notifier_block group_notifier;
 	struct kvm *kvm;
 	struct kvm_s390_module_hook pqap_hook;
+	struct list_head qlist;
 };
 
 extern int vfio_ap_mdev_register(void);
 extern void vfio_ap_mdev_unregister(void);
 
+struct vfio_ap_queue {
+	struct list_head list;
+	struct ap_matrix_mdev *matrix_mdev;
+	int	apqn;
+};
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 3/7] vfio: ap: register IOMMU VFIO notifier
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
  2019-03-13 16:04 ` [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
  2019-03-13 16:04 ` [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
@ 2019-03-13 16:05 ` Pierre Morel
  2019-03-13 16:05 ` [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device Pierre Morel
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:05 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

To be able to use the VFIO interface to facilitate the
mediated device memory pinning/unpinning we need to register
a notifier for IOMMU.

While we will start to pin one guest page for the interrupt indicator
byte, this is still ok with ballooning as this page will never be
used by the guest virtio-balloon driver.
So the pinned page will never be freed. And even a broken guest does
so, that would not impact the host as the original page is still
in control by vfio.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 53 ++++++++++++++++++++++++++++++++---
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 485595c..0f8952c23 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -757,6 +757,36 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 };
 
 /**
+ * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
+ *
+ * @nb: The notifier block
+ * @action: Action to be taken
+ * @data: data associated with the request
+ *
+ * For an UNMAP request, unpin the guest IOVA (the NIB guest address we
+ * pinned before). Other requests are ignored.
+ *
+ */
+static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct ap_matrix_mdev *matrix_mdev;
+
+	matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier);
+
+	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
+		struct vfio_iommu_type1_dma_unmap *unmap = data;
+		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
+
+		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
+		return NOTIFY_OK;
+	}
+
+	return NOTIFY_DONE;
+}
+
+
+/**
  * vfio_ap_mdev_set_kvm
  *
  * @matrix_mdev: a mediated matrix device
@@ -855,12 +885,25 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 
 	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				     &events, &matrix_mdev->group_notifier);
-	if (ret) {
-		module_put(THIS_MODULE);
-		return ret;
-	}
+	if (ret)
+		goto err_group;
+
+	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
+	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+
+	ret = vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				     &events, &matrix_mdev->iommu_notifier);
+	if (ret)
+		goto err_iommu;
 
 	return 0;
+
+err_iommu:
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
+				 &matrix_mdev->group_notifier);
+err_group:
+	module_put(THIS_MODULE);
+	return ret;
 }
 
 static void vfio_ap_mdev_release(struct mdev_device *mdev)
@@ -872,6 +915,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 
 	matrix_mdev->kvm = NULL;
 	vfio_ap_mdev_reset_queues(mdev);
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				 &matrix_mdev->iommu_notifier);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
 	module_put(THIS_MODULE);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 3e6940c..4a287c8 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -82,9 +82,11 @@ struct ap_matrix_mdev {
 	struct list_head node;
 	struct ap_matrix matrix;
 	struct notifier_block group_notifier;
+	struct notifier_block iommu_notifier;
 	struct kvm *kvm;
 	struct kvm_s390_module_hook pqap_hook;
 	struct list_head qlist;
+	struct mdev_device *mdev;
 };
 
 extern int vfio_ap_mdev_register(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (2 preceding siblings ...)
  2019-03-13 16:05 ` [PATCH v5 3/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
@ 2019-03-13 16:05 ` Pierre Morel
  2019-03-15 18:15   ` Halil Pasic
  2019-03-13 16:05 ` [PATCH v5 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:05 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

When the mediated device is open we setup the relation with KVM unset it
when the mediated device is released.

We ensure KVM is present on opening of the mediated device.

We ensure that KVM survives the mediated device, and establish a direct
link from KVM to the mediated device to simplify the relationship.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 80 ++++++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 0f8952c23..6b559ca 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -790,7 +790,6 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
  * vfio_ap_mdev_set_kvm
  *
  * @matrix_mdev: a mediated matrix device
- * @kvm: reference to KVM instance
  *
  * Verifies no other mediated matrix device has @kvm and sets a reference to
  * it in @matrix_mdev->kvm.
@@ -798,53 +797,39 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
  * Return 0 if no other mediated matrix device has a reference to @kvm;
  * otherwise, returns an -EPERM.
  */
-static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
-				struct kvm *kvm)
+static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
 {
-	struct ap_matrix_mdev *m;
-
 	mutex_lock(&matrix_dev->lock);
+	if (matrix_mdev->kvm->arch.crypto.pqap_hook)
+		goto err_unlock;
 
-	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-		if ((m != matrix_mdev) && (m->kvm == kvm)) {
-			mutex_unlock(&matrix_dev->lock);
-			return -EPERM;
-		}
-	}
+	if (!matrix_mdev->kvm->arch.crypto.crycbd)
+		goto err_unlock;
 
-	matrix_mdev->kvm = kvm;
-	mutex_unlock(&matrix_dev->lock);
+	matrix_mdev->kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
 
+	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
+				  matrix_mdev->matrix.aqm,
+				  matrix_mdev->matrix.adm);
+	kvm_get_kvm(matrix_mdev->kvm);
+	mutex_unlock(&matrix_dev->lock);
 	return 0;
+
+err_unlock:
+	mutex_unlock(&matrix_dev->lock);
+	return -EPERM;
 }
 
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 				       unsigned long action, void *data)
 {
-	int ret;
 	struct ap_matrix_mdev *matrix_mdev;
 
 	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
 		return NOTIFY_OK;
 
 	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
-
-	if (!data) {
-		matrix_mdev->kvm = NULL;
-		return NOTIFY_OK;
-	}
-
-	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
-	if (ret)
-		return NOTIFY_DONE;
-
-	/* If there is no CRYCB pointer, then we can't copy the masks */
-	if (!matrix_mdev->kvm->arch.crypto.crycbd)
-		return NOTIFY_DONE;
-
-	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
-				  matrix_mdev->matrix.aqm,
-				  matrix_mdev->matrix.adm);
+	matrix_mdev->kvm = data;
 
 	return NOTIFY_OK;
 }
@@ -888,6 +873,12 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 	if (ret)
 		goto err_group;
 
+	/* We do not support opening the mediated device without KVM */
+	if (!matrix_mdev->kvm) {
+		ret = -ENODEV;
+		goto err_group;
+	}
+
 	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
 	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
 
@@ -896,8 +887,15 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 	if (ret)
 		goto err_iommu;
 
+	ret = vfio_ap_mdev_set_kvm(matrix_mdev);
+	if (ret)
+		goto err_kvm;
+
 	return 0;
 
+err_kvm:
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				 &matrix_mdev->iommu_notifier);
 err_iommu:
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
@@ -906,19 +904,33 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
 	return ret;
 }
 
-static void vfio_ap_mdev_release(struct mdev_device *mdev)
+static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 {
-	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+	struct kvm *kvm = matrix_mdev->kvm;
 
 	if (matrix_mdev->kvm)
 		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-
+	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
 	matrix_mdev->kvm = NULL;
+
+	kvm_put_kvm(kvm);
+	return 0;
+}
+
+static void vfio_ap_mdev_release(struct mdev_device *mdev)
+{
+	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+	mutex_lock(&matrix_dev->lock);
+
 	vfio_ap_mdev_reset_queues(mdev);
+	vfio_ap_mdev_unset_kvm(matrix_mdev);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
 				 &matrix_mdev->iommu_notifier);
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 				 &matrix_mdev->group_notifier);
+	mutex_unlock(&matrix_dev->lock);
 	module_put(THIS_MODULE);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 5/7] s390: ap: implement PAPQ AQIC interception in kernel
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (3 preceding siblings ...)
  2019-03-13 16:05 ` [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device Pierre Morel
@ 2019-03-13 16:05 ` Pierre Morel
  2019-03-13 16:05 ` [PATCH v5 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
  2019-03-13 16:05 ` [PATCH v5 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest Pierre Morel
  6 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:05 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

We register the AP PQAP instruction hook during the open
of the mediated device. And unregister it on release.

In the AP PQAP instruction hook, if we receive a demand to
enable IRQs,
- we retrieve the vfio_ap_queue based on the APQN we receive
  in REG1,
- we retrieve the page of the guest address, (NIB), from
  register REG2
- we the mediated device to use the VFIO pinning infratrsucture
  to pin the page of the guest address,
- we retrieve the pointer to KVM to register the guest ISC
  and retrieve the host ISC
- finaly we activate GISA

If we receive a demand to disable IRQs,
- we deactivate GISA
- unregister from the GIB
- unping the NIB

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.h          |   1 +
 drivers/s390/crypto/vfio_ap_drv.c     |   2 +
 drivers/s390/crypto/vfio_ap_ops.c     | 204 +++++++++++++++++++++++++++++++++-
 drivers/s390/crypto/vfio_ap_private.h |   6 +
 4 files changed, 210 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index d0059ea..9a4fd96 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -43,6 +43,7 @@ static inline int ap_test_bit(unsigned int *ptr, unsigned int nr)
 #define AP_RESPONSE_BUSY		0x05
 #define AP_RESPONSE_INVALID_ADDRESS	0x06
 #define AP_RESPONSE_OTHERWISE_CHANGED	0x07
+#define AP_RESPONSE_INVALID_GISA	0x08
 #define AP_RESPONSE_Q_FULL		0x10
 #define AP_RESPONSE_NO_PENDING_REPLY	0x10
 #define AP_RESPONSE_INDEX_TOO_BIG	0x11
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index df6f21a..796e73d4 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -55,6 +55,8 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 		return -ENOMEM;
 	dev_set_drvdata(&apdev->device, q);
 	q->apqn = to_ap_queue(&apdev->device)->qid;
+	q->a_isc = VFIO_AP_ISC_INVALID;
+	q->p_isc = VFIO_AP_ISC_INVALID;
 	INIT_LIST_HEAD(&q->list);
 	mutex_lock(&matrix_dev->lock);
 	list_add(&q->list, &matrix_dev->free_list);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6b559ca..dc3454f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -66,6 +66,194 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
 	return -EBUSY;
 }
 
+/**
+ * vfio_ap_free_irq:
+ * @q: The vfio_ap_queue
+ *
+ * Unpin the guest NIB
+ * Unregister the ISC from the GIB alert
+ * Clear the vfio_ap_queue intern fields
+ */
+static void vfio_ap_free_irq(struct vfio_ap_queue *q)
+{
+	if (!q)
+		return;
+	if (q->a_pfn)
+		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->a_pfn, 1);
+	if (q->a_isc != VFIO_AP_ISC_INVALID)
+		kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->a_isc);
+	q->a_pfn = 0;
+	q->p_pfn = 0;
+	q->a_isc = VFIO_AP_ISC_INVALID;
+	q->p_isc = VFIO_AP_ISC_INVALID;
+}
+
+/**
+ * vfio_ap_clrirq: Disable Interruption for a APQN
+ *
+ * @dev: the device associated with the ap_queue
+ * @q:   the vfio_ap_queue holding AQIC parameters
+ *
+ * Issue the host side PQAP/AQIC
+ * On success: unpin the NIB saved in *q and unregister from GIB
+ * interface
+ *
+ * Return the ap_queue_status returned by the ap_aqic()
+ */
+static struct ap_queue_status vfio_ap_clrirq(struct vfio_ap_queue *q)
+{
+	struct ap_qirq_ctrl aqic_gisa = {};
+	struct ap_queue_status status;
+
+	status = ap_aqic(q->apqn, aqic_gisa, NULL);
+	if (!status.response_code)
+		vfio_ap_free_irq(q);
+
+	return status;
+}
+
+/**
+ * vfio_ap_setirq: Enable Interruption for a APQN
+ *
+ * @dev: the device associated with the ap_queue
+ * @q:   the vfio_ap_queue holding AQIC parameters
+ *
+ * Pin the NIB saved in *q
+ * Register the guest ISC to GIB interface and retrieve the
+ * host ISC to issue the host side PQAP/AQIC
+ *
+ * Response.status may be set to following Response Code in case of error:
+ * - AP_RESPONSE_INVALID_ADDRESS: vfio_pin_pages failed
+ * - AP_RESPONSE_OTHERWISE_CHANGED: Hypervizor GISA internal error
+ *
+ * Otherwise return the ap_queue_status returned by the ap_aqic()
+ */
+static struct ap_queue_status vfio_ap_setirq(struct vfio_ap_queue *q)
+{
+	struct ap_qirq_ctrl aqic_gisa = {};
+	struct ap_queue_status status = {};
+	struct kvm_s390_gisa *gisa;
+	struct kvm *kvm;
+	unsigned long h_nib, h_pfn;
+	int ret;
+
+	kvm = q->matrix_mdev->kvm;
+	gisa = kvm->arch.gisa_int.origin;
+
+	q->a_pfn = q->a_nib >> PAGE_SHIFT;
+	ret = vfio_pin_pages(mdev_dev(q->matrix_mdev->mdev), &q->a_pfn, 1,
+			     IOMMU_READ | IOMMU_WRITE, &h_pfn);
+	switch (ret) {
+	case 1:
+		break;
+	case -EINVAL:
+	case -E2BIG:
+		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
+		/* Fallthrough */
+	default:
+		return status;
+	}
+
+	h_nib = (h_pfn << PAGE_SHIFT) | (q->a_nib & ~PAGE_MASK);
+	aqic_gisa.gisc = q->a_isc;
+	aqic_gisa.isc = kvm_s390_gisc_register(kvm, q->a_isc);
+	aqic_gisa.ir = 1;
+	aqic_gisa.gisa = gisa->next_alert >> 4;
+
+	status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
+	switch (status.response_code) {
+	case AP_RESPONSE_NORMAL:
+		/* See if we did clear older IRQ configuration */
+		if (q->p_pfn)
+			vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
+					 &q->p_pfn, 1);
+		if (q->p_isc != VFIO_AP_ISC_INVALID)
+			kvm_s390_gisc_unregister(kvm, q->p_isc);
+		q->p_pfn = q->a_pfn;
+		q->p_isc = q->a_isc;
+		break;
+	case AP_RESPONSE_OTHERWISE_CHANGED:
+		/* We could not modify IRQ setings: clear new configuration */
+		vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), &q->a_pfn, 1);
+		kvm_s390_gisc_unregister(kvm, q->a_isc);
+		break;
+	case AP_RESPONSE_INVALID_GISA:
+		status.response_code = AP_RESPONSE_INVALID_ADDRESS;
+	default:	/* Fall Through */
+		pr_warn("%s: apqn %04x: response: %02x\n", __func__, q->apqn,
+			status.response_code);
+		vfio_ap_free_irq(q);
+		break;
+	}
+
+	return status;
+}
+
+/**
+ * handle_pqap: PQAP instruction callback
+ *
+ * @vcpu: The vcpu on which we received the PQAP instruction
+ *
+ * Get the general register contents to initialize internal variables.
+ * REG[0]: APQN
+ * REG[1]: IR and ISC
+ * REG[2]: NIB
+ *
+ * Response.status may be set to following Response Code:
+ * - AP_RESPONSE_Q_NOT_AVAIL: if the queue is not available
+ * - AP_RESPONSE_DECONFIGURED: if the queue is not configured
+ * - AP_RESPONSE_NORMAL (0) : in case of successs
+ *   Check vfio_ap_setirq() and vfio_ap_clrirq() for other possible RC.
+ * We take the matrix_dev lock to ensure serialization on queues and
+ * mediated device access.
+ *
+ * Return 0 if we could handle the request inside KVM.
+ * otherwise, returns -EOPNOTSUPP to let QEMU handle the fault.
+ */
+static int handle_pqap(struct kvm_vcpu *vcpu)
+{
+	uint64_t status;
+	uint16_t apqn;
+	struct vfio_ap_queue *q;
+	struct ap_queue_status qstatus = {
+			       .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
+	struct ap_matrix_mdev *matrix_mdev;
+
+	/* If we do not use the AIV facility just go to userland */
+	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
+		return -EOPNOTSUPP;
+
+	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
+	mutex_lock(&matrix_dev->lock);
+	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
+				   struct ap_matrix_mdev, pqap_hook);
+	if (!matrix_mdev)
+		goto out_unlock;
+	q = vfio_ap_get_queue(apqn, &matrix_mdev->qlist);
+	if (!q)
+		goto out_noqueue;
+
+	status = vcpu->run->s.regs.gprs[1];
+
+	/* If IR bit(16) is set we enable the interrupt */
+	if ((status >> (63 - 16)) & 0x01) {
+		q->a_isc = status & 0x07;
+		q->a_nib = vcpu->run->s.regs.gprs[2];
+		qstatus = vfio_ap_setirq(q);
+		if (qstatus.response_code) {
+			q->a_nib = 0;
+			q->a_isc = VFIO_AP_ISC_INVALID;
+		}
+	} else
+		qstatus = vfio_ap_clrirq(q);
+
+out_noqueue:
+	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
+out_unlock:
+	mutex_unlock(&matrix_dev->lock);
+	return 0;
+}
+
 static void vfio_ap_matrix_init(struct ap_config_info *info,
 				struct ap_matrix *matrix)
 {
@@ -88,8 +276,11 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	}
 
 	INIT_LIST_HEAD(&matrix_mdev->qlist);
+	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
 	mdev_set_drvdata(mdev, matrix_mdev);
+	matrix_mdev->pqap_hook.hook = handle_pqap;
+	matrix_mdev->pqap_hook.owner = THIS_MODULE;
 	mutex_lock(&matrix_dev->lock);
 	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
 	mutex_unlock(&matrix_dev->lock);
@@ -100,11 +291,17 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+	struct vfio_ap_queue *q, *qtmp;
 
 	if (matrix_mdev->kvm)
 		return -EBUSY;
 
 	mutex_lock(&matrix_dev->lock);
+	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
+		q->matrix_mdev = NULL;
+		vfio_ap_mdev_reset_queue(q);
+		list_move(&q->list, &matrix_dev->free_list);
+	}
 	list_del(&matrix_mdev->node);
 	mutex_unlock(&matrix_dev->lock);
 
@@ -756,7 +953,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 	NULL
 };
 
-/**
+ /*
  * vfio_ap_mdev_iommu_notifier: IOMMU notifier callback
  *
  * @nb: The notifier block
@@ -776,9 +973,10 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 
 	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
 		struct vfio_iommu_type1_dma_unmap *unmap = data;
-		unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
+		unsigned long pfn = unmap->iova >> PAGE_SHIFT;
 
-		vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &g_pfn, 1);
+		if (matrix_mdev->mdev)
+			vfio_unpin_pages(mdev_dev(matrix_mdev->mdev), &pfn, 1);
 		return NOTIFY_OK;
 	}
 
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4a287c8..968d8aa 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -95,6 +95,12 @@ extern void vfio_ap_mdev_unregister(void);
 struct vfio_ap_queue {
 	struct list_head list;
 	struct ap_matrix_mdev *matrix_mdev;
+	unsigned long a_nib;
+	unsigned long a_pfn;
+	unsigned long p_pfn;
 	int	apqn;
+#define VFIO_AP_ISC_INVALID 0xff
+	unsigned char a_isc;
+	unsigned char p_isc;
 };
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 6/7] s390: ap: Cleanup on removing the AP device
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (4 preceding siblings ...)
  2019-03-13 16:05 ` [PATCH v5 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
@ 2019-03-13 16:05 ` Pierre Morel
  2019-03-13 16:05 ` [PATCH v5 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest Pierre Morel
  6 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:05 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

When a AP device is remove, clear the queue's APID bit in the guest CRYCB.
to be sure that the guest will not access the AP queue anymore.

Then we clear the interruptions and reset the AP device properly.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     | 36 +++++++++++++++++++++++++++++++++++
 drivers/s390/crypto/vfio_ap_ops.c     | 16 +++++++++++++---
 drivers/s390/crypto/vfio_ap_private.h |  3 +++
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 796e73d4..850ba6e 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -5,6 +5,7 @@
  * Copyright IBM Corp. 2018
  *
  * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
+ *	      Pierre Morel <pmorel@linux.ibm.com>
  */
 
 #include <linux/module.h>
@@ -12,6 +13,8 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <asm/facility.h>
+#include <linux/bitops.h>
+#include <linux/kvm_host.h>
 #include "vfio_ap_private.h"
 
 #define VFIO_AP_ROOT_NAME "vfio_ap"
@@ -65,6 +68,33 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 }
 
 /**
+ * vfio_ap_update_crycb
+ * @q: A pointer to the queue being removed
+ *
+ * We clear the APID of the queue, making this queue unusable for the guest.
+ * After this function we can reset the queue without to fear a race with
+ * the guest to access the queue again.
+ * We do not fear race with the host as we still get the device.
+ */
+static void vfio_ap_update_crycb(struct vfio_ap_queue *q)
+{
+	struct ap_matrix_mdev *matrix_mdev = q->matrix_mdev;
+
+	if (!matrix_mdev)
+		return;
+
+	clear_bit_inv(AP_QID_CARD(q->apqn), matrix_mdev->matrix.apm);
+
+	if (!matrix_mdev->kvm)
+		return;
+
+	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+				  matrix_mdev->matrix.apm,
+				  matrix_mdev->matrix.aqm,
+				  matrix_mdev->matrix.adm);
+}
+
+/**
  * vfio_ap_queue_dev_remove:
  *
  * Free the associated vfio_ap_queue structure
@@ -74,7 +104,13 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 	struct vfio_ap_queue *q;
 
 	q = dev_get_drvdata(&apdev->device);
+	if (!q)
+		return;
+
 	mutex_lock(&matrix_dev->lock);
+	vfio_ap_update_crycb(q);
+	vfio_ap_mdev_reset_queue(q);
+	vfio_ap_free_irq(q);
 	list_del(&q->list);
 	mutex_unlock(&matrix_dev->lock);
 	kfree(q);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index dc3454f..dc3caa7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -43,15 +43,22 @@ struct vfio_ap_queue *vfio_ap_get_queue(int apqn, struct list_head *l)
 	return NULL;
 }
 
-static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
+int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
 {
 	struct ap_queue_status status;
-	int retry = 1;
+	int retry = 20;
 
 	do {
 		status = ap_zapq(q->apqn);
 		switch (status.response_code) {
 		case AP_RESPONSE_NORMAL:
+			while (!status.queue_empty && retry--) {
+				msleep(20);
+				status = ap_tapq(q->apqn, NULL);
+			}
+			if (retry <= 0)
+				pr_warn("%s: queue 0x%04x not empty\n",
+					__func__, q->apqn);
 			return 0;
 		case AP_RESPONSE_RESET_IN_PROGRESS:
 		case AP_RESPONSE_BUSY:
@@ -74,7 +81,7 @@ static int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q)
  * Unregister the ISC from the GIB alert
  * Clear the vfio_ap_queue intern fields
  */
-static void vfio_ap_free_irq(struct vfio_ap_queue *q)
+void vfio_ap_free_irq(struct vfio_ap_queue *q)
 {
 	if (!q)
 		return;
@@ -300,6 +307,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 	list_for_each_entry_safe(q, qtmp, &matrix_mdev->qlist, list) {
 		q->matrix_mdev = NULL;
 		vfio_ap_mdev_reset_queue(q);
+		vfio_ap_free_irq(q);
 		list_move(&q->list, &matrix_dev->free_list);
 	}
 	list_del(&matrix_mdev->node);
@@ -362,6 +370,7 @@ static void vfio_ap_free_queue(int apqn, struct ap_matrix_mdev *matrix_mdev)
 		return;
 	q->matrix_mdev = NULL;
 	vfio_ap_mdev_reset_queue(q);
+	vfio_ap_free_irq(q);
 	list_move(&q->list, &matrix_dev->free_list);
 }
 
@@ -1041,6 +1050,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 
 	list_for_each_entry(q, &matrix_mdev->qlist, list) {
 		ret = vfio_ap_mdev_reset_queue(q);
+		vfio_ap_free_irq(q);
 		/*
 		 * Regardless whether a queue turns out to be busy, or
 		 * is not operational, we need to continue resetting
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 968d8aa..9fe580b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -4,6 +4,7 @@
  *
  * Author(s): Tony Krowiak <akrowiak@linux.ibm.com>
  *	      Halil Pasic <pasic@linux.ibm.com>
+ *	      Pierre Morel <pmorel@linux.ibm.com>
  *
  * Copyright IBM Corp. 2018
  */
@@ -103,4 +104,6 @@ struct vfio_ap_queue {
 	unsigned char a_isc;
 	unsigned char p_isc;
 };
+void vfio_ap_free_irq(struct vfio_ap_queue *q);
+int vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q);
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest
  2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
                   ` (5 preceding siblings ...)
  2019-03-13 16:05 ` [PATCH v5 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
@ 2019-03-13 16:05 ` Pierre Morel
  6 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-13 16:05 UTC (permalink / raw)
  To: borntraeger
  Cc: alex.williamson, cohuck, linux-kernel, linux-s390, kvm, frankja,
	akrowiak, pasic, david, schwidefsky, heiko.carstens, freude,
	mimu

AP Queue Interruption Control (AQIC) facility gives
the guest the possibility to control interruption for
the Cryptographic Adjunct Processor queues.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 arch/s390/tools/gen_facilities.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index 04c5f07..be4b826 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -111,6 +111,7 @@ static struct facility_def facility_defs[] = {
 		.bits = (int[]){
 			12, /* AP Query Configuration Information */
 			15, /* AP Facilities Test */
+			65, /* AP Queue Interruption Control */
 			156, /* etoken facility */
 			-1  /* END */
 		}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-13 16:04 ` [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
@ 2019-03-15 10:20   ` Cornelia Huck
  2019-03-15 13:26     ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Cornelia Huck @ 2019-03-15 10:20 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On Wed, 13 Mar 2019 17:04:58 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> +/*
> + * handle_pqap: Handling pqap interception
> + * @vcpu: the vcpu having issue the pqap instruction
> + *
> + * We now support PQAP/AQIC instructions and we need to correctly
> + * answer the guest even if no dedicated driver's hook is available.
> + *
> + * The intercepting code calls a dedicated callback for this instruction
> + * if a driver did register one in the CRYPTO satellite of the
> + * SIE block.
> + *
> + * For PQAP/AQIC instructions only, verify privilege and specifications.
> + *
> + * If no callback available, the queues are not available, return this to
> + * the caller.
> + * Else return the value returned by the callback.
> + */
> +static int handle_pqap(struct kvm_vcpu *vcpu)
> +{
> +	uint8_t fc;
> +	struct ap_queue_status status = {};
> +	int ret;
> +	/* Verify that the AP instruction are available */
> +	if (!ap_instructions_available())
> +		return -EOPNOTSUPP;
> +	/* Verify that the guest is allowed to use AP instructions */
> +	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> +		return -EOPNOTSUPP;
> +	/* Verify that the function code is AQIC */
> +	fc = vcpu->run->s.regs.gprs[0] >> 24;
> +	/* We do not want to change the behavior we had before this patch*/
> +	if (fc != 0x03)
> +		return -EOPNOTSUPP;
> +
> +	/* PQAP instructions are allowed for guest kernel only */
> +	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> +		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> +	/* AQIC instruction is allowed only if facility 65 is available */
> +	if (!test_kvm_facility(vcpu->kvm, 65))
> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> +	/* Verify that the hook callback is registered and call it */
> +	if (vcpu->kvm->arch.crypto.pqap_hook) {
> +		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> +			return -EOPNOTSUPP;
> +		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> +		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> +		return ret;
> +	}
> +	/*
> +	 * It is the duty of the vfio_driver to register a hook
> +	 * If it does not and we get an exception on AQIC we must
> +	 * guess that there is no vfio_ap_driver at all and no one
> +	 * to handle the guests's CRYCB and the CRYCB is empty.
> +	 */
> +	status.response_code = 0x01;

I'm still confused here, sorry. From previous discussions I recall that
this indicates "no crypto device" (please correct me if I'm wrong.)

Before this patch, we had:
- guest issues PQAP/AQIC -> drop to userspace

With a correct implementation, we get:
- guest issues PQAP/AQIC -> callback does what needs to be done

With an incorrect implementation (no callback), we get:
- guest issues PQAP/AQIC -> guest gets response code 0x01

Why not drop to userspace in that case?


> +	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure
  2019-03-13 16:04 ` [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
@ 2019-03-15 10:33   ` Cornelia Huck
  2019-03-15 13:29     ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Cornelia Huck @ 2019-03-15 10:33 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On Wed, 13 Mar 2019 17:04:59 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index e9824c3..df6f21a 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
>  
>  MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>  
> +/**
> + * vfio_ap_queue_dev_probe:
> + *
> + * Allocate a vfio_ap_queue structure and associate it
> + * with the device as driver_data.
> + */
>  static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>  {
> +	struct vfio_ap_queue *q;
> +
> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
> +	if (!q)
> +		return -ENOMEM;
> +	dev_set_drvdata(&apdev->device, q);
> +	q->apqn = to_ap_queue(&apdev->device)->qid;
> +	INIT_LIST_HEAD(&q->list);
> +	mutex_lock(&matrix_dev->lock);
> +	list_add(&q->list, &matrix_dev->free_list);
> +	mutex_unlock(&matrix_dev->lock);

From what I can see, dealing with the free_list is supposed to be
protected by the matrix_dev mutex, and at a glance, it indeed seems to
be held every time you interact with the list. I think it would be good
to document that with a comment.

(I have not reviewed this deeply, and I won't be able to look at this
more until April, sorry.)


>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 10:20   ` Cornelia Huck
@ 2019-03-15 13:26     ` Pierre Morel
  2019-03-15 13:41       ` Cornelia Huck
                         ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-15 13:26 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 15/03/2019 11:20, Cornelia Huck wrote:
> On Wed, 13 Mar 2019 17:04:58 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> +/*
>> + * handle_pqap: Handling pqap interception
>> + * @vcpu: the vcpu having issue the pqap instruction
>> + *
>> + * We now support PQAP/AQIC instructions and we need to correctly
>> + * answer the guest even if no dedicated driver's hook is available.
>> + *
>> + * The intercepting code calls a dedicated callback for this instruction
>> + * if a driver did register one in the CRYPTO satellite of the
>> + * SIE block.
>> + *
>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>> + *
>> + * If no callback available, the queues are not available, return this to
>> + * the caller.
>> + * Else return the value returned by the callback.
>> + */
>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>> +{
>> +	uint8_t fc;
>> +	struct ap_queue_status status = {};
>> +	int ret;
>> +	/* Verify that the AP instruction are available */
>> +	if (!ap_instructions_available())
>> +		return -EOPNOTSUPP;
>> +	/* Verify that the guest is allowed to use AP instructions */
>> +	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>> +		return -EOPNOTSUPP;
>> +	/* Verify that the function code is AQIC */
>> +	fc = vcpu->run->s.regs.gprs[0] >> 24;
>> +	/* We do not want to change the behavior we had before this patch*/
>> +	if (fc != 0x03)
>> +		return -EOPNOTSUPP;
>> +
>> +	/* PQAP instructions are allowed for guest kernel only */
>> +	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>> +		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>> +	/* AQIC instruction is allowed only if facility 65 is available */
>> +	if (!test_kvm_facility(vcpu->kvm, 65))
>> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>> +	/* Verify that the hook callback is registered and call it */
>> +	if (vcpu->kvm->arch.crypto.pqap_hook) {
>> +		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>> +			return -EOPNOTSUPP;
>> +		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>> +		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>> +		return ret;
>> +	}
>> +	/*
>> +	 * It is the duty of the vfio_driver to register a hook
>> +	 * If it does not and we get an exception on AQIC we must
>> +	 * guess that there is no vfio_ap_driver at all and no one
>> +	 * to handle the guests's CRYCB and the CRYCB is empty.
>> +	 */
>> +	status.response_code = 0x01;
> 
> I'm still confused here, sorry. From previous discussions I recall that
> this indicates "no crypto device" (please correct me if I'm wrong.)
> 
> Before this patch, we had:
> - guest issues PQAP/AQIC -> drop to userspace
> 
> With a correct implementation, we get:
> - guest issues PQAP/AQIC -> callback does what needs to be done
> 
> With an incorrect implementation (no callback), we get:
> - guest issues PQAP/AQIC -> guest gets response code 0x01
> 
> Why not drop to userspace in that case?

This is what I had in the previous patches.
Hum, I do not remember which discussion lead me to modify this.

Anyway, now that you put the finger on this problem, I think the problem 
is worse.

The behavior with old / new Linux, vfio driver and qemu is:

LINUX	VFIO_AP	QEMU	PGM
OLD	x	x	OPERATION
NEW	-	OLD	SPECIFICATION
NEW	-	NEW/aqic=off	SPECIFICATION
NEW	x	NEW/aqic=on	-

x = whatever
- = absent/none

So yes there is a change in behavior for the userland for the case QEMU 
do not set the AQIC facility 65, OLD QEMU or NEW QEMU wanting to behave 
like an older one.

I fear we have the same problem with the privileged operation...

For the last case, when the kvm_facility(65) is set, the explication is 
the following:

This is related to the handling of PQAP AQIC which is now authorized by 
this patch series.
If we authorize PQAP AQIC, by setting the bit for facility 65, the guest 
can use this instruction.
If the instruction follows the specifications we must answer something 
realistic and since there is nothing in the CRYCB (no driver) we answer 
that there is no queue.

Conclusion:  we must handle this in userland, it will have the benefit 
to keep old behavior when there is no callback.
OLD QEMU will not see change as they will not set aqic facility
NEW QEMU will handle this correctly.

In this case we also do not need to handle all other tests here but can 
move it to the callback as Tony wanted.

Would you agree with something simple like:

static int handle_pqap(struct kvm_vcpu *vcpu)
{
         int ret = -EOPNOTSUPP;

         /* Verify that the hook callback is registered and call it */
         if (pqap_hook)
                 if (try_module_get(pqap_hook->owner)) {
                         ret = pqap_hook->hook(vcpu);
                         module_put(pqap_hook->owner);
                 }
         return ret;
}

All other tests in QEMU and in the callback.

Thanks for the comments.

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure
  2019-03-15 10:33   ` Cornelia Huck
@ 2019-03-15 13:29     ` Pierre Morel
  0 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-15 13:29 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 15/03/2019 11:33, Cornelia Huck wrote:
> On Wed, 13 Mar 2019 17:04:59 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index e9824c3..df6f21a 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -40,14 +40,42 @@ static struct ap_device_id ap_queue_ids[] = {
>>   
>>   MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
>>   
>> +/**
>> + * vfio_ap_queue_dev_probe:
>> + *
>> + * Allocate a vfio_ap_queue structure and associate it
>> + * with the device as driver_data.
>> + */
>>   static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
>>   {
>> +	struct vfio_ap_queue *q;
>> +
>> +	q = kzalloc(sizeof(*q), GFP_KERNEL);
>> +	if (!q)
>> +		return -ENOMEM;
>> +	dev_set_drvdata(&apdev->device, q);
>> +	q->apqn = to_ap_queue(&apdev->device)->qid;
>> +	INIT_LIST_HEAD(&q->list);
>> +	mutex_lock(&matrix_dev->lock);
>> +	list_add(&q->list, &matrix_dev->free_list);
>> +	mutex_unlock(&matrix_dev->lock);
> 
>  From what I can see, dealing with the free_list is supposed to be
> protected by the matrix_dev mutex, and at a glance, it indeed seems to
> be held every time you interact with the list. I think it would be good
> to document that with a comment.

Yes.
It is a big lock but only on administrative tasks so I think it is harmless.

> 
> (I have not reviewed this deeply, and I won't be able to look at this
> more until April, sorry.)

OK.
Thanks for the comments.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 13:26     ` Pierre Morel
@ 2019-03-15 13:41       ` Cornelia Huck
  2019-03-15 13:44         ` Pierre Morel
  2019-03-15 14:10       ` Pierre Morel
  2019-03-15 17:28       ` Halil Pasic
  2 siblings, 1 reply; 29+ messages in thread
From: Cornelia Huck @ 2019-03-15 13:41 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On Fri, 15 Mar 2019 14:26:34 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> Conclusion:  we must handle this in userland, it will have the benefit 
> to keep old behavior when there is no callback.
> OLD QEMU will not see change as they will not set aqic facility
> NEW QEMU will handle this correctly.
> 
> In this case we also do not need to handle all other tests here but can 
> move it to the callback as Tony wanted.
> 
> Would you agree with something simple like:
> 
> static int handle_pqap(struct kvm_vcpu *vcpu)
> {
>          int ret = -EOPNOTSUPP;
> 
>          /* Verify that the hook callback is registered and call it */
>          if (pqap_hook)
>                  if (try_module_get(pqap_hook->owner)) {
>                          ret = pqap_hook->hook(vcpu);
>                          module_put(pqap_hook->owner);
>                  }
>          return ret;
> }
> 
> All other tests in QEMU and in the callback.

With the hook checking for priv, fc, etc.? Yeah, might work.

But don't count on my feedback too much right now, better wait for
others' comments :) I'll resume in April, if needed.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 13:41       ` Cornelia Huck
@ 2019-03-15 13:44         ` Pierre Morel
  0 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-15 13:44 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 15/03/2019 14:41, Cornelia Huck wrote:
> On Fri, 15 Mar 2019 14:26:34 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> Conclusion:  we must handle this in userland, it will have the benefit
>> to keep old behavior when there is no callback.
>> OLD QEMU will not see change as they will not set aqic facility
>> NEW QEMU will handle this correctly.
>>
>> In this case we also do not need to handle all other tests here but can
>> move it to the callback as Tony wanted.
>>
>> Would you agree with something simple like:
>>
>> static int handle_pqap(struct kvm_vcpu *vcpu)
>> {
>>           int ret = -EOPNOTSUPP;
>>
>>           /* Verify that the hook callback is registered and call it */
>>           if (pqap_hook)
>>                   if (try_module_get(pqap_hook->owner)) {
>>                           ret = pqap_hook->hook(vcpu);
>>                           module_put(pqap_hook->owner);
>>                   }
>>           return ret;
>> }
>>
>> All other tests in QEMU and in the callback.
> 
> With the hook checking for priv, fc, etc.? Yeah, might work.
> 
> But don't count on my feedback too much right now, better wait for
> others' comments :) I'll resume in April, if needed.
> 

OK, thanks

Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 13:26     ` Pierre Morel
  2019-03-15 13:41       ` Cornelia Huck
@ 2019-03-15 14:10       ` Pierre Morel
  2019-03-15 17:43         ` Halil Pasic
  2019-03-19  9:55         ` Pierre Morel
  2019-03-15 17:28       ` Halil Pasic
  2 siblings, 2 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-15 14:10 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 15/03/2019 14:26, Pierre Morel wrote:
> On 15/03/2019 11:20, Cornelia Huck wrote:
>> On Wed, 13 Mar 2019 17:04:58 +0100
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>
>>> +/*
>>> + * handle_pqap: Handling pqap interception
>>> + * @vcpu: the vcpu having issue the pqap instruction
>>> + *
>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>> + * answer the guest even if no dedicated driver's hook is available.
>>> + *
>>> + * The intercepting code calls a dedicated callback for this 
>>> instruction
>>> + * if a driver did register one in the CRYPTO satellite of the
>>> + * SIE block.
>>> + *
>>> + * For PQAP/AQIC instructions only, verify privilege and 
>>> specifications.
>>> + *
>>> + * If no callback available, the queues are not available, return 
>>> this to
>>> + * the caller.
>>> + * Else return the value returned by the callback.
>>> + */
>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>> +{
>>> +    uint8_t fc;
>>> +    struct ap_queue_status status = {};
>>> +    int ret;
>>> +    /* Verify that the AP instruction are available */
>>> +    if (!ap_instructions_available())
>>> +        return -EOPNOTSUPP;
>>> +    /* Verify that the guest is allowed to use AP instructions */
>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>> +        return -EOPNOTSUPP;
>>> +    /* Verify that the function code is AQIC */
>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>> +    /* We do not want to change the behavior we had before this patch*/
>>> +    if (fc != 0x03)
>>> +        return -EOPNOTSUPP;
>>> +
>>> +    /* PQAP instructions are allowed for guest kernel only */
>>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>>> +    /* AQIC instruction is allowed only if facility 65 is available */
>>> +    if (!test_kvm_facility(vcpu->kvm, 65))
>>> +        return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>>> +    /* Verify that the hook callback is registered and call it */
>>> +    if (vcpu->kvm->arch.crypto.pqap_hook) {
>>> +        if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>>> +            return -EOPNOTSUPP;
>>> +        ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>>> +        module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>>> +        return ret;
>>> +    }
>>> +    /*
>>> +     * It is the duty of the vfio_driver to register a hook
>>> +     * If it does not and we get an exception on AQIC we must
>>> +     * guess that there is no vfio_ap_driver at all and no one
>>> +     * to handle the guests's CRYCB and the CRYCB is empty.
>>> +     */
>>> +    status.response_code = 0x01;
>>
>> I'm still confused here, sorry. From previous discussions I recall that
>> this indicates "no crypto device" (please correct me if I'm wrong.)
>>
>> Before this patch, we had:
>> - guest issues PQAP/AQIC -> drop to userspace
>>
>> With a correct implementation, we get:
>> - guest issues PQAP/AQIC -> callback does what needs to be done
>>
>> With an incorrect implementation (no callback), we get:
>> - guest issues PQAP/AQIC -> guest gets response code 0x01
>>
>> Why not drop to userspace in that case?
> 
> This is what I had in the previous patches.
> Hum, I do not remember which discussion lead me to modify this.
> 
> Anyway, now that you put the finger on this problem, I think the problem 
> is worse.
> 
> The behavior with old / new Linux, vfio driver and qemu is:
> 
> LINUX    VFIO_AP    QEMU    PGM
> OLD    x    x    OPERATION
> NEW    -    OLD    SPECIFICATION
> NEW    -    NEW/aqic=off    SPECIFICATION
> NEW    x    NEW/aqic=on    -
> 
> x = whatever
> - = absent/none
> 
> So yes there is a change in behavior for the userland for the case QEMU 
> do not set the AQIC facility 65, OLD QEMU or NEW QEMU wanting to behave 
> like an older one.
> 
> I fear we have the same problem with the privileged operation...
> 
> For the last case, when the kvm_facility(65) is set, the explication is 
> the following:
> 
> This is related to the handling of PQAP AQIC which is now authorized by 
> this patch series.
> If we authorize PQAP AQIC, by setting the bit for facility 65, the guest 
> can use this instruction.
> If the instruction follows the specifications we must answer something 
> realistic and since there is nothing in the CRYCB (no driver) we answer 
> that there is no queue.
> 
> Conclusion:  we must handle this in userland, it will have the benefit 
> to keep old behavior when there is no callback.
> OLD QEMU will not see change as they will not set aqic facility
> NEW QEMU will handle this correctly.
> 

Sorry, wrong conclusion, handling this in userland will bring us much 
too far if we want to answer correctly for the case the hook is not 
there but QEMU accepted the facility for AQIC.

The alternative is easier, we just continue to respond with the 
OPERATION exception here and only handle the specification and 
privileged exception cases in QEMU and in the hook.

So, I think the discussion will go on until you come back :)

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 13:26     ` Pierre Morel
  2019-03-15 13:41       ` Cornelia Huck
  2019-03-15 14:10       ` Pierre Morel
@ 2019-03-15 17:28       ` Halil Pasic
  2019-03-19 10:01         ` Pierre Morel
  2 siblings, 1 reply; 29+ messages in thread
From: Halil Pasic @ 2019-03-15 17:28 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Cornelia Huck, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On Fri, 15 Mar 2019 14:26:34 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 15/03/2019 11:20, Cornelia Huck wrote:
> > On Wed, 13 Mar 2019 17:04:58 +0100
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> > 
> >> +/*
> >> + * handle_pqap: Handling pqap interception
> >> + * @vcpu: the vcpu having issue the pqap instruction
> >> + *
> >> + * We now support PQAP/AQIC instructions and we need to correctly
> >> + * answer the guest even if no dedicated driver's hook is available.
> >> + *
> >> + * The intercepting code calls a dedicated callback for this instruction
> >> + * if a driver did register one in the CRYPTO satellite of the
> >> + * SIE block.
> >> + *
> >> + * For PQAP/AQIC instructions only, verify privilege and specifications.
> >> + *
> >> + * If no callback available, the queues are not available, return this to
> >> + * the caller.
> >> + * Else return the value returned by the callback.
> >> + */
> >> +static int handle_pqap(struct kvm_vcpu *vcpu)
> >> +{
> >> +	uint8_t fc;
> >> +	struct ap_queue_status status = {};
> >> +	int ret;
> >> +	/* Verify that the AP instruction are available */
> >> +	if (!ap_instructions_available())
> >> +		return -EOPNOTSUPP;
> >> +	/* Verify that the guest is allowed to use AP instructions */
> >> +	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
> >> +		return -EOPNOTSUPP;
> >> +	/* Verify that the function code is AQIC */
> >> +	fc = vcpu->run->s.regs.gprs[0] >> 24;
> >> +	/* We do not want to change the behavior we had before this patch*/
> >> +	if (fc != 0x03)
> >> +		return -EOPNOTSUPP;
> >> +
> >> +	/* PQAP instructions are allowed for guest kernel only */
> >> +	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> >> +		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
> >> +	/* AQIC instruction is allowed only if facility 65 is available */
> >> +	if (!test_kvm_facility(vcpu->kvm, 65))
> >> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
> >> +	/* Verify that the hook callback is registered and call it */
> >> +	if (vcpu->kvm->arch.crypto.pqap_hook) {
> >> +		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> >> +			return -EOPNOTSUPP;
> >> +		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> >> +		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> >> +		return ret;
> >> +	}
> >> +	/*
> >> +	 * It is the duty of the vfio_driver to register a hook
> >> +	 * If it does not and we get an exception on AQIC we must
> >> +	 * guess that there is no vfio_ap_driver at all and no one
> >> +	 * to handle the guests's CRYCB and the CRYCB is empty.
> >> +	 */
> >> +	status.response_code = 0x01;
> > 
> > I'm still confused here, sorry. From previous discussions I recall that
> > this indicates "no crypto device" (please correct me if I'm wrong.)
> > 
> > Before this patch, we had:
> > - guest issues PQAP/AQIC -> drop to userspace
> > 
> > With a correct implementation, we get:
> > - guest issues PQAP/AQIC -> callback does what needs to be done
> > 
> > With an incorrect implementation (no callback), we get:
> > - guest issues PQAP/AQIC -> guest gets response code 0x01
> > 
> > Why not drop to userspace in that case?
> 
> This is what I had in the previous patches.
> Hum, I do not remember which discussion lead me to modify this.
> 
> Anyway, now that you put the finger on this problem, I think the problem 
> is worse.
> 
> The behavior with old / new Linux, vfio driver and qemu is:
> 
> LINUX	VFIO_AP	QEMU	PGM
> OLD	x	x	OPERATION

Isn't OPERATION a bad answer if ap=on? It should not happen
with a well behaved guest because facility 65 is not indicated,
but if it does, I guess we give the wrong answer.

> NEW	-	OLD	SPECIFICATION
> NEW	-	NEW/aqic=off	SPECIFICATION
> NEW	x	NEW/aqic=on	-
> 

AFAICT with LINUX == NEW we get the correct answer. OPERATION exception
is only good if ap=off.

> x = whatever
> - = absent/none
> 
> So yes there is a change in behavior for the userland for the case QEMU 
> do not set the AQIC facility 65, OLD QEMU or NEW QEMU wanting to behave 
> like an older one.
> 
> I fear we have the same problem with the privileged operation...
> 

IMHO this boils down to:
* either OLD QEMU or 
* OLD LINUX
should have taken care of handling the mandatory intercept for PQAP/AQIC
if ap=on (i.e. guest has AP instructions), and does not have facility 65
which was the case for OLD.

Things get complicated when one considers that ECA.28 is an effective
control.

> For the last case, when the kvm_facility(65) is set, the explication is 
> the following:
> 
> This is related to the handling of PQAP AQIC which is now authorized by 
> this patch series.
> If we authorize PQAP AQIC, by setting the bit for facility 65, the guest 
> can use this instruction.
> If the instruction follows the specifications we must answer something 
> realistic and since there is nothing in the CRYCB (no driver) we answer 
> that there is no queue.
> 
> Conclusion:  we must handle this in userland, it will have the benefit 
> to keep old behavior when there is no callback.
> OLD QEMU will not see change as they will not set aqic facility

That would mean we remain quirky.

> NEW QEMU will handle this correctly.
> 
> In this case we also do not need to handle all other tests here but can 
> move it to the callback as Tony wanted.
> 
> Would you agree with something simple like:
> 
> static int handle_pqap(struct kvm_vcpu *vcpu)
> {
>          int ret = -EOPNOTSUPP;
> 
>          /* Verify that the hook callback is registered and call it */
>          if (pqap_hook)
>                  if (try_module_get(pqap_hook->owner)) {
>                          ret = pqap_hook->hook(vcpu);
>                          module_put(pqap_hook->owner);
>                  }
>          return ret;
> }
> 
> All other tests in QEMU and in the callback.
> 

You stated in another email that the conclusion is wrong. I'm not sure
what is the cleanest solution here. This effective control thing does
make my head spin.

Regards,
Halil


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 14:10       ` Pierre Morel
@ 2019-03-15 17:43         ` Halil Pasic
  2019-03-19  9:55         ` Pierre Morel
  1 sibling, 0 replies; 29+ messages in thread
From: Halil Pasic @ 2019-03-15 17:43 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Cornelia Huck, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On Fri, 15 Mar 2019 15:10:25 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> Sorry, wrong conclusion, handling this in userland will bring us much 
> too far if we want to answer correctly for the case the hook is not 
> there but QEMU accepted the facility for AQIC.
> 
> The alternative is easier, we just continue to respond with the 
> OPERATION exception here and only handle the specification and 
> privileged exception cases in QEMU and in the hook.

I don't quite understand what do you mean by this paragraph. Especially
not what do you mean by 'just continue to respond with the OPERATION
exception here'.

In any case if the guest is supposed to have ap instructions, and does
not have facility 65 the right answer is specification and not operation
exception. And this has to work regardless of vfio-ap module loaded or
not.

Regards,
Halil

> 
> So, I think the discussion will go on until you come back :)
> 
> Regards,
> Pierre


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-13 16:05 ` [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device Pierre Morel
@ 2019-03-15 18:15   ` Halil Pasic
  2019-03-19  9:38     ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Halil Pasic @ 2019-03-15 18:15 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On Wed, 13 Mar 2019 17:05:01 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> When the mediated device is open we setup the relation with KVM unset it
> when the mediated device is released.
> 
> We ensure KVM is present on opening of the mediated device.
> 
> We ensure that KVM survives the mediated device, and establish a direct

survives? 

> link from KVM to the mediated device to simplify the relationship.
> 
> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 80 ++++++++++++++++++++++-----------------
>  1 file changed, 46 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 0f8952c23..6b559ca 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -790,7 +790,6 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>   * vfio_ap_mdev_set_kvm
>   *
>   * @matrix_mdev: a mediated matrix device
> - * @kvm: reference to KVM instance
>   *
>   * Verifies no other mediated matrix device has @kvm and sets a reference to
>   * it in @matrix_mdev->kvm.
> @@ -798,53 +797,39 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>   * Return 0 if no other mediated matrix device has a reference to @kvm;
>   * otherwise, returns an -EPERM.
>   */
> -static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> -				struct kvm *kvm)
> +static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev)
>  {
> -	struct ap_matrix_mdev *m;
> -
>  	mutex_lock(&matrix_dev->lock);
> +	if (matrix_mdev->kvm->arch.crypto.pqap_hook)
> +		goto err_unlock;
>  
> -	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> -		if ((m != matrix_mdev) && (m->kvm == kvm)) {
> -			mutex_unlock(&matrix_dev->lock);
> -			return -EPERM;
> -		}
> -	}
> +	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> +		goto err_unlock;
>  
> -	matrix_mdev->kvm = kvm;
> -	mutex_unlock(&matrix_dev->lock);
> +	matrix_mdev->kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>  
> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> +				  matrix_mdev->matrix.aqm,
> +				  matrix_mdev->matrix.adm);
> +	kvm_get_kvm(matrix_mdev->kvm);
> +	mutex_unlock(&matrix_dev->lock);
>  	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&matrix_dev->lock);
> +	return -EPERM;
>  }
>  
>  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  				       unsigned long action, void *data)
>  {
> -	int ret;
>  	struct ap_matrix_mdev *matrix_mdev;
>  
>  	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>  		return NOTIFY_OK;
>  
>  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> -
> -	if (!data) {
> -		matrix_mdev->kvm = NULL;
> -		return NOTIFY_OK;
> -	}
> -
> -	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
> -	if (ret)
> -		return NOTIFY_DONE;
> -
> -	/* If there is no CRYCB pointer, then we can't copy the masks */
> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> -		return NOTIFY_DONE;
> -
> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -				  matrix_mdev->matrix.aqm,
> -				  matrix_mdev->matrix.adm);
> +	matrix_mdev->kvm = data;
>  
>  	return NOTIFY_OK;
>  }
> @@ -888,6 +873,12 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>  	if (ret)
>  		goto err_group;
>  
> +	/* We do not support opening the mediated device without KVM */
> +	if (!matrix_mdev->kvm) {
> +		ret = -ENODEV;
> +		goto err_group;
> +	}
> +
>  	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>  	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>  
> @@ -896,8 +887,15 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>  	if (ret)
>  		goto err_iommu;
>  
> +	ret = vfio_ap_mdev_set_kvm(matrix_mdev);

At this point the matrix_mdev->kvm ain't guaranteed to be valid IMHO. Or
am I wrong? If I'm right kvm_get_kvm(matrix_mdev->kvm) could be too late.

> +	if (ret)
> +		goto err_kvm;
> +
>  	return 0;
>  
> +err_kvm:
> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> +				 &matrix_mdev->iommu_notifier);
>  err_iommu:
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>  				 &matrix_mdev->group_notifier);
> @@ -906,19 +904,33 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>  	return ret;
>  }
>  
> -static void vfio_ap_mdev_release(struct mdev_device *mdev)
> +static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>  {
> -	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	struct kvm *kvm = matrix_mdev->kvm;
>  
>  	if (matrix_mdev->kvm)
>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);

This still conditional?

> -
> +	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);

I guess your intention was to move vfio_ap_mdev_reset_queues()
here from vfio_ap_mdev_release(), but you still have a
vfio_ap_mdev_reset_queues() call in vfio_ap_mdev_release().

> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>  	matrix_mdev->kvm = NULL;
> +
> +	kvm_put_kvm(kvm);
> +	return 0;
> +}
> +
> +static void vfio_ap_mdev_release(struct mdev_device *mdev)
> +{
> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> +	mutex_lock(&matrix_dev->lock);
> +
>  	vfio_ap_mdev_reset_queues(mdev);

Here.

Regards,
Halil

> +	vfio_ap_mdev_unset_kvm(matrix_mdev);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>  				 &matrix_mdev->iommu_notifier);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>  				 &matrix_mdev->group_notifier);
> +	mutex_unlock(&matrix_dev->lock);
>  	module_put(THIS_MODULE);
>  }
>  


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-15 18:15   ` Halil Pasic
@ 2019-03-19  9:38     ` Pierre Morel
  2019-03-19 11:54       ` Halil Pasic
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-19  9:38 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 15/03/2019 19:15, Halil Pasic wrote:
> On Wed, 13 Mar 2019 17:05:01 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> When the mediated device is open we setup the relation with KVM unset it
>> when the mediated device is released.
>>
>> We ensure KVM is present on opening of the mediated device.
>>
>> We ensure that KVM survives the mediated device, and establish a direct
> 
> survives?

what alternative do you prefer?

> 
>> link from KVM to the mediated device to simplify the relationship.
>>
>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>> ---

...snip...

>>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   				       unsigned long action, void *data)
>>   {
>> -	int ret;
>>   	struct ap_matrix_mdev *matrix_mdev;
>>   
>>   	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>   		return NOTIFY_OK;
>>   
>>   	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>> -
>> -	if (!data) {
>> -		matrix_mdev->kvm = NULL;
>> -		return NOTIFY_OK;
>> -	}
>> -
>> -	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>> -	if (ret)
>> -		return NOTIFY_DONE;
>> -
>> -	/* If there is no CRYCB pointer, then we can't copy the masks */
>> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
>> -		return NOTIFY_DONE;
>> -
>> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>> -				  matrix_mdev->matrix.aqm,
>> -				  matrix_mdev->matrix.adm);
>> +	matrix_mdev->kvm = data;
>>   
>>   	return NOTIFY_OK;
>>   }
>> @@ -888,6 +873,12 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>   	if (ret)
>>   		goto err_group;
>>   
>> +	/* We do not support opening the mediated device without KVM */
>> +	if (!matrix_mdev->kvm) {
>> +		ret = -ENODEV;
>> +		goto err_group;
>> +	}
>> +
>>   	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>>   
>> @@ -896,8 +887,15 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>   	if (ret)
>>   		goto err_iommu;
>>   
>> +	ret = vfio_ap_mdev_set_kvm(matrix_mdev);
> 
> At this point the matrix_mdev->kvm ain't guaranteed to be valid IMHO. Or
> am I wrong? If I'm right kvm_get_kvm(matrix_mdev->kvm) could be too late.

What about the if (!matrix_mdev->kvm) 10 lines above ?

> 
>> +	if (ret)
>> +		goto err_kvm;
>> +
>>   	return 0;
>>   
>> +err_kvm:
>> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> +				 &matrix_mdev->iommu_notifier);
>>   err_iommu:
>>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>   				 &matrix_mdev->group_notifier);
>> @@ -906,19 +904,33 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>   	return ret;
>>   }
>>   
>> -static void vfio_ap_mdev_release(struct mdev_device *mdev)
>> +static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>>   {
>> -	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +	struct kvm *kvm = matrix_mdev->kvm;
>>   
>>   	if (matrix_mdev->kvm)
>>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> 
> This still conditional?

Yes, nothing to clear if there is no KVM.

> 
>> -
>> +	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> 
> I guess your intention was to move vfio_ap_mdev_reset_queues()
> here from vfio_ap_mdev_release(), but you still have a
> vfio_ap_mdev_reset_queues() call in vfio_ap_mdev_release().
> 
>> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>>   	matrix_mdev->kvm = NULL;
>> +
>> +	kvm_put_kvm(kvm);
>> +	return 0;
>> +}
>> +
>> +static void vfio_ap_mdev_release(struct mdev_device *mdev)
>> +{
>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +
>>   	vfio_ap_mdev_reset_queues(mdev);

right, this one will go away.
Thanks for reviewing.

Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 14:10       ` Pierre Morel
  2019-03-15 17:43         ` Halil Pasic
@ 2019-03-19  9:55         ` Pierre Morel
  1 sibling, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-19  9:55 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: borntraeger, alex.williamson, linux-kernel, linux-s390, kvm,
	frankja, akrowiak, pasic, david, schwidefsky, heiko.carstens,
	freude, mimu

On 15/03/2019 15:10, Pierre Morel wrote:
> On 15/03/2019 14:26, Pierre Morel wrote:
>> On 15/03/2019 11:20, Cornelia Huck wrote:
>>> On Wed, 13 Mar 2019 17:04:58 +0100
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>>>> +/*
>>>> + * handle_pqap: Handling pqap interception
>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>> + *
>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>> + *
>>>> + * The intercepting code calls a dedicated callback for this 
>>>> instruction
>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>> + * SIE block.
>>>> + *
>>>> + * For PQAP/AQIC instructions only, verify privilege and 
>>>> specifications.
>>>> + *
>>>> + * If no callback available, the queues are not available, return 
>>>> this to
>>>> + * the caller.
>>>> + * Else return the value returned by the callback.
>>>> + */
>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +    uint8_t fc;
>>>> +    struct ap_queue_status status = {};
>>>> +    int ret;
>>>> +    /* Verify that the AP instruction are available */
>>>> +    if (!ap_instructions_available())
>>>> +        return -EOPNOTSUPP;
>>>> +    /* Verify that the guest is allowed to use AP instructions */
>>>> +    if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>> +        return -EOPNOTSUPP;
>>>> +    /* Verify that the function code is AQIC */
>>>> +    fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>> +    /* We do not want to change the behavior we had before this 
>>>> patch*/
>>>> +    if (fc != 0x03)
>>>> +        return -EOPNOTSUPP;
>>>> +
>>>> +    /* PQAP instructions are allowed for guest kernel only */
>>>> +    if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>>> +        return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>>>> +    /* AQIC instruction is allowed only if facility 65 is available */
>>>> +    if (!test_kvm_facility(vcpu->kvm, 65))
>>>> +        return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>>>> +    /* Verify that the hook callback is registered and call it */
>>>> +    if (vcpu->kvm->arch.crypto.pqap_hook) {
>>>> +        if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>>>> +            return -EOPNOTSUPP;
>>>> +        ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>>>> +        module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>>>> +        return ret;
>>>> +    }
>>>> +    /*
>>>> +     * It is the duty of the vfio_driver to register a hook
>>>> +     * If it does not and we get an exception on AQIC we must
>>>> +     * guess that there is no vfio_ap_driver at all and no one
>>>> +     * to handle the guests's CRYCB and the CRYCB is empty.
>>>> +     */
>>>> +    status.response_code = 0x01;
>>>
>>> I'm still confused here, sorry. From previous discussions I recall that
>>> this indicates "no crypto device" (please correct me if I'm wrong.)
>>>
>>> Before this patch, we had:
>>> - guest issues PQAP/AQIC -> drop to userspace
>>>
>>> With a correct implementation, we get:
>>> - guest issues PQAP/AQIC -> callback does what needs to be done
>>>
>>> With an incorrect implementation (no callback), we get:
>>> - guest issues PQAP/AQIC -> guest gets response code 0x01
>>>
>>> Why not drop to userspace in that case?
>>
>> This is what I had in the previous patches.
>> Hum, I do not remember which discussion lead me to modify this.
>>
>> Anyway, now that you put the finger on this problem, I think the 
>> problem is worse.
>>
>> The behavior with old / new Linux, vfio driver and qemu is:
>>
>> LINUX    VFIO_AP    QEMU    PGM
>> OLD    x    x    OPERATION
>> NEW    -    OLD    SPECIFICATION
>> NEW    -    NEW/aqic=off    SPECIFICATION
>> NEW    x    NEW/aqic=on    -
>>
>> x = whatever
>> - = absent/none
>>
>> So yes there is a change in behavior for the userland for the case 
>> QEMU do not set the AQIC facility 65, OLD QEMU or NEW QEMU wanting to 
>> behave like an older one.
>>
>> I fear we have the same problem with the privileged operation...
>>
>> For the last case, when the kvm_facility(65) is set, the explication 
>> is the following:
>>
>> This is related to the handling of PQAP AQIC which is now authorized 
>> by this patch series.
>> If we authorize PQAP AQIC, by setting the bit for facility 65, the 
>> guest can use this instruction.
>> If the instruction follows the specifications we must answer something 
>> realistic and since there is nothing in the CRYCB (no driver) we 
>> answer that there is no queue.
>>
>> Conclusion:  we must handle this in userland, it will have the benefit 
>> to keep old behavior when there is no callback.
>> OLD QEMU will not see change as they will not set aqic facility
>> NEW QEMU will handle this correctly.
>>
> 
> Sorry, wrong conclusion, handling this in userland will bring us much 
> too far if we want to answer correctly for the case the hook is not 
> there but QEMU accepted the facility for AQIC.

Sorry, forget it, I was tired.

Pierre

> 
> The alternative is easier, we just continue to respond with the 
> OPERATION exception here and only handle the specification and 
> privileged exception cases in QEMU and in the hook.
> 
> So, I think the discussion will go on until you come back :)
> 
> Regards,
> Pierre
> 


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-15 17:28       ` Halil Pasic
@ 2019-03-19 10:01         ` Pierre Morel
  2019-03-19 14:54           ` Halil Pasic
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-19 10:01 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On 15/03/2019 18:28, Halil Pasic wrote:
> On Fri, 15 Mar 2019 14:26:34 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 15/03/2019 11:20, Cornelia Huck wrote:
>>> On Wed, 13 Mar 2019 17:04:58 +0100
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>>>> +/*
>>>> + * handle_pqap: Handling pqap interception
>>>> + * @vcpu: the vcpu having issue the pqap instruction
>>>> + *
>>>> + * We now support PQAP/AQIC instructions and we need to correctly
>>>> + * answer the guest even if no dedicated driver's hook is available.
>>>> + *
>>>> + * The intercepting code calls a dedicated callback for this instruction
>>>> + * if a driver did register one in the CRYPTO satellite of the
>>>> + * SIE block.
>>>> + *
>>>> + * For PQAP/AQIC instructions only, verify privilege and specifications.
>>>> + *
>>>> + * If no callback available, the queues are not available, return this to
>>>> + * the caller.
>>>> + * Else return the value returned by the callback.
>>>> + */
>>>> +static int handle_pqap(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	uint8_t fc;
>>>> +	struct ap_queue_status status = {};
>>>> +	int ret;
>>>> +	/* Verify that the AP instruction are available */
>>>> +	if (!ap_instructions_available())
>>>> +		return -EOPNOTSUPP;
>>>> +	/* Verify that the guest is allowed to use AP instructions */
>>>> +	if (!(vcpu->arch.sie_block->eca & ECA_APIE))
>>>> +		return -EOPNOTSUPP;
>>>> +	/* Verify that the function code is AQIC */
>>>> +	fc = vcpu->run->s.regs.gprs[0] >> 24;
>>>> +	/* We do not want to change the behavior we had before this patch*/
>>>> +	if (fc != 0x03)
>>>> +		return -EOPNOTSUPP;
>>>> +
>>>> +	/* PQAP instructions are allowed for guest kernel only */
>>>> +	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>>> +		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>>>> +	/* AQIC instruction is allowed only if facility 65 is available */
>>>> +	if (!test_kvm_facility(vcpu->kvm, 65))
>>>> +		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>>>> +	/* Verify that the hook callback is registered and call it */
>>>> +	if (vcpu->kvm->arch.crypto.pqap_hook) {
>>>> +		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>>>> +			return -EOPNOTSUPP;
>>>> +		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>>>> +		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>>>> +		return ret;
>>>> +	}
>>>> +	/*
>>>> +	 * It is the duty of the vfio_driver to register a hook
>>>> +	 * If it does not and we get an exception on AQIC we must
>>>> +	 * guess that there is no vfio_ap_driver at all and no one
>>>> +	 * to handle the guests's CRYCB and the CRYCB is empty.
>>>> +	 */
>>>> +	status.response_code = 0x01;
>>>
>>> I'm still confused here, sorry. From previous discussions I recall that
>>> this indicates "no crypto device" (please correct me if I'm wrong.)
>>>
>>> Before this patch, we had:
>>> - guest issues PQAP/AQIC -> drop to userspace
>>>
>>> With a correct implementation, we get:
>>> - guest issues PQAP/AQIC -> callback does what needs to be done
>>>
>>> With an incorrect implementation (no callback), we get:
>>> - guest issues PQAP/AQIC -> guest gets response code 0x01
>>>
>>> Why not drop to userspace in that case?
>>
>> This is what I had in the previous patches.
>> Hum, I do not remember which discussion lead me to modify this.
>>
>> Anyway, now that you put the finger on this problem, I think the problem
>> is worse.
>>
>> The behavior with old / new Linux, vfio driver and qemu is:
>>
>> LINUX	VFIO_AP	QEMU	PGM
>> OLD	x	x	OPERATION
> 
> Isn't OPERATION a bad answer if ap=on? It should not happen
> with a well behaved guest because facility 65 is not indicated,
> but if it does, I guess we give the wrong answer.

It is clearly wrong but we can not change the past :)

> 
>> NEW	-	OLD	SPECIFICATION
>> NEW	-	NEW/aqic=off	SPECIFICATION
>> NEW	x	NEW/aqic=on	-
>>
> 
> AFAICT with LINUX == NEW we get the correct answer. OPERATION exception
> is only good if ap=off.

Exact.

> 
>> x = whatever
>> - = absent/none
>>
>> So yes there is a change in behavior for the userland for the case QEMU
>> do not set the AQIC facility 65, OLD QEMU or NEW QEMU wanting to behave
>> like an older one.
>>
>> I fear we have the same problem with the privileged operation...
>>
> 
> IMHO this boils down to:
> * either OLD QEMU or
> * OLD LINUX
> should have taken care of handling the mandatory intercept for PQAP/AQIC
> if ap=on (i.e. guest has AP instructions), and does not have facility 65
> which was the case for OLD.

yes

> 
> Things get complicated when one considers that ECA.28 is an effective
> control.

I don't think so, ECA_28 is not really a problem.
We do not propagate ECA_AIV in VSIE and ECA_AIV is tested in the vfio 
driver to support GISA.
So that the guest 3 will not support interrupt.

> 
>> For the last case, when the kvm_facility(65) is set, the explication is
>> the following:
>>
>> This is related to the handling of PQAP AQIC which is now authorized by
>> this patch series.
>> If we authorize PQAP AQIC, by setting the bit for facility 65, the guest
>> can use this instruction.
>> If the instruction follows the specifications we must answer something
>> realistic and since there is nothing in the CRYCB (no driver) we answer
>> that there is no queue.
>>
>> Conclusion:  we must handle this in userland, it will have the benefit
>> to keep old behavior when there is no callback.
>> OLD QEMU will not see change as they will not set aqic facility
> 
> That would mean we remain quirky.

Yes, the alternative is:

1) We do things right but this mean we change the ABI (SPECIFICATION 
instead of OPERATION)

I thing this is the best thing to do, it is the implementation proposed 
by this patch where all is done in Kernel, so that we are right what 
ever the userland user is (QEMU or other).

2) We want to preserve the old ABI for old QEMU
Then I proposed the implementation here under.


My personal opinion, is that we should change the ABI and do things 
right now.
We should also do it right for TAPQ with t bit set. I remember Christian 
already warned about this but we did not implement it.

> 
>> NEW QEMU will handle this correctly.
>>
>> In this case we also do not need to handle all other tests here but can
>> move it to the callback as Tony wanted.
>>
>> Would you agree with something simple like:
>>
>> static int handle_pqap(struct kvm_vcpu *vcpu)
>> {
>>           int ret = -EOPNOTSUPP;
>>
>>           /* Verify that the hook callback is registered and call it */
>>           if (pqap_hook)
>>                   if (try_module_get(pqap_hook->owner)) {
>>                           ret = pqap_hook->hook(vcpu);
>>                           module_put(pqap_hook->owner);
>>                   }
>>           return ret;
>> }
>>
>> All other tests in QEMU and in the callback.
>>
> 
> You stated in another email that the conclusion is wrong. I'm not sure

Forget it, I do not understand what I wanted to say there /o\ .

> what is the cleanest solution here. This effective control thing does
> make my head spin.

As I said I do not thing any effective control do interfere here.
IMHO Alternative 1 is the cleanest solution

Regards,
Pierre



-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-19  9:38     ` Pierre Morel
@ 2019-03-19 11:54       ` Halil Pasic
  2019-03-19 14:23         ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Halil Pasic @ 2019-03-19 11:54 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On Tue, 19 Mar 2019 10:38:42 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 15/03/2019 19:15, Halil Pasic wrote:
> > On Wed, 13 Mar 2019 17:05:01 +0100
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> > 
> >> When the mediated device is open we setup the relation with KVM unset it
> >> when the mediated device is released.
> >>
> >> We ensure KVM is present on opening of the mediated device.
> >>
> >> We ensure that KVM survives the mediated device, and establish a direct
> > 
> > survives?
> 
> what alternative do you prefer?
> 

Increase kvm's refcount to ensure the guest is alive when the
ap_matrix_mdev is active. An ap mp_matrix becomes active with
a successful open() and ceases to be active with a release().

Your sentence was materially wrong as the mdev is allowed to outlive
the KVM. BTW survive tends to have an 'in spite of' note to it, which
outlive does not. vfio-ap is, I hope, not a calamity that threatens
the life of KVM ;). https://en.oxforddictionaries.com/definition/survive

> > 
> >> link from KVM to the mediated device to simplify the relationship.
> >>
> >> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
> >> ---
> 
> ...snip...
> 
> >>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >>   				       unsigned long action, void *data)
> >>   {
> >> -	int ret;
> >>   	struct ap_matrix_mdev *matrix_mdev;
> >>   
> >>   	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> >>   		return NOTIFY_OK;
> >>   
> >>   	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
> >> -
> >> -	if (!data) {
> >> -		matrix_mdev->kvm = NULL;
> >> -		return NOTIFY_OK;
> >> -	}
> >> -
> >> -	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
> >> -	if (ret)
> >> -		return NOTIFY_DONE;
> >> -
> >> -	/* If there is no CRYCB pointer, then we can't copy the masks */
> >> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> >> -		return NOTIFY_DONE;
> >> -
> >> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> >> -				  matrix_mdev->matrix.aqm,
> >> -				  matrix_mdev->matrix.adm);
> >> +	matrix_mdev->kvm = data;
> >>   
> >>   	return NOTIFY_OK;
> >>   }
> >> @@ -888,6 +873,12 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
> >>   	if (ret)
> >>   		goto err_group;
> >>   
> >> +	/* We do not support opening the mediated device without KVM */
> >> +	if (!matrix_mdev->kvm) {
> >> +		ret = -ENODEV;
> >> +		goto err_group;
> >> +	}
> >> +
> >>   	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
> >>   	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
> >>   
> >> @@ -896,8 +887,15 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
> >>   	if (ret)
> >>   		goto err_iommu;
> >>   
> >> +	ret = vfio_ap_mdev_set_kvm(matrix_mdev);
> > 
> > At this point the matrix_mdev->kvm ain't guaranteed to be valid IMHO. Or
> > am I wrong? If I'm right kvm_get_kvm(matrix_mdev->kvm) could be too late.
> 
> What about the if (!matrix_mdev->kvm) 10 lines above ?
> 

That check is not sufficient.

You should do the kvm_get_kvm() in vfio_ap_mdev_group_notifier(). VFIO
must ensure that the kvm pointer you get is valid, in a sense that it
points to a valid struct kvm and the kvm object is alive, while you are
in the callback. But not beyond.

If another thread were to decrement the refcount of the kvm object you
would end up with matrix_mdev->kvm pointing to an object that has already
died.

Does my analysis make sense to you?

> > 
> >> +	if (ret)
> >> +		goto err_kvm;
> >> +
> >>   	return 0;
> >>   
> >> +err_kvm:
> >> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> >> +				 &matrix_mdev->iommu_notifier);
> >>   err_iommu:
> >>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> >>   				 &matrix_mdev->group_notifier);
> >> @@ -906,19 +904,33 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
> >>   	return ret;
> >>   }
> >>   
> >> -static void vfio_ap_mdev_release(struct mdev_device *mdev)
> >> +static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >>   {
> >> -	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +	struct kvm *kvm = matrix_mdev->kvm;
> >>   
> >>   	if (matrix_mdev->kvm)
> >>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> > 
> > This still conditional?
> 
> Yes, nothing to clear if there is no KVM.
> 

Since we have ensured the open only works if there is a KVM at that
point in time, and we have taken a reference to KVM, I would expect
KVM can not go away before we give up our reference.

> > 
> >> -
> >> +	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> > 
> > I guess your intention was to move vfio_ap_mdev_reset_queues()
> > here from vfio_ap_mdev_release(), but you still have a
> > vfio_ap_mdev_reset_queues() call in vfio_ap_mdev_release().
> > 
> >> +	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >>   	matrix_mdev->kvm = NULL;
> >> +
> >> +	kvm_put_kvm(kvm);
> >> +	return 0;
> >> +}
> >> +
> >> +static void vfio_ap_mdev_release(struct mdev_device *mdev)
> >> +{
> >> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> +	mutex_lock(&matrix_dev->lock);
> >> +
> >>   	vfio_ap_mdev_reset_queues(mdev);
> 
> right, this one will go away.
> Thanks for reviewing.
> 

yw

Regards,
Halil


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-19 11:54       ` Halil Pasic
@ 2019-03-19 14:23         ` Pierre Morel
  2019-03-19 14:47           ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-19 14:23 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 19/03/2019 12:54, Halil Pasic wrote:
> On Tue, 19 Mar 2019 10:38:42 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 15/03/2019 19:15, Halil Pasic wrote:
>>> On Wed, 13 Mar 2019 17:05:01 +0100
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>
>>>> When the mediated device is open we setup the relation with KVM unset it
>>>> when the mediated device is released.
>>>>
>>>> We ensure KVM is present on opening of the mediated device.
>>>>
>>>> We ensure that KVM survives the mediated device, and establish a direct
>>>
>>> survives?
>>
>> what alternative do you prefer?
>>
> 
> Increase kvm's refcount to ensure the guest is alive when the
> ap_matrix_mdev is active. An ap mp_matrix becomes active with
> a successful open() and ceases to be active with a release().

Right, it is mdev usage not mdev.

> 
> Your sentence was materially wrong as the mdev is allowed to outlive
> the KVM. BTW survive tends to have an 'in spite of' note to it, which
> outlive does not. vfio-ap is, I hope, not a calamity that threatens
> the life of KVM ;). https://en.oxforddictionaries.com/definition/survive

Thanks, your description is much better.

> 
>>>
>>>> link from KVM to the mediated device to simplify the relationship.
>>>>
>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>> ---
>>
>> ...snip...
>>
>>>>    static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>>    				       unsigned long action, void *data)
>>>>    {
>>>> -	int ret;
>>>>    	struct ap_matrix_mdev *matrix_mdev;
>>>>    
>>>>    	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>>>    		return NOTIFY_OK;
>>>>    
>>>>    	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>>>> -
>>>> -	if (!data) {
>>>> -		matrix_mdev->kvm = NULL;
>>>> -		return NOTIFY_OK;
>>>> -	}
>>>> -
>>>> -	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>>>> -	if (ret)
>>>> -		return NOTIFY_DONE;
>>>> -
>>>> -	/* If there is no CRYCB pointer, then we can't copy the masks */
>>>> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
>>>> -		return NOTIFY_DONE;
>>>> -
>>>> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>>>> -				  matrix_mdev->matrix.aqm,
>>>> -				  matrix_mdev->matrix.adm);
>>>> +	matrix_mdev->kvm = data;
>>>>    
>>>>    	return NOTIFY_OK;
>>>>    }
>>>> @@ -888,6 +873,12 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>>>    	if (ret)
>>>>    		goto err_group;
>>>>    
>>>> +	/* We do not support opening the mediated device without KVM */
>>>> +	if (!matrix_mdev->kvm) {
>>>> +		ret = -ENODEV;
>>>> +		goto err_group;
>>>> +	}
>>>> +
>>>>    	matrix_mdev->iommu_notifier.notifier_call = vfio_ap_mdev_iommu_notifier;
>>>>    	events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>>>>    
>>>> @@ -896,8 +887,15 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>>>    	if (ret)
>>>>    		goto err_iommu;
>>>>    
>>>> +	ret = vfio_ap_mdev_set_kvm(matrix_mdev);
>>>
>>> At this point the matrix_mdev->kvm ain't guaranteed to be valid IMHO. Or
>>> am I wrong? If I'm right kvm_get_kvm(matrix_mdev->kvm) could be too late.
>>
>> What about the if (!matrix_mdev->kvm) 10 lines above ?
>>
> 
> That check is not sufficient.
> 
> You should do the kvm_get_kvm() in vfio_ap_mdev_group_notifier(). VFIO
> must ensure that the kvm pointer you get is valid, in a sense that it
> points to a valid struct kvm and the kvm object is alive, while you are
> in the callback. But not beyond.
> 
> If another thread were to decrement the refcount of the kvm object you
> would end up with matrix_mdev->kvm pointing to an object that has already
> died.
> 
> Does my analysis make sense to you?

Yes thanks the explication is good, it would have been worth to get it 
the first time.

> 
>>>
>>>> +	if (ret)
>>>> +		goto err_kvm;
>>>> +
>>>>    	return 0;
>>>>    
>>>> +err_kvm:
>>>> +	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>>>> +				 &matrix_mdev->iommu_notifier);
>>>>    err_iommu:
>>>>    	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>>>    				 &matrix_mdev->group_notifier);
>>>> @@ -906,19 +904,33 @@ static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>>>    	return ret;
>>>>    }
>>>>    
>>>> -static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>>> +static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>>>>    {
>>>> -	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>> +	struct kvm *kvm = matrix_mdev->kvm;
>>>>    
>>>>    	if (matrix_mdev->kvm)
>>>>    		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>>
>>> This still conditional?
>>
>> Yes, nothing to clear if there is no KVM.
>>
> 
> Since we have ensured the open only works if there is a KVM at that
> point in time, and we have taken a reference to KVM, I would expect
> KVM can not go away before we give up our reference.

Right.


Thanks,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-19 14:23         ` Pierre Morel
@ 2019-03-19 14:47           ` Pierre Morel
  2019-03-19 15:27             ` Halil Pasic
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-19 14:47 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 19/03/2019 15:23, Pierre Morel wrote:
> On 19/03/2019 12:54, Halil Pasic wrote:
>> On Tue, 19 Mar 2019 10:38:42 +0100
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>
>>> On 15/03/2019 19:15, Halil Pasic wrote:
>>>> On Wed, 13 Mar 2019 17:05:01 +0100
>>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>>
>>>>> When the mediated device is open we setup the relation with KVM 
>>>>> unset it
>>>>> when the mediated device is released.
>>>>>
>>>>> We ensure KVM is present on opening of the mediated device.
>>>>>
>>>>> We ensure that KVM survives the mediated device, and establish a 
>>>>> direct
>>>>
>>>> survives?
>>>
>>> what alternative do you prefer?
>>>
>>
>> Increase kvm's refcount to ensure the guest is alive when the
>> ap_matrix_mdev is active. An ap mp_matrix becomes active with
>> a successful open() and ceases to be active with a release().
> 
> Right, it is mdev usage not mdev.
> 
>>
>> Your sentence was materially wrong as the mdev is allowed to outlive
>> the KVM. BTW survive tends to have an 'in spite of' note to it, which
>> outlive does not. vfio-ap is, I hope, not a calamity that threatens
>> the life of KVM ;). https://en.oxforddictionaries.com/definition/survive
> 
> Thanks, your description is much better.
> 
>>
>>>>
>>>>> link from KVM to the mediated device to simplify the relationship.
>>>>>
>>>>> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
>>>>> ---
>>>
>>> ...snip...
>>>
>>>>>    static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>>>                           unsigned long action, void *data)
>>>>>    {
>>>>> -    int ret;
>>>>>        struct ap_matrix_mdev *matrix_mdev;
>>>>>        if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>>>>>            return NOTIFY_OK;
>>>>>        matrix_mdev = container_of(nb, struct ap_matrix_mdev, 
>>>>> group_notifier);
>>>>> -
>>>>> -    if (!data) {
>>>>> -        matrix_mdev->kvm = NULL;
>>>>> -        return NOTIFY_OK;
>>>>> -    }
>>>>> -
>>>>> -    ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>>>>> -    if (ret)
>>>>> -        return NOTIFY_DONE;
>>>>> -
>>>>> -    /* If there is no CRYCB pointer, then we can't copy the masks */
>>>>> -    if (!matrix_mdev->kvm->arch.crypto.crycbd)
>>>>> -        return NOTIFY_DONE;
>>>>> -
>>>>> -    kvm_arch_crypto_set_masks(matrix_mdev->kvm, 
>>>>> matrix_mdev->matrix.apm,
>>>>> -                  matrix_mdev->matrix.aqm,
>>>>> -                  matrix_mdev->matrix.adm);
>>>>> +    matrix_mdev->kvm = data;
>>>>>        return NOTIFY_OK;
>>>>>    }
>>>>> @@ -888,6 +873,12 @@ static int vfio_ap_mdev_open(struct 
>>>>> mdev_device *mdev)
>>>>>        if (ret)
>>>>>            goto err_group;
>>>>> +    /* We do not support opening the mediated device without KVM */
>>>>> +    if (!matrix_mdev->kvm) {
>>>>> +        ret = -ENODEV;
>>>>> +        goto err_group;
>>>>> +    }
>>>>> +
>>>>>        matrix_mdev->iommu_notifier.notifier_call = 
>>>>> vfio_ap_mdev_iommu_notifier;
>>>>>        events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
>>>>> @@ -896,8 +887,15 @@ static int vfio_ap_mdev_open(struct 
>>>>> mdev_device *mdev)
>>>>>        if (ret)
>>>>>            goto err_iommu;
>>>>> +    ret = vfio_ap_mdev_set_kvm(matrix_mdev);
>>>>
>>>> At this point the matrix_mdev->kvm ain't guaranteed to be valid 
>>>> IMHO. Or
>>>> am I wrong? If I'm right kvm_get_kvm(matrix_mdev->kvm) could be too 
>>>> late.
>>>
>>> What about the if (!matrix_mdev->kvm) 10 lines above ?
>>>
>>
>> That check is not sufficient.
>>
>> You should do the kvm_get_kvm() in vfio_ap_mdev_group_notifier(). VFIO
>> must ensure that the kvm pointer you get is valid, in a sense that it
>> points to a valid struct kvm and the kvm object is alive, while you are
>> in the callback. But not beyond.
>>
>> If another thread were to decrement the refcount of the kvm object you
>> would end up with matrix_mdev->kvm pointing to an object that has already
>> died.
>>
>> Does my analysis make sense to you?
> 
> Yes thanks the explication is good, it would have been worth to get it 
> the first time.
> 
>>
>>>>
>>>>> +    if (ret)
>>>>> +        goto err_kvm;
>>>>> +
>>>>>        return 0;
>>>>> +err_kvm:
>>>>> +    vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>>>>> +                 &matrix_mdev->iommu_notifier);
>>>>>    err_iommu:
>>>>>        vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>>>>                     &matrix_mdev->group_notifier);
>>>>> @@ -906,19 +904,33 @@ static int vfio_ap_mdev_open(struct 
>>>>> mdev_device *mdev)
>>>>>        return ret;
>>>>>    }
>>>>> -static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>>>> +static int vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>>>>>    {
>>>>> -    struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>>> +    struct kvm *kvm = matrix_mdev->kvm;
>>>>>        if (matrix_mdev->kvm)
>>>>>            kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>>>
>>>> This still conditional?
>>>
>>> Yes, nothing to clear if there is no KVM.
>>>
>>
>> Since we have ensured the open only works if there is a KVM at that
>> point in time, and we have taken a reference to KVM, I would expect
>> KVM can not go away before we give up our reference.
> 
> Right.

Right but based on the assumption we do a kvm_get_kvm() during open.

But now we will do it inside the notifier, so the logic is to do a 
kvm_put_kvm in the notifier too.
This is important because userland will ask us to release the KVM/VFIO 
link through this notifier.
So I will have to rework this part where KVM==NULL in the notifier too.

Regards,
Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-19 10:01         ` Pierre Morel
@ 2019-03-19 14:54           ` Halil Pasic
  2019-03-19 17:07             ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Halil Pasic @ 2019-03-19 14:54 UTC (permalink / raw)
  To: Pierre Morel
  Cc: Cornelia Huck, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On Tue, 19 Mar 2019 11:01:44 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 15/03/2019 18:28, Halil Pasic wrote:

[..]

> > 
> > Things get complicated when one considers that ECA.28 is an effective
> > control.
> 
> I don't think so, ECA_28 is not really a problem.
> We do not propagate ECA_AIV in VSIE and ECA_AIV is tested in the vfio 
> driver to support GISA.
> So that the guest 3 will not support interrupt.
> 

That was not my concern, but while we are at it... I guess you refer to
the check in handle_pqap(). That seems to do -EOPNOTSUPP, i.e. got to
userspace, i.e. with today's QEMU operation exception. Which does not
seem right.

My concern was the following. Let assume 
ECA.28 == 1 and EECA.28 == 0 != 1
and guest issues a PQAP (for simplicity AQIC).

Currently I guess we take a 0x04 interception and go to userspace, which
may or may not be the best thing to do.

With this patch we would take a 0x04, but (opposed to before) if guest
does not have facility 65 we go with a specification exception.
Operation exception should however take priority over this kind of
specification exception. So basically everything except PQAP/AQIC would
give you and operation exception (with current QEMU), but PQAP/AQIC would
give you a specification exception. Which is wrong!

AFAICT there is no way to tell if we got a 04 interception because
EECA.28 != 1 (and ECA.28 == 1) and FW won't interpret the AP
instructions for us, or because it PQAP/AQIC is a mandatory intercept.
In other words I don't see a way to tell if EECA.28 is 1 when
interpreting PQAP/AQIC.

Do you agree?

[..]

> 
> Yes, the alternative is:
> 
> 1) We do things right but this mean we change the ABI (SPECIFICATION 
> instead of OPERATION)
> 
> I thing this is the best thing to do, it is the implementation
> proposed by this patch where all is done in Kernel, so that we are
> right what ever the userland user is (QEMU or other).
> 
> 2) We want to preserve the old ABI for old QEMU
> Then I proposed the implementation here under.
> 
> 
> My personal opinion, is that we should change the ABI and do things 
> right now.

I tend to agree. Giving an operation exception instead of a specification
exception is a bug. If it is a kernel or qemu bug it ain't clear to me
at the moment. 

> We should also do it right for TAPQ with t bit set. I remember
> Christian already warned about this but we did not implement it.
> 

Yes, I have some blurry memories of something similar myself. I wonder
if there was a reason, or did we just forget to address this issue.

Regards,
Halil


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-19 14:47           ` Pierre Morel
@ 2019-03-19 15:27             ` Halil Pasic
  2019-03-19 16:48               ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Halil Pasic @ 2019-03-19 15:27 UTC (permalink / raw)
  To: Pierre Morel
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On Tue, 19 Mar 2019 15:47:05 +0100
Pierre Morel <pmorel@linux.ibm.com> wrote:

> >>>>>        if (matrix_mdev->kvm)
> >>>>>            kvm_arch_crypto_clear_masks(matrix_mdev->kvm);  
> >>>>
> >>>> This still conditional?  
> >>>
> >>> Yes, nothing to clear if there is no KVM.
> >>>  
> >>
> >> Since we have ensured the open only works if there is a KVM at that
> >> point in time, and we have taken a reference to KVM, I would expect
> >> KVM can not go away before we give up our reference.  
> > 
> > Right.  
> 
> Right but based on the assumption we do a kvm_get_kvm() during open.
> 
> But now we will do it inside the notifier, so the logic is to do a 
> kvm_put_kvm in the notifier too.
> This is important because userland will ask us to release the KVM/VFIO 
> link through this notifier.
> So I will have to rework this part where KVM==NULL in the notifier too.
> 
> Regards,
> Pierre

I think it can be done both ways. If you ensure KVM != NULL if the open
succeeds and take the reference in the notifier. I suppose if open()
fails release() won't be called. But the logic/code in open() would get
quite ugly because the callback could be called assync so that it
overlaps with the rest of open().

Not failing open() in case of no KVM is there yet is in my opinion
cleaner anyway.

Regards,
Halil 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device
  2019-03-19 15:27             ` Halil Pasic
@ 2019-03-19 16:48               ` Pierre Morel
  0 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-19 16:48 UTC (permalink / raw)
  To: Halil Pasic
  Cc: borntraeger, alex.williamson, cohuck, linux-kernel, linux-s390,
	kvm, frankja, akrowiak, david, schwidefsky, heiko.carstens,
	freude, mimu

On 19/03/2019 16:27, Halil Pasic wrote:
> On Tue, 19 Mar 2019 15:47:05 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>>>>>>>         if (matrix_mdev->kvm)
>>>>>>>             kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>>>>>
>>>>>> This still conditional?
>>>>>
>>>>> Yes, nothing to clear if there is no KVM.
>>>>>   
>>>>
>>>> Since we have ensured the open only works if there is a KVM at that
>>>> point in time, and we have taken a reference to KVM, I would expect
>>>> KVM can not go away before we give up our reference.
>>>
>>> Right.
>>
>> Right but based on the assumption we do a kvm_get_kvm() during open.
>>
>> But now we will do it inside the notifier, so the logic is to do a
>> kvm_put_kvm in the notifier too.
>> This is important because userland will ask us to release the KVM/VFIO
>> link through this notifier.
>> So I will have to rework this part where KVM==NULL in the notifier too.
>>
>> Regards,
>> Pierre
> 
> I think it can be done both ways. If you ensure KVM != NULL if the open
> succeeds and take the reference in the notifier. I suppose if open()
> fails release() won't be called. But the logic/code in open() would get
> quite ugly because the callback could be called assync so that it
> overlaps with the rest of open().

Not necessary, but there is more than just the kvm_get_kvm().

When the user calls KVM_DEV_VFIO_GROUP_DEL he asks to break the link 
between VFIO and KVM.

Currently we just ignore this instead of stopping all activity 
associated with KVM.

But we have more bugs there:
We should not support multiple open of the mdev which will overwrite 
matrix->kvm for the same mdev with a different KVM.
I send a bugfix for this.


> 
> Not failing open() in case of no KVM is there yet is in my opinion
> cleaner anyway.

If we handle correctly the notifiers and the exclusivity, we can do this.
I will make this correctly for the next iteration.

Regards,
Pierre

> 
> Regards,
> Halil
> 


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-19 14:54           ` Halil Pasic
@ 2019-03-19 17:07             ` Pierre Morel
  2019-03-21 14:05               ` Pierre Morel
  0 siblings, 1 reply; 29+ messages in thread
From: Pierre Morel @ 2019-03-19 17:07 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On 19/03/2019 15:54, Halil Pasic wrote:
> On Tue, 19 Mar 2019 11:01:44 +0100
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 15/03/2019 18:28, Halil Pasic wrote:
> 
> [..]
> 
>>>
>>> Things get complicated when one considers that ECA.28 is an effective
>>> control.
>>
>> I don't think so, ECA_28 is not really a problem.
>> We do not propagate ECA_AIV in VSIE and ECA_AIV is tested in the vfio
>> driver to support GISA.
>> So that the guest 3 will not support interrupt.
>>
> 
> That was not my concern, but while we are at it... I guess you refer to
> the check in handle_pqap(). That seems to do -EOPNOTSUPP, i.e. got to
> userspace, i.e. with today's QEMU operation exception. Which does not
> seem right.

We already discussed this. no?

> 
> My concern was the following. Let assume
> ECA.28 == 1 and EECA.28 == 0 != 1
> and guest issues a PQAP (for simplicity AQIC).
> 
> Currently I guess we take a 0x04 interception and go to userspace, which
> may or may not be the best thing to do.
> 
> With this patch we would take a 0x04, but (opposed to before) if guest
> does not have facility 65 we go with a specification exception.

This is not right.
We return -EOPNOTSUPP which will be intercepted by QEMU which will 
report an OPERATION exception as before.

> Operation exception should however take priority over this kind of
> specification exception. So basically everything except PQAP/AQIC would
> give you and operation exception (with current QEMU), but PQAP/AQIC would
> give you a specification exception. Which is wrong!
> 
> AFAICT there is no way to tell if we got a 04 interception because
> EECA.28 != 1 (and ECA.28 == 1) and FW won't interpret the AP
> instructions for us, or because it PQAP/AQIC is a mandatory intercept.
> In other words I don't see a way to tell if EECA.28 is 1 when
> interpreting PQAP/AQIC.
> 
> Do you agree?


No.
EECA = HOST_ECA & GUEST_ECA
after we made sure that AP instructions are available, HOST_ECA=1

(vcpu->arch.sie_block->eca & ECA_APIE) gives us the answer.

In the case HOST_ECA=0 we always go to userland as before.

> 
> [..]
> 
>>
>> Yes, the alternative is:
>>
>> 1) We do things right but this mean we change the ABI (SPECIFICATION
>> instead of OPERATION)
>>
>> I thing this is the best thing to do, it is the implementation
>> proposed by this patch where all is done in Kernel, so that we are
>> right what ever the userland user is (QEMU or other).
>>
>> 2) We want to preserve the old ABI for old QEMU
>> Then I proposed the implementation here under.
>>
>>
>> My personal opinion, is that we should change the ABI and do things
>> right now.
> 
> I tend to agree. Giving an operation exception instead of a specification
> exception is a bug. If it is a kernel or qemu bug it ain't clear to me
> at the moment.
> 
>> We should also do it right for TAPQ with t bit set. I remember
>> Christian already warned about this but we did not implement it.
>>
> 
> Yes, I have some blurry memories of something similar myself. I wonder
> if there was a reason, or did we just forget to address this issue.


I will integrate it in the next iteration too, even it is not IRQ, the 
PQAP hook patch can be more general.

Regards,
Pierre

> 
> Regards,
> Halil
> 


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC
  2019-03-19 17:07             ` Pierre Morel
@ 2019-03-21 14:05               ` Pierre Morel
  0 siblings, 0 replies; 29+ messages in thread
From: Pierre Morel @ 2019-03-21 14:05 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, borntraeger, alex.williamson, linux-kernel,
	linux-s390, kvm, frankja, akrowiak, david, schwidefsky,
	heiko.carstens, freude, mimu

On 19/03/2019 18:07, Pierre Morel wrote:
> On 19/03/2019 15:54, Halil Pasic wrote:
>> On Tue, 19 Mar 2019 11:01:44 +0100
>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>
>>> On 15/03/2019 18:28, Halil Pasic wrote:
>>

...snip...

>>
>>> We should also do it right for TAPQ with t bit set. I remember
>>> Christian already warned about this but we did not implement it.
>>>
>>
>> Yes, I have some blurry memories of something similar myself. I wonder
>> if there was a reason, or did we just forget to address this issue.
> 
> 
> I will integrate it in the next iteration too, even it is not IRQ, the 
> PQAP hook patch can be more general.

After all, I will not do this, I remember the reason why we did not do 
it once: simply it is not intercepted until we enable it.

So we will handle it when we enable the TAPQ-t interception

Regards,
Pierre

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-03-21 14:05 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-13 16:04 [PATCH v5 0/7] vfio: ap: AP Queue Interrupt Control Pierre Morel
2019-03-13 16:04 ` [PATCH v5 1/7] s390: ap: kvm: add PQAP interception for AQIC Pierre Morel
2019-03-15 10:20   ` Cornelia Huck
2019-03-15 13:26     ` Pierre Morel
2019-03-15 13:41       ` Cornelia Huck
2019-03-15 13:44         ` Pierre Morel
2019-03-15 14:10       ` Pierre Morel
2019-03-15 17:43         ` Halil Pasic
2019-03-19  9:55         ` Pierre Morel
2019-03-15 17:28       ` Halil Pasic
2019-03-19 10:01         ` Pierre Morel
2019-03-19 14:54           ` Halil Pasic
2019-03-19 17:07             ` Pierre Morel
2019-03-21 14:05               ` Pierre Morel
2019-03-13 16:04 ` [PATCH v5 2/7] s390: ap: new vfio_ap_queue structure Pierre Morel
2019-03-15 10:33   ` Cornelia Huck
2019-03-15 13:29     ` Pierre Morel
2019-03-13 16:05 ` [PATCH v5 3/7] vfio: ap: register IOMMU VFIO notifier Pierre Morel
2019-03-13 16:05 ` [PATCH v5 4/7] s390: ap: setup relation betwen KVM and mediated device Pierre Morel
2019-03-15 18:15   ` Halil Pasic
2019-03-19  9:38     ` Pierre Morel
2019-03-19 11:54       ` Halil Pasic
2019-03-19 14:23         ` Pierre Morel
2019-03-19 14:47           ` Pierre Morel
2019-03-19 15:27             ` Halil Pasic
2019-03-19 16:48               ` Pierre Morel
2019-03-13 16:05 ` [PATCH v5 5/7] s390: ap: implement PAPQ AQIC interception in kernel Pierre Morel
2019-03-13 16:05 ` [PATCH v5 6/7] s390: ap: Cleanup on removing the AP device Pierre Morel
2019-03-13 16:05 ` [PATCH v5 7/7] s390: ap: kvm: Enable PQAP/AQIC facility for the guest Pierre Morel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).