* [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback @ 2021-05-19 15:39 Tony Krowiak 2021-05-19 15:39 ` [PATCH v3 1/2] " Tony Krowiak 2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak 0 siblings, 2 replies; 15+ messages in thread From: Tony Krowiak @ 2021-05-19 15:39 UTC (permalink / raw) To: linux-s390, linux-kernel Cc: borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson, kwankhede, Tony Krowiak Fixes a memory leak in the mdev remove callback when invoked while the mdev is in use by a KVM guest. Instead of returning -EBUSY from the callback, a full cleanup of the resources allocated to the mdev is performed because regardless of the value returned from the function, the mdev is removed from sysfs. The cleanup of resources allocated to the mdev may coincide with the interception of the PQAP(AQIC) instruction in which case data needed to handle the interception may get removed. A patch is included in this series to synchronize access to resources needed by the interception handler to protect against invalid memory accesses. Change log: v2 -> v3: -------- * Added a patch to control access to the PQAP(AQIC) hook using RCU v1 -> v2: -------- * Call vfio_ap_mdev_unset_kvm() function from the remove callback instead of merely clearing the guest's APCB. Tony Krowiak (2): s390/vfio-ap: fix memory leak in mdev remove callback s390/vfio-ap: control access to PQAP(AQIC) interception handler arch/s390/include/asm/kvm_host.h | 1 + arch/s390/kvm/priv.c | 47 +++++++++++++--------- drivers/s390/crypto/vfio_ap_ops.c | 67 +++++++++++++++++++++---------- 3 files changed, 75 insertions(+), 40 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 1/2] s390/vfio-ap: fix memory leak in mdev remove callback 2021-05-19 15:39 [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback Tony Krowiak @ 2021-05-19 15:39 ` Tony Krowiak 2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak 1 sibling, 0 replies; 15+ messages in thread From: Tony Krowiak @ 2021-05-19 15:39 UTC (permalink / raw) To: linux-s390, linux-kernel Cc: borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson, kwankhede, Tony Krowiak, stable The mdev remove callback for the vfio_ap device driver bails out with -EBUSY if the mdev is in use by a KVM guest. The intended purpose was to prevent the mdev from being removed while in use; however, returning a non-zero rc does not prevent removal. This could result in a memory leak of the resources allocated when the mdev was created. In addition, the KVM guest will still have access to the AP devices assigned to the mdev even though the mdev no longer exists. To prevent this scenario, cleanup will be done - including unplugging the AP adapters, domains and control domains - regardless of whether the mdev is in use by a KVM guest or not. Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback") Cc: stable@vger.kernel.org Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> --- drivers/s390/crypto/vfio_ap_ops.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index b2c7e10dfdcd..f90c9103dac2 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -26,6 +26,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev); static struct vfio_ap_queue *vfio_ap_find_queue(int apqn); +static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev); static int match_apqn(struct device *dev, const void *data) { @@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev) struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); mutex_lock(&matrix_dev->lock); - - /* - * If the KVM pointer is in flux or the guest is running, disallow - * un-assignment of control domain. - */ - if (matrix_mdev->kvm_busy || matrix_mdev->kvm) { - mutex_unlock(&matrix_dev->lock); - return -EBUSY; - } - - vfio_ap_mdev_reset_queues(mdev); + vfio_ap_mdev_unset_kvm(matrix_mdev); list_del(&matrix_mdev->node); kfree(matrix_mdev); mdev_set_drvdata(mdev, NULL); -- 2.30.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 15:39 [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback Tony Krowiak 2021-05-19 15:39 ` [PATCH v3 1/2] " Tony Krowiak @ 2021-05-19 15:39 ` Tony Krowiak 2021-05-19 16:16 ` Jason Gunthorpe 2021-05-19 17:21 ` Halil Pasic 1 sibling, 2 replies; 15+ messages in thread From: Tony Krowiak @ 2021-05-19 15:39 UTC (permalink / raw) To: linux-s390, linux-kernel Cc: borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson, kwankhede, Tony Krowiak There is currently nothing that controls access to the structure that contains the function pointer to the handler that processes interception of the PQAP(AQIC) instruction. If the mdev is removed while the PQAP(AQIC) instruction is being intercepted, there is a possibility that the function pointer to the handler can get wiped out prior to the attempt to call it. This patch utilizes RCU to synchronize access to the kvm_s390_module_hook structure used to process interception of the PQAP(AQIC) instruction. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> --- arch/s390/include/asm/kvm_host.h | 1 + arch/s390/kvm/priv.c | 47 ++++++++++++++++----------- drivers/s390/crypto/vfio_ap_ops.c | 54 ++++++++++++++++++++++++------- 3 files changed, 73 insertions(+), 29 deletions(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 8925f3969478..4987e82d6116 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -806,6 +806,7 @@ struct kvm_s390_cpu_model { struct kvm_s390_module_hook { int (*hook)(struct kvm_vcpu *vcpu); struct module *owner; + void *data; }; struct kvm_s390_crypto { diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index 9928f785c677..2d330dfbdb61 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -610,8 +610,11 @@ static int handle_io_inst(struct kvm_vcpu *vcpu) static int handle_pqap(struct kvm_vcpu *vcpu) { struct ap_queue_status status = {}; + struct kvm_s390_module_hook *pqap_module_hook; + int (*pqap_hook)(struct kvm_vcpu *vcpu); + struct module *owner; unsigned long reg0; - int ret; + int ret = 0; uint8_t fc; /* Verify that the AP instruction are available */ @@ -657,24 +660,32 @@ static int handle_pqap(struct kvm_vcpu *vcpu) * Verify that the hook callback is registered, lock the owner * and call the hook. */ - if (vcpu->kvm->arch.crypto.pqap_hook) { - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) - return -EOPNOTSUPP; - ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); - if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) - kvm_s390_set_psw_cc(vcpu, 3); - return ret; + rcu_read_lock(); + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); + if (pqap_module_hook) { + pqap_hook = pqap_module_hook->hook; + owner = pqap_module_hook->owner; + rcu_read_unlock(); + if (!try_module_get(owner)) { + ret = -EOPNOTSUPP; + } else { + ret = pqap_hook(vcpu); + module_put(owner); + if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) + kvm_s390_set_psw_cc(vcpu, 3); + } + } else { + rcu_read_unlock(); + /* + * A vfio_driver must register a hook. + * No hook means no driver to enable the SIE CRYCB and no + * queues. We send this response to the guest. + */ + status.response_code = 0x01; + memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status)); + kvm_s390_set_psw_cc(vcpu, 3); } - /* - * A vfio_driver must register a hook. - * No hook means no driver to enable the SIE CRYCB and no queues. - * We send this response to the guest. - */ - status.response_code = 0x01; - memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status)); - kvm_s390_set_psw_cc(vcpu, 3); - return 0; + return ret; } static int handle_stfl(struct kvm_vcpu *vcpu) diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index f90c9103dac2..a6aa3f753ac4 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -16,6 +16,7 @@ #include <linux/bitops.h> #include <linux/kvm_host.h> #include <linux/module.h> +#include <linux/rcupdate.h> #include <asm/kvm.h> #include <asm/zcrypt.h> @@ -279,6 +280,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) uint64_t status; uint16_t apqn; struct vfio_ap_queue *q; + struct kvm_s390_module_hook *pqap_module_hook; struct ap_queue_status qstatus = { .response_code = AP_RESPONSE_Q_NOT_AVAIL, }; struct ap_matrix_mdev *matrix_mdev; @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) if (!(vcpu->arch.sie_block->eca & ECA_AIV)) return -EOPNOTSUPP; - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; - mutex_lock(&matrix_dev->lock); + rcu_read_lock(); + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); + if (!pqap_module_hook) { + rcu_read_unlock(); + goto set_status; + } - if (!vcpu->kvm->arch.crypto.pqap_hook) - goto out_unlock; - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, - struct ap_matrix_mdev, pqap_hook); + matrix_mdev = pqap_module_hook->data; + rcu_read_unlock(); + mutex_lock(&matrix_dev->lock); + apqn = vcpu->run->s.regs.gprs[0] & 0xffff; /* * If the KVM pointer is in the process of being set, wait until the @@ -322,9 +328,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu) qstatus = vfio_ap_irq_disable(q); out_unlock: + mutex_unlock(&matrix_dev->lock); +set_status: memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus)); vcpu->run->s.regs.gprs[1] >>= 32; - mutex_unlock(&matrix_dev->lock); return 0; } @@ -353,8 +360,6 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); init_waitqueue_head(&matrix_mdev->wait_for_kvm); mdev_set_drvdata(mdev, matrix_mdev); - matrix_mdev->pqap_hook.hook = handle_pqap; - matrix_mdev->pqap_hook.owner = THIS_MODULE; mutex_lock(&matrix_dev->lock); list_add(&matrix_mdev->node, &matrix_dev->mdev_list); mutex_unlock(&matrix_dev->lock); @@ -1085,6 +1090,22 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = { NULL }; +static int vfio_ap_mdev_set_pqap_hook(struct ap_matrix_mdev *matrix_mdev, + struct kvm *kvm) +{ + struct kvm_s390_module_hook *pqap_hook; + + pqap_hook = kmalloc(sizeof(*kvm->arch.crypto.pqap_hook), GFP_KERNEL); + if (!pqap_hook) + return -ENOMEM; + pqap_hook->data = matrix_mdev; + pqap_hook->hook = handle_pqap; + pqap_hook->owner = THIS_MODULE; + rcu_assign_pointer(kvm->arch.crypto.pqap_hook, pqap_hook); + + return 0; +} + /** * vfio_ap_mdev_set_kvm * @@ -1107,6 +1128,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = { static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, struct kvm *kvm) { + int ret; struct ap_matrix_mdev *m; if (kvm->arch.crypto.crycbd) { @@ -1115,6 +1137,10 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, return -EPERM; } + ret = vfio_ap_mdev_set_pqap_hook(matrix_mdev, kvm); + if (ret) + return ret; + kvm_get_kvm(kvm); matrix_mdev->kvm_busy = true; mutex_unlock(&matrix_dev->lock); @@ -1123,7 +1149,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, matrix_mdev->matrix.aqm, matrix_mdev->matrix.adm); mutex_lock(&matrix_dev->lock); - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; matrix_mdev->kvm = kvm; matrix_mdev->kvm_busy = false; wake_up_all(&matrix_mdev->wait_for_kvm); @@ -1161,6 +1186,13 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb, return NOTIFY_DONE; } +static void vfio_ap_mdev_unset_pqap_hook(struct kvm *kvm) +{ + rcu_assign_pointer(kvm->arch.crypto.pqap_hook, NULL); + synchronize_rcu(); + kfree(kvm->arch.crypto.pqap_hook); +} + /** * vfio_ap_mdev_unset_kvm * @@ -1189,11 +1221,11 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) if (matrix_mdev->kvm) { matrix_mdev->kvm_busy = true; + vfio_ap_mdev_unset_pqap_hook(matrix_mdev->kvm); mutex_unlock(&matrix_dev->lock); kvm_arch_crypto_clear_masks(matrix_mdev->kvm); mutex_lock(&matrix_dev->lock); vfio_ap_mdev_reset_queues(matrix_mdev->mdev); - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; kvm_put_kvm(matrix_mdev->kvm); matrix_mdev->kvm = NULL; matrix_mdev->kvm_busy = false; -- 2.30.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak @ 2021-05-19 16:16 ` Jason Gunthorpe 2021-05-19 23:04 ` Tony Krowiak 2021-05-19 17:21 ` Halil Pasic 1 sibling, 1 reply; 15+ messages in thread From: Jason Gunthorpe @ 2021-05-19 16:16 UTC (permalink / raw) To: Tony Krowiak Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote: > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > if (!(vcpu->arch.sie_block->eca & ECA_AIV)) > return -EOPNOTSUPP; > > - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; > - mutex_lock(&matrix_dev->lock); > + rcu_read_lock(); > + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); > + if (!pqap_module_hook) { > + rcu_read_unlock(); > + goto set_status; > + } > > - if (!vcpu->kvm->arch.crypto.pqap_hook) > - goto out_unlock; > - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, > - struct ap_matrix_mdev, pqap_hook); > + matrix_mdev = pqap_module_hook->data; > + rcu_read_unlock(); > + mutex_lock(&matrix_dev->lock); The matrix_mdev pointer was extracted from the pqap_module_hook, but now there is nothing protecting it since the rcu was dropped and it gets freed in vfio_ap_mdev_remove. And, again, module locking doesn't prevent vfio_ap_mdev_remove() from being called. None of these patches should be combining module locking with RCU. Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 16:16 ` Jason Gunthorpe @ 2021-05-19 23:04 ` Tony Krowiak 2021-05-19 23:22 ` Jason Gunthorpe 0 siblings, 1 reply; 15+ messages in thread From: Tony Krowiak @ 2021-05-19 23:04 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On 5/19/21 12:16 PM, Jason Gunthorpe wrote: > On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote: > >> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >> if (!(vcpu->arch.sie_block->eca & ECA_AIV)) >> return -EOPNOTSUPP; >> >> - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; >> - mutex_lock(&matrix_dev->lock); >> + rcu_read_lock(); >> + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); >> + if (!pqap_module_hook) { >> + rcu_read_unlock(); >> + goto set_status; >> + } >> >> - if (!vcpu->kvm->arch.crypto.pqap_hook) >> - goto out_unlock; >> - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, >> - struct ap_matrix_mdev, pqap_hook); >> + matrix_mdev = pqap_module_hook->data; >> + rcu_read_unlock(); >> + mutex_lock(&matrix_dev->lock); > The matrix_mdev pointer was extracted from the pqap_module_hook, > but now there is nothing protecting it since the rcu was dropped and > it gets freed in vfio_ap_mdev_remove. Therein lies the rub. We can't hold the rcu_read_lock across the entire time that the interception is being processed because of wait conditions in the interception handler. Regardless of whether the pointer to the matrix_mdev is retrieved as the container of or extracted from the pqap_hook, there is nothing protecting it and there appears to be no way to do so using RCU. > > And, again, module locking doesn't prevent vfio_ap_mdev_remove() from > being called. None of these patches should be combining module locking > with RCU. Is there any other way besides user interaction with the mdev's sysfs remove interface for the remove callback to get invoked? If I try to remove the mdev using the sysfs interface while the mdev fd is still open by the guest, the remove hangs until the fd is closed. That being the case, the mdev release callback will get invoked prior to the remove callback being invoked which renders this whole debate moot. What am I missing here? > > Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 23:04 ` Tony Krowiak @ 2021-05-19 23:22 ` Jason Gunthorpe 2021-05-20 1:08 ` Tony Krowiak ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Jason Gunthorpe @ 2021-05-19 23:22 UTC (permalink / raw) To: Tony Krowiak Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote: > > > On 5/19/21 12:16 PM, Jason Gunthorpe wrote: > > On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote: > > > > > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > > > if (!(vcpu->arch.sie_block->eca & ECA_AIV)) > > > return -EOPNOTSUPP; > > > - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; > > > - mutex_lock(&matrix_dev->lock); > > > + rcu_read_lock(); > > > + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); > > > + if (!pqap_module_hook) { > > > + rcu_read_unlock(); > > > + goto set_status; > > > + } > > > - if (!vcpu->kvm->arch.crypto.pqap_hook) > > > - goto out_unlock; > > > - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, > > > - struct ap_matrix_mdev, pqap_hook); > > > + matrix_mdev = pqap_module_hook->data; > > > + rcu_read_unlock(); > > > + mutex_lock(&matrix_dev->lock); > > The matrix_mdev pointer was extracted from the pqap_module_hook, > > but now there is nothing protecting it since the rcu was dropped and > > it gets freed in vfio_ap_mdev_remove. > > Therein lies the rub. We can't hold the rcu_read_lock across the > entire time that the interception is being processed because of > wait conditions in the interception handler. Regardless of whether > the pointer to the matrix_mdev is retrieved as the container of > or extracted from the pqap_hook, there is nothing protecting it > and there appears to be no way to do so using RCU. RCU is a lock that should only be used for highly performance sensitive read work loads. It eliminates one atomic from a lock, but if you go on to immediately do something like try_module_get, which has an atomic inside, the whole thing is pointless (assuming try_module_get was even the right thing to do) Use a simple sleepable rwsem around the whole thing and forget about the module_get. Hold the write side when NULL'ing the pointer. > > And, again, module locking doesn't prevent vfio_ap_mdev_remove() from > > being called. None of these patches should be combining module locking > > with RCU. > > Is there any other way besides user interaction with the mdev's > sysfs remove interface for the remove callback to get invoked? There are more options after my re-organizing series. But I'm not sure why you ask in response to my point about module locking? Module locking is not really related to sysfs. > If I try to remove the mdev using the sysfs interface while the > mdev fd is still open by the guest, the remove hangs until the > fd is closed. Yes, it will wait when the vfio_device is being destroyed. > That being the case, the mdev release callback will get invoked > prior to the remove callback being invoked which renders this whole > debate moot. What am I missing here? AFAICT the control flow is such that release can immediately move on to remove on the same CPU without an additional synchronization. So the kfree can still race with the above. But the more I look at this the wonkier it is.. The real issue is not the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook This is nonesense too: if (vcpu->kvm->arch.crypto.pqap_hook) { if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) return -EOPNOTSUPP; ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); It should have a lock around it of some kind, not a try_module_get. module_get is not la lock. And that lock should interact with loading the hook in the first place: kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; And of course NULLin'g the hooke should be synchronous with the lock. There should be no owner for something like this. unregistering the function pointer should fully fence all access. The simple solution in sketch is just this: diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index 9928f785c6773a..f70386452367dd 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu) * Verify that the hook callback is registered, lock the owner * and call the hook. */ - if (vcpu->kvm->arch.crypto.pqap_hook) { - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) - return -EOPNOTSUPP; + if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) && + vcpu->kvm->arch.crypto.pqap_hook) { ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) kvm_s390_set_psw_cc(vcpu, 3); + up_read(&vcpu->kv->arch.crypto.rwsem); return ret; } /* diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index b2c7e10dfdcdcf..64c89f6a711e94 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); init_waitqueue_head(&matrix_mdev->wait_for_kvm); mdev_set_drvdata(mdev, matrix_mdev); - matrix_mdev->pqap_hook.hook = handle_pqap; - matrix_mdev->pqap_hook.owner = THIS_MODULE; + down_write(&&vcpu->kvm->arch.crypto.rwsem); mutex_lock(&matrix_dev->lock); list_add(&matrix_mdev->node, &matrix_dev->mdev_list); mutex_unlock(&matrix_dev->lock); @@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, matrix_mdev->matrix.aqm, matrix_mdev->matrix.adm); mutex_lock(&matrix_dev->lock); + down_write(&kvm->arch.crypto.rwsem); kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; + up_write(&kvm->arch.crypto.rwsem); matrix_mdev->kvm = kvm; matrix_mdev->kvm_busy = false; wake_up_all(&matrix_mdev->wait_for_kvm); @@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) kvm_arch_crypto_clear_masks(matrix_mdev->kvm); mutex_lock(&matrix_dev->lock); vfio_ap_mdev_reset_queues(matrix_mdev->mdev); + down_write(&matrix_mdev->kvm->arch.crypto.rwsem); matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; + up_write(&matrix_mdev->kvm->arch.crypto.rwsem); kvm_put_kvm(matrix_mdev->kvm); matrix_mdev->kvm = NULL; matrix_mdev->kvm_busy = false; Jason ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 23:22 ` Jason Gunthorpe @ 2021-05-20 1:08 ` Tony Krowiak 2021-05-20 8:48 ` Halil Pasic 2021-05-20 8:38 ` Halil Pasic 2021-05-21 18:24 ` Tony Krowiak 2 siblings, 1 reply; 15+ messages in thread From: Tony Krowiak @ 2021-05-20 1:08 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On 5/19/21 7:22 PM, Jason Gunthorpe wrote: > On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote: >> >> On 5/19/21 12:16 PM, Jason Gunthorpe wrote: >>> On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote: >>> >>>> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >>>> if (!(vcpu->arch.sie_block->eca & ECA_AIV)) >>>> return -EOPNOTSUPP; >>>> - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; >>>> - mutex_lock(&matrix_dev->lock); >>>> + rcu_read_lock(); >>>> + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); >>>> + if (!pqap_module_hook) { >>>> + rcu_read_unlock(); >>>> + goto set_status; >>>> + } >>>> - if (!vcpu->kvm->arch.crypto.pqap_hook) >>>> - goto out_unlock; >>>> - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, >>>> - struct ap_matrix_mdev, pqap_hook); >>>> + matrix_mdev = pqap_module_hook->data; >>>> + rcu_read_unlock(); >>>> + mutex_lock(&matrix_dev->lock); >>> The matrix_mdev pointer was extracted from the pqap_module_hook, >>> but now there is nothing protecting it since the rcu was dropped and >>> it gets freed in vfio_ap_mdev_remove. >> Therein lies the rub. We can't hold the rcu_read_lock across the >> entire time that the interception is being processed because of >> wait conditions in the interception handler. Regardless of whether >> the pointer to the matrix_mdev is retrieved as the container of >> or extracted from the pqap_hook, there is nothing protecting it >> and there appears to be no way to do so using RCU. > RCU is a lock that should only be used for highly performance > sensitive read work loads. It eliminates one atomic from a lock, but > if you go on to immediately do something like try_module_get, which > has an atomic inside, the whole thing is pointless (assuming > try_module_get was even the right thing to do) > > Use a simple sleepable rwsem around the whole thing and forget about > the module_get. Hold the write side when NULL'ing the pointer. > >>> And, again, module locking doesn't prevent vfio_ap_mdev_remove() from >>> being called. None of these patches should be combining module locking >>> with RCU. >> Is there any other way besides user interaction with the mdev's >> sysfs remove interface for the remove callback to get invoked? > There are more options after my re-organizing series. > > But I'm not sure why you ask in response to my point about module > locking? Module locking is not really related to sysfs. I don't know why you keep harping on module locking. I didn't write the original code, so I really have no idea why that was incorporated. The question wasn't in response to module locking, it's just something that popped into my head at the time because based on my observations, the remove callback did not get invoked in response to a user removing the mdev via the sysfs interface until the mdev fd was closed by the guest. > >> If I try to remove the mdev using the sysfs interface while the >> mdev fd is still open by the guest, the remove hangs until the >> fd is closed. > Yes, it will wait when the vfio_device is being destroyed. > >> That being the case, the mdev release callback will get invoked >> prior to the remove callback being invoked which renders this whole >> debate moot. What am I missing here? > AFAICT the control flow is such that release can immediately move on > to remove on the same CPU without an additional synchronization. So > the kfree can still race with the above. I'd have to look into it more, but my initial thought is that this is not true. If what I observed is true - i.e., the mdev remove callback won't get invoked until the mdev fd is closed by the guest - then the matrix_mdev will never get freed while the pqap_hook is not NULL because the mdev release callback - invoked when the mdev fd is closed - nullifies it. That is why I asked whether there is any other way that the mdev remove callback gets invoked other than when a user removes it via the sysfs remove attribute. Keep in mind, the matrix_mdev is removed under the matrix_dev->lock. If retrieve the matix_mdev in the interception handler from the matrix_dev->mdev_list under the matrix_dev->lock - instead of from the pqap_hook or the container_of macro - and exit the handler if it has been removed, then I think the race issue can be avoided. > > But the more I look at this the wonkier it is.. The real issue is not > the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook Actually, the matrix_mdev is the issue. If the handle_pqap callback is invoked after the matrix_mdev is freed, then because we retrieve it from the pqap_hook structure or even if we retrieve it from the enclosing container, the interception handler is screwed. > > This is nonesense too: > > if (vcpu->kvm->arch.crypto.pqap_hook) { > if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > return -EOPNOTSUPP; > ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); > > It should have a lock around it of some kind, not a > try_module_get. module_get is not la lock. As I said earlier, I don't know why the author did this. My best guess is that he wanted to ensure that the module was still loaded; otherwise, the data structures contained therein - for example, the pqap_hook and matrix_mdev that contains it - would be gonzo. > > And that lock should interact with loading the hook in the first > place: > kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; > > And of course NULLin'g the hooke should be synchronous with the lock. > > There should be no owner for something like this. unregistering the > function pointer should fully fence all access. > > The simple solution in sketch is just this: > > diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c > index 9928f785c6773a..f70386452367dd 100644 > --- a/arch/s390/kvm/priv.c > +++ b/arch/s390/kvm/priv.c > @@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > * Verify that the hook callback is registered, lock the owner > * and call the hook. > */ > - if (vcpu->kvm->arch.crypto.pqap_hook) { > - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > - return -EOPNOTSUPP; > + if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) && > + vcpu->kvm->arch.crypto.pqap_hook) { > ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); > - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); > if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) > kvm_s390_set_psw_cc(vcpu, 3); > + up_read(&vcpu->kv->arch.crypto.rwsem); > return ret; > } Since reading the various review comments related to this patch, I started working on something along the lines of that which you propose. I will post a new set of patches, probably tomorrow. > /* > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index b2c7e10dfdcdcf..64c89f6a711e94 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) > vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); > init_waitqueue_head(&matrix_mdev->wait_for_kvm); > mdev_set_drvdata(mdev, matrix_mdev); > - matrix_mdev->pqap_hook.hook = handle_pqap; > - matrix_mdev->pqap_hook.owner = THIS_MODULE; > + down_write(&&vcpu->kvm->arch.crypto.rwsem); > mutex_lock(&matrix_dev->lock); > list_add(&matrix_mdev->node, &matrix_dev->mdev_list); > mutex_unlock(&matrix_dev->lock); > @@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, > matrix_mdev->matrix.aqm, > matrix_mdev->matrix.adm); > mutex_lock(&matrix_dev->lock); > + down_write(&kvm->arch.crypto.rwsem); > kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; > + up_write(&kvm->arch.crypto.rwsem); > matrix_mdev->kvm = kvm; > matrix_mdev->kvm_busy = false; > wake_up_all(&matrix_mdev->wait_for_kvm); > @@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) > kvm_arch_crypto_clear_masks(matrix_mdev->kvm); > mutex_lock(&matrix_dev->lock); > vfio_ap_mdev_reset_queues(matrix_mdev->mdev); > + down_write(&matrix_mdev->kvm->arch.crypto.rwsem); > matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; > + up_write(&matrix_mdev->kvm->arch.crypto.rwsem); > kvm_put_kvm(matrix_mdev->kvm); > matrix_mdev->kvm = NULL; > matrix_mdev->kvm_busy = false; > > > Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-20 1:08 ` Tony Krowiak @ 2021-05-20 8:48 ` Halil Pasic 2021-05-20 12:26 ` Jason Gunthorpe 0 siblings, 1 reply; 15+ messages in thread From: Halil Pasic @ 2021-05-20 8:48 UTC (permalink / raw) To: Tony Krowiak Cc: Jason Gunthorpe, linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Wed, 19 May 2021 21:08:15 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > > This is nonesense too: > > > > if (vcpu->kvm->arch.crypto.pqap_hook) { > > if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > > return -EOPNOTSUPP; > > ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); > > > > It should have a lock around it of some kind, not a > > try_module_get. module_get is not la lock. > > As I said earlier, I don't know why the author did this. Please have a look at these links from the archive to get some perspective: https://lkml.org/lkml/2020/12/4/994 https://lkml.org/lkml/2020/12/3/987 https://www.lkml.org/lkml/2019/3/1/260 We can ask the original author, but I don't think we have to. BTW the patch that introduced it has your r-b. > My best guess > is that he wanted to ensure that the module was still loaded; otherwise, > the data structures contained therein - for example, the pqap_hook > and matrix_mdev that contains it - would be gonzo. More precisely prevent the module from unloading while we execute code from it. As I've pointed out in a previous email the module may be gone by the time we call try_module_get(). Regards, Halil ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-20 8:48 ` Halil Pasic @ 2021-05-20 12:26 ` Jason Gunthorpe 0 siblings, 0 replies; 15+ messages in thread From: Jason Gunthorpe @ 2021-05-20 12:26 UTC (permalink / raw) To: Halil Pasic Cc: Tony Krowiak, linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Thu, May 20, 2021 at 10:48:57AM +0200, Halil Pasic wrote: > On Wed, 19 May 2021 21:08:15 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > > > > > This is nonesense too: > > > > > > if (vcpu->kvm->arch.crypto.pqap_hook) { > > > if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > > > return -EOPNOTSUPP; > > > ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); > > > > > > It should have a lock around it of some kind, not a > > > try_module_get. module_get is not la lock. > > > > As I said earlier, I don't know why the author did this. > > Please have a look at these links from the archive to get some > perspective: > https://lkml.org/lkml/2020/12/4/994 > https://lkml.org/lkml/2020/12/3/987 > https://www.lkml.org/lkml/2019/3/1/260 > > We can ask the original author, but I don't think we have to. BTW the > patch that introduced it has your r-b. > > > My best guess > > is that he wanted to ensure that the module was still loaded; otherwise, > > the data structures contained therein - for example, the pqap_hook > > and matrix_mdev that contains it - would be gonzo. > > More precisely prevent the module from unloading while we execute code > from it. As I've pointed out in a previous email the module may be gone > by the time we call try_module_get(). No, this is a common misconception. The module_get prevents the module from even being attempted to be unloaded. Code should acquire this if it has done something that would cause a module remove function hang indefinitely, such as a design that waits for a user FD to close. This provides a good user experience but should generally not be required for correctness. All code passing function pointers across subsystems should always fully fence those function pointers during removal. This means it interacts with some kind of locking that guarentees nothing is currently calling, or ever will call again, those function pointers. This is not just to protect the function pointer code itself, but the lock should also protect the data access that function pointer almost always invokes. This is the bug here, ap is accessing the matrix_dev data from a function pointer without any locking or serialization against kfree(matrix_dev). Fencing to guarentee the hook isn't and won't run also serves as a strong enough serialization to allow the kfree(). The basic logic is that a module removal cannot complete until all its function pointers have been removed from everywhere and all the locking that protect those removals are satisified. Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 23:22 ` Jason Gunthorpe 2021-05-20 1:08 ` Tony Krowiak @ 2021-05-20 8:38 ` Halil Pasic 2021-05-20 12:51 ` Jason Gunthorpe 2021-05-21 18:24 ` Tony Krowiak 2 siblings, 1 reply; 15+ messages in thread From: Halil Pasic @ 2021-05-20 8:38 UTC (permalink / raw) To: Jason Gunthorpe Cc: Tony Krowiak, linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Wed, 19 May 2021 20:22:02 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote: > > > > > > On 5/19/21 12:16 PM, Jason Gunthorpe wrote: > > > On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote: > > > > > > > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > > > > if (!(vcpu->arch.sie_block->eca & ECA_AIV)) > > > > return -EOPNOTSUPP; > > > > - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; > > > > - mutex_lock(&matrix_dev->lock); > > > > + rcu_read_lock(); > > > > + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); > > > > + if (!pqap_module_hook) { > > > > + rcu_read_unlock(); > > > > + goto set_status; > > > > + } > > > > - if (!vcpu->kvm->arch.crypto.pqap_hook) > > > > - goto out_unlock; > > > > - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, > > > > - struct ap_matrix_mdev, pqap_hook); > > > > + matrix_mdev = pqap_module_hook->data; > > > > + rcu_read_unlock(); > > > > + mutex_lock(&matrix_dev->lock); > > > The matrix_mdev pointer was extracted from the pqap_module_hook, > > > but now there is nothing protecting it since the rcu was dropped and > > > it gets freed in vfio_ap_mdev_remove. > > > > Therein lies the rub. We can't hold the rcu_read_lock across the > > entire time that the interception is being processed because of > > wait conditions in the interception handler. Regardless of whether > > the pointer to the matrix_mdev is retrieved as the container of > > or extracted from the pqap_hook, there is nothing protecting it > > and there appears to be no way to do so using RCU. > > RCU is a lock that should only be used for highly performance > sensitive read work loads. This is not a highly performance sensitive read workload. > It eliminates one atomic from a lock, but > if you go on to immediately do something like try_module_get, which > has an atomic inside, the whole thing is pointless (assuming > try_module_get was even the right thing to do) I'm not sure about this argument, or that RCU should be used only for highly performance sensitive read workloads. Can you please elaborate on the argument above and also put your statement in perspective with https://lwn.net/Articles/263130/? @Christian: Since you proposed RCU for this, I guess your opinion does not align with Jason's. > > Use a simple sleepable rwsem around the whole thing and forget about > the module_get. Hold the write side when NULL'ing the pointer. > Yes a simple sleepable lock would work, and we wouldn't need get_module(). Because before the vfio_ap module unloads, it sets all vcpu->kvm->arch.crypto.pqap_hook instances to NULL. So if we know that vcpu->kvm->arch.crypto.pqap_hook then we know that the still have valid references to the module. > > > And, again, module locking doesn't prevent vfio_ap_mdev_remove() from > > > being called. None of these patches should be combining module locking > > > with RCU. > > > > Is there any other way besides user interaction with the mdev's > > sysfs remove interface for the remove callback to get invoked? > > There are more options after my re-organizing series. > > But I'm not sure why you ask in response to my point about module > locking? Module locking is not really related to sysfs. > > > If I try to remove the mdev using the sysfs interface while the > > mdev fd is still open by the guest, the remove hangs until the > > fd is closed. > > Yes, it will wait when the vfio_device is being destroyed. > > > That being the case, the mdev release callback will get invoked > > prior to the remove callback being invoked which renders this whole > > debate moot. What am I missing here? > > AFAICT the control flow is such that release can immediately move on > to remove on the same CPU without an additional synchronization. So > the kfree can still race with the above. > > But the more I look at this the wonkier it is.. The real issue is not > the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook > > This is nonesense too: > > if (vcpu->kvm->arch.crypto.pqap_hook) { > if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > return -EOPNOTSUPP; > ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); > > It should have a lock around it of some kind, not a > try_module_get. module_get is not la lock. I tend to agree. In fact I asked for a lock around it several times: https://www.lkml.org/lkml/2019/3/1/260 https://lkml.org/lkml/2020/12/3/987 https://lkml.org/lkml/2020/12/4/994 But in my opinion RCU is also viable (not this patch). I think, I prefer a lock for simplicity, unless it is not (deadlocks) ;). Many thanks for bringing your perspective to this. I'm optimistic about getting this finally addressed properly. Regards, Halil ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-20 8:38 ` Halil Pasic @ 2021-05-20 12:51 ` Jason Gunthorpe 0 siblings, 0 replies; 15+ messages in thread From: Jason Gunthorpe @ 2021-05-20 12:51 UTC (permalink / raw) To: Halil Pasic Cc: Tony Krowiak, linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Thu, May 20, 2021 at 10:38:29AM +0200, Halil Pasic wrote: > > It eliminates one atomic from a lock, but > > if you go on to immediately do something like try_module_get, which > > has an atomic inside, the whole thing is pointless (assuming > > try_module_get was even the right thing to do) > > I'm not sure about this argument, or that RCU should be used only for > highly performance sensitive read workloads. Fundamentally RCU is a technique for building a read/write lock that avoids an atomic incr on the read side. This comes at the cost of significantly slowing down the write side. Everything about RCU is very complicated, people typically use it wrong. People use it wrong so often than Paul wrote a very long list of guidelines to help understand if it is being used properly: Documentation/RCU/checklist.rst If you haven't memorized this document then you probably shouldn't be using RCU at all :) > Can you please elaborate on the argument above and also put your > statement in perspective with https://lwn.net/Articles/263130/? This article doesn't talk much about the downsides - and focuses on performance win. If you need the performance then sure use RCU, but RCU is not a general purpose replacement for a rwsem. People should *always* reach for a rwsem first. Design that properly and then ask themselves if the rwsem can or should be optimized to a RCU. > Yes a simple sleepable lock would work, and we wouldn't need > get_module(). Because before the vfio_ap module unloads, it > sets all vcpu->kvm->arch.crypto.pqap_hook instances to NULL. So if > we know that vcpu->kvm->arch.crypto.pqap_hook then we know > that the still have valid references to the module. Yes > But in my opinion RCU is also viable (not this patch). I think, I prefer > a lock for simplicity, unless it is not (deadlocks) ;). As soon as you said you needed to sleep RCU stopped being viable. To make a sleepable region you need to do an atomic write and at that instant all the value of RCU was destroyed - use a normal rwsem. There is SRCU that solves the sleepable problem, but it has an incredible cost in both write side performance and memory usage that should only be reached for if high read side performance is really required. Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 23:22 ` Jason Gunthorpe 2021-05-20 1:08 ` Tony Krowiak 2021-05-20 8:38 ` Halil Pasic @ 2021-05-21 18:24 ` Tony Krowiak 2021-05-21 18:30 ` Jason Gunthorpe 2 siblings, 1 reply; 15+ messages in thread From: Tony Krowiak @ 2021-05-21 18:24 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede > The simple solution in sketch is just this: The code below results in a lockdep WARN_ON for the reasons documented in my comments interspersed in the code. After trying several different permutations using RCU and an rw_semaphore, I came to the conclusion that setting and clearing the hook pointer while the mdev fd is being open and closed or when the mdev is being removed unnecessarily complicates things. There is no good reason to set/clear the function pointer at this time, nor is there any compelling reason to store the function pointer in a satellite structure of the kvm struct. Since the hook function's lifespan coincides with the lifespan of the vfio_ap module, why not store it when the module is loaded and clear it when the module is unloaded? Access to the function pointer can be controlled by a lock that is independent of the matrix_dev->lock, thus avoiding potential lock dependencies. Access to the mdev is controlled by the matrix_dev lock, so if the mdev is retrieved from the matrix_dev->mdev_list in the hook function, then we are assured that the mdev will never be accessed after it is freed; problem solved. > > diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c > index 9928f785c6773a..f70386452367dd 100644 > --- a/arch/s390/kvm/priv.c > +++ b/arch/s390/kvm/priv.c > @@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > * Verify that the hook callback is registered, lock the owner > * and call the hook. > */ > - if (vcpu->kvm->arch.crypto.pqap_hook) { > - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > - return -EOPNOTSUPP; > + if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) && > + vcpu->kvm->arch.crypto.pqap_hook) { > ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); So, the hook function (handle_pqap in vfio_ap_ops.c) is executed while holding the rwsem lock. The hook function tries to lock the matrix_dev->lock mutex. > - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); > if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) > kvm_s390_set_psw_cc(vcpu, 3); > + up_read(&vcpu->kv->arch.crypto.rwsem); > return ret; > } > /* > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index b2c7e10dfdcdcf..64c89f6a711e94 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) > vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); > init_waitqueue_head(&matrix_mdev->wait_for_kvm); > mdev_set_drvdata(mdev, matrix_mdev); > - matrix_mdev->pqap_hook.hook = handle_pqap; > - matrix_mdev->pqap_hook.owner = THIS_MODULE; > + down_write(&&vcpu->kvm->arch.crypto.rwsem); > mutex_lock(&matrix_dev->lock); > list_add(&matrix_mdev->node, &matrix_dev->mdev_list); > mutex_unlock(&matrix_dev->lock); > @@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, > matrix_mdev->matrix.aqm, > matrix_mdev->matrix.adm); > mutex_lock(&matrix_dev->lock); Locks the matrix_dev->lock mutex, then tries to lock rwsem to store the pqap_hook under the rwsem lock. During testing, this occurred while the interception of the PQAP instruction was taking place, so the read lock was already taken and the hook function was waiting on the matrix_dev->lock taken above. All of the test cases ran successfully to completion, so there didn't appear to be a deadlock of any sort, but lockdep apparently detected a problem. I was able to get around this by doing the down_write and setting the hook vfio_ap_mdev_group_notifier function before calling this function (above) and before taking the matrix_dev->lock, but that circumvents the protection against accessing a matrix_dev that's already been freed. > + down_write(&kvm->arch.crypto.rwsem); > kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; > + up_write(&kvm->arch.crypto.rwsem); > matrix_mdev->kvm = kvm; > matrix_mdev->kvm_busy = false; > wake_up_all(&matrix_mdev->wait_for_kvm); > @@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) > kvm_arch_crypto_clear_masks(matrix_mdev->kvm); > mutex_lock(&matrix_dev->lock); > vfio_ap_mdev_reset_queues(matrix_mdev->mdev); > + down_write(&matrix_mdev->kvm->arch.crypto.rwsem); Same scenario here. > matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; > + up_write(&matrix_mdev->kvm->arch.crypto.rwsem); > kvm_put_kvm(matrix_mdev->kvm); > matrix_mdev->kvm = NULL; > matrix_mdev->kvm_busy = false; > > > Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-21 18:24 ` Tony Krowiak @ 2021-05-21 18:30 ` Jason Gunthorpe 0 siblings, 0 replies; 15+ messages in thread From: Jason Gunthorpe @ 2021-05-21 18:30 UTC (permalink / raw) To: Tony Krowiak Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, alex.williamson, kwankhede On Fri, May 21, 2021 at 02:24:33PM -0400, Tony Krowiak wrote: > > > The simple solution in sketch is just this: > > The code below results in a lockdep WARN_ON for the > reasons documented in my comments interspersed in > the code. Sure, to be expected for a 2 min effort, but you seem to have solved it by trivially putting things in the right locking order? > After trying several different permutations using RCU and > an rw_semaphore, I came to the conclusion that setting and > clearing the hook pointer while the mdev fd is being open > and closed or when the mdev is being removed unnecessarily > complicates things. There is no good reason to set/clear the > function pointer at this time, nor is there any compelling > reason to store the function pointer in a satellite structure > of the kvm struct. Since the hook function's lifespan coincides > with the lifespan of the vfio_ap module, why not store it > when the module is loaded and clear it when the module is > unloaded? Well, the hook function isn't really the problem.. > Access to the function pointer can be controlled by a lock > that is independent of the matrix_dev->lock, thus avoiding > potential lock dependencies. Access to the mdev is controlled by > the matrix_dev lock, so if the mdev is retrieved from the > matrix_dev->mdev_list in the hook function, then we are assured > that the mdev will never be accessed after it is freed; problem solved. This just transforms the problem into needing to hold a lock around mdev_list while accessing a member of the mdev_list Is it simpler? Jason ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak 2021-05-19 16:16 ` Jason Gunthorpe @ 2021-05-19 17:21 ` Halil Pasic 2021-05-19 23:14 ` Tony Krowiak 1 sibling, 1 reply; 15+ messages in thread From: Halil Pasic @ 2021-05-19 17:21 UTC (permalink / raw) To: Tony Krowiak Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson, kwankhede On Wed, 19 May 2021 11:39:21 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > There is currently nothing that controls access to the structure that > contains the function pointer to the handler that processes interception of > the PQAP(AQIC) instruction. If the mdev is removed while the PQAP(AQIC) > instruction is being intercepted, there is a possibility that the function > pointer to the handler can get wiped out prior to the attempt to call it. > > This patch utilizes RCU to synchronize access to the kvm_s390_module_hook > structure used to process interception of the PQAP(AQIC) instruction. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > arch/s390/include/asm/kvm_host.h | 1 + > arch/s390/kvm/priv.c | 47 ++++++++++++++++----------- > drivers/s390/crypto/vfio_ap_ops.c | 54 ++++++++++++++++++++++++------- > 3 files changed, 73 insertions(+), 29 deletions(-) > > diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h > index 8925f3969478..4987e82d6116 100644 > --- a/arch/s390/include/asm/kvm_host.h > +++ b/arch/s390/include/asm/kvm_host.h > @@ -806,6 +806,7 @@ struct kvm_s390_cpu_model { > struct kvm_s390_module_hook { > int (*hook)(struct kvm_vcpu *vcpu); > struct module *owner; > + void *data; I guess you need this, because you stopped using the member of struct ap_mdev_matrix and instead you kzalloc() a new object. Yet I don't understand why do you do so? > }; > > struct kvm_s390_crypto { > diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c > index 9928f785c677..2d330dfbdb61 100644 > --- a/arch/s390/kvm/priv.c > +++ b/arch/s390/kvm/priv.c > @@ -610,8 +610,11 @@ static int handle_io_inst(struct kvm_vcpu *vcpu) > static int handle_pqap(struct kvm_vcpu *vcpu) > { > struct ap_queue_status status = {}; > + struct kvm_s390_module_hook *pqap_module_hook; > + int (*pqap_hook)(struct kvm_vcpu *vcpu); > + struct module *owner; > unsigned long reg0; > - int ret; > + int ret = 0; > uint8_t fc; > > /* Verify that the AP instruction are available */ > @@ -657,24 +660,32 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > * Verify that the hook callback is registered, lock the owner > * and call the hook. > */ > - if (vcpu->kvm->arch.crypto.pqap_hook) { > - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) > - return -EOPNOTSUPP; > - ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); > - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); > - if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) > - kvm_s390_set_psw_cc(vcpu, 3); > - return ret; > + rcu_read_lock(); > + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); > + if (pqap_module_hook) { > + pqap_hook = pqap_module_hook->hook; > + owner = pqap_module_hook->owner; > + rcu_read_unlock(); > + if (!try_module_get(owner)) { Why do this outside the rcu_read lock? What guarantees that the module ain't gone by this time? I don't think try_module_get() is guaranteed to give you false if passed in a pointer that points to some memory that ain't a struct module any more (use-after-free). > + ret = -EOPNOTSUPP; > + } else { > + ret = pqap_hook(vcpu); > + module_put(owner); > + if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) > + kvm_s390_set_psw_cc(vcpu, 3); > + } > + } else { > + rcu_read_unlock(); > + /* > + * A vfio_driver must register a hook. > + * No hook means no driver to enable the SIE CRYCB and no > + * queues. We send this response to the guest. > + */ > + status.response_code = 0x01; > + memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status)); > + kvm_s390_set_psw_cc(vcpu, 3); > } > - /* > - * A vfio_driver must register a hook. > - * No hook means no driver to enable the SIE CRYCB and no queues. > - * We send this response to the guest. > - */ > - status.response_code = 0x01; > - memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status)); > - kvm_s390_set_psw_cc(vcpu, 3); > - return 0; > + return ret; > } > > static int handle_stfl(struct kvm_vcpu *vcpu) > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index f90c9103dac2..a6aa3f753ac4 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -16,6 +16,7 @@ > #include <linux/bitops.h> > #include <linux/kvm_host.h> > #include <linux/module.h> > +#include <linux/rcupdate.h> > #include <asm/kvm.h> > #include <asm/zcrypt.h> > > @@ -279,6 +280,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > uint64_t status; > uint16_t apqn; > struct vfio_ap_queue *q; > + struct kvm_s390_module_hook *pqap_module_hook; > struct ap_queue_status qstatus = { > .response_code = AP_RESPONSE_Q_NOT_AVAIL, }; > struct ap_matrix_mdev *matrix_mdev; > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > if (!(vcpu->arch.sie_block->eca & ECA_AIV)) > return -EOPNOTSUPP; > > - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; > - mutex_lock(&matrix_dev->lock); > + rcu_read_lock(); > + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); > + if (!pqap_module_hook) { > + rcu_read_unlock(); > + goto set_status; > + } > > - if (!vcpu->kvm->arch.crypto.pqap_hook) > - goto out_unlock; > - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, > - struct ap_matrix_mdev, pqap_hook); > + matrix_mdev = pqap_module_hook->data; > + rcu_read_unlock(); > + mutex_lock(&matrix_dev->lock); I agree with Jason's assessment. At this point the matrix_dev pointer may point to garbage. Above, I think we can use the pqap_hook function pointer local to handle_pqap, because we know that as long as the module is there the callback will sit at the same address and won't go away. And we do the try_module_get() to ensure that the module stays loaded. > + apqn = vcpu->run->s.regs.gprs[0] & 0xffff; > > /* > * If the KVM pointer is in the process of being set, wait until the > @@ -322,9 +328,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu) > qstatus = vfio_ap_irq_disable(q); > > out_unlock: > + mutex_unlock(&matrix_dev->lock); > +set_status: > memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus)); > vcpu->run->s.regs.gprs[1] >>= 32; > - mutex_unlock(&matrix_dev->lock); > return 0; > } > > @@ -353,8 +360,6 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) > vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); > init_waitqueue_head(&matrix_mdev->wait_for_kvm); > mdev_set_drvdata(mdev, matrix_mdev); > - matrix_mdev->pqap_hook.hook = handle_pqap; > - matrix_mdev->pqap_hook.owner = THIS_MODULE; I guess the member of struct ap_matrix_mdev is still around, it will remain all zero. Is this somehow intentional? > mutex_lock(&matrix_dev->lock); > list_add(&matrix_mdev->node, &matrix_dev->mdev_list); > mutex_unlock(&matrix_dev->lock); > @@ -1085,6 +1090,22 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = { > NULL > }; > > +static int vfio_ap_mdev_set_pqap_hook(struct ap_matrix_mdev *matrix_mdev, > + struct kvm *kvm) > +{ > + struct kvm_s390_module_hook *pqap_hook; > + > + pqap_hook = kmalloc(sizeof(*kvm->arch.crypto.pqap_hook), GFP_KERNEL); What is the extra allocation supposed to buy us? > + if (!pqap_hook) > + return -ENOMEM; > + pqap_hook->data = matrix_mdev; > + pqap_hook->hook = handle_pqap; > + pqap_hook->owner = THIS_MODULE; > + rcu_assign_pointer(kvm->arch.crypto.pqap_hook, pqap_hook); > + > + return 0; > +} > + > /** > * vfio_ap_mdev_set_kvm > * > @@ -1107,6 +1128,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = { > static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, > struct kvm *kvm) > { > + int ret; > struct ap_matrix_mdev *m; > > if (kvm->arch.crypto.crycbd) { > @@ -1115,6 +1137,10 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, > return -EPERM; > } > > + ret = vfio_ap_mdev_set_pqap_hook(matrix_mdev, kvm); > + if (ret) > + return ret; > + > kvm_get_kvm(kvm); > matrix_mdev->kvm_busy = true; > mutex_unlock(&matrix_dev->lock); > @@ -1123,7 +1149,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, > matrix_mdev->matrix.aqm, > matrix_mdev->matrix.adm); > mutex_lock(&matrix_dev->lock); > - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; > matrix_mdev->kvm = kvm; > matrix_mdev->kvm_busy = false; > wake_up_all(&matrix_mdev->wait_for_kvm); > @@ -1161,6 +1186,13 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb, > return NOTIFY_DONE; > } > > +static void vfio_ap_mdev_unset_pqap_hook(struct kvm *kvm) > +{ > + rcu_assign_pointer(kvm->arch.crypto.pqap_hook, NULL); > + synchronize_rcu(); > + kfree(kvm->arch.crypto.pqap_hook); > +} > + > /** > * vfio_ap_mdev_unset_kvm > * > @@ -1189,11 +1221,11 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) > > if (matrix_mdev->kvm) { > matrix_mdev->kvm_busy = true; > + vfio_ap_mdev_unset_pqap_hook(matrix_mdev->kvm); > mutex_unlock(&matrix_dev->lock); > kvm_arch_crypto_clear_masks(matrix_mdev->kvm); > mutex_lock(&matrix_dev->lock); > vfio_ap_mdev_reset_queues(matrix_mdev->mdev); > - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; > kvm_put_kvm(matrix_mdev->kvm); > matrix_mdev->kvm = NULL; > matrix_mdev->kvm_busy = false; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler 2021-05-19 17:21 ` Halil Pasic @ 2021-05-19 23:14 ` Tony Krowiak 0 siblings, 0 replies; 15+ messages in thread From: Tony Krowiak @ 2021-05-19 23:14 UTC (permalink / raw) To: Halil Pasic Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson, kwankhede On 5/19/21 1:21 PM, Halil Pasic wrote: > On Wed, 19 May 2021 11:39:21 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> There is currently nothing that controls access to the structure that >> contains the function pointer to the handler that processes interception of >> the PQAP(AQIC) instruction. If the mdev is removed while the PQAP(AQIC) >> instruction is being intercepted, there is a possibility that the function >> pointer to the handler can get wiped out prior to the attempt to call it. >> >> This patch utilizes RCU to synchronize access to the kvm_s390_module_hook >> structure used to process interception of the PQAP(AQIC) instruction. >> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> --- >> arch/s390/include/asm/kvm_host.h | 1 + >> arch/s390/kvm/priv.c | 47 ++++++++++++++++----------- >> drivers/s390/crypto/vfio_ap_ops.c | 54 ++++++++++++++++++++++++------- >> 3 files changed, 73 insertions(+), 29 deletions(-) >> >> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h >> index 8925f3969478..4987e82d6116 100644 >> --- a/arch/s390/include/asm/kvm_host.h >> +++ b/arch/s390/include/asm/kvm_host.h >> @@ -806,6 +806,7 @@ struct kvm_s390_cpu_model { >> struct kvm_s390_module_hook { >> int (*hook)(struct kvm_vcpu *vcpu); >> struct module *owner; >> + void *data; > I guess you need this, because you stopped using the member of struct > ap_mdev_matrix and instead you kzalloc() a new object. Yet I don't > understand why do you do so? I did so because the mdev remove callback frees the matrix_mdev and I thought I could protect against accessing freed storage by storing the pointer in the pqap_hook structure; however, since we can't hold the rcu_read_lock while the handle_pqap is executing - due to wait conditions - this turns out to be a bad idea. I'm not sure at this point that we can use RCU for this because the freeing of the matrix_mdev is independent of pqap processing. > >> }; >> >> struct kvm_s390_crypto { >> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c >> index 9928f785c677..2d330dfbdb61 100644 >> --- a/arch/s390/kvm/priv.c >> +++ b/arch/s390/kvm/priv.c >> @@ -610,8 +610,11 @@ static int handle_io_inst(struct kvm_vcpu *vcpu) >> static int handle_pqap(struct kvm_vcpu *vcpu) >> { >> struct ap_queue_status status = {}; >> + struct kvm_s390_module_hook *pqap_module_hook; >> + int (*pqap_hook)(struct kvm_vcpu *vcpu); >> + struct module *owner; >> unsigned long reg0; >> - int ret; >> + int ret = 0; >> uint8_t fc; >> >> /* Verify that the AP instruction are available */ >> @@ -657,24 +660,32 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >> * Verify that the hook callback is registered, lock the owner >> * and call the hook. >> */ >> - if (vcpu->kvm->arch.crypto.pqap_hook) { >> - if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner)) >> - return -EOPNOTSUPP; >> - ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu); >> - module_put(vcpu->kvm->arch.crypto.pqap_hook->owner); >> - if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) >> - kvm_s390_set_psw_cc(vcpu, 3); >> - return ret; >> + rcu_read_lock(); >> + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); >> + if (pqap_module_hook) { >> + pqap_hook = pqap_module_hook->hook; >> + owner = pqap_module_hook->owner; >> + rcu_read_unlock(); >> + if (!try_module_get(owner)) { > Why do this outside the rcu_read lock? It should be done inside the lock. > > What guarantees that the module ain't gone by this time? I don't think > try_module_get() is guaranteed to give you false if passed in a pointer > that points to some memory that ain't a struct module any more > (use-after-free). That needs to be inside the lock. > >> + ret = -EOPNOTSUPP; >> + } else { >> + ret = pqap_hook(vcpu); >> + module_put(owner); >> + if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000) >> + kvm_s390_set_psw_cc(vcpu, 3); >> + } >> + } else { >> + rcu_read_unlock(); >> + /* >> + * A vfio_driver must register a hook. >> + * No hook means no driver to enable the SIE CRYCB and no >> + * queues. We send this response to the guest. >> + */ >> + status.response_code = 0x01; >> + memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status)); >> + kvm_s390_set_psw_cc(vcpu, 3); >> } >> - /* >> - * A vfio_driver must register a hook. >> - * No hook means no driver to enable the SIE CRYCB and no queues. >> - * We send this response to the guest. >> - */ >> - status.response_code = 0x01; >> - memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status)); >> - kvm_s390_set_psw_cc(vcpu, 3); >> - return 0; >> + return ret; >> } >> >> static int handle_stfl(struct kvm_vcpu *vcpu) >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c >> index f90c9103dac2..a6aa3f753ac4 100644 >> --- a/drivers/s390/crypto/vfio_ap_ops.c >> +++ b/drivers/s390/crypto/vfio_ap_ops.c >> @@ -16,6 +16,7 @@ >> #include <linux/bitops.h> >> #include <linux/kvm_host.h> >> #include <linux/module.h> >> +#include <linux/rcupdate.h> >> #include <asm/kvm.h> >> #include <asm/zcrypt.h> >> >> @@ -279,6 +280,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >> uint64_t status; >> uint16_t apqn; >> struct vfio_ap_queue *q; >> + struct kvm_s390_module_hook *pqap_module_hook; >> struct ap_queue_status qstatus = { >> .response_code = AP_RESPONSE_Q_NOT_AVAIL, }; >> struct ap_matrix_mdev *matrix_mdev; >> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >> if (!(vcpu->arch.sie_block->eca & ECA_AIV)) >> return -EOPNOTSUPP; >> >> - apqn = vcpu->run->s.regs.gprs[0] & 0xffff; >> - mutex_lock(&matrix_dev->lock); >> + rcu_read_lock(); >> + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook); >> + if (!pqap_module_hook) { >> + rcu_read_unlock(); >> + goto set_status; >> + } >> >> - if (!vcpu->kvm->arch.crypto.pqap_hook) >> - goto out_unlock; >> - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook, >> - struct ap_matrix_mdev, pqap_hook); >> + matrix_mdev = pqap_module_hook->data; >> + rcu_read_unlock(); >> + mutex_lock(&matrix_dev->lock); > I agree with Jason's assessment. At this point the matrix_dev pointer > may point to garbage. > > Above, I think we can use the pqap_hook function pointer local to > handle_pqap, because we know that as long as the module is there > the callback will sit at the same address and won't go away. And > we do the try_module_get() to ensure that the module stays loaded. I'll take your word that is true. If so, then I think I can replace the way we get the matrix_mdev in the pqap handler which can be done under the matrix_dev->lock. I can write a function to get the matrix from the list of mdevs in matrix_dev. If the matrix_mdev has been removed, then we can bail out of the handle_pqap function. Maybe the RCU can be used after all. > > >> + apqn = vcpu->run->s.regs.gprs[0] & 0xffff; >> >> /* >> * If the KVM pointer is in the process of being set, wait until the >> @@ -322,9 +328,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu) >> qstatus = vfio_ap_irq_disable(q); >> >> out_unlock: >> + mutex_unlock(&matrix_dev->lock); >> +set_status: >> memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus)); >> vcpu->run->s.regs.gprs[1] >>= 32; >> - mutex_unlock(&matrix_dev->lock); >> return 0; >> } >> >> @@ -353,8 +360,6 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev) >> vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix); >> init_waitqueue_head(&matrix_mdev->wait_for_kvm); >> mdev_set_drvdata(mdev, matrix_mdev); >> - matrix_mdev->pqap_hook.hook = handle_pqap; >> - matrix_mdev->pqap_hook.owner = THIS_MODULE; > I guess the member of struct ap_matrix_mdev is still around, it will > remain all zero. Is this somehow intentional? > > >> mutex_lock(&matrix_dev->lock); >> list_add(&matrix_mdev->node, &matrix_dev->mdev_list); >> mutex_unlock(&matrix_dev->lock); >> @@ -1085,6 +1090,22 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = { >> NULL >> }; >> >> +static int vfio_ap_mdev_set_pqap_hook(struct ap_matrix_mdev *matrix_mdev, >> + struct kvm *kvm) >> +{ >> + struct kvm_s390_module_hook *pqap_hook; >> + >> + pqap_hook = kmalloc(sizeof(*kvm->arch.crypto.pqap_hook), GFP_KERNEL); > What is the extra allocation supposed to buy us? > >> + if (!pqap_hook) >> + return -ENOMEM; >> + pqap_hook->data = matrix_mdev; >> + pqap_hook->hook = handle_pqap; >> + pqap_hook->owner = THIS_MODULE; >> + rcu_assign_pointer(kvm->arch.crypto.pqap_hook, pqap_hook); >> + >> + return 0; >> +} >> + >> /** >> * vfio_ap_mdev_set_kvm >> * >> @@ -1107,6 +1128,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = { >> static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, >> struct kvm *kvm) >> { >> + int ret; >> struct ap_matrix_mdev *m; >> >> if (kvm->arch.crypto.crycbd) { >> @@ -1115,6 +1137,10 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, >> return -EPERM; >> } >> >> + ret = vfio_ap_mdev_set_pqap_hook(matrix_mdev, kvm); >> + if (ret) >> + return ret; >> + >> kvm_get_kvm(kvm); >> matrix_mdev->kvm_busy = true; >> mutex_unlock(&matrix_dev->lock); >> @@ -1123,7 +1149,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, >> matrix_mdev->matrix.aqm, >> matrix_mdev->matrix.adm); >> mutex_lock(&matrix_dev->lock); >> - kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook; >> matrix_mdev->kvm = kvm; >> matrix_mdev->kvm_busy = false; >> wake_up_all(&matrix_mdev->wait_for_kvm); >> @@ -1161,6 +1186,13 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb, >> return NOTIFY_DONE; >> } >> >> +static void vfio_ap_mdev_unset_pqap_hook(struct kvm *kvm) >> +{ >> + rcu_assign_pointer(kvm->arch.crypto.pqap_hook, NULL); >> + synchronize_rcu(); >> + kfree(kvm->arch.crypto.pqap_hook); >> +} >> + >> /** >> * vfio_ap_mdev_unset_kvm >> * >> @@ -1189,11 +1221,11 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev) >> >> if (matrix_mdev->kvm) { >> matrix_mdev->kvm_busy = true; >> + vfio_ap_mdev_unset_pqap_hook(matrix_mdev->kvm); >> mutex_unlock(&matrix_dev->lock); >> kvm_arch_crypto_clear_masks(matrix_mdev->kvm); >> mutex_lock(&matrix_dev->lock); >> vfio_ap_mdev_reset_queues(matrix_mdev->mdev); >> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL; >> kvm_put_kvm(matrix_mdev->kvm); >> matrix_mdev->kvm = NULL; >> matrix_mdev->kvm_busy = false; ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-05-21 18:30 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-19 15:39 [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback Tony Krowiak 2021-05-19 15:39 ` [PATCH v3 1/2] " Tony Krowiak 2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak 2021-05-19 16:16 ` Jason Gunthorpe 2021-05-19 23:04 ` Tony Krowiak 2021-05-19 23:22 ` Jason Gunthorpe 2021-05-20 1:08 ` Tony Krowiak 2021-05-20 8:48 ` Halil Pasic 2021-05-20 12:26 ` Jason Gunthorpe 2021-05-20 8:38 ` Halil Pasic 2021-05-20 12:51 ` Jason Gunthorpe 2021-05-21 18:24 ` Tony Krowiak 2021-05-21 18:30 ` Jason Gunthorpe 2021-05-19 17:21 ` Halil Pasic 2021-05-19 23:14 ` Tony Krowiak
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.