From: Zhangfei Gao <zhangfei.gao@linaro.org>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Wangzhou <wangzhou1@hisilicon.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-accelerators@lists.ozlabs.org,
	linux-kernel@vger.kernel.org,
	linux-crypto@vger.kernel.org,
	iommu@lists.linux-foundation.org,
	Yang Shen <shenyang39@huawei.com>
Subject: Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove
Date: Thu, 16 Jun 2022 12:10:18 +0800
Message-ID: <fdc8d8b0-4e04-78f5-1e8a-4cf44c89a37f@linaro.org>
In-Reply-To: <Yqn3spLZHpAkQ9Us@myrica>

Hi, Jean

On 2022/6/15 11:16 PM, Jean-Philippe Brucker wrote:
> Hi,
>
> On Fri, Jun 10, 2022 at 08:34:23PM +0800, Zhangfei Gao wrote:
>> The uacce parent's module can be removed when uacce is working,
>> which may cause troubles.
>>
>> If rmmod/uacce_remove happens just after fops_open: bind_queue,
>> the uacce_remove can not remove the bound queue since it is not
>> added to the queue list yet, which blocks the uacce_disable_sva.
>>
>> Change queues_lock area to make sure the bound queue is added to
>> the list thereby can be searched in uacce_remove.
>>
>> And uacce->parent->driver is checked immediately in case rmmod is
>> just happening.
>>
>> Also the parent driver must always stop DMA before calling
>> uacce_remove.
>>
>> Signed-off-by: Yang Shen <shenyang39@huawei.com>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>> ---
>>  drivers/misc/uacce/uacce.c | 19 +++++++++++++------
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
>> index 281c54003edc..b6219c6bfb48 100644
>> --- a/drivers/misc/uacce/uacce.c
>> +++ b/drivers/misc/uacce/uacce.c
>> @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
>>  	if (!q)
>>  		return -ENOMEM;
>>
>> +	mutex_lock(&uacce->queues_lock);
>> +
>> +	if (!uacce->parent->driver) {
> I don't think this is useful, because the core clears parent->driver after
> having run uacce_remove():
>
>   rmmod hisi_zip                 open()
>    ...                            uacce_fops_open()
>    __device_release_driver()      ...
>     pci_device_remove()
>      hisi_zip_remove()
>       hisi_qm_uninit()
>        uacce_remove()
>         ...                       ...
>                                   mutex_lock(uacce->queues_lock)
>    ...                            if (!uacce->parent->driver)
>    device_unbind_cleanup()        /* driver still valid, proceed */
>     dev->driver = NULL

The check if (!uacce->parent->driver) is required; otherwise a NULL
pointer dereference may happen in:

iommu_sva_bind_device
  const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
  dev->iommu->iommu_dev->ops

rmmod has no issue, but removing the parent PCI device does.

Test:
Sleep in fops_open before taking the mutex.

estuary:/mnt$ ./work/a.out &        // sleeps in fops_open
echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove &

estuary:/mnt$ [   22.594348] uacce_remove!
[   22.594663] pci 0000:00:02.0: Removing from iommu group 2
[   22.595073] iommu_release_device dev->iommu=0
[   22.595076] CPU: 2 PID: 229 Comm: ash Not tainted 5.19.0-rc1-15071-gcbcf098c5257-dirty #633
[   22.595079] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   22.595080] Call trace:
[   22.595080]  dump_backtrace+0xe4/0xf0
[   22.595085]  show_stack+0x20/0x70
[   22.595086]  dump_stack_lvl+0x8c/0xb8
[   22.595089]  dump_stack+0x18/0x34
[   22.595091]  iommu_release_device+0x90/0x98
[   22.595095]  iommu_bus_notifier+0x58/0x68
[   22.595097]  blocking_notifier_call_chain+0x74/0xa8
[   22.595100]  device_del+0x268/0x3b0
[   22.595102]  pci_remove_bus_device+0x84/0x110
[   22.595106]  pci_stop_and_remove_bus_device_locked+0x30/0x60
...
estuary:/mnt$ [   31.466360] uacce: sleep end!
[   31.466362] uacce->parent->driver=0
[   31.466364] uacce->parent->iommu=0
[   31.466365] uacce_bind_queue!
[   31.466366] uacce_bind_queue call iommu_sva_bind_device!
[   31.466367] uacce->parent=d120d0
[   31.466371] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
[   31.472870] Mem abort info:
[   31.473450]   ESR = 0x0000000096000004
[   31.474223]   EC = 0x25: DABT (current EL), IL = 32 bits
[   31.475390]   SET = 0, FnV = 0
[   31.476031]   EA = 0, S1PTW = 0
[   31.476680]   FSC = 0x04: level 0 translation fault
[   31.477687] Data abort info:
[   31.478294]   ISV = 0, ISS = 0x00000004
[   31.479152]   CM = 0, WnR = 0
[   31.479785] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000714d8000
[   31.481144] [0000000000000038] pgd=0000000000000000, p4d=0000000000000000
[   31.482622] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   31.483784] Modules linked in: hisi_zip
[   31.484590] CPU: 2 PID: 228 Comm: a.out Not tainted 5.19.0-rc1-15071-gcbcf098c5257-dirty #633
[   31.486374] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   31.487862] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   31.489390] pc : iommu_sva_bind_device+0x44/0xf4
[   31.490404] lr : uacce_fops_open+0x128/0x234

>
> Since uacce_remove() disabled SVA, the following uacce_bind_queue() will
> fail anyway. However, if uacce->flags does not have UACCE_DEV_SVA set,
> we'll proceed further and call uacce->ops->get_queue(), which does not
> exist anymore since the parent module is gone.
>
> I think we need the global uacce_mutex to serialize uacce_remove() and
> uacce_fops_open(). uacce_remove() would do everything, including
> xa_erase(), while holding that mutex. And uacce_fops_open() would try to
> obtain the uacce object from the xarray while holding the mutex, which
> fails if the uacce object is being removed.

Since fops_open takes a reference on the char device, uacce_release will
not happen until open returns.
So either uacce = xa_load(&uacce_xa, iminor(inode)) returns the object,
and uacce_release frees uacce only after fops_release,
or the object is not found and open returns -ENODEV.

open:
        uacce = xa_load(&uacce_xa, iminor(inode));
        if (!uacce)
                return -ENODEV;

uacce->dev.release = uacce_release;
uacce_release:
        kfree(uacce);

Thanks
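
For readers following the thread, the serialization Jean-Philippe proposes above (one global mutex held across both the lookup in open and the erase in remove) can be modeled in userspace roughly as below. This is only a sketch under stated assumptions, not the patch under discussion and not kernel code: a pthread mutex stands in for uacce_mutex, a fixed array stands in for uacce_xa, and every model_* name is hypothetical.

```c
/*
 * Userspace model only. A pthread mutex stands in for the kernel's
 * global uacce_mutex and a fixed array stands in for uacce_xa.
 * All model_* names are hypothetical and do not exist in uacce.c.
 */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8

struct model_dev {
	int nr_queues;	/* queues opened while the device was registered */
};

static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct model_dev *model_registry[MODEL_MAX_DEVS];

/* Register a device under the lock; returns 0 or a negative errno. */
int model_register(unsigned int minor, struct model_dev *dev)
{
	int ret = 0;

	if (minor >= MODEL_MAX_DEVS)
		return -EINVAL;

	pthread_mutex_lock(&model_mutex);
	if (model_registry[minor])
		ret = -EBUSY;
	else
		model_registry[minor] = dev;
	pthread_mutex_unlock(&model_mutex);
	return ret;
}

/* Models open(): lookup and queue setup share one lock with removal. */
int model_open(unsigned int minor)
{
	struct model_dev *dev;
	int ret = 0;

	if (minor >= MODEL_MAX_DEVS)
		return -ENODEV;

	pthread_mutex_lock(&model_mutex);
	dev = model_registry[minor];
	if (!dev)
		ret = -ENODEV;		/* device already removed */
	else
		dev->nr_queues++;	/* removal cannot run concurrently here */
	pthread_mutex_unlock(&model_mutex);
	return ret;
}

/* Models remove(): teardown and erasing the entry happen under the lock. */
void model_remove(unsigned int minor)
{
	if (minor >= MODEL_MAX_DEVS)
		return;

	pthread_mutex_lock(&model_mutex);
	if (model_registry[minor]) {
		model_registry[minor]->nr_queues = 0;	/* stand-in for queue teardown */
		model_registry[minor] = NULL;
	}
	pthread_mutex_unlock(&model_mutex);
}
```

In this model an open that wins the lock finishes its setup before removal can start, and an open that loses the lock sees an empty slot and returns -ENODEV. It deliberately ignores object lifetime (the refcount and uacce_release point raised in the reply), which the real driver still has to handle separately.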