All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drivers:uio: Fix system crashes during driver switchover
@ 2022-05-11  6:34 zhanghongtao (A)
  2022-05-11  6:48 ` Greg KH
  2022-05-11 12:45 ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: zhanghongtao (A) @ 2022-05-11  6:34 UTC (permalink / raw)
  To: gregkh; +Cc: akpm, linfeilong, suweifeng1, linux-kernel, io-uring

From: Hongtao Zhang <zhanghongtao22@huawei.com>

Switch the driver of the SPDK program that is being read and written from the uio_pci_generic driver to the NVMe driver
(Unbind the UIO driver from the device and bind the NVMe driver to the device.) ,the system crashes and restarts.
Bug reproduction: When the SPDK is reading or writing data, run the following command: /opt/spdk/setup.sh reset
The one with a higher probability of occurrence is as follows:
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
    [exception RIP: _raw_spin_lock_irqsave+30]
    RIP: ffffffff836a1cae  RSP: ffff8bca9ecc3f20  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000246  RCX: 0000000000000017
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000008
    RBP: 0000000000000000   R8: 000000afb34e50f9   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8bca9ecc3f50
    R13: 0000000000000004  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8bca9ecc3f28] complete at ffffffff82f09bb8
reason:After the driver switchover, the upper-layer program can still access the bar space of the NVMe disk controller and knock the doorbell.
To solve this problem, a reference counting is added to prevent unbind execution before the application is closed or exited.

Signed-off-by: Hongtao Zhang <zhanghongtao22@huawei.com>
Reviewed-by: Weifeng Su <suweifeng1@huawei.com>
---
 drivers/uio/uio.c          | 13 +++++++++++++
 include/linux/uio_driver.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 43afbb7c5ab9..cb8ed29a8648 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -31,6 +31,7 @@ static int uio_major;
 static struct cdev *uio_cdev;
 static DEFINE_IDR(uio_idr);
 static const struct file_operations uio_fops;
+static DECLARE_WAIT_QUEUE_HEAD(refc_wait);

 /* Protect idr accesses */
 static DEFINE_MUTEX(minor_lock);
@@ -501,6 +502,7 @@ static int uio_open(struct inode *inode, struct file *filep)
 	mutex_unlock(&idev->info_lock);
 	if (ret)
 		goto err_infoopen;
+	refcount_inc(&idev->dev_refc);

 	return 0;

@@ -536,6 +538,9 @@ static int uio_release(struct inode *inode, struct file *filep)
 		ret = idev->info->release(idev->info, inode);
 	mutex_unlock(&idev->info_lock);

+	if (refcount_dec_and_test(&idev->dev_refc))
+			wake_up(&refc_wait);
+
 	module_put(idev->owner);
 	kfree(listener);
 	put_device(&idev->dev);
@@ -937,6 +942,7 @@ int __uio_register_device(struct module *owner,

 	idev->owner = owner;
 	idev->info = info;
+	refcount_set(&idev->dev_refc, 0);
 	mutex_init(&idev->info_lock);
 	init_waitqueue_head(&idev->wait);
 	atomic_set(&idev->event, 0);
@@ -1045,6 +1051,7 @@ void uio_unregister_device(struct uio_info *info)
 {
 	struct uio_device *idev;
 	unsigned long minor;
+	unsigned int dref_count;

 	if (!info || !info->uio_dev)
 		return;
@@ -1052,6 +1059,12 @@ void uio_unregister_device(struct uio_info *info)
 	idev = info->uio_dev;
 	minor = idev->minor;

+	dref_count = refcount_read(&idev->dev_refc);
+	if (dref_count > 0) {
+		dev_err(&idev->dev, "The device is in use, please close the file descriptor or kill the occupied process\n");
+		wait_event(refc_wait, !refcount_read(&idev->dev_refc));
+	}
+
 	mutex_lock(&idev->info_lock);
 	uio_dev_del_attributes(idev);

diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 47c5962b876b..28301dcc4d31 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -77,6 +77,7 @@ struct uio_device {
 	struct mutex		info_lock;
 	struct kobject          *map_dir;
 	struct kobject          *portio_dir;
+	refcount_t				dev_refc;
 };

 /**
-- 
2.27.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers:uio: Fix system crashes during driver switchover
  2022-05-11  6:34 [PATCH] drivers:uio: Fix system crashes during driver switchover zhanghongtao (A)
@ 2022-05-11  6:48 ` Greg KH
  2022-05-11  8:51   ` zhanghongtao (A)
  2022-05-11 12:45 ` Christoph Hellwig
  1 sibling, 1 reply; 6+ messages in thread
From: Greg KH @ 2022-05-11  6:48 UTC (permalink / raw)
  To: zhanghongtao (A); +Cc: akpm, linfeilong, suweifeng1, linux-kernel, io-uring

On Wed, May 11, 2022 at 02:34:28PM +0800, zhanghongtao (A) wrote:
> From: Hongtao Zhang <zhanghongtao22@huawei.com>
> 
> Switch the driver of the SPDK program that is being read and written from the uio_pci_generic driver to the NVMe driver
> (Unbind the UIO driver from the device and bind the NVMe driver to the device.) ,the system crashes and restarts.
> Bug reproduction: When the SPDK is reading or writing data, run the following command: /opt/spdk/setup.sh reset

Please properly wrap your lines at 72 columns like the editor asked you
to.

> The one with a higher probability of occurrence is as follows:
> PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
>     [exception RIP: _raw_spin_lock_irqsave+30]
>     RIP: ffffffff836a1cae  RSP: ffff8bca9ecc3f20  RFLAGS: 00010046
>     RAX: 0000000000000000  RBX: 0000000000000246  RCX: 0000000000000017
>     RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000008
>     RBP: 0000000000000000   R8: 000000afb34e50f9   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff8bca9ecc3f50
>     R13: 0000000000000004  R14: 0000000000000004  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #7 [ffff8bca9ecc3f28] complete at ffffffff82f09bb8
> reason:After the driver switchover, the upper-layer program can still access the bar space of the NVMe disk controller and knock the doorbell.
> To solve this problem, a reference counting is added to prevent unbind execution before the application is closed or exited.
> 
> Signed-off-by: Hongtao Zhang <zhanghongtao22@huawei.com>
> Reviewed-by: Weifeng Su <suweifeng1@huawei.com>
> ---
>  drivers/uio/uio.c          | 13 +++++++++++++
>  include/linux/uio_driver.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 43afbb7c5ab9..cb8ed29a8648 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -31,6 +31,7 @@ static int uio_major;
>  static struct cdev *uio_cdev;
>  static DEFINE_IDR(uio_idr);
>  static const struct file_operations uio_fops;
> +static DECLARE_WAIT_QUEUE_HEAD(refc_wait);
> 
>  /* Protect idr accesses */
>  static DEFINE_MUTEX(minor_lock);
> @@ -501,6 +502,7 @@ static int uio_open(struct inode *inode, struct file *filep)
>  	mutex_unlock(&idev->info_lock);
>  	if (ret)
>  		goto err_infoopen;
> +	refcount_inc(&idev->dev_refc);
> 
>  	return 0;
> 
> @@ -536,6 +538,9 @@ static int uio_release(struct inode *inode, struct file *filep)
>  		ret = idev->info->release(idev->info, inode);
>  	mutex_unlock(&idev->info_lock);
> 
> +	if (refcount_dec_and_test(&idev->dev_refc))
> +			wake_up(&refc_wait);
> +
>  	module_put(idev->owner);
>  	kfree(listener);
>  	put_device(&idev->dev);
> @@ -937,6 +942,7 @@ int __uio_register_device(struct module *owner,
> 
>  	idev->owner = owner;
>  	idev->info = info;
> +	refcount_set(&idev->dev_refc, 0);
>  	mutex_init(&idev->info_lock);
>  	init_waitqueue_head(&idev->wait);
>  	atomic_set(&idev->event, 0);
> @@ -1045,6 +1051,7 @@ void uio_unregister_device(struct uio_info *info)
>  {
>  	struct uio_device *idev;
>  	unsigned long minor;
> +	unsigned int dref_count;
> 
>  	if (!info || !info->uio_dev)
>  		return;
> @@ -1052,6 +1059,12 @@ void uio_unregister_device(struct uio_info *info)
>  	idev = info->uio_dev;
>  	minor = idev->minor;
> 
> +	dref_count = refcount_read(&idev->dev_refc);
> +	if (dref_count > 0) {

You can not do this, it could have changed right after reading this.

Also we went through this many times in the past already, why submit
this type of change again?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers:uio: Fix system crashes during driver switchover
  2022-05-11  6:48 ` Greg KH
@ 2022-05-11  8:51   ` zhanghongtao (A)
  2022-05-11 10:09     ` Greg KH
  2022-05-17  7:34     ` Greg KH
  0 siblings, 2 replies; 6+ messages in thread
From: zhanghongtao (A) @ 2022-05-11  8:51 UTC (permalink / raw)
  To: Greg KH; +Cc: akpm, linfeilong, suweifeng1, linux-kernel, io-uring

Thanks for your reply.
I looked through the historical emails and thought I was not the
same problem as his.
After the driver is switched, the application can still operate
on the mapped address, which causes the system to crash.
The application is not aware of the driver's switchover.
The solution I can think of is to block the switch and wait for
the application to release before switching, as shown in the patch.
So want to seek help from the community, how to solve it better?
Is there a better way?

在 2022/5/11 14:48, Greg KH 写道:
> On Wed, May 11, 2022 at 02:34:28PM +0800, zhanghongtao (A) wrote:
>> From: Hongtao Zhang <zhanghongtao22@huawei.com>
>>
>> Switch the driver of the SPDK program that is being read and written from the uio_pci_generic driver to the NVMe driver
>> (Unbind the UIO driver from the device and bind the NVMe driver to the device.) ,the system crashes and restarts.
>> Bug reproduction: When the SPDK is reading or writing data, run the following command: /opt/spdk/setup.sh reset
> 
> Please properly wrap your lines at 72 columns like the editor asked you
> to.
> 
>> The one with a higher probability of occurrence is as follows:
>> PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
>>     [exception RIP: _raw_spin_lock_irqsave+30]
>>     RIP: ffffffff836a1cae  RSP: ffff8bca9ecc3f20  RFLAGS: 00010046
>>     RAX: 0000000000000000  RBX: 0000000000000246  RCX: 0000000000000017
>>     RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000008
>>     RBP: 0000000000000000   R8: 000000afb34e50f9   R9: 0000000000000000
>>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff8bca9ecc3f50
>>     R13: 0000000000000004  R14: 0000000000000004  R15: 0000000000000000
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>  #7 [ffff8bca9ecc3f28] complete at ffffffff82f09bb8
>> reason:After the driver switchover, the upper-layer program can still access the bar space of the NVMe disk controller and knock the doorbell.
>> To solve this problem, a reference counting is added to prevent unbind execution before the application is closed or exited.
>>
>> Signed-off-by: Hongtao Zhang <zhanghongtao22@huawei.com>
>> Reviewed-by: Weifeng Su <suweifeng1@huawei.com>
>> ---
>>  drivers/uio/uio.c          | 13 +++++++++++++
>>  include/linux/uio_driver.h |  1 +
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
>> index 43afbb7c5ab9..cb8ed29a8648 100644
>> --- a/drivers/uio/uio.c
>> +++ b/drivers/uio/uio.c
>> @@ -31,6 +31,7 @@ static int uio_major;
>>  static struct cdev *uio_cdev;
>>  static DEFINE_IDR(uio_idr);
>>  static const struct file_operations uio_fops;
>> +static DECLARE_WAIT_QUEUE_HEAD(refc_wait);
>>
>>  /* Protect idr accesses */
>>  static DEFINE_MUTEX(minor_lock);
>> @@ -501,6 +502,7 @@ static int uio_open(struct inode *inode, struct file *filep)
>>  	mutex_unlock(&idev->info_lock);
>>  	if (ret)
>>  		goto err_infoopen;
>> +	refcount_inc(&idev->dev_refc);
>>
>>  	return 0;
>>
>> @@ -536,6 +538,9 @@ static int uio_release(struct inode *inode, struct file *filep)
>>  		ret = idev->info->release(idev->info, inode);
>>  	mutex_unlock(&idev->info_lock);
>>
>> +	if (refcount_dec_and_test(&idev->dev_refc))
>> +			wake_up(&refc_wait);
>> +
>>  	module_put(idev->owner);
>>  	kfree(listener);
>>  	put_device(&idev->dev);
>> @@ -937,6 +942,7 @@ int __uio_register_device(struct module *owner,
>>
>>  	idev->owner = owner;
>>  	idev->info = info;
>> +	refcount_set(&idev->dev_refc, 0);
>>  	mutex_init(&idev->info_lock);
>>  	init_waitqueue_head(&idev->wait);
>>  	atomic_set(&idev->event, 0);
>> @@ -1045,6 +1051,7 @@ void uio_unregister_device(struct uio_info *info)
>>  {
>>  	struct uio_device *idev;
>>  	unsigned long minor;
>> +	unsigned int dref_count;
>>
>>  	if (!info || !info->uio_dev)
>>  		return;
>> @@ -1052,6 +1059,12 @@ void uio_unregister_device(struct uio_info *info)
>>  	idev = info->uio_dev;
>>  	minor = idev->minor;
>>
>> +	dref_count = refcount_read(&idev->dev_refc);
>> +	if (dref_count > 0) {
> 
> You can not do this, it could have changed right after reading this.
> 
> Also we went through this many times in the past already, why submit
> this type of change again?
> 
> thanks,
> 
> greg k-h
> .

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers:uio: Fix system crashes during driver switchover
  2022-05-11  8:51   ` zhanghongtao (A)
@ 2022-05-11 10:09     ` Greg KH
  2022-05-17  7:34     ` Greg KH
  1 sibling, 0 replies; 6+ messages in thread
From: Greg KH @ 2022-05-11 10:09 UTC (permalink / raw)
  To: zhanghongtao (A); +Cc: akpm, linfeilong, suweifeng1, linux-kernel, io-uring

A: http://en.wikipedia.org/wiki/Top_post
Q: Were do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Wed, May 11, 2022 at 04:51:11PM +0800, zhanghongtao (A) wrote:
> Thanks for your reply.
> I looked through the historical emails and thought I was not the
> same problem as his.
> After the driver is switched, the application can still operate
> on the mapped address, which causes the system to crash.
> The application is not aware of the driver's switchover.
> The solution I can think of is to block the switch and wait for
> the application to release before switching, as shown in the patch.
> So want to seek help from the community, how to solve it better?
> Is there a better way?

A better way to do what?  I have no context here at all, sorry.

What is "switching" the driver?  If it is root by doing a bind/unbind,
then you are on your own and have to do that very carefully, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers:uio: Fix system crashes during driver switchover
  2022-05-11  6:34 [PATCH] drivers:uio: Fix system crashes during driver switchover zhanghongtao (A)
  2022-05-11  6:48 ` Greg KH
@ 2022-05-11 12:45 ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2022-05-11 12:45 UTC (permalink / raw)
  To: zhanghongtao (A)
  Cc: gregkh, akpm, linfeilong, suweifeng1, linux-kernel, io-uring

I thought SPDK uses vfio?  Using uio for anything that does DMA is
completelz unsafe.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drivers:uio: Fix system crashes during driver switchover
  2022-05-11  8:51   ` zhanghongtao (A)
  2022-05-11 10:09     ` Greg KH
@ 2022-05-17  7:34     ` Greg KH
  1 sibling, 0 replies; 6+ messages in thread
From: Greg KH @ 2022-05-17  7:34 UTC (permalink / raw)
  To: zhanghongtao (A); +Cc: akpm, linfeilong, suweifeng1, linux-kernel, io-uring

On Wed, May 11, 2022 at 04:51:11PM +0800, zhanghongtao (A) wrote:
> Thanks for your reply.
> I looked through the historical emails and thought I was not the
> same problem as his.
> After the driver is switched, the application can still operate
> on the mapped address, which causes the system to crash.
> The application is not aware of the driver's switchover.
> The solution I can think of is to block the switch and wait for
> the application to release before switching, as shown in the patch.
> So want to seek help from the community, how to solve it better?
> Is there a better way?

Yes, do not unbind the driver from the device unless all userspace
programs have stopped using the device.

Also, do not use the UIO driver for disk devices, that way is crazy, use
a real block driver.  What prevents you from doing that today?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-17  7:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11  6:34 [PATCH] drivers:uio: Fix system crashes during driver switchover zhanghongtao (A)
2022-05-11  6:48 ` Greg KH
2022-05-11  8:51   ` zhanghongtao (A)
2022-05-11 10:09     ` Greg KH
2022-05-17  7:34     ` Greg KH
2022-05-11 12:45 ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.