linux-kernel.vger.kernel.org archive mirror
* [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
@ 2023-03-21 10:50 Lei Lei2 Yin
  2023-03-21 11:09 ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Lei Lei2 Yin @ 2023-03-21 10:50 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel, cybeyond

From b134e7930b50679ce48e5522ddd37672b1802340 Mon Sep 17 00:00:00 2001
From: Lei Yin <yinlei2@lenovo.com>
Date: Tue, 21 Mar 2023 16:09:08 +0800
Subject: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme
 multipath

When blk_queue_split runs in nvme_ns_head_submit_bio, the input bio is
split into two bios. If the parent bio completes first and the gendisk
its bi_disk points to is then kfreed by nvme_free_ns, the child bio
will access the freed bi_disk in bio_endio. This triggers a
heap-use-after-free or a NULL pointer oops.
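
For reference, the completion path involved looks roughly like this in
linux-5.10.y (abridged sketch, not verbatim kernel source):

void bio_endio(struct bio *bio)
{
again:
	if (!bio_remaining_done(bio))
		return;
	/* ... */
	if (bio->bi_disk)		/* reads the (possibly freed) gendisk */
		rq_qos_done_bio(bio->bi_disk->queue, bio);

	if (bio->bi_end_io == bio_chain_endio) {
		bio = __bio_chain_endio(bio);	/* returns bio->bi_private, the parent */
		goto again;			/* continue with the parent bio */
	}

	if (bio->bi_disk && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
		trace_block_bio_complete(bio->bi_disk->queue, bio);
		/* ... */
	}
	/* ... */
}

Once the split child finishes, bio_endio walks to the parent via
__bio_chain_endio and dereferences the parent's bi_disk again; if
nvme_free_ns has already freed that gendisk, these reads hit freed
memory.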

The following is kasan report:

BUG: KASAN: use-after-free in bio_endio+0x477/0x500
Read of size 8 at addr ffff888106f2e3a8 by task kworker/1:1H/241

CPU: 1 PID: 241 Comm: kworker/1:1H Kdump: loaded Tainted: G           O
      5.10.167 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: kblockd nvme_requeue_work [nvme_core]
Call Trace:
 dump_stack+0x92/0xc4
 ? bio_endio+0x477/0x500
 print_address_description.constprop.7+0x1e/0x230
 ? record_print_text.cold.40+0x11/0x11
 ? _raw_spin_trylock_bh+0x120/0x120
 ? blk_throtl_bio+0x225/0x3050
 ? bio_endio+0x477/0x500
 ? bio_endio+0x477/0x500
 kasan_report.cold.9+0x37/0x7c
 ? bio_endio+0x477/0x500
 bio_endio+0x477/0x500
 nvme_ns_head_submit_bio+0x950/0x1130 [nvme_core]
 ? nvme_find_path+0x7f0/0x7f0 [nvme_core]
 ? __kasan_slab_free+0x11a/0x150
 ? bio_endio+0x213/0x500
 submit_bio_noacct+0x2a4/0xd10
 ? _dev_info+0xcd/0xff
 ? _dev_notice+0xff/0xff
 ? blk_queue_enter+0x6c0/0x6c0
 ? _raw_spin_lock_irq+0x81/0xd5
 ? _raw_spin_lock+0xd0/0xd0
 nvme_requeue_work+0x144/0x18c [nvme_core]
 process_one_work+0x878/0x13e0
 worker_thread+0x87/0xf70
 ? __kthread_parkme+0x8f/0x100
 ? process_one_work+0x13e0/0x13e0
 kthread+0x30f/0x3d0
 ? kthread_parkme+0x80/0x80
 ret_from_fork+0x1f/0x30

Allocated by task 52:
 kasan_save_stack+0x19/0x40
 __kasan_kmalloc.constprop.11+0xc8/0xd0
 __alloc_disk_node+0x5c/0x320
 nvme_alloc_ns+0x6e9/0x1520 [nvme_core]
 nvme_validate_or_alloc_ns+0x17c/0x370 [nvme_core]
 nvme_scan_work+0x2d4/0x4d0 [nvme_core]
 process_one_work+0x878/0x13e0
 worker_thread+0x87/0xf70
 kthread+0x30f/0x3d0
 ret_from_fork+0x1f/0x30

Freed by task 54:
 kasan_save_stack+0x19/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x1b/0x30
 __kasan_slab_free+0x108/0x150
 kfree+0xa7/0x300
 device_release+0x98/0x210
 kobject_release+0x109/0x3a0
 nvme_free_ns+0x15e/0x1f7 [nvme_core]
 nvme_remove_namespaces+0x22f/0x390 [nvme_core]
 nvme_do_delete_ctrl+0xac/0x106 [nvme_core]
 process_one_work+0x878/0x13e0
 worker_thread+0x87/0xf70
 kthread+0x30f/0x3d0
 ret_from_fork+0x1f/0x30

Signed-off-by: Lei Yin <yinlei2@lenovo.com>
---
 drivers/nvme/host/nvme.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index c3e4d9b6f9c0..b441c5ce4157 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -749,8 +749,17 @@ static inline void nvme_trace_bio_complete(struct request *req,
 {
 	struct nvme_ns *ns = req->q->queuedata;
 
-	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
+	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio) {
 		trace_block_bio_complete(ns->head->disk->queue, req->bio);
+
+		/* Point bio->bi_disk at the head disk. This bio may be another
+		 * bio's parent in a bio chain; if its gendisk is kfreed by
+		 * nvme_free_ns, the child can reach this bio via __bio_chain_endio
+		 * in bio_endio and dereference the freed bi_disk, triggering a
+		 * heap-use-after-free or a NULL pointer oops.
+		 */
+		req->bio->bi_disk = ns->head->disk;
+	}
 }
 
 extern struct device_attribute dev_attr_ana_grpid;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
  2023-03-21 10:50 [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath Lei Lei2 Yin
@ 2023-03-21 11:09 ` Sagi Grimberg
  2023-03-21 11:54   ` [External] " Lei Lei2 Yin
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2023-03-21 11:09 UTC (permalink / raw)
  To: Lei Lei2 Yin, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel, cybeyond



On 3/21/23 12:50, Lei Lei2 Yin wrote:
>  From b134e7930b50679ce48e5522ddd37672b1802340 Mon Sep 17 00:00:00 2001
> From: Lei Yin <yinlei2@lenovo.com>
> Date: Tue, 21 Mar 2023 16:09:08 +0800
> Subject: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme
>   multipath
> 
> When blk_queue_split runs in nvme_ns_head_submit_bio, the input bio is
> split into two bios. If the parent bio completes first and the gendisk
> its bi_disk points to is then kfreed by nvme_free_ns, the child bio
> will access the freed bi_disk in bio_endio. This triggers a
> heap-use-after-free or a NULL pointer oops.

Can you explain further? It is unclear to me how we can delete the ns
gendisk.

> 
> The following is kasan report:
> 
> BUG: KASAN: use-after-free in bio_endio+0x477/0x500
> Read of size 8 at addr ffff888106f2e3a8 by task kworker/1:1H/241
> 
> CPU: 1 PID: 241 Comm: kworker/1:1H Kdump: loaded Tainted: G           O
>        5.10.167 #1
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Workqueue: kblockd nvme_requeue_work [nvme_core]
> Call Trace:
>   dump_stack+0x92/0xc4
>   ? bio_endio+0x477/0x500
>   print_address_description.constprop.7+0x1e/0x230
>   ? record_print_text.cold.40+0x11/0x11
>   ? _raw_spin_trylock_bh+0x120/0x120
>   ? blk_throtl_bio+0x225/0x3050
>   ? bio_endio+0x477/0x500
>   ? bio_endio+0x477/0x500
>   kasan_report.cold.9+0x37/0x7c
>   ? bio_endio+0x477/0x500
>   bio_endio+0x477/0x500
>   nvme_ns_head_submit_bio+0x950/0x1130 [nvme_core]
>   ? nvme_find_path+0x7f0/0x7f0 [nvme_core]
>   ? __kasan_slab_free+0x11a/0x150
>   ? bio_endio+0x213/0x500
>   submit_bio_noacct+0x2a4/0xd10
>   ? _dev_info+0xcd/0xff
>   ? _dev_notice+0xff/0xff
>   ? blk_queue_enter+0x6c0/0x6c0
>   ? _raw_spin_lock_irq+0x81/0xd5
>   ? _raw_spin_lock+0xd0/0xd0
>   nvme_requeue_work+0x144/0x18c [nvme_core]
>   process_one_work+0x878/0x13e0
>   worker_thread+0x87/0xf70
>   ? __kthread_parkme+0x8f/0x100
>   ? process_one_work+0x13e0/0x13e0
>   kthread+0x30f/0x3d0
>   ? kthread_parkme+0x80/0x80
>   ret_from_fork+0x1f/0x30
> 
> Allocated by task 52:
>   kasan_save_stack+0x19/0x40
>   __kasan_kmalloc.constprop.11+0xc8/0xd0
>   __alloc_disk_node+0x5c/0x320
>   nvme_alloc_ns+0x6e9/0x1520 [nvme_core]
>   nvme_validate_or_alloc_ns+0x17c/0x370 [nvme_core]
>   nvme_scan_work+0x2d4/0x4d0 [nvme_core]
>   process_one_work+0x878/0x13e0
>   worker_thread+0x87/0xf70
>   kthread+0x30f/0x3d0
>   ret_from_fork+0x1f/0x30
> 
> Freed by task 54:
>   kasan_save_stack+0x19/0x40
>   kasan_set_track+0x1c/0x30
>   kasan_set_free_info+0x1b/0x30
>   __kasan_slab_free+0x108/0x150
>   kfree+0xa7/0x300
>   device_release+0x98/0x210
>   kobject_release+0x109/0x3a0
>   nvme_free_ns+0x15e/0x1f7 [nvme_core]
>   nvme_remove_namespaces+0x22f/0x390 [nvme_core]
>   nvme_do_delete_ctrl+0xac/0x106 [nvme_core]
>   process_one_work+0x878/0x13e0
>   worker_thread+0x87/0xf70
>   kthread+0x30f/0x3d0
>   ret_from_fork+0x1f/0x30
> 
> Signed-off-by: Lei Yin <yinlei2@lenovo.com>
> ---
>   drivers/nvme/host/nvme.h | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index c3e4d9b6f9c0..b441c5ce4157 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -749,8 +749,17 @@ static inline void nvme_trace_bio_complete(struct request *req,
>   {
>   	struct nvme_ns *ns = req->q->queuedata;
>   
> -	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
> +	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio) {
>   		trace_block_bio_complete(ns->head->disk->queue, req->bio);
> +
> +		/* Point bio->bi_disk at the head disk. This bio may be another
> +		 * bio's parent in a bio chain; if its gendisk is kfreed by
> +		 * nvme_free_ns, the child can reach this bio via __bio_chain_endio
> +		 * in bio_endio and dereference the freed bi_disk, triggering a
> +		 * heap-use-after-free or a NULL pointer oops.
> +		 */
> +		req->bio->bi_disk = ns->head->disk;
> +	}

This is absolutely the wrong place to do this. This is a tracing
function; it should not have any other logic.

What tree is this against anyways? There is no bi_disk in struct bio 
anymore.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
  2023-03-21 11:09 ` Sagi Grimberg
@ 2023-03-21 11:54   ` Lei Lei2 Yin
  2023-03-21 12:26     ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Lei Lei2 Yin @ 2023-03-21 11:54 UTC (permalink / raw)
  To: Sagi Grimberg, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel, cybeyond

	Thank you for your reply

	This problem occurs with nvme over rdma and nvme over tcp using native nvme multipath. The ns gendisk is deleted because the nvmf target subsystem becomes faulty; the host then sees keep-alive timeouts and I/O timeouts on all paths. After ctrl-loss-tmo seconds, the host removes the failed ctrl and the ns gendisk.
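
	For context, the requeue path seen in the kasan trace above (Workqueue: kblockd nvme_requeue_work) looks roughly like this in 5.10.y (a simplified sketch, not verbatim):

static void nvme_requeue_work(struct work_struct *work)
{
	struct nvme_ns_head *head =
		container_of(work, struct nvme_ns_head, requeue_work);
	struct bio *bio, *next;

	spin_lock_irq(&head->requeue_lock);
	next = bio_list_get(&head->requeue_list);
	spin_unlock_irq(&head->requeue_lock);

	while ((bio = next) != NULL) {
		next = bio->bi_next;
		bio->bi_next = NULL;

		/* resubmit through the ns head so that a (new) path is selected */
		submit_bio_noacct(bio);
	}
}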

	We have reproduced this problem on Linux 5.10.136, Linux 5.10.167 and the latest commit in linux-5.10.y; this patch only applies to linux-5.10.y.

	Yes, this is absolutely the wrong place to do this. Can I move this modification to after nvme_trace_bio_complete?

	Do I need to resubmit the patch if modifications are needed?



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
  2023-03-21 11:54   ` [External] " Lei Lei2 Yin
@ 2023-03-21 12:26     ` Sagi Grimberg
  2023-03-21 13:30       ` Re: " Lei Lei2 Yin
  0 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2023-03-21 12:26 UTC (permalink / raw)
  To: Lei Lei2 Yin, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel, cybeyond


> 	Thank you for your reply
> 
> 	This problem occurs with nvme over rdma and nvme over tcp using native nvme multipath. The ns gendisk is deleted because the nvmf target subsystem becomes faulty; the host then sees keep-alive timeouts and I/O timeouts on all paths. After ctrl-loss-tmo seconds, the host removes the failed ctrl and the ns gendisk.

That is fine, but it is a problem if it does not correctly drain
inflight I/O, whether it was split or not. And this looks like the wrong
place to address this.

> 	We have reproduced this problem on Linux 5.10.136, Linux 5.10.167 and the latest commit in linux-5.10.y; this patch only applies to linux-5.10.y.

So my understanding is that this does not reproduce upstream?

> 
> 	Yes, this is absolutely the wrong place to do this. Can I move this modification to after nvme_trace_bio_complete?
> 
> 	Do I need to resubmit the patch if modifications are needed?

Yes, but a backport fix needs to be sent to stable mailing list
(stable@vger.kernel.org) and cc'd to linux-nvme mailing list.

But I don't think that this fix is the correct one. What is needed is
to identify where this was fixed upstream and backport that fix instead.
If that is too involved because of code dependencies, it may be
possible to send an alternative surgical fix, but it needs to be
justified.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
  2023-03-21 12:26     ` Sagi Grimberg
@ 2023-03-21 13:30       ` Lei Lei2 Yin
  2023-03-21 15:00         ` hch
  2023-03-22  7:12         ` Sagi Grimberg
  0 siblings, 2 replies; 7+ messages in thread
From: Lei Lei2 Yin @ 2023-03-21 13:30 UTC (permalink / raw)
  To: Sagi Grimberg, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel, cybeyond


	No, I have not verified this issue on a kernel newer than 5.10.y (such as 5.15.y, 6.0 or later), because some functionality we depend on, such as cgroup, has changed too much in newer kernels; we cannot use those newer kernels.

	In addition, upstream has changed the bi_disk assignment to bio_set_dev(bio, ns->disk->part0), and as you said there is no bi_disk in struct bio anymore. Backporting is therefore too involved because of code dependencies, so what I want to do is what you said: send an alternative surgical fix (a rough sketch of the difference follows).
	(I will check upstream for this problem in the near future; if it has the same problem, I will submit a fix for it.)
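
	For illustration only, the difference is roughly the following (a sketch, not the exact upstream code):

	/* linux-5.10.y: struct bio carries the gendisk pointer directly */
	bio->bi_disk = ns->head->disk;

	/* current upstream: struct bio carries a block_device in bi_bdev; the
	 * helper sets it, and gendisk->part0 is the whole-disk block_device */
	bio_set_dev(bio, ns->head->disk->part0);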

	I'm not sure what evidence is needed to prove this problem and patch. The following are the child and parent bio structs at the moment the heap-use-after-free occurs, captured with crash (KASAN and panic_on_warn enabled).

	Please help me confirm whether this is enough, thanks.

	All bios on the path from nvme_ns_head_submit_bio to bio_endio belong to the nvme head disk; the failing bio is the original bio's parent, and its bi_disk (0xffff888153ead000) was kfreed before the KASAN warning (I confirmed this by adding a log).


      KERNEL: /usr/lib/debug/vmlinux  [TAINTED]                        
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Mon Mar 20 19:43:39 CST 2023
      UPTIME: 00:05:33
LOAD AVERAGE: 73.43, 20.60, 7.11
       TASKS: 526
    NODENAME: C8
     RELEASE: 5.10.167
     VERSION: #1 SMP Fri Feb 17 11:02:17 CST 2023
     MACHINE: x86_64  (2194 Mhz)
      MEMORY: 64 GB
       PANIC: "Kernel panic - not syncing: KASAN: panic_on_warn set ..."
         PID: 417
     COMMAND: "kworker/5:1H"
        TASK: ffff888126972040  [THREAD_INFO: ffff888126972040]
         CPU: 5
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 417      TASK: ffff888126972040  CPU: 5    COMMAND: "kworker/5:1H"
 #0 [ffff88810ebcf828] machine_kexec at ffffffff8f701b3e
 #1 [ffff88810ebcf948] __crash_kexec at ffffffff8f9d28eb
 #2 [ffff88810ebcfa60] panic at ffffffff913967e9
 #3 [ffff88810ebcfb30] bio_endio at ffffffff902541f7
 #4 [ffff88810ebcfb78] bio_endio at ffffffff902541f7
 #5 [ffff88810ebcfba8] bio_endio at ffffffff902541f7
 #6 [ffff88810ebcfbd8] nvme_ns_head_submit_bio at ffffffffc13cf960 [nvme_core]
 #7 [ffff88810ebcfcc8] submit_bio_noacct at ffffffff9026b134
 #8 [ffff88810ebcfdb8] nvme_requeue_work at ffffffffc13cdc40 [nvme_core]
 #9 [ffff88810ebcfdf8] process_one_work at ffffffff8f8133c8
#10 [ffff88810ebcfe78] worker_thread at ffffffff8f813fb7
#11 [ffff88810ebcff10] kthread at ffffffff8f825e6f
#12 [ffff88810ebcff50] ret_from_fork at ffffffff8f60619f
crash> p *(struct bio *)0xffff8881890f4900   // child bio
$1 = {
  bi_next = 0x0, 
  bi_disk = 0xdae00188000001a1, 
  bi_opf = 33605633, 
  bi_flags = 1922, 
  bi_ioprio = 0, 
  bi_write_hint = 0, 
  bi_status = 10 '\n', 
  bi_partno = 0 '\000', 
  __bi_remaining = {
    counter = 1
  }, 
  bi_iter = {
    bi_sector = 12287744, 
    bi_size = 65536, 
    bi_idx = 3, 
    bi_bvec_done = 106496
  }, 
  bi_end_io = 0xffffffff90254280 <bio_chain_endio>, 
  bi_private = 0xffff888198b778d0, 
  bi_blkg = 0x0, 
  bi_issue = {
    value = 288230712376101481
  }, 
  {
    bi_integrity = 0x0
  }, 
  bi_vcnt = 0, 
  bi_max_vecs = 0, 
  __bi_cnt = {
    counter = 1
  }, 
  bi_io_vec = 0xffff8881a4530000, 
  bi_pool = 0xffff888141bd7af8, 
  bi_inline_vecs = 0xffff8881890f4978
}

crash> p *(struct bio *)0xffff888198b778d0    // parent bio
$2 = {
  bi_next = 0x0, 
  bi_disk = 0xffff888153ead000, 
  bi_opf = 33589249, 
  bi_flags = 1664, 
  bi_ioprio = 0, 
  bi_write_hint = 0, 
  bi_status = 10 '\n', 
  bi_partno = 0 '\000', 
  __bi_remaining = {
    counter = 0
  }, 
  bi_iter = {
    bi_sector = 12288000, 
    bi_size = 0, 
    bi_idx = 5, 
    bi_bvec_done = 0
  }, 
  bi_end_io = 0xffffffff8ff8df80 <blkdev_bio_end_io_simple>, 
  bi_private = 0xffff8881b0c54080, 
  bi_blkg = 0xffff8881974df400, 
  bi_issue = {
    value = 288230665264113654
  }, 
  {
    bi_integrity = 0x0
  }, 
  bi_vcnt = 5, 
  bi_max_vecs = 256, 
  __bi_cnt = {
    counter = 1
  }, 
  bi_io_vec = 0xffff8881a4530000, 
  bi_pool = 0x0, 
  bi_inline_vecs = 0xffff888198b77948
}
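
	To connect the two dumps: the child's bi_end_io is bio_chain_endio and its bi_private (0xffff888198b778d0) is the parent bio shown above, so bio_endio walks from the child to the parent and then dereferences the parent's stale bi_disk (0xffff888153ead000). In 5.10.y, __bio_chain_endio is roughly (simplified, not verbatim):

static struct bio *__bio_chain_endio(struct bio *bio)
{
	struct bio *parent = bio->bi_private;	/* here: 0xffff888198b778d0 */

	/* propagate bi_status to the parent (details elided) */
	bio_put(bio);
	return parent;	/* bio_endio continues with the parent and reads its freed bi_disk */
}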
	
 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
  2023-03-21 13:30       ` Re: " Lei Lei2 Yin
@ 2023-03-21 15:00         ` hch
  2023-03-22  7:12         ` Sagi Grimberg
  1 sibling, 0 replies; 7+ messages in thread
From: hch @ 2023-03-21 15:00 UTC (permalink / raw)
  To: Lei Lei2 Yin
  Cc: Sagi Grimberg, kbusch, axboe, hch, linux-nvme, linux-kernel, cybeyond

On Tue, Mar 21, 2023 at 01:30:18PM +0000, Lei Lei2 Yin wrote:
> 
> 	No, I have not verified this issue on a kernel newer than 5.10.y (such as 5.15.y, 6.0 or later), because some functionality we depend on, such as cgroup, has changed too much in newer kernels; we cannot use those newer kernels.

If you can't reproduce this on a non-tainted upstream kernel, it's
entirely your own problem to deal with.  Please don't waste the upstream
maintainers' resources.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath
  2023-03-21 13:30       ` Re: " Lei Lei2 Yin
  2023-03-21 15:00         ` hch
@ 2023-03-22  7:12         ` Sagi Grimberg
  1 sibling, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2023-03-22  7:12 UTC (permalink / raw)
  To: Lei Lei2 Yin, kbusch, axboe, hch; +Cc: linux-nvme, linux-kernel, cybeyond


> 	No, I have not verified this issue on a kernel newer than 5.10.y (such as 5.15.y, 6.0 or later), because some functionality we depend on, such as cgroup, has changed too much in newer kernels; we cannot use those newer kernels.

Well, this would be the starting point.

> 	In addition, upstream has changed the bi_disk assignment to bio_set_dev(bio, ns->disk->part0), and as you said there is no bi_disk in struct bio anymore. Backporting is therefore too involved because of code dependencies, so what I want to do is what you said: send an alternative surgical fix.

The correct course of action would be to identify and narrow down the
fix for this upstream, and then backport it to the stable 5.10.y kernel.

> 	(I will check upstream for this problem in the near future; if it has the same problem, I will submit a fix for it.)

Great.

> 	I'm not sure what evidence is needed to prove this problem and patch. The following are the child and parent bio structs at the moment the heap-use-after-free occurs, captured with crash (KASAN and panic_on_warn enabled).
> 
> 	Please help me confirm whether this is enough, thanks.

It is clear that there is a bug in 5.10.y; what we are discussing is:
1. Is this problem relevant to the upstream kernel?
2. If yes, we can debate the correct fix, as your initial patch is not it.
    If not, then the upstream fix for this needs to be identified and
    backported.

Having stable kernels drift away from the original code-base is a bad
idea.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-03-22  7:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-21 10:50 [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath Lei Lei2 Yin
2023-03-21 11:09 ` Sagi Grimberg
2023-03-21 11:54   ` [External] " Lei Lei2 Yin
2023-03-21 12:26     ` Sagi Grimberg
2023-03-21 13:30       ` Re: " Lei Lei2 Yin
2023-03-21 15:00         ` hch
2023-03-22  7:12         ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).