* [PATCH v2] nvme-multipath: Early exit if no path is available @ 2021-01-27 10:30 Daniel Wagner 2021-01-27 10:34 ` Hannes Reinecke ` (3 more replies) 0 siblings, 4 replies; 23+ messages in thread From: Daniel Wagner @ 2021-01-27 10:30 UTC (permalink / raw) To: linux-nvme Cc: linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Daniel Wagner, Hannes Reinecke nvme_round_robin_path() should test if the return ns pointer is valid. nvme_next_ns() will return a NULL pointer if there is no path left. Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> --- v2: - moved NULL test into the if conditional statement - added Fixes tag drivers/nvme/host/multipath.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 9ac762b28811..282b7a4ea9a9 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, } for (ns = nvme_next_ns(head, old); - ns != old; + ns && ns != old; ns = nvme_next_ns(head, ns)) { if (nvme_path_is_disabled(ns)) continue; -- 2.29.2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner @ 2021-01-27 10:34 ` Hannes Reinecke 2021-01-27 16:49 ` Christoph Hellwig ` (2 subsequent siblings) 3 siblings, 0 replies; 23+ messages in thread From: Hannes Reinecke @ 2021-01-27 10:34 UTC (permalink / raw) To: Daniel Wagner, linux-nvme Cc: linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg On 1/27/21 11:30 AM, Daniel Wagner wrote: > nvme_round_robin_path() should test if the return ns pointer is > valid. nvme_next_ns() will return a NULL pointer if there is no path > left. > > Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") > Cc: Hannes Reinecke <hare@suse.de> > Signed-off-by: Daniel Wagner <dwagner@suse.de> > --- > v2: > - moved NULL test into the if conditional statement > - added Fixes tag > > drivers/nvme/host/multipath.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 9ac762b28811..282b7a4ea9a9 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > } > > for (ns = nvme_next_ns(head, old); > - ns != old; > + ns && ns != old; > ns = nvme_next_ns(head, ns)) { > if (nvme_path_is_disabled(ns)) > continue; > Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner 2021-01-27 10:34 ` Hannes Reinecke @ 2021-01-27 16:49 ` Christoph Hellwig 2021-01-28 1:31 ` Chao Leng 2021-01-28 1:36 ` Chao Leng 3 siblings, 0 replies; 23+ messages in thread From: Christoph Hellwig @ 2021-01-27 16:49 UTC (permalink / raw) To: Daniel Wagner Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Hannes Reinecke Thanks, applied to nvme-5.11. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner 2021-01-27 10:34 ` Hannes Reinecke 2021-01-27 16:49 ` Christoph Hellwig @ 2021-01-28 1:31 ` Chao Leng 2021-01-28 7:58 ` Daniel Wagner 2021-01-28 1:36 ` Chao Leng 3 siblings, 1 reply; 23+ messages in thread From: Chao Leng @ 2021-01-28 1:31 UTC (permalink / raw) To: Daniel Wagner, linux-nvme Cc: Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On 2021/1/27 18:30, Daniel Wagner wrote: > nvme_round_robin_path() should test if the return ns pointer is > valid. nvme_next_ns() will return a NULL pointer if there is no path > left. > > Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") > Cc: Hannes Reinecke <hare@suse.de> > Signed-off-by: Daniel Wagner <dwagner@suse.de> > --- > v2: > - moved NULL test into the if conditional statement > - added Fixes tag > > drivers/nvme/host/multipath.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 9ac762b28811..282b7a4ea9a9 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > } > > for (ns = nvme_next_ns(head, old); > - ns != old; > + ns && ns != old; nvme_round_robin_path just be called when !"old". nvme_next_ns should not return NULL when !"old". It seems unnecessary to add checking "ns". Is there a bug that "old" in not in "head" list? If yes, we should fix it. > ns = nvme_next_ns(head, ns)) { > if (nvme_path_is_disabled(ns)) > continue; > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-28 1:31 ` Chao Leng @ 2021-01-28 7:58 ` Daniel Wagner 2021-01-28 9:18 ` Chao Leng 0 siblings, 1 reply; 23+ messages in thread From: Daniel Wagner @ 2021-01-28 7:58 UTC (permalink / raw) To: Chao Leng Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote: > > --- a/drivers/nvme/host/multipath.c > > +++ b/drivers/nvme/host/multipath.c > > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > > } > > for (ns = nvme_next_ns(head, old); > > - ns != old; > > + ns && ns != old; > nvme_round_robin_path just be called when !"old". > nvme_next_ns should not return NULL when !"old". > It seems unnecessary to add checking "ns". The problem is when we enter nvme_round_robin_path() and there is no path available. In this case the initialization ns = nvme_next_ns(head, old) could return a NULL pointer. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-28 7:58 ` Daniel Wagner @ 2021-01-28 9:18 ` Chao Leng 2021-01-28 9:23 ` Hannes Reinecke 2021-01-28 9:40 ` Daniel Wagner 0 siblings, 2 replies; 23+ messages in thread From: Chao Leng @ 2021-01-28 9:18 UTC (permalink / raw) To: Daniel Wagner Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On 2021/1/28 15:58, Daniel Wagner wrote: > On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote: >>> --- a/drivers/nvme/host/multipath.c >>> +++ b/drivers/nvme/host/multipath.c >>> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, >>> } >>> for (ns = nvme_next_ns(head, old); >>> - ns != old; >>> + ns && ns != old; >> nvme_round_robin_path just be called when !"old". >> nvme_next_ns should not return NULL when !"old". >> It seems unnecessary to add checking "ns". > > The problem is when we enter nvme_round_robin_path() and there is no > path available. In this case the initialization ns = nvme_next_ns(head, > old) could return a NULL pointer."old" should not be NULL, so there is at least one path that is "old". It is impossible to return NULL for nvme_next_ns(head, old). > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-28 9:18 ` Chao Leng @ 2021-01-28 9:23 ` Hannes Reinecke 2021-01-29 1:18 ` Chao Leng 2021-01-28 9:40 ` Daniel Wagner 1 sibling, 1 reply; 23+ messages in thread From: Hannes Reinecke @ 2021-01-28 9:23 UTC (permalink / raw) To: Chao Leng, Daniel Wagner Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 1/28/21 10:18 AM, Chao Leng wrote: > > > On 2021/1/28 15:58, Daniel Wagner wrote: >> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote: >>>> --- a/drivers/nvme/host/multipath.c >>>> +++ b/drivers/nvme/host/multipath.c >>>> @@ -221,7 +221,7 @@ static struct nvme_ns >>>> *nvme_round_robin_path(struct nvme_ns_head *head, >>>> } >>>> for (ns = nvme_next_ns(head, old); >>>> - ns != old; >>>> + ns && ns != old; >>> nvme_round_robin_path just be called when !"old". >>> nvme_next_ns should not return NULL when !"old". >>> It seems unnecessary to add checking "ns". >> >> The problem is when we enter nvme_round_robin_path() and there is no >> path available. In this case the initialization ns = nvme_next_ns(head, >> old) could return a NULL pointer."old" should not be NULL, so there is >> at least one path that is "old". > It is impossible to return NULL for nvme_next_ns(head, old). No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL when then end of the list is reached. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-28 9:23 ` Hannes Reinecke @ 2021-01-29 1:18 ` Chao Leng 0 siblings, 0 replies; 23+ messages in thread From: Chao Leng @ 2021-01-29 1:18 UTC (permalink / raw) To: Hannes Reinecke, Daniel Wagner Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2021/1/28 17:23, Hannes Reinecke wrote: > On 1/28/21 10:18 AM, Chao Leng wrote: >> >> >> On 2021/1/28 15:58, Daniel Wagner wrote: >>> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote: >>>>> --- a/drivers/nvme/host/multipath.c >>>>> +++ b/drivers/nvme/host/multipath.c >>>>> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, >>>>> } >>>>> for (ns = nvme_next_ns(head, old); >>>>> - ns != old; >>>>> + ns && ns != old; >>>> nvme_round_robin_path just be called when !"old". >>>> nvme_next_ns should not return NULL when !"old". >>>> It seems unnecessary to add checking "ns". >>> >>> The problem is when we enter nvme_round_robin_path() and there is no >>> path available. In this case the initialization ns = nvme_next_ns(head, >>> old) could return a NULL pointer."old" should not be NULL, so there is at least one path that is "old". >> It is impossible to return NULL for nvme_next_ns(head, old). > > No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL when then end of the list is reached. Although list_next_or_null_rcu()/list_first_or_null_rcu() may return NULL, but nvme_next_ns(head, old) assume that the "old" is in the "head", so nvme_next_ns(head, old) should not return NULL. If the "old" is not in the "head", nvme_next_ns(head, old) will run abnormal. So there is other bug which cause nvme_next_ns(head, old). I review the code about head->list and head->current_path, I find 2 bugs may cause nvme_next_ns(head, old) abnormal: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. > > Cheers, > > Hannes ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-28 9:18 ` Chao Leng 2021-01-28 9:23 ` Hannes Reinecke @ 2021-01-28 9:40 ` Daniel Wagner 2021-01-29 1:23 ` Chao Leng 1 sibling, 1 reply; 23+ messages in thread From: Daniel Wagner @ 2021-01-28 9:40 UTC (permalink / raw) To: Chao Leng Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote: > It is impossible to return NULL for nvme_next_ns(head, old). block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O block nvme0n1: no usable path - requeuing I/O BUG: kernel NULL pointer dereference, address: 0000000000000068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000ff67bc067 P4D 8000000ff67bc067 PUD ff9ac9067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G E 5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased) Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x 02/06/2018 RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core] Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10 RSP: 0018:ffffa69d08017af8 EFLAGS: 00010246 RAX: ffff92f261d87800 RBX: ffff92fa555b0010 RCX: ffff92fa555bc570 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000001 R08: 0000000000001000 R09: 0000000000001000 R10: ffffa69d080179a8 R11: ffff92f264f0c1c0 R12: ffff92f264f7f000 R13: ffff92fa555b0000 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f3962bae700(0000) GS:ffff92f29ffc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000068 CR3: 0000000fd69a2002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: generic_make_request+0x121/0x300 ? submit_bio+0x42/0x1c0 submit_bio+0x42/0x1c0 ext4_io_submit+0x49/0x60 [ext4] ext4_writepages+0x625/0xe90 [ext4] ? do_writepages+0x4b/0xe0 ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4] do_writepages+0x4b/0xe0 ? __generic_file_write_iter+0x192/0x1c0 ? __filemap_fdatawrite_range+0xcb/0x100 __filemap_fdatawrite_range+0xcb/0x100 ? ext4_file_write_iter+0x128/0x3c0 [ext4] file_write_and_wait_range+0x5e/0xb0 __generic_file_fsync+0x22/0xb0 ext4_sync_file+0x1f7/0x3c0 [ext4] do_fsync+0x38/0x60 __x64_sys_fsync+0x10/0x20 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) which returns NULL but nvme_next_ns() is returning NULL eventually (list_next_or_null_rcu()). And I have positive feedback, this patch fixes the above problem. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-28 9:40 ` Daniel Wagner @ 2021-01-29 1:23 ` Chao Leng 2021-01-29 1:42 ` Sagi Grimberg 0 siblings, 1 reply; 23+ messages in thread From: Chao Leng @ 2021-01-29 1:23 UTC (permalink / raw) To: Daniel Wagner Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On 2021/1/28 17:40, Daniel Wagner wrote: > On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote: >> It is impossible to return NULL for nvme_next_ns(head, old). > > block nvme0n1: no usable path - requeuing I/O > block nvme0n1: no usable path - requeuing I/O > block nvme0n1: no usable path - requeuing I/O > block nvme0n1: no usable path - requeuing I/O > block nvme0n1: no usable path - requeuing I/O > block nvme0n1: no usable path - requeuing I/O > BUG: kernel NULL pointer dereference, address: 0000000000000068 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 8000000ff67bc067 P4D 8000000ff67bc067 PUD ff9ac9067 PMD 0 > Oops: 0000 [#1] SMP PTI > CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G E 5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased) > Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x 02/06/2018 > RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core] > Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10 > RSP: 0018:ffffa69d08017af8 EFLAGS: 00010246 > RAX: ffff92f261d87800 RBX: ffff92fa555b0010 RCX: ffff92fa555bc570 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: 0000000000000001 R08: 0000000000001000 R09: 0000000000001000 > R10: ffffa69d080179a8 R11: ffff92f264f0c1c0 R12: ffff92f264f7f000 > R13: ffff92fa555b0000 R14: 0000000000000001 R15: 0000000000000000 > FS: 00007f3962bae700(0000) GS:ffff92f29ffc0000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000068 CR3: 0000000fd69a2002 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > generic_make_request+0x121/0x300 > ? submit_bio+0x42/0x1c0 > submit_bio+0x42/0x1c0 > ext4_io_submit+0x49/0x60 [ext4] > ext4_writepages+0x625/0xe90 [ext4] > ? do_writepages+0x4b/0xe0 > ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4] > do_writepages+0x4b/0xe0 > ? __generic_file_write_iter+0x192/0x1c0 > ? __filemap_fdatawrite_range+0xcb/0x100 > __filemap_fdatawrite_range+0xcb/0x100 > ? ext4_file_write_iter+0x128/0x3c0 [ext4] > file_write_and_wait_range+0x5e/0xb0 > __generic_file_fsync+0x22/0xb0 > ext4_sync_file+0x1f7/0x3c0 [ext4] > do_fsync+0x38/0x60 > __x64_sys_fsync+0x10/0x20 > do_syscall_64+0x5b/0x1e0 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > You can't see exactly where it dies but I followed the assembly to > nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, > old) which returns NULL but nvme_next_ns() is returning NULL eventually > (list_next_or_null_rcu()). So there is other bug cause nvme_next_ns abormal. I review the code about head->list and head->current_path, I find 2 bugs may cause the bug: First, I already send the patch. see: https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ Second, in nvme_ns_remove, list_del_rcu is before nvme_mpath_clear_current_path. This may cause "old" is deleted from the "head", but still use "old". I'm not sure there's any other consideration here, I will check it and try to fix it. > > And I have positive feedback, this patch fixes the above problem. Although adding check ns can fix the crash. There may still be other problems such as in __nvme_find_path which use list_for_each_entry_rcu. > > . > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-29 1:23 ` Chao Leng @ 2021-01-29 1:42 ` Sagi Grimberg 2021-01-29 3:07 ` Chao Leng 0 siblings, 1 reply; 23+ messages in thread From: Sagi Grimberg @ 2021-01-29 1:42 UTC (permalink / raw) To: Chao Leng, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig >> You can't see exactly where it dies but I followed the assembly to >> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >> old) which returns NULL but nvme_next_ns() is returning NULL eventually >> (list_next_or_null_rcu()). > So there is other bug cause nvme_next_ns abormal. > I review the code about head->list and head->current_path, I find 2 bugs > may cause the bug: > First, I already send the patch. see: > https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ > > Second, in nvme_ns_remove, list_del_rcu is before > nvme_mpath_clear_current_path. This may cause "old" is deleted from the > "head", but still use "old". I'm not sure there's any other > consideration here, I will check it and try to fix it. The reason why we first remove from head->list and only then clear current_path is because the other way around there is no way to guarantee that that the ns won't be assigned as current_path again (because it is in head->list). nvme_ns_remove fences continue of deletion of the ns by synchronizing the srcu such that for sure the current_path clearance is visible. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-29 1:42 ` Sagi Grimberg @ 2021-01-29 3:07 ` Chao Leng 2021-01-29 3:30 ` Sagi Grimberg 2021-01-29 7:06 ` Hannes Reinecke 0 siblings, 2 replies; 23+ messages in thread From: Chao Leng @ 2021-01-29 3:07 UTC (permalink / raw) To: Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On 2021/1/29 9:42, Sagi Grimberg wrote: > >>> You can't see exactly where it dies but I followed the assembly to >>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >>> old) which returns NULL but nvme_next_ns() is returning NULL eventually >>> (list_next_or_null_rcu()). >> So there is other bug cause nvme_next_ns abormal. >> I review the code about head->list and head->current_path, I find 2 bugs >> may cause the bug: >> First, I already send the patch. see: >> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >> Second, in nvme_ns_remove, list_del_rcu is before >> nvme_mpath_clear_current_path. This may cause "old" is deleted from the >> "head", but still use "old". I'm not sure there's any other >> consideration here, I will check it and try to fix it. > > The reason why we first remove from head->list and only then clear > current_path is because the other way around there is no way > to guarantee that that the ns won't be assigned as current_path > again (because it is in head->list). ok, I see. > > nvme_ns_remove fences continue of deletion of the ns by synchronizing > the srcu such that for sure the current_path clearance is visible. The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; This may cause infinite loop in nvme_round_robin_path. for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) The ns will always be ns1, and then infinite loop. > . ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-29 3:07 ` Chao Leng @ 2021-01-29 3:30 ` Sagi Grimberg 2021-01-29 3:36 ` Chao Leng 2021-01-29 7:06 ` Hannes Reinecke 1 sibling, 1 reply; 23+ messages in thread From: Sagi Grimberg @ 2021-01-29 3:30 UTC (permalink / raw) To: Chao Leng, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig >>>> You can't see exactly where it dies but I followed the assembly to >>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually >>>> (list_next_or_null_rcu()). >>> So there is other bug cause nvme_next_ns abormal. >>> I review the code about head->list and head->current_path, I find 2 bugs >>> may cause the bug: >>> First, I already send the patch. see: >>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >>> >>> Second, in nvme_ns_remove, list_del_rcu is before >>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the >>> "head", but still use "old". I'm not sure there's any other >>> consideration here, I will check it and try to fix it. >> >> The reason why we first remove from head->list and only then clear >> current_path is because the other way around there is no way >> to guarantee that that the ns won't be assigned as current_path >> again (because it is in head->list). > ok, I see. >> >> nvme_ns_remove fences continue of deletion of the ns by synchronizing >> the srcu such that for sure the current_path clearance is visible. > The list will be like this: > head->next = ns1; > ns1->next = head; > old->next = ns1; > This may cause infinite loop in nvme_round_robin_path. > for (ns = nvme_next_ns(head, old); > ns != old; > ns = nvme_next_ns(head, ns)) > The ns will always be ns1, and then infinite loop. Who is being removed? I'm not following ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-29 3:30 ` Sagi Grimberg @ 2021-01-29 3:36 ` Chao Leng 0 siblings, 0 replies; 23+ messages in thread From: Chao Leng @ 2021-01-29 3:36 UTC (permalink / raw) To: Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On 2021/1/29 11:30, Sagi Grimberg wrote: > >>>>> You can't see exactly where it dies but I followed the assembly to >>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually >>>>> (list_next_or_null_rcu()). >>>> So there is other bug cause nvme_next_ns abormal. >>>> I review the code about head->list and head->current_path, I find 2 bugs >>>> may cause the bug: >>>> First, I already send the patch. see: >>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >>>> Second, in nvme_ns_remove, list_del_rcu is before >>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the >>>> "head", but still use "old". I'm not sure there's any other >>>> consideration here, I will check it and try to fix it. >>> >>> The reason why we first remove from head->list and only then clear >>> current_path is because the other way around there is no way >>> to guarantee that that the ns won't be assigned as current_path >>> again (because it is in head->list). >> ok, I see. >>> >>> nvme_ns_remove fences continue of deletion of the ns by synchronizing >>> the srcu such that for sure the current_path clearance is visible. >> The list will be like this: >> head->next = ns1; >> ns1->next = head; >> old->next = ns1; >> This may cause infinite loop in nvme_round_robin_path. >> for (ns = nvme_next_ns(head, old); >> ns != old; >> ns = nvme_next_ns(head, ns)) >> The ns will always be ns1, and then infinite loop. > > Who is being removed? I'm not following The "old" is being removed path. Daniel Wagner report crash like this: head->next = head; old->next = head; So nvme_next_ns(head, old) will return NULL, and then crash. Although check ns can avoid crash, but can not avoid infinite loop. Similar reason, The list will be like this: head->next = ns1; ns1->next = head; old->next = ns1; ns1 is other path. > . ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-29 3:07 ` Chao Leng 2021-01-29 3:30 ` Sagi Grimberg @ 2021-01-29 7:06 ` Hannes Reinecke [not found] ` <81b22bbf-4dd3-6161-e63a-9699690a4e4f@huawei.com> 1 sibling, 1 reply; 23+ messages in thread From: Hannes Reinecke @ 2021-01-29 7:06 UTC (permalink / raw) To: Chao Leng, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 1/29/21 4:07 AM, Chao Leng wrote: > > > On 2021/1/29 9:42, Sagi Grimberg wrote: >> >>>> You can't see exactly where it dies but I followed the assembly to >>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually >>>> (list_next_or_null_rcu()). >>> So there is other bug cause nvme_next_ns abormal. >>> I review the code about head->list and head->current_path, I find 2 bugs >>> may cause the bug: >>> First, I already send the patch. see: >>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >>> >>> Second, in nvme_ns_remove, list_del_rcu is before >>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the >>> "head", but still use "old". I'm not sure there's any other >>> consideration here, I will check it and try to fix it. >> >> The reason why we first remove from head->list and only then clear >> current_path is because the other way around there is no way >> to guarantee that that the ns won't be assigned as current_path >> again (because it is in head->list). > ok, I see. >> >> nvme_ns_remove fences continue of deletion of the ns by synchronizing >> the srcu such that for sure the current_path clearance is visible. > The list will be like this: > head->next = ns1; > ns1->next = head; > old->next = ns1; Where does 'old' pointing to? > This may cause infinite loop in nvme_round_robin_path. > for (ns = nvme_next_ns(head, old); > ns != old; > ns = nvme_next_ns(head, ns)) > The ns will always be ns1, and then infinite loop. No. nvme_next_ns() will return NULL. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <81b22bbf-4dd3-6161-e63a-9699690a4e4f@huawei.com>]
[parent not found: <715dd943-0587-be08-2840-e0948cf0bc62@suse.de>]
[parent not found: <eb131d8f-f009-42e7-105d-58b84060f0dd@huawei.com>]
[parent not found: <ac019690-7f02-d28c-ed58-bfc8c1d48879@suse.de>]
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available [not found] ` <ac019690-7f02-d28c-ed58-bfc8c1d48879@suse.de> @ 2021-02-01 2:16 ` Chao Leng 2021-02-01 7:29 ` Hannes Reinecke 0 siblings, 1 reply; 23+ messages in thread From: Chao Leng @ 2021-02-01 2:16 UTC (permalink / raw) To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2021/1/29 17:20, Hannes Reinecke wrote: > On 1/29/21 9:46 AM, Chao Leng wrote: >> >> >> On 2021/1/29 16:33, Hannes Reinecke wrote: >>> On 1/29/21 8:45 AM, Chao Leng wrote: >>>> >>>> >>>> On 2021/1/29 15:06, Hannes Reinecke wrote: >>>>> On 1/29/21 4:07 AM, Chao Leng wrote: >>>>>> >>>>>> >>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote: >>>>>>> >>>>>>>>> You can't see exactly where it dies but I followed the assembly to >>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually >>>>>>>>> (list_next_or_null_rcu()). >>>>>>>> So there is other bug cause nvme_next_ns abormal. >>>>>>>> I review the code about head->list and head->current_path, I find 2 bugs >>>>>>>> may cause the bug: >>>>>>>> First, I already send the patch. see: >>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before >>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the >>>>>>>> "head", but still use "old". I'm not sure there's any other >>>>>>>> consideration here, I will check it and try to fix it. >>>>>>> >>>>>>> The reason why we first remove from head->list and only then clear >>>>>>> current_path is because the other way around there is no way >>>>>>> to guarantee that that the ns won't be assigned as current_path >>>>>>> again (because it is in head->list). >>>>>> ok, I see. >>>>>>> >>>>>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing >>>>>>> the srcu such that for sure the current_path clearance is visible. >>>>>> The list will be like this: >>>>>> head->next = ns1; >>>>>> ns1->next = head; >>>>>> old->next = ns1; >>>>> >>>>> Where does 'old' pointing to? >>>>> >>>>>> This may cause infinite loop in nvme_round_robin_path. >>>>>> for (ns = nvme_next_ns(head, old); >>>>>> ns != old; >>>>>> ns = nvme_next_ns(head, ns)) >>>>>> The ns will always be ns1, and then infinite loop. >>>>> >>>>> No. nvme_next_ns() will return NULL. >>>> If there is just one path(the "old") and the "old" is deleted, >>>> nvme_next_ns() will return NULL. >>>> The list like this: >>>> head->next = head; >>>> old->next = head; >>>> If there is two or more path and the "old" is deleted, >>>> "for" will be infinite loop. because nvme_next_ns() will return >>>> the path which in the list except the "old", check condition will >>>> be true for ever. >>> >>> But that will be caught by the statement above: >>> >>> if (list_is_singular(&head->list)) >>> >>> no? >> Two path just a sample example. >> If there is just two path, will enter it, may cause no path but there is >> actually one path. It is falsely assumed that the "old" must be not deleted. >> If there is more than two path, will cause infinite loop. > So you mean we'll need something like this? > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 71696819c228..8ffccaf9c19a 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) > static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, > struct nvme_ns *ns) > { > - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, > - siblings); > - if (ns) > - return ns; > + if (ns) { > + ns = list_next_or_null_rcu(&head->list, &ns->siblings, > + struct nvme_ns, siblings); > + if (ns) > + return ns; > + } No, in the scenario, ns should not be NULL. May be we can do like this: diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 282b7a4ea9a9..b895011a2cbd 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) return found; } -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, - struct nvme_ns *ns) -{ - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, - siblings); - if (ns) - return ns; - return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); -} +#define nvme_next_ns_condition(head, current, condition) \ +({ \ + struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \ + &(current)->siblings, struct nvme_ns, siblings); \ + __ptr ? __ptr : (condition) ? (condition) = false, \ + list_first_or_null_rcu(&(head)->list, struct nvme_ns, \ + siblings) : NULL; \ +}) static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, int node, struct nvme_ns *old) { struct nvme_ns *ns, *found = NULL; + bool first_half = true; - if (list_is_singular(&head->list)) { - if (nvme_path_is_disabled(old)) - return NULL; - return old; - } - - for (ns = nvme_next_ns(head, old); + for (ns = nvme_next_ns_condition(head, old, first_half); ns && ns != old; - ns = nvme_next_ns(head, ns)) { + ns = nvme_next_ns_condition(head, ns, first_half)) { if (nvme_path_is_disabled(ns)) continue; > return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); > } > > Cheers, > > Hannes ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-02-01 2:16 ` Chao Leng @ 2021-02-01 7:29 ` Hannes Reinecke 2021-02-01 8:47 ` Chao Leng 0 siblings, 1 reply; 23+ messages in thread From: Hannes Reinecke @ 2021-02-01 7:29 UTC (permalink / raw) To: Chao Leng, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2/1/21 3:16 AM, Chao Leng wrote: > > > On 2021/1/29 17:20, Hannes Reinecke wrote: >> On 1/29/21 9:46 AM, Chao Leng wrote: >>> >>> >>> On 2021/1/29 16:33, Hannes Reinecke wrote: >>>> On 1/29/21 8:45 AM, Chao Leng wrote: >>>>> >>>>> >>>>> On 2021/1/29 15:06, Hannes Reinecke wrote: >>>>>> On 1/29/21 4:07 AM, Chao Leng wrote: >>>>>>> >>>>>>> >>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote: >>>>>>>> >>>>>>>>>> You can't see exactly where it dies but I followed the >>>>>>>>>> assembly to >>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial >>>>>>>>>> nvme_next_ns(head, >>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL >>>>>>>>>> eventually >>>>>>>>>> (list_next_or_null_rcu()). >>>>>>>>> So there is other bug cause nvme_next_ns abormal. >>>>>>>>> I review the code about head->list and head->current_path, I >>>>>>>>> find 2 bugs >>>>>>>>> may cause the bug: >>>>>>>>> First, I already send the patch. see: >>>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >>>>>>>>> >>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before >>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted >>>>>>>>> from the >>>>>>>>> "head", but still use "old". I'm not sure there's any other >>>>>>>>> consideration here, I will check it and try to fix it. >>>>>>>> >>>>>>>> The reason why we first remove from head->list and only then clear >>>>>>>> current_path is because the other way around there is no way >>>>>>>> to guarantee that that the ns won't be assigned as current_path >>>>>>>> again (because it is in head->list). >>>>>>> ok, I see. >>>>>>>> >>>>>>>> nvme_ns_remove fences continue of deletion of the ns by >>>>>>>> synchronizing >>>>>>>> the srcu such that for sure the current_path clearance is visible. >>>>>>> The list will be like this: >>>>>>> head->next = ns1; >>>>>>> ns1->next = head; >>>>>>> old->next = ns1; >>>>>> >>>>>> Where does 'old' pointing to? >>>>>> >>>>>>> This may cause infinite loop in nvme_round_robin_path. >>>>>>> for (ns = nvme_next_ns(head, old); >>>>>>> ns != old; >>>>>>> ns = nvme_next_ns(head, ns)) >>>>>>> The ns will always be ns1, and then infinite loop. >>>>>> >>>>>> No. nvme_next_ns() will return NULL. >>>>> If there is just one path(the "old") and the "old" is deleted, >>>>> nvme_next_ns() will return NULL. >>>>> The list like this: >>>>> head->next = head; >>>>> old->next = head; >>>>> If there is two or more path and the "old" is deleted, >>>>> "for" will be infinite loop. because nvme_next_ns() will return >>>>> the path which in the list except the "old", check condition will >>>>> be true for ever. >>>> >>>> But that will be caught by the statement above: >>>> >>>> if (list_is_singular(&head->list)) >>>> >>>> no? >>> Two path just a sample example. >>> If there is just two path, will enter it, may cause no path but there is >>> actually one path. It is falsely assumed that the "old" must be not >>> deleted. >>> If there is more than two path, will cause infinite loop. >> So you mean we'll need something like this? >> >> diff --git a/drivers/nvme/host/multipath.c >> b/drivers/nvme/host/multipath.c >> index 71696819c228..8ffccaf9c19a 100644 >> --- a/drivers/nvme/host/multipath.c >> +++ b/drivers/nvme/host/multipath.c >> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct >> nvme_ns_head *head, int node) >> static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, >> struct nvme_ns *ns) >> { >> - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct >> nvme_ns, >> - siblings); >> - if (ns) >> - return ns; >> + if (ns) { >> + ns = list_next_or_null_rcu(&head->list, &ns->siblings, >> + struct nvme_ns, siblings); >> + if (ns) >> + return ns; >> + } > No, in the scenario, ns should not be NULL. Why not? 'ns == NULL' is precisely the corner-case this is trying to fix... > May be we can do like this: > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 282b7a4ea9a9..b895011a2cbd 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct > nvme_ns_head *head, int node) > return found; > } > > -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, > - struct nvme_ns *ns) > -{ > - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct > nvme_ns, > - siblings); > - if (ns) > - return ns; > - return list_first_or_null_rcu(&head->list, struct nvme_ns, > siblings); > -} > +#define nvme_next_ns_condition(head, current, condition) \ > +({ \ > + struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \ > + &(current)->siblings, struct nvme_ns, siblings); \ > + __ptr ? __ptr : (condition) ? (condition) = false, \ > + list_first_or_null_rcu(&(head)->list, struct nvme_ns, \ > + siblings) : NULL; \ > +}) > Urgh. Please, no. That is well impossible to debug. Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? I'm still not clear where the problem is once we applied both patches. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-02-01 7:29 ` Hannes Reinecke @ 2021-02-01 8:47 ` Chao Leng 2021-02-01 8:57 ` Hannes Reinecke 0 siblings, 1 reply; 23+ messages in thread From: Chao Leng @ 2021-02-01 8:47 UTC (permalink / raw) To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2021/2/1 15:29, Hannes Reinecke wrote: > On 2/1/21 3:16 AM, Chao Leng wrote: >> >> >> On 2021/1/29 17:20, Hannes Reinecke wrote: >>> On 1/29/21 9:46 AM, Chao Leng wrote: >>>> >>>> >>>> On 2021/1/29 16:33, Hannes Reinecke wrote: >>>>> On 1/29/21 8:45 AM, Chao Leng wrote: >>>>>> >>>>>> >>>>>> On 2021/1/29 15:06, Hannes Reinecke wrote: >>>>>>> On 1/29/21 4:07 AM, Chao Leng wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote: >>>>>>>>> >>>>>>>>>>> You can't see exactly where it dies but I followed the assembly to >>>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, >>>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually >>>>>>>>>>> (list_next_or_null_rcu()). >>>>>>>>>> So there is other bug cause nvme_next_ns abormal. >>>>>>>>>> I review the code about head->list and head->current_path, I find 2 bugs >>>>>>>>>> may cause the bug: >>>>>>>>>> First, I already send the patch. see: >>>>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ >>>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before >>>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the >>>>>>>>>> "head", but still use "old". I'm not sure there's any other >>>>>>>>>> consideration here, I will check it and try to fix it. >>>>>>>>> >>>>>>>>> The reason why we first remove from head->list and only then clear >>>>>>>>> current_path is because the other way around there is no way >>>>>>>>> to guarantee that that the ns won't be assigned as current_path >>>>>>>>> again (because it is in head->list). >>>>>>>> ok, I see. >>>>>>>>> >>>>>>>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing >>>>>>>>> the srcu such that for sure the current_path clearance is visible. >>>>>>>> The list will be like this: >>>>>>>> head->next = ns1; >>>>>>>> ns1->next = head; >>>>>>>> old->next = ns1; >>>>>>> >>>>>>> Where does 'old' pointing to? >>>>>>> >>>>>>>> This may cause infinite loop in nvme_round_robin_path. >>>>>>>> for (ns = nvme_next_ns(head, old); >>>>>>>> ns != old; >>>>>>>> ns = nvme_next_ns(head, ns)) >>>>>>>> The ns will always be ns1, and then infinite loop. >>>>>>> >>>>>>> No. nvme_next_ns() will return NULL. >>>>>> If there is just one path(the "old") and the "old" is deleted, >>>>>> nvme_next_ns() will return NULL. >>>>>> The list like this: >>>>>> head->next = head; >>>>>> old->next = head; >>>>>> If there is two or more path and the "old" is deleted, >>>>>> "for" will be infinite loop. because nvme_next_ns() will return >>>>>> the path which in the list except the "old", check condition will >>>>>> be true for ever. >>>>> >>>>> But that will be caught by the statement above: >>>>> >>>>> if (list_is_singular(&head->list)) >>>>> >>>>> no? >>>> Two path just a sample example. >>>> If there is just two path, will enter it, may cause no path but there is >>>> actually one path. It is falsely assumed that the "old" must be not deleted. >>>> If there is more than two path, will cause infinite loop. >>> So you mean we'll need something like this? >>> >>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c >>> index 71696819c228..8ffccaf9c19a 100644 >>> --- a/drivers/nvme/host/multipath.c >>> +++ b/drivers/nvme/host/multipath.c >>> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) >>> static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, >>> struct nvme_ns *ns) >>> { >>> - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, >>> - siblings); >>> - if (ns) >>> - return ns; >>> + if (ns) { >>> + ns = list_next_or_null_rcu(&head->list, &ns->siblings, >>> + struct nvme_ns, siblings); >>> + if (ns) >>> + return ns; >>> + } >> No, in the scenario, ns should not be NULL. > > Why not? 'ns == NULL' is precisely the corner-case this is trying to fix... > >> May be we can do like this: >> >> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c >> index 282b7a4ea9a9..b895011a2cbd 100644 >> --- a/drivers/nvme/host/multipath.c >> +++ b/drivers/nvme/host/multipath.c >> @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) >> return found; >> } >> >> -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, >> - struct nvme_ns *ns) >> -{ >> - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, >> - siblings); >> - if (ns) >> - return ns; >> - return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); >> -} >> +#define nvme_next_ns_condition(head, current, condition) \ >> +({ \ >> + struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \ >> + &(current)->siblings, struct nvme_ns, siblings); \ >> + __ptr ? __ptr : (condition) ? (condition) = false, \ >> + list_first_or_null_rcu(&(head)->list, struct nvme_ns, \ >> + siblings) : NULL; \ >> +}) >> > Urgh. Please, no. That is well impossible to debug. > Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? > I'm still not clear where the problem is once we applied both patches. For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED: head->next = ns1; ns1->next = ns2; ns2->next = head; old->next = ns2; My patch work flow: nvme_next_ns_condition(head, old, true) return ns2; nvme_next_ns_condition(head, ns2, true) change the condition to false and return ns1; nvme_next_ns_condition(head, ns1, false) return ns2; nvme_next_ns_condition(head, ns2, false) return NULL; And then the loop end. Your patch work flow: nvme_next_ns(head, old) return ns2; nvme_next_ns(head, ns2) return ns1; nvme_next_ns(head, ns1) return ns2; nvme_next_ns(head, ns2) return ns1; nvme_next_ns(head, ns1) return ns2; And then the loop is infinite. > > Cheers, > > Hannes ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-02-01 8:47 ` Chao Leng @ 2021-02-01 8:57 ` Hannes Reinecke 2021-02-01 9:40 ` Chao Leng 0 siblings, 1 reply; 23+ messages in thread From: Hannes Reinecke @ 2021-02-01 8:57 UTC (permalink / raw) To: Chao Leng, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2/1/21 9:47 AM, Chao Leng wrote: > > > On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ] >> Urgh. Please, no. That is well impossible to debug. >> Can you please open-code it to demonstrate where the difference to the >> current (and my fixed) versions is? >> I'm still not clear where the problem is once we applied both patches. > For example assume the list has three path, and all path is not > NVME_ANA_OPTIMIZED: > head->next = ns1; > ns1->next = ns2; > ns2->next = head; > old->next = ns2; > And this is where I have issues with. Where does 'old' come from? Clearly it was part of the list at one point; so what happened to it? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-02-01 8:57 ` Hannes Reinecke @ 2021-02-01 9:40 ` Chao Leng 2021-02-01 10:45 ` Hannes Reinecke 0 siblings, 1 reply; 23+ messages in thread From: Chao Leng @ 2021-02-01 9:40 UTC (permalink / raw) To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2021/2/1 16:57, Hannes Reinecke wrote: > On 2/1/21 9:47 AM, Chao Leng wrote: >> >> >> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ] >>> Urgh. Please, no. That is well impossible to debug. >>> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? >>> I'm still not clear where the problem is once we applied both patches. >> For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED: >> head->next = ns1; >> ns1->next = ns2; >> ns2->next = head; >> old->next = ns2; >> > And this is where I have issues with. > Where does 'old' come from? > Clearly it was part of the list at one point; so what happened to it? I explained this earlier. In nvme_ns_remove, there is a hole between list_del_rcu and nvme_mpath_clear_current_path. If head->current_path is the "old", and the "old" is removing. The "old" is already removed from the list by list_del_rcu, but head->current_path is not clear to NULL by nvme_mpath_clear_current_path. Find path is race with nvme_ns_remove, use the "old" pass to nvme_round_robin_path to find path. > > Cheers, > > Hannes ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-02-01 9:40 ` Chao Leng @ 2021-02-01 10:45 ` Hannes Reinecke 2021-02-02 1:12 ` Chao Leng 0 siblings, 1 reply; 23+ messages in thread From: Hannes Reinecke @ 2021-02-01 10:45 UTC (permalink / raw) To: Chao Leng, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2/1/21 10:40 AM, Chao Leng wrote: > > > On 2021/2/1 16:57, Hannes Reinecke wrote: >> On 2/1/21 9:47 AM, Chao Leng wrote: >>> >>> >>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ] >>>> Urgh. Please, no. That is well impossible to debug. >>>> Can you please open-code it to demonstrate where the difference to >>>> the current (and my fixed) versions is? >>>> I'm still not clear where the problem is once we applied both patches. >>> For example assume the list has three path, and all path is not >>> NVME_ANA_OPTIMIZED: >>> head->next = ns1; >>> ns1->next = ns2; >>> ns2->next = head; >>> old->next = ns2; >>> >> And this is where I have issues with. >> Where does 'old' come from? >> Clearly it was part of the list at one point; so what happened to it? > I explained this earlier. > In nvme_ns_remove, there is a hole between list_del_rcu and > nvme_mpath_clear_current_path. If head->current_path is the "old", and > the "old" is removing. The "old" is already removed from the list by > list_del_rcu, but head->current_path is not clear to NULL by > nvme_mpath_clear_current_path. > Find path is race with nvme_ns_remove, use the "old" pass to > nvme_round_robin_path to find path. Ah. So this should be better: @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, struct nvme_ns *ns) { - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, - siblings); - if (ns) - return ns; + if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) { + ns = list_next_or_null_rcu(&head->list, &ns->siblings, + struct nvme_ns, siblings); + if (ns) + return ns; + } return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); } The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-02-01 10:45 ` Hannes Reinecke @ 2021-02-02 1:12 ` Chao Leng 0 siblings, 0 replies; 23+ messages in thread From: Chao Leng @ 2021-02-02 1:12 UTC (permalink / raw) To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig On 2021/2/1 18:45, Hannes Reinecke wrote: > On 2/1/21 10:40 AM, Chao Leng wrote: >> >> >> On 2021/2/1 16:57, Hannes Reinecke wrote: >>> On 2/1/21 9:47 AM, Chao Leng wrote: >>>> >>>> >>>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ] >>>>> Urgh. Please, no. That is well impossible to debug. >>>>> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is? >>>>> I'm still not clear where the problem is once we applied both patches. >>>> For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED: >>>> head->next = ns1; >>>> ns1->next = ns2; >>>> ns2->next = head; >>>> old->next = ns2; >>>> >>> And this is where I have issues with. >>> Where does 'old' come from? >>> Clearly it was part of the list at one point; so what happened to it? >> I explained this earlier. >> In nvme_ns_remove, there is a hole between list_del_rcu and >> nvme_mpath_clear_current_path. If head->current_path is the "old", and >> the "old" is removing. The "old" is already removed from the list by >> list_del_rcu, but head->current_path is not clear to NULL by >> nvme_mpath_clear_current_path. >> Find path is race with nvme_ns_remove, use the "old" pass to >> nvme_round_robin_path to find path. > > Ah. So this should be better: > > @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) > static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, > struct nvme_ns *ns) > { > - ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns, > - siblings); > - if (ns) > - return ns; > + if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) { > + ns = list_next_or_null_rcu(&head->list, &ns->siblings, > + struct nvme_ns, siblings); > + if (ns) > + return ns; > + } > return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings); > } > > The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned. Looks useless, it is still infinite loop. You can check the workflow for the scenario I mentioned. > > Cheers, > > Hannes ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2] nvme-multipath: Early exit if no path is available 2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner ` (2 preceding siblings ...) 2021-01-28 1:31 ` Chao Leng @ 2021-01-28 1:36 ` Chao Leng 3 siblings, 0 replies; 23+ messages in thread From: Chao Leng @ 2021-01-28 1:36 UTC (permalink / raw) To: Daniel Wagner, linux-nvme Cc: Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke, Keith Busch, Christoph Hellwig On 2021/1/27 18:30, Daniel Wagner wrote: > nvme_round_robin_path() should test if the return ns pointer is > valid. nvme_next_ns() will return a NULL pointer if there is no path > left. > > Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") > Cc: Hannes Reinecke <hare@suse.de> > Signed-off-by: Daniel Wagner <dwagner@suse.de> > --- > v2: > - moved NULL test into the if conditional statement > - added Fixes tag > > drivers/nvme/host/multipath.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 9ac762b28811..282b7a4ea9a9 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > } > > for (ns = nvme_next_ns(head, old); > - ns != old; > + ns && ns != old; nvme_round_robin_path just be called when !"old". nvme_next_ns should not return NULL when !"old". It seems unnecessary to add checking "ns". Is there a bug that "old" is not in "head" list? If yes, we should fix it. > ns = nvme_next_ns(head, ns)) { > if (nvme_path_is_disabled(ns)) > continue; > ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2021-02-02 1:13 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner 2021-01-27 10:34 ` Hannes Reinecke 2021-01-27 16:49 ` Christoph Hellwig 2021-01-28 1:31 ` Chao Leng 2021-01-28 7:58 ` Daniel Wagner 2021-01-28 9:18 ` Chao Leng 2021-01-28 9:23 ` Hannes Reinecke 2021-01-29 1:18 ` Chao Leng 2021-01-28 9:40 ` Daniel Wagner 2021-01-29 1:23 ` Chao Leng 2021-01-29 1:42 ` Sagi Grimberg 2021-01-29 3:07 ` Chao Leng 2021-01-29 3:30 ` Sagi Grimberg 2021-01-29 3:36 ` Chao Leng 2021-01-29 7:06 ` Hannes Reinecke [not found] ` <81b22bbf-4dd3-6161-e63a-9699690a4e4f@huawei.com> [not found] ` <715dd943-0587-be08-2840-e0948cf0bc62@suse.de> [not found] ` <eb131d8f-f009-42e7-105d-58b84060f0dd@huawei.com> [not found] ` <ac019690-7f02-d28c-ed58-bfc8c1d48879@suse.de> 2021-02-01 2:16 ` Chao Leng 2021-02-01 7:29 ` Hannes Reinecke 2021-02-01 8:47 ` Chao Leng 2021-02-01 8:57 ` Hannes Reinecke 2021-02-01 9:40 ` Chao Leng 2021-02-01 10:45 ` Hannes Reinecke 2021-02-02 1:12 ` Chao Leng 2021-01-28 1:36 ` Chao Leng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).