linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] nvme-multipath: Early exit if no path is available
@ 2021-01-27 10:30 Daniel Wagner
  2021-01-27 10:34 ` Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Daniel Wagner @ 2021-01-27 10:30 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, Daniel Wagner, Hannes Reinecke

nvme_round_robin_path() should test if the return ns pointer is
valid. nvme_next_ns() will return a NULL pointer if there is no path
left.

Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
v2:
  - moved NULL test into the if conditional statement
  - added Fixes tag

 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..282b7a4ea9a9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	}
 
 	for (ns = nvme_next_ns(head, old);
-	     ns != old;
+	     ns && ns != old;
 	     ns = nvme_next_ns(head, ns)) {
 		if (nvme_path_is_disabled(ns))
 			continue;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner
@ 2021-01-27 10:34 ` Hannes Reinecke
  2021-01-27 16:49 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Hannes Reinecke @ 2021-01-27 10:34 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg

On 1/27/21 11:30 AM, Daniel Wagner wrote:
> nvme_round_robin_path() should test if the return ns pointer is
> valid. nvme_next_ns() will return a NULL pointer if there is no path
> left.
> 
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> v2:
>    - moved NULL test into the if conditional statement
>    - added Fixes tag
> 
>   drivers/nvme/host/multipath.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 9ac762b28811..282b7a4ea9a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   	}
>   
>   	for (ns = nvme_next_ns(head, old);
> -	     ns != old;
> +	     ns && ns != old;
>   	     ns = nvme_next_ns(head, ns)) {
>   		if (nvme_path_is_disabled(ns))
>   			continue;
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner
  2021-01-27 10:34 ` Hannes Reinecke
@ 2021-01-27 16:49 ` Christoph Hellwig
  2021-01-28  1:31 ` Chao Leng
  2021-01-28  1:36 ` Chao Leng
  3 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-01-27 16:49 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Hannes Reinecke

Thanks,

applied to nvme-5.11.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner
  2021-01-27 10:34 ` Hannes Reinecke
  2021-01-27 16:49 ` Christoph Hellwig
@ 2021-01-28  1:31 ` Chao Leng
  2021-01-28  7:58   ` Daniel Wagner
  2021-01-28  1:36 ` Chao Leng
  3 siblings, 1 reply; 23+ messages in thread
From: Chao Leng @ 2021-01-28  1:31 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke,
	Keith Busch, Christoph Hellwig



On 2021/1/27 18:30, Daniel Wagner wrote:
> nvme_round_robin_path() should test if the return ns pointer is
> valid. nvme_next_ns() will return a NULL pointer if there is no path
> left.
> 
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> v2:
>    - moved NULL test into the if conditional statement
>    - added Fixes tag
> 
>   drivers/nvme/host/multipath.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 9ac762b28811..282b7a4ea9a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   	}
>   
>   	for (ns = nvme_next_ns(head, old);
> -	     ns != old;
> +	     ns && ns != old;
nvme_round_robin_path just be called when !"old".
nvme_next_ns should not return NULL when !"old".
It seems unnecessary to add checking "ns".
Is there a bug that "old" in not in "head" list?
If yes, we should fix it.
>   	     ns = nvme_next_ns(head, ns)) {
>   		if (nvme_path_is_disabled(ns))
>   			continue;
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner
                   ` (2 preceding siblings ...)
  2021-01-28  1:31 ` Chao Leng
@ 2021-01-28  1:36 ` Chao Leng
  3 siblings, 0 replies; 23+ messages in thread
From: Chao Leng @ 2021-01-28  1:36 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: Sagi Grimberg, linux-kernel, Jens Axboe, Hannes Reinecke,
	Keith Busch, Christoph Hellwig



On 2021/1/27 18:30, Daniel Wagner wrote:
> nvme_round_robin_path() should test if the return ns pointer is
> valid. nvme_next_ns() will return a NULL pointer if there is no path
> left.
> 
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> v2:
>    - moved NULL test into the if conditional statement
>    - added Fixes tag
> 
>   drivers/nvme/host/multipath.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 9ac762b28811..282b7a4ea9a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   	}
>   
>   	for (ns = nvme_next_ns(head, old);
> -	     ns != old;
> +	     ns && ns != old;
nvme_round_robin_path just be called when !"old".
nvme_next_ns should not return NULL when !"old".
It seems unnecessary to add checking "ns".
Is there a bug that "old" is not in "head" list?
If yes, we should fix it.
>   	     ns = nvme_next_ns(head, ns)) {
>   		if (nvme_path_is_disabled(ns))
>   			continue;
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-28  1:31 ` Chao Leng
@ 2021-01-28  7:58   ` Daniel Wagner
  2021-01-28  9:18     ` Chao Leng
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2021-01-28  7:58 UTC (permalink / raw)
  To: Chao Leng
  Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe,
	Hannes Reinecke, Keith Busch, Christoph Hellwig

On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> >   	}
> >   	for (ns = nvme_next_ns(head, old);
> > -	     ns != old;
> > +	     ns && ns != old;
> nvme_round_robin_path just be called when !"old".
> nvme_next_ns should not return NULL when !"old".
> It seems unnecessary to add checking "ns".

The problem is when we enter nvme_round_robin_path() and there is no
path available. In this case the initialization ns = nvme_next_ns(head,
old) could return a NULL pointer.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-28  7:58   ` Daniel Wagner
@ 2021-01-28  9:18     ` Chao Leng
  2021-01-28  9:23       ` Hannes Reinecke
  2021-01-28  9:40       ` Daniel Wagner
  0 siblings, 2 replies; 23+ messages in thread
From: Chao Leng @ 2021-01-28  9:18 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe,
	Hannes Reinecke, Keith Busch, Christoph Hellwig



On 2021/1/28 15:58, Daniel Wagner wrote:
> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
>>> --- a/drivers/nvme/host/multipath.c
>>> +++ b/drivers/nvme/host/multipath.c
>>> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>>>    	}
>>>    	for (ns = nvme_next_ns(head, old);
>>> -	     ns != old;
>>> +	     ns && ns != old;
>> nvme_round_robin_path just be called when !"old".
>> nvme_next_ns should not return NULL when !"old".
>> It seems unnecessary to add checking "ns".
> 
> The problem is when we enter nvme_round_robin_path() and there is no
> path available. In this case the initialization ns = nvme_next_ns(head,
> old) could return a NULL pointer."old" should not be NULL, so there is at least one path that is "old".
It is impossible to return NULL for nvme_next_ns(head, old).
> .
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-28  9:18     ` Chao Leng
@ 2021-01-28  9:23       ` Hannes Reinecke
  2021-01-29  1:18         ` Chao Leng
  2021-01-28  9:40       ` Daniel Wagner
  1 sibling, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2021-01-28  9:23 UTC (permalink / raw)
  To: Chao Leng, Daniel Wagner
  Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Keith Busch,
	Christoph Hellwig

On 1/28/21 10:18 AM, Chao Leng wrote:
> 
> 
> On 2021/1/28 15:58, Daniel Wagner wrote:
>> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
>>>> --- a/drivers/nvme/host/multipath.c
>>>> +++ b/drivers/nvme/host/multipath.c
>>>> @@ -221,7 +221,7 @@ static struct nvme_ns 
>>>> *nvme_round_robin_path(struct nvme_ns_head *head,
>>>>        }
>>>>        for (ns = nvme_next_ns(head, old);
>>>> -         ns != old;
>>>> +         ns && ns != old;
>>> nvme_round_robin_path just be called when !"old".
>>> nvme_next_ns should not return NULL when !"old".
>>> It seems unnecessary to add checking "ns".
>>
>> The problem is when we enter nvme_round_robin_path() and there is no
>> path available. In this case the initialization ns = nvme_next_ns(head,
>> old) could return a NULL pointer."old" should not be NULL, so there is 
>> at least one path that is "old".
> It is impossible to return NULL for nvme_next_ns(head, old).

No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL 
when then end of the list is reached.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-28  9:18     ` Chao Leng
  2021-01-28  9:23       ` Hannes Reinecke
@ 2021-01-28  9:40       ` Daniel Wagner
  2021-01-29  1:23         ` Chao Leng
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2021-01-28  9:40 UTC (permalink / raw)
  To: Chao Leng
  Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe,
	Hannes Reinecke, Keith Busch, Christoph Hellwig

On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote:
> It is impossible to return NULL for nvme_next_ns(head, old).

 block nvme0n1: no usable path - requeuing I/O
 block nvme0n1: no usable path - requeuing I/O
 block nvme0n1: no usable path - requeuing I/O
 block nvme0n1: no usable path - requeuing I/O
 block nvme0n1: no usable path - requeuing I/O
 block nvme0n1: no usable path - requeuing I/O
 BUG: kernel NULL pointer dereference, address: 0000000000000068
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 8000000ff67bc067 P4D 8000000ff67bc067 PUD ff9ac9067 PMD 0 
 Oops: 0000 [#1] SMP PTI
 CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G            E       5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased)
 Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x                    02/06/2018
 RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core]
 Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10
 RSP: 0018:ffffa69d08017af8 EFLAGS: 00010246
 RAX: ffff92f261d87800 RBX: ffff92fa555b0010 RCX: ffff92fa555bc570
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000001 R08: 0000000000001000 R09: 0000000000001000
 R10: ffffa69d080179a8 R11: ffff92f264f0c1c0 R12: ffff92f264f7f000
 R13: ffff92fa555b0000 R14: 0000000000000001 R15: 0000000000000000
 FS:  00007f3962bae700(0000) GS:ffff92f29ffc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000068 CR3: 0000000fd69a2002 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  generic_make_request+0x121/0x300
  ? submit_bio+0x42/0x1c0
  submit_bio+0x42/0x1c0
  ext4_io_submit+0x49/0x60 [ext4]
  ext4_writepages+0x625/0xe90 [ext4]
  ? do_writepages+0x4b/0xe0
  ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4]
  do_writepages+0x4b/0xe0
  ? __generic_file_write_iter+0x192/0x1c0
  ? __filemap_fdatawrite_range+0xcb/0x100
  __filemap_fdatawrite_range+0xcb/0x100
  ? ext4_file_write_iter+0x128/0x3c0 [ext4]
  file_write_and_wait_range+0x5e/0xb0
  __generic_file_fsync+0x22/0xb0
  ext4_sync_file+0x1f7/0x3c0 [ext4]
  do_fsync+0x38/0x60
  __x64_sys_fsync+0x10/0x20
  do_syscall_64+0x5b/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

You can't see exactly where it dies but I followed the assembly to
nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
old) which returns NULL but nvme_next_ns() is returning NULL eventually
(list_next_or_null_rcu()).

And I have positive feedback, this patch fixes the above problem.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-28  9:23       ` Hannes Reinecke
@ 2021-01-29  1:18         ` Chao Leng
  0 siblings, 0 replies; 23+ messages in thread
From: Chao Leng @ 2021-01-29  1:18 UTC (permalink / raw)
  To: Hannes Reinecke, Daniel Wagner
  Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe, Keith Busch,
	Christoph Hellwig



On 2021/1/28 17:23, Hannes Reinecke wrote:
> On 1/28/21 10:18 AM, Chao Leng wrote:
>>
>>
>> On 2021/1/28 15:58, Daniel Wagner wrote:
>>> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
>>>>> --- a/drivers/nvme/host/multipath.c
>>>>> +++ b/drivers/nvme/host/multipath.c
>>>>> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>>>>>        }
>>>>>        for (ns = nvme_next_ns(head, old);
>>>>> -         ns != old;
>>>>> +         ns && ns != old;
>>>> nvme_round_robin_path just be called when !"old".
>>>> nvme_next_ns should not return NULL when !"old".
>>>> It seems unnecessary to add checking "ns".
>>>
>>> The problem is when we enter nvme_round_robin_path() and there is no
>>> path available. In this case the initialization ns = nvme_next_ns(head,
>>> old) could return a NULL pointer."old" should not be NULL, so there is at least one path that is "old".
>> It is impossible to return NULL for nvme_next_ns(head, old).
> 
> No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL when then end of the list is reached.
Although list_next_or_null_rcu()/list_first_or_null_rcu() may return
NULL, but nvme_next_ns(head, old) assume that the "old" is in the "head",
so nvme_next_ns(head, old) should not return NULL. If the "old" is not
in the "head", nvme_next_ns(head, old) will run abnormal.
So there is other bug which cause nvme_next_ns(head, old).

I review the code about head->list and head->current_path, I find 2 bugs
may cause nvme_next_ns(head, old) abnormal:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted from the
"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.
> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-28  9:40       ` Daniel Wagner
@ 2021-01-29  1:23         ` Chao Leng
  2021-01-29  1:42           ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Chao Leng @ 2021-01-29  1:23 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, Sagi Grimberg, linux-kernel, Jens Axboe,
	Hannes Reinecke, Keith Busch, Christoph Hellwig



On 2021/1/28 17:40, Daniel Wagner wrote:
> On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote:
>> It is impossible to return NULL for nvme_next_ns(head, old).
> 
>   block nvme0n1: no usable path - requeuing I/O
>   block nvme0n1: no usable path - requeuing I/O
>   block nvme0n1: no usable path - requeuing I/O
>   block nvme0n1: no usable path - requeuing I/O
>   block nvme0n1: no usable path - requeuing I/O
>   block nvme0n1: no usable path - requeuing I/O
>   BUG: kernel NULL pointer dereference, address: 0000000000000068
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 8000000ff67bc067 P4D 8000000ff67bc067 PUD ff9ac9067 PMD 0
>   Oops: 0000 [#1] SMP PTI
>   CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G            E       5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased)
>   Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x                    02/06/2018
>   RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core]
>   Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10
>   RSP: 0018:ffffa69d08017af8 EFLAGS: 00010246
>   RAX: ffff92f261d87800 RBX: ffff92fa555b0010 RCX: ffff92fa555bc570
>   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>   RBP: 0000000000000001 R08: 0000000000001000 R09: 0000000000001000
>   R10: ffffa69d080179a8 R11: ffff92f264f0c1c0 R12: ffff92f264f7f000
>   R13: ffff92fa555b0000 R14: 0000000000000001 R15: 0000000000000000
>   FS:  00007f3962bae700(0000) GS:ffff92f29ffc0000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000068 CR3: 0000000fd69a2002 CR4: 00000000003606e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    generic_make_request+0x121/0x300
>    ? submit_bio+0x42/0x1c0
>    submit_bio+0x42/0x1c0
>    ext4_io_submit+0x49/0x60 [ext4]
>    ext4_writepages+0x625/0xe90 [ext4]
>    ? do_writepages+0x4b/0xe0
>    ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4]
>    do_writepages+0x4b/0xe0
>    ? __generic_file_write_iter+0x192/0x1c0
>    ? __filemap_fdatawrite_range+0xcb/0x100
>    __filemap_fdatawrite_range+0xcb/0x100
>    ? ext4_file_write_iter+0x128/0x3c0 [ext4]
>    file_write_and_wait_range+0x5e/0xb0
>    __generic_file_fsync+0x22/0xb0
>    ext4_sync_file+0x1f7/0x3c0 [ext4]
>    do_fsync+0x38/0x60
>    __x64_sys_fsync+0x10/0x20
>    do_syscall_64+0x5b/0x1e0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> You can't see exactly where it dies but I followed the assembly to
> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
> old) which returns NULL but nvme_next_ns() is returning NULL eventually
> (list_next_or_null_rcu()).
So there is other bug cause nvme_next_ns abormal.
I review the code about head->list and head->current_path, I find 2 bugs
may cause the bug:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted from the
"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.
> 
> And I have positive feedback, this patch fixes the above problem.
Although adding check ns can fix the crash. There may still be other problems
such as in __nvme_find_path which use list_for_each_entry_rcu.
> 
> .
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-29  1:23         ` Chao Leng
@ 2021-01-29  1:42           ` Sagi Grimberg
  2021-01-29  3:07             ` Chao Leng
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2021-01-29  1:42 UTC (permalink / raw)
  To: Chao Leng, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke,
	Keith Busch, Christoph Hellwig


>> You can't see exactly where it dies but I followed the assembly to
>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>> (list_next_or_null_rcu()).
> So there is other bug cause nvme_next_ns abormal.
> I review the code about head->list and head->current_path, I find 2 bugs
> may cause the bug:
> First, I already send the patch. see:
> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ 
> 
> Second, in nvme_ns_remove, list_del_rcu is before
> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
> "head", but still use "old". I'm not sure there's any other
> consideration here, I will check it and try to fix it.

The reason why we first remove from head->list and only then clear
current_path is because the other way around there is no way
to guarantee that that the ns won't be assigned as current_path
again (because it is in head->list).

nvme_ns_remove fences continue of deletion of the ns by synchronizing
the srcu such that for sure the current_path clearance is visible.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-29  1:42           ` Sagi Grimberg
@ 2021-01-29  3:07             ` Chao Leng
  2021-01-29  3:30               ` Sagi Grimberg
  2021-01-29  7:06               ` Hannes Reinecke
  0 siblings, 2 replies; 23+ messages in thread
From: Chao Leng @ 2021-01-29  3:07 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke,
	Keith Busch, Christoph Hellwig



On 2021/1/29 9:42, Sagi Grimberg wrote:
> 
>>> You can't see exactly where it dies but I followed the assembly to
>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>> (list_next_or_null_rcu()).
>> So there is other bug cause nvme_next_ns abormal.
>> I review the code about head->list and head->current_path, I find 2 bugs
>> may cause the bug:
>> First, I already send the patch. see:
>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
>> Second, in nvme_ns_remove, list_del_rcu is before
>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>> "head", but still use "old". I'm not sure there's any other
>> consideration here, I will check it and try to fix it.
> 
> The reason why we first remove from head->list and only then clear
> current_path is because the other way around there is no way
> to guarantee that that the ns won't be assigned as current_path
> again (because it is in head->list).
ok, I see.
> 
> nvme_ns_remove fences continue of deletion of the ns by synchronizing
> the srcu such that for sure the current_path clearance is visible.
The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;
This may cause infinite loop in nvme_round_robin_path.
for (ns = nvme_next_ns(head, old);
	ns != old;
	ns = nvme_next_ns(head, ns))
The ns will always be ns1, and then infinite loop.
> .

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-29  3:07             ` Chao Leng
@ 2021-01-29  3:30               ` Sagi Grimberg
  2021-01-29  3:36                 ` Chao Leng
  2021-01-29  7:06               ` Hannes Reinecke
  1 sibling, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2021-01-29  3:30 UTC (permalink / raw)
  To: Chao Leng, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke,
	Keith Busch, Christoph Hellwig


>>>> You can't see exactly where it dies but I followed the assembly to
>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>> (list_next_or_null_rcu()).
>>> So there is other bug cause nvme_next_ns abormal.
>>> I review the code about head->list and head->current_path, I find 2 bugs
>>> may cause the bug:
>>> First, I already send the patch. see:
>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ 
>>>
>>> Second, in nvme_ns_remove, list_del_rcu is before
>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>> "head", but still use "old". I'm not sure there's any other
>>> consideration here, I will check it and try to fix it.
>>
>> The reason why we first remove from head->list and only then clear
>> current_path is because the other way around there is no way
>> to guarantee that that the ns won't be assigned as current_path
>> again (because it is in head->list).
> ok, I see.
>>
>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>> the srcu such that for sure the current_path clearance is visible.
> The list will be like this:
> head->next = ns1;
> ns1->next = head;
> old->next = ns1;
> This may cause infinite loop in nvme_round_robin_path.
> for (ns = nvme_next_ns(head, old);
>      ns != old;
>      ns = nvme_next_ns(head, ns))
> The ns will always be ns1, and then infinite loop.

Who is being removed? I'm not following

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-29  3:30               ` Sagi Grimberg
@ 2021-01-29  3:36                 ` Chao Leng
  0 siblings, 0 replies; 23+ messages in thread
From: Chao Leng @ 2021-01-29  3:36 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Hannes Reinecke,
	Keith Busch, Christoph Hellwig



On 2021/1/29 11:30, Sagi Grimberg wrote:
> 
>>>>> You can't see exactly where it dies but I followed the assembly to
>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>>> (list_next_or_null_rcu()).
>>>> So there is other bug cause nvme_next_ns abormal.
>>>> I review the code about head->list and head->current_path, I find 2 bugs
>>>> may cause the bug:
>>>> First, I already send the patch. see:
>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>>> "head", but still use "old". I'm not sure there's any other
>>>> consideration here, I will check it and try to fix it.
>>>
>>> The reason why we first remove from head->list and only then clear
>>> current_path is because the other way around there is no way
>>> to guarantee that that the ns won't be assigned as current_path
>>> again (because it is in head->list).
>> ok, I see.
>>>
>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>>> the srcu such that for sure the current_path clearance is visible.
>> The list will be like this:
>> head->next = ns1;
>> ns1->next = head;
>> old->next = ns1;
>> This may cause infinite loop in nvme_round_robin_path.
>> for (ns = nvme_next_ns(head, old);
>>      ns != old;
>>      ns = nvme_next_ns(head, ns))
>> The ns will always be ns1, and then infinite loop.
> 
> Who is being removed? I'm not following
The "old" is being removed path.
Daniel Wagner report crash like this:
head->next = head;
old->next = head;
So nvme_next_ns(head, old) will return NULL, and then crash.
Although check ns can avoid crash, but can not avoid infinite loop.

Similar reason, The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;
ns1 is other path.

> .

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-01-29  3:07             ` Chao Leng
  2021-01-29  3:30               ` Sagi Grimberg
@ 2021-01-29  7:06               ` Hannes Reinecke
       [not found]                 ` <81b22bbf-4dd3-6161-e63a-9699690a4e4f@huawei.com>
  1 sibling, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2021-01-29  7:06 UTC (permalink / raw)
  To: Chao Leng, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig

On 1/29/21 4:07 AM, Chao Leng wrote:
> 
> 
> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>
>>>> You can't see exactly where it dies but I followed the assembly to
>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>> (list_next_or_null_rcu()).
>>> So there is other bug cause nvme_next_ns abormal.
>>> I review the code about head->list and head->current_path, I find 2 bugs
>>> may cause the bug:
>>> First, I already send the patch. see:
>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ 
>>>
>>> Second, in nvme_ns_remove, list_del_rcu is before
>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>> "head", but still use "old". I'm not sure there's any other
>>> consideration here, I will check it and try to fix it.
>>
>> The reason why we first remove from head->list and only then clear
>> current_path is because the other way around there is no way
>> to guarantee that that the ns won't be assigned as current_path
>> again (because it is in head->list).
> ok, I see.
>>
>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>> the srcu such that for sure the current_path clearance is visible.
> The list will be like this:
> head->next = ns1;
> ns1->next = head;
> old->next = ns1;

Where does 'old' pointing to?

> This may cause infinite loop in nvme_round_robin_path.
> for (ns = nvme_next_ns(head, old);
>      ns != old;
>      ns = nvme_next_ns(head, ns))
> The ns will always be ns1, and then infinite loop.

No. nvme_next_ns() will return NULL.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
       [not found]                       ` <ac019690-7f02-d28c-ed58-bfc8c1d48879@suse.de>
@ 2021-02-01  2:16                         ` Chao Leng
  2021-02-01  7:29                           ` Hannes Reinecke
  0 siblings, 1 reply; 23+ messages in thread
From: Chao Leng @ 2021-02-01  2:16 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig



On 2021/1/29 17:20, Hannes Reinecke wrote:
> On 1/29/21 9:46 AM, Chao Leng wrote:
>>
>>
>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>
>>>>>>>>> You can't see exactly where it dies but I followed the assembly to
>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>> So there is other bug cause nvme_next_ns abormal.
>>>>>>>> I review the code about head->list and head->current_path, I find 2 bugs
>>>>>>>> may cause the bug:
>>>>>>>> First, I already send the patch. see:
>>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>>>>>>> "head", but still use "old". I'm not sure there's any other
>>>>>>>> consideration here, I will check it and try to fix it.
>>>>>>>
>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>> current_path is because the other way around there is no way
>>>>>>> to guarantee that that the ns won't be assigned as current_path
>>>>>>> again (because it is in head->list).
>>>>>> ok, I see.
>>>>>>>
>>>>>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>>>>>>> the srcu such that for sure the current_path clearance is visible.
>>>>>> The list will be like this:
>>>>>> head->next = ns1;
>>>>>> ns1->next = head;
>>>>>> old->next = ns1;
>>>>>
>>>>> Where does 'old' pointing to?
>>>>>
>>>>>> This may cause infinite loop in nvme_round_robin_path.
>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>      ns != old;
>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>> The ns will always be ns1, and then infinite loop.
>>>>>
>>>>> No. nvme_next_ns() will return NULL.
>>>> If there is just one path(the "old") and the "old" is deleted,
>>>> nvme_next_ns() will return NULL.
>>>> The list like this:
>>>> head->next = head;
>>>> old->next = head;
>>>> If there is two or more path and the "old" is deleted,
>>>> "for" will be infinite loop. because nvme_next_ns() will return
>>>> the path which in the list except the "old", check condition will
>>>> be true for ever.
>>>
>>> But that will be caught by the statement above:
>>>
>>> if (list_is_singular(&head->list))
>>>
>>> no?
>> Two path just a sample example.
>> If there is just two path, will enter it, may cause no path but there is
>> actually one path. It is falsely assumed that the "old" must be not deleted.
>> If there is more than two path, will cause infinite loop.
> So you mean we'll need something like this?
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 71696819c228..8ffccaf9c19a 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>                  struct nvme_ns *ns)
>   {
> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
> -                       siblings);
> -       if (ns)
> -               return ns;
> +       if (ns) {
> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
> +                                          struct nvme_ns, siblings);
> +               if (ns)
> +                       return ns;
> +       }
No, in the scenario, ns should not be NULL.
May be we can do like this:

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 282b7a4ea9a9..b895011a2cbd 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
         return found;
  }

-static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
-               struct nvme_ns *ns)
-{
-       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
-                       siblings);
-       if (ns)
-               return ns;
-       return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
-}
+#define nvme_next_ns_condition(head, current, condition) \
+({ \
+       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
+               &(current)->siblings, struct nvme_ns, siblings); \
+       __ptr ? __ptr : (condition) ? (condition) = false, \
+               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
+                       siblings) : NULL; \
+})

  static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
                 int node, struct nvme_ns *old)
  {
         struct nvme_ns *ns, *found = NULL;
+       bool first_half = true;

-       if (list_is_singular(&head->list)) {
-               if (nvme_path_is_disabled(old))
-                       return NULL;
-               return old;
-       }
-
-       for (ns = nvme_next_ns(head, old);
+       for (ns = nvme_next_ns_condition(head, old, first_half);
              ns && ns != old;
-            ns = nvme_next_ns(head, ns)) {
+            ns = nvme_next_ns_condition(head, ns, first_half)) {
                 if (nvme_path_is_disabled(ns))
                         continue;

>          return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>   }
> 
> Cheers,
> 
> Hannes

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-02-01  2:16                         ` Chao Leng
@ 2021-02-01  7:29                           ` Hannes Reinecke
  2021-02-01  8:47                             ` Chao Leng
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2021-02-01  7:29 UTC (permalink / raw)
  To: Chao Leng, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig

On 2/1/21 3:16 AM, Chao Leng wrote:
> 
> 
> On 2021/1/29 17:20, Hannes Reinecke wrote:
>> On 1/29/21 9:46 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>>
>>>>>
>>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>>
>>>>>>>>>> You can't see exactly where it dies but I followed the 
>>>>>>>>>> assembly to
>>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial 
>>>>>>>>>> nvme_next_ns(head,
>>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL 
>>>>>>>>>> eventually
>>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>>> So there is other bug cause nvme_next_ns abormal.
>>>>>>>>> I review the code about head->list and head->current_path, I 
>>>>>>>>> find 2 bugs
>>>>>>>>> may cause the bug:
>>>>>>>>> First, I already send the patch. see:
>>>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/ 
>>>>>>>>>
>>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted 
>>>>>>>>> from the
>>>>>>>>> "head", but still use "old". I'm not sure there's any other
>>>>>>>>> consideration here, I will check it and try to fix it.
>>>>>>>>
>>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>>> current_path is because the other way around there is no way
>>>>>>>> to guarantee that that the ns won't be assigned as current_path
>>>>>>>> again (because it is in head->list).
>>>>>>> ok, I see.
>>>>>>>>
>>>>>>>> nvme_ns_remove fences continue of deletion of the ns by 
>>>>>>>> synchronizing
>>>>>>>> the srcu such that for sure the current_path clearance is visible.
>>>>>>> The list will be like this:
>>>>>>> head->next = ns1;
>>>>>>> ns1->next = head;
>>>>>>> old->next = ns1;
>>>>>>
>>>>>> Where does 'old' pointing to?
>>>>>>
>>>>>>> This may cause infinite loop in nvme_round_robin_path.
>>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>>      ns != old;
>>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>>> The ns will always be ns1, and then infinite loop.
>>>>>>
>>>>>> No. nvme_next_ns() will return NULL.
>>>>> If there is just one path(the "old") and the "old" is deleted,
>>>>> nvme_next_ns() will return NULL.
>>>>> The list like this:
>>>>> head->next = head;
>>>>> old->next = head;
>>>>> If there is two or more path and the "old" is deleted,
>>>>> "for" will be infinite loop. because nvme_next_ns() will return
>>>>> the path which in the list except the "old", check condition will
>>>>> be true for ever.
>>>>
>>>> But that will be caught by the statement above:
>>>>
>>>> if (list_is_singular(&head->list))
>>>>
>>>> no?
>>> Two path just a sample example.
>>> If there is just two path, will enter it, may cause no path but there is
>>> actually one path. It is falsely assumed that the "old" must be not 
>>> deleted.
>>> If there is more than two path, will cause infinite loop.
>> So you mean we'll need something like this?
>>
>> diff --git a/drivers/nvme/host/multipath.c 
>> b/drivers/nvme/host/multipath.c
>> index 71696819c228..8ffccaf9c19a 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct 
>> nvme_ns_head *head, int node)
>>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>>                  struct nvme_ns *ns)
>>   {
>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct 
>> nvme_ns,
>> -                       siblings);
>> -       if (ns)
>> -               return ns;
>> +       if (ns) {
>> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
>> +                                          struct nvme_ns, siblings);
>> +               if (ns)
>> +                       return ns;
>> +       }
> No, in the scenario, ns should not be NULL.

Why not? 'ns == NULL' is precisely the corner-case this is trying to fix...

> May be we can do like this:
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 282b7a4ea9a9..b895011a2cbd 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct 
> nvme_ns_head *head, int node)
>          return found;
>   }
> 
> -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
> -               struct nvme_ns *ns)
> -{
> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct 
> nvme_ns,
> -                       siblings);
> -       if (ns)
> -               return ns;
> -       return list_first_or_null_rcu(&head->list, struct nvme_ns, 
> siblings);
> -}
> +#define nvme_next_ns_condition(head, current, condition) \
> +({ \
> +       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
> +               &(current)->siblings, struct nvme_ns, siblings); \
> +       __ptr ? __ptr : (condition) ? (condition) = false, \
> +               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
> +                       siblings) : NULL; \
> +})
> 
Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to the 
current (and my fixed) versions is?
I'm still not clear where the problem is once we applied both patches.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-02-01  7:29                           ` Hannes Reinecke
@ 2021-02-01  8:47                             ` Chao Leng
  2021-02-01  8:57                               ` Hannes Reinecke
  0 siblings, 1 reply; 23+ messages in thread
From: Chao Leng @ 2021-02-01  8:47 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig



On 2021/2/1 15:29, Hannes Reinecke wrote:
> On 2/1/21 3:16 AM, Chao Leng wrote:
>>
>>
>> On 2021/1/29 17:20, Hannes Reinecke wrote:
>>> On 1/29/21 9:46 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>>>
>>>>>>>>>>> You can't see exactly where it dies but I followed the assembly to
>>>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>>>> So there is other bug cause nvme_next_ns abormal.
>>>>>>>>>> I review the code about head->list and head->current_path, I find 2 bugs
>>>>>>>>>> may cause the bug:
>>>>>>>>>> First, I already send the patch. see:
>>>>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
>>>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>>>>>>>>> "head", but still use "old". I'm not sure there's any other
>>>>>>>>>> consideration here, I will check it and try to fix it.
>>>>>>>>>
>>>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>>>> current_path is because the other way around there is no way
>>>>>>>>> to guarantee that that the ns won't be assigned as current_path
>>>>>>>>> again (because it is in head->list).
>>>>>>>> ok, I see.
>>>>>>>>>
>>>>>>>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>>>>>>>>> the srcu such that for sure the current_path clearance is visible.
>>>>>>>> The list will be like this:
>>>>>>>> head->next = ns1;
>>>>>>>> ns1->next = head;
>>>>>>>> old->next = ns1;
>>>>>>>
>>>>>>> Where does 'old' pointing to?
>>>>>>>
>>>>>>>> This may cause infinite loop in nvme_round_robin_path.
>>>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>>>      ns != old;
>>>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>>>> The ns will always be ns1, and then infinite loop.
>>>>>>>
>>>>>>> No. nvme_next_ns() will return NULL.
>>>>>> If there is just one path(the "old") and the "old" is deleted,
>>>>>> nvme_next_ns() will return NULL.
>>>>>> The list like this:
>>>>>> head->next = head;
>>>>>> old->next = head;
>>>>>> If there is two or more path and the "old" is deleted,
>>>>>> "for" will be infinite loop. because nvme_next_ns() will return
>>>>>> the path which in the list except the "old", check condition will
>>>>>> be true for ever.
>>>>>
>>>>> But that will be caught by the statement above:
>>>>>
>>>>> if (list_is_singular(&head->list))
>>>>>
>>>>> no?
>>>> Two path just a sample example.
>>>> If there is just two path, will enter it, may cause no path but there is
>>>> actually one path. It is falsely assumed that the "old" must be not deleted.
>>>> If there is more than two path, will cause infinite loop.
>>> So you mean we'll need something like this?
>>>
>>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>>> index 71696819c228..8ffccaf9c19a 100644
>>> --- a/drivers/nvme/host/multipath.c
>>> +++ b/drivers/nvme/host/multipath.c
>>> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>>>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>>>                  struct nvme_ns *ns)
>>>   {
>>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
>>> -                       siblings);
>>> -       if (ns)
>>> -               return ns;
>>> +       if (ns) {
>>> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
>>> +                                          struct nvme_ns, siblings);
>>> +               if (ns)
>>> +                       return ns;
>>> +       }
>> No, in the scenario, ns should not be NULL.
> 
> Why not? 'ns == NULL' is precisely the corner-case this is trying to fix...
> 
>> May be we can do like this:
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 282b7a4ea9a9..b895011a2cbd 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>>          return found;
>>   }
>>
>> -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>> -               struct nvme_ns *ns)
>> -{
>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
>> -                       siblings);
>> -       if (ns)
>> -               return ns;
>> -       return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>> -}
>> +#define nvme_next_ns_condition(head, current, condition) \
>> +({ \
>> +       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
>> +               &(current)->siblings, struct nvme_ns, siblings); \
>> +       __ptr ? __ptr : (condition) ? (condition) = false, \
>> +               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
>> +                       siblings) : NULL; \
>> +})
>>
> Urgh. Please, no. That is well impossible to debug.
> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
> I'm still not clear where the problem is once we applied both patches.
For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;

My patch work flow:
nvme_next_ns_condition(head, old, true) return ns2;
nvme_next_ns_condition(head, ns2, true) change the condition to false
                                         and return ns1;
nvme_next_ns_condition(head, ns1, false) return ns2;
nvme_next_ns_condition(head, ns2, false) return NULL;
And then the loop end.

Your patch work flow:
nvme_next_ns(head, old) return ns2;
nvme_next_ns(head, ns2) return ns1;
nvme_next_ns(head, ns1) return ns2;
nvme_next_ns(head, ns2) return ns1;
nvme_next_ns(head, ns1) return ns2;
And then the loop is infinite.
> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-02-01  8:47                             ` Chao Leng
@ 2021-02-01  8:57                               ` Hannes Reinecke
  2021-02-01  9:40                                 ` Chao Leng
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2021-02-01  8:57 UTC (permalink / raw)
  To: Chao Leng, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig

On 2/1/21 9:47 AM, Chao Leng wrote:
> 
> 
> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>> Urgh. Please, no. That is well impossible to debug.
>> Can you please open-code it to demonstrate where the difference to the 
>> current (and my fixed) versions is?
>> I'm still not clear where the problem is once we applied both patches.
> For example assume the list has three path, and all path is not 
> NVME_ANA_OPTIMIZED:
> head->next = ns1;
> ns1->next = ns2;
> ns2->next = head;
> old->next = ns2;
> 
And this is where I have issues with.
Where does 'old' come from?
Clearly it was part of the list at one point; so what happened to it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-02-01  8:57                               ` Hannes Reinecke
@ 2021-02-01  9:40                                 ` Chao Leng
  2021-02-01 10:45                                   ` Hannes Reinecke
  0 siblings, 1 reply; 23+ messages in thread
From: Chao Leng @ 2021-02-01  9:40 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig



On 2021/2/1 16:57, Hannes Reinecke wrote:
> On 2/1/21 9:47 AM, Chao Leng wrote:
>>
>>
>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>>> Urgh. Please, no. That is well impossible to debug.
>>> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
>>> I'm still not clear where the problem is once we applied both patches.
>> For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
>> head->next = ns1;
>> ns1->next = ns2;
>> ns2->next = head;
>> old->next = ns2;
>>
> And this is where I have issues with.
> Where does 'old' come from?
> Clearly it was part of the list at one point; so what happened to it?
I explained this earlier.
In nvme_ns_remove, there is a hole between list_del_rcu and
nvme_mpath_clear_current_path. If head->current_path is the "old", and
the "old" is removing. The "old" is already removed from the list by
list_del_rcu, but head->current_path is not clear to NULL by
nvme_mpath_clear_current_path.
Find path is race with nvme_ns_remove, use the "old" pass to
nvme_round_robin_path to find path.
> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-02-01  9:40                                 ` Chao Leng
@ 2021-02-01 10:45                                   ` Hannes Reinecke
  2021-02-02  1:12                                     ` Chao Leng
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Reinecke @ 2021-02-01 10:45 UTC (permalink / raw)
  To: Chao Leng, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig

On 2/1/21 10:40 AM, Chao Leng wrote:
> 
> 
> On 2021/2/1 16:57, Hannes Reinecke wrote:
>> On 2/1/21 9:47 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>>>> Urgh. Please, no. That is well impossible to debug.
>>>> Can you please open-code it to demonstrate where the difference to 
>>>> the current (and my fixed) versions is?
>>>> I'm still not clear where the problem is once we applied both patches.
>>> For example assume the list has three path, and all path is not 
>>> NVME_ANA_OPTIMIZED:
>>> head->next = ns1;
>>> ns1->next = ns2;
>>> ns2->next = head;
>>> old->next = ns2;
>>>
>> And this is where I have issues with.
>> Where does 'old' come from?
>> Clearly it was part of the list at one point; so what happened to it?
> I explained this earlier.
> In nvme_ns_remove, there is a hole between list_del_rcu and
> nvme_mpath_clear_current_path. If head->current_path is the "old", and
> the "old" is removing. The "old" is already removed from the list by
> list_del_rcu, but head->current_path is not clear to NULL by
> nvme_mpath_clear_current_path.
> Find path is race with nvme_ns_remove, use the "old" pass to
> nvme_round_robin_path to find path.

Ah. So this should be better:

@@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct 
nvme_ns_head *head, int node)
  static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
                 struct nvme_ns *ns)
  {
-       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct 
nvme_ns,
-                       siblings);
-       if (ns)
-               return ns;
+       if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) {
+               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
+                                          struct nvme_ns, siblings);
+               if (ns)
+                       return ns;
+       }
         return list_first_or_null_rcu(&head->list, struct nvme_ns, 
siblings);
  }

The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it 
should guard against the issue you mentioned.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvme-multipath: Early exit if no path is available
  2021-02-01 10:45                                   ` Hannes Reinecke
@ 2021-02-02  1:12                                     ` Chao Leng
  0 siblings, 0 replies; 23+ messages in thread
From: Chao Leng @ 2021-02-02  1:12 UTC (permalink / raw)
  To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, linux-kernel, Jens Axboe, Keith Busch, Christoph Hellwig



On 2021/2/1 18:45, Hannes Reinecke wrote:
> On 2/1/21 10:40 AM, Chao Leng wrote:
>>
>>
>> On 2021/2/1 16:57, Hannes Reinecke wrote:
>>> On 2/1/21 9:47 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>>>>> Urgh. Please, no. That is well impossible to debug.
>>>>> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
>>>>> I'm still not clear where the problem is once we applied both patches.
>>>> For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
>>>> head->next = ns1;
>>>> ns1->next = ns2;
>>>> ns2->next = head;
>>>> old->next = ns2;
>>>>
>>> And this is where I have issues with.
>>> Where does 'old' come from?
>>> Clearly it was part of the list at one point; so what happened to it?
>> I explained this earlier.
>> In nvme_ns_remove, there is a hole between list_del_rcu and
>> nvme_mpath_clear_current_path. If head->current_path is the "old", and
>> the "old" is removing. The "old" is already removed from the list by
>> list_del_rcu, but head->current_path is not clear to NULL by
>> nvme_mpath_clear_current_path.
>> Find path is race with nvme_ns_remove, use the "old" pass to
>> nvme_round_robin_path to find path.
> 
> Ah. So this should be better:
> 
> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>                  struct nvme_ns *ns)
>   {
> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
> -                       siblings);
> -       if (ns)
> -               return ns;
> +       if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) {
> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
> +                                          struct nvme_ns, siblings);
> +               if (ns)
> +                       return ns;
> +       }
>          return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>   }
> 
> The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned.
Looks useless, it is still infinite loop. You can check the workflow for the scenario I mentioned.
> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-02-02  1:13 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-27 10:30 [PATCH v2] nvme-multipath: Early exit if no path is available Daniel Wagner
2021-01-27 10:34 ` Hannes Reinecke
2021-01-27 16:49 ` Christoph Hellwig
2021-01-28  1:31 ` Chao Leng
2021-01-28  7:58   ` Daniel Wagner
2021-01-28  9:18     ` Chao Leng
2021-01-28  9:23       ` Hannes Reinecke
2021-01-29  1:18         ` Chao Leng
2021-01-28  9:40       ` Daniel Wagner
2021-01-29  1:23         ` Chao Leng
2021-01-29  1:42           ` Sagi Grimberg
2021-01-29  3:07             ` Chao Leng
2021-01-29  3:30               ` Sagi Grimberg
2021-01-29  3:36                 ` Chao Leng
2021-01-29  7:06               ` Hannes Reinecke
     [not found]                 ` <81b22bbf-4dd3-6161-e63a-9699690a4e4f@huawei.com>
     [not found]                   ` <715dd943-0587-be08-2840-e0948cf0bc62@suse.de>
     [not found]                     ` <eb131d8f-f009-42e7-105d-58b84060f0dd@huawei.com>
     [not found]                       ` <ac019690-7f02-d28c-ed58-bfc8c1d48879@suse.de>
2021-02-01  2:16                         ` Chao Leng
2021-02-01  7:29                           ` Hannes Reinecke
2021-02-01  8:47                             ` Chao Leng
2021-02-01  8:57                               ` Hannes Reinecke
2021-02-01  9:40                                 ` Chao Leng
2021-02-01 10:45                                   ` Hannes Reinecke
2021-02-02  1:12                                     ` Chao Leng
2021-01-28  1:36 ` Chao Leng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).