* Request timeout seen with NVMEoF TCP
@ 2020-12-09 17:19 Potnuri Bharat Teja
  2020-12-09 18:02 ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-09 17:19 UTC (permalink / raw)
  To: sagi; +Cc: hch, linux-nvme

Hi All,
I am seeing the following timeouts and reconnects on an NVMF TCP initiator with the latest
v5.10-rc5 kernel.
I see the same behavior with the nvme tree too (branch: nvme-5.11).
I last ran this with 5.8, where it was running fine.

The target configuration is 1 target with a 1GB ramdisk namespace. On the initiator:
discover, connect and run fio or iozone. Traces are seen within a couple of minutes
after starting the test.

initiator dmesg:
--
[77206.339485] nvme nvme1: queue 11: timeout request 0x34 type 6
[77206.340070] nvme nvme1: starting error recovery
[77206.340360] nvme nvme1: queue 11: timeout request 0x36 type 6
[77206.340766] nvme nvme1: queue 11: timeout request 0x38 type 6
[77206.341128] nvme nvme1: queue 11: timeout request 0x39 type 6
[77206.341515] nvme nvme1: queue 11: timeout request 0x3a type 4
[77206.341871] block nvme1n1: no usable path - requeuing I/O
[77206.342103] nvme nvme1: Reconnecting in 10 seconds...
[77206.342170] block nvme1n1: no usable path - requeuing I/O
[77206.342871] block nvme1n1: no usable path - requeuing I/O
[77206.343183] block nvme1n1: no usable path - requeuing I/O
[77206.343498] block nvme1n1: no usable path - requeuing I/O
[77206.343810] block nvme1n1: no usable path - requeuing I/O
[77206.344103] block nvme1n1: no usable path - requeuing I/O
[77206.344387] block nvme1n1: no usable path - requeuing I/O
[77206.344705] block nvme1n1: no usable path - requeuing I/O
[77206.345027] block nvme1n1: no usable path - requeuing I/O
[77211.971543] nvme_ns_head_submit_bio: 449 callbacks suppressed
[77211.971547] block nvme1n1: no usable path - requeuing I/O
[77216.600034] block nvme1n1: no usable path - requeuing I/O
[77216.600386] block nvme1n1: no usable path - requeuing I/O
[77216.600752] block nvme1n1: no usable path - requeuing I/O
[77216.601150] block nvme1n1: no usable path - requeuing I/O
[77216.601531] block nvme1n1: no usable path - requeuing I/O
[77216.601915] block nvme1n1: no usable path - requeuing I/O
[77216.602176] block nvme1n1: no usable path - requeuing I/O
[77216.602467] block nvme1n1: no usable path - requeuing I/O
[77216.602830] block nvme1n1: no usable path - requeuing I/O
[77216.603597] nvme nvme1: creating 16 I/O queues.
[77216.642166] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[77216.642516] debugfs: Directory 'hctx0' with parent '/' already present!
[77216.642899] debugfs: Directory 'hctx1' with parent '/' already present!
[77216.643177] debugfs: Directory 'hctx2' with parent '/' already present!
[77216.643475] debugfs: Directory 'hctx3' with parent '/' already present!
--

Please let me know if I could help with anything.

Thanks,
Bharat.


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-09 17:19 Request timeout seen with NVMEoF TCP Potnuri Bharat Teja
@ 2020-12-09 18:02 ` Sagi Grimberg
  2020-12-10 17:06   ` Potnuri Bharat Teja
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-09 18:02 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: hch, linux-nvme


> Hi All,
> I am seeing the following timeouts and reconnects on an NVMF TCP initiator with the latest
> v5.10-rc5 kernel.
> I see the same behavior with the nvme tree too (branch: nvme-5.11).
> I last ran this with 5.8, where it was running fine.
> 
> The target configuration is 1 target with a 1GB ramdisk namespace. On the initiator:
> discover, connect and run fio or iozone. Traces are seen within a couple of minutes
> after starting the test.

Hey Potnuri,

Can you also attach the target-side logs? It seems like an I/O times out
for no apparent reason...

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-09 18:02 ` Sagi Grimberg
@ 2020-12-10 17:06   ` Potnuri Bharat Teja
  2020-12-10 21:47     ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-10 17:06 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: hch, linux-nvme

On Wednesday, December 12/09/20, 2020 at 10:02:57 -0800, Sagi Grimberg wrote:
> 
> > Hi All,
> > I am seeing the following timeouts and reconnects on an NVMF TCP initiator with the latest
> > v5.10-rc5 kernel.
> > I see the same behavior with the nvme tree too (branch: nvme-5.11).
> > I last ran this with 5.8, where it was running fine.
> > 
> > The target configuration is 1 target with a 1GB ramdisk namespace. On the initiator:
> > discover, connect and run fio or iozone. Traces are seen within a couple of minutes
> > after starting the test.
> 
> Hey Potnuri,
> 
> Can you also attach the target-side logs? It seems like an I/O times out
> for no apparent reason...

I see nothing much logged on the target.

Target dmesg:
---
[ 8371.979029] brd: module loaded
[ 8371.983368] nvmet: adding nsid 1 to subsystem nvme-ram0
[ 8371.988189] nvmet: adding nsid 1 to subsystem nvme-ram1
[ 8371.992766] nvmet: adding nsid 1 to subsystem nvme-ram2
[ 8371.997107] nvmet: adding nsid 1 to subsystem nvme-ram3
[ 8372.001881] nvmet_tcp: enabling port 1 (10.0.3.2:4420)
[ 8407.988148] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 8417.287799] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 8417.328557] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 8417.365155] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 8417.402337] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 8417.439785] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.

[ 8511.903509] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 8552.862271] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
^^^^^^^ Logs due to reconnect ^^^^^^^^
---

Initiator dmesg:
---
[  923.976468] nvme nvme0: Removing ctrl: NQN "nvme-ram0"
[  985.213878] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.3.2:4420
[  985.215489] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[  994.513541] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.3.2:4420
[  994.515025] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[  994.554853] nvme nvme0: creating 16 I/O queues.
[  994.559725] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[  994.565373] nvme nvme0: new ctrl: NQN "nvme-ram0", addr 10.0.3.2:4420
[  994.591559] nvme nvme1: creating 16 I/O queues.
[  994.596472] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[  994.602392] nvme nvme1: new ctrl: NQN "nvme-ram1", addr 10.0.3.2:4420
[  994.628442] nvme nvme2: creating 16 I/O queues.
[  994.633391] nvme nvme2: mapped 16/0/0 default/read/poll queues.
[  994.639256] nvme nvme2: new ctrl: NQN "nvme-ram2", addr 10.0.3.2:4420
[  994.665958] nvme nvme3: creating 16 I/O queues.
[  994.670913] nvme nvme3: mapped 16/0/0 default/read/poll queues.
[  994.676894] nvme nvme3: new ctrl: NQN "nvme-ram3", addr 10.0.3.2:4420
[ 1007.040464] EXT4-fs (nvme0n1): mounting ext3 file system using the ext4 subsystem
[ 1007.044448] EXT4-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: (null)
[ 1007.070111] EXT4-fs (nvme1n1): mounting ext3 file system using the ext4 subsystem
[ 1007.073819] EXT4-fs (nvme1n1): mounted filesystem with ordered data mode. Opts: (null)
[ 1007.099188] EXT4-fs (nvme2n1): mounting ext3 file system using the ext4 subsystem
[ 1007.102578] EXT4-fs (nvme2n1): mounted filesystem with ordered data mode. Opts: (null)
[ 1007.126442] EXT4-fs (nvme3n1): mounting ext3 file system using the ext4 subsystem
[ 1007.130367] EXT4-fs (nvme3n1): mounted filesystem with ordered data mode. Opts: (null)
[ 1078.867439] nvme nvme1: queue 14: timeout request 0x33 type 6
[ 1078.868266] nvme nvme1: starting error recovery
[ 1078.868775] nvme nvme1: queue 14: timeout request 0x34 type 6
[ 1078.869853] nvme nvme1: queue 14: timeout request 0x35 type 6
[ 1078.869908] nvme nvme1: queue 14: timeout request 0x36 type 4
[ 1078.869958] nvme nvme1: queue 14: timeout request 0x37 type 4
[ 1078.870586] block nvme1n1: no usable path - requeuing I/O
[ 1078.870686] block nvme1n1: no usable path - requeuing I/O
[ 1078.871752] block nvme1n1: no usable path - requeuing I/O
[ 1078.872573] nvme nvme1: Reconnecting in 10 seconds...
[ 1078.872949] block nvme1n1: no usable path - requeuing I/O
[ 1078.874802] block nvme1n1: no usable path - requeuing I/O
[ 1078.875685] block nvme1n1: no usable path - requeuing I/O
[ 1078.876588] block nvme1n1: no usable path - requeuing I/O
[ 1078.877456] block nvme1n1: no usable path - requeuing I/O
[ 1078.878248] block nvme1n1: no usable path - requeuing I/O
[ 1078.879115] block nvme1n1: no usable path - requeuing I/O
[ 1089.130331] nvme_ns_head_submit_bio: 5 callbacks suppressed
[ 1089.131050] block nvme1n1: no usable path - requeuing I/O
[ 1089.131925] block nvme1n1: no usable path - requeuing I/O
[ 1089.132619] block nvme1n1: no usable path - requeuing I/O
[ 1089.133446] block nvme1n1: no usable path - requeuing I/O
[ 1089.134082] block nvme1n1: no usable path - requeuing I/O
[ 1089.134724] block nvme1n1: no usable path - requeuing I/O
[ 1089.135464] block nvme1n1: no usable path - requeuing I/O
[ 1089.136070] block nvme1n1: no usable path - requeuing I/O
[ 1089.137102] nvme nvme1: creating 16 I/O queues.
[ 1089.181184] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[ 1089.196437] nvme nvme1: Successfully reconnected (1 attempt)
[ 1119.827490] nvme nvme0: queue 16: timeout request 0x1a type 6
[ 1119.828223] nvme nvme0: starting error recovery
[ 1119.828801] nvme nvme0: queue 16: timeout request 0x1b type 4
[ 1119.829456] block nvme0n1: no usable path - requeuing I/O
[ 1119.830006] block nvme0n1: no usable path - requeuing I/O
[ 1119.830337] nvme nvme0: Reconnecting in 10 seconds...
[ 1119.830664] block nvme0n1: no usable path - requeuing I/O
[ 1130.088305] block nvme0n1: no usable path - requeuing I/O
[ 1130.089473] block nvme0n1: no usable path - requeuing I/O
[ 1130.090319] nvme nvme0: creating 16 I/O queues.
[ 1130.132996] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[ 1130.133572] debugfs: Directory 'hctx0' with parent '/' already present!
[ 1130.134086] debugfs: Directory 'hctx1' with parent '/' already present!
[ 1130.134669] debugfs: Directory 'hctx2' with parent '/' already present!
[ 1130.135196] debugfs: Directory 'hctx3' with parent '/' already present!
[ 1130.135766] debugfs: Directory 'hctx4' with parent '/' already present!
[ 1130.137374] debugfs: Directory 'hctx5' with parent '/' already present!
[ 1130.138070] debugfs: Directory 'hctx6' with parent '/' already present!
[ 1130.138782] debugfs: Directory 'hctx7' with parent '/' already present!
[ 1130.139553] debugfs: Directory 'hctx8' with parent '/' already present!
[ 1130.140114] debugfs: Directory 'hctx9' with parent '/' already present!
[ 1130.140728] debugfs: Directory 'hctx10' with parent '/' already present!
[ 1130.141502] debugfs: Directory 'hctx11' with parent '/' already present!
[ 1130.142111] debugfs: Directory 'hctx12' with parent '/' already present!
[ 1130.142763] debugfs: Directory 'hctx13' with parent '/' already present!
[ 1130.143535] debugfs: Directory 'hctx14' with parent '/' already present!
[ 1130.144114] debugfs: Directory 'hctx15' with parent '/' already present!
[ 1130.150104] nvme nvme0: Successfully reconnected (1 attempt)
---

I see the timeouts almost immediately after starting iozone. As expected, I see
IO blocked for a couple of seconds until the host reconnects.

Here is my test:
-> create 4 NVMF-TCP targets with a 1GB ramdisk namespace each.
-> discover and connect all targets.
-> format nvme devices as ext4, mount them. (# mkfs.ext4 -F /dev/nvme0n1)
-> run iozone over all mounted disks (# iozone -a -g 512m)

What else can I check?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-10 17:06   ` Potnuri Bharat Teja
@ 2020-12-10 21:47     ` Sagi Grimberg
  2020-12-11  7:26       ` Potnuri Bharat Teja
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-10 21:47 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: hch, linux-nvme

>>> Hi All,
>>> I am seeing the following timeouts and reconnects on an NVMF TCP initiator with the latest
>>> v5.10-rc5 kernel.
>>> I see the same behavior with the nvme tree too (branch: nvme-5.11).
>>> I last ran this with 5.8, where it was running fine.
>>>
>>> The target configuration is 1 target with a 1GB ramdisk namespace. On the initiator:
>>> discover, connect and run fio or iozone. Traces are seen within a couple of minutes
>>> after starting the test.
>>
>> Hey Potnuri,
>>
>> Can you also attach the target-side logs? It seems like an I/O times out
>> for no apparent reason...
> 
> I see nothing much logged on the target.

Hey Potnuri,

This issue is consistent with what Bharat reported.

Can you please check if the below solves the issue?
-- 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1ba659927442..9193b05d7bda 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1122,6 +1122,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
                                 pending = true;
                         else if (unlikely(result < 0))
                                 break;
+               } else {
+                       /*
+                        * submission path is sending, we need to
+                        * continue or resched because the submission
+                        * path direct send is not concerned with
+                        * rescheduling...
+                        */
+                       pending = true;
                 }

                 result = nvme_tcp_try_recv(queue);
-- 

Thanks

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-10 21:47     ` Sagi Grimberg
@ 2020-12-11  7:26       ` Potnuri Bharat Teja
  2020-12-14 17:51         ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-11  7:26 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: hch, linux-nvme

On Friday, December 12/11/20, 2020 at 03:17:11 +0530, Sagi Grimberg wrote:
> >>> Hi All,
> >>> I am seeing the following timeouts and reconnects on an NVMF TCP initiator with the latest
> >>> v5.10-rc5 kernel.
> >>> I see the same behavior with the nvme tree too (branch: nvme-5.11).
> >>> I last ran this with 5.8, where it was running fine.
> >>>
> >>> The target configuration is 1 target with a 1GB ramdisk namespace. On the initiator:
> >>> discover, connect and run fio or iozone. Traces are seen within a couple of minutes
> >>> after starting the test.
> >>
> >> Hey Potnuri,
> >>
> >> Can you also attach the target-side logs? It seems like an I/O times out
> >> for no apparent reason...
> > 
> > I see nothing much logged on the target.
> 
> Hey Potnuri,
> 
> This issue is consistent with what Bharat reported.
> 
> Can you please check if the below solves the issue?
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1ba659927442..9193b05d7bda 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1122,6 +1122,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
>                                  pending = true;
>                          else if (unlikely(result < 0))
>                                  break;
> +               } else {
> +                       /*
> +                        * submission path is sending, we need to
> +                        * continue or resched because the submission
> +                        * path direct send is not concerned with
> +                        * rescheduling...
> +                        */
> +                       pending = true;
>                  }
> 
>                  result = nvme_tcp_try_recv(queue);
> --

Hi Sagi,
With the above patch I still see the issue, but less frequently. Without the patch I was
able to consistently reproduce the timeouts with 4 target devices. With the patch I
see IO running fine for 4 targets. I tried the same test with 8 target devices
and I see the below timeout. I've observed only one instance of the timeout, so
I'll let it run for some more time or rerun and update.

Target dmesg:
---
[ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
[ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
^^^^^^^^^^^^^^^
---

Initiator dmesg:
---
[ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
[ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
[ 2111.991835] nvme nvme5: starting error recovery
[ 2111.998796] block nvme5n1: no usable path - requeuing I/O
[ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
[ 2122.253431] block nvme5n1: no usable path - requeuing I/O
[ 2122.254732] nvme nvme5: creating 16 I/O queues.
[ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
[ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
---


Thanks.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-11  7:26       ` Potnuri Bharat Teja
@ 2020-12-14 17:51         ` Sagi Grimberg
  2020-12-14 18:05           ` Potnuri Bharat Teja
  2020-12-15  1:53           ` Sagi Grimberg
  0 siblings, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-14 17:51 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: Samuel Jones, hch, linux-nvme


> Hi Sagi,
> With the above patch I still see the issue, but less frequently. Without the patch I was
> able to consistently reproduce the timeouts with 4 target devices. With the patch I
> see IO running fine for 4 targets. I tried the same test with 8 target devices
> and I see the below timeout. I've observed only one instance of the timeout, so
> I'll let it run for some more time or rerun and update.

Hey Potnuri,

Have you observed this further?

I'd think that if the io_work reschedules itself when it races
with the direct send path this should not happen, but we may be
seeing a different race going on here, adding Samuel who saw
a similar phenomenon.

> 
> Target dmesg:
> ---
> [ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> [ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> ^^^^^^^^^^^^^^^
> ---
> 
> Initiator dmesg:
> ---
> [ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
> [ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
> [ 2111.991835] nvme nvme5: starting error recovery
> [ 2111.998796] block nvme5n1: no usable path - requeuing I/O
> [ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
> [ 2122.253431] block nvme5n1: no usable path - requeuing I/O
> [ 2122.254732] nvme nvme5: creating 16 I/O queues.
> [ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
> [ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
> ---
> 
> 
> Thanks.
> 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-14 17:51         ` Sagi Grimberg
@ 2020-12-14 18:05           ` Potnuri Bharat Teja
  2020-12-14 19:27             ` Samuel Jones
  2020-12-15  1:53           ` Sagi Grimberg
  1 sibling, 1 reply; 15+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-14 18:05 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Samuel Jones, hch, linux-nvme

On Monday, December 12/14/20, 2020 at 09:51:28 -0800, Sagi Grimberg wrote:
> 
> > Hi Sagi,
> > With the above patch I still see the issue, but less frequently. Without the patch I was
> > able to consistently reproduce the timeouts with 4 target devices. With the patch I
> > see IO running fine for 4 targets. I tried the same test with 8 target devices
> > and I see the below timeout. I've observed only one instance of the timeout, so
> > I'll let it run for some more time or rerun and update.
> 
> Hey Potnuri,
> 
> Have you observed this further?
> 
> I'd think that if the io_work reschedules itself when it races
> with the direct send path this should not happen, but we may be
> seeing a different race going on here, adding Samuel who saw
> a similar phenomenon.
> 

Yes, I have seen it further too but very intermittently. Below is the dmesg that
I saved before my test got stopped for some other reason. From the timestamps, 
timeouts were around ~20min, ~45min and ~3 hours apart.

dmesg on initiator:
---
[ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
[ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
[ 2111.991835] nvme nvme5: starting error recovery
[ 2111.998796] block nvme5n1: no usable path - requeuing I/O
[ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
[ 2122.253431] block nvme5n1: no usable path - requeuing I/O
[ 2122.254732] nvme nvme5: creating 16 I/O queues.
[ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
[ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
[ 2522.795467] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 3043.544758] perf: interrupt took too long (3135 > 3128), lowering kernel.perf_event_max_sample_rate to 63000
[ 4197.260041] perf: interrupt took too long (3922 > 3918), lowering kernel.perf_event_max_sample_rate to 50000
[ 4848.175367] nvme nvme1: queue 6: timeout request 0x42 type 4
[ 4848.176086] nvme nvme1: starting error recovery
[ 4848.176864] nvme nvme1: queue 6: timeout request 0x43 type 4
[ 4848.178194] nvme nvme1: queue 6: timeout request 0x44 type 4
[ 4848.179873] block nvme1n1: no usable path - requeuing I/O
[ 4848.180665] block nvme1n1: no usable path - requeuing I/O
[ 4848.180775] nvme nvme1: Reconnecting in 10 seconds...
[ 4848.181372] block nvme1n1: no usable path - requeuing I/O
[ 4848.183022] block nvme1n1: no usable path - requeuing I/O
[ 4848.183782] block nvme1n1: no usable path - requeuing I/O
[ 4858.436889] block nvme1n1: no usable path - requeuing I/O
[ 4858.437558] block nvme1n1: no usable path - requeuing I/O
[ 4858.438168] block nvme1n1: no usable path - requeuing I/O
[ 4858.439547] nvme nvme1: creating 16 I/O queues.
[ 4858.489650] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[ 4858.490357] debugfs: Directory 'hctx0' with parent '/' already present!
[ 4858.491256] debugfs: Directory 'hctx1' with parent '/' already present!
[ 4858.492790] debugfs: Directory 'hctx2' with parent '/' already present!
[ 4858.493601] debugfs: Directory 'hctx3' with parent '/' already present!
[ 4858.494362] debugfs: Directory 'hctx4' with parent '/' already present!
[ 4858.495310] debugfs: Directory 'hctx5' with parent '/' already present!
[ 4858.496261] debugfs: Directory 'hctx6' with parent '/' already present!
[ 4858.497098] debugfs: Directory 'hctx7' with parent '/' already present!
[ 4858.497972] debugfs: Directory 'hctx8' with parent '/' already present!
[ 4858.498806] debugfs: Directory 'hctx9' with parent '/' already present!
[ 4858.499574] debugfs: Directory 'hctx10' with parent '/' already present!
[ 4858.500268] debugfs: Directory 'hctx11' with parent '/' already present!
[ 4858.501030] debugfs: Directory 'hctx12' with parent '/' already present!
[ 4858.501807] debugfs: Directory 'hctx13' with parent '/' already present!
[ 4858.502459] debugfs: Directory 'hctx14' with parent '/' already present!
[ 4858.503196] debugfs: Directory 'hctx15' with parent '/' already present!
[ 4858.510075] nvme nvme1: Successfully reconnected (1 attempt)
[12905.117166] perf: interrupt took too long (4944 > 4902), lowering kernel.perf_event_max_sample_rate to 40000
[15436.031342] nvme nvme0: queue 16: timeout request 0x22 type 4
[15436.031920] nvme nvme0: starting error recovery
[15436.033746] block nvme0n1: no usable path - requeuing I/O
[15436.035221] nvme nvme0: Reconnecting in 10 seconds...
[15446.292544] block nvme0n1: no usable path - requeuing I/O
[15446.293541] nvme nvme0: creating 16 I/O queues.
[15446.339595] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[15446.340846] debugfs: Directory 'hctx0' with parent '/' already present!
[15446.342664] debugfs: Directory 'hctx1' with parent '/' already present!
[15446.343413] debugfs: Directory 'hctx2' with parent '/' already present!
[15446.344228] debugfs: Directory 'hctx3' with parent '/' already present!
[15446.345155] debugfs: Directory 'hctx4' with parent '/' already present!
[15446.345948] debugfs: Directory 'hctx5' with parent '/' already present!
[15446.346767] debugfs: Directory 'hctx6' with parent '/' already present!
[15446.347368] debugfs: Directory 'hctx7' with parent '/' already present!
[15446.348129] debugfs: Directory 'hctx8' with parent '/' already present!
[15446.348948] debugfs: Directory 'hctx9' with parent '/' already present!
[15446.349538] debugfs: Directory 'hctx10' with parent '/' already present!
[15446.350185] debugfs: Directory 'hctx11' with parent '/' already present!
[15446.350757] debugfs: Directory 'hctx12' with parent '/' already present!
[15446.351539] debugfs: Directory 'hctx13' with parent '/' already present!
[15446.352200] debugfs: Directory 'hctx14' with parent '/' already present!
[15446.352722] debugfs: Directory 'hctx15' with parent '/' already present!
[15446.358750] nvme nvme0: Successfully reconnected (1 attempt)

---


> > 
> > Target dmesg:
> > ---
> > [ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > ^^^^^^^^^^^^^^^
> > ---
> > 
> > Initiator dmesg:
> > ---
> > [ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
> > [ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
> > [ 2111.991835] nvme nvme5: starting error recovery
> > [ 2111.998796] block nvme5n1: no usable path - requeuing I/O
> > [ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
> > [ 2122.253431] block nvme5n1: no usable path - requeuing I/O
> > [ 2122.254732] nvme nvme5: creating 16 I/O queues.
> > [ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
> > [ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
> > ---
> > 
> > 
> > Thanks.
> > 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-14 18:05           ` Potnuri Bharat Teja
@ 2020-12-14 19:27             ` Samuel Jones
  2020-12-14 20:13               ` Sagi Grimberg
  0 siblings, 1 reply; 15+ messages in thread
From: Samuel Jones @ 2020-12-14 19:27 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Potnuri Bharat Teja, hch, linux-nvme

Hi Sagi,

Without instrumenting the driver it's hard to say what might be happening here. But I did make a few comments at the end of my initial email which might be relevant:

1. It seems abnormal to me that the direct send path does not check the return value of nvme_tcp_try_send(), given that this routine at least claims to be able to fail transiently. nvme_tcp_queue_request should reschedule the workqueue if nvme_tcp_try_send() does not return 1, IMHO.

2. If, for whatever reason, kernel_sendpage or sock_no_sendpage returns -EAGAIN, the nvme_tcp_try_send() returns 0. This will be interpreted by nvme_tcp_io_work() as meaning there is nothing more to do. This is wrong, because in fact there is more work to do and nvme_tcp_io_work() should reschedule itself in this case. Unfortunately, as the system is coded, nvme_tcp_io_work() has no way of distinguishing between "nvme_tcp_try_send returned 0 because there is nothing to do" and "nvme_tcp_try_send returned 0 because of a transient failure".

Not sure how possible these cases are in practice but theoretically they can occur..
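
For reference, a condensed sketch of the behavior described in (2), paraphrased from the 5.10-era driver rather than quoted verbatim: nvme_tcp_try_send() folds a transient -EAGAIN into a 0 return, so its callers cannot tell "socket buffer full" apart from "nothing left to send":
--
/*
 * Paraphrased sketch, not the verbatim driver code: only the tail of
 * nvme_tcp_try_send() that maps -EAGAIN to 0 is shown; the actual PDU
 * sending logic is elided.
 */
static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
{
	int ret = 1;

	/* ... send pending cmd/data/ddgst PDUs via kernel_sendpage() ... */

	if (ret == -EAGAIN)
		ret = 0;	/* a full socket buffer now looks like "no work" */
	return ret;
}
--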

Regards
Samuel

----- Original Message -----
From: "Potnuri Bharat Teja" <bharat@chelsio.com>
To: "Sagi Grimberg" <sagi@grimberg.me>
Cc: hch@lst.de, "linux-nvme" <linux-nvme@lists.infradead.org>, "Samuel Jones" <sjones@kalrayinc.com>
Sent: Monday, 14 December, 2020 19:05:31
Subject: Re: Request timeout seen with NVMEoF TCP

On Monday, December 12/14/20, 2020 at 09:51:28 -0800, Sagi Grimberg wrote:
> 
> > Hi Sagi,
> > With the above patch I still see the issue, but less frequently. Without the patch I was
> > able to consistently reproduce the timeouts with 4 target devices. With the patch I
> > see IO running fine for 4 targets. I tried the same test with 8 target devices
> > and I see the below timeout. I've observed only one instance of the timeout, so
> > I'll let it run for some more time or rerun and update.
> 
> Hey Potnuri,
> 
> Have you observed this further?
> 
> I'd think that if the io_work reschedules itself when it races
> with the direct send path this should not happen, but we may be
> seeing a different race going on here, adding Samuel who saw
> a similar phenomenon.
> 

Yes, I have seen it further too but very intermittently. Below is the dmesg that
I saved before my test got stopped for some other reason. From the timestamps, 
timeouts were around ~20min, ~45min and ~3 hours apart.

dmesg on initiator:
---
[ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
[ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
[ 2111.991835] nvme nvme5: starting error recovery
[ 2111.998796] block nvme5n1: no usable path - requeuing I/O
[ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
[ 2122.253431] block nvme5n1: no usable path - requeuing I/O
[ 2122.254732] nvme nvme5: creating 16 I/O queues.
[ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
[ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
[ 2522.795467] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 3043.544758] perf: interrupt took too long (3135 > 3128), lowering kernel.perf_event_max_sample_rate to 63000
[ 4197.260041] perf: interrupt took too long (3922 > 3918), lowering kernel.perf_event_max_sample_rate to 50000
[ 4848.175367] nvme nvme1: queue 6: timeout request 0x42 type 4
[ 4848.176086] nvme nvme1: starting error recovery
[ 4848.176864] nvme nvme1: queue 6: timeout request 0x43 type 4
[ 4848.178194] nvme nvme1: queue 6: timeout request 0x44 type 4
[ 4848.179873] block nvme1n1: no usable path - requeuing I/O
[ 4848.180665] block nvme1n1: no usable path - requeuing I/O
[ 4848.180775] nvme nvme1: Reconnecting in 10 seconds...
[ 4848.181372] block nvme1n1: no usable path - requeuing I/O
[ 4848.183022] block nvme1n1: no usable path - requeuing I/O
[ 4848.183782] block nvme1n1: no usable path - requeuing I/O
[ 4858.436889] block nvme1n1: no usable path - requeuing I/O
[ 4858.437558] block nvme1n1: no usable path - requeuing I/O
[ 4858.438168] block nvme1n1: no usable path - requeuing I/O
[ 4858.439547] nvme nvme1: creating 16 I/O queues.
[ 4858.489650] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[ 4858.490357] debugfs: Directory 'hctx0' with parent '/' already present!
[ 4858.491256] debugfs: Directory 'hctx1' with parent '/' already present!
[ 4858.492790] debugfs: Directory 'hctx2' with parent '/' already present!
[ 4858.493601] debugfs: Directory 'hctx3' with parent '/' already present!
[ 4858.494362] debugfs: Directory 'hctx4' with parent '/' already present!
[ 4858.495310] debugfs: Directory 'hctx5' with parent '/' already present!
[ 4858.496261] debugfs: Directory 'hctx6' with parent '/' already present!
[ 4858.497098] debugfs: Directory 'hctx7' with parent '/' already present!
[ 4858.497972] debugfs: Directory 'hctx8' with parent '/' already present!
[ 4858.498806] debugfs: Directory 'hctx9' with parent '/' already present!
[ 4858.499574] debugfs: Directory 'hctx10' with parent '/' already present!
[ 4858.500268] debugfs: Directory 'hctx11' with parent '/' already present!
[ 4858.501030] debugfs: Directory 'hctx12' with parent '/' already present!
[ 4858.501807] debugfs: Directory 'hctx13' with parent '/' already present!
[ 4858.502459] debugfs: Directory 'hctx14' with parent '/' already present!
[ 4858.503196] debugfs: Directory 'hctx15' with parent '/' already present!
[ 4858.510075] nvme nvme1: Successfully reconnected (1 attempt)
[12905.117166] perf: interrupt took too long (4944 > 4902), lowering kernel.perf_event_max_sample_rate to 40000
[15436.031342] nvme nvme0: queue 16: timeout request 0x22 type 4
[15436.031920] nvme nvme0: starting error recovery
[15436.033746] block nvme0n1: no usable path - requeuing I/O
[15436.035221] nvme nvme0: Reconnecting in 10 seconds...
[15446.292544] block nvme0n1: no usable path - requeuing I/O
[15446.293541] nvme nvme0: creating 16 I/O queues.
[15446.339595] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[15446.340846] debugfs: Directory 'hctx0' with parent '/' already present!
[15446.342664] debugfs: Directory 'hctx1' with parent '/' already present!
[15446.343413] debugfs: Directory 'hctx2' with parent '/' already present!
[15446.344228] debugfs: Directory 'hctx3' with parent '/' already present!
[15446.345155] debugfs: Directory 'hctx4' with parent '/' already present!
[15446.345948] debugfs: Directory 'hctx5' with parent '/' already present!
[15446.346767] debugfs: Directory 'hctx6' with parent '/' already present!
[15446.347368] debugfs: Directory 'hctx7' with parent '/' already present!
[15446.348129] debugfs: Directory 'hctx8' with parent '/' already present!
[15446.348948] debugfs: Directory 'hctx9' with parent '/' already present!
[15446.349538] debugfs: Directory 'hctx10' with parent '/' already present!
[15446.350185] debugfs: Directory 'hctx11' with parent '/' already present!
[15446.350757] debugfs: Directory 'hctx12' with parent '/' already present!
[15446.351539] debugfs: Directory 'hctx13' with parent '/' already present!
[15446.352200] debugfs: Directory 'hctx14' with parent '/' already present!
[15446.352722] debugfs: Directory 'hctx15' with parent '/' already present!
[15446.358750] nvme nvme0: Successfully reconnected (1 attempt)

---


> > 
> > Target dmesg:
> > ---
> > [ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > ^^^^^^^^^^^^^^^
> > ---
> > 
> > Initiator dmesg:
> > ---
> > [ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
> > [ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
> > [ 2111.991835] nvme nvme5: starting error recovery
> > [ 2111.998796] block nvme5n1: no usable path - requeuing I/O
> > [ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
> > [ 2122.253431] block nvme5n1: no usable path - requeuing I/O
> > [ 2122.254732] nvme nvme5: creating 16 I/O queues.
> > [ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
> > [ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
> > ---
> > 
> > 
> > Thanks.
> >


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-14 19:27             ` Samuel Jones
@ 2020-12-14 20:13               ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-14 20:13 UTC (permalink / raw)
  To: Samuel Jones; +Cc: Potnuri Bharat Teja, hch, linux-nvme


> Hi Sagi,

Hey Sam,

> Without instrumenting the driver it's hard to say what might be happening here.

Yes, that would be the next step.

> But I did make a few comments at the end of my initial email which might be relevant:
> 
> 1. It seems abnormal to me that the direct send path does not check the return value of nvme_tcp_try_send(), given that this routine at least claims to be able to fail transiently. nvme_tcp_queue_request should reschedule the workqueue if nvme_tcp_try_send() does not return 1, IMHO.
> 
> 2. If, for whatever reason, kernel_sendpage or sock_no_sendpage returns -EAGAIN, the nvme_tcp_try_send() returns 0. This will be interpreted by nvme_tcp_io_work() as meaning there is nothing more to do. This is wrong, because in fact there is more work to do and nvme_tcp_io_work() should reschedule itself in this case. Unfortunately, as the system is coded, nvme_tcp_io_work() has no way of distinguishing between "nvme_tcp_try_send returned 0 because there is nothing to do" and "nvme_tcp_try_send returned 0 because of a transient failure".
> 
> Not sure how possible these cases are in practice but theoretically they can occur..

The design is that if sendpage fails with a transient error (e.g.
EAGAIN), the socket buffer is full and we're guaranteed to be called
with nvme_tcp_write_space when the socket makes additional room. When
write_space is called, we schedule io_work to resume its work; hence
the direct send path does not reschedule the io_work in this case.
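
Roughly, the write_space hook does the following (a paraphrased sketch, not the exact driver code): once the socket has room again it re-arms io_work on the queue's CPU:
--
static void nvme_tcp_write_space(struct sock *sk)
{
	struct nvme_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (likely(queue && sk_stream_is_writeable(sk))) {
		clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
		/* socket has room again: resume sending from io_work */
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
	read_unlock_bh(&sk->sk_callback_lock);
}
--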

It may be possible that write_space is called when the socket buffer has
some more room but it immediately gets filled by the direct send path;
even then the io_work should be triggered, and with the proposed fix it
should reschedule itself if it wasn't able to acquire the send_mutex.

But apparently there is a different race here that we're not seeing
at the moment...

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-14 17:51         ` Sagi Grimberg
  2020-12-14 18:05           ` Potnuri Bharat Teja
@ 2020-12-15  1:53           ` Sagi Grimberg
  2020-12-15 18:30             ` Wunderlich, Mark
  2020-12-16  5:51             ` Potnuri Bharat Teja
  1 sibling, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-15  1:53 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: Samuel Jones, hch, linux-nvme


> Hey Potnuri,
> 
> Have you observed this further?
> 
> I'd think that if the io_work reschedules itself when it races
> with the direct send path this should not happen, but we may be
> seeing a different race going on here, adding Samuel who saw
> a similar phenomenon.

I think we still have a race here with the following:
1. queue_rq sends h2cdata PDU (no data)
2. host receives r2t - prepares data PDU to send and schedules io_work
3. queue_rq sends another h2cdata PDU - ends up sending (2) because it 
was queued before it
4. io_work starts, loops but is never able to acquire the send_mutex -
eventually just ends (doesn't requeue)
5. (3) completes, now nothing will send (2)

We could schedule the io_work from the direct send path, but that
is less efficient than just trying to drain the send queue in the
direct send path; if not all of it was sent, the write_space callback
will trigger it.

Potnuri, does this patch solve what you are seeing?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1ba659927442..1b4e25624ba4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
         }
  }

+static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
+{
+       int ret;
+
+       /* drain the send queue as much as we can... */
+       do {
+               ret = nvme_tcp_try_send(queue);
+       } while (ret > 0);
+}
+
  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
                 bool sync, bool last)
  {
@@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
         if (queue->io_cpu == smp_processor_id() &&
             sync && empty && mutex_trylock(&queue->send_mutex)) {
                 queue->more_requests = !last;
-               nvme_tcp_try_send(queue);
+               nvme_tcp_send_all(queue);
                 queue->more_requests = false;
                 mutex_unlock(&queue->send_mutex);
         } else if (last) {
@@ -1122,6 +1132,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
                                 pending = true;
                         else if (unlikely(result < 0))
                                 break;
+               } else {
+                       /*
+                        * submission path is sending, we need to
+                        * continue or resched because the submission
+                        * path direct send is not concerned with
+                        * rescheduling...
+                        */
+                       pending = true;
                 }

                 result = nvme_tcp_try_recv(queue);
--

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* RE: Request timeout seen with NVMEoF TCP
  2020-12-15  1:53           ` Sagi Grimberg
@ 2020-12-15 18:30             ` Wunderlich, Mark
  2020-12-19  1:04               ` Sagi Grimberg
  2020-12-16  5:51             ` Potnuri Bharat Teja
  1 sibling, 1 reply; 15+ messages in thread
From: Wunderlich, Mark @ 2020-12-15 18:30 UTC (permalink / raw)
  To: Sagi Grimberg, Potnuri Bharat Teja; +Cc: Samuel Jones, hch, linux-nvme

> I think we still have a race here with the following:
> 1. queue_rq sends h2cdata PDU (no data)
> 2. host receives r2t - prepares data PDU to send and schedules io_work
> 3. queue_rq sends another h2cdata PDU - ends up sending (2) because it was queued before it
> 4. io_work starts, loops but is never able to acquire the send_mutex - eventually just ends (doesn't requeue)
> 5. (3) completes, now nothing will send (2)

> We could schedule the io_work from the direct send path, but that is less efficient than just trying to drain the send queue in the direct send path; if not all of it was sent, the write_space callback will trigger it.

Wouldn't the addition of the change to io_work() itself result in step (4) above never occurring?  Pending always being set if the mutex cannot be acquired, and if io_work() then exceeds its time period it always re-queues itself.  So io_work() should always end up draining any send list eventually.  Unless io_work() exits early for some reason via the try_recv (ret < 0) return point without requeuing.

Or, can there be a case where there are sends through queue_rq where 'last' is false (and the inline send conditions fail), but there is no subsequent send with 'last' true to schedule io_work()?  Might try changing queue_request() to always queue io_work() if the inline send fails (not looking at last), and add the second part of your patch for io_work() to set pending to true.  Also trap if the ret<0 case above ever happens (or do a break instead of a return).  Re-run and see if the failure still occurs.  Just a thought.
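
For reference, the io_work loop being discussed looks roughly like the following (paraphrased from the 5.10-era driver plus the proposed else branch; not the verbatim code). The two early returns are the paths where it ends without requeuing itself:
--
static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;
		int result;

		if (mutex_trylock(&queue->send_mutex)) {
			result = nvme_tcp_try_send(queue);
			mutex_unlock(&queue->send_mutex);
			if (result > 0)
				pending = true;
			else if (unlikely(result < 0))
				break;
		} else {
			/* proposed: submission path holds the mutex, keep going */
			pending = true;
		}

		result = nvme_tcp_try_recv(queue);
		if (result > 0)
			pending = true;
		else if (unlikely(result < 0))
			return;		/* early exit, no requeue */

		if (!pending)
			return;		/* nothing left to do, no requeue */
	} while (!time_after(jiffies, deadline));

	/* time budget exceeded: requeue ourselves */
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}
--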

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-15  1:53           ` Sagi Grimberg
  2020-12-15 18:30             ` Wunderlich, Mark
@ 2020-12-16  5:51             ` Potnuri Bharat Teja
  2020-12-19  1:35               ` Sagi Grimberg
  1 sibling, 1 reply; 15+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-16  5:51 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Samuel Jones, hch, linux-nvme

On Monday, December 12/14/20, 2020 at 17:53:44 -0800, Sagi Grimberg wrote:
> 
> > Hey Potnuri,
> > 
> > Have you observed this further?
> > 
> > I'd think that if the io_work reschedules itself when it races
> > with the direct send path this should not happen, but we may be
> > seeing a different race going on here, adding Samuel who saw
> > a similar phenomenon.
> 
> I think we still have a race here with the following:
> 1. queue_rq sends h2cdata PDU (no data)
> 2. host receives r2t - prepares data PDU to send and schedules io_work
> 3. queue_rq sends another h2cdata PDU - ends up sending (2) because it was
> queued before it
> 4. io_work starts, loops but is never able to acquire the send_mutex -
> eventually just ends (doesn't requeue)
> 5. (3) completes, now nothing will send (2)
> 
> We could schedule the io_work from the direct send path, but that
> is less efficient than just trying to drain the send queue in the
> direct send path; if not all of it was sent, the write_space callback
> will trigger it.
> 
> Potnuri, does this patch solve what you are seeing?

Hi Sagi,
Below patch works fine. I have it running all night with out any issues.
Thanks.

> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1ba659927442..1b4e25624ba4 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
>         }
>  }
> 
> +static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
> +{
> +       int ret;
> +
> +       /* drain the send queue as much as we can... */
> +       do {
> +               ret = nvme_tcp_try_send(queue);
> +       } while (ret > 0);
> +}
> +
>  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>                 bool sync, bool last)
>  {
> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>         if (queue->io_cpu == smp_processor_id() &&
>             sync && empty && mutex_trylock(&queue->send_mutex)) {
>                 queue->more_requests = !last;
> -               nvme_tcp_try_send(queue);
> +               nvme_tcp_send_all(queue);
>                 queue->more_requests = false;
>                 mutex_unlock(&queue->send_mutex);
>         } else if (last) {
> @@ -1122,6 +1132,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
>                                 pending = true;
>                         else if (unlikely(result < 0))
>                                 break;
> +               } else {
> +                       /*
> +                        * submission path is sending, we need to
> +                        * continue or resched because the submission
> +                        * path direct send is not concerned with
> +                        * rescheduling...
> +                        */
> +                       pending = true;
>                 }
> 
>                 result = nvme_tcp_try_recv(queue);
> --

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-15 18:30             ` Wunderlich, Mark
@ 2020-12-19  1:04               ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-19  1:04 UTC (permalink / raw)
  To: Wunderlich, Mark, Potnuri Bharat Teja; +Cc: Samuel Jones, hch, linux-nvme


> Wouldn't the addition of the change to io_work() itself result in step (4) above never occurring?  Pending always being set if the mutex cannot be acquired, and if io_work() then exceeds its time period it always re-queues itself.  So io_work() should always end up draining any send list eventually.  Unless io_work() exits early for some reason via the try_recv (ret < 0) return point without requeuing.

nvme_tcp_try_recv returns what it consumed, so it can return 0 if it did 
not have anything to recv in the socket, hence it may not reschedule itself.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-16  5:51             ` Potnuri Bharat Teja
@ 2020-12-19  1:35               ` Sagi Grimberg
  2020-12-21  7:50                 ` Potnuri Bharat Teja
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2020-12-19  1:35 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: Samuel Jones, hch, linux-nvme


>> Potnuri, does this patch solve what you are seeing?
> 
> Hi Sagi,
> Below patch works fine. I have it running all night with out any issues.
> Thanks.

Thanks Potnuri, can you try the patch without the nvme_tcp_io_work
portion? I think the issue can be resolved with just adding
nvme_tcp_send_all..

Thinking more about what Mark mentioned and it is correct
that io_work needs to reschedule itself if it wasn't able
to acquire the send_mutex hence I am not sure it is needed.

Thanks.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: Request timeout seen with NVMEoF TCP
  2020-12-19  1:35               ` Sagi Grimberg
@ 2020-12-21  7:50                 ` Potnuri Bharat Teja
  0 siblings, 0 replies; 15+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-21  7:50 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Samuel Jones, hch, linux-nvme

On Friday, December 12/18/20, 2020 at 17:35:01 -0800, Sagi Grimberg wrote:
> 
> > > Potnuri, does this patch solve what you are seeing?
> > 
> > Hi Sagi,
> > Below patch works fine. I have it running all night with out any issues.
> > Thanks.
> 
> Thanks Potnuri, can you try the patch without the nvme_tcp_io_work
> portion? I think the issue can be resolved with just adding
> nvme_tcp_send_all..
> 
Yes, the issue is resolved with just the nvme_tcp_send_all() change;
the test runs fine for me.
Thanks.

> Thinking more about what Mark mentioned and it is correct
> that io_work needs to reschedule itself if it wasn't able
> to acquire the send_mutex hence I am not sure it is needed.
> 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme



Thread overview: 15+ messages
2020-12-09 17:19 Request timeout seen with NVMEoF TCP Potnuri Bharat Teja
2020-12-09 18:02 ` Sagi Grimberg
2020-12-10 17:06   ` Potnuri Bharat Teja
2020-12-10 21:47     ` Sagi Grimberg
2020-12-11  7:26       ` Potnuri Bharat Teja
2020-12-14 17:51         ` Sagi Grimberg
2020-12-14 18:05           ` Potnuri Bharat Teja
2020-12-14 19:27             ` Samuel Jones
2020-12-14 20:13               ` Sagi Grimberg
2020-12-15  1:53           ` Sagi Grimberg
2020-12-15 18:30             ` Wunderlich, Mark
2020-12-19  1:04               ` Sagi Grimberg
2020-12-16  5:51             ` Potnuri Bharat Teja
2020-12-19  1:35               ` Sagi Grimberg
2020-12-21  7:50                 ` Potnuri Bharat Teja
