* [bug report] NVMe/IB: reset_controller need more than 1min
@ 2021-05-21 16:12 ` Yi Zhang
  From: Yi Zhang @ 2021-05-21 16:12 UTC
  To: linux-nvme, linux-rdma; +Cc: Sagi Grimberg, maxg

Hi
I found this issue on 5.13-rc2 with NVMe/IB environment, could anyone
help check it?
Thanks.

$ time echo 1 >/sys/block/nvme0n1/device/reset_controller
real    0m10.678s
user    0m0.000s
sys     0m0.000s
$ time echo 1 >/sys/block/nvme0n1/device/reset_controller
real    1m11.530s
user    0m0.000s
sys     0m0.000s

target:
$ dmesg | grep nvme
[  276.891454] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
[  287.374412] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[  287.399317] nvmet: ctrl 1 fatal error occurred!
[  348.412672] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.

client:
$ dmesg | grep nvme
[  281.704475] nvme nvme0: creating 40 I/O queues.
[  285.557759] nvme nvme0: mapped 40/0/0 default/read/poll queues.
[  353.187809] nvme nvme0: I/O 8 QID 0 timeout
[  353.193100] nvme nvme0: Property Set error: 881, offset 0x14
[  353.226082] nvme nvme0: creating 40 I/O queues.
[  357.088266] nvme nvme0: mapped 40/0/0 default/read/poll queues.

--
Best Regards,
  Yi Zhang
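The timestamps in the two dmesg excerpts above line up in a telling way: the target's keep-alive expires a few seconds into the second reset, and the target only sees a new connection about a minute later, once the host's pending admin command (the Property Set at offset 0x14) times out. A quick back-of-the-envelope check on the quoted numbers (an illustrative Python snippet, not part of the original report; note the two hosts have independent clocks, so only same-host deltas are compared):

```python
# Timestamps (seconds) copied from the target dmesg excerpt above.
target_ka_expired = 287.374412  # "nvmet: ctrl 1 keep-alive timer (5 seconds) expired!"
target_reconnect = 348.412672   # second "nvmet: creating controller 1 ..."

# After the target tears the controller down, nothing happens until the
# host gives up on its in-flight admin command roughly a minute later.
idle_gap = target_reconnect - target_ka_expired
print(f"target idle between teardown and reconnect: {idle_gap:.1f}s")  # prints 61.0s
```

That ~61 s of dead time between the keep-alive-driven teardown and the reconnect accounts for most of the 1m11s reset.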
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2021-05-21 18:00 ` Sagi Grimberg
  From: Sagi Grimberg @ 2021-05-21 18:00 UTC
  To: Yi Zhang, linux-nvme, linux-rdma; +Cc: maxg

> Hi
> I found this issue on 5.13-rc2 with NVMe/IB environment, could anyone
> help check it?
> Thanks.
>
> $ time echo 1 >/sys/block/nvme0n1/device/reset_controller
> real    0m10.678s
> user    0m0.000s
> sys     0m0.000s
> $ time echo 1 >/sys/block/nvme0n1/device/reset_controller
> real    1m11.530s
> user    0m0.000s
> sys     0m0.000s
>
> target:
> $ dmesg | grep nvme
> [  276.891454] nvmet: creating controller 1 for subsystem testnqn for
> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> [  287.374412] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  287.399317] nvmet: ctrl 1 fatal error occurred!
> [  348.412672] nvmet: creating controller 1 for subsystem testnqn for
> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
>
> client:
> $ dmesg | grep nvme
> [  281.704475] nvme nvme0: creating 40 I/O queues.
> [  285.557759] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> [  353.187809] nvme nvme0: I/O 8 QID 0 timeout
> [  353.193100] nvme nvme0: Property Set error: 881, offset 0x14
> [  353.226082] nvme nvme0: creating 40 I/O queues.
> [  357.088266] nvme nvme0: mapped 40/0/0 default/read/poll queues.

It appears that there is an admin timeout that is either triggered
by the reset or unrelated.

Can you run nvme reset /dev/nvme0 instead so we can see the "resetting
controller" print?
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2021-05-22  4:27 ` Yi Zhang
  From: Yi Zhang @ 2021-05-22  4:27 UTC
  To: Sagi Grimberg; +Cc: linux-nvme, linux-rdma

On Sat, May 22, 2021 at 2:00 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> > Hi
> > I found this issue on 5.13-rc2 with NVMe/IB environment, could anyone
> > help check it?
> > Thanks.
> >
> > $ time echo 1 >/sys/block/nvme0n1/device/reset_controller
> > real    0m10.678s
> > user    0m0.000s
> > sys     0m0.000s
> > $ time echo 1 >/sys/block/nvme0n1/device/reset_controller
> > real    1m11.530s
> > user    0m0.000s
> > sys     0m0.000s
> >
> > target:
> > $ dmesg | grep nvme
> > [  276.891454] nvmet: creating controller 1 for subsystem testnqn for
> > NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> > [  287.374412] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> > [  287.399317] nvmet: ctrl 1 fatal error occurred!
> > [  348.412672] nvmet: creating controller 1 for subsystem testnqn for
> > NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> >
> > client:
> > $ dmesg | grep nvme
> > [  281.704475] nvme nvme0: creating 40 I/O queues.
> > [  285.557759] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> > [  353.187809] nvme nvme0: I/O 8 QID 0 timeout
> > [  353.193100] nvme nvme0: Property Set error: 881, offset 0x14
> > [  353.226082] nvme nvme0: creating 40 I/O queues.
> > [  357.088266] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>
> It appears that there is an admin timeout that is either triggered
> by the reset or unrelated.
>
> Can you run nvme reset /dev/nvme0 instead so we can see the "resetting
> controller" print?
>
Yes, here is the log:
------------------------0
+ nvme reset /dev/nvme0
real    0m10.737s
user    0m0.004s
sys     0m0.004s
------------------------1
+ nvme reset /dev/nvme0
real    1m11.335s
user    0m0.002s
sys     0m0.005s

target:
[  934.306016] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
[  944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
[  944.900051] nvmet: ctrl 1 fatal error occurred!
[ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.

client:
[  857.264029] nvme nvme0: resetting controller
[  864.115369] nvme nvme0: creating 40 I/O queues.
[  867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
[  868.001673] nvme nvme0: resetting controller
[  935.396789] nvme nvme0: I/O 9 QID 0 timeout
[  935.402036] nvme nvme0: Property Set error: 881, offset 0x14
[  935.438080] nvme nvme0: creating 40 I/O queues.
[  939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.

--
Best Regards,
Yi Zhang
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2021-06-23 10:01 ` Yi Zhang
  From: Yi Zhang @ 2021-06-23 10:01 UTC
  To: Sagi Grimberg; +Cc: linux-nvme, linux-rdma

Hello

Gentle ping here, this issue still exists on latest 5.13-rc7

# time nvme reset /dev/nvme0

real    0m12.636s
user    0m0.002s
sys     0m0.005s
# time nvme reset /dev/nvme0

real    0m12.641s
user    0m0.000s
sys     0m0.007s
# time nvme reset /dev/nvme0

real    1m16.133s
user    0m0.000s
sys     0m0.007s

On Sat, May 22, 2021 at 12:27 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> On Sat, May 22, 2021 at 2:00 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> > > Hi
> > > I found this issue on 5.13-rc2 with NVMe/IB environment, could anyone
> > > help check it?
> > > Thanks.
> > >
> > > $ time echo 1 >/sys/block/nvme0n1/device/reset_controller
> > > real    0m10.678s
> > > user    0m0.000s
> > > sys     0m0.000s
> > > $ time echo 1 >/sys/block/nvme0n1/device/reset_controller
> > > real    1m11.530s
> > > user    0m0.000s
> > > sys     0m0.000s
> > >
> > > target:
> > > $ dmesg | grep nvme
> > > [  276.891454] nvmet: creating controller 1 for subsystem testnqn for
> > > NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> > > [  287.374412] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> > > [  287.399317] nvmet: ctrl 1 fatal error occurred!
> > > [  348.412672] nvmet: creating controller 1 for subsystem testnqn for
> > > NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> > >
> > > client:
> > > $ dmesg | grep nvme
> > > [  281.704475] nvme nvme0: creating 40 I/O queues.
> > > [  285.557759] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> > > [  353.187809] nvme nvme0: I/O 8 QID 0 timeout
> > > [  353.193100] nvme nvme0: Property Set error: 881, offset 0x14
> > > [  353.226082] nvme nvme0: creating 40 I/O queues.
> > > [  357.088266] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >
> > It appears that there is an admin timeout that is either triggered
> > by the reset or unrelated.
> >
> > Can you run nvme reset /dev/nvme0 instead so we can see the "resetting
> > controller" print?
>
> Yes, here is the log:
> ------------------------0
> + nvme reset /dev/nvme0
> real    0m10.737s
> user    0m0.004s
> sys     0m0.004s
> ------------------------1
> + nvme reset /dev/nvme0
> real    1m11.335s
> user    0m0.002s
> sys     0m0.005s
>
> target:
> [  934.306016] nvmet: creating controller 1 for subsystem testnqn for
> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> [  944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> [  944.900051] nvmet: ctrl 1 fatal error occurred!
> [ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for
> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
>
> client:
> [  857.264029] nvme nvme0: resetting controller
> [  864.115369] nvme nvme0: creating 40 I/O queues.
> [  867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> [  868.001673] nvme nvme0: resetting controller
> [  935.396789] nvme nvme0: I/O 9 QID 0 timeout
> [  935.402036] nvme nvme0: Property Set error: 881, offset 0x14
> [  935.438080] nvme nvme0: creating 40 I/O queues.
> [  939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.

--
Best Regards,
Yi Zhang
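A pattern worth noting in the ping above: the slow run takes roughly one minute longer than the fast ones, which is about one admin command timeout stacked on top of a normal reset (nvme-core's admin_timeout module parameter defaults to 60 seconds upstream). A rough check on the quoted figures (illustrative only; the 60 s value is the default and may differ if admin_timeout was tuned):

```python
normal_reset = 12.64   # seconds, the fast "nvme reset" runs above
slow_reset = 76.133    # 1m16.133s, the slow run
admin_timeout = 60.0   # nvme-core default admin_timeout, in seconds

# The slow reset carries roughly one extra admin command timeout.
extra = slow_reset - normal_reset
print(f"extra time in slow reset: {extra:.1f}s "
      f"(default admin timeout: {admin_timeout:.0f}s)")
```

The ~63 s of extra wall time is within a few seconds of the 60 s admin timeout, consistent with one admin command (the CC Property Set) having to expire before the reset can proceed.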
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2021-06-23 21:32 ` Sagi Grimberg
  From: Sagi Grimberg @ 2021-06-23 21:32 UTC
  To: Yi Zhang; +Cc: linux-nvme, linux-rdma

> Hello
>
> Gentle ping here, this issue still exists on latest 5.13-rc7
>
> # time nvme reset /dev/nvme0
>
> real    0m12.636s
> user    0m0.002s
> sys     0m0.005s
> # time nvme reset /dev/nvme0
>
> real    0m12.641s
> user    0m0.000s
> sys     0m0.007s

Strange that even normal resets take so long...
What device are you using?

> # time nvme reset /dev/nvme0
>
> real    1m16.133s
> user    0m0.000s
> sys     0m0.007s

There seems to be a spurious command timeout here, but maybe this
is due to the fact that the queues take so long to connect and
the target expires the keep-alive timer.

Does this patch help?
--
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 7d0f3523fdab..f4a7db1ab3e5 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -142,6 +142,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
 		}
 	}
 
+	/*
+	 * Controller establishment flow may take some time, and the host
+	 * may not send us keep-alive during this period, hence reset the
+	 * traffic based keep-alive timer so we don't trigger a
+	 * controller teardown as a result of a keep-alive expiration.
+	 */
+	ctrl->reset_tbkas = true;
+
 	return 0;
 
 err:
--

>> target:
>> [  934.306016] nvmet: creating controller 1 for subsystem testnqn for
>> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
>> [  944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
>> [  944.900051] nvmet: ctrl 1 fatal error occurred!
>> [ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for
>> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
>>
>> client:
>> [  857.264029] nvme nvme0: resetting controller
>> [  864.115369] nvme nvme0: creating 40 I/O queues.
>> [  867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>> [  868.001673] nvme nvme0: resetting controller
>> [  935.396789] nvme nvme0: I/O 9 QID 0 timeout
>> [  935.402036] nvme nvme0: Property Set error: 881, offset 0x14
>> [  935.438080] nvme nvme0: creating 40 I/O queues.
>> [  939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.
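For context on the reset_tbkas flag used in the patch above: nvmet's keep-alive can be traffic-based (TBKAS), meaning the expiry handler re-arms the timer instead of tearing the controller down when there was activity during the last keep-alive period. The patch marks queue installation itself as such activity, so a slow controller-establishment window no longer trips the 5-second timer. A toy model of that decision logic (illustrative Python, not the kernel implementation; only the flag name mirrors the patch):

```python
class KeepAlive:
    """Toy model of a traffic-based keep-alive check.

    Each call to ka_timer_fires() stands for one keep-alive period (kato)
    elapsing. The controller is torn down when a period passes with no
    traffic and no request to re-arm the timer.
    """

    def __init__(self):
        self.reset_tbkas = False
        self.alive = True

    def install_queue(self):
        # The patch above: queue establishment counts as activity.
        self.reset_tbkas = True

    def ka_timer_fires(self, had_traffic=False):
        if had_traffic or self.reset_tbkas:
            self.reset_tbkas = False  # consume the flag and re-arm
            return
        self.alive = False  # "keep-alive timer expired!" -> fatal error

# Without the patch: a long queue-connect window with no host traffic
# expires the timer and the target declares a fatal error.
c = KeepAlive()
c.ka_timer_fires()
print(c.alive)   # prints False

# With the patch: installing a queue re-arms the timer for one period.
c2 = KeepAlive()
c2.install_queue()
c2.ka_timer_fires()
print(c2.alive)  # prints True
```

Note the flag is consumed on the first expiry, so it only buys one extra keep-alive period per installed queue; it does not disable the keep-alive mechanism.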
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2021-06-24 16:14 ` Yi Zhang
  From: Yi Zhang @ 2021-06-24 16:14 UTC
  To: Sagi Grimberg; +Cc: linux-nvme, linux-rdma

On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> > Hello
> >
> > Gentle ping here, this issue still exists on latest 5.13-rc7
> >
> > # time nvme reset /dev/nvme0
> >
> > real    0m12.636s
> > user    0m0.002s
> > sys     0m0.005s
> > # time nvme reset /dev/nvme0
> >
> > real    0m12.641s
> > user    0m0.000s
> > sys     0m0.007s
>
> Strange that even normal resets take so long...
> What device are you using?

Hi Sagi

Here is the device info:
Mellanox Technologies MT27700 Family [ConnectX-4]

> > # time nvme reset /dev/nvme0
> >
> > real    1m16.133s
> > user    0m0.000s
> > sys     0m0.007s
>
> There seems to be a spurious command timeout here, but maybe this
> is due to the fact that the queues take so long to connect and
> the target expires the keep-alive timer.
>
> Does this patch help?

The issue still exists, let me know if you need more testing for it. :)

> --
> diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
> index 7d0f3523fdab..f4a7db1ab3e5 100644
> --- a/drivers/nvme/target/fabrics-cmd.c
> +++ b/drivers/nvme/target/fabrics-cmd.c
> @@ -142,6 +142,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
>  		}
>  	}
> 
> +	/*
> +	 * Controller establishment flow may take some time, and the host
> +	 * may not send us keep-alive during this period, hence reset the
> +	 * traffic based keep-alive timer so we don't trigger a
> +	 * controller teardown as a result of a keep-alive expiration.
> +	 */
> +	ctrl->reset_tbkas = true;
> +
>  	return 0;
> 
>  err:
> --
>
> >> target:
> >> [  934.306016] nvmet: creating controller 1 for subsystem testnqn for
> >> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> >> [  944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> >> [  944.900051] nvmet: ctrl 1 fatal error occurred!
> >> [ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for
> >> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> >>
> >> client:
> >> [  857.264029] nvme nvme0: resetting controller
> >> [  864.115369] nvme nvme0: creating 40 I/O queues.
> >> [  867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >> [  868.001673] nvme nvme0: resetting controller
> >> [  935.396789] nvme nvme0: I/O 9 QID 0 timeout
> >> [  935.402036] nvme nvme0: Property Set error: 881, offset 0x14
> >> [  935.438080] nvme nvme0: creating 40 I/O queues.
> >> [  939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.

--
Best Regards,
Yi Zhang
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2021-12-11  3:01 ` Yi Zhang
  From: Yi Zhang @ 2021-12-11  3:01 UTC
  To: Sagi Grimberg; +Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list

On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> > > Hello
> > >
> > > Gentle ping here, this issue still exists on latest 5.13-rc7
> > >
> > > # time nvme reset /dev/nvme0
> > >
> > > real    0m12.636s
> > > user    0m0.002s
> > > sys     0m0.005s
> > > # time nvme reset /dev/nvme0
> > >
> > > real    0m12.641s
> > > user    0m0.000s
> > > sys     0m0.007s
> >
> > Strange that even normal resets take so long...
> > What device are you using?
>
> Hi Sagi
>
> Here is the device info:
> Mellanox Technologies MT27700 Family [ConnectX-4]
>
> > > # time nvme reset /dev/nvme0
> > >
> > > real    1m16.133s
> > > user    0m0.000s
> > > sys     0m0.007s
> >
> > There seems to be a spurious command timeout here, but maybe this
> > is due to the fact that the queues take so long to connect and
> > the target expires the keep-alive timer.
> >
> > Does this patch help?
>
> The issue still exists, let me know if you need more testing for it. :)

Hi Sagi
ping, this issue still can be reproduced on the latest
linux-block/for-next, do you have a chance to recheck it, thanks.

> > --
> > diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
> > index 7d0f3523fdab..f4a7db1ab3e5 100644
> > --- a/drivers/nvme/target/fabrics-cmd.c
> > +++ b/drivers/nvme/target/fabrics-cmd.c
> > @@ -142,6 +142,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
> >  		}
> >  	}
> > 
> > +	/*
> > +	 * Controller establishment flow may take some time, and the host
> > +	 * may not send us keep-alive during this period, hence reset the
> > +	 * traffic based keep-alive timer so we don't trigger a
> > +	 * controller teardown as a result of a keep-alive expiration.
> > +	 */
> > +	ctrl->reset_tbkas = true;
> > +
> >  	return 0;
> > 
> >  err:
> > --
> >
> > >> target:
> > >> [  934.306016] nvmet: creating controller 1 for subsystem testnqn for
> > >> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> > >> [  944.875021] nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
> > >> [  944.900051] nvmet: ctrl 1 fatal error occurred!
> > >> [ 1005.628340] nvmet: creating controller 1 for subsystem testnqn for
> > >> NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4c10-8058-b7c04f383432.
> > >>
> > >> client:
> > >> [  857.264029] nvme nvme0: resetting controller
> > >> [  864.115369] nvme nvme0: creating 40 I/O queues.
> > >> [  867.996746] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> > >> [  868.001673] nvme nvme0: resetting controller
> > >> [  935.396789] nvme nvme0: I/O 9 QID 0 timeout
> > >> [  935.402036] nvme nvme0: Property Set error: 881, offset 0x14
> > >> [  935.438080] nvme nvme0: creating 40 I/O queues.
> > >> [  939.332125] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>
> --
> Best Regards,
> Yi Zhang

--
Best Regards,
Yi Zhang
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-11 3:01 ` Yi Zhang @ 2021-12-12 9:45 ` Sagi Grimberg 2021-12-13 6:12 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2021-12-12 9:45 UTC (permalink / raw) To: Yi Zhang; +Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list On 12/11/21 5:01 AM, Yi Zhang wrote: > On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang@redhat.com> wrote: >> >> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi@grimberg.me> wrote: >>> >>> >>>> Hello >>>> >>>> Gentle ping here, this issue still exists on latest 5.13-rc7 >>>> >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 0m12.636s >>>> user 0m0.002s >>>> sys 0m0.005s >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 0m12.641s >>>> user 0m0.000s >>>> sys 0m0.007s >>> >>> Strange that even normal resets take so long... >>> What device are you using? >> >> Hi Sagi >> >> Here is the device info: >> Mellanox Technologies MT27700 Family [ConnectX-4] >> >>> >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 1m16.133s >>>> user 0m0.000s >>>> sys 0m0.007s >>> >>> There seems to be a spurious command timeout here, but maybe this >>> is due to the fact that the queues take so long to connect and >>> the target expires the keep-alive timer. >>> >>> Does this patch help? >> >> The issue still exists, let me know if you need more testing for it. :) > > Hi Sagi > ping, this issue still can be reproduced on the latest > linux-block/for-next, do you have a chance to recheck it, thanks. 
Can you check if it happens with the below patch: -- diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c index f91a56180d3d..6e5aadfb07a0 100644 --- a/drivers/nvme/target/fabrics-cmd.c +++ b/drivers/nvme/target/fabrics-cmd.c @@ -191,6 +191,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req) } } + /* + * Controller establishment flow may take some time, and the host may not + * send us keep-alive during this period, hence reset the + * traffic based keep-alive timer so we don't trigger a + * controller teardown as a result of a keep-alive expiration. + */ + ctrl->reset_tbkas = true; + return 0; err: -- ^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-12 9:45 ` Sagi Grimberg @ 2021-12-13 6:12 ` Yi Zhang 2021-12-13 9:04 ` Sagi Grimberg 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2021-12-13 6:12 UTC (permalink / raw) To: Sagi Grimberg; +Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list On Sun, Dec 12, 2021 at 5:45 PM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > On 12/11/21 5:01 AM, Yi Zhang wrote: > > On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang@redhat.com> wrote: > >> > >> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi@grimberg.me> wrote: > >>> > >>> > >>>> Hello > >>>> > >>>> Gentle ping here, this issue still exists on latest 5.13-rc7 > >>>> > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 0m12.636s > >>>> user 0m0.002s > >>>> sys 0m0.005s > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 0m12.641s > >>>> user 0m0.000s > >>>> sys 0m0.007s > >>> > >>> Strange that even normal resets take so long... > >>> What device are you using? > >> > >> Hi Sagi > >> > >> Here is the device info: > >> Mellanox Technologies MT27700 Family [ConnectX-4] > >> > >>> > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 1m16.133s > >>>> user 0m0.000s > >>>> sys 0m0.007s > >>> > >>> There seems to be a spurious command timeout here, but maybe this > >>> is due to the fact that the queues take so long to connect and > >>> the target expires the keep-alive timer. > >>> > >>> Does this patch help? > >> > >> The issue still exists, let me know if you need more testing for it. :) > > > > Hi Sagi > > ping, this issue still can be reproduced on the latest > > linux-block/for-next, do you have a chance to recheck it, thanks. 
> > Can you check if it happens with the below patch: Hi Sagi It is still reproducible with the change, here is the log: # time nvme reset /dev/nvme0 real 0m12.973s user 0m0.000s sys 0m0.006s # time nvme reset /dev/nvme0 real 1m15.606s user 0m0.000s sys 0m0.007s # dmesg | grep nvme [ 900.634877] nvme nvme0: resetting controller [ 909.026958] nvme nvme0: creating 40 I/O queues. [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. [ 917.600993] nvme nvme0: resetting controller [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 [ 988.608181] nvme nvme0: creating 40 I/O queues. [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. BTW, this issue cannot be reproduced on my NVME/ROCE environment. > -- > diff --git a/drivers/nvme/target/fabrics-cmd.c > b/drivers/nvme/target/fabrics-cmd.c > index f91a56180d3d..6e5aadfb07a0 100644 > --- a/drivers/nvme/target/fabrics-cmd.c > +++ b/drivers/nvme/target/fabrics-cmd.c > @@ -191,6 +191,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl > *ctrl, struct nvmet_req *req) > } > } > > + /* > + * Controller establishment flow may take some time, and the > host may not > + * send us keep-alive during this period, hence reset the > + * traffic based keep-alive timer so we don't trigger a > + * controller teardown as a result of a keep-alive expiration. > + */ > + ctrl->reset_tbkas = true; > + > return 0; > > err: > -- > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
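[Editorial note: the reset durations can be read directly off the host-side dmesg timestamps quoted above — the first reset runs from 900.634877 to 913.604297 (~13 s), the second from 917.600993 to 993.203495 (~75.6 s). A self-contained sketch computing those gaps from the quoted excerpt; on a live system you would pipe `dmesg | grep nvme` in instead.]

```shell
# Reconstruct reset durations from the host dmesg excerpt quoted above.
dmesg_excerpt='[  900.634877] nvme nvme0: resetting controller
[  909.026958] nvme nvme0: creating 40 I/O queues.
[  913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
[  917.600993] nvme nvme0: resetting controller
[  988.562230] nvme nvme0: I/O 2 QID 0 timeout
[  988.567607] nvme nvme0: Property Set error: 881, offset 0x14
[  988.608181] nvme nvme0: creating 40 I/O queues.
[  993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.'

durations=$(printf '%s\n' "$dmesg_excerpt" | awk '
    { t = $0; sub(/^\[ */, "", t); sub(/\].*/, "", t) }   # strip "[  ts]" prefix
    /resetting controller/ { start = t }
    /mapped .* queues/     { printf "reset took %.1f s\n", t - start }')
printf '%s\n' "$durations"
# prints:
#   reset took 13.0 s
#   reset took 75.6 s
```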
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-13 6:12 ` Yi Zhang @ 2021-12-13 9:04 ` Sagi Grimberg 2021-12-13 17:05 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2021-12-13 9:04 UTC (permalink / raw) To: Yi Zhang; +Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list >>>>>> Hello >>>>>> >>>>>> Gentle ping here, this issue still exists on latest 5.13-rc7 >>>>>> >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 0m12.636s >>>>>> user 0m0.002s >>>>>> sys 0m0.005s >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 0m12.641s >>>>>> user 0m0.000s >>>>>> sys 0m0.007s >>>>> >>>>> Strange that even normal resets take so long... >>>>> What device are you using? >>>> >>>> Hi Sagi >>>> >>>> Here is the device info: >>>> Mellanox Technologies MT27700 Family [ConnectX-4] >>>> >>>>> >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 1m16.133s >>>>>> user 0m0.000s >>>>>> sys 0m0.007s >>>>> >>>>> There seems to be a spurious command timeout here, but maybe this >>>>> is due to the fact that the queues take so long to connect and >>>>> the target expires the keep-alive timer. >>>>> >>>>> Does this patch help? >>>> >>>> The issue still exists, let me know if you need more testing for it. :) >>> >>> Hi Sagi >>> ping, this issue still can be reproduced on the latest >>> linux-block/for-next, do you have a chance to recheck it, thanks. >> >> Can you check if it happens with the below patch: > > Hi Sagi > It is still reproducible with the change, here is the log: > > # time nvme reset /dev/nvme0 > > real 0m12.973s > user 0m0.000s > sys 0m0.006s > # time nvme reset /dev/nvme0 > > real 1m15.606s > user 0m0.000s > sys 0m0.007s Does it speed up if you use less queues? (i.e. connect with -i 4) ? > > # dmesg | grep nvme > [ 900.634877] nvme nvme0: resetting controller > [ 909.026958] nvme nvme0: creating 40 I/O queues. > [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. 
> [ 917.600993] nvme nvme0: resetting controller > [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout > [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 > [ 988.608181] nvme nvme0: creating 40 I/O queues. > [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. > > BTW, this issue cannot be reproduced on my NVME/ROCE environment. Then I think that we need the rdma folks to help here... ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-13 9:04 ` Sagi Grimberg @ 2021-12-13 17:05 ` Yi Zhang 2021-12-14 10:39 ` Sagi Grimberg 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2021-12-13 17:05 UTC (permalink / raw) To: Sagi Grimberg; +Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list On Mon, Dec 13, 2021 at 5:05 PM Sagi Grimberg <sagi@grimberg.me> wrote: > > > >>>>>> Hello > >>>>>> > >>>>>> Gentle ping here, this issue still exists on latest 5.13-rc7 > >>>>>> > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 0m12.636s > >>>>>> user 0m0.002s > >>>>>> sys 0m0.005s > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 0m12.641s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.007s > >>>>> > >>>>> Strange that even normal resets take so long... > >>>>> What device are you using? > >>>> > >>>> Hi Sagi > >>>> > >>>> Here is the device info: > >>>> Mellanox Technologies MT27700 Family [ConnectX-4] > >>>> > >>>>> > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 1m16.133s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.007s > >>>>> > >>>>> There seems to be a spurious command timeout here, but maybe this > >>>>> is due to the fact that the queues take so long to connect and > >>>>> the target expires the keep-alive timer. > >>>>> > >>>>> Does this patch help? > >>>> > >>>> The issue still exists, let me know if you need more testing for it. :) > >>> > >>> Hi Sagi > >>> ping, this issue still can be reproduced on the latest > >>> linux-block/for-next, do you have a chance to recheck it, thanks. > >> > >> Can you check if it happens with the below patch: > > > > Hi Sagi > > It is still reproducible with the change, here is the log: > > > > # time nvme reset /dev/nvme0 > > > > real 0m12.973s > > user 0m0.000s > > sys 0m0.006s > > # time nvme reset /dev/nvme0 > > > > real 1m15.606s > > user 0m0.000s > > sys 0m0.007s > > Does it speed up if you use less queues? (i.e. connect with -i 4) ? 
Yes, with -i 4, it is a stable 1.3s # time nvme reset /dev/nvme0 real 0m1.225s user 0m0.000s sys 0m0.007s > > > > > # dmesg | grep nvme > > [ 900.634877] nvme nvme0: resetting controller > > [ 909.026958] nvme nvme0: creating 40 I/O queues. > > [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. > > [ 917.600993] nvme nvme0: resetting controller > > [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout > > [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 > > [ 988.608181] nvme nvme0: creating 40 I/O queues. > > [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. > > > > BTW, this issue cannot be reproduced on my NVME/ROCE environment. > > Then I think that we need the rdma folks to help here... > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
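[Editorial note: the timings in this exchange — ~12.97 s with the default 40 queues versus ~1.23 s with `-i 4` — imply the teardown/setup cost scales roughly linearly with queue count. A back-of-the-envelope sketch, assuming a simple linear model over the two data points quoted above:]

```shell
# Implied per-queue setup/teardown cost from the timings in this thread:
# 40 queues -> 12.973 s, 4 queues -> 1.225 s (both fast-path resets).
per_queue=$(awk 'BEGIN { printf "%.2f", (12.973 - 1.225) / (40 - 4) }')
echo "implied per-queue cost: ~${per_queue} s"
# prints: implied per-queue cost: ~0.33 s
```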
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-13 17:05 ` Yi Zhang @ 2021-12-14 10:39 ` Sagi Grimberg 2021-12-14 12:00 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2021-12-14 10:39 UTC (permalink / raw) To: Yi Zhang, Max Gurtovoy; +Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list >>> Hi Sagi >>> It is still reproducible with the change, here is the log: >>> >>> # time nvme reset /dev/nvme0 >>> >>> real 0m12.973s >>> user 0m0.000s >>> sys 0m0.006s >>> # time nvme reset /dev/nvme0 >>> >>> real 1m15.606s >>> user 0m0.000s >>> sys 0m0.007s >> >> Does it speed up if you use less queues? (i.e. connect with -i 4) ? > Yes, with -i 4, it has stablee 1.3s > # time nvme reset /dev/nvme0 So it appears that destroying a qp takes a long time on IB for some reason... > real 0m1.225s > user 0m0.000s > sys 0m0.007s > >> >>> >>> # dmesg | grep nvme >>> [ 900.634877] nvme nvme0: resetting controller >>> [ 909.026958] nvme nvme0: creating 40 I/O queues. >>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>> [ 917.600993] nvme nvme0: resetting controller >>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout >>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 >>> [ 988.608181] nvme nvme0: creating 40 I/O queues. >>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>> >>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. >> >> Then I think that we need the rdma folks to help here... Max? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-14 10:39 ` Sagi Grimberg @ 2021-12-14 12:00 ` Max Gurtovoy 2021-12-15 1:15 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2021-12-14 12:00 UTC (permalink / raw) To: Sagi Grimberg, Yi Zhang, Max Gurtovoy Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list On 12/14/2021 12:39 PM, Sagi Grimberg wrote: > >>>> Hi Sagi >>>> It is still reproducible with the change, here is the log: >>>> >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 0m12.973s >>>> user 0m0.000s >>>> sys 0m0.006s >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 1m15.606s >>>> user 0m0.000s >>>> sys 0m0.007s >>> >>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? >> Yes, with -i 4, it has stablee 1.3s >> # time nvme reset /dev/nvme0 > > So it appears that destroying a qp takes a long time on > IB for some reason... > >> real 0m1.225s >> user 0m0.000s >> sys 0m0.007s >> >>> >>>> >>>> # dmesg | grep nvme >>>> [ 900.634877] nvme nvme0: resetting controller >>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. >>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>> [ 917.600993] nvme nvme0: resetting controller >>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout >>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 >>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. >>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>> >>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. >>> >>> Then I think that we need the rdma folks to help here... > > Max? It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. Can you try repro with latest versions please ? Or give the exact scenario ? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-14 12:00 ` Max Gurtovoy @ 2021-12-15 1:15 ` Yi Zhang 2021-12-15 12:10 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2021-12-15 1:15 UTC (permalink / raw) To: Max Gurtovoy Cc: Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > > On 12/14/2021 12:39 PM, Sagi Grimberg wrote: > > > >>>> Hi Sagi > >>>> It is still reproducible with the change, here is the log: > >>>> > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 0m12.973s > >>>> user 0m0.000s > >>>> sys 0m0.006s > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 1m15.606s > >>>> user 0m0.000s > >>>> sys 0m0.007s > >>> > >>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? > >> Yes, with -i 4, it has stablee 1.3s > >> # time nvme reset /dev/nvme0 > > > > So it appears that destroying a qp takes a long time on > > IB for some reason... > > > >> real 0m1.225s > >> user 0m0.000s > >> sys 0m0.007s > >> > >>> > >>>> > >>>> # dmesg | grep nvme > >>>> [ 900.634877] nvme nvme0: resetting controller > >>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. > >>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. > >>>> [ 917.600993] nvme nvme0: resetting controller > >>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout > >>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 > >>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. > >>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. > >>>> > >>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. > >>> > >>> Then I think that we need the rdma folks to help here... > > > > Max? > > It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. > > Can you try repro with latest versions please ? > > Or give the exact scenario ? 
Yeah, both target and client are using Mellanox Technologies MT27700 Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the first time reset will take 12s, and it always can be reproduced at the second reset operation. > > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-15 1:15 ` Yi Zhang @ 2021-12-15 12:10 ` Max Gurtovoy 2021-12-16 2:18 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2021-12-15 12:10 UTC (permalink / raw) To: Yi Zhang Cc: Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On 12/15/2021 3:15 AM, Yi Zhang wrote: > On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> >> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: >>>>>> Hi Sagi >>>>>> It is still reproducible with the change, here is the log: >>>>>> >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 0m12.973s >>>>>> user 0m0.000s >>>>>> sys 0m0.006s >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 1m15.606s >>>>>> user 0m0.000s >>>>>> sys 0m0.007s >>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? >>>> Yes, with -i 4, it has stablee 1.3s >>>> # time nvme reset /dev/nvme0 >>> So it appears that destroying a qp takes a long time on >>> IB for some reason... >>> >>>> real 0m1.225s >>>> user 0m0.000s >>>> sys 0m0.007s >>>> >>>>>> # dmesg | grep nvme >>>>>> [ 900.634877] nvme nvme0: resetting controller >>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. >>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>> [ 917.600993] nvme nvme0: resetting controller >>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout >>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 >>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. >>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>> >>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. >>>>> Then I think that we need the rdma folks to help here... >>> Max? >> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. >> >> Can you try repro with latest versions please ? >> >> Or give the exact scenario ? 
> Yeah, both target and client are using Mellanox Technologies MT27700 > Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the > first time reset will take 12s, and it always can be reproduced at the > second reset operation. I created a target with 1 namespace backed by null_blk and connected to it from the same server in loopback rdma connection using the ConnectX-4 adapter. I run a loop with the "nvme reset" command and it took me 4-5 secs to reset each time.. > >> > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-15 12:10 ` Max Gurtovoy @ 2021-12-16 2:18 ` Yi Zhang 2021-12-16 13:21 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2021-12-16 2:18 UTC (permalink / raw) To: Max Gurtovoy Cc: Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > > On 12/15/2021 3:15 AM, Yi Zhang wrote: > > On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >> > >> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: > >>>>>> Hi Sagi > >>>>>> It is still reproducible with the change, here is the log: > >>>>>> > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 0m12.973s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.006s > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 1m15.606s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.007s > >>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? > >>>> Yes, with -i 4, it has stablee 1.3s > >>>> # time nvme reset /dev/nvme0 > >>> So it appears that destroying a qp takes a long time on > >>> IB for some reason... > >>> > >>>> real 0m1.225s > >>>> user 0m0.000s > >>>> sys 0m0.007s > >>>> > >>>>>> # dmesg | grep nvme > >>>>>> [ 900.634877] nvme nvme0: resetting controller > >>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. > >>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. > >>>>>> [ 917.600993] nvme nvme0: resetting controller > >>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout > >>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 > >>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. > >>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. > >>>>>> > >>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. > >>>>> Then I think that we need the rdma folks to help here... > >>> Max? 
> >> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. > >> > >> Can you try repro with latest versions please ? > >> > >> Or give the exact scenario ? > > Yeah, both target and client are using Mellanox Technologies MT27700 > > Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the > > first time reset will take 12s, and it always can be reproduced at the > > second reset operation. > > I created a target with 1 namespace backed by null_blk and connected to > it from the same server in loopback rdma connection using the ConnectX-4 > adapter. Could you share your loop.json file so I can try it on my environment? And can you try it with two servers that both have CX-4? This should be easier to reproduce it. > > I run a loop with the "nvme reset" command and it took me 4-5 secs to > reset each time.. > > > > > >> > > > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-16 2:18 ` Yi Zhang @ 2021-12-16 13:21 ` Max Gurtovoy 2021-12-16 16:32 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2021-12-16 13:21 UTC (permalink / raw) To: Yi Zhang Cc: Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On 12/16/2021 4:18 AM, Yi Zhang wrote: > On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> >> On 12/15/2021 3:15 AM, Yi Zhang wrote: >>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: >>>>>>>> Hi Sagi >>>>>>>> It is still reproducible with the change, here is the log: >>>>>>>> >>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>> >>>>>>>> real 0m12.973s >>>>>>>> user 0m0.000s >>>>>>>> sys 0m0.006s >>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>> >>>>>>>> real 1m15.606s >>>>>>>> user 0m0.000s >>>>>>>> sys 0m0.007s >>>>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? >>>>>> Yes, with -i 4, it has stablee 1.3s >>>>>> # time nvme reset /dev/nvme0 >>>>> So it appears that destroying a qp takes a long time on >>>>> IB for some reason... >>>>> >>>>>> real 0m1.225s >>>>>> user 0m0.000s >>>>>> sys 0m0.007s >>>>>> >>>>>>>> # dmesg | grep nvme >>>>>>>> [ 900.634877] nvme nvme0: resetting controller >>>>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. >>>>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>>>> [ 917.600993] nvme nvme0: resetting controller >>>>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout >>>>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 >>>>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. >>>>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>>>> >>>>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. >>>>>>> Then I think that we need the rdma folks to help here... >>>>> Max? 
>>>> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. >>>> >>>> Can you try repro with latest versions please ? >>>> >>>> Or give the exact scenario ? >>> Yeah, both target and client are using Mellanox Technologies MT27700 >>> Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the >>> first time reset will take 12s, and it always can be reproduced at the >>> second reset operation. >> I created a target with 1 namespace backed by null_blk and connected to >> it from the same server in loopback rdma connection using the ConnectX-4 >> adapter. > Could you share your loop.json file so I can try it on my environment? { "hosts": [], "ports": [ { "addr": { "adrfam": "ipv4", "traddr": "<ip>", "treq": "not specified", "trsvcid": "4420", "trtype": "rdma" }, "portid": 1, "referrals": [], "subsystems": [ "testsubsystem_0" ] } ], "subsystems": [ { "allowed_hosts": [], "attr": { "allow_any_host": "1", "cntlid_max": "65519", "cntlid_min": "1", "model": "Linux", "serial": "3d83c78b76623f1d", "version": "1.3" }, "namespaces": [ { "device": { "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d", "path": "/dev/nullb0", "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460" }, "enable": 1, "nsid": 1 } ], "nqn": "testsubsystem_0" } ] } > > And can you try it with two servers that both have CX-4? This should > be easier to reproduce it. I did this experiment. I have only a setup with 12 cores so I created 12 nvmf queues. The reset took 4 seconds. The test did 100 loops of "nvme reset". I saw that you also complained on the disconnect flow so I assume the root cause is the same. My disconnect took 2 seconds. My FW version is 12.28.2006. >> I run a loop with the "nvme reset" command and it took me 4-5 secs to >> reset each time.. >> >> > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-16 13:21 ` Max Gurtovoy @ 2021-12-16 16:32 ` Yi Zhang 2021-12-16 17:33 ` Haakon Bugge 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2021-12-16 16:32 UTC (permalink / raw) To: Max Gurtovoy Cc: Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On Thu, Dec 16, 2021 at 9:21 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > > On 12/16/2021 4:18 AM, Yi Zhang wrote: > > On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >> > >> On 12/15/2021 3:15 AM, Yi Zhang wrote: > >>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: > >>>>>>>> Hi Sagi > >>>>>>>> It is still reproducible with the change, here is the log: > >>>>>>>> > >>>>>>>> # time nvme reset /dev/nvme0 > >>>>>>>> > >>>>>>>> real 0m12.973s > >>>>>>>> user 0m0.000s > >>>>>>>> sys 0m0.006s > >>>>>>>> # time nvme reset /dev/nvme0 > >>>>>>>> > >>>>>>>> real 1m15.606s > >>>>>>>> user 0m0.000s > >>>>>>>> sys 0m0.007s > >>>>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? > >>>>>> Yes, with -i 4, it has stablee 1.3s > >>>>>> # time nvme reset /dev/nvme0 > >>>>> So it appears that destroying a qp takes a long time on > >>>>> IB for some reason... > >>>>> > >>>>>> real 0m1.225s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.007s > >>>>>> > >>>>>>>> # dmesg | grep nvme > >>>>>>>> [ 900.634877] nvme nvme0: resetting controller > >>>>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. > >>>>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. > >>>>>>>> [ 917.600993] nvme nvme0: resetting controller > >>>>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout > >>>>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 > >>>>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. > >>>>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. 
> >>>>>>>> > >>>>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. > >>>>>>> Then I think that we need the rdma folks to help here... > >>>>> Max? > >>>> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. > >>>> > >>>> Can you try repro with latest versions please ? > >>>> > >>>> Or give the exact scenario ? > >>> Yeah, both target and client are using Mellanox Technologies MT27700 > >>> Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the > >>> first time reset will take 12s, and it always can be reproduced at the > >>> second reset operation. > >> I created a target with 1 namespace backed by null_blk and connected to > >> it from the same server in loopback rdma connection using the ConnectX-4 > >> adapter. > > Could you share your loop.json file so I can try it on my environment? > > { > "hosts": [], > "ports": [ > { > "addr": { > "adrfam": "ipv4", > "traddr": "<ip>", > "treq": "not specified", > "trsvcid": "4420", > "trtype": "rdma" > }, > "portid": 1, > "referrals": [], > "subsystems": [ > "testsubsystem_0" > ] > } > ], > "subsystems": [ > { > "allowed_hosts": [], > "attr": { > "allow_any_host": "1", > "cntlid_max": "65519", > "cntlid_min": "1", > "model": "Linux", > "serial": "3d83c78b76623f1d", > "version": "1.3" > }, > "namespaces": [ > { > "device": { > "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d", > "path": "/dev/nullb0", > "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460" > }, > "enable": 1, > "nsid": 1 > } > ], > "nqn": "testsubsystem_0" > } > ] > } Thanks, I reproduced it with your json file on one server with loopback rdma connection: # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testsubsystem_0 real 0m4.557s user 0m0.000s sys 0m0.005s # time nvme reset /dev/nvme0 real 0m13.176s user 0m0.000s sys 0m0.007s # time nvme reset /dev/nvme0 real 1m16.577s user 0m0.002s sys 0m0.005s # time nvme disconnect -n testsubsystem_0 NQN:testsubsystem_0 disconnected 1 controller(s) real 1m11.541s user 
0m0.000s sys 0m0.187s #dmesg [96600.362827] nvmet: creating nvm controller 1 for subsystem testsubsystem_0 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. [96600.363038] nvme nvme0: creating 40 I/O queues. [96604.905514] nvme nvme0: mapped 40/0/0 default/read/poll queues. [96604.909161] nvme nvme0: new ctrl: NQN "testsubsystem_0", addr 172.31.0.202:4420 [96614.270825] nvme nvme0: Removing ctrl: NQN "testsubsystem_0" [96659.268006] nvmet: creating nvm controller 1 for subsystem testsubsystem_0 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. [96659.268215] nvme nvme0: creating 40 I/O queues. [96663.801929] nvme nvme0: mapped 40/0/0 default/read/poll queues. [96663.805589] nvme nvme0: new ctrl: NQN "testsubsystem_0", addr 172.31.0.202:4420 [96673.130986] nvme nvme0: resetting controller [96681.761992] nvmet: creating nvm controller 1 for subsystem testsubsystem_0 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. [96681.762133] nvme nvme0: creating 40 I/O queues. [96686.302544] nvme nvme0: mapped 40/0/0 default/read/poll queues. [96688.850272] nvme nvme0: resetting controller [96697.336217] nvmet: ctrl 1 keep-alive timer (5 seconds) expired! [96697.361231] nvmet: ctrl 1 fatal error occurred! [96760.824363] nvme nvme0: I/O 25 QID 0 timeout [96760.847531] nvme nvme0: Property Set error: 881, offset 0x14 [96760.885731] nvmet: creating nvm controller 1 for subsystem testsubsystem_0 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. [96760.885879] nvme nvme0: creating 40 I/O queues. [96765.423099] nvme nvme0: mapped 40/0/0 default/read/poll queues. [96808.112730] nvme nvme0: Removing ctrl: NQN "testsubsystem_0" [96816.632485] nvmet: ctrl 1 keep-alive timer (5 seconds) expired! [96816.657537] nvmet: ctrl 1 fatal error occurred! 
[96879.608632] nvme nvme0: I/O 12 QID 0 timeout [96879.632104] nvme nvme0: Property Set error: 881, offset 0x14> > > > > > And can you try it with two servers that both have CX-4? This should > > be easier to reproduce it. > > I did this experiment. I have only a setup with 12 cores so I created 12 > nvmf queues. > > The reset took 4 seconds. The test did 100 loops of "nvme reset". > > I saw that you also complained on the disconnect flow so I assume the > root cause is the same. Yeah, I think so > > My disconnect took 2 seconds. > > My FW version is 12.28.2006. Yeah, mine is same with yours. # cat /sys/devices/pci0000\:00/0000\:00\:02.0/0000\:04\:00.0/infiniband/mlx5_0/fw_ver 12.28.2006 > > >> I run a loop with the "nvme reset" command and it took me 4-5 secs to > >> reset each time.. > >> > >> > > > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
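[Editorial note: reading the target and host timestamps in the loopback log above together shows the shape of the stall: the target's 5-second keep-alive expires at 96697.336217 while the host is still mid-reset, the target declares a fatal error and drops the controller, and the host then blocks until its own admin-command timeout fires at 96760.824363 before the Property Set fails and it reconnects. A small sketch computing that gap from the quoted timestamps:]

```shell
# Gap between the target dropping the controller (keep-alive expiry)
# and the host noticing it (admin I/O timeout); values from the log above.
kato_expired=96697.336217   # nvmet: ctrl 1 keep-alive timer (5 seconds) expired!
io_timeout=96760.824363     # nvme nvme0: I/O 25 QID 0 timeout
gap=$(awk -v a="$kato_expired" -v b="$io_timeout" 'BEGIN { printf "%.1f", b - a }')
echo "host stalled ~${gap} s before noticing the dead controller"
# prints: host stalled ~63.5 s before noticing the dead controller
```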
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-16 16:32 ` Yi Zhang @ 2021-12-16 17:33 ` Haakon Bugge 2021-12-17 7:03 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Haakon Bugge @ 2021-12-16 17:33 UTC (permalink / raw) To: Yi Zhang Cc: Max Gurtovoy, Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list > On 16 Dec 2021, at 17:32, Yi Zhang <yi.zhang@redhat.com> wrote: > > On Thu, Dec 16, 2021 at 9:21 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> >> >> On 12/16/2021 4:18 AM, Yi Zhang wrote: >>> On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>> >>>> On 12/15/2021 3:15 AM, Yi Zhang wrote: >>>>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: >>>>>>>>>> Hi Sagi >>>>>>>>>> It is still reproducible with the change, here is the log: >>>>>>>>>> >>>>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>>>> >>>>>>>>>> real 0m12.973s >>>>>>>>>> user 0m0.000s >>>>>>>>>> sys 0m0.006s >>>>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>>>> >>>>>>>>>> real 1m15.606s >>>>>>>>>> user 0m0.000s >>>>>>>>>> sys 0m0.007s >>>>>>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? >>>>>>>> Yes, with -i 4, it has stablee 1.3s >>>>>>>> # time nvme reset /dev/nvme0 >>>>>>> So it appears that destroying a qp takes a long time on >>>>>>> IB for some reason... >>>>>>> >>>>>>>> real 0m1.225s >>>>>>>> user 0m0.000s >>>>>>>> sys 0m0.007s >>>>>>>> >>>>>>>>>> # dmesg | grep nvme >>>>>>>>>> [ 900.634877] nvme nvme0: resetting controller >>>>>>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. >>>>>>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>>>>>> [ 917.600993] nvme nvme0: resetting controller >>>>>>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout >>>>>>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 >>>>>>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. 
>>>>>>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>>>>>> >>>>>>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. >>>>>>>>> Then I think that we need the rdma folks to help here... >>>>>>> Max? >>>>>> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. >>>>>> >>>>>> Can you try repro with latest versions please ? >>>>>> >>>>>> Or give the exact scenario ? >>>>> Yeah, both target and client are using Mellanox Technologies MT27700 >>>>> Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the >>>>> first time reset will take 12s, and it always can be reproduced at the >>>>> second reset operation. >>>> I created a target with 1 namespace backed by null_blk and connected to >>>> it from the same server in loopback rdma connection using the ConnectX-4 >>>> adapter. >>> Could you share your loop.json file so I can try it on my environment? >> >> { >> "hosts": [], >> "ports": [ >> { >> "addr": { >> "adrfam": "ipv4", >> "traddr": "<ip>", >> "treq": "not specified", >> "trsvcid": "4420", >> "trtype": "rdma" >> }, >> "portid": 1, >> "referrals": [], >> "subsystems": [ >> "testsubsystem_0" >> ] >> } >> ], >> "subsystems": [ >> { >> "allowed_hosts": [], >> "attr": { >> "allow_any_host": "1", >> "cntlid_max": "65519", >> "cntlid_min": "1", >> "model": "Linux", >> "serial": "3d83c78b76623f1d", >> "version": "1.3" >> }, >> "namespaces": [ >> { >> "device": { >> "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d", >> "path": "/dev/nullb0", >> "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460" >> }, >> "enable": 1, >> "nsid": 1 >> } >> ], >> "nqn": "testsubsystem_0" >> } >> ] >> } > > Thanks, I reproduced it with your json file on one server with > loopback rdma connection: > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testsubsystem_0 > > real 0m4.557s > user 0m0.000s > sys 0m0.005s > # time nvme reset /dev/nvme0 > > real 0m13.176s > user 0m0.000s > sys 0m0.007s > # time nvme reset /dev/nvme0 > 
> real 1m16.577s > user 0m0.002s > sys 0m0.005s > # time nvme disconnect -n testsubsystem_0 What does: # rdma res show qp | grep -c ERR say, when it is slow? Thxs, Håkon > NQN:testsubsystem_0 disconnected 1 controller(s) > > real 1m11.541s > user 0m0.000s > sys 0m0.187s > > #dmesg > [96600.362827] nvmet: creating nvm controller 1 for subsystem > testsubsystem_0 for NQN > nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. > [96600.363038] nvme nvme0: creating 40 I/O queues. > [96604.905514] nvme nvme0: mapped 40/0/0 default/read/poll queues. > [96604.909161] nvme nvme0: new ctrl: NQN "testsubsystem_0", addr > 172.31.0.202:4420 > [96614.270825] nvme nvme0: Removing ctrl: NQN "testsubsystem_0" > [96659.268006] nvmet: creating nvm controller 1 for subsystem > testsubsystem_0 for NQN > nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. > [96659.268215] nvme nvme0: creating 40 I/O queues. > [96663.801929] nvme nvme0: mapped 40/0/0 default/read/poll queues. > [96663.805589] nvme nvme0: new ctrl: NQN "testsubsystem_0", addr > 172.31.0.202:4420 > [96673.130986] nvme nvme0: resetting controller > [96681.761992] nvmet: creating nvm controller 1 for subsystem > testsubsystem_0 for NQN > nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. > [96681.762133] nvme nvme0: creating 40 I/O queues. > [96686.302544] nvme nvme0: mapped 40/0/0 default/read/poll queues. > [96688.850272] nvme nvme0: resetting controller > [96697.336217] nvmet: ctrl 1 keep-alive timer (5 seconds) expired! > [96697.361231] nvmet: ctrl 1 fatal error occurred! > [96760.824363] nvme nvme0: I/O 25 QID 0 timeout > [96760.847531] nvme nvme0: Property Set error: 881, offset 0x14 > [96760.885731] nvmet: creating nvm controller 1 for subsystem > testsubsystem_0 for NQN > nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0056-4d10-8030-b7c04f393432. > [96760.885879] nvme nvme0: creating 40 I/O queues. > [96765.423099] nvme nvme0: mapped 40/0/0 default/read/poll queues. 
> [96808.112730] nvme nvme0: Removing ctrl: NQN "testsubsystem_0" > [96816.632485] nvmet: ctrl 1 keep-alive timer (5 seconds) expired! > [96816.657537] nvmet: ctrl 1 fatal error occurred! > [96879.608632] nvme nvme0: I/O 12 QID 0 timeout > [96879.632104] nvme nvme0: Property Set error: 881, offset 0x14> >> >>> >>> And can you try it with two servers that both have CX-4? This should >>> be easier to reproduce it. >> >> I did this experiment. I have only a setup with 12 cores so I created 12 >> nvmf queues. >> >> The reset took 4 seconds. The test did 100 loops of "nvme reset". >> >> I saw that you also complained on the disconnect flow so I assume the >> root cause is the same. > > Yeah, I think so > >> >> My disconnect took 2 seconds. >> >> My FW version is 12.28.2006. > > Yeah, mine is same with yours. > # cat /sys/devices/pci0000\:00/0000\:00\:02.0/0000\:04\:00.0/infiniband/mlx5_0/fw_ver > 12.28.2006 > > >> >>>> I run a loop with the "nvme reset" command and it took me 4-5 secs to >>>> reset each time.. >>>> >>>> >>> >> > > > -- > Best Regards, > Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-16 17:33 ` Haakon Bugge @ 2021-12-17 7:03 ` Yi Zhang 2021-12-17 11:19 ` Haakon Bugge 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2021-12-17 7:03 UTC (permalink / raw) To: Haakon Bugge Cc: Max Gurtovoy, Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list On Fri, Dec 17, 2021 at 1:34 AM Haakon Bugge <haakon.bugge@oracle.com> wrote: > > > > > On 16 Dec 2021, at 17:32, Yi Zhang <yi.zhang@redhat.com> wrote: > > > > On Thu, Dec 16, 2021 at 9:21 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >> > >> > >> On 12/16/2021 4:18 AM, Yi Zhang wrote: > >>> On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >>>> > >>>> On 12/15/2021 3:15 AM, Yi Zhang wrote: > >>>>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >>>>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: > >>>>>>>>>> Hi Sagi > >>>>>>>>>> It is still reproducible with the change, here is the log: > >>>>>>>>>> > >>>>>>>>>> # time nvme reset /dev/nvme0 > >>>>>>>>>> > >>>>>>>>>> real 0m12.973s > >>>>>>>>>> user 0m0.000s > >>>>>>>>>> sys 0m0.006s > >>>>>>>>>> # time nvme reset /dev/nvme0 > >>>>>>>>>> > >>>>>>>>>> real 1m15.606s > >>>>>>>>>> user 0m0.000s > >>>>>>>>>> sys 0m0.007s > >>>>>>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? > >>>>>>>> Yes, with -i 4, it has stablee 1.3s > >>>>>>>> # time nvme reset /dev/nvme0 > >>>>>>> So it appears that destroying a qp takes a long time on > >>>>>>> IB for some reason... > >>>>>>> > >>>>>>>> real 0m1.225s > >>>>>>>> user 0m0.000s > >>>>>>>> sys 0m0.007s > >>>>>>>> > >>>>>>>>>> # dmesg | grep nvme > >>>>>>>>>> [ 900.634877] nvme nvme0: resetting controller > >>>>>>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. > >>>>>>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. 
> >>>>>>>>>> [ 917.600993] nvme nvme0: resetting controller > >>>>>>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout > >>>>>>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 > >>>>>>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. > >>>>>>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. > >>>>>>>>>> > >>>>>>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. > >>>>>>>>> Then I think that we need the rdma folks to help here... > >>>>>>> Max? > >>>>>> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. > >>>>>> > >>>>>> Can you try repro with latest versions please ? > >>>>>> > >>>>>> Or give the exact scenario ? > >>>>> Yeah, both target and client are using Mellanox Technologies MT27700 > >>>>> Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the > >>>>> first time reset will take 12s, and it always can be reproduced at the > >>>>> second reset operation. > >>>> I created a target with 1 namespace backed by null_blk and connected to > >>>> it from the same server in loopback rdma connection using the ConnectX-4 > >>>> adapter. > >>> Could you share your loop.json file so I can try it on my environment? 
> >> > >> { > >> "hosts": [], > >> "ports": [ > >> { > >> "addr": { > >> "adrfam": "ipv4", > >> "traddr": "<ip>", > >> "treq": "not specified", > >> "trsvcid": "4420", > >> "trtype": "rdma" > >> }, > >> "portid": 1, > >> "referrals": [], > >> "subsystems": [ > >> "testsubsystem_0" > >> ] > >> } > >> ], > >> "subsystems": [ > >> { > >> "allowed_hosts": [], > >> "attr": { > >> "allow_any_host": "1", > >> "cntlid_max": "65519", > >> "cntlid_min": "1", > >> "model": "Linux", > >> "serial": "3d83c78b76623f1d", > >> "version": "1.3" > >> }, > >> "namespaces": [ > >> { > >> "device": { > >> "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d", > >> "path": "/dev/nullb0", > >> "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460" > >> }, > >> "enable": 1, > >> "nsid": 1 > >> } > >> ], > >> "nqn": "testsubsystem_0" > >> } > >> ] > >> } > > > > Thanks, I reproduced it with your json file on one server with > > loopback rdma connection: > > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testsubsystem_0 > > > > real 0m4.557s > > user 0m0.000s > > sys 0m0.005s > > # time nvme reset /dev/nvme0 > > > > real 0m13.176s > > user 0m0.000s > > sys 0m0.007s > > # time nvme reset /dev/nvme0 > > > > real 1m16.577s > > user 0m0.002s > > sys 0m0.005s > > # time nvme disconnect -n testsubsystem_0 > > What does: > > # rdma res show qp | grep -c ERR > > say, when it is slow? Hi Haakon Here is the output during the reset operation: 38 33 28 24 19 14 9 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 > > > > Thxs, Håkon > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
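[Editor's note] The per-second ERR-QP counts Yi posted above can be collected with a loop along these lines. This is a sketch, not part of the thread: it assumes iproute2's `rdma` tool and falls back to a stub count where the tool is absent, and the iteration count is shortened for illustration (the real reset ran for over a minute).

```shell
# Sample how many QPs sit in the ERR state once per second while a
# controller reset is in flight (run alongside `nvme reset /dev/nvme0`).
sample_err_qps() {
    if command -v rdma >/dev/null 2>&1; then
        # Count QPs reported in the ERR state by iproute2's rdma tool.
        rdma res show qp 2>/dev/null | grep -c ERR
    else
        # Stub so the loop stays illustrative on machines without the tool.
        echo 0
    fi
}

i=0
while [ "$i" -lt 5 ]; do    # in practice, run for the full ~75 s reset
    count=$(sample_err_qps)
    echo "ERR QPs: $count"
    i=$((i + 1))
done
```

The counts in the thread (a brief burst in the tens, then mostly 0 and 1) are what led Haakon to drop the many-QPs-in-ERR hypothesis.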
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-17 7:03 ` Yi Zhang @ 2021-12-17 11:19 ` Haakon Bugge 2022-02-14 9:47 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Haakon Bugge @ 2021-12-17 11:19 UTC (permalink / raw) To: Yi Zhang Cc: Max Gurtovoy, Sagi Grimberg, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list > On 17 Dec 2021, at 08:03, Yi Zhang <yi.zhang@redhat.com> wrote: > > On Fri, Dec 17, 2021 at 1:34 AM Haakon Bugge <haakon.bugge@oracle.com> wrote: >> >> >> >>> On 16 Dec 2021, at 17:32, Yi Zhang <yi.zhang@redhat.com> wrote: >>> >>> On Thu, Dec 16, 2021 at 9:21 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>> >>>> >>>> On 12/16/2021 4:18 AM, Yi Zhang wrote: >>>>> On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>>>> >>>>>> On 12/15/2021 3:15 AM, Yi Zhang wrote: >>>>>>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>>>>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote: >>>>>>>>>>>> Hi Sagi >>>>>>>>>>>> It is still reproducible with the change, here is the log: >>>>>>>>>>>> >>>>>>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>>>>>> >>>>>>>>>>>> real 0m12.973s >>>>>>>>>>>> user 0m0.000s >>>>>>>>>>>> sys 0m0.006s >>>>>>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>>>>>> >>>>>>>>>>>> real 1m15.606s >>>>>>>>>>>> user 0m0.000s >>>>>>>>>>>> sys 0m0.007s >>>>>>>>>>> Does it speed up if you use less queues? (i.e. connect with -i 4) ? >>>>>>>>>> Yes, with -i 4, it has stablee 1.3s >>>>>>>>>> # time nvme reset /dev/nvme0 >>>>>>>>> So it appears that destroying a qp takes a long time on >>>>>>>>> IB for some reason... >>>>>>>>> >>>>>>>>>> real 0m1.225s >>>>>>>>>> user 0m0.000s >>>>>>>>>> sys 0m0.007s >>>>>>>>>> >>>>>>>>>>>> # dmesg | grep nvme >>>>>>>>>>>> [ 900.634877] nvme nvme0: resetting controller >>>>>>>>>>>> [ 909.026958] nvme nvme0: creating 40 I/O queues. >>>>>>>>>>>> [ 913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues. 
>>>>>>>>>>>> [ 917.600993] nvme nvme0: resetting controller >>>>>>>>>>>> [ 988.562230] nvme nvme0: I/O 2 QID 0 timeout >>>>>>>>>>>> [ 988.567607] nvme nvme0: Property Set error: 881, offset 0x14 >>>>>>>>>>>> [ 988.608181] nvme nvme0: creating 40 I/O queues. >>>>>>>>>>>> [ 993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues. >>>>>>>>>>>> >>>>>>>>>>>> BTW, this issue cannot be reproduced on my NVME/ROCE environment. >>>>>>>>>>> Then I think that we need the rdma folks to help here... >>>>>>>>> Max? >>>>>>>> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+. >>>>>>>> >>>>>>>> Can you try repro with latest versions please ? >>>>>>>> >>>>>>>> Or give the exact scenario ? >>>>>>> Yeah, both target and client are using Mellanox Technologies MT27700 >>>>>>> Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the >>>>>>> first time reset will take 12s, and it always can be reproduced at the >>>>>>> second reset operation. >>>>>> I created a target with 1 namespace backed by null_blk and connected to >>>>>> it from the same server in loopback rdma connection using the ConnectX-4 >>>>>> adapter. >>>>> Could you share your loop.json file so I can try it on my environment? 
>>>> >>>> { >>>> "hosts": [], >>>> "ports": [ >>>> { >>>> "addr": { >>>> "adrfam": "ipv4", >>>> "traddr": "<ip>", >>>> "treq": "not specified", >>>> "trsvcid": "4420", >>>> "trtype": "rdma" >>>> }, >>>> "portid": 1, >>>> "referrals": [], >>>> "subsystems": [ >>>> "testsubsystem_0" >>>> ] >>>> } >>>> ], >>>> "subsystems": [ >>>> { >>>> "allowed_hosts": [], >>>> "attr": { >>>> "allow_any_host": "1", >>>> "cntlid_max": "65519", >>>> "cntlid_min": "1", >>>> "model": "Linux", >>>> "serial": "3d83c78b76623f1d", >>>> "version": "1.3" >>>> }, >>>> "namespaces": [ >>>> { >>>> "device": { >>>> "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d", >>>> "path": "/dev/nullb0", >>>> "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460" >>>> }, >>>> "enable": 1, >>>> "nsid": 1 >>>> } >>>> ], >>>> "nqn": "testsubsystem_0" >>>> } >>>> ] >>>> } >>> >>> Thanks, I reproduced it with your json file on one server with >>> loopback rdma connection: >>> # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testsubsystem_0 >>> >>> real 0m4.557s >>> user 0m0.000s >>> sys 0m0.005s >>> # time nvme reset /dev/nvme0 >>> >>> real 0m13.176s >>> user 0m0.000s >>> sys 0m0.007s >>> # time nvme reset /dev/nvme0 >>> >>> real 1m16.577s >>> user 0m0.002s >>> sys 0m0.005s >>> # time nvme disconnect -n testsubsystem_0 >> >> What does: >> >> # rdma res show qp | grep -c ERR >> >> say, when it is slow? > > Hi Haakon > Here is the output during the reset operation: > > 38 > 33 > 28 > 24 > 19 > 14 > 9 > 4 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 0 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 1 > 0 > 0 > 0 OK. I had a hunch that many QPs in the ERR state could be the reason. But this is not many, so I rest my case :-) Thanks for trying though! 
Thxs, Håkon > >> >> >> >> Thxs, Håkon >> > > > > -- > Best Regards, > Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2021-12-17 11:19 ` Haakon Bugge @ 2022-02-14 9:47 ` Yi Zhang 2022-02-14 11:00 ` Chaitanya Kulkarni 2022-02-14 11:32 ` Sagi Grimberg 0 siblings, 2 replies; 45+ messages in thread From: Yi Zhang @ 2022-02-14 9:47 UTC (permalink / raw) To: Max Gurtovoy, Sagi Grimberg Cc: Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list Hi Sagi/Max Here are more findings with the bisect: The time for reset operation changed from 3s[1] to 12s[2] after commit[3], and after commit[4], the reset operation timeout at the second reset[5], let me know if you need any testing for it, thanks. [1] # time nvme reset /dev/nvme0 real 0m3.049s user 0m0.000s sys 0m0.006s [2] # time nvme reset /dev/nvme0 real 0m12.498s user 0m0.000s sys 0m0.006s [3] commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) Author: Max Gurtovoy <maxg@mellanox.com> Date: Tue May 19 17:05:56 2020 +0300 nvme-rdma: add metadata/T10-PI support [4] commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) Author: Hannes Reinecke <hare@suse.de> Date: Fri Apr 16 13:46:20 2021 +0200 nvme: sanitize KATO setting- [5] # time nvme reset /dev/nvme0 real 0m12.628s user 0m0.000s sys 0m0.006s # time nvme reset /dev/nvme0 real 1m15.617s user 0m0.000s sys 0m0.006s #dmesg [ 1866.068377] nvme nvme0: creating 40 I/O queues. [ 1870.367851] nvme nvme0: mapped 40/0/0 default/read/poll queues. [ 1870.370949] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.202:4420 [ 1930.981494] nvme nvme0: resetting controller [ 1939.316131] nvme nvme0: creating 40 I/O queues. [ 1943.605427] nvme nvme0: mapped 40/0/0 default/read/poll queues. [ 1953.708327] nvme nvme0: resetting controller [ 2024.985591] nvme nvme0: I/O 7 QID 0 timeout [ 2024.990979] nvme nvme0: Property Set error: 881, offset 0x14 [ 2025.031069] nvme nvme0: creating 40 I/O queues. [ 2029.321798] nvme nvme0: mapped 40/0/0 default/read/poll queues. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-14 9:47 ` Yi Zhang @ 2022-02-14 11:00 ` Chaitanya Kulkarni 2022-02-14 11:32 ` Sagi Grimberg 1 sibling, 0 replies; 45+ messages in thread From: Chaitanya Kulkarni @ 2022-02-14 11:00 UTC (permalink / raw) To: Yi Zhang Cc: Haakon Bugge, Max Gurtovoy, Sagi Grimberg, open list:NVM EXPRESS DRIVER, OFED mailing list, Max Gurtovoy On 2/14/22 01:47, Yi Zhang wrote: > Hi Sagi/Max > Here are more findings with the bisect: > > The time for reset operation changed from 3s[1] to 12s[2] after > commit[3], and after commit[4], the reset operation timeout at the > second reset[5], let me know if you need any testing for it, thanks. you can collect the perf numbers with function call graph for base case and problem case, that way you will know exactly where we are spending more time ... -ck > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-14 9:47 ` Yi Zhang 2022-02-14 11:00 ` Chaitanya Kulkarni @ 2022-02-14 11:32 ` Sagi Grimberg 2022-02-14 12:11 ` Max Gurtovoy 1 sibling, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2022-02-14 11:32 UTC (permalink / raw) To: Yi Zhang, Max Gurtovoy Cc: Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list > Hi Sagi/Max > Here are more findings with the bisect: > > The time for reset operation changed from 3s[1] to 12s[2] after > commit[3], and after commit[4], the reset operation timeout at the > second reset[5], let me know if you need any testing for it, thanks. Does this at least eliminate the timeout? -- diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a162f6c6da6e..60e415078893 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; extern unsigned int admin_timeout; #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) -#define NVME_DEFAULT_KATO 5 +#define NVME_DEFAULT_KATO 10 #ifdef CONFIG_ARCH_NO_SG_CHAIN #define NVME_INLINE_SG_CNT 0 -- > > [1] > # time nvme reset /dev/nvme0 > > real 0m3.049s > user 0m0.000s > sys 0m0.006s > [2] > # time nvme reset /dev/nvme0 > > real 0m12.498s > user 0m0.000s > sys 0m0.006s > [3] > commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) > Author: Max Gurtovoy <maxg@mellanox.com> > Date: Tue May 19 17:05:56 2020 +0300 > > nvme-rdma: add metadata/T10-PI support > > [4] > commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) > Author: Hannes Reinecke <hare@suse.de> > Date: Fri Apr 16 13:46:20 2021 +0200 > > nvme: sanitize KATO setting- This change effectively changed the keep-alive timeout from 15 to 5 and modified the host to send keepalives every 2.5 seconds instead of 5. I guess that in combination that now it takes longer to create and delete rdma resources (either qps or mrs) it starts to timeout in setups where there are a lot of queues. 
^ permalink raw reply related [flat|nested] 45+ messages in thread
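[Editor's note] The failure mode Sagi describes can be put in rough numbers. The sketch below is not from the thread; the per-QP teardown cost is an illustrative assumption, not a measured value, but it shows why shrinking KATO from 15 s to 5 s turns a slow 40-queue teardown into a target-side keep-alive expiry.

```python
def keepalive_interval(kato_s: float) -> float:
    """Host-side keep-alive send interval; kato / 2 after the commit."""
    return kato_s / 2


def teardown_outlives_kato(n_queues: int, per_qp_teardown_s: float,
                           kato_s: float) -> bool:
    """True if serially tearing down n_queues QPs exceeds the target KATO,
    i.e. the target sees no keep-alive before its timer expires."""
    return n_queues * per_qp_teardown_s > kato_s


# After the commit: KATO 5 s, keep-alives queued every 2.5 s.
assert keepalive_interval(5) == 2.5

# Illustrative: 40 queues at an assumed ~200 ms of teardown each is 8 s,
# which blows the 5 s budget but fits comfortably in the old 15 s one --
# matching the target's "keep-alive timer (5 seconds) expired!" log.
assert teardown_outlives_kato(40, 0.2, kato_s=5) is True
assert teardown_outlives_kato(40, 0.2, kato_s=15) is False
```

This also lines up with the earlier observation that connecting with `-i 4` (fewer queues) brought the reset back down to ~1.3 s.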
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-14 11:32 ` Sagi Grimberg @ 2022-02-14 12:11 ` Max Gurtovoy 2022-02-15 13:52 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2022-02-14 12:11 UTC (permalink / raw) To: Sagi Grimberg, Yi Zhang Cc: Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list On 2/14/2022 1:32 PM, Sagi Grimberg wrote: > >> Hi Sagi/Max >> Here are more findings with the bisect: >> >> The time for reset operation changed from 3s[1] to 12s[2] after >> commit[3], and after commit[4], the reset operation timeout at the >> second reset[5], let me know if you need any testing for it, thanks. > > Does this at least eliminate the timeout? > -- > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > index a162f6c6da6e..60e415078893 100644 > --- a/drivers/nvme/host/nvme.h > +++ b/drivers/nvme/host/nvme.h > @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; > extern unsigned int admin_timeout; > #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) > > -#define NVME_DEFAULT_KATO 5 > +#define NVME_DEFAULT_KATO 10 > > #ifdef CONFIG_ARCH_NO_SG_CHAIN > #define NVME_INLINE_SG_CNT 0 > -- > or for the initial test you can use --keep-alive-tmo=<10 or 15> flag in the connect command >> >> [1] >> # time nvme reset /dev/nvme0 >> >> real 0m3.049s >> user 0m0.000s >> sys 0m0.006s >> [2] >> # time nvme reset /dev/nvme0 >> >> real 0m12.498s >> user 0m0.000s >> sys 0m0.006s >> [3] >> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) >> Author: Max Gurtovoy <maxg@mellanox.com> >> Date: Tue May 19 17:05:56 2020 +0300 >> >> nvme-rdma: add metadata/T10-PI support >> >> [4] >> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) >> Author: Hannes Reinecke <hare@suse.de> >> Date: Fri Apr 16 13:46:20 2021 +0200 >> >> nvme: sanitize KATO setting- > > This change effectively changed the keep-alive timeout > from 15 to 5 and modified the host to send keepalives every > 2.5 seconds instead of 
5. > > I guess that in combination that now it takes longer to > create and delete rdma resources (either qps or mrs) > it starts to timeout in setups where there are a lot of > queues. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-14 12:11 ` Max Gurtovoy @ 2022-02-15 13:52 ` Yi Zhang 2022-02-15 14:30 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2022-02-15 13:52 UTC (permalink / raw) To: Max Gurtovoy Cc: Sagi Grimberg, Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list Hi Sagi/Max Changing the value to 10 or 15 fixed the timeout issue. And the reset operation still needs more than 12s on my environment, I also tried disabling the pi_enable, the reset operation will be back to 3s, so seems the added 9s was due to the PI enabled code path. On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > > On 2/14/2022 1:32 PM, Sagi Grimberg wrote: > > > >> Hi Sagi/Max > >> Here are more findings with the bisect: > >> > >> The time for reset operation changed from 3s[1] to 12s[2] after > >> commit[3], and after commit[4], the reset operation timeout at the > >> second reset[5], let me know if you need any testing for it, thanks. > > > > Does this at least eliminate the timeout? 
> > -- > > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > > index a162f6c6da6e..60e415078893 100644 > > --- a/drivers/nvme/host/nvme.h > > +++ b/drivers/nvme/host/nvme.h > > @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; > > extern unsigned int admin_timeout; > > #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) > > > > -#define NVME_DEFAULT_KATO 5 > > +#define NVME_DEFAULT_KATO 10 > > > > #ifdef CONFIG_ARCH_NO_SG_CHAIN > > #define NVME_INLINE_SG_CNT 0 > > -- > > > or for the initial test you can use --keep-alive-tmo=<10 or 15> flag in > the connect command > > >> > >> [1] > >> # time nvme reset /dev/nvme0 > >> > >> real 0m3.049s > >> user 0m0.000s > >> sys 0m0.006s > >> [2] > >> # time nvme reset /dev/nvme0 > >> > >> real 0m12.498s > >> user 0m0.000s > >> sys 0m0.006s > >> [3] > >> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) > >> Author: Max Gurtovoy <maxg@mellanox.com> > >> Date: Tue May 19 17:05:56 2020 +0300 > >> > >> nvme-rdma: add metadata/T10-PI support > >> > >> [4] > >> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) > >> Author: Hannes Reinecke <hare@suse.de> > >> Date: Fri Apr 16 13:46:20 2021 +0200 > >> > >> nvme: sanitize KATO setting- > > > > This change effectively changed the keep-alive timeout > > from 15 to 5 and modified the host to send keepalives every > > 2.5 seconds instead of 5. > > > > I guess that in combination that now it takes longer to > > create and delete rdma resources (either qps or mrs) > > it starts to timeout in setups where there are a lot of > > queues. > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
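[Editor's note] Max's suggested workaround can be applied at connect time without patching the kernel. The sketch below only prints the invocation rather than executing it, so it is safe to run without a target; the address and NQN mirror the thread's loopback setup and would need adapting.

```shell
# Build the connect command with an explicit keep-alive timeout in place
# of the 5 s default, per Max's --keep-alive-tmo suggestion above.
connect_cmd() {
    printf 'nvme connect -t rdma -a %s -s 4420 -n %s --keep-alive-tmo=%d\n' \
        "172.31.0.202" "testsubsystem_0" 15
}

connect_cmd
```

With a 10 or 15 second timeout Yi reports the reset no longer times out, though the reset itself still takes ~12 s unless T10-PI is disabled.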
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-15 13:52 ` Yi Zhang @ 2022-02-15 14:30 ` Max Gurtovoy 2022-02-21 10:00 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2022-02-15 14:30 UTC (permalink / raw) To: Yi Zhang Cc: Sagi Grimberg, Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list [-- Attachment #1: Type: text/plain, Size: 2820 bytes --] Thanks Yi Zhang. Few years ago I've sent some patches that were supposed to fix the KA mechanism but eventually they weren't accepted. I haven't tested it since but maybe you can run some tests with it. The attached patches are partial and include only rdma transport for your testing. If it work for you we can work on it again and argue for correctness. Please don't use the workaround we suggested earlier with these patches. -Max. On 2/15/2022 3:52 PM, Yi Zhang wrote: > Hi Sagi/Max > > Changing the value to 10 or 15 fixed the timeout issue. > And the reset operation still needs more than 12s on my environment, I > also tried disabling the pi_enable, the reset operation will be back > to 3s, so seems the added 9s was due to the PI enabled code path. > > On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> >> On 2/14/2022 1:32 PM, Sagi Grimberg wrote: >>>> Hi Sagi/Max >>>> Here are more findings with the bisect: >>>> >>>> The time for reset operation changed from 3s[1] to 12s[2] after >>>> commit[3], and after commit[4], the reset operation timeout at the >>>> second reset[5], let me know if you need any testing for it, thanks. >>> Does this at least eliminate the timeout? 
>>> -- >>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h >>> index a162f6c6da6e..60e415078893 100644 >>> --- a/drivers/nvme/host/nvme.h >>> +++ b/drivers/nvme/host/nvme.h >>> @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; >>> extern unsigned int admin_timeout; >>> #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) >>> >>> -#define NVME_DEFAULT_KATO 5 >>> +#define NVME_DEFAULT_KATO 10 >>> >>> #ifdef CONFIG_ARCH_NO_SG_CHAIN >>> #define NVME_INLINE_SG_CNT 0 >>> -- >>> >> or for the initial test you can use --keep-alive-tmo=<10 or 15> flag in >> the connect command >> >>>> [1] >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 0m3.049s >>>> user 0m0.000s >>>> sys 0m0.006s >>>> [2] >>>> # time nvme reset /dev/nvme0 >>>> >>>> real 0m12.498s >>>> user 0m0.000s >>>> sys 0m0.006s >>>> [3] >>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) >>>> Author: Max Gurtovoy <maxg@mellanox.com> >>>> Date: Tue May 19 17:05:56 2020 +0300 >>>> >>>> nvme-rdma: add metadata/T10-PI support >>>> >>>> [4] >>>> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) >>>> Author: Hannes Reinecke <hare@suse.de> >>>> Date: Fri Apr 16 13:46:20 2021 +0200 >>>> >>>> nvme: sanitize KATO setting- >>> This change effectively changed the keep-alive timeout >>> from 15 to 5 and modified the host to send keepalives every >>> 2.5 seconds instead of 5. >>> >>> I guess that in combination that now it takes longer to >>> create and delete rdma resources (either qps or mrs) >>> it starts to timeout in setups where there are a lot of >>> queues. > [-- Attachment #2: 0001-Revert-nvme-unexport-nvme_start_keep_alive.patch --] [-- Type: text/plain, Size: 1802 bytes --] From 4f5629786ceaa6912d5c585c18cc5d83fa346ea9 Mon Sep 17 00:00:00 2001 From: Max Gurtovoy <mgurtovoy@nvidia.com> Date: Tue, 10 Apr 2018 15:21:29 +0000 Subject: [PATCH 1/3] Revert "nvme: unexport nvme_start_keep_alive" This reverts commit c8799eee39e7523e5e0be10f8950b11cb66085bd. 
nvme_start_keep_alive() will be used by the transport drivers to fix keep-alive synchronization between NVMe-oF target/host. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> --- drivers/nvme/host/core.c | 3 ++- drivers/nvme/host/nvme.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 961a5f8a44d2..46f2d85d4f31 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1284,13 +1284,14 @@ static void nvme_keep_alive_work(struct work_struct *work) blk_execute_rq_nowait(rq, false, nvme_keep_alive_end_io); } -static void nvme_start_keep_alive(struct nvme_ctrl *ctrl) +void nvme_start_keep_alive(struct nvme_ctrl *ctrl) { if (unlikely(ctrl->kato == 0)) return; nvme_queue_keep_alive_work(ctrl); } +EXPORT_SYMBOL_GPL(nvme_start_keep_alive); void nvme_stop_keep_alive(struct nvme_ctrl *ctrl) { diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a162f6c6da6e..1610ec764bfc 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -729,6 +729,7 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid, unsigned int dword11, void *buffer, size_t buflen, u32 *result); int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count); +void nvme_start_keep_alive(struct nvme_ctrl *ctrl); void nvme_stop_keep_alive(struct nvme_ctrl *ctrl); int nvme_reset_ctrl(struct nvme_ctrl *ctrl); int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl); -- 2.18.1 [-- Attachment #3: 0002-nvme-remove-association-between-ctrl-and-keep-alive.patch --] [-- Type: text/plain, Size: 1135 bytes --] From cbd2e21bc4487e345d37aa643404eab495c05e25 Mon Sep 17 00:00:00 2001 From: Max Gurtovoy <mgurtovoy@nvidia.com> Date: Tue, 10 Apr 2018 16:30:18 +0000 Subject: [PATCH 2/3] nvme: remove association between ctrl and keep-alive Keep-alive mechanism is an admin queue property and should be activated/deactivated during admin queue creation/destruction. 
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> --- drivers/nvme/host/core.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 46f2d85d4f31..34b006a77270 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4389,7 +4389,6 @@ EXPORT_SYMBOL_GPL(nvme_complete_async_event); void nvme_stop_ctrl(struct nvme_ctrl *ctrl) { nvme_mpath_stop(ctrl); - nvme_stop_keep_alive(ctrl); nvme_stop_failfast_work(ctrl); flush_work(&ctrl->async_event_work); cancel_work_sync(&ctrl->fw_act_work); @@ -4398,8 +4397,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl); void nvme_start_ctrl(struct nvme_ctrl *ctrl) { - nvme_start_keep_alive(ctrl); - nvme_enable_aen(ctrl); if (ctrl->queue_count > 1) { -- 2.18.1 [-- Attachment #4: 0003-nvme-rdma-add-keep-alive-mechanism-as-admin_q-proper.patch --] [-- Type: text/plain, Size: 1546 bytes --] From 3a40795e1c47aae8297a1dcaa821f63fea5e4baa Mon Sep 17 00:00:00 2001 From: Max Gurtovoy <mgurtovoy@nvidia.com> Date: Tue, 10 Apr 2018 16:33:06 +0000 Subject: [PATCH 3/3] nvme-rdma: add keep-alive mechanism as admin_q property Activate/deactivate it during admin queue creation/destruction and remove association to nvme ctrl. 
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> --- drivers/nvme/host/rdma.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 993e3a076a41..ab71f7d3d6a5 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -925,6 +925,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, if (error) goto out_quiesce_queue; + nvme_start_keep_alive(&ctrl->ctrl); + return 0; out_quiesce_queue: @@ -1026,6 +1028,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new) static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove) { + nvme_stop_keep_alive(&ctrl->ctrl); nvme_stop_admin_queue(&ctrl->ctrl); blk_sync_queue(ctrl->ctrl.admin_q); nvme_rdma_stop_queue(&ctrl->queues[0]); @@ -1199,7 +1202,6 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work) struct nvme_rdma_ctrl *ctrl = container_of(work, struct nvme_rdma_ctrl, err_work); - nvme_stop_keep_alive(&ctrl->ctrl); flush_work(&ctrl->ctrl.async_event_work); nvme_rdma_teardown_io_queues(ctrl, false); nvme_start_queues(&ctrl->ctrl); -- 2.18.1 ^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-15 14:30 ` Max Gurtovoy @ 2022-02-21 10:00 ` Yi Zhang 2022-02-23 10:04 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2022-02-21 10:00 UTC (permalink / raw) To: Max Gurtovoy Cc: Sagi Grimberg, Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list Hi Max The patch fixed the timeout issue when I use one non-debug kernel, but when I tested on debug kernel with your patches, the timeout still can be triggered with "nvme reset/nvme disconnect-all" operations. On Tue, Feb 15, 2022 at 10:31 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > Thanks Yi Zhang. > > Few years ago I've sent some patches that were supposed to fix the KA > mechanism but eventually they weren't accepted. > > I haven't tested it since but maybe you can run some tests with it. > > The attached patches are partial and include only rdma transport for > your testing. > > If it work for you we can work on it again and argue for correctness. > > Please don't use the workaround we suggested earlier with these patches. > > -Max. > > On 2/15/2022 3:52 PM, Yi Zhang wrote: > > Hi Sagi/Max > > > > Changing the value to 10 or 15 fixed the timeout issue. > > And the reset operation still needs more than 12s on my environment, I > > also tried disabling the pi_enable, the reset operation will be back > > to 3s, so seems the added 9s was due to the PI enabled code path. > > > > On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >> > >> On 2/14/2022 1:32 PM, Sagi Grimberg wrote: > >>>> Hi Sagi/Max > >>>> Here are more findings with the bisect: > >>>> > >>>> The time for reset operation changed from 3s[1] to 12s[2] after > >>>> commit[3], and after commit[4], the reset operation timeout at the > >>>> second reset[5], let me know if you need any testing for it, thanks. > >>> Does this at least eliminate the timeout? 
> >>> -- > >>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > >>> index a162f6c6da6e..60e415078893 100644 > >>> --- a/drivers/nvme/host/nvme.h > >>> +++ b/drivers/nvme/host/nvme.h > >>> @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; > >>> extern unsigned int admin_timeout; > >>> #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) > >>> > >>> -#define NVME_DEFAULT_KATO 5 > >>> +#define NVME_DEFAULT_KATO 10 > >>> > >>> #ifdef CONFIG_ARCH_NO_SG_CHAIN > >>> #define NVME_INLINE_SG_CNT 0 > >>> -- > >>> > >> or for the initial test you can use --keep-alive-tmo=<10 or 15> flag in > >> the connect command > >> > >>>> [1] > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 0m3.049s > >>>> user 0m0.000s > >>>> sys 0m0.006s > >>>> [2] > >>>> # time nvme reset /dev/nvme0 > >>>> > >>>> real 0m12.498s > >>>> user 0m0.000s > >>>> sys 0m0.006s > >>>> [3] > >>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) > >>>> Author: Max Gurtovoy <maxg@mellanox.com> > >>>> Date: Tue May 19 17:05:56 2020 +0300 > >>>> > >>>> nvme-rdma: add metadata/T10-PI support > >>>> > >>>> [4] > >>>> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) > >>>> Author: Hannes Reinecke <hare@suse.de> > >>>> Date: Fri Apr 16 13:46:20 2021 +0200 > >>>> > >>>> nvme: sanitize KATO setting- > >>> This change effectively changed the keep-alive timeout > >>> from 15 to 5 and modified the host to send keepalives every > >>> 2.5 seconds instead of 5. > >>> > >>> I guess that in combination that now it takes longer to > >>> create and delete rdma resources (either qps or mrs) > >>> it starts to timeout in setups where there are a lot of > >>> queues. > > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-21 10:00 ` Yi Zhang @ 2022-02-23 10:04 ` Max Gurtovoy 2022-02-23 10:30 ` Sagi Grimberg 2022-03-01 0:06 ` Yi Zhang 0 siblings, 2 replies; 45+ messages in thread From: Max Gurtovoy @ 2022-02-23 10:04 UTC (permalink / raw) To: Yi Zhang Cc: Sagi Grimberg, Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list Hi Yi Zhang, thanks for testing the patches. Can you provide more info on the time it took with both kernels ? The patches don't intend to decrease this time but re-start the KA in early stage - as soon as we create the AQ. I guess we need to debug it offline. On 2/21/2022 12:00 PM, Yi Zhang wrote: > Hi Max > > The patch fixed the timeout issue when I use one non-debug kernel, > but when I tested on debug kernel with your patches, the timeout still > can be triggered with "nvme reset/nvme disconnect-all" operations. > > On Tue, Feb 15, 2022 at 10:31 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> Thanks Yi Zhang. >> >> Few years ago I've sent some patches that were supposed to fix the KA >> mechanism but eventually they weren't accepted. >> >> I haven't tested it since but maybe you can run some tests with it. >> >> The attached patches are partial and include only rdma transport for >> your testing. >> >> If it work for you we can work on it again and argue for correctness. >> >> Please don't use the workaround we suggested earlier with these patches. >> >> -Max. >> >> On 2/15/2022 3:52 PM, Yi Zhang wrote: >>> Hi Sagi/Max >>> >>> Changing the value to 10 or 15 fixed the timeout issue. >>> And the reset operation still needs more than 12s on my environment, I >>> also tried disabling the pi_enable, the reset operation will be back >>> to 3s, so seems the added 9s was due to the PI enabled code path. 
>>> >>> On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>>> On 2/14/2022 1:32 PM, Sagi Grimberg wrote: >>>>>> Hi Sagi/Max >>>>>> Here are more findings with the bisect: >>>>>> >>>>>> The time for reset operation changed from 3s[1] to 12s[2] after >>>>>> commit[3], and after commit[4], the reset operation timeout at the >>>>>> second reset[5], let me know if you need any testing for it, thanks. >>>>> Does this at least eliminate the timeout? >>>>> -- >>>>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h >>>>> index a162f6c6da6e..60e415078893 100644 >>>>> --- a/drivers/nvme/host/nvme.h >>>>> +++ b/drivers/nvme/host/nvme.h >>>>> @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; >>>>> extern unsigned int admin_timeout; >>>>> #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) >>>>> >>>>> -#define NVME_DEFAULT_KATO 5 >>>>> +#define NVME_DEFAULT_KATO 10 >>>>> >>>>> #ifdef CONFIG_ARCH_NO_SG_CHAIN >>>>> #define NVME_INLINE_SG_CNT 0 >>>>> -- >>>>> >>>> or for the initial test you can use --keep-alive-tmo=<10 or 15> flag in >>>> the connect command >>>> >>>>>> [1] >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 0m3.049s >>>>>> user 0m0.000s >>>>>> sys 0m0.006s >>>>>> [2] >>>>>> # time nvme reset /dev/nvme0 >>>>>> >>>>>> real 0m12.498s >>>>>> user 0m0.000s >>>>>> sys 0m0.006s >>>>>> [3] >>>>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) >>>>>> Author: Max Gurtovoy <maxg@mellanox.com> >>>>>> Date: Tue May 19 17:05:56 2020 +0300 >>>>>> >>>>>> nvme-rdma: add metadata/T10-PI support >>>>>> >>>>>> [4] >>>>>> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) >>>>>> Author: Hannes Reinecke <hare@suse.de> >>>>>> Date: Fri Apr 16 13:46:20 2021 +0200 >>>>>> >>>>>> nvme: sanitize KATO setting- >>>>> This change effectively changed the keep-alive timeout >>>>> from 15 to 5 and modified the host to send keepalives every >>>>> 2.5 seconds instead of 5. 
>>>>> >>>>> I guess that in combination that now it takes longer to >>>>> create and delete rdma resources (either qps or mrs) >>>>> it starts to timeout in setups where there are a lot of >>>>> queues. > > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-23 10:04 ` Max Gurtovoy @ 2022-02-23 10:30 ` Sagi Grimberg 2022-02-23 11:20 ` Max Gurtovoy 2022-03-01 0:06 ` Yi Zhang 1 sibling, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2022-02-23 10:30 UTC (permalink / raw) To: Max Gurtovoy, Yi Zhang Cc: Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list > Hi Yi Zhang, > > thanks for testing the patches. > > Can you provide more info on the time it took with both kernels ? > > The patches don't intend to decrease this time but re-start the KA in > early stage - as soon as we create the AQ. Still not sure why this addresses the problem, because every io queue connect should reset the keep alive timer in the target. But if at all, just move the keep alive start to nvme_init_ctrl_finish don't expose it to drivers... ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-23 10:30 ` Sagi Grimberg @ 2022-02-23 11:20 ` Max Gurtovoy 0 siblings, 0 replies; 45+ messages in thread From: Max Gurtovoy @ 2022-02-23 11:20 UTC (permalink / raw) To: Sagi Grimberg, Yi Zhang Cc: Haakon Bugge, Max Gurtovoy, open list:NVM EXPRESS DRIVER, OFED mailing list On 2/23/2022 12:30 PM, Sagi Grimberg wrote: > >> Hi Yi Zhang, >> >> thanks for testing the patches. >> >> Can you provide more info on the time it took with both kernels ? >> >> The patches don't intend to decrease this time but re-start the KA in >> early stage - as soon as we create the AQ. > > Still not sure why this addresses the problem, because every io queue > connect should reset the keep alive timer in the target. Right, in the NVMf connect. Not in transport connect. You first allocate all IO queues (takes time) and only then nvmf_connect all IO queues. In this time you probably get the timeout I guess. between nvme_rdma_alloc_io_queues and nvme_rdma_start_io_queues there is no reason the admin_q can't send keep-alives. This is what I tried pushing upstream few years ago... Anyway, we shouldn't assume anything about the target implementation. We need to do our best to have a working initiator logic. > > But if at all, just move the keep alive start to nvme_init_ctrl_finish > don't expose it to drivers... Yes, back then there was no nvme_init_ctrl_finish code. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-02-23 10:04 ` Max Gurtovoy 2022-02-23 10:30 ` Sagi Grimberg @ 2022-03-01 0:06 ` Yi Zhang 2022-03-16 15:16 ` Sagi Grimberg 1 sibling, 1 reply; 45+ messages in thread From: Yi Zhang @ 2022-03-01 0:06 UTC (permalink / raw) To: Max Gurtovoy, Max Gurtovoy Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list, Sagi Grimberg On Wed, Feb 23, 2022 at 6:04 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > Hi Yi Zhang, > > thanks for testing the patches. > > Can you provide more info on the time it took with both kernels ? Hi Max Sorry for the late response, here are the test results/dmesg on debug/non-debug kernel with your patch: debug kernel: timeout # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn real 0m16.956s user 0m0.000s sys 0m0.237s # time nvme reset /dev/nvme0 real 1m33.623s user 0m0.000s sys 0m0.024s # time nvme disconnect-all real 1m26.640s user 0m0.000s sys 0m9.969s host dmesg: https://pastebin.com/8T3Lqtkn target dmesg: https://pastebin.com/KpFP7xG2 non-debug kernel: no timeout issue, but still 12s for reset, and 8s for disconnect host: # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn real 0m4.579s user 0m0.000s sys 0m0.004s # time nvme reset /dev/nvme0 real 0m12.778s user 0m0.000s sys 0m0.006s # time nvme reset /dev/nvme0 real 0m12.793s user 0m0.000s sys 0m0.006s # time nvme reset /dev/nvme0 real 0m12.808s user 0m0.000s sys 0m0.006s # time nvme disconnect-all real 0m8.348s user 0m0.000s sys 0m0.189s > > The patches don't intend to decrease this time but re-start the KA in > early stage - as soon as we create the AQ. > > I guess we need to debug it offline. > > On 2/21/2022 12:00 PM, Yi Zhang wrote: > > Hi Max > > > > The patch fixed the timeout issue when I use one non-debug kernel, > > but when I tested on debug kernel with your patches, the timeout still > > can be triggered with "nvme reset/nvme disconnect-all" operations. 
> > > > On Tue, Feb 15, 2022 at 10:31 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >> Thanks Yi Zhang. > >> > >> Few years ago I've sent some patches that were supposed to fix the KA > >> mechanism but eventually they weren't accepted. > >> > >> I haven't tested it since but maybe you can run some tests with it. > >> > >> The attached patches are partial and include only rdma transport for > >> your testing. > >> > >> If it work for you we can work on it again and argue for correctness. > >> > >> Please don't use the workaround we suggested earlier with these patches. > >> > >> -Max. > >> > >> On 2/15/2022 3:52 PM, Yi Zhang wrote: > >>> Hi Sagi/Max > >>> > >>> Changing the value to 10 or 15 fixed the timeout issue. > >>> And the reset operation still needs more than 12s on my environment, I > >>> also tried disabling the pi_enable, the reset operation will be back > >>> to 3s, so seems the added 9s was due to the PI enabled code path. > >>> > >>> On Mon, Feb 14, 2022 at 8:12 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >>>> On 2/14/2022 1:32 PM, Sagi Grimberg wrote: > >>>>>> Hi Sagi/Max > >>>>>> Here are more findings with the bisect: > >>>>>> > >>>>>> The time for reset operation changed from 3s[1] to 12s[2] after > >>>>>> commit[3], and after commit[4], the reset operation timeout at the > >>>>>> second reset[5], let me know if you need any testing for it, thanks. > >>>>> Does this at least eliminate the timeout? 
> >>>>> -- > >>>>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > >>>>> index a162f6c6da6e..60e415078893 100644 > >>>>> --- a/drivers/nvme/host/nvme.h > >>>>> +++ b/drivers/nvme/host/nvme.h > >>>>> @@ -25,7 +25,7 @@ extern unsigned int nvme_io_timeout; > >>>>> extern unsigned int admin_timeout; > >>>>> #define NVME_ADMIN_TIMEOUT (admin_timeout * HZ) > >>>>> > >>>>> -#define NVME_DEFAULT_KATO 5 > >>>>> +#define NVME_DEFAULT_KATO 10 > >>>>> > >>>>> #ifdef CONFIG_ARCH_NO_SG_CHAIN > >>>>> #define NVME_INLINE_SG_CNT 0 > >>>>> -- > >>>>> > >>>> or for the initial test you can use --keep-alive-tmo=<10 or 15> flag in > >>>> the connect command > >>>> > >>>>>> [1] > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 0m3.049s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.006s > >>>>>> [2] > >>>>>> # time nvme reset /dev/nvme0 > >>>>>> > >>>>>> real 0m12.498s > >>>>>> user 0m0.000s > >>>>>> sys 0m0.006s > >>>>>> [3] > >>>>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc (HEAD) > >>>>>> Author: Max Gurtovoy <maxg@mellanox.com> > >>>>>> Date: Tue May 19 17:05:56 2020 +0300 > >>>>>> > >>>>>> nvme-rdma: add metadata/T10-PI support > >>>>>> > >>>>>> [4] > >>>>>> commit a70b81bd4d9d2d6c05cfe6ef2a10bccc2e04357a (HEAD) > >>>>>> Author: Hannes Reinecke <hare@suse.de> > >>>>>> Date: Fri Apr 16 13:46:20 2021 +0200 > >>>>>> > >>>>>> nvme: sanitize KATO setting- > >>>>> This change effectively changed the keep-alive timeout > >>>>> from 15 to 5 and modified the host to send keepalives every > >>>>> 2.5 seconds instead of 5. > >>>>> > >>>>> I guess that in combination that now it takes longer to > >>>>> create and delete rdma resources (either qps or mrs) > >>>>> it starts to timeout in setups where there are a lot of > >>>>> queues. > > > > > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-03-01 0:06 ` Yi Zhang @ 2022-03-16 15:16 ` Sagi Grimberg 2022-03-19 7:29 ` Yi Zhang 0 siblings, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2022-03-16 15:16 UTC (permalink / raw) To: Yi Zhang, Max Gurtovoy, Max Gurtovoy Cc: open list:NVM EXPRESS DRIVER, RDMA mailing list >> Hi Yi Zhang, >> >> thanks for testing the patches. >> >> Can you provide more info on the time it took with both kernels ? > > Hi Max > Sorry for the late response, here are the test results/dmesg on > debug/non-debug kernel with your patch: > debug kernel: timeout > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn > real 0m16.956s > user 0m0.000s > sys 0m0.237s > # time nvme reset /dev/nvme0 > real 1m33.623s > user 0m0.000s > sys 0m0.024s > # time nvme disconnect-all > real 1m26.640s > user 0m0.000s > sys 0m9.969s > > host dmesg: > https://pastebin.com/8T3Lqtkn > target dmesg: > https://pastebin.com/KpFP7xG2 > > non-debug kernel: no timeout issue, but still 12s for reset, and 8s > for disconnect > host: > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn > > real 0m4.579s > user 0m0.000s > sys 0m0.004s > # time nvme reset /dev/nvme0 > > real 0m12.778s > user 0m0.000s > sys 0m0.006s > # time nvme reset /dev/nvme0 > > real 0m12.793s > user 0m0.000s > sys 0m0.006s > # time nvme reset /dev/nvme0 > > real 0m12.808s > user 0m0.000s > sys 0m0.006s > # time nvme disconnect-all > > real 0m8.348s > user 0m0.000s > sys 0m0.189s These are very long times for a non-debug kernel... Max, do you see the root cause for this? Yi, does this happen with rxe/siw as well? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-03-16 15:16 ` Sagi Grimberg @ 2022-03-19 7:29 ` Yi Zhang 2022-03-20 10:50 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Yi Zhang @ 2022-03-19 7:29 UTC (permalink / raw) To: Sagi Grimberg Cc: Max Gurtovoy, Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On Wed, Mar 16, 2022 at 11:16 PM Sagi Grimberg <sagi@grimberg.me> wrote: > > > >> Hi Yi Zhang, > >> > >> thanks for testing the patches. > >> > >> Can you provide more info on the time it took with both kernels ? > > > > Hi Max > > Sorry for the late response, here are the test results/dmesg on > > debug/non-debug kernel with your patch: > > debug kernel: timeout > > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn > > real 0m16.956s > > user 0m0.000s > > sys 0m0.237s > > # time nvme reset /dev/nvme0 > > real 1m33.623s > > user 0m0.000s > > sys 0m0.024s > > # time nvme disconnect-all > > real 1m26.640s > > user 0m0.000s > > sys 0m9.969s > > > > host dmesg: > > https://pastebin.com/8T3Lqtkn > > target dmesg: > > https://pastebin.com/KpFP7xG2 > > > > non-debug kernel: no timeout issue, but still 12s for reset, and 8s > > for disconnect > > host: > > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn > > > > real 0m4.579s > > user 0m0.000s > > sys 0m0.004s > > # time nvme reset /dev/nvme0 > > > > real 0m12.778s > > user 0m0.000s > > sys 0m0.006s > > # time nvme reset /dev/nvme0 > > > > real 0m12.793s > > user 0m0.000s > > sys 0m0.006s > > # time nvme reset /dev/nvme0 > > > > real 0m12.808s > > user 0m0.000s > > sys 0m0.006s > > # time nvme disconnect-all > > > > real 0m8.348s > > user 0m0.000s > > sys 0m0.189s > > These are very long times for a non-debug kernel... > Max, do you see the root cause for this? > > Yi, does this happen with rxe/siw as well? 
Hi Sagi rxe/siw will take less than 1s with rdma_rxe # time nvme reset /dev/nvme0 real 0m0.094s user 0m0.000s sys 0m0.006s with siw # time nvme reset /dev/nvme0 real 0m0.097s user 0m0.000s sys 0m0.006s This is only reproducible with mlx IB card, as I mentioned before, the reset operation time changed from 3s to 12s after the below commit, could you check this commit? commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc Author: Max Gurtovoy <maxg@mellanox.com> Date: Tue May 19 17:05:56 2020 +0300 nvme-rdma: add metadata/T10-PI support > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-03-19 7:29 ` Yi Zhang @ 2022-03-20 10:50 ` Max Gurtovoy 2022-03-20 13:03 ` Sagi Grimberg 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2022-03-20 10:50 UTC (permalink / raw) To: Yi Zhang, Sagi Grimberg Cc: Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On 3/19/2022 9:29 AM, Yi Zhang wrote: > On Wed, Mar 16, 2022 at 11:16 PM Sagi Grimberg <sagi@grimberg.me> wrote: >> >>>> Hi Yi Zhang, >>>> >>>> thanks for testing the patches. >>>> >>>> Can you provide more info on the time it took with both kernels ? >>> Hi Max >>> Sorry for the late response, here are the test results/dmesg on >>> debug/non-debug kernel with your patch: >>> debug kernel: timeout >>> # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn >>> real 0m16.956s >>> user 0m0.000s >>> sys 0m0.237s >>> # time nvme reset /dev/nvme0 >>> real 1m33.623s >>> user 0m0.000s >>> sys 0m0.024s >>> # time nvme disconnect-all >>> real 1m26.640s >>> user 0m0.000s >>> sys 0m9.969s >>> >>> host dmesg: >>> https://pastebin.com/8T3Lqtkn >>> target dmesg: >>> https://pastebin.com/KpFP7xG2 >>> >>> non-debug kernel: no timeout issue, but still 12s for reset, and 8s >>> for disconnect >>> host: >>> # time 
nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testnqn >>> >>> real 0m4.579s >>> user 0m0.000s >>> sys 0m0.004s >>> # time nvme reset /dev/nvme0 >>> >>> real 0m12.778s >>> user 0m0.000s >>> sys 0m0.006s >>> # time nvme reset /dev/nvme0 >>> >>> real 0m12.793s >>> user 0m0.000s >>> sys 0m0.006s >>> # time nvme reset /dev/nvme0 >>> >>> real 0m12.808s >>> user 0m0.000s >>> sys 0m0.006s >>> # time nvme disconnect-all >>> >>> real 0m8.348s >>> user 0m0.000s >>> sys 0m0.189s >> These are very long times for a non-debug kernel... >> Max, do you see the root cause for this? >> >> Yi, does this happen with rxe/siw as well? > Hi Sagi > > rxe/siw will take less than 1s > with rdma_rxe > # time nvme reset /dev/nvme0 > real 0m0.094s > user 0m0.000s > sys 0m0.006s > > with siw > # time nvme reset /dev/nvme0 > real 0m0.097s > user 0m0.000s > sys 0m0.006s > > This is only reproducible with mlx IB card, as I mentioned before, the > reset operation time changed from 3s to 12s after the below commit, > could you check this commit? > > commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc > Author: Max Gurtovoy <maxg@mellanox.com> > Date: Tue May 19 17:05:56 2020 +0300 > > nvme-rdma: add metadata/T10-PI support > I couldn't repro these long reset times. Nevertheless, the above commit added T10-PI offloads. In this commit, for supported devices we create extra resources in HW (more memory keys per task). I suggested doing this configuration as part of the "nvme connect" command and save this resource allocation by default but during the review I was asked to make it the default behavior. Sagi/Christoph, WDYT ? should we reconsider the "nvme connect --with_metadata" option ? > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-03-20 10:50 ` Max Gurtovoy @ 2022-03-20 13:03 ` Sagi Grimberg 2022-03-20 15:11 ` Max Gurtovoy 0 siblings, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2022-03-20 13:03 UTC (permalink / raw) To: Max Gurtovoy, Yi Zhang Cc: Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list >>> These are very long times for a non-debug kernel... >>> Max, do you see the root cause for this? >>> >>> Yi, does this happen with rxe/siw as well? >> Hi Sagi >> >> rxe/siw will take less than 1s >> with rdma_rxe >> # time nvme reset /dev/nvme0 >> real 0m0.094s >> user 0m0.000s >> sys 0m0.006s >> >> with siw >> # time nvme reset /dev/nvme0 >> real 0m0.097s >> user 0m0.000s >> sys 0m0.006s >> >> This is only reproducible with mlx IB card, as I mentioned before, the >> reset operation time changed from 3s to 12s after the below commit, >> could you check this commit? >> >> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc >> Author: Max Gurtovoy <maxg@mellanox.com> >> Date: Tue May 19 17:05:56 2020 +0300 >> >> nvme-rdma: add metadata/T10-PI support >> > I couldn't repro these long reset times. It appears to be when setting up a controller with lots of queues maybe? > Nevertheless, the above commit added T10-PI offloads. > > In this commit, for supported devices we create extra resources in HW > (more memory keys per task). > > I suggested doing this configuration as part of the "nvme connect" > command and save this resource allocation by default but during the > review I was asked to make it the default behavior. Don't know if I gave you this feedback or not, but it probably didn't occur to the commenter that it will make the connection establishment take tens of seconds. > Sagi/Christoph, > > WDYT ? should we reconsider the "nvme connect --with_metadata" option ? Maybe you can make these lazily allocated? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min 2022-03-20 13:03 ` Sagi Grimberg @ 2022-03-20 15:11 ` Max Gurtovoy 2022-03-21 9:28 ` Sagi Grimberg 0 siblings, 1 reply; 45+ messages in thread From: Max Gurtovoy @ 2022-03-20 15:11 UTC (permalink / raw) To: Sagi Grimberg, Yi Zhang Cc: Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list On 3/20/2022 3:03 PM, Sagi Grimberg wrote: > >>>> These are very long times for a non-debug kernel... >>>> Max, do you see the root cause for this? >>>> >>>> Yi, does this happen with rxe/siw as well? >>> Hi Sagi >>> >>> rxe/siw will take less than 1s >>> with rdma_rxe >>> # time nvme reset /dev/nvme0 >>> real 0m0.094s >>> user 0m0.000s >>> sys 0m0.006s >>> >>> with siw >>> # time nvme reset /dev/nvme0 >>> real 0m0.097s >>> user 0m0.000s >>> sys 0m0.006s >>> >>> This is only reproducible with mlx IB card, as I mentioned before, the >>> reset operation time changed from 3s to 12s after the below commit, >>> could you check this commit? >>> >>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc >>> Author: Max Gurtovoy <maxg@mellanox.com> >>> Date: Tue May 19 17:05:56 2020 +0300 >>> >>> nvme-rdma: add metadata/T10-PI support >>> >> I couldn't repro these long reset times. > > It appears to be when setting up a controller with lots of queues > maybe? I'm doing that. > >> Nevertheless, the above commit added T10-PI offloads. >> >> In this commit, for supported devices we create extra resources in HW >> (more memory keys per task). >> >> I suggested doing this configuration as part of the "nvme connect" >> command and save this resource allocation by default but during the >> review I was asked to make it the default behavior. > > Don't know if I gave you this feedback or not, but it probably didn't > occur to the commenter that it will make the connection establishment > take tens of seconds. It was known that we create more resources than needed for "non-PI-desired" controllers. > >> Sagi/Christoph, >> >> WDYT ? 
should we reconsider the "nvme connect --with_metadata" option ? > > Maybe you can make these lazily allocated? You mean something like: diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index fd4720d37cc0..367ba0bb62ab 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1620,10 +1620,19 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo) } #ifdef CONFIG_BLK_DEV_INTEGRITY -static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type, - u32 max_integrity_segments) +static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns) { struct blk_integrity integrity = { }; + u16 ms = ns->ms; + u8 pi_type = ns->pi_type; + u32 max_integrity_segments = ns->ctrl->max_integrity_segments; + int ret; + + if (ns->ctrl->ops->init_integrity) { + ret = ns->ctrl->ops->init_integrity(ns->ctrl); + if (ret) + return ret; + } switch (pi_type) { case NVME_NS_DPS_PI_TYPE3: @@ -1644,11 +1653,13 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type, integrity.tuple_size = ms; blk_integrity_register(disk, &integrity); blk_queue_max_integrity_segments(disk->queue, max_integrity_segments); + + return 0; } #else -static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type, - u32 max_integrity_segments) +static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns) { + return 0; } #endif /* CONFIG_BLK_DEV_INTEGRITY */ @@ -1853,8 +1864,8 @@ static void nvme_update_disk_info(struct gendisk *disk, if (ns->ms) { if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && (ns->features & NVME_NS_METADATA_SUPPORTED)) - nvme_init_integrity(disk, ns->ms, ns->pi_type, - ns->ctrl->max_integrity_segments); + if (nvme_init_integrity(disk, ns)) + capacity = 0; else if (!nvme_ns_has_pi(ns)) capacity = 0; } @@ -4395,7 +4406,7 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl); and create the resources for the first namespace we find as PI formatted ? ^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2022-03-20 15:11 ` Max Gurtovoy
@ 2022-03-21  9:28 ` Sagi Grimberg
  2022-03-21 12:11 ` Max Gurtovoy
  0 siblings, 1 reply; 45+ messages in thread
From: Sagi Grimberg @ 2022-03-21 9:28 UTC (permalink / raw)
To: Max Gurtovoy, Yi Zhang
Cc: Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list

>>> WDYT ? should we reconsider the "nvme connect --with_metadata" option ?
>>
>> Maybe you can make these lazily allocated?
>
> You mean something like:
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fd4720d37cc0..367ba0bb62ab 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1620,10 +1620,19 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
>  }
>
>  #ifdef CONFIG_BLK_DEV_INTEGRITY
> -static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
> -		u32 max_integrity_segments)
> +static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
>  {
>  	struct blk_integrity integrity = { };
> +	u16 ms = ns->ms;
> +	u8 pi_type = ns->pi_type;
> +	u32 max_integrity_segments = ns->ctrl->max_integrity_segments;
> +	int ret;
> +
> +	if (ns->ctrl->ops->init_integrity) {
> +		ret = ns->ctrl->ops->init_integrity(ns->ctrl);
> +		if (ret)
> +			return ret;
> +	}
>
>  	switch (pi_type) {
>  	case NVME_NS_DPS_PI_TYPE3:
> @@ -1644,11 +1653,13 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
>  	integrity.tuple_size = ms;
>  	blk_integrity_register(disk, &integrity);
>  	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
> +
> +	return 0;
>  }
>  #else
> -static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
> -		u32 max_integrity_segments)
> +static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
>  {
> +	return 0;
>  }
>  #endif /* CONFIG_BLK_DEV_INTEGRITY */
>
> @@ -1853,8 +1864,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
>  	if (ns->ms) {
>  		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
>  		    (ns->features & NVME_NS_METADATA_SUPPORTED))
> -			nvme_init_integrity(disk, ns->ms, ns->pi_type,
> -				ns->ctrl->max_integrity_segments);
> +			if (nvme_init_integrity(disk, ns))
> +				capacity = 0;
>  		else if (!nvme_ns_has_pi(ns))
>  			capacity = 0;
>  	}
> @@ -4395,7 +4406,7 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
>
>
> and create the resources for the first namespace we find as PI formatted ?

I was thinking more along the lines of allocating it as soon as an I/O
comes with pi... Is there something internal to the driver that can
be done in parallel to expedite the allocation of these extra resources?

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: [bug report] NVMe/IB: reset_controller need more than 1min
  2022-03-21  9:28 ` Sagi Grimberg
@ 2022-03-21 12:11 ` Max Gurtovoy
  0 siblings, 0 replies; 45+ messages in thread
From: Max Gurtovoy @ 2022-03-21 12:11 UTC (permalink / raw)
To: Sagi Grimberg, Yi Zhang
Cc: Max Gurtovoy, open list:NVM EXPRESS DRIVER, RDMA mailing list

On 3/21/2022 11:28 AM, Sagi Grimberg wrote:
>
>>>> WDYT ? should we reconsider the "nvme connect --with_metadata"
>>>> option ?
>>>
>>> Maybe you can make these lazily allocated?
>>
>> You mean something like:
>>
>> [nvme_init_integrity() diff to drivers/nvme/host/core.c snipped;
>> quoted in full in the parent message]
>>
>> and create the resources for the first namespace we find as PI
>> formatted ?
>>
>
> I was thinking more along the lines of allocating it as soon as an I/O
> comes with pi... Is there something internal to the driver that can
> be done in parallel to expedite the allocation of these extra resources?

Since when are we allocating things in the fast path ?

We allocate a pool of MRs per queue. Not MR per task.

Do you think it's better to allocate the whole pool in the first PI IO
and pay the latency for this IO ?

-Max.

^ permalink raw reply	[flat|nested] 45+ messages in thread
end of thread, other threads:[~2022-03-21 12:12 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-21 16:12 [bug report] NVMe/IB: reset_controller need more than 1min Yi Zhang
2021-05-21 16:12 ` Yi Zhang
2021-05-21 18:00 ` Sagi Grimberg
2021-05-21 18:00 ` Sagi Grimberg
2021-05-22  4:27 ` Yi Zhang
2021-05-22  4:27 ` Yi Zhang
2021-06-23 10:01 ` Yi Zhang
2021-06-23 10:01 ` Yi Zhang
2021-06-23 21:32 ` Sagi Grimberg
2021-06-23 21:32 ` Sagi Grimberg
2021-06-24 16:14 ` Yi Zhang
2021-06-24 16:14 ` Yi Zhang
2021-12-11  3:01 ` Yi Zhang
2021-12-12  9:45 ` Sagi Grimberg
2021-12-13  6:12 ` Yi Zhang
2021-12-13  9:04 ` Sagi Grimberg
2021-12-13 17:05 ` Yi Zhang
2021-12-14 10:39 ` Sagi Grimberg
2021-12-14 12:00 ` Max Gurtovoy
2021-12-15  1:15 ` Yi Zhang
2021-12-15 12:10 ` Max Gurtovoy
2021-12-16  2:18 ` Yi Zhang
2021-12-16 13:21 ` Max Gurtovoy
2021-12-16 16:32 ` Yi Zhang
2021-12-16 17:33 ` Haakon Bugge
2021-12-17  7:03 ` Yi Zhang
2021-12-17 11:19 ` Haakon Bugge
2022-02-14  9:47 ` Yi Zhang
2022-02-14 11:00 ` Chaitanya Kulkarni
2022-02-14 11:32 ` Sagi Grimberg
2022-02-14 12:11 ` Max Gurtovoy
2022-02-15 13:52 ` Yi Zhang
2022-02-15 14:30 ` Max Gurtovoy
2022-02-21 10:00 ` Yi Zhang
2022-02-23 10:04 ` Max Gurtovoy
2022-02-23 10:30 ` Sagi Grimberg
2022-02-23 11:20 ` Max Gurtovoy
2022-03-01  0:06 ` Yi Zhang
2022-03-16 15:16 ` Sagi Grimberg
2022-03-19  7:29 ` Yi Zhang
2022-03-20 10:50 ` Max Gurtovoy
2022-03-20 13:03 ` Sagi Grimberg
2022-03-20 15:11 ` Max Gurtovoy
2022-03-21  9:28 ` Sagi Grimberg
2022-03-21 12:11 ` Max Gurtovoy