* NVMe Over Fabrics Disconnect Kernel error
@ 2020-03-28  6:12 Anton Brekhov
  2020-03-29  4:14 ` Sagi Grimberg
  2020-03-31 13:26 ` Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Anton Brekhov @ 2020-03-28  6:12 UTC (permalink / raw)
  To: linux-nvme

Greetings!

We're using nvme-cli technology with ZFS and Lustre Filesystem on top of it.
But we constantly come across a kernel error while disconnecting
remote disks from switched off nodes:
```
[  +0,000089] INFO: task kworker/u593:0:82293 blocked for more than 120 seconds.
[  +0,001959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  +0,001941] kworker/u593:0  D ffff90e8493fe2a0     0 82293      2 0x00000080
[  +0,000031] Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
[  +0,000003] Call Trace:
[  +0,000008]  [<ffffffff8177f229>] schedule+0x29/0x70
[  +0,000010]  [<ffffffff81358e85>] blk_mq_freeze_queue_wait+0x75/0xe0
[  +0,000007]  [<ffffffff810c61c0>] ? wake_up_atomic_t+0x30/0x30
[  +0,000006]  [<ffffffff81359cb4>] blk_freeze_queue+0x24/0x50
[  +0,000009]  [<ffffffff8134e0ef>] blk_cleanup_queue+0x7f/0x1b0
[  +0,000012]  [<ffffffffc031158e>] nvme_ns_remove+0x8e/0xb0 [nvme_core]
[  +0,000011]  [<ffffffffc031174b>] nvme_remove_namespaces+0xab/0xf0 [nvme_core]
[  +0,000012]  [<ffffffffc03117e2>] nvme_delete_ctrl_work+0x52/0x80 [nvme_core]
[  +0,000008]  [<ffffffff810bd0ff>] process_one_work+0x17f/0x440
[  +0,000006]  [<ffffffff810be368>] worker_thread+0x278/0x3c0
[  +0,000006]  [<ffffffff810be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
[  +0,000005]  [<ffffffff810c50d1>] kthread+0xd1/0xe0
[  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
[  +0,000006]  [<ffffffff8178cd1d>] ret_from_fork_nospec_begin+0x7/0x21
[  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
```
Nodes characteristics:
[root@s02p005 ~]# uname -srm
Linux 3.10.0-1062.1.1.el7.x86_64 x86_64
[root@s02p005 ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)

We're using nvmet_rdma.
Is there any workaround for this error?
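
For reference, a minimal sketch of the attach/detach flow we run is shown below; the transport address, port and subsystem NQN are illustrative placeholders rather than our real values:
```
# discover subsystems exported by the nvmet_rdma target (address/port are examples)
nvme discover -t rdma -a 192.168.10.11 -s 4420

# connect to one subsystem; the NQN is a placeholder
nvme connect -t rdma -n nqn.2020-03.io.example:lustre-ost0 -a 192.168.10.11 -s 4420

# ZFS pool and Lustre target are then built on the resulting /dev/nvmeXnY

# disconnect after the remote node is switched off; this is where the hang appears
nvme disconnect -n nqn.2020-03.io.example:lustre-ost0
```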

Best Regards,
Anton Brekhov.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-28  6:12 NVMe Over Fabrics Disconnect Kernel error Anton Brekhov
@ 2020-03-29  4:14 ` Sagi Grimberg
  2020-03-29  8:50   ` Max Gurtovoy
  2020-03-31 13:26 ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2020-03-29  4:14 UTC (permalink / raw)
  To: Anton Brekhov, linux-nvme


> Greetings!
> 
> We're using nvme-cli technology with ZFS and Lustre Filesystem on top of it.
> But we constantly come across a kernel error while disconnecting
> remote disks from switched off nodes:
> ```
> [  +0,000089] INFO: task kworker/u593:0:82293 blocked for more than 120 seconds.
> [  +0,001959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  +0,001941] kworker/u593:0  D ffff90e8493fe2a0     0 82293      2 0x00000080
> [  +0,000031] Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
> [  +0,000003] Call Trace:
> [  +0,000008]  [<ffffffff8177f229>] schedule+0x29/0x70
> [  +0,000010]  [<ffffffff81358e85>] blk_mq_freeze_queue_wait+0x75/0xe0
> [  +0,000007]  [<ffffffff810c61c0>] ? wake_up_atomic_t+0x30/0x30
> [  +0,000006]  [<ffffffff81359cb4>] blk_freeze_queue+0x24/0x50
> [  +0,000009]  [<ffffffff8134e0ef>] blk_cleanup_queue+0x7f/0x1b0
> [  +0,000012]  [<ffffffffc031158e>] nvme_ns_remove+0x8e/0xb0 [nvme_core]
> [  +0,000011]  [<ffffffffc031174b>] nvme_remove_namespaces+0xab/0xf0 [nvme_core]
> [  +0,000012]  [<ffffffffc03117e2>] nvme_delete_ctrl_work+0x52/0x80 [nvme_core]
> [  +0,000008]  [<ffffffff810bd0ff>] process_one_work+0x17f/0x440
> [  +0,000006]  [<ffffffff810be368>] worker_thread+0x278/0x3c0
> [  +0,000006]  [<ffffffff810be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
> [  +0,000005]  [<ffffffff810c50d1>] kthread+0xd1/0xe0
> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
> [  +0,000006]  [<ffffffff8178cd1d>] ret_from_fork_nospec_begin+0x7/0x21
> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
> ```
> Nodes characteristics:
> [root@s02p005 ~]# uname -srm
> Linux 3.10.0-1062.1.1.el7.x86_64 x86_64
> [root@s02p005 ~]# cat /etc/redhat-release
> CentOS Linux release 7.7.1908 (Core)
> 
> We're using nvmet_rdma.
> Is there any workaround for this error?

It seems like queue freeze is stuck. Can you share more of the
trace so we can see what else is blocking? If not, when
it reproduces run echo t > /proc/sysrq-trigger and share the
log.
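
A minimal sketch of capturing that dump, assuming the sysrq interface is enabled on the node, would be:
```
# allow sysrq functions (the task-dump function is the one needed here)
echo 1 > /proc/sys/kernel/sysrq

# dump the state of all tasks into the kernel log
echo t > /proc/sysrq-trigger

# save the kernel log so it can be attached to the report
dmesg -T > sysrq-t-dump.txt
```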

Thanks.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-29  4:14 ` Sagi Grimberg
@ 2020-03-29  8:50   ` Max Gurtovoy
  2020-03-29 11:38     ` Anton Brekhov
  0 siblings, 1 reply; 9+ messages in thread
From: Max Gurtovoy @ 2020-03-29  8:50 UTC (permalink / raw)
  To: Sagi Grimberg, Anton Brekhov, linux-nvme


On 3/29/2020 7:14 AM, Sagi Grimberg wrote:
>
>> Greetings!
>>
>> We're using nvme-cli technology with ZFS and Lustre Filesystem on top 
>> of it.
>> But we constantly come across a kernel error while disconnecting
>> remote disks from switched off nodes:
>> ```
>> [  +0,000089] INFO: task kworker/u593:0:82293 blocked for more than 
>> 120 seconds.
>> [  +0,001959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  +0,001941] kworker/u593:0  D ffff90e8493fe2a0     0 82293      2 
>> 0x00000080
>> [  +0,000031] Workqueue: nvme-delete-wq nvme_delete_ctrl_work 
>> [nvme_core]
>> [  +0,000003] Call Trace:
>> [  +0,000008]  [<ffffffff8177f229>] schedule+0x29/0x70
>> [  +0,000010]  [<ffffffff81358e85>] blk_mq_freeze_queue_wait+0x75/0xe0
>> [  +0,000007]  [<ffffffff810c61c0>] ? wake_up_atomic_t+0x30/0x30
>> [  +0,000006]  [<ffffffff81359cb4>] blk_freeze_queue+0x24/0x50
>> [  +0,000009]  [<ffffffff8134e0ef>] blk_cleanup_queue+0x7f/0x1b0
>> [  +0,000012]  [<ffffffffc031158e>] nvme_ns_remove+0x8e/0xb0 [nvme_core]
>> [  +0,000011]  [<ffffffffc031174b>] nvme_remove_namespaces+0xab/0xf0 
>> [nvme_core]
>> [  +0,000012]  [<ffffffffc03117e2>] nvme_delete_ctrl_work+0x52/0x80 
>> [nvme_core]
>> [  +0,000008]  [<ffffffff810bd0ff>] process_one_work+0x17f/0x440
>> [  +0,000006]  [<ffffffff810be368>] worker_thread+0x278/0x3c0
>> [  +0,000006]  [<ffffffff810be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
>> [  +0,000005]  [<ffffffff810c50d1>] kthread+0xd1/0xe0
>> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
>> [  +0,000006]  [<ffffffff8178cd1d>] ret_from_fork_nospec_begin+0x7/0x21
>> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
>> ```
>> Nodes characteristics:
>> [root@s02p005 ~]# uname -srm
>> Linux 3.10.0-1062.1.1.el7.x86_64 x86_64
>> [root@s02p005 ~]# cat /etc/redhat-release
>> CentOS Linux release 7.7.1908 (Core)
>>
>> We're using nvmet_rdma.
>> Is there any workaround for this error?
>
> It seems like queue freeze is stuck. Can you share more of the
> trace so we can see what else is blocking? If not, when
> it reproduces run echo t > /proc/sysrq-trigger and share the
> log.

Anton,

Can you repro this with the latest nvme branch, or only with inbox CentOS 7.7?


>
> Thanks.
>
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-29  8:50   ` Max Gurtovoy
@ 2020-03-29 11:38     ` Anton Brekhov
  2020-03-29 11:56       ` Max Gurtovoy
  0 siblings, 1 reply; 9+ messages in thread
From: Anton Brekhov @ 2020-03-29 11:38 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: Sagi Grimberg, linux-nvme, Konstantin Ponomarev

Max,
We obtained this error while using the latest release of nvme-cli:
[root@s02p005 ~]# nvme version
nvme version 1.10.1

Or have there been major changes since the latest release?
Thanks.

Sun, Mar 29, 2020 at 11:51, Max Gurtovoy <maxg@mellanox.com>:
>
>
> On 3/29/2020 7:14 AM, Sagi Grimberg wrote:
> >
> >> Greetings!
> >>
> >> We're using nvme-cli technology with ZFS and Lustre Filesystem on top
> >> of it.
> >> But we constantly come across a kernel error while disconnecting
> >> remote disks from switched off nodes:
> >> ```
> >> [  +0,000089] INFO: task kworker/u593:0:82293 blocked for more than
> >> 120 seconds.
> >> [  +0,001959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> disables this message.
> >> [  +0,001941] kworker/u593:0  D ffff90e8493fe2a0     0 82293      2
> >> 0x00000080
> >> [  +0,000031] Workqueue: nvme-delete-wq nvme_delete_ctrl_work
> >> [nvme_core]
> >> [  +0,000003] Call Trace:
> >> [  +0,000008]  [<ffffffff8177f229>] schedule+0x29/0x70
> >> [  +0,000010]  [<ffffffff81358e85>] blk_mq_freeze_queue_wait+0x75/0xe0
> >> [  +0,000007]  [<ffffffff810c61c0>] ? wake_up_atomic_t+0x30/0x30
> >> [  +0,000006]  [<ffffffff81359cb4>] blk_freeze_queue+0x24/0x50
> >> [  +0,000009]  [<ffffffff8134e0ef>] blk_cleanup_queue+0x7f/0x1b0
> >> [  +0,000012]  [<ffffffffc031158e>] nvme_ns_remove+0x8e/0xb0 [nvme_core]
> >> [  +0,000011]  [<ffffffffc031174b>] nvme_remove_namespaces+0xab/0xf0
> >> [nvme_core]
> >> [  +0,000012]  [<ffffffffc03117e2>] nvme_delete_ctrl_work+0x52/0x80
> >> [nvme_core]
> >> [  +0,000008]  [<ffffffff810bd0ff>] process_one_work+0x17f/0x440
> >> [  +0,000006]  [<ffffffff810be368>] worker_thread+0x278/0x3c0
> >> [  +0,000006]  [<ffffffff810be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
> >> [  +0,000005]  [<ffffffff810c50d1>] kthread+0xd1/0xe0
> >> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
> >> [  +0,000006]  [<ffffffff8178cd1d>] ret_from_fork_nospec_begin+0x7/0x21
> >> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
> >> ```
> >> Nodes characteristics:
> >> [root@s02p005 ~]# uname -srm
> >> Linux 3.10.0-1062.1.1.el7.x86_64 x86_64
> >> [root@s02p005 ~]# cat /etc/redhat-release
> >> CentOS Linux release 7.7.1908 (Core)
> >>
> >> We're using nvmet_rdma.
> >> Is there any workaround for this error?
> >
> > It seems like queue freeze is stuck. Can you share more of the
> > trace so we can see what else is blocking? If not, when
> > it reproduces run echo t > /proc/sysrq-trigger and share the
> > log.
>
> Anton,
>
> Can you repro this with the latest nvme branch, or only with inbox CentOS 7.7?
>
>
> >
> > Thanks.
> >
> > _______________________________________________
> > linux-nvme mailing list
> > linux-nvme@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-nvme
> >

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-29 11:38     ` Anton Brekhov
@ 2020-03-29 11:56       ` Max Gurtovoy
  2020-03-29 14:40         ` Anton Brekhov
  0 siblings, 1 reply; 9+ messages in thread
From: Max Gurtovoy @ 2020-03-29 11:56 UTC (permalink / raw)
  To: Anton Brekhov; +Cc: Sagi Grimberg, linux-nvme, Konstantin Ponomarev


On 3/29/2020 2:38 PM, Anton Brekhov wrote:
> Max,
> We obtained this error while using the latest release of nvme-cli:
> [root@s02p005 ~]# nvme version
> nvme version 1.10.1
>
> Or have there been major changes since the latest release?

I referred to the kernel version.

Can you check your scenario with git://git.infradead.org/nvme.git 
(branch nvme-5.7 or nvme-5.7-rc1)?
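
Roughly, building and booting that branch on a test node would look like the sketch below (standard upstream kernel build steps, nothing specific to this issue; adjust to your distro's packaging):
```
# fetch the nvme tree at the suggested branch
git clone -b nvme-5.7 git://git.infradead.org/nvme.git
cd nvme

# start from the running kernel's configuration
cp /boot/config-$(uname -r) .config
make olddefconfig

# build and install (run the install steps as root), then reboot into the new kernel
make -j$(nproc)
make modules_install install
reboot
```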

-Max.


> Thanks.
>
> Sun, Mar 29, 2020 at 11:51, Max Gurtovoy <maxg@mellanox.com>:
>>
>> On 3/29/2020 7:14 AM, Sagi Grimberg wrote:
>>>> Greetings!
>>>>
>>>> We're using nvme-cli technology with ZFS and Lustre Filesystem on top
>>>> of it.
>>>> But we constantly come across a kernel error while disconnecting
>>>> remote disks from switched off nodes:
>>>> ```
>>>> [  +0,000089] INFO: task kworker/u593:0:82293 blocked for more than
>>>> 120 seconds.
>>>> [  +0,001959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>> disables this message.
>>>> [  +0,001941] kworker/u593:0  D ffff90e8493fe2a0     0 82293      2
>>>> 0x00000080
>>>> [  +0,000031] Workqueue: nvme-delete-wq nvme_delete_ctrl_work
>>>> [nvme_core]
>>>> [  +0,000003] Call Trace:
>>>> [  +0,000008]  [<ffffffff8177f229>] schedule+0x29/0x70
>>>> [  +0,000010]  [<ffffffff81358e85>] blk_mq_freeze_queue_wait+0x75/0xe0
>>>> [  +0,000007]  [<ffffffff810c61c0>] ? wake_up_atomic_t+0x30/0x30
>>>> [  +0,000006]  [<ffffffff81359cb4>] blk_freeze_queue+0x24/0x50
>>>> [  +0,000009]  [<ffffffff8134e0ef>] blk_cleanup_queue+0x7f/0x1b0
>>>> [  +0,000012]  [<ffffffffc031158e>] nvme_ns_remove+0x8e/0xb0 [nvme_core]
>>>> [  +0,000011]  [<ffffffffc031174b>] nvme_remove_namespaces+0xab/0xf0
>>>> [nvme_core]
>>>> [  +0,000012]  [<ffffffffc03117e2>] nvme_delete_ctrl_work+0x52/0x80
>>>> [nvme_core]
>>>> [  +0,000008]  [<ffffffff810bd0ff>] process_one_work+0x17f/0x440
>>>> [  +0,000006]  [<ffffffff810be368>] worker_thread+0x278/0x3c0
>>>> [  +0,000006]  [<ffffffff810be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
>>>> [  +0,000005]  [<ffffffff810c50d1>] kthread+0xd1/0xe0
>>>> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
>>>> [  +0,000006]  [<ffffffff8178cd1d>] ret_from_fork_nospec_begin+0x7/0x21
>>>> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
>>>> ```
>>>> Nodes characteristics:
>>>> [root@s02p005 ~]# uname -srm
>>>> Linux 3.10.0-1062.1.1.el7.x86_64 x86_64
>>>> [root@s02p005 ~]# cat /etc/redhat-release
>>>> CentOS Linux release 7.7.1908 (Core)
>>>>
>>>> We're using nvmet_rdma.
>>>> Is there any workaround for this error?
>>> It seems like queue freeze is stuck. Can you share more of the
>>> trace so we can see what else is blocking? If not, when
>>> it reproduces run echo t > /proc/sysrq-trigger and share the
>>> log.
>> Anton,
>>
>> Can you repro this with the latest nvme branch, or only with inbox CentOS 7.7?
>>
>>
>>> Thanks.
>>>
>>> _______________________________________________
>>> linux-nvme mailing list
>>> linux-nvme@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-nvme
>>>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-29 11:56       ` Max Gurtovoy
@ 2020-03-29 14:40         ` Anton Brekhov
  2020-03-30  4:38           ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Anton Brekhov @ 2020-03-29 14:40 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: Sagi Grimberg, linux-nvme, Konstantin Ponomarev

Then I'm afraid we can't reproduce this, because we use Intel Omni-Path
drivers that are not compatible with the latest kernel.
Today we tried to install a newer kernel and to upgrade to CentOS 8 and
8.1, but nothing is compatible with the technologies in our HPC cluster.
If there are any other workarounds or ideas, we would be happy to hear
them from you.
If not, we'll stay in touch when we upgrade the whole cluster.

Anton

Sun, Mar 29, 2020 at 14:56, Max Gurtovoy <maxg@mellanox.com>:

>
>
> On 3/29/2020 2:38 PM, Anton Brekhov wrote:
> > Max,
> > We obtained this error while using the latest release of nvme-cli:
> > [root@s02p005 ~]# nvme version
> > nvme version 1.10.1
> >
> > Or have there been major changes since the latest release?
>
> I referred to the kernel version.
>
> Can you check your scenario with git://git.infradead.org/nvme.git
> (branch nvme-5.7 or nvme-5.7-rc1)?
>
> -Max.
>
>
> > Thanks.
> >
> > Sun, Mar 29, 2020 at 11:51, Max Gurtovoy <maxg@mellanox.com>:
> >>
> >> On 3/29/2020 7:14 AM, Sagi Grimberg wrote:
> >>>> Greetings!
> >>>>
> >>>> We're using nvme-cli technology with ZFS and Lustre Filesystem on top
> >>>> of it.
> >>>> But we constantly come across a kernel error while disconnecting
> >>>> remote disks from switched off nodes:
> >>>> ```
> >>>> [  +0,000089] INFO: task kworker/u593:0:82293 blocked for more than
> >>>> 120 seconds.
> >>>> [  +0,001959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>> disables this message.
> >>>> [  +0,001941] kworker/u593:0  D ffff90e8493fe2a0     0 82293      2
> >>>> 0x00000080
> >>>> [  +0,000031] Workqueue: nvme-delete-wq nvme_delete_ctrl_work
> >>>> [nvme_core]
> >>>> [  +0,000003] Call Trace:
> >>>> [  +0,000008]  [<ffffffff8177f229>] schedule+0x29/0x70
> >>>> [  +0,000010]  [<ffffffff81358e85>] blk_mq_freeze_queue_wait+0x75/0xe0
> >>>> [  +0,000007]  [<ffffffff810c61c0>] ? wake_up_atomic_t+0x30/0x30
> >>>> [  +0,000006]  [<ffffffff81359cb4>] blk_freeze_queue+0x24/0x50
> >>>> [  +0,000009]  [<ffffffff8134e0ef>] blk_cleanup_queue+0x7f/0x1b0
> >>>> [  +0,000012]  [<ffffffffc031158e>] nvme_ns_remove+0x8e/0xb0 [nvme_core]
> >>>> [  +0,000011]  [<ffffffffc031174b>] nvme_remove_namespaces+0xab/0xf0
> >>>> [nvme_core]
> >>>> [  +0,000012]  [<ffffffffc03117e2>] nvme_delete_ctrl_work+0x52/0x80
> >>>> [nvme_core]
> >>>> [  +0,000008]  [<ffffffff810bd0ff>] process_one_work+0x17f/0x440
> >>>> [  +0,000006]  [<ffffffff810be368>] worker_thread+0x278/0x3c0
> >>>> [  +0,000006]  [<ffffffff810be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
> >>>> [  +0,000005]  [<ffffffff810c50d1>] kthread+0xd1/0xe0
> >>>> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
> >>>> [  +0,000006]  [<ffffffff8178cd1d>] ret_from_fork_nospec_begin+0x7/0x21
> >>>> [  +0,000006]  [<ffffffff810c5000>] ? insert_kthread_work+0x40/0x40
> >>>> ```
> >>>> Nodes characteristics:
> >>>> [root@s02p005 ~]# uname -srm
> >>>> Linux 3.10.0-1062.1.1.el7.x86_64 x86_64
> >>>> [root@s02p005 ~]# cat /etc/redhat-release
> >>>> CentOS Linux release 7.7.1908 (Core)
> >>>>
> >>>> We're using nvmet_rdma.
> >>>> Is there any workaround for this error?
> >>> It seems like queue freeze is stuck. Can you share more of the
> >>> trace so we can see what else is blocking? If not, when
> >>> it reproduces run echo t > /proc/sysrq-trigger and share the
> >>> log.
> >> Anton,
> >>
> >> Can you repro this with the latest nvme branch, or only with inbox CentOS 7.7?
> >>
> >>
> >>> Thanks.
> >>>
> >>> _______________________________________________
> >>> linux-nvme mailing list
> >>> linux-nvme@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-nvme
> >>>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-29 14:40         ` Anton Brekhov
@ 2020-03-30  4:38           ` Sagi Grimberg
  2020-03-30  8:26             ` Anton Brekhov
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2020-03-30  4:38 UTC (permalink / raw)
  To: Anton Brekhov, Max Gurtovoy; +Cc: linux-nvme, Konstantin Ponomarev


> Then I'm afraid we can't reproduce this, because we use Intel Omni-Path
> drivers that are not compatible with the latest kernel.
> Today we tried to install a newer kernel and to upgrade to CentOS 8 and
> 8.1, but nothing is compatible with the technologies in our HPC cluster.
> If there are any other workarounds or ideas, we would be happy to hear
> them from you.
> If not, we'll stay in touch when we upgrade the whole cluster.

Given that this is a RH issue, you should probably open a ticket with RH
asking them to backport a fix, or report a bug if it still exists upstream.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-30  4:38           ` Sagi Grimberg
@ 2020-03-30  8:26             ` Anton Brekhov
  0 siblings, 0 replies; 9+ messages in thread
From: Anton Brekhov @ 2020-03-30  8:26 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Max Gurtovoy, linux-nvme, Konstantin Ponomarev

Ok, thank you!

By the way, I found this thread: https://lkml.org/lkml/2019/7/18/832
It looks like our problem to me, but maybe I'm wrong.

Mon, Mar 30, 2020 at 07:39, Sagi Grimberg <sagi@grimberg.me>:
>
>
> > Then I'm afraid we can't reproduce this, because we use Intel Omni-Path
> > drivers that are not compatible with the latest kernel.
> > Today we tried to install a newer kernel and to upgrade to CentOS 8 and
> > 8.1, but nothing is compatible with the technologies in our HPC cluster.
> > If there are any other workarounds or ideas, we would be happy to hear
> > them from you.
> > If not, we'll stay in touch when we upgrade the whole cluster.
>
> Given that this is a RH issue, you should probably open a ticket with RH
> asking them to backport a fix, or report a bug if it still exists upstream.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NVMe Over Fabrics Disconnect Kernel error
  2020-03-28  6:12 NVMe Over Fabrics Disconnect Kernel error Anton Brekhov
  2020-03-29  4:14 ` Sagi Grimberg
@ 2020-03-31 13:26 ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2020-03-31 13:26 UTC (permalink / raw)
  To: Anton Brekhov; +Cc: linux-nvme

On Sat, Mar 28, 2020 at 09:12:37AM +0300, Anton Brekhov wrote:
> Greetings!
> 
> We're using nvme-cli technology with ZFS and Lustre Filesystem on top of it.
> But we constantly come across a kernel error while disconnecting
> remote disks from switched off nodes:

Which is totally your problem if you use out of tree modules that
violate licensing.  *plonk*

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-31 15:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-28  6:12 NVMe Over Fabrics Disconnect Kernel error Anton Brekhov
2020-03-29  4:14 ` Sagi Grimberg
2020-03-29  8:50   ` Max Gurtovoy
2020-03-29 11:38     ` Anton Brekhov
2020-03-29 11:56       ` Max Gurtovoy
2020-03-29 14:40         ` Anton Brekhov
2020-03-30  4:38           ` Sagi Grimberg
2020-03-30  8:26             ` Anton Brekhov
2020-03-31 13:26 ` Christoph Hellwig
