* [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
@ 2022-07-08 16:03 Yi Zhang
  2022-07-10  9:41 ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Yi Zhang @ 2022-07-08 16:03 UTC (permalink / raw)
  To: linux-block, open list:NVM EXPRESS DRIVER; +Cc: Sagi Grimberg

Hello

I reproduced this issue on linux-block/for-next. Please help check it,
and feel free to let me know if you need more info or tests. Thanks.

[ 6026.144114] run blktests nvme/012 at 2022-07-08 08:15:09
[ 6026.271866] loop0: detected capacity change from 0 to 2097152
[ 6026.294403] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 6026.322827] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 6026.347984] nvmet: creating nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:90390a00-4597-11e9-b935-3c18a0043981.
[ 6026.364007] nvme nvme0: creating 32 I/O queues.
[ 6026.380279] nvme nvme0: mapped 32/0/0 default/read/poll queues.
[ 6026.398481] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420
[ 6027.653759] XFS (nvme0n1): Mounting V5 Filesystem
[ 6027.677423] XFS (nvme0n1): Ending clean mount
[ 6173.064201] XFS (nvme0n1): Unmounting Filesystem
[ 6173.656286] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
[ 6174.005589] ------------[ cut here ]------------
[ 6174.010200] workqueue: WQ_MEM_RECLAIM
nvmet-wq:nvmet_tcp_release_queue_work [nvmet_tcp] is flushing
!WQ_MEM_RECLAIM nvmet_tcp_wq:nvmet_tcp_io_work [nvmet_tcp]
[ 6174.010216] WARNING: CPU: 20 PID: 14456 at kernel/workqueue.c:2628
check_flush_dependency+0x110/0x14c
[ 6174.033579] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp
nvmet nvme nvme_core loop tls mlx4_ib ib_uverbs ib_core mlx4_en rfkill
sunrpc vfat fat joydev acpi_ipmi mlx4_core igb ipmi_ssif cppc_cpufreq
fuse zram xfs uas usb_storage dwc3 ulpi udc_core ast crct10dif_ce
drm_vram_helper ghash_ce drm_ttm_helper sbsa_gwdt ttm
i2c_xgene_slimpro ahci_platform gpio_dwapb xgene_hwmon xhci_plat_hcd
ipmi_devintf ipmi_msghandler [last unloaded: nvmet]
[ 6174.072622] CPU: 20 PID: 14456 Comm: kworker/20:8 Not tainted 5.19.0-rc5+ #1
[ 6174.079660] Hardware name: Lenovo HR350A            7X35CTO1WW
/FALCON     , BIOS hve104q-1.14 06/25/2020
[ 6174.089474] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 6174.096168] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6174.103117] pc : check_flush_dependency+0x110/0x14c
[ 6174.107985] lr : check_flush_dependency+0x110/0x14c
[ 6174.112851] sp : ffff800026b2bb10
[ 6174.116153] x29: ffff800026b2bb10 x28: 0000000000000000 x27: ffff80000a94f240
[ 6174.123279] x26: ffff800009304a90 x25: 0000000000000001 x24: ffff80000a570448
[ 6174.130405] x23: ffff009f6c6d82a8 x22: fffffbffee9cea00 x21: ffff800001395430
[ 6174.137532] x20: ffff0008c0fb3000 x19: ffff00087b7dda00 x18: ffffffffffffffff
[ 6174.144657] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000006
[ 6174.151783] x14: 0000000000000001 x13: 204d49414c434552 x12: 5f4d454d5f515721
[ 6174.158909] x11: 00000000ffffdfff x10: ffff80000a53eb70 x9 : ffff80000824f754
[ 6174.166034] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
[ 6174.173160] x5 : 0000000000001fff x4 : 0000000000000000 x3 : 0000000000000027
[ 6174.180286] x2 : 0000000000000002 x1 : ffff0008c6080000 x0 : 0000000000000092
[ 6174.187412] Call trace:
[ 6174.189847]  check_flush_dependency+0x110/0x14c
[ 6174.194367]  start_flush_work+0xd8/0x410
[ 6174.198278]  __flush_work+0x88/0xe0
[ 6174.201755]  __cancel_work_timer+0x118/0x194
[ 6174.206014]  cancel_work_sync+0x20/0x2c
[ 6174.209837]  nvmet_tcp_release_queue_work+0xcc/0x300 [nvmet_tcp]
[ 6174.215834]  process_one_work+0x2b8/0x704
[ 6174.219832]  worker_thread+0x80/0x42c
[ 6174.223483]  kthread+0xfc/0x110
[ 6174.226613]  ret_from_fork+0x10/0x20
[ 6174.230179] irq event stamp: 0
[ 6174.233221] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 6174.239478] hardirqs last disabled at (0): [<ffff800008198c44>]
copy_process+0x674/0x14a0
[ 6174.247644] softirqs last  enabled at (0): [<ffff800008198c44>]
copy_process+0x674/0x14a0
[ 6174.255809] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 6174.262063] ---[ end trace 0000000000000000 ]---

Best Regards,
  Yi Zhang
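
For reference, the warning comes from check_flush_dependency(): it fires
when a work item currently running on a WQ_MEM_RECLAIM workqueue flushes
(or synchronously cancels) work owned by a workqueue created without
WQ_MEM_RECLAIM, since such a flush could stall indefinitely under memory
pressure. Below is a minimal, self-contained module sketch that trips the
same warning; all demo_* names are hypothetical and not taken from the
nvmet code.
--
/*
 * Minimal sketch (hypothetical demo module, not the nvmet code): a work
 * item running on a WQ_MEM_RECLAIM workqueue synchronously cancels work
 * that belongs to a !WQ_MEM_RECLAIM workqueue, which is exactly the
 * dependency check_flush_dependency() rejects.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static struct workqueue_struct *demo_reclaim_wq;	/* WQ_MEM_RECLAIM */
static struct workqueue_struct *demo_plain_wq;		/* no WQ_MEM_RECLAIM */

static void demo_io_fn(struct work_struct *w)
{
	msleep(100);	/* keep the work in flight so the cancel must wait */
}
static DECLARE_WORK(demo_io_work, demo_io_fn);

static void demo_release_fn(struct work_struct *w)
{
	/* Runs on demo_reclaim_wq and fences demo_io_work, which lives on
	 * demo_plain_wq -> WARN in check_flush_dependency(). */
	cancel_work_sync(&demo_io_work);
}
static DECLARE_WORK(demo_release_work, demo_release_fn);

static int __init demo_init(void)
{
	demo_reclaim_wq = alloc_workqueue("demo_reclaim_wq", WQ_MEM_RECLAIM, 0);
	if (!demo_reclaim_wq)
		return -ENOMEM;
	demo_plain_wq = alloc_workqueue("demo_plain_wq", WQ_HIGHPRI, 0);
	if (!demo_plain_wq) {
		destroy_workqueue(demo_reclaim_wq);
		return -ENOMEM;
	}
	queue_work(demo_plain_wq, &demo_io_work);
	queue_work(demo_reclaim_wq, &demo_release_work);
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_plain_wq);
	destroy_workqueue(demo_reclaim_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
--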



* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-08 16:03 [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c Yi Zhang
@ 2022-07-10  9:41 ` Sagi Grimberg
  2022-07-19 16:08   ` Yi Zhang
  2022-07-20  6:18   ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-07-10  9:41 UTC (permalink / raw)
  To: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER


> Hello
> 
> I reproduced this issue on linux-block/for-next. Please help check it,
> and feel free to let me know if you need more info or tests. Thanks.


These reports are making me tired... Should we just remove
MEM_RECLAIM from all nvme/nvmet workqueues and be done with
it?

The only downside is that nvme reset/error-recovery will
also be susceptible to low-memory situations...

Christoph, Keith, what are your thoughts?


* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-10  9:41 ` Sagi Grimberg
@ 2022-07-19 16:08   ` Yi Zhang
  2022-07-20  6:18   ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Yi Zhang @ 2022-07-19 16:08 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: linux-block, open list:NVM EXPRESS DRIVER, Sagi Grimberg

On Sun, Jul 10, 2022 at 5:41 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> > Hello
> >
> > I reproduced this issue on linux-block/for-next. Please help check it,
> > and feel free to let me know if you need more info or tests. Thanks.
>
>
> These reports are making me tired... Should we just remove
> MEM_RECLAIM from all nvme/nvmet workqueues and be done with
> it?
>
> The only downside is that nvme reset/error-recovery will
> also be susceptible to low-memory situations...
>
> Christoph, Keith, what are your thoughts?
>

Hi Christoph, Keith

Ping, in case you missed Sagi's comment. :)



--
Best Regards,
  Yi Zhang



* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-10  9:41 ` Sagi Grimberg
  2022-07-19 16:08   ` Yi Zhang
@ 2022-07-20  6:18   ` Christoph Hellwig
  2022-07-21 22:13     ` Sagi Grimberg
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-07-20  6:18 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER

On Sun, Jul 10, 2022 at 12:41:33PM +0300, Sagi Grimberg wrote:
> 
> > Hello
> > 
> > I reproduced this issue on linux-block/for-next. Please help check it,
> > and feel free to let me know if you need more info or tests. Thanks.
> 
> 
> These reports are making me tired... Should we just remove
> MEM_RECLAIM from all nvme/nvmet workqueues and be done with
> it?
> 
> The only downside is that nvme reset/error-recovery will
> also be susceptible to low-memory situations...

We can't just do that.  We need to sort out the dependency chains
properly.


* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-20  6:18   ` Christoph Hellwig
@ 2022-07-21 22:13     ` Sagi Grimberg
  2022-07-22  4:46       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2022-07-21 22:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER


>>> Hello
>>>
>>> I reproduced this issue on linux-block/for-next. Please help check it,
>>> and feel free to let me know if you need more info or tests. Thanks.
>>
>>
>> These reports are making me tired... Should we just remove
>> MEM_RECLAIM from all nvme/nvmet workqueues and be done with
>> it?
>>
>> The only downside is that nvme reset/error-recovery will
>> also be susceptible to low-memory situations...
> 
> We can't just do that.  We need to sort out the dependency chains
> properly.

The problem is that nvme_wq is MEM_RECLAIM, while nvme_tcp_wq is
for the socket threads and does not need to be a MEM_RECLAIM workqueue.
But reset/error-recovery, which takes place on nvme_wq, stops nvme-tcp
queues, and that must involve flushing queue->io_work in order to
fence concurrent execution.

So what is the solution? make nvme_tcp_wq MEM_RECLAIM?
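
To make the chain concrete, here is a rough sketch of the pattern being
described (the demo_* names are invented and the real nvme-tcp source is
organised differently): teardown work running on a reclaim-capable
controller workqueue has to fence io_work, which is queued on the TCP I/O
workqueue, so that I/O workqueue needs WQ_MEM_RECLAIM as well for the
flush to be legal.
--
/* Sketch only: demo_* names are hypothetical, not the nvme-tcp code. */
#include <linux/workqueue.h>

static struct workqueue_struct *demo_ctrl_wq;	/* like nvme_wq: WQ_MEM_RECLAIM */
static struct workqueue_struct *demo_tcp_wq;	/* like nvme_tcp_wq: socket I/O */

struct demo_tcp_queue {
	struct work_struct io_work;		/* queued on demo_tcp_wq */
};

/* Called from reset/error recovery, i.e. from a work item on demo_ctrl_wq. */
static void demo_stop_queue(struct demo_tcp_queue *queue)
{
	/* Must fence any in-flight io_work before tearing the socket down.
	 * Because this flush originates on a WQ_MEM_RECLAIM workqueue,
	 * demo_tcp_wq needs WQ_MEM_RECLAIM too, or check_flush_dependency()
	 * warns as in the report above. */
	cancel_work_sync(&queue->io_work);
}

static int demo_wq_setup(void)
{
	demo_ctrl_wq = alloc_workqueue("demo_ctrl_wq", WQ_MEM_RECLAIM, 0);
	if (!demo_ctrl_wq)
		return -ENOMEM;
	demo_tcp_wq = alloc_workqueue("demo_tcp_wq",
				      WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!demo_tcp_wq) {
		destroy_workqueue(demo_ctrl_wq);
		return -ENOMEM;
	}
	return 0;
}
--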


* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-21 22:13     ` Sagi Grimberg
@ 2022-07-22  4:46       ` Christoph Hellwig
  2022-07-24  8:21         ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-07-22  4:46 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER

On Fri, Jul 22, 2022 at 01:13:26AM +0300, Sagi Grimberg wrote:
> The problem is that nvme_wq is MEM_RECLAIM, while nvme_tcp_wq is
> for the socket threads and does not need to be a MEM_RECLAIM workqueue.

Why don't we need MEM_RECLAIM for the socket threads?

> But reset/error-recovery, which takes place on nvme_wq, stops nvme-tcp
> queues, and that must involve flushing queue->io_work in order to
> fence concurrent execution.
> 
> So what is the solution? make nvme_tcp_wq MEM_RECLAIM?

I think so.


* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-22  4:46       ` Christoph Hellwig
@ 2022-07-24  8:21         ` Sagi Grimberg
  2022-07-29  7:37           ` Yi Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2022-07-24  8:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER


>> The problem is that nvme_wq is MEM_RECLAIM, while nvme_tcp_wq is
>> for the socket threads and does not need to be a MEM_RECLAIM workqueue.
> 
> Why don't we need MEM_RECLAIM for the socket threads?
> 
>> But reset/error-recovery, which takes place on nvme_wq, stops nvme-tcp
>> queues, and that must involve flushing queue->io_work in order to
>> fence concurrent execution.
>>
>> So what is the solution? make nvme_tcp_wq MEM_RECLAIM?
> 
> I think so.

OK.

Yi, does this patch make the issue go away?
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 0a9542599ad1..dc3b4dc8fe08 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1839,7 +1839,8 @@ static int __init nvmet_tcp_init(void)
  {
         int ret;

-       nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq", WQ_HIGHPRI, 0);
+       nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq",
+                               WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
         if (!nvmet_tcp_wq)
                 return -ENOMEM;
--


* Re: [bug report] blktests nvme/tcp triggered WARNING at kernel/workqueue.c:2628 check_flush_dependency+0x110/0x14c
  2022-07-24  8:21         ` Sagi Grimberg
@ 2022-07-29  7:37           ` Yi Zhang
  0 siblings, 0 replies; 8+ messages in thread
From: Yi Zhang @ 2022-07-29  7:37 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-block, open list:NVM EXPRESS DRIVER

On Sun, Jul 24, 2022 at 4:21 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >> The problem is that nvme_wq is MEM_RECLAIM, while nvme_tcp_wq is
> >> for the socket threads and does not need to be a MEM_RECLAIM workqueue.
> >
> > Why don't we need MEM_RECLAIM for the socket threads?
> >
> >> But reset/error-recovery, which takes place on nvme_wq, stops nvme-tcp
> >> queues, and that must involve flushing queue->io_work in order to
> >> fence concurrent execution.
> >>
> >> So what is the solution? make nvme_tcp_wq MEM_RECLAIM?
> >
> > I think so.
>
> OK.
>
> Yi, does this patch make the issue go away?

I tried to find a server to manually reproduce the issue, but had no
luck. Since the patch has been merged, I will keep monitoring this
issue via the CKI tests.

> --
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 0a9542599ad1..dc3b4dc8fe08 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1839,7 +1839,8 @@ static int __init nvmet_tcp_init(void)
>   {
>          int ret;
>
> -       nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq", WQ_HIGHPRI, 0);
> +       nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq",
> +                               WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
>          if (!nvmet_tcp_wq)
>                  return -ENOMEM;
> --
>


-- 
Best Regards,
  Yi Zhang


