linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* blktests failures with v6.4
@ 2023-07-07  7:27 Shinichiro Kawasaki
  2023-07-09 14:32 ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Shinichiro Kawasaki @ 2023-07-07  7:27 UTC (permalink / raw)
  To: linux-block, linux-nvme, linux-scsi

Hi all,

I ran the latest blktests (git hash: 30bdcb72ad99) with v6.4 kernel and observed
test case failures listed below. I call for support to fix them. The list of
failures are same as the report for v6.3 kernel [1].

  [1] https://lore.kernel.org/linux-block/rsmmxrchy6voi5qhl4irss5sprna3f5owkqtvybxglcv2pnylm@xmrnpfu3tfpe/

List of failures
================
#1: block/011
#2: block/024
#3: nvme/003 (fabrics transport)
#4: nvme/030 or nvme/031 (rdma transport with siw)
#5: nvme/* (fc transport)

Failure description
===================

#1: block/011

   This test case shows two failure symptoms.

   Symptom A:

     The test case fails with NVME devices due to lockdep WARNING "possible
     circular locking dependency detected". Reported in Sep/2022 [2] and
     solution was discussed. Waiting a fix.

     [2] https://lore.kernel.org/linux-block/20220930001943.zdbvolc3gkekfmcv@shindev/

   Symptom B:

     The test case occasionally fail with fio assert messages. Fio bug is
     suspected. I'm preparing a fix patch [3].

     block/011 => nvme0n1 (disable PCI device while doing I/O)    [failed]
        runtime  32.250s  ...  1684.324s
        --- tests/block/011.out     2023-04-06 10:11:07.920670529 +0900
        +++ /home/shin/Blktests/blktests/results/nvme0n1/block/011.out.bad  2023-07-07 14:34:26.123295632 +0900
        @@ -1,2 +1,11 @@
         Running block/011
        +fio: ioengines.c:335: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
        +fio: ioengines.c:335: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
        +fio: ioengines.c:335: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
        +fio: ioengines.c:335: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
        +fio: pid=1492, got signal=6
        +fio: pid=1493, got signal=6
        ...
        (Run 'diff -u tests/block/011.out /home/shin/Blktests/blktests/results/nvme0n1/block/011.out.bad' to see the entire diff)

     [3] https://github.com/kawasaki/fio/commit/f4b68b90c6780a73d4eb017449400a7891443e3b

#2: block/024

   Fails on slow machines. Reported in Dec/2022. Test case side issue is
   suspected. Still needs further investigation.

   block/024 (do I/O faster than a jiffy and check iostats times) [failed]
    runtime    ...  4.347s
    --- tests/block/024.out     2022-12-06 20:51:41.525066605 +0900
    +++ /home/shin/kts/kernel-test-suite/sets/blktests/log/runlog/nodev/block/024.out.bad       2022-12-07 12:51:03.610924521 +0900
    @@ -6,5 +6,5 @@
     read 1 s
     write 1 s
     read 2 s
    -write 3 s
    +write 4 s
     Test complete

#3: nvme/003 (fabrics transport)

   When nvme test group is run with trtype=rdma or tcp, the test case fails
   due to lockdep WARNING "possible circular locking dependency detected".
   Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] but it
   needs more discussion.

   [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/

#4: nvme/030 or nvme/031 (rdma transport with siw)

   When nvme test group is run with trtype=rdma and use_siw=1 configurations,
   nvme/030 or nvme/031 fail occasionally due to "BUG: KASAN: slab-use-after-
   free in __mutex_lock". Reported to linux-rdma in May/2023. A fix was
   suggested but it did not fix the root cause in rdma iwarp cm [5]. Waiting
   for a good fix.

   [5] https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/

#5: nvme/* (fc transport)

   With trtype=fc configuration, test run on nvme test group hangs. Daniel is
   driving fix work [6].

   [6] https://lore.kernel.org/linux-nvme/20230620133711.22840-1-dwagner@suse.de/


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blktests failures with v6.4
  2023-07-07  7:27 blktests failures with v6.4 Shinichiro Kawasaki
@ 2023-07-09 14:32 ` Sagi Grimberg
  2023-07-13  1:22   ` Shinichiro Kawasaki
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2023-07-09 14:32 UTC (permalink / raw)
  To: Shinichiro Kawasaki, linux-block, linux-nvme, linux-scsi


> #3: nvme/003 (fabrics transport)
> 
>     When nvme test group is run with trtype=rdma or tcp, the test case fails
>     due to lockdep WARNING "possible circular locking dependency detected".
>     Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] but it
>     needs more discussion.
> 
>     [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/

This patch is unfortunately incorrect and buggy.

This will likely make the issue go away, but adds another
old issue where a client can DDOS a target by bombarding it
with connect/disconnect. When releases are async and we don't
have any back-pressure, it is likely to happen.
--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 4597bca43a6d..8b4f4aa48206 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct 
rdma_cm_id *cm_id,
                 goto put_device;
         }

-       if (queue->host_qid == 0) {
-               /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
-       }
-
         ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
         if (ret) {
                 /*
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 868aa4de2e4c..c8cfa19e11c7 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct 
nvmet_sq *sq)
         struct nvmet_tcp_queue *queue =
                 container_of(sq, struct nvmet_tcp_queue, nvme_sq);

-       if (sq->qid == 0) {
-               /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
-       }
-
         queue->nr_cmds = sq->size * 2;
         if (nvmet_tcp_alloc_cmds(queue))
                 return NVME_SC_INTERNAL;
--


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: blktests failures with v6.4
  2023-07-09 14:32 ` Sagi Grimberg
@ 2023-07-13  1:22   ` Shinichiro Kawasaki
  2023-07-13  7:48     ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Shinichiro Kawasaki @ 2023-07-13  1:22 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-block, linux-nvme, linux-scsi

On Jul 09, 2023 / 17:32, Sagi Grimberg wrote:
> 
> > #3: nvme/003 (fabrics transport)
> > 
> >     When nvme test group is run with trtype=rdma or tcp, the test case fails
> >     due to lockdep WARNING "possible circular locking dependency detected".
> >     Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] but it
> >     needs more discussion.
> > 
> >     [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/
> 
> This patch is unfortunately incorrect and buggy.
> 
> This will likely make the issue go away, but adds another
> old issue where a client can DDOS a target by bombarding it
> with connect/disconnect. When releases are async and we don't
> have any back-pressure, it is likely to happen.
> --
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 4597bca43a6d..8b4f4aa48206 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id
> *cm_id,
>                 goto put_device;
>         }
> 
> -       if (queue->host_qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> -       }
> -
>         ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>         if (ret) {
>                 /*
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 868aa4de2e4c..c8cfa19e11c7 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq
> *sq)
>         struct nvmet_tcp_queue *queue =
>                 container_of(sq, struct nvmet_tcp_queue, nvme_sq);
> 
> -       if (sq->qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> -       }
> -
>         queue->nr_cmds = sq->size * 2;
>         if (nvmet_tcp_alloc_cmds(queue))
>                 return NVME_SC_INTERNAL;
> --

Thanks Sagi, I tried the patch above and confirmed the lockdep WARN disappears
for both rdma and tcp. It indicates that the flush_workqueue(nvmet_wq)
introduced the circular lock dependency. I also found the two commits below
record why the flush_workqueue(nvmet_wq) was introduced.

 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
 8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")

The left question is how to avoid both the connect/disconnect bombarding DDOS
and the circular lock possibility related to the nvmet_wq completion.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blktests failures with v6.4
  2023-07-13  1:22   ` Shinichiro Kawasaki
@ 2023-07-13  7:48     ` Sagi Grimberg
  2023-07-13  8:41       ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2023-07-13  7:48 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-block, linux-nvme, linux-scsi


>>> #3: nvme/003 (fabrics transport)
>>>
>>>      When nvme test group is run with trtype=rdma or tcp, the test case fails
>>>      due to lockdep WARNING "possible circular locking dependency detected".
>>>      Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] but it
>>>      needs more discussion.
>>>
>>>      [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/
>>
>> This patch is unfortunately incorrect and buggy.
>>
>> This will likely make the issue go away, but adds another
>> old issue where a client can DDOS a target by bombarding it
>> with connect/disconnect. When releases are async and we don't
>> have any back-pressure, it is likely to happen.
>> --
>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>> index 4597bca43a6d..8b4f4aa48206 100644
>> --- a/drivers/nvme/target/rdma.c
>> +++ b/drivers/nvme/target/rdma.c
>> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id
>> *cm_id,
>>                  goto put_device;
>>          }
>>
>> -       if (queue->host_qid == 0) {
>> -               /* Let inflight controller teardown complete */
>> -               flush_workqueue(nvmet_wq);
>> -       }
>> -
>>          ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>>          if (ret) {
>>                  /*
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 868aa4de2e4c..c8cfa19e11c7 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq
>> *sq)
>>          struct nvmet_tcp_queue *queue =
>>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>
>> -       if (sq->qid == 0) {
>> -               /* Let inflight controller teardown complete */
>> -               flush_workqueue(nvmet_wq);
>> -       }
>> -
>>          queue->nr_cmds = sq->size * 2;
>>          if (nvmet_tcp_alloc_cmds(queue))
>>                  return NVME_SC_INTERNAL;
>> --
> 
> Thanks Sagi, I tried the patch above and confirmed the lockdep WARN disappears
> for both rdma and tcp. It indicates that the flush_workqueue(nvmet_wq)
> introduced the circular lock dependency.

Thanks for confirming. This was expected.

> I also found the two commits below
> record why the flush_workqueue(nvmet_wq) was introduced.
> 
>   777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
>   8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")

The second patch is unrelated, before we used a global workqueue and
fundamentally had the same issue.

> The left question is how to avoid both the connect/disconnect bombarding DDOS
> and the circular lock possibility related to the nvmet_wq completion.

I don't see any way to synchronize connects with releases without moving 
connect sequences to a dedicated thread. Which in my mind is undesirable.

The only solution I can think of is to fail a host connect expecting the
host to reconnect and throttle this way, but that would lead to spurious
connect failures (at least from the host PoV).

Maybe we can add a NOT_READY connect error code in nvme for that...



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: blktests failures with v6.4
  2023-07-13  7:48     ` Sagi Grimberg
@ 2023-07-13  8:41       ` Hannes Reinecke
  2023-07-13 10:16         ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2023-07-13  8:41 UTC (permalink / raw)
  To: Sagi Grimberg, Shinichiro Kawasaki; +Cc: linux-block, linux-nvme, linux-scsi

On 7/13/23 09:48, Sagi Grimberg wrote:
> 
>>>> #3: nvme/003 (fabrics transport)
>>>>
>>>>      When nvme test group is run with trtype=rdma or tcp, the test 
>>>> case fails
>>>>      due to lockdep WARNING "possible circular locking dependency 
>>>> detected".
>>>>      Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] 
>>>> but it
>>>>      needs more discussion.
>>>>
>>>>      [4] 
>>>> https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/
>>>
>>> This patch is unfortunately incorrect and buggy.
>>>
>>> This will likely make the issue go away, but adds another
>>> old issue where a client can DDOS a target by bombarding it
>>> with connect/disconnect. When releases are async and we don't
>>> have any back-pressure, it is likely to happen.
>>> -- 
>>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>>> index 4597bca43a6d..8b4f4aa48206 100644
>>> --- a/drivers/nvme/target/rdma.c
>>> +++ b/drivers/nvme/target/rdma.c
>>> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct 
>>> rdma_cm_id
>>> *cm_id,
>>>                  goto put_device;
>>>          }
>>>
>>> -       if (queue->host_qid == 0) {
>>> -               /* Let inflight controller teardown complete */
>>> -               flush_workqueue(nvmet_wq);
>>> -       }
>>> -
>>>          ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>>>          if (ret) {
>>>                  /*
>>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>>> index 868aa4de2e4c..c8cfa19e11c7 100644
>>> --- a/drivers/nvme/target/tcp.c
>>> +++ b/drivers/nvme/target/tcp.c
>>> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct 
>>> nvmet_sq
>>> *sq)
>>>          struct nvmet_tcp_queue *queue =
>>>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>>
>>> -       if (sq->qid == 0) {
>>> -               /* Let inflight controller teardown complete */
>>> -               flush_workqueue(nvmet_wq);
>>> -       }
>>> -
>>>          queue->nr_cmds = sq->size * 2;
>>>          if (nvmet_tcp_alloc_cmds(queue))
>>>                  return NVME_SC_INTERNAL;
>>> -- 
>>
>> Thanks Sagi, I tried the patch above and confirmed the lockdep WARN 
>> disappears
>> for both rdma and tcp. It indicates that the flush_workqueue(nvmet_wq)
>> introduced the circular lock dependency.
> 
> Thanks for confirming. This was expected.
> 
>> I also found the two commits below
>> record why the flush_workqueue(nvmet_wq) was introduced.
>>
>>   777dc82395de ("nvmet-rdma: occasionally flush ongoing controller 
>> teardown")
>>   8832cf922151 ("nvmet: use a private workqueue instead of the system 
>> workqueue")
> 
> The second patch is unrelated, before we used a global workqueue and
> fundamentally had the same issue.
> 
>> The left question is how to avoid both the connect/disconnect 
>> bombarding DDOS
>> and the circular lock possibility related to the nvmet_wq completion.
> 
> I don't see any way to synchronize connects with releases without moving 
> connect sequences to a dedicated thread. Which in my mind is undesirable.
> 
> The only solution I can think of is to fail a host connect expecting the
> host to reconnect and throttle this way, but that would lead to spurious
> connect failures (at least from the host PoV).
> 
> Maybe we can add a NOT_READY connect error code in nvme for that...
> 
You know, I have been seeing the very same lockdep warning during TLS 
testing; wasn't sure, though, if it's a generic issue or something I've 
introduced with the TLS logic.

I guess we can solve this by adding another NVMET_TCP_Q_INIT state 
before NVMET_TCP_Q_CONNECTING; then we can flip the state from INIT to 
CONNECTING in nvmet_tcp_alloc_queue():

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ed98df72c76b..e6f699a44128 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -121,6 +121,7 @@ struct nvmet_tcp_cmd {
  };

  enum nvmet_tcp_queue_state {
+       NVMET_TCP_Q_INIT,
         NVMET_TCP_Q_CONNECTING,
         NVMET_TCP_Q_LIVE,
         NVMET_TCP_Q_DISCONNECTING,
@@ -1274,7 +1275,8 @@ static int nvmet_tcp_try_recv(struct 
nvmet_tcp_queue *queue,
  static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue 
*queue)
  {
         spin_lock(&queue->state_lock);
-       if (queue->state != NVMET_TCP_Q_DISCONNECTING) {
+       if (queue->state != NVMET_TCP_Q_INIT &&
+           queue->state != NVMET_TCP_Q_DISCONNECTING) {
                 queue->state = NVMET_TCP_Q_DISCONNECTING;
                 queue_work(nvmet_wq, &queue->release_work);
         }
@@ -1625,7 +1627,7 @@ static int nvmet_tcp_alloc_queue(struct 
nvmet_tcp_port *port,
         queue->port = port;
         queue->nr_cmds = 0;
         spin_lock_init(&queue->state_lock);
-       queue->state = NVMET_TCP_Q_CONNECTING;
+       queue->state = NVMET_TCP_Q_INIT;
         INIT_LIST_HEAD(&queue->free_list);
         init_llist_head(&queue->resp_list);
         INIT_LIST_HEAD(&queue->resp_send_list);
@@ -1832,10 +1834,12 @@ static u16 nvmet_tcp_install_queue(struct 
nvmet_sq *sq)
         struct nvmet_tcp_queue *queue =
                 container_of(sq, struct nvmet_tcp_queue, nvme_sq);

-       if (sq->qid == 0) {
-               /* Let inflight controller teardown complete */
-               flush_workqueue(nvmet_wq);
+       spin_lock(&queue->state_lock);
+       if (queue->state != NVMET_TCP_Q_INIT) {
+               spin_unlock(&queue->state_lock);
+               return NVME_SC_NOT_READY;
         }
+       spin_unlock(&queue->state_lock);

         queue->nr_cmds = sq->size * 2;
         if (nvmet_tcp_alloc_cmds(queue))

With that we'll return 'not ready' whenever we hit this condition, but 
that should be fine as we would've crashed anyway with the old code.

Hmm?

Cheers,

Hannes


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: blktests failures with v6.4
  2023-07-13  8:41       ` Hannes Reinecke
@ 2023-07-13 10:16         ` Sagi Grimberg
  0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2023-07-13 10:16 UTC (permalink / raw)
  To: Hannes Reinecke, Shinichiro Kawasaki; +Cc: linux-block, linux-nvme, linux-scsi



On 7/13/23 11:41, Hannes Reinecke wrote:
> On 7/13/23 09:48, Sagi Grimberg wrote:
>>
>>>>> #3: nvme/003 (fabrics transport)
>>>>>
>>>>>      When nvme test group is run with trtype=rdma or tcp, the test 
>>>>> case fails
>>>>>      due to lockdep WARNING "possible circular locking dependency 
>>>>> detected".
>>>>>      Reported in May/2023. Bart suggested a fix for trytpe=rdma [4] 
>>>>> but it
>>>>>      needs more discussion.
>>>>>
>>>>>      [4] 
>>>>> https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/
>>>>
>>>> This patch is unfortunately incorrect and buggy.
>>>>
>>>> This will likely make the issue go away, but adds another
>>>> old issue where a client can DDOS a target by bombarding it
>>>> with connect/disconnect. When releases are async and we don't
>>>> have any back-pressure, it is likely to happen.
>>>> -- 
>>>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>>>> index 4597bca43a6d..8b4f4aa48206 100644
>>>> --- a/drivers/nvme/target/rdma.c
>>>> +++ b/drivers/nvme/target/rdma.c
>>>> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct 
>>>> rdma_cm_id
>>>> *cm_id,
>>>>                  goto put_device;
>>>>          }
>>>>
>>>> -       if (queue->host_qid == 0) {
>>>> -               /* Let inflight controller teardown complete */
>>>> -               flush_workqueue(nvmet_wq);
>>>> -       }
>>>> -
>>>>          ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>>>>          if (ret) {
>>>>                  /*
>>>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>>>> index 868aa4de2e4c..c8cfa19e11c7 100644
>>>> --- a/drivers/nvme/target/tcp.c
>>>> +++ b/drivers/nvme/target/tcp.c
>>>> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct 
>>>> nvmet_sq
>>>> *sq)
>>>>          struct nvmet_tcp_queue *queue =
>>>>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>>>
>>>> -       if (sq->qid == 0) {
>>>> -               /* Let inflight controller teardown complete */
>>>> -               flush_workqueue(nvmet_wq);
>>>> -       }
>>>> -
>>>>          queue->nr_cmds = sq->size * 2;
>>>>          if (nvmet_tcp_alloc_cmds(queue))
>>>>                  return NVME_SC_INTERNAL;
>>>> -- 
>>>
>>> Thanks Sagi, I tried the patch above and confirmed the lockdep WARN 
>>> disappears
>>> for both rdma and tcp. It indicates that the flush_workqueue(nvmet_wq)
>>> introduced the circular lock dependency.
>>
>> Thanks for confirming. This was expected.
>>
>>> I also found the two commits below
>>> record why the flush_workqueue(nvmet_wq) was introduced.
>>>
>>>   777dc82395de ("nvmet-rdma: occasionally flush ongoing controller 
>>> teardown")
>>>   8832cf922151 ("nvmet: use a private workqueue instead of the system 
>>> workqueue")
>>
>> The second patch is unrelated, before we used a global workqueue and
>> fundamentally had the same issue.
>>
>>> The left question is how to avoid both the connect/disconnect 
>>> bombarding DDOS
>>> and the circular lock possibility related to the nvmet_wq completion.
>>
>> I don't see any way to synchronize connects with releases without 
>> moving connect sequences to a dedicated thread. Which in my mind is 
>> undesirable.
>>
>> The only solution I can think of is to fail a host connect expecting the
>> host to reconnect and throttle this way, but that would lead to spurious
>> connect failures (at least from the host PoV).
>>
>> Maybe we can add a NOT_READY connect error code in nvme for that...
>>
> You know, I have been seeing the very same lockdep warning during TLS 
> testing; wasn't sure, though, if it's a generic issue or something I've 
> introduced with the TLS logic.
> 
> I guess we can solve this by adding another NVMET_TCP_Q_INIT state 
> before NVMET_TCP_Q_CONNECTING; then we can flip the state from INIT to 
> CONNECTING in nvmet_tcp_alloc_queue():
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index ed98df72c76b..e6f699a44128 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -121,6 +121,7 @@ struct nvmet_tcp_cmd {
>   };
> 
>   enum nvmet_tcp_queue_state {
> +       NVMET_TCP_Q_INIT,
>          NVMET_TCP_Q_CONNECTING,
>          NVMET_TCP_Q_LIVE,
>          NVMET_TCP_Q_DISCONNECTING,
> @@ -1274,7 +1275,8 @@ static int nvmet_tcp_try_recv(struct 
> nvmet_tcp_queue *queue,
>   static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue 
> *queue)
>   {
>          spin_lock(&queue->state_lock);
> -       if (queue->state != NVMET_TCP_Q_DISCONNECTING) {
> +       if (queue->state != NVMET_TCP_Q_INIT &&
> +           queue->state != NVMET_TCP_Q_DISCONNECTING) {
>                  queue->state = NVMET_TCP_Q_DISCONNECTING;
>                  queue_work(nvmet_wq, &queue->release_work);
>          }
> @@ -1625,7 +1627,7 @@ static int nvmet_tcp_alloc_queue(struct 
> nvmet_tcp_port *port,
>          queue->port = port;
>          queue->nr_cmds = 0;
>          spin_lock_init(&queue->state_lock);
> -       queue->state = NVMET_TCP_Q_CONNECTING;
> +       queue->state = NVMET_TCP_Q_INIT;
>          INIT_LIST_HEAD(&queue->free_list);
>          init_llist_head(&queue->resp_list);
>          INIT_LIST_HEAD(&queue->resp_send_list);
> @@ -1832,10 +1834,12 @@ static u16 nvmet_tcp_install_queue(struct 
> nvmet_sq *sq)
>          struct nvmet_tcp_queue *queue =
>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
> 
> -       if (sq->qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> +       spin_lock(&queue->state_lock);
> +       if (queue->state != NVMET_TCP_Q_INIT) {
> +               spin_unlock(&queue->state_lock);
> +               return NVME_SC_NOT_READY;
>          }
> +       spin_unlock(&queue->state_lock);
> 
>          queue->nr_cmds = sq->size * 2;
>          if (nvmet_tcp_alloc_cmds(queue))
> 
> With that we'll return 'not ready' whenever we hit this condition, but 
> that should be fine as we would've crashed anyway with the old code.
> 
> Hmm?

I don't understand what this patch is doing... Nor how it solves
anything.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-07-13 10:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-07  7:27 blktests failures with v6.4 Shinichiro Kawasaki
2023-07-09 14:32 ` Sagi Grimberg
2023-07-13  1:22   ` Shinichiro Kawasaki
2023-07-13  7:48     ` Sagi Grimberg
2023-07-13  8:41       ` Hannes Reinecke
2023-07-13 10:16         ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).