From: Sagi Grimberg <sagi@grimberg.me>
To: Hannes Reinecke <hare@suse.de>,
	Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: blktests failures with v6.4
Date: Thu, 13 Jul 2023 13:16:31 +0300
Message-ID: <5cb63b10-72ab-4b7c-7574-48583ea8ffd1@grimberg.me>
In-Reply-To: <142962ad-55fc-d193-de7a-949f290d3fdf@suse.de>



On 7/13/23 11:41, Hannes Reinecke wrote:
> On 7/13/23 09:48, Sagi Grimberg wrote:
>>
>>>>> #3: nvme/003 (fabrics transport)
>>>>>
>>>>>      When the nvme test group is run with trtype=rdma or tcp, the
>>>>>      test case fails due to a lockdep WARNING "possible circular
>>>>>      locking dependency detected". Reported in May/2023. Bart
>>>>>      suggested a fix for trtype=rdma [4] but it needs more
>>>>>      discussion.
>>>>>
>>>>>      [4] https://lore.kernel.org/linux-nvme/20230511150321.103172-1-bvanassche@acm.org/
>>>>
>>>> This patch is unfortunately incorrect and buggy.
>>>>
>>>> The change below will likely make the issue go away, but it
>>>> reintroduces an old issue where a client can DDoS a target by
>>>> bombarding it with connects/disconnects. When releases are async
>>>> and we don't have any back-pressure, that is likely to happen.
>>>> -- 
>>>> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
>>>> index 4597bca43a6d..8b4f4aa48206 100644
>>>> --- a/drivers/nvme/target/rdma.c
>>>> +++ b/drivers/nvme/target/rdma.c
>>>> @@ -1582,11 +1582,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>>>>                  goto put_device;
>>>>          }
>>>>
>>>> -       if (queue->host_qid == 0) {
>>>> -               /* Let inflight controller teardown complete */
>>>> -               flush_workqueue(nvmet_wq);
>>>> -       }
>>>> -
>>>>          ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
>>>>          if (ret) {
>>>>                  /*
>>>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>>>> index 868aa4de2e4c..c8cfa19e11c7 100644
>>>> --- a/drivers/nvme/target/tcp.c
>>>> +++ b/drivers/nvme/target/tcp.c
>>>> @@ -1844,11 +1844,6 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
>>>>          struct nvmet_tcp_queue *queue =
>>>>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
>>>>
>>>> -       if (sq->qid == 0) {
>>>> -               /* Let inflight controller teardown complete */
>>>> -               flush_workqueue(nvmet_wq);
>>>> -       }
>>>> -
>>>>          queue->nr_cmds = sq->size * 2;
>>>>          if (nvmet_tcp_alloc_cmds(queue))
>>>>                  return NVME_SC_INTERNAL;
>>>> -- 
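To spell out the back-pressure being removed here (a rough sketch of
the two paths involved, reusing the calls quoted above rather than
new code):

	/* disconnect path: teardown is deferred and returns at once */
	queue_work(nvmet_wq, &queue->release_work);

	/* admin-queue connect path (the hunks removed above): wait for
	 * all previously queued teardowns to drain before accepting */
	if (queue->host_qid == 0)
		flush_workqueue(nvmet_wq);

Without the flush, nothing paces a client that connects and
disconnects in a tight loop, so teardown work can pile up unboundedly.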
>>>
>>> Thanks Sagi, I tried the patch above and confirmed the lockdep WARN
>>> disappears for both rdma and tcp. It indicates that the
>>> flush_workqueue(nvmet_wq) introduced the circular lock dependency.
>>
>> Thanks for confirming. This was expected.
>>
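For context, the cycle lockdep complains about presumably has the
usual flush shape (a hedged reconstruction -- the splat isn't quoted
in this thread, and lock_A is a stand-in, not a real nvmet lock):

	/* connect path: enters with a lock held, then flushes */
	mutex_lock(&lock_A);
	flush_workqueue(nvmet_wq);	/* waits for release_work */

	/* release_work, running on nvmet_wq, takes the same lock */
	mutex_lock(&lock_A);

The flusher waits on work that needs a lock the flusher already
holds, hence the "possible circular locking dependency" report.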
>>> I also found the two commits below record why the
>>> flush_workqueue(nvmet_wq) was introduced.
>>>
>>>   777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown")
>>>   8832cf922151 ("nvmet: use a private workqueue instead of the system workqueue")
>>
>> The second patch is unrelated; before it, we used a global workqueue
>> and fundamentally had the same issue.
>>
>>> The remaining question is how to avoid both the connect/disconnect
>>> DDoS bombardment and the possible circular locking around the
>>> nvmet_wq flush.
>>
>> I don't see any way to synchronize connects with releases without
>> moving connect sequences to a dedicated thread, which in my mind is
>> undesirable.
>>
>> The only solution I can think of is to fail a host connect expecting the
>> host to reconnect and throttle this way, but that would lead to spurious
>> connect failures (at least from the host PoV).
>>
>> Maybe we can add a NOT_READY connect error code in nvme for that...
>>
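Concretely, that idea would look something like the sketch below
(illustration only: NVME_SC_CONNECT_NOT_READY and
nvmet_teardowns_pending() are made-up names, since neither the status
code nor the helper exists today):

	/* in the install_queue path, instead of flushing nvmet_wq: */
	if (sq->qid == 0 && nvmet_teardowns_pending())
		/* retryable failure; the host's reconnect logic
		 * provides the throttling */
		return NVME_SC_CONNECT_NOT_READY;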
> You know, I have been seeing the very same lockdep warning during TLS 
> testing; wasn't sure, though, if it's a generic issue or something I've 
> introduced with the TLS logic.
> 
> I guess we can solve this by adding another NVMET_TCP_Q_INIT state 
> before NVMET_TCP_Q_CONNECTING; then we can flip the state from INIT to 
> CONNECTING in nvmet_tcp_alloc_queue():
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index ed98df72c76b..e6f699a44128 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -121,6 +121,7 @@ struct nvmet_tcp_cmd {
>   };
> 
>   enum nvmet_tcp_queue_state {
> +       NVMET_TCP_Q_INIT,
>          NVMET_TCP_Q_CONNECTING,
>          NVMET_TCP_Q_LIVE,
>          NVMET_TCP_Q_DISCONNECTING,
> @@ -1274,7 +1275,8 @@ static int nvmet_tcp_try_recv(struct nvmet_tcp_queue *queue,
>   static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
>   {
>          spin_lock(&queue->state_lock);
> -       if (queue->state != NVMET_TCP_Q_DISCONNECTING) {
> +       if (queue->state != NVMET_TCP_Q_INIT &&
> +           queue->state != NVMET_TCP_Q_DISCONNECTING) {
>                  queue->state = NVMET_TCP_Q_DISCONNECTING;
>                  queue_work(nvmet_wq, &queue->release_work);
>          }
> @@ -1625,7 +1627,7 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
>          queue->port = port;
>          queue->nr_cmds = 0;
>          spin_lock_init(&queue->state_lock);
> -       queue->state = NVMET_TCP_Q_CONNECTING;
> +       queue->state = NVMET_TCP_Q_INIT;
>          INIT_LIST_HEAD(&queue->free_list);
>          init_llist_head(&queue->resp_list);
>          INIT_LIST_HEAD(&queue->resp_send_list);
> @@ -1832,10 +1834,12 @@ static u16 nvmet_tcp_install_queue(struct nvmet_sq *sq)
>          struct nvmet_tcp_queue *queue =
>                  container_of(sq, struct nvmet_tcp_queue, nvme_sq);
> 
> -       if (sq->qid == 0) {
> -               /* Let inflight controller teardown complete */
> -               flush_workqueue(nvmet_wq);
> +       spin_lock(&queue->state_lock);
> +       if (queue->state != NVMET_TCP_Q_INIT) {
> +               spin_unlock(&queue->state_lock);
> +               return NVME_SC_NOT_READY;
>          }
> +       spin_unlock(&queue->state_lock);
> 
>          queue->nr_cmds = sq->size * 2;
>          if (nvmet_tcp_alloc_cmds(queue))
> 
> With that we'll return 'not ready' whenever we hit this condition, but
> that should be fine, as we would've crashed anyway with the old code.
> 
> Hmm?

I don't understand what this patch is doing... Nor how it solves
anything.

