From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, hch@lst.de
Subject: Re: [RFC PATCH] nvme-tcp: rerun io_work if req_list is not empty
Date: Tue, 18 May 2021 07:38:17 -0700	[thread overview]
Message-ID: <20210518143817.GD2709569@dhcp-10-100-145-180.wdc.com> (raw)
In-Reply-To: <11e1733a-24d3-72c2-1ece-4f9d1a8fade1@grimberg.me>

On Mon, May 17, 2021 at 08:02:41PM -0700, Sagi Grimberg wrote:
> > nvme_tcp_send_all() breaks out of the loop if nvme_tcp_fetch_request()
> > returns NULL. If that happens just before io_work calls
> > nvme_tcp_handle_r2t() to enqueue the H2C request, nvme_tcp_send_all()
> > will not see that request, but send_mutex is still held. We're counting
> > on io_work to run again to handle sending the H2C data in that case.
> > Unlikely as it sounds, if the same nvme_tcp_send_all() context is still
> > holding the send_mutex when io_work gets back to trying to take it, how
> > will the data get sent?
> 
> Yes, you are correct; I overlooked this race. I guess this all comes
> from having fewer queues than cores (where rx really competes with
> tx), which is a non-default and less common configuration.
> 
> This is enough to convince me that this is needed:
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

Great! I thought the scenario seemed possible, but wasn't completely
sure, so thank you for confirming.
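
To spell out the window I have in mind (a sketch of the interleaving
based on the description above, not a trace we actually captured):

  queue_rq context                      io_work context
  ----------------                      ---------------
  nvme_tcp_send_all()
    mutex_lock(&queue->send_mutex)
    nvme_tcp_fetch_request() -> NULL
    (send loop exits)                   nvme_tcp_handle_r2t()
                                          queues H2C req on req_list
                                        trylock of send_mutex fails,
                                          io_work sees nothing pending
                                          and returns
    mutex_unlock(&queue->send_mutex)

Nothing is left to send the queued H2C data until a new command kicks
the queue, so the request can sit there until it times out.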

Christoph, can we pick this up for the next rc? For stable, we can add

Fixes: db5ad6b7f8cdd ("nvme-tcp: try to send request in queue_rq context")
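
For anyone following along, the change itself is small. Roughly (a
sketch of the idea in nvme_tcp_io_work(), not the exact diff; the
surrounding send/recv loop and error handling are elided):

	if (mutex_trylock(&queue->send_mutex)) {
		result = nvme_tcp_try_send(queue);
		mutex_unlock(&queue->send_mutex);
		if (result > 0)
			pending = true;
		/* error handling elided */
	} else if (!llist_empty(&queue->req_list)) {
		/*
		 * Another context holds send_mutex and may exit its send
		 * loop without seeing this request; treat the non-empty
		 * req_list as pending work so io_work reschedules itself
		 * instead of relying on a future kick.
		 */
		pending = true;
	}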
 
> > > Can we maybe try to catch if that is the case?
> > 
> > Do you have a better idea on how we can catch this? I think there has
> > been only one sighting of this so far, and it looks like it took a
> > long time to encounter, but we will try again if you have a proposal.
> 
> We can continue to test with the patch and hunt for another occurrence;
> given the argument above, this patch is needed regardless...

Sounds good, we'll run with the patch and see what happens. If the tests
are successful, I'm not sure whether we can conclude this definitively
fixes the timeout or whether we just got lucky. If a timeout is observed,
though, I will try to work in a debug patch.
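
If it comes to that, I'm thinking of something simple along these lines
(hypothetical, just to illustrate the kind of check; exact wording and
placement to be determined):

	/*
	 * Hypothetical debug aid for the timeout handler: report whether
	 * requests were still queued for send when the timeout fired,
	 * which would implicate the stalled-send race rather than a
	 * lost completion.
	 */
	if (!llist_empty(&queue->req_list))
		dev_warn(queue->ctrl->ctrl.device,
			 "queue %d: timeout with requests still queued for send\n",
			 nvme_tcp_queue_id(queue));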

