From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, hch@lst.de
Subject: Re: nvme tcp receive errors
Date: Mon, 17 May 2021 13:48:55 -0700
Message-ID: <20210517204855.GB2709569@dhcp-10-100-145-180.wdc.com>
In-Reply-To: <68ca75e3-2e0a-c0d6-c6cd-ab4d7467c0ad@grimberg.me>

On Thu, May 13, 2021 at 12:53:54PM -0700, Sagi Grimberg wrote:
> On 5/13/21 8:48 AM, Keith Busch wrote:
> > On Tue, May 11, 2021 at 10:17:09AM -0700, Sagi Grimberg wrote:
> > > 
> > > > > I may have a theory to this issue. I think that the problem is in
> > > > > cases where we send commands with data to the controller and then in
> > > > > nvme_tcp_send_data between the last successful kernel_sendpage
> > > > > and before nvme_tcp_advance_req, the controller sends back a successful
> > > > > completion.
> > > > > 
> > > > > If that is the case, then the completion path could be triggered,
> > > > > the tag would be reused, triggering a new .queue_rq, setting again
> > > > > > the req.iter with the new bio params (none of this is done under the
> > > > > > send_mutex) and then the send context would call nvme_tcp_advance_req
> > > > > progressing the req.iter with the former sent bytes... And given that
> > > > > the req.iter is used for reads/writes, it is possible that it can
> > > > > explain both issues.
> > > > > 
> > > > > While this is not easy to trigger, there is nothing I think that
> > > > > can prevent that. The driver used to have a single context that
> > > > > would do both send and recv so this could not have happened, but
> > > > > now that we added the .queue_rq send context, I guess this can
> > > > > indeed confuse the driver.
> > > > 
> > > > Awesome, this is exactly the type of sequence I've been trying to
> > > > capture, but couldn't quite get there. Now that you've described it,
> > > > that flow can certainly explain the observations, including the
> > > > corrupted debug trace event I was trying to add.
> > > > 
> > > > The sequence looks unlikely to happen, which agrees with the difficulty
> > > > in reproducing it. I am betting right now that you got it, but a little
> > > > surprised no one else is reporting a similar problem yet.
> > > 
> > > We had at least one report from Potnuri that I think may have been
> > > triggered by this; that ended up fixed (or rather worked around
> > > with 5c11f7d9f843).
> > > 
> > > > Your option "1" looks like the best one, IMO. I've requested dropping
> > > > all debug and test patches and using just this one on the current nvme
> > > > baseline for the next test cycle.
> > > 
> > > Cool, waiting to hear back...
> > 
> > This patch has been tested successfully on the initial workloads. There
> > are several more that need to be validated, but each one runs for many
> > hours, so it may be a couple more days before they are completed. Just
> > wanted to let you know: so far, so good.
> 
> Encouraging... I'll send a patch for that as soon as you give me the
> final verdict. I'm assuming Narayan would be the reporter and the
> tester?

This tests successfully. There was one timeout issue observed in all the
testing, but it does not appear related to the problems reported here or
to your fix, so I will start a new thread on that if I can get more
information on it.

You may use the following tags for the commit log:

Reported-by: Narayan Ayalasomayajula <narayan.ayalasomayajula@wdc.com>
Tested-by: Anil Mishra <anil.mishra@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
