From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, hch@lst.de
Subject: Re: nvme tcp receive errors
Date: Mon, 17 May 2021 13:48:55 -0700
Message-ID: <20210517204855.GB2709569@dhcp-10-100-145-180.wdc.com>
In-Reply-To: <68ca75e3-2e0a-c0d6-c6cd-ab4d7467c0ad@grimberg.me>
On Thu, May 13, 2021 at 12:53:54PM -0700, Sagi Grimberg wrote:
> On 5/13/21 8:48 AM, Keith Busch wrote:
> > On Tue, May 11, 2021 at 10:17:09AM -0700, Sagi Grimberg wrote:
> > >
> > > > > I may have a theory to this issue. I think that the problem is in
> > > > > cases where we send commands with data to the controller and then in
> > > > > nvme_tcp_send_data between the last successful kernel_sendpage
> > > > > and before nvme_tcp_advance_req, the controller sends back a successful
> > > > > completion.
> > > > >
> > > > > If that is the case, then the completion path could be triggered,
> > > > > the tag would be reused, triggering a new .queue_rq, setting again
> > > > > the req.iter with the new bio params (all is not taken by the
> > > > > send_mutex) and then the send context would call nvme_tcp_advance_req
> > > > > progressing the req.iter with the former sent bytes... And given that
> > > > > the req.iter is used for reads/writes, it is possible that it can
> > > > > explain both issues.
> > > > >
> > > > > While this is not easy to trigger, there is nothing I think that
> > > > > can prevent that. The driver used to have a single context that
> > > > > would do both send and recv so this could not have happened, but
> > > > > now that we added the .queue_rq send context, I guess this can
> > > > > indeed confuse the driver.
> > > >
> > > > Awesome, this is exactly the type of sequence I've been trying to
> > > > capture, but couldn't quite get there. Now that you've described it,
> > > > that flow can certainly explain the observations, including the
> > > > corrupted debug trace event I was trying to add.
> > > >
> > > > The sequence looks unlikely to happen, which agrees with the difficulty
> > > > in reproducing it. I am betting right now that you got it, but a little
> > > > surprised no one else is reporting a similar problem yet.
> > >
> > > We had at least one report from Potnuri that I think may have been
> > > triggered by this, this ended up fixed (or rather worked-around
> > > with 5c11f7d9f843).
> > >
> > > > Your option "1" looks like the best one, IMO. I've requested dropping
> > > > all debug and test patches and using just this one on the current nvme
> > > > baseline for the next test cycle.
> > >
> > > Cool, waiting to hear back...
> >
> > This patch has been tested successfully on the initial workloads. There
> > are several more that need to be validated, but each one runs for many
> > hours, so it may be a couple more days before completed. Just wanted to
> > let you know: so far, so good.
>
> Encouraging... I'll send a patch for that as soon as you give me the
> final verdict. I'm assuming Narayan would be the reporter and the
> tester?
This tests successfully. There was one timeout issue observed across all
the testing, but it does not appear related to the problems reported here
or to your fix, so I will start a new thread on that if I can get more
information on it.
You may use the following tags for the commit log:
Reported-by: Narayan Ayalasomayajula <narayan.ayalasomayajula@wdc.com>
Tested-by: Anil Mishra <anil.mishra@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
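
For readers following the thread, the interleaving Sagi describes in the
quoted text above can be replayed in a small user-space model. This is
only a sketch under stated assumptions, not driver code: struct model_req,
model_queue_rq(), model_advance_req() and model_replay_race() are
hypothetical names, the byte counts are made up, and the steps run
sequentially in one thread to make the ordering deterministic. The
"guard_reuse" check is likewise an illustrative assumption and is not the
fix discussed in this thread (option "1" is not reproduced in this
message).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one tag's request state; not the driver's struct. */
struct model_req {
	unsigned int generation; /* bumped each time the tag is (re)used */
	size_t iter_offset;      /* models how far req.iter has progressed */
	size_t data_len;         /* payload length of the current command */
};

/* Models .queue_rq: set the request up again for a new command. */
static void model_queue_rq(struct model_req *req, size_t len)
{
	assert(len > 0);
	req->generation++;
	req->iter_offset = 0;
	req->data_len = len;
}

/* Models nvme_tcp_advance_req: account for bytes that were already sent. */
static void model_advance_req(struct model_req *req, size_t sent)
{
	req->iter_offset += sent;
}

/*
 * Replays the described ordering and returns the iterator offset that the
 * second command ends up with. A correct result is 0, since none of the
 * second command's data has been sent yet.
 */
size_t model_replay_race(int guard_reuse)
{
	struct model_req req = { 0 };
	unsigned int gen_at_send;
	size_t sent;

	/* Send context: command A is queued and its last send succeeds. */
	model_queue_rq(&req, 4096);
	gen_at_send = req.generation;
	sent = req.data_len; /* the advance for these bytes is still pending */

	/* Completion for A arrives first; the tag is reused for command B. */
	model_queue_rq(&req, 8192);

	/*
	 * Send context resumes and applies A's byte count. Without the
	 * (assumed, illustrative) guard this lands on B's iterator.
	 */
	if (!guard_reuse || req.generation == gen_at_send)
		model_advance_req(&req, sent);

	return req.iter_offset;
}
```

Calling model_replay_race(0) leaves the second command's iterator advanced
by the first command's bytes, which matches the kind of corruption that
could affect both reads and writes; model_replay_race(1) skips the stale
advance and leaves the iterator at its starting position.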