From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, hch@lst.de
Subject: Re: nvme tcp receive errors
Date: Thu, 13 May 2021 08:48:19 -0700
Message-ID: <20210513154819.GB2272284@dhcp-10-100-145-180.wdc.com>
In-Reply-To: <88879279-fff5-b26c-2c1c-52a700a1c40a@grimberg.me>

On Tue, May 11, 2021 at 10:17:09AM -0700, Sagi Grimberg wrote:
> 
> > > I may have a theory to this issue. I think that the problem is in
> > > cases where we send commands with data to the controller and then in
> > > nvme_tcp_send_data between the last successful kernel_sendpage
> > > and before nvme_tcp_advance_req, the controller sends back a successful
> > > completion.
> > > 
> > > If that is the case, then the completion path could be triggered,
> > > the tag would be reused, triggering a new .queue_rq, setting again
> > > the req.iter with the new bio params (all is not taken by the
> > > send_mutex) and then the send context would call nvme_tcp_advance_req
> > > progressing the req.iter with the former sent bytes... And given that
> > > the req.iter is used for reads/writes, it is possible that it can
> > > explain both issues.
> > > 
> > > While this is not easy to trigger, there is nothing I think that
> > > can prevent that. The driver used to have a single context that
> > > would do both send and recv so this could not have happened, but
> > > now that we added the .queue_rq send context, I guess this can
> > > indeed confuse the driver.
> > 
> > Awesome, this is exactly the type of sequence I've been trying to
> > capture, but couldn't quite get there. Now that you've described it,
> > that flow can certainly explain the observations, including the
> > corrupted debug trace event I was trying to add.
> > 
> > The sequence looks unlikely to happen, which agrees with the difficulty
> > in reproducing it. I am betting right now that you got it, but a little
> > surprised no one else is reporting a similar problem yet.
> 
> We had at least one report from Potnuri that I think may have been
> triggered by this, this ended up fixed (or rather worked-around
> with 5c11f7d9f843).
> 
> > Your option "1" looks like the best one, IMO. I've requested dropping
> > all debug and test patches and using just this one on the current nvme
> > baseline for the next test cycle.
> 
> Cool, waiting to hear back...

This patch has been tested successfully on the initial workloads. There
are several more that need to be validated, but each one runs for many
hours, so it may be a couple more days before they complete. Just wanted
to let you know: so far, so good.
