From: Keith Busch <keith.busch@intel.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Jaesoo Lee <jalee@purestorage.com>,
axboe@fb.com, hch@lst.de, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org,
Prabhath Sajeepa <psajeepa@purestorage.com>,
Roland Dreier <roland@purestorage.com>,
Ashish Karkare <ashishk@purestorage.com>
Subject: Re: [PATCH] nvme-rdma: complete requests from ->timeout
Date: Fri, 7 Dec 2018 19:02:01 -0700 [thread overview]
Message-ID: <20181208020201.GD21523@localhost.localdomain> (raw)
In-Reply-To: <2055d5b5-2c27-b5a2-e3a0-75146c7bd227@grimberg.me>
On Fri, Dec 07, 2018 at 12:05:37PM -0800, Sagi Grimberg wrote:
>
> > Could you please take a look at this bug and code review?
> >
> > We are seeing more instances of this bug and found that reconnect_work
> > could hang as well, as can be seen from below stacktrace.
> >
> > Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> > Call Trace:
> > __schedule+0x2ab/0x880
> > schedule+0x36/0x80
> > schedule_timeout+0x161/0x300
> > ? __next_timer_interrupt+0xe0/0xe0
> > io_schedule_timeout+0x1e/0x50
> > wait_for_completion_io_timeout+0x130/0x1a0
> > ? wake_up_q+0x80/0x80
> > blk_execute_rq+0x6e/0xa0
> > __nvme_submit_sync_cmd+0x6e/0xe0
> > nvmf_connect_admin_queue+0x128/0x190 [nvme_fabrics]
> > ? wait_for_completion_interruptible_timeout+0x157/0x1b0
> > nvme_rdma_start_queue+0x5e/0x90 [nvme_rdma]
> > nvme_rdma_setup_ctrl+0x1b4/0x730 [nvme_rdma]
> > nvme_rdma_reconnect_ctrl_work+0x27/0x70 [nvme_rdma]
> > process_one_work+0x179/0x390
> > worker_thread+0x4f/0x3e0
> > kthread+0x105/0x140
> > ? max_active_store+0x80/0x80
> > ? kthread_bind+0x20/0x20
> >
> > This bug is reproduced by setting the MTU of the RoCE interface to
> > '568' for testing while running I/O traffic.
>
> I think that with the latest changes from Keith we can no longer rely
> on blk-mq to barrier racing completions. We will probably need
> to barrier ourselves in nvme-rdma...
You really need to do that anyway. If you were relying on blk-mq to save
you from double completions by ending a request in the nvme driver while
the lower half can still complete the same one, then the only thing
preventing data corruption is the probability that the request wasn't
reallocated for a new command.
Thread overview: 13+ messages
2018-11-29 23:59 [PATCH] nvme-rdma: complete requests from ->timeout Jaesoo Lee
2018-11-30 1:30 ` Sagi Grimberg
2018-11-30 1:54 ` Jaesoo Lee
2018-12-07 0:18 ` Jaesoo Lee
2018-12-07 20:05 ` Sagi Grimberg
2018-12-08 2:02 ` Keith Busch [this message]
2018-12-08 6:28 ` Jaesoo Lee
2018-12-09 14:22 ` Nitzan Carmi
2018-12-10 23:40 ` Jaesoo Lee
2018-12-11 9:14 ` Nitzan Carmi
2018-12-11 23:16 ` Jaesoo Lee
2018-12-11 23:38 ` Sagi Grimberg
2018-12-12 1:31 ` Jaesoo Lee