From: Potnuri Bharat Teja <bharat@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Samuel Jones <sjones@kalrayinc.com>, "hch@lst.de" <hch@lst.de>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: Request timeout seen with NVMEoF TCP
Date: Mon, 14 Dec 2020 23:35:31 +0530
Message-ID: <X9epa3PPORdBKZ71@chelsio.com>
In-Reply-To: <dbb14893-da35-2f7e-0583-c33ce9f41243@grimberg.me>

On Monday, December 14, 2020 at 09:51:28 -0800, Sagi Grimberg wrote:
> 
> > Hi Sagi,
> > With the above patch I still see the issue, but less frequently. Without the
> > patch I could consistently reproduce the timeouts with 4 target devices; with
> > the patch, I/O runs fine for 4 targets. I tried the same test with 8 target
> > devices and hit the timeout below. I have seen only one instance so far, so
> > I'll let it run for some more time, or rerun, and update.
> 
> Hey Potnuri,
> 
> Have you observed this further?
> 
> I'd think that if the io_work reschedule itself when it races
> with the direct send path this should not happen, but we may be
> seeing a different race going on here, adding Samuel who saw
> a similar phenomenon.
> 

Yes, I have seen it since as well, but only very intermittently. Below, after a
condensed sketch of the code paths in question, is the dmesg I saved before my
test was stopped for an unrelated reason. From the timestamps, the timeouts
were roughly ~20 min, ~45 min, and ~3 hours apart.
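
To recap the race being discussed, here is a condensed sketch of the two send
paths (my reading of drivers/nvme/host/tcp.c around v5.10; bodies trimmed for
illustration, so not the exact upstream code):

---
/* Direct send path: called from queue_rq context. */
static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
		bool sync, bool last)
{
	struct nvme_tcp_queue *queue = req->queue;
	bool empty;

	/* The request is published on the lockless list first... */
	empty = llist_add(&req->lentry, &queue->req_list) &&
		list_empty(&queue->send_list) && !queue->request;

	/*
	 * ...then either sent inline (same CPU, nothing else queued,
	 * send_mutex free) or left for io_work to pick up.
	 */
	if (queue->io_cpu == smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		nvme_tcp_try_send(queue);
		mutex_unlock(&queue->send_mutex);
	} else {
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
}

static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;

		/*
		 * If the direct path happens to hold send_mutex right
		 * here, the trylock fails.  Unless that case is treated
		 * as "pending" so io_work reschedules itself, a request
		 * left on req_list by the direct path can sit unsent
		 * until the block layer times it out.
		 */
		if (mutex_trylock(&queue->send_mutex)) {
			if (nvme_tcp_try_send(queue) > 0)
				pending = true;
			mutex_unlock(&queue->send_mutex);
		}

		if (nvme_tcp_try_recv(queue) > 0)
			pending = true;

		if (!pending)
			return;
	} while (!time_after(jiffies, deadline));

	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}
---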

dmesg on initiator:
---
[ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
[ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
[ 2111.991835] nvme nvme5: starting error recovery
[ 2111.998796] block nvme5n1: no usable path - requeuing I/O
[ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
[ 2122.253431] block nvme5n1: no usable path - requeuing I/O
[ 2122.254732] nvme nvme5: creating 16 I/O queues.
[ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
[ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
[ 2522.795467] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 3043.544758] perf: interrupt took too long (3135 > 3128), lowering kernel.perf_event_max_sample_rate to 63000
[ 4197.260041] perf: interrupt took too long (3922 > 3918), lowering kernel.perf_event_max_sample_rate to 50000
[ 4848.175367] nvme nvme1: queue 6: timeout request 0x42 type 4
[ 4848.176086] nvme nvme1: starting error recovery
[ 4848.176864] nvme nvme1: queue 6: timeout request 0x43 type 4
[ 4848.178194] nvme nvme1: queue 6: timeout request 0x44 type 4
[ 4848.179873] block nvme1n1: no usable path - requeuing I/O
[ 4848.180665] block nvme1n1: no usable path - requeuing I/O
[ 4848.180775] nvme nvme1: Reconnecting in 10 seconds...
[ 4848.181372] block nvme1n1: no usable path - requeuing I/O
[ 4848.183022] block nvme1n1: no usable path - requeuing I/O
[ 4848.183782] block nvme1n1: no usable path - requeuing I/O
[ 4858.436889] block nvme1n1: no usable path - requeuing I/O
[ 4858.437558] block nvme1n1: no usable path - requeuing I/O
[ 4858.438168] block nvme1n1: no usable path - requeuing I/O
[ 4858.439547] nvme nvme1: creating 16 I/O queues.
[ 4858.489650] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[ 4858.490357] debugfs: Directory 'hctx0' with parent '/' already present!
[ 4858.491256] debugfs: Directory 'hctx1' with parent '/' already present!
[ 4858.492790] debugfs: Directory 'hctx2' with parent '/' already present!
[ 4858.493601] debugfs: Directory 'hctx3' with parent '/' already present!
[ 4858.494362] debugfs: Directory 'hctx4' with parent '/' already present!
[ 4858.495310] debugfs: Directory 'hctx5' with parent '/' already present!
[ 4858.496261] debugfs: Directory 'hctx6' with parent '/' already present!
[ 4858.497098] debugfs: Directory 'hctx7' with parent '/' already present!
[ 4858.497972] debugfs: Directory 'hctx8' with parent '/' already present!
[ 4858.498806] debugfs: Directory 'hctx9' with parent '/' already present!
[ 4858.499574] debugfs: Directory 'hctx10' with parent '/' already present!
[ 4858.500268] debugfs: Directory 'hctx11' with parent '/' already present!
[ 4858.501030] debugfs: Directory 'hctx12' with parent '/' already present!
[ 4858.501807] debugfs: Directory 'hctx13' with parent '/' already present!
[ 4858.502459] debugfs: Directory 'hctx14' with parent '/' already present!
[ 4858.503196] debugfs: Directory 'hctx15' with parent '/' already present!
[ 4858.510075] nvme nvme1: Successfully reconnected (1 attempt)
[12905.117166] perf: interrupt took too long (4944 > 4902), lowering kernel.perf_event_max_sample_rate to 40000
[15436.031342] nvme nvme0: queue 16: timeout request 0x22 type 4
[15436.031920] nvme nvme0: starting error recovery
[15436.033746] block nvme0n1: no usable path - requeuing I/O
[15436.035221] nvme nvme0: Reconnecting in 10 seconds...
[15446.292544] block nvme0n1: no usable path - requeuing I/O
[15446.293541] nvme nvme0: creating 16 I/O queues.
[15446.339595] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[15446.340846] debugfs: Directory 'hctx0' with parent '/' already present!
[15446.342664] debugfs: Directory 'hctx1' with parent '/' already present!
[15446.343413] debugfs: Directory 'hctx2' with parent '/' already present!
[15446.344228] debugfs: Directory 'hctx3' with parent '/' already present!
[15446.345155] debugfs: Directory 'hctx4' with parent '/' already present!
[15446.345948] debugfs: Directory 'hctx5' with parent '/' already present!
[15446.346767] debugfs: Directory 'hctx6' with parent '/' already present!
[15446.347368] debugfs: Directory 'hctx7' with parent '/' already present!
[15446.348129] debugfs: Directory 'hctx8' with parent '/' already present!
[15446.348948] debugfs: Directory 'hctx9' with parent '/' already present!
[15446.349538] debugfs: Directory 'hctx10' with parent '/' already present!
[15446.350185] debugfs: Directory 'hctx11' with parent '/' already present!
[15446.350757] debugfs: Directory 'hctx12' with parent '/' already present!
[15446.351539] debugfs: Directory 'hctx13' with parent '/' already present!
[15446.352200] debugfs: Directory 'hctx14' with parent '/' already present!
[15446.352722] debugfs: Directory 'hctx15' with parent '/' already present!
[15446.358750] nvme nvme0: Successfully reconnected (1 attempt)

---
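
For completeness: the "timeout request" lines above come from the nvme-tcp
timeout handler, which starts controller error recovery and lets the block
layer retry the request after the reconnect - hence the "starting error
recovery" / "Reconnecting in 10 seconds..." / "Successfully reconnected"
sequence after each event. Roughly (again condensed, not the exact code):

---
static enum blk_eh_timer_return
nvme_tcp_timeout(struct request *rq, bool reserved)
{
	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
	struct nvme_tcp_cmd_pdu *pdu = req->pdu;

	dev_warn(ctrl->device, "queue %d: timeout request %#x type %d\n",
		 nvme_tcp_queue_id(req->queue), rq->tag, pdu->hdr.type);

	/* Tear the association down and schedule a reconnect... */
	nvme_tcp_error_recovery(ctrl);

	/* ...the request is retried once the controller is back. */
	return BLK_EH_RESET_TIMER;
}
---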


> > 
> > Target dmesg:
> > ---
> > [ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > ^^^^^^^^^^^^^^^
> > ---
> > 
> > Initiator dmesg:
> > ---
> > [ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
> > [ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
> > [ 2111.991835] nvme nvme5: starting error recovery
> > [ 2111.998796] block nvme5n1: no usable path - requeuing I/O
> > [ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
> > [ 2122.253431] block nvme5n1: no usable path - requeuing I/O
> > [ 2122.254732] nvme nvme5: creating 16 I/O queues.
> > [ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
> > [ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
> > ---
> > 
> > 
> > Thanks.
> > 

Thread overview: 15+ messages
2020-12-09 17:19 Request timeout seen with NVMEoF TCP Potnuri Bharat Teja
2020-12-09 18:02 ` Sagi Grimberg
2020-12-10 17:06   ` Potnuri Bharat Teja
2020-12-10 21:47     ` Sagi Grimberg
2020-12-11  7:26       ` Potnuri Bharat Teja
2020-12-14 17:51         ` Sagi Grimberg
2020-12-14 18:05           ` Potnuri Bharat Teja [this message]
2020-12-14 19:27             ` Samuel Jones
2020-12-14 20:13               ` Sagi Grimberg
2020-12-15  1:53           ` Sagi Grimberg
2020-12-15 18:30             ` Wunderlich, Mark
2020-12-19  1:04               ` Sagi Grimberg
2020-12-16  5:51             ` Potnuri Bharat Teja
2020-12-19  1:35               ` Sagi Grimberg
2020-12-21  7:50                 ` Potnuri Bharat Teja
