From: Hugh Dickins <hughd@google.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>, Jens Axboe <axboe@kernel.dk>,
	Keith Busch <kbusch@kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: 6.2 nvme-pci: something wrong
Date: Sat, 24 Dec 2022 02:19:34 -0800 (PST)
Message-ID: <c67eba0-fc5a-4ad0-971-cf80bc1c6e5a@google.com>
In-Reply-To: <Y6amzxU7choHAXWi@infradead.org>

On Fri, 23 Dec 2022, Christoph Hellwig wrote:
> On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> > Hi Christoph,
> > 
> > There's something wrong with the nvme-pci changes heading for 6.2-rc1:
> > no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> > but under load...
> > 
> > nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: Abort status: 0x0
> > nvme nvme0: I/O 0 QID 2 timeout, reset controller
> > 
> > ...and more, until I just have to poweroff and reboot.
> > 
> > Bisection points to your
> > 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> > And that does revert cleanly, giving a kernel which shows no problem.
> > 
> > I've spent a while comparing the old nvme_pci_alloc_tag_set() and new
> > nvme_alloc_io_tag_set(); I do not know my way around there at all
> > and may be talking nonsense, but it did look as if there might now
> > be a difference in the queue_depth, sqsize, q_depth conversions.
> > 
> > I'm running load successfully with the patch below, but I strongly
> > suspect that the right patch will be somewhere else: over to you!
> 
> Thanks for the report!  The patch is definitively wrong: ->sqsize
> holds one of the awful so-called 'zero's based values' in NVMe,
> where 0 means 1, and thus has a built-in off-by-one.  We should
> probably convert it to a sane value at read time, but that's a
> separate discussion.
> 
> I suspect your controller is one of those where we quirk the size,
> and the commit you bisected fails to reflect that in the common
> sqsize value.  The patch below should be the minimum fix, and in
> the long term, the duplicate bookkeeping for it in the PCI driver
> should go away:

Thanks for the rapid response.  No, I've just tried your patch below,
and it does not help.  And I don't see any message about "queue size"
in dmesg, so I don't think mine is quirked.

cat /sys/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/nvme/nvme0/sqsize
tells me 511 (and that looked like the only relevant file under /sys).

Regarding the awful 0's based queue depth: yes, it just looked to me
as if the way it got handled in pci.c before differed from the way
it gets handled in pci.c and core.c now, with one too many "+ 1"s or
"- 1"s somewhere.
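
To make the convention concrete, a minimal standalone sketch (an
illustration only, not the driver code; the 511 matches the sysfs
value above):

#include <stdio.h>

int main(void)
{
	unsigned int sqsize = 511;		/* zero's based: 0 means 1 */
	unsigned int q_depth = sqsize + 1;	/* usable entries: 512 */

	/* One stray "+ 1" or "- 1" in a conversion shifts this by one. */
	printf("sqsize=%u -> q_depth=%u\n", sqsize, q_depth);
	return 0;
}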

> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f0f8027644bbf8..a73c0ee7bd1892 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2536,7 +2536,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>  
>  	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
>  				io_queue_depth);
> -	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>  	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
>  	dev->dbs = dev->bar + 4096;
>  
> @@ -2577,7 +2576,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>  		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
>  			 dev->q_depth);
>  	}
> -
> +	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>  
>  	nvme_map_cmb(dev);
>  
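
For what it's worth, here is a minimal standalone sketch (an
illustration only; the clamp value is invented) of the ordering
problem that hunk is aimed at: deriving sqsize from q_depth before
q_depth is clamped leaves the two out of sync:

#include <stdio.h>

int main(void)
{
	unsigned int q_depth = 1024;
	unsigned int sqsize = q_depth - 1;	/* derived before the clamp */

	if (q_depth > 2)
		q_depth = 2;			/* a later quirk clamps it */

	/* sqsize now disagrees with the clamped q_depth... */
	printf("stale: q_depth=%u sqsize=%u\n", q_depth, sqsize);

	/* ...so derive it only after all clamping, as the hunk does. */
	sqsize = q_depth - 1;
	printf("fixed: q_depth=%u sqsize=%u\n", q_depth, sqsize);
	return 0;
}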
