All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"hch@lst.de" <hch@lst.de>,
	"kbusch@kernel.org" <kbusch@kernel.org>
Subject: Re: [PATCH 3/3] nvme: add KConfig options for debug features
Date: Mon, 13 Dec 2021 11:27:09 +0200	[thread overview]
Message-ID: <6a1d326e-ce96-ef9e-f09d-7ebdc9ef6084@grimberg.me> (raw)
In-Reply-To: <492750cf-b031-5709-0836-c28e475c29fd@nvidia.com>


>>> Add KConfig menu option to enable and disable gencounter debug
>>> feature that uses config NVME_DEBUG_USE_CID_GENCTR.
>>>
>>> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
>>> ---
>>>    drivers/nvme/host/Kconfig | 10 ++++++++++
>>>    1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
>>> index dc0450ca23a3..dfa2609b7006 100644
>>> --- a/drivers/nvme/host/Kconfig
>>> +++ b/drivers/nvme/host/Kconfig
>>> @@ -1,4 +1,14 @@
>>>    # SPDX-License-Identifier: GPL-2.0-only
>>> +menu "Debug (Enable driver debug features)"
>>> +config NVME_DEBUG_USE_CID_GENCTR
>>> +     bool "Enable command ID gen counter for spurious request
>>> completion"
>>> +     depends on NVME_CORE
>>> +     help
>>> +       The NVM Express driver will use generation counter
>>> +       when calculating the command id. This is needed to debug the
>>> +       spurious request completions coming from a buggy controller.
>>
>> This is not just to debug - it is also to protect against such a
>> controller. What is the purpose of this config option anyways?
>> The main distributions will (as they should) enable it anyways...
> 
> I can rewrite the text and rename it to "driver features".

It is not a feature.

> We are protecting against such a controller which is not stable
> (buggy), i.e. it is doing things which it shouldn't be
> doing at the first place.

Yes, this is exactly what we are doing.

> Consider a case if controller is not
> buggy then it adds instructions in the fast path which are not
> needed at all.

We keep repeating this, if the controller assumes _anything_ on
the command identifier then it _is_ buggy, period.

Again, the fact that the controller is buggy is perfectly fine
really, that's why we have quirks.

What happened to your nvme-cli argument? that would make it
specific to that controller.

The host side cost is very cheap as shown before.

> A controller(s) that is used in the production environment goes
> through qualification process from vendors and from the consumers
> to make sure they are stable, something like spurious completions
> detection is a basic part of the qualification,

I don't think that assuming that a production controller does not
have any hidden bugs is a great practice, and controllers evolve during
their production lifetime.

> hence we should
> allow user to configure genctr than forcing additional
> instructions in the fast path and keep this pattern for future
> such cases.

I personally think that given that the worst-case option for such
a bug is possible data-corruption then we should not make it optional,
but I won't insist on it if others think otherwise.

> Maybe I didn't understand, can you please explain what are the
> benefits of having gen-counter where controller is stable?

The benefit is that the host code does not "assume" anything
about the controller.

> I'll wait to send V2 if you can suggest any other way
> (than kconfig something like module param, I'm not sure) so user
> can configure genctr calculations, please let me know.

As I said, the nvme-cli skip-genctr argument is perfectly valid in my
mind.


  reply	other threads:[~2021-12-13  9:28 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10 11:21 [PATCH 0/3] nvme-core: make gencounter feature tunable Chaitanya Kulkarni
2021-12-10 11:21 ` [PATCH 1/3] nvme-core: make cid gencounter configurable Chaitanya Kulkarni
2021-12-10 11:21 ` [PATCH 2/3] nvme-core: move gencounter check into nvme_cid() Chaitanya Kulkarni
2021-12-10 15:43   ` Keith Busch
2021-12-10 17:46     ` Chaitanya Kulkarni
2021-12-12  9:19       ` Sagi Grimberg
2021-12-13  7:07         ` Chaitanya Kulkarni
2021-12-10 11:21 ` [PATCH 3/3] nvme: add KConfig options for debug features Chaitanya Kulkarni
2021-12-12  9:22   ` Sagi Grimberg
2021-12-13  7:39     ` Chaitanya Kulkarni
2021-12-13  9:27       ` Sagi Grimberg [this message]
2021-12-14  7:54         ` Chaitanya Kulkarni
2021-12-14 11:03           ` Sagi Grimberg
2021-12-17  6:52             ` Chaitanya Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a1d326e-ce96-ef9e-f09d-7ebdc9ef6084@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=chaitanyak@nvidia.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.