linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] nvme: do not log errors for user commands
@ 2022-10-06  9:36 Daniel Wagner
  2022-10-06 12:16 ` Pankaj Raghav
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2022-10-06  9:36 UTC (permalink / raw)
  To: linux-nvme
  Cc: Keith Busch, Daniel Wagner, yi.zhang, Tomas Bzatek, Alan Adamson,
	Chaitanya Kulkarni, Martin K . Petersen

User space might issue commands which fail. Even though userland
handles them correclty, these command get logged which irritates
users.

E.g. libnvme's nvme_scan_topology function is using Namespace
Identification Descriptor list (CNS Value 0x03) for discovering. It
handles the case where it fails but still the kernel logs this as
error:

 nvme0: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) DNR
 nvme1: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) DNR

This disables error logging via nvme_log_error for all passthrough
commands. Instead reverting the complete patch, let's disable this bit
until the interested parties in extended logging have figured out how
to do it without introducing regressions. Removing the now
non-functional part in nvme_log_error can be done later if needed.

Fixes: bd83fe6f2cd2 ("nvme: add verbose error logging")
Link: https://github.com/linux-nvme/libnvme/issues/485
Reported-by: yi.zhang@redhat.com
Reported-by: Tomas Bzatek <tbzatek@redhat.com>
Cc: Alan Adamson <alan.adamson@oracle.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---

v1:
 - SET RQF_QUIET flag instead of checking NVME_REQ_USERCMD in
   nvme_log_error.
v0:
 - https://lore.kernel.org/linux-nvme/20220929171338.10792-1-dwagner@suse.de/

 drivers/nvme/host/ioctl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index a48a79ed5c4c..b2c1b73f0a22 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -151,6 +151,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	bio = req->bio;
 	ctrl = nvme_req(req)->ctrl;
 
+	req->rq_flags |= RQF_QUIET;
 	ret = nvme_execute_passthru_rq(req, &effects);
 
 	if (result)
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-06  9:36 [PATCH v1] nvme: do not log errors for user commands Daniel Wagner
@ 2022-10-06 12:16 ` Pankaj Raghav
  2022-10-06 13:30   ` Daniel Wagner
  0 siblings, 1 reply; 8+ messages in thread
From: Pankaj Raghav @ 2022-10-06 12:16 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, Keith Busch, yi.zhang, Tomas Bzatek, Alan Adamson,
	Chaitanya Kulkarni, Martin K . Petersen, gost.dev, Pankaj Raghav

On Thu, Oct 06, 2022 at 11:36:28AM +0200, Daniel Wagner wrote:
> User space might issue commands which fail. Even though userland
> handles them correclty, these command get logged which irritates
> users.
> 
> E.g. libnvme's nvme_scan_topology function is using Namespace
> Identification Descriptor list (CNS Value 0x03) for discovering. It
> handles the case where it fails but still the kernel logs this as
> error:
> 
>  nvme0: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) DNR
>  nvme1: Admin Cmd(0x6), I/O Error (sct 0x0 / sc 0x2) DNR
> 
> This disables error logging via nvme_log_error for all passthrough
> commands. Instead reverting the complete patch, let's disable this bit
> until the interested parties in extended logging have figured out how
> to do it without introducing regressions. Removing the now
> non-functional part in nvme_log_error can be done later if needed.
<snip>
> @@ -151,6 +151,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
>  	bio = req->bio;
>  	ctrl = nvme_req(req)->ctrl;
>  
> +	req->rq_flags |= RQF_QUIET;

With this, we are disabling logging for ADMIN and IO commands via the
ioctl. IMO, the error logging for IO commands could be useful for
debugging. But I do understand your regression argument here.

>  	ret = nvme_execute_passthru_rq(req, &effects);
>  
>  	if (result)

We can also send admin commands via the uring_cmd interface:
nvme_dev_uring_cmd(). Should we also enable RQF_QUIET flags for them?

P.S: I sent a patch today that touches the error logging in NVMe that
sets the starting LBA as zero for IO passthrough requests.
https://lore.kernel.org/linux-nvme/20221006091053.36611-1-p.raghav@samsung.com/



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-06 12:16 ` Pankaj Raghav
@ 2022-10-06 13:30   ` Daniel Wagner
  2022-10-07 21:19     ` Alan Adamson
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Wagner @ 2022-10-06 13:30 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: linux-nvme, Keith Busch, yi.zhang, Tomas Bzatek, Alan Adamson,
	Chaitanya Kulkarni, Martin K . Petersen, gost.dev, Pankaj Raghav

On Thu, Oct 06, 2022 at 02:16:35PM +0200, Pankaj Raghav wrote:
> > +	req->rq_flags |= RQF_QUIET;
> 
> With this, we are disabling logging for ADMIN and IO commands via the
> ioctl. IMO, the error logging for IO commands could be useful for
> debugging. But I do understand your regression argument here.

I do get the debugging argument, but this clearly regresses production
code. Ideally, this code should be opt-in. The only way I see how to make
this work is by introducing a new API which allows enable this
feature.

> >  	ret = nvme_execute_passthru_rq(req, &effects);
> >  
> >  	if (result)
> 
> We can also send admin commands via the uring_cmd interface:
> nvme_dev_uring_cmd(). Should we also enable RQF_QUIET flags for them?

Good point, this path is likely to have the same problem.
 
> P.S: I sent a patch today that touches the error logging in NVMe that
> sets the starting LBA as zero for IO passthrough requests.
> https://lore.kernel.org/linux-nvme/20221006091053.36611-1-p.raghav@samsung.com/

Yes, saw your patch, but with this patch it wont do match afterwards :)



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-06 13:30   ` Daniel Wagner
@ 2022-10-07 21:19     ` Alan Adamson
  2022-10-10  7:36       ` Daniel Wagner
  2022-10-17  7:07       ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Adamson @ 2022-10-07 21:19 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Pankaj Raghav, open list:NVM EXPRESS DRIVER, Keith Busch,
	Yi Zhang, Tomas Bzatek, Chaitanya Kulkarni, Martin Petersen,
	gost.dev, Pankaj Raghav



> On Oct 6, 2022, at 6:30 AM, Daniel Wagner <dwagner@suse.de> wrote:
> 
> On Thu, Oct 06, 2022 at 02:16:35PM +0200, Pankaj Raghav wrote:
>>> +	req->rq_flags |= RQF_QUIET;
>> 
>> With this, we are disabling logging for ADMIN and IO commands via the
>> ioctl. IMO, the error logging for IO commands could be useful for
>> debugging. But I do understand your regression argument here.
> 
> I do get the debugging argument, but this clearly regresses production
> code. Ideally, this code should be opt-in. The only way I see how to make
> this work is by introducing a new API which allows enable this
> feature.

Looks like a new API may need to be created.  I’ll investigate that.  With your change, there
will need to be a blktests change.



nvme/039 => nvme0n1 (test error logging)                     [failed]
    runtime  0.120s  ...  0.118s
    --- tests/nvme/039.out	2022-09-21 17:14:12.760890663 -0400
    +++ /root/blktests/results/nvme0n1/nvme/039.out.bad	2022-10-06 19:11:45.846879996 -0400
    @@ -2,6 +2,4 @@
      Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR 
      Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR 
      Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR 
    - Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR 
    - Unknown(0x96), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR 
     Test complete
[root@localhost blktests]#


Alan

> 
>>> 	ret = nvme_execute_passthru_rq(req, &effects);
>>> 
>>> 	if (result)
>> 
>> We can also send admin commands via the uring_cmd interface:
>> nvme_dev_uring_cmd(). Should we also enable RQF_QUIET flags for them?
> 
> Good point, this path is likely to have the same problem.
> 
>> P.S: I sent a patch today that touches the error logging in NVMe that
>> sets the starting LBA as zero for IO passthrough requests.
>> https://lore.kernel.org/linux-nvme/20221006091053.36611-1-p.raghav@samsung.com/
> 
> Yes, saw your patch, but with this patch it wont do match afterwards :)
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-07 21:19     ` Alan Adamson
@ 2022-10-10  7:36       ` Daniel Wagner
  2022-10-17  7:07       ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Wagner @ 2022-10-10  7:36 UTC (permalink / raw)
  To: Alan Adamson
  Cc: Pankaj Raghav, open list:NVM EXPRESS DRIVER, Keith Busch,
	Yi Zhang, Tomas Bzatek, Chaitanya Kulkarni, Martin Petersen,
	gost.dev, Pankaj Raghav

Hi Alan,

On Fri, Oct 07, 2022 at 09:19:08PM +0000, Alan Adamson wrote:
> > I do get the debugging argument, but this clearly regresses production
> > code. Ideally, this code should be opt-in. The only way I see how to make
> > this work is by introducing a new API which allows enable this
> > feature.
> 
> Looks like a new API may need to be created.  I’ll investigate that.
> With your change, there will need to be a blktests change.

Thanks for picking this up.

> nvme/039 => nvme0n1 (test error logging)                     [failed]
>     runtime  0.120s  ...  0.118s
>     --- tests/nvme/039.out	2022-09-21 17:14:12.760890663 -0400
>     +++ /root/blktests/results/nvme0n1/nvme/039.out.bad	2022-10-06 19:11:45.846879996 -0400
>     @@ -2,6 +2,4 @@
>       Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR 
>       Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR 
>       Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR 
>     - Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR 
>     - Unknown(0x96), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR 
>      Test complete

Hmm, that is not supposed to happen. Well, obviously this needs some
more work :)

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-07 21:19     ` Alan Adamson
  2022-10-10  7:36       ` Daniel Wagner
@ 2022-10-17  7:07       ` Christoph Hellwig
  2022-10-17 16:37         ` Keith Busch
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-10-17  7:07 UTC (permalink / raw)
  To: Alan Adamson
  Cc: Daniel Wagner, Pankaj Raghav, open list:NVM EXPRESS DRIVER,
	Keith Busch, Yi Zhang, Tomas Bzatek, Chaitanya Kulkarni,
	Martin Petersen, gost.dev, Pankaj Raghav

On Fri, Oct 07, 2022 at 09:19:08PM +0000, Alan Adamson wrote:
> > I do get the debugging argument, but this clearly regresses production
> > code. Ideally, this code should be opt-in. The only way I see how to make
> > this work is by introducing a new API which allows enable this
> > feature.
> 
> Looks like a new API may need to be created.  I’ll investigate that.
> With your change, there
> will need to be a blktests change.

Yes, I think we need to effectively revert this for user commands and
then add an opt in - be that from the submitter, or through a sysfs
file for all command.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-17  7:07       ` Christoph Hellwig
@ 2022-10-17 16:37         ` Keith Busch
  2022-10-17 16:40           ` Alan Adamson
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2022-10-17 16:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alan Adamson, Daniel Wagner, Pankaj Raghav,
	open list:NVM EXPRESS DRIVER, Yi Zhang, Tomas Bzatek,
	Chaitanya Kulkarni, Martin Petersen, gost.dev, Pankaj Raghav

On Mon, Oct 17, 2022 at 12:07:33AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 07, 2022 at 09:19:08PM +0000, Alan Adamson wrote:
> > > I do get the debugging argument, but this clearly regresses production
> > > code. Ideally, this code should be opt-in. The only way I see how to make
> > > this work is by introducing a new API which allows enable this
> > > feature.
> > 
> > Looks like a new API may need to be created.  I’ll investigate that.
> > With your change, there
> > will need to be a blktests change.
> 
> Yes, I think we need to effectively revert this for user commands and
> then add an opt in - be that from the submitter, or through a sysfs
> file for all command.

Yeah, I think we should revert logging errors for user admin commands
(io command logs can probably remain). There are tools running at
regular intervals that query all devices for optional pages, and
spamming the kernel logs with these repeated errors is causing some
problems.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1] nvme: do not log errors for user commands
  2022-10-17 16:37         ` Keith Busch
@ 2022-10-17 16:40           ` Alan Adamson
  0 siblings, 0 replies; 8+ messages in thread
From: Alan Adamson @ 2022-10-17 16:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Daniel Wagner, Pankaj Raghav,
	open list:NVM EXPRESS DRIVER, Yi Zhang, Tomas Bzatek,
	Chaitanya Kulkarni, Martin Petersen, gost.dev, Pankaj Raghav



> On Oct 17, 2022, at 9:37 AM, Keith Busch <kbusch@kernel.org> wrote:
> 
> On Mon, Oct 17, 2022 at 12:07:33AM -0700, Christoph Hellwig wrote:
>> On Fri, Oct 07, 2022 at 09:19:08PM +0000, Alan Adamson wrote:
>>>> I do get the debugging argument, but this clearly regresses production
>>>> code. Ideally, this code should be opt-in. The only way I see how to make
>>>> this work is by introducing a new API which allows enable this
>>>> feature.
>>> 
>>> Looks like a new API may need to be created.  I’ll investigate that.
>>> With your change, there
>>> will need to be a blktests change.
>> 
>> Yes, I think we need to effectively revert this for user commands and
>> then add an opt in - be that from the submitter, or through a sysfs
>> file for all command.
> 
> Yeah, I think we should revert logging errors for user admin commands
> (io command logs can probably remain). There are tools running at
> regular intervals that query all devices for optional pages, and
> spamming the kernel logs with these repeated errors is causing some
> problems.

I agree.  I’ll look into a being able to opt in to logging of admin commands.

Alan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-17 16:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-06  9:36 [PATCH v1] nvme: do not log errors for user commands Daniel Wagner
2022-10-06 12:16 ` Pankaj Raghav
2022-10-06 13:30   ` Daniel Wagner
2022-10-07 21:19     ` Alan Adamson
2022-10-10  7:36       ` Daniel Wagner
2022-10-17  7:07       ` Christoph Hellwig
2022-10-17 16:37         ` Keith Busch
2022-10-17 16:40           ` Alan Adamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).