linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] RAS: Fix the trace_show() function to output trace_count
@ 2022-10-05 16:16 Tony Luck
  2022-10-05 18:17 ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Tony Luck @ 2022-10-05 16:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel, Tony Luck

The /sys/kernel/debug/ras/daemon_active is used to indicate to some
kernel code whether there is a user mode consumer of RAS trace events
to decide whether to print to the console.

The intent was that reading daemon_active would give the same indication
to users. But the code is broken and does not display anything.

Add a seq_printf() to output the current value of the trace_count.

Fixes: d963cd95bea9 ("RAS, debugfs: Add debugfs interface for RAS subsystem")
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/ras/debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 0d4f985afbf3..d67af5e08caf 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -15,7 +15,8 @@ EXPORT_SYMBOL_GPL(ras_userspace_consumers);
 
 static int trace_show(struct seq_file *m, void *v)
 {
-	return atomic_read(&trace_count);
+	seq_printf(m, "%d\n", atomic_read(&trace_count));
+	return 0;
 }
 
 static int trace_open(struct inode *inode, struct file *file)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] RAS: Fix the trace_show() function to output trace_count
  2022-10-05 16:16 [PATCH] RAS: Fix the trace_show() function to output trace_count Tony Luck
@ 2022-10-05 18:17 ` Borislav Petkov
  2022-10-05 19:46   ` [PATCH v2] " Tony Luck
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2022-10-05 18:17 UTC (permalink / raw)
  To: Tony Luck; +Cc: x86, linux-kernel

On Wed, Oct 05, 2022 at 09:16:45AM -0700, Tony Luck wrote:
> The /sys/kernel/debug/ras/daemon_active is used to indicate to some
> kernel code whether there is a user mode consumer of RAS trace events
> to decide whether to print to the console.
> 
> The intent was that reading daemon_active would give the same indication
> to users. But the code is broken and does not display anything.

Huh, looking at that Fixes commit, it says:

"One can track which daemon opens it via "lsof
/path/to/debugfs/ras/daemon_active"."

# lsof /sys/kernel/debug/ras/daemon_active
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
tail    13339 root    3r   REG    0,6        0   27 /sys/kernel/debug/ras/daemon_active

So it works as advertized I'd say.

> Add a seq_printf() to output the current value of the trace_count.

Now, if you want to cat that file, that's a different story.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] RAS: Fix the trace_show() function to output trace_count
  2022-10-05 18:17 ` Borislav Petkov
@ 2022-10-05 19:46   ` Tony Luck
  2022-10-17 10:36     ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Tony Luck @ 2022-10-05 19:46 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel, Tony Luck

The /sys/kernel/debug/ras/daemon_active file is used to indicate to some
kernel code whether there is a user mode consumer of RAS trace events
to decide whether to print to the console.

It looks like the intent was that reading the file would show the count
of tasks with the file open. But the code in the trace_show() function
is nonsense.

Add a seq_printf() to output the current value of the trace_count.

Fixes: d963cd95bea9 ("RAS, debugfs: Add debugfs interface for RAS subsystem")
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

V2: Updated commit description based on feedback from Boris

 drivers/ras/debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 0d4f985afbf3..d67af5e08caf 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -15,7 +15,8 @@ EXPORT_SYMBOL_GPL(ras_userspace_consumers);
 
 static int trace_show(struct seq_file *m, void *v)
 {
-	return atomic_read(&trace_count);
+	seq_printf(m, "%d\n", atomic_read(&trace_count));
+	return 0;
 }
 
 static int trace_open(struct inode *inode, struct file *file)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] RAS: Fix the trace_show() function to output trace_count
  2022-10-05 19:46   ` [PATCH v2] " Tony Luck
@ 2022-10-17 10:36     ` Borislav Petkov
  2022-10-17 16:09       ` Luck, Tony
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2022-10-17 10:36 UTC (permalink / raw)
  To: Tony Luck; +Cc: x86, linux-kernel

On Wed, Oct 05, 2022 at 12:46:44PM -0700, Tony Luck wrote:
>  static int trace_show(struct seq_file *m, void *v)
>  {
> -	return atomic_read(&trace_count);
> +	seq_printf(m, "%d\n", atomic_read(&trace_count));

Still misleading:

$ cat /sys/kernel/debug/ras/daemon_active
1

and it is 1 and not 0 because:

[   13.177644] trace_open: trace_count: 1, comm: cat
[   13.178675] trace_release: trace_count: 0, comm: cat

we toggle trace_count when we read it.

I don't know, maybe we should teach RAS daemons to ->write() into that
file their name and PID so that the trace_count counts *only* the RAS
daemons not any reader...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v2] RAS: Fix the trace_show() function to output trace_count
  2022-10-17 10:36     ` Borislav Petkov
@ 2022-10-17 16:09       ` Luck, Tony
  2022-10-17 19:40         ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2022-10-17 16:09 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel

> Still misleading:
>
> $ cat /sys/kernel/debug/ras/daemon_active
> 1

Agreed. It needs user to interpret the answer. The filename would lead
them to think "1" means the daemon is active, but its actually just a count
of how many times the file is concurrently open (which includes the
"cat" process reading the file).


> I don't know, maybe we should teach RAS daemons to ->write() into that
> file their name and PID so that the trace_count counts *only* the RAS
> daemons not any reader...

Should have thought of this earlier ... changing user space semantics
is hard. Even with only one user, there is a long transition period where
new kernels are running with old rasdaemon and vice versa.

How about:

	seq_printf(m, "%d\n", atomic_read(&trace_count) - 1);

with a comment that users reading the file only want to know if anyone
else has it open?

-Tony



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] RAS: Fix the trace_show() function to output trace_count
  2022-10-17 16:09       ` Luck, Tony
@ 2022-10-17 19:40         ` Borislav Petkov
  2022-10-18 14:36           ` Tony Luck
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2022-10-17 19:40 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86, linux-kernel

On Mon, Oct 17, 2022 at 04:09:23PM +0000, Luck, Tony wrote:
> Agreed. It needs user to interpret the answer. The filename would lead
> them to think "1" means the daemon is active, but its actually just a count
> of how many times the file is concurrently open (which includes the
> "cat" process reading the file).

Yap, exactly.


> Should have thought of this earlier ... changing user space semantics
> is hard.

AFAIR, at the time we cared only about there being at least one
consumer... thus the binary test, is there at least one or not:

        if (!ras_userspace_consumers()) {
                print_extlog_rcd(NULL, tmp, cpu);
                goto out;
        }


> How about:
> 
> 	seq_printf(m, "%d\n", atomic_read(&trace_count) - 1);
> 
> with a comment that users reading the file only want to know if anyone
> else has it open?

Yeah, doesn't work either:

# tail -f /sys/kernel/debug/ras/daemon_active &
[1] 3019
1
tail: /sys/kernel/debug/ras/daemon_active: file truncated
1
#  cat /sys/kernel/debug/ras/daemon_active
2



We really need something to say, "I really am a RAS events consumer and
not some random file opener."

OTOH, if one does that on ones system, then one has herself to blame
when errors don't get logged and disappear. I mean, why would one even
do that?!

Then again, I've seen weirder stuff so...

Question is, what is your goal with this?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] RAS: Fix the trace_show() function to output trace_count
  2022-10-17 19:40         ` Borislav Petkov
@ 2022-10-18 14:36           ` Tony Luck
  2022-10-18 16:05             ` Borislav Petkov
  2022-10-18 16:59             ` [PATCH v3] RAS: Fix return value from show_trace() Tony Luck
  0 siblings, 2 replies; 10+ messages in thread
From: Tony Luck @ 2022-10-18 14:36 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel

On Mon, Oct 17, 2022 at 09:40:09PM +0200, Borislav Petkov wrote:
> Question is, what is your goal with this?

I ran into it while fixing:

  https://lore.kernel.org/all/20221006163258.318916-1-tony.luck@intel.com/

when I just used "cat daemon_active" to check status and was confused
by the lack of any output. Looked at the code and saw the silly

	return atomic_read(&trace_count);

which does nothing useful.

Maybe if there isn't a reasonable thing to output, that could be
changed to:

static int trace_show(struct seq_file *m, void *v)
{
	/*
	 * User should use "lsof daemon_active" if they want to 
	 * know if there is a process consuming trace error records
	 */
	return 0;
}

Sadly, we need to have a trace_show() function. If trace_open() just
does:

	return single_open(file, NULL, NULL);

then we get a deref NULL OOPs if somebody does "cat daemon_active" :-(

-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] RAS: Fix the trace_show() function to output trace_count
  2022-10-18 14:36           ` Tony Luck
@ 2022-10-18 16:05             ` Borislav Petkov
  2022-10-18 16:59             ` [PATCH v3] RAS: Fix return value from show_trace() Tony Luck
  1 sibling, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2022-10-18 16:05 UTC (permalink / raw)
  To: Tony Luck; +Cc: x86, linux-kernel

On Tue, Oct 18, 2022 at 07:36:53AM -0700, Tony Luck wrote:
> Sadly, we need to have a trace_show() function. If trace_open() just
> does:
> 
> 	return single_open(file, NULL, NULL);
> 
> then we get a deref NULL OOPs if somebody does "cat daemon_active" :-(

I guess but as you see yourself, cat-ting that file is done only by
us when debugging stuff. So we shouldn't waste too much energy on
"designing" an interface. As nothing'll use it.

But sure, trace_show(), why not.

I mean, if people go and open that file and confuse the system, they'll
only have themselves to blame. Besides, it is debugfs and root-only and
so on - if you wanna do damage, you already have root.

So yeah, let's keep this effort minimal and only address valid use cases.

IMO.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3] RAS: Fix return value from show_trace()
  2022-10-18 14:36           ` Tony Luck
  2022-10-18 16:05             ` Borislav Petkov
@ 2022-10-18 16:59             ` Tony Luck
  2022-10-31 18:15               ` [tip: ras/core] " tip-bot2 for Tony Luck
  1 sibling, 1 reply; 10+ messages in thread
From: Tony Luck @ 2022-10-18 16:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel, Tony Luck

Documentation/filesystems/seq_file.rst describes the possible return
values from a "show()" function used by single_open().

show_trace() returns the value of "trace_count" this could be interpreted
as "SEQ_SKIP", or just confuse the calling function.

Change to just return "0" to avoid confusing anyone reading this code
and possibly using as a template. Reading "daemon_active" was never
an intended use case.

Signed-off-by: Tony Luck <tony.luck@intel.com>

---
V3: Boris: "let's keep this effort minimal and only address valid use
cases"
---
 drivers/ras/debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 0d4f985afbf3..f0a6391b1146 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -15,7 +15,7 @@ EXPORT_SYMBOL_GPL(ras_userspace_consumers);
 
 static int trace_show(struct seq_file *m, void *v)
 {
-	return atomic_read(&trace_count);
+	return 0;
 }
 
 static int trace_open(struct inode *inode, struct file *file)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip: ras/core] RAS: Fix return value from show_trace()
  2022-10-18 16:59             ` [PATCH v3] RAS: Fix return value from show_trace() Tony Luck
@ 2022-10-31 18:15               ` tip-bot2 for Tony Luck
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Tony Luck @ 2022-10-31 18:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Tony Luck, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     50865c14f34edbd03f8113147fac069b39f4e390
Gitweb:        https://git.kernel.org/tip/50865c14f34edbd03f8113147fac069b39f4e390
Author:        Tony Luck <tony.luck@intel.com>
AuthorDate:    Tue, 18 Oct 2022 09:59:00 -07:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Mon, 31 Oct 2022 18:55:18 +01:00

RAS: Fix return value from show_trace()

Documentation/filesystems/seq_file.rst describes the possible return
values from a "show()" function used by single_open().

show_trace() returns the value of "trace_count". This could be
interpreted as "SEQ_SKIP", or just confuse the calling function.

Change to just return "0" to avoid confusing anyone reading this code
and possibly using as a template. Reading "daemon_active" was never
an intended use case.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221018165900.109029-1-tony.luck@intel.com
---
 drivers/ras/debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 0d4f985..f0a6391 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -15,7 +15,7 @@ EXPORT_SYMBOL_GPL(ras_userspace_consumers);
 
 static int trace_show(struct seq_file *m, void *v)
 {
-	return atomic_read(&trace_count);
+	return 0;
 }
 
 static int trace_open(struct inode *inode, struct file *file)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-31 18:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-05 16:16 [PATCH] RAS: Fix the trace_show() function to output trace_count Tony Luck
2022-10-05 18:17 ` Borislav Petkov
2022-10-05 19:46   ` [PATCH v2] " Tony Luck
2022-10-17 10:36     ` Borislav Petkov
2022-10-17 16:09       ` Luck, Tony
2022-10-17 19:40         ` Borislav Petkov
2022-10-18 14:36           ` Tony Luck
2022-10-18 16:05             ` Borislav Petkov
2022-10-18 16:59             ` [PATCH v3] RAS: Fix return value from show_trace() Tony Luck
2022-10-31 18:15               ` [tip: ras/core] " tip-bot2 for Tony Luck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).