linux-kernel.vger.kernel.org archive mirror
* [bug-report] rasdaemon doesnot report new records.
@ 2023-01-30  9:34 Harshit Mogalapalli
  2023-01-30 11:57 ` Linux kernel regression tracking (#adding)
  0 siblings, 1 reply; 2+ messages in thread
From: Harshit Mogalapalli @ 2023-01-30  9:34 UTC
  To: linux-edac, linux-trace-kernel
  Cc: William ROCHE, Darren Kenny, rostedt, LKML, harshit.m.mogalapalli

Hi,

Since kernel 6.1-rc6, rasdaemon fails to update the summary of the records.

When we inject MCE errors, ras-mc-ctl --summary should normally report
the new errors, but starting with 6.1-rc6 the summary (the count of MCE
records) does not update when we inject new MCE errors.

This started happening after commit
42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
polling block on watermark"). The commit first landed in 6.1-rc6; the
6.1-rc5 kernel does not have this problem.

After reverting this commit, rasdaemon works correctly (i.e. it is able
to read the new MCE records).

This also happens on the latest kernel (v6.2-rc6).

In the good case (6.2-rc6 plus a revert of 42fb0a1e84ff
("tracing/ring-buffer: Have polling block on watermark")), the read
after poll() happens without getting stuck:

openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", O_RDONLY) = 4
openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", O_RDONLY) = 5
openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", O_RDONLY) = 6
openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", O_RDONLY) = 7
[...]
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1) = 1 ([{fd=4, revents=POLLIN}])
read(4, "\215~\0\0\0\0\0\0t\0\0\0\0\0\0\0\34\t\2\0\263\0\0\0#\0\0\0\n\1\0\t"..., 4096) = 4096
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, ...}, 0) = 0
write(2, "rasdaemon: ", 11rasdaemon: )             = 11
write(2, "mce_record store: 0x56047b270008"..., 33mce_record store: 0x56047b270008
) = 33


In the bad case (6.2-rc6), where new records do not show up in the
summary, the database of records is not populated simply because
rasdaemon no longer gets notified by the kernel:

openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", O_RDONLY) = 4
openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", O_RDONLY) = 5
openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", O_RDONLY) = 6
openat(AT_FDCWD, "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", O_RDONLY) = 7
[...]
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1

--- Stuck here, even when we inject MCE errors.


Before commit 42fb0a1e84ff ("tracing/ring-buffer: Have polling block
on watermark"), an error injection could wake the poll() call on the
special files above, and I can confirm that the subsequent read() call
did not hang. With commit 42fb0a1e84ff, an error injection no longer
wakes the poll() call.

We need to let rasdaemon (or any other trace consumer using the per_cpu
files) retrieve the information as soon as it is available.
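
For anyone who wants to reproduce the wait outside of rasdaemon, here is
a minimal sketch of the same poll()-then-read() sequence seen in the
strace output. It is not rasdaemon's code (rasdaemon is written in C);
the helper names per_cpu_pipe_paths and wait_and_read are hypothetical,
and the instance path is simply the one from the strace output above.

```python
import os
import select

# Assumption: tracefs is reachable under the debugfs path shown in the
# strace output; "rasdaemon" is the instance name seen there.
INSTANCE_DIR = "/sys/kernel/debug/tracing/instances/rasdaemon"


def per_cpu_pipe_paths(n_cpus, instance_dir=INSTANCE_DIR):
    """Build the per-CPU trace_pipe_raw paths for CPUs 0..n_cpus-1."""
    return [
        os.path.join(instance_dir, "per_cpu", f"cpu{cpu}", "trace_pipe_raw")
        for cpu in range(n_cpus)
    ]


def wait_and_read(fds, timeout_ms=None, page_size=4096):
    """Poll the given fds for POLLIN, then read up to one page from each ready fd.

    timeout_ms=None blocks indefinitely, like the -1 timeout in the strace.
    Returns {fd: bytes}; the dict is empty if the timeout expired first.
    """
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    ready = poller.poll(timeout_ms)
    return {
        fd: os.read(fd, page_size)
        for fd, revents in ready
        if revents & select.POLLIN
    }
```

A possible way to use it (as root, with the instance present): open each
path from per_cpu_pipe_paths() with os.open(path, os.O_RDONLY), call
wait_and_read(fds, timeout_ms=5000), and inject an error while it waits.
An empty result after an injection would correspond to the stuck poll()
shown above.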

Additional info:
1. This is the code that is being hit on the rasdaemon side:
https://github.com/mchehab/rasdaemon/blob/master/ras-events.c#:~:text=ready%20%3D%20poll(fds%2C%20(n_cpus%20%2B%201)%2C%20%2D1)%3B

2. Changing buffer_percent to a lower value did not change the behaviour.
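
For reference, buffer_percent is the tracefs watermark file: per the
kernel's ftrace documentation, 0 means wake a waiter as soon as there is
any data and 50 is the default. Below is a minimal sketch of how the
value can be inspected and changed. It assumes the rasdaemon instance
exposes its own buffer_percent file under the same directory as in the
strace output; the helper names are hypothetical.

```python
import os

# Assumption: same instance directory as in the strace output above.
INSTANCE_DIR = "/sys/kernel/debug/tracing/instances/rasdaemon"


def read_buffer_percent(instance_dir=INSTANCE_DIR):
    """Return the current watermark (0..100) of the tracing instance."""
    with open(os.path.join(instance_dir, "buffer_percent")) as f:
        return int(f.read().strip())


def write_buffer_percent(value, instance_dir=INSTANCE_DIR):
    """Set the watermark; the value must be a percentage from 0 to 100."""
    if not 0 <= value <= 100:
        raise ValueError("buffer_percent must be between 0 and 100")
    with open(os.path.join(instance_dir, "buffer_percent"), "w") as f:
        f.write(f"{value}\n")
```

For example, write_buffer_percent(0) followed by read_buffer_percent()
(both as root) confirms that the lower value was accepted before the
next error injection.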


Thanks,
Harshit







* Re: [bug-report] rasdaemon doesnot report new records.
  2023-01-30  9:34 [bug-report] rasdaemon doesnot report new records Harshit Mogalapalli
@ 2023-01-30 11:57 ` Linux kernel regression tracking (#adding)
  0 siblings, 0 replies; 2+ messages in thread
From: Linux kernel regression tracking (#adding) @ 2023-01-30 11:57 UTC
  To: Harshit Mogalapalli, linux-edac, linux-trace-kernel
  Cc: William ROCHE, Darren Kenny, rostedt, LKML,
	harshit.m.mogalapalli, Linux kernel regressions list

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 30.01.23 10:34, Harshit Mogalapalli wrote:
> Hi,
> 
> Since kernel 6.1-rc6, rasdaemon fails to update the summary of the records.
> 
> When we inject MCE errors, ras-mc-ctl --summary should normally report
> the new errors, but starting with 6.1-rc6 the summary (the count of MCE
> records) does not update when we inject new MCE errors.
> 
> This started happening after commit
> 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
> polling block on watermark"). The commit first landed in 6.1-rc6; the
> 6.1-rc5 kernel does not have this problem.
> 
> After reverting this commit, rasdaemon works correctly (i.e. it is able
> to read the new MCE records).
>
> This also happens on the latest kernel (v6.2-rc6).

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 42fb0a1e84ff525
#regzbot title tracing/ring-buffer: rasdaemon does not report new records
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

