linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
To: linux-edac@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: William ROCHE <william.roche@oracle.com>,
	Darren Kenny <darren.kenny@oracle.com>,
	rostedt@goodmis.org, LKML <linux-kernel@vger.kernel.org>,
	"harshit.m.mogalapalli@gmail.com"
	<harshit.m.mogalapalli@gmail.com>
Subject: [bug-report] rasdaemon doesnot report new records.
Date: Mon, 30 Jan 2023 15:04:09 +0530	[thread overview]
Message-ID: <31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com> (raw)

Hi,

Since kernel 6.1-rc6 rasdaemon fails to update the summary of the records.

When we inject MCE errors, generally ras-mc-ctl --summary should be able
to read new errors, but starting from 6.1-rc6 the summary(count on
number of MCE records) doesnot udpate when we inject new mce errors.

This started happening after this commit
42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
polling block on watermark") -- Commit landed first in 6.1-rc6, 6.1-rc5 
kernel doesnot have this problem.

On reverting this commit, rasdaemon works good(i.e It is able to read
the new mce records).

This continues to happen on latest kernel(v6.2-rc6) as well.

In a Good case -- 6.2-rc6 + revert of 42fb0a1e84ff 
("tracing/ring-buffer: Have polling block on watermark"), post poll read 
happens without being stuck.

openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", 
O_RDONLY) = 4
openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", 
O_RDONLY) = 5
openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", 
O_RDONLY) = 6
openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", 
O_RDONLY) = 7
[...]
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, 
events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1) = 
1 ([{fd=4, revents=POLLIN}])
read(4, 
"\215~\0\0\0\0\0\0t\0\0\0\0\0\0\0\34\t\2\0\263\0\0\0#\0\0\0\n\1\0\t"..., 
4096) = 4096
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, 
st_size=114, ...}, 0) = 0
write(2, "rasdaemon: ", 11rasdaemon: )             = 11
write(2, "mce_record store: 0x56047b270008"..., 33mce_record store: 
0x56047b270008
) = 33


In a case where new records are not updated in summary: -- 6.2-rc6
The reason why the database of records isn't populated, is simply 
because rasdaemon doesn't get notified anymore by the kernel:

openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", 
O_RDONLY) = 4
openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", 
O_RDONLY) = 5
openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", 
O_RDONLY) = 6
openat(AT_FDCWD, 
"/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", 
O_RDONLY) = 7
[...]
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, 
events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1

--- Stuck here even when we inject MCE-errors.


Before the Commit 42fb0a1e84ff ("tracing/ring-buffer: Have polling block 
on watermark"), an error injection could wake the poll() call on the 
above special files, and I can confirm that the subsequent read() call 
did not hang. With the Commit 42fb0a1e84ff, an error injection doesn't 
wake the poll() call anymore.

We need to let rasdaemon (or any other trace consumer using the per_cpu 
files) retrieve the available information as soon as it is available.

Additional info:
1. 
https://github.com/mchehab/rasdaemon/blob/master/ras-events.c#:~:text=ready%20%3D%20poll(fds%2C%20(n_cpus%20%2B%201)%2C%20%2D1)%3B 
this is the code which is getting hit on rasdaemon side.

2. Changing the buffer_percent to a lower value didnot change the behaviour.


Thanks,
Harshit






             reply	other threads:[~2023-01-30  9:36 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-30  9:34 Harshit Mogalapalli [this message]
2023-01-30 11:57 ` [bug-report] rasdaemon doesnot report new records Linux kernel regression tracking (#adding)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com \
    --to=harshit.m.mogalapalli@oracle.com \
    --cc=darren.kenny@oracle.com \
    --cc=harshit.m.mogalapalli@gmail.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=william.roche@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).