From: "Linux kernel regression tracking (#adding)" <regressions@leemhuis.info>
To: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>,
linux-edac@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: William ROCHE <william.roche@oracle.com>,
Darren Kenny <darren.kenny@oracle.com>,
rostedt@goodmis.org, LKML <linux-kernel@vger.kernel.org>,
"harshit.m.mogalapalli@gmail.com"
<harshit.m.mogalapalli@gmail.com>,
Linux kernel regressions list <regressions@lists.linux.dev>
Subject: Re: [bug-report] rasdaemon doesnot report new records.
Date: Mon, 30 Jan 2023 12:57:14 +0100 [thread overview]
Message-ID: <1a78baae-eb78-5add-13b3-9526082160d9@leemhuis.info> (raw)
In-Reply-To: <31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com>
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]
On 30.01.23 10:34, Harshit Mogalapalli wrote:
> Hi,
>
> Since kernel 6.1-rc6 rasdaemon fails to update the summary of the records.
>
> When we inject MCE errors, generally ras-mc-ctl --summary should be able
> to read new errors, but starting from 6.1-rc6 the summary(count on
> number of MCE records) doesnot udpate when we inject new mce errors.
>
> This started happening after this commit
> 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
> polling block on watermark") -- Commit landed first in 6.1-rc6, 6.1-rc5
> kernel doesnot have this problem.
>
> On reverting this commit, rasdaemon works good(i.e It is able to read
> the new mce records).
>
> This continues to happen on latest kernel(v6.2-rc6) as well.
Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:
#regzbot ^introduced 42fb0a1e84ff525
#regzbot title tracing/ring-buffer: rasdaemon does not report new records
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
> In a Good case -- 6.2-rc6 + revert of 42fb0a1e84ff
> ("tracing/ring-buffer: Have polling block on watermark"), post poll read
> happens without being stuck.
>
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", O_RDONLY) = 4
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", O_RDONLY) = 5
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", O_RDONLY) = 6
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", O_RDONLY) = 7
> [...]
> poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1) =
> 1 ([{fd=4, revents=POLLIN}])
> read(4,
> "\215~\0\0\0\0\0\0t\0\0\0\0\0\0\0\34\t\2\0\263\0\0\0#\0\0\0\n\1\0\t"...,
> 4096) = 4096
> newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644,
> st_size=114, ...}, 0) = 0
> write(2, "rasdaemon: ", 11rasdaemon: ) = 11
> write(2, "mce_record store: 0x56047b270008"..., 33mce_record store:
> 0x56047b270008
> ) = 33
>
>
> In a case where new records are not updated in summary: -- 6.2-rc6
> The reason why the database of records isn't populated, is simply
> because rasdaemon doesn't get notified anymore by the kernel:
>
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", O_RDONLY) = 4
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", O_RDONLY) = 5
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", O_RDONLY) = 6
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", O_RDONLY) = 7
> [...]
> poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1
>
> --- Stuck here even when we inject MCE-errors.
>
>
> Before the Commit 42fb0a1e84ff ("tracing/ring-buffer: Have polling block
> on watermark"), an error injection could wake the poll() call on the
> above special files, and I can confirm that the subsequent read() call
> did not hang. With the Commit 42fb0a1e84ff, an error injection doesn't
> wake the poll() call anymore.
>
> We need to let rasdaemon (or any other trace consumer using the per_cpu
> files) retrieve the available information as soon as it is available.
>
> Additional info:
> 1.
> https://github.com/mchehab/rasdaemon/blob/master/ras-events.c#:~:text=ready%20%3D%20poll(fds%2C%20(n_cpus%20%2B%201)%2C%20%2D1)%3B this is the code which is getting hit on rasdaemon side.
>
> 2. Changing the buffer_percent to a lower value didnot change the
> behaviour.
>
>
> Thanks,
> Harshit
>
>
>
>
>
next prev parent reply other threads:[~2023-01-30 11:57 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-30 9:34 [bug-report] rasdaemon doesnot report new records Harshit Mogalapalli
2023-01-30 11:57 ` Linux kernel regression tracking (#adding) [this message]
2023-02-16 13:59 ` Linux regression tracking #update (Thorsten Leemhuis)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1a78baae-eb78-5add-13b3-9526082160d9@leemhuis.info \
--to=regressions@leemhuis.info \
--cc=darren.kenny@oracle.com \
--cc=harshit.m.mogalapalli@gmail.com \
--cc=harshit.m.mogalapalli@oracle.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=regressions@lists.linux.dev \
--cc=rostedt@goodmis.org \
--cc=william.roche@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).