linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Linux kernel regression tracking (#adding)"  <regressions@leemhuis.info>
To: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>,
	linux-edac@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: William ROCHE <william.roche@oracle.com>,
	Darren Kenny <darren.kenny@oracle.com>,
	rostedt@goodmis.org, LKML <linux-kernel@vger.kernel.org>,
	"harshit.m.mogalapalli@gmail.com"
	<harshit.m.mogalapalli@gmail.com>,
	Linux kernel regressions list <regressions@lists.linux.dev>
Subject: Re: [bug-report] rasdaemon doesnot report new records.
Date: Mon, 30 Jan 2023 12:57:14 +0100	[thread overview]
Message-ID: <1a78baae-eb78-5add-13b3-9526082160d9@leemhuis.info> (raw)
In-Reply-To: <31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com>

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 30.01.23 10:34, Harshit Mogalapalli wrote:
> Hi,
> 
> Since kernel 6.1-rc6 rasdaemon fails to update the summary of the records.
> 
> When we inject MCE errors, generally ras-mc-ctl --summary should be able
> to read new errors, but starting from 6.1-rc6 the summary(count on
> number of MCE records) doesnot udpate when we inject new mce errors.
> 
> This started happening after this commit
> 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
> polling block on watermark") -- Commit landed first in 6.1-rc6, 6.1-rc5
> kernel doesnot have this problem.
> 
> On reverting this commit, rasdaemon works good(i.e It is able to read
> the new mce records).
>
> This continues to happen on latest kernel(v6.2-rc6) as well.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 42fb0a1e84ff525
#regzbot title tracing/ring-buffer: rasdaemon does not report new records
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

> In a Good case -- 6.2-rc6 + revert of 42fb0a1e84ff
> ("tracing/ring-buffer: Have polling block on watermark"), post poll read
> happens without being stuck.
> 
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", O_RDONLY) = 4
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", O_RDONLY) = 5
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", O_RDONLY) = 6
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", O_RDONLY) = 7
> [...]
> poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1) =
> 1 ([{fd=4, revents=POLLIN}])
> read(4,
> "\215~\0\0\0\0\0\0t\0\0\0\0\0\0\0\34\t\2\0\263\0\0\0#\0\0\0\n\1\0\t"...,
> 4096) = 4096
> newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644,
> st_size=114, ...}, 0) = 0
> write(2, "rasdaemon: ", 11rasdaemon: )             = 11
> write(2, "mce_record store: 0x56047b270008"..., 33mce_record store:
> 0x56047b270008
> ) = 33
> 
> 
> In a case where new records are not updated in summary: -- 6.2-rc6
> The reason why the database of records isn't populated, is simply
> because rasdaemon doesn't get notified anymore by the kernel:
> 
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw", O_RDONLY) = 4
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu1/trace_pipe_raw", O_RDONLY) = 5
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu2/trace_pipe_raw", O_RDONLY) = 6
> openat(AT_FDCWD,
> "/sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu3/trace_pipe_raw", O_RDONLY) = 7
> [...]
> poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
> events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 5, -1
> 
> --- Stuck here even when we inject MCE-errors.
> 
> 
> Before the Commit 42fb0a1e84ff ("tracing/ring-buffer: Have polling block
> on watermark"), an error injection could wake the poll() call on the
> above special files, and I can confirm that the subsequent read() call
> did not hang. With the Commit 42fb0a1e84ff, an error injection doesn't
> wake the poll() call anymore.
> 
> We need to let rasdaemon (or any other trace consumer using the per_cpu
> files) retrieve the available information as soon as it is available.
> 
> Additional info:
> 1.
> https://github.com/mchehab/rasdaemon/blob/master/ras-events.c#:~:text=ready%20%3D%20poll(fds%2C%20(n_cpus%20%2B%201)%2C%20%2D1)%3B this is the code which is getting hit on rasdaemon side.
> 
> 2. Changing the buffer_percent to a lower value didnot change the
> behaviour.
> 
> 
> Thanks,
> Harshit
> 
> 
> 
> 
> 

  reply	other threads:[~2023-01-30 11:57 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-30  9:34 [bug-report] rasdaemon doesnot report new records Harshit Mogalapalli
2023-01-30 11:57 ` Linux kernel regression tracking (#adding) [this message]
2023-02-16 13:59   ` Linux regression tracking #update (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1a78baae-eb78-5add-13b3-9526082160d9@leemhuis.info \
    --to=regressions@leemhuis.info \
    --cc=darren.kenny@oracle.com \
    --cc=harshit.m.mogalapalli@gmail.com \
    --cc=harshit.m.mogalapalli@oracle.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=william.roche@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).