linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected
Date: Wed, 12 Feb 2020 15:54:41 +0100	[thread overview]
Message-ID: <20200212145441.ukhvil6tpljsivvl@pathway.suse.cz> (raw)
In-Reply-To: <c3ce3cee-c757-f76e-8d43-33e9b9ae80ba@yandex-team.ru>

On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
> On 11/02/2020 11.14, Kirill Tkhai wrote:
> > Hi, Konstantin,
> > 
> > On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> > > In NMI context printk() could save messages into per-cpu buffers and
> > > schedule flush by irq_work when IRQ are unblocked. This means message
> > > about hardlockup appears in kernel log only when/if lockup is gone.
> > > 
> > > Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> > > thus printk() cannot schedule flush work to another cpu.
> > > 
> > > This patch adds simple atomic counter of detected hardlockups and
> > > flushes all per-cpu printk buffers in context softlockup watchdog
> > > at any other cpu when it sees changes of this counter.
> > > 
> > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > ---
> > >   include/linux/nmi.h   |    1 +
> > >   kernel/watchdog.c     |   22 ++++++++++++++++++++++
> > >   kernel/watchdog_hld.c |    1 +
> > >   3 files changed, 24 insertions(+)
> > > 
> > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > > index 9003e29cde46..8406df72ae5a 100644
> > > --- a/include/linux/nmi.h
> > > +++ b/include/linux/nmi.h
> > > @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
> > >   #if defined(CONFIG_HARDLOCKUP_DETECTOR)
> > >   extern void hardlockup_detector_disable(void);
> > >   extern unsigned int hardlockup_panic;
> > > +extern atomic_t hardlockup_detected;
> > >   #else
> > >   static inline void hardlockup_detector_disable(void) {}
> > >   #endif
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index b6b1f54a7837..9f5c68fababe 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> > >   }
> > >   __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> > >   # endif /* CONFIG_SMP */
> > > +
> > > +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> > > +
> > > +static inline void flush_hardlockup_messages(void)
> > > +{
> > > +	static atomic_t flushed = ATOMIC_INIT(0);
> > > +
> > > +	/* flush messages from hard lockup detector */
> > > +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> > > +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> > > +		printk_safe_flush();
> > > +	}
> > > +}
> > 
> > Do we really need two variables here? They may come into two different
> > cache lines, and there will be double cache pollution just because of
> > this simple check. Why not the below?
> 
> I don't think anybody could notice read-only access to second variable.
> This executes once in several seconds.
> 
> Watchdogs already use same pattern (monotonic counter + snapshot) in
> couple places. So code looks more clean in this way.

It is not only about speed. It is also about code complexity
and correctness. Using two variables is more complex.
Calling atomic_read(&hardlockup_detected) twice does
not look like a correct pattern.

The single variable patter is used for similar things there
as well, for example, see hard_watchdog_warn,
hardlockup_allcpu_dumped.

I would call the variable "hardlockup_dump_flush" and
use the same logic as for hardlockup_allcpu_dumped.

Note that simple READ_ONCE(), WRITE_ONCE() are not enough
because they do not provide smp barriers.

Best Regards,
Petr

  reply	other threads:[~2020-02-12 14:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-10  9:48 [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected Konstantin Khlebnikov
2020-02-10 22:51 ` Andrew Morton
2020-02-11 11:01   ` Konstantin Khlebnikov
2020-02-11  8:14 ` Kirill Tkhai
2020-02-11 12:36   ` Konstantin Khlebnikov
2020-02-12 14:54     ` Petr Mladek [this message]
2020-02-13 13:05       ` Konstantin Khlebnikov
2020-02-12  1:15 ` Sergey Senozhatsky
2020-02-12  2:49   ` Steven Rostedt
2020-02-12  3:04     ` Sergey Senozhatsky
2020-02-12  3:27       ` Steven Rostedt
2020-02-12 14:18   ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200212145441.ukhvil6tpljsivvl@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=dmtrmonakhov@yandex-team.ru \
    --cc=khlebnikov@yandex-team.ru \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).