From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>, Nigel Croxon <ncroxon@redhat.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	dm-devel@redhat.com, Mikulas Patocka <mpatocka@redhat.com>,
	linux-serial@vger.kernel.org
Subject: Re: Serial console is causing system lock-up
Date: Thu, 14 Mar 2019 19:30:45 +0900
Message-ID: <20190314103045.GA24210@jagdpanzerIV>
In-Reply-To: <878sxj9nbb.fsf@linutronix.de>

On (03/13/19 09:43), John Ogness wrote:
> I don't understand how you can think "print or die trying" is replaced
> with another "print or die trying".

Sorry, let me explain. In some contexts CPUs which are spinning on
prb_lock don't do anything else. A careful placement of

        touch_softlockup_watchdog_sync();
        clocksource_touch_watchdog();
        rcu_cpu_stall_reset();
        touch_nmi_watchdog();

keeps the watchdogs away, yes, but that doesn't mean that we are not
sitting on a time bomb. Think of RCU, for instance. We keep rcu_cpu_stall
silent and things can look OK, but that doesn't mean that RCU is OK in
reality; spinning CPUs may hold off grace periods, and memory that is
waiting for a grace period to end does not get freed. So now a relatively
simple issue - raid checksum mismatch in this particular case - has
the potential to turn into an OOM. Quadratic CPU serialisation doesn't
scale. Throw enough reporting CPUs at it and we may get very close to
some big problems. Does this make sense?
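
To put a rough number on the "quadratic" part, here is a
back-of-the-envelope sketch. It assumes that N CPUs hit the lock at the
same moment, that they are served strictly one at a time, and that each
one holds the lock for the same hold_us microseconds while its message
goes out. Both function names are made up for illustration; this is a
user-space model, not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical model: the k-th CPU in line (k = 0 .. ncpus - 1) spins
 * for k * hold_us before it gets the lock, so the last one waits for
 * (ncpus - 1) * hold_us.
 */
static uint64_t worst_case_spin_us(uint64_t ncpus, uint64_t hold_us)
{
        return ncpus ? (ncpus - 1) * hold_us : 0;
}

/*
 * Sum of all per-CPU spin times: hold_us * ncpus * (ncpus - 1) / 2.
 * ncpus * (ncpus - 1) is always even, so the division is exact.
 */
static uint64_t total_spin_us(uint64_t ncpus, uint64_t hold_us)
{
        if (ncpus == 0)
                return 0;
        return hold_us * (ncpus * (ncpus - 1) / 2);
}
```

Under these assumptions, 64 CPUs and 1000 us per message give a total of
2016000 us (about 2 seconds) of CPU time spent spinning, and the last CPU
in line waits 63000 us. The 1000 us figure is picked only to make the
arithmetic easy to follow; it is not a measurement.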

This bug report demonstrates that we can have N CPUs reporting warnings
simultaneously. And I think that people would want pr_warn() and
WARN_ON() output to be printed as emergency level messages (which sort
of sounds reasonable. I understand that you have a different opinion on
this).

And what I'm thinking is that *probably* we can have a bit less radical
approach - the system is not always doomed when it WARNs us - and a bit
more "best effort" one. *Maybe* we don't need to apply full serialisation
all the time. *Maybe* full serialisation can be applied only when we see
that we are about to run out of free space in logbuf. Or maybe we can
start dynamically resizing the logbuf. And so on.
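
A rough user-space sketch of the "serialise only when logbuf is about to
fill up" idea, just to make the policy concrete. Everything in it is an
assumption for illustration: the enum, the function name, and the 1/4
low-water mark are hypothetical and do not correspond to any existing
printk interface.

```c
#include <stddef.h>

/* Hypothetical modes; the names are illustrative only. */
enum emit_mode {
        EMIT_BEST_EFFORT,       /* store the record, print it later */
        EMIT_SERIALISED,        /* take the lock and print it right now */
};

/*
 * Assumed policy: fall back to full serialisation only once the free
 * space in the log buffer drops below a low-water mark. The 1/4
 * threshold is an arbitrary choice for this sketch.
 */
static enum emit_mode pick_emit_mode(size_t buf_size, size_t used)
{
        size_t free_space = used < buf_size ? buf_size - used : 0;
        size_t low_water = buf_size / 4;

        return free_space < low_water ? EMIT_SERIALISED : EMIT_BEST_EFFORT;
}
```

With a 1024 byte buffer, the sketch stays in best-effort mode while at
least 256 bytes are free and switches to serialised output below that. A
real implementation would also have to decide who drains the buffer in
best-effort mode, which this sketch leaves out.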

> By the way, Sergey, I appreciate your skepticism.

Sorry, John. I know I'm a PITA.

	-ss
