All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] printk for 5.19
@ 2022-05-23 13:21 Petr Mladek
  2022-05-25 18:09 ` Linus Torvalds
  2022-05-25 18:37 ` pr-tracker-bot
  0 siblings, 2 replies; 6+ messages in thread
From: Petr Mladek @ 2022-05-23 13:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Andy Shevchenko,
	Rasmus Villemoes, Jan Kara, Peter Zijlstra, linux-kernel

Hi Linus,

please pull the latest printk changes from

  git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git tags/printk-for-5.19

================================================

- Offload writing printk() messages on consoles to per-console kthreads.

  It prevents soft-lockups when an extensive amount of messages is printed.
  It was observed, for example, during boot of large systems with a lot
  of peripherals like disks or network interfaces.

  It prevents live-lockups that were observed, for example, when messages
  about allocation failures were reported and a CPU handled consoles
  instead of reclaiming the memory. It was hard to solve even with
  rate limiting because it would need to take into account the amount
  of messages and the speed of all consoles.

  It is a must to have for real time. Otherwise, any printk() might break
  latency guarantees.

  The per-console kthreads allow to handle each console on its own speed.
  Slow consoles do not longer slow down faster ones. And printk() does
  not longer unpredictably slows down various code paths.

  There are situations when the kthreads are either not available or
  not reliable, for example, early boot, suspend, or panic. In these
  situations, printk() uses the legacy mode and tries to handle consoles
  immediately.


- Add documentation for the printk index.


===============================================

Risk:

There was always fear that messages might not reach the console when
the kthreads could not be scheduled. It blocked introducing the kthreads
for nearly a decade.

It is clear that the kthreads adds some risk that the messages will
not reach consoles. But the original solution was just the best
effort either.

The kthreads fix some real bugs as mentioned above. Also they allow
to see the messages on other consoles when other consoles are slow
or get stuck.

The patchset has been in linux-next for nearly one month since April 26.
It helped to find one already existing race when initializing
a console device. It made the race more visible, see
https://lore.kernel.org/r/20220508103547.626355-1-john.ogness@linutronix.de

Nobody reported problems with missing messages. Of course, it just
a good sign but it is not proof of anything. Let's see how it
works when it gets more testing in the mainline. I am afraid
that we won't find without trying.


----------------------------------------------------------------
Chris Down (1):
      MAINTAINERS: Add printk indexing maintainers on mention of printk_index

John Ogness (15):
      printk: rename cpulock functions
      printk: cpu sync always disable interrupts
      printk: add missing memory barrier to wake_up_klogd()
      printk: wake up all waiters
      printk: wake waiters for safe and NMI contexts
      printk: get caller_id/timestamp after migration disable
      printk: call boot_delay_msec() in printk_delay()
      printk: add con_printk() macro for console details
      printk: refactor and rework printing logic
      printk: move buffer definitions into console_emit_next_record() caller
      printk: add pr_flush()
      printk: add functions to prefer direct printing
      printk: add kthread console printers
      printk: extend console_lock for per-console locking
      printk: remove @console_locked

Marco Elver (1):
      printk, tracing: fix console tracepoint

Petr Mladek (2):
      printk/index: Printk index feature documentation
      Merge branch 'rework/kthreads' into for-linus

 Documentation/core-api/index.rst        |    1 +
 Documentation/core-api/printk-index.rst |  137 ++++
 MAINTAINERS                             |    2 +
 drivers/tty/sysrq.c                     |    2 +
 include/linux/console.h                 |   19 +
 include/linux/printk.h                  |   82 ++-
 kernel/hung_task.c                      |   11 +-
 kernel/panic.c                          |    4 +
 kernel/printk/printk.c                  | 1205 +++++++++++++++++++++++--------
 kernel/rcu/tree_stall.h                 |    2 +
 kernel/reboot.c                         |   14 +-
 kernel/watchdog.c                       |    4 +
 kernel/watchdog_hld.c                   |    4 +
 lib/dump_stack.c                        |    4 +-
 lib/nmi_backtrace.c                     |    4 +-
 15 files changed, 1174 insertions(+), 321 deletions(-)
 create mode 100644 Documentation/core-api/printk-index.rst

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] printk for 5.19
  2022-05-23 13:21 [GIT PULL] printk for 5.19 Petr Mladek
@ 2022-05-25 18:09 ` Linus Torvalds
  2022-05-25 22:02   ` John Ogness
  2022-05-25 18:37 ` pr-tracker-bot
  1 sibling, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2022-05-25 18:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Andy Shevchenko,
	Rasmus Villemoes, Jan Kara, Peter Zijlstra,
	Linux Kernel Mailing List

On Mon, May 23, 2022 at 6:21 AM Petr Mladek <pmladek@suse.com> wrote:
>
>   There are situations when the kthreads are either not available or
>   not reliable, for example, early boot, suspend, or panic. In these
>   situations, printk() uses the legacy mode and tries to handle consoles
>   immediately.

Let's see how this works out, but I do have one complaint/query about
the series.

Looking through the commits, I don't see how that "printk: wake up all
waiters" makes any sense at all.

It *ALREADY* woke up all waiters as far as I can see.

Doing a wake_up_interruptible() will stop waking things up only when
it hits a *exclusive* waiter, and as far as I can tell, there are no
exclusive waiters there.

And if they were there, the "wake_up_all()" wouldn't be the right thing anyway.

So that commit seems to be fundamentally confused about things.

You should NEVER use wake_up_interruptible_all() in any normal code.

That "all()" form is only for when there are exclusive waiters (that
are expected to handle the situation entirely, or wake up the next
waiter if they don't), *and* you have some exceptional thing that then
causes *ALL* waiters to need to be woken up.

For example, a "read()" might be an exclusive wait, so that multiple
potential concurrent readers don't cause a scheduling herd of
processes all to wake up when somebody writes to the socket or pipe or
whatever.

So in that situation a write() uses a regular wakeup, so that we only
wake up the one waiter that will take care of things.

But then a *shutdown* event obviously does affect everybody, so that
would cause a "wake_up_interruptible_all()".

I really don't see why the printk() code decided to use that wakeup
function, and the commit message doesn't explain why it was done
either.

I'm sure we have lots of drivers that are confused about core things
like this, and I don't really care.

But when I see something really core like printk() get confused and
mis-understand basic wait queue behavior, that makes me go "This is
WRONG".

Please explain.

                    Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] printk for 5.19
  2022-05-23 13:21 [GIT PULL] printk for 5.19 Petr Mladek
  2022-05-25 18:09 ` Linus Torvalds
@ 2022-05-25 18:37 ` pr-tracker-bot
  1 sibling, 0 replies; 6+ messages in thread
From: pr-tracker-bot @ 2022-05-25 18:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Linus Torvalds, Sergey Senozhatsky, Steven Rostedt, John Ogness,
	Andy Shevchenko, Rasmus Villemoes, Jan Kara, Peter Zijlstra,
	linux-kernel

The pull request you sent on Mon, 23 May 2022 15:21:07 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git tags/printk-for-5.19

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/537e62c865dcb9b91d07ed83f8615b71fa0b51bb

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] printk for 5.19
  2022-05-25 18:09 ` Linus Torvalds
@ 2022-05-25 22:02   ` John Ogness
  2022-05-26 12:09     ` Petr Mladek
  0 siblings, 1 reply; 6+ messages in thread
From: John Ogness @ 2022-05-25 22:02 UTC (permalink / raw)
  To: Linus Torvalds, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andy Shevchenko,
	Rasmus Villemoes, Jan Kara, Peter Zijlstra,
	Linux Kernel Mailing List

On 2022-05-25, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, May 23, 2022 at 6:21 AM Petr Mladek <pmladek@suse.com> wrote:
>> There are situations when the kthreads are either not available or
>> not reliable, for example, early boot, suspend, or panic. In these
>> situations, printk() uses the legacy mode and tries to handle
>> consoles immediately.
>
> Let's see how this works out, but I do have one complaint/query about
> the series.
>
> Looking through the commits, I don't see how that "printk: wake up all
> waiters" makes any sense at all.
>
> It *ALREADY* woke up all waiters as far as I can see.
>
> Doing a wake_up_interruptible() will stop waking things up only when
> it hits a *exclusive* waiter, and as far as I can tell, there are no
> exclusive waiters there.

You are correct. @log_wait never has exclusive waiters. I will post a
patch to revert the change in question.

Until now, I never took the time to investigate if there were any
exclusive waiters. I just wanted to be sure that all waiters wake up,
regardless of their type. But obviously the patch is unnecessary with
the currently implemented waiters.

> That "all()" form is only for when there are exclusive waiters (that
> are expected to handle the situation entirely, or wake up the next
> waiter if they don't), *and* you have some exceptional thing that then
> causes *ALL* waiters to need to be woken up.
>
> For example, a "read()" might be an exclusive wait, so that multiple
> potential concurrent readers don't cause a scheduling herd of
> processes all to wake up when somebody writes to the socket or pipe or
> whatever.
>
> So in that situation a write() uses a regular wakeup, so that we only
> wake up the one waiter that will take care of things.
>
> But then a *shutdown* event obviously does affect everybody, so that
> would cause a "wake_up_interruptible_all()".

Thank you for taking the time to clarify the use case for all(). Such a
use case currently does not exist for printk.

> I'm sure we have lots of drivers that are confused about core things
> like this, and I don't really care.
>
> But when I see something really core like printk() get confused and
> mis-understand basic wait queue behavior, that makes me go "This is
> WRONG".

You are correct to question a series when something like this in core
code is found. I cannot guarantee that every line is perfect, but this
series does have a great deal of dedicated review, discussion, revision,
and heavy testing behind it.

Petr has done a great job of requiring these kinds of changes as
separate commits, which certainly helps to identify any false steps.

Thank you for taking extra time to scrutinize this series.

John Ogness

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] printk for 5.19
  2022-05-25 22:02   ` John Ogness
@ 2022-05-26 12:09     ` Petr Mladek
  2022-05-26 17:13       ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Mladek @ 2022-05-26 12:09 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Andy Shevchenko, Rasmus Villemoes, Jan Kara, Peter Zijlstra,
	Linux Kernel Mailing List

On Thu 2022-05-26 00:08:46, John Ogness wrote:
> On 2022-05-25, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Mon, May 23, 2022 at 6:21 AM Petr Mladek <pmladek@suse.com> wrote:
> >> There are situations when the kthreads are either not available or
> >> not reliable, for example, early boot, suspend, or panic. In these
> >> situations, printk() uses the legacy mode and tries to handle
> >> consoles immediately.
> >
> > Looking through the commits, I don't see how that "printk: wake up all
> > waiters" makes any sense at all.
> >
> > It *ALREADY* woke up all waiters as far as I can see.
> >
> > Doing a wake_up_interruptible() will stop waking things up only when
> > it hits a *exclusive* waiter, and as far as I can tell, there are no
> > exclusive waiters there.
> 
> You are correct. @log_wait never has exclusive waiters. I will post a
> patch to revert the change in question.

John, Yes, please do so.

> > But when I see something really core like printk() get confused and
> > mis-understand basic wait queue behavior, that makes me go "This is
> > WRONG".

Linus, thanks a lot for catching and explaining the problem.

Yeah, it is a shame. I have met code with exclusive waiters only once,
8 years ago and completely forgot about it. I was curious how it
worked with more log waiters but I put it into TODO list. The new
code looked safe and I was too focused on finding potential problems.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GIT PULL] printk for 5.19
  2022-05-26 12:09     ` Petr Mladek
@ 2022-05-26 17:13       ` Linus Torvalds
  0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2022-05-26 17:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Andy Shevchenko,
	Rasmus Villemoes, Jan Kara, Peter Zijlstra,
	Linux Kernel Mailing List

On Thu, May 26, 2022 at 5:09 AM Petr Mladek <pmladek@suse.com> wrote:
>
> Yeah, it is a shame. I have met code with exclusive waiters only once,
> 8 years ago and completely forgot about it.

Yeah, the exclusive waiters thing is quite unusual, and not a normal pattern.

It's basically only used for a very few performance-critical cases
where the "multiple readers, but only one writer should wake things
up" can happen.

And it's rare. It's things like "accept()" on a socket - where it's
common to have a lot of threads waiting for a new connection, but you
don't want to wake them all just because one new connection comes in,
because only one of them will successfully take it.

There are not very many patterns like that where you have that "many
people waiting for the same thing, only one will get it". It happens
for socket readers (kind of like accept, but for the "UDP packet with
command comes in"), or with things like pipes (which people use for
similar things and distribute work to potentially lots of threads
through a pipe).

So the common pattern tends to be "some event that gets a threaded
server using one thread to complete", but there just aren't that many
of those cases.

It _could_ also be used for things like locking (ie think "lock
release wakes up waiter"), but I think all our lock primitives end up
doing their wakeup routines at a lower level (ie using
"wake_up_process()" directly) so those don't actually use it.

End result: the exclusive waiters are an important case, but they are
rare - and thus the "all()" version of wakeup is also quite rare.

                        Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-26 17:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-23 13:21 [GIT PULL] printk for 5.19 Petr Mladek
2022-05-25 18:09 ` Linus Torvalds
2022-05-25 22:02   ` John Ogness
2022-05-26 12:09     ` Petr Mladek
2022-05-26 17:13       ` Linus Torvalds
2022-05-25 18:37 ` pr-tracker-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.