All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH printk v1 11/13] printk: reimplement console_lock for proper kthread support
Date: Fri, 11 Mar 2022 23:27:51 +0106	[thread overview]
Message-ID: <87tuc4yvsw.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <Yit2LN1nCaiUo5y4@alley>

Hi Petr,

I do not think the example below is showing what you think it does, but
it does shed light on an important point.

On 2022-03-11, Petr Mladek <pmladek@suse.com> wrote:
> CPU0			CPU1			CPU2
>
> printk()
>   // direct mode allowed

OK, so @printk_direct is >= 1.

>   console_trylock()
>   console_unlock()
>     console_flush_all()
>
> 			printk_direct_enter()

@printk_direct is now >= 2.

>
>       allows_direct_printing() -> false;

I do not understand why you say it returns false. @printk_direct is not
zero. This returns _true_ and will print all records currently in the
ringbuffer.

>       break;
>
>       __console_unlock()
>         wakeup_klogd()

Note that all kthreads wake here. To keep it simple, let us assume there
is only 1.

> 						// woken printk_khread

The kthread will only go into its printing loop if new records have
appeared after the above __console_unlock(). If any did appear, _that_
printk() will attempt to print them because direct printing is
active. But let us assume that the kthread on CPU2 woke up before the
new printk() context could attempt a console_lock() or
console_trylock().

> 						console_thread_printing_enter()

@console_lock_count is now 1. Now console_lock() and console_trylock()
will block or fail, respectively.

> 			printk_direct_exit()

@printk_direct is now >= 1.

>       console_trylock()
>         atomic_tryblock()
> 	  //fails because thread active

Correct. (Note that if CPU0 had been in a console_lock() context, it
would _not_ fail because console_lock_reacquire() would wait for the
active kthreads to finish their current record.)

>    return;
>
> 			printk_direct_enter()

@printk_direct is now >= 2.

> 						console_thread_printing_exit()

Note that CPU2 would have printed a single record.

> 						// sleep because
> 						atomic_read(&printk_direct) is not
> 						zero

No, the kthread does not sleep. It will continue printing until it is
blocked via console_lock()/console_trylock() or until all the records
are printed.

> Result: nobody prints

Sorry, but I have no idea how you came to that conclusion.

> Note: The thread will actually not sleep because printk_should_wake()
>       does not check atomic_read(&printk_direct).

Exactly.

>       But it is a bug. Active thread should check it and allow
>       entering direct mode.

We are in direct mode this entire example. I do not understand what you
mean by "allow entering direct mode". Perhaps you mean "allow
console_trylock() to succeed".

If that is what you mean, then you are suggesting that the
console_trylock() spins until all the kthreads have finished their
current record. This could be a new variant of console_trylock().

And then rather than continuing their printing loop, the kthreads all
stay asleep as long as @printk_direct is non-zero (in addition to the
existing sleep conditions).

This could work if the kthreads disable preemption for the
console_thread_printing_enter()/_exit() window.

Although it should be noted that this still will not help if a
console_lock() context is scheduled out.

> Note2: Even when one printk thread flushes the messages. There might
>        be other thread that will never get scheduled a nobody would
>        printk the messages on the related console.

That is what atomic consoles are for. Until atomic consoles are
available, that situation is covered by @oops_in_progress and
console_flush_on_panic().

> This is the race that I see with console_trylock(). IMHO, if we solve
> this race then we do not need console_lock_reacquire().

I do not understand why you continue to mix console_trylock() and
console_lock_reacquire(). console_lock_reacquire() is only for the
console_lock() context.

> Well, I might be wrong. It is Friday evening. I still do not have
> the entire picture. I probably should have waited for Monday.

I believe that you sense a danger with direct printing and
console_trylock(). It allows for scenarios like your example that end up
relying on kthreads to finish the printing (if there is no
panic). Mainline already has this issue because console_lock() can also
be scheduled away and console_trylock() has no chance to print. This
really is the same issue and ultimately relies on @oops_in_progress and
console_flush_on_panic() to get the messages out.

I believe you are hinting at the worst-case scenario: a kthread getting
scheduled out while printing and never seeing a CPU again because the
system is so busy. Assuming the system does not panic, no printing would
be seen on that console anymore, even if direct printing is enabled.

The only solution I see (aside from atomic consoles) is to disable
preemption during printing. Perhaps for non-atomic consoles, this is
what we need to do. That, together with a new console_trylock() variant,
should avoid this concern. Do you agree? Do we want to go that route?

Disabling preemption would be a problem moving forward for the fbcon
because, for the future, I really would like fbcon to live in a
sleepable context. I already have some new ideas about this. But that is
not a topic for this series.

John

P.S. By "new console_trylock() variant" I mean something for the
console_trylock() context like:

1. grab @console_sem with down_trylock_console_sem()

2. spin on atomic_tryblock()

If console_thread_printing_enter() disables preemption, this should work
about as well as the current owner/waiter logic for @console_sem. I can
do some prototype testing to see if there is some detail I am missing.

John

  reply	other threads:[~2022-03-11 22:45 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-07 19:43 [PATCH printk v1 00/13] implement threaded console printing John Ogness
2022-02-07 19:43 ` [PATCH printk v1 01/13] printk: rename cpulock functions John Ogness
2022-02-11 12:44   ` Petr Mladek
2022-02-11 14:42     ` John Ogness
2022-02-11 20:57       ` Steven Rostedt
2022-02-11 21:04         ` Peter Zijlstra
2022-02-15  9:32           ` Petr Mladek
2022-02-15  9:13       ` Petr Mladek
2022-02-14  6:49     ` Sergey Senozhatsky
2022-02-14  9:45       ` John Ogness
2022-02-15  9:29       ` Petr Mladek
2022-02-16  3:27         ` Sergey Senozhatsky
2022-02-17 14:34         ` John Ogness
2022-02-07 19:43 ` [PATCH printk v1 02/13] printk: cpu sync always disable interrupts John Ogness
2022-02-11 12:58   ` Petr Mladek
2022-02-14  6:36     ` Sergey Senozhatsky
2022-02-07 19:43 ` [PATCH printk v1 03/13] printk: use percpu flag instead of cpu_online() John Ogness
2022-02-11 16:05   ` Petr Mladek
2022-02-14  7:08     ` Sergey Senozhatsky
2022-02-14  7:35     ` Sergey Senozhatsky
2022-02-15 10:38       ` Petr Mladek
2022-02-16  3:29         ` Sergey Senozhatsky
2022-03-02 14:21         ` John Ogness
2022-03-04 15:56           ` Petr Mladek
2022-03-05 17:05             ` Jason A. Donenfeld
2022-03-07 16:14               ` Petr Mladek
2022-02-16 13:58   ` two locations: was: " Petr Mladek
2022-03-02 14:49     ` John Ogness
2022-03-04 16:14       ` Petr Mladek
2022-03-07 10:06         ` John Ogness
2022-03-08 16:08       ` Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 04/13] printk: get caller_id/timestamp after migration disable John Ogness
2022-02-15  5:53   ` Sergey Senozhatsky
2022-02-15 11:56   ` Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 05/13] printk: call boot_delay_msec() in printk_delay() John Ogness
2022-02-15  5:58   ` Sergey Senozhatsky
2022-02-15 14:59     ` Petr Mladek
2022-02-16  3:21       ` Sergey Senozhatsky
2022-02-15 15:03   ` Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 06/13] printk: refactor and rework printing logic John Ogness
2022-02-16 15:43   ` Petr Mladek
2022-03-02 16:10     ` John Ogness
2022-02-07 19:43 ` [PATCH printk v1 07/13] printk: move buffer definitions into console_emit_next_record() caller John Ogness
2022-02-16 16:10   ` Petr Mladek
2022-03-02 16:25     ` John Ogness
2022-02-07 19:43 ` [PATCH printk v1 08/13] printk: add pr_flush() John Ogness
2022-02-17 10:11   ` Petr Mladek
2022-03-02 17:23     ` John Ogness
2022-03-04 13:24       ` Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 09/13] printk: add functions to allow direct printing John Ogness
2022-02-17 12:52   ` Petr Mladek
2022-02-18  9:00     ` David Laight
2022-02-18 12:52       ` Petr Mladek
2022-03-03 14:37     ` John Ogness
2022-02-07 19:43 ` [PATCH printk v1 10/13] printk: add kthread console printers John Ogness
2022-02-18  9:00   ` early start: was: " Petr Mladek
2022-02-18  9:04   ` start&stop: " Petr Mladek
2022-02-18  9:08   ` main loop: " Petr Mladek
2022-02-18  9:12   ` wake_up_all: " Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 11/13] printk: reimplement console_lock for proper kthread support John Ogness
2022-02-18 16:20   ` Petr Mladek
2022-02-18 21:41     ` John Ogness
2022-02-18 22:03       ` John Ogness
2022-02-22 11:42       ` Petr Mladek
2022-02-23 17:20         ` John Ogness
2022-02-24  8:27           ` Petr Mladek
2022-02-23 10:19   ` Petr Mladek
2022-03-09 13:56     ` John Ogness
2022-03-10 14:34       ` Petr Mladek
2022-03-10 16:08         ` John Ogness
2022-03-11 10:26           ` Petr Mladek
2022-03-11 13:28             ` John Ogness
2022-03-11 16:17               ` Petr Mladek
2022-03-11 22:21                 ` John Ogness [this message]
2022-03-14 14:08                   ` Petr Mladek
2022-03-14 14:43                     ` John Ogness
2022-03-14 15:53                       ` Petr Mladek
2022-03-11 18:41               ` Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 12/13] printk: remove @console_locked John Ogness
2022-02-23 12:17   ` Petr Mladek
2022-02-07 19:43 ` [PATCH printk v1 13/13] console: introduce CON_MIGHT_SLEEP for vt John Ogness
2022-02-23 13:37   ` Petr Mladek
2022-02-23 18:31     ` Greg Kroah-Hartman
     [not found] ` <20220208083620.2736-1-hdanton@sina.com>
2022-02-08 11:08   ` [PATCH printk v1 10/13] printk: add kthread console printers John Ogness
2022-02-08 14:53     ` Petr Mladek
2022-02-14  6:12       ` Sergey Senozhatsky
2022-02-14 10:02         ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tuc4yvsw.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.