linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH printk v3 14/15] printk: extend console_lock for proper kthread support
Date: Thu, 21 Apr 2022 16:36:25 +0206	[thread overview]
Message-ID: <875yn2h5ku.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <20220421124119.GB11747@pathway.suse.cz>

On 2022-04-21, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2603,9 +2666,10 @@ static int console_cpu_notify(unsigned int cpu)
>>  		/* If trylock fails, someone else is doing the printing */
>>  		if (console_trylock())
>>  			console_unlock();
>> -
>> -		/* Wake kthread printers. Some may have become usable. */
>> -		wake_up_klogd();
>> +		else {
>> +			/* Some kthread printers may have become usable. */
>> +			wake_up_klogd();
>
> Do you have any particular scenario in mind, please?
> Could CPU hotplug put any printk kthread into a sleep?

I do not have a particular scenario. My reasoning was that a CPU coming
online would affect the conditions of __console_is_usable() for consoles
without CON_ANYTIME. Of course, it would mean that previously a kthread
went to sleep because it was trying to print from a CPU that was
offline. I am doubtful that such a scenario is possible. But you did
uncover some bizarre code paths where task migration could fail during
CPU offlining.

Anyway, you suggested to keep the CON_ANYTIME checks for kthreads in
there. So it seems correct to wake threads anytime the
printer_should_wake() conditions change.

>> @@ -2625,11 +2689,33 @@ void console_lock(void)
>>  	down_console_sem();
>>  	if (console_suspended)
>>  		return;
>> +	console_kthreads_block();
>>  	console_locked = 1;
>>  	console_may_schedule = 1;
>>  }
>>  EXPORT_SYMBOL(console_lock);
>>  
>> +/*
>> + * Lock the console_lock, but rather than blocking all the kthread printers,
>> + * lock a specified kthread printer and hold the lock. This is useful if
>> + * console flags for a particular console need to be updated.
>> + */
>> +void console_lock_single_hold(struct console *con)
>> +{
>> +	might_sleep();
>> +	down_console_sem();
>> +	mutex_lock(&con->lock);
>> +	console_locked = 1;
>> +	console_may_schedule = 1;
>
> This looks wrong. It is a global flag that could be modified
> only when all consoles are blocked.

You are correct. is_console_locked() needs to return false in this
scenario. I will leave out the @console_lock setting and insert a
comment to clarify why.

> This API blocks only the single console. The other consoles are still
> allowed to print actively.

That is the point. VT does not care about the other printers. VT is
using @console_locked to protect itself against itself.

> Another problem will appear with the 15th patch. It will remove
> console_locked variable and is_console_locked() will not longer
> be aware that this console is locked. We will not know that
> it might cause deadlock in the VT code.

From the perspective of VT code the console is _not_ locked. So
is_console_locked() should return false. is_console_locked() is to make
sure that the _VT code_ has called console_lock()/console_trylock(). So
the 15th patch is still correct.

>> @@ -2728,17 +2834,18 @@ static void __console_unlock(void)
>>   *
>>   * @handover will be set to true if a printk waiter has taken over the
>>   * console_lock, in which case the caller is no longer holding the
>> - * console_lock. Otherwise it is set to false.
>> + * console_lock. Otherwise it is set to false. A NULL pointer may be provided
>> + * to disable allowing the console_lock to be taken over by a printk waiter.
>>   *
>>   * Returns false if the given console has no next record to print, otherwise
>>   * true.
>>   *
>> - * Requires the console_lock.
>> + * Requires the console_lock if @handover is non-NULL.
>
>     * Requires con->lock otherwise.

Right. I will update the comments.

>>   */
>> -static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
>> -				     char *dropped_text, bool *handover)
>> +static bool __console_emit_next_record(struct console *con, char *text, char *ext_text,
>> +				       char *dropped_text, bool *handover)
>>  {
>> -	static int panic_console_dropped;
>> +	static atomic_t panic_console_dropped = ATOMIC_INIT(0);
>>  	struct printk_info info;
>>  	struct printk_record r;
>>  	unsigned long flags;
>> @@ -3261,6 +3401,8 @@ void register_console(struct console *newcon)
>>  
>>  	newcon->dropped = 0;
>>  	newcon->thread = NULL;
>> +	newcon->flags |= CON_THD_BLOCKED;
>
> Just to show the complexity added by console_lock_single_hold():
>
> It took me some time to realize that it is correct. The flag
> is needed because the console will be added under console_lock().
> The flag would not be needed when it was added under
> console_lock_single_hold().

?? But it is not added under
console_lock_single_hold(). console_lock_single_hold() is not a
replacement for console_lock(). Their purpose is very
different. console_lock_single_hold() is an internal function to provide
synchronization for @flags and @thread updates of a single console.

Maybe we are getting caught in my "bad naming" trap again. :-/

I do not have any ideas for a function that:

- locks @console_sem to prevent console registration/deregistration

- locks con->lock to provide synchronized @flags and/or @thread updates

>> +	mutex_init(&newcon->lock);
>>  
>>  	if (newcon->flags & CON_PRINTBUFFER) {
>>  		/* Get a consistent copy of @syslog_seq. */
>> @@ -3314,7 +3456,7 @@ int unregister_console(struct console *console)
>>  		return 0;
>>  
>>  	res = -ENODEV;
>> -	console_lock();
>> +	console_lock_single_hold(console);
>>  	if (console_drivers == console) {
>>  		console_drivers=console->next;
>
> Another example of the complexity:
>
> I though that this was not safe. console_drivers is a global list
> and console_lock_single_hold() is supposed to block only a single
> console. But it is actually safe because console_lock_single_hold()
> holds console_sem.

Yes. It is safe.

> Another question is why console_lock_single_hold() is enough
> here and why console_lock() is used in register_console(). I think
> that console_lock_single_hold() will be enough even in
> register_console().

?? And which console would you want to lock? @newcon? It is not
registered yet.

If you want to minimize register_console() locking, it is enough just to
down @console_sem.

> All this is far from obvious. It shows how the API is confusing and
> tricky. And it is another motivation to remove
> console_lock_single_hold().

We need a method to provide @flags synchronization between the kthreads
and console_stop(). Keep in mind that console_lock() does *not* hold the
mutexes. So a completed console_lock() call does *not* mean that the
kthreads are sleeping. They could still lock their own mutex and keep
going. It is not until the kthreads see that CON_THD_BLOCKED is set that
they realize they are not supposed to be running and go to sleep. But
console_stop() could be performing an update to @flags while that
kthread is checking it. It is a data race in code that should be
synchronized.

I spent some time trying to find a good solution for this. Here are the
ideas that I came up with:

1. Use READ_ONCE(short)/WRITE_ONCE(short) because probably that is
   enough to guarantee atomic writes/reads on all platforms.

2. Make @flags atomic_t. This guarentees consistence but would require
   changing how all consoles initialize that field.

3. Create a separate @enabled boolean field in struct console so that
   data races do not matter. This would also change how all consoles
   initialize their struct.

4. Provide a new function that uses the mutex to synchronize, since the
   kthread is already using the mutex.

I ended up choosing #4 because it had the added benefit of allowing
console_start(), console_stop(), console_unregister() to avoid affecting
the other kthreads.

>>  		res = 0;
>> @@ -3676,14 +3835,14 @@ static int printk_kthread_func(void *data)
>>  	kfree(ext_text);
>>  	kfree(text);
>>  
>> -	console_lock();
>> +	mutex_lock(&con->lock);
>
> This is serialized against unregister_console() but not with
> register_console() because they use different locking scheme.

?? In register_console() the thread has not been created yet. There is
nothing to synchronize against.

> Resume:
>
> I would prefer to get rid of console_lock_single_hold() and
> console_unlock_single_release() API.
>
> It was definitely an interesting experiment. I agree that it would
> be nice to do not block the other kthreads when it is not really
> needed. But from my POV, it adds more harm than good at the moment.

So we go with option #1 to solve(?) the @flags synchronization issue? Or
is there another option I missed?

> It is possible that we will want to do such optimizations
> in the future. But it must be easier to understand what exactly
> is serialized which way. At least it should be more documented.
> Also the same API would need to be used on the related code
> paths.

AFAICT it is used in all places that it is appropriate.

John

  reply	other threads:[~2022-04-21 14:30 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-19 23:46 [PATCH printk v3 00/15] printk/for-next John Ogness
2022-04-19 23:46 ` [PATCH printk v3 01/15] printk: rename cpulock functions John Ogness
2022-04-19 23:46 ` [PATCH printk v3 02/15] printk: cpu sync always disable interrupts John Ogness
2022-04-19 23:46 ` [PATCH printk v3 03/15] printk: add missing memory barrier to wake_up_klogd() John Ogness
2022-04-20 12:34   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 04/15] printk: wake up all waiters John Ogness
2022-04-20 12:36   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 05/15] printk: wake waiters for safe and NMI contexts John Ogness
2022-04-20 13:55   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 06/15] printk: get caller_id/timestamp after migration disable John Ogness
2022-04-19 23:46 ` [PATCH printk v3 07/15] printk: call boot_delay_msec() in printk_delay() John Ogness
2022-04-19 23:46 ` [PATCH printk v3 08/15] printk: add con_printk() macro for console details John Ogness
2022-04-20 14:01   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 09/15] printk: refactor and rework printing logic John Ogness
2022-04-20 14:55   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 10/15] printk: move buffer definitions into console_emit_next_record() caller John Ogness
2022-04-19 23:46 ` [PATCH printk v3 11/15] printk: add pr_flush() John Ogness
2022-04-20 15:10   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 12/15] printk: add functions to prefer direct printing John Ogness
2022-04-19 23:46 ` [PATCH printk v3 13/15] printk: add kthread console printers John Ogness
2022-04-20 17:53   ` Petr Mladek
2022-04-20 20:02     ` John Ogness
2022-04-21 14:25       ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 14/15] printk: extend console_lock for proper kthread support John Ogness
2022-04-20  2:13   ` kernel test robot
2022-04-20 13:32     ` John Ogness
2022-04-20  4:04   ` kernel test robot
2022-04-21 12:41   ` Petr Mladek
2022-04-21 14:30     ` John Ogness [this message]
2022-04-22 13:03       ` Petr Mladek
2022-04-22 14:14         ` John Ogness
2022-04-22 15:15           ` Petr Mladek
2022-04-22 21:25             ` John Ogness
2022-04-25 15:18               ` Petr Mladek
2022-04-25 19:10                 ` John Ogness
2022-04-19 23:46 ` [PATCH printk v3 15/15] printk: remove @console_locked John Ogness
2022-04-21 12:46   ` Petr Mladek
2022-04-21 14:40 ` [PATCH printk v3 00/15] printk/for-next Petr Mladek
2022-04-21 15:02   ` John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875yn2h5ku.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).