All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH printk v3 14/15] printk: extend console_lock for proper kthread support
Date: Thu, 21 Apr 2022 16:36:25 +0206	[thread overview]
Message-ID: <875yn2h5ku.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <20220421124119.GB11747@pathway.suse.cz>

On 2022-04-21, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2603,9 +2666,10 @@ static int console_cpu_notify(unsigned int cpu)
>>  		/* If trylock fails, someone else is doing the printing */
>>  		if (console_trylock())
>>  			console_unlock();
>> -
>> -		/* Wake kthread printers. Some may have become usable. */
>> -		wake_up_klogd();
>> +		else {
>> +			/* Some kthread printers may have become usable. */
>> +			wake_up_klogd();
>
> Do you have any particular scenario in mind, please?
> Could CPU hotplug put any printk kthread into a sleep?

I do not have a particular scenario. My reasoning was that a CPU coming
online would affect the conditions of __console_is_usable() for consoles
without CON_ANYTIME. Of course, it would mean that previously a kthread
went to sleep because it was trying to print from a CPU that was
offline. I am doubtful that such a scenario is possible. But you did
uncover some bizarre code paths where task migration could fail during
CPU offlining.

Anyway, you suggested to keep the CON_ANYTIME checks for kthreads in
there. So it seems correct to wake threads anytime the
printer_should_wake() conditions change.

>> @@ -2625,11 +2689,33 @@ void console_lock(void)
>>  	down_console_sem();
>>  	if (console_suspended)
>>  		return;
>> +	console_kthreads_block();
>>  	console_locked = 1;
>>  	console_may_schedule = 1;
>>  }
>>  EXPORT_SYMBOL(console_lock);
>>  
>> +/*
>> + * Lock the console_lock, but rather than blocking all the kthread printers,
>> + * lock a specified kthread printer and hold the lock. This is useful if
>> + * console flags for a particular console need to be updated.
>> + */
>> +void console_lock_single_hold(struct console *con)
>> +{
>> +	might_sleep();
>> +	down_console_sem();
>> +	mutex_lock(&con->lock);
>> +	console_locked = 1;
>> +	console_may_schedule = 1;
>
> This looks wrong. It is a global flag that could be modified
> only when all consoles are blocked.

You are correct. is_console_locked() needs to return false in this
scenario. I will leave out the @console_lock setting and insert a
comment to clarify why.

> This API blocks only the single console. The other consoles are still
> allowed to print actively.

That is the point. VT does not care about the other printers. VT is
using @console_locked to protect itself against itself.

> Another problem will appear with the 15th patch. It will remove
> console_locked variable and is_console_locked() will not longer
> be aware that this console is locked. We will not know that
> it might cause deadlock in the VT code.

From the perspective of VT code the console is _not_ locked. So
is_console_locked() should return false. is_console_locked() is to make
sure that the _VT code_ has called console_lock()/console_trylock(). So
the 15th patch is still correct.

>> @@ -2728,17 +2834,18 @@ static void __console_unlock(void)
>>   *
>>   * @handover will be set to true if a printk waiter has taken over the
>>   * console_lock, in which case the caller is no longer holding the
>> - * console_lock. Otherwise it is set to false.
>> + * console_lock. Otherwise it is set to false. A NULL pointer may be provided
>> + * to disable allowing the console_lock to be taken over by a printk waiter.
>>   *
>>   * Returns false if the given console has no next record to print, otherwise
>>   * true.
>>   *
>> - * Requires the console_lock.
>> + * Requires the console_lock if @handover is non-NULL.
>
>     * Requires con->lock otherwise.

Right. I will update the comments.

>>   */
>> -static bool console_emit_next_record(struct console *con, char *text, char *ext_text,
>> -				     char *dropped_text, bool *handover)
>> +static bool __console_emit_next_record(struct console *con, char *text, char *ext_text,
>> +				       char *dropped_text, bool *handover)
>>  {
>> -	static int panic_console_dropped;
>> +	static atomic_t panic_console_dropped = ATOMIC_INIT(0);
>>  	struct printk_info info;
>>  	struct printk_record r;
>>  	unsigned long flags;
>> @@ -3261,6 +3401,8 @@ void register_console(struct console *newcon)
>>  
>>  	newcon->dropped = 0;
>>  	newcon->thread = NULL;
>> +	newcon->flags |= CON_THD_BLOCKED;
>
> Just to show the complexity added by console_lock_single_hold():
>
> It took me some time to realize that it is correct. The flag
> is needed because the console will be added under console_lock().
> The flag would not be needed when it was added under
> console_lock_single_hold().

?? But it is not added under
console_lock_single_hold(). console_lock_single_hold() is not a
replacement for console_lock(). Their purpose is very
different. console_lock_single_hold() is an internal function to provide
synchronization for @flags and @thread updates of a single console.

Maybe we are getting caught in my "bad naming" trap again. :-/

I do not have any ideas for a function that:

- locks @console_sem to prevent console registration/deregistration

- locks con->lock to provide synchronized @flags and/or @thread updates

>> +	mutex_init(&newcon->lock);
>>  
>>  	if (newcon->flags & CON_PRINTBUFFER) {
>>  		/* Get a consistent copy of @syslog_seq. */
>> @@ -3314,7 +3456,7 @@ int unregister_console(struct console *console)
>>  		return 0;
>>  
>>  	res = -ENODEV;
>> -	console_lock();
>> +	console_lock_single_hold(console);
>>  	if (console_drivers == console) {
>>  		console_drivers=console->next;
>
> Another example of the complexity:
>
> I though that this was not safe. console_drivers is a global list
> and console_lock_single_hold() is supposed to block only a single
> console. But it is actually safe because console_lock_single_hold()
> holds console_sem.

Yes. It is safe.

> Another question is why console_lock_single_hold() is enough
> here and why console_lock() is used in register_console(). I think
> that console_lock_single_hold() will be enough even in
> register_console().

?? And which console would you want to lock? @newcon? It is not
registered yet.

If you want to minimize register_console() locking, it is enough just to
down @console_sem.

> All this is far from obvious. It shows how the API is confusing and
> tricky. And it is another motivation to remove
> console_lock_single_hold().

We need a method to provide @flags synchronization between the kthreads
and console_stop(). Keep in mind that console_lock() does *not* hold the
mutexes. So a completed console_lock() call does *not* mean that the
kthreads are sleeping. They could still lock their own mutex and keep
going. It is not until the kthreads see that CON_THD_BLOCKED is set that
they realize they are not supposed to be running and go to sleep. But
console_stop() could be performing an update to @flags while that
kthread is checking it. It is a data race in code that should be
synchronized.

I spent some time trying to find a good solution for this. Here are the
ideas that I came up with:

1. Use READ_ONCE(short)/WRITE_ONCE(short) because probably that is
   enough to guarantee atomic writes/reads on all platforms.

2. Make @flags atomic_t. This guarentees consistence but would require
   changing how all consoles initialize that field.

3. Create a separate @enabled boolean field in struct console so that
   data races do not matter. This would also change how all consoles
   initialize their struct.

4. Provide a new function that uses the mutex to synchronize, since the
   kthread is already using the mutex.

I ended up choosing #4 because it had the added benefit of allowing
console_start(), console_stop(), console_unregister() to avoid affecting
the other kthreads.

>>  		res = 0;
>> @@ -3676,14 +3835,14 @@ static int printk_kthread_func(void *data)
>>  	kfree(ext_text);
>>  	kfree(text);
>>  
>> -	console_lock();
>> +	mutex_lock(&con->lock);
>
> This is serialized against unregister_console() but not with
> register_console() because they use different locking scheme.

?? In register_console() the thread has not been created yet. There is
nothing to synchronize against.

> Resume:
>
> I would prefer to get rid of console_lock_single_hold() and
> console_unlock_single_release() API.
>
> It was definitely an interesting experiment. I agree that it would
> be nice to do not block the other kthreads when it is not really
> needed. But from my POV, it adds more harm than good at the moment.

So we go with option #1 to solve(?) the @flags synchronization issue? Or
is there another option I missed?

> It is possible that we will want to do such optimizations
> in the future. But it must be easier to understand what exactly
> is serialized which way. At least it should be more documented.
> Also the same API would need to be used on the related code
> paths.

AFAICT it is used in all places that it is appropriate.

John

  reply	other threads:[~2022-04-21 14:30 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-19 23:46 [PATCH printk v3 00/15] printk/for-next John Ogness
2022-04-19 23:46 ` [PATCH printk v3 01/15] printk: rename cpulock functions John Ogness
2022-04-19 23:46 ` [PATCH printk v3 02/15] printk: cpu sync always disable interrupts John Ogness
2022-04-19 23:46 ` [PATCH printk v3 03/15] printk: add missing memory barrier to wake_up_klogd() John Ogness
2022-04-20 12:34   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 04/15] printk: wake up all waiters John Ogness
2022-04-20 12:36   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 05/15] printk: wake waiters for safe and NMI contexts John Ogness
2022-04-20 13:55   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 06/15] printk: get caller_id/timestamp after migration disable John Ogness
2022-04-19 23:46 ` [PATCH printk v3 07/15] printk: call boot_delay_msec() in printk_delay() John Ogness
2022-04-19 23:46 ` [PATCH printk v3 08/15] printk: add con_printk() macro for console details John Ogness
2022-04-20 14:01   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 09/15] printk: refactor and rework printing logic John Ogness
2022-04-20 14:55   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 10/15] printk: move buffer definitions into console_emit_next_record() caller John Ogness
2022-04-19 23:46 ` [PATCH printk v3 11/15] printk: add pr_flush() John Ogness
2022-04-20 15:10   ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 12/15] printk: add functions to prefer direct printing John Ogness
2022-04-19 23:46 ` [PATCH printk v3 13/15] printk: add kthread console printers John Ogness
2022-04-20 17:53   ` Petr Mladek
2022-04-20 20:02     ` John Ogness
2022-04-21 14:25       ` Petr Mladek
2022-04-19 23:46 ` [PATCH printk v3 14/15] printk: extend console_lock for proper kthread support John Ogness
2022-04-20  2:13   ` kernel test robot
2022-04-20 13:32     ` John Ogness
2022-04-20 13:32       ` John Ogness
2022-04-20  4:04   ` kernel test robot
2022-04-21 12:41   ` Petr Mladek
2022-04-21 14:30     ` John Ogness [this message]
2022-04-22 13:03       ` Petr Mladek
2022-04-22 14:14         ` John Ogness
2022-04-22 15:15           ` Petr Mladek
2022-04-22 21:25             ` John Ogness
2022-04-25 15:18               ` Petr Mladek
2022-04-25 19:10                 ` John Ogness
2022-04-19 23:46 ` [PATCH printk v3 15/15] printk: remove @console_locked John Ogness
2022-04-21 12:46   ` Petr Mladek
2022-04-21 14:40 ` [PATCH printk v3 00/15] printk/for-next Petr Mladek
2022-04-21 15:02   ` John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875yn2h5ku.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.