All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: "Sergey Senozhatsky" <senozhatsky@chromium.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Tony Lindgren" <tony@atomide.com>,
	"Geert Uytterhoeven" <geert+renesas@glider.be>,
	"Justin Chen" <justin.chen@broadcom.com>,
	"Jiaqing Zhao" <jiaqing.zhao@linux.intel.com>,
	linux-serial@vger.kernel.org
Subject: Re: [PATCH printk v2 08/26] printk: nbcon: Implement processing in port->lock wrapper
Date: Mon, 11 Mar 2024 18:14:19 +0106	[thread overview]
Message-ID: <87le6oy9vg.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <Zdh4eEJJpasEWqa5@alley>

Hi Petr,

Thanks for the detailed feedback. Here is a lengthy response. I hope it
clarifies the uart port and console fields. And I think you identified a
bug relating to the setup() callback.

On 2024-02-23, Petr Mladek <pmladek@suse.com> wrote:
> My main (only) concern was the synchronization of the various accessed
> variables, especially, port->cons.

The only issue is if port->cons disappears between lock and unlock. I
see there is code setting port->cons to NULL, although I do not know
why. Once port->cons is set, there is never a reason to unset it.

Regardless, I will add port->lock synchronization when modifying
port->cons. There are only a few code sites and they are all during
driver setup.

> Note: I am not completely sure how the early and valid console drivers
>       share the same struct uart_port. So, maybe I miss some important
>       guarantee.

The struct uart_port is _not_ shared between the early and normal
consoles. However, the struct console is shared for normal consoles
amoung various ports of a particular driver.

> Anyway. synchronization of port->cons looks like a shade area.
> IMHO, the existing code expects that it is used _only when the console
> is registered_. But this patch wants to access it _even before
> the console is registered_!

Indeed. It is not enough for uart_is_nbcon() to check if it is an
NBCON. It must also check if it is registered, locklessly:

    hlist_unhashed_lockless(&con->node);

Most importantly to be sure that nbcon_init() has already been called.
register_console() clears the nbcon state after cons->index has been
set, but before the console has been added to the list.

> For example, it took me quite a lot of time to shake my head around:
>
> #define uart_console(port) \
> 	((port)->cons && (port)->cons->index == (port)->line)
>
>   + port->cons and port->line are updated in the uart code.
>     It seems that the update is not serialized by port->lock.
>     Something might be done under port->mutex.
>
>   + cons->index is updated in register_console() under
>     console_list_lock.
>
> Spoiler: I propose a solution which does not use uart_console().

This check is necessary because multiple ports of a driver will set and
share the same port->cons value, even if they are not really the
console. I looked into enforcing that port->cons is NULL if it is not a
registered console, but this is tricky. port->cons is driver-internal
and hidden from printk. The driver will set port->cons in case it
becomes the console and printk will set cons->index once it has chosen
which port will be the actual console. But there is no way to unset
port->cons if a port was not chosen by printk.

The various fields have the following meaning (AFAICT):

port->line: An identifier to represent a particular port supported by a
driver.

port->cons: The struct console to use if this port is chosen to be a
console.

port->console: Boolean, true if this port was chosen to be a
console. (Used only by the tty layer.)

cons->index: The port chosen by printk to be a console.

None of those fields specify if the port is currently registered as a
console. For that you would need to check if port->cons->node is hashed
and then verify that port->line matches port->cons->index. This is what
uart_nbcon_acquire() needs to do (as well as check if it is an nbcon
console).

>> --- a/drivers/tty/serial/8250/8250_port.c
>> +++ b/drivers/tty/serial/8250/8250_port.c
>> @@ -3284,6 +3284,7 @@ void serial8250_init_port(struct uart_8250_port *up)
>>  	struct uart_port *port = &up->port;
>>  
>>  	spin_lock_init(&port->lock);
>> +	port->nbcon_locked_port = false;
>
> I am not sure if the variable really helps anything:
>
>    + nbcon_context release() must handle the case when it
>      lost the ownership anyway.

uart_nbcon_release() must first check if the provided port is a
registered nbcon console. A simple boolean check is certainly faster
than the 4 checks mentioned above. After all, if it was never locked,
there is no reason to unlock it.

>    + nbcon_acquire() is called under port->lock. It means that
>      nbcon_release() should always be called when the acquire
>      succeeded earlier.

Same answer as above.

> We just need to make sure that port->cons can be cleared only
> under port->lock. But this would be required even with
> port->nbcon_locked_port.

Agreed. I will add this.

>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -995,3 +996,79 @@ void nbcon_free(struct console *con)
>>  
>>  	con->pbufs = NULL;
>>  }
>> +
>> +static inline bool uart_is_nbcon(struct uart_port *up)
>> +{
>> +	int cookie;
>> +	bool ret;
>> +
>> +	if (!uart_console(up))
>
> This function accesses up->cons. I am not completely sure how
> this value is synchronized, for example:
>
>   + serial_core_add_one_port() sets uport->cons under port->mutex.
>     The function is called uart_add_one_port() in various probe()
>     or init() calls.

I will add port->lock synchronization.

>   + univ8250_console_setup() sets and clears port->cons with
>     no explicit synchronization. The function is called from
>     try_enable_preferred_console() under console_list_lock.

I will add port->lock synchronization.

> IMHO, the assignment is done when the drivers are being initialized.
> It is done when the port might already be used by early consoles.
>
> Especially, according to my understanding, newcon->setup() callbacks
> are responsible for using the same port by early and real console drivers.
>
> I guess that uart_port_lock() API is used by early consoles as well.
> It means that they might access up->cons here while it is being
> manipulated by the proper driver.

Note that port->lock does not synchronize early and normal
consoles. Only the console lock can do that. But you bring up a very
good point with setup(). serial8250_console_setup() can call
probe_baud(), which will write to the hardware.

I think that con->setup() needs to be called under the console lock
(just as already with unblank() and device()).

>> +		return false;
>> +
>> +	cookie = console_srcu_read_lock();
>> +	ret = (console_srcu_read_flags(up->cons) & CON_NBCON);
>> +	console_srcu_read_unlock(cookie);
>
> Hmm, console_srcu_read_flags(con) is called under
> console_srcu_read_lock() to make sure that it does not
> disappear. It makes sense when it is used by registered consoles.
>
> But uart_port_lock() might be called even when the console
> is not registered.
>
> I suggest to remove the SRCU lock here. In this code path,
> it does not guarantee anything and is rather misleading.

Agreed.

> I would use a READ_ONCE(), for example by splitting:
>
> /*
>  * Locklessly reading console->flags provides a consistent
>  * read value because there is at most one CPU modifying
>  * console->flags and that CPU is using only read-modify-write
>  * operations to do so.
>  *
>  * The caller must make sure that @con does not disappear.
>  * It can be done by console_srcu_read_lock() when used
>  * only for registered consoles.
>  */
> static inline short console_read_flags(const struct console *con)
> {
> 	return data_race(READ_ONCE(con->flags));
> }
>
> /* Locklessly reading console->flags for registered consoles */
> static inline short console_srcu_read_flags(const struct console *con)
> {
> 	WARN_ON_ONCE(!console_srcu_read_lock_is_held());
>
> 	console_read_flags();
> }

OK.

>> +void uart_nbcon_acquire(struct uart_port *up)
>> +{
>> +	struct console *con = up->cons;
>> +	struct nbcon_context ctxt;
>
> I would add:
>
> 	lockdep_assert_held(&up->lock);

OK.

>> +
>> +	if (!uart_is_nbcon(up))
>> +		return;
>> +
>> +	WARN_ON_ONCE(up->nbcon_locked_port);
>> +
>> +	do {
>> +		do {
>> +			memset(&ctxt, 0, sizeof(ctxt));
>> +			ctxt.console	= con;
>> +			ctxt.prio	= NBCON_PRIO_NORMAL;
>> +		} while (!nbcon_context_try_acquire(&ctxt));
>> +
>> +	} while (!nbcon_context_enter_unsafe(&ctxt));
>> +
>> +	up->nbcon_locked_port = true;
>> +}
>> +EXPORT_SYMBOL_GPL(uart_nbcon_acquire);
>
> I would prefer to split the uart and nbcon specific code, for example:

Can you explain why? This code will not be used anywhere else.

> /**
>  * nbcon_normal_context_acquire - Acquire a generic context with
>  *	the normal priority for nbcon console
>  * @con:	nbcon console
>  *
>  * Context:	Any context which could not be migrated to another CPU.
>  *
>  * Acquire a generic context with the normal priority for the given console.
>  * Prevent the release by entering the unsafe state.
>  *
>  * Note: The console might still loose the ownership by a hostile takeover.
>  *	 But it can be done only by the final flush in panic() when other
>  *	 CPUs should be stopped and other contexts interrupted.
>  */
> static void nbcon_normal_context_acquire(struct console *con)
> {
> 	struct nbcon_context ctxt;
>
> 	do {
> 		do {
> 			memset(&ctxt, 0, sizeof(ctxt));
> 			ctxt.console	= con;
> 			ctxt.prio	= NBCON_PRIO_NORMAL;
> 		} while (!nbcon_context_try_acquire(&ctxt));
>
> 	} while (!nbcon_context_enter_unsafe(&ctxt));
> }
>
> /**
>  * uart_nbcon_acquire - Acquire nbcon console associated with the gived port.
>  * @up:		uart port
>  *
>  * Context:	Must be called under up->lock to prevent manipulating
>  *		up->cons and migrating to another CPU.
>  *
>  * Note: The console might still loose the ownership by a hostile takeover.
>  *	 But it can be done only by the final flush in panic() when other
>  *	 CPUs should be stopped and other contexts interrupted.
>  */
> void uart_nbcon_acquire(struct uart_port *up)
> {
> 	struct console *con; = up->cons;
>
> 	lockdep_assert_held(&up->lock);
>
> 	/*
> 	 * FIXME: This would require adding WRITE_ONCE()
> 	 * on the write part.
> 	 *
> 	 * Or even better, the value should be modified under
> 	 * port->lock so that simple read would be enough here.
> 	 */
> 	con = data_race(READ_ONCE(up->cons));
>
> 	if (!con)
> 		return;
>
> 	if (!console_read_flags(con) & CON_NBCON)
> 		return;
>
> 	nbcon_normal_context_acquire(con);
> }
>
> Note that I did not use up->nbcon_locked_port as explained above.

Note that it will not work because other ports will share the same
up->cons value even though they are not consoles. up->cons only
specifies which struct console to use _if_ printk chooses that port as a
console. It does _not_ mean that printk has chosen that port.

>> +void uart_nbcon_release(struct uart_port *up)
>> +{
>> +	struct console *con = up->cons;
>> +	struct nbcon_context ctxt = {
>> +		.console	= con,
>> +		.prio		= NBCON_PRIO_NORMAL,
>> +	};
>> +
>
> I would add here as well.
>
> 	lockdep_assert_held(&up->lock);

OK.

> This deserves a comment why we do not complain when this function
> is called for nbcon and it is not locked. Something like:
>
> 	/*
> 	 * Even port used by nbcon console might be seen unlocked
> 	 * when it was locked and the console has been registered
> 	 * at the same time.
> 	 */

I think a more appropriate comment would be:

	/*
	 * This function is called for ports that are not consoles
	 * and for ports that may be consoles but are not nbcon
	 * consoles. In those the cases the nbcon console was
	 * never locked and this context must not unlock.
	 */

>> +	if (!up->nbcon_locked_port)
>> +		return;
>> +
>> +	if (nbcon_context_exit_unsafe(&ctxt))
>> +		nbcon_context_release(&ctxt);
>> +
>> +	up->nbcon_locked_port = false;
>> +}
>
> Again I would better split the nbcon and uart part and create:

I can do the split, but I do not see the reason for it.

John

  reply	other threads:[~2024-03-11 17:09 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-18 18:57 [PATCH printk v2 00/26] wire up write_atomic() printing John Ogness
2024-02-18 18:57 ` [PATCH printk v2 01/26] serial: core: Provide port lock wrappers John Ogness
2024-02-18 18:57 ` [PATCH printk v2 02/26] serial: core: Use " John Ogness
2024-02-18 18:57 ` [PATCH printk v2 03/26] serial: core: fix kernel-doc for uart_port_unlock_irqrestore() John Ogness
2024-02-18 18:57 ` [PATCH printk v2 04/26] printk: Consider nbcon boot consoles on seq init John Ogness
2024-02-20 10:26   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 05/26] printk: Add notation to console_srcu locking John Ogness
2024-02-20 10:29   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 06/26] printk: nbcon: Ensure ownership release on failed emit John Ogness
2024-02-20 15:16   ` Petr Mladek
2024-02-20 16:29     ` John Ogness
2024-02-21 13:23       ` John Ogness
2024-02-18 18:57 ` [PATCH printk v2 07/26] printk: Check printk_deferred_enter()/_exit() usage John Ogness
2024-02-21  9:55   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 08/26] printk: nbcon: Implement processing in port->lock wrapper John Ogness
2024-02-19 12:16   ` Andy Shevchenko
2024-02-19 14:18     ` John Ogness
2024-02-19 14:35       ` Andy Shevchenko
2024-02-19 16:52         ` John Ogness
2024-02-19 17:14           ` Andy Shevchenko
2024-02-23 10:51   ` Petr Mladek
2024-03-11 17:08     ` John Ogness [this message]
2024-03-13  9:49       ` John Ogness
2024-03-22  6:23         ` Tony Lindgren
2024-03-27  9:32           ` John Ogness
2024-03-27 13:12             ` Andy Shevchenko
2024-03-14 11:37       ` John Ogness
2024-03-14 14:26       ` Petr Mladek
2024-03-15 15:04         ` John Ogness
2024-03-18 15:42           ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 09/26] printk: nbcon: Add detailed doc for write_atomic() John Ogness
2024-02-23 13:11   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 10/26] printk: nbcon: Fix kerneldoc for enums John Ogness
2024-02-18 19:10   ` Randy Dunlap
2024-02-23 13:13   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 11/26] printk: Make console_is_usable() available to nbcon John Ogness
2024-02-18 18:57 ` [PATCH printk v2 12/26] printk: Let console_is_usable() handle nbcon John Ogness
2024-02-18 18:57 ` [PATCH printk v2 13/26] printk: Add @flags argument for console_is_usable() John Ogness
2024-02-18 18:57 ` [PATCH printk v2 14/26] printk: nbcon: Provide function to flush using write_atomic() John Ogness
2024-02-23 15:47   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 15/26] printk: Track registered boot consoles John Ogness
2024-02-23 15:57   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 16/26] printk: nbcon: Use nbcon consoles in console_flush_all() John Ogness
2024-02-23 17:15   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 17/26] printk: nbcon: Assign priority based on CPU state John Ogness
2024-02-29 13:50   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 18/26] printk: nbcon: Add unsafe flushing on panic John Ogness
2024-02-29 13:53   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 19/26] printk: Avoid console_lock dance if no legacy or boot consoles John Ogness
2024-02-29 15:19   ` Petr Mladek
2024-02-29 16:19   ` READ_ONCE: was: " Petr Mladek
2024-02-29 22:51     ` Paul E. McKenney
2024-02-18 18:57 ` [PATCH printk v2 20/26] printk: Track nbcon consoles John Ogness
2024-03-01  9:39   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 21/26] printk: Coordinate direct printing in panic John Ogness
2024-03-01 13:05   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 22/26] printk: nbcon: Implement emergency sections John Ogness
2024-03-01 13:28   ` Petr Mladek
2024-03-01 15:49   ` flush was: " Petr Mladek
2024-03-01 16:12     ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 23/26] panic: Mark emergency section in warn John Ogness
2024-03-01 13:57   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 24/26] panic: Mark emergency section in oops John Ogness
2024-03-01 14:55   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 25/26] rcu: Mark emergency section in rcu stalls John Ogness
2024-03-01 15:13   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 26/26] lockdep: Mark emergency section in lockdep splats John Ogness
2024-02-19  4:14   ` Waiman Long
2024-02-19 11:11     ` John Ogness
2024-02-19 15:07       ` Waiman Long
2024-03-01 15:18   ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87le6oy9vg.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=geert+renesas@glider.be \
    --cc=gregkh@linuxfoundation.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jiaqing.zhao@linux.intel.com \
    --cc=jirislaby@kernel.org \
    --cc=justin.chen@broadcom.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    --cc=tony@atomide.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.