All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()
Date: Fri, 12 Apr 2024 11:07:05 +0200	[thread overview]
Message-ID: <Zhj5uQ-JJnlIGUXK@localhost.localdomain> (raw)
In-Reply-To: <ZhfwXsEE2Y8IPPxX@localhost.localdomain>

On Thu 2024-04-11 16:14:58, Petr Mladek wrote:
> On Wed 2024-04-03 00:17:19, John Ogness wrote:
> > Allow nbcon consoles to print messages in the legacy printk()
> > caller context (printing via unlock) by integrating them into
> > console_flush_all(). The write_atomic() callback is used for
> > printing.
> 
> Hmm, this patch tries to flush nbcon console even in context
> with NBCON_PRIO_NORMAL. Do we really want this, please?
> 
> I would expect that it would do so only when the kthread
> is not working.
> 
> > Provide nbcon_legacy_emit_next_record(), which acts as the
> > nbcon variant of console_emit_next_record(). Call this variant
> > within console_flush_all() for nbcon consoles. Since nbcon
> > consoles use their own @nbcon_seq variable to track the next
> > record to print, this also must be appropriately handled.
> 
> I have been a bit confused by all the boolean return values
> and what _exactly_ they mean. IMHO, we should make it more
> clear how it works when it can't acquire the context.
> 
> IMHO, it is is importnat because console_flush_all() interprets
> nbcon_legacy_emit_next_record() return value as @progress even when
> there is no guaranteed progress. We just expect that
> the other context is doing something.
> 
> It feels like it might get stuck forewer in some situatuon.
> It would be good to understand if it is OK or not.
> 
> 
> Later update:
> 
> Hmm, console_flush_all() is called from console_unlock().
> It might be called in atomic context. But the current
> owner might be theoretically scheduled out.
> 
> This is from documentation of nbcon_context_try_acquire()
> 
> /**
>  * nbcon_context_try_acquire - Try to acquire nbcon console
>  * @ctxt:	The context of the caller
>  *
>  * Context:	Any context which could not be migrated to another CPU.
> 
> 
> I can't find any situation where nbcon_context_try_acquire() is
> currently called in normal (schedulable) context. This is probably
> why you did not see any problems with testing.

> I see 3 possible solutions:
> 
>   1. Enforce that nbcon context can be acquired only with preemtion
>      disabled.

We actually have to make sure that preemtion is disabled because
nbcon_owner_matches() is not reliable after a wakeup.

The context might be taken by a higher priority context then
released and then taken by another task on the same CPU as
the original sleeping owner. I mean this:


CPU0				CPU1

 [ task A ]

 nbcon_context_try_acquire()
   # success with NORMAL prio
   # .unsafe == false;  // safe for takeover

 [ schedule: task A -> B ]


				WARN_ON()
				  nbcon_atomic_flush_pending()
				    nbcon_context_try_acquire()
				      # success with EMERGENCY prio
				      # .unsafe == false;  // safe for takeover

				      # flushing
				      nbcon_context_release()


 nbcon_context_try_acquire()
   # success with NORMAL prio [ task B ]
   # .unsafe == false;  // safe for takeover

 [ schedule: task B -> A ]

 nbcon_enter_unsafe()
   nbcon_context_can_proceed()

BUG: nbcon_context_can_proceed() returns "true" because
     the console is owned by a context on CPU0 with
     NBCON_PRIO_NORMAL.

     But it should return "false". The console is owned
     by a context from task B and we do the check
     in a context from task A.


I guess that most of the current code is safe because, for example:

  + __nbcon_atomic_flush_pending() disables interrupts before
    acquiring the context

  + nbcon_driver_acquire() is called under spin_lock in
    the uart_port_*lock() API.

  + Even the nbcon_kthread_func() in the current RT tree
    acquires the context under con->device_lock(). Where
    the device_lock() is a spin_lock in the only supported
    uart serial console.


To be done:

1. We should make this clear:

    + Add either preempt_disable() or cant_sleep() into
      nbcon_context_try_acquire().

    + Replace cant_migrate() with cant_sleep everywhere

    + Fix/update the documentation


2. We should make sure that the context is acquired for each
   emitted record separately at least when using the normal
   priority.

   For example,  __nbcon_atomic_flush_pending() is wrong from
   this POV. It is used also from console_unlock(). It should
   allow to schedule in between the records in this case.


Best Regards,
Petr

PS: I am still shaking my head around this. Sigh, I haven't expected
    such a big "aha moment" at this stage.

  parent reply	other threads:[~2024-04-12  9:07 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02 22:11 [PATCH printk v4 00/27] wire up write_atomic() printing John Ogness
2024-04-02 22:11 ` [PATCH printk v4 01/27] printk: Add notation to console_srcu locking John Ogness
2024-04-02 22:11 ` [PATCH printk v4 02/27] printk: Properly deal with nbcon consoles on seq init John Ogness
2024-04-05 15:37   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 03/27] printk: nbcon: Remove return value for write_atomic() John Ogness
2024-04-08 13:17   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 04/27] printk: Check printk_deferred_enter()/_exit() usage John Ogness
2024-04-02 22:11 ` [PATCH printk v4 05/27] printk: nbcon: Add detailed doc for write_atomic() John Ogness
2024-04-08 15:20   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver John Ogness
2024-04-09  9:20   ` Petr Mladek
2024-04-16 15:01     ` John Ogness
2024-04-17 13:03       ` Petr Mladek
2024-04-17 14:54         ` John Ogness
2024-04-18 10:33           ` Petr Mladek
2024-04-18 12:10             ` John Ogness
2024-04-18 15:03               ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 07/27] printk: nbcon: Use driver synchronization while registering John Ogness
2024-04-09  9:46   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 08/27] serial: core: Provide low-level functions to lock port John Ogness
2024-04-09 12:00   ` Petr Mladek
2024-04-09 13:23   ` Greg Kroah-Hartman
2024-04-02 22:11 ` [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper John Ogness
2024-04-03 11:35   ` John Ogness
2024-04-10 12:35   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 10/27] printk: nbcon: Do not rely on proxy headers John Ogness
2024-04-03 10:33   ` Andy Shevchenko
2024-04-10 13:06   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 11/27] printk: nbcon: Fix kerneldoc for enums John Ogness
2024-04-02 22:11 ` [PATCH printk v4 12/27] printk: Make console_is_usable() available to nbcon John Ogness
2024-04-02 22:11 ` [PATCH printk v4 13/27] printk: Let console_is_usable() handle nbcon John Ogness
2024-04-02 22:11 ` [PATCH printk v4 14/27] printk: Add @flags argument for console_is_usable() John Ogness
2024-04-02 22:11 ` [PATCH printk v4 15/27] printk: nbcon: Provide function to flush using write_atomic() John Ogness
2024-04-10 14:56   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 16/27] printk: Track registered boot consoles John Ogness
2024-04-02 22:11 ` [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all() John Ogness
2024-04-11 14:14   ` Petr Mladek
2024-04-11 15:32     ` Petr Mladek
2024-04-17 23:05       ` John Ogness
2024-04-18 12:47         ` Petr Mladek
2024-04-18 21:45           ` John Ogness
2024-04-19  9:55             ` Petr Mladek
2024-04-12  9:07     ` Petr Mladek [this message]
2024-04-17 21:59     ` John Ogness
2024-04-02 22:11 ` [PATCH printk v4 18/27] printk: nbcon: Assign priority based on CPU state John Ogness
2024-04-02 22:11 ` [PATCH printk v4 19/27] printk: nbcon: Add unsafe flushing on panic John Ogness
2024-04-11 14:18   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 20/27] printk: Avoid console_lock dance if no legacy or boot consoles John Ogness
2024-04-03 10:01   ` John Ogness
2024-04-12 14:21     ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 21/27] printk: Track nbcon consoles John Ogness
2024-04-12 14:22   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 22/27] printk: Coordinate direct printing in panic John Ogness
2024-04-12 14:39   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 23/27] printk: nbcon: Implement emergency sections John Ogness
2024-04-12 15:27   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 24/27] panic: Mark emergency section in warn John Ogness
2024-04-15 13:16   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 25/27] panic: Mark emergency section in oops John Ogness
2024-04-15 13:22   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 26/27] rcu: Mark emergency section in rcu stalls John Ogness
2024-04-15 13:32   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats John Ogness
2024-04-16  9:51   ` Petr Mladek
2024-04-16 11:17   ` Peter Zijlstra
2024-04-16 12:53     ` John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zhj5uQ-JJnlIGUXK@localhost.localdomain \
    --to=pmladek@suse.com \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.