rcu.vger.kernel.org archive mirror
* [PATCH printk v5 00/30] wire up write_atomic() printing
@ 2024-05-02 21:38 John Ogness
  2024-05-02 21:38 ` [PATCH printk v5 29/30] rcu: Mark emergency sections in rcu stalls John Ogness
  0 siblings, 1 reply; 3+ messages in thread
From: John Ogness @ 2024-05-02 21:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Miguel Ojeda, Greg Kroah-Hartman,
	Jiri Slaby, linux-serial, Russell King, Tony Lindgren,
	Andy Shevchenko, Ilpo Järvinen, Uwe Kleine-König,
	Théo Lebrun, Linus Walleij, Hongyu Xie, Christophe JAILLET,
	Arnd Bergmann, Lino Sanfilippo, Andrew Morton, Lukas Wunner,
	Uros Bizjak, Kefeng Wang, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, rcu, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Waiman Long

Hi,

This is v5 of a series to wire up the nbcon consoles so that
they actually perform printing using their write_atomic()
callback. v4 is here [0]. For information about the motivation
of the atomic consoles, please read the cover letter of v1 [1].

The main focus of this series:

- For nbcon consoles, always call write_atomic() directly from
  printk() caller context for the panic CPU.

- For nbcon consoles, call write_atomic() when unlocking the
  console lock.

- Only perform the console lock/unlock dance if legacy or boot
  consoles are registered.

- For legacy consoles, if nbcon consoles are registered, do not
  attempt to print from printk() caller context for the panic
  CPU until nbcon consoles have had a chance to print the most
  significant messages.

- Mark emergency sections. In these sections printk() calls
  will only store the messages. Upon exiting the emergency
  section, nbcon consoles are flushed directly and legacy
  console flushing is triggered via irq_work.

This series does _not_ include threaded printing or nbcon
drivers. Those features will be added in separate follow-up
series.

Note: With this series, a system with _only_ nbcon consoles
      registered will not perform console printing unless the
      console lock or the nbcon port lock is used, or on panic.
      This is on purpose. When nbcon kthreads are introduced,
      they will fill the gaps.

The changes since v4:

- In serial_core_add_one_port(), initialize the port lock
  before setting @cons (since uart_port_set_cons() uses the
  port lock).

- For unregister_console_locked(), take the con->device_lock
  when removing the console from the console list.

- Remove the struct nbcon_drvdata and instead rely on the port
  lock being taken when adding/removing uart nbcon console list
  items.

- Move the nbcon context for drivers into the struct console
  (formerly in struct nbcon_drvdata).

- Simplify the port lock wrapper implementation since we can
  rely on the registration state not changing while the port
  lock is held.

- Change nbcon_driver_acquire() to nbcon_driver_try_acquire()
  in order to support try-semantics for the port lock wrappers.
  Also update its kerneldoc to clarify its usage.

- Implement true try-lock semantics for the try-variants of the
  port lock wrappers.

- Remove the retry-loop in __nbcon_atomic_flush_pending() since
  there is never a need to retry. If a context takes over
  ownership it also takes over responsibility to print the
  records.

- Invert the return value of nbcon_atomic_emit_one() and
  add kerneldoc and comments about the meaning of the return
  value.

- For nbcon_legacy_emit_next_record() use the same return
  value and kerneldoc explanation as nbcon_atomic_emit_one().

- Invert the meaning of the return value of
  __nbcon_atomic_flush_pending_con() and use various errno
  values to report the reason for failure.

- Add nbcon_atomic_flush_pending_con() to flush all records if
  records were added while flushing. (Once printer threads are
  available, we can rely on them to print the remaining
  records.)

- For nbcon_driver_release(), flush all records if records were
  added while holding the port lock. (Once printer threads are
  available, we can rely on them to print the remaining
  records.)

- Add nbcon_cpu_emergency_flush() to allow periodic flushing
  if many records have been stored in emergency context. It
  also attempts legacy flushing when safe.

- Change lockdep_print_held_locks() and debug_show_all_locks()
  to rely on their callers marking emergency sections because
  these functions can be called in non-emergency situations.
  Note that debug_show_all_locks() still calls
  nbcon_cpu_emergency_flush() in case it is used in an
  emergency.

- Rename console_init_seq() to get_init_console_seq() and have
  it return the new seq rather than setting @newcon->seq.

- Change nbcon_init() to take the initial sequence number as
  an argument.

- For __pr_flush(), move the barrier() to ensure no
  intermediate use of the printing_via_unlock() macro.

- For nbcon_cpu_emergency_exit(), update the comments and
  WARN_ON_ONCE position as suggested.

- Move the printing_via_unlock() macro into internal.h so that
  it can be used by nbcon.c as well (in
  nbcon_cpu_emergency_flush()).

- Update the kerneldoc for nbcon callbacks write_atomic() and
  device_lock().

- Add clarification in console_srcu_read_flags() kerneldoc.

- Change kerneldoc nbcon_context_try_acquire() context to
  mention device_lock() or local_irq_save() requirement.

John Ogness

[0] https://lore.kernel.org/lkml/20240402221129.2613843-1-john.ogness@linutronix.de
[1] https://lore.kernel.org/lkml/20230302195618.156940-1-john.ogness@linutronix.de

John Ogness (25):
  printk: Add notation to console_srcu locking
  printk: nbcon: Remove return value for write_atomic()
  printk: nbcon: Add detailed doc for write_atomic()
  printk: nbcon: Add callbacks to synchronize with driver
  printk: nbcon: Use driver synchronization while (un)registering
  serial: core: Provide low-level functions to lock port
  serial: core: Introduce wrapper to set @uart_port->cons
  console: Improve console_srcu_read_flags() comments
  nbcon: Provide functions for drivers to acquire console for
    non-printing
  serial: core: Implement processing in port->lock wrapper
  printk: nbcon: Do not rely on proxy headers
  printk: nbcon: Fix kerneldoc for enums
  printk: Make console_is_usable() available to nbcon
  printk: Let console_is_usable() handle nbcon
  printk: Add @flags argument for console_is_usable()
  printk: nbcon: Add helper to assign priority based on CPU state
  printk: Track registered boot consoles
  printk: nbcon: Use nbcon consoles in console_flush_all()
  printk: nbcon: Add unsafe flushing on panic
  printk: Avoid console_lock dance if no legacy or boot consoles
  printk: Track nbcon consoles
  printk: Coordinate direct printing in panic
  panic: Mark emergency section in oops
  rcu: Mark emergency sections in rcu stalls
  lockdep: Mark emergency sections in lockdep splats

Petr Mladek (1):
  printk: Properly deal with nbcon consoles on seq init

Sebastian Andrzej Siewior (1):
  printk: Check printk_deferred_enter()/_exit() usage

Thomas Gleixner (3):
  printk: nbcon: Provide function to flush using write_atomic()
  printk: nbcon: Implement emergency sections
  panic: Mark emergency section in warn

 drivers/tty/serial/8250/8250_core.c |   6 +-
 drivers/tty/serial/amba-pl011.c     |   2 +-
 drivers/tty/serial/serial_core.c    |  16 +-
 include/linux/console.h             | 116 ++++++-
 include/linux/printk.h              |  33 +-
 include/linux/serial_core.h         | 117 ++++++-
 kernel/locking/lockdep.c            |  84 ++++-
 kernel/panic.c                      |   9 +
 kernel/printk/internal.h            |  71 +++-
 kernel/printk/nbcon.c               | 488 +++++++++++++++++++++++++++-
 kernel/printk/printk.c              | 303 +++++++++++++----
 kernel/printk/printk_ringbuffer.h   |   2 +
 kernel/printk/printk_safe.c         |  12 +
 kernel/rcu/tree_exp.h               |   9 +
 kernel/rcu/tree_stall.h             |  11 +
 15 files changed, 1148 insertions(+), 131 deletions(-)


base-commit: a2b4cab9da7746c42f87c13721d305baf0085a20
-- 
2.39.2



* [PATCH printk v5 29/30] rcu: Mark emergency sections in rcu stalls
  2024-05-02 21:38 [PATCH printk v5 00/30] wire up write_atomic() printing John Ogness
@ 2024-05-02 21:38 ` John Ogness
  2024-05-21 14:03   ` Petr Mladek
  0 siblings, 1 reply; 3+ messages in thread
From: John Ogness @ 2024-05-02 21:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu

Mark emergency sections wherever multiple lines of
rcu stall information are generated. In an emergency
section the CPU will not perform console output for the
printk() calls. Instead, a flushing of the console
output is triggered when exiting the emergency section.
This allows the full message block to be stored as
quickly as possible in the ringbuffer.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/rcu/tree_exp.h   |  9 +++++++++
 kernel/rcu/tree_stall.h | 11 +++++++++++
 2 files changed, 20 insertions(+)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6b83537480b1..bc1d8733c08f 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -7,6 +7,7 @@
  * Authors: Paul E. McKenney <paulmck@linux.ibm.com>
  */
 
+#include <linux/console.h>
 #include <linux/lockdep.h>
 
 static void rcu_exp_handler(void *unused);
@@ -571,6 +572,9 @@ static void synchronize_rcu_expedited_wait(void)
 			return;
 		if (rcu_stall_is_suppressed())
 			continue;
+
+		nbcon_cpu_emergency_enter();
+
 		j = jiffies;
 		rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_EXP, (void *)(j - jiffies_start));
 		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
@@ -612,6 +616,7 @@ static void synchronize_rcu_expedited_wait(void)
 			}
 			pr_cont("\n");
 		}
+		nbcon_cpu_emergency_flush();
 		rcu_for_each_leaf_node(rnp) {
 			for_each_leaf_node_possible_cpu(rnp, cpu) {
 				mask = leaf_node_cpu_bit(rnp, cpu);
@@ -624,6 +629,9 @@ static void synchronize_rcu_expedited_wait(void)
 			rcu_exp_print_detail_task_stall_rnp(rnp);
 		}
 		jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
+
+		nbcon_cpu_emergency_exit();
+
 		panic_on_rcu_stall();
 	}
 }
@@ -792,6 +800,7 @@ static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp)
 		 */
 		touch_nmi_watchdog();
 		sched_show_task(t);
+		nbcon_cpu_emergency_flush();
 	}
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5d666428546b..1ca0826545c1 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,7 @@
  * Author: Paul E. McKenney <paulmck@linux.ibm.com>
  */
 
+#include <linux/console.h>
 #include <linux/kvm_para.h>
 #include <linux/rcu_notifier.h>
 
@@ -260,6 +261,7 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
 		 */
 		touch_nmi_watchdog();
 		sched_show_task(t);
+		nbcon_cpu_emergency_flush();
 	}
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
@@ -522,6 +524,7 @@ static void print_cpu_stall_info(int cpu)
 	       falsepositive ? " (false positive?)" : "");
 
 	print_cpu_stat_info(cpu);
+	nbcon_cpu_emergency_flush();
 }
 
 /* Complain about starvation of grace-period kthread.  */
@@ -604,6 +607,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	if (rcu_stall_is_suppressed())
 		return;
 
+	nbcon_cpu_emergency_enter();
+
 	/*
 	 * OK, time to rat on our buddy...
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -655,6 +660,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	rcu_check_gp_kthread_expired_fqs_timer();
 	rcu_check_gp_kthread_starvation();
 
+	nbcon_cpu_emergency_exit();
+
 	panic_on_rcu_stall();
 
 	rcu_force_quiescent_state();  /* Kick them all. */
@@ -675,6 +682,8 @@ static void print_cpu_stall(unsigned long gps)
 	if (rcu_stall_is_suppressed())
 		return;
 
+	nbcon_cpu_emergency_enter();
+
 	/*
 	 * OK, time to rat on ourselves...
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -703,6 +712,8 @@ static void print_cpu_stall(unsigned long gps)
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
+	nbcon_cpu_emergency_exit();
+
 	panic_on_rcu_stall();
 
 	/*
-- 
2.39.2



* Re: [PATCH printk v5 29/30] rcu: Mark emergency sections in rcu stalls
  2024-05-02 21:38 ` [PATCH printk v5 29/30] rcu: Mark emergency sections in rcu stalls John Ogness
@ 2024-05-21 14:03   ` Petr Mladek
  0 siblings, 0 replies; 3+ messages in thread
From: Petr Mladek @ 2024-05-21 14:03 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu

On Thu 2024-05-02 23:44:38, John Ogness wrote:
> Mark emergency sections wherever multiple lines of
> rcu stall information are generated. In an emergency
> section the CPU will not perform console output for the
> printk() calls. Instead, a flushing of the console
> output is triggered when exiting the emergency section.
> This allows the full message block to be stored as
> quickly as possible in the ringbuffer.
>
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -612,6 +616,7 @@ static void synchronize_rcu_expedited_wait(void)
>  			}
>  			pr_cont("\n");
>  		}
> +		nbcon_cpu_emergency_flush();

It would make more sense to do the flush inside the cycle after each
dump_cpu_task(). Something like:

		rcu_for_each_leaf_node(rnp) {
			for_each_leaf_node_possible_cpu(rnp, cpu) {
				mask = leaf_node_cpu_bit(rnp, cpu);
				if (!(READ_ONCE(rnp->expmask) & mask))
					continue;
				preempt_disable(); // For smp_processor_id() in dump_cpu_task().
				dump_cpu_task(cpu);
				preempt_enable();
+				nbcon_cpu_emergency_flush();
			}
			rcu_exp_print_detail_task_stall_rnp(rnp);
		}


Or maybe it is limited to only a few CPUs by rcu_for_each_leaf_node(rnp)?


>  		rcu_for_each_leaf_node(rnp) {
>  			for_each_leaf_node_possible_cpu(rnp, cpu) {
>  				mask = leaf_node_cpu_bit(rnp, cpu);

Otherwise, it looks good to me.

Best Regards,
Petr
