linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks
@ 2018-06-05  2:19 Hoeun Ryu
  2018-06-22  4:59 ` Hoeun Ryu
  0 siblings, 1 reply; 2+ messages in thread
From: Hoeun Ryu @ 2018-06-05  2:19 UTC (permalink / raw)
  To: Andrew Morton, Kees Cook, Andi Kleen, Borislav Petkov,
	Thomas Gleixner, Steven Rostedt (VMware)
  Cc: Josh Poimboeuf, Tejun Heo, Vitaly Kuznetsov, Hoeun Ryu, linux-kernel

From: Hoeun Ryu <hoeun.ryu@lge.com>

 Many console device drivers hold the uart_port->lock spinlock with irqs disabled
(using spin_lock_irqsave()) while they write characters to their devices. If
"oops_in_progress" is greater than or equal to 1, however, they only try to take the
lock (using spin_trylock_irqsave()) to avoid deadlocks.

 A deadlock involving this lock and oops_in_progress can still occur. If the
kernel lockup detector calls panic() while the device driver is holding the lock,
panic() eventually calls console_unlock(), which tries to take the same lock and
deadlocks. Here is an example.

 CPU0

 local_irq_save()
 .
 foo()
 bar()
 .						// foo() + bar() take a long time
 printk()
   console_unlock()
     call_console_drivers()			// close to watchdog threshold
       some_slow_console_device_write()		// device driver code
         spin_lock_irqsave(uart->lock)		// acquire uart spin lock
           slow-write()
             watchdog_overflow_callback()	// watchdog expired and calls panic()
               panic()
                 bust_spinlocks(0)		// now, oops_in_progress = 0
                   console_flush_on_panic()
                     console_unlock()
                       call_console_drivers()
                         some_slow_console_device_write()
                           spin_lock_irqsave(uart->lock)
                           ^^^^ deadlock	// spin_trylock_irqsave() would avoid this

 console_flush_on_panic() is called in panic() and eventually tries to take the uart
lock, but the lock is already held by the interrupted context on the same CPU (panic()
runs in NMI context on top of it), so it deadlocks.
 Moving bust_spinlocks(0) after console_flush_on_panic() lets the console device
drivers believe the oops is still in progress, so they call spin_trylock_irqsave()
instead of spin_lock_irqsave() and avoid the deadlock.

 CPU0

 watchdog_overflow_callback()			// watchdog expired and calls panic()
   panic()
     console_flush_on_panic()
       console_unlock()
         call_console_drivers()
           some_slow_console_device_write()
             spin_trylock_irqsave(uart->lock)	// oops_in_progress = 1
             ^^^^ use trylock, no deadlock
     bust_spinlocks(0)				// now, oops_in_progress = 0

Signed-off-by: Hoeun Ryu <hoeun.ryu@lge.com>
---
 v2: fix the explanation of the deadlock in the commit message; no code change.
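
 The ordering argument in the commit message can be tried outside the kernel with a
small userspace model. This is only a sketch under assumptions, not kernel code:
model_trylock(), model_console_write(), model_panic() and enum write_result are
hypothetical names made up for illustration, the lock is a plain bool rather than a
real spinlock, bust_spinlocks() is reduced to setting oops_in_progress to 1 or 0, and
a blocking spin_lock_irqsave() on a held lock is reported as WOULD_DEADLOCK instead
of actually spinning.

```c
/* Userspace model only: none of this is kernel code. */
#include <stdbool.h>

bool port_locked;       /* stands in for uart_port->lock */
int oops_in_progress;   /* same name as the kernel flag, modelled as a plain int */

enum write_result { WROTE_LOCKED, WROTE_UNLOCKED, WOULD_DEADLOCK };

/* Model of a trylock: take the lock only if it is free, never wait. */
bool model_trylock(void)
{
	if (port_locked)
		return false;
	port_locked = true;
	return true;
}

/* Hypothetical console write path following the pattern in the commit message. */
enum write_result model_console_write(void)
{
	bool locked = true;

	if (oops_in_progress)
		locked = model_trylock();	/* best effort, never spins */
	else if (!model_trylock())
		return WOULD_DEADLOCK;		/* a blocking lock would spin forever here */

	/* ... characters would be written to the device here ... */

	if (locked)
		port_locked = false;		/* only release what we acquired */
	return locked ? WROTE_LOCKED : WROTE_UNLOCKED;
}

/* Model of the tail of panic(); flush_first selects the patched ordering. */
enum write_result model_panic(bool flush_first)
{
	enum write_result r;

	oops_in_progress = 1;		/* stands in for bust_spinlocks(1) */
	if (!flush_first)
		oops_in_progress = 0;	/* old order: bust_spinlocks(0) before the flush */
	r = model_console_write();	/* stands in for console_flush_on_panic() */
	oops_in_progress = 0;		/* patched order: bust_spinlocks(0) after the flush */
	return r;
}
```

 With port_locked set to true beforehand (the interrupted writer still holds the
lock), model_panic(false) reports WOULD_DEADLOCK and model_panic(true) reports
WROTE_UNLOCKED, which matches the two call traces in the commit message.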

 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
-	bust_spinlocks(0);
-
 	/*
 	 * We may have ended up stopping the CPU holding the lock (in
 	 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
 	debug_locks_off();
 	console_flush_on_panic();
 
+	bust_spinlocks(0);
+
 	if (!panic_blink)
 		panic_blink = no_blink;
 
-- 
2.1.4


* RE: [PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks
  2018-06-05  2:19 [PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks Hoeun Ryu
@ 2018-06-22  4:59 ` Hoeun Ryu
  0 siblings, 0 replies; 2+ messages in thread
From: Hoeun Ryu @ 2018-06-22  4:59 UTC (permalink / raw)
  To: 'Hoeun Ryu', 'Andrew Morton', 'Kees Cook',
	'Andi Kleen', 'Borislav Petkov',
	'Thomas Gleixner', 'Steven Rostedt (VMware)'
  Cc: sergey.senozhatsky.work, pmladek, 'Josh Poimboeuf',
	'Tejun Heo', 'Vitaly Kuznetsov',
	linux-kernel

+CC
sergey.senozhatsky.work@gmail.com
pmladek@suse.com

Please review this patch.

> -----Original Message-----
> From: Hoeun Ryu [mailto:hoeun.ryu@lge.com.com]
> Sent: Tuesday, June 05, 2018 11:19 AM
> To: Andrew Morton <akpm@linux-foundation.org>; Kees Cook
> <keescook@chromium.org>; Andi Kleen <ak@linux.intel.com>; Borislav Petkov
> <bp@suse.de>; Thomas Gleixner <tglx@linutronix.de>; Steven Rostedt
> (VMware) <rostedt@goodmis.org>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>; Tejun Heo <tj@kernel.org>;
> Vitaly Kuznetsov <vkuznets@redhat.com>; Hoeun Ryu <hoeun.ryu@lge.com>;
> linux-kernel@vger.kernel.org
> Subject: [PATCH v2] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
> 

