linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
@ 2018-10-29 18:07 Douglas Anderson
  2018-10-29 18:07 ` [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq Douglas Anderson
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, nm, linux-mips,
	dalias, catalin.marinas, vigneshr, linux-aspeed, linux-sh,
	peterz, will.deacon, mhocko, paulus, hpa, sparclinux, marex, sfr,
	ysato, linux-hexagon, x86, linux, pombredanne, tony, mingo, joel,
	linux-serial, rolf.evers.fischer, jhogan, asierra,
	linux-snps-arc, dan.carpenter, ying.huang, riel, frederic,
	jslaby, paul.burton, rppt, bp, luto, andriy.shevchenko,
	linux-arm-kernel, christophe.leroy, andrew, linux-kernel, ralf,
	rkuo, Jisheng.Zhang, vgupta, benh, jk, mpe, akpm, linuxppc-dev,
	davem, kstewart

I started out this series trying to make sysrq work over the serial
console on qcom_geni_serial, then fell into a rat's nest.

To solve the deadlock I faced when enabling sysrq I tried to borrow
code from '8250_port.c' which avoided grabbing the port lock in
console_write().  ...but since these days I try to run with lockdep on
all the time, I found it caused an annoying lockdep splat (which I
also reproduced on my rk3399 board).  ...so I instead changed my
qcom_geni_serial solution to borrow code from 'msm_serial.c'

I wasn't super happy with the solution in 'msm_serial.c' though.  I
don't like releasing the spinlock there.  Not only is it ugly but it
means we are unlocking / re-locking _all the time_ even though sysrq
characters are rare.  ...so I came up with what I think is a better
solution and then implemented it for qcom_geni_serial.

Since I had a good way to test 8250-based UARTs, I also fixed that
driver to use my new method.  When doing so, I ran into a missing
msm_serial.c at all, so I didn't switch that (or all other serial
drivers for that matter) to the new method.

After fixing all the above issues, I found the next lockdep splat in
kgdb and I think I've worked around it in a good-enough way, but I'm
much less confident about this.  Hopefully folks can take a look at
it.

In general, patches earlier in this series should be "less
controversial" and hopefully can land even if people don't like
patches later in the series.  ;-)

Looking back, this is pretty much two series squashed that could be
treated indepdently.  The first is a serial series and the second is a
kgdb series.

For all serial patches I'd expect them to go through the tty tree once
they've been reviewed.

If folks are OK w/ the 'smp' patch it probably should go in some core
kernel tree.  The kgdb patch won't work without it, though, so to land
that we'd need coordination between the folks landing that and the
folks landing the 'smp' patch.


Douglas Anderson (7):
  serial: qcom_geni_serial: Finish supporting sysrq
  serial: core: Allow processing sysrq at port unlock time
  serial: qcom_geni_serial: Process sysrq at port unlock time
  serial: core: Include console.h from serial_core.h
  serial: 8250: Process sysrq at port unlock time
  smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  kgdb: Remove irq flags and local_irq_enable/disable from roundup

 arch/arc/kernel/kgdb.c                      |  4 +--
 arch/arm/kernel/kgdb.c                      |  4 +--
 arch/arm64/kernel/kgdb.c                    |  4 +--
 arch/hexagon/kernel/kgdb.c                  | 11 ++----
 arch/mips/kernel/kgdb.c                     |  4 +--
 arch/powerpc/kernel/kgdb.c                  |  2 +-
 arch/sh/kernel/kgdb.c                       |  4 +--
 arch/sparc/kernel/smp_64.c                  |  2 +-
 arch/x86/kernel/kgdb.c                      |  9 ++---
 drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
 drivers/tty/serial/8250/8250_fsl.c          |  6 +++-
 drivers/tty/serial/8250/8250_omap.c         |  6 +++-
 drivers/tty/serial/8250/8250_port.c         |  8 ++---
 drivers/tty/serial/qcom_geni_serial.c       | 10 ++++--
 include/linux/kgdb.h                        |  9 ++---
 include/linux/serial_core.h                 | 38 ++++++++++++++++++++-
 kernel/debug/debug_core.c                   |  2 +-
 kernel/smp.c                                |  4 ++-
 18 files changed, 80 insertions(+), 53 deletions(-)

-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-11-02 16:47   ` Stephen Boyd
  2018-10-29 18:07 ` [PATCH 2/7] serial: core: Allow processing sysrq at port unlock time Douglas Anderson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-kernel,
	linux-serial, jslaby

The geni serial driver already had some sysrq code in it, but since
SUPPORT_SYSRQ wasn't defined the code didn't do anything useful.
Let's make it useful by adding that define using the same formula
found in other serial drivers.

In order to prevent deadlock, we'll take a page from the
'msm_serial.c' where the spinlock is released around
uart_handle_sysrq_char().  This seemed better than copying from
'8250_port.c' where we skip locking in the console_write function
since the '8250_port.c' method can cause lockdep warnings when
dropping into kgdb.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/tty/serial/qcom_geni_serial.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index d3b5261ee80a..3c8e0202da8b 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1,6 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2017-2018, The Linux foundation. All rights reserved.
 
+#if defined(CONFIG_SERIAL_QCOM_GENI_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+# define SUPPORT_SYSRQ
+#endif
+
 #include <linux/clk.h>
 #include <linux/console.h>
 #include <linux/io.h>
@@ -495,7 +499,10 @@ static int handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
 					continue;
 			}
 
+			spin_unlock(&uport->lock);
 			sysrq = uart_handle_sysrq_char(uport, buf[c]);
+			spin_lock(&uport->lock);
+
 			if (!sysrq)
 				tty_insert_flip_char(tport, buf[c], TTY_NORMAL);
 		}
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] serial: core: Allow processing sysrq at port unlock time
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
  2018-10-29 18:07 ` [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-30  5:31   ` Doug Anderson
  2018-10-29 18:07 ` [PATCH 3/7] serial: qcom_geni_serial: Process " Douglas Anderson
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-kernel, jslaby

Right now serial drivers process sysrq keys deep in their character
receiving code.  This means that they've already grabbed their
port->lock spinlock.  This can end up getting in the way if we've go
to do serial stuff (especially kgdb) in response to the sysrq.

Serial drivers have various hacks in them to handle this.  Looking at
'8250_port.c' you can see that the console_write() skips locking if
we're in the sysrq handler.  Looking at 'msm_serial.c' you can see
that the port lock is dropped around uart_handle_sysrq_char().

It turns out that these hacks aren't exactly perfect.  If you have
lockdep turned on and use something like the 8250_port hack you'll get
a splat that looks like:

  WARNING: possible circular locking dependency detected
  [...] is trying to acquire lock:
  ... (console_owner){-.-.}, at: console_unlock+0x2e0/0x5e4

  but task is already holding lock:
  ... (&port_lock_key){-.-.}, at: serial8250_handle_irq+0x30/0xe4

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&port_lock_key){-.-.}:
         _raw_spin_lock_irqsave+0x58/0x70
         serial8250_console_write+0xa8/0x250
         univ8250_console_write+0x40/0x4c
         console_unlock+0x528/0x5e4
         register_console+0x2c4/0x3b0
         uart_add_one_port+0x350/0x478
         serial8250_register_8250_port+0x350/0x3a8
         dw8250_probe+0x67c/0x754
         platform_drv_probe+0x58/0xa4
         really_probe+0x150/0x294
         driver_probe_device+0xac/0xe8
         __driver_attach+0x98/0xd0
         bus_for_each_dev+0x84/0xc8
         driver_attach+0x2c/0x34
         bus_add_driver+0xf0/0x1ec
         driver_register+0xb4/0x100
         __platform_driver_register+0x60/0x6c
         dw8250_platform_driver_init+0x20/0x28
	 ...

  -> #0 (console_owner){-.-.}:
         lock_acquire+0x1e8/0x214
         console_unlock+0x35c/0x5e4
         vprintk_emit+0x230/0x274
         vprintk_default+0x7c/0x84
         vprintk_func+0x190/0x1bc
         printk+0x80/0xa0
         __handle_sysrq+0x104/0x21c
         handle_sysrq+0x30/0x3c
         serial8250_read_char+0x15c/0x18c
         serial8250_rx_chars+0x34/0x74
         serial8250_handle_irq+0x9c/0xe4
         dw8250_handle_irq+0x98/0xcc
         serial8250_interrupt+0x50/0xe8
         ...

  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&port_lock_key);
                                 lock(console_owner);
                                 lock(&port_lock_key);
    lock(console_owner);

   *** DEADLOCK ***

The hack used in 'msm_serial.c' doesn't cause the above splats but it
seems a bit ugly to unlock / lock our spinlock deep in our irq
handler.

It seems like we could defer processing the sysrq until the end of the
interrupt handler right after we've unlocked the port.  With this
scheme if a whole batch of sysrq characters comes in one irq then we
won't handle them all, but that seems like it should be a fine
compromise.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/serial_core.h | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 047fa67d039b..78de9d929762 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -175,6 +175,7 @@ struct uart_port {
 	struct console		*cons;			/* struct console, if any */
 #if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
 	unsigned long		sysrq;			/* sysrq timeout */
+	unsigned int		sysrq_ch;		/* char for sysrq */
 #endif
 
 	/* flags must be updated while holding port mutex */
@@ -485,8 +486,42 @@ uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
 	}
 	return 0;
 }
+static inline int
+uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+	if (port->sysrq) {
+		if (ch && time_before(jiffies, port->sysrq)) {
+			port->sysrq_ch = ch;
+			port->sysrq = 0;
+			return 1;
+		}
+		port->sysrq = 0;
+	}
+	return 0;
+}
+static inline void
+uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
+{
+	int sysrq_ch;
+
+	sysrq_ch = port->sysrq_ch;
+	port->sysrq_ch = 0;
+
+	spin_unlock_irqrestore(&port->lock, irqflags);
+
+	if (sysrq_ch)
+		handle_sysrq(sysrq_ch);
+}
 #else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
+static inline int
+uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
+static inline void
+uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
+{
+	spin_unlock_irqrestore(&port->lock, irqflags);
+}
 #endif
 
 /*
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] serial: qcom_geni_serial: Process sysrq at port unlock time
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
  2018-10-29 18:07 ` [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq Douglas Anderson
  2018-10-29 18:07 ` [PATCH 2/7] serial: core: Allow processing sysrq at port unlock time Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-29 18:07 ` [PATCH 4/7] serial: core: Include console.h from serial_core.h Douglas Anderson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-kernel,
	linux-serial, jslaby

Let's take advantage of the new ("serial: core: Allow processing sysrq
at port unlock time") to handle sysrqs more cleanly.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/tty/serial/qcom_geni_serial.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 3c8e0202da8b..20edce1e222e 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -499,9 +499,7 @@ static int handle_rx_console(struct uart_port *uport, u32 bytes, bool drop)
 					continue;
 			}
 
-			spin_unlock(&uport->lock);
-			sysrq = uart_handle_sysrq_char(uport, buf[c]);
-			spin_lock(&uport->lock);
+			sysrq = uart_prepare_sysrq_char(uport, buf[c]);
 
 			if (!sysrq)
 				tty_insert_flip_char(tport, buf[c], TTY_NORMAL);
@@ -811,7 +809,8 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
 		qcom_geni_serial_handle_rx(uport, drop_rx);
 
 out_unlock:
-	spin_unlock_irqrestore(&uport->lock, flags);
+	uart_unlock_and_check_sysrq(uport, flags);
+
 	return IRQ_HANDLED;
 }
 
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] serial: core: Include console.h from serial_core.h
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (2 preceding siblings ...)
  2018-10-29 18:07 ` [PATCH 3/7] serial: qcom_geni_serial: Process " Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-29 18:07 ` [PATCH 5/7] serial: 8250: Process sysrq at port unlock time Douglas Anderson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-kernel, jslaby

In the static inline function uart_handle_break() in serial_core.h we
dereference port->cons.  That gives an error unless console.h is also
included.

This error hasn't shown up till now because everyone who has defined
SUPPORT_SYSRQ has also included console.h, but it's a bit ugly to make
this requirement.  Let's make the include explicit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/serial_core.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 78de9d929762..5fe2b037e833 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -22,6 +22,7 @@
 
 #include <linux/bitops.h>
 #include <linux/compiler.h>
+#include <linux/console.h>
 #include <linux/interrupt.h>
 #include <linux/circ_buf.h>
 #include <linux/spinlock.h>
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] serial: 8250: Process sysrq at port unlock time
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (3 preceding siblings ...)
  2018-10-29 18:07 ` [PATCH 4/7] serial: core: Include console.h from serial_core.h Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, nm, marex,
	vigneshr, linux-aspeed, jk, andrew, rolf.evers.fischer,
	linux-kernel, tony, Jisheng.Zhang, joel, linux-serial, jslaby,
	asierra, andriy.shevchenko, dan.carpenter, linux-arm-kernel

Let's take advantage of the new ("serial: core: Allow processing sysrq
at port unlock time") to handle sysrqs more cleanly.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I don't have a great way to test all 8250 variants, but I've at least
tested rk3288 / rk3399 and they seem to work.  Hopefully I got the
aspeed_vuart / fsl / omap variants right (I only compile tested
those).

 drivers/tty/serial/8250/8250_aspeed_vuart.c | 6 +++++-
 drivers/tty/serial/8250/8250_fsl.c          | 6 +++++-
 drivers/tty/serial/8250/8250_omap.c         | 6 +++++-
 drivers/tty/serial/8250/8250_port.c         | 8 +++-----
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_aspeed_vuart.c b/drivers/tty/serial/8250/8250_aspeed_vuart.c
index 435bec40dee6..0438d9a905ce 100644
--- a/drivers/tty/serial/8250/8250_aspeed_vuart.c
+++ b/drivers/tty/serial/8250/8250_aspeed_vuart.c
@@ -5,6 +5,10 @@
  *    Copyright (C) 2016 Jeremy Kerr <jk@ozlabs.org>, IBM Corp.
  *    Copyright (C) 2006 Arnd Bergmann <arnd@arndb.de>, IBM Corp.
  */
+#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#define SUPPORT_SYSRQ
+#endif
+
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
@@ -293,7 +297,7 @@ static int aspeed_vuart_handle_irq(struct uart_port *port)
 	if (lsr & UART_LSR_THRE)
 		serial8250_tx_chars(up);
 
-	spin_unlock_irqrestore(&port->lock, flags);
+	uart_unlock_and_check_sysrq(port, flags);
 
 	return 1;
 }
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index 6640a4c7ddd1..ff3dcaea5d93 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -1,4 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
+#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#define SUPPORT_SYSRQ
+#endif
+
 #include <linux/serial_reg.h>
 #include <linux/serial_8250.h>
 
@@ -54,7 +58,7 @@ int fsl8250_handle_irq(struct uart_port *port)
 		serial8250_tx_chars(up);
 
 	up->lsr_saved_flags = orig_lsr;
-	spin_unlock_irqrestore(&up->port.lock, flags);
+	uart_unlock_and_check_sysrq(&up->port, flags);
 	return 1;
 }
 EXPORT_SYMBOL_GPL(fsl8250_handle_irq);
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index a019286f8bb6..ad7ba7d0f28d 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -8,6 +8,10 @@
  *
  */
 
+#if defined(CONFIG_SERIAL_8250_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
+#define SUPPORT_SYSRQ
+#endif
+
 #include <linux/device.h>
 #include <linux/io.h>
 #include <linux/module.h>
@@ -1085,7 +1089,7 @@ static int omap_8250_dma_handle_irq(struct uart_port *port)
 		}
 	}
 
-	spin_unlock_irqrestore(&port->lock, flags);
+	uart_unlock_and_check_sysrq(port, flags);
 	serial8250_rpm_put(up);
 	return 1;
 }
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index f776b3eafb96..c39482b96111 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1755,7 +1755,7 @@ void serial8250_read_char(struct uart_8250_port *up, unsigned char lsr)
 		else if (lsr & UART_LSR_FE)
 			flag = TTY_FRAME;
 	}
-	if (uart_handle_sysrq_char(port, ch))
+	if (uart_prepare_sysrq_char(port, ch))
 		return;
 
 	uart_insert_char(port, lsr, UART_LSR_OE, ch, flag);
@@ -1897,7 +1897,7 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 	if ((!up->dma || up->dma->tx_err) && (status & UART_LSR_THRE))
 		serial8250_tx_chars(up);
 
-	spin_unlock_irqrestore(&port->lock, flags);
+	uart_unlock_and_check_sysrq(port, flags);
 	return 1;
 }
 EXPORT_SYMBOL_GPL(serial8250_handle_irq);
@@ -3258,9 +3258,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
 
 	serial8250_rpm_get(up);
 
-	if (port->sysrq)
-		locked = 0;
-	else if (oops_in_progress)
+	if (oops_in_progress)
 		locked = spin_trylock_irqsave(&port->lock, flags);
 	else
 		spin_lock_irqsave(&port->lock, flags);
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (4 preceding siblings ...)
  2018-10-29 18:07 ` [PATCH 5/7] serial: 8250: Process sysrq at port unlock time Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-30  8:25   ` Peter Zijlstra
  2018-10-30  9:41   ` Daniel Thompson
  2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-mips,
	linux-sh, peterz, linux-hexagon, frederic, riel, linux-kernel,
	luto, sparclinux, linux-snps-arc, linuxppc-dev, linux-arm-kernel

In kgdb_roundup_cpus() we've got code that looks like:
  local_irq_enable();
  smp_call_function(kgdb_call_nmi_hook, NULL, 0);
  local_irq_disable();

In certain cases when we drop into kgdb (like with sysrq-g on a serial
console) we'll get a big yell that looks like:

  sysrq: SysRq : DEBUG
  ------------[ cut here ]------------
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---

Let's add kgdb to the list of reasons not to warn in
smp_call_function_many().  That will allow us (in a future patch) to
stop calling local_irq_enable() which will get rid of the original
splat.

NOTE: with this change comes the obvious question: will we start
deadlocking more often now when we drop into the debugger.  I can't
say that for sure one way or the other, but the fact that we do the
same logic for "oops_in_progress" makes me feel like it shouldn't
happen too often.  Also note that the old logic of turning on
interrupts temporarily wasn't exactly safe since (I presume) that
could have violated spin_lock_irqsave() semantics and ended up with a
deadlock of its own.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/smp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 163c451af42e..bb581e58c8dc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -19,6 +19,7 @@
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 #include <linux/hypervisor.h>
+#include <linux/kgdb.h>
 
 #include "smpboot.h"
 
@@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	 * can't happen.
 	 */
 	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
-		     && !oops_in_progress && !early_boot_irqs_disabled);
+		     && !oops_in_progress && !early_boot_irqs_disabled
+		     && !in_dbg_master());
 
 	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
 	cpu = cpumask_first_and(mask, cpu_online_mask);
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (5 preceding siblings ...)
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
@ 2018-10-29 18:07 ` Douglas Anderson
  2018-10-30 11:46   ` Daniel Thompson
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
  2018-10-30 12:36 ` Andy Shevchenko
  8 siblings, 1 reply; 18+ messages in thread
From: Douglas Anderson @ 2018-10-29 18:07 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, kstewart,
	linux-mips, dalias, linux-sh, benh, will.deacon, linux-kernel,
	mhocko, paulus, hpa, sparclinux, linux-hexagon, sfr, ysato, mpe,
	x86, linux, mingo, catalin.marinas, jhogan, linux-snps-arc,
	ying.huang, rppt, bp, linux-arm-kernel, christophe.leroy,
	pombredanne, ralf, rkuo, paul.burton, vgupta, akpm, linuxppc-dev,
	davem

The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Speaking of calling local_irq_enable(), it seems like a bad idea and
it caused a nice splat on my system with lockdep turned on.
Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)

See the previous patch in this series ("smp: Don't yell about IRQs
disabled in kgdb_roundup_cpus()") for more details, but the last few
things on the stack were this on my arm64 board:
  lockdep_hardirqs_on+0xf0/0x160
  trace_hardirqs_on+0x188/0x1ac
  kgdb_roundup_cpus+0x14/0x3c

As agrued in the the commit text of ("smp: Don't yell about IRQs
disabled in kgdb_roundup_cpus()"), it seems better to make
smp_call_function() lenient about kgdb than to locally turn on IRQs
here.  Thus let's totally remove all the local_irq_enable() and
local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 arch/arc/kernel/kgdb.c     |  4 +---
 arch/arm/kernel/kgdb.c     |  4 +---
 arch/arm64/kernel/kgdb.c   |  4 +---
 arch/hexagon/kernel/kgdb.c | 11 ++---------
 arch/mips/kernel/kgdb.c    |  4 +---
 arch/powerpc/kernel/kgdb.c |  2 +-
 arch/sh/kernel/kgdb.c      |  4 +---
 arch/sparc/kernel/smp_64.c |  2 +-
 arch/x86/kernel/kgdb.c     |  9 ++-------
 include/linux/kgdb.h       |  9 ++-------
 kernel/debug/debug_core.c  |  2 +-
 11 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..d94d3cb7f9e8 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), NULL);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 struct kgdb_arch arch_kgdb_ops = {
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..a80e9259f7e9 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
        kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-       local_irq_enable();
        smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-       local_irq_disable();
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..5d171c26788f 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..30fbc491cf45 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
 
 /**
  * kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
  *
  * On SMP systems, we need to get the attention of the other CPUs
  * and get them be in a known state.  This should do what is needed
  * to get the other CPUs to call kgdb_wait(). Note that on some arches,
  * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  * On non-SMP systems, this is not called.
  */
@@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 #endif
 
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..6671a279966f 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,11 +219,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	set_fs(old_fs);
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 static int compute_signal(int tt)
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
 	smp_send_debugger_break();
 }
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..86b3ea927e42 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,11 +319,9 @@ static void kgdb_call_nmi_hook(void *ignored)
 	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
 }
 
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
-	local_irq_enable();
 	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
-	local_irq_disable();
 }
 
 static int __kgdb_notify(struct die_args *args, unsigned long cmd)
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 4792e08ad36b..f45d876983f1 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 }
 
 #ifdef CONFIG_KGDB
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
 	smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
 }
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8e36f249646e..ac6291a4178d 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
 #ifdef CONFIG_SMP
 /**
  *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
- *	@flags: Current IRQ state
  *
  *	On SMP systems, we need to get the attention of the other CPUs
  *	and get them be in a known state.  This should do what is needed
  *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
  *	the NMI approach is not used for rounding up all the CPUs. For example,
- *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
- *	this case, we have to make sure that interrupts are enabled before
- *	calling smp_call_function(). The argument to this function is
- *	the flags that will be used when restoring the interrupts. There is
- *	local_irq_save() call before kgdb_roundup_cpus().
+ *	in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  *	On non-SMP systems, this is not called.
  */
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
 {
 	apic->send_IPI_allbutself(APIC_DM_NMI);
 }
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index e465bb15912d..05e5b2eb0d32 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
 
 /**
  *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
- *	@flags: Current IRQ state
  *
  *	On SMP systems, we need to get the attention of the other CPUs
  *	and get them into a known state.  This should do what is needed
  *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
  *	the NMI approach is not used for rounding up all the CPUs. For example,
- *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
- *	this case, we have to make sure that interrupts are enabled before
- *	calling smp_call_function(). The argument to this function is
- *	the flags that will be used when restoring the interrupts. There is
- *	local_irq_save() call before kgdb_roundup_cpus().
+ *	in case of MIPS, smp_call_function() is used to roundup CPUs.
  *
  *	On non-SMP systems, this is not called.
  */
-extern void kgdb_roundup_cpus(unsigned long flags);
+extern void kgdb_roundup_cpus(void);
 
 /**
  *	kgdb_arch_set_pc - Generic call back to the program counter
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 65c0f1363788..f3cadda45f07 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 
 	/* Signal the other CPUs to enter kgdb_wait() */
 	else if ((!kgdb_single_step) && kgdb_do_roundup)
-		kgdb_roundup_cpus(flags);
+		kgdb_roundup_cpus();
 #endif
 
 	/*
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/7] serial: core: Allow processing sysrq at port unlock time
  2018-10-29 18:07 ` [PATCH 2/7] serial: core: Allow processing sysrq at port unlock time Douglas Anderson
@ 2018-10-30  5:31   ` Doug Anderson
  0 siblings, 0 replies; 18+ messages in thread
From: Doug Anderson @ 2018-10-30  5:31 UTC (permalink / raw)
  To: Jason Wessel, Daniel Thompson, Thomas Gleixner, Ingo Molnar,
	Greg Kroah-Hartman
  Cc: linux-arm-msm, kgdb-bugreport, LKML, Jiri Slaby, Jeremy Kerr,
	linux-serial

Hi,

On Mon, Oct 29, 2018 at 11:08 AM Douglas Anderson <dianders@chromium.org> wrote:
>
> Right now serial drivers process sysrq keys deep in their character
> receiving code.  This means that they've already grabbed their
> port->lock spinlock.  This can end up getting in the way if we've go
> to do serial stuff (especially kgdb) in response to the sysrq.
>
> Serial drivers have various hacks in them to handle this.  Looking at
> '8250_port.c' you can see that the console_write() skips locking if
> we're in the sysrq handler.  Looking at 'msm_serial.c' you can see
> that the port lock is dropped around uart_handle_sysrq_char().
>
> It turns out that these hacks aren't exactly perfect.  If you have
> lockdep turned on and use something like the 8250_port hack you'll get
> a splat that looks like:
>
>   WARNING: possible circular locking dependency detected
>   [...] is trying to acquire lock:
>   ... (console_owner){-.-.}, at: console_unlock+0x2e0/0x5e4
>
>   but task is already holding lock:
>   ... (&port_lock_key){-.-.}, at: serial8250_handle_irq+0x30/0xe4
>
>   which lock already depends on the new lock.
>
>   the existing dependency chain (in reverse order) is:
>
>   -> #1 (&port_lock_key){-.-.}:
>          _raw_spin_lock_irqsave+0x58/0x70
>          serial8250_console_write+0xa8/0x250
>          univ8250_console_write+0x40/0x4c
>          console_unlock+0x528/0x5e4
>          register_console+0x2c4/0x3b0
>          uart_add_one_port+0x350/0x478
>          serial8250_register_8250_port+0x350/0x3a8
>          dw8250_probe+0x67c/0x754
>          platform_drv_probe+0x58/0xa4
>          really_probe+0x150/0x294
>          driver_probe_device+0xac/0xe8
>          __driver_attach+0x98/0xd0
>          bus_for_each_dev+0x84/0xc8
>          driver_attach+0x2c/0x34
>          bus_add_driver+0xf0/0x1ec
>          driver_register+0xb4/0x100
>          __platform_driver_register+0x60/0x6c
>          dw8250_platform_driver_init+0x20/0x28
>          ...
>
>   -> #0 (console_owner){-.-.}:
>          lock_acquire+0x1e8/0x214
>          console_unlock+0x35c/0x5e4
>          vprintk_emit+0x230/0x274
>          vprintk_default+0x7c/0x84
>          vprintk_func+0x190/0x1bc
>          printk+0x80/0xa0
>          __handle_sysrq+0x104/0x21c
>          handle_sysrq+0x30/0x3c
>          serial8250_read_char+0x15c/0x18c
>          serial8250_rx_chars+0x34/0x74
>          serial8250_handle_irq+0x9c/0xe4
>          dw8250_handle_irq+0x98/0xcc
>          serial8250_interrupt+0x50/0xe8
>          ...
>
>   other info that might help us debug this:
>
>    Possible unsafe locking scenario:
>
>          CPU0                    CPU1
>          ----                    ----
>     lock(&port_lock_key);
>                                  lock(console_owner);
>                                  lock(&port_lock_key);
>     lock(console_owner);
>
>    *** DEADLOCK ***
>
> The hack used in 'msm_serial.c' doesn't cause the above splats but it
> seems a bit ugly to unlock / lock our spinlock deep in our irq
> handler.
>
> It seems like we could defer processing the sysrq until the end of the
> interrupt handler right after we've unlocked the port.  With this
> scheme if a whole batch of sysrq characters comes in one irq then we
> won't handle them all, but that seems like it should be a fine
> compromise.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  include/linux/serial_core.h | 37 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> index 047fa67d039b..78de9d929762 100644
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -175,6 +175,7 @@ struct uart_port {
>         struct console          *cons;                  /* struct console, if any */
>  #if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
>         unsigned long           sysrq;                  /* sysrq timeout */
> +       unsigned int            sysrq_ch;               /* char for sysrq */
>  #endif
>
>         /* flags must be updated while holding port mutex */
> @@ -485,8 +486,42 @@ uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
>         }
>         return 0;
>  }
> +static inline int
> +uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch)
> +{
> +       if (port->sysrq) {
> +               if (ch && time_before(jiffies, port->sysrq)) {
> +                       port->sysrq_ch = ch;
> +                       port->sysrq = 0;
> +                       return 1;
> +               }
> +               port->sysrq = 0;
> +       }
> +       return 0;
> +}
> +static inline void
> +uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
> +{
> +       int sysrq_ch;
> +
> +       sysrq_ch = port->sysrq_ch;
> +       port->sysrq_ch = 0;
> +
> +       spin_unlock_irqrestore(&port->lock, irqflags);
> +
> +       if (sysrq_ch)
> +               handle_sysrq(sysrq_ch);
> +}
>  #else
> -#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
> +static inline int
> +uart_handle_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
> +static inline int
> +uart_prepare_sysrq_char(struct uart_port *port, unsigned int ch) { return 0; }
> +static inline void
> +uart_unlock_and_check_sysrq(struct uart_port *port, unsigned long irqflags)
> +{
> +       spin_unlock_irqrestore(&port->lock, irqflags);
> +}

Jeremy wrote me to point out that I messed up and didn't get this
patch posted to the linux-serial list.  Sorry about that.  :(  I guess
get_mainatiners doesn't notice that this include file is relevant to
serial and I didn't notice either since I was too focused on trying to
figure out if it was really the right call to Cc all the arch
maintainers on the cover letter and the last patch...  Sigh.

If/when I need to repost, I'll make sure to get linux-serial.  For now
at least they are on LKML so probably the easiest place to find all
the patches is:

https://lore.kernel.org/patchwork/cover/1004280/

...if you clock on the "show" next to "Related" you get the whole
series.  Using the message ID from there you can also find them at:

https://lkml.kernel.org/r/20181029180707.207546-1-dianders@chromium.org

Both places allow you to grab 'mbox' files which (which a bit of a
hassle--sorry) can allow you to reply /apply patches.

-Doug

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
@ 2018-10-30  8:25   ` Peter Zijlstra
  2018-10-30  9:41   ` Daniel Thompson
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-10-30  8:25 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh,
	linux-arm-msm, kgdb-bugreport, linux-mips, linux-sh,
	linux-hexagon, frederic, riel, linux-kernel, luto, sparclinux,
	linux-snps-arc, linuxppc-dev, linux-arm-kernel

On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
>   local_irq_disable();

> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many().  That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
> 
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger.  I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often.  Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.

How is any of that not utterly and terminally broken?

> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
>  	 * can't happen.
>  	 */
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> -		     && !oops_in_progress && !early_boot_irqs_disabled);
> +		     && !oops_in_progress && !early_boot_irqs_disabled
> +		     && !in_dbg_master());
>  
>  	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);

Not a fan of this. There is a distinct difference between
oops_in_progress and dropping into kgdb in that you don't ever expect to
return/survive oopses, whereas we do expect to survive kgdb.

Also, how does kgdb work at all without actual NMIs ?

So no, NAK on this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
  2018-10-30  8:25   ` Peter Zijlstra
@ 2018-10-30  9:41   ` Daniel Thompson
  2018-10-30 22:21     ` Doug Anderson
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Thompson @ 2018-10-30  9:41 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, tglx, mingo, gregkh, linux-arm-msm, kgdb-bugreport,
	linux-mips, linux-sh, peterz, linux-hexagon, frederic, riel,
	linux-kernel, luto, sparclinux, linux-snps-arc, linuxppc-dev,
	linux-arm-kernel

On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> In kgdb_roundup_cpus() we've got code that looks like:
>   local_irq_enable();
>   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
>   local_irq_disable();
> 
> In certain cases when we drop into kgdb (like with sysrq-g on a serial
> console) we'll get a big yell that looks like:
> 
>   sysrq: SysRq : DEBUG
>   ------------[ cut here ]------------
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
>   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
>   pc : lockdep_hardirqs_on+0xf0/0x160
>   ...
>   Call trace:
>    lockdep_hardirqs_on+0xf0/0x160
>    trace_hardirqs_on+0x188/0x1ac
>    kgdb_roundup_cpus+0x14/0x3c
>    kgdb_cpu_enter+0x53c/0x5cc
>    kgdb_handle_exception+0x180/0x1d4
>    kgdb_compiled_brk_fn+0x30/0x3c
>    brk_handler+0x134/0x178
>    do_debug_exception+0xfc/0x178
>    el1_dbg+0x18/0x78
>    kgdb_breakpoint+0x34/0x58
>    sysrq_handle_dbg+0x54/0x5c
>    __handle_sysrq+0x114/0x21c
>    handle_sysrq+0x30/0x3c
>    qcom_geni_serial_isr+0x2dc/0x30c
>   ...
>   ...
>   irq event stamp: ...45
>   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
>   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
>   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
>   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
>   ---[ end trace adf21f830c46e638 ]---
> 
> Let's add kgdb to the list of reasons not to warn in
> smp_call_function_many().  That will allow us (in a future patch) to
> stop calling local_irq_enable() which will get rid of the original
> splat.
> 
> NOTE: with this change comes the obvious question: will we start
> deadlocking more often now when we drop into the debugger.  I can't
> say that for sure one way or the other, but the fact that we do the
> same logic for "oops_in_progress" makes me feel like it shouldn't
> happen too often.  Also note that the old logic of turning on
> interrupts temporarily wasn't exactly safe since (I presume) that
> could have violated spin_lock_irqsave() semantics and ended up with a
> deadlock of its own.

This is part of the code to bring all the cores to a halt and since
the other cores are still running kgdb isn't yet able to use the fact
all the CPUs are halted to bend the rules. It is better for this code
to play by the rules if it can.

Is is possible to get the roundup functions to use a private csd
alongside smp_call_function_single_async()? We could add a helper
function to the debug core to avoid having to add cpu_online loops into
every kgdb_roundup_cpus() implementaton.


Daniel.



> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  kernel/smp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 163c451af42e..bb581e58c8dc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -19,6 +19,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/idle.h>
>  #include <linux/hypervisor.h>
> +#include <linux/kgdb.h>
>  
>  #include "smpboot.h"
>  
> @@ -413,7 +414,8 @@ void smp_call_function_many(const struct cpumask *mask,
>  	 * can't happen.
>  	 */
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> -		     && !oops_in_progress && !early_boot_irqs_disabled);
> +		     && !oops_in_progress && !early_boot_irqs_disabled
> +		     && !in_dbg_master());
>  
>  	/* Try to fastpath.  So, what's a CPU they want? Ignoring this one. */
>  	cpu = cpumask_first_and(mask, cpu_online_mask);
> -- 
> 2.19.1.568.g152ad8e336-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
  2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
@ 2018-10-30 11:46   ` Daniel Thompson
  2018-10-30 22:22     ` Doug Anderson
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Thompson @ 2018-10-30 11:46 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, tglx, mingo, gregkh, linux-arm-msm, kgdb-bugreport,
	kstewart, linux-mips, dalias, linux-sh, benh, will.deacon,
	linux-kernel, mhocko, paulus, hpa, sparclinux, linux-hexagon,
	sfr, ysato, mpe, x86, linux, mingo, catalin.marinas, jhogan,
	linux-snps-arc, ying.huang, rppt, bp, linux-arm-kernel,
	christophe.leroy, pombredanne, ralf, rkuo, paul.burton, vgupta,
	akpm, linuxppc-dev, davem

On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> The function kgdb_roundup_cpus() was passed a parameter that was
> documented as:
> 
> > the flags that will be used when restoring the interrupts. There is
> > local_irq_save() call before kgdb_roundup_cpus().
> 
> Nobody used those flags.  Anyone who wanted to temporarily turn on
> interrupts just did local_irq_enable() and local_irq_disable() without
> looking at them.  So we can definitely remove the flags.

On the whole I'd rather that this change...


> Speaking of calling local_irq_enable(), it seems like a bad idea and
> it caused a nice splat on my system with lockdep turned on.
> Specifically it hit:
>   DEBUG_LOCKS_WARN_ON(current->hardirq_context)

... and fixes for this this were in separate patches. They don't appear
especially related.


Daniel.

 
> See the previous patch in this series ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()") for more details, but the last few
> things on the stack were this on my arm64 board:
>   lockdep_hardirqs_on+0xf0/0x160
>   trace_hardirqs_on+0x188/0x1ac
>   kgdb_roundup_cpus+0x14/0x3c
> 
> As agrued in the the commit text of ("smp: Don't yell about IRQs
> disabled in kgdb_roundup_cpus()"), it seems better to make
> smp_call_function() lenient about kgdb than to locally turn on IRQs
> here.  Thus let's totally remove all the local_irq_enable() and
> local_irq_disable() calls from all of the kgdb_roundup_cpus() calls.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  arch/arc/kernel/kgdb.c     |  4 +---
>  arch/arm/kernel/kgdb.c     |  4 +---
>  arch/arm64/kernel/kgdb.c   |  4 +---
>  arch/hexagon/kernel/kgdb.c | 11 ++---------
>  arch/mips/kernel/kgdb.c    |  4 +---
>  arch/powerpc/kernel/kgdb.c |  2 +-
>  arch/sh/kernel/kgdb.c      |  4 +---
>  arch/sparc/kernel/smp_64.c |  2 +-
>  arch/x86/kernel/kgdb.c     |  9 ++-------
>  include/linux/kgdb.h       |  9 ++-------
>  kernel/debug/debug_core.c  |  2 +-
>  11 files changed, 14 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..d94d3cb7f9e8 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -197,11 +197,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), NULL);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  struct kgdb_arch arch_kgdb_ops = {
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..a80e9259f7e9 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -175,11 +175,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>         kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -       local_irq_enable();
>         smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -       local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..5d171c26788f 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -289,11 +289,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..30fbc491cf45 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
>  
>  /**
>   * kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - * @flags: Current IRQ state
>   *
>   * On SMP systems, we need to get the attention of the other CPUs
>   * and get them be in a known state.  This should do what is needed
>   * to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   * the NMI approach is not used for rounding up all the CPUs. For example,
> - * in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - * this case, we have to make sure that interrupts are enabled before
> - * calling smp_call_function(). The argument to this function is
> - * the flags that will be used when restoring the interrupts. There is
> - * local_irq_save() call before kgdb_roundup_cpus().
> + * in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   * On non-SMP systems, this is not called.
>   */
> @@ -139,11 +134,9 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  #endif
>  
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..6671a279966f 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -219,11 +219,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	set_fs(old_fs);
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  static int compute_signal(int tt)
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 59c578f865aa..b0e804844be0 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
>  }
>  
>  #ifdef CONFIG_SMP
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>  	smp_send_debugger_break();
>  }
> diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
> index 4f04c6638a4d..86b3ea927e42 100644
> --- a/arch/sh/kernel/kgdb.c
> +++ b/arch/sh/kernel/kgdb.c
> @@ -319,11 +319,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>  	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
>  }
>  
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
> -	local_irq_enable();
>  	smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> -	local_irq_disable();
>  }
>  
>  static int __kgdb_notify(struct die_args *args, unsigned long cmd)
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index 4792e08ad36b..f45d876983f1 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
>  }
>  
>  #ifdef CONFIG_KGDB
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>  	smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
>  }
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 8e36f249646e..ac6291a4178d 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
>  #ifdef CONFIG_SMP
>  /**
>   *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - *	@flags: Current IRQ state
>   *
>   *	On SMP systems, we need to get the attention of the other CPUs
>   *	and get them be in a known state.  This should do what is needed
>   *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   *	the NMI approach is not used for rounding up all the CPUs. For example,
> - *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - *	this case, we have to make sure that interrupts are enabled before
> - *	calling smp_call_function(). The argument to this function is
> - *	the flags that will be used when restoring the interrupts. There is
> - *	local_irq_save() call before kgdb_roundup_cpus().
> + *	in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   *	On non-SMP systems, this is not called.
>   */
> -void kgdb_roundup_cpus(unsigned long flags)
> +void kgdb_roundup_cpus(void)
>  {
>  	apic->send_IPI_allbutself(APIC_DM_NMI);
>  }
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index e465bb15912d..05e5b2eb0d32 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
>  
>  /**
>   *	kgdb_roundup_cpus - Get other CPUs into a holding pattern
> - *	@flags: Current IRQ state
>   *
>   *	On SMP systems, we need to get the attention of the other CPUs
>   *	and get them into a known state.  This should do what is needed
>   *	to get the other CPUs to call kgdb_wait(). Note that on some arches,
>   *	the NMI approach is not used for rounding up all the CPUs. For example,
> - *	in case of MIPS, smp_call_function() is used to roundup CPUs. In
> - *	this case, we have to make sure that interrupts are enabled before
> - *	calling smp_call_function(). The argument to this function is
> - *	the flags that will be used when restoring the interrupts. There is
> - *	local_irq_save() call before kgdb_roundup_cpus().
> + *	in case of MIPS, smp_call_function() is used to roundup CPUs.
>   *
>   *	On non-SMP systems, this is not called.
>   */
> -extern void kgdb_roundup_cpus(unsigned long flags);
> +extern void kgdb_roundup_cpus(void);
>  
>  /**
>   *	kgdb_arch_set_pc - Generic call back to the program counter
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 65c0f1363788..f3cadda45f07 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
>  
>  	/* Signal the other CPUs to enter kgdb_wait() */
>  	else if ((!kgdb_single_step) && kgdb_do_roundup)
> -		kgdb_roundup_cpus(flags);
> +		kgdb_roundup_cpus();
>  #endif
>  
>  	/*
> -- 
> 2.19.1.568.g152ad8e336-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (6 preceding siblings ...)
  2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
@ 2018-10-30 11:56 ` Daniel Thompson
  2018-10-30 12:31   ` Russell King - ARM Linux
  2018-10-30 12:36 ` Andy Shevchenko
  8 siblings, 1 reply; 18+ messages in thread
From: Daniel Thompson @ 2018-10-30 11:56 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, tglx, mingo, gregkh, linux-arm-msm, kgdb-bugreport,
	nm, linux-mips, dalias, catalin.marinas, vigneshr, linux-aspeed,
	linux-sh, peterz, will.deacon, mhocko, paulus, hpa, sparclinux,
	marex, sfr, ysato, linux-hexagon, x86, linux, pombredanne, tony,
	mingo, joel, linux-serial, rolf.evers.fischer, jhogan, asierra,
	linux-snps-arc, dan.carpenter, ying.huang, riel, frederic,
	jslaby, paul.burton, rppt, bp, luto, andriy.shevchenko,
	linux-arm-kernel, christophe.leroy, andrew, linux-kernel, ralf,
	rkuo, Jisheng.Zhang, vgupta, benh, jk, mpe, akpm, linuxppc-dev,
	davem, kstewart

On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> Looking back, this is pretty much two series squashed that could be
> treated indepdently.  The first is a serial series and the second is a
> kgdb series.

Indeed.

I couldn't work out the link between the first 5 patches and the last 2
until I read this...

Is anything in the 01->05 patch set even related to kgdb? From the stack
traces it looks to me like the lock dep warning would trigger for any
sysrq. I think separating into two threads for v2 would be sensible.


Daniel.


> 
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
> 
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree.  The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.
> 
> 
> Douglas Anderson (7):
>   serial: qcom_geni_serial: Finish supporting sysrq
>   serial: core: Allow processing sysrq at port unlock time
>   serial: qcom_geni_serial: Process sysrq at port unlock time
>   serial: core: Include console.h from serial_core.h
>   serial: 8250: Process sysrq at port unlock time
>   smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
>   kgdb: Remove irq flags and local_irq_enable/disable from roundup
> 
>  arch/arc/kernel/kgdb.c                      |  4 +--
>  arch/arm/kernel/kgdb.c                      |  4 +--
>  arch/arm64/kernel/kgdb.c                    |  4 +--
>  arch/hexagon/kernel/kgdb.c                  | 11 ++----
>  arch/mips/kernel/kgdb.c                     |  4 +--
>  arch/powerpc/kernel/kgdb.c                  |  2 +-
>  arch/sh/kernel/kgdb.c                       |  4 +--
>  arch/sparc/kernel/smp_64.c                  |  2 +-
>  arch/x86/kernel/kgdb.c                      |  9 ++---
>  drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
>  drivers/tty/serial/8250/8250_fsl.c          |  6 +++-
>  drivers/tty/serial/8250/8250_omap.c         |  6 +++-
>  drivers/tty/serial/8250/8250_port.c         |  8 ++---
>  drivers/tty/serial/qcom_geni_serial.c       | 10 ++++--
>  include/linux/kgdb.h                        |  9 ++---
>  include/linux/serial_core.h                 | 38 ++++++++++++++++++++-
>  kernel/debug/debug_core.c                   |  2 +-
>  kernel/smp.c                                |  4 ++-
>  18 files changed, 80 insertions(+), 53 deletions(-)
> 
> -- 
> 2.19.1.568.g152ad8e336-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
@ 2018-10-30 12:31   ` Russell King - ARM Linux
  0 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2018-10-30 12:31 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Douglas Anderson, Jason Wessel, tglx, mingo, gregkh,
	linux-arm-msm, kgdb-bugreport, nm, linux-mips, dalias,
	catalin.marinas, vigneshr, linux-aspeed, linux-sh, peterz,
	will.deacon, mhocko, paulus, hpa, sparclinux, marex, sfr, ysato,
	linux-hexagon, x86, pombredanne, tony, mingo, joel, linux-serial,
	rolf.evers.fischer, jhogan, asierra, linux-snps-arc,
	dan.carpenter, ying.huang, riel, frederic, jslaby, paul.burton,
	rppt, bp, luto, andriy.shevchenko, linux-arm-kernel,
	christophe.leroy, andrew, linux-kernel, ralf, rkuo,
	Jisheng.Zhang, vgupta, benh, jk, mpe, akpm, linuxppc-dev, davem,
	kstewart

On Tue, Oct 30, 2018 at 11:56:28AM +0000, Daniel Thompson wrote:
> On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> > Looking back, this is pretty much two series squashed that could be
> > treated indepdently.  The first is a serial series and the second is a
> > kgdb series.
> 
> Indeed.
> 
> I couldn't work out the link between the first 5 patches and the last 2
> until I read this...
> 
> Is anything in the 01->05 patch set even related to kgdb? From the stack
> traces it looks to me like the lock dep warning would trigger for any
> sysrq. I think separating into two threads for v2 would be sensible.

I'm concerned about calling smp_call_function() from IRQ context with
IRQs disabled - that will block the ability of the _calling_ CPU to
process IPIs from other CPUs in the system.  If we have other CPUs
waiting on their IPIs to complete on _this_ CPU, we could end up
deadlocking while trying to grab the CSD lock.

This is the intention of warnings in smp_call_function*() - to catch
cases where deadlocks _can_ occur, but do not reliably show up.

The exceptions to the warning (disregarding oops_in_progress) are
chosen to allow IRQs-disabled calls when we're sure that the rest of
the system isn't going to be sending the calling CPU an IPI (eg,
because the CPU isn't marked online, and we only send IPIs to online
CPUs, or if we're still early in the kernel boot and hence have no
other CPUs running.)  The exception is oops_in_progress, which can
occur at any time - even with the current CPU already holding some
CSD locks (eg, oops while trying to send an IPI.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
  2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
                   ` (7 preceding siblings ...)
  2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
@ 2018-10-30 12:36 ` Andy Shevchenko
  8 siblings, 0 replies; 18+ messages in thread
From: Andy Shevchenko @ 2018-10-30 12:36 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Jason Wessel, Daniel Thompson, tglx, mingo, gregkh,
	linux-arm-msm, kgdb-bugreport, nm, linux-mips, dalias,
	catalin.marinas, vigneshr, linux-aspeed, linux-sh, peterz,
	will.deacon, mhocko, paulus, hpa, sparclinux, marex, sfr, ysato,
	linux-hexagon, x86, linux, pombredanne, tony, mingo, joel,
	linux-serial, rolf.evers.fischer, jhogan, asierra,
	linux-snps-arc, dan.carpenter, ying.huang, riel, frederic,
	jslaby, paul.burton, rppt, bp, luto, linux-arm-kernel,
	christophe.leroy, andrew, linux-kernel, ralf, rkuo,
	Jisheng.Zhang, vgupta, benh, jk, mpe, akpm, linuxppc-dev, davem,
	kstewart

On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> I started out this series trying to make sysrq work over the serial
> console on qcom_geni_serial, then fell into a rat's nest.
> 
> To solve the deadlock I faced when enabling sysrq I tried to borrow
> code from '8250_port.c' which avoided grabbing the port lock in
> console_write().  ...but since these days I try to run with lockdep on
> all the time, I found it caused an annoying lockdep splat (which I
> also reproduced on my rk3399 board).  ...so I instead changed my
> qcom_geni_serial solution to borrow code from 'msm_serial.c'
> 
> I wasn't super happy with the solution in 'msm_serial.c' though.  I
> don't like releasing the spinlock there.  Not only is it ugly but it
> means we are unlocking / re-locking _all the time_ even though sysrq
> characters are rare.  ...so I came up with what I think is a better
> solution and then implemented it for qcom_geni_serial.
> 
> Since I had a good way to test 8250-based UARTs, I also fixed that
> driver to use my new method.  When doing so, I ran into a missing
> msm_serial.c at all, so I didn't switch that (or all other serial
> drivers for that matter) to the new method.
> 
> After fixing all the above issues, I found the next lockdep splat in
> kgdb and I think I've worked around it in a good-enough way, but I'm
> much less confident about this.  Hopefully folks can take a look at
> it.
> 
> In general, patches earlier in this series should be "less
> controversial" and hopefully can land even if people don't like
> patches later in the series.  ;-)
> 
> Looking back, this is pretty much two series squashed that could be
> treated indepdently.  The first is a serial series and the second is a
> kgdb series.
> 
> For all serial patches I'd expect them to go through the tty tree once
> they've been reviewed.
> 
> If folks are OK w/ the 'smp' patch it probably should go in some core
> kernel tree.  The kgdb patch won't work without it, though, so to land
> that we'd need coordination between the folks landing that and the
> folks landing the 'smp' patch.

I have got only 0/7 and 5/7, everything okay with your mail client and other tools?

> 
> 
> Douglas Anderson (7):
>   serial: qcom_geni_serial: Finish supporting sysrq
>   serial: core: Allow processing sysrq at port unlock time
>   serial: qcom_geni_serial: Process sysrq at port unlock time
>   serial: core: Include console.h from serial_core.h
>   serial: 8250: Process sysrq at port unlock time
>   smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
>   kgdb: Remove irq flags and local_irq_enable/disable from roundup
> 
>  arch/arc/kernel/kgdb.c                      |  4 +--
>  arch/arm/kernel/kgdb.c                      |  4 +--
>  arch/arm64/kernel/kgdb.c                    |  4 +--
>  arch/hexagon/kernel/kgdb.c                  | 11 ++----
>  arch/mips/kernel/kgdb.c                     |  4 +--
>  arch/powerpc/kernel/kgdb.c                  |  2 +-
>  arch/sh/kernel/kgdb.c                       |  4 +--
>  arch/sparc/kernel/smp_64.c                  |  2 +-
>  arch/x86/kernel/kgdb.c                      |  9 ++---
>  drivers/tty/serial/8250/8250_aspeed_vuart.c |  6 +++-
>  drivers/tty/serial/8250/8250_fsl.c          |  6 +++-
>  drivers/tty/serial/8250/8250_omap.c         |  6 +++-
>  drivers/tty/serial/8250/8250_port.c         |  8 ++---
>  drivers/tty/serial/qcom_geni_serial.c       | 10 ++++--
>  include/linux/kgdb.h                        |  9 ++---
>  include/linux/serial_core.h                 | 38 ++++++++++++++++++++-
>  kernel/debug/debug_core.c                   |  2 +-
>  kernel/smp.c                                |  4 ++-
>  18 files changed, 80 insertions(+), 53 deletions(-)
> 
> -- 
> 2.19.1.568.g152ad8e336-goog
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
  2018-10-30  9:41   ` Daniel Thompson
@ 2018-10-30 22:21     ` Doug Anderson
  0 siblings, 0 replies; 18+ messages in thread
From: Doug Anderson @ 2018-10-30 22:21 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman,
	linux-arm-msm, kgdb-bugreport, linux-mips, linux-sh,
	Peter Zijlstra, linux-hexagon, frederic, riel, LKML, luto,
	sparclinux, linux-snps-arc, linuxppc-dev, Linux ARM

Daniel,

On Tue, Oct 30, 2018 at 2:42 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> > In kgdb_roundup_cpus() we've got code that looks like:
> >   local_irq_enable();
> >   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> >   local_irq_disable();
> >
> > In certain cases when we drop into kgdb (like with sysrq-g on a serial
> > console) we'll get a big yell that looks like:
> >
> >   sysrq: SysRq : DEBUG
> >   ------------[ cut here ]------------
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> >   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
> >   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
> >   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
> >   pc : lockdep_hardirqs_on+0xf0/0x160
> >   ...
> >   Call trace:
> >    lockdep_hardirqs_on+0xf0/0x160
> >    trace_hardirqs_on+0x188/0x1ac
> >    kgdb_roundup_cpus+0x14/0x3c
> >    kgdb_cpu_enter+0x53c/0x5cc
> >    kgdb_handle_exception+0x180/0x1d4
> >    kgdb_compiled_brk_fn+0x30/0x3c
> >    brk_handler+0x134/0x178
> >    do_debug_exception+0xfc/0x178
> >    el1_dbg+0x18/0x78
> >    kgdb_breakpoint+0x34/0x58
> >    sysrq_handle_dbg+0x54/0x5c
> >    __handle_sysrq+0x114/0x21c
> >    handle_sysrq+0x30/0x3c
> >    qcom_geni_serial_isr+0x2dc/0x30c
> >   ...
> >   ...
> >   irq event stamp: ...45
> >   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
> >   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
> >   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
> >   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
> >   ---[ end trace adf21f830c46e638 ]---
> >
> > Let's add kgdb to the list of reasons not to warn in
> > smp_call_function_many().  That will allow us (in a future patch) to
> > stop calling local_irq_enable() which will get rid of the original
> > splat.
> >
> > NOTE: with this change comes the obvious question: will we start
> > deadlocking more often now when we drop into the debugger.  I can't
> > say that for sure one way or the other, but the fact that we do the
> > same logic for "oops_in_progress" makes me feel like it shouldn't
> > happen too often.  Also note that the old logic of turning on
> > interrupts temporarily wasn't exactly safe since (I presume) that
> > could have violated spin_lock_irqsave() semantics and ended up with a
> > deadlock of its own.
>
> This is part of the code to bring all the cores to a halt and since
> the other cores are still running kgdb isn't yet able to use the fact
> all the CPUs are halted to bend the rules. It is better for this code
> to play by the rules if it can.
>
> Is is possible to get the roundup functions to use a private csd
> alongside smp_call_function_single_async()? We could add a helper
> function to the debug core to avoid having to add cpu_online loops into
> every kgdb_roundup_cpus() implementaton.

Exactly the kind of helpful suggestion I was looking for.  Thank you
very much!  See v2 and hopefully it matches what you were thinking of.

-Doug

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
  2018-10-30 11:46   ` Daniel Thompson
@ 2018-10-30 22:22     ` Doug Anderson
  0 siblings, 0 replies; 18+ messages in thread
From: Doug Anderson @ 2018-10-30 22:22 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Jason Wessel, Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman,
	linux-arm-msm, kgdb-bugreport, kstewart, linux-mips, dalias,
	linux-sh, Benjamin Herrenschmidt, Will Deacon, LKML, mhocko,
	paulus, H. Peter Anvin, sparclinux, linux-hexagon,
	Stephen Rothwell, ysato, mpe, x86, Russell King - ARM Linux,
	Ingo Molnar, Catalin Marinas, jhogan, linux-snps-arc, ying.huang,
	rppt, bp, Linux ARM, christophe.leroy, pombredanne, Ralf Baechle,
	rkuo, paul.burton, Vineet Gupta, Andrew Morton, linuxppc-dev,
	David Miller

Hi,

On Tue, Oct 30, 2018 at 4:46 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> > The function kgdb_roundup_cpus() was passed a parameter that was
> > documented as:
> >
> > > the flags that will be used when restoring the interrupts. There is
> > > local_irq_save() call before kgdb_roundup_cpus().
> >
> > Nobody used those flags.  Anyone who wanted to temporarily turn on
> > interrupts just did local_irq_enable() and local_irq_disable() without
> > looking at them.  So we can definitely remove the flags.
>
> On the whole I'd rather that this change...
>
>
> > Speaking of calling local_irq_enable(), it seems like a bad idea and
> > it caused a nice splat on my system with lockdep turned on.
> > Specifically it hit:
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>
> ... and fixes for this this were in separate patches. They don't appear
> especially related.

Agreed that this is cleaner.  Done for v2.

-Doug

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq
  2018-10-29 18:07 ` [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq Douglas Anderson
@ 2018-11-02 16:47   ` Stephen Boyd
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2018-11-02 16:47 UTC (permalink / raw)
  To: Daniel Thompson, Douglas Anderson, Jason Wessel, gregkh, mingo, tglx
  Cc: linux-arm-msm, kgdb-bugreport, Douglas Anderson, linux-kernel,
	linux-serial, jslaby

Quoting Douglas Anderson (2018-10-29 11:07:01)
> The geni serial driver already had some sysrq code in it, but since
> SUPPORT_SYSRQ wasn't defined the code didn't do anything useful.
> Let's make it useful by adding that define using the same formula
> found in other serial drivers.
> 
> In order to prevent deadlock, we'll take a page from the
> 'msm_serial.c' where the spinlock is released around
> uart_handle_sysrq_char().  This seemed better than copying from
> '8250_port.c' where we skip locking in the console_write function
> since the '8250_port.c' method can cause lockdep warnings when
> dropping into kgdb.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Given that it's the same as msm_serial.c, and I wrote the msm_serial.c
"hack" then:

Reviewed-by: Stephen Boyd <swboyd@chromium.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-11-02 16:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-29 18:07 [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Douglas Anderson
2018-10-29 18:07 ` [PATCH 1/7] serial: qcom_geni_serial: Finish supporting sysrq Douglas Anderson
2018-11-02 16:47   ` Stephen Boyd
2018-10-29 18:07 ` [PATCH 2/7] serial: core: Allow processing sysrq at port unlock time Douglas Anderson
2018-10-30  5:31   ` Doug Anderson
2018-10-29 18:07 ` [PATCH 3/7] serial: qcom_geni_serial: Process " Douglas Anderson
2018-10-29 18:07 ` [PATCH 4/7] serial: core: Include console.h from serial_core.h Douglas Anderson
2018-10-29 18:07 ` [PATCH 5/7] serial: 8250: Process sysrq at port unlock time Douglas Anderson
2018-10-29 18:07 ` [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus() Douglas Anderson
2018-10-30  8:25   ` Peter Zijlstra
2018-10-30  9:41   ` Daniel Thompson
2018-10-30 22:21     ` Doug Anderson
2018-10-29 18:07 ` [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup Douglas Anderson
2018-10-30 11:46   ` Daniel Thompson
2018-10-30 22:22     ` Doug Anderson
2018-10-30 11:56 ` [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb Daniel Thompson
2018-10-30 12:31   ` Russell King - ARM Linux
2018-10-30 12:36 ` Andy Shevchenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).