linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ANNOUNCE] v5.4.28-rt19
@ 2020-03-30 14:47 Sebastian Andrzej Siewior
  2020-04-23  8:51 ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-30 14:47 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, linux-rt-users, Steven Rostedt

Dear RT folks!

I'm pleased to announce the v5.4.28-rt19 patch set. 

Changes since v5.4.28-rt18:

  - Memory compaction of unevictable pages has been disabled in the v5.4
    release. This aligns the change with what has been prepared for
    upstream: Disable is by default on PREEMPT_RT
    (/proc/sys/vm/compact_unevictable_allowed reads 0) but allow
    enabling it which will trigger a warning.

Known issues
     - It has been pointed out that due to changes to the printk code the
       internal buffer representation changed. This is only an issue if tools
       like `crash' are used to extract the printk buffer from a kernel memory
       image.

The delta patch against v5.4.28-rt18 is appended below and can be found here:
 
     https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.4/incr/patch-5.4.28-rt18-rt19.patch.xz

You can get this release via the git tree at:

    git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v5.4.28-rt19

The RT patch against v5.4.28 can be found here:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.4/older/patch-5.4.28-rt19.patch.xz

The split quilt queue is available at:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.4/older/patches-5.4.28-rt19.tar.xz

Sebastian

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009cab..0329a4d3fa9ec 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
 This should be used on systems where stalls for minor page faults are an
 acceptable trade for large contiguous free memory.  Set to 0 to prevent
 compaction from moving pages that are unevictable.  Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
+to compaction, which would block the task from becomming active until the fault
+is resolved.
 
 
 dirty_background_bytes
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d08bd51a0fbc3..8221219d65c35 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
 static int proc_taint(struct ctl_table *table, int write,
 			       void __user *buffer, size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
+					       int write, void __user *buffer,
+					       size_t *lenp, loff_t *ppos);
+#endif
 #endif
 
 #ifdef CONFIG_PRINTK
@@ -1488,17 +1493,16 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &min_extfrag_threshold,
 		.extra2		= &max_extfrag_threshold,
 	},
-#ifndef CONFIG_PREEMPT_RT
 	{
 		.procname	= "compact_unevictable_allowed",
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax_warn_RT_change,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
-#endif
+
 #endif /* CONFIG_COMPACTION */
 	{
 		.procname	= "min_free_kbytes",
@@ -2582,6 +2586,28 @@ int proc_dointvec(struct ctl_table *table, int write,
 	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
 }
 
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
+					       int write, void __user *buffer,
+					       size_t *lenp, loff_t *ppos)
+{
+	int ret, old;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
+		return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	old = *(int *)table->data;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+	if (old != *(int *)table->data)
+		pr_warn_once("sysctl attribute %s changed by %s[%d]\n",
+			     table->procname, current->comm,
+			     task_pid_nr(current));
+	return ret;
+}
+#endif
+
 /**
  * proc_douintvec - read a vector of unsigned integers
  * @table: the sysctl table
diff --git a/localversion-rt b/localversion-rt
index 9e7cd66d9f44f..483ad771f201a 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt18
+-rt19
diff --git a/mm/compaction.c b/mm/compaction.c
index 83cc3d1e5df7b..31e6e103f38b4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1591,7 +1591,7 @@ typedef enum {
  * compactable pages.
  */
 #ifdef CONFIG_PREEMPT_RT
-#define sysctl_compact_unevictable_allowed 0
+int sysctl_compact_unevictable_allowed __read_mostly = 0;
 #else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
 #endif

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-03-30 14:47 [ANNOUNCE] v5.4.28-rt19 Sebastian Andrzej Siewior
@ 2020-04-23  8:51 ` Jiri Kosina
  2020-04-23  9:12   ` Jiri Kosina
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Kosina @ 2020-04-23  8:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Mon, 30 Mar 2020, Sebastian Andrzej Siewior wrote:

> I'm pleased to announce the v5.4.28-rt19 patch set. 

First, I don't believe this is necessarily a regression coming with this 
particular version, but this is the first kernel where I tried this and it 
crashed.

When running this kernel in KVM guest, and flooding its 8250-based ttyS0 
serial console quickly leads to

[   52.674358] 000: irq 4: nobody cared (try booting with the "irqpoll" option)
[   52.674381] 000: CPU: 0 PID: 155 Comm: irq/4-ttyS0 Not tainted 5.4.28-rt19+ #7
[   52.674389] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[   52.674398] 000: Call Trace:
[   52.674417] 000:  <IRQ>
[   52.674426] 000:  dump_stack+0x50/0x6b
[   52.674446] 000:  __report_bad_irq+0x2b/0xb0
[   52.674454] 000:  note_interrupt+0x22e/0x280
[   52.674457] 000:  ? io_serial_out+0x11/0x20
[   52.674463] 000:  handle_irq_event_percpu+0x6e/0x90
[   52.674471] 000:  handle_irq_event+0x48/0x90
[   52.674473] 000:  handle_edge_irq+0x95/0x1f0
[   52.674476] 000:  do_IRQ+0x6c/0x120
[   52.674487] 000:  common_interrupt+0xf/0xf
[   52.674496] 000:  </IRQ>
[   52.674497] 000: RIP: 0010:io_serial_out+0x11/0x20
[   52.674503] 000: Code: 8b 57 30 d3 e6 01 f2 ec 0f b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f b6 8f e1 00 00 00 89 d0 8b 57 30 d3 e6 01 f2 ee <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f b6 87 e2 00 00 00
[   52.674507] 000: RSP: 0018:ffffa860c03a7de8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdb
[   52.674508] 000: RAX: 0000000000000034 RBX: ffffffffb05a3bc0 RCX: 0000000000000000
[   52.674510] 000: RDX: 00000000000003f8 RSI: 0000000000000000 RDI: ffffffffb05a3bc0
[   52.674514] 000: RBP: ffff98c1b9718000 R08: ffffffffafa0b8e0 R09: ffffa860c03a7d80
[   52.674515] 000: R10: ffffa860c03a7c40 R11: 0000000000000000 R12: 0000000000000009
[   52.674516] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffff98c13714a8b8
[   52.674524] 000:  ? io_serial_in+0x20/0x20
[   52.674529] 000:  serial8250_tx_chars+0x121/0x310
[   52.674535] 000:  ? migrate_disable+0x33/0x90
1339913399133991[   52.674540] 000:  
serial8250_handle_irq.part.27+0xc2/0xe0
[   52.674543] 000:  serial8250_default_handle_irq+0x45/0x50
[   52.674545] 000:  serial8250_interrupt+0x51/0xa0
[   52.674548] 000:  ? irq_finalize_oneshot.part.46+0xd0/0xd0
[   52.674550] 000:  irq_forced_thread_fn+0x2b/0x70
[   52.674552] 000:  irq_thread+0xd8/0x170
[   52.674554] 000:  ? wake_threads_waitq+0x30/0x30
[   52.674556] 000:  ? irq_thread_dtor+0x90/0x90
[   52.674558] 000:  kthread+0x10e/0x130
[   52.674563] 000:  ? kthread_park+0x80/0x80
[   52.674565] 000:  ret_from_fork+0x35/0x40
[   52.674571] 000: handlers:
[   52.674579] 000: [<00000000d13c0583>] irq_default_primary_handler threaded [<00000000f172db97>] serial8250_interrupt
[   52.674580] 000: Disabling IRQ #4

(and things go downhill afterwards with RCU stall on the affected CPU, 
etc).

Reproducing it is as easy as firing up qemu with '-serial stdio', booting 
with console set to ttyS0, and then doing e.g.

	i=0; while test $i -lt 100000; do /usr/bin/echo $i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i$i; i=$((i+1)); done

in the serial console terminal. It explodes within a few tens of seconds.

Now, whenever this happens, we always seem to be interrupted in 
0010:io_serial_out+0x11/0x20, which is this in Code

	... d3 e6 01 f2 ee <c3> 66 66 2e 0f ...

where '0xee' is the actual outb. So it looks like outb would take so long 
that 99k IRQ4s trigger in the meantime, which would seem to point to KVM / 
qemu 
issue. 

*However* after either

- turning CONFIG_PREEMPT_RT off 

	or

- starting qemu-kvm without any '-smp' args (i.e. starting the guest as 
  UP)

this issue never reproduces any more; the reason for it almost certainly 
never triggering with !CONFIG_PREEMPT_RT is that the above pattern 
obviously can't happen, because we never interrupt that io_serial_out, 
because the spin_lock_irqsave() in serial8250_handle_irq() disables IRQ in 
this region. Why it doesn't reproduce on UP I don't understand.

I haven't done much debugging on this yet, but reporting this right away 
in case anyone has any idea what's happening here.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23  8:51 ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
@ 2020-04-23  9:12   ` Jiri Kosina
  2020-04-23 10:45     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Kosina @ 2020-04-23  9:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Thu, 23 Apr 2020, Jiri Kosina wrote:

> > I'm pleased to announce the v5.4.28-rt19 patch set. 
> 
> First, I don't believe this is necessarily a regression coming with this 
> particular version, but this is the first kernel where I tried this and it 
> crashed.

I just tried with 5.6.4-rt3, and I can make it explode exactly the same 
way:

[   46.523586] 000: irq 4: nobody cared (try booting with the "irqpoll" option)
4187418741874187[   46.523608] 000: CPU: 0 PID: 1219 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19+ #10
[   46.523611] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[   46.523613] 000: Call Trace:
[   46.523630] 000:  <IRQ>
[   46.523638] 000:  dump_stack+0x50/0x6b
[   46.523655] 000:  __report_bad_irq+0x2b/0xb0
[   46.523662] 000:  note_interrupt+0x22e/0x280
[   46.523666] 000:  ? io_serial_out+0x11/0x20
[   46.523671] 000:  handle_irq_event_percpu+0x6e/0x90
[   46.523676] 000:  handle_irq_event+0x48/0x90
[   46.523678] 000:  handle_edge_irq+0x95/0x1f0
[   46.523683] 000:  do_IRQ+0x6c/0x120
[   46.523692] 000:  common_interrupt+0xf/0xf
[   46.523700] 000:  </IRQ>
[   46.523701] 000: RIP: 0010:io_serial_out+0x11/0x20
[   46.523706] 000: Code: 8b 57 30 d3 e6 01 f2 ec 0f b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f b6 8f e1 00 00 00 89 d0 8b 57 30 d3 e6 01 f2 ee <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f b6 87 e2 00 00 00
[   46.523709] 000: RSP: 0018:ffffa4ce80813de8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdb
[   46.523710] 000: RAX: 0000000000000031 RBX: ffffffffb57a0de0 RCX: 0000000000000000
[   46.523716] 000: RDX: 00000000000003f8 RSI: 0000000000000000 RDI: ffffffffb57a0de0
[   46.523718] 000: RBP: ffffa30bf9320000 R08: ffffffffb4c0b7a0 R09: 0000000000000001
[   46.523719] 000: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000010
[   46.523720] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffffa30bfa737498
[   46.523730] 000:  ? io_serial_in+0x20/0x20
[   46.523732] 000:  serial8250_tx_chars+0x121/0x310
[   46.523735] 000:  ? migrate_disable+0x33/0x90
[   46.523739] 000:  serial8250_handle_irq.part.26+0xbb/0xc0
[   46.523745] 000:  serial8250_default_handle_irq+0x45/0x50
[   46.523747] 000:  serial8250_interrupt+0x51/0xa0
[   46.523750] 000:  ? irq_finalize_oneshot.part.46+0xd0/0xd0
[   46.523752] 000:  irq_forced_thread_fn+0x2b/0x70
[   46.523754] 000:  irq_thread+0xd8/0x170
[   46.523757] 000:  ? wake_threads_waitq+0x30/0x30
[   46.523759] 000:  ? irq_thread_dtor+0x90/0x90
[   46.523760] 000:  kthread+0x10e/0x130
[   46.523765] 000:  ? kthread_park+0x80/0x80
[   46.523766] 000:  ret_from_fork+0x35/0x40
[   46.523772] 000: handlers:
[   46.523776] 000: [<00000000cdfd4a1d>] irq_default_primary_handler threaded [<0000000078f3e5cd>] serial8250_interrupt
[   46.523778] 000: Disabling IRQ #4

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23  9:12   ` Jiri Kosina
@ 2020-04-23 10:45     ` Sebastian Andrzej Siewior
  2020-04-23 13:06       ` Steven Rostedt
  2020-04-23 16:07       ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-23 10:45 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On 2020-04-23 11:12:59 [+0200], Jiri Kosina wrote:
> On Thu, 23 Apr 2020, Jiri Kosina wrote:
> 
> > > I'm pleased to announce the v5.4.28-rt19 patch set. 
> > 
> > First, I don't believe this is necessarily a regression coming with this 
> > particular version, but this is the first kernel where I tried this and it 
> > crashed.
> 
> I just tried with 5.6.4-rt3, and I can make it explode exactly the same 
> way:

I though I dealt with it. In the past it triggered also with threadirqs
on !RT but this isn't the case anymore. It still explodes on RT. Let me
look…

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 10:45     ` Sebastian Andrzej Siewior
@ 2020-04-23 13:06       ` Steven Rostedt
  2020-04-23 18:34         ` Steven Rostedt
  2020-04-23 16:07       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2020-04-23 13:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jiri Kosina, Thomas Gleixner, LKML, linux-rt-users,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Thu, 23 Apr 2020 12:45:59 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2020-04-23 11:12:59 [+0200], Jiri Kosina wrote:
> > On Thu, 23 Apr 2020, Jiri Kosina wrote:
> >   
> > > > I'm pleased to announce the v5.4.28-rt19 patch set.   
> > > 
> > > First, I don't believe this is necessarily a regression coming with this 
> > > particular version, but this is the first kernel where I tried this and it 
> > > crashed.  
> > 
> > I just tried with 5.6.4-rt3, and I can make it explode exactly the same 
> > way:  
> 
> I though I dealt with it. In the past it triggered also with threadirqs
> on !RT but this isn't the case anymore. It still explodes on RT. Let me
> look…

Good, something to look for as a backport ;-)

BTW, I haven't released a new 5.4-rt because the default one is triggering
a large latency somewhere, and makes my tests fail. I'm trying to dig into
the cause when I get time.

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 10:45     ` Sebastian Andrzej Siewior
  2020-04-23 13:06       ` Steven Rostedt
@ 2020-04-23 16:07       ` Sebastian Andrzej Siewior
  2020-04-23 16:20         ` [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT" Sebastian Andrzej Siewior
                           ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-23 16:07 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On 2020-04-23 12:45:59 [+0200], To Jiri Kosina wrote:
> On 2020-04-23 11:12:59 [+0200], Jiri Kosina wrote:
> > On Thu, 23 Apr 2020, Jiri Kosina wrote:
> > 
> > > > I'm pleased to announce the v5.4.28-rt19 patch set. 
> > > 
> > > First, I don't believe this is necessarily a regression coming with this 
> > > particular version, but this is the first kernel where I tried this and it 
> > > crashed.
> > 
> > I just tried with 5.6.4-rt3, and I can make it explode exactly the same 
> > way:
> 
> I though I dealt with it. In the past it triggered also with threadirqs
> on !RT but this isn't the case anymore. It still explodes on RT. Let me
> look…

So it also happens with !RT, you just have to try a little harder. For
instance in drivers/tty/serial/8250/8250_core.c making the PASS_LIMIT
change apply to !RT and boom.

The IRQ4 is edge and in charge of ttyS0. It is handled by
handle_edge_irq() and after ->irq_ack(), the thread is woken up and then
we get another ->handle_edge_irq() for IRQ4. With larger PASS_LIMIT the
thread runs longer so note_interrupt() will make less IRQ_HANDLED based
on ->threads_handled_last. If it observes 100 handled within 100000
interrupts then the counters are reset again. On !RT it usually manages
to get >100 per 100000 interrupts so it appears good. On RT it gets less
and the interrupt gets disabled.

So it is not RT related, but RT triggers it more reliably (also the
PASS_LIMIT change can vanish).
I can't tell if this is a qemu bug in emulating the HW or not. I can't
reproduce it real HW. I see a second edge interrupt only after the
thread completed. I can't tell if this is because it is a real UART and
the data is flowing slower or because the edge-IRQ is not triggered
repeatedly.

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT"
  2020-04-23 16:07       ` Sebastian Andrzej Siewior
@ 2020-04-23 16:20         ` Sebastian Andrzej Siewior
  2020-04-23 16:21           ` Sebastian Andrzej Siewior
  2020-04-23 19:48         ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
  2020-04-24 19:19         ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-23 16:20 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

There is no need to loop for longer. The message of too much work was
removed in commit
    9d7c249a1ef9b ("serial: 8250: drop the printk from serial8250_interrupt()")

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/tty/serial/8250/8250_core.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 15d7c47fb7ca3..475b51f4321db 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -55,16 +55,7 @@ static struct uart_driver serial8250_reg;
 
 static unsigned int skip_txen_test; /* force skip of txen test at init time */
 
-/*
- * On -rt we can have a more delays, and legitimately
- * so - so don't drop work spuriously and spam the
- * syslog:
- */
-#ifdef CONFIG_PREEMPT_RT
-# define PASS_LIMIT	1000000
-#else
-# define PASS_LIMIT	512
-#endif
+#define PASS_LIMIT	512
 
 #include <asm/serial.h>
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT"
  2020-04-23 16:20         ` [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT" Sebastian Andrzej Siewior
@ 2020-04-23 16:21           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-23 16:21 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On 2020-04-23 18:20:04 [+0200], To Jiri Kosina wrote:
> There is no need to loop for longer. The message of too much work was
> removed in commit
>     9d7c249a1ef9b ("serial: 8250: drop the printk from serial8250_interrupt()")
> 

This of course does not explain why the edge interrupt is constantly
active but one step at a time.

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 13:06       ` Steven Rostedt
@ 2020-04-23 18:34         ` Steven Rostedt
  2020-04-23 18:47           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2020-04-23 18:34 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jiri Kosina, Thomas Gleixner, LKML, linux-rt-users,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Thu, 23 Apr 2020 09:06:01 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> BTW, I haven't released a new 5.4-rt because the default one is triggering
> a large latency somewhere, and makes my tests fail. I'm trying to dig into
> the cause when I get time.

Figured it out. Seems that starting with 5.4-rt, PREEMPT_RT now depends on
CONFIG_EXPERT, which I didn't have set, and PREEMPT_RT was disabled when
using my default config.

Had to add CONFIG_EXPERT to my default config to make it work again.

Starting the tests over....

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 18:34         ` Steven Rostedt
@ 2020-04-23 18:47           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-23 18:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Kosina, Thomas Gleixner, LKML, linux-rt-users,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On 2020-04-23 14:34:37 [-0400], Steven Rostedt wrote:
> On Thu, 23 Apr 2020 09:06:01 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > BTW, I haven't released a new 5.4-rt because the default one is triggering
> > a large latency somewhere, and makes my tests fail. I'm trying to dig into
> > the cause when I get time.
> 
> Figured it out. Seems that starting with 5.4-rt, PREEMPT_RT now depends on
> CONFIG_EXPERT, which I didn't have set, and PREEMPT_RT was disabled when
> using my default config.
> 
> Had to add CONFIG_EXPERT to my default config to make it work again.

Yes, this changed upstream.

> Starting the tests over....
> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 16:07       ` Sebastian Andrzej Siewior
  2020-04-23 16:20         ` [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT" Sebastian Andrzej Siewior
@ 2020-04-23 19:48         ` Jiri Kosina
  2020-04-23 20:15           ` Steven Rostedt
  2020-04-24 19:19         ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 15+ messages in thread
From: Jiri Kosina @ 2020-04-23 19:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Thu, 23 Apr 2020, Sebastian Andrzej Siewior wrote:

> The IRQ4 is edge and in charge of ttyS0. It is handled by 
> handle_edge_irq() and after ->irq_ack(), the thread is woken up and then 
> we get another ->handle_edge_irq() for IRQ4. With larger PASS_LIMIT the 
> thread runs longer so note_interrupt() will make less IRQ_HANDLED based 
> on ->threads_handled_last. If it observes 100 handled within 100000 
> interrupts then the counters are reset again. On !RT it usually manages 
> to get >100 per 100000 interrupts so it appears good. On RT it gets less 
> and the interrupt gets disabled.
> 
> So it is not RT related, but RT triggers it more reliably (also the 
> PASS_LIMIT change can vanish). I can't tell if this is a qemu bug in 
> emulating the HW or not. I can't reproduce it real HW. I see a second 
> edge interrupt only after the thread completed. I can't tell if this is 
> because it is a real UART and the data is flowing slower or because the 
> edge-IRQ is not triggered repeatedly.

Yeah, it's all strange. In the hope of understanding the issue a little 
bit better, I tried to disable IRQs in serial8250_handle_irq() to mimic 
what !PREEMPT_RT spinlock would do; the idea was that this is some kind of 
strange race / memory ordering (missed ack?) between the threaded irq4 
handler and the do_IRQ() -> handle_edge_irq() -> ... path.

So I did this:

---
 drivers/tty/serial/8250/8250_port.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index e31217e8dce6..1a577305e174 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1813,12 +1813,13 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
 int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 {
 	unsigned char status;
-	unsigned long flags;
+	unsigned long flags, f;
 	struct uart_8250_port *up = up_to_u8250p(port);
 
 	if (iir & UART_IIR_NO_INT)
 		return 0;
 
+	local_irq_save(f);
 	spin_lock_irqsave(&port->lock, flags);
 
 	status = serial_port_in(port, UART_LSR);
@@ -1833,6 +1834,7 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 		serial8250_tx_chars(up);
 
 	uart_unlock_and_check_sysrq(port, flags);
+	local_irq_restore(f);
 	return 1;
 }

But curiously enough, that exploded in the oposite order (so first there 
was CPU stall, and only later the disabling of IRQ4 due to spurious 
storm):

[  107.308525] 000: rcu: INFO: rcu_preempt self-detected stall on CPU
[  107.308542] 000: rcu: 	0-....: (20985 ticks this GP) idle=b2a/1/0x4000000000000002 softirq=0/0 fqs=4841 
[  107.308563] 000: 	(t=21000 jiffies g=16217 q=22343)
[  107.308565] 000: NMI backtrace for cpu 0
[  107.308570] 000: CPU: 0 PID: 1213 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19+ #11
[  107.308577] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[  107.308579] 000: Call Trace:
[  107.308599] 000:  <IRQ>
[  107.308608] 000:  dump_stack+0x50/0x6b
[  107.308630] 000:  nmi_cpu_backtrace+0x8a/0x90
[  107.308639] 000:  ? lapic_can_unplug_cpu+0xd0/0xd0
[  107.308644] 000:  nmi_trigger_cpumask_backtrace+0xe9/0x130
[  107.308647] 000:  rcu_dump_cpu_stacks+0x8e/0xbc
[  107.308658] 000:  rcu_sched_clock_irq+0x70a/0xad0
[  107.308661] 000:  ? cpuacct_account_field+0x16/0x70
[  107.308667] 000:  ? tick_sched_do_timer+0x60/0x60
[  107.308675] 000:  update_process_times+0x1f/0x40
[  107.308680] 000:  tick_sched_handle+0x2f/0x40
[  107.308685] 000:  tick_sched_timer+0x3e/0x80
[  107.308687] 000:  __hrtimer_run_queues+0x11e/0x3c0
[  107.308691] 000:  hrtimer_interrupt+0xea/0x250
[  107.308694] 000:  smp_apic_timer_interrupt+0x84/0x200
[  107.308704] 000:  apic_timer_interrupt+0xf/0x20
[  107.308709] 000:  </IRQ>
[  107.308713] 000: RIP: 0010:serial8250_handle_irq.part.26+0x9e/0xd0
[  107.308719] 000: Code: 48 85 c0 74 09 80 b8 e9 00 00 00 00 74 0f 41 83 e5 20 74 09 f6 83 36 02 00 00 02 75 2d 31 f6 48 89 df e8 65 87 ff ff 41 54 9d <b8> 01 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 48 8b 40 08 48 89 df e8
[  107.308723] 000: RSP: 0018:ffffb6ffc0a63e20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  107.308726] 000: RAX: ffff90b23734e040 RBX: ffffffff9f5a0de0 RCX: 0000000000000000
[  107.308728] 000: RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffffff9f5a0de0
[  107.308729] 000: RBP: 0000000000000060 R08: ffffffff9ea0b7a0 R09: ffffb6ffc0a63d78
[  107.308730] 000: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000246
[  107.308731] 000: R13: 0000000000000020 R14: 00000000000000c2 R15: ffff90b2ba711e58
[  107.308739] 000:  serial8250_default_handle_irq+0x45/0x50
[  107.308741] 000:  serial8250_interrupt+0x51/0xa0
[  107.308749] 000:  ? irq_finalize_oneshot.part.46+0xd0/0xd0
[  107.308755] 000:  irq_forced_thread_fn+0x2b/0x70
[  107.308758] 000:  irq_thread+0xd8/0x170
[  107.308760] 000:  ? wake_threads_waitq+0x30/0x30
[  107.308761] 000:  ? irq_thread_dtor+0x90/0x90
[  107.308763] 000:  kthread+0x10e/0x130
[  107.308769] 000:  ? kthread_park+0x80/0x80
[  107.308770] 000:  ret_from_fork+0x35/0x40
[  165.360370] 000: irq 4: nobody cared (try booting with the "irqpoll" option)
[  165.360392] 000: CPU: 0 PID: 1213 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19+ #11
[  165.360398] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[  165.360599] 000: Call Trace:
[  165.361639] 000:  <IRQ>
[  165.361651] 000:  dump_stack+0x50/0x6b
[  165.361670] 000:  __report_bad_irq+0x2b/0xb0
[  165.361678] 000:  note_interrupt+0x22e/0x280
[  165.361680] 000:  ? serial8250_handle_irq.part.26+0x9e/0xd0
[  165.361685] 000:  handle_irq_event_percpu+0x6e/0x90
[  165.361693] 000:  handle_irq_event+0x48/0x90
[  165.361695] 000:  handle_edge_irq+0x95/0x1f0
[  165.361700] 000:  do_IRQ+0x6c/0x120
[  165.361709] 000:  common_interrupt+0xf/0xf
[  165.361719] 000:  </IRQ>
[  165.361720] 000: RIP: 0010:serial8250_handle_irq.part.26+0x9e/0xd0
[  165.361727] 000: Code: 48 85 c0 74 09 80 b8 e9 00 00 00 00 74 0f 41 83 e5 20 74 09 f6 83 36 02 00 00 02 75 2d 31 f6 48 89 df e8 65 87 ff ff 41 54 9d <b8> 01 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 48 8b 40 08 48 89 df e8
[  165.361731] 000: RSP: 0018:ffffb6ffc0a63e20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[  165.361732] 000: RAX: ffff90b23734e040 RBX: ffffffff9f5a0de0 RCX: 0000000000000001
[  165.361736] 000: RDX: 0000000000000003 RSI: 0000000000000097 RDI: 00000000ffffffff
[  165.361737] 000: RBP: 0000000000000060 R08: ffffffff9ea0b7a0 R09: ffffb6ffc0a63d78
[  165.361739] 000: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000246
[  165.361740] 000: R13: 0000000000000020 R14: 00000000000000c2 R15: ffff90b2ba711e58
[  165.361749] 000:  serial8250_default_handle_irq+0x45/0x50
[  165.361752] 000:  serial8250_interrupt+0x51/0xa0
[  165.361759] 000:  ? irq_finalize_oneshot.part.46+0xd0/0xd0
[  165.361762] 000:  irq_forced_thread_fn+0x2b/0x70
[  165.361767] 000:  irq_thread+0xd8/0x170
[  165.361771] 000:  ? wake_threads_waitq+0x30/0x30
[  165.361773] 000:  ? irq_thread_dtor+0x90/0x90
[  165.361775] 000:  kthread+0x10e/0x130
[  165.361780] 000:  ? kthread_park+0x80/0x80
[  165.361782] 000:  ret_from_fork+0x35/0x40
[  165.361787] 000: handlers:
[  165.361793] 000: [<00000000005f10ab>] irq_default_primary_handler threaded [<00000000853a4a7e>] serial8250_interrupt
[  165.361795] 000: Disabling IRQ #4

So this time the storm + stall apparently was triggered by the 'popf' 
coming from my added local_irq_restore().

It's really curious.

I'd be happy to blame it on some qemu breakage, but it seems to be 
happening on any randomly chosen version so far.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 19:48         ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
@ 2020-04-23 20:15           ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2020-04-23 20:15 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, LKML, linux-rt-users,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Thu, 23 Apr 2020 21:48:20 +0200 (CEST)
Jiri Kosina <jikos@kernel.org> wrote:

> On Thu, 23 Apr 2020, Sebastian Andrzej Siewior wrote:
> 
> > The IRQ4 is edge and in charge of ttyS0. It is handled by 
> > handle_edge_irq() and after ->irq_ack(), the thread is woken up and then 
> > we get another ->handle_edge_irq() for IRQ4. With larger PASS_LIMIT the 
> > thread runs longer so note_interrupt() will make less IRQ_HANDLED based 
> > on ->threads_handled_last. If it observes 100 handled within 100000 
> > interrupts then the counters are reset again. On !RT it usually manages 
> > to get >100 per 100000 interrupts so it appears good. On RT it gets less 
> > and the interrupt gets disabled.
> > 
> > So it is not RT related, but RT triggers it more reliably (also the 
> > PASS_LIMIT change can vanish). I can't tell if this is a qemu bug in 
> > emulating the HW or not. I can't reproduce it real HW. I see a second 
> > edge interrupt only after the thread completed. I can't tell if this is 
> > because it is a real UART and the data is flowing slower or because the 
> > edge-IRQ is not triggered repeatedly.  
> 
> Yeah, it's all strange. In the hope of understanding the issue a little 
> bit better, I tried to disable IRQs in serial8250_handle_irq() to mimic 
> what !PREEMPT_RT spinlock would do; the idea was that this is some kind of 
> strange race / memory ordering (missed ack?) between the threaded irq4 
> handler and the do_IRQ() -> handle_edge_irq() -> ... path.
> 
> So I did this:
> 
> ---
>  drivers/tty/serial/8250/8250_port.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index e31217e8dce6..1a577305e174 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -1813,12 +1813,13 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
>  int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
>  {
>  	unsigned char status;
> -	unsigned long flags;
> +	unsigned long flags, f;
>  	struct uart_8250_port *up = up_to_u8250p(port);
>  
>  	if (iir & UART_IIR_NO_INT)
>  		return 0;
>  
> +	local_irq_save(f);
>  	spin_lock_irqsave(&port->lock, flags);

Note, this would break if there ever was a contention, as the spin lock is
a mutex and would sleep. And we don't want to do that with interrupts
disabled!

>  
>  	status = serial_port_in(port, UART_LSR);
> @@ -1833,6 +1834,7 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
>  		serial8250_tx_chars(up);
>  
>  	uart_unlock_and_check_sysrq(port, flags);
> +	local_irq_restore(f);
>  	return 1;
>  }
> 
> But curiously enough, that exploded in the oposite order (so first there 
> was CPU stall, and only later the disabling of IRQ4 due to spurious 
> storm):


Now what may be interesting to try is to enable tracing and
ftrace_dump_on_opps, and set panic_on_warning, as well as
traceoff_on_warning.


 # echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
 # echo 1 > /proc/sys/kernel/panic_on_warn
 # echo 1 > /proc/sys/kernel/traceoff_on_warning
 # echo 1 > /sys/kernel/tracing/events/enable

Enabling all events will include interrupt events and wake ups, and perhaps
give you an idea what interrupts are happening after the uart thread is
woken.

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-23 16:07       ` Sebastian Andrzej Siewior
  2020-04-23 16:20         ` [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT" Sebastian Andrzej Siewior
  2020-04-23 19:48         ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
@ 2020-04-24 19:19         ` Sebastian Andrzej Siewior
  2020-04-24 20:54           ` Jiri Kosina
  2 siblings, 1 reply; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-24 19:19 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On 2020-04-23 18:07:07 [+0200], To Jiri Kosina wrote:
> The IRQ4 is edge and in charge of ttyS0. It is handled by
> handle_edge_irq() and after ->irq_ack(), the thread is woken up and then
> we get another ->handle_edge_irq() for IRQ4. With larger PASS_LIMIT the
> thread runs longer so note_interrupt() will make less IRQ_HANDLED based
> on ->threads_handled_last. If it observes 100 handled within 100000
> interrupts then the counters are reset again. On !RT it usually manages
> to get >100 per 100000 interrupts so it appears good. On RT it gets less
> and the interrupt gets disabled.

so the EDGE interrupt shouldn't retrigger unless something happened that
*retriggers* the situation. I think the problem is that qemu is too
fast. Based on tracing: The IRQ happens, the threaded-handler fires up.
The threaded handler writes something into the TX FIFO. QEMU probably
transmits that byte immediately so the FIFO is empty again *and* another
interrupt is fired. It makes sense because I see one one byte written to
the FIFO followed by an interrupt, followed by another byte, followed by
an interrupt, ….
This does not happen on real hardware because it takes some time to
transmit the data so.

I already dropped PASS_LIMIT from the RT queue so it might not trigger
again. I think I'm going to make the chain-handler depend on ISA so it
the thread is invoked more often.
According to the comment it is/was required to handle some ISA issues.
I'm not sure if back then there was no SA_SHARED handling in the IRQ
core was limited or $REASON. As of today I don't see a reason to keep it
for !ISA hardware. On Debian CONFIG_ISA_BUS is disabled, I guess the
same is true for SUSE.

And with the hunk at the bottom it shouldn't trigger as well. So
PASS_LIMIT and no custom handler is duct tape. This really stops the IRQ
from being generated.

diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index e31217e8dce62..eadb12bf0e90b 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1741,6 +1741,7 @@ void serial8250_tx_chars(struct uart_8250_port *up)
 	}
 
 	count = up->tx_loadsz;
+	serial8250_clear_THRI(up);
 	do {
 		serial_out(up, UART_TX, xmit->buf[xmit->tail]);
 		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
@@ -1755,6 +1756,7 @@ void serial8250_tx_chars(struct uart_8250_port *up)
 		    !(serial_in(up, UART_LSR) & UART_LSR_THRE))
 			break;
 	} while (--count > 0);
+	serial8250_set_THRI(up);
 
 	if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
 		uart_write_wakeup(port);
 
Sebastian

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-24 19:19         ` Sebastian Andrzej Siewior
@ 2020-04-24 20:54           ` Jiri Kosina
  2020-04-27  9:17             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Kosina @ 2020-04-24 20:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On Fri, 24 Apr 2020, Sebastian Andrzej Siewior wrote:

> And with the hunk at the bottom it shouldn't trigger as well. So
> PASS_LIMIT and no custom handler is duct tape. This really stops the IRQ
> from being generated.
> 
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index e31217e8dce62..eadb12bf0e90b 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -1741,6 +1741,7 @@ void serial8250_tx_chars(struct uart_8250_port *up)
>  	}
>  
>  	count = up->tx_loadsz;
> +	serial8250_clear_THRI(up);
>  	do {
>  		serial_out(up, UART_TX, xmit->buf[xmit->tail]);
>  		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
> @@ -1755,6 +1756,7 @@ void serial8250_tx_chars(struct uart_8250_port *up)
>  		    !(serial_in(up, UART_LSR) & UART_LSR_THRE))
>  			break;
>  	} while (--count > 0);
> +	serial8250_set_THRI(up);
>  
>  	if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
>  		uart_write_wakeup(port);

It's still wonky; with the two hunks above on top of 5.6.4-rt3 (that's 
without the PASS_LIMIT adjustment) flooding the emulated serial console 
still emits the splat below.

Looking at this:

Code: 5b c3 48 8b 47 08 48 03 04 d5 c0 26 4d a1 48 89 c3 48 8b 00 f6 c4 02 
75 09 50 9d e8 19 dd 5e ff eb a1 e8 f2 db 5e ff ff 33 9d <eb> 97 66 66 2e 
0f 1f 84 00 00 00 00 00 66 90 55 53 31 ed 48 8b 1f
All code
========
   0:	5b                   	pop    %rbx
   1:	c3                   	retq   
   2:	48 8b 47 08          	mov    0x8(%rdi),%rax
   6:	48 03 04 d5 c0 26 4d 	add    -0x5eb2d940(,%rdx,8),%rax
   d:	a1 
   e:	48 89 c3             	mov    %rax,%rbx
  11:	48 8b 00             	mov    (%rax),%rax
  14:	f6 c4 02             	test   $0x2,%ah
  17:	75 09                	jne    0x22
  19:	50                   	push   %rax
  1a:	9d                   	popfq  
  1b:	e8 19 dd 5e ff       	callq  0xffffffffff5edd39
  20:	eb a1                	jmp    0xffffffffffffffc3
  22:	e8 f2 db 5e ff       	callq  0xffffffffff5edc19
  27:	ff 33                	pushq  (%rbx)
  29:	9d                   	popfq  
  2a:*	eb 97                	jmp    0xffffffffffffffc3		<-- trapping instruction
  2c:	66 66 2e 0f 1f 84 00 	data16 nopw %cs:0x0(%rax,%rax,1)

So now the endless interrupt storm comes at a different point -- exactly 
once IRQs get re-enabled in prb_unlock(). How we reach prb_unlock() from 
serial8250_tx_chars() I still have to understand. Worth involving John?

Thanks.


====== cut here ========
[   75.286440] 000: rcu: INFO: rcu_preempt self-detected stall on CPU
[   75.286533] 000: rcu: 	0-....: (1 GPs behind) idle=94a/1/0x4000000000000002 softirq=0/0 fqs=5167 
[   75.286556] 000: 	(t=21000 jiffies g=15213 q=25248)
[   75.286557] 000: NMI backtrace for cpu 0
[   75.286567] 000: CPU: 0 PID: 1209 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19-00003-g5cf51e8702ad #16
[   75.286570] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[   75.286578] 000: Call Trace:
[   75.288543] 000:  <IRQ>
[   75.288557] 000:  dump_stack+0x71/0x9b
[   75.288574] 000:  nmi_cpu_backtrace+0x8a/0x90
[   75.288578] 000:  ? lapic_can_unplug_cpu+0xd0/0xd0
[   75.288585] 000:  nmi_trigger_cpumask_backtrace+0xe9/0x130
[   75.288587] 000:  rcu_dump_cpu_stacks+0x93/0xcd
[   75.288592] 000:  rcu_sched_clock_irq+0x70f/0xae0
[   75.288595] 000:  ? cpuacct_account_field+0x1b/0x70
[   75.288603] 000:  ? tick_sched_do_timer+0x60/0x60
[   75.288607] 000:  update_process_times+0x24/0x50
[   75.288610] 000:  tick_sched_handle+0x34/0x50
[   75.288612] 000:  tick_sched_timer+0x43/0x80
[   75.288617] 000:  __hrtimer_run_queues+0x123/0x3c0
[   75.288620] 000:  hrtimer_interrupt+0xef/0x250
[   75.288622] 000:  smp_apic_timer_interrupt+0x89/0x200
[   75.288631] 000:  apic_timer_interrupt+0xf/0x20
[   75.288636] 000:  </IRQ>
[   75.288637] 000: RIP: 0010:prb_unlock+0x71/0x80
[   75.288642] 000: Code: 5b c3 48 8b 47 08 48 03 04 d5 c0 26 4d a1 48 89 c3 48 8b 00 f6 c4 02 75 09 50 9d e8 19 dd 5e ff eb a1 e8 f2 db 5e ff ff 33 9d <eb> 97 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 55 53 31 ed 48 8b 1f
[   75.288648] 000: RSP: 0018:ffffbc8cc08cbde0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
[   75.288649] 000: RAX: 0000000000000000 RBX: ffff9dd2fcc18a48 RCX: 0000000000000000
[   75.288650] 000: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0d7220e
[   75.288651] 000: RBP: ffff9dd2f8878000 R08: 0000000000000001 R09: 0000000000000001
[   75.288652] 000: R10: 0000000000000001 R11: 0000000000292836 R12: 0000000000000000
[   75.288653] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffff9dd276c10078
[   75.288656] 000:  ? prb_unlock+0x6e/0x80
[   75.288658] 000:  serial8250_tx_chars+0x3b5/0x410
[   75.288663] 000:  serial8250_handle_irq.part.26+0xc0/0xd0
[   75.288665] 000:  serial8250_default_handle_irq+0x4a/0x60
[   75.288667] 000:  serial8250_interrupt+0x56/0xa0
[   75.288669] 000:  ? irq_finalize_oneshot.part.46+0xe0/0xe0
[   75.288671] 000:  irq_forced_thread_fn+0x30/0x80
[   75.288673] 000:  irq_thread+0xdd/0x180
[   75.288675] 000:  ? wake_threads_waitq+0x30/0x30
[   75.288676] 000:  ? irq_thread_dtor+0x90/0x90
[   75.288677] 000:  kthread+0x113/0x130
[   75.288680] 000:  ? kthread_park+0x90/0x90
[   75.288682] 000:  ret_from_fork+0x3a/0x50
[  134.432670] 000: irq 4: nobody cared (try booting with the "irqpoll" option)
[  134.432685] 000: CPU: 0 PID: 1209 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19-00003-g5cf51e8702ad #16
[  134.432690] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[  134.432696] 000: Call Trace:
[  134.432715] 000:  <IRQ>
[  134.432726] 000:  dump_stack+0x71/0x9b
[  134.432743] 000:  __report_bad_irq+0x30/0xc0
[  134.432748] 000:  note_interrupt+0x233/0x280
[  134.432752] 000:  ? prb_unlock+0x71/0x80
[  134.432756] 000:  handle_irq_event_percpu+0x73/0xa0
[  134.432758] 000:  handle_irq_event+0x4d/0x90
[  134.432760] 000:  handle_edge_irq+0x9a/0x1f0
[  134.432764] 000:  do_IRQ+0x71/0x120
[  134.432771] 000:  common_interrupt+0xf/0xf
[  134.432776] 000:  </IRQ>
[  134.432777] 000: RIP: 0010:prb_unlock+0x71/0x80
[  134.432782] 000: Code: 5b c3 48 8b 47 08 48 03 04 d5 c0 26 4d a1 48 89 c3 48 8b 00 f6 c4 02 75 09 50 9d e8 19 dd 5e ff eb a1 e8 f2 db 5e ff ff 33 9d <eb> 97 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 55 53 31 ed 48 8b 1f
[  134.432785] 000: RSP: 0018:ffffbc8cc08cbde0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffdb
[  134.432786] 000: RAX: 0000000000000000 RBX: ffff9dd2fcc18a48 RCX: 0000000000000000
[  134.432790] 000: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0d7220e
[  134.432791] 000: RBP: ffff9dd2f8878000 R08: 0000000000000001 R09: ffffbc8cc08cbd80
[  134.432792] 000: R10: 0000000000000001 R11: 000000000000010e R12: 0000000000000000
[  134.432795] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffff9dd276c10078
[  134.432796] 000:  ? prb_unlock+0x6e/0x80
[  134.432799] 000:  serial8250_tx_chars+0x3b5/0x410
[  134.432804] 000:  serial8250_handle_irq.part.26+0xc0/0xd0
[  134.432809] 000:  serial8250_default_handle_irq+0x4a/0x60
[  134.432810] 000:  serial8250_interrupt+0x56/0xa0
[  134.432813] 000:  ? irq_finalize_oneshot.part.46+0xe0/0xe0
[  134.432814] 000:  irq_forced_thread_fn+0x30/0x80
[  134.432816] 000:  irq_thread+0xdd/0x180
[  134.432817] 000:  ? wake_threads_waitq+0x30/0x30
[  134.432819] 000:  ? irq_thread_dtor+0x90/0x90
[  134.432819] 000:  kthread+0x113/0x130
[  134.432823] 000:  ? kthread_park+0x90/0x90
[  134.432824] 000:  ret_from_fork+0x3a/0x50
[  134.432827] 000: handlers:
[  134.432832] 000: [<0000000099290260>] irq_default_primary_handler threaded [<00000000733fcb60>] serial8250_interrupt
[  134.432833] 000: Disabling IRQ #4
[  138.287552] 000: rcu: INFO: rcu_preempt self-detected stall on CPU
[  138.287561] 000: rcu: 	0-....: (1 GPs behind) idle=94a/1/0x4000000000000002 softirq=0/0 fqs=20687 
[  138.287565] 000: 	(t=84003 jiffies g=15213 q=90211)
[  138.287566] 000: NMI backtrace for cpu 0
[  138.287572] 000: CPU: 0 PID: 1209 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19-00003-g5cf51e8702ad #16
[  138.287573] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
[  138.287578] 000: Call Trace:
[  138.287590] 000:  <IRQ>
[  138.287598] 000:  dump_stack+0x71/0x9b
[  138.287607] 000:  nmi_cpu_backtrace+0x8a/0x90
[  138.287609] 000:  ? lapic_can_unplug_cpu+0xd0/0xd0
[  138.287612] 000:  nmi_trigger_cpumask_backtrace+0xe9/0x130
[  138.287616] 000:  rcu_dump_cpu_stacks+0x93/0xcd
[  138.287620] 000:  rcu_sched_clock_irq+0x70f/0xae0
[  138.287622] 000:  ? cpuacct_account_field+0x1b/0x70
[  138.287627] 000:  ? tick_sched_do_timer+0x60/0x60
[  138.287632] 000:  update_process_times+0x24/0x50
[  138.287636] 000:  tick_sched_handle+0x34/0x50
[  138.287638] 000:  tick_sched_timer+0x43/0x80
[  138.287639] 000:  __hrtimer_run_queues+0x123/0x3c0
[  138.287644] 000:  hrtimer_interrupt+0xef/0x250
[  138.287646] 000:  smp_apic_timer_interrupt+0x89/0x200
[  138.287650] 000:  apic_timer_interrupt+0xf/0x20
[  138.287652] 000:  </IRQ>
[  138.287655] 000: RIP: 0010:prb_unlock+0x71/0x80
[  138.287659] 000: Code: 5b c3 48 8b 47 08 48 03 04 d5 c0 26 4d a1 48 89 c3 48 8b 00 f6 c4 02 75 09 50 9d e8 19 dd 5e ff eb a1 e8 f2 db 5e ff ff 33 9d <eb> 97 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 55 53 31 ed 48 8b 1f
[  138.287664] 000: RSP: 0018:ffffbc8cc08cbde0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
[  138.287666] 000: RAX: 0000000000000000 RBX: ffff9dd2fcc18a48 RCX: 0000000000000000
[  138.287667] 000: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0d7220e
[  138.287668] 000: RBP: ffff9dd2f8878000 R08: 0000000000000001 R09: ffffbc8cc08cbd80
[  138.287669] 000: R10: 0000000000000001 R11: 00000000000003e1 R12: 0000000000000010
[  138.287670] 000: R13: 0000000000000020 R14: 0000000000000001 R15: ffff9dd276c10078
[  138.287671] 000:  ? prb_unlock+0x6e/0x80
[  138.287673] 000:  serial8250_tx_chars+0x384/0x410
[  138.287675] 000:  serial8250_handle_irq.part.26+0xc0/0xd0
[  138.287677] 000:  serial8250_default_handle_irq+0x4a/0x60
[  138.287679] 000:  serial8250_interrupt+0x56/0xa0
[  138.287680] 000:  ? irq_finalize_oneshot.part.46+0xe0/0xe0
[  138.287682] 000:  irq_forced_thread_fn+0x30/0x80
[  138.287683] 000:  irq_thread+0xdd/0x180
[  138.287684] 000:  ? wake_threads_waitq+0x30/0x30
[  138.287685] 000:  ? irq_thread_dtor+0x90/0x90
[  138.287686] 000:  kthread+0x113/0x130
[  138.287688] 000:  ? kthread_park+0x90/0x90
[  138.287689] 000:  ret_from_fork+0x3a/0x50
====== cut here ========

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19)
  2020-04-24 20:54           ` Jiri Kosina
@ 2020-04-27  9:17             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-04-27  9:17 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt,
	Frederic Weisbecker, Matt Fleming, Daniel Wagner

On 2020-04-24 22:54:51 [+0200], Jiri Kosina wrote:
> It's still wonky; with the two hunks above on top of 5.6.4-rt3 (that's 
> without the PASS_LIMIT adjustment) flooding the emulated serial console 
> still emits the splat below.
> So now the endless interrupt storm comes at a different point -- exactly 
> once IRQs get re-enabled in prb_unlock(). How we reach prb_unlock() from 
> serial8250_tx_chars() I still have to understand. Worth involving John?

My guess is that it is unrelated and it is simply code that
disabled/enabled interrupts at the time the NMI was was triggered.

…
> [   75.286440] 000: rcu: INFO: rcu_preempt self-detected stall on CPU
> [   75.286533] 000: rcu: 	0-....: (1 GPs behind) idle=94a/1/0x4000000000000002 softirq=0/0 fqs=5167 
> [   75.286556] 000: 	(t=21000 jiffies g=15213 q=25248)

a RCU stall but it is only one GP behind :)
My guess here would be that simply we never had the opportunity to
perform perform a GP callbacks and nobody entered a RCU critical section
(we were busy printing on the console the whole time).

So a dummy RCU section like this:

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index f61f6f5426eff..5636123a90580 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1145,6 +1145,8 @@ static int irq_thread(void *data)
 		migrate_enable();
 #endif
 		wake_threads_waitq(desc);
+		rcu_read_lock();
+		rcu_read_unlock();
 	}
 
 	/*


should work given the RCU-boosting is enabled.

…
> [  134.432670] 000: irq 4: nobody cared (try booting with the "irqpoll" option)
> [  134.432685] 000: CPU: 0 PID: 1209 Comm: irq/4-ttyS0 Not tainted 5.6.4-rt19-00003-g5cf51e8702ad #16
> [  134.432690] 000: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014

yeah. again. I'm not sure if this is good or bad. The threaded
IRQ-handler runs constantly in this scenario. The core code *thinks*
that the handler makes no progress or is stuck and so it disables it.
It is not so far fetched. It wouldn't happen on real hardware actual HW
would take more time and so not "stuck" in the handler endlessly.

In networking, we would have NAPI which then pushes the driver to the
ksofitrqd which runs at SCHED_OTHER while here the IRQ thread runs at a
RT priority. I don't think we should add something like this to the 8250
driver to deal with the situation.

Sebastian

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-04-27  9:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30 14:47 [ANNOUNCE] v5.4.28-rt19 Sebastian Andrzej Siewior
2020-04-23  8:51 ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
2020-04-23  9:12   ` Jiri Kosina
2020-04-23 10:45     ` Sebastian Andrzej Siewior
2020-04-23 13:06       ` Steven Rostedt
2020-04-23 18:34         ` Steven Rostedt
2020-04-23 18:47           ` Sebastian Andrzej Siewior
2020-04-23 16:07       ` Sebastian Andrzej Siewior
2020-04-23 16:20         ` [PATCH RT] Revert "rt: Improve the serial console PASS_LIMIT" Sebastian Andrzej Siewior
2020-04-23 16:21           ` Sebastian Andrzej Siewior
2020-04-23 19:48         ` [PREEMPT_RT] 8250 IRQ lockup when flooding serial console (was Re: [ANNOUNCE] v5.4.28-rt19) Jiri Kosina
2020-04-23 20:15           ` Steven Rostedt
2020-04-24 19:19         ` Sebastian Andrzej Siewior
2020-04-24 20:54           ` Jiri Kosina
2020-04-27  9:17             ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).