All of lore.kernel.org
 help / color / mirror / Atom feed
* stall/hang on 4.15 kernel with some Xeon skylake CPUs and extended APIC
@ 2018-05-01 16:37 Rick Warner
  2018-05-15 16:07 ` [bisected] rcu_sched detected stalls - 4.15 or newer " Rick Warner
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Warner @ 2018-05-01 16:37 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 4178 bytes --]

Hi All,

I've discovered that some new Supermicro skylake systems will hang/stall
while booting the 4.15 kernel when extended APIC (x2apic) is enabled in
the BIOS. The issue happens on specific CPUs only and follows the CPUs.

We had (4) quad socket systems with Xeon 6134 CPUs; 2 out of 4 were
exhibiting this behavior.  We replaced 2 CPUs at that time and the
behavior was eliminated. Those systems were then shipped to our customer
(we are an HPC system integrator).

Now, we have 5 single socket systems with 5122 CPUs.  2 out of the 5 are
hanging.  If we swap the CPUs from the hanging systems with working
systems, the behavior follows the CPU.

I've done a git bisect between 4.14 and 4.15 and found this commit is
triggering the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=023a611748fd58d46c8aa049cf4f22ebada983f5

Some of the commits right before it also seemed to trigger this warning:
[    5.062563] Debug warning: early ioremap leak of 1 areas detected.
               please boot with early_ioremap_debug and report the dmesg.

I have a dmesg log of 1 commit prior to the referenced link with
early_ioremap_debug enabled if it is desired.

The latest git still has the issue.

I've attached a dmesg log captured via serial console from a system
exhibiting this problem.  Here is an excerpt from it where the problems
start:

ACPI: Added _OSI(Module Device)
ACPI: Added _OSI(Processor Device)
ACPI: Added _OSI(3.0 _SCP Extensions)
ACPI: Added _OSI(Processor Aggregator Device)
ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
INFO: rcu_sched self-detected stall on CPU
        34-....: (14997 ticks this GP) idle=b3e/140000000000001/0
softirq=18/18 fqs=7497
INFO: rcu_sched detected stalls on CPUs/tasks:

        34-....: (14997 ticks this GP) idle=b3e/140000000000001/0
softirq=18/18 fqs=7498
 (t=15002 jiffies g=-294 c=-295 q=391)
        (detected by 0, t=15002 jiffies, g=-294, c=-295, q=391)
NMI backtrace for cpu 34
CPU: 34 PID: 1 Comm: swapper/0 Not tainted 4.15.7-gentoo-r1-netuno-x86_64 #4
Hardware name: Supermicro SYS-2049U-TR4/X11QPH+, BIOS 2.0c 02/23/2018
Call Trace:
 <IRQ>
 dump_stack+0x5d/0x79
 nmi_cpu_backtrace+0x94/0xae
 ? irq_force_complete_move+0x6f/0x6f
 nmi_trigger_cpumask_backtrace+0x56/0xd3
 rcu_dump_cpu_stacks+0x96/0xc0
 rcu_check_callbacks+0x285/0x697
 update_process_times+0x28/0x4a
 tick_handle_periodic+0x20/0x5f
 smp_apic_timer_interrupt+0x93/0xf9
 apic_timer_interrupt+0x7d/0x90
 </IRQ>
RIP: 0010:smp_call_function_many+0x1f1/0x204
RSP: 0000:ffffc900000f3af0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
RAX: 0000000000000001 RBX: ffff880c110a0488 RCX: 0000000000000001
RDX: ffff880c10e64440 RSI: 0000000000000000 RDI: ffff880c110a0488
RBP: ffff880c110a0480 R08: fffffffffffffffe R09: 0000000000000003
R10: 0000000000000000 R11: ffffea00c03c1a60 R12: 0000000000000001
R13: ffff880c110a04b8 R14: 0000000000020440 R15: ffffffff81ed5400
 ? slub_cpu_dead+0xa0/0xa0
 ? slub_cpu_dead+0xa0/0xa0
 ? __mmu_notifier_mm_destroy+0x32/0x32
 on_each_cpu_mask+0x23/0x53
 ? slub_cpu_dead+0xa0/0xa0
 on_each_cpu_cond+0x7c/0x8b
 __kmem_cache_shrink+0x3c/0x237
 ? acpi_ps_delete_parse_tree+0x2d/0x59
 ? set_debug_rodata+0x11/0x11
 ? acpi_os_purge_cache+0xa/0xd
 acpi_os_purge_cache+0xa/0xd
 acpi_purge_cached_objects+0x29/0x38
 acpi_initialize_objects+0x46/0x4f
 ? acpi_sleep_init+0xd6/0xd6
 acpi_init+0xb6/0x324
 ? scan_for_dmi_ipmi+0x15/0xec
 ? acpi_sleep_init+0xd6/0xd6
 do_one_initcall+0x89/0x128
 ? set_debug_rodata+0x11/0x11
 ? set_debug_rodata+0x11/0x11
 kernel_init_freeable+0x112/0x18e
 ? rest_init+0xaa/0xaa
 kernel_init+0xa/0xf0
 ret_from_fork+0x35/0x40

The NMI dump info repeats periodically after that but never progresses
further.

If any other information is needed, please let me know.  I've reported
this issue to Supermicro already and they believe it is an issue with
the kernel opposed to an issue specific to their systems.  I don't have
any other brand Xeon skylake systems with extended APIC support that I
can try this with.

Thanks,
Rick


Richard Warner
Chief Technology Officer
Microway, Inc

[-- Attachment #2: dmesg-hang-with-extended-APIC-enabled.txt.gz --]
[-- Type: application/gzip, Size: 7055 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC
  2018-05-01 16:37 stall/hang on 4.15 kernel with some Xeon skylake CPUs and extended APIC Rick Warner
@ 2018-05-15 16:07 ` Rick Warner
  2018-05-15 20:19   ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Warner @ 2018-05-15 16:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Thomas Gleixner

Hi All,

Does anyone have ideas on this?  Is there any other data I can provide 
to help debug this?

Thanks,
Rick

On 05/01/2018 12:37 PM, Rick Warner wrote:
> Hi All,
>
> I've discovered that some new Supermicro skylake systems will hang/stall
> while booting the 4.15 kernel when extended APIC (x2apic) is enabled in
> the BIOS. The issue happens on specific CPUs only and follows the CPUs.
>
> We had (4) quad socket systems with Xeon 6134 CPUs; 2 out of 4 were
> exhibiting this behavior.  We replaced 2 CPUs at that time and the
> behavior was eliminated. Those systems were then shipped to our customer
> (we are an HPC system integrator).
>
> Now, we have 5 single socket systems with 5122 CPUs.  2 out of the 5 are
> hanging.  If we swap the CPUs from the hanging systems with working
> systems, the behavior follows the CPU.
>
> I've done a git bisect between 4.14 and 4.15 and found this commit is
> triggering the issue:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=023a611748fd58d46c8aa049cf4f22ebada983f5
>
> Some of the commits right before it also seemed to trigger this warning:
> [    5.062563] Debug warning: early ioremap leak of 1 areas detected.
>                 please boot with early_ioremap_debug and report the dmesg.
>
> I have a dmesg log of 1 commit prior to the referenced link with
> early_ioremap_debug enabled if it is desired.
>
> The latest git still has the issue.
>
> I've attached a dmesg log captured via serial console from a system
> exhibiting this problem.  Here is an excerpt from it where the problems
> start:
>
> ACPI: Added _OSI(Module Device)
> ACPI: Added _OSI(Processor Device)
> ACPI: Added _OSI(3.0 _SCP Extensions)
> ACPI: Added _OSI(Processor Aggregator Device)
> ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
> INFO: rcu_sched self-detected stall on CPU
>          34-....: (14997 ticks this GP) idle=b3e/140000000000001/0
> softirq=18/18 fqs=7497
> INFO: rcu_sched detected stalls on CPUs/tasks:
>
>          34-....: (14997 ticks this GP) idle=b3e/140000000000001/0
> softirq=18/18 fqs=7498
>   (t=15002 jiffies g=-294 c=-295 q=391)
>          (detected by 0, t=15002 jiffies, g=-294, c=-295, q=391)
> NMI backtrace for cpu 34
> CPU: 34 PID: 1 Comm: swapper/0 Not tainted 4.15.7-gentoo-r1-netuno-x86_64 #4
> Hardware name: Supermicro SYS-2049U-TR4/X11QPH+, BIOS 2.0c 02/23/2018
> Call Trace:
>   <IRQ>
>   dump_stack+0x5d/0x79
>   nmi_cpu_backtrace+0x94/0xae
>   ? irq_force_complete_move+0x6f/0x6f
>   nmi_trigger_cpumask_backtrace+0x56/0xd3
>   rcu_dump_cpu_stacks+0x96/0xc0
>   rcu_check_callbacks+0x285/0x697
>   update_process_times+0x28/0x4a
>   tick_handle_periodic+0x20/0x5f
>   smp_apic_timer_interrupt+0x93/0xf9
>   apic_timer_interrupt+0x7d/0x90
>   </IRQ>
> RIP: 0010:smp_call_function_many+0x1f1/0x204
> RSP: 0000:ffffc900000f3af0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
> RAX: 0000000000000001 RBX: ffff880c110a0488 RCX: 0000000000000001
> RDX: ffff880c10e64440 RSI: 0000000000000000 RDI: ffff880c110a0488
> RBP: ffff880c110a0480 R08: fffffffffffffffe R09: 0000000000000003
> R10: 0000000000000000 R11: ffffea00c03c1a60 R12: 0000000000000001
> R13: ffff880c110a04b8 R14: 0000000000020440 R15: ffffffff81ed5400
>   ? slub_cpu_dead+0xa0/0xa0
>   ? slub_cpu_dead+0xa0/0xa0
>   ? __mmu_notifier_mm_destroy+0x32/0x32
>   on_each_cpu_mask+0x23/0x53
>   ? slub_cpu_dead+0xa0/0xa0
>   on_each_cpu_cond+0x7c/0x8b
>   __kmem_cache_shrink+0x3c/0x237
>   ? acpi_ps_delete_parse_tree+0x2d/0x59
>   ? set_debug_rodata+0x11/0x11
>   ? acpi_os_purge_cache+0xa/0xd
>   acpi_os_purge_cache+0xa/0xd
>   acpi_purge_cached_objects+0x29/0x38
>   acpi_initialize_objects+0x46/0x4f
>   ? acpi_sleep_init+0xd6/0xd6
>   acpi_init+0xb6/0x324
>   ? scan_for_dmi_ipmi+0x15/0xec
>   ? acpi_sleep_init+0xd6/0xd6
>   do_one_initcall+0x89/0x128
>   ? set_debug_rodata+0x11/0x11
>   ? set_debug_rodata+0x11/0x11
>   kernel_init_freeable+0x112/0x18e
>   ? rest_init+0xaa/0xaa
>   kernel_init+0xa/0xf0
>   ret_from_fork+0x35/0x40
>
> The NMI dump info repeats periodically after that but never progresses
> further.
>
> If any other information is needed, please let me know.  I've reported
> this issue to Supermicro already and they believe it is an issue with
> the kernel opposed to an issue specific to their systems.  I don't have
> any other brand Xeon skylake systems with extended APIC support that I
> can try this with.
>
> Thanks,
> Rick
>
>
> Richard Warner
> Chief Technology Officer
> Microway, Inc

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC
  2018-05-15 16:07 ` [bisected] rcu_sched detected stalls - 4.15 or newer " Rick Warner
@ 2018-05-15 20:19   ` Thomas Gleixner
  2018-05-16 14:50     ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2018-05-15 20:19 UTC (permalink / raw)
  To: Rick Warner; +Cc: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3087 bytes --]

On Tue, 15 May 2018, Rick Warner wrote:
> > I've discovered that some new Supermicro skylake systems will hang/stall
> > while booting the 4.15 kernel when extended APIC (x2apic) is enabled in
> > the BIOS. The issue happens on specific CPUs only and follows the CPUs.
> > 
> > We had (4) quad socket systems with Xeon 6134 CPUs; 2 out of 4 were
> > exhibiting this behavior.  We replaced 2 CPUs at that time and the
> > behavior was eliminated. Those systems were then shipped to our customer
> > (we are an HPC system integrator).
> > 
> > Now, we have 5 single socket systems with 5122 CPUs.  2 out of the 5 are
> > hanging.  If we swap the CPUs from the hanging systems with working
> > systems, the behavior follows the CPU.

That's weird.

> > I've done a git bisect between 4.14 and 4.15 and found this commit is
> > triggering the issue:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=023a611748fd58d46c8aa049cf4f22ebada983f5
> >

Interesting.

> > I've attached a dmesg log captured via serial console from a system
> > exhibiting this problem.  Here is an excerpt from it where the problems
> > start:

> > NMI backtrace for cpu 34

> > RIP: 0010:smp_call_function_many+0x1f1/0x204

So this waits for the IPI to be handled on some other CPU(s).

> > RSP: 0000:ffffc900000f3af0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
> > RAX: 0000000000000001 RBX: ffff880c110a0488 RCX: 0000000000000001
> > RDX: ffff880c10e64440 RSI: 0000000000000000 RDI: ffff880c110a0488
> > RBP: ffff880c110a0480 R08: fffffffffffffffe R09: 0000000000000003
> > R10: 0000000000000000 R11: ffffea00c03c1a60 R12: 0000000000000001
> > R13: ffff880c110a04b8 R14: 0000000000020440 R15: ffffffff81ed5400
> >   ? slub_cpu_dead+0xa0/0xa0
> >   ? slub_cpu_dead+0xa0/0xa0
> >   ? __mmu_notifier_mm_destroy+0x32/0x32
> >   on_each_cpu_mask+0x23/0x53
> >   ? slub_cpu_dead+0xa0/0xa0
> >   on_each_cpu_cond+0x7c/0x8b
> >   __kmem_cache_shrink+0x3c/0x237
> >   ? acpi_ps_delete_parse_tree+0x2d/0x59
> >   ? set_debug_rodata+0x11/0x11
> >   ? acpi_os_purge_cache+0xa/0xd
> >   acpi_os_purge_cache+0xa/0xd
> >   acpi_purge_cached_objects+0x29/0x38
> >   acpi_initialize_objects+0x46/0x4f
> >   ? acpi_sleep_init+0xd6/0xd6
> >   acpi_init+0xb6/0x324
> >   ? scan_for_dmi_ipmi+0x15/0xec
> >   ? acpi_sleep_init+0xd6/0xd6
> >   do_one_initcall+0x89/0x128
> >   ? set_debug_rodata+0x11/0x11
> >   ? set_debug_rodata+0x11/0x11
> >   kernel_init_freeable+0x112/0x18e
> >   ? rest_init+0xaa/0xaa
> >   kernel_init+0xa/0xf0
> >   ret_from_fork+0x35/0x40

> > If any other information is needed, please let me know.  I've reported
> > this issue to Supermicro already and they believe it is an issue with
> > the kernel opposed to an issue specific to their systems.  I don't have
> > any other brand Xeon skylake systems with extended APIC support that I
> > can try this with.

I can't spot an immediate fail with that commit, but I'll have a look
tomorrow for instrumenting this with tracepoints which can be dumped from
the stall detector.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC
  2018-05-15 20:19   ` Thomas Gleixner
@ 2018-05-16 14:50     ` Thomas Gleixner
  2018-05-16 23:02       ` Rick Warner
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2018-05-16 14:50 UTC (permalink / raw)
  To: Rick Warner; +Cc: Linux Kernel Mailing List

Rick,

On Tue, 15 May 2018, Thomas Gleixner wrote:
> I can't spot an immediate fail with that commit, but I'll have a look
> tomorrow for instrumenting this with tracepoints which can be dumped from
> the stall detector.

can you please give the patch below a try? I assume you have a serial
console, otherwise this is going to be tedious.

Please add the following to the kernel command line:

  ftrace_dump_on_oops

Note, the box will panic after the first stall and spill out the trace
buffer over serial, which might take a while.

Thanks,

	tglx

8<--------------------

--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -52,20 +52,28 @@ static void
 	if (apic_dest != APIC_DEST_ALLINC)
 		cpumask_clear_cpu(smp_processor_id(), tmpmsk);
 
+	trace_printk("To: %*pbl\n", cpumask_pr_args(tmpmsk));
+
 	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
 	for_each_cpu(cpu, tmpmsk) {
 		struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);
 
 		dest = 0;
+		trace_printk("CPU: %u cluster: %*pbl\n", cpu,
+			     cpumask_pr_args(&cmsk->mask));
 		for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
 			dest |= per_cpu(x86_cpu_to_logical_apicid, clustercpu);
 
-		if (!dest)
+		if (!dest) {
+			trace_printk("dest = 0!?\n");
 			continue;
+		}
 
 		__x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
 		/* Remove cluster CPUs from tmpmask */
 		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
+		trace_printk("dest %08x --> tmpmsk %*pbl\n", dest,
+			     cpumask_pr_args(tmpmsk));
 	}
 
 	local_irq_restore(flags);
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -124,7 +124,7 @@ int rcu_num_lvls __read_mostly = RCU_NUM
 int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
 int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
 /* panic() on RCU Stall sysctl. */
-int sysctl_panic_on_rcu_stall __read_mostly;
+int sysctl_panic_on_rcu_stall __read_mostly = 1;
 
 /*
  * The rcu_scheduler_active variable is initialized to the value
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -681,6 +681,7 @@ void on_each_cpu_cond(bool (*cond_func)(
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
+		trace_printk("%*pbl\n", cpumask_pr_args(cpus));
 		on_each_cpu_mask(cpus, func, info, wait);
 		preempt_enable();
 		free_cpumask_var(cpus);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC
  2018-05-16 14:50     ` Thomas Gleixner
@ 2018-05-16 23:02       ` Rick Warner
  2018-05-17 12:36         ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Warner @ 2018-05-16 23:02 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 6510 bytes --]

Thanks Thomas!

I've attached the dmesg output with the kernel parameter and supplied patch.

Here is the excerpt of the ftrace dump:
Dumping ftrace buffer:
---------------------------------
swapper/-1       0d... 463331us : __x2apic_send_IPI_mask: To:
swapper/-1       0d... 463333us : __x2apic_send_IPI_mask: CPU: 0
cluster: 0-1
swapper/-1       0d... 463334us : __x2apic_send_IPI_mask: dest 00010001
--> tmpmsk
swapper/-1       0d... 504423us : __x2apic_send_IPI_mask: To:
swapper/-1       0d... 504424us : __x2apic_send_IPI_mask: CPU: 1
cluster: 0-1
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: dest 00000400
--> tmpmsk
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: dest 00010010
--> tmpmsk
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
swapper/-1       0d... 504426us : __x2apic_send_IPI_mask: dest 00010040
--> tmpmsk
swapper/-1       0d... 504430us : __x2apic_send_IPI_mask: To:
swapper/-1       0d... 504430us : __x2apic_send_IPI_mask: CPU: 1
cluster: 0-1
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: dest 00000400
--> tmpmsk
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: dest 00010010
--> tmpmsk
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: dest 00010040
--> tmpmsk
swapper/-1       0d... 504433us : __x2apic_send_IPI_mask: To:
swapper/-1       0d... 504433us : __x2apic_send_IPI_mask: CPU: 1
cluster: 0-1
swapper/-1       0d... 504433us : __x2apic_send_IPI_mask: dest 00000400
--> tmpmsk
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: dest 00010010
--> tmpmsk
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: dest 00010040
--> tmpmsk
swapper/-1       3d... 1813404us : __x2apic_send_IPI_mask: To:
swapper/-1       3d... 1813405us : __x2apic_send_IPI_mask: CPU: 0
cluster: 0-1
swapper/-1       3d... 1813406us : __x2apic_send_IPI_mask: dest 00010001
--> tmpmsk
swapper/-1       3.... 1879964us : on_each_cpu_cond: 0-1,3
swapper/-1       3.... 1879972us : on_each_cpu_cond: 0-1,3
swapper/-1       3.... 1879975us : on_each_cpu_cond: 0-1,3
swapper/-1       3d... 1879976us : __x2apic_send_IPI_mask: To:
swapper/-1       3d... 1879977us : __x2apic_send_IPI_mask: CPU: 0
cluster: 0-1
swapper/-1       3d... 1879977us : __x2apic_send_IPI_mask: dest 00010401
--> tmpmsk
  <idle>-0       0d.h. 61903171us : __x2apic_send_IPI_mask: To:
  <idle>-0       0d.h. 61903172us : __x2apic_send_IPI_mask: CPU: 3
cluster: 3
  <idle>-0       0d.h. 61903172us : __x2apic_send_IPI_mask: dest
00010040 --> tmpmsk
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: To:
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: CPU: 1
cluster: 0-1
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: dest
00000400 --> tmpmsk
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: CPU: 2
cluster: 2
  <idle>-0       0d.h. 61904224us : __x2apic_send_IPI_mask: dest
00010010 --> tmpmsk
  <idle>-0       0d.h. 61904224us : __x2apic_send_IPI_mask: CPU: 3
cluster: 3
  <idle>-0       0d.h. 61904224us : __x2apic_send_IPI_mask: dest
00010040 --> tmpmsk
  <idle>-0       0d.h. 62924392us : __x2apic_send_IPI_mask: To:
  <idle>-0       0d.h. 62924393us : __x2apic_send_IPI_mask: CPU: 3
cluster: 3
  <idle>-0       0d.h. 62924394us : __x2apic_send_IPI_mask: dest
00010040 --> tmpmsk

Thanks,
Rick

On 05/16/18 10:50, Thomas Gleixner wrote:
> Rick,
>
> On Tue, 15 May 2018, Thomas Gleixner wrote:
>> I can't spot an immediate fail with that commit, but I'll have a look
>> tomorrow for instrumenting this with tracepoints which can be dumped from
>> the stall detector.
> can you please give the patch below a try? I assume you have a serial
> console, otherwise this is going to be tedious.
>
> Please add the following to the kernel command line:
>
>   ftrace_dump_on_oops
>
> Note, the box will panic after the first stall and spill out the trace
> buffer over serial, which might take a while.
>
> Thanks,
>
> 	tglx
>
> 8<--------------------
>
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -52,20 +52,28 @@ static void
>  	if (apic_dest != APIC_DEST_ALLINC)
>  		cpumask_clear_cpu(smp_processor_id(), tmpmsk);
>  
> +	trace_printk("To: %*pbl\n", cpumask_pr_args(tmpmsk));
> +
>  	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
>  	for_each_cpu(cpu, tmpmsk) {
>  		struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);
>  
>  		dest = 0;
> +		trace_printk("CPU: %u cluster: %*pbl\n", cpu,
> +			     cpumask_pr_args(&cmsk->mask));
>  		for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
>  			dest |= per_cpu(x86_cpu_to_logical_apicid, clustercpu);
>  
> -		if (!dest)
> +		if (!dest) {
> +			trace_printk("dest = 0!?\n");
>  			continue;
> +		}
>  
>  		__x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
>  		/* Remove cluster CPUs from tmpmask */
>  		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
> +		trace_printk("dest %08x --> tmpmsk %*pbl\n", dest,
> +			     cpumask_pr_args(tmpmsk));
>  	}
>  
>  	local_irq_restore(flags);
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -124,7 +124,7 @@ int rcu_num_lvls __read_mostly = RCU_NUM
>  int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
>  int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
>  /* panic() on RCU Stall sysctl. */
> -int sysctl_panic_on_rcu_stall __read_mostly;
> +int sysctl_panic_on_rcu_stall __read_mostly = 1;
>  
>  /*
>   * The rcu_scheduler_active variable is initialized to the value
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -681,6 +681,7 @@ void on_each_cpu_cond(bool (*cond_func)(
>  		for_each_online_cpu(cpu)
>  			if (cond_func(cpu, info))
>  				cpumask_set_cpu(cpu, cpus);
> +		trace_printk("%*pbl\n", cpumask_pr_args(cpus));
>  		on_each_cpu_mask(cpus, func, info, wait);
>  		preempt_enable();
>  		free_cpumask_var(cpus);


[-- Attachment #2: x2apic-hang-on-boot-dmesg-patch1.txt --]
[-- Type: text/plain, Size: 24287 bytes --]

Command line: vmlinuz-bisect vmlinuz-bisect initrd=netuno64.cpio.img.gz vga=normal video=vga16:off nomodeset console=tty1 console=ttyS1,115200 net.ifnames=0 ftrace_dump_on_oops
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
x86/fpu: xstate_offset[5]:  960, xstate_sizes[5]:   64
x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]:  512
x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]:    8
x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x00000000000997ff] usable
BIOS-e820: [mem 0x0000000000099800-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000006a67ffff] usable
BIOS-e820: [mem 0x000000006a680000-0x000000006c77ffff] reserved
BIOS-e820: [mem 0x000000006c780000-0x000000006c88efff] usable
BIOS-e820: [mem 0x000000006c88f000-0x000000006d409fff] ACPI NVS
BIOS-e820: [mem 0x000000006d40a000-0x000000006f31bfff] reserved
BIOS-e820: [mem 0x000000006f31c000-0x000000006f7fffff] usable
BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
BIOS-e820: [mem 0x00000000fed20000-0x00000000fed44fff] reserved
BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000303fffffff] usable
NX (Execute Disable) protection: active
random: fast init done
SMBIOS 3.1.1 present.
DMI: Supermicro SYS-5019P-WTR/X11SPW-TF, BIOS 2.0b 02/26/2018
e820: last_pfn = 0x3040000 max_arch_pfn = 0x400000000
x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
x2apic: enabled by BIOS, switching to x2apic ops
e820: last_pfn = 0x6f800 max_arch_pfn = 0x400000000
Using GB pages for direct mapping
RAMDISK: [mem 0x663b6000-0x6a67ffff]
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x00000000000F05B0 000024 (v02 SUPERM)
ACPI: XSDT 0x000000006C88F0C8 000114 (v01 SUPERM SUPERM   01072009 AMI  00010013)
ACPI: FACP 0x000000006C8D7C98 000114 (v06 SUPERM SMCI--MB 01072009 INTL 20091013)
ACPI: DSDT 0x000000006C88F278 048A1E (v02 SUPERM SMCI--MB 01072009 INTL 20091013)
ACPI: FACS 0x000000006D408080 000040
ACPI: FPDT 0x000000006C8D7DB0 000044 (v01                 01072009 AMI  00010013)
ACPI: FIDT 0x000000006C8D7DF8 00009C (v01 SUPERM SMCI--MB 01072009 AMI  00010013)
ACPI: SPMI 0x000000006C8D7E98 000041 (v05 SUPERM SMCI--MB 00000000 AMI. 00000000)
ACPI: UEFI 0x000000006C8D7EE0 00005C (v01 INTEL  RstUefiV 00000000      00000000)
ACPI: UEFI 0x000000006C8D7F40 000042 (v01 SUPERM SMCI--MB 01072009      01000013)
ACPI: MCFG 0x000000006C8D7F88 00003C (v01 SUPERM SMCI--MB 01072009 MSFT 00000097)
ACPI: HPET 0x000000006C8D7FC8 000038 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
ACPI: APIC 0x000000006C8D8000 0016DE (v03 SUPERM SMCI--MB 00000000 INTL 20091013)
ACPI: MIGT 0x000000006C8D96E0 000040 (v01 SUPERM SMCI--MB 00000000 INTL 20091013)
ACPI: MSCT 0x000000006C8D9720 000090 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
ACPI: NFIT 0x000000006C8D97B0 018028 (v01                 00000000      00000000)
ACPI: PCAT 0x000000006C8F17D8 000048 (v01 SUPERM SMCI--MB 00000002 INTL 20091013)
ACPI: PCCT 0x000000006C8F1820 00006E (v01 SUPERM SMCI--MB 00000002 INTL 20091013)
ACPI: RASF 0x000000006C8F1890 000030 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
ACPI: SLIT 0x000000006C8F18C0 00002D (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
ACPI: SRAT 0x000000006C8F18F0 002830 (v03 SUPERM SMCI--MB 00000002 INTL 20091013)
ACPI: SVOS 0x000000006C8F4120 000032 (v01 SUPERM SMCI--MB 00000000 INTL 20091013)
ACPI: WDDT 0x000000006C8F4158 000040 (v01 SUPERM SMCI--MB 00000000 INTL 20091013)
ACPI: OEM4 0x000000006C8F4198 0A27C4 (v02 INTEL  CPU  CST 00003000 INTL 20140828)
ACPI: OEM1 0x000000006C996960 02A2C4 (v02 INTEL  CPU EIST 00003000 INTL 20140828)
ACPI: SSDT 0x000000006C9C0C28 033990 (v02 INTEL  SSDT  PM 00004000 INTL 20140828)
ACPI: OEM3 0x000000006C9F45B8 026664 (v02 INTEL  CPU  TST 00003000 INTL 20140828)
ACPI: SSDT 0x000000006CA1AC20 00065B (v02 SUPERM SMCI--MB 00000000 INTL 20091013)
ACPI: SSDT 0x000000006CA1B280 002AF6 (v02 INTEL  SpsNm    00000002 INTL 20140828)
ACPI: DMAR 0x000000006CA1DD78 000138 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
ACPI: HEST 0x000000006CA1DEB0 0000A8 (v01 SUPERM SMCI--MB 00000001 INTL 00000001)
ACPI: BERT 0x000000006CA1DF58 000030 (v01 SUPERM SMCI--MB 00000001 INTL 00000001)
ACPI: ERST 0x000000006CA1DF88 000230 (v01 SUPERM SMCI--MB 00000001 INTL 00000001)
ACPI: EINJ 0x000000006CA1E1B8 000150 (v01 SUPERM SMCI--MB 00000001 INTL 00000001)
Setting APIC routing to cluster x2apic.
SRAT: PXM 0 -> APIC 0x0010 -> Node 0
SRAT: PXM 0 -> APIC 0x000a -> Node 0
SRAT: PXM 0 -> APIC 0x0014 -> Node 0
SRAT: PXM 0 -> APIC 0x0016 -> Node 0
ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x303fffffff]
NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x303fffffff] -> [mem 0x00000000-0x303fffffff]
NODE_DATA(0) allocated [mem 0x303fff9000-0x303fffcfff]
Zone ranges:
  DMA      [mem 0x0000000000001000-0x0000000000ffffff]
  DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000303fffffff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000001000-0x0000000000098fff]
  node   0: [mem 0x0000000000100000-0x000000006a67ffff]
  node   0: [mem 0x000000006c780000-0x000000006c88efff]
  node   0: [mem 0x000000006f31c000-0x000000006f7fffff]
  node   0: [mem 0x0000000100000000-0x000000303fffffff]
Initmem setup node 0 [mem 0x0000000000001000-0x000000303fffffff]
ACPI: PM-Timer IO Port: 0x508
ACPI: X2APIC_NMI (uid[0xffffffff] high level lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-31
IOAPIC[2]: apic_id 10, version 32, address 0xfec08000, GSI 32-39
IOAPIC[3]: apic_id 11, version 32, address 0xfec10000, GSI 40-47
IOAPIC[4]: apic_id 12, version 32, address 0xfec18000, GSI 48-55
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a701 base: 0xfed00000
smpboot: Allowing 4 CPUs, 0 hotplug CPUs
e820: [mem 0x90000000-0xfcffffff] available for PCI devices
Booting paravirtualized kernel on bare hardware
clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
setup_percpu: NR_CPUS:384 nr_cpumask_bits:384 nr_cpu_ids:4 nr_node_ids:1
percpu: Embedded 36 pages/cpu @ffff88303fc00000 s106536 r8192 d32728 u524288
Built 1 zonelists, mobility grouping on.  Total pages: 49201454
Policy zone: Normal
Kernel command line: ast.modeset=0 console=ttyS1,115200 console=tty1 vmlinuz-bisect vmlinuz-bisect initrd=netuno64.cpio.img.gz vga=normal video=vga16:off nomodeset console=tty1 console=ttyS1,115200 net.ifnames=0 ftrace_dump_on_oops
PID hash table entries: 4096 (order: 3, 32768 bytes)
Memory: 196466912K/199929900K available (7195K kernel code, 1358K rwdata, 3240K rodata, 188024K init, 676K bss, 3462988K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
ftrace: allocating 33884 entries in 133 pages

**********************************************************
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**                                                      **
** trace_printk() being used. Allocating extra memory.  **
**                                                      **
** This means that this is a DEBUG kernel and it is     **
** unsafe for production use.                           **
**                                                      **
** If you see this message and you are not debugging    **
** the kernel, report this immediately to your vendor!  **
**                                                      **
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**********************************************************
Hierarchical RCU implementation.
	RCU restricting CPUs from NR_CPUS=384 to nr_cpu_ids=4.
RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
NR_IRQS: 24832, nr_irqs: 1000, preallocated irqs: 16
Console: colour VGA+ 80x25
console [tty1] enabled
console [ttyS1] enabled
clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
APIC: Switch to symmectic I/O mode setup
DMAR: Host address width 46
DMAR: DRHD base: 0x000000c5ffc000 flags: 0x0
DMAR: dmar0: reg_base_addr c5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
DMAR: DRHD base: 0x000000e0ffc000 flags: 0x0
DMAR: dmar1: reg_base_addr e0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
DMAR: dmar2: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
DMAR: DRHD base: 0x000000aaffc000 flags: 0x1
DMAR: dmar3: reg_base_addr aaffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
DMAR: RMRR base: 0x0000006f054000 end: 0x0000006f064fff
DMAR: ATSR flags: 0x0
DMAR: RHSA base: 0x000000aaffc000 proximity domain: 0x0
DMAR: RHSA base: 0x000000c5ffc000 proximity domain: 0x0
DMAR: RHSA base: 0x000000e0ffc000 proximity domain: 0x0
DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0
DMAR-IR: IOAPIC id 12 under DRHD base  0xfbffc000 IOMMU 2
DMAR-IR: IOAPIC id 11 under DRHD base  0xe0ffc000 IOMMU 1
DMAR-IR: IOAPIC id 10 under DRHD base  0xc5ffc000 IOMMU 0
DMAR-IR: IOAPIC id 8 under DRHD base  0xaaffc000 IOMMU 3
DMAR-IR: IOAPIC id 9 under DRHD base  0xaaffc000 IOMMU 3
DMAR-IR: HPET id 0 under DRHD base 0xaaffc000
DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
DMAR-IR: Enabled IRQ remapping in x2apic mode
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
tsc: Detected 3600.000 MHz processor
Calibrating delay loop (skipped), value calculated using timer frequency.. 7200.00 BogoMIPS (lpj=14400000)
pid_max: default: 32768 minimum: 301
ACPI: Core revision 20170728
ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170728/tbxface-196)
ACPI: 4 ACPI AML tables successfully acquired and loaded
Dentry cache hash table entries: 16777216 (order: 15, 134217728 bytes)
Inode-cache hash table entries: 8388608 (order: 14, 67108864 bytes)
Mount-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mountpoint-cache hash table entries: 262144 (order: 9, 2097152 bytes)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 8
mce: CPU supports 20 MCE banks
CPU0: Thermal monitoring enabled (TM1)
process: using mwait in idle threads
Last level iTLB entries: 4KB 128, 2MB 8, 4MB 8
Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
Freeing SMP alternatives memory: 36K
smpboot: Max logical packages: 56
smpboot: CPU0: Intel(R) Xeon(R) Gold 5122 CPU @ 3.60GHz (family: 0x6, model: 0x55, stepping: 0x4)
Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
... version:                4
... bit width:              48
... generic registers:      8
... value mask:             0000ffffffffffff
... max period:             00007fffffffffff
... fixed-purpose events:   3
... event mask:             00000007000000ff
Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
x86: Booting SMP configuration:
.... node  #0, CPUs:      #1 #2 #3
smp: Brought up 1 node, 4 CPUs
smpboot: Total of 4 processors activated (28803.72 BogoMIPS)
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 1024 (order: 4, 65536 bytes)
xor: automatically using best checksumming function   avx       
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
cpuidle: using governor ladder
Error parsing PCC subspaces from PCCT
ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
ACPI: bus type PCI registered
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
pmd_set_huge: Cannot satisfy [mem 0x80000000-0x80200000] with a huge-page mapping due to MTRR override.
PCI: Using configuration type 1 for base access
HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
raid6: sse2x1   gen() 12542 MB/s
raid6: sse2x1   xor()  8877 MB/s
raid6: sse2x2   gen() 16430 MB/s
raid6: sse2x2   xor() 11027 MB/s
raid6: sse2x4   gen() 18836 MB/s
raid6: sse2x4   xor() 12801 MB/s
raid6: avx2x1   gen() 25116 MB/s
raid6: avx2x1   xor() 16935 MB/s
raid6: avx2x2   gen() 30816 MB/s
raid6: avx2x2   xor() 20794 MB/s
raid6: avx2x4   gen() 33776 MB/s
raid6: avx2x4   xor() 19072 MB/s
raid6: avx512x1 gen() 35930 MB/s
raid6: avx512x1 xor() 20622 MB/s
raid6: avx512x2 gen() 45248 MB/s
raid6: avx512x2 xor() 25840 MB/s
raid6: avx512x4 gen() 52791 MB/s
raid6: avx512x4 xor() 16125 MB/s
raid6: using algorithm avx512x4 gen() 52791 MB/s
raid6: .... xor() 16125 MB/s, rmw enabled
raid6: using avx512x2 recovery algorithm
ACPI: Added _OSI(Module Device)
ACPI: Added _OSI(Processor Device)
ACPI: Added _OSI(3.0 _SCP Extensions)
ACPI: Added _OSI(Processor Aggregator Device)
ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
INFO: rcu_sched self-detected stall on CPU
	3-...: (14999 ticks this GP) idle=0de/140000000000001/0 softirq=122/131 fqs=7499 
INFO: rcu_sched detected stalls on CPUs/tasks:
	3-...: (14999 ticks this GP) idle=0de/140000000000001/0 softirq=122/131 fqs=7499 
	(detected by 0, t=15002 jiffies, g=-291, c=-292, q=66)
Sending NMI from CPU 0 to CPUs 3:
NMI backtrace for cpu 3
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc2-netuno-x86_64+ #2
Hardware name: Supermicro SYS-5019P-WTR/X11SPW-TF, BIOS 2.0b 02/26/2018
task: ffff882f73718000 task.stack: ffffc90000070000
RIP: 0010:io_serial_in+0x18/0x1d
RSP: 0000:ffff88303fd83c10 EFLAGS: 00000002
RAX: 000003cd27276e00 RBX: ffffffff8d58fd80 RCX: 0000000000000000
RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff8d58fd80
RBP: ffff88303fd83c10 R08: 000003cd26b908d4 R09: ffffffff8d533df0
R10: ffff88303fd83cc0 R11: ffffffff8d52e50f R12: 00000000000026fa
R13: 0000000000000020 R14: ffffffff8145773d R15: 0000000000000053
FS:  0000000000000000(0000) GS:ffff88303fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c0b001 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 00000000
Call Trace:
 <IRQ>
 wait_for_xmitr+0x25/0x88
 serial8250_console_putchar+0x1c/0x2c
 uart_console_write+0x40/0x4b
 serial8250_console_write+0x17f/0x1d2
 ? msg_print_text+0xa3/0xf7
 univ8250_console_write+0x21/0x23
 console_unlock+0x392/0x407
 vprintk_emit+0x213/0x225
 vprintk_default+0x1d/0x1f
 vprintk_func+0x57/0x59
 printk+0x43/0x4b
 print_cpu_stall_info+0x119/0x12c
 ? vprintk_func+0x57/0x59
 ? printk+0x43/0x4b
 rcu_check_callbacks+0x1e1/0x681
 update_process_times+0x2f/0x54
 tick_periodic+0x6c/0x6e
 tick_handle_periodic+0x25/0x66
 smp_apic_timer_interrupt+0x98/0x100
 apic_timer_interrupt+0x93/0xa0
 </IRQ>
RIP: 0010:smp_call_function_many+0x1ef/0x202
RSP: 0000:ffffc90000073a70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000001 RBX: ffff88303fd99d88 RCX: 0000000000000001
RDX: ffff88303fc9e5a0 RSI: 0000000000000000 RDI: ffff88303fd99d88
RBP: ffffc90000073ab8 R08: fffffffffffffffe R09: 0000000000000003
R10: 0000000000000000 R11: ffffea00bdca58a0 R12: ffff88303fd99d80
R13: ffff88303fd99db8 R14: 0000000000019d40 R15: ffffffff81acf3c0
 ? slub_cpu_dead+0xa7/0xa7
 ? slub_cpu_dead+0xa7/0xa7
 ? __mmu_notifier_mm_destroy+0x37/0x37
 on_each_cpu_mask+0x28/0x5a
 on_each_cpu_cond+0xa3/0xb2
 ? slub_cpu_dead+0xa7/0xa7
 flush_all+0x2a/0x2c
 __kmem_cache_shrink+0x3c/0x237
 ? set_debug_rodata+0x17/0x17
 kmem_cache_shrink+0xe/0x10
 ? kmem_cache_shrink+0xe/0x10
 acpi_os_purge_cache+0xe/0x12
 acpi_purge_cached_objects+0x2d/0x3d
 acpi_initialize_objects+0x4a/0x54
 ? acpi_sleep_init+0xd8/0xd8
 acpi_init+0xba/0x314
 ? scan_for_dmi_ipmi+0x19/0xf1
 ? acpi_sleep_init+0xd8/0xd8
 do_one_initcall+0x90/0x137
 ? set_debug_rodata+0x17/0x17
 kernel_init_freeable+0x119/0x19c
 ? rest_init+0xb0/0xb0
 kernel_init+0xe/0xf5
 ret_from_fork+0x25/0x30
Code: 00 00 55 48 89 e5 d3 e6 48 63 f6 48 03 77 20 8b 06 5d c3 0f 1f 44 00 00 0f b6 8f b1 00 00 00 8b 57 18 55 48 89 e5 d3 e6 01 f2 ec <0f> b6 c0 5d c3 0f 1f 44 00 00 0f b6 8f b1 00 00 00 89 d0 8b 57 
Kernel panic - not syncing: RCU Stall

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc2-netuno-x86_64+ #2
Hardware name: Supermicro SYS-5019P-WTR/X11SPW-TF, BIOS 2.0b 02/26/2018
Call Trace:
 <IRQ>
 dump_stack+0x61/0x7e
 panic+0xd8/0x205
 rcu_check_callbacks+0x2be/0x681
 update_process_times+0x2f/0x54
 tick_periodic+0x6c/0x6e
 tick_handle_periodic+0x25/0x66
 smp_apic_timer_interrupt+0x98/0x100
 apic_timer_interrupt+0x93/0xa0
 </IRQ>
RIP: 0010:mwait_idle+0x8d/0x145
RSP: 0000:ffffffff81c03e50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000000 RBX: ffffffff81c104c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000008
R10: ffffc900000efd80 R11: 000000000001d401 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff81c104c0 R15: ffffffff81c104c0
 arch_cpu_idle+0xf/0x11
 default_idle_call+0x2c/0x2e
 do_idle+0xb5/0x144
 cpu_startup_entry+0x1f/0x21
 rest_init+0xae/0xb0
 start_kernel+0x3e6/0x3f3
 x86_64_start_reservations+0x2a/0x2c
 x86_64_start_kernel+0x72/0x75
 secondary_startup_64+0xa5/0xa5
	 (t=15002 jiffies g=-291 c=-292 q=66)
NMI backtrace for cpu 3
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc2-netuno-x86_64+ #2
Hardware name: Supermicro SYS-5019P-WTR/X11SPW-TF, BIOS 2.0b 02/26/2018
Call Trace:
 <IRQ>
 dump_stack+0x61/0x7e
 nmi_cpu_backtrace+0x9c/0xb8
 ? irq_force_complete_move+0xf4/0xf4
 nmi_trigger_cpumask_backtrace+0x5d/0xdd
 arch_trigger_cpumask_backtrace+0x19/0x1b
 rcu_dump_cpu_stacks+0x9a/0xc5
 rcu_check_callbacks+0x263/0x681
 update_process_times+0x2f/0x54
 tick_periodic+0x6c/0x6e
 tick_handle_periodic+0x25/0x66
 smp_apic_timer_interrupt+0x98/0x100
 apic_timer_interrupt+0x93/0xa0
 </IRQ>
RIP: 0010:smp_call_function_many+0x1ef/0x202
RSP: 0000:ffffc90000073a70 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000001 RBX: ffff88303fd99d88 RCX: 0000000000000001
RDX: ffff88303fc9e5a0 RSI: 0000000000000000 RDI: ffff88303fd99d88
RBP: ffffc90000073ab8 R08: fffffffffffffffe R09: 0000000000000003
R10: 0000000000000000 R11: ffffea00bdca58a0 R12: ffff88303fd99d80
R13: ffff88303fd99db8 R14: 0000000000019d40 R15: ffffffff81acf3c0
 ? slub_cpu_dead+0xa7/0xa7
 ? slub_cpu_dead+0xa7/0xa7
 ? __mmu_notifier_mm_destroy+0x37/0x37
 on_each_cpu_mask+0x28/0x5a
 on_each_cpu_cond+0xa3/0xb2
 ? slub_cpu_dead+0xa7/0xa7
 flush_all+0x2a/0x2c
 __kmem_cache_shrink+0x3c/0x237
 ? set_debug_rodata+0x17/0x17
 kmem_cache_shrink+0xe/0x10
 ? kmem_cache_shrink+0xe/0x10
 acpi_os_purge_cache+0xe/0x12
 acpi_purge_cached_objects+0x2d/0x3d
 acpi_initialize_objects+0x4a/0x54
 ? acpi_sleep_init+0xd8/0xd8
 acpi_init+0xba/0x314
 ? scan_for_dmi_ipmi+0x19/0xf1
 ? acpi_sleep_init+0xd8/0xd8
 do_one_initcall+0x90/0x137
 ? set_debug_rodata+0x17/0x17
 kernel_init_freeable+0x119/0x19c
 ? rest_init+0xb0/0xb0
 kernel_init+0xe/0xf5
 ret_from_fork+0x25/0x30
Shutting down cpus with NMI
Dumping ftrace buffer:
---------------------------------
swapper/-1       0d... 463331us : __x2apic_send_IPI_mask: To: 
swapper/-1       0d... 463333us : __x2apic_send_IPI_mask: CPU: 0 cluster: 0-1
swapper/-1       0d... 463334us : __x2apic_send_IPI_mask: dest 00010001 --> tmpmsk 
swapper/-1       0d... 504423us : __x2apic_send_IPI_mask: To: 
swapper/-1       0d... 504424us : __x2apic_send_IPI_mask: CPU: 1 cluster: 0-1
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: dest 00000400 --> tmpmsk 
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: dest 00010010 --> tmpmsk 
swapper/-1       0d... 504425us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
swapper/-1       0d... 504426us : __x2apic_send_IPI_mask: dest 00010040 --> tmpmsk 
swapper/-1       0d... 504430us : __x2apic_send_IPI_mask: To: 
swapper/-1       0d... 504430us : __x2apic_send_IPI_mask: CPU: 1 cluster: 0-1
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: dest 00000400 --> tmpmsk 
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: dest 00010010 --> tmpmsk 
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
swapper/-1       0d... 504431us : __x2apic_send_IPI_mask: dest 00010040 --> tmpmsk 
swapper/-1       0d... 504433us : __x2apic_send_IPI_mask: To: 
swapper/-1       0d... 504433us : __x2apic_send_IPI_mask: CPU: 1 cluster: 0-1
swapper/-1       0d... 504433us : __x2apic_send_IPI_mask: dest 00000400 --> tmpmsk 
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: dest 00010010 --> tmpmsk 
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
swapper/-1       0d... 504434us : __x2apic_send_IPI_mask: dest 00010040 --> tmpmsk 
swapper/-1       3d... 1813404us : __x2apic_send_IPI_mask: To: 
swapper/-1       3d... 1813405us : __x2apic_send_IPI_mask: CPU: 0 cluster: 0-1
swapper/-1       3d... 1813406us : __x2apic_send_IPI_mask: dest 00010001 --> tmpmsk 
swapper/-1       3.... 1879964us : on_each_cpu_cond: 0-1,3
swapper/-1       3.... 1879972us : on_each_cpu_cond: 0-1,3
swapper/-1       3.... 1879975us : on_each_cpu_cond: 0-1,3
swapper/-1       3d... 1879976us : __x2apic_send_IPI_mask: To: 
swapper/-1       3d... 1879977us : __x2apic_send_IPI_mask: CPU: 0 cluster: 0-1
swapper/-1       3d... 1879977us : __x2apic_send_IPI_mask: dest 00010401 --> tmpmsk 
  <idle>-0       0d.h. 61903171us : __x2apic_send_IPI_mask: To: 
  <idle>-0       0d.h. 61903172us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
  <idle>-0       0d.h. 61903172us : __x2apic_send_IPI_mask: dest 00010040 --> tmpmsk 
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: To: 
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: CPU: 1 cluster: 0-1
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: dest 00000400 --> tmpmsk 
  <idle>-0       0d.h. 61904223us : __x2apic_send_IPI_mask: CPU: 2 cluster: 2
  <idle>-0       0d.h. 61904224us : __x2apic_send_IPI_mask: dest 00010010 --> tmpmsk 
  <idle>-0       0d.h. 61904224us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
  <idle>-0       0d.h. 61904224us : __x2apic_send_IPI_mask: dest 00010040 --> tmpmsk 
  <idle>-0       0d.h. 62924392us : __x2apic_send_IPI_mask: To: 
  <idle>-0       0d.h. 62924393us : __x2apic_send_IPI_mask: CPU: 3 cluster: 3
  <idle>-0       0d.h. 62924394us : __x2apic_send_IPI_mask: dest 00010040 --> tmpmsk 
---------------------------------
---[ end Kernel panic - not syncing: RCU Stall


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC
  2018-05-16 23:02       ` Rick Warner
@ 2018-05-17 12:36         ` Thomas Gleixner
  2018-05-17 15:59           ` Rick Warner
  2018-05-17 19:03           ` [tip:x86/urgent] x86/apic/x2apic: Initialize cluster ID properly tip-bot for Thomas Gleixner
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2018-05-17 12:36 UTC (permalink / raw)
  To: Rick Warner; +Cc: Linux Kernel Mailing List

Rick,

On Wed, 16 May 2018, Rick Warner wrote:

> I've attached the dmesg output with the kernel parameter and supplied patch.

Thanks for providing the data. I think I know what's wrong here. Does the
patch below fix your problem?

While you test, I go looking for a big brown paperbag.

Thanks,

	tglx

8<----------------

--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -124,6 +124,7 @@ static void init_x2apic_ldr(void)
 			goto update;
 	}
 	cmsk = cluster_hotplug_mask;
+	cmsk->clusterid = cluster;
 	cluster_hotplug_mask = NULL;
 update:
 	this_cpu_write(cluster_masks, cmsk);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bisected] rcu_sched detected stalls - 4.15 or newer kernel with some Xeon skylake CPUs and extended APIC
  2018-05-17 12:36         ` Thomas Gleixner
@ 2018-05-17 15:59           ` Rick Warner
  2018-05-17 19:03           ` [tip:x86/urgent] x86/apic/x2apic: Initialize cluster ID properly tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 8+ messages in thread
From: Rick Warner @ 2018-05-17 15:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List

Success!

This resolved the issue.  Thanks Thomas!

Rick

On 05/17/18 08:36, Thomas Gleixner wrote:
> Rick,
>
> On Wed, 16 May 2018, Rick Warner wrote:
>
>> I've attached the dmesg output with the kernel parameter and supplied patch.
> Thanks for providing the data. I think I know what's wrong here. Does the
> patch below fix your problem?
>
> While you test, I go looking for a big brown paperbag.
>
> Thanks,
>
> 	tglx
>
> 8<----------------
>
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -124,6 +124,7 @@ static void init_x2apic_ldr(void)
>  			goto update;
>  	}
>  	cmsk = cluster_hotplug_mask;
> +	cmsk->clusterid = cluster;
>  	cluster_hotplug_mask = NULL;
>  update:
>  	this_cpu_write(cluster_masks, cmsk);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:x86/urgent] x86/apic/x2apic: Initialize cluster ID properly
  2018-05-17 12:36         ` Thomas Gleixner
  2018-05-17 15:59           ` Rick Warner
@ 2018-05-17 19:03           ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot for Thomas Gleixner @ 2018-05-17 19:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hpa, rick, mingo, tglx, linux-kernel

Commit-ID:  fed71f7d98795ed0fa1d431910787f0f4a68324f
Gitweb:     https://git.kernel.org/tip/fed71f7d98795ed0fa1d431910787f0f4a68324f
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 17 May 2018 14:36:39 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 17 May 2018 21:00:12 +0200

x86/apic/x2apic: Initialize cluster ID properly

Rick bisected a regression on large systems which use the x2apic cluster
mode for interrupt delivery to the commit wich reworked the cluster
management.

The problem is caused by a missing initialization of the clusterid field
in the shared cluster data structures. So all structures end up with
cluster ID 0 which only allows sharing between all CPUs which belong to
cluster 0. All other CPUs with a cluster ID > 0 cannot share the data
structure because they cannot find existing data with their cluster
ID. This causes malfunction with IPIs because IPIs are sent to the wrong
cluster and the caller waits for ever that the target CPU handles the IPI.

Add the missing initialization when a upcoming CPU is the first in a
cluster so that the later booting CPUs can find the data and share it for
proper operation.

Fixes: 023a611748fd ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <rick@microway.com>
Bisected-by: Rick Warner <rick@microway.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rick Warner <rick@microway.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
---
 arch/x86/kernel/apic/x2apic_cluster.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 8b04234e010b..7685444a106b 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -116,6 +116,7 @@ static void init_x2apic_ldr(void)
 			goto update;
 	}
 	cmsk = cluster_hotplug_mask;
+	cmsk->clusterid = cluster;
 	cluster_hotplug_mask = NULL;
 update:
 	this_cpu_write(cluster_masks, cmsk);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-17 19:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-01 16:37 stall/hang on 4.15 kernel with some Xeon skylake CPUs and extended APIC Rick Warner
2018-05-15 16:07 ` [bisected] rcu_sched detected stalls - 4.15 or newer " Rick Warner
2018-05-15 20:19   ` Thomas Gleixner
2018-05-16 14:50     ` Thomas Gleixner
2018-05-16 23:02       ` Rick Warner
2018-05-17 12:36         ` Thomas Gleixner
2018-05-17 15:59           ` Rick Warner
2018-05-17 19:03           ` [tip:x86/urgent] x86/apic/x2apic: Initialize cluster ID properly tip-bot for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.