linux-kernel.vger.kernel.org archive mirror
* [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
@ 2024-01-31 19:27 Helge Deller
  2024-01-31 22:28 ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2024-01-31 19:27 UTC
  To: Tejun Heo, Lai Jiangshan, linux-kernel; +Cc: linux-parisc

When hot-unplugging a 32-bit CPU on the parisc platform with
"chcpu -d 1", I get the following kernel panic. Adding a check
for !pwq prevents the panic.

 Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
 CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc1-32bit+ #1291
 Hardware name: 9000/778/B160L
 
 IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
  IIR: 0f80109c    ISR: 00000000  IOR: 00000000
  CPU:        1   CR30: 11dd1710 CR31: 00000000
  IAOQ[0]: wq_update_pod+0x98/0x14c
  IAOQ[1]: wq_update_pod+0x9c/0x14c
  RP(r2): wq_update_pod+0x80/0x14c
 Backtrace:
  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
  [<10452970>] smpboot_thread_fn+0x284/0x288
  [<1044d8f4>] kthread+0x12c/0x13c
  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
 Kernel panic - not syncing: Kernel Fault

Signed-off-by: Helge Deller <deller@gmx.de>

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 76e60faed892..dfeee7b7322c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4521,6 +4521,8 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
 	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu);
 	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
 					lockdep_is_held(&wq_pool_mutex));
+	if (!pwq)
+		return;
 	if (wqattrs_equal(target_attrs, pwq->pool->attrs))
 		return;
 


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-01-31 19:27 [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug Helge Deller
@ 2024-01-31 22:28 ` Tejun Heo
  2024-02-01 16:41   ` Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2024-01-31 22:28 UTC
  To: Helge Deller; +Cc: Lai Jiangshan, linux-kernel, linux-parisc

Hello,

On Wed, Jan 31, 2024 at 08:27:45PM +0100, Helge Deller wrote:
> When hot-unplugging a 32-bit CPU on the parisc platform with
> "chcpu -d 1", I get the following kernel panic. Adding a check
> for !pwq prevents the panic.
> 
>  Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
>  CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc1-32bit+ #1291
>  Hardware name: 9000/778/B160L
>  
>  IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
>   IIR: 0f80109c    ISR: 00000000  IOR: 00000000
>   CPU:        1   CR30: 11dd1710 CR31: 00000000
>   IAOQ[0]: wq_update_pod+0x98/0x14c
>   IAOQ[1]: wq_update_pod+0x9c/0x14c
>   RP(r2): wq_update_pod+0x80/0x14c
>  Backtrace:
>   [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
>   [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
>   [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
>   [<10452970>] smpboot_thread_fn+0x284/0x288
>   [<1044d8f4>] kthread+0x12c/0x13c
>   [<1040201c>] ret_from_kernel_thread+0x1c/0x24
>  Kernel panic - not syncing: Kernel Fault
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> 
> ---
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 76e60faed892..dfeee7b7322c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4521,6 +4521,8 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
>  	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu);
>  	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
>  					lockdep_is_held(&wq_pool_mutex));
> +	if (!pwq)
> +		return;

Hmm... I have a hard time imagining a scenario where some CPUs don't have
pwq installed on wq->cpu_pwq. Can you please run `drgn
tools/workqueue/wq_dump.py` before triggering the hotplug event and paste
the output along with full dmesg?

Thanks.

-- 
tejun


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-01-31 22:28 ` Tejun Heo
@ 2024-02-01 16:41   ` Helge Deller
  2024-02-01 16:54     ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2024-02-01 16:41 UTC
  To: Tejun Heo, Helge Deller; +Cc: Lai Jiangshan, linux-kernel, linux-parisc

On 1/31/24 23:28, Tejun Heo wrote:
> On Wed, Jan 31, 2024 at 08:27:45PM +0100, Helge Deller wrote:
>> When hot-unplugging a 32-bit CPU on the parisc platform with
>> "chcpu -d 1", I get the following kernel panic. Adding a check
>> for !pwq prevents the panic.
>>
>>   Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
>>   CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc1-32bit+ #1291
>>   Hardware name: 9000/778/B160L
>>
>>   IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
>>    IIR: 0f80109c    ISR: 00000000  IOR: 00000000
>>    CPU:        1   CR30: 11dd1710 CR31: 00000000
>>    IAOQ[0]: wq_update_pod+0x98/0x14c
>>    IAOQ[1]: wq_update_pod+0x9c/0x14c
>>    RP(r2): wq_update_pod+0x80/0x14c
>>   Backtrace:
>>    [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
>>    [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
>>    [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
>>    [<10452970>] smpboot_thread_fn+0x284/0x288
>>    [<1044d8f4>] kthread+0x12c/0x13c
>>    [<1040201c>] ret_from_kernel_thread+0x1c/0x24
>>   Kernel panic - not syncing: Kernel Fault
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>>
>> ---
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 76e60faed892..dfeee7b7322c 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4521,6 +4521,8 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
>>   	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu);
>>   	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
>>   					lockdep_is_held(&wq_pool_mutex));
>> +	if (!pwq)
>> +		return;
>
> Hmm... I have a hard time imagining a scenario where some CPUs don't have
> pwq installed on wq->cpu_pwq. Can you please run `drgn
> tools/workqueue/wq_dump.py` before triggering the hotplug event and paste
> the output along with full dmesg?

I'm not sure if parisc is already fully supported by that tool, or
if I'm doing something wrong:

root@debian:~# uname -a
Linux debian 6.8.0-rc1-32bit+ #1292 SMP PREEMPT Thu Feb  1 11:31:38 CET 2024 parisc GNU/Linux

root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py
Traceback (most recent call last):
   File "/usr/bin/drgn", line 33, in <module>
     sys.exit(load_entry_point('drgn==0.0.25', 'console_scripts', 'drgn')())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3/dist-packages/drgn/cli.py", line 301, in _main
     runpy.run_path(script, init_globals={"prog": prog}, run_name="__main__")
   File "<frozen runpy>", line 291, in run_path
   File "<frozen runpy>", line 98, in _run_module_code
   File "<frozen runpy>", line 88, in _run_code
   File "./wq_dump.py", line 78, in <module>
     worker_pool_idr         = prog['worker_pool_idr']
                               ~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'worker_pool_idr'

Maybe you have an idea? I'll check further, but otherwise it's probably
easier for me to add some printk() to the kernel function wq_update_pod()
and send that info?

Helge


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-01 16:41   ` Helge Deller
@ 2024-02-01 16:54     ` Tejun Heo
  2024-02-01 17:56       ` Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2024-02-01 16:54 UTC
  To: Helge Deller; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

Hello, Helge.

On Thu, Feb 01, 2024 at 05:41:10PM +0100, Helge Deller wrote:
> > Hmm... I have a hard time imagining a scenario where some CPUs don't have
> > pwq installed on wq->cpu_pwq. Can you please run `drgn
> > tools/workqueue/wq_dump.py` before triggering the hotplug event and paste
> > the output along with full dmesg?
> 
> I'm not sure if parisc is already fully supported by that tool, or
> if I'm doing something wrong:
> 
> root@debian:~# uname -a
> Linux debian 6.8.0-rc1-32bit+ #1292 SMP PREEMPT Thu Feb  1 11:31:38 CET 2024 parisc GNU/Linux
> 
> root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py
> Traceback (most recent call last):
>   File "/usr/bin/drgn", line 33, in <module>
>     sys.exit(load_entry_point('drgn==0.0.25', 'console_scripts', 'drgn')())
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/drgn/cli.py", line 301, in _main
>     runpy.run_path(script, init_globals={"prog": prog}, run_name="__main__")
>   File "<frozen runpy>", line 291, in run_path
>   File "<frozen runpy>", line 98, in _run_module_code
>   File "<frozen runpy>", line 88, in _run_code
>   File "./wq_dump.py", line 78, in <module>
>     worker_pool_idr         = prog['worker_pool_idr']
>                               ~~~~^^^^^^^^^^^^^^^^^^^
> KeyError: 'worker_pool_idr'

Does the kernel have CONFIG_DEBUG_INFO enabled? If you can look up
worker_pool_idr in gdb, drgn should be able to do the same.

> Maybe you have an idea? I'll check further, but otherwise it's probably
> easier for me to add some printk() to the kernel function wq_update_pod()
> and send that info?

Can you first try with drgn? The script dumps all the config info, so it's
likely easier to view that way. If that doesn't work out, I can write up a
debug patch.

Thanks.

-- 
tejun


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-01 16:54     ` Tejun Heo
@ 2024-02-01 17:56       ` Helge Deller
  2024-02-02  1:39         ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2024-02-01 17:56 UTC
  To: Tejun Heo; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

Hi Tejun,

On 2/1/24 17:54, Tejun Heo wrote:
> On Thu, Feb 01, 2024 at 05:41:10PM +0100, Helge Deller wrote:
>>> Hmm... I have a hard time imagining a scenario where some CPUs don't have
>>> pwq installed on wq->cpu_pwq. Can you please run `drgn
>>> tools/workqueue/wq_dump.py` before triggering the hotplug event and paste
>>> the output along with full dmesg?

Enabling CONFIG_DEBUG_INFO=y did the trick :-)
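
The earlier KeyError is consistent with drgn having no debug info from which to resolve `worker_pool_idr`. A minimal sketch of checking the option before running the script, assuming the build's .config is available as text. The helper name `option_enabled` and the two sample excerpts are hypothetical and not taken from the reporter's build:

```python
def option_enabled(config_text: str, option: str) -> bool:
    """Return True if a kernel .config text sets `option` to y."""
    wanted = f"{option}=y"
    # A disabled option appears as "# CONFIG_FOO is not set", which never matches.
    return any(line.strip() == wanted for line in config_text.splitlines())


# Hypothetical .config excerpts, used only to exercise the helper.
without_debug = "CONFIG_SMP=y\n# CONFIG_DEBUG_INFO is not set\n"
with_debug = "CONFIG_SMP=y\nCONFIG_DEBUG_INFO=y\n"

print(option_enabled(without_debug, "CONFIG_DEBUG_INFO"))
print(option_enabled(with_debug, "CONFIG_DEBUG_INFO"))
```

In practice the text would come from reading the .config in the kernel build directory, and the same check applies to any other `=y` option.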


root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py 2>&1 | tee L
Affinity Scopes
===============
wq_unbound_cpumask=0000ffff

CPU
   nr_pods  16
   pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 [4]=00000010 [5]=00000020 [6]=00000040 [7]=00000080 [8]=00000100 [9]=00000200 [10]=00000400 [11]=00000800 [12]=00001000 [13]=00002000 [14]=00004000 [15]=00008000
   pod_node [0]=0 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 [6]=0 [7]=0 [8]=0 [9]=0 [10]=0 [11]=0 [12]=0 [13]=0 [14]=0 [15]=0
   cpu_pod  [0]=0 [1]=1

SMT
   nr_pods  16
   pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 [4]=00000010 [5]=00000020 [6]=00000040 [7]=00000080 [8]=00000100 [9]=00000200 [10]=00000400 [11]=00000800 [12]=00001000 [13]=00002000 [14]=00004000 [15]=00008000
   pod_node [0]=0 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 [6]=0 [7]=0 [8]=0 [9]=0 [10]=0 [11]=0 [12]=0 [13]=0 [14]=0 [15]=0
   cpu_pod  [0]=0 [1]=1

CACHE (default)
   nr_pods  1
   pod_cpus [0]=0000ffff
   pod_node [0]=0
   cpu_pod  [0]=0 [1]=0

NUMA
   nr_pods  1
   pod_cpus [0]=0000ffff
   pod_node [0]=0
   cpu_pod  [0]=0 [1]=0

SYSTEM
   nr_pods  1
   pod_cpus [0]=0000ffff
   pod_node [0]=-1
   cpu_pod  [0]=0 [1]=0

Worker Pools
============
pool[00] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  0
pool[01] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  0
pool[02] ref= 1 nice=  0 idle/workers=  4/  4 cpu=  1
pool[03] ref= 1 nice=-20 idle/workers=  2/  2 cpu=  1
pool[04] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  2
pool[05] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  2
pool[06] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  3
pool[07] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  3
pool[08] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  4
pool[09] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  4
pool[10] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  5
pool[11] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  5
pool[12] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  6
pool[13] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  6
pool[14] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  7
pool[15] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  7
pool[16] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  8
pool[17] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  8
pool[18] ref= 1 nice=  0 idle/workers=  0/  0 cpu=  9
pool[19] ref= 1 nice=-20 idle/workers=  0/  0 cpu=  9
pool[20] ref= 1 nice=  0 idle/workers=  0/  0 cpu= 10
pool[21] ref= 1 nice=-20 idle/workers=  0/  0 cpu= 10
pool[22] ref= 1 nice=  0 idle/workers=  0/  0 cpu= 11
pool[23] ref= 1 nice=-20 idle/workers=  0/  0 cpu= 11
pool[24] ref= 1 nice=  0 idle/workers=  0/  0 cpu= 12
pool[25] ref= 1 nice=-20 idle/workers=  0/  0 cpu= 12
pool[26] ref= 1 nice=  0 idle/workers=  0/  0 cpu= 13
pool[27] ref= 1 nice=-20 idle/workers=  0/  0 cpu= 13
pool[28] ref= 1 nice=  0 idle/workers=  0/  0 cpu= 14
pool[29] ref= 1 nice=-20 idle/workers=  0/  0 cpu= 14
pool[30] ref= 1 nice=  0 idle/workers=  0/  0 cpu= 15
pool[31] ref= 1 nice=-20 idle/workers=  0/  0 cpu= 15
pool[32] ref=28 nice=  0 idle/workers=  8/  8 cpus=0000ffff pod_cpus=0000ffff

Workqueue CPU -> pool
=====================
[    workqueue     \     type   CPU  0  1 dfl]
events                   percpu      0  2
events_highpri           percpu      1  3
events_long              percpu      0  2
events_unbound           unbound    32 32 32
events_freezable         percpu      0  2
events_power_efficient   percpu      0  2
events_freezable_power_  percpu      0  2
rcu_gp                   percpu      0  2
rcu_par_gp               percpu      0  2
slub_flushwq             percpu      0  2
netns                    ordered    32 32 32
mm_percpu_wq             percpu      0  2
inet_frag_wq             percpu      0  2
cgroup_destroy           percpu      0  2
cgroup_pidlist_destroy   percpu      0  2
cgwb_release             percpu      0  2
writeback                unbound    32 32 32
kintegrityd              percpu      1  3
kblockd                  percpu      1  3
blkcg_punt_bio           unbound    32 32 32
ata_sff                  percpu      0  2
usb_hub_wq               percpu      0  2
inode_switch_wbs         percpu      0  2
virtio-blk               percpu      0  2
scsi_tmf_0               ordered    32 32 32
psmouse-smbus            percpu      0  2
kpsmoused                ordered    32 32 32
sock_diag_events         percpu      0  2
kstrp                    ordered    32 32 32
ext4-rsv-conversion      ordered    32 32 32
root@debian:~#
root@debian:~# lscpu
Architecture:          parisc
   Byte Order:          Big Endian
CPU(s):                2
   On-line CPU(s) list: 0,1
Model name:            PA7300LC (PCX-L2)
   CPU family:          PA-RISC 1.1e
   Model:               9000/778/B160L - Merlin L2 160 (9000/778/B160L)
   Thread(s) per core:  1
   Core(s) per socket:  1
   Socket(s):           2
   BogoMIPS:            2446.13
root@debian:~#
root@debian:~# chcpu -d 1
[  261.926353] Backtrace:
[  261.928292]  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
[  261.928292]  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
[  261.928292]  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
[  261.928292]  [<10452970>] smpboot_thread_fn+0x284/0x288
[  261.928292]  [<1044d8f4>] kthread+0x12c/0x13c
[  261.928292]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
[  261.928292]
[  261.928292]
[  261.928292] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
[  261.928292] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc1-32bit+ #1293
[  261.928292] Hardware name: 9000/778/B160L
[  261.928292]
[  261.928292]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[  261.928292] PSW: 00000000000001101111111100001111 Not tainted
[  261.928292] r00-03  0006ff0f 11011540 10446d9c 11e00500
[  261.928292] r04-07  11c0b800 00000002 11c0d000 00000001
[  261.928292] r08-11  110194e4 11018f08 00000000 00000004
[  261.928292] r12-15  10c78800 00000612 f0028050 f0027fd8
[  261.928292] r16-19  fffffffc fee01180 f0027ed8 01735000
[  261.928292] r20-23  0000ffff 1249cc00 1249cc00 00000000
[  261.928292] r24-27  11c0c580 11c0d004 11c0d000 10ceb708
[  261.928292] r28-31  00000000 0000000e 11e00580 00000018
[  261.928292] sr00-03  00000000 00000000 00000000 000004be
[  261.928292] sr04-07  00000000 00000000 00000000 00000000
[  261.928292]
[  261.928292] IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
[  261.928292]  IIR: 0f80109c    ISR: 00000000  IOR: 00000000
[  261.928292]  CPU:        1   CR30: 11dd1710 CR31: 00000000
[  261.928292]  ORIG_R28: 00000612
[  261.928292]  IAOQ[0]: wq_update_pod+0x98/0x14c
[  261.928292]  IAOQ[1]: wq_update_pod+0x9c/0x14c
[  261.928292]  RP(r2): wq_update_pod+0x80/0x14c
[  261.928292] Backtrace:
[  261.928292]  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
[  261.928292]  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
[  261.928292]  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
[  261.928292]  [<10452970>] smpboot_thread_fn+0x284/0x288
[  261.928292]  [<1044d8f4>] kthread+0x12c/0x13c
[  261.928292]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
[  261.928292]
[  261.928292] Kernel panic - not syncing: Kernel Fault



* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-01 17:56       ` Helge Deller
@ 2024-02-02  1:39         ` Tejun Heo
  2024-02-02  8:28           ` Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2024-02-02  1:39 UTC
  To: Helge Deller; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

Hello,

On Thu, Feb 01, 2024 at 06:56:20PM +0100, Helge Deller wrote:
> root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py 2>&1 | tee L
> Affinity Scopes
> ===============
> wq_unbound_cpumask=0000ffff
> 
> CPU
>   nr_pods  16
>   pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 [4]=00000010 [5]=00000020 [6]=00000040 [7]=00000080 [8]=00000100 [9]=00000200 [10]=00000400 [11]=00000800 [12]=00001000 [13]=00002000 [14]=00004000 [15]=00008000
>   pod_node [0]=0 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 [6]=0 [7]=0 [8]=0 [9]=0 [10]=0 [11]=0 [12]=0 [13]=0 [14]=0 [15]=0
>   cpu_pod  [0]=0 [1]=1

wq_unbound_cpumask is saying there are 16 possible cpus but
for_each_possible_cpu() iteration is only giving two. Can you please apply
the following patch and post the boot dmesg? Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ffb625db9771..d3fa2bea4d75 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7146,6 +7146,9 @@ void __init workqueue_init_early(void)
 	BUG_ON(!alloc_cpumask_var(&wq_requested_unbound_cpumask, GFP_KERNEL));
 	BUG_ON(!zalloc_cpumask_var(&wq_isolated_cpumask, GFP_KERNEL));
 
+	printk("XXX workqueue_init_early: possible_cpus=%*pb\n",
+	       cpumask_pr_args(cpu_possible_mask));
+
 	cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
 	restrict_unbound_cpumask("HK_TYPE_WQ", housekeeping_cpumask(HK_TYPE_WQ));
 	restrict_unbound_cpumask("HK_TYPE_DOMAIN", housekeeping_cpumask(HK_TYPE_DOMAIN));
@@ -7290,6 +7293,9 @@ void __init workqueue_init(void)
 	struct worker_pool *pool;
 	int cpu, bkt;
 
+	printk("XXX workqueue_init: possible_cpus=%*pb\n",
+	       cpumask_pr_args(cpu_possible_mask));
+
 	wq_cpu_intensive_thresh_init();
 
 	mutex_lock(&wq_pool_mutex);
@@ -7401,6 +7407,9 @@ void __init workqueue_init_topology(void)
 	struct workqueue_struct *wq;
 	int cpu;
 
+	printk("XXX workqueue_init_topology: possible_cpus=%*pb\n",
+	       cpumask_pr_args(cpu_possible_mask));
+
 	init_pod_type(&wq_pod_types[WQ_AFFN_CPU], cpus_dont_share);
 	init_pod_type(&wq_pod_types[WQ_AFFN_SMT], cpus_share_smt);
 	init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);
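
The mismatch described above can also be checked by hand from the quoted wq_dump.py output. The following is only a sketch: the helper names `cpus_in_mask` and `indexed_cpus` are made up for illustration, and the handling of comma-separated mask groups is an assumption about how larger masks are printed:

```python
import re


def cpus_in_mask(mask_hex: str) -> list[int]:
    """Return the CPU numbers whose bits are set in a hex cpumask string."""
    # Assumption: larger masks may be printed in comma-separated groups.
    value = int(mask_hex.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def indexed_cpus(dump_line: str) -> list[int]:
    """Return the indices that appear as "[N]=" entries in a dump line."""
    return [int(n) for n in re.findall(r"\[(\d+)\]=", dump_line)]


# Values copied from the wq_dump.py output quoted in this thread.
mask_cpus = cpus_in_mask("0000ffff")
pod_cpus = indexed_cpus("cpu_pod  [0]=0 [1]=1")

print(len(mask_cpus), "CPUs set in wq_unbound_cpumask")
print(len(pod_cpus), "CPUs listed under cpu_pod")
print("CPUs in the mask but not listed:", sorted(set(mask_cpus) - set(pod_cpus)))
```

The mask has 16 bits set while cpu_pod lists only CPUs 0 and 1, which is the discrepancy the debug printk lines are meant to narrow down.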


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-02  1:39         ` Tejun Heo
@ 2024-02-02  8:28           ` Helge Deller
  2024-02-02  8:41             ` Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2024-02-02  8:28 UTC
  To: Tejun Heo; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

On 2/2/24 02:39, Tejun Heo wrote:
> Hello,
>
> On Thu, Feb 01, 2024 at 06:56:20PM +0100, Helge Deller wrote:
>> root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py 2>&1 | tee L
>> Affinity Scopes
>> ===============
>> wq_unbound_cpumask=0000ffff
>>
>> CPU
>>    nr_pods  16
>>    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 [4]=00000010 [5]=00000020 [6]=00000040 [7]=00000080 [8]=00000100 [9]=00000200 [10]=00000400 [11]=00000800 [12]=00001000 [13]=00002000 [14]=00004000 [15]=00008000
>>    pod_node [0]=0 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 [6]=0 [7]=0 [8]=0 [9]=0 [10]=0 [11]=0 [12]=0 [13]=0 [14]=0 [15]=0
>>    cpu_pod  [0]=0 [1]=1
>
> wq_unbound_cpumask is saying there are 16 possible cpus but
> for_each_possible_cpu() iteration is only giving two. Can you please apply
> the following patch and post the boot dmesg? Thanks.
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index ffb625db9771..d3fa2bea4d75 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7146,6 +7146,9 @@ void __init workqueue_init_early(void)
>   	BUG_ON(!alloc_cpumask_var(&wq_requested_unbound_cpumask, GFP_KERNEL));
>   	BUG_ON(!zalloc_cpumask_var(&wq_isolated_cpumask, GFP_KERNEL));
>
> +	printk("XXX workqueue_init_early: possible_cpus=%*pb\n",
> +	       cpumask_pr_args(cpu_possible_mask));
> +
>   	cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
>   	restrict_unbound_cpumask("HK_TYPE_WQ", housekeeping_cpumask(HK_TYPE_WQ));
>   	restrict_unbound_cpumask("HK_TYPE_DOMAIN", housekeeping_cpumask(HK_TYPE_DOMAIN));
> @@ -7290,6 +7293,9 @@ void __init workqueue_init(void)
>   	struct worker_pool *pool;
>   	int cpu, bkt;
>
> +	printk("XXX workqueue_init: possible_cpus=%*pb\n",
> +	       cpumask_pr_args(cpu_possible_mask));
> +
>   	wq_cpu_intensive_thresh_init();
>
>   	mutex_lock(&wq_pool_mutex);
> @@ -7401,6 +7407,9 @@ void __init workqueue_init_topology(void)
>   	struct workqueue_struct *wq;
>   	int cpu;
>
> +	printk("XXX workqueue_init_topology: possible_cpus=%*pb\n",
> +	       cpumask_pr_args(cpu_possible_mask));
> +
>   	init_pod_type(&wq_pod_types[WQ_AFFN_CPU], cpus_dont_share);
>   	init_pod_type(&wq_pod_types[WQ_AFFN_SMT], cpus_share_smt);
>   	init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);

Here it is:

[    0.000000] Linux version 6.8.0-rc2-32bit+ (deller@carbonx1) (hppa-linux-gnu-gcc (GCC) 13.2.1 20230728 (Red Hat Cross 13.2.1-1), GNU ld version 2.40-3.fc39) #1294 SMP PREEMPT Fri Feb  2 09:24:28 CET 2024
[    0.000000] FP[0] enabled: Rev 1 Model 15
[    0.000000] The 32-bit Kernel has started...
[    0.000000] Kernel default page size is 4 KB. Huge pages disabled.
[    0.000000] Determining PDC firmware type: System Map.
[    0.000000] model 00005020 00000481 00000000 02020202 77729da0 100000f0 00000004 000000ba 000000ba 00000000
[    0.000000] vers  00000008
[    0.000000] CPUID vers 15 rev 8 (0x000001e8)
[    0.000000] capabilities 0x2
[    0.000000] HP-UX model name: 9000/778/B160L
[    0.000000] MPE/iX model name: 9000/778/B160L
[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x000000001fffffff Size    512 MB
[    0.000000] Total Memory: 512 MB
[    0.000000] PDT: Firmware does not provide any page deallocation information.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000001ffffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000] percpu: Embedded 16 pages/cpu s34912 r8192 d22432 u65536
[    0.000000] SMP: bootstrap CPU ID is 0
[    0.000000] Kernel command line: root=/dev/sda5  panic=-1 console=ttyS0  earlycon=pdc
[    0.000000] earlycon: pdc0 at MMIO32be 0x00000000 (options '')
[    0.000000] printk: legacy bootconsole [pdc0] enabled
[    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] printk: log_buf_len total cpu_extra contributions: 61440 bytes
[    0.000000] printk: log_buf_len min size: 65536 bytes
[    0.000000] printk: log_buf_len: 131072 bytes
[    0.000000] printk: early log buf free: 63792(97%)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 129920
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 497732K/524288K available (7216K kernel code, 3452K rwdata, 2176K rodata, 3072K init, 388K bss, 26556K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[    0.000000] XXX workqueue_init_early: possible_cpus=ffff
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000]  Trampoline variant of Tasks RCU enabled.
[    0.000000]  Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 96
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000149] sched_clock: 32 bits at 250MHz, resolution 4ns, wraps every 8589934590ns
[    0.005146] Console: colour dummy device 160x64
[    0.006640] Calibrating delay loop... 1771.11 BogoMIPS (lpj=8855552)
[    0.123925] pid_max: default: 32768 minimum: 301
[    0.128202] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.128461] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.159241] XXX workqueue_init: possible_cpus=ffff
[    0.175221] RCU Tasks: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
[    0.176798] RCU Tasks Trace: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
[    0.177562] TOC handler registered
[    0.178992] rcu: Hierarchical SRCU implementation.
[    0.179222] rcu:     Max phase no-delay instances is 1000.
[    0.193315] smp: Bringing up secondary CPUs ...
[    0.194144] smp: Brought up 1 node, 1 CPU
[    0.199106] XXX workqueue_init_topology: possible_cpus=ffff
[    0.208842] devtmpfs: initialized
[    0.217239] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.218067] futex hash table entries: 4096 (order: 5, 131072 bytes, linear)
[    0.224055] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.231058] EISA bus registered
[    0.231481] Searching for devices...
[    0.237416] Found devices:
[    0.237623] 1. Phantom PseudoBC GSC+ Port [8] at 0xffc00000 { type:7, hv:0x504, sv:0x0, rev:0x0 }
[    0.237874] 2. Dino PCI Bridge [8:0] at 0xfff80000 { type:13, hv:0x680, sv:0xa, rev:0x3 }
[    0.238234] 3. Merlin+ 132 Dino RS-232 [8:0:63] at 0xfff83000 { type:10, hv:0x22, sv:0x8c, rev:0x0 }
[    0.238490] 4. Merlin 160 Core BA [8:16] at 0xffd00000 { type:11, hv:0x3d, sv:0x81, rev:0x0 }, additional addresses: 0xffd0c000 0xffc00000
[    0.238923] 5. Merlin 160 Core RS-232 [8:16:4] at 0xffd05000 { type:10, hv:0x3d, sv:0x8c, rev:0x0 }
[    0.239193] 6. Merlin 160 Core Centronics [8:16:0] at 0xffd02000 { type:10, hv:0x3d, sv:0x74, rev:0x0 }, additional addresses: 0xffd01000 0xffd03000
[    0.239595] 7. Memory [63] at 0xfffff000 { type:1, hv:0x67, sv:0x9, rev:0x0 }
[    0.239839] 8. Merlin L2 160 (9000/778/B160L) [48] at 0xfffb0000 { type:0, hv:0x502, sv:0x4, rev:0x0 }
[    0.240145] 9. Merlin L2 160 (9000/778/B160L) [49] at 0xfffb1000 { type:0, hv:0x502, sv:0x4, rev:0x0 }
[    0.240509] Found qemu fw_cfg interface at 0xfffa0000
[    0.243313] CPU0: cpu core 0 of socket 0
[    0.244475] CPU1: cpu core 0 of socket 1
[    0.248134] Releasing cpu 1 now, hpa=fffb1000
[    0.255231] CPU(s): 2 out of 2 PA7300LC (PCX-L2) at 250.000000 MHz online
[    0.257077] alternatives: applied 17 out of 1505 patches
[    0.258013] Calculated flush threshold is 5909 KiB
[    0.258162] Cache flush threshold set to 2 KiB
[    0.258304] TLB flush threshold set to 480 KiB
[    0.259216] Lasi version 0 at 0xffd00000 found.
[    0.261123] Dino version 3.1 found at 0xfff80000
[    0.263459] dino 8:0: PCI host bridge to bus 0000:00
[    0.263841] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.264196] pci_bus 0000:00: root bus resource [mem 0xf0800000-0xff7fffff]
[    0.264476] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.265734] pci 0000:00:00.0: [1000:0012] type 00 class 0x010000 conventional PCI endpoint
[    0.267232] pci 0000:00:00.0: BAR 0 [io  0x1000-0x10ff]
[    0.268292] pci 0000:00:00.0: BAR 1 [mem 0xff7fe000-0xff7fe3ff]
[    0.269408] pci 0000:00:00.0: BAR 2 [mem 0xff7fc000-0xff7fdfff]
[    0.277602] pci 0000:00:01.0: [1011:0019] type 00 class 0x020000 conventional PCI endpoint
[    0.278298] pci 0000:00:01.0: BAR 0 [io  0x1100-0x117f]
[    0.278820] pci 0000:00:01.0: BAR 1 [mem 0xff7ff000-0xff7ff07f]
[    0.281323] pci 0000:00:02.0: [103c:1048] type 00 class 0x070000 conventional PCI endpoint
[    0.281867] pci 0000:00:02.0: BAR 0 [io  0x1180-0x1187]
[    0.284921] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
[    0.287179] pci 0000:00:00.0: BAR 2 [mem 0xf0800000-0xf0801fff]: assigned
[    0.287830] pci 0000:00:00.0: BAR 1 [mem 0xf0802000-0xf08023ff]: assigned
[    0.288225] pci 0000:00:00.0: BAR 0 [io  0x0100-0x01ff]: assigned
[    0.288628] pci 0000:00:01.0: BAR 0 [io  0x0080-0x00ff]: assigned
[    0.289002] pci 0000:00:01.0: BAR 1 [mem 0xf0803000-0xf080307f]: assigned
[    0.289359] pci 0000:00:02.0: BAR 0 [io  0x0010-0x0017]: assigned
[    0.290999] powersw: Soft power switch at 0xf07ffff0 enabled.
[    0.314551] SCSI subsystem initialized
[    0.317664] usbcore: registered new interface driver usbfs
[    0.318202] usbcore: registered new interface driver hub
[    0.318580] usbcore: registered new device driver usb
[    0.334210] vgaarb: loaded
[    0.337059] VFS: Disk quotas dquot_6.6.0
[    0.337473] VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.369580] NET: Registered PF_INET protocol family
[    0.371471] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.380082] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 5120 bytes, linear)
[    0.381103] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.381397] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.382605] TCP bind hash table entries: 4096 (order: 5, 163840 bytes, linear)
[    0.383338] TCP: Hash tables configured (established 4096 bind 4096)
[    0.385144] UDP hash table entries: 256 (order: 1, 12288 bytes, linear)
[    0.385743] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes, linear)
[    0.387545] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.388172] PCI: CLS 16 bytes
[    0.390311] clocksource: cr16: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041786 ns
[    0.391141] clocksource: Switched to clocksource cr16
[    0.391756] Enabling PDC chassis warnings support v0.05
[    0.396076] workingset: timestamp_bits=14 max_order=17 bucket_order=3
[    0.404012] io scheduler mq-deadline registered
[    0.404324] io scheduler kyber registered
[    0.407216] PDC Stable Storage facility v0.30
[    0.408521] sticore: STI GSC/PCI core graphics driver Version 0.9c
[    0.496987] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.511533] printk: legacy console [ttyS0] disabled
[    0.515146] 8:16:4: ttyS0 at MMIO 0xffd05800 (irq = 16, base_baud = 454545) is a 16550A
[    0.518634] printk: legacy console [ttyS0] enabled
[    0.518634] printk: legacy console [ttyS0] enabled
[    0.519267] printk: legacy bootconsole [pdc0] disabled
[    0.519267] printk: legacy bootconsole [pdc0] disabled
[    0.528540] 8:0:63: ttyS1 at MMIO 0xfff83800 (irq = 18, base_baud = 454545) is a 16550A
[    0.531850] serial 0000:00:02.0: enabling SERR and PARITY (0103 -> 0143)
[    0.535316] 0000:00:02.0: ttyS2 at I/O 0x10 (irq = 21, base_baud = 115200) is a 16550A
[    0.539332] parport_init_chip: initialize bidirectional-mode
[    0.540300] parport0: PC-style at 0xffd02800, irq 17 [PCSPP,TRISTATE]
[    0.667976] loop: module loaded
[    0.669638] sym53c8xx 0000:00:00.0: enabling SERR and PARITY (0107 -> 0147)
[    0.672643] sym0: <895a> rev 0x0 at pci 0000:00:00.0 irq 19
[    0.678720] sym0: PA-RISC Firmware, ID 7, Fast-40, LVD, parity checking
[    0.684386] sym0: SCSI BUS has been reset.
[    0.687904] scsi host0: sym-2.2.3
[    3.783993] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    3.784726] scsi target0:0:0: tagged command queuing enabled, command queue depth 16.
[    3.785900] scsi target0:0:0: Beginning Domain Validation
[    3.788722] scsi target0:0:0: Domain Validation skipping write tests
[    3.788895] scsi target0:0:0: Ending Domain Validation
[    3.801756] scsi 0:0:2:0: CD-ROM            QEMU     QEMU CD-ROM      2.5+ PQ: 0 ANSI: 5
[    3.802043] scsi target0:0:2: tagged command queuing enabled, command queue depth 16.
[    3.802392] scsi target0:0:2: Beginning Domain Validation
[    3.803995] scsi target0:0:2: Domain Validation skipping write tests
[    3.804138] scsi target0:0:2: Ending Domain Validation
[    3.816982] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[    3.820181] sr 0:0:2:0: Power-on or device reset occurred
[    3.822714] sr 0:0:2:0: [sr0] scsi3-mmc drive: 16x/50x cd/rw xa/form2 cdda tray
[    3.823139] cdrom: Uniform CD-ROM driver Revision: 3.20
[    3.824901] sd 0:0:0:0: Power-on or device reset occurred
[    3.828055] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
[    3.829255] sd 0:0:0:0: [sda] Write Protect is off
[    3.832789] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    3.833235] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.833534] sr 0:0:2:0: Attached scsi generic sg1 type 5
[    3.835756] tulip 0000:00:01.0: enabling SERR and PARITY (0103 -> 0143)
[    3.849999] tulip0: EEPROM default media type Autosense
[    3.850253] tulip0: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block
[    3.853899] tulip0:  MII transceiver #1 config 3100 status 702c advertising 0501
[    3.856302]  sda: sda1 sda2 sda3 < sda5 sda6 >
[    3.870391] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.871194] net eth0: Digital DS21142/43 Tulip rev 0 at Port 0x80, 52:54:00:12:34:56, IRQ 20
[    3.872279] e100: Intel(R) PRO/100 Network Driver
[    3.872486] e100: Copyright(c) 1999-2006 Intel Corporation
[    3.873836] e1000: Intel(R) PRO/1000 Network Driver
[    3.874067] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    3.874406] e1000e: Intel(R) PRO/1000 Network Driver
[    3.874536] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    3.875010] igb: Intel(R) Gigabit Ethernet Network Driver
[    3.875212] igb: Copyright (c) 2007-2014 Intel Corporation.
[    3.875563] LASI 82596 driver - Revision: 1.30
[    3.878798] HP SDC: No SDC found.
[    3.879000] HP SDC MLC: Registering the System Domain Controller's HIL MLC.
[    3.884554] HP SDC MLC: Request for raw HIL ISR hook denied
[    3.906699] mousedev: PS/2 mouse device common for all mice
[    3.912379] rtc-generic rtc-generic: registered as rtc0
[    3.913033] rtc-generic rtc-generic: setting system clock to 2024-02-02T08:24:45 UTC (1706862285)
[    3.919114] hid: raw HID events driver (C) Jiri Kosina
[    3.919926] usbcore: registered new interface driver usbhid
[    3.920146] usbhid: USB HID core driver
[    3.921737] NET: Registered PF_PACKET protocol family
[    3.924787] Key type dns_resolver registered
[    4.092592] EXT4-fs (sda5): INFO: recovery required on readonly filesystem
[    4.092910] EXT4-fs (sda5): write access will be enabled during recovery
[    4.200680] EXT4-fs (sda5): recovery complete
[    4.207887] EXT4-fs (sda5): mounted filesystem 0e24f05b-efd0-4f75-b1de-3309bd27dbd8 ro with ordered data mode. Quota mode: none.
[    4.209030] VFS: Mounted root (ext4 filesystem) readonly on device 8:5.
[    4.214953] devtmpfs: mounted
[    4.267329] Freeing unused kernel image (initmem) memory: 3072K
[    4.267826] Write protected read-only-after-init data: 2k
[    4.268225] Run /sbin/init as init process
[    4.305438] process 'usr/lib/systemd/systemd' started with executable stack
[    4.932357] systemd[1]: systemd 255.3-2 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[    4.933521] systemd[1]: Detected architecture parisc.

Welcome to Debian GNU/Linux trixie/sid!

[...]
root@debian:~# lscpu
Architecture:          parisc
   Byte Order:          Big Endian
CPU(s):                2
   On-line CPU(s) list: 0,1
Model name:            PA7300LC (PCX-L2)
   CPU family:          PA-RISC 1.1e
   Model:               9000/778/B160L - Merlin L2 160 (9000/778/B160L)
   Thread(s) per core:  1
   Core(s) per socket:  1
   Socket(s):           2
   BogoMIPS:            1771.11
root@debian:~# chcpu -d 1
[   84.800279] Backtrace:
[   84.802648]  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
[   84.802648]  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
[   84.802648]  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
[   84.802648]  [<10452970>] smpboot_thread_fn+0x284/0x288
[   84.802648]  [<1044d8f4>] kthread+0x12c/0x13c
[   84.802648]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
[   84.802648]
[   84.802648]
[   84.802648] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
[   84.802648] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc2-32bit+ #1294
[   84.802648] Hardware name: 9000/778/B160L
[   84.802648]
[   84.802648]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[   84.802648] PSW: 00000000000001001111111100001111 Not tainted
[   84.802648] r00-03  0004ff0f 11011540 10446d9c 11e00500
[   84.802648] r04-07  11c0b800 00000002 11c0d000 00000001
[   84.802648] r08-11  110194e4 11019168 00000000 00000004
[   84.802648] r12-15  10c78800 00000612 f0028050 f0027fd8
[   84.802648] r16-19  fffffffc fee01180 f0027ed8 01735000
[   84.802648] r20-23  0000ffff 13ae1a00 13ae1a00 00000000
[   84.802648] r24-27  11c0c580 11c0d004 11c0d000 10ceb968
[   84.802648] r28-31  00000000 0000000e 11e00580 00000018
[   84.802648] sr00-03  00000000 00000000 00000000 000004af
[   84.802648] sr04-07  00000000 00000000 00000000 00000000
[   84.802648]
[   84.802648] IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
[   84.802648]  IIR: 0f80109c    ISR: 00000000  IOR: 00000000
[   84.802648]  CPU:        1   CR30: 11dd1710 CR31: 00000000
[   84.802648]  ORIG_R28: 00000612
[   84.802648]  IAOQ[0]: wq_update_pod+0x98/0x14c
[   84.802648]  IAOQ[1]: wq_update_pod+0x9c/0x14c
[   84.802648]  RP(r2): wq_update_pod+0x80/0x14c
[   84.802648] Backtrace:
[   84.802648]  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
[   84.802648]  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
[   84.802648]  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
[   84.802648]  [<10452970>] smpboot_thread_fn+0x284/0x288
[   84.802648]  [<1044d8f4>] kthread+0x12c/0x13c
[   84.802648]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
[   84.802648]
[   84.802648] Kernel panic - not syncing: Kernel Fault



* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-02  8:28           ` Helge Deller
@ 2024-02-02  8:41             ` Helge Deller
  2024-02-02 17:29               ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2024-02-02  8:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

On 2/2/24 09:28, Helge Deller wrote:
> On 2/2/24 02:39, Tejun Heo wrote:
>> Hello,
>>
>> On Thu, Feb 01, 2024 at 06:56:20PM +0100, Helge Deller wrote:
>>> root@debian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py 2>&1 | tee L
>>> Affinity Scopes
>>> ===============
>>> wq_unbound_cpumask=0000ffff
>>>
>>> CPU
>>>    nr_pods  16
>>>    pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008 [4]=00000010 [5]=00000020 [6]=00000040 [7]=00000080 [8]=00000100 [9]=00000200 [10]=00000400 [11]=00000800 [12]=00001000 [13]=00002000 [14]=00004000 [15]=00008000
>>>    pod_node [0]=0 [1]=0 [2]=0 [3]=0 [4]=0 [5]=0 [6]=0 [7]=0 [8]=0 [9]=0 [10]=0 [11]=0 [12]=0 [13]=0 [14]=0 [15]=0
>>>    cpu_pod  [0]=0 [1]=1
>>
>> wq_unbound_cpumask is saying there are 16 possible cpus but
>> for_each_possible_cpu() iteration is only giving two. Can you please apply
>> the following patch and post the boot dmesg? Thanks.
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index ffb625db9771..d3fa2bea4d75 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -7146,6 +7146,9 @@ void __init workqueue_init_early(void)
>>       BUG_ON(!alloc_cpumask_var(&wq_requested_unbound_cpumask, GFP_KERNEL));
>>       BUG_ON(!zalloc_cpumask_var(&wq_isolated_cpumask, GFP_KERNEL));
>>
>> +    printk("XXX workqueue_init_early: possible_cpus=%*pb\n",
>> +           cpumask_pr_args(cpu_possible_mask));
>> +
>>       cpumask_copy(wq_unbound_cpumask, cpu_possible_mask);
>>       restrict_unbound_cpumask("HK_TYPE_WQ", housekeeping_cpumask(HK_TYPE_WQ));
>>       restrict_unbound_cpumask("HK_TYPE_DOMAIN", housekeeping_cpumask(HK_TYPE_DOMAIN));
>> @@ -7290,6 +7293,9 @@ void __init workqueue_init(void)
>>       struct worker_pool *pool;
>>       int cpu, bkt;
>>
>> +    printk("XXX workqueue_init: possible_cpus=%*pb\n",
>> +           cpumask_pr_args(cpu_possible_mask));
>> +
>>       wq_cpu_intensive_thresh_init();
>>
>>       mutex_lock(&wq_pool_mutex);
>> @@ -7401,6 +7407,9 @@ void __init workqueue_init_topology(void)
>>       struct workqueue_struct *wq;
>>       int cpu;
>>
>> +    printk("XXX workqueue_init_topology: possible_cpus=%*pb\n",
>> +           cpumask_pr_args(cpu_possible_mask));
>> +
>>       init_pod_type(&wq_pod_types[WQ_AFFN_CPU], cpus_dont_share);
>>       init_pod_type(&wq_pod_types[WQ_AFFN_SMT], cpus_share_smt);
>>       init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);
>
> Here it is:
>
> [    0.000000] Linux version 6.8.0-rc2-32bit+ (deller@carbonx1) (hppa-linux-gnu-gcc (GCC) 13.2.1 20230728 (Red Hat Cross 13.2.1-1), GNU ld version 2.40-3.fc39) #1294 SMP PREEMPT Fri Feb  2 09:24:28 CET 2024
> [    0.000000] FP[0] enabled: Rev 1 Model 15
> [    0.000000] The 32-bit Kernel has started...
> [    0.000000] Kernel default page size is 4 KB. Huge pages disabled.
> [    0.000000] Determining PDC firmware type: System Map.
> [    0.000000] model 00005020 00000481 00000000 02020202 77729da0 100000f0 00000004 000000ba 000000ba 00000000
> [    0.000000] vers  00000008
> [    0.000000] CPUID vers 15 rev 8 (0x000001e8)
> [    0.000000] capabilities 0x2
> [    0.000000] HP-UX model name: 9000/778/B160L
> [    0.000000] MPE/iX model name: 9000/778/B160L
> [    0.000000] Memory Ranges:
> [    0.000000]  0) Start 0x0000000000000000 End 0x000000001fffffff Size    512 MB
> [    0.000000] Total Memory: 512 MB
> [    0.000000] PDT: Firmware does not provide any page deallocation information.
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x000001ffffffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000001fffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000001fffffff]
> [    0.000000] percpu: Embedded 16 pages/cpu s34912 r8192 d22432 u65536
> [    0.000000] SMP: bootstrap CPU ID is 0
> [    0.000000] Kernel command line: root=/dev/sda5  panic=-1 console=ttyS0  earlycon=pdc
> [    0.000000] earlycon: pdc0 at MMIO32be 0x00000000 (options '')
> [    0.000000] printk: legacy bootconsole [pdc0] enabled
> [    0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
> [    0.000000] printk: log_buf_len total cpu_extra contributions: 61440 bytes
> [    0.000000] printk: log_buf_len min size: 65536 bytes
> [    0.000000] printk: log_buf_len: 131072 bytes
> [    0.000000] printk: early log buf free: 63792(97%)
> [    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
> [    0.000000] Sorting __ex_table...
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 129920
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 497732K/524288K available (7216K kernel code, 3452K rwdata, 2176K rodata, 3072K init, 388K bss, 26556K reserved, 0K cma-reserved)
> [    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
> [    0.000000] XXX workqueue_init_early: possible_cpus=ffff
> [    0.000000] rcu: Preemptible hierarchical RCU implementation.
> [    0.000000]  Trampoline variant of Tasks RCU enabled.
> [    0.000000]  Tracing variant of Tasks RCU enabled.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
> [    0.000000] NR_IRQS: 96
> [    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    0.000149] sched_clock: 32 bits at 250MHz, resolution 4ns, wraps every 8589934590ns
> [    0.005146] Console: colour dummy device 160x64
> [    0.006640] Calibrating delay loop... 1771.11 BogoMIPS (lpj=8855552)
> [    0.123925] pid_max: default: 32768 minimum: 301
> [    0.128202] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.128461] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.159241] XXX workqueue_init: possible_cpus=ffff
> [    0.175221] RCU Tasks: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
> [    0.176798] RCU Tasks Trace: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
> [    0.177562] TOC handler registered
> [    0.178992] rcu: Hierarchical SRCU implementation.
> [    0.179222] rcu:     Max phase no-delay instances is 1000.
> [    0.193315] smp: Bringing up secondary CPUs ...
> [    0.194144] smp: Brought up 1 node, 1 CPU
> [    0.199106] XXX workqueue_init_topology: possible_cpus=ffff
> [    0.208842] devtmpfs: initialized
> [    0.217239] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> [    0.218067] futex hash table entries: 4096 (order: 5, 131072 bytes, linear)
> [    0.224055] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [    0.231058] EISA bus registered
> [    0.231481] Searching for devices...
> [    0.237416] Found devices:
> [    0.237623] 1. Phantom PseudoBC GSC+ Port [8] at 0xffc00000 { type:7, hv:0x504, sv:0x0, rev:0x0 }
> [    0.237874] 2. Dino PCI Bridge [8:0] at 0xfff80000 { type:13, hv:0x680, sv:0xa, rev:0x3 }
> [    0.238234] 3. Merlin+ 132 Dino RS-232 [8:0:63] at 0xfff83000 { type:10, hv:0x22, sv:0x8c, rev:0x0 }
> [    0.238490] 4. Merlin 160 Core BA [8:16] at 0xffd00000 { type:11, hv:0x3d, sv:0x81, rev:0x0 }, additional addresses: 0xffd0c000 0xffc00000
> [    0.238923] 5. Merlin 160 Core RS-232 [8:16:4] at 0xffd05000 { type:10, hv:0x3d, sv:0x8c, rev:0x0 }
> [    0.239193] 6. Merlin 160 Core Centronics [8:16:0] at 0xffd02000 { type:10, hv:0x3d, sv:0x74, rev:0x0 }, additional addresses: 0xffd01000 0xffd03000
> [    0.239595] 7. Memory [63] at 0xfffff000 { type:1, hv:0x67, sv:0x9, rev:0x0 }
> [    0.239839] 8. Merlin L2 160 (9000/778/B160L) [48] at 0xfffb0000 { type:0, hv:0x502, sv:0x4, rev:0x0 }
> [    0.240145] 9. Merlin L2 160 (9000/778/B160L) [49] at 0xfffb1000 { type:0, hv:0x502, sv:0x4, rev:0x0 }
> [    0.240509] Found qemu fw_cfg interface at 0xfffa0000
> [    0.243313] CPU0: cpu core 0 of socket 0
> [    0.244475] CPU1: cpu core 0 of socket 1
> [    0.248134] Releasing cpu 1 now, hpa=fffb1000
> [    0.255231] CPU(s): 2 out of 2 PA7300LC (PCX-L2) at 250.000000 MHz online
> [    0.257077] alternatives: applied 17 out of 1505 patches
> [    0.258013] Calculated flush threshold is 5909 KiB
> [    0.258162] Cache flush threshold set to 2 KiB
> [    0.258304] TLB flush threshold set to 480 KiB
> [    0.259216] Lasi version 0 at 0xffd00000 found.
> [    0.261123] Dino version 3.1 found at 0xfff80000
> [    0.263459] dino 8:0: PCI host bridge to bus 0000:00
> [    0.263841] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
> [    0.264196] pci_bus 0000:00: root bus resource [mem 0xf0800000-0xff7fffff]
> [    0.264476] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.265734] pci 0000:00:00.0: [1000:0012] type 00 class 0x010000 conventional PCI endpoint
> [    0.267232] pci 0000:00:00.0: BAR 0 [io  0x1000-0x10ff]
> [    0.268292] pci 0000:00:00.0: BAR 1 [mem 0xff7fe000-0xff7fe3ff]
> [    0.269408] pci 0000:00:00.0: BAR 2 [mem 0xff7fc000-0xff7fdfff]
> [    0.277602] pci 0000:00:01.0: [1011:0019] type 00 class 0x020000 conventional PCI endpoint
> [    0.278298] pci 0000:00:01.0: BAR 0 [io  0x1100-0x117f]
> [    0.278820] pci 0000:00:01.0: BAR 1 [mem 0xff7ff000-0xff7ff07f]
> [    0.281323] pci 0000:00:02.0: [103c:1048] type 00 class 0x070000 conventional PCI endpoint
> [    0.281867] pci 0000:00:02.0: BAR 0 [io  0x1180-0x1187]
> [    0.284921] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
> [    0.287179] pci 0000:00:00.0: BAR 2 [mem 0xf0800000-0xf0801fff]: assigned
> [    0.287830] pci 0000:00:00.0: BAR 1 [mem 0xf0802000-0xf08023ff]: assigned
> [    0.288225] pci 0000:00:00.0: BAR 0 [io  0x0100-0x01ff]: assigned
> [    0.288628] pci 0000:00:01.0: BAR 0 [io  0x0080-0x00ff]: assigned
> [    0.289002] pci 0000:00:01.0: BAR 1 [mem 0xf0803000-0xf080307f]: assigned
> [    0.289359] pci 0000:00:02.0: BAR 0 [io  0x0010-0x0017]: assigned
> [    0.290999] powersw: Soft power switch at 0xf07ffff0 enabled.
> [    0.314551] SCSI subsystem initialized
> [    0.317664] usbcore: registered new interface driver usbfs
> [    0.318202] usbcore: registered new interface driver hub
> [    0.318580] usbcore: registered new device driver usb
> [    0.334210] vgaarb: loaded
> [    0.337059] VFS: Disk quotas dquot_6.6.0
> [    0.337473] VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
> [    0.369580] NET: Registered PF_INET protocol family
> [    0.371471] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
> [    0.380082] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 5120 bytes, linear)
> [    0.381103] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
> [    0.381397] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
> [    0.382605] TCP bind hash table entries: 4096 (order: 5, 163840 bytes, linear)
> [    0.383338] TCP: Hash tables configured (established 4096 bind 4096)
> [    0.385144] UDP hash table entries: 256 (order: 1, 12288 bytes, linear)
> [    0.385743] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes, linear)
> [    0.387545] NET: Registered PF_UNIX/PF_LOCAL protocol family
> [    0.388172] PCI: CLS 16 bytes
> [    0.390311] clocksource: cr16: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041786 ns
> [    0.391141] clocksource: Switched to clocksource cr16
> [    0.391756] Enabling PDC chassis warnings support v0.05
> [    0.396076] workingset: timestamp_bits=14 max_order=17 bucket_order=3
> [    0.404012] io scheduler mq-deadline registered
> [    0.404324] io scheduler kyber registered
> [    0.407216] PDC Stable Storage facility v0.30
> [    0.408521] sticore: STI GSC/PCI core graphics driver Version 0.9c
> [    0.496987] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    0.511533] printk: legacy console [ttyS0] disabled
> [    0.515146] 8:16:4: ttyS0 at MMIO 0xffd05800 (irq = 16, base_baud = 454545) is a 16550A
> [    0.518634] printk: legacy console [ttyS0] enabled
> [    0.518634] printk: legacy console [ttyS0] enabled
> [    0.519267] printk: legacy bootconsole [pdc0] disabled
> [    0.519267] printk: legacy bootconsole [pdc0] disabled
> [    0.528540] 8:0:63: ttyS1 at MMIO 0xfff83800 (irq = 18, base_baud = 454545) is a 16550A
> [    0.531850] serial 0000:00:02.0: enabling SERR and PARITY (0103 -> 0143)
> [    0.535316] 0000:00:02.0: ttyS2 at I/O 0x10 (irq = 21, base_baud = 115200) is a 16550A
> [    0.539332] parport_init_chip: initialize bidirectional-mode
> [    0.540300] parport0: PC-style at 0xffd02800, irq 17 [PCSPP,TRISTATE]
> [    0.667976] loop: module loaded
> [    0.669638] sym53c8xx 0000:00:00.0: enabling SERR and PARITY (0107 -> 0147)
> [    0.672643] sym0: <895a> rev 0x0 at pci 0000:00:00.0 irq 19
> [    0.678720] sym0: PA-RISC Firmware, ID 7, Fast-40, LVD, parity checking
> [    0.684386] sym0: SCSI BUS has been reset.
> [    0.687904] scsi host0: sym-2.2.3
> [    3.783993] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
> [    3.784726] scsi target0:0:0: tagged command queuing enabled, command queue depth 16.
> [    3.785900] scsi target0:0:0: Beginning Domain Validation
> [    3.788722] scsi target0:0:0: Domain Validation skipping write tests
> [    3.788895] scsi target0:0:0: Ending Domain Validation
> [    3.801756] scsi 0:0:2:0: CD-ROM            QEMU     QEMU CD-ROM      2.5+ PQ: 0 ANSI: 5
> [    3.802043] scsi target0:0:2: tagged command queuing enabled, command queue depth 16.
> [    3.802392] scsi target0:0:2: Beginning Domain Validation
> [    3.803995] scsi target0:0:2: Domain Validation skipping write tests
> [    3.804138] scsi target0:0:2: Ending Domain Validation
> [    3.816982] st: Version 20160209, fixed bufsize 32768, s/g segs 256
> [    3.820181] sr 0:0:2:0: Power-on or device reset occurred
> [    3.822714] sr 0:0:2:0: [sr0] scsi3-mmc drive: 16x/50x cd/rw xa/form2 cdda tray
> [    3.823139] cdrom: Uniform CD-ROM driver Revision: 3.20
> [    3.824901] sd 0:0:0:0: Power-on or device reset occurred
> [    3.828055] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
> [    3.829255] sd 0:0:0:0: [sda] Write Protect is off
> [    3.832789] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [    3.833235] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [    3.833534] sr 0:0:2:0: Attached scsi generic sg1 type 5
> [    3.835756] tulip 0000:00:01.0: enabling SERR and PARITY (0103 -> 0143)
> [    3.849999] tulip0: EEPROM default media type Autosense
> [    3.850253] tulip0: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block
> [    3.853899] tulip0:  MII transceiver #1 config 3100 status 702c advertising 0501
> [    3.856302]  sda: sda1 sda2 sda3 < sda5 sda6 >
> [    3.870391] sd 0:0:0:0: [sda] Attached SCSI disk
> [    3.871194] net eth0: Digital DS21142/43 Tulip rev 0 at Port 0x80, 52:54:00:12:34:56, IRQ 20
> [    3.872279] e100: Intel(R) PRO/100 Network Driver
> [    3.872486] e100: Copyright(c) 1999-2006 Intel Corporation
> [    3.873836] e1000: Intel(R) PRO/1000 Network Driver
> [    3.874067] e1000: Copyright (c) 1999-2006 Intel Corporation.
> [    3.874406] e1000e: Intel(R) PRO/1000 Network Driver
> [    3.874536] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [    3.875010] igb: Intel(R) Gigabit Ethernet Network Driver
> [    3.875212] igb: Copyright (c) 2007-2014 Intel Corporation.
> [    3.875563] LASI 82596 driver - Revision: 1.30
> [    3.878798] HP SDC: No SDC found.
> [    3.879000] HP SDC MLC: Registering the System Domain Controller's HIL MLC.
> [    3.884554] HP SDC MLC: Request for raw HIL ISR hook denied
> [    3.906699] mousedev: PS/2 mouse device common for all mice
> [    3.912379] rtc-generic rtc-generic: registered as rtc0
> [    3.913033] rtc-generic rtc-generic: setting system clock to 2024-02-02T08:24:45 UTC (1706862285)
> [    3.919114] hid: raw HID events driver (C) Jiri Kosina
> [    3.919926] usbcore: registered new interface driver usbhid
> [    3.920146] usbhid: USB HID core driver
> [    3.921737] NET: Registered PF_PACKET protocol family
> [    3.924787] Key type dns_resolver registered
> [    4.092592] EXT4-fs (sda5): INFO: recovery required on readonly filesystem
> [    4.092910] EXT4-fs (sda5): write access will be enabled during recovery
> [    4.200680] EXT4-fs (sda5): recovery complete
> [    4.207887] EXT4-fs (sda5): mounted filesystem 0e24f05b-efd0-4f75-b1de-3309bd27dbd8 ro with ordered data mode. Quota mode: none.
> [    4.209030] VFS: Mounted root (ext4 filesystem) readonly on device 8:5.
> [    4.214953] devtmpfs: mounted
> [    4.267329] Freeing unused kernel image (initmem) memory: 3072K
> [    4.267826] Write protected read-only-after-init data: 2k
> [    4.268225] Run /sbin/init as init process
> [    4.305438] process 'usr/lib/systemd/systemd' started with executable stack
> [    4.932357] systemd[1]: systemd 255.3-2 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
> [    4.933521] systemd[1]: Detected architecture parisc.
>
> Welcome to Debian GNU/Linux trixie/sid!
>
> [...]
> root@debian:~# lscpu
> Architecture:          parisc
>    Byte Order:          Big Endian
> CPU(s):                2
>    On-line CPU(s) list: 0,1
> Model name:            PA7300LC (PCX-L2)
>    CPU family:          PA-RISC 1.1e
>    Model:               9000/778/B160L - Merlin L2 160 (9000/778/B160L)
>    Thread(s) per core:  1
>    Core(s) per socket:  1
>    Socket(s):           2
>    BogoMIPS:            1771.11
> root@debian:~# chcpu -d 1
> [   84.800279] Backtrace:
> [   84.802648]  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
> [   84.802648]  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
> [   84.802648]  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
> [   84.802648]  [<10452970>] smpboot_thread_fn+0x284/0x288
> [   84.802648]  [<1044d8f4>] kthread+0x12c/0x13c
> [   84.802648]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
> [   84.802648]
> [   84.802648]
> [   84.802648] Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
> [   84.802648] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc2-32bit+ #1294
> [   84.802648] Hardware name: 9000/778/B160L
> [   84.802648]
> [   84.802648]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [   84.802648] PSW: 00000000000001001111111100001111 Not tainted
> [   84.802648] r00-03  0004ff0f 11011540 10446d9c 11e00500
> [   84.802648] r04-07  11c0b800 00000002 11c0d000 00000001
> [   84.802648] r08-11  110194e4 11019168 00000000 00000004
> [   84.802648] r12-15  10c78800 00000612 f0028050 f0027fd8
> [   84.802648] r16-19  fffffffc fee01180 f0027ed8 01735000
> [   84.802648] r20-23  0000ffff 13ae1a00 13ae1a00 00000000
> [   84.802648] r24-27  11c0c580 11c0d004 11c0d000 10ceb968
> [   84.802648] r28-31  00000000 0000000e 11e00580 00000018
> [   84.802648] sr00-03  00000000 00000000 00000000 000004af
> [   84.802648] sr04-07  00000000 00000000 00000000 00000000
> [   84.802648]
> [   84.802648] IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
> [   84.802648]  IIR: 0f80109c    ISR: 00000000  IOR: 00000000
> [   84.802648]  CPU:        1   CR30: 11dd1710 CR31: 00000000
> [   84.802648]  ORIG_R28: 00000612
> [   84.802648]  IAOQ[0]: wq_update_pod+0x98/0x14c
> [   84.802648]  IAOQ[1]: wq_update_pod+0x9c/0x14c
> [   84.802648]  RP(r2): wq_update_pod+0x80/0x14c
> [   84.802648] Backtrace:
> [   84.802648]  [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
> [   84.802648]  [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
> [   84.802648]  [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
> [   84.802648]  [<10452970>] smpboot_thread_fn+0x284/0x288
> [   84.802648]  [<1044d8f4>] kthread+0x12c/0x13c
> [   84.802648]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24
> [   84.802648]
> [   84.802648] Kernel panic - not syncing: Kernel Fault

In a second step, I extended your patch to also print the present
and online CPUs. The relevant part of the dmesg is below.

Note that on parisc the second CPU is activated later in the
boot process, after the kernel has the inventory.
I think this differs from x86, where all CPUs are available earlier
in the boot process.
...
[    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000]  Trampoline variant of Tasks RCU enabled.
[    0.000000]  Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 96
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000149] sched_clock: 32 bits at 250MHz, resolution 4ns, wraps every 8589934590ns
[    0.005119] Console: colour dummy device 160x64
[    0.006600] Calibrating delay loop... 2465.79 BogoMIPS (lpj=12328960)
[    0.196545] pid_max: default: 32768 minimum: 301
[    0.200761] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.201009] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
[    0.240799] RCU Tasks: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
[    0.242004] RCU Tasks Trace: Setting shift to 4 and lim to 1 rcu_task_cb_adjust=1.
[    0.242735] TOC handler registered
[    0.244112] rcu: Hierarchical SRCU implementation.
[    0.244270] rcu:     Max phase no-delay instances is 1000.
[    0.259462] smp: Bringing up secondary CPUs ...
[    0.260271] smp: Brought up 1 node, 1 CPU
[    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001
[    0.273156] devtmpfs: initialized
[    0.282163] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
...
[    0.296480] Searching for devices...
[    0.301965] Found devices:
[    0.302119] 1. Phantom PseudoBC GSC+ Port [8] at 0xffc00000 { type:7, hv:0x504, sv:0x0, rev:0x0 }
[    0.302353] 2. Dino PCI Bridge [8:0] at 0xfff80000 { type:13, hv:0x680, sv:0xa, rev:0x3 }
[    0.302549] 3. Merlin+ 132 Dino RS-232 [8:0:63] at 0xfff83000 { type:10, hv:0x22, sv:0x8c, rev:0x0 }
[    0.302755] 4. Merlin 160 Core BA [8:16] at 0xffd00000 { type:11, hv:0x3d, sv:0x81, rev:0x0 }, additional addresses: 0xffd0c000 0xffc00000
[    0.303084] 5. Merlin 160 Core RS-232 [8:16:4] at 0xffd05000 { type:10, hv:0x3d, sv:0x8c, rev:0x0 }
[    0.303287] 6. Merlin 160 Core Centronics [8:16:0] at 0xffd02000 { type:10, hv:0x3d, sv:0x74, rev:0x0 }, additional addresses: 0xffd01000 0xffd03000
[    0.303605] 7. Memory [63] at 0xfffff000 { type:1, hv:0x67, sv:0x9, rev:0x0 }
[    0.303776] 8. Merlin L2 160 (9000/778/B160L) [48] at 0xfffb0000 { type:0, hv:0x502, sv:0x4, rev:0x0 }
[    0.303996] 9. Merlin L2 160 (9000/778/B160L) [49] at 0xfffb1000 { type:0, hv:0x502, sv:0x4, rev:0x0 }
[    0.304245] Found qemu fw_cfg interface at 0xfffa0000
[    0.306850] CPU0: cpu core 0 of socket 0
[    0.307868] CPU1: cpu core 0 of socket 1
[    0.311565] Releasing cpu 1 now, hpa=fffb1000
[    0.322058] CPU(s): 2 out of 2 PA7300LC (PCX-L2) at 250.000000 MHz online
^^^ here the second CPU gets activated (after workqueue_init* has run).
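
For readers comparing the masks above: the values are hex bitmaps in
which bit N stands for CPU N, so possible_cpus=ffff covers CPUs 0-15
while present=0001 and online=0001 cover only CPU 0. Below is a minimal
userspace sketch (not kernel code) that turns such a mask into a CPU
list. The name mask_to_list() is hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only: format the set bits of a small CPU mask
 * as a range list. mask_to_list() is a hypothetical helper, not a
 * kernel function.
 */
static const char *mask_to_list(unsigned long mask, char *buf, size_t len)
{
	int nbits = (int)(sizeof(mask) * 8);
	int cpu = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	while (cpu < nbits) {
		size_t used = strlen(buf);
		int start, end, n;

		if (!(mask & (1UL << cpu))) {
			cpu++;
			continue;
		}

		/* Extend the run while the following bit is also set. */
		start = cpu;
		while (cpu + 1 < nbits && (mask & (1UL << (cpu + 1))))
			cpu++;
		end = cpu++;

		if (start == end)
			n = snprintf(buf + used, len - used, "%s%d",
				     used ? "," : "", start);
		else
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     used ? "," : "", start, end);

		/* Stop if the buffer is too small for this entry. */
		if (n < 0 || (size_t)n >= len - used)
			break;
	}
	return buf;
}
```

With this helper, 0xffff yields "0-15", 0x0001 yields "0", and 0x3
yields "0-1".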

Helge


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-02  8:41             ` Helge Deller
@ 2024-02-02 17:29               ` Tejun Heo
  2024-02-05  9:58                 ` Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2024-02-02 17:29 UTC (permalink / raw)
  To: Helge Deller; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

Hello, Helge.

On Fri, Feb 02, 2024 at 09:41:38AM +0100, Helge Deller wrote:
> In a second step I extended your patch to print the present
> and online CPUs too. Below is the relevant dmesg part.
> 
> Note, that on parisc the second CPU will be activated later in the
> boot process, after the kernel has the inventory.
> This I think differs vs x86, where all CPUs are available earlier
> in the boot process.
> ...
> [    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001
...
> [    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
...
> [    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001

So, what's bothersome is that when the wq_dump.py script prints each cpu's
pwq, it only prints CPU 0 and 1. The for_each_possible_cpu() drgn
helper reads cpu_possible_mask from the kernel and iterates that, so that
most likely indicates that at some point cpu_possible_mask becomes 0x3
instead of the 0xffff used during boot, which is problematic.
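
For illustration only (not part of the original message): a minimal Python
sketch of how the two mask values mentioned above decode into CPU numbers.
The helper name cpus_in_mask is hypothetical; 0xffff and 0x3 are the values
quoted in this thread.

```python
def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in an integer cpumask."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


boot_mask = 0xffff   # possible_cpus as printed by the boot-time debug lines
later_mask = 0x3     # mask implied by wq_dump.py listing only CPU 0 and 1

boot_cpus = cpus_in_mask(boot_mask)
later_cpus = cpus_in_mask(later_mask)

# CPUs that were "possible" at boot but are no longer listed later.
dropped = [c for c in boot_cpus if c not in later_cpus]
print("dropped CPUs:", dropped)
```

Anything set up per possible CPU under one mask and later walked under the
other would see a different set of CPUs, which is the inconsistency being
chased here.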

Can you please sprinkle more printks to find out whether and when the
cpu_possible_mask changes during boot?
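
As a rough aid for reading such printk output (a sketch, not something from
the original message): the snippet below scans dmesg lines in the
"XXX <stage>: possible_cpus=..." format quoted above and reports the first
stage at which the possible mask differs from the previous one. The regular
expression and the function name first_possible_mask_change are assumptions
made for this example.

```python
import re

# Matches the debug lines quoted in this thread, e.g.
# "XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001"
LINE_RE = re.compile(
    r"XXX (\w+): possible_cpus=([0-9a-f]+)\s+"
    r"present=([0-9a-f]+)\s+online=([0-9a-f]+)"
)


def first_possible_mask_change(lines):
    """Return (stage, old_mask, new_mask) for the first change, else None."""
    previous = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not one of the debug lines
        stage = match.group(1)
        possible = int(match.group(2), 16)
        if previous is not None and possible != previous:
            return stage, previous, possible
        previous = possible
    return None


log = [
    "[    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001",
    "[    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001",
    "[    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001",
]

# The three lines quoted so far all show ffff, so no change is found yet.
result = first_possible_mask_change(log)
print(result)
```

With only the three workqueue_init* lines the mask is stable, so the change
has to come from a point in boot that these printks do not cover yet.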

Thanks.

-- 
tejun


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-02 17:29               ` Tejun Heo
@ 2024-02-05  9:58                 ` Helge Deller
  2024-02-05 17:45                   ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2024-02-05  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

Hi Tejun,

On 2/2/24 18:29, Tejun Heo wrote:
> Hello, Helge.
>
> On Fri, Feb 02, 2024 at 09:41:38AM +0100, Helge Deller wrote:
>> In a second step I extended your patch to print the present
>> and online CPUs too. Below is the relevant dmesg part.
>>
>> Note, that on parisc the second CPU will be activated later in the
>> boot process, after the kernel has the inventory.
>> This I think differs vs x86, where all CPUs are available earlier
>> in the boot process.
>> ...
>> [    0.000000] XXX workqueue_init_early: possible_cpus=ffff  present=0001  online=0001
> ...
>> [    0.228080] XXX workqueue_init: possible_cpus=ffff  present=0001  online=0001
> ...
>> [    0.263466] XXX workqueue_init_topology: possible_cpus=ffff  present=0001  online=0001
>
> So, what's bothersome is that when the wq_dump.py script prints each cpu's
> pwq, it only prints CPU 0 and 1. The for_each_possible_cpu() drgn
> helper reads cpu_possible_mask from the kernel and iterates that, so that
> most likely indicates that at some point cpu_possible_mask becomes 0x3
> instead of the 0xffff used during boot, which is problematic.
>
> Can you please sprinkle more printks to find out whether and when the
> cpu_possible_mask changes during boot?

It seems commit 0921244f6f4f ("parisc: Only list existing CPUs in cpu_possible_mask")
is the culprit. Reverting that patch makes CPU hot-unplug work again.
Furthermore, this commit breaks the cpumask KUnit test, as reported by Guenter:
https://lkml.org/lkml/2024/2/4/146

So, I've added the revert to the parisc git tree and if my further tests
go well I'll push it upstream.

Thanks for your help!!
Helge


* Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug
  2024-02-05  9:58                 ` Helge Deller
@ 2024-02-05 17:45                   ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2024-02-05 17:45 UTC (permalink / raw)
  To: Helge Deller; +Cc: Helge Deller, Lai Jiangshan, linux-kernel, linux-parisc

Hello,

On Mon, Feb 05, 2024 at 10:58:26AM +0100, Helge Deller wrote:
> It seems commit 0921244f6f4f ("parisc: Only list existing CPUs in cpu_possible_mask")
> is the culprit. Reverting that patch makes CPU hot-unplug work again.
> Furthermore, this commit breaks the cpumask KUnit test, as reported by Guenter:
> https://lkml.org/lkml/2024/2/4/146
> 
> So, I've added the revert to the parisc git tree and if my further tests
> go well I'll push it upstream.

Yeah, the cpu_possible_mask update probably just needs to happen way
earlier so that the mask doesn't change while the kernel is already half
initialized.

Thanks.

-- 
tejun


end of thread, other threads:[~2024-02-05 17:45 UTC | newest]

Thread overview: 11+ messages
2024-01-31 19:27 [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug Helge Deller
2024-01-31 22:28 ` Tejun Heo
2024-02-01 16:41   ` Helge Deller
2024-02-01 16:54     ` Tejun Heo
2024-02-01 17:56       ` Helge Deller
2024-02-02  1:39         ` Tejun Heo
2024-02-02  8:28           ` Helge Deller
2024-02-02  8:41             ` Helge Deller
2024-02-02 17:29               ` Tejun Heo
2024-02-05  9:58                 ` Helge Deller
2024-02-05 17:45                   ` Tejun Heo
