live-patching.vger.kernel.org archive mirror
From: Qian Cai <quic_qiancai@quicinc.com>
To: Peter Zijlstra <peterz@infradead.org>, <gor@linux.ibm.com>,
	<jpoimboe@redhat.com>, <jikos@kernel.org>, <mbenes@suse.cz>,
	<pmladek@suse.com>, <mingo@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <joe.lawrence@redhat.com>,
	<fweisbec@gmail.com>, <tglx@linutronix.de>, <hca@linux.ibm.com>,
	<svens@linux.ibm.com>, <sumanthk@linux.ibm.com>,
	<live-patching@vger.kernel.org>, <paulmck@kernel.org>,
	<rostedt@goodmis.org>, <x86@kernel.org>
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()
Date: Mon, 18 Oct 2021 23:47:32 -0400
Message-ID: <a354fadd-268f-8119-d37a-102e5efa1437@quicinc.com>
In-Reply-To: <ba4ca17f-100e-bef7-6d7b-4de0f5a515b9@quicinc.com>

Peter, any thoughts? I did confirm that reverting the commit fixed the issue.

On 10/13/2021 10:32 AM, Qian Cai wrote:
> 
> 
> On 9/29/2021 11:17 AM, Peter Zijlstra wrote:
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
>>  {
>>  	int cpu;
>>  
>> -	preempt_disable();
>> +	cpus_read_lock();
>>  	for_each_online_cpu(cpu) {
>> -		if (cpu == smp_processor_id())
>> +		if (cpu == raw_smp_processor_id())
>>  			continue;
>>  
>>  		wake_up_if_idle(cpu);
>>  	}
>> -	preempt_enable();
>> +	cpus_read_unlock();
> 
> Peter, it looks like this thing introduced a deadlock during CPU online/offline.
> 
> [  630.145166][  T129] WARNING: possible recursive locking detected
> [  630.151164][  T129] 5.15.0-rc5-next-20211013+ #145 Not tainted
> [  630.156988][  T129] --------------------------------------------
> [  630.162984][  T129] cpuhp/21/129 is trying to acquire lock:
> [  630.168547][  T129] ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: wake_up_all_idle_cpus+0x40/0xe8
> wake_up_all_idle_cpus at /usr/src/linux-next/kernel/smp.c:1174
> [  630.178040][  T129]
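> [  630.178040][  T129] but task is already holding lock:
> [...] (cpu_hotplug_lock){++++}-{0:0}, at: [...]
> [...] other info that might help us debug this: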
> [  630.202292][  T129]  Possible unsafe locking scenario:
> [  630.202292][  T129]
> [  630.209590][  T129]        CPU0
> [  630.212720][  T129]        ----
> [  630.215851][  T129]   lock(cpu_hotplug_lock);
> [  630.220202][  T129]   lock(cpu_hotplug_lock);
> [  630.224553][  T129]
> [  630.224553][  T129]  *** DEADLOCK ***
> [  630.224553][  T129]
> [  630.232545][  T129]  May be due to missing lock nesting notation
> [  630.232545][  T129]
> [  630.240711][  T129] 3 locks held by cpuhp/21/129:
> [  630.245406][  T129]  #0: ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
> [  630.254976][  T129]  #1: ffff800011f46780 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
> [  630.264372][  T129]  #2: ffff8000191fb9c8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x24/0x38
> [  630.274031][  T129]
> [  630.274031][  T129] stack backtrace:
> [  630.279767][  T129] CPU: 21 PID: 129 Comm: cpuhp/21 Not tainted 5.15.0-rc5-next-20211013+ #145
> [  630.288371][  T129] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> [  630.296886][  T129] Call trace:
> [  630.300017][  T129]  dump_backtrace+0x0/0x3b8
> [  630.304369][  T129]  show_stack+0x20/0x30
> [  630.308371][  T129]  dump_stack_lvl+0x8c/0xb8
> [  630.312722][  T129]  dump_stack+0x1c/0x38
> [  630.316723][  T129]  validate_chain+0x1d84/0x1da0
> [  630.321421][  T129]  __lock_acquire+0xab0/0x2040
> [  630.326033][  T129]  lock_acquire+0x32c/0xb08
> [  630.330390][  T129]  cpus_read_lock+0x94/0x308
> [  630.334827][  T129]  wake_up_all_idle_cpus+0x40/0xe8
> [  630.339784][  T129]  cpuidle_uninstall_idle_handler+0x3c/0x50
> [  630.345524][  T129]  cpuidle_pause_and_lock+0x28/0x38
> [  630.350569][  T129]  acpi_processor_hotplug+0xc0/0x170
> [  630.355701][  T129]  acpi_soft_cpu_online+0x124/0x250
> [  630.360745][  T129]  cpuhp_invoke_callback+0x51c/0x2ab8
> [  630.365963][  T129]  cpuhp_thread_fun+0x204/0x588
> [  630.370659][  T129]  smpboot_thread_fn+0x3f0/0xc40
> [  630.375444][  T129]  kthread+0x3d8/0x488
> [  630.379360][  T129]  ret_from_fork+0x10/0x20
> [  863.525716][  T191] INFO: task cpuhp/21:129 blocked for more than 122 seconds.
> [  863.532954][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
> [  863.539361][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  863.547927][  T191] task:cpuhp/21        state:D stack:59104 pid:  129 ppid:     2 flags:0x00000008
> [  863.557029][  T191] Call trace:
> [  863.560171][  T191]  __switch_to+0x184/0x400
> [  863.564448][  T191]  __schedule+0x74c/0x1940
> [  863.568753][  T191]  schedule+0x110/0x318
> [  863.572764][  T191]  percpu_rwsem_wait+0x1b8/0x348
> [  863.577592][  T191]  __percpu_down_read+0xb0/0x148
> [  863.582386][  T191]  cpus_read_lock+0x2b0/0x308
> [  863.586961][  T191]  wake_up_all_idle_cpus+0x40/0xe8
> [  863.591931][  T191]  cpuidle_uninstall_idle_handler+0x3c/0x50
> [  863.597716][  T191]  cpuidle_pause_and_lock+0x28/0x38
> [  863.602771][  T191]  acpi_processor_hotplug+0xc0/0x170
> [  863.607946][  T191]  acpi_soft_cpu_online+0x124/0x250
> [  863.613001][  T191]  cpuhp_invoke_callback+0x51c/0x2ab8
> [  863.618261][  T191]  cpuhp_thread_fun+0x204/0x588
> [  863.622967][  T191]  smpboot_thread_fn+0x3f0/0xc40
> [  863.627787][  T191]  kthread+0x3d8/0x488
> [  863.631712][  T191]  ret_from_fork+0x10/0x20
> [  863.636020][  T191] INFO: task kworker/0:2:189 blocked for more than 122 seconds.
> [  863.643500][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
> [  863.649882][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  863.658425][  T191] task:kworker/0:2     state:D stack:58368 pid:  189 ppid:     2 flags:0x00000008
> [  863.667516][  T191] Workqueue: events vmstat_shepherd
> [  863.672573][  T191] Call trace:
> [  863.675731][  T191]  __switch_to+0x184/0x400
> [  863.680001][  T191]  __schedule+0x74c/0x1940
> [  863.684268][  T191]  schedule+0x110/0x318
> [  863.688295][  T191]  percpu_rwsem_wait+0x1b8/0x348
> [  863.693085][  T191]  __percpu_down_read+0xb0/0x148
> [  863.697892][  T191]  cpus_read_lock+0x2b0/0x308
> [  863.702421][  T191]  vmstat_shepherd+0x5c/0x1a8
> [  863.706977][  T191]  process_one_work+0x808/0x19d0
> [  863.711767][  T191]  worker_thread+0x334/0xae0
> [  863.716227][  T191]  kthread+0x3d8/0x488
> [  863.720149][  T191]  ret_from_fork+0x10/0x20
> [  863.724487][  T191] INFO: task lsbug:4642 blocked for more than 123 seconds.
> [  863.731565][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
> [  863.737938][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  863.746490][  T191] task:lsbug           state:D stack:55536 pid: 4642 ppid:  4638 flags:0x00000008
> [  863.755549][  T191] Call trace:
> [  863.758712][  T191]  __switch_to+0x184/0x400
> [  863.762984][  T191]  __schedule+0x74c/0x1940
> [  863.767286][  T191]  schedule+0x110/0x318
> [  863.771294][  T191]  schedule_timeout+0x188/0x238
> [  863.776016][  T191]  wait_for_completion+0x174/0x290
> [  863.780979][  T191]  __cpuhp_kick_ap+0x158/0x1a8
> [  863.785592][  T191]  cpuhp_kick_ap+0x1f0/0x828
> [  863.790053][  T191]  bringup_cpu+0x180/0x1e0
> [  863.794320][  T191]  cpuhp_invoke_callback+0x51c/0x2ab8
> [  863.799561][  T191]  cpuhp_invoke_callback_range+0xa4/0x108
> [  863.805130][  T191]  cpu_up+0x528/0xd78
> [  863.808982][  T191]  cpu_device_up+0x4c/0x68
> [  863.813249][  T191]  cpu_subsys_online+0xc0/0x1f8
> [  863.817972][  T191]  device_online+0x10c/0x180
> [  863.822413][  T191]  online_store+0x10c/0x118
> [  863.826791][  T191]  dev_attr_store+0x44/0x78
> [  863.831148][  T191]  sysfs_kf_write+0xe8/0x138
> [  863.835590][  T191]  kernfs_fop_write_iter+0x26c/0x3d0
> [  863.840745][  T191]  new_sync_write+0x2bc/0x4f8
> [  863.845275][  T191]  vfs_write+0x714/0xcd8
> [  863.849387][  T191]  ksys_write+0xf8/0x1e0
> [  863.853481][  T191]  __arm64_sys_write+0x74/0xa8
> [  863.858113][  T191]  invoke_syscall.constprop.0+0xdc/0x1d8
> [  863.863597][  T191]  do_el0_svc+0xe4/0x298
> [  863.867710][  T191]  el0_svc+0x64/0x130
> [  863.871545][  T191]  el0t_64_sync_handler+0xb0/0xb8
> [  863.876437][  T191]  el0t_64_sync+0x180/0x184
> [  863.880798][  T191] INFO: task mount:4682 blocked for more than 123 seconds.
> [  863.887881][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
> [  863.894232][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  863.902776][  T191] task:mount           state:D stack:55856 pid: 4682 ppid:  1101 flags:0x00000000
> [  863.911865][  T191] Call trace:
> [  863.915003][  T191]  __switch_to+0x184/0x400
> [  863.919296][  T191]  __schedule+0x74c/0x1940
> [  863.923564][  T191]  schedule+0x110/0x318
> [  863.927590][  T191]  percpu_rwsem_wait+0x1b8/0x348
> [  863.932380][  T191]  __percpu_down_read+0xb0/0x148
> [  863.937187][  T191]  cpus_read_lock+0x2b0/0x308
> [  863.941715][  T191]  alloc_workqueue+0x730/0xd48
> [  863.946357][  T191]  loop_configure+0x2d4/0x1180 [loop]
> [  863.951592][  T191]  lo_ioctl+0x5dc/0x1228 [loop]
> [  863.956321][  T191]  blkdev_ioctl+0x258/0x820
> [  863.960678][  T191]  __arm64_sys_ioctl+0x114/0x180
> [  863.965468][  T191]  invoke_syscall.constprop.0+0xdc/0x1d8
> [  863.970974][  T191]  do_el0_svc+0xe4/0x298
> [  863.975069][  T191]  el0_svc+0x64/0x130
> [  863.978922][  T191]  el0t_64_sync_handler+0xb0/0xb8
> [  863.983798][  T191]  el0t_64_sync+0x180/0x184
> [  863.988172][  T191] INFO: lockdep is turned off.
> 
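The two reports above fit a simple picture, sketched here as a toy user-space model rather than kernel code. The task onlining the CPU (lsbug, under cpu_up) is assumed to hold the write side of cpu_hotplug_lock, as _cpu_up() does in mainline, while it waits for the cpuhp thread. The cpuhp thread's callback then blocks in cpus_read_lock() behind that writer. Every toy_* name below is hypothetical and the lock is reduced to two fields; the sketch only illustrates the ordering, not the percpu-rwsem implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of a reader/writer lock in which a held
 * writer makes every new reader wait. All names are hypothetical.
 */
struct toy_rwsem {
	bool writer_held;	/* set while a writer owns the lock */
	int readers;		/* readers currently inside */
};

/* Non-blocking read acquire: false means the caller would have to sleep. */
static bool toy_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer_held)
		return false;
	sem->readers++;
	return true;
}

static void toy_read_unlock(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/* Non-blocking write acquire: needs no writer and no readers. */
static bool toy_write_trylock(struct toy_rwsem *sem)
{
	if (sem->writer_held || sem->readers > 0)
		return false;
	sem->writer_held = true;
	return true;
}

static void toy_write_unlock(struct toy_rwsem *sem)
{
	sem->writer_held = false;
}

/*
 * Stands in for the callback run by the hotplug thread. When
 * take_read_lock is true it mimics the patched wake_up_all_idle_cpus().
 * Returns true if the callback would run to completion.
 */
static bool toy_hotplug_callback(struct toy_rwsem *sem, bool take_read_lock)
{
	if (take_read_lock) {
		if (!toy_read_trylock(sem))
			return false;	/* the real thread would sleep here */
		toy_read_unlock(sem);
	}
	return true;
}

/*
 * Stands in for the task onlining a CPU: take the write side, then wait
 * for the hotplug thread's callback. Returns true if the operation
 * finishes; on false the write side stays held, as in a deadlock.
 */
static bool toy_cpu_up(struct toy_rwsem *sem, bool callback_takes_read_lock)
{
	if (!toy_write_trylock(sem))
		return false;
	if (!toy_hotplug_callback(sem, callback_takes_read_lock))
		return false;
	toy_write_unlock(sem);
	return true;
}
```

With callback_takes_read_lock set to false (the preempt_disable() variant removed by the patch) toy_cpu_up() completes. With it set to true the call does not complete and the lock stays write-held, so any later reader, such as vmstat_shepherd or alloc_workqueue in the traces, would wait as well.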

Thread overview: 51+ messages
2021-09-29 15:17 [PATCH v2 00/11] sched,rcu,context_tracking,livepatch: Improve livepatch task transitions for idle and NOHZ_FULL Peter Zijlstra
2021-09-29 15:17 ` [PATCH v2 01/11] sched: Improve try_invoke_on_locked_down_task() Peter Zijlstra
2021-09-29 15:17 ` [PATCH v2 02/11] sched,rcu: Rework try_invoke_on_locked_down_task() Peter Zijlstra
2021-09-29 15:17 ` [PATCH v2 03/11] sched,livepatch: Use task_call_func() Peter Zijlstra
2021-10-05 11:40   ` Petr Mladek
2021-10-05 14:03     ` Peter Zijlstra
2021-10-06  8:59   ` Miroslav Benes
2021-09-29 15:17 ` [PATCH v2 04/11] sched: Simplify wake_up_*idle*() Peter Zijlstra
2021-10-13 14:32   ` Qian Cai
2021-10-19  3:47     ` Qian Cai [this message]
2021-10-19  8:56       ` Peter Zijlstra
2021-10-19  9:10         ` Peter Zijlstra
2021-10-19 15:32           ` Qian Cai
2021-10-19 15:50             ` Peter Zijlstra
2021-10-19 19:22               ` Qian Cai
2021-10-19 20:27                 ` Peter Zijlstra
     [not found]   ` <CGME20211022134630eucas1p2e79e2816587d182c580459d567c1f2a9@eucas1p2.samsung.com>
2021-10-22 13:46     ` Marek Szyprowski
2021-09-29 15:17 ` [PATCH v2 05/11] sched,livepatch: Use wake_up_if_idle() Peter Zijlstra
2021-10-05 12:00   ` Petr Mladek
2021-10-06  9:16   ` Miroslav Benes
2021-10-07  9:18     ` Vasily Gorbik
2021-10-07 10:02       ` Peter Zijlstra
2021-10-13 19:37   ` Arnd Bergmann
2021-10-14 10:42     ` Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 06/11] context_tracking: Prefix user_{enter,exit}*() Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 07/11] context_tracking: Add an atomic sequence/state count Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 08/11] context_tracking,rcu: Replace RCU dynticks counter with context_tracking Peter Zijlstra
2021-09-29 18:37   ` Paul E. McKenney
2021-09-29 19:09     ` Peter Zijlstra
2021-09-29 19:11     ` Peter Zijlstra
2021-09-29 19:13     ` Peter Zijlstra
2021-09-29 19:24       ` Peter Zijlstra
2021-09-29 19:45         ` Paul E. McKenney
2021-09-29 18:54   ` Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 09/11] context_tracking,livepatch: Dont disturb NOHZ_FULL Peter Zijlstra
2021-10-06  8:12   ` Petr Mladek
2021-10-06  9:04     ` Peter Zijlstra
2021-10-06 10:29       ` Petr Mladek
2021-10-06 11:41         ` Peter Zijlstra
2021-10-06 11:48         ` Miroslav Benes
2021-09-29 15:17 ` [RFC][PATCH v2 10/11] livepatch: Remove klp_synchronize_transition() Peter Zijlstra
2021-10-06 12:30   ` Petr Mladek
2021-09-29 15:17 ` [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync() vs NOHZ_FULL Peter Zijlstra
2021-10-21 18:39   ` Marcelo Tosatti
2021-10-21 18:40     ` Marcelo Tosatti
2021-10-21 19:25     ` Peter Zijlstra
2021-10-21 19:57       ` Marcelo Tosatti
2021-10-21 20:18         ` Peter Zijlstra
2021-10-26 18:19           ` Marcelo Tosatti
2021-10-26 19:38             ` Peter Zijlstra
2021-09-29 18:03 ` [PATCH v2 00/11] sched,rcu,context_tracking,livepatch: Improve livepatch task transitions for idle and NOHZ_FULL Paul E. McKenney
