live-patching.vger.kernel.org archive mirror
From: Qian Cai <quic_qiancai@quicinc.com>
To: Peter Zijlstra <peterz@infradead.org>, <gor@linux.ibm.com>,
	<jpoimboe@redhat.com>, <jikos@kernel.org>, <mbenes@suse.cz>,
	<pmladek@suse.com>, <mingo@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <joe.lawrence@redhat.com>,
	<fweisbec@gmail.com>, <tglx@linutronix.de>, <hca@linux.ibm.com>,
	<svens@linux.ibm.com>, <sumanthk@linux.ibm.com>,
	<live-patching@vger.kernel.org>, <paulmck@kernel.org>,
	<rostedt@goodmis.org>, <x86@kernel.org>
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()
Date: Wed, 13 Oct 2021 10:32:26 -0400	[thread overview]
Message-ID: <ba4ca17f-100e-bef7-6d7b-4de0f5a515b9@quicinc.com> (raw)
In-Reply-To: <20210929152428.769328779@infradead.org>



On 9/29/2021 11:17 AM, Peter Zijlstra wrote:
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
>  {
>  	int cpu;
>  
> -	preempt_disable();
> +	cpus_read_lock();
>  	for_each_online_cpu(cpu) {
> -		if (cpu == smp_processor_id())
> +		if (cpu == raw_smp_processor_id())
>  			continue;
>  
>  		wake_up_if_idle(cpu);
>  	}
> -	preempt_enable();
> +	cpus_read_unlock();

Peter, it looks like this change introduced a deadlock during CPU online/offline: wake_up_all_idle_cpus() now takes cpus_read_lock(), but it can be reached from the CPU hotplug path (via acpi_processor_hotplug() -> cpuidle_pause_and_lock()), which already holds cpu_hotplug_lock.

[  630.145166][  T129] WARNING: possible recursive locking detected
[  630.151164][  T129] 5.15.0-rc5-next-20211013+ #145 Not tainted
[  630.156988][  T129] --------------------------------------------
[  630.162984][  T129] cpuhp/21/129 is trying to acquire lock:
[  630.168547][  T129] ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: wake_up_all_idle_cpus+0x40/0xe8
wake_up_all_idle_cpus at /usr/src/linux-next/kernel/smp.c:1174
[  630.178040][  T129]
[  630.178040][  T129] but task is already holding lock:
[  630.178040][  T129] ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
[  630.178040][  T129]
[  630.178040][  T129] other info that might help us debug this:
[  630.202292][  T129]  Possible unsafe locking scenario:
[  630.202292][  T129]
[  630.209590][  T129]        CPU0
[  630.212720][  T129]        ----
[  630.215851][  T129]   lock(cpu_hotplug_lock);
[  630.220202][  T129]   lock(cpu_hotplug_lock);
[  630.224553][  T129]
[  630.224553][  T129]  *** DEADLOCK ***
[  630.224553][  T129]
[  630.232545][  T129]  May be due to missing lock nesting notation
[  630.232545][  T129]
[  630.240711][  T129] 3 locks held by cpuhp/21/129:
[  630.245406][  T129]  #0: ffff800011f466d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
[  630.254976][  T129]  #1: ffff800011f46780 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x588
[  630.264372][  T129]  #2: ffff8000191fb9c8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x24/0x38
[  630.274031][  T129]
[  630.274031][  T129] stack backtrace:
[  630.279767][  T129] CPU: 21 PID: 129 Comm: cpuhp/21 Not tainted 5.15.0-rc5-next-20211013+ #145
[  630.288371][  T129] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[  630.296886][  T129] Call trace:
[  630.300017][  T129]  dump_backtrace+0x0/0x3b8
[  630.304369][  T129]  show_stack+0x20/0x30
[  630.308371][  T129]  dump_stack_lvl+0x8c/0xb8
[  630.312722][  T129]  dump_stack+0x1c/0x38
[  630.316723][  T129]  validate_chain+0x1d84/0x1da0
[  630.321421][  T129]  __lock_acquire+0xab0/0x2040
[  630.326033][  T129]  lock_acquire+0x32c/0xb08
[  630.330390][  T129]  cpus_read_lock+0x94/0x308
[  630.334827][  T129]  wake_up_all_idle_cpus+0x40/0xe8
[  630.339784][  T129]  cpuidle_uninstall_idle_handler+0x3c/0x50
[  630.345524][  T129]  cpuidle_pause_and_lock+0x28/0x38
[  630.350569][  T129]  acpi_processor_hotplug+0xc0/0x170
[  630.355701][  T129]  acpi_soft_cpu_online+0x124/0x250
[  630.360745][  T129]  cpuhp_invoke_callback+0x51c/0x2ab8
[  630.365963][  T129]  cpuhp_thread_fun+0x204/0x588
[  630.370659][  T129]  smpboot_thread_fn+0x3f0/0xc40
[  630.375444][  T129]  kthread+0x3d8/0x488
[  630.379360][  T129]  ret_from_fork+0x10/0x20
[  863.525716][  T191] INFO: task cpuhp/21:129 blocked for more than 122 seconds.
[  863.532954][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
[  863.539361][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  863.547927][  T191] task:cpuhp/21        state:D stack:59104 pid:  129 ppid:     2 flags:0x00000008
[  863.557029][  T191] Call trace:
[  863.560171][  T191]  __switch_to+0x184/0x400
[  863.564448][  T191]  __schedule+0x74c/0x1940
[  863.568753][  T191]  schedule+0x110/0x318
[  863.572764][  T191]  percpu_rwsem_wait+0x1b8/0x348
[  863.577592][  T191]  __percpu_down_read+0xb0/0x148
[  863.582386][  T191]  cpus_read_lock+0x2b0/0x308
[  863.586961][  T191]  wake_up_all_idle_cpus+0x40/0xe8
[  863.591931][  T191]  cpuidle_uninstall_idle_handler+0x3c/0x50
[  863.597716][  T191]  cpuidle_pause_and_lock+0x28/0x38
[  863.602771][  T191]  acpi_processor_hotplug+0xc0/0x170
[  863.607946][  T191]  acpi_soft_cpu_online+0x124/0x250
[  863.613001][  T191]  cpuhp_invoke_callback+0x51c/0x2ab8
[  863.618261][  T191]  cpuhp_thread_fun+0x204/0x588
[  863.622967][  T191]  smpboot_thread_fn+0x3f0/0xc40
[  863.627787][  T191]  kthread+0x3d8/0x488
[  863.631712][  T191]  ret_from_fork+0x10/0x20
[  863.636020][  T191] INFO: task kworker/0:2:189 blocked for more than 122 seconds.
[  863.643500][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
[  863.649882][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  863.658425][  T191] task:kworker/0:2     state:D stack:58368 pid:  189 ppid:     2 flags:0x00000008
[  863.667516][  T191] Workqueue: events vmstat_shepherd
[  863.672573][  T191] Call trace:
[  863.675731][  T191]  __switch_to+0x184/0x400
[  863.680001][  T191]  __schedule+0x74c/0x1940
[  863.684268][  T191]  schedule+0x110/0x318
[  863.688295][  T191]  percpu_rwsem_wait+0x1b8/0x348
[  863.693085][  T191]  __percpu_down_read+0xb0/0x148
[  863.697892][  T191]  cpus_read_lock+0x2b0/0x308
[  863.702421][  T191]  vmstat_shepherd+0x5c/0x1a8
[  863.706977][  T191]  process_one_work+0x808/0x19d0
[  863.711767][  T191]  worker_thread+0x334/0xae0
[  863.716227][  T191]  kthread+0x3d8/0x488
[  863.720149][  T191]  ret_from_fork+0x10/0x20
[  863.724487][  T191] INFO: task lsbug:4642 blocked for more than 123 seconds.
[  863.731565][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
[  863.737938][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  863.746490][  T191] task:lsbug           state:D stack:55536 pid: 4642 ppid:  4638 flags:0x00000008
[  863.755549][  T191] Call trace:
[  863.758712][  T191]  __switch_to+0x184/0x400
[  863.762984][  T191]  __schedule+0x74c/0x1940
[  863.767286][  T191]  schedule+0x110/0x318
[  863.771294][  T191]  schedule_timeout+0x188/0x238
[  863.776016][  T191]  wait_for_completion+0x174/0x290
[  863.780979][  T191]  __cpuhp_kick_ap+0x158/0x1a8
[  863.785592][  T191]  cpuhp_kick_ap+0x1f0/0x828
[  863.790053][  T191]  bringup_cpu+0x180/0x1e0
[  863.794320][  T191]  cpuhp_invoke_callback+0x51c/0x2ab8
[  863.799561][  T191]  cpuhp_invoke_callback_range+0xa4/0x108
[  863.805130][  T191]  cpu_up+0x528/0xd78
[  863.808982][  T191]  cpu_device_up+0x4c/0x68
[  863.813249][  T191]  cpu_subsys_online+0xc0/0x1f8
[  863.817972][  T191]  device_online+0x10c/0x180
[  863.822413][  T191]  online_store+0x10c/0x118
[  863.826791][  T191]  dev_attr_store+0x44/0x78
[  863.831148][  T191]  sysfs_kf_write+0xe8/0x138
[  863.835590][  T191]  kernfs_fop_write_iter+0x26c/0x3d0
[  863.840745][  T191]  new_sync_write+0x2bc/0x4f8
[  863.845275][  T191]  vfs_write+0x714/0xcd8
[  863.849387][  T191]  ksys_write+0xf8/0x1e0
[  863.853481][  T191]  __arm64_sys_write+0x74/0xa8
[  863.858113][  T191]  invoke_syscall.constprop.0+0xdc/0x1d8
[  863.863597][  T191]  do_el0_svc+0xe4/0x298
[  863.867710][  T191]  el0_svc+0x64/0x130
[  863.871545][  T191]  el0t_64_sync_handler+0xb0/0xb8
[  863.876437][  T191]  el0t_64_sync+0x180/0x184
[  863.880798][  T191] INFO: task mount:4682 blocked for more than 123 seconds.
[  863.887881][  T191]       Not tainted 5.15.0-rc5-next-20211013+ #145
[  863.894232][  T191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  863.902776][  T191] task:mount           state:D stack:55856 pid: 4682 ppid:  1101 flags:0x00000000
[  863.911865][  T191] Call trace:
[  863.915003][  T191]  __switch_to+0x184/0x400
[  863.919296][  T191]  __schedule+0x74c/0x1940
[  863.923564][  T191]  schedule+0x110/0x318
[  863.927590][  T191]  percpu_rwsem_wait+0x1b8/0x348
[  863.932380][  T191]  __percpu_down_read+0xb0/0x148
[  863.937187][  T191]  cpus_read_lock+0x2b0/0x308
[  863.941715][  T191]  alloc_workqueue+0x730/0xd48
[  863.946357][  T191]  loop_configure+0x2d4/0x1180 [loop]
[  863.951592][  T191]  lo_ioctl+0x5dc/0x1228 [loop]
[  863.956321][  T191]  blkdev_ioctl+0x258/0x820
[  863.960678][  T191]  __arm64_sys_ioctl+0x114/0x180
[  863.965468][  T191]  invoke_syscall.constprop.0+0xdc/0x1d8
[  863.970974][  T191]  do_el0_svc+0xe4/0x298
[  863.975069][  T191]  el0_svc+0x64/0x130
[  863.978922][  T191]  el0t_64_sync_handler+0xb0/0xb8
[  863.983798][  T191]  el0t_64_sync+0x180/0x184
[  863.988172][  T191] INFO: lockdep is turned off.

Thread overview:
2021-09-29 15:17 [PATCH v2 00/11] sched,rcu,context_tracking,livepatch: Improve livepatch task transitions for idle and NOHZ_FULL Peter Zijlstra
2021-09-29 15:17 ` [PATCH v2 01/11] sched: Improve try_invoke_on_locked_down_task() Peter Zijlstra
2021-09-29 15:17 ` [PATCH v2 02/11] sched,rcu: Rework try_invoke_on_locked_down_task() Peter Zijlstra
2021-09-29 15:17 ` [PATCH v2 03/11] sched,livepatch: Use task_call_func() Peter Zijlstra
2021-10-05 11:40   ` Petr Mladek
2021-10-05 14:03     ` Peter Zijlstra
2021-10-06  8:59   ` Miroslav Benes
2021-09-29 15:17 ` [PATCH v2 04/11] sched: Simplify wake_up_*idle*() Peter Zijlstra
2021-10-13 14:32   ` Qian Cai [this message]
2021-10-19  3:47     ` Qian Cai
2021-10-19  8:56       ` Peter Zijlstra
2021-10-19  9:10         ` Peter Zijlstra
2021-10-19 15:32           ` Qian Cai
2021-10-19 15:50             ` Peter Zijlstra
2021-10-19 19:22               ` Qian Cai
2021-10-19 20:27                 ` Peter Zijlstra
     [not found]   ` <CGME20211022134630eucas1p2e79e2816587d182c580459d567c1f2a9@eucas1p2.samsung.com>
2021-10-22 13:46     ` Marek Szyprowski
2021-09-29 15:17 ` [PATCH v2 05/11] sched,livepatch: Use wake_up_if_idle() Peter Zijlstra
2021-10-05 12:00   ` Petr Mladek
2021-10-06  9:16   ` Miroslav Benes
2021-10-07  9:18     ` Vasily Gorbik
2021-10-07 10:02       ` Peter Zijlstra
2021-10-13 19:37   ` Arnd Bergmann
2021-10-14 10:42     ` Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 06/11] context_tracking: Prefix user_{enter,exit}*() Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 07/11] context_tracking: Add an atomic sequence/state count Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 08/11] context_tracking,rcu: Replace RCU dynticks counter with context_tracking Peter Zijlstra
2021-09-29 18:37   ` Paul E. McKenney
2021-09-29 19:09     ` Peter Zijlstra
2021-09-29 19:11     ` Peter Zijlstra
2021-09-29 19:13     ` Peter Zijlstra
2021-09-29 19:24       ` Peter Zijlstra
2021-09-29 19:45         ` Paul E. McKenney
2021-09-29 18:54   ` Peter Zijlstra
2021-09-29 15:17 ` [RFC][PATCH v2 09/11] context_tracking,livepatch: Dont disturb NOHZ_FULL Peter Zijlstra
2021-10-06  8:12   ` Petr Mladek
2021-10-06  9:04     ` Peter Zijlstra
2021-10-06 10:29       ` Petr Mladek
2021-10-06 11:41         ` Peter Zijlstra
2021-10-06 11:48         ` Miroslav Benes
2021-09-29 15:17 ` [RFC][PATCH v2 10/11] livepatch: Remove klp_synchronize_transition() Peter Zijlstra
2021-10-06 12:30   ` Petr Mladek
2021-09-29 15:17 ` [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync() vs NOHZ_FULL Peter Zijlstra
2021-10-21 18:39   ` Marcelo Tosatti
2021-10-21 18:40     ` Marcelo Tosatti
2021-10-21 19:25     ` Peter Zijlstra
2021-10-21 19:57       ` Marcelo Tosatti
2021-10-21 20:18         ` Peter Zijlstra
2021-10-26 18:19           ` Marcelo Tosatti
2021-10-26 19:38             ` Peter Zijlstra
2021-09-29 18:03 ` [PATCH v2 00/11] sched,rcu,context_tracking,livepatch: Improve livepatch task transitions for idle and NOHZ_FULL Paul E. McKenney
