Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work

From: Xing Zhengjun <zhengjun.xing@linux.intel.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Oliver Sang <oliver.sang@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Lai Jiangshan <laijs@linux.alibaba.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@intel.com, lkp <lkp@lists.01.org>
Subject: Re: [workqueue] d5bff968ea: WARNING:at_kernel/workqueue.c:#process_one_work
Date: Mon, 25 Jan 2021 16:37:03 +0800	[thread overview]
Message-ID: <a2a57594-5470-2ab8-2267-d8bdbe9c75bf@linux.intel.com> (raw)
In-Reply-To: <20210122075903.1722-1-hdanton@sina.com>

[-- Attachment #1: Type: text/plain, Size: 10576 bytes --]


On 1/22/2021 3:59 PM, Hillf Danton wrote:
> On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote:
>> On 1/21/2021 12:00 PM, Hillf Danton wrote:
>>> On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote:
>>>> On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote:
>>>>> Thu, 14 Jan 2021 15:45:11 +0800
>>>>>>
>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>
>>>>>> commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity initiatively")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.01.11b
>>>>>>
>>>>> [...]
>>>>>>
>>>>>> [   73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 process_one_work
>>>>>
>>>>> Thanks for your report.
>>>>>
>>>>> We can also break CPU affinity by checking POOL_DISASSOCIATED at attach
>>>>> time without extra cost paid; that way we have the same behavior as at
>>>>> the unbind time.
>>>>>
>>>>> What is more the change that makes kworker pcpu is cut because they are
>>>>> going to not help either hotplug or the mechanism of stop machine.
>>>>
>>>> hi, by applying below patch, the issue still happened.
>>>
>>> Thanks for your report.
>>>>
>>>> [ 4.574467] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
>>>> [ 4.575651] pci 0000:00:01.0: Activating ISA DMA hang workarounds
>>>> [ 4.576900] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
>>>> [ 4.578648] PCI: CLS 0 bytes, default 64
>>>> [ 4.579685] Unpacking initramfs...
>>>> [ 8.878031] -----------[ cut here ]-----------
>>>> [ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 process_one_work+0x92/0x9e0
>>>> [ 8.880688] Modules linked in:
>>>> [ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 5.11.0-rc3-gc213503139bb #2
>>>
>>> The kworker bond to CPU1 runs on CPU0 and triggers the warning, which
>>> shows that scheduler breaks CPU affinity, after 06249738a41a
>>> ("workqueue: Manually break affinity on hotplug"), though quite likely
>>> by kworker/1:0 for the initial workers.
>>>
>>>> [ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
>>>> [ 8.887539] Workqueue: 0x0 (events)
>>>> [ 8.887838] EIP: process_one_work+0x92/0x9e0
>>>> [ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
>>>> [ 8.887838] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001
>>>> [ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08
>>>> [ 8.887838] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
>>>> [ 8.887838] CR0: 80050033 CR2: 00000000 CR3: 034e3000 CR4: 000406d0
>>>> [ 8.887838] Call Trace:
>>>> [ 8.887838] ? worker_thread+0x98/0x6a0
>>>> [ 8.887838] ? worker_thread+0x2dd/0x6a0
>>>> [ 8.887838] ? kthread+0x1ba/0x1e0
>>>> [ 8.887838] ? create_worker+0x1e0/0x1e0
>>>> [ 8.887838] ? kzalloc+0x20/0x20
>>>> [ 8.887838] ? ret_from_fork+0x1c/0x28
>>>> [ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed
>>>> [ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with crng_init=0
>>>> [ 8.887838] --[ end trace ac461b4d54c37cfa ]--
>>>
>>>
>>> Instead of creating the initial workers only on the active CPUS, rebind
>>> them (labeled pcpu) and jump to the right CPU at bootup time.
>>>
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -2385,6 +2385,16 @@ woke_up:
>>>    		return 0;
>>>    	}
>>>    
>>> +	if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() !=
>>> +								pool->cpu) {
>>> +		/* scheduler breaks CPU affinity for us, rebind it */
>>> +		raw_spin_unlock_irq(&pool->lock);
>>> +		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
>>> +		/* and jump to the right seat */
>>> +		schedule_timeout_interruptible(1);
>>> +		goto woke_up;
>>> +	}
>>> +
>>>    	worker_leave_idle(worker);
>>>    recheck:
>>>    	/* no more worker necessary? */
>>> --
>>>
>> I test the patch, the warning still appears in the kernel log.
> 
> Thanks for your report.
>>
>> [  230.356503] smpboot: CPU 1 is now offline
>> [  230.544652] x86: Booting SMP configuration:
>> [  230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1
>> [  230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
>> [  230.545675] masked ExtINT on CPU#1
>> [  230.593829] ------------[ cut here ]------------
>> [  230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 process_one_work+0x92/0x9e0
>> [  230.594990] Modules linked in: rcutorture torture mousedev input_leds
>> led_class pcspkr psmouse evbug tiny_power_button button
>> [  230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 5.11.0-rc3-gdcba55d9080f #2
> 
> Like what was reported, kworker bond to CPU1 runs on CPU0 and triggers
> warning, due to scheduler breaking CPU affinity for us. What is new, the
> affinity was broken at offline time instead of bootup.
> 
>> [  230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
>> [  230.597322] Workqueue:  0x0 (rcu_gp)
>> [  230.597636] EIP: process_one_work+0x92/0x9e0
>> [  230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00
>> 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 f4 85 13 00 ff 05 cc 30 04
>> 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
>> [  230.599569] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001
>> [  230.600100] ESI: 43d94240 EDI: df4040f4 EBP: de7f23c0 ESP: bf5f1f08
>> [  230.600629] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
>> [  230.601203] CR0: 80050033 CR2: 01bdecbc CR3: 04e2c000 CR4: 000406d0
>> [  230.601735] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> [  230.602265] DR6: fffe0ff0 DR7: 00000400
>> [  230.602594] Call Trace:
>> [  230.602813]  ? process_one_work+0x20e/0x9e0
>> [  230.603181]  ? worker_thread+0x32d/0x700
>> [  230.603522]  ? kthread+0x1ba/0x1e0
>> [  230.603818]  ? create_worker+0x1e0/0x1e0
>> [  230.604157]  ? kzalloc+0x20/0x20
>> [  230.604524]  ? ret_from_fork+0x1c/0x28
>> [  230.604850] ---[ end trace 06b1e66b5e17fa85 ]---
>> [  230.605504] kvm-guest: stealtime: cpu 1, msr 9e7e6ec0
>> [  230.766960] smpboot: CPU 1 is now offline
>> [  230.814803] x86: Booting SMP configuration:
>> [  230.815306] smpboot: Booting Node 0 Processor 1 APIC 0x1
>> [  230.815964] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
> 
> 
> Unlike the above diff that is at most papering over the problem
> sitting somewhere in the scheduler, add change to creating worker
> by skipping set_cpus_allowed_ptr() because we will wake it up after
> attaching it to worker pool.
> 
> If we can ignore rescuer for now, then the allowed ptr is only
> updated at on/offline time; lets see the difference at boot time.
> 
> 
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1844,16 +1844,10 @@ static struct worker *alloc_worker(int n
>    * cpu-[un]hotplugs.
>    */
>   static void worker_attach_to_pool(struct worker *worker,
> -				   struct worker_pool *pool)
> +				   struct worker_pool *pool,
> +				   int update_cpus_allowed)
>   {
>   	mutex_lock(&wq_pool_attach_mutex);
> -
> -	/*
> -	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> -	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
> -	 */
> -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> -
>   	/*
>   	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>   	 * stable across this function.  See the comments above the flag
> @@ -1867,6 +1861,9 @@ static void worker_attach_to_pool(struct
>   	list_add_tail(&worker->node, &pool->workers);
>   	worker->pool = pool;
>   
> +	if (update_cpus_allowed)
> +		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +
>   	mutex_unlock(&wq_pool_attach_mutex);
>   }
>   
> @@ -1942,8 +1939,11 @@ static struct worker *create_worker(stru
>   	set_user_nice(worker->task, pool->attrs->nice);
>   	kthread_bind_mask(worker->task, pool->attrs->cpumask);
>   
> -	/* successful, attach the worker to the pool */
> -	worker_attach_to_pool(worker, pool);
> +	/*
> +	 * attach the worker to the pool without asking scheduler to
> +	 * update CPUs allowed
> +	 */
> +	worker_attach_to_pool(worker, pool, 0);
>   
>   	/* start the newly created worker */
>   	raw_spin_lock_irq(&pool->lock);
> @@ -2508,7 +2508,7 @@ repeat:
>   
>   		raw_spin_unlock_irq(&wq_mayday_lock);
>   
> -		worker_attach_to_pool(rescuer, pool);
> +		worker_attach_to_pool(rescuer, pool, 1);
>   
>   		raw_spin_lock_irq(&pool->lock);
>   
> --
> 
I test the patch, the warning still appears in the kernel log.

[   55.754187] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   55.785594] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock
[   55.785646] masked ExtINT on CPU#1
[   55.920602] ------------[ cut here ]------------
[   55.921355] WARNING: CPU: 0 PID: 160 at kernel/workqueue.c:2192 
process_one_work+0x92/0x9e0
[   55.922583] Modules linked in: rcutorture torture mousedev evbug 
input_leds led_class tiny_power_button psmouse pcspkr button
[   55.924294] CPU: 0 PID: 160 Comm: kworker/1:2 Not tainted 
5.11.0-rc3-00186-g77bf4e461cfa #2
[   55.925552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.12.0-1 04/01/2014
[   55.926763] Workqueue:  0x0 (rcu_gp)
[   55.927298] EIP: process_one_work+0x92/0x9e0
[   55.927950] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 
00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 94 85 13 00 ff 05 b8 30 04 
43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31
[   55.930726] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001
[   55.931642] ESI: 43d90540 EDI: df48c0f4 EBP: de7f23c0 ESP: bfb47f08
[   55.932590] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002
[   55.933609] CR0: 80050033 CR2: 024e994c CR3: 7fd80000 CR4: 000406d0
[   55.934555] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   55.935457] DR6: fffe0ff0 DR7: 00000400
[   55.936041] Call Trace:
[   55.936534]  ? process_one_work+0x20e/0x9e0
[   55.937305]  ? worker_thread+0x2dd/0x6a0
[   55.938018]  ? kthread+0x1ba/0x1e0
[   55.938598]  ? create_worker+0x1e0/0x1e0
[   55.939315]  ? kzalloc+0x20/0x20
[   55.940000]  ? ret_from_fork+0x1c/0x28
[   55.940627] ---[ end trace d155e9e6402de179 ]---
[   55.941641] kvm-guest: stealtime: cpu 1, msr 9e7e6ec0
[   56.155271] smpboot: CPU 1 is now offline
[   56.193613] x86: Booting SMP configuration:
[   56.194400] smpboot: Booting Node 0 Processor 1 APIC 0x1


-- 
Zhengjun Xing

[-- Attachment #2: dmesg.xz --]
[-- Type: application/octet-stream, Size: 38816 bytes --]