On 1/21/2021 12:00 PM, Hillf Danton wrote: > On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote: >> On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote: >>> Thu, 14 Jan 2021 15:45:11 +0800 >>>> >>>> FYI, we noticed the following commit (built with gcc-9): >>>> >>>> commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity initiatively") >>>> https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.01.11b >>>> >>> [...] >>>> >>>> [ 73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 process_one_work >>> >>> Thanks for your report. >>> >>> We can also break CPU affinity by checking POOL_DISASSOCIATED at attach >>> time without extra cost paid; that way we have the same behavior as at >>> the unbind time. >>> >>> What is more the change that makes kworker pcpu is cut because they are >>> going to not help either hotplug or the mechanism of stop machine. >> >> hi, by applying below patch, the issue still happened. > > Thanks for your report. >> >> [ 4.574467] pci 0000:00:00.0: Limiting direct PCI/PCI transfers >> [ 4.575651] pci 0000:00:01.0: Activating ISA DMA hang workarounds >> [ 4.576900] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] >> [ 4.578648] PCI: CLS 0 bytes, default 64 >> [ 4.579685] Unpacking initramfs... >> [ 8.878031] -----------[ cut here ]----------- >> [ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 process_one_work+0x92/0x9e0 >> [ 8.880688] Modules linked in: >> [ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 5.11.0-rc3-gc213503139bb #2 > > The kworker bond to CPU1 runs on CPU0 and triggers the warning, which > shows that scheduler breaks CPU affinity, after 06249738a41a > ("workqueue: Manually break affinity on hotplug"), though quite likely > by kworker/1:0 for the initial workers. > >> [ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 >> [ 8.887539] Workqueue: 0x0 (events) >> [ 8.887838] EIP: process_one_work+0x92/0x9e0 >> [ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31 >> [ 8.887838] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001 >> [ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08 >> [ 8.887838] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002 >> [ 8.887838] CR0: 80050033 CR2: 00000000 CR3: 034e3000 CR4: 000406d0 >> [ 8.887838] Call Trace: >> [ 8.887838] ? worker_thread+0x98/0x6a0 >> [ 8.887838] ? worker_thread+0x2dd/0x6a0 >> [ 8.887838] ? kthread+0x1ba/0x1e0 >> [ 8.887838] ? create_worker+0x1e0/0x1e0 >> [ 8.887838] ? kzalloc+0x20/0x20 >> [ 8.887838] ? ret_from_fork+0x1c/0x28 >> [ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed >> [ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with crng_init=0 >> [ 8.887838] --[ end trace ac461b4d54c37cfa ]-- > > > Instead of creating the initial workers only on the active CPUS, rebind > them (labeled pcpu) and jump to the right CPU at bootup time. > > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -2385,6 +2385,16 @@ woke_up: > return 0; > } > > + if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() != > + pool->cpu) { > + /* scheduler breaks CPU affinity for us, rebind it */ > + raw_spin_unlock_irq(&pool->lock); > + set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); > + /* and jump to the right seat */ > + schedule_timeout_interruptible(1); > + goto woke_up; > + } > + > worker_leave_idle(worker); > recheck: > /* no more worker necessary? */ > -- > I test the patch, the warning still appears in the kernel log. [ 230.356503] smpboot: CPU 1 is now offline [ 230.544652] x86: Booting SMP configuration: [ 230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock [ 230.545675] masked ExtINT on CPU#1 [ 230.593829] ------------[ cut here ]------------ [ 230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 process_one_work+0x92/0x9e0 [ 230.594990] Modules linked in: rcutorture torture mousedev input_leds led_class pcspkr psmouse evbug tiny_power_button button [ 230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 5.11.0-rc3-gdcba55d9080f #2 [ 230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 230.597322] Workqueue: 0x0 (rcu_gp) [ 230.597636] EIP: process_one_work+0x92/0x9e0 [ 230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 f4 85 13 00 ff 05 cc 30 04 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31 [ 230.599569] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001 [ 230.600100] ESI: 43d94240 EDI: df4040f4 EBP: de7f23c0 ESP: bf5f1f08 [ 230.600629] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002 [ 230.601203] CR0: 80050033 CR2: 01bdecbc CR3: 04e2c000 CR4: 000406d0 [ 230.601735] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 230.602265] DR6: fffe0ff0 DR7: 00000400 [ 230.602594] Call Trace: [ 230.602813] ? process_one_work+0x20e/0x9e0 [ 230.603181] ? worker_thread+0x32d/0x700 [ 230.603522] ? kthread+0x1ba/0x1e0 [ 230.603818] ? create_worker+0x1e0/0x1e0 [ 230.604157] ? kzalloc+0x20/0x20 [ 230.604524] ? ret_from_fork+0x1c/0x28 [ 230.604850] ---[ end trace 06b1e66b5e17fa85 ]--- [ 230.605504] kvm-guest: stealtime: cpu 1, msr 9e7e6ec0 [ 230.766960] smpboot: CPU 1 is now offline [ 230.814803] x86: Booting SMP configuration: [ 230.815306] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 230.815964] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock -- Zhengjun Xing