On 1/25/2021 5:29 PM, Hillf Danton wrote: > On 25 Jan 2021 16:31:32 +0800 Xing Zhengjun wrote: >> On 1/22/2021 3:59 PM, Hillf Danton wrote: >>> On Fri, 22 Jan 2021 09:48:32 +0800 Xing Zhengjun wrote: >>>> On 1/21/2021 12:00 PM, Hillf Danton wrote: >>>>> On Wed, 20 Jan 2021 21:46:33 +0800 Oliver Sang wrote: >>>>>> On Fri, Jan 15, 2021 at 03:24:32PM +0800, Hillf Danton wrote: >>>>>>> Thu, 14 Jan 2021 15:45:11 +0800 >>>>>>>> >>>>>>>> FYI, we noticed the following commit (built with gcc-9): >>>>>>>> >>>>>>>> commit: d5bff968ea9cc005e632d9369c26cbd8148c93d5 ("workqueue: break affinity initiatively") >>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2021.01.11b >>>>>>>> >>>>>>> [...] >>>>>>>> >>>>>>>> [ 73.794288] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2192 process_one_work >>>>>>> >>>>>>> Thanks for your report. >>>>>>> >>>>>>> We can also break CPU affinity by checking POOL_DISASSOCIATED at attach >>>>>>> time without extra cost paid; that way we have the same behavior as at >>>>>>> the unbind time. >>>>>>> >>>>>>> What is more the change that makes kworker pcpu is cut because they are >>>>>>> going to not help either hotplug or the mechanism of stop machine. >>>>>> >>>>>> hi, by applying below patch, the issue still happened. >>>>> >>>>> Thanks for your report. >>>>>> >>>>>> [ 4.574467] pci 0000:00:00.0: Limiting direct PCI/PCI transfers >>>>>> [ 4.575651] pci 0000:00:01.0: Activating ISA DMA hang workarounds >>>>>> [ 4.576900] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] >>>>>> [ 4.578648] PCI: CLS 0 bytes, default 64 >>>>>> [ 4.579685] Unpacking initramfs... >>>>>> [ 8.878031] -----------[ cut here ]----------- >>>>>> [ 8.879083] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2187 process_one_work+0x92/0x9e0 >>>>>> [ 8.880688] Modules linked in: >>>>>> [ 8.881274] CPU: 0 PID: 22 Comm: kworker/1:0 Not tainted 5.11.0-rc3-gc213503139bb #2 >>>>> >>>>> The kworker bond to CPU1 runs on CPU0 and triggers the warning, which >>>>> shows that scheduler breaks CPU affinity, after 06249738a41a >>>>> ("workqueue: Manually break affinity on hotplug"), though quite likely >>>>> by kworker/1:0 for the initial workers. >>>>> >>>>>> [ 8.882518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 >>>>>> [ 8.887539] Workqueue: 0x0 (events) >>>>>> [ 8.887838] EIP: process_one_work+0x92/0x9e0 >>>>>> [ 8.887838] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 74 85 13 00 ff 05 b8 30 04 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31 >>>>>> [ 8.887838] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001 >>>>>> [ 8.887838] ESI: 43c04720 EDI: 42e45620 EBP: de7f23c0 ESP: 43d7bf08 >>>>>> [ 8.887838] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002 >>>>>> [ 8.887838] CR0: 80050033 CR2: 00000000 CR3: 034e3000 CR4: 000406d0 >>>>>> [ 8.887838] Call Trace: >>>>>> [ 8.887838] ? worker_thread+0x98/0x6a0 >>>>>> [ 8.887838] ? worker_thread+0x2dd/0x6a0 >>>>>> [ 8.887838] ? kthread+0x1ba/0x1e0 >>>>>> [ 8.887838] ? create_worker+0x1e0/0x1e0 >>>>>> [ 8.887838] ? kzalloc+0x20/0x20 >>>>>> [ 8.887838] ? ret_from_fork+0x1c/0x28 >>>>>> [ 8.887838] _warn_unseeded_randomness: 63 callbacks suppressed >>>>>> [ 8.887838] random: get_random_bytes called from init_oops_id+0x2b/0x60 with crng_init=0 >>>>>> [ 8.887838] --[ end trace ac461b4d54c37cfa ]-- >>>>> >>>>> >>>>> Instead of creating the initial workers only on the active CPUS, rebind >>>>> them (labeled pcpu) and jump to the right CPU at bootup time. >>>>> >>>>> --- a/kernel/workqueue.c >>>>> +++ b/kernel/workqueue.c >>>>> @@ -2385,6 +2385,16 @@ woke_up: >>>>> return 0; >>>>> } >>>>> >>>>> + if (!(pool->flags & POOL_DISASSOCIATED) && smp_processor_id() != >>>>> + pool->cpu) { >>>>> + /* scheduler breaks CPU affinity for us, rebind it */ >>>>> + raw_spin_unlock_irq(&pool->lock); >>>>> + set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); >>>>> + /* and jump to the right seat */ >>>>> + schedule_timeout_interruptible(1); >>>>> + goto woke_up; >>>>> + } >>>>> + >>>>> worker_leave_idle(worker); >>>>> recheck: >>>>> /* no more worker necessary? */ >>>>> -- >>>>> >>>> I test the patch, the warning still appears in the kernel log. >>> >>> Thanks for your report. >>>> >>>> [ 230.356503] smpboot: CPU 1 is now offline >>>> [ 230.544652] x86: Booting SMP configuration: >>>> [ 230.545077] smpboot: Booting Node 0 Processor 1 APIC 0x1 >>>> [ 230.545640] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock >>>> [ 230.545675] masked ExtINT on CPU#1 >>>> [ 230.593829] ------------[ cut here ]------------ >>>> [ 230.594257] WARNING: CPU: 0 PID: 257 at kernel/workqueue.c:2192 process_one_work+0x92/0x9e0 >>>> [ 230.594990] Modules linked in: rcutorture torture mousedev input_leds >>>> led_class pcspkr psmouse evbug tiny_power_button button >>>> [ 230.595961] CPU: 0 PID: 257 Comm: kworker/1:3 Not tainted 5.11.0-rc3-gdcba55d9080f #2 >>> >>> Like what was reported, kworker bond to CPU1 runs on CPU0 and triggers >>> warning, due to scheduler breaking CPU affinity for us. What is new, the >>> affinity was broken at offline time instead of bootup. >>> >>>> [ 230.596621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 >>>> [ 230.597322] Workqueue: 0x0 (rcu_gp) >>>> [ 230.597636] EIP: process_one_work+0x92/0x9e0 >>>> [ 230.598005] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 >>>> 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 f4 85 13 00 ff 05 cc 30 04 >>>> 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31 >>>> [ 230.599569] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001 >>>> [ 230.600100] ESI: 43d94240 EDI: df4040f4 EBP: de7f23c0 ESP: bf5f1f08 >>>> [ 230.600629] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002 >>>> [ 230.601203] CR0: 80050033 CR2: 01bdecbc CR3: 04e2c000 CR4: 000406d0 >>>> [ 230.601735] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 >>>> [ 230.602265] DR6: fffe0ff0 DR7: 00000400 >>>> [ 230.602594] Call Trace: >>>> [ 230.602813] ? process_one_work+0x20e/0x9e0 >>>> [ 230.603181] ? worker_thread+0x32d/0x700 >>>> [ 230.603522] ? kthread+0x1ba/0x1e0 >>>> [ 230.603818] ? create_worker+0x1e0/0x1e0 >>>> [ 230.604157] ? kzalloc+0x20/0x20 >>>> [ 230.604524] ? ret_from_fork+0x1c/0x28 >>>> [ 230.604850] ---[ end trace 06b1e66b5e17fa85 ]--- >>>> [ 230.605504] kvm-guest: stealtime: cpu 1, msr 9e7e6ec0 >>>> [ 230.766960] smpboot: CPU 1 is now offline >>>> [ 230.814803] x86: Booting SMP configuration: >>>> [ 230.815306] smpboot: Booting Node 0 Processor 1 APIC 0x1 >>>> [ 230.815964] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock >>> >>> >>> Unlike the above diff that is at most papering over the problem >>> sitting somewhere in the scheduler, add change to creating worker >>> by skipping set_cpus_allowed_ptr() because we will wake it up after >>> attaching it to worker pool. >>> >>> If we can ignore rescuer for now, then the allowed ptr is only >>> updated at on/offline time; lets see the difference at boot time. >>> >>> >>> --- a/kernel/workqueue.c >>> +++ b/kernel/workqueue.c >>> @@ -1844,16 +1844,10 @@ static struct worker *alloc_worker(int n >>> * cpu-[un]hotplugs. >>> */ >>> static void worker_attach_to_pool(struct worker *worker, >>> - struct worker_pool *pool) >>> + struct worker_pool *pool, >>> + int update_cpus_allowed) >>> { >>> mutex_lock(&wq_pool_attach_mutex); >>> - >>> - /* >>> - * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any >>> - * online CPUs. It'll be re-applied when any of the CPUs come up. >>> - */ >>> - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); >>> - >>> /* >>> * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains >>> * stable across this function. See the comments above the flag >>> @@ -1867,6 +1861,9 @@ static void worker_attach_to_pool(struct >>> list_add_tail(&worker->node, &pool->workers); >>> worker->pool = pool; >>> >>> + if (update_cpus_allowed) >>> + set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); >>> + >>> mutex_unlock(&wq_pool_attach_mutex); >>> } >>> >>> @@ -1942,8 +1939,11 @@ static struct worker *create_worker(stru >>> set_user_nice(worker->task, pool->attrs->nice); >>> kthread_bind_mask(worker->task, pool->attrs->cpumask); >>> >>> - /* successful, attach the worker to the pool */ >>> - worker_attach_to_pool(worker, pool); >>> + /* >>> + * attach the worker to the pool without asking scheduler to >>> + * update CPUs allowed >>> + */ >>> + worker_attach_to_pool(worker, pool, 0); >>> >>> /* start the newly created worker */ >>> raw_spin_lock_irq(&pool->lock); >>> @@ -2508,7 +2508,7 @@ repeat: >>> >>> raw_spin_unlock_irq(&wq_mayday_lock); >>> >>> - worker_attach_to_pool(rescuer, pool); >>> + worker_attach_to_pool(rescuer, pool, 1); >>> >>> raw_spin_lock_irq(&pool->lock); >>> >>> -- >>> >> I test the patch, the warning still appears in the kernel log. > > Thanks. >> >> [ 55.754187] smpboot: Booting Node 0 Processor 1 APIC 0x1 >> [ 55.785594] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock >> [ 55.785646] masked ExtINT on CPU#1 >> [ 55.920602] ------------[ cut here ]------------ >> [ 55.921355] WARNING: CPU: 0 PID: 160 at kernel/workqueue.c:2192 process_one_work+0x92/0x9e0 >> [ 55.922583] Modules linked in: rcutorture torture mousedev evbug >> input_leds led_class tiny_power_button psmouse pcspkr button >> [ 55.924294] CPU: 0 PID: 160 Comm: kworker/1:2 Not tainted 5.11.0-rc3-00186-g77bf4e461cfa #2 > > Same issue as before. > >> [ 55.925552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 >> [ 55.926763] Workqueue: 0x0 (rcu_gp) >> [ 55.927298] EIP: process_one_work+0x92/0x9e0 >> [ 55.927950] Code: 37 64 a1 58 54 4c 43 39 45 24 74 2c 31 c9 ba 01 00 >> 00 00 c7 04 24 01 00 00 00 b8 08 1d f5 42 e8 94 85 13 00 ff 05 b8 30 04 >> 43 <0f> 0b ba 01 00 00 00 eb 22 8d 74 26 00 90 c7 04 24 01 00 00 00 31 >> [ 55.930726] EAX: 42f51d08 EBX: 00000000 ECX: 00000000 EDX: 00000001 >> [ 55.931642] ESI: 43d90540 EDI: df48c0f4 EBP: de7f23c0 ESP: bfb47f08 >> [ 55.932590] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010002 >> [ 55.933609] CR0: 80050033 CR2: 024e994c CR3: 7fd80000 CR4: 000406d0 >> [ 55.934555] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 >> [ 55.935457] DR6: fffe0ff0 DR7: 00000400 >> [ 55.936041] Call Trace: >> [ 55.936534] ? process_one_work+0x20e/0x9e0 >> [ 55.937305] ? worker_thread+0x2dd/0x6a0 >> [ 55.938018] ? kthread+0x1ba/0x1e0 >> [ 55.938598] ? create_worker+0x1e0/0x1e0 >> [ 55.939315] ? kzalloc+0x20/0x20 >> [ 55.940000] ? ret_from_fork+0x1c/0x28 >> [ 55.940627] ---[ end trace d155e9e6402de179 ]--- >> [ 55.941641] kvm-guest: stealtime: cpu 1, msr 9e7e6ec0 >> [ 56.155271] smpboot: CPU 1 is now offline >> [ 56.193613] x86: Booting SMP configuration: >> [ 56.194400] smpboot: Booting Node 0 Processor 1 APIC 0x1 > > The changes in the diff below are > > 1/ at rescue time, change CPU affinity only if POOL_DISASSOCIATED > is not set, and print warning the same way as offline time. > > 2/ at offine time, dont update allowed CPUs after setting > POOL_DISASSOCIATED because we no longer have interest in affinity. > > 3/ at online time, mark pcpu before binding affinity. > > Though one change a diff is appreciated, by the WARNs, we can tell > which is what if any warning goes into dmesg. > > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -1844,25 +1844,23 @@ static struct worker *alloc_worker(int n > * cpu-[un]hotplugs. > */ > static void worker_attach_to_pool(struct worker *worker, > - struct worker_pool *pool) > + struct worker_pool *pool, int set) > { > mutex_lock(&wq_pool_attach_mutex); > > /* > - * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any > - * online CPUs. It'll be re-applied when any of the CPUs come up. > - */ > - set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); > - > - /* > * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains > * stable across this function. See the comments above the flag > * definition for details. > */ > if (pool->flags & POOL_DISASSOCIATED) > worker->flags |= WORKER_UNBOUND; > - else > + else { > kthread_set_per_cpu(worker->task, true); > + if (set) > + WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, > + pool->attrs->cpumask) < 0); > + } > > list_add_tail(&worker->node, &pool->workers); > worker->pool = pool; > @@ -1943,7 +1941,7 @@ static struct worker *create_worker(stru > kthread_bind_mask(worker->task, pool->attrs->cpumask); > > /* successful, attach the worker to the pool */ > - worker_attach_to_pool(worker, pool); > + worker_attach_to_pool(worker, pool, 0); > > /* start the newly created worker */ > raw_spin_lock_irq(&pool->lock); > @@ -2508,7 +2506,7 @@ repeat: > > raw_spin_unlock_irq(&wq_mayday_lock); > > - worker_attach_to_pool(rescuer, pool); > + worker_attach_to_pool(rescuer, pool, 1); > > raw_spin_lock_irq(&pool->lock); > > @@ -4923,7 +4921,6 @@ static void unbind_workers(int cpu) > > for_each_pool_worker(worker, pool) { > kthread_set_per_cpu(worker->task, false); > - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0); > } > > mutex_unlock(&wq_pool_attach_mutex); > @@ -4977,9 +4974,9 @@ static void rebind_workers(struct worker > * from CPU_ONLINE, the following shouldn't fail. > */ > for_each_pool_worker(worker, pool) { > + kthread_set_per_cpu(worker->task, true); > WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, > pool->attrs->cpumask) < 0); > - kthread_set_per_cpu(worker->task, true); > } > > raw_spin_lock_irq(&pool->lock); > -- > I test the patch, the warning still appears in the kernel log, but the warning is different from before. [ 0.054803] kvm-clock: cpu 1, msr 34f6021, secondary cpu clock [ 0.054803] masked ExtINT on CPU#1 [ 0.054803] smpboot: CPU 1 Converting physical 0 to logical die 1 [ 1.890338] ------------[ cut here ]------------ [ 1.890338] WARNING: CPU: 1 PID: 18 at kernel/kthread.c:508 kthread_set_per_cpu+0x156/0x180 [ 1.890338] Modules linked in: [ 1.890338] CPU: 1 PID: 18 Comm: cpuhp/1 Not tainted 5.11.0-rc3-00186-ged03082352b2 #2 [ 1.890338] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 1.890338] EIP: kthread_set_per_cpu+0x156/0x180 [ 1.890338] Code: 00 00 00 00 ff 05 68 4e 04 43 83 c4 08 5b 5e 5f c3 8d 76 00 ff 05 34 50 04 43 0f 0b e9 f9 fe ff ff 8d 76 00 ff 05 2c 4e 04 43 <0f> 0b eb 9d 8d b6 00 00 00 00 ff 05 40 4e 04 43 0f 0b e9 45 ff ff [ 1.890338] EAX: 42f52ce0 EBX: 00000001 ECX: 00000000 EDX: 00000001 [ 1.890338] ESI: 43d76300 EDI: 43c0de00 EBP: de7f2564 ESP: 43d6beb8 [ 1.900350] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202 [ 1.901303] CR0: 80050033 CR2: 00000000 CR3: 034e3000 CR4: 000406d0 [ 1.902280] Call Trace: [ 1.902682] ? workqueue_online_cpu+0x12b/0x640 [ 1.903415] ? workqueue_prepare_cpu+0xa0/0xa0 [ 1.904155] ? cpuhp_invoke_callback+0x1ed/0x1340 [ 1.904941] ? cpuhp_thread_fun+0x28f/0x460 [ 1.905630] ? cpuhp_thread_fun+0x49/0x460 [ 1.906298] ? smpboot_thread_fn+0x446/0x620 [ 1.910275] ? kthread+0x1ba/0x1e0 [ 1.910857] ? __smpboot_create_thread+0x260/0x260 [ 1.911659] ? kzalloc+0x20/0x20 [ 1.912368] ? ret_from_fork+0x1c/0x28 [ 1.913016] ---[ end trace 6f6c005278241eba ]--- [ 1.913971] kvm-guest: stealtime: cpu 1, msr 9e7e6ec0 [ 1.920012] smp: Brought up 1 node, 2 CPUs [ 1.920299] smpboot: Max logical packages: 2 [ 1.921019] smpboot: Total of 2 processors activated (10774.03 BogoMIPS) -- Zhengjun Xing