On Tue, 2021-02-16 at 13:53 +0000, David Woodhouse wrote:
> I threw it into my tree at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel
>
> It seems to work fairly nicely. The parallel SIPI seems to win about
> a third of the bringup time on my 28-thread Haswell box. This is at the
> penultimate commit of the above branch:
>
> [ 0.307590] smp: Bringing up secondary CPUs ...
> [ 0.307826] x86: Booting SMP configuration:
> [ 0.307830] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14
> [ 0.376677] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
> [ 0.377177] #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27
> [ 0.402323] Brought CPUs online in 246691584 cycles
> [ 0.402323] smp: Brought up 1 node, 28 CPUs
>
> ... and this is the tip of the branch:
>
> [ 0.308332] smp: Bringing up secondary CPUs ...
> [ 0.308569] x86: Booting SMP configuration:
> [ 0.308572] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27
> [ 0.321120] Brought 28 CPUs to x86/cpu:kick in 34828752 cycles
> [ 0.366663] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
> [ 0.368749] Brought CPUs online in 124913032 cycles
> [ 0.368749] smp: Brought up 1 node, 28 CPUs
> [ 0.368749] smpboot: Max logical packages: 1
> [ 0.368749] smpboot: Total of 28 processors activated (145259.85 BogoMIPS)
>
> There's more to be gained here if we can fix up the next stage. Right
> now if I set every CPU's bit in cpu_initialized_mask to allow them to
> proceed from wait_for_master_cpu() through to the end of cpu_init() and
> onwards through start_secondary(), they all end up hitting
> check_tsc_sync_target() in parallel and it goes horridly wrong.

Actually it breaks before that, in rcu_cpu_starting().
A spinlock around that, an atomic_t to let the APs do their TSC sync
one at a time (both in the above tree now), and I have a 75% saving on
CPU bringup time for my 28-thread Haswell:

[ 0.307341] smp: Bringing up secondary CPUs ...
[ 0.307576] x86: Booting SMP configuration:
[ 0.307579] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27
[ 0.320100] Brought 28 CPUs to x86/cpu:kick in 34645984 cycles
[ 0.325032] Brought 28 CPUs to x86/cpu:wait-init in 12865752 cycles
[ 0.326902] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 0.328739] Brought CPUs online in 11702224 cycles
[ 0.328739] smp: Brought up 1 node, 28 CPUs
[ 0.328739] smpboot: Max logical packages: 1
[ 0.328739] smpboot: Total of 28 processors activated (145261.81 BogoMIPS)
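To make the two serialization points a bit more concrete, here is a rough userspace analogue using C11 atomics and pthreads on Linux/glibc. It is not the code from the tree above. A spinlock serializes the step that broke under concurrency (standing in for rcu_cpu_starting()). An atomic turn counter (standing in for the atomic_t) lets the APs do their "TSC sync" strictly one at a time while the rest of the bringup runs in parallel. The sketch assumes APs are numbered densely from 1 and take their turn in CPU-number order. It does not model the boot CPU's half of the sync (check_tsc_sync_source()). All names here (bring_up_aps(), secondary_start_sketch(), tsc_sync_done, ap_sync_order) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for pthread_spin_* under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Assumption: 28 logical CPUs; CPU 0 is the boot CPU, APs are 1..27. */
#define NR_CPUS_SKETCH 28
#define NR_APS (NR_CPUS_SKETCH - 1)

/* Hypothetical stand-in for the atomic_t: APs that have finished TSC sync. */
static atomic_int tsc_sync_done;

/* Hypothetical stand-in for the spinlock around rcu_cpu_starting(). */
static pthread_spinlock_t rcu_starting_lock;

static int rcu_started;    /* protected by rcu_starting_lock */
int ap_sync_order[NR_APS]; /* order in which APs did their "TSC sync" */
static int sync_slot;      /* only touched by the AP whose turn it is */

static void *secondary_start_sketch(void *arg)
{
	int cpu = (int)(long)arg;

	/* Non-reentrant step: serialize it, but in any order. */
	pthread_spin_lock(&rcu_starting_lock);
	rcu_started++;
	pthread_spin_unlock(&rcu_starting_lock);

	/* Wait until every lower-numbered AP has finished its sync. */
	while (atomic_load_explicit(&tsc_sync_done, memory_order_acquire) != cpu - 1)
		sched_yield(); /* a kernel loop would use cpu_relax() */

	/* Our turn: stand-in for check_tsc_sync_target(). */
	ap_sync_order[sync_slot++] = cpu;

	/* Release the turn to the next AP. */
	atomic_fetch_add_explicit(&tsc_sync_done, 1, memory_order_release);
	return NULL;
}

/* Kick all APs at once, wait for them, return how many ran the RCU step. */
int bring_up_aps(void)
{
	pthread_t threads[NR_APS];
	int rc;

	atomic_store(&tsc_sync_done, 0);
	sync_slot = 0;
	rcu_started = 0;
	rc = pthread_spin_init(&rcu_starting_lock, PTHREAD_PROCESS_PRIVATE);
	assert(rc == 0);

	for (long cpu = 1; cpu <= NR_APS; cpu++) {
		rc = pthread_create(&threads[cpu - 1], NULL,
				    secondary_start_sketch, (void *)cpu);
		assert(rc == 0);
	}
	for (int i = 0; i < NR_APS; i++)
		pthread_join(threads[i], NULL);

	pthread_spin_destroy(&rcu_starting_lock);
	(void)rc;
	return rcu_started;
}
```

Calling bring_up_aps() from main() should return 27, and ap_sync_order[] should then read 1, 2, ..., 27, because each AP proceeds only once tsc_sync_done equals its CPU number minus one. The acquire/release pair on the counter is what makes the plain writes to ap_sync_order[] and sync_slot safe without a second lock.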