On Fri, 2023-03-24 at 14:57 +0100, Thomas Gleixner wrote:
> 
> Why? Simply because of this:
> 
>   BP                    AP              state
>   kick()                                BRINGUP_CPU
>                         startup()                      
>   sync()                sync() 
>                         starting()      advances to AP_ONLINE
>   sync()                sync()
>   TSC_sync()            TSC_sync()
>   wait_for_online()     set_online()
>                         cpu_startup_entry() AP_ONLINE_IDLE
>   wait_for_completion() complete()
> 
> This works correctly today because bringup_cpu() does not modify state
> and excpects the state to be advanced by the AP once the completion is
> done.
> 
> So you _cannot_ just throw some magic dynamic states before BRINGUP_CPU
> and then expect that the state machine is consistent when the AP is
> allowed to run the starting callbacks in parallel.

Aha! I see.

Yes, when the AP calls notify_cpu_starting(), which x86 does from
smp_callin(), the AP takes *itself* forward through the states from
there.

That happens when the BP gets to do_wait_cpu_initialized(). So yes, the
actual code in the existing series of patches is entirely safe, but
you're right that we do only want that *one* additional state for
parallelising the "kick AP"  before CPUHP_BRINGUP_CPU. The rest need to
come afterwards and be handled differently.