On 3/18/22 08:30, Kirill A. Shutemov wrote:
> +	/*
> +	 * After writing the wakeup command, wait for maximum timeout of 0xFF
> +	 * for firmware to reset the command address back zero to indicate
> +	 * the successful reception of command.
> +	 * NOTE: 0xFF as timeout value is decided based on our experiments.
> +	 *
> +	 * XXX: Change the timeout once ACPI specification comes up with
> +	 *      standard maximum timeout value.
> +	 */
> +	timeout = 0xFF;
> +	while (READ_ONCE(acpi_mp_wake_mailbox->command) && --timeout)
> +		cpu_relax();

I don't feel like this was ever actually resolved.  This timeout
basically boiled down to "this value seems to work for us".  There are
also *SURELY* timeouts that are going to happen here.

In the end, the host has to choose to run the guest.  If it decides for
some odd reason not to run the CPU that's being booted, we'll get timeouts.

The other thing that baffles me is the re-wakeup bits:

>         /*
>          * If the CPU wakeup process is successful, store the
>          * status in apic_id_wakemap to prevent re-wakeup
>          * requests.
>          */
>         physid_set(apicid, apic_id_wakemap);
> 
>         return 0;
> }

If this goes wrong, won't the new wakeup just timeout?  Do we really
need a dedicated mechanism to stop re-wakeups?  How much of a problem is
this going to be?

Anyway, see the totally untested hunk I've attached.  It fixes a bunch
of issues IMNHO.