Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU

From: Thiago Jung Bauermann <bauerman@linux.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	Michael Bringmann <mwb@linux.vnet.ibm.com>,
	Tyrel Datwyler <tyreld@linux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Subject: Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
Date: Tue, 30 Apr 2019 16:59:18 -0300	[thread overview]
Message-ID: <87v9yve02x.fsf@morokweng.localdomain> (raw)
In-Reply-To: <877ebbsb8u.fsf@linux.ibm.com>

Hello Nathan,

Thanks for reviewing the patch!

Nathan Lynch <nathanl@linux.ibm.com> writes:

> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>> This can be a problem because if the busy loop finishes too early, then the
>> kernel may offline another CPU before the previous one finished dying,
>> which would lead to two concurrent calls to rtas-stop-self, which is
>> prohibited by the PAPR.
>>
>> Since the hotplug machinery already assumes that cpu_die() is going to
>> work, we can simply loop until the CPU stops.
>>
>> Also change the loop to wait 100 µs between each call to
>> smp_query_cpu_stopped() to avoid querying RTAS too often.
>
> [...]
>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..d75cee60644c 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
>>  			msleep(1);
>>  		}
>>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> -
>> -		for (tries = 0; tries < 25; tries++) {
>> +		/*
>> +		 * rtas_stop_self() panics if the CPU fails to stop and our
>> +		 * callers already assume that we are going to succeed, so we
>> +		 * can just loop until the CPU stops.
>> +		 */
>> +		while (true) {
>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>  			if (cpu_status == QCSS_STOPPED ||
>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>  				break;
>> -			cpu_relax();
>> +			udelay(100);
>>  		}
>>  	}
>
> I agree with looping indefinitely but doesn't it need a cond_resched()
> or similar check?

If there's no kernel or hypervisor bug, it shouldn't take more than a
few tens of ms for this loop to complete (Gautham measured a maximum of
10 ms on a POWER9 with an earlier version of this patch).

In case of bugs related to CPU hotplug (either in the kernel or the
hypervisor), I was hoping that the resulting lockup warnings would be a
good indicator that something is wrong. :-)

Though perhaps adding a cond_resched() every 10 ms or so, with a
WARN_ON() if it loops for more than 50 ms would be better.

I'll send an alternative patch.

--
Thiago Jung Bauermann
IBM Linux Technology Center