From: Nicholas Piggin <npiggin@gmail.com>
To: Thiago Jung Bauermann <bauerman@linux.ibm.com>,
	Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Michael Bringmann <mwb@linux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Subject: Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
Date: Thu, 02 May 2019 09:12:34 +1000
Message-ID: <1556752043.jyg2z3kgaw.astroid@bobo.none>
In-Reply-To: <8736lyrzmh.fsf@linux.ibm.com>

Nathan Lynch wrote on May 2, 2019 12:57 am:
> Hi Thiago,
> 
> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>>>> +		while (true) {
>>>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>>>  			if (cpu_status == QCSS_STOPPED ||
>>>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>>>  				break;
>>>> -			cpu_relax();
>>>> +			udelay(100);
>>>>  		}
>>>>  	}
>>>
>>> I agree with looping indefinitely but doesn't it need a cond_resched()
>>> or similar check?
>>
>> If there's no kernel or hypervisor bug, it shouldn't take more than a
>> few tens of ms for this loop to complete (Gautham measured a maximum of
>> 10 ms on a POWER9 with an earlier version of this patch).
> 
> 10ms is twice the default scheduler quantum...
> 
> 
>> In case of bugs related to CPU hotplug (either in the kernel or the
>> hypervisor), I was hoping that the resulting lockup warnings would be a
>> good indicator that something is wrong. :-)
> 
> Not convinced we should assume something is wrong if it takes a few
> dozen ms to complete the operation.

Right, and if there is no kernel or hypervisor bug then it will stop
eventually :)

> AFAIK we don't have any guarantees
> about the maximum latency of stop-self, and it can be affected by other
> activity in the system, whether we're in shared processor mode, etc. Not
> to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
> be serialized with other tasks calling into RTAS. So I am concerned
> about generating spurious warnings here.

Agreed.

> 
> If for whatever reason the operation is taking too long, drmgr or
> whichever application is initiating the change will appear to stop
> making progress. It's not too hard to find out what's going on with
> facilities like perf or /proc/pid/stack.
> 
> 
>> Though perhaps adding a cond_resched() every 10 ms or so, with a
>> WARN_ON() if it loops for more than 50 ms would be better.
> 
> A warning doesn't seem appropriate to me, and cond_resched should be
> invoked in each iteration. Or just msleep(1) in each iteration would be
> fine, I think.
> 
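For concreteness, a rough sketch of that (reusing pcpu and cpu_status
from the quoted hunk; illustrative only, not a tested patch) would be:

	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;
		/* msleep() sleeps, so the loop yields the CPU without
		 * needing an explicit cond_resched().
		 */
		msleep(1);
	}
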
> But I'd like to bring in some more context -- here is the body of
> pseries_cpu_die:
> 
> static void pseries_cpu_die(unsigned int cpu)
> {
> 	int tries;
> 	int cpu_status = 1;
> 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
> 
> 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> 		cpu_status = 1;
> 		for (tries = 0; tries < 5000; tries++) {
> 			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
> 				cpu_status = 0;
> 				break;
> 			}
> 			msleep(1);
> 		}
> 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> 
> 		for (tries = 0; tries < 25; tries++) {
> 			cpu_status = smp_query_cpu_stopped(pcpu);
> 			if (cpu_status == QCSS_STOPPED ||
> 			    cpu_status == QCSS_HARDWARE_ERROR)
> 				break;
> 			cpu_relax();
> 		}
> 	}
> 
> This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
> branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
> is to use H_CEDE instead of stop-self, correct?
> 
> And isn't entering H_CEDE expected to be quite a bit faster than
> stop-self? If so, why does that path get five whole seconds[*] while
> we're bikeshedding about tens of milliseconds for stop-self? :-)
> 
> [*] And should it be made to retry indefinitely as well?

I think so.
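Something along these lines, perhaps (an illustrative sketch only,
built from the helpers in the quoted function, untested):

	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
		/* Wait indefinitely for the ceded CPU as well, sleeping
		 * between checks instead of capping at 5000 tries.
		 */
		while (get_cpu_current_state(cpu) != CPU_STATE_INACTIVE)
			msleep(1);
		cpu_status = 0;
	}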

Thanks,
Nick
