Linux-PM Archive on lore.kernel.org
From: Abhishek <huntbag@linux.vnet.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
Cc: daniel.lezcano@linaro.org, dja@axtens.net,
	ego@linux.vnet.ibm.com, mpe@ellerman.id.au, rjw@rjwysocki.net
Subject: Re: [PATCH v2 1/1] cpuidle-powernv : forced wakeup for stop states
Date: Wed, 26 Jun 2019 14:39:26 +0530
Message-ID: <003ea53f-1c11-96cf-5949-3d7bf6fc4b31@linux.vnet.ibm.com>
In-Reply-To: <1560938644.5ukemauqsy.astroid@bobo.none>

Hi Nick,


On 06/19/2019 03:39 PM, Nicholas Piggin wrote:
> Abhishek's on June 19, 2019 7:08 pm:
>> Hi Nick,
>>
>> Thanks for the review. Some replies below.
>>
>> On 06/19/2019 09:53 AM, Nicholas Piggin wrote:
>>> Abhishek Goel's on June 17, 2019 7:56 pm:
>>>> Currently, the cpuidle governors determine what idle state an idling CPU
>>>> should enter based on heuristics that depend on the idle history on
>>>> that CPU. Given that no predictive heuristic is perfect, there are cases
>>>> where the governor predicts a shallow idle state, hoping that the CPU will
>>>> be busy soon. However, if no new workload is scheduled on that CPU in the
>>>> near future, the CPU may remain in the shallow state.
>>>>
>>>> This is problematic when the predicted state in the aforementioned
>>>> scenario is a shallow stop state on a tickless system, as we might get
>>>> stuck in shallow states for hours in the absence of ticks or interrupts.
>>>>
>>>> To address this, we forcefully wake up the CPU by setting the
>>>> decrementer. The decrementer is set to a value that corresponds to the
>>>> residency of the next available state, thus firing a timer that will
>>>> forcefully wake up the CPU. A few such iterations will essentially train
>>>> the governor to select a deeper state for that CPU, as the timer here
>>>> corresponds to the next available cpuidle state residency. Thus, the CPU
>>>> will eventually end up in the deepest possible state.
>>>>
>>>> Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
>>>> ---
>>>>
>>>> Auto-promotion
>>>>    v1 : started as auto promotion logic for cpuidle states in generic
>>>> driver
>>>>    v2 : Removed timeout_needed and rebased the code to upstream kernel
>>>> Forced-wakeup
>>>>    v1 : New patch with name of forced wakeup started
>>>>    v2 : Extending the forced wakeup logic for all states. Setting the
>>>> decrementer instead of queuing up a hrtimer to implement the logic.
>>>>
>>>>    drivers/cpuidle/cpuidle-powernv.c | 38 +++++++++++++++++++++++++++++++
>>>>    1 file changed, 38 insertions(+)
>>>>
>>>> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
>>>> index 84b1ebe212b3..bc9ca18ae7e3 100644
>>>> --- a/drivers/cpuidle/cpuidle-powernv.c
>>>> +++ b/drivers/cpuidle/cpuidle-powernv.c
>>>> @@ -46,6 +46,26 @@ static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly
>>>>    static u64 default_snooze_timeout __read_mostly;
>>>>    static bool snooze_timeout_en __read_mostly;
>>>>    
>>>> +static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
>>>> +				 struct cpuidle_driver *drv,
>>>> +				 int index)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for (i = index + 1; i < drv->state_count; i++) {
>>>> +		struct cpuidle_state *s = &drv->states[i];
>>>> +		struct cpuidle_state_usage *su = &dev->states_usage[i];
>>>> +
>>>> +		if (s->disabled || su->disable)
>>>> +			continue;
>>>> +
>>>> +		return (s->target_residency + 2 * s->exit_latency) *
>>>> +			tb_ticks_per_usec;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>> It would be nice to not have this kind of loop iteration in the
>>> idle fast path. Can we add a flag or something to the idle state?
>> Currently, we do not have any callback notification or feedback that
>> notifies the driver every time a state is enabled/disabled. So we have
>> to parse the states every time to get the next enabled state.
> Ahh, that's why you're doing that.
>
>> Are you suggesting to
>> add something like next_enabled_state in cpuidle state structure itself
>> which will be updated when a state is enabled or disabled?
> Hmm, I guess it normally should not iterate over more than one state
> unless some idle states are disabled.
>
> What would have been nice is each state just have its own timeout
> field with ticks already calculated, if that could be updated when
> a state is enabled or disabled. How hard is that to add to the
> cpuidle core?

I have implemented a prototype that does what you asked for. It adds a
disable_callback which updates the timeout whenever a state is enabled or
disabled. However, it means adding some code to cpuidle.h and
cpuidle/sysfs.c. If that is not an issue, should I go ahead and post it?
>>>> +
>>>>    static u64 get_snooze_timeout(struct cpuidle_device *dev,
>>>>    			      struct cpuidle_driver *drv,
>>>>    			      int index)
>>>> @@ -144,8 +164,26 @@ static int stop_loop(struct cpuidle_device *dev,
>>>>    		     struct cpuidle_driver *drv,
>>>>    		     int index)
>>>>    {
>>>> +	u64 dec_expiry_tb, dec, timeout_tb, forced_wakeup;
>>>> +
>>>> +	dec = mfspr(SPRN_DEC);
>>>> +	timeout_tb = forced_wakeup_timeout(dev, drv, index);
>>>> +	forced_wakeup = 0;
>>>> +
>>>> +	if (timeout_tb && timeout_tb < dec) {
>>>> +		forced_wakeup = 1;
>>>> +		dec_expiry_tb = mftb() + dec;
>>>> +	}
>>> The compiler probably can't optimise away the SPR manipulations, so try
>>> to avoid them if possible.
>> Are you suggesting something like set_dec_before_idle() (in line with
>> what you suggested doing after idle, reset_dec_after_idle)?
> I should have been clearer: I meant don't mfspr(SPRN_DEC) until you
> have tested timeout_tb.
>
>>>> +
>>>> +	if (forced_wakeup)
>>>> +		mtspr(SPRN_DEC, timeout_tb);
>>> This should just be put in the above 'if'.
>> Fair point.
>>>> +
>>>>    	power9_idle_type(stop_psscr_table[index].val,
>>>>    			 stop_psscr_table[index].mask);
>>>> +
>>>> +	if (forced_wakeup)
>>>> +		mtspr(SPRN_DEC, dec_expiry_tb - mftb());
>>> This will sometimes go negative and result in another timer interrupt.
>>>
>>> It also breaks irq work (which can be set here by machine check, I
>>> believe).
>>>
>>> May need to implement some timer code to do this for you.
>>>
>>> static void reset_dec_after_idle(void)
>>> {
>>> 	u64 now;
>>> 	u64 *next_tb;
>>>
>>> 	if (test_irq_work_pending())
>>> 		return;
>>> 	now = mftb();
>>> 	next_tb = this_cpu_ptr(&decrementers_next_tb);
>>>
>>> 	if (now >= *next_tb)
>>> 		return;
>>> 	set_dec(*next_tb - now);
>>> 	if (test_irq_work_pending())
>>> 		set_dec(1);
>>> }
>>>
>>> Something vaguely like that. See timer_interrupt().
>> Ah, okay. I will go through timer_interrupt().
> Thanks,
> Nick

Thanks,
Abhishek



Thread overview: 6+ messages
2019-06-17  9:56 [PATCH v2 0/1] Forced-wakeup for stop states on Powernv Abhishek Goel
2019-06-17  9:56 ` [PATCH v2 1/1] cpuidle-powernv : forced wakeup for stop states Abhishek Goel
2019-06-19  4:23   ` Nicholas Piggin
2019-06-19  9:08     ` Abhishek
2019-06-19 10:09       ` Nicholas Piggin
2019-06-26  9:09         ` Abhishek [this message]
