linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cpuidle: menu: Fix long delay issue when tick stopped
@ 2022-01-17  8:16 Shaokun Zhang
  2022-01-20 18:55 ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Shaokun Zhang @ 2022-01-17  8:16 UTC (permalink / raw)
  To: linux-pm, linux-kernel
  Cc: Guo Yang, Rafael J. Wysocki, Daniel Lezcano, Shaokun Zhang

From: Guo Yang <guoyang2@huawei.com>

The network delay was always big on arm server tested by qperf,
the reason was that the cpu entered deep power down idle state(like intel
C6) and can't goto a shallow one.

The intervals in @get_typical_interval() was much smaller than predicted_ns
in @menu_select(), so the predict state is always deepest and cause long
time network delay.

Every time when the cpu got an interrupt from the network, the cpu was
waken up and did the IRQ, after that the cpu enter @menu_select()
but the @tick_nohz_tick_stopped() was true and get a big data->next_timer_ns,
the cpu can never goto a shallow state util the data->next_timer_ns timeout.
Below was the print when the issue occurrence.

[   37.082861] intervals = 36us
[   37.082875] intervals = 15us
[   37.082888] intervals = 22us
[   37.082902] intervals = 35us
[   37.082915] intervals = 34us
[   37.082929] intervals = 39us
[   37.082942] intervals = 39us
[   37.082956] intervals = 35us
[   37.082970] target_residency_ns = 10000, predicted_ns = 35832710
[   37.082998] target_residency_ns = 600000, predicted_ns = 35832710
[   37.083037] intervals = 36us
[   37.083050] intervals = 15us
[   37.083064] intervals = 22us
[   37.083077] intervals = 35us
[   37.083091] intervals = 34us
[   37.083104] intervals = 39us
[   37.083118] intervals = 39us
[   37.083131] intervals = 35us
[   37.083145] target_residency_ns = 10000, predicted_ns = 35657420
[   37.083174] target_residency_ns = 600000, predicted_ns = 35657420
[   37.083212] intervals = 36us
[   37.083225] intervals = 15us
[   37.083239] intervals = 22us
[   37.083253] intervals = 35us
[   37.083266] intervals = 34us
[   37.083279] intervals = 39us
[   37.083293] intervals = 39us
[   37.083307] intervals = 35us
[   37.083320] target_residency_ns = 10000, predicted_ns = 35482140
[   37.083349] target_residency_ns = 600000, predicted_ns = 35482140

Add idle tick wakeup judge before change predicted_ns.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Guo Yang <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 drivers/cpuidle/governors/menu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index c492268..3f03843 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -313,7 +313,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 				get_typical_interval(data, predicted_us)) *
 				NSEC_PER_USEC;
 
-	if (tick_nohz_tick_stopped()) {
+	if (tick_nohz_tick_stopped() && data->tick_wakeup) {
 		/*
 		 * If the tick is already stopped, the cost of possible short
 		 * idle duration misprediction is much higher, because the CPU
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] cpuidle: menu: Fix long delay issue when tick stopped
  2022-01-17  8:16 [PATCH] cpuidle: menu: Fix long delay issue when tick stopped Shaokun Zhang
@ 2022-01-20 18:55 ` Rafael J. Wysocki
  2022-01-27  1:43   ` Shaokun Zhang
  0 siblings, 1 reply; 3+ messages in thread
From: Rafael J. Wysocki @ 2022-01-20 18:55 UTC (permalink / raw)
  To: Shaokun Zhang
  Cc: Linux PM, Linux Kernel Mailing List, Guo Yang, Rafael J. Wysocki,
	Daniel Lezcano

On Mon, Jan 17, 2022 at 9:16 AM Shaokun Zhang
<zhangshaokun@hisilicon.com> wrote:
>
> From: Guo Yang <guoyang2@huawei.com>
>
> The network delay was always big on arm server tested by qperf,
> the reason was that the cpu entered deep power down idle state(like intel
> C6) and can't goto a shallow one.
>
> The intervals in @get_typical_interval() was much smaller than predicted_ns
> in @menu_select(), so the predict state is always deepest and cause long
> time network delay.
>
> Every time when the cpu got an interrupt from the network, the cpu was
> waken up and did the IRQ, after that the cpu enter @menu_select()
> but the @tick_nohz_tick_stopped() was true and get a big data->next_timer_ns,
> the cpu can never goto a shallow state util the data->next_timer_ns timeout.
> Below was the print when the issue occurrence.
>
> [   37.082861] intervals = 36us
> [   37.082875] intervals = 15us
> [   37.082888] intervals = 22us
> [   37.082902] intervals = 35us
> [   37.082915] intervals = 34us
> [   37.082929] intervals = 39us
> [   37.082942] intervals = 39us
> [   37.082956] intervals = 35us
> [   37.082970] target_residency_ns = 10000, predicted_ns = 35832710
> [   37.082998] target_residency_ns = 600000, predicted_ns = 35832710
> [   37.083037] intervals = 36us
> [   37.083050] intervals = 15us
> [   37.083064] intervals = 22us
> [   37.083077] intervals = 35us
> [   37.083091] intervals = 34us
> [   37.083104] intervals = 39us
> [   37.083118] intervals = 39us
> [   37.083131] intervals = 35us
> [   37.083145] target_residency_ns = 10000, predicted_ns = 35657420
> [   37.083174] target_residency_ns = 600000, predicted_ns = 35657420
> [   37.083212] intervals = 36us
> [   37.083225] intervals = 15us
> [   37.083239] intervals = 22us
> [   37.083253] intervals = 35us
> [   37.083266] intervals = 34us
> [   37.083279] intervals = 39us
> [   37.083293] intervals = 39us
> [   37.083307] intervals = 35us
> [   37.083320] target_residency_ns = 10000, predicted_ns = 35482140
> [   37.083349] target_residency_ns = 600000, predicted_ns = 35482140
>
> Add idle tick wakeup judge before change predicted_ns.
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Guo Yang <guoyang2@huawei.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> ---
>  drivers/cpuidle/governors/menu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index c492268..3f03843 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -313,7 +313,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                                 get_typical_interval(data, predicted_us)) *
>                                 NSEC_PER_USEC;
>
> -       if (tick_nohz_tick_stopped()) {
> +       if (tick_nohz_tick_stopped() && data->tick_wakeup) {

data->tick_wakeup is only true if tick_nohz_idle_got_tick() has
returned true, but I'm not sure how this can happen after stopping the
tick.

IOW, it looks like the change simply makes the condition be always false.

>                 /*
>                  * If the tick is already stopped, the cost of possible short
>                  * idle duration misprediction is much higher, because the CPU
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] cpuidle: menu: Fix long delay issue when tick stopped
  2022-01-20 18:55 ` Rafael J. Wysocki
@ 2022-01-27  1:43   ` Shaokun Zhang
  0 siblings, 0 replies; 3+ messages in thread
From: Shaokun Zhang @ 2022-01-27  1:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Linux Kernel Mailing List, Guo Yang, Daniel Lezcano

Hi Rafael,

Apologies that reply later.

On 2022/1/21 2:55, Rafael J. Wysocki wrote:
> On Mon, Jan 17, 2022 at 9:16 AM Shaokun Zhang
> <zhangshaokun@hisilicon.com> wrote:
>>

...<snip>...

>> [   37.083307] intervals = 35us
>> [   37.083320] target_residency_ns = 10000, predicted_ns = 35482140
>> [   37.083349] target_residency_ns = 600000, predicted_ns = 35482140
>>
>> Add idle tick wakeup judge before change predicted_ns.
>>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Signed-off-by: Guo Yang <guoyang2@huawei.com>
>> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
>> ---
>>  drivers/cpuidle/governors/menu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>> index c492268..3f03843 100644
>> --- a/drivers/cpuidle/governors/menu.c
>> +++ b/drivers/cpuidle/governors/menu.c
>> @@ -313,7 +313,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>                                 get_typical_interval(data, predicted_us)) *
>>                                 NSEC_PER_USEC;
>>
>> -       if (tick_nohz_tick_stopped()) {
>> +       if (tick_nohz_tick_stopped() && data->tick_wakeup) {
> 
> data->tick_wakeup is only true if tick_nohz_idle_got_tick() has
> returned true, but I'm not sure how this can happen after stopping the
> tick.

In order to debug this, call trace is added and as follow:

if (predicted_us < TICK_USEC)
    predicted_us = ktime_to_us(delta_next);
    printk("predicted_us = %uus\n", predicted_us);
    dump_stack(); //add call trace print
}

When the issue came, the CPU was waken up by network interrupts
[ 1048.130033] intervals = 1us
[ 1048.130034] intervals = 1us
[ 1048.130035] intervals = 1us
[ 1048.130036] intervals = 1us
[ 1048.130037] intervals = 1us
[ 1048.130038] intervals = 1us
[ 1048.130039] intervals = 1us
[ 1048.130040] intervals = 1us
[ 1048.130041] predicted_us = 484143us
[ 1048.130043] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE     5.3.0-rc6 #23
[ 1048.130044] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.39 12/01/2017
[ 1048.130045] Call Trace:
[ 1048.130048]  dump_stack+0x5a/0x73
[ 1048.130052]  menu_select+0x3b0/0x6c0
[ 1048.130058]  do_idle+0x1b4/0x290
[ 1048.130063]  cpu_startup_entry+0x19/0x20
[ 1048.130067]  start_secondary+0x155/0x1b0
[ 1048.130070]  secondary_startup_64+0xa4/0xb0
[ 1048.130078] intervals = 1us
[ 1048.130079] intervals = 1us
[ 1048.130080] intervals = 1us
[ 1048.130081] intervals = 1us
[ 1048.130081] intervals = 1us
[ 1048.130082] intervals = 1us
[ 1048.130083] intervals = 1us
[ 1048.130084] intervals = 1us
[ 1048.130085] predicted_us = 484097us
[ 1048.130087] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE     5.3.0-rc6 #23
[ 1048.130088] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.39 12/01/2017
[ 1048.130089] Call Trace:
[ 1048.130093]  dump_stack+0x5a/0x73
[ 1048.130097]  menu_select+0x3b0/0x6c0
[ 1048.130102]  do_idle+0x1b4/0x290
[ 1048.130107]  cpu_startup_entry+0x19/0x20
[ 1048.130112]  start_secondary+0x155/0x1b0
[ 1048.130115]  secondary_startup_64+0xa4/0xb0
[ 1048.130123] intervals = 1us
[ 1048.130123] intervals = 1us
[ 1048.130124] intervals = 1us
[ 1048.130125] intervals = 1us
[ 1048.130126] intervals = 1us
[ 1048.130127] intervals = 1us
[ 1048.130128] intervals = 1us
[ 1048.130129] intervals = 1us
[ 1048.130130] predicted_us = 484053us
[ 1048.130132] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE     5.3.0-rc6 #23
[ 1048.130133] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.39 12/01/2017
[ 1048.130134] Call Trace:
[ 1048.130137]  dump_stack+0x5a/0x73
[ 1048.130141]  menu_select+0x3b0/0x6c0
[ 1048.130147]  do_idle+0x1b4/0x290
[ 1048.130152]  cpu_startup_entry+0x19/0x20
[ 1048.130156]  start_secondary+0x155/0x1b0
[ 1048.130159]  secondary_startup_64+0xa4/0xb0

> 
> IOW, it looks like the change simply makes the condition be always false.
> 

Agree, any good feedback is welcome and we can try it.

Thanks,
Shaokun

>>                 /*
>>                  * If the tick is already stopped, the cost of possible short
>>                  * idle duration misprediction is much higher, because the CPU
>> --
>> 1.8.3.1
>>
> .
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-27  1:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17  8:16 [PATCH] cpuidle: menu: Fix long delay issue when tick stopped Shaokun Zhang
2022-01-20 18:55 ` Rafael J. Wysocki
2022-01-27  1:43   ` Shaokun Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).