linux-kernel.vger.kernel.org archive mirror
* 4.14 regression - hang on shutdown (VIA longhaul related?)
@ 2017-11-27 10:53 Meelis Roos
  2017-11-27 13:26 ` Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Meelis Roos @ 2017-11-27 10:53 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linux-pm, Linux Kernel list

On my VIA EPIA-M mini-ITX computer, 4.13 works reliably, but on 4.14
shutdown or reboot hangs with the message "sda: synchronizing cache".
Longhaul cpufreq has been enabled manually with "longhaul.enable=1" and
it works, but it occasionally logs the following in dmesg:

longhaul: Warning: Timeout while waiting for idle PCI bus
cpufreq: __target_index: Failed to change cpu frequency: -16

It took time to bisect because, with bad kernels, the hang does not happen
every time. Bisecting finally led to the following commit. Reverting just
this commit makes it work again.

e948bc8fbee077735c2b71b991a5ca5e573f3506 is the first bad commit
commit e948bc8fbee077735c2b71b991a5ca5e573f3506
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Thu Aug 17 09:12:27 2017 +0530

    cpufreq: Cap the default transition delay value to 10 ms
    
    If transition_delay_us isn't defined by the cpufreq driver, the default
    value of transition delay (time after which the cpufreq governor will
    try updating the frequency again) is currently calculated by multiplying
    transition_latency (nsec) with LATENCY_MULTIPLIER (1000) and then
    converting this time to usec. That gives the exact same value as
    transition_latency, just that the time unit is usec instead of nsec.
    
    With acpi-cpufreq for example, transition_latency is set to around 10
    usec and we get transition delay as 10 ms. Which seems to be a
    reasonable amount of time to reevaluate the frequency again.
    
    But for platforms where frequency switching isn't that fast (like ARM),
    the transition_latency varies from 500 usec to 3 ms, and the transition
    delay becomes 500 ms to 3 seconds. Of course, that is a pretty bad
    default value to start with.
    
    We can try to come up with a better formula (instead of multiplying by
    LATENCY_MULTIPLIER) to solve this problem, but will that be worth it?
    
    This patch tries a simple approach and caps the maximum value of default
    transition delay to 10 ms. Of course, userspace can still come in and
    change this value anytime or individual drivers can rather provide
    transition_delay_us instead.
    
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

:040000 040000 7bb8dafb58b703b36fc43d3a081c1a4677a4afde 084c10fa24028461048bcf4b8be5360f36aedd05 M      drivers


-- 
Meelis Roos (mroos@linux.ee)


* Re: 4.14 regression - hang on shutdown (VIA longhaul related?)
  2017-11-27 10:53 4.14 regression - hang on shutdown (VIA longhaul related?) Meelis Roos
@ 2017-11-27 13:26 ` Rafael J. Wysocki
  2017-11-27 14:31   ` Meelis Roos
  2017-11-28  3:11 ` [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms Viresh Kumar
  2017-12-07  9:45 ` [PATCH V2] cpufreq: longhaul: Revert transition_delay_us to 200 ms Viresh Kumar
  2 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-11-27 13:26 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Viresh Kumar, Linux PM, Linux Kernel list

Hi,

On Mon, Nov 27, 2017 at 11:53 AM, Meelis Roos <mroos@linux.ee> wrote:
> On my VIA EPIA-M mini-ITX computer, 4.13 works reliably, but on 4.14
> shutdown or reboot hangs with the message "sda: synchronizing cache".
> Longhaul cpufreq has been enabled manually with "longhaul.enable=1" and
> it works, but it occasionally logs the following in dmesg:
>
> longhaul: Warning: Timeout while waiting for idle PCI bus
> cpufreq: __target_index: Failed to change cpu frequency: -16
>
> It took time to bisect because, with bad kernels, the hang does not happen
> every time. Bisecting finally led to the following commit. Reverting just
> this commit makes it work again.
>
> e948bc8fbee077735c2b71b991a5ca5e573f3506 is the first bad commit
> commit e948bc8fbee077735c2b71b991a5ca5e573f3506
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Thu Aug 17 09:12:27 2017 +0530
>
>     cpufreq: Cap the default transition delay value to 10 ms
>
>     If transition_delay_us isn't defined by the cpufreq driver, the default
>     value of transition delay (time after which the cpufreq governor will
>     try updating the frequency again) is currently calculated by multiplying
>     transition_latency (nsec) with LATENCY_MULTIPLIER (1000) and then
>     converting this time to usec. That gives the exact same value as
>     transition_latency, just that the time unit is usec instead of nsec.
>
>     With acpi-cpufreq for example, transition_latency is set to around 10
>     usec and we get transition delay as 10 ms. Which seems to be a
>     reasonable amount of time to reevaluate the frequency again.
>
>     But for platforms where frequency switching isn't that fast (like ARM),
>     the transition_latency varies from 500 usec to 3 ms, and the transition
>     delay becomes 500 ms to 3 seconds. Of course, that is a pretty bad
>     default value to start with.
>
>     We can try to come up with a better formula (instead of multiplying by
>     LATENCY_MULTIPLIER) to solve this problem, but will that be worth it?
>
>     This patch tries a simple approach and caps the maximum value of default
>     transition delay to 10 ms. Of course, userspace can still come in and
>     change this value anytime or individual drivers can rather provide
>     transition_delay_us instead.
>
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> :040000 040000 7bb8dafb58b703b36fc43d3a081c1a4677a4afde 084c10fa24028461048bcf4b8be5360f36aedd05 M      drivers

Please try to replace the 10000 in
cpufreq_policy_transition_delay_us() with a greater number (say 20000
or 50000) and see if that helps.

Thanks,
Rafael


* Re: 4.14 regression - hang on shutdown (VIA longhaul related?)
  2017-11-27 13:26 ` Rafael J. Wysocki
@ 2017-11-27 14:31   ` Meelis Roos
  2017-11-27 15:19     ` Meelis Roos
  0 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-11-27 14:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Linux PM, Linux Kernel list

> Please try to replace the 10000 in
> cpufreq_policy_transition_delay_us() with a greater number (say 20000
> or 50000) and see if that helps.

With 50000 it has worked 4 reboots out of 4.

-- 
Meelis Roos (mroos@linux.ee)


* Re: 4.14 regression - hang on shutdown (VIA longhaul related?)
  2017-11-27 14:31   ` Meelis Roos
@ 2017-11-27 15:19     ` Meelis Roos
  2017-11-27 16:21       ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-11-27 15:19 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Linux PM, Linux Kernel list

> > Please try to replace the 10000 in
> > cpufreq_policy_transition_delay_us() with a greater number (say 20000
> > or 50000) and see if that helps.
> 
> With 50000 it has worked 4 reboots out of 4.

With 20000, it also seems to work 3 out of 3 times.

-- 
Meelis Roos (mroos@linux.ee)


* Re: 4.14 regression - hang on shutdown (VIA longhaul related?)
  2017-11-27 15:19     ` Meelis Roos
@ 2017-11-27 16:21       ` Rafael J. Wysocki
  2017-11-27 20:20         ` Meelis Roos
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-11-27 16:21 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Rafael J. Wysocki, Viresh Kumar, Linux PM, Linux Kernel list

On Mon, Nov 27, 2017 at 4:19 PM, Meelis Roos <mroos@linux.ee> wrote:
>> > Please try to replace the 10000 in
>> > cpufreq_policy_transition_delay_us() with a greater number (say 20000
>> > or 50000) and see if that helps.
>>
>> With 50000 it has worked 4 reboots out of 4.
>
> With 20000, it also seems to work 3 out of 3 times.

OK, please test it a bit more just to be sure.  If 20000 is
sufficient, we can easily make this change.

Thanks,
Rafael


* Re: 4.14 regression - hang on shutdown (VIA longhaul related?)
  2017-11-27 16:21       ` Rafael J. Wysocki
@ 2017-11-27 20:20         ` Meelis Roos
  0 siblings, 0 replies; 21+ messages in thread
From: Meelis Roos @ 2017-11-27 20:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Linux PM, Linux Kernel list

> On Mon, Nov 27, 2017 at 4:19 PM, Meelis Roos <mroos@linux.ee> wrote:
> >> > Please try to replace the 10000 in
> >> > cpufreq_policy_transition_delay_us() with a greater number (say 20000
> >> > or 50000) and see if that helps.
> >>
> >> With 50000 it has worked 4 reboots out of 4.
> >
> > With 20000, it also seems to work 3 out of 3 times.
> 
> OK, please test it a bit more just to be sure.  If 20000 is
> sufficient, we can easily make this change.

Seems stable: 20+ reboots after different uptimes and workloads were all
successful.

-- 
Meelis Roos (mroos@linux.ee)


* [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-11-27 10:53 4.14 regression - hang on shutdown (VIA longhaul related?) Meelis Roos
  2017-11-27 13:26 ` Rafael J. Wysocki
@ 2017-11-28  3:11 ` Viresh Kumar
  2017-11-28 22:07   ` Rafael J. Wysocki
  2017-12-05  8:18   ` Meelis Roos
  2017-12-07  9:45 ` [PATCH V2] cpufreq: longhaul: Revert transition_delay_us to 200 ms Viresh Kumar
  2 siblings, 2 replies; 21+ messages in thread
From: Viresh Kumar @ 2017-11-28  3:11 UTC (permalink / raw)
  To: Rafael Wysocki, mroos
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, 4 . 14+, linux-kernel

Commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
value to 10 ms") caused a regression on an EPIA-M mini-ITX computer, where
shutdown or reboot occasionally hangs with a message like:

longhaul: Warning: Timeout while waiting for idle PCI bus
cpufreq: __target_index: Failed to change cpu frequency: -16

This probably happens because the cpufreq governor tries to change the
frequency of the CPU faster than allowed by the hardware.

With the above commit, the default transition delay comes to 10 ms for a
transition_latency of 200 us. Set the default transition delay to 20 ms
directly to fix this regression.

Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Reported-by: Meelis Roos <mroos@linux.ee>
Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/longhaul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index c46a12df40dd..56eafcb07859 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
 	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
 		longhaul_setup_voltagescaling();
 
-	policy->cpuinfo.transition_latency = 200000;	/* nsec */
+	policy->transition_delay_us = 20000;	/* usec */
 
 	return cpufreq_table_validate_and_show(policy, longhaul_table);
 }
-- 
2.15.0.194.g9af6a3dea062


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-11-28  3:11 ` [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms Viresh Kumar
@ 2017-11-28 22:07   ` Rafael J. Wysocki
  2017-11-29  6:59     ` Meelis Roos
  2017-12-05  8:18   ` Meelis Roos
  1 sibling, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-11-28 22:07 UTC (permalink / raw)
  To: Viresh Kumar, Meelis Roos
  Cc: Rafael Wysocki, Linux PM, Vincent Guittot, 4 . 14+,
	Linux Kernel Mailing List

On Tue, Nov 28, 2017 at 4:11 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
> value to 10 ms") caused a regression on an EPIA-M mini-ITX computer, where
> shutdown or reboot occasionally hangs with a message like:
>
> longhaul: Warning: Timeout while waiting for idle PCI bus
> cpufreq: __target_index: Failed to change cpu frequency: -16
>
> This probably happens because the cpufreq governor tries to change the
> frequency of the CPU faster than allowed by the hardware.
>
> With the above commit, the default transition delay comes to 10 ms for a
> transition_latency of 200 us. Set the default transition delay to 20 ms
> directly to fix this regression.
>
> Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
> Reported-by: Meelis Roos <mroos@linux.ee>
> Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/longhaul.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> index c46a12df40dd..56eafcb07859 100644
> --- a/drivers/cpufreq/longhaul.c
> +++ b/drivers/cpufreq/longhaul.c
> @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
>         if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
>                 longhaul_setup_voltagescaling();
>
> -       policy->cpuinfo.transition_latency = 200000;    /* nsec */
> +       policy->transition_delay_us = 20000;    /* usec */
>
>         return cpufreq_table_validate_and_show(policy, longhaul_table);
>  }
> --

Meelis, please check if this fixes the shutdown issue you have
reported recently.

Thanks,
Rafael


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-11-28 22:07   ` Rafael J. Wysocki
@ 2017-11-29  6:59     ` Meelis Roos
  2017-12-04 15:03       ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-11-29  6:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Rafael Wysocki, Linux PM, Vincent Guittot, 4 . 14+,
	Linux Kernel Mailing List

> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> > index c46a12df40dd..56eafcb07859 100644
> > --- a/drivers/cpufreq/longhaul.c
> > +++ b/drivers/cpufreq/longhaul.c
> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> >         if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> >                 longhaul_setup_voltagescaling();
> >
> > -       policy->cpuinfo.transition_latency = 200000;    /* nsec */
> > +       policy->transition_delay_us = 20000;    /* usec */
> >
> >         return cpufreq_table_validate_and_show(policy, longhaul_table);
> >  }
> > --
> 
> Meelis, please check if this fixes the shutdown issue you have
> reported recently.

Yes, but not today - hopefully tomorrow.

-- 
Meelis Roos (mroos@linux.ee)


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-11-29  6:59     ` Meelis Roos
@ 2017-12-04 15:03       ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-12-04 15:03 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Rafael J. Wysocki, Viresh Kumar, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

On Wednesday, November 29, 2017 7:59:27 AM CET Meelis Roos wrote:
> > > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> > > index c46a12df40dd..56eafcb07859 100644
> > > --- a/drivers/cpufreq/longhaul.c
> > > +++ b/drivers/cpufreq/longhaul.c
> > > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> > >         if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> > >                 longhaul_setup_voltagescaling();
> > >
> > > -       policy->cpuinfo.transition_latency = 200000;    /* nsec */
> > > +       policy->transition_delay_us = 20000;    /* usec */
> > >
> > >         return cpufreq_table_validate_and_show(policy, longhaul_table);
> > >  }
> > > --
> > 
> > Meelis, please check if this fixes the shutdown issue you have
> > reported recently.
> 
> Yes, but not today - hopefully tomorrow.

Any news?

I'd like to push the fix for 4.15 shortly if it works for you (I don't
see why it wouldn't work, but still I'd prefer it to be actually tested).

Thanks,
Rafael


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-11-28  3:11 ` [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms Viresh Kumar
  2017-11-28 22:07   ` Rafael J. Wysocki
@ 2017-12-05  8:18   ` Meelis Roos
  2017-12-05  8:54     ` Meelis Roos
  1 sibling, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-12-05  8:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, 4 . 14+, linux-kernel


> Commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
> value to 10 ms") caused a regression on an EPIA-M mini-ITX computer, where
> shutdown or reboot occasionally hangs with a message like:
> 
> longhaul: Warning: Timeout while waiting for idle PCI bus
> cpufreq: __target_index: Failed to change cpu frequency: -16
> 
> This probably happens because the cpufreq governor tries to change the
> frequency of the CPU faster than allowed by the hardware.
> 
> With the above commit, the default transition delay comes to 10 ms for a
> transition_latency of 200 us. Set the default transition delay to 20 ms
> directly to fix this regression.
> 
> Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
> Reported-by: Meelis Roos <mroos@linux.ee>
> Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/longhaul.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> index c46a12df40dd..56eafcb07859 100644
> --- a/drivers/cpufreq/longhaul.c
> +++ b/drivers/cpufreq/longhaul.c
> @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
>  	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
>  		longhaul_setup_voltagescaling();
>  
> -	policy->cpuinfo.transition_latency = 200000;	/* nsec */
> +	policy->transition_delay_us = 20000;	/* usec */
>  
>  	return cpufreq_table_validate_and_show(policy, longhaul_table);
>  }

This patch also works on my EPIA-M board - tested 10+ times.

Sorry it took so long to test, it was a remote computer.

-- 
Meelis Roos (mroos@linux.ee)


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-05  8:18   ` Meelis Roos
@ 2017-12-05  8:54     ` Meelis Roos
  2017-12-05 15:26       ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-12-05  8:54 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linux-pm, Vincent Guittot, 4 . 14+, linux-kernel

> > Commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
> > value to 10 ms") caused a regression on an EPIA-M mini-ITX computer, where
> > shutdown or reboot occasionally hangs with a message like:
> > 
> > longhaul: Warning: Timeout while waiting for idle PCI bus
> > cpufreq: __target_index: Failed to change cpu frequency: -16
> > 
> > This probably happens because the cpufreq governor tries to change the
> > frequency of the CPU faster than allowed by the hardware.
> > 
> > With the above commit, the default transition delay comes to 10 ms for a
> > transition_latency of 200 us. Set the default transition delay to 20 ms
> > directly to fix this regression.
> > 
> > Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
> > Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
> > Reported-by: Meelis Roos <mroos@linux.ee>
> > Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  drivers/cpufreq/longhaul.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> > index c46a12df40dd..56eafcb07859 100644
> > --- a/drivers/cpufreq/longhaul.c
> > +++ b/drivers/cpufreq/longhaul.c
> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> >  	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> >  		longhaul_setup_voltagescaling();
> >  
> > -	policy->cpuinfo.transition_latency = 200000;	/* nsec */
> > +	policy->transition_delay_us = 20000;	/* usec */
> >  
> >  	return cpufreq_table_validate_and_show(policy, longhaul_table);
> >  }
> 
> This patch also works on my EPIA-M board - tested 10+ times.

And on the last try, just after sending the mail, it hung again in the
same way as before - so maybe 20 is on the edge of being good.


-- 
Meelis Roos (mroos@ut.ee)      http://www.cs.ut.ee/~mroos/


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-05  8:54     ` Meelis Roos
@ 2017-12-05 15:26       ` Rafael J. Wysocki
  2017-12-06 18:21         ` Meelis Roos
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-12-05 15:26 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Viresh Kumar, Rafael Wysocki, Linux PM, Vincent Guittot, 4 . 14+,
	Linux Kernel Mailing List

On Tue, Dec 5, 2017 at 9:54 AM, Meelis Roos <mroos@ut.ee> wrote:
>> > Commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
>> > value to 10 ms") caused a regression on an EPIA-M mini-ITX computer, where
>> > shutdown or reboot occasionally hangs with a message like:
>> >
>> > longhaul: Warning: Timeout while waiting for idle PCI bus
>> > cpufreq: __target_index: Failed to change cpu frequency: -16
>> >
>> > This probably happens because the cpufreq governor tries to change the
>> > frequency of the CPU faster than allowed by the hardware.
>> >
>> > With the above commit, the default transition delay comes to 10 ms for a
>> > transition_latency of 200 us. Set the default transition delay to 20 ms
>> > directly to fix this regression.
>> >
>> > Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
>> > Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
>> > Reported-by: Meelis Roos <mroos@linux.ee>
>> > Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> > ---
>> >  drivers/cpufreq/longhaul.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
>> > index c46a12df40dd..56eafcb07859 100644
>> > --- a/drivers/cpufreq/longhaul.c
>> > +++ b/drivers/cpufreq/longhaul.c
>> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
>> >     if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
>> >             longhaul_setup_voltagescaling();
>> >
>> > -   policy->cpuinfo.transition_latency = 200000;    /* nsec */
>> > +   policy->transition_delay_us = 20000;    /* usec */
>> >
>> >     return cpufreq_table_validate_and_show(policy, longhaul_table);
>> >  }
>>
>> This patch also works on my EPIA-M board - tested 10+ times.
>
> And on the last try, just after sending the mail, it hung again in the
> same way as before - so maybe 20 is on the edge of being good.

OK, so can you please try to modify the patch to set
transition_delay_us to 30000, say, and see if that's reliable?

Thanks,
Rafael


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-05 15:26       ` Rafael J. Wysocki
@ 2017-12-06 18:21         ` Meelis Roos
  2017-12-07  4:40           ` Viresh Kumar
  0 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-12-06 18:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Rafael Wysocki, Linux PM, Vincent Guittot, 4 . 14+,
	Linux Kernel Mailing List

> >> > diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> >> > index c46a12df40dd..56eafcb07859 100644
> >> > --- a/drivers/cpufreq/longhaul.c
> >> > +++ b/drivers/cpufreq/longhaul.c
> >> > @@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
> >> >     if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
> >> >             longhaul_setup_voltagescaling();
> >> >
> >> > -   policy->cpuinfo.transition_latency = 200000;    /* nsec */
> >> > +   policy->transition_delay_us = 20000;    /* usec */
> >> >
> >> >     return cpufreq_table_validate_and_show(policy, longhaul_table);
> >> >  }
> >>
> >> This patch also works on my EPIA-M board - tested 10+ times.
> >
> > And on the last try, just after sending the mail, it hung again in the
> > same way as before - so maybe 20 is on the edge of being good.
> 
> OK, so can you please try to modify the patch to set
> transition_delay_us to 30000, say, and see if that's reliable?

30000 was not reliable.

I created root cron job
@reboot sleep 120; /sbin/reboot

and by the evening it was dead again.

Will try 50000 tomorrow.

-- 
Meelis Roos (mroos@linux.ee)


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-06 18:21         ` Meelis Roos
@ 2017-12-07  4:40           ` Viresh Kumar
  2017-12-07  5:14             ` Meelis Roos
                               ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Viresh Kumar @ 2017-12-07  4:40 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

On 06-12-17, 20:21, Meelis Roos wrote:
> 30000 was not reliable.
> 
> I created root cron job
> @reboot sleep 120; /sbin/reboot
> 
> and by the evening it was dead again.
> 
> Will try 50000 tomorrow.

Let's make it similar to what it was before my original patch modified
it, to avoid all corner cases.

Please test with 200 ms, i.e. the value 200000 here.

-- 
viresh


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-07  4:40           ` Viresh Kumar
@ 2017-12-07  5:14             ` Meelis Roos
  2017-12-07  7:26             ` Meelis Roos
  2017-12-07 12:51             ` Meelis Roos
  2 siblings, 0 replies; 21+ messages in thread
From: Meelis Roos @ 2017-12-07  5:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

> > 30000 was not reliable.
> > 
> > I created root cron job
> > @reboot sleep 120; /sbin/reboot
> > 
> > and by the evening it was dead again.
> > 
> > Will try 50000 tomorrow.
> 
> Let's make it similar to what it was before my original patch modified
> it, to avoid all corner cases.
> 
> Please test with 200 ms, i.e. the value 200000 here.

20000 was the first one tested; after it proved unreliable, I tested 30000,
and that was unreliable too.

-- 
Meelis Roos (mroos@linux.ee)


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-07  4:40           ` Viresh Kumar
  2017-12-07  5:14             ` Meelis Roos
@ 2017-12-07  7:26             ` Meelis Roos
  2017-12-07  9:33               ` Viresh Kumar
  2017-12-07 12:51             ` Meelis Roos
  2 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-12-07  7:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

> On 06-12-17, 20:21, Meelis Roos wrote:
> > 30000 was not reliable.
> > 
> > I created root cron job
> > @reboot sleep 120; /sbin/reboot
> > 
> > and by the evening it was dead again.
> > 
> > Will try 50000 tomorrow.
> 
> Let's make it similar to what it was before my original patch modified
> it, to avoid all corner cases.
> 
> Please test with 200 ms, i.e. the value 200000 here.

Sorry, I confused 200000 vs 20000, will test 200000.

But 200000 was the value before. Shall I test 200000 with or without 
the other limiting patch?

-- 
Meelis Roos (mroos@linux.ee)


* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-07  7:26             ` Meelis Roos
@ 2017-12-07  9:33               ` Viresh Kumar
  0 siblings, 0 replies; 21+ messages in thread
From: Viresh Kumar @ 2017-12-07  9:33 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

On 07-12-17, 09:26, Meelis Roos wrote:
> > On 06-12-17, 20:21, Meelis Roos wrote:
> > > 30000 was not reliable.
> > > 
> > > I created root cron job
> > > @reboot sleep 120; /sbin/reboot
> > > 
> > > and by the evening it was dead again.
> > > 
> > > Will try 50000 tomorrow.
> > 
> > Let's make it similar to what it was before my original patch modified
> > it, to avoid all corner cases.
> > 
> > Please test with 200 ms, i.e. the value 200000 here.
> 
> Sorry, I confused 200000 vs 20000, will test 200000.
> 
> But 200000 was the value before.

It was the value of a different variable (transition_latency) at that
time. Just set transition_delay_us in my recent patch to 200,000 and
apply that on top of mainline.

I will resend the patch in the meantime as well.

> Shall I test 200000 with or without 
> the other limiting patch?

-- 
viresh


* [PATCH V2] cpufreq: longhaul: Revert transition_delay_us to 200 ms
  2017-11-27 10:53 4.14 regression - hang on shutdown (VIA longhaul related?) Meelis Roos
  2017-11-27 13:26 ` Rafael J. Wysocki
  2017-11-28  3:11 ` [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms Viresh Kumar
@ 2017-12-07  9:45 ` Viresh Kumar
  2 siblings, 0 replies; 21+ messages in thread
From: Viresh Kumar @ 2017-12-07  9:45 UTC (permalink / raw)
  To: Rafael Wysocki, mroos
  Cc: Viresh Kumar, linux-pm, Vincent Guittot, 4 . 14+, linux-kernel

Commit e948bc8fbee0 ("cpufreq: Cap the default transition delay
value to 10 ms") caused a regression on an EPIA-M mini-ITX computer, where
shutdown or reboot occasionally hangs with a message like:

longhaul: Warning: Timeout while waiting for idle PCI bus
cpufreq: __target_index: Failed to change cpu frequency: -16

This probably happens because the cpufreq governor tries to change the
frequency of the CPU faster than allowed by the hardware.

Before the above commit, the default transition delay was set to 200 ms
for a transition_latency of 200000 ns. Let's revert to that
transition delay value to fix it. Note that several other transition
delay values were tested, like 20 ms and 30 ms, and none of them
resolved the system hang completely.

Fixes: e948bc8fbee0 ("cpufreq: Cap the default transition delay value to 10 ms")
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Reported-by: Meelis Roos <mroos@linux.ee>
Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V1->V2:
- s/20 ms/200 ms.

 drivers/cpufreq/longhaul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index c46a12df40dd..5faa37c5b091 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -894,7 +894,7 @@ static int longhaul_cpu_init(struct cpufreq_policy *policy)
 	if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
 		longhaul_setup_voltagescaling();
 
-	policy->cpuinfo.transition_latency = 200000;	/* nsec */
+	policy->transition_delay_us = 200000;	/* usec */
 
 	return cpufreq_table_validate_and_show(policy, longhaul_table);
 }
-- 
2.14.1
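
For readers following the arithmetic in the commit message, here is a
minimal user-space sketch of how the default delay works out before and
after the 10 ms cap. It is not the kernel implementation: the helper
names (default_delay_us_uncapped, default_delay_us_capped,
effective_delay_us) and the DELAY_CAP_US macro are hypothetical, and the
rule that a non-zero driver-provided transition_delay_us overrides the
computed default is assumed from the description of the original commit.

```c
#include <stdint.h>

#define NSEC_PER_USEC      1000u
#define LATENCY_MULTIPLIER 1000u
#define DELAY_CAP_US       10000u	/* hypothetical name for the 10 ms cap */

/* Default delay before the cap: latency (ns) * 1000, converted to usec. */
static uint64_t default_delay_us_uncapped(uint32_t transition_latency_ns)
{
	return (uint64_t)transition_latency_ns * LATENCY_MULTIPLIER / NSEC_PER_USEC;
}

/* Default delay after the cap: same value, limited to 10 ms. */
static uint64_t default_delay_us_capped(uint32_t transition_latency_ns)
{
	uint64_t delay = default_delay_us_uncapped(transition_latency_ns);

	return delay < DELAY_CAP_US ? delay : DELAY_CAP_US;
}

/*
 * Assumption: a non-zero driver-provided transition_delay_us is used
 * as-is, otherwise the capped default is used.
 */
static uint64_t effective_delay_us(uint32_t transition_delay_us,
				   uint32_t transition_latency_ns)
{
	if (transition_delay_us)
		return transition_delay_us;

	return default_delay_us_capped(transition_latency_ns);
}
```

With longhaul's transition_latency of 200000 ns, the uncapped default is
200000 us (200 ms) and the capped default is 10000 us (10 ms). Setting
transition_delay_us to 200000 in the driver, as the patch does, yields
200 ms again under the stated assumption.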

* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-07  4:40           ` Viresh Kumar
  2017-12-07  5:14             ` Meelis Roos
  2017-12-07  7:26             ` Meelis Roos
@ 2017-12-07 12:51             ` Meelis Roos
  2017-12-07 12:54               ` Rafael J. Wysocki
  2 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2017-12-07 12:51 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

> On 06-12-17, 20:21, Meelis Roos wrote:
> > 30000 was not reliable.
> > 
> > I created root cron job
> > @reboot sleep 120; /sbin/reboot
> > 
> > and by the evening it was dead again.
> > 
> > Will try 50000 tomorrow.
> 
> Let's make it similar to what it was before my original patch modified
> it, to avoid all corner cases.
> 
> Please test against 200 ms, 200000 value here.

I tried

policy->transition_delay_us = 200000;

and it still hangs on top of mainline.

What next?


-- 
Meelis Roos (mroos@linux.ee)

* Re: [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms
  2017-12-07 12:51             ` Meelis Roos
@ 2017-12-07 12:54               ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2017-12-07 12:54 UTC (permalink / raw)
  To: Meelis Roos
  Cc: Viresh Kumar, Rafael J. Wysocki, Linux PM, Vincent Guittot,
	4 . 14+,
	Linux Kernel Mailing List

On Thursday, December 7, 2017 1:51:04 PM CET Meelis Roos wrote:
> > On 06-12-17, 20:21, Meelis Roos wrote:
> > > 30000 was not reliable.
> > > 
> > > I created root cron job
> > > @reboot sleep 120; /sbin/reboot
> > > 
> > > and by the evening it was dead again.
> > > 
> > > Will try 50000 tomorrow.
> > 
> > Let's make it similar to what it was before my original patch modified
> > it, to avoid all corner cases.
> > 
> > Please test against 200 ms, 200000 value here.
> 
> I tried
> 
> policy->transition_delay_us = 200000;
> 
> and it still hangs on top of mainline.
> 
> What next?

Well, please try reverting the commit you bisected the problem to and see
whether it still hangs.

Thread overview: 21+ messages
2017-11-27 10:53 4.14 regression - hang on shutdown (VIA longhaul related?) Meelis Roos
2017-11-27 13:26 ` Rafael J. Wysocki
2017-11-27 14:31   ` Meelis Roos
2017-11-27 15:19     ` Meelis Roos
2017-11-27 16:21       ` Rafael J. Wysocki
2017-11-27 20:20         ` Meelis Roos
2017-11-28  3:11 ` [PATCH] cpufreq: longhaul: Set transition_delay_us to 20 ms Viresh Kumar
2017-11-28 22:07   ` Rafael J. Wysocki
2017-11-29  6:59     ` Meelis Roos
2017-12-04 15:03       ` Rafael J. Wysocki
2017-12-05  8:18   ` Meelis Roos
2017-12-05  8:54     ` Meelis Roos
2017-12-05 15:26       ` Rafael J. Wysocki
2017-12-06 18:21         ` Meelis Roos
2017-12-07  4:40           ` Viresh Kumar
2017-12-07  5:14             ` Meelis Roos
2017-12-07  7:26             ` Meelis Roos
2017-12-07  9:33               ` Viresh Kumar
2017-12-07 12:51             ` Meelis Roos
2017-12-07 12:54               ` Rafael J. Wysocki
2017-12-07  9:45 ` [PATCH V2] cpufreq: longhaul: Revert transition_delay_us to 200 ms Viresh Kumar
