linux-kernel.vger.kernel.org archive mirror
* [PATCH] cpufreq: skip cpufreq resume if it's not suspended
@ 2018-01-23 21:57 Bo Yan
  2018-01-24  2:02 ` Rafael J. Wysocki
  2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
  0 siblings, 2 replies; 14+ messages in thread
From: Bo Yan @ 2018-01-23 21:57 UTC (permalink / raw)
  To: viresh.kumar, rjw, sgurrappadi; +Cc: linux-pm, linux-kernel, Bo Yan

cpufreq_resume can be called even without a preceding cpufreq_suspend.
This can happen in the following scenario:

    suspend_devices_and_enter
       --> dpm_suspend_start
          --> dpm_prepare
              --> device_prepare : this function errors out
          --> dpm_suspend: this is skipped due to dpm_prepare failure
                           this means cpufreq_suspend is skipped over
       --> goto Recover_platform, due to previous error
       --> goto Resume_devices
       --> dpm_resume_end
           --> dpm_resume
               --> cpufreq_resume

When schedutil is used as the frequency governor, cpufreq_resume will
eventually call sugov_start, which does the following:

    memset(sg_cpu, 0, sizeof(*sg_cpu));
    ....

This effectively erases the function pointer for frequency updates,
causing a crash later on. The function pointer would have been set
correctly if the subsequent cpufreq_add_update_util_hook had run
successfully, but that function returns early because the hook is still
registered, as cpufreq_suspend was never called to remove it:

    if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
		return;

Ideally, suspend should succeed, and then things will be fine. But even
if suspend fails, the system should not crash.

The fix is to check cpufreq_suspended first. If it is false,
cpufreq_suspend was not called in the first place, so do not resume
cpufreq.

Signed-off-by: Bo Yan <byan@nvidia.com>
---
 drivers/cpufreq/cpufreq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
 	if (!cpufreq_driver)
 		return;
 
+	if (unlikely(!cpufreq_suspended)) {
+		pr_warn("%s: resume after failing suspend\n", __func__);
+		return;
+	}
 	cpufreq_suspended = false;
 
 	if (!has_target() && !cpufreq_driver->resume)
-- 
2.7.4
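
[Editorial sketch] The failure described in the commit message above can be
reproduced in a small userspace model. This is only a sketch under stated
assumptions: every name below (model_boot, model_suspend, model_resume,
add_hook, hook_is_usable) is hypothetical and merely stands in for the
kernel functions named in the thread; none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; names are hypothetical stand-ins. */
typedef void (*update_fn)(void);

struct hook_data {
	update_fn func;
};

static struct hook_data sg_cpu;           /* stands in for per-CPU sugov data */
static struct hook_data *registered_hook; /* stands in for cpufreq_update_util_data */
static bool suspended;                    /* stands in for cpufreq_suspended */

static void freq_update(void)
{
}

/* Models cpufreq_add_update_util_hook(): refuses to register twice. */
static void add_hook(struct hook_data *data, update_fn func)
{
	assert(data != NULL && func != NULL);
	if (registered_hook)	/* the WARN_ON() early return */
		return;
	data->func = func;
	registered_hook = data;
}

/* Models governor start: wipe the state, then try to register the hook. */
static void governor_start(void)
{
	memset(&sg_cpu, 0, sizeof(sg_cpu));
	add_hook(&sg_cpu, freq_update);
}

/* Models a successful suspend: the hook is removed, the flag is set. */
static void model_suspend(void)
{
	registered_hook = NULL;
	suspended = true;
}

/* Models resume; 'guarded' selects whether the proposed check is applied. */
static void model_resume(bool guarded)
{
	if (guarded && !suspended)
		return;
	suspended = false;
	governor_start();
}

/* True if the registered hook still carries a callback. */
static bool hook_is_usable(void)
{
	return registered_hook != NULL && registered_hook->func != NULL;
}

/* Resets the model to a freshly started governor. */
static void model_boot(void)
{
	registered_hook = NULL;
	suspended = false;
	governor_start();
}
```

In this model, calling model_resume(false) right after model_boot() leaves
the hook registered but with a NULL callback, which corresponds to the
crash described above; model_resume(true) leaves the hook intact.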


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-23 21:57 [PATCH] cpufreq: skip cpufreq resume if it's not suspended Bo Yan
@ 2018-01-24  2:02 ` Rafael J. Wysocki
  2018-01-24 20:53   ` Bo Yan
  2018-01-25 19:15   ` [PATCH v2] " Bo Yan
  2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
  1 sibling, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-01-24  2:02 UTC (permalink / raw)
  To: Bo Yan; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> cpufreq_resume can be called even without preceding cpufreq_suspend.
> This can happen in following scenario:
> 
>     suspend_devices_and_enter
>        --> dpm_suspend_start
>           --> dpm_prepare
>               --> device_prepare : this function errors out
>           --> dpm_suspend: this is skipped due to dpm_prepare failure
>                            this means cpufreq_suspend is skipped over
>        --> goto Recover_platform, due to previous error
>        --> goto Resume_devices
>        --> dpm_resume_end
>            --> dpm_resume
>                --> cpufreq_resume
> 
> In case schedutil is used as frequency governor, cpufreq_resume will
> eventually call sugov_start, which does following:
> 
>     memset(sg_cpu, 0, sizeof(*sg_cpu));
>     ....
> 
> This effectively erases function pointer for frequency update, causing
> crash later on. The function pointer would have been set correctly if
> subsequent cpufreq_add_update_util_hook runs successfully, but that
> function returns earlier because cpufreq_suspend was not called:
> 
>     if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
> 		return;
> 
> Ideally, suspend should succeed, then things will be fine. But even
> in case of suspend failure, system should not crash.
> 
> The fix is to check cpufreq_suspended first, if it's false, that means
> cpufreq_suspend was not called in the first place, so do not resume
> cpufreq.
> 
> Signed-off-by: Bo Yan <byan@nvidia.com>
> ---
>  drivers/cpufreq/cpufreq.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 41d148af7748..95b1c4afe14e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>  	if (!cpufreq_driver)
>  		return;
>  
> +	if (unlikely(!cpufreq_suspended)) {
> +		pr_warn("%s: resume after failing suspend\n", __func__);
> +		return;
> +	}
>  	cpufreq_suspended = false;
>  
>  	if (!has_target() && !cpufreq_driver->resume)
> 

Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been called.

Thanks,
Rafael


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-24  2:02 ` Rafael J. Wysocki
@ 2018-01-24 20:53   ` Bo Yan
  2018-02-02 11:54     ` Rafael J. Wysocki
  2018-01-25 19:15   ` [PATCH v2] " Bo Yan
  1 sibling, 1 reply; 14+ messages in thread
From: Bo Yan @ 2018-01-24 20:53 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel


On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>   drivers/cpufreq/cpufreq.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 41d148af7748..95b1c4afe14e 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>   	if (!cpufreq_driver)
>>   		return;
>>   
>> +	if (unlikely(!cpufreq_suspended)) {
>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>> +		return;
>> +	}
>>   	cpufreq_suspended = false;
>>   
>>   	if (!has_target() && !cpufreq_driver->resume)
>>
> Good catch, but rather than doing this it would be better to avoid
> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
Yes, I thought about that, but there is no good way to skip over it 
without introducing another flag. cpufreq_resume is called by 
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure 
case, dpm_resume is called, but dpm_suspend is not. So on a higher level 
it's already unbalanced.

One possibility is to rely on the pm_transition flag. So something like:


diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
  void dpm_resume(pm_message_t state)
  {
         struct device *dev;
+       bool suspended = (pm_transition.event != PM_EVENT_ON);
         ktime_t starttime = ktime_get();

         trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
         async_synchronize_full();
         dpm_show_time(starttime, state, NULL);

-       cpufreq_resume();
+       if (likely(suspended))
+               cpufreq_resume();
         trace_suspend_resume(TPS("dpm_resume"), state.event, false);
  }

This relies on the fact that pm_transition stays at PMSG_ON if
dpm_prepare fails: in that case dpm_suspend is skipped, so
pm_transition remains 0 until dpm_resume.

dpm_suspend sets pm_transition to whatever state it receives, which
is never PMSG_ON, and pm_transition does not change back to PMSG_ON
before dpm_resume. This is my understanding. Does this make sense?
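
[Editorial sketch] The reasoning above can be modelled for a single
suspend attempt as follows. This is a userspace sketch, not kernel code:
all names (model_cycle, model_transition, MODEL_EVENT_ON, and so on) are
hypothetical, and the model assumes the transition state starts at "on"
before the attempt, which is exactly the premise stated in the message and
is not verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PM_EVENT_ON and a suspend event. */
enum model_event { MODEL_EVENT_ON = 0, MODEL_EVENT_SUSPEND = 2 };

struct model_msg {
	int event;
};

static struct model_msg model_transition;  /* stands in for pm_transition */
static int model_cpufreq_resume_calls;

/* Models dpm_suspend(): records the state it was given. */
static void model_dpm_suspend(struct model_msg state)
{
	assert(state.event != MODEL_EVENT_ON);
	model_transition = state;
}

/* Models the proposed dpm_resume(): resume cpufreq only after a suspend. */
static void model_dpm_resume(void)
{
	bool suspended = (model_transition.event != MODEL_EVENT_ON);

	if (suspended)
		model_cpufreq_resume_calls++;
}

/* Runs one attempt and returns how often cpufreq resume was called. */
static int model_cycle(bool prepare_fails)
{
	struct model_msg suspend_msg = { MODEL_EVENT_SUSPEND };

	model_transition.event = MODEL_EVENT_ON;  /* assumed starting state */
	model_cpufreq_resume_calls = 0;

	if (!prepare_fails)
		model_dpm_suspend(suspend_msg);
	model_dpm_resume();

	return model_cpufreq_resume_calls;
}
```

Under that assumption, model_cycle(true) performs no cpufreq resume and
model_cycle(false) performs exactly one. The sketch says nothing about
what the state is at the start of a later attempt.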


>
> Thanks,
> Rafael
>
>


* [PATCH v2] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-24  2:02 ` Rafael J. Wysocki
  2018-01-24 20:53   ` Bo Yan
@ 2018-01-25 19:15   ` Bo Yan
  1 sibling, 0 replies; 14+ messages in thread
From: Bo Yan @ 2018-01-25 19:15 UTC (permalink / raw)
  To: rjw, pavel, len.brown; +Cc: linux-pm, linux-kernel, Bo Yan

cpufreq_resume can be called even without a preceding cpufreq_suspend.
This can happen in the following scenario:

    suspend_devices_and_enter
       --> dpm_suspend_start
          --> dpm_prepare
              --> device_prepare : this function errors out
          --> dpm_suspend: this is skipped due to dpm_prepare failure
                           this means cpufreq_suspend is skipped over
       --> goto Recover_platform, due to previous error
       --> goto Resume_devices
       --> dpm_resume_end
           --> dpm_resume
               --> cpufreq_resume

When schedutil is used as the frequency governor, cpufreq_resume will
eventually call sugov_start, which does the following:

    memset(sg_cpu, 0, sizeof(*sg_cpu));
    ....

This effectively erases the function pointer for frequency updates,
causing a crash later on. The function pointer would have been set
correctly if the subsequent cpufreq_add_update_util_hook had run
successfully, but that function returns early because the hook is still
registered, as cpufreq_suspend was never called to remove it:

    if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
		return;

Ideally, suspend should succeed, and then things will be fine. But even
if suspend fails, the system should not crash.

The fix is to check the pm_transition status in dpm_resume. If
pm_transition.event == PM_EVENT_ON, we know for sure dpm_suspend has not
been called, so do not call cpufreq_resume.

Signed-off-by: Bo Yan <byan@nvidia.com>
---
 drivers/base/power/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 08744b572af6..39829d7a9311 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
 	struct device *dev;
+	bool suspended = (pm_transition.event != PM_EVENT_ON);
 	ktime_t starttime = ktime_get();
 
 	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state)
 	async_synchronize_full();
 	dpm_show_time(starttime, state, 0, NULL);
 
-	cpufreq_resume();
+	if (likely(suspended))
+		cpufreq_resume();
 	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }
 
-- 
2.7.4


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-24 20:53   ` Bo Yan
@ 2018-02-02 11:54     ` Rafael J. Wysocki
  2018-02-02 19:34       ` Saravana Kannan
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-02 11:54 UTC (permalink / raw)
  To: Bo Yan; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
> 
> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> > On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> >>   drivers/cpufreq/cpufreq.c | 4 ++++
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >> index 41d148af7748..95b1c4afe14e 100644
> >> --- a/drivers/cpufreq/cpufreq.c
> >> +++ b/drivers/cpufreq/cpufreq.c
> >> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> >>   	if (!cpufreq_driver)
> >>   		return;
> >>   
> >> +	if (unlikely(!cpufreq_suspended)) {
> >> +		pr_warn("%s: resume after failing suspend\n", __func__);
> >> +		return;
> >> +	}
> >>   	cpufreq_suspended = false;
> >>   
> >>   	if (!has_target() && !cpufreq_driver->resume)
> >>
> > Good catch, but rather than doing this it would be better to avoid
> > calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
> Yes, I thought about that, but there is no good way to skip over it 
> without introducing another flag. cpufreq_resume is called by 
> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure 
> case, dpm_resume is called, but dpm_suspend is not. So on a higher level 
> it's already unbalanced.
> 
> One possibility is to rely on the pm_transition flag. So something like:
> 
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index dc259d20c967..8469e6fc2b2c 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t 
> cookie)
>   void dpm_resume(pm_message_t state)
>   {
>          struct device *dev;
> +       bool suspended = (pm_transition.event != PM_EVENT_ON);
>          ktime_t starttime = ktime_get();
> 
>          trace_suspend_resume(TPS("dpm_resume"), state.event, true);
> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>          async_synchronize_full();
>          dpm_show_time(starttime, state, NULL);
> 
> -       cpufreq_resume();
> +       if (likely(suspended))
> +               cpufreq_resume();
>          trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>   }

I was thinking about something else.

Anyway, I think your original patch is OK too, but without printing the
message.  Just combine the cpufreq_suspended check with the cpufreq_driver
one and the unlikely() thing is not necessary.
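
[Editorial sketch] One way to read the suggestion above is a single early
exit that covers both conditions. The following userspace model is only an
illustration of that reading; the names (model_cpufreq_resume,
demo_driver, governor_restarts) are hypothetical, and it is not claimed to
be the code that was eventually applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_driver {
	int unused;
};

static struct model_driver demo_driver;      /* hypothetical driver instance */
static struct model_driver *cpufreq_driver;  /* NULL when no driver is registered */
static bool cpufreq_suspended;
static int governor_restarts;                /* counts modelled governor starts */

static void model_cpufreq_resume(void)
{
	/* One check for "no driver" and "never suspended", no message printed. */
	if (!cpufreq_driver || !cpufreq_suspended)
		return;

	assert(cpufreq_driver != NULL);
	cpufreq_suspended = false;
	governor_restarts++;
}
```

With this shape, a resume that was not preceded by a suspend is a silent
no-op, and a second resume after a successful one is a no-op as well.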

Thanks,
Rafael


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-02 11:54     ` Rafael J. Wysocki
@ 2018-02-02 19:34       ` Saravana Kannan
  2018-02-02 21:28         ` Bo Yan
  0 siblings, 1 reply; 14+ messages in thread
From: Saravana Kannan @ 2018-02-02 19:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bo Yan, viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>
>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>>    drivers/cpufreq/cpufreq.c | 4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index 41d148af7748..95b1c4afe14e 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>    	if (!cpufreq_driver)
>>>>    		return;
>>>>
>>>> +	if (unlikely(!cpufreq_suspended)) {
>>>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>>>> +		return;
>>>> +	}
>>>>    	cpufreq_suspended = false;
>>>>
>>>>    	if (!has_target() && !cpufreq_driver->resume)
>>>>
>>> Good catch, but rather than doing this it would be better to avoid
>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
>> Yes, I thought about that, but there is no good way to skip over it
>> without introducing another flag. cpufreq_resume is called by
>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
>> it's already unbalanced.
>>
>> One possibility is to rely on the pm_transition flag. So something like:
>>
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index dc259d20c967..8469e6fc2b2c 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
>> cookie)
>>    void dpm_resume(pm_message_t state)
>>    {
>>           struct device *dev;
>> +       bool suspended = (pm_transition.event != PM_EVENT_ON);
>>           ktime_t starttime = ktime_get();
>>
>>           trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>           async_synchronize_full();
>>           dpm_show_time(starttime, state, NULL);
>>
>> -       cpufreq_resume();
>> +       if (likely(suspended))
>> +               cpufreq_resume();
>>           trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>    }
>
> I was thinking about something else.
>
> Anyway, I think your original patch is OK too, but without printing the
> message.  Just combine the cpufreq_suspended check with the cpufreq_driver
> one and the unlikely() thing is not necessary.
>

I'd rather have this fixed in the dpm_suspend/resume() code. This is just
masking the original issue, which is caused by unbalanced error
handling. If that means adding flags in dpm_suspend/resume(), then that's
what we should do right now and clean it up later if it can be improved.
Making cpufreq more messy doesn't seem like the right answer.

Thanks,
Saravana


-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-02 19:34       ` Saravana Kannan
@ 2018-02-02 21:28         ` Bo Yan
  2018-02-05  4:01           ` Viresh Kumar
  0 siblings, 1 reply; 14+ messages in thread
From: Bo Yan @ 2018-02-02 21:28 UTC (permalink / raw)
  To: Saravana Kannan, Rafael J. Wysocki
  Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
>> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>>
>>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>>>    drivers/cpufreq/cpufreq.c | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>>> index 41d148af7748..95b1c4afe14e 100644
>>>>> --- a/drivers/cpufreq/cpufreq.c
>>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>>        if (!cpufreq_driver)
>>>>>            return;
>>>>>
>>>>> +    if (unlikely(!cpufreq_suspended)) {
>>>>> +        pr_warn("%s: resume after failing suspend\n", __func__);
>>>>> +        return;
>>>>> +    }
>>>>>        cpufreq_suspended = false;
>>>>>
>>>>>        if (!has_target() && !cpufreq_driver->resume)
>>>>>
>>>> Good catch, but rather than doing this it would be better to avoid
>>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been 
>>>> called.
>>> Yes, I thought about that, but there is no good way to skip over it
>>> without introducing another flag. cpufreq_resume is called by
>>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>>> case, dpm_resume is called, but dpm_suspend is not. So on a higher 
>>> level
>>> it's already unbalanced.
>>>
>>> One possibility is to rely on the pm_transition flag. So something 
>>> like:
>>>
>>>
>>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>>> index dc259d20c967..8469e6fc2b2c 100644
>>> --- a/drivers/base/power/main.c
>>> +++ b/drivers/base/power/main.c
>>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
>>> cookie)
>>>    void dpm_resume(pm_message_t state)
>>>    {
>>>           struct device *dev;
>>> +       bool suspended = (pm_transition.event != PM_EVENT_ON);
>>>           ktime_t starttime = ktime_get();
>>>
>>>           trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>>           async_synchronize_full();
>>>           dpm_show_time(starttime, state, NULL);
>>>
>>> -       cpufreq_resume();
>>> +       if (likely(suspended))
>>> +               cpufreq_resume();
>>>           trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>>    }
>>
>> I was thinking about something else.
>>
>> Anyway, I think your original patch is OK too, but without printing the
>> message.  Just combine the cpufreq_suspended check with the 
>> cpufreq_driver
>> one and the unlikely() thing is not necessary.
>>
>
> I rather have this fixed in the dpm_suspend/resume() code. This is 
> just masking the first issue that's being caused by unbalanced error 
> handling. If that means adding flags in dpm_suspend/resume() then 
> that's what we should do right now and clean it up later if it can be 
> improved. Making cpufreq more messy doesn't seem like the right answer.
>
> Thanks,
> Saravana
>
>
dpm_suspend and dpm_resume by themselves are not balanced in this
particular case. As currently structured, dpm_resume can't be
omitted even if dpm_suspend is skipped due to an earlier failure. I think
checking the cpufreq_suspended flag is a reasonable compromise. If we can
find a way to make dpm_suspend/dpm_resume balanced as well, that would be best.


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-02 21:28         ` Bo Yan
@ 2018-02-05  4:01           ` Viresh Kumar
  2018-02-05  8:50             ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2018-02-05  4:01 UTC (permalink / raw)
  To: Bo Yan
  Cc: Saravana Kannan, Rafael J. Wysocki, sgurrappadi, linux-pm, linux-kernel

On 02-02-18, 13:28, Bo Yan wrote:
> On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> >I rather have this fixed in the dpm_suspend/resume() code. This is just
> >masking the first issue that's being caused by unbalanced error handling.
> >If that means adding flags in dpm_suspend/resume() then that's what we
> >should do right now and clean it up later if it can be improved. Making
> >cpufreq more messy doesn't seem like the right answer.

+1

> dpm_suspend and dpm_resume by themselves are not balanced in this particular
> case. As it's currently structured, dpm_resume can't be omitted even if
> dpm_suspend is skipped due to earlier failure.  I think checking
> cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> make dpm_suspend/dpm_resume also balanced, that will be best.

I think cpufreq is just one of the users which broke. Others didn't break
because:

- They don't have a complicated resume part.
- Or we just don't know that they broke.

Resuming something that never suspended is just broken by design. Yeah, it's much
simpler in this particular case to fix cpufreq core but the
suspend/resume/hibernation part is really core kernel and should be fixed to
avoid such band-aids.

-- 
viresh


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  4:01           ` Viresh Kumar
@ 2018-02-05  8:50             ` Rafael J. Wysocki
  2018-02-05  9:05               ` Viresh Kumar
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-05  8:50 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Bo Yan, Saravana Kannan, sgurrappadi, linux-pm, linux-kernel

On Monday, February 5, 2018 5:01:18 AM CET Viresh Kumar wrote:
> On 02-02-18, 13:28, Bo Yan wrote:
> > On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> > >I rather have this fixed in the dpm_suspend/resume() code. This is just
> > >masking the first issue that's being caused by unbalanced error handling.
> > >If that means adding flags in dpm_suspend/resume() then that's what we
> > >should do right now and clean it up later if it can be improved. Making
> > >cpufreq more messy doesn't seem like the right answer.
> 
> +1
> 
> > dpm_suspend and dpm_resume by themselves are not balanced in this particular
> > case. As it's currently structured, dpm_resume can't be omitted even if
> > dpm_suspend is skipped due to earlier failure.  I think checking
> > cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> > make dpm_suspend/dpm_resume also balanced, that will be best.
> 
> I think cpufreq is just one of the users which broke. Others didn't break
> because:
> 
> - They don't have a complicated resume part.
> - Or we just don't know that they broke.

No and no.

> Resuming something that never suspended is just broken by design. Yeah, its much
> simpler in this particular case to fix cpufreq core but the
> suspend/resume/hibernation part is really core kernel and should be fixed to
> avoid such band-aids.

By design (which I admit may be confusing) it should be fine to call
dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
for the failure is.  cpufreq_suspend/resume() don't take that into account,
everybody else does.

Thanks,
Rafael


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  8:50             ` Rafael J. Wysocki
@ 2018-02-05  9:05               ` Viresh Kumar
  2018-02-15 21:27                 ` Saravana Kannan
  0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2018-02-05  9:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bo Yan, Saravana Kannan, sgurrappadi, linux-pm, linux-kernel

On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> By design (which I admit may be confusing) it should be fine to call
> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> for the failure is.  cpufreq_suspend/resume() don't take that into account,
> everybody else does.

Hmm, I see. Can't do much then, just fix the only broken piece of code :)

-- 
viresh


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-23 21:57 [PATCH] cpufreq: skip cpufreq resume if it's not suspended Bo Yan
  2018-01-24  2:02 ` Rafael J. Wysocki
@ 2018-02-05  9:19 ` Rafael J. Wysocki
  2018-02-05  9:23   ` Viresh Kumar
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-05  9:19 UTC (permalink / raw)
  To: Bo Yan; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> cpufreq_resume can be called even without preceding cpufreq_suspend.
> This can happen in following scenario:
> 
>     suspend_devices_and_enter
>        --> dpm_suspend_start
>           --> dpm_prepare
>               --> device_prepare : this function errors out
>           --> dpm_suspend: this is skipped due to dpm_prepare failure
>                            this means cpufreq_suspend is skipped over
>        --> goto Recover_platform, due to previous error
>        --> goto Resume_devices
>        --> dpm_resume_end
>            --> dpm_resume
>                --> cpufreq_resume
> 
> In case schedutil is used as frequency governor, cpufreq_resume will
> eventually call sugov_start, which does following:
> 
>     memset(sg_cpu, 0, sizeof(*sg_cpu));
>     ....
> 
> This effectively erases function pointer for frequency update, causing
> crash later on. The function pointer would have been set correctly if
> subsequent cpufreq_add_update_util_hook runs successfully, but that
> function returns earlier because cpufreq_suspend was not called:
> 
>     if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
> 		return;
> 
> Ideally, suspend should succeed, then things will be fine. But even
> in case of suspend failure, system should not crash.
> 
> The fix is to check cpufreq_suspended first, if it's false, that means
> cpufreq_suspend was not called in the first place, so do not resume
> cpufreq.
> 
> Signed-off-by: Bo Yan <byan@nvidia.com>
> ---
>  drivers/cpufreq/cpufreq.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 41d148af7748..95b1c4afe14e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>  	if (!cpufreq_driver)
>  		return;
>  
> +	if (unlikely(!cpufreq_suspended)) {
> +		pr_warn("%s: resume after failing suspend\n", __func__);
> +		return;
> +	}
>  	cpufreq_suspended = false;
>  
>  	if (!has_target() && !cpufreq_driver->resume)

I've just edited this patch somewhat (mostly by dropping the pr_warn())
and queued it up.

Thanks,
Rafael


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
@ 2018-02-05  9:23   ` Viresh Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2018-02-05  9:23 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bo Yan, sgurrappadi, linux-pm, linux-kernel

On 05-02-18, 10:19, Rafael J. Wysocki wrote:
> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 41d148af7748..95b1c4afe14e 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> >  	if (!cpufreq_driver)
> >  		return;
> >  
> > +	if (unlikely(!cpufreq_suspended)) {
> > +		pr_warn("%s: resume after failing suspend\n", __func__);
> > +		return;
> > +	}
> >  	cpufreq_suspended = false;
> >  
> >  	if (!has_target() && !cpufreq_driver->resume)
> 
> I've just edited this patch somewhat (mostly by dropping the pr_warn())
> and queued it up.

You can add my Ack as well.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  9:05               ` Viresh Kumar
@ 2018-02-15 21:27                 ` Saravana Kannan
  2018-02-15 22:06                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Saravana Kannan @ 2018-02-15 21:27 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Bo Yan, sgurrappadi, linux-pm, linux-kernel

On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> On 05-02-18, 09:50, Rafael J. Wysocki wrote:
>> By design (which I admit may be confusing) it should be fine to call
>> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
>> for the failure is.  cpufreq_suspend/resume() don't take that into account,
>> everybody else does.
>
> Hmm, I see. Can't do much then, just fix the only broken piece of code :)
>

Sorry for the late reply, this email didn't get filtered into the right 
folder.

I think the design of dpm_suspend_start() and dpm_resume_end() generally
works fine because we seem to keep track of which devices have been
suspended so far (in the dpm_suspended_list) and call resume only on
those. So, why isn't the right fix to have cpufreq get put into that
list, instead of always calling it on the resume path even if it
wasn't suspended? That seems to be the real issue.

So, we should either have dpm_suspend/resume() keep a flag to track
whether cpufreq_suspend/resume() was called and make sure they are called
in proper pairs, or have cpufreq register in a way that gets it put in
the suspend/resume list.
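
[Editorial sketch] The list-based pairing described above can be
illustrated with a small userspace model: entries are recorded as they are
suspended, and only recorded entries are resumed, in reverse order. All
names here (model_dev, suspended_list, model_suspend_all,
model_resume_all) are hypothetical; this is a sketch of the general idea,
not of how the driver core implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

struct model_dev {
	const char *name;
	bool fail_suspend;   /* when true, this entry's suspend step fails */
	int suspend_calls;
	int resume_calls;
};

/* Entries that were actually suspended, in suspend order. */
static struct model_dev *suspended_list[MAX_DEVS];
static size_t suspended_count;

/* Suspends entries in order and stops at the first failure. */
static int model_suspend_all(struct model_dev *devs, size_t n)
{
	assert(n <= MAX_DEVS);
	suspended_count = 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].fail_suspend)
			return -1;
		devs[i].suspend_calls++;
		suspended_list[suspended_count++] = &devs[i];
	}
	return 0;
}

/* Resumes, in reverse order, only what was recorded as suspended. */
static void model_resume_all(void)
{
	while (suspended_count > 0) {
		struct model_dev *dev = suspended_list[--suspended_count];

		dev->resume_calls++;
	}
}
```

In this model an entry placed after the failing one is neither suspended
nor resumed, so its suspend and resume calls stay paired by construction.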

I'd still like to NACK this change.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-15 21:27                 ` Saravana Kannan
@ 2018-02-15 22:06                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-15 22:06 UTC (permalink / raw)
  To: Saravana Kannan; +Cc: Viresh Kumar, Bo Yan, sgurrappadi, linux-pm, linux-kernel

On Thursday, February 15, 2018 10:27:10 PM CET Saravana Kannan wrote:
> On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> > On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> >> By design (which I admit may be confusing) it should be fine to call
> >> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> >> for the failure is.  cpufreq_suspend/resume() don't take that into account,
> >> everybody else does.
> >
> > Hmm, I see. Can't do much then, just fix the only broken piece of code :)
> >
> 
> Sorry for the late reply, this email didn't get filtered into the right 
> folder.
> 
> I think the design of dpm_suspend_start() and dpm_resume_end() generally 
> works fine because we seem to keep track of what devices have been 
> suspended so far (in the dpm_suspended_list) and call resume only of 
> those. So, why isn't the right fix to have cpufreq get put into that 
> list?

Because it is more complicated?

> Instead of just always call it on the resume path even if it 
> wasn't suspended? That seems to be the real issue.
> 
> So, we should either have dpm_suspend/resume() have a flag to keep track 
> of if cpufreq_suspend/resume() was called and make sure they are called 
> in proper pairs.

Why?

> Or have cpufreq register in a way that gets it put in 
> the suspend/resume list.
> 
> I'd still like to NACK this change.

It's gone in already, sorry.

