* [PATCH] cpufreq: skip cpufreq resume if it's not suspended
@ 2018-01-23 21:57 Bo Yan
  2018-01-24  2:02 ` Rafael J. Wysocki
  2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
  0 siblings, 2 replies; 14+ messages in thread
From: Bo Yan @ 2018-01-23 21:57 UTC
  To: viresh.kumar, rjw, sgurrappadi; +Cc: linux-pm, linux-kernel, Bo Yan

cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

    suspend_devices_and_enter
       --> dpm_suspend_start
          --> dpm_prepare
              --> device_prepare : this function errors out
          --> dpm_suspend: this is skipped due to dpm_prepare failure
                           this means cpufreq_suspend is skipped over
       --> goto Recover_platform, due to previous error
          --> goto Resume_devices
              --> dpm_resume_end
                  --> dpm_resume
                      --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

    memset(sg_cpu, 0, sizeof(*sg_cpu));
    ....

This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

    if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
		return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check cpufreq_suspended first, if it's false, that means
cpufreq_suspend was not called in the first place, so do not resume
cpufreq.

Signed-off-by: Bo Yan <byan@nvidia.com>
---
 drivers/cpufreq/cpufreq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
 	if (!cpufreq_driver)
 		return;
 
+	if (unlikely(!cpufreq_suspended)) {
+		pr_warn("%s: resume after failing suspend\n", __func__);
+		return;
+	}
 	cpufreq_suspended = false;
 
 	if (!has_target() && !cpufreq_driver->resume)
-- 
2.7.4
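The failure sequence in the patch description can be reproduced in a small userspace model. This is only a sketch under stated assumptions: `sg_cpu_model`, `registered`, `governor_start()`, `model_suspend()`, `model_resume()` and the other helpers are hypothetical stand-ins invented for illustration, not the kernel's code. It shows why a start without a preceding stop leaves a registered entry whose callback has been zeroed, and how checking the suspended flag avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the governor's per-CPU data. */
struct sg_cpu_model {
	void (*update_hook)(void);	/* models the frequency-update callback */
};

static struct sg_cpu_model sg_cpu;
static struct sg_cpu_model *registered;	/* models per-CPU cpufreq_update_util_data */
static bool cpufreq_suspended;

static void freq_update(void) { }

/* Models the early return when a hook is already registered. */
static void add_update_util_hook(struct sg_cpu_model *data, void (*func)(void))
{
	if (registered)
		return;
	data->update_hook = func;
	registered = data;
}

/* Models governor start: wipe the data, then try to register the hook. */
static void governor_start(void)
{
	memset(&sg_cpu, 0, sizeof(sg_cpu));
	add_update_util_hook(&sg_cpu, freq_update);
}

static void governor_stop(void)
{
	registered = NULL;
}

void model_boot(void)
{
	registered = NULL;
	cpufreq_suspended = false;
	governor_start();
}

void model_suspend(void)
{
	governor_stop();
	cpufreq_suspended = true;
}

/* With guarded == true this mirrors the check proposed in the patch. */
void model_resume(bool guarded)
{
	if (guarded && !cpufreq_suspended)
		return;
	cpufreq_suspended = false;
	governor_start();
}

bool hook_is_valid(void)
{
	return registered && registered->update_hook;
}
```

Calling `model_resume(false)` right after `model_boot()` leaves `registered` pointing at data whose `update_hook` is NULL, which is the state the patch description blames for the later crash. With `guarded` set, the same call is a no-op.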
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-23 21:57 [PATCH] cpufreq: skip cpufreq resume if it's not suspended Bo Yan
@ 2018-01-24  2:02 ` Rafael J. Wysocki
  2018-01-24 20:53   ` Bo Yan
  2018-01-25 19:15   ` [PATCH v2] " Bo Yan
  2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
  1 sibling, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-01-24  2:02 UTC
  To: Bo Yan; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> cpufreq_resume can be called even without preceding cpufreq_suspend.
> This can happen in following scenario:
>
>     suspend_devices_and_enter
>        --> dpm_suspend_start
>           --> dpm_prepare
>               --> device_prepare : this function errors out
>           --> dpm_suspend: this is skipped due to dpm_prepare failure
>                            this means cpufreq_suspend is skipped over
>        --> goto Recover_platform, due to previous error
>           --> goto Resume_devices
>               --> dpm_resume_end
>                   --> dpm_resume
>                       --> cpufreq_resume
>
> In case schedutil is used as frequency governor, cpufreq_resume will
> eventually call sugov_start, which does following:
>
>     memset(sg_cpu, 0, sizeof(*sg_cpu));
>     ....
>
> This effectively erases function pointer for frequency update, causing
> crash later on. The function pointer would have been set correctly if
> subsequent cpufreq_add_update_util_hook runs successfully, but that
> function returns earlier because cpufreq_suspend was not called:
>
>     if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
> 		return;
>
> Ideally, suspend should succeed, then things will be fine. But even
> in case of suspend failure, system should not crash.
>
> The fix is to check cpufreq_suspended first, if it's false, that means
> cpufreq_suspend was not called in the first place, so do not resume
> cpufreq.
>
> Signed-off-by: Bo Yan <byan@nvidia.com>
> ---
>  drivers/cpufreq/cpufreq.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 41d148af7748..95b1c4afe14e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>  	if (!cpufreq_driver)
>  		return;
>
> +	if (unlikely(!cpufreq_suspended)) {
> +		pr_warn("%s: resume after failing suspend\n", __func__);
> +		return;
> +	}
>  	cpufreq_suspended = false;
>
>  	if (!has_target() && !cpufreq_driver->resume)
>

Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been called.

Thanks,
Rafael
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-24  2:02 ` Rafael J. Wysocki
@ 2018-01-24 20:53   ` Bo Yan
  2018-02-02 11:54     ` Rafael J. Wysocki
  2018-01-25 19:15   ` [PATCH v2] " Bo Yan
  1 sibling, 1 reply; 14+ messages in thread
From: Bo Yan @ 2018-01-24 20:53 UTC
  To: Rafael J. Wysocki; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>   drivers/cpufreq/cpufreq.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 41d148af7748..95b1c4afe14e 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>   	if (!cpufreq_driver)
>>   		return;
>>
>> +	if (unlikely(!cpufreq_suspended)) {
>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>> +		return;
>> +	}
>>   	cpufreq_suspended = false;
>>
>>   	if (!has_target() && !cpufreq_driver->resume)
>>
> Good catch, but rather than doing this it would be better to avoid
> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.

Yes, I thought about that, but there is no good way to skip over it
without introducing another flag. cpufreq_resume is called by
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
case, dpm_resume is called, but dpm_suspend is not. So on a higher level
it's already unbalanced.

One possibility is to rely on the pm_transition flag. So something like:


diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
 	struct device *dev;
+	bool suspended = (pm_transition.event != PM_EVENT_ON);
 	ktime_t starttime = ktime_get();

 	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
 	async_synchronize_full();
 	dpm_show_time(starttime, state, NULL);

-	cpufreq_resume();
+	if (likely(suspended))
+		cpufreq_resume();
 	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }

This relies on the fact that the pm_transition will stay as PMSG_ON if
dpm_prepare failed, in which case dpm_suspend will be skipped over,
pm_transition will remain as 0 until dpm_resume. dpm_suspend changes
pm_transition to whatever state it receives, which is never PMSG_ON.
pm_transition is not changing to PMSG_ON before dpm_resume.

This is my understanding. does this make sense?

>
> Thanks,
> Rafael
>
>
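The reasoning behind the pm_transition check can be sketched as a tiny state model. Everything here is an assumption made for illustration: `pm_transition_event`, the `EV_*` values and the `model_*` functions are hypothetical names, and the model encodes only the behaviour stated in the message above (the flag starts out as "on", and only a suspend that actually runs changes it). What the flag holds after a completed suspend/resume cycle is deliberately not modelled and would need to be checked against the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes; only "on" versus "not on" matters here. */
enum { EV_ON = 0, EV_SUSPEND = 2 };

static int pm_transition_event = EV_ON;	/* models pm_transition.event */
static int cpufreq_resume_calls;		/* counts modelled cpufreq_resume() calls */

void model_reset(void)
{
	pm_transition_event = EV_ON;
	cpufreq_resume_calls = 0;
}

/* Models dpm_suspend_start(): a prepare failure skips the suspend step. */
int model_dpm_suspend_start(bool prepare_fails)
{
	if (prepare_fails)
		return -1;			/* flag is left untouched */
	pm_transition_event = EV_SUSPEND;	/* the suspend step records its state */
	return 0;
}

/* Models the proposed dpm_resume() change: sample the flag, then decide. */
void model_dpm_resume(void)
{
	bool suspended = (pm_transition_event != EV_ON);

	if (suspended)
		cpufreq_resume_calls++;
}
```

In this model a failed prepare followed by a resume never reaches the cpufreq resume step, while a normal suspend followed by a resume reaches it exactly once. The check is only as good as the assumption that the flag still reads "on" whenever the suspend step was skipped.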
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-24 20:53   ` Bo Yan
@ 2018-02-02 11:54     ` Rafael J. Wysocki
  2018-02-02 19:34       ` Saravana Kannan
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-02 11:54 UTC
  To: Bo Yan; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>
> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> > On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> >>   drivers/cpufreq/cpufreq.c | 4 ++++
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >> index 41d148af7748..95b1c4afe14e 100644
> >> --- a/drivers/cpufreq/cpufreq.c
> >> +++ b/drivers/cpufreq/cpufreq.c
> >> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> >>   	if (!cpufreq_driver)
> >>   		return;
> >>
> >> +	if (unlikely(!cpufreq_suspended)) {
> >> +		pr_warn("%s: resume after failing suspend\n", __func__);
> >> +		return;
> >> +	}
> >>   	cpufreq_suspended = false;
> >>
> >>   	if (!has_target() && !cpufreq_driver->resume)
> >>
> > Good catch, but rather than doing this it would be better to avoid
> > calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
> Yes, I thought about that, but there is no good way to skip over it
> without introducing another flag. cpufreq_resume is called by
> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
> it's already unbalanced.
>
> One possibility is to rely on the pm_transition flag. So something like:
>
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index dc259d20c967..8469e6fc2b2c 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
> cookie)
>   void dpm_resume(pm_message_t state)
>   {
>   	struct device *dev;
> +	bool suspended = (pm_transition.event != PM_EVENT_ON);
>   	ktime_t starttime = ktime_get();
>
>   	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>   	async_synchronize_full();
>   	dpm_show_time(starttime, state, NULL);
>
> -	cpufreq_resume();
> +	if (likely(suspended))
> +		cpufreq_resume();
>   	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>   }

I was thinking about something else.

Anyway, I think your original patch is OK too, but without printing the
message.  Just combine the cpufreq_suspended check with the cpufreq_driver
one and the unlikely() thing is not necessary.

Thanks,
Rafael
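One possible reading of the suggestion to combine the two checks is sketched below as a standalone model. This is an assumption about what the combined check could look like, not the change that was eventually applied, which the thread does not show. `driver_model`, `resume_work_done` and `model_cpufreq_resume()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the registered cpufreq driver. */
struct driver_model {
	int unused;
};

static struct driver_model *cpufreq_driver;	/* NULL when no driver is registered */
static bool cpufreq_suspended;			/* set by the modelled suspend path */
static int resume_work_done;			/* counts how often resume really ran */

void model_cpufreq_resume(void)
{
	/* Single early exit: no driver, or nothing was suspended. No message. */
	if (!cpufreq_driver || !cpufreq_suspended)
		return;

	cpufreq_suspended = false;
	resume_work_done++;	/* stands in for restarting the governors */
}
```

Folding the flag test into the existing early return keeps the function quiet in the failed-suspend case, which matches the request to drop the `pr_warn()` and the `unlikely()` annotation.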
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-02 11:54     ` Rafael J. Wysocki
@ 2018-02-02 19:34       ` Saravana Kannan
  2018-02-02 21:28         ` Bo Yan
  0 siblings, 1 reply; 14+ messages in thread
From: Saravana Kannan @ 2018-02-02 19:34 UTC
  To: Rafael J. Wysocki
  Cc: Bo Yan, viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>
>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>>    drivers/cpufreq/cpufreq.c | 4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index 41d148af7748..95b1c4afe14e 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>    	if (!cpufreq_driver)
>>>>    		return;
>>>>
>>>> +	if (unlikely(!cpufreq_suspended)) {
>>>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>>>> +		return;
>>>> +	}
>>>>    	cpufreq_suspended = false;
>>>>
>>>>    	if (!has_target() && !cpufreq_driver->resume)
>>>>
>>> Good catch, but rather than doing this it would be better to avoid
>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
>> Yes, I thought about that, but there is no good way to skip over it
>> without introducing another flag. cpufreq_resume is called by
>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
>> it's already unbalanced.
>>
>> One possibility is to rely on the pm_transition flag. So something like:
>>
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index dc259d20c967..8469e6fc2b2c 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
>> cookie)
>>    void dpm_resume(pm_message_t state)
>>    {
>>    	struct device *dev;
>> +	bool suspended = (pm_transition.event != PM_EVENT_ON);
>>    	ktime_t starttime = ktime_get();
>>
>>    	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>    	async_synchronize_full();
>>    	dpm_show_time(starttime, state, NULL);
>>
>> -	cpufreq_resume();
>> +	if (likely(suspended))
>> +		cpufreq_resume();
>>    	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>    }
>
> I was thinking about something else.
>
> Anyway, I think your original patch is OK too, but without printing the
> message.  Just combine the cpufreq_suspended check with the cpufreq_driver
> one and the unlikely() thing is not necessary.
>

I rather have this fixed in the dpm_suspend/resume() code. This is just
masking the first issue that's being caused by unbalanced error handling.
If that means adding flags in dpm_suspend/resume() then that's what we
should do right now and clean it up later if it can be improved. Making
cpufreq more messy doesn't seem like the right answer.

Thanks,
Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-02 19:34       ` Saravana Kannan
@ 2018-02-02 21:28         ` Bo Yan
  2018-02-05  4:01           ` Viresh Kumar
  0 siblings, 1 reply; 14+ messages in thread
From: Bo Yan @ 2018-02-02 21:28 UTC
  To: Saravana Kannan, Rafael J. Wysocki
  Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
>> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>>
>>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>>>    drivers/cpufreq/cpufreq.c | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>>> index 41d148af7748..95b1c4afe14e 100644
>>>>> --- a/drivers/cpufreq/cpufreq.c
>>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>>    	if (!cpufreq_driver)
>>>>>    		return;
>>>>>
>>>>> +	if (unlikely(!cpufreq_suspended)) {
>>>>> +		pr_warn("%s: resume after failing suspend\n", __func__);
>>>>> +		return;
>>>>> +	}
>>>>>    	cpufreq_suspended = false;
>>>>>
>>>>>    	if (!has_target() && !cpufreq_driver->resume)
>>>>>
>>>> Good catch, but rather than doing this it would be better to avoid
>>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been
>>>> called.
>>> Yes, I thought about that, but there is no good way to skip over it
>>> without introducing another flag. cpufreq_resume is called by
>>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>>> case, dpm_resume is called, but dpm_suspend is not. So on a higher
>>> level
>>> it's already unbalanced.
>>>
>>> One possibility is to rely on the pm_transition flag. So something
>>> like:
>>>
>>>
>>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>>> index dc259d20c967..8469e6fc2b2c 100644
>>> --- a/drivers/base/power/main.c
>>> +++ b/drivers/base/power/main.c
>>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
>>> cookie)
>>>    void dpm_resume(pm_message_t state)
>>>    {
>>>    	struct device *dev;
>>> +	bool suspended = (pm_transition.event != PM_EVENT_ON);
>>>    	ktime_t starttime = ktime_get();
>>>
>>>    	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>>    	async_synchronize_full();
>>>    	dpm_show_time(starttime, state, NULL);
>>>
>>> -	cpufreq_resume();
>>> +	if (likely(suspended))
>>> +		cpufreq_resume();
>>>    	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>>    }
>>
>> I was thinking about something else.
>>
>> Anyway, I think your original patch is OK too, but without printing the
>> message.  Just combine the cpufreq_suspended check with the
>> cpufreq_driver
>> one and the unlikely() thing is not necessary.
>>
>
> I rather have this fixed in the dpm_suspend/resume() code. This is
> just masking the first issue that's being caused by unbalanced error
> handling. If that means adding flags in dpm_suspend/resume() then
> that's what we should do right now and clean it up later if it can be
> improved. Making cpufreq more messy doesn't seem like the right answer.
>
> Thanks,
> Saravana
>
>
dpm_suspend and dpm_resume by themselves are not balanced in this
particular case. As it's currently structured, dpm_resume can't be
omitted even if dpm_suspend is skipped due to earlier failure. I think
checking cpufreq_suspended flag is a reasonable compromise. If we can
find a way to make dpm_suspend/dpm_resume also balanced, that will be
best.
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-02 21:28         ` Bo Yan
@ 2018-02-05  4:01           ` Viresh Kumar
  2018-02-05  8:50             ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2018-02-05  4:01 UTC
  To: Bo Yan
  Cc: Saravana Kannan, Rafael J. Wysocki, sgurrappadi, linux-pm, linux-kernel

On 02-02-18, 13:28, Bo Yan wrote:
> On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> >I rather have this fixed in the dpm_suspend/resume() code. This is just
> >masking the first issue that's being caused by unbalanced error handling.
> >If that means adding flags in dpm_suspend/resume() then that's what we
> >should do right now and clean it up later if it can be improved. Making
> >cpufreq more messy doesn't seem like the right answer.

+1

> dpm_suspend and dpm_resume by themselves are not balanced in this particular
> case. As it's currently structured, dpm_resume can't be omitted even if
> dpm_suspend is skipped due to earlier failure. I think checking
> cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> make dpm_suspend/dpm_resume also balanced, that will be best.

I think cpufreq is just one of the users which broke. Others didn't break
because:

- They don't have a complicated resume part.
- Or we just don't know that they broke.

Resuming something that never suspended is just broken by design. Yeah, its much
simpler in this particular case to fix cpufreq core but the
suspend/resume/hibernation part is really core kernel and should be fixed to
avoid such band-aids.

-- 
viresh
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  4:01           ` Viresh Kumar
@ 2018-02-05  8:50             ` Rafael J. Wysocki
  2018-02-05  9:05               ` Viresh Kumar
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-05  8:50 UTC
  To: Viresh Kumar; +Cc: Bo Yan, Saravana Kannan, sgurrappadi, linux-pm, linux-kernel

On Monday, February 5, 2018 5:01:18 AM CET Viresh Kumar wrote:
> On 02-02-18, 13:28, Bo Yan wrote:
> > On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> > >I rather have this fixed in the dpm_suspend/resume() code. This is just
> > >masking the first issue that's being caused by unbalanced error handling.
> > >If that means adding flags in dpm_suspend/resume() then that's what we
> > >should do right now and clean it up later if it can be improved. Making
> > >cpufreq more messy doesn't seem like the right answer.
>
> +1
>
> > dpm_suspend and dpm_resume by themselves are not balanced in this particular
> > case. As it's currently structured, dpm_resume can't be omitted even if
> > dpm_suspend is skipped due to earlier failure. I think checking
> > cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> > make dpm_suspend/dpm_resume also balanced, that will be best.
>
> I think cpufreq is just one of the users which broke. Others didn't break
> because:
>
> - They don't have a complicated resume part.
> - Or we just don't know that they broke.

No and no.

> Resuming something that never suspended is just broken by design. Yeah, its much
> simpler in this particular case to fix cpufreq core but the
> suspend/resume/hibernation part is really core kernel and should be fixed to
> avoid such band-aids.

By design (which I admit may be confusing) it should be fine to call
dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
for the failure is.  cpufreq_suspend/resume() don't take that into account,
everybody else does.

Thanks,
Rafael
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  8:50             ` Rafael J. Wysocki
@ 2018-02-05  9:05               ` Viresh Kumar
  2018-02-15 21:27                 ` Saravana Kannan
  0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2018-02-05  9:05 UTC
  To: Rafael J. Wysocki
  Cc: Bo Yan, Saravana Kannan, sgurrappadi, linux-pm, linux-kernel

On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> By design (which I admit may be confusing) it should be fine to call
> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> for the failure is.  cpufreq_suspend/resume() don't take that into account,
> everybody else does.

Hmm, I see. Can't do much then, just fix the only broken piece of code :)

-- 
viresh
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  9:05               ` Viresh Kumar
@ 2018-02-15 21:27                 ` Saravana Kannan
  2018-02-15 22:06                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Saravana Kannan @ 2018-02-15 21:27 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Bo Yan, sgurrappadi, linux-pm, linux-kernel

On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> On 05-02-18, 09:50, Rafael J. Wysocki wrote:
>> By design (which I admit may be confusing) it should be fine to call
>> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
>> for the failure is.  cpufreq_suspend/resume() don't take that into account,
>> everybody else does.
>
> Hmm, I see. Can't do much then, just fix the only broken piece of code :)
>

Sorry for the late reply, this email didn't get filtered into the right
folder.

I think the design of dpm_suspend_start() and dpm_resume_end() generally
works fine because we seem to keep track of what devices have been
suspended so far (in the dpm_suspended_list) and call resume only of
those. So, why isn't the right fix to have cpufreq get put into that
list? Instead of just always call it on the resume path even if it
wasn't suspended? That seems to be the real issue.

So, we should either have dpm_suspend/resume() have a flag to keep track
of if cpufreq_suspend/resume() was called and make sure they are called
in proper pairs. Or have cpufreq register in a way that gets it put in
the suspend/resume list.

I'd still like to NACK this change.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
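The pairing argument made in this message, that resume should run only for whatever was actually suspended, can be illustrated with a small userspace sketch. It is a hedged model only: `dev_model`, `model_suspend_all()` and `model_resume_all()` are hypothetical names, and a per-entry `suspended` flag stands in for membership of a "suspended" list. It is not how the PM core is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one suspendable entity. */
struct dev_model {
	bool fail_suspend;	/* force this entry's suspend step to fail */
	bool suspended;		/* models being on a "suspended" list */
	int resume_calls;	/* how many times resume ran for this entry */
};

/* Suspend entries in order and stop at the first failure. */
int model_suspend_all(struct dev_model *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].fail_suspend)
			return -1;
		devs[i].suspended = true;
	}
	return 0;
}

/* Resume in reverse order, touching only entries that were suspended. */
void model_resume_all(struct dev_model *devs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (!devs[i].suspended)
			continue;
		devs[i].suspended = false;
		devs[i].resume_calls++;
	}
}
```

With three entries where the second one fails to suspend, only the first entry gets a resume call, so suspend and resume stay paired without any check inside the entries themselves. The cost is that the caller has to track per-entry state.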
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-15 21:27                 ` Saravana Kannan
@ 2018-02-15 22:06                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-15 22:06 UTC
  To: Saravana Kannan; +Cc: Viresh Kumar, Bo Yan, sgurrappadi, linux-pm, linux-kernel

On Thursday, February 15, 2018 10:27:10 PM CET Saravana Kannan wrote:
> On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> > On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> >> By design (which I admit may be confusing) it should be fine to call
> >> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> >> for the failure is.  cpufreq_suspend/resume() don't take that into account,
> >> everybody else does.
> >
> > Hmm, I see. Can't do much then, just fix the only broken piece of code :)
> >
>
> Sorry for the late reply, this email didn't get filtered into the right
> folder.
>
> I think the design of dpm_suspend_start() and dpm_resume_end() generally
> works fine because we seem to keep track of what devices have been
> suspended so far (in the dpm_suspended_list) and call resume only of
> those. So, why isn't the right fix to have cpufreq get put into that
> list?

Because it is more complicated?

> Instead of just always call it on the resume path even if it
> wasn't suspended? That seems to be the real issue.
>
> So, we should either have dpm_suspend/resume() have a flag to keep track
> of if cpufreq_suspend/resume() was called and make sure they are called
> in proper pairs.

Why?

> Or have cpufreq register in a way that gets it put in
> the suspend/resume list.
>
> I'd still like to NACK this change.

It's gone in already, sorry.
* [PATCH v2] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-24  2:02 ` Rafael J. Wysocki
  2018-01-24 20:53   ` Bo Yan
@ 2018-01-25 19:15   ` Bo Yan
  1 sibling, 0 replies; 14+ messages in thread
From: Bo Yan @ 2018-01-25 19:15 UTC
  To: rjw, pavel, len.brown; +Cc: linux-pm, linux-kernel, Bo Yan

cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

    suspend_devices_and_enter
       --> dpm_suspend_start
          --> dpm_prepare
              --> device_prepare : this function errors out
          --> dpm_suspend: this is skipped due to dpm_prepare failure
                           this means cpufreq_suspend is skipped over
       --> goto Recover_platform, due to previous error
          --> goto Resume_devices
              --> dpm_resume_end
                  --> dpm_resume
                      --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

    memset(sg_cpu, 0, sizeof(*sg_cpu));
    ....

This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

    if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
		return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check the pm_transition status in dpm_resume. if
pm_transition.event == PMSG_ON, we know for sure dpm_suspend has not
been called, so do not call cpufreq_resume.

Signed-off-by: Bo Yan <byan@nvidia.com>
---
 drivers/base/power/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 08744b572af6..39829d7a9311 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
 	struct device *dev;
+	bool suspended = (pm_transition.event != PM_EVENT_ON);
 	ktime_t starttime = ktime_get();
 
 	trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state)
 	async_synchronize_full();
 	dpm_show_time(starttime, state, 0, NULL);
 
-	cpufreq_resume();
+	if (likely(suspended))
+		cpufreq_resume();
 	trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }
 
-- 
2.7.4
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-01-23 21:57 [PATCH] cpufreq: skip cpufreq resume if it's not suspended Bo Yan
  2018-01-24  2:02 ` Rafael J. Wysocki
@ 2018-02-05  9:19 ` Rafael J. Wysocki
  2018-02-05  9:23   ` Viresh Kumar
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2018-02-05  9:19 UTC
  To: Bo Yan; +Cc: viresh.kumar, sgurrappadi, linux-pm, linux-kernel

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> cpufreq_resume can be called even without preceding cpufreq_suspend.
> This can happen in following scenario:
>
>     suspend_devices_and_enter
>        --> dpm_suspend_start
>           --> dpm_prepare
>               --> device_prepare : this function errors out
>           --> dpm_suspend: this is skipped due to dpm_prepare failure
>                            this means cpufreq_suspend is skipped over
>        --> goto Recover_platform, due to previous error
>           --> goto Resume_devices
>               --> dpm_resume_end
>                   --> dpm_resume
>                       --> cpufreq_resume
>
> In case schedutil is used as frequency governor, cpufreq_resume will
> eventually call sugov_start, which does following:
>
>     memset(sg_cpu, 0, sizeof(*sg_cpu));
>     ....
>
> This effectively erases function pointer for frequency update, causing
> crash later on. The function pointer would have been set correctly if
> subsequent cpufreq_add_update_util_hook runs successfully, but that
> function returns earlier because cpufreq_suspend was not called:
>
>     if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
> 		return;
>
> Ideally, suspend should succeed, then things will be fine. But even
> in case of suspend failure, system should not crash.
>
> The fix is to check cpufreq_suspended first, if it's false, that means
> cpufreq_suspend was not called in the first place, so do not resume
> cpufreq.
>
> Signed-off-by: Bo Yan <byan@nvidia.com>
> ---
>  drivers/cpufreq/cpufreq.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 41d148af7748..95b1c4afe14e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>  	if (!cpufreq_driver)
>  		return;
>
> +	if (unlikely(!cpufreq_suspended)) {
> +		pr_warn("%s: resume after failing suspend\n", __func__);
> +		return;
> +	}
>  	cpufreq_suspended = false;
>
>  	if (!has_target() && !cpufreq_driver->resume)

I've just edited this patch somewhat (mostly by dropping the pr_warn())
and queued it up.

Thanks,
Rafael
* Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
  2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
@ 2018-02-05  9:23   ` Viresh Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2018-02-05  9:23 UTC
  To: Rafael J. Wysocki; +Cc: Bo Yan, sgurrappadi, linux-pm, linux-kernel

On 05-02-18, 10:19, Rafael J. Wysocki wrote:
> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 41d148af7748..95b1c4afe14e 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> >  	if (!cpufreq_driver)
> >  		return;
> >
> > +	if (unlikely(!cpufreq_suspended)) {
> > +		pr_warn("%s: resume after failing suspend\n", __func__);
> > +		return;
> > +	}
> >  	cpufreq_suspended = false;
> >
> >  	if (!has_target() && !cpufreq_driver->resume)
>
> I've just edited this patch somewhat (mostly by dropping the pr_warn())
> and queued it up.

You can add my Ack as well.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh
end of thread, other threads:[~2018-02-15 22:08 UTC | newest]

Thread overview: 14+ messages
2018-01-23 21:57 [PATCH] cpufreq: skip cpufreq resume if it's not suspended Bo Yan
2018-01-24  2:02 ` Rafael J. Wysocki
2018-01-24 20:53   ` Bo Yan
2018-02-02 11:54     ` Rafael J. Wysocki
2018-02-02 19:34       ` Saravana Kannan
2018-02-02 21:28         ` Bo Yan
2018-02-05  4:01           ` Viresh Kumar
2018-02-05  8:50             ` Rafael J. Wysocki
2018-02-05  9:05               ` Viresh Kumar
2018-02-15 21:27                 ` Saravana Kannan
2018-02-15 22:06                   ` Rafael J. Wysocki
2018-01-25 19:15   ` [PATCH v2] " Bo Yan
2018-02-05  9:19 ` [PATCH] " Rafael J. Wysocki
2018-02-05  9:23   ` Viresh Kumar