* [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64
@ 2022-11-05 17:42 Zhang Rui
  2022-11-05 17:42 ` [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold Zhang Rui
  ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Zhang Rui @ 2022-11-05 17:42 UTC (permalink / raw)
  To: rjw, daniel.lezcano; +Cc: linux-pm, linux-kernel

ladder_device_state.threshold.promotion_time_ns/demotion_time_ns are of
u64 type. In ladder_select_state(), the variable 'last_residency',
calculated as

	last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns

is of s64 type and can be negative. When this happens, the comparisons
between 'last_residency' and 'promotion_time_ns'/'demotion_time_ns'
become bogus, and the ladder governor promotes or stays with the
current state erroneously.

  <idle>-0  [001] d..1.  151.893396: ladder_select_state: last_idx 7, last_residency -373033
  <idle>-0  [001] d..1.  151.893399: ladder_select_state: dev->last_residency_ns 106967, drv->states[last_idx].exit_latency_ns 480000
  <idle>-0  [001] d..1.  151.893402: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000
  <idle>-0  [001] d..1.  151.893404: ladder_select_state: ---> new state 7
  <idle>-0  [001] d..1.  151.893465: ladder_select_state: last_idx 7, last_residency -463800
  <idle>-0  [001] d..1.  151.893467: ladder_select_state: dev->last_residency_ns 16200, drv->states[last_idx].exit_latency_ns 480000
  <idle>-0  [001] d..1.  151.893468: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000
  <idle>-0  [001] dn.1.  151.893470: ladder_select_state: ---> new state 8

Given that promotion_time_ns/demotion_time_ns are initialized with
cpuidle_state.exit_latency_ns, which is of s64 type, and that they are
only compared with 'last_residency', which is also of s64 type, there
is no reason to use u64 for promotion_time_ns/demotion_time_ns.

With this patch,

  <idle>-0  [001] d..1.  523.578531: ladder_select_state: last_idx 8, last_residency -879453
  <idle>-0  [001] d..1.  523.578531: ladder_select_state: dev->last_residency_ns 10547, drv->states[last_idx].exit_latency_ns 890000
  <idle>-0  [001] d..1.  523.578532: ladder_select_state: demote , last_state->threshold.demotion_time_ns 890000
  <idle>-0  [001] d..1.  523.578532: ladder_select_state: ---> new state 7
  <idle>-0  [001] d..1.  523.580220: ladder_select_state: last_idx 7, last_residency -169629
  <idle>-0  [001] d..1.  523.580221: ladder_select_state: dev->last_residency_ns 310371, drv->states[last_idx].exit_latency_ns 480000
  <idle>-0  [001] d..1.  523.580221: ladder_select_state: demote , last_state->threshold.demotion_time_ns 480000
  <idle>-0  [001] d..1.  523.580222: ladder_select_state: ---> new state 6

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/cpuidle/governors/ladder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 8e9058c4ea63..fb61118aef37 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -27,8 +27,8 @@ struct ladder_device_state {
 	struct {
 		u32 promotion_count;
 		u32 demotion_count;
-		u64 promotion_time_ns;
-		u64 demotion_time_ns;
+		s64 promotion_time_ns;
+		s64 demotion_time_ns;
 	} threshold;
 	struct {
 		int promotion_count;
--
2.25.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
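The conversion rule behind the bug above can be demonstrated in isolation: in C, when a signed 64-bit value is compared against an unsigned 64-bit value, the signed operand is converted to unsigned, so a negative 'last_residency' such as -373033 becomes a huge positive value and the promotion test always passes. Below is a minimal standalone sketch of this (the helper names are illustrative, not from the kernel; stdint typedefs stand in for the kernel's s64/u64):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef int64_t s64;
typedef uint64_t u64;

/*
 * Bogus check as in the old code: an s64 compared against a u64.
 * The usual arithmetic conversions turn last_residency into u64, so
 * a negative residency becomes an enormous unsigned value and the
 * comparison is always true.
 */
static bool should_promote_buggy(s64 last_residency, u64 promotion_time_ns)
{
	return last_residency > promotion_time_ns;
}

/* Fixed check: both sides s64, a negative residency never promotes. */
static bool should_promote_fixed(s64 last_residency, s64 promotion_time_ns)
{
	return last_residency > promotion_time_ns;
}
```

With the values from the trace above, the buggy variant "promotes" on a residency of -373033 ns against a 480000 ns threshold, while the fixed variant correctly does not.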
* [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold
  2022-11-05 17:42 [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Zhang Rui
@ 2022-11-05 17:42 ` Zhang Rui
  2022-11-23 17:50   ` Rafael J. Wysocki
  2022-11-05 17:42 ` [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states Zhang Rui
  2022-11-23 17:37 ` [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Rafael J. Wysocki
  2 siblings, 1 reply; 10+ messages in thread
From: Zhang Rui @ 2022-11-05 17:42 UTC (permalink / raw)
  To: rjw, daniel.lezcano; +Cc: linux-pm, linux-kernel

After fixing the bogus comparison between u64 and s64, the ladder
governor stops making promotion decisions erroneously.

However, after this, it is found that the ladder governor demotes much
more easily than it promotes.

Below is captured using turbostat after 30 seconds of runtime idle.

Without the previous patch,
Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
0.30   2373  0     0    0    4    9    25   122  326  2857  0.36    0.04    0.57    98.73   1.48

With the previous patch,
Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
0.42   3071  0     771  838  447  327  336  382  299  344   34.18   16.21   17.69   31.51   2.00

And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.

With this patch,
Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
0.39   2436  0     1    72   177  51   194  243  799  1883  0.50    0.32    0.35    98.45   1.53

Note that this is an experimental patch to illustrate the problem, and
it has been checked with the idle scenario only for now. I will try to
evaluate more scenarios, and if someone can help evaluate more
scenarios at the same time and provide data on the benefit of
different PROMOTION_COUNT/DEMOTION_COUNT values, that would be great.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/cpuidle/governors/ladder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index fb61118aef37..4b47aa0a4da9 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -20,8 +20,8 @@
 #include <asm/io.h>
 #include <linux/uaccess.h>
 
-#define PROMOTION_COUNT 4
-#define DEMOTION_COUNT 1
+#define PROMOTION_COUNT 2
+#define DEMOTION_COUNT 4
 
 struct ladder_device_state {
 	struct {
--
2.25.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold
  2022-11-05 17:42 ` [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold Zhang Rui
@ 2022-11-23 17:50   ` Rafael J. Wysocki
  2022-11-23 23:53     ` Doug Smythies
  2022-11-25  6:38     ` Zhang Rui
  0 siblings, 2 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2022-11-23 17:50 UTC (permalink / raw)
  To: Zhang Rui; +Cc: rjw, daniel.lezcano, linux-pm, linux-kernel

On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
>
> After fixing the bogus comparison between u64 and s64, the ladder
> governor stops making promotion decisions erroneously.
>
> However, after this, it is found that the ladder governor demotes much
> more easily than it promotes.

"After fixing an error related to using signed and unsigned integers
in the ladder governor in a previous patch, that governor turns out to
demote much easier than promote"

> Below is captured using turbostat after 30 seconds of runtime idle.
>
> Without the previous patch,
> Busy%  IRQ   POLL  C1  C1E  C3  C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> 0.30   2373  0     0   0    4   9   25   122  326  2857  0.36    0.04    0.57    98.73   1.48

Why is the above relevant?

> With the previous patch,
> Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10  CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> 0.42   3071  0     771  838  447  327  336  382  299  344  34.18   16.21   17.69   31.51   2.00
>
> And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.

I would explain why/how the PROMOTION_COUNT/DEMOTION_COUNT imbalance
causes this.

I guess more residency in the deeper idle state is expected? Or desired??

> With this patch,
> Busy%  IRQ   POLL  C1  C1E  C3   C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> 0.39   2436  0     1   72   177  51  194  243  799  1883  0.50    0.32    0.35    98.45   1.53
>
> Note that this is an experimental patch to illustrate the problem,
> and it has been checked with the idle scenario only for now.
> I will try to evaluate more scenarios, and if someone can help
> evaluate more scenarios at the same time and provide data on the
> benefit of different PROMOTION_COUNT/DEMOTION_COUNT values, that
> would be great.

So yes, this requires more work.

Overall, I think that you are concerned that the previous change might
be regarded as a regression and are trying to compensate for it with a
PROMOTION_COUNT/DEMOTION_COUNT change.

I'm not sure I can agree with that approach, because the shallower
idle states might be preferred by the original ladder design
intentionally, for performance reasons.

> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  drivers/cpuidle/governors/ladder.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index fb61118aef37..4b47aa0a4da9 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -20,8 +20,8 @@
>  #include <asm/io.h>
>  #include <linux/uaccess.h>
>
> -#define PROMOTION_COUNT 4
> -#define DEMOTION_COUNT 1
> +#define PROMOTION_COUNT 2
> +#define DEMOTION_COUNT 4
>
>  struct ladder_device_state {
>         struct {
> --

^ permalink raw reply	[flat|nested] 10+ messages in thread
* RE: [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold
  2022-11-23 17:50   ` Rafael J. Wysocki
@ 2022-11-23 23:53     ` Doug Smythies
  0 siblings, 0 replies; 10+ messages in thread
From: Doug Smythies @ 2022-11-23 23:53 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Zhang Rui'
  Cc: rjw, daniel.lezcano, linux-pm, linux-kernel, Doug Smythies

On 2022.11.23 09:50 Rafael wrote:
> On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
>>
>> After fixing the bogus comparison between u64 and s64, the ladder
>> governor stops making promotion decisions erroneously.
>>
>> However, after this, it is found that the ladder governor demotes much
>> more easily than it promotes.
>
> "After fixing an error related to using signed and unsigned integers
> in the ladder governor in a previous patch, that governor turns out to
> demote much easier than promote"
>
>> Below is captured using turbostat after 30 seconds of runtime idle.
>>
>> Without the previous patch,
>> Busy%  IRQ   POLL  C1  C1E  C3  C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
>> 0.30   2373  0     0   0    4   9   25   122  326  2857  0.36    0.04    0.57    98.73   1.48
>
> Why is the above relevant?
>
>> With the previous patch,
>> Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10  CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
>> 0.42   3071  0     771  838  447  327  336  382  299  344  34.18   16.21   17.69   31.51   2.00
>>
>> And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.
>
> I would explain why/how the PROMOTION_COUNT/DEMOTION_COUNT imbalance
> causes this.
>
> I guess more residency in the deeper idle state is expected? Or desired??
>
>> With this patch,
>> Busy%  IRQ   POLL  C1  C1E  C3   C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
>> 0.39   2436  0     1   72   177  51  194  243  799  1883  0.50    0.32    0.35    98.45   1.53
>>
>> Note that this is an experimental patch to illustrate the problem,
>> and it has been checked with the idle scenario only for now.
>> I will try to evaluate more scenarios, and if someone can help
>> evaluate more scenarios at the same time and provide data on the
>> benefit of different PROMOTION_COUNT/DEMOTION_COUNT values, that
>> would be great.
>
> So yes, this requires more work.
>
> Overall, I think that you are concerned that the previous change might
> be regarded as a regression and are trying to compensate for it with a
> PROMOTION_COUNT/DEMOTION_COUNT change.
>
> I'm not sure I can agree with that approach, because the shallower
> idle states might be preferred by the original ladder design
> intentionally, for performance reasons.

Hi All,

Because I was continuing to test the teo governor with the util patch
version 4, it was fairly easy for me to test this patch set also.
However, I have had difficulty finding enough time to write up my
results.

The best improvement was for a slow-speed ping-pong (I did 3 speeds:
fast, medium, and slow): 2 pairs of ping-pongs, no forced CPU
affinity, schedutil CPU scaling governor, intel_cpufreq CPU scaling
driver, HWP disabled. The menu governor was considered the master
reference:

Old ladder was 44% slower.
New ladder was 5.9% slower.

Just for reference:

Old teo was 29% slower.
teo util V4 was 13% slower.

The worst degradation was for a fast-speed ping-pong: 2 pairs of
ping-pongs, no forced CPU affinity, schedutil CPU scaling governor,
intel_cpufreq CPU scaling driver, HWP disabled. The menu governor was
considered the master reference:

Old ladder was 64% slower.
New ladder was 71% slower.

Interestingly, the old ladder governor outperformed the menu governor
on the slow 2-pair ping-pong with the performance governor:

Old ladder was 0.56% faster.
New ladder was 0.81% slower.

Disclaimer: Test results using the schedutil CPU scaling governor are
noisy, with questionable repeatability.

I'll try to get all the test results written up soon.

... Doug

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold
  2022-11-23 17:50   ` Rafael J. Wysocki
  2022-11-23 23:53     ` Doug Smythies
@ 2022-11-25  6:38     ` Zhang Rui
  2022-11-25 13:36       ` Rafael J. Wysocki
  1 sibling, 1 reply; 10+ messages in thread
From: Zhang Rui @ 2022-11-25  6:38 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: rjw, daniel.lezcano, linux-pm, linux-kernel

Hi, Rafael,

Thanks for reviewing the patch series.

On Wed, 2022-11-23 at 18:50 +0100, Rafael J. Wysocki wrote:
> On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
> > After fixing the bogus comparison between u64 and s64, the ladder
> > governor stops making promotion decisions erroneously.
> >
> > However, after this, it is found that the ladder governor demotes
> > much more easily than it promotes.
>
> "After fixing an error related to using signed and unsigned integers
> in the ladder governor in a previous patch, that governor turns out
> to demote much easier than promote"
>
> > Below is captured using turbostat after 30 seconds of runtime idle.
> >
> > Without the previous patch,
> > Busy%  IRQ   POLL  C1  C1E  C3  C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > 0.30   2373  0     0   0    4   9   25   122  326  2857  0.36    0.04    0.57    98.73   1.48
>
> Why is the above relevant?

Just for comparison purposes.

For a pure idle scenario (Busy% < 0.5), with the ladder governor, we
used to have 99% CPU%c7, but now, with patch 1/3,

CPU%c1  CPU%c3  CPU%c6  CPU%c7
34.18   16.21   17.69   31.51

This does not look like correct behavior for any cpuidle governor.

> > With the previous patch,
> > Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10  CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > 0.42   3071  0     771  838  447  327  336  382  299  344  34.18   16.21   17.69   31.51   2.00
> >
> > And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.
>
> I would explain why/how the PROMOTION_COUNT/DEMOTION_COUNT imbalance
> causes this.

Sure, how about something like the below?

PROMOTION_COUNT/DEMOTION_COUNT are the thresholds between the ladder
governor detecting that it "should promote/demote" and it doing a real
promotion/demotion.

Currently, PROMOTION_COUNT is set to 4 and DEMOTION_COUNT is set to 1.
This means that the ladder governor does a real demotion immediately
when it "should demote", but it does a real promotion only if it
"should promote" 4 times in a row, without a single "should demote"
occurring in between.

As a result, this lowers the chance of a real promotion, and the
ladder governor is more likely to choose a shallower state.

> I guess more residency in the deeper idle state is expected? Or
> desired??
>
> > With this patch,
> > Busy%  IRQ   POLL  C1  C1E  C3   C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > 0.39   2436  0     1   72   177  51  194  243  799  1883  0.50    0.32    0.35    98.45   1.53
> >
> > Note that this is an experimental patch to illustrate the problem,
> > and it has been checked with the idle scenario only for now.
> > I will try to evaluate more scenarios, and if someone can help
> > evaluate more scenarios at the same time and provide data on the
> > benefit of different PROMOTION_COUNT/DEMOTION_COUNT values, that
> > would be great.
>
> So yes, this requires more work.
>
> Overall, I think that you are concerned that the previous change
> might be regarded as a regression and are trying to compensate for it
> with a PROMOTION_COUNT/DEMOTION_COUNT change.

Exactly.

> I'm not sure I can agree with that approach, because the shallower
> idle states might be preferred by the original ladder design
> intentionally, for performance reasons.

hmmm, even if there is only 30% c7/c8/c9/c10 residency in a pure idle
scenario?

And furthermore, since the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
and the unsigned/signed integer problem have both been there since the
first day the ladder governor was introduced, in commit 4f86d3a8e297
("cpuidle: consolidate 2.6.22 cpuidle branch into one patch"), my
guess is that the unsigned/signed integer problem introduces a lot of
pseudo promotions, and PROMOTION_COUNT/DEMOTION_COUNT was introduced
to work around this so that the ladder governor doesn't get stuck in a
deep idle state.

I don't have solid proof for this. But at least for the pure idle
scenario, I don't think 30% deep idle residency is the right behavior,
and it needs to be tuned anyway.

thanks,
rui

> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  drivers/cpuidle/governors/ladder.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> > index fb61118aef37..4b47aa0a4da9 100644
> > --- a/drivers/cpuidle/governors/ladder.c
> > +++ b/drivers/cpuidle/governors/ladder.c
> > @@ -20,8 +20,8 @@
> >  #include <asm/io.h>
> >  #include <linux/uaccess.h>
> >
> > -#define PROMOTION_COUNT 4
> > -#define DEMOTION_COUNT 1
> > +#define PROMOTION_COUNT 2
> > +#define DEMOTION_COUNT 4
> >
> >  struct ladder_device_state {
> >         struct {
> > --

^ permalink raw reply	[flat|nested] 10+ messages in thread
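The counting behavior described in the reply above can be modeled in a few lines outside the kernel. This is an illustrative sketch only (the function and struct names are hypothetical; the real governor keeps these counters per state in struct ladder_device_state and combines them with residency thresholds):

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical per-state counters, modeled on the ladder governor. */
struct ladder_counts {
	int promotion_count;
	int demotion_count;
};

/*
 * Process one "should promote" / "should demote" signal.
 * Returns the state index delta: +1 for a real promotion, -1 for a
 * real demotion, 0 to stay put. A signal in one direction resets the
 * counter for the other direction, as in the real governor.
 */
static int ladder_step(struct ladder_counts *c, bool should_promote,
		       int promotion_threshold, int demotion_threshold)
{
	if (should_promote) {
		c->demotion_count = 0;
		if (++c->promotion_count >= promotion_threshold) {
			c->promotion_count = 0;
			return +1;
		}
	} else {
		c->promotion_count = 0;
		if (++c->demotion_count >= demotion_threshold) {
			c->demotion_count = 0;
			return -1;
		}
	}
	return 0;
}
```

Feeding this model alternating promote/demote signals shows the asymmetry: with the old thresholds (promote after 4, demote after 1) every single "should demote" signal moves the state down while the promotion counter never reaches 4, so the state drifts steadily shallower; with balanced-style thresholds such as 2/4, the same alternating input leaves the state where it is.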
* Re: [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold
  2022-11-25  6:38     ` Zhang Rui
@ 2022-11-25 13:36       ` Rafael J. Wysocki
  2022-11-27  3:18         ` Zhang Rui
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2022-11-25 13:36 UTC (permalink / raw)
  To: Zhang Rui; +Cc: Rafael J. Wysocki, rjw, daniel.lezcano, linux-pm, linux-kernel

On Fri, Nov 25, 2022 at 7:39 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> Hi, Rafael,
>
> Thanks for reviewing the patch series.
>
> On Wed, 2022-11-23 at 18:50 +0100, Rafael J. Wysocki wrote:
> > On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
> > > After fixing the bogus comparison between u64 and s64, the ladder
> > > governor stops making promotion decisions erroneously.
> > >
> > > However, after this, it is found that the ladder governor demotes
> > > much more easily than it promotes.
> >
> > "After fixing an error related to using signed and unsigned integers
> > in the ladder governor in a previous patch, that governor turns out
> > to demote much easier than promote"
> >
> > > Below is captured using turbostat after 30 seconds of runtime idle.
> > >
> > > Without the previous patch,
> > > Busy%  IRQ   POLL  C1  C1E  C3  C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.30   2373  0     0   0    4   9   25   122  326  2857  0.36    0.04    0.57    98.73   1.48
> >
> > Why is the above relevant?
>
> Just for comparison purposes.
>
> For a pure idle scenario (Busy% < 0.5), with the ladder governor, we
> used to have 99% CPU%c7, but now, with patch 1/3,
>
> CPU%c1  CPU%c3  CPU%c6  CPU%c7
> 34.18   16.21   17.69   31.51
>
> This does not look like correct behavior for any cpuidle governor.

It all depends on what the design goal was, and I don't really know it
in this particular case.

It looks like the plan was to make it promote less often than demote,
or the counts would have been chosen differently.

> > > With the previous patch,
> > > Busy%  IRQ   POLL  C1   C1E  C3   C6   C7s  C8   C9   C10  CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.42   3071  0     771  838  447  327  336  382  299  344  34.18   16.21   17.69   31.51   2.00
> > >
> > > And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.
> >
> > I would explain why/how the PROMOTION_COUNT/DEMOTION_COUNT imbalance
> > causes this.
>
> Sure, how about something like the below?
>
> PROMOTION_COUNT/DEMOTION_COUNT are the thresholds between the ladder
> governor detecting that it "should promote/demote" and it doing a
> real promotion/demotion.
>
> Currently, PROMOTION_COUNT is set to 4 and DEMOTION_COUNT is set to 1.
> This means that the ladder governor does a real demotion immediately
> when it "should demote", but it does a real promotion only if it
> "should promote" 4 times in a row, without a single "should demote"
> occurring in between.
>
> As a result, this lowers the chance of a real promotion, and the
> ladder governor is more likely to choose a shallower state.

Sounds good, and now the question is what behavior users expect. Do we
have any data?

> > I guess more residency in the deeper idle state is expected? Or
> > desired??
> >
> > > With this patch,
> > > Busy%  IRQ   POLL  C1  C1E  C3   C6  C7s  C8   C9   C10   CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.39   2436  0     1   72   177  51  194  243  799  1883  0.50    0.32    0.35    98.45   1.53
> > >
> > > Note that this is an experimental patch to illustrate the problem,
> > > and it has been checked with the idle scenario only for now.
> > > I will try to evaluate more scenarios, and if someone can help
> > > evaluate more scenarios at the same time and provide data on the
> > > benefit of different PROMOTION_COUNT/DEMOTION_COUNT values, that
> > > would be great.
> >
> > So yes, this requires more work.
> >
> > Overall, I think that you are concerned that the previous change
> > might be regarded as a regression and are trying to compensate for
> > it with a PROMOTION_COUNT/DEMOTION_COUNT change.
>
> Exactly.
>
> > I'm not sure I can agree with that approach, because the shallower
> > idle states might be preferred by the original ladder design
> > intentionally, for performance reasons.
>
> hmmm, even if there is only 30% c7/c8/c9/c10 residency in a pure idle
> scenario?

Yes, even in that case. It all boils down to the question of user
expectations.

> And furthermore, since the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> and the unsigned/signed integer problem have both been there since
> the first day the ladder governor was introduced, in commit
> 4f86d3a8e297 ("cpuidle: consolidate 2.6.22 cpuidle branch into one
> patch"),
>
> my guess is that
>
> the unsigned/signed integer problem introduces a lot of pseudo
> promotions, and PROMOTION_COUNT/DEMOTION_COUNT was introduced to work
> around this so that the ladder governor doesn't get stuck in a deep
> idle state.

That may be a good guess, so I would add it to the changelog of the
patch.

> I don't have solid proof for this. But at least for the pure idle
> scenario, I don't think 30% deep idle residency is the right
> behavior, and it needs to be tuned anyway.

Well, have you checked what happens if the counts are set to the same
value, e.g. 2?

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold
  2022-11-25 13:36       ` Rafael J. Wysocki
@ 2022-11-27  3:18         ` Zhang Rui
  0 siblings, 0 replies; 10+ messages in thread
From: Zhang Rui @ 2022-11-27  3:18 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: rjw, daniel.lezcano, linux-pm, linux-kernel

> > I don't have solid proof for this. But at least for the pure idle
> > scenario, I don't think 30% deep idle residency is the right
> > behavior, and it needs to be tuned anyway.
>
> Well, have you checked what happens if the counts are set to the same
> value, e.g. 2?

Well, this is embarrassing. I found a problem with my previous data
when I re-evaluated it following your suggestion.

In short,
1. The 30% deep idle residency problem was observed when I had added
   some trace_printk() calls in ladder_select_state().
2. Without those trace_printk() calls, after patch 1, the ladder
   governor can still get 98% CPU%c7 in the pure idle scenario.

Currently, my understanding is that trace_printk() can call
__schedule(), and this increases the chance that call_cpuidle()
returns immediately. When this happens, dev->last_residency_ns is set
to 0, which results in a real demotion next time.

Anyway, you were right to question this approach, because this seems
to be a different problem, or even a false alarm.

So, I think I will submit patches 1/3 and 3/3, as they are bug fixes,
and drop this patch for now, leaving the tuning work, if there is any,
to the real ladder governor users.

What do you think?

thanks,
rui

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states 2022-11-05 17:42 [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Zhang Rui 2022-11-05 17:42 ` [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold Zhang Rui @ 2022-11-05 17:42 ` Zhang Rui 2022-11-23 17:56 ` Rafael J. Wysocki 2022-11-23 17:37 ` [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Rafael J. Wysocki 2 siblings, 1 reply; 10+ messages in thread From: Zhang Rui @ 2022-11-05 17:42 UTC (permalink / raw) To: rjw, daniel.lezcano; +Cc: linux-pm, linux-kernel There are two reasons why CPU idle states may be disabled: either because the driver has disabled them or because they have been disabled by user space via sysfs. Handling for the driver-disabled state in the ladder gonover has three main rules. Two are introduced by commit 62d6ae880e3e ("Honor state disabling in the cpuidle ladder governor"). The behavior is described as below "The behavior and the effect of the disable variable depends on the implementation of a particular governor. In the ladder governor, for example, it is not coherent, i.e. if one is disabling a light state, then all deeper states are disabled as well, but the disable variable does not reflect it. Likewise, if one enables a deep state but a lighter state still is disabled, then this has no effect." So Rule 1. when promote, only checks the 'disable' flag for the next deeper state. If it is disabled, consider all deeper states as disabled and stick with current state. Rule 2. when demote, ignore the 'disable' flag of the next shallower state, because when a deeper state is used, all of its shallower states must be enabled. The third one is introduced by commit a2bd92023357 ("cpuidle: Do not use poll_idle unless user asks for it"). Rule 3. never demote to POLL unless the latency requirement is 0. 
Handling for the sysfs-disabled state in the ladder governor is introduced by commit 66804c13f7b7 ("PM / cpuidle: Make ladder governor use the "disabled" state flag"). It copies the same logic as driver-disabled state, but this might break because the sysfs-disabled state has different definition and it can be changed at runtime. Today, when ladder governor is used, by playing with the sysfs "disable" attribute, the below behavior is observed. 1. After disabling a specific state, if the CPU was in a deeper state previously, it can still request for the disabled state. 2. After disabling a specific state, if the CPU was in a deeper state previously, it can still request for a state deeper than the disabled state. This behavior is kept until it demotes to a state shallower than the disabled state, and Rule 1 starts to take effect then. The time for Rule 1 to take effect may be long, depending on the workload. For example, on an Intel Kabylake platform, disabling C1E (state 2) does not take effect after 30 seconds in idle scenario. 3. When all non-POLL states are disabled (or just with state1 and state2 disabled), the ladder governor demotes to shallower states, and finally stuck in state 1 (the shallowest non-POLL state), even if the state is disabled. So the previous logic for the driver-disabled state does not work well for the sysfs-disabled state case. Note that with commit 99e98d3fb100 ("cpuidle: Consolidate disabled state checks"), these two cases are combined to share one flag, thus the behaviors for handling the driver-disabled state and the sysfs-disabled state *HAVE TO* be consistent now. Now the question is what behaviors should be used? I'm not sure why ladder governor handles driver-disabled state differently than other governors. And IMO, it also conflicts with the expectation of the sysfs 'disable' attribute, as described in Documentation/admin-guide/pm/cpuidle.rst, "disable Whether or not this idle state is disabled." 
So this patch changes the ladder governor to align with the sysfs 'disable' attribute definition. This means, 1. when promote, always choose the next enabled deeper state 2. when demote, always choose the next enabled shallower state plus, Rule 3 is kept and enhanced 3. Unless latency requirement is not met, never chooses POLL. (Previously, unless the latency requirement is set to 0, ladder governor won't choose POLL even if the shallowest non-POLL state does not meet the latency requirement) Any comments on this would be really appreciated. Reported-by: Junxiao Chang <junxiao.chang@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> --- Documentation/admin-guide/pm/cpuidle.rst | 4 +- drivers/cpuidle/governors/ladder.c | 149 ++++++++++++++++++----- 2 files changed, 117 insertions(+), 36 deletions(-) diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst index 19754beb5a4e..be21690211ce 100644 --- a/Documentation/admin-guide/pm/cpuidle.rst +++ b/Documentation/admin-guide/pm/cpuidle.rst @@ -470,9 +470,7 @@ governor will never select it for this particular CPU and the ``CPUIdle`` driver will never ask the hardware to enter it for that CPU as a result. However, disabling an idle state for one CPU does not prevent it from being asked for by the other CPUs, so it must be disabled for all of them in order to -never be asked for by any of them. [Note that, due to the way the ``ladder`` -governor is implemented, disabling an idle state prevents that governor from -selecting any idle states deeper than the disabled one too.] +never be asked for by any of them. 
If the :file:`disable` attribute contains 0, the given idle state is enabled for this particular CPU, but it still may be disabled for some or all of the other diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c index 4b47aa0a4da9..be8ddb792ad8 100644 --- a/drivers/cpuidle/governors/ladder.c +++ b/drivers/cpuidle/governors/ladder.c @@ -57,6 +57,78 @@ static inline void ladder_do_selection(struct cpuidle_device *dev, dev->last_state_idx = new_idx; } +/* + * Choose the first enabled shallower state that meets the latency requirement + */ +static int get_shallower(struct cpuidle_driver *drv, struct cpuidle_device *dev, + int idx, s64 latency_req, bool ignore_poll) +{ + int i; + + /* Choose the first shallower state that meets the requirement */ + for (i = idx; i >= 0; i--) { + if (dev->states_usage[i].disable) + continue; + if (ignore_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING) + continue; + if (drv->states[i].exit_latency_ns <= latency_req) + return i; + } + + /* Choose the first deeper one if no suitable shallower states found */ + for (i = idx + 1; i < drv->state_count; i++) { + if (dev->states_usage[i].disable) + continue; + if (drv->states[i].exit_latency_ns <= latency_req) + return i; + /* all deeper states cannot meet latency requirement */ + break; + } + + /* + * When comes here, there are two possibilities, + * 1. all enabled state do not meet the latency requirement + * 2. 
Only POLL meets the latency requirement but ignore_poll is set + * in both cases, the first enabled state should be choosed + */ + for (i = 0; i < drv->state_count; i++) + if (!dev->states_usage[i].disable) + return i; + return 0; +} + +/* + * Choose the first enabled deeper state that meets the latency requirement + */ +static int get_deeper(struct cpuidle_driver *drv, struct cpuidle_device *dev, + int idx, s64 latency_req) +{ + int i, shallowest = -1; + + for (i = idx; i < drv->state_count; i++) { + if (dev->states_usage[i].disable) + continue; + if (shallowest == -1) + shallowest = i; + if (drv->states[i].exit_latency_ns <= latency_req) + return i; + /* No need to search for deeper state because latency_req cannot be met */ + break; + } + + /* Choose a shallower state if no deeper state found */ + for (i = idx - 1; i >= 0; i--) { + if (dev->states_usage[i].disable) + continue; + shallowest = i; + if (drv->states[i].exit_latency_ns <= latency_req) + return i; + } + + /* Choose the shallowest enabled state if latency_req cannot be met */ + return shallowest == -1 ? 0 : shallowest; +} + /** * ladder_select_state - selects the next state to enter * @drv: cpuidle driver @@ -69,59 +141,63 @@ static int ladder_select_state(struct cpuidle_driver *drv, struct ladder_device *ldev = this_cpu_ptr(&ladder_devices); struct ladder_device_state *last_state; int last_idx = dev->last_state_idx; - int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 
1 : 0; + int next_idx; s64 latency_req = cpuidle_governor_latency_req(dev->cpu); s64 last_residency; /* Special case when user has set very strict latency requirement */ if (unlikely(latency_req == 0)) { - ladder_do_selection(dev, ldev, last_idx, 0); - return 0; + /* Choose the shallowest state */ + next_idx = get_deeper(drv, dev, 0, 0); + goto end; } last_state = &ldev->states[last_idx]; last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns; - /* consider promotion */ - if (last_idx < drv->state_count - 1 && - !dev->states_usage[last_idx + 1].disable && - last_residency > last_state->threshold.promotion_time_ns && - drv->states[last_idx + 1].exit_latency_ns <= latency_req) { + /* meet latency requirement first */ + if (drv->states[last_idx].exit_latency_ns > latency_req) { + /* Need to consider POLL because of latency requirement */ + next_idx = get_shallower(drv, dev, last_idx - 1, latency_req, 0); + goto end; + } + + /* choose a deeper state because of promotion */ + if (last_residency > last_state->threshold.promotion_time_ns) { + next_idx = get_deeper(drv, dev, last_idx + 1, latency_req); + + /* no usable deeper state, use available deepest one */ + if (next_idx <= last_idx) + goto end; last_state->stats.promotion_count++; last_state->stats.demotion_count = 0; - if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) { - ladder_do_selection(dev, ldev, last_idx, last_idx + 1); - return last_idx + 1; - } + if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) + goto end; + goto remain_cur; } - /* consider demotion */ - if (last_idx > first_idx && - (dev->states_usage[last_idx].disable || - drv->states[last_idx].exit_latency_ns > latency_req)) { - int i; - - for (i = last_idx - 1; i > first_idx; i--) { - if (drv->states[i].exit_latency_ns <= latency_req) - break; - } - ladder_do_selection(dev, ldev, last_idx, i); - return i; - } + /* choose a shallower state because of demotion */ + if 
(last_residency < last_state->threshold.demotion_time_ns) { + next_idx = get_shallower(drv, dev, last_idx - 1, latency_req, 1); - if (last_idx > first_idx && - last_residency < last_state->threshold.demotion_time_ns) { + /* no usable shallower state, use available shallowest one */ + if (next_idx >= last_idx) + goto end; last_state->stats.demotion_count++; last_state->stats.promotion_count = 0; - if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) { - ladder_do_selection(dev, ldev, last_idx, last_idx - 1); - return last_idx - 1; - } + if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) + goto end; } - /* otherwise remain at the current state */ - return last_idx; +remain_cur: + /* remain at the current state, re-checking in case it has been disabled */ + next_idx = get_shallower(drv, dev, last_idx, latency_req, 1); + +end: + if (next_idx != last_idx) + ladder_do_selection(dev, ldev, last_idx, next_idx); + return next_idx; } /** @@ -152,8 +228,15 @@ static int ladder_enable_device(struct cpuidle_driver *drv, if (i < drv->state_count - 1) lstate->threshold.promotion_time_ns = state->exit_latency_ns; + else + /* Effectively disable promotion from deepest state */ + lstate->threshold.promotion_time_ns = S64_MAX; + if (i > first_idx) lstate->threshold.demotion_time_ns = state->exit_latency_ns; + else + /* Effectively disable demotion from shallowest state */ + lstate->threshold.demotion_time_ns = S64_MIN; } return 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states 2022-11-05 17:42 ` [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states Zhang Rui @ 2022-11-23 17:56 ` Rafael J. Wysocki 0 siblings, 0 replies; 10+ messages in thread From: Rafael J. Wysocki @ 2022-11-23 17:56 UTC (permalink / raw) To: Zhang Rui; +Cc: rjw, daniel.lezcano, linux-pm, linux-kernel On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote: > > There are two reasons why CPU idle states may be disabled: either > because the driver has disabled them or because they have been disabled > by user space via sysfs. > > Handling for the driver-disabled state in the ladder governor has three > main rules. > > Two are introduced by commit 62d6ae880e3e ("Honor state disabling in the > cpuidle ladder governor"). The behavior is described as follows: > > "The behavior and the effect of the disable variable depends on the > implementation of a particular governor. In the ladder governor, for > example, it is not coherent, i.e. if one is disabling a light state, > then all deeper states are disabled as well, but the disable variable > does not reflect it. Likewise, if one enables a deep state but a lighter > state still is disabled, then this has no effect." > > So > Rule 1. when promoting, only check the 'disable' flag of the next deeper > state. If it is disabled, consider all deeper states as disabled > and stick with the current state. > Rule 2. when demoting, ignore the 'disable' flag of the next shallower > state, because when a deeper state is used, all of its shallower > states must be enabled. > > The third one is introduced by commit a2bd92023357 ("cpuidle: Do not use > poll_idle unless user asks for it"). > > Rule 3. never demote to POLL unless the latency requirement is 0. > > Handling for the sysfs-disabled state in the ladder governor is > introduced by commit 66804c13f7b7 ("PM / cpuidle: Make ladder governor > use the "disabled" state flag").
It copies the same logic as the > driver-disabled state handling, but this may break because the > sysfs-disabled state has a different definition and can be changed at > runtime. > > Today, when the ladder governor is used, playing with the sysfs "disable" > attribute produces the behavior below. > 1. After disabling a specific state, if the CPU was in a deeper state > previously, it can still request the disabled state. > 2. After disabling a specific state, if the CPU was in a deeper state > previously, it can still request a state deeper than the disabled > state. > This behavior persists until the CPU demotes to a state shallower than the > disabled state, at which point Rule 1 starts to take effect. The time for > Rule 1 to take effect may be long, depending on the workload. For > example, on an Intel Kabylake platform, disabling C1E (state 2) had not > taken effect after 30 seconds in an idle scenario. > > 3. When all non-POLL states are disabled (or just with state1 and state2 > disabled), the ladder governor demotes to shallower states, and > finally gets stuck in state 1 (the shallowest non-POLL state), even if that > state is disabled. > > So the previous logic for the driver-disabled state does not work well > for the sysfs-disabled state case. > > Note that with commit 99e98d3fb100 ("cpuidle: Consolidate disabled state > checks"), these two cases were combined to share one flag, so the > behaviors for handling the driver-disabled state and the sysfs-disabled > state *HAVE TO* be consistent now. > > Now the question is which behavior should be used. > I'm not sure why the ladder governor handles the driver-disabled state > differently than other governors do. And IMO, it also conflicts with the > expectation of the sysfs 'disable' attribute, as described in > Documentation/admin-guide/pm/cpuidle.rst, > "disable Whether or not this idle state is disabled." > > So this patch changes the ladder governor to align with the sysfs > 'disable' attribute definition. > This means, > 1.
when promoting, always choose the next enabled deeper state > 2. when demoting, always choose the next enabled shallower state I agree with this. A disabled state should just be omitted and disabling one state should not affect the other states that have not been disabled. > plus, Rule 3 is kept and enhanced > 3. never choose POLL unless the latency requirement cannot be met > otherwise. (Previously, unless the latency requirement was set to 0, > the ladder governor would not choose POLL even if the shallowest > non-POLL state did not meet the latency requirement.) I agree with this too. > Any comments on this would be really appreciated. > > Reported-by: Junxiao Chang <junxiao.chang@intel.com> > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > --- > Documentation/admin-guide/pm/cpuidle.rst | 4 +- > drivers/cpuidle/governors/ladder.c | 149 ++++++++++++++++++----- > 2 files changed, 117 insertions(+), 36 deletions(-) > > diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst > index 19754beb5a4e..be21690211ce 100644 > --- a/Documentation/admin-guide/pm/cpuidle.rst > +++ b/Documentation/admin-guide/pm/cpuidle.rst > @@ -470,9 +470,7 @@ governor will never select it for this particular CPU and the ``CPUIdle`` > driver will never ask the hardware to enter it for that CPU as a result. > However, disabling an idle state for one CPU does not prevent it from being > asked for by the other CPUs, so it must be disabled for all of them in order to > -never be asked for by any of them. [Note that, due to the way the ``ladder`` > governor is implemented, disabling an idle state prevents that governor from > -selecting any idle states deeper than the disabled one too.] > +never be asked for by any of them.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 2022-11-05 17:42 [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Zhang Rui 2022-11-05 17:42 ` [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold Zhang Rui 2022-11-05 17:42 ` [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states Zhang Rui @ 2022-11-23 17:37 ` Rafael J. Wysocki 2 siblings, 0 replies; 10+ messages in thread From: Rafael J. Wysocki @ 2022-11-23 17:37 UTC (permalink / raw) To: Zhang Rui; +Cc: rjw, daniel.lezcano, linux-pm, linux-kernel On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote: > > ladder_device_state.threshold.promotion_time_ns/demotion_time_ns > are u64 type. > > In ladder_select_state(), variable 'last_residency', as calculated by > > last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns > > are s64 type, and it can be negative value. The code changes are fine AFAICS, but the description below could be more precise. > When this happens, comparing between 'last_residency' and > 'promotion_time_ns/demotion_time_ns' become bogus. IIUC, what happens is that last_residency is converted to u64 in the comparison expression and that conversion causes it to become a large positive number if it is negative. > As a result, the ladder governor promotes or stays with current state errornously. "promotes or retains the current state erroneously". > > <idle>-0 [001] d..1. 151.893396: ladder_select_state: last_idx 7, last_residency -373033 > <idle>-0 [001] d..1. 151.893399: ladder_select_state: dev->last_residency_ns 106967, drv->states[last_idx].exit_latency_ns 480000 > <idle>-0 [001] d..1. 151.893402: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000 > <idle>-0 [001] d..1. 151.893404: ladder_select_state: ---> new state 7 > <idle>-0 [001] d..1. 151.893465: ladder_select_state: last_idx 7, last_residency -463800 > <idle>-0 [001] d..1. 
151.893467: ladder_select_state: dev->last_residency_ns 16200, drv->states[last_idx].exit_latency_ns 480000 > <idle>-0 [001] d..1. 151.893468: ladder_select_state: promote, last_state->threshold.promotion_time_ns 480000 > <idle>-0 [001] dn.1. 151.893470: ladder_select_state: ---> new state 8 > > Given that promotion_time_ns/demotion_time_ns are initialized with > cpuidle_state.exit_latency_ns, which is s64 type, and they are used to > compare with 'last_residency', which is also s64 type, there is no "they are compared with" > reason to use u64 for promotion_time_ns/demotion_time_ns. "so change them both to be s64". > With this patch, > <idle>-0 [001] d..1. 523.578531: ladder_select_state: last_idx 8, last_residency -879453 > <idle>-0 [001] d..1. 523.578531: ladder_select_state: dev->last_residency_ns 10547, drv->states[last_idx].exit_latency_ns 890000 > <idle>-0 [001] d..1. 523.578532: ladder_select_state: demote , last_state->threshold.demotion_time_ns 890000 > <idle>-0 [001] d..1. 523.578532: ladder_select_state: ---> new state 7 > <idle>-0 [001] d..1. 523.580220: ladder_select_state: last_idx 7, last_residency -169629 > <idle>-0 [001] d..1. 523.580221: ladder_select_state: dev->last_residency_ns 310371, drv->states[last_idx].exit_latency_ns 480000 > <idle>-0 [001] d..1. 523.580221: ladder_select_state: demote , last_state->threshold.demotion_time_ns 480000 > <idle>-0 [001] d..1. 
523.580222: ladder_select_state: ---> new state 6 > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > --- > drivers/cpuidle/governors/ladder.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c > index 8e9058c4ea63..fb61118aef37 100644 > --- a/drivers/cpuidle/governors/ladder.c > +++ b/drivers/cpuidle/governors/ladder.c > @@ -27,8 +27,8 @@ struct ladder_device_state { > struct { > u32 promotion_count; > u32 demotion_count; > - u64 promotion_time_ns; > - u64 demotion_time_ns; > + s64 promotion_time_ns; > + s64 demotion_time_ns; > } threshold; > struct { > int promotion_count; > -- ^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages
2022-11-05 17:42 [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Zhang Rui
2022-11-05 17:42 ` [RFC PATCH 2/3] cpuidle: ladder: Tune promotion/demotion threshold Zhang Rui
2022-11-23 17:50   ` Rafael J. Wysocki
2022-11-23 23:53     ` Doug Smythies
2022-11-25  6:38       ` Zhang Rui
2022-11-25 13:36         ` Rafael J. Wysocki
2022-11-27  3:18           ` Zhang Rui
2022-11-05 17:42 ` [RFC PATCH 3/3] cpuidle: ladder: Fix handling for disabled states Zhang Rui
2022-11-23 17:56   ` Rafael J. Wysocki
2022-11-23 17:37 ` [RFC PATCH 1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 Rafael J. Wysocki