* [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse @ 2014-12-18 10:51 Thomas Gleixner 2014-12-18 14:01 ` Eduardo Valentin ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Thomas Gleixner @ 2014-12-18 10:51 UTC (permalink / raw) To: Preeti Murthy Cc: Preeti U Murthy, Viresh Kumar, Frederic Weisbecker, Fengguang Wu, Frederic Weisbecker, Pan, Jacob jun, LKML, LKP, Peter Zijlstra, Zhang Rui commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module use" was merged via the thermal tree without an explicit ack from the relevant maintainers. The exports are abused by the intel powerclamp driver which implements a fake idle state from a sched FIFO task. This causes all kinds of wreckage in the NOHZ core code which rightfully assumes that tick_nohz_idle_enter/exit() are only called from the idle task itself. Recent changes in the NOHZ core lead to a failure of the powerclamp driver and now people try to hack completely broken and backwards workarounds into the NOHZ core code. This is completely unacceptable. The real solution is to fix the powerclamp driver by rewriting it with a sane concept, but that's beyond the scope of this. So the only solution for now is to remove the calls into the core NOHZ code from the powerclamp trainwreck along with the exports. Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index b46c706e1cac..e98b4249187c 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -435,7 +435,6 @@ static int clamp_thread(void *arg) * allowed. thus jiffies are updated properly. */ preempt_disable(); - tick_nohz_idle_enter(); /* mwait until target jiffies is reached */ while (time_before(jiffies, target_jiffies)) { unsigned long ecx = 1; @@ -451,7 +450,6 @@ static int clamp_thread(void *arg) start_critical_timings(); atomic_inc(&idle_wakeup_counter); } - tick_nohz_idle_exit(); preempt_enable(); } del_timer_sync(&wakeup_timer); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 4d54b7540585..1363d58f07e9 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -847,7 +847,6 @@ void tick_nohz_idle_enter(void) local_irq_enable(); } -EXPORT_SYMBOL_GPL(tick_nohz_idle_enter); /** * tick_nohz_irq_exit - update next tick event from interrupt exit @@ -974,7 +973,6 @@ void tick_nohz_idle_exit(void) local_irq_enable(); } -EXPORT_SYMBOL_GPL(tick_nohz_idle_exit); static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now) { ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-18 10:51 [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse Thomas Gleixner @ 2014-12-18 14:01 ` Eduardo Valentin 2014-12-18 14:43 ` Thomas Gleixner 2014-12-18 17:28 ` Preeti U Murthy 2014-12-19 13:09 ` [tip:timers/urgent] " tip-bot for Thomas Gleixner 2 siblings, 1 reply; 12+ messages in thread From: Eduardo Valentin @ 2014-12-18 14:01 UTC (permalink / raw) To: Thomas Gleixner Cc: Preeti Murthy, Preeti U Murthy, Viresh Kumar, Frederic Weisbecker, Fengguang Wu, Frederic Weisbecker, Pan, Jacob jun, LKML, LKP, Peter Zijlstra, Zhang Rui [-- Attachment #1: Type: text/plain, Size: 3266 bytes --] Hello Thomas, On Thu, Dec 18, 2014 at 11:51:01AM +0100, Thomas Gleixner wrote: > commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module > use" was merged via the thermal tree without an explicit ack from the > relevant maintainers. > This is a shame. Rui, do you have any comments on this? > The exports are abused by the intel powerclamp driver which implements > a fake idle state from a sched FIFO task. This causes all kinds of > wreckage in the NOHZ core code which rightfully assumes that > tick_nohz_idle_enter/exit() are only called from the idle task itself. > OK. > Recent changes in the NOHZ core lead to a failure of the powerclamp > driver and now people try to hack completely broken and backwards > workarounds into the NOHZ core code. This is completely unacceptable. > I see. > The real solution is to fix the powerclamp driver by rewriting it with > a sane concept, but that's beyond the scope of this. > Do you have suggestions on what exactly is the expected rewriting or the correct sane concepts? > So the only solution for now is to remove the calls into the core NOHZ > code from the powerclamp trainwreck along with the exports. > > Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" If I got it right, the driver is currently broken due to changes in NOHZ core. So, does this patch fix power clamp behavior ? If I got your proposal right, in the end power clamp will be still broken, but at least won't be abusing NOHZ. Is that what you are proposing? > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > --- > diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c > index b46c706e1cac..e98b4249187c 100644 > --- a/drivers/thermal/intel_powerclamp.c > +++ b/drivers/thermal/intel_powerclamp.c > @@ -435,7 +435,6 @@ static int clamp_thread(void *arg) > * allowed. thus jiffies are updated properly. > */ > preempt_disable(); > - tick_nohz_idle_enter(); > /* mwait until target jiffies is reached */ > while (time_before(jiffies, target_jiffies)) { > unsigned long ecx = 1; > @@ -451,7 +450,6 @@ static int clamp_thread(void *arg) > start_critical_timings(); > atomic_inc(&idle_wakeup_counter); > } > - tick_nohz_idle_exit(); > preempt_enable(); > } > del_timer_sync(&wakeup_timer); > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index 4d54b7540585..1363d58f07e9 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -847,7 +847,6 @@ void tick_nohz_idle_enter(void) > > local_irq_enable(); > } > -EXPORT_SYMBOL_GPL(tick_nohz_idle_enter); > > /** > * tick_nohz_irq_exit - update next tick event from interrupt exit > @@ -974,7 +973,6 @@ void tick_nohz_idle_exit(void) > > local_irq_enable(); > } > -EXPORT_SYMBOL_GPL(tick_nohz_idle_exit); > > static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now) > { > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-18 14:01 ` Eduardo Valentin @ 2014-12-18 14:43 ` Thomas Gleixner 0 siblings, 0 replies; 12+ messages in thread From: Thomas Gleixner @ 2014-12-18 14:43 UTC (permalink / raw) To: Eduardo Valentin Cc: Preeti Murthy, Preeti U Murthy, Viresh Kumar, Frederic Weisbecker, Fengguang Wu, Frederic Weisbecker, Pan, Jacob jun, LKML, LKP, Peter Zijlstra, Zhang Rui On Thu, 18 Dec 2014, Eduardo Valentin wrote: > > The real solution is to fix the powerclamp driver by rewriting it with > > a sane concept, but that's beyond the scope of this. > > > > Do you have suggestions on what exactly is the expected rewriting or the > correct sane concepts? There was quite some discussion about this in this very thread. > > So the only solution for now is to remove the calls into the core NOHZ > > code from the powerclamp trainwreck along with the exports. > > > > Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" > > If I got it right, the driver is currently broken due to changes in NOHZ > core. So, does this patch fix power clamp behavior ? The driver has been broken forever. It just worked by chance. Now a very well justified and correct change in the core code exposed that wreckage. So we have 2 choices: 1) Get rid of the abuse and let powerclamp deal with the problem. 2) Revert a correct patch for the sake of a 'works by chance' driver or put hacky workarounds in the core. Either of that will just paper over the real root cause until the next thing breaks in subtle ways. #1 is the only sane decision. We cannot deal with misdesigned driver code in the NOHZ core. > If I got your proposal right, in the end power clamp will be still > broken, but at least won't be abusing NOHZ. Is that what you are > proposing? Yes, the design of powerclamp stays broken, but the NOHZ abuse is gone. powerclamp will work, but it can't benefit from the possible longer idle times anymore. Thanks, tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-18 10:51 [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse Thomas Gleixner 2014-12-18 14:01 ` Eduardo Valentin @ 2014-12-18 17:28 ` Preeti U Murthy [not found] ` <DD415AA12F8FF042BF4EA69DF123C1478AF91730@ORSMSX101.amr.corp.intel.com> 2014-12-19 13:09 ` [tip:timers/urgent] " tip-bot for Thomas Gleixner 2 siblings, 1 reply; 12+ messages in thread From: Preeti U Murthy @ 2014-12-18 17:28 UTC (permalink / raw) To: Thomas Gleixner, Preeti Murthy, Pan, Jacob jun, Peter Zijlstra Cc: Viresh Kumar, Frederic Weisbecker, Fengguang Wu, Frederic Weisbecker, LKML, LKP, Zhang Rui Hi Thomas, On 12/18/2014 04:21 PM, Thomas Gleixner wrote: > commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module > use" was merged via the thermal tree without an explicit ack from the > relevant maintainers. > > The exports are abused by the intel powerclamp driver which implements > a fake idle state from a sched FIFO task. This causes all kinds of > wreckage in the NOHZ core code which rightfully assumes that > tick_nohz_idle_enter/exit() are only called from the idle task itself. > > Recent changes in the NOHZ core lead to a failure of the powerclamp > driver and now people try to hack completely broken and backwards > workarounds into the NOHZ core code. This is completely unacceptable. > > The real solution is to fix the powerclamp driver by rewriting it with > a sane concept, but that's beyond the scope of this. > > So the only solution for now is to remove the calls into the core NOHZ > code from the powerclamp trainwreck along with the exports. > > Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > --- > diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c > index b46c706e1cac..e98b4249187c 100644 > --- a/drivers/thermal/intel_powerclamp.c > +++ b/drivers/thermal/intel_powerclamp.c > @@ -435,7 +435,6 @@ static int clamp_thread(void *arg) > * allowed. thus jiffies are updated properly. > */ > preempt_disable(); > - tick_nohz_idle_enter(); > /* mwait until target jiffies is reached */ > while (time_before(jiffies, target_jiffies)) { > unsigned long ecx = 1; > @@ -451,7 +450,6 @@ static int clamp_thread(void *arg) > start_critical_timings(); > atomic_inc(&idle_wakeup_counter); > } > - tick_nohz_idle_exit(); > preempt_enable(); > } > del_timer_sync(&wakeup_timer); > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index 4d54b7540585..1363d58f07e9 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -847,7 +847,6 @@ void tick_nohz_idle_enter(void) > > local_irq_enable(); > } > -EXPORT_SYMBOL_GPL(tick_nohz_idle_enter); > > /** > * tick_nohz_irq_exit - update next tick event from interrupt exit > @@ -974,7 +973,6 @@ void tick_nohz_idle_exit(void) > > local_irq_enable(); > } > -EXPORT_SYMBOL_GPL(tick_nohz_idle_exit); > > static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now) > { > Ok the solution looks apt to me. Let me see if I can come up with a sane solution for powerclamp based on the suggestions that you gave in the previous thread. I was thinking of the below steps towards its implementation. The idea is based on the throttling mechanism that you had suggested. 1. Queue a deferable periodic timer whose handler checks if idle needs to be injected. If so, it sets rq->need_throttle for the cpu. If its already in the fake idle period, it clears rq->need_throttle and sets need_resched. 2. pick_next_task_fair() checks rq->need_throttle and dequeues all tasks in the rq if this is set and puts them on a throttled list. This mechanism is similar to throttling cfs rq today. This function hence fails to return a task, and if no task from any other sched class exists, idle task is picked. Peter thoughts? 3. So we are now in the idle injected period. The scheduler state is sane because the cpu is idle, rq->nr_running = 0, rq->curr = rq->idle. The nohz state is sane, because ts->inidle = 1 and tick_stopped may or may not be 1 and they are set by an idle task. 4. When need_resched is set again, the idle task of course unsets inidle and restarts tick. In the following scheduler tick, pick_next_task_fair() sees that rq->need_throttle is cleared, enqueues back the tasks and returns one of them to run. Of course there may be several points that I have missed. But how does the approach appear? If it looks sane enough, the cases which do not obviously fall in place can be worked upon. Regards Preeti U Murthy ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <DD415AA12F8FF042BF4EA69DF123C1478AF91730@ORSMSX101.amr.corp.intel.com>]
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse [not found] ` <DD415AA12F8FF042BF4EA69DF123C1478AF91730@ORSMSX101.amr.corp.intel.com> @ 2014-12-18 19:52 ` Jacob Pan 2014-12-18 21:12 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Jacob Pan @ 2014-12-18 19:52 UTC (permalink / raw) To: Pan, Jacob jun, Thomas Gleixner, Preeti U Murthy, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin > > > -----Original Message----- > From: Preeti U Murthy [mailto:preeti@linux.vnet.ibm.com] > Sent: Thursday, December 18, 2014 9:28 AM > To: Thomas Gleixner; Preeti Murthy; Pan, Jacob jun; Peter Zijlstra > Cc: Viresh Kumar; Frederic Weisbecker; Wu, Fengguang; Frederic > Weisbecker; LKML; LKP; Zhang, Rui Subject: Re: [PATCH] > tick/powerclamp: Remove tick_nohz_idle abuse > > Hi Thomas, > > On 12/18/2014 04:21 PM, Thomas Gleixner wrote: > > commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module > > use" was merged via the thermal tree without an explicit ack from > > the relevant maintainers. > > > > The exports are abused by the intel powerclamp driver which > > implements a fake idle state from a sched FIFO task. This causes > > all kinds of wreckage in the NOHZ core code which rightfully > > assumes that tick_nohz_idle_enter/exit() are only called from the > > idle task itself. > > > > Recent changes in the NOHZ core lead to a failure of the powerclamp > > driver and now people try to hack completely broken and backwards > > workarounds into the NOHZ core code. This is completely > > unacceptable. > > > > The real solution is to fix the powerclamp driver by rewriting it > > with a sane concept, but that's beyond the scope of this. > > > > So the only solution for now is to remove the calls into the core > > NOHZ code from the powerclamp trainwreck along with the exports. > > > > Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > --- > > diff --git a/drivers/thermal/intel_powerclamp.c > > b/drivers/thermal/intel_powerclamp.c > > index b46c706e1cac..e98b4249187c 100644 > > --- a/drivers/thermal/intel_powerclamp.c > > +++ b/drivers/thermal/intel_powerclamp.c > > @@ -435,7 +435,6 @@ static int clamp_thread(void *arg) > > * allowed. thus jiffies are updated properly. > > */ > > preempt_disable(); > > - tick_nohz_idle_enter(); > > /* mwait until target jiffies is reached */ > > while (time_before(jiffies, target_jiffies)) { > > unsigned long ecx = 1; > > @@ -451,7 +450,6 @@ static int clamp_thread(void *arg) > > start_critical_timings(); > > atomic_inc(&idle_wakeup_counter); > > } > > - tick_nohz_idle_exit(); > > preempt_enable(); > > } > > del_timer_sync(&wakeup_timer); > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index 4d54b7540585..1363d58f07e9 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -847,7 +847,6 @@ void tick_nohz_idle_enter(void) > > > > local_irq_enable(); > > } > > -EXPORT_SYMBOL_GPL(tick_nohz_idle_enter); > > > > /** > > * tick_nohz_irq_exit - update next tick event from interrupt exit > > @@ -974,7 +973,6 @@ void tick_nohz_idle_exit(void) > > > > local_irq_enable(); > > } > > -EXPORT_SYMBOL_GPL(tick_nohz_idle_exit); > > > > static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t > > now) { > > > (switching to my linux email) OK I agree, also as I mentioned earlier, Peter already has a patch for consolidated idle loop and remove tick_nohz_idle_enter/exit call from powerclamp driver. I have been working on a few tweaks to maintain the functionality and efficiency with the consolidated idle loop. We can apply the patches on top of yours. > Ok the solution looks apt to me. > > Let me see if I can come up with a sane solution for powerclamp based > on the suggestions that you gave in the previous thread. I was > thinking of the below steps towards its implementation. The idea is > based on the throttling mechanism that you had suggested. > > 1. Queue a deferable periodic timer whose handler checks if idle > needs to be injected. If so, it sets rq->need_throttle for the cpu. > If its already in the fake idle period, it clears rq->need_throttle > and sets need_resched. > The key to powerclamp driver is to achieve package level idle states, which implies synchronized idle injection. From power/performance standpoint, only package level idle states is worth injection. IMHO, percpu the deferrable timer based solution makes it hard to synchronize. And you have to be able to request deepest idle. Some background on why we do this: As the power consumption in package level idle goes lower and lower with new processors, it became negligible compared to running states. Therefore, powerclamp driver can give you near linear power-performance throttling. Idle injection at per cpu core level may not be worthwhile in most of todays' cpus. Just some background on the use case, if you want to try powerclamp on your ultrabook, you will be able compare the effectiveness in controlling cpu thermal. You can use tmon tool in kernel source. e.g. $tools/thermal/tmon$ sudo ./tmon -z 1 -c intel_powerclamp (choose -z thermal zone of your processor zone, pkg-temp or acpi tz) > 2. pick_next_task_fair() checks rq->need_throttle and dequeues all > tasks in the rq if this is set and puts them on a throttled list. > This mechanism is similar to throttling cfs rq today. This function > hence fails to return a task, and if no task from any other sched > class exists, idle task is picked. > > Peter thoughts? > > 3. So we are now in the idle injected period. The scheduler state is > sane because the cpu is idle, rq->nr_running = 0, rq->curr = > rq->idle. The nohz state is sane, because ts->inidle = 1 and > tick_stopped may or may not be 1 and they are set by an idle task. > > 4. When need_resched is set again, the idle task of course unsets > inidle and restarts tick. In the following scheduler tick, > pick_next_task_fair() sees that rq->need_throttle is cleared, > enqueues back the tasks and returns one of them to run. > > Of course there may be several points that I have missed. But how > does the approach appear? If it looks sane enough, the cases which do > not obviously fall in place can be worked upon. > > Regards > Preeti U Murthy > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-18 19:52 ` Jacob Pan @ 2014-12-18 21:12 ` Thomas Gleixner 2014-12-19 18:39 ` Jacob Pan 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2014-12-18 21:12 UTC (permalink / raw) To: Jacob Pan Cc: Pan, Jacob jun, Preeti U Murthy, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin On Thu, 18 Dec 2014, Jacob Pan wrote: > OK I agree, also as I mentioned earlier, Peter already has a patch for > consolidated idle loop and remove tick_nohz_idle_enter/exit call from > powerclamp driver. I have been working on a few tweaks to maintain the > functionality and efficiency with the consolidated idle loop. > We can apply the patches on top of yours. No. This is equally wrong as I pointed out before. The 'unified' idle loop is still fake and just pretending to be idle. If simple standard interfaces like cpu_idle() are not working from idle code anymore then this simply stinks. And that's what any fake idle thread will do. The whole approach is wrong. Implement a sched fair throttler and you can avoid the whoile trainwreck. > > 1. Queue a deferable periodic timer whose handler checks if idle > > needs to be injected. If so, it sets rq->need_throttle for the cpu. > > If its already in the fake idle period, it clears rq->need_throttle > > and sets need_resched. > > > The key to powerclamp driver is to achieve package level idle > states, which implies synchronized idle injection. From > power/performance standpoint, only package level idle states is worth > injection. Then use a synchronized non deferrable timer on all cpus. It's simple enough. Thanks, tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-18 21:12 ` Thomas Gleixner @ 2014-12-19 18:39 ` Jacob Pan 2014-12-19 19:56 ` Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Jacob Pan @ 2014-12-19 18:39 UTC (permalink / raw) To: Thomas Gleixner Cc: Pan, Jacob jun, Preeti U Murthy, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin, Van De Ven, Arjan On Thu, 18 Dec 2014 22:12:57 +0100 (CET) Thomas Gleixner <tglx@linutronix.de> wrote: > On Thu, 18 Dec 2014, Jacob Pan wrote: > > OK I agree, also as I mentioned earlier, Peter already has a patch > > for consolidated idle loop and remove tick_nohz_idle_enter/exit > > call from powerclamp driver. I have been working on a few tweaks to > > maintain the functionality and efficiency with the consolidated > > idle loop. We can apply the patches on top of yours. > > No. This is equally wrong as I pointed out before. The 'unified' idle > loop is still fake and just pretending to be idle. > In terms of efficiency, the consolidated idle loop will allow turning off sched tick during idle injection period. If we just take out the tick_nohz_idle_xxx call, the effectiveness of powerclamp is going down significantly. I am not arguing the design but from fixing regression perspective or short term solution. > If simple standard interfaces like cpu_idle() are not working from > idle code anymore then this simply stinks. And that's what any fake > idle thread will do. > > The whole approach is wrong. Implement a sched fair throttler and you > can avoid the whoile trainwreck. > I agree with you that fake idle always have dilemma. on one side, idle injection threads are really busy injecting idle, so from that standpoint it is not idle. But on the other side, we have to stop tick to avoid being woken up all the time. So we can't just simply take control of the cpu periodically and execute mwait. I need to do some study here, thanks for the pointers, Jacob > > > 1. Queue a deferable periodic timer whose handler checks if idle > > > needs to be injected. If so, it sets rq->need_throttle for the > > > cpu. If its already in the fake idle period, it clears > > > rq->need_throttle and sets need_resched. > > > > > The key to powerclamp driver is to achieve package level idle > > states, which implies synchronized idle injection. From > > power/performance standpoint, only package level idle states is > > worth injection. > > Then use a synchronized non deferrable timer on all cpus. It's simple > enough. > > Thanks, > > tglx [Jacob Pan] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-19 18:39 ` Jacob Pan @ 2014-12-19 19:56 ` Thomas Gleixner 2014-12-20 1:31 ` Preeti U Murthy 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2014-12-19 19:56 UTC (permalink / raw) To: Jacob Pan Cc: Pan, Jacob jun, Preeti U Murthy, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin, Van De Ven, Arjan On Fri, 19 Dec 2014, Jacob Pan wrote: > On Thu, 18 Dec 2014 22:12:57 +0100 (CET) > Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Thu, 18 Dec 2014, Jacob Pan wrote: > > > OK I agree, also as I mentioned earlier, Peter already has a patch > > > for consolidated idle loop and remove tick_nohz_idle_enter/exit > > > call from powerclamp driver. I have been working on a few tweaks to > > > maintain the functionality and efficiency with the consolidated > > > idle loop. We can apply the patches on top of yours. > > > > No. This is equally wrong as I pointed out before. The 'unified' idle > > loop is still fake and just pretending to be idle. > > > In terms of efficiency, the consolidated idle loop will allow turning > off sched tick during idle injection period. If we just take out the > tick_nohz_idle_xxx call, the effectiveness of powerclamp is going down > significantly. I am not arguing the design but from fixing regression > perspective or short term solution. There is no perspective. Period. Its violates every rightful assumption of the nohz_IDLE_* code and just ever worked by chance. There is so much subtle wreckage lurking there that the only sane solution is to forbid it. End of story. Thanks, tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-19 19:56 ` Thomas Gleixner @ 2014-12-20 1:31 ` Preeti U Murthy 2014-12-23 2:57 ` Jacob Pan 0 siblings, 1 reply; 12+ messages in thread From: Preeti U Murthy @ 2014-12-20 1:31 UTC (permalink / raw) To: Thomas Gleixner, Jacob Pan Cc: Pan, Jacob jun, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin, Van De Ven, Arjan On 12/20/2014 01:26 AM, Thomas Gleixner wrote: > On Fri, 19 Dec 2014, Jacob Pan wrote: > >> On Thu, 18 Dec 2014 22:12:57 +0100 (CET) >> Thomas Gleixner <tglx@linutronix.de> wrote: >> >>> On Thu, 18 Dec 2014, Jacob Pan wrote: >>>> OK I agree, also as I mentioned earlier, Peter already has a patch >>>> for consolidated idle loop and remove tick_nohz_idle_enter/exit >>>> call from powerclamp driver. I have been working on a few tweaks to >>>> maintain the functionality and efficiency with the consolidated >>>> idle loop. We can apply the patches on top of yours. >>> >>> No. This is equally wrong as I pointed out before. The 'unified' idle >>> loop is still fake and just pretending to be idle. >>> >> In terms of efficiency, the consolidated idle loop will allow turning >> off sched tick during idle injection period. If we just take out the >> tick_nohz_idle_xxx call, the effectiveness of powerclamp is going down >> significantly. I am not arguing the design but from fixing regression >> perspective or short term solution. > > There is no perspective. Period. > > Its violates every rightful assumption of the nohz_IDLE_* code and > just ever worked by chance. There is so much subtle wreckage lurking > there that the only sane solution is to forbid it. End of story. > > Thanks, > > tglx > Hi Jacob, Like Thomas pointed out, we can design a sane solution for powerclamp. Idle injection is nothing but throttling of runqueue. If the runqueue is throttled, no fair tasks will be selected and the natural choice in the absence of tasks from any other sched class is the idle task. The idle loop will automatically be called and the nohz state will also fall in place. The cpu is really idle now: the runqueue has no tasks and the task running on the cpu is the idle thread. The throttled tasks are on a separate list. When the period of idle injection is over, we unthrottle the runqueue. All this being taken care of my a non-deferrable timer. This design ensures that the intention of powerclamp is not hampered while at the same time maintaining a sane state for nohz; you will get the efficiency you want. Of course there may be corner cases and challenges around synchronization of package idle, which I am sure we can work around with a better design such as the above. I am working on that patchset and will post out in a day. You can take a look and let us know the pieces we are missing. I find that implementing the above design is not too hard. Regards Preeti U Murthy ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-20 1:31 ` Preeti U Murthy @ 2014-12-23 2:57 ` Jacob Pan 2014-12-31 5:04 ` Preeti U Murthy 0 siblings, 1 reply; 12+ messages in thread From: Jacob Pan @ 2014-12-23 2:57 UTC (permalink / raw) To: Preeti U Murthy Cc: Thomas Gleixner, Pan, Jacob jun, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin, Van De Ven, Arjan On Sat, 20 Dec 2014 07:01:12 +0530 Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote: > On 12/20/2014 01:26 AM, Thomas Gleixner wrote: > > On Fri, 19 Dec 2014, Jacob Pan wrote: > > > >> On Thu, 18 Dec 2014 22:12:57 +0100 (CET) > >> Thomas Gleixner <tglx@linutronix.de> wrote: > >> > >>> On Thu, 18 Dec 2014, Jacob Pan wrote: > >>>> OK I agree, also as I mentioned earlier, Peter already has a > >>>> patch for consolidated idle loop and remove > >>>> tick_nohz_idle_enter/exit call from powerclamp driver. I have > >>>> been working on a few tweaks to maintain the functionality and > >>>> efficiency with the consolidated idle loop. We can apply the > >>>> patches on top of yours. > >>> > >>> No. This is equally wrong as I pointed out before. The 'unified' > >>> idle loop is still fake and just pretending to be idle. > >>> > >> In terms of efficiency, the consolidated idle loop will allow > >> turning off sched tick during idle injection period. If we just > >> take out the tick_nohz_idle_xxx call, the effectiveness of > >> powerclamp is going down significantly. I am not arguing the > >> design but from fixing regression perspective or short term > >> solution. > > > > There is no perspective. Period. > > > > Its violates every rightful assumption of the nohz_IDLE_* code and > > just ever worked by chance. There is so much subtle wreckage lurking > > there that the only sane solution is to forbid it. End of story. > > > > Thanks, > > > > tglx > > > Hi Jacob, > > Like Thomas pointed out, we can design a sane solution for powerclamp. > Idle injection is nothing but throttling of runqueue. If the runqueue > is throttled, no fair tasks will be selected and the natural choice > in the absence of tasks from any other sched class is the idle task. > > The idle loop will automatically be called and the nohz state will > also fall in place. The cpu is really idle now: the runqueue has no > tasks and the task running on the cpu is the idle thread. The > throttled tasks are on a separate list. > > When the period of idle injection is over, we unthrottle the runqueue. > All this being taken care of my a non-deferrable timer. This design > ensures that the intention of powerclamp is not hampered while at the > same time maintaining a sane state for nohz; you will get the > efficiency you want. > > Of course there may be corner cases and challenges around > synchronization of package idle, which I am sure we can work around > with a better design such as the above. I am working on that patchset > and will post out in a day. You can take a look and let us know the > pieces we are missing. > > I find that implementing the above design is not too hard. > Hi Preeti, Yeah, it seems to be a good approach. looking forward to work with you on this. Timer may scale better for larger systems. One question, will timer irq gets unpredictable delays if run by ksoftirqd? BTW, I may not be able to respond quickly during the holidays. If things workout, it may benefit ACPI PAD driver as well. Thanks, Jacob > Regards > Preeti U Murthy > [Jacob Pan] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-23 2:57 ` Jacob Pan @ 2014-12-31 5:04 ` Preeti U Murthy 0 siblings, 0 replies; 12+ messages in thread From: Preeti U Murthy @ 2014-12-31 5:04 UTC (permalink / raw) To: Jacob Pan, Thomas Gleixner, peterz Cc: Pan, Jacob jun, Peter Zijlstra, Viresh Kumar, LKP, LKML, Zhang, Rui, Frederic Weisbecker, Eduardo Valentin, Van De Ven, Arjan Hi Jacob, On 12/23/2014 08:27 AM, Jacob Pan wrote: > On Sat, 20 Dec 2014 07:01:12 +0530 > Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote: > >> On 12/20/2014 01:26 AM, Thomas Gleixner wrote: >>> On Fri, 19 Dec 2014, Jacob Pan wrote: >>> >>>> On Thu, 18 Dec 2014 22:12:57 +0100 (CET) >>>> Thomas Gleixner <tglx@linutronix.de> wrote: >>>> >>>>> On Thu, 18 Dec 2014, Jacob Pan wrote: >>>>>> OK I agree, also as I mentioned earlier, Peter already has a >>>>>> patch for consolidated idle loop and remove >>>>>> tick_nohz_idle_enter/exit call from powerclamp driver. I have >>>>>> been working on a few tweaks to maintain the functionality and >>>>>> efficiency with the consolidated idle loop. We can apply the >>>>>> patches on top of yours. >>>>> >>>>> No. This is equally wrong as I pointed out before. The 'unified' >>>>> idle loop is still fake and just pretending to be idle. >>>>> >>>> In terms of efficiency, the consolidated idle loop will allow >>>> turning off sched tick during idle injection period. If we just >>>> take out the tick_nohz_idle_xxx call, the effectiveness of >>>> powerclamp is going down significantly. I am not arguing the >>>> design but from fixing regression perspective or short term >>>> solution. >>> >>> There is no perspective. Period. >>> >>> Its violates every rightful assumption of the nohz_IDLE_* code and >>> just ever worked by chance. There is so much subtle wreckage lurking >>> there that the only sane solution is to forbid it. End of story. >>> >>> Thanks, >>> >>> tglx >>> >> Hi Jacob, >> >> Like Thomas pointed out, we can design a sane solution for powerclamp. >> Idle injection is nothing but throttling of runqueue. If the runqueue >> is throttled, no fair tasks will be selected and the natural choice >> in the absence of tasks from any other sched class is the idle task. >> >> The idle loop will automatically be called and the nohz state will >> also fall in place. The cpu is really idle now: the runqueue has no >> tasks and the task running on the cpu is the idle thread. The >> throttled tasks are on a separate list. >> >> When the period of idle injection is over, we unthrottle the runqueue. >> All this being taken care of my a non-deferrable timer. This design >> ensures that the intention of powerclamp is not hampered while at the >> same time maintaining a sane state for nohz; you will get the >> efficiency you want. >> >> Of course there may be corner cases and challenges around >> synchronization of package idle, which I am sure we can work around >> with a better design such as the above. I am working on that patchset >> and will post out in a day. You can take a look and let us know the >> pieces we are missing. >> >> I find that implementing the above design is not too hard. >> > Hi Preeti, > Yeah, it seems to be a good approach. looking forward to work with you > on this. Timer may scale better for larger systems. One question, will > timer irq gets unpredictable delays if run by ksoftirqd? I am sorry I could not respond earlier; I was on vacation as well. Yes, we may have a problem here. Let alone synchronization between cpus in performing clamping, there are two other functionality issues that I see. 1. Since periodic timers get executed in the softirq context, scheduler_tick() would have passed by by then. i.e. hrtimer_interrupt() |__tick_sched_handle() |__scheduler_tick() |__raise_softirq(TIMER_SOFTIRQ) ksoftirqd runs on local_bh_enable()-->powerclamp_timer handler runs Although runqueues are throttled in the powerclamp timer handler, it has to wait till the next scheduler tick to select an idle task to run. A precious 4-10ms depending on the config would have passed by then. 2. For the same reason as 1, when the ksoftirqd has to run on the cpus during the tick after the one in which throttling is enabled, cpus are unavailable to run the daemon because they are throttled. However there is no other way to unthrottle the runqueues now except by running the ksoftirqd; a chicken and egg problem. I think both the above problems could be solved by using hrtimers instead of periodic timers to perform clamping/unclamping, since hrtimers are serviced in the interrupt context. But we cannot initialize/start/modify hrtimers on a remote cpu. We will end up using IPIs for handling the hrtimers during start/end of powerclamp or modification of the idle duration of clamping, which is not a tempting option either. So I am currently stuck at this point. I would be glad to have some suggestions. Thanks Regards Preeti U Murthy > BTW, I may not be able to respond quickly during the holidays. If > things workout, it may benefit ACPI PAD driver as well. > > > Thanks, > > Jacob >> Regards >> Preeti U Murthy >> > > [Jacob Pan] > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [tip:timers/urgent] tick/powerclamp: Remove tick_nohz_idle abuse 2014-12-18 10:51 [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse Thomas Gleixner 2014-12-18 14:01 ` Eduardo Valentin 2014-12-18 17:28 ` Preeti U Murthy @ 2014-12-19 13:09 ` tip-bot for Thomas Gleixner 2 siblings, 0 replies; 12+ messages in thread From: tip-bot for Thomas Gleixner @ 2014-12-19 13:09 UTC (permalink / raw) To: linux-tip-commits Cc: frederic, preeti, rui.zhang, tglx, fweisbec, fengguang.wu, peterz, linux-kernel, lkp, hpa, mingo, jacob.jun.pan, viresh.kumar Commit-ID: a5fd9733a30d18d7ac23f17080e7e07bb3205b69 Gitweb: http://git.kernel.org/tip/a5fd9733a30d18d7ac23f17080e7e07bb3205b69 Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Thu, 18 Dec 2014 11:51:01 +0100 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Fri, 19 Dec 2014 14:05:52 +0100 tick/powerclamp: Remove tick_nohz_idle abuse commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module use" was merged via the thermal tree without an explicit ack from the relevant maintainers. The exports are abused by the intel powerclamp driver which implements a fake idle state from a sched FIFO task. This causes all kinds of wreckage in the NOHZ core code which rightfully assumes that tick_nohz_idle_enter/exit() are only called from the idle task itself. Recent changes in the NOHZ core lead to a failure of the powerclamp driver and now people try to hack completely broken and backwards workarounds into the NOHZ core code. This is completely unacceptable and just papers over the real problem. There are way more subtle issues lurking around the corner. The real solution is to fix the powerclamp driver by rewriting it with a sane concept, but that's beyond the scope of this. So the only solution for now is to remove the calls into the core NOHZ code from the powerclamp trainwreck along with the exports. Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Pan Jacob jun <jacob.jun.pan@intel.com> Cc: LKP <lkp@01.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- drivers/thermal/intel_powerclamp.c | 2 -- kernel/time/tick-sched.c | 2 -- 2 files changed, 4 deletions(-) diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c index 95cb7fc..6cb7849 100644 --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -435,7 +435,6 @@ static int clamp_thread(void *arg) * allowed. thus jiffies are updated properly. */ preempt_disable(); - tick_nohz_idle_enter(); /* mwait until target jiffies is reached */ while (time_before(jiffies, target_jiffies)) { unsigned long ecx = 1; @@ -451,7 +450,6 @@ static int clamp_thread(void *arg) start_critical_timings(); atomic_inc(&idle_wakeup_counter); } - tick_nohz_idle_exit(); preempt_enable(); } del_timer_sync(&wakeup_timer); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 1f43560..ff3ec34 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -847,7 +847,6 @@ void tick_nohz_idle_enter(void) local_irq_enable(); } -EXPORT_SYMBOL_GPL(tick_nohz_idle_enter); /** * tick_nohz_irq_exit - update next tick event from interrupt exit @@ -974,7 +973,6 @@ void tick_nohz_idle_exit(void) local_irq_enable(); } -EXPORT_SYMBOL_GPL(tick_nohz_idle_exit); static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now) { ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-12-31 5:05 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-12-18 10:51 [PATCH] tick/powerclamp: Remove tick_nohz_idle abuse Thomas Gleixner 2014-12-18 14:01 ` Eduardo Valentin 2014-12-18 14:43 ` Thomas Gleixner 2014-12-18 17:28 ` Preeti U Murthy [not found] ` <DD415AA12F8FF042BF4EA69DF123C1478AF91730@ORSMSX101.amr.corp.intel.com> 2014-12-18 19:52 ` Jacob Pan 2014-12-18 21:12 ` Thomas Gleixner 2014-12-19 18:39 ` Jacob Pan 2014-12-19 19:56 ` Thomas Gleixner 2014-12-20 1:31 ` Preeti U Murthy 2014-12-23 2:57 ` Jacob Pan 2014-12-31 5:04 ` Preeti U Murthy 2014-12-19 13:09 ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).