linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] nohz: no update idle entry time on get_cpu_idle/iowait_time_us()
@ 2015-09-29  1:01 Yunhong Jiang
  2015-10-11 18:38 ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Yunhong Jiang @ 2015-09-29  1:01 UTC (permalink / raw)
  To: tglx; +Cc: arjan, linux-kernel

Currently the get_cpu_idle/iowait_time_us() updates the idle_entrytime.
When it's invoked from another CPU and the target CPU has been on idle
already, it will update the idle_entrytime to now, which is incorrect.

However, the get_cpu_idle/iowait_time_us() is not guranteed to be called
on the target CPU. For example, the get_cpu_idle_time_us() seems is
invoked remotely on drivers/cpufreq/cpufreq_governor.c through
get_cpu_idle_time().

There is a check that the update happens only if a valid last_update_time
parameter passed. IMHO, this is more a hack because there is no guarantee
that it's invoked on the target CPU when last_update_time is valid.

In fact, we don't need update the idle stats from
get_cpu_idle/iowait_time_us(). Now the policy is, we record the
 entrytime when tick_nohz_start_idle() and update the stats
 when tick_nohz_stop_idle(). We calculate the stats on other situations.

Please notice:
1) There is a bug currently that the tick_nohz_stop_idle() calls the
update_ts_time_stats() and update the idle_entrytime, which is sure
to be wrong. Removing the idle_entrytime update resolves the bug also.

2) There is a small widows in tick_nohz_stop_idle() between
 update_ts_time_stats() and clear ts->idle_active, that
get_cpu_idle/iowait_time_us(), when invoked remotely, may double
 count last idle period. This window exists w/o this change and this
change does not fix it.

Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
---
 kernel/time/tick-sched.c | 43 +++++++++++++++----------------------------
 1 file changed, 15 insertions(+), 28 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7c7ec4515983..c2a78887d265 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -437,7 +437,7 @@ static void tick_nohz_update_jiffies(ktime_t now)
  * Updates the per cpu time idle statistics counters
  */
 static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now)
 {
 	ktime_t delta;
 
@@ -447,17 +447,12 @@ update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_upda
 			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 		else
 			ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-		ts->idle_entrytime = now;
 	}
-
-	if (last_update_time)
-		*last_update_time = ktime_to_us(now);
-
 }
 
 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
-	update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+	update_ts_time_stats(smp_processor_id(), ts, now);
 	ts->idle_active = 0;
 
 	sched_clock_idle_wakeup_event(0);
@@ -496,21 +491,16 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 		return -1;
 
 	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		idle = ts->idle_sleeptime;
-	} else {
-		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+	idle = ts->idle_sleeptime;
+	if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
 
-			idle = ktime_add(ts->idle_sleeptime, delta);
-		} else {
-			idle = ts->idle_sleeptime;
-		}
+		idle = ktime_add(ts->idle_sleeptime, delta);
 	}
 
+	if (last_update_time)
+		*last_update_time = ktime_to_us(now);
 	return ktime_to_us(idle);
-
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
@@ -537,19 +527,16 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 		return -1;
 
 	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		iowait = ts->iowait_sleeptime;
-	} else {
-		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+	iowait = ts->iowait_sleeptime;
+	if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
 
-			iowait = ktime_add(ts->iowait_sleeptime, delta);
-		} else {
-			iowait = ts->iowait_sleeptime;
-		}
+		iowait = ktime_add(ts->iowait_sleeptime, delta);
 	}
 
+	if (last_update_time)
+		*last_update_time = ktime_to_us(now);
+
 	return ktime_to_us(iowait);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC] nohz: no update idle entry time on get_cpu_idle/iowait_time_us()
  2015-09-29  1:01 [RFC] nohz: no update idle entry time on get_cpu_idle/iowait_time_us() Yunhong Jiang
@ 2015-10-11 18:38 ` Thomas Gleixner
  2015-10-20 23:17   ` Yunhong Jiang
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2015-10-11 18:38 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: arjan, linux-kernel

On Mon, 28 Sep 2015, Yunhong Jiang wrote:
> Currently the get_cpu_idle/iowait_time_us() updates the idle_entrytime.
> When it's invoked from another CPU and the target CPU has been on idle
> already, it will update the idle_entrytime to now, which is incorrect.
> 
> However, the get_cpu_idle/iowait_time_us() is not guranteed to be called
> on the target CPU. For example, the get_cpu_idle_time_us() seems is
> invoked remotely on drivers/cpufreq/cpufreq_governor.c through
> get_cpu_idle_time().
> 
> There is a check that the update happens only if a valid last_update_time
> parameter passed. IMHO, this is more a hack because there is no guarantee
> that it's invoked on the target CPU when last_update_time is valid.

Looking at the call sites, this last_update_time parameter is
silly. Why is the calling code not taking the timestamp? There is
hardly a requirement that this needs to be the same timestamp as the
one which is used to calculate idle time. That cpufreq calculations
are speculative anyway.

So we better get rid of that parameter completely.

> In fact, we don't need update the idle stats from
> get_cpu_idle/iowait_time_us(). Now the policy is, we record the
>  entrytime when tick_nohz_start_idle() and update the stats
>  when tick_nohz_stop_idle(). We calculate the stats on other situations.
> 
> Please notice:
> 1) There is a bug currently that the tick_nohz_stop_idle() calls the
> update_ts_time_stats() and update the idle_entrytime, which is sure
> to be wrong. Removing the idle_entrytime update resolves the bug also.

Care to explain the actual bug and what wreckage it causes? If it's a
real bug then removing "ts->idle_entrytime = now" needs to be a
separate patch. AFAICT, it's a cosmetic issue.
 
> 2) There is a small widows in tick_nohz_stop_idle() between
>  update_ts_time_stats() and clear ts->idle_active, that
> get_cpu_idle/iowait_time_us(), when invoked remotely, may double
>  count last idle period. This window exists w/o this change and this
> change does not fix it.

Calling any of those functions from a remote cpu is broken to begin
with, especially on 32bit machines. And that does not change with your
patch at all.

What we really need here is protecting the idle stats fields with a
raw_spinlock.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] nohz: no update idle entry time on get_cpu_idle/iowait_time_us()
  2015-10-11 18:38 ` Thomas Gleixner
@ 2015-10-20 23:17   ` Yunhong Jiang
  0 siblings, 0 replies; 3+ messages in thread
From: Yunhong Jiang @ 2015-10-20 23:17 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: arjan, linux-kernel, rjw, viresh.kumar

On Sun, Oct 11, 2015 at 08:38:17PM +0200, Thomas Gleixner wrote:
> On Mon, 28 Sep 2015, Yunhong Jiang wrote:
> > Currently the get_cpu_idle/iowait_time_us() updates the idle_entrytime.
> > When it's invoked from another CPU and the target CPU has been on idle
> > already, it will update the idle_entrytime to now, which is incorrect.
> > 
> > However, the get_cpu_idle/iowait_time_us() is not guranteed to be called
> > on the target CPU. For example, the get_cpu_idle_time_us() seems is
> > invoked remotely on drivers/cpufreq/cpufreq_governor.c through
> > get_cpu_idle_time().
> > 
> > There is a check that the update happens only if a valid last_update_time
> > parameter passed. IMHO, this is more a hack because there is no guarantee
> > that it's invoked on the target CPU when last_update_time is valid.
> 
> Looking at the call sites, this last_update_time parameter is
> silly. Why is the calling code not taking the timestamp? There is
> hardly a requirement that this needs to be the same timestamp as the
> one which is used to calculate idle time. That cpufreq calculations
> are speculative anyway.
> 
> So we better get rid of that parameter completely.

Sure, will change the patch accordingly.

> 
> > In fact, we don't need update the idle stats from
> > get_cpu_idle/iowait_time_us(). Now the policy is, we record the
> >  entrytime when tick_nohz_start_idle() and update the stats
> >  when tick_nohz_stop_idle(). We calculate the stats on other situations.
> > 
> > Please notice:
> > 1) There is a bug currently that the tick_nohz_stop_idle() calls the
> > update_ts_time_stats() and update the idle_entrytime, which is sure
> > to be wrong. Removing the idle_entrytime update resolves the bug also.
> 
> Care to explain the actual bug and what wreckage it causes? If it's a
> real bug then removing "ts->idle_entrytime = now" needs to be a
> separate patch. AFAICT, it's a cosmetic issue.

Thanks for review and sorry for slow response.
Yes, it's an exaggeration to call it a bug. I just thought that updating the
idle_entrytime when exiting the idle state should be something wrong. But 
since the idle_entrytime is not used meaningful, it has no impact.

>  
> > 2) There is a small widows in tick_nohz_stop_idle() between
> >  update_ts_time_stats() and clear ts->idle_active, that
> > get_cpu_idle/iowait_time_us(), when invoked remotely, may double
> >  count last idle period. This window exists w/o this change and this
> > change does not fix it.
> 
> Calling any of those functions from a remote cpu is broken to begin
> with, especially on 32bit machines. And that does not change with your
> patch at all.

Yes, as stated, this change does not fix it.

Arjan also said that get_cpu_idle/iowait_time_us() should not be called from 
remote CPUs, but as stated, drivers/cpufreq/cpufreq_governor.c seems trying 
to invoke them remotely. I have no idea of the impact, Rafael or Viresh, can 
you give me some hints?

> 
> What we really need here is protecting the idle stats fields with a
> raw_spinlock.

If no calling from remote CPU, then we don't need the spinlock protection, 
right?

Thanks
--jyh
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-10-20 23:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29  1:01 [RFC] nohz: no update idle entry time on get_cpu_idle/iowait_time_us() Yunhong Jiang
2015-10-11 18:38 ` Thomas Gleixner
2015-10-20 23:17   ` Yunhong Jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).