All of lore.kernel.org
 help / color / mirror / Atom feed
* Have we changed /proc/stat idle statistics by NOHZ kernel?
@ 2011-07-25 14:33 Michal Hocko
  2011-08-01 19:59 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-07-25 14:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, Andrew Morton, Alexey Dobriyan

Hi,
we have a customer reporting that /proc/stat doesn't provide correct
results about idle time if the machine is idle.
The issue is caused by the fact that tickles kernel doesn't update
kstat_cpu(i).cpustat.idle while it is tickles. Tools that parse this
file interpret the unchanged value as 0% idle since the last time.
While I personally do not think that measuring the idle machine is
that important one could say that the semantic of the file has changed
with NOHZ which is not good as we are trying to keep this interface
stable.
One way to fix this is to consider the current status of idle in
show_stat. The very primitive attempt of that can be seen bellow (on
top of the current Linus tree). I know it has several issue it just
illustrates what I am trying to say.  It will not work if jiffies
overflow while the CPU was tickles and it also misses locking and
handling !NOHZ configuration.

I have also noticed we have get_cpu_idle_time_us which should do
something similar. Should it be used instead or it is more intrusive?

Btw. is this considered to be a problem at all?

Thanks
---
>From 015b5535a0cf9b75357afabd9e1d5d17558ed985 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 25 Jul 2011 16:16:26 +0200
Subject: [PATCH] proc: consider time when ticks are off when reporting idle
 time

---
 fs/proc/stat.c           |    3 +++
 kernel/time/tick-sched.c |   20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 9758b65..970ec81 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -21,6 +21,8 @@
 #define arch_idle_time(cpu) 0
 #endif
 
+cputime64_t nohz_idle_shift(int cpu);
+
 static int show_stat(struct seq_file *p, void *v)
 {
 	int i, j;
@@ -44,6 +46,7 @@ static int show_stat(struct seq_file *p, void *v)
 		system = cputime64_add(system, kstat_cpu(i).cpustat.system);
 		idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
 		idle = cputime64_add(idle, arch_idle_time(i));
+		idle = cputime64_add(idle, nohz_idle_shift(i));
 		iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
 		irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
 		softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d5097c4..57d11fa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -194,6 +194,26 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 	return now;
 }
 
+cputime64_t nohz_idle_shift(int cpu)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	cputime64_t notick_idle = 0;
+
+	if (ts->idle_active && time_after(ts->next_jiffies, jiffies)) {
+		/*
+		 * we are idle and not ticking due to NOHZ so the
+		 * kernel doesn't account for the idle. Let's use
+		 * last_jiffies. We are screwed when jiffies overflow
+		 * of course but what else we can do?
+		 */
+		notick_idle = cputime64_add(notick_idle,
+				jiffies_to_cputime(
+					jiffies - ts->last_jiffies));
+	}
+
+	return notick_idle;
+}
+
 /**
  * get_cpu_idle_time_us - get the total idle time of a cpu
  * @cpu: CPU number to query
-- 
1.7.5.4


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Have we changed /proc/stat idle statistics by NOHZ kernel?
  2011-07-25 14:33 Have we changed /proc/stat idle statistics by NOHZ kernel? Michal Hocko
@ 2011-08-01 19:59 ` Andrew Morton
  2011-08-02 17:02   ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2011-08-01 19:59 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, Thomas Gleixner, Alexey Dobriyan

On Mon, 25 Jul 2011 16:33:13 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Hi,
> we have a customer reporting that /proc/stat doesn't provide correct
> results about idle time if the machine is idle.
> The issue is caused by the fact that tickles kernel doesn't update
> kstat_cpu(i).cpustat.idle while it is tickles. Tools that parse this
> file interpret the unchanged value as 0% idle since the last time.
> While I personally do not think that measuring the idle machine is
> that important one could say that the semantic of the file has changed
> with NOHZ which is not good as we are trying to keep this interface
> stable.
> One way to fix this is to consider the current status of idle in
> show_stat. The very primitive attempt of that can be seen bellow (on
> top of the current Linus tree). I know it has several issue it just
> illustrates what I am trying to say.  It will not work if jiffies
> overflow while the CPU was tickles and it also misses locking and
> handling !NOHZ configuration.
> 
> I have also noticed we have get_cpu_idle_time_us which should do
> something similar. Should it be used instead or it is more intrusive?
> 
> Btw. is this considered to be a problem at all?
> 

I'd consider it a bug and a regression.  If the machine was idle and
/proc/stat says "zero idle time" then that is simply incorrect.

Can we just cheat?  subtract elapsed R and D time from elapsed wall
time and print that out?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Have we changed /proc/stat idle statistics by NOHZ kernel?
  2011-08-01 19:59 ` Andrew Morton
@ 2011-08-02 17:02   ` Michal Hocko
       [not found]     ` <cover.1312544541.git.mhocko@suse.cz>
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-08-02 17:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Thomas Gleixner, Alexey Dobriyan

On Mon 01-08-11 12:59:16, Andrew Morton wrote:
> On Mon, 25 Jul 2011 16:33:13 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Hi,
> > we have a customer reporting that /proc/stat doesn't provide correct
> > results about idle time if the machine is idle.
> > The issue is caused by the fact that tickles kernel doesn't update
> > kstat_cpu(i).cpustat.idle while it is tickles. Tools that parse this
> > file interpret the unchanged value as 0% idle since the last time.
> > While I personally do not think that measuring the idle machine is
> > that important one could say that the semantic of the file has changed
> > with NOHZ which is not good as we are trying to keep this interface
> > stable.
> > One way to fix this is to consider the current status of idle in
> > show_stat. The very primitive attempt of that can be seen bellow (on
> > top of the current Linus tree). I know it has several issue it just
> > illustrates what I am trying to say.  It will not work if jiffies
> > overflow while the CPU was tickles and it also misses locking and
> > handling !NOHZ configuration.
> > 
> > I have also noticed we have get_cpu_idle_time_us which should do
> > something similar. Should it be used instead or it is more intrusive?
> > 
> > Btw. is this considered to be a problem at all?
> > 
> 
> I'd consider it a bug and a regression.  If the machine was idle and
> /proc/stat says "zero idle time" then that is simply incorrect.
> 
> Can we just cheat?  subtract elapsed R and D time from elapsed wall
> time and print that out?

I was thinking about that as well. Something like 
now - (cpustat.user + cpustat.system + cpustat.iowait)

but this has the similar problem because iowait is accounted the same
way as idle.
Or did you mean some other counters?

I am currently looking into using get_cpu_idle_time_us which should be
more convenient but I do not like that it calls update_ts_time_stats
unconditionally (because then we will race with governors which use that
function as well). 

I will try to rip the core out of the function and reuse it here. 
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] proc: consider NO_HZ when printing idle and iowait times
       [not found]       ` <9314d03604802205b02524ebda1e534547042dfa.1312544541.git.mhocko@suse.cz>
@ 2011-08-05 14:23         ` Michal Hocko
  2011-08-22  9:56             ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-08-05 14:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Jones, Arnd Bergmann, Thomas Gleixner, Andrew Morton,
	cpufreq, Alexey Dobriyan

On Thu 04-08-11 17:20:32, Michal Hocko wrote:
> show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
> statistics when priting information about idle and iowait times.
> This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
> counters are updated periodically.
> With NO_HZ things got more tricky because we are not doing idle/iowait
> accounting while we are tickless so the value might get outdated.
> Users of /proc/stat will notice that by unchanged idle/iowait values
> which is then interpreted as 0% idle/iowait time. From the user space
> POV this is an unexpected behavior and a change of the interface.
> 
> Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
> total idle/iowait time since boot and it doesn't rely on sampling or any
> other periodic activity. Fall back to the previous behavior if NO_HZ is
> disabled or not configured.

I forgot to mention that this might be racy because we are updating
those per-cpu values without having preemption disabled or any other
locking which would be necessary as governors iterate over all CPUs.
Governors do not have to care about that because they are singletons. 
Introducing locks doesn't look like an option but I was thinking
about adding __get_cpu_{idle,iowait}_time_us which wouldn't call
update_ts_timestat and calculate the result instead.
I can add a patch which does that but I wanted to hear about general
approach first.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3 RFC] Fix /proc/stat idle/iowait statistics for tickless kernel
       [not found]     ` <cover.1312544541.git.mhocko@suse.cz>
       [not found]       ` <9314d03604802205b02524ebda1e534547042dfa.1312544541.git.mhocko@suse.cz>
@ 2011-08-17 11:59       ` Michal Hocko
  2011-08-23 21:18         ` Andrew Morton
       [not found]       ` <c6c176ace7eca5fffe568865193a1061d181c77c.1312544541.git.mhocko@suse.cz>
  2 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2011-08-17 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Jones, Arnd Bergmann, Thomas Gleixner, Andrew Morton,
	cpufreq, Alexey Dobriyan

Nobody interested or it just slipped through?

On Fri 05-08-11 13:42:21, Michal Hocko wrote:
> [Added cpufreq people into CC - the email thread started at
> https://lkml.org/lkml/2011/7/25/227]
> 
> This patch set aims at addressing /proc/stat issue which has been
> introduced with tickless kernel.
> In short, show_stat (proc handler) relies on kstat_cpu(i).cpustat
> statistics which are updated periodically so those numbers are more or
> less accurate.
> This is, however, not true with tickless kernel for idle and iowait
> counters because those are not updated while the cpu is in the tickless
> state. As the time when CPU might be tickless is not bounded, we can see
> really outdated values.
> The biggest problem is that tools which read /proc/stat interpret
> unchanged idle/iowait numbers as 0% idle/iowait which might confuse
> those who rely on them.
> 
> First patch in this series changes update_ts_time_stat semantic. The
> current implementation updates idle counter regardless we are in iowait
> loop at the moment. I see it as an optimization because cpufreq drivers,
> which are only users of those counters, care about busy vs. non-busy
> states so idle+iowait makes perfect sense. This, however, makes idle
> counter useless for others. 
> I think that calling using get_cpu_idle_time_us + get_cpu_iowait_time_us
> should have the same meaning (at least this is what we do for jiffies
> variants).
> This is necessary for the third patch.
> 
> The second patch is trivial and just a cleanup. It is not necessary for
> the series but I fixed those macros while I was at it.
> 
> The third patch uses get_cpu_{idle,iowait}_time_us to get precise
> counters. We still fall back to kstat_cpu if tickless kernel is
> disabled.
> 
> Thanks for any comments.
> 
> Michal Hocko (3):
>   tick: fix update_ts_time_stat idle accounting
>   cputime: clean up cputime_to_usecs and usecs_to_cputime macros
>   proc: consider NO_HZ when printing idle and iowait times
> 
>  drivers/cpufreq/cpufreq_conservative.c |    4 ++-
>  drivers/cpufreq/cpufreq_ondemand.c     |    4 ++-
>  fs/proc/stat.c                         |   41 ++++++++++++++++++++++++++-----
>  include/asm-generic/cputime.h          |    4 +-
>  kernel/time/tick-sched.c               |    8 +++---
>  5 files changed, 46 insertions(+), 15 deletions(-)
> 
> -- 
> 1.7.5.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] cputime: clean up cputime_to_usecs and usecs_to_cputime macros
       [not found]       ` <c6c176ace7eca5fffe568865193a1061d181c77c.1312544541.git.mhocko@suse.cz>
@ 2011-08-17 14:02         ` Arnd Bergmann
  0 siblings, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2011-08-17 14:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Dave Jones, Thomas Gleixner, Andrew Morton,
	cpufreq, Alexey Dobriyan

On Thursday 04 August 2011, Michal Hocko wrote:
> get rid of semicolon so that those expressions can be used also
> somewhere else than just in an assignment.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nohz: do not update idle/iowait counters from get_cpu_{idle,iowait}_time_us if not asked
  2011-08-05 14:23         ` [PATCH 3/3] proc: consider NO_HZ when printing idle and iowait times Michal Hocko
@ 2011-08-22  9:56             ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2011-08-22  9:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Jones, Arnd Bergmann, Thomas Gleixner, Andrew Morton,
	cpufreq, Alexey Dobriyan

On Fri 05-08-11 16:23:49, Michal Hocko wrote:
> On Thu 04-08-11 17:20:32, Michal Hocko wrote:
> > show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
> > statistics when priting information about idle and iowait times.
> > This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
> > counters are updated periodically.
> > With NO_HZ things got more tricky because we are not doing idle/iowait
> > accounting while we are tickless so the value might get outdated.
> > Users of /proc/stat will notice that by unchanged idle/iowait values
> > which is then interpreted as 0% idle/iowait time. From the user space
> > POV this is an unexpected behavior and a change of the interface.
> > 
> > Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
> > total idle/iowait time since boot and it doesn't rely on sampling or any
> > other periodic activity. Fall back to the previous behavior if NO_HZ is
> > disabled or not configured.
> 
> I forgot to mention that this might be racy because we are updating
> those per-cpu values without having preemption disabled or any other
> locking which would be necessary as governors iterate over all CPUs.
> Governors do not have to care about that because they are singletons. 
> Introducing locks doesn't look like an option but I was thinking
> about adding __get_cpu_{idle,iowait}_time_us which wouldn't call
> update_ts_timestat and calculate the result instead.
> I can add a patch which does that but I wanted to hear about general
> approach first.

I guess we do not need a separate __get_cpu_{idle,iowait}_time_us
variant and rather reuse last_update_time parameter to determine whether
to update counters or not.

AFAICS we can still race with IRQ in update path (governors):
irq_enter
  tick_check_idle
    tick_check_nohz
      tick_nohz_stop_idle
but this is a separate issue IMO.
--- 
>From 46b3c7735b23180299c6a85089571574f28527a2 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 22 Aug 2011 11:35:27 +0200
Subject: [PATCH] nohz: do not update idle/iowait counters from
 get_cpu_{idle,iowait}_time_us if not asked

get_cpu_{idle,iowait}_time_us update idle/iowait counters
unconditionally if the given CPU is in the idle loop. This doesn't work
well outside of CPU governors which are singletons so nobody (except for
IRQ) can race with them.
We will need to use both functions from /proc/stat handler to properly
handle nohz idle/iowait times.
Let's update those counters only if the given last_update_time parameter
is non-NULL which means that the caller is interested in updating.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/time/tick-sched.c |   37 +++++++++++++++++++++++++++++++------
 1 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7ab44bc..5bcff38 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -198,7 +198,8 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 /**
  * get_cpu_idle_time_us - get the total idle time of a cpu
  * @cpu: CPU number to query
- * @last_update_time: variable to store update time in
+ * @last_update_time: variable to store update time in. Do not update
+ * counters if NULL.
  *
  * Return the cummulative idle time (since boot) for a given
  * CPU, in microseconds.
@@ -211,20 +212,33 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now, idle;
 
 	if (!tick_nohz_enabled)
 		return -1;
 
-	update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);
+	now = ktime_get();
+	if (last_update_time) {
+		update_ts_time_stats(cpu, ts, now, last_update_time);
+		idle = ts->idle_sleeptime;
+	} else {
+		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+			idle = ktime_add(ts->idle_sleeptime, delta);
+		} else
+			idle = ts->idle_sleeptime;
+	}
+
+	return ktime_to_us(idle);
 
-	return ktime_to_us(ts->idle_sleeptime);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
 /**
  * get_cpu_iowait_time_us - get the total iowait time of a cpu
  * @cpu: CPU number to query
- * @last_update_time: variable to store update time in
+ * @last_update_time: variable to store update time in. Do not update
+ * counters if NULL.
  *
  * Return the cummulative iowait time (since boot) for a given
  * CPU, in microseconds.
@@ -237,13 +251,24 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now, iowait;
 
 	if (!tick_nohz_enabled)
 		return -1;
 
-	update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);
+	now = ktime_get();
+	if (last_update_time) {
+		update_ts_time_stats(cpu, ts, now, last_update_time);
+		iowait = ts->iowait_sleeptime;
+	} else {
+		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+			iowait = ktime_add(ts->iowait_sleeptime, delta);
+		} else
+			iowait = ts->iowait_sleeptime;
+	}
 
-	return ktime_to_us(ts->iowait_sleeptime);
+	return ktime_to_us(iowait);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-- 
1.7.5.4


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] nohz: do not update idle/iowait counters from get_cpu_{idle,iowait}_time_us if not asked
@ 2011-08-22  9:56             ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2011-08-22  9:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Jones, Arnd Bergmann, Thomas Gleixner, Andrew Morton,
	cpufreq, Alexey Dobriyan

On Fri 05-08-11 16:23:49, Michal Hocko wrote:
> On Thu 04-08-11 17:20:32, Michal Hocko wrote:
> > show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
> > statistics when priting information about idle and iowait times.
> > This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
> > counters are updated periodically.
> > With NO_HZ things got more tricky because we are not doing idle/iowait
> > accounting while we are tickless so the value might get outdated.
> > Users of /proc/stat will notice that by unchanged idle/iowait values
> > which is then interpreted as 0% idle/iowait time. From the user space
> > POV this is an unexpected behavior and a change of the interface.
> > 
> > Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
> > total idle/iowait time since boot and it doesn't rely on sampling or any
> > other periodic activity. Fall back to the previous behavior if NO_HZ is
> > disabled or not configured.
> 
> I forgot to mention that this might be racy because we are updating
> those per-cpu values without having preemption disabled or any other
> locking which would be necessary as governors iterate over all CPUs.
> Governors do not have to care about that because they are singletons. 
> Introducing locks doesn't look like an option but I was thinking
> about adding __get_cpu_{idle,iowait}_time_us which wouldn't call
> update_ts_timestat and calculate the result instead.
> I can add a patch which does that but I wanted to hear about general
> approach first.

I guess we do not need a separate __get_cpu_{idle,iowait}_time_us
variant and rather reuse last_update_time parameter to determine whether
to update counters or not.

AFAICS we can still race with IRQ in update path (governors):
irq_enter
  tick_check_idle
    tick_check_nohz
      tick_nohz_stop_idle
but this is a separate issue IMO.
--- 
From 46b3c7735b23180299c6a85089571574f28527a2 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 22 Aug 2011 11:35:27 +0200
Subject: [PATCH] nohz: do not update idle/iowait counters from
 get_cpu_{idle,iowait}_time_us if not asked

get_cpu_{idle,iowait}_time_us update idle/iowait counters
unconditionally if the given CPU is in the idle loop. This doesn't work
well outside of CPU governors which are singletons so nobody (except for
IRQ) can race with them.
We will need to use both functions from /proc/stat handler to properly
handle nohz idle/iowait times.
Let's update those counters only if the given last_update_time parameter
is non-NULL which means that the caller is interested in updating.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/time/tick-sched.c |   37 +++++++++++++++++++++++++++++++------
 1 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7ab44bc..5bcff38 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -198,7 +198,8 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 /**
  * get_cpu_idle_time_us - get the total idle time of a cpu
  * @cpu: CPU number to query
- * @last_update_time: variable to store update time in
+ * @last_update_time: variable to store update time in. Do not update
+ * counters if NULL.
  *
  * Return the cummulative idle time (since boot) for a given
  * CPU, in microseconds.
@@ -211,20 +212,33 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now, idle;
 
 	if (!tick_nohz_enabled)
 		return -1;
 
-	update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);
+	now = ktime_get();
+	if (last_update_time) {
+		update_ts_time_stats(cpu, ts, now, last_update_time);
+		idle = ts->idle_sleeptime;
+	} else {
+		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+			idle = ktime_add(ts->idle_sleeptime, delta);
+		} else
+			idle = ts->idle_sleeptime;
+	}
+
+	return ktime_to_us(idle);
 
-	return ktime_to_us(ts->idle_sleeptime);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
 /**
  * get_cpu_iowait_time_us - get the total iowait time of a cpu
  * @cpu: CPU number to query
- * @last_update_time: variable to store update time in
+ * @last_update_time: variable to store update time in. Do not update
+ * counters if NULL.
  *
  * Return the cummulative iowait time (since boot) for a given
  * CPU, in microseconds.
@@ -237,13 +251,24 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now, iowait;
 
 	if (!tick_nohz_enabled)
 		return -1;
 
-	update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);
+	now = ktime_get();
+	if (last_update_time) {
+		update_ts_time_stats(cpu, ts, now, last_update_time);
+		iowait = ts->iowait_sleeptime;
+	} else {
+		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+			iowait = ktime_add(ts->iowait_sleeptime, delta);
+		} else
+			iowait = ts->iowait_sleeptime;
+	}
 
-	return ktime_to_us(ts->iowait_sleeptime);
+	return ktime_to_us(iowait);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-- 
1.7.5.4


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3 RFC] Fix /proc/stat idle/iowait statistics for tickless kernel
  2011-08-17 11:59       ` [PATCH 0/3 RFC] Fix /proc/stat idle/iowait statistics for tickless kernel Michal Hocko
@ 2011-08-23 21:18         ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2011-08-23 21:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Dave Jones, Arnd Bergmann, Thomas Gleixner,
	cpufreq, Alexey Dobriyan

On Wed, 17 Aug 2011 13:59:25 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Nobody interested or it just slipped through?

I was asleep on a beach at the time, am catching up now.

The patchset is a bit confusing, with an afterthough followup patch. 
Please redo it all and send a v2 series?

Then I'll queue it up and will periodically harrass Thomas with it, so
there's no point in hiding ;)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-08-23 21:18 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-25 14:33 Have we changed /proc/stat idle statistics by NOHZ kernel? Michal Hocko
2011-08-01 19:59 ` Andrew Morton
2011-08-02 17:02   ` Michal Hocko
     [not found]     ` <cover.1312544541.git.mhocko@suse.cz>
     [not found]       ` <9314d03604802205b02524ebda1e534547042dfa.1312544541.git.mhocko@suse.cz>
2011-08-05 14:23         ` [PATCH 3/3] proc: consider NO_HZ when printing idle and iowait times Michal Hocko
2011-08-22  9:56           ` [PATCH] nohz: do not update idle/iowait counters from get_cpu_{idle,iowait}_time_us if not asked Michal Hocko
2011-08-22  9:56             ` Michal Hocko
2011-08-17 11:59       ` [PATCH 0/3 RFC] Fix /proc/stat idle/iowait statistics for tickless kernel Michal Hocko
2011-08-23 21:18         ` Andrew Morton
     [not found]       ` <c6c176ace7eca5fffe568865193a1061d181c77c.1312544541.git.mhocko@suse.cz>
2011-08-17 14:02         ` [PATCH 2/3] cputime: clean up cputime_to_usecs and usecs_to_cputime macros Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.