All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Proper kernel irq time reporting -v2
@ 2010-12-22  1:08 Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Venkatesh Pallipadi
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22  1:08 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, Martin Schwidefsky
  Cc: linux-kernel, Paul Turner, Eric Dumazet, Shaun Ruffell,
	Venkatesh Pallipadi

This is Part 2 of
"Proper kernel irq time accounting -v4"
http://lkml.indiana.edu/hypermail//linux/kernel/1010.0/01175.html

and applies over 2.6.37-rc7.

Part 1 solves the way irqs are accounted in scheduler and tasks. This
patchset solves how irq times are reported in /proc/stat and also not
to include irq time in task->stime, etc.

Example:
Running a cpu intensive loop and network intensive nc on a 4 CPU system
and looking at 'top' output.

With vanilla kernel:
Cpu0  :  0.0% us,  0.3% sy,  0.0% ni, 99.3% id,  0.0% wa,  0.0% hi,  0.3% si
Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  1.3% us, 27.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 71.4% si
Cpu3  :  1.6% us,  1.3% sy,  0.0% ni, 96.7% id,  0.0% wa,  0.0% hi,  0.3% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7555 root      20   0  1760  528  436 R  100  0.0   0:15.79 nc
 7563 root      20   0  3632  268  204 R  100  0.0   0:13.13 loop

Notes:
* Both tasks show 100% CPU, even when one of them is stuck on a CPU thats
  processing 70% softirq.
* no hardirq time.


With "Part 1" patches:
Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  2.0% us, 30.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 67.4% si
Cpu3  :  0.7% us,  0.7% sy,  0.3% ni, 98.3% id,  0.0% wa,  0.0% hi,  0.0% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6289 root      20   0  3632  268  204 R  100  0.0   2:18.67 loop
 5737 root      20   0  1760  528  436 R   33  0.0   0:26.72 nc

Notes:
* Tasks show 100% CPU and 33% CPU that correspond to their non-irq exec time.
* no hardirq time.


With "Part 1 + Part 2" patches:
Cpu0  :  1.3% us,  1.0% sy,  0.3% ni, 97.0% id,  0.0% wa,  0.0% hi,  0.3% si
Cpu1  : 99.3% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.7% hi,  0.0% si
Cpu2  :  1.3% us, 31.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  8.3% hi, 58.9% si
Cpu3  :  1.0% us,  2.0% sy,  0.3% ni, 95.0% id,  0.0% wa,  0.7% hi,  1.0% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20929 root      20   0  3632  268  204 R   99  0.0   3:48.25 loop
20796 root      20   0  1760  528  436 R   33  0.0   2:38.65 nc

Notes:
* Both task exec time and hard irq time reported correctly.
* hi and si time are based on fine granularity info and not on samples.
* getrusage would give proper utime/stime split not including irq times
  in that ratio.
* Other places that report user/sys time like, cgroup cpuacct.stat will
  now include only non-irq exectime.


Changes since v1:
( v1 - http://kerneltrap.com/mailarchive/linux-kernel/2010/10/25/4637170 )
* Rebase over other related code changes since v1

Signed-off-by: Venkatesh Pallipadi <venki@google.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2
  2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
@ 2010-12-22  1:09 ` Venkatesh Pallipadi
  2010-12-22  9:17   ` Peter Zijlstra
  2011-01-26 12:11   ` [tip:sched/core] softirqs: Free up pf flag PF_KSOFTIRQD tip-bot for Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2 Venkatesh Pallipadi
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22  1:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, Martin Schwidefsky
  Cc: linux-kernel, Paul Turner, Eric Dumazet, Shaun Ruffell,
	Venkatesh Pallipadi

Patchset:
This is Part 2 of
"Proper kernel irq time accounting -v4"
http://lkml.indiana.edu/hypermail//linux/kernel/1010.0/01175.html

and applies 2.6.37-rc7.

Part 1 solves the way irqs are accounted in scheduler and tasks. This
patchset solves how irq times are reported in /proc/stat and also not
to include irq time in task->stime, etc.

Example:
Running a cpu intensive loop and network intensive nc on a 4 CPU system
and looking at 'top' output.

With vanilla kernel:
Cpu0  :  0.0% us,  0.3% sy,  0.0% ni, 99.3% id,  0.0% wa,  0.0% hi,  0.3% si
Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  1.3% us, 27.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 71.4% si
Cpu3  :  1.6% us,  1.3% sy,  0.0% ni, 96.7% id,  0.0% wa,  0.0% hi,  0.3% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7555 root      20   0  1760  528  436 R  100  0.0   0:15.79 nc
 7563 root      20   0  3632  268  204 R  100  0.0   0:13.13 loop

Notes:
* Both tasks show 100% CPU, even when one of them is stuck on a CPU thats
  processing 70% softirq.
* no hardirq time.


With "Part 1" patches:
Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  2.0% us, 30.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 67.4% si
Cpu3  :  0.7% us,  0.7% sy,  0.3% ni, 98.3% id,  0.0% wa,  0.0% hi,  0.0% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6289 root      20   0  3632  268  204 R  100  0.0   2:18.67 loop
 5737 root      20   0  1760  528  436 R   33  0.0   0:26.72 nc

Notes:
* Tasks show 100% CPU and 33% CPU that correspond to their non-irq exec time.
* no hardirq time.


With "Part 1 + Part 2" patches:
Cpu0  :  1.3% us,  1.0% sy,  0.3% ni, 97.0% id,  0.0% wa,  0.0% hi,  0.3% si
Cpu1  : 99.3% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.7% hi,  0.0% si
Cpu2  :  1.3% us, 31.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  8.3% hi, 58.9% si
Cpu3  :  1.0% us,  2.0% sy,  0.3% ni, 95.0% id,  0.0% wa,  0.7% hi,  1.0% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20929 root      20   0  3632  268  204 R   99  0.0   3:48.25 loop
20796 root      20   0  1760  528  436 R   33  0.0   2:38.65 nc

Notes:
* Both task exec time and hard irq time reported correctly.
* hi and si time are based on fine granularity info and not on samples.
* getrusage would give proper utime/stime split not including irq times
  in that ratio.
* Other places that report user/sys time like, cgroup cpuacct.stat will
  now include only non-irq exectime.

This patch:

Cleanup patch, freeing up PF_KSOFTIRQD and use per_cpu ksoftirqd pointer
instead, as suggested by Eric Dumazet.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 include/linux/interrupt.h |    7 +++++++
 include/linux/sched.h     |    1 -
 kernel/sched.c            |    2 +-
 kernel/softirq.c          |    3 +--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 79d0c4f..3802fac 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -426,6 +426,13 @@ extern void raise_softirq(unsigned int nr);
  */
 DECLARE_PER_CPU(struct list_head [NR_SOFTIRQS], softirq_work_list);
 
+DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+
+static inline struct task_struct *this_cpu_ksoftirqd(void)
+{
+	return this_cpu_read(ksoftirqd);
+}
+
 /* Try to send a softirq to a remote cpu.  If this cannot be done, the
  * work will be queued to the local cpu.
  */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2238745..86924ff 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1699,7 +1699,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * Per process flags
  */
-#define PF_KSOFTIRQD	0x00000001	/* I am ksoftirqd */
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
diff --git a/kernel/sched.c b/kernel/sched.c
index 297d1a0..bfc9646 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2011,7 +2011,7 @@ void account_system_vtime(struct task_struct *curr)
 	 */
 	if (hardirq_count())
 		__this_cpu_add(cpu_hardirq_time, delta);
-	else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
 		__this_cpu_add(cpu_softirq_time, delta);
 
 	irq_time_write_end();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 18f4be0..b904be8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -54,7 +54,7 @@ EXPORT_SYMBOL(irq_stat);
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
-static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -721,7 +721,6 @@ static int run_ksoftirqd(void * __bind_cpu)
 {
 	set_current_state(TASK_INTERRUPTIBLE);
 
-	current->flags |= PF_KSOFTIRQD;
 	while (!kthread_should_stop()) {
 		preempt_disable();
 		if (!local_softirq_pending()) {
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2
  2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Venkatesh Pallipadi
@ 2010-12-22  1:09 ` Venkatesh Pallipadi
  2010-12-22  8:30   ` Martin Schwidefsky
  2011-01-26 12:12   ` [tip:sched/core] time: Add nsecs_to_cputime64 interface for asm-generic tip-bot for Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 3/5] Refactor account_system_time separating id-update -v2 Venkatesh Pallipadi
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22  1:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, Martin Schwidefsky
  Cc: linux-kernel, Paul Turner, Eric Dumazet, Shaun Ruffell,
	Venkatesh Pallipadi

Add nsecs_to_cputime64 interface. This is used in following patches that
updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 include/asm-generic/cputime.h |    3 +++
 include/linux/jiffies.h       |    1 +
 kernel/time.c                 |   23 +++++++++++++++++++++--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
index 2bcc5c7..61e03dd 100644
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -30,6 +30,9 @@ typedef u64 cputime64_t;
 #define cputime64_to_jiffies64(__ct)	(__ct)
 #define jiffies64_to_cputime64(__jif)	(__jif)
 #define cputime_to_cputime64(__ct)	((u64) __ct)
+#define cputime64_gt(__a, __b)		((__a) >  (__b))
+
+#define nsecs_to_cputime64(__ct)	nsecs_to_jiffies64(__ct)
 
 
 /*
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 6811f4b..922aa31 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -307,6 +307,7 @@ extern clock_t jiffies_to_clock_t(long x);
 extern unsigned long clock_t_to_jiffies(unsigned long x);
 extern u64 jiffies_64_to_clock_t(u64 x);
 extern u64 nsec_to_clock_t(u64 x);
+extern u64 nsecs_to_jiffies64(u64 n);
 extern unsigned long nsecs_to_jiffies(u64 n);
 
 #define TIMESTAMP_SIZE	30
diff --git a/kernel/time.c b/kernel/time.c
index ba9b338..fde4691 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -645,7 +645,7 @@ u64 nsec_to_clock_t(u64 x)
 }
 
 /**
- * nsecs_to_jiffies - Convert nsecs in u64 to jiffies
+ * nsecs_to_jiffies64 - Convert nsecs in u64 to jiffies64
  *
  * @n:	nsecs in u64
  *
@@ -657,7 +657,7 @@ u64 nsec_to_clock_t(u64 x)
  *   NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
  *   ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
  */
-unsigned long nsecs_to_jiffies(u64 n)
+u64 nsecs_to_jiffies64(u64 n)
 {
 #if (NSEC_PER_SEC % HZ) == 0
 	/* Common case, HZ = 100, 128, 200, 250, 256, 500, 512, 1000 etc. */
@@ -674,6 +674,25 @@ unsigned long nsecs_to_jiffies(u64 n)
 #endif
 }
 
+
+/**
+ * nsecs_to_jiffies - Convert nsecs in u64 to jiffies
+ *
+ * @n:	nsecs in u64
+ *
+ * Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
+ * And this doesn't return MAX_JIFFY_OFFSET since this function is designed
+ * for scheduler, not for use in device drivers to calculate timeout value.
+ *
+ * note:
+ *   NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
+ *   ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
+ */
+unsigned long nsecs_to_jiffies(u64 n)
+{
+	return (unsigned long)nsecs_to_jiffies64(n);
+}
+
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
 {
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/5] Refactor account_system_time separating id-update -v2
  2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2 Venkatesh Pallipadi
@ 2010-12-22  1:09 ` Venkatesh Pallipadi
  2011-01-26 12:12   ` [tip:sched/core] sched: Refactor account_system_time separating id-update tip-bot for Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 4/5] Export ns irqtimes through /proc/stat -v2 Venkatesh Pallipadi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22  1:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, Martin Schwidefsky
  Cc: linux-kernel, Paul Turner, Eric Dumazet, Shaun Ruffell,
	Venkatesh Pallipadi

Refactor account_system_time, to separate out the logic of
identifying the update needed and code that does actual update.

This is used by following patch for IRQ_TIME_ACCOUNTING,
which has different identification logic and same update logic.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched.c |   46 +++++++++++++++++++++++++++++++---------------
 1 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index bfc9646..c428017 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3701,6 +3701,32 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 }
 
 /*
+ * Account system cpu time to a process and desired cpustat field
+ * @p: the process that the cpu time gets accounted to
+ * @cputime: the cpu time spent in kernel space since the last update
+ * @cputime_scaled: cputime scaled by cpu frequency
+ * @target_cputime64: pointer to cpustat field that has to be updated
+ */
+static inline
+void __account_system_time(struct task_struct *p, cputime_t cputime,
+			cputime_t cputime_scaled, cputime64_t *target_cputime64)
+{
+	cputime64_t tmp = cputime_to_cputime64(cputime);
+
+	/* Add system time to process. */
+	p->stime = cputime_add(p->stime, cputime);
+	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
+	account_group_system_time(p, cputime);
+
+	/* Add system time to cpustat. */
+	*target_cputime64 = cputime64_add(*target_cputime64, tmp);
+	cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
+
+	/* Account for system time used */
+	acct_update_integrals(p);
+}
+
+/*
  * Account system cpu time to a process.
  * @p: the process that the cpu time gets accounted to
  * @hardirq_offset: the offset to subtract from hardirq_count()
@@ -3711,31 +3737,21 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 			 cputime_t cputime, cputime_t cputime_scaled)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t tmp;
+	cputime64_t *target_cputime64;
 
 	if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
 		account_guest_time(p, cputime, cputime_scaled);
 		return;
 	}
 
-	/* Add system time to process. */
-	p->stime = cputime_add(p->stime, cputime);
-	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
-	account_group_system_time(p, cputime);
-
-	/* Add system time to cpustat. */
-	tmp = cputime_to_cputime64(cputime);
 	if (hardirq_count() - hardirq_offset)
-		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+		target_cputime64 = &cpustat->irq;
 	else if (in_serving_softirq())
-		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+		target_cputime64 = &cpustat->softirq;
 	else
-		cpustat->system = cputime64_add(cpustat->system, tmp);
+		target_cputime64 = &cpustat->system;
 
-	cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
-
-	/* Account for system time used */
-	acct_update_integrals(p);
+	__account_system_time(p, cputime, cputime_scaled, target_cputime64);
 }
 
 /*
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] Export ns irqtimes through /proc/stat -v2
  2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
                   ` (2 preceding siblings ...)
  2010-12-22  1:09 ` [PATCH 3/5] Refactor account_system_time separating id-update -v2 Venkatesh Pallipadi
@ 2010-12-22  1:09 ` Venkatesh Pallipadi
  2011-01-26 12:13   ` [tip:sched/core] sched: Export ns irqtimes through /proc/stat tip-bot for Venkatesh Pallipadi
  2010-12-22  1:09 ` [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2 Venkatesh Pallipadi
  2011-01-06 15:31 ` [PATCH 0/5] Proper kernel irq time reporting -v2 Shaun Ruffell
  5 siblings, 1 reply; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22  1:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, Martin Schwidefsky
  Cc: linux-kernel, Paul Turner, Eric Dumazet, Shaun Ruffell,
	Venkatesh Pallipadi

CONFIG_IRQ_TIME_ACCOUNTING adds ns granularity irq time on each CPU.
This info is already used in scheduler to do proper task chargeback
(earlier patches). This patch retro-fits this ns granularity
hardirq and softirq information to /proc/stat irq and softirq fields.

The update is still done on timer tick, where we look at accumulated
ns hardirq/softirq time and account the tick to user/system/irq/hardirq/guest
accordingly.

No new interface added.

Earlier versions looked at adding this as new fields in some /proc
files. This one seems to be the best in terms of impact to existing
apps, even though it has somewhat more kernel code than earlier versions.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index c428017..193b1d0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2051,8 +2051,40 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 		sched_rt_avg_update(rq, irq_delta);
 }
 
+static int irqtime_account_hi_update(void)
+{
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+	unsigned long flags;
+	u64 latest_ns;
+	int ret = 0;
+
+	local_irq_save(flags);
+	latest_ns = this_cpu_read(cpu_hardirq_time);
+	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
+		ret = 1;
+	local_irq_restore(flags);
+	return ret;
+}
+
+static int irqtime_account_si_update(void)
+{
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+	unsigned long flags;
+	u64 latest_ns;
+	int ret = 0;
+
+	local_irq_save(flags);
+	latest_ns = this_cpu_read(cpu_softirq_time);
+	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->softirq))
+		ret = 1;
+	local_irq_restore(flags);
+	return ret;
+}
+
 #else /* CONFIG_IRQ_TIME_ACCOUNTING */
 
+#define sched_clock_irqtime	(0)
+
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
 	rq->clock_task += delta;
@@ -3754,6 +3786,65 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	__account_system_time(p, cputime, cputime_scaled, target_cputime64);
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+/*
+ * Account a tick to a process and cpustat
+ * @p: the process that the cpu time gets accounted to
+ * @user_tick: is the tick from userspace
+ * @rq: the pointer to rq
+ *
+ * Tick demultiplexing follows the order
+ * - pending hardirq update
+ * - pending softirq update
+ * - user_time
+ * - idle_time
+ * - system time
+ *   - check for guest_time
+ *   - else account as system_time
+ *
+ * Check for hardirq is done both for system and user time as there is
+ * no timer going off while we are on hardirq and hence we may never get an
+ * opportunity to update it solely in system time.
+ * p->stime and friends are only updated on system time and not on irq
+ * softirq as those do not count in task exec_runtime any more.
+ */
+static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
+						struct rq *rq)
+{
+	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
+	cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+
+	if (irqtime_account_hi_update()) {
+		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+	} else if (irqtime_account_si_update()) {
+		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+	} else if (user_tick) {
+		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
+	} else if (p == rq->idle) {
+		account_idle_time(cputime_one_jiffy);
+	} else if (p->flags & PF_VCPU) { /* System time or guest time */
+		account_guest_time(p, cputime_one_jiffy, one_jiffy_scaled);
+	} else {
+		__account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
+					&cpustat->system);
+	}
+}
+
+static void irqtime_account_idle_ticks(int ticks)
+{
+	int i;
+	struct rq *rq = this_rq();
+
+	for (i = 0; i < ticks; i++)
+		irqtime_account_process_tick(current, 0, rq);
+}
+#else
+static void irqtime_account_idle_ticks(int ticks) {}
+static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
+						struct rq *rq) {}
+#endif
+
 /*
  * Account for involuntary wait time.
  * @steal: the cpu time spent in involuntary wait
@@ -3794,6 +3885,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
 	struct rq *rq = this_rq();
 
+	if (sched_clock_irqtime) {
+		irqtime_account_process_tick(p, user_tick, rq);
+		return;
+	}
+
 	if (user_tick)
 		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
 	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
@@ -3819,6 +3915,12 @@ void account_steal_ticks(unsigned long ticks)
  */
 void account_idle_ticks(unsigned long ticks)
 {
+
+	if (sched_clock_irqtime) {
+		irqtime_account_idle_ticks(ticks);
+		return;
+	}
+
 	account_idle_time(jiffies_to_cputime(ticks));
 }
 
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2
  2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
                   ` (3 preceding siblings ...)
  2010-12-22  1:09 ` [PATCH 4/5] Export ns irqtimes through /proc/stat -v2 Venkatesh Pallipadi
@ 2010-12-22  1:09 ` Venkatesh Pallipadi
  2010-12-22  9:20   ` Peter Zijlstra
  2011-01-26 12:13   ` [tip:sched/core] softirqs: Account ksoftirqd time as cpustat softirq tip-bot for Venkatesh Pallipadi
  2011-01-06 15:31 ` [PATCH 0/5] Proper kernel irq time reporting -v2 Shaun Ruffell
  5 siblings, 2 replies; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22  1:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, Martin Schwidefsky
  Cc: linux-kernel, Paul Turner, Eric Dumazet, Shaun Ruffell,
	Venkatesh Pallipadi

softirq time in ksoftirqd context is not accounted in ns granularity
per cpu softirq stats, as we want that to be a part of ksoftirqd
exec_runtime.

Accounting them as softirq on /proc/stat separately.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 193b1d0..4ff6d1a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3819,6 +3819,14 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 		cpustat->irq = cputime64_add(cpustat->irq, tmp);
 	} else if (irqtime_account_si_update()) {
 		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+	} else if (this_cpu_ksoftirqd() == p) {
+		/*
+		 * ksoftirqd time do not get accounted in cpu_softirq_time.
+		 * So, we have to handle it separately here.
+		 * Also, p->stime needs to be updated for ksoftirqd.
+		 */
+		__account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
+					&cpustat->softirq);
 	} else if (user_tick) {
 		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
 	} else if (p == rq->idle) {
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2
  2010-12-22  1:09 ` [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2 Venkatesh Pallipadi
@ 2010-12-22  8:30   ` Martin Schwidefsky
  2010-12-22 14:23     ` Venkatesh Pallipadi
  2011-01-26 12:12   ` [tip:sched/core] time: Add nsecs_to_cputime64 interface for asm-generic tip-bot for Venkatesh Pallipadi
  1 sibling, 1 reply; 20+ messages in thread
From: Martin Schwidefsky @ 2010-12-22  8:30 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Tue, 21 Dec 2010 17:09:01 -0800
Venkatesh Pallipadi <venki@google.com> wrote:

> Add nsecs_to_cputime64 interface. This is used in following patches that
> updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING.
> 
> Tested-by: Shaun Ruffell <sruffell@digium.com>
> Signed-off-by: Venkatesh Pallipadi <venki@google.com>
> ---
>  include/asm-generic/cputime.h |    3 +++
>  include/linux/jiffies.h       |    1 +
>  kernel/time.c                 |   23 +++++++++++++++++++++--
>  3 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
> index 2bcc5c7..61e03dd 100644
> --- a/include/asm-generic/cputime.h
> +++ b/include/asm-generic/cputime.h
> @@ -30,6 +30,9 @@ typedef u64 cputime64_t;
>  #define cputime64_to_jiffies64(__ct)	(__ct)
>  #define jiffies64_to_cputime64(__jif)	(__jif)
>  #define cputime_to_cputime64(__ct)	((u64) __ct)
> +#define cputime64_gt(__a, __b)		((__a) >  (__b))
> +
> +#define nsecs_to_cputime64(__ct)	nsecs_to_jiffies64(__ct)
> 
> 
>  /*

If you define a new cputime primitive it is not enough to add it to the
generic cputime header. You have to add it to the arch-specific cputime
header for ia64, powerpc and s390 as well. Otherwise the build will
break with CONFIG_VIRT_CPU_ACCOUNTING=y on these architectures once the
new primitive is used.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2
  2010-12-22  1:09 ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Venkatesh Pallipadi
@ 2010-12-22  9:17   ` Peter Zijlstra
  2011-01-26 12:11   ` [tip:sched/core] softirqs: Free up pf flag PF_KSOFTIRQD tip-bot for Venkatesh Pallipadi
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2010-12-22  9:17 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Balbir Singh,
	Martin Schwidefsky, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Tue, 2010-12-21 at 17:09 -0800, Venkatesh Pallipadi wrote:
> Patchset:
> This is Part 2 of
> "Proper kernel irq time accounting -v4"
> http://lkml.indiana.edu/hypermail//linux/kernel/1010.0/01175.html
> 
> and applies 2.6.37-rc7.
> 
> Part 1 solves the way irqs are accounted in scheduler and tasks. This
> patchset solves how irq times are reported in /proc/stat and also not
> to include irq time in task->stime, etc.
> 
> Example:
> Running a cpu intensive loop and network intensive nc on a 4 CPU system
> and looking at 'top' output.
> 
> With vanilla kernel:
> Cpu0  :  0.0% us,  0.3% sy,  0.0% ni, 99.3% id,  0.0% wa,  0.0% hi,  0.3% si
> Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu2  :  1.3% us, 27.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 71.4% si
> Cpu3  :  1.6% us,  1.3% sy,  0.0% ni, 96.7% id,  0.0% wa,  0.0% hi,  0.3% si
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  7555 root      20   0  1760  528  436 R  100  0.0   0:15.79 nc
>  7563 root      20   0  3632  268  204 R  100  0.0   0:13.13 loop
> 
> Notes:
> * Both tasks show 100% CPU, even when one of them is stuck on a CPU thats
>   processing 70% softirq.
> * no hardirq time.
> 
> 
> With "Part 1" patches:
> Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu2  :  2.0% us, 30.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 67.4% si
> Cpu3  :  0.7% us,  0.7% sy,  0.3% ni, 98.3% id,  0.0% wa,  0.0% hi,  0.0% si
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6289 root      20   0  3632  268  204 R  100  0.0   2:18.67 loop
>  5737 root      20   0  1760  528  436 R   33  0.0   0:26.72 nc
> 
> Notes:
> * Tasks show 100% CPU and 33% CPU that correspond to their non-irq exec time.
> * no hardirq time.
> 
> 
> With "Part 1 + Part 2" patches:
> Cpu0  :  1.3% us,  1.0% sy,  0.3% ni, 97.0% id,  0.0% wa,  0.0% hi,  0.3% si
> Cpu1  : 99.3% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.7% hi,  0.0% si
> Cpu2  :  1.3% us, 31.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  8.3% hi, 58.9% si
> Cpu3  :  1.0% us,  2.0% sy,  0.3% ni, 95.0% id,  0.0% wa,  0.7% hi,  1.0% si
> 
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 20929 root      20   0  3632  268  204 R   99  0.0   3:48.25 loop
> 20796 root      20   0  1760  528  436 R   33  0.0   2:38.65 nc
> 
> Notes:
> * Both task exec time and hard irq time reported correctly.
> * hi and si time are based on fine granularity info and not on samples.
> * getrusage would give proper utime/stime split not including irq times
>   in that ratio.
> * Other places that report user/sys time like, cgroup cpuacct.stat will
>   now include only non-irq exectime.
> 
> This patch:

Your 0/x seem repeated in here for some reason... I would expect on the
below little bit.

> Cleanup patch, freeing up PF_KSOFTIRQD and use per_cpu ksoftirqd pointer
> instead, as suggested by Eric Dumazet.
> 
> Tested-by: Shaun Ruffell <sruffell@digium.com>
> Signed-off-by: Venkatesh Pallipadi <venki@google.com>
> --- 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2
  2010-12-22  1:09 ` [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2 Venkatesh Pallipadi
@ 2010-12-22  9:20   ` Peter Zijlstra
  2010-12-22 13:59     ` Venkatesh Pallipadi
  2011-01-26 12:13   ` [tip:sched/core] softirqs: Account ksoftirqd time as cpustat softirq tip-bot for Venkatesh Pallipadi
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2010-12-22  9:20 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Balbir Singh,
	Martin Schwidefsky, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Tue, 2010-12-21 at 17:09 -0800, Venkatesh Pallipadi wrote:
> softirq time in ksoftirqd context is not accounted in ns granularity
> per cpu softirq stats, as we want that to be a part of ksoftirqd
> exec_runtime.
> 
> Accounting them as softirq on /proc/stat separately.
> 
> Tested-by: Shaun Ruffell <sruffell@digium.com>
> Signed-off-by: Venkatesh Pallipadi <venki@google.com>
> ---
>  kernel/sched.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 193b1d0..4ff6d1a 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3819,6 +3819,14 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>  		cpustat->irq = cputime64_add(cpustat->irq, tmp);
>  	} else if (irqtime_account_si_update()) {
>  		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
> +	} else if (this_cpu_ksoftirqd() == p) {
> +		/*
> +		 * ksoftirqd time do not get accounted in cpu_softirq_time.
> +		 * So, we have to handle it separately here.
> +		 * Also, p->stime needs to be updated for ksoftirqd.
> +		 */
> +		__account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
> +					&cpustat->softirq);
>  	} else if (user_tick) {
>  		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>  	} else if (p == rq->idle) {

So you add the ksoftirqd runtime as softirq time in the /proc/stat
output?

That seems dubious...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2
  2010-12-22  9:20   ` Peter Zijlstra
@ 2010-12-22 13:59     ` Venkatesh Pallipadi
  2010-12-22 14:05       ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Balbir Singh,
	Martin Schwidefsky, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Wed, Dec 22, 2010 at 1:20 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-12-21 at 17:09 -0800, Venkatesh Pallipadi wrote:
>> softirq time in ksoftirqd context is not accounted in ns granularity
>> per cpu softirq stats, as we want that to be a part of ksoftirqd
>> exec_runtime.
>>
>> Accounting them as softirq on /proc/stat separately.
>>
>> Tested-by: Shaun Ruffell <sruffell@digium.com>
>> Signed-off-by: Venkatesh Pallipadi <venki@google.com>
>> ---
>>  kernel/sched.c |    8 ++++++++
>>  1 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 193b1d0..4ff6d1a 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -3819,6 +3819,14 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>>               cpustat->irq = cputime64_add(cpustat->irq, tmp);
>>       } else if (irqtime_account_si_update()) {
>>               cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
>> +     } else if (this_cpu_ksoftirqd() == p) {
>> +             /*
>> +              * ksoftirqd time do not get accounted in cpu_softirq_time.
>> +              * So, we have to handle it separately here.
>> +              * Also, p->stime needs to be updated for ksoftirqd.
>> +              */
>> +             __account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
>> +                                     &cpustat->softirq);
>>       } else if (user_tick) {
>>               account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>>       } else if (p == rq->idle) {
>
> So you add the ksoftirqd runtime as softirq time in the /proc/stat
> output?
>
> That seems dubious...
>

Yes. Thats similar to what we do today. As long as we are in softirq,
whether bottomhalf or ksoftirqd, we account the tick as softirq time
in /proc/stat.
I thought it also makes sense from user perspective. They want to know
how much time are we currently spending serving softirqs and hardirqs.
It is better to get that information at one place, instead of say look
at /proc/stat and add percpu ksoftirqd time to it. I mean,
user/administrator does not have to care whether ksoftirqd is involved
or not to know how much time was spend serving softirqs.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2
  2010-12-22 13:59     ` Venkatesh Pallipadi
@ 2010-12-22 14:05       ` Peter Zijlstra
  2010-12-22 14:17         ` Venkatesh Pallipadi
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2010-12-22 14:05 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Balbir Singh,
	Martin Schwidefsky, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Wed, 2010-12-22 at 05:59 -0800, Venkatesh Pallipadi wrote:
> > So you add the ksoftirqd runtime as softirq time in the /proc/stat
> > output?
> >
> > That seems dubious...
> >
> 
> Yes. Thats similar to what we do today. As long as we are in softirq,
> whether bottomhalf or ksoftirqd, we account the tick as softirq time
> in /proc/stat.
> I thought it also makes sense from user perspective. They want to know
> how much time are we currently spending serving softirqs and hardirqs.
> It is better to get that information at one place, instead of say look
> at /proc/stat and add percpu ksoftirqd time to it. I mean,
> user/administrator does not have to care whether ksoftirqd is involved
> or not to know how much time was spend serving softirqs.
> 
But ksoftirqd time is also counted as sys time, so somewhere this won't
add up to 100%.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2
  2010-12-22 14:05       ` Peter Zijlstra
@ 2010-12-22 14:17         ` Venkatesh Pallipadi
  0 siblings, 0 replies; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22 14:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Balbir Singh,
	Martin Schwidefsky, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Wed, Dec 22, 2010 at 6:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-12-22 at 05:59 -0800, Venkatesh Pallipadi wrote:
>> > So you add the ksoftirqd runtime as softirq time in the /proc/stat
>> > output?
>> >
>> > That seems dubious...
>> >
>>
>> Yes. Thats similar to what we do today. As long as we are in softirq,
>> whether bottomhalf or ksoftirqd, we account the tick as softirq time
>> in /proc/stat.
>> I thought it also makes sense from user perspective. They want to know
>> how much time are we currently spending serving softirqs and hardirqs.
>> It is better to get that information at one place, instead of say look
>> at /proc/stat and add percpu ksoftirqd time to it. I mean,
>> user/administrator does not have to care whether ksoftirqd is involved
>> or not to know how much time was spend serving softirqs.
>>
> But ksoftirqd time is also counted as sys time, so somewhere this won't
> add up to 100%.
>

It won't get added into cpu kstat system time. It will get added into
task stime, so looking into ksoftirqd task time, this will be seen as
system time. So, per cpu kstat should be consistent when added up. The
problem will be, with ksoftirqd, cpu stats will say the time as
softirq and task stats will account the time to ksoftirqd task as
stime, which is different from what we do with other tasks. I guess
that should be OK as ksoftirqd are special in some sense.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2
  2010-12-22  8:30   ` Martin Schwidefsky
@ 2010-12-22 14:23     ` Venkatesh Pallipadi
  2010-12-22 15:25       ` Martin Schwidefsky
  0 siblings, 1 reply; 20+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-22 14:23 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Wed, Dec 22, 2010 at 12:30 AM, Martin Schwidefsky
<schwidefsky@de.ibm.com> wrote:
> On Tue, 21 Dec 2010 17:09:01 -0800
> Venkatesh Pallipadi <venki@google.com> wrote:
>
>> Add nsecs_to_cputime64 interface. This is used in following patches that
>> updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING.
>>
>> Tested-by: Shaun Ruffell <sruffell@digium.com>
>> Signed-off-by: Venkatesh Pallipadi <venki@google.com>
>> ---
>>  include/asm-generic/cputime.h |    3 +++
>>  include/linux/jiffies.h       |    1 +
>>  kernel/time.c                 |   23 +++++++++++++++++++++--
>>  3 files changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
>> index 2bcc5c7..61e03dd 100644
>> --- a/include/asm-generic/cputime.h
>> +++ b/include/asm-generic/cputime.h
>> @@ -30,6 +30,9 @@ typedef u64 cputime64_t;
>>  #define cputime64_to_jiffies64(__ct) (__ct)
>>  #define jiffies64_to_cputime64(__jif)        (__jif)
>>  #define cputime_to_cputime64(__ct)   ((u64) __ct)
>> +#define cputime64_gt(__a, __b)               ((__a) >  (__b))
>> +
>> +#define nsecs_to_cputime64(__ct)     nsecs_to_jiffies64(__ct)
>>
>>
>>  /*
>
> If you define a new cputime primitive it is not enough to add it to the
> generic cputime header. You have to add it to the arch-specific cputime
> header for ia64, powerpc and s390 as well. Otherwise the build will
> break with CONFIG_VIRT_CPU_ACCOUNTING=y on these architectures once the
> new primitive is used.
>

But, those archs have arch specific cputime as well and so need a
different macro for nsecs_to_cputime64, instead of nsec_to_jiffies64.
Right? As of now this new primitive should not be used in those archs.
I guess this is something needed when using the primitive in future?

Thanks,
Venki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2
  2010-12-22 14:23     ` Venkatesh Pallipadi
@ 2010-12-22 15:25       ` Martin Schwidefsky
  0 siblings, 0 replies; 20+ messages in thread
From: Martin Schwidefsky @ 2010-12-22 15:25 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Peter Zijlstra, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
	Balbir Singh, linux-kernel, Paul Turner, Eric Dumazet,
	Shaun Ruffell

On Wed, 22 Dec 2010 06:23:55 -0800
Venkatesh Pallipadi <venki@google.com> wrote:

> On Wed, Dec 22, 2010 at 12:30 AM, Martin Schwidefsky
> <schwidefsky@de.ibm.com> wrote:
> > On Tue, 21 Dec 2010 17:09:01 -0800
> > Venkatesh Pallipadi <venki@google.com> wrote:
> >
> >> Add nsecs_to_cputime64 interface. This is used in following patches that
> >> updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING.
> >>
> >> Tested-by: Shaun Ruffell <sruffell@digium.com>
> >> Signed-off-by: Venkatesh Pallipadi <venki@google.com>
> >> ---
> >>  include/asm-generic/cputime.h |    3 +++
> >>  include/linux/jiffies.h       |    1 +
> >>  kernel/time.c                 |   23 +++++++++++++++++++++--
> >>  3 files changed, 25 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
> >> index 2bcc5c7..61e03dd 100644
> >> --- a/include/asm-generic/cputime.h
> >> +++ b/include/asm-generic/cputime.h
> >> @@ -30,6 +30,9 @@ typedef u64 cputime64_t;
> >>  #define cputime64_to_jiffies64(__ct) (__ct)
> >>  #define jiffies64_to_cputime64(__jif)        (__jif)
> >>  #define cputime_to_cputime64(__ct)   ((u64) __ct)
> >> +#define cputime64_gt(__a, __b)               ((__a) >  (__b))
> >> +
> >> +#define nsecs_to_cputime64(__ct)     nsecs_to_jiffies64(__ct)
> >>
> >>
> >>  /*
> >
> > If you define a new cputime primitive it is not enough to add it to the
> > generic cputime header. You have to add it to the arch-specific cputime
> > header for ia64, powerpc and s390 as well. Otherwise the build will
> > break with CONFIG_VIRT_CPU_ACCOUNTING=y on these architectures once the
> > new primitive is used.
> >
> 
> But, those archs have arch specific cputime as well and so need a
> different macro for nsecs_to_cputime64, instead of nsec_to_jiffies64.
> Right? As of now this new primitive should not be used in those archs.
> I guess this is something needed when using the primitive in future?

Yes on both parts of the question. You need an independent
implementation of the nsecs_to_cputime64 primitive for the architectures
that have their own definition of cputime and this definition will be
needed when the primitive is used is the future.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/5] Proper kernel irq time reporting -v2
  2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
                   ` (4 preceding siblings ...)
  2010-12-22  1:09 ` [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2 Venkatesh Pallipadi
@ 2011-01-06 15:31 ` Shaun Ruffell
  5 siblings, 0 replies; 20+ messages in thread
From: Shaun Ruffell @ 2011-01-06 15:31 UTC (permalink / raw)
  To: Venkatesh Pallipadi; +Cc: linux-kernel

On 12/21/2010 07:08 PM, Venkatesh Pallipadi wrote:
> This is Part 2 of
> "Proper kernel irq time accounting -v4"
> http://lkml.indiana.edu/hypermail//linux/kernel/1010.0/01175.html
> 
> and applies over 2.6.37-rc7.
> 

Just a minor note to say I've been running with your updated patches on
top of -rc7 the last week without any issues.  I just updated my
machines with these patches on top of 2.6.37 final without issue.

Thanks, Shaun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:sched/core] softirqs: Free up pf flag PF_KSOFTIRQD
  2010-12-22  1:09 ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Venkatesh Pallipadi
  2010-12-22  9:17   ` Peter Zijlstra
@ 2011-01-26 12:11   ` tip-bot for Venkatesh Pallipadi
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Venkatesh Pallipadi @ 2011-01-26 12:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, sruffell, tglx, venki, mingo

Commit-ID:  4dd53d891ca46dcc1fde0376a33540d3fd83cb9a
Gitweb:     http://git.kernel.org/tip/4dd53d891ca46dcc1fde0376a33540d3fd83cb9a
Author:     Venkatesh Pallipadi <venki@google.com>
AuthorDate: Tue, 21 Dec 2010 17:09:00 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Jan 2011 12:33:20 +0100

softirqs: Free up pf flag PF_KSOFTIRQD

Cleanup patch, freeing up PF_KSOFTIRQD and use per_cpu ksoftirqd pointer
instead, as suggested by Eric Dumazet.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-2-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/interrupt.h |    7 +++++++
 include/linux/sched.h     |    1 -
 kernel/sched.c            |    2 +-
 kernel/softirq.c          |    3 +--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 55e0d42..a1382b9 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -426,6 +426,13 @@ extern void raise_softirq(unsigned int nr);
  */
 DECLARE_PER_CPU(struct list_head [NR_SOFTIRQS], softirq_work_list);
 
+DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+
+static inline struct task_struct *this_cpu_ksoftirqd(void)
+{
+	return this_cpu_read(ksoftirqd);
+}
+
 /* Try to send a softirq to a remote cpu.  If this cannot be done, the
  * work will be queued to the local cpu.
  */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d747f94..af6e15f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1715,7 +1715,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * Per process flags
  */
-#define PF_KSOFTIRQD	0x00000001	/* I am ksoftirqd */
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
diff --git a/kernel/sched.c b/kernel/sched.c
index 6820b5b..8b89b3b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1880,7 +1880,7 @@ void account_system_vtime(struct task_struct *curr)
 	 */
 	if (hardirq_count())
 		__this_cpu_add(cpu_hardirq_time, delta);
-	else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
 		__this_cpu_add(cpu_softirq_time, delta);
 
 	irq_time_write_end();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 68eb5ef..0cee504 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -54,7 +54,7 @@ EXPORT_SYMBOL(irq_stat);
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
-static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -721,7 +721,6 @@ static int run_ksoftirqd(void * __bind_cpu)
 {
 	set_current_state(TASK_INTERRUPTIBLE);
 
-	current->flags |= PF_KSOFTIRQD;
 	while (!kthread_should_stop()) {
 		preempt_disable();
 		if (!local_softirq_pending()) {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:sched/core] time: Add nsecs_to_cputime64 interface for asm-generic
  2010-12-22  1:09 ` [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2 Venkatesh Pallipadi
  2010-12-22  8:30   ` Martin Schwidefsky
@ 2011-01-26 12:12   ` tip-bot for Venkatesh Pallipadi
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Venkatesh Pallipadi @ 2011-01-26 12:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, sruffell, tglx, venki, mingo

Commit-ID:  a1dabb6bfffccb897eff3e1d102dacf2a4bedf3b
Gitweb:     http://git.kernel.org/tip/a1dabb6bfffccb897eff3e1d102dacf2a4bedf3b
Author:     Venkatesh Pallipadi <venki@google.com>
AuthorDate: Tue, 21 Dec 2010 17:09:01 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Jan 2011 12:33:20 +0100

time: Add nsecs_to_cputime64 interface for asm-generic

Add nsecs_to_cputime64 interface. This is used in following patches that
updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-3-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/asm-generic/cputime.h |    3 +++
 include/linux/jiffies.h       |    1 +
 kernel/time.c                 |   23 +++++++++++++++++++++--
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/cputime.h b/include/asm-generic/cputime.h
index 2bcc5c7..61e03dd 100644
--- a/include/asm-generic/cputime.h
+++ b/include/asm-generic/cputime.h
@@ -30,6 +30,9 @@ typedef u64 cputime64_t;
 #define cputime64_to_jiffies64(__ct)	(__ct)
 #define jiffies64_to_cputime64(__jif)	(__jif)
 #define cputime_to_cputime64(__ct)	((u64) __ct)
+#define cputime64_gt(__a, __b)		((__a) >  (__b))
+
+#define nsecs_to_cputime64(__ct)	nsecs_to_jiffies64(__ct)
 
 
 /*
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 6811f4b..922aa31 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -307,6 +307,7 @@ extern clock_t jiffies_to_clock_t(long x);
 extern unsigned long clock_t_to_jiffies(unsigned long x);
 extern u64 jiffies_64_to_clock_t(u64 x);
 extern u64 nsec_to_clock_t(u64 x);
+extern u64 nsecs_to_jiffies64(u64 n);
 extern unsigned long nsecs_to_jiffies(u64 n);
 
 #define TIMESTAMP_SIZE	30
diff --git a/kernel/time.c b/kernel/time.c
index 3217435..55337a8 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -645,7 +645,7 @@ u64 nsec_to_clock_t(u64 x)
 }
 
 /**
- * nsecs_to_jiffies - Convert nsecs in u64 to jiffies
+ * nsecs_to_jiffies64 - Convert nsecs in u64 to jiffies64
  *
  * @n:	nsecs in u64
  *
@@ -657,7 +657,7 @@ u64 nsec_to_clock_t(u64 x)
  *   NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
  *   ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
  */
-unsigned long nsecs_to_jiffies(u64 n)
+u64 nsecs_to_jiffies64(u64 n)
 {
 #if (NSEC_PER_SEC % HZ) == 0
 	/* Common case, HZ = 100, 128, 200, 250, 256, 500, 512, 1000 etc. */
@@ -674,6 +674,25 @@ unsigned long nsecs_to_jiffies(u64 n)
 #endif
 }
 
+
+/**
+ * nsecs_to_jiffies - Convert nsecs in u64 to jiffies
+ *
+ * @n:	nsecs in u64
+ *
+ * Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
+ * And this doesn't return MAX_JIFFY_OFFSET since this function is designed
+ * for scheduler, not for use in device drivers to calculate timeout value.
+ *
+ * note:
+ *   NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
+ *   ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
+ */
+unsigned long nsecs_to_jiffies(u64 n)
+{
+	return (unsigned long)nsecs_to_jiffies64(n);
+}
+
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
 {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:sched/core] sched: Refactor account_system_time separating id-update
  2010-12-22  1:09 ` [PATCH 3/5] Refactor account_system_time separating id-update -v2 Venkatesh Pallipadi
@ 2011-01-26 12:12   ` tip-bot for Venkatesh Pallipadi
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Venkatesh Pallipadi @ 2011-01-26 12:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, sruffell, tglx, venki, mingo

Commit-ID:  70a89a6620f658d47a1488515bada4b8ee6291d8
Gitweb:     http://git.kernel.org/tip/70a89a6620f658d47a1488515bada4b8ee6291d8
Author:     Venkatesh Pallipadi <venki@google.com>
AuthorDate: Tue, 21 Dec 2010 17:09:02 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Jan 2011 12:33:21 +0100

sched: Refactor account_system_time separating id-update

Refactor account_system_time, to separate out the logic of
identifying the update needed and code that does actual update.

This is used by following patch for IRQ_TIME_ACCOUNTING,
which has different identification logic and same update logic.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-4-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   46 +++++++++++++++++++++++++++++++---------------
 1 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8b89b3b..e3fa921 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3568,6 +3568,32 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 }
 
 /*
+ * Account system cpu time to a process and desired cpustat field
+ * @p: the process that the cpu time gets accounted to
+ * @cputime: the cpu time spent in kernel space since the last update
+ * @cputime_scaled: cputime scaled by cpu frequency
+ * @target_cputime64: pointer to cpustat field that has to be updated
+ */
+static inline
+void __account_system_time(struct task_struct *p, cputime_t cputime,
+			cputime_t cputime_scaled, cputime64_t *target_cputime64)
+{
+	cputime64_t tmp = cputime_to_cputime64(cputime);
+
+	/* Add system time to process. */
+	p->stime = cputime_add(p->stime, cputime);
+	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
+	account_group_system_time(p, cputime);
+
+	/* Add system time to cpustat. */
+	*target_cputime64 = cputime64_add(*target_cputime64, tmp);
+	cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
+
+	/* Account for system time used */
+	acct_update_integrals(p);
+}
+
+/*
  * Account system cpu time to a process.
  * @p: the process that the cpu time gets accounted to
  * @hardirq_offset: the offset to subtract from hardirq_count()
@@ -3578,31 +3604,21 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 			 cputime_t cputime, cputime_t cputime_scaled)
 {
 	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
-	cputime64_t tmp;
+	cputime64_t *target_cputime64;
 
 	if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
 		account_guest_time(p, cputime, cputime_scaled);
 		return;
 	}
 
-	/* Add system time to process. */
-	p->stime = cputime_add(p->stime, cputime);
-	p->stimescaled = cputime_add(p->stimescaled, cputime_scaled);
-	account_group_system_time(p, cputime);
-
-	/* Add system time to cpustat. */
-	tmp = cputime_to_cputime64(cputime);
 	if (hardirq_count() - hardirq_offset)
-		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+		target_cputime64 = &cpustat->irq;
 	else if (in_serving_softirq())
-		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+		target_cputime64 = &cpustat->softirq;
 	else
-		cpustat->system = cputime64_add(cpustat->system, tmp);
+		target_cputime64 = &cpustat->system;
 
-	cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
-
-	/* Account for system time used */
-	acct_update_integrals(p);
+	__account_system_time(p, cputime, cputime_scaled, target_cputime64);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:sched/core] sched: Export ns irqtimes through /proc/stat
  2010-12-22  1:09 ` [PATCH 4/5] Export ns irqtimes through /proc/stat -v2 Venkatesh Pallipadi
@ 2011-01-26 12:13   ` tip-bot for Venkatesh Pallipadi
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Venkatesh Pallipadi @ 2011-01-26 12:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, sruffell, tglx, venki, mingo

Commit-ID:  abb74cefa9c682fb38ba86c17ca3c86fed6cc464
Gitweb:     http://git.kernel.org/tip/abb74cefa9c682fb38ba86c17ca3c86fed6cc464
Author:     Venkatesh Pallipadi <venki@google.com>
AuthorDate: Tue, 21 Dec 2010 17:09:03 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Jan 2011 12:33:21 +0100

sched: Export ns irqtimes through /proc/stat

CONFIG_IRQ_TIME_ACCOUNTING adds ns granularity irq time on each CPU.
This info is already used in scheduler to do proper task chargeback
(earlier patches). This patch retro-fits this ns granularity
hardirq and softirq information to /proc/stat irq and softirq fields.

The update is still done on timer tick, where we look at accumulated
ns hardirq/softirq time and account the tick to user/system/irq/hardirq/guest
accordingly.

No new interface added.

Earlier versions looked at adding this as new fields in some /proc
files. This one seems to be the best in terms of impact to existing
apps, even though it has somewhat more kernel code than earlier versions.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-5-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e3fa921..2a3c979 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1920,8 +1920,40 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 		sched_rt_avg_update(rq, irq_delta);
 }
 
+static int irqtime_account_hi_update(void)
+{
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+	unsigned long flags;
+	u64 latest_ns;
+	int ret = 0;
+
+	local_irq_save(flags);
+	latest_ns = this_cpu_read(cpu_hardirq_time);
+	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
+		ret = 1;
+	local_irq_restore(flags);
+	return ret;
+}
+
+static int irqtime_account_si_update(void)
+{
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+	unsigned long flags;
+	u64 latest_ns;
+	int ret = 0;
+
+	local_irq_save(flags);
+	latest_ns = this_cpu_read(cpu_softirq_time);
+	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->softirq))
+		ret = 1;
+	local_irq_restore(flags);
+	return ret;
+}
+
 #else /* CONFIG_IRQ_TIME_ACCOUNTING */
 
+#define sched_clock_irqtime	(0)
+
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
 	rq->clock_task += delta;
@@ -3621,6 +3653,65 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	__account_system_time(p, cputime, cputime_scaled, target_cputime64);
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+/*
+ * Account a tick to a process and cpustat
+ * @p: the process that the cpu time gets accounted to
+ * @user_tick: is the tick from userspace
+ * @rq: the pointer to rq
+ *
+ * Tick demultiplexing follows the order
+ * - pending hardirq update
+ * - pending softirq update
+ * - user_time
+ * - idle_time
+ * - system time
+ *   - check for guest_time
+ *   - else account as system_time
+ *
+ * Check for hardirq is done both for system and user time as there is
+ * no timer going off while we are on hardirq and hence we may never get an
+ * opportunity to update it solely in system time.
+ * p->stime and friends are only updated on system time and not on irq
+ * softirq as those do not count in task exec_runtime any more.
+ */
+static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
+						struct rq *rq)
+{
+	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
+	cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+
+	if (irqtime_account_hi_update()) {
+		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+	} else if (irqtime_account_si_update()) {
+		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+	} else if (user_tick) {
+		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
+	} else if (p == rq->idle) {
+		account_idle_time(cputime_one_jiffy);
+	} else if (p->flags & PF_VCPU) { /* System time or guest time */
+		account_guest_time(p, cputime_one_jiffy, one_jiffy_scaled);
+	} else {
+		__account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
+					&cpustat->system);
+	}
+}
+
+static void irqtime_account_idle_ticks(int ticks)
+{
+	int i;
+	struct rq *rq = this_rq();
+
+	for (i = 0; i < ticks; i++)
+		irqtime_account_process_tick(current, 0, rq);
+}
+#else
+static void irqtime_account_idle_ticks(int ticks) {}
+static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
+						struct rq *rq) {}
+#endif
+
 /*
  * Account for involuntary wait time.
  * @steal: the cpu time spent in involuntary wait
@@ -3661,6 +3752,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
 	struct rq *rq = this_rq();
 
+	if (sched_clock_irqtime) {
+		irqtime_account_process_tick(p, user_tick, rq);
+		return;
+	}
+
 	if (user_tick)
 		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
 	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
@@ -3686,6 +3782,12 @@ void account_steal_ticks(unsigned long ticks)
  */
 void account_idle_ticks(unsigned long ticks)
 {
+
+	if (sched_clock_irqtime) {
+		irqtime_account_idle_ticks(ticks);
+		return;
+	}
+
 	account_idle_time(jiffies_to_cputime(ticks));
 }
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:sched/core] softirqs: Account ksoftirqd time as cpustat softirq
  2010-12-22  1:09 ` [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2 Venkatesh Pallipadi
  2010-12-22  9:20   ` Peter Zijlstra
@ 2011-01-26 12:13   ` tip-bot for Venkatesh Pallipadi
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Venkatesh Pallipadi @ 2011-01-26 12:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, sruffell, tglx, venki, mingo

Commit-ID:  414bee9ba613adb3804965e2d84db32d0599f9c6
Gitweb:     http://git.kernel.org/tip/414bee9ba613adb3804965e2d84db32d0599f9c6
Author:     Venkatesh Pallipadi <venki@google.com>
AuthorDate: Tue, 21 Dec 2010 17:09:04 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 26 Jan 2011 12:33:22 +0100

softirqs: Account ksoftirqd time as cpustat softirq

softirq time in ksoftirqd context is not accounted in ns granularity
per cpu softirq stats, as we want that to be a part of ksoftirqd
exec_runtime.

Accounting them as softirq on /proc/stat separately.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-6-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 2a3c979..8b718b5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3686,6 +3686,14 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 		cpustat->irq = cputime64_add(cpustat->irq, tmp);
 	} else if (irqtime_account_si_update()) {
 		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+	} else if (this_cpu_ksoftirqd() == p) {
+		/*
+		 * ksoftirqd time do not get accounted in cpu_softirq_time.
+		 * So, we have to handle it separately here.
+		 * Also, p->stime needs to be updated for ksoftirqd.
+		 */
+		__account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
+					&cpustat->softirq);
 	} else if (user_tick) {
 		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
 	} else if (p == rq->idle) {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-01-26 12:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Venkatesh Pallipadi
2010-12-22  9:17   ` Peter Zijlstra
2011-01-26 12:11   ` [tip:sched/core] softirqs: Free up pf flag PF_KSOFTIRQD tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2 Venkatesh Pallipadi
2010-12-22  8:30   ` Martin Schwidefsky
2010-12-22 14:23     ` Venkatesh Pallipadi
2010-12-22 15:25       ` Martin Schwidefsky
2011-01-26 12:12   ` [tip:sched/core] time: Add nsecs_to_cputime64 interface for asm-generic tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 3/5] Refactor account_system_time separating id-update -v2 Venkatesh Pallipadi
2011-01-26 12:12   ` [tip:sched/core] sched: Refactor account_system_time separating id-update tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 4/5] Export ns irqtimes through /proc/stat -v2 Venkatesh Pallipadi
2011-01-26 12:13   ` [tip:sched/core] sched: Export ns irqtimes through /proc/stat tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2 Venkatesh Pallipadi
2010-12-22  9:20   ` Peter Zijlstra
2010-12-22 13:59     ` Venkatesh Pallipadi
2010-12-22 14:05       ` Peter Zijlstra
2010-12-22 14:17         ` Venkatesh Pallipadi
2011-01-26 12:13   ` [tip:sched/core] softirqs: Account ksoftirqd time as cpustat softirq tip-bot for Venkatesh Pallipadi
2011-01-06 15:31 ` [PATCH 0/5] Proper kernel irq time reporting -v2 Shaun Ruffell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.