linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle
@ 2016-06-08  2:29 riel
  2016-06-08  2:30 ` [PATCH 1/5] sched,time: count actually elapsed irq & softirq time riel
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: riel @ 2016-06-08  2:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernellwp, mingo, peterz, tglx, fweisbec

This patch series seems to make irq time accounting work with
nohz_idle, by having it re-use the same strategy used for steal
time accounting in Wanpeng Li's patch.

It applies on top of an earlier version of Wanpeng Li's patch.

It gets rid of some code duplication, but needs a little more
work. Specifically, selecting CONFIG_IRQ_TIME_ACCOUNTING at the
same time as CONFIG_TICK_CPU_ACCOUNTING probably breaks :)

I am posting this because it works, and because I would like to
know what other changes I need to make at the same time.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] sched,time: count actually elapsed irq & softirq time
  2016-06-08  2:29 [PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle riel
@ 2016-06-08  2:30 ` riel
  2016-06-08  2:30 ` [PATCH 2/5] nohz,cputime: remove VTIME_GEN vtime irq time code riel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-08  2:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernellwp, mingo, peterz, tglx, fweisbec

From: Rik van Riel <riel@redhat.com>

Currently, if there is any irq or softirq time during 'ticks'
jiffies, the entire period is accounted as irq or softirq
time.

This is inaccurate if only a subset of those 'ticks' jiffies was
actually spent handling irqs, and could conceivably mis-count
all of the ticks during a period as irq time when there was a
mix of irq and softirq time.

Fix this by changing irqtime_account_hi_update and
irqtime_account_si_update to round elapsed irq and softirq
time to jiffies, and return the number of jiffies spent in
each mode, similar to how steal time is handled.

Additionally, have irqtime_account_process_tick take into
account how much time was spent in each of steal, irq,
and softirq time.

The latter could help improve the accuracy of timekeeping
when returning from idle on a NO_HZ_IDLE CPU.

Properly accounting how much time was spent in hardirq and
softirq time will also allow the NO_HZ_FULL code to re-use
these same functions for hardirq and softirq accounting.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/cputime.c | 54 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index f51c98c740b5..4bd6d1b774ab 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -79,34 +79,36 @@ void irqtime_account_irq(struct task_struct *curr)
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-static int irqtime_account_hi_update(void)
+static unsigned long irqtime_account_hi_update(void)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
+	unsigned long irq_jiffies;
 	unsigned long flags;
-	u64 latest_ns;
-	int ret = 0;
+	u64 irq;
 
 	local_irq_save(flags);
-	latest_ns = this_cpu_read(cpu_hardirq_time);
-	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
-		ret = 1;
+	irq = nsecs_to_cputime64(this_cpu_read(cpu_hardirq_time)) - cpustat[CPUTIME_IRQ];
+	irq_jiffies = cputime_to_jiffies(irq);
+	if (irq_jiffies)
+		cpustat[CPUTIME_IRQ] += jiffies_to_cputime(irq_jiffies);
 	local_irq_restore(flags);
-	return ret;
+	return irq_jiffies;
 }
 
-static int irqtime_account_si_update(void)
+static unsigned long irqtime_account_si_update(void)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
+	unsigned long si_jiffies;
 	unsigned long flags;
-	u64 latest_ns;
-	int ret = 0;
+	u64 softirq;
 
 	local_irq_save(flags);
-	latest_ns = this_cpu_read(cpu_softirq_time);
-	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
-		ret = 1;
+	softirq = nsecs_to_cputime64(this_cpu_read(cpu_softirq_time)) - cpustat[CPUTIME_SOFTIRQ];
+	si_jiffies = cputime_to_jiffies(softirq);
+	if (si_jiffies)
+		cpustat[CPUTIME_SOFTIRQ] += jiffies_to_cputime(si_jiffies);
 	local_irq_restore(flags);
-	return ret;
+	return si_jiffies;
 }
 
 #else /* CONFIG_IRQ_TIME_ACCOUNTING */
@@ -345,18 +347,30 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
 	u64 cputime = (__force u64) cputime_one_jiffy;
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
+	unsigned long other;
 
-	if (steal_account_process_tick())
+	/*
+	 * Subtract steal, irq & softirq time (if any) from ticks.
+	 * These are counted in nanoseconds and not aligned with jiffies.
+	 * Multiple of these could "break a jiffy" simultaneously due to
+	 * rounding; be careful to not account too much of this time at once.
+	 */
+	other = steal_account_process_tick();
+	if (other >= ticks)
+		return;
+
+	other += irqtime_account_hi_update();
+	if (other >= ticks)
+		return;
+
+	other += irqtime_account_si_update();
+	if (other >= ticks)
 		return;
 
 	cputime *= ticks;
 	scaled *= ticks;
 
-	if (irqtime_account_hi_update()) {
-		cpustat[CPUTIME_IRQ] += cputime;
-	} else if (irqtime_account_si_update()) {
-		cpustat[CPUTIME_SOFTIRQ] += cputime;
-	} else if (this_cpu_ksoftirqd() == p) {
+	if (this_cpu_ksoftirqd() == p) {
 		/*
 		 * ksoftirqd time do not get accounted in cpu_softirq_time.
 		 * So, we have to handle it separately here.
-- 
2.5.5


* [PATCH 2/5] nohz,cputime: remove VTIME_GEN vtime irq time code
  2016-06-08  2:29 [PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle riel
  2016-06-08  2:30 ` [PATCH 1/5] sched,time: count actually elapsed irq & softirq time riel
@ 2016-06-08  2:30 ` riel
  2016-06-08  2:30 ` [PATCH 3/5] cputime: allow irq time accounting to be selected as an option riel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-08  2:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernellwp, mingo, peterz, tglx, fweisbec

From: Rik van Riel <riel@redhat.com>

The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
appear to work correctly at present.

On CPUs that are nohz_full, people typically do not assign IRQs.
On the housekeeping CPU (when a system is booted with nohz_full),
tick sampling should work well enough to determine irq and softirq
time use, but that only covers the housekeeping CPU itself, not
the nohz_full CPUs.

On CPUs that are nohz_idle (the typical way a distro kernel is
booted), irq time is not accounted at all while the CPU is idle,
due to the lack of timer ticks.

Remove the VTIME_GEN vtime irq time code. The next patch will
allow NO_HZ_FULL kernels to use the IRQ_TIME_ACCOUNTING code.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/vtime.h  | 32 +++++++++++++----------------
 kernel/sched/cputime.c | 55 ++++++++++++++++++++++++++++----------------------
 2 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index fa2196990f84..3b384bf5ce1a 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -64,17 +64,6 @@ extern void vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_user(struct task_struct *tsk);
 
-#ifdef __ARCH_HAS_VTIME_ACCOUNT
-extern void vtime_account_irq_enter(struct task_struct *tsk);
-#else
-extern void vtime_common_account_irq_enter(struct task_struct *tsk);
-static inline void vtime_account_irq_enter(struct task_struct *tsk)
-{
-	if (vtime_accounting_cpu_enabled())
-		vtime_common_account_irq_enter(tsk);
-}
-#endif /* __ARCH_HAS_VTIME_ACCOUNT */
-
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 static inline void vtime_task_switch(struct task_struct *prev) { }
@@ -85,13 +74,8 @@ static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern void arch_vtime_task_switch(struct task_struct *tsk);
-extern void vtime_gen_account_irq_exit(struct task_struct *tsk);
-
-static inline void vtime_account_irq_exit(struct task_struct *tsk)
-{
-	if (vtime_accounting_cpu_enabled())
-		vtime_gen_account_irq_exit(tsk);
-}
+static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
+static inline void vtime_account_irq_exit(struct task_struct *tsk) { }
 
 extern void vtime_user_enter(struct task_struct *tsk);
 
@@ -113,6 +97,18 @@ static inline void vtime_user_exit(struct task_struct *tsk) { }
 static inline void vtime_guest_enter(struct task_struct *tsk) { }
 static inline void vtime_guest_exit(struct task_struct *tsk) { }
 static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
+
+#ifdef __ARCH_HAS_VTIME_ACCOUNT
+extern void vtime_account_irq_enter(struct task_struct *tsk);
+#else
+extern void vtime_common_account_irq_enter(struct task_struct *tsk);
+static inline void vtime_account_irq_enter(struct task_struct *tsk)
+{
+	if (vtime_accounting_cpu_enabled())
+		vtime_common_account_irq_enter(tsk);
+}
+#endif /* __ARCH_HAS_VTIME_ACCOUNT */
+
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 4bd6d1b774ab..2f862dfdb520 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -115,6 +115,16 @@ static unsigned long irqtime_account_si_update(void)
 
 #define sched_clock_irqtime	(0)
 
+static unsigned long irqtime_account_hi_update(void)
+{
+	return 0;
+}
+
+static unsigned long irqtime_account_si_update(void)
+{
+	return 0;
+}
+
 #endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
 
 static inline void task_group_account_field(struct task_struct *p, int index,
@@ -346,7 +356,6 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 {
 	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
 	u64 cputime = (__force u64) cputime_one_jiffy;
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	unsigned long other;
 
 	/*
@@ -703,16 +712,26 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
 	return jiffies_to_cputime(delta);
 }
 
+/* Account per-cpu irq, softirq, and steal time. Not accounted to a task. */
+static cputime_t __vtime_account_other(void)
+{
+	unsigned long ticks;
+
+	ticks = steal_account_process_tick();
+	ticks += irqtime_account_hi_update();
+	ticks += irqtime_account_si_update();
+
+	return jiffies_to_cputime(ticks);
+}
+
 static void __vtime_account_system(struct task_struct *tsk)
 {
-	cputime_t steal_time;
 	cputime_t delta_cpu = get_vtime_delta(tsk);
-	unsigned long delta_st = steal_account_process_tick();
-	steal_time = jiffies_to_cputime(delta_st);
+	cputime_t other = __vtime_account_other();
 
-	if (steal_time >= delta_cpu)
+	if (other >= delta_cpu)
 		return;
-	delta_cpu -= steal_time;
+	delta_cpu -= other;
 	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
 }
 
@@ -726,16 +745,6 @@ void vtime_account_system(struct task_struct *tsk)
 	write_seqcount_end(&tsk->vtime_seqcount);
 }
 
-void vtime_gen_account_irq_exit(struct task_struct *tsk)
-{
-	write_seqcount_begin(&tsk->vtime_seqcount);
-	if (vtime_delta(tsk))
-		__vtime_account_system(tsk);
-	if (context_tracking_in_user())
-		tsk->vtime_snap_whence = VTIME_USER;
-	write_seqcount_end(&tsk->vtime_seqcount);
-}
-
 void vtime_account_user(struct task_struct *tsk)
 {
 	cputime_t delta_cpu;
@@ -743,16 +752,14 @@ void vtime_account_user(struct task_struct *tsk)
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	tsk->vtime_snap_whence = VTIME_SYS;
 	if (vtime_delta(tsk)) {
-		cputime_t steal_time;
-		unsigned long delta_st = steal_account_process_tick();
+		cputime_t other = __vtime_account_other();
 		delta_cpu = get_vtime_delta(tsk);
-		steal_time = jiffies_to_cputime(delta_st);
 
-		if (steal_time >= delta_cpu) {
+		if (other >= delta_cpu) {
 			write_seqcount_end(&tsk->vtime_seqcount);
 			return;
 		}
-		delta_cpu -= steal_time;
+		delta_cpu -= other;
 		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
 	}
 	write_seqcount_end(&tsk->vtime_seqcount);
-- 
2.5.5


* [PATCH 3/5] cputime: allow irq time accounting to be selected as an option
  2016-06-08  2:29 [PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle riel
  2016-06-08  2:30 ` [PATCH 1/5] sched,time: count actually elapsed irq & softirq time riel
  2016-06-08  2:30 ` [PATCH 2/5] nohz,cputime: remove VTIME_GEN vtime irq time code riel
@ 2016-06-08  2:30 ` riel
  2016-06-08  2:30 ` [PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq riel
  2016-06-08  2:30 ` [PATCH 5/5] irqtime: drop local_irq_save/restore from irqtime_account_irq riel
  4 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-08  2:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernellwp, mingo, peterz, tglx, fweisbec

From: Rik van Riel <riel@redhat.com>

Allow CONFIG_IRQ_TIME_ACCOUNTING to be selected as an option, on top
of CONFIG_VIRT_CPU_ACCOUNTING_GEN (and potentially others?).

This allows for the irq time accounting code to be used with nohz_idle
CPUs, which is how several distributions ship their kernels. Using the
same code for several timer modes also allows us to drop duplicate code.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 init/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..4c7ee4f136cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -375,9 +375,11 @@ config VIRT_CPU_ACCOUNTING_GEN
 
 	  If unsure, say N.
 
+endchoice
+
 config IRQ_TIME_ACCOUNTING
 	bool "Fine granularity task level IRQ time accounting"
-	depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
+	depends on HAVE_IRQ_TIME_ACCOUNTING && !VIRT_CPU_ACCOUNTING_NATIVE
 	help
 	  Select this option to enable fine granularity task irq time
 	  accounting. This is done by reading a timestamp on each
@@ -386,8 +388,6 @@ config IRQ_TIME_ACCOUNTING
 
 	  If in doubt, say N here.
 
-endchoice
-
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	depends on MULTIUSER
-- 
2.5.5


* [PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq
  2016-06-08  2:29 [PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle riel
                   ` (2 preceding siblings ...)
  2016-06-08  2:30 ` [PATCH 3/5] cputime: allow irq time accounting to be selected as an option riel
@ 2016-06-08  2:30 ` riel
  2016-06-08  2:30 ` [PATCH 5/5] irqtime: drop local_irq_save/restore from irqtime_account_irq riel
  4 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-08  2:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernellwp, mingo, peterz, tglx, fweisbec

From: Rik van Riel <riel@redhat.com>

Add an irq type parameter and documentation to irqtime_account_irq.
The parameter can be used to distinguish transitioning from process
context into hardirq time from transitioning into softirq time.

This is necessary in order to remove the local_irq_save/restore
from irqtime_account_irq.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/hardirq.h | 20 ++++++++++----------
 include/linux/vtime.h   | 12 ++++++------
 kernel/sched/cputime.c  |  9 ++++++++-
 kernel/softirq.c        |  6 +++---
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..1ebb31f56285 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,11 +32,11 @@ extern void rcu_nmi_exit(void);
  * always balanced, so the interrupted value of ->hardirq_context
  * will always be restored.
  */
-#define __irq_enter()					\
-	do {						\
-		account_irq_enter_time(current);	\
-		preempt_count_add(HARDIRQ_OFFSET);	\
-		trace_hardirq_enter();			\
+#define __irq_enter()							\
+	do {								\
+		account_irq_enter_time(current, HARDIRQ_OFFSET);	\
+		preempt_count_add(HARDIRQ_OFFSET);			\
+		trace_hardirq_enter();					\
 	} while (0)
 
 /*
@@ -47,11 +47,11 @@ extern void irq_enter(void);
 /*
  * Exit irq context without processing softirqs:
  */
-#define __irq_exit()					\
-	do {						\
-		trace_hardirq_exit();			\
-		account_irq_exit_time(current);		\
-		preempt_count_sub(HARDIRQ_OFFSET);	\
+#define __irq_exit()							\
+	do {								\
+		trace_hardirq_exit();					\
+		account_irq_exit_time(current, HARDIRQ_OFFSET);		\
+		preempt_count_sub(HARDIRQ_OFFSET);			\
 	} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 3b384bf5ce1a..58f036f3ebea 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -112,21 +112,21 @@ static inline void vtime_account_irq_enter(struct task_struct *tsk)
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, int irqtype);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, int irqtype) { }
 #endif
 
-static inline void account_irq_enter_time(struct task_struct *tsk)
+static inline void account_irq_enter_time(struct task_struct *tsk, int irqtype)
 {
 	vtime_account_irq_enter(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, irqtype);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk, int irqtype)
 {
 	vtime_account_irq_exit(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, irqtype);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 2f862dfdb520..e009077aeab6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -46,8 +46,15 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
+ *
+ * There are six possible transitions:
+ * process -> softirq, softirq -> process
+ * process -> hardirq, hardirq -> process
+ * softirq -> hardirq, hardirq -> softirq
+ *
+ * When exiting hardirq or softirq time, account the elapsed time.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
 	unsigned long flags;
 	s64 delta;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 17caf4b63342..a311c9622c86 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -245,7 +245,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	account_irq_enter_time(current);
+	account_irq_enter_time(current, SOFTIRQ_OFFSET);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
 	in_hardirq = lockdep_softirq_start();
@@ -295,7 +295,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	}
 
 	lockdep_softirq_end(in_hardirq);
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, SOFTIRQ_OFFSET);
 	__local_bh_enable(SOFTIRQ_OFFSET);
 	WARN_ON_ONCE(in_interrupt());
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
@@ -385,7 +385,7 @@ void irq_exit(void)
 	WARN_ON_ONCE(!irqs_disabled());
 #endif
 
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, HARDIRQ_OFFSET);
 	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
-- 
2.5.5


* [PATCH 5/5] irqtime: drop local_irq_save/restore from irqtime_account_irq
  2016-06-08  2:29 [PATCH RFC 0/5] sched,time: make irq time accounting work for nohz_idle riel
                   ` (3 preceding siblings ...)
  2016-06-08  2:30 ` [PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq riel
@ 2016-06-08  2:30 ` riel
  4 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-08  2:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernellwp, mingo, peterz, tglx, fweisbec

From: Rik van Riel <riel@redhat.com>

Drop local_irq_save/restore from irqtime_account_irq.
Instead, have softirq and hardirq track their time spent
independently, with the softirq code subtracting any hardirq
time that occurred during the softirq run.

The softirq code can be interrupted by hardirq code at
any point in time, but it can check whether it got a
consistent snapshot of the timekeeping variables it wants,
and loop around in the unlikely case that it did not.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sched/cputime.c | 54 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e009077aeab6..466aff107f73 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -26,7 +26,9 @@
 DEFINE_PER_CPU(u64, cpu_hardirq_time);
 DEFINE_PER_CPU(u64, cpu_softirq_time);
 
-static DEFINE_PER_CPU(u64, irq_start_time);
+static DEFINE_PER_CPU(u64, hardirq_start_time);
+static DEFINE_PER_CPU(u64, softirq_start_time);
+static DEFINE_PER_CPU(u64, prev_hardirq_time);
 static int sched_clock_irqtime;
 
 void enable_sched_clock_irqtime(void)
@@ -53,36 +55,66 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
  * softirq -> hardirq, hardirq -> softirq
  *
  * When exiting hardirq or softirq time, account the elapsed time.
+ *
+ * When exiting softirq time, subtract the amount of hardirq time that
+ * interrupted this softirq run, to avoid double accounting of that time.
  */
 void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
-	unsigned long flags;
-	s64 delta;
+	u64 prev_softirq_start;
+	u64 prev_hardirq;
+	u64 hardirq_time;
+	s64 delta = 0;
 	int cpu;
 
 	if (!sched_clock_irqtime)
 		return;
 
-	local_irq_save(flags);
-
 	cpu = smp_processor_id();
-	delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
-	__this_cpu_add(irq_start_time, delta);
+	prev_hardirq = __this_cpu_read(prev_hardirq_time);
+	prev_softirq_start = __this_cpu_read(softirq_start_time);
+	/*
+	 * Softirq context may get interrupted by hardirq context on the
+	 * same CPU. At softirq exit, subtract the hardirq time that ran
+	 * during this softirq invocation, to avoid double accounting.
+	 */
+	if (irqtype == HARDIRQ_OFFSET) {
+		delta = sched_clock_cpu(cpu) - __this_cpu_read(hardirq_start_time);
+		__this_cpu_add(hardirq_start_time, delta);
+	} else do {
+		u64 now;
+
+		hardirq_time = READ_ONCE(per_cpu(cpu_hardirq_time, cpu));
+		now = sched_clock_cpu(cpu);
+
+		delta = now - prev_softirq_start;
+		if (in_serving_softirq()) {
+			/*
+			 * Leaving softirq context. Avoid double counting by
+			 * subtracting hardirq time from this interval.
+			 */
+			delta -= hardirq_time - prev_hardirq;
+		} else {
+			/* Entering softirq context. Note start times. */
+			__this_cpu_write(softirq_start_time, now);
+			__this_cpu_write(prev_hardirq_time, hardirq_time);
+		}
+		/*
+		 * If a hardirq happened during this calculation, it may not
+		 * have gotten a consistent snapshot. Try again.
+		 */
+	} while (hardirq_time != READ_ONCE(per_cpu(cpu_hardirq_time, cpu)));
 
-	irq_time_write_begin();
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count())
+	if (irqtype == HARDIRQ_OFFSET && hardirq_count())
 		__this_cpu_add(cpu_hardirq_time, delta);
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+	else if (irqtype == SOFTIRQ_OFFSET && in_serving_softirq() &&
+				curr != this_cpu_ksoftirqd())
 		__this_cpu_add(cpu_softirq_time, delta);
 
-	irq_time_write_end();
-	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-- 
2.5.5


* [PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq
  2016-06-23  2:25 [PATCH v2 0/5] sched,time: fix irq time accounting with nohz_idle riel
@ 2016-06-23  2:25 ` riel
  0 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-23  2:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, mingo, pbonzini, fweisbec, wanpeng.li, efault, tglx, rkrcmar

From: Rik van Riel <riel@redhat.com>

Add an irq type parameter and documentation to irqtime_account_irq.
The parameter can be used to distinguish transitioning from process
context into hardirq time from transitioning into softirq time.

This is necessary in order to remove the local_irq_save/restore
from irqtime_account_irq.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/hardirq.h | 20 ++++++++++----------
 include/linux/vtime.h   | 12 ++++++------
 kernel/sched/cputime.c  |  9 ++++++++-
 kernel/softirq.c        |  6 +++---
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..1ebb31f56285 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,11 +32,11 @@ extern void rcu_nmi_exit(void);
  * always balanced, so the interrupted value of ->hardirq_context
  * will always be restored.
  */
-#define __irq_enter()					\
-	do {						\
-		account_irq_enter_time(current);	\
-		preempt_count_add(HARDIRQ_OFFSET);	\
-		trace_hardirq_enter();			\
+#define __irq_enter()							\
+	do {								\
+		account_irq_enter_time(current, HARDIRQ_OFFSET);	\
+		preempt_count_add(HARDIRQ_OFFSET);			\
+		trace_hardirq_enter();					\
 	} while (0)
 
 /*
@@ -47,11 +47,11 @@ extern void irq_enter(void);
 /*
  * Exit irq context without processing softirqs:
  */
-#define __irq_exit()					\
-	do {						\
-		trace_hardirq_exit();			\
-		account_irq_exit_time(current);		\
-		preempt_count_sub(HARDIRQ_OFFSET);	\
+#define __irq_exit()							\
+	do {								\
+		trace_hardirq_exit();					\
+		account_irq_exit_time(current, HARDIRQ_OFFSET);		\
+		preempt_count_sub(HARDIRQ_OFFSET);			\
 	} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index d1977d84ebdf..31e486d741b0 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -112,21 +112,21 @@ static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, int irqtype);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, int irqtype) { }
 #endif
 
-static inline void account_irq_enter_time(struct task_struct *tsk)
+static inline void account_irq_enter_time(struct task_struct *tsk, int irqtype)
 {
 	vtime_account_irq_enter(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, irqtype);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk, int irqtype)
 {
 	vtime_account_irq_exit(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, irqtype);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0280528800f2..fc4122afc022 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -46,8 +46,15 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
+ *
+ * There are six possible transitions:
+ * process -> softirq, softirq -> process
+ * process -> hardirq, hardirq -> process
+ * softirq -> hardirq, hardirq -> softirq
+ *
+ * When exiting hardirq or softirq time, account the elapsed time.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
 	unsigned long flags;
 	s64 delta;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 17caf4b63342..a311c9622c86 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -245,7 +245,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	account_irq_enter_time(current);
+	account_irq_enter_time(current, SOFTIRQ_OFFSET);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
 	in_hardirq = lockdep_softirq_start();
@@ -295,7 +295,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	}
 
 	lockdep_softirq_end(in_hardirq);
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, SOFTIRQ_OFFSET);
 	__local_bh_enable(SOFTIRQ_OFFSET);
 	WARN_ON_ONCE(in_interrupt());
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
@@ -385,7 +385,7 @@ void irq_exit(void)
 	WARN_ON_ONCE(!irqs_disabled());
 #endif
 
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, HARDIRQ_OFFSET);
 	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
-- 
2.5.5


* [PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq
  2016-06-16 16:06 [PATCH 0/5] sched,time: fix irq time accounting with nohz_idle riel
@ 2016-06-16 16:06 ` riel
  0 siblings, 0 replies; 8+ messages in thread
From: riel @ 2016-06-16 16:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, pbonzini, peterz, fweisbec, wanpeng.li, efault, tglx, rkrcmar

From: Rik van Riel <riel@redhat.com>

Add an irq type parameter and documentation to irqtime_account_irq.
The parameter can be used to distinguish transitioning from process
context into hardirq time from transitioning into softirq time.

This is necessary in order to remove the local_irq_save/restore
from irqtime_account_irq.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/hardirq.h | 20 ++++++++++----------
 include/linux/vtime.h   | 12 ++++++------
 kernel/sched/cputime.c  |  9 ++++++++-
 kernel/softirq.c        |  6 +++---
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..1ebb31f56285 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,11 +32,11 @@ extern void rcu_nmi_exit(void);
  * always balanced, so the interrupted value of ->hardirq_context
  * will always be restored.
  */
-#define __irq_enter()					\
-	do {						\
-		account_irq_enter_time(current);	\
-		preempt_count_add(HARDIRQ_OFFSET);	\
-		trace_hardirq_enter();			\
+#define __irq_enter()							\
+	do {								\
+		account_irq_enter_time(current, HARDIRQ_OFFSET);	\
+		preempt_count_add(HARDIRQ_OFFSET);			\
+		trace_hardirq_enter();					\
 	} while (0)
 
 /*
@@ -47,11 +47,11 @@ extern void irq_enter(void);
 /*
  * Exit irq context without processing softirqs:
  */
-#define __irq_exit()					\
-	do {						\
-		trace_hardirq_exit();			\
-		account_irq_exit_time(current);		\
-		preempt_count_sub(HARDIRQ_OFFSET);	\
+#define __irq_exit()							\
+	do {								\
+		trace_hardirq_exit();					\
+		account_irq_exit_time(current, HARDIRQ_OFFSET);		\
+		preempt_count_sub(HARDIRQ_OFFSET);			\
 	} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index d1977d84ebdf..31e486d741b0 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -112,21 +112,21 @@ static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, int irqtype);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, int irqtype) { }
 #endif
 
-static inline void account_irq_enter_time(struct task_struct *tsk)
+static inline void account_irq_enter_time(struct task_struct *tsk, int irqtype)
 {
 	vtime_account_irq_enter(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, irqtype);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk, int irqtype)
 {
 	vtime_account_irq_exit(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_irq(tsk, irqtype);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 261c9002348a..07f847267e4e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -46,8 +46,15 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
+ *
+ * There are six possible transitions:
+ * process -> softirq, softirq -> process
+ * process -> hardirq, hardirq -> process
+ * softirq -> hardirq, hardirq -> softirq
+ *
+ * When exiting hardirq or softirq time, account the elapsed time.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
 	unsigned long flags;
 	s64 delta;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 17caf4b63342..a311c9622c86 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -245,7 +245,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	account_irq_enter_time(current);
+	account_irq_enter_time(current, SOFTIRQ_OFFSET);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
 	in_hardirq = lockdep_softirq_start();
@@ -295,7 +295,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	}
 
 	lockdep_softirq_end(in_hardirq);
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, SOFTIRQ_OFFSET);
 	__local_bh_enable(SOFTIRQ_OFFSET);
 	WARN_ON_ONCE(in_interrupt());
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
@@ -385,7 +385,7 @@ void irq_exit(void)
 	WARN_ON_ONCE(!irqs_disabled());
 #endif
 
-	account_irq_exit_time(current);
+	account_irq_exit_time(current, HARDIRQ_OFFSET);
 	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
-- 
2.5.5


