linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2
@ 2020-12-01  0:12 Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting Frederic Weisbecker
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

In this version, the EXPORT_SYMBOL_GPL() annotations on the IRQ time
accounting functions have been removed since no modules should be
calling them.

Link to v1 for reference:
	https://lore.kernel.org/lkml/20201125021542.30237-1-frederic@kernel.org/

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	irq/core

HEAD: 2e6155a86dba7c53d080d58284ef5c65f487bef0

Frederic Weisbecker (5):
  sched/cputime: Remove symbol exports from IRQ time accounting
  sched/vtime: Consolidate IRQ time accounting
  s390/vtime: Convert to consolidated IRQ time accounting
  irqtime: Move irqtime entry accounting after irq offset incrementation
  irq: Call tick_irq_enter() inside HARDIRQ_OFFSET

 arch/ia64/kernel/time.c       | 20 +++++++---
 arch/powerpc/kernel/time.c    | 58 ++++++++++++++++++++--------
 arch/s390/include/asm/vtime.h |  1 -
 arch/s390/kernel/vtime.c      | 58 ++++++++++++++++++----------
 include/linux/hardirq.h       |  4 +-
 include/linux/vtime.h         | 18 ++++-----
 kernel/sched/cputime.c        | 73 +++++++++++++++++++++++++++--------
 kernel/softirq.c              | 16 +++-----
 8 files changed, 167 insertions(+), 81 deletions(-)

-- 
2.25.1



* [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting
  2020-12-01  0:12 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2 Frederic Weisbecker
@ 2020-12-01  0:12 ` Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 2/5] sched/vtime: Consolidate " Frederic Weisbecker
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

account_irq_enter_time() and account_irq_exit_time() are not called
from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ
cputime accounting functions they call.
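
In the s390 case the alias direction is flipped (a condensed view of
the diff below) so that the export can stay on vtime_account_kernel()
while the now unexported IRQ entry point becomes the alias:

	void vtime_account_irq_enter(struct task_struct *tsk)
	__attribute__((alias("vtime_account_kernel")));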

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kernel/vtime.c | 10 +++++-----
 kernel/sched/cputime.c   |  2 --
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 8df10d3c8f6c..f9f2a11958a5 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -226,7 +226,7 @@ void vtime_flush(struct task_struct *tsk)
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_account_kernel(struct task_struct *tsk)
 {
 	u64 timer;
 
@@ -245,12 +245,12 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 
 	virt_timer_forward(timer);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
-
-void vtime_account_kernel(struct task_struct *tsk)
-__attribute__((alias("vtime_account_irq_enter")));
 EXPORT_SYMBOL_GPL(vtime_account_kernel);
 
+void vtime_account_irq_enter(struct task_struct *tsk)
+__attribute__((alias("vtime_account_kernel")));
+
+
 /*
  * Sorted add to a list. List is linear searched until first bigger
  * element is found.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5a55d2300452..61ce9f9bf0a3 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -71,7 +71,6 @@ void irqtime_account_irq(struct task_struct *curr)
 	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
-EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
 static u64 irqtime_tick_accounted(u64 maxtime)
 {
@@ -434,7 +433,6 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 	else
 		vtime_account_kernel(tsk);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
-- 
2.25.1



* [PATCH 2/5] sched/vtime: Consolidate IRQ time accounting
  2020-12-01  0:12 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2 Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting Frederic Weisbecker
@ 2020-12-01  0:12 ` Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 3/5] s390/vtime: Convert to consolidated " Frederic Weisbecker
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
all have their own version of IRQ time accounting that dispatches the
cputime to the appropriate index: hardirq, softirq, system, idle,
guest... from an all-in-one function.

Instead of having these ad-hoc versions, move the cputime destination
dispatch decision to the core code and leave only the actual per-index
cputime accounting to the architecture.

For now only ia64 and powerpc are handled. s390 will need a slightly
different treatment.
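
For illustration, the resulting split looks roughly like this (a
condensed view of the diff below): the core decides the destination
index while the architecture only accumulates the delta:

	/* core code (kernel/sched/cputime.c) picks the cputime index */
	void vtime_account_irq_enter(struct task_struct *tsk)
	{
		if (hardirq_count())
			vtime_account_hardirq(tsk);
		else if (in_serving_softirq())
			vtime_account_softirq(tsk);
		else if (is_idle_task(tsk))
			vtime_account_idle(tsk);
		else
			vtime_account_kernel(tsk);
	}

	/* arch code (here ia64) merely accounts the delta to that index */
	void vtime_account_hardirq(struct task_struct *tsk)
	{
		task_thread_info(tsk)->hardirq_time += vtime_delta(tsk);
	}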

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/ia64/kernel/time.c    | 20 +++++++++----
 arch/powerpc/kernel/time.c | 58 +++++++++++++++++++++++++++-----------
 arch/s390/kernel/vtime.c   |  2 ++
 include/linux/vtime.h      |  8 ++----
 kernel/sched/cputime.c     | 20 +++++++++++--
 5 files changed, 79 insertions(+), 29 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 7abc5f37bfaf..733e0e3324b8 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -138,12 +138,8 @@ void vtime_account_kernel(struct task_struct *tsk)
 	struct thread_info *ti = task_thread_info(tsk);
 	__u64 stime = vtime_delta(tsk);
 
-	if ((tsk->flags & PF_VCPU) && !irq_count())
+	if (tsk->flags & PF_VCPU)
 		ti->gtime += stime;
-	else if (hardirq_count())
-		ti->hardirq_time += stime;
-	else if (in_serving_softirq())
-		ti->softirq_time += stime;
 	else
 		ti->stime += stime;
 }
@@ -156,6 +152,20 @@ void vtime_account_idle(struct task_struct *tsk)
 	ti->idle_time += vtime_delta(tsk);
 }
 
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->softirq_time += vtime_delta(tsk);
+}
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->hardirq_time += vtime_delta(tsk);
+}
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 static irqreturn_t
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46f5532..b44e892bddf8 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -311,12 +311,11 @@ static unsigned long vtime_delta_scaled(struct cpu_accounting_data *acct,
 	return stime_scaled;
 }
 
-static unsigned long vtime_delta(struct task_struct *tsk,
+static unsigned long vtime_delta(struct cpu_accounting_data *acct,
 				 unsigned long *stime_scaled,
 				 unsigned long *steal_time)
 {
 	unsigned long now, stime;
-	struct cpu_accounting_data *acct = get_accounting(tsk);
 
 	WARN_ON_ONCE(!irqs_disabled());
 
@@ -331,29 +330,30 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 	return stime;
 }
 
-void vtime_account_kernel(struct task_struct *tsk)
+static void vtime_delta_kernel(struct cpu_accounting_data *acct,
+			       unsigned long *stime, unsigned long *stime_scaled)
 {
-	unsigned long stime, stime_scaled, steal_time;
-	struct cpu_accounting_data *acct = get_accounting(tsk);
+	unsigned long steal_time;
 
-	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
-
-	stime -= min(stime, steal_time);
+	*stime = vtime_delta(acct, stime_scaled, &steal_time);
+	*stime -= min(*stime, steal_time);
 	acct->steal_time += steal_time;
+}
 
-	if ((tsk->flags & PF_VCPU) && !irq_count()) {
+void vtime_account_kernel(struct task_struct *tsk)
+{
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+	unsigned long stime, stime_scaled;
+
+	vtime_delta_kernel(acct, &stime, &stime_scaled);
+
+	if (tsk->flags & PF_VCPU) {
 		acct->gtime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
 		acct->utime_scaled += stime_scaled;
 #endif
 	} else {
-		if (hardirq_count())
-			acct->hardirq_time += stime;
-		else if (in_serving_softirq())
-			acct->softirq_time += stime;
-		else
-			acct->stime += stime;
-
+		acct->stime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
 		acct->stime_scaled += stime_scaled;
 #endif
@@ -366,10 +366,34 @@ void vtime_account_idle(struct task_struct *tsk)
 	unsigned long stime, stime_scaled, steal_time;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
+	stime = vtime_delta(acct, &stime_scaled, &steal_time);
 	acct->idle_time += stime + steal_time;
 }
 
+static void vtime_account_irq(struct cpu_accounting_data *acct,
+			      unsigned long *field)
+{
+	unsigned long stime, stime_scaled;
+
+	vtime_delta_kernel(acct, &stime, &stime_scaled);
+	*field += stime;
+#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
+	acct->stime_scaled += stime_scaled;
+#endif
+}
+
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+	vtime_account_irq(acct, &acct->softirq_time);
+}
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+	vtime_account_irq(acct, &acct->hardirq_time);
+}
+
 static void vtime_flush_scaled(struct task_struct *tsk,
 			       struct cpu_accounting_data *acct)
 {
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index f9f2a11958a5..e043808a1488 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -250,6 +250,8 @@ EXPORT_SYMBOL_GPL(vtime_account_kernel);
 void vtime_account_irq_enter(struct task_struct *tsk)
 __attribute__((alias("vtime_account_kernel")));
 
+void vtime_account_irq_exit(struct task_struct *tsk)
+__attribute__((alias("vtime_account_kernel")));
 
 /*
  * Sorted add to a list. List is linear searched until first bigger
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 2cdeca062db3..f827b38c3bb7 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -84,11 +84,9 @@ static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 extern void vtime_account_irq_enter(struct task_struct *tsk);
-static inline void vtime_account_irq_exit(struct task_struct *tsk)
-{
-	/* On hard|softirq exit we always account to hard|softirq cputime */
-	vtime_account_kernel(tsk);
-}
+extern void vtime_account_irq_exit(struct task_struct *tsk);
+extern void vtime_account_softirq(struct task_struct *tsk);
+extern void vtime_account_hardirq(struct task_struct *tsk);
 extern void vtime_flush(struct task_struct *tsk);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 61ce9f9bf0a3..65efde4d0dca 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -428,10 +428,26 @@ void vtime_task_switch(struct task_struct *prev)
 #ifndef __ARCH_HAS_VTIME_ACCOUNT
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	if (!in_interrupt() && is_idle_task(tsk))
+	if (hardirq_count()) {
+		vtime_account_hardirq(tsk);
+	} else if (in_serving_softirq()) {
+		vtime_account_softirq(tsk);
+	} else if (is_idle_task(tsk)) {
 		vtime_account_idle(tsk);
-	else
+	} else {
 		vtime_account_kernel(tsk);
+	}
+}
+
+void vtime_account_irq_exit(struct task_struct *tsk)
+{
+	WARN_ON_ONCE(in_task());
+
+	if (hardirq_count()) {
+		vtime_account_hardirq(tsk);
+	} else if (in_serving_softirq()) {
+		vtime_account_softirq(tsk);
+	}
 }
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
-- 
2.25.1



* [PATCH 3/5] s390/vtime: Convert to consolidated IRQ time accounting
  2020-12-01  0:12 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2 Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 2/5] sched/vtime: Consolidate " Frederic Weisbecker
@ 2020-12-01  0:12 ` Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
  2020-12-01  0:12 ` [PATCH 5/5] irq: Call tick_irq_enter() inside HARDIRQ_OFFSET Frederic Weisbecker
  4 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

s390 has its own version of IRQ time accounting because it doesn't
account idle time the same way the other architectures do. Only the
actual idle sleep time is accounted as idle time; the rest of the idle
task's execution is accounted as system time.

However, converting it to the consolidated IRQ time accounting is easy:
just keep the current behaviour and redirect the generic idle time
accounting to system time accounting.

This removes the need to maintain an ad-hoc implementation of the
cputime dispatch decision.
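
The whole redirect boils down to aliasing the generic idle time hook
to the existing kernel time accounting (a condensed view of the diff
below):

	void vtime_account_idle(struct task_struct *tsk)
	__attribute__((alias("vtime_account_kernel")));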

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/vtime.h |  1 -
 arch/s390/kernel/vtime.c      | 54 +++++++++++++++++++++++------------
 kernel/sched/cputime.c        |  2 --
 3 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/arch/s390/include/asm/vtime.h b/arch/s390/include/asm/vtime.h
index 3622d4ebc73a..fac6a67988eb 100644
--- a/arch/s390/include/asm/vtime.h
+++ b/arch/s390/include/asm/vtime.h
@@ -2,7 +2,6 @@
 #ifndef _S390_VTIME_H
 #define _S390_VTIME_H
 
-#define __ARCH_HAS_VTIME_ACCOUNT
 #define __ARCH_HAS_VTIME_TASK_SWITCH
 
 #endif /* _S390_VTIME_H */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index e043808a1488..ab75aab7f8cf 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -222,36 +222,52 @@ void vtime_flush(struct task_struct *tsk)
 	S390_lowcore.avg_steal_timer = avg_steal;
 }
 
-/*
- * Update process times based on virtual cpu times stored by entry.S
- * to the lowcore fields user_timer, system_timer & steal_clock.
- */
-void vtime_account_kernel(struct task_struct *tsk)
+static u64 vtime_delta(void)
 {
-	u64 timer;
+	u64 timer = S390_lowcore.last_update_timer;
 
-	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	timer -= S390_lowcore.last_update_timer;
 
-	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
-		S390_lowcore.guest_timer += timer;
-	else if (hardirq_count())
-		S390_lowcore.hardirq_timer += timer;
-	else if (in_serving_softirq())
-		S390_lowcore.softirq_timer += timer;
+	return timer - S390_lowcore.last_update_timer;
+}
+
+/*
+ * Update process times based on virtual cpu times stored by entry.S
+ * to the lowcore fields user_timer, system_timer & steal_clock.
+ */
+void vtime_account_kernel(struct task_struct *tsk)
+{
+	u64 delta = vtime_delta();
+
+	if (tsk->flags & PF_VCPU)
+		S390_lowcore.guest_timer += delta;
 	else
-		S390_lowcore.system_timer += timer;
+		S390_lowcore.system_timer += delta;
 
-	virt_timer_forward(timer);
+	virt_timer_forward(delta);
 }
 EXPORT_SYMBOL_GPL(vtime_account_kernel);
 
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_account_idle(struct task_struct *tsk)
 __attribute__((alias("vtime_account_kernel")));
 
-void vtime_account_irq_exit(struct task_struct *tsk)
-__attribute__((alias("vtime_account_kernel")));
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	u64 delta = vtime_delta();
+
+	S390_lowcore.softirq_timer += delta;
+
+	virt_timer_forward(delta);
+}
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	u64 delta = vtime_delta();
+
+	S390_lowcore.hardirq_timer += delta;
+
+	virt_timer_forward(delta);
+}
 
 /*
  * Sorted add to a list. List is linear searched until first bigger
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 65efde4d0dca..3675452f6029 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -425,7 +425,6 @@ void vtime_task_switch(struct task_struct *prev)
  * time spent by the CPU when it's in low power mode) must override
  * vtime_account().
  */
-#ifndef __ARCH_HAS_VTIME_ACCOUNT
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
 	if (hardirq_count()) {
@@ -449,7 +448,6 @@ void vtime_account_irq_exit(struct task_struct *tsk)
 		vtime_account_softirq(tsk);
 	}
 }
-#endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
 		    u64 *ut, u64 *st)
-- 
2.25.1



* [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01  0:12 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2020-12-01  0:12 ` [PATCH 3/5] s390/vtime: Convert to consolidated " Frederic Weisbecker
@ 2020-12-01  0:12 ` Frederic Weisbecker
  2020-12-01  9:20   ` Peter Zijlstra
  2020-12-01  0:12 ` [PATCH 5/5] irq: Call tick_irq_enter() inside HARDIRQ_OFFSET Frederic Weisbecker
  4 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

IRQ time entry is currently accounted before HARDIRQ_OFFSET or
SOFTIRQ_OFFSET are incremented. This makes it convenient to decide to
which index the cputime should be dispatched.

Unfortunately it prevents tick_irq_enter() from being called under
HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ
entry accounting due to the necessary clock catch-up. As a result we
don't benefit from appropriate lockdep coverage on tick_irq_enter().

To prepare for fixing this, move the IRQ entry cputime accounting after
the preempt offset is incremented. This requires the cputime dispatch
code to handle the extra offset.
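
For illustration, since the hardirq entry path now runs with its own
HARDIRQ_OFFSET already in the preempt count, the dispatch has to test
one nesting level higher (a condensed view of the cputime.c hunk
below):

	if (hardirq_count() > HARDIRQ_OFFSET)	/* we interrupted a hardirq */
		vtime_account_hardirq(tsk);
	else if (hardirq_count() && in_serving_softirq())
		vtime_account_softirq(tsk);	/* we interrupted a softirq */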

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/hardirq.h |  4 ++--
 include/linux/vtime.h   | 10 ++++----
 kernel/sched/cputime.c  | 53 +++++++++++++++++++++++++++++++----------
 kernel/softirq.c        |  2 +-
 4 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 754f67ac4326..02499c10fbf7 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,9 +32,9 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter()					\
 	do {						\
+		preempt_count_add(HARDIRQ_OFFSET);	\
+		lockdep_hardirq_enter();		\
 		account_irq_enter_time(current);	\
-		preempt_count_add(HARDIRQ_OFFSET);	\
-		lockdep_hardirq_enter();		\
 	} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index f827b38c3bb7..cad8ff530273 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -96,21 +96,23 @@ static inline void vtime_flush(struct task_struct *tsk) { }
 
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_enter(struct task_struct *tsk);
+extern void irqtime_account_exit(struct task_struct *tsk);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_enter(struct task_struct *tsk) { }
+static inline void irqtime_account_exit(struct task_struct *tsk) { }
 #endif
 
 static inline void account_irq_enter_time(struct task_struct *tsk)
 {
 	vtime_account_irq_enter(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_enter(tsk);
 }
 
 static inline void account_irq_exit_time(struct task_struct *tsk)
 {
 	vtime_account_irq_exit(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_exit(tsk);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 3675452f6029..44bd774af37d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -43,23 +43,48 @@ static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
 	u64_stats_update_end(&irqtime->sync);
 }
 
-/*
- * Called before incrementing preempt_count on {soft,}irq_enter
- * and before decrementing preempt_count on {soft,}irq_exit.
- */
-void irqtime_account_irq(struct task_struct *curr)
+static s64 irqtime_get_delta(struct irqtime *irqtime)
 {
-	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	int cpu = smp_processor_id();
 	s64 delta;
-	int cpu;
 
-	if (!sched_clock_irqtime)
-		return;
-
-	cpu = smp_processor_id();
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
 
+	return delta;
+}
+
+/* Called after incrementing preempt_count on {soft,}irq_enter */
+void irqtime_account_enter(struct task_struct *curr)
+{
+	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	u64 delta;
+
+	if (!sched_clock_irqtime)
+		return;
+
+	delta = irqtime_get_delta(irqtime);
+	/*
+	 * We do not account for softirq time from ksoftirqd here.
+	 * We want to continue accounting softirq time to ksoftirqd thread
+	 * in that case, so as not to confuse scheduler with a special task
+	 * that do not consume any time, but still wants to run.
+	 */
+	if ((irq_count() == (SOFTIRQ_OFFSET | HARDIRQ_OFFSET)) &&
+	    curr != this_cpu_ksoftirqd())
+		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+}
+
+/* Called before decrementing preempt_count on {soft,}irq_exit */
+void irqtime_account_exit(struct task_struct *curr)
+{
+	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	u64 delta;
+
+	if (!sched_clock_irqtime)
+		return;
+
+	delta = irqtime_get_delta(irqtime);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
@@ -427,9 +452,11 @@ void vtime_task_switch(struct task_struct *prev)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	if (hardirq_count()) {
+	WARN_ON_ONCE(in_task());
+
+	if (hardirq_count() > HARDIRQ_OFFSET) {
 		vtime_account_hardirq(tsk);
-	} else if (in_serving_softirq()) {
+	} else if (hardirq_count() && in_serving_softirq()) {
 		vtime_account_softirq(tsk);
 	} else if (is_idle_task(tsk)) {
 		vtime_account_idle(tsk);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 617009ccd82c..24254c41bb7c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,9 +315,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	account_irq_enter_time(current);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+	account_irq_enter_time(current);
 	in_hardirq = lockdep_softirq_start();
 
 restart:
-- 
2.25.1



* [PATCH 5/5] irq: Call tick_irq_enter() inside HARDIRQ_OFFSET
  2020-12-01  0:12 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2020-12-01  0:12 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
@ 2020-12-01  0:12 ` Frederic Weisbecker
  4 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01  0:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

Now that account_irq_enter_time() is called after HARDIRQ_OFFSET has
been incremented, there is nothing left that prevents us from also
moving tick_irq_enter() after HARDIRQ_OFFSET is incremented.

The desired outcome is to remove the nasty hack that prevents softirqs
from being raised through ksoftirqd instead of the hardirq bottom half.
tick_irq_enter() then also becomes appropriately covered by lockdep.
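
The resulting entry sequence then looks like this (a condensed view of
the diff below), where the irq_count() == HARDIRQ_OFFSET check means
this is the outermost IRQ level and no softirq is being served:

	void irq_enter_rcu(void)
	{
		__irq_enter_raw();	/* HARDIRQ_OFFSET in, lockdep notified */
		if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
			tick_irq_enter();
		account_irq_enter_time(current);
	}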

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 kernel/softirq.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 24254c41bb7c..447ee4d6fe4d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -377,16 +377,12 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
  */
 void irq_enter_rcu(void)
 {
-	if (is_idle_task(current) && !in_interrupt()) {
-		/*
-		 * Prevent raise_softirq from needlessly waking up ksoftirqd
-		 * here, as softirq will be serviced on return from interrupt.
-		 */
-		local_bh_disable();
+	__irq_enter_raw();
+
+	if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
 		tick_irq_enter();
-		_local_bh_enable();
-	}
-	__irq_enter();
+
+	account_irq_enter_time(current);
 }
 
 /**
-- 
2.25.1



* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01  0:12 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
@ 2020-12-01  9:20   ` Peter Zijlstra
  2020-12-01 11:23     ` Frederic Weisbecker
  2020-12-01 11:33     ` Thomas Gleixner
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-12-01  9:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On Tue, Dec 01, 2020 at 01:12:25AM +0100, Frederic Weisbecker wrote:
> +static s64 irqtime_get_delta(struct irqtime *irqtime)
>  {
> +	int cpu = smp_processor_id();
>  	s64 delta;
>  
>  	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
>  	irqtime->irq_start_time += delta;
>  
> +	return delta;
> +}
> +
> +/* Called after incrementing preempt_count on {soft,}irq_enter */
> +void irqtime_account_enter(struct task_struct *curr)
> +{
> +	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> +	u64 delta;
> +
> +	if (!sched_clock_irqtime)
> +		return;
> +
> +	delta = irqtime_get_delta(irqtime);
> +	/*
> +	 * We do not account for softirq time from ksoftirqd here.
> +	 * We want to continue accounting softirq time to ksoftirqd thread
> +	 * in that case, so as not to confuse scheduler with a special task
> +	 * that do not consume any time, but still wants to run.
> +	 */
> +	if ((irq_count() == (SOFTIRQ_OFFSET | HARDIRQ_OFFSET)) &&
> +	    curr != this_cpu_ksoftirqd())
> +		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> +}
> +
> +/* Called before decrementing preempt_count on {soft,}irq_exit */
> +void irqtime_account_exit(struct task_struct *curr)
> +{
> +	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> +	u64 delta;
> +
> +	if (!sched_clock_irqtime)
> +		return;
> +
> +	delta = irqtime_get_delta(irqtime);
>  	/*
>  	 * We do not account for softirq time from ksoftirqd here.
>  	 * We want to continue accounting softirq time to ksoftirqd thread


Urgh...


Why not something like:

void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
{
	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
	unsigned int pc = preempt_count() - offset;
	s64 delta;
	int cpu;

	if (!sched_clock_irqtime)
		return;

	cpu = smp_processor_id();
	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
	irqtime->irq_start_time += delta;

	/*
	 * We do not account for softirq time from ksoftirqd here.
	 * We want to continue accounting softirq time to ksoftirqd thread
	 * in that case, so as not to confuse scheduler with a special task
	 * that do not consume any time, but still wants to run.
	 */
	if (pc & HARDIRQ_MASK)
		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
}




* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01  9:20   ` Peter Zijlstra
@ 2020-12-01 11:23     ` Frederic Weisbecker
  2020-12-01 11:33     ` Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01 11:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On Tue, Dec 01, 2020 at 10:20:11AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 01:12:25AM +0100, Frederic Weisbecker wrote:
> > +static s64 irqtime_get_delta(struct irqtime *irqtime)
> >  {
> > +	int cpu = smp_processor_id();
> >  	s64 delta;
> >  
> >  	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
> >  	irqtime->irq_start_time += delta;
> >  
> > +	return delta;
> > +}
> > +
> > +/* Called after incrementing preempt_count on {soft,}irq_enter */
> > +void irqtime_account_enter(struct task_struct *curr)
> > +{
> > +	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> > +	u64 delta;
> > +
> > +	if (!sched_clock_irqtime)
> > +		return;
> > +
> > +	delta = irqtime_get_delta(irqtime);
> > +	/*
> > +	 * We do not account for softirq time from ksoftirqd here.
> > +	 * We want to continue accounting softirq time to ksoftirqd thread
> > +	 * in that case, so as not to confuse scheduler with a special task
> > +	 * that do not consume any time, but still wants to run.
> > +	 */
> > +	if ((irq_count() == (SOFTIRQ_OFFSET | HARDIRQ_OFFSET)) &&
> > +	    curr != this_cpu_ksoftirqd())
> > +		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> > +}
> > +
> > +/* Called before decrementing preempt_count on {soft,}irq_exit */
> > +void irqtime_account_exit(struct task_struct *curr)
> > +{
> > +	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> > +	u64 delta;
> > +
> > +	if (!sched_clock_irqtime)
> > +		return;
> > +
> > +	delta = irqtime_get_delta(irqtime);
> >  	/*
> >  	 * We do not account for softirq time from ksoftirqd here.
> >  	 * We want to continue accounting softirq time to ksoftirqd thread
> 
> 
> Urgh...
> 
> 
> Why not something like:
> 
> void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
> {
> 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> 	unsigned int pc = preempt_count() - offset;
> 	s64 delta;
> 	int cpu;
> 
> 	if (!sched_clock_irqtime)
> 		return;
> 
> 	cpu = smp_processor_id();
> 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
> 	irqtime->irq_start_time += delta;
> 
> 	/*
> 	 * We do not account for softirq time from ksoftirqd here.
> 	 * We want to continue accounting softirq time to ksoftirqd thread
> 	 * in that case, so as not to confuse scheduler with a special task
> 	 * that do not consume any time, but still wants to run.
> 	 */
> 	if (pc & HARDIRQ_MASK)
> 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> 	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> }
> 
> 

Right that's better, I'll do that.

Thanks.


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01  9:20   ` Peter Zijlstra
  2020-12-01 11:23     ` Frederic Weisbecker
@ 2020-12-01 11:33     ` Thomas Gleixner
  2020-12-01 11:40       ` Frederic Weisbecker
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-12-01 11:33 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker
  Cc: LKML, Tony Luck, Vasily Gorbik, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Christian Borntraeger,
	Fenghua Yu, Heiko Carstens

On Tue, Dec 01 2020 at 10:20, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 01:12:25AM +0100, Frederic Weisbecker wrote:
> Why not something like:
>
> void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
> {
> 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> 	unsigned int pc = preempt_count() - offset;
> 	s64 delta;
> 	int cpu;
>
> 	if (!sched_clock_irqtime)
> 		return;
>
> 	cpu = smp_processor_id();
> 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
> 	irqtime->irq_start_time += delta;
>
> 	/*
> 	 * We do not account for softirq time from ksoftirqd here.
> 	 * We want to continue accounting softirq time to ksoftirqd thread
> 	 * in that case, so as not to confuse scheduler with a special task
> 	 * that do not consume any time, but still wants to run.
> 	 */
> 	if (pc & HARDIRQ_MASK)
> 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> 	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> }

Why not make all of this explicit instead of these conditionals?

Thanks,

        tglx


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01 11:33     ` Thomas Gleixner
@ 2020-12-01 11:40       ` Frederic Weisbecker
  2020-12-01 13:34         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01 11:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Christian Borntraeger,
	Fenghua Yu, Heiko Carstens

On Tue, Dec 01, 2020 at 12:33:26PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 01 2020 at 10:20, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 01:12:25AM +0100, Frederic Weisbecker wrote:
> > Why not something like:
> >
> > void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
> > {
> > 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> > 	unsigned int pc = preempt_count() - offset;
> > 	s64 delta;
> > 	int cpu;
> >
> > 	if (!sched_clock_irqtime)
> > 		return;
> >
> > 	cpu = smp_processor_id();
> > 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
> > 	irqtime->irq_start_time += delta;
> >
> > 	/*
> > 	 * We do not account for softirq time from ksoftirqd here.
> > 	 * We want to continue accounting softirq time to ksoftirqd thread
> > 	 * in that case, so as not to confuse scheduler with a special task
> > 	 * that do not consume any time, but still wants to run.
> > 	 */
> > 	if (pc & HARDIRQ_MASK)
> > 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> > 	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> > 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> > }
> 
> Why not make all of this explicit instead of these conditionals?

Hmm, I'm not sure I get what you suggest?

Thanks.


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01 11:40       ` Frederic Weisbecker
@ 2020-12-01 13:34         ` Thomas Gleixner
  2020-12-01 14:35           ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2020-12-01 13:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Christian Borntraeger,
	Fenghua Yu, Heiko Carstens

On Tue, Dec 01 2020 at 12:40, Frederic Weisbecker wrote:
> On Tue, Dec 01, 2020 at 12:33:26PM +0100, Thomas Gleixner wrote:
>> > 	/*
>> > 	 * We do not account for softirq time from ksoftirqd here.
>> > 	 * We want to continue accounting softirq time to ksoftirqd thread
>> > 	 * in that case, so as not to confuse scheduler with a special task
>> > 	 * that do not consume any time, but still wants to run.
>> > 	 */
>> > 	if (pc & HARDIRQ_MASK)
>> > 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
>> > 	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
>> > 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
>> > }
>> 
>> Why not make all of this explicit instead of these conditionals?
>
> Hmm, I'm not sure I get what you suggest?

Instead of playing games with preempt count and offsets and checking
for ksoftirqd, can't you just have:

        account_hardirqtime()
        account_softirqtime()

and call them from the right spots. See the below for illustration (it's
obviously incomplete).

Thanks,

        tglx
---
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -377,6 +377,7 @@ static inline void invoke_softirq(void)
 		return;
 
 	if (!force_irqthreads) {
+		account_softirq_enter_time(current);
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
 		/*
 		 * We can safely execute softirq on the current stack if
@@ -391,6 +392,7 @@ static inline void invoke_softirq(void)
 		 * to prevent from any overrun.
 		 */
 		do_softirq_own_stack();
+		account_softirq_exit_time(current);
 #endif
 	} else {
 		wakeup_softirqd();
@@ -417,7 +419,7 @@ static inline void __irq_exit_rcu(void)
 #else
 	lockdep_assert_irqs_disabled();
 #endif
-	account_irq_exit_time(current);
+	account_hardirq_exit_time(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01 13:34         ` Thomas Gleixner
@ 2020-12-01 14:35           ` Frederic Weisbecker
  2020-12-01 15:01             ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-01 14:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Christian Borntraeger,
	Fenghua Yu, Heiko Carstens

On Tue, Dec 01, 2020 at 02:34:49PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 01 2020 at 12:40, Frederic Weisbecker wrote:
> > On Tue, Dec 01, 2020 at 12:33:26PM +0100, Thomas Gleixner wrote:
> >> > 	/*
> >> > 	 * We do not account for softirq time from ksoftirqd here.
> >> > 	 * We want to continue accounting softirq time to ksoftirqd thread
> >> > 	 * in that case, so as not to confuse scheduler with a special task
> >> > 	 * that do not consume any time, but still wants to run.
> >> > 	 */
> >> > 	if (pc & HARDIRQ_MASK)
> >> > 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> >> > 	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> >> > 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> >> > }
> >> 
> >> Why not make all of this explicit instead of these conditionals?
> >
> > Hmm, I'm not sure I get what you suggest?
> 
> Instead of playing games with preempt count and offsets and checking
> for ksoftirqd, can't you just have:
> 
>         account_hardirqtime()
>         account_softirqtime()
> 
> and call them from the right spots. See the below for illustration (it's
> obviously incomplete).
> 
> Thanks,
> 
>         tglx
> ---
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -377,6 +377,7 @@ static inline void invoke_softirq(void)
>  		return;
>  
>  	if (!force_irqthreads) {
> +		account_softirq_enter_time(current);
>  #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
>  		/*
>  		 * We can safely execute softirq on the current stack if
> @@ -391,6 +392,7 @@ static inline void invoke_softirq(void)
>  		 * to prevent from any overrun.
>  		 */
>  		do_softirq_own_stack();
> +		account_softirq_exit_time(current);

Indeed for the softirq part it simplifies things.


>  #endif
>  	} else {
>  		wakeup_softirqd();
> @@ -417,7 +419,7 @@ static inline void __irq_exit_rcu(void)
>  #else
>  	lockdep_assert_irqs_disabled();
>  #endif
> -	account_irq_exit_time(current);
> +	account_hardirq_exit_time(current);

And that one too makes things simple. But note that

    account_hardirq_enter_time()

will still need some preempt count checks to see if
this is a nested hardirq, a hardirq interrupting a softirq
or a hardirq interrupting a task.

But I think it's a win in the end. I'll try that.

Thanks.

>  	preempt_count_sub(HARDIRQ_OFFSET);
>  	if (!in_interrupt() && local_softirq_pending())
>  		invoke_softirq();


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01 14:35           ` Frederic Weisbecker
@ 2020-12-01 15:01             ` Peter Zijlstra
  2020-12-01 15:53               ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2020-12-01 15:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On Tue, Dec 01, 2020 at 03:35:45PM +0100, Frederic Weisbecker wrote:
> And that one too makes things simple. But note that
> 
>     account_hardirq_enter_time()
> 
> will still need some preempt count checks to see if
> this is a nested hardirq, a hardirq interrupting a softirq
> or a hardirq interrupting a task.

So the current tests get that all correct in a single function.
Splitting it out will just result in more lines to get wrong.

That is, I don't think you can do it saner than:

  account_softirq_enter() := irqtime_account_irq(curr, SOFTIRQ_OFFSET);
  account_softirq_exit()  := irqtime_account_irq(curr, 0);
  account_hardirq_enter() := irqtime_account_irq(curr, HARDIRQ_OFFSET);
  account_hardirq_exit()  := irqtime_account_irq(curr, 0);

Fundamentally you have to determine the previous context to determine
where to account the delta to. Note that when the previous context is
task context we throw away the delta.
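
In other words, a condensed view of the dispatch above:

	previous context (pc = preempt_count() - offset)    delta accounted to
	--------------------------------------------------  ------------------
	hardirq  (pc & HARDIRQ_MASK)                        CPUTIME_IRQ
	softirq  (pc & SOFTIRQ_OFFSET), not ksoftirqd       CPUTIME_SOFTIRQ
	task context (or ksoftirqd)                         thrown away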


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-01 15:01             ` Peter Zijlstra
@ 2020-12-01 15:53               ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2020-12-01 15:53 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker
  Cc: LKML, Tony Luck, Vasily Gorbik, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Christian Borntraeger,
	Fenghua Yu, Heiko Carstens

On Tue, Dec 01 2020 at 16:01, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 03:35:45PM +0100, Frederic Weisbecker wrote:
>> And that one too makes things simple. But note that
>> 
>>     account_hardirq_enter_time()
>> 
>> will still need some preempt count checks to see if
>> this is a nested hardirq, a hardirq interrupting a softirq
>> or a hardirq interrupting a task.
>
> So the current tests get that all correct in a single function.
> Splitting it out will just result in more lines to get wrong.
>
> That is, I don't think you can do it saner than:
>
>   account_softirq_enter() := irqtime_account_irq(curr, SOFTIRQ_OFFSET);
>   account_softirq_exit()  := irqtime_account_irq(curr, 0);
>   account_hardirq_enter() := irqtime_account_irq(curr, HARDIRQ_OFFSET);
>   account_hardirq_exit()  := irqtime_account_irq(curr, 0);
>
> Fundamentally you have to determine the previous context to determine
> where to account the delta to. Note that when the previous context is
> task context we throw away the delta.

Fair enough.


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-29 14:30         ` Frederic Weisbecker
@ 2020-12-29 15:58           ` Qais Yousef
  0 siblings, 0 replies; 21+ messages in thread
From: Qais Yousef @ 2020-12-29 15:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On 12/29/20 15:30, Frederic Weisbecker wrote:
> On Tue, Dec 29, 2020 at 02:12:31PM +0000, Qais Yousef wrote:
> > On 12/29/20 14:41, Frederic Weisbecker wrote:
> > > > > -void vtime_account_irq(struct task_struct *tsk)
> > > > > +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
> > > > >  {
> > > > > -	if (hardirq_count()) {
> > > > > +	unsigned int pc = preempt_count() - offset;
> > > > > +
> > > > > +	if (pc & HARDIRQ_OFFSET) {
> > > > 
> > > > Shouldn't this be HARDIRQ_MASK like above?
> > > 
> > > In the rare cases of nested hardirqs happening with broken drivers, only
> > > the outer hardirq matters. All the time spent in the inner hardirqs is
> > > included in the outer one.
> > 
> > Ah I see. The original code was doing hardirq_count(), which apparently wasn't
> > right either.
> > 
> > Shouldn't it be pc == HARDIRQ_OFFSET then? All odd nest counts will trigger
> > this otherwise, and IIUC we want this to trigger once on first entry only.
> 
> Right, but we must also handle hardirqs interrupting either preempt
> disabled sections or softirq servicing/disabled sections.
> 
> 3 stacking hardirqs should be rare enough that we don't really care. In the
> worst case we are going to account the third IRQ seperately. Not a correctness
> issue, just a rare unoptimized case.

I admit I need to wrap my head around some more details to fully comprehend
that, but that's my own confusion to clear out :-)

Thanks for your answer.

Cheers

--
Qais Yousef


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-29 14:12       ` Qais Yousef
@ 2020-12-29 14:30         ` Frederic Weisbecker
  2020-12-29 15:58           ` Qais Yousef
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-29 14:30 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On Tue, Dec 29, 2020 at 02:12:31PM +0000, Qais Yousef wrote:
> On 12/29/20 14:41, Frederic Weisbecker wrote:
> > > > -void vtime_account_irq(struct task_struct *tsk)
> > > > +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
> > > >  {
> > > > -	if (hardirq_count()) {
> > > > +	unsigned int pc = preempt_count() - offset;
> > > > +
> > > > +	if (pc & HARDIRQ_OFFSET) {
> > > 
> > > Shouldn't this be HARDIRQ_MASK like above?
> > 
> > In the rare cases of nested hardirqs happening with broken drivers, only
> > the outer hardirq matters. All the time spent in the inner hardirqs is
> > included in the outer one.
> 
> Ah I see. The original code was doing hardirq_count(), which apparently wasn't
> right either.
> 
> Shouldn't it be pc == HARDIRQ_OFFSET then? All odd nest counts will trigger
> this otherwise, and IIUC we want this to trigger once on first entry only.

Right, but we must also handle hardirqs interrupting either preempt
disabled sections or softirq servicing/disabled sections.

3 stacking hardirqs should be rare enough that we don't really care. In the
worst case we are going to account the third IRQ separately. Not a correctness
issue, just a rare unoptimized case.

> 
> Thanks
> 
> --
> Qais Yousef


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-29 13:41     ` Frederic Weisbecker
@ 2020-12-29 14:12       ` Qais Yousef
  2020-12-29 14:30         ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: Qais Yousef @ 2020-12-29 14:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On 12/29/20 14:41, Frederic Weisbecker wrote:
> On Mon, Dec 28, 2020 at 02:15:29AM +0000, Qais Yousef wrote:
> > Hi Frederic
> > 
> > On 12/02/20 12:57, Frederic Weisbecker wrote:
> > > @@ -66,9 +68,9 @@ void irqtime_account_irq(struct task_struct *curr)
> > >  	 * in that case, so as not to confuse scheduler with a special task
> > >  	 * that do not consume any time, but still wants to run.
> > >  	 */
> > > -	if (hardirq_count())
> > > +	if (pc & HARDIRQ_MASK)
> > >  		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> > > -	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
> > > +	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> > 
> > Noob question. Why for SOFTIRQs do we do softirq_count() & *SOFTIRQ_OFFSET*? It
> > seems we're in-softirq only if the count is odd numbered.
> > 
> > /me tries to dig more
> > 
> > Hmm could it be because the softirq count is actually 1 bit and the rest is
> > for SOFTIRQ_DISABLE_OFFSET (BH disabled)?
> 
> Exactly!
> 
> > 
> > IOW, 1 bit is for we're in softirq context, and the remaining 7 bits are to
> > count BH disable nesting, right?
> > 
> > I guess this would make sense; we don't nest softirqs processing AFAIK. But
> > I could be misreading the code too :-)
> 
> You got it right!
> 
> This is commented in softirq.c somewhere:
> 
> /*
>  * preempt_count and SOFTIRQ_OFFSET usage:
>  * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
>  *   softirq processing.
>  * - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)
>  *   on local_bh_disable or local_bh_enable.
>  * This lets us distinguish between whether we are currently processing
>  * softirq and whether we just have bh disabled.
>  */
> 
> But we should elaborate on the fact that, indeed, softirq processing can't nest,
> while softirq disablement can. I should try to send a patch and comment more
> thoroughly on the subtleties of preempt mask in preempt.h.

Thanks for the info!

> 
> > 
> > >  		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> > >  }
> > >  
> > > @@ -417,11 +419,13 @@ void vtime_task_switch(struct task_struct *prev)
> > >  }
> > >  # endif
> > >  
> > > -void vtime_account_irq(struct task_struct *tsk)
> > > +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
> > >  {
> > > -	if (hardirq_count()) {
> > > +	unsigned int pc = preempt_count() - offset;
> > > +
> > > +	if (pc & HARDIRQ_OFFSET) {
> > 
> > Shouldn't this be HARDIRQ_MASK like above?
> 
> In the rare cases of nested hardirqs happening with broken drivers, only
> the outer hardirq matters. All the time spent in the inner hardirqs is
> included in the outer one.

Ah I see. The original code was doing hardirq_count(), which apparently wasn't
right either.

Shouldn't it be pc == HARDIRQ_OFFSET then? All odd nest counts will trigger
this otherwise, and IIUC we want this to trigger once on first entry only.

Thanks

--
Qais Yousef


* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-28  2:15   ` Qais Yousef
@ 2020-12-29 13:41     ` Frederic Weisbecker
  2020-12-29 14:12       ` Qais Yousef
  0 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-29 13:41 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On Mon, Dec 28, 2020 at 02:15:29AM +0000, Qais Yousef wrote:
> Hi Frederic
> 
> On 12/02/20 12:57, Frederic Weisbecker wrote:
> > @@ -66,9 +68,9 @@ void irqtime_account_irq(struct task_struct *curr)
> >  	 * in that case, so as not to confuse scheduler with a special task
> >  	 * that do not consume any time, but still wants to run.
> >  	 */
> > -	if (hardirq_count())
> > +	if (pc & HARDIRQ_MASK)
> >  		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> > -	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
> > +	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
> 
> Noob question. Why for SOFTIRQs do we do softirq_count() & *SOFTIRQ_OFFSET*? It
> seems we're in-softirq only if the count is odd numbered.
> 
> /me tries to dig more
> 
> Hmm could it be because the softirq count is actually 1 bit and the rest is
> for SOFTIRQ_DISABLE_OFFSET (BH disabled)?

Exactly!

> 
> IOW, 1 bit is for we're in softirq context, and the remaining 7 bits are to
> count BH disable nesting, right?
> 
> I guess this would make sense; we don't nest softirqs processing AFAIK. But
> I could be misreading the code too :-)

You got it right!

This is commented in softirq.c somewhere:

/*
 * preempt_count and SOFTIRQ_OFFSET usage:
 * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
 *   softirq processing.
 * - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)
 *   on local_bh_disable or local_bh_enable.
 * This lets us distinguish between whether we are currently processing
 * softirq and whether we just have bh disabled.
 */

But we should elaborate on the fact that, indeed, softirq processing can't nest,
while softirq disablement can. I should try to send a patch and comment more
thoroughly on the subtleties of preempt mask in preempt.h.
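
For reference, a condensed sketch of that layout, using the mainline
<linux/preempt.h> values (quoted here for illustration):

	/*
	 * preempt_count bit layout:
	 *   bits  0-7  : preemption disable nesting (PREEMPT_MASK 0x000000ff)
	 *   bits  8-15 : softirq count              (SOFTIRQ_MASK 0x0000ff00)
	 *   bits 16-19 : hardirq nesting            (HARDIRQ_MASK 0x000f0000)
	 */
	#define SOFTIRQ_OFFSET		(1UL << 8)  /* set while servicing softirqs */
	#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET) /* local_bh_disable() */

	/* softirq servicing can't nest, so the single low bit is enough: */
	#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)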

> 
> >  		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
> >  }
> >  
> > @@ -417,11 +419,13 @@ void vtime_task_switch(struct task_struct *prev)
> >  }
> >  # endif
> >  
> > -void vtime_account_irq(struct task_struct *tsk)
> > +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
> >  {
> > -	if (hardirq_count()) {
> > +	unsigned int pc = preempt_count() - offset;
> > +
> > +	if (pc & HARDIRQ_OFFSET) {
> 
> Shouldn't this be HARDIRQ_MASK like above?

In the rare cases of nested hardirqs happening with broken drivers, only
the outer hardirq matters. All the time spent in the inner hardirqs is
included in the outer one.

Thanks.

> 
> >  		vtime_account_hardirq(tsk);
> > -	} else if (in_serving_softirq()) {
> > +	} else if (pc & SOFTIRQ_OFFSET) {
> >  		vtime_account_softirq(tsk);
> >  	} else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
> >  		   is_idle_task(tsk)) {
> 
> Thanks
> 
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-02 11:57 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
  2020-12-02 12:36   ` Peter Zijlstra
@ 2020-12-28  2:15   ` Qais Yousef
  2020-12-29 13:41     ` Frederic Weisbecker
  1 sibling, 1 reply; 21+ messages in thread
From: Qais Yousef @ 2020-12-28  2:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, Peter Zijlstra, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

Hi Frederic

On 12/02/20 12:57, Frederic Weisbecker wrote:
>  #endif /* _LINUX_KERNEL_VTIME_H */
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 02163d4260d7..5f611658eeab 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -44,12 +44,13 @@ static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
>  }
>  
>  /*
> - * Called before incrementing preempt_count on {soft,}irq_enter
> + * Called after incrementing preempt_count on {soft,}irq_enter
>   * and before decrementing preempt_count on {soft,}irq_exit.
>   */
> -void irqtime_account_irq(struct task_struct *curr)
> +void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
>  {
>  	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
> +	unsigned int pc;
>  	s64 delta;
>  	int cpu;
>  
> @@ -59,6 +60,7 @@ void irqtime_account_irq(struct task_struct *curr)
>  	cpu = smp_processor_id();
>  	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
>  	irqtime->irq_start_time += delta;
> +	pc = preempt_count() - offset;
>  
>  	/*
>  	 * We do not account for softirq time from ksoftirqd here.
> @@ -66,9 +68,9 @@ void irqtime_account_irq(struct task_struct *curr)
>  	 * in that case, so as not to confuse scheduler with a special task
>  	 * that do not consume any time, but still wants to run.
>  	 */
> -	if (hardirq_count())
> +	if (pc & HARDIRQ_MASK)
>  		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
> -	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
> +	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())

Noob question. Why, for SOFTIRQs, do we check softirq_count() &
*SOFTIRQ_OFFSET*? It seems we're in-softirq only if the count is
odd-numbered.

/me tries to dig more

Hmm could it be because the softirq count is actually 1 bit and the rest is
for SOFTIRQ_DISABLE_OFFSET (BH disabled)?

IOW, 1 bit says we're in softirq context, and the remaining 7 bits
count BH-disable nesting, right?

I guess this would make sense; we don't nest softirq processing AFAIK.
But I could be misreading the code too :-)

>  		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
>  }
>  
> @@ -417,11 +419,13 @@ void vtime_task_switch(struct task_struct *prev)
>  }
>  # endif
>  
> -void vtime_account_irq(struct task_struct *tsk)
> +void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
>  {
> -	if (hardirq_count()) {
> +	unsigned int pc = preempt_count() - offset;
> +
> +	if (pc & HARDIRQ_OFFSET) {

Shouldn't this be HARDIRQ_MASK like above?

>  		vtime_account_hardirq(tsk);
> -	} else if (in_serving_softirq()) {
> +	} else if (pc & SOFTIRQ_OFFSET) {
>  		vtime_account_softirq(tsk);
>  	} else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
>  		   is_idle_task(tsk)) {

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-02 11:57 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
@ 2020-12-02 12:36   ` Peter Zijlstra
  2020-12-28  2:15   ` Qais Yousef
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2020-12-02 12:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

On Wed, Dec 02, 2020 at 12:57:31PM +0100, Frederic Weisbecker wrote:
> IRQ time entry is currently accounted before HARDIRQ_OFFSET or
> SOFTIRQ_OFFSET is incremented. This is convenient for deciding to
> which index the pending cputime should be dispatched.
> 
> Unfortunately it prevents tick_irq_enter() from being called under
> HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ
> entry accounting due to the necessary clock catch-up. As a result we
> don't benefit from appropriate lockdep coverage on tick_irq_enter().
> 
> To prepare for fixing this, move the IRQ entry cputime accounting after
> the preempt offset is incremented. This requires the cputime dispatch
> code to handle the extra offset.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation
  2020-12-02 11:57 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v3 Frederic Weisbecker
@ 2020-12-02 11:57 ` Frederic Weisbecker
  2020-12-02 12:36   ` Peter Zijlstra
  2020-12-28  2:15   ` Qais Yousef
  0 siblings, 2 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2020-12-02 11:57 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Tony Luck, Vasily Gorbik,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Christian Borntraeger, Fenghua Yu, Heiko Carstens

IRQ time entry is currently accounted before HARDIRQ_OFFSET or
SOFTIRQ_OFFSET is incremented. This is convenient for deciding to
which index the pending cputime should be dispatched.

Unfortunately it prevents tick_irq_enter() from being called under
HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ
entry accounting due to the necessary clock catch-up. As a result we
don't benefit from appropriate lockdep coverage on tick_irq_enter().

To prepare for fixing this, move the IRQ entry cputime accounting after
the preempt offset is incremented. This requires the cputime dispatch
code to handle the extra offset.
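
For instance, on the hardirq path the net effect of the hunks below is
(a sketch of the convention, not additional patch code):

	/*
	 * The dispatcher subtracts whatever offset the entry/exit helper
	 * passes: on entry the freshly added HARDIRQ_OFFSET is removed
	 * again, so pc describes the interrupted context; on exit the
	 * count has not been decremented yet, so offset is 0 and pc still
	 * includes the current hardirq level.
	 */
	static inline unsigned int dispatch_pc(unsigned int offset)
	{
		return preempt_count() - offset; /* HARDIRQ_OFFSET on enter, 0 on exit */
	}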

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/hardirq.h |  4 ++--
 include/linux/vtime.h   | 34 ++++++++++++++++++++++++----------
 kernel/sched/cputime.c  | 18 +++++++++++-------
 kernel/softirq.c        |  6 +++---
 4 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 754f67ac4326..7c9d6a2d7e90 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,9 +32,9 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter()					\
 	do {						\
-		account_irq_enter_time(current);	\
 		preempt_count_add(HARDIRQ_OFFSET);	\
 		lockdep_hardirq_enter();		\
+		account_hardirq_enter(current);		\
 	} while (0)
 
 /*
@@ -62,8 +62,8 @@ void irq_enter_rcu(void);
  */
 #define __irq_exit()					\
 	do {						\
+		account_hardirq_exit(current);		\
 		lockdep_hardirq_exit();			\
-		account_irq_exit_time(current);		\
 		preempt_count_sub(HARDIRQ_OFFSET);	\
 	} while (0)
 
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 6c9867419615..041d6524d144 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -83,32 +83,46 @@ static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
 #endif
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-extern void vtime_account_irq(struct task_struct *tsk);
+extern void vtime_account_irq(struct task_struct *tsk, unsigned int offset);
 extern void vtime_account_softirq(struct task_struct *tsk);
 extern void vtime_account_hardirq(struct task_struct *tsk);
 extern void vtime_flush(struct task_struct *tsk);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
-static inline void vtime_account_irq(struct task_struct *tsk) { }
+static inline void vtime_account_irq(struct task_struct *tsk, unsigned int offset) { }
+static inline void vtime_account_softirq(struct task_struct *tsk) { }
+static inline void vtime_account_hardirq(struct task_struct *tsk) { }
 static inline void vtime_flush(struct task_struct *tsk) { }
 #endif
 
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, unsigned int offset);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, unsigned int offset) { }
 #endif
 
-static inline void account_irq_enter_time(struct task_struct *tsk)
+static inline void account_softirq_enter(struct task_struct *tsk)
 {
-	vtime_account_irq(tsk);
-	irqtime_account_irq(tsk);
+	vtime_account_irq(tsk, SOFTIRQ_OFFSET);
+	irqtime_account_irq(tsk, SOFTIRQ_OFFSET);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_softirq_exit(struct task_struct *tsk)
 {
-	vtime_account_irq(tsk);
-	irqtime_account_irq(tsk);
+	vtime_account_softirq(tsk);
+	irqtime_account_irq(tsk, 0);
+}
+
+static inline void account_hardirq_enter(struct task_struct *tsk)
+{
+	vtime_account_irq(tsk, HARDIRQ_OFFSET);
+	irqtime_account_irq(tsk, HARDIRQ_OFFSET);
+}
+
+static inline void account_hardirq_exit(struct task_struct *tsk)
+{
+	vtime_account_hardirq(tsk);
+	irqtime_account_irq(tsk, 0);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 02163d4260d7..5f611658eeab 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -44,12 +44,13 @@ static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
 }
 
 /*
- * Called before incrementing preempt_count on {soft,}irq_enter
+ * Called after incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
 {
 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	unsigned int pc;
 	s64 delta;
 	int cpu;
 
@@ -59,6 +60,7 @@ void irqtime_account_irq(struct task_struct *curr)
 	cpu = smp_processor_id();
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
+	pc = preempt_count() - offset;
 
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
@@ -66,9 +68,9 @@ void irqtime_account_irq(struct task_struct *curr)
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count())
+	if (pc & HARDIRQ_MASK)
 		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+	else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
 
@@ -417,11 +419,13 @@ void vtime_task_switch(struct task_struct *prev)
 }
 # endif
 
-void vtime_account_irq(struct task_struct *tsk)
+void vtime_account_irq(struct task_struct *tsk, unsigned int offset)
 {
-	if (hardirq_count()) {
+	unsigned int pc = preempt_count() - offset;
+
+	if (pc & HARDIRQ_OFFSET) {
 		vtime_account_hardirq(tsk);
-	} else if (in_serving_softirq()) {
+	} else if (pc & SOFTIRQ_OFFSET) {
 		vtime_account_softirq(tsk);
 	} else if (!IS_ENABLED(CONFIG_HAVE_VIRT_CPU_ACCOUNTING_IDLE) &&
 		   is_idle_task(tsk)) {
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 617009ccd82c..b8f42b3ba8ca 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,10 +315,10 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	account_irq_enter_time(current);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
 	in_hardirq = lockdep_softirq_start();
+	account_softirq_enter(current);
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
@@ -365,8 +365,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		wakeup_softirqd();
 	}
 
+	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
-	account_irq_exit_time(current);
 	__local_bh_enable(SOFTIRQ_OFFSET);
 	WARN_ON_ONCE(in_interrupt());
 	current_restore_flags(old_flags, PF_MEMALLOC);
@@ -418,7 +418,7 @@ static inline void __irq_exit_rcu(void)
 #else
 	lockdep_assert_irqs_disabled();
 #endif
-	account_irq_exit_time(current);
+	account_hardirq_exit(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-12-29 15:59 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-01  0:12 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v2 Frederic Weisbecker
2020-12-01  0:12 ` [PATCH 1/5] sched/cputime: Remove symbol exports from IRQ time accounting Frederic Weisbecker
2020-12-01  0:12 ` [PATCH 2/5] sched/vtime: Consolidate " Frederic Weisbecker
2020-12-01  0:12 ` [PATCH 3/5] s390/vtime: Convert to consolidated " Frederic Weisbecker
2020-12-01  0:12 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
2020-12-01  9:20   ` Peter Zijlstra
2020-12-01 11:23     ` Frederic Weisbecker
2020-12-01 11:33     ` Thomas Gleixner
2020-12-01 11:40       ` Frederic Weisbecker
2020-12-01 13:34         ` Thomas Gleixner
2020-12-01 14:35           ` Frederic Weisbecker
2020-12-01 15:01             ` Peter Zijlstra
2020-12-01 15:53               ` Thomas Gleixner
2020-12-01  0:12 ` [PATCH 5/5] irq: Call tick_irq_enter() inside HARDIRQ_OFFSET Frederic Weisbecker
2020-12-02 11:57 [PATCH 0/5] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry v3 Frederic Weisbecker
2020-12-02 11:57 ` [PATCH 4/5] irqtime: Move irqtime entry accounting after irq offset incrementation Frederic Weisbecker
2020-12-02 12:36   ` Peter Zijlstra
2020-12-28  2:15   ` Qais Yousef
2020-12-29 13:41     ` Frederic Weisbecker
2020-12-29 14:12       ` Qais Yousef
2020-12-29 14:30         ` Frederic Weisbecker
2020-12-29 15:58           ` Qais Yousef
