* [RFC PATCH 0/4] irq: Reorder time handling against HARDIRQ_OFFSET on IRQ entry
From: Frederic Weisbecker @ 2020-11-25  2:15 UTC
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

There are two issues with the current placement of tick_irq_enter(),
which is called before HARDIRQ_OFFSET is incremented:

1) It's not correctly handled by lockdep, which doesn't consider it to be
   hardirq context. And the jiffies/timekeeping update takes a few
   interesting locks.

2) Softirqs need to be explicitly disabled around it to prevent ksoftirqd
   from being spuriously woken up.

The current call dependency prevents tick_irq_enter() from being moved
after HARDIRQ_OFFSET: account_irq_enter_time() needs to be called before
the HARDIRQ_OFFSET increment because of the cputime index dispatch, and it
must be called after tick_irq_enter(), which updates the clocks that may
be necessary for cputime accounting.
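
For reference, the current (pre-series) entry ordering, condensed from
irq_enter_rcu() in kernel/softirq.c and __irq_enter() in
include/linux/hardirq.h (see the diffs in patches 3 and 4), looks
roughly like this:

	void irq_enter_rcu(void)
	{
		if (is_idle_task(current) && !in_interrupt()) {
			/* Softirqs disabled only to avoid a spurious ksoftirqd wakeup */
			local_bh_disable();
			tick_irq_enter();	/* runs outside of HARDIRQ_OFFSET */
			_local_bh_enable();
		}
		/* __irq_enter(): */
		account_irq_enter_time(current);	/* needs the clocks updated above */
		preempt_count_add(HARDIRQ_OFFSET);
		lockdep_hardirq_enter();
	}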

Here is a proposal to fix this layout.

(The EXPORT_SYMBOL_GPL()s in vtime will likely disappear in the next take,
as they don't seem to be necessary anymore, but I'll need to check that
thoroughly.)

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	irq/core

HEAD: 9502ee20aed8bb847176e1d7d83ccd0625430744


Frederic Weisbecker (4):
  sched/vtime: Consolidate IRQ time accounting
  s390/vtime: Convert to consolidated IRQ time accounting
  irqtime: Move irqtime entry accounting after irq offset incrementation
  irq: Call tick_irq_enter() inside HARDIRQ_OFFSET

 arch/ia64/kernel/time.c       | 22 +++++++---
 arch/powerpc/kernel/time.c    | 60 ++++++++++++++++++++--------
 arch/s390/include/asm/vtime.h |  1 -
 arch/s390/kernel/vtime.c      | 60 ++++++++++++++++++----------
 include/linux/hardirq.h       |  4 +-
 include/linux/vtime.h         | 18 ++++-----
 kernel/sched/cputime.c        | 75 +++++++++++++++++++++++++++--------
 kernel/softirq.c              | 16 +++-----
 8 files changed, 176 insertions(+), 80 deletions(-)

-- 
2.25.1


* [RFC PATCH 1/4] sched/vtime: Consolidate IRQ time accounting
From: Frederic Weisbecker @ 2020-11-25  2:15 UTC
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
all have their own version of irq time accounting that dispatches the
cputime to the appropriate index: hardirq, softirq, system, idle,
guest... from an all-in-one function.

Instead of having these ad-hoc versions, move the cputime destination
dispatch decision to the core code and leave only the actual per-index
cputime accounting to the architecture.

For now only ia64 and powerpc are handled. s390 will need a slightly
different treatment.
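
Schematically, the resulting split (condensed from the diff below) is
that the core decides the destination index while the architecture only
accumulates the delta into that index, e.g. on ia64:

	/* kernel/sched/cputime.c: generic dispatch */
	void vtime_account_irq_enter(struct task_struct *tsk)
	{
		if (hardirq_count())
			vtime_account_hardirq(tsk);
		else if (in_serving_softirq())
			vtime_account_softirq(tsk);
		else if (is_idle_task(tsk))
			vtime_account_idle(tsk);
		else
			vtime_account_kernel(tsk);
	}

	/* arch/ia64/kernel/time.c: per-index accumulation only */
	void vtime_account_hardirq(struct task_struct *tsk)
	{
		task_thread_info(tsk)->hardirq_time += vtime_delta(tsk);
	}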

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/ia64/kernel/time.c    | 22 ++++++++++----
 arch/powerpc/kernel/time.c | 60 +++++++++++++++++++++++++++-----------
 arch/s390/kernel/vtime.c   | 15 ++++++----
 include/linux/vtime.h      |  8 ++---
 kernel/sched/cputime.c     | 21 +++++++++++--
 5 files changed, 92 insertions(+), 34 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 7abc5f37bfaf..3fd1daf5bc09 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -138,12 +138,8 @@ void vtime_account_kernel(struct task_struct *tsk)
 	struct thread_info *ti = task_thread_info(tsk);
 	__u64 stime = vtime_delta(tsk);
 
-	if ((tsk->flags & PF_VCPU) && !irq_count())
+	if (tsk->flags & PF_VCPU)
 		ti->gtime += stime;
-	else if (hardirq_count())
-		ti->hardirq_time += stime;
-	else if (in_serving_softirq())
-		ti->softirq_time += stime;
 	else
 		ti->stime += stime;
 }
@@ -156,6 +152,22 @@ void vtime_account_idle(struct task_struct *tsk)
 	ti->idle_time += vtime_delta(tsk);
 }
 
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->softirq_time += vtime_delta(tsk);
+}
+EXPORT_SYMBOL_GPL(vtime_account_softirq);
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	struct thread_info *ti = task_thread_info(tsk);
+
+	ti->hardirq_time += vtime_delta(tsk);
+}
+EXPORT_SYMBOL_GPL(vtime_account_hardirq);
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 static irqreturn_t
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 74efe46f5532..6b9496d615b2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -311,12 +311,11 @@ static unsigned long vtime_delta_scaled(struct cpu_accounting_data *acct,
 	return stime_scaled;
 }
 
-static unsigned long vtime_delta(struct task_struct *tsk,
+static unsigned long vtime_delta(struct cpu_accounting_data *acct,
 				 unsigned long *stime_scaled,
 				 unsigned long *steal_time)
 {
 	unsigned long now, stime;
-	struct cpu_accounting_data *acct = get_accounting(tsk);
 
 	WARN_ON_ONCE(!irqs_disabled());
 
@@ -331,29 +330,30 @@ static unsigned long vtime_delta(struct task_struct *tsk,
 	return stime;
 }
 
-void vtime_account_kernel(struct task_struct *tsk)
+static void vtime_delta_kernel(struct cpu_accounting_data *acct,
+			       unsigned long *stime, unsigned long *stime_scaled)
 {
-	unsigned long stime, stime_scaled, steal_time;
-	struct cpu_accounting_data *acct = get_accounting(tsk);
+	unsigned long steal_time;
 
-	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
-
-	stime -= min(stime, steal_time);
+	*stime = vtime_delta(acct, stime_scaled, &steal_time);
+	*stime -= min(*stime, steal_time);
 	acct->steal_time += steal_time;
+}
 
-	if ((tsk->flags & PF_VCPU) && !irq_count()) {
+void vtime_account_kernel(struct task_struct *tsk)
+{
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+	unsigned long stime, stime_scaled;
+
+	vtime_delta_kernel(acct, &stime, &stime_scaled);
+
+	if (tsk->flags & PF_VCPU) {
 		acct->gtime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
 		acct->utime_scaled += stime_scaled;
 #endif
 	} else {
-		if (hardirq_count())
-			acct->hardirq_time += stime;
-		else if (in_serving_softirq())
-			acct->softirq_time += stime;
-		else
-			acct->stime += stime;
-
+		acct->stime += stime;
 #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
 		acct->stime_scaled += stime_scaled;
 #endif
@@ -366,10 +366,36 @@ void vtime_account_idle(struct task_struct *tsk)
 	unsigned long stime, stime_scaled, steal_time;
 	struct cpu_accounting_data *acct = get_accounting(tsk);
 
-	stime = vtime_delta(tsk, &stime_scaled, &steal_time);
+	stime = vtime_delta(acct, &stime_scaled, &steal_time);
 	acct->idle_time += stime + steal_time;
 }
 
+static void vtime_account_irq(struct cpu_accounting_data *acct,
+			      unsigned long *field)
+{
+	unsigned long stime, stime_scaled;
+
+	vtime_delta_kernel(acct, &stime, &stime_scaled);
+	*field += stime;
+#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
+	acct->stime_scaled += stime_scaled;
+#endif
+}
+
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+	vtime_account_irq(acct, &acct->softirq_time);
+}
+EXPORT_SYMBOL_GPL(vtime_account_softirq);
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	struct cpu_accounting_data *acct = get_accounting(tsk);
+	vtime_account_irq(acct, &acct->hardirq_time);
+}
+EXPORT_SYMBOL_GPL(vtime_account_hardirq);
+
 static void vtime_flush_scaled(struct task_struct *tsk,
 			       struct cpu_accounting_data *acct)
 {
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 8df10d3c8f6c..9fce2ca1b448 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -226,7 +226,7 @@ void vtime_flush(struct task_struct *tsk)
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_account_kernel(struct task_struct *tsk)
 {
 	u64 timer;
 
@@ -245,12 +245,17 @@ void vtime_account_irq_enter(struct task_struct *tsk)
 
 	virt_timer_forward(timer);
 }
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
-
-void vtime_account_kernel(struct task_struct *tsk)
-__attribute__((alias("vtime_account_irq_enter")));
 EXPORT_SYMBOL_GPL(vtime_account_kernel);
 
+void vtime_account_irq_enter(struct task_struct *tsk)
+__attribute__((alias("vtime_account_kernel")));
+EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
+
+void vtime_account_irq_exit(struct task_struct *tsk)
+__attribute__((alias("vtime_account_kernel")));
+EXPORT_SYMBOL_GPL(vtime_account_irq_exit);
+
+
 /*
  * Sorted add to a list. List is linear searched until first bigger
  * element is found.
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 2cdeca062db3..f827b38c3bb7 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -84,11 +84,9 @@ static inline void vtime_init_idle(struct task_struct *tsk, int cpu) { }
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 extern void vtime_account_irq_enter(struct task_struct *tsk);
-static inline void vtime_account_irq_exit(struct task_struct *tsk)
-{
-	/* On hard|softirq exit we always account to hard|softirq cputime */
-	vtime_account_kernel(tsk);
-}
+extern void vtime_account_irq_exit(struct task_struct *tsk);
+extern void vtime_account_softirq(struct task_struct *tsk);
+extern void vtime_account_hardirq(struct task_struct *tsk);
 extern void vtime_flush(struct task_struct *tsk);
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5a55d2300452..a042250ecbfe 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -429,12 +429,29 @@ void vtime_task_switch(struct task_struct *prev)
 #ifndef __ARCH_HAS_VTIME_ACCOUNT
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	if (!in_interrupt() && is_idle_task(tsk))
+	if (hardirq_count()) {
+		vtime_account_hardirq(tsk);
+	} else if (in_serving_softirq()) {
+		vtime_account_softirq(tsk);
+	} else if (is_idle_task(tsk)) {
 		vtime_account_idle(tsk);
-	else
+	} else {
 		vtime_account_kernel(tsk);
+	}
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
+
+void vtime_account_irq_exit(struct task_struct *tsk)
+{
+	WARN_ON_ONCE(in_task());
+
+	if (hardirq_count()) {
+		vtime_account_hardirq(tsk);
+	} else if (in_serving_softirq()) {
+		vtime_account_softirq(tsk);
+	}
+}
+EXPORT_SYMBOL_GPL(vtime_account_irq_exit);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
-- 
2.25.1


* [RFC PATCH 2/4] s390/vtime: Convert to consolidated IRQ time accounting
From: Frederic Weisbecker @ 2020-11-25  2:15 UTC
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

s390 has its own version of IRQ time accounting because it doesn't
account idle time the same way the other architectures do. Only the
actual idle sleep time is accounted as idle time; the rest of the idle
task's execution is accounted as system time.

However, converting it to the consolidated IRQ time accounting is easy:
just keep the current behaviour and redirect the generic idle time
accounting to system time accounting.

This removes the need to maintain an ad-hoc implementation of the
cputime dispatch decision.
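
Concretely, the redirect amounts to aliasing the generic idle hook to
the existing system time accounting path (taken from the diff below):

	/* arch/s390/kernel/vtime.c: idle time keeps going to system time */
	void vtime_account_idle(struct task_struct *tsk)
	__attribute__((alias("vtime_account_kernel")));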

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/vtime.h |  1 -
 arch/s390/kernel/vtime.c      | 57 ++++++++++++++++++++++-------------
 kernel/sched/cputime.c        |  2 --
 3 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/s390/include/asm/vtime.h b/arch/s390/include/asm/vtime.h
index 3622d4ebc73a..fac6a67988eb 100644
--- a/arch/s390/include/asm/vtime.h
+++ b/arch/s390/include/asm/vtime.h
@@ -2,7 +2,6 @@
 #ifndef _S390_VTIME_H
 #define _S390_VTIME_H
 
-#define __ARCH_HAS_VTIME_ACCOUNT
 #define __ARCH_HAS_VTIME_TASK_SWITCH
 
 #endif /* _S390_VTIME_H */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 9fce2ca1b448..09328baeb61d 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -222,39 +222,54 @@ void vtime_flush(struct task_struct *tsk)
 	S390_lowcore.avg_steal_timer = avg_steal;
 }
 
-/*
- * Update process times based on virtual cpu times stored by entry.S
- * to the lowcore fields user_timer, system_timer & steal_clock.
- */
-void vtime_account_kernel(struct task_struct *tsk)
+static u64 vtime_delta(void)
 {
-	u64 timer;
+	u64 timer = S390_lowcore.last_update_timer;
 
-	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	timer -= S390_lowcore.last_update_timer;
 
-	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
-		S390_lowcore.guest_timer += timer;
-	else if (hardirq_count())
-		S390_lowcore.hardirq_timer += timer;
-	else if (in_serving_softirq())
-		S390_lowcore.softirq_timer += timer;
+	return timer - S390_lowcore.last_update_timer;
+}
+
+/*
+ * Update process times based on virtual cpu times stored by entry.S
+ * to the lowcore fields user_timer, system_timer & steal_clock.
+ */
+void vtime_account_kernel(struct task_struct *tsk)
+{
+	u64 delta = vtime_delta();
+
+	if (tsk->flags & PF_VCPU)
+		S390_lowcore.guest_timer += delta;
 	else
-		S390_lowcore.system_timer += timer;
+		S390_lowcore.system_timer += delta;
 
-	virt_timer_forward(timer);
+	virt_timer_forward(delta);
 }
 EXPORT_SYMBOL_GPL(vtime_account_kernel);
 
-void vtime_account_irq_enter(struct task_struct *tsk)
+void vtime_account_idle(struct task_struct *tsk)
 __attribute__((alias("vtime_account_kernel")));
-EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 
-void vtime_account_irq_exit(struct task_struct *tsk)
-__attribute__((alias("vtime_account_kernel")));
-EXPORT_SYMBOL_GPL(vtime_account_irq_exit);
+void vtime_account_softirq(struct task_struct *tsk)
+{
+	u64 delta = vtime_delta();
+
+	S390_lowcore.softirq_timer += delta;
+
+	virt_timer_forward(delta);
+}
+EXPORT_SYMBOL_GPL(vtime_account_softirq);
+
+void vtime_account_hardirq(struct task_struct *tsk)
+{
+	u64 delta = vtime_delta();
+
+	S390_lowcore.hardirq_timer += delta;
 
+	virt_timer_forward(delta);
+}
+EXPORT_SYMBOL_GPL(vtime_account_hardirq);
 
 /*
  * Sorted add to a list. List is linear searched until first bigger
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a042250ecbfe..6fa81cc33fec 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -426,7 +426,6 @@ void vtime_task_switch(struct task_struct *prev)
  * time spent by the CPU when it's in low power mode) must override
  * vtime_account().
  */
-#ifndef __ARCH_HAS_VTIME_ACCOUNT
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
 	if (hardirq_count()) {
@@ -452,7 +451,6 @@ void vtime_account_irq_exit(struct task_struct *tsk)
 	}
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_exit);
-#endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
 		    u64 *ut, u64 *st)
-- 
2.25.1


* [RFC PATCH 3/4] sched/irqtime: Move irqtime entry accounting after irq offset incrementation
From: Frederic Weisbecker @ 2020-11-25  2:15 UTC
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

IRQ entry time is currently accounted before HARDIRQ_OFFSET or
SOFTIRQ_OFFSET is incremented. This makes it convenient to decide which
index the accounted cputime should be dispatched to.

Unfortunately, it prevents tick_irq_enter() from being called under
HARDIRQ_OFFSET, because tick_irq_enter() has to run before the IRQ
entry accounting due to the necessary clock catch-up. As a result we
don't benefit from appropriate lockdep coverage on tick_irq_enter().

To prepare for fixing this, move the IRQ entry cputime accounting after
the preempt offset is incremented. This requires the cputime dispatch
code to handle the extra offset.
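
Condensed from the diff below, the new entry ordering and the adjusted
dispatch checks look roughly like this:

	/* include/linux/hardirq.h */
	#define __irq_enter()					\
		do {						\
			preempt_count_add(HARDIRQ_OFFSET);	\
			lockdep_hardirq_enter();		\
			account_irq_enter_time(current);	\
		} while (0)

	/* kernel/sched/cputime.c: the dispatch now runs with one extra
	 * HARDIRQ_OFFSET on top of the interrupted context */
	void vtime_account_irq_enter(struct task_struct *tsk)
	{
		WARN_ON_ONCE(in_task());

		if (hardirq_count() > HARDIRQ_OFFSET)
			vtime_account_hardirq(tsk);	/* we interrupted a hardirq */
		else if (hardirq_count() && in_serving_softirq())
			vtime_account_softirq(tsk);	/* we interrupted a softirq */
		else if (is_idle_task(tsk))
			vtime_account_idle(tsk);
		else
			vtime_account_kernel(tsk);
	}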

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 include/linux/hardirq.h |  4 +--
 include/linux/vtime.h   | 10 +++++---
 kernel/sched/cputime.c  | 56 ++++++++++++++++++++++++++++++-----------
 kernel/softirq.c        |  2 +-
 4 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 754f67ac4326..02499c10fbf7 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,9 +32,9 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter()					\
 	do {						\
+		preempt_count_add(HARDIRQ_OFFSET);	\
+		lockdep_hardirq_enter();		\
 		account_irq_enter_time(current);	\
-		preempt_count_add(HARDIRQ_OFFSET);	\
-		lockdep_hardirq_enter();		\
 	} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index f827b38c3bb7..cad8ff530273 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -96,21 +96,23 @@ static inline void vtime_flush(struct task_struct *tsk) { }
 
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_enter(struct task_struct *tsk);
+extern void irqtime_account_exit(struct task_struct *tsk);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_enter(struct task_struct *tsk) { }
+static inline void irqtime_account_exit(struct task_struct *tsk) { }
 #endif
 
 static inline void account_irq_enter_time(struct task_struct *tsk)
 {
 	vtime_account_irq_enter(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_enter(tsk);
 }
 
 static inline void account_irq_exit_time(struct task_struct *tsk)
 {
 	vtime_account_irq_exit(tsk);
-	irqtime_account_irq(tsk);
+	irqtime_account_exit(tsk);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6fa81cc33fec..82623d97667c 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -43,23 +43,49 @@ static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
 	u64_stats_update_end(&irqtime->sync);
 }
 
-/*
- * Called before incrementing preempt_count on {soft,}irq_enter
- * and before decrementing preempt_count on {soft,}irq_exit.
- */
-void irqtime_account_irq(struct task_struct *curr)
+static s64 irqtime_get_delta(struct irqtime *irqtime)
 {
-	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	int cpu = smp_processor_id();
 	s64 delta;
-	int cpu;
 
-	if (!sched_clock_irqtime)
-		return;
-
-	cpu = smp_processor_id();
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
 
+	return delta;
+}
+
+/* Called after incrementing preempt_count on {soft,}irq_enter */
+void irqtime_account_enter(struct task_struct *curr)
+{
+	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	u64 delta;
+
+	if (!sched_clock_irqtime)
+		return;
+
+	delta = irqtime_get_delta(irqtime);
+	/*
+	 * We do not account for softirq time from ksoftirqd here.
+	 * We want to continue accounting softirq time to ksoftirqd thread
+	 * in that case, so as not to confuse scheduler with a special task
+	 * that do not consume any time, but still wants to run.
+	 */
+	if ((irq_count() == (SOFTIRQ_OFFSET | HARDIRQ_OFFSET)) &&
+	    curr != this_cpu_ksoftirqd())
+		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+}
+EXPORT_SYMBOL_GPL(irqtime_account_enter);
+
+/* Called before decrementing preempt_count on {soft,}irq_exit */
+void irqtime_account_exit(struct task_struct *curr)
+{
+	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	u64 delta;
+
+	if (!sched_clock_irqtime)
+		return;
+
+	delta = irqtime_get_delta(irqtime);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
@@ -71,7 +97,7 @@ void irqtime_account_irq(struct task_struct *curr)
 	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
 		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
-EXPORT_SYMBOL_GPL(irqtime_account_irq);
+EXPORT_SYMBOL_GPL(irqtime_account_exit);
 
 static u64 irqtime_tick_accounted(u64 maxtime)
 {
@@ -428,9 +454,11 @@ void vtime_task_switch(struct task_struct *prev)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	if (hardirq_count()) {
+	WARN_ON_ONCE(in_task());
+
+	if (hardirq_count() > HARDIRQ_OFFSET) {
 		vtime_account_hardirq(tsk);
-	} else if (in_serving_softirq()) {
+	} else if (hardirq_count() && in_serving_softirq()) {
 		vtime_account_softirq(tsk);
 	} else if (is_idle_task(tsk)) {
 		vtime_account_idle(tsk);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 617009ccd82c..24254c41bb7c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -315,9 +315,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	current->flags &= ~PF_MEMALLOC;
 
 	pending = local_softirq_pending();
-	account_irq_enter_time(current);
 
 	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+	account_irq_enter_time(current);
 	in_hardirq = lockdep_softirq_start();
 
 restart:
-- 
2.25.1


* [RFC PATCH 4/4] irq: Call tick_irq_enter() inside HARDIRQ_OFFSET
From: Frederic Weisbecker @ 2020-11-25  2:15 UTC
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Tony Luck, Peter Zijlstra,
	Vasily Gorbik, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Christian Borntraeger, Fenghua Yu,
	Heiko Carstens

Now that account_irq_enter_time() is called after HARDIRQ_OFFSET has
been incremented, there is nothing left that prevents us from also
moving tick_irq_enter() after the HARDIRQ_OFFSET increment.

The desired outcome is to remove the nasty local_bh_disable() hack whose
only purpose was to prevent softirqs raised from tick_irq_enter() from
spuriously waking up ksoftirqd instead of being serviced from the
hardirq bottom half. Also, tick_irq_enter() then becomes appropriately
covered by lockdep.
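
The resulting function (condensed from the diff below) then reads:

	void irq_enter_rcu(void)
	{
		__irq_enter_raw();	/* HARDIRQ_OFFSET + lockdep, no time accounting yet */

		if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
			tick_irq_enter();	/* now runs in hardirq context for lockdep */

		account_irq_enter_time(current);
	}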

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
 kernel/softirq.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 24254c41bb7c..447ee4d6fe4d 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -377,16 +377,12 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
  */
 void irq_enter_rcu(void)
 {
-	if (is_idle_task(current) && !in_interrupt()) {
-		/*
-		 * Prevent raise_softirq from needlessly waking up ksoftirqd
-		 * here, as softirq will be serviced on return from interrupt.
-		 */
-		local_bh_disable();
+	__irq_enter_raw();
+
+	if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
 		tick_irq_enter();
-		_local_bh_enable();
-	}
-	__irq_enter();
+
+	account_irq_enter_time(current);
 }
 
 /**
-- 
2.25.1

