All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for v4.9-stable] sched: fix softirq time accounting
@ 2017-12-13 10:11 Rabin Vincent
  2017-12-13 10:54 ` Greg KH
  0 siblings, 1 reply; 3+ messages in thread
From: Rabin Vincent @ 2017-12-13 10:11 UTC (permalink / raw)
  To: gregkh, fweisbec, riel, mingo, peterz; +Cc: stable, linux-kernel, Rabin Vincent

From: Rabin Vincent <rabinv@axis.com>

softirq time accounting is broken on v4.9.x if ksoftirqd runs.

With
	CONFIG_IRQ_TIME_ACCOUNTING=y
	# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set

this test code:

	struct tasklet_struct tasklet;

	static void delay_tasklet(unsigned long data)
	{
		udelay(10);
		tasklet_schedule(&tasklet);
	}

	tasklet_init(&tasklet, delay_tasklet, 0);
	tasklet_schedule(&tasklet);

results in:

$ while :; do grep cpu0 /proc/stat; done
cpu0 5 0 80 25 16 107 1 0 0 0
cpu0 5 0 80 25 16 107 0 0 0 0
cpu0 5 0 80 25 16 107 0 0 0 0
cpu0 5 0 80 25 16 107 0 0 0 0
cpu0 5 0 81 25 16 107 0 0 0 0
cpu0 5 0 81 25 16 107 1 0 0 0
cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
cpu0 5 0 81 25 16 108 0 0 0 0
cpu0 6 0 81 25 16 108 0 0 0 0
cpu0 6 0 81 25 16 108 0 0 0 0

As can be seen, the softirq numbers are totally bogus.

When ksoftirq is running, irqtime_account_process_tick() increments
cpustat[CPUSTAT_SOFTIRQ].  This causes the "nsecs_to_cputime64(irqtime)
- cpustat[CPUSTAT_SOFTIRQ]" calculation in irqtime_account_update() to
underflow the next time a softirq is handled leading to the above
values.

The underflow bug was added by 57430218317e5b280 ("sched/cputime: Count
actually elapsed irq & softirq time").

But ksoftirqd accounting was wrong even in earlier kernels.  In earlier
kernels, after a ksoftirq run, the kernel would simply stop accounting
softirq time spent outside of ksoftirqd until that (accumulated) time
exceeded the time for which ksofirqd previously had run.

Fix both the underflow and the wrong accounting by using a counter
specifically for the non-ksoftirqd softirq case.

This code has been fixed in current mainline by a499a5a14db
("sched/cputime: Increment kcpustat directly on irqtime account") [note
also the followup 25e2d8c1b9e327e ("sched/cputime: Fix ksoftirqd cputime
accounting regression")], but that patch is a part of the many changes
for eliminating of cputime_t so it does not seem suitable for backport.

Signed-off-by: Rabin Vincent <rabinv@axis.com>
---
 include/linux/kernel_stat.h | 1 +
 kernel/sched/cputime.c      | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 44fda64..d0826f1 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -33,6 +33,7 @@ enum cpu_usage_stat {
 
 struct kernel_cpustat {
 	u64 cpustat[NR_STATS];
+	u64 softirq_no_ksoftirqd;
 };
 
 struct kernel_stat {
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5ebee31..1b5a9e6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -73,12 +73,19 @@ EXPORT_SYMBOL_GPL(irqtime_account_irq);
 static cputime_t irqtime_account_update(u64 irqtime, int idx, cputime_t maxtime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
+	u64 base = cpustat[idx];
 	cputime_t irq_cputime;
 
-	irq_cputime = nsecs_to_cputime64(irqtime) - cpustat[idx];
+	if (idx == CPUTIME_SOFTIRQ)
+		base = kcpustat_this_cpu->softirq_no_ksoftirqd;
+
+	irq_cputime = nsecs_to_cputime64(irqtime) - base;
 	irq_cputime = min(irq_cputime, maxtime);
 	cpustat[idx] += irq_cputime;
 
+	if (idx == CPUTIME_SOFTIRQ)
+		kcpustat_this_cpu->softirq_no_ksoftirqd += irq_cputime;
+
 	return irq_cputime;
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH for v4.9-stable] sched: fix softirq time accounting
  2017-12-13 10:11 [PATCH for v4.9-stable] sched: fix softirq time accounting Rabin Vincent
@ 2017-12-13 10:54 ` Greg KH
  2018-01-17 13:54   ` Greg KH
  0 siblings, 1 reply; 3+ messages in thread
From: Greg KH @ 2017-12-13 10:54 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: fweisbec, riel, mingo, peterz, stable, linux-kernel, Rabin Vincent

On Wed, Dec 13, 2017 at 11:11:16AM +0100, Rabin Vincent wrote:
> From: Rabin Vincent <rabinv@axis.com>
> 
> softirq time accounting is broken on v4.9.x if ksoftirqd runs.
> 
> With
> 	CONFIG_IRQ_TIME_ACCOUNTING=y
> 	# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
> 
> this test code:
> 
> 	struct tasklet_struct tasklet;
> 
> 	static void delay_tasklet(unsigned long data)
> 	{
> 		udelay(10);
> 		tasklet_schedule(&tasklet);
> 	}
> 
> 	tasklet_init(&tasklet, delay_tasklet, 0);
> 	tasklet_schedule(&tasklet);
> 
> results in:
> 
> $ while :; do grep cpu0 /proc/stat; done
> cpu0 5 0 80 25 16 107 1 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 80 25 16 107 0 0 0 0
> cpu0 5 0 81 25 16 107 0 0 0 0
> cpu0 5 0 81 25 16 107 1 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> cpu0 5 0 81 25 16 108 0 0 0 0
> cpu0 6 0 81 25 16 108 0 0 0 0
> cpu0 6 0 81 25 16 108 0 0 0 0
> 
> As can be seen, the softirq numbers are totally bogus.
> 
> When ksoftirq is running, irqtime_account_process_tick() increments
> cpustat[CPUSTAT_SOFTIRQ].  This causes the "nsecs_to_cputime64(irqtime)
> - cpustat[CPUSTAT_SOFTIRQ]" calculation in irqtime_account_update() to
> underflow the next time a softirq is handled leading to the above
> values.
> 
> The underflow bug was added by 57430218317e5b280 ("sched/cputime: Count
> actually elapsed irq & softirq time").
> 
> But ksoftirqd accounting was wrong even in earlier kernels.  In earlier
> kernels, after a ksoftirq run, the kernel would simply stop accounting
> softirq time spent outside of ksoftirqd until that (accumulated) time
> exceeded the time for which ksofirqd previously had run.
> 
> Fix both the underflow and the wrong accounting by using a counter
> specifically for the non-ksoftirqd softirq case.
> 
> This code has been fixed in current mainline by a499a5a14db
> ("sched/cputime: Increment kcpustat directly on irqtime account") [note
> also the followup 25e2d8c1b9e327e ("sched/cputime: Fix ksoftirqd cputime
> accounting regression")], but that patch is a part of the many changes
> for eliminating of cputime_t so it does not seem suitable for backport.

I _really_ only want to take the exact upstream patches, as every time
we do something like what you are proposing to do here, we get it wrong.
Seriously, our track record is horrible.  Like 90% wrong.

Can you try just those two patches instead?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH for v4.9-stable] sched: fix softirq time accounting
  2017-12-13 10:54 ` Greg KH
@ 2018-01-17 13:54   ` Greg KH
  0 siblings, 0 replies; 3+ messages in thread
From: Greg KH @ 2018-01-17 13:54 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: fweisbec, riel, mingo, peterz, stable, linux-kernel, Rabin Vincent

On Wed, Dec 13, 2017 at 11:54:56AM +0100, Greg KH wrote:
> On Wed, Dec 13, 2017 at 11:11:16AM +0100, Rabin Vincent wrote:
> > From: Rabin Vincent <rabinv@axis.com>
> > 
> > softirq time accounting is broken on v4.9.x if ksoftirqd runs.
> > 
> > With
> > 	CONFIG_IRQ_TIME_ACCOUNTING=y
> > 	# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
> > 
> > this test code:
> > 
> > 	struct tasklet_struct tasklet;
> > 
> > 	static void delay_tasklet(unsigned long data)
> > 	{
> > 		udelay(10);
> > 		tasklet_schedule(&tasklet);
> > 	}
> > 
> > 	tasklet_init(&tasklet, delay_tasklet, 0);
> > 	tasklet_schedule(&tasklet);
> > 
> > results in:
> > 
> > $ while :; do grep cpu0 /proc/stat; done
> > cpu0 5 0 80 25 16 107 1 0 0 0
> > cpu0 5 0 80 25 16 107 0 0 0 0
> > cpu0 5 0 80 25 16 107 0 0 0 0
> > cpu0 5 0 80 25 16 107 0 0 0 0
> > cpu0 5 0 81 25 16 107 0 0 0 0
> > cpu0 5 0 81 25 16 107 1 0 0 0
> > cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> > cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> > cpu0 5 0 81 25 16 108 18446744073709551615 0 0 0
> > cpu0 5 0 81 25 16 108 0 0 0 0
> > cpu0 6 0 81 25 16 108 0 0 0 0
> > cpu0 6 0 81 25 16 108 0 0 0 0
> > 
> > As can be seen, the softirq numbers are totally bogus.
> > 
> > When ksoftirq is running, irqtime_account_process_tick() increments
> > cpustat[CPUSTAT_SOFTIRQ].  This causes the "nsecs_to_cputime64(irqtime)
> > - cpustat[CPUSTAT_SOFTIRQ]" calculation in irqtime_account_update() to
> > underflow the next time a softirq is handled leading to the above
> > values.
> > 
> > The underflow bug was added by 57430218317e5b280 ("sched/cputime: Count
> > actually elapsed irq & softirq time").
> > 
> > But ksoftirqd accounting was wrong even in earlier kernels.  In earlier
> > kernels, after a ksoftirq run, the kernel would simply stop accounting
> > softirq time spent outside of ksoftirqd until that (accumulated) time
> > exceeded the time for which ksofirqd previously had run.
> > 
> > Fix both the underflow and the wrong accounting by using a counter
> > specifically for the non-ksoftirqd softirq case.
> > 
> > This code has been fixed in current mainline by a499a5a14db
> > ("sched/cputime: Increment kcpustat directly on irqtime account") [note
> > also the followup 25e2d8c1b9e327e ("sched/cputime: Fix ksoftirqd cputime
> > accounting regression")], but that patch is a part of the many changes
> > for eliminating of cputime_t so it does not seem suitable for backport.
> 
> I _really_ only want to take the exact upstream patches, as every time
> we do something like what you are proposing to do here, we get it wrong.
> Seriously, our track record is horrible.  Like 90% wrong.
> 
> Can you try just those two patches instead?

Dropping from my mbox due to a lack of response.  If you do test this
out, please resend.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-01-17 13:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-13 10:11 [PATCH for v4.9-stable] sched: fix softirq time accounting Rabin Vincent
2017-12-13 10:54 ` Greg KH
2018-01-17 13:54   ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.