From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Frederic Weisbecker <fweisbec@gmail.com>,
Rik van Riel <riel@redhat.com>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Stanislaw Gruszka <sgruszka@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Wanpeng Li <wanpeng.li@hotmail.com>,
Ingo Molnar <mingo@kernel.org>,
Ivan Delalande <colona@arista.com>
Subject: [PATCH 4.9 33/35] sched/cputime: Fix ksoftirqd cputime accounting regression
Date: Thu, 18 Oct 2018 19:55:02 +0200 [thread overview]
Message-ID: <20181018175427.156878860@linuxfoundation.org> (raw)
In-Reply-To: <20181018175422.506152522@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker <fweisbec@gmail.com>
commit 25e2d8c1b9e327ed260edd13169cc22bc7a78bc6 upstream.
irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.
But this behaviour got broken by:
a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account")
... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().
This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.
ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().
To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.
Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/sched/cputime.c | 27 ++++++++++++++++-----------
kernel/sched/sched.h | 9 +++++++--
2 files changed, 23 insertions(+), 13 deletions(-)
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -37,6 +37,18 @@ void disable_sched_clock_irqtime(void)
sched_clock_irqtime = 0;
}
+static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
+ enum cpu_usage_stat idx)
+{
+ u64 *cpustat = kcpustat_this_cpu->cpustat;
+
+ u64_stats_update_begin(&irqtime->sync);
+ cpustat[idx] += delta;
+ irqtime->total += delta;
+ irqtime->tick_delta += delta;
+ u64_stats_update_end(&irqtime->sync);
+}
+
/*
* Called before incrementing preempt_count on {soft,}irq_enter
* and before decrementing preempt_count on {soft,}irq_exit.
@@ -44,7 +56,6 @@ void disable_sched_clock_irqtime(void)
void irqtime_account_irq(struct task_struct *curr)
{
struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
- u64 *cpustat = kcpustat_this_cpu->cpustat;
s64 delta;
int cpu;
@@ -55,22 +66,16 @@ void irqtime_account_irq(struct task_str
delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
irqtime->irq_start_time += delta;
- u64_stats_update_begin(&irqtime->sync);
/*
* We do not account for softirq time from ksoftirqd here.
* We want to continue accounting softirq time to ksoftirqd thread
* in that case, so as not to confuse scheduler with a special task
* that do not consume any time, but still wants to run.
*/
- if (hardirq_count()) {
- cpustat[CPUTIME_IRQ] += delta;
- irqtime->tick_delta += delta;
- } else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
- cpustat[CPUTIME_SOFTIRQ] += delta;
- irqtime->tick_delta += delta;
- }
-
- u64_stats_update_end(&irqtime->sync);
+ if (hardirq_count())
+ irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
+ else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+ irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
}
EXPORT_SYMBOL_GPL(irqtime_account_irq);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1743,6 +1743,7 @@ static inline void nohz_balance_exit_idl
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
struct irqtime {
+ u64 total;
u64 tick_delta;
u64 irq_start_time;
struct u64_stats_sync sync;
@@ -1750,16 +1751,20 @@ struct irqtime {
DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
+/*
+ * Returns the irqtime minus the softirq time computed by ksoftirqd.
+ * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime
+ * and never move forward.
+ */
static inline u64 irq_time_read(int cpu)
{
struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
- u64 *cpustat = kcpustat_cpu(cpu).cpustat;
unsigned int seq;
u64 total;
do {
seq = __u64_stats_fetch_begin(&irqtime->sync);
- total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ];
+ total = irqtime->total;
} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
return total;
next prev parent reply other threads:[~2018-10-18 18:04 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-18 17:54 [PATCH 4.9 00/35] 4.9.135-stable review Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 01/35] media: af9035: prevent buffer overflow on write Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 02/35] batman-adv: Fix segfault when writing to throughput_override Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 03/35] batman-adv: Fix segfault when writing to sysfs elp_interval Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 04/35] batman-adv: Prevent duplicated nc_node entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 05/35] batman-adv: Prevent duplicated softif_vlan entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 06/35] batman-adv: Prevent duplicated global TT entry Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 07/35] batman-adv: Prevent duplicated tvlv handler Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 08/35] batman-adv: fix backbone_gw refcount on queue_work() failure Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 09/35] batman-adv: fix hardif_neigh " Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 10/35] clocksource/drivers/ti-32k: Add CLOCK_SOURCE_SUSPEND_NONSTOP flag for non-am43 SoCs Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 11/35] scsi: ibmvscsis: Fix a stringop-overflow warning Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 12/35] scsi: ibmvscsis: Ensure partition name is properly NUL terminated Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 13/35] Input: atakbd - fix Atari keymap Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 14/35] Input: atakbd - fix Atari CapsLock behaviour Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 15/35] ravb: do not write 1 to reserved bits Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 16/35] drm: mali-dp: Call drm_crtc_vblank_reset on device init Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 17/35] scsi: sd: dont crash the host on invalid commands Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 18/35] net/mlx4: Use cpumask_available for eq->affinity_mask Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 19/35] RISC-V: include linux/ftrace.h in asm-prototypes.h Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 20/35] powerpc/tm: Fix userspace r13 corruption Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 21/35] powerpc/tm: Avoid possible userspace r1 corruption on reclaim Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 22/35] iommu/amd: Return devid as alias for ACPI HID devices Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 23/35] mremap: properly flush TLB before releasing the page Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 24/35] mm: Preserve _PAGE_DEVMAP across mprotect() calls Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 25/35] netfilter: check for seqadj ext existence before adding it in nf_nat_setup_info Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 26/35] ARC: build: Get rid of toolchain check Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 27/35] ARC: build: Dont set CROSS_COMPILE in archs Makefile Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 28/35] HID: quirks: fix support for Apple Magic Keyboards Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 29/35] usb: gadget: serial: fix oops when data rxd after close Greg Kroah-Hartman
2018-10-18 17:54 ` [PATCH 4.9 30/35] sched/cputime: Convert kcpustat to nsecs Greg Kroah-Hartman
2018-10-18 17:55 ` [PATCH 4.9 31/35] macintosh/rack-meter: Convert cputime64_t use to u64 Greg Kroah-Hartman
2018-10-18 17:55 ` [PATCH 4.9 32/35] sched/cputime: Increment kcpustat directly on irqtime account Greg Kroah-Hartman
2018-10-18 17:55 ` Greg Kroah-Hartman [this message]
2018-10-18 17:55 ` [PATCH 4.9 34/35] ext4: avoid running out of journal credits when appending to an inline file Greg Kroah-Hartman
2018-10-18 17:55 ` [PATCH 4.9 35/35] HV: properly delay KVP packets when negotiation is in progress Greg Kroah-Hartman
2018-10-19 1:41 ` [PATCH 4.9 00/35] 4.9.135-stable review Nathan Chancellor
2018-10-19 13:20 ` Rafael David Tinoco
2018-10-19 15:48 ` Guenter Roeck
2018-10-19 20:46 ` Shuah Khan
2018-10-19 20:55 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181018175427.156878860@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=brouer@redhat.com \
--cc=colona@arista.com \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=sgruszka@redhat.com \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).