From: Andrey Ryabinin <arbn@yandex-team.com> To: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org> Cc: Andrey Ryabinin <arbn@yandex-team.com>, stable@vger.kernel.org, Daniel Jordan <daniel.m.jordan@oracle.com>, Tejun Heo <tj@kernel.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Daniel Bristot de Oliveira <bristot@redhat.com>, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Dan Schatzberg <schatzberg.dan@gmail.com> Subject: [PATCH v3 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Date: Mon, 15 Nov 2021 19:46:06 +0300 [thread overview] Message-ID: <20211115164607.23784-3-arbn@yandex-team.com> (raw) In-Reply-To: <20211115164607.23784-1-arbn@yandex-team.com> cpuacct has 2 different ways of accounting and showing user and system times. The first one uses cpuacct_account_field() to account times and cpuacct.stat file to expose them. And this one seems to work ok. The second one is uses cpuacct_charge() function for accounting and set of cpuacct.usage* files to show times. Despite some attempts to fix it in the past it still doesn't work. Sometimes while running KVM guest the cpuacct_charge() accounts most of the guest time as system time. This doesn't match with user&system times shown in cpuacct.stat or proc/<pid>/stat. Demonstration: # git clone https://github.com/aryabinin/kvmsample # make # mkdir /sys/fs/cgroup/cpuacct/test # echo $$ > /sys/fs/cgroup/cpuacct/test/tasks # ./kvmsample & # for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done 1976535645 2979839428 3979832704 4983603153 5983604157 Use cpustats accounted in cpuacct_account_field() as the source of user/sys times for cpuacct.usage* files. Make cpuacct_charge() to account only summary execution time. Fixes: d740037fac70 ("sched/cpuacct: Split usage accounting into user_usage and sys_usage") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Cc: <stable@vger.kernel.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> --- kernel/sched/cpuacct.c | 79 +++++++++++++++++------------------------- 1 file changed, 32 insertions(+), 47 deletions(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index f347cf9e4634..9de7dd51beb0 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -21,15 +21,11 @@ static const char * const cpuacct_stat_desc[] = { [CPUACCT_STAT_SYSTEM] = "system", }; -struct cpuacct_usage { - u64 usages[CPUACCT_STAT_NSTATS]; -}; - /* track CPU usage of a group of tasks and its child groups */ struct cpuacct { struct cgroup_subsys_state css; /* cpuusage holds pointer to a u64-type object on every CPU */ - struct cpuacct_usage __percpu *cpuusage; + u64 __percpu *cpuusage; struct kernel_cpustat __percpu *cpustat; }; @@ -49,7 +45,7 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca) return css_ca(ca->css.parent); } -static DEFINE_PER_CPU(struct cpuacct_usage, root_cpuacct_cpuusage); +static DEFINE_PER_CPU(u64, root_cpuacct_cpuusage); static struct cpuacct root_cpuacct = { .cpustat = &kernel_cpustat, .cpuusage = &root_cpuacct_cpuusage, @@ -68,7 +64,7 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css) if (!ca) goto out; - ca->cpuusage = alloc_percpu(struct cpuacct_usage); + ca->cpuusage = alloc_percpu(u64); if (!ca->cpuusage) goto out_free_ca; @@ -99,7 +95,8 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css) static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, enum cpuacct_stat_index index) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; u64 data; /* @@ -116,14 +113,17 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, raw_spin_rq_lock_irq(cpu_rq(cpu)); #endif - if (index == CPUACCT_STAT_NSTATS) { - int i = 0; - - data = 0; - for (i = 0; i < CPUACCT_STAT_NSTATS; i++) - data += cpuusage->usages[i]; - } else { - data = cpuusage->usages[index]; + switch (index) { + case CPUACCT_STAT_USER: + data = cpustat[CPUTIME_USER] + cpustat[CPUTIME_NICE]; + break; + case CPUACCT_STAT_SYSTEM: + data = cpustat[CPUTIME_SYSTEM] + cpustat[CPUTIME_IRQ] + + cpustat[CPUTIME_SOFTIRQ]; + break; + case CPUACCT_STAT_NSTATS: + data = *cpuusage; + break; } #ifndef CONFIG_64BIT @@ -133,10 +133,14 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, return data; } -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) +static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - int i; + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; + + /* Don't allow to reset global kernel_cpustat */ + if (ca == &root_cpuacct) + return; #ifndef CONFIG_64BIT /* @@ -144,9 +148,10 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) */ raw_spin_rq_lock_irq(cpu_rq(cpu)); #endif - - for (i = 0; i < CPUACCT_STAT_NSTATS; i++) - cpuusage->usages[i] = val; + *cpuusage = 0; + cpustat[CPUTIME_USER] = cpustat[CPUTIME_NICE] = 0; + cpustat[CPUTIME_SYSTEM] = cpustat[CPUTIME_IRQ] = 0; + cpustat[CPUTIME_SOFTIRQ] = 0; #ifndef CONFIG_64BIT raw_spin_rq_unlock_irq(cpu_rq(cpu)); @@ -197,7 +202,7 @@ static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft, return -EINVAL; for_each_possible_cpu(cpu) - cpuacct_cpuusage_write(ca, cpu, 0); + cpuacct_cpuusage_write(ca, cpu); return 0; } @@ -244,25 +249,10 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V) seq_puts(m, "\n"); for_each_possible_cpu(cpu) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - seq_printf(m, "%d", cpu); - - for (index = 0; index < CPUACCT_STAT_NSTATS; index++) { -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit read safe on 32-bit - * platforms. - */ - raw_spin_rq_lock_irq(cpu_rq(cpu)); -#endif - - seq_printf(m, " %llu", cpuusage->usages[index]); - -#ifndef CONFIG_64BIT - raw_spin_rq_unlock_irq(cpu_rq(cpu)); -#endif - } + for (index = 0; index < CPUACCT_STAT_NSTATS; index++) + seq_printf(m, " %llu", + cpuacct_cpuusage_read(ca, cpu, index)); seq_puts(m, "\n"); } return 0; @@ -340,16 +330,11 @@ static struct cftype files[] = { void cpuacct_charge(struct task_struct *tsk, u64 cputime) { struct cpuacct *ca; - int index = CPUACCT_STAT_SYSTEM; - struct pt_regs *regs = get_irq_regs() ? : task_pt_regs(tsk); - - if (regs && user_mode(regs)) - index = CPUACCT_STAT_USER; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(ca->cpuusage->usages[index], cputime); + __this_cpu_add(*ca->cpuusage, cputime); rcu_read_unlock(); } -- 2.32.0
WARNING: multiple messages have this Message-ID (diff)
From: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> To: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> Cc: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>, stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Daniel Jordan <daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>, Juri Lelli <juri.lelli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Vincent Guittot <vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>, Dietmar Eggemann <dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>, Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>, Ben Segall <bsegall-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>, Daniel Bristot de Oliveira <bristot-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Dan Schatzberg <schatzberg.dan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Subject: [PATCH v3 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Date: Mon, 15 Nov 2021 19:46:06 +0300 [thread overview] Message-ID: <20211115164607.23784-3-arbn@yandex-team.com> (raw) In-Reply-To: <20211115164607.23784-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> cpuacct has 2 different ways of accounting and showing user and system times. The first one uses cpuacct_account_field() to account times and cpuacct.stat file to expose them. And this one seems to work ok. The second one is uses cpuacct_charge() function for accounting and set of cpuacct.usage* files to show times. Despite some attempts to fix it in the past it still doesn't work. Sometimes while running KVM guest the cpuacct_charge() accounts most of the guest time as system time. This doesn't match with user&system times shown in cpuacct.stat or proc/<pid>/stat. Demonstration: # git clone https://github.com/aryabinin/kvmsample # make # mkdir /sys/fs/cgroup/cpuacct/test # echo $$ > /sys/fs/cgroup/cpuacct/test/tasks # ./kvmsample & # for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done 1976535645 2979839428 3979832704 4983603153 5983604157 Use cpustats accounted in cpuacct_account_field() as the source of user/sys times for cpuacct.usage* files. Make cpuacct_charge() to account only summary execution time. Fixes: d740037fac70 ("sched/cpuacct: Split usage accounting into user_usage and sys_usage") Signed-off-by: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> Reviewed-by: Daniel Jordan <daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> --- kernel/sched/cpuacct.c | 79 +++++++++++++++++------------------------- 1 file changed, 32 insertions(+), 47 deletions(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index f347cf9e4634..9de7dd51beb0 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -21,15 +21,11 @@ static const char * const cpuacct_stat_desc[] = { [CPUACCT_STAT_SYSTEM] = "system", }; -struct cpuacct_usage { - u64 usages[CPUACCT_STAT_NSTATS]; -}; - /* track CPU usage of a group of tasks and its child groups */ struct cpuacct { struct cgroup_subsys_state css; /* cpuusage holds pointer to a u64-type object on every CPU */ - struct cpuacct_usage __percpu *cpuusage; + u64 __percpu *cpuusage; struct kernel_cpustat __percpu *cpustat; }; @@ -49,7 +45,7 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca) return css_ca(ca->css.parent); } -static DEFINE_PER_CPU(struct cpuacct_usage, root_cpuacct_cpuusage); +static DEFINE_PER_CPU(u64, root_cpuacct_cpuusage); static struct cpuacct root_cpuacct = { .cpustat = &kernel_cpustat, .cpuusage = &root_cpuacct_cpuusage, @@ -68,7 +64,7 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css) if (!ca) goto out; - ca->cpuusage = alloc_percpu(struct cpuacct_usage); + ca->cpuusage = alloc_percpu(u64); if (!ca->cpuusage) goto out_free_ca; @@ -99,7 +95,8 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css) static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, enum cpuacct_stat_index index) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; u64 data; /* @@ -116,14 +113,17 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, raw_spin_rq_lock_irq(cpu_rq(cpu)); #endif - if (index == CPUACCT_STAT_NSTATS) { - int i = 0; - - data = 0; - for (i = 0; i < CPUACCT_STAT_NSTATS; i++) - data += cpuusage->usages[i]; - } else { - data = cpuusage->usages[index]; + switch (index) { + case CPUACCT_STAT_USER: + data = cpustat[CPUTIME_USER] + cpustat[CPUTIME_NICE]; + break; + case CPUACCT_STAT_SYSTEM: + data = cpustat[CPUTIME_SYSTEM] + cpustat[CPUTIME_IRQ] + + cpustat[CPUTIME_SOFTIRQ]; + break; + case CPUACCT_STAT_NSTATS: + data = *cpuusage; + break; } #ifndef CONFIG_64BIT @@ -133,10 +133,14 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, return data; } -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) +static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - int i; + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; + + /* Don't allow to reset global kernel_cpustat */ + if (ca == &root_cpuacct) + return; #ifndef CONFIG_64BIT /* @@ -144,9 +148,10 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) */ raw_spin_rq_lock_irq(cpu_rq(cpu)); #endif - - for (i = 0; i < CPUACCT_STAT_NSTATS; i++) - cpuusage->usages[i] = val; + *cpuusage = 0; + cpustat[CPUTIME_USER] = cpustat[CPUTIME_NICE] = 0; + cpustat[CPUTIME_SYSTEM] = cpustat[CPUTIME_IRQ] = 0; + cpustat[CPUTIME_SOFTIRQ] = 0; #ifndef CONFIG_64BIT raw_spin_rq_unlock_irq(cpu_rq(cpu)); @@ -197,7 +202,7 @@ static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft, return -EINVAL; for_each_possible_cpu(cpu) - cpuacct_cpuusage_write(ca, cpu, 0); + cpuacct_cpuusage_write(ca, cpu); return 0; } @@ -244,25 +249,10 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V) seq_puts(m, "\n"); for_each_possible_cpu(cpu) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - seq_printf(m, "%d", cpu); - - for (index = 0; index < CPUACCT_STAT_NSTATS; index++) { -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit read safe on 32-bit - * platforms. - */ - raw_spin_rq_lock_irq(cpu_rq(cpu)); -#endif - - seq_printf(m, " %llu", cpuusage->usages[index]); - -#ifndef CONFIG_64BIT - raw_spin_rq_unlock_irq(cpu_rq(cpu)); -#endif - } + for (index = 0; index < CPUACCT_STAT_NSTATS; index++) + seq_printf(m, " %llu", + cpuacct_cpuusage_read(ca, cpu, index)); seq_puts(m, "\n"); } return 0; @@ -340,16 +330,11 @@ static struct cftype files[] = { void cpuacct_charge(struct task_struct *tsk, u64 cputime) { struct cpuacct *ca; - int index = CPUACCT_STAT_SYSTEM; - struct pt_regs *regs = get_irq_regs() ? : task_pt_regs(tsk); - - if (regs && user_mode(regs)) - index = CPUACCT_STAT_USER; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(ca->cpuusage->usages[index], cputime); + __this_cpu_add(*ca->cpuusage, cputime); rcu_read_unlock(); } -- 2.32.0
next prev parent reply other threads:[~2021-11-15 16:47 UTC|newest] Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-11-15 16:46 [PATCH v3 1/4] cputime, cpuacct: Include guest time in user time in cpuacct.stat Andrey Ryabinin 2021-11-15 16:46 ` [PATCH v3 2/4] cpuacct: convert BUG_ON() to WARN_ON_ONCE() Andrey Ryabinin 2021-11-15 16:46 ` Andrey Ryabinin 2021-11-23 11:01 ` [tip: sched/core] cpuacct: Convert " tip-bot2 for Andrey Ryabinin 2021-11-15 16:46 ` Andrey Ryabinin [this message] 2021-11-15 16:46 ` [PATCH v3 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin 2021-11-23 11:01 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Andrey Ryabinin 2021-11-15 16:46 ` [PATCH v3 4/4] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin 2021-11-15 16:46 ` Andrey Ryabinin 2021-11-23 11:01 ` [tip: sched/core] " tip-bot2 for Andrey Ryabinin 2021-11-15 21:06 ` [PATCH v3 1/4] cputime, cpuacct: Include guest time in user time in cpuacct.stat Tejun Heo 2021-11-18 14:29 ` Peter Zijlstra 2021-11-23 11:01 ` [tip: sched/core] " tip-bot2 for Andrey Ryabinin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20211115164607.23784-3-arbn@yandex-team.com \ --to=arbn@yandex-team.com \ --cc=bristot@redhat.com \ --cc=bsegall@google.com \ --cc=cgroups@vger.kernel.org \ --cc=daniel.m.jordan@oracle.com \ --cc=dietmar.eggemann@arm.com \ --cc=juri.lelli@redhat.com \ --cc=linux-kernel@vger.kernel.org \ --cc=mgorman@suse.de \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=rostedt@goodmis.org \ --cc=schatzberg.dan@gmail.com \ --cc=stable@vger.kernel.org \ --cc=tj@kernel.org \ --cc=vincent.guittot@linaro.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.