[PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3
@ 2019-11-21  2:44 Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

(For the record, see v2 at
https://lore.kernel.org/lkml/20191119232218.4206-1-frederic@kernel.org/ )

This set addresses reviews from Ingo on previous take:

* Fix comment nit
* Remind in comment about the reschedule IPI that may happen on renice
* Constify input
* Indent a few blocks of assignments
* Use struct kernel_cpustat as an output for kcpustat accessor. It makes
  things easier and cleaner than the independant variables.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	nohz/kcpustat-v5

HEAD: 8941f0df7da73e1d95f0c0bed0121a14cdb52873

Thanks,
	Frederic
---

Frederic Weisbecker (6):
      sched/cputime: Support other fields on kcpustat_field()
      sched/vtime: Bring up complete kcpustat accessor
      procfs: Use all-in-one vtime aware kcpustat accessor
      cpufreq: Use vtime aware kcpustat accessors for user time
      leds: Use all-in-one vtime aware kcpustat accessor
      rackmeter: Use vtime aware kcpustat accessor


 drivers/cpufreq/cpufreq.c               |  17 +--
 drivers/cpufreq/cpufreq_governor.c      |   6 +-
 drivers/leds/trigger/ledtrig-activity.c |  14 ++-
 drivers/macintosh/rack-meter.c          |   7 +-
 fs/proc/stat.c                          |  56 +++++-----
 include/linux/kernel_stat.h             |   7 ++
 kernel/sched/cputime.c                  | 190 ++++++++++++++++++++++++++------
 7 files changed, 223 insertions(+), 74 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field()
  2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
@ 2019-11-21  2:44 ` Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor Frederic Weisbecker
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Provide support for user, nice, guest and guest_nice fields through
kcpustat_field().

Whether we account the delta to a nice or not nice field is decided on
top of the nice value snapshot taken at the time we call kcpustat_field().
If the nice value of the task has been changed since the last vtime
update, we may have inacurrate distribution of the nice VS unnice
cputime.

However this is considered as a minor issue compared to the proper fix
that would involve interrupting the target on nice updates, which is
undesired on nohz_full CPUs.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cputime.c | 54 +++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e0cd20693ef5..27b5406222fc 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -912,11 +912,21 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 }
 
+static u64 kcpustat_user_vtime(struct vtime *vtime)
+{
+	if (vtime->state == VTIME_USER)
+		return vtime->utime + vtime_delta(vtime);
+	else if (vtime->state == VTIME_GUEST)
+		return vtime->gtime + vtime_delta(vtime);
+	return 0;
+}
+
 static int kcpustat_field_vtime(u64 *cpustat,
-				struct vtime *vtime,
+				struct task_struct *tsk,
 				enum cpu_usage_stat usage,
 				int cpu, u64 *val)
 {
+	struct vtime *vtime = &tsk->vtime;
 	unsigned int seq;
 	int err;
 
@@ -946,9 +956,37 @@ static int kcpustat_field_vtime(u64 *cpustat,
 
 		*val = cpustat[usage];
 
-		if (vtime->state == VTIME_SYS)
-			*val += vtime->stime + vtime_delta(vtime);
-
+		/*
+		 * Nice VS unnice cputime accounting may be inaccurate if
+		 * the nice value has changed since the last vtime update.
+		 * But proper fix would involve interrupting target on nice
+		 * updates which is a no go on nohz_full (although the scheduler
+		 * may still interrupt the target if rescheduling is needed...)
+		 */
+		switch (usage) {
+		case CPUTIME_SYSTEM:
+			if (vtime->state == VTIME_SYS)
+				*val += vtime->stime + vtime_delta(vtime);
+			break;
+		case CPUTIME_USER:
+			if (task_nice(tsk) <= 0)
+				*val += kcpustat_user_vtime(vtime);
+			break;
+		case CPUTIME_NICE:
+			if (task_nice(tsk) > 0)
+				*val += kcpustat_user_vtime(vtime);
+			break;
+		case CPUTIME_GUEST:
+			if (vtime->state == VTIME_GUEST && task_nice(tsk) <= 0)
+				*val += vtime->gtime + vtime_delta(vtime);
+			break;
+		case CPUTIME_GUEST_NICE:
+			if (vtime->state == VTIME_GUEST && task_nice(tsk) > 0)
+				*val += vtime->gtime + vtime_delta(vtime);
+			break;
+		default:
+			break;
+		}
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 
 	return 0;
@@ -965,15 +1003,10 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 	if (!vtime_accounting_enabled_cpu(cpu))
 		return cpustat[usage];
 
-	/* Only support sys vtime for now */
-	if (usage != CPUTIME_SYSTEM)
-		return cpustat[usage];
-
 	rq = cpu_rq(cpu);
 
 	for (;;) {
 		struct task_struct *curr;
-		struct vtime *vtime;
 
 		rcu_read_lock();
 		curr = rcu_dereference(rq->curr);
@@ -982,8 +1015,7 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 			return cpustat[usage];
 		}
 
-		vtime = &curr->vtime;
-		err = kcpustat_field_vtime(cpustat, vtime, usage, cpu, &val);
+		err = kcpustat_field_vtime(cpustat, curr, usage, cpu, &val);
 		rcu_read_unlock();
 
 		if (!err)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor
  2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
@ 2019-11-21  2:44 ` Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  2019-12-28 20:56   ` [PATCH 2/6] " Chris Wilson
  2019-11-21  2:44 ` [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Many callsites want to fetch the values of system, user, user_nice, guest
or guest_nice kcpustat fields altogether or at least a pair of these.

In that case calling kcpustat_field() for each requested field brings
unecessary overhead when we could fetch all of them in a row.

So provide kcpustat_cpu_fetch() that fetches the whole kcpustat array
in a vtime safe way under the same RCU and seqcount block.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/kernel_stat.h |   7 ++
 kernel/sched/cputime.c      | 136 ++++++++++++++++++++++++++++++------
 2 files changed, 123 insertions(+), 20 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 79781196eb25..89f0745c096d 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -81,12 +81,19 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 			  enum cpu_usage_stat usage, int cpu);
+extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu);
 #else
 static inline u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 				 enum cpu_usage_stat usage, int cpu)
 {
 	return kcpustat->cpustat[usage];
 }
+
+static inline void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
+{
+	*dst = kcpustat_cpu(cpu);
+}
+
 #endif
 
 extern void account_user_time(struct task_struct *, u64);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 27b5406222fc..d43318a489f2 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -912,6 +912,30 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 }
 
+static int vtime_state_check(struct vtime *vtime, int cpu)
+{
+	/*
+	 * We raced against a context switch, fetch the
+	 * kcpustat task again.
+	 */
+	if (vtime->cpu != cpu && vtime->cpu != -1)
+		return -EAGAIN;
+
+	/*
+	 * Two possible things here:
+	 * 1) We are seeing the scheduling out task (prev) or any past one.
+	 * 2) We are seeing the scheduling in task (next) but it hasn't
+	 *    passed though vtime_task_switch() yet so the pending
+	 *    cputime of the prev task may not be flushed yet.
+	 *
+	 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
+	 */
+	if (vtime->state == VTIME_INACTIVE)
+		return -EAGAIN;
+
+	return 0;
+}
+
 static u64 kcpustat_user_vtime(struct vtime *vtime)
 {
 	if (vtime->state == VTIME_USER)
@@ -933,26 +957,9 @@ static int kcpustat_field_vtime(u64 *cpustat,
 	do {
 		seq = read_seqcount_begin(&vtime->seqcount);
 
-		/*
-		 * We raced against context switch, fetch the
-		 * kcpustat task again.
-		 */
-		if (vtime->cpu != cpu && vtime->cpu != -1)
-			return -EAGAIN;
-
-		/*
-		 * Two possible things here:
-		 * 1) We are seeing the scheduling out task (prev) or any past one.
-		 * 2) We are seeing the scheduling in task (next) but it hasn't
-		 *    passed though vtime_task_switch() yet so the pending
-		 *    cputime of the prev task may not be flushed yet.
-		 *
-		 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
-		 */
-		if (vtime->state == VTIME_INACTIVE)
-			return -EAGAIN;
-
-		err = 0;
+		err = vtime_state_check(vtime, cpu);
+		if (err < 0)
+			return err;
 
 		*val = cpustat[usage];
 
@@ -1025,4 +1032,93 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 	}
 }
 EXPORT_SYMBOL_GPL(kcpustat_field);
+
+static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
+				    const struct kernel_cpustat *src,
+				    struct task_struct *tsk, int cpu)
+{
+	struct vtime *vtime = &tsk->vtime;
+	unsigned int seq;
+	int err;
+
+	do {
+		u64 *cpustat;
+		u64 delta;
+
+		seq = read_seqcount_begin(&vtime->seqcount);
+
+		err = vtime_state_check(vtime, cpu);
+		if (err < 0)
+			return err;
+
+		*dst = *src;
+		cpustat = dst->cpustat;
+
+		/* Task is sleeping, dead or idle, nothing to add */
+		if (vtime->state < VTIME_SYS)
+			continue;
+
+		delta = vtime_delta(vtime);
+
+		/*
+		 * Task runs either in user (including guest) or kernel space,
+		 * add pending nohz time to the right place.
+		 */
+		if (vtime->state == VTIME_SYS) {
+			cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
+		} else if (vtime->state == VTIME_USER) {
+			if (task_nice(tsk) > 0)
+				cpustat[CPUTIME_NICE] += vtime->utime + delta;
+			else
+				cpustat[CPUTIME_USER] += vtime->utime + delta;
+		} else {
+			WARN_ON_ONCE(vtime->state != VTIME_GUEST);
+			if (task_nice(tsk) > 0) {
+				cpustat[CPUTIME_GUEST_NICE] += vtime->gtime + delta;
+				cpustat[CPUTIME_NICE] += vtime->gtime + delta;
+			} else {
+				cpustat[CPUTIME_GUEST] += vtime->gtime + delta;
+				cpustat[CPUTIME_USER] += vtime->gtime + delta;
+			}
+		}
+	} while (read_seqcount_retry(&vtime->seqcount, seq));
+
+	return err;
+}
+
+void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
+{
+	const struct kernel_cpustat *src = &kcpustat_cpu(cpu);
+	struct rq *rq;
+	int err;
+
+	if (!vtime_accounting_enabled_cpu(cpu)) {
+		*dst = *src;
+		return;
+	}
+
+	rq = cpu_rq(cpu);
+
+	for (;;) {
+		struct task_struct *curr;
+
+		rcu_read_lock();
+		curr = rcu_dereference(rq->curr);
+		if (WARN_ON_ONCE(!curr)) {
+			rcu_read_unlock();
+			*dst = *src;
+			return;
+		}
+
+		err = kcpustat_cpu_fetch_vtime(dst, src, curr, cpu);
+		rcu_read_unlock();
+
+		if (!err)
+			return;
+
+		cpu_relax();
+	}
+}
+EXPORT_SYMBOL_GPL(kcpustat_cpu_fetch);
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor Frederic Weisbecker
@ 2019-11-21  2:44 ` Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  2019-12-08 15:57   ` [PATCH 3/6] " Paul Orlyk
  2019-11-21  2:44 ` [PATCH 4/6] cpufreq: Use vtime aware kcpustat accessors for user time Frederic Weisbecker
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Now that we can read also user and guest time safely under vtime, use
the relevant accessor to fix frozen kcpustat values on nohz_full CPUs.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 fs/proc/stat.c | 54 ++++++++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 5c6bd0ae3802..37bdbec5b402 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -120,20 +120,23 @@ static int show_stat(struct seq_file *p, void *v)
 	getboottime64(&boottime);
 
 	for_each_possible_cpu(i) {
-		struct kernel_cpustat *kcs = &kcpustat_cpu(i);
+		struct kernel_cpustat kcpustat;
+		u64 *cpustat = kcpustat.cpustat;
 
-		user += kcs->cpustat[CPUTIME_USER];
-		nice += kcs->cpustat[CPUTIME_NICE];
-		system += kcpustat_field(kcs, CPUTIME_SYSTEM, i);
-		idle += get_idle_time(kcs, i);
-		iowait += get_iowait_time(kcs, i);
-		irq += kcs->cpustat[CPUTIME_IRQ];
-		softirq += kcs->cpustat[CPUTIME_SOFTIRQ];
-		steal += kcs->cpustat[CPUTIME_STEAL];
-		guest += kcs->cpustat[CPUTIME_GUEST];
-		guest_nice += kcs->cpustat[CPUTIME_GUEST_NICE];
-		sum += kstat_cpu_irqs_sum(i);
-		sum += arch_irq_stat_cpu(i);
+		kcpustat_cpu_fetch(&kcpustat, i);
+
+		user		+= cpustat[CPUTIME_USER];
+		nice		+= cpustat[CPUTIME_NICE];
+		system		+= cpustat[CPUTIME_SYSTEM];
+		idle		+= get_idle_time(&kcpustat, i);
+		iowait		+= get_iowait_time(&kcpustat, i);
+		irq		+= cpustat[CPUTIME_IRQ];
+		softirq		+= cpustat[CPUTIME_SOFTIRQ];
+		steal		+= cpustat[CPUTIME_STEAL];
+		guest		+= cpustat[CPUTIME_GUEST];
+		guest_nice	+= cpustat[CPUTIME_USER];
+		sum		+= kstat_cpu_irqs_sum(i);
+		sum		+= arch_irq_stat_cpu(i);
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
 			unsigned int softirq_stat = kstat_softirqs_cpu(j, i);
@@ -157,19 +160,22 @@ static int show_stat(struct seq_file *p, void *v)
 	seq_putc(p, '\n');
 
 	for_each_online_cpu(i) {
-		struct kernel_cpustat *kcs = &kcpustat_cpu(i);
+		struct kernel_cpustat kcpustat;
+		u64 *cpustat = kcpustat.cpustat;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
 
 		/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
-		user = kcs->cpustat[CPUTIME_USER];
-		nice = kcs->cpustat[CPUTIME_NICE];
-		system = kcpustat_field(kcs, CPUTIME_SYSTEM, i);
-		idle = get_idle_time(kcs, i);
-		iowait = get_iowait_time(kcs, i);
-		irq = kcs->cpustat[CPUTIME_IRQ];
-		softirq = kcs->cpustat[CPUTIME_SOFTIRQ];
-		steal = kcs->cpustat[CPUTIME_STEAL];
-		guest = kcs->cpustat[CPUTIME_GUEST];
-		guest_nice = kcs->cpustat[CPUTIME_GUEST_NICE];
+		user		= cpustat[CPUTIME_USER];
+		nice		= cpustat[CPUTIME_NICE];
+		system		= cpustat[CPUTIME_SYSTEM];
+		idle		= get_idle_time(&kcpustat, i);
+		iowait		= get_iowait_time(&kcpustat, i);
+		irq		= cpustat[CPUTIME_IRQ];
+		softirq		= cpustat[CPUTIME_SOFTIRQ];
+		steal		= cpustat[CPUTIME_STEAL];
+		guest		= cpustat[CPUTIME_GUEST];
+		guest_nice	= cpustat[CPUTIME_USER];
 		seq_printf(p, "cpu%d", i);
 		seq_put_decimal_ull(p, " ", nsec_to_clock_t(user));
 		seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/6] cpufreq: Use vtime aware kcpustat accessors for user time
  2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2019-11-21  2:44 ` [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
@ 2019-11-21  2:44 ` Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 6/6] rackmeter: Use " Frederic Weisbecker
  5 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

We can now safely read user and guest kcpustat fields on nohz_full CPUs.
Use the appropriate accessors.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
---
 drivers/cpufreq/cpufreq.c          | 17 ++++++++++-------
 drivers/cpufreq/cpufreq_governor.c |  6 +++---
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 527fd068dc12..ee23eaf20f35 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -113,18 +113,21 @@ EXPORT_SYMBOL_GPL(get_governor_parent_kobj);
 
 static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 {
-	u64 idle_time;
+	struct kernel_cpustat kcpustat;
 	u64 cur_wall_time;
+	u64 idle_time;
 	u64 busy_time;
 
 	cur_wall_time = jiffies64_to_nsecs(get_jiffies_64());
 
-	busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
-	busy_time += kcpustat_field(&kcpustat_cpu(cpu), CPUTIME_SYSTEM, cpu);
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ];
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ];
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL];
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
+	kcpustat_cpu_fetch(&kcpustat, cpu);
+
+	busy_time = kcpustat.cpustat[CPUTIME_USER];
+	busy_time += kcpustat.cpustat[CPUTIME_SYSTEM];
+	busy_time += kcpustat.cpustat[CPUTIME_IRQ];
+	busy_time += kcpustat.cpustat[CPUTIME_SOFTIRQ];
+	busy_time += kcpustat.cpustat[CPUTIME_STEAL];
+	busy_time += kcpustat.cpustat[CPUTIME_NICE];
 
 	idle_time = cur_wall_time - busy_time;
 	if (wall)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4bb054d0cb43..f99ae45efaea 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -105,7 +105,7 @@ void gov_update_cpu_data(struct dbs_data *dbs_data)
 			j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_update_time,
 								  dbs_data->io_is_busy);
 			if (dbs_data->ignore_nice_load)
-				j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+				j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j);
 		}
 	}
 }
@@ -149,7 +149,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 		j_cdbs->prev_cpu_idle = cur_idle_time;
 
 		if (ignore_nice) {
-			u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+			u64 cur_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j);
 
 			idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC);
 			j_cdbs->prev_cpu_nice = cur_nice;
@@ -530,7 +530,7 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *policy)
 		j_cdbs->prev_load = 0;
 
 		if (ignore_nice)
-			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+			j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j);
 	}
 
 	gov->start(policy);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2019-11-21  2:44 ` [PATCH 4/6] cpufreq: Use vtime aware kcpustat accessors for user time Frederic Weisbecker
@ 2019-11-21  2:44 ` Frederic Weisbecker
  2019-11-21  6:58   ` Ingo Molnar
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  2019-11-21  2:44 ` [PATCH 6/6] rackmeter: Use " Frederic Weisbecker
  5 siblings, 2 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

We can now safely read user kcpustat fields on nohz_full CPUs.
Use the appropriate accessor.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/leds/trigger/ledtrig-activity.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/leds/trigger/ledtrig-activity.c b/drivers/leds/trigger/ledtrig-activity.c
index ddfc5edd07c8..6901e3631c22 100644
--- a/drivers/leds/trigger/ledtrig-activity.c
+++ b/drivers/leds/trigger/ledtrig-activity.c
@@ -57,11 +57,15 @@ static void led_activity_function(struct timer_list *t)
 	curr_used = 0;
 
 	for_each_possible_cpu(i) {
-		curr_used += kcpustat_cpu(i).cpustat[CPUTIME_USER]
-			  +  kcpustat_cpu(i).cpustat[CPUTIME_NICE]
-			  +  kcpustat_field(&kcpustat_cpu(i), CPUTIME_SYSTEM, i)
-			  +  kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]
-			  +  kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
+		struct kernel_cpustat kcpustat;
+
+		kcpustat_fetch_cpu(&kcpustat, i);
+
+		curr_used += kcpustat.cpustat[CPUTIME_USER]
+			  +  kcpustat.cpustat[CPUTIME_NICE]
+			  +  kcpustat.cpustat[CPUTIME_SYSTEM]
+			  +  kcpustat.cpustat[CPUTIME_SOFTIRQ]
+			  +  kcpustat.cpustat[CPUTIME_IRQ];
 		cpus++;
 	}
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/6] rackmeter: Use vtime aware kcpustat accessor
  2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2019-11-21  2:44 ` [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
@ 2019-11-21  2:44 ` Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  5 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21  2:44 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Now that we have a vtime safe kcpustat accessor, use it to fetch
CPUTIME_NICE and fix frozen kcpustat values on nohz_full CPUs.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/macintosh/rack-meter.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 4134e580f786..60311e8d6240 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -81,13 +81,14 @@ static int rackmeter_ignore_nice;
  */
 static inline u64 get_cpu_idle_time(unsigned int cpu)
 {
+	struct kernel_cpustat *kcpustat = &kcpustat_cpu(cpu);
 	u64 retval;
 
-	retval = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE] +
-		 kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
+	retval = kcpustat->cpustat[CPUTIME_IDLE] +
+		 kcpustat->cpustat[CPUTIME_IOWAIT];
 
 	if (rackmeter_ignore_nice)
-		retval += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
+		retval += kcpustat_field(kcpustat, CPUTIME_NICE, cpu);
 
 	return retval;
 }
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  2:44 ` [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
@ 2019-11-21  6:58   ` Ingo Molnar
  2019-11-21 14:14     ` Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2019-11-21  6:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek


* Frederic Weisbecker <frederic@kernel.org> wrote:

> We can now safely read user kcpustat fields on nohz_full CPUs.
> Use the appropriate accessor.
> 
> Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  drivers/leds/trigger/ledtrig-activity.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/leds/trigger/ledtrig-activity.c b/drivers/leds/trigger/ledtrig-activity.c
> index ddfc5edd07c8..6901e3631c22 100644
> --- a/drivers/leds/trigger/ledtrig-activity.c
> +++ b/drivers/leds/trigger/ledtrig-activity.c
> @@ -57,11 +57,15 @@ static void led_activity_function(struct timer_list *t)
>  	curr_used = 0;
>  
>  	for_each_possible_cpu(i) {
> -		curr_used += kcpustat_cpu(i).cpustat[CPUTIME_USER]
> -			  +  kcpustat_cpu(i).cpustat[CPUTIME_NICE]
> -			  +  kcpustat_field(&kcpustat_cpu(i), CPUTIME_SYSTEM, i)
> -			  +  kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]
> -			  +  kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
> +		struct kernel_cpustat kcpustat;
> +
> +		kcpustat_fetch_cpu(&kcpustat, i);
> +
> +		curr_used += kcpustat.cpustat[CPUTIME_USER]
> +			  +  kcpustat.cpustat[CPUTIME_NICE]
> +			  +  kcpustat.cpustat[CPUTIME_SYSTEM]
> +			  +  kcpustat.cpustat[CPUTIME_SOFTIRQ]
> +			  +  kcpustat.cpustat[CPUTIME_IRQ];

Not the best tested series:

--- a/drivers/leds/trigger/ledtrig-activity.c
+++ b/drivers/leds/trigger/ledtrig-activity.c
@@ -59,7 +59,7 @@ static void led_activity_function(struct timer_list *t)
 	for_each_possible_cpu(i) {
 		struct kernel_cpustat kcpustat;
 
-		kcpustat_fetch_cpu(&kcpustat, i);
+		kcpustat_cpu_fetch(&kcpustat, i);
 
 		curr_used += kcpustat.cpustat[CPUTIME_USER]
 			  +  kcpustat.cpustat[CPUTIME_NICE]


:-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  6:58   ` Ingo Molnar
@ 2019-11-21 14:14     ` Frederic Weisbecker
  2019-11-21 16:49       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-21 14:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

On Thu, Nov 21, 2019 at 07:58:26AM +0100, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <frederic@kernel.org> wrote:
> 
> > We can now safely read user kcpustat fields on nohz_full CPUs.
> > Use the appropriate accessor.
> > 
> > Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
> > Cc: Pavel Machek <pavel@ucw.cz>
> > Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Wanpeng Li <wanpengli@tencent.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > ---
> >  drivers/leds/trigger/ledtrig-activity.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/leds/trigger/ledtrig-activity.c b/drivers/leds/trigger/ledtrig-activity.c
> > index ddfc5edd07c8..6901e3631c22 100644
> > --- a/drivers/leds/trigger/ledtrig-activity.c
> > +++ b/drivers/leds/trigger/ledtrig-activity.c
> > @@ -57,11 +57,15 @@ static void led_activity_function(struct timer_list *t)
> >  	curr_used = 0;
> >  
> >  	for_each_possible_cpu(i) {
> > -		curr_used += kcpustat_cpu(i).cpustat[CPUTIME_USER]
> > -			  +  kcpustat_cpu(i).cpustat[CPUTIME_NICE]
> > -			  +  kcpustat_field(&kcpustat_cpu(i), CPUTIME_SYSTEM, i)
> > -			  +  kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]
> > -			  +  kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
> > +		struct kernel_cpustat kcpustat;
> > +
> > +		kcpustat_fetch_cpu(&kcpustat, i);
> > +
> > +		curr_used += kcpustat.cpustat[CPUTIME_USER]
> > +			  +  kcpustat.cpustat[CPUTIME_NICE]
> > +			  +  kcpustat.cpustat[CPUTIME_SYSTEM]
> > +			  +  kcpustat.cpustat[CPUTIME_SOFTIRQ]
> > +			  +  kcpustat.cpustat[CPUTIME_IRQ];
> 
> Not the best tested series:
> 
> --- a/drivers/leds/trigger/ledtrig-activity.c
> +++ b/drivers/leds/trigger/ledtrig-activity.c
> @@ -59,7 +59,7 @@ static void led_activity_function(struct timer_list *t)
>  	for_each_possible_cpu(i) {
>  		struct kernel_cpustat kcpustat;
>  
> -		kcpustat_fetch_cpu(&kcpustat, i);
> +		kcpustat_cpu_fetch(&kcpustat, i);
>  
>  		curr_used += kcpustat.cpustat[CPUTIME_USER]
>  			  +  kcpustat.cpustat[CPUTIME_NICE]
> 
> 
> :-)

Oops, I tested with vtime on and off but that one slipped under my config.
Do you want me to resend?

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip: sched/core] cpufreq: Use vtime aware kcpustat accessors for user time
  2019-11-21  2:44 ` [PATCH 4/6] cpufreq: Use vtime aware kcpustat accessors for user time Frederic Weisbecker
@ 2019-11-21 16:48   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2019-11-21 16:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yauheni Kaliuta, Frederic Weisbecker, Rafael J. Wysocki,
	Viresh Kumar, Peter Zijlstra, Wanpeng Li, Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5720821ba1d845f0b91a9278137e9005c5f6931d
Gitweb:        https://git.kernel.org/tip/5720821ba1d845f0b91a9278137e9005c5f6931d
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Thu, 21 Nov 2019 03:44:28 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 21 Nov 2019 07:33:25 +01:00

cpufreq: Use vtime aware kcpustat accessors for user time

We can now safely read user and guest kcpustat fields on nohz_full CPUs.
Use the appropriate accessors.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-5-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 drivers/cpufreq/cpufreq.c          | 17 ++++++++++-------
 drivers/cpufreq/cpufreq_governor.c |  6 +++---
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 527fd06..ee23eaf 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -113,18 +113,21 @@ EXPORT_SYMBOL_GPL(get_governor_parent_kobj);
 
 static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 {
-	u64 idle_time;
+	struct kernel_cpustat kcpustat;
 	u64 cur_wall_time;
+	u64 idle_time;
 	u64 busy_time;
 
 	cur_wall_time = jiffies64_to_nsecs(get_jiffies_64());
 
-	busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
-	busy_time += kcpustat_field(&kcpustat_cpu(cpu), CPUTIME_SYSTEM, cpu);
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ];
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ];
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL];
-	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
+	kcpustat_cpu_fetch(&kcpustat, cpu);
+
+	busy_time = kcpustat.cpustat[CPUTIME_USER];
+	busy_time += kcpustat.cpustat[CPUTIME_SYSTEM];
+	busy_time += kcpustat.cpustat[CPUTIME_IRQ];
+	busy_time += kcpustat.cpustat[CPUTIME_SOFTIRQ];
+	busy_time += kcpustat.cpustat[CPUTIME_STEAL];
+	busy_time += kcpustat.cpustat[CPUTIME_NICE];
 
 	idle_time = cur_wall_time - busy_time;
 	if (wall)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4bb054d..f99ae45 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -105,7 +105,7 @@ void gov_update_cpu_data(struct dbs_data *dbs_data)
 			j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_update_time,
 								  dbs_data->io_is_busy);
 			if (dbs_data->ignore_nice_load)
-				j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+				j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j);
 		}
 	}
 }
@@ -149,7 +149,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 		j_cdbs->prev_cpu_idle = cur_idle_time;
 
 		if (ignore_nice) {
-			u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+			u64 cur_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j);
 
 			idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC);
 			j_cdbs->prev_cpu_nice = cur_nice;
@@ -530,7 +530,7 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *policy)
 		j_cdbs->prev_load = 0;
 
 		if (ignore_nice)
-			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+			j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j);
 	}
 
 	gov->start(policy);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: sched/core] procfs: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  2:44 ` [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
@ 2019-11-21 16:48   ` tip-bot2 for Frederic Weisbecker
  2019-12-08 15:57   ` [PATCH 3/6] " Paul Orlyk
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2019-11-21 16:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yauheni Kaliuta, Frederic Weisbecker, Al Viro, Peter Zijlstra,
	Wanpeng Li, Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     26dae145a76c3615588f263885904c6e567ff116
Gitweb:        https://git.kernel.org/tip/26dae145a76c3615588f263885904c6e567ff116
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Thu, 21 Nov 2019 03:44:27 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 21 Nov 2019 07:33:24 +01:00

procfs: Use all-in-one vtime aware kcpustat accessor

Now that we can read also user and guest time safely under vtime, use
the relevant accessor to fix frozen kcpustat values on nohz_full CPUs.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-4-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 fs/proc/stat.c | 56 +++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 5c6bd0a..37bdbec 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -120,20 +120,23 @@ static int show_stat(struct seq_file *p, void *v)
 	getboottime64(&boottime);
 
 	for_each_possible_cpu(i) {
-		struct kernel_cpustat *kcs = &kcpustat_cpu(i);
-
-		user += kcs->cpustat[CPUTIME_USER];
-		nice += kcs->cpustat[CPUTIME_NICE];
-		system += kcpustat_field(kcs, CPUTIME_SYSTEM, i);
-		idle += get_idle_time(kcs, i);
-		iowait += get_iowait_time(kcs, i);
-		irq += kcs->cpustat[CPUTIME_IRQ];
-		softirq += kcs->cpustat[CPUTIME_SOFTIRQ];
-		steal += kcs->cpustat[CPUTIME_STEAL];
-		guest += kcs->cpustat[CPUTIME_GUEST];
-		guest_nice += kcs->cpustat[CPUTIME_GUEST_NICE];
-		sum += kstat_cpu_irqs_sum(i);
-		sum += arch_irq_stat_cpu(i);
+		struct kernel_cpustat kcpustat;
+		u64 *cpustat = kcpustat.cpustat;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
+
+		user		+= cpustat[CPUTIME_USER];
+		nice		+= cpustat[CPUTIME_NICE];
+		system		+= cpustat[CPUTIME_SYSTEM];
+		idle		+= get_idle_time(&kcpustat, i);
+		iowait		+= get_iowait_time(&kcpustat, i);
+		irq		+= cpustat[CPUTIME_IRQ];
+		softirq		+= cpustat[CPUTIME_SOFTIRQ];
+		steal		+= cpustat[CPUTIME_STEAL];
+		guest		+= cpustat[CPUTIME_GUEST];
+		guest_nice	+= cpustat[CPUTIME_USER];
+		sum		+= kstat_cpu_irqs_sum(i);
+		sum		+= arch_irq_stat_cpu(i);
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
 			unsigned int softirq_stat = kstat_softirqs_cpu(j, i);
@@ -157,19 +160,22 @@ static int show_stat(struct seq_file *p, void *v)
 	seq_putc(p, '\n');
 
 	for_each_online_cpu(i) {
-		struct kernel_cpustat *kcs = &kcpustat_cpu(i);
+		struct kernel_cpustat kcpustat;
+		u64 *cpustat = kcpustat.cpustat;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
 
 		/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
-		user = kcs->cpustat[CPUTIME_USER];
-		nice = kcs->cpustat[CPUTIME_NICE];
-		system = kcpustat_field(kcs, CPUTIME_SYSTEM, i);
-		idle = get_idle_time(kcs, i);
-		iowait = get_iowait_time(kcs, i);
-		irq = kcs->cpustat[CPUTIME_IRQ];
-		softirq = kcs->cpustat[CPUTIME_SOFTIRQ];
-		steal = kcs->cpustat[CPUTIME_STEAL];
-		guest = kcs->cpustat[CPUTIME_GUEST];
-		guest_nice = kcs->cpustat[CPUTIME_GUEST_NICE];
+		user		= cpustat[CPUTIME_USER];
+		nice		= cpustat[CPUTIME_NICE];
+		system		= cpustat[CPUTIME_SYSTEM];
+		idle		= get_idle_time(&kcpustat, i);
+		iowait		= get_iowait_time(&kcpustat, i);
+		irq		= cpustat[CPUTIME_IRQ];
+		softirq		= cpustat[CPUTIME_SOFTIRQ];
+		steal		= cpustat[CPUTIME_STEAL];
+		guest		= cpustat[CPUTIME_GUEST];
+		guest_nice	= cpustat[CPUTIME_USER];
 		seq_printf(p, "cpu%d", i);
 		seq_put_decimal_ull(p, " ", nsec_to_clock_t(user));
 		seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: sched/core] rackmeter: Use vtime aware kcpustat accessor
  2019-11-21  2:44 ` [PATCH 6/6] rackmeter: Use " Frederic Weisbecker
@ 2019-11-21 16:48   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2019-11-21 16:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yauheni Kaliuta, Frederic Weisbecker, Benjamin Herrenschmidt,
	Peter Zijlstra, Wanpeng Li, Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     8392853e964c025b0616bd54c0cdf9cbc3c9a769
Gitweb:        https://git.kernel.org/tip/8392853e964c025b0616bd54c0cdf9cbc3c9a769
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Thu, 21 Nov 2019 03:44:30 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 21 Nov 2019 07:59:00 +01:00

rackmeter: Use vtime aware kcpustat accessor

Now that we have a vtime safe kcpustat accessor, use it to fetch
CPUTIME_NICE and fix frozen kcpustat values on nohz_full CPUs.

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-7-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 drivers/macintosh/rack-meter.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 4134e58..60311e8 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -81,13 +81,14 @@ static int rackmeter_ignore_nice;
  */
 static inline u64 get_cpu_idle_time(unsigned int cpu)
 {
+	struct kernel_cpustat *kcpustat = &kcpustat_cpu(cpu);
 	u64 retval;
 
-	retval = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE] +
-		 kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
+	retval = kcpustat->cpustat[CPUTIME_IDLE] +
+		 kcpustat->cpustat[CPUTIME_IOWAIT];
 
 	if (rackmeter_ignore_nice)
-		retval += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
+		retval += kcpustat_field(kcpustat, CPUTIME_NICE, cpu);
 
 	return retval;
 }

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: sched/core] leds: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  2:44 ` [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
  2019-11-21  6:58   ` Ingo Molnar
@ 2019-11-21 16:48   ` tip-bot2 for Frederic Weisbecker
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2019-11-21 16:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yauheni Kaliuta, Frederic Weisbecker, Jacek Anaszewski,
	Pavel Machek, Dan Murphy, Peter Zijlstra, Wanpeng Li,
	Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     8688f2fb671b2ed59b1b16083136b6edc3750435
Gitweb:        https://git.kernel.org/tip/8688f2fb671b2ed59b1b16083136b6edc3750435
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Thu, 21 Nov 2019 03:44:29 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 21 Nov 2019 07:58:48 +01:00

leds: Use all-in-one vtime aware kcpustat accessor

We can now safely read user kcpustat fields on nohz_full CPUs.
Use the appropriate accessor.

[ mingo: Fixed build failure. ]

Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> (maintainer:LED SUBSYSTEM)
Cc: Pavel Machek <pavel@ucw.cz> (maintainer:LED SUBSYSTEM)
Cc: Dan Murphy <dmurphy@ti.com> (reviewer:LED SUBSYSTEM)
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-6-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 drivers/leds/trigger/ledtrig-activity.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/leds/trigger/ledtrig-activity.c b/drivers/leds/trigger/ledtrig-activity.c
index ddfc5ed..14ba7fa 100644
--- a/drivers/leds/trigger/ledtrig-activity.c
+++ b/drivers/leds/trigger/ledtrig-activity.c
@@ -57,11 +57,15 @@ static void led_activity_function(struct timer_list *t)
 	curr_used = 0;
 
 	for_each_possible_cpu(i) {
-		curr_used += kcpustat_cpu(i).cpustat[CPUTIME_USER]
-			  +  kcpustat_cpu(i).cpustat[CPUTIME_NICE]
-			  +  kcpustat_field(&kcpustat_cpu(i), CPUTIME_SYSTEM, i)
-			  +  kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]
-			  +  kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
+		struct kernel_cpustat kcpustat;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
+
+		curr_used += kcpustat.cpustat[CPUTIME_USER]
+			  +  kcpustat.cpustat[CPUTIME_NICE]
+			  +  kcpustat.cpustat[CPUTIME_SYSTEM]
+			  +  kcpustat.cpustat[CPUTIME_SOFTIRQ]
+			  +  kcpustat.cpustat[CPUTIME_IRQ];
 		cpus++;
 	}
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: sched/core] sched/vtime: Bring up complete kcpustat accessor
  2019-11-21  2:44 ` [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor Frederic Weisbecker
@ 2019-11-21 16:48   ` tip-bot2 for Frederic Weisbecker
  2019-12-28 20:56   ` [PATCH 2/6] " Chris Wilson
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2019-11-21 16:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Peter Zijlstra, Wanpeng Li, Yauheni Kaliuta,
	Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     74722bb223d0f236303b60c9509ff924a9713780
Gitweb:        https://git.kernel.org/tip/74722bb223d0f236303b60c9509ff924a9713780
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Thu, 21 Nov 2019 03:44:26 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 21 Nov 2019 07:33:24 +01:00

sched/vtime: Bring up complete kcpustat accessor

Many callsites want to fetch the values of system, user, user_nice, guest
or guest_nice kcpustat fields altogether or at least a pair of these.

In that case calling kcpustat_field() for each requested field brings
unecessary overhead when we could fetch all of them in a row.

So provide kcpustat_cpu_fetch() that fetches the whole kcpustat array
in a vtime safe way under the same RCU and seqcount block.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-3-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/kernel_stat.h |   7 ++-
 kernel/sched/cputime.c      | 136 +++++++++++++++++++++++++++++------
 2 files changed, 123 insertions(+), 20 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 7978119..89f0745 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -81,12 +81,19 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 			  enum cpu_usage_stat usage, int cpu);
+extern void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu);
 #else
 static inline u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 				 enum cpu_usage_stat usage, int cpu)
 {
 	return kcpustat->cpustat[usage];
 }
+
+static inline void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
+{
+	*dst = kcpustat_cpu(cpu);
+}
+
 #endif
 
 extern void account_user_time(struct task_struct *, u64);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 27b5406..d43318a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -912,6 +912,30 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 }
 
+static int vtime_state_check(struct vtime *vtime, int cpu)
+{
+	/*
+	 * We raced against a context switch, fetch the
+	 * kcpustat task again.
+	 */
+	if (vtime->cpu != cpu && vtime->cpu != -1)
+		return -EAGAIN;
+
+	/*
+	 * Two possible things here:
+	 * 1) We are seeing the scheduling out task (prev) or any past one.
+	 * 2) We are seeing the scheduling in task (next) but it hasn't
+	 *    passed though vtime_task_switch() yet so the pending
+	 *    cputime of the prev task may not be flushed yet.
+	 *
+	 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
+	 */
+	if (vtime->state == VTIME_INACTIVE)
+		return -EAGAIN;
+
+	return 0;
+}
+
 static u64 kcpustat_user_vtime(struct vtime *vtime)
 {
 	if (vtime->state == VTIME_USER)
@@ -933,26 +957,9 @@ static int kcpustat_field_vtime(u64 *cpustat,
 	do {
 		seq = read_seqcount_begin(&vtime->seqcount);
 
-		/*
-		 * We raced against context switch, fetch the
-		 * kcpustat task again.
-		 */
-		if (vtime->cpu != cpu && vtime->cpu != -1)
-			return -EAGAIN;
-
-		/*
-		 * Two possible things here:
-		 * 1) We are seeing the scheduling out task (prev) or any past one.
-		 * 2) We are seeing the scheduling in task (next) but it hasn't
-		 *    passed though vtime_task_switch() yet so the pending
-		 *    cputime of the prev task may not be flushed yet.
-		 *
-		 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
-		 */
-		if (vtime->state == VTIME_INACTIVE)
-			return -EAGAIN;
-
-		err = 0;
+		err = vtime_state_check(vtime, cpu);
+		if (err < 0)
+			return err;
 
 		*val = cpustat[usage];
 
@@ -1025,4 +1032,93 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 	}
 }
 EXPORT_SYMBOL_GPL(kcpustat_field);
+
+static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
+				    const struct kernel_cpustat *src,
+				    struct task_struct *tsk, int cpu)
+{
+	struct vtime *vtime = &tsk->vtime;
+	unsigned int seq;
+	int err;
+
+	do {
+		u64 *cpustat;
+		u64 delta;
+
+		seq = read_seqcount_begin(&vtime->seqcount);
+
+		err = vtime_state_check(vtime, cpu);
+		if (err < 0)
+			return err;
+
+		*dst = *src;
+		cpustat = dst->cpustat;
+
+		/* Task is sleeping, dead or idle, nothing to add */
+		if (vtime->state < VTIME_SYS)
+			continue;
+
+		delta = vtime_delta(vtime);
+
+		/*
+		 * Task runs either in user (including guest) or kernel space,
+		 * add pending nohz time to the right place.
+		 */
+		if (vtime->state == VTIME_SYS) {
+			cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
+		} else if (vtime->state == VTIME_USER) {
+			if (task_nice(tsk) > 0)
+				cpustat[CPUTIME_NICE] += vtime->utime + delta;
+			else
+				cpustat[CPUTIME_USER] += vtime->utime + delta;
+		} else {
+			WARN_ON_ONCE(vtime->state != VTIME_GUEST);
+			if (task_nice(tsk) > 0) {
+				cpustat[CPUTIME_GUEST_NICE] += vtime->gtime + delta;
+				cpustat[CPUTIME_NICE] += vtime->gtime + delta;
+			} else {
+				cpustat[CPUTIME_GUEST] += vtime->gtime + delta;
+				cpustat[CPUTIME_USER] += vtime->gtime + delta;
+			}
+		}
+	} while (read_seqcount_retry(&vtime->seqcount, seq));
+
+	return err;
+}
+
+void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
+{
+	const struct kernel_cpustat *src = &kcpustat_cpu(cpu);
+	struct rq *rq;
+	int err;
+
+	if (!vtime_accounting_enabled_cpu(cpu)) {
+		*dst = *src;
+		return;
+	}
+
+	rq = cpu_rq(cpu);
+
+	for (;;) {
+		struct task_struct *curr;
+
+		rcu_read_lock();
+		curr = rcu_dereference(rq->curr);
+		if (WARN_ON_ONCE(!curr)) {
+			rcu_read_unlock();
+			*dst = *src;
+			return;
+		}
+
+		err = kcpustat_cpu_fetch_vtime(dst, src, curr, cpu);
+		rcu_read_unlock();
+
+		if (!err)
+			return;
+
+		cpu_relax();
+	}
+}
+EXPORT_SYMBOL_GPL(kcpustat_cpu_fetch);
+
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip: sched/core] sched/cputime: Support other fields on kcpustat_field()
  2019-11-21  2:44 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
@ 2019-11-21 16:48   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2019-11-21 16:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Peter Zijlstra, Wanpeng Li, Yauheni Kaliuta,
	Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     5a1c95580f1d89c8a736bb02ecd82a8858388b8a
Gitweb:        https://git.kernel.org/tip/5a1c95580f1d89c8a736bb02ecd82a8858388b8a
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Thu, 21 Nov 2019 03:44:25 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 21 Nov 2019 07:33:23 +01:00

sched/cputime: Support other fields on kcpustat_field()

Provide support for user, nice, guest and guest_nice fields through
kcpustat_field().

Whether we account the delta to a nice or not nice field is decided on
top of the nice value snapshot taken at the time we call kcpustat_field().
If the nice value of the task has been changed since the last vtime
update, we may have inacurrate distribution of the nice VS unnice
cputime.

However this is considered as a minor issue compared to the proper fix
that would involve interrupting the target on nice updates, which is
undesired on nohz_full CPUs.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191121024430.19938-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cputime.c | 54 ++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e0cd206..27b5406 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -912,11 +912,21 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 }
 
+static u64 kcpustat_user_vtime(struct vtime *vtime)
+{
+	if (vtime->state == VTIME_USER)
+		return vtime->utime + vtime_delta(vtime);
+	else if (vtime->state == VTIME_GUEST)
+		return vtime->gtime + vtime_delta(vtime);
+	return 0;
+}
+
 static int kcpustat_field_vtime(u64 *cpustat,
-				struct vtime *vtime,
+				struct task_struct *tsk,
 				enum cpu_usage_stat usage,
 				int cpu, u64 *val)
 {
+	struct vtime *vtime = &tsk->vtime;
 	unsigned int seq;
 	int err;
 
@@ -946,9 +956,37 @@ static int kcpustat_field_vtime(u64 *cpustat,
 
 		*val = cpustat[usage];
 
-		if (vtime->state == VTIME_SYS)
-			*val += vtime->stime + vtime_delta(vtime);
-
+		/*
+		 * Nice VS unnice cputime accounting may be inaccurate if
+		 * the nice value has changed since the last vtime update.
+		 * But proper fix would involve interrupting target on nice
+		 * updates which is a no go on nohz_full (although the scheduler
+		 * may still interrupt the target if rescheduling is needed...)
+		 */
+		switch (usage) {
+		case CPUTIME_SYSTEM:
+			if (vtime->state == VTIME_SYS)
+				*val += vtime->stime + vtime_delta(vtime);
+			break;
+		case CPUTIME_USER:
+			if (task_nice(tsk) <= 0)
+				*val += kcpustat_user_vtime(vtime);
+			break;
+		case CPUTIME_NICE:
+			if (task_nice(tsk) > 0)
+				*val += kcpustat_user_vtime(vtime);
+			break;
+		case CPUTIME_GUEST:
+			if (vtime->state == VTIME_GUEST && task_nice(tsk) <= 0)
+				*val += vtime->gtime + vtime_delta(vtime);
+			break;
+		case CPUTIME_GUEST_NICE:
+			if (vtime->state == VTIME_GUEST && task_nice(tsk) > 0)
+				*val += vtime->gtime + vtime_delta(vtime);
+			break;
+		default:
+			break;
+		}
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 
 	return 0;
@@ -965,15 +1003,10 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 	if (!vtime_accounting_enabled_cpu(cpu))
 		return cpustat[usage];
 
-	/* Only support sys vtime for now */
-	if (usage != CPUTIME_SYSTEM)
-		return cpustat[usage];
-
 	rq = cpu_rq(cpu);
 
 	for (;;) {
 		struct task_struct *curr;
-		struct vtime *vtime;
 
 		rcu_read_lock();
 		curr = rcu_dereference(rq->curr);
@@ -982,8 +1015,7 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 			return cpustat[usage];
 		}
 
-		vtime = &curr->vtime;
-		err = kcpustat_field_vtime(cpustat, vtime, usage, cpu, &val);
+		err = kcpustat_field_vtime(cpustat, curr, usage, cpu, &val);
 		rcu_read_unlock();
 
 		if (!err)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor
  2019-11-21 14:14     ` Frederic Weisbecker
@ 2019-11-21 16:49       ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2019-11-21 16:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek


* Frederic Weisbecker <frederic@kernel.org> wrote:

> On Thu, Nov 21, 2019 at 07:58:26AM +0100, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <frederic@kernel.org> wrote:
> > 
> > > We can now safely read user kcpustat fields on nohz_full CPUs.
> > > Use the appropriate accessor.
> > > 
> > > Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
> > > Cc: Pavel Machek <pavel@ucw.cz>
> > > Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Rik van Riel <riel@surriel.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Wanpeng Li <wanpengli@tencent.com>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > ---
> > >  drivers/leds/trigger/ledtrig-activity.c | 14 +++++++++-----
> > >  1 file changed, 9 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/leds/trigger/ledtrig-activity.c b/drivers/leds/trigger/ledtrig-activity.c
> > > index ddfc5edd07c8..6901e3631c22 100644
> > > --- a/drivers/leds/trigger/ledtrig-activity.c
> > > +++ b/drivers/leds/trigger/ledtrig-activity.c
> > > @@ -57,11 +57,15 @@ static void led_activity_function(struct timer_list *t)
> > >  	curr_used = 0;
> > >  
> > >  	for_each_possible_cpu(i) {
> > > -		curr_used += kcpustat_cpu(i).cpustat[CPUTIME_USER]
> > > -			  +  kcpustat_cpu(i).cpustat[CPUTIME_NICE]
> > > -			  +  kcpustat_field(&kcpustat_cpu(i), CPUTIME_SYSTEM, i)
> > > -			  +  kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]
> > > -			  +  kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
> > > +		struct kernel_cpustat kcpustat;
> > > +
> > > +		kcpustat_fetch_cpu(&kcpustat, i);
> > > +
> > > +		curr_used += kcpustat.cpustat[CPUTIME_USER]
> > > +			  +  kcpustat.cpustat[CPUTIME_NICE]
> > > +			  +  kcpustat.cpustat[CPUTIME_SYSTEM]
> > > +			  +  kcpustat.cpustat[CPUTIME_SOFTIRQ]
> > > +			  +  kcpustat.cpustat[CPUTIME_IRQ];
> > 
> > Not the best tested series:
> > 
> > --- a/drivers/leds/trigger/ledtrig-activity.c
> > +++ b/drivers/leds/trigger/ledtrig-activity.c
> > @@ -59,7 +59,7 @@ static void led_activity_function(struct timer_list *t)
> >  	for_each_possible_cpu(i) {
> >  		struct kernel_cpustat kcpustat;
> >  
> > -		kcpustat_fetch_cpu(&kcpustat, i);
> > +		kcpustat_cpu_fetch(&kcpustat, i);
> >  
> >  		curr_used += kcpustat.cpustat[CPUTIME_USER]
> >  			  +  kcpustat.cpustat[CPUTIME_NICE]
> > 
> > 
> > :-)
> 
> Oops, I tested with vtime on and off but that one slipped under my config.
> Do you want me to resend?

No need, I suspect this slipped in via my last minute request for that 
interface cleanup - so I just applied the fix and tested it - it all 
passed testing today and I just pushed it out into tip:sched/core.

So all's good so far.

Thanks Frederic!

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor
  2019-11-21  2:44 ` [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
@ 2019-12-08 15:57   ` Paul Orlyk
  2019-12-09 15:50     ` Frederic Weisbecker
  1 sibling, 1 reply; 25+ messages in thread
From: Paul Orlyk @ 2019-12-08 15:57 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra, Ingo Molnar
  Cc: LKML, Jacek Anaszewski, Wanpeng Li, Rafael J . Wysocki,
	Benjamin Herrenschmidt, Rik van Riel, Thomas Gleixner,
	Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Looks like a copy-paste error. I think it should be:

- guest_nice	+= cpustat[CPUTIME_USER];
+ guest_nice	+= cpustat[CPUTIME_GUEST_NICE];

and

- guest_nice	= cpustat[CPUTIME_USER];
+ guest_nice	= cpustat[CPUTIME_GUEST_NICE];

With best regards,
Paul Orlyk

On 11/21/19 4:44 AM, Frederic Weisbecker wrote:
> +		guest		+= cpustat[CPUTIME_GUEST];
> +		guest_nice	+= cpustat[CPUTIME_USER];
> +		sum		+= kstat_cpu_irqs_sum(i);
> 
> +		guest		= cpustat[CPUTIME_GUEST];
> +		guest_nice	= cpustat[CPUTIME_USER];
>   		seq_printf(p, "cpu%d", i);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor
  2019-12-08 15:57   ` [PATCH 3/6] " Paul Orlyk
@ 2019-12-09 15:50     ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2019-12-09 15:50 UTC (permalink / raw)
  To: Paul Orlyk
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

On Sun, Dec 08, 2019 at 05:57:47PM +0200, Paul Orlyk wrote:
> Looks like a copy-paste error. I think it should be:
> 
> - guest_nice	+= cpustat[CPUTIME_USER];
> + guest_nice	+= cpustat[CPUTIME_GUEST_NICE];
> 
> and
> 
> - guest_nice	= cpustat[CPUTIME_USER];
> + guest_nice	= cpustat[CPUTIME_GUEST_NICE];
> 
> With best regards,
> Paul Orlyk

Yes the fix should be applied soonish:

https://lore.kernel.org/lkml/20191205020344.14940-1-frederic@kernel.org/

Thanks.

> 
> On 11/21/19 4:44 AM, Frederic Weisbecker wrote:
> > +		guest		+= cpustat[CPUTIME_GUEST];
> > +		guest_nice	+= cpustat[CPUTIME_USER];
> > +		sum		+= kstat_cpu_irqs_sum(i);
> > 
> > +		guest		= cpustat[CPUTIME_GUEST];
> > +		guest_nice	= cpustat[CPUTIME_USER];
> >   		seq_printf(p, "cpu%d", i);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor
  2019-11-21  2:44 ` [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor Frederic Weisbecker
  2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
@ 2019-12-28 20:56   ` Chris Wilson
  2019-12-30  1:08     ` Frederic Weisbecker
  1 sibling, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2019-12-28 20:56 UTC (permalink / raw)
  To: Frederic Weisbecker, Ingo Molnar, Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Quoting Frederic Weisbecker (2019-11-21 02:44:26)
> +static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
> +                                   const struct kernel_cpustat *src,
> +                                   struct task_struct *tsk, int cpu)
> +{
> +       struct vtime *vtime = &tsk->vtime;
> +       unsigned int seq;
> +       int err;
> +
> +       do {
> +               u64 *cpustat;
> +               u64 delta;
> +
> +               seq = read_seqcount_begin(&vtime->seqcount);
> +
> +               err = vtime_state_check(vtime, cpu);
> +               if (err < 0)
> +                       return err;
> +
> +               *dst = *src;
> +               cpustat = dst->cpustat;
> +
> +               /* Task is sleeping, dead or idle, nothing to add */
> +               if (vtime->state < VTIME_SYS)
> +                       continue;
> +
> +               delta = vtime_delta(vtime);
> +
> +               /*
> +                * Task runs either in user (including guest) or kernel space,
> +                * add pending nohz time to the right place.
> +                */
> +               if (vtime->state == VTIME_SYS) {
> +                       cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
> +               } else if (vtime->state == VTIME_USER) {
> +                       if (task_nice(tsk) > 0)
> +                               cpustat[CPUTIME_NICE] += vtime->utime + delta;
> +                       else
> +                               cpustat[CPUTIME_USER] += vtime->utime + delta;
> +               } else {
> +                       WARN_ON_ONCE(vtime->state != VTIME_GUEST);

I'm randomly hitting this WARN on a non-virtualised system reading
/proc/stat.

vtime->state is updated under the write_seqcount, so the access here is
deliberately racey, and the change in vtime->state would be picked up
the seqcount_retry.

Quick suggestion would be something along the lines of

 static int vtime_state_check(struct vtime *vtime, int cpu)
 {
+	int state = READ_ONCE(vtime->state);
+
 	/*
 	 * We raced against a context switch, fetch the
 	 * kcpustat task again.
@@ -930,10 +932,10 @@ static int vtime_state_check(struct vtime *vtime, int cpu)
 	 *
 	 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
 	 */
-	if (vtime->state == VTIME_INACTIVE)
+	if (state == VTIME_INACTIVE)
 		return -EAGAIN;

-	return 0;
+	return state;
 }

 static u64 kcpustat_user_vtime(struct vtime *vtime)
@@ -1055,7 +1057,7 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 		cpustat = dst->cpustat;

 		/* Task is sleeping, dead or idle, nothing to add */
-		if (vtime->state < VTIME_SYS)
+		if (err < VTIME_SYS)
 			continue;

 		delta = vtime_delta(vtime);
@@ -1064,15 +1066,15 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 		 * Task runs either in user (including guest) or kernel space,
 		 * add pending nohz time to the right place.
 		 */
-		if (vtime->state == VTIME_SYS) {
+		if (err == VTIME_SYS) {
 			cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
-		} else if (vtime->state == VTIME_USER) {
+		} else if (err == VTIME_USER) {
 			if (task_nice(tsk) > 0)
 				cpustat[CPUTIME_NICE] += vtime->utime + delta;
 			else
 				cpustat[CPUTIME_USER] += vtime->utime + delta;
 		} else {
-			WARN_ON_ONCE(vtime->state != VTIME_GUEST);
+			WARN_ON_ONCE(err != VTIME_GUEST);
 			if (task_nice(tsk) > 0) {
 				cpustat[CPUTIME_GUEST_NICE] += vtime->gtime + delta;
 				cpustat[CPUTIME_NICE] += vtime->gtime + delta;

Or drop the warn.
-Chris

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor
  2019-12-28 20:56   ` [PATCH 2/6] " Chris Wilson
@ 2019-12-30  1:08     ` Frederic Weisbecker
  2019-12-30  9:04       ` [PATCH] sched/vtime: Prevent unstable evaluation of WARN(vtime->state) Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2019-12-30  1:08 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

On Sat, Dec 28, 2019 at 08:56:19PM +0000, Chris Wilson wrote:
> I'm randomly hitting this WARN on a non-virtualised system reading
> /proc/stat.
> 
> vtime->state is updated under the write_seqcount, so the access here is
> deliberately racey, and the change in vtime->state would be picked up
> the seqcount_retry.
> 
> Quick suggestion would be something along the lines of
> 
>  static int vtime_state_check(struct vtime *vtime, int cpu)
>  {
> +	int state = READ_ONCE(vtime->state);
> +
>  	/*
>  	 * We raced against a context switch, fetch the
>  	 * kcpustat task again.
> @@ -930,10 +932,10 @@ static int vtime_state_check(struct vtime *vtime, int cpu)
>  	 *
>  	 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
>  	 */
> -	if (vtime->state == VTIME_INACTIVE)
> +	if (state == VTIME_INACTIVE)
>  		return -EAGAIN;
> 
> -	return 0;
> +	return state;
>  }
> 
>  static u64 kcpustat_user_vtime(struct vtime *vtime)
> @@ -1055,7 +1057,7 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
>  		cpustat = dst->cpustat;
> 
>  		/* Task is sleeping, dead or idle, nothing to add */
> -		if (vtime->state < VTIME_SYS)
> +		if (err < VTIME_SYS)
>  			continue;
> 
>  		delta = vtime_delta(vtime);
> @@ -1064,15 +1066,15 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
>  		 * Task runs either in user (including guest) or kernel space,
>  		 * add pending nohz time to the right place.
>  		 */
> -		if (vtime->state == VTIME_SYS) {
> +		if (err == VTIME_SYS) {
>  			cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
> -		} else if (vtime->state == VTIME_USER) {
> +		} else if (err == VTIME_USER) {
>  			if (task_nice(tsk) > 0)
>  				cpustat[CPUTIME_NICE] += vtime->utime + delta;
>  			else
>  				cpustat[CPUTIME_USER] += vtime->utime + delta;
>  		} else {
> -			WARN_ON_ONCE(vtime->state != VTIME_GUEST);
> +			WARN_ON_ONCE(err != VTIME_GUEST);
>  			if (task_nice(tsk) > 0) {
>  				cpustat[CPUTIME_GUEST_NICE] += vtime->gtime + delta;
>  				cpustat[CPUTIME_NICE] += vtime->gtime + delta;
> 
> Or drop the warn.

Good catch, can I use your Signed-off-by ?

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] sched/vtime: Prevent unstable evaluation of WARN(vtime->state)
  2019-12-30  1:08     ` Frederic Weisbecker
@ 2019-12-30  9:04       ` Chris Wilson
  2019-12-30 17:39         ` Frederic Weisbecker
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2019-12-30  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris Wilson, Frederic Weisbecker, Peter Zijlstra, Ingo Molnar

As the vtime is sampled under loose seqcount protection by kcpustat, the
vtime fields may change as the code flows. Where logic dictates a field
has a static value, use a READ_ONCE.

Fixes: 74722bb223d0 ("sched/vtime: Bring up complete kcpustat accessor")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cputime.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index d43318a489f2..96bbd52f43ae 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -914,6 +914,8 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 
 static int vtime_state_check(struct vtime *vtime, int cpu)
 {
+	int state = READ_ONCE(vtime->state);
+
 	/*
 	 * We raced against a context switch, fetch the
 	 * kcpustat task again.
@@ -930,10 +932,10 @@ static int vtime_state_check(struct vtime *vtime, int cpu)
 	 *
 	 * Case 1) is ok but 2) is not. So wait for a safe VTIME state.
 	 */
-	if (vtime->state == VTIME_INACTIVE)
+	if (state == VTIME_INACTIVE)
 		return -EAGAIN;
 
-	return 0;
+	return state;
 }
 
 static u64 kcpustat_user_vtime(struct vtime *vtime)
@@ -952,14 +954,15 @@ static int kcpustat_field_vtime(u64 *cpustat,
 {
 	struct vtime *vtime = &tsk->vtime;
 	unsigned int seq;
-	int err;
 
 	do {
+		int state;
+
 		seq = read_seqcount_begin(&vtime->seqcount);
 
-		err = vtime_state_check(vtime, cpu);
-		if (err < 0)
-			return err;
+		state = vtime_state_check(vtime, cpu);
+		if (state < 0)
+			return state;
 
 		*val = cpustat[usage];
 
@@ -972,7 +975,7 @@ static int kcpustat_field_vtime(u64 *cpustat,
 		 */
 		switch (usage) {
 		case CPUTIME_SYSTEM:
-			if (vtime->state == VTIME_SYS)
+			if (state == VTIME_SYS)
 				*val += vtime->stime + vtime_delta(vtime);
 			break;
 		case CPUTIME_USER:
@@ -984,11 +987,11 @@ static int kcpustat_field_vtime(u64 *cpustat,
 				*val += kcpustat_user_vtime(vtime);
 			break;
 		case CPUTIME_GUEST:
-			if (vtime->state == VTIME_GUEST && task_nice(tsk) <= 0)
+			if (state == VTIME_GUEST && task_nice(tsk) <= 0)
 				*val += vtime->gtime + vtime_delta(vtime);
 			break;
 		case CPUTIME_GUEST_NICE:
-			if (vtime->state == VTIME_GUEST && task_nice(tsk) > 0)
+			if (state == VTIME_GUEST && task_nice(tsk) > 0)
 				*val += vtime->gtime + vtime_delta(vtime);
 			break;
 		default:
@@ -1039,23 +1042,23 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 {
 	struct vtime *vtime = &tsk->vtime;
 	unsigned int seq;
-	int err;
 
 	do {
 		u64 *cpustat;
 		u64 delta;
+		int state;
 
 		seq = read_seqcount_begin(&vtime->seqcount);
 
-		err = vtime_state_check(vtime, cpu);
-		if (err < 0)
-			return err;
+		state = vtime_state_check(vtime, cpu);
+		if (state < 0)
+			return state;
 
 		*dst = *src;
 		cpustat = dst->cpustat;
 
 		/* Task is sleeping, dead or idle, nothing to add */
-		if (vtime->state < VTIME_SYS)
+		if (state < VTIME_SYS)
 			continue;
 
 		delta = vtime_delta(vtime);
@@ -1064,15 +1067,15 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 		 * Task runs either in user (including guest) or kernel space,
 		 * add pending nohz time to the right place.
 		 */
-		if (vtime->state == VTIME_SYS) {
+		if (state == VTIME_SYS) {
 			cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
-		} else if (vtime->state == VTIME_USER) {
+		} else if (state == VTIME_USER) {
 			if (task_nice(tsk) > 0)
 				cpustat[CPUTIME_NICE] += vtime->utime + delta;
 			else
 				cpustat[CPUTIME_USER] += vtime->utime + delta;
 		} else {
-			WARN_ON_ONCE(vtime->state != VTIME_GUEST);
+			WARN_ON_ONCE(state != VTIME_GUEST);
 			if (task_nice(tsk) > 0) {
 				cpustat[CPUTIME_GUEST_NICE] += vtime->gtime + delta;
 				cpustat[CPUTIME_NICE] += vtime->gtime + delta;
@@ -1083,7 +1086,7 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 		}
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 
-	return err;
+	return 0;
 }
 
 void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
-- 
2.25.0.rc0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] sched/vtime: Prevent unstable evaluation of WARN(vtime->state)
  2019-12-30  9:04       ` [PATCH] sched/vtime: Prevent unstable evaluation of WARN(vtime->state) Chris Wilson
@ 2019-12-30 17:39         ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2019-12-30 17:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Mon, Dec 30, 2019 at 09:04:36AM +0000, Chris Wilson wrote:
> As the vtime is sampled under loose seqcount protection by kcpustat, the
> vtime fields may change as the code flows. Where logic dictates a field
> has a static value, use a READ_ONCE.
> 
> Fixes: 74722bb223d0 ("sched/vtime: Bring up complete kcpustat accessor")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>

Oh thanks for taking the time to make it a proper patch!
Looks good, I'm relaying it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field()
  2019-11-20 11:51   ` Ingo Molnar
@ 2019-11-20 21:04     ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2019-11-20 21:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

On Wed, Nov 20, 2019 at 12:51:42PM +0100, Ingo Molnar wrote:
> * Frederic Weisbecker <frederic@kernel.org> wrote:

> > +		/*
> > +		 * Nice VS unnice cputime accounting may be inaccurate if
> > +		 * the nice value has changed since the last vtime update.
> > +		 * But proper fix would involve interrupting target on nice
> > +		 * updates which is a no go on nohz_full.
> 
> Well, we actually already interrupt the target in both sys_nice() and 
> sys_setpriority() etc. syscall variants: we call set_user_nice() which 
> calls resched_curr() and the task is sent an IPI and runs through a 
> reschedule.

I think we can easily avoid doing that IPI when we find it is the only
task on that runqueue. Which is exactly the case for NOHZ_FULL.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field()
  2019-11-19 23:22 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
@ 2019-11-20 11:51   ` Ingo Molnar
  2019-11-20 21:04     ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2019-11-20 11:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek


* Frederic Weisbecker <frederic@kernel.org> wrote:

> Provide support for user, nice, guest and guest_nice fields through
> kcpustat_field().
> 
> Whether we account the delta to a nice or not nice field is decided on
> top of the nice value snapshot taken at the time we call kcpustat_field().
> If the nice value of the task has been changed since the last vtime
> update, we may have inacurrate distribution of the nice VS unnice
> cputime.
> 
> However this is considered as a minor issue compared to the proper fix
> that would involve interrupting the target on nice updates, which is
> undesired on nohz_full CPUs.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/cputime.c | 53 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 42 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index e0cd20693ef5..b2cf544e2109 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -912,11 +912,21 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
>  	} while (read_seqcount_retry(&vtime->seqcount, seq));
>  }
>  
> +static u64 kcpustat_user_vtime(struct vtime *vtime)
> +{
> +	if (vtime->state == VTIME_USER)
> +		return vtime->utime + vtime_delta(vtime);
> +	else if (vtime->state == VTIME_GUEST)
> +		return vtime->gtime + vtime_delta(vtime);
> +	return 0;
> +}
> +
>  static int kcpustat_field_vtime(u64 *cpustat,
> -				struct vtime *vtime,
> +				struct task_struct *tsk,
>  				enum cpu_usage_stat usage,
>  				int cpu, u64 *val)
>  {
> +	struct vtime *vtime = &tsk->vtime;
>  	unsigned int seq;
>  	int err;
>  
> @@ -946,9 +956,36 @@ static int kcpustat_field_vtime(u64 *cpustat,
>  
>  		*val = cpustat[usage];
>  
> -		if (vtime->state == VTIME_SYS)
> -			*val += vtime->stime + vtime_delta(vtime);
> -
> +		/*
> +		 * Nice VS unnice cputime accounting may be inaccurate if
> +		 * the nice value has changed since the last vtime update.
> +		 * But proper fix would involve interrupting target on nice
> +		 * updates which is a no go on nohz_full.

Well, we actually already interrupt the target in both sys_nice() and 
sys_setpriority() etc. syscall variants: we call set_user_nice() which 
calls resched_curr() and the task is sent an IPI and runs through a 
reschedule.

But ... I do agree that this kind of granularity of nice/non-nice 
accounting doesn't really matter in practice: the changing of nice values 
is a relatively low frequency operation on most systems.

But nevertheless the comment should probably be updated to reflect this.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field()
  2019-11-19 23:22 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v2 Frederic Weisbecker
@ 2019-11-19 23:22 ` Frederic Weisbecker
  2019-11-20 11:51   ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2019-11-19 23:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Jacek Anaszewski, Wanpeng Li,
	Rafael J . Wysocki, Benjamin Herrenschmidt, Rik van Riel,
	Thomas Gleixner, Yauheni Kaliuta, Viresh Kumar, Pavel Machek

Provide support for user, nice, guest and guest_nice fields through
kcpustat_field().

Whether we account the delta to a nice or not nice field is decided on
top of the nice value snapshot taken at the time we call kcpustat_field().
If the nice value of the task has been changed since the last vtime
update, we may have inacurrate distribution of the nice VS unnice
cputime.

However this is considered as a minor issue compared to the proper fix
that would involve interrupting the target on nice updates, which is
undesired on nohz_full CPUs.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cputime.c | 53 +++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e0cd20693ef5..b2cf544e2109 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -912,11 +912,21 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 }
 
+static u64 kcpustat_user_vtime(struct vtime *vtime)
+{
+	if (vtime->state == VTIME_USER)
+		return vtime->utime + vtime_delta(vtime);
+	else if (vtime->state == VTIME_GUEST)
+		return vtime->gtime + vtime_delta(vtime);
+	return 0;
+}
+
 static int kcpustat_field_vtime(u64 *cpustat,
-				struct vtime *vtime,
+				struct task_struct *tsk,
 				enum cpu_usage_stat usage,
 				int cpu, u64 *val)
 {
+	struct vtime *vtime = &tsk->vtime;
 	unsigned int seq;
 	int err;
 
@@ -946,9 +956,36 @@ static int kcpustat_field_vtime(u64 *cpustat,
 
 		*val = cpustat[usage];
 
-		if (vtime->state == VTIME_SYS)
-			*val += vtime->stime + vtime_delta(vtime);
-
+		/*
+		 * Nice VS unnice cputime accounting may be inaccurate if
+		 * the nice value has changed since the last vtime update.
+		 * But proper fix would involve interrupting target on nice
+		 * updates which is a no go on nohz_full.
+		 */
+		switch (usage) {
+		case CPUTIME_SYSTEM:
+			if (vtime->state == VTIME_SYS)
+				*val += vtime->stime + vtime_delta(vtime);
+			break;
+		case CPUTIME_USER:
+			if (task_nice(tsk) <= 0)
+				*val += kcpustat_user_vtime(vtime);
+			break;
+		case CPUTIME_NICE:
+			if (task_nice(tsk) > 0)
+				*val += kcpustat_user_vtime(vtime);
+			break;
+		case CPUTIME_GUEST:
+			if (vtime->state == VTIME_GUEST && task_nice(tsk) <= 0)
+				*val += vtime->gtime + vtime_delta(vtime);
+			break;
+		case CPUTIME_GUEST_NICE:
+			if (vtime->state == VTIME_GUEST && task_nice(tsk) > 0)
+				*val += vtime->gtime + vtime_delta(vtime);
+			break;
+		default:
+			break;
+		}
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 
 	return 0;
@@ -965,15 +1002,10 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 	if (!vtime_accounting_enabled_cpu(cpu))
 		return cpustat[usage];
 
-	/* Only support sys vtime for now */
-	if (usage != CPUTIME_SYSTEM)
-		return cpustat[usage];
-
 	rq = cpu_rq(cpu);
 
 	for (;;) {
 		struct task_struct *curr;
-		struct vtime *vtime;
 
 		rcu_read_lock();
 		curr = rcu_dereference(rq->curr);
@@ -982,8 +1014,7 @@ u64 kcpustat_field(struct kernel_cpustat *kcpustat,
 			return cpustat[usage];
 		}
 
-		vtime = &curr->vtime;
-		err = kcpustat_field_vtime(cpustat, vtime, usage, cpu, &val);
+		err = kcpustat_field_vtime(cpustat, curr, usage, cpu, &val);
 		rcu_read_unlock();
 
 		if (!err)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-12-30 17:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-21  2:44 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v3 Frederic Weisbecker
2019-11-21  2:44 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
2019-11-21  2:44 ` [PATCH 2/6] sched/vtime: Bring up complete kcpustat accessor Frederic Weisbecker
2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
2019-12-28 20:56   ` [PATCH 2/6] " Chris Wilson
2019-12-30  1:08     ` Frederic Weisbecker
2019-12-30  9:04       ` [PATCH] sched/vtime: Prevent unstable evaluation of WARN(vtime->state) Chris Wilson
2019-12-30 17:39         ` Frederic Weisbecker
2019-11-21  2:44 ` [PATCH 3/6] procfs: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
2019-12-08 15:57   ` [PATCH 3/6] " Paul Orlyk
2019-12-09 15:50     ` Frederic Weisbecker
2019-11-21  2:44 ` [PATCH 4/6] cpufreq: Use vtime aware kcpustat accessors for user time Frederic Weisbecker
2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
2019-11-21  2:44 ` [PATCH 5/6] leds: Use all-in-one vtime aware kcpustat accessor Frederic Weisbecker
2019-11-21  6:58   ` Ingo Molnar
2019-11-21 14:14     ` Frederic Weisbecker
2019-11-21 16:49       ` Ingo Molnar
2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
2019-11-21  2:44 ` [PATCH 6/6] rackmeter: Use " Frederic Weisbecker
2019-11-21 16:48   ` [tip: sched/core] " tip-bot2 for Frederic Weisbecker
  -- strict thread matches above, loose matches on Subject: below --
2019-11-19 23:22 [PATCH 0/6] sched/nohz: Make the rest of kcpustat vtime aware v2 Frederic Weisbecker
2019-11-19 23:22 ` [PATCH 1/6] sched/cputime: Support other fields on kcpustat_field() Frederic Weisbecker
2019-11-20 11:51   ` Ingo Molnar
2019-11-20 21:04     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).