kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] sched,delayacct: Some cleanups
@ 2021-05-05 10:59 Peter Zijlstra
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
                   ` (9 more replies)
  0 siblings, 10 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

Hi,

Due to:

  https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com

and general principle, delayacct really shouldn't be using ktime (pvclock also
really shouldn't be doing what it does, but that's another story). This lead me
to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.

The rest of the patches are an attempt at simplifying all that a little. All
that crud is enabled by default for distros which is leading to a death by a
thousand cuts.

The last patch is an attempt at default disabling DELAYACCT, because I don't
think anybody actually uses that much, but what do I know, there were no ill
effects on my testbox. Perhaps we should mirror
/proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
frobbing.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
@ 2021-05-05 10:59 ` Peter Zijlstra
  2021-05-05 14:40   ` Rik van Riel
                     ` (3 more replies)
  2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
                   ` (8 subsequent siblings)
  9 siblings, 4 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

Like all scheduler statistics, use sched_clock() based time.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/delayacct.c |   22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -7,9 +7,9 @@
 #include <linux/sched.h>
 #include <linux/sched/task.h>
 #include <linux/sched/cputime.h>
+#include <linux/sched/clock.h>
 #include <linux/slab.h>
 #include <linux/taskstats.h>
-#include <linux/time.h>
 #include <linux/sysctl.h>
 #include <linux/delayacct.h>
 #include <linux/module.h>
@@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
  * Finish delay accounting for a statistic using its timestamps (@start),
  * accumalator (@total) and @count
  */
-static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
-			  u32 *count)
+static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
 {
-	s64 ns = ktime_get_ns() - *start;
+	s64 ns = local_clock() - *start;
 	unsigned long flags;
 
 	if (ns > 0) {
@@ -58,7 +57,7 @@ static void delayacct_end(raw_spinlock_t
 
 void __delayacct_blkio_start(void)
 {
-	current->delays->blkio_start = ktime_get_ns();
+	current->delays->blkio_start = local_clock();
 }
 
 /*
@@ -151,21 +150,20 @@ __u64 __delayacct_blkio_ticks(struct tas
 
 void __delayacct_freepages_start(void)
 {
-	current->delays->freepages_start = ktime_get_ns();
+	current->delays->freepages_start = local_clock();
 }
 
 void __delayacct_freepages_end(void)
 {
-	delayacct_end(
-		&current->delays->lock,
-		&current->delays->freepages_start,
-		&current->delays->freepages_delay,
-		&current->delays->freepages_count);
+	delayacct_end(&current->delays->lock,
+		      &current->delays->freepages_start,
+		      &current->delays->freepages_delay,
+		      &current->delays->freepages_count);
 }
 
 void __delayacct_thrashing_start(void)
 {
-	current->delays->thrashing_start = ktime_get_ns();
+	current->delays->thrashing_start = local_clock();
 }
 
 void __delayacct_thrashing_end(void)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 2/6] sched: Rename sched_info_{queued,dequeued}
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
@ 2021-05-05 10:59 ` Peter Zijlstra
  2021-05-05 14:39   ` Rik van Riel
                     ` (3 more replies)
  2021-05-05 10:59 ` [PATCH 3/6] sched: Simplify sched_info_on() Peter Zijlstra
                   ` (7 subsequent siblings)
  9 siblings, 4 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

For consistency, rename {queued,dequeued} to {enqueue,dequeue}.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  |    4 ++--
 kernel/sched/stats.h |   20 ++++++++++----------
 2 files changed, 12 insertions(+), 12 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1935,7 +1935,7 @@ static inline void enqueue_task(struct r
 		update_rq_clock(rq);
 
 	if (!(flags & ENQUEUE_RESTORE)) {
-		sched_info_queued(rq, p);
+		sched_info_enqueue(rq, p);
 		psi_enqueue(p, flags & ENQUEUE_WAKEUP);
 	}
 
@@ -1955,7 +1955,7 @@ static inline void dequeue_task(struct r
 		update_rq_clock(rq);
 
 	if (!(flags & DEQUEUE_SAVE)) {
-		sched_info_dequeued(rq, p);
+		sched_info_dequeue(rq, p);
 		psi_dequeue(p, flags & DEQUEUE_SLEEP);
 	}
 
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -25,7 +25,7 @@ rq_sched_info_depart(struct rq *rq, unsi
 }
 
 static inline void
-rq_sched_info_dequeued(struct rq *rq, unsigned long long delta)
+rq_sched_info_dequeue(struct rq *rq, unsigned long long delta)
 {
 	if (rq)
 		rq->rq_sched_info.run_delay += delta;
@@ -42,7 +42,7 @@ rq_sched_info_dequeued(struct rq *rq, un
 
 #else /* !CONFIG_SCHEDSTATS: */
 static inline void rq_sched_info_arrive  (struct rq *rq, unsigned long long delta) { }
-static inline void rq_sched_info_dequeued(struct rq *rq, unsigned long long delta) { }
+static inline void rq_sched_info_dequeue(struct rq *rq, unsigned long long delta) { }
 static inline void rq_sched_info_depart  (struct rq *rq, unsigned long long delta) { }
 # define   schedstat_enabled()		0
 # define __schedstat_inc(var)		do { } while (0)
@@ -161,7 +161,7 @@ static inline void sched_info_reset_dequ
  * from dequeue_task() to account for possible rq->clock skew across CPUs. The
  * delta taken on each CPU would annul the skew.
  */
-static inline void sched_info_dequeued(struct rq *rq, struct task_struct *t)
+static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
 {
 	unsigned long long now = rq_clock(rq), delta = 0;
 
@@ -172,7 +172,7 @@ static inline void sched_info_dequeued(s
 	sched_info_reset_dequeued(t);
 	t->sched_info.run_delay += delta;
 
-	rq_sched_info_dequeued(rq, delta);
+	rq_sched_info_dequeue(rq, delta);
 }
 
 /*
@@ -197,9 +197,9 @@ static void sched_info_arrive(struct rq
 /*
  * This function is only called from enqueue_task(), but also only updates
  * the timestamp if it is already not set.  It's assumed that
- * sched_info_dequeued() will clear that stamp when appropriate.
+ * sched_info_dequeue() will clear that stamp when appropriate.
  */
-static inline void sched_info_queued(struct rq *rq, struct task_struct *t)
+static inline void sched_info_enqueue(struct rq *rq, struct task_struct *t)
 {
 	if (sched_info_on()) {
 		if (!t->sched_info.last_queued)
@@ -212,7 +212,7 @@ static inline void sched_info_queued(str
  * due, typically, to expiring its time slice (this may also be called when
  * switching to the idle task).  Now we can calculate how long we ran.
  * Also, if the process is still in the TASK_RUNNING state, call
- * sched_info_queued() to mark that it has now again started waiting on
+ * sched_info_enqueue() to mark that it has now again started waiting on
  * the runqueue.
  */
 static inline void sched_info_depart(struct rq *rq, struct task_struct *t)
@@ -222,7 +222,7 @@ static inline void sched_info_depart(str
 	rq_sched_info_depart(rq, delta);
 
 	if (t->state == TASK_RUNNING)
-		sched_info_queued(rq, t);
+		sched_info_enqueue(rq, t);
 }
 
 /*
@@ -253,9 +253,9 @@ sched_info_switch(struct rq *rq, struct
 }
 
 #else /* !CONFIG_SCHED_INFO: */
-# define sched_info_queued(rq, t)	do { } while (0)
+# define sched_info_enqueue(rq, t)	do { } while (0)
 # define sched_info_reset_dequeued(t)	do { } while (0)
-# define sched_info_dequeued(rq, t)	do { } while (0)
+# define sched_info_dequeue(rq, t)	do { } while (0)
 # define sched_info_depart(rq, t)	do { } while (0)
 # define sched_info_arrive(rq, next)	do { } while (0)
 # define sched_info_switch(rq, t, next)	do { } while (0)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/6] sched: Simplify sched_info_on()
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
  2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
@ 2021-05-05 10:59 ` Peter Zijlstra
  2021-05-06 14:03   ` Johannes Weiner
  2021-05-12 11:10   ` Mel Gorman
  2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

The situation around sched_info is somewhat complicated, it is used by
sched_stats and delayacct and, indirectly, kvm.

If SCHEDSTATS=Y (but disabled by default) sched_info_on() is
unconditionally true -- this is the case for all distro kernel configs
I checked.

If for some reason SCHEDSTATS=N, but TASK_DELAY_ACCT=Y, then
sched_info_on() can return false when delayacct is disabled,
presumably because there would be no other users left; except kvm is.

Instead of complicating matters further by accurately accounting
sched_stat and kvm state, simply unconditionally enable when
SCHED_INFO=Y, matching the common distro case.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/stat.h |   10 ++--------
 kernel/delayacct.c         |    1 -
 kernel/sched/stats.h       |   37 ++++++++++---------------------------
 3 files changed, 12 insertions(+), 36 deletions(-)

--- a/include/linux/sched/stat.h
+++ b/include/linux/sched/stat.h
@@ -3,6 +3,7 @@
 #define _LINUX_SCHED_STAT_H
 
 #include <linux/percpu.h>
+#include <linux/kconfig.h>
 
 /*
  * Various counters maintained by the scheduler and fork(),
@@ -23,14 +24,7 @@ extern unsigned long nr_iowait_cpu(int c
 
 static inline int sched_info_on(void)
 {
-#ifdef CONFIG_SCHEDSTATS
-	return 1;
-#elif defined(CONFIG_TASK_DELAY_ACCT)
-	extern int delayacct_on;
-	return delayacct_on;
-#else
-	return 0;
-#endif
+	return IS_ENABLED(CONFIG_SCHED_INFO);
 }
 
 #ifdef CONFIG_SCHEDSTATS
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -15,7 +15,6 @@
 #include <linux/module.h>
 
 int delayacct_on __read_mostly = 1;	/* Delay accounting turned on/off */
-EXPORT_SYMBOL_GPL(delayacct_on);
 struct kmem_cache *delayacct_cache;
 
 static int __init delayacct_setup_disable(char *str)
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -150,11 +150,6 @@ static inline void psi_sched_switch(stru
 #endif /* CONFIG_PSI */
 
 #ifdef CONFIG_SCHED_INFO
-static inline void sched_info_reset_dequeued(struct task_struct *t)
-{
-	t->sched_info.last_queued = 0;
-}
-
 /*
  * We are interested in knowing how long it was from the *first* time a
  * task was queued to the time that it finally hit a CPU, we call this routine
@@ -163,13 +158,12 @@ static inline void sched_info_reset_dequ
  */
 static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
 {
-	unsigned long long now = rq_clock(rq), delta = 0;
+	unsigned long long delta = 0;
 
-	if (sched_info_on()) {
-		if (t->sched_info.last_queued)
-			delta = now - t->sched_info.last_queued;
+	if (t->sched_info.last_queued) {
+		delta = rq_clock(rq) - t->sched_info.last_queued;
+		t->sched_info.last_queued = 0;
 	}
-	sched_info_reset_dequeued(t);
 	t->sched_info.run_delay += delta;
 
 	rq_sched_info_dequeue(rq, delta);
@@ -184,9 +178,10 @@ static void sched_info_arrive(struct rq
 {
 	unsigned long long now = rq_clock(rq), delta = 0;
 
-	if (t->sched_info.last_queued)
+	if (t->sched_info.last_queued) {
 		delta = now - t->sched_info.last_queued;
-	sched_info_reset_dequeued(t);
+		t->sched_info.last_queued = 0;
+	}
 	t->sched_info.run_delay += delta;
 	t->sched_info.last_arrival = now;
 	t->sched_info.pcount++;
@@ -201,10 +196,8 @@ static void sched_info_arrive(struct rq
  */
 static inline void sched_info_enqueue(struct rq *rq, struct task_struct *t)
 {
-	if (sched_info_on()) {
-		if (!t->sched_info.last_queued)
-			t->sched_info.last_queued = rq_clock(rq);
-	}
+	if (!t->sched_info.last_queued)
+		t->sched_info.last_queued = rq_clock(rq);
 }
 
 /*
@@ -231,7 +224,7 @@ static inline void sched_info_depart(str
  * the idle task.)  We are only called when prev != next.
  */
 static inline void
-__sched_info_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next)
+sched_info_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next)
 {
 	/*
 	 * prev now departs the CPU.  It's not interesting to record
@@ -245,18 +238,8 @@ __sched_info_switch(struct rq *rq, struc
 		sched_info_arrive(rq, next);
 }
 
-static inline void
-sched_info_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next)
-{
-	if (sched_info_on())
-		__sched_info_switch(rq, prev, next);
-}
-
 #else /* !CONFIG_SCHED_INFO: */
 # define sched_info_enqueue(rq, t)	do { } while (0)
-# define sched_info_reset_dequeued(t)	do { } while (0)
 # define sched_info_dequeue(rq, t)	do { } while (0)
-# define sched_info_depart(rq, t)	do { } while (0)
-# define sched_info_arrive(rq, next)	do { } while (0)
 # define sched_info_switch(rq, t, next)	do { } while (0)
 #endif /* CONFIG_SCHED_INFO */



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-05-05 10:59 ` [PATCH 3/6] sched: Simplify sched_info_on() Peter Zijlstra
@ 2021-05-05 10:59 ` Peter Zijlstra
  2021-05-05 11:37   ` Paolo Bonzini
                     ` (3 more replies)
  2021-05-05 10:59 ` [PATCH 5/6] delayacct: Add static_branch in scheduler hooks Peter Zijlstra
                   ` (5 subsequent siblings)
  9 siblings, 4 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

AFAICT KVM only relies on SCHED_INFO. Nothing uses the p->delays data
that belongs to TASK_DELAY_ACCT.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kvm/Kconfig |    5 +----
 arch/x86/kvm/Kconfig   |    5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -20,8 +20,6 @@ if VIRTUALIZATION
 menuconfig KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
 	depends on OF
-	# for TASKSTATS/TASK_DELAY_ACCT:
-	depends on NET && MULTIUSER
 	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
 	select HAVE_KVM_CPU_RELAX_INTERCEPT
@@ -38,8 +36,7 @@ menuconfig KVM
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
-	select TASKSTATS
-	select TASK_DELAY_ACCT
+	select SCHED_INFO
 	help
 	  Support hosting virtualized guest machines.
 
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -22,8 +22,6 @@ config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM
 	depends on HIGH_RES_TIMERS
-	# for TASKSTATS/TASK_DELAY_ACCT:
-	depends on NET && MULTIUSER
 	depends on X86_LOCAL_APIC
 	select PREEMPT_NOTIFIERS
 	select MMU_NOTIFIER
@@ -36,8 +34,7 @@ config KVM
 	select KVM_ASYNC_PF
 	select USER_RETURN_NOTIFIER
 	select KVM_MMIO
-	select TASKSTATS
-	select TASK_DELAY_ACCT
+	select SCHED_INFO
 	select PERF_EVENTS
 	select HAVE_KVM_MSI
 	select HAVE_KVM_CPU_RELAX_INTERCEPT



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/6] delayacct: Add static_branch in scheduler hooks
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
@ 2021-05-05 10:59 ` Peter Zijlstra
  2021-05-06 14:05   ` Johannes Weiner
                     ` (2 more replies)
  2021-05-05 10:59 ` [PATCH 6/6] [RFC] delayacct: Default disabled Peter Zijlstra
                   ` (4 subsequent siblings)
  9 siblings, 3 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

Cheaper when delayacct is disabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/delayacct.h |    8 ++++++++
 kernel/delayacct.c        |    3 +++
 2 files changed, 11 insertions(+)

--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -58,8 +58,10 @@ struct task_delay_info {
 
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/jump_label.h>
 
 #ifdef CONFIG_TASK_DELAY_ACCT
+DECLARE_STATIC_KEY_TRUE(delayacct_key);
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern struct kmem_cache *delayacct_cache;
 extern void delayacct_init(void);
@@ -114,6 +116,9 @@ static inline void delayacct_tsk_free(st
 
 static inline void delayacct_blkio_start(void)
 {
+	if (!static_branch_likely(&delayacct_key))
+		return;
+
 	delayacct_set_flag(DELAYACCT_PF_BLKIO);
 	if (current->delays)
 		__delayacct_blkio_start();
@@ -121,6 +126,9 @@ static inline void delayacct_blkio_start
 
 static inline void delayacct_blkio_end(struct task_struct *p)
 {
+	if (!static_branch_likely(&delayacct_key))
+		return;
+
 	if (p->delays)
 		__delayacct_blkio_end(p);
 	delayacct_clear_flag(DELAYACCT_PF_BLKIO);
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -14,6 +14,7 @@
 #include <linux/delayacct.h>
 #include <linux/module.h>
 
+DEFINE_STATIC_KEY_TRUE(delayacct_key);
 int delayacct_on __read_mostly = 1;	/* Delay accounting turned on/off */
 struct kmem_cache *delayacct_cache;
 
@@ -28,6 +29,8 @@ void delayacct_init(void)
 {
 	delayacct_cache = KMEM_CACHE(task_delay_info, SLAB_PANIC|SLAB_ACCOUNT);
 	delayacct_tsk_init(&init_task);
+	if (!delayacct_on)
+		static_branch_disable(&delayacct_key);
 }
 
 void __delayacct_tsk_init(struct task_struct *tsk)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 6/6] [RFC] delayacct: Default disabled
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-05-05 10:59 ` [PATCH 5/6] delayacct: Add static_branch in scheduler hooks Peter Zijlstra
@ 2021-05-05 10:59 ` Peter Zijlstra
  2021-05-12 11:35   ` Mel Gorman
  2021-05-05 22:29 ` [PATCH 0/6] sched,delayacct: Some cleanups Balbir Singh
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-05 10:59 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

Assuming this stuff isn't actually used much; disable it by default
and avoid allocating and tracking the task_delay_info structure.

taskstats is changed to still report the regular sched and sched_info
and only skip the missing task_delay_info fields instead of not
reporting anything.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/accounting/delay-accounting.rst   |    8 ++++----
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 include/linux/delayacct.h                       |   16 ++++------------
 kernel/delayacct.c                              |   19 +++++++++++--------
 4 files changed, 20 insertions(+), 25 deletions(-)

--- a/Documentation/accounting/delay-accounting.rst
+++ b/Documentation/accounting/delay-accounting.rst
@@ -69,13 +69,13 @@ Usage
 	CONFIG_TASK_DELAY_ACCT=y
 	CONFIG_TASKSTATS=y
 
-Delay accounting is enabled by default at boot up.
-To disable, add::
+Delay accounting is disabled by default at boot up.
+To enable, add::
 
-   nodelayacct
+   delayacct
 
 to the kernel boot options. The rest of the instructions
-below assume this has not been done.
+below assume this has been done.
 
 After the system has booted up, use a utility
 similar to  getdelays.c to access the delays
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3213,7 +3213,7 @@
 
 	noclflush	[BUGS=X86] Don't use the CLFLUSH instruction
 
-	nodelayacct	[KNL] Disable per-task delay accounting
+	delayacct	[KNL] Enable per-task delay accounting
 
 	nodsp		[SH] Disable hardware DSP at boot time.
 
--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -61,7 +61,7 @@ struct task_delay_info {
 #include <linux/jump_label.h>
 
 #ifdef CONFIG_TASK_DELAY_ACCT
-DECLARE_STATIC_KEY_TRUE(delayacct_key);
+DECLARE_STATIC_KEY_FALSE(delayacct_key);
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern struct kmem_cache *delayacct_cache;
 extern void delayacct_init(void);
@@ -69,7 +69,7 @@ extern void __delayacct_tsk_init(struct
 extern void __delayacct_tsk_exit(struct task_struct *);
 extern void __delayacct_blkio_start(void);
 extern void __delayacct_blkio_end(struct task_struct *);
-extern int __delayacct_add_tsk(struct taskstats *, struct task_struct *);
+extern int delayacct_add_tsk(struct taskstats *, struct task_struct *);
 extern __u64 __delayacct_blkio_ticks(struct task_struct *);
 extern void __delayacct_freepages_start(void);
 extern void __delayacct_freepages_end(void);
@@ -116,7 +116,7 @@ static inline void delayacct_tsk_free(st
 
 static inline void delayacct_blkio_start(void)
 {
-	if (!static_branch_likely(&delayacct_key))
+	if (!static_branch_unlikely(&delayacct_key))
 		return;
 
 	delayacct_set_flag(DELAYACCT_PF_BLKIO);
@@ -126,7 +126,7 @@ static inline void delayacct_blkio_start
 
 static inline void delayacct_blkio_end(struct task_struct *p)
 {
-	if (!static_branch_likely(&delayacct_key))
+	if (!static_branch_unlikely(&delayacct_key))
 		return;
 
 	if (p->delays)
@@ -134,14 +134,6 @@ static inline void delayacct_blkio_end(s
 	delayacct_clear_flag(DELAYACCT_PF_BLKIO);
 }
 
-static inline int delayacct_add_tsk(struct taskstats *d,
-					struct task_struct *tsk)
-{
-	if (!delayacct_on || !tsk->delays)
-		return 0;
-	return __delayacct_add_tsk(d, tsk);
-}
-
 static inline __u64 delayacct_blkio_ticks(struct task_struct *tsk)
 {
 	if (tsk->delays)
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -14,23 +14,23 @@
 #include <linux/delayacct.h>
 #include <linux/module.h>
 
-DEFINE_STATIC_KEY_TRUE(delayacct_key);
-int delayacct_on __read_mostly = 1;	/* Delay accounting turned on/off */
+DEFINE_STATIC_KEY_FALSE(delayacct_key);
+int delayacct_on __read_mostly;	/* Delay accounting turned on/off */
 struct kmem_cache *delayacct_cache;
 
-static int __init delayacct_setup_disable(char *str)
+static int __init delayacct_setup_enable(char *str)
 {
-	delayacct_on = 0;
+	delayacct_on = 1;
 	return 1;
 }
-__setup("nodelayacct", delayacct_setup_disable);
+__setup("delayacct", delayacct_setup_enable);
 
 void delayacct_init(void)
 {
 	delayacct_cache = KMEM_CACHE(task_delay_info, SLAB_PANIC|SLAB_ACCOUNT);
 	delayacct_tsk_init(&init_task);
-	if (!delayacct_on)
-		static_branch_disable(&delayacct_key);
+	if (delayacct_on)
+		static_branch_enable(&delayacct_key);
 }
 
 void __delayacct_tsk_init(struct task_struct *tsk)
@@ -83,7 +83,7 @@ void __delayacct_blkio_end(struct task_s
 	delayacct_end(&delays->lock, &delays->blkio_start, total, count);
 }
 
-int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
+int delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 {
 	u64 utime, stime, stimescaled, utimescaled;
 	unsigned long long t2, t3;
@@ -118,6 +118,9 @@ int __delayacct_add_tsk(struct taskstats
 	d->cpu_run_virtual_total =
 		(tmp < (s64)d->cpu_run_virtual_total) ?	0 : tmp;
 
+	if (!tsk->delays)
+		return 0;
+
 	/* zero XXX_total, non-zero XXX_count implies XXX stat overflowed */
 
 	raw_spin_lock_irqsave(&tsk->delays->lock, flags);



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT
  2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
@ 2021-05-05 11:37   ` Paolo Bonzini
  2021-05-06 14:38   ` Marc Zyngier
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2021-05-05 11:37 UTC (permalink / raw)
  To: Peter Zijlstra, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	bsingharora, maz
  Cc: linux-kernel, kvm, riel, hannes

On 05/05/21 12:59, Peter Zijlstra wrote:
> AFAICT KVM only relies on SCHED_INFO. Nothing uses the p->delays data
> that belongs to TASK_DELAY_ACCT.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Right, SCHED_INFO didn't exist at the time (it was introduced in 2015, 
while KVM started using run_delay in 2011).  I'm not sure if it could 
have used SCHEDSTATS instead.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> ---
>   arch/arm64/kvm/Kconfig |    5 +----
>   arch/x86/kvm/Kconfig   |    5 +----
>   2 files changed, 2 insertions(+), 8 deletions(-)
> 
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -20,8 +20,6 @@ if VIRTUALIZATION
>   menuconfig KVM
>   	bool "Kernel-based Virtual Machine (KVM) support"
>   	depends on OF
> -	# for TASKSTATS/TASK_DELAY_ACCT:
> -	depends on NET && MULTIUSER
>   	select MMU_NOTIFIER
>   	select PREEMPT_NOTIFIERS
>   	select HAVE_KVM_CPU_RELAX_INTERCEPT
> @@ -38,8 +36,7 @@ menuconfig KVM
>   	select IRQ_BYPASS_MANAGER
>   	select HAVE_KVM_IRQ_BYPASS
>   	select HAVE_KVM_VCPU_RUN_PID_CHANGE
> -	select TASKSTATS
> -	select TASK_DELAY_ACCT
> +	select SCHED_INFO
>   	help
>   	  Support hosting virtualized guest machines.
>   
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -22,8 +22,6 @@ config KVM
>   	tristate "Kernel-based Virtual Machine (KVM) support"
>   	depends on HAVE_KVM
>   	depends on HIGH_RES_TIMERS
> -	# for TASKSTATS/TASK_DELAY_ACCT:
> -	depends on NET && MULTIUSER
>   	depends on X86_LOCAL_APIC
>   	select PREEMPT_NOTIFIERS
>   	select MMU_NOTIFIER
> @@ -36,8 +34,7 @@ config KVM
>   	select KVM_ASYNC_PF
>   	select USER_RETURN_NOTIFIER
>   	select KVM_MMIO
> -	select TASKSTATS
> -	select TASK_DELAY_ACCT
> +	select SCHED_INFO
>   	select PERF_EVENTS
>   	select HAVE_KVM_MSI
>   	select HAVE_KVM_CPU_RELAX_INTERCEPT
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/6] sched: Rename sched_info_{queued,dequeued}
  2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
@ 2021-05-05 14:39   ` Rik van Riel
  2021-05-06 13:59   ` Johannes Weiner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2021-05-05 14:39 UTC (permalink / raw)
  To: Peter Zijlstra, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, hannes

[-- Attachment #1: Type: text/plain, Size: 284 bytes --]

On Wed, 2021-05-05 at 12:59 +0200, Peter Zijlstra wrote:
> For consistency, rename {queued,dequeued} to {enqueue,dequeue}.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Ohhh nice.

Reviewed-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
@ 2021-05-05 14:40   ` Rik van Riel
  2021-05-06 13:59   ` Johannes Weiner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2021-05-05 14:40 UTC (permalink / raw)
  To: Peter Zijlstra, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, hannes

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

On Wed, 2021-05-05 at 12:59 +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Looks like this works even on messed up systems because
the delayacct code is called from the same CPU at sleep
time and wakeup time, before a task is migrated.

Reviewed-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-05-05 10:59 ` [PATCH 6/6] [RFC] delayacct: Default disabled Peter Zijlstra
@ 2021-05-05 22:29 ` Balbir Singh
  2021-05-06  9:13   ` Peter Zijlstra
  2021-05-07  9:05 ` Thomas Gleixner
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Balbir Singh @ 2021-05-05 22:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:40PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> Due to:
> 
>   https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com
> 
> and general principle, delayacct really shouldn't be using ktime (pvclock also
> really shouldn't be doing what it does, but that's another story). This lead me
> to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
> 
> The rest of the patches are an attempt at simplifying all that a little. All
> that crud is enabled by default for distros which is leading to a death by a
> thousand cuts.
> 
> The last patch is an attempt at default disabling DELAYACCT, because I don't
> think anybody actually uses that much, but what do I know, there were no ill
> effects on my testbox. Perhaps we should mirror
> /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> frobbing.
>

There are tools like iotop that use delayacct to display information. When the
code was checked in, we did run SPEC* back in the day 2006 to find overheads,
nothing significant showed. Do we have any date on the overhead your seeing?

I'll look at the patches

Balbir Singh. 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-05 22:29 ` [PATCH 0/6] sched,delayacct: Some cleanups Balbir Singh
@ 2021-05-06  9:13   ` Peter Zijlstra
  2021-05-07 12:38     ` Balbir Singh
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-06  9:13 UTC (permalink / raw)
  To: Balbir Singh
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Thu, May 06, 2021 at 08:29:40AM +1000, Balbir Singh wrote:
> On Wed, May 05, 2021 at 12:59:40PM +0200, Peter Zijlstra wrote:
> > Hi,
> > 
> > Due to:
> > 
> >   https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com
> > 
> > and general principle, delayacct really shouldn't be using ktime (pvclock also
> > really shouldn't be doing what it does, but that's another story). This lead me
> > to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
> > 
> > The rest of the patches are an attempt at simplifying all that a little. All
> > that crud is enabled by default for distros which is leading to a death by a
> > thousand cuts.
> > 
> > The last patch is an attempt at default disabling DELAYACCT, because I don't
> > think anybody actually uses that much, but what do I know, there were no ill
> > effects on my testbox. Perhaps we should mirror
> > /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> > frobbing.
> >
> 
> There are tools like iotop that use delayacct to display information. 

Right, but how many actual people use that? Does that justify saddling
the whole sodding world with the overhead?

> When the
> code was checked in, we did run SPEC* back in the day 2006 to find overheads,
> nothing significant showed. Do we have any date on the overhead your seeing?

I've not looked, but having it disabled saves that per-task allocation
and that spinlock in delayacct_end() for iowait wakeups and a bunch of
cache misses ofcourse.

I doubt SPEC is a benchmark that tickles those paths much if at all.

The thing is; we can't just keep growing more and more stats, that'll
kill us quite dead.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
  2021-05-05 14:40   ` Rik van Riel
@ 2021-05-06 13:59   ` Johannes Weiner
  2021-05-06 14:17     ` Peter Zijlstra
  2021-05-07 12:40   ` Balbir Singh
  2021-05-12 10:43   ` Mel Gorman
  3 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2021-05-06 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel

On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
>   * Finish delay accounting for a statistic using its timestamps (@start),
>   * accumalator (@total) and @count
>   */
> -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> -			  u32 *count)
> +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
>  {
> -	s64 ns = ktime_get_ns() - *start;
> +	s64 ns = local_clock() - *start;

I don't think this is safe. These time sections that have preemption
and migration enabled and so might span multiple CPUs. local_clock()
could end up behind *start, AFAICS.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/6] sched: Rename sched_info_{queued,dequeued}
  2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
  2021-05-05 14:39   ` Rik van Riel
@ 2021-05-06 13:59   ` Johannes Weiner
  2021-05-10  8:45   ` Balbir Singh
  2021-05-12 10:49   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2021-05-06 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel

On Wed, May 05, 2021 at 12:59:42PM +0200, Peter Zijlstra wrote:
> For consistency, rename {queued,dequeued} to {enqueue,dequeue}.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/6] sched: Simplify sched_info_on()
  2021-05-05 10:59 ` [PATCH 3/6] sched: Simplify sched_info_on() Peter Zijlstra
@ 2021-05-06 14:03   ` Johannes Weiner
  2021-05-12 11:10   ` Mel Gorman
  1 sibling, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2021-05-06 14:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel

On Wed, May 05, 2021 at 12:59:43PM +0200, Peter Zijlstra wrote:
> The situation around sched_info is somewhat complicated, it is used by
> sched_stats and delayacct and, indirectly, kvm.
> 
> If SCHEDSTATS=Y (but disabled by default) sched_info_on() is
> unconditionally true -- this is the case for all distro kernel configs
> I checked.
> 
> If for some reason SCHEDSTATS=N, but TASK_DELAY_ACCT=Y, then
> sched_info_on() can return false when delayacct is disabled,
> presumably because there would be no other users left; except kvm is.
> 
> Instead of complicating matters further by accurately accounting
> sched_stat and kvm state, simply unconditionally enable when
> SCHED_INFO=Y, matching the common distro case.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 5/6] delayacct: Add static_branch in scheduler hooks
  2021-05-05 10:59 ` [PATCH 5/6] delayacct: Add static_branch in scheduler hooks Peter Zijlstra
@ 2021-05-06 14:05   ` Johannes Weiner
  2021-05-10  8:42   ` Balbir Singh
  2021-05-12 11:13   ` Mel Gorman
  2 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2021-05-06 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel

On Wed, May 05, 2021 at 12:59:45PM +0200, Peter Zijlstra wrote:
> Cheaper when delayacct is disabled.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-06 13:59   ` Johannes Weiner
@ 2021-05-06 14:17     ` Peter Zijlstra
  2021-05-06 15:17       ` Johannes Weiner
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-06 14:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel

On Thu, May 06, 2021 at 09:59:11AM -0400, Johannes Weiner wrote:
> On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> > @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
> >   * Finish delay accounting for a statistic using its timestamps (@start),
> >   * accumalator (@total) and @count
> >   */
> > -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> > -			  u32 *count)
> > +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
> >  {
> > -	s64 ns = ktime_get_ns() - *start;
> > +	s64 ns = local_clock() - *start;
> 
> I don't think this is safe. These time sections that have preemption
> and migration enabled and so might span multiple CPUs. local_clock()
> could end up behind *start, AFAICS.

Only if you have really crummy hardware, and in that case the drift is
bounded by around 1 tick. Also, this function actually checks: ns > 0.

And if you do have crummy hardware like that, ktime_get_ns() is the very
last thing you want to call at any frequency because it'll be the HPET.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT
  2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
  2021-05-05 11:37   ` Paolo Bonzini
@ 2021-05-06 14:38   ` Marc Zyngier
  2021-05-07 12:42   ` Balbir Singh
  2021-05-12 11:11   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Marc Zyngier @ 2021-05-06 14:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini,
	linux-kernel, kvm, riel, hannes

On Wed, 05 May 2021 11:59:44 +0100,
Peter Zijlstra <peterz@infradead.org> wrote:
> 
> AFAICT KVM only relies on SCHED_INFO. Nothing uses the p->delays data
> that belongs to TASK_DELAY_ACCT.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/kvm/Kconfig |    5 +----
>  arch/x86/kvm/Kconfig   |    5 +----
>  2 files changed, 2 insertions(+), 8 deletions(-)

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-06 14:17     ` Peter Zijlstra
@ 2021-05-06 15:17       ` Johannes Weiner
  0 siblings, 0 replies; 43+ messages in thread
From: Johannes Weiner @ 2021-05-06 15:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel

On Thu, May 06, 2021 at 04:17:33PM +0200, Peter Zijlstra wrote:
> On Thu, May 06, 2021 at 09:59:11AM -0400, Johannes Weiner wrote:
> > On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> > > @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
> > >   * Finish delay accounting for a statistic using its timestamps (@start),
> > >   * accumalator (@total) and @count
> > >   */
> > > -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> > > -			  u32 *count)
> > > +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
> > >  {
> > > -	s64 ns = ktime_get_ns() - *start;
> > > +	s64 ns = local_clock() - *start;
> > 
> > I don't think this is safe. These time sections that have preemption
> > and migration enabled and so might span multiple CPUs. local_clock()
> > could end up behind *start, AFAICS.
> 
> Only if you have really crummy hardware, and in that case the drift is
> bounded by around 1 tick. Also, this function actually checks: ns > 0.

Oh, I didn't realize it was that close. I just went off the dramatic
warnings on cpu_clock() :-) But yeah, that seems plenty accurate for
this purpose.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-05-05 22:29 ` [PATCH 0/6] sched,delayacct: Some cleanups Balbir Singh
@ 2021-05-07  9:05 ` Thomas Gleixner
  2021-05-10  7:08 ` Ingo Molnar
  2021-05-10 12:05 ` [PATCH 7/6] delayacct: Add sysctl to enable at runtime Peter Zijlstra
  9 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2021-05-07  9:05 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot,
	bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, peterz, riel, hannes

On Wed, May 05 2021 at 12:59, Peter Zijlstra wrote:
> Due to:
>
>   https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com
>
> and general principle, delayacct really shouldn't be using ktime (pvclock also
> really shouldn't be doing what it does, but that's another story). This lead me
> to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
>
> The rest of the patches are an attempt at simplifying all that a little. All
> that crud is enabled by default for distros which is leading to a death by a
> thousand cuts.
>
> The last patch is an attempt at default disabling DELAYACCT, because I don't
> think anybody actually uses that much, but what do I know, there were no ill
> effects on my testbox. Perhaps we should mirror
> /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> frobbing.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-06  9:13   ` Peter Zijlstra
@ 2021-05-07 12:38     ` Balbir Singh
  2021-05-12 11:34       ` Mel Gorman
  0 siblings, 1 reply; 43+ messages in thread
From: Balbir Singh @ 2021-05-07 12:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Thu, May 06, 2021 at 11:13:52AM +0200, Peter Zijlstra wrote:
> On Thu, May 06, 2021 at 08:29:40AM +1000, Balbir Singh wrote:
> > On Wed, May 05, 2021 at 12:59:40PM +0200, Peter Zijlstra wrote:
> > > Hi,
> > > 
> > > Due to:
> > > 
> > >   https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com
> > > 
> > > and general principle, delayacct really shouldn't be using ktime (pvclock also
> > > really shouldn't be doing what it does, but that's another story). This lead me
> > > to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
> > > 
> > > The rest of the patches are an attempt at simplifying all that a little. All
> > > that crud is enabled by default for distros which is leading to a death by a
> > > thousand cuts.
> > > 
> > > The last patch is an attempt at default disabling DELAYACCT, because I don't
> > > think anybody actually uses that much, but what do I know, there were no ill
> > > effects on my testbox. Perhaps we should mirror
> > > /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> > > frobbing.
> > >
> > 
> > There are tools like iotop that use delayacct to display information. 
> 
> Right, but how many actual people use that? Does that justify saddling
> the whole sodding world with the overhead?
>

Not sure I have that data.
 
> > When the
> > code was checked in, we did run SPEC* back in the day 2006 to find overheads,
> > nothing significant showed. Do we have any date on the overhead your seeing?
> 
> I've not looked, but having it disabled saves that per-task allocation
> and that spinlock in delayacct_end() for iowait wakeups and a bunch of
> cache misses ofcourse.
> 
> I doubt SPEC is a benchmark that tickles those paths much if at all.
> 
> The thing is; we can't just keep growing more and more stats, that'll
> kill us quite dead.


I don't disagree, we've had these around for a while and I know of users
that use these stats to find potential starvation. I am OK with default
disabled. I suspect distros will have the final say.

Balbir Singh.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
  2021-05-05 14:40   ` Rik van Riel
  2021-05-06 13:59   ` Johannes Weiner
@ 2021-05-07 12:40   ` Balbir Singh
  2021-05-12 10:43   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Balbir Singh @ 2021-05-07 12:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Goind by your comment about preemption safety not being a concern
the patch looks good.

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT
  2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
  2021-05-05 11:37   ` Paolo Bonzini
  2021-05-06 14:38   ` Marc Zyngier
@ 2021-05-07 12:42   ` Balbir Singh
  2021-05-12 11:11   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Balbir Singh @ 2021-05-07 12:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:44PM +0200, Peter Zijlstra wrote:
> AFAICT KVM only relies on SCHED_INFO. Nothing uses the p->delays data
> that belongs to TASK_DELAY_ACCT.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (7 preceding siblings ...)
  2021-05-07  9:05 ` Thomas Gleixner
@ 2021-05-10  7:08 ` Ingo Molnar
  2021-05-10 12:05 ` [PATCH 7/6] delayacct: Add sysctl to enable at runtime Peter Zijlstra
  9 siblings, 0 replies; 43+ messages in thread
From: Ingo Molnar @ 2021-05-10  7:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes


* Peter Zijlstra <peterz@infradead.org> wrote:

> Hi,
> 
> Due to:
> 
>   https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com
> 
> and general principle, delayacct really shouldn't be using ktime (pvclock also
> really shouldn't be doing what it does, but that's another story). This lead me
> to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
> 
> The rest of the patches are an attempt at simplifying all that a little. All
> that crud is enabled by default for distros which is leading to a death by a
> thousand cuts.
> 
> The last patch is an attempt at default disabling DELAYACCT, because I don't
> think anybody actually uses that much, but what do I know, there were no ill
> effects on my testbox. Perhaps we should mirror
> /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> frobbing.

Reviewed-by: Ingo Molnar <mingo@kernel.org>

(This includes the #6 RFC patch.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 5/6] delayacct: Add static_branch in scheduler hooks
  2021-05-05 10:59 ` [PATCH 5/6] delayacct: Add static_branch in scheduler hooks Peter Zijlstra
  2021-05-06 14:05   ` Johannes Weiner
@ 2021-05-10  8:42   ` Balbir Singh
  2021-05-12 11:13   ` Mel Gorman
  2 siblings, 0 replies; 43+ messages in thread
From: Balbir Singh @ 2021-05-10  8:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:45PM +0200, Peter Zijlstra wrote:
> Cheaper when delayacct is disabled.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Acked-by: Balbir Singh <bsingharora@gmail.com> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/6] sched: Rename sched_info_{queued,dequeued}
  2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
  2021-05-05 14:39   ` Rik van Riel
  2021-05-06 13:59   ` Johannes Weiner
@ 2021-05-10  8:45   ` Balbir Singh
  2021-05-12 10:49   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Balbir Singh @ 2021-05-10  8:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, pbonzini, maz, linux-kernel,
	kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:42PM +0200, Peter Zijlstra wrote:
> For consistency, rename {queued,dequeued} to {enqueue,dequeue}.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 7/6] delayacct: Add sysctl to enable at runtime
  2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
                   ` (8 preceding siblings ...)
  2021-05-10  7:08 ` Ingo Molnar
@ 2021-05-10 12:05 ` Peter Zijlstra
  2021-05-10 12:06   ` Peter Zijlstra
  2021-05-12 11:40   ` Mel Gorman
  9 siblings, 2 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-10 12:05 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, riel, hannes


Just like sched_schedstats, allow runtime enabling (and disabling) of
delayacct. This is useful if one forgot to add the delayacct boot time
option.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/accounting/delay-accounting.rst |    6 ++--
 kernel/delayacct.c                            |   36 ++++++++++++++++++++++++--
 kernel/sysctl.c                               |   12 ++++++++
 3 files changed, 50 insertions(+), 4 deletions(-)

--- a/Documentation/accounting/delay-accounting.rst
+++ b/Documentation/accounting/delay-accounting.rst
@@ -74,8 +74,10 @@ Delay accounting is disabled by default
 
    delayacct
 
-to the kernel boot options. The rest of the instructions
-below assume this has been done.
+to the kernel boot options. The rest of the instructions below assume this has
+been done. Alternatively, use sysctl kernel.sched_delayacct to switch the state
+at runtime. Note however that only tasks started after enabling it will have
+delayacct information.
 
 After the system has booted up, use a utility
 similar to  getdelays.c to access the delays
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -18,6 +18,17 @@ DEFINE_STATIC_KEY_FALSE(delayacct_key);
 int delayacct_on __read_mostly;	/* Delay accounting turned on/off */
 struct kmem_cache *delayacct_cache;
 
+static void set_delayacct(bool enabled)
+{
+	if (enabled) {
+		static_branch_enable(&delayacct_key);
+		delayacct_on = 1;
+	} else {
+		delayacct_on = 0;
+		static_branch_disable(&delayacct_key);
+	}
+}
+
 static int __init delayacct_setup_enable(char *str)
 {
 	delayacct_on = 1;
@@ -29,9 +40,30 @@ void delayacct_init(void)
 {
 	delayacct_cache = KMEM_CACHE(task_delay_info, SLAB_PANIC|SLAB_ACCOUNT);
 	delayacct_tsk_init(&init_task);
-	if (delayacct_on)
-		static_branch_enable(&delayacct_key);
+	set_delayacct(delayacct_on);
+}
+
+#ifdef CONFIG_PROC_SYSCTL
+int sysctl_delayacct(struct ctl_table *table, int write, void *buffer,
+		     size_t *lenp, loff_t *ppos)
+{
+	int state = delayacct_on;
+	struct ctl_table t;
+	int err;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	t = *table;
+	t.data = &state;
+	err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+	if (write)
+		set_delayacct(state);
+	return err;
 }
+#endif
 
 void __delayacct_tsk_init(struct task_struct *tsk)
 {
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -71,6 +71,7 @@
 #include <linux/coredump.h>
 #include <linux/latencytop.h>
 #include <linux/pid.h>
+#include <linux/delayacct.h>
 
 #include "../lib/kstrtox.h"
 
@@ -1727,6 +1728,17 @@ static struct ctl_table kern_table[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 #endif /* CONFIG_SCHEDSTATS */
+#ifdef CONFIG_TASK_DELAY_ACCT
+	{
+		.procname	= "sched_delayacct",
+		.data		= NULL,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sysctl_delayacct,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+#endif /* CONFIG_TASK_DELAY_ACCT */
 #ifdef CONFIG_NUMA_BALANCING
 	{
 		.procname	= "numa_balancing",

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 7/6] delayacct: Add sysctl to enable at runtime
  2021-05-10 12:05 ` [PATCH 7/6] delayacct: Add sysctl to enable at runtime Peter Zijlstra
@ 2021-05-10 12:06   ` Peter Zijlstra
  2021-05-12 11:40   ` Mel Gorman
  1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-10 12:06 UTC (permalink / raw)
  To: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, bsingharora, pbonzini, maz
  Cc: linux-kernel, kvm, riel, hannes

On Mon, May 10, 2021 at 02:05:13PM +0200, Peter Zijlstra wrote:
> 
> Just like sched_schedstats, allow runtime enabling (and disabling) of
> delayacct. This is useful if one forgot to add the delayacct boot time
> option.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/accounting/delay-accounting.rst |    6 ++--
>  kernel/delayacct.c                            |   36 ++++++++++++++++++++++++--
>  kernel/sysctl.c                               |   12 ++++++++
>  3 files changed, 50 insertions(+), 4 deletions(-)

*sigh*, the below hunk went missing..


--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -65,6 +65,10 @@ DECLARE_STATIC_KEY_FALSE(delayacct_key);
 extern int delayacct_on;	/* Delay accounting turned on/off */
 extern struct kmem_cache *delayacct_cache;
 extern void delayacct_init(void);
+
+extern int sysctl_delayacct(struct ctl_table *table, int write, void *buffer,
+			    size_t *lenp, loff_t *ppos);
+
 extern void __delayacct_tsk_init(struct task_struct *);
 extern void __delayacct_tsk_exit(struct task_struct *);
 extern void __delayacct_blkio_start(void);

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/6] delayacct: Use sched_clock()
  2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
                     ` (2 preceding siblings ...)
  2021-05-07 12:40   ` Balbir Singh
@ 2021-05-12 10:43   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 10:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 2/6] sched: Rename sched_info_{queued,dequeued}
  2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
                     ` (2 preceding siblings ...)
  2021-05-10  8:45   ` Balbir Singh
@ 2021-05-12 10:49   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 10:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:42PM +0200, Peter Zijlstra wrote:
> For consistency, rename {queued,dequeued} to {enqueue,dequeue}.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/6] sched: Simplify sched_info_on()
  2021-05-05 10:59 ` [PATCH 3/6] sched: Simplify sched_info_on() Peter Zijlstra
  2021-05-06 14:03   ` Johannes Weiner
@ 2021-05-12 11:10   ` Mel Gorman
  2021-05-12 11:32     ` Peter Zijlstra
  1 sibling, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 11:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:43PM +0200, Peter Zijlstra wrote:
> The situation around sched_info is somewhat complicated, it is used by
> sched_stats and delayacct and, indirectly, kvm.
> 
> If SCHEDSTATS=Y (but disabled by default) sched_info_on() is
> unconditionally true -- this is the case for all distro kernel configs
> I checked.
> 
> If for some reason SCHEDSTATS=N, but TASK_DELAY_ACCT=Y, then
> sched_info_on() can return false when delayacct is disabled,
> presumably because there would be no other users left; except kvm is.
> 
> Instead of complicating matters further by accurately accounting
> sched_stat and kvm state, simply unconditionally enable when
> SCHED_INFO=Y, matching the common distro case.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> @@ -163,13 +158,12 @@ static inline void sched_info_reset_dequ
>   */
>  static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
>  {
> -	unsigned long long now = rq_clock(rq), delta = 0;
> +	unsigned long long delta = 0;
>  
> -	if (sched_info_on()) {
> -		if (t->sched_info.last_queued)
> -			delta = now - t->sched_info.last_queued;
> +	if (t->sched_info.last_queued) {
> +		delta = rq_clock(rq) - t->sched_info.last_queued;
> +		t->sched_info.last_queued = 0;
>  	}
> -	sched_info_reset_dequeued(t);
>  	t->sched_info.run_delay += delta;
>  
>  	rq_sched_info_dequeue(rq, delta);

As delta is !0 iff t->sched_info.last_queued, why not this?

diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 33ffd41935ba..37e33c0eeb7c 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -158,15 +158,14 @@ static inline void psi_sched_switch(struct task_struct *prev,
  */
 static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
 {
-	unsigned long long delta = 0;
-
 	if (t->sched_info.last_queued) {
+		unsigned long long delta;
+
 		delta = rq_clock(rq) - t->sched_info.last_queued;
 		t->sched_info.last_queued = 0;
+		t->sched_info.run_delay += delta;
+		rq_sched_info_dequeue(rq, delta);
 	}
-	t->sched_info.run_delay += delta;
-
-	rq_sched_info_dequeue(rq, delta);
 }
 
 /*

> @@ -184,9 +178,10 @@ static void sched_info_arrive(struct rq
>  {
>  	unsigned long long now = rq_clock(rq), delta = 0;
>  
> -	if (t->sched_info.last_queued)
> +	if (t->sched_info.last_queued) {
>  		delta = now - t->sched_info.last_queued;
> -	sched_info_reset_dequeued(t);
> +		t->sched_info.last_queued = 0;
> +	}
>  	t->sched_info.run_delay += delta;
>  	t->sched_info.last_arrival = now;
>  	t->sched_info.pcount++;

Similarly

@@ -176,17 +175,18 @@ static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
  */
 static void sched_info_arrive(struct rq *rq, struct task_struct *t)
 {
-	unsigned long long now = rq_clock(rq), delta = 0;
+	unsigned long long now = rq_clock(rq);
 
 	if (t->sched_info.last_queued) {
+		unsigned long long delta;
+
 		delta = now - t->sched_info.last_queued;
 		t->sched_info.last_queued = 0;
+		t->sched_info.run_delay += delta;
+		rq_sched_info_arrive(rq, delta);
 	}
-	t->sched_info.run_delay += delta;
 	t->sched_info.last_arrival = now;
 	t->sched_info.pcount++;
-
-	rq_sched_info_arrive(rq, delta);
 }
 
 /*

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT
  2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
                     ` (2 preceding siblings ...)
  2021-05-07 12:42   ` Balbir Singh
@ 2021-05-12 11:11   ` Mel Gorman
  3 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:44PM +0200, Peter Zijlstra wrote:
> AFAICT KVM only relies on SCHED_INFO. Nothing uses the p->delays data
> that belongs to TASK_DELAY_ACCT.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 5/6] delayacct: Add static_branch in scheduler hooks
  2021-05-05 10:59 ` [PATCH 5/6] delayacct: Add static_branch in scheduler hooks Peter Zijlstra
  2021-05-06 14:05   ` Johannes Weiner
  2021-05-10  8:42   ` Balbir Singh
@ 2021-05-12 11:13   ` Mel Gorman
  2 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 11:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:45PM +0200, Peter Zijlstra wrote:
> Cheaper when delayacct is disabled.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/6] sched: Simplify sched_info_on()
  2021-05-12 11:10   ` Mel Gorman
@ 2021-05-12 11:32     ` Peter Zijlstra
  2021-05-12 12:51       ` Mel Gorman
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-12 11:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 12, 2021 at 12:10:40PM +0100, Mel Gorman wrote:
> As delta is !0 iff t->sched_info.last_queued, why not this?
> 
> diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
> index 33ffd41935ba..37e33c0eeb7c 100644
> --- a/kernel/sched/stats.h
> +++ b/kernel/sched/stats.h
> @@ -158,15 +158,14 @@ static inline void psi_sched_switch(struct task_struct *prev,
>   */
>  static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
>  {
> -	unsigned long long delta = 0;
> -
>  	if (t->sched_info.last_queued) {
> +		unsigned long long delta;
> +
>  		delta = rq_clock(rq) - t->sched_info.last_queued;
>  		t->sched_info.last_queued = 0;
> +		t->sched_info.run_delay += delta;
> +		rq_sched_info_dequeue(rq, delta);
>  	}
> -	t->sched_info.run_delay += delta;
> -
> -	rq_sched_info_dequeue(rq, delta);
>  }

Right.. clearly I missed the obvious there.. Lemme go add another patch
on top of the lot.

Something like this I suppose.

---
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 33ffd41935ba..111072ee9663 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -160,10 +160,11 @@ static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
 {
 	unsigned long long delta = 0;
 
-	if (t->sched_info.last_queued) {
-		delta = rq_clock(rq) - t->sched_info.last_queued;
-		t->sched_info.last_queued = 0;
-	}
+	if (!t->sched_info.last_queued)
+		return;
+
+	delta = rq_clock(rq) - t->sched_info.last_queued;
+	t->sched_info.last_queued = 0;
 	t->sched_info.run_delay += delta;
 
 	rq_sched_info_dequeue(rq, delta);
@@ -176,12 +177,14 @@ static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
  */
 static void sched_info_arrive(struct rq *rq, struct task_struct *t)
 {
-	unsigned long long now = rq_clock(rq), delta = 0;
+	unsigned long long now, delta = 0;
 
-	if (t->sched_info.last_queued) {
-		delta = now - t->sched_info.last_queued;
-		t->sched_info.last_queued = 0;
-	}
+	if (!t->sched_info.last_queued)
+		return;
+
+	now = rq_clock(rq);
+	delta = now - t->sched_info.last_queued;
+	t->sched_info.last_queued = 0;
 	t->sched_info.run_delay += delta;
 	t->sched_info.last_arrival = now;
 	t->sched_info.pcount++;



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-07 12:38     ` Balbir Singh
@ 2021-05-12 11:34       ` Mel Gorman
  2021-05-12 11:38         ` Peter Zijlstra
                           ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 11:34 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Peter Zijlstra, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, pbonzini, maz,
	linux-kernel, kvm, riel, hannes, Paul Wise

On Fri, May 07, 2021 at 10:38:10PM +1000, Balbir Singh wrote:
> On Thu, May 06, 2021 at 11:13:52AM +0200, Peter Zijlstra wrote:
> > On Thu, May 06, 2021 at 08:29:40AM +1000, Balbir Singh wrote:
> > > On Wed, May 05, 2021 at 12:59:40PM +0200, Peter Zijlstra wrote:
> > > > Hi,
> > > > 
> > > > Due to:
> > > > 
> > > >   https://lkml.kernel.org/r/0000000000001d43ac05c0f5c6a0@google.com
> > > > 
> > > > and general principle, delayacct really shouldn't be using ktime (pvclock also
> > > > really shouldn't be doing what it does, but that's another story). This lead me
> > > > to looking at the SCHED_INFO, SCHEDSTATS, DELAYACCT (and PSI) accounting hell.
> > > > 
> > > > The rest of the patches are an attempt at simplifying all that a little. All
> > > > that crud is enabled by default for distros which is leading to a death by a
> > > > thousand cuts.
> > > > 
> > > > The last patch is an attempt at default disabling DELAYACCT, because I don't
> > > > think anybody actually uses that much, but what do I know, there were no ill
> > > > effects on my testbox. Perhaps we should mirror
> > > > /proc/sys/kernel/sched_schedstats and provide a delayacct sysctl for runtime
> > > > frobbing.
> > > >
> > > 
> > > There are tools like iotop that use delayacct to display information. 
> > 
> > Right, but how many actual people use that? Does that justify saddling
> > the whole sodding world with the overhead?
> >
> 
> Not sure I have that data.
>  

It's used occasionally as a debugging tool when looking at IO problems
in general. Like sched_schedstats, it seems unnecesary to incur overhead
unless someone is debugging.

Minimally disabling had a small positive impact -- tbench 0-8% gain
depending on thread count and machine, positive impact in general to
specjbb2005, neutral on hackbench and most of the other workloads checked.

> > > When the
> > > code was checked in, we did run SPEC* back in the day 2006 to find overheads,
> > > nothing significant showed. Do we have any date on the overhead your seeing?
> > 
> > I've not looked, but having it disabled saves that per-task allocation
> > and that spinlock in delayacct_end() for iowait wakeups and a bunch of
> > cache misses ofcourse.
> > 
> > I doubt SPEC is a benchmark that tickles those paths much if at all.
> > 
> > The thing is; we can't just keep growing more and more stats, that'll
> > kill us quite dead.
> 
> 
> I don't disagree, we've had these around for a while and I know of users
> that use these stats to find potential starvation. I am OK with default
> disabled. I suspect distros will have the final say.
> 

I think default disabled should be fine. At worst when dealing with a bug
there would be a need to either reboot or enable at runtime with patch
7 included and add that instruction to the bug report when requesting
iotop data. At worst, a distro could revert the patch if iotop generated
too many bug reports or patch iotop in the distro package.

Alternatively, I've added Paul Wise to the cc who is the latest
committer to iotop.  Maybe he knows who could add/commit a check for
sysctl.sched_delayacct and if it exists then check if it's 1 and display
an error suggesting corrective action (add delayacct to the kernel command
line or sysctl sched.sched_delayacct=1). iotop appears to be in maintenance
mode but gets occasional commits even if it has not had a new version
since 2013 so maybe it could get a 0.7 tag if such a check was added.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 6/6] [RFC] delayacct: Default disabled
  2021-05-05 10:59 ` [PATCH 6/6] [RFC] delayacct: Default disabled Peter Zijlstra
@ 2021-05-12 11:35   ` Mel Gorman
  0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 11:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 05, 2021 at 12:59:46PM +0200, Peter Zijlstra wrote:
> Assuming this stuff isn't actually used much; disable it by default
> and avoid allocating and tracking the task_delay_info structure.
> 
> taskstats is changed to still report the regular sched and sched_info
> and only skip the missing task_delay_info fields instead of not
> reporting anything.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-12 11:34       ` Mel Gorman
@ 2021-05-12 11:38         ` Peter Zijlstra
  2021-05-12 12:23         ` Paul Wise
  2021-06-25  0:50         ` Paul Wise
  2 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2021-05-12 11:38 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Balbir Singh, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, pbonzini, maz,
	linux-kernel, kvm, riel, hannes, Paul Wise

On Wed, May 12, 2021 at 12:34:19PM +0100, Mel Gorman wrote:
> > I don't disagree, we've had these around for a while and I know of users
> > that use these stats to find potential starvation. I am OK with default
> > disabled. I suspect distros will have the final say.
> > 
> 
> I think default disabled should be fine. At worst when dealing with a bug
> there would be a need to either reboot or enable at runtime with patch
> 7 included and add that instruction to the bug report when requesting
> iotop data. At worst, a distro could revert the patch if iotop generated
> too many bug reports or patch iotop in the distro package.
> 
> Alternatively, I've added Paul Wise to the cc who is the latest
> committer to iotop.  Maybe he knows who could add/commit a check for
> sysctl.sched_delayacct and if it exists then check if it's 1 and display
> an error suggesting corrective action (add delayacct to the kernel command
> line or sysctl sched.sched_delayacct=1). iotop appears to be in maintenance
> mode but gets occasional commits even if it has not had a new version
> since 2013 so maybe it could get a 0.7 tag if such a check was added.

In the final commit I made the knob be known as task_delayacct. I
figured that was a slightly better name.

But yes, a check in iotop, if that really is the sole user of all this,
would be very nice.

Thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 7/6] delayacct: Add sysctl to enable at runtime
  2021-05-10 12:05 ` [PATCH 7/6] delayacct: Add sysctl to enable at runtime Peter Zijlstra
  2021-05-10 12:06   ` Peter Zijlstra
@ 2021-05-12 11:40   ` Mel Gorman
  1 sibling, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 11:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes, Jonathan Corbet

On Mon, May 10, 2021 at 02:05:13PM +0200, Peter Zijlstra wrote:
> 
> Just like sched_schedstats, allow runtime enabling (and disabling) of
> delayacct. This is useful if one forgot to add the delayacct boot time
> option.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/accounting/delay-accounting.rst |    6 ++--
>  kernel/delayacct.c                            |   36 ++++++++++++++++++++++++--
>  kernel/sysctl.c                               |   12 ++++++++
>  3 files changed, 50 insertions(+), 4 deletions(-)
> 

Update sysctl/kernel.rst?

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1d56a6b73a4e..5d9193bd8d27 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1087,6 +1087,13 @@ Model available). If your platform happens to meet the
 requirements for EAS but you do not want to use it, change
 this value to 0.
 
+sched_delayacct
+===============
+
+Enables/disables task delay accounting (see
+:doc:`accounting/delay-accounting.rst`). Enabling this feature incurs
+a small amount of overhead in the scheduler but is useful for debugging
+and performance tuning. It is required by some tools such as iotop.
 
 sched_schedstats
 ================

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-12 11:34       ` Mel Gorman
  2021-05-12 11:38         ` Peter Zijlstra
@ 2021-05-12 12:23         ` Paul Wise
  2021-05-12 13:00           ` Mel Gorman
  2021-06-25  0:50         ` Paul Wise
  2 siblings, 1 reply; 43+ messages in thread
From: Paul Wise @ 2021-05-12 12:23 UTC (permalink / raw)
  To: Mel Gorman, Balbir Singh
  Cc: Peter Zijlstra, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]

On Wed, 2021-05-12 at 12:34 +0100, Mel Gorman wrote:

> Alternatively, I've added Paul Wise to the cc who is the latest
> committer to iotop.  Maybe he knows who could add/commit a check for
> sysctl.sched_delayacct and if it exists then check if it's 1 and display
> an error suggesting corrective action (add delayacct to the kernel command
> line or sysctl sched.sched_delayacct=1). iotop appears to be in maintenance
> mode but gets occasional commits even if it has not had a new version
> since 2013 so maybe it could get a 0.7 tag if such a check was added.

I am able to commit to the iotop repository but I don't have permission
from the author to make releases nor do I have access to the website.

I am happy to apply any patches anyone has for iotop and upload the
result to Debian, or I'll be willing to write patches to cope with
changes in Linux behaviour, if given a succinct explanation of what
changes are needed in iotop, once the Linux changes are fully merged.

As well as the Python iotop implementation, there is one written in C 
with more features, so please also file an issue or pull request there.
Please note that I don't have commit access to that repository though.

https://github.com/Tomas-M/iotop

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 3/6] sched: Simplify sched_info_on()
  2021-05-12 11:32     ` Peter Zijlstra
@ 2021-05-12 12:51       ` Mel Gorman
  0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 12:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, bsingharora, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

On Wed, May 12, 2021 at 01:32:03PM +0200, Peter Zijlstra wrote:
> On Wed, May 12, 2021 at 12:10:40PM +0100, Mel Gorman wrote:
> > As delta is !0 iff t->sched_info.last_queued, why not this?
> > 
> > diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
> > index 33ffd41935ba..37e33c0eeb7c 100644
> > --- a/kernel/sched/stats.h
> > +++ b/kernel/sched/stats.h
> > @@ -158,15 +158,14 @@ static inline void psi_sched_switch(struct task_struct *prev,
> >   */
> >  static inline void sched_info_dequeue(struct rq *rq, struct task_struct *t)
> >  {
> > -	unsigned long long delta = 0;
> > -
> >  	if (t->sched_info.last_queued) {
> > +		unsigned long long delta;
> > +
> >  		delta = rq_clock(rq) - t->sched_info.last_queued;
> >  		t->sched_info.last_queued = 0;
> > +		t->sched_info.run_delay += delta;
> > +		rq_sched_info_dequeue(rq, delta);
> >  	}
> > -	t->sched_info.run_delay += delta;
> > -
> > -	rq_sched_info_dequeue(rq, delta);
> >  }
> 
> Right.. clearly I missed the obvious there.. Lemme go add another patch
> on top of the lot.
> 
> Something like this I suppose.
> 

That works for me.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-12 12:23         ` Paul Wise
@ 2021-05-12 13:00           ` Mel Gorman
  2021-05-13  1:29             ` Paul Wise
  0 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2021-05-12 13:00 UTC (permalink / raw)
  To: Paul Wise
  Cc: Balbir Singh, Peter Zijlstra, tglx, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, bristot,
	pbonzini, maz, linux-kernel, kvm, riel, hannes

[-- Attachment #1: Type: text/plain, Size: 1846 bytes --]

On Wed, May 12, 2021 at 08:23:51PM +0800, Paul Wise wrote:
> On Wed, 2021-05-12 at 12:34 +0100, Mel Gorman wrote:
> 
> > Alternatively, I've added Paul Wise to the cc who is the latest
> > committer to iotop.  Maybe he knows who could add/commit a check for
> > sysctl.sched_delayacct and if it exists then check if it's 1 and display
> > an error suggesting corrective action (add delayacct to the kernel command
> > line or sysctl sched.sched_delayacct=1). iotop appears to be in maintenance
> > mode but gets occasional commits even if it has not had a new version
> > since 2013 so maybe it could get a 0.7 tag if such a check was added.
> 
> I am able to commit to the iotop repository but I don't have permission
> from the author to make releases nor do I have access to the website.
> 
> I am happy to apply any patches anyone has for iotop and upload the
> result to Debian, or I'll be willing to write patches to cope with
> changes in Linux behaviour, if given a succinct explanation of what
> changes are needed in iotop, once the Linux changes are fully merged.
> 

If you send me the same patch, I can do submit a request to the devel
package for openSUSE. I don't have commit access but I would be surprised
if the package maintainer didn't accept the request. Obviously, I'll
build+boot a kernel that includes the final version of this series in
case of any naming changes or other oddities.

> As well as the Python iotop implementation, there is one written in C 
> with more features, so please also file an issue or pull request there.
> Please note that I don't have commit access to that repository though.
> 

Good thinking. I'll open a bug on github when I've tested your iotop
patch so that the bug report is more coherent.

Thanks for the quick response.

-- 
Mel Gorman
SUSE Labs

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-12 13:00           ` Mel Gorman
@ 2021-05-13  1:29             ` Paul Wise
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Wise @ 2021-05-13  1:29 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Balbir Singh, Peter Zijlstra, tglx, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, bristot,
	pbonzini, maz, linux-kernel, kvm, riel, hannes

[-- Attachment #1: Type: text/plain, Size: 1408 bytes --]

On Wed, 2021-05-12 at 14:00 +0100, Mel Gorman wrote:

> If you send me the same patch, I can do submit a request to the devel
> package for openSUSE. I don't have commit access but I would be surprised
> if the package maintainer didn't accept the request. Obviously, I'll
> build+boot a kernel that includes the final version of this series in
> case of any naming changes or other oddities.

At this point I'm not clear exactly what needs to be done and whether
or not the details have been nailed down enough that it is time to
commit the change to the iotop-py and iotop-c git repositories.

I recommend upgrading the openSUSE iotop package to the latest git
commit rather than just applying the latest patch on top.

Alternatively, once the patch is applied I can probably overstep my
permissions and add a tag to the iotop-py git repository, in case folks
are happy to pull from the git repository instead of the website.

> Good thinking. I'll open a bug on github when I've tested your iotop
> patch so that the bug report is more coherent.

OK, sounds good.

PS: does Linux have a facility for userspace processes to convert
syscall names to numbers for the currently running Linux kernel? I
noticed that iotop-py just hard-codes the syscall numbers for
ioprio_set and ioprio_get on common arches, missing newer arches.

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 0/6] sched,delayacct: Some cleanups
  2021-05-12 11:34       ` Mel Gorman
  2021-05-12 11:38         ` Peter Zijlstra
  2021-05-12 12:23         ` Paul Wise
@ 2021-06-25  0:50         ` Paul Wise
  2 siblings, 0 replies; 43+ messages in thread
From: Paul Wise @ 2021-06-25  0:50 UTC (permalink / raw)
  To: Mel Gorman, Balbir Singh
  Cc: Peter Zijlstra, tglx, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, pbonzini, maz,
	linux-kernel, kvm, riel, hannes

[-- Attachment #1: Type: text/plain, Size: 799 bytes --]

On Wed, 2021-05-12 at 12:34 +0100, Mel Gorman wrote:

> Alternatively, I've added Paul Wise to the cc who is the latest
> committer to iotop.  Maybe he knows who could add/commit a check for
> sysctl.sched_delayacct and if it exists then check if it's 1 and display
> an error suggesting corrective action (add delayacct to the kernel command
> line or sysctl sched.sched_delayacct=1). iotop appears to be in maintenance
> mode but gets occasional commits even if it has not had a new version
> since 2013 so maybe it could get a 0.7 tag if such a check was added.

Did the proposed changes get merged?

If so, please let me know the details of what needs to happen in iotop
and iotop-c to cope with the changes in the Linux kernel.

-- 
bye,
pabs

https://bonedaddy.net/pabs3/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2021-06-25  0:57 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 10:59 [PATCH 0/6] sched,delayacct: Some cleanups Peter Zijlstra
2021-05-05 10:59 ` [PATCH 1/6] delayacct: Use sched_clock() Peter Zijlstra
2021-05-05 14:40   ` Rik van Riel
2021-05-06 13:59   ` Johannes Weiner
2021-05-06 14:17     ` Peter Zijlstra
2021-05-06 15:17       ` Johannes Weiner
2021-05-07 12:40   ` Balbir Singh
2021-05-12 10:43   ` Mel Gorman
2021-05-05 10:59 ` [PATCH 2/6] sched: Rename sched_info_{queued,dequeued} Peter Zijlstra
2021-05-05 14:39   ` Rik van Riel
2021-05-06 13:59   ` Johannes Weiner
2021-05-10  8:45   ` Balbir Singh
2021-05-12 10:49   ` Mel Gorman
2021-05-05 10:59 ` [PATCH 3/6] sched: Simplify sched_info_on() Peter Zijlstra
2021-05-06 14:03   ` Johannes Weiner
2021-05-12 11:10   ` Mel Gorman
2021-05-12 11:32     ` Peter Zijlstra
2021-05-12 12:51       ` Mel Gorman
2021-05-05 10:59 ` [PATCH 4/6] kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT Peter Zijlstra
2021-05-05 11:37   ` Paolo Bonzini
2021-05-06 14:38   ` Marc Zyngier
2021-05-07 12:42   ` Balbir Singh
2021-05-12 11:11   ` Mel Gorman
2021-05-05 10:59 ` [PATCH 5/6] delayacct: Add static_branch in scheduler hooks Peter Zijlstra
2021-05-06 14:05   ` Johannes Weiner
2021-05-10  8:42   ` Balbir Singh
2021-05-12 11:13   ` Mel Gorman
2021-05-05 10:59 ` [PATCH 6/6] [RFC] delayacct: Default disabled Peter Zijlstra
2021-05-12 11:35   ` Mel Gorman
2021-05-05 22:29 ` [PATCH 0/6] sched,delayacct: Some cleanups Balbir Singh
2021-05-06  9:13   ` Peter Zijlstra
2021-05-07 12:38     ` Balbir Singh
2021-05-12 11:34       ` Mel Gorman
2021-05-12 11:38         ` Peter Zijlstra
2021-05-12 12:23         ` Paul Wise
2021-05-12 13:00           ` Mel Gorman
2021-05-13  1:29             ` Paul Wise
2021-06-25  0:50         ` Paul Wise
2021-05-07  9:05 ` Thomas Gleixner
2021-05-10  7:08 ` Ingo Molnar
2021-05-10 12:05 ` [PATCH 7/6] delayacct: Add sysctl to enable at runtime Peter Zijlstra
2021-05-10 12:06   ` Peter Zijlstra
2021-05-12 11:40   ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).