linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] sched: Clean up SCHED_DEBUG
@ 2021-03-26 10:33 Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 1/9] sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG Peter Zijlstra
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

We currently have sysctl, procfs and debugfs SCHED_DEBUG interfaces, end the
insanity and move everything to debugfs.

And since this is a nice moment to have people re-evaluate their 'tunings',
also change how some of the values work.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/9] sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 2/9] sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG Peter Zijlstra
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

From: Mel Gorman <mgorman@suse.de>

The ability to enable/disable NUMA balancing is not a debugging feature
and should not depend on CONFIG_SCHED_DEBUG.  For example, machines within
a HPC cluster may disable NUMA balancing temporarily for some jobs and
re-enable it for other jobs without needing to reboot.

This patch removes the dependency on CONFIG_SCHED_DEBUG for
kernel.numa_balancing sysctl. The other numa balancing related sysctls
are left as-is because if they need to be tuned then it is more likely
that NUMA balancing needs to be fixed instead.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210324133916.GQ15768@suse.de
---
 kernel/sysctl.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1753,6 +1753,9 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ONE,
 	},
+#endif /* CONFIG_NUMA_BALANCING */
+#endif /* CONFIG_SCHED_DEBUG */
+#ifdef CONFIG_NUMA_BALANCING
 	{
 		.procname	= "numa_balancing",
 		.data		= NULL, /* filled in by handler */
@@ -1763,7 +1766,6 @@ static struct ctl_table kern_table[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 #endif /* CONFIG_NUMA_BALANCING */
-#endif /* CONFIG_SCHED_DEBUG */
 	{
 		.procname	= "sched_rt_period_us",
 		.data		= &sysctl_sched_rt_period,



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 2/9] sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 1/9] sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 3/9] sched: Dont make LATENCYTOP select SCHED_DEBUG Peter Zijlstra
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

CONFIG_SCHEDSTATS does not depend on SCHED_DEBUG, it is inconsistent
to have the sysctl depend on it.

Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sysctl.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1711,17 +1711,6 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
-#ifdef CONFIG_SCHEDSTATS
-	{
-		.procname	= "sched_schedstats",
-		.data		= NULL,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= sysctl_schedstats,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
-	},
-#endif /* CONFIG_SCHEDSTATS */
 #endif /* CONFIG_SMP */
 #ifdef CONFIG_NUMA_BALANCING
 	{
@@ -1755,6 +1744,17 @@ static struct ctl_table kern_table[] = {
 	},
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_SCHED_DEBUG */
+#ifdef CONFIG_SCHEDSTATS
+	{
+		.procname	= "sched_schedstats",
+		.data		= NULL,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sysctl_schedstats,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+#endif /* CONFIG_SCHEDSTATS */
 #ifdef CONFIG_NUMA_BALANCING
 	{
 		.procname	= "numa_balancing",



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 3/9] sched: Dont make LATENCYTOP select SCHED_DEBUG
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 1/9] sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 2/9] sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs Peter Zijlstra
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

SCHED_DEBUG is not in fact required for LATENCYTOP, don't select it.

Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 lib/Kconfig.debug |    1 -
 1 file changed, 1 deletion(-)

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1670,7 +1670,6 @@ config LATENCYTOP
 	select KALLSYMS_ALL
 	select STACKTRACE
 	select SCHEDSTATS
-	select SCHED_DEBUG
 	help
 	  Enable this option if you want to use the LatencyTOP tool
 	  to find out which userspace is blocking on what kernel operations.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-03-26 10:33 ` [PATCH 3/9] sched: Dont make LATENCYTOP select SCHED_DEBUG Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 11:06   ` Greg KH
  2021-04-07 10:46   ` Valentin Schneider
  2021-03-26 10:33 ` [PATCH 5/9] sched,preempt: Move preempt_dynamic to debug.c Peter Zijlstra
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

Stop polluting sysctl with undocumented knobs that really are debug
only, move them all to /debug/sched/ along with the existing
/debug/sched_* files that already exist.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/sysctl.h |    8 +--
 kernel/sched/core.c          |    4 +
 kernel/sched/debug.c         |   74 +++++++++++++++++++++++++++++++--
 kernel/sched/fair.c          |    9 ----
 kernel/sched/sched.h         |    2 
 kernel/sysctl.c              |   96 -------------------------------------------
 6 files changed, 80 insertions(+), 113 deletions(-)

--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -26,10 +26,11 @@ int proc_dohung_task_timeout_secs(struct
 enum { sysctl_hung_task_timeout_secs = 0 };
 #endif
 
+extern unsigned int sysctl_sched_child_runs_first;
+
 extern unsigned int sysctl_sched_latency;
 extern unsigned int sysctl_sched_min_granularity;
 extern unsigned int sysctl_sched_wakeup_granularity;
-extern unsigned int sysctl_sched_child_runs_first;
 
 enum sched_tunable_scaling {
 	SCHED_TUNABLESCALING_NONE,
@@ -37,7 +38,7 @@ enum sched_tunable_scaling {
 	SCHED_TUNABLESCALING_LINEAR,
 	SCHED_TUNABLESCALING_END,
 };
-extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
+extern unsigned int sysctl_sched_tunable_scaling;
 
 extern unsigned int sysctl_numa_balancing_scan_delay;
 extern unsigned int sysctl_numa_balancing_scan_period_min;
@@ -47,9 +48,6 @@ extern unsigned int sysctl_numa_balancin
 #ifdef CONFIG_SCHED_DEBUG
 extern __read_mostly unsigned int sysctl_sched_migration_cost;
 extern __read_mostly unsigned int sysctl_sched_nr_migrate;
-
-int sched_proc_update_handler(struct ctl_table *table, int write,
-		void *buffer, size_t *length, loff_t *ppos);
 #endif
 
 /*
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5504,9 +5504,11 @@ static const struct file_operations sche
 	.release	= single_release,
 };
 
+extern struct dentry *debugfs_sched;
+
 static __init int sched_init_debug_dynamic(void)
 {
-	debugfs_create_file("sched_preempt", 0644, NULL, NULL, &sched_dynamic_fops);
+	debugfs_create_file("sched_preempt", 0644, debugfs_sched, NULL, &sched_dynamic_fops);
 	return 0;
 }
 late_initcall(sched_init_debug_dynamic);
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -169,15 +169,81 @@ static const struct file_operations sche
 	.release	= single_release,
 };
 
+#ifdef CONFIG_SMP
+
+static ssize_t sched_scaling_write(struct file *filp, const char __user *ubuf,
+				   size_t cnt, loff_t *ppos)
+{
+	char buf[16];
+
+	if (cnt > 15)
+		cnt = 15;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	if (kstrtouint(buf, 10, &sysctl_sched_tunable_scaling))
+		return -EINVAL;
+
+	if (sched_update_scaling())
+		return -EINVAL;
+
+	*ppos += cnt;
+	return cnt;
+}
+
+static int sched_scaling_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%d\n", sysctl_sched_tunable_scaling);
+	return 0;
+}
+
+static int sched_scaling_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_scaling_show, NULL);
+}
+
+static const struct file_operations sched_scaling_fops = {
+	.open		= sched_scaling_open,
+	.write		= sched_scaling_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* SMP */
+
 __read_mostly bool sched_debug_enabled;
 
+struct dentry *debugfs_sched;
+
 static __init int sched_init_debug(void)
 {
-	debugfs_create_file("sched_features", 0644, NULL, NULL,
-			&sched_feat_fops);
+	struct dentry __maybe_unused *numa;
+
+	debugfs_sched = debugfs_create_dir("sched", NULL);
+
+	debugfs_create_file("features", 0644, debugfs_sched, NULL, &sched_feat_fops);
+	debugfs_create_bool("debug_enabled", 0644, debugfs_sched, &sched_debug_enabled);
+
+	debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
+	debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
+	debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);
+
+#ifdef CONFIG_SMP
+	debugfs_create_file("tunable_scaling", 0644, debugfs_sched, NULL, &sched_scaling_fops);
+	debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
+	debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
+#endif
+
+#ifdef CONFIG_NUMA_BALANCING
+	numa = debugfs_create_dir("numa_balancing", debugfs_sched);
 
-	debugfs_create_bool("sched_debug", 0644, NULL,
-			&sched_debug_enabled);
+	debugfs_create_u32("scan_delay_ms", 0644, numa, &sysctl_numa_balancing_scan_delay);
+	debugfs_create_u32("scan_period_min_ms", 0644, numa, &sysctl_numa_balancing_scan_period_min);
+	debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max);
+	debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
+#endif
 
 	return 0;
 }
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -49,7 +49,7 @@ static unsigned int normalized_sysctl_sc
  *
  * (default SCHED_TUNABLESCALING_LOG = *(1+ilog(ncpus))
  */
-enum sched_tunable_scaling sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
+unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
 
 /*
  * Minimal preemption granularity for CPU-bound tasks:
@@ -627,15 +627,10 @@ struct sched_entity *__pick_last_entity(
  * Scheduling class statistics methods:
  */
 
-int sched_proc_update_handler(struct ctl_table *table, int write,
-		void *buffer, size_t *lenp, loff_t *ppos)
+int sched_update_scaling(void)
 {
-	int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	unsigned int factor = get_update_sysctl_factor();
 
-	if (ret || !write)
-		return ret;
-
 	sched_nr_latency = DIV_ROUND_UP(sysctl_sched_latency,
 					sysctl_sched_min_granularity);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1569,6 +1569,8 @@ static inline void unregister_sched_doma
 }
 #endif
 
+extern int sched_update_scaling(void);
+
 extern void flush_smp_call_function_from_idle(void);
 
 #else /* !CONFIG_SMP: */
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -184,17 +184,6 @@ static enum sysctl_writes_mode sysctl_wr
 int sysctl_legacy_va_layout;
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
-static int min_sched_granularity_ns = 100000;		/* 100 usecs */
-static int max_sched_granularity_ns = NSEC_PER_SEC;	/* 1 second */
-static int min_wakeup_granularity_ns;			/* 0 usecs */
-static int max_wakeup_granularity_ns = NSEC_PER_SEC;	/* 1 second */
-#ifdef CONFIG_SMP
-static int min_sched_tunable_scaling = SCHED_TUNABLESCALING_NONE;
-static int max_sched_tunable_scaling = SCHED_TUNABLESCALING_END-1;
-#endif /* CONFIG_SMP */
-#endif /* CONFIG_SCHED_DEBUG */
-
 #ifdef CONFIG_COMPACTION
 static int min_extfrag_threshold;
 static int max_extfrag_threshold = 1000;
@@ -1659,91 +1648,6 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
-#ifdef CONFIG_SCHED_DEBUG
-	{
-		.procname	= "sched_min_granularity_ns",
-		.data		= &sysctl_sched_min_granularity,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= sched_proc_update_handler,
-		.extra1		= &min_sched_granularity_ns,
-		.extra2		= &max_sched_granularity_ns,
-	},
-	{
-		.procname	= "sched_latency_ns",
-		.data		= &sysctl_sched_latency,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= sched_proc_update_handler,
-		.extra1		= &min_sched_granularity_ns,
-		.extra2		= &max_sched_granularity_ns,
-	},
-	{
-		.procname	= "sched_wakeup_granularity_ns",
-		.data		= &sysctl_sched_wakeup_granularity,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= sched_proc_update_handler,
-		.extra1		= &min_wakeup_granularity_ns,
-		.extra2		= &max_wakeup_granularity_ns,
-	},
-#ifdef CONFIG_SMP
-	{
-		.procname	= "sched_tunable_scaling",
-		.data		= &sysctl_sched_tunable_scaling,
-		.maxlen		= sizeof(enum sched_tunable_scaling),
-		.mode		= 0644,
-		.proc_handler	= sched_proc_update_handler,
-		.extra1		= &min_sched_tunable_scaling,
-		.extra2		= &max_sched_tunable_scaling,
-	},
-	{
-		.procname	= "sched_migration_cost_ns",
-		.data		= &sysctl_sched_migration_cost,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
-		.procname	= "sched_nr_migrate",
-		.data		= &sysctl_sched_nr_migrate,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-#endif /* CONFIG_SMP */
-#ifdef CONFIG_NUMA_BALANCING
-	{
-		.procname	= "numa_balancing_scan_delay_ms",
-		.data		= &sysctl_numa_balancing_scan_delay,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
-		.procname	= "numa_balancing_scan_period_min_ms",
-		.data		= &sysctl_numa_balancing_scan_period_min,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
-		.procname	= "numa_balancing_scan_period_max_ms",
-		.data		= &sysctl_numa_balancing_scan_period_max,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-	},
-	{
-		.procname	= "numa_balancing_scan_size_mb",
-		.data		= &sysctl_numa_balancing_scan_size,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= SYSCTL_ONE,
-	},
-#endif /* CONFIG_NUMA_BALANCING */
-#endif /* CONFIG_SCHED_DEBUG */
 #ifdef CONFIG_SCHEDSTATS
 	{
 		.procname	= "sched_schedstats",



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 5/9] sched,preempt: Move preempt_dynamic to debug.c
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-03-26 10:33 ` [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 10:33 ` [PATCH 6/9] debugfs: Implement debugfs_create_str() Peter Zijlstra
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

Move the #ifdef SCHED_DEBUG bits to kernel/sched/debug.c in order to
collect all the debugfs bits.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  |   77 +--------------------------------------------------
 kernel/sched/debug.c |   67 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   11 +++++--
 3 files changed, 78 insertions(+), 77 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5371,9 +5371,9 @@ enum {
 	preempt_dynamic_full,
 };
 
-static int preempt_dynamic_mode = preempt_dynamic_full;
+int preempt_dynamic_mode = preempt_dynamic_full;
 
-static int sched_dynamic_mode(const char *str)
+int sched_dynamic_mode(const char *str)
 {
 	if (!strcmp(str, "none"))
 		return preempt_dynamic_none;
@@ -5387,7 +5387,7 @@ static int sched_dynamic_mode(const char
 	return -EINVAL;
 }
 
-static void sched_dynamic_update(int mode)
+void sched_dynamic_update(int mode)
 {
 	/*
 	 * Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in
@@ -5444,79 +5444,8 @@ static int __init setup_preempt_mode(cha
 }
 __setup("preempt=", setup_preempt_mode);
 
-#ifdef CONFIG_SCHED_DEBUG
-
-static ssize_t sched_dynamic_write(struct file *filp, const char __user *ubuf,
-				   size_t cnt, loff_t *ppos)
-{
-	char buf[16];
-	int mode;
-
-	if (cnt > 15)
-		cnt = 15;
-
-	if (copy_from_user(&buf, ubuf, cnt))
-		return -EFAULT;
-
-	buf[cnt] = 0;
-	mode = sched_dynamic_mode(strstrip(buf));
-	if (mode < 0)
-		return mode;
-
-	sched_dynamic_update(mode);
-
-	*ppos += cnt;
-
-	return cnt;
-}
-
-static int sched_dynamic_show(struct seq_file *m, void *v)
-{
-	static const char * preempt_modes[] = {
-		"none", "voluntary", "full"
-	};
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(preempt_modes); i++) {
-		if (preempt_dynamic_mode == i)
-			seq_puts(m, "(");
-		seq_puts(m, preempt_modes[i]);
-		if (preempt_dynamic_mode == i)
-			seq_puts(m, ")");
-
-		seq_puts(m, " ");
-	}
-
-	seq_puts(m, "\n");
-	return 0;
-}
-
-static int sched_dynamic_open(struct inode *inode, struct file *filp)
-{
-	return single_open(filp, sched_dynamic_show, NULL);
-}
-
-static const struct file_operations sched_dynamic_fops = {
-	.open		= sched_dynamic_open,
-	.write		= sched_dynamic_write,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-};
-
-extern struct dentry *debugfs_sched;
-
-static __init int sched_init_debug_dynamic(void)
-{
-	debugfs_create_file("sched_preempt", 0644, debugfs_sched, NULL, &sched_dynamic_fops);
-	return 0;
-}
-late_initcall(sched_init_debug_dynamic);
-
-#endif /* CONFIG_SCHED_DEBUG */
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 
-
 /*
  * This is the entry point to schedule() from kernel preemption
  * off of irq context.
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -215,9 +215,71 @@ static const struct file_operations sche
 
 #endif /* SMP */
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+
+static ssize_t sched_dynamic_write(struct file *filp, const char __user *ubuf,
+				   size_t cnt, loff_t *ppos)
+{
+	char buf[16];
+	int mode;
+
+	if (cnt > 15)
+		cnt = 15;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+	mode = sched_dynamic_mode(strstrip(buf));
+	if (mode < 0)
+		return mode;
+
+	sched_dynamic_update(mode);
+
+	*ppos += cnt;
+
+	return cnt;
+}
+
+static int sched_dynamic_show(struct seq_file *m, void *v)
+{
+	static const char * preempt_modes[] = {
+		"none", "voluntary", "full"
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(preempt_modes); i++) {
+		if (preempt_dynamic_mode == i)
+			seq_puts(m, "(");
+		seq_puts(m, preempt_modes[i]);
+		if (preempt_dynamic_mode == i)
+			seq_puts(m, ")");
+
+		seq_puts(m, " ");
+	}
+
+	seq_puts(m, "\n");
+	return 0;
+}
+
+static int sched_dynamic_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_dynamic_show, NULL);
+}
+
+static const struct file_operations sched_dynamic_fops = {
+	.open		= sched_dynamic_open,
+	.write		= sched_dynamic_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
 __read_mostly bool sched_debug_enabled;
 
-struct dentry *debugfs_sched;
+static struct dentry *debugfs_sched;
 
 static __init int sched_init_debug(void)
 {
@@ -227,6 +289,9 @@ static __init int sched_init_debug(void)
 
 	debugfs_create_file("features", 0644, debugfs_sched, NULL, &sched_feat_fops);
 	debugfs_create_bool("debug_enabled", 0644, debugfs_sched, &sched_debug_enabled);
+#ifdef CONFIG_PREEMPT_DYNAMIC
+	debugfs_create_file("preempt", 0644, debugfs_sched, NULL, &sched_dynamic_fops);
+#endif
 
 	debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
 	debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2733,5 +2733,12 @@ static inline bool is_per_cpu_kthread(st
 }
 #endif
 
-void swake_up_all_locked(struct swait_queue_head *q);
-void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+extern void swake_up_all_locked(struct swait_queue_head *q);
+extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+extern int preempt_dynamic_mode;
+extern int sched_dynamic_mode(const char *str);
+extern void sched_dynamic_update(int mode);
+#endif
+



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-03-26 10:33 ` [PATCH 5/9] sched,preempt: Move preempt_dynamic to debug.c Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 11:05   ` Greg KH
                     ` (2 more replies)
  2021-03-26 10:33 ` [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs Peter Zijlstra
                   ` (2 subsequent siblings)
  8 siblings, 3 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/debugfs/file.c       |  144 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/debugfs.h |   29 +++++++++
 2 files changed, 173 insertions(+)

--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -865,6 +865,150 @@ struct dentry *debugfs_create_bool(const
 }
 EXPORT_SYMBOL_GPL(debugfs_create_bool);
 
+ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos)
+{
+	struct dentry *dentry = F_DENTRY(file);
+	char *str, *copy = NULL;
+	int copy_len, len;
+	ssize_t ret;
+
+	ret = debugfs_file_get(dentry);
+	if (unlikely(ret))
+		return ret;
+
+again:
+	rcu_read_lock();
+	str = rcu_dereference(*(char **)file->private_data);
+	len = strlen(str) + 1;
+
+	if (!copy || copy_len < len) {
+		rcu_read_unlock();
+		kfree(copy);
+		copy = kmalloc(len + 1, GFP_KERNEL);
+		if (!copy) {
+			debugfs_file_put(dentry);
+			return -ENOMEM;
+		}
+		copy_len = len;
+		goto again;
+	}
+
+	strncpy(copy, str, len);
+	copy[len] = '\n';
+	copy[len+1] = '\0';
+	rcu_read_unlock();
+
+	debugfs_file_put(dentry);
+
+	ret = simple_read_from_buffer(user_buf, count, ppos, copy, len + 1);
+	kfree(copy);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(debugfs_read_file_str);
+
+ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
+			       size_t count, loff_t *ppos)
+{
+	struct dentry *dentry = F_DENTRY(file);
+	char *old, *new = NULL;
+	int pos = *ppos;
+	int r;
+
+	r = debugfs_file_get(dentry);
+	if (unlikely(r))
+		return r;
+
+	old = *(char **)file->private_data;
+
+	/* only allow strict concattenation */
+	r = -EINVAL;
+	if (pos && pos != strlen(old))
+		goto error;
+
+	r = -ENOMEM;
+	new = kmalloc(pos + count + 1, GFP_KERNEL);
+	if (!new)
+		goto error;
+
+	if (pos)
+		memcpy(new, old, pos);
+
+	r = -EFAULT;
+	if (copy_from_user(new + pos, user_buf, count))
+		goto error;
+
+	new[pos + count] = '\0';
+	strim(new);
+
+	rcu_assign_pointer(*(char **)file->private_data, new);
+	synchronize_rcu();
+	kfree(old);
+
+	debugfs_file_put(dentry);
+	return count;
+
+error:
+	kfree(new);
+	debugfs_file_put(dentry);
+	return r;
+}
+EXPORT_SYMBOL_GPL(debugfs_write_file_str);
+
+static const struct file_operations fops_str = {
+	.read =		debugfs_read_file_str,
+	.write =	debugfs_write_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+static const struct file_operations fops_str_ro = {
+	.read =		debugfs_read_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+static const struct file_operations fops_str_wo = {
+	.write =	debugfs_write_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+/**
+ * debugfs_create_str - create a debugfs file that is used to read and write a string value
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @value: a pointer to the variable that the file should read to and write
+ *         from.
+ *
+ * This function creates a file in debugfs with the given name that
+ * contains the value of the variable @value.  If the @mode variable is so
+ * set, it can be read from, and written to.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ *
+ * NOTE: when writing is enabled it will replace the string, string lifetime is
+ * assumed to be RCU managed.
+ *
+ * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
+ * be returned.
+ */
+struct dentry *debugfs_create_str(const char *name, umode_t mode,
+				   struct dentry *parent, char **value)
+{
+	return debugfs_create_mode_unsafe(name, mode, parent, value, &fops_str,
+				   &fops_str_ro, &fops_str_wo);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_str);
+
 static ssize_t read_file_blob(struct file *file, char __user *user_buf,
 			      size_t count, loff_t *ppos)
 {
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -128,6 +128,8 @@ void debugfs_create_atomic_t(const char
 			     struct dentry *parent, atomic_t *value);
 struct dentry *debugfs_create_bool(const char *name, umode_t mode,
 				  struct dentry *parent, bool *value);
+struct dentry *debugfs_create_str(const char *name, umode_t mode,
+				  struct dentry *parent, char **value);
 
 struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 				  struct dentry *parent,
@@ -156,6 +158,12 @@ ssize_t debugfs_read_file_bool(struct fi
 ssize_t debugfs_write_file_bool(struct file *file, const char __user *user_buf,
 				size_t count, loff_t *ppos);
 
+ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos);
+
+ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
+			       size_t count, loff_t *ppos);
+
 #else
 
 #include <linux/err.h>
@@ -297,6 +305,13 @@ static inline struct dentry *debugfs_cre
 	return ERR_PTR(-ENODEV);
 }
 
+static inline struct dentry *debugfs_create_str(const char *name, umode_t mode,
+						struct dentry *parent,
+						char **value)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 static inline struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 				  struct dentry *parent,
 				  struct debugfs_blob_wrapper *blob)
@@ -347,6 +362,20 @@ static inline ssize_t debugfs_write_file
 {
 	return -ENODEV;
 }
+
+static inline ssize_t debugfs_read_file_str(struct file *file,
+					    char __user *user_buf,
+					    size_t count, loff_t *ppos)
+{
+	return -ENODEV;
+}
+
+static inline ssize_t debugfs_write_file_str(struct file *file,
+					     const char __user *user_buf,
+					     size_t count, loff_t *ppos)
+{
+	return -ENODEV;
+}
 
 #endif
 



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
                   ` (5 preceding siblings ...)
  2021-03-26 10:33 ` [PATCH 6/9] debugfs: Implement debugfs_create_str() Peter Zijlstra
@ 2021-03-26 10:33 ` Peter Zijlstra
  2021-03-26 13:11   ` Dietmar Eggemann
  2021-04-07 10:46   ` Valentin Schneider
  2021-03-26 10:34 ` [PATCH 8/9] sched: Move /proc/sched_debug " Peter Zijlstra
  2021-03-26 10:34 ` [PATCH 9/9] sched,fair: Alternative sched_slice() Peter Zijlstra
  8 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:33 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

Stop polluting sysctl, move to debugfs for SCHED_DEBUG stuff.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/debug.c    |  255 ++++++++++--------------------------------------
 kernel/sched/sched.h    |    2 
 kernel/sched/topology.c |    1 
 3 files changed, 59 insertions(+), 199 deletions(-)

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -299,6 +299,10 @@ static __init int sched_init_debug(void)
 	debugfs_create_file("tunable_scaling", 0644, debugfs_sched, NULL, &sched_scaling_fops);
 	debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
 	debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
+
+	mutex_lock(&sched_domains_mutex);
+	register_sched_domain_sysctl();
+	mutex_unlock(&sched_domains_mutex);
 #endif
 
 #ifdef CONFIG_NUMA_BALANCING
@@ -316,229 +320,87 @@ late_initcall(sched_init_debug);
 
 #ifdef CONFIG_SMP
 
-#ifdef CONFIG_SYSCTL
-
-static struct ctl_table sd_ctl_dir[] = {
-	{
-		.procname	= "sched_domain",
-		.mode		= 0555,
-	},
-	{}
-};
-
-static struct ctl_table sd_ctl_root[] = {
-	{
-		.procname	= "kernel",
-		.mode		= 0555,
-		.child		= sd_ctl_dir,
-	},
-	{}
-};
-
-static struct ctl_table *sd_alloc_ctl_entry(int n)
-{
-	struct ctl_table *entry =
-		kcalloc(n, sizeof(struct ctl_table), GFP_KERNEL);
-
-	return entry;
-}
-
-static void sd_free_ctl_entry(struct ctl_table **tablep)
-{
-	struct ctl_table *entry;
-
-	/*
-	 * In the intermediate directories, both the child directory and
-	 * procname are dynamically allocated and could fail but the mode
-	 * will always be set. In the lowest directory the names are
-	 * static strings and all have proc handlers.
-	 */
-	for (entry = *tablep; entry->mode; entry++) {
-		if (entry->child)
-			sd_free_ctl_entry(&entry->child);
-		if (entry->proc_handler == NULL)
-			kfree(entry->procname);
-	}
-
-	kfree(*tablep);
-	*tablep = NULL;
-}
-
-static void
-set_table_entry(struct ctl_table *entry,
-		const char *procname, void *data, int maxlen,
-		umode_t mode, proc_handler *proc_handler)
-{
-	entry->procname = procname;
-	entry->data = data;
-	entry->maxlen = maxlen;
-	entry->mode = mode;
-	entry->proc_handler = proc_handler;
-}
+static cpumask_var_t		sd_sysctl_cpus;
+static struct dentry		*sd_dentry;
 
-static int sd_ctl_doflags(struct ctl_table *table, int write,
-			  void *buffer, size_t *lenp, loff_t *ppos)
+static int sd_flags_show(struct seq_file *m, void *v)
 {
-	unsigned long flags = *(unsigned long *)table->data;
-	size_t data_size = 0;
-	size_t len = 0;
-	char *tmp, *buf;
+	unsigned long flags = *(unsigned int *)m->private;
 	int idx;
 
-	if (write)
-		return 0;
-
-	for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
-		char *name = sd_flag_debug[idx].name;
-
-		/* Name plus whitespace */
-		data_size += strlen(name) + 1;
-	}
-
-	if (*ppos > data_size) {
-		*lenp = 0;
-		return 0;
-	}
-
-	buf = kcalloc(data_size + 1, sizeof(*buf), GFP_KERNEL);
-	if (!buf)
-		return -ENOMEM;
-
 	for_each_set_bit(idx, &flags, __SD_FLAG_CNT) {
-		char *name = sd_flag_debug[idx].name;
-
-		len += snprintf(buf + len, strlen(name) + 2, "%s ", name);
+		seq_puts(m, sd_flag_debug[idx].name);
+		seq_puts(m, " ");
 	}
-
-	tmp = buf + *ppos;
-	len -= *ppos;
-
-	if (len > *lenp)
-		len = *lenp;
-	if (len)
-		memcpy(buffer, tmp, len);
-	if (len < *lenp) {
-		((char *)buffer)[len] = '\n';
-		len++;
-	}
-
-	*lenp = len;
-	*ppos += len;
-
-	kfree(buf);
+	seq_puts(m, "\n");
 
 	return 0;
 }
 
-static struct ctl_table *
-sd_alloc_ctl_domain_table(struct sched_domain *sd)
-{
-	struct ctl_table *table = sd_alloc_ctl_entry(9);
-
-	if (table == NULL)
-		return NULL;
-
-	set_table_entry(&table[0], "min_interval",	  &sd->min_interval,	    sizeof(long), 0644, proc_doulongvec_minmax);
-	set_table_entry(&table[1], "max_interval",	  &sd->max_interval,	    sizeof(long), 0644, proc_doulongvec_minmax);
-	set_table_entry(&table[2], "busy_factor",	  &sd->busy_factor,	    sizeof(int),  0644, proc_dointvec_minmax);
-	set_table_entry(&table[3], "imbalance_pct",	  &sd->imbalance_pct,	    sizeof(int),  0644, proc_dointvec_minmax);
-	set_table_entry(&table[4], "cache_nice_tries",	  &sd->cache_nice_tries,    sizeof(int),  0644, proc_dointvec_minmax);
-	set_table_entry(&table[5], "flags",		  &sd->flags,		    sizeof(int),  0444, sd_ctl_doflags);
-	set_table_entry(&table[6], "max_newidle_lb_cost", &sd->max_newidle_lb_cost, sizeof(long), 0644, proc_doulongvec_minmax);
-	set_table_entry(&table[7], "name",		  sd->name,	       CORENAME_MAX_SIZE, 0444, proc_dostring);
-	/* &table[8] is terminator */
-
-	return table;
-}
-
-static struct ctl_table *sd_alloc_ctl_cpu_table(int cpu)
+static int sd_flags_open(struct inode *inode, struct file *file)
 {
-	struct ctl_table *entry, *table;
-	struct sched_domain *sd;
-	int domain_num = 0, i;
-	char buf[32];
-
-	for_each_domain(cpu, sd)
-		domain_num++;
-	entry = table = sd_alloc_ctl_entry(domain_num + 1);
-	if (table == NULL)
-		return NULL;
-
-	i = 0;
-	for_each_domain(cpu, sd) {
-		snprintf(buf, 32, "domain%d", i);
-		entry->procname = kstrdup(buf, GFP_KERNEL);
-		entry->mode = 0555;
-		entry->child = sd_alloc_ctl_domain_table(sd);
-		entry++;
-		i++;
-	}
-	return table;
+	return single_open(file, sd_flags_show, inode->i_private);
 }
 
-static cpumask_var_t		sd_sysctl_cpus;
-static struct ctl_table_header	*sd_sysctl_header;
+static const struct file_operations sd_flags_fops = {
+	.open		= sd_flags_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
 
-void register_sched_domain_sysctl(void)
+static void register_sd(struct sched_domain *sd, struct dentry *parent)
 {
-	static struct ctl_table *cpu_entries;
-	static struct ctl_table **cpu_idx;
-	static bool init_done = false;
-	char buf[32];
-	int i;
-
-	if (!cpu_entries) {
-		cpu_entries = sd_alloc_ctl_entry(num_possible_cpus() + 1);
-		if (!cpu_entries)
-			return;
+#define SDM(type, mode, member)	\
+	debugfs_create_##type(#member, mode, parent, &sd->member)
 
-		WARN_ON(sd_ctl_dir[0].child);
-		sd_ctl_dir[0].child = cpu_entries;
-	}
+	SDM(ulong, 0644, min_interval);
+	SDM(ulong, 0644, max_interval);
+	SDM(u64,   0644, max_newidle_lb_cost);
+	SDM(u32,   0644, busy_factor);
+	SDM(u32,   0644, imbalance_pct);
+	SDM(u32,   0644, cache_nice_tries);
+//	SDM(x32,   0444, flags);
+	SDM(str,   0444, name);
 
-	if (!cpu_idx) {
-		struct ctl_table *e = cpu_entries;
+#undef SDM
 
-		cpu_idx = kcalloc(nr_cpu_ids, sizeof(struct ctl_table*), GFP_KERNEL);
-		if (!cpu_idx)
-			return;
+	debugfs_create_file("flags", 0444, parent, &sd->flags, &sd_flags_fops);
+}
 
-		/* deal with sparse possible map */
-		for_each_possible_cpu(i) {
-			cpu_idx[i] = e;
-			e++;
-		}
-	}
+void register_sched_domain_sysctl(void)
+{
+	int cpu, i;
 
 	if (!cpumask_available(sd_sysctl_cpus)) {
 		if (!alloc_cpumask_var(&sd_sysctl_cpus, GFP_KERNEL))
 			return;
-	}
-
-	if (!init_done) {
-		init_done = true;
-		/* init to possible to not have holes in @cpu_entries */
 		cpumask_copy(sd_sysctl_cpus, cpu_possible_mask);
 	}
 
-	for_each_cpu(i, sd_sysctl_cpus) {
-		struct ctl_table *e = cpu_idx[i];
+	if (!sd_dentry)
+		sd_dentry = debugfs_create_dir("domains", debugfs_sched);
 
-		if (e->child)
-			sd_free_ctl_entry(&e->child);
+	for_each_cpu(cpu, sd_sysctl_cpus) {
+		struct sched_domain *sd;
+		struct dentry *d_cpu;
+		char buf[32];
+
+		snprintf(buf, sizeof(buf), "cpu%d", cpu);
+		debugfs_remove(debugfs_lookup(buf, sd_dentry));
+		d_cpu = debugfs_create_dir(buf, sd_dentry);
+
+		i = 0;
+		for_each_domain(cpu, sd) {
+			struct dentry *d_sd;
 
-		if (!e->procname) {
-			snprintf(buf, 32, "cpu%d", i);
-			e->procname = kstrdup(buf, GFP_KERNEL);
-		}
-		e->mode = 0555;
-		e->child = sd_alloc_ctl_cpu_table(i);
+			snprintf(buf, sizeof(buf), "domain%d", i);
+			d_sd = debugfs_create_dir(buf, d_cpu);
 
-		__cpumask_clear_cpu(i, sd_sysctl_cpus);
+			register_sd(sd, d_sd);
+			i++;
+		}
 	}
-
-	WARN_ON(sd_sysctl_header);
-	sd_sysctl_header = register_sysctl_table(sd_ctl_root);
 }
 
 void dirty_sched_domain_sysctl(int cpu)
@@ -547,13 +409,12 @@ void dirty_sched_domain_sysctl(int cpu)
 		__cpumask_set_cpu(cpu, sd_sysctl_cpus);
 }
 
-/* may be called multiple times per register */
 void unregister_sched_domain_sysctl(void)
 {
-	unregister_sysctl_table(sd_sysctl_header);
-	sd_sysctl_header = NULL;
+	debugfs_remove(sd_dentry);
+	sd_dentry = NULL;
 }
-#endif /* CONFIG_SYSCTL */
+
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1553,7 +1553,7 @@ static inline unsigned int group_first_c
 
 extern int group_balance_cpu(struct sched_group *sg);
 
-#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
+#ifdef CONFIG_SCHED_DEBUG
 void register_sched_domain_sysctl(void);
 void dirty_sched_domain_sysctl(int cpu);
 void unregister_sched_domain_sysctl(void);
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2223,7 +2223,6 @@ int sched_init_domains(const struct cpum
 		doms_cur = &fallback_doms;
 	cpumask_and(doms_cur[0], cpu_map, housekeeping_cpumask(HK_FLAG_DOMAIN));
 	err = build_sched_domains(doms_cur[0], NULL);
-	register_sched_domain_sysctl();
 
 	return err;
 }



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 8/9] sched: Move /proc/sched_debug to debugfs
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
                   ` (6 preceding siblings ...)
  2021-03-26 10:33 ` [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs Peter Zijlstra
@ 2021-03-26 10:34 ` Peter Zijlstra
  2021-03-26 11:05   ` Greg KH
  2021-03-26 10:34 ` [PATCH 9/9] sched,fair: Alternative sched_slice() Peter Zijlstra
  8 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:34 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/debug.c |   25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -279,6 +279,20 @@ static const struct file_operations sche
 
 __read_mostly bool sched_debug_enabled;
 
+static const struct seq_operations sched_debug_sops;
+
+static int sched_debug_open(struct inode *inode, struct file *filp)
+{
+	return seq_open(filp, &sched_debug_sops);
+}
+
+static const struct file_operations sched_debug_fops = {
+	.open		= sched_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
 static struct dentry *debugfs_sched;
 
 static __init int sched_init_debug(void)
@@ -316,6 +330,8 @@ static __init int sched_init_debug(void)
 	debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
 #endif
 
+	debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
+
 	return 0;
 }
 late_initcall(sched_init_debug);
@@ -854,15 +870,6 @@ static const struct seq_operations sched
 	.show		= sched_debug_show,
 };
 
-static int __init init_sched_debug_procfs(void)
-{
-	if (!proc_create_seq("sched_debug", 0444, NULL, &sched_debug_sops))
-		return -ENOMEM;
-	return 0;
-}
-
-__initcall(init_sched_debug_procfs);
-
 #define __PS(S, F) SEQ_printf(m, "%-45s:%21Ld\n", S, (long long)(F))
 #define __P(F) __PS(#F, F)
 #define   P(F) __PS(#F, p->F)



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 9/9] sched,fair: Alternative sched_slice()
  2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
                   ` (7 preceding siblings ...)
  2021-03-26 10:34 ` [PATCH 8/9] sched: Move /proc/sched_debug " Peter Zijlstra
@ 2021-03-26 10:34 ` Peter Zijlstra
  2021-03-26 12:08   ` Dietmar Eggemann
  2021-03-26 15:37   ` Vincent Guittot
  8 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 10:34 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, peterz, greg

The current sched_slice() seems to have issues; there's two possible
things that could be improved:

 - the 'nr_running' used for __sched_period() is daft when cgroups are
   considered. Using the RQ wide h_nr_running seems like a much more
   consistent number.

 - (esp) cgroups can slice it real fine, which makes for easy
   over-scheduling, ensure min_gran is what the name says.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c     |   15 ++++++++++++++-
 kernel/sched/features.h |    3 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,16 @@ static u64 __sched_period(unsigned long
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+	unsigned int nr_running = cfs_rq->nr_running;
+	u64 slice;
+
+	if (sched_feat(ALT_PERIOD))
+		nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
+
+	slice = __sched_period(nr_running + !se->on_rq);
+
+	if (sched_feat(BASE_SLICE))
+		slice -= sysctl_sched_min_granularity;
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;
@@ -697,6 +706,10 @@ static u64 sched_slice(struct cfs_rq *cf
 		}
 		slice = __calc_delta(slice, se->load.weight, load);
 	}
+
+	if (sched_feat(BASE_SLICE))
+		slice += sysctl_sched_min_granularity;
+
 	return slice;
 }
 
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -90,3 +90,6 @@ SCHED_FEAT(WA_BIAS, true)
  */
 SCHED_FEAT(UTIL_EST, true)
 SCHED_FEAT(UTIL_EST_FASTUP, true)
+
+SCHED_FEAT(ALT_PERIOD, true)
+SCHED_FEAT(BASE_SLICE, true)



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 10:33 ` [PATCH 6/9] debugfs: Implement debugfs_create_str() Peter Zijlstra
@ 2021-03-26 11:05   ` Greg KH
  2021-03-26 11:18     ` Peter Zijlstra
  2021-03-26 14:50   ` [PATCH v3 " Peter Zijlstra
  2021-03-27 22:24   ` [PATCH " Al Viro
  2 siblings, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-03-26 11:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 11:33:58AM +0100, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

No changelog text?  :(

> +/**
> + * debugfs_create_str - create a debugfs file that is used to read and write a string value
> + * @name: a pointer to a string containing the name of the file to create.
> + * @mode: the permission that the file should have
> + * @parent: a pointer to the parent dentry for this file.  This should be a
> + *          directory dentry if set.  If this parameter is %NULL, then the
> + *          file will be created in the root of the debugfs filesystem.
> + * @value: a pointer to the variable that the file should read to and write
> + *         from.
> + *
> + * This function creates a file in debugfs with the given name that
> + * contains the value of the variable @value.  If the @mode variable is so
> + * set, it can be read from, and written to.
> + *
> + * This function will return a pointer to a dentry if it succeeds.  This
> + * pointer must be passed to the debugfs_remove() function when the file is
> + * to be removed (no automatic cleanup happens if your module is unloaded,
> + * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
> + * returned.
> + *
> + * NOTE: when writing is enabled it will replace the string, string lifetime is
> + * assumed to be RCU managed.
> + *
> + * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
> + * be returned.
> + */
> +struct dentry *debugfs_create_str(const char *name, umode_t mode,
> +				   struct dentry *parent, char **value)

Please have this return void, no need for me to have to clean up
afterward later on :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 8/9] sched: Move /proc/sched_debug to debugfs
  2021-03-26 10:34 ` [PATCH 8/9] sched: Move /proc/sched_debug " Peter Zijlstra
@ 2021-03-26 11:05   ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-03-26 11:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 11:34:00AM +0100, Peter Zijlstra wrote:
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/debug.c |   25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)

Yes!

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs
  2021-03-26 10:33 ` [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs Peter Zijlstra
@ 2021-03-26 11:06   ` Greg KH
  2021-04-07 10:46   ` Valentin Schneider
  1 sibling, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-03-26 11:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 11:33:56AM +0100, Peter Zijlstra wrote:
> Stop polluting sysctl with undocumented knobs that really are debug
> only, move them all to /debug/sched/ along with the existing
> /debug/sched_* files that already exist.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/sched/sysctl.h |    8 +--
>  kernel/sched/core.c          |    4 +
>  kernel/sched/debug.c         |   74 +++++++++++++++++++++++++++++++--
>  kernel/sched/fair.c          |    9 ----
>  kernel/sched/sched.h         |    2 
>  kernel/sysctl.c              |   96 -------------------------------------------
>  6 files changed, 80 insertions(+), 113 deletions(-)

Yes!!!

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 11:05   ` Greg KH
@ 2021-03-26 11:18     ` Peter Zijlstra
  2021-03-26 11:30       ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 11:18 UTC (permalink / raw)
  To: Greg KH
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 12:05:07PM +0100, Greg KH wrote:
> On Fri, Mar 26, 2021 at 11:33:58AM +0100, Peter Zijlstra wrote:
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> No changelog text?  :(

Yeah, didn't really know what to say that the Subject didn't already do.

> > +/**
> > + * debugfs_create_str - create a debugfs file that is used to read and write a string value
> > + * @name: a pointer to a string containing the name of the file to create.
> > + * @mode: the permission that the file should have
> > + * @parent: a pointer to the parent dentry for this file.  This should be a
> > + *          directory dentry if set.  If this parameter is %NULL, then the
> > + *          file will be created in the root of the debugfs filesystem.
> > + * @value: a pointer to the variable that the file should read to and write
> > + *         from.
> > + *
> > + * This function creates a file in debugfs with the given name that
> > + * contains the value of the variable @value.  If the @mode variable is so
> > + * set, it can be read from, and written to.
> > + *
> > + * This function will return a pointer to a dentry if it succeeds.  This
> > + * pointer must be passed to the debugfs_remove() function when the file is
> > + * to be removed (no automatic cleanup happens if your module is unloaded,
> > + * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
> > + * returned.
> > + *
> > + * NOTE: when writing is enabled it will replace the string, string lifetime is
> > + * assumed to be RCU managed.
> > + *
> > + * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
> > + * be returned.
> > + */
> > +struct dentry *debugfs_create_str(const char *name, umode_t mode,
> > +				   struct dentry *parent, char **value)
> 
> Please have this return void, no need for me to have to clean up
> afterward later on :)

That would make it inconsistent with debugfs_create_{bool,blob}() from
which I copiously 'borrowed'.

Also, the return dentry might be useful with debugfs_create_symlink(),
but you're right in that most other create_file wrappers return void.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 11:18     ` Peter Zijlstra
@ 2021-03-26 11:30       ` Greg KH
  2021-03-26 11:38         ` [PATCH v2 " Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-03-26 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 12:18:47PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 26, 2021 at 12:05:07PM +0100, Greg KH wrote:
> > On Fri, Mar 26, 2021 at 11:33:58AM +0100, Peter Zijlstra wrote:
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > No changelog text?  :(
> 
> Yeah, didn't really know what to say that the Subject didn't already do.
> 
> > > +/**
> > > + * debugfs_create_str - create a debugfs file that is used to read and write a string value
> > > + * @name: a pointer to a string containing the name of the file to create.
> > > + * @mode: the permission that the file should have
> > > + * @parent: a pointer to the parent dentry for this file.  This should be a
> > > + *          directory dentry if set.  If this parameter is %NULL, then the
> > > + *          file will be created in the root of the debugfs filesystem.
> > > + * @value: a pointer to the variable that the file should read to and write
> > > + *         from.
> > > + *
> > > + * This function creates a file in debugfs with the given name that
> > > + * contains the value of the variable @value.  If the @mode variable is so
> > > + * set, it can be read from, and written to.
> > > + *
> > > + * This function will return a pointer to a dentry if it succeeds.  This
> > > + * pointer must be passed to the debugfs_remove() function when the file is
> > > + * to be removed (no automatic cleanup happens if your module is unloaded,
> > > + * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
> > > + * returned.
> > > + *
> > > + * NOTE: when writing is enabled it will replace the string, string lifetime is
> > > + * assumed to be RCU managed.
> > > + *
> > > + * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
> > > + * be returned.
> > > + */
> > > +struct dentry *debugfs_create_str(const char *name, umode_t mode,
> > > +				   struct dentry *parent, char **value)
> > 
> > Please have this return void, no need for me to have to clean up
> > afterward later on :)
> 
> That would make it inconsistent with debugfs_create_{bool,blob}() from
> which I copiously 'borrowed'.

As mentioned on IRC, I am trying to get rid of the return value, those
are the only 2 holdouts given their current use in some of the cruftier
areas of the kernel at the moment :(

> Also, the return dentry might be useful with debugfs_create_symlink(),
> but you're right in that most other create_file wrappers return void.

Great, change that and limit the size of the string that can be written
and it looks good to me, thanks for adding this.

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 11:30       ` Greg KH
@ 2021-03-26 11:38         ` Peter Zijlstra
  2021-03-26 12:18           ` Greg KH
  2021-03-26 12:53           ` Rasmus Villemoes
  0 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 11:38 UTC (permalink / raw)
  To: Greg KH
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 12:30:24PM +0100, Greg KH wrote:
> Great, change that and limit the size of the string that can be written
> and it looks good to me, thanks for adding this.

Here goes..

---
Subject: debugfs: Implement debugfs_create_str()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Mar 25 10:53:55 CET 2021

Implement debugfs_create_str() to easily display names and such in
debugfs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/debugfs/file.c       |  148 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/debugfs.h |   27 ++++++++
 2 files changed, 175 insertions(+)

--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -865,6 +865,154 @@ struct dentry *debugfs_create_bool(const
 }
 EXPORT_SYMBOL_GPL(debugfs_create_bool);
 
+ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos)
+{
+	struct dentry *dentry = F_DENTRY(file);
+	char *str, *copy = NULL;
+	int copy_len, len;
+	ssize_t ret;
+
+	ret = debugfs_file_get(dentry);
+	if (unlikely(ret))
+		return ret;
+
+again:
+	rcu_read_lock();
+	str = rcu_dereference(*(char **)file->private_data);
+	len = strlen(str) + 1;
+
+	if (!copy || copy_len < len) {
+		rcu_read_unlock();
+		kfree(copy);
+		copy = kmalloc(len + 1, GFP_KERNEL);
+		if (!copy) {
+			debugfs_file_put(dentry);
+			return -ENOMEM;
+		}
+		copy_len = len;
+		goto again;
+	}
+
+	strncpy(copy, str, len);
+	copy[len] = '\n';
+	copy[len+1] = '\0';
+	rcu_read_unlock();
+
+	debugfs_file_put(dentry);
+
+	ret = simple_read_from_buffer(user_buf, count, ppos, copy, len + 1);
+	kfree(copy);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(debugfs_read_file_str);
+
+ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
+			       size_t count, loff_t *ppos)
+{
+	struct dentry *dentry = F_DENTRY(file);
+	char *old, *new = NULL;
+	int pos = *ppos;
+	int r;
+
+	r = debugfs_file_get(dentry);
+	if (unlikely(r))
+		return r;
+
+	old = *(char **)file->private_data;
+
+	/* only allow strict concattenation */
+	r = -EINVAL;
+	if (pos && pos != strlen(old))
+		goto error;
+
+	r = -E2BIG;
+	if (pos + count + 1 > PAGE_SIZE)
+		goto error;
+
+	r = -ENOMEM;
+	new = kmalloc(pos + count + 1, GFP_KERNEL);
+	if (!new)
+		goto error;
+
+	if (pos)
+		memcpy(new, old, pos);
+
+	r = -EFAULT;
+	if (copy_from_user(new + pos, user_buf, count))
+		goto error;
+
+	new[pos + count] = '\0';
+	strim(new);
+
+	rcu_assign_pointer(*(char **)file->private_data, new);
+	synchronize_rcu();
+	kfree(old);
+
+	debugfs_file_put(dentry);
+	return count;
+
+error:
+	kfree(new);
+	debugfs_file_put(dentry);
+	return r;
+}
+EXPORT_SYMBOL_GPL(debugfs_write_file_str);
+
+static const struct file_operations fops_str = {
+	.read =		debugfs_read_file_str,
+	.write =	debugfs_write_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+static const struct file_operations fops_str_ro = {
+	.read =		debugfs_read_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+static const struct file_operations fops_str_wo = {
+	.write =	debugfs_write_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+/**
+ * debugfs_create_str - create a debugfs file that is used to read and write a string value
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @value: a pointer to the variable that the file should read to and write
+ *         from.
+ *
+ * This function creates a file in debugfs with the given name that
+ * contains the value of the variable @value.  If the @mode variable is so
+ * set, it can be read from, and written to.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ *
+ * NOTE: when writing is enabled it will replace the string, string lifetime is
+ * assumed to be RCU managed.
+ *
+ * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
+ * be returned.
+ */
+void debugfs_create_str(const char *name, umode_t mode,
+			struct dentry *parent, char **value)
+{
+	debugfs_create_mode_unsafe(name, mode, parent, value, &fops_str,
+				   &fops_str_ro, &fops_str_wo);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_str);
+
 static ssize_t read_file_blob(struct file *file, char __user *user_buf,
 			      size_t count, loff_t *ppos)
 {
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -128,6 +128,8 @@ void debugfs_create_atomic_t(const char
 			     struct dentry *parent, atomic_t *value);
 struct dentry *debugfs_create_bool(const char *name, umode_t mode,
 				  struct dentry *parent, bool *value);
+void debugfs_create_str(const char *name, umode_t mode,
+			struct dentry *parent, char **value);
 
 struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 				  struct dentry *parent,
@@ -156,6 +158,12 @@ ssize_t debugfs_read_file_bool(struct fi
 ssize_t debugfs_write_file_bool(struct file *file, const char __user *user_buf,
 				size_t count, loff_t *ppos);
 
+ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos);
+
+ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
+			       size_t count, loff_t *ppos);
+
 #else
 
 #include <linux/err.h>
@@ -297,6 +305,11 @@ static inline struct dentry *debugfs_cre
 	return ERR_PTR(-ENODEV);
 }
 
+static inline void debugfs_create_str(const char *name, umode_t mode,
+				      struct dentry *parent,
+				      char **value)
+{ }
+
 static inline struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 				  struct dentry *parent,
 				  struct debugfs_blob_wrapper *blob)
@@ -347,6 +360,20 @@ static inline ssize_t debugfs_write_file
 {
 	return -ENODEV;
 }
+
+static inline ssize_t debugfs_read_file_str(struct file *file,
+					    char __user *user_buf,
+					    size_t count, loff_t *ppos)
+{
+	return -ENODEV;
+}
+
+static inline ssize_t debugfs_write_file_str(struct file *file,
+					     const char __user *user_buf,
+					     size_t count, loff_t *ppos)
+{
+	return -ENODEV;
+}
 
 #endif
 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 9/9] sched,fair: Alternative sched_slice()
  2021-03-26 10:34 ` [PATCH 9/9] sched,fair: Alternative sched_slice() Peter Zijlstra
@ 2021-03-26 12:08   ` Dietmar Eggemann
  2021-03-26 14:07     ` Peter Zijlstra
  2021-03-26 15:37   ` Vincent Guittot
  1 sibling, 1 reply; 40+ messages in thread
From: Dietmar Eggemann @ 2021-03-26 12:08 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, greg

On 26/03/2021 11:34, Peter Zijlstra wrote:
> The current sched_slice() seems to have issues; there's two possible
> things that could be improved:
> 
>  - the 'nr_running' used for __sched_period() is daft when cgroups are
>    considered. Using the RQ wide h_nr_running seems like a much more
>    consistent number.
> 
>  - (esp) cgroups can slice it real fine, which makes for easy
>    over-scheduling, ensure min_gran is what the name says.

So ALT_PERIOD considers all runnable CFS tasks now and BASE_SLICE
guarantees min_gran as a floor for cgroup (hierarchies) with small
weight value(s)?

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/fair.c     |   15 ++++++++++++++-
>  kernel/sched/features.h |    3 +++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -680,7 +680,16 @@ static u64 __sched_period(unsigned long
>   */
>  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> +	unsigned int nr_running = cfs_rq->nr_running;
> +	u64 slice;
> +
> +	if (sched_feat(ALT_PERIOD))
> +		nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
> +
> +	slice = __sched_period(nr_running + !se->on_rq);
> +
> +	if (sched_feat(BASE_SLICE))
> +		slice -= sysctl_sched_min_granularity;
>  
>  	for_each_sched_entity(se) {
>  		struct load_weight *load;
> @@ -697,6 +706,10 @@ static u64 sched_slice(struct cfs_rq *cf
>  		}
>  		slice = __calc_delta(slice, se->load.weight, load);
>  	}
> +
> +	if (sched_feat(BASE_SLICE))
> +		slice += sysctl_sched_min_granularity;
> +
>  	return slice;
>  }
>  
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -90,3 +90,6 @@ SCHED_FEAT(WA_BIAS, true)
>   */
>  SCHED_FEAT(UTIL_EST, true)
>  SCHED_FEAT(UTIL_EST_FASTUP, true)
> +
> +SCHED_FEAT(ALT_PERIOD, true)
> +SCHED_FEAT(BASE_SLICE, true)
> 
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 11:38         ` [PATCH v2 " Peter Zijlstra
@ 2021-03-26 12:18           ` Greg KH
  2021-03-26 12:53           ` Rasmus Villemoes
  1 sibling, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-03-26 12:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On Fri, Mar 26, 2021 at 12:38:39PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 26, 2021 at 12:30:24PM +0100, Greg KH wrote:
> > Great, change that and limit the size of the string that can be written
> > and it looks good to me, thanks for adding this.
> 
> Here goes..

Great, except one tiny thing:

> + * This function will return a pointer to a dentry if it succeeds.  This
> + * pointer must be passed to the debugfs_remove() function when the file is
> + * to be removed (no automatic cleanup happens if your module is unloaded,
> + * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
> + * returned.
> + *
> + * NOTE: when writing is enabled it will replace the string, string lifetime is
> + * assumed to be RCU managed.
> + *
> + * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
> + * be returned.

Nothing is returned anymore so the top and bottom paragraphs here no
longer apply.

Fix that up and feel free to add:

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 11:38         ` [PATCH v2 " Peter Zijlstra
  2021-03-26 12:18           ` Greg KH
@ 2021-03-26 12:53           ` Rasmus Villemoes
  2021-03-26 12:57             ` Greg KH
  2021-03-26 14:22             ` Peter Zijlstra
  1 sibling, 2 replies; 40+ messages in thread
From: Rasmus Villemoes @ 2021-03-26 12:53 UTC (permalink / raw)
  To: Peter Zijlstra, Greg KH
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel

On 26/03/2021 12.38, Peter Zijlstra wrote:

> Implement debugfs_create_str() to easily display names and such in
> debugfs.

Patches 7-9 don't seem to add any users of this. What's it for precisely?

> +
> +again:
> +	rcu_read_lock();
> +	str = rcu_dereference(*(char **)file->private_data);
> +	len = strlen(str) + 1;
> +
> +	if (!copy || copy_len < len) {
> +		rcu_read_unlock();
> +		kfree(copy);
> +		copy = kmalloc(len + 1, GFP_KERNEL);
> +		if (!copy) {
> +			debugfs_file_put(dentry);
> +			return -ENOMEM;
> +		}
> +		copy_len = len;
> +		goto again;
> +	}
> +
> +	strncpy(copy, str, len);
> +	copy[len] = '\n';
> +	copy[len+1] = '\0';
> +	rcu_read_unlock();

As noted (accidentally off-list), this is broken. I think you want this
on top

- len = strlen(str) + 1;
+ len = strlen(str);

- strncpy(copy, str, len);
+ memcpy(copy, str, len);
  copy[len] = '\n';
- copy[len+1] = '\0';

> +EXPORT_SYMBOL_GPL(debugfs_read_file_str);

Why?

> +
> +ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
> +			       size_t count, loff_t *ppos)
> +{
> +	struct dentry *dentry = F_DENTRY(file);
> +	char *old, *new = NULL;
> +	int pos = *ppos;
> +	int r;
> +
> +	r = debugfs_file_get(dentry);
> +	if (unlikely(r))
> +		return r;
> +
> +	old = *(char **)file->private_data;
> +
> +	/* only allow strict concattenation */
> +	r = -EINVAL;
> +	if (pos && pos != strlen(old))
> +		goto error;

Other than the synchronize_rcu() below, I don't see any rcu stuff in
here. What prevents one task from kfree'ing old while another computes
its strlen()? Or two tasks from getting the same value of old and both
kfree'ing the same pointer?

And what part of the kernel periodically looks at some string and
decides something needs to be done? Shouldn't a write imply some
notification or callback? I can see the usefulness of the read part to
expose some otherwise-maintained string, but what good does allowing
writes do?

Rasmus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 12:53           ` Rasmus Villemoes
@ 2021-03-26 12:57             ` Greg KH
  2021-03-26 13:10               ` Rasmus Villemoes
  2021-03-26 14:22             ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: Greg KH @ 2021-03-26 12:57 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
> On 26/03/2021 12.38, Peter Zijlstra wrote:
> 
> > Implement debugfs_create_str() to easily display names and such in
> > debugfs.
> 
> Patches 7-9 don't seem to add any users of this. What's it for precisely?

It's used in patch 7, look close :)


> 
> > +
> > +again:
> > +	rcu_read_lock();
> > +	str = rcu_dereference(*(char **)file->private_data);
> > +	len = strlen(str) + 1;
> > +
> > +	if (!copy || copy_len < len) {
> > +		rcu_read_unlock();
> > +		kfree(copy);
> > +		copy = kmalloc(len + 1, GFP_KERNEL);
> > +		if (!copy) {
> > +			debugfs_file_put(dentry);
> > +			return -ENOMEM;
> > +		}
> > +		copy_len = len;
> > +		goto again;
> > +	}
> > +
> > +	strncpy(copy, str, len);
> > +	copy[len] = '\n';
> > +	copy[len+1] = '\0';
> > +	rcu_read_unlock();
> 
> As noted (accidentally off-list), this is broken. I think you want this
> on top
> 
> - len = strlen(str) + 1;
> + len = strlen(str);
> 
> - strncpy(copy, str, len);
> + memcpy(copy, str, len);
>   copy[len] = '\n';
> - copy[len+1] = '\0';
> 
> > +EXPORT_SYMBOL_GPL(debugfs_read_file_str);
> 
> Why?

Because there is a user of it?  Yes, it's not in a module, but to make
it easy, might as well export it now.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 12:57             ` Greg KH
@ 2021-03-26 13:10               ` Rasmus Villemoes
  2021-03-26 14:12                 ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Rasmus Villemoes @ 2021-03-26 13:10 UTC (permalink / raw)
  To: Greg KH, Rasmus Villemoes
  Cc: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On 26/03/2021 13.57, Greg KH wrote:
> On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
>> On 26/03/2021 12.38, Peter Zijlstra wrote:
>>
>>> Implement debugfs_create_str() to easily display names and such in
>>> debugfs.
>>
>> Patches 7-9 don't seem to add any users of this. What's it for precisely?
> 
> It's used in patch 7, look close :)

Ah, macrology. But the write capability doesn't seem used, and I (still)
fail to see how that could be useful.

>>> +EXPORT_SYMBOL_GPL(debugfs_read_file_str);
>>
>> Why?
> 
> Because there is a user of it?  Yes, it's not in a module, but to make
> it easy, might as well export it now.

No, I was asking why the individual read (and write) methods would need
to be exported. They have even less reason then debugfs_create_str() -
which I also think shouldn't be exported until a modular user shows up.

Rasmus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs
  2021-03-26 10:33 ` [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs Peter Zijlstra
@ 2021-03-26 13:11   ` Dietmar Eggemann
  2021-04-07 10:46   ` Valentin Schneider
  1 sibling, 0 replies; 40+ messages in thread
From: Dietmar Eggemann @ 2021-03-26 13:11 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	rostedt, bsegall, bristot, joshdon, valentin.schneider
  Cc: linux-kernel, greg

On 26/03/2021 11:33, Peter Zijlstra wrote:
> Stop polluting sysctl, move to debugfs for SCHED_DEBUG stuff.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/debug.c    |  255 ++++++++++--------------------------------------
>  kernel/sched/sched.h    |    2 
>  kernel/sched/topology.c |    1 
>  3 files changed, 59 insertions(+), 199 deletions(-)

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

[...]

> +#define SDM(type, mode, member)	\
> +	debugfs_create_##type(#member, mode, parent, &sd->member)
>  
> -		WARN_ON(sd_ctl_dir[0].child);
> -		sd_ctl_dir[0].child = cpu_entries;
> -	}
> +	SDM(ulong, 0644, min_interval);
> +	SDM(ulong, 0644, max_interval);
> +	SDM(u64,   0644, max_newidle_lb_cost);
> +	SDM(u32,   0644, busy_factor);
> +	SDM(u32,   0644, imbalance_pct);
> +	SDM(u32,   0644, cache_nice_tries);
> +//	SDM(x32,   0444, flags);

Can be removed.

> +	SDM(str,   0444, name);
>  
> -	if (!cpu_idx) {
> -		struct ctl_table *e = cpu_entries;
> +#undef SDM
>  
> -		cpu_idx = kcalloc(nr_cpu_ids, sizeof(struct ctl_table*), GFP_KERNEL);
> -		if (!cpu_idx)
> -			return;
> +	debugfs_create_file("flags", 0444, parent, &sd->flags, &sd_flags_fops);
> +}

[...]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 9/9] sched,fair: Alternative sched_slice()
  2021-03-26 12:08   ` Dietmar Eggemann
@ 2021-03-26 14:07     ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 14:07 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, rostedt, bsegall,
	bristot, joshdon, valentin.schneider, linux-kernel, greg

On Fri, Mar 26, 2021 at 01:08:44PM +0100, Dietmar Eggemann wrote:
> On 26/03/2021 11:34, Peter Zijlstra wrote:
> > The current sched_slice() seems to have issues; there's two possible
> > things that could be improved:
> > 
> >  - the 'nr_running' used for __sched_period() is daft when cgroups are
> >    considered. Using the RQ wide h_nr_running seems like a much more
> >    consistent number.
> > 
> >  - (esp) cgroups can slice it real fine, which makes for easy
> >    over-scheduling, ensure min_gran is what the name says.
> 
> So ALT_PERIOD considers all runnable CFS tasks now and BASE_SLICE
> guarantees min_gran as a floor for cgroup (hierarchies) with small
> weight value(s)?

Pretty much.

The previous cfs_rq->nr_running is just how many runnable thingies there
are in whatever cgroup you happen to be in on our CPU, not counting its
child cgroups nor whatever is upwards. Which is a pretty arbitrary
value.

By always using h_nr_running of the root, we get a consistent number and
the period is the same for all tasks on the CPU.

And yes, low weight cgroups, or even just a nice -20 and 19 task
together would result in *tiny* slices, which then leads to
over-scheduling. So by only scaling the part between period and
min_gran, we still get a variable slice, but also avoid the worst cases.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 13:10               ` Rasmus Villemoes
@ 2021-03-26 14:12                 ` Peter Zijlstra
  2021-03-26 14:19                   ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 14:12 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Greg KH, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On Fri, Mar 26, 2021 at 02:10:41PM +0100, Rasmus Villemoes wrote:
> On 26/03/2021 13.57, Greg KH wrote:
> > On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
> >> On 26/03/2021 12.38, Peter Zijlstra wrote:
> >>
> >>> Implement debugfs_create_str() to easily display names and such in
> >>> debugfs.
> >>
> >> Patches 7-9 don't seem to add any users of this. What's it for precisely?
> > 
> > It's used in patch 7, look close :)
> 
> Ah, macrology. But the write capability doesn't seem used, and I (still)
> fail to see how that could be useful.

Correct, it isn't used. But I didn't think it would be acceptible to
not implement the write side. OTOH, if Greg would be okay with it, I can
change it to:

ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
                               size_t count, loff_t *ppos)
{
        /* This is really only for read-only strings */
        return -EINVAL;
}
EXPORT_SYMBOL_GPL(debugfs_write_file_str);

That's certianly simpler.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 14:12                 ` Peter Zijlstra
@ 2021-03-26 14:19                   ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-03-26 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rasmus Villemoes, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On Fri, Mar 26, 2021 at 03:12:35PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 26, 2021 at 02:10:41PM +0100, Rasmus Villemoes wrote:
> > On 26/03/2021 13.57, Greg KH wrote:
> > > On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
> > >> On 26/03/2021 12.38, Peter Zijlstra wrote:
> > >>
> > >>> Implement debugfs_create_str() to easily display names and such in
> > >>> debugfs.
> > >>
> > >> Patches 7-9 don't seem to add any users of this. What's it for precisely?
> > > 
> > > It's used in patch 7, look close :)
> > 
> > Ah, macrology. But the write capability doesn't seem used, and I (still)
> > fail to see how that could be useful.
> 
> Correct, it isn't used. But I didn't think it would be acceptible to
> not implement the write side. OTOH, if Greg would be okay with it, I can
> change it to:
> 
> ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
>                                size_t count, loff_t *ppos)
> {
>         /* This is really only for read-only strings */
>         return -EINVAL;
> }
> EXPORT_SYMBOL_GPL(debugfs_write_file_str);
> 
> That's certianly simpler.

Fine with me.  Might as well make it static and not exported as well if
you are touching it :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 12:53           ` Rasmus Villemoes
  2021-03-26 12:57             ` Greg KH
@ 2021-03-26 14:22             ` Peter Zijlstra
  2021-03-26 14:58               ` Rasmus Villemoes
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 14:22 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Greg KH, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
> On 26/03/2021 12.38, Peter Zijlstra wrote:

> > +
> > +again:
> > +	rcu_read_lock();
> > +	str = rcu_dereference(*(char **)file->private_data);
> > +	len = strlen(str) + 1;
> > +
> > +	if (!copy || copy_len < len) {
> > +		rcu_read_unlock();
> > +		kfree(copy);
> > +		copy = kmalloc(len + 1, GFP_KERNEL);
> > +		if (!copy) {
> > +			debugfs_file_put(dentry);
> > +			return -ENOMEM;
> > +		}
> > +		copy_len = len;
> > +		goto again;
> > +	}
> > +
> > +	strncpy(copy, str, len);
> > +	copy[len] = '\n';
> > +	copy[len+1] = '\0';
> > +	rcu_read_unlock();
> 
> As noted (accidentally off-list), this is broken. I think you want this
> on top
> 
> - len = strlen(str) + 1;
> + len = strlen(str);

  kmalloc(len + 2, ...);

> - strncpy(copy, str, len);
> + memcpy(copy, str, len);
>   copy[len] = '\n';
> - copy[len+1] = '\0';

I'll go with strscpy() I tihnk, something like:

	len = strscpy(copy, str, len);
	if (len < 0)
		return len;
	copy[len] = '\n';
	copy[len + 1] = '\0';

> > +EXPORT_SYMBOL_GPL(debugfs_read_file_str);
> 
> Why?

Copy-pasta from debugfs_*_bool(). This thing seems to export everything
and I figured I'd go along with that.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 10:33 ` [PATCH 6/9] debugfs: Implement debugfs_create_str() Peter Zijlstra
  2021-03-26 11:05   ` Greg KH
@ 2021-03-26 14:50   ` Peter Zijlstra
  2021-03-27 10:42     ` Greg KH
  2021-03-27 22:24   ` [PATCH " Al Viro
  2 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 14:50 UTC (permalink / raw)
  To: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider, linux
  Cc: linux-kernel, greg

Subject: debugfs: Implement debugfs_create_str()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Mar 25 10:53:55 CET 2021

Implement debugfs_create_str() to easily display names and such in
debugfs.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/debugfs/file.c       |   94 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/debugfs.h |   17 ++++++++
 2 files changed, 111 insertions(+)

--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -865,6 +865,100 @@ struct dentry *debugfs_create_bool(const
 }
 EXPORT_SYMBOL_GPL(debugfs_create_bool);
 
+ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos)
+{
+	struct dentry *dentry = F_DENTRY(file);
+	char *str, *copy = NULL;
+	int copy_len, len;
+	ssize_t ret;
+
+	ret = debugfs_file_get(dentry);
+	if (unlikely(ret))
+		return ret;
+
+	str = *(char **)file->private_data;
+	len = strlen(str) + 1;
+	copy = kmalloc(len + 1, GFP_KERNEL);
+	if (!copy) {
+		debugfs_file_put(dentry);
+		return -ENOMEM;
+	}
+
+	copy_len = strscpy(copy, str, len);
+	debugfs_file_put(dentry);
+	if (copy_len < 0) {
+		kfree(copy);
+		return copy_len;
+	}
+
+	copy[copy_len]     = '\n';
+	copy[copy_len + 1] = '\0';
+
+	ret = simple_read_from_buffer(user_buf, count, ppos, copy, copy_len + 1);
+	kfree(copy);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(debugfs_read_file_str);
+
+static ssize_t debugfs_write_file_str(struct file *file, const char __user *user_buf,
+				      size_t count, loff_t *ppos)
+{
+	/* This is really only for read-only strings */
+	return -EINVAL;
+}
+
+static const struct file_operations fops_str = {
+	.read =		debugfs_read_file_str,
+	.write =	debugfs_write_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+static const struct file_operations fops_str_ro = {
+	.read =		debugfs_read_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+static const struct file_operations fops_str_wo = {
+	.write =	debugfs_write_file_str,
+	.open =		simple_open,
+	.llseek =	default_llseek,
+};
+
+/**
+ * debugfs_create_str - create a debugfs file that is used to read and write a string value
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @value: a pointer to the variable that the file should read to and write
+ *         from.
+ *
+ * This function creates a file in debugfs with the given name that
+ * contains the value of the variable @value.  If the @mode variable is so
+ * set, it can be read from, and written to.
+ *
+ * This function will return a pointer to a dentry if it succeeds.  This
+ * pointer must be passed to the debugfs_remove() function when the file is
+ * to be removed (no automatic cleanup happens if your module is unloaded,
+ * you are responsible here.)  If an error occurs, ERR_PTR(-ERROR) will be
+ * returned.
+ *
+ * If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will
+ * be returned.
+ */
+void debugfs_create_str(const char *name, umode_t mode,
+			struct dentry *parent, char **value)
+{
+	debugfs_create_mode_unsafe(name, mode, parent, value, &fops_str,
+				   &fops_str_ro, &fops_str_wo);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_str);
+
 static ssize_t read_file_blob(struct file *file, char __user *user_buf,
 			      size_t count, loff_t *ppos)
 {
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -128,6 +128,8 @@ void debugfs_create_atomic_t(const char
 			     struct dentry *parent, atomic_t *value);
 struct dentry *debugfs_create_bool(const char *name, umode_t mode,
 				  struct dentry *parent, bool *value);
+void debugfs_create_str(const char *name, umode_t mode,
+			struct dentry *parent, char **value);
 
 struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 				  struct dentry *parent,
@@ -156,6 +158,9 @@ ssize_t debugfs_read_file_bool(struct fi
 ssize_t debugfs_write_file_bool(struct file *file, const char __user *user_buf,
 				size_t count, loff_t *ppos);
 
+ssize_t debugfs_read_file_str(struct file *file, char __user *user_buf,
+			      size_t count, loff_t *ppos);
+
 #else
 
 #include <linux/err.h>
@@ -297,6 +302,11 @@ static inline struct dentry *debugfs_cre
 	return ERR_PTR(-ENODEV);
 }
 
+static inline void debugfs_create_str(const char *name, umode_t mode,
+				      struct dentry *parent,
+				      char **value)
+{ }
+
 static inline struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 				  struct dentry *parent,
 				  struct debugfs_blob_wrapper *blob)
@@ -347,6 +357,13 @@ static inline ssize_t debugfs_write_file
 {
 	return -ENODEV;
 }
+
+static inline ssize_t debugfs_read_file_str(struct file *file,
+					    char __user *user_buf,
+					    size_t count, loff_t *ppos)
+{
+	return -ENODEV;
+}
 
 #endif
 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 14:22             ` Peter Zijlstra
@ 2021-03-26 14:58               ` Rasmus Villemoes
  2021-03-26 15:19                 ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Rasmus Villemoes @ 2021-03-26 14:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On 26/03/2021 15.22, Peter Zijlstra wrote:
> On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
>> On 26/03/2021 12.38, Peter Zijlstra wrote:
> 
>>> +
>>> +again:
>>> +	rcu_read_lock();
>>> +	str = rcu_dereference(*(char **)file->private_data);
>>> +	len = strlen(str) + 1;
>>> +
>>> +	if (!copy || copy_len < len) {
>>> +		rcu_read_unlock();
>>> +		kfree(copy);
>>> +		copy = kmalloc(len + 1, GFP_KERNEL);
>>> +		if (!copy) {
>>> +			debugfs_file_put(dentry);
>>> +			return -ENOMEM;
>>> +		}
>>> +		copy_len = len;
>>> +		goto again;
>>> +	}
>>> +
>>> +	strncpy(copy, str, len);
>>> +	copy[len] = '\n';
>>> +	copy[len+1] = '\0';
>>> +	rcu_read_unlock();
>>
>> As noted (accidentally off-list), this is broken. I think you want this
>> on top
>>
>> - len = strlen(str) + 1;
>> + len = strlen(str);
> 
>   kmalloc(len + 2, ...);

No, because nul-terminating the stuff you pass to
simple_read_from_buffer is pointless cargo-culting. Yeah, read_file_bool
does it, but that's just bogus.

>> - strncpy(copy, str, len);
>> + memcpy(copy, str, len);
>>   copy[len] = '\n';
>> - copy[len+1] = '\0';
> 
> I'll go with strscpy() I tihnk, something like:
> 
> 	len = strscpy(copy, str, len);
> 	if (len < 0)
> 		return len;

To what end? The only way that could possibly return -EFOO is if the
nul-terminator in str vanished between the strlen() and here, and in
that case you have bigger problems.

> 	copy[len] = '\n';
> 	copy[len + 1] = '\0';
> 
>>> +EXPORT_SYMBOL_GPL(debugfs_read_file_str);
>>
>> Why?
> 
> Copy-pasta from debugfs_*_bool(). This thing seems to export everything
> and I figured I'd go along with that.

I thought the convention was not to export anything until the kernel
itself had a (modular) user. But I can't find that stated under
Documentation/ anywhere - but it does say "Every function that is
exported to loadable modules using
``EXPORT_SYMBOL`` or ``EXPORT_SYMBOL_GPL`` should have a kernel-doc
comment.". Anyway, the *_bool stuff doesn't seem to be something to be
copy-pasted.

Rasmus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 14:58               ` Rasmus Villemoes
@ 2021-03-26 15:19                 ` Peter Zijlstra
  2021-03-27 10:41                   ` Greg KH
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 15:19 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Greg KH, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On Fri, Mar 26, 2021 at 03:58:37PM +0100, Rasmus Villemoes wrote:
> >   kmalloc(len + 2, ...);
> 
> No, because nul-terminating the stuff you pass to
> simple_read_from_buffer is pointless cargo-culting. Yeah, read_file_bool
> does it, but that's just bogus.

Urgh, feel yuck to not have it zero terminated, but if you feel strongly
about it I suppose I can make it go away.

> > 	len = strscpy(copy, str, len);
> > 	if (len < 0)
> > 		return len;
> 
> To what end? The only way that could possibly return -EFOO is if the
> nul-terminator in str vanished between the strlen() and here, and in
> that case you have bigger problems.

There are strings in the kernel which we rewrite in most ugly ways,
task_struct::comm comes to mind. Best be paranoid.

> > Copy-pasta from debugfs_*_bool(). This thing seems to export everything
> > and I figured I'd go along with that.
> 
> I thought the convention was not to export anything until the kernel
> itself had a (modular) user.

That's generally how I feel too. But this really isn't my subsystem so I
more or less try to mimmick what I see done there.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 9/9] sched,fair: Alternative sched_slice()
  2021-03-26 10:34 ` [PATCH 9/9] sched,fair: Alternative sched_slice() Peter Zijlstra
  2021-03-26 12:08   ` Dietmar Eggemann
@ 2021-03-26 15:37   ` Vincent Guittot
  2021-03-26 18:30     ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: Vincent Guittot @ 2021-03-26 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Mel Gorman, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira, Josh Don,
	Valentin Schneider, linux-kernel, greg

On Fri, 26 Mar 2021 at 11:43, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The current sched_slice() seems to have issues; there's two possible
> things that could be improved:
>
>  - the 'nr_running' used for __sched_period() is daft when cgroups are
>    considered. Using the RQ wide h_nr_running seems like a much more
>    consistent number.
>
>  - (esp) cgroups can slice it real fine, which makes for easy
>    over-scheduling, ensure min_gran is what the name says.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/fair.c     |   15 ++++++++++++++-
>  kernel/sched/features.h |    3 +++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -680,7 +680,16 @@ static u64 __sched_period(unsigned long
>   */
>  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -       u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> +       unsigned int nr_running = cfs_rq->nr_running;
> +       u64 slice;
> +
> +       if (sched_feat(ALT_PERIOD))
> +               nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
> +
> +       slice = __sched_period(nr_running + !se->on_rq);
> +
> +       if (sched_feat(BASE_SLICE))
> +               slice -= sysctl_sched_min_granularity;
>
>         for_each_sched_entity(se) {
>                 struct load_weight *load;
> @@ -697,6 +706,10 @@ static u64 sched_slice(struct cfs_rq *cf
>                 }
>                 slice = __calc_delta(slice, se->load.weight, load);
>         }
> +
> +       if (sched_feat(BASE_SLICE))
> +               slice += sysctl_sched_min_granularity;

Why not only doing a max of slice and sysctl_sched_min_granularity
instead of scaling only the part above sysctl_sched_min_granularity ?

With your change, cases where the slices would have been in a good
range already, will be modified as well

> +
>         return slice;
>  }
>
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -90,3 +90,6 @@ SCHED_FEAT(WA_BIAS, true)
>   */
>  SCHED_FEAT(UTIL_EST, true)
>  SCHED_FEAT(UTIL_EST_FASTUP, true)
> +
> +SCHED_FEAT(ALT_PERIOD, true)
> +SCHED_FEAT(BASE_SLICE, true)
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 9/9] sched,fair: Alternative sched_slice()
  2021-03-26 15:37   ` Vincent Guittot
@ 2021-03-26 18:30     ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-03-26 18:30 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Mel Gorman, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira, Josh Don,
	Valentin Schneider, linux-kernel, greg

On Fri, Mar 26, 2021 at 04:37:03PM +0100, Vincent Guittot wrote:
> On Fri, 26 Mar 2021 at 11:43, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > The current sched_slice() seems to have issues; there's two possible
> > things that could be improved:
> >
> >  - the 'nr_running' used for __sched_period() is daft when cgroups are
> >    considered. Using the RQ wide h_nr_running seems like a much more
> >    consistent number.
> >
> >  - (esp) cgroups can slice it real fine, which makes for easy
> >    over-scheduling, ensure min_gran is what the name says.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  kernel/sched/fair.c     |   15 ++++++++++++++-
> >  kernel/sched/features.h |    3 +++
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -680,7 +680,16 @@ static u64 __sched_period(unsigned long
> >   */
> >  static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > -       u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
> > +       unsigned int nr_running = cfs_rq->nr_running;
> > +       u64 slice;
> > +
> > +       if (sched_feat(ALT_PERIOD))
> > +               nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
> > +
> > +       slice = __sched_period(nr_running + !se->on_rq);
> > +
> > +       if (sched_feat(BASE_SLICE))
> > +               slice -= sysctl_sched_min_granularity;
> >
> >         for_each_sched_entity(se) {
> >                 struct load_weight *load;
> > @@ -697,6 +706,10 @@ static u64 sched_slice(struct cfs_rq *cf
> >                 }
> >                 slice = __calc_delta(slice, se->load.weight, load);
> >         }
> > +
> > +       if (sched_feat(BASE_SLICE))
> > +               slice += sysctl_sched_min_granularity;
> 
> Why not only doing a max of slice and sysctl_sched_min_granularity
> instead of scaling only the part above sysctl_sched_min_granularity ?
> 
> With your change, cases where the slices would have been in a good
> range already, will be modified as well

Can do I suppose. Not sure how I ended up with this.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 15:19                 ` Peter Zijlstra
@ 2021-03-27 10:41                   ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-03-27 10:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rasmus Villemoes, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon,
	valentin.schneider, linux-kernel

On Fri, Mar 26, 2021 at 04:19:12PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 26, 2021 at 03:58:37PM +0100, Rasmus Villemoes wrote:
> > >   kmalloc(len + 2, ...);
> > 
> > No, because nul-terminating the stuff you pass to
> > simple_read_from_buffer is pointless cargo-culting. Yeah, read_file_bool
> > does it, but that's just bogus.
> 
> Urgh, feel yuck to not have it zero terminated, but if you feel strongly
> about it I suppose I can make it go away.
> 
> > > 	len = strscpy(copy, str, len);
> > > 	if (len < 0)
> > > 		return len;
> > 
> > To what end? The only way that could possibly return -EFOO is if the
> > nul-terminator in str vanished between the strlen() and here, and in
> > that case you have bigger problems.
> 
> There are strings in the kernel which we rewrite in most ugly ways,
> task_struct::comm comes to mind. Best be paranoid.
> 
> > > Copy-pasta from debugfs_*_bool(). This thing seems to export everything
> > > and I figured I'd go along with that.
> > 
> > I thought the convention was not to export anything until the kernel
> > itself had a (modular) user.
> 
> That's generally how I feel too. But this really isn't my subsystem so I
> more or less try to mimmick what I see done there.

I'll take some time next week and go through and remove any exports in
debugfs that are not actually needed for exports as it's been a while
since I last looked...

But that should not affect this change, it's fine to me as-is.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 14:50   ` [PATCH v3 " Peter Zijlstra
@ 2021-03-27 10:42     ` Greg KH
  0 siblings, 0 replies; 40+ messages in thread
From: Greg KH @ 2021-03-27 10:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider, linux,
	linux-kernel

On Fri, Mar 26, 2021 at 03:50:00PM +0100, Peter Zijlstra wrote:
> Subject: debugfs: Implement debugfs_create_str()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu Mar 25 10:53:55 CET 2021
> 
> Implement debugfs_create_str() to easily display names and such in
> debugfs.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/9] debugfs: Implement debugfs_create_str()
  2021-03-26 10:33 ` [PATCH 6/9] debugfs: Implement debugfs_create_str() Peter Zijlstra
  2021-03-26 11:05   ` Greg KH
  2021-03-26 14:50   ` [PATCH v3 " Peter Zijlstra
@ 2021-03-27 22:24   ` Al Viro
  2021-03-28  0:33     ` Steven Rostedt
  2 siblings, 1 reply; 40+ messages in thread
From: Al Viro @ 2021-03-27 22:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel, greg

On Fri, Mar 26, 2021 at 11:33:58AM +0100, Peter Zijlstra wrote:

> +again:
> +	rcu_read_lock();
> +	str = rcu_dereference(*(char **)file->private_data);
> +	len = strlen(str) + 1;
> +
> +	if (!copy || copy_len < len) {
> +		rcu_read_unlock();
> +		kfree(copy);
> +		copy = kmalloc(len + 1, GFP_KERNEL);
> +		if (!copy) {
> +			debugfs_file_put(dentry);
> +			return -ENOMEM;
> +		}
> +		copy_len = len;
> +		goto again;
> +	}
> +
> +	strncpy(copy, str, len);
> +	copy[len] = '\n';
> +	copy[len+1] = '\0';
> +	rcu_read_unlock();

*Ow*

	If the string can't change under you, what is RCU use about?
And if it can, any use of string functions is asking for serious
trouble; they are *not* guaranteed to be safe when any of the strings
involved might be modified under them.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 6/9] debugfs: Implement debugfs_create_str()
  2021-03-27 22:24   ` [PATCH " Al Viro
@ 2021-03-28  0:33     ` Steven Rostedt
  0 siblings, 0 replies; 40+ messages in thread
From: Steven Rostedt @ 2021-03-28  0:33 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, bsegall, bristot, joshdon, valentin.schneider,
	linux-kernel, greg

On Sat, 27 Mar 2021 22:24:45 +0000
Al Viro <viro@zeniv.linux.org.uk> wrote:

> On Fri, Mar 26, 2021 at 11:33:58AM +0100, Peter Zijlstra wrote:
> 
> > +again:
> > +	rcu_read_lock();
> > +	str = rcu_dereference(*(char **)file->private_data);
> > +	len = strlen(str) + 1;
> > +
> > +	if (!copy || copy_len < len) {
> > +		rcu_read_unlock();
> > +		kfree(copy);
> > +		copy = kmalloc(len + 1, GFP_KERNEL);
> > +		if (!copy) {
> > +			debugfs_file_put(dentry);
> > +			return -ENOMEM;
> > +		}
> > +		copy_len = len;
> > +		goto again;
> > +	}
> > +
> > +	strncpy(copy, str, len);
> > +	copy[len] = '\n';
> > +	copy[len+1] = '\0';
> > +	rcu_read_unlock();  
> 
> *Ow*
> 
> 	If the string can't change under you, what is RCU use about?
> And if it can, any use of string functions is asking for serious
> trouble; they are *not* guaranteed to be safe when any of the strings
> involved might be modified under them.

Just from looking at the above, RCU isn't protecting that the string
can change under you, but the pointer to file->private_data can.

	str = rcu_dereference(*(char **)file->private_data);

That's just getting a pointer to the string. While under rcu, the value
of that string wont change nor will it be free. But file->private_data
might change, and it might free its old value, but will do so after a
RCU grace period (which is why the above has rcu_read_lock).

What the above looks like to me is a way to copy that string safely,
without worrying that it will be freed underneath you. But there's no
worry that it will change.

-- Steve

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs
  2021-03-26 10:33 ` [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs Peter Zijlstra
  2021-03-26 13:11   ` Dietmar Eggemann
@ 2021-04-07 10:46   ` Valentin Schneider
  2021-04-07 12:18     ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2021-04-07 10:46 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon
  Cc: linux-kernel, peterz, greg

On 26/03/21 11:33, Peter Zijlstra wrote:
> Stop polluting sysctl, move to debugfs for SCHED_DEBUG stuff.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/debug.c    |  255 ++++++++++--------------------------------------
>  kernel/sched/sched.h    |    2
>  kernel/sched/topology.c |    1
>  3 files changed, 59 insertions(+), 199 deletions(-)
>

I do very much like to see a simple pair of seq_puts() replacing the
mess I put in there!

One comment below.

> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> +void register_sched_domain_sysctl(void)
> +{
> +	int cpu, i;
>
>       if (!cpumask_available(sd_sysctl_cpus)) {
>               if (!alloc_cpumask_var(&sd_sysctl_cpus, GFP_KERNEL))
>                       return;
> -	}
> -
> -	if (!init_done) {
> -		init_done = true;
> -		/* init to possible to not have holes in @cpu_entries */
>               cpumask_copy(sd_sysctl_cpus, cpu_possible_mask);
>       }
>
> -	for_each_cpu(i, sd_sysctl_cpus) {
> -		struct ctl_table *e = cpu_idx[i];
> +	if (!sd_dentry)
> +		sd_dentry = debugfs_create_dir("domains", debugfs_sched);
>
> -		if (e->child)
> -			sd_free_ctl_entry(&e->child);
> +	for_each_cpu(cpu, sd_sysctl_cpus) {
> +		struct sched_domain *sd;
> +		struct dentry *d_cpu;
> +		char buf[32];
> +
> +		snprintf(buf, sizeof(buf), "cpu%d", cpu);
> +		debugfs_remove(debugfs_lookup(buf, sd_dentry));
> +		d_cpu = debugfs_create_dir(buf, sd_dentry);
> +
> +		i = 0;
> +		for_each_domain(cpu, sd) {
> +			struct dentry *d_sd;
>
> -		if (!e->procname) {
> -			snprintf(buf, 32, "cpu%d", i);
> -			e->procname = kstrdup(buf, GFP_KERNEL);
> -		}
> -		e->mode = 0555;
> -		e->child = sd_alloc_ctl_cpu_table(i);
> +			snprintf(buf, sizeof(buf), "domain%d", i);
> +			d_sd = debugfs_create_dir(buf, d_cpu);
>
> -		__cpumask_clear_cpu(i, sd_sysctl_cpus);

That seems to be the only place we cleared a CPU in that cpumask, and I
don't see its replacement, which would go against:

  bbdacdfed2f5 ("sched/debug: Optimize sched_domain sysctl generation")

With my very limited understanding of debugfs and sysctl, it seems that
before we would have some stuff saved in sd_ctl_table and free/reinit just
the bits we needed. With debugfs_remove(), I think we wipe everything
clean, or did I read that wrong?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs
  2021-03-26 10:33 ` [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs Peter Zijlstra
  2021-03-26 11:06   ` Greg KH
@ 2021-04-07 10:46   ` Valentin Schneider
  2021-04-07 12:26     ` Peter Zijlstra
  1 sibling, 1 reply; 40+ messages in thread
From: Valentin Schneider @ 2021-04-07 10:46 UTC (permalink / raw)
  To: Peter Zijlstra, mingo, mgorman, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, bristot, joshdon
  Cc: linux-kernel, peterz, greg

On 26/03/21 11:33, Peter Zijlstra wrote:
>  __read_mostly bool sched_debug_enabled;
>  
> +struct dentry *debugfs_sched;
> +
>  static __init int sched_init_debug(void)
>  {
> -	debugfs_create_file("sched_features", 0644, NULL, NULL,
> -			&sched_feat_fops);
> +	struct dentry __maybe_unused *numa;
> +
> +	debugfs_sched = debugfs_create_dir("sched", NULL);
> +
> +	debugfs_create_file("features", 0644, debugfs_sched, NULL, &sched_feat_fops);
> +	debugfs_create_bool("debug_enabled", 0644, debugfs_sched, &sched_debug_enabled);
> +

Could we kill this too? I'm probably biased because I've spent some amount
of time banging my head at topology problems, but this two-tiered debugging
setup (KCONFIG + cmdline or post-boot write) has always irked me.

I can't find the threads in a hurry, but ISTR justifications for keeping
this around were:
- Most distros have CONFIG_SCHED_DEBUG=y because knobs and ponies
- Topology debug prints are "too verbose"
- NUMA distance matrix processing gets slower

If we make it so distros stop / don't need to select CONFIG_SCHED_DEBUG,
then I don't think the above really stands anymore (also, sched_init_numa()
now has the same complexity regardless of sched_debug), and we could keep
everything under CONFIG_SCHED_DEBUG.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs
  2021-04-07 10:46   ` Valentin Schneider
@ 2021-04-07 12:18     ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2021-04-07 12:18 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, linux-kernel, greg

On Wed, Apr 07, 2021 at 11:46:32AM +0100, Valentin Schneider wrote:

> > -		__cpumask_clear_cpu(i, sd_sysctl_cpus);
> 
> That seems to be the only place we cleared a CPU in that cpumask, and I
> don't see its replacement

Yeah, oops :-) Lemme go fix that. Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs
  2021-04-07 10:46   ` Valentin Schneider
@ 2021-04-07 12:26     ` Peter Zijlstra
  2021-04-07 12:57       ` Valentin Schneider
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2021-04-07 12:26 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, linux-kernel, greg

On Wed, Apr 07, 2021 at 11:46:43AM +0100, Valentin Schneider wrote:
> On 26/03/21 11:33, Peter Zijlstra wrote:
> >  __read_mostly bool sched_debug_enabled;
> >  
> > +struct dentry *debugfs_sched;
> > +
> >  static __init int sched_init_debug(void)
> >  {
> > -	debugfs_create_file("sched_features", 0644, NULL, NULL,
> > -			&sched_feat_fops);
> > +	struct dentry __maybe_unused *numa;
> > +
> > +	debugfs_sched = debugfs_create_dir("sched", NULL);
> > +
> > +	debugfs_create_file("features", 0644, debugfs_sched, NULL, &sched_feat_fops);
> > +	debugfs_create_bool("debug_enabled", 0644, debugfs_sched, &sched_debug_enabled);
> > +
> 
> Could we kill this too? I'm probably biased because I've spent some amount
> of time banging my head at topology problems, but this two-tiered debugging
> setup (KCONFIG + cmdline or post-boot write) has always irked me.
> 
> I can't find the threads in a hurry, but ISTR justifications for keeping
> this around were:
> - Most distros have CONFIG_SCHED_DEBUG=y because knobs and ponies
> - Topology debug prints are "too verbose"

^^ that mostly.

> - NUMA distance matrix processing gets slower
> 
> If we make it so distros stop / don't need to select CONFIG_SCHED_DEBUG,

We're not there yet, I think :-(

> then I don't think the above really stands anymore (also, sched_init_numa()
> now has the same complexity regardless of sched_debug), and we could keep
> everything under CONFIG_SCHED_DEBUG.

But yes, the reason this knob exists is ebcause I too frequently forget
to add the boot time knob, so I added this one to enable it at runtime
and then I get topology prints when I hotplug cycle a cpu.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs
  2021-04-07 12:26     ` Peter Zijlstra
@ 2021-04-07 12:57       ` Valentin Schneider
  0 siblings, 0 replies; 40+ messages in thread
From: Valentin Schneider @ 2021-04-07 12:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, mgorman, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, bristot, joshdon, linux-kernel, greg

On 07/04/21 14:26, Peter Zijlstra wrote:
> On Wed, Apr 07, 2021 at 11:46:43AM +0100, Valentin Schneider wrote:
>> I can't find the threads in a hurry, but ISTR justifications for keeping
>> this around were:
>> - Most distros have CONFIG_SCHED_DEBUG=y because knobs and ponies
>> - Topology debug prints are "too verbose"
>
> ^^ that mostly.
>
>> - NUMA distance matrix processing gets slower
>>
>> If we make it so distros stop / don't need to select CONFIG_SCHED_DEBUG,
>
> We're not there yet, I think :-(
>

Yeah, I know, a man can dream... At least now I can make use of
CMDLINE_EXTEND on arm64.

>> then I don't think the above really stands anymore (also, sched_init_numa()
>> now has the same complexity regardless of sched_debug), and we could keep
>> everything under CONFIG_SCHED_DEBUG.
>
> But yes, the reason this knob exists is ebcause I too frequently forget
> to add the boot time knob, so I added this one to enable it at runtime
> and then I get topology prints when I hotplug cycle a cpu.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-04-07 12:57 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-26 10:33 [PATCH 0/9] sched: Clean up SCHED_DEBUG Peter Zijlstra
2021-03-26 10:33 ` [PATCH 1/9] sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG Peter Zijlstra
2021-03-26 10:33 ` [PATCH 2/9] sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG Peter Zijlstra
2021-03-26 10:33 ` [PATCH 3/9] sched: Dont make LATENCYTOP select SCHED_DEBUG Peter Zijlstra
2021-03-26 10:33 ` [PATCH 4/9] sched: Move SCHED_DEBUG to debugfs Peter Zijlstra
2021-03-26 11:06   ` Greg KH
2021-04-07 10:46   ` Valentin Schneider
2021-04-07 12:26     ` Peter Zijlstra
2021-04-07 12:57       ` Valentin Schneider
2021-03-26 10:33 ` [PATCH 5/9] sched,preempt: Move preempt_dynamic to debug.c Peter Zijlstra
2021-03-26 10:33 ` [PATCH 6/9] debugfs: Implement debugfs_create_str() Peter Zijlstra
2021-03-26 11:05   ` Greg KH
2021-03-26 11:18     ` Peter Zijlstra
2021-03-26 11:30       ` Greg KH
2021-03-26 11:38         ` [PATCH v2 " Peter Zijlstra
2021-03-26 12:18           ` Greg KH
2021-03-26 12:53           ` Rasmus Villemoes
2021-03-26 12:57             ` Greg KH
2021-03-26 13:10               ` Rasmus Villemoes
2021-03-26 14:12                 ` Peter Zijlstra
2021-03-26 14:19                   ` Greg KH
2021-03-26 14:22             ` Peter Zijlstra
2021-03-26 14:58               ` Rasmus Villemoes
2021-03-26 15:19                 ` Peter Zijlstra
2021-03-27 10:41                   ` Greg KH
2021-03-26 14:50   ` [PATCH v3 " Peter Zijlstra
2021-03-27 10:42     ` Greg KH
2021-03-27 22:24   ` [PATCH " Al Viro
2021-03-28  0:33     ` Steven Rostedt
2021-03-26 10:33 ` [PATCH 7/9] sched,debug: Convert sysctl sched_domains to debugfs Peter Zijlstra
2021-03-26 13:11   ` Dietmar Eggemann
2021-04-07 10:46   ` Valentin Schneider
2021-04-07 12:18     ` Peter Zijlstra
2021-03-26 10:34 ` [PATCH 8/9] sched: Move /proc/sched_debug " Peter Zijlstra
2021-03-26 11:05   ` Greg KH
2021-03-26 10:34 ` [PATCH 9/9] sched,fair: Alternative sched_slice() Peter Zijlstra
2021-03-26 12:08   ` Dietmar Eggemann
2021-03-26 14:07     ` Peter Zijlstra
2021-03-26 15:37   ` Vincent Guittot
2021-03-26 18:30     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).