linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] RFC: support for global CPU list abbreviations
@ 2020-11-08 16:08 Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 1/4] cpumask: un-inline cpulist_parse; prepare for ascii helpers Paul Gortmaker
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-08 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Paul Gortmaker, Frederic Weisbecker, Paul E. McKenney,
	Josh Triplett, Thomas Gleixner, Ingo Molnar, Li Zefan

The basic objective here was to add support for "nohz_full=8-last" and/or
"rcu_nocbs="4-last" -- essentially introduce "last" as a portable
reference evaluated at boot/runtime for anything using a CPU list.

The thinking behind this, is that people carve off a few early CPUs to
support housekeeping tasks, and perhaps dedicate one to a busy I/O
peripheral, and then the remaining pool of CPUs out to the end are a
part of a commonly configured pool used for the real work the user
cares about.

Extend that logic out to a fleet of machines - some new, and some
nearing EOL, and you've probably got a wide range of core counts to
contend with - even though the early number of cores dedicated to the
system overhead probably doesn't vary.

This change would enable sysadmins to have a common bootarg across all
such systems, and would also avoid any off-by-one fencepost errors that
happen for users who might briefly forget that core counts start at
zero.

Looking around before starting, I noticed RCU already had a short-form
abbreviation "all" -- but if we want to treat CPU lists in a uniform
matter, then tokens shouldn't be implemented at a subsystem level and
hence be subsystem specific; each with their own variations.

So I moved "all" to global use - for boot args, and for cgroups.  Then
I added the inverse "none" and finally, the one I wanted -- "last".

The use of "last" isn't a standalone word like "all" or "none".  It will
be a part of a complete range specification, possibly with CSV separate
ranges, and possibly specified multiple times.  So I had to be a bit
more careful with string matching - and hence un-inlined the parse
function as commit #1 in this series.

But it really is a generic support for "replace token ABC with known at
boot value XYZ" - for example, it would be trivial to extend support to
add "half" as a dynamic token to be replaced with 1/2 the core count,
even though I wouldn't suggest that has a use case like "last" does.

I tested the string matching with a bunch of intentionally badly crafted
strings in a user-space harness, and tested bootarg use with nohz_full
and rcu_nocbs, and also the post-boot cgroup use case as per below:

   root@hackbox:/sys/fs/cgroup/cpuset# mkdir foo
   root@hackbox:/sys/fs/cgroup/cpuset# cd foo
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-last > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   10-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   0-15
   root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none > cpuset.cpus
   root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
   
   root@hackbox:/sys/fs/cgroup/cpuset/foo#

This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.

Note that the two use cases (boot and runtime) are why you see "early"
parameter in the code - I entertained just sticking the string copy on
the stack vs. the early alloc dance, but this felt more correct/robust.
The cgroup and modular code using cpulist_parse() are runtime cases.

---

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>

Paul Gortmaker (4):
  cpumask: un-inline cpulist_parse; prepare for ascii helpers
  cpumask: make "all" alias global and not just RCU
  cpumask: add a "none" alias to complement "all"
  cpumask: add "last" alias for cpu list specifications

 .../admin-guide/kernel-parameters.rst         |  20 +++
 .../admin-guide/kernel-parameters.txt         |   4 +-
 include/linux/cpumask.h                       |  12 +-
 kernel/rcu/tree_plugin.h                      |  13 +-
 lib/cpumask.c                                 | 132 ++++++++++++++++++
 5 files changed, 158 insertions(+), 23 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] cpumask: un-inline cpulist_parse; prepare for ascii helpers
  2020-11-08 16:08 [PATCH 0/4] RFC: support for global CPU list abbreviations Paul Gortmaker
@ 2020-11-08 16:08 ` Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 2/4] cpumask: make "all" alias global and not just RCU Paul Gortmaker
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-08 16:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: cgroups, Paul Gortmaker

In order to support convenience tokens like "all", and "none" and
"last" in CPU lists, we'll have to use string operations and expand on
what is currently a simple wrapper around the underlying bitmap
function call.

Rather than add header dependencies to cpumask.h and code more complex
operations not really appropriate for a header file, we prepare by
simply un-inlining it here and move it to the lib dir alongside the
other more complex cpumask functions.

Aside from an additional exported symbol, no functional changes are
anticipated with this move.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/cpumask.h | 12 +-----------
 lib/cpumask.c           | 13 +++++++++++++
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0d895d6ac39..6656019cce0f 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -673,17 +673,7 @@ static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
 	return bitmap_parse(buf, UINT_MAX, cpumask_bits(dstp), nr_cpumask_bits);
 }
 
-/**
- * cpulist_parse - extract a cpumask from a user string of ranges
- * @buf: the buffer to extract from
- * @dstp: the cpumask to set.
- *
- * Returns -errno, or 0 for success.
- */
-static inline int cpulist_parse(const char *buf, struct cpumask *dstp)
-{
-	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
-}
+int cpulist_parse(const char *buf, struct cpumask *dstp);
 
 /**
  * cpumask_size - size to allocate for a 'struct cpumask' in bytes
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 85da6ab4fbb5..5eb002237404 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -95,6 +95,19 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/**
+ * cpulist_parse - extract a cpumask from a user string of ranges
+ * @buf: the buffer to extract from
+ * @dstp: the cpumask to set.
+ *
+ * Returns -errno, or 0 for success.
+ */
+int cpulist_parse(const char *buf, struct cpumask *dstp)
+{
+	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
+}
+EXPORT_SYMBOL(cpulist_parse);
+
 /* These are not inline because of header tangles. */
 #ifdef CONFIG_CPUMASK_OFFSTACK
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] cpumask: make "all" alias global and not just RCU
  2020-11-08 16:08 [PATCH 0/4] RFC: support for global CPU list abbreviations Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 1/4] cpumask: un-inline cpulist_parse; prepare for ascii helpers Paul Gortmaker
@ 2020-11-08 16:08 ` Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 3/4] cpumask: add a "none" alias to complement "all" Paul Gortmaker
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-08 16:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: cgroups, Paul Gortmaker

It is probably better that we don't have subsystem specific
abbreviations or aliases for generic CPU list specifications.

Hence we move the "all" from RCU out to lib/ so that it can be
used in any instance where CPU lists are being parsed.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 Documentation/admin-guide/kernel-parameters.rst |  7 +++++++
 Documentation/admin-guide/kernel-parameters.txt |  4 +---
 kernel/rcu/tree_plugin.h                        | 13 ++++---------
 lib/cpumask.c                                   |  6 ++++++
 4 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 6d421694d98e..ef98ca700946 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -68,6 +68,13 @@ For example one can add to the command line following parameter:
 
 where the final item represents CPUs 100,101,125,126,150,151,...
 
+The following convenience aliases are also accepted and used:
+
+        foo_cpus=all
+
+is equivalent to "foo_cpus=0-N" -- where "N" is the numerically last CPU on
+the system, thus avoiding looking up the value in "/sys/devices/system/cpu"
+in advance on each deployed system.
 
 
 This document may not be entirely up to date and comprehensive. The command
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..96eed72f02a2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4023,9 +4023,7 @@
 				see CONFIG_RAS_CEC help text.
 
 	rcu_nocbs=	[KNL]
-			The argument is a cpu list, as described above,
-			except that the string "all" can be used to
-			specify every CPU on the system.
+			The argument is a cpu list, as described above.
 
 			In kernels built with CONFIG_RCU_NOCB_CPU=y, set
 			the specified list of CPUs to be no-callback CPUs.
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index fd8a52e9a887..b18f89f94fd3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1463,20 +1463,15 @@ static void rcu_cleanup_after_idle(void)
 
 /*
  * Parse the boot-time rcu_nocb_mask CPU list from the kernel parameters.
- * The string after the "rcu_nocbs=" is either "all" for all CPUs, or a
- * comma-separated list of CPUs and/or CPU ranges.  If an invalid list is
- * given, a warning is emitted and all CPUs are offloaded.
+ * If the list is invalid, a warning is emitted and all CPUs are offloaded.
  */
 static int __init rcu_nocb_setup(char *str)
 {
 	alloc_bootmem_cpumask_var(&rcu_nocb_mask);
-	if (!strcasecmp(str, "all"))
+	if (cpulist_parse(str, rcu_nocb_mask)) {
+		pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
 		cpumask_setall(rcu_nocb_mask);
-	else
-		if (cpulist_parse(str, rcu_nocb_mask)) {
-			pr_warn("rcu_nocbs= bad CPU range, all CPUs set\n");
-			cpumask_setall(rcu_nocb_mask);
-		}
+	}
 	return 1;
 }
 __setup("rcu_nocbs=", rcu_nocb_setup);
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 5eb002237404..15599cdf5db6 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -2,6 +2,7 @@
 #include <linux/slab.h>
 #include <linux/kernel.h>
 #include <linux/bitops.h>
+#include <linux/string.h>
 #include <linux/cpumask.h>
 #include <linux/export.h>
 #include <linux/memblock.h>
@@ -104,6 +105,11 @@ EXPORT_SYMBOL(cpumask_next_wrap);
  */
 int cpulist_parse(const char *buf, struct cpumask *dstp)
 {
+	if (!strcmp(buf, "all")) {
+		cpumask_setall(dstp);
+		return 0;
+	}
+
 	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
 EXPORT_SYMBOL(cpulist_parse);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] cpumask: add a "none" alias to complement "all"
  2020-11-08 16:08 [PATCH 0/4] RFC: support for global CPU list abbreviations Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 1/4] cpumask: un-inline cpulist_parse; prepare for ascii helpers Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 2/4] cpumask: make "all" alias global and not just RCU Paul Gortmaker
@ 2020-11-08 16:08 ` Paul Gortmaker
  2020-11-08 16:08 ` [PATCH 4/4] cpumask: add "last" alias for cpu list specifications Paul Gortmaker
  2020-11-08 18:02 ` [PATCH 0/4] RFC: support for global CPU list abbreviations Paul E. McKenney
  4 siblings, 0 replies; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-08 16:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: cgroups, Paul Gortmaker

With global support for a CPU list alias of "all", it seems to just make
sense to also trivially extend support for an opposite "none" specifier.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 Documentation/admin-guide/kernel-parameters.rst | 6 ++++++
 lib/cpumask.c                                   | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index ef98ca700946..9e1c4522e1f0 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -76,6 +76,12 @@ is equivalent to "foo_cpus=0-N" -- where "N" is the numerically last CPU on
 the system, thus avoiding looking up the value in "/sys/devices/system/cpu"
 in advance on each deployed system.
 
+        foo_cpus=none
+
+will provide an empty/cleared cpu mask for the associated boot argument.
+
+Note that "all" and "none" are not necessarily valid/sensible input values
+for each available parameter expecting a CPU list.
 
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 15599cdf5db6..eb8b1c92501e 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -110,6 +110,11 @@ int cpulist_parse(const char *buf, struct cpumask *dstp)
 		return 0;
 	}
 
+	if (!strcmp(buf, "none")) {
+		cpumask_clear(dstp);
+		return 0;
+	}
+
 	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
 }
 EXPORT_SYMBOL(cpulist_parse);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] cpumask: add "last" alias for cpu list specifications
  2020-11-08 16:08 [PATCH 0/4] RFC: support for global CPU list abbreviations Paul Gortmaker
                   ` (2 preceding siblings ...)
  2020-11-08 16:08 ` [PATCH 3/4] cpumask: add a "none" alias to complement "all" Paul Gortmaker
@ 2020-11-08 16:08 ` Paul Gortmaker
  2020-11-08 18:02 ` [PATCH 0/4] RFC: support for global CPU list abbreviations Paul E. McKenney
  4 siblings, 0 replies; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-08 16:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: cgroups, Paul Gortmaker

It seems that a common configuration is to use the 1st couple cores
for housekeeping tasks, and or driving a busy peripheral that generates
a lot of interrupts, or something similar.

This tends to leave the remaining ones to form a pool of similarly
configured cores to take on the real workload of interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Deployment would be easier if we could just simply replace 31 and 47
with "last" and let the system substitute in the actual number at boot;
a number that it knows better than we do.

No need to have custom boot args per node, no need to do a trial boot
in order to snoop /proc/cpuinfo and/or /sys/devices/system/cpu - no
more fencepost errors of using 32 and 48 instead of 31 and 47.

A generic token replacement is used to substitute "last" with the
number of CPUs present before handing off to bitmap processing.  But
it could just as easily be used to replace any placeholder token with
any other token or value only known at/after boot.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 .../admin-guide/kernel-parameters.rst         |   7 ++
 lib/cpumask.c                                 | 112 +++++++++++++++++-
 2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 9e1c4522e1f0..362dea55034e 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -83,6 +83,13 @@ will provide an empty/cleared cpu mask for the associated boot argument.
 Note that "all" and "none" are not necessarily valid/sensible input values
 for each available parameter expecting a CPU list.
 
+        foo_cpus=1,3,5,16-last
+
+will at runtime, replace "last" with the number of the last (highest number)
+present CPU on the system.  Thus a common deployment can be used on multiple
+systems with different total number of cores present, without needing to
+evaluate the total core count in advance on each system.
+
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
 module. Loadable modules, after being loaded into the running kernel, also
diff --git a/lib/cpumask.c b/lib/cpumask.c
index eb8b1c92501e..6c66f94b701d 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/bitops.h>
 #include <linux/string.h>
+#include <linux/ctype.h>
 #include <linux/cpumask.h>
 #include <linux/export.h>
 #include <linux/memblock.h>
@@ -96,15 +97,97 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/*
+ * Basically strstr() but given "foo", ignore "foobar", "myfoo", "foofoo"
+ * and "foo2bar" -- i.e. any case where the token is a word fragment.
+ */
+static char *cpumask_find_token(const char *str, const char *token)
+{
+	char *here = strstr(str, token);
+	size_t tlen = strlen(token);
+
+	if (!here)
+		return NULL;
+
+	while (here) {
+		size_t offset = here - str;
+		char prev, next = str[offset + tlen];
+
+		if (offset)
+			prev = str[offset - 1];
+		else
+			prev = '\0';
+
+		if (!(isalnum(prev) || isalnum(next)))
+			break;
+
+		here = strstr(here + tlen, token);
+	}
+
+	return here;
+}
+
+/*
+ * replace old token with new token: Given a convenience or placeholder
+ * token "last" and an associated value not known until boot, of say 1234,
+ * replace instances of "last" with "1234".
+ *
+ * For example src = "1,3,last,7-last,9,lastly,last-2047\0"  results in a
+ *            dest = "1,3,1234,7-1234,9,lastly,1234-2047\0"
+ *
+ * The destination string may be shorter than, equal to, or longer than
+ * the source string -- based on whether the new token strlen is shorter
+ * than, equal to, or longer than the old token strlen.
+ * The caller must allocate dest space accordingly with that in mind.
+ */
+
+static void cpulist_replace_token(char *dest, const char *src,
+			   const char *old_token, const char *new_token)
+{
+	const char *src_start = src;
+	char *dest_start = dest, *here;
+	const size_t olen = strlen(old_token);
+	const size_t nlen = strlen(new_token);
+
+	here = cpumask_find_token(src_start, old_token);
+	if (!here) {
+		strcpy(dest, src);
+		return;
+	}
+
+	while (here) {
+		size_t offset = here - src_start;
+
+		strncpy(dest_start, src_start, offset);
+		dest_start += offset;
+		src_start += offset;
+
+		strcpy(dest_start, new_token);
+		dest_start += nlen;
+		src_start += olen;
+
+		strcpy(dest_start, src_start);	/* remainder of string */
+		here = cpumask_find_token(src_start, old_token);
+	}
+}
+
 /**
  * cpulist_parse - extract a cpumask from a user string of ranges
  * @buf: the buffer to extract from
  * @dstp: the cpumask to set.
  *
  * Returns -errno, or 0 for success.
+ *
+ * Marked __ref because memblock_*() are __meminit and we use them for
+ * any early calls before slab is available.
  */
-int cpulist_parse(const char *buf, struct cpumask *dstp)
+int __ref cpulist_parse(const char *buf, struct cpumask *dstp)
 {
+	int r;
+	char *cpulist, last_cpu[5];	/* NR_CPUS <= 9999 */
+	size_t len = strlen(buf);
+	bool early = !slab_is_available();
+
 	if (!strcmp(buf, "all")) {
 		cpumask_setall(dstp);
 		return 0;
@@ -115,7 +198,32 @@ int cpulist_parse(const char *buf, struct cpumask *dstp)
 		return 0;
 	}
 
-	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
+	/*
+	 * strlen("last") means "len" is OK up to NR_CPUS <= 9999.
+	 */
+	if (early)
+		cpulist = memblock_alloc(len, SMP_CACHE_BYTES);
+	else
+		cpulist = kzalloc(len, GFP_KERNEL);
+
+	if (cpulist == NULL)
+		return -ENOMEM;
+
+	/*
+	 * bitmap_parselist has no concept of "last" CPU, so we have to
+	 * replace "last" with a real number in dest copy of the string.
+	 */
+	sprintf(last_cpu, "%d", cpumask_last(cpu_present_mask));
+	cpulist_replace_token(cpulist, buf, "last", last_cpu);
+
+	r = bitmap_parselist(cpulist, cpumask_bits(dstp), nr_cpumask_bits);
+
+	if (early)
+		memblock_free((phys_addr_t)cpulist, len);
+	else
+		kfree(cpulist);
+
+	return r;
 }
 EXPORT_SYMBOL(cpulist_parse);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/4] RFC: support for global CPU list abbreviations
  2020-11-08 16:08 [PATCH 0/4] RFC: support for global CPU list abbreviations Paul Gortmaker
                   ` (3 preceding siblings ...)
  2020-11-08 16:08 ` [PATCH 4/4] cpumask: add "last" alias for cpu list specifications Paul Gortmaker
@ 2020-11-08 18:02 ` Paul E. McKenney
  2020-11-08 20:21   ` Paul Gortmaker
  4 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-11-08 18:02 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, cgroups, Frederic Weisbecker, Josh Triplett,
	Thomas Gleixner, Ingo Molnar, Li Zefan

On Sun, Nov 08, 2020 at 11:08:12AM -0500, Paul Gortmaker wrote:
> The basic objective here was to add support for "nohz_full=8-last" and/or
> "rcu_nocbs="4-last" -- essentially introduce "last" as a portable
> reference evaluated at boot/runtime for anything using a CPU list.
> 
> The thinking behind this, is that people carve off a few early CPUs to
> support housekeeping tasks, and perhaps dedicate one to a busy I/O
> peripheral, and then the remaining pool of CPUs out to the end are a
> part of a commonly configured pool used for the real work the user
> cares about.
> 
> Extend that logic out to a fleet of machines - some new, and some
> nearing EOL, and you've probably got a wide range of core counts to
> contend with - even though the early number of cores dedicated to the
> system overhead probably doesn't vary.
> 
> This change would enable sysadmins to have a common bootarg across all
> such systems, and would also avoid any off-by-one fencepost errors that
> happen for users who might briefly forget that core counts start at
> zero.
> 
> Looking around before starting, I noticed RCU already had a short-form
> abbreviation "all" -- but if we want to treat CPU lists in a uniform
> matter, then tokens shouldn't be implemented at a subsystem level and
> hence be subsystem specific; each with their own variations.
> 
> So I moved "all" to global use - for boot args, and for cgroups.  Then
> I added the inverse "none" and finally, the one I wanted -- "last".
> 
> The use of "last" isn't a standalone word like "all" or "none".  It will
> be a part of a complete range specification, possibly with CSV separate
> ranges, and possibly specified multiple times.  So I had to be a bit
> more careful with string matching - and hence un-inlined the parse
> function as commit #1 in this series.
> 
> But it really is a generic support for "replace token ABC with known at
> boot value XYZ" - for example, it would be trivial to extend support to
> add "half" as a dynamic token to be replaced with 1/2 the core count,
> even though I wouldn't suggest that has a use case like "last" does.
> 
> I tested the string matching with a bunch of intentionally badly crafted
> strings in a user-space harness, and tested bootarg use with nohz_full
> and rcu_nocbs, and also the post-boot cgroup use case as per below:
> 
>    root@hackbox:/sys/fs/cgroup/cpuset# mkdir foo
>    root@hackbox:/sys/fs/cgroup/cpuset# cd foo
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
>    
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo 10-last > cpuset.cpus
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
>    10-15
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo all > cpuset.cpus
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
>    0-15
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# /bin/echo none > cpuset.cpus
>    root@hackbox:/sys/fs/cgroup/cpuset/foo# cat cpuset.cpus
>    
>    root@hackbox:/sys/fs/cgroup/cpuset/foo#
> 
> This was on a 16 core machine with CONFIG_NR_CPUS=16 in .config file.
> 
> Note that the two use cases (boot and runtime) are why you see "early"
> parameter in the code - I entertained just sticking the string copy on
> the stack vs. the early alloc dance, but this felt more correct/robust.
> The cgroup and modular code using cpulist_parse() are runtime cases.

I considered doing this when adding "all" for RCU, but was just too
lazy.  So you are a better man than I am!  ;-)

I have queued these for testing, both "all" and "last" work just fine.
Given that "all" works, I hereby declare "none" to be working by
inspection.  Therefore, for 1, 2, and 4:

Tested-by: Paul E. McKenney <paulmck@kernel.org>

For 3:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

Or I can carry them if you wish.  My expected changes in response to
this series are shown below, and are also what I used to test it.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE04.boot
index 5adc675..25a765d 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04.boot
@@ -1 +1 @@
-rcutree.rcu_fanout_leaf=4 nohz_full=1-7
+rcutree.rcu_fanout_leaf=4 nohz_full=1-last
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE08.boot
index 22478fd..94d3844 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08.boot
@@ -1,3 +1,3 @@
 rcupdate.rcu_self_test=1
 rcutree.rcu_fanout_exact=1
-rcu_nocbs=0-7
+rcu_nocbs=all

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/4] RFC: support for global CPU list abbreviations
  2020-11-08 18:02 ` [PATCH 0/4] RFC: support for global CPU list abbreviations Paul E. McKenney
@ 2020-11-08 20:21   ` Paul Gortmaker
  2020-11-08 21:24     ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-08 20:21 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, cgroups, Frederic Weisbecker, Josh Triplett,
	Thomas Gleixner, Ingo Molnar, Li Zefan

On 2020-11-08 1:02 p.m., Paul E. McKenney wrote:

 > Or I can carry them if you wish.  My expected changes in response to
 > this series are shown below, and are also what I used to test it.

Thanks Paul - that would get linux-next exposure w/o me pestering sfr.
If nobody else has objections, having them in rcu-next would be great.

Paul.
--

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/4] RFC: support for global CPU list abbreviations
  2020-11-08 20:21   ` Paul Gortmaker
@ 2020-11-08 21:24     ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2020-11-08 21:24 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-kernel, cgroups, Frederic Weisbecker, Josh Triplett,
	Thomas Gleixner, Ingo Molnar, Li Zefan

On Sun, Nov 08, 2020 at 03:21:40PM -0500, Paul Gortmaker wrote:
> On 2020-11-08 1:02 p.m., Paul E. McKenney wrote:
> 
> > Or I can carry them if you wish.  My expected changes in response to
> > this series are shown below, and are also what I used to test it.
> 
> Thanks Paul - that would get linux-next exposure w/o me pestering sfr.
> If nobody else has objections, having them in rcu-next would be great.

Unless/until someone objects, you got it!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/4] cpumask: add "last" alias for cpu list specifications
  2020-11-10  4:07 [PATCH v2 0/4] " Paul Gortmaker
@ 2020-11-10  4:07 ` Paul Gortmaker
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Gortmaker @ 2020-11-10  4:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: cgroups, rdunlap, colin.king, Paul Gortmaker

It seems that a common configuration is to use the 1st couple cores
for housekeeping tasks, and or driving a busy peripheral that generates
a lot of interrupts, or something similar.

This tends to leave the remaining ones to form a pool of similarly
configured cores to take on the real workload of interest to the user.

So on machine A - with 32 cores, it could be 0-3 for "system" and then
4-31 being used in boot args like nohz_full=, or rcu_nocbs= as part of
setting up the worker pool of CPUs.

But then newer machine B is added, and it has 48 cores, and so while
the 0-3 part remains unchanged, the pool setup cpu list becomes 4-47.

Deployment would be easier if we could just simply replace 31 and 47
with "last" and let the system substitute in the actual number at boot;
a number that it knows better than we do.

No need to have custom boot args per node, no need to do a trial boot
in order to snoop /proc/cpuinfo and/or /sys/devices/system/cpu - no
more fencepost errors of using 32 and 48 instead of 31 and 47.

A generic token replacement is used to substitute "last" with the
number of CPUs present before handing off to bitmap processing.  But
it could just as easily be used to replace any placeholder token with
any other token or value only known at/after boot.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 .../admin-guide/kernel-parameters.rst         |   7 ++
 lib/cpumask.c                                 | 112 +++++++++++++++++-
 2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 9e1c4522e1f0..362dea55034e 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -83,6 +83,13 @@ will provide an empty/cleared cpu mask for the associated boot argument.
 Note that "all" and "none" are not necessarily valid/sensible input values
 for each available parameter expecting a CPU list.
 
+        foo_cpus=1,3,5,16-last
+
+will at runtime, replace "last" with the number of the last (highest number)
+present CPU on the system.  Thus a common deployment can be used on multiple
+systems with different total number of cores present, without needing to
+evaluate the total core count in advance on each system.
+
 This document may not be entirely up to date and comprehensive. The command
 "modinfo -p ${modulename}" shows a current list of all parameters of a loadable
 module. Loadable modules, after being loaded into the running kernel, also
diff --git a/lib/cpumask.c b/lib/cpumask.c
index eb8b1c92501e..fa56d622c1d8 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/bitops.h>
 #include <linux/string.h>
+#include <linux/ctype.h>
 #include <linux/cpumask.h>
 #include <linux/export.h>
 #include <linux/memblock.h>
@@ -96,15 +97,97 @@ int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap)
 }
 EXPORT_SYMBOL(cpumask_next_wrap);
 
+/*
+ * Basically strstr() but given "foo", ignore "foobar", "myfoo", "foofoo"
+ * and "foo2bar" -- i.e. any case where the token is a word fragment.
+ */
+static char *cpumask_find_token(const char *str, const char *token)
+{
+	char *here = strstr(str, token);
+	size_t tlen = strlen(token);
+
+	if (!here)
+		return NULL;
+
+	while (here) {
+		size_t offset = here - str;
+		char prev, next = str[offset + tlen];
+
+		if (offset)
+			prev = str[offset - 1];
+		else
+			prev = '\0';
+
+		if (!(isalnum(prev) || isalnum(next)))
+			break;
+
+		here = strstr(here + tlen, token);
+	}
+
+	return here;
+}
+
+/*
+ * replace old token with new token: Given a convenience or placeholder
+ * token "last" and an associated value not known until boot, of say 1234,
+ * replace instances of "last" with "1234".
+ *
+ * For example src = "1,3,last,7-last,9,lastly,last-2047\0"  results in a
+ *            dest = "1,3,1234,7-1234,9,lastly,1234-2047\0"
+ *
+ * The destination string may be shorter than, equal to, or longer than
+ * the source string -- based on whether the new token strlen is shorter
+ * than, equal to, or longer than the old token strlen.
+ * The caller must allocate dest space accordingly with that in mind.
+ */
+
+static void cpulist_replace_token(char *dest, const char *src,
+			   const char *old_token, const char *new_token)
+{
+	const char *src_start = src;
+	char *dest_start = dest, *here;
+	const size_t olen = strlen(old_token);
+	const size_t nlen = strlen(new_token);
+
+	here = cpumask_find_token(src_start, old_token);
+	if (!here) {
+		strcpy(dest, src);
+		return;
+	}
+
+	while (here) {
+		size_t offset = here - src_start;
+
+		strncpy(dest_start, src_start, offset);
+		dest_start += offset;
+		src_start += offset;
+
+		strcpy(dest_start, new_token);
+		dest_start += nlen;
+		src_start += olen;
+
+		strcpy(dest_start, src_start);	/* remainder of string */
+		here = cpumask_find_token(src_start, old_token);
+	}
+}
+
 /**
  * cpulist_parse - extract a cpumask from a user string of ranges
  * @buf: the buffer to extract from
  * @dstp: the cpumask to set.
  *
  * Returns -errno, or 0 for success.
+ *
+ * Marked __ref because memblock_*() are __meminit and we use them for
+ * any early calls before slab is available.
  */
-int cpulist_parse(const char *buf, struct cpumask *dstp)
+int __ref cpulist_parse(const char *buf, struct cpumask *dstp)
 {
+	int r;
+	char *cpulist, last_cpu[5];	/* NR_CPUS <= 9999 */
+	size_t len = strlen(buf) + 1;	/* don't forget '\0' */
+	bool early = !slab_is_available();
+
 	if (!strcmp(buf, "all")) {
 		cpumask_setall(dstp);
 		return 0;
@@ -115,7 +198,32 @@ int cpulist_parse(const char *buf, struct cpumask *dstp)
 		return 0;
 	}
 
-	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
+	/*
+	 * strlen("last") means "len" is OK up to NR_CPUS <= 9999.
+	 */
+	if (early)
+		cpulist = memblock_alloc(len, SMP_CACHE_BYTES);
+	else
+		cpulist = kzalloc(len, GFP_KERNEL);
+
+	if (cpulist == NULL)
+		return -ENOMEM;
+
+	/*
+	 * bitmap_parselist has no concept of "last" CPU, so we have to
+	 * replace "last" with a real number in dest copy of the string.
+	 */
+	sprintf(last_cpu, "%d", cpumask_last(cpu_present_mask));
+	cpulist_replace_token(cpulist, buf, "last", last_cpu);
+
+	r = bitmap_parselist(cpulist, cpumask_bits(dstp), nr_cpumask_bits);
+
+	if (early)
+		memblock_free((phys_addr_t)cpulist, len);
+	else
+		kfree(cpulist);
+
+	return r;
 }
 EXPORT_SYMBOL(cpulist_parse);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-10  4:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-08 16:08 [PATCH 0/4] RFC: support for global CPU list abbreviations Paul Gortmaker
2020-11-08 16:08 ` [PATCH 1/4] cpumask: un-inline cpulist_parse; prepare for ascii helpers Paul Gortmaker
2020-11-08 16:08 ` [PATCH 2/4] cpumask: make "all" alias global and not just RCU Paul Gortmaker
2020-11-08 16:08 ` [PATCH 3/4] cpumask: add a "none" alias to complement "all" Paul Gortmaker
2020-11-08 16:08 ` [PATCH 4/4] cpumask: add "last" alias for cpu list specifications Paul Gortmaker
2020-11-08 18:02 ` [PATCH 0/4] RFC: support for global CPU list abbreviations Paul E. McKenney
2020-11-08 20:21   ` Paul Gortmaker
2020-11-08 21:24     ` Paul E. McKenney
2020-11-10  4:07 [PATCH v2 0/4] " Paul Gortmaker
2020-11-10  4:07 ` [PATCH 4/4] cpumask: add "last" alias for cpu list specifications Paul Gortmaker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).