* [PATCH V3 0/2] Fix perf bench numa to work with machines having #CPUs > 1K
@ 2022-04-12 16:40 ` Athira Rajeev
  0 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2022-04-12 16:40 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The 'numa' benchmark in the perf benchmark collection fails on systems
configured with more than 1024 CPUs. These benchmarks use
"sched_getaffinity" and "sched_setaffinity" in the code to work with
affinity.

Example snippet from numa benchmark:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

The bind_to_node function uses "sched_getaffinity" to save the cpumask.
This call fails with EINVAL because the default cpu_set_t mask size in
glibc is 1024 bits.

To overcome this 1024-CPU mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros: use CPU_ALLOC to allocate the
cpumask, CPU_ALLOC_SIZE to get its size, and CPU_SET_S to set a mask bit.
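
For illustration, a minimal standalone sketch of this dynamically sized
mask pattern (not code from the patches; the CPU count here is an
assumed example value, where the patches use numa_num_possible_cpus()):
<<>>
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	int nrcpus = 2048;	/* assumed example value */
	cpu_set_t *mask = CPU_ALLOC(nrcpus);
	size_t size = CPU_ALLOC_SIZE(nrcpus);

	if (!mask)
		return 1;

	CPU_ZERO_S(size, mask);
	/* Pass the allocated size instead of sizeof(cpu_set_t) so that
	 * CPUs beyond bit 1023 can be represented. */
	if (sched_getaffinity(0, size, mask) == 0)
		printf("CPUs in affinity mask: %d\n",
		       CPU_COUNT_S(size, mask));

	CPU_FREE(mask);
	return 0;
}
<<>>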

Fix all the relevant places in the code to use a mask size that is large
enough to represent the number of possible CPUs in the system.

This patchset also adds a fix to the parse_setup_cpu_list function in the
numa bench to check whether an input CPU is online before binding a task
to it. This fixes failures where a CPU number is within the maximum CPU
count but the CPU itself is offline; in that case, sched_setaffinity
fails when using a cpumask having that CPU's bit set.
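
For illustration, a minimal sketch of such an online check (a standalone
approximation of the is_cpu_online helper added in patch 1, which reads
the same sysfs file via sysfs__read_str instead of plain stdio):
<<>>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if online, 0 if offline, -1 on error. */
static int cpu_online(unsigned int cpu)
{
	char path[256];
	struct stat st;
	FILE *f;
	int ch;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u", cpu);
	if (stat(path, &st) != 0)
		return 0;	/* CPU not present in sysfs */

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/online", cpu);
	f = fopen(path, "r");
	if (!f)		/* e.g. cpu0, or kernel without CONFIG_HOTPLUG_CPU */
		return 1;

	ch = fgetc(f);
	fclose(f);

	if (ch == '1')
		return 1;
	if (ch == '0')
		return 0;
	return -1;
}
<<>>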

Patch 1 adds the fix for parse_setup_cpu_list to check whether the CPU
used to bind a task is online. Patch 2 fixes bench numa to work with
machines having #CPUs > 1K.

Athira Rajeev (2):
  tools/perf: Fix perf bench numa testcase to check if CPU used to bind
    task is online
  perf bench: Fix numa bench to fix usage of affinity for machines with
    #CPUs > 1K

Changelog:
v2 -> v3
Link to the v2 version:
https://lore.kernel.org/all/20220406175113.87881-1-atrajeev@linux.vnet.ibm.com/
 - Patch 1 and patch 2 from the v2 version are now upstream.
 - This v3 version separates the former patch 3 and patch 4 to address
   review comments from Arnaldo, which include using sysfs__read_str for
   reading the sysfs file and fixing the compilation issues observed on
   Debian.

 tools/perf/bench/numa.c  | 136 +++++++++++++++++++++++++++++----------
 tools/perf/util/header.c |  51 +++++++++++++++
 tools/perf/util/header.h |   1 +
 3 files changed, 153 insertions(+), 35 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V3 1/2] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
  2022-04-12 16:40 ` Athira Rajeev
@ 2022-04-12 16:40   ` Athira Rajeev
  -1 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2022-04-12 16:40 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

Perf numa bench test fails with the following error:

Testcase:
./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
--thp  1 --no-data_rand_walk

Failure snippet:
<<>>
 Running 'numa/mem' benchmark:

 # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
-M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>

The testcase uses CPUs 0 and 8. In the function "parse_setup_cpu_list",
there is a check to see if the cpu number is greater than the maximum
number of CPUs possible in the system, i.e. via "if (bind_cpu_0 >=
g->p.nr_cpus || bind_cpu_1 >= g->p.nr_cpus) {". But it could happen
that the system has, say, 48 CPUs while only CPUs 0-7 are online and
the rest are offline. Since "g->p.nr_cpus" is 48, the function goes
ahead and also sets the bit for CPU 8 in the cpumask (td->bind_cpumask).

The bind_to_cpumask function is then called to set affinity using
sched_setaffinity and that cpumask. Since CPU 8 is not online,
sched_setaffinity fails here with EINVAL. Fix this issue by adding a
check to make sure the CPUs provided in the input arguments are online
before proceeding further, and skip the test otherwise. For this,
include a new helper function "is_cpu_online" in
"tools/perf/util/header.c".

Since "BIT(x)" definition will get included from header.h, remove
that from bench/numa.c

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
---
Changelog
 The first version was posted here:
 https://lore.kernel.org/all/20220406175113.87881-5-atrajeev@linux.vnet.ibm.com/
 This v3 version separates out the former patch 4 and addresses review
 comments from Arnaldo to use sysfs__read_str for reading the sysfs file.

 tools/perf/bench/numa.c  |  8 +++++--
 tools/perf/util/header.c | 51 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/header.h |  1 +
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index f2640179ada9..c838c248aa2c 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -34,6 +34,7 @@
 #include <linux/numa.h>
 #include <linux/zalloc.h>
 
+#include "../util/header.h"
 #include <numa.h>
 #include <numaif.h>
 
@@ -585,6 +586,11 @@ static int parse_setup_cpu_list(void)
 			return -1;
 		}
 
+		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
+			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
+			return -1;
+		}
+
 		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
 		BUG_ON(bind_cpu_0 > bind_cpu_1);
 
@@ -752,8 +758,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
 	return parse_node_list(arg);
 }
 
-#define BIT(x) (1ul << x)
-
 static inline uint32_t lfsr_32(uint32_t lfsr)
 {
 	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index d546ff724dbe..a27132e5a5ef 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -983,6 +983,57 @@ static int write_dir_format(struct feat_fd *ff,
 	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
 }
 
+/*
+ * Check whether a CPU is online
+ *
+ * Returns:
+ *     1 -> if CPU is online
+ *     0 -> if CPU is offline
+ *    -1 -> error case
+ */
+int is_cpu_online(unsigned int cpu)
+{
+	char *str;
+	size_t strlen;
+	char buf[256];
+	int status = -1;
+	struct stat statbuf;
+
+	snprintf(buf, sizeof(buf),
+		"/sys/devices/system/cpu/cpu%d", cpu);
+	if (stat(buf, &statbuf) != 0)
+		return 0;
+
+	/*
+	 * Check if /sys/devices/system/cpu/cpux/online file
+	 * exists. Some cases cpu0 won't have online file since
+	 * it is not expected to be turned off generally.
+	 * In kernels without CONFIG_HOTPLUG_CPU, this
+	 * file won't exist
+	 */
+	snprintf(buf, sizeof(buf),
+		"/sys/devices/system/cpu/cpu%d/online", cpu);
+	if (stat(buf, &statbuf) != 0)
+		return 1;
+
+	/*
+	 * Read online file using sysfs__read_str.
+	 * If read or open fails, return -1.
+	 * If read succeeds, return value from file
+	 * which gets stored in "str"
+	 */
+	snprintf(buf, sizeof(buf),
+		"devices/system/cpu/cpu%d/online", cpu);
+
+	if (sysfs__read_str(buf, &str, &strlen) < 0)
+		return status;
+
+	status = atoi(str);
+
+	free(str);
+	return status;
+}
+
 #ifdef HAVE_LIBBPF_SUPPORT
 static int write_bpf_prog_info(struct feat_fd *ff,
 			       struct evlist *evlist __maybe_unused)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index c9e3265832d9..0eb4bc29a5a4 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
 int write_padded(struct feat_fd *fd, const void *bf,
 		 size_t count, size_t count_aligned);
 
+int is_cpu_online(unsigned int cpu);
 /*
  * arch specific callback
  */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH V3 2/2] perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1K
  2022-04-12 16:40 ` Athira Rajeev
@ 2022-04-12 16:40   ` Athira Rajeev
  -1 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2022-04-12 16:40 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The 'perf bench numa' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1

Failure snippet:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

The bind_to_node function uses "sched_getaffinity" to save the original
cpumask, and this call returns EINVAL (invalid argument).

This happens because the default mask size in glibc is 1024. To
overcome this 1024-CPU mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros: use CPU_ALLOC to allocate the
cpumask and CPU_ALLOC_SIZE to get its size.

Apart from fixing this for "orig_mask", apply the same logic to "mask"
as well, which is used for sched_setaffinity, so that the mask size is
large enough to represent the number of possible CPUs in the system.

sched_getaffinity is used in one more place in the perf numa bench: the
"bind_to_cpu" function. Apply the same logic there as well. Though no
failure is currently reported from there, it is best to make
getaffinity work on system configurations having more CPUs than the
default mask size supported by glibc.

Also fix the "sched_setaffinity" calls to use a mask size large enough
to represent the number of possible CPUs in the system.

Fix all places where "bind_cpumask" (part of "struct thread_data") is
used, so that bind_cpumask works in all configurations.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
Changelog
 The first version was posted here:
 https://lore.kernel.org/all/20220406175113.87881-5-atrajeev@linux.vnet.ibm.com/
 This v3 version separates out the former patch 3 and addresses the
 compilation failure seen on some distros. The change in v3 initializes
 cpu_set to NULL in main to fix a compilation warning about an
 uninitialized pointer. It also changes "bind_to_cpu" and "bind_to_node"
 to CPU_FREE the masks properly.

 tools/perf/bench/numa.c | 128 +++++++++++++++++++++++++++++-----------
 1 file changed, 95 insertions(+), 33 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index c838c248aa2c..ffa83744ef99 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -55,7 +55,7 @@
 
 struct thread_data {
 	int			curr_cpu;
-	cpu_set_t		bind_cpumask;
+	cpu_set_t		*bind_cpumask;
 	int			bind_node;
 	u8			*process_data;
 	int			process_nr;
@@ -267,71 +267,115 @@ static bool node_has_cpus(int node)
 	return ret;
 }
 
-static cpu_set_t bind_to_cpu(int target_cpu)
+static cpu_set_t *bind_to_cpu(int target_cpu)
 {
-	cpu_set_t orig_mask, mask;
-	int ret;
+	int nrcpus = numa_num_possible_cpus();
+	cpu_set_t *orig_mask, *mask;
+	size_t size;
 
-	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
-	BUG_ON(ret);
+	orig_mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!orig_mask);
+	size = CPU_ALLOC_SIZE(nrcpus);
+	CPU_ZERO_S(size, orig_mask);
+
+	if (sched_getaffinity(0, size, orig_mask))
+		goto err_out;
+
+	mask = CPU_ALLOC(nrcpus);
+	if (!mask)
+		goto err_out;
 
-	CPU_ZERO(&mask);
+	CPU_ZERO_S(size, mask);
 
 	if (target_cpu == -1) {
 		int cpu;
 
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &mask);
+			CPU_SET_S(cpu, size, mask);
 	} else {
-		BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
-		CPU_SET(target_cpu, &mask);
+		if (target_cpu < 0 || target_cpu >= g->p.nr_cpus)
+			goto err;
+
+		CPU_SET_S(target_cpu, size, mask);
 	}
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	if (sched_setaffinity(0, size, mask))
+		goto err;
 
 	return orig_mask;
+
+err:
+	CPU_FREE(mask);
+err_out:
+	CPU_FREE(orig_mask);
+
+	/* BUG_ON due to failure in allocation of orig_mask/mask */
+	BUG_ON(-1);
 }
 
-static cpu_set_t bind_to_node(int target_node)
+static cpu_set_t *bind_to_node(int target_node)
 {
-	cpu_set_t orig_mask, mask;
+	int nrcpus = numa_num_possible_cpus();
+	size_t size;
+	cpu_set_t *orig_mask, *mask;
 	int cpu;
-	int ret;
 
-	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
-	BUG_ON(ret);
+	orig_mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!orig_mask);
+	size = CPU_ALLOC_SIZE(nrcpus);
+	CPU_ZERO_S(size, orig_mask);
+
+	if (sched_getaffinity(0, size, orig_mask))
+		goto err_out;
+
+	mask = CPU_ALLOC(nrcpus);
+	if (!mask)
+		goto err_out;
 
-	CPU_ZERO(&mask);
+	CPU_ZERO_S(size, mask);
 
 	if (target_node == NUMA_NO_NODE) {
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &mask);
+			CPU_SET_S(cpu, size, mask);
 	} else {
 		struct bitmask *cpumask = numa_allocate_cpumask();
 
-		BUG_ON(!cpumask);
+		if (!cpumask)
+			goto err;
+
 		if (!numa_node_to_cpus(target_node, cpumask)) {
 			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
 				if (numa_bitmask_isbitset(cpumask, cpu))
-					CPU_SET(cpu, &mask);
+					CPU_SET_S(cpu, size, mask);
 			}
 		}
 		numa_free_cpumask(cpumask);
 	}
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	if (sched_setaffinity(0, size, mask))
+		goto err;
 
 	return orig_mask;
+
+err:
+	CPU_FREE(mask);
+err_out:
+	CPU_FREE(orig_mask);
+
+	/* BUG_ON due to failure in allocation of orig_mask/mask */
+	BUG_ON(-1);
 }
 
-static void bind_to_cpumask(cpu_set_t mask)
+static void bind_to_cpumask(cpu_set_t *mask)
 {
 	int ret;
+	size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	ret = sched_setaffinity(0, size, mask);
+	if (ret) {
+		CPU_FREE(mask);
+		BUG_ON(ret);
+	}
 }
 
 static void mempol_restore(void)
@@ -377,7 +421,7 @@ do {							\
 static u8 *alloc_data(ssize_t bytes0, int map_flags,
 		      int init_zero, int init_cpu0, int thp, int init_random)
 {
-	cpu_set_t orig_mask;
+	cpu_set_t *orig_mask = NULL;
 	ssize_t bytes;
 	u8 *buf;
 	int ret;
@@ -435,6 +479,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
 	/* Restore affinity: */
 	if (init_cpu0) {
 		bind_to_cpumask(orig_mask);
+		CPU_FREE(orig_mask);
 		mempol_restore();
 	}
 
@@ -595,6 +640,7 @@ static int parse_setup_cpu_list(void)
 		BUG_ON(bind_cpu_0 > bind_cpu_1);
 
 		for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
+			size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
 			int i;
 
 			for (i = 0; i < mul; i++) {
@@ -614,10 +660,15 @@ static int parse_setup_cpu_list(void)
 					tprintf("%2d", bind_cpu);
 				}
 
-				CPU_ZERO(&td->bind_cpumask);
+				td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+				BUG_ON(!td->bind_cpumask);
+				CPU_ZERO_S(size, td->bind_cpumask);
 				for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
-					BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
-					CPU_SET(cpu, &td->bind_cpumask);
+					if (cpu < 0 || cpu >= g->p.nr_cpus) {
+						CPU_FREE(td->bind_cpumask);
+						BUG_ON(-1);
+					}
+					CPU_SET_S(cpu, size, td->bind_cpumask);
 				}
 				t++;
 			}
@@ -1245,7 +1296,7 @@ static void *worker_thread(void *__tdata)
 		 * by migrating to CPU#0:
 		 */
 		if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
-			cpu_set_t orig_mask;
+			cpu_set_t *orig_mask;
 			int target_cpu;
 			int this_cpu;
 
@@ -1269,6 +1320,7 @@ static void *worker_thread(void *__tdata)
 				printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
 
 			bind_to_cpumask(orig_mask);
+			CPU_FREE(orig_mask);
 		}
 
 		if (details >= 3) {
@@ -1402,21 +1454,31 @@ static void init_thread_data(void)
 
 	for (t = 0; t < g->p.nr_tasks; t++) {
 		struct thread_data *td = g->threads + t;
+		size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
 		int cpu;
 
 		/* Allow all nodes by default: */
 		td->bind_node = NUMA_NO_NODE;
 
 		/* Allow all CPUs by default: */
-		CPU_ZERO(&td->bind_cpumask);
+		td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+		BUG_ON(!td->bind_cpumask);
+		CPU_ZERO_S(cpuset_size, td->bind_cpumask);
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &td->bind_cpumask);
+			CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
 	}
 }
 
 static void deinit_thread_data(void)
 {
 	ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
+	int t;
+
+	/* Free the bind_cpumask allocated for thread_data */
+	for (t = 0; t < g->p.nr_tasks; t++) {
+		struct thread_data *td = g->threads + t;
+		CPU_FREE(td->bind_cpumask);
+	}
 
 	free_data(g->threads, size);
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH V3 0/2] Fix perf bench numa to work with machines having #CPUs > 1K
  2022-04-12 16:40 ` Athira Rajeev
                   ` (2 preceding siblings ...)
@ 2022-04-13 14:31 ` Disha Goel
  -1 siblings, 0 replies; 9+ messages in thread
From: Disha Goel @ 2022-04-13 14:31 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa
  Cc: irogers, maddy, srikar, rnsastry, linux-kernel, linux-perf-users,
	kjain, linuxppc-dev


-----Original Message-----
From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
To: acme@kernel.org, jolsa@kernel.org, disgoel@linux.vnet.ibm.com
Cc: mpe@ellerman.id.au, linux-perf-users@vger.kernel.org, 
linuxppc-dev@lists.ozlabs.org, maddy@linux.vnet.ibm.com, 
rnsastry@linux.ibm.com, kjain@linux.ibm.com, 
linux-kernel@vger.kernel.org, srikar@linux.vnet.ibm.com, 
irogers@google.com
Subject: [PATCH V3 0/2] Fix perf bench numa to work with machines
having #CPUs > 1K
Date: Tue, 12 Apr 2022 22:10:57 +0530

The 'numa' benchmark in the perf benchmark collection fails on systems
configured with more than 1024 CPUs. These benchmarks use
"sched_getaffinity" and "sched_setaffinity" in the code to work with
affinity.

Example snippet from numa benchmark:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

The bind_to_node function uses "sched_getaffinity" to save the cpumask.
This call fails with EINVAL because the default cpu_set_t mask size in
glibc is 1024 bits.

To overcome this 1024-CPU mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros: use CPU_ALLOC to allocate the
cpumask, CPU_ALLOC_SIZE to get its size, and CPU_SET_S to set a mask bit.

Fix all the relevant places in the code to use a mask size that is large
enough to represent the number of possible CPUs in the system.

This patchset also adds a fix to the parse_setup_cpu_list function in the
numa bench to check whether an input CPU is online before binding a task
to it. This fixes failures where a CPU number is within the maximum CPU
count but the CPU itself is offline; in that case, sched_setaffinity
fails when using a cpumask having that CPU's bit set.

Patch 1 adds the fix for parse_setup_cpu_list to check whether the CPU
used to bind a task is online. Patch 2 fixes bench numa to work with
machines having #CPUs > 1K.

Athira Rajeev (2):
  tools/perf: Fix perf bench numa testcase to check if CPU used to bind
    task is online
  perf bench: Fix numa bench to fix usage of affinity for machines with
    #CPUs > 1K

Changelog:
v2 -> v3
Link to the v2 version:
https://lore.kernel.org/all/20220406175113.87881-1-atrajeev@linux.vnet.ibm.com/
 - Patch 1 and patch 2 from the v2 version are now upstream.
 - This v3 version separates the former patch 3 and patch 4 to address
   review comments from Arnaldo, which include using sysfs__read_str for
   reading the sysfs file and fixing the compilation issues observed on
   Debian.

Tested the patches on powerpc with CPU > 1K and other configurations as
well, and verified perf bench numa with the patch set.

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>

 tools/perf/bench/numa.c  | 136 +++++++++++++++++++++++++++++----------
 tools/perf/util/header.c |  51 +++++++++++++++
 tools/perf/util/header.h |   1 +
 3 files changed, 153 insertions(+), 35 deletions(-)



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V3 0/2] Fix perf bench numa to work with machines having #CPUs > 1K
  2022-04-12 16:40 ` Athira Rajeev
@ 2022-04-14 12:16   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-04-14 12:16 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, srikar, irogers

Em Tue, Apr 12, 2022 at 10:10:57PM +0530, Athira Rajeev escreveu:
> The 'numa' benchmark in the perf benchmark collection fails on systems
> configured with more than 1024 CPUs. These benchmarks use
> "sched_getaffinity" and "sched_setaffinity" in the code to work with
> affinity.

Thanks, applied.

- Arnaldo

 
> Example snippet from numa benchmark:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
> 
> The bind_to_node function uses "sched_getaffinity" to save the cpumask.
> This call fails with EINVAL because the default cpu_set_t mask size in
> glibc is 1024 bits.
> 
> To overcome this 1024-CPU mask size limitation of cpu_set_t, change the
> mask size using the CPU_*_S macros: use CPU_ALLOC to allocate the
> cpumask, CPU_ALLOC_SIZE to get its size, and CPU_SET_S to set a mask bit.
> 
> Fix all the relevant places in the code to use a mask size that is large
> enough to represent the number of possible CPUs in the system.
> 
> This patchset also adds a fix to the parse_setup_cpu_list function in the
> numa bench to check whether an input CPU is online before binding a task
> to it. This fixes failures where a CPU number is within the maximum CPU
> count but the CPU itself is offline; in that case, sched_setaffinity
> fails when using a cpumask having that CPU's bit set.
> 
> Patch 1 adds the fix for parse_setup_cpu_list to check whether the CPU
> used to bind a task is online. Patch 2 fixes bench numa to work with
> machines having #CPUs > 1K.
> 
> Athira Rajeev (2):
>   tools/perf: Fix perf bench numa testcase to check if CPU used to bind
>     task is online
>   perf bench: Fix numa bench to fix usage of affinity for machines with
>     #CPUs > 1K
> 
> Changelog:
> v2 -> v3
> Link to the v2 version :
> https://lore.kernel.org/all/20220406175113.87881-1-atrajeev@linux.vnet.ibm.com/
>  - From the v2 version, patch 1 and patch 2 are now part of upstream.
>  - This v3 version separates patch 3 and patch 4 to address review
>    comments from arnaldo which includes using sysfs__read_str for reading
>    sysfs file and fixing the compilation issues observed in debian
> 
>  tools/perf/bench/numa.c  | 136 +++++++++++++++++++++++++++++----------
>  tools/perf/util/header.c |  51 +++++++++++++++
>  tools/perf/util/header.h |   1 +
>  3 files changed, 153 insertions(+), 35 deletions(-)
> 
> -- 
> 2.35.1

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 9+ messages in thread

Thread overview:
2022-04-12 16:40 [PATCH V3 0/2] Fix perf bench numa to work with machines having #CPUs > 1K Athira Rajeev
2022-04-12 16:40 ` [PATCH V3 1/2] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
2022-04-12 16:40 ` [PATCH V3 2/2] perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1K Athira Rajeev
2022-04-13 14:31 ` [PATCH V3 0/2] Fix perf bench numa to work with machines having #CPUs > 1K Disha Goel
2022-04-14 12:16 ` Arnaldo Carvalho de Melo