* [PATCH 0/4] tools/perf: Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
From: Athira Rajeev @ 2022-04-01 18:58 UTC
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain

The perf bench collections numa, futex and epoll fail on systems
with more than 1024 CPUs. These benchmarks use "sched_getaffinity"
and "sched_setaffinity" to manage CPU affinity.

Example snippet from numa benchmark:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

The bind_to_node function uses "sched_getaffinity" to save the
cpumask. This fails with EINVAL because glibc's cpu_set_t is a fixed
1024-bit (128-byte) mask.

Similarly, the futex and epoll benchmarks use sched_setaffinity when
creating threads with pthread_create. Since that call returns EINVAL
in such system configurations, the benchmarks do not run.

To overcome this 1024-CPU size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to allocate
the cpumask, CPU_ALLOC_SIZE to get its size, and CPU_SET_S to set a
mask bit.

Fix all the relevant places in the code to use a mask size large
enough to represent the number of possible CPUs in the system.
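
As an illustration of the pattern (a minimal standalone sketch, not
code from the patches; the CPU count and the bound CPU are arbitrary
examples):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	int nrcpus = 2048;			/* example: beyond the 1024-bit static mask */
	cpu_set_t *mask = CPU_ALLOC(nrcpus);	/* dynamically sized mask */
	size_t size = CPU_ALLOC_SIZE(nrcpus);	/* byte size to pass to the *_S macros */

	if (!mask)
		exit(EXIT_FAILURE);

	CPU_ZERO_S(size, mask);
	CPU_SET_S(0, size, mask);		/* bind to CPU 0 for the example */

	/* pass "size" instead of sizeof(cpu_set_t) to the affinity call */
	if (sched_setaffinity(0, size, mask))
		perror("sched_setaffinity");

	CPU_FREE(mask);
	return 0;
}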

Also fix the parse_setup_cpu_list function in the numa bench to check
that an input CPU is online before binding a task to it. This fixes
failures where a CPU number is within the maximum CPU count but the
CPU itself is offline; sched_setaffinity fails when the cpumask has
that CPU's bit set.
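
A minimal standalone sketch of that kind of sysfs check (illustrative
only; the actual helper added in patch 4 is is_cpu_online() in
tools/perf/util/header.c):

#include <stdio.h>
#include <sys/stat.h>

/* Illustrative sketch: 1 = online, 0 = offline or not present. */
static int cpu_is_online(unsigned int cpu)
{
	char path[256];
	struct stat st;
	char c = '0';
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u", cpu);
	if (stat(path, &st) != 0)
		return 0;	/* no cpuN directory: CPU not present */

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/online", cpu);
	f = fopen(path, "r");
	if (!f)
		return 1;	/* no "online" file (e.g. cpu0, or !CONFIG_HOTPLUG_CPU) */

	if (fread(&c, 1, 1, f) != 1)
		c = '0';	/* unreadable: treat as offline */
	fclose(f);
	return c == '1';
}

int main(void)
{
	printf("cpu8 online: %d\n", cpu_is_online(8));
	return 0;
}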

Patches 1 and 2 fix the perf bench futex and perf bench epoll
benchmarks. Patches 3 and 4 fix the perf bench numa benchmark.

Athira Rajeev (4):
  tools/perf: Fix perf bench futex to correct usage of affinity for
    machines with #CPUs > 1K
  tools/perf: Fix perf bench epoll to correct usage of affinity for
    machines with #CPUs > 1K
  tools/perf: Fix perf numa bench to fix usage of affinity for machines
    with #CPUs > 1K
  tools/perf: Fix perf bench numa testcase to check if CPU used to bind
    task is online

 tools/perf/bench/epoll-ctl.c           |  25 ++++--
 tools/perf/bench/epoll-wait.c          |  25 ++++--
 tools/perf/bench/futex-hash.c          |  26 ++++--
 tools/perf/bench/futex-lock-pi.c       |  21 +++--
 tools/perf/bench/futex-requeue.c       |  21 +++--
 tools/perf/bench/futex-wake-parallel.c |  21 +++--
 tools/perf/bench/futex-wake.c          |  22 +++--
 tools/perf/bench/numa.c                | 117 ++++++++++++++++++-------
 tools/perf/util/header.c               |  43 +++++++++
 tools/perf/util/header.h               |   1 +
 10 files changed, 252 insertions(+), 70 deletions(-)

-- 
2.35.1


* [PATCH 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with #CPUs > 1K
From: Athira Rajeev @ 2022-04-01 18:58 UTC
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain

The perf bench futex testcase fails on systems with more than
1K CPUs.

Testcase: perf bench futex all
Failure snippet:
<<>>
Running futex/hash benchmark...

perf: pthread_create: No such file or directory
<<>>

In all the futex benchmarks (i.e. hash, lock-pi, requeue, wake,
wake-parallel), pthread_create is invoked in the respective
bench_futex_* function. Though the logs show a direct failure from
pthread_create, strace showed that the actual failure is
"sched_setaffinity" returning EINVAL (invalid argument). This happens
because the default mask size in glibc is 1024 bits. To overcome this
1024-CPU size limitation of cpu_set_t, change the mask size using the
CPU_*_S macros.
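
For reference, the underlying EINVAL can be observed by tracing the
affinity syscall while the benchmark runs, for example (the perf
binary path may differ):

  strace -f -e trace=sched_setaffinity ./perf bench futex hash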

This patch addresses the issue by fixing all the futex benchmarks to
use CPU_ALLOC to allocate the cpumask, CPU_ALLOC_SIZE for its size,
and CPU_SET_S to set the mask bits.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/bench/futex-hash.c          | 26 +++++++++++++++++++-------
 tools/perf/bench/futex-lock-pi.c       | 21 ++++++++++++++++-----
 tools/perf/bench/futex-requeue.c       | 21 ++++++++++++++++-----
 tools/perf/bench/futex-wake-parallel.c | 21 ++++++++++++++++-----
 tools/perf/bench/futex-wake.c          | 22 ++++++++++++++++------
 5 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 9627b6ab8670..dfce64e551e2 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -122,12 +122,14 @@ static void print_summary(void)
 int bench_futex_hash(int argc, const char **argv)
 {
 	int ret = 0;
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	struct sigaction act;
 	unsigned int i;
 	pthread_attr_t thread_attr;
 	struct worker *worker = NULL;
 	struct perf_cpu_map *cpu;
+	int nrcpus;
+	size_t size;
 
 	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
 	if (argc) {
@@ -170,25 +172,35 @@ int bench_futex_hash(int argc, const char **argv)
 	threads_starting = params.nthreads;
 	pthread_attr_init(&thread_attr);
 	gettimeofday(&bench__start, NULL);
+
+	nrcpus = perf_cpu_map__nr(cpu);
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
 		worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
 		if (!worker[i].futex)
 			goto errmem;
 
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
 
-		ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
-		if (ret)
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
+		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
-
+		}
 		ret = pthread_create(&worker[i].thread, &thread_attr, workerfn,
 				     (void *)(struct worker *) &worker[i]);
-		if (ret)
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 
 	}
+	CPU_FREE(cpuset);
 	pthread_attr_destroy(&thread_attr);
 
 	pthread_mutex_lock(&thread_lock);
diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index a512a320df74..61c3bb80d4cf 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -120,11 +120,17 @@ static void *workerfn(void *arg)
 static void create_threads(struct worker *w, pthread_attr_t thread_attr,
 			   struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
+	int nrcpus =  perf_cpu_map__nr(cpu);
+	size_t size;
 
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
 
@@ -135,15 +141,20 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr,
 		} else
 			worker[i].futex = &global_futex;
 
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i]))
+		if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i])) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 int bench_futex_lock_pi(int argc, const char **argv)
diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index aca47ce8b1e7..2cb013f7ffe5 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -123,22 +123,33 @@ static void *workerfn(void *arg __maybe_unused)
 static void block_threads(pthread_t *w,
 			  pthread_attr_t thread_attr, struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
+	int nrcpus = perf_cpu_map__nr(cpu);
+	size_t size;
 
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	/* create and block all threads */
 	for (i = 0; i < params.nthreads; i++) {
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i], &thread_attr, workerfn, NULL))
+		if (pthread_create(&w[i], &thread_attr, workerfn, NULL)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 static void toggle_done(int sig __maybe_unused,
diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
index 888ee6037945..efa5070a5eb3 100644
--- a/tools/perf/bench/futex-wake-parallel.c
+++ b/tools/perf/bench/futex-wake-parallel.c
@@ -144,22 +144,33 @@ static void *blocked_workerfn(void *arg __maybe_unused)
 static void block_threads(pthread_t *w, pthread_attr_t thread_attr,
 			  struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
+	int nrcpus = perf_cpu_map__nr(cpu);
+	size_t size;
 
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	/* create and block all threads */
 	for (i = 0; i < params.nthreads; i++) {
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL))
+		if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 static void print_run(struct thread_data *waking_worker, unsigned int run_num)
diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
index aa82db51c0ab..3a10f54900c1 100644
--- a/tools/perf/bench/futex-wake.c
+++ b/tools/perf/bench/futex-wake.c
@@ -97,22 +97,32 @@ static void print_summary(void)
 static void block_threads(pthread_t *w,
 			  pthread_attr_t thread_attr, struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
-
+	size_t size;
+	int nrcpus = perf_cpu_map__nr(cpu);
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	/* create and block all threads */
 	for (i = 0; i < params.nthreads; i++) {
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i], &thread_attr, workerfn, NULL))
+		if (pthread_create(&w[i], &thread_attr, workerfn, NULL)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 static void toggle_done(int sig __maybe_unused,
-- 
2.35.1


* [PATCH 2/4] tools/perf: Fix perf bench epoll to correct usage of affinity for machines with #CPUs > 1K
From: Athira Rajeev @ 2022-04-01 18:58 UTC
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain

The perf bench epoll testcase fails on systems with more than
1K CPUs.

Testcase: perf bench epoll all
Result snippet:
<<>>
Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.

perf: pthread_create: No such file or directory
<<>>

In the epoll benchmarks (ctl, wait), pthread_create is invoked in
do_threads from the respective bench_epoll_* function. Though the
logs show a direct failure from pthread_create, the actual failure is
"sched_setaffinity" returning EINVAL (invalid argument). This happens
because the default mask size in glibc is 1024 bits. To overcome this
1024-CPU size limitation of cpu_set_t, change the mask size using the
CPU_*_S macros.

This patch addresses the issue by fixing both epoll benchmarks to use
CPU_ALLOC to allocate the cpumask, CPU_ALLOC_SIZE for its size, and
CPU_SET_S to set the mask bits.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/bench/epoll-ctl.c  | 25 +++++++++++++++++++------
 tools/perf/bench/epoll-wait.c | 25 +++++++++++++++++++------
 2 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c
index 1a17ec83d3c4..91c53f6c6d87 100644
--- a/tools/perf/bench/epoll-ctl.c
+++ b/tools/perf/bench/epoll-ctl.c
@@ -222,13 +222,20 @@ static void init_fdmaps(struct worker *w, int pct)
 static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 {
 	pthread_attr_t thread_attr, *attrp = NULL;
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i, j;
 	int ret = 0;
+	int nrcpus;
+	size_t size;
 
 	if (!noaffinity)
 		pthread_attr_init(&thread_attr);
 
+	nrcpus = perf_cpu_map__nr(cpu);
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < nthreads; i++) {
 		struct worker *w = &worker[i];
 
@@ -252,22 +259,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 			init_fdmaps(w, 50);
 
 		if (!noaffinity) {
-			CPU_ZERO(&cpuset);
-			CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+			CPU_ZERO_S(size, cpuset);
+			CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
+					size, cpuset);
 
-			ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
-			if (ret)
+			ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+			if (ret) {
+				CPU_FREE(cpuset);
 				err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+			}
 
 			attrp = &thread_attr;
 		}
 
 		ret = pthread_create(&w->thread, attrp, workerfn,
 				     (void *)(struct worker *) w);
-		if (ret)
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
 
+	CPU_FREE(cpuset);
 	if (!noaffinity)
 		pthread_attr_destroy(&thread_attr);
 
diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c
index 0d1dd8879197..9469a53ffab9 100644
--- a/tools/perf/bench/epoll-wait.c
+++ b/tools/perf/bench/epoll-wait.c
@@ -291,9 +291,11 @@ static void print_summary(void)
 static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 {
 	pthread_attr_t thread_attr, *attrp = NULL;
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i, j;
 	int ret = 0, events = EPOLLIN;
+	int nrcpus;
+	size_t size;
 
 	if (oneshot)
 		events |= EPOLLONESHOT;
@@ -306,6 +308,11 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 	if (!noaffinity)
 		pthread_attr_init(&thread_attr);
 
+	nrcpus = perf_cpu_map__nr(cpu);
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < nthreads; i++) {
 		struct worker *w = &worker[i];
 
@@ -341,22 +348,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 		}
 
 		if (!noaffinity) {
-			CPU_ZERO(&cpuset);
-			CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+			CPU_ZERO_S(size, cpuset);
+			CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
+					size, cpuset);
 
-			ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
-			if (ret)
+			ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+			if (ret) {
+				CPU_FREE(cpuset);
 				err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+			}
 
 			attrp = &thread_attr;
 		}
 
 		ret = pthread_create(&w->thread, attrp, workerfn,
 				     (void *)(struct worker *) w);
-		if (ret)
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
 
+	CPU_FREE(cpuset);
 	if (!noaffinity)
 		pthread_attr_destroy(&thread_attr);
 
-- 
2.35.1


* [PATCH 3/4] tools/perf: Fix perf numa bench to fix usage of affinity for machines with #CPUs > 1K
From: Athira Rajeev @ 2022-04-01 18:58 UTC
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain

The perf bench numa testcase fails on systems with more than
1K CPUs.

Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1
Failure snippet:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

The bind_to_node function uses "sched_getaffinity" to save the
original cpumask, and this call returns EINVAL (invalid argument).
This happens because the default mask size in glibc is 1024 bits.
To overcome this 1024-CPU size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to allocate
the cpumask and CPU_ALLOC_SIZE for its size. Apart from fixing this
for "orig_mask", apply the same logic to "mask", which is used for
sched_setaffinity, so that the mask size is large enough to represent
the number of possible CPUs in the system.

sched_getaffinity is used in one more place in the perf numa bench,
the "bind_to_cpu" function. Apply the same logic there as well.
Though no failure is currently reported from there, it is prudent to
make getaffinity work on system configurations with more CPUs than
the default mask size supported by glibc.

Also fix "sched_setaffinity" to use mask size which is large enough
to represent number of possible CPU's in the system.

Fixed all places where "bind_cpumask" which is part of "struct
thread_data" is used such that bind_cpumask works in all configuration.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/bench/numa.c | 109 +++++++++++++++++++++++++++++-----------
 1 file changed, 81 insertions(+), 28 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index f2640179ada9..333896907e45 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -54,7 +54,7 @@
 
 struct thread_data {
 	int			curr_cpu;
-	cpu_set_t		bind_cpumask;
+	cpu_set_t		*bind_cpumask;
 	int			bind_node;
 	u8			*process_data;
 	int			process_nr;
@@ -266,46 +266,75 @@ static bool node_has_cpus(int node)
 	return ret;
 }
 
-static cpu_set_t bind_to_cpu(int target_cpu)
+static cpu_set_t *bind_to_cpu(int target_cpu)
 {
-	cpu_set_t orig_mask, mask;
+	int nrcpus = numa_num_possible_cpus();
+	cpu_set_t *orig_mask, *mask;
+	size_t size;
 	int ret;
 
-	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
-	BUG_ON(ret);
+	orig_mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!orig_mask);
+	size = CPU_ALLOC_SIZE(nrcpus);
+	CPU_ZERO_S(size, orig_mask);
+
+	ret = sched_getaffinity(0, size, orig_mask);
+	if (ret) {
+		CPU_FREE(orig_mask);
+		BUG_ON(ret);
+	}
 
-	CPU_ZERO(&mask);
+	mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!mask);
+	CPU_ZERO_S(size, mask);
 
 	if (target_cpu == -1) {
 		int cpu;
 
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &mask);
+			CPU_SET_S(cpu, size, mask);
 	} else {
 		BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
-		CPU_SET(target_cpu, &mask);
+		CPU_SET_S(target_cpu, size, mask);
 	}
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	ret = sched_setaffinity(0, size, mask);
+	if (ret) {
+		CPU_FREE(mask);
+		BUG_ON(ret);
+	}
+
+	CPU_FREE(mask);
 
 	return orig_mask;
 }
 
-static cpu_set_t bind_to_node(int target_node)
+static cpu_set_t *bind_to_node(int target_node)
 {
-	cpu_set_t orig_mask, mask;
+	int nrcpus = numa_num_possible_cpus();
+	cpu_set_t *orig_mask, *mask;
+	size_t size;
 	int cpu;
 	int ret;
 
-	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
-	BUG_ON(ret);
+	orig_mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!orig_mask);
+	size = CPU_ALLOC_SIZE(nrcpus);
+	CPU_ZERO_S(size, orig_mask);
+
+	ret = sched_getaffinity(0, size, orig_mask);
+	if (ret) {
+		CPU_FREE(orig_mask);
+		BUG_ON(ret);
+	}
 
-	CPU_ZERO(&mask);
+	mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!mask);
+	CPU_ZERO_S(size, mask);
 
 	if (target_node == NUMA_NO_NODE) {
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &mask);
+			CPU_SET_S(cpu, size, mask);
 	} else {
 		struct bitmask *cpumask = numa_allocate_cpumask();
 
@@ -313,24 +342,33 @@ static cpu_set_t bind_to_node(int target_node)
 		if (!numa_node_to_cpus(target_node, cpumask)) {
 			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
 				if (numa_bitmask_isbitset(cpumask, cpu))
-					CPU_SET(cpu, &mask);
+					CPU_SET_S(cpu, size, mask);
 			}
 		}
 		numa_free_cpumask(cpumask);
 	}
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	ret = sched_setaffinity(0, size, mask);
+	if (ret) {
+		CPU_FREE(mask);
+		BUG_ON(ret);
+	}
+
+	CPU_FREE(mask);
 
 	return orig_mask;
 }
 
-static void bind_to_cpumask(cpu_set_t mask)
+static void bind_to_cpumask(cpu_set_t *mask)
 {
 	int ret;
+	size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	ret = sched_setaffinity(0, size, mask);
+	if (ret) {
+		CPU_FREE(mask);
+		BUG_ON(ret);
+	}
 }
 
 static void mempol_restore(void)
@@ -376,7 +414,7 @@ do {							\
 static u8 *alloc_data(ssize_t bytes0, int map_flags,
 		      int init_zero, int init_cpu0, int thp, int init_random)
 {
-	cpu_set_t orig_mask;
+	cpu_set_t *orig_mask;
 	ssize_t bytes;
 	u8 *buf;
 	int ret;
@@ -434,6 +472,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
 	/* Restore affinity: */
 	if (init_cpu0) {
 		bind_to_cpumask(orig_mask);
+		CPU_FREE(orig_mask);
 		mempol_restore();
 	}
 
@@ -589,6 +628,7 @@ static int parse_setup_cpu_list(void)
 		BUG_ON(bind_cpu_0 > bind_cpu_1);
 
 		for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
+			size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
 			int i;
 
 			for (i = 0; i < mul; i++) {
@@ -608,10 +648,12 @@ static int parse_setup_cpu_list(void)
 					tprintf("%2d", bind_cpu);
 				}
 
-				CPU_ZERO(&td->bind_cpumask);
+				td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+				BUG_ON(!td->bind_cpumask);
+				CPU_ZERO_S(size, td->bind_cpumask);
 				for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
 					BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
-					CPU_SET(cpu, &td->bind_cpumask);
+					CPU_SET_S(cpu, size, td->bind_cpumask);
 				}
 				t++;
 			}
@@ -1241,7 +1283,7 @@ static void *worker_thread(void *__tdata)
 		 * by migrating to CPU#0:
 		 */
 		if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
-			cpu_set_t orig_mask;
+			cpu_set_t *orig_mask;
 			int target_cpu;
 			int this_cpu;
 
@@ -1265,6 +1307,7 @@ static void *worker_thread(void *__tdata)
 				printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
 
 			bind_to_cpumask(orig_mask);
+			CPU_FREE(orig_mask);
 		}
 
 		if (details >= 3) {
@@ -1398,21 +1441,31 @@ static void init_thread_data(void)
 
 	for (t = 0; t < g->p.nr_tasks; t++) {
 		struct thread_data *td = g->threads + t;
+		size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
 		int cpu;
 
 		/* Allow all nodes by default: */
 		td->bind_node = NUMA_NO_NODE;
 
 		/* Allow all CPUs by default: */
-		CPU_ZERO(&td->bind_cpumask);
+		td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+		BUG_ON(!td->bind_cpumask);
+		CPU_ZERO_S(cpuset_size, td->bind_cpumask);
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &td->bind_cpumask);
+			CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
 	}
 }
 
 static void deinit_thread_data(void)
 {
 	ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
+	int t;
+
+	/* Free the bind_cpumask allocated for thread_data */
+	for (t = 0; t < g->p.nr_tasks; t++) {
+		struct thread_data *td = g->threads + t;
+		CPU_FREE(td->bind_cpumask);
+	}
 
 	free_data(g->threads, size);
 }
-- 
2.35.1


* [PATCH 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
From: Athira Rajeev @ 2022-04-01 18:58 UTC
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain

The perf numa bench test fails with the following error:

Testcase:
./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
--thp  1 --no-data_rand_walk

Failure snippet:
<<>>
 Running 'numa/mem' benchmark:

 # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
-M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>

The testcase uses CPUs 0 and 8. The "parse_setup_cpu_list" function
checks whether a CPU number exceeds the maximum possible CPUs in the
system, via "if (bind_cpu_0 >= g->p.nr_cpus ||
bind_cpu_1 >= g->p.nr_cpus)". But it can happen that a system has,
say, 48 CPUs while only CPUs 0-7 are online and the rest are offline.
Since "g->p.nr_cpus" is 48, the function goes ahead and sets the bit
for CPU 8 as well in the cpumask (td->bind_cpumask).

The bind_to_cpumask function is then called to set the affinity using
sched_setaffinity and that cpumask. Since CPU 8 is not online, the
set affinity call fails with EINVAL. Fix this issue by adding a check
to make sure the CPUs provided in the input arguments are online
before proceeding further, and skip the test otherwise. For this, add
a new helper function "is_cpu_online" in "tools/perf/util/header.c".

Since the "BIT(x)" definition is now included via header.h, remove it
from bench/numa.c.

Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 tools/perf/bench/numa.c  |  8 ++++++--
 tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/header.h |  1 +
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 333896907e45..42e2b2ed30c3 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -34,6 +34,7 @@
 #include <linux/numa.h>
 #include <linux/zalloc.h>
 
+#include "../util/header.h"
 #include <numa.h>
 #include <numaif.h>
 
@@ -624,6 +625,11 @@ static int parse_setup_cpu_list(void)
 			return -1;
 		}
 
+		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
+			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
+			return -1;
+		}
+
 		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
 		BUG_ON(bind_cpu_0 > bind_cpu_1);
 
@@ -794,8 +800,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
 	return parse_node_list(arg);
 }
 
-#define BIT(x) (1ul << x)
-
 static inline uint32_t lfsr_32(uint32_t lfsr)
 {
 	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 6da12e522edc..3f5fcf5d4b3f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
 	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
 }
 
+#define SYSFS "/sys/devices/system/cpu/"
+
+/*
+ * Check whether a CPU is online
+ *
+ * Returns:
+ *     1 -> if CPU is online
+ *     0 -> if CPU is offline
+ *    -1 -> error case
+ */
+int is_cpu_online(unsigned int cpu)
+{
+	char sysfs_cpu[255];
+	char buf[255];
+	struct stat statbuf;
+	size_t len;
+	int fd;
+
+	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
+
+	if (stat(sysfs_cpu, &statbuf) != 0)
+		return 0;
+
+	/*
+	 * Check if /sys/devices/system/cpu/cpux/online file
+	 * exists. In kernels without CONFIG_HOTPLUG_CPU, this
+	 * file won't exist.
+	 */
+	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
+	if (stat(sysfs_cpu, &statbuf) != 0)
+		return 1;
+
+	fd = open(sysfs_cpu, O_RDONLY);
+	if (fd == -1)
+		return -1;
+
+	len = read(fd, buf, sizeof(buf) - 1);
+	buf[len] = '\0';
+	close(fd);
+
+	return strtoul(buf, NULL, 16);
+}
+
 #ifdef HAVE_LIBBPF_SUPPORT
 static int write_bpf_prog_info(struct feat_fd *ff,
 			       struct evlist *evlist __maybe_unused)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index c9e3265832d9..0eb4bc29a5a4 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
 int write_padded(struct feat_fd *fd, const void *bf,
 		 size_t count, size_t count_aligned);
 
+int is_cpu_online(unsigned int cpu);
 /*
  * arch specific callback
  */
-- 
2.35.1


* Re: [PATCH 3/4] tools/perf: Fix perf numa bench to fix usage of affinity for machines with #CPUs > 1K
  2022-04-01 18:58   ` Athira Rajeev
@ 2022-04-05 17:52     ` Ian Rogers
  -1 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2022-04-05 17:52 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: maddy, rnsastry, acme, linux-perf-users, jolsa, kjain, disgoel,
	linuxppc-dev

On Fri, Apr 1, 2022 at 11:59 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> perf bench numa testcase fails on systems with more
> than 1K CPUs.
>
> Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1
> Snippet of code:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
>
> bind_to_node function uses "sched_getaffinity" to save the original
> cpumask, and this call returns EINVAL (invalid argument).
> This happens because the default mask size in glibc is 1024.
> To overcome this 1024 CPUs mask size limitation of cpu_set_t,
> change the mask size using the CPU_*_S macros, i.e., use CPU_ALLOC to
> allocate cpumask, CPU_ALLOC_SIZE for size. Apart from fixing this
> for "orig_mask", apply same logic to "mask" as well which is used to
> setaffinity so that mask size is large enough to represent number
> of possible CPU's in the system.
>
> sched_getaffinity is used in one more place in perf numa bench. It
> is in "bind_to_cpu" function. Apply the same logic there also. Though
> currently no failure is reported from there, it is ideal to change
> getaffinity to work with such system configurations having more CPUs
> than the default mask size supported by glibc.
>
> Also fix "sched_setaffinity" to use mask size which is large enough
> to represent number of possible CPU's in the system.
>
> Fixed all places where "bind_cpumask", which is part of "struct
> thread_data", is used such that bind_cpumask works in all configurations.
>
> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/bench/numa.c | 109 +++++++++++++++++++++++++++++-----------
>  1 file changed, 81 insertions(+), 28 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index f2640179ada9..333896907e45 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -54,7 +54,7 @@
>
>  struct thread_data {
>         int                     curr_cpu;
> -       cpu_set_t               bind_cpumask;
> +       cpu_set_t               *bind_cpumask;
>         int                     bind_node;
>         u8                      *process_data;
>         int                     process_nr;
> @@ -266,46 +266,75 @@ static bool node_has_cpus(int node)
>         return ret;
>  }
>
> -static cpu_set_t bind_to_cpu(int target_cpu)
> +static cpu_set_t *bind_to_cpu(int target_cpu)
>  {
> -       cpu_set_t orig_mask, mask;
> +       int nrcpus = numa_num_possible_cpus();
> +       cpu_set_t *orig_mask, *mask;
> +       size_t size;
>         int ret;
>
> -       ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> -       BUG_ON(ret);
> +       orig_mask = CPU_ALLOC(nrcpus);
> +       BUG_ON(!orig_mask);
> +       size = CPU_ALLOC_SIZE(nrcpus);
> +       CPU_ZERO_S(size, orig_mask);
> +
> +       ret = sched_getaffinity(0, size, orig_mask);
> +       if (ret) {
> +               CPU_FREE(orig_mask);
> +               BUG_ON(ret);
> +       }
>
> -       CPU_ZERO(&mask);
> +       mask = CPU_ALLOC(nrcpus);
> +       BUG_ON(!mask);
> +       CPU_ZERO_S(size, mask);
>
>         if (target_cpu == -1) {
>                 int cpu;
>
>                 for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> -                       CPU_SET(cpu, &mask);
> +                       CPU_SET_S(cpu, size, mask);
>         } else {
>                 BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
> -               CPU_SET(target_cpu, &mask);
> +               CPU_SET_S(target_cpu, size, mask);
>         }
>
> -       ret = sched_setaffinity(0, sizeof(mask), &mask);
> -       BUG_ON(ret);
> +       ret = sched_setaffinity(0, size, mask);
> +       if (ret) {
> +               CPU_FREE(mask);
> +               BUG_ON(ret);
> +       }
> +
> +       CPU_FREE(mask);

This all looks good. A nit: here it could be a little shorter, as:
ret = sched_setaffinity(0, size, mask);
CPU_FREE(mask);
BUG_ON(ret);

Thanks,
Ian

>
>         return orig_mask;
>  }
>
> -static cpu_set_t bind_to_node(int target_node)
> +static cpu_set_t *bind_to_node(int target_node)
>  {
> -       cpu_set_t orig_mask, mask;
> +       int nrcpus = numa_num_possible_cpus();
> +       cpu_set_t *orig_mask, *mask;
> +       size_t size;
>         int cpu;
>         int ret;
>
> -       ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> -       BUG_ON(ret);
> +       orig_mask = CPU_ALLOC(nrcpus);
> +       BUG_ON(!orig_mask);
> +       size = CPU_ALLOC_SIZE(nrcpus);
> +       CPU_ZERO_S(size, orig_mask);
> +
> +       ret = sched_getaffinity(0, size, orig_mask);
> +       if (ret) {
> +               CPU_FREE(orig_mask);
> +               BUG_ON(ret);
> +       }
>
> -       CPU_ZERO(&mask);
> +       mask = CPU_ALLOC(nrcpus);
> +       BUG_ON(!mask);
> +       CPU_ZERO_S(size, mask);
>
>         if (target_node == NUMA_NO_NODE) {
>                 for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> -                       CPU_SET(cpu, &mask);
> +                       CPU_SET_S(cpu, size, mask);
>         } else {
>                 struct bitmask *cpumask = numa_allocate_cpumask();
>
> @@ -313,24 +342,33 @@ static cpu_set_t bind_to_node(int target_node)
>                 if (!numa_node_to_cpus(target_node, cpumask)) {
>                         for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
>                                 if (numa_bitmask_isbitset(cpumask, cpu))
> -                                       CPU_SET(cpu, &mask);
> +                                       CPU_SET_S(cpu, size, mask);
>                         }
>                 }
>                 numa_free_cpumask(cpumask);
>         }
>
> -       ret = sched_setaffinity(0, sizeof(mask), &mask);
> -       BUG_ON(ret);
> +       ret = sched_setaffinity(0, size, mask);
> +       if (ret) {
> +               CPU_FREE(mask);
> +               BUG_ON(ret);
> +       }
> +
> +       CPU_FREE(mask);
>
>         return orig_mask;
>  }
>
> -static void bind_to_cpumask(cpu_set_t mask)
> +static void bind_to_cpumask(cpu_set_t *mask)
>  {
>         int ret;
> +       size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
>
> -       ret = sched_setaffinity(0, sizeof(mask), &mask);
> -       BUG_ON(ret);
> +       ret = sched_setaffinity(0, size, mask);
> +       if (ret) {
> +               CPU_FREE(mask);
> +               BUG_ON(ret);
> +       }
>  }
>
>  static void mempol_restore(void)
> @@ -376,7 +414,7 @@ do {                                                        \
>  static u8 *alloc_data(ssize_t bytes0, int map_flags,
>                       int init_zero, int init_cpu0, int thp, int init_random)
>  {
> -       cpu_set_t orig_mask;
> +       cpu_set_t *orig_mask;
>         ssize_t bytes;
>         u8 *buf;
>         int ret;
> @@ -434,6 +472,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
>         /* Restore affinity: */
>         if (init_cpu0) {
>                 bind_to_cpumask(orig_mask);
> +               CPU_FREE(orig_mask);
>                 mempol_restore();
>         }
>
> @@ -589,6 +628,7 @@ static int parse_setup_cpu_list(void)
>                 BUG_ON(bind_cpu_0 > bind_cpu_1);
>
>                 for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
> +                       size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
>                         int i;
>
>                         for (i = 0; i < mul; i++) {
> @@ -608,10 +648,12 @@ static int parse_setup_cpu_list(void)
>                                         tprintf("%2d", bind_cpu);
>                                 }
>
> -                               CPU_ZERO(&td->bind_cpumask);
> +                               td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
> +                               BUG_ON(!td->bind_cpumask);
> +                               CPU_ZERO_S(size, td->bind_cpumask);
>                                 for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
>                                         BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
> -                                       CPU_SET(cpu, &td->bind_cpumask);
> +                                       CPU_SET_S(cpu, size, td->bind_cpumask);
>                                 }
>                                 t++;
>                         }
> @@ -1241,7 +1283,7 @@ static void *worker_thread(void *__tdata)
>                  * by migrating to CPU#0:
>                  */
>                 if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
> -                       cpu_set_t orig_mask;
> +                       cpu_set_t *orig_mask;
>                         int target_cpu;
>                         int this_cpu;
>
> @@ -1265,6 +1307,7 @@ static void *worker_thread(void *__tdata)
>                                 printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
>
>                         bind_to_cpumask(orig_mask);
> +                       CPU_FREE(orig_mask);
>                 }
>
>                 if (details >= 3) {
> @@ -1398,21 +1441,31 @@ static void init_thread_data(void)
>
>         for (t = 0; t < g->p.nr_tasks; t++) {
>                 struct thread_data *td = g->threads + t;
> +               size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
>                 int cpu;
>
>                 /* Allow all nodes by default: */
>                 td->bind_node = NUMA_NO_NODE;
>
>                 /* Allow all CPUs by default: */
> -               CPU_ZERO(&td->bind_cpumask);
> +               td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
> +               BUG_ON(!td->bind_cpumask);
> +               CPU_ZERO_S(cpuset_size, td->bind_cpumask);
>                 for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> -                       CPU_SET(cpu, &td->bind_cpumask);
> +                       CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
>         }
>  }
>
>  static void deinit_thread_data(void)
>  {
>         ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
> +       int t;
> +
> +       /* Free the bind_cpumask allocated for thread_data */
> +       for (t = 0; t < g->p.nr_tasks; t++) {
> +               struct thread_data *td = g->threads + t;
> +               CPU_FREE(td->bind_cpumask);
> +       }
>
>         free_data(g->threads, size);
>  }
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
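
The glibc limitation described in the commit message above is easy to
reproduce in isolation. Below is a sketch of a standalone test (not part
of the series): with the fixed-size cpu_set_t, sched_getaffinity() fails
with EINVAL once the kernel's possible-CPU mask no longer fits in 1024
bits, while the CPU_ALLOC'd mask keeps working.

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t fixed;	/* 1024 bits, fixed when glibc was built */
	/* perf uses numa_num_possible_cpus(); sysconf() keeps this standalone */
	long nrcpus = sysconf(_SC_NPROCESSORS_CONF);
	size_t size = CPU_ALLOC_SIZE(nrcpus);
	cpu_set_t *dyn = CPU_ALLOC(nrcpus);

	if (!dyn)
		return 1;

	if (sched_getaffinity(0, sizeof(fixed), &fixed))
		printf("fixed mask:   %s\n", strerror(errno));	/* EINVAL when #CPUs > 1K */
	else
		printf("fixed mask:   ok\n");

	CPU_ZERO_S(size, dyn);
	if (sched_getaffinity(0, size, dyn) == 0)
		printf("dynamic mask: ok, %d CPUs in affinity set\n",
		       CPU_COUNT_S(size, dyn));

	CPU_FREE(dyn);
	return 0;
}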

* Re: [PATCH 2/4] tools/perf: Fix perf bench epoll to correct usage of affinity for machines with #CPUs > 1K
  2022-04-01 18:58   ` Athira Rajeev
@ 2022-04-05 17:56     ` Ian Rogers
  -1 siblings, 0 replies; 19+ messages in thread
From: Ian Rogers @ 2022-04-05 17:56 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: maddy, rnsastry, acme, linux-perf-users, jolsa, kjain, disgoel,
	linuxppc-dev

On Fri, Apr 1, 2022 at 12:00 PM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> perf bench epoll testcase fails on systems with more
> than 1K CPUs.
>
> Testcase: perf bench epoll all
> Result snippet:
> <<>>
> Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.
>
> perf: pthread_create: No such file or directory
> <<>>
>
> In epoll benchmarks (ctl, wait) pthread_create is invoked in do_threads
> from the respective bench_epoll_* functions. Though the logs show a direct
> failure from pthread_create, the actual failure is from "sched_setaffinity"
> returning EINVAL (invalid argument). This happens because the default
> mask size in glibc is 1024. To overcome this 1024 CPUs mask size
> limitation of cpu_set_t, change the mask size using the CPU_*_S macros.
>
> Patch addresses this by fixing all the epoll benchmarks to use
> CPU_ALLOC to allocate cpumask, CPU_ALLOC_SIZE for size, and
> CPU_SET_S to set the mask.
>
> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  tools/perf/bench/epoll-ctl.c  | 25 +++++++++++++++++++------
>  tools/perf/bench/epoll-wait.c | 25 +++++++++++++++++++------
>  2 files changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c
> index 1a17ec83d3c4..91c53f6c6d87 100644
> --- a/tools/perf/bench/epoll-ctl.c
> +++ b/tools/perf/bench/epoll-ctl.c
> @@ -222,13 +222,20 @@ static void init_fdmaps(struct worker *w, int pct)
>  static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>  {
>         pthread_attr_t thread_attr, *attrp = NULL;
> -       cpu_set_t cpuset;
> +       cpu_set_t *cpuset;
>         unsigned int i, j;
>         int ret = 0;
> +       int nrcpus;
> +       size_t size;
>
>         if (!noaffinity)
>                 pthread_attr_init(&thread_attr);
>
> +       nrcpus = perf_cpu_map__nr(cpu);
> +       cpuset = CPU_ALLOC(nrcpus);
> +       BUG_ON(!cpuset);
> +       size = CPU_ALLOC_SIZE(nrcpus);
> +
>         for (i = 0; i < nthreads; i++) {
>                 struct worker *w = &worker[i];
>
> @@ -252,22 +259,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>                         init_fdmaps(w, 50);
>
>                 if (!noaffinity) {
> -                       CPU_ZERO(&cpuset);
> -                       CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +                       CPU_ZERO_S(size, cpuset);
> +                       CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
> +                                       size, cpuset);
>
> -                       ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
> -                       if (ret)
> +                       ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
> +                       if (ret) {
> +                               CPU_FREE(cpuset);
>                                 err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> +                       }
>
>                         attrp = &thread_attr;
>                 }
>
>                 ret = pthread_create(&w->thread, attrp, workerfn,
>                                      (void *)(struct worker *) w);
> -               if (ret)
> +               if (ret) {
> +                       CPU_FREE(cpuset);
>                         err(EXIT_FAILURE, "pthread_create");
> +               }
>         }
>
> +       CPU_FREE(cpuset);

A nit: here you could CPU_FREE right after the ret = pthread_create...
to make it a bit shorter.

Thanks,
Ian

>         if (!noaffinity)
>                 pthread_attr_destroy(&thread_attr);
>
> diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c
> index 0d1dd8879197..9469a53ffab9 100644
> --- a/tools/perf/bench/epoll-wait.c
> +++ b/tools/perf/bench/epoll-wait.c
> @@ -291,9 +291,11 @@ static void print_summary(void)
>  static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>  {
>         pthread_attr_t thread_attr, *attrp = NULL;
> -       cpu_set_t cpuset;
> +       cpu_set_t *cpuset;
>         unsigned int i, j;
>         int ret = 0, events = EPOLLIN;
> +       int nrcpus;
> +       size_t size;
>
>         if (oneshot)
>                 events |= EPOLLONESHOT;
> @@ -306,6 +308,11 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>         if (!noaffinity)
>                 pthread_attr_init(&thread_attr);
>
> +       nrcpus = perf_cpu_map__nr(cpu);
> +       cpuset = CPU_ALLOC(nrcpus);
> +       BUG_ON(!cpuset);
> +       size = CPU_ALLOC_SIZE(nrcpus);
> +
>         for (i = 0; i < nthreads; i++) {
>                 struct worker *w = &worker[i];
>
> @@ -341,22 +348,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>                 }
>
>                 if (!noaffinity) {
> -                       CPU_ZERO(&cpuset);
> -                       CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +                       CPU_ZERO_S(size, cpuset);
> +                       CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
> +                                       size, cpuset);
>
> -                       ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
> -                       if (ret)
> +                       ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
> +                       if (ret) {
> +                               CPU_FREE(cpuset);
>                                 err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> +                       }
>
>                         attrp = &thread_attr;
>                 }
>
>                 ret = pthread_create(&w->thread, attrp, workerfn,
>                                      (void *)(struct worker *) w);
> -               if (ret)
> +               if (ret) {
> +                       CPU_FREE(cpuset);
>                         err(EXIT_FAILURE, "pthread_create");
> +               }
>         }
>
> +       CPU_FREE(cpuset);
>         if (!noaffinity)
>                 pthread_attr_destroy(&thread_attr);
>
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
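
The same pattern, applied to thread creation as in the epoll fix above,
condenses to roughly the following helper. This is a sketch with the nit
folded in; spawn_pinned() is illustrative and not part of the patch.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Create a thread pinned to one CPU using a dynamically sized mask. */
static int spawn_pinned(pthread_t *t, void *(*fn)(void *), void *arg,
			int cpu, int nrcpus)
{
	pthread_attr_t attr;
	size_t size = CPU_ALLOC_SIZE(nrcpus);
	cpu_set_t *set = CPU_ALLOC(nrcpus);
	int ret;

	if (!set)
		return -1;

	CPU_ZERO_S(size, set);
	CPU_SET_S(cpu, size, set);

	pthread_attr_init(&attr);
	ret = pthread_attr_setaffinity_np(&attr, size, set);
	if (!ret)
		ret = pthread_create(t, &attr, fn, arg);

	pthread_attr_destroy(&attr);
	CPU_FREE(set);	/* freed once, right after its last use */
	return ret;
}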

* Re: [PATCH 3/4] tools/perf: Fix perf numa bench to fix usage of affinity for machines with #CPUs > 1K
  2022-04-05 17:52     ` Ian Rogers
@ 2022-04-06  7:15       ` Athira Rajeev
  -1 siblings, 0 replies; 19+ messages in thread
From: Athira Rajeev @ 2022-04-06  7:15 UTC (permalink / raw)
  To: Ian Rogers
  Cc: maddy, Nageswara Sastry, kajoljain, Arnaldo Carvalho de Melo,
	linux-perf-users, Jiri Olsa, disgoel, linuxppc-dev



> On 05-Apr-2022, at 11:22 PM, Ian Rogers <irogers@google.com> wrote:
> 
> On Fri, Apr 1, 2022 at 11:59 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>> 
>> perf bench numa testcase fails on systems with more
>> than 1K CPUs.
>> 
>> Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1
>> Snippet of code:
>> <<>>
>> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
>> Aborted (core dumped)
>> <<>>
>> 
>> bind_to_node function uses "sched_getaffinity" to save the original
>> cpumask, and this call returns EINVAL (invalid argument).
>> This happens because the default mask size in glibc is 1024.
>> To overcome this 1024 CPUs mask size limitation of cpu_set_t,
>> change the mask size using the CPU_*_S macros ie, use CPU_ALLOC to
>> allocate cpumask, CPU_ALLOC_SIZE for size. Apart from fixing this
>> for "orig_mask", apply same logic to "mask" as well which is used to
>> setaffinity so that mask size is large enough to represent number
>> of possible CPU's in the system.
>> 
>> sched_getaffinity is used in one more place in perf numa bench. It
>> is in "bind_to_cpu" function. Apply the same logic there also. Though
>> currently no failure is reported from there, it is ideal to change
>> getaffinity to work with such system configurations having more CPUs
>> than the default mask size supported by glibc.
>> 
>> Also fix "sched_setaffinity" to use mask size which is large enough
>> to represent number of possible CPU's in the system.
>> 
>> Fixed all places where "bind_cpumask", which is part of "struct
>> thread_data", is used such that bind_cpumask works in all configurations.
>> 
>> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/bench/numa.c | 109 +++++++++++++++++++++++++++++-----------
>> 1 file changed, 81 insertions(+), 28 deletions(-)
>> 
>> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
>> index f2640179ada9..333896907e45 100644
>> --- a/tools/perf/bench/numa.c
>> +++ b/tools/perf/bench/numa.c
>> @@ -54,7 +54,7 @@
>> 
>> struct thread_data {
>>        int                     curr_cpu;
>> -       cpu_set_t               bind_cpumask;
>> +       cpu_set_t               *bind_cpumask;
>>        int                     bind_node;
>>        u8                      *process_data;
>>        int                     process_nr;
>> @@ -266,46 +266,75 @@ static bool node_has_cpus(int node)
>>        return ret;
>> }
>> 
>> -static cpu_set_t bind_to_cpu(int target_cpu)
>> +static cpu_set_t *bind_to_cpu(int target_cpu)
>> {
>> -       cpu_set_t orig_mask, mask;
>> +       int nrcpus = numa_num_possible_cpus();
>> +       cpu_set_t *orig_mask, *mask;
>> +       size_t size;
>>        int ret;
>> 
>> -       ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
>> -       BUG_ON(ret);
>> +       orig_mask = CPU_ALLOC(nrcpus);
>> +       BUG_ON(!orig_mask);
>> +       size = CPU_ALLOC_SIZE(nrcpus);
>> +       CPU_ZERO_S(size, orig_mask);
>> +
>> +       ret = sched_getaffinity(0, size, orig_mask);
>> +       if (ret) {
>> +               CPU_FREE(orig_mask);
>> +               BUG_ON(ret);
>> +       }
>> 
>> -       CPU_ZERO(&mask);
>> +       mask = CPU_ALLOC(nrcpus);
>> +       BUG_ON(!mask);
>> +       CPU_ZERO_S(size, mask);
>> 
>>        if (target_cpu == -1) {
>>                int cpu;
>> 
>>                for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
>> -                       CPU_SET(cpu, &mask);
>> +                       CPU_SET_S(cpu, size, mask);
>>        } else {
>>                BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
>> -               CPU_SET(target_cpu, &mask);
>> +               CPU_SET_S(target_cpu, size, mask);
>>        }
>> 
>> -       ret = sched_setaffinity(0, sizeof(mask), &mask);
>> -       BUG_ON(ret);
>> +       ret = sched_setaffinity(0, size, mask);
>> +       if (ret) {
>> +               CPU_FREE(mask);
>> +               BUG_ON(ret);
>> +       }
>> +
>> +       CPU_FREE(mask);
> 
> This all looks good, a nit here it could it be a little shorter as:
> ret = sched_setaffinity(0, size, mask);
> CPU_FREE(mask);
> BUG_ON(ret);
> 
> Thanks,
> Ian
> 

Thanks for the review, Ian.
Sure, I will address this change in V2.

Thanks
Athira
>> 
>>        return orig_mask;
>> }
>> 
>> -static cpu_set_t bind_to_node(int target_node)
>> +static cpu_set_t *bind_to_node(int target_node)
>> {
>> -       cpu_set_t orig_mask, mask;
>> +       int nrcpus = numa_num_possible_cpus();
>> +       cpu_set_t *orig_mask, *mask;
>> +       size_t size;
>>        int cpu;
>>        int ret;
>> 
>> -       ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
>> -       BUG_ON(ret);
>> +       orig_mask = CPU_ALLOC(nrcpus);
>> +       BUG_ON(!orig_mask);
>> +       size = CPU_ALLOC_SIZE(nrcpus);
>> +       CPU_ZERO_S(size, orig_mask);
>> +
>> +       ret = sched_getaffinity(0, size, orig_mask);
>> +       if (ret) {
>> +               CPU_FREE(orig_mask);
>> +               BUG_ON(ret);
>> +       }
>> 
>> -       CPU_ZERO(&mask);
>> +       mask = CPU_ALLOC(nrcpus);
>> +       BUG_ON(!mask);
>> +       CPU_ZERO_S(size, mask);
>> 
>>        if (target_node == NUMA_NO_NODE) {
>>                for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
>> -                       CPU_SET(cpu, &mask);
>> +                       CPU_SET_S(cpu, size, mask);
>>        } else {
>>                struct bitmask *cpumask = numa_allocate_cpumask();
>> 
>> @@ -313,24 +342,33 @@ static cpu_set_t bind_to_node(int target_node)
>>                if (!numa_node_to_cpus(target_node, cpumask)) {
>>                        for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
>>                                if (numa_bitmask_isbitset(cpumask, cpu))
>> -                                       CPU_SET(cpu, &mask);
>> +                                       CPU_SET_S(cpu, size, mask);
>>                        }
>>                }
>>                numa_free_cpumask(cpumask);
>>        }
>> 
>> -       ret = sched_setaffinity(0, sizeof(mask), &mask);
>> -       BUG_ON(ret);
>> +       ret = sched_setaffinity(0, size, mask);
>> +       if (ret) {
>> +               CPU_FREE(mask);
>> +               BUG_ON(ret);
>> +       }
>> +
>> +       CPU_FREE(mask);
>> 
>>        return orig_mask;
>> }
>> 
>> -static void bind_to_cpumask(cpu_set_t mask)
>> +static void bind_to_cpumask(cpu_set_t *mask)
>> {
>>        int ret;
>> +       size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
>> 
>> -       ret = sched_setaffinity(0, sizeof(mask), &mask);
>> -       BUG_ON(ret);
>> +       ret = sched_setaffinity(0, size, mask);
>> +       if (ret) {
>> +               CPU_FREE(mask);
>> +               BUG_ON(ret);
>> +       }
>> }
>> 
>> static void mempol_restore(void)
>> @@ -376,7 +414,7 @@ do {                                                        \
>> static u8 *alloc_data(ssize_t bytes0, int map_flags,
>>                      int init_zero, int init_cpu0, int thp, int init_random)
>> {
>> -       cpu_set_t orig_mask;
>> +       cpu_set_t *orig_mask;
>>        ssize_t bytes;
>>        u8 *buf;
>>        int ret;
>> @@ -434,6 +472,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
>>        /* Restore affinity: */
>>        if (init_cpu0) {
>>                bind_to_cpumask(orig_mask);
>> +               CPU_FREE(orig_mask);
>>                mempol_restore();
>>        }
>> 
>> @@ -589,6 +628,7 @@ static int parse_setup_cpu_list(void)
>>                BUG_ON(bind_cpu_0 > bind_cpu_1);
>> 
>>                for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
>> +                       size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
>>                        int i;
>> 
>>                        for (i = 0; i < mul; i++) {
>> @@ -608,10 +648,12 @@ static int parse_setup_cpu_list(void)
>>                                        tprintf("%2d", bind_cpu);
>>                                }
>> 
>> -                               CPU_ZERO(&td->bind_cpumask);
>> +                               td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
>> +                               BUG_ON(!td->bind_cpumask);
>> +                               CPU_ZERO_S(size, td->bind_cpumask);
>>                                for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
>>                                        BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
>> -                                       CPU_SET(cpu, &td->bind_cpumask);
>> +                                       CPU_SET_S(cpu, size, td->bind_cpumask);
>>                                }
>>                                t++;
>>                        }
>> @@ -1241,7 +1283,7 @@ static void *worker_thread(void *__tdata)
>>                 * by migrating to CPU#0:
>>                 */
>>                if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
>> -                       cpu_set_t orig_mask;
>> +                       cpu_set_t *orig_mask;
>>                        int target_cpu;
>>                        int this_cpu;
>> 
>> @@ -1265,6 +1307,7 @@ static void *worker_thread(void *__tdata)
>>                                printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
>> 
>>                        bind_to_cpumask(orig_mask);
>> +                       CPU_FREE(orig_mask);
>>                }
>> 
>>                if (details >= 3) {
>> @@ -1398,21 +1441,31 @@ static void init_thread_data(void)
>> 
>>        for (t = 0; t < g->p.nr_tasks; t++) {
>>                struct thread_data *td = g->threads + t;
>> +               size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
>>                int cpu;
>> 
>>                /* Allow all nodes by default: */
>>                td->bind_node = NUMA_NO_NODE;
>> 
>>                /* Allow all CPUs by default: */
>> -               CPU_ZERO(&td->bind_cpumask);
>> +               td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
>> +               BUG_ON(!td->bind_cpumask);
>> +               CPU_ZERO_S(cpuset_size, td->bind_cpumask);
>>                for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
>> -                       CPU_SET(cpu, &td->bind_cpumask);
>> +                       CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
>>        }
>> }
>> 
>> static void deinit_thread_data(void)
>> {
>>        ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
>> +       int t;
>> +
>> +       /* Free the bind_cpumask allocated for thread_data */
>> +       for (t = 0; t < g->p.nr_tasks; t++) {
>> +               struct thread_data *td = g->threads + t;
>> +               CPU_FREE(td->bind_cpumask);
>> +       }
>> 
>>        free_data(g->threads, size);
>> }
>> --
>> 2.35.1


^ permalink raw reply	[flat|nested] 19+ messages in thread
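
For completeness, the caller-side save/bind/restore pattern that the numa
fix converges on looks roughly like this. A sketch: the two bind functions
are static in bench/numa.c and are declared here only so the fragment is
self-contained; run_bound_to_node() is illustrative.

#define _GNU_SOURCE
#include <sched.h>

cpu_set_t *bind_to_node(int target_node);	/* returns the saved affinity */
void bind_to_cpumask(cpu_set_t *mask);

static void run_bound_to_node(int node, void (*work)(void))
{
	cpu_set_t *orig_mask = bind_to_node(node);

	work();				/* runs with affinity restricted to @node */

	bind_to_cpumask(orig_mask);	/* restore the saved affinity */
	CPU_FREE(orig_mask);		/* the caller owns the saved mask */
}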


* Re: [PATCH 0/4] tools/perf: Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
  2022-04-01 18:58 ` Athira Rajeev
                   ` (4 preceding siblings ...)
  (?)
@ 2022-04-06  7:33 ` Disha Goel
  -1 siblings, 0 replies; 19+ messages in thread
From: Disha Goel @ 2022-04-06  7:33 UTC (permalink / raw)
  To: Athira Rajeev, acme, jolsa
  Cc: maddy, rnsastry, linux-perf-users, kjain, linuxppc-dev




-----Original Message-----
From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
To: acme@kernel.org, jolsa@kernel.org, disgoel@linux.vnet.ibm.com
Cc: mpe@ellerman.id.au, linux-perf-users@vger.kernel.org, 
linuxppc-dev@lists.ozlabs.org, maddy@linux.vnet.ibm.com, 
rnsastry@linux.ibm.com, kjain@linux.ibm.com
Subject: [PATCH 0/4] tools/perf: Fix perf bench numa, futex and epoll
to work with machines having #CPUs > 1K
Date: Sat,  2 Apr 2022 00:28:49 +0530

Tested the patches on powerpc with CPUs > 1K and other configurations as
well; verified the perf bench numa, futex and epoll benchmarks for the
whole series with the patch set applied.

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] tools/perf: Fix perf bench epoll to correct usage of affinity for machines with #CPUs > 1K
  2022-04-05 17:56     ` Ian Rogers
@ 2022-04-06  7:47       ` Athira Rajeev
  -1 siblings, 0 replies; 19+ messages in thread
From: Athira Rajeev @ 2022-04-06  7:47 UTC (permalink / raw)
  To: Ian Rogers
  Cc: maddy, rnsastry, Arnaldo Carvalho de Melo, linux-perf-users,
	Jiri Olsa, kjain, disgoel, linuxppc-dev



> On 05-Apr-2022, at 11:26 PM, Ian Rogers <irogers@google.com> wrote:
> 
> On Fri, Apr 1, 2022 at 12:00 PM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>> 
>> perf bench epoll testcase fails on systems with more than
>> 1K CPUs.
>> 
>> Testcase: perf bench epoll all
>> Result snippet:
>> <<>>
>> Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.
>> 
>> perf: pthread_create: No such file or directory
>> <<>>
>> 
>> In epoll benchmarks (ctl, wait) pthread_create is invoked in do_threads
>> from the respective bench_epoll_* function. Though the log shows a direct
>> failure from pthread_create, the actual failure is from "sched_setaffinity"
>> returning EINVAL (invalid argument). This happens because the default
>> mask size in glibc is 1024. To overcome this 1024 CPUs mask size
>> limitation of cpu_set_t, change the mask size using the CPU_*_S macros.
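
As a quick illustration of the size limit (a minimal standalone check, not
part of the patch): glibc's fixed cpu_set_t is 1024 bits, so CPU numbers
>= 1024 cannot be represented in it, while CPU_ALLOC_SIZE() scales with the
requested CPU count:

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>

        int main(void)
        {
                /* Fixed-size mask: always 1024 bits (128 bytes) in glibc. */
                printf("sizeof(cpu_set_t)    = %zu bytes\n", sizeof(cpu_set_t));
                /* Dynamically sized mask, e.g. for 2048 possible CPUs: 256 bytes. */
                printf("CPU_ALLOC_SIZE(2048) = %zu bytes\n", CPU_ALLOC_SIZE(2048));
                return 0;
        }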
>> 
>> Patch addresses this by fixing all the epoll benchmarks to use
>> CPU_ALLOC to allocate cpumask, CPU_ALLOC_SIZE for size, and
>> CPU_SET_S to set the mask.
>> 
>> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> ---
>> tools/perf/bench/epoll-ctl.c  | 25 +++++++++++++++++++------
>> tools/perf/bench/epoll-wait.c | 25 +++++++++++++++++++------
>> 2 files changed, 38 insertions(+), 12 deletions(-)
>> 
>> diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c
>> index 1a17ec83d3c4..91c53f6c6d87 100644
>> --- a/tools/perf/bench/epoll-ctl.c
>> +++ b/tools/perf/bench/epoll-ctl.c
>> @@ -222,13 +222,20 @@ static void init_fdmaps(struct worker *w, int pct)
>> static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>> {
>>        pthread_attr_t thread_attr, *attrp = NULL;
>> -       cpu_set_t cpuset;
>> +       cpu_set_t *cpuset;
>>        unsigned int i, j;
>>        int ret = 0;
>> +       int nrcpus;
>> +       size_t size;
>> 
>>        if (!noaffinity)
>>                pthread_attr_init(&thread_attr);
>> 
>> +       nrcpus = perf_cpu_map__nr(cpu);
>> +       cpuset = CPU_ALLOC(nrcpus);
>> +       BUG_ON(!cpuset);
>> +       size = CPU_ALLOC_SIZE(nrcpus);
>> +
>>        for (i = 0; i < nthreads; i++) {
>>                struct worker *w = &worker[i];
>> 
>> @@ -252,22 +259,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>>                        init_fdmaps(w, 50);
>> 
>>                if (!noaffinity) {
>> -                       CPU_ZERO(&cpuset);
>> -                       CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
>> +                       CPU_ZERO_S(size, cpuset);
>> +                       CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
>> +                                       size, cpuset);
>> 
>> -                       ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
>> -                       if (ret)
>> +                       ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
>> +                       if (ret) {
>> +                               CPU_FREE(cpuset);
>>                                err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
>> +                       }
>> 
>>                        attrp = &thread_attr;
>>                }
>> 
>>                ret = pthread_create(&w->thread, attrp, workerfn,
>>                                     (void *)(struct worker *) w);
>> -               if (ret)
>> +               if (ret) {
>> +                       CPU_FREE(cpuset);
>>                        err(EXIT_FAILURE, "pthread_create");
>> +               }
>>        }
>> 
>> +       CPU_FREE(cpuset);
> 
> A nit here you could CPU_FREE right after the ret = pthread_create...
> to make it a bit shorter.
> 
> Thanks,
> Ian

Hi Ian,

Here, pthread_create is run in a loop for nthreads, and the CPU_ALLOC for
the cpu set is done once before the loop. The "CPU_FREE" inside the loop is
only for the cases where pthread_attr_setaffinity_np or pthread_create fails
and the benchmark exits; in all other cases the mask must stay allocated
until the loop completes for all "nthreads".
So we need the CPU_FREE outside the loop as well.
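
To make that lifetime explicit, here is a minimal standalone sketch of the
pattern (not the actual bench code; workerfn, nrcpus and nthreads are
placeholders):

        #define _GNU_SOURCE
        #include <err.h>
        #include <pthread.h>
        #include <sched.h>
        #include <stdlib.h>

        static void *workerfn(void *arg) { (void)arg; return NULL; }

        static void spawn_pinned(int nrcpus, int nthreads)
        {
                cpu_set_t *cpuset = CPU_ALLOC(nrcpus);  /* once, before the loop */
                size_t size = CPU_ALLOC_SIZE(nrcpus);
                pthread_attr_t attr;
                pthread_t tid;
                int ret;

                if (!cpuset)
                        err(EXIT_FAILURE, "CPU_ALLOC");
                pthread_attr_init(&attr);

                for (int i = 0; i < nthreads; i++) {
                        CPU_ZERO_S(size, cpuset);
                        CPU_SET_S(i % nrcpus, size, cpuset);

                        ret = pthread_attr_setaffinity_np(&attr, size, cpuset);
                        if (ret) {
                                CPU_FREE(cpuset);       /* error path: err() exits */
                                err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
                        }

                        ret = pthread_create(&tid, &attr, workerfn, NULL);
                        if (ret) {
                                CPU_FREE(cpuset);       /* error path: err() exits */
                                err(EXIT_FAILURE, "pthread_create");
                        }
                }

                CPU_FREE(cpuset);       /* normal path: free only after the loop */
                pthread_attr_destroy(&attr);
        }

(Threads are intentionally not joined here; the point is only the
allocate-once/free-once lifetime of the mask.)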

Thanks
Athira
> 
>>        if (!noaffinity)
>>                pthread_attr_destroy(&thread_attr);
>> 
>> diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c
>> index 0d1dd8879197..9469a53ffab9 100644
>> --- a/tools/perf/bench/epoll-wait.c
>> +++ b/tools/perf/bench/epoll-wait.c
>> @@ -291,9 +291,11 @@ static void print_summary(void)
>> static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>> {
>>        pthread_attr_t thread_attr, *attrp = NULL;
>> -       cpu_set_t cpuset;
>> +       cpu_set_t *cpuset;
>>        unsigned int i, j;
>>        int ret = 0, events = EPOLLIN;
>> +       int nrcpus;
>> +       size_t size;
>> 
>>        if (oneshot)
>>                events |= EPOLLONESHOT;
>> @@ -306,6 +308,11 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>>        if (!noaffinity)
>>                pthread_attr_init(&thread_attr);
>> 
>> +       nrcpus = perf_cpu_map__nr(cpu);
>> +       cpuset = CPU_ALLOC(nrcpus);
>> +       BUG_ON(!cpuset);
>> +       size = CPU_ALLOC_SIZE(nrcpus);
>> +
>>        for (i = 0; i < nthreads; i++) {
>>                struct worker *w = &worker[i];
>> 
>> @@ -341,22 +348,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
>>                }
>> 
>>                if (!noaffinity) {
>> -                       CPU_ZERO(&cpuset);
>> -                       CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
>> +                       CPU_ZERO_S(size, cpuset);
>> +                       CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
>> +                                       size, cpuset);
>> 
>> -                       ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
>> -                       if (ret)
>> +                       ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
>> +                       if (ret) {
>> +                               CPU_FREE(cpuset);
>>                                err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
>> +                       }
>> 
>>                        attrp = &thread_attr;
>>                }
>> 
>>                ret = pthread_create(&w->thread, attrp, workerfn,
>>                                     (void *)(struct worker *) w);
>> -               if (ret)
>> +               if (ret) {
>> +                       CPU_FREE(cpuset);
>>                        err(EXIT_FAILURE, "pthread_create");
>> +               }
>>        }
>> 
>> +       CPU_FREE(cpuset);
>>        if (!noaffinity)
>>                pthread_attr_destroy(&thread_attr);
>> 
>> --
>> 2.35.1


^ permalink raw reply	[flat|nested] 19+ messages in thread


end of thread, other threads:[~2022-04-07  0:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 18:58 [PATCH 0/4] tools/perf: Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
2022-04-01 18:58 ` Athira Rajeev
2022-04-01 18:58 ` [PATCH 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with " Athira Rajeev
2022-04-01 18:58   ` Athira Rajeev
2022-04-01 18:58 ` [PATCH 2/4] tools/perf: Fix perf bench epoll " Athira Rajeev
2022-04-01 18:58   ` Athira Rajeev
2022-04-05 17:56   ` Ian Rogers
2022-04-05 17:56     ` Ian Rogers
2022-04-06  7:47     ` Athira Rajeev
2022-04-06  7:47       ` Athira Rajeev
2022-04-01 18:58 ` [PATCH 3/4] tools/perf: Fix perf numa bench to fix " Athira Rajeev
2022-04-01 18:58   ` Athira Rajeev
2022-04-05 17:52   ` Ian Rogers
2022-04-05 17:52     ` Ian Rogers
2022-04-06  7:15     ` Athira Rajeev
2022-04-06  7:15       ` Athira Rajeev
2022-04-01 18:58 ` [PATCH 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
2022-04-01 18:58   ` Athira Rajeev
2022-04-06  7:33 ` [PATCH 0/4] tools/perf: Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Disha Goel
