linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
@ 2022-04-06 17:51 Athira Rajeev
  2022-04-06 17:51 ` [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with " Athira Rajeev
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-06 17:51 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The perf benchmark collections numa, futex and epoll
hit failures on system configurations with more than 1024 CPUs.
These benchmarks use "sched_getaffinity" and "sched_setaffinity"
in the code to work with affinity.

Example snippet from numa benchmark:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

bind_to_node function uses "sched_getaffinity" to save the cpumask.
This fails with EINVAL because the default mask size in glibc is 1024.

Similarly, the futex and epoll benchmarks use sched_setaffinity during
pthread_create to set affinity. Since it returns EINVAL in such system
configurations, the benchmarks don't run.

To overcome this 1024-CPU mask size limitation of cpu_set_t,
change the mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to
allocate the cpumask, CPU_ALLOC_SIZE for the size, and CPU_SET_S to
set a mask bit.

Fix all the relevant places in the code to use a mask size large
enough to represent the number of possible CPUs in the system.
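As a rough sketch of that approach (illustrative helper names, not code from
this series), the dynamic CPU_*_S API replaces the fixed-size cpu_set_t calls
like this:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Allocate a cpumask sized for 'nrcpus' possible CPUs and return its
 * byte size via *size_out.  The classic cpu_set_t caps out at 1024
 * CPUs; CPU_ALLOC() sizes the set dynamically. */
static cpu_set_t *alloc_cpumask(int nrcpus, size_t *size_out)
{
	cpu_set_t *mask = CPU_ALLOC(nrcpus);

	if (!mask)
		return NULL;
	*size_out = CPU_ALLOC_SIZE(nrcpus);
	CPU_ZERO_S(*size_out, mask);
	return mask;
}

/* Bind the calling thread to a single CPU using the sized variants. */
static int bind_self_to_cpu(int cpu, int nrcpus)
{
	size_t size;
	cpu_set_t *mask = alloc_cpumask(nrcpus, &size);
	int ret;

	if (!mask)
		return -1;
	CPU_SET_S(cpu, size, mask);
	ret = sched_setaffinity(0, size, mask);
	CPU_FREE(mask);
	return ret;
}
```

Sizing the mask by the number of possible CPUs keeps the sched_setaffinity
call valid on machines whose cpumask is wider than glibc's 1024-bit default.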

Fix the parse_setup_cpu_list function in the numa bench to check
whether an input CPU is online before binding a task to it. This fixes
failures where, although the CPU number is within the maximum CPU
count, the CPU happens to be offline. In that case, sched_setaffinity
fails when the cpumask has that CPU's bit set.

Patch 1 and patch 2 fix the perf bench futex and perf bench epoll
benchmarks. Patch 3 and patch 4 fix the perf bench numa benchmark.

Athira Rajeev (4):
  tools/perf: Fix perf bench futex to correct usage of affinity for
    machines with #CPUs > 1K
  tools/perf: Fix perf bench epoll to correct usage of affinity for
    machines with #CPUs > 1K
  tools/perf: Fix perf numa bench to fix usage of affinity for machines
    with #CPUs > 1K
  tools/perf: Fix perf bench numa testcase to check if CPU used to bind
    task is online

Changelog:
From v1 -> v2:
 Addressed review comment from Ian Rogers to do
 CPU_FREE in a cleaner way.
 Added Tested-by from Disha Goel

 tools/perf/bench/epoll-ctl.c           |  25 ++++--
 tools/perf/bench/epoll-wait.c          |  25 ++++--
 tools/perf/bench/futex-hash.c          |  26 ++++--
 tools/perf/bench/futex-lock-pi.c       |  21 +++--
 tools/perf/bench/futex-requeue.c       |  21 +++--
 tools/perf/bench/futex-wake-parallel.c |  21 +++--
 tools/perf/bench/futex-wake.c          |  22 ++++--
 tools/perf/bench/numa.c                | 105 ++++++++++++++++++-------
 tools/perf/util/header.c               |  43 ++++++++++
 tools/perf/util/header.h               |   1 +
 10 files changed, 242 insertions(+), 68 deletions(-)

-- 
2.35.1



* [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with #CPUs > 1K
  2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
@ 2022-04-06 17:51 ` Athira Rajeev
  2022-04-08 12:27   ` Srikar Dronamraju
  2022-04-06 17:51 ` [PATCH v2 2/4] tools/perf: Fix perf bench epoll " Athira Rajeev
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Athira Rajeev @ 2022-04-06 17:51 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The perf bench futex testcase fails on systems with more than
1K CPUs.

Testcase: perf bench futex all
Failure snippet:
<<>>Running futex/hash benchmark...

perf: pthread_create: No such file or directory
<<>>

In all the futex benchmarks (i.e. hash, lock-pi, requeue, wake,
wake-parallel), pthread_create is invoked in the respective
bench_futex_* function. Though the logs show a direct failure from
pthread_create, strace logs showed that the actual failure is from
"sched_setaffinity" returning EINVAL (invalid argument). This happens
because the default mask size in glibc is 1024. To overcome this
1024-CPU mask size limitation of cpu_set_t, change the mask size using
the CPU_*_S macros.

The patch addresses this by fixing all the futex benchmarks to use
CPU_ALLOC to allocate the cpumask, CPU_ALLOC_SIZE for the size, and
CPU_SET_S to set the mask.
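The per-thread pattern applied across the futex benchmarks can be sketched as
below; create_pinned_thread and worker are illustrative names, not functions
from perf, and the mask is sized for the number of possible CPUs rather than
the fixed 1024:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static void *worker(void *arg)
{
	(void)arg;
	/* Report which CPU the thread landed on. */
	return (void *)(long)sched_getcpu();
}

/* Create one thread pinned to 'cpu', with an affinity mask sized for
 * 'nrcpus' possible CPUs instead of the fixed 1024-CPU cpu_set_t. */
static int create_pinned_thread(pthread_t *thread, int cpu, int nrcpus)
{
	size_t size = CPU_ALLOC_SIZE(nrcpus);
	cpu_set_t *cpuset = CPU_ALLOC(nrcpus);
	pthread_attr_t attr;
	int ret;

	if (!cpuset)
		return -1;
	pthread_attr_init(&attr);
	CPU_ZERO_S(size, cpuset);
	CPU_SET_S(cpu, size, cpuset);

	ret = pthread_attr_setaffinity_np(&attr, size, cpuset);
	if (!ret)
		ret = pthread_create(thread, &attr, worker, NULL);

	CPU_FREE(cpuset);	/* the attr keeps its own copy of the mask */
	pthread_attr_destroy(&attr);
	return ret;
}
```

Freeing the cpuset right after thread creation is safe because
pthread_attr_setaffinity_np copies the mask into the attribute object.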

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
---
 tools/perf/bench/futex-hash.c          | 26 +++++++++++++++++++-------
 tools/perf/bench/futex-lock-pi.c       | 21 ++++++++++++++++-----
 tools/perf/bench/futex-requeue.c       | 21 ++++++++++++++++-----
 tools/perf/bench/futex-wake-parallel.c | 21 ++++++++++++++++-----
 tools/perf/bench/futex-wake.c          | 22 ++++++++++++++++------
 5 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 9627b6ab8670..dfce64e551e2 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -122,12 +122,14 @@ static void print_summary(void)
 int bench_futex_hash(int argc, const char **argv)
 {
 	int ret = 0;
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	struct sigaction act;
 	unsigned int i;
 	pthread_attr_t thread_attr;
 	struct worker *worker = NULL;
 	struct perf_cpu_map *cpu;
+	int nrcpus;
+	size_t size;
 
 	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
 	if (argc) {
@@ -170,25 +172,35 @@ int bench_futex_hash(int argc, const char **argv)
 	threads_starting = params.nthreads;
 	pthread_attr_init(&thread_attr);
 	gettimeofday(&bench__start, NULL);
+
+	nrcpus = perf_cpu_map__nr(cpu);
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
 		worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
 		if (!worker[i].futex)
 			goto errmem;
 
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
 
-		ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
-		if (ret)
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
+		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
-
+		}
 		ret = pthread_create(&worker[i].thread, &thread_attr, workerfn,
 				     (void *)(struct worker *) &worker[i]);
-		if (ret)
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 
 	}
+	CPU_FREE(cpuset);
 	pthread_attr_destroy(&thread_attr);
 
 	pthread_mutex_lock(&thread_lock);
diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
index a512a320df74..61c3bb80d4cf 100644
--- a/tools/perf/bench/futex-lock-pi.c
+++ b/tools/perf/bench/futex-lock-pi.c
@@ -120,11 +120,17 @@ static void *workerfn(void *arg)
 static void create_threads(struct worker *w, pthread_attr_t thread_attr,
 			   struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
+	int nrcpus =  perf_cpu_map__nr(cpu);
+	size_t size;
 
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
 
@@ -135,15 +141,20 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr,
 		} else
 			worker[i].futex = &global_futex;
 
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i]))
+		if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i])) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 int bench_futex_lock_pi(int argc, const char **argv)
diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
index aca47ce8b1e7..2cb013f7ffe5 100644
--- a/tools/perf/bench/futex-requeue.c
+++ b/tools/perf/bench/futex-requeue.c
@@ -123,22 +123,33 @@ static void *workerfn(void *arg __maybe_unused)
 static void block_threads(pthread_t *w,
 			  pthread_attr_t thread_attr, struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
+	int nrcpus = perf_cpu_map__nr(cpu);
+	size_t size;
 
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	/* create and block all threads */
 	for (i = 0; i < params.nthreads; i++) {
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i], &thread_attr, workerfn, NULL))
+		if (pthread_create(&w[i], &thread_attr, workerfn, NULL)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 static void toggle_done(int sig __maybe_unused,
diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
index 888ee6037945..efa5070a5eb3 100644
--- a/tools/perf/bench/futex-wake-parallel.c
+++ b/tools/perf/bench/futex-wake-parallel.c
@@ -144,22 +144,33 @@ static void *blocked_workerfn(void *arg __maybe_unused)
 static void block_threads(pthread_t *w, pthread_attr_t thread_attr,
 			  struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
+	int nrcpus = perf_cpu_map__nr(cpu);
+	size_t size;
 
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	/* create and block all threads */
 	for (i = 0; i < params.nthreads; i++) {
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL))
+		if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 static void print_run(struct thread_data *waking_worker, unsigned int run_num)
diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
index aa82db51c0ab..3a10f54900c1 100644
--- a/tools/perf/bench/futex-wake.c
+++ b/tools/perf/bench/futex-wake.c
@@ -97,22 +97,32 @@ static void print_summary(void)
 static void block_threads(pthread_t *w,
 			  pthread_attr_t thread_attr, struct perf_cpu_map *cpu)
 {
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i;
-
+	size_t size;
+	int nrcpus = perf_cpu_map__nr(cpu);
 	threads_starting = params.nthreads;
 
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	/* create and block all threads */
 	for (i = 0; i < params.nthreads; i++) {
-		CPU_ZERO(&cpuset);
-		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+		CPU_ZERO_S(size, cpuset);
+		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
 
-		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
+		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
 
-		if (pthread_create(&w[i], &thread_attr, workerfn, NULL))
+		if (pthread_create(&w[i], &thread_attr, workerfn, NULL)) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
+	CPU_FREE(cpuset);
 }
 
 static void toggle_done(int sig __maybe_unused,
-- 
2.35.1



* [PATCH v2 2/4] tools/perf: Fix perf bench epoll to correct usage of affinity for machines with #CPUs > 1K
  2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
  2022-04-06 17:51 ` [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with " Athira Rajeev
@ 2022-04-06 17:51 ` Athira Rajeev
  2022-04-06 17:51 ` [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix " Athira Rajeev
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-06 17:51 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The perf bench epoll testcase fails on systems with more than
1K CPUs.

Testcase: perf bench epoll all
Result snippet:
<<>>
Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.

perf: pthread_create: No such file or directory
<<>>

In the epoll benchmarks (ctl, wait), pthread_create is invoked in
do_threads from the respective bench_epoll_* function. Though the logs
show a direct failure from pthread_create, the actual failure is from
"sched_setaffinity" returning EINVAL (invalid argument). This happens
because the default mask size in glibc is 1024. To overcome this
1024-CPU mask size limitation of cpu_set_t, change the mask size using
the CPU_*_S macros.

The patch addresses this by fixing both epoll benchmarks to use
CPU_ALLOC to allocate the cpumask, CPU_ALLOC_SIZE for the size, and
CPU_SET_S to set the mask.

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
---
 tools/perf/bench/epoll-ctl.c  | 25 +++++++++++++++++++------
 tools/perf/bench/epoll-wait.c | 25 +++++++++++++++++++------
 2 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/tools/perf/bench/epoll-ctl.c b/tools/perf/bench/epoll-ctl.c
index 1a17ec83d3c4..91c53f6c6d87 100644
--- a/tools/perf/bench/epoll-ctl.c
+++ b/tools/perf/bench/epoll-ctl.c
@@ -222,13 +222,20 @@ static void init_fdmaps(struct worker *w, int pct)
 static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 {
 	pthread_attr_t thread_attr, *attrp = NULL;
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i, j;
 	int ret = 0;
+	int nrcpus;
+	size_t size;
 
 	if (!noaffinity)
 		pthread_attr_init(&thread_attr);
 
+	nrcpus = perf_cpu_map__nr(cpu);
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < nthreads; i++) {
 		struct worker *w = &worker[i];
 
@@ -252,22 +259,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 			init_fdmaps(w, 50);
 
 		if (!noaffinity) {
-			CPU_ZERO(&cpuset);
-			CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+			CPU_ZERO_S(size, cpuset);
+			CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
+					size, cpuset);
 
-			ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
-			if (ret)
+			ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+			if (ret) {
+				CPU_FREE(cpuset);
 				err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+			}
 
 			attrp = &thread_attr;
 		}
 
 		ret = pthread_create(&w->thread, attrp, workerfn,
 				     (void *)(struct worker *) w);
-		if (ret)
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
 
+	CPU_FREE(cpuset);
 	if (!noaffinity)
 		pthread_attr_destroy(&thread_attr);
 
diff --git a/tools/perf/bench/epoll-wait.c b/tools/perf/bench/epoll-wait.c
index 0d1dd8879197..9469a53ffab9 100644
--- a/tools/perf/bench/epoll-wait.c
+++ b/tools/perf/bench/epoll-wait.c
@@ -291,9 +291,11 @@ static void print_summary(void)
 static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 {
 	pthread_attr_t thread_attr, *attrp = NULL;
-	cpu_set_t cpuset;
+	cpu_set_t *cpuset;
 	unsigned int i, j;
 	int ret = 0, events = EPOLLIN;
+	int nrcpus;
+	size_t size;
 
 	if (oneshot)
 		events |= EPOLLONESHOT;
@@ -306,6 +308,11 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 	if (!noaffinity)
 		pthread_attr_init(&thread_attr);
 
+	nrcpus = perf_cpu_map__nr(cpu);
+	cpuset = CPU_ALLOC(nrcpus);
+	BUG_ON(!cpuset);
+	size = CPU_ALLOC_SIZE(nrcpus);
+
 	for (i = 0; i < nthreads; i++) {
 		struct worker *w = &worker[i];
 
@@ -341,22 +348,28 @@ static int do_threads(struct worker *worker, struct perf_cpu_map *cpu)
 		}
 
 		if (!noaffinity) {
-			CPU_ZERO(&cpuset);
-			CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
+			CPU_ZERO_S(size, cpuset);
+			CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu,
+					size, cpuset);
 
-			ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
-			if (ret)
+			ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
+			if (ret) {
+				CPU_FREE(cpuset);
 				err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+			}
 
 			attrp = &thread_attr;
 		}
 
 		ret = pthread_create(&w->thread, attrp, workerfn,
 				     (void *)(struct worker *) w);
-		if (ret)
+		if (ret) {
+			CPU_FREE(cpuset);
 			err(EXIT_FAILURE, "pthread_create");
+		}
 	}
 
+	CPU_FREE(cpuset);
 	if (!noaffinity)
 		pthread_attr_destroy(&thread_attr);
 
-- 
2.35.1



* [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix usage of affinity for machines with #CPUs > 1K
  2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
  2022-04-06 17:51 ` [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with " Athira Rajeev
  2022-04-06 17:51 ` [PATCH v2 2/4] tools/perf: Fix perf bench epoll " Athira Rajeev
@ 2022-04-06 17:51 ` Athira Rajeev
  2022-04-08 12:28   ` Srikar Dronamraju
  2022-04-06 17:51 ` [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Athira Rajeev @ 2022-04-06 17:51 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The perf bench numa testcase fails on systems with more than
1K CPUs.

Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1
Failure snippet:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>

The bind_to_node function uses "sched_getaffinity" to save the
original cpumask, and this call returns EINVAL (invalid argument).
This happens because the default mask size in glibc is 1024.
To overcome this 1024-CPU mask size limitation of cpu_set_t,
change the mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to
allocate the cpumask and CPU_ALLOC_SIZE for the size. Apart from fixing
this for "orig_mask", apply the same logic to "mask" as well, which is
used in setaffinity, so that the mask size is large enough to represent
the number of possible CPUs in the system.

sched_getaffinity is used in one more place in perf numa bench:
the "bind_to_cpu" function. Apply the same logic there as well. Though
no failure is currently reported from there, it is ideal to make
getaffinity work with system configurations having more CPUs than the
default mask size supported by glibc.

Also fix "sched_setaffinity" to use a mask size large enough
to represent the number of possible CPUs in the system.

Fix all places where "bind_cpumask", which is part of "struct
thread_data", is used, so that bind_cpumask works in all configurations.
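The save/rebind/restore flow in the numa bench's bind_to_node()/bind_to_cpu()
can be sketched with dynamic masks as follows (save_affinity and
restore_affinity are illustrative helpers, not the functions in numa.c):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Save the caller's current affinity into a dynamically sized
 * cpu_set_t, mirroring what the bench does before rebinding to a
 * node or CPU.  The caller owns the returned mask. */
static cpu_set_t *save_affinity(int nrcpus)
{
	size_t size = CPU_ALLOC_SIZE(nrcpus);
	cpu_set_t *orig = CPU_ALLOC(nrcpus);

	if (!orig)
		return NULL;
	CPU_ZERO_S(size, orig);
	if (sched_getaffinity(0, size, orig)) {
		/* Fails with EINVAL if the mask is too small for the
		 * kernel's cpumask; with CPU_ALLOC sizing it is not. */
		CPU_FREE(orig);
		return NULL;
	}
	return orig;
}

/* Restore a previously saved mask and release it. */
static int restore_affinity(cpu_set_t *orig, int nrcpus)
{
	size_t size = CPU_ALLOC_SIZE(nrcpus);
	int ret = sched_setaffinity(0, size, orig);

	CPU_FREE(orig);
	return ret;
}
```

This is the pairing the series applies to "orig_mask": saved with the sized
getaffinity call, restored via bind_to_cpumask(), then freed.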

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
---
 tools/perf/bench/numa.c | 97 ++++++++++++++++++++++++++++++-----------
 1 file changed, 71 insertions(+), 26 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index f2640179ada9..29e41e32bd88 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -54,7 +54,7 @@
 
 struct thread_data {
 	int			curr_cpu;
-	cpu_set_t		bind_cpumask;
+	cpu_set_t		*bind_cpumask;
 	int			bind_node;
 	u8			*process_data;
 	int			process_nr;
@@ -266,46 +266,71 @@ static bool node_has_cpus(int node)
 	return ret;
 }
 
-static cpu_set_t bind_to_cpu(int target_cpu)
+static cpu_set_t *bind_to_cpu(int target_cpu)
 {
-	cpu_set_t orig_mask, mask;
+	int nrcpus = numa_num_possible_cpus();
+	cpu_set_t *orig_mask, *mask;
+	size_t size;
 	int ret;
 
-	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
-	BUG_ON(ret);
+	orig_mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!orig_mask);
+	size = CPU_ALLOC_SIZE(nrcpus);
+	CPU_ZERO_S(size, orig_mask);
+
+	ret = sched_getaffinity(0, size, orig_mask);
+	if (ret) {
+		CPU_FREE(orig_mask);
+		BUG_ON(ret);
+	}
 
-	CPU_ZERO(&mask);
+	mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!mask);
+	CPU_ZERO_S(size, mask);
 
 	if (target_cpu == -1) {
 		int cpu;
 
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &mask);
+			CPU_SET_S(cpu, size, mask);
 	} else {
 		BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
-		CPU_SET(target_cpu, &mask);
+		CPU_SET_S(target_cpu, size, mask);
 	}
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
+	ret = sched_setaffinity(0, size, mask);
+	CPU_FREE(mask);
 	BUG_ON(ret);
 
 	return orig_mask;
 }
 
-static cpu_set_t bind_to_node(int target_node)
+static cpu_set_t *bind_to_node(int target_node)
 {
-	cpu_set_t orig_mask, mask;
+	int nrcpus = numa_num_possible_cpus();
+	cpu_set_t *orig_mask, *mask;
+	size_t size;
 	int cpu;
 	int ret;
 
-	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
-	BUG_ON(ret);
+	orig_mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!orig_mask);
+	size = CPU_ALLOC_SIZE(nrcpus);
+	CPU_ZERO_S(size, orig_mask);
+
+	ret = sched_getaffinity(0, size, orig_mask);
+	if (ret) {
+		CPU_FREE(orig_mask);
+		BUG_ON(ret);
+	}
 
-	CPU_ZERO(&mask);
+	mask = CPU_ALLOC(nrcpus);
+	BUG_ON(!mask);
+	CPU_ZERO_S(size, mask);
 
 	if (target_node == NUMA_NO_NODE) {
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &mask);
+			CPU_SET_S(cpu, size, mask);
 	} else {
 		struct bitmask *cpumask = numa_allocate_cpumask();
 
@@ -313,24 +338,29 @@ static cpu_set_t bind_to_node(int target_node)
 		if (!numa_node_to_cpus(target_node, cpumask)) {
 			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
 				if (numa_bitmask_isbitset(cpumask, cpu))
-					CPU_SET(cpu, &mask);
+					CPU_SET_S(cpu, size, mask);
 			}
 		}
 		numa_free_cpumask(cpumask);
 	}
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
+	ret = sched_setaffinity(0, size, mask);
+	CPU_FREE(mask);
 	BUG_ON(ret);
 
 	return orig_mask;
 }
 
-static void bind_to_cpumask(cpu_set_t mask)
+static void bind_to_cpumask(cpu_set_t *mask)
 {
 	int ret;
+	size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
 
-	ret = sched_setaffinity(0, sizeof(mask), &mask);
-	BUG_ON(ret);
+	ret = sched_setaffinity(0, size, mask);
+	if (ret) {
+		CPU_FREE(mask);
+		BUG_ON(ret);
+	}
 }
 
 static void mempol_restore(void)
@@ -376,7 +406,7 @@ do {							\
 static u8 *alloc_data(ssize_t bytes0, int map_flags,
 		      int init_zero, int init_cpu0, int thp, int init_random)
 {
-	cpu_set_t orig_mask;
+	cpu_set_t *orig_mask;
 	ssize_t bytes;
 	u8 *buf;
 	int ret;
@@ -434,6 +464,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
 	/* Restore affinity: */
 	if (init_cpu0) {
 		bind_to_cpumask(orig_mask);
+		CPU_FREE(orig_mask);
 		mempol_restore();
 	}
 
@@ -589,6 +620,7 @@ static int parse_setup_cpu_list(void)
 		BUG_ON(bind_cpu_0 > bind_cpu_1);
 
 		for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
+			size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
 			int i;
 
 			for (i = 0; i < mul; i++) {
@@ -608,10 +640,12 @@ static int parse_setup_cpu_list(void)
 					tprintf("%2d", bind_cpu);
 				}
 
-				CPU_ZERO(&td->bind_cpumask);
+				td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+				BUG_ON(!td->bind_cpumask);
+				CPU_ZERO_S(size, td->bind_cpumask);
 				for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
 					BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
-					CPU_SET(cpu, &td->bind_cpumask);
+					CPU_SET_S(cpu, size, td->bind_cpumask);
 				}
 				t++;
 			}
@@ -1241,7 +1275,7 @@ static void *worker_thread(void *__tdata)
 		 * by migrating to CPU#0:
 		 */
 		if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
-			cpu_set_t orig_mask;
+			cpu_set_t *orig_mask;
 			int target_cpu;
 			int this_cpu;
 
@@ -1265,6 +1299,7 @@ static void *worker_thread(void *__tdata)
 				printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
 
 			bind_to_cpumask(orig_mask);
+			CPU_FREE(orig_mask);
 		}
 
 		if (details >= 3) {
@@ -1398,21 +1433,31 @@ static void init_thread_data(void)
 
 	for (t = 0; t < g->p.nr_tasks; t++) {
 		struct thread_data *td = g->threads + t;
+		size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
 		int cpu;
 
 		/* Allow all nodes by default: */
 		td->bind_node = NUMA_NO_NODE;
 
 		/* Allow all CPUs by default: */
-		CPU_ZERO(&td->bind_cpumask);
+		td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
+		BUG_ON(!td->bind_cpumask);
+		CPU_ZERO_S(cpuset_size, td->bind_cpumask);
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
-			CPU_SET(cpu, &td->bind_cpumask);
+			CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
 	}
 }
 
 static void deinit_thread_data(void)
 {
 	ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
+	int t;
+
+	/* Free the bind_cpumask allocated for thread_data */
+	for (t = 0; t < g->p.nr_tasks; t++) {
+		struct thread_data *td = g->threads + t;
+		CPU_FREE(td->bind_cpumask);
+	}
 
 	free_data(g->threads, size);
 }
-- 
2.35.1



* [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
  2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
                   ` (2 preceding siblings ...)
  2022-04-06 17:51 ` [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix " Athira Rajeev
@ 2022-04-06 17:51 ` Athira Rajeev
  2022-04-08 12:26   ` Srikar Dronamraju
  2022-04-09 15:20   ` Arnaldo Carvalho de Melo
  2022-04-07  0:35 ` [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Ian Rogers
  2022-04-09 15:28 ` Arnaldo Carvalho de Melo
  5 siblings, 2 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-06 17:51 UTC (permalink / raw)
  To: acme, jolsa, disgoel
  Cc: mpe, linux-perf-users, linuxppc-dev, maddy, rnsastry, kjain,
	linux-kernel, srikar, irogers

The perf numa bench test fails with an error:

Testcase:
./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
--thp  1 --no-data_rand_walk

Failure snippet:
<<>>
 Running 'numa/mem' benchmark:

 # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
-M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>

The testcase uses CPUs 0 and 8. In the function "parse_setup_cpu_list",
there is a check to see whether the CPU number is greater than the
maximum number of CPUs possible in the system, i.e. via
"if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >= g->p.nr_cpus) {".
But it could happen that the system has, say, 48 CPUs while only
CPUs 0-7 are online and the rest are offline. Since "g->p.nr_cpus" is
48, the function goes ahead and sets the bit for CPU 8 in the cpumask
(td->bind_cpumask).

The bind_to_cpumask function is called to set affinity using
sched_setaffinity and the cpumask. Since CPU 8 is not present, setting
affinity fails there with EINVAL. Fix this issue by adding a check to
make sure that the CPUs provided in the input argument are online
before proceeding further, and skip the test otherwise. For this,
include a new helper function "is_cpu_online" in
"tools/perf/util/header.c".

Since the "BIT(x)" definition will get included from header.h, remove
it from bench/numa.c.

Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
---
 tools/perf/bench/numa.c  |  8 ++++++--
 tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/header.h |  1 +
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 29e41e32bd88..7992d79b3e41 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -34,6 +34,7 @@
 #include <linux/numa.h>
 #include <linux/zalloc.h>
 
+#include "../util/header.h"
 #include <numa.h>
 #include <numaif.h>
 
@@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
 			return -1;
 		}
 
+		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
+			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
+			return -1;
+		}
+
 		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
 		BUG_ON(bind_cpu_0 > bind_cpu_1);
 
@@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
 	return parse_node_list(arg);
 }
 
-#define BIT(x) (1ul << x)
-
 static inline uint32_t lfsr_32(uint32_t lfsr)
 {
 	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 6da12e522edc..3f5fcf5d4b3f 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
 	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
 }
 
+#define SYSFS "/sys/devices/system/cpu/"
+
+/*
+ * Check whether a CPU is online
+ *
+ * Returns:
+ *     1 -> if CPU is online
+ *     0 -> if CPU is offline
+ *    -1 -> error case
+ */
+int is_cpu_online(unsigned int cpu)
+{
+	char sysfs_cpu[255];
+	char buf[255];
+	struct stat statbuf;
+	size_t len;
+	int fd;
+
+	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
+
+	if (stat(sysfs_cpu, &statbuf) != 0)
+		return 0;
+
+	/*
+	 * Check if /sys/devices/system/cpu/cpux/online file
+	 * exists. In kernels without CONFIG_HOTPLUG_CPU, this
+	 * file won't exist.
+	 */
+	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
+	if (stat(sysfs_cpu, &statbuf) != 0)
+		return 1;
+
+	fd = open(sysfs_cpu, O_RDONLY);
+	if (fd == -1)
+		return -1;
+
+	len = read(fd, buf, sizeof(buf) - 1);
+	buf[len] = '\0';
+	close(fd);
+
+	return strtoul(buf, NULL, 16);
+}
+
 #ifdef HAVE_LIBBPF_SUPPORT
 static int write_bpf_prog_info(struct feat_fd *ff,
 			       struct evlist *evlist __maybe_unused)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index c9e3265832d9..0eb4bc29a5a4 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
 int write_padded(struct feat_fd *fd, const void *bf,
 		 size_t count, size_t count_aligned);
 
+int is_cpu_online(unsigned int cpu);
 /*
  * arch specific callback
  */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
  2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
                   ` (3 preceding siblings ...)
  2022-04-06 17:51 ` [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
@ 2022-04-07  0:35 ` Ian Rogers
  2022-04-07  4:31   ` Athira Rajeev
  2022-04-09 15:28 ` Arnaldo Carvalho de Melo
  5 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2022-04-07  0:35 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, srikar

On Wed, Apr 6, 2022 at 10:51 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> The perf benchmarks for the numa, futex and epoll collections
> fail on system configurations with more than 1024 CPUs.
> These benchmarks use "sched_getaffinity" and "sched_setaffinity"
> in the code to work with affinity.
>
> Example snippet from numa benchmark:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
>
> The bind_to_node function uses "sched_getaffinity" to save the cpumask.
> This fails with EINVAL because the default mask size in glibc is 1024.
>
> Similarly, the futex and epoll benchmarks use sched_setaffinity during
> pthread_create with affinity. Since it returns EINVAL in such system
> configurations, the benchmarks don't run.
>
> To overcome this 1024-CPU mask size limitation of cpu_set_t,
> change the mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to
> allocate the cpumask, CPU_ALLOC_SIZE for its size, and CPU_SET_S to
> set mask bits.
>
> Fix all the relevant places in the code to use a mask size that is
> large enough to represent the number of possible CPUs in the system.
>
> Fix the parse_setup_cpu_list function in the numa bench to check that
> the input CPU is online before binding a task to it. This fixes
> failures where, although the CPU number is within the maximum, the CPU
> may be offline; there, sched_setaffinity will fail when using a
> cpumask with that CPU's bit set.
>
> Patch 1 and Patch 2 address the fix for the perf bench futex and perf
> bench epoll benchmarks. Patch 3 and Patch 4 address the fix in the
> perf bench numa benchmark.
>
> Athira Rajeev (4):
>   tools/perf: Fix perf bench futex to correct usage of affinity for
>     machines with #CPUs > 1K
>   tools/perf: Fix perf bench epoll to correct usage of affinity for
>     machines with #CPUs > 1K
>   tools/perf: Fix perf numa bench to fix usage of affinity for machines
>     with #CPUs > 1K
>   tools/perf: Fix perf bench numa testcase to check if CPU used to bind
>     task is online
>
> Changelog:
> From v1 -> v2:
>  Addressed review comment from Ian Rogers to do
>  CPU_FREE in a cleaner way.
>  Added Tested-by from Disha Goel


The whole set:
Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>  tools/perf/bench/epoll-ctl.c           |  25 ++++--
>  tools/perf/bench/epoll-wait.c          |  25 ++++--
>  tools/perf/bench/futex-hash.c          |  26 ++++--
>  tools/perf/bench/futex-lock-pi.c       |  21 +++--
>  tools/perf/bench/futex-requeue.c       |  21 +++--
>  tools/perf/bench/futex-wake-parallel.c |  21 +++--
>  tools/perf/bench/futex-wake.c          |  22 ++++--
>  tools/perf/bench/numa.c                | 105 ++++++++++++++++++-------
>  tools/perf/util/header.c               |  43 ++++++++++
>  tools/perf/util/header.h               |   1 +
>  10 files changed, 242 insertions(+), 68 deletions(-)
>
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
  2022-04-07  0:35 ` [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Ian Rogers
@ 2022-04-07  4:31   ` Athira Rajeev
  0 siblings, 0 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-07  4:31 UTC (permalink / raw)
  To: Ian Rogers
  Cc: maddy, Srikar Dronamraju, Nageswara Sastry,
	Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	linux-perf-users, jolsa, kjain, disgoel, linuxppc-dev



> On 07-Apr-2022, at 6:05 AM, Ian Rogers <irogers@google.com> wrote:
> 
> On Wed, Apr 6, 2022 at 10:51 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>> 
>> The perf benchmarks for the numa, futex and epoll collections
>> fail on system configurations with more than 1024 CPUs.
>> These benchmarks use "sched_getaffinity" and "sched_setaffinity"
>> in the code to work with affinity.
>> 
>> Example snippet from numa benchmark:
>> <<>>
>> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
>> Aborted (core dumped)
>> <<>>
>> 
>> The bind_to_node function uses "sched_getaffinity" to save the cpumask.
>> This fails with EINVAL because the default mask size in glibc is 1024.
>>
>> Similarly, the futex and epoll benchmarks use sched_setaffinity during
>> pthread_create with affinity. Since it returns EINVAL in such system
>> configurations, the benchmarks don't run.
>> 
>> To overcome this 1024-CPU mask size limitation of cpu_set_t,
>> change the mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to
>> allocate the cpumask, CPU_ALLOC_SIZE for its size, and CPU_SET_S to
>> set mask bits.
>>
>> Fix all the relevant places in the code to use a mask size that is
>> large enough to represent the number of possible CPUs in the system.
>>
>> Fix the parse_setup_cpu_list function in the numa bench to check that
>> the input CPU is online before binding a task to it. This fixes
>> failures where, although the CPU number is within the maximum, the CPU
>> may be offline; there, sched_setaffinity will fail when using a
>> cpumask with that CPU's bit set.
>> 
>> Patch 1 and Patch 2 address the fix for the perf bench futex and perf
>> bench epoll benchmarks. Patch 3 and Patch 4 address the fix in the
>> perf bench numa benchmark.
>> 
>> Athira Rajeev (4):
>>  tools/perf: Fix perf bench futex to correct usage of affinity for
>>    machines with #CPUs > 1K
>>  tools/perf: Fix perf bench epoll to correct usage of affinity for
>>    machines with #CPUs > 1K
>>  tools/perf: Fix perf numa bench to fix usage of affinity for machines
>>    with #CPUs > 1K
>>  tools/perf: Fix perf bench numa testcase to check if CPU used to bind
>>    task is online
>> 
>> Changelog:
>> From v1 -> v2:
>> Addressed review comment from Ian Rogers to do
>> CPU_FREE in a cleaner way.
>> Added Tested-by from Disha Goel
> 
> 
> The whole set:
> Acked-by: Ian Rogers <irogers@google.com>

Thanks for checking Ian.

Athira.
> 
> Thanks,
> Ian
> 
>> tools/perf/bench/epoll-ctl.c           |  25 ++++--
>> tools/perf/bench/epoll-wait.c          |  25 ++++--
>> tools/perf/bench/futex-hash.c          |  26 ++++--
>> tools/perf/bench/futex-lock-pi.c       |  21 +++--
>> tools/perf/bench/futex-requeue.c       |  21 +++--
>> tools/perf/bench/futex-wake-parallel.c |  21 +++--
>> tools/perf/bench/futex-wake.c          |  22 ++++--
>> tools/perf/bench/numa.c                | 105 ++++++++++++++++++-------
>> tools/perf/util/header.c               |  43 ++++++++++
>> tools/perf/util/header.h               |   1 +
>> 10 files changed, 242 insertions(+), 68 deletions(-)
>> 
>> --
>> 2.35.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
  2022-04-06 17:51 ` [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
@ 2022-04-08 12:26   ` Srikar Dronamraju
  2022-04-09  6:29     ` Athira Rajeev
  2022-04-09 15:20   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 16+ messages in thread
From: Srikar Dronamraju @ 2022-04-08 12:26 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, irogers

* Athira Rajeev <atrajeev@linux.vnet.ibm.com> [2022-04-06 23:21:13]:

> Perf numa bench test fails with error:
> 
> Testcase:
> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
> --thp  1 --no-data_rand_walk
> 
> Failure snippet:
> <<>>
>  Running 'numa/mem' benchmark:
> 
>  # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
> 
> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
> <<>>
> 
> The testcase uses CPUs 0 and 8. In function "parse_setup_cpu_list",
> there is a check to see if the CPU number is greater than the maximum
> number of possible CPUs in the system, i.e. via "if (bind_cpu_0 >=
> g->p.nr_cpus || bind_cpu_1 >= g->p.nr_cpus) {". But it could happen
> that the system has, say, 48 CPUs while only CPUs 0-7 are online and
> the others are offlined. Since "g->p.nr_cpus" is 48, the function will
> go ahead and set the bit for CPU 8 as well in the cpumask
> (td->bind_cpumask).
>
> The bind_to_cpumask function is called to set affinity using
> sched_setaffinity and the cpumask. Since CPU 8 is not present, setting
> affinity will fail here with EINVAL. Fix this issue by adding a check
> to make sure that the CPUs provided in the input argument are online
> before proceeding further, and skip the test otherwise. For this,
> include the new helper function "is_cpu_online" in
> "tools/perf/util/header.c".
> 
> Since "BIT(x)" definition will get included from header.h, remove
> that from bench/numa.c
> 
> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>

Looks good to me.
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> ---
>  tools/perf/bench/numa.c  |  8 ++++++--
>  tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/header.h |  1 +
>  3 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 29e41e32bd88..7992d79b3e41 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -34,6 +34,7 @@
>  #include <linux/numa.h>
>  #include <linux/zalloc.h>
> 
> +#include "../util/header.h"
>  #include <numa.h>
>  #include <numaif.h>
> 
> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>  			return -1;
>  		}
> 
> +		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
> +			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
> +			return -1;
> +		}
> +
>  		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>  		BUG_ON(bind_cpu_0 > bind_cpu_1);
> 
> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
>  	return parse_node_list(arg);
>  }
> 
> -#define BIT(x) (1ul << x)
> -
>  static inline uint32_t lfsr_32(uint32_t lfsr)
>  {
>  	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 6da12e522edc..3f5fcf5d4b3f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>  	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
>  }
> 
> +#define SYSFS "/sys/devices/system/cpu/"
> +
> +/*
> + * Check whether a CPU is online
> + *
> + * Returns:
> + *     1 -> if CPU is online
> + *     0 -> if CPU is offline
> + *    -1 -> error case
> + */
> +int is_cpu_online(unsigned int cpu)
> +{
> +	char sysfs_cpu[255];
> +	char buf[255];
> +	struct stat statbuf;
> +	size_t len;
> +	int fd;
> +
> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
> +
> +	if (stat(sysfs_cpu, &statbuf) != 0)
> +		return 0;
> +
> +	/*
> +	 * Check if /sys/devices/system/cpu/cpux/online file
> +	 * exists. In kernels without CONFIG_HOTPLUG_CPU, this
> +	 * file won't exist.
> +	 */
> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
> +	if (stat(sysfs_cpu, &statbuf) != 0)
> +		return 1;
> +
> +	fd = open(sysfs_cpu, O_RDONLY);
> +	if (fd == -1)
> +		return -1;
> +
> +	len = read(fd, buf, sizeof(buf) - 1);
> +	buf[len] = '\0';
> +	close(fd);
> +
> +	return strtoul(buf, NULL, 16);
> +}
> +
>  #ifdef HAVE_LIBBPF_SUPPORT
>  static int write_bpf_prog_info(struct feat_fd *ff,
>  			       struct evlist *evlist __maybe_unused)
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index c9e3265832d9..0eb4bc29a5a4 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
>  int write_padded(struct feat_fd *fd, const void *bf,
>  		 size_t count, size_t count_aligned);
> 
> +int is_cpu_online(unsigned int cpu);
>  /*
>   * arch specific callback
>   */
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with #CPUs > 1K
  2022-04-06 17:51 ` [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with " Athira Rajeev
@ 2022-04-08 12:27   ` Srikar Dronamraju
  0 siblings, 0 replies; 16+ messages in thread
From: Srikar Dronamraju @ 2022-04-08 12:27 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, irogers

* Athira Rajeev <atrajeev@linux.vnet.ibm.com> [2022-04-06 23:21:10]:

> The perf bench futex testcase fails on systems with more
> than 1K CPUs.
> 
> Testcase: perf bench futex all
> Failure snippet:
> <<>>Running futex/hash benchmark...
> 
> perf: pthread_create: No such file or directory
> <<>>
> 
> In all the futex benchmarks (i.e. hash, lock-api, requeue, wake,
> wake-parallel), pthread_create is invoked in the respective
> bench_futex_* function. Though the logs show a direct failure from
> pthread_create, strace logs showed that the actual failure is from
> "sched_setaffinity" returning EINVAL (invalid argument). This happens
> because the default mask size in glibc is 1024. To overcome this
> 1024-CPU mask size limitation of cpu_set_t, change the mask size using
> the CPU_*_S macros.
>
> The patch addresses this by fixing all the futex benchmarks to use
> CPU_ALLOC to allocate the cpumask, CPU_ALLOC_SIZE for its size, and
> CPU_SET_S to set mask bits.
> 
> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>

Looks good to me
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


> ---
>  tools/perf/bench/futex-hash.c          | 26 +++++++++++++++++++-------
>  tools/perf/bench/futex-lock-pi.c       | 21 ++++++++++++++++-----
>  tools/perf/bench/futex-requeue.c       | 21 ++++++++++++++++-----
>  tools/perf/bench/futex-wake-parallel.c | 21 ++++++++++++++++-----
>  tools/perf/bench/futex-wake.c          | 22 ++++++++++++++++------
>  5 files changed, 83 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
> index 9627b6ab8670..dfce64e551e2 100644
> --- a/tools/perf/bench/futex-hash.c
> +++ b/tools/perf/bench/futex-hash.c
> @@ -122,12 +122,14 @@ static void print_summary(void)
>  int bench_futex_hash(int argc, const char **argv)
>  {
>  	int ret = 0;
> -	cpu_set_t cpuset;
> +	cpu_set_t *cpuset;
>  	struct sigaction act;
>  	unsigned int i;
>  	pthread_attr_t thread_attr;
>  	struct worker *worker = NULL;
>  	struct perf_cpu_map *cpu;
> +	int nrcpus;
> +	size_t size;
> 
>  	argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
>  	if (argc) {
> @@ -170,25 +172,35 @@ int bench_futex_hash(int argc, const char **argv)
>  	threads_starting = params.nthreads;
>  	pthread_attr_init(&thread_attr);
>  	gettimeofday(&bench__start, NULL);
> +
> +	nrcpus = perf_cpu_map__nr(cpu);
> +	cpuset = CPU_ALLOC(nrcpus);
> +	BUG_ON(!cpuset);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +
>  	for (i = 0; i < params.nthreads; i++) {
>  		worker[i].tid = i;
>  		worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
>  		if (!worker[i].futex)
>  			goto errmem;
> 
> -		CPU_ZERO(&cpuset);
> -		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +		CPU_ZERO_S(size, cpuset);
> 
> -		ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset);
> -		if (ret)
> +		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> +		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
> +		if (ret) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> -
> +		}
>  		ret = pthread_create(&worker[i].thread, &thread_attr, workerfn,
>  				     (void *)(struct worker *) &worker[i]);
> -		if (ret)
> +		if (ret) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_create");
> +		}
> 
>  	}
> +	CPU_FREE(cpuset);
>  	pthread_attr_destroy(&thread_attr);
> 
>  	pthread_mutex_lock(&thread_lock);
> diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c
> index a512a320df74..61c3bb80d4cf 100644
> --- a/tools/perf/bench/futex-lock-pi.c
> +++ b/tools/perf/bench/futex-lock-pi.c
> @@ -120,11 +120,17 @@ static void *workerfn(void *arg)
>  static void create_threads(struct worker *w, pthread_attr_t thread_attr,
>  			   struct perf_cpu_map *cpu)
>  {
> -	cpu_set_t cpuset;
> +	cpu_set_t *cpuset;
>  	unsigned int i;
> +	int nrcpus =  perf_cpu_map__nr(cpu);
> +	size_t size;
> 
>  	threads_starting = params.nthreads;
> 
> +	cpuset = CPU_ALLOC(nrcpus);
> +	BUG_ON(!cpuset);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +
>  	for (i = 0; i < params.nthreads; i++) {
>  		worker[i].tid = i;
> 
> @@ -135,15 +141,20 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr,
>  		} else
>  			worker[i].futex = &global_futex;
> 
> -		CPU_ZERO(&cpuset);
> -		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +		CPU_ZERO_S(size, cpuset);
> +		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> 
> -		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
> +		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> +		}
> 
> -		if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i]))
> +		if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i])) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_create");
> +		}
>  	}
> +	CPU_FREE(cpuset);
>  }
> 
>  int bench_futex_lock_pi(int argc, const char **argv)
> diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c
> index aca47ce8b1e7..2cb013f7ffe5 100644
> --- a/tools/perf/bench/futex-requeue.c
> +++ b/tools/perf/bench/futex-requeue.c
> @@ -123,22 +123,33 @@ static void *workerfn(void *arg __maybe_unused)
>  static void block_threads(pthread_t *w,
>  			  pthread_attr_t thread_attr, struct perf_cpu_map *cpu)
>  {
> -	cpu_set_t cpuset;
> +	cpu_set_t *cpuset;
>  	unsigned int i;
> +	int nrcpus = perf_cpu_map__nr(cpu);
> +	size_t size;
> 
>  	threads_starting = params.nthreads;
> 
> +	cpuset = CPU_ALLOC(nrcpus);
> +	BUG_ON(!cpuset);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +
>  	/* create and block all threads */
>  	for (i = 0; i < params.nthreads; i++) {
> -		CPU_ZERO(&cpuset);
> -		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +		CPU_ZERO_S(size, cpuset);
> +		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> 
> -		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
> +		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> +		}
> 
> -		if (pthread_create(&w[i], &thread_attr, workerfn, NULL))
> +		if (pthread_create(&w[i], &thread_attr, workerfn, NULL)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_create");
> +		}
>  	}
> +	CPU_FREE(cpuset);
>  }
> 
>  static void toggle_done(int sig __maybe_unused,
> diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
> index 888ee6037945..efa5070a5eb3 100644
> --- a/tools/perf/bench/futex-wake-parallel.c
> +++ b/tools/perf/bench/futex-wake-parallel.c
> @@ -144,22 +144,33 @@ static void *blocked_workerfn(void *arg __maybe_unused)
>  static void block_threads(pthread_t *w, pthread_attr_t thread_attr,
>  			  struct perf_cpu_map *cpu)
>  {
> -	cpu_set_t cpuset;
> +	cpu_set_t *cpuset;
>  	unsigned int i;
> +	int nrcpus = perf_cpu_map__nr(cpu);
> +	size_t size;
> 
>  	threads_starting = params.nthreads;
> 
> +	cpuset = CPU_ALLOC(nrcpus);
> +	BUG_ON(!cpuset);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +
>  	/* create and block all threads */
>  	for (i = 0; i < params.nthreads; i++) {
> -		CPU_ZERO(&cpuset);
> -		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +		CPU_ZERO_S(size, cpuset);
> +		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> 
> -		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
> +		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> +		}
> 
> -		if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL))
> +		if (pthread_create(&w[i], &thread_attr, blocked_workerfn, NULL)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_create");
> +		}
>  	}
> +	CPU_FREE(cpuset);
>  }
> 
>  static void print_run(struct thread_data *waking_worker, unsigned int run_num)
> diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c
> index aa82db51c0ab..3a10f54900c1 100644
> --- a/tools/perf/bench/futex-wake.c
> +++ b/tools/perf/bench/futex-wake.c
> @@ -97,22 +97,32 @@ static void print_summary(void)
>  static void block_threads(pthread_t *w,
>  			  pthread_attr_t thread_attr, struct perf_cpu_map *cpu)
>  {
> -	cpu_set_t cpuset;
> +	cpu_set_t *cpuset;
>  	unsigned int i;
> -
> +	size_t size;
> +	int nrcpus = perf_cpu_map__nr(cpu);
>  	threads_starting = params.nthreads;
> 
> +	cpuset = CPU_ALLOC(nrcpus);
> +	BUG_ON(!cpuset);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +
>  	/* create and block all threads */
>  	for (i = 0; i < params.nthreads; i++) {
> -		CPU_ZERO(&cpuset);
> -		CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, &cpuset);
> +		CPU_ZERO_S(size, cpuset);
> +		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> 
> -		if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpuset))
> +		if (pthread_attr_setaffinity_np(&thread_attr, size, cpuset)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> +		}
> 
> -		if (pthread_create(&w[i], &thread_attr, workerfn, NULL))
> +		if (pthread_create(&w[i], &thread_attr, workerfn, NULL)) {
> +			CPU_FREE(cpuset);
>  			err(EXIT_FAILURE, "pthread_create");
> +		}
>  	}
> +	CPU_FREE(cpuset);
>  }
> 
>  static void toggle_done(int sig __maybe_unused,
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix usage of affinity for machines with #CPUs > 1K
  2022-04-06 17:51 ` [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix " Athira Rajeev
@ 2022-04-08 12:28   ` Srikar Dronamraju
  0 siblings, 0 replies; 16+ messages in thread
From: Srikar Dronamraju @ 2022-04-08 12:28 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: acme, jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, irogers

* Athira Rajeev <atrajeev@linux.vnet.ibm.com> [2022-04-06 23:21:12]:

> The perf bench numa testcase fails on systems with more
> than 1K CPUs.
> 
> Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1
> Failure snippet:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
> 
> The bind_to_node function uses "sched_getaffinity" to save the
> original cpumask, and this call returns EINVAL (invalid argument).
> This happens because the default mask size in glibc is 1024.
> To overcome this 1024-CPU mask size limitation of cpu_set_t,
> change the mask size using the CPU_*_S macros, i.e. use CPU_ALLOC to
> allocate the cpumask and CPU_ALLOC_SIZE for its size. Apart from
> fixing this for "orig_mask", apply the same logic to "mask" as well,
> which is used for sched_setaffinity, so that the mask size is large
> enough to represent the number of possible CPUs in the system.
>
> sched_getaffinity is used in one more place in perf numa bench: the
> "bind_to_cpu" function. Apply the same logic there also. Though no
> failure is currently reported from there, it is ideal to make
> getaffinity work with system configurations having more CPUs than the
> default mask size supported by glibc.
>
> Also fix "sched_setaffinity" to use a mask size that is large enough
> to represent the number of possible CPUs in the system.
>
> Fix all places where "bind_cpumask", which is part of "struct
> thread_data", is used, so that bind_cpumask works in all configurations.
> 
> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> ---
>  tools/perf/bench/numa.c | 97 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 71 insertions(+), 26 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index f2640179ada9..29e41e32bd88 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -54,7 +54,7 @@
> 
>  struct thread_data {
>  	int			curr_cpu;
> -	cpu_set_t		bind_cpumask;
> +	cpu_set_t		*bind_cpumask;
>  	int			bind_node;
>  	u8			*process_data;
>  	int			process_nr;
> @@ -266,46 +266,71 @@ static bool node_has_cpus(int node)
>  	return ret;
>  }
> 
> -static cpu_set_t bind_to_cpu(int target_cpu)
> +static cpu_set_t *bind_to_cpu(int target_cpu)
>  {
> -	cpu_set_t orig_mask, mask;
> +	int nrcpus = numa_num_possible_cpus();
> +	cpu_set_t *orig_mask, *mask;
> +	size_t size;
>  	int ret;
> 
> -	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> -	BUG_ON(ret);
> +	orig_mask = CPU_ALLOC(nrcpus);
> +	BUG_ON(!orig_mask);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +	CPU_ZERO_S(size, orig_mask);
> +
> +	ret = sched_getaffinity(0, size, orig_mask);
> +	if (ret) {
> +		CPU_FREE(orig_mask);
> +		BUG_ON(ret);
> +	}
> 
> -	CPU_ZERO(&mask);
> +	mask = CPU_ALLOC(nrcpus);
> +	BUG_ON(!mask);
> +	CPU_ZERO_S(size, mask);
> 
>  	if (target_cpu == -1) {
>  		int cpu;
> 
>  		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> -			CPU_SET(cpu, &mask);
> +			CPU_SET_S(cpu, size, mask);
>  	} else {
>  		BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
> -		CPU_SET(target_cpu, &mask);
> +		CPU_SET_S(target_cpu, size, mask);
>  	}
> 
> -	ret = sched_setaffinity(0, sizeof(mask), &mask);
> +	ret = sched_setaffinity(0, size, mask);
> +	CPU_FREE(mask);
>  	BUG_ON(ret);
> 
>  	return orig_mask;
>  }
> 
> -static cpu_set_t bind_to_node(int target_node)
> +static cpu_set_t *bind_to_node(int target_node)
>  {
> -	cpu_set_t orig_mask, mask;
> +	int nrcpus = numa_num_possible_cpus();
> +	cpu_set_t *orig_mask, *mask;
> +	size_t size;
>  	int cpu;
>  	int ret;
> 
> -	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> -	BUG_ON(ret);
> +	orig_mask = CPU_ALLOC(nrcpus);
> +	BUG_ON(!orig_mask);
> +	size = CPU_ALLOC_SIZE(nrcpus);
> +	CPU_ZERO_S(size, orig_mask);
> +
> +	ret = sched_getaffinity(0, size, orig_mask);
> +	if (ret) {
> +		CPU_FREE(orig_mask);
> +		BUG_ON(ret);
> +	}
> 
> -	CPU_ZERO(&mask);
> +	mask = CPU_ALLOC(nrcpus);
> +	BUG_ON(!mask);
> +	CPU_ZERO_S(size, mask);
> 
>  	if (target_node == NUMA_NO_NODE) {
>  		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> -			CPU_SET(cpu, &mask);
> +			CPU_SET_S(cpu, size, mask);
>  	} else {
>  		struct bitmask *cpumask = numa_allocate_cpumask();
> 
> @@ -313,24 +338,29 @@ static cpu_set_t bind_to_node(int target_node)
>  		if (!numa_node_to_cpus(target_node, cpumask)) {
>  			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
>  				if (numa_bitmask_isbitset(cpumask, cpu))
> -					CPU_SET(cpu, &mask);
> +					CPU_SET_S(cpu, size, mask);
>  			}
>  		}
>  		numa_free_cpumask(cpumask);
>  	}
> 
> -	ret = sched_setaffinity(0, sizeof(mask), &mask);
> +	ret = sched_setaffinity(0, size, mask);
> +	CPU_FREE(mask);
>  	BUG_ON(ret);
> 
>  	return orig_mask;
>  }
> 
> -static void bind_to_cpumask(cpu_set_t mask)
> +static void bind_to_cpumask(cpu_set_t *mask)
>  {
>  	int ret;
> +	size_t size = CPU_ALLOC_SIZE(numa_num_possible_cpus());
> 
> -	ret = sched_setaffinity(0, sizeof(mask), &mask);
> -	BUG_ON(ret);
> +	ret = sched_setaffinity(0, size, mask);
> +	if (ret) {
> +		CPU_FREE(mask);
> +		BUG_ON(ret);
> +	}
>  }
> 
>  static void mempol_restore(void)
> @@ -376,7 +406,7 @@ do {							\
>  static u8 *alloc_data(ssize_t bytes0, int map_flags,
>  		      int init_zero, int init_cpu0, int thp, int init_random)
>  {
> -	cpu_set_t orig_mask;
> +	cpu_set_t *orig_mask;
>  	ssize_t bytes;
>  	u8 *buf;
>  	int ret;
> @@ -434,6 +464,7 @@ static u8 *alloc_data(ssize_t bytes0, int map_flags,
>  	/* Restore affinity: */
>  	if (init_cpu0) {
>  		bind_to_cpumask(orig_mask);
> +		CPU_FREE(orig_mask);
>  		mempol_restore();
>  	}
> 
> @@ -589,6 +620,7 @@ static int parse_setup_cpu_list(void)
>  		BUG_ON(bind_cpu_0 > bind_cpu_1);
> 
>  		for (bind_cpu = bind_cpu_0; bind_cpu <= bind_cpu_1; bind_cpu += step) {
> +			size_t size = CPU_ALLOC_SIZE(g->p.nr_cpus);
>  			int i;
> 
>  			for (i = 0; i < mul; i++) {
> @@ -608,10 +640,12 @@ static int parse_setup_cpu_list(void)
>  					tprintf("%2d", bind_cpu);
>  				}
> 
> -				CPU_ZERO(&td->bind_cpumask);
> +				td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
> +				BUG_ON(!td->bind_cpumask);
> +				CPU_ZERO_S(size, td->bind_cpumask);
>  				for (cpu = bind_cpu; cpu < bind_cpu+bind_len; cpu++) {
>  					BUG_ON(cpu < 0 || cpu >= g->p.nr_cpus);
> -					CPU_SET(cpu, &td->bind_cpumask);
> +					CPU_SET_S(cpu, size, td->bind_cpumask);
>  				}
>  				t++;
>  			}
> @@ -1241,7 +1275,7 @@ static void *worker_thread(void *__tdata)
>  		 * by migrating to CPU#0:
>  		 */
>  		if (first_task && g->p.perturb_secs && (int)(stop.tv_sec - last_perturbance) >= g->p.perturb_secs) {
> -			cpu_set_t orig_mask;
> +			cpu_set_t *orig_mask;
>  			int target_cpu;
>  			int this_cpu;
> 
> @@ -1265,6 +1299,7 @@ static void *worker_thread(void *__tdata)
>  				printf(" (injecting perturbalance, moved to CPU#%d)\n", target_cpu);
> 
>  			bind_to_cpumask(orig_mask);
> +			CPU_FREE(orig_mask);
>  		}
> 
>  		if (details >= 3) {
> @@ -1398,21 +1433,31 @@ static void init_thread_data(void)
> 
>  	for (t = 0; t < g->p.nr_tasks; t++) {
>  		struct thread_data *td = g->threads + t;
> +		size_t cpuset_size = CPU_ALLOC_SIZE(g->p.nr_cpus);
>  		int cpu;
> 
>  		/* Allow all nodes by default: */
>  		td->bind_node = NUMA_NO_NODE;
> 
>  		/* Allow all CPUs by default: */
> -		CPU_ZERO(&td->bind_cpumask);
> +		td->bind_cpumask = CPU_ALLOC(g->p.nr_cpus);
> +		BUG_ON(!td->bind_cpumask);
> +		CPU_ZERO_S(cpuset_size, td->bind_cpumask);
>  		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> -			CPU_SET(cpu, &td->bind_cpumask);
> +			CPU_SET_S(cpu, cpuset_size, td->bind_cpumask);
>  	}
>  }
> 
>  static void deinit_thread_data(void)
>  {
>  	ssize_t size = sizeof(*g->threads)*g->p.nr_tasks;
> +	int t;
> +
> +	/* Free the bind_cpumask allocated for thread_data */
> +	for (t = 0; t < g->p.nr_tasks; t++) {
> +		struct thread_data *td = g->threads + t;
> +		CPU_FREE(td->bind_cpumask);
> +	}
> 
>  	free_data(g->threads, size);
>  }
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
  2022-04-08 12:26   ` Srikar Dronamraju
@ 2022-04-09  6:29     ` Athira Rajeev
  0 siblings, 0 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-09  6:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ian Rogers, maddy, Nageswara Sastry, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, linux-perf-users, jolsa, kjain,
	disgoel, linuxppc-dev



> On 08-Apr-2022, at 5:56 PM, Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:
> 
> * Athira Rajeev <atrajeev@linux.vnet.ibm.com> [2022-04-06 23:21:13]:
> 
>> Perf numa bench test fails with error:
>> 
>> Testcase:
>> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
>> --thp  1 --no-data_rand_walk
>> 
>> Failure snippet:
>> <<>>
>> Running 'numa/mem' benchmark:
>> 
>> # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
>> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
>> 
>> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
>> <<>>
>> 
>> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
>> There is check to see if cpu number is greater than max cpu’s possible
>> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
>> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
>> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
>> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
>> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
>> 
>> bind_to_cpumask function is called to set affinity using
>> sched_setaffinity and the cpumask. Since the CPU8 is not present,
>> set affinity will fail here with EINVAL. Fix this issue by adding a
>> check to make sure that, CPU’s provided in the input argument values
>> are online before proceeding further and skip the test. For this,
>> include new helper function "is_cpu_online" in
>> "tools/perf/util/header.c".
>> 
>> Since "BIT(x)" definition will get included from header.h, remove
>> that from bench/numa.c
>> 
>> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> 
> Looks good to me.
> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Hi Srikar,

Thanks for the review

Athira
> 
>> ---
>> tools/perf/bench/numa.c  |  8 ++++++--
>> tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
>> tools/perf/util/header.h |  1 +
>> 3 files changed, 50 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
>> index 29e41e32bd88..7992d79b3e41 100644
>> --- a/tools/perf/bench/numa.c
>> +++ b/tools/perf/bench/numa.c
>> @@ -34,6 +34,7 @@
>> #include <linux/numa.h>
>> #include <linux/zalloc.h>
>> 
>> +#include "../util/header.h"
>> #include <numa.h>
>> #include <numaif.h>
>> 
>> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>> 			return -1;
>> 		}
>> 
>> +		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
>> +			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
>> +			return -1;
>> +		}
>> +
>> 		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>> 		BUG_ON(bind_cpu_0 > bind_cpu_1);
>> 
>> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
>> 	return parse_node_list(arg);
>> }
>> 
>> -#define BIT(x) (1ul << x)
>> -
>> static inline uint32_t lfsr_32(uint32_t lfsr)
>> {
>> 	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index 6da12e522edc..3f5fcf5d4b3f 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>> 	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
>> }
>> 
>> +#define SYSFS "/sys/devices/system/cpu/"
>> +
>> +/*
>> + * Check whether a CPU is online
>> + *
>> + * Returns:
>> + *     1 -> if CPU is online
>> + *     0 -> if CPU is offline
>> + *    -1 -> error case
>> + */
>> +int is_cpu_online(unsigned int cpu)
>> +{
>> +	char sysfs_cpu[255];
>> +	char buf[255];
>> +	struct stat statbuf;
>> +	size_t len;
>> +	int fd;
>> +
>> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
>> +
>> +	if (stat(sysfs_cpu, &statbuf) != 0)
>> +		return 0;
>> +
>> +	/*
>> +	 * Check if /sys/devices/system/cpu/cpux/online file
>> +	 * exists. In kernels without CONFIG_HOTPLUG_CPU, this
>> +	 * file won't exist.
>> +	 */
>> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
>> +	if (stat(sysfs_cpu, &statbuf) != 0)
>> +		return 1;
>> +
>> +	fd = open(sysfs_cpu, O_RDONLY);
>> +	if (fd == -1)
>> +		return -1;
>> +
>> +	len = read(fd, buf, sizeof(buf) - 1);
>> +	buf[len] = '\0';
>> +	close(fd);
>> +
>> +	return strtoul(buf, NULL, 16);
>> +}
>> +
>> #ifdef HAVE_LIBBPF_SUPPORT
>> static int write_bpf_prog_info(struct feat_fd *ff,
>> 			       struct evlist *evlist __maybe_unused)
>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
>> index c9e3265832d9..0eb4bc29a5a4 100644
>> --- a/tools/perf/util/header.h
>> +++ b/tools/perf/util/header.h
>> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
>> int write_padded(struct feat_fd *fd, const void *bf,
>> 		 size_t count, size_t count_aligned);
>> 
>> +int is_cpu_online(unsigned int cpu);
>> /*
>>  * arch specific callback
>>  */
>> -- 
>> 2.35.1



* Re: [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
  2022-04-06 17:51 ` [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
  2022-04-08 12:26   ` Srikar Dronamraju
@ 2022-04-09 15:20   ` Arnaldo Carvalho de Melo
  2022-04-11 13:54     ` Athira Rajeev
  1 sibling, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-04-09 15:20 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, srikar, irogers

Em Wed, Apr 06, 2022 at 11:21:13PM +0530, Athira Rajeev escreveu:
> Perf numa bench test fails with error:
> 
> Testcase:
> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
> --thp  1 --no-data_rand_walk
> 
> Failure snippet:
> <<>>
>  Running 'numa/mem' benchmark:
> 
>  # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
> 
> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
> <<>>
> 
> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
> There is check to see if cpu number is greater than max cpu’s possible
> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
> 
> bind_to_cpumask function is called to set affinity using
> sched_setaffinity and the cpumask. Since the CPU8 is not present,
> set affinity will fail here with EINVAL. Fix this issue by adding a
> check to make sure that, CPU’s provided in the input argument values
> are online before proceeding further and skip the test. For this,
> include new helper function "is_cpu_online" in
> "tools/perf/util/header.c".
> 
> Since "BIT(x)" definition will get included from header.h, remove
> that from bench/numa.c
> 
> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
> ---
>  tools/perf/bench/numa.c  |  8 ++++++--
>  tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/header.h |  1 +
>  3 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 29e41e32bd88..7992d79b3e41 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -34,6 +34,7 @@
>  #include <linux/numa.h>
>  #include <linux/zalloc.h>
>  
> +#include "../util/header.h"
>  #include <numa.h>
>  #include <numaif.h>
>  
> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>  			return -1;
>  		}
>  
> +		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
> +			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
> +			return -1;
> +		}
> +
>  		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>  		BUG_ON(bind_cpu_0 > bind_cpu_1);
>  
> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
>  	return parse_node_list(arg);
>  }
>  
> -#define BIT(x) (1ul << x)
> -
>  static inline uint32_t lfsr_32(uint32_t lfsr)
>  {
>  	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 6da12e522edc..3f5fcf5d4b3f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>  	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
>  }
>  
> +#define SYSFS "/sys/devices/system/cpu/"

Please use

int sysfs__read_str(const char *entry, char **buf, size_t *sizep)

See how to use it in the smt_on() function at tools/perf/util/smt.c, for
example.

Now looking at the first patches in the series.

- Arnaldo

> +/*
> + * Check whether a CPU is online
> + *
> + * Returns:
> + *     1 -> if CPU is online
> + *     0 -> if CPU is offline
> + *    -1 -> error case
> + */
> +int is_cpu_online(unsigned int cpu)
> +{
> +	char sysfs_cpu[255];
> +	char buf[255];
> +	struct stat statbuf;
> +	size_t len;
> +	int fd;
> +
> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
> +
> +	if (stat(sysfs_cpu, &statbuf) != 0)
> +		return 0;
> +
> +	/*
> +	 * Check if /sys/devices/system/cpu/cpux/online file
> +	 * exists. In kernels without CONFIG_HOTPLUG_CPU, this
> +	 * file won't exist.
> +	 */
> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
> +	if (stat(sysfs_cpu, &statbuf) != 0)
> +		return 1;
> +
> +	fd = open(sysfs_cpu, O_RDONLY);
> +	if (fd == -1)
> +		return -1;
> +
> +	len = read(fd, buf, sizeof(buf) - 1);
> +	buf[len] = '\0';
> +	close(fd);
> +
> +	return strtoul(buf, NULL, 16);
> +}
> +
>  #ifdef HAVE_LIBBPF_SUPPORT
>  static int write_bpf_prog_info(struct feat_fd *ff,
>  			       struct evlist *evlist __maybe_unused)
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index c9e3265832d9..0eb4bc29a5a4 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
>  int write_padded(struct feat_fd *fd, const void *bf,
>  		 size_t count, size_t count_aligned);
>  
> +int is_cpu_online(unsigned int cpu);
>  /*
>   * arch specific callback
>   */
> -- 
> 2.35.1

-- 

- Arnaldo


* Re: [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
  2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
                   ` (4 preceding siblings ...)
  2022-04-07  0:35 ` [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Ian Rogers
@ 2022-04-09 15:28 ` Arnaldo Carvalho de Melo
  2022-04-09 17:18   ` Arnaldo Carvalho de Melo
  5 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-04-09 15:28 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, srikar, irogers

Em Wed, Apr 06, 2022 at 11:21:09PM +0530, Athira Rajeev escreveu:
> The perf benchmark for collections: numa, futex and epoll
> hits failure in system configuration with CPU's more than 1024.
> These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
> in the code to work with affinity.

Applied 1-3, 4 needs some reworking and can wait for v5.19, the first 3
are fixes, so can go now.

- Arnaldo


* Re: [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
  2022-04-09 15:28 ` Arnaldo Carvalho de Melo
@ 2022-04-09 17:18   ` Arnaldo Carvalho de Melo
  2022-04-12 16:33     ` Athira Rajeev
  0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-04-09 17:18 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: jolsa, disgoel, mpe, linux-perf-users, linuxppc-dev, maddy,
	rnsastry, kjain, linux-kernel, srikar, irogers

Em Sat, Apr 09, 2022 at 12:28:01PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Apr 06, 2022 at 11:21:09PM +0530, Athira Rajeev escreveu:
> > The perf benchmark for collections: numa, futex and epoll
> > hits failure in system configuration with CPU's more than 1024.
> > These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
> > in the code to work with affinity.
> 
> Applied 1-3, 4 needs some reworking and can wait for v5.19, the first 3
> are fixes, so can go now.

Now trying to fix this:

  26     7.89 debian:9                      : FAIL gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
    bench/numa.c: In function 'alloc_data':
    bench/numa.c:359:6: error: 'orig_mask' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      ret = sched_setaffinity(0, size, mask);
      ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    bench/numa.c:409:13: note: 'orig_mask' was declared here
      cpu_set_t *orig_mask;
                 ^~~~~~~~~
    cc1: all warnings being treated as errors
    /git/perf-5.18.0-rc1/tools/build/Makefile.build:139: recipe for target 'bench' failed
    make[3]: *** [bench] Error 2


Happened in several distros.

- Arnaldo


* Re: [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online
  2022-04-09 15:20   ` Arnaldo Carvalho de Melo
@ 2022-04-11 13:54     ` Athira Rajeev
  0 siblings, 0 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-11 13:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, disgoel, Michael Ellerman, linux-perf-users,
	linuxppc-dev, maddy, Nageswara Sastry, kajoljain,
	Linux Kernel Mailing List, Srikar Dronamraju, Ian Rogers



> On 09-Apr-2022, at 8:50 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Wed, Apr 06, 2022 at 11:21:13PM +0530, Athira Rajeev escreveu:
>> Perf numa bench test fails with error:
>> 
>> Testcase:
>> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
>> --thp  1 --no-data_rand_walk
>> 
>> Failure snippet:
>> <<>>
>> Running 'numa/mem' benchmark:
>> 
>> # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
>> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
>> 
>> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
>> <<>>
>> 
>> The Testcases uses CPU’s 0 and 8. In function "parse_setup_cpu_list",
>> There is check to see if cpu number is greater than max cpu’s possible
>> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
>> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
>> say 48 CPU’s, but only number of online CPU’s is 0-7. Other CPU’s
>> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
>> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
>> 
>> bind_to_cpumask function is called to set affinity using
>> sched_setaffinity and the cpumask. Since the CPU8 is not present,
>> set affinity will fail here with EINVAL. Fix this issue by adding a
>> check to make sure that, CPU’s provided in the input argument values
>> are online before proceeding further and skip the test. For this,
>> include new helper function "is_cpu_online" in
>> "tools/perf/util/header.c".
>> 
>> Since "BIT(x)" definition will get included from header.h, remove
>> that from bench/numa.c
>> 
>> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>> Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
>> ---
>> tools/perf/bench/numa.c  |  8 ++++++--
>> tools/perf/util/header.c | 43 ++++++++++++++++++++++++++++++++++++++++
>> tools/perf/util/header.h |  1 +
>> 3 files changed, 50 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
>> index 29e41e32bd88..7992d79b3e41 100644
>> --- a/tools/perf/bench/numa.c
>> +++ b/tools/perf/bench/numa.c
>> @@ -34,6 +34,7 @@
>> #include <linux/numa.h>
>> #include <linux/zalloc.h>
>> 
>> +#include "../util/header.h"
>> #include <numa.h>
>> #include <numaif.h>
>> 
>> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>> 			return -1;
>> 		}
>> 
>> +		if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) != 1) {
>> +			printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 is offline\n");
>> +			return -1;
>> +		}
>> +
>> 		BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>> 		BUG_ON(bind_cpu_0 > bind_cpu_1);
>> 
>> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt __maybe_unused,
>> 	return parse_node_list(arg);
>> }
>> 
>> -#define BIT(x) (1ul << x)
>> -
>> static inline uint32_t lfsr_32(uint32_t lfsr)
>> {
>> 	const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>> index 6da12e522edc..3f5fcf5d4b3f 100644
>> --- a/tools/perf/util/header.c
>> +++ b/tools/perf/util/header.c
>> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>> 	return do_write(ff, &data->dir.version, sizeof(data->dir.version));
>> }
>> 
>> +#define SYSFS "/sys/devices/system/cpu/"
> 
> Please use
> 
> int sysfs__read_str(const char *entry, char **buf, size_t *sizep)

Hi Arnaldo,

Sure, I will send a V3 for this separately which uses “sysfs__read_str”

Thanks for the review
Athira
> 
> See how to use it in the smt_on() function at tools/perf/util/smt.c, for
> example.
> 
> Now looking at the first patches in the series.
> 
> - Arnaldo
> 
>> +/*
>> + * Check whether a CPU is online
>> + *
>> + * Returns:
>> + *     1 -> if CPU is online
>> + *     0 -> if CPU is offline
>> + *    -1 -> error case
>> + */
>> +int is_cpu_online(unsigned int cpu)
>> +{
>> +	char sysfs_cpu[255];
>> +	char buf[255];
>> +	struct stat statbuf;
>> +	size_t len;
>> +	int fd;
>> +
>> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
>> +
>> +	if (stat(sysfs_cpu, &statbuf) != 0)
>> +		return 0;
>> +
>> +	/*
>> +	 * Check if /sys/devices/system/cpu/cpux/online file
>> +	 * exists. In kernels without CONFIG_HOTPLUG_CPU, this
>> +	 * file won't exist.
>> +	 */
>> +	snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
>> +	if (stat(sysfs_cpu, &statbuf) != 0)
>> +		return 1;
>> +
>> +	fd = open(sysfs_cpu, O_RDONLY);
>> +	if (fd == -1)
>> +		return -1;
>> +
>> +	len = read(fd, buf, sizeof(buf) - 1);
>> +	buf[len] = '\0';
>> +	close(fd);
>> +
>> +	return strtoul(buf, NULL, 16);
>> +}
>> +
>> #ifdef HAVE_LIBBPF_SUPPORT
>> static int write_bpf_prog_info(struct feat_fd *ff,
>> 			       struct evlist *evlist __maybe_unused)
>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
>> index c9e3265832d9..0eb4bc29a5a4 100644
>> --- a/tools/perf/util/header.h
>> +++ b/tools/perf/util/header.h
>> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t size);
>> int write_padded(struct feat_fd *fd, const void *bf,
>> 		 size_t count, size_t count_aligned);
>> 
>> +int is_cpu_online(unsigned int cpu);
>> /*
>>  * arch specific callback
>>  */
>> -- 
>> 2.35.1
> 
> -- 
> 
> - Arnaldo



* Re: [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K
  2022-04-09 17:18   ` Arnaldo Carvalho de Melo
@ 2022-04-12 16:33     ` Athira Rajeev
  0 siblings, 0 replies; 16+ messages in thread
From: Athira Rajeev @ 2022-04-12 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, maddy, Srikar Dronamraju, Nageswara Sastry,
	Linux Kernel Mailing List, linux-perf-users, Jiri Olsa,
	kajoljain, disgoel, linuxppc-dev



> On 09-Apr-2022, at 10:48 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Sat, Apr 09, 2022 at 12:28:01PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Apr 06, 2022 at 11:21:09PM +0530, Athira Rajeev escreveu:
>>> The perf benchmark for collections: numa, futex and epoll
>>> hits failure in system configuration with CPU's more than 1024.
>>> These benchmarks uses "sched_getaffinity" and "sched_setaffinity"
>>> in the code to work with affinity.
>> 
>> Applied 1-3, 4 needs some reworking and can wait for v5.19, the first 3
>> are fixes, so can go now.
> 
> Now trying to fix this:
> 
>  26     7.89 debian:9                      : FAIL gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
>    bench/numa.c: In function 'alloc_data':
>    bench/numa.c:359:6: error: 'orig_mask' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>      ret = sched_setaffinity(0, size, mask);
>      ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    bench/numa.c:409:13: note: 'orig_mask' was declared here
>      cpu_set_t *orig_mask;
>                 ^~~~~~~~~
>    cc1: all warnings being treated as errors
>    /git/perf-5.18.0-rc1/tools/build/Makefile.build:139: recipe for target 'bench' failed
>    make[3]: *** [bench] Error 2
> 
> 
> Happened in several distros.

Hi Arnaldo

Thanks for pointing it out. I was able to recreate this compile error on Debian.
The cause is the variable orig_mask, which is used and initialised in the "alloc_data"
function only within the if condition for "init_cpu0". We can fix this by initialising it
to NULL, since it is accessed conditionally. I also made changes to CPU_FREE the mask in
other error paths. I will post a V3 with these changes.

Athira

> 
> - Arnaldo



end of thread, other threads:[~2022-04-12 16:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 17:51 [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Athira Rajeev
2022-04-06 17:51 ` [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with " Athira Rajeev
2022-04-08 12:27   ` Srikar Dronamraju
2022-04-06 17:51 ` [PATCH v2 2/4] tools/perf: Fix perf bench epoll " Athira Rajeev
2022-04-06 17:51 ` [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix " Athira Rajeev
2022-04-08 12:28   ` Srikar Dronamraju
2022-04-06 17:51 ` [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online Athira Rajeev
2022-04-08 12:26   ` Srikar Dronamraju
2022-04-09  6:29     ` Athira Rajeev
2022-04-09 15:20   ` Arnaldo Carvalho de Melo
2022-04-11 13:54     ` Athira Rajeev
2022-04-07  0:35 ` [PATCH v2 0/4] Fix perf bench numa, futex and epoll to work with machines having #CPUs > 1K Ian Rogers
2022-04-07  4:31   ` Athira Rajeev
2022-04-09 15:28 ` Arnaldo Carvalho de Melo
2022-04-09 17:18   ` Arnaldo Carvalho de Melo
2022-04-12 16:33     ` Athira Rajeev
