All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges
@ 2020-08-10  6:21 Alexander Gordeev
  2020-08-10  6:21 ` [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-10  6:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim

This series allows running the tool on some configurations
that do not conform to an assumption each node contains
nr_cpus/nr_nodes CPUs at most. Instead, the actual node-to-
CPU mapping is acquired dynamically.

patch 1 fixes the described issue
patches 2,3 are follow-up fixes

Changes since v1:
  - numa01* and numa02* test names left intact;
  - "2x3-convergence" fix moved out to separate patch

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>

Alexander Gordeev (3):
  perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  perf bench numa: fix number of processes in "2x3-convergence" test
  perf bench numa: fix benchmark names

 tools/perf/bench/numa.c | 64 ++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 33 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  2020-08-10  6:21 [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges Alexander Gordeev
@ 2020-08-10  6:21 ` Alexander Gordeev
  2020-08-11  7:26   ` Namhyung Kim
  2020-08-10  6:21 ` [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test Alexander Gordeev
  2020-08-10  6:22 ` [PATCH v2 3/3] perf bench numa: fix benchmark names Alexander Gordeev
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-10  6:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Satheesh Rajendran, Srikar Dronamraju,
	Naveen N. Rao, Balamuruhan S, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim

It is currently assumed that each node contains at most
nr_cpus/nr_nodes CPUs and node CPU ranges do not overlap.
That assumption is generally incorrect as there are archs
where a CPU number does not depend on to its node number.

This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for
binding CPUs to nodes. It also tightens a cpumask allocation
failure check a bit.

Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
 tools/perf/bench/numa.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 5797253..23e224e 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -247,12 +247,13 @@ static int is_node_present(int node)
  */
 static bool node_has_cpus(int node)
 {
-	struct bitmask *cpu = numa_allocate_cpumask();
+	struct bitmask *cpumask = numa_allocate_cpumask();
 	unsigned int i;
 
-	if (cpu && !numa_node_to_cpus(node, cpu)) {
-		for (i = 0; i < cpu->size; i++) {
-			if (numa_bitmask_isbitset(cpu, i))
+	BUG_ON(!cpumask);
+	if (!numa_node_to_cpus(node, cpumask)) {
+		for (i = 0; i < cpumask->size; i++) {
+			if (numa_bitmask_isbitset(cpumask, i))
 				return true;
 		}
 	}
@@ -288,14 +289,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)
 
 static cpu_set_t bind_to_node(int target_node)
 {
-	int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
 	cpu_set_t orig_mask, mask;
 	int cpu;
 	int ret;
 
-	BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
-	BUG_ON(!cpus_per_node);
-
 	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
 	BUG_ON(ret);
 
@@ -305,13 +302,15 @@ static cpu_set_t bind_to_node(int target_node)
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
 			CPU_SET(cpu, &mask);
 	} else {
-		int cpu_start = (target_node + 0) * cpus_per_node;
-		int cpu_stop  = (target_node + 1) * cpus_per_node;
-
-		BUG_ON(cpu_stop > g->p.nr_cpus);
+		struct bitmask *cpumask = numa_allocate_cpumask();
 
-		for (cpu = cpu_start; cpu < cpu_stop; cpu++)
-			CPU_SET(cpu, &mask);
+		BUG_ON(!cpumask);
+		if (!numa_node_to_cpus(target_node, cpumask)) {
+			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
+				if (numa_bitmask_isbitset(cpumask, cpu))
+					CPU_SET(cpu, &mask);
+			}
+		}
 	}
 
 	ret = sched_setaffinity(0, sizeof(mask), &mask);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test
  2020-08-10  6:21 [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges Alexander Gordeev
  2020-08-10  6:21 ` [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
@ 2020-08-10  6:21 ` Alexander Gordeev
  2020-08-11  7:27   ` Namhyung Kim
  2020-08-10  6:22 ` [PATCH v2 3/3] perf bench numa: fix benchmark names Alexander Gordeev
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-10  6:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
 tools/perf/bench/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 23e224e..90639c9 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -1754,7 +1754,7 @@ static int run_bench_numa(const char *name, const char **argv)
    { " 1x3-convergence,", "mem",  "-p",  "1", "-t",  "3", "-P",  "512", OPT_CONV },
    { " 1x4-convergence,", "mem",  "-p",  "1", "-t",  "4", "-P",  "512", OPT_CONV },
    { " 1x6-convergence,", "mem",  "-p",  "1", "-t",  "6", "-P", "1020", OPT_CONV },
-   { " 2x3-convergence,", "mem",  "-p",  "3", "-t",  "3", "-P", "1020", OPT_CONV },
+   { " 2x3-convergence,", "mem",  "-p",  "2", "-t",  "3", "-P", "1020", OPT_CONV },
    { " 3x3-convergence,", "mem",  "-p",  "3", "-t",  "3", "-P", "1020", OPT_CONV },
    { " 4x4-convergence,", "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_CONV },
    { " 4x4-convergence-NOTHP,",
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] perf bench numa: fix benchmark names
  2020-08-10  6:21 [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges Alexander Gordeev
  2020-08-10  6:21 ` [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
  2020-08-10  6:21 ` [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test Alexander Gordeev
@ 2020-08-10  6:22 ` Alexander Gordeev
  2020-08-12 12:09   ` Arnaldo Carvalho de Melo
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-10  6:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Gordeev, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim

Standard benchmark names let users know the tests specifics.
For example "2x1-bw-process" name tells that two processes
one thread each are run and the RAM bandwidth is measured.

Several benchmarks names do not correspond to their actual
running configuration. Fix that and also some whitespace
and comment inconsistencies.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
 tools/perf/bench/numa.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 90639c9..3b4b63f 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -812,12 +812,12 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
 			}
 		}
 	} else if (!g->p.data_backwards || (nr + loop) & 1) {
+		/* Process data forwards: */
 
 		d0 = data + off;
 		d  = data + off + 1;
 		d1 = data + words;
 
-		/* Process data forwards: */
 		for (;;) {
 			if (unlikely(d >= d1))
 				d = data;
@@ -835,7 +835,6 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
 		d  = data + off - 1;
 		d1 = data + words;
 
-		/* Process data forwards: */
 		for (;;) {
 			if (unlikely(d < data))
 				d = data + words-1;
@@ -1732,12 +1731,12 @@ static int run_bench_numa(const char *name, const char **argv)
  */
 static const char *tests[][MAX_ARGS] = {
    /* Basic single-stream NUMA bandwidth measurements: */
-   { "RAM-bw-local,",	  "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
+   { "RAM-bw-local,",     "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
 			  "-C" ,   "0", "-M",   "0", OPT_BW_RAM },
    { "RAM-bw-local-NOTHP,",
 			  "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
 			  "-C" ,   "0", "-M",   "0", OPT_BW_RAM_NOTHP },
-   { "RAM-bw-remote,",	  "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
+   { "RAM-bw-remote,",    "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
 			  "-C" ,   "0", "-M",   "1", OPT_BW_RAM },
 
    /* 2-stream NUMA bandwidth measurements: */
@@ -1779,24 +1778,24 @@ static int run_bench_numa(const char *name, const char **argv)
 			  "mem",  "-p",  "8", "-t",  "1", "-P", " 512", OPT_BW_NOTHP },
    { "16x1-bw-process,",  "mem",  "-p", "16", "-t",  "1", "-P",  "256", OPT_BW },
 
-   { " 4x1-bw-thread,",	  "mem",  "-p",  "1", "-t",  "4", "-T",  "256", OPT_BW },
-   { " 8x1-bw-thread,",	  "mem",  "-p",  "1", "-t",  "8", "-T",  "256", OPT_BW },
-   { "16x1-bw-thread,",   "mem",  "-p",  "1", "-t", "16", "-T",  "128", OPT_BW },
-   { "32x1-bw-thread,",   "mem",  "-p",  "1", "-t", "32", "-T",   "64", OPT_BW },
+   { " 1x4-bw-thread,",   "mem",  "-p",  "1", "-t",  "4", "-T",  "256", OPT_BW },
+   { " 1x8-bw-thread,",   "mem",  "-p",  "1", "-t",  "8", "-T",  "256", OPT_BW },
+   { "1x16-bw-thread,",   "mem",  "-p",  "1", "-t", "16", "-T",  "128", OPT_BW },
+   { "1x32-bw-thread,",   "mem",  "-p",  "1", "-t", "32", "-T",   "64", OPT_BW },
 
-   { " 2x3-bw-thread,",	  "mem",  "-p",  "2", "-t",  "3", "-P",  "512", OPT_BW },
-   { " 4x4-bw-thread,",	  "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_BW },
-   { " 4x6-bw-thread,",	  "mem",  "-p",  "4", "-t",  "6", "-P",  "512", OPT_BW },
-   { " 4x8-bw-thread,",	  "mem",  "-p",  "4", "-t",  "8", "-P",  "512", OPT_BW },
-   { " 4x8-bw-thread-NOTHP,",
+   { " 2x3-bw-process,",  "mem",  "-p",  "2", "-t",  "3", "-P",  "512", OPT_BW },
+   { " 4x4-bw-process,",  "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_BW },
+   { " 4x6-bw-process,",  "mem",  "-p",  "4", "-t",  "6", "-P",  "512", OPT_BW },
+   { " 4x8-bw-process,",  "mem",  "-p",  "4", "-t",  "8", "-P",  "512", OPT_BW },
+   { " 4x8-bw-process-NOTHP,",
 			  "mem",  "-p",  "4", "-t",  "8", "-P",  "512", OPT_BW_NOTHP },
-   { " 3x3-bw-thread,",	  "mem",  "-p",  "3", "-t",  "3", "-P",  "512", OPT_BW },
-   { " 5x5-bw-thread,",	  "mem",  "-p",  "5", "-t",  "5", "-P",  "512", OPT_BW },
+   { " 3x3-bw-process,",  "mem",  "-p",  "3", "-t",  "3", "-P",  "512", OPT_BW },
+   { " 5x5-bw-process,",  "mem",  "-p",  "5", "-t",  "5", "-P",  "512", OPT_BW },
 
-   { "2x16-bw-thread,",   "mem",  "-p",  "2", "-t", "16", "-P",  "512", OPT_BW },
-   { "1x32-bw-thread,",   "mem",  "-p",  "1", "-t", "32", "-P", "2048", OPT_BW },
+   { "2x16-bw-process,",  "mem",  "-p",  "2", "-t", "16", "-P",  "512", OPT_BW },
+   { "1x32-bw-process,",  "mem",  "-p",  "1", "-t", "32", "-P", "2048", OPT_BW },
 
-   { "numa02-bw,",	  "mem",  "-p",  "1", "-t", "32", "-T",   "32", OPT_BW },
+   { "numa02-bw,",        "mem",  "-p",  "1", "-t", "32", "-T",   "32", OPT_BW },
    { "numa02-bw-NOTHP,",  "mem",  "-p",  "1", "-t", "32", "-T",   "32", OPT_BW_NOTHP },
    { "numa01-bw-thread,", "mem",  "-p",  "2", "-t", "16", "-T",  "192", OPT_BW },
    { "numa01-bw-thread-NOTHP,",
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  2020-08-10  6:21 ` [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
@ 2020-08-11  7:26   ` Namhyung Kim
  2020-08-11  8:11     ` Alexander Gordeev
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Namhyung Kim @ 2020-08-11  7:26 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Satheesh Rajendran, Srikar Dronamraju,
	Naveen N. Rao, Balamuruhan S, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa

Hello,

On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
<agordeev@linux.ibm.com> wrote:
>
> It is currently assumed that each node contains at most
> nr_cpus/nr_nodes CPUs and node CPU ranges do not overlap.
> That assumption is generally incorrect as there are archs
> where a CPU number does not depend on to its node number.
>
> This update removes the described assumption by simply calling
> numa_node_to_cpus() interface and using the returned mask for
> binding CPUs to nodes. It also tightens a cpumask allocation
> failure check a bit.
>
> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
> ---
>  tools/perf/bench/numa.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 5797253..23e224e 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -247,12 +247,13 @@ static int is_node_present(int node)
>   */
>  static bool node_has_cpus(int node)
>  {
> -       struct bitmask *cpu = numa_allocate_cpumask();
> +       struct bitmask *cpumask = numa_allocate_cpumask();
>         unsigned int i;
>
> -       if (cpu && !numa_node_to_cpus(node, cpu)) {
> -               for (i = 0; i < cpu->size; i++) {
> -                       if (numa_bitmask_isbitset(cpu, i))
> +       BUG_ON(!cpumask);
> +       if (!numa_node_to_cpus(node, cpumask)) {
> +               for (i = 0; i < cpumask->size; i++) {
> +                       if (numa_bitmask_isbitset(cpumask, i))
>                                 return true;
>                 }
>         }
> @@ -288,14 +289,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)
>
>  static cpu_set_t bind_to_node(int target_node)
>  {
> -       int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
>         cpu_set_t orig_mask, mask;
>         int cpu;
>         int ret;
>
> -       BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
> -       BUG_ON(!cpus_per_node);
> -
>         ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
>         BUG_ON(ret);
>
> @@ -305,13 +302,15 @@ static cpu_set_t bind_to_node(int target_node)
>                 for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
>                         CPU_SET(cpu, &mask);
>         } else {
> -               int cpu_start = (target_node + 0) * cpus_per_node;
> -               int cpu_stop  = (target_node + 1) * cpus_per_node;
> -
> -               BUG_ON(cpu_stop > g->p.nr_cpus);
> +               struct bitmask *cpumask = numa_allocate_cpumask();
>
> -               for (cpu = cpu_start; cpu < cpu_stop; cpu++)
> -                       CPU_SET(cpu, &mask);
> +               BUG_ON(!cpumask);
> +               if (!numa_node_to_cpus(target_node, cpumask)) {
> +                       for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> +                               if (numa_bitmask_isbitset(cpumask, cpu))
> +                                       CPU_SET(cpu, &mask);
> +                       }
> +               }

It seems you need to call numa_free_cpumask() for both functions.

Thanks
Namhyung

>         }
>
>         ret = sched_setaffinity(0, sizeof(mask), &mask);
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test
  2020-08-10  6:21 ` [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test Alexander Gordeev
@ 2020-08-11  7:27   ` Namhyung Kim
  2020-08-12 12:07     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2020-08-11  7:27 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa

On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
<agordeev@linux.ibm.com> wrote:
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks
Namhyung

> ---
>  tools/perf/bench/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 23e224e..90639c9 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -1754,7 +1754,7 @@ static int run_bench_numa(const char *name, const char **argv)
>     { " 1x3-convergence,", "mem",  "-p",  "1", "-t",  "3", "-P",  "512", OPT_CONV },
>     { " 1x4-convergence,", "mem",  "-p",  "1", "-t",  "4", "-P",  "512", OPT_CONV },
>     { " 1x6-convergence,", "mem",  "-p",  "1", "-t",  "6", "-P", "1020", OPT_CONV },
> -   { " 2x3-convergence,", "mem",  "-p",  "3", "-t",  "3", "-P", "1020", OPT_CONV },
> +   { " 2x3-convergence,", "mem",  "-p",  "2", "-t",  "3", "-P", "1020", OPT_CONV },
>     { " 3x3-convergence,", "mem",  "-p",  "3", "-t",  "3", "-P", "1020", OPT_CONV },
>     { " 4x4-convergence,", "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_CONV },
>     { " 4x4-convergence-NOTHP,",
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  2020-08-11  7:26   ` Namhyung Kim
@ 2020-08-11  8:11     ` Alexander Gordeev
  2020-08-13 11:30     ` [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus() Alexander Gordeev
  2020-08-13 11:32     ` [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
  2 siblings, 0 replies; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-11  8:11 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Satheesh Rajendran, Srikar Dronamraju,
	Naveen N. Rao, Balamuruhan S, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa

On Tue, Aug 11, 2020 at 04:26:50PM +0900, Namhyung Kim wrote:
> Hello,
> 
> On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
> <agordeev@linux.ibm.com> wrote:
> >
> > It is currently assumed that each node contains at most
> > nr_cpus/nr_nodes CPUs and node CPU ranges do not overlap.
> > That assumption is generally incorrect as there are archs
> > where a CPU number does not depend on to its node number.
> >
> > This update removes the described assumption by simply calling
> > numa_node_to_cpus() interface and using the returned mask for
> > binding CPUs to nodes. It also tightens a cpumask allocation
> > failure check a bit.
> >
> > Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> > Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
> > ---
> >  tools/perf/bench/numa.c | 27 +++++++++++++--------------
> >  1 file changed, 13 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 5797253..23e224e 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -247,12 +247,13 @@ static int is_node_present(int node)
> >   */
> >  static bool node_has_cpus(int node)
> >  {
> > -       struct bitmask *cpu = numa_allocate_cpumask();
> > +       struct bitmask *cpumask = numa_allocate_cpumask();
> >         unsigned int i;
> >
> > -       if (cpu && !numa_node_to_cpus(node, cpu)) {
> > -               for (i = 0; i < cpu->size; i++) {
> > -                       if (numa_bitmask_isbitset(cpu, i))
> > +       BUG_ON(!cpumask);
> > +       if (!numa_node_to_cpus(node, cpumask)) {
> > +               for (i = 0; i < cpumask->size; i++) {
> > +                       if (numa_bitmask_isbitset(cpumask, i))
> >                                 return true;
> >                 }
> >         }
> > @@ -288,14 +289,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)
> >
> >  static cpu_set_t bind_to_node(int target_node)
> >  {
> > -       int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
> >         cpu_set_t orig_mask, mask;
> >         int cpu;
> >         int ret;
> >
> > -       BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
> > -       BUG_ON(!cpus_per_node);
> > -
> >         ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> >         BUG_ON(ret);
> >
> > @@ -305,13 +302,15 @@ static cpu_set_t bind_to_node(int target_node)
> >                 for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> >                         CPU_SET(cpu, &mask);
> >         } else {
> > -               int cpu_start = (target_node + 0) * cpus_per_node;
> > -               int cpu_stop  = (target_node + 1) * cpus_per_node;
> > -
> > -               BUG_ON(cpu_stop > g->p.nr_cpus);
> > +               struct bitmask *cpumask = numa_allocate_cpumask();
> >
> > -               for (cpu = cpu_start; cpu < cpu_stop; cpu++)
> > -                       CPU_SET(cpu, &mask);
> > +               BUG_ON(!cpumask);
> > +               if (!numa_node_to_cpus(target_node, cpumask)) {
> > +                       for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> > +                               if (numa_bitmask_isbitset(cpumask, cpu))
> > +                                       CPU_SET(cpu, &mask);
> > +                       }
> > +               }
> 
> It seems you need to call numa_free_cpumask() for both functions.

Well, the whole approach to memory allocation is rather relaxed
troughout the code. I.e cpumasks do not get deallocated (*),
strdup() return values are not checked etc.

If it worth fixing all that then it would be a separate effort,
as far as I am concerned.

> Thanks
> Namhyung
> 
> >         }
> >
> >         ret = sched_setaffinity(0, sizeof(mask), &mask);
> > --
> > 1.8.3.1
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test
  2020-08-11  7:27   ` Namhyung Kim
@ 2020-08-12 12:07     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-08-12 12:07 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexander Gordeev, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa

Em Tue, Aug 11, 2020 at 04:27:35PM +0900, Namhyung Kim escreveu:
> On Mon, Aug 10, 2020 at 3:22 PM Alexander Gordeev
> <agordeev@linux.ibm.com> wrote:
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied,

- Arnaldo
 
> Thanks
> Namhyung
> 
> > ---
> >  tools/perf/bench/numa.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 23e224e..90639c9 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -1754,7 +1754,7 @@ static int run_bench_numa(const char *name, const char **argv)
> >     { " 1x3-convergence,", "mem",  "-p",  "1", "-t",  "3", "-P",  "512", OPT_CONV },
> >     { " 1x4-convergence,", "mem",  "-p",  "1", "-t",  "4", "-P",  "512", OPT_CONV },
> >     { " 1x6-convergence,", "mem",  "-p",  "1", "-t",  "6", "-P", "1020", OPT_CONV },
> > -   { " 2x3-convergence,", "mem",  "-p",  "3", "-t",  "3", "-P", "1020", OPT_CONV },
> > +   { " 2x3-convergence,", "mem",  "-p",  "2", "-t",  "3", "-P", "1020", OPT_CONV },
> >     { " 3x3-convergence,", "mem",  "-p",  "3", "-t",  "3", "-P", "1020", OPT_CONV },
> >     { " 4x4-convergence,", "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_CONV },
> >     { " 4x4-convergence-NOTHP,",
> > --
> > 1.8.3.1
> >

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] perf bench numa: fix benchmark names
  2020-08-10  6:22 ` [PATCH v2 3/3] perf bench numa: fix benchmark names Alexander Gordeev
@ 2020-08-12 12:09   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-08-12 12:09 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim

Em Mon, Aug 10, 2020 at 08:22:00AM +0200, Alexander Gordeev escreveu:
> Standard benchmark names let users know the tests specifics.
> For example "2x1-bw-process" name tells that two processes
> one thread each are run and the RAM bandwidth is measured.
> 
> Several benchmarks names do not correspond to their actual
> running configuration. Fix that and also some whitespace
> and comment inconsistencies.

Looks, ok, applied.

- Arnaldo
 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
> ---
>  tools/perf/bench/numa.c | 35 +++++++++++++++++------------------
>  1 file changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 90639c9..3b4b63f 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -812,12 +812,12 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
>  			}
>  		}
>  	} else if (!g->p.data_backwards || (nr + loop) & 1) {
> +		/* Process data forwards: */
>  
>  		d0 = data + off;
>  		d  = data + off + 1;
>  		d1 = data + words;
>  
> -		/* Process data forwards: */
>  		for (;;) {
>  			if (unlikely(d >= d1))
>  				d = data;
> @@ -835,7 +835,6 @@ static u64 do_work(u8 *__data, long bytes, int nr, int nr_max, int loop, u64 val
>  		d  = data + off - 1;
>  		d1 = data + words;
>  
> -		/* Process data forwards: */
>  		for (;;) {
>  			if (unlikely(d < data))
>  				d = data + words-1;
> @@ -1732,12 +1731,12 @@ static int run_bench_numa(const char *name, const char **argv)
>   */
>  static const char *tests[][MAX_ARGS] = {
>     /* Basic single-stream NUMA bandwidth measurements: */
> -   { "RAM-bw-local,",	  "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
> +   { "RAM-bw-local,",     "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
>  			  "-C" ,   "0", "-M",   "0", OPT_BW_RAM },
>     { "RAM-bw-local-NOTHP,",
>  			  "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
>  			  "-C" ,   "0", "-M",   "0", OPT_BW_RAM_NOTHP },
> -   { "RAM-bw-remote,",	  "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
> +   { "RAM-bw-remote,",    "mem",  "-p",  "1",  "-t",  "1", "-P", "1024",
>  			  "-C" ,   "0", "-M",   "1", OPT_BW_RAM },
>  
>     /* 2-stream NUMA bandwidth measurements: */
> @@ -1779,24 +1778,24 @@ static int run_bench_numa(const char *name, const char **argv)
>  			  "mem",  "-p",  "8", "-t",  "1", "-P", " 512", OPT_BW_NOTHP },
>     { "16x1-bw-process,",  "mem",  "-p", "16", "-t",  "1", "-P",  "256", OPT_BW },
>  
> -   { " 4x1-bw-thread,",	  "mem",  "-p",  "1", "-t",  "4", "-T",  "256", OPT_BW },
> -   { " 8x1-bw-thread,",	  "mem",  "-p",  "1", "-t",  "8", "-T",  "256", OPT_BW },
> -   { "16x1-bw-thread,",   "mem",  "-p",  "1", "-t", "16", "-T",  "128", OPT_BW },
> -   { "32x1-bw-thread,",   "mem",  "-p",  "1", "-t", "32", "-T",   "64", OPT_BW },
> +   { " 1x4-bw-thread,",   "mem",  "-p",  "1", "-t",  "4", "-T",  "256", OPT_BW },
> +   { " 1x8-bw-thread,",   "mem",  "-p",  "1", "-t",  "8", "-T",  "256", OPT_BW },
> +   { "1x16-bw-thread,",   "mem",  "-p",  "1", "-t", "16", "-T",  "128", OPT_BW },
> +   { "1x32-bw-thread,",   "mem",  "-p",  "1", "-t", "32", "-T",   "64", OPT_BW },
>  
> -   { " 2x3-bw-thread,",	  "mem",  "-p",  "2", "-t",  "3", "-P",  "512", OPT_BW },
> -   { " 4x4-bw-thread,",	  "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_BW },
> -   { " 4x6-bw-thread,",	  "mem",  "-p",  "4", "-t",  "6", "-P",  "512", OPT_BW },
> -   { " 4x8-bw-thread,",	  "mem",  "-p",  "4", "-t",  "8", "-P",  "512", OPT_BW },
> -   { " 4x8-bw-thread-NOTHP,",
> +   { " 2x3-bw-process,",  "mem",  "-p",  "2", "-t",  "3", "-P",  "512", OPT_BW },
> +   { " 4x4-bw-process,",  "mem",  "-p",  "4", "-t",  "4", "-P",  "512", OPT_BW },
> +   { " 4x6-bw-process,",  "mem",  "-p",  "4", "-t",  "6", "-P",  "512", OPT_BW },
> +   { " 4x8-bw-process,",  "mem",  "-p",  "4", "-t",  "8", "-P",  "512", OPT_BW },
> +   { " 4x8-bw-process-NOTHP,",
>  			  "mem",  "-p",  "4", "-t",  "8", "-P",  "512", OPT_BW_NOTHP },
> -   { " 3x3-bw-thread,",	  "mem",  "-p",  "3", "-t",  "3", "-P",  "512", OPT_BW },
> -   { " 5x5-bw-thread,",	  "mem",  "-p",  "5", "-t",  "5", "-P",  "512", OPT_BW },
> +   { " 3x3-bw-process,",  "mem",  "-p",  "3", "-t",  "3", "-P",  "512", OPT_BW },
> +   { " 5x5-bw-process,",  "mem",  "-p",  "5", "-t",  "5", "-P",  "512", OPT_BW },
>  
> -   { "2x16-bw-thread,",   "mem",  "-p",  "2", "-t", "16", "-P",  "512", OPT_BW },
> -   { "1x32-bw-thread,",   "mem",  "-p",  "1", "-t", "32", "-P", "2048", OPT_BW },
> +   { "2x16-bw-process,",  "mem",  "-p",  "2", "-t", "16", "-P",  "512", OPT_BW },
> +   { "1x32-bw-process,",  "mem",  "-p",  "1", "-t", "32", "-P", "2048", OPT_BW },
>  
> -   { "numa02-bw,",	  "mem",  "-p",  "1", "-t", "32", "-T",   "32", OPT_BW },
> +   { "numa02-bw,",        "mem",  "-p",  "1", "-t", "32", "-T",   "32", OPT_BW },
>     { "numa02-bw-NOTHP,",  "mem",  "-p",  "1", "-t", "32", "-T",   "32", OPT_BW_NOTHP },
>     { "numa01-bw-thread,", "mem",  "-p",  "2", "-t", "16", "-T",  "192", OPT_BW },
>     { "numa01-bw-thread-NOTHP,",
> -- 
> 1.8.3.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus()
  2020-08-11  7:26   ` Namhyung Kim
  2020-08-11  8:11     ` Alexander Gordeev
@ 2020-08-13 11:30     ` Alexander Gordeev
  2020-08-13 12:06       ` Srikar Dronamraju
  2020-08-13 11:32     ` [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-13 11:30 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Satheesh Rajendran, Srikar Dronamraju,
	Naveen N. Rao, Balamuruhan S, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa

Couple numa_allocate_cpumask() and numa_free_cpumask() functions

Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
 tools/perf/bench/numa.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 31e2601..9066511 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -248,16 +248,21 @@ static int is_node_present(int node)
 static bool node_has_cpus(int node)
 {
 	struct bitmask *cpu = numa_allocate_cpumask();
+	bool ret = false; /* fall back to nocpus */
 	unsigned int i;
 
-	if (cpu && !numa_node_to_cpus(node, cpu)) {
+	BUG_ON(!cpu);
+	if (!numa_node_to_cpus(node, cpu)) {
 		for (i = 0; i < cpu->size; i++) {
-			if (numa_bitmask_isbitset(cpu, i))
-				return true;
+			if (numa_bitmask_isbitset(cpu, i)) {
+				ret = true;
+				break;
+			}
 		}
 	}
+	numa_free_cpumask(cpu);
 
-	return false; /* lets fall back to nocpus safely */
+	return ret;
 }
 
 static cpu_set_t bind_to_cpu(int target_cpu)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  2020-08-11  7:26   ` Namhyung Kim
  2020-08-11  8:11     ` Alexander Gordeev
  2020-08-13 11:30     ` [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus() Alexander Gordeev
@ 2020-08-13 11:32     ` Alexander Gordeev
  2020-08-13 12:07       ` Srikar Dronamraju
  2 siblings, 1 reply; 15+ messages in thread
From: Alexander Gordeev @ 2020-08-13 11:32 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Satheesh Rajendran, Srikar Dronamraju,
	Naveen N. Rao, Balamuruhan S, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa

It is currently assumed that each node contains at most
nr_cpus/nr_nodes CPUs and nodes' CPU ranges do not overlap.
That assumption is generally incorrect as there are archs
where a CPU number does not depend on to its node number.

This update removes the described assumption by simply calling
numa_node_to_cpus() interface and using the returned mask for
binding CPUs to nodes.

Also, variable types and names made consistent in functions
using cpumask.

Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
---
 tools/perf/bench/numa.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 9066511..6d5c890 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -247,20 +247,20 @@ static int is_node_present(int node)
  */
 static bool node_has_cpus(int node)
 {
-	struct bitmask *cpu = numa_allocate_cpumask();
+	struct bitmask *cpumask = numa_allocate_cpumask();
 	bool ret = false; /* fall back to nocpus */
-	unsigned int i;
+	int cpu;
 
-	BUG_ON(!cpu);
-	if (!numa_node_to_cpus(node, cpu)) {
-		for (i = 0; i < cpu->size; i++) {
-			if (numa_bitmask_isbitset(cpu, i)) {
+	BUG_ON(!cpumask);
+	if (!numa_node_to_cpus(node, cpumask)) {
+		for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
+			if (numa_bitmask_isbitset(cpumask, cpu)) {
 				ret = true;
 				break;
 			}
 		}
 	}
-	numa_free_cpumask(cpu);
+	numa_free_cpumask(cpumask);
 
 	return ret;
 }
@@ -293,14 +293,10 @@ static cpu_set_t bind_to_cpu(int target_cpu)
 
 static cpu_set_t bind_to_node(int target_node)
 {
-	int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
 	cpu_set_t orig_mask, mask;
 	int cpu;
 	int ret;
 
-	BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
-	BUG_ON(!cpus_per_node);
-
 	ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
 	BUG_ON(ret);
 
@@ -310,13 +306,16 @@ static cpu_set_t bind_to_node(int target_node)
 		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
 			CPU_SET(cpu, &mask);
 	} else {
-		int cpu_start = (target_node + 0) * cpus_per_node;
-		int cpu_stop  = (target_node + 1) * cpus_per_node;
-
-		BUG_ON(cpu_stop > g->p.nr_cpus);
+		struct bitmask *cpumask = numa_allocate_cpumask();
 
-		for (cpu = cpu_start; cpu < cpu_stop; cpu++)
-			CPU_SET(cpu, &mask);
+		BUG_ON(!cpumask);
+		if (!numa_node_to_cpus(target_node, cpumask)) {
+			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
+				if (numa_bitmask_isbitset(cpumask, cpu))
+					CPU_SET(cpu, &mask);
+			}
+		}
+		numa_free_cpumask(cpumask);
 	}
 
 	ret = sched_setaffinity(0, sizeof(mask), &mask);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus()
  2020-08-13 11:30     ` [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus() Alexander Gordeev
@ 2020-08-13 12:06       ` Srikar Dronamraju
  2020-08-13 13:03         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-08-13 12:06 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, linux-kernel,
	Satheesh Rajendran, Naveen N. Rao, Balamuruhan S, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa

* Alexander Gordeev <agordeev@linux.ibm.com> [2020-08-13 13:30:42]:

> Couple numa_allocate_cpumask() and numa_free_cpumask() functions
> 
> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
> ---
>  tools/perf/bench/numa.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 31e2601..9066511 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -248,16 +248,21 @@ static int is_node_present(int node)
>  static bool node_has_cpus(int node)
>  {
>  	struct bitmask *cpu = numa_allocate_cpumask();
> +	bool ret = false; /* fall back to nocpus */
>  	unsigned int i;
> 
> -	if (cpu && !numa_node_to_cpus(node, cpu)) {
> +	BUG_ON(!cpu);
> +	if (!numa_node_to_cpus(node, cpu)) {
>  		for (i = 0; i < cpu->size; i++) {
> -			if (numa_bitmask_isbitset(cpu, i))
> -				return true;
> +			if (numa_bitmask_isbitset(cpu, i)) {
> +				ret = true;
> +				break;
> +			}
>  		}
>  	}
> +	numa_free_cpumask(cpu);
> 
> -	return false; /* lets fall back to nocpus safely */
> +	return ret;
>  }
> 
>  static cpu_set_t bind_to_cpu(int target_cpu)
> -- 
> 1.8.3.1
> 

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  2020-08-13 11:32     ` [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
@ 2020-08-13 12:07       ` Srikar Dronamraju
  2020-08-13 13:03         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-08-13 12:07 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, linux-kernel,
	Satheesh Rajendran, Naveen N. Rao, Balamuruhan S, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa

* Alexander Gordeev <agordeev@linux.ibm.com> [2020-08-13 13:32:48]:

> It is currently assumed that each node contains at most
> nr_cpus/nr_nodes CPUs and nodes' CPU ranges do not overlap.
> That assumption is generally incorrect as there are archs
> where a CPU number does not depend on to its node number.
> 
> This update removes the described assumption by simply calling
> numa_node_to_cpus() interface and using the returned mask for
> binding CPUs to nodes.
> 
> Also, variable types and names made consistent in functions
> using cpumask.
> 
> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: Balamuruhan S <bala24@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
> ---
> @@ -310,13 +306,16 @@ static cpu_set_t bind_to_node(int target_node)
>  		for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
>  			CPU_SET(cpu, &mask);
>  	} else {
> -		int cpu_start = (target_node + 0) * cpus_per_node;
> -		int cpu_stop  = (target_node + 1) * cpus_per_node;
> -
> -		BUG_ON(cpu_stop > g->p.nr_cpus);
> +		struct bitmask *cpumask = numa_allocate_cpumask();
> 
> -		for (cpu = cpu_start; cpu < cpu_stop; cpu++)
> -			CPU_SET(cpu, &mask);
> +		BUG_ON(!cpumask);
> +		if (!numa_node_to_cpus(target_node, cpumask)) {
> +			for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
> +				if (numa_bitmask_isbitset(cpumask, cpu))
> +					CPU_SET(cpu, &mask);
> +			}
> +		}
> +		numa_free_cpumask(cpumask);
>  	}
> 
>  	ret = sched_setaffinity(0, sizeof(mask), &mask);
> -- 
> 1.8.3.1
> 

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus()
  2020-08-13 12:06       ` Srikar Dronamraju
@ 2020-08-13 13:03         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-08-13 13:03 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Alexander Gordeev, Namhyung Kim, linux-kernel,
	Satheesh Rajendran, Naveen N. Rao, Balamuruhan S, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa

Em Thu, Aug 13, 2020 at 05:36:49PM +0530, Srikar Dronamraju escreveu:
> Looks good to me.
> 
> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Thanks, applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes
  2020-08-13 12:07       ` Srikar Dronamraju
@ 2020-08-13 13:03         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-08-13 13:03 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Alexander Gordeev, Namhyung Kim, linux-kernel,
	Satheesh Rajendran, Naveen N. Rao, Balamuruhan S, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa

Em Thu, Aug 13, 2020 at 05:37:38PM +0530, Srikar Dronamraju escreveu:
> 
> Looks good to me.
> 
> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Thanks, applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-13 13:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-10  6:21 [PATCH v2 0/3] perf bench numa: make possible testing on uneven and/or overlapping CPU ranges Alexander Gordeev
2020-08-10  6:21 ` [PATCH v2 1/3] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
2020-08-11  7:26   ` Namhyung Kim
2020-08-11  8:11     ` Alexander Gordeev
2020-08-13 11:30     ` [PATCH v3 1/2] perf bench numa: fix cpumask memory leak in node_has_cpus() Alexander Gordeev
2020-08-13 12:06       ` Srikar Dronamraju
2020-08-13 13:03         ` Arnaldo Carvalho de Melo
2020-08-13 11:32     ` [PATCH v3 1/2] perf bench numa: use numa_node_to_cpus() to bind tasks to nodes Alexander Gordeev
2020-08-13 12:07       ` Srikar Dronamraju
2020-08-13 13:03         ` Arnaldo Carvalho de Melo
2020-08-10  6:21 ` [PATCH v2 2/3] perf bench numa: fix number of processes in "2x3-convergence" test Alexander Gordeev
2020-08-11  7:27   ` Namhyung Kim
2020-08-12 12:07     ` Arnaldo Carvalho de Melo
2020-08-10  6:22 ` [PATCH v2 3/3] perf bench numa: fix benchmark names Alexander Gordeev
2020-08-12 12:09   ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.