* [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K
@ 2019-12-03 11:41 Alexey Budankov
  2019-12-03 11:43 ` [PATCH v5 1/3] tools bitmap: implement bitmap_equal() operation at bitmap API Alexey Budankov
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Alexey Budankov @ 2019-12-03 11:41 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel


The glibc implementation of the cpu_set_t type limits a CPU mask to
1024 CPUs. On machines with more CPUs, this limit confines the NUMA
awareness of the Perf tool in record mode, enabled through the
--affinity option, to the first 1024 CPUs.

This patch set lets the Perf tool overcome the 1024-CPU limit by
using a dedicated struct mmap_cpu_mask type and applying the tools
bitmap API operations to the affinity masks of the tool's thread and
of the mmaped data buffers.

The tools bitmap API has been extended with a bitmap_free() function
and a bitmap_equal() operation whose implementation is derived from
the kernel's.
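
For illustration only (not part of the series), the limit can be seen
in a minimal sketch that compares the capacity of a fixed-size
cpu_set_t with a heap-allocated word array sized for an arbitrary
CPU count. The helper names fixed_mask_capacity(), mask_words() and
mask_alloc() are hypothetical and do not exist in the perf sources;
the sketch assumes a Linux libc that provides cpu_set_t in <sched.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for cpu_set_t in <sched.h> */
#endif
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of CPUs that a fixed-size cpu_set_t can describe. */
static size_t fixed_mask_capacity(void)
{
	return sizeof(cpu_set_t) * CHAR_BIT;
}

/* Number of unsigned long words needed to cover nr_cpus bits. */
static size_t mask_words(size_t nr_cpus)
{
	size_t bits_per_word = sizeof(unsigned long) * CHAR_BIT;

	assert(nr_cpus > 0);
	return (nr_cpus + bits_per_word - 1) / bits_per_word;
}

/* Allocate a zeroed mask for nr_cpus CPUs; release it with free(). */
static unsigned long *mask_alloc(size_t nr_cpus)
{
	return calloc(mask_words(nr_cpus), sizeof(unsigned long));
}
```

A mask allocated this way is bounded only by the CPU count requested
at run time, which is the property the series relies on.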

---
Changes in v5:
- avoided allocation of mmap affinity masks in case of 
  rec->opts.affinity == PERF_AFFINITY_SYS
Changes in v4:
- renamed perf_mmap__print_cpu_mask() to mmap_cpu_mask__scnprintf()
- avoided checking mask bits for NULL prior to calling bitmap_free()
- avoided thread affinity mask allocation in case of
  rec->opts.affinity == PERF_AFFINITY_SYS
Changes in v3:
- implemented perf_mmap__print_cpu_mask() function
- use perf_mmap__print_cpu_mask() to log thread and mmap cpus masks
  when verbose level is equal to 2
Changes in v2:
- implemented bitmap_free() for symmetry with bitmap_alloc()
- capitalized MMAP_CPU_MASK_BYTES() macro
- returned -1 from perf_mmap__setup_affinity_mask()
- implemented releasing of masks using bitmap_free()
- moved debug printing under -vv option

---
Alexey Budankov (3):
  tools bitmap: implement bitmap_equal() operation at bitmap API
  perf mmap: declare type for cpu mask of arbitrary length
  perf record: adapt affinity to machines with #CPUs > 1K

 tools/include/linux/bitmap.h | 30 +++++++++++++++++++++++++++
 tools/lib/bitmap.c           | 15 ++++++++++++++
 tools/perf/builtin-record.c  | 28 +++++++++++++++++++------
 tools/perf/util/mmap.c       | 40 ++++++++++++++++++++++++++++++------
 tools/perf/util/mmap.h       | 13 +++++++++++-
 5 files changed, 113 insertions(+), 13 deletions(-)

---
Validation:

# tools/perf/perf record -vv -- ls
Using CPUID GenuineIntel-6-5E-3
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
nr_cblocks: 0
affinity: SYS
mmap flush: 1
comp level: 0
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|PERIOD
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  enable_on_exec                   1
  task                             1
  precise_ip                       3
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
------------------------------------------------------------
sys_perf_event_open: pid 23718  cpu 0  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid 23718  cpu 1  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid 23718  cpu 2  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid 23718  cpu 3  group_fd -1  flags 0x8 = 9
sys_perf_event_open: pid 23718  cpu 4  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid 23718  cpu 5  group_fd -1  flags 0x8 = 11
sys_perf_event_open: pid 23718  cpu 6  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid 23718  cpu 7  group_fd -1  flags 0x8 = 13
mmap size 528384B
0x7f3e06e060b8: mmap mask[8]: 
0x7f3e06e16180: mmap mask[8]: 
0x7f3e06e26248: mmap mask[8]: 
0x7f3e06e36310: mmap mask[8]: 
0x7f3e06e463d8: mmap mask[8]: 
0x7f3e06e564a0: mmap mask[8]: 
0x7f3e06e66568: mmap mask[8]: 
0x7f3e06e76630: mmap mask[8]: 
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             120
  config                           0x9
  watermark                        1
  sample_id_all                    1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
mmap size 528384B
0x7f3e0697d0b8: mmap mask[8]: 
0x7f3e0698d180: mmap mask[8]: 
0x7f3e0699d248: mmap mask[8]: 
0x7f3e069ad310: mmap mask[8]: 
0x7f3e069bd3d8: mmap mask[8]: 
0x7f3e069cd4a0: mmap mask[8]: 
0x7f3e069dd568: mmap mask[8]: 
0x7f3e069ed630: mmap mask[8]: 
Synthesizing TSC conversion information
arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
[ perf record: Woken up 1 times to write data ]
Looking at the vmlinux_path (8 entries long)
Using vmlinux for symbols
[ perf record: Captured and wrote 0.013 MB perf.data (8 samples) ]

# tools/perf/perf record -vv --affinity=cpu -- ls
thread mask[8]: empty
Using CPUID GenuineIntel-6-5E-3
intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
nr_cblocks: 0
affinity: CPU
mmap flush: 1
comp level: 0
------------------------------------------------------------
perf_event_attr:
  size                             120
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|PERIOD
  read_format                      ID
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  enable_on_exec                   1
  task                             1
  precise_ip                       3
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
------------------------------------------------------------
sys_perf_event_open: pid 23713  cpu 0  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid 23713  cpu 1  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid 23713  cpu 2  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid 23713  cpu 3  group_fd -1  flags 0x8 = 9
sys_perf_event_open: pid 23713  cpu 4  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid 23713  cpu 5  group_fd -1  flags 0x8 = 11
sys_perf_event_open: pid 23713  cpu 6  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid 23713  cpu 7  group_fd -1  flags 0x8 = 13
mmap size 528384B
0x7f3e005bc0b8: mmap mask[8]: 0
0x7f3e005cc180: mmap mask[8]: 1
0x7f3e005dc248: mmap mask[8]: 2
0x7f3e005ec310: mmap mask[8]: 3
0x7f3e005fc3d8: mmap mask[8]: 4
0x7f3e0060c4a0: mmap mask[8]: 5
0x7f3e0061c568: mmap mask[8]: 6
0x7f3e0062c630: mmap mask[8]: 7
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             120
  config                           0x9
  watermark                        1
  sample_id_all                    1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
mmap size 528384B
0x7f3e001330b8: mmap mask[8]: 
0x7f3e00143180: mmap mask[8]: 
0x7f3e00153248: mmap mask[8]: 
0x7f3e00163310: mmap mask[8]: 
0x7f3e001733d8: mmap mask[8]: 
0x7f3e001834a0: mmap mask[8]: 
0x7f3e00193568: mmap mask[8]: 
0x7f3e001a3630: mmap mask[8]: 
Synthesizing TSC conversion information
0x9c9ff0: thread mask[8]: 0
0x9c9ff0: thread mask[8]: 1
0x9c9ff0: thread mask[8]: 2
0x9c9ff0: thread mask[8]: 3
0x9c9ff0: thread mask[8]: 4
arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
0x9c9ff0: thread mask[8]: 5
0x9c9ff0: thread mask[8]: 6
0x9c9ff0: thread mask[8]: 7
0x9c9ff0: thread mask[8]: 0
0x9c9ff0: thread mask[8]: 1
0x9c9ff0: thread mask[8]: 2
0x9c9ff0: thread mask[8]: 3
0x9c9ff0: thread mask[8]: 4
0x9c9ff0: thread mask[8]: 5
0x9c9ff0: thread mask[8]: 6
0x9c9ff0: thread mask[8]: 7
[ perf record: Woken up 0 times to write data ]
0x9c9ff0: thread mask[8]: 0
0x9c9ff0: thread mask[8]: 1
0x9c9ff0: thread mask[8]: 2
0x9c9ff0: thread mask[8]: 3
0x9c9ff0: thread mask[8]: 4
0x9c9ff0: thread mask[8]: 5
0x9c9ff0: thread mask[8]: 6
0x9c9ff0: thread mask[8]: 7
Looking at the vmlinux_path (8 entries long)
Using vmlinux for symbols
...
[ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]


* [PATCH v5 1/3] tools bitmap: implement bitmap_equal() operation at bitmap API
  2019-12-03 11:41 [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K Alexey Budankov
@ 2019-12-03 11:43 ` Alexey Budankov
  2020-01-10 17:53   ` [tip: perf/core] tools bitmap: Implement " tip-bot2 for Alexey Budankov
  2019-12-03 11:44 ` [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length Alexey Budankov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Alexey Budankov @ 2019-12-03 11:43 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel


Extend the tools bitmap API with a bitmap_equal() implementation
derived from the kernel's.

Also extend the tools bitmap API with a bitmap_free() implementation
for symmetry with the bitmap_alloc() function.
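
As a standalone sketch of the comparison technique (hypothetical
names, not the code added by this patch): full words are compared
directly, and the trailing partial word is compared only under a mask
of its valid low bits, so stale bits beyond nbits never affect the
result. WORD_BITS, last_word_mask() and masks_equal() are
illustrative stand-ins for BITS_PER_LONG, BITMAP_LAST_WORD_MASK()
and __bitmap_equal().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Mask of the valid low bits in the last word; all ones if word-aligned. */
static unsigned long last_word_mask(size_t nbits)
{
	size_t rem = nbits % WORD_BITS;

	return rem ? (1UL << rem) - 1 : ~0UL;
}

/* Return 1 if the first nbits bits of a and b are identical, else 0. */
static int masks_equal(const unsigned long *a, const unsigned long *b,
		       size_t nbits)
{
	size_t k, full = nbits / WORD_BITS;

	assert(a && b);
	for (k = 0; k < full; k++)
		if (a[k] != b[k])
			return 0;

	/* Compare the partial tail word, ignoring bits past nbits. */
	if (nbits % WORD_BITS)
		if ((a[k] ^ b[k]) & last_word_mask(nbits))
			return 0;

	return 1;
}
```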

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/include/linux/bitmap.h | 30 ++++++++++++++++++++++++++++++
 tools/lib/bitmap.c           | 15 +++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 05dca5c203f3..477a1cae513f 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -15,6 +15,8 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
 		 const unsigned long *bitmap2, int bits);
 int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
 		 const unsigned long *bitmap2, unsigned int bits);
+int __bitmap_equal(const unsigned long *bitmap1,
+		   const unsigned long *bitmap2, unsigned int bits);
 void bitmap_clear(unsigned long *map, unsigned int start, int len);
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
@@ -123,6 +125,15 @@ static inline unsigned long *bitmap_alloc(int nbits)
 	return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
 }
 
+/*
+ * bitmap_free - Free bitmap
+ * @bitmap: pointer to bitmap
+ */
+static inline void bitmap_free(unsigned long *bitmap)
+{
+	free(bitmap);
+}
+
 /*
  * bitmap_scnprintf - print bitmap list into buffer
  * @bitmap: bitmap
@@ -148,4 +159,23 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
 	return __bitmap_and(dst, src1, src2, nbits);
 }
 
+#ifdef __LITTLE_ENDIAN
+#define BITMAP_MEM_ALIGNMENT 8
+#else
+#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
+#endif
+#define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
+#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
+
+static inline int bitmap_equal(const unsigned long *src1,
+			const unsigned long *src2, unsigned int nbits)
+{
+	if (small_const_nbits(nbits))
+		return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+	if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
+	    IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
+		return !memcmp(src1, src2, nbits / 8);
+	return __bitmap_equal(src1, src2, nbits);
+}
+
 #endif /* _PERF_BITOPS_H */
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index 38494782be06..5043747ef6c5 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -71,3 +71,18 @@ int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
 			   BITMAP_LAST_WORD_MASK(bits));
 	return result != 0;
 }
+
+int __bitmap_equal(const unsigned long *bitmap1,
+		const unsigned long *bitmap2, unsigned int bits)
+{
+	unsigned int k, lim = bits/BITS_PER_LONG;
+	for (k = 0; k < lim; ++k)
+		if (bitmap1[k] != bitmap2[k])
+			return 0;
+
+	if (bits % BITS_PER_LONG)
+		if ((bitmap1[k] ^ bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+			return 0;
+
+	return 1;
+}
-- 
2.20.1




* [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length
  2019-12-03 11:41 [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K Alexey Budankov
  2019-12-03 11:43 ` [PATCH v5 1/3] tools bitmap: implement bitmap_equal() operation at bitmap API Alexey Budankov
@ 2019-12-03 11:44 ` Alexey Budankov
  2019-12-04 13:49   ` Arnaldo Carvalho de Melo
  2020-01-10 17:53   ` [tip: perf/core] perf mmap: Declare " tip-bot2 for Alexey Budankov
  2019-12-03 11:45 ` [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K Alexey Budankov
  2019-12-03 12:17 ` [PATCH v5 0/3] perf record: adapt NUMA awareness " Jiri Olsa
  3 siblings, 2 replies; 13+ messages in thread
From: Alexey Budankov @ 2019-12-03 11:44 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel


Declare a dedicated struct mmap_cpu_mask type for CPU masks of
arbitrary length. The mask is accessed through the bits pointer and
its length in bits is kept in the nbits field. The
MMAP_CPU_MASK_BYTES() macro returns the mask storage size in bytes.
The mmap_cpu_mask__scnprintf() function can be used to log a text
representation of the mask.
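
To make the layout concrete, here is a minimal standalone sketch
under stated assumptions: struct cpu_mask is a hypothetical stand-in
for struct mmap_cpu_mask, cpu_mask_bytes() mirrors what
MMAP_CPU_MASK_BYTES() computes, and cpu_mask_format() is an
illustrative formatter that prints set bits as a range list. It is
not the tools bitmap_scnprintf() implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct cpu_mask {		/* hypothetical stand-in for struct mmap_cpu_mask */
	unsigned long *bits;	/* storage, one bit per CPU */
	size_t nbits;		/* mask length in bits */
};

/* Storage size in bytes, rounded up to whole unsigned long words. */
static size_t cpu_mask_bytes(const struct cpu_mask *m)
{
	return (m->nbits + WORD_BITS - 1) / WORD_BITS * sizeof(unsigned long);
}

static int cpu_mask_test(const struct cpu_mask *m, size_t cpu)
{
	return (m->bits[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1UL;
}

/* Format set bits as a range list such as "0-2,5"; return length written. */
static size_t cpu_mask_format(const struct cpu_mask *m, char *buf, size_t size)
{
	size_t cpu = 0, len = 0;

	assert(size > 0);
	buf[0] = '\0';
	while (cpu < m->nbits && len < size - 1) {
		size_t first, last;
		int n;

		if (!cpu_mask_test(m, cpu)) {
			cpu++;
			continue;
		}
		first = cpu;
		while (cpu < m->nbits && cpu_mask_test(m, cpu))
			cpu++;
		last = cpu - 1;
		if (first == last)
			n = snprintf(buf + len, size - len, "%s%zu",
				     len ? "," : "", first);
		else
			n = snprintf(buf + len, size - len, "%s%zu-%zu",
				     len ? "," : "", first, last);
		if (n < 0)
			break;
		/* On truncation snprintf() stores only size - len - 1 chars. */
		len += (size_t)n < size - len ? (size_t)n : size - len - 1;
	}
	return len;
}
```

An empty mask formats as an empty string.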

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/util/mmap.c | 12 ++++++++++++
 tools/perf/util/mmap.h | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 063d1b93c53d..43c12b4a3e17 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -23,6 +23,18 @@
 #include "mmap.h"
 #include "../perf.h"
 #include <internal/lib.h> /* page_size */
+#include <linux/bitmap.h>
+
+#define MASK_SIZE 1023
+void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)
+{
+	char buf[MASK_SIZE + 1];
+	size_t len;
+
+	len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
+	buf[len] = '\0';
+	pr_debug("%p: %s mask[%zu]: %s\n", mask, tag, mask->nbits, buf);
+}
 
 size_t mmap__mmap_len(struct mmap *map)
 {
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index bee4e83f7109..ef51667fabcb 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -15,6 +15,15 @@
 #include "event.h"
 
 struct aiocb;
+
+struct mmap_cpu_mask {
+	unsigned long *bits;
+	size_t nbits;
+};
+
+#define MMAP_CPU_MASK_BYTES(m) \
+	(BITS_TO_LONGS(((struct mmap_cpu_mask *)m)->nbits) * sizeof(unsigned long))
+
 /**
  * struct mmap - perf's ring buffer mmap details
  *
@@ -52,4 +61,6 @@ int perf_mmap__push(struct mmap *md, void *to,
 
 size_t mmap__mmap_len(struct mmap *map);
 
+void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag);
+
 #endif /*__PERF_MMAP_H */
-- 
2.20.1




* [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K
  2019-12-03 11:41 [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K Alexey Budankov
  2019-12-03 11:43 ` [PATCH v5 1/3] tools bitmap: implement bitmap_equal() operation at bitmap API Alexey Budankov
  2019-12-03 11:44 ` [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length Alexey Budankov
@ 2019-12-03 11:45 ` Alexey Budankov
  2019-12-04 13:48   ` Arnaldo Carvalho de Melo
  2020-01-10 17:53   ` [tip: perf/core] perf record: Adapt " tip-bot2 for Alexey Budankov
  2019-12-03 12:17 ` [PATCH v5 0/3] perf record: adapt NUMA awareness " Jiri Olsa
  3 siblings, 2 replies; 13+ messages in thread
From: Alexey Budankov @ 2019-12-03 11:45 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel


Use the struct mmap_cpu_mask type for the tool's thread and mmap data
buffers to overcome the current 1024-CPU mask size limit of the glibc
cpu_set_t type.

The tools bitmap API is used to manipulate objects of struct
mmap_cpu_mask type.
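
The general idea can be sketched outside of perf as follows. This is
an illustration under stated assumptions, not the patch itself:
struct cpu_mask and the cpu_mask_*() helpers are hypothetical names,
and the only real API used is sched_setaffinity(), which takes the
mask size in bytes and therefore accepts a buffer larger than
sizeof(cpu_set_t).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for sched_setaffinity() and cpu_set_t */
#endif
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct cpu_mask {		/* hypothetical stand-in for struct mmap_cpu_mask */
	unsigned long *bits;
	size_t nbits;
};

static size_t cpu_mask_bytes(const struct cpu_mask *m)
{
	return (m->nbits + WORD_BITS - 1) / WORD_BITS * sizeof(unsigned long);
}

/* Allocate a zeroed mask of nbits bits; return 0 on success, -1 on failure. */
static int cpu_mask_init(struct cpu_mask *m, size_t nbits)
{
	assert(nbits > 0);
	m->nbits = nbits;
	m->bits = calloc((nbits + WORD_BITS - 1) / WORD_BITS,
			 sizeof(unsigned long));
	return m->bits ? 0 : -1;
}

static void cpu_mask_set(struct cpu_mask *m, size_t cpu)
{
	assert(cpu < m->nbits);
	m->bits[cpu / WORD_BITS] |= 1UL << (cpu % WORD_BITS);
}

/*
 * Apply the mask to the calling thread. The size argument tells the
 * kernel how many bytes to read, so the buffer may exceed cpu_set_t.
 */
static int cpu_mask_apply(const struct cpu_mask *m)
{
	return sched_setaffinity(0, cpu_mask_bytes(m),
				 (const cpu_set_t *)m->bits);
}
```

Note that cpu_mask_apply() fails with -1 when the mask selects no CPU
that is present and permitted for the thread, so callers should check
the return value.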

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
 tools/perf/builtin-record.c | 28 ++++++++++++++++++++++------
 tools/perf/util/mmap.c      | 28 ++++++++++++++++++++++------
 tools/perf/util/mmap.h      |  2 +-
 3 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fb19ef63cc35..7bc83755ef8c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -62,6 +62,7 @@
 #include <linux/string.h>
 #include <linux/time64.h>
 #include <linux/zalloc.h>
+#include <linux/bitmap.h>
 
 struct switch_output {
 	bool		 enabled;
@@ -93,7 +94,7 @@ struct record {
 	bool			timestamp_boundary;
 	struct switch_output	switch_output;
 	unsigned long long	samples;
-	cpu_set_t		affinity_mask;
+	struct mmap_cpu_mask	affinity_mask;
 	unsigned long		output_max_size;	/* = 0: unlimited */
 };
 
@@ -961,10 +962,15 @@ static struct perf_event_header finished_round_event = {
 static void record__adjust_affinity(struct record *rec, struct mmap *map)
 {
 	if (rec->opts.affinity != PERF_AFFINITY_SYS &&
-	    !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
-		CPU_ZERO(&rec->affinity_mask);
-		CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
-		sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
+	    !bitmap_equal(rec->affinity_mask.bits, map->affinity_mask.bits,
+			  rec->affinity_mask.nbits)) {
+		bitmap_zero(rec->affinity_mask.bits, rec->affinity_mask.nbits);
+		bitmap_or(rec->affinity_mask.bits, rec->affinity_mask.bits,
+			  map->affinity_mask.bits, rec->affinity_mask.nbits);
+		sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&rec->affinity_mask),
+				  (cpu_set_t *)rec->affinity_mask.bits);
+		if (verbose == 2)
+			mmap_cpu_mask__scnprintf(&rec->affinity_mask, "thread");
 	}
 }
 
@@ -2433,7 +2439,6 @@ int cmd_record(int argc, const char **argv)
 # undef REASON
 #endif
 
-	CPU_ZERO(&rec->affinity_mask);
 	rec->opts.affinity = PERF_AFFINITY_SYS;
 
 	rec->evlist = evlist__new();
@@ -2499,6 +2504,16 @@ int cmd_record(int argc, const char **argv)
 
 	symbol__init(NULL);
 
+	if (rec->opts.affinity != PERF_AFFINITY_SYS) {
+		rec->affinity_mask.nbits = cpu__max_cpu();
+		rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
+		if (!rec->affinity_mask.bits) {
+			pr_err("Failed to allocate thread mask for %zu cpus\n", rec->affinity_mask.nbits);
+			return -ENOMEM;
+		}
+		pr_debug2("thread mask[%zu]: empty\n", rec->affinity_mask.nbits);
+	}
+
 	err = record__auxtrace_init(rec);
 	if (err)
 		goto out;
@@ -2613,6 +2628,7 @@ int cmd_record(int argc, const char **argv)
 
 	err = __cmd_record(&record, argc, argv);
 out:
+	bitmap_free(rec->affinity_mask.bits);
 	evlist__delete(rec->evlist);
 	symbol__exit();
 	auxtrace_record__free(rec->itr);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 43c12b4a3e17..832d2cb94b2c 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -219,6 +219,8 @@ static void perf_mmap__aio_munmap(struct mmap *map __maybe_unused)
 
 void mmap__munmap(struct mmap *map)
 {
+	bitmap_free(map->affinity_mask.bits);
+
 	perf_mmap__aio_munmap(map);
 	if (map->data != NULL) {
 		munmap(map->data, mmap__mmap_len(map));
@@ -227,7 +229,7 @@ void mmap__munmap(struct mmap *map)
 	auxtrace_mmap__munmap(&map->auxtrace_mmap);
 }
 
-static void build_node_mask(int node, cpu_set_t *mask)
+static void build_node_mask(int node, struct mmap_cpu_mask *mask)
 {
 	int c, cpu, nr_cpus;
 	const struct perf_cpu_map *cpu_map = NULL;
@@ -240,17 +242,23 @@ static void build_node_mask(int node, cpu_set_t *mask)
 	for (c = 0; c < nr_cpus; c++) {
 		cpu = cpu_map->map[c]; /* map c index to online cpu index */
 		if (cpu__get_node(cpu) == node)
-			CPU_SET(cpu, mask);
+			set_bit(cpu, mask->bits);
 	}
 }
 
-static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
+static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
 {
-	CPU_ZERO(&map->affinity_mask);
+	map->affinity_mask.nbits = cpu__max_cpu();
+	map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
+	if (!map->affinity_mask.bits)
+		return -1;
+
 	if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
 		build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
 	else if (mp->affinity == PERF_AFFINITY_CPU)
-		CPU_SET(map->core.cpu, &map->affinity_mask);
+		set_bit(map->core.cpu, map->affinity_mask.bits);
+
+	return 0;
 }
 
 int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
@@ -261,7 +269,15 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
 		return -1;
 	}
 
-	perf_mmap__setup_affinity_mask(map, mp);
+	if (mp->affinity != PERF_AFFINITY_SYS &&
+		perf_mmap__setup_affinity_mask(map, mp)) {
+		pr_debug2("failed to alloc mmap affinity mask, error %d\n",
+			  errno);
+		return -1;
+	}
+
+	if (verbose == 2)
+		mmap_cpu_mask__scnprintf(&map->affinity_mask, "mmap");
 
 	map->core.flush = mp->flush;
 
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index ef51667fabcb..9d5f589f02ae 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -40,7 +40,7 @@ struct mmap {
 		int		 nr_cblocks;
 	} aio;
 #endif
-	cpu_set_t	affinity_mask;
+	struct mmap_cpu_mask	affinity_mask;
 	void		*data;
 	int		comp_level;
 };
-- 
2.20.1




* Re: [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K
  2019-12-03 11:41 [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K Alexey Budankov
                   ` (2 preceding siblings ...)
  2019-12-03 11:45 ` [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K Alexey Budankov
@ 2019-12-03 12:17 ` Jiri Olsa
  2019-12-03 18:36   ` Arnaldo Carvalho de Melo
  3 siblings, 1 reply; 13+ messages in thread
From: Jiri Olsa @ 2019-12-03 12:17 UTC
  To: Alexey Budankov
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Andi Kleen, linux-kernel

On Tue, Dec 03, 2019 at 02:41:29PM +0300, Alexey Budankov wrote:
> 
> Current implementation of cpu_set_t type by glibc has internal cpu
> mask size limitation of no more than 1024 CPUs. This limitation confines
> NUMA awareness of Perf tool in record mode, thru --affinity option,
> to the first 1024 CPUs on machines with larger amount of CPUs.
> 
> This patch set enables Perf tool to overcome 1024 CPUs limitation by
> using a dedicated struct mmap_cpu_mask type and applying tool's bitmap
> API operations to manipulate affinity masks of the tool's thread and
> the mmaped data buffers.
> 
> tools bitmap API has been extended with bitmap_free() function and
> bitmap_equal() operation whose implementation is derived from the
> kernel one.
> 
> ---
> Changes in v5:
> - avoided allocation of mmap affinity masks in case of 
>   rec->opts.affinity == PERF_AFFINITY_SYS

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka

> Changes in v4:
> - renamed perf_mmap__print_cpu_mask() to mmap_cpu_mask__scnprintf()
> - avoided checking mask bits for NULL prior calling bitmask_free()
> - avoided thread affinity mask allocation for case of 
>   rec->opts.affinity == PERF_AFFINITY_SYS
> Changes in v3:
> - implemented perf_mmap__print_cpu_mask() function
> - use perf_mmap__print_cpu_mask() to log thread and mmap cpus masks
>   when verbose level is equal to 2
> Changes in v2:
> - implemented bitmap_free() for symmetry with bitmap_alloc()
> - capitalized MMAP_CPU_MASK_BYTES() macro
> - returned -1 from perf_mmap__setup_affinity_mask()
> - implemented releasing of masks using bitmap_free()
> - moved debug printing under -vv option
> 
> ---
> Alexey Budankov (3):
>   tools bitmap: implement bitmap_equal() operation at bitmap API
>   perf mmap: declare type for cpu mask of arbitrary length
>   perf record: adapt affinity to machines with #CPUs > 1K
> 
>  tools/include/linux/bitmap.h | 30 +++++++++++++++++++++++++++
>  tools/lib/bitmap.c           | 15 ++++++++++++++
>  tools/perf/builtin-record.c  | 28 +++++++++++++++++++------
>  tools/perf/util/mmap.c       | 40 ++++++++++++++++++++++++++++++------
>  tools/perf/util/mmap.h       | 13 +++++++++++-
>  5 files changed, 113 insertions(+), 13 deletions(-)
> 
> ---
> Validation:
> 
> # tools/perf/perf record -vv -- ls
> Using CPUID GenuineIntel-6-5E-3
> intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
> nr_cblocks: 0
> affinity: SYS
> mmap flush: 1
> comp level: 0
> ------------------------------------------------------------
> perf_event_attr:
>   size                             120
>   { sample_period, sample_freq }   4000
>   sample_type                      IP|TID|TIME|PERIOD
>   read_format                      ID
>   disabled                         1
>   inherit                          1
>   mmap                             1
>   comm                             1
>   freq                             1
>   enable_on_exec                   1
>   task                             1
>   precise_ip                       3
>   sample_id_all                    1
>   exclude_guest                    1
>   mmap2                            1
>   comm_exec                        1
>   ksymbol                          1
>   bpf_event                        1
> ------------------------------------------------------------
> sys_perf_event_open: pid 23718  cpu 0  group_fd -1  flags 0x8 = 4
> sys_perf_event_open: pid 23718  cpu 1  group_fd -1  flags 0x8 = 5
> sys_perf_event_open: pid 23718  cpu 2  group_fd -1  flags 0x8 = 6
> sys_perf_event_open: pid 23718  cpu 3  group_fd -1  flags 0x8 = 9
> sys_perf_event_open: pid 23718  cpu 4  group_fd -1  flags 0x8 = 10
> sys_perf_event_open: pid 23718  cpu 5  group_fd -1  flags 0x8 = 11
> sys_perf_event_open: pid 23718  cpu 6  group_fd -1  flags 0x8 = 12
> sys_perf_event_open: pid 23718  cpu 7  group_fd -1  flags 0x8 = 13
> mmap size 528384B
> 0x7f3e06e060b8: mmap mask[8]: 
> 0x7f3e06e16180: mmap mask[8]: 
> 0x7f3e06e26248: mmap mask[8]: 
> 0x7f3e06e36310: mmap mask[8]: 
> 0x7f3e06e463d8: mmap mask[8]: 
> 0x7f3e06e564a0: mmap mask[8]: 
> 0x7f3e06e66568: mmap mask[8]: 
> 0x7f3e06e76630: mmap mask[8]: 
> ------------------------------------------------------------
> perf_event_attr:
>   type                             1
>   size                             120
>   config                           0x9
>   watermark                        1
>   sample_id_all                    1
>   bpf_event                        1
>   { wakeup_events, wakeup_watermark } 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
> sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
> sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
> sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
> sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
> sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
> sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
> mmap size 528384B
> 0x7f3e0697d0b8: mmap mask[8]: 
> 0x7f3e0698d180: mmap mask[8]: 
> 0x7f3e0699d248: mmap mask[8]: 
> 0x7f3e069ad310: mmap mask[8]: 
> 0x7f3e069bd3d8: mmap mask[8]: 
> 0x7f3e069cd4a0: mmap mask[8]: 
> 0x7f3e069dd568: mmap mask[8]: 
> 0x7f3e069ed630: mmap mask[8]: 
> Synthesizing TSC conversion information
> arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
> block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
> certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
> config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
> [ perf record: Woken up 1 times to write data ]
> Looking at the vmlinux_path (8 entries long)
> Using vmlinux for symbols
> [ perf record: Captured and wrote 0.013 MB perf.data (8 samples) ]
> 
> tools/perf/perf record -vv --affinity=cpu -- ls
> thread mask[8]: empty
> Using CPUID GenuineIntel-6-5E-3
> intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
> nr_cblocks: 0
> affinity: CPU
> mmap flush: 1
> comp level: 0
> ------------------------------------------------------------
> perf_event_attr:
>   size                             120
>   { sample_period, sample_freq }   4000
>   sample_type                      IP|TID|TIME|PERIOD
>   read_format                      ID
>   disabled                         1
>   inherit                          1
>   mmap                             1
>   comm                             1
>   freq                             1
>   enable_on_exec                   1
>   task                             1
>   precise_ip                       3
>   sample_id_all                    1
>   exclude_guest                    1
>   mmap2                            1
>   comm_exec                        1
>   ksymbol                          1
>   bpf_event                        1
> ------------------------------------------------------------
> sys_perf_event_open: pid 23713  cpu 0  group_fd -1  flags 0x8 = 4
> sys_perf_event_open: pid 23713  cpu 1  group_fd -1  flags 0x8 = 5
> sys_perf_event_open: pid 23713  cpu 2  group_fd -1  flags 0x8 = 6
> sys_perf_event_open: pid 23713  cpu 3  group_fd -1  flags 0x8 = 9
> sys_perf_event_open: pid 23713  cpu 4  group_fd -1  flags 0x8 = 10
> sys_perf_event_open: pid 23713  cpu 5  group_fd -1  flags 0x8 = 11
> sys_perf_event_open: pid 23713  cpu 6  group_fd -1  flags 0x8 = 12
> sys_perf_event_open: pid 23713  cpu 7  group_fd -1  flags 0x8 = 13
> mmap size 528384B
> 0x7f3e005bc0b8: mmap mask[8]: 0
> 0x7f3e005cc180: mmap mask[8]: 1
> 0x7f3e005dc248: mmap mask[8]: 2
> 0x7f3e005ec310: mmap mask[8]: 3
> 0x7f3e005fc3d8: mmap mask[8]: 4
> 0x7f3e0060c4a0: mmap mask[8]: 5
> 0x7f3e0061c568: mmap mask[8]: 6
> 0x7f3e0062c630: mmap mask[8]: 7
> ------------------------------------------------------------
> perf_event_attr:
>   type                             1
>   size                             120
>   config                           0x9
>   watermark                        1
>   sample_id_all                    1
>   bpf_event                        1
>   { wakeup_events, wakeup_watermark } 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
> sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
> sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
> sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
> sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
> sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
> sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
> sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
> mmap size 528384B
> 0x7f3e001330b8: mmap mask[8]: 
> 0x7f3e00143180: mmap mask[8]: 
> 0x7f3e00153248: mmap mask[8]: 
> 0x7f3e00163310: mmap mask[8]: 
> 0x7f3e001733d8: mmap mask[8]: 
> 0x7f3e001834a0: mmap mask[8]: 
> 0x7f3e00193568: mmap mask[8]: 
> 0x7f3e001a3630: mmap mask[8]: 
> Synthesizing TSC conversion information
> 0x9c9ff0: thread mask[8]: 0
> 0x9c9ff0: thread mask[8]: 1
> 0x9c9ff0: thread mask[8]: 2
> 0x9c9ff0: thread mask[8]: 3
> 0x9c9ff0: thread mask[8]: 4
> arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
> block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
> certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
> config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
> 0x9c9ff0: thread mask[8]: 5
> 0x9c9ff0: thread mask[8]: 6
> 0x9c9ff0: thread mask[8]: 7
> 0x9c9ff0: thread mask[8]: 0
> 0x9c9ff0: thread mask[8]: 1
> 0x9c9ff0: thread mask[8]: 2
> 0x9c9ff0: thread mask[8]: 3
> 0x9c9ff0: thread mask[8]: 4
> 0x9c9ff0: thread mask[8]: 5
> 0x9c9ff0: thread mask[8]: 6
> 0x9c9ff0: thread mask[8]: 7
> [ perf record: Woken up 0 times to write data ]
> 0x9c9ff0: thread mask[8]: 0
> 0x9c9ff0: thread mask[8]: 1
> 0x9c9ff0: thread mask[8]: 2
> 0x9c9ff0: thread mask[8]: 3
> 0x9c9ff0: thread mask[8]: 4
> 0x9c9ff0: thread mask[8]: 5
> 0x9c9ff0: thread mask[8]: 6
> 0x9c9ff0: thread mask[8]: 7
> Looking at the vmlinux_path (8 entries long)
> Using vmlinux for symbols
> ...
> [ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K
  2019-12-03 12:17 ` [PATCH v5 0/3] perf record: adapt NUMA awareness " Jiri Olsa
@ 2019-12-03 18:36   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-12-03 18:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexey Budankov, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Ingo Molnar, Andi Kleen, linux-kernel

Em Tue, Dec 03, 2019 at 01:17:45PM +0100, Jiri Olsa escreveu:
> On Tue, Dec 03, 2019 at 02:41:29PM +0300, Alexey Budankov wrote:
> > 
> > Current implementation of cpu_set_t type by glibc has internal cpu
> > mask size limitation of no more than 1024 CPUs. This limitation confines
> > NUMA awareness of Perf tool in record mode, thru --affinity option,
> > to the first 1024 CPUs on machines with larger amount of CPUs.
> > 
> > This patch set enables Perf tool to overcome 1024 CPUs limitation by
> > using a dedicated struct mmap_cpu_mask type and applying tool's bitmap
> > API operations to manipulate affinity masks of the tool's thread and
> > the mmaped data buffers.
> > 
> > tools bitmap API has been extended with bitmap_free() function and
> > bitmap_equal() operation whose implementation is derived from the
> > kernel one.
> > 
> > ---
> > Changes in v5:
> > - avoided allocation of mmap affinity masks in case of 
> >   rec->opts.affinity == PERF_AFFINITY_SYS
> 
> Acked-by: Jiri Olsa <jolsa@redhat.com>

Applied to my local perf/core branch, going thru tests.

- Arnaldo
 
> thanks,
> jirka
> 
> > Changes in v4:
> > - renamed perf_mmap__print_cpu_mask() to mmap_cpu_mask__scnprintf()
> > - avoided checking mask bits for NULL prior to calling bitmap_free()
> > - avoided thread affinity mask allocation for case of 
> >   rec->opts.affinity == PERF_AFFINITY_SYS
> > Changes in v3:
> > - implemented perf_mmap__print_cpu_mask() function
> > - use perf_mmap__print_cpu_mask() to log thread and mmap cpus masks
> >   when verbose level is equal to 2
> > Changes in v2:
> > - implemented bitmap_free() for symmetry with bitmap_alloc()
> > - capitalized MMAP_CPU_MASK_BYTES() macro
> > - returned -1 from perf_mmap__setup_affinity_mask()
> > - implemented releasing of masks using bitmap_free()
> > - moved debug printing under -vv option
> > 
> > ---
> > Alexey Budankov (3):
> >   tools bitmap: implement bitmap_equal() operation at bitmap API
> >   perf mmap: declare type for cpu mask of arbitrary length
> >   perf record: adapt affinity to machines with #CPUs > 1K
> > 
> >  tools/include/linux/bitmap.h | 30 +++++++++++++++++++++++++++
> >  tools/lib/bitmap.c           | 15 ++++++++++++++
> >  tools/perf/builtin-record.c  | 28 +++++++++++++++++++------
> >  tools/perf/util/mmap.c       | 40 ++++++++++++++++++++++++++++++------
> >  tools/perf/util/mmap.h       | 13 +++++++++++-
> >  5 files changed, 113 insertions(+), 13 deletions(-)
> > 
> > ---
> > Validation:
> > 
> > # tools/perf/perf record -vv -- ls
> > Using CPUID GenuineIntel-6-5E-3
> > intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
> > nr_cblocks: 0
> > affinity: SYS
> > mmap flush: 1
> > comp level: 0
> > ------------------------------------------------------------
> > perf_event_attr:
> >   size                             120
> >   { sample_period, sample_freq }   4000
> >   sample_type                      IP|TID|TIME|PERIOD
> >   read_format                      ID
> >   disabled                         1
> >   inherit                          1
> >   mmap                             1
> >   comm                             1
> >   freq                             1
> >   enable_on_exec                   1
> >   task                             1
> >   precise_ip                       3
> >   sample_id_all                    1
> >   exclude_guest                    1
> >   mmap2                            1
> >   comm_exec                        1
> >   ksymbol                          1
> >   bpf_event                        1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 23718  cpu 0  group_fd -1  flags 0x8 = 4
> > sys_perf_event_open: pid 23718  cpu 1  group_fd -1  flags 0x8 = 5
> > sys_perf_event_open: pid 23718  cpu 2  group_fd -1  flags 0x8 = 6
> > sys_perf_event_open: pid 23718  cpu 3  group_fd -1  flags 0x8 = 9
> > sys_perf_event_open: pid 23718  cpu 4  group_fd -1  flags 0x8 = 10
> > sys_perf_event_open: pid 23718  cpu 5  group_fd -1  flags 0x8 = 11
> > sys_perf_event_open: pid 23718  cpu 6  group_fd -1  flags 0x8 = 12
> > sys_perf_event_open: pid 23718  cpu 7  group_fd -1  flags 0x8 = 13
> > mmap size 528384B
> > 0x7f3e06e060b8: mmap mask[8]: 
> > 0x7f3e06e16180: mmap mask[8]: 
> > 0x7f3e06e26248: mmap mask[8]: 
> > 0x7f3e06e36310: mmap mask[8]: 
> > 0x7f3e06e463d8: mmap mask[8]: 
> > 0x7f3e06e564a0: mmap mask[8]: 
> > 0x7f3e06e66568: mmap mask[8]: 
> > 0x7f3e06e76630: mmap mask[8]: 
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             1
> >   size                             120
> >   config                           0x9
> >   watermark                        1
> >   sample_id_all                    1
> >   bpf_event                        1
> >   { wakeup_events, wakeup_watermark } 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
> > sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
> > sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
> > sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
> > sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
> > sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
> > sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
> > mmap size 528384B
> > 0x7f3e0697d0b8: mmap mask[8]: 
> > 0x7f3e0698d180: mmap mask[8]: 
> > 0x7f3e0699d248: mmap mask[8]: 
> > 0x7f3e069ad310: mmap mask[8]: 
> > 0x7f3e069bd3d8: mmap mask[8]: 
> > 0x7f3e069cd4a0: mmap mask[8]: 
> > 0x7f3e069dd568: mmap mask[8]: 
> > 0x7f3e069ed630: mmap mask[8]: 
> > Synthesizing TSC conversion information
> > arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
> > block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
> > certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
> > config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
> > [ perf record: Woken up 1 times to write data ]
> > Looking at the vmlinux_path (8 entries long)
> > Using vmlinux for symbols
> > [ perf record: Captured and wrote 0.013 MB perf.data (8 samples) ]
> > 
> > tools/perf/perf record -vv --affinity=cpu -- ls
> > thread mask[8]: empty
> > Using CPUID GenuineIntel-6-5E-3
> > intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch
> > nr_cblocks: 0
> > affinity: CPU
> > mmap flush: 1
> > comp level: 0
> > ------------------------------------------------------------
> > perf_event_attr:
> >   size                             120
> >   { sample_period, sample_freq }   4000
> >   sample_type                      IP|TID|TIME|PERIOD
> >   read_format                      ID
> >   disabled                         1
> >   inherit                          1
> >   mmap                             1
> >   comm                             1
> >   freq                             1
> >   enable_on_exec                   1
> >   task                             1
> >   precise_ip                       3
> >   sample_id_all                    1
> >   exclude_guest                    1
> >   mmap2                            1
> >   comm_exec                        1
> >   ksymbol                          1
> >   bpf_event                        1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 23713  cpu 0  group_fd -1  flags 0x8 = 4
> > sys_perf_event_open: pid 23713  cpu 1  group_fd -1  flags 0x8 = 5
> > sys_perf_event_open: pid 23713  cpu 2  group_fd -1  flags 0x8 = 6
> > sys_perf_event_open: pid 23713  cpu 3  group_fd -1  flags 0x8 = 9
> > sys_perf_event_open: pid 23713  cpu 4  group_fd -1  flags 0x8 = 10
> > sys_perf_event_open: pid 23713  cpu 5  group_fd -1  flags 0x8 = 11
> > sys_perf_event_open: pid 23713  cpu 6  group_fd -1  flags 0x8 = 12
> > sys_perf_event_open: pid 23713  cpu 7  group_fd -1  flags 0x8 = 13
> > mmap size 528384B
> > 0x7f3e005bc0b8: mmap mask[8]: 0
> > 0x7f3e005cc180: mmap mask[8]: 1
> > 0x7f3e005dc248: mmap mask[8]: 2
> > 0x7f3e005ec310: mmap mask[8]: 3
> > 0x7f3e005fc3d8: mmap mask[8]: 4
> > 0x7f3e0060c4a0: mmap mask[8]: 5
> > 0x7f3e0061c568: mmap mask[8]: 6
> > 0x7f3e0062c630: mmap mask[8]: 7
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             1
> >   size                             120
> >   config                           0x9
> >   watermark                        1
> >   sample_id_all                    1
> >   bpf_event                        1
> >   { wakeup_events, wakeup_watermark } 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 14
> > sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 15
> > sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 16
> > sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 17
> > sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 18
> > sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 19
> > sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 20
> > sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 21
> > mmap size 528384B
> > 0x7f3e001330b8: mmap mask[8]: 
> > 0x7f3e00143180: mmap mask[8]: 
> > 0x7f3e00153248: mmap mask[8]: 
> > 0x7f3e00163310: mmap mask[8]: 
> > 0x7f3e001733d8: mmap mask[8]: 
> > 0x7f3e001834a0: mmap mask[8]: 
> > 0x7f3e00193568: mmap mask[8]: 
> > 0x7f3e001a3630: mmap mask[8]: 
> > Synthesizing TSC conversion information
> > 0x9c9ff0: thread mask[8]: 0
> > 0x9c9ff0: thread mask[8]: 1
> > 0x9c9ff0: thread mask[8]: 2
> > 0x9c9ff0: thread mask[8]: 3
> > 0x9c9ff0: thread mask[8]: 4
> > arch			      copy     Documentation  init     kernel	 MAINTAINERS	  modules.builtin.modinfo  perf.data	  scripts   System.map	vmlinux
> > block			      COPYING  drivers	      ipc      lbuild	 Makefile	  modules.order		   perf.data.old  security  tools	vmlinux.o
> > certs			      CREDITS  fs	      Kbuild   lib	 mm		  Module.symvers	   README	  sound     usr
> > config-5.2.7-100.fc29.x86_64  crypto   include	      Kconfig  LICENSES  modules.builtin  net			   samples	  stdio     virt
> > 0x9c9ff0: thread mask[8]: 5
> > 0x9c9ff0: thread mask[8]: 6
> > 0x9c9ff0: thread mask[8]: 7
> > 0x9c9ff0: thread mask[8]: 0
> > 0x9c9ff0: thread mask[8]: 1
> > 0x9c9ff0: thread mask[8]: 2
> > 0x9c9ff0: thread mask[8]: 3
> > 0x9c9ff0: thread mask[8]: 4
> > 0x9c9ff0: thread mask[8]: 5
> > 0x9c9ff0: thread mask[8]: 6
> > 0x9c9ff0: thread mask[8]: 7
> > [ perf record: Woken up 0 times to write data ]
> > 0x9c9ff0: thread mask[8]: 0
> > 0x9c9ff0: thread mask[8]: 1
> > 0x9c9ff0: thread mask[8]: 2
> > 0x9c9ff0: thread mask[8]: 3
> > 0x9c9ff0: thread mask[8]: 4
> > 0x9c9ff0: thread mask[8]: 5
> > 0x9c9ff0: thread mask[8]: 6
> > 0x9c9ff0: thread mask[8]: 7
> > Looking at the vmlinux_path (8 entries long)
> > Using vmlinux for symbols
> > ...
> > [ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]
> > 

-- 

- Arnaldo
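
For readers following the cover letter's mention of bitmap_equal(): the
sketch below shows one common way such a comparison can be written for a
mask whose length is not a multiple of the word size. It is an
illustration only, not the code from patch 1/3; the names
sketch_bitmap_equal() and SKETCH_BITS_PER_LONG are hypothetical. Whole
words are compared directly, and only the valid low bits of a trailing
partial word are taken into account.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name: number of bits in one unsigned long word. */
#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Illustrative comparison of the first nbits bits of two bitmaps.
 * Bits at positions >= nbits in the last word are ignored.
 */
static bool sketch_bitmap_equal(const unsigned long *a,
				const unsigned long *b, size_t nbits)
{
	size_t full = nbits / SKETCH_BITS_PER_LONG;
	size_t tail = nbits % SKETCH_BITS_PER_LONG;
	size_t k;

	/* Compare all completely used words. */
	for (k = 0; k < full; k++) {
		if (a[k] != b[k])
			return false;
	}

	/* Compare only the low 'tail' bits of the partial last word. */
	if (tail) {
		unsigned long mask = (1UL << tail) - 1;

		if ((a[full] ^ b[full]) & mask)
			return false;
	}

	return true;
}
```

Masking the tail matters because storage is allocated in whole words, so
the bits past nbits in the last word may hold anything.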

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K
  2019-12-03 11:45 ` [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K Alexey Budankov
@ 2019-12-04 13:48   ` Arnaldo Carvalho de Melo
  2019-12-05  7:30     ` Alexey Budankov
  2020-01-10 17:53   ` [tip: perf/core] perf record: Adapt " tip-bot2 for Alexey Budankov
  1 sibling, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-12-04 13:48 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel

Em Tue, Dec 03, 2019 at 02:45:27PM +0300, Alexey Budankov escreveu:
> 
> Use struct mmap_cpu_mask type for tool's thread and mmap data
> buffers to overcome current 1024 CPUs mask size limitation of
> cpu_set_t type.
> 
> Currently glibc cpu_set_t type has internal mask size limit
> of 1024 CPUs. Moving to struct mmap_cpu_mask type allows
> overcoming that limit. tools bitmap API is used to manipulate
> objects of struct mmap_cpu_mask type.

Had to apply this to fix the build in some toolchains/arches:

[acme@quaco perf]$ git diff
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7bc83755ef8c..4c301466101b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2508,10 +2508,10 @@ int cmd_record(int argc, const char **argv)
                rec->affinity_mask.nbits = cpu__max_cpu();
                rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
                if (!rec->affinity_mask.bits) {
-                       pr_err("Failed to allocate thread mask for %ld cpus\n", rec->affinity_mask.nbits);
+                       pr_err("Failed to allocate thread mask for %zd cpus\n", rec->affinity_mask.nbits);
                        return -ENOMEM;
                }
-               pr_debug2("thread mask[%ld]: empty\n", rec->affinity_mask.nbits);
+               pr_debug2("thread mask[%zd]: empty\n", rec->affinity_mask.nbits);
        }

        err = record__auxtrace_init(rec);
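
Some background on why the specifier matters, as a standalone sketch
rather than perf code. It assumes nbits is a size_t, which the %zd fix
implies but which is not shown in this message. size_t is not the same
type as long on every ABI, so %ld can trigger format warnings (and build
failures under -Werror) on some targets. The helper name
format_mask_header() is hypothetical. Note that %zu is the C99
conversion that matches size_t exactly; %zd is its signed counterpart
and prints the same text for values like these.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of perf: formats a header similar to
 * the "thread mask[8]: empty" debug line for a size_t bit count.
 * The 'z' length modifier makes the conversion follow the width of
 * size_t on whatever ABI the code is built for.
 */
static int format_mask_header(char *buf, size_t size,
			      const char *tag, size_t nbits)
{
	return snprintf(buf, size, "%s mask[%zu]: empty", tag, nbits);
}
```

Called as format_mask_header(buf, sizeof(buf), "thread", 8), this fills
buf with "thread mask[8]: empty".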


 
> Reported-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  tools/perf/builtin-record.c | 28 ++++++++++++++++++++++------
>  tools/perf/util/mmap.c      | 28 ++++++++++++++++++++++------
>  tools/perf/util/mmap.h      |  2 +-
>  3 files changed, 45 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index fb19ef63cc35..7bc83755ef8c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -62,6 +62,7 @@
>  #include <linux/string.h>
>  #include <linux/time64.h>
>  #include <linux/zalloc.h>
> +#include <linux/bitmap.h>
>  
>  struct switch_output {
>  	bool		 enabled;
> @@ -93,7 +94,7 @@ struct record {
>  	bool			timestamp_boundary;
>  	struct switch_output	switch_output;
>  	unsigned long long	samples;
> -	cpu_set_t		affinity_mask;
> +	struct mmap_cpu_mask	affinity_mask;
>  	unsigned long		output_max_size;	/* = 0: unlimited */
>  };
>  
> @@ -961,10 +962,15 @@ static struct perf_event_header finished_round_event = {
>  static void record__adjust_affinity(struct record *rec, struct mmap *map)
>  {
>  	if (rec->opts.affinity != PERF_AFFINITY_SYS &&
> -	    !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
> -		CPU_ZERO(&rec->affinity_mask);
> -		CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
> -		sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
> +	    !bitmap_equal(rec->affinity_mask.bits, map->affinity_mask.bits,
> +			  rec->affinity_mask.nbits)) {
> +		bitmap_zero(rec->affinity_mask.bits, rec->affinity_mask.nbits);
> +		bitmap_or(rec->affinity_mask.bits, rec->affinity_mask.bits,
> +			  map->affinity_mask.bits, rec->affinity_mask.nbits);
> +		sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&rec->affinity_mask),
> +				  (cpu_set_t *)rec->affinity_mask.bits);
> +		if (verbose == 2)
> +			mmap_cpu_mask__scnprintf(&rec->affinity_mask, "thread");
>  	}
>  }
>  
> @@ -2433,7 +2439,6 @@ int cmd_record(int argc, const char **argv)
>  # undef REASON
>  #endif
>  
> -	CPU_ZERO(&rec->affinity_mask);
>  	rec->opts.affinity = PERF_AFFINITY_SYS;
>  
>  	rec->evlist = evlist__new();
> @@ -2499,6 +2504,16 @@ int cmd_record(int argc, const char **argv)
>  
>  	symbol__init(NULL);
>  
> +	if (rec->opts.affinity != PERF_AFFINITY_SYS) {
> +		rec->affinity_mask.nbits = cpu__max_cpu();
> +		rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
> +		if (!rec->affinity_mask.bits) {
> +			pr_err("Failed to allocate thread mask for %ld cpus\n", rec->affinity_mask.nbits);
> +			return -ENOMEM;
> +		}
> +		pr_debug2("thread mask[%ld]: empty\n", rec->affinity_mask.nbits);
> +	}
> +
>  	err = record__auxtrace_init(rec);
>  	if (err)
>  		goto out;
> @@ -2613,6 +2628,7 @@ int cmd_record(int argc, const char **argv)
>  
>  	err = __cmd_record(&record, argc, argv);
>  out:
> +	bitmap_free(rec->affinity_mask.bits);
>  	evlist__delete(rec->evlist);
>  	symbol__exit();
>  	auxtrace_record__free(rec->itr);
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 43c12b4a3e17..832d2cb94b2c 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -219,6 +219,8 @@ static void perf_mmap__aio_munmap(struct mmap *map __maybe_unused)
>  
>  void mmap__munmap(struct mmap *map)
>  {
> +	bitmap_free(map->affinity_mask.bits);
> +
>  	perf_mmap__aio_munmap(map);
>  	if (map->data != NULL) {
>  		munmap(map->data, mmap__mmap_len(map));
> @@ -227,7 +229,7 @@ void mmap__munmap(struct mmap *map)
>  	auxtrace_mmap__munmap(&map->auxtrace_mmap);
>  }
>  
> -static void build_node_mask(int node, cpu_set_t *mask)
> +static void build_node_mask(int node, struct mmap_cpu_mask *mask)
>  {
>  	int c, cpu, nr_cpus;
>  	const struct perf_cpu_map *cpu_map = NULL;
> @@ -240,17 +242,23 @@ static void build_node_mask(int node, cpu_set_t *mask)
>  	for (c = 0; c < nr_cpus; c++) {
>  		cpu = cpu_map->map[c]; /* map c index to online cpu index */
>  		if (cpu__get_node(cpu) == node)
> -			CPU_SET(cpu, mask);
> +			set_bit(cpu, mask->bits);
>  	}
>  }
>  
> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>  {
> -	CPU_ZERO(&map->affinity_mask);
> +	map->affinity_mask.nbits = cpu__max_cpu();
> +	map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
> +	if (!map->affinity_mask.bits)
> +		return -1;
> +
>  	if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
>  		build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
>  	else if (mp->affinity == PERF_AFFINITY_CPU)
> -		CPU_SET(map->core.cpu, &map->affinity_mask);
> +		set_bit(map->core.cpu, map->affinity_mask.bits);
> +
> +	return 0;
>  }
>  
>  int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
> @@ -261,7 +269,15 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
>  		return -1;
>  	}
>  
> -	perf_mmap__setup_affinity_mask(map, mp);
> +	if (mp->affinity != PERF_AFFINITY_SYS &&
> +		perf_mmap__setup_affinity_mask(map, mp)) {
> +		pr_debug2("failed to alloc mmap affinity mask, error %d\n",
> +			  errno);
> +		return -1;
> +	}
> +
> +	if (verbose == 2)
> +		mmap_cpu_mask__scnprintf(&map->affinity_mask, "mmap");
>  
>  	map->core.flush = mp->flush;
>  
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index ef51667fabcb..9d5f589f02ae 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -40,7 +40,7 @@ struct mmap {
>  		int		 nr_cblocks;
>  	} aio;
>  #endif
> -	cpu_set_t	affinity_mask;
> +	struct mmap_cpu_mask	affinity_mask;
>  	void		*data;
>  	int		comp_level;
>  };
> -- 
> 2.20.1
> 

-- 

- Arnaldo

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length
  2019-12-03 11:44 ` [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length Alexey Budankov
@ 2019-12-04 13:49   ` Arnaldo Carvalho de Melo
  2019-12-05  7:31     ` Alexey Budankov
  2020-01-10 17:53   ` [tip: perf/core] perf mmap: Declare " tip-bot2 for Alexey Budankov
  1 sibling, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-12-04 13:49 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel

Em Tue, Dec 03, 2019 at 02:44:18PM +0300, Alexey Budankov escreveu:
> 
> Declare a dedicated struct mmap_cpu_mask type for cpu masks of
> arbitrary length. Mask is available thru bits pointer and the
> mask length is kept in nbits field. MMAP_CPU_MASK_BYTES() macro
> returns mask storage size in bytes. mmap_cpu_mask__scnprintf()
> function can be used to log text representation of the mask.
> 
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
>  tools/perf/util/mmap.c | 12 ++++++++++++
>  tools/perf/util/mmap.h | 11 +++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 063d1b93c53d..43c12b4a3e17 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -23,6 +23,18 @@
>  #include "mmap.h"
>  #include "../perf.h"
>  #include <internal/lib.h> /* page_size */
> +#include <linux/bitmap.h>
> +
> +#define MASK_SIZE 1023
> +void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)
> +{
> +	char buf[MASK_SIZE + 1];
> +	size_t len;
> +
> +	len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
> +	buf[len] = '\0';
> +	pr_debug("%p: %s mask[%ld]: %s\n", mask, tag, mask->nbits, buf);
> +}

Above should also be %zd, fixed.

- Arnaldo
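
As an illustration of the mask type described in the commit message
(a bits pointer, an nbits length, and a macro returning the storage size
in bytes), here is a minimal userspace sketch. It is not the definition
from tools/perf/util/mmap.h, which is not quoted in this message; every
sketch_* and SKETCH_* name is hypothetical, and plain calloc()/free()
stand in for the tools bitmap API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers for word-granular bitmap storage. */
#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Stand-in for the mask type: field names follow the commit message. */
struct sketch_cpu_mask {
	unsigned long *bits;	/* dynamically sized bit storage */
	size_t nbits;		/* number of valid bits (CPUs) */
};

/* Storage size in bytes, rounded up to whole unsigned longs. */
static size_t sketch_mask_bytes(const struct sketch_cpu_mask *m)
{
	return SKETCH_BITS_TO_LONGS(m->nbits) * sizeof(unsigned long);
}

/* Allocate a zeroed mask for nbits > 0; returns 0 on success, -1 on failure. */
static int sketch_mask_alloc(struct sketch_cpu_mask *m, size_t nbits)
{
	m->bits = calloc(SKETCH_BITS_TO_LONGS(nbits), sizeof(unsigned long));
	if (!m->bits)
		return -1;
	m->nbits = nbits;
	return 0;
}

/* Set the bit for one CPU; out-of-range indexes are ignored. */
static void sketch_mask_set(struct sketch_cpu_mask *m, size_t cpu)
{
	if (cpu < m->nbits)
		m->bits[cpu / SKETCH_BITS_PER_LONG] |=
			1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Report whether the bit for one CPU is set. */
static bool sketch_mask_test(const struct sketch_cpu_mask *m, size_t cpu)
{
	return (cpu < m->nbits) &&
	       ((m->bits[cpu / SKETCH_BITS_PER_LONG] >>
		 (cpu % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Release the storage and reset the mask to an empty state. */
static void sketch_mask_free(struct sketch_cpu_mask *m)
{
	free(m->bits);
	m->bits = NULL;
	m->nbits = 0;
}
```

Because the length travels with the pointer, a mask for 2048 CPUs simply
occupies 256 bytes, with no compile-time cap of the kind a fixed-size
cpu_set_t imposes.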

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K
  2019-12-04 13:48   ` Arnaldo Carvalho de Melo
@ 2019-12-05  7:30     ` Alexey Budankov
  0 siblings, 0 replies; 13+ messages in thread
From: Alexey Budankov @ 2019-12-05  7:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, Andi Kleen, linux-kernel


On 04.12.2019 16:48, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 03, 2019 at 02:45:27PM +0300, Alexey Budankov escreveu:
>>
>> Use struct mmap_cpu_mask type for tool's thread and mmap data
>> buffers to overcome current 1024 CPUs mask size limitation of
>> cpu_set_t type.
>>
>> Currently glibc cpu_set_t type has internal mask size limit
>> of 1024 CPUs. Moving to struct mmap_cpu_mask type allows
>> overcoming that limit. tools bitmap API is used to manipulate
>> objects of struct mmap_cpu_mask type.
> 
> Had to apply this to fix the build in some toolchains/arches:
> 
> [acme@quaco perf]$ git diff
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 7bc83755ef8c..4c301466101b 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -2508,10 +2508,10 @@ int cmd_record(int argc, const char **argv)
>                 rec->affinity_mask.nbits = cpu__max_cpu();
>                 rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
>                 if (!rec->affinity_mask.bits) {
> -                       pr_err("Failed to allocate thread mask for %ld cpus\n", rec->affinity_mask.nbits);
> +                       pr_err("Failed to allocate thread mask for %zd cpus\n", rec->affinity_mask.nbits);
>                         return -ENOMEM;
>                 }
> -               pr_debug2("thread mask[%ld]: empty\n", rec->affinity_mask.nbits);
> +               pr_debug2("thread mask[%zd]: empty\n", rec->affinity_mask.nbits);
>         }
> 
>         err = record__auxtrace_init(rec);

Thank you.

~Alexey

> 
> 
>  
>> Reported-by: Andi Kleen <ak@linux.intel.com>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  tools/perf/builtin-record.c | 28 ++++++++++++++++++++++------
>>  tools/perf/util/mmap.c      | 28 ++++++++++++++++++++++------
>>  tools/perf/util/mmap.h      |  2 +-
>>  3 files changed, 45 insertions(+), 13 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index fb19ef63cc35..7bc83755ef8c 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -62,6 +62,7 @@
>>  #include <linux/string.h>
>>  #include <linux/time64.h>
>>  #include <linux/zalloc.h>
>> +#include <linux/bitmap.h>
>>  
>>  struct switch_output {
>>  	bool		 enabled;
>> @@ -93,7 +94,7 @@ struct record {
>>  	bool			timestamp_boundary;
>>  	struct switch_output	switch_output;
>>  	unsigned long long	samples;
>> -	cpu_set_t		affinity_mask;
>> +	struct mmap_cpu_mask	affinity_mask;
>>  	unsigned long		output_max_size;	/* = 0: unlimited */
>>  };
>>  
>> @@ -961,10 +962,15 @@ static struct perf_event_header finished_round_event = {
>>  static void record__adjust_affinity(struct record *rec, struct mmap *map)
>>  {
>>  	if (rec->opts.affinity != PERF_AFFINITY_SYS &&
>> -	    !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
>> -		CPU_ZERO(&rec->affinity_mask);
>> -		CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
>> -		sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
>> +	    !bitmap_equal(rec->affinity_mask.bits, map->affinity_mask.bits,
>> +			  rec->affinity_mask.nbits)) {
>> +		bitmap_zero(rec->affinity_mask.bits, rec->affinity_mask.nbits);
>> +		bitmap_or(rec->affinity_mask.bits, rec->affinity_mask.bits,
>> +			  map->affinity_mask.bits, rec->affinity_mask.nbits);
>> +		sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&rec->affinity_mask),
>> +				  (cpu_set_t *)rec->affinity_mask.bits);
>> +		if (verbose == 2)
>> +			mmap_cpu_mask__scnprintf(&rec->affinity_mask, "thread");
>>  	}
>>  }
>>  
>> @@ -2433,7 +2439,6 @@ int cmd_record(int argc, const char **argv)
>>  # undef REASON
>>  #endif
>>  
>> -	CPU_ZERO(&rec->affinity_mask);
>>  	rec->opts.affinity = PERF_AFFINITY_SYS;
>>  
>>  	rec->evlist = evlist__new();
>> @@ -2499,6 +2504,16 @@ int cmd_record(int argc, const char **argv)
>>  
>>  	symbol__init(NULL);
>>  
>> +	if (rec->opts.affinity != PERF_AFFINITY_SYS) {
>> +		rec->affinity_mask.nbits = cpu__max_cpu();
>> +		rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
>> +		if (!rec->affinity_mask.bits) {
>> +			pr_err("Failed to allocate thread mask for %ld cpus\n", rec->affinity_mask.nbits);
>> +			return -ENOMEM;
>> +		}
>> +		pr_debug2("thread mask[%ld]: empty\n", rec->affinity_mask.nbits);
>> +	}
>> +
>>  	err = record__auxtrace_init(rec);
>>  	if (err)
>>  		goto out;
>> @@ -2613,6 +2628,7 @@ int cmd_record(int argc, const char **argv)
>>  
>>  	err = __cmd_record(&record, argc, argv);
>>  out:
>> +	bitmap_free(rec->affinity_mask.bits);
>>  	evlist__delete(rec->evlist);
>>  	symbol__exit();
>>  	auxtrace_record__free(rec->itr);
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 43c12b4a3e17..832d2cb94b2c 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -219,6 +219,8 @@ static void perf_mmap__aio_munmap(struct mmap *map __maybe_unused)
>>  
>>  void mmap__munmap(struct mmap *map)
>>  {
>> +	bitmap_free(map->affinity_mask.bits);
>> +
>>  	perf_mmap__aio_munmap(map);
>>  	if (map->data != NULL) {
>>  		munmap(map->data, mmap__mmap_len(map));
>> @@ -227,7 +229,7 @@ void mmap__munmap(struct mmap *map)
>>  	auxtrace_mmap__munmap(&map->auxtrace_mmap);
>>  }
>>  
>> -static void build_node_mask(int node, cpu_set_t *mask)
>> +static void build_node_mask(int node, struct mmap_cpu_mask *mask)
>>  {
>>  	int c, cpu, nr_cpus;
>>  	const struct perf_cpu_map *cpu_map = NULL;
>> @@ -240,17 +242,23 @@ static void build_node_mask(int node, cpu_set_t *mask)
>>  	for (c = 0; c < nr_cpus; c++) {
>>  		cpu = cpu_map->map[c]; /* map c index to online cpu index */
>>  		if (cpu__get_node(cpu) == node)
>> -			CPU_SET(cpu, mask);
>> +			set_bit(cpu, mask->bits);
>>  	}
>>  }
>>  
>> -static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>> +static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
>>  {
>> -	CPU_ZERO(&map->affinity_mask);
>> +	map->affinity_mask.nbits = cpu__max_cpu();
>> +	map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
>> +	if (!map->affinity_mask.bits)
>> +		return -1;
>> +
>>  	if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
>>  		build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
>>  	else if (mp->affinity == PERF_AFFINITY_CPU)
>> -		CPU_SET(map->core.cpu, &map->affinity_mask);
>> +		set_bit(map->core.cpu, map->affinity_mask.bits);
>> +
>> +	return 0;
>>  }
>>  
>>  int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
>> @@ -261,7 +269,15 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
>>  		return -1;
>>  	}
>>  
>> -	perf_mmap__setup_affinity_mask(map, mp);
>> +	if (mp->affinity != PERF_AFFINITY_SYS &&
>> +		perf_mmap__setup_affinity_mask(map, mp)) {
>> +		pr_debug2("failed to alloc mmap affinity mask, error %d\n",
>> +			  errno);
>> +		return -1;
>> +	}
>> +
>> +	if (verbose == 2)
>> +		mmap_cpu_mask__scnprintf(&map->affinity_mask, "mmap");
>>  
>>  	map->core.flush = mp->flush;
>>  
>> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
>> index ef51667fabcb..9d5f589f02ae 100644
>> --- a/tools/perf/util/mmap.h
>> +++ b/tools/perf/util/mmap.h
>> @@ -40,7 +40,7 @@ struct mmap {
>>  		int		 nr_cblocks;
>>  	} aio;
>>  #endif
>> -	cpu_set_t	affinity_mask;
>> +	struct mmap_cpu_mask	affinity_mask;
>>  	void		*data;
>>  	int		comp_level;
>>  };
>> -- 
>> 2.20.1
>>
> 


* Re: [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length
  2019-12-04 13:49   ` Arnaldo Carvalho de Melo
@ 2019-12-05  7:31     ` Alexey Budankov
  0 siblings, 0 replies; 13+ messages in thread
From: Alexey Budankov @ 2019-12-05  7:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
	Andi Kleen, linux-kernel

On 04.12.2019 16:49, Arnaldo Carvalho de Melo wrote:
> Em Tue, Dec 03, 2019 at 02:44:18PM +0300, Alexey Budankov escreveu:
>>
>> Declare a dedicated struct mmap_cpu_mask type for cpu masks of
>> arbitrary length. The mask is available through the bits pointer and the
>> mask length is kept in the nbits field. The MMAP_CPU_MASK_BYTES() macro
>> returns the mask storage size in bytes. The mmap_cpu_mask__scnprintf()
>> function can be used to log a text representation of the mask.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>>  tools/perf/util/mmap.c | 12 ++++++++++++
>>  tools/perf/util/mmap.h | 11 +++++++++++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 063d1b93c53d..43c12b4a3e17 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -23,6 +23,18 @@
>>  #include "mmap.h"
>>  #include "../perf.h"
>>  #include <internal/lib.h> /* page_size */
>> +#include <linux/bitmap.h>
>> +
>> +#define MASK_SIZE 1023
>> +void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)
>> +{
>> +	char buf[MASK_SIZE + 1];
>> +	size_t len;
>> +
>> +	len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
>> +	buf[len] = '\0';
>> +	pr_debug("%p: %s mask[%ld]: %s\n", mask, tag, mask->nbits, buf);
>> +}
> 
> Above should also be %zd, fixed.

Thanks Arnaldo, Jiri! I appreciate your collaboration and help.

~Alexey

> 
> - Arnaldo
> 


* [tip: perf/core] perf record: Adapt affinity to machines with #CPUs > 1K
  2019-12-03 11:45 ` [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K Alexey Budankov
  2019-12-04 13:48   ` Arnaldo Carvalho de Melo
@ 2020-01-10 17:53   ` tip-bot2 for Alexey Budankov
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot2 for Alexey Budankov @ 2020-01-10 17:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andi Kleen, Alexey Budankov, Jiri Olsa, Alexander Shishkin,
	Namhyung Kim, Peter Zijlstra, Arnaldo Carvalho de Melo, x86,
	LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     8384a2600c7ddfc875f64e160d8b423aca4e203a
Gitweb:        https://git.kernel.org/tip/8384a2600c7ddfc875f64e160d8b423aca4e203a
Author:        Alexey Budankov <alexey.budankov@linux.intel.com>
AuthorDate:    Tue, 03 Dec 2019 14:45:27 +03:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Mon, 06 Jan 2020 11:46:09 -03:00

perf record: Adapt affinity to machines with #CPUs > 1K

Use the 'struct mmap_cpu_mask' type for the affinity masks of the tool's
thread and of the mmap data buffers.

glibc's cpu_set_t type currently has an internal mask size limit of 1024
CPUs, which confines --affinity handling to the first 1024 CPUs on
machines with more CPUs.

Moving to the 'struct mmap_cpu_mask' type lifts that limit.

The tools bitmap API is used to manipulate objects of the
'struct mmap_cpu_mask' type.

Committer notes:

To print the 'nbits' struct member we must use %zd, since it is a
size_t. This fixes the build on some toolchains/arches.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/96d7e2ff-ce8b-c1e0-d52c-aa59ea96f0ea@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 28 ++++++++++++++++++++++------
 tools/perf/util/mmap.c      | 28 ++++++++++++++++++++++------
 tools/perf/util/mmap.h      |  2 +-
 3 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fb19ef6..4c30146 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -62,6 +62,7 @@
 #include <linux/string.h>
 #include <linux/time64.h>
 #include <linux/zalloc.h>
+#include <linux/bitmap.h>
 
 struct switch_output {
 	bool		 enabled;
@@ -93,7 +94,7 @@ struct record {
 	bool			timestamp_boundary;
 	struct switch_output	switch_output;
 	unsigned long long	samples;
-	cpu_set_t		affinity_mask;
+	struct mmap_cpu_mask	affinity_mask;
 	unsigned long		output_max_size;	/* = 0: unlimited */
 };
 
@@ -961,10 +962,15 @@ static struct perf_event_header finished_round_event = {
 static void record__adjust_affinity(struct record *rec, struct mmap *map)
 {
 	if (rec->opts.affinity != PERF_AFFINITY_SYS &&
-	    !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
-		CPU_ZERO(&rec->affinity_mask);
-		CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
-		sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
+	    !bitmap_equal(rec->affinity_mask.bits, map->affinity_mask.bits,
+			  rec->affinity_mask.nbits)) {
+		bitmap_zero(rec->affinity_mask.bits, rec->affinity_mask.nbits);
+		bitmap_or(rec->affinity_mask.bits, rec->affinity_mask.bits,
+			  map->affinity_mask.bits, rec->affinity_mask.nbits);
+		sched_setaffinity(0, MMAP_CPU_MASK_BYTES(&rec->affinity_mask),
+				  (cpu_set_t *)rec->affinity_mask.bits);
+		if (verbose == 2)
+			mmap_cpu_mask__scnprintf(&rec->affinity_mask, "thread");
 	}
 }
 
@@ -2433,7 +2439,6 @@ int cmd_record(int argc, const char **argv)
 # undef REASON
 #endif
 
-	CPU_ZERO(&rec->affinity_mask);
 	rec->opts.affinity = PERF_AFFINITY_SYS;
 
 	rec->evlist = evlist__new();
@@ -2499,6 +2504,16 @@ int cmd_record(int argc, const char **argv)
 
 	symbol__init(NULL);
 
+	if (rec->opts.affinity != PERF_AFFINITY_SYS) {
+		rec->affinity_mask.nbits = cpu__max_cpu();
+		rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits);
+		if (!rec->affinity_mask.bits) {
+			pr_err("Failed to allocate thread mask for %zd cpus\n", rec->affinity_mask.nbits);
+			return -ENOMEM;
+		}
+		pr_debug2("thread mask[%zd]: empty\n", rec->affinity_mask.nbits);
+	}
+
 	err = record__auxtrace_init(rec);
 	if (err)
 		goto out;
@@ -2613,6 +2628,7 @@ int cmd_record(int argc, const char **argv)
 
 	err = __cmd_record(&record, argc, argv);
 out:
+	bitmap_free(rec->affinity_mask.bits);
 	evlist__delete(rec->evlist);
 	symbol__exit();
 	auxtrace_record__free(rec->itr);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 2ee4faa..3b664fa 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -219,6 +219,8 @@ static void perf_mmap__aio_munmap(struct mmap *map __maybe_unused)
 
 void mmap__munmap(struct mmap *map)
 {
+	bitmap_free(map->affinity_mask.bits);
+
 	perf_mmap__aio_munmap(map);
 	if (map->data != NULL) {
 		munmap(map->data, mmap__mmap_len(map));
@@ -227,7 +229,7 @@ void mmap__munmap(struct mmap *map)
 	auxtrace_mmap__munmap(&map->auxtrace_mmap);
 }
 
-static void build_node_mask(int node, cpu_set_t *mask)
+static void build_node_mask(int node, struct mmap_cpu_mask *mask)
 {
 	int c, cpu, nr_cpus;
 	const struct perf_cpu_map *cpu_map = NULL;
@@ -240,17 +242,23 @@ static void build_node_mask(int node, cpu_set_t *mask)
 	for (c = 0; c < nr_cpus; c++) {
 		cpu = cpu_map->map[c]; /* map c index to online cpu index */
 		if (cpu__get_node(cpu) == node)
-			CPU_SET(cpu, mask);
+			set_bit(cpu, mask->bits);
 	}
 }
 
-static void perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
+static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp)
 {
-	CPU_ZERO(&map->affinity_mask);
+	map->affinity_mask.nbits = cpu__max_cpu();
+	map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits);
+	if (!map->affinity_mask.bits)
+		return -1;
+
 	if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1)
 		build_node_mask(cpu__get_node(map->core.cpu), &map->affinity_mask);
 	else if (mp->affinity == PERF_AFFINITY_CPU)
-		CPU_SET(map->core.cpu, &map->affinity_mask);
+		set_bit(map->core.cpu, map->affinity_mask.bits);
+
+	return 0;
 }
 
 int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
@@ -261,7 +269,15 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, int cpu)
 		return -1;
 	}
 
-	perf_mmap__setup_affinity_mask(map, mp);
+	if (mp->affinity != PERF_AFFINITY_SYS &&
+		perf_mmap__setup_affinity_mask(map, mp)) {
+		pr_debug2("failed to alloc mmap affinity mask, error %d\n",
+			  errno);
+		return -1;
+	}
+
+	if (verbose == 2)
+		mmap_cpu_mask__scnprintf(&map->affinity_mask, "mmap");
 
 	map->core.flush = mp->flush;
 
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index ef51667..9d5f589 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -40,7 +40,7 @@ struct mmap {
 		int		 nr_cblocks;
 	} aio;
 #endif
-	cpu_set_t	affinity_mask;
+	struct mmap_cpu_mask	affinity_mask;
 	void		*data;
 	int		comp_level;
 };
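
For readers following the patch above, here is a minimal standalone sketch
of the same idea: keep the CPU mask in a heap-allocated array of unsigned
long and hand its byte size to sched_setaffinity(), so the mask is not
capped at the fixed size of cpu_set_t. This is not the perf code. All
sketch_* names are hypothetical, and the cast relies on the assumption
(which the patch also makes) that glibc lays out cpu_set_t as an array of
unsigned long.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical stand-in for struct mmap_cpu_mask. */
struct sketch_cpu_mask {
	unsigned long *bits;
	size_t nbits;
};

/* Storage size in bytes, rounded up to whole unsigned longs. */
static inline size_t sketch_mask_bytes(const struct sketch_cpu_mask *m)
{
	return SKETCH_BITS_TO_LONGS(m->nbits) * sizeof(unsigned long);
}

/* Allocate a zeroed mask able to hold nbits CPUs; 0 on success, -1 on failure. */
static inline int sketch_mask_alloc(struct sketch_cpu_mask *m, size_t nbits)
{
	m->nbits = nbits;
	m->bits = calloc(SKETCH_BITS_TO_LONGS(nbits), sizeof(unsigned long));
	return m->bits ? 0 : -1;
}

/* Mark one CPU in the mask. */
static inline void sketch_mask_set(struct sketch_cpu_mask *m, size_t cpu)
{
	assert(cpu < m->nbits);
	m->bits[cpu / SKETCH_BITS_PER_LONG] |= 1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/*
 * Apply the mask to the calling thread. The size argument is the real
 * storage size of the mask, not sizeof(cpu_set_t).
 */
static inline int sketch_mask_apply(const struct sketch_cpu_mask *m)
{
	return sched_setaffinity(0, sketch_mask_bytes(m),
				 (const cpu_set_t *)m->bits);
}
```

A caller would allocate the mask once for the maximum CPU count, set the
wanted CPUs, call sketch_mask_apply(), and release the storage with free().
Whether the call succeeds depends on which CPUs the process is allowed to
run on.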


* [tip: perf/core] tools bitmap: Implement bitmap_equal() operation at bitmap API
  2019-12-03 11:43 ` [PATCH v5 1/3] tools bitmap: implement bitmap_equal() operation at bitmap API Alexey Budankov
@ 2020-01-10 17:53   ` tip-bot2 for Alexey Budankov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Alexey Budankov @ 2020-01-10 17:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Alexey Budankov, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Namhyung Kim, Peter Zijlstra, Arnaldo Carvalho de Melo, x86,
	LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     8812ad412f851216d6c39488a7e563ccc5c604cc
Gitweb:        https://git.kernel.org/tip/8812ad412f851216d6c39488a7e563ccc5c604cc
Author:        Alexey Budankov <alexey.budankov@linux.intel.com>
AuthorDate:    Tue, 03 Dec 2019 14:43:33 +03:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Mon, 06 Jan 2020 11:46:04 -03:00

tools bitmap: Implement bitmap_equal() operation at bitmap API

Extend tools bitmap API with bitmap_equal() implementation.

The implementation has been derived from the kernel.

Extend tools bitmap API with bitmap_free() implementation for symmetry
with bitmap_alloc() function.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/43757993-0b28-d8af-a6c7-ede12e3a6877@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/include/linux/bitmap.h | 30 ++++++++++++++++++++++++++++++
 tools/lib/bitmap.c           | 15 +++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index 05dca5c..477a1ca 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -15,6 +15,8 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
 		 const unsigned long *bitmap2, int bits);
 int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
 		 const unsigned long *bitmap2, unsigned int bits);
+int __bitmap_equal(const unsigned long *bitmap1,
+		   const unsigned long *bitmap2, unsigned int bits);
 void bitmap_clear(unsigned long *map, unsigned int start, int len);
 
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
@@ -124,6 +126,15 @@ static inline unsigned long *bitmap_alloc(int nbits)
 }
 
 /*
+ * bitmap_free - Free bitmap
+ * @bitmap: pointer to bitmap
+ */
+static inline void bitmap_free(unsigned long *bitmap)
+{
+	free(bitmap);
+}
+
+/*
  * bitmap_scnprintf - print bitmap list into buffer
  * @bitmap: bitmap
  * @nbits: size of bitmap
@@ -148,4 +159,23 @@ static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
 	return __bitmap_and(dst, src1, src2, nbits);
 }
 
+#ifdef __LITTLE_ENDIAN
+#define BITMAP_MEM_ALIGNMENT 8
+#else
+#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
+#endif
+#define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
+#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
+
+static inline int bitmap_equal(const unsigned long *src1,
+			const unsigned long *src2, unsigned int nbits)
+{
+	if (small_const_nbits(nbits))
+		return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
+	if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
+	    IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
+		return !memcmp(src1, src2, nbits / 8);
+	return __bitmap_equal(src1, src2, nbits);
+}
+
 #endif /* _PERF_BITOPS_H */
diff --git a/tools/lib/bitmap.c b/tools/lib/bitmap.c
index 3849478..5043747 100644
--- a/tools/lib/bitmap.c
+++ b/tools/lib/bitmap.c
@@ -71,3 +71,18 @@ int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
 			   BITMAP_LAST_WORD_MASK(bits));
 	return result != 0;
 }
+
+int __bitmap_equal(const unsigned long *bitmap1,
+		const unsigned long *bitmap2, unsigned int bits)
+{
+	unsigned int k, lim = bits/BITS_PER_LONG;
+	for (k = 0; k < lim; ++k)
+		if (bitmap1[k] != bitmap2[k])
+			return 0;
+
+	if (bits % BITS_PER_LONG)
+		if ((bitmap1[k] ^ bitmap2[k]) & BITMAP_LAST_WORD_MASK(bits))
+			return 0;
+
+	return 1;
+}
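
As an illustration of the comparison added above, the following standalone
sketch mirrors the logic of __bitmap_equal(): whole words are compared
directly, and in a trailing partial word only the low bits that belong to
the bitmap are compared. It is a simplified re-statement, not the tools
library code. The sketch_* names are hypothetical, and the last-word mask
is computed by a small helper rather than by BITMAP_LAST_WORD_MASK().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(CHAR_BIT * sizeof(unsigned long)))

/* Mask covering the valid low bits of the final word of an nbits bitmap. */
static inline unsigned long sketch_last_word_mask(unsigned int nbits)
{
	unsigned int rem = nbits % SKETCH_BITS_PER_LONG;

	return rem ? (~0UL >> (SKETCH_BITS_PER_LONG - rem)) : ~0UL;
}

/* Return 1 if the first nbits bits of a and b match, 0 otherwise. */
static inline int sketch_bitmap_equal(const unsigned long *a,
				      const unsigned long *b,
				      unsigned int nbits)
{
	unsigned int k, lim = nbits / SKETCH_BITS_PER_LONG;

	assert(a != NULL && b != NULL);

	/* Full words: any difference means the bitmaps differ. */
	for (k = 0; k < lim; ++k)
		if (a[k] != b[k])
			return 0;

	/* Partial last word: ignore bits at or above position nbits. */
	if (nbits % SKETCH_BITS_PER_LONG)
		if ((a[k] ^ b[k]) & sketch_last_word_mask(nbits))
			return 0;

	return 1;
}
```

The masking step is what makes two bitmaps compare equal even when the
unused bits of their last word hold different values.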


* [tip: perf/core] perf mmap: Declare type for cpu mask of arbitrary length
  2019-12-03 11:44 ` [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length Alexey Budankov
  2019-12-04 13:49   ` Arnaldo Carvalho de Melo
@ 2020-01-10 17:53   ` tip-bot2 for Alexey Budankov
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot2 for Alexey Budankov @ 2020-01-10 17:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Alexey Budankov, Jiri Olsa, Alexander Shishkin, Andi Kleen,
	Namhyung Kim, Peter Zijlstra, Arnaldo Carvalho de Melo, x86,
	LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9c080c0279a80057cad3dfc05d09fb283ddf72f4
Gitweb:        https://git.kernel.org/tip/9c080c0279a80057cad3dfc05d09fb283ddf72f4
Author:        Alexey Budankov <alexey.budankov@linux.intel.com>
AuthorDate:    Tue, 03 Dec 2019 14:44:18 +03:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Mon, 06 Jan 2020 11:46:09 -03:00

perf mmap: Declare type for cpu mask of arbitrary length

Declare a dedicated 'struct mmap_cpu_mask' type for cpu masks of
arbitrary length.

The mask is available through the 'bits' pointer and the mask length is
kept in the 'nbits' field. The MMAP_CPU_MASK_BYTES() macro returns the
mask storage size in bytes.

The mmap_cpu_mask__scnprintf() function can be used to log a text
representation of the mask.

Committer notes:

To print the 'nbits' struct member we must use %zd, since it is a
size_t. This fixes the build on some toolchains/arches.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/0fd2454f-477f-d15a-f4ee-79bcbd2585ff@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/mmap.c | 12 ++++++++++++
 tools/perf/util/mmap.h | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 063d1b9..2ee4faa 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -23,6 +23,18 @@
 #include "mmap.h"
 #include "../perf.h"
 #include <internal/lib.h> /* page_size */
+#include <linux/bitmap.h>
+
+#define MASK_SIZE 1023
+void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)
+{
+	char buf[MASK_SIZE + 1];
+	size_t len;
+
+	len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
+	buf[len] = '\0';
+	pr_debug("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
+}
 
 size_t mmap__mmap_len(struct mmap *map)
 {
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index bee4e83..ef51667 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -15,6 +15,15 @@
 #include "event.h"
 
 struct aiocb;
+
+struct mmap_cpu_mask {
+	unsigned long *bits;
+	size_t nbits;
+};
+
+#define MMAP_CPU_MASK_BYTES(m) \
+	(BITS_TO_LONGS(((struct mmap_cpu_mask *)m)->nbits) * sizeof(unsigned long))
+
 /**
  * struct mmap - perf's ring buffer mmap details
  *
@@ -52,4 +61,6 @@ int perf_mmap__push(struct mmap *md, void *to,
 
 size_t mmap__mmap_len(struct mmap *map);
 
+void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag);
+
 #endif /*__PERF_MMAP_H */
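
The logging helper above relies on two details that a short sketch may
make easier to see: a size_t value needs a 'z' length modifier in the
format string, and an scnprintf-style function returns the number of
characters actually stored, so the result can safely index the buffer.
The code below is a hypothetical userspace approximation built on
vsnprintf(), not the tools implementation; sketch_scnprintf is an invented
name. It uses %zu, which is the strictly matching conversion for the
unsigned size_t, while the patch uses %zd.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format into buf and return how many characters were stored, not
 * counting the terminating NUL. Unlike snprintf(), the return value
 * never exceeds size - 1, even when the output was truncated.
 */
static inline int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL);
	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0) {
		/* Encoding error: leave an empty string behind. */
		buf[0] = '\0';
		return 0;
	}
	/* vsnprintf() reports the untruncated length; clamp it. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}
```

With plain snprintf() the return value on truncation is the length the
full output would have had, so using it as an index could point past the
end of the buffer. Clamping avoids that.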



Thread overview: 13+ messages
2019-12-03 11:41 [PATCH v5 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K Alexey Budankov
2019-12-03 11:43 ` [PATCH v5 1/3] tools bitmap: implement bitmap_equal() operation at bitmap API Alexey Budankov
2020-01-10 17:53   ` [tip: perf/core] tools bitmap: Implement " tip-bot2 for Alexey Budankov
2019-12-03 11:44 ` [PATCH v5 2/3] perf mmap: declare type for cpu mask of arbitrary length Alexey Budankov
2019-12-04 13:49   ` Arnaldo Carvalho de Melo
2019-12-05  7:31     ` Alexey Budankov
2020-01-10 17:53   ` [tip: perf/core] perf mmap: Declare " tip-bot2 for Alexey Budankov
2019-12-03 11:45 ` [PATCH v5 3/3] perf record: adapt affinity to machines with #CPUs > 1K Alexey Budankov
2019-12-04 13:48   ` Arnaldo Carvalho de Melo
2019-12-05  7:30     ` Alexey Budankov
2020-01-10 17:53   ` [tip: perf/core] perf record: Adapt " tip-bot2 for Alexey Budankov
2019-12-03 12:17 ` [PATCH v5 0/3] perf record: adapt NUMA awareness " Jiri Olsa
2019-12-03 18:36   ` Arnaldo Carvalho de Melo
