* [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems
@ 2018-12-24 12:11 Alexey Budankov
2018-12-24 12:23 ` [PATCH v2 1/4] perf record: allocate affinity masks Alexey Budankov
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Alexey Budankov @ 2018-12-24 12:11 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
It has been observed that the trace reading thread runs on the same hw thread
most of the time during perf record sampling collection. This scheduling
layout leads to up to 30% profiling overhead when a cpu intensive workload
fully utilizes a large server system with NUMA. The overhead usually arises
from remote (cross node) HW and memory references, which have much longer
latencies than local ones [1].
This patch set implements an --affinity option that eliminates the 30%
overhead entirely for serial trace streaming (--affinity=cpu) and reduces it
from 30% to 10% for AIO1 (--aio=1) trace streaming (--affinity=node|cpu).
See the OVERHEAD section below for more details.
The implemented extension gives users the capability to instruct the Perf
tool to bounce the trace reading thread's affinity mask between NUMA nodes
(--affinity=node) or to pin the thread to the exact cpu (--affinity=cpu)
that the trace buffer being processed belongs to.
The extension brings improvement in the case of full system utilization, when
the Perf tool process contends with the workload process for cpu cores. When
a system has free cores to execute the Perf tool process during profiling,
the default system scheduling layout induces the lowest overhead.
The patch set has been validated on the BT benchmark from the NAS Parallel
Benchmarks [2] running on a dual socket, 44 core, 88 hw thread Broadwell
system with kernels v4.4.0-21-generic (Ubuntu 16.04) and v4.20.0-rc5
(tip perf/core).
OVERHEAD:
BENCH REPORT BASED ELAPSED TIME BASED
v4.20.0-rc5
(tip perf/core):
(current) SERIAL-SYS / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
SERIAL-CPU / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
AIO1-SYS / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
AIO1-NODE / BASE : 1.08x (12.23/11.31), 1.11x (13.01/11.69)
AIO1-CPU / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
v4.4.0-21-generic
(Ubuntu 16.04 LTS):
(current) SERIAL-SYS / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
SERIAL-CPU / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
AIO1-SYS / BASE : 1.31x (14.23/10.87), 1.34x (15.22/11.32)
AIO1-NODE / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
AIO1-CPU / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
The patch set is generated against the acme perf/core repository.
---
Alexey Budankov (4):
perf record: allocate affinity masks
perf record: bind the AIO user space buffers to nodes
perf record: apply affinity masks when reading mmap buffers
perf record: implement --affinity=node|cpu option
tools/perf/Documentation/perf-record.txt | 5 ++
tools/perf/builtin-record.c | 40 ++++++++++++++-
tools/perf/perf.h | 8 +++
tools/perf/util/evlist.c | 10 ++--
tools/perf/util/evlist.h | 2 +-
tools/perf/util/mmap.c | 63 +++++++++++++++++++++++-
tools/perf/util/mmap.h | 4 +-
7 files changed, 124 insertions(+), 8 deletions(-)
---
Changes in v2:
- made debug affinity mode message user friendly
- converted affinity mode defines to enum values
- implemented perf_mmap__aio_alloc, perf_mmap__aio_free, perf_mmap__aio_bind
and put HAVE_LIBNUMA_SUPPORT #ifdefs in there
- separated AIO buffers binding to patch 2/4
---
[1] https://en.wikipedia.org/wiki/Non-uniform_memory_access
[2] https://www.nas.nasa.gov/publications/npb.html
[3] http://man7.org/linux/man-pages/man2/sched_setaffinity.2.html
[4] http://man7.org/linux/man-pages/man2/mbind.2.html
---
ENVIRONMENT AND MEASUREMENTS:
MACHINE:
broadwell, dual socket, 44 core, 88 threads
/proc/cpuinfo
processor : 87
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping : 1
microcode : 0xb000019
cpu MHz : 1200.117
cache size : 56320 KB
physical id : 1
siblings : 44
core id : 28
cpu cores : 22
apicid : 121
initial apicid : 121
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
bugs :
bogomips : 4391.42
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
BASE:
/usr/bin/time ./bt.B.x
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 10.87
Total threads = 88
Avail threads = 88
Mop/s total = 64608.74
Mop/s/thread = 734.19
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
956.25user 19.14system 0:11.32elapsed 8616%CPU (0avgtext+0avgdata 210496maxresident)k
0inputs+0outputs (0major+57939minor)pagefaults 0swaps
SERIAL-SYS:
/usr/bin/time ./tip/tools/perf/perf record -v -N -B -T -R -F 25000 -a -e cycles -- ./bt.B.x
Using CPUID GenuineIntel-6-4F-1
nr_cblocks: 0
affinity (UNSET:0, NODE:1, CPU:2) = 0
mmap size 528384B
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 13.73
Total threads = 88
Avail threads = 88
Mop/s total = 51136.52
Mop/s/thread = 581.10
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
[ perf record: Captured and wrote 1661,120 MB perf.data ]
1184.84user 40.70system 0:14.69elapsed 8341%CPU (0avgtext+0avgdata 208612maxresident)k
0inputs+3402072outputs (0major+137077minor)pagefaults 0swaps
SERIAL-NODE:
/usr/bin/time ./tip/tools/perf/perf record -v -N -B -T -R -F 25000 --affinity=node -a -e cycles -- ./bt.B.x
Using CPUID GenuineIntel-6-4F-1
nr_cblocks: 0
affinity (UNSET:0, NODE:1, CPU:2) = 1
mmap size 528384B
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 13.02
Total threads = 88
Avail threads = 88
Mop/s total = 53924.69
Mop/s/thread = 612.78
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
[ perf record: Captured and wrote 1557,152 MB perf.data ]
1120.42user 29.92system 0:14.03elapsed 8198%CPU (0avgtext+0avgdata 206388maxresident)k
0inputs+3189128outputs (0major+149207minor)pagefaults 0swaps
SERIAL-CPU:
/usr/bin/time ./tip/tools/perf/perf record -v -N -B -T -R -F 25000 --affinity=cpu -a -e cycles -- ./bt.B.x
Using CPUID GenuineIntel-6-4F-1
nr_cblocks: 0
affinity (UNSET:0, NODE:1, CPU:2) = 2
mmap size 528384B
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 11.21
Total threads = 88
Avail threads = 88
Mop/s total = 62642.24
Mop/s/thread = 711.84
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
[ perf record: Captured and wrote 1365,043 MB perf.data ]
976.06user 31.35system 0:12.18elapsed 8264%CPU (0avgtext+0avgdata 208488maxresident)k
0inputs+2795704outputs (0major+126032minor)pagefaults 0swaps
AIO1-SYS:
/usr/bin/time ./tip/tools/perf/perf record -v -N -B -T -R -F 25000 --aio=1 -a -e cycles -- ./bt.B.x
Using CPUID GenuineIntel-6-4F-1
nr_cblocks: 1
affinity (UNSET:0, NODE:1, CPU:2) = 0
mmap size 528384B
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 14.23
Total threads = 88
Avail threads = 88
Mop/s total = 49338.27
Mop/s/thread = 560.66
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
[ perf record: Captured and wrote 1720,590 MB perf.data ]
1229.19user 41.99system 0:15.22elapsed 8350%CPU (0avgtext+0avgdata 208604maxresident)k
0inputs+3523880outputs (0major+124670minor)pagefaults 0swaps
AIO1-NODE:
/usr/bin/time ./tip/tools/perf/perf record -v -N -B -T -R -F 25000 --aio=1 --affinity=node -a -e cycles -- ./bt.B.x
Using CPUID GenuineIntel-6-4F-1
nr_cblocks: 1
affinity (UNSET:0, NODE:1, CPU:2) = 1
mmap size 528384B
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 12.04
Total threads = 88
Avail threads = 88
Mop/s total = 58313.58
Mop/s/thread = 662.65
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
[ perf record: Captured and wrote 1471,279 MB perf.data ]
1055.62user 30.43system 0:13.03elapsed 8333%CPU (0avgtext+0avgdata 208424maxresident)k
0inputs+3013288outputs (0major+79088minor)pagefaults 0swaps
AIO1-CPU:
/usr/bin/time ./tip/tools/perf/perf record -v -N -B -T -R -F 25000 --aio=1 --affinity=cpu -a -e cycles -- ./bt.B.x
Using CPUID GenuineIntel-6-4F-1
nr_cblocks: 1
affinity (UNSET:0, NODE:1, CPU:2) = 2
mmap size 528384B
NAS Parallel Benchmarks (NPB3.3-OMP) - BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 102x 102x 102
Iterations: 200 dt: 0.0003000
Number of available threads: 88
BT Benchmark Completed.
Class = B
Size = 102x 102x 102
Iterations = 200
Time in seconds = 12.20
Total threads = 88
Avail threads = 88
Mop/s total = 57538.84
Mop/s/thread = 653.85
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 20 Sep 2018
[ perf record: Captured and wrote 1486,859 MB perf.data ]
1051.97user 42.07system 0:13.09elapsed 8352%CPU (0avgtext+0avgdata 206388maxresident)k
0inputs+3045168outputs (0major+174612minor)pagefaults 0swaps
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 1/4] perf record: allocate affinity masks
2018-12-24 12:11 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
@ 2018-12-24 12:23 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2018-12-24 12:24 ` [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
` (2 subsequent siblings)
3 siblings, 1 reply; 25+ messages in thread
From: Alexey Budankov @ 2018-12-24 12:23 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Allocate an affinity option and affinity masks for the mmap data buffers
and the record thread, and initialize the allocated objects.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- made debug affinity mode message user friendly
- converted affinity mode defines to enum values
---
tools/perf/builtin-record.c | 13 ++++++++++++-
tools/perf/perf.h | 8 ++++++++
tools/perf/util/evlist.c | 6 +++---
tools/perf/util/evlist.h | 2 +-
tools/perf/util/mmap.c | 2 ++
tools/perf/util/mmap.h | 3 ++-
6 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 882285fb9f64..b26febb54d01 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -81,12 +81,17 @@ struct record {
bool timestamp_boundary;
struct switch_output switch_output;
unsigned long long samples;
+ cpu_set_t affinity_mask;
};
static volatile int auxtrace_record__snapshot_started;
static DEFINE_TRIGGER(auxtrace_snapshot_trigger);
static DEFINE_TRIGGER(switch_output_trigger);
+static const char* affinity_tags[PERF_AFFINITY_EOF] = {
+ "SYS", "NODE", "CPU"
+};
+
static bool switch_output_signal(struct record *rec)
{
return rec->switch_output.signal &&
@@ -533,7 +538,8 @@ static int record__mmap_evlist(struct record *rec,
if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
opts->auxtrace_mmap_pages,
- opts->auxtrace_snapshot_mode, opts->nr_cblocks) < 0) {
+ opts->auxtrace_snapshot_mode,
+ opts->nr_cblocks, opts->affinity) < 0) {
if (errno == EPERM) {
pr_err("Permission error mapping pages.\n"
"Consider increasing "
@@ -1980,6 +1986,9 @@ int cmd_record(int argc, const char **argv)
# undef REASON
#endif
+ CPU_ZERO(&rec->affinity_mask);
+ rec->opts.affinity = PERF_AFFINITY_SYS;
+
rec->evlist = perf_evlist__new();
if (rec->evlist == NULL)
return -ENOMEM;
@@ -2143,6 +2152,8 @@ int cmd_record(int argc, const char **argv)
if (verbose > 0)
pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+ pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]);
+
err = __cmd_record(&record, argc, argv);
out:
perf_evlist__delete(rec->evlist);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 388c6dd128b8..69f54529d81f 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -83,6 +83,14 @@ struct record_opts {
clockid_t clockid;
u64 clockid_res_ns;
int nr_cblocks;
+ int affinity;
+};
+
+enum perf_affinity {
+ PERF_AFFINITY_SYS = 0,
+ PERF_AFFINITY_NODE,
+ PERF_AFFINITY_CPU,
+ PERF_AFFINITY_EOF
};
struct option;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e90575192209..60e825be944a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1018,7 +1018,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
*/
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
unsigned int auxtrace_pages,
- bool auxtrace_overwrite, int nr_cblocks)
+ bool auxtrace_overwrite, int nr_cblocks, int affinity)
{
struct perf_evsel *evsel;
const struct cpu_map *cpus = evlist->cpus;
@@ -1028,7 +1028,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
* Its value is decided by evsel's write_backward.
* So &mp should not be passed through const pointer.
*/
- struct mmap_params mp = { .nr_cblocks = nr_cblocks };
+ struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity };
if (!evlist->mmap)
evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1060,7 +1060,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
{
- return perf_evlist__mmap_ex(evlist, pages, 0, false, 0);
+ return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, PERF_AFFINITY_SYS);
}
int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 868294491194..72728d7f4432 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,7 @@ unsigned long perf_event_mlock_kb_in_pages(void);
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
unsigned int auxtrace_pages,
- bool auxtrace_overwrite, int nr_cblocks);
+ bool auxtrace_overwrite, int nr_cblocks, int affinity);
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 8fc39311a30d..e68ba754a8e2 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -343,6 +343,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
map->fd = fd;
map->cpu = cpu;
+ CPU_ZERO(&map->affinity_mask);
+
if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
&mp->auxtrace_mp, map->base, fd))
return -1;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index aeb6942fdb00..e566c19b242b 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -38,6 +38,7 @@ struct perf_mmap {
int nr_cblocks;
} aio;
#endif
+ cpu_set_t affinity_mask;
};
/*
@@ -69,7 +70,7 @@ enum bkw_mmap_state {
};
struct mmap_params {
- int prot, mask, nr_cblocks;
+ int prot, mask, nr_cblocks, affinity;
struct auxtrace_mmap_params auxtrace_mp;
};
* [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2018-12-24 12:11 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
2018-12-24 12:23 ` [PATCH v2 1/4] perf record: allocate affinity masks Alexey Budankov
@ 2018-12-24 12:24 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
` (2 more replies)
2018-12-24 12:27 ` [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers Alexey Budankov
2018-12-24 12:28 ` [PATCH v2 4/4] perf record: implement --affinity=node|cpu option Alexey Budankov
3 siblings, 3 replies; 25+ messages in thread
From: Alexey Budankov @ 2018-12-24 12:24 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Allocate and bind AIO user space buffers to the memory nodes
that mmap kernel buffers are bound to.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- implemented perf_mmap__aio_alloc, perf_mmap__aio_free, perf_mmap__aio_bind
and put HAVE_LIBNUMA_SUPPORT #ifdefs in there
---
tools/perf/util/mmap.c | 49 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 47 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index e68ba754a8e2..742fa9a8e498 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -10,6 +10,9 @@
#include <sys/mman.h>
#include <inttypes.h>
#include <asm/bug.h>
+#ifdef HAVE_LIBNUMA_SUPPORT
+#include <numaif.h>
+#endif
#include "debug.h"
#include "event.h"
#include "mmap.h"
@@ -154,6 +157,46 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
}
#ifdef HAVE_AIO_SUPPORT
+
+#ifdef HAVE_LIBNUMA_SUPPORT
+static void perf_mmap__aio_alloc(void **data, size_t len)
+{
+ *data = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
+}
+
+static void perf_mmap__aio_free(void **data, size_t len)
+{
+ munmap(*data, len);
+ *data = NULL;
+}
+
+static void perf_mmap__aio_bind(void *data, size_t len, int cpu, int affinity)
+{
+ if (affinity != PERF_AFFINITY_SYS && cpu__max_node() > 1) {
+ unsigned long node_mask = 1UL << cpu__get_node(cpu);
+ if (mbind(data, len, MPOL_BIND, &node_mask, 1, 0)) {
+ pr_debug2("failed to bind [%p-%p] to node %d\n",
+ data, data + len, cpu__get_node(cpu));
+ }
+ }
+}
+#else
+static void perf_mmap__aio_alloc(void **data, size_t len)
+{
+ *data = malloc(len);
+}
+
+static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
+{
+ zfree(data);
+}
+
+static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
+ int cpu __maybe_unused, int affinity __maybe_unused)
+{
+}
+#endif
+
static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
{
int delta_max, i, prio;
@@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
}
delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
for (i = 0; i < map->aio.nr_cblocks; ++i) {
- map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
+ size_t mmap_len = perf_mmap__mmap_len(map);
+ perf_mmap__aio_alloc(&(map->aio.data[i]), mmap_len);
if (!map->aio.data[i]) {
pr_debug2("failed to allocate data buffer area, error %m");
return -1;
}
+ perf_mmap__aio_bind(map->aio.data[i], mmap_len, map->cpu, mp->affinity);
/*
* Use cblock.aio_fildes value different from -1
* to denote started aio write operation on the
@@ -210,7 +255,7 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map)
int i;
for (i = 0; i < map->aio.nr_cblocks; ++i)
- zfree(&map->aio.data[i]);
+ perf_mmap__aio_free(&(map->aio.data[i]), perf_mmap__mmap_len(map));
if (map->aio.data)
zfree(&map->aio.data);
zfree(&map->aio.cblocks);
* [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers
2018-12-24 12:11 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
2018-12-24 12:23 ` [PATCH v2 1/4] perf record: allocate affinity masks Alexey Budankov
2018-12-24 12:24 ` [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
@ 2018-12-24 12:27 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-01 21:39 ` Jiri Olsa
2018-12-24 12:28 ` [PATCH v2 4/4] perf record: implement --affinity=node|cpu option Alexey Budankov
3 siblings, 2 replies; 25+ messages in thread
From: Alexey Budankov @ 2018-12-24 12:27 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Build node cpu masks for the mmap data buffers. Apply the matching node cpu
mask to the tool thread every time it starts reading a data buffer that is
on another node or cpu.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- separated AIO buffers binding to patch 2/4
---
tools/perf/builtin-record.c | 9 +++++++++
tools/perf/util/evlist.c | 6 +++++-
tools/perf/util/mmap.c | 12 ++++++++++++
tools/perf/util/mmap.h | 1 +
4 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b26febb54d01..eea96794ee45 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -536,6 +536,9 @@ static int record__mmap_evlist(struct record *rec,
struct record_opts *opts = &rec->opts;
char msg[512];
+ if (opts->affinity != PERF_AFFINITY_SYS)
+ cpu__setup_cpunode_map();
+
if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
opts->auxtrace_mmap_pages,
opts->auxtrace_snapshot_mode,
@@ -755,6 +758,12 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
struct perf_mmap *map = &maps[i];
if (map->base) {
+ if (rec->opts.affinity != PERF_AFFINITY_SYS &&
+ !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
+ CPU_ZERO(&rec->affinity_mask);
+ CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
+ sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
+ }
if (!record__aio_enabled(rec)) {
if (perf_mmap__push(map, rec, record__pushfn) != 0) {
rc = -1;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 60e825be944a..5ca5bb5ea0db 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1028,7 +1028,11 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
* Its value is decided by evsel's write_backward.
* So &mp should not be passed through const pointer.
*/
- struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity };
+ struct mmap_params mp = {
+ .nr_cblocks = nr_cblocks,
+ .affinity = affinity,
+ .cpu_map = cpus
+ };
if (!evlist->mmap)
evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 742fa9a8e498..a2095e4eda4b 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -361,6 +361,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
{
+ int c, nr_cpus, node;
/*
* The last one will be done at perf_mmap__consume(), so that we
* make sure we don't prevent tools from consuming every last event in
@@ -389,6 +390,17 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
map->cpu = cpu;
CPU_ZERO(&map->affinity_mask);
+ if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) {
+ nr_cpus = cpu_map__nr(mp->cpu_map);
+ node = cpu__get_node(map->cpu);
+ for (c = 0; c < nr_cpus; c++) {
+ if (cpu__get_node(c) == node) {
+ CPU_SET(c, &map->affinity_mask);
+ }
+ }
+ } else if (mp->affinity == PERF_AFFINITY_CPU) {
+ CPU_SET(map->cpu, &map->affinity_mask);
+ }
if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
&mp->auxtrace_mp, map->base, fd))
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index e566c19b242b..b3f724fad22e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -72,6 +72,7 @@ enum bkw_mmap_state {
struct mmap_params {
int prot, mask, nr_cblocks, affinity;
struct auxtrace_mmap_params auxtrace_mp;
+ const struct cpu_map *cpu_map;
};
int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);
* [PATCH v2 4/4] perf record: implement --affinity=node|cpu option
2018-12-24 12:11 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
` (2 preceding siblings ...)
2018-12-24 12:27 ` [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers Alexey Budankov
@ 2018-12-24 12:28 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-01 21:39 ` Jiri Olsa
3 siblings, 2 replies; 25+ messages in thread
From: Alexey Budankov @ 2018-12-24 12:28 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Implement the --affinity=node|cpu option for the record mode. The default
mode keeps the system affinity mask, i.e. no bouncing.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
tools/perf/Documentation/perf-record.txt | 5 +++++
tools/perf/builtin-record.c | 18 ++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d232b13ea713..efb839784f32 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -440,6 +440,11 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
Asynchronous mode is supported only when linking Perf tool with libc library
providing implementation for Posix AIO API.
+--affinity=mode::
+Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
+ node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
+ cpu - thread affinity mask is set to cpu of the processed mmap buffer
+
--all-kernel::
Configure all used events to run in kernel space.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index eea96794ee45..57dc3a45d16f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1653,6 +1653,21 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
ui__warning("unknown clockid %s, check man page\n", ostr);
return -1;
}
+static int record__parse_affinity(const struct option *opt, const char *str, int unset)
+{
+ struct record_opts *opts = (struct record_opts *)opt->value;
+
+ if (!unset) {
+ if (str) {
+ if (!strcasecmp(str, "node"))
+ opts->affinity = PERF_AFFINITY_NODE;
+ else if (!strcasecmp(str, "cpu"))
+ opts->affinity = PERF_AFFINITY_CPU;
+ }
+ }
+
+ return 0;
+}
static int record__parse_mmap_pages(const struct option *opt,
const char *str,
@@ -1961,6 +1976,9 @@ static struct option __record_options[] = {
&nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
record__aio_parse),
#endif
+ OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
+ "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
+ record__parse_affinity),
OPT_END()
};
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2018-12-24 12:24 ` [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:10 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-01 21:41 ` Jiri Olsa
2 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
>
> Allocate and bind AIO user space buffers to the memory nodes
> that mmap kernel buffers are bound to.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> Changes in v2:
> - implemented perf_mmap__aio_alloc, perf_mmap__aio_free, perf_mmap__aio_bind
> and put HAVE_LIBNUMA_SUPPORT #ifdefs in there
> ---
> tools/perf/util/mmap.c | 49 ++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index e68ba754a8e2..742fa9a8e498 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -10,6 +10,9 @@
> #include <sys/mman.h>
> #include <inttypes.h>
> #include <asm/bug.h>
> +#ifdef HAVE_LIBNUMA_SUPPORT
> +#include <numaif.h>
> +#endif
> #include "debug.h"
> #include "event.h"
> #include "mmap.h"
> @@ -154,6 +157,46 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
> }
>
> #ifdef HAVE_AIO_SUPPORT
> +
> +#ifdef HAVE_LIBNUMA_SUPPORT
> +static void perf_mmap__aio_alloc(void **data, size_t len)
> +{
> + *data = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
> +}
please make perf_mmap__aio_alloc return the pointer,
I don't see a need for the **data arg
also the perf_mmap__ prefix indicates it's a function over
'struct perf_mmap', which should be the first arg..
not sure, but perhaps that could also make the code
simpler, like:
static int perf_mmap__aio_alloc(struct perf_mmap *, int index);
static int perf_mmap__aio_bind(struct perf_mmap *, int index, int affinity);
static void perf_mmap__aio_free(struct perf_mmap *, int index);
jirka
* Re: [PATCH v2 4/4] perf record: implement --affinity=node|cpu option
2018-12-24 12:28 ` [PATCH v2 4/4] perf record: implement --affinity=node|cpu option Alexey Budankov
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:15 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
1 sibling, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:28:33PM +0300, Alexey Budankov wrote:
>
> Implement --affinity=node|cpu option for the record mode defaulting
> to system affinity mask bouncing.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> tools/perf/Documentation/perf-record.txt | 5 +++++
> tools/perf/builtin-record.c | 18 ++++++++++++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index d232b13ea713..efb839784f32 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -440,6 +440,11 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
> Asynchronous mode is supported only when linking Perf tool with libc library
> providing implementation for Posix AIO API.
>
> +--affinity=mode::
> +Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
> + node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
> + cpu - thread affinity mask is set to cpu of the processed mmap buffer
> +
> --all-kernel::
> Configure all used events to run in kernel space.
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index eea96794ee45..57dc3a45d16f 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1653,6 +1653,21 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
> ui__warning("unknown clockid %s, check man page\n", ostr);
> return -1;
> }
> +static int record__parse_affinity(const struct option *opt, const char *str, int unset)
> +{
> + struct record_opts *opts = (struct record_opts *)opt->value;
> +
> + if (!unset) {
> + if (str) {
> + if (!strcasecmp(str, "node"))
> + opts->affinity = PERF_AFFINITY_NODE;
> + else if (!strcasecmp(str, "cpu"))
> + opts->affinity = PERF_AFFINITY_CPU;
> + }
> + }
> +
> + return 0;
> +}
>
> static int record__parse_mmap_pages(const struct option *opt,
> const char *str,
> @@ -1961,6 +1976,9 @@ static struct option __record_options[] = {
> &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
> record__aio_parse),
> #endif
> + OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
> + "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
> + record__parse_affinity),
so this makes sense only when there's --aio and LIBNUMA
in place.. we should check for those and allow this option
only when both are available
jirka
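A minimal sketch of the capability check being requested — the function name and the two flags are hypothetical stand-ins for the tool's HAVE_AIO_SUPPORT / HAVE_LIBNUMA_SUPPORT configuration:

```c
#include <assert.h>

enum perf_affinity {
	PERF_AFFINITY_SYS = 0,
	PERF_AFFINITY_NODE,
	PERF_AFFINITY_CPU,
};

/* Reject --affinity=node|cpu unless both AIO trace streaming and
 * libnuma support are available, per the review comment.  The two
 * flags stand in for build-time/runtime capability checks. */
static int record__check_affinity(int affinity, int aio_enabled, int have_numa)
{
	if (affinity == PERF_AFFINITY_SYS)
		return 0;
	if (!aio_enabled || !have_numa)
		return -1;
	return 0;
}
```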
* Re: [PATCH v2 1/4] perf record: allocate affinity masks
2018-12-24 12:23 ` [PATCH v2 1/4] perf record: allocate affinity masks Alexey Budankov
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:10 ` Alexey Budankov
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:23:13PM +0300, Alexey Budankov wrote:
SNIP
> @@ -2143,6 +2152,8 @@ int cmd_record(int argc, const char **argv)
> if (verbose > 0)
> pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
>
> + pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]);
> +
> err = __cmd_record(&record, argc, argv);
> out:
> perf_evlist__delete(rec->evlist);
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index 388c6dd128b8..69f54529d81f 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -83,6 +83,14 @@ struct record_opts {
> clockid_t clockid;
> u64 clockid_res_ns;
> int nr_cblocks;
> + int affinity;
> +};
> +
> +enum perf_affinity {
> + PERF_AFFINITY_SYS = 0,
> + PERF_AFFINITY_NODE,
> + PERF_AFFINITY_CPU,
> + PERF_AFFINITY_EOF
PERF_AFFINITY_MAX might be better name
jirka
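As an aside, the _MAX sentinel idiom being suggested is convenient because the last enumerator doubles as the array size and bounds limit, which an _EOF name does not convey; a minimal sketch:

```c
#include <assert.h>
#include <string.h>

enum perf_affinity {
	PERF_AFFINITY_SYS = 0,
	PERF_AFFINITY_NODE,
	PERF_AFFINITY_CPU,
	PERF_AFFINITY_MAX	/* count of valid modes, not a mode itself */
};

/* The sentinel sizes the tag array and gives a natural bounds check. */
static const char * const affinity_tags[PERF_AFFINITY_MAX] = {
	"SYS", "NODE", "CPU"
};

static int affinity_valid(int a)
{
	return a >= 0 && a < PERF_AFFINITY_MAX;
}
```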
* Re: [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers
2018-12-24 12:27 ` [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers Alexey Budankov
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:13 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
1 sibling, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:27:17PM +0300, Alexey Budankov wrote:
SNIP
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 742fa9a8e498..a2095e4eda4b 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -361,6 +361,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
>
> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
> {
> + int c, nr_cpus, node;
> /*
> * The last one will be done at perf_mmap__consume(), so that we
> * make sure we don't prevent tools from consuming every last event in
> @@ -389,6 +390,17 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
> map->cpu = cpu;
>
> CPU_ZERO(&map->affinity_mask);
> + if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) {
> + nr_cpus = cpu_map__nr(mp->cpu_map);
> + node = cpu__get_node(map->cpu);
> + for (c = 0; c < nr_cpus; c++) {
> + if (cpu__get_node(c) == node) {
the 'c' is just an index here, I think you need to
use the real cpu value from the mp->cpu_map->map[c]
jirka
> + CPU_SET(c, &map->affinity_mask);
> + }
> + }
> + } else if (mp->affinity == PERF_AFFINITY_CPU) {
> + CPU_SET(map->cpu, &map->affinity_mask);
> + }
>
> if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
> &mp->auxtrace_mp, map->base, fd))
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index e566c19b242b..b3f724fad22e 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -72,6 +72,7 @@ enum bkw_mmap_state {
> struct mmap_params {
> int prot, mask, nr_cblocks, affinity;
> struct auxtrace_mmap_params auxtrace_mp;
> + const struct cpu_map *cpu_map;
> };
>
> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);
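The index-vs-cpu fix pointed out above can be sketched as follows; `struct cpu_map` and the node lookup are simplified stand-ins for perf's cpu_map and cpu__get_node():

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Simplified stand-ins for perf's cpu_map and cpu__get_node(). */
struct cpu_map {
	int nr;
	int map[8];	/* real (online) cpu numbers */
};

static int fake_cpu_node(int cpu)
{
	return cpu < 4 ? 0 : 1;	/* assumed two-node topology */
}

/* Build the node mask from the real cpu numbers in cpu_map->map[c],
 * not from the loop index itself -- the fix being asked for above. */
static void build_node_mask(const struct cpu_map *cpus, int node, cpu_set_t *mask)
{
	int c;

	CPU_ZERO(mask);
	for (c = 0; c < cpus->nr; c++) {
		int cpu = cpus->map[c];	/* real cpu, not index c */

		if (fake_cpu_node(cpu) == node)
			CPU_SET(cpu, mask);
	}
}
```

With a sparse cpu map (e.g. only odd cpus online) the index and the cpu number diverge, which is exactly the bug.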
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2018-12-24 12:24 ` [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:10 ` Alexey Budankov
2019-01-01 21:41 ` Jiri Olsa
2 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
SNIP
> +#else
> +static void perf_mmap__aio_alloc(void **data, size_t len)
> +{
> + *data = malloc(len);
> +}
> +
> +static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
> +{
> + zfree(data);
> +}
> +
> +static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
> + int cpu __maybe_unused, int affinity __maybe_unused)
> +{
> +}
> +#endif
> +
> static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> {
> int delta_max, i, prio;
> @@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> }
> delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
> for (i = 0; i < map->aio.nr_cblocks; ++i) {
> - map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
> + size_t mmap_len = perf_mmap__mmap_len(map);
WARNING: Missing a blank line after declarations
and plenty of others from scripts/checkpatch.pl,
please run that
jirka
* Re: [PATCH v2 4/4] perf record: implement --affinity=node|cpu option
2018-12-24 12:28 ` [PATCH v2 4/4] perf record: implement --affinity=node|cpu option Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:15 ` Alexey Budankov
2019-01-09 9:15 ` Alexey Budankov
1 sibling, 2 replies; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:28:33PM +0300, Alexey Budankov wrote:
>
> Implement --affinity=node|cpu option for the record mode defaulting
> to system affinity mask bouncing.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> tools/perf/Documentation/perf-record.txt | 5 +++++
> tools/perf/builtin-record.c | 18 ++++++++++++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index d232b13ea713..efb839784f32 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -440,6 +440,11 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
> Asynchronous mode is supported only when linking Perf tool with libc library
> providing implementation for Posix AIO API.
>
> +--affinity=mode::
> +Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
> + node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
> + cpu - thread affinity mask is set to cpu of the processed mmap buffer
> +
> --all-kernel::
> Configure all used events to run in kernel space.
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index eea96794ee45..57dc3a45d16f 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1653,6 +1653,21 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
> ui__warning("unknown clockid %s, check man page\n", ostr);
> return -1;
> }
> +static int record__parse_affinity(const struct option *opt, const char *str, int unset)
> +{
> + struct record_opts *opts = (struct record_opts *)opt->value;
> +
please use:
if (unset)
return 0;
if (str) {
...
}
jirka
> + if (!unset) {
> + if (str) {
> + if (!strcasecmp(str, "node"))
> + opts->affinity = PERF_AFFINITY_NODE;
> + else if (!strcasecmp(str, "cpu"))
> + opts->affinity = PERF_AFFINITY_CPU;
> + }
> + }
> +
> + return 0;
> +}
>
> static int record__parse_mmap_pages(const struct option *opt,
> const char *str,
> @@ -1961,6 +1976,9 @@ static struct option __record_options[] = {
> &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
> record__aio_parse),
> #endif
> + OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
> + "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
> + record__parse_affinity),
> OPT_END()
> };
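The early-return style requested above, in a self-contained sketch (the `struct option` plumbing is omitted and the affinity value is taken directly):

```c
#include <assert.h>
#include <strings.h>	/* strcasecmp */

enum perf_affinity {
	PERF_AFFINITY_SYS = 0,
	PERF_AFFINITY_NODE,
	PERF_AFFINITY_CPU,
};

/* Early-return form of the parser: bail out on unset first, so the
 * interesting logic is not nested two levels deep. */
static int parse_affinity(int *affinity, const char *str, int unset)
{
	if (unset)
		return 0;

	if (str) {
		if (!strcasecmp(str, "node"))
			*affinity = PERF_AFFINITY_NODE;
		else if (!strcasecmp(str, "cpu"))
			*affinity = PERF_AFFINITY_CPU;
	}

	return 0;
}
```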
* Re: [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers
2018-12-24 12:27 ` [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:14 ` Alexey Budankov
1 sibling, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:39 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:27:17PM +0300, Alexey Budankov wrote:
>
> Build node cpu masks for mmap data buffers. Apply node cpu
> masks to tool thread every time it references data buffers
> cross node or cross cpu.
>
> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
> ---
> Changes in v2:
> - separated AIO buffers binding to patch 2/4
> ---
> tools/perf/builtin-record.c | 9 +++++++++
> tools/perf/util/evlist.c | 6 +++++-
> tools/perf/util/mmap.c | 12 ++++++++++++
> tools/perf/util/mmap.h | 1 +
> 4 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index b26febb54d01..eea96794ee45 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -536,6 +536,9 @@ static int record__mmap_evlist(struct record *rec,
> struct record_opts *opts = &rec->opts;
> char msg[512];
>
> + if (opts->affinity != PERF_AFFINITY_SYS)
> + cpu__setup_cpunode_map();
> +
> if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
> opts->auxtrace_mmap_pages,
> opts->auxtrace_snapshot_mode,
> @@ -755,6 +758,12 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
> struct perf_mmap *map = &maps[i];
>
> if (map->base) {
> + if (rec->opts.affinity != PERF_AFFINITY_SYS &&
> + !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
> + CPU_ZERO(&rec->affinity_mask);
> + CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
> + sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
all this code depends on aio and LIBNUMA, let's keep it there then
also please add this and the affinity_mask setup code below to a function
thanks,
jirka
> + }
> if (!record__aio_enabled(rec)) {
> if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> rc = -1;
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 60e825be944a..5ca5bb5ea0db 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1028,7 +1028,11 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
> * Its value is decided by evsel's write_backward.
> * So &mp should not be passed through const pointer.
> */
> - struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity };
> + struct mmap_params mp = {
> + .nr_cblocks = nr_cblocks,
> + .affinity = affinity,
> + .cpu_map = cpus
> + };
>
> if (!evlist->mmap)
> evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 742fa9a8e498..a2095e4eda4b 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -361,6 +361,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
>
> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
> {
> + int c, nr_cpus, node;
> /*
> * The last one will be done at perf_mmap__consume(), so that we
> * make sure we don't prevent tools from consuming every last event in
> @@ -389,6 +390,17 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
> map->cpu = cpu;
>
> CPU_ZERO(&map->affinity_mask);
> + if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) {
> + nr_cpus = cpu_map__nr(mp->cpu_map);
> + node = cpu__get_node(map->cpu);
> + for (c = 0; c < nr_cpus; c++) {
> + if (cpu__get_node(c) == node) {
> + CPU_SET(c, &map->affinity_mask);
> + }
> + }
> + } else if (mp->affinity == PERF_AFFINITY_CPU) {
> + CPU_SET(map->cpu, &map->affinity_mask);
> + }
>
> if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
> &mp->auxtrace_mp, map->base, fd))
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index e566c19b242b..b3f724fad22e 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -72,6 +72,7 @@ enum bkw_mmap_state {
> struct mmap_params {
> int prot, mask, nr_cblocks, affinity;
> struct auxtrace_mmap_params auxtrace_mp;
> + const struct cpu_map *cpu_map;
> };
>
> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);
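Factoring the inline mask switching into a helper, as requested in the review, might look like this sketch with mocked-up types (the real code would call sched_setaffinity() at the marked spot):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

enum { PERF_AFFINITY_SYS = 0 };

/* Mocked-up slices of 'struct record' and 'struct perf_mmap'. */
struct rec_state { int affinity; cpu_set_t affinity_mask; };
struct map_state { cpu_set_t affinity_mask; };

/* Switch the tool thread's mask only when it actually changes;
 * returns 1 when a switch happened.  The real helper would call
 * sched_setaffinity(0, sizeof(mask), &mask) where noted. */
static int record__adjust_affinity(struct rec_state *rec,
				   const struct map_state *map)
{
	if (rec->affinity == PERF_AFFINITY_SYS ||
	    CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask))
		return 0;

	rec->affinity_mask = map->affinity_mask;
	/* sched_setaffinity() would go here */
	return 1;
}
```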
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2018-12-24 12:24 ` [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-01 21:41 ` Jiri Olsa
2019-01-09 9:12 ` Alexey Budankov
2 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-01 21:41 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
SNIP
> +static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
> +{
> + zfree(data);
> +}
> +
> +static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
> + int cpu __maybe_unused, int affinity __maybe_unused)
> +{
> +}
> +#endif
> +
> static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> {
> int delta_max, i, prio;
> @@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> }
> delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
> for (i = 0; i < map->aio.nr_cblocks; ++i) {
> - map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
> + size_t mmap_len = perf_mmap__mmap_len(map);
> + perf_mmap__aio_alloc(&(map->aio.data[i]), mmap_len);
> if (!map->aio.data[i]) {
> pr_debug2("failed to allocate data buffer area, error %m");
> return -1;
> }
> + perf_mmap__aio_bind(map->aio.data[i], mmap_len, map->cpu, mp->affinity);
this all does not work if bind fails.. I think we need to
propagate the error value here and fail
jirka
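Propagating the bind error as suggested could be sketched like this; the alloc/bind helpers are hypothetical stand-ins with error-code returns, and `bind_fails` simulates an mbind() failure so the unwind path can be exercised:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alloc/bind helpers with error-code returns. */
static int aio_alloc(void **data, size_t len)
{
	*data = malloc(len);
	return *data ? 0 : -1;
}

static int aio_bind(void *data, int bind_fails)
{
	(void)data;
	return bind_fails ? -1 : 0;	/* simulated mbind() result */
}

/* Allocate-and-bind loop that propagates a bind failure instead of
 * silently continuing, as the review asks. */
static int setup_buffers(void *bufs[], int nr, size_t len, int bind_fails)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (aio_alloc(&bufs[i], len))
			return -1;
		if (aio_bind(bufs[i], bind_fails)) {
			free(bufs[i]);
			bufs[i] = NULL;
			return -1;	/* propagate, don't ignore */
		}
	}
	return 0;
}
```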
* Re: [PATCH v2 1/4] perf record: allocate affinity masks
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:10 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:10 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:23:13PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> @@ -2143,6 +2152,8 @@ int cmd_record(int argc, const char **argv)
>> if (verbose > 0)
>> pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
>>
>> + pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]);
>> +
>> err = __cmd_record(&record, argc, argv);
>> out:
>> perf_evlist__delete(rec->evlist);
>> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
>> index 388c6dd128b8..69f54529d81f 100644
>> --- a/tools/perf/perf.h
>> +++ b/tools/perf/perf.h
>> @@ -83,6 +83,14 @@ struct record_opts {
>> clockid_t clockid;
>> u64 clockid_res_ns;
>> int nr_cblocks;
>> + int affinity;
>> +};
>> +
>> +enum perf_affinity {
>> + PERF_AFFINITY_SYS = 0,
>> + PERF_AFFINITY_NODE,
>> + PERF_AFFINITY_CPU,
>> + PERF_AFFINITY_EOF
>
> PERF_AFFINITY_MAX might be better name
Corrected in v3.
Thanks,
Alexey
>
> jirka
>
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:10 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:10 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
>>
>> Allocate and bind AIO user space buffers to the memory nodes
>> that mmap kernel buffers are bound to.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> Changes in v2:
>> - implemented perf_mmap__aio_alloc, perf_mmap__aio_free, perf_mmap__aio_bind
>> and put HAVE_LIBNUMA_SUPPORT #ifdefs in there
>> ---
>> tools/perf/util/mmap.c | 49 ++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index e68ba754a8e2..742fa9a8e498 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -10,6 +10,9 @@
>> #include <sys/mman.h>
>> #include <inttypes.h>
>> #include <asm/bug.h>
>> +#ifdef HAVE_LIBNUMA_SUPPORT
>> +#include <numaif.h>
>> +#endif
>> #include "debug.h"
>> #include "event.h"
>> #include "mmap.h"
>> @@ -154,6 +157,46 @@ void __weak auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp __mayb
>> }
>>
>> #ifdef HAVE_AIO_SUPPORT
>> +
>> +#ifdef HAVE_LIBNUMA_SUPPORT
>> +static void perf_mmap__aio_alloc(void **data, size_t len)
>> +{
>> + *data = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>> +}
>
> please make perf_mmap__aio_alloc return the pointer,
> I don't see a need for the **data arg
>
> also the perf_mmap__ prefix indicates it's a function over
> 'struct perf_mmap', which should be the first arg..
> not sure, but perhaps that could make the code also
> simpler, like:
>
> static int perf_mmap__aio_alloc(struct perf_mmap *, int index);
> static int perf_mmap__aio_bind(struct perf_mmap *, int index, int affinity);
> static void perf_mmap__aio_free(struct perf_mmap *, int index);
Makes sense. Applied all that in v3.
Thanks,
Alexey
>
> jirka
>
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:10 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:10 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> +#else
>> +static void perf_mmap__aio_alloc(void **data, size_t len)
>> +{
>> + *data = malloc(len);
>> +}
>> +
>> +static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
>> +{
>> + zfree(data);
>> +}
>> +
>> +static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
>> + int cpu __maybe_unused, int affinity __maybe_unused)
>> +{
>> +}
>> +#endif
>> +
>> static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>> {
>> int delta_max, i, prio;
>> @@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>> }
>> delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>> for (i = 0; i < map->aio.nr_cblocks; ++i) {
>> - map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
>> + size_t mmap_len = perf_mmap__mmap_len(map);
>
> WARNING: Missing a blank line after declarations
>
> and plenty of others from scripts/checkpatch.pl,
> please run that
Corrected in v3. Thanks!
Alexey
>
> jirka
>
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2019-01-01 21:41 ` Jiri Olsa
@ 2019-01-09 9:12 ` Alexey Budankov
2019-01-09 16:49 ` Jiri Olsa
0 siblings, 1 reply; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:12 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:41, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> +static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
>> +{
>> + zfree(data);
>> +}
>> +
>> +static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
>> + int cpu __maybe_unused, int affinity __maybe_unused)
>> +{
>> +}
>> +#endif
>> +
>> static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>> {
>> int delta_max, i, prio;
>> @@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>> }
>> delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>> for (i = 0; i < map->aio.nr_cblocks; ++i) {
>> - map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
>> + size_t mmap_len = perf_mmap__mmap_len(map);
>> + perf_mmap__aio_alloc(&(map->aio.data[i]), mmap_len);
>> if (!map->aio.data[i]) {
>> pr_debug2("failed to allocate data buffer area, error %m");
>> return -1;
>> }
>> + perf_mmap__aio_bind(map->aio.data[i], mmap_len, map->cpu, mp->affinity);
>
> this all does not work if bind fails.. I think we need to
> propagate the error value here and fail
Proceeding from this point still makes sense because the buffer
remains usable, and thread migration alone can bring performance
benefits. So the error is not fatal, and v3 emits an explicit
warning instead. If you still think it is better to propagate
the error from here, that can be implemented.
Thanks,
Alexey
>
> jirka
>
* Re: [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:13 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:13 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:27:17PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 742fa9a8e498..a2095e4eda4b 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -361,6 +361,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
>>
>> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
>> {
>> + int c, nr_cpus, node;
>> /*
>> * The last one will be done at perf_mmap__consume(), so that we
>> * make sure we don't prevent tools from consuming every last event in
>> @@ -389,6 +390,17 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
>> map->cpu = cpu;
>>
>> CPU_ZERO(&map->affinity_mask);
>> + if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) {
>> + nr_cpus = cpu_map__nr(mp->cpu_map);
>> + node = cpu__get_node(map->cpu);
>> + for (c = 0; c < nr_cpus; c++) {
>> + if (cpu__get_node(c) == node) {
>
> the 'c' is just an index here, I think you need to
> use the real cpu value from the mp->cpu_map->map[c]
Well, yes, mapping the c index to the online cpu number is more generic.
Corrected in v3. Thanks!
Alexey
>
> jirka
>
>> + CPU_SET(c, &map->affinity_mask);
>> + }
>> + }
>> + } else if (mp->affinity == PERF_AFFINITY_CPU) {
>> + CPU_SET(map->cpu, &map->affinity_mask);
>> + }
>>
>> if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
>> &mp->auxtrace_mp, map->base, fd))
>> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
>> index e566c19b242b..b3f724fad22e 100644
>> --- a/tools/perf/util/mmap.h
>> +++ b/tools/perf/util/mmap.h
>> @@ -72,6 +72,7 @@ enum bkw_mmap_state {
>> struct mmap_params {
>> int prot, mask, nr_cblocks, affinity;
>> struct auxtrace_mmap_params auxtrace_mp;
>> + const struct cpu_map *cpu_map;
>> };
>>
>> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);
>
* Re: [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:14 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:14 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:27:17PM +0300, Alexey Budankov wrote:
>>
>> Build node cpu masks for mmap data buffers. Apply node cpu
>> masks to tool thread every time it references data buffers
>> cross node or cross cpu.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> Changes in v2:
>> - separated AIO buffers binding to patch 2/4
>> ---
>> tools/perf/builtin-record.c | 9 +++++++++
>> tools/perf/util/evlist.c | 6 +++++-
>> tools/perf/util/mmap.c | 12 ++++++++++++
>> tools/perf/util/mmap.h | 1 +
>> 4 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index b26febb54d01..eea96794ee45 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -536,6 +536,9 @@ static int record__mmap_evlist(struct record *rec,
>> struct record_opts *opts = &rec->opts;
>> char msg[512];
>>
>> + if (opts->affinity != PERF_AFFINITY_SYS)
>> + cpu__setup_cpunode_map();
>> +
>> if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
>> opts->auxtrace_mmap_pages,
>> opts->auxtrace_snapshot_mode,
>> @@ -755,6 +758,12 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>> struct perf_mmap *map = &maps[i];
>>
>> if (map->base) {
>> + if (rec->opts.affinity != PERF_AFFINITY_SYS &&
>> + !CPU_EQUAL(&rec->affinity_mask, &map->affinity_mask)) {
>> + CPU_ZERO(&rec->affinity_mask);
>> + CPU_OR(&rec->affinity_mask, &rec->affinity_mask, &map->affinity_mask);
>> + sched_setaffinity(0, sizeof(rec->affinity_mask), &rec->affinity_mask);
>
> all this code depends on aio and LIBNUMA, let's keep it there then
Please note that thread migration improves performance for the serial case too:
BENCH REPORT BASED ELAPSED TIME BASED
v4.20.0-rc5
(tip perf/core):
(current) SERIAL-SYS / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
SERIAL-CPU / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
mbind() for AIO buffers is the only related adjustment.
>
> also please add this and the affinity_mask setup code below to a function
Separated the code into record__adjust_affinity() and perf_mmap__setup_affinity_mask() in v3.
Thanks,
Alexey
>
> thanks,
> jirka
>
>> + }
>> if (!record__aio_enabled(rec)) {
>> if (perf_mmap__push(map, rec, record__pushfn) != 0) {
>> rc = -1;
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 60e825be944a..5ca5bb5ea0db 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1028,7 +1028,11 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
>> * Its value is decided by evsel's write_backward.
>> * So &mp should not be passed through const pointer.
>> */
>> - struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity };
>> + struct mmap_params mp = {
>> + .nr_cblocks = nr_cblocks,
>> + .affinity = affinity,
>> + .cpu_map = cpus
>> + };
>>
>> if (!evlist->mmap)
>> evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
>> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
>> index 742fa9a8e498..a2095e4eda4b 100644
>> --- a/tools/perf/util/mmap.c
>> +++ b/tools/perf/util/mmap.c
>> @@ -361,6 +361,7 @@ void perf_mmap__munmap(struct perf_mmap *map)
>>
>> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu)
>> {
>> + int c, nr_cpus, node;
>> /*
>> * The last one will be done at perf_mmap__consume(), so that we
>> * make sure we don't prevent tools from consuming every last event in
>> @@ -389,6 +390,17 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
>> map->cpu = cpu;
>>
>> CPU_ZERO(&map->affinity_mask);
>> + if (mp->affinity == PERF_AFFINITY_NODE && cpu__max_node() > 1) {
>> + nr_cpus = cpu_map__nr(mp->cpu_map);
>> + node = cpu__get_node(map->cpu);
>> + for (c = 0; c < nr_cpus; c++) {
>> + if (cpu__get_node(c) == node) {
>> + CPU_SET(c, &map->affinity_mask);
>> + }
>> + }
>> + } else if (mp->affinity == PERF_AFFINITY_CPU) {
>> + CPU_SET(map->cpu, &map->affinity_mask);
>> + }
>>
>> if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
>> &mp->auxtrace_mp, map->base, fd))
>> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
>> index e566c19b242b..b3f724fad22e 100644
>> --- a/tools/perf/util/mmap.h
>> +++ b/tools/perf/util/mmap.h
>> @@ -72,6 +72,7 @@ enum bkw_mmap_state {
>> struct mmap_params {
>> int prot, mask, nr_cblocks, affinity;
>> struct auxtrace_mmap_params auxtrace_mp;
>> + const struct cpu_map *cpu_map;
>> };
>>
>> int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);
>
* Re: [PATCH v2 4/4] perf record: implement --affinity=node|cpu option
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:15 ` Alexey Budankov
2019-01-09 9:15 ` Alexey Budankov
1 sibling, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:15 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:28:33PM +0300, Alexey Budankov wrote:
>>
>> Implement --affinity=node|cpu option for the record mode defaulting
>> to system affinity mask bouncing.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> tools/perf/Documentation/perf-record.txt | 5 +++++
>> tools/perf/builtin-record.c | 18 ++++++++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index d232b13ea713..efb839784f32 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -440,6 +440,11 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
>> Asynchronous mode is supported only when linking Perf tool with libc library
>> providing implementation for Posix AIO API.
>>
>> +--affinity=mode::
>> +Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
>> + node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
>> + cpu - thread affinity mask is set to cpu of the processed mmap buffer
>> +
>> --all-kernel::
>> Configure all used events to run in kernel space.
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index eea96794ee45..57dc3a45d16f 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -1653,6 +1653,21 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
>> ui__warning("unknown clockid %s, check man page\n", ostr);
>> return -1;
>> }
>> +static int record__parse_affinity(const struct option *opt, const char *str, int unset)
>> +{
>> + struct record_opts *opts = (struct record_opts *)opt->value;
>> +
>
> please use:
>
> if (unset)
> return 0;
>
> if (str) {
> ...
> }
Corrected in v3.
Thanks,
Alexey
>
> jirka
>
>> + if (!unset) {
>> + if (str) {
>> + if (!strcasecmp(str, "node"))
>> + opts->affinity = PERF_AFFINITY_NODE;
>> + else if (!strcasecmp(str, "cpu"))
>> + opts->affinity = PERF_AFFINITY_CPU;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>>
>> static int record__parse_mmap_pages(const struct option *opt,
>> const char *str,
>> @@ -1961,6 +1976,9 @@ static struct option __record_options[] = {
>> &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
>> record__aio_parse),
>> #endif
>> + OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
>> + "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
>> + record__parse_affinity),
>> OPT_END()
>> };
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/4] perf record: implement --affinity=node|cpu option
2019-01-01 21:39 ` Jiri Olsa
@ 2019-01-09 9:15 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:15 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:28:33PM +0300, Alexey Budankov wrote:
>>
>> Implement --affinity=node|cpu option for the record mode defaulting
>> to system affinity mask bouncing.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> tools/perf/Documentation/perf-record.txt | 5 +++++
>> tools/perf/builtin-record.c | 18 ++++++++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index d232b13ea713..efb839784f32 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -440,6 +440,11 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
>> Asynchronous mode is supported only when linking Perf tool with libc library
>> providing implementation for Posix AIO API.
>>
>> +--affinity=mode::
>> +Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
>> + node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
>> + cpu - thread affinity mask is set to cpu of the processed mmap buffer
>> +
>> --all-kernel::
>> Configure all used events to run in kernel space.
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index eea96794ee45..57dc3a45d16f 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -1653,6 +1653,21 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
>> ui__warning("unknown clockid %s, check man page\n", ostr);
>> return -1;
>> }
>> +static int record__parse_affinity(const struct option *opt, const char *str, int unset)
>> +{
>> + struct record_opts *opts = (struct record_opts *)opt->value;
>> +
>> + if (!unset) {
>> + if (str) {
>> + if (!strcasecmp(str, "node"))
>> + opts->affinity = PERF_AFFINITY_NODE;
>> + else if (!strcasecmp(str, "cpu"))
>> + opts->affinity = PERF_AFFINITY_CPU;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>>
>> static int record__parse_mmap_pages(const struct option *opt,
>> const char *str,
>> @@ -1961,6 +1976,9 @@ static struct option __record_options[] = {
>> &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
>> record__aio_parse),
>> #endif
>> + OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
>> + "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
>> + record__parse_affinity),
>
> so this makes sense only when there's --aio and LIBNUMA
> in place.. we should check for those and allow this only
> in those cases
Serial trace streaming also benefits from that.
Thanks,
Alexey
>
> jirka
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/4] perf record: implement --affinity=node|cpu option
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:15 ` Alexey Budankov
@ 2019-01-09 9:15 ` Alexey Budankov
1 sibling, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 9:15 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Hi,
On 02.01.2019 0:39, Jiri Olsa wrote:
> On Mon, Dec 24, 2018 at 03:28:33PM +0300, Alexey Budankov wrote:
>>
>> Implement --affinity=node|cpu option for the record mode defaulting
>> to system affinity mask bouncing.
>>
>> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
>> ---
>> tools/perf/Documentation/perf-record.txt | 5 +++++
>> tools/perf/builtin-record.c | 18 ++++++++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index d232b13ea713..efb839784f32 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -440,6 +440,11 @@ Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default:
>> Asynchronous mode is supported only when linking Perf tool with libc library
>> providing implementation for Posix AIO API.
>>
>> +--affinity=mode::
>> +Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
>> + node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
>> + cpu - thread affinity mask is set to cpu of the processed mmap buffer
>> +
>> --all-kernel::
>> Configure all used events to run in kernel space.
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index eea96794ee45..57dc3a45d16f 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -1653,6 +1653,21 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
>> ui__warning("unknown clockid %s, check man page\n", ostr);
>> return -1;
>> }
>> +static int record__parse_affinity(const struct option *opt, const char *str, int unset)
>> +{
>> + struct record_opts *opts = (struct record_opts *)opt->value;
>> +
>
> please use:
>
> if (unset)
> return 0;
>
> if (str) {
> ...
> }
>
Addressed in v3.
Thanks,
Alexey
> jirka
>
>> + if (!unset) {
>> + if (str) {
>> + if (!strcasecmp(str, "node"))
>> + opts->affinity = PERF_AFFINITY_NODE;
>> + else if (!strcasecmp(str, "cpu"))
>> + opts->affinity = PERF_AFFINITY_CPU;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>>
>> static int record__parse_mmap_pages(const struct option *opt,
>> const char *str,
>> @@ -1961,6 +1976,9 @@ static struct option __record_options[] = {
>> &nr_cblocks_default, "n", "Use <n> control blocks in asynchronous trace writing mode (default: 1, max: 4)",
>> record__aio_parse),
>> #endif
>> + OPT_CALLBACK(0, "affinity", &record.opts, "node|cpu",
>> + "Set affinity mask of trace reading thread to NUMA node cpu mask or cpu of processed mmap buffer",
>> + record__parse_affinity),
>> OPT_END()
>> };
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2019-01-09 9:12 ` Alexey Budankov
@ 2019-01-09 16:49 ` Jiri Olsa
2019-01-09 18:14 ` Alexey Budankov
0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2019-01-09 16:49 UTC (permalink / raw)
To: Alexey Budankov
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On Wed, Jan 09, 2019 at 12:12:37PM +0300, Alexey Budankov wrote:
> Hi,
>
> On 02.01.2019 0:41, Jiri Olsa wrote:
> > On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
> >
> > SNIP
> >
> >> +static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
> >> +{
> >> + zfree(data);
> >> +}
> >> +
> >> +static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
> >> + int cpu __maybe_unused, int affinity __maybe_unused)
> >> +{
> >> +}
> >> +#endif
> >> +
> >> static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> >> {
> >> int delta_max, i, prio;
> >> @@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
> >> }
> >> delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
> >> for (i = 0; i < map->aio.nr_cblocks; ++i) {
> >> - map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
> >> + size_t mmap_len = perf_mmap__mmap_len(map);
> >> + perf_mmap__aio_alloc(&(map->aio.data[i]), mmap_len);
> >> if (!map->aio.data[i]) {
> >> pr_debug2("failed to allocate data buffer area, error %m");
> >> return -1;
> >> }
> >> + perf_mmap__aio_bind(map->aio.data[i], mmap_len, map->cpu, mp->affinity);
> >
> > this all does not work if bind fails.. I think we need to
> > propagate the error value here and fail
>
> Proceeding past this point still makes sense because
> the buffer remains usable and thread migration
> alone can bring performance benefits. So the error is not fatal,
> and an explicit warning is implemented in v3. If you still think
> it is better to propagate the error from here, it can be implemented.
so if that fails then the aio buffers won't be bound to a node,
while the mmaps are, so I guess the speedup is from there?
if I use:
# perf record --aio --affinity=node
and see:
"failed to bind..."
I can still see the benefit..? I guess the warning is ok then,
another option seems confusing
jirka
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes
2019-01-09 16:49 ` Jiri Olsa
@ 2019-01-09 18:14 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2019-01-09 18:14 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
On 09.01.2019 19:49, Jiri Olsa wrote:
> On Wed, Jan 09, 2019 at 12:12:37PM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 02.01.2019 0:41, Jiri Olsa wrote:
>>> On Mon, Dec 24, 2018 at 03:24:36PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>> +static void perf_mmap__aio_free(void **data, size_t len __maybe_unused)
>>>> +{
>>>> + zfree(data);
>>>> +}
>>>> +
>>>> +static void perf_mmap__aio_bind(void *data __maybe_unused, size_t len __maybe_unused,
>>>> + int cpu __maybe_unused, int affinity __maybe_unused)
>>>> +{
>>>> +}
>>>> +#endif
>>>> +
>>>> static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>>> {
>>>> int delta_max, i, prio;
>>>> @@ -177,11 +220,13 @@ static int perf_mmap__aio_mmap(struct perf_mmap *map, struct mmap_params *mp)
>>>> }
>>>> delta_max = sysconf(_SC_AIO_PRIO_DELTA_MAX);
>>>> for (i = 0; i < map->aio.nr_cblocks; ++i) {
>>>> - map->aio.data[i] = malloc(perf_mmap__mmap_len(map));
>>>> + size_t mmap_len = perf_mmap__mmap_len(map);
>>>> + perf_mmap__aio_alloc(&(map->aio.data[i]), mmap_len);
>>>> if (!map->aio.data[i]) {
>>>> pr_debug2("failed to allocate data buffer area, error %m");
>>>> return -1;
>>>> }
>>>> + perf_mmap__aio_bind(map->aio.data[i], mmap_len, map->cpu, mp->affinity);
>>>
>>> this all does not work if bind fails.. I think we need to
>>> propagate the error value here and fail
>>
>> Proceeding past this point still makes sense because
>> the buffer remains usable and thread migration
>> alone can bring performance benefits. So the error is not fatal,
>> and an explicit warning is implemented in v3. If you still think
>> it is better to propagate the error from here, it can be implemented.
>
> so if that fails then the aio buffers won't be bound to a node,
> while the mmaps are, so I guess the speedup is from there?
>
> if I use:
>
> # perf record --aio --affinity=node
>
> and see:
> "failed to bind..."
>
> I can still see the benefit..? I guess the warning is ok then,
It can still bring benefits. Kernel buffers are allocated locally,
and the tool thread migrates to make sure it reads data from those buffers locally.
Even if the AIO buffers fail to be bound locally, which is quite a rare case,
--affinity=node should still be no slower than running without thread migration.
> another option seems confusing
Do you mean --affinity=cpu?
So, well, those cases (bind failure plus affinity=node or affinity=cpu) could
be tested, of course, but it looks simpler and safer to implement error
reporting and stop, because, again, the cases are quite rare.
So let's stay on the safer side in v4. :)
Thanks,
Alexey
>
> jirka
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 1/4] perf record: allocate affinity masks
2018-12-13 7:07 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
@ 2018-12-13 7:18 ` Alexey Budankov
0 siblings, 0 replies; 25+ messages in thread
From: Alexey Budankov @ 2018-12-13 7:18 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra
Cc: Jiri Olsa, Namhyung Kim, Alexander Shishkin, Andi Kleen, linux-kernel
Allocate the affinity option and masks for the mmap data buffers and
the record thread, and initialize the allocated objects.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
---
Changes in v2:
- made debug affinity mode message user friendly
- converted affinity mode defines to enum values
---
tools/perf/builtin-record.c | 13 ++++++++++++-
tools/perf/perf.h | 8 ++++++++
tools/perf/util/evlist.c | 6 +++---
tools/perf/util/evlist.h | 2 +-
tools/perf/util/mmap.c | 2 ++
tools/perf/util/mmap.h | 3 ++-
6 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 882285fb9f64..b26febb54d01 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -81,12 +81,17 @@ struct record {
bool timestamp_boundary;
struct switch_output switch_output;
unsigned long long samples;
+ cpu_set_t affinity_mask;
};
static volatile int auxtrace_record__snapshot_started;
static DEFINE_TRIGGER(auxtrace_snapshot_trigger);
static DEFINE_TRIGGER(switch_output_trigger);
+static const char* affinity_tags[PERF_AFFINITY_EOF] = {
+ "SYS", "NODE", "CPU"
+};
+
static bool switch_output_signal(struct record *rec)
{
return rec->switch_output.signal &&
@@ -533,7 +538,8 @@ static int record__mmap_evlist(struct record *rec,
if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
opts->auxtrace_mmap_pages,
- opts->auxtrace_snapshot_mode, opts->nr_cblocks) < 0) {
+ opts->auxtrace_snapshot_mode,
+ opts->nr_cblocks, opts->affinity) < 0) {
if (errno == EPERM) {
pr_err("Permission error mapping pages.\n"
"Consider increasing "
@@ -1980,6 +1986,9 @@ int cmd_record(int argc, const char **argv)
# undef REASON
#endif
+ CPU_ZERO(&rec->affinity_mask);
+ rec->opts.affinity = PERF_AFFINITY_SYS;
+
rec->evlist = perf_evlist__new();
if (rec->evlist == NULL)
return -ENOMEM;
@@ -2143,6 +2152,8 @@ int cmd_record(int argc, const char **argv)
if (verbose > 0)
pr_info("nr_cblocks: %d\n", rec->opts.nr_cblocks);
+ pr_debug("affinity: %s\n", affinity_tags[rec->opts.affinity]);
+
err = __cmd_record(&record, argc, argv);
out:
perf_evlist__delete(rec->evlist);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 388c6dd128b8..69f54529d81f 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -83,6 +83,14 @@ struct record_opts {
clockid_t clockid;
u64 clockid_res_ns;
int nr_cblocks;
+ int affinity;
+};
+
+enum perf_affinity {
+ PERF_AFFINITY_SYS = 0,
+ PERF_AFFINITY_NODE,
+ PERF_AFFINITY_CPU,
+ PERF_AFFINITY_EOF
};
struct option;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e90575192209..60e825be944a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1018,7 +1018,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
*/
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
unsigned int auxtrace_pages,
- bool auxtrace_overwrite, int nr_cblocks)
+ bool auxtrace_overwrite, int nr_cblocks, int affinity)
{
struct perf_evsel *evsel;
const struct cpu_map *cpus = evlist->cpus;
@@ -1028,7 +1028,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
* Its value is decided by evsel's write_backward.
* So &mp should not be passed through const pointer.
*/
- struct mmap_params mp = { .nr_cblocks = nr_cblocks };
+ struct mmap_params mp = { .nr_cblocks = nr_cblocks, .affinity = affinity };
if (!evlist->mmap)
evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
@@ -1060,7 +1060,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
{
- return perf_evlist__mmap_ex(evlist, pages, 0, false, 0);
+ return perf_evlist__mmap_ex(evlist, pages, 0, false, 0, PERF_AFFINITY_SYS);
}
int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 868294491194..72728d7f4432 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -162,7 +162,7 @@ unsigned long perf_event_mlock_kb_in_pages(void);
int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
unsigned int auxtrace_pages,
- bool auxtrace_overwrite, int nr_cblocks);
+ bool auxtrace_overwrite, int nr_cblocks, int affinity);
int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 8fc39311a30d..e68ba754a8e2 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -343,6 +343,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int c
map->fd = fd;
map->cpu = cpu;
+ CPU_ZERO(&map->affinity_mask);
+
if (auxtrace_mmap__mmap(&map->auxtrace_mmap,
&mp->auxtrace_mp, map->base, fd))
return -1;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index aeb6942fdb00..e566c19b242b 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -38,6 +38,7 @@ struct perf_mmap {
int nr_cblocks;
} aio;
#endif
+ cpu_set_t affinity_mask;
};
/*
@@ -69,7 +70,7 @@ enum bkw_mmap_state {
};
struct mmap_params {
- int prot, mask, nr_cblocks;
+ int prot, mask, nr_cblocks, affinity;
struct auxtrace_mmap_params auxtrace_mp;
};
^ permalink raw reply related [flat|nested] 25+ messages in thread
end of thread, other threads:[~2019-01-09 18:14 UTC | newest]
Thread overview: 25+ messages
2018-12-24 12:11 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
2018-12-24 12:23 ` [PATCH v2 1/4] perf record: allocate affinity masks Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:10 ` Alexey Budankov
2018-12-24 12:24 ` [PATCH v2 2/4] perf record: bind the AIO user space buffers to nodes Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:10 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:10 ` Alexey Budankov
2019-01-01 21:41 ` Jiri Olsa
2019-01-09 9:12 ` Alexey Budankov
2019-01-09 16:49 ` Jiri Olsa
2019-01-09 18:14 ` Alexey Budankov
2018-12-24 12:27 ` [PATCH v2 3/4] perf record: apply affinity masks when reading mmap buffers Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:13 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:14 ` Alexey Budankov
2018-12-24 12:28 ` [PATCH v2 4/4] perf record: implement --affinity=node|cpu option Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:15 ` Alexey Budankov
2019-01-01 21:39 ` Jiri Olsa
2019-01-09 9:15 ` Alexey Budankov
2019-01-09 9:15 ` Alexey Budankov
-- strict thread matches above, loose matches on Subject: below --
2018-12-13 7:07 [PATCH v2 0/4] Reduce NUMA related overhead in perf record profiling on large server systems Alexey Budankov
2018-12-13 7:18 ` [PATCH v2 1/4] perf record: allocate affinity masks Alexey Budankov