All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v0 0/2] perf: Allow forcing high-order allocation for AUX buffers
@ 2019-02-13 11:47 Alexander Shishkin
  2019-02-13 11:47 ` [PATCH v0 1/2] perf: Add an option to ask for high order allocations " Alexander Shishkin
  2019-02-13 11:47 ` [PATCH v0 2/2] perf record: Add --aux-highorder Alexander Shishkin
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Shishkin @ 2019-02-13 11:47 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, jolsa, Alexander Shishkin

Hi Peter and Arnaldo,

It turns out that using high-order allocations for AUX buffers reduces the
run-time performance penalty, for example, with Intel PT. The assumption is
that this comes from not having to fetch the next page's address at every
page boundary. Given a workload that does a lot of indirect branches (thus
generating more PT data with addresses of branch targets), it takes around
6% longer to complete under PT in snapshot mode than without PT, but only
around 4% if we use high-order output regions instead of single page output
regions. This is measured on an Atom CPU.

We already use high-order allocations for PMUs that don't do HW SG (like
Intel PT on BDW). This patchset adds an attribute bit that enables the
same for the PMUs that do have HW SG, and a command line option for perf
record to set this bit.

Alexander Shishkin (2):
  perf: Add an option to ask for high order allocations for AUX buffers
  perf record: Add --aux-highorder

 include/uapi/linux/perf_event.h       | 3 ++-
 kernel/events/core.c                  | 3 +++
 kernel/events/ring_buffer.c           | 3 ++-
 tools/include/uapi/linux/perf_event.h | 3 ++-
 tools/perf/builtin-record.c           | 2 ++
 tools/perf/perf.h                     | 1 +
 tools/perf/util/evsel.c               | 9 +++++++++
 7 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v0 1/2] perf: Add an option to ask for high order allocations for AUX buffers
  2019-02-13 11:47 [PATCH v0 0/2] perf: Allow forcing high-order allocation for AUX buffers Alexander Shishkin
@ 2019-02-13 11:47 ` Alexander Shishkin
  2019-02-13 13:07   ` Peter Zijlstra
  2019-02-13 11:47 ` [PATCH v0 2/2] perf record: Add --aux-highorder Alexander Shishkin
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Shishkin @ 2019-02-13 11:47 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, jolsa, Alexander Shishkin

Currently, the AUX buffer allocator will use high-order allocations
for PMUs that don't support hardware scatter-gather chaining to ensure
large contiguous blocks of pages, and always use an array of single
pages otherwise.

There is, however, a tangible performance benefit in using larger chunks
of contiguous memory even in the latter case, that comes from not having
to fetch the next page's address at every page boundary. In particular,
a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
penalty with a single multi-page output region in snapshot mode (no PMI)
than with multiple single-page output regions, from ~6% down to ~4%. For
the snapshot mode it does make a difference as it is intended to run over
long periods of time.

Following the above justification, add an attribute bit to ask for a
high-order AUX allocation. To prevent an unprivileged user from using up
the higher orders of the page allocator, require CAP_SYS_ADMIN for this
option.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/uapi/linux/perf_event.h | 3 ++-
 kernel/events/core.c            | 3 +++
 kernel/events/ring_buffer.c     | 3 ++-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..04726b5729c8 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_highorder  :  1, /* use high order allocations for AUX data */
+				__reserved_1   : 32;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5aeb4c74fb99..ba95398505c5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10688,6 +10688,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	    perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
+	if (attr.aux_highorder && !capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
 	/*
 	 * In cgroup mode, the pid argument is used to pass the fd
 	 * opened to the cgroup directory in cgroupfs. The cpu argument
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 70ae2422cbaf..72b7380deb0a 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -603,7 +603,8 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
-	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
+	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG ||
+	    event->attr.aux_highorder) {
 		/*
 		 * We need to start with the max_order that fits in nr_pages,
 		 * not the other way around, hence ilog2() and not get_order.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v0 2/2] perf record: Add --aux-highorder
  2019-02-13 11:47 [PATCH v0 0/2] perf: Allow forcing high-order allocation for AUX buffers Alexander Shishkin
  2019-02-13 11:47 ` [PATCH v0 1/2] perf: Add an option to ask for high order allocations " Alexander Shishkin
@ 2019-02-13 11:47 ` Alexander Shishkin
  1 sibling, 0 replies; 8+ messages in thread
From: Alexander Shishkin @ 2019-02-13 11:47 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, jolsa, Alexander Shishkin

Allow requesting high-order AUX allocations from the perf record command
line. Since this operation requires CAP_SYS_ADMIN, adjust the error message
to suggest this as a potential reason of failure.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/include/uapi/linux/perf_event.h | 3 ++-
 tools/perf/builtin-record.c           | 2 ++
 tools/perf/perf.h                     | 1 +
 tools/perf/util/evsel.c               | 9 +++++++++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..04726b5729c8 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_highorder  :  1, /* use high order allocations for AUX data */
+				__reserved_1   : 32;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3fdfbaebd95e..9f8d5eba24a8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1929,6 +1929,8 @@ static struct option __record_options[] = {
 	parse_clockid),
 	OPT_STRING_OPTARG('S', "snapshot", &record.opts.auxtrace_snapshot_opts,
 			  "opts", "AUX area tracing Snapshot Mode", ""),
+	OPT_BOOLEAN(0, "aux-highorder", &record.opts.aux_highorder,
+		    "Use high-order allocation for AUX buffers"),
 	OPT_UINTEGER(0, "proc-map-timeout", &proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
 	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index b120e547ddc7..42f1fa5faeed 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -67,6 +67,7 @@ struct record_opts {
 	bool	     strict_freq;
 	bool	     sample_id;
 	bool	     bpf_event;
+	bool	     aux_highorder;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 684c893ca6bc..3f4dc8f79d2f 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1098,6 +1098,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	if (evsel->own_cpus || evsel->unit)
 		evsel->attr.read_format |= PERF_FORMAT_ID;
 
+	if (opts->aux_highorder)
+		attr->aux_highorder = 1;
+
 	/*
 	 * Apply event specific term settings,
 	 * it overloads any global configuration.
@@ -1657,6 +1660,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(namespaces, p_unsigned);
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
+	PRINT_ATTRf(aux_highorder, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
@@ -2891,6 +2895,11 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 				"No permission to enable %s event.\n\n",
 				perf_evsel__name(evsel));
 
+		if (evsel->attr.aux_highorder)
+			printed += scnprintf(msg + printed, size - printed,
+					     "Using --aux-highorder may not be possible with your privilege level,\n"
+					     "which requires CAP_SYS_ADMIN capability. Consider using sudo.\n\n");
+
 		return scnprintf(msg + printed, size - printed,
 		 "You may not have permission to collect %sstats.\n\n"
 		 "Consider tweaking /proc/sys/kernel/perf_event_paranoid,\n"
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v0 1/2] perf: Add an option to ask for high order allocations for AUX buffers
  2019-02-13 11:47 ` [PATCH v0 1/2] perf: Add an option to ask for high order allocations " Alexander Shishkin
@ 2019-02-13 13:07   ` Peter Zijlstra
  2019-02-13 13:49     ` Alexander Shishkin
  2019-02-13 17:47     ` Mel Gorman
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2019-02-13 13:07 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa, mgorman

On Wed, Feb 13, 2019 at 01:47:15PM +0200, Alexander Shishkin wrote:
> Currently, the AUX buffer allocator will use high-order allocations
> for PMUs that don't support hardware scatter-gather chaining to ensure
> large contiguous blocks of pages, and always use an array of single
> pages otherwise.
> 
> There is, however, a tangible performance benefit in using larger chunks
> of contiguous memory even in the latter case, that comes from not having
> to fetch the next page's address at every page boundary. In particular,
> a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
> penalty with a single multi-page output region in snapshot mode (no PMI)
> than with multiple single-page output regions, from ~6% down to ~4%. For
> the snapshot mode it does make a difference as it is intended to run over
> long periods of time.
> 
> Following the above justification, add an attribute bit to ask for a
> high-order AUX allocation. To prevent an unprivileged user from using up
> the higher orders of the page allocator, require CAP_SYS_ADMIN for this
> option.

Why do we need a knob for that? Last time I checked unpriv users could
fragment the page allocator just fine. What is there to protect?

Also, since we return all pages upon buffer free, the page allocator
should in fact re-construct the high order stuff.

So a buffer alloc + free, using high order pages, should be an effective
nop on high order availability.

Unlike spraying dentries or whatever works these days around the
machine.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v0 1/2] perf: Add an option to ask for high order allocations for AUX buffers
  2019-02-13 13:07   ` Peter Zijlstra
@ 2019-02-13 13:49     ` Alexander Shishkin
  2019-02-13 17:47     ` Mel Gorman
  1 sibling, 0 replies; 8+ messages in thread
From: Alexander Shishkin @ 2019-02-13 13:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa, mgorman

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Feb 13, 2019 at 01:47:15PM +0200, Alexander Shishkin wrote:
>> Currently, the AUX buffer allocator will use high-order allocations
>> for PMUs that don't support hardware scatter-gather chaining to ensure
>> large contiguous blocks of pages, and always use an array of single
>> pages otherwise.
>> 
>> There is, however, a tangible performance benefit in using larger chunks
>> of contiguous memory even in the latter case, that comes from not having
>> to fetch the next page's address at every page boundary. In particular,
>> a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
>> penalty with a single multi-page output region in snapshot mode (no PMI)
>> than with multiple single-page output regions, from ~6% down to ~4%. For
>> the snapshot mode it does make a difference as it is intended to run over
>> long periods of time.
>> 
>> Following the above justification, add an attribute bit to ask for a
>> high-order AUX allocation. To prevent an unprivileged user from using up
>> the higher orders of the page allocator, require CAP_SYS_ADMIN for this
>> option.
>
> Why do we need a knob for that? Last time I checked unpriv users could
> fragment the page allocator just fine. What is there to protect?
>
> Also, since we return all pages upon buffer free, the page allocator
> should in fact re-construct the high order stuff.
>
> So a buffer alloc + free, using high order pages, should be an effective
> nop on high order availability.
>
> Unlike spraying dentries or whatever works these days around the
> machine.

Good point. My worry is that since it's 'snapshot' mode, the buffer(s)
will be around potentially for a long time. Otherwise, I'd also prefer
to drop the capability wall.

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v0 1/2] perf: Add an option to ask for high order allocations for AUX buffers
  2019-02-13 13:07   ` Peter Zijlstra
  2019-02-13 13:49     ` Alexander Shishkin
@ 2019-02-13 17:47     ` Mel Gorman
  2019-02-13 17:54       ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2019-02-13 17:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa

On Wed, Feb 13, 2019 at 02:07:55PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 01:47:15PM +0200, Alexander Shishkin wrote:
> > Currently, the AUX buffer allocator will use high-order allocations
> > for PMUs that don't support hardware scatter-gather chaining to ensure
> > large contiguous blocks of pages, and always use an array of single
> > pages otherwise.
> > 
> > There is, however, a tangible performance benefit in using larger chunks
> > of contiguous memory even in the latter case, that comes from not having
> > to fetch the next page's address at every page boundary. In particular,
> > a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
> > penalty with a single multi-page output region in snapshot mode (no PMI)
> > than with multiple single-page output regions, from ~6% down to ~4%. For
> > the snapshot mode it does make a difference as it is intended to run over
> > long periods of time.
> > 
> > Following the above justification, add an attribute bit to ask for a
> > high-order AUX allocation. To prevent an unprivileged user from using up
> > the higher orders of the page allocator, require CAP_SYS_ADMIN for this
> > option.
> 
> Why do we need a knob for that? Last time I checked unpriv users could
> fragment the page allocator just fine. What is there to protect?
> 

It's "protected" in that we try to avoid long-lived unmovable allocations
causing problems but it's internal to the page allocator. The exception
is ZONE_MOVABLE and CMA which has some protection or static pools like
hugetlbfs with tunables and APIs that can strict access if desired.

> Also, since we return all pages upon buffer free, the page allocator
> should in fact re-construct the high order stuff.
> 

It does. At worse, other unmovable allocations may be allocated elsewhere
in the meantime but that's normal activity. It can be replicated in any
number of innocent ways by a normal user. It's even the basis of a test
case that forces fragmentation, measures THP allocation success rate and
stresses compaction.

> So a buffer alloc + free, using high order pages, should be an effective
> nop on high order availability.
> 

Shouldn't even be necessary. With the possible exception of hugetlbfs,
the userspace is not forced to take special action to keep the page
allocator happy.

If there is a tangiable performance benefit from using contiguous regions
then I would suggest optimistically allocating them with appropriate
GFP flags to avoid large latencies at startup time and fall back if
necessary. Don't stick it behind capabilities or restrict it to privileged
users. Only hugetlbfs provides restricted access and exposes an
interface to userspace for applications and even that can be
unprivileged.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v0 1/2] perf: Add an option to ask for high order allocations for AUX buffers
  2019-02-13 17:47     ` Mel Gorman
@ 2019-02-13 17:54       ` Peter Zijlstra
  2019-02-13 19:17         ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2019-02-13 17:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa

On Wed, Feb 13, 2019 at 05:47:56PM +0000, Mel Gorman wrote:
> If there is a tangiable performance benefit from using contiguous regions
> then I would suggest optimistically allocating them with appropriate
> GFP flags to avoid large latencies at startup time and fall back if
> necessary. 

Right; the code does the fallback thing. It successively tries smaller
order allocations until 0-order fails.

It currently uses:

#define PERF_AUX_GFP    (GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)

Is that what you could consider appropriate?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v0 1/2] perf: Add an option to ask for high order allocations for AUX buffers
  2019-02-13 17:54       ` Peter Zijlstra
@ 2019-02-13 19:17         ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2019-02-13 19:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, jolsa

On Wed, Feb 13, 2019 at 06:54:34PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 05:47:56PM +0000, Mel Gorman wrote:
> > If there is a tangiable performance benefit from using contiguous regions
> > then I would suggest optimistically allocating them with appropriate
> > GFP flags to avoid large latencies at startup time and fall back if
> > necessary. 
> 
> Right; the code does the fallback thing. It successively tries smaller
> order allocations until 0-order fails.
> 
> It currently uses:
> 
> #define PERF_AUX_GFP    (GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)
> 
> Is that what you could consider appropriate?

Yeah, spot on. Conceivably the semantics of __GFP_NORETY will change a
little at some point in the next year but not in any way I'd consider
harmful (depends on how THP and locality discussions go). Even *if*
we did something harmful, there will be complaints before it's problematic.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-02-13 19:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-13 11:47 [PATCH v0 0/2] perf: Allow forcing high-order allocation for AUX buffers Alexander Shishkin
2019-02-13 11:47 ` [PATCH v0 1/2] perf: Add an option to ask for high order allocations " Alexander Shishkin
2019-02-13 13:07   ` Peter Zijlstra
2019-02-13 13:49     ` Alexander Shishkin
2019-02-13 17:47     ` Mel Gorman
2019-02-13 17:54       ` Peter Zijlstra
2019-02-13 19:17         ` Mel Gorman
2019-02-13 11:47 ` [PATCH v0 2/2] perf record: Add --aux-highorder Alexander Shishkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.