linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/1] perf: High-order AUX allocations
@ 2019-02-15 11:47 Alexander Shishkin
  2019-02-15 11:47 ` [PATCH v1 1/1] perf: Optimistically use high order allocations for AUX buffers Alexander Shishkin
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Shishkin @ 2019-02-15 11:47 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa,
	Alexander Shishkin

Hi Peter,

This supersedes [1]. Since there is no reason not to always go with high
order allocations, we simply change the AUX allocator to optimistically
do that with a fallback to lower orders. And since this is not an opt-in
any more, no tool changes are needed.

[1] https://marc.info/?l=linux-kernel&m=155005848007143

Alexander Shishkin (1):
  perf: Optimistically use high order allocations for AUX buffers

 kernel/events/ring_buffer.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

-- 
2.20.1



* [PATCH v1 1/1] perf: Optimistically use high order allocations for AUX buffers
  2019-02-15 11:47 [PATCH v1 0/1] perf: High-order AUX allocations Alexander Shishkin
@ 2019-02-15 11:47 ` Alexander Shishkin
  2019-02-22 12:57   ` Alexander Shishkin
  2019-03-09 14:38   ` [tip:perf/urgent] perf/ring_buffer: Use high order allocations for AUX buffers optimistically tip-bot for Alexander Shishkin
  0 siblings, 2 replies; 4+ messages in thread
From: Alexander Shishkin @ 2019-02-15 11:47 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa,
	Alexander Shishkin

Currently, the AUX buffer allocator will use high-order allocations
for PMUs that don't support hardware scatter-gather chaining to ensure
large contiguous blocks of pages, and always use an array of single
pages otherwise.

There is, however, a tangible performance benefit in using larger chunks
of contiguous memory even in the latter case, that comes from not having
to fetch the next page's address at every page boundary. In particular,
a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
penalty with a single multi-page output region in snapshot mode (no PMI)
than with multiple single-page output regions, from ~6% down to ~4%. This
matters for snapshot mode in particular, as it is intended to run over
long periods of time.

For this reason, change the allocation policy to always optimistically
start with the highest possible order when allocating pages for the AUX
buffer, descending until the allocation succeeds or an order-zero
allocation fails.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/ring_buffer.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 70ae2422cbaf..cc395475e0dd 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -598,29 +598,27 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 {
 	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
 	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
-	int ret = -ENOMEM, max_order = 0;
+	int ret = -ENOMEM, max_order;
 
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
-	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
-		/*
-		 * We need to start with the max_order that fits in nr_pages,
-		 * not the other way around, hence ilog2() and not get_order.
-		 */
-		max_order = ilog2(nr_pages);
+	/*
+	 * We need to start with the max_order that fits in nr_pages,
+	 * not the other way around, hence ilog2() and not get_order.
+	 */
+	max_order = ilog2(nr_pages);
 
-		/*
-		 * PMU requests more than one contiguous chunks of memory
-		 * for SW double buffering
-		 */
-		if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
-		    !overwrite) {
-			if (!max_order)
-				return -EINVAL;
+	/*
+	 * PMU requests more than one contiguous chunks of memory
+	 * for SW double buffering
+	 */
+	if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
+	    !overwrite) {
+		if (!max_order)
+			return -EINVAL;
 
-			max_order--;
-		}
+		max_order--;
 	}
 
 	rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
-- 
2.20.1
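
For illustration only, the descending-order policy described in the commit
message can be modelled in userspace C. This is a sketch under stated
assumptions, not the kernel code: floor_log2(), try_alloc_descending(),
plan_aux_chunks() and the avail_order parameter are hypothetical names, and
the "allocator" simply pretends that any order up to avail_order succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* floor(log2(n)) for n >= 1; a userspace stand-in for the kernel's ilog2(). */
static int floor_log2(unsigned long n)
{
	int order = 0;

	assert(n >= 1);
	while (n >>= 1)
		order++;
	return order;
}

/*
 * Hypothetical allocator model: only orders <= avail_order can be satisfied.
 * Starts at 'order' and descends; returns the order obtained, or -1 if even
 * an order-zero request fails.
 */
static int try_alloc_descending(int order, int avail_order)
{
	do {
		if (order <= avail_order)
			return order;
	} while (order--);

	return -1;
}

/*
 * Split nr_pages into chunks, always asking for the highest order that still
 * fits in the remaining pages (capped at max_order). The order of each chunk
 * is written to out[]. Returns the number of chunks, or -1 on failure.
 */
static int plan_aux_chunks(unsigned long nr_pages, int max_order,
			   int avail_order, int *out, size_t out_len)
{
	unsigned long done = 0;
	size_t n = 0;

	while (done < nr_pages) {
		int want = floor_log2(nr_pages - done);
		int got;

		if (want > max_order)
			want = max_order;

		got = try_alloc_descending(want, avail_order);
		if (got < 0 || n == out_len)
			return -1;

		out[n++] = got;
		done += 1UL << got;
	}

	return (int)n;
}
```

With nr_pages = 16 and max_order = 4, this model yields a single order-4
chunk when avail_order is 4, and four order-2 chunks when only order 2 can
be satisfied.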



* Re: [PATCH v1 1/1] perf: Optimistically use high order allocations for AUX buffers
  2019-02-15 11:47 ` [PATCH v1 1/1] perf: Optimistically use high order allocations for AUX buffers Alexander Shishkin
@ 2019-02-22 12:57   ` Alexander Shishkin
  2019-03-09 14:38   ` [tip:perf/urgent] perf/ring_buffer: Use high order allocations for AUX buffers optimistically tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 4+ messages in thread
From: Alexander Shishkin @ 2019-02-22 12:57 UTC
  To: Peter Zijlstra; +Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, jolsa

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> Currently, the AUX buffer allocator will use high-order allocations
> for PMUs that don't support hardware scatter-gather chaining to ensure
> large contiguous blocks of pages, and always use an array of single
> pages otherwise.
>
> There is, however, a tangible performance benefit in using larger chunks
> of contiguous memory even in the latter case, that comes from not having
> to fetch the next page's address at every page boundary. In particular,
> a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
> penalty with a single multi-page output region in snapshot mode (no PMI)
> than with multiple single-page output regions, from ~6% down to ~4%. This
> matters for snapshot mode in particular, as it is intended to run over
> long periods of time.
>
> For this reason, change the allocation policy to always optimistically
> start with the highest possible order when allocating pages for the AUX
> buffer, descending until the allocation succeeds or an order-zero
> allocation fails.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Does anybody want to pick this up?

Thanks,
--
Alex


* [tip:perf/urgent] perf/ring_buffer: Use high order allocations for AUX buffers optimistically
  2019-02-15 11:47 ` [PATCH v1 1/1] perf: Optimistically use high order allocations for AUX buffers Alexander Shishkin
  2019-02-22 12:57   ` Alexander Shishkin
@ 2019-03-09 14:38   ` tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot for Alexander Shishkin @ 2019-03-09 14:38 UTC
  To: linux-tip-commits
  Cc: jolsa, acme, vincent.weaver, riel, mingo, hpa,
	alexander.shishkin, dave.hansen, luto, bp, peterz, tglx,
	torvalds, linux-kernel, eranian

Commit-ID:  5768402fd9c6e872252b5268ad85e3fbae4fe26b
Gitweb:     https://git.kernel.org/tip/5768402fd9c6e872252b5268ad85e3fbae4fe26b
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Fri, 15 Feb 2019 13:47:27 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 9 Mar 2019 14:10:30 +0100

perf/ring_buffer: Use high order allocations for AUX buffers optimistically

Currently, the AUX buffer allocator will use high-order allocations
for PMUs that don't support hardware scatter-gather chaining to ensure
large contiguous blocks of pages, and always use an array of single
pages otherwise.

There is, however, a tangible performance benefit in using larger chunks
of contiguous memory even in the latter case, that comes from not having
to fetch the next page's address at every page boundary. In particular,
a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
penalty with a single multi-page output region in snapshot mode (no PMI)
than with multiple single-page output regions, from ~6% down to ~4%. This
matters for snapshot mode in particular, as it is intended to run over
long periods of time.

For this reason, change the allocation policy to always optimistically
start with the highest possible order when allocating pages for the AUX
buffer, descending until the allocation succeeds or an order-zero
allocation fails.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lkml.kernel.org/r/20190215114727.62648-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/ring_buffer.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 678ccec60d8f..a4047321d7d8 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -598,29 +598,27 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 {
 	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
 	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
-	int ret = -ENOMEM, max_order = 0;
+	int ret = -ENOMEM, max_order;
 
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
-	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
-		/*
-		 * We need to start with the max_order that fits in nr_pages,
-		 * not the other way around, hence ilog2() and not get_order.
-		 */
-		max_order = ilog2(nr_pages);
+	/*
+	 * We need to start with the max_order that fits in nr_pages,
+	 * not the other way around, hence ilog2() and not get_order.
+	 */
+	max_order = ilog2(nr_pages);
 
-		/*
-		 * PMU requests more than one contiguous chunks of memory
-		 * for SW double buffering
-		 */
-		if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
-		    !overwrite) {
-			if (!max_order)
-				return -EINVAL;
+	/*
+	 * PMU requests more than one contiguous chunks of memory
+	 * for SW double buffering
+	 */
+	if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
+	    !overwrite) {
+		if (!max_order)
+			return -EINVAL;
 
-			max_order--;
-		}
+		max_order--;
 	}
 
 	rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
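
As a reading aid for the hunk above, the max_order computation can be
sketched in userspace C. The names order_that_fits(), order_that_covers()
and aux_max_order() are hypothetical: the first stands in for ilog2() applied
to a page count (rounds down), the second models a round-up order of the kind
the comment contrasts it with, and the two boolean parameters stand in for
the PERF_PMU_CAP_AUX_SW_DOUBLEBUF capability check and the overwrite flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Largest order whose chunk (1 << order pages) fits in nr_pages: floor(log2). */
static int order_that_fits(unsigned long nr_pages)
{
	int order = 0;

	assert(nr_pages >= 1);
	while (nr_pages >>= 1)
		order++;
	return order;
}

/* Smallest order whose chunk covers nr_pages: ceil(log2). */
static int order_that_covers(unsigned long nr_pages)
{
	int order = 0;

	assert(nr_pages >= 1);
	while ((1UL << order) < nr_pages)
		order++;
	return order;
}

/*
 * Model of the post-patch max_order logic: always start from the order that
 * fits, and drop one order when software double buffering needs at least two
 * contiguous chunks in non-overwrite mode.
 */
static int aux_max_order(unsigned long nr_pages, bool sw_doublebuf,
			 bool overwrite)
{
	int max_order = order_that_fits(nr_pages);

	if (sw_doublebuf && !overwrite) {
		if (!max_order)
			return -EINVAL;	/* a single page cannot be split in two */
		max_order--;
	}

	return max_order;
}
```

For a page count of 6, order_that_fits() gives 2 (a 4-page chunk) while
order_that_covers() gives 3 (an 8-page chunk, larger than the buffer), which
is why rounding down is the right starting point. For 8 pages with double
buffering and no overwrite, the model starts at order 2, leaving room for two
4-page chunks.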

