linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] perf, pt: Improve data loss
@ 2021-04-08 15:31 Alexander Shishkin
  2021-04-08 15:31 ` [PATCH 1/2] perf: Cap allocation order at aux_watermark Alexander Shishkin
  2021-04-08 15:31 ` [PATCH 2/2] perf intel-pt: Use aux_watermark Alexander Shishkin
  0 siblings, 2 replies; 5+ messages in thread
From: Alexander Shishkin @ 2021-04-08 15:31 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, adrian.hunter
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Mathieu Poirier,
	Alexander Shishkin

Hi guys,

There is a problem between the PT driver and the AUX allocator that results
in smaller buffers consisting of 2 high-order regions, which also means
only 2 possibilities of where PMI gets generated and where tracing stops.

This is not good enough for double buffering: when we get a PMI mid-buffer,
we update the ->aux_head etc and immediately start a new transaction while
observing ->aux_tail to still be zero, which makes the PT driver put a stop
bit at the end of the buffer. However quick userspace is to update the
->aux_tail, that second transaction/PERF_RECORD_AUX ends up truncated.

The proposed solution here is to set up attr.aux_watermark to a quarter of
the buffer. Unfortunately, at the moment, the PT driver is not equipped to
deal with aux_watermark that's smaller than the AUX allocation order. I
could fix this in the driver itself, but, seeing as it's the only PMU that
actually uses the 'order' from AUX allocations, I'd rather fix the
allocator instead, which is done in patch 1/2.

Patch 2/2 could be replaced by instead changing the in-kernel aux_watermark
default, but that may interfere with PMU drivers that don't ignore said
watermark / handle->wakeup (afaict, that's only arm_spe).

Alexander Shishkin (2):
  perf: Cap allocation order at aux_watermark
  perf intel-pt: Use aux_watermark

 kernel/events/ring_buffer.c         | 34 +++++++++++++++--------------
 tools/perf/arch/x86/util/intel-pt.c |  4 ++++
 2 files changed, 22 insertions(+), 16 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] perf: Cap allocation order at aux_watermark
  2021-04-08 15:31 [PATCH 0/2] perf, pt: Improve data loss Alexander Shishkin
@ 2021-04-08 15:31 ` Alexander Shishkin
  2021-04-08 15:31 ` [PATCH 2/2] perf intel-pt: Use aux_watermark Alexander Shishkin
  1 sibling, 0 replies; 5+ messages in thread
From: Alexander Shishkin @ 2021-04-08 15:31 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, adrian.hunter
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Mathieu Poirier,
	Alexander Shishkin

Currently, we start allocating AUX pages half the size of the total
requested AUX buffer size, ignoring the attr.aux_watermark setting. This,
in turn, makes intel_pt driver disregard the watermark also, as it uses
page order for its SG (ToPA) configuration.

Now, this can be fixed in the intel_pt PMU driver, but seeing as it's the
only one currently making use of high order allocations, there is no
reason not to fix the allocator instead. This way, any other driver
wishing to add this support would not have to worry about this.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/ring_buffer.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index bd55ccc91373..bd94b91bd4be 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -674,21 +674,26 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
-	/*
-	 * We need to start with the max_order that fits in nr_pages,
-	 * not the other way around, hence ilog2() and not get_order.
-	 */
-	max_order = ilog2(nr_pages);
-
-	/*
-	 * PMU requests more than one contiguous chunks of memory
-	 * for SW double buffering
-	 */
 	if (!overwrite) {
-		if (!max_order)
-			return -EINVAL;
+		/*
+		 * Watermark defaults to half the buffer, and so does the
+		 * max_order, to aid PMU drivers in double buffering.
+		 */
+		if (!watermark)
+			watermark = nr_pages << (PAGE_SHIFT - 1);
 
-		max_order--;
+		/*
+		 * Use aux_watermark as the basis for chunking to
+		 * help PMU drivers honor the watermark.
+		 */
+		max_order = get_order(watermark);
+	} else {
+		/*
+		* We need to start with the max_order that fits in nr_pages,
+		* not the other way around, hence ilog2() and not get_order.
+		*/
+		max_order = ilog2(nr_pages);
+		watermark = 0;
 	}
 
 	rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
@@ -743,9 +748,6 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
 	rb->aux_overwrite = overwrite;
 	rb->aux_watermark = watermark;
 
-	if (!rb->aux_watermark && !rb->aux_overwrite)
-		rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
-
 out:
 	if (!ret)
 		rb->aux_pgoff = pgoff;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] perf intel-pt: Use aux_watermark
  2021-04-08 15:31 [PATCH 0/2] perf, pt: Improve data loss Alexander Shishkin
  2021-04-08 15:31 ` [PATCH 1/2] perf: Cap allocation order at aux_watermark Alexander Shishkin
@ 2021-04-08 15:31 ` Alexander Shishkin
  2021-04-14  6:00   ` Adrian Hunter
  1 sibling, 1 reply; 5+ messages in thread
From: Alexander Shishkin @ 2021-04-08 15:31 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo, adrian.hunter
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Mathieu Poirier,
	Alexander Shishkin

Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace haven't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with TRUNCATED flag set.

Setting attr.aux_watermark to quarter of the buffer gives enough space for
the ->aux_tail update to be observed and prevents the data loss.

The obligatory before/after showcase:

> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers,
although less pronounced.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index a6420c647959..d00707faf547 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -776,6 +776,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 		}
 	}
 
+	if (opts->full_auxtrace)
+		intel_pt_evsel->core.attr.aux_watermark =
+		       opts->auxtrace_mmap_pages / 4 * page_size;
+
 	intel_pt_parse_terms(intel_pt_pmu->name, &intel_pt_pmu->format,
 			     "tsc", &tsc_bit);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] perf intel-pt: Use aux_watermark
  2021-04-08 15:31 ` [PATCH 2/2] perf intel-pt: Use aux_watermark Alexander Shishkin
@ 2021-04-14  6:00   ` Adrian Hunter
  2021-04-14 15:50     ` Alexander Shishkin
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Hunter @ 2021-04-14  6:00 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Mathieu Poirier

On 8/04/21 6:31 pm, Alexander Shishkin wrote:
> Turns out, the default setting of attr.aux_watermark to half of the total
> buffer size is not very useful, especially with smaller buffers. The
> problem is that, after half of the buffer is filled up, the kernel updates
> ->aux_head and sets up the next "transaction", while observing that
> ->aux_tail is still zero (as userspace haven't had the chance to update
> it), meaning that the trace will have to stop at the end of this second
> "transaction". This means, for example, that the second PERF_RECORD_AUX in
> every trace comes with TRUNCATED flag set.
> 
> Setting attr.aux_watermark to quarter of the buffer gives enough space for
> the ->aux_tail update to be observed and prevents the data loss.
> 
> The obligatory before/after showcase:
> 
>> # perf_before record -e intel_pt//u -m,8 uname
>> Linux
>> [ perf record: Woken up 6 times to write data ]
>> Warning:
>> AUX data lost 4 times out of 10!
>>
>> [ perf record: Captured and wrote 0.099 MB perf.data ]
>> # perf record -e intel_pt//u -m,8 uname
>> Linux
>> [ perf record: Woken up 4 times to write data ]
>> [ perf record: Captured and wrote 0.039 MB perf.data ]
> 
> The effect is still visible with large workloads and large buffers,
> although less pronounced.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
>  tools/perf/arch/x86/util/intel-pt.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
> index a6420c647959..d00707faf547 100644
> --- a/tools/perf/arch/x86/util/intel-pt.c
> +++ b/tools/perf/arch/x86/util/intel-pt.c
> @@ -776,6 +776,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
>  		}
>  	}
>  
> +	if (opts->full_auxtrace)
> +		intel_pt_evsel->core.attr.aux_watermark =
> +		       opts->auxtrace_mmap_pages / 4 * page_size;
> +

I would be explicit about the mode and put "/ 4" at the end
for the case auxtrace_mmap_pages is not a multiple of 4 (e.g. 2).
i.e.

	if (!opts->auxtrace_snapshot_mode && !opts->auxtrace_sample_mode) {
		u32 aux_watermark = opts->auxtrace_mmap_pages * page_size / 4;

		intel_pt_evsel->core.attr.aux_watermark = aux_watermark;
	}


>  	intel_pt_parse_terms(intel_pt_pmu->name, &intel_pt_pmu->format,
>  			     "tsc", &tsc_bit);
>  
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] perf intel-pt: Use aux_watermark
  2021-04-14  6:00   ` Adrian Hunter
@ 2021-04-14 15:50     ` Alexander Shishkin
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Shishkin @ 2021-04-14 15:50 UTC (permalink / raw)
  To: Adrian Hunter, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Mathieu Poirier,
	alexander.shishkin

Adrian Hunter <adrian.hunter@intel.com> writes:

> On 8/04/21 6:31 pm, Alexander Shishkin wrote:
>> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
>> index a6420c647959..d00707faf547 100644
>> --- a/tools/perf/arch/x86/util/intel-pt.c
>> +++ b/tools/perf/arch/x86/util/intel-pt.c
>> @@ -776,6 +776,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
>>  		}
>>  	}
>>  
>> +	if (opts->full_auxtrace)
>> +		intel_pt_evsel->core.attr.aux_watermark =
>> +		       opts->auxtrace_mmap_pages / 4 * page_size;
>> +
>
> I would be explicit about the mode and put "/ 4" at the end
> for the case auxtrace_mmap_pages is not a multiple of 4 (e.g. 2).
> i.e.
>
> 	if (!opts->auxtrace_snapshot_mode && !opts->auxtrace_sample_mode) {
> 		u32 aux_watermark = opts->auxtrace_mmap_pages * page_size / 4;
>
> 		intel_pt_evsel->core.attr.aux_watermark = aux_watermark;
> 	}

Thank you! I'll do exactly that.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-14 15:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-08 15:31 [PATCH 0/2] perf, pt: Improve data loss Alexander Shishkin
2021-04-08 15:31 ` [PATCH 1/2] perf: Cap allocation order at aux_watermark Alexander Shishkin
2021-04-08 15:31 ` [PATCH 2/2] perf intel-pt: Use aux_watermark Alexander Shishkin
2021-04-14  6:00   ` Adrian Hunter
2021-04-14 15:50     ` Alexander Shishkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).