All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH 0/3] drm/i915/perf: A few fixes and enhancements
@ 2023-09-09  1:16 Ashutosh Dixit
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail Ashutosh Dixit
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ashutosh Dixit @ 2023-09-09  1:16 UTC (permalink / raw)
  To: intel-gfx

Ashutosh Dixit (2):
  drm/i915/perf: Subtract gtt_offset from hw_tail
  drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail

Umesh Nerlige Ramappa (1):
  drm/i915/perf: Initialize gen12 OA buffer unconditionally

 drivers/gpu/drm/i915/i915_perf.c | 62 ++++++++++----------------------
 1 file changed, 18 insertions(+), 44 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail
  2023-09-09  1:16 [Intel-gfx] [PATCH 0/3] drm/i915/perf: A few fixes and enhancements Ashutosh Dixit
@ 2023-09-09  1:16 ` Ashutosh Dixit
  2023-09-13  1:25   ` Umesh Nerlige Ramappa
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail Ashutosh Dixit
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally Ashutosh Dixit
  2 siblings, 1 reply; 10+ messages in thread
From: Ashutosh Dixit @ 2023-09-09  1:16 UTC (permalink / raw)
  To: intel-gfx

The code in oa_buffer_check_unlocked() is correct only if the OA buffer is
16 MB aligned (which seems to be the case today in i915). However when the
16 MB alignment is dropped, when we "Subtract partial amount off the tail",
the "& (OA_BUFFER_SIZE - 1)" operation in OA_TAKEN() will result in an
incorrect hw_tail value.

Therefore hw_tail must be brought to the same base as head and read_tail
prior to OA_TAKEN by subtracting gtt_offset from hw_tail.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 018f42fff4cc0..ec0fc2934045a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -565,6 +565,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	partial_report_size %= report_size;
 
 	/* Subtract partial amount off the tail */
+	hw_tail -= gtt_offset;
 	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
 
 	/* NB: The head we observe here might effectively be a little
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH 2/3] drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail
  2023-09-09  1:16 [Intel-gfx] [PATCH 0/3] drm/i915/perf: A few fixes and enhancements Ashutosh Dixit
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail Ashutosh Dixit
@ 2023-09-09  1:16 ` Ashutosh Dixit
  2023-09-13  1:36   ` Umesh Nerlige Ramappa
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally Ashutosh Dixit
  2 siblings, 1 reply; 10+ messages in thread
From: Ashutosh Dixit @ 2023-09-09  1:16 UTC (permalink / raw)
  To: intel-gfx

There is no reason to add gtt_offset to the cached head/tail pointers
stream->oa_buffer.head and stream->oa_buffer.tail. This causes the code to
constantly add gtt_offset and subtract gtt_offset and is error
prone (e.g. see previous patch).

It is much simpler to maintain stream->oa_buffer.head and
stream->oa_buffer.tail without adding gtt_offset to them and just allow for
the gtt_offset when reading/writing from/to HW registers.

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 53 ++++++++------------------------
 1 file changed, 13 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ec0fc2934045a..1347e4ec9dd5a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -543,10 +543,9 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format->size;
-	u32 head, tail, read_tail;
+	u32 tail, hw_tail;
 	unsigned long flags;
 	bool pollin;
-	u32 hw_tail;
 	u32 partial_report_size;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -556,6 +555,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
+	hw_tail -= gtt_offset;
 
 	/* The tail pointer increases in 64 byte increments, not in report_size
 	 * steps. Also the report size may not be a power of 2. Compute
@@ -565,16 +565,8 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	partial_report_size %= report_size;
 
 	/* Subtract partial amount off the tail */
-	hw_tail -= gtt_offset;
 	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
 
-	/* NB: The head we observe here might effectively be a little
-	 * out of date. If a read() is in progress, the head could be
-	 * anywhere between this head and stream->oa_buffer.tail.
-	 */
-	head = stream->oa_buffer.head - gtt_offset;
-	read_tail = stream->oa_buffer.tail - gtt_offset;
-
 	tail = hw_tail;
 
 	/* Walk the stream backward until we find a report with report
@@ -588,7 +580,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	 * memory in the order they were written to.
 	 * If not : (╯°□°)╯︵ ┻━┻
 	 */
-	while (OA_TAKEN(tail, read_tail) >= report_size) {
+	while (OA_TAKEN(tail, stream->oa_buffer.tail) >= report_size) {
 		void *report = stream->oa_buffer.vaddr + tail;
 
 		if (oa_report_id(stream, report) ||
@@ -602,9 +594,9 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	    __ratelimit(&stream->perf->tail_pointer_race))
 		drm_notice(&stream->uncore->i915->drm,
 			   "unlanded report(s) head=0x%x tail=0x%x hw_tail=0x%x\n",
-		 head, tail, hw_tail);
+		 stream->oa_buffer.head, tail, hw_tail);
 
-	stream->oa_buffer.tail = gtt_offset + tail;
+	stream->oa_buffer.tail = tail;
 
 	pollin = OA_TAKEN(stream->oa_buffer.tail,
 			  stream->oa_buffer.head) >= report_size;
@@ -754,13 +746,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/*
-	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
-	 * while indexing relative to oa_buf_base.
-	 */
-	head -= gtt_offset;
-	tail -= gtt_offset;
-
 	/*
 	 * An out of bounds or misaligned head or tail pointer implies a driver
 	 * bug since we validate + align the tail pointers we read from the
@@ -896,9 +881,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 * We removed the gtt_offset for the copy loop above, indexing
 		 * relative to oa_buf_base so put back here...
 		 */
-		head += gtt_offset;
 		intel_uncore_write(uncore, oaheadptr,
-				   head & GEN12_OAG_OAHEADPTR_MASK);
+				   (head + gtt_offset) & GEN12_OAG_OAHEADPTR_MASK);
 		stream->oa_buffer.head = head;
 
 		spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
@@ -1043,12 +1027,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
-	 * while indexing relative to oa_buf_base.
-	 */
-	head -= gtt_offset;
-	tail -= gtt_offset;
-
 	/* An out of bounds or misaligned head or tail pointer implies a driver
 	 * bug since we validate + align the tail pointers we read from the
 	 * hardware and we are in full control of the head pointer which should
@@ -1111,13 +1089,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	if (start_offset != *offset) {
 		spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
-		/* We removed the gtt_offset for the copy loop above, indexing
-		 * relative to oa_buf_base so put back here...
-		 */
-		head += gtt_offset;
-
 		intel_uncore_write(uncore, GEN7_OASTATUS2,
-				   (head & GEN7_OASTATUS2_HEAD_MASK) |
+				   ((head + gtt_offset) & GEN7_OASTATUS2_HEAD_MASK) |
 				   GEN7_OASTATUS2_MEM_SELECT_GGTT);
 		stream->oa_buffer.head = head;
 
@@ -1705,7 +1678,7 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 	 */
 	intel_uncore_write(uncore, GEN7_OASTATUS2, /* head */
 			   gtt_offset | GEN7_OASTATUS2_MEM_SELECT_GGTT);
-	stream->oa_buffer.head = gtt_offset;
+	stream->oa_buffer.head = 0;
 
 	intel_uncore_write(uncore, GEN7_OABUFFER, gtt_offset);
 
@@ -1713,7 +1686,7 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 			   gtt_offset | OABUFFER_SIZE_16M);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tail = gtt_offset;
+	stream->oa_buffer.tail = 0;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
@@ -1747,7 +1720,7 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 
 	intel_uncore_write(uncore, GEN8_OASTATUS, 0);
 	intel_uncore_write(uncore, GEN8_OAHEADPTR, gtt_offset);
-	stream->oa_buffer.head = gtt_offset;
+	stream->oa_buffer.head = 0;
 
 	intel_uncore_write(uncore, GEN8_OABUFFER_UDW, 0);
 
@@ -1764,7 +1737,7 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tail = gtt_offset;
+	stream->oa_buffer.tail = 0;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
@@ -1801,7 +1774,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 	intel_uncore_write(uncore, __oa_regs(stream)->oa_status, 0);
 	intel_uncore_write(uncore, __oa_regs(stream)->oa_head_ptr,
 			   gtt_offset & GEN12_OAG_OAHEADPTR_MASK);
-	stream->oa_buffer.head = gtt_offset;
+	stream->oa_buffer.head = 0;
 
 	/*
 	 * PRM says:
@@ -1817,7 +1790,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tail = gtt_offset;
+	stream->oa_buffer.tail = 0;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally
  2023-09-09  1:16 [Intel-gfx] [PATCH 0/3] drm/i915/perf: A few fixes and enhancements Ashutosh Dixit
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail Ashutosh Dixit
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail Ashutosh Dixit
@ 2023-09-09  1:16 ` Ashutosh Dixit
  2023-09-09  1:24   ` Dixit, Ashutosh
  2 siblings, 1 reply; 10+ messages in thread
From: Ashutosh Dixit @ 2023-09-09  1:16 UTC (permalink / raw)
  To: intel-gfx

From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Correct values for OAR counters are still dependent on enabling the
GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
bit means OAG unit will write reports to the OAG buffer, so
initialize the OAG buffer unconditionally for all use cases.

BSpec: 46822

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1347e4ec9dd5a..30cf37d6e79be 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -3032,12 +3032,12 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
 	u32 val;
 
 	/*
-	 * If we don't want OA reports from the OA buffer, then we don't even
-	 * need to program the OAG unit.
+	 * BSpec: 46822
+	 * Correct values for OAR counters are still dependent on enabling the
+	 * GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
+	 * bit means OAG unit will write reports to the OAG buffer, so
+	 * initialize the OAG buffer correctly.
 	 */
-	if (!(stream->sample_flags & SAMPLE_OA_REPORT))
-		return;
-
 	gen12_init_oa_buffer(stream);
 
 	regs = __oa_regs(stream);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally Ashutosh Dixit
@ 2023-09-09  1:24   ` Dixit, Ashutosh
  2023-09-13  1:46     ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 10+ messages in thread
From: Dixit, Ashutosh @ 2023-09-09  1:24 UTC (permalink / raw)
  To: intel-gfx

On Fri, 08 Sep 2023 18:16:26 -0700, Ashutosh Dixit wrote:
>

Hi Umesh,

> From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
> Correct values for OAR counters are still dependent on enabling the
> GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
> bit means OAG unit will write reports to the OAG buffer, so
> initialize the OAG buffer unconditionally for all use cases.
>
> BSpec: 46822
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 1347e4ec9dd5a..30cf37d6e79be 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -3032,12 +3032,12 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
>	u32 val;
>
>	/*
> -	 * If we don't want OA reports from the OA buffer, then we don't even
> -	 * need to program the OAG unit.
> +	 * BSpec: 46822
> +	 * Correct values for OAR counters are still dependent on enabling the
> +	 * GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
> +	 * bit means OAG unit will write reports to the OAG buffer, so
> +	 * initialize the OAG buffer correctly.
>	 */
> -	if (!(stream->sample_flags & SAMPLE_OA_REPORT))
> -		return;
> -
>	gen12_init_oa_buffer(stream);
>
>	regs = __oa_regs(stream);

Looks like this should be needed, I can R-b it.

However, gen12_test_mi_rpc IGT says:

	/* OA unit configuration:
	 * DRM_I915_PERF_PROP_SAMPLE_OA is no longer required for Gen12
	 * because the OAR unit increments counters only for the
	 * relevant context. No other parameters are needed since we do
	 * not rely on the OA buffer anymore to normalize the counter
	 * values.
	 */

So gen12_test_mi_rpc doesn't set DRM_I915_PERF_PROP_SAMPLE_OA and also
seems to be passing in CI (don't see it but there seem to be no open
bugs). Thoughts?

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail Ashutosh Dixit
@ 2023-09-13  1:25   ` Umesh Nerlige Ramappa
  2023-09-13  2:20     ` Dixit, Ashutosh
  0 siblings, 1 reply; 10+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-09-13  1:25 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-gfx

On Fri, Sep 08, 2023 at 06:16:24PM -0700, Ashutosh Dixit wrote:
>The code in oa_buffer_check_unlocked() is correct only if the OA buffer is
>16 MB aligned (which seems to be the case today in i915). However when the
>16 MB alignment is dropped, when we "Subtract partial amount off the tail",
>the "& (OA_BUFFER_SIZE - 1)" operation in OA_TAKEN() will result in an
>incorrect hw_tail value.
>
>Therefore hw_tail must be brought to the same base as head and read_tail
>prior to OA_TAKEN by subtracting gtt_offset from hw_tail.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/i915/i915_perf.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>index 018f42fff4cc0..ec0fc2934045a 100644
>--- a/drivers/gpu/drm/i915/i915_perf.c
>+++ b/drivers/gpu/drm/i915/i915_perf.c
>@@ -565,6 +565,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> 	partial_report_size %= report_size;
>
> 	/* Subtract partial amount off the tail */
>+	hw_tail -= gtt_offset;
> 	hw_tail = OA_TAKEN(hw_tail, partial_report_size);

I see partial_report_size is a value in the 0 - report_size range and it 
may not have the gtt_offset added to it, so I guess the OA_TAKEN may 
result in a bad value, but I am not able to visualize what the specific 
issue is. Can you please provide an example with numbers?

Also, slightly confused about the need for this patch. Are we dropping 
the 16 MB alignment for some reason?  If not, I suggest we can add this 
patch later with any series that drops it.

Thanks,
Umesh

>
> 	/* NB: The head we observe here might effectively be a little
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH 2/3] drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail
  2023-09-09  1:16 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail Ashutosh Dixit
@ 2023-09-13  1:36   ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 10+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-09-13  1:36 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-gfx

On Fri, Sep 08, 2023 at 06:16:25PM -0700, Ashutosh Dixit wrote:
>There is no reason to add gtt_offset to the cached head/tail pointers
>stream->oa_buffer.head and stream->oa_buffer.tail. This causes the code to
>constantly add gtt_offset and subtract gtt_offset and is error
>prone (e.g. see previous patch).
>
>It is much simpler to maintain stream->oa_buffer.head and
>stream->oa_buffer.tail without adding gtt_offset to them and just allow for
>the gtt_offset when reading/writing from/to HW registers.
>
>Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>---
> drivers/gpu/drm/i915/i915_perf.c | 53 ++++++++------------------------
> 1 file changed, 13 insertions(+), 40 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>index ec0fc2934045a..1347e4ec9dd5a 100644
>--- a/drivers/gpu/drm/i915/i915_perf.c
>+++ b/drivers/gpu/drm/i915/i915_perf.c
>@@ -543,10 +543,9 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> {
> 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> 	int report_size = stream->oa_buffer.format->size;
>-	u32 head, tail, read_tail;
>+	u32 tail, hw_tail;
> 	unsigned long flags;
> 	bool pollin;
>-	u32 hw_tail;
> 	u32 partial_report_size;
>
> 	/* We have to consider the (unlikely) possibility that read() errors
>@@ -556,6 +555,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>
> 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>+	hw_tail -= gtt_offset;

Since this patch intends to remove gtt_offset for all head/tail 
calculations, we don't need patch 1/3. Patch 1 can be dropped.

>
> 	/* The tail pointer increases in 64 byte increments, not in report_size
> 	 * steps. Also the report size may not be a power of 2. Compute
>@@ -565,16 +565,8 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> 	partial_report_size %= report_size;
>
> 	/* Subtract partial amount off the tail */
>-	hw_tail -= gtt_offset;
> 	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
>
>-	/* NB: The head we observe here might effectively be a little
>-	 * out of date. If a read() is in progress, the head could be
>-	 * anywhere between this head and stream->oa_buffer.tail.
>-	 */
>-	head = stream->oa_buffer.head - gtt_offset;
>-	read_tail = stream->oa_buffer.tail - gtt_offset;
>-
> 	tail = hw_tail;
>
> 	/* Walk the stream backward until we find a report with report
>@@ -588,7 +580,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> 	 * memory in the order they were written to.
> 	 * If not : (╯°□°)╯︵ ┻━┻
> 	 */
>-	while (OA_TAKEN(tail, read_tail) >= report_size) {
>+	while (OA_TAKEN(tail, stream->oa_buffer.tail) >= report_size) {
> 		void *report = stream->oa_buffer.vaddr + tail;
>
> 		if (oa_report_id(stream, report) ||
>@@ -602,9 +594,9 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> 	    __ratelimit(&stream->perf->tail_pointer_race))
> 		drm_notice(&stream->uncore->i915->drm,
> 			   "unlanded report(s) head=0x%x tail=0x%x hw_tail=0x%x\n",
>-		 head, tail, hw_tail);
>+		 stream->oa_buffer.head, tail, hw_tail);
>
>-	stream->oa_buffer.tail = gtt_offset + tail;
>+	stream->oa_buffer.tail = tail;
>
> 	pollin = OA_TAKEN(stream->oa_buffer.tail,
> 			  stream->oa_buffer.head) >= report_size;
>@@ -754,13 +746,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>
> 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
>-	/*
>-	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
>-	 * while indexing relative to oa_buf_base.
>-	 */
>-	head -= gtt_offset;
>-	tail -= gtt_offset;
>-
> 	/*
> 	 * An out of bounds or misaligned head or tail pointer implies a driver
> 	 * bug since we validate + align the tail pointers we read from the
>@@ -896,9 +881,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> 		 * We removed the gtt_offset for the copy loop above, indexing
> 		 * relative to oa_buf_base so put back here...
> 		 */
>-		head += gtt_offset;
> 		intel_uncore_write(uncore, oaheadptr,
>-				   head & GEN12_OAG_OAHEADPTR_MASK);
>+				   (head + gtt_offset) & GEN12_OAG_OAHEADPTR_MASK);
> 		stream->oa_buffer.head = head;
>
> 		spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>@@ -1043,12 +1027,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>
> 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
>-	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
>-	 * while indexing relative to oa_buf_base.
>-	 */
>-	head -= gtt_offset;
>-	tail -= gtt_offset;
>-
> 	/* An out of bounds or misaligned head or tail pointer implies a driver
> 	 * bug since we validate + align the tail pointers we read from the
> 	 * hardware and we are in full control of the head pointer which should
>@@ -1111,13 +1089,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
> 	if (start_offset != *offset) {
> 		spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>
>-		/* We removed the gtt_offset for the copy loop above, indexing
>-		 * relative to oa_buf_base so put back here...
>-		 */
>-		head += gtt_offset;
>-
> 		intel_uncore_write(uncore, GEN7_OASTATUS2,
>-				   (head & GEN7_OASTATUS2_HEAD_MASK) |
>+				   ((head + gtt_offset) & GEN7_OASTATUS2_HEAD_MASK) |
> 				   GEN7_OASTATUS2_MEM_SELECT_GGTT);
> 		stream->oa_buffer.head = head;
>
>@@ -1705,7 +1678,7 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
> 	 */
> 	intel_uncore_write(uncore, GEN7_OASTATUS2, /* head */
> 			   gtt_offset | GEN7_OASTATUS2_MEM_SELECT_GGTT);
>-	stream->oa_buffer.head = gtt_offset;
>+	stream->oa_buffer.head = 0;
>
> 	intel_uncore_write(uncore, GEN7_OABUFFER, gtt_offset);
>
>@@ -1713,7 +1686,7 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
> 			   gtt_offset | OABUFFER_SIZE_16M);
>
> 	/* Mark that we need updated tail pointers to read from... */
>-	stream->oa_buffer.tail = gtt_offset;
>+	stream->oa_buffer.tail = 0;
>
> 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
>@@ -1747,7 +1720,7 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
>
> 	intel_uncore_write(uncore, GEN8_OASTATUS, 0);
> 	intel_uncore_write(uncore, GEN8_OAHEADPTR, gtt_offset);
>-	stream->oa_buffer.head = gtt_offset;
>+	stream->oa_buffer.head = 0;
>
> 	intel_uncore_write(uncore, GEN8_OABUFFER_UDW, 0);
>
>@@ -1764,7 +1737,7 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
> 	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
>
> 	/* Mark that we need updated tail pointers to read from... */
>-	stream->oa_buffer.tail = gtt_offset;
>+	stream->oa_buffer.tail = 0;
>
> 	/*
> 	 * Reset state used to recognise context switches, affecting which
>@@ -1801,7 +1774,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
> 	intel_uncore_write(uncore, __oa_regs(stream)->oa_status, 0);
> 	intel_uncore_write(uncore, __oa_regs(stream)->oa_head_ptr,
> 			   gtt_offset & GEN12_OAG_OAHEADPTR_MASK);
>-	stream->oa_buffer.head = gtt_offset;
>+	stream->oa_buffer.head = 0;
>
> 	/*
> 	 * PRM says:
>@@ -1817,7 +1790,7 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
> 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
>
> 	/* Mark that we need updated tail pointers to read from... */
>-	stream->oa_buffer.tail = gtt_offset;
>+	stream->oa_buffer.tail = 0;
>

Looks correct.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

Thanks,
Umesh
> 	/*
> 	 * Reset state used to recognise context switches, affecting which
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally
  2023-09-09  1:24   ` Dixit, Ashutosh
@ 2023-09-13  1:46     ` Umesh Nerlige Ramappa
  2023-09-13 16:59       ` Dixit, Ashutosh
  0 siblings, 1 reply; 10+ messages in thread
From: Umesh Nerlige Ramappa @ 2023-09-13  1:46 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Fri, Sep 08, 2023 at 06:24:16PM -0700, Dixit, Ashutosh wrote:
>On Fri, 08 Sep 2023 18:16:26 -0700, Ashutosh Dixit wrote:
>>
>
>Hi Umesh,
>
>> From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>
>> Correct values for OAR counters are still dependent on enabling the
>> GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
>> bit means OAG unit will write reports to the OAG buffer, so
>> initialize the OAG buffer unconditionally for all use cases.
>>
>> BSpec: 46822
>>
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_perf.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 1347e4ec9dd5a..30cf37d6e79be 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -3032,12 +3032,12 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
>>	u32 val;
>>
>>	/*
>> -	 * If we don't want OA reports from the OA buffer, then we don't even
>> -	 * need to program the OAG unit.
>> +	 * BSpec: 46822
>> +	 * Correct values for OAR counters are still dependent on enabling the
>> +	 * GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
>> +	 * bit means OAG unit will write reports to the OAG buffer, so
>> +	 * initialize the OAG buffer correctly.
>>	 */
>> -	if (!(stream->sample_flags & SAMPLE_OA_REPORT))
>> -		return;
>> -
>>	gen12_init_oa_buffer(stream);
>>
>>	regs = __oa_regs(stream);
>
>Looks like this should be needed, I can R-b it.
>
>However, gen12_test_mi_rpc IGT says:
>
>	/* OA unit configuration:
>	 * DRM_I915_PERF_PROP_SAMPLE_OA is no longer required for Gen12
>	 * because the OAR unit increments counters only for the
>	 * relevant context. No other parameters are needed since we do
>	 * not rely on the OA buffer anymore to normalize the counter
>	 * values.
>	 */

That's wrong. When TGL support was added, this was misunderstood and I 
removed the OAR-OAG dependency. Ideally we should enforce user to pass 
SAMPLE_OA always, but now that will break uabi.

For for the OAR case, let's just enable OAG unconditionally so that the 
OAR counters tick correctly. While we do that, we should disable all 
events that trigger a report into the OA buffer. In addition, I would 
also allocate the smallest OA buffer size for this case, so that memory 
impact is low.

Needs a Fixes tag with the commit that enabled OA for TGL.

Regards,
Umesh

>
>So gen12_test_mi_rpc doesn't set DRM_I915_PERF_PROP_SAMPLE_OA and also
>seems to be passing in CI (don't see it but there seem to be no open
>bugs). Thoughts?
>
>Thanks.
>--
>Ashutosh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail
  2023-09-13  1:25   ` Umesh Nerlige Ramappa
@ 2023-09-13  2:20     ` Dixit, Ashutosh
  0 siblings, 0 replies; 10+ messages in thread
From: Dixit, Ashutosh @ 2023-09-13  2:20 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 12 Sep 2023 18:25:16 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Fri, Sep 08, 2023 at 06:16:24PM -0700, Ashutosh Dixit wrote:
> > The code in oa_buffer_check_unlocked() is correct only if the OA buffer is
> > 16 MB aligned (which seems to be the case today in i915). However when the
> > 16 MB alignment is dropped, when we "Subtract partial amount off the tail",
> > the "& (OA_BUFFER_SIZE - 1)" operation in OA_TAKEN() will result in an
> > incorrect hw_tail value.
> >
> > Therefore hw_tail must be brought to the same base as head and read_tail
> > prior to OA_TAKEN by subtracting gtt_offset from hw_tail.
> >
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_perf.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 018f42fff4cc0..ec0fc2934045a 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -565,6 +565,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >	partial_report_size %= report_size;
> >
> >	/* Subtract partial amount off the tail */
> > +	hw_tail -= gtt_offset;
> >	hw_tail = OA_TAKEN(hw_tail, partial_report_size);
>
> I see partial_report_size is a value in the 0 - report_size range and it
> may not have the gtt_offset added to it, so I guess the OA_TAKEN may result
> in a bad value, but I am not able to visualize what the specific issue
> is. Can you please provide an example with numbers?
>
> Also, slightly confused about the need for this patch. Are we dropping the
> 16 MB alignment for some reason?  If not, I suggest we can add this patch
> later with any series that drops it.

I found this issue when porting i915 OA code to XE (which doesn't have 16
MB alignment), I had to make this fix in XE in order to get things to work.

In any case, as you already pointed out, Patch 2 overrides Patch 1, so
let's just drop Patch 1.

Thanks.
--
Ashutosh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally
  2023-09-13  1:46     ` Umesh Nerlige Ramappa
@ 2023-09-13 16:59       ` Dixit, Ashutosh
  0 siblings, 0 replies; 10+ messages in thread
From: Dixit, Ashutosh @ 2023-09-13 16:59 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 12 Sep 2023 18:46:12 -0700, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On Fri, Sep 08, 2023 at 06:24:16PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 08 Sep 2023 18:16:26 -0700, Ashutosh Dixit wrote:
> >>
> >
> >> From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >>
> >> Correct values for OAR counters are still dependent on enabling the
> >> GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
> >> bit means OAG unit will write reports to the OAG buffer, so
> >> initialize the OAG buffer unconditionally for all use cases.
> >>
> >> BSpec: 46822
> >>
> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/i915_perf.c | 10 +++++-----
> >>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> index 1347e4ec9dd5a..30cf37d6e79be 100644
> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> @@ -3032,12 +3032,12 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
> >>	u32 val;
> >>
> >>	/*
> >> -	 * If we don't want OA reports from the OA buffer, then we don't even
> >> -	 * need to program the OAG unit.
> >> +	 * BSpec: 46822
> >> +	 * Correct values for OAR counters are still dependent on enabling the
> >> +	 * GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE in OAG_OACONTROL. Enabling this
> >> +	 * bit means OAG unit will write reports to the OAG buffer, so
> >> +	 * initialize the OAG buffer correctly.
> >>	 */
> >> -	if (!(stream->sample_flags & SAMPLE_OA_REPORT))
> >> -		return;
> >> -
> >>	gen12_init_oa_buffer(stream);
> >>
> >>	regs = __oa_regs(stream);
> >
> > Looks like this should be needed, I can R-b it.
> >
> > However, gen12_test_mi_rpc IGT says:
> >
> >	/* OA unit configuration:
> >	 * DRM_I915_PERF_PROP_SAMPLE_OA is no longer required for Gen12
> >	 * because the OAR unit increments counters only for the
> >	 * relevant context. No other parameters are needed since we do
> >	 * not rely on the OA buffer anymore to normalize the counter
> >	 * values.
> >	 */
>
> That's wrong. When TGL support was added, this was misunderstood and I
> removed the OAR-OAG dependency. Ideally we should enforce user to pass
> SAMPLE_OA always, but now that will break uabi.

SAMPLE_OA has other effects like enabling the timer etc. Also, if we
enforce user to always pass it, we can do it in the kernel itself and
ignore the uapi parameter.

>
> For for the OAR case, let's just enable OAG unconditionally so that the OAR
> counters tick correctly. While we do that, we should disable all events
> that trigger a report into the OA buffer. In addition, I would also
> allocate the smallest OA buffer size for this case, so that memory impact
> is low.
>
> Needs a Fixes tag with the commit that enabled OA for TGL.

If you think this patch is needed, I think it's better you do this so that
you can stay the patch author and I can be the reviewer, otherwise we'll
need to go hunt for another reviewer.

Thanks.
--
Ashutosh


> >
> > So gen12_test_mi_rpc doesn't set DRM_I915_PERF_PROP_SAMPLE_OA and also
> > seems to be passing in CI (don't see it but there seem to be no open
> > bugs). Thoughts?
> >
> > Thanks.
> > --
> > Ashutosh

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-13 16:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-09  1:16 [Intel-gfx] [PATCH 0/3] drm/i915/perf: A few fixes and enhancements Ashutosh Dixit
2023-09-09  1:16 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Subtract gtt_offset from hw_tail Ashutosh Dixit
2023-09-13  1:25   ` Umesh Nerlige Ramappa
2023-09-13  2:20     ` Dixit, Ashutosh
2023-09-09  1:16 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: Remove gtt_offset from stream->oa_buffer.head/.tail Ashutosh Dixit
2023-09-13  1:36   ` Umesh Nerlige Ramappa
2023-09-09  1:16 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: Initialize gen12 OA buffer unconditionally Ashutosh Dixit
2023-09-09  1:24   ` Dixit, Ashutosh
2023-09-13  1:46     ` Umesh Nerlige Ramappa
2023-09-13 16:59       ` Dixit, Ashutosh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.