[Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support

intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

* [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support
@ 2020-03-12 23:04 Umesh Nerlige Ramappa
  2020-03-12 23:04 ` [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-12 23:04 UTC (permalink / raw)
  To: Lionel G Landwerlin, Ashutosh Dixit, intel-gfx

Hi all,

This is a revival of an earlier patch series submitted by Lionel
Landwerlin - https://patchwork.freedesktop.org/series/54280/

The patches enable interrupt support for the perf OA unit in
i915, further details can be found in the orignal series linked
above.

v2: (Umesh)
- This series will split into 2. This revision will only support the addition of
  poll delay parameter to perf OA. If needed a future revision will incorporate
  interrupt support.

Regards,
Umesh


Lionel Landwerlin (4):
  drm/i915/perf: rework aging tail workaround
  drm/i915/perf: move pollin setup to non hw specific code
  drm/i915/perf: only append status when data is available
  drm/i915/perf: add new open param to configure polling of OA buffer

 drivers/gpu/drm/i915/i915_perf.c       | 310 +++++++++++++------------
 drivers/gpu/drm/i915/i915_perf_types.h |  35 +--
 include/uapi/drm/i915_drm.h            |  13 ++
 3 files changed, 194 insertions(+), 164 deletions(-)

-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
@ 2020-03-12 23:04 ` Umesh Nerlige Ramappa
  2020-03-13  9:54   ` Lionel Landwerlin
  2020-03-16 19:23   ` Dixit, Ashutosh
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-12 23:04 UTC (permalink / raw)
  To: Lionel G Landwerlin, Ashutosh Dixit, intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

We're about to introduce an options to open the perf stream, giving
the user ability to configure how often it wants the kernel to poll
the OA registers for available data.

Right now the workaround against the OA tail pointer race condition
requires at least twice the internal kernel polling timer to make any
data available.

This changes introduce checks on the OA data written into the circular
buffer to make as much data as possible available on the first
iteration of the polling timer.

v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
v3: (Umesh)
- Rebase
- Change report to report32 from below review
  https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 207 +++++++++++--------------
 drivers/gpu/drm/i915/i915_perf_types.h |  29 ++--
 2 files changed, 106 insertions(+), 130 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 1b074bb4a7fe..92c5c75e2a2b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -241,23 +241,14 @@
  * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
  * read() attempts.
  *
- * In effect we define a tail pointer for reading that lags the real tail
- * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
- * time for the corresponding reports to become visible to the CPU.
- *
- * To manage this we actually track two tail pointers:
- *  1) An 'aging' tail with an associated timestamp that is tracked until we
- *     can trust the corresponding data is visible to the CPU; at which point
- *     it is considered 'aged'.
- *  2) An 'aged' tail that can be used for read()ing.
- *
- * The two separate pointers let us decouple read()s from tail pointer aging.
- *
- * The tail pointers are checked and updated at a limited rate within a hrtimer
- * callback (the same callback that is used for delivering EPOLLIN events)
- *
- * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
- * indicates that an updated tail pointer is needed.
+ * We workaround this issue in oa_buffer_check() by reading the reports in the
+ * OA buffer, starting from the tail reported by the HW until we find 2
+ * consecutive reports with their first 2 dwords of not at 0. Those dwords are
+ * also set to 0 once read and the whole buffer is cleared upon OA buffer
+ * initialization. The first dword is the reason for this report while the
+ * second is the timestamp, making the chances of having those 2 fields at 0
+ * fairly unlikely. A more detailed explanation is available in
+ * oa_buffer_check().
  *
  * Most of the implementation details for this workaround are in
  * oa_buffer_check_unlocked() and _append_oa_reports()
@@ -270,7 +261,6 @@
  * enabled without any periodic sampling.
  */
 #define OA_TAIL_MARGIN_NSEC	100000ULL
-#define INVALID_TAIL_PTR	0xffffffff
 
 /* frequency for checking whether the OA unit has written new reports to the
  * circular OA buffer...
@@ -476,10 +466,10 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
  */
 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
+	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format_size;
 	unsigned long flags;
-	unsigned int aged_idx;
-	u32 head, hw_tail, aged_tail, aging_tail;
+	u32 hw_tail;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -488,16 +478,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	 */
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
-	/* NB: The head we observe here might effectively be a little out of
-	 * date (between head and tails[aged_idx].offset if there is currently
-	 * a read() in progress.
-	 */
-	head = stream->oa_buffer.head;
-
-	aged_idx = stream->oa_buffer.aged_tail_idx;
-	aged_tail = stream->oa_buffer.tails[aged_idx].offset;
-	aging_tail = stream->oa_buffer.tails[!aged_idx].offset;
-
 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
 
 	/* The tail pointer increases in 64 byte increments,
@@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 
 	now = ktime_get_mono_fast_ns();
 
-	/* Update the aged tail
-	 *
-	 * Flip the tail pointer available for read()s once the aging tail is
-	 * old enough to trust that the corresponding data will be visible to
-	 * the CPU...
-	 *
-	 * Do this before updating the aging pointer in case we may be able to
-	 * immediately start aging a new pointer too (if new data has become
-	 * available) without needing to wait for a later hrtimer callback.
-	 */
-	if (aging_tail != INVALID_TAIL_PTR &&
-	    ((now - stream->oa_buffer.aging_timestamp) >
-	     OA_TAIL_MARGIN_NSEC)) {
-
-		aged_idx ^= 1;
-		stream->oa_buffer.aged_tail_idx = aged_idx;
+	if (hw_tail == stream->oa_buffer.aging_tail) {
+		/* If the HW tail hasn't move since the last check and the HW
+		 * tail has been aging for long enough, declare it the new
+		 * tail.
+		 */
+		if ((now - stream->oa_buffer.aging_timestamp) >
+		    OA_TAIL_MARGIN_NSEC) {
+			stream->oa_buffer.tail =
+				stream->oa_buffer.aging_tail;
+		}
+	} else {
+		u32 head, tail, landed_report_heads;
 
-		aged_tail = aging_tail;
+		/* NB: The head we observe here might effectively be a little out of
+		 * date (between head and tails[aged_idx].offset if there is currently
+		 * a read() in progress.
+		 */
+		head = stream->oa_buffer.head - gtt_offset;
 
-		/* Mark that we need a new pointer to start aging... */
-		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
-		aging_tail = INVALID_TAIL_PTR;
-	}
+		hw_tail -= gtt_offset;
+		tail = hw_tail;
 
-	/* Update the aging tail
-	 *
-	 * We throttle aging tail updates until we have a new tail that
-	 * represents >= one report more data than is already available for
-	 * reading. This ensures there will be enough data for a successful
-	 * read once this new pointer has aged and ensures we will give the new
-	 * pointer time to age.
-	 */
-	if (aging_tail == INVALID_TAIL_PTR &&
-	    (aged_tail == INVALID_TAIL_PTR ||
-	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
-		struct i915_vma *vma = stream->oa_buffer.vma;
-		u32 gtt_offset = i915_ggtt_offset(vma);
-
-		/* Be paranoid and do a bounds check on the pointer read back
-		 * from hardware, just in case some spurious hardware condition
-		 * could put the tail out of bounds...
+		/* Walk the stream backward until we find at least 2 reports
+		 * with dword 0 & 1 not at 0. Since the circular buffer
+		 * pointers progress by increments of 64 bytes and that
+		 * reports can be up to 256 bytes long, we can't tell whether
+		 * a report has fully landed in memory before the first 2
+		 * dwords of the following report have effectively landed.
+		 *
+		 * This is assuming that the writes of the OA unit land in
+		 * memory in the order they were written to.
+		 * If not : (╯°□°）╯︵ ┻━┻
 		 */
-		if (hw_tail >= gtt_offset &&
-		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
-			stream->oa_buffer.tails[!aged_idx].offset =
-				aging_tail = hw_tail;
-			stream->oa_buffer.aging_timestamp = now;
-		} else {
-			drm_err(&stream->perf->i915->drm,
-				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
-				hw_tail);
+		landed_report_heads = 0;
+		while (OA_TAKEN(tail, head) >= report_size) {
+			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
+			u8 *report = stream->oa_buffer.vaddr + previous_tail;
+			u32 *report32 = (void *) report;
+
+			/* Head of the report indicated by the HW tail register has
+			 * indeed landed into memory.
+			 */
+			if (report32[0] != 0 || report32[1] != 0) {
+				landed_report_heads++;
+
+				if (landed_report_heads >= 2)
+					break;
+			}
+
+			tail = previous_tail;
 		}
+
+		if (abs(tail - hw_tail) >= (2 * report_size)) {
+			if (__ratelimit(&stream->perf->tail_pointer_race)) {
+				DRM_NOTE("unlanded report(s) head=0x%x "
+					 "tail=0x%x hw_tail=0x%x\n",
+					 head, tail, hw_tail);
+			}
+		}
+
+		stream->oa_buffer.tail = gtt_offset + tail;
+		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
+		stream->oa_buffer.aging_timestamp = now;
 	}
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	return aged_tail == INVALID_TAIL_PTR ?
-		false : OA_TAKEN(aged_tail, head) >= report_size;
+	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
+			stream->oa_buffer.head - gtt_offset) >= report_size;
+
 }
 
 /**
@@ -682,7 +674,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	unsigned int aged_tail_idx;
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
@@ -693,18 +684,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	head = stream->oa_buffer.head;
-	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
-	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
+	tail = stream->oa_buffer.tail;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/*
-	 * An invalid tail pointer here means we're still waiting for the poll
-	 * hrtimer callback to give us a pointer
-	 */
-	if (tail == INVALID_TAIL_PTR)
-		return -EAGAIN;
-
 	/*
 	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
 	 * while indexing relative to oa_buf_base.
@@ -838,13 +821,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		}
 
 		/*
-		 * The above reason field sanity check is based on
-		 * the assumption that the OA buffer is initially
-		 * zeroed and we reset the field after copying so the
-		 * check is still meaningful once old reports start
-		 * being overwritten.
+		 * Clear out the first 2 dword as a mean to detect unlanded
+		 * reports.
 		 */
-		report32[0] = 0;
+		report32[0] = report32[1] = 0;
 	}
 
 	if (start_offset != *offset) {
@@ -985,7 +965,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	unsigned int aged_tail_idx;
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
@@ -996,17 +975,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	head = stream->oa_buffer.head;
-	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
-	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
+	tail = stream->oa_buffer.tail;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/* An invalid tail pointer here means we're still waiting for the poll
-	 * hrtimer callback to give us a pointer
-	 */
-	if (tail == INVALID_TAIL_PTR)
-		return -EAGAIN;
-
 	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
 	 * while indexing relative to oa_buf_base.
 	 */
@@ -1064,13 +1036,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		if (ret)
 			break;
 
-		/* The above report-id field sanity check is based on
-		 * the assumption that the OA buffer is initially
-		 * zeroed and we reset the field after copying so the
-		 * check is still meaningful once old reports start
-		 * being overwritten.
+		/* Clear out the first 2 dwords as a mean to detect unlanded
+		 * reports.
 		 */
-		report32[0] = 0;
+		report32[0] = report32[1] = 0;
 	}
 
 	if (start_offset != *offset) {
@@ -1449,8 +1418,8 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 			   gtt_offset | OABUFFER_SIZE_16M);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail =
+		stream->oa_buffer.tail = gtt_offset;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
@@ -1503,8 +1472,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail =
+		stream->oa_buffer.tail = gtt_offset;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
@@ -1559,8 +1528,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail =
+		stream->oa_buffer.tail = gtt_offset;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
@@ -4408,6 +4377,12 @@ void i915_perf_init(struct drm_i915_private *i915)
 		ratelimit_set_flags(&perf->spurious_report_rs,
 				    RATELIMIT_MSG_ON_RELEASE);
 
+		ratelimit_state_init(&perf->tail_pointer_race,
+				     5 * HZ, 10);
+		ratelimit_set_flags(&perf->tail_pointer_race,
+				    RATELIMIT_MSG_ON_RELEASE);
+
+
 		atomic64_set(&perf->noa_programming_delay,
 			     500 * 1000 /* 500us */);
 
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index a0e22f00f6cf..9ee7c58e70d5 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -272,21 +272,10 @@ struct i915_perf_stream {
 		spinlock_t ptr_lock;
 
 		/**
-		 * @tails: One 'aging' tail pointer and one 'aged' tail pointer ready to
-		 * used for reading.
-		 *
-		 * Initial values of 0xffffffff are invalid and imply that an
-		 * update is required (and should be ignored by an attempted
-		 * read)
+		 * @aging_tail: The last HW tail reported by HW. The data
+		 * might not have made it to memory yet though.
 		 */
-		struct {
-			u32 offset;
-		} tails[2];
-
-		/**
-		 * @aged_tail_idx: Index for the aged tail ready to read() data up to.
-		 */
-		unsigned int aged_tail_idx;
+		u32 aging_tail;
 
 		/**
 		 * @aging_timestamp: A monotonic timestamp for when the current aging tail pointer
@@ -302,6 +291,12 @@ struct i915_perf_stream {
 		 * OA buffer data to userspace.
 		 */
 		u32 head;
+
+		/**
+		 * @tail: The last tail verified tail that can be read by
+		 * userspace.
+		 */
+		u32 tail;
 	} oa_buffer;
 
 	/**
@@ -413,6 +408,12 @@ struct i915_perf {
 	 */
 	struct ratelimit_state spurious_report_rs;
 
+	/**
+	 * For rate limiting any notifications of tail pointer
+	 * race.
+	 */
+	struct ratelimit_state tail_pointer_race;
+
 	struct i915_oa_config test_config;
 
 	u32 gen7_latched_oastatus1;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
  2020-03-12 23:04 ` [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
@ 2020-03-12 23:05 ` Umesh Nerlige Ramappa
  2020-03-16 20:18   ` Dixit, Ashutosh
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available Umesh Nerlige Ramappa
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-12 23:05 UTC (permalink / raw)
  To: Lionel G Landwerlin, Ashutosh Dixit, intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

This isn't really gen specific stuff, so just move it to the common
code.

v2: Rebase (Umesh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 92c5c75e2a2b..0ec4546f1330 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1441,8 +1441,6 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 	 * memory...
 	 */
 	memset(stream->oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
-
-	stream->pollin = false;
 }
 
 static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
@@ -1497,8 +1495,6 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	 * memory...
 	 */
 	memset(stream->oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
-
-	stream->pollin = false;
 }
 
 static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
@@ -1554,8 +1550,6 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 	 */
 	memset(stream->oa_buffer.vaddr, 0,
 	       stream->oa_buffer.vma->size);
-
-	stream->pollin = false;
 }
 
 static int alloc_oa_buffer(struct i915_perf_stream *stream)
@@ -2626,6 +2620,12 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
  */
 static void i915_oa_stream_enable(struct i915_perf_stream *stream)
 {
+	/*
+	 * Maybe make ->pollin per-stream state if we support multiple
+	 * concurrent streams in the future.
+	 */
+	stream->pollin = false;
+
 	stream->perf->ops.oa_enable(stream);
 
 	if (stream->periodic)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
  2020-03-12 23:04 ` [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
@ 2020-03-12 23:05 ` Umesh Nerlige Ramappa
  2020-03-16 22:16   ` Dixit, Ashutosh
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-12 23:05 UTC (permalink / raw)
  To: Lionel G Landwerlin, Ashutosh Dixit, intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

The only bit of the status register we currently report in the
i915-perf stream is the "report loss" bit. Only report this when we
have some data to report with it. There was a kind of inconsistency
here in that we could report report loss without appending the reports
associated with the loss.

v2: Rebase (Umesh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 54 ++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0ec4546f1330..21a63644846f 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -663,6 +663,7 @@ static int append_oa_sample(struct i915_perf_stream *stream,
  * Returns: 0 on success, negative error code on failure.
  */
 static int gen8_append_oa_reports(struct i915_perf_stream *stream,
+				  u32 oastatus,
 				  char __user *buf,
 				  size_t count,
 				  size_t *offset)
@@ -709,6 +710,21 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			  head, tail))
 		return -EIO;
 
+	/*
+	 * If there is nothing to read, don't append the status report yet,
+	 * wait until we have some data available.
+	 */
+	if (!OA_TAKEN(tail, head))
+		return 0;
+
+	if (oastatus & GEN8_OASTATUS_REPORT_LOST) {
+		ret = append_oa_status(stream, buf, count, offset,
+				       DRM_I915_PERF_RECORD_OA_REPORT_LOST);
+		if (ret)
+			return ret;
+		intel_uncore_write(uncore, GEN8_OASTATUS,
+				   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
+	}
 
 	for (/* none */;
 	     (taken = OA_TAKEN(tail, head));
@@ -921,16 +937,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
 		oastatus = intel_uncore_read(uncore, oastatus_reg);
 	}
 
-	if (oastatus & GEN8_OASTATUS_REPORT_LOST) {
-		ret = append_oa_status(stream, buf, count, offset,
-				       DRM_I915_PERF_RECORD_OA_REPORT_LOST);
-		if (ret)
-			return ret;
-		intel_uncore_write(uncore, oastatus_reg,
-				   oastatus & ~GEN8_OASTATUS_REPORT_LOST);
-	}
-
-	return gen8_append_oa_reports(stream, buf, count, offset);
+	return gen8_append_oa_reports(stream, oastatus, buf, count, offset);
 }
 
 /**
@@ -954,6 +961,7 @@ static int gen8_oa_read(struct i915_perf_stream *stream,
  * Returns: 0 on success, negative error code on failure.
  */
 static int gen7_append_oa_reports(struct i915_perf_stream *stream,
+				  u32 oastatus1,
 				  char __user *buf,
 				  size_t count,
 				  size_t *offset)
@@ -998,6 +1006,21 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 			  head, tail))
 		return -EIO;
 
+	/*
+	 * If there is nothing to read, don't append the status report yet,
+	 * wait until we have some data available.
+	 */
+	if (!OA_TAKEN(tail, head))
+		return 0;
+
+	if (unlikely(oastatus1 & GEN7_OASTATUS1_REPORT_LOST)) {
+		ret = append_oa_status(stream, buf, count, offset,
+				       DRM_I915_PERF_RECORD_OA_REPORT_LOST);
+		if (ret)
+			return ret;
+		stream->perf->gen7_latched_oastatus1 |=
+			GEN7_OASTATUS1_REPORT_LOST;
+	}
 
 	for (/* none */;
 	     (taken = OA_TAKEN(tail, head));
@@ -1133,16 +1156,7 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
 		oastatus1 = intel_uncore_read(uncore, GEN7_OASTATUS1);
 	}
 
-	if (unlikely(oastatus1 & GEN7_OASTATUS1_REPORT_LOST)) {
-		ret = append_oa_status(stream, buf, count, offset,
-				       DRM_I915_PERF_RECORD_OA_REPORT_LOST);
-		if (ret)
-			return ret;
-		stream->perf->gen7_latched_oastatus1 |=
-			GEN7_OASTATUS1_REPORT_LOST;
-	}
-
-	return gen7_append_oa_reports(stream, buf, count, offset);
+	return gen7_append_oa_reports(stream, oastatus1, buf, count, offset);
 }
 
 /**
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available Umesh Nerlige Ramappa
@ 2020-03-12 23:05 ` Umesh Nerlige Ramappa
  2020-03-13  9:59   ` Lionel Landwerlin
  2020-03-17  0:02   ` Dixit, Ashutosh
  2020-03-13  0:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev6) Patchwork
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-12 23:05 UTC (permalink / raw)
  To: Lionel G Landwerlin, Ashutosh Dixit, intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

This new parameter let's the application choose how often the OA
buffer should be checked on the CPU side for data availability. Longer
polling period tend to reduce CPU overhead if the application does not
care about somewhat real time data collection.

v2: Allow disabling polling completely with 0 value (Lionel)
v3: Version the new parameter (Joonas)
v4: Rebase (Umesh)
v5: Make poll delay value of 0 invalid (Umesh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 37 ++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_perf_types.h |  6 +++++
 include/uapi/drm/i915_drm.h            | 13 +++++++++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 21a63644846f..ca139ac31b11 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -262,11 +262,11 @@
  */
 #define OA_TAIL_MARGIN_NSEC	100000ULL
 
-/* frequency for checking whether the OA unit has written new reports to the
- * circular OA buffer...
+/* The default frequency for checking whether the OA unit has written new
+ * reports to the circular OA buffer...
  */
-#define POLL_FREQUENCY 200
-#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)
+#define DEFAULT_POLL_FREQUENCY 200
+#define DEFAULT_POLL_PERIOD (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY)
 
 /* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
 static u32 i915_perf_stream_paranoid = true;
@@ -349,6 +349,8 @@ static const struct i915_oa_format gen12_oa_formats[I915_OA_FORMAT_MAX] = {
  * @oa_periodic: Whether to enable periodic OA unit sampling
  * @oa_period_exponent: The OA unit sampling period is derived from this
  * @engine: The engine (typically rcs0) being monitored by the OA unit
+ * @poll_oa_period: The period at which the CPU will check for OA data
+ * availability
  *
  * As read_properties_unlocked() enumerates and validates the properties given
  * to open a stream of metrics the configuration is built up in the structure
@@ -368,6 +370,7 @@ struct perf_open_properties {
 	int oa_period_exponent;
 
 	struct intel_engine_cs *engine;
+	u64 poll_oa_period;
 };
 
 struct i915_oa_config_bo {
@@ -2642,9 +2645,9 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream)
 
 	stream->perf->ops.oa_enable(stream);
 
-	if (stream->periodic)
+	if (stream->periodic && stream->poll_oa_period)
 		hrtimer_start(&stream->poll_check_timer,
-			      ns_to_ktime(POLL_PERIOD),
+			      ns_to_ktime(stream->poll_oa_period),
 			      HRTIMER_MODE_REL_PINNED);
 }
 
@@ -3044,7 +3047,8 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
 		wake_up(&stream->poll_wq);
 	}
 
-	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
+	hrtimer_forward_now(hrtimer,
+			    ns_to_ktime(stream->poll_oa_period));
 
 	return HRTIMER_RESTART;
 }
@@ -3425,6 +3429,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 
 	stream->perf = perf;
 	stream->ctx = specific_ctx;
+	stream->poll_oa_period = props->poll_oa_period;
 
 	ret = i915_oa_stream_init(stream, param, props);
 	if (ret)
@@ -3481,6 +3486,7 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @perf: i915 perf instance
+ * @open_flags: Flags set by userspace for the opening of the stream
  * @uprops: The array of u64 key value pairs given by userspace
  * @n_props: The number of key value pairs expected in @uprops
  * @props: The stream configuration built up while validating properties
@@ -3494,6 +3500,7 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
  * rule out defining new properties with ordering requirements in the future.
  */
 static int read_properties_unlocked(struct i915_perf *perf,
+				    u32 open_flags,
 				    u64 __user *uprops,
 				    u32 n_props,
 				    struct perf_open_properties *props)
@@ -3502,6 +3509,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
 	u32 i;
 
 	memset(props, 0, sizeof(struct perf_open_properties));
+	props->poll_oa_period = DEFAULT_POLL_PERIOD;
 
 	if (!n_props) {
 		DRM_DEBUG("No i915 perf properties given\n");
@@ -3617,6 +3625,14 @@ static int read_properties_unlocked(struct i915_perf *perf,
 		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
 			props->hold_preemption = !!value;
 			break;
+		case DRM_I915_PERF_PROP_POLL_OA_DELAY:
+			if (value < 100000 /* 100us */) {
+				DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n",
+					  value);
+				return -EINVAL;
+			}
+			props->poll_oa_period = value;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
@@ -3675,6 +3691,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 	}
 
 	ret = read_properties_unlocked(perf,
+				       param->flags,
 				       u64_to_user_ptr(param->properties_ptr),
 				       param->num_properties,
 				       &props);
@@ -4457,8 +4474,12 @@ int i915_perf_ioctl_version(void)
 	 *    preemption on a particular context so that performance data is
 	 *    accessible from a delta of MI_RPC reports without looking at the
 	 *    OA buffer.
+	 *
+	 * 4: Add DRM_I915_PERF_PROP_POLL_OA_DELAY parameter that controls
+	 *    enable/disable as well as the interval for the hrtimer used to
+	 *    check for OA data.
 	 */
-	return 3;
+	return 4;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index 9ee7c58e70d5..01559ead22e2 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -304,6 +304,12 @@ struct i915_perf_stream {
 	 * reprogrammed.
 	 */
 	struct i915_vma *noa_wait;
+
+	/**
+	 * @poll_oa_period: The period in nanoseconds at which the OA
+	 * buffer should be checked for available data.
+	 */
+	u64 poll_oa_period;
 };
 
 /**
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 2813e579b480..dd511e7f795d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1969,6 +1969,19 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
 
+	/**
+	 * This optional parameter specifies the timer interval in nanoseconds
+	 * at which the i915 driver will check the OA buffer for available data.
+	 * Minimum allowed value is 100 microseconds. A default value is used by
+	 * the driver if this parameter is not specified. Note that a large
+	 * value may reduce cpu consumption during OA perf captures, but it
+	 * would also potentially result in OA buffer overwrite as the captures
+	 * reach end of the OA buffer.
+	 *
+	 * This property is available in perf revision 4.
+	 */
+	DRM_I915_PERF_PROP_POLL_OA_DELAY,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev6)
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
@ 2020-03-13  0:45 ` Patchwork
  2020-03-13  0:47 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-13  0:45 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev6)
URL   : https://patchwork.freedesktop.org/series/54280/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
cf4dc4c5b6bb drm/i915/perf: rework aging tail workaround
-:189: CHECK:SPACING: No space is necessary after a cast
#189: FILE: drivers/gpu/drm/i915/i915_perf.c:527:
+			u32 *report32 = (void *) report;

-:268: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#268: FILE: drivers/gpu/drm/i915/i915_perf.c:827:
+		report32[0] = report32[1] = 0;

-:312: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#312: FILE: drivers/gpu/drm/i915/i915_perf.c:1042:
+		report32[0] = report32[1] = 0;

-:358: CHECK:LINE_SPACING: Please don't use multiple blank lines
#358: FILE: drivers/gpu/drm/i915/i915_perf.c:4385:
+
+

total: 0 errors, 0 warnings, 4 checks, 364 lines checked
0a0d3681c903 drm/i915/perf: move pollin setup to non hw specific code
77c909d6ff76 drm/i915/perf: only append status when data is available
5c91c06ca2f8 drm/i915/perf: add new open param to configure polling of OA buffer

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/perf: add OA interrupt support (rev6)
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2020-03-13  0:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev6) Patchwork
@ 2020-03-13  0:47 ` Patchwork
  2020-03-13  1:02 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-13  0:47 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev6)
URL   : https://patchwork.freedesktop.org/series/54280/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.6.0
Commit: drm/i915/perf: rework aging tail workaround
Okay!

Commit: drm/i915/perf: move pollin setup to non hw specific code
-O:drivers/gpu/drm/i915/i915_perf.c:1443:15: warning: memset with byte count of 16777216
-O:drivers/gpu/drm/i915/i915_perf.c:1499:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1443:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1497:15: warning: memset with byte count of 16777216

Commit: drm/i915/perf: only append status when data is available
Okay!

Commit: drm/i915/perf: add new open param to configure polling of OA buffer
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✗ Fi.CI.DOCS: warning for drm/i915/perf: add OA interrupt support (rev6)
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (5 preceding siblings ...)
  2020-03-13  0:47 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-03-13  1:02 ` Patchwork
  2020-03-13  1:13 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-03-13 14:14 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  8 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-13  1:02 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev6)
URL   : https://patchwork.freedesktop.org/series/54280/
State : warning

== Summary ==

$ make htmldocs 2>&1 > /dev/null | grep i915
./drivers/gpu/drm/i915/i915_perf.c:674: warning: Function parameter or member 'oastatus' not described in 'gen8_append_oa_reports'
./drivers/gpu/drm/i915/i915_perf.c:972: warning: Function parameter or member 'oastatus1' not described in 'gen7_append_oa_reports'

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/perf: add OA interrupt support (rev6)
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (6 preceding siblings ...)
  2020-03-13  1:02 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
@ 2020-03-13  1:13 ` Patchwork
  2020-03-13 14:14 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  8 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-13  1:13 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev6)
URL   : https://patchwork.freedesktop.org/series/54280/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8131 -> Patchwork_16958
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/index.html

Known issues
------------

  Here are the changes found in Patchwork_16958 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@execlists:
    - fi-skl-lmem:        [PASS][1] -> [INCOMPLETE][2] ([i915#656])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/fi-skl-lmem/igt@i915_selftest@live@execlists.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/fi-skl-lmem/igt@i915_selftest@live@execlists.html
    - fi-kbl-soraka:      [PASS][3] -> [INCOMPLETE][4] ([fdo#112259] / [i915#1430] / [i915#656])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/fi-kbl-soraka/igt@i915_selftest@live@execlists.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/fi-kbl-soraka/igt@i915_selftest@live@execlists.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@execlists:
    - fi-glk-dsi:         [INCOMPLETE][5] ([i915#1430] / [i915#58] / [i915#656] / [k.org#198133]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/fi-glk-dsi/igt@i915_selftest@live@execlists.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/fi-glk-dsi/igt@i915_selftest@live@execlists.html
    - fi-kbl-guc:         [INCOMPLETE][7] ([fdo#112259] / [i915#1430] / [i915#656]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/fi-kbl-guc/igt@i915_selftest@live@execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/fi-kbl-guc/igt@i915_selftest@live@execlists.html

  
  [fdo#112259]: https://bugs.freedesktop.org/show_bug.cgi?id=112259
  [i915#1430]: https://gitlab.freedesktop.org/drm/intel/issues/1430
  [i915#58]: https://gitlab.freedesktop.org/drm/intel/issues/58
  [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (47 -> 42)
------------------------------

  Additional (1): fi-snb-2520m 
  Missing    (6): fi-hsw-4200u fi-skl-6770hq fi-byt-squawks fi-bsw-cyan fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8131 -> Patchwork_16958

  CI-20190529: 20190529
  CI_DRM_8131: 5137b6daebf1eac30b307c37fe371fee1db7d6cd @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5506: 59fd8a0d01dac58dc6c7d86ef391ed4393ab5aae @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_16958: 5c91c06ca2f8a42a0760f3f30ac014fdeede74cc @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

5c91c06ca2f8 drm/i915/perf: add new open param to configure polling of OA buffer
77c909d6ff76 drm/i915/perf: only append status when data is available
0a0d3681c903 drm/i915/perf: move pollin setup to non hw specific code
cf4dc4c5b6bb drm/i915/perf: rework aging tail workaround

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround
  2020-03-12 23:04 ` [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
@ 2020-03-13  9:54   ` Lionel Landwerlin
  2020-03-16 19:23   ` Dixit, Ashutosh
  1 sibling, 0 replies; 19+ messages in thread
From: Lionel Landwerlin @ 2020-03-13  9:54 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Ashutosh Dixit, intel-gfx

On 13/03/2020 01:04, Umesh Nerlige Ramappa wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> We're about to introduce an options to open the perf stream, giving
> the user ability to configure how often it wants the kernel to poll
> the OA registers for available data.
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.
>
> v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
> v3: (Umesh)
> - Rebase
> - Change report to report32 from below review
>    https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c       | 207 +++++++++++--------------
>   drivers/gpu/drm/i915/i915_perf_types.h |  29 ++--
>   2 files changed, 106 insertions(+), 130 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 1b074bb4a7fe..92c5c75e2a2b 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -241,23 +241,14 @@
>    * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
>    * read() attempts.
>    *
> - * In effect we define a tail pointer for reading that lags the real tail
> - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
> - * time for the corresponding reports to become visible to the CPU.
> - *
> - * To manage this we actually track two tail pointers:
> - *  1) An 'aging' tail with an associated timestamp that is tracked until we
> - *     can trust the corresponding data is visible to the CPU; at which point
> - *     it is considered 'aged'.
> - *  2) An 'aged' tail that can be used for read()ing.
> - *
> - * The two separate pointers let us decouple read()s from tail pointer aging.
> - *
> - * The tail pointers are checked and updated at a limited rate within a hrtimer
> - * callback (the same callback that is used for delivering EPOLLIN events)
> - *
> - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
> - * indicates that an updated tail pointer is needed.
> + * We workaround this issue in oa_buffer_check() by reading the reports in the
s/oa_buffer_check/oa_buffer_check_unlocked/
> + * OA buffer, starting from the tail reported by the HW until we find 2
> + * consecutive reports with their first 2 dwords of not at 0. Those dwords are
> + * also set to 0 once read and the whole buffer is cleared upon OA buffer
> + * initialization. The first dword is the reason for this report while the
> + * second is the timestamp, making the chances of having those 2 fields at 0
> + * fairly unlikely. A more detailed explanation is available in
> + * oa_buffer_check().
>    *
>    * Most of the implementation details for this workaround are in
>    * oa_buffer_check_unlocked() and _append_oa_reports()
> @@ -270,7 +261,6 @@
>    * enabled without any periodic sampling.
>    */
>   #define OA_TAIL_MARGIN_NSEC	100000ULL
> -#define INVALID_TAIL_PTR	0xffffffff
>   
>   /* frequency for checking whether the OA unit has written new reports to the
>    * circular OA buffer...
> @@ -476,10 +466,10 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
>    */
>   static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   {
> +	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>   	int report_size = stream->oa_buffer.format_size;
>   	unsigned long flags;
> -	unsigned int aged_idx;
> -	u32 head, hw_tail, aged_tail, aging_tail;
> +	u32 hw_tail;
>   	u64 now;
>   
>   	/* We have to consider the (unlikely) possibility that read() errors
> @@ -488,16 +478,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	 */
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
> -	/* NB: The head we observe here might effectively be a little out of
> -	 * date (between head and tails[aged_idx].offset if there is currently
> -	 * a read() in progress.
> -	 */
> -	head = stream->oa_buffer.head;
> -
> -	aged_idx = stream->oa_buffer.aged_tail_idx;
> -	aged_tail = stream->oa_buffer.tails[aged_idx].offset;
> -	aging_tail = stream->oa_buffer.tails[!aged_idx].offset;
> -
>   	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>   
>   	/* The tail pointer increases in 64 byte increments,
> @@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   
>   	now = ktime_get_mono_fast_ns();
>   
> -	/* Update the aged tail
> -	 *
> -	 * Flip the tail pointer available for read()s once the aging tail is
> -	 * old enough to trust that the corresponding data will be visible to
> -	 * the CPU...
> -	 *
> -	 * Do this before updating the aging pointer in case we may be able to
> -	 * immediately start aging a new pointer too (if new data has become
> -	 * available) without needing to wait for a later hrtimer callback.
> -	 */
> -	if (aging_tail != INVALID_TAIL_PTR &&
> -	    ((now - stream->oa_buffer.aging_timestamp) >
> -	     OA_TAIL_MARGIN_NSEC)) {
> -
> -		aged_idx ^= 1;
> -		stream->oa_buffer.aged_tail_idx = aged_idx;
> +	if (hw_tail == stream->oa_buffer.aging_tail) {
> +		/* If the HW tail hasn't move since the last check and the HW
> +		 * tail has been aging for long enough, declare it the new
> +		 * tail.
> +		 */
> +		if ((now - stream->oa_buffer.aging_timestamp) >
> +		    OA_TAIL_MARGIN_NSEC) {
> +			stream->oa_buffer.tail =
> +				stream->oa_buffer.aging_tail;
> +		}
> +	} else {
> +		u32 head, tail, landed_report_heads;
>   
> -		aged_tail = aging_tail;
> +		/* NB: The head we observe here might effectively be a little out of
> +		 * date (between head and tails[aged_idx].offset if there is currently
> +		 * a read() in progress.
> +		 */
> +		head = stream->oa_buffer.head - gtt_offset;
>   
> -		/* Mark that we need a new pointer to start aging... */
> -		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> -		aging_tail = INVALID_TAIL_PTR;
> -	}
> +		hw_tail -= gtt_offset;
> +		tail = hw_tail;
>   
> -	/* Update the aging tail
> -	 *
> -	 * We throttle aging tail updates until we have a new tail that
> -	 * represents >= one report more data than is already available for
> -	 * reading. This ensures there will be enough data for a successful
> -	 * read once this new pointer has aged and ensures we will give the new
> -	 * pointer time to age.
> -	 */
> -	if (aging_tail == INVALID_TAIL_PTR &&
> -	    (aged_tail == INVALID_TAIL_PTR ||
> -	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
> -		struct i915_vma *vma = stream->oa_buffer.vma;
> -		u32 gtt_offset = i915_ggtt_offset(vma);
> -
> -		/* Be paranoid and do a bounds check on the pointer read back
> -		 * from hardware, just in case some spurious hardware condition
> -		 * could put the tail out of bounds...
> +		/* Walk the stream backward until we find at least 2 reports
> +		 * with dword 0 & 1 not at 0. Since the circular buffer
> +		 * pointers progress by increments of 64 bytes and that
> +		 * reports can be up to 256 bytes long, we can't tell whether
> +		 * a report has fully landed in memory before the first 2
> +		 * dwords of the following report have effectively landed.
> +		 *
> +		 * This is assuming that the writes of the OA unit land in
> +		 * memory in the order they were written to.
> +		 * If not : (╯°□°）╯︵ ┻━┻
>   		 */
> -		if (hw_tail >= gtt_offset &&
> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> -			stream->oa_buffer.tails[!aged_idx].offset =
> -				aging_tail = hw_tail;
> -			stream->oa_buffer.aging_timestamp = now;
> -		} else {
> -			drm_err(&stream->perf->i915->drm,
> -				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
> -				hw_tail);
> +		landed_report_heads = 0;
> +		while (OA_TAKEN(tail, head) >= report_size) {
> +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
> +			u8 *report = stream->oa_buffer.vaddr + previous_tail;
> +			u32 *report32 = (void *) report;
> +
> +			/* Head of the report indicated by the HW tail register has
> +			 * indeed landed into memory.
> +			 */
> +			if (report32[0] != 0 || report32[1] != 0) {
> +				landed_report_heads++;
> +
> +				if (landed_report_heads >= 2)
> +					break;
> +			}
> +
> +			tail = previous_tail;
>   		}
> +
> +		if (abs(tail - hw_tail) >= (2 * report_size)) {
> +			if (__ratelimit(&stream->perf->tail_pointer_race)) {
> +				DRM_NOTE("unlanded report(s) head=0x%x "
> +					 "tail=0x%x hw_tail=0x%x\n",
> +					 head, tail, hw_tail);
> +			}
> +		}
> +
> +		stream->oa_buffer.tail = gtt_offset + tail;
> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
> +		stream->oa_buffer.aging_timestamp = now;
>   	}
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> -	return aged_tail == INVALID_TAIL_PTR ?
> -		false : OA_TAKEN(aged_tail, head) >= report_size;
> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> +			stream->oa_buffer.head - gtt_offset) >= report_size;
> +
>   }
>   
>   /**
> @@ -682,7 +674,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 mask = (OA_BUFFER_SIZE - 1);
>   	size_t start_offset = *offset;
>   	unsigned long flags;
> -	unsigned int aged_tail_idx;
>   	u32 head, tail;
>   	u32 taken;
>   	int ret = 0;
> @@ -693,18 +684,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
>   	head = stream->oa_buffer.head;
> -	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
> -	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
> +	tail = stream->oa_buffer.tail;
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> -	/*
> -	 * An invalid tail pointer here means we're still waiting for the poll
> -	 * hrtimer callback to give us a pointer
> -	 */
> -	if (tail == INVALID_TAIL_PTR)
> -		return -EAGAIN;
> -
>   	/*
>   	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
>   	 * while indexing relative to oa_buf_base.
> @@ -838,13 +821,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		}
>   
>   		/*
> -		 * The above reason field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		 * Clear out the first 2 dword as a mean to detect unlanded
> +		 * reports.
>   		 */
> -		report32[0] = 0;
> +		report32[0] = report32[1] = 0;
Looks like the scripts are complaining about this. You'll have to split 
it in 2 lines.
>   	}
>   
>   	if (start_offset != *offset) {
> @@ -985,7 +965,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 mask = (OA_BUFFER_SIZE - 1);
>   	size_t start_offset = *offset;
>   	unsigned long flags;
> -	unsigned int aged_tail_idx;
>   	u32 head, tail;
>   	u32 taken;
>   	int ret = 0;
> @@ -996,17 +975,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
>   	head = stream->oa_buffer.head;
> -	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
> -	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
> +	tail = stream->oa_buffer.tail;
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> -	/* An invalid tail pointer here means we're still waiting for the poll
> -	 * hrtimer callback to give us a pointer
> -	 */
> -	if (tail == INVALID_TAIL_PTR)
> -		return -EAGAIN;
> -
>   	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
>   	 * while indexing relative to oa_buf_base.
>   	 */
> @@ -1064,13 +1036,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   		if (ret)
>   			break;
>   
> -		/* The above report-id field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		/* Clear out the first 2 dwords as a mean to detect unlanded
> +		 * reports.
>   		 */
> -		report32[0] = 0;
> +		report32[0] = report32[1] = 0;
Same here.
>   	}
>   
>   	if (start_offset != *offset) {
> @@ -1449,8 +1418,8 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
>   			   gtt_offset | OABUFFER_SIZE_16M);
>   
>   	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail =
> +		stream->oa_buffer.tail = gtt_offset;
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> @@ -1503,8 +1472,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
>   	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
>   
>   	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail =
> +		stream->oa_buffer.tail = gtt_offset;
>   
>   	/*
>   	 * Reset state used to recognise context switches, affecting which
> @@ -1559,8 +1528,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
>   			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
>   
>   	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail =
> +		stream->oa_buffer.tail = gtt_offset;
>   
>   	/*
>   	 * Reset state used to recognise context switches, affecting which
> @@ -4408,6 +4377,12 @@ void i915_perf_init(struct drm_i915_private *i915)
>   		ratelimit_set_flags(&perf->spurious_report_rs,
>   				    RATELIMIT_MSG_ON_RELEASE);
>   
> +		ratelimit_state_init(&perf->tail_pointer_race,
> +				     5 * HZ, 10);
> +		ratelimit_set_flags(&perf->tail_pointer_race,
> +				    RATELIMIT_MSG_ON_RELEASE);
> +
> +
>   		atomic64_set(&perf->noa_programming_delay,
>   			     500 * 1000 /* 500us */);
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index a0e22f00f6cf..9ee7c58e70d5 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -272,21 +272,10 @@ struct i915_perf_stream {
>   		spinlock_t ptr_lock;
>   
>   		/**
> -		 * @tails: One 'aging' tail pointer and one 'aged' tail pointer ready to
> -		 * used for reading.
> -		 *
> -		 * Initial values of 0xffffffff are invalid and imply that an
> -		 * update is required (and should be ignored by an attempted
> -		 * read)
> +		 * @aging_tail: The last HW tail reported by HW. The data
> +		 * might not have made it to memory yet though.
>   		 */
> -		struct {
> -			u32 offset;
> -		} tails[2];
> -
> -		/**
> -		 * @aged_tail_idx: Index for the aged tail ready to read() data up to.
> -		 */
> -		unsigned int aged_tail_idx;
> +		u32 aging_tail;
>   
>   		/**
>   		 * @aging_timestamp: A monotonic timestamp for when the current aging tail pointer
> @@ -302,6 +291,12 @@ struct i915_perf_stream {
>   		 * OA buffer data to userspace.
>   		 */
>   		u32 head;
> +
> +		/**
> +		 * @tail: The last tail verified tail that can be read by
> +		 * userspace.
> +		 */
> +		u32 tail;
>   	} oa_buffer;
>   
>   	/**
> @@ -413,6 +408,12 @@ struct i915_perf {
>   	 */
>   	struct ratelimit_state spurious_report_rs;
>   
> +	/**
> +	 * For rate limiting any notifications of tail pointer
> +	 * race.
> +	 */
> +	struct ratelimit_state tail_pointer_race;
> +
>   	struct i915_oa_config test_config;
>   
>   	u32 gen7_latched_oastatus1;


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
@ 2020-03-13  9:59   ` Lionel Landwerlin
  2020-03-17  0:02   ` Dixit, Ashutosh
  1 sibling, 0 replies; 19+ messages in thread
From: Lionel Landwerlin @ 2020-03-13  9:59 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Ashutosh Dixit, intel-gfx

On 13/03/2020 01:05, Umesh Nerlige Ramappa wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> This new parameter let's the application choose how often the OA
> buffer should be checked on the CPU side for data availability. Longer
> polling period tend to reduce CPU overhead if the application does not
> care about somewhat real time data collection.
>
> v2: Allow disabling polling completely with 0 value (Lionel)
> v3: Version the new parameter (Joonas)
> v4: Rebase (Umesh)
> v5: Make poll delay value of 0 invalid (Umesh)
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c       | 37 ++++++++++++++++++++------
>   drivers/gpu/drm/i915/i915_perf_types.h |  6 +++++
>   include/uapi/drm/i915_drm.h            | 13 +++++++++
>   3 files changed, 48 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 21a63644846f..ca139ac31b11 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -262,11 +262,11 @@
>    */
>   #define OA_TAIL_MARGIN_NSEC	100000ULL
>   
> -/* frequency for checking whether the OA unit has written new reports to the
> - * circular OA buffer...
> +/* The default frequency for checking whether the OA unit has written new
> + * reports to the circular OA buffer...
>    */
> -#define POLL_FREQUENCY 200
> -#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)
> +#define DEFAULT_POLL_FREQUENCY 200
> +#define DEFAULT_POLL_PERIOD (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY)
>   
>   /* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
>   static u32 i915_perf_stream_paranoid = true;
> @@ -349,6 +349,8 @@ static const struct i915_oa_format gen12_oa_formats[I915_OA_FORMAT_MAX] = {
>    * @oa_periodic: Whether to enable periodic OA unit sampling
>    * @oa_period_exponent: The OA unit sampling period is derived from this
>    * @engine: The engine (typically rcs0) being monitored by the OA unit
> + * @poll_oa_period: The period at which the CPU will check for OA data
> + * availability
>    *
>    * As read_properties_unlocked() enumerates and validates the properties given
>    * to open a stream of metrics the configuration is built up in the structure
> @@ -368,6 +370,7 @@ struct perf_open_properties {
>   	int oa_period_exponent;
>   
>   	struct intel_engine_cs *engine;
> +	u64 poll_oa_period;
>   };
>   
>   struct i915_oa_config_bo {
> @@ -2642,9 +2645,9 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream)
>   
>   	stream->perf->ops.oa_enable(stream);
>   
> -	if (stream->periodic)
> +	if (stream->periodic && stream->poll_oa_period)
>   		hrtimer_start(&stream->poll_check_timer,
> -			      ns_to_ktime(POLL_PERIOD),
> +			      ns_to_ktime(stream->poll_oa_period),
>   			      HRTIMER_MODE_REL_PINNED);
>   }
>   
> @@ -3044,7 +3047,8 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
>   		wake_up(&stream->poll_wq);
>   	}
>   
> -	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
> +	hrtimer_forward_now(hrtimer,
> +			    ns_to_ktime(stream->poll_oa_period));
>   
>   	return HRTIMER_RESTART;
>   }
> @@ -3425,6 +3429,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
>   
>   	stream->perf = perf;
>   	stream->ctx = specific_ctx;
> +	stream->poll_oa_period = props->poll_oa_period;
>   
>   	ret = i915_oa_stream_init(stream, param, props);
>   	if (ret)
> @@ -3481,6 +3486,7 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
>   /**
>    * read_properties_unlocked - validate + copy userspace stream open properties
>    * @perf: i915 perf instance
> + * @open_flags: Flags set by userspace for the opening of the stream


Looks you can drop the open_flags, it's not used anymore in this version.


>    * @uprops: The array of u64 key value pairs given by userspace
>    * @n_props: The number of key value pairs expected in @uprops
>    * @props: The stream configuration built up while validating properties
> @@ -3494,6 +3500,7 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
>    * rule out defining new properties with ordering requirements in the future.
>    */
>   static int read_properties_unlocked(struct i915_perf *perf,
> +				    u32 open_flags,
>   				    u64 __user *uprops,
>   				    u32 n_props,
>   				    struct perf_open_properties *props)
> @@ -3502,6 +3509,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
>   	u32 i;
>   
>   	memset(props, 0, sizeof(struct perf_open_properties));
> +	props->poll_oa_period = DEFAULT_POLL_PERIOD;
>   
>   	if (!n_props) {
>   		DRM_DEBUG("No i915 perf properties given\n");
> @@ -3617,6 +3625,14 @@ static int read_properties_unlocked(struct i915_perf *perf,
>   		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
>   			props->hold_preemption = !!value;
>   			break;
> +		case DRM_I915_PERF_PROP_POLL_OA_DELAY:
> +			if (value < 100000 /* 100us */) {
> +				DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n",
> +					  value);
> +				return -EINVAL;
> +			}
> +			props->poll_oa_period = value;
> +			break;
>   		case DRM_I915_PERF_PROP_MAX:
>   			MISSING_CASE(id);
>   			return -EINVAL;
> @@ -3675,6 +3691,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
>   	}
>   
>   	ret = read_properties_unlocked(perf,
> +				       param->flags,
>   				       u64_to_user_ptr(param->properties_ptr),
>   				       param->num_properties,
>   				       &props);
> @@ -4457,8 +4474,12 @@ int i915_perf_ioctl_version(void)
>   	 *    preemption on a particular context so that performance data is
>   	 *    accessible from a delta of MI_RPC reports without looking at the
>   	 *    OA buffer.
> +	 *
> +	 * 4: Add DRM_I915_PERF_PROP_POLL_OA_DELAY parameter that controls
> +	 *    enable/disable as well as the interval for the hrtimer used to
> +	 *    check for OA data.


I think you should drop the "enable/disable" part. We just make the 
delay configurable now with a lower bound limit of 100us.


>   	 */
> -	return 3;
> +	return 4;
>   }
>   
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index 9ee7c58e70d5..01559ead22e2 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -304,6 +304,12 @@ struct i915_perf_stream {
>   	 * reprogrammed.
>   	 */
>   	struct i915_vma *noa_wait;
> +
> +	/**
> +	 * @poll_oa_period: The period in nanoseconds at which the OA
> +	 * buffer should be checked for available data.
> +	 */
> +	u64 poll_oa_period;
>   };
>   
>   /**
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 2813e579b480..dd511e7f795d 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1969,6 +1969,19 @@ enum drm_i915_perf_property_id {
>   	 */
>   	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
>   
> +	/**
> +	 * This optional parameter specifies the timer interval in nanoseconds
> +	 * at which the i915 driver will check the OA buffer for available data.
> +	 * Minimum allowed value is 100 microseconds. A default value is used by
> +	 * the driver if this parameter is not specified. Note that a large
> +	 * value may reduce cpu consumption during OA perf captures, but it
> +	 * would also potentially result in OA buffer overwrite as the captures
> +	 * reach end of the OA buffer.
> +	 *
> +	 * This property is available in perf revision 4.
> +	 */
> +	DRM_I915_PERF_PROP_POLL_OA_DELAY,
> +
>   	DRM_I915_PERF_PROP_MAX /* non-ABI */
>   };
>   


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/perf: add OA interrupt support (rev6)
  2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (7 preceding siblings ...)
  2020-03-13  1:13 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-03-13 14:14 ` Patchwork
  8 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-13 14:14 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev6)
URL   : https://patchwork.freedesktop.org/series/54280/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8131_full -> Patchwork_16958_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_16958_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_busy@busy-vcs1:
    - shard-iclb:         [PASS][1] -> [SKIP][2] ([fdo#112080]) +17 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb4/igt@gem_busy@busy-vcs1.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb3/igt@gem_busy@busy-vcs1.html

  * igt@gem_ctx_persistence@close-replace-race:
    - shard-kbl:          [PASS][3] -> [INCOMPLETE][4] ([i915#1402])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-kbl6/igt@gem_ctx_persistence@close-replace-race.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl6/igt@gem_ctx_persistence@close-replace-race.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [PASS][5] -> [SKIP][6] ([fdo#110854])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb1/igt@gem_exec_balancer@smoke.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb6/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_schedule@implicit-read-write-bsd:
    - shard-iclb:         [PASS][7] -> [SKIP][8] ([i915#677])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb6/igt@gem_exec_schedule@implicit-read-write-bsd.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb1/igt@gem_exec_schedule@implicit-read-write-bsd.html

  * igt@gem_exec_schedule@reorder-wide-bsd:
    - shard-iclb:         [PASS][9] -> [SKIP][10] ([fdo#112146]) +4 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb5/igt@gem_exec_schedule@reorder-wide-bsd.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb1/igt@gem_exec_schedule@reorder-wide-bsd.html

  * igt@gem_ppgtt@flink-and-close-vma-leak:
    - shard-glk:          [PASS][11] -> [FAIL][12] ([i915#644])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-glk7/igt@gem_ppgtt@flink-and-close-vma-leak.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-glk7/igt@gem_ppgtt@flink-and-close-vma-leak.html
    - shard-apl:          [PASS][13] -> [FAIL][14] ([i915#644])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl3/igt@gem_ppgtt@flink-and-close-vma-leak.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl2/igt@gem_ppgtt@flink-and-close-vma-leak.html

  * igt@gem_tiled_swapping@non-threaded:
    - shard-kbl:          [PASS][15] -> [DMESG-WARN][16] ([i915#183])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-kbl7/igt@gem_tiled_swapping@non-threaded.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl4/igt@gem_tiled_swapping@non-threaded.html

  * igt@gem_workarounds@suspend-resume-context:
    - shard-apl:          [PASS][17] -> [DMESG-WARN][18] ([i915#180]) +3 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl6/igt@gem_workarounds@suspend-resume-context.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl4/igt@gem_workarounds@suspend-resume-context.html

  * igt@i915_pm_rps@reset:
    - shard-tglb:         [PASS][19] -> [FAIL][20] ([i915#413])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-tglb5/igt@i915_pm_rps@reset.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-tglb6/igt@i915_pm_rps@reset.html
    - shard-iclb:         [PASS][21] -> [FAIL][22] ([i915#413])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb1/igt@i915_pm_rps@reset.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb6/igt@i915_pm_rps@reset.html

  * igt@i915_selftest@live@execlists:
    - shard-apl:          [PASS][23] -> [INCOMPLETE][24] ([fdo#103927] / [i915#656])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl2/igt@i915_selftest@live@execlists.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl4/igt@i915_selftest@live@execlists.html

  * igt@kms_cursor_crc@pipe-a-cursor-64x21-random:
    - shard-kbl:          [PASS][25] -> [FAIL][26] ([i915#54])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-kbl3/igt@kms_cursor_crc@pipe-a-cursor-64x21-random.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl2/igt@kms_cursor_crc@pipe-a-cursor-64x21-random.html
    - shard-skl:          [PASS][27] -> [FAIL][28] ([i915#54])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl8/igt@kms_cursor_crc@pipe-a-cursor-64x21-random.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl1/igt@kms_cursor_crc@pipe-a-cursor-64x21-random.html
    - shard-apl:          [PASS][29] -> [FAIL][30] ([i915#54])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl1/igt@kms_cursor_crc@pipe-a-cursor-64x21-random.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl3/igt@kms_cursor_crc@pipe-a-cursor-64x21-random.html

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
    - shard-glk:          [PASS][31] -> [FAIL][32] ([i915#72])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-glk8/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-glk1/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html

  * igt@kms_dp_dsc@basic-dsc-enable-edp:
    - shard-iclb:         [PASS][33] -> [SKIP][34] ([fdo#109349])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb2/igt@kms_dp_dsc@basic-dsc-enable-edp.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb1/igt@kms_dp_dsc@basic-dsc-enable-edp.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-kbl:          [PASS][35] -> [DMESG-WARN][36] ([i915#180]) +1 similar issue
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-kbl1/igt@kms_flip@flip-vs-suspend-interruptible.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-indfb-draw-render:
    - shard-skl:          [PASS][37] -> [FAIL][38] ([i915#49])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl8/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-indfb-draw-render.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl1/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-indfb-draw-render.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [PASS][39] -> [FAIL][40] ([i915#1188])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl4/igt@kms_hdr@bpc-switch-suspend.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl10/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_psr@psr2_basic:
    - shard-iclb:         [PASS][41] -> [SKIP][42] ([fdo#109441])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb2/igt@kms_psr@psr2_basic.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb8/igt@kms_psr@psr2_basic.html

  * igt@prime_busy@hang-bsd2:
    - shard-iclb:         [PASS][43] -> [SKIP][44] ([fdo#109276]) +23 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb1/igt@prime_busy@hang-bsd2.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb6/igt@prime_busy@hang-bsd2.html

  
#### Possible fixes ####

  * igt@gem_ctx_isolation@bcs0-s3:
    - shard-kbl:          [DMESG-WARN][45] ([i915#180]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-kbl4/igt@gem_ctx_isolation@bcs0-s3.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl1/igt@gem_ctx_isolation@bcs0-s3.html

  * igt@gem_ctx_persistence@close-replace-race:
    - shard-tglb:         [INCOMPLETE][47] ([i915#1402]) -> [PASS][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-tglb8/igt@gem_ctx_persistence@close-replace-race.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-tglb2/igt@gem_ctx_persistence@close-replace-race.html
    - shard-skl:          [INCOMPLETE][49] ([i915#1402]) -> [PASS][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl2/igt@gem_ctx_persistence@close-replace-race.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl7/igt@gem_ctx_persistence@close-replace-race.html

  * igt@gem_ctx_shared@exec-single-timeline-bsd:
    - shard-iclb:         [SKIP][51] ([fdo#110841]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb2/igt@gem_ctx_shared@exec-single-timeline-bsd.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb8/igt@gem_ctx_shared@exec-single-timeline-bsd.html

  * igt@gem_exec_schedule@implicit-read-write-bsd1:
    - shard-iclb:         [SKIP][53] ([fdo#109276] / [i915#677]) -> [PASS][54] +1 similar issue
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb8/igt@gem_exec_schedule@implicit-read-write-bsd1.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb4/igt@gem_exec_schedule@implicit-read-write-bsd1.html

  * igt@gem_exec_schedule@pi-common-bsd:
    - shard-iclb:         [SKIP][55] ([i915#677]) -> [PASS][56] +2 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb2/igt@gem_exec_schedule@pi-common-bsd.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb3/igt@gem_exec_schedule@pi-common-bsd.html

  * igt@gem_exec_schedule@preempt-queue-bsd1:
    - shard-iclb:         [SKIP][57] ([fdo#109276]) -> [PASS][58] +22 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb5/igt@gem_exec_schedule@preempt-queue-bsd1.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb1/igt@gem_exec_schedule@preempt-queue-bsd1.html

  * igt@gem_exec_schedule@preemptive-hang-bsd:
    - shard-iclb:         [SKIP][59] ([fdo#112146]) -> [PASS][60] +7 similar issues
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb4/igt@gem_exec_schedule@preemptive-hang-bsd.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb8/igt@gem_exec_schedule@preemptive-hang-bsd.html

  * igt@gem_ppgtt@flink-and-close-vma-leak:
    - shard-skl:          [FAIL][61] ([i915#644]) -> [PASS][62]
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl5/igt@gem_ppgtt@flink-and-close-vma-leak.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl4/igt@gem_ppgtt@flink-and-close-vma-leak.html

  * igt@i915_pm_rps@waitboost:
    - shard-iclb:         [FAIL][63] ([i915#413]) -> [PASS][64]
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb5/igt@i915_pm_rps@waitboost.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb7/igt@i915_pm_rps@waitboost.html

  * igt@i915_selftest@live@execlists:
    - shard-glk:          [INCOMPLETE][65] ([i915#58] / [i915#656] / [k.org#198133]) -> [PASS][66]
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-glk6/igt@i915_selftest@live@execlists.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-glk4/igt@i915_selftest@live@execlists.html

  * igt@kms_cursor_crc@pipe-b-cursor-64x21-sliding:
    - shard-skl:          [FAIL][67] ([i915#54]) -> [PASS][68]
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl4/igt@kms_cursor_crc@pipe-b-cursor-64x21-sliding.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl10/igt@kms_cursor_crc@pipe-b-cursor-64x21-sliding.html

  * igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
    - shard-glk:          [FAIL][69] ([i915#133]) -> [PASS][70]
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-glk9/igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-glk3/igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic.html

  * igt@kms_flip@blocking-wf_vblank:
    - shard-hsw:          [INCOMPLETE][71] ([i915#61]) -> [PASS][72]
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-hsw5/igt@kms_flip@blocking-wf_vblank.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-hsw6/igt@kms_flip@blocking-wf_vblank.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-apl:          [DMESG-WARN][73] ([i915#180]) -> [PASS][74]
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl4/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min:
    - shard-skl:          [FAIL][75] ([fdo#108145]) -> [PASS][76] +1 similar issue
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl8/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [FAIL][77] ([fdo#108145] / [i915#265]) -> [PASS][78]
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-skl4/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-skl10/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_lowres@pipe-a-tiling-y:
    - shard-glk:          [FAIL][79] ([i915#899]) -> [PASS][80]
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-glk1/igt@kms_plane_lowres@pipe-a-tiling-y.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-glk6/igt@kms_plane_lowres@pipe-a-tiling-y.html

  * igt@kms_psr@psr2_cursor_blt:
    - shard-iclb:         [SKIP][81] ([fdo#109441]) -> [PASS][82] +1 similar issue
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb8/igt@kms_psr@psr2_cursor_blt.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb2/igt@kms_psr@psr2_cursor_blt.html

  * igt@kms_setmode@basic:
    - shard-apl:          [FAIL][83] ([i915#31]) -> [PASS][84]
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl3/igt@kms_setmode@basic.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl6/igt@kms_setmode@basic.html

  * igt@perf_pmu@busy-no-semaphores-vcs1:
    - shard-iclb:         [SKIP][85] ([fdo#112080]) -> [PASS][86] +15 similar issues
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb6/igt@perf_pmu@busy-no-semaphores-vcs1.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb1/igt@perf_pmu@busy-no-semaphores-vcs1.html

  
#### Warnings ####

  * igt@gem_ctx_persistence@close-replace-race:
    - shard-iclb:         [INCOMPLETE][87] ([i915#1402]) -> [TIMEOUT][88] ([i915#1340])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb1/igt@gem_ctx_persistence@close-replace-race.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb5/igt@gem_ctx_persistence@close-replace-race.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-iclb:         [SKIP][89] ([i915#658]) -> [SKIP][90] ([i915#588])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-iclb8/igt@i915_pm_dc@dc3co-vpb-simulation.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-iclb2/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@runner@aborted:
    - shard-kbl:          [FAIL][91] ([i915#92]) -> ([FAIL][92], [FAIL][93]) ([i915#1389] / [i915#1402] / [i915#92])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-kbl1/igt@runner@aborted.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl6/igt@runner@aborted.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-kbl3/igt@runner@aborted.html
    - shard-apl:          ([FAIL][94], [FAIL][95]) ([fdo#103927] / [i915#1402]) -> ([FAIL][96], [FAIL][97], [FAIL][98]) ([fdo#103927] / [i915#1402] / [i915#529])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl8/igt@runner@aborted.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8131/shard-apl7/igt@runner@aborted.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl4/igt@runner@aborted.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl3/igt@runner@aborted.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/shard-apl6/igt@runner@aborted.html

  
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109349]: https://bugs.freedesktop.org/show_bug.cgi?id=109349
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110841]: https://bugs.freedesktop.org/show_bug.cgi?id=110841
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#112080]: https://bugs.freedesktop.org/show_bug.cgi?id=112080
  [fdo#112146]: https://bugs.freedesktop.org/show_bug.cgi?id=112146
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#133]: https://gitlab.freedesktop.org/drm/intel/issues/133
  [i915#1340]: https://gitlab.freedesktop.org/drm/intel/issues/1340
  [i915#1389]: https://gitlab.freedesktop.org/drm/intel/issues/1389
  [i915#1402]: https://gitlab.freedesktop.org/drm/intel/issues/1402
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#183]: https://gitlab.freedesktop.org/drm/intel/issues/183
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#413]: https://gitlab.freedesktop.org/drm/intel/issues/413
  [i915#49]: https://gitlab.freedesktop.org/drm/intel/issues/49
  [i915#529]: https://gitlab.freedesktop.org/drm/intel/issues/529
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#58]: https://gitlab.freedesktop.org/drm/intel/issues/58
  [i915#588]: https://gitlab.freedesktop.org/drm/intel/issues/588
  [i915#61]: https://gitlab.freedesktop.org/drm/intel/issues/61
  [i915#644]: https://gitlab.freedesktop.org/drm/intel/issues/644
  [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#677]: https://gitlab.freedesktop.org/drm/intel/issues/677
  [i915#72]: https://gitlab.freedesktop.org/drm/intel/issues/72
  [i915#899]: https://gitlab.freedesktop.org/drm/intel/issues/899
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (9 -> 10)
------------------------------

  Additional (1): pig-glk-j5005 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8131 -> Patchwork_16958

  CI-20190529: 20190529
  CI_DRM_8131: 5137b6daebf1eac30b307c37fe371fee1db7d6cd @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5506: 59fd8a0d01dac58dc6c7d86ef391ed4393ab5aae @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_16958: 5c91c06ca2f8a42a0760f3f30ac014fdeede74cc @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16958/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround
  2020-03-12 23:04 ` [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
  2020-03-13  9:54   ` Lionel Landwerlin
@ 2020-03-16 19:23   ` Dixit, Ashutosh
  2020-03-18  0:03     ` Lionel Landwerlin
  1 sibling, 1 reply; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-03-16 19:23 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 12 Mar 2020 16:04:59 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> We're about to introduce an options to open the perf stream, giving
> the user ability to configure how often it wants the kernel to poll
> the OA registers for available data.
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.

Excellent, this absolutely needs to be done, I was thinking it may be
possible even with the approach taken in the original code but the approach
here looks better. It is also a nice cleanup.

> @@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>
>	now = ktime_get_mono_fast_ns();
>
> -	/* Update the aged tail
> -	 *
> -	 * Flip the tail pointer available for read()s once the aging tail is
> -	 * old enough to trust that the corresponding data will be visible to
> -	 * the CPU...
> -	 *
> -	 * Do this before updating the aging pointer in case we may be able to
> -	 * immediately start aging a new pointer too (if new data has become
> -	 * available) without needing to wait for a later hrtimer callback.
> -	 */
> -	if (aging_tail != INVALID_TAIL_PTR &&
> -	    ((now - stream->oa_buffer.aging_timestamp) >
> -	     OA_TAIL_MARGIN_NSEC)) {
> -
> -		aged_idx ^= 1;
> -		stream->oa_buffer.aged_tail_idx = aged_idx;
> +	if (hw_tail == stream->oa_buffer.aging_tail) {
> +		/* If the HW tail hasn't move since the last check and the HW
> +		 * tail has been aging for long enough, declare it the new
> +		 * tail.
> +		 */

Is this really needed? True we will never return the last report but maybe
it's ok, save some code?

> +		if ((now - stream->oa_buffer.aging_timestamp) >

Do we need to initialize 'aging_timestamp = ktime_get_mono_fast_ns()' when
the stream is enabled?

> +		    OA_TAIL_MARGIN_NSEC) {
> +			stream->oa_buffer.tail =
> +				stream->oa_buffer.aging_tail;
> +		}
> +	} else {
> +		u32 head, tail, landed_report_heads;
>
> -		aged_tail = aging_tail;
> +		/* NB: The head we observe here might effectively be a little out of
> +		 * date (between head and tails[aged_idx].offset if there is currently
> +		 * a read() in progress.
> +		 */

Is this comment correct? Aren't we are taking the same lock when updating
head after the read?

> +		head = stream->oa_buffer.head - gtt_offset;
>
> -		/* Mark that we need a new pointer to start aging... */
> -		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> -		aging_tail = INVALID_TAIL_PTR;
> -	}
> +		hw_tail -= gtt_offset;
>
> -	/* Update the aging tail
> -	 *
> -	 * We throttle aging tail updates until we have a new tail that
> -	 * represents >= one report more data than is already available for
> -	 * reading. This ensures there will be enough data for a successful
> -	 * read once this new pointer has aged and ensures we will give the new
> -	 * pointer time to age.
> -	 */
> -	if (aging_tail == INVALID_TAIL_PTR &&
> -	    (aged_tail == INVALID_TAIL_PTR ||
> -	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
> -		struct i915_vma *vma = stream->oa_buffer.vma;
> -		u32 gtt_offset = i915_ggtt_offset(vma);
> -
> -		/* Be paranoid and do a bounds check on the pointer read back
> -		 * from hardware, just in case some spurious hardware condition
> -		 * could put the tail out of bounds...
> +		/* Walk the stream backward until we find at least 2 reports
> +		 * with dword 0 & 1 not at 0. Since the circular buffer
> +		 * pointers progress by increments of 64 bytes and that
> +		 * reports can be up to 256 bytes long, we can't tell whether
> +		 * a report has fully landed in memory before the first 2
> +		 * dwords of the following report have effectively landed.
> +		 *
> +		 * This is assuming that the writes of the OA unit land in
> +		 * memory in the order they were written to.
> +		 * If not : (╯°□°）╯︵ ┻━┻
>		 */
> -		if (hw_tail >= gtt_offset &&
> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> -			stream->oa_buffer.tails[!aged_idx].offset =
> -				aging_tail = hw_tail;
> -			stream->oa_buffer.aging_timestamp = now;
> -		} else {
> -			drm_err(&stream->perf->i915->drm,
> -				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
> -				hw_tail);
> +		landed_report_heads = 0;
> +		while (OA_TAKEN(tail, head) >= report_size) {
> +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
> +			u8 *report = stream->oa_buffer.vaddr + previous_tail;
> +			u32 *report32 = (void *) report;

nit: instead of the 2 statements above we can just have a single statement,
because stream->oa_buffer.vaddr is u8*:

	u32 *report32 = (void *)(stream->oa_buffer.vaddr + previous_tail);

> +
> +			/* Head of the report indicated by the HW tail register has
> +			 * indeed landed into memory.
> +			 */
> +			if (report32[0] != 0 || report32[1] != 0) {

Just wondering if it is sufficient to check just one of these two (report
reason and timestamp)?

> +				landed_report_heads++;

Theoretically we should be checking for "two consecutive" reports, not just
any two as the code is doing? Or maybe it's ok to just check once, i.e. if
data at previous tail has landed we can break out of the loop?

> +
> +				if (landed_report_heads >= 2)
> +					break;
> +			}
> +
> +			tail = previous_tail;
>		}
> +
> +		if (abs(tail - hw_tail) >= (2 * report_size)) {

Shouldn't it be accounted that this is a circular not a linear buffer?

> +			if (__ratelimit(&stream->perf->tail_pointer_race)) {
> +				DRM_NOTE("unlanded report(s) head=0x%x "
> +					 "tail=0x%x hw_tail=0x%x\n",
> +					 head, tail, hw_tail);

Because this is a ratelimited output, should we also include a count here
to indicate how often this is happening?

> +			}
> +		}
> +
> +		stream->oa_buffer.tail = gtt_offset + tail;
> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
> +		stream->oa_buffer.aging_timestamp = now;
>	}
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);

Hmm, I think I would compute the value to return below and then drop the
lock, just in case read moves the head?

>
> -	return aged_tail == INVALID_TAIL_PTR ?
> -		false : OA_TAKEN(aged_tail, head) >= report_size;
> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> +			stream->oa_buffer.head - gtt_offset) >= report_size;
> +

Delete extra line, run checkpatch?

> @@ -838,13 +821,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		}
>
>		/*
> -		 * The above reason field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		 * Clear out the first 2 dword as a mean to detect unlanded
> +		 * reports.
>		 */
> -		report32[0] = 0;
> +		report32[0] = report32[1] = 0;

Again, how about initialization here? That is, since we are doing this when
copying to user buffer, is the gem object allocated in alloc_oa_buffer()
zero'd out to begin with?

> @@ -1559,8 +1528,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
>			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
>
>	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail =
> +		stream->oa_buffer.tail = gtt_offset;

What is the reason for initializing the head and tail first to gtt_offset
and later subtracting it out? If hw_tail read from the HW has this offset
it seems simpler to just subtract out gtt_offset from hw_tail but leave all
these software head/tail not have gtt_offset (initialized to 0)? Though I
think this is all over the place so it is probably better left to a
separate later patch. So maybe leave as is for now.

Thanks!
--
Ashutosh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
@ 2020-03-16 20:18   ` Dixit, Ashutosh
  0 siblings, 0 replies; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-03-16 20:18 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 12 Mar 2020 16:05:00 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> This isn't really gen specific stuff, so just move it to the common
> code.

It seems pollin is not the only member which is not gen specific but is
initialized in gen specific code. Anyway any other such moves are for
future patches.

> @@ -2626,6 +2620,12 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
>   */
>  static void i915_oa_stream_enable(struct i915_perf_stream *stream)
>  {
> +	/*
> +	 * Maybe make ->pollin per-stream state if we support multiple
> +	 * concurrent streams in the future.
> +	 */
> +	stream->pollin = false;

What about the comment above? Isn't pollin already per-stream?

Please fix if needed. Otherwise this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available Umesh Nerlige Ramappa
@ 2020-03-16 22:16   ` Dixit, Ashutosh
  2020-03-17 10:32     ` Lionel Landwerlin
  0 siblings, 1 reply; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-03-16 22:16 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 12 Mar 2020 16:05:01 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> The only bit of the status register we currently report in the
> i915-perf stream is the "report loss" bit. Only report this when we
> have some data to report with it. There was a kind of inconsistency
> here in that we could report report loss without appending the reports
> associated with the loss.

Splitting hair a bit, but I am wondering if this is realistic? If reports
have been lost in the middle of a OA buffer then there /will/ be some data
from the hardware so head != tail. So is the situation which this patch is
fixing ever been observed in practice?

Also, if we are doing this, how about moving the entire status handling
here, including intel_uncore_read() and OABUFFER_OVERFLOW handling (which I
understand resets the stream so probably doesn't have associated data).

In any case, since these are just random questions, this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer
  2020-03-12 23:05 ` [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
  2020-03-13  9:59   ` Lionel Landwerlin
@ 2020-03-17  0:02   ` Dixit, Ashutosh
  1 sibling, 0 replies; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-03-17  0:02 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 12 Mar 2020 16:05:02 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> This new parameter let's the application choose how often the OA
> buffer should be checked on the CPU side for data availability. Longer
> polling period tend to reduce CPU overhead if the application does not
> care about somewhat real time data collection.

Lionely already has comments on this so skipping those.

> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 21a63644846f..ca139ac31b11 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -262,11 +262,11 @@
>   */
>  #define OA_TAIL_MARGIN_NSEC	100000ULL
>
> -/* frequency for checking whether the OA unit has written new reports to the
> - * circular OA buffer...
> +/* The default frequency for checking whether the OA unit has written new
> + * reports to the circular OA buffer...
>   */
> -#define POLL_FREQUENCY 200
> -#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)
> +#define DEFAULT_POLL_FREQUENCY 200
> +#define DEFAULT_POLL_PERIOD (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY)

nit, but I mostly like to have "units" as part of the name, so I'd recommed
calling this DEFAULT_POLL_PERIOD_NS.

> @@ -349,6 +349,8 @@ static const struct i915_oa_format gen12_oa_formats[I915_OA_FORMAT_MAX] = {
>   * @oa_periodic: Whether to enable periodic OA unit sampling
>   * @oa_period_exponent: The OA unit sampling period is derived from this
>   * @engine: The engine (typically rcs0) being monitored by the OA unit
> + * @poll_oa_period: The period at which the CPU will check for OA data
> + * availability
>   *
>   * As read_properties_unlocked() enumerates and validates the properties given
>   * to open a stream of metrics the configuration is built up in the structure
> @@ -368,6 +370,7 @@ struct perf_open_properties {
>	int oa_period_exponent;
>
>	struct intel_engine_cs *engine;
> +	u64 poll_oa_period;

poll_oa_period_ns?

> @@ -2642,9 +2645,9 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream)
>
>	stream->perf->ops.oa_enable(stream);
>
> -	if (stream->periodic)
> +	if (stream->periodic && stream->poll_oa_period)

poll_oa_period check is not required, it's always present.

> @@ -3617,6 +3625,14 @@ static int read_properties_unlocked(struct i915_perf *perf,
>		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
>			props->hold_preemption = !!value;
>			break;
> +		case DRM_I915_PERF_PROP_POLL_OA_DELAY:
> +			if (value < 100000 /* 100us */) {

So no maximum? That's probably ok, reap what you sow. Have we tested 100
us, seems low, anyway.

> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index 9ee7c58e70d5..01559ead22e2 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -304,6 +304,12 @@ struct i915_perf_stream {
>	 * reprogrammed.
>	 */
>	struct i915_vma *noa_wait;
> +
> +	/**
> +	 * @poll_oa_period: The period in nanoseconds at which the OA
> +	 * buffer should be checked for available data.
> +	 */
> +	u64 poll_oa_period;

poll_oa_period_ns?

>  };
>
>  /**
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 2813e579b480..dd511e7f795d 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1969,6 +1969,19 @@ enum drm_i915_perf_property_id {
>	 */
>	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
>
> +	/**
> +	 * This optional parameter specifies the timer interval in nanoseconds
> +	 * at which the i915 driver will check the OA buffer for available data.
> +	 * Minimum allowed value is 100 microseconds. A default value is used by
> +	 * the driver if this parameter is not specified. Note that a large
> +	 * value may reduce cpu consumption during OA perf captures, but it
> +	 * would also potentially result in OA buffer overwrite as the captures
> +	 * reach end of the OA buffer.
> +	 *
> +	 * This property is available in perf revision 4.
> +	 */
> +	DRM_I915_PERF_PROP_POLL_OA_DELAY,

Is DRM_I915_PERF_PROP_POLL_OA_PERIOD_NS a better name? At least PERIOD
instead of DELAY.

Thanks!
--
Ashutosh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available
  2020-03-16 22:16   ` Dixit, Ashutosh
@ 2020-03-17 10:32     ` Lionel Landwerlin
  0 siblings, 0 replies; 19+ messages in thread
From: Lionel Landwerlin @ 2020-03-17 10:32 UTC (permalink / raw)
  To: Dixit, Ashutosh, Umesh Nerlige Ramappa; +Cc: intel-gfx

On 17/03/2020 00:16, Dixit, Ashutosh wrote:
> On Thu, 12 Mar 2020 16:05:01 -0700, Umesh Nerlige Ramappa wrote:
>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>
>> The only bit of the status register we currently report in the
>> i915-perf stream is the "report loss" bit. Only report this when we
>> have some data to report with it. There was a kind of inconsistency
>> here in that we could report report loss without appending the reports
>> associated with the loss.
> Splitting hair a bit, but I am wondering if this is realistic? If reports
> have been lost in the middle of a OA buffer then there /will/ be some data
> from the hardware so head != tail. So is the situation which this patch is
> fixing ever been observed in practice?
>
> Also, if we are doing this, how about moving the entire status handling
> here, including intel_uncore_read() and OABUFFER_OVERFLOW handling (which I
> understand resets the stream so probably doesn't have associated data).
>
> In any case, since these are just random questions, this is:
>
> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

For some reason I thought this writing of lost report was inconsistent.

But it's fairly unrelated to this series.

Maybe drop this patch...


-Lionel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround
  2020-03-16 19:23   ` Dixit, Ashutosh
@ 2020-03-18  0:03     ` Lionel Landwerlin
  2020-03-19  7:44       ` Dixit, Ashutosh
  0 siblings, 1 reply; 19+ messages in thread
From: Lionel Landwerlin @ 2020-03-18  0:03 UTC (permalink / raw)
  To: Dixit, Ashutosh, Umesh Nerlige Ramappa; +Cc: intel-gfx

On 16/03/2020 21:23, Dixit, Ashutosh wrote:
> On Thu, 12 Mar 2020 16:04:59 -0700, Umesh Nerlige Ramappa wrote:
>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>
>> We're about to introduce an options to open the perf stream, giving
>> the user ability to configure how often it wants the kernel to poll
>> the OA registers for available data.
>>
>> Right now the workaround against the OA tail pointer race condition
>> requires at least twice the internal kernel polling timer to make any
>> data available.
>>
>> This changes introduce checks on the OA data written into the circular
>> buffer to make as much data as possible available on the first
>> iteration of the polling timer.
> Excellent, this absolutely needs to be done, I was thinking it may be
> possible even with the approach taken in the original code but the approach
> here looks better. It is also a nice cleanup.
>
>> @@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>
>> 	now = ktime_get_mono_fast_ns();
>>
>> -	/* Update the aged tail
>> -	 *
>> -	 * Flip the tail pointer available for read()s once the aging tail is
>> -	 * old enough to trust that the corresponding data will be visible to
>> -	 * the CPU...
>> -	 *
>> -	 * Do this before updating the aging pointer in case we may be able to
>> -	 * immediately start aging a new pointer too (if new data has become
>> -	 * available) without needing to wait for a later hrtimer callback.
>> -	 */
>> -	if (aging_tail != INVALID_TAIL_PTR &&
>> -	    ((now - stream->oa_buffer.aging_timestamp) >
>> -	     OA_TAIL_MARGIN_NSEC)) {
>> -
>> -		aged_idx ^= 1;
>> -		stream->oa_buffer.aged_tail_idx = aged_idx;
>> +	if (hw_tail == stream->oa_buffer.aging_tail) {
>> +		/* If the HW tail hasn't move since the last check and the HW
>> +		 * tail has been aging for long enough, declare it the new
>> +		 * tail.
>> +		 */
> Is this really needed? True we will never return the last report but maybe
> it's ok, save some code?


It doesn't look like a lot of code :)


>
>> +		if ((now - stream->oa_buffer.aging_timestamp) >
> Do we need to initialize 'aging_timestamp = ktime_get_mono_fast_ns()' when
> the stream is enabled?


Aye... I think aging_tail should probably be initialized to 
INVALID_TAIL_PTR in init_oa_buffer() vfuncs.

So that on the first call it never matches : if (hw_tail == 
stream->oa_buffer.aging_tail)

And that aging_timestamp is set properly.


>
>> +		    OA_TAIL_MARGIN_NSEC) {
>> +			stream->oa_buffer.tail =
>> +				stream->oa_buffer.aging_tail;
>> +		}
>> +	} else {
>> +		u32 head, tail, landed_report_heads;
>>
>> -		aged_tail = aging_tail;
>> +		/* NB: The head we observe here might effectively be a little out of
>> +		 * date (between head and tails[aged_idx].offset if there is currently
>> +		 * a read() in progress.
>> +		 */
> Is this comment correct? Aren't we are taking the same lock when updating
> head after the read?


It seems the mention of tails[aged_idx].offset is definitely out of date.


That being said, the read is concurrent to this function (only the 
update of head is locked).

That means the read code can be going through the reports while we're 
going through a different set of reports in the same OA buffer.


>
>> +		head = stream->oa_buffer.head - gtt_offset;
>>
>> -		/* Mark that we need a new pointer to start aging... */
>> -		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
>> -		aging_tail = INVALID_TAIL_PTR;
>> -	}
>> +		hw_tail -= gtt_offset;
>>
>> -	/* Update the aging tail
>> -	 *
>> -	 * We throttle aging tail updates until we have a new tail that
>> -	 * represents >= one report more data than is already available for
>> -	 * reading. This ensures there will be enough data for a successful
>> -	 * read once this new pointer has aged and ensures we will give the new
>> -	 * pointer time to age.
>> -	 */
>> -	if (aging_tail == INVALID_TAIL_PTR &&
>> -	    (aged_tail == INVALID_TAIL_PTR ||
>> -	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
>> -		struct i915_vma *vma = stream->oa_buffer.vma;
>> -		u32 gtt_offset = i915_ggtt_offset(vma);
>> -
>> -		/* Be paranoid and do a bounds check on the pointer read back
>> -		 * from hardware, just in case some spurious hardware condition
>> -		 * could put the tail out of bounds...
>> +		/* Walk the stream backward until we find at least 2 reports
>> +		 * with dword 0 & 1 not at 0. Since the circular buffer
>> +		 * pointers progress by increments of 64 bytes and that
>> +		 * reports can be up to 256 bytes long, we can't tell whether
>> +		 * a report has fully landed in memory before the first 2
>> +		 * dwords of the following report have effectively landed.
>> +		 *
>> +		 * This is assuming that the writes of the OA unit land in
>> +		 * memory in the order they were written to.
>> +		 * If not : (╯°□°）╯︵ ┻━┻
>> 		 */
>> -		if (hw_tail >= gtt_offset &&
>> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
>> -			stream->oa_buffer.tails[!aged_idx].offset =
>> -				aging_tail = hw_tail;
>> -			stream->oa_buffer.aging_timestamp = now;
>> -		} else {
>> -			drm_err(&stream->perf->i915->drm,
>> -				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
>> -				hw_tail);
>> +		landed_report_heads = 0;
>> +		while (OA_TAKEN(tail, head) >= report_size) {
>> +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>> +			u8 *report = stream->oa_buffer.vaddr + previous_tail;
>> +			u32 *report32 = (void *) report;
> nit: instead of the 2 statements above we can just have a single statement,
> because stream->oa_buffer.vaddr is u8*:
>
> 	u32 *report32 = (void *)(stream->oa_buffer.vaddr + previous_tail);


Looks right :)


>
>> +
>> +			/* Head of the report indicated by the HW tail register has
>> +			 * indeed landed into memory.
>> +			 */
>> +			if (report32[0] != 0 || report32[1] != 0) {
> Just wondering if it is sufficient to check just one of these two (report
> reason and timestamp)?


We could be really unlucky and get on the timestamp = 0.

But if that's the case, we might hit the timeout from above soon enough.

Or we'll see the tail updated and probably find another report that matches.


>
>> +				landed_report_heads++;
> Theoretically we should be checking for "two consecutive" reports, not just
> any two as the code is doing? Or maybe it's ok to just check once, i.e. if
> data at previous tail has landed we can break out of the loop?


Yeah, I think the code isn't quite "two consecutive".


Now that you mention it. It should be "find a report with dwords[0,1] != 
0 and consider the report before that fully landed".

Again assuming the memory controller makes things visible in order they 
were written.


>
>> +
>> +				if (landed_report_heads >= 2)
>> +					break;
>> +			}
>> +
>> +			tail = previous_tail;
>> 		}
>> +
>> +		if (abs(tail - hw_tail) >= (2 * report_size)) {
> Shouldn't it be accounted that this is a circular not a linear buffer?


I think you're right.


>
>> +			if (__ratelimit(&stream->perf->tail_pointer_race)) {
>> +				DRM_NOTE("unlanded report(s) head=0x%x "
>> +					 "tail=0x%x hw_tail=0x%x\n",
>> +					 head, tail, hw_tail);
> Because this is a ratelimited output, should we also include a count here
> to indicate how often this is happening?


I think ratelimit() will print some counts when it happens too often.


>
>> +			}
>> +		}
>> +
>> +		stream->oa_buffer.tail = gtt_offset + tail;
>> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
>> +		stream->oa_buffer.aging_timestamp = now;
>> 	}
>>
>> 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> Hmm, I think I would compute the value to return below and then drop the
> lock, just in case read moves the head?


Correct! Thanks for spotting this.


>
>> -	return aged_tail == INVALID_TAIL_PTR ?
>> -		false : OA_TAKEN(aged_tail, head) >= report_size;
>> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
>> +			stream->oa_buffer.head - gtt_offset) >= report_size;
>> +
> Delete extra line, run checkpatch?
>
>> @@ -838,13 +821,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>> 		}
>>
>> 		/*
>> -		 * The above reason field sanity check is based on
>> -		 * the assumption that the OA buffer is initially
>> -		 * zeroed and we reset the field after copying so the
>> -		 * check is still meaningful once old reports start
>> -		 * being overwritten.
>> +		 * Clear out the first 2 dword as a mean to detect unlanded
>> +		 * reports.
>> 		 */
>> -		report32[0] = 0;
>> +		report32[0] = report32[1] = 0;
> Again, how about initialization here? That is, since we are doing this when
> copying to user buffer, is the gem object allocated in alloc_oa_buffer()
> zero'd out to begin with?


memset to 0 in init_oa_buffer vfuncs.


>
>> @@ -1559,8 +1528,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
>> 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
>>
>> 	/* Mark that we need updated tail pointers to read from... */
>> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
>> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
>> +	stream->oa_buffer.aging_tail =
>> +		stream->oa_buffer.tail = gtt_offset;
> What is the reason for initializing the head and tail first to gtt_offset
> and later subtracting it out? If hw_tail read from the HW has this offset
> it seems simpler to just subtract out gtt_offset from hw_tail but leave all
> these software head/tail not have gtt_offset (initialized to 0)? Though I
> think this is all over the place so it is probably better left to a
> separate later patch. So maybe leave as is for now.


Quite possibly why I left the gtt_offset in there :)

No recollection though...


>
> Thanks!
> --
> Ashutosh


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround
  2020-03-18  0:03     ` Lionel Landwerlin
@ 2020-03-19  7:44       ` Dixit, Ashutosh
  0 siblings, 0 replies; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-03-19  7:44 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

Discussed with Umesh today. Below is what we came up with.

On Tue, 17 Mar 2020 17:03:30 -0700, Lionel Landwerlin wrote:
> On 16/03/2020 21:23, Dixit, Ashutosh wrote:
> > On Thu, 12 Mar 2020 16:04:59 -0700, Umesh Nerlige Ramappa wrote:
> >> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >>
> >> We're about to introduce an options to open the perf stream, giving
> >> the user ability to configure how often it wants the kernel to poll
> >> the OA registers for available data.
> >>
> >> Right now the workaround against the OA tail pointer race condition
> >> requires at least twice the internal kernel polling timer to make any
> >> data available.
> >>
> >> This changes introduce checks on the OA data written into the circular
> >> buffer to make as much data as possible available on the first
> >> iteration of the polling timer.
> > Excellent, this absolutely needs to be done, I was thinking it may be
> > possible even with the approach taken in the original code but the approach
> > here looks better. It is also a nice cleanup.
> >
> >> @@ -507,64 +487,76 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >>
> >>	now = ktime_get_mono_fast_ns();
> >>
> >> -	/* Update the aged tail
> >> -	 *
> >> -	 * Flip the tail pointer available for read()s once the aging tail is
> >> -	 * old enough to trust that the corresponding data will be visible to
> >> -	 * the CPU...
> >> -	 *
> >> -	 * Do this before updating the aging pointer in case we may be able to
> >> -	 * immediately start aging a new pointer too (if new data has become
> >> -	 * available) without needing to wait for a later hrtimer callback.
> >> -	 */
> >> -	if (aging_tail != INVALID_TAIL_PTR &&
> >> -	    ((now - stream->oa_buffer.aging_timestamp) >
> >> -	     OA_TAIL_MARGIN_NSEC)) {
> >> -
> >> -		aged_idx ^= 1;
> >> -		stream->oa_buffer.aged_tail_idx = aged_idx;
> >> +	if (hw_tail == stream->oa_buffer.aging_tail) {
> >> +		/* If the HW tail hasn't move since the last check and the HW
> >> +		 * tail has been aging for long enough, declare it the new
> >> +		 * tail.
> >> +		 */
> > Is this really needed? True we will never return the last report but maybe
> > it's ok, save some code?
>
> It doesn't look like a lot of code :)

OK, leave as is in the patch.

>
> >
> >> +		if ((now - stream->oa_buffer.aging_timestamp) >
> > Do we need to initialize 'aging_timestamp = ktime_get_mono_fast_ns()' when
> > the stream is enabled?
>
>
> Aye... I think aging_tail should probably be initialized to
> INVALID_TAIL_PTR in init_oa_buffer() vfuncs.
>
> So that on the first call it never matches : if (hw_tail ==
> stream->oa_buffer.aging_tail)
>
> And that aging_timestamp is set properly.

OK, let's do this.

> >
> >> +		    OA_TAIL_MARGIN_NSEC) {
> >> +			stream->oa_buffer.tail =
> >> +				stream->oa_buffer.aging_tail;
> >> +		}
> >> +	} else {
> >> +		u32 head, tail, landed_report_heads;
> >>
> >> -		aged_tail = aging_tail;
> >> +		/* NB: The head we observe here might effectively be a little out of
> >> +		 * date (between head and tails[aged_idx].offset if there is currently
> >> +		 * a read() in progress.
> >> +		 */
> > Is this comment correct? Aren't we are taking the same lock when updating
> > head after the read?
>
>
> It seems the mention of tails[aged_idx].offset is definitely out of date.
>
>
> That being said, the read is concurrent to this function (only the update
> of head is locked).
>
> That means the read code can be going through the reports while we're going
> through a different set of reports in the same OA buffer.

OK, reword the comment so that it doesn't look like it's a locking problem.

>
>
> >
> >> +		head = stream->oa_buffer.head - gtt_offset;
> >>
> >> -		/* Mark that we need a new pointer to start aging... */
> >> -		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> >> -		aging_tail = INVALID_TAIL_PTR;
> >> -	}
> >> +		hw_tail -= gtt_offset;
> >>
> >> -	/* Update the aging tail
> >> -	 *
> >> -	 * We throttle aging tail updates until we have a new tail that
> >> -	 * represents >= one report more data than is already available for
> >> -	 * reading. This ensures there will be enough data for a successful
> >> -	 * read once this new pointer has aged and ensures we will give the new
> >> -	 * pointer time to age.
> >> -	 */
> >> -	if (aging_tail == INVALID_TAIL_PTR &&
> >> -	    (aged_tail == INVALID_TAIL_PTR ||
> >> -	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
> >> -		struct i915_vma *vma = stream->oa_buffer.vma;
> >> -		u32 gtt_offset = i915_ggtt_offset(vma);
> >> -
> >> -		/* Be paranoid and do a bounds check on the pointer read back
> >> -		 * from hardware, just in case some spurious hardware condition
> >> -		 * could put the tail out of bounds...
> >> +		/* Walk the stream backward until we find at least 2 reports
> >> +		 * with dword 0 & 1 not at 0. Since the circular buffer
> >> +		 * pointers progress by increments of 64 bytes and that
> >> +		 * reports can be up to 256 bytes long, we can't tell whether
> >> +		 * a report has fully landed in memory before the first 2
> >> +		 * dwords of the following report have effectively landed.
> >> +		 *
> >> +		 * This is assuming that the writes of the OA unit land in
> >> +		 * memory in the order they were written to.
> >> +		 * If not : (╯°□°）╯︵ ┻━┻
> >>		 */
> >> -		if (hw_tail >= gtt_offset &&
> >> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> >> -			stream->oa_buffer.tails[!aged_idx].offset =
> >> -				aging_tail = hw_tail;
> >> -			stream->oa_buffer.aging_timestamp = now;
> >> -		} else {
> >> -			drm_err(&stream->perf->i915->drm,
> >> -				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
> >> -				hw_tail);
> >> +		landed_report_heads = 0;
> >> +		while (OA_TAKEN(tail, head) >= report_size) {

Rather than OA_TAKEN(tail, head) this should probably be
OA_TAKEN(tail, old_tail), investigating.

> >> +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
> >> +			u8 *report = stream->oa_buffer.vaddr + previous_tail;
> >> +			u32 *report32 = (void *) report;
> > nit: instead of the 2 statements above we can just have a single statement,
> > because stream->oa_buffer.vaddr is u8*:
> >
> >	u32 *report32 = (void *)(stream->oa_buffer.vaddr + previous_tail);
>
>
> Looks right :)
>
>
> >
> >> +
> >> +			/* Head of the report indicated by the HW tail register has
> >> +			 * indeed landed into memory.
> >> +			 */
> >> +			if (report32[0] != 0 || report32[1] != 0) {
> > Just wondering if it is sufficient to check just one of these two (report
> > reason and timestamp)?
>
>
> We could be really unlucky and get on the timestamp = 0.
>
> But if that's the case, we might hit the timeout from above soon enough.
>
> Or we'll see the tail updated and probably find another report that
> matches.

OK, leave as is.

>
>
> >
> >> +				landed_report_heads++;
> > Theoretically we should be checking for "two consecutive" reports, not just
> > any two as the code is doing? Or maybe it's ok to just check once, i.e. if
> > data at previous tail has landed we can break out of the loop?
>
>
> Yeah, I think the code isn't quite "two consecutive".
>
>
> Now that you mention it. It should be "find a report with dwords[0,1] != 0
> and consider the report before that fully landed".
>
> Again assuming the memory controller makes things visible in order they
> were written.

Ok, just need to look at one not at "two consecutive".

>
> >
> >> +
> >> +				if (landed_report_heads >= 2)
> >> +					break;
> >> +			}
> >> +
> >> +			tail = previous_tail;
> >>		}
> >> +
> >> +		if (abs(tail - hw_tail) >= (2 * report_size)) {
> > Shouldn't it be accounted that this is a circular not a linear buffer?
>
>
> I think you're right.

Will fix.

>
>
> >
> >> +			if (__ratelimit(&stream->perf->tail_pointer_race)) {
> >> +				DRM_NOTE("unlanded report(s) head=0x%x "
> >> +					 "tail=0x%x hw_tail=0x%x\n",
> >> +					 head, tail, hw_tail);
> > Because this is a ratelimited output, should we also include a count here
> > to indicate how often this is happening?
>
>
> I think ratelimit() will print some counts when it happens too often.

OK.

>
>
> >
> >> +			}
> >> +		}
> >> +
> >> +		stream->oa_buffer.tail = gtt_offset + tail;
> >> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
> >> +		stream->oa_buffer.aging_timestamp = now;
> >>	}
> >>
> >>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > Hmm, I think I would compute the value to return below and then drop the
> > lock, just in case read moves the head?
>
>
> Correct! Thanks for spotting this.

Actually now I am thinking leaving as is. Note that the update of head
after read is atomic so there is no problem of picking up a inconsistent
head. This way we pick up an actual head, not the un-updated one, and will
probably report mostly correctly if there actually are new reports to be
read.

In reality the circular buffer should be lockless, but that we can look at
that in the future.

>
>
> >
> >> -	return aged_tail == INVALID_TAIL_PTR ?
> >> -		false : OA_TAKEN(aged_tail, head) >= report_size;
> >> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> >> +			stream->oa_buffer.head - gtt_offset) >= report_size;
> >> +
> > Delete extra line, run checkpatch?
> >
> >> @@ -838,13 +821,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> >>		}
> >>
> >>		/*
> >> -		 * The above reason field sanity check is based on
> >> -		 * the assumption that the OA buffer is initially
> >> -		 * zeroed and we reset the field after copying so the
> >> -		 * check is still meaningful once old reports start
> >> -		 * being overwritten.
> >> +		 * Clear out the first 2 dword as a mean to detect unlanded
> >> +		 * reports.
> >>		 */
> >> -		report32[0] = 0;
> >> +		report32[0] = report32[1] = 0;
> > Again, how about initialization here? That is, since we are doing this when
> > copying to user buffer, is the gem object allocated in alloc_oa_buffer()
> > zero'd out to begin with?
>
>
> memset to 0 in init_oa_buffer vfuncs.

OK.

>
>
> >
> >> @@ -1559,8 +1528,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
> >>			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
> >>
> >>	/* Mark that we need updated tail pointers to read from... */
> >> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> >> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> >> +	stream->oa_buffer.aging_tail =
> >> +		stream->oa_buffer.tail = gtt_offset;
> > What is the reason for initializing the head and tail first to gtt_offset
> > and later subtracting it out? If hw_tail read from the HW has this offset
> > it seems simpler to just subtract out gtt_offset from hw_tail but leave all
> > these software head/tail not have gtt_offset (initialized to 0)? Though I
> > think this is all over the place so it is probably better left to a
> > separate later patch. So maybe leave as is for now.
>
>
> Quite possibly why I left the gtt_offset in there :)
>
> No recollection though...

Leave as is for this series.

>
>
> >
> > Thanks!
> > --
> > Ashutosh
>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-03-19  7:46 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-12 23:04 [Intel-gfx] [PATCH 0/4] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
2020-03-12 23:04 ` [Intel-gfx] [PATCH 1/4] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
2020-03-13  9:54   ` Lionel Landwerlin
2020-03-16 19:23   ` Dixit, Ashutosh
2020-03-18  0:03     ` Lionel Landwerlin
2020-03-19  7:44       ` Dixit, Ashutosh
2020-03-12 23:05 ` [Intel-gfx] [PATCH 2/4] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
2020-03-16 20:18   ` Dixit, Ashutosh
2020-03-12 23:05 ` [Intel-gfx] [PATCH 3/4] drm/i915/perf: only append status when data is available Umesh Nerlige Ramappa
2020-03-16 22:16   ` Dixit, Ashutosh
2020-03-17 10:32     ` Lionel Landwerlin
2020-03-12 23:05 ` [Intel-gfx] [PATCH 4/4] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
2020-03-13  9:59   ` Lionel Landwerlin
2020-03-17  0:02   ` Dixit, Ashutosh
2020-03-13  0:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev6) Patchwork
2020-03-13  0:47 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-03-13  1:02 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2020-03-13  1:13 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-03-13 14:14 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).