intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support
@ 2020-03-24 18:54 Umesh Nerlige Ramappa
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-24 18:54 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

Hi all,

This is a revival of an earlier patch series submitted by Lionel
Landwerlin - https://patchwork.freedesktop.org/series/54280/

The patches enable interrupt support for the perf OA unit in
i915, further details can be found in the orignal series linked
above.

v2: (Umesh)
- This series will split into 2. This revision will only support the addition of
  poll delay parameter to perf OA. If needed a future revision will incorporate
  interrupt support.

Regards,
Umesh


Lionel Landwerlin (3):
  drm/i915/perf: rework aging tail workaround
  drm/i915/perf: move pollin setup to non hw specific code
  drm/i915/perf: add new open param to configure polling of OA buffer

 drivers/gpu/drm/i915/i915_perf.c       | 250 +++++++++++--------------
 drivers/gpu/drm/i915/i915_perf_types.h |  34 ++--
 include/uapi/drm/i915_drm.h            |  13 ++
 3 files changed, 144 insertions(+), 153 deletions(-)

-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround
  2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
@ 2020-03-24 18:54 ` Umesh Nerlige Ramappa
  2020-03-24 21:25   ` Dixit, Ashutosh
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-24 18:54 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

We're about to introduce an options to open the perf stream, giving
the user ability to configure how often it wants the kernel to poll
the OA registers for available data.

Right now the workaround against the OA tail pointer race condition
requires at least twice the internal kernel polling timer to make any
data available.

This changes introduce checks on the OA data written into the circular
buffer to make as much data as possible available on the first
iteration of the polling timer.

v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
v3: (Umesh)
- Rebase
- Change report to report32 from below review
  https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1
v4: (Ashutosh, Lionel)
- Fix checkpatch errors
- Fix aging_timestamp initialization
- Check for only one valid landed report
- Fix check for unlanded report
v5: (Ashutosh)
- Fix bug in accurately determining landed report.
- Optimize the check for landed reports by going as far as the
  previously determined aged tail.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 204 ++++++++++---------------
 drivers/gpu/drm/i915/i915_perf_types.h |  28 ++--
 2 files changed, 97 insertions(+), 135 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 3222f6cd8255..4583ae9b77be 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -223,26 +223,17 @@
  *
  * Although this can be observed explicitly while copying reports to userspace
  * by checking for a zeroed report-id field in tail reports, we want to account
- * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
- * read() attempts.
- *
- * In effect we define a tail pointer for reading that lags the real tail
- * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
- * time for the corresponding reports to become visible to the CPU.
- *
- * To manage this we actually track two tail pointers:
- *  1) An 'aging' tail with an associated timestamp that is tracked until we
- *     can trust the corresponding data is visible to the CPU; at which point
- *     it is considered 'aged'.
- *  2) An 'aged' tail that can be used for read()ing.
- *
- * The two separate pointers let us decouple read()s from tail pointer aging.
- *
- * The tail pointers are checked and updated at a limited rate within a hrtimer
- * callback (the same callback that is used for delivering EPOLLIN events)
- *
- * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
- * indicates that an updated tail pointer is needed.
+ * for this earlier, as part of the oa_buffer_check_unlocked to avoid lots of
+ * redundant read() attempts.
+ *
+ * We workaround this issue in oa_buffer_check_unlocked() by reading the reports
+ * in the OA buffer, starting from the tail reported by the HW until we find a
+ * report with its first 2 dwords not 0 meaning its previous report is
+ * completely in memory and ready to be read. Those dwords are also set to 0
+ * once read and the whole buffer is cleared upon OA buffer initialization. The
+ * first dword is the reason for this report while the second is the timestamp,
+ * making the chances of having those 2 fields at 0 fairly unlikely. A more
+ * detailed explanation is available in oa_buffer_check_unlocked().
  *
  * Most of the implementation details for this workaround are in
  * oa_buffer_check_unlocked() and _append_oa_reports()
@@ -454,8 +445,8 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
  * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
  *
  * Besides returning true when there is data available to read() this function
- * also has the side effect of updating the oa_buffer.tails[], .aging_timestamp
- * and .aged_tail_idx state used for reading.
+ * also updates the tail, aging_tail and aging_timestamp in the oa_buffer
+ * object.
  *
  * Note: It's safe to read OA config state here unlocked, assuming that this is
  * only called while the stream is enabled, while the global OA configuration
@@ -465,28 +456,18 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
  */
 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
+	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format_size;
 	unsigned long flags;
-	unsigned int aged_idx;
-	u32 head, hw_tail, aged_tail, aging_tail;
+	u32 hw_tail;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
-	 * could result in an OA buffer reset which might reset the head,
-	 * tails[] and aged_tail state.
+	 * could result in an OA buffer reset which might reset the head and
+	 * tail state.
 	 */
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
-	/* NB: The head we observe here might effectively be a little out of
-	 * date (between head and tails[aged_idx].offset if there is currently
-	 * a read() in progress.
-	 */
-	head = stream->oa_buffer.head;
-
-	aged_idx = stream->oa_buffer.aged_tail_idx;
-	aged_tail = stream->oa_buffer.tails[aged_idx].offset;
-	aging_tail = stream->oa_buffer.tails[!aged_idx].offset;
-
 	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
 
 	/* The tail pointer increases in 64 byte increments,
@@ -496,64 +477,61 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 
 	now = ktime_get_mono_fast_ns();
 
-	/* Update the aged tail
-	 *
-	 * Flip the tail pointer available for read()s once the aging tail is
-	 * old enough to trust that the corresponding data will be visible to
-	 * the CPU...
-	 *
-	 * Do this before updating the aging pointer in case we may be able to
-	 * immediately start aging a new pointer too (if new data has become
-	 * available) without needing to wait for a later hrtimer callback.
-	 */
-	if (aging_tail != INVALID_TAIL_PTR &&
-	    ((now - stream->oa_buffer.aging_timestamp) >
-	     OA_TAIL_MARGIN_NSEC)) {
-
-		aged_idx ^= 1;
-		stream->oa_buffer.aged_tail_idx = aged_idx;
+	if (hw_tail == stream->oa_buffer.aging_tail &&
+	    (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
+		/* If the HW tail hasn't move since the last check and the HW
+		 * tail has been aging for long enough, declare it the new
+		 * tail.
+		 */
+		stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
+	} else {
+		u32 head, tail, aged_tail;
 
-		aged_tail = aging_tail;
+		/* NB: The head we observe here might effectively be a little
+		 * out of date. If a read() is in progress, the head could be
+		 * anywhere between this head and stream->oa_buffer.tail.
+		 */
+		head = stream->oa_buffer.head - gtt_offset;
+		aged_tail = stream->oa_buffer.tail - gtt_offset;
+
+		hw_tail -= gtt_offset;
+		tail = hw_tail;
+
+		/* Walk the stream backward until we find a report with dword 0
+		 * & 1 not at 0. Since the circular buffer pointers progress by
+		 * increments of 64 bytes and that reports can be up to 256
+		 * bytes long, we can't tell whether a report has fully landed
+		 * in memory before the first 2 dwords of the following report
+		 * have effectively landed.
+		 *
+		 * This is assuming that the writes of the OA unit land in
+		 * memory in the order they were written to.
+		 * If not : (╯°□°)╯︵ ┻━┻
+		 */
+		while (OA_TAKEN(tail, aged_tail) >= report_size) {
+			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
 
-		/* Mark that we need a new pointer to start aging... */
-		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
-		aging_tail = INVALID_TAIL_PTR;
-	}
+			if (report32[0] != 0 || report32[1] != 0)
+				break;
 
-	/* Update the aging tail
-	 *
-	 * We throttle aging tail updates until we have a new tail that
-	 * represents >= one report more data than is already available for
-	 * reading. This ensures there will be enough data for a successful
-	 * read once this new pointer has aged and ensures we will give the new
-	 * pointer time to age.
-	 */
-	if (aging_tail == INVALID_TAIL_PTR &&
-	    (aged_tail == INVALID_TAIL_PTR ||
-	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
-		struct i915_vma *vma = stream->oa_buffer.vma;
-		u32 gtt_offset = i915_ggtt_offset(vma);
-
-		/* Be paranoid and do a bounds check on the pointer read back
-		 * from hardware, just in case some spurious hardware condition
-		 * could put the tail out of bounds...
-		 */
-		if (hw_tail >= gtt_offset &&
-		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
-			stream->oa_buffer.tails[!aged_idx].offset =
-				aging_tail = hw_tail;
-			stream->oa_buffer.aging_timestamp = now;
-		} else {
-			drm_err(&stream->perf->i915->drm,
-				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
-				hw_tail);
+			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
 		}
+
+		if (OA_TAKEN(hw_tail, tail) > report_size &&
+		    __ratelimit(&stream->perf->tail_pointer_race))
+			DRM_NOTE("unlanded report(s) head=0x%x "
+				 "tail=0x%x hw_tail=0x%x\n",
+				 head, tail, hw_tail);
+
+		stream->oa_buffer.tail = gtt_offset + tail;
+		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
+		stream->oa_buffer.aging_timestamp = now;
 	}
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	return aged_tail == INVALID_TAIL_PTR ?
-		false : OA_TAKEN(aged_tail, head) >= report_size;
+	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
+			stream->oa_buffer.head - gtt_offset) >= report_size;
 }
 
 /**
@@ -671,7 +649,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	unsigned int aged_tail_idx;
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
@@ -682,18 +659,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	head = stream->oa_buffer.head;
-	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
-	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
+	tail = stream->oa_buffer.tail;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/*
-	 * An invalid tail pointer here means we're still waiting for the poll
-	 * hrtimer callback to give us a pointer
-	 */
-	if (tail == INVALID_TAIL_PTR)
-		return -EAGAIN;
-
 	/*
 	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
 	 * while indexing relative to oa_buf_base.
@@ -827,13 +796,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		}
 
 		/*
-		 * The above reason field sanity check is based on
-		 * the assumption that the OA buffer is initially
-		 * zeroed and we reset the field after copying so the
-		 * check is still meaningful once old reports start
-		 * being overwritten.
+		 * Clear out the first 2 dword as a mean to detect unlanded
+		 * reports.
 		 */
 		report32[0] = 0;
+		report32[1] = 0;
 	}
 
 	if (start_offset != *offset) {
@@ -974,7 +941,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	unsigned int aged_tail_idx;
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
@@ -985,17 +951,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	head = stream->oa_buffer.head;
-	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
-	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
+	tail = stream->oa_buffer.tail;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/* An invalid tail pointer here means we're still waiting for the poll
-	 * hrtimer callback to give us a pointer
-	 */
-	if (tail == INVALID_TAIL_PTR)
-		return -EAGAIN;
-
 	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
 	 * while indexing relative to oa_buf_base.
 	 */
@@ -1053,13 +1012,11 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		if (ret)
 			break;
 
-		/* The above report-id field sanity check is based on
-		 * the assumption that the OA buffer is initially
-		 * zeroed and we reset the field after copying so the
-		 * check is still meaningful once old reports start
-		 * being overwritten.
+		/* Clear out the first 2 dwords as a mean to detect unlanded
+		 * reports.
 		 */
 		report32[0] = 0;
+		report32[1] = 0;
 	}
 
 	if (start_offset != *offset) {
@@ -1438,8 +1395,8 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 			   gtt_offset | OABUFFER_SIZE_16M);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail = INVALID_TAIL_PTR;
+	stream->oa_buffer.tail = gtt_offset;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
@@ -1492,8 +1449,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail = INVALID_TAIL_PTR;
+	stream->oa_buffer.tail = gtt_offset;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
@@ -1548,8 +1505,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail = INVALID_TAIL_PTR;
+	stream->oa_buffer.tail = gtt_offset;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
@@ -4398,6 +4355,11 @@ void i915_perf_init(struct drm_i915_private *i915)
 		ratelimit_set_flags(&perf->spurious_report_rs,
 				    RATELIMIT_MSG_ON_RELEASE);
 
+		ratelimit_state_init(&perf->tail_pointer_race,
+				     5 * HZ, 10);
+		ratelimit_set_flags(&perf->tail_pointer_race,
+				    RATELIMIT_MSG_ON_RELEASE);
+
 		atomic64_set(&perf->noa_programming_delay,
 			     500 * 1000 /* 500us */);
 
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index 32289cbda648..c3ab184c604a 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -273,21 +273,10 @@ struct i915_perf_stream {
 		spinlock_t ptr_lock;
 
 		/**
-		 * @tails: One 'aging' tail pointer and one 'aged' tail pointer ready to
-		 * used for reading.
-		 *
-		 * Initial values of 0xffffffff are invalid and imply that an
-		 * update is required (and should be ignored by an attempted
-		 * read)
+		 * @aging_tail: The last HW tail reported by HW. The data
+		 * might not have made it to memory yet though.
 		 */
-		struct {
-			u32 offset;
-		} tails[2];
-
-		/**
-		 * @aged_tail_idx: Index for the aged tail ready to read() data up to.
-		 */
-		unsigned int aged_tail_idx;
+		u32 aging_tail;
 
 		/**
 		 * @aging_timestamp: A monotonic timestamp for when the current aging tail pointer
@@ -303,6 +292,11 @@ struct i915_perf_stream {
 		 * OA buffer data to userspace.
 		 */
 		u32 head;
+
+		/**
+		 * @tail: The last verified tail that can be read by userspace.
+		 */
+		u32 tail;
 	} oa_buffer;
 
 	/**
@@ -420,6 +414,12 @@ struct i915_perf {
 	 */
 	struct ratelimit_state spurious_report_rs;
 
+	/**
+	 * For rate limiting any notifications of tail pointer
+	 * race.
+	 */
+	struct ratelimit_state tail_pointer_race;
+
 	u32 gen7_latched_oastatus1;
 	u32 ctx_oactxctrl_offset;
 	u32 ctx_flexeu0_offset;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-gfx] [PATCH 2/3] drm/i915/perf: move pollin setup to non hw specific code
  2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
@ 2020-03-24 18:54 ` Umesh Nerlige Ramappa
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-24 18:54 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

This isn't really gen specific stuff, so just move it to the common
code.

v2: Rebase (Umesh)
v3: Remove comment, pollin is a per stream state already (Ashutosh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4583ae9b77be..4cadf97f94bc 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1418,8 +1418,6 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 	 * memory...
 	 */
 	memset(stream->oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
-
-	stream->pollin = false;
 }
 
 static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
@@ -1474,8 +1472,6 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	 * memory...
 	 */
 	memset(stream->oa_buffer.vaddr, 0, OA_BUFFER_SIZE);
-
-	stream->pollin = false;
 }
 
 static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
@@ -1531,8 +1527,6 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
 	 */
 	memset(stream->oa_buffer.vaddr, 0,
 	       stream->oa_buffer.vma->size);
-
-	stream->pollin = false;
 }
 
 static int alloc_oa_buffer(struct i915_perf_stream *stream)
@@ -2600,6 +2594,8 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
  */
 static void i915_oa_stream_enable(struct i915_perf_stream *stream)
 {
+	stream->pollin = false;
+
 	stream->perf->ops.oa_enable(stream);
 
 	if (stream->periodic)
@@ -3023,12 +3019,8 @@ static ssize_t i915_perf_read(struct file *file,
 	 * effectively ensures we back off until the next hrtimer callback
 	 * before reporting another EPOLLIN event.
 	 */
-	if (ret >= 0 || ret == -EAGAIN) {
-		/* Maybe make ->pollin per-stream state if we support multiple
-		 * concurrent streams in the future.
-		 */
+	if (ret >= 0 || ret == -EAGAIN)
 		stream->pollin = false;
-	}
 
 	return ret;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-gfx] [PATCH 3/3] drm/i915/perf: add new open param to configure polling of OA buffer
  2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
@ 2020-03-24 18:54 ` Umesh Nerlige Ramappa
  2020-03-24 19:27 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/perf: add OA interrupt support (rev8) Patchwork
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-03-24 18:54 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

This new parameter let's the application choose how often the OA
buffer should be checked on the CPU side for data availability. Longer
polling period tend to reduce CPU overhead if the application does not
care about somewhat real time data collection.

v2: Allow disabling polling completely with 0 value (Lionel)
v3: Version the new parameter (Joonas)
v4: Rebase (Umesh)
v5: Make poll delay value of 0 invalid (Umesh)
v6:
- Describe poll_oa_period (Ashutosh)
- Fix comment for new poll parameter (Lionel)
- Drop open_flags in read_properties_unlocked (Lionel)
- Rename uapi parameter (Ashutosh)
v7: Reword the comment in uapi (Ashutosh)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c       | 32 ++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_perf_types.h |  6 +++++
 include/uapi/drm/i915_drm.h            | 13 +++++++++++
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 4cadf97f94bc..c74ebac50015 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -248,11 +248,11 @@
 #define OA_TAIL_MARGIN_NSEC	100000ULL
 #define INVALID_TAIL_PTR	0xffffffff
 
-/* frequency for checking whether the OA unit has written new reports to the
- * circular OA buffer...
+/* The default frequency for checking whether the OA unit has written new
+ * reports to the circular OA buffer...
  */
-#define POLL_FREQUENCY 200
-#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY)
+#define DEFAULT_POLL_FREQUENCY_HZ 200
+#define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ)
 
 /* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */
 static u32 i915_perf_stream_paranoid = true;
@@ -339,6 +339,8 @@ static const struct i915_oa_format gen12_oa_formats[I915_OA_FORMAT_MAX] = {
  * @sseu: internal SSEU configuration computed either from the userspace
  *        specified configuration in the opening parameters or a default value
  *        (see get_default_sseu_config())
+ * @poll_oa_period: The period in nanoseconds at which the CPU will check for OA
+ * data availability
  *
  * As read_properties_unlocked() enumerates and validates the properties given
  * to open a stream of metrics the configuration is built up in the structure
@@ -361,6 +363,8 @@ struct perf_open_properties {
 
 	bool has_sseu;
 	struct intel_sseu sseu;
+
+	u64 poll_oa_period;
 };
 
 struct i915_oa_config_bo {
@@ -2600,7 +2604,7 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream)
 
 	if (stream->periodic)
 		hrtimer_start(&stream->poll_check_timer,
-			      ns_to_ktime(POLL_PERIOD),
+			      ns_to_ktime(stream->poll_oa_period),
 			      HRTIMER_MODE_REL_PINNED);
 }
 
@@ -3035,7 +3039,8 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer)
 		wake_up(&stream->poll_wq);
 	}
 
-	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
+	hrtimer_forward_now(hrtimer,
+			    ns_to_ktime(stream->poll_oa_period));
 
 	return HRTIMER_RESTART;
 }
@@ -3424,6 +3429,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 
 	stream->perf = perf;
 	stream->ctx = specific_ctx;
+	stream->poll_oa_period = props->poll_oa_period;
 
 	ret = i915_oa_stream_init(stream, param, props);
 	if (ret)
@@ -3502,6 +3508,7 @@ static int read_properties_unlocked(struct i915_perf *perf,
 	int ret;
 
 	memset(props, 0, sizeof(struct perf_open_properties));
+	props->poll_oa_period = DEFAULT_POLL_PERIOD_NS;
 
 	if (!n_props) {
 		DRM_DEBUG("No i915 perf properties given\n");
@@ -3634,6 +3641,14 @@ static int read_properties_unlocked(struct i915_perf *perf,
 			props->has_sseu = true;
 			break;
 		}
+		case DRM_I915_PERF_PROP_POLL_OA_PERIOD:
+			if (value < 100000 /* 100us */) {
+				DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n",
+					  value);
+				return -EINVAL;
+			}
+			props->poll_oa_period = value;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
@@ -4416,8 +4431,11 @@ int i915_perf_ioctl_version(void)
 	 * 4: Add DRM_I915_PERF_PROP_ALLOWED_SSEU to limit what contexts can
 	 *    be run for the duration of the performance recording based on
 	 *    their SSEU configuration.
+	 *
+	 * 5: Add DRM_I915_PERF_PROP_POLL_OA_PERIOD parameter that controls the
+	 *    interval for the hrtimer used to check for OA data.
 	 */
-	return 4;
+	return 5;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index c3ab184c604a..371dcf243622 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -304,6 +304,12 @@ struct i915_perf_stream {
 	 * reprogrammed.
 	 */
 	struct i915_vma *noa_wait;
+
+	/**
+	 * @poll_oa_period: The period in nanoseconds at which the OA
+	 * buffer should be checked for available data.
+	 */
+	u64 poll_oa_period;
 };
 
 /**
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index db649d03ab52..14b67cd6b54b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1980,6 +1980,19 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_GLOBAL_SSEU,
 
+	/**
+	 * This optional parameter specifies the timer interval in nanoseconds
+	 * at which the i915 driver will check the OA buffer for available data.
+	 * Minimum allowed value is 100 microseconds. A default value is used by
+	 * the driver if this parameter is not specified. Note that larger timer
+	 * values will reduce cpu consumption during OA perf captures. However,
+	 * excessively large values would potentially result in OA buffer
+	 * overwrites as captures reach end of the OA buffer.
+	 *
+	 * This property is available in perf revision 5.
+	 */
+	DRM_I915_PERF_PROP_POLL_OA_PERIOD,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/perf: add OA interrupt support (rev8)
  2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
@ 2020-03-24 19:27 ` Patchwork
  2020-03-24 19:50 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-03-24 21:13 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  5 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2020-03-24 19:27 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev8)
URL   : https://patchwork.freedesktop.org/series/54280/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.6.0
Commit: drm/i915/perf: rework aging tail workaround
Okay!

Commit: drm/i915/perf: move pollin setup to non hw specific code
-O:drivers/gpu/drm/i915/i915_perf.c:1420:15: warning: memset with byte count of 16777216
-O:drivers/gpu/drm/i915/i915_perf.c:1476:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1420:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1474:15: warning: memset with byte count of 16777216

Commit: drm/i915/perf: add new open param to configure polling of OA buffer
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/perf: add OA interrupt support (rev8)
  2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2020-03-24 19:27 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/perf: add OA interrupt support (rev8) Patchwork
@ 2020-03-24 19:50 ` Patchwork
  2020-03-24 21:13 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  5 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2020-03-24 19:50 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev8)
URL   : https://patchwork.freedesktop.org/series/54280/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8183 -> Patchwork_17072
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/index.html

Known issues
------------

  Here are the changes found in Patchwork_17072 that come from known issues:

### IGT changes ###

#### Warnings ####

  * igt@i915_selftest@live@gem_contexts:
    - fi-cfl-guc:         [INCOMPLETE][1] ([fdo#106070] / [i915#424]) -> [DMESG-FAIL][2] ([i915#943])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/fi-cfl-guc/igt@i915_selftest@live@gem_contexts.html

  
  [fdo#106070]: https://bugs.freedesktop.org/show_bug.cgi?id=106070
  [i915#424]: https://gitlab.freedesktop.org/drm/intel/issues/424
  [i915#943]: https://gitlab.freedesktop.org/drm/intel/issues/943


Participating hosts (45 -> 42)
------------------------------

  Additional (5): fi-kbl-7560u fi-kbl-7500u fi-cfl-8109u fi-skl-lmem fi-blb-e6850 
  Missing    (8): fi-hsw-4770r fi-hsw-4200u fi-hsw-peppy fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8183 -> Patchwork_17072

  CI-20190529: 20190529
  CI_DRM_8183: 795894daf2cc32246af94541733e08649d082470 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5535: d1dcf40cc6869ac858586c5ad9f09af6617ce2ee @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17072: 72aa75eecb08b19d17cd766043dc0d2ccbbba8f6 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

72aa75eecb08 drm/i915/perf: add new open param to configure polling of OA buffer
8216331a9ceb drm/i915/perf: move pollin setup to non hw specific code
2dbd12eaa63c drm/i915/perf: rework aging tail workaround

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/perf: add OA interrupt support (rev8)
  2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2020-03-24 19:50 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-03-24 21:13 ` Patchwork
  5 siblings, 0 replies; 8+ messages in thread
From: Patchwork @ 2020-03-24 21:13 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev8)
URL   : https://patchwork.freedesktop.org/series/54280/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8183_full -> Patchwork_17072_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_17072_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [PASS][1] -> [SKIP][2] ([fdo#110854])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb2/igt@gem_exec_balancer@smoke.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb3/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_schedule@implicit-read-write-bsd1:
    - shard-iclb:         [PASS][3] -> [SKIP][4] ([fdo#109276] / [i915#677]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb1/igt@gem_exec_schedule@implicit-read-write-bsd1.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb3/igt@gem_exec_schedule@implicit-read-write-bsd1.html

  * igt@gem_exec_schedule@pi-shared-iova-bsd:
    - shard-iclb:         [PASS][5] -> [SKIP][6] ([i915#677])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb6/igt@gem_exec_schedule@pi-shared-iova-bsd.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb1/igt@gem_exec_schedule@pi-shared-iova-bsd.html

  * igt@gem_exec_schedule@preempt-queue-bsd1:
    - shard-iclb:         [PASS][7] -> [SKIP][8] ([fdo#109276]) +22 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb2/igt@gem_exec_schedule@preempt-queue-bsd1.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb3/igt@gem_exec_schedule@preempt-queue-bsd1.html

  * igt@gem_exec_schedule@preemptive-hang-bsd:
    - shard-iclb:         [PASS][9] -> [SKIP][10] ([fdo#112146]) +3 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb7/igt@gem_exec_schedule@preemptive-hang-bsd.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb1/igt@gem_exec_schedule@preemptive-hang-bsd.html

  * igt@kms_cursor_crc@pipe-b-cursor-256x85-sliding:
    - shard-skl:          [PASS][11] -> [FAIL][12] ([i915#54])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-skl3/igt@kms_cursor_crc@pipe-b-cursor-256x85-sliding.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-skl7/igt@kms_cursor_crc@pipe-b-cursor-256x85-sliding.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-glk:          [PASS][13] -> [FAIL][14] ([i915#79])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-glk9/igt@kms_flip@flip-vs-expired-vblank.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-glk6/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-kbl:          [PASS][15] -> [DMESG-WARN][16] ([i915#180]) +3 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [PASS][17] -> [FAIL][18] ([fdo#108145] / [i915#265]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-skl9/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr2_su@page_flip:
    - shard-iclb:         [PASS][19] -> [SKIP][20] ([fdo#109642] / [fdo#111068])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb2/igt@kms_psr2_su@page_flip.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb3/igt@kms_psr2_su@page_flip.html

  * igt@kms_psr@psr2_dpms:
    - shard-iclb:         [PASS][21] -> [SKIP][22] ([fdo#109441]) +1 similar issue
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb2/igt@kms_psr@psr2_dpms.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb4/igt@kms_psr@psr2_dpms.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-apl:          [PASS][23] -> [DMESG-WARN][24] ([i915#180])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-apl7/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-apl1/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * igt@perf_pmu@busy-vcs1:
    - shard-iclb:         [PASS][25] -> [SKIP][26] ([fdo#112080]) +11 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb2/igt@perf_pmu@busy-vcs1.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb3/igt@perf_pmu@busy-vcs1.html

  
#### Possible fixes ####

  * igt@gem_busy@busy-vcs1:
    - shard-iclb:         [SKIP][27] ([fdo#112080]) -> [PASS][28] +6 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb8/igt@gem_busy@busy-vcs1.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb4/igt@gem_busy@busy-vcs1.html

  * igt@gem_exec_parallel@contexts:
    - shard-tglb:         [INCOMPLETE][29] ([i915#470] / [i915#529]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-tglb2/igt@gem_exec_parallel@contexts.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-tglb2/igt@gem_exec_parallel@contexts.html

  * igt@gem_exec_schedule@implicit-read-write-bsd2:
    - shard-iclb:         [SKIP][31] ([fdo#109276] / [i915#677]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb7/igt@gem_exec_schedule@implicit-read-write-bsd2.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb4/igt@gem_exec_schedule@implicit-read-write-bsd2.html

  * igt@gem_exec_schedule@pi-distinct-iova-bsd:
    - shard-iclb:         [SKIP][33] ([i915#677]) -> [PASS][34] +2 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb1/igt@gem_exec_schedule@pi-distinct-iova-bsd.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb8/igt@gem_exec_schedule@pi-distinct-iova-bsd.html

  * igt@gem_exec_schedule@preempt-self-bsd:
    - shard-iclb:         [SKIP][35] ([fdo#112146]) -> [PASS][36] +4 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb1/igt@gem_exec_schedule@preempt-self-bsd.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb8/igt@gem_exec_schedule@preempt-self-bsd.html

  * igt@gem_ringfill@basic-default-hang:
    - shard-tglb:         [INCOMPLETE][37] -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-tglb6/igt@gem_ringfill@basic-default-hang.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-tglb8/igt@gem_ringfill@basic-default-hang.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-apl:          [DMESG-WARN][39] ([i915#180]) -> [PASS][40] +5 similar issues
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-apl4/igt@i915_suspend@fence-restore-tiled2untiled.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-apl7/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_cursor_crc@pipe-a-cursor-64x64-random:
    - shard-apl:          [FAIL][41] ([i915#54]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-apl3/igt@kms_cursor_crc@pipe-a-cursor-64x64-random.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-apl3/igt@kms_cursor_crc@pipe-a-cursor-64x64-random.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][43] ([i915#180]) -> [PASS][44] +4 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-kbl1/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-kbl4/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_crc@pipe-c-cursor-64x21-random:
    - shard-glk:          [FAIL][45] ([i915#54]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-glk8/igt@kms_cursor_crc@pipe-c-cursor-64x21-random.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-glk9/igt@kms_cursor_crc@pipe-c-cursor-64x21-random.html

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
    - shard-glk:          [FAIL][47] ([i915#34]) -> [PASS][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-glk4/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-glk7/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html

  * igt@kms_hdr@bpc-switch:
    - shard-skl:          [FAIL][49] ([i915#1188]) -> [PASS][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-skl7/igt@kms_hdr@bpc-switch.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-skl10/igt@kms_hdr@bpc-switch.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min:
    - shard-skl:          [FAIL][51] ([fdo#108145]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-skl1/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-skl9/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html

  * igt@kms_psr@psr2_sprite_mmap_cpu:
    - shard-iclb:         [SKIP][53] ([fdo#109441]) -> [PASS][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb5/igt@kms_psr@psr2_sprite_mmap_cpu.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb2/igt@kms_psr@psr2_sprite_mmap_cpu.html

  * igt@prime_busy@hang-bsd2:
    - shard-iclb:         [SKIP][55] ([fdo#109276]) -> [PASS][56] +20 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-iclb8/igt@prime_busy@hang-bsd2.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-iclb1/igt@prime_busy@hang-bsd2.html

  
#### Warnings ####

  * igt@i915_pm_dc@dc6-dpms:
    - shard-skl:          [FAIL][57] ([i915#454]) -> [INCOMPLETE][58] ([i915#198])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-skl5/igt@i915_pm_dc@dc6-dpms.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-skl1/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_rpm@cursor:
    - shard-snb:          [INCOMPLETE][59] ([i915#82]) -> [SKIP][60] ([fdo#109271])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8183/shard-snb2/igt@i915_pm_rpm@cursor.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/shard-snb5/igt@i915_pm_rpm@cursor.html

  
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#112080]: https://bugs.freedesktop.org/show_bug.cgi?id=112080
  [fdo#112146]: https://bugs.freedesktop.org/show_bug.cgi?id=112146
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#34]: https://gitlab.freedesktop.org/drm/intel/issues/34
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#470]: https://gitlab.freedesktop.org/drm/intel/issues/470
  [i915#529]: https://gitlab.freedesktop.org/drm/intel/issues/529
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#677]: https://gitlab.freedesktop.org/drm/intel/issues/677
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#82]: https://gitlab.freedesktop.org/drm/intel/issues/82


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8183 -> Patchwork_17072

  CI-20190529: 20190529
  CI_DRM_8183: 795894daf2cc32246af94541733e08649d082470 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5535: d1dcf40cc6869ac858586c5ad9f09af6617ce2ee @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17072: 72aa75eecb08b19d17cd766043dc0d2ccbbba8f6 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17072/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround
  2020-03-24 18:54 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
@ 2020-03-24 21:25   ` Dixit, Ashutosh
  0 siblings, 0 replies; 8+ messages in thread
From: Dixit, Ashutosh @ 2020-03-24 21:25 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 24 Mar 2020 11:54:55 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> We're about to introduce an options to open the perf stream, giving
> the user ability to configure how often it wants the kernel to poll
> the OA registers for available data.
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
> v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
> v3: (Umesh)
> - Rebase
> - Change report to report32 from below review
>   https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1
> v4: (Ashutosh, Lionel)
> - Fix checkpatch errors
> - Fix aging_timestamp initialization
> - Check for only one valid landed report
> - Fix check for unlanded report
> v5: (Ashutosh)
> - Fix bug in accurately determining landed report.
> - Optimize the check for landed reports by going as far as the
>   previously determined aged tail.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c       | 204 ++++++++++---------------
>  drivers/gpu/drm/i915/i915_perf_types.h |  28 ++--
>  2 files changed, 97 insertions(+), 135 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 3222f6cd8255..4583ae9b77be 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -223,26 +223,17 @@
>   *
>   * Although this can be observed explicitly while copying reports to userspace
>   * by checking for a zeroed report-id field in tail reports, we want to account
> - * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
> - * read() attempts.
> - *
> - * In effect we define a tail pointer for reading that lags the real tail
> - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
> - * time for the corresponding reports to become visible to the CPU.
> - *
> - * To manage this we actually track two tail pointers:
> - *  1) An 'aging' tail with an associated timestamp that is tracked until we
> - *     can trust the corresponding data is visible to the CPU; at which point
> - *     it is considered 'aged'.
> - *  2) An 'aged' tail that can be used for read()ing.
> - *
> - * The two separate pointers let us decouple read()s from tail pointer aging.
> - *
> - * The tail pointers are checked and updated at a limited rate within a hrtimer
> - * callback (the same callback that is used for delivering EPOLLIN events)
> - *
> - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
> - * indicates that an updated tail pointer is needed.
> + * for this earlier, as part of the oa_buffer_check_unlocked to avoid lots of
> + * redundant read() attempts.
> + *
> + * We workaround this issue in oa_buffer_check_unlocked() by reading the reports
> + * in the OA buffer, starting from the tail reported by the HW until we find a
> + * report with its first 2 dwords not 0 meaning its previous report is
> + * completely in memory and ready to be read. Those dwords are also set to 0
> + * once read and the whole buffer is cleared upon OA buffer initialization. The
> + * first dword is the reason for this report while the second is the timestamp,
> + * making the chances of having those 2 fields at 0 fairly unlikely. A more
> + * detailed explanation is available in oa_buffer_check_unlocked().
>   *
>   * Most of the implementation details for this workaround are in
>   * oa_buffer_check_unlocked() and _append_oa_reports()
> @@ -454,8 +445,8 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
>   * (See description of OA_TAIL_MARGIN_NSEC above for further details.)
>   *
>   * Besides returning true when there is data available to read() this function
> - * also has the side effect of updating the oa_buffer.tails[], .aging_timestamp
> - * and .aged_tail_idx state used for reading.
> + * also updates the tail, aging_tail and aging_timestamp in the oa_buffer
> + * object.
>   *
>   * Note: It's safe to read OA config state here unlocked, assuming that this is
>   * only called while the stream is enabled, while the global OA configuration
> @@ -465,28 +456,18 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
>   */
>  static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>  {
> +	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>	int report_size = stream->oa_buffer.format_size;
>	unsigned long flags;
> -	unsigned int aged_idx;
> -	u32 head, hw_tail, aged_tail, aging_tail;
> +	u32 hw_tail;
>	u64 now;
>
>	/* We have to consider the (unlikely) possibility that read() errors
> -	 * could result in an OA buffer reset which might reset the head,
> -	 * tails[] and aged_tail state.
> +	 * could result in an OA buffer reset which might reset the head and
> +	 * tail state.
>	 */
>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>
> -	/* NB: The head we observe here might effectively be a little out of
> -	 * date (between head and tails[aged_idx].offset if there is currently
> -	 * a read() in progress.
> -	 */
> -	head = stream->oa_buffer.head;
> -
> -	aged_idx = stream->oa_buffer.aged_tail_idx;
> -	aged_tail = stream->oa_buffer.tails[aged_idx].offset;
> -	aging_tail = stream->oa_buffer.tails[!aged_idx].offset;
> -
>	hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>
>	/* The tail pointer increases in 64 byte increments,
> @@ -496,64 +477,61 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>
>	now = ktime_get_mono_fast_ns();
>
> -	/* Update the aged tail
> -	 *
> -	 * Flip the tail pointer available for read()s once the aging tail is
> -	 * old enough to trust that the corresponding data will be visible to
> -	 * the CPU...
> -	 *
> -	 * Do this before updating the aging pointer in case we may be able to
> -	 * immediately start aging a new pointer too (if new data has become
> -	 * available) without needing to wait for a later hrtimer callback.
> -	 */
> -	if (aging_tail != INVALID_TAIL_PTR &&
> -	    ((now - stream->oa_buffer.aging_timestamp) >
> -	     OA_TAIL_MARGIN_NSEC)) {
> -
> -		aged_idx ^= 1;
> -		stream->oa_buffer.aged_tail_idx = aged_idx;
> +	if (hw_tail == stream->oa_buffer.aging_tail &&
> +	    (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
> +		/* If the HW tail hasn't move since the last check and the HW
> +		 * tail has been aging for long enough, declare it the new
> +		 * tail.
> +		 */
> +		stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
> +	} else {
> +		u32 head, tail, aged_tail;
>
> -		aged_tail = aging_tail;
> +		/* NB: The head we observe here might effectively be a little
> +		 * out of date. If a read() is in progress, the head could be
> +		 * anywhere between this head and stream->oa_buffer.tail.
> +		 */
> +		head = stream->oa_buffer.head - gtt_offset;
> +		aged_tail = stream->oa_buffer.tail - gtt_offset;
> +
> +		hw_tail -= gtt_offset;
> +		tail = hw_tail;
> +
> +		/* Walk the stream backward until we find a report with dword 0
> +		 * & 1 not at 0. Since the circular buffer pointers progress by
> +		 * increments of 64 bytes and that reports can be up to 256
> +		 * bytes long, we can't tell whether a report has fully landed
> +		 * in memory before the first 2 dwords of the following report
> +		 * have effectively landed.
> +		 *
> +		 * This is assuming that the writes of the OA unit land in
> +		 * memory in the order they were written to.
> +		 * If not : (╯°□°)╯︵ ┻━┻
> +		 */
> +		while (OA_TAKEN(tail, aged_tail) >= report_size) {
> +			u32 *report32 = (void *)(stream->oa_buffer.vaddr + tail);
>
> -		/* Mark that we need a new pointer to start aging... */
> -		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> -		aging_tail = INVALID_TAIL_PTR;
> -	}
> +			if (report32[0] != 0 || report32[1] != 0)
> +				break;
>
> -	/* Update the aging tail
> -	 *
> -	 * We throttle aging tail updates until we have a new tail that
> -	 * represents >= one report more data than is already available for
> -	 * reading. This ensures there will be enough data for a successful
> -	 * read once this new pointer has aged and ensures we will give the new
> -	 * pointer time to age.
> -	 */
> -	if (aging_tail == INVALID_TAIL_PTR &&
> -	    (aged_tail == INVALID_TAIL_PTR ||
> -	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
> -		struct i915_vma *vma = stream->oa_buffer.vma;
> -		u32 gtt_offset = i915_ggtt_offset(vma);
> -
> -		/* Be paranoid and do a bounds check on the pointer read back
> -		 * from hardware, just in case some spurious hardware condition
> -		 * could put the tail out of bounds...
> -		 */
> -		if (hw_tail >= gtt_offset &&
> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> -			stream->oa_buffer.tails[!aged_idx].offset =
> -				aging_tail = hw_tail;
> -			stream->oa_buffer.aging_timestamp = now;
> -		} else {
> -			drm_err(&stream->perf->i915->drm,
> -				"Ignoring spurious out of range OA buffer tail pointer = %x\n",
> -				hw_tail);
> +			tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
>		}
> +
> +		if (OA_TAKEN(hw_tail, tail) > report_size &&
> +		    __ratelimit(&stream->perf->tail_pointer_race))
> +			DRM_NOTE("unlanded report(s) head=0x%x "
> +				 "tail=0x%x hw_tail=0x%x\n",
> +				 head, tail, hw_tail);
> +
> +		stream->oa_buffer.tail = gtt_offset + tail;
> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
> +		stream->oa_buffer.aging_timestamp = now;
>	}
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
> -	return aged_tail == INVALID_TAIL_PTR ?
> -		false : OA_TAKEN(aged_tail, head) >= report_size;
> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> +			stream->oa_buffer.head - gtt_offset) >= report_size;
>  }
>
>  /**
> @@ -671,7 +649,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>	u32 mask = (OA_BUFFER_SIZE - 1);
>	size_t start_offset = *offset;
>	unsigned long flags;
> -	unsigned int aged_tail_idx;
>	u32 head, tail;
>	u32 taken;
>	int ret = 0;
> @@ -682,18 +659,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>
>	head = stream->oa_buffer.head;
> -	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
> -	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
> +	tail = stream->oa_buffer.tail;
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
> -	/*
> -	 * An invalid tail pointer here means we're still waiting for the poll
> -	 * hrtimer callback to give us a pointer
> -	 */
> -	if (tail == INVALID_TAIL_PTR)
> -		return -EAGAIN;
> -
>	/*
>	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
>	 * while indexing relative to oa_buf_base.
> @@ -827,13 +796,11 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		}
>
>		/*
> -		 * The above reason field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		 * Clear out the first 2 dword as a mean to detect unlanded
> +		 * reports.
>		 */
>		report32[0] = 0;
> +		report32[1] = 0;
>	}
>
>	if (start_offset != *offset) {
> @@ -974,7 +941,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>	u32 mask = (OA_BUFFER_SIZE - 1);
>	size_t start_offset = *offset;
>	unsigned long flags;
> -	unsigned int aged_tail_idx;
>	u32 head, tail;
>	u32 taken;
>	int ret = 0;
> @@ -985,17 +951,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>
>	head = stream->oa_buffer.head;
> -	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
> -	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
> +	tail = stream->oa_buffer.tail;
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
> -	/* An invalid tail pointer here means we're still waiting for the poll
> -	 * hrtimer callback to give us a pointer
> -	 */
> -	if (tail == INVALID_TAIL_PTR)
> -		return -EAGAIN;
> -
>	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
>	 * while indexing relative to oa_buf_base.
>	 */
> @@ -1053,13 +1012,11 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>		if (ret)
>			break;
>
> -		/* The above report-id field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		/* Clear out the first 2 dwords as a mean to detect unlanded
> +		 * reports.
>		 */
>		report32[0] = 0;
> +		report32[1] = 0;
>	}
>
>	if (start_offset != *offset) {
> @@ -1438,8 +1395,8 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
>			   gtt_offset | OABUFFER_SIZE_16M);
>
>	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail = INVALID_TAIL_PTR;
> +	stream->oa_buffer.tail = gtt_offset;
>
>	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>
> @@ -1492,8 +1449,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
>	intel_uncore_write(uncore, GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
>
>	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail = INVALID_TAIL_PTR;
> +	stream->oa_buffer.tail = gtt_offset;
>
>	/*
>	 * Reset state used to recognise context switches, affecting which
> @@ -1548,8 +1505,8 @@ static void gen12_init_oa_buffer(struct i915_perf_stream *stream)
>			   gtt_offset & GEN12_OAG_OATAILPTR_MASK);
>
>	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail = INVALID_TAIL_PTR;
> +	stream->oa_buffer.tail = gtt_offset;
>
>	/*
>	 * Reset state used to recognise context switches, affecting which
> @@ -4398,6 +4355,11 @@ void i915_perf_init(struct drm_i915_private *i915)
>		ratelimit_set_flags(&perf->spurious_report_rs,
>				    RATELIMIT_MSG_ON_RELEASE);
>
> +		ratelimit_state_init(&perf->tail_pointer_race,
> +				     5 * HZ, 10);
> +		ratelimit_set_flags(&perf->tail_pointer_race,
> +				    RATELIMIT_MSG_ON_RELEASE);
> +
>		atomic64_set(&perf->noa_programming_delay,
>			     500 * 1000 /* 500us */);
>
> diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
> index 32289cbda648..c3ab184c604a 100644
> --- a/drivers/gpu/drm/i915/i915_perf_types.h
> +++ b/drivers/gpu/drm/i915/i915_perf_types.h
> @@ -273,21 +273,10 @@ struct i915_perf_stream {
>		spinlock_t ptr_lock;
>
>		/**
> -		 * @tails: One 'aging' tail pointer and one 'aged' tail pointer ready to
> -		 * used for reading.
> -		 *
> -		 * Initial values of 0xffffffff are invalid and imply that an
> -		 * update is required (and should be ignored by an attempted
> -		 * read)
> +		 * @aging_tail: The last HW tail reported by HW. The data
> +		 * might not have made it to memory yet though.
>		 */
> -		struct {
> -			u32 offset;
> -		} tails[2];
> -
> -		/**
> -		 * @aged_tail_idx: Index for the aged tail ready to read() data up to.
> -		 */
> -		unsigned int aged_tail_idx;
> +		u32 aging_tail;
>
>		/**
>		 * @aging_timestamp: A monotonic timestamp for when the current aging tail pointer
> @@ -303,6 +292,11 @@ struct i915_perf_stream {
>		 * OA buffer data to userspace.
>		 */
>		u32 head;
> +
> +		/**
> +		 * @tail: The last verified tail that can be read by userspace.
> +		 */
> +		u32 tail;
>	} oa_buffer;
>
>	/**
> @@ -420,6 +414,12 @@ struct i915_perf {
>	 */
>	struct ratelimit_state spurious_report_rs;
>
> +	/**
> +	 * For rate limiting any notifications of tail pointer
> +	 * race.
> +	 */
> +	struct ratelimit_state tail_pointer_race;
> +
>	u32 gen7_latched_oastatus1;
>	u32 ctx_oactxctrl_offset;
>	u32 ctx_flexeu0_offset;
> --
> 2.20.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-03-24 21:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-24 18:54 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
2020-03-24 18:54 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
2020-03-24 21:25   ` Dixit, Ashutosh
2020-03-24 18:54 ` [Intel-gfx] [PATCH 2/3] drm/i915/perf: move pollin setup to non hw specific code Umesh Nerlige Ramappa
2020-03-24 18:54 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add new open param to configure polling of OA buffer Umesh Nerlige Ramappa
2020-03-24 19:27 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/perf: add OA interrupt support (rev8) Patchwork
2020-03-24 19:50 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-03-24 21:13 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).