All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes
@ 2019-09-13 23:06 Umesh Nerlige Ramappa
  2019-09-13 23:06 ` [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-13 23:06 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Lionel G Landwerlin, intel-gfx

Current driver supports only report sizes that are power of 2. Enable
driver support for other available sizes as well so that the user has
a flexible choice of report formats.

This first patch simplifies the aging tail workaround in the driver
before the other patches add support for other OA report sizes.

Lionel Landwerlin (1):
  drm/i915/perf: rework aging tail workaround

Umesh Nerlige Ramappa (2):
  drm/i915/perf: Add support for report sizes that are not power of 2
  drm/i915/perf: Add the report format with a non-power-of-2 size

 drivers/gpu/drm/i915/i915_drv.h  |  30 ++--
 drivers/gpu/drm/i915/i915_perf.c | 253 +++++++++++++------------------
 include/uapi/drm/i915_drm.h      |  11 +-
 3 files changed, 127 insertions(+), 167 deletions(-)

-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] drm/i915/perf: rework aging tail workaround
  2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
@ 2019-09-13 23:06 ` Umesh Nerlige Ramappa
  2019-09-15 11:15   ` Lionel Landwerlin
  2019-09-13 23:06 ` [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2 Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-13 23:06 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Lionel G Landwerlin, intel-gfx

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Right now the workaround against the OA tail pointer race condition
requires at least twice the internal kernel polling timer to make any
data available.

This changes introduce checks on the OA data written into the circular
buffer to make as much data as possible available on the first
iteration of the polling timer.

v2: Use OA_TAKEN macro without the gtt_offset (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  30 ++---
 drivers/gpu/drm/i915/i915_perf.c | 200 ++++++++++++++-----------------
 2 files changed, 103 insertions(+), 127 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bf600888b3f1..876aeaf3568e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1180,21 +1180,11 @@ struct i915_perf_stream {
 		spinlock_t ptr_lock;
 
 		/**
-		 * One 'aging' tail pointer and one 'aged' tail pointer ready to
-		 * used for reading.
-		 *
-		 * Initial values of 0xffffffff are invalid and imply that an
-		 * update is required (and should be ignored by an attempted
-		 * read)
-		 */
-		struct {
-			u32 offset;
-		} tails[2];
-
-		/**
-		 * Index for the aged tail ready to read() data up to.
+		 * The last HW tail reported by HW. The data
+		 * might not have made it to memory yet
+		 * though.
 		 */
-		unsigned int aged_tail_idx;
+		u32 aging_tail;
 
 		/**
 		 * A monotonic timestamp for when the current aging tail pointer
@@ -1210,6 +1200,12 @@ struct i915_perf_stream {
 		 * OA buffer data to userspace.
 		 */
 		u32 head;
+
+		/**
+		 * The last verified tail that can be read
+		 * by user space
+		 */
+		u32 tail;
 	} oa_buffer;
 };
 
@@ -1693,6 +1689,12 @@ struct drm_i915_private {
 		 */
 		struct ratelimit_state spurious_report_rs;
 
+		/**
+		 * For rate limiting any notifications of tail pointer
+		 * race.
+		 */
+		struct ratelimit_state tail_pointer_race;
+
 		struct i915_oa_config test_config;
 
 		u32 gen7_latched_oastatus1;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c1b764233761..50b6d154fd46 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -237,23 +237,14 @@
  * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
  * read() attempts.
  *
- * In effect we define a tail pointer for reading that lags the real tail
- * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
- * time for the corresponding reports to become visible to the CPU.
- *
- * To manage this we actually track two tail pointers:
- *  1) An 'aging' tail with an associated timestamp that is tracked until we
- *     can trust the corresponding data is visible to the CPU; at which point
- *     it is considered 'aged'.
- *  2) An 'aged' tail that can be used for read()ing.
- *
- * The two separate pointers let us decouple read()s from tail pointer aging.
- *
- * The tail pointers are checked and updated at a limited rate within a hrtimer
- * callback (the same callback that is used for delivering EPOLLIN events)
- *
- * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
- * indicates that an updated tail pointer is needed.
+ * We workaround this issue in oa_buffer_check() by reading the reports in the
+ * OA buffer, starting from the tail reported by the HW until we find 2
+ * consecutive reports with their first 2 dwords of not at 0. Those dwords are
+ * also set to 0 once read and the whole buffer is cleared upon OA buffer
+ * initialization. The first dword is the reason for this report while the
+ * second is the timestamp, making the chances of having those 2 fields at 0
+ * fairly unlikely. A more detailed explanation is available in
+ * oa_buffer_check().
  *
  * Most of the implementation details for this workaround are in
  * oa_buffer_check_unlocked() and _append_oa_reports()
@@ -266,7 +257,6 @@
  * enabled without any periodic sampling.
  */
 #define OA_TAIL_MARGIN_NSEC	100000ULL
-#define INVALID_TAIL_PTR	0xffffffff
 
 /* frequency for checking whether the OA unit has written new reports to the
  * circular OA buffer...
@@ -457,10 +447,10 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format_size;
 	unsigned long flags;
-	unsigned int aged_idx;
-	u32 head, hw_tail, aged_tail, aging_tail;
+	u32 hw_tail;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -469,16 +459,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	 */
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
-	/* NB: The head we observe here might effectively be a little out of
-	 * date (between head and tails[aged_idx].offset if there is currently
-	 * a read() in progress.
-	 */
-	head = stream->oa_buffer.head;
-
-	aged_idx = stream->oa_buffer.aged_tail_idx;
-	aged_tail = stream->oa_buffer.tails[aged_idx].offset;
-	aging_tail = stream->oa_buffer.tails[!aged_idx].offset;
-
 	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
 
 	/* The tail pointer increases in 64 byte increments,
@@ -488,63 +468,75 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 
 	now = ktime_get_mono_fast_ns();
 
-	/* Update the aged tail
-	 *
-	 * Flip the tail pointer available for read()s once the aging tail is
-	 * old enough to trust that the corresponding data will be visible to
-	 * the CPU...
-	 *
-	 * Do this before updating the aging pointer in case we may be able to
-	 * immediately start aging a new pointer too (if new data has become
-	 * available) without needing to wait for a later hrtimer callback.
-	 */
-	if (aging_tail != INVALID_TAIL_PTR &&
-	    ((now - stream->oa_buffer.aging_timestamp) >
-	     OA_TAIL_MARGIN_NSEC)) {
-
-		aged_idx ^= 1;
-		stream->oa_buffer.aged_tail_idx = aged_idx;
+	if (hw_tail == stream->oa_buffer.aging_tail) {
+		/* If the HW tail hasn't move since the last check and the HW
+		 * tail has been aging for long enough, declare it the new
+		 * tail.
+		 */
+		if ((now - stream->oa_buffer.aging_timestamp) >
+		    OA_TAIL_MARGIN_NSEC) {
+			stream->oa_buffer.tail =
+				stream->oa_buffer.aging_tail;
+		}
+	} else {
+		u32 head, tail, landed_report_heads;
 
-		aged_tail = aging_tail;
+		/* NB: The head we observe here might effectively be a little out of
+		 * date (between head and tails[aged_idx].offset if there is currently
+		 * a read() in progress.
+		 */
+		head = stream->oa_buffer.head - gtt_offset;
 
-		/* Mark that we need a new pointer to start aging... */
-		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
-		aging_tail = INVALID_TAIL_PTR;
-	}
+		hw_tail -= gtt_offset;
+		tail = hw_tail;
 
-	/* Update the aging tail
-	 *
-	 * We throttle aging tail updates until we have a new tail that
-	 * represents >= one report more data than is already available for
-	 * reading. This ensures there will be enough data for a successful
-	 * read once this new pointer has aged and ensures we will give the new
-	 * pointer time to age.
-	 */
-	if (aging_tail == INVALID_TAIL_PTR &&
-	    (aged_tail == INVALID_TAIL_PTR ||
-	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
-		struct i915_vma *vma = stream->oa_buffer.vma;
-		u32 gtt_offset = i915_ggtt_offset(vma);
-
-		/* Be paranoid and do a bounds check on the pointer read back
-		 * from hardware, just in case some spurious hardware condition
-		 * could put the tail out of bounds...
+		/* Walk the stream backward until we find at least 2 reports
+		 * with dword 0 & 1 not at 0. Since the circular buffer
+		 * pointers progress by increments of 64 bytes and that
+		 * reports can be up to 256 bytes long, we can't tell whether
+		 * a report has fully landed in memory before the first 2
+		 * dwords of the following report have effectively landed.
+		 *
+		 * This is assuming that the writes of the OA unit land in
+		 * memory in the order they were written to.
+		 * If not : (╯°□°)╯︵ ┻━┻
 		 */
-		if (hw_tail >= gtt_offset &&
-		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
-			stream->oa_buffer.tails[!aged_idx].offset =
-				aging_tail = hw_tail;
-			stream->oa_buffer.aging_timestamp = now;
-		} else {
-			DRM_ERROR("Ignoring spurious out of range OA buffer tail pointer = %u\n",
-				  hw_tail);
+		landed_report_heads = 0;
+		while (OA_TAKEN(tail, head) >= report_size) {
+			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
+			u8 *report = stream->oa_buffer.vaddr + previous_tail;
+			u32 *report32 = (void *) report;
+
+			/* Head of the report indicated by the HW tail register has
+			 * indeed landed into memory.
+			 */
+			if (report32[0] != 0 || report[1] != 0) {
+				landed_report_heads++;
+
+				if (landed_report_heads >= 2)
+					break;
+			}
+
+			tail = previous_tail;
+		}
+
+		if (abs(tail - hw_tail) >= (2 * report_size)) {
+			if (__ratelimit(&dev_priv->perf.tail_pointer_race)) {
+				DRM_NOTE("unlanded report(s) head=0x%x "
+					 "tail=0x%x hw_tail=0x%x\n",
+					 head, tail, hw_tail);
+			}
 		}
+
+		stream->oa_buffer.tail = gtt_offset + tail;
+		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
+		stream->oa_buffer.aging_timestamp = now;
 	}
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	return aged_tail == INVALID_TAIL_PTR ?
-		false : OA_TAKEN(aged_tail, head) >= report_size;
+	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
+			stream->oa_buffer.head - gtt_offset) >= report_size;
 }
 
 /**
@@ -662,7 +654,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	unsigned int aged_tail_idx;
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
@@ -673,18 +664,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	head = stream->oa_buffer.head;
-	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
-	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
+	tail = stream->oa_buffer.tail;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/*
-	 * An invalid tail pointer here means we're still waiting for the poll
-	 * hrtimer callback to give us a pointer
-	 */
-	if (tail == INVALID_TAIL_PTR)
-		return -EAGAIN;
-
 	/*
 	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
 	 * while indexing relative to oa_buf_base.
@@ -812,13 +795,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		}
 
 		/*
-		 * The above reason field sanity check is based on
-		 * the assumption that the OA buffer is initially
-		 * zeroed and we reset the field after copying so the
-		 * check is still meaningful once old reports start
-		 * being overwritten.
+		 * Clear out the first 2 dword as a mean to detect unlanded
+		 * reports.
 		 */
-		report32[0] = 0;
+		report32[0] = report32[1] = 0;
 	}
 
 	if (start_offset != *offset) {
@@ -950,7 +930,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	u32 mask = (OA_BUFFER_SIZE - 1);
 	size_t start_offset = *offset;
 	unsigned long flags;
-	unsigned int aged_tail_idx;
 	u32 head, tail;
 	u32 taken;
 	int ret = 0;
@@ -961,17 +940,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
 	head = stream->oa_buffer.head;
-	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
-	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
+	tail = stream->oa_buffer.tail;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	/* An invalid tail pointer here means we're still waiting for the poll
-	 * hrtimer callback to give us a pointer
-	 */
-	if (tail == INVALID_TAIL_PTR)
-		return -EAGAIN;
-
 	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
 	 * while indexing relative to oa_buf_base.
 	 */
@@ -1026,13 +998,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		if (ret)
 			break;
 
-		/* The above report-id field sanity check is based on
-		 * the assumption that the OA buffer is initially
-		 * zeroed and we reset the field after copying so the
-		 * check is still meaningful once old reports start
-		 * being overwritten.
+		/* Clear out the first 2 dwords as a mean to detect unlanded
+		 * reports.
 		 */
-		report32[0] = 0;
+		report32[0] = report32[1] = 0;
 	}
 
 	if (start_offset != *offset) {
@@ -1411,8 +1380,8 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
 	I915_WRITE(GEN7_OASTATUS1, gtt_offset | OABUFFER_SIZE_16M); /* tail */
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail =
+		stream->oa_buffer.tail = gtt_offset;
 
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
@@ -1468,8 +1437,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
 	I915_WRITE(GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
 
 	/* Mark that we need updated tail pointers to read from... */
-	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
-	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
+	stream->oa_buffer.aging_tail =
+		stream->oa_buffer.tail = gtt_offset;
 
 	/*
 	 * Reset state used to recognise context switches, affecting which
@@ -3672,6 +3641,11 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
 		ratelimit_set_flags(&dev_priv->perf.spurious_report_rs,
 				    RATELIMIT_MSG_ON_RELEASE);
 
+		ratelimit_state_init(&dev_priv->perf.tail_pointer_race,
+				     5 * HZ, 10);
+		ratelimit_set_flags(&dev_priv->perf.tail_pointer_race,
+				    RATELIMIT_MSG_ON_RELEASE);
+
 		dev_priv->perf.initialized = true;
 	}
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
  2019-09-13 23:06 ` [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
@ 2019-09-13 23:06 ` Umesh Nerlige Ramappa
  2019-09-15 11:24   ` Lionel Landwerlin
  2019-09-13 23:06 ` [PATCH 3/3] drm/i915/perf: Add the report format with a non-power-of-2 size Umesh Nerlige Ramappa
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-13 23:06 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Lionel G Landwerlin, intel-gfx

OA perf unit supports non-power of 2 report sizes. Enable support for
these sizes in the driver.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
 1 file changed, 21 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 50b6d154fd46..482fca3da7de 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format_size;
 	unsigned long flags;
-	u32 hw_tail;
+	u32 hw_tail, aging_tail;
 	u64 now;
 
 	/* We have to consider the (unlikely) possibility that read() errors
@@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	 */
 	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
 
-	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
+	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
+	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
 
 	/* The tail pointer increases in 64 byte increments,
 	 * not in report_size steps...
 	 */
-	hw_tail &= ~(report_size - 1);
+	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
 
 	now = ktime_get_mono_fast_ns();
 
-	if (hw_tail == stream->oa_buffer.aging_tail) {
+	if (hw_tail == aging_tail) {
 		/* If the HW tail hasn't move since the last check and the HW
 		 * tail has been aging for long enough, declare it the new
 		 * tail.
@@ -486,8 +487,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 		 * a read() in progress.
 		 */
 		head = stream->oa_buffer.head - gtt_offset;
-
-		hw_tail -= gtt_offset;
 		tail = hw_tail;
 
 		/* Walk the stream backward until we find at least 2 reports
@@ -613,7 +612,18 @@ static int append_oa_sample(struct i915_perf_stream *stream,
 	buf += sizeof(header);
 
 	if (sample_flags & SAMPLE_OA_REPORT) {
-		if (copy_to_user(buf, report, report_size))
+		u8 *oa_buf_end = stream->oa_buffer.vaddr + OA_BUFFER_SIZE;
+		int report_size_partial = oa_buf_end - report;
+
+		if (report_size_partial < report_size) {
+			if (copy_to_user(buf, report, report_size_partial))
+				return -EFAULT;
+			buf += report_size_partial;
+
+			if (copy_to_user(buf, stream->oa_buffer.vaddr,
+					report_size - report_size_partial))
+				return -EFAULT;
+		} else if (copy_to_user(buf, report, report_size))
 			return -EFAULT;
 	}
 
@@ -682,8 +692,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 	 * only be incremented by multiples of the report size (notably also
 	 * all a power of two).
 	 */
-	if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
-		      tail > OA_BUFFER_SIZE || tail % report_size,
+	if (WARN_ONCE(head > OA_BUFFER_SIZE ||
+		      tail > OA_BUFFER_SIZE,
 		      "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
 		      head, tail))
 		return -EIO;
@@ -697,20 +707,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		u32 ctx_id;
 		u32 reason;
 
-		/*
-		 * All the report sizes factor neatly into the buffer
-		 * size so we never expect to see a report split
-		 * between the beginning and end of the buffer.
-		 *
-		 * Given the initial alignment check a misalignment
-		 * here would imply a driver bug that would result
-		 * in an overrun.
-		 */
-		if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
-			DRM_ERROR("Spurious OA head ptr: non-integral report offset\n");
-			break;
-		}
-
 		/*
 		 * The reason field includes flags identifying what
 		 * triggered this specific report (mostly timer
@@ -956,8 +952,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 	 * only be incremented by multiples of the report size (notably also
 	 * all a power of two).
 	 */
-	if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
-		      tail > OA_BUFFER_SIZE || tail % report_size,
+	if (WARN_ONCE(head > OA_BUFFER_SIZE ||
+		      tail > OA_BUFFER_SIZE,
 		      "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
 		      head, tail))
 		return -EIO;
@@ -969,19 +965,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
 		u8 *report = oa_buf_base + head;
 		u32 *report32 = (void *)report;
 
-		/* All the report sizes factor neatly into the buffer
-		 * size so we never expect to see a report split
-		 * between the beginning and end of the buffer.
-		 *
-		 * Given the initial alignment check a misalignment
-		 * here would imply a driver bug that would result
-		 * in an overrun.
-		 */
-		if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
-			DRM_ERROR("Spurious OA head ptr: non-integral report offset\n");
-			break;
-		}
-
 		/* The report-ID field for periodic samples includes
 		 * some undocumented flags related to what triggered
 		 * the report and is never expected to be zero so we
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] drm/i915/perf: Add the report format with a non-power-of-2 size
  2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
  2019-09-13 23:06 ` [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
  2019-09-13 23:06 ` [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2 Umesh Nerlige Ramappa
@ 2019-09-13 23:06 ` Umesh Nerlige Ramappa
  2019-09-15 11:27   ` Lionel Landwerlin
  2019-09-13 23:35 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: Enable non-power-of-2 OA report sizes Patchwork
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-13 23:06 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, Lionel G Landwerlin, intel-gfx

Add the report format with size that is not a power of 2. This allows
use of all report formats defined in hardware.

Move the format definition to end to avoid breaking API (Lionel)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c |  2 +-
 include/uapi/drm/i915_drm.h      | 11 ++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 482fca3da7de..781fc5892493 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -312,11 +312,11 @@ static const struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
 	[I915_OA_FORMAT_A29]	    = { 1, 128 },
 	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
-	/* A29_B8_C8 Disallowed as 192 bytes doesn't factor into buffer size */
 	[I915_OA_FORMAT_B4_C8]	    = { 4, 64 },
 	[I915_OA_FORMAT_A45_B8_C8]  = { 5, 256 },
 	[I915_OA_FORMAT_B4_C8_A16]  = { 6, 128 },
 	[I915_OA_FORMAT_C4_B8]	    = { 7, 64 },
+	[I915_OA_FORMAT_A29_B8_C8]  = { 3, 192 },
 };
 
 static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = {
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 469dc512cca3..4e2d4e492b06 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1829,12 +1829,13 @@ enum drm_i915_oa_format {
 	I915_OA_FORMAT_B4_C8,	    /* HSW only */
 	I915_OA_FORMAT_A45_B8_C8,   /* HSW only */
 	I915_OA_FORMAT_B4_C8_A16,   /* HSW only */
-	I915_OA_FORMAT_C4_B8,	    /* HSW+ */
+	I915_OA_FORMAT_C4_B8,	    /* HSW only */
 
-	/* Gen8+ */
-	I915_OA_FORMAT_A12,
-	I915_OA_FORMAT_A12_B8_C8,
-	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
+	I915_OA_FORMAT_A12,			/* Gen8+ */
+	I915_OA_FORMAT_A12_B8_C8,		/* Gen8+ */
+	I915_OA_FORMAT_A32u40_A4u32_B8_C8,	/* Gen8+ */
+
+	I915_OA_FORMAT_A29_B8_C8,   /* HSW only */
 
 	I915_OA_FORMAT_MAX	    /* non-ABI */
 };
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: Enable non-power-of-2 OA report sizes
  2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2019-09-13 23:06 ` [PATCH 3/3] drm/i915/perf: Add the report format with a non-power-of-2 size Umesh Nerlige Ramappa
@ 2019-09-13 23:35 ` Patchwork
  2019-09-13 23:55 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-09-15  9:22 ` ✓ Fi.CI.IGT: " Patchwork
  5 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2019-09-13 23:35 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: Enable non-power-of-2 OA report sizes
URL   : https://patchwork.freedesktop.org/series/66697/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
debe3b5da38e drm/i915/perf: rework aging tail workaround
-:235: CHECK:SPACING: No space is necessary after a cast
#235: FILE: drivers/gpu/drm/i915/i915_perf.c:508:
+			u32 *report32 = (void *) report;

-:313: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#313: FILE: drivers/gpu/drm/i915/i915_perf.c:801:
+		report32[0] = report32[1] = 0;

-:357: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#357: FILE: drivers/gpu/drm/i915/i915_perf.c:1004:
+		report32[0] = report32[1] = 0;

total: 0 errors, 0 warnings, 3 checks, 352 lines checked
6472a0aed3d1 drm/i915/perf: Add support for report sizes that are not power of 2
-:63: CHECK:BRACES: braces {} should be used on all arms of this statement
#63: FILE: drivers/gpu/drm/i915/i915_perf.c:618:
+		if (report_size_partial < report_size) {
[...]
+		} else if (copy_to_user(buf, report, report_size))
[...]

-:69: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#69: FILE: drivers/gpu/drm/i915/i915_perf.c:624:
+			if (copy_to_user(buf, stream->oa_buffer.vaddr,
+					report_size - report_size_partial))

total: 0 errors, 0 warnings, 2 checks, 114 lines checked
757d4db65691 drm/i915/perf: Add the report format with a non-power-of-2 size

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915/perf: Enable non-power-of-2 OA report sizes
  2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2019-09-13 23:35 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: Enable non-power-of-2 OA report sizes Patchwork
@ 2019-09-13 23:55 ` Patchwork
  2019-09-15  9:22 ` ✓ Fi.CI.IGT: " Patchwork
  5 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2019-09-13 23:55 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: Enable non-power-of-2 OA report sizes
URL   : https://patchwork.freedesktop.org/series/66697/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6894 -> Patchwork_14411
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/

Known issues
------------

  Here are the changes found in Patchwork_14411 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_gem_contexts:
    - fi-cfl-8700k:       [PASS][1] -> [INCOMPLETE][2] ([fdo#111514])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/fi-cfl-8700k/igt@i915_selftest@live_gem_contexts.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/fi-cfl-8700k/igt@i915_selftest@live_gem_contexts.html

  * igt@i915_selftest@live_hangcheck:
    - fi-icl-u3:          [PASS][3] -> [DMESG-FAIL][4] ([fdo#111678])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/fi-icl-u3/igt@i915_selftest@live_hangcheck.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/fi-icl-u3/igt@i915_selftest@live_hangcheck.html

  * igt@prime_vgem@basic-gtt:
    - fi-icl-u3:          [PASS][5] -> [DMESG-WARN][6] ([fdo#107724]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/fi-icl-u3/igt@prime_vgem@basic-gtt.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/fi-icl-u3/igt@prime_vgem@basic-gtt.html

  
#### Possible fixes ####

  * igt@gem_ctx_switch@legacy-render:
    - {fi-icl-guc}:       [INCOMPLETE][7] ([fdo#107713] / [fdo#111381]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/fi-icl-guc/igt@gem_ctx_switch@legacy-render.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/fi-icl-guc/igt@gem_ctx_switch@legacy-render.html

  * igt@gem_mmap_gtt@basic-write-cpu-read-gtt:
    - fi-icl-u3:          [DMESG-WARN][9] ([fdo#107724]) -> [PASS][10] +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/fi-icl-u3/igt@gem_mmap_gtt@basic-write-cpu-read-gtt.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/fi-icl-u3/igt@gem_mmap_gtt@basic-write-cpu-read-gtt.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-icl-u2:          [FAIL][11] ([fdo#109635 ]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/fi-icl-u2/igt@kms_chamelium@hdmi-crc-fast.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/fi-icl-u2/igt@kms_chamelium@hdmi-crc-fast.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#109635 ]: https://bugs.freedesktop.org/show_bug.cgi?id=109635 
  [fdo#111381]: https://bugs.freedesktop.org/show_bug.cgi?id=111381
  [fdo#111514]: https://bugs.freedesktop.org/show_bug.cgi?id=111514
  [fdo#111678]: https://bugs.freedesktop.org/show_bug.cgi?id=111678


Participating hosts (54 -> 46)
------------------------------

  Missing    (8): fi-ilk-m540 fi-hsw-4200u fi-byt-j1900 fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6894 -> Patchwork_14411

  CI-20190529: 20190529
  CI_DRM_6894: a323fd657c577491b1660662624bac36bb964222 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5182: f7104497049e3761ac297b66fd5586849b3cfcc8 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14411: 757d4db656912fd50e02fd6148bbfd46efa3cf21 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

757d4db65691 drm/i915/perf: Add the report format with a non-power-of-2 size
6472a0aed3d1 drm/i915/perf: Add support for report sizes that are not power of 2
debe3b5da38e drm/i915/perf: rework aging tail workaround

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915/perf: Enable non-power-of-2 OA report sizes
  2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2019-09-13 23:55 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-09-15  9:22 ` Patchwork
  5 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2019-09-15  9:22 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: Enable non-power-of-2 OA report sizes
URL   : https://patchwork.freedesktop.org/series/66697/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6894_full -> Patchwork_14411_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_14411_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_switch@legacy-bsd1-heavy-queue:
    - shard-apl:          [PASS][1] -> [INCOMPLETE][2] ([fdo#103927]) +2 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-apl8/igt@gem_ctx_switch@legacy-bsd1-heavy-queue.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-apl4/igt@gem_ctx_switch@legacy-bsd1-heavy-queue.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [PASS][3] -> [SKIP][4] ([fdo#110854])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb1/igt@gem_exec_balancer@smoke.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb5/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_schedule@preempt-other-chain-bsd:
    - shard-iclb:         [PASS][5] -> [SKIP][6] ([fdo#111325]) +6 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb3/igt@gem_exec_schedule@preempt-other-chain-bsd.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb2/igt@gem_exec_schedule@preempt-other-chain-bsd.html

  * igt@gem_exec_schedule@promotion-bsd1:
    - shard-iclb:         [PASS][7] -> [SKIP][8] ([fdo#109276]) +22 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb1/igt@gem_exec_schedule@promotion-bsd1.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb5/igt@gem_exec_schedule@promotion-bsd1.html

  * igt@gem_tiled_swapping@non-threaded:
    - shard-apl:          [PASS][9] -> [DMESG-WARN][10] ([fdo#108686])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-apl1/igt@gem_tiled_swapping@non-threaded.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-apl1/igt@gem_tiled_swapping@non-threaded.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-hsw:          [PASS][11] -> [INCOMPLETE][12] ([fdo#103540])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-hsw5/igt@kms_flip@flip-vs-suspend-interruptible.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-hsw5/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-render:
    - shard-iclb:         [PASS][13] -> [FAIL][14] ([fdo#103167]) +5 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb3/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-render.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb2/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-render.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
    - shard-apl:          [PASS][15] -> [DMESG-WARN][16] ([fdo#108566]) +3 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-apl4/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-apl5/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

  * igt@kms_plane_lowres@pipe-a-tiling-x:
    - shard-iclb:         [PASS][17] -> [FAIL][18] ([fdo#103166])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb1/igt@kms_plane_lowres@pipe-a-tiling-x.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb6/igt@kms_plane_lowres@pipe-a-tiling-x.html

  * igt@kms_psr@psr2_sprite_blt:
    - shard-iclb:         [PASS][19] -> [SKIP][20] ([fdo#109441])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb2/igt@kms_psr@psr2_sprite_blt.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb6/igt@kms_psr@psr2_sprite_blt.html

  * igt@tools_test@tools_test:
    - shard-kbl:          [PASS][21] -> [SKIP][22] ([fdo#109271])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-kbl3/igt@tools_test@tools_test.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-kbl1/igt@tools_test@tools_test.html

  
#### Possible fixes ####

  * igt@gem_ctx_isolation@rcs0-s3:
    - shard-apl:          [DMESG-WARN][23] ([fdo#108566]) -> [PASS][24] +5 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-apl7/igt@gem_ctx_isolation@rcs0-s3.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-apl6/igt@gem_ctx_isolation@rcs0-s3.html

  * igt@gem_eio@reset-stress:
    - shard-iclb:         [FAIL][25] ([fdo#109661]) -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb2/igt@gem_eio@reset-stress.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb6/igt@gem_eio@reset-stress.html

  * igt@gem_exec_schedule@pi-ringfull-bsd:
    - shard-iclb:         [SKIP][27] ([fdo#111325]) -> [PASS][28] +1 similar issue
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb4/igt@gem_exec_schedule@pi-ringfull-bsd.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb6/igt@gem_exec_schedule@pi-ringfull-bsd.html

  * igt@gem_exec_schedule@preempt-queue-bsd1:
    - shard-iclb:         [SKIP][29] ([fdo#109276]) -> [PASS][30] +18 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb6/igt@gem_exec_schedule@preempt-queue-bsd1.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb1/igt@gem_exec_schedule@preempt-queue-bsd1.html

  * igt@kms_cursor_crc@pipe-c-cursor-128x42-offscreen:
    - shard-hsw:          [INCOMPLETE][31] ([fdo#103540]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-hsw8/igt@kms_cursor_crc@pipe-c-cursor-128x42-offscreen.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-hsw5/igt@kms_cursor_crc@pipe-c-cursor-128x42-offscreen.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-kbl:          [DMESG-WARN][33] ([fdo#108566]) -> [PASS][34] +1 similar issue
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt:
    - shard-iclb:         [FAIL][35] ([fdo#103167]) -> [PASS][36] +2 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [FAIL][37] ([fdo#108145] / [fdo#110403]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-skl5/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-skl5/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_psr@psr2_suspend:
    - shard-iclb:         [SKIP][39] ([fdo#109441]) -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb3/igt@kms_psr@psr2_suspend.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb2/igt@kms_psr@psr2_suspend.html

  * igt@kms_rotation_crc@cursor-rotation-180:
    - shard-iclb:         [INCOMPLETE][41] ([fdo#107713] / [fdo#110026]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb7/igt@kms_rotation_crc@cursor-rotation-180.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb8/igt@kms_rotation_crc@cursor-rotation-180.html

  
#### Warnings ####

  * igt@gem_ctx_isolation@vcs1-nonpriv:
    - shard-iclb:         [FAIL][43] ([fdo#111329]) -> [SKIP][44] ([fdo#109276])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb4/igt@gem_ctx_isolation@vcs1-nonpriv.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb7/igt@gem_ctx_isolation@vcs1-nonpriv.html

  * igt@gem_mocs_settings@mocs-reset-bsd2:
    - shard-iclb:         [FAIL][45] ([fdo#111330]) -> [SKIP][46] ([fdo#109276]) +1 similar issue
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6894/shard-iclb1/igt@gem_mocs_settings@mocs-reset-bsd2.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/shard-iclb6/igt@gem_mocs_settings@mocs-reset-bsd2.html

  
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108686]: https://bugs.freedesktop.org/show_bug.cgi?id=108686
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109661]: https://bugs.freedesktop.org/show_bug.cgi?id=109661
  [fdo#110026]: https://bugs.freedesktop.org/show_bug.cgi?id=110026
  [fdo#110403]: https://bugs.freedesktop.org/show_bug.cgi?id=110403
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#111325]: https://bugs.freedesktop.org/show_bug.cgi?id=111325
  [fdo#111329]: https://bugs.freedesktop.org/show_bug.cgi?id=111329
  [fdo#111330]: https://bugs.freedesktop.org/show_bug.cgi?id=111330


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6894 -> Patchwork_14411

  CI-20190529: 20190529
  CI_DRM_6894: a323fd657c577491b1660662624bac36bb964222 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5182: f7104497049e3761ac297b66fd5586849b3cfcc8 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14411: 757d4db656912fd50e02fd6148bbfd46efa3cf21 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14411/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] drm/i915/perf: rework aging tail workaround
  2019-09-13 23:06 ` [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
@ 2019-09-15 11:15   ` Lionel Landwerlin
  0 siblings, 0 replies; 17+ messages in thread
From: Lionel Landwerlin @ 2019-09-15 11:15 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> Right now the workaround against the OA tail pointer race condition
> requires at least twice the internal kernel polling timer to make any
> data available.
>
> This changes introduce checks on the OA data written into the circular
> buffer to make as much data as possible available on the first
> iteration of the polling timer.
>
> v2: Use OA_TAKEN macro without the gtt_offset (Lionel)
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  |  30 ++---
>   drivers/gpu/drm/i915/i915_perf.c | 200 ++++++++++++++-----------------
>   2 files changed, 103 insertions(+), 127 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bf600888b3f1..876aeaf3568e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1180,21 +1180,11 @@ struct i915_perf_stream {
>   		spinlock_t ptr_lock;
>   
>   		/**
> -		 * One 'aging' tail pointer and one 'aged' tail pointer ready to
> -		 * used for reading.
> -		 *
> -		 * Initial values of 0xffffffff are invalid and imply that an
> -		 * update is required (and should be ignored by an attempted
> -		 * read)
> -		 */
> -		struct {
> -			u32 offset;
> -		} tails[2];
> -
> -		/**
> -		 * Index for the aged tail ready to read() data up to.
> +		 * The last HW tail reported by HW. The data
> +		 * might not have made it to memory yet
> +		 * though.
>   		 */
> -		unsigned int aged_tail_idx;
> +		u32 aging_tail;
>   
>   		/**
>   		 * A monotonic timestamp for when the current aging tail pointer
> @@ -1210,6 +1200,12 @@ struct i915_perf_stream {
>   		 * OA buffer data to userspace.
>   		 */
>   		u32 head;
> +
> +		/**
> +		 * The last verified tail that can be read
> +		 * by user space
> +		 */
> +		u32 tail;
>   	} oa_buffer;
>   };
>   
> @@ -1693,6 +1689,12 @@ struct drm_i915_private {
>   		 */
>   		struct ratelimit_state spurious_report_rs;
>   
> +		/**
> +		 * For rate limiting any notifications of tail pointer
> +		 * race.
> +		 */
> +		struct ratelimit_state tail_pointer_race;
> +
>   		struct i915_oa_config test_config;
>   
>   		u32 gen7_latched_oastatus1;
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index c1b764233761..50b6d154fd46 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -237,23 +237,14 @@
>    * for this earlier, as part of the oa_buffer_check to avoid lots of redundant
>    * read() attempts.
>    *
> - * In effect we define a tail pointer for reading that lags the real tail
> - * pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough
> - * time for the corresponding reports to become visible to the CPU.
> - *
> - * To manage this we actually track two tail pointers:
> - *  1) An 'aging' tail with an associated timestamp that is tracked until we
> - *     can trust the corresponding data is visible to the CPU; at which point
> - *     it is considered 'aged'.
> - *  2) An 'aged' tail that can be used for read()ing.
> - *
> - * The two separate pointers let us decouple read()s from tail pointer aging.
> - *
> - * The tail pointers are checked and updated at a limited rate within a hrtimer
> - * callback (the same callback that is used for delivering EPOLLIN events)
> - *
> - * Initially the tails are marked invalid with %INVALID_TAIL_PTR which
> - * indicates that an updated tail pointer is needed.
> + * We workaround this issue in oa_buffer_check() by reading the reports in the
> + * OA buffer, starting from the tail reported by the HW until we find 2
> + * consecutive reports with their first 2 dwords of not at 0. Those dwords are
> + * also set to 0 once read and the whole buffer is cleared upon OA buffer
> + * initialization. The first dword is the reason for this report while the
> + * second is the timestamp, making the chances of having those 2 fields at 0
> + * fairly unlikely. A more detailed explanation is available in
> + * oa_buffer_check().
>    *
>    * Most of the implementation details for this workaround are in
>    * oa_buffer_check_unlocked() and _append_oa_reports()
> @@ -266,7 +257,6 @@
>    * enabled without any periodic sampling.
>    */
>   #define OA_TAIL_MARGIN_NSEC	100000ULL
> -#define INVALID_TAIL_PTR	0xffffffff
>   
>   /* frequency for checking whether the OA unit has written new reports to the
>    * circular OA buffer...
> @@ -457,10 +447,10 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
>   static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>   	int report_size = stream->oa_buffer.format_size;
>   	unsigned long flags;
> -	unsigned int aged_idx;
> -	u32 head, hw_tail, aged_tail, aging_tail;
> +	u32 hw_tail;
>   	u64 now;
>   
>   	/* We have to consider the (unlikely) possibility that read() errors
> @@ -469,16 +459,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	 */
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
> -	/* NB: The head we observe here might effectively be a little out of
> -	 * date (between head and tails[aged_idx].offset if there is currently
> -	 * a read() in progress.
> -	 */
> -	head = stream->oa_buffer.head;
> -
> -	aged_idx = stream->oa_buffer.aged_tail_idx;
> -	aged_tail = stream->oa_buffer.tails[aged_idx].offset;
> -	aging_tail = stream->oa_buffer.tails[!aged_idx].offset;
> -
>   	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
>   
>   	/* The tail pointer increases in 64 byte increments,
> @@ -488,63 +468,75 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   
>   	now = ktime_get_mono_fast_ns();
>   
> -	/* Update the aged tail
> -	 *
> -	 * Flip the tail pointer available for read()s once the aging tail is
> -	 * old enough to trust that the corresponding data will be visible to
> -	 * the CPU...
> -	 *
> -	 * Do this before updating the aging pointer in case we may be able to
> -	 * immediately start aging a new pointer too (if new data has become
> -	 * available) without needing to wait for a later hrtimer callback.
> -	 */
> -	if (aging_tail != INVALID_TAIL_PTR &&
> -	    ((now - stream->oa_buffer.aging_timestamp) >
> -	     OA_TAIL_MARGIN_NSEC)) {
> -
> -		aged_idx ^= 1;
> -		stream->oa_buffer.aged_tail_idx = aged_idx;
> +	if (hw_tail == stream->oa_buffer.aging_tail) {
> +		/* If the HW tail hasn't move since the last check and the HW
> +		 * tail has been aging for long enough, declare it the new
> +		 * tail.
> +		 */
> +		if ((now - stream->oa_buffer.aging_timestamp) >
> +		    OA_TAIL_MARGIN_NSEC) {
> +			stream->oa_buffer.tail =
> +				stream->oa_buffer.aging_tail;
> +		}
> +	} else {
> +		u32 head, tail, landed_report_heads;
>   
> -		aged_tail = aging_tail;
> +		/* NB: The head we observe here might effectively be a little out of
> +		 * date (between head and tails[aged_idx].offset if there is currently
> +		 * a read() in progress.
> +		 */
> +		head = stream->oa_buffer.head - gtt_offset;
>   
> -		/* Mark that we need a new pointer to start aging... */
> -		stream->oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR;
> -		aging_tail = INVALID_TAIL_PTR;
> -	}
> +		hw_tail -= gtt_offset;
> +		tail = hw_tail;
>   
> -	/* Update the aging tail
> -	 *
> -	 * We throttle aging tail updates until we have a new tail that
> -	 * represents >= one report more data than is already available for
> -	 * reading. This ensures there will be enough data for a successful
> -	 * read once this new pointer has aged and ensures we will give the new
> -	 * pointer time to age.
> -	 */
> -	if (aging_tail == INVALID_TAIL_PTR &&
> -	    (aged_tail == INVALID_TAIL_PTR ||
> -	     OA_TAKEN(hw_tail, aged_tail) >= report_size)) {
> -		struct i915_vma *vma = stream->oa_buffer.vma;
> -		u32 gtt_offset = i915_ggtt_offset(vma);
> -
> -		/* Be paranoid and do a bounds check on the pointer read back
> -		 * from hardware, just in case some spurious hardware condition
> -		 * could put the tail out of bounds...
> +		/* Walk the stream backward until we find at least 2 reports
> +		 * with dword 0 & 1 not at 0. Since the circular buffer
> +		 * pointers progress by increments of 64 bytes and that
> +		 * reports can be up to 256 bytes long, we can't tell whether
> +		 * a report has fully landed in memory before the first 2
> +		 * dwords of the following report have effectively landed.
> +		 *
> +		 * This is assuming that the writes of the OA unit land in
> +		 * memory in the order they were written to.
> +		 * If not : (╯°□°)╯︵ ┻━┻
>   		 */
> -		if (hw_tail >= gtt_offset &&
> -		    hw_tail < (gtt_offset + OA_BUFFER_SIZE)) {
> -			stream->oa_buffer.tails[!aged_idx].offset =
> -				aging_tail = hw_tail;
> -			stream->oa_buffer.aging_timestamp = now;
> -		} else {
> -			DRM_ERROR("Ignoring spurious out of range OA buffer tail pointer = %u\n",
> -				  hw_tail);
> +		landed_report_heads = 0;
> +		while (OA_TAKEN(tail, head) >= report_size) {
> +			u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
> +			u8 *report = stream->oa_buffer.vaddr + previous_tail;
> +			u32 *report32 = (void *) report;
> +
> +			/* Head of the report indicated by the HW tail register has
> +			 * indeed landed into memory.
> +			 */
> +			if (report32[0] != 0 || report[1] != 0) {


Just noticed this is wrong, it should be : if (report32[0] != || 
report32[1] != 0) {

report is not a pointer to a dword.


-Lionel


> +				landed_report_heads++;
> +
> +				if (landed_report_heads >= 2)
> +					break;
> +			}
> +
> +			tail = previous_tail;
> +		}
> +
> +		if (abs(tail - hw_tail) >= (2 * report_size)) {
> +			if (__ratelimit(&dev_priv->perf.tail_pointer_race)) {
> +				DRM_NOTE("unlanded report(s) head=0x%x "
> +					 "tail=0x%x hw_tail=0x%x\n",
> +					 head, tail, hw_tail);
> +			}
>   		}
> +
> +		stream->oa_buffer.tail = gtt_offset + tail;
> +		stream->oa_buffer.aging_tail = gtt_offset + hw_tail;
> +		stream->oa_buffer.aging_timestamp = now;
>   	}
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> -	return aged_tail == INVALID_TAIL_PTR ?
> -		false : OA_TAKEN(aged_tail, head) >= report_size;
> +	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> +			stream->oa_buffer.head - gtt_offset) >= report_size;
>   }
>   
>   /**
> @@ -662,7 +654,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 mask = (OA_BUFFER_SIZE - 1);
>   	size_t start_offset = *offset;
>   	unsigned long flags;
> -	unsigned int aged_tail_idx;
>   	u32 head, tail;
>   	u32 taken;
>   	int ret = 0;
> @@ -673,18 +664,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
>   	head = stream->oa_buffer.head;
> -	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
> -	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
> +	tail = stream->oa_buffer.tail;
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> -	/*
> -	 * An invalid tail pointer here means we're still waiting for the poll
> -	 * hrtimer callback to give us a pointer
> -	 */
> -	if (tail == INVALID_TAIL_PTR)
> -		return -EAGAIN;
> -
>   	/*
>   	 * NB: oa_buffer.head/tail include the gtt_offset which we don't want
>   	 * while indexing relative to oa_buf_base.
> @@ -812,13 +795,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		}
>   
>   		/*
> -		 * The above reason field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		 * Clear out the first 2 dword as a mean to detect unlanded
> +		 * reports.
>   		 */
> -		report32[0] = 0;
> +		report32[0] = report32[1] = 0;
>   	}
>   
>   	if (start_offset != *offset) {
> @@ -950,7 +930,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	u32 mask = (OA_BUFFER_SIZE - 1);
>   	size_t start_offset = *offset;
>   	unsigned long flags;
> -	unsigned int aged_tail_idx;
>   	u32 head, tail;
>   	u32 taken;
>   	int ret = 0;
> @@ -961,17 +940,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
>   	head = stream->oa_buffer.head;
> -	aged_tail_idx = stream->oa_buffer.aged_tail_idx;
> -	tail = stream->oa_buffer.tails[aged_tail_idx].offset;
> +	tail = stream->oa_buffer.tail;
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> -	/* An invalid tail pointer here means we're still waiting for the poll
> -	 * hrtimer callback to give us a pointer
> -	 */
> -	if (tail == INVALID_TAIL_PTR)
> -		return -EAGAIN;
> -
>   	/* NB: oa_buffer.head/tail include the gtt_offset which we don't want
>   	 * while indexing relative to oa_buf_base.
>   	 */
> @@ -1026,13 +998,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   		if (ret)
>   			break;
>   
> -		/* The above report-id field sanity check is based on
> -		 * the assumption that the OA buffer is initially
> -		 * zeroed and we reset the field after copying so the
> -		 * check is still meaningful once old reports start
> -		 * being overwritten.
> +		/* Clear out the first 2 dwords as a mean to detect unlanded
> +		 * reports.
>   		 */
> -		report32[0] = 0;
> +		report32[0] = report32[1] = 0;
>   	}
>   
>   	if (start_offset != *offset) {
> @@ -1411,8 +1380,8 @@ static void gen7_init_oa_buffer(struct i915_perf_stream *stream)
>   	I915_WRITE(GEN7_OASTATUS1, gtt_offset | OABUFFER_SIZE_16M); /* tail */
>   
>   	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail =
> +		stream->oa_buffer.tail = gtt_offset;
>   
>   	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>   
> @@ -1468,8 +1437,8 @@ static void gen8_init_oa_buffer(struct i915_perf_stream *stream)
>   	I915_WRITE(GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK);
>   
>   	/* Mark that we need updated tail pointers to read from... */
> -	stream->oa_buffer.tails[0].offset = INVALID_TAIL_PTR;
> -	stream->oa_buffer.tails[1].offset = INVALID_TAIL_PTR;
> +	stream->oa_buffer.aging_tail =
> +		stream->oa_buffer.tail = gtt_offset;
>   
>   	/*
>   	 * Reset state used to recognise context switches, affecting which
> @@ -3672,6 +3641,11 @@ void i915_perf_init(struct drm_i915_private *dev_priv)
>   		ratelimit_set_flags(&dev_priv->perf.spurious_report_rs,
>   				    RATELIMIT_MSG_ON_RELEASE);
>   
> +		ratelimit_state_init(&dev_priv->perf.tail_pointer_race,
> +				     5 * HZ, 10);
> +		ratelimit_set_flags(&dev_priv->perf.tail_pointer_race,
> +				    RATELIMIT_MSG_ON_RELEASE);
> +
>   		dev_priv->perf.initialized = true;
>   	}
>   }


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-13 23:06 ` [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2 Umesh Nerlige Ramappa
@ 2019-09-15 11:24   ` Lionel Landwerlin
  2019-09-16  6:10     ` Ashutosh Dixit
  2019-09-16 19:17     ` Umesh Nerlige Ramappa
  0 siblings, 2 replies; 17+ messages in thread
From: Lionel Landwerlin @ 2019-09-15 11:24 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
> OA perf unit supports non-power of 2 report sizes. Enable support for
> these sizes in the driver.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
>   1 file changed, 21 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 50b6d154fd46..482fca3da7de 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>   	int report_size = stream->oa_buffer.format_size;
>   	unsigned long flags;
> -	u32 hw_tail;
> +	u32 hw_tail, aging_tail;
>   	u64 now;
>   
>   	/* We have to consider the (unlikely) possibility that read() errors
> @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	 */
>   	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>   
> -	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
> +	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
> +	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
>   
>   	/* The tail pointer increases in 64 byte increments,
>   	 * not in report_size steps...
>   	 */
> -	hw_tail &= ~(report_size - 1);
> +	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));


I'm struggling to parse this line above and I'm not 100% sure it's correct.

Could add a comment to explain what is going on?


Thanks,


-Lionel


>   
>   	now = ktime_get_mono_fast_ns();
>   
> -	if (hw_tail == stream->oa_buffer.aging_tail) {
> +	if (hw_tail == aging_tail) {
>   		/* If the HW tail hasn't move since the last check and the HW
>   		 * tail has been aging for long enough, declare it the new
>   		 * tail.
> @@ -486,8 +487,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   		 * a read() in progress.
>   		 */
>   		head = stream->oa_buffer.head - gtt_offset;
> -
> -		hw_tail -= gtt_offset;
>   		tail = hw_tail;
>   
>   		/* Walk the stream backward until we find at least 2 reports
> @@ -613,7 +612,18 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>   	buf += sizeof(header);
>   
>   	if (sample_flags & SAMPLE_OA_REPORT) {
> -		if (copy_to_user(buf, report, report_size))
> +		u8 *oa_buf_end = stream->oa_buffer.vaddr + OA_BUFFER_SIZE;
> +		int report_size_partial = oa_buf_end - report;
> +
> +		if (report_size_partial < report_size) {
> +			if (copy_to_user(buf, report, report_size_partial))
> +				return -EFAULT;
> +			buf += report_size_partial;
> +
> +			if (copy_to_user(buf, stream->oa_buffer.vaddr,
> +					report_size - report_size_partial))
> +				return -EFAULT;
> +		} else if (copy_to_user(buf, report, report_size))
>   			return -EFAULT;
>   	}
>   
> @@ -682,8 +692,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   	 * only be incremented by multiples of the report size (notably also
>   	 * all a power of two).
>   	 */
> -	if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
> -		      tail > OA_BUFFER_SIZE || tail % report_size,
> +	if (WARN_ONCE(head > OA_BUFFER_SIZE ||
> +		      tail > OA_BUFFER_SIZE,
>   		      "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>   		      head, tail))
>   		return -EIO;
> @@ -697,20 +707,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>   		u32 ctx_id;
>   		u32 reason;
>   
> -		/*
> -		 * All the report sizes factor neatly into the buffer
> -		 * size so we never expect to see a report split
> -		 * between the beginning and end of the buffer.
> -		 *
> -		 * Given the initial alignment check a misalignment
> -		 * here would imply a driver bug that would result
> -		 * in an overrun.
> -		 */
> -		if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
> -			DRM_ERROR("Spurious OA head ptr: non-integral report offset\n");
> -			break;
> -		}
> -
>   		/*
>   		 * The reason field includes flags identifying what
>   		 * triggered this specific report (mostly timer
> @@ -956,8 +952,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   	 * only be incremented by multiples of the report size (notably also
>   	 * all a power of two).
>   	 */
> -	if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
> -		      tail > OA_BUFFER_SIZE || tail % report_size,
> +	if (WARN_ONCE(head > OA_BUFFER_SIZE ||
> +		      tail > OA_BUFFER_SIZE,
>   		      "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>   		      head, tail))
>   		return -EIO;
> @@ -969,19 +965,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>   		u8 *report = oa_buf_base + head;
>   		u32 *report32 = (void *)report;
>   
> -		/* All the report sizes factor neatly into the buffer
> -		 * size so we never expect to see a report split
> -		 * between the beginning and end of the buffer.
> -		 *
> -		 * Given the initial alignment check a misalignment
> -		 * here would imply a driver bug that would result
> -		 * in an overrun.
> -		 */
> -		if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
> -			DRM_ERROR("Spurious OA head ptr: non-integral report offset\n");
> -			break;
> -		}
> -
>   		/* The report-ID field for periodic samples includes
>   		 * some undocumented flags related to what triggered
>   		 * the report and is never expected to be zero so we


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] drm/i915/perf: Add the report format with a non-power-of-2 size
  2019-09-13 23:06 ` [PATCH 3/3] drm/i915/perf: Add the report format with a non-power-of-2 size Umesh Nerlige Ramappa
@ 2019-09-15 11:27   ` Lionel Landwerlin
  0 siblings, 0 replies; 17+ messages in thread
From: Lionel Landwerlin @ 2019-09-15 11:27 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx

On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
> Add the report format with size that is not a power of 2. This allows
> use of all report formats defined in hardware.
>
> Move the format definition to end to avoid breaking API (Lionel)
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>


I think that like any change to the uAPI should be versioned so an 
application can know what's available without trying to open the stream 
to test whether a given parameter is available.

I would pull the patch in the link before this one : 
https://patchwork.freedesktop.org/patch/329723/?series=66418&rev=1

And bump the version number to 2 in this patch.


Cheers,


-Lionel


> ---
>   drivers/gpu/drm/i915/i915_perf.c |  2 +-
>   include/uapi/drm/i915_drm.h      | 11 ++++++-----
>   2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 482fca3da7de..781fc5892493 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -312,11 +312,11 @@ static const struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
>   	[I915_OA_FORMAT_A13]	    = { 0, 64 },
>   	[I915_OA_FORMAT_A29]	    = { 1, 128 },
>   	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
> -	/* A29_B8_C8 Disallowed as 192 bytes doesn't factor into buffer size */
>   	[I915_OA_FORMAT_B4_C8]	    = { 4, 64 },
>   	[I915_OA_FORMAT_A45_B8_C8]  = { 5, 256 },
>   	[I915_OA_FORMAT_B4_C8_A16]  = { 6, 128 },
>   	[I915_OA_FORMAT_C4_B8]	    = { 7, 64 },
> +	[I915_OA_FORMAT_A29_B8_C8]  = { 3, 192 },
>   };
>   
>   static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = {
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 469dc512cca3..4e2d4e492b06 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1829,12 +1829,13 @@ enum drm_i915_oa_format {
>   	I915_OA_FORMAT_B4_C8,	    /* HSW only */
>   	I915_OA_FORMAT_A45_B8_C8,   /* HSW only */
>   	I915_OA_FORMAT_B4_C8_A16,   /* HSW only */
> -	I915_OA_FORMAT_C4_B8,	    /* HSW+ */
> +	I915_OA_FORMAT_C4_B8,	    /* HSW only */
>   
> -	/* Gen8+ */
> -	I915_OA_FORMAT_A12,
> -	I915_OA_FORMAT_A12_B8_C8,
> -	I915_OA_FORMAT_A32u40_A4u32_B8_C8,
> +	I915_OA_FORMAT_A12,			/* Gen8+ */
> +	I915_OA_FORMAT_A12_B8_C8,		/* Gen8+ */
> +	I915_OA_FORMAT_A32u40_A4u32_B8_C8,	/* Gen8+ */
> +
> +	I915_OA_FORMAT_A29_B8_C8,   /* HSW only */
>   
>   	I915_OA_FORMAT_MAX	    /* non-ABI */
>   };


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-15 11:24   ` Lionel Landwerlin
@ 2019-09-16  6:10     ` Ashutosh Dixit
  2019-09-16 19:17     ` Umesh Nerlige Ramappa
  1 sibling, 0 replies; 17+ messages in thread
From: Ashutosh Dixit @ 2019-09-16  6:10 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Sun, 15 Sep 2019 04:24:41 -0700, Lionel Landwerlin wrote:
>
> On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
> > OA perf unit supports non-power of 2 report sizes. Enable support for
> > these sizes in the driver.
> >
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
> >   1 file changed, 21 insertions(+), 38 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 50b6d154fd46..482fca3da7de 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> >	int report_size = stream->oa_buffer.format_size;
> >	unsigned long flags;
> > -	u32 hw_tail;
> > +	u32 hw_tail, aging_tail;
> >	u64 now;
> >		/* We have to consider the (unlikely) possibility that read()
> > errors
> > @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >	 */
> >	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >   -	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
> > +	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
> > +	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
> >		/* The tail pointer increases in 64 byte increments,
> >	 * not in report_size steps...
> >	 */
> > -	hw_tail &= ~(report_size - 1);
> > +	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
>
>
> I'm struggling to parse this line above and I'm not 100% sure it's correct.
>
> Could add a comment to explain what is going on?

Also for efficiency perhaps the modulo (%) should be replaced by a
increment, compare and wraparound?

Thanks!
--
Ashutosh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-15 11:24   ` Lionel Landwerlin
  2019-09-16  6:10     ` Ashutosh Dixit
@ 2019-09-16 19:17     ` Umesh Nerlige Ramappa
  2019-09-17  4:11       ` Ashutosh Dixit
  2019-09-18  8:21       ` Lionel Landwerlin
  1 sibling, 2 replies; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-16 19:17 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
>On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
>>OA perf unit supports non-power of 2 report sizes. Enable support for
>>these sizes in the driver.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>---
>>  drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
>>  1 file changed, 21 insertions(+), 38 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 50b6d154fd46..482fca3da7de 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>>  	int report_size = stream->oa_buffer.format_size;
>>  	unsigned long flags;
>>-	u32 hw_tail;
>>+	u32 hw_tail, aging_tail;
>>  	u64 now;
>>  	/* We have to consider the (unlikely) possibility that read() errors
>>@@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  	 */
>>  	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>-	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
>>+	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
>>+	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
>>  	/* The tail pointer increases in 64 byte increments,
>>  	 * not in report_size steps...
>>  	 */
>>-	hw_tail &= ~(report_size - 1);
>>+	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
>
>
>I'm struggling to parse this line above and I'm not 100% sure it's correct.
>
>Could add a comment to explain what is going on?

The aging tail is always pointing to the boundary of a report whereas
the hw_tail is advancing in 64 byte increments.
 
The innermost OA_TAKEN is returning the number of bytes between the
hw_tail and the aging_tail. The modulo is getting the size of the
partial report (if any).

The outermost OA_TAKEN is subtracting the size of partial report from
the hw_tail to get a hw_tail that points to the boundary of the last
full report.

The value of hw_tail would be the same as from the deleted line of code
above this line.

Thanks,
Umesh

>
>
>Thanks,
>
>
>-Lionel
>
>
>>  	now = ktime_get_mono_fast_ns();
>>-	if (hw_tail == stream->oa_buffer.aging_tail) {
>>+	if (hw_tail == aging_tail) {
>>  		/* If the HW tail hasn't move since the last check and the HW
>>  		 * tail has been aging for long enough, declare it the new
>>  		 * tail.
>>@@ -486,8 +487,6 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  		 * a read() in progress.
>>  		 */
>>  		head = stream->oa_buffer.head - gtt_offset;
>>-
>>-		hw_tail -= gtt_offset;
>>  		tail = hw_tail;
>>  		/* Walk the stream backward until we find at least 2 reports
>>@@ -613,7 +612,18 @@ static int append_oa_sample(struct i915_perf_stream *stream,
>>  	buf += sizeof(header);
>>  	if (sample_flags & SAMPLE_OA_REPORT) {
>>-		if (copy_to_user(buf, report, report_size))
>>+		u8 *oa_buf_end = stream->oa_buffer.vaddr + OA_BUFFER_SIZE;
>>+		int report_size_partial = oa_buf_end - report;
>>+
>>+		if (report_size_partial < report_size) {
>>+			if (copy_to_user(buf, report, report_size_partial))
>>+				return -EFAULT;
>>+			buf += report_size_partial;
>>+
>>+			if (copy_to_user(buf, stream->oa_buffer.vaddr,
>>+					report_size - report_size_partial))
>>+				return -EFAULT;
>>+		} else if (copy_to_user(buf, report, report_size))
>>  			return -EFAULT;
>>  	}
>>@@ -682,8 +692,8 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>>  	 * only be incremented by multiples of the report size (notably also
>>  	 * all a power of two).
>>  	 */
>>-	if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
>>-		      tail > OA_BUFFER_SIZE || tail % report_size,
>>+	if (WARN_ONCE(head > OA_BUFFER_SIZE ||
>>+		      tail > OA_BUFFER_SIZE,
>>  		      "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>>  		      head, tail))
>>  		return -EIO;
>>@@ -697,20 +707,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>>  		u32 ctx_id;
>>  		u32 reason;
>>-		/*
>>-		 * All the report sizes factor neatly into the buffer
>>-		 * size so we never expect to see a report split
>>-		 * between the beginning and end of the buffer.
>>-		 *
>>-		 * Given the initial alignment check a misalignment
>>-		 * here would imply a driver bug that would result
>>-		 * in an overrun.
>>-		 */
>>-		if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
>>-			DRM_ERROR("Spurious OA head ptr: non-integral report offset\n");
>>-			break;
>>-		}
>>-
>>  		/*
>>  		 * The reason field includes flags identifying what
>>  		 * triggered this specific report (mostly timer
>>@@ -956,8 +952,8 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>>  	 * only be incremented by multiples of the report size (notably also
>>  	 * all a power of two).
>>  	 */
>>-	if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
>>-		      tail > OA_BUFFER_SIZE || tail % report_size,
>>+	if (WARN_ONCE(head > OA_BUFFER_SIZE ||
>>+		      tail > OA_BUFFER_SIZE,
>>  		      "Inconsistent OA buffer pointers: head = %u, tail = %u\n",
>>  		      head, tail))
>>  		return -EIO;
>>@@ -969,19 +965,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream,
>>  		u8 *report = oa_buf_base + head;
>>  		u32 *report32 = (void *)report;
>>-		/* All the report sizes factor neatly into the buffer
>>-		 * size so we never expect to see a report split
>>-		 * between the beginning and end of the buffer.
>>-		 *
>>-		 * Given the initial alignment check a misalignment
>>-		 * here would imply a driver bug that would result
>>-		 * in an overrun.
>>-		 */
>>-		if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
>>-			DRM_ERROR("Spurious OA head ptr: non-integral report offset\n");
>>-			break;
>>-		}
>>-
>>  		/* The report-ID field for periodic samples includes
>>  		 * some undocumented flags related to what triggered
>>  		 * the report and is never expected to be zero so we
>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-16 19:17     ` Umesh Nerlige Ramappa
@ 2019-09-17  4:11       ` Ashutosh Dixit
  2019-09-17 17:33         ` Umesh Nerlige Ramappa
  2019-09-18  8:21       ` Lionel Landwerlin
  1 sibling, 1 reply; 17+ messages in thread
From: Ashutosh Dixit @ 2019-09-17  4:11 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 16 Sep 2019 12:17:54 -0700, Umesh Nerlige Ramappa wrote:
>
> On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
> > On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
> >> OA perf unit supports non-power of 2 report sizes. Enable support for
> >> these sizes in the driver.
> >>
> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
> >>  1 file changed, 21 insertions(+), 38 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> index 50b6d154fd46..482fca3da7de 100644
> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >>	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> >>	int report_size = stream->oa_buffer.format_size;
> >>	unsigned long flags;
> >> -	u32 hw_tail;
> >> +	u32 hw_tail, aging_tail;
> >>	u64 now;
> >>	/* We have to consider the (unlikely) possibility that read() errors
> >> @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >>	 */
> >>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >> -	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
> >> +	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
> >> +	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
> >>	/* The tail pointer increases in 64 byte increments,
> >>	 * not in report_size steps...
> >>	 */
> >> -	hw_tail &= ~(report_size - 1);
> >> +	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
> >
> >
> > I'm struggling to parse this line above and I'm not 100% sure it's correct.
> >
> > Could add a comment to explain what is going on?
>
> The aging tail is always pointing to the boundary of a report whereas
> the hw_tail is advancing in 64 byte increments.
>
> The innermost OA_TAKEN is returning the number of bytes between the
> hw_tail and the aging_tail. The modulo is getting the size of the
> partial report (if any).
>
> The outermost OA_TAKEN is subtracting the size of partial report from
> the hw_tail to get a hw_tail that points to the boundary of the last
> full report.
>
> The value of hw_tail would be the same as from the deleted line of code
> above this line.

Looks like aging_tail should not be needed (it is complicating the
expression and is not there in the original expression). All we need is a
"circular modulus". For example would the following work?

    if (hw_tail < report_size)
        hw_tail += OA_BUFFER_SIZE;
    hw_tail = rounddown(hw_tail, report_size);

Another way (if we want to avoid the division) would be to maintain a
cached copy of hw_tail in SW which is successively (and circularly)
incremented by report_size till it catches up with hw_tail read from
HW. But probably the first method above is simpler.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-17  4:11       ` Ashutosh Dixit
@ 2019-09-17 17:33         ` Umesh Nerlige Ramappa
  2019-09-18  5:38           ` Ashutosh Dixit
  0 siblings, 1 reply; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-17 17:33 UTC (permalink / raw)
  To: Ashutosh Dixit; +Cc: intel-gfx

On Mon, Sep 16, 2019 at 09:11:58PM -0700, Ashutosh Dixit wrote:
>On Mon, 16 Sep 2019 12:17:54 -0700, Umesh Nerlige Ramappa wrote:
>>
>> On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
>> > On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
>> >> OA perf unit supports non-power of 2 report sizes. Enable support for
>> >> these sizes in the driver.
>> >>
>> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> >> ---
>> >>  drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
>> >>  1 file changed, 21 insertions(+), 38 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> >> index 50b6d154fd46..482fca3da7de 100644
>> >> --- a/drivers/gpu/drm/i915/i915_perf.c
>> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> >> @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>> >>	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>> >>	int report_size = stream->oa_buffer.format_size;
>> >>	unsigned long flags;
>> >> -	u32 hw_tail;
>> >> +	u32 hw_tail, aging_tail;
>> >>	u64 now;
>> >>	/* We have to consider the (unlikely) possibility that read() errors
>> >> @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>> >>	 */
>> >>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>> >> -	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
>> >> +	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
>> >> +	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
>> >>	/* The tail pointer increases in 64 byte increments,
>> >>	 * not in report_size steps...
>> >>	 */
>> >> -	hw_tail &= ~(report_size - 1);
>> >> +	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
>> >
>> >
>> > I'm struggling to parse this line above and I'm not 100% sure it's correct.
>> >
>> > Could add a comment to explain what is going on?
>>
>> The aging tail is always pointing to the boundary of a report whereas
>> the hw_tail is advancing in 64 byte increments.
>>
>> The innermost OA_TAKEN is returning the number of bytes between the
>> hw_tail and the aging_tail. The modulo is getting the size of the
>> partial report (if any).
>>
>> The outermost OA_TAKEN is subtracting the size of partial report from
>> the hw_tail to get a hw_tail that points to the boundary of the last
>> full report.
>>
>> The value of hw_tail would be the same as from the deleted line of code
>> above this line.
>
>Looks like aging_tail should not be needed (it is complicating the
>expression and is not there in the original expression). All we need is a
>"circular modulus". For example would the following work?

original expression assumes all report sizes are powers of 2 and hence 
does not need a reference (like aging_tail) to rounddown the hw_tail.

>
>    if (hw_tail < report_size)
>        hw_tail += OA_BUFFER_SIZE;

Assuming that this condition is detecting a report split across the OA 
buffer boundary, the above check will not catch the split in cases where 
there are multiple reports generated before we read the hw_tail.

>    hw_tail = rounddown(hw_tail, report_size);
>
>Another way (if we want to avoid the division) would be to maintain a
>cached copy of hw_tail in SW which is successively (and circularly)
>incremented by report_size till it catches up with hw_tail read from
>HW. But probably the first method above is simpler.

aging_tail is a cached copy of the hw_tail that advances only in report 
size increments.

Thanks,
Umesh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-17 17:33         ` Umesh Nerlige Ramappa
@ 2019-09-18  5:38           ` Ashutosh Dixit
  0 siblings, 0 replies; 17+ messages in thread
From: Ashutosh Dixit @ 2019-09-18  5:38 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 17 Sep 2019 10:33:51 -0700, Umesh Nerlige Ramappa wrote:
>
> On Mon, Sep 16, 2019 at 09:11:58PM -0700, Ashutosh Dixit wrote:
> > On Mon, 16 Sep 2019 12:17:54 -0700, Umesh Nerlige Ramappa wrote:
> >>
> >> On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
> >> > On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
> >> >> OA perf unit supports non-power of 2 report sizes. Enable support for
> >> >> these sizes in the driver.
> >> >>
> >> >> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> >> ---
> >> >>  drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
> >> >>  1 file changed, 21 insertions(+), 38 deletions(-)
> >> >>
> >> >> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> >> index 50b6d154fd46..482fca3da7de 100644
> >> >> --- a/drivers/gpu/drm/i915/i915_perf.c
> >> >> +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> >> @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >> >>	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> >> >>	int report_size = stream->oa_buffer.format_size;
> >> >>	unsigned long flags;
> >> >> -	u32 hw_tail;
> >> >> +	u32 hw_tail, aging_tail;
> >> >>	u64 now;
> >> >>	/* We have to consider the (unlikely) possibility that read() errors
> >> >> @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >> >>	 */
> >> >>	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >> >> -	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
> >> >> +	hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
> >> >> +	aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
> >> >>	/* The tail pointer increases in 64 byte increments,
> >> >>	 * not in report_size steps...
> >> >>	 */
> >> >> -	hw_tail &= ~(report_size - 1);
> >> >> +	hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % report_size));
> >> >
> >> >
> >> > I'm struggling to parse this line above and I'm not 100% sure it's correct.
> >> >
> >> > Could add a comment to explain what is going on?
> >>
> >> The aging tail is always pointing to the boundary of a report whereas
> >> the hw_tail is advancing in 64 byte increments.
> >>
> >> The innermost OA_TAKEN is returning the number of bytes between the
> >> hw_tail and the aging_tail. The modulo is getting the size of the
> >> partial report (if any).
> >>
> >> The outermost OA_TAKEN is subtracting the size of partial report from
> >> the hw_tail to get a hw_tail that points to the boundary of the last
> >> full report.
> >>
> >> The value of hw_tail would be the same as from the deleted line of code
> >> above this line.
> >
> > Looks like aging_tail should not be needed (it is complicating the
> > expression and is not there in the original expression). All we need is a
> > "circular modulus". For example would the following work?
>
> original expression assumes all report sizes are powers of 2 and hence does
> not need a reference (like aging_tail) to rounddown the hw_tail.
>
> >
> >    if (hw_tail < report_size)
> >        hw_tail += OA_BUFFER_SIZE;
>
> Assuming that this condition is detecting a report split across the OA
> buffer boundary, the above check will not catch the split in cases where
> there are multiple reports generated before we read the hw_tail.
>
> >    hw_tail = rounddown(hw_tail, report_size);
> >
> > Another way (if we want to avoid the division) would be to maintain a
> > cached copy of hw_tail in SW which is successively (and circularly)
> > incremented by report_size till it catches up with hw_tail read from
> > HW. But probably the first method above is simpler.
>
> aging_tail is a cached copy of the hw_tail that advances only in report
> size increments.

Umesh is right, the previous aging_tail needs to be taken into
account. Basically we need to start from the previous aging_tail and
continue incrementing by report_size till we catch up with the new hw_tail
(at the previous report_size boundary, which gives the value of the new
aging_tail).
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-16 19:17     ` Umesh Nerlige Ramappa
  2019-09-17  4:11       ` Ashutosh Dixit
@ 2019-09-18  8:21       ` Lionel Landwerlin
  2019-09-18 16:59         ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 17+ messages in thread
From: Lionel Landwerlin @ 2019-09-18  8:21 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 16/09/2019 22:17, Umesh Nerlige Ramappa wrote:
> On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
>> On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
>>> OA perf unit supports non-power of 2 report sizes. Enable support for
>>> these sizes in the driver.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
>>>  1 file changed, 21 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 50b6d154fd46..482fca3da7de 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>      u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>>>      int report_size = stream->oa_buffer.format_size;
>>>      unsigned long flags;
>>> -    u32 hw_tail;
>>> +    u32 hw_tail, aging_tail;
>>>      u64 now;
>>>      /* We have to consider the (unlikely) possibility that read() 
>>> errors
>>> @@ -459,16 +459,17 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>       */
>>>      spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>> -    hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
>>> +    hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
>>> +    aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
>>>      /* The tail pointer increases in 64 byte increments,
>>>       * not in report_size steps...
>>>       */
>>> -    hw_tail &= ~(report_size - 1);
>>> +    hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) % 
>>> report_size));
>>
>>
>> I'm struggling to parse this line above and I'm not 100% sure it's 
>> correct.
>>
>> Could add a comment to explain what is going on?
>
> The aging tail is always pointing to the boundary of a report whereas
> the hw_tail is advancing in 64 byte increments.
>
> The innermost OA_TAKEN is returning the number of bytes between the
> hw_tail and the aging_tail. The modulo is getting the size of the
> partial report (if any).
>
> The outermost OA_TAKEN is subtracting the size of partial report from
> the hw_tail to get a hw_tail that points to the boundary of the last
> full report.
>
> The value of hw_tail would be the same as from the deleted line of code
> above this line.
>
> Thanks,
> Umesh


Thanks, I ran a few tests locally to convince myself it's correct :)


It's still a bit difficult to parse, probably because OA_TAKEN() wasn't 
meant for this.

Could create a helper function that does this computation, something 
like this :


static inline u32 align_hw_tail_to_report_boundary(u32 hw_tail, u32 
last_aligned_tail)

{

     /* Compute potentially partially landed report in the OA buffer */

     u32 partial_report_size = OA_TAKEN(hw_tail, last_aligned_tail) % 
report_size;

     /* Substract that partial amount off the tail. */

     return (hw_tail - partial_report_size) % OA_BUFFER_SIZE;

}


Cheers,


-Lionel


>
>>
>>
>> Thanks,
>>
>>
>> -Lionel
>>
>>
>>>      now = ktime_get_mono_fast_ns();
>>> -    if (hw_tail == stream->oa_buffer.aging_tail) {
>>> +    if (hw_tail == aging_tail) {
>>>          /* If the HW tail hasn't move since the last check and the HW
>>>           * tail has been aging for long enough, declare it the new
>>>           * tail.
>>> @@ -486,8 +487,6 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>           * a read() in progress.
>>>           */
>>>          head = stream->oa_buffer.head - gtt_offset;
>>> -
>>> -        hw_tail -= gtt_offset;
>>>          tail = hw_tail;
>>>          /* Walk the stream backward until we find at least 2 reports
>>> @@ -613,7 +612,18 @@ static int append_oa_sample(struct 
>>> i915_perf_stream *stream,
>>>      buf += sizeof(header);
>>>      if (sample_flags & SAMPLE_OA_REPORT) {
>>> -        if (copy_to_user(buf, report, report_size))
>>> +        u8 *oa_buf_end = stream->oa_buffer.vaddr + OA_BUFFER_SIZE;
>>> +        int report_size_partial = oa_buf_end - report;
>>> +
>>> +        if (report_size_partial < report_size) {
>>> +            if (copy_to_user(buf, report, report_size_partial))
>>> +                return -EFAULT;
>>> +            buf += report_size_partial;
>>> +
>>> +            if (copy_to_user(buf, stream->oa_buffer.vaddr,
>>> +                    report_size - report_size_partial))
>>> +                return -EFAULT;
>>> +        } else if (copy_to_user(buf, report, report_size))
>>>              return -EFAULT;
>>>      }
>>> @@ -682,8 +692,8 @@ static int gen8_append_oa_reports(struct 
>>> i915_perf_stream *stream,
>>>       * only be incremented by multiples of the report size (notably 
>>> also
>>>       * all a power of two).
>>>       */
>>> -    if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
>>> -              tail > OA_BUFFER_SIZE || tail % report_size,
>>> +    if (WARN_ONCE(head > OA_BUFFER_SIZE ||
>>> +              tail > OA_BUFFER_SIZE,
>>>                "Inconsistent OA buffer pointers: head = %u, tail = 
>>> %u\n",
>>>                head, tail))
>>>          return -EIO;
>>> @@ -697,20 +707,6 @@ static int gen8_append_oa_reports(struct 
>>> i915_perf_stream *stream,
>>>          u32 ctx_id;
>>>          u32 reason;
>>> -        /*
>>> -         * All the report sizes factor neatly into the buffer
>>> -         * size so we never expect to see a report split
>>> -         * between the beginning and end of the buffer.
>>> -         *
>>> -         * Given the initial alignment check a misalignment
>>> -         * here would imply a driver bug that would result
>>> -         * in an overrun.
>>> -         */
>>> -        if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
>>> -            DRM_ERROR("Spurious OA head ptr: non-integral report 
>>> offset\n");
>>> -            break;
>>> -        }
>>> -
>>>          /*
>>>           * The reason field includes flags identifying what
>>>           * triggered this specific report (mostly timer
>>> @@ -956,8 +952,8 @@ static int gen7_append_oa_reports(struct 
>>> i915_perf_stream *stream,
>>>       * only be incremented by multiples of the report size (notably 
>>> also
>>>       * all a power of two).
>>>       */
>>> -    if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
>>> -              tail > OA_BUFFER_SIZE || tail % report_size,
>>> +    if (WARN_ONCE(head > OA_BUFFER_SIZE ||
>>> +              tail > OA_BUFFER_SIZE,
>>>                "Inconsistent OA buffer pointers: head = %u, tail = 
>>> %u\n",
>>>                head, tail))
>>>          return -EIO;
>>> @@ -969,19 +965,6 @@ static int gen7_append_oa_reports(struct 
>>> i915_perf_stream *stream,
>>>          u8 *report = oa_buf_base + head;
>>>          u32 *report32 = (void *)report;
>>> -        /* All the report sizes factor neatly into the buffer
>>> -         * size so we never expect to see a report split
>>> -         * between the beginning and end of the buffer.
>>> -         *
>>> -         * Given the initial alignment check a misalignment
>>> -         * here would imply a driver bug that would result
>>> -         * in an overrun.
>>> -         */
>>> -        if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
>>> -            DRM_ERROR("Spurious OA head ptr: non-integral report 
>>> offset\n");
>>> -            break;
>>> -        }
>>> -
>>>          /* The report-ID field for periodic samples includes
>>>           * some undocumented flags related to what triggered
>>>           * the report and is never expected to be zero so we
>>
>>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2
  2019-09-18  8:21       ` Lionel Landwerlin
@ 2019-09-18 16:59         ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 17+ messages in thread
From: Umesh Nerlige Ramappa @ 2019-09-18 16:59 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Wed, Sep 18, 2019 at 11:21:01AM +0300, Lionel Landwerlin wrote:
>On 16/09/2019 22:17, Umesh Nerlige Ramappa wrote:
>>On Sun, Sep 15, 2019 at 02:24:41PM +0300, Lionel Landwerlin wrote:
>>>On 14/09/2019 02:06, Umesh Nerlige Ramappa wrote:
>>>>OA perf unit supports non-power of 2 report sizes. Enable support for
>>>>these sizes in the driver.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>---
>>>> drivers/gpu/drm/i915/i915_perf.c | 59 ++++++++++++--------------------
>>>> 1 file changed, 21 insertions(+), 38 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>b/drivers/gpu/drm/i915/i915_perf.c
>>>>index 50b6d154fd46..482fca3da7de 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>@@ -450,7 +450,7 @@ static bool oa_buffer_check_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>     u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>>>>     int report_size = stream->oa_buffer.format_size;
>>>>     unsigned long flags;
>>>>-    u32 hw_tail;
>>>>+    u32 hw_tail, aging_tail;
>>>>     u64 now;
>>>>     /* We have to consider the (unlikely) possibility that 
>>>>read() errors
>>>>@@ -459,16 +459,17 @@ static bool 
>>>>oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>>>      */
>>>>     spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>>>-    hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream);
>>>>+    hw_tail = dev_priv->perf.ops.oa_hw_tail_read(stream) - gtt_offset;
>>>>+    aging_tail = stream->oa_buffer.aging_tail - gtt_offset;
>>>>     /* The tail pointer increases in 64 byte increments,
>>>>      * not in report_size steps...
>>>>      */
>>>>-    hw_tail &= ~(report_size - 1);
>>>>+    hw_tail = OA_TAKEN(hw_tail, (OA_TAKEN(hw_tail, aging_tail) 
>>>>% report_size));
>>>
>>>
>>>I'm struggling to parse this line above and I'm not 100% sure it's 
>>>correct.
>>>
>>>Could add a comment to explain what is going on?
>>
>>The aging tail is always pointing to the boundary of a report whereas
>>the hw_tail is advancing in 64 byte increments.
>>
>>The innermost OA_TAKEN is returning the number of bytes between the
>>hw_tail and the aging_tail. The modulo is getting the size of the
>>partial report (if any).
>>
>>The outermost OA_TAKEN is subtracting the size of partial report from
>>the hw_tail to get a hw_tail that points to the boundary of the last
>>full report.
>>
>>The value of hw_tail would be the same as from the deleted line of code
>>above this line.
>>
>>Thanks,
>>Umesh
>
>
>Thanks, I ran a few tests locally to convince myself it's correct :)
>
>
>It's still a bit difficult to parse, probably because OA_TAKEN() 
>wasn't meant for this.
>
>Could create a helper function that does this computation, something 
>like this :
>
>
>static inline u32 align_hw_tail_to_report_boundary(u32 hw_tail, u32 
>last_aligned_tail)
>
>{
>
>    /* Compute potentially partially landed report in the OA buffer */
>
>    u32 partial_report_size = OA_TAKEN(hw_tail, last_aligned_tail) % 
>report_size;
>
>    /* Substract that partial amount off the tail. */
>
>    return (hw_tail - partial_report_size) % OA_BUFFER_SIZE;
>
>}

Sure, I can add a helper function to make this more readable.

Thanks,
Umesh

>
>
>Cheers,
>
>
>-Lionel
>
>
>>
>>>
>>>
>>>Thanks,
>>>
>>>
>>>-Lionel
>>>
>>>
>>>>     now = ktime_get_mono_fast_ns();
>>>>-    if (hw_tail == stream->oa_buffer.aging_tail) {
>>>>+    if (hw_tail == aging_tail) {
>>>>         /* If the HW tail hasn't move since the last check and the HW
>>>>          * tail has been aging for long enough, declare it the new
>>>>          * tail.
>>>>@@ -486,8 +487,6 @@ static bool oa_buffer_check_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>          * a read() in progress.
>>>>          */
>>>>         head = stream->oa_buffer.head - gtt_offset;
>>>>-
>>>>-        hw_tail -= gtt_offset;
>>>>         tail = hw_tail;
>>>>         /* Walk the stream backward until we find at least 2 reports
>>>>@@ -613,7 +612,18 @@ static int append_oa_sample(struct 
>>>>i915_perf_stream *stream,
>>>>     buf += sizeof(header);
>>>>     if (sample_flags & SAMPLE_OA_REPORT) {
>>>>-        if (copy_to_user(buf, report, report_size))
>>>>+        u8 *oa_buf_end = stream->oa_buffer.vaddr + OA_BUFFER_SIZE;
>>>>+        int report_size_partial = oa_buf_end - report;
>>>>+
>>>>+        if (report_size_partial < report_size) {
>>>>+            if (copy_to_user(buf, report, report_size_partial))
>>>>+                return -EFAULT;
>>>>+            buf += report_size_partial;
>>>>+
>>>>+            if (copy_to_user(buf, stream->oa_buffer.vaddr,
>>>>+                    report_size - report_size_partial))
>>>>+                return -EFAULT;
>>>>+        } else if (copy_to_user(buf, report, report_size))
>>>>             return -EFAULT;
>>>>     }
>>>>@@ -682,8 +692,8 @@ static int gen8_append_oa_reports(struct 
>>>>i915_perf_stream *stream,
>>>>      * only be incremented by multiples of the report size 
>>>>(notably also
>>>>      * all a power of two).
>>>>      */
>>>>-    if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
>>>>-              tail > OA_BUFFER_SIZE || tail % report_size,
>>>>+    if (WARN_ONCE(head > OA_BUFFER_SIZE ||
>>>>+              tail > OA_BUFFER_SIZE,
>>>>               "Inconsistent OA buffer pointers: head = %u, tail 
>>>>= %u\n",
>>>>               head, tail))
>>>>         return -EIO;
>>>>@@ -697,20 +707,6 @@ static int gen8_append_oa_reports(struct 
>>>>i915_perf_stream *stream,
>>>>         u32 ctx_id;
>>>>         u32 reason;
>>>>-        /*
>>>>-         * All the report sizes factor neatly into the buffer
>>>>-         * size so we never expect to see a report split
>>>>-         * between the beginning and end of the buffer.
>>>>-         *
>>>>-         * Given the initial alignment check a misalignment
>>>>-         * here would imply a driver bug that would result
>>>>-         * in an overrun.
>>>>-         */
>>>>-        if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
>>>>-            DRM_ERROR("Spurious OA head ptr: non-integral 
>>>>report offset\n");
>>>>-            break;
>>>>-        }
>>>>-
>>>>         /*
>>>>          * The reason field includes flags identifying what
>>>>          * triggered this specific report (mostly timer
>>>>@@ -956,8 +952,8 @@ static int gen7_append_oa_reports(struct 
>>>>i915_perf_stream *stream,
>>>>      * only be incremented by multiples of the report size 
>>>>(notably also
>>>>      * all a power of two).
>>>>      */
>>>>-    if (WARN_ONCE(head > OA_BUFFER_SIZE || head % report_size ||
>>>>-              tail > OA_BUFFER_SIZE || tail % report_size,
>>>>+    if (WARN_ONCE(head > OA_BUFFER_SIZE ||
>>>>+              tail > OA_BUFFER_SIZE,
>>>>               "Inconsistent OA buffer pointers: head = %u, tail 
>>>>= %u\n",
>>>>               head, tail))
>>>>         return -EIO;
>>>>@@ -969,19 +965,6 @@ static int gen7_append_oa_reports(struct 
>>>>i915_perf_stream *stream,
>>>>         u8 *report = oa_buf_base + head;
>>>>         u32 *report32 = (void *)report;
>>>>-        /* All the report sizes factor neatly into the buffer
>>>>-         * size so we never expect to see a report split
>>>>-         * between the beginning and end of the buffer.
>>>>-         *
>>>>-         * Given the initial alignment check a misalignment
>>>>-         * here would imply a driver bug that would result
>>>>-         * in an overrun.
>>>>-         */
>>>>-        if (WARN_ON((OA_BUFFER_SIZE - head) < report_size)) {
>>>>-            DRM_ERROR("Spurious OA head ptr: non-integral 
>>>>report offset\n");
>>>>-            break;
>>>>-        }
>>>>-
>>>>         /* The report-ID field for periodic samples includes
>>>>          * some undocumented flags related to what triggered
>>>>          * the report and is never expected to be zero so we
>>>
>>>
>>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-09-18 16:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-13 23:06 [PATCH 0/3] drm/i915/perf: Enable non-power-of-2 OA report sizes Umesh Nerlige Ramappa
2019-09-13 23:06 ` [PATCH 1/3] drm/i915/perf: rework aging tail workaround Umesh Nerlige Ramappa
2019-09-15 11:15   ` Lionel Landwerlin
2019-09-13 23:06 ` [PATCH 2/3] drm/i915/perf: Add support for report sizes that are not power of 2 Umesh Nerlige Ramappa
2019-09-15 11:24   ` Lionel Landwerlin
2019-09-16  6:10     ` Ashutosh Dixit
2019-09-16 19:17     ` Umesh Nerlige Ramappa
2019-09-17  4:11       ` Ashutosh Dixit
2019-09-17 17:33         ` Umesh Nerlige Ramappa
2019-09-18  5:38           ` Ashutosh Dixit
2019-09-18  8:21       ` Lionel Landwerlin
2019-09-18 16:59         ` Umesh Nerlige Ramappa
2019-09-13 23:06 ` [PATCH 3/3] drm/i915/perf: Add the report format with a non-power-of-2 size Umesh Nerlige Ramappa
2019-09-15 11:27   ` Lionel Landwerlin
2019-09-13 23:35 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: Enable non-power-of-2 OA report sizes Patchwork
2019-09-13 23:55 ` ✓ Fi.CI.BAT: success " Patchwork
2019-09-15  9:22 ` ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.