[Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support
@ 2020-04-13 15:48 Umesh Nerlige Ramappa
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads Umesh Nerlige Ramappa
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-13 15:48 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

Hi all,

This is a revival of an earlier patch series submitted by Lionel
Landwerlin - https://patchwork.freedesktop.org/series/54280/

The patches enable interrupt support for the perf OA unit in
i915, further details can be found in the orignal series linked
above.

This series was split into 2. The first part is merged. This part
enables interrupts.

Regards,
Umesh

Lionel Landwerlin (2):
  drm/i915: handle interrupts from the OA unit
  drm/i915/perf: add interrupt enabling parameter

Umesh Nerlige Ramappa (1):
  drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

 drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  29 ++++-
 drivers/gpu/drm/i915/gt/intel_gt_irq.h        |   1 +
 drivers/gpu/drm/i915/gt/intel_gt_pm_irq.c     |  34 ++---
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   2 +
 drivers/gpu/drm/i915/i915_perf.c              | 118 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_perf_types.h        |  19 +++
 drivers/gpu/drm/i915/i915_reg.h               |   9 ++
 include/uapi/drm/i915_drm.h                   |  13 +-
 8 files changed, 202 insertions(+), 23 deletions(-)

-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
@ 2020-04-13 15:48 ` Umesh Nerlige Ramappa
  2020-04-14  5:08   ` Dixit, Ashutosh
  2020-04-15 10:00   ` Lionel Landwerlin
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit Umesh Nerlige Ramappa
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-13 15:48 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

A condition in wait_event_interruptible seems to be checked twice before
waiting on the event to occur. These checks are redundant when hrtimer
events will call oa_buffer_check_unlocked to update the oa_buffer tail
pointers. The redundant checks add cpu overhead. Simplify the check
to reduce cpu overhead when using blocking io to read oa buffer reports.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5cde3e4e7be6..e28a3ab83fde 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	return pollin;
 }
 
+/**
+ * oa_buffer_check_reports - quick check if reports are available
+ * @stream: i915 stream instance
+ *
+ * The return from this function is used as a condition for
+ * wait_event_interruptible in blocking read. This is used to detect
+ * available reports.
+ *
+ * A condition in wait_event_interruptible seems to be checked twice before
+ * waiting on an event to occur. These checks are redundant when hrtimer events
+ * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
+ * redundant checks add cpu overhead. We simplify the check to reduce cpu
+ * overhead.
+ */
+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
+{
+	unsigned long flags;
+	bool available;
+
+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+	available = stream->oa_buffer.tail != stream->oa_buffer.head;
+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+
+	return available;
+}
+
 /**
  * append_oa_status - Appends a status record to a userspace read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
@@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
 		return -EIO;
 
 	return wait_event_interruptible(stream->poll_wq,
-					oa_buffer_check_unlocked(stream));
+					oa_buffer_check_reports(stream));
 }
 
 /**
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads Umesh Nerlige Ramappa
@ 2020-04-13 15:48 ` Umesh Nerlige Ramappa
  2020-04-17 23:25   ` Dixit, Ashutosh
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter Umesh Nerlige Ramappa
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-13 15:48 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

The OA unit can notify that its circular buffer is half full through
an interrupt and we would like to give the application the ability to
make use of this interrupt to get rid of CPU checks on the OA buffer.

This change wires up the interrupt to the i915-perf stream and leaves
it ignored for now.

v2: Use spin_lock_irq() to access the IMR register on Haswell (Chris)
v3: (Umesh)
- Rebase
- Enable OA IMR for gen12
- Add OA interrupt to the pm enable/disable irq for gen11/12

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        | 29 ++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_gt_irq.h        |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_pm_irq.c     | 34 +++++++++++--------
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  2 ++
 drivers/gpu/drm/i915/i915_perf.c              | 34 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_perf_types.h        | 19 +++++++++++
 drivers/gpu/drm/i915/i915_reg.h               |  9 +++++
 7 files changed, 112 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index 0cc7dd54f4f9..61eee9fb8872 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -94,6 +94,18 @@ gen11_gt_engine_identity(struct intel_gt *gt,
 	return ident;
 }
 
+static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915)
+{
+	atomic64_inc(&i915->perf.exclusive_stream->half_full_count);
+	wake_up_all(&i915->perf.exclusive_stream->poll_wq);
+}
+
+static void gen8_perfmon_handler(struct drm_i915_private *i915, u32 iir)
+{
+	if (iir & GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT)
+		notify_perfmon_buffer_half_full(i915);
+}
+
 static void
 gen11_other_irq_handler(struct intel_gt *gt, const u8 instance,
 			const u16 iir)
@@ -104,6 +116,9 @@ gen11_other_irq_handler(struct intel_gt *gt, const u8 instance,
 	if (instance == OTHER_GTPM_INSTANCE)
 		return gen11_rps_irq_handler(&gt->rps, iir);
 
+	if (instance == OTHER_WDOAPERF_INSTANCE)
+		return gen8_perfmon_handler(gt->i915, iir);
+
 	WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n",
 		  instance, iir);
 }
@@ -310,6 +325,9 @@ void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 		      GT_CS_MASTER_ERROR_INTERRUPT))
 		DRM_DEBUG("Command parser error, gt_iir 0x%08x\n", gt_iir);
 
+	if (gt_iir & GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT)
+		notify_perfmon_buffer_half_full(gt->i915);
+
 	if (gt_iir & GT_PARITY_ERROR(gt->i915))
 		gen7_parity_error_irq_handler(gt, gt_iir);
 }
@@ -341,11 +359,13 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl)
 		}
 	}
 
-	if (master_ctl & GEN8_GT_VECS_IRQ) {
+	if (master_ctl & (GEN8_GT_VECS_IRQ | GEN8_GT_WDBOX_OACS_IRQ)) {
 		iir = raw_reg_read(regs, GEN8_GT_IIR(3));
 		if (likely(iir)) {
 			cs_irq_handler(gt->engine_class[VIDEO_ENHANCEMENT_CLASS][0],
 				       iir >> GEN8_VECS_IRQ_SHIFT);
+			gen8_perfmon_handler(gt->i915,
+					     iir >> GEN8_WD_IRQ_SHIFT);
 			raw_reg_write(regs, GEN8_GT_IIR(3), iir);
 		}
 	}
@@ -382,7 +402,8 @@ void gen8_gt_irq_postinstall(struct intel_gt *gt)
 		irqs << GEN8_RCS_IRQ_SHIFT | irqs << GEN8_BCS_IRQ_SHIFT,
 		irqs << GEN8_VCS0_IRQ_SHIFT | irqs << GEN8_VCS1_IRQ_SHIFT,
 		0,
-		irqs << GEN8_VECS_IRQ_SHIFT,
+		irqs << GEN8_VECS_IRQ_SHIFT |
+		GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT << GEN8_WD_IRQ_SHIFT,
 	};
 	struct intel_uncore *uncore = gt->uncore;
 
@@ -450,6 +471,10 @@ void gen5_gt_irq_postinstall(struct intel_gt *gt)
 	else
 		gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT;
 
+	/* We only expose the i915/perf interface on HSW+. */
+	if (IS_HASWELL(gt->i915))
+		gt_irqs |= GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT;
+
 	GEN3_IRQ_INIT(uncore, GT, gt->gt_imr, gt_irqs);
 
 	if (INTEL_GEN(gt->i915) >= 6) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
index 886c5cf408a2..67cd2e60ccb4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
@@ -15,6 +15,7 @@ struct intel_gt;
 		      GEN8_GT_BCS_IRQ | \
 		      GEN8_GT_VCS0_IRQ | \
 		      GEN8_GT_VCS1_IRQ | \
+		      GEN8_GT_WDBOX_OACS_IRQ | \
 		      GEN8_GT_VECS_IRQ | \
 		      GEN8_GT_PM_IRQ | \
 		      GEN8_GT_GUC_IRQ)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_irq.c
index babe866126d7..6eecf9912a67 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_irq.c
@@ -14,19 +14,16 @@ static void write_pm_imr(struct intel_gt *gt)
 {
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
-	u32 mask = gt->pm_imr;
 	i915_reg_t reg;
 
-	if (INTEL_GEN(i915) >= 11) {
+	if (INTEL_GEN(i915) >= 11)
 		reg = GEN11_GPM_WGBOXPERF_INTR_MASK;
-		mask <<= 16; /* pm is in upper half */
-	} else if (INTEL_GEN(i915) >= 8) {
+	else if (INTEL_GEN(i915) >= 8)
 		reg = GEN8_GT_IMR(2);
-	} else {
+	else
 		reg = GEN6_PMIMR;
-	}
 
-	intel_uncore_write(uncore, reg, mask);
+	intel_uncore_write(uncore, reg, gt->pm_imr);
 }
 
 static void gen6_gt_pm_update_irq(struct intel_gt *gt,
@@ -75,25 +72,28 @@ static void write_pm_ier(struct intel_gt *gt)
 {
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
-	u32 mask = gt->pm_ier;
 	i915_reg_t reg;
 
-	if (INTEL_GEN(i915) >= 11) {
+	if (INTEL_GEN(i915) >= 11)
 		reg = GEN11_GPM_WGBOXPERF_INTR_ENABLE;
-		mask <<= 16; /* pm is in upper half */
-	} else if (INTEL_GEN(i915) >= 8) {
+	else if (INTEL_GEN(i915) >= 8)
 		reg = GEN8_GT_IER(2);
-	} else {
+	else
 		reg = GEN6_PMIER;
-	}
 
-	intel_uncore_write(uncore, reg, mask);
+	intel_uncore_write(uncore, reg, gt->pm_ier);
 }
 
 void gen6_gt_pm_enable_irq(struct intel_gt *gt, u32 enable_mask)
 {
+	struct drm_i915_private *i915 = gt->i915;
+
 	lockdep_assert_held(&gt->irq_lock);
 
+	if (INTEL_GEN(i915) >= 11)
+		enable_mask = (enable_mask << 16) | /* pm is in upper half */
+			GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT;
+
 	gt->pm_ier |= enable_mask;
 	write_pm_ier(gt);
 	gen6_gt_pm_unmask_irq(gt, enable_mask);
@@ -101,8 +101,14 @@ void gen6_gt_pm_enable_irq(struct intel_gt *gt, u32 enable_mask)
 
 void gen6_gt_pm_disable_irq(struct intel_gt *gt, u32 disable_mask)
 {
+	struct drm_i915_private *i915 = gt->i915;
+
 	lockdep_assert_held(&gt->irq_lock);
 
+	if (INTEL_GEN(i915) >= 11)
+		disable_mask = (disable_mask << 16) | /* pm is in upper half */
+			GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT;
+
 	gt->pm_ier &= ~disable_mask;
 	gen6_gt_pm_mask_irq(gt, disable_mask);
 	write_pm_ier(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index d015f7b8b28e..a19099d69e3e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1904,6 +1904,8 @@ static void setup_rcs(struct intel_engine_cs *engine)
 
 	if (HAS_L3_DPF(i915))
 		engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
+	if (IS_HASWELL(i915))
+		engine->irq_keep_mask |= GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT;
 
 	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e28a3ab83fde..a9423af67bb8 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -201,6 +201,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc_reg.h"
 #include "gt/intel_ring.h"
+#include "gt/intel_gt_irq.h"
 
 #include "i915_drv.h"
 #include "i915_perf.h"
@@ -341,6 +342,7 @@ static const struct i915_oa_format gen12_oa_formats[I915_OA_FORMAT_MAX] = {
  *        (see get_default_sseu_config())
  * @poll_oa_period: The period in nanoseconds at which the CPU will check for OA
  * data availability
+ * @oa_interrupt_monitor: Whether we should monitor the OA interrupt.
  *
  * As read_properties_unlocked() enumerates and validates the properties given
  * to open a stream of metrics the configuration is built up in the structure
@@ -365,6 +367,7 @@ struct perf_open_properties {
 	struct intel_sseu sseu;
 
 	u64 poll_oa_period;
+	bool oa_interrupt_monitor;
 };
 
 struct i915_oa_config_bo {
@@ -2604,6 +2607,13 @@ static void gen7_oa_enable(struct i915_perf_stream *stream)
 	 */
 	gen7_init_oa_buffer(stream);
 
+	if (stream->oa_interrupt_monitor) {
+		spin_lock_irq(&stream->perf->i915->irq_lock);
+		gen5_gt_enable_irq(&stream->perf->i915->gt,
+				   GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT);
+		spin_unlock_irq(&stream->perf->i915->irq_lock);
+	}
+
 	intel_uncore_write(uncore, GEN7_OACONTROL,
 			   (ctx_id & GEN7_OACONTROL_CTX_MASK) |
 			   (period_exponent <<
@@ -2630,6 +2640,10 @@ static void gen8_oa_enable(struct i915_perf_stream *stream)
 	 */
 	gen8_init_oa_buffer(stream);
 
+	if (stream->oa_interrupt_monitor)
+		intel_uncore_write(uncore, GEN8_OA_IMR,
+				   ~GEN8_OA_IMR_MASK_INTR);
+
 	/*
 	 * Note: we don't rely on the hardware to perform single context
 	 * filtering and instead filter on the cpu based on the context-id
@@ -2654,6 +2668,10 @@ static void gen12_oa_enable(struct i915_perf_stream *stream)
 
 	gen12_init_oa_buffer(stream);
 
+	if (stream->oa_interrupt_monitor)
+		intel_uncore_write(uncore, GEN12_OA_IMR,
+				   ~GEN8_OA_IMR_MASK_INTR);
+
 	intel_uncore_write(uncore, GEN12_OAG_OACONTROL,
 			   (report_format << GEN12_OAG_OACONTROL_OA_COUNTER_FORMAT_SHIFT) |
 			   GEN12_OAG_OACONTROL_OA_COUNTER_ENABLE);
@@ -2672,6 +2690,10 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream)
 {
 	stream->pollin = false;
 
+	stream->half_full_count_last = 0;
+	atomic64_set(&stream->half_full_count,
+		     stream->half_full_count_last);
+
 	stream->perf->ops.oa_enable(stream);
 
 	if (stream->periodic)
@@ -2684,6 +2706,13 @@ static void gen7_oa_disable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
 
+	if (stream->oa_interrupt_monitor) {
+		spin_lock_irq(&stream->perf->i915->irq_lock);
+		gen5_gt_disable_irq(&stream->perf->i915->gt,
+				    GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT);
+		spin_unlock_irq(&stream->perf->i915->irq_lock);
+	}
+
 	intel_uncore_write(uncore, GEN7_OACONTROL, 0);
 	if (intel_wait_for_register(uncore,
 				    GEN7_OACONTROL, GEN7_OACONTROL_ENABLE, 0,
@@ -2696,6 +2725,8 @@ static void gen8_oa_disable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
 
+	intel_uncore_write(uncore, GEN8_OA_IMR, 0xffffffff);
+
 	intel_uncore_write(uncore, GEN8_OACONTROL, 0);
 	if (intel_wait_for_register(uncore,
 				    GEN8_OACONTROL, GEN8_OA_COUNTER_ENABLE, 0,
@@ -2708,6 +2739,8 @@ static void gen12_oa_disable(struct i915_perf_stream *stream)
 {
 	struct intel_uncore *uncore = stream->uncore;
 
+	intel_uncore_write(uncore, GEN12_OA_IMR, 0xffffffff);
+
 	intel_uncore_write(uncore, GEN12_OAG_OACONTROL, 0);
 	if (intel_wait_for_register(uncore,
 				    GEN12_OAG_OACONTROL,
@@ -3464,6 +3497,7 @@ i915_perf_open_ioctl_locked(struct i915_perf *perf,
 	stream->perf = perf;
 	stream->ctx = specific_ctx;
 	stream->poll_oa_period = props->poll_oa_period;
+	stream->oa_interrupt_monitor = props->oa_interrupt_monitor;
 
 	ret = i915_oa_stream_init(stream, param, props);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_perf_types.h b/drivers/gpu/drm/i915/i915_perf_types.h
index a36a455ae336..d9f7ad1af6ba 100644
--- a/drivers/gpu/drm/i915/i915_perf_types.h
+++ b/drivers/gpu/drm/i915/i915_perf_types.h
@@ -230,6 +230,19 @@ struct i915_perf_stream {
 	 */
 	bool pollin;
 
+	/**
+	 * @half_full_count: Atomic counter incremented by the interrupt
+	 * handling code for each OA half full interrupt received.
+	 */
+	atomic64_t half_full_count;
+
+	/**
+	 * half_full_count_last: Copy of the atomic half_full_count that was
+	 * last processed in the i915-perf driver. If both counters differ,
+	 * there is data available to read in the OA buffer.
+	 */
+	u64 half_full_count_last;
+
 	/**
 	 * @periodic: Whether periodic sampling is currently enabled.
 	 */
@@ -311,6 +324,12 @@ struct i915_perf_stream {
 	 * buffer should be checked for available data.
 	 */
 	u64 poll_oa_period;
+
+	/**
+	 * @oa_interrupt_monitor: Whether the stream will be notified by OA
+	 * interrupts.
+	 */
+	bool oa_interrupt_monitor;
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0b39b9abf8a4..53eaa010ac92 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -695,6 +695,11 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 
 #define GEN12_OA_TLB_INV_CR _MMIO(0xceec)
 
+#define GEN8_OA_IMR _MMIO(0x2b20)
+#define  GEN8_OA_IMR_MASK_INTR (1 << 28)
+
+#define GEN12_OA_IMR _MMIO(0xdb14)
+
 /* Gen12 OAR unit */
 #define GEN12_OAR_OACONTROL _MMIO(0x2960)
 #define  GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1
@@ -3093,8 +3098,10 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define GT_BLT_USER_INTERRUPT			(1 << 22)
 #define GT_BSD_CS_ERROR_INTERRUPT		(1 << 15)
 #define GT_BSD_USER_INTERRUPT			(1 << 12)
+#define GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT (1 << 12) /* bdw+ */
 #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1 << 11) /* hsw+; rsvd on snb, ivb, vlv */
 #define GT_WAIT_SEMAPHORE_INTERRUPT		REG_BIT(11) /* bdw+ */
+#define GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT   (1 <<  9) /* ivb+ but only used on hsw+ */
 #define GT_CONTEXT_SWITCH_INTERRUPT		(1 <<  8)
 #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1 <<  5) /* !snb */
 #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT	(1 <<  4)
@@ -7531,6 +7538,7 @@ enum {
 #define  GEN8_DE_PIPE_B_IRQ		(1 << 17)
 #define  GEN8_DE_PIPE_A_IRQ		(1 << 16)
 #define  GEN8_DE_PIPE_IRQ(pipe)		(1 << (16 + (pipe)))
+#define  GEN8_GT_WDBOX_OACS_IRQ         (1 << 7)
 #define  GEN8_GT_VECS_IRQ		(1 << 6)
 #define  GEN8_GT_GUC_IRQ		(1 << 5)
 #define  GEN8_GT_PM_IRQ			(1 << 4)
@@ -7728,6 +7736,7 @@ enum {
 /* irq instances for OTHER_CLASS */
 #define OTHER_GUC_INSTANCE	0
 #define OTHER_GTPM_INSTANCE	1
+#define OTHER_WDOAPERF_INSTANCE	2
 
 #define GEN11_INTR_IDENTITY_REG(x)	_MMIO(0x190060 + ((x) * 4))
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads Umesh Nerlige Ramappa
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit Umesh Nerlige Ramappa
@ 2020-04-13 15:48 ` Umesh Nerlige Ramappa
  2020-04-17 23:26   ` Dixit, Ashutosh
  2020-04-14 19:09 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev9) Patchwork
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-13 15:48 UTC (permalink / raw)
  To: intel-gfx, Lionel G Landwerlin, Ashutosh Dixit

From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

This let's the application choose to be driven by the interrupt
mechanism of the HW. In conjunction with long periods for checks for
the availability of data on the CPU, this can reduce the CPU load when
doing capture of OA data.

v2: Version the new parameter (Joonas)
v3: Rebase (Umesh)
v4: Disable hrtimer if poll oa period is zero

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 64 +++++++++++++++++++++++++++-----
 include/uapi/drm/i915_drm.h      | 13 ++++++-
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index a9423af67bb8..d3396129d828 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -463,6 +463,7 @@ static u32 gen7_oa_hw_tail_read(struct i915_perf_stream *stream)
  */
 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 {
+	u64 half_full_count = atomic64_read(&stream->half_full_count);
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format_size;
 	unsigned long flags;
@@ -539,6 +540,8 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
 			  stream->oa_buffer.head - gtt_offset) >= report_size;
 
+	stream->half_full_count_last = half_full_count;
+
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
 	return pollin;
@@ -556,16 +559,23 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
  * waiting on an event to occur. These checks are redundant when hrtimer events
  * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
  * redundant checks add cpu overhead. We simplify the check to reduce cpu
- * overhead.
+ * overhead. For interrupt events, we still need to make sure that
+ * oa_buffer_check_unlocked is called when an interrupt occurs.
  */
 static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
 {
 	unsigned long flags;
 	bool available;
 
-	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
-	available = stream->oa_buffer.tail != stream->oa_buffer.head;
-	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+	if (!stream->oa_interrupt_monitor) {
+		spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+		available = stream->oa_buffer.tail != stream->oa_buffer.head;
+		spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+	} else {
+		if (stream->half_full_count_last !=
+		    atomic64_read(&stream->half_full_count))
+			available = oa_buffer_check_unlocked(stream);
+	}
 
 	return available;
 }
@@ -1163,8 +1173,8 @@ static int gen7_oa_read(struct i915_perf_stream *stream,
  * @stream: An i915-perf stream opened for OA metrics
  *
  * Called when userspace tries to read() from a blocking stream FD opened
- * for OA metrics. It waits until the hrtimer callback finds a non-empty
- * OA buffer and wakes us.
+ * for OA metrics. It waits until either the hrtimer callback finds a non-empty
+ * OA buffer or the OA interrupt kicks in and wakes us.
  *
  * Note: it's acceptable to have this return with some false positives
  * since any subsequent read handling will return -EAGAIN if there isn't
@@ -2696,7 +2706,7 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream)
 
 	stream->perf->ops.oa_enable(stream);
 
-	if (stream->periodic)
+	if (stream->periodic && stream->poll_oa_period)
 		hrtimer_start(&stream->poll_check_timer,
 			      ns_to_ktime(stream->poll_oa_period),
 			      HRTIMER_MODE_REL_PINNED);
@@ -2770,6 +2780,10 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream)
 {
 	stream->perf->ops.oa_disable(stream);
 
+	stream->half_full_count_last = 0;
+	atomic64_set(&stream->half_full_count,
+		     stream->half_full_count_last);
+
 	if (stream->periodic)
 		hrtimer_cancel(&stream->poll_check_timer);
 }
@@ -3137,6 +3151,16 @@ static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream,
 
 	stream->ops->poll_wait(stream, file, wait);
 
+	/*
+	 * Only check the half buffer full notifications if requested by the
+	 * user.
+	 */
+	if (stream->oa_interrupt_monitor &&
+	    (stream->half_full_count_last !=
+	     atomic64_read(&stream->half_full_count))) {
+		stream->pollin = oa_buffer_check_unlocked(stream);
+	}
+
 	/* Note: we don't explicitly check whether there's something to read
 	 * here since this path may be very hot depending on what else
 	 * userspace is polling, or on the timeout in use. We rely solely on
@@ -3554,6 +3578,7 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
 /**
  * read_properties_unlocked - validate + copy userspace stream open properties
  * @perf: i915 perf instance
+ * @open_flags: Flags set by userspace for the opening of the stream
  * @uprops: The array of u64 key value pairs given by userspace
  * @n_props: The number of key value pairs expected in @uprops
  * @props: The stream configuration built up while validating properties
@@ -3567,6 +3592,7 @@ static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent)
  * rule out defining new properties with ordering requirements in the future.
  */
 static int read_properties_unlocked(struct i915_perf *perf,
+				    u32 open_flags,
 				    u64 __user *uprops,
 				    u32 n_props,
 				    struct perf_open_properties *props)
@@ -3710,13 +3736,16 @@ static int read_properties_unlocked(struct i915_perf *perf,
 			break;
 		}
 		case DRM_I915_PERF_PROP_POLL_OA_PERIOD:
-			if (value < 100000 /* 100us */) {
+			if (value > 0 && value < 100000 /* 100us */) {
 				DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n",
 					  value);
 				return -EINVAL;
 			}
 			props->poll_oa_period = value;
 			break;
+		case DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT:
+			props->oa_interrupt_monitor = value != 0;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
@@ -3725,6 +3754,19 @@ static int read_properties_unlocked(struct i915_perf *perf,
 		uprop += 2;
 	}
 
+	/*
+	 * Blocking read need to be waken up by some mechanism. If no polling
+	 * of the HEAD/TAIL register is done by the kernel and no interrupt is
+	 * enabled, we'll never be able to wake up.
+	 */
+	if ((open_flags & I915_PERF_FLAG_FD_NONBLOCK) == 0 &&
+	    !props->poll_oa_period &&
+	    !props->oa_interrupt_monitor) {
+		DRM_DEBUG("Requesting a blocking stream with no polling period "
+			  "& no interrupt.\n");
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -3775,6 +3817,7 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 	}
 
 	ret = read_properties_unlocked(perf,
+				       param->flags,
 				       u64_to_user_ptr(param->properties_ptr),
 				       param->num_properties,
 				       &props);
@@ -4502,8 +4545,11 @@ int i915_perf_ioctl_version(void)
 	 *
 	 * 5: Add DRM_I915_PERF_PROP_POLL_OA_PERIOD parameter that controls the
 	 *    interval for the hrtimer used to check for OA data.
+	 *
+	 * 6: Add DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT parameter to
+	 *    enable/disable interrupts in OA.
 	 */
-	return 5;
+	return 6;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 14b67cd6b54b..947e65b937eb 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1987,12 +1987,23 @@ enum drm_i915_perf_property_id {
 	 * the driver if this parameter is not specified. Note that larger timer
 	 * values will reduce cpu consumption during OA perf captures. However,
 	 * excessively large values would potentially result in OA buffer
-	 * overwrites as captures reach end of the OA buffer.
+	 * overwrites as captures reach end of the OA buffer. A value of 0 means
+	 * no hrtimer will be started.
 	 *
 	 * This property is available in perf revision 5.
 	 */
 	DRM_I915_PERF_PROP_POLL_OA_PERIOD,
 
+	/**
+	 * Specifying this property sets up the interrupt mechanism for the OA
+	 * buffer in i915. This option in conjunction with a long polling period
+	 * for avaibility of OA data can reduce CPU load significantly if you
+	 * do not care about OA data being read as soon as it's available.
+	 *
+	 * This property is available in perf revision 6.
+	 */
+	DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads Umesh Nerlige Ramappa
@ 2020-04-14  5:08   ` Dixit, Ashutosh
  2020-04-14  6:58     ` Dixit, Ashutosh
  2020-04-15 10:00   ` Lionel Landwerlin
  1 sibling, 1 reply; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-04-14  5:08 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
>
> A condition in wait_event_interruptible seems to be checked twice before
> waiting on the event to occur. These checks are redundant when hrtimer
> events will call oa_buffer_check_unlocked to update the oa_buffer tail
> pointers. The redundant checks add cpu overhead. Simplify the check
> to reduce cpu overhead when using blocking io to read oa buffer reports.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 5cde3e4e7be6..e28a3ab83fde 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>	return pollin;
>  }
>
> +/**
> + * oa_buffer_check_reports - quick check if reports are available
> + * @stream: i915 stream instance
> + *
> + * The return from this function is used as a condition for
> + * wait_event_interruptible in blocking read. This is used to detect
> + * available reports.
> + *
> + * A condition in wait_event_interruptible seems to be checked twice before
> + * waiting on an event to occur. These checks are redundant when hrtimer events
> + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> + * overhead.
> + */
> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> +{
> +	unsigned long flags;
> +	bool available;
> +
> +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> +
> +	return available;
> +}
> +
>  /**
>   * append_oa_status - Appends a status record to a userspace read() buffer.
>   * @stream: An i915-perf stream opened for OA metrics
> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>		return -EIO;
>
>	return wait_event_interruptible(stream->poll_wq,
> -					oa_buffer_check_unlocked(stream));
> +					oa_buffer_check_reports(stream));
>  }

I think the problem with this patch is that the original code had the
property that the condition for data availability is checked (and the OA
tail advanced) /before/ entering the wait. So the tail was advanced and if
there was data there was no need to wait at all. This change has lost that
property. With this change we will first always enter the wait and then get
unblocked after the timer interrupt which will advance the tail and wake us
up.

I think if we want to do this, it is possible to do without losing the
original property. Just call oa_buffer_check_unlocked() first (outside
wait_event) and if there is data just return. If not, put yourself on
stream->poll_wq from which the timer callback will wake us up. I think this
is going to be something like prepare_to_wait() / schedule() /
finish_wait() pattern of which there are numerous examples in the
kernel. So in this case we won't call wait_event_interruptible() which
checks the condition upon waking up. No need to define
oa_buffer_check_reports() either.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-14  5:08   ` Dixit, Ashutosh
@ 2020-04-14  6:58     ` Dixit, Ashutosh
  2020-04-14 16:59       ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-04-14  6:58 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
>
> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> >
> > A condition in wait_event_interruptible seems to be checked twice before
> > waiting on the event to occur. These checks are redundant when hrtimer
> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > pointers. The redundant checks add cpu overhead. Simplify the check
> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> >
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >	return pollin;
> >  }
> >
> > +/**
> > + * oa_buffer_check_reports - quick check if reports are available
> > + * @stream: i915 stream instance
> > + *
> > + * The return from this function is used as a condition for
> > + * wait_event_interruptible in blocking read. This is used to detect
> > + * available reports.
> > + *
> > + * A condition in wait_event_interruptible seems to be checked twice before
> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> > + * overhead.
> > + */
> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > +{
> > +	unsigned long flags;
> > +	bool available;
> > +
> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > +
> > +	return available;
> > +}
> > +
> >  /**
> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> >   * @stream: An i915-perf stream opened for OA metrics
> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> >		return -EIO;
> >
> >	return wait_event_interruptible(stream->poll_wq,
> > -					oa_buffer_check_unlocked(stream));
> > +					oa_buffer_check_reports(stream));
> >  }
>
> I think the problem with this patch is that the original code had the
> property that the condition for data availability is checked (and the OA
> tail advanced) /before/ entering the wait. So the tail was advanced and if
> there was data there was no need to wait at all. This change has lost that
> property. With this change we will first always enter the wait and then get
> unblocked after the timer interrupt which will advance the tail and wake us
> up.
>
> I think if we want to do this, it is possible to do without losing the
> original property. Just call oa_buffer_check_unlocked() first (outside
> wait_event) and if there is data just return. If not, put yourself on
> stream->poll_wq from which the timer callback will wake us up. I think this
> is going to be something like prepare_to_wait() / schedule() /
> finish_wait() pattern of which there are numerous examples in the
> kernel. So in this case we won't call wait_event_interruptible() which
> checks the condition upon waking up. No need to define
> oa_buffer_check_reports() either.

If on the other hand we say that this should actually be the desired
behavior of the blocking read, that it should not unblock immediately but
only after the next timer interrupt (so that an app can call the blocking
read repeatedly without worrying about it will return a little bit of data
in each call at a considerable amount of CPU load), then we may be able to
do something like this, but even then we may have to think about what would
be the correct way to do that. Though this may be that correct way and we
may just need to change the commit message, but is that what we want?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-14  6:58     ` Dixit, Ashutosh
@ 2020-04-14 16:59       ` Umesh Nerlige Ramappa
  2020-04-14 19:09         ` Dixit, Ashutosh
  0 siblings, 1 reply; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-14 16:59 UTC (permalink / raw)
  To: Dixit, Ashutosh; +Cc: intel-gfx

On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
>On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
>>
>> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
>> >
>> > A condition in wait_event_interruptible seems to be checked twice before
>> > waiting on the event to occur. These checks are redundant when hrtimer
>> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
>> > pointers. The redundant checks add cpu overhead. Simplify the check
>> > to reduce cpu overhead when using blocking io to read oa buffer reports.
>> >
>> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>> >  1 file changed, 27 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> > index 5cde3e4e7be6..e28a3ab83fde 100644
>> > --- a/drivers/gpu/drm/i915/i915_perf.c
>> > +++ b/drivers/gpu/drm/i915/i915_perf.c
>> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>> >	return pollin;
>> >  }
>> >
>> > +/**
>> > + * oa_buffer_check_reports - quick check if reports are available
>> > + * @stream: i915 stream instance
>> > + *
>> > + * The return from this function is used as a condition for
>> > + * wait_event_interruptible in blocking read. This is used to detect
>> > + * available reports.
>> > + *
>> > + * A condition in wait_event_interruptible seems to be checked twice before
>> > + * waiting on an event to occur. These checks are redundant when hrtimer events
>> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
>> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
>> > + * overhead.
>> > + */
>> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>> > +{
>> > +	unsigned long flags;
>> > +	bool available;
>> > +
>> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
>> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>> > +
>> > +	return available;
>> > +}
>> > +
>> >  /**
>> >   * append_oa_status - Appends a status record to a userspace read() buffer.
>> >   * @stream: An i915-perf stream opened for OA metrics
>> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>> >		return -EIO;
>> >
>> >	return wait_event_interruptible(stream->poll_wq,
>> > -					oa_buffer_check_unlocked(stream));
>> > +					oa_buffer_check_reports(stream));
>> >  }
>>
>> I think the problem with this patch is that the original code had the
>> property that the condition for data availability is checked (and the OA
>> tail advanced) /before/ entering the wait. So the tail was advanced and if
>> there was data there was no need to wait at all. This change has lost that
>> property. With this change we will first always enter the wait and then get
>> unblocked after the timer interrupt which will advance the tail and wake us
>> up.
>>
>> I think if we want to do this, it is possible to do without losing the
>> original property. Just call oa_buffer_check_unlocked() first (outside
>> wait_event) and if there is data just return. If not, put yourself on
>> stream->poll_wq from which the timer callback will wake us up. I think this
>> is going to be something like prepare_to_wait() / schedule() /
>> finish_wait() pattern of which there are numerous examples in the
>> kernel. So in this case we won't call wait_event_interruptible() which
>> checks the condition upon waking up. No need to define
>> oa_buffer_check_reports() either.
>
>If on the other hand we say that this should actually be the desired
>behavior of the blocking read, that it should not unblock immediately but
>only after the next timer interrupt (so that an app can call the blocking
>read repeatedly without worrying about it will return a little bit of data
>in each call at a considerable amount of CPU load), then we may be able to
>do something like this, but even then we may have to think about what would
>be the correct way to do that. Though this may be that correct way and we
>may just need to change the commit message, but is that what we want?

I am not quite clear on how the blocking read should behave in terms of 
the API itself.

The change here is solely to reduce cpu overhead from the additional 2 
calls to oa_buffer_check_unlocked before blocking and that would happen 
on every call to the read. I agree that the read would block if the 
timer did not update the tail yet, but that makes sense in a way that we 
don't want read to constantly return data when the OA sampling rates are 
high (say 100 us). With this change the behavior becomes consistent 
irrespective of the OA sampling rate and user has the flexibility to set 
the POLL_OA_PERIOD to whatever value is suitable for the OA sampling 
rate chosen.

Thanks,
Umesh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev9)
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (2 preceding siblings ...)
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter Umesh Nerlige Ramappa
@ 2020-04-14 19:09 ` Patchwork
  2020-04-14 19:28 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-04-14 19:09 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev9)
URL   : https://patchwork.freedesktop.org/series/54280/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
e5020136b3c9 drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
260dadb836a3 drm/i915: handle interrupts from the OA unit
4ff3bedffc2f drm/i915/perf: add interrupt enabling parameter
-:106: CHECK:UNNECESSARY_PARENTHESES: Unnecessary parentheses around 'stream->half_full_count_last !=
 	     atomic64_read(&stream->half_full_count)'
#106: FILE: drivers/gpu/drm/i915/i915_perf.c:3158:
+	if (stream->oa_interrupt_monitor &&
+	    (stream->half_full_count_last !=
+	     atomic64_read(&stream->half_full_count))) {

total: 0 errors, 0 warnings, 1 checks, 179 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-14 16:59       ` Umesh Nerlige Ramappa
@ 2020-04-14 19:09         ` Dixit, Ashutosh
  2020-04-14 19:37           ` Dixit, Ashutosh
  0 siblings, 1 reply; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-04-14 19:09 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> >>
> >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> >> >
> >> > A condition in wait_event_interruptible seems to be checked twice before
> >> > waiting on the event to occur. These checks are redundant when hrtimer
> >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> >> > pointers. The redundant checks add cpu overhead. Simplify the check
> >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> >> >
> >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >> >	return pollin;
> >> >  }
> >> >
> >> > +/**
> >> > + * oa_buffer_check_reports - quick check if reports are available
> >> > + * @stream: i915 stream instance
> >> > + *
> >> > + * The return from this function is used as a condition for
> >> > + * wait_event_interruptible in blocking read. This is used to detect
> >> > + * available reports.
> >> > + *
> >> > + * A condition in wait_event_interruptible seems to be checked twice before
> >> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> >> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> >> > + * overhead.
> >> > + */
> >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> >> > +{
> >> > +	unsigned long flags;
> >> > +	bool available;
> >> > +
> >> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> >> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> >> > +
> >> > +	return available;
> >> > +}
> >> > +
> >> >  /**
> >> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> >> >   * @stream: An i915-perf stream opened for OA metrics
> >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> >> >		return -EIO;
> >> >
> >> >	return wait_event_interruptible(stream->poll_wq,
> >> > -					oa_buffer_check_unlocked(stream));
> >> > +					oa_buffer_check_reports(stream));
> >> >  }
> >>
> >> I think the problem with this patch is that the original code had the
> >> property that the condition for data availability is checked (and the OA
> >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> >> there was data there was no need to wait at all. This change has lost that
> >> property. With this change we will first always enter the wait and then get
> >> unblocked after the timer interrupt which will advance the tail and wake us
> >> up.
> >>
> >> I think if we want to do this, it is possible to do without losing the
> >> original property. Just call oa_buffer_check_unlocked() first (outside
> >> wait_event) and if there is data just return. If not, put yourself on
> >> stream->poll_wq from which the timer callback will wake us up. I think this
> >> is going to be something like prepare_to_wait() / schedule() /
> >> finish_wait() pattern of which there are numerous examples in the
> >> kernel. So in this case we won't call wait_event_interruptible() which
> >> checks the condition upon waking up. No need to define
> >> oa_buffer_check_reports() either.
> >
> > If on the other hand we say that this should actually be the desired
> > behavior of the blocking read, that it should not unblock immediately but
> > only after the next timer interrupt (so that an app can call the blocking
> > read repeatedly without worrying about it will return a little bit of data
> > in each call at a considerable amount of CPU load), then we may be able to
> > do something like this, but even then we may have to think about what would
> > be the correct way to do that. Though this may be that correct way and we
> > may just need to change the commit message, but is that what we want?
>
> I am not quite clear on how the blocking read should behave in terms of the
> API itself.
>
> The change here is solely to reduce cpu overhead from the additional 2
> calls to oa_buffer_check_unlocked before blocking and that would happen on
> every call to the read. I agree that the read would block if the timer did
> not update the tail yet, but that makes sense in a way that we don't want
> read to constantly return data when the OA sampling rates are high (say 100
> us). With this change the behavior becomes consistent irrespective of the
> OA sampling rate and user has the flexibility to set the POLL_OA_PERIOD to
> whatever value is suitable for the OA sampling rate chosen.

I think the patch commit message should clearly state the reason for the
patch. Afais if the purpose of the patch is to reduce cpu overhead, the
patch should not change API behavior. If on the other hand the purpose of
the patch is to make the API behavior consistent, then please resubmit the
patch after modifying the commit message to reflect the purpose of the
patch so that the patch can be reviewed from that point of view.

Also I think we should be able to just use stream->pollin instead of
defining the new oa_buffer_check_reports(). E.g. isn't
stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ?

Thanks!
--
Ashutosh
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✗ Fi.CI.DOCS: warning for drm/i915/perf: add OA interrupt support (rev9)
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (3 preceding siblings ...)
  2020-04-14 19:09 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev9) Patchwork
@ 2020-04-14 19:28 ` Patchwork
  2020-04-14 19:33 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-04-15 11:50 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  6 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-04-14 19:28 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev9)
URL   : https://patchwork.freedesktop.org/series/54280/
State : warning

== Summary ==

$ make htmldocs 2>&1 > /dev/null | grep i915
./drivers/gpu/drm/i915/i915_perf_types.h:240: warning: Incorrect use of kernel-doc format:          * half_full_count_last: Copy of the atomic half_full_count that was
./drivers/gpu/drm/i915/i915_perf_types.h:333: warning: Function parameter or member 'half_full_count_last' not described in 'i915_perf_stream'
./drivers/gpu/drm/i915/i915_perf_types.h:240: warning: Incorrect use of kernel-doc format:          * half_full_count_last: Copy of the atomic half_full_count that was
./drivers/gpu/drm/i915/i915_perf_types.h:240: warning: Incorrect use of kernel-doc format:          * half_full_count_last: Copy of the atomic half_full_count that was
/home/cidrm/kernel/Documentation/gpu/i915.rst:610: WARNING: duplicate label gpu/i915:layout, other instance in /home/cidrm/kernel/Documentation/gpu/i915.rst

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/perf: add OA interrupt support (rev9)
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (4 preceding siblings ...)
  2020-04-14 19:28 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
@ 2020-04-14 19:33 ` Patchwork
  2020-04-15 11:50 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  6 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-04-14 19:33 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev9)
URL   : https://patchwork.freedesktop.org/series/54280/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8298 -> Patchwork_17287
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/index.html

Known issues
------------

  Here are the changes found in Patchwork_17287 that come from known issues:

### IGT changes ###

#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-x1275:       [SKIP][1] ([fdo#109271]) -> [FAIL][2] ([i915#62])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62


Participating hosts (48 -> 44)
------------------------------

  Missing    (4): fi-byt-clapper fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8298 -> Patchwork_17287

  CI-20190529: 20190529
  CI_DRM_8298: 17f82f0c2857d0b442adbdb62eb44b61d0f5b775 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5589: 31962324ac86f029e2841e56e97c42cf9d572956 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17287: 4ff3bedffc2fc1a27ffdeb74a023a46fd04a53a6 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

4ff3bedffc2f drm/i915/perf: add interrupt enabling parameter
260dadb836a3 drm/i915: handle interrupts from the OA unit
e5020136b3c9 drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-14 19:09         ` Dixit, Ashutosh
@ 2020-04-14 19:37           ` Dixit, Ashutosh
  0 siblings, 0 replies; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-04-14 19:37 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Tue, 14 Apr 2020 12:09:42 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> > On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> > >>
> > >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> > >> >
> > >> > A condition in wait_event_interruptible seems to be checked twice before
> > >> > waiting on the event to occur. These checks are redundant when hrtimer
> > >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > >> > pointers. The redundant checks add cpu overhead. Simplify the check
> > >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> > >> >
> > >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > >> > ---
> > >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> > >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> > >> >	return pollin;
> > >> >  }
> > >> >
> > >> > +/**
> > >> > + * oa_buffer_check_reports - quick check if reports are available
> > >> > + * @stream: i915 stream instance
> > >> > + *
> > >> > + * The return from this function is used as a condition for
> > >> > + * wait_event_interruptible in blocking read. This is used to detect
> > >> > + * available reports.
> > >> > + *
> > >> > + * A condition in wait_event_interruptible seems to be checked twice before
> > >> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> > >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> > >> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> > >> > + * overhead.
> > >> > + */
> > >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > >> > +{
> > >> > +	unsigned long flags;
> > >> > +	bool available;
> > >> > +
> > >> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > >> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > >> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > >> > +
> > >> > +	return available;
> > >> > +}
> > >> > +
> > >> >  /**
> > >> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> > >> >   * @stream: An i915-perf stream opened for OA metrics
> > >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> > >> >		return -EIO;
> > >> >
> > >> >	return wait_event_interruptible(stream->poll_wq,
> > >> > -					oa_buffer_check_unlocked(stream));
> > >> > +					oa_buffer_check_reports(stream));
> > >> >  }
> > >>
> > >> I think the problem with this patch is that the original code had the
> > >> property that the condition for data availability is checked (and the OA
> > >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> > >> there was data there was no need to wait at all. This change has lost that
> > >> property. With this change we will first always enter the wait and then get
> > >> unblocked after the timer interrupt which will advance the tail and wake us
> > >> up.
> > >>
> > >> I think if we want to do this, it is possible to do without losing the
> > >> original property. Just call oa_buffer_check_unlocked() first (outside
> > >> wait_event) and if there is data just return. If not, put yourself on
> > >> stream->poll_wq from which the timer callback will wake us up. I think this
> > >> is going to be something like prepare_to_wait() / schedule() /
> > >> finish_wait() pattern of which there are numerous examples in the
> > >> kernel. So in this case we won't call wait_event_interruptible() which
> > >> checks the condition upon waking up. No need to define
> > >> oa_buffer_check_reports() either.
> > >
> > > If on the other hand we say that this should actually be the desired
> > > behavior of the blocking read, that it should not unblock immediately but
> > > only after the next timer interrupt (so that an app can call the blocking
> > > read repeatedly without worrying about it will return a little bit of data
> > > in each call at a considerable amount of CPU load), then we may be able to
> > > do something like this, but even then we may have to think about what would
> > > be the correct way to do that. Though this may be that correct way and we
> > > may just need to change the commit message, but is that what we want?
> >
> > I am not quite clear on how the blocking read should behave in terms of the
> > API itself.
> >
> > The change here is solely to reduce cpu overhead from the additional 2
> > calls to oa_buffer_check_unlocked before blocking and that would happen on
> > every call to the read. I agree that the read would block if the timer did
> > not update the tail yet, but that makes sense in a way that we don't want
> > read to constantly return data when the OA sampling rates are high (say 100
> > us). With this change the behavior becomes consistent irrespective of the
> > OA sampling rate and user has the flexibility to set the POLL_OA_PERIOD to
> > whatever value is suitable for the OA sampling rate chosen.
>
> I think the patch commit message should clearly state the reason for the
> patch. Afais if the purpose of the patch is to reduce cpu overhead, the
> patch should not change API behavior. If on the other hand the purpose of
> the patch is to make the API behavior consistent, then please resubmit the
> patch after modifying the commit message to reflect the purpose of the
> patch so that the patch can be reviewed from that point of view.
>
> Also I think we should be able to just use stream->pollin instead of
> defining the new oa_buffer_check_reports(). E.g. isn't
> stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ?

Actually yes, using stream->pollin will make the wait for both blocking and
non-blocking reads identical. So the blocking wait just becomes:

static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
{
	...

        return wait_event_interruptible(stream->poll_wq, stream->pollin);
}

That is if we re-purpose the patch to make the API behavior consistent
across blocking and non-blocking reads.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads Umesh Nerlige Ramappa
  2020-04-14  5:08   ` Dixit, Ashutosh
@ 2020-04-15 10:00   ` Lionel Landwerlin
  2020-04-15 18:55     ` Umesh Nerlige Ramappa
  1 sibling, 1 reply; 19+ messages in thread
From: Lionel Landwerlin @ 2020-04-15 10:00 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx, Ashutosh Dixit

On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
> A condition in wait_event_interruptible seems to be checked twice before
> waiting on the event to occur. These checks are redundant when hrtimer
> events will call oa_buffer_check_unlocked to update the oa_buffer tail
> pointers. The redundant checks add cpu overhead. Simplify the check
> to reduce cpu overhead when using blocking io to read oa buffer reports.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>


I cherry picked this patch alone and it seems to break the 
disabled-read-error test.


> ---
>   drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>   1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 5cde3e4e7be6..e28a3ab83fde 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	return pollin;
>   }
>   
> +/**
> + * oa_buffer_check_reports - quick check if reports are available
> + * @stream: i915 stream instance
> + *
> + * The return from this function is used as a condition for
> + * wait_event_interruptible in blocking read. This is used to detect
> + * available reports.
> + *
> + * A condition in wait_event_interruptible seems to be checked twice before
> + * waiting on an event to occur. These checks are redundant when hrtimer events
> + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> + * overhead.
> + */
> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> +{
> +	unsigned long flags;
> +	bool available;
> +
> +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> +
> +	return available;
> +}
> +
>   /**
>    * append_oa_status - Appends a status record to a userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics
> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>   		return -EIO;
>   
>   	return wait_event_interruptible(stream->poll_wq,
> -					oa_buffer_check_unlocked(stream));
> +					oa_buffer_check_reports(stream));
>   }
>   
>   /**


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/perf: add OA interrupt support (rev9)
  2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
                   ` (5 preceding siblings ...)
  2020-04-14 19:33 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-04-15 11:50 ` Patchwork
  6 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-04-15 11:50 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/perf: add OA interrupt support (rev9)
URL   : https://patchwork.freedesktop.org/series/54280/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8298_full -> Patchwork_17287_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_17287_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_balancer@hang:
    - shard-tglb:         [PASS][1] -> [FAIL][2] ([i915#1277])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-tglb2/igt@gem_exec_balancer@hang.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-tglb1/igt@gem_exec_balancer@hang.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x42-offscreen:
    - shard-apl:          [PASS][3] -> [FAIL][4] ([i915#54] / [i915#95])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl1/igt@kms_cursor_crc@pipe-a-cursor-128x42-offscreen.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl1/igt@kms_cursor_crc@pipe-a-cursor-128x42-offscreen.html

  * igt@kms_cursor_crc@pipe-b-cursor-suspend:
    - shard-apl:          [PASS][5] -> [DMESG-WARN][6] ([i915#180]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl6/igt@kms_cursor_crc@pipe-b-cursor-suspend.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl2/igt@kms_cursor_crc@pipe-b-cursor-suspend.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-kbl:          [PASS][7] -> [DMESG-WARN][8] ([i915#180]) +3 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-kbl3/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-kbl4/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-skl:          [PASS][9] -> [FAIL][10] ([i915#79])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-skl7/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-skl3/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
    - shard-skl:          [PASS][11] -> [INCOMPLETE][12] ([i915#69])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-skl9/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-skl2/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
    - shard-apl:          [PASS][13] -> [DMESG-WARN][14] ([i915#180] / [i915#95])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
    - shard-kbl:          [PASS][15] -> [DMESG-WARN][16] ([i915#180] / [i915#93] / [i915#95])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-kbl3/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-kbl4/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][17] -> [FAIL][18] ([fdo#108145] / [i915#265]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-skl3/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-skl10/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_psr@psr2_primary_page_flip:
    - shard-iclb:         [PASS][19] -> [SKIP][20] ([fdo#109441])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-iclb2/igt@kms_psr@psr2_primary_page_flip.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-iclb6/igt@kms_psr@psr2_primary_page_flip.html

  
#### Possible fixes ####

  * igt@gem_exec_params@invalid-bsd-ring:
    - shard-iclb:         [SKIP][21] ([fdo#109276]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-iclb7/igt@gem_exec_params@invalid-bsd-ring.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-iclb2/igt@gem_exec_params@invalid-bsd-ring.html

  * {igt@gem_wait@write-wait@all}:
    - shard-skl:          [FAIL][23] ([i915#1676]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-skl2/igt@gem_wait@write-wait@all.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-skl7/igt@gem_wait@write-wait@all.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-apl:          [DMESG-WARN][25] ([i915#716]) -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl3/igt@gen9_exec_parse@allowed-all.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl2/igt@gen9_exec_parse@allowed-all.html

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
    - shard-glk:          [FAIL][27] ([i915#72]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-glk6/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-glk1/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          [FAIL][29] ([i915#79]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-skl8/igt@kms_flip@flip-vs-expired-vblank.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-skl3/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-kbl:          [DMESG-WARN][31] ([i915#180]) -> [PASS][32] +1 similar issue
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-kbl3/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-kbl4/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_psr@psr2_cursor_render:
    - shard-iclb:         [SKIP][33] ([fdo#109441]) -> [PASS][34] +3 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-iclb7/igt@kms_psr@psr2_cursor_render.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-iclb2/igt@kms_psr@psr2_cursor_render.html

  * igt@kms_vblank@pipe-b-ts-continuation-suspend:
    - shard-apl:          [DMESG-WARN][35] ([i915#180]) -> [PASS][36] +3 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl4/igt@kms_vblank@pipe-b-ts-continuation-suspend.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl4/igt@kms_vblank@pipe-b-ts-continuation-suspend.html

  * {igt@perf@blocking-parameterized}:
    - shard-glk:          [FAIL][37] ([i915#1542]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-glk4/igt@perf@blocking-parameterized.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-glk2/igt@perf@blocking-parameterized.html

  * {igt@sysfs_heartbeat_interval@mixed@vcs0}:
    - shard-skl:          [INCOMPLETE][39] ([i915#198]) -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-skl7/igt@sysfs_heartbeat_interval@mixed@vcs0.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-skl3/igt@sysfs_heartbeat_interval@mixed@vcs0.html

  
#### Warnings ####

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-kbl:          [DMESG-FAIL][41] ([i915#180] / [i915#95]) -> [FAIL][42] ([i915#93] / [i915#95])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-kbl6/igt@kms_fbcon_fbt@fbc-suspend.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-kbl7/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-7efc:
    - shard-kbl:          [FAIL][43] ([fdo#108145] / [i915#265] / [i915#93] / [i915#95]) -> [FAIL][44] ([fdo#108145] / [i915#265])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-kbl2/igt@kms_plane_alpha_blend@pipe-c-alpha-7efc.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-kbl3/igt@kms_plane_alpha_blend@pipe-c-alpha-7efc.html
    - shard-apl:          [FAIL][45] ([fdo#108145] / [i915#265] / [i915#95]) -> [FAIL][46] ([fdo#108145] / [i915#265])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl4/igt@kms_plane_alpha_blend@pipe-c-alpha-7efc.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl3/igt@kms_plane_alpha_blend@pipe-c-alpha-7efc.html

  * igt@runner@aborted:
    - shard-apl:          ([FAIL][47], [FAIL][48]) ([i915#1423] / [i915#716]) -> [FAIL][49] ([i915#1423])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl3/igt@runner@aborted.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8298/shard-apl3/igt@runner@aborted.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/shard-apl1/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [i915#1277]: https://gitlab.freedesktop.org/drm/intel/issues/1277
  [i915#1423]: https://gitlab.freedesktop.org/drm/intel/issues/1423
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#1676]: https://gitlab.freedesktop.org/drm/intel/issues/1676
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#69]: https://gitlab.freedesktop.org/drm/intel/issues/69
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#72]: https://gitlab.freedesktop.org/drm/intel/issues/72
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#93]: https://gitlab.freedesktop.org/drm/intel/issues/93
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8298 -> Patchwork_17287

  CI-20190529: 20190529
  CI_DRM_8298: 17f82f0c2857d0b442adbdb62eb44b61d0f5b775 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5589: 31962324ac86f029e2841e56e97c42cf9d572956 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17287: 4ff3bedffc2fc1a27ffdeb74a023a46fd04a53a6 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17287/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-15 10:00   ` Lionel Landwerlin
@ 2020-04-15 18:55     ` Umesh Nerlige Ramappa
  2020-04-15 19:16       ` Lionel Landwerlin
  0 siblings, 1 reply; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-15 18:55 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>A condition in wait_event_interruptible seems to be checked twice before
>>waiting on the event to occur. These checks are redundant when hrtimer
>>events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>pointers. The redundant checks add cpu overhead. Simplify the check
>>to reduce cpu overhead when using blocking io to read oa buffer reports.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
>
>I cherry picked this patch alone and it seems to break the 
>disabled-read-error test.

Strange. I don't see it fail on my CFL. I am apply this on the latest 
drm-tip from yesterday.

The patch still checks if reports are available before blocking.  The 
behavior should still be the same w.r.t this test.

What machine did you run it on? I will try on the same. Any chance you 
have the debug output from the test?

Thanks,
Umesh

>
>
>>---
>>  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>  1 file changed, 27 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 5cde3e4e7be6..e28a3ab83fde 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  	return pollin;
>>  }
>>+/**
>>+ * oa_buffer_check_reports - quick check if reports are available
>>+ * @stream: i915 stream instance
>>+ *
>>+ * The return from this function is used as a condition for
>>+ * wait_event_interruptible in blocking read. This is used to detect
>>+ * available reports.
>>+ *
>>+ * A condition in wait_event_interruptible seems to be checked twice before
>>+ * waiting on an event to occur. These checks are redundant when hrtimer events
>>+ * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
>>+ * redundant checks add cpu overhead. We simplify the check to reduce cpu
>>+ * overhead.
>>+ */
>>+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>+{
>>+	unsigned long flags;
>>+	bool available;
>>+
>>+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>+	available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>+
>>+	return available;
>>+}
>>+
>>  /**
>>   * append_oa_status - Appends a status record to a userspace read() buffer.
>>   * @stream: An i915-perf stream opened for OA metrics
>>@@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>>  		return -EIO;
>>  	return wait_event_interruptible(stream->poll_wq,
>>-					oa_buffer_check_unlocked(stream));
>>+					oa_buffer_check_reports(stream));
>>  }
>>  /**
>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-15 18:55     ` Umesh Nerlige Ramappa
@ 2020-04-15 19:16       ` Lionel Landwerlin
  2020-04-15 22:47         ` Umesh Nerlige Ramappa
  0 siblings, 1 reply; 19+ messages in thread
From: Lionel Landwerlin @ 2020-04-15 19:16 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On 15/04/2020 21:55, Umesh Nerlige Ramappa wrote:
> On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>> On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>> A condition in wait_event_interruptible seems to be checked twice 
>>> before
>>> waiting on the event to occur. These checks are redundant when hrtimer
>>> events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>> pointers. The redundant checks add cpu overhead. Simplify the check
>>> to reduce cpu overhead when using blocking io to read oa buffer 
>>> reports.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>
>>
>> I cherry picked this patch alone and it seems to break the 
>> disabled-read-error test.
>
> Strange. I don't see it fail on my CFL. I am apply this on the latest 
> drm-tip from yesterday.
>
> The patch still checks if reports are available before blocking. The 
> behavior should still be the same w.r.t this test.
>
> What machine did you run it on? I will try on the same. Any chance you 
> have the debug output from the test?
>
> Thanks,
> Umesh
>

I got that on SKL GT4 : http://paste.debian.net/1140604/


Thanks,


-Lionel


>>
>>
>>> ---
>>>  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>>  1 file changed, 27 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 5cde3e4e7be6..e28a3ab83fde 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>      return pollin;
>>>  }
>>> +/**
>>> + * oa_buffer_check_reports - quick check if reports are available
>>> + * @stream: i915 stream instance
>>> + *
>>> + * The return from this function is used as a condition for
>>> + * wait_event_interruptible in blocking read. This is used to detect
>>> + * available reports.
>>> + *
>>> + * A condition in wait_event_interruptible seems to be checked 
>>> twice before
>>> + * waiting on an event to occur. These checks are redundant when 
>>> hrtimer events
>>> + * will call oa_buffer_check_unlocked to update the oa_buffer tail 
>>> pointers. The
>>> + * redundant checks add cpu overhead. We simplify the check to 
>>> reduce cpu
>>> + * overhead.
>>> + */
>>> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>> +{
>>> +    unsigned long flags;
>>> +    bool available;
>>> +
>>> +    spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>> +    available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>> + spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>> +
>>> +    return available;
>>> +}
>>> +
>>>  /**
>>>   * append_oa_status - Appends a status record to a userspace read() 
>>> buffer.
>>>   * @stream: An i915-perf stream opened for OA metrics
>>> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
>>> i915_perf_stream *stream)
>>>          return -EIO;
>>>      return wait_event_interruptible(stream->poll_wq,
>>> -                    oa_buffer_check_unlocked(stream));
>>> +                    oa_buffer_check_reports(stream));
>>>  }
>>>  /**
>>
>>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads
  2020-04-15 19:16       ` Lionel Landwerlin
@ 2020-04-15 22:47         ` Umesh Nerlige Ramappa
  0 siblings, 0 replies; 19+ messages in thread
From: Umesh Nerlige Ramappa @ 2020-04-15 22:47 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

On Wed, Apr 15, 2020 at 10:16:59PM +0300, Lionel Landwerlin wrote:
>On 15/04/2020 21:55, Umesh Nerlige Ramappa wrote:
>>On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>>>On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>>>A condition in wait_event_interruptible seems to be checked 
>>>>twice before
>>>>waiting on the event to occur. These checks are redundant when hrtimer
>>>>events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>>>pointers. The redundant checks add cpu overhead. Simplify the check
>>>>to reduce cpu overhead when using blocking io to read oa buffer 
>>>>reports.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>
>>>
>>>I cherry picked this patch alone and it seems to break the 
>>>disabled-read-error test.
>>
>>Strange. I don't see it fail on my CFL. I am apply this on the 
>>latest drm-tip from yesterday.
>>
>>The patch still checks if reports are available before blocking. The 
>>behavior should still be the same w.r.t this test.
>>
>>What machine did you run it on? I will try on the same. Any chance 
>>you have the debug output from the test?
>>
>>Thanks,
>>Umesh
>>
>
>I got that on SKL GT4 : http://paste.debian.net/1140604/
>

Fails always on SKL GT4. Thanks for catching this :)

Also fails without this patch if this test were to use NONBLOCKing IO.

This has to do with being able to read reports at sampling frequencies 
as high as 5 us (on some platforms, I guess).

Will look into it further and repost the series. Let me know if you have 
any other thoughts on this.

Thanks,
Umesh

>
>Thanks,
>
>
>-Lionel
>
>
>>>
>>>
>>>>---
>>>> drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>b/drivers/gpu/drm/i915/i915_perf.c
>>>>index 5cde3e4e7be6..e28a3ab83fde 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>@@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>     return pollin;
>>>> }
>>>>+/**
>>>>+ * oa_buffer_check_reports - quick check if reports are available
>>>>+ * @stream: i915 stream instance
>>>>+ *
>>>>+ * The return from this function is used as a condition for
>>>>+ * wait_event_interruptible in blocking read. This is used to detect
>>>>+ * available reports.
>>>>+ *
>>>>+ * A condition in wait_event_interruptible seems to be checked 
>>>>twice before
>>>>+ * waiting on an event to occur. These checks are redundant 
>>>>when hrtimer events
>>>>+ * will call oa_buffer_check_unlocked to update the oa_buffer 
>>>>tail pointers. The
>>>>+ * redundant checks add cpu overhead. We simplify the check to 
>>>>reduce cpu
>>>>+ * overhead.
>>>>+ */
>>>>+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>>>+{
>>>>+    unsigned long flags;
>>>>+    bool available;
>>>>+
>>>>+    spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>>>+    available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>>>+ spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>>>+
>>>>+    return available;
>>>>+}
>>>>+
>>>> /**
>>>>  * append_oa_status - Appends a status record to a userspace 
>>>>read() buffer.
>>>>  * @stream: An i915-perf stream opened for OA metrics
>>>>@@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>         return -EIO;
>>>>     return wait_event_interruptible(stream->poll_wq,
>>>>-                    oa_buffer_check_unlocked(stream));
>>>>+                    oa_buffer_check_reports(stream));
>>>> }
>>>> /**
>>>
>>>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit Umesh Nerlige Ramappa
@ 2020-04-17 23:25   ` Dixit, Ashutosh
  0 siblings, 0 replies; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-04-17 23:25 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 13 Apr 2020 08:48:21 -0700, Umesh Nerlige Ramappa wrote:
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index 0cc7dd54f4f9..61eee9fb8872 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -94,6 +94,18 @@ gen11_gt_engine_identity(struct intel_gt *gt,
>	return ident;
>  }
>
> +static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915)
> +{
> +	atomic64_inc(&i915->perf.exclusive_stream->half_full_count);
> +	wake_up_all(&i915->perf.exclusive_stream->poll_wq);
> +}
> +

I was expecting this function to be almost the same as the timer
oa_poll_check_timer_cb(), something like, maybe with minor variations:

static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915)
{
        struct i915_perf_stream *stream = i915->perf.exclusive_stream;

	if (oa_buffer_check_unlocked(stream)) {
                stream->pollin = true;
                wake_up(&stream->poll_wq);
        }
}

And consequently I was expecting to see zero changes to functions such as
oa_buffer_check_unlocked() and i915_perf_poll_locked() since everything
else is driven off stream->pollin as it is in case the timer callback.

So my question is why is notify_perfmon_buffer_half_full() not as I've
written above and what purpose are these new members half_full_count and
half_full_count_last serving? If it is to save a few cycles to adjust the
tail in oa_buffer_check_unlocked() (and I am not even sure of that) for an
interrupt which fires when half the buffer is full imo it is not worth it.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter
  2020-04-13 15:48 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter Umesh Nerlige Ramappa
@ 2020-04-17 23:26   ` Dixit, Ashutosh
  0 siblings, 0 replies; 19+ messages in thread
From: Dixit, Ashutosh @ 2020-04-17 23:26 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

On Mon, 13 Apr 2020 08:48:22 -0700, Umesh Nerlige Ramappa wrote:
>
> @@ -556,16 +559,23 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   * waiting on an event to occur. These checks are redundant when hrtimer events
>   * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
>   * redundant checks add cpu overhead. We simplify the check to reduce cpu
> - * overhead.
> + * overhead. For interrupt events, we still need to make sure that
> + * oa_buffer_check_unlocked is called when an interrupt occurs.
>   */
>  static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>  {
>	unsigned long flags;
>	bool available;
>
> -	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> -	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> -	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> +	if (!stream->oa_interrupt_monitor) {
> +		spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> +		available = stream->oa_buffer.tail != stream->oa_buffer.head;
> +		spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> +	} else {
> +		if (stream->half_full_count_last !=
> +		    atomic64_read(&stream->half_full_count))
> +			available = oa_buffer_check_unlocked(stream);
> +	}

This logic is broken if we have both the interrupt and the timer enabled?
Anyway as I said for Patch 1, oa_buffer_check_reports() should not be
needed (and hopefully neither the half_full_count's as per the comment in
Patch 2).

> @@ -3710,13 +3736,16 @@ static int read_properties_unlocked(struct i915_perf *perf,
>			break;
>		}
>		case DRM_I915_PERF_PROP_POLL_OA_PERIOD:
> -			if (value < 100000 /* 100us */) {
> +			if (value > 0 && value < 100000 /* 100us */) {
>				DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n",
>					  value);
>				return -EINVAL;
>			}
>			props->poll_oa_period = value;
>			break;
> +		case DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT:
> +			props->oa_interrupt_monitor = value != 0;

At one point it was fashionable to write this as '= !!value' but I believe
even that is not needed now and this can be written as '= value'.

> @@ -3725,6 +3754,19 @@ static int read_properties_unlocked(struct i915_perf *perf,
>		uprop += 2;
>	}
>
> +	/*
> +	 * Blocking read need to be waken up by some mechanism. If no polling
> +	 * of the HEAD/TAIL register is done by the kernel and no interrupt is
> +	 * enabled, we'll never be able to wake up.
> +	 */
> +	if ((open_flags & I915_PERF_FLAG_FD_NONBLOCK) == 0 &&
> +	    !props->poll_oa_period &&
> +	    !props->oa_interrupt_monitor) {
> +		DRM_DEBUG("Requesting a blocking stream with no polling period "
> +			  "& no interrupt.\n");
> +		return -EINVAL;
> +	}

Shouldn't this be true for non-blocking reads too since the poll() can
block indefinitely if both timer and interrupt are disabled? I think we
should disallow disabling both in either case.

> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 14b67cd6b54b..947e65b937eb 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1987,12 +1987,23 @@ enum drm_i915_perf_property_id {
>	 * the driver if this parameter is not specified. Note that larger timer
>	 * values will reduce cpu consumption during OA perf captures. However,
>	 * excessively large values would potentially result in OA buffer
> -	 * overwrites as captures reach end of the OA buffer.
> +	 * overwrites as captures reach end of the OA buffer. A value of 0 means
> +	 * no hrtimer will be started.
>	 *
>	 * This property is available in perf revision 5.
>	 */
>	DRM_I915_PERF_PROP_POLL_OA_PERIOD,
>
> +	/**
> +	 * Specifying this property sets up the interrupt mechanism for the OA
> +	 * buffer in i915. This option in conjunction with a long polling period
> +	 * for avaibility of OA data can reduce CPU load significantly if you
> +	 * do not care about OA data being read as soon as it's available.
> +	 *
> +	 * This property is available in perf revision 6.
> +	 */
> +	DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT,

I am still torn about exposing this new control to userspace. If we don't
we can have the interrupt always enabled and just document that the timer
can be disabled and when the timer is disabled the the OA sampling will be
driven off the interrupt. Anyway, if we decide that exposing this to user
space is better or more explicit, I'll go with it.

In either case, we need to add to the documentation above that the
interrupt fires when the OA buffer gets half filled so the user can
estimate the time between interrupts and decide if they want to use the
timer together with the interrupt.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-04-17 23:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-13 15:48 [Intel-gfx] [PATCH 0/3] drm/i915/perf: add OA interrupt support Umesh Nerlige Ramappa
2020-04-13 15:48 ` [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads Umesh Nerlige Ramappa
2020-04-14  5:08   ` Dixit, Ashutosh
2020-04-14  6:58     ` Dixit, Ashutosh
2020-04-14 16:59       ` Umesh Nerlige Ramappa
2020-04-14 19:09         ` Dixit, Ashutosh
2020-04-14 19:37           ` Dixit, Ashutosh
2020-04-15 10:00   ` Lionel Landwerlin
2020-04-15 18:55     ` Umesh Nerlige Ramappa
2020-04-15 19:16       ` Lionel Landwerlin
2020-04-15 22:47         ` Umesh Nerlige Ramappa
2020-04-13 15:48 ` [Intel-gfx] [PATCH 2/3] drm/i915: handle interrupts from the OA unit Umesh Nerlige Ramappa
2020-04-17 23:25   ` Dixit, Ashutosh
2020-04-13 15:48 ` [Intel-gfx] [PATCH 3/3] drm/i915/perf: add interrupt enabling parameter Umesh Nerlige Ramappa
2020-04-17 23:26   ` Dixit, Ashutosh
2020-04-14 19:09 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/perf: add OA interrupt support (rev9) Patchwork
2020-04-14 19:28 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2020-04-14 19:33 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-04-15 11:50 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.