* ksoftirqd avoidance
@ 2018-05-31 18:51 Chris Wilson
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

This is the same as the last posting, but hopefully now all the
groundwork is in place so that we get through CI cleanly (gem_eio is
the bane of my existence).

The goal of the patchset is to eliminate the high latencies caused by
execlists_submission_tasklet being deferred to ksoftirqd and scheduled
as a normal task. This is achieved by writing to the ELSP directly from
the process submitting the new work. From irq context, we continue to
use the tasklet so that we don't starve other users of the system: here
we expect to keep the hw supplied with a pair of contexts, so that it
has a request to work on as we queue the next, and a slight delay in
our processing of the irq should not result in a GPU stall.
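
In outline, the split looks something like this (a simplified sketch
only; the names approximate those used in intel_lrc.c by the end of
the series, see patch 7 for the real code):

  /* Process context: write the new request straight to the ELSP. */
  static void submit_queue(struct intel_engine_cs *engine)
  {
  	/* called under engine->timeline.lock */
  	__execlists_submission_tasklet(engine);
  }

  /* Interrupt context: defer CSB processing to the tasklet. */
  static void gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
  {
  	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
  		tasklet_hi_schedule(&engine->execlists.tasklet);
  }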

The consequence of rearranging the work to be split between different
contexts is that we end up with even more under an irqsoff spinlock. To
limit the impact on the rest of the system (as well as the contention
between ourselves), we first make sure that we eliminate all
unnecessary mmio, and especially the forcewake. Later we apply some
more microoptimisations -- that aren't so micro at the microsecond
latency level!
-Chris



* [PATCH 01/11] drm/i915: Be irqsafe inside reset
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:51 ` Chris Wilson
  2018-06-01 13:44   ` Tvrtko Ursulin
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

As we want to be able to call i915_reset_engine and co from a softirq
or timer context, we need to be irqsafe at all times. So we have to
forgo the simple spin_lock_irq for the full spin_lock_irqsave.
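
For reference, the semantic difference being relied upon (a generic
sketch of the kernel spinlock API, not code from this patch):

  #include <linux/spinlock.h>

  static void example(spinlock_t *lock)
  {
  	unsigned long flags;

  	/* Unconditionally re-enables interrupts on unlock: only safe
  	 * when interrupts are known to be enabled on entry. */
  	spin_lock_irq(lock);
  	spin_unlock_irq(lock);

  	/* Saves and restores the interrupt state instead, so it is
  	 * safe from softirq/timer context, or under another irqsave
  	 * lock, where interrupts may already be disabled. */
  	spin_lock_irqsave(lock, flags);
  	spin_unlock_irqrestore(lock, flags);
  }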

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f5c4ef052001..550fa8288c45 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3142,15 +3142,17 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
 		 */
 		request = i915_gem_find_active_request(engine);
 		if (request) {
+			unsigned long flags;
+
 			i915_gem_context_mark_innocent(request->gem_context);
 			dma_fence_set_error(&request->fence, -EAGAIN);
 
 			/* Rewind the engine to replay the incomplete rq */
-			spin_lock_irq(&engine->timeline.lock);
+			spin_lock_irqsave(&engine->timeline.lock, flags);
 			request = list_prev_entry(request, link);
 			if (&request->link == &engine->timeline.requests)
 				request = NULL;
-			spin_unlock_irq(&engine->timeline.lock);
+			spin_unlock_irqrestore(&engine->timeline.lock, flags);
 		}
 	}
 
-- 
2.17.1


* [PATCH 02/11] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:51 ` Chris Wilson
  2018-06-01 13:54   ` Tvrtko Ursulin
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

We can avoid the mmio read of the CSB pointers after reset, based on
the knowledge that the HW always starts writing at entry 0 in the CSB
buffer. We need to reset our CSB head tracking after GPU reset (and on
sanitization after resume) so that we expect to read from entry 0.
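
The trick is in the pre-increment of the CSB walk: setting the cached
head to the last entry means the first entry we consume is entry 0
(a sketch of the index arithmetic, using the constants from the diff):

  head = GEN8_CSB_ENTRIES - 1;	/* set at init and on reset */

  /* later, in process_csb(), advancing before each read: */
  if (++head == GEN8_CSB_ENTRIES)
  	head = 0;		/* first entry read is entry 0 */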

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 517e92c6a70b..e5cf049c18f8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -965,22 +965,19 @@ static void process_csb(struct intel_engine_cs *engine)
 			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
-		if (unlikely(execlists->csb_use_mmio)) {
-			buf = (u32 * __force)
-				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
-			execlists->csb_head = -1; /* force mmio read of CSB */
-		}
-
 		/* Clear before reading to catch new interrupts */
 		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 		smp_mb__after_atomic();
 
-		if (unlikely(execlists->csb_head == -1)) { /* after a reset */
+		if (unlikely(execlists->csb_use_mmio)) {
 			if (!fw) {
 				intel_uncore_forcewake_get(i915, execlists->fw_domains);
 				fw = true;
 			}
 
+			buf = (u32 * __force)
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+
 			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 			tail = GEN8_CSB_WRITE_PTR(head);
 			head = GEN8_CSB_READ_PTR(head);
@@ -1959,7 +1956,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	spin_unlock(&engine->timeline.lock);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
-	engine->execlists.csb_head = -1;
+	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
 
 	local_irq_restore(flags);
 
@@ -2460,7 +2457,7 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 			upper_32_bits(ce->lrc_desc);
 	}
 
-	engine->execlists.csb_head = -1;
+	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
 
 	return 0;
 
-- 
2.17.1


* [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:51 ` Chris Wilson
  2018-06-01 14:02   ` Tvrtko Ursulin
  2018-06-04 14:19   ` Tvrtko Ursulin
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will begin processing the CSB from inside the
interrupt handler. This means that updating execlists->port[] will
no longer be serialised by the tasklet but by the engine->timeline.lock
instead. Pull dequeue and submit under the same lock for protection.
(An alternative, future, plan is to keep the in/out arrays separate for
concurrent processing and reduced lock coverage.)
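
Schematically, the locking changes from (simplified from the diff
below):

  spin_lock_irqsave(&engine->timeline.lock, flags);
  submit = __execlists_dequeue(engine);
  spin_unlock_irqrestore(&engine->timeline.lock, flags);
  if (submit)
  	execlists_submit_ports(engine);	/* outside the lock */

to:

  spin_lock_irqsave(&engine->timeline.lock, flags);
  __execlists_dequeue(engine);	/* now submits the ports itself */
  spin_unlock_irqrestore(&engine->timeline.lock, flags);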

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e5cf049c18f8..d207a1bf9dc9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -562,7 +562,7 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
-static bool __execlists_dequeue(struct intel_engine_cs *engine)
+static void __execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -617,11 +617,11 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 * the HW to indicate that it has had a chance to respond.
 		 */
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
-			return false;
+			return;
 
 		if (need_preempt(engine, last, execlists->queue_priority)) {
 			inject_preempt_context(engine);
-			return false;
+			return;
 		}
 
 		/*
@@ -646,7 +646,7 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 * priorities of the ports haven't been switch.
 		 */
 		if (port_count(&port[1]))
-			return false;
+			return;
 
 		/*
 		 * WaIdleLiteRestore:bdw,skl
@@ -746,8 +746,10 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		port != execlists->port ? rq_prio(last) : INT_MIN;
 
 	execlists->first = rb;
-	if (submit)
+	if (submit) {
 		port_assign(port, last);
+		execlists_submit_ports(engine);
+	}
 
 	/* We must always keep the beast fed if we have work piled up */
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
@@ -756,24 +758,19 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 	if (last)
 		execlists_user_begin(execlists, execlists->port);
 
-	return submit;
+	/* If the engine is now idle, so should be the flag; and vice versa. */
+	GEM_BUG_ON(execlists_is_active(&engine->execlists,
+				       EXECLISTS_ACTIVE_USER) ==
+		   !port_isset(engine->execlists.port));
 }
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	unsigned long flags;
-	bool submit;
 
 	spin_lock_irqsave(&engine->timeline.lock, flags);
-	submit = __execlists_dequeue(engine);
+	__execlists_dequeue(engine);
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
-
-	if (submit)
-		execlists_submit_ports(engine);
-
-	GEM_BUG_ON(port_isset(execlists->port) &&
-		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
 }
 
 void
@@ -1156,11 +1153,6 @@ static void execlists_submission_tasklet(unsigned long data)
 
 	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
 		execlists_dequeue(engine);
-
-	/* If the engine is now idle, so should be the flag; and vice versa. */
-	GEM_BUG_ON(execlists_is_active(&engine->execlists,
-				       EXECLISTS_ACTIVE_USER) ==
-		   !port_isset(engine->execlists.port));
 }
 
 static void queue_request(struct intel_engine_cs *engine,
-- 
2.17.1


* [PATCH 04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:51 ` Chris Wilson
  2018-06-04 14:25   ` Tvrtko Ursulin
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

In the following patch, we will process the CSB interrupt under the
timeline.lock and not under the tasklet lock. This also means that we
will need to protect access to common variables, such as
execlists->csb_head, with the timeline.lock during reset.
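
In short (condensed from the diff below), the split irq-off/lock
pattern

  local_irq_save(flags);
  ...
  spin_lock(&engine->timeline.lock);
  __unwind_incomplete_requests(engine);
  spin_unlock(&engine->timeline.lock);
  ...
  local_irq_restore(flags);

is fused into a single spin_lock_irqsave()/spin_unlock_irqrestore()
pair covering the whole reset sequence, csb_head update included.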

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d207a1bf9dc9..ec54c29b610f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 		  engine->name, request ? request->global_seqno : 0,
 		  intel_engine_get_seqno(engine));
 
-	/* See execlists_cancel_requests() for the irq/spinlock split. */
-	local_irq_save(flags);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	/*
 	 * Catch up with any missed context-switch interrupts.
@@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	reset_irq(engine);
 
 	/* Push back any incomplete requests for replay after the reset. */
-	spin_lock(&engine->timeline.lock);
 	__unwind_incomplete_requests(engine);
-	spin_unlock(&engine->timeline.lock);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
 
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	/*
 	 * If the request was innocent, we leave the request in the ELSP
-- 
2.17.1


* [PATCH 05/11] drm/i915/execlists: Process one CSB interrupt at a time
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:51 ` Chris Wilson
  2018-06-04 14:27   ` Tvrtko Ursulin
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will process the CSB events directly from the CS
interrupt handler, which is called for each interrupt. Hence, we will
no longer need to loop until the has-interrupt bit is clear, and in the
meantime can remove that small optimisation.
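
Structurally (condensed from the diff below), process_csb() loses its
outer loop:

  /* before */
  do {
  	/* sample CSB head/tail, process the events in between */
  } while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));

  /* after: a single pass per invocation; events that land after we
   * sample the write pointer raise a fresh interrupt instead. */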

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 274 +++++++++++++++----------------
 1 file changed, 135 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ec54c29b610f..df2b7005df65 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -954,166 +954,162 @@ static void process_csb(struct intel_engine_cs *engine)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
 	struct drm_i915_private *i915 = engine->i915;
+
+	/* The HWSP contains a (cacheable) mirror of the CSB */
+	const u32 *buf =
+		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+	unsigned int head, tail;
 	bool fw = false;
 
-	do {
-		/* The HWSP contains a (cacheable) mirror of the CSB */
-		const u32 *buf =
-			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-		unsigned int head, tail;
+	/* Clear before reading to catch new interrupts */
+	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+	smp_mb__after_atomic();
 
-		/* Clear before reading to catch new interrupts */
-		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		smp_mb__after_atomic();
+	if (unlikely(execlists->csb_use_mmio)) {
+		intel_uncore_forcewake_get(i915, execlists->fw_domains);
+		fw = true;
 
-		if (unlikely(execlists->csb_use_mmio)) {
-			if (!fw) {
-				intel_uncore_forcewake_get(i915, execlists->fw_domains);
-				fw = true;
-			}
+		buf = (u32 * __force)
+			(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
 
-			buf = (u32 * __force)
-				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+		head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+		tail = GEN8_CSB_WRITE_PTR(head);
+		head = GEN8_CSB_READ_PTR(head);
+		execlists->csb_head = head;
+	} else {
+		const int write_idx =
+			intel_hws_csb_write_index(i915) -
+			I915_HWS_CSB_BUF0_INDEX;
 
-			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-			tail = GEN8_CSB_WRITE_PTR(head);
-			head = GEN8_CSB_READ_PTR(head);
-			execlists->csb_head = head;
-		} else {
-			const int write_idx =
-				intel_hws_csb_write_index(i915) -
-				I915_HWS_CSB_BUF0_INDEX;
+		head = execlists->csb_head;
+		tail = READ_ONCE(buf[write_idx]);
+		rmb(); /* Hopefully paired with a wmb() in HW */
+	}
+	GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
+		  engine->name,
+		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
+		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
 
-			head = execlists->csb_head;
-			tail = READ_ONCE(buf[write_idx]);
-			rmb(); /* Hopefully paired with a wmb() in HW */
-		}
-		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
-			  engine->name,
-			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
-			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
+	while (head != tail) {
+		struct i915_request *rq;
+		unsigned int status;
+		unsigned int count;
 
-		while (head != tail) {
-			struct i915_request *rq;
-			unsigned int status;
-			unsigned int count;
+		if (++head == GEN8_CSB_ENTRIES)
+			head = 0;
 
-			if (++head == GEN8_CSB_ENTRIES)
-				head = 0;
+		/*
+		 * We are flying near dragons again.
+		 *
+		 * We hold a reference to the request in execlist_port[]
+		 * but no more than that. We are operating in softirq
+		 * context and so cannot hold any mutex or sleep. That
+		 * prevents us stopping the requests we are processing
+		 * in port[] from being retired simultaneously (the
+		 * breadcrumb will be complete before we see the
+		 * context-switch). As we only hold the reference to the
+		 * request, any pointer chasing underneath the request
+		 * is subject to a potential use-after-free. Thus we
+		 * store all of the bookkeeping within port[] as
+		 * required, and avoid using unguarded pointers beneath
+		 * request itself. The same applies to the atomic
+		 * status notifier.
+		 */
 
-			/*
-			 * We are flying near dragons again.
-			 *
-			 * We hold a reference to the request in execlist_port[]
-			 * but no more than that. We are operating in softirq
-			 * context and so cannot hold any mutex or sleep. That
-			 * prevents us stopping the requests we are processing
-			 * in port[] from being retired simultaneously (the
-			 * breadcrumb will be complete before we see the
-			 * context-switch). As we only hold the reference to the
-			 * request, any pointer chasing underneath the request
-			 * is subject to a potential use-after-free. Thus we
-			 * store all of the bookkeeping within port[] as
-			 * required, and avoid using unguarded pointers beneath
-			 * request itself. The same applies to the atomic
-			 * status notifier.
-			 */
+		status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
+		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
+			  engine->name, head,
+			  status, buf[2*head + 1],
+			  execlists->active);
+
+		if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
+			      GEN8_CTX_STATUS_PREEMPTED))
+			execlists_set_active(execlists,
+					     EXECLISTS_ACTIVE_HWACK);
+		if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
+			execlists_clear_active(execlists,
+					       EXECLISTS_ACTIVE_HWACK);
+
+		if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
+			continue;
 
-			status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
-			GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
-				  engine->name, head,
-				  status, buf[2*head + 1],
-				  execlists->active);
-
-			if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
-				      GEN8_CTX_STATUS_PREEMPTED))
-				execlists_set_active(execlists,
-						     EXECLISTS_ACTIVE_HWACK);
-			if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
-				execlists_clear_active(execlists,
-						       EXECLISTS_ACTIVE_HWACK);
-
-			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
-				continue;
+		/* We should never get a COMPLETED | IDLE_ACTIVE! */
+		GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
 
-			/* We should never get a COMPLETED | IDLE_ACTIVE! */
-			GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
+		if (status & GEN8_CTX_STATUS_COMPLETE &&
+		    buf[2*head + 1] == execlists->preempt_complete_status) {
+			GEM_TRACE("%s preempt-idle\n", engine->name);
+			complete_preempt_context(execlists);
+			continue;
+		}
 
-			if (status & GEN8_CTX_STATUS_COMPLETE &&
-			    buf[2*head + 1] == execlists->preempt_complete_status) {
-				GEM_TRACE("%s preempt-idle\n", engine->name);
-				complete_preempt_context(execlists);
-				continue;
-			}
+		if (status & GEN8_CTX_STATUS_PREEMPTED &&
+		    execlists_is_active(execlists,
+					EXECLISTS_ACTIVE_PREEMPT))
+			continue;
 
-			if (status & GEN8_CTX_STATUS_PREEMPTED &&
-			    execlists_is_active(execlists,
-						EXECLISTS_ACTIVE_PREEMPT))
-				continue;
+		GEM_BUG_ON(!execlists_is_active(execlists,
+						EXECLISTS_ACTIVE_USER));
 
-			GEM_BUG_ON(!execlists_is_active(execlists,
-							EXECLISTS_ACTIVE_USER));
+		rq = port_unpack(port, &count);
+		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%d) (current %d), prio=%d\n",
+			  engine->name,
+			  port->context_id, count,
+			  rq ? rq->global_seqno : 0,
+			  rq ? rq->fence.context : 0,
+			  rq ? rq->fence.seqno : 0,
+			  intel_engine_get_seqno(engine),
+			  rq ? rq_prio(rq) : 0);
+
+		/* Check the context/desc id for this event matches */
+		GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
+
+		GEM_BUG_ON(count == 0);
+		if (--count == 0) {
+			/*
+			 * On the final event corresponding to the
+			 * submission of this context, we expect either
+			 * an element-switch event or a completion
+			 * event (and on completion, the active-idle
+			 * marker). No more preemptions, lite-restore
+			 * or otherwise.
+			 */
+			GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
+			GEM_BUG_ON(port_isset(&port[1]) &&
+				   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
+			GEM_BUG_ON(!port_isset(&port[1]) &&
+				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
 
-			rq = port_unpack(port, &count);
-			GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%d) (current %d), prio=%d\n",
-				  engine->name,
-				  port->context_id, count,
-				  rq ? rq->global_seqno : 0,
-				  rq ? rq->fence.context : 0,
-				  rq ? rq->fence.seqno : 0,
-				  intel_engine_get_seqno(engine),
-				  rq ? rq_prio(rq) : 0);
+			/*
+			 * We rely on the hardware being strongly
+			 * ordered, that the breadcrumb write is
+			 * coherent (visible from the CPU) before the
+			 * user interrupt and CSB is processed.
+			 */
+			GEM_BUG_ON(!i915_request_completed(rq));
 
-			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
+			execlists_context_schedule_out(rq,
+						       INTEL_CONTEXT_SCHEDULE_OUT);
+			i915_request_put(rq);
 
-			GEM_BUG_ON(count == 0);
-			if (--count == 0) {
-				/*
-				 * On the final event corresponding to the
-				 * submission of this context, we expect either
-				 * an element-switch event or a completion
-				 * event (and on completion, the active-idle
-				 * marker). No more preemptions, lite-restore
-				 * or otherwise.
-				 */
-				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
-				GEM_BUG_ON(port_isset(&port[1]) &&
-					   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
-				GEM_BUG_ON(!port_isset(&port[1]) &&
-					   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
+			GEM_TRACE("%s completed ctx=%d\n",
+				  engine->name, port->context_id);
 
-				/*
-				 * We rely on the hardware being strongly
-				 * ordered, that the breadcrumb write is
-				 * coherent (visible from the CPU) before the
-				 * user interrupt and CSB is processed.
-				 */
-				GEM_BUG_ON(!i915_request_completed(rq));
-
-				execlists_context_schedule_out(rq,
-							       INTEL_CONTEXT_SCHEDULE_OUT);
-				i915_request_put(rq);
-
-				GEM_TRACE("%s completed ctx=%d\n",
-					  engine->name, port->context_id);
-
-				port = execlists_port_complete(execlists, port);
-				if (port_isset(port))
-					execlists_user_begin(execlists, port);
-				else
-					execlists_user_end(execlists);
-			} else {
-				port_set(port, port_pack(rq, count));
-			}
+			port = execlists_port_complete(execlists, port);
+			if (port_isset(port))
+				execlists_user_begin(execlists, port);
+			else
+				execlists_user_end(execlists);
+		} else {
+			port_set(port, port_pack(rq, count));
 		}
+	}
 
-		if (head != execlists->csb_head) {
-			execlists->csb_head = head;
-			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-			       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-		}
-	} while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+	if (head != execlists->csb_head) {
+		execlists->csb_head = head;
+		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+		       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+	}
 
 	if (unlikely(fw))
 		intel_uncore_forcewake_put(i915, execlists->fw_domains);
-- 
2.17.1


* [PATCH 06/11] drm/i915/execlists: Unify CSB access pointers
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:51 ` Chris Wilson
  2018-06-04 14:53   ` Tvrtko Ursulin
From: Chris Wilson @ 2018-05-31 18:51 UTC (permalink / raw)
  To: intel-gfx

Following the removal of the last workarounds, the only CSB mmio access
is for the old vGPU interface. The mmio registers presented by vGPU do
not require forcewake and can be treated as ordinary volatile memory,
i.e. they behave just like the HWSP access, just at a different
location. We can reduce the CSB access inside the irq handler to a set
of read/write/buffer pointers and treat the various paths identically,
without worrying about forcewake.
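
The setup then reduces to choosing where the pointers point (condensed
from logical_ring_init() in the diff below); process_csb() simply
dereferences them, oblivious to the backing store:

  if (csb_force_mmio(i915)) {	/* old vGPU interface */
  	execlists->csb_status = (u32 __force *)
  		(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
  	execlists->csb_write = (u32 __force *)execlists->csb_read;
  } else {			/* HWSP shadow */
  	execlists->csb_status =
  		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
  	execlists->csb_write =
  		&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
  }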

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  |  12 ---
 drivers/gpu/drm/i915/intel_lrc.c        | 116 ++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  23 +++--
 3 files changed, 65 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 13448ea76f57..bc0193199a03 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -25,7 +25,6 @@
 #include <drm/drm_print.h>
 
 #include "i915_drv.h"
-#include "i915_vgpu.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
 
@@ -456,21 +455,10 @@ static void intel_engine_init_batch_pool(struct intel_engine_cs *engine)
 	i915_gem_batch_pool_init(&engine->batch_pool, engine);
 }
 
-static bool csb_force_mmio(struct drm_i915_private *i915)
-{
-	/* Older GVT emulation depends upon intercepting CSB mmio */
-	if (intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915))
-		return true;
-
-	return false;
-}
-
 static void intel_engine_init_execlist(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 
-	execlists->csb_use_mmio = csb_force_mmio(engine->i915);
-
 	execlists->port_mask = 1;
 	BUILD_BUG_ON_NOT_POWER_OF_2(execlists_num_ports(execlists));
 	GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index df2b7005df65..7b36aa984304 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -137,6 +137,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_gem_render_state.h"
+#include "i915_vgpu.h"
 #include "intel_lrc_reg.h"
 #include "intel_mocs.h"
 #include "intel_workarounds.h"
@@ -953,44 +954,23 @@ static void process_csb(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
-	struct drm_i915_private *i915 = engine->i915;
-
-	/* The HWSP contains a (cacheable) mirror of the CSB */
-	const u32 *buf =
-		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-	unsigned int head, tail;
-	bool fw = false;
+	const u32 * const buf = execlists->csb_status;
+	u8 head, tail;
 
 	/* Clear before reading to catch new interrupts */
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 	smp_mb__after_atomic();
 
-	if (unlikely(execlists->csb_use_mmio)) {
-		intel_uncore_forcewake_get(i915, execlists->fw_domains);
-		fw = true;
-
-		buf = (u32 * __force)
-			(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+	/* Note that csb_write, csb_status may be either in HWSP or mmio */
+	head = execlists->csb_head;
+	tail = READ_ONCE(*execlists->csb_write);
+	GEM_TRACE("%s cs-irq head=%d, tail=%d\n", engine->name, head, tail);
+	if (unlikely(head == tail))
+		return;
 
-		head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-		tail = GEN8_CSB_WRITE_PTR(head);
-		head = GEN8_CSB_READ_PTR(head);
-		execlists->csb_head = head;
-	} else {
-		const int write_idx =
-			intel_hws_csb_write_index(i915) -
-			I915_HWS_CSB_BUF0_INDEX;
+	rmb(); /* Hopefully paired with a wmb() in HW */
 
-		head = execlists->csb_head;
-		tail = READ_ONCE(buf[write_idx]);
-		rmb(); /* Hopefully paired with a wmb() in HW */
-	}
-	GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
-		  engine->name,
-		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
-		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
-
-	while (head != tail) {
+	do {
 		struct i915_request *rq;
 		unsigned int status;
 		unsigned int count;
@@ -1016,12 +996,12 @@ static void process_csb(struct intel_engine_cs *engine)
 		 * status notifier.
 		 */
 
-		status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
 		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
 			  engine->name, head,
-			  status, buf[2*head + 1],
+			  buf[2 * head + 0], buf[2 * head + 1],
 			  execlists->active);
 
+		status = buf[2 * head];
 		if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
 			      GEN8_CTX_STATUS_PREEMPTED))
 			execlists_set_active(execlists,
@@ -1103,16 +1083,11 @@ static void process_csb(struct intel_engine_cs *engine)
 		} else {
 			port_set(port, port_pack(rq, count));
 		}
-	}
-
-	if (head != execlists->csb_head) {
-		execlists->csb_head = head;
-		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-		       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-	}
+	} while (head != tail);
 
-	if (unlikely(fw))
-		intel_uncore_forcewake_put(i915, execlists->fw_domains);
+	writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+	       execlists->csb_read);
+	execlists->csb_head = head;
 }
 
 /*
@@ -2386,28 +2361,11 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 static void
 logical_ring_setup(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-	enum forcewake_domains fw_domains;
-
 	intel_engine_setup_common(engine);
 
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
 
-	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
-						    RING_ELSP(engine),
-						    FW_REG_WRITE);
-
-	fw_domains |= intel_uncore_forcewake_for_reg(dev_priv,
-						     RING_CONTEXT_STATUS_PTR(engine),
-						     FW_REG_READ | FW_REG_WRITE);
-
-	fw_domains |= intel_uncore_forcewake_for_reg(dev_priv,
-						     RING_CONTEXT_STATUS_BUF_BASE(engine),
-						     FW_REG_READ);
-
-	engine->execlists.fw_domains = fw_domains;
-
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
 
@@ -2415,34 +2373,56 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	logical_ring_default_irqs(engine);
 }
 
+static bool csb_force_mmio(struct drm_i915_private *i915)
+{
+	/* Older GVT emulation depends upon intercepting CSB mmio */
+	return intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915);
+}
+
 static int logical_ring_init(struct intel_engine_cs *engine)
 {
+	struct drm_i915_private *i915 = engine->i915;
+	struct intel_engine_execlists * const execlists = &engine->execlists;
 	int ret;
 
 	ret = intel_engine_init_common(engine);
 	if (ret)
 		goto error;
 
-	if (HAS_LOGICAL_RING_ELSQ(engine->i915)) {
-		engine->execlists.submit_reg = engine->i915->regs +
+	if (HAS_LOGICAL_RING_ELSQ(i915)) {
+		execlists->submit_reg = i915->regs +
 			i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(engine));
-		engine->execlists.ctrl_reg = engine->i915->regs +
+		execlists->ctrl_reg = i915->regs +
 			i915_mmio_reg_offset(RING_EXECLIST_CONTROL(engine));
 	} else {
-		engine->execlists.submit_reg = engine->i915->regs +
+		execlists->submit_reg = i915->regs +
 			i915_mmio_reg_offset(RING_ELSP(engine));
 	}
 
-	engine->execlists.preempt_complete_status = ~0u;
-	if (engine->i915->preempt_context) {
+	execlists->preempt_complete_status = ~0u;
+	if (i915->preempt_context) {
 		struct intel_context *ce =
-			to_intel_context(engine->i915->preempt_context, engine);
+			to_intel_context(i915->preempt_context, engine);
 
-		engine->execlists.preempt_complete_status =
+		execlists->preempt_complete_status =
 			upper_32_bits(ce->lrc_desc);
 	}
 
-	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
+	execlists->csb_read =
+		i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
+	if (csb_force_mmio(i915)) {
+		execlists->csb_status = (u32 __force *)
+			(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+
+		execlists->csb_write = (u32 __force *)execlists->csb_read;
+	} else {
+		execlists->csb_status =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+
+		execlists->csb_write =
+			&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
+	}
 
 	return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..88418477f9ab 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -299,19 +299,30 @@ struct intel_engine_execlists {
 	struct rb_node *first;
 
 	/**
-	 * @fw_domains: forcewake domains for irq tasklet
+	 * @csb_head: context status buffer head
 	 */
-	unsigned int fw_domains;
+	unsigned int csb_head;
 
 	/**
-	 * @csb_head: context status buffer head
+	 * @csb_read: control register for Context Switch buffer
+	 *
+	 * Note this register is always in mmio.
 	 */
-	unsigned int csb_head;
+	u32 __iomem *csb_read;
 
 	/**
-	 * @csb_use_mmio: access csb through mmio, instead of hwsp
+	 * @csb_write: control register for Context Switch buffer
+	 *
+	 * Note this register may be either mmio or HWSP shadow.
+	 */
+	u32 *csb_write;
+
+	/**
+	 * @csb_status: status array for Context Switch buffer
+	 *
+	 * Note these register may be either mmio or HWSP shadow.
 	 */
-	bool csb_use_mmio;
+	u32 *csb_status;
 
 	/**
 	 * @preempt_complete_status: expected CSB upon completing preemption
-- 
2.17.1


* [PATCH 07/11] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
@ 2018-05-31 18:52 ` Chris Wilson
  2018-05-31 19:57   ` [PATCH] " Chris Wilson
From: Chris Wilson @ 2018-05-31 18:52 UTC (permalink / raw)
  To: intel-gfx

Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on average! Deferring our work to a tasklet allowed us to do the
processing with irqs enabled, reducing the impact to an average of about
50us.

We have since eradicated the use of forcewaked mmio from inside the CSB
processing and ELSP submission, bringing the impact down to around 5us
(on Kabylake); an order of magnitude better than our measurements 2
years ago on Broadwell and only about 2x worse on average than the
gem_syslatency on an unladen system.

In this iteration of the tasklet-vs-direct submission debate, we seek
a compromise whereby we submit new requests immediately to the HW but
defer processing the CS interrupt onto a tasklet. We gain the advantage
of low latency and ksoftirqd avoidance when waking up the HW, while
avoiding the system-wide starvation of our CS irq-storms.
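
The heart of the compromise is the submission gate (reproduced from the
diff below, with comments added): direct submission when the execlists
tasklet is ours, the tasklet otherwise (i.e. when a different backend
has installed its own), and deferral while a reset holds the tasklet
disabled:

  static void __submit_queue(struct intel_engine_cs *engine)
  {
  	struct intel_engine_execlists * const execlists = &engine->execlists;

  	if (reset_in_progress(execlists))
  		return; /* defer until we restart the engine following reset */

  	if (execlists->tasklet.func == execlists_submission_tasklet)
  		__execlists_submission_tasklet(engine); /* direct to ELSP */
  	else
  		tasklet_hi_schedule(&execlists->tasklet);
  }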

Comparing the impact on the maximum latency observed (that is the time
stolen from an RT process) over a 120s interval, repeated several times
(using gem_syslatency, similar to RT's cyclictest) while the system is
fully laden with i915 nops, we see that direct submission can actually
improve the worst case.

Maximum latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 2)
  x Always using tasklets (a couple of >1000us outliers removed)
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|          +                                                             |
|          +                                                             |
|          +                                                             |
|          +       +                                                     |
|          + +     +                                                     |
|       +  + +     +  x     x     x                                      |
|      +++ + +     +  x  x  x  x  x  x                                   |
|      +++ + ++  + +  *x x  x  x  x  x                                   |
|      +++ + ++  + *  *x x  *  x  x  x                                   |
|    + +++ + ++  * * +*xxx  *  x  x  xx                                  |
|    * +++ + ++++* *x+**xx+ *  x  x xxxx x                               |
|   **x++++*++**+*x*x****x+ * +x xx xxxx x          x                    |
|x* ******+***************++*+***xxxxxx* xx*x     xxx +                x+|
|             |__________MA___________|                                  |
|      |______M__A________|                                              |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 118            91           186           124     125.28814     16.279137
+ 120            92           187           109     112.00833     13.458617
Difference at 95.0% confidence
	-13.2798 +/- 3.79219
	-10.5994% +/- 3.02677%
	(Student's t, pooled s = 14.9237)

However the mean latency is adversely affected:

Mean latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 1)
  x Always using tasklets
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|           xxxxxx                                        +   ++         |
|           xxxxxx                                        +   ++         |
|           xxxxxx                                      + +++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ +++        |
|           xxxxxxx                                   + ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|          xxxxxxxxxx                                +++++++++++++++     |
|         xxxxxxxxxxx    x                           +++++++++++++++     |
|x       xxxxxxxxxxxxx   x           +            + ++++++++++++++++++  +|
|           |__A__|                                                      |
|                                                      |____A___|        |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120         3.506         3.727         3.631     3.6321417    0.02773109
+ 120         3.834         4.149         4.039     4.0375167   0.041221676
Difference at 95.0% confidence
	0.405375 +/- 0.00888913
	11.1608% +/- 0.244735%
	(Student's t, pooled s = 0.03513)

However, since the mean latency corresponds to the amount of irqsoff
processing we have to do for a CS interrupt, we need only speed that up
to benefit not just system latency but our own throughput as well.

v2: Remember to defer submissions when under reset.
v4: Only use direct submission for new requests

Testcase: igt/gem_exec_latency/*rthog*
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h         |   5 +
 drivers/gpu/drm/i915/i915_irq.c         |  11 +-
 drivers/gpu/drm/i915/intel_engine_cs.c  |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 147 ++++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 -
 5 files changed, 98 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 261da577829a..7892ac773916 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -88,4 +88,9 @@ static inline void __tasklet_enable_sync_once(struct tasklet_struct *t)
 		tasklet_kill(t);
 }
 
+static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
+{
+	return likely(!atomic_read(&t->count));
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..3a8eacab5172 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1462,14 +1462,10 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 static void
 gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
+		tasklet = true;
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
@@ -1477,7 +1473,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		tasklet_hi_schedule(&engine->execlists.tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
@@ -2185,7 +2181,6 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 
 		I915_WRITE(VLV_IER, ier);
 		I915_WRITE(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
-		POSTING_READ(GEN8_MASTER_IRQ);
 
 		gen8_gt_irq_handler(dev_priv, master_ctl, gt_iir);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index bc0193199a03..43880bf0f529 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1329,12 +1329,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 		ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
 		read = GEN8_CSB_READ_PTR(ptr);
 		write = GEN8_CSB_WRITE_PTR(ptr);
-		drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], interrupt posted? %s, tasklet queued? %s (%s)\n",
+		drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], tasklet queued? %s (%s)\n",
 			   read, execlists->csb_head,
 			   write,
 			   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
-			   yesno(test_bit(ENGINE_IRQ_EXECLIST,
-					  &engine->irq_posted)),
 			   yesno(test_bit(TASKLET_STATE_SCHED,
 					  &engine->execlists.tasklet.state)),
 			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)));
@@ -1515,11 +1513,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	spin_unlock(&b->rb_lock);
 	local_irq_restore(flags);
 
-	drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s) (execlists? %s)\n",
+	drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s)\n",
 		   engine->irq_posted,
 		   yesno(test_bit(ENGINE_IRQ_BREADCRUMB,
-				  &engine->irq_posted)),
-		   yesno(test_bit(ENGINE_IRQ_EXECLIST,
 				  &engine->irq_posted)));
 
 	drm_printf(m, "HWSP:\n");
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7b36aa984304..fc58ed96a33f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -557,13 +557,15 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 {
 	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
 
+	__unwind_incomplete_requests(container_of(execlists,
+						  typeof(struct intel_engine_cs),
+						  execlists));
 	execlists_cancel_port_requests(execlists);
-	execlists_unwind_incomplete_requests(execlists);
 
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
-static void __execlists_dequeue(struct intel_engine_cs *engine)
+static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -575,7 +577,11 @@ static void __execlists_dequeue(struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->timeline.lock);
 
-	/* Hardware submission is through 2 ports. Conceptually each port
+	GEM_BUG_ON(execlists_is_active(&engine->execlists,
+				       EXECLISTS_ACTIVE_PREEMPT));
+
+	/*
+	 * Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
 	 * static for a context, and unique to each, so we only execute
 	 * requests belonging to a single context from each ring. RING_HEAD
@@ -765,15 +771,6 @@ static void __execlists_dequeue(struct intel_engine_cs *engine)
 		   !port_isset(engine->execlists.port));
 }
 
-static void execlists_dequeue(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&engine->timeline.lock, flags);
-	__execlists_dequeue(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
-}
-
 void
 execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 {
@@ -870,14 +867,6 @@ static void reset_irq(struct intel_engine_cs *engine)
 	synchronize_hardirq(engine->i915->drm.irq);
 
 	clear_gtiir(engine);
-
-	/*
-	 * The port is checked prior to scheduling a tasklet, but
-	 * just in case we have suspended the tasklet to do the
-	 * wedging make sure that when it wakes, it decides there
-	 * is no work to do by clearing the irq_posted bit.
-	 */
-	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 }
 
 static void execlists_cancel_requests(struct intel_engine_cs *engine)
@@ -950,6 +939,12 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	local_irq_restore(flags);
 }
 
+static inline bool
+reset_in_progress(const struct intel_engine_execlists *execlists)
+{
+	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
+}
+
 static void process_csb(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -957,10 +952,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	const u32 * const buf = execlists->csb_status;
 	u8 head, tail;
 
-	/* Clear before reading to catch new interrupts */
-	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-	smp_mb__after_atomic();
-
 	/* Note that csb_write, csb_status may be either in HWSP or mmio */
 	head = execlists->csb_head;
 	tail = READ_ONCE(*execlists->csb_write);
@@ -1090,19 +1081,9 @@ static void process_csb(struct intel_engine_cs *engine)
 	execlists->csb_head = head;
 }
 
-/*
- * Check the unread Context Status Buffers and manage the submission of new
- * contexts to the ELSP accordingly.
- */
-static void execlists_submission_tasklet(unsigned long data)
+static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 {
-	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-
-	GEM_TRACE("%s awake?=%d, active=%x, irq-posted?=%d\n",
-		  engine->name,
-		  engine->i915->gt.awake,
-		  engine->execlists.active,
-		  test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+	lockdep_assert_held(&engine->timeline.lock);
 
 	/*
 	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
@@ -1114,18 +1095,32 @@ static void execlists_submission_tasklet(unsigned long data)
 	 */
 	GEM_BUG_ON(!engine->i915->gt.awake);
 
-	/*
-	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
-	 * imposing the cost of a locked atomic transaction when submitting a
-	 * new request (outside of the context-switch interrupt).
-	 */
-	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-		process_csb(engine);
-
+	process_csb(engine);
 	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
 		execlists_dequeue(engine);
 }
 
+/*
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
+static void execlists_submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
+	unsigned long flags;
+
+	GEM_TRACE("%s awake?=%d, active=%x\n",
+		  engine->name,
+		  engine->i915->gt.awake,
+		  engine->execlists.active);
+
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+
+	__execlists_submission_tasklet(engine);
+
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+}
+
 static void queue_request(struct intel_engine_cs *engine,
 			  struct i915_sched_node *node,
 			  int prio)
@@ -1134,16 +1129,30 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __update_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
+static void __submit_queue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	if (reset_in_progress(execlists))
+		return; /* defer until we restart the engine following reset */
+
+	if (execlists->tasklet.func == execlists_submission_tasklet)
+		__execlists_submission_tasklet(engine);
+	else
+		tasklet_hi_schedule(&execlists->tasklet);
 }
 
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+	if (prio > engine->execlists.queue_priority) {
+		__update_queue(engine, prio);
+		__submit_queue(engine);
+	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1155,11 +1164,12 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
+	submit_queue(engine, rq_prio(request));
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
@@ -1286,8 +1296,11 @@ static void execlists_schedule(struct i915_request *request,
 		}
 
 		if (prio > engine->execlists.queue_priority &&
-		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
+			/* defer submission until after all of our updates */
+			__update_queue(engine, prio);
+			tasklet_hi_schedule(&engine->execlists.tasklet);
+		}
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
@@ -1831,6 +1844,7 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request *request, *active;
+	unsigned long flags;
 
 	GEM_TRACE("%s\n", engine->name);
 
@@ -1852,8 +1866,10 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 * and avoid blaming an innocent request if the stall was due to the
 	 * preemption itself.
 	 */
-	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-		process_csb(engine);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+	synchronize_hardirq(engine->i915->drm.irq);
+
+	process_csb(engine);
 
 	/*
 	 * The last active request can then be no later than the last request
@@ -1863,15 +1879,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	active = NULL;
 	request = port_request(execlists->port);
 	if (request) {
-		unsigned long flags;
-
 		/*
 		 * Prevent the breadcrumb from advancing before we decide
 		 * which request is currently active.
 		 */
 		intel_engine_stop_cs(engine);
 
-		spin_lock_irqsave(&engine->timeline.lock, flags);
 		list_for_each_entry_from_reverse(request,
 						 &engine->timeline.requests,
 						 link) {
@@ -1881,12 +1894,28 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 
 			active = request;
 		}
-		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 	}
 
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
 	return active;
 }
 
+static void reset_csb_pointers(struct intel_engine_execlists *execlists)
+{
+	/*
+	 * After a reset, the HW starts writing into CSB entry [0]. We
+	 * therefore have to set our HEAD pointer back one entry so that
+	 * the *first* entry we check is entry 0. To complicate this further,
+	 * as we don't wait for the first interrupt after reset, we have to
+	 * fake the HW write to point back to the last entry so that our
+	 * inline comparison of our cached head position against the last HW
+	 * write works even before the first interrupt.
+	 */
+	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
+	WRITE_ONCE(*execlists->csb_write, (GEN8_CSB_ENTRIES - 1) | 0xff << 16);
+}
+
 static void execlists_reset(struct intel_engine_cs *engine,
 			    struct i915_request *request)
 {
@@ -1916,7 +1945,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	__unwind_incomplete_requests(engine);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
-	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+	reset_csb_pointers(&engine->execlists);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
@@ -2408,7 +2437,6 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 			upper_32_bits(ce->lrc_desc);
 	}
 
-	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
 	execlists->csb_read =
 		i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
 	if (csb_force_mmio(i915)) {
@@ -2423,6 +2451,7 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 		execlists->csb_write =
 			&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
 	}
+	reset_csb_pointers(execlists);
 
 	return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 88418477f9ab..f02346291080 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -358,7 +358,6 @@ struct intel_engine_cs {
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-#define ENGINE_IRQ_EXECLIST 1
 
 	/* Rather than have every client wait upon all user interrupts,
 	 * with the herd waking after every interrupt and each doing the
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/11] drm/i915: Move rate-limiting request retire to after submission
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (6 preceding siblings ...)
  2018-05-31 18:52 ` [PATCH 07/11] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Chris Wilson
@ 2018-05-31 18:52 ` Chris Wilson
  2018-05-31 18:52 ` [PATCH 09/11] drm/i915: Wait for engines to idle before retiring Chris Wilson
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2018-05-31 18:52 UTC (permalink / raw)
  To: intel-gfx

Our long-standing defense against a single client flooding the system
with requests (causing mempressure and stalls across the whole system)
is to retire the oldest request on every allocation. (By retiring the
oldest, we try to keep returning requests to the system in a steady
flow.) This adds an extra step to the submission path, one we can
remove simply by moving the retirement to after the submission itself.

We already try to clean up stale requests after submission, so always
retiring all completed requests there fits in as a natural extension.
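
As a standalone sketch (illustrative names, not the driver's API), the
new post-submission loop behaves roughly like this:

/*
 * Minimal model of retire-after-submit: walk the ring's list from the
 * oldest request and retire everything already completed, stopping at
 * the first busy request or at the request we have just submitted.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct req {
        int seqno;
        bool completed;
        struct req *next;       /* oldest-first, like ring->request_list */
};

static void retire_after_submit(struct req **oldest,
                                const struct req *just_added)
{
        while (*oldest) {
                struct req *prev = *oldest;

                /* keep the caller's request and anything still in flight */
                if (prev == just_added || !prev->completed)
                        break;

                *oldest = prev->next;   /* stand-in for i915_request_retire() */
        }
}

int main(void)
{
        struct req c = { 3, false, NULL };      /* the request just submitted */
        struct req b = { 2, true, &c };
        struct req a = { 1, true, &b };
        struct req *list = &a;

        retire_after_submit(&list, &c);
        assert(list == &c);     /* 1 and 2 retired, 3 preserved */
        return 0;
}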

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f187250e60c6..796dad25e6ba 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -694,12 +694,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	if (ret)
 		goto err_unreserve;
 
-	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
-	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
-	    i915_request_completed(rq))
-		i915_request_retire(rq);
-
 	/*
 	 * Beware: Dragons be flying overhead.
 	 *
@@ -1122,6 +1116,8 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
 	/*
+	 * Move our oldest requests to the slab-cache (if not in use!)
+	 *
 	 * In typical scenarios, we do not expect the previous request on
 	 * the timeline to be still tracked by timeline->last_request if it
 	 * has been completed. If the completed request is still here, that
@@ -1138,8 +1134,22 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * work on behalf of others -- but instead we should benefit from
 	 * improved resource management. (Well, that's the theory at least.)
 	 */
-	if (prev && i915_request_completed(prev))
-		i915_request_retire_upto(prev);
+	do {
+		prev = list_first_entry(&ring->request_list,
+					typeof(*prev), ring_link);
+
+		/*
+		 * Keep the current request, the caller may not be
+		 * expecting it to be retired (and freed!) immediately,
+		 * and preserving one request from the client allows us to
+		 * carry forward frequently reused state onto the next
+		 * submission.
+		 */
+		if (prev == request || !i915_request_completed(prev))
+			break;
+
+		i915_request_retire(prev);
+	} while (1);
 }
 
 static unsigned long local_clock_us(unsigned int *cpu)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/11] drm/i915: Wait for engines to idle before retiring
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (7 preceding siblings ...)
  2018-05-31 18:52 ` [PATCH 08/11] drm/i915: Move rate-limiting request retire to after submission Chris Wilson
@ 2018-05-31 18:52 ` Chris Wilson
  2018-05-31 18:52 ` [PATCH 10/11] drm/i915: Move engine request retirement to intel_engine_cs Chris Wilson
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2018-05-31 18:52 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will start to defer retiring the request from the
engine list if it is still active on the submission backend. To preserve
the semantics that, after wait-for-idle completes, the system is idle
and fully retired, we therefore need to wait for the backends to idle
before calling i915_retire_requests().
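
The reordering can be modelled as a compilable toy (illustrative names,
not the driver's API): retirement only empties the engine lists once
the submission backend has been drained, so the wait must come first.

#include <assert.h>

struct gt {
        int active_requests;
        int backend_busy;
};

static int wait_for_engines(struct gt *gt)
{
        gt->backend_busy = 0;   /* stand-in for flushing the backends */
        return 0;
}

static void retire_requests(struct gt *gt)
{
        if (!gt->backend_busy)  /* deferred retirees are now reachable */
                gt->active_requests = 0;
}

static int wait_for_idle(struct gt *gt)
{
        int err;

        err = wait_for_engines(gt);     /* must precede retirement */
        if (err)
                return err;

        retire_requests(gt);
        assert(gt->active_requests == 0);       /* idle now means retired */
        return 0;
}

int main(void)
{
        struct gt gt = { .active_requests = 2, .backend_busy = 1 };

        return wait_for_idle(&gt);
}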

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 550fa8288c45..3e442c3094e7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3768,10 +3768,13 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
 			if (err)
 				return err;
 		}
+
+		err = wait_for_engines(i915);
+		if (err)
+			return err;
+
 		i915_retire_requests(i915);
 		GEM_BUG_ON(i915->gt.active_requests);
-
-		return wait_for_engines(i915);
 	} else {
 		struct intel_engine_cs *engine;
 		enum intel_engine_id id;
@@ -3782,9 +3785,9 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
 			if (err)
 				return err;
 		}
-
-		return 0;
 	}
+
+	return 0;
 }
 
 static void __i915_gem_object_flush_for_display(struct drm_i915_gem_object *obj)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/11] drm/i915: Move engine request retirement to intel_engine_cs
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (8 preceding siblings ...)
  2018-05-31 18:52 ` [PATCH 09/11] drm/i915: Wait for engines to idle before retiring Chris Wilson
@ 2018-05-31 18:52 ` Chris Wilson
  2018-05-31 18:52 ` [PATCH 11/11] drm/i915: Hold request reference for submission until retirement Chris Wilson
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2018-05-31 18:52 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will move ownership of the fence reference to the
submission backend, which will want to drop its final reference when it
retires the request. We will also need a catch-up when parking the
engine to clean up any residual entries in the engine timeline. In
short, move the engine retirement from i915_request.c to
intel_engine_cs.c for future use.
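
The planned catch-up at park time amounts to draining the engine
timeline (toy model, illustrative names only):

#include <assert.h>
#include <stddef.h>

struct rq {
        struct rq *next;
};

struct engine {
        struct rq *timeline;    /* oldest-first list of residual requests */
};

/* stand-in for intel_engine_retire_request() */
static struct rq *engine_retire_one(struct engine *e)
{
        struct rq *rq = e->timeline;

        if (rq)
                e->timeline = rq->next;
        return rq;
}

static void engine_park(struct engine *e)
{
        /* drain anything whose retirement was deferred by the backend */
        while (engine_retire_one(e))
                ;
        assert(e->timeline == NULL);
}

int main(void)
{
        struct rq b = { NULL };
        struct rq a = { &b };
        struct engine e = { &a };

        engine_park(&e);
        return 0;
}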

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c     | 47 +--------------------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 54 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +
 3 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 796dad25e6ba..f91942e4d852 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -344,50 +344,6 @@ static void free_capture_list(struct i915_request *request)
 	}
 }
 
-static void __retire_engine_request(struct intel_engine_cs *engine,
-				    struct i915_request *rq)
-{
-	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
-		  __func__, engine->name,
-		  rq->fence.context, rq->fence.seqno,
-		  rq->global_seqno,
-		  intel_engine_get_seqno(engine));
-
-	GEM_BUG_ON(!i915_request_completed(rq));
-
-	local_irq_disable();
-
-	spin_lock(&engine->timeline.lock);
-	GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests));
-	list_del_init(&rq->link);
-	spin_unlock(&engine->timeline.lock);
-
-	spin_lock(&rq->lock);
-	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
-		dma_fence_signal_locked(&rq->fence);
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
-		intel_engine_cancel_signaling(rq);
-	if (rq->waitboost) {
-		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
-		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
-	}
-	spin_unlock(&rq->lock);
-
-	local_irq_enable();
-
-	/*
-	 * The backing object for the context is done after switching to the
-	 * *next* context. Therefore we cannot retire the previous context until
-	 * the next context has already started running. However, since we
-	 * cannot take the required locks at i915_request_submit() we
-	 * defer the unpinning of the active context to now, retirement of
-	 * the subsequent request.
-	 */
-	if (engine->last_retired_context)
-		intel_context_unpin(engine->last_retired_context);
-	engine->last_retired_context = rq->hw_context;
-}
-
 static void __retire_engine_upto(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
@@ -400,8 +356,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine,
 		tmp = list_first_entry(&engine->timeline.requests,
 				       typeof(*tmp), link);
 
-		GEM_BUG_ON(tmp->engine != engine);
-		__retire_engine_request(engine, tmp);
+		intel_engine_retire_request(engine, tmp);
 	} while (tmp != rq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 43880bf0f529..cce7234b9071 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1064,6 +1064,60 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
 		engine->set_default_submission(engine);
 }
 
+/**
+ * intel_engine_retire_request: drop the request reference from the engine
+ * @engine: the engine
+ * @rq: the request
+ *
+ * This request has been completed and is part of the chain being retired by
+ * the caller, so drop any reference to it from the engine.
+ */
+void intel_engine_retire_request(struct intel_engine_cs *engine,
+				 struct i915_request *rq)
+{
+	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
+		  __func__, engine->name,
+		  rq->fence.context, rq->fence.seqno,
+		  rq->global_seqno,
+		  intel_engine_get_seqno(engine));
+
+	lockdep_assert_held(&engine->i915->drm.struct_mutex);
+	GEM_BUG_ON(rq->engine != engine);
+	GEM_BUG_ON(!i915_request_completed(rq));
+
+	local_irq_disable();
+
+	spin_lock(&engine->timeline.lock);
+	GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests));
+	list_del_init(&rq->link);
+	spin_unlock(&engine->timeline.lock);
+
+	spin_lock(&rq->lock);
+	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+		dma_fence_signal_locked(&rq->fence);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
+		intel_engine_cancel_signaling(rq);
+	if (rq->waitboost) {
+		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
+		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
+	}
+	spin_unlock(&rq->lock);
+
+	local_irq_enable();
+
+	/*
+	 * The backing object for the context is done after switching to the
+	 * *next* context. Therefore we cannot retire the previous context until
+	 * the next context has already started running. However, since we
+	 * cannot take the required locks at i915_request_submit() we
+	 * defer the unpinning of the active context to now, retirement of
+	 * the subsequent request.
+	 */
+	if (engine->last_retired_context)
+		intel_context_unpin(engine->last_retired_context);
+	engine->last_retired_context = rq->hw_context;
+}
+
 /**
  * intel_engines_park: called when the GT is transitioning from busy->idle
  * @i915: the i915 device
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f02346291080..bd0067047044 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -888,6 +888,8 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
+void intel_engine_retire_request(struct intel_engine_cs *engine,
+				 struct i915_request *rq);
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/11] drm/i915: Hold request reference for submission until retirement
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (9 preceding siblings ...)
  2018-05-31 18:52 ` [PATCH 10/11] drm/i915: Move engine request retirement to intel_engine_cs Chris Wilson
@ 2018-05-31 18:52 ` Chris Wilson
  2018-05-31 20:21   ` [PATCH] " Chris Wilson
  2018-05-31 19:30 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset Patchwork
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2018-05-31 18:52 UTC (permalink / raw)
  To: intel-gfx

Currently the async submission backends (guc and execlists) hold an
extra reference to the requests being processed, as they are not
serialised with request retirement. If, instead, we prevent the request
from being dropped from the engine timeline until after submission has
finished processing it, we can use a single reference held for the
entire submission process (currently, it is held only for the
submission fence).

By doing so we remove a few atomics from inside the irqoff path, on the
order of 200ns as measured by gem_syslatency, bringing the impact of
direct submission into line with the previous tasklet implementation.
The tradeoff is that, as we may postpone the retirement, we have to
check for any residual requests after detecting that the engines are
idle.
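
The reference lifetime can be sketched in isolation (a toy model under
the assumption of exactly three owners; not the driver's refcount API):

/*
 * The request is born with three references -- the creator's, the
 * submit fence's and the submission backend's -- so the hot irqoff
 * path needs no further get/put pairs.
 */
#include <assert.h>

struct request {
        int refcount;
};

static void request_put(struct request *rq)
{
        assert(rq->refcount > 0);
        rq->refcount--;
}

int main(void)
{
        struct request rq = { .refcount = 3 };  /* creator + fence + backend */

        request_put(&rq);       /* submit fence signalled */
        request_put(&rq);       /* backend lets go at retirement */
        request_put(&rq);       /* creator releases its handle */

        assert(rq.refcount == 0);       /* freed exactly once */
        return 0;
}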

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c         | 20 ++++++++---------
 drivers/gpu/drm/i915/intel_engine_cs.c      | 24 ++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_guc_submission.c |  4 +---
 drivers/gpu/drm/i915/intel_lrc.c            | 10 +--------
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  2 +-
 5 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f91942e4d852..2cfa6f3a2f16 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -347,17 +347,15 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_upto(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
+	struct list_head * const requests = &engine->timeline.requests;
 	struct i915_request *tmp;
 
 	if (list_empty(&rq->link))
 		return;
 
-	do {
-		tmp = list_first_entry(&engine->timeline.requests,
-				       typeof(*tmp), link);
-
-		intel_engine_retire_request(engine, tmp);
-	} while (tmp != rq);
+	do
+		tmp = list_first_entry(requests, typeof(*tmp), link);
+	while (intel_engine_retire_request(engine, tmp) && tmp != rq);
 }
 
 static void i915_request_retire(struct i915_request *request)
@@ -376,6 +374,8 @@ static void i915_request_retire(struct i915_request *request)
 
 	trace_i915_request_retire(request);
 
+	__retire_engine_upto(request->engine, request);
+
 	advance_ring(request);
 	free_capture_list(request);
 
@@ -414,8 +414,6 @@ static void i915_request_retire(struct i915_request *request)
 	atomic_dec_if_positive(&request->gem_context->ban_score);
 	intel_context_unpin(request->hw_context);
 
-	__retire_engine_upto(request->engine, request);
-
 	unreserve_gt(request->i915);
 
 	i915_sched_node_fini(request->i915, &request->sched);
@@ -722,8 +720,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		       rq->timeline->fence_context,
 		       timeline_get_seqno(rq->timeline));
 
-	/* We bump the ref for the fence chain */
-	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
+	/* We bump the ref for the fence chain and for the submit backend. */
+	refcount_set(&rq->fence.refcount.refcount, 3);
+
+	i915_sw_fence_init(&rq->submit, submit_notify);
 	init_waitqueue_head(&rq->execute);
 
 	i915_sched_node_init(&rq->sched);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index cce7234b9071..b45d6aa7d29d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1071,8 +1071,10 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
  *
  * This request has been completed and is part of the chain being retired by
  * the caller, so drop any reference to it from the engine.
+ *
+ * Returns: true if the reference was dropped, false if it was still busy.
  */
-void intel_engine_retire_request(struct intel_engine_cs *engine,
+bool intel_engine_retire_request(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
 	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
@@ -1085,6 +1087,10 @@ void intel_engine_retire_request(struct intel_engine_cs *engine,
 	GEM_BUG_ON(rq->engine != engine);
 	GEM_BUG_ON(!i915_request_completed(rq));
 
+	/* Don't drop the final ref until after the backend has finished */
+	if (port_request(engine->execlists.port) == rq)
+		return false;
+
 	local_irq_disable();
 
 	spin_lock(&engine->timeline.lock);
@@ -1116,6 +1122,19 @@ void intel_engine_retire_request(struct intel_engine_cs *engine,
 	if (engine->last_retired_context)
 		intel_context_unpin(engine->last_retired_context);
 	engine->last_retired_context = rq->hw_context;
+
+	i915_request_put(rq);
+	return true;
+}
+
+static void engine_retire_requests(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq, *next;
+
+	list_for_each_entry_safe(rq, next, &engine->timeline.requests, link) {
+		if (WARN_ON(!intel_engine_retire_request(engine, rq)))
+			break;
+	}
 }
 
 /**
@@ -1148,6 +1167,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 				"%s is not idle before parking\n",
 				engine->name);
 			intel_engine_dump(engine, &p, NULL);
+			engine->cancel_requests(engine);
 		}
 
 		/* Must be reset upon idling, or we may miss the busy wakeup. */
@@ -1156,6 +1176,8 @@ void intel_engines_park(struct drm_i915_private *i915)
 		if (engine->park)
 			engine->park(engine);
 
+		engine_retire_requests(engine);
+
 		if (engine->pinned_default_state) {
 			i915_gem_object_unpin_map(engine->default_state);
 			engine->pinned_default_state = NULL;
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 133367a17863..6f6223644140 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -669,8 +669,7 @@ static void guc_submit(struct intel_engine_cs *engine)
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(port_isset(port));
-
-	port_set(port, i915_request_get(rq));
+	port_set(port, rq);
 }
 
 static inline int rq_prio(const struct i915_request *rq)
@@ -793,7 +792,6 @@ static void guc_submission_tasklet(unsigned long data)
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
-		i915_request_put(rq);
 
 		port = execlists_port_complete(execlists, port);
 		if (port_isset(port)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fc58ed96a33f..3fc764f882eb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -513,11 +513,7 @@ static bool can_merge_ctx(const struct intel_context *prev,
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
-
-	if (port_isset(port))
-		i915_request_put(port_request(port));
-
-	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
+	port_set(port, port_pack(rq, port_count(port)));
 }
 
 static void inject_preempt_context(struct intel_engine_cs *engine)
@@ -793,8 +789,6 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 					       INTEL_CONTEXT_SCHEDULE_OUT :
 					       INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 
-		i915_request_put(rq);
-
 		memset(port, 0, sizeof(*port));
 		port++;
 	}
@@ -1061,8 +1055,6 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			execlists_context_schedule_out(rq,
 						       INTEL_CONTEXT_SCHEDULE_OUT);
-			i915_request_put(rq);
-
 			GEM_TRACE("%s completed ctx=%d\n",
 				  engine->name, port->context_id);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bd0067047044..d429765ff9cd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -888,7 +888,7 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
-void intel_engine_retire_request(struct intel_engine_cs *engine,
+bool intel_engine_retire_request(struct intel_engine_cs *engine,
 				 struct i915_request *rq);
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (10 preceding siblings ...)
  2018-05-31 18:52 ` [PATCH 11/11] drm/i915: Hold request reference for submission until retirement Chris Wilson
@ 2018-05-31 19:30 ` Patchwork
  2018-05-31 19:48 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 19:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset
URL   : https://patchwork.freedesktop.org/series/44041/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
1ba6410ddc51 drm/i915: Be irqsafe inside reset
14fd12cfeba8 drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
-:41: WARNING:LONG_LINE: line over 100 characters
#41: FILE: drivers/gpu/drm/i915/intel_lrc.c:979:
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));

total: 0 errors, 1 warnings, 0 checks, 42 lines checked
f0a4fe328767 drm/i915/execlists: Pull submit after dequeue under timeline lock
a70428062fe9 drm/i915/execlists: Pull CSB reset under the timeline.lock
ae196ee5aeae drm/i915/execlists: Process one CSB interrupt at a time
-:35: WARNING:MEMORY_BARRIER: memory barrier without comment
#35: FILE: drivers/gpu/drm/i915/intel_lrc.c:966:
+	smp_mb__after_atomic();

-:77: WARNING:LONG_LINE: line over 100 characters
#77: FILE: drivers/gpu/drm/i915/intel_lrc.c:990:
+		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",

-:78: WARNING:LONG_LINE: line over 100 characters
#78: FILE: drivers/gpu/drm/i915/intel_lrc.c:991:
+		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");

-:140: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#140: FILE: drivers/gpu/drm/i915/intel_lrc.c:1022:
+			  status, buf[2*head + 1],
 			               ^

-:176: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#176: FILE: drivers/gpu/drm/i915/intel_lrc.c:1040:
+		    buf[2*head + 1] == execlists->preempt_complete_status) {
 		         ^

total: 0 errors, 3 warnings, 2 checks, 301 lines checked
cedc7a7c2b86 drm/i915/execlists: Unify CSB access pointers
3050c9d29fb3 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
-:27: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#27: 
Comparing the impact on the maximum latency observed (that is the time stolen

-:100: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")'
#100: 
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")

total: 1 errors, 1 warnings, 0 checks, 358 lines checked
0e1792ea6ee9 drm/i915: Move rate-limiting request retire to after submission
91807ea0b483 drm/i915: Wait for engines to idle before retiring
cda0eeead5d8 drm/i915: Move engine request retirement to intel_engine_cs
d0aacc9e35bd drm/i915: Hold request reference for submission until retirement

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/11] drm/i915: Be irqsafe inside reset
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (11 preceding siblings ...)
  2018-05-31 19:30 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset Patchwork
@ 2018-05-31 19:48 ` Patchwork
  2018-05-31 20:14 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2) Patchwork
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 19:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset
URL   : https://patchwork.freedesktop.org/series/44041/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4268 -> Patchwork_9160 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9160 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9160, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/44041/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9160:

  === IGT changes ===

    ==== Possible regressions ====

    igt@gem_render_tiled_blits@basic:
      fi-cfl-s3:          PASS -> INCOMPLETE

    
    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_9160 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@basic-plain-flip:
      fi-cnl-y3:          PASS -> INCOMPLETE (fdo#105086, fdo#104962)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-bxt-dsi:         NOTRUN -> INCOMPLETE (fdo#103927)

    igt@prime_vgem@basic-fence-flip:
      fi-glk-j4005:       PASS -> FAIL (fdo#104008)

    
    ==== Possible fixes ====

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       FAIL (fdo#102672, fdo#103841) -> SKIP

    
  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#104962 https://bugs.freedesktop.org/show_bug.cgi?id=104962
  fdo#105086 https://bugs.freedesktop.org/show_bug.cgi?id=105086


== Participating hosts (42 -> 40) ==

  Additional (1): fi-bxt-dsi 
  Missing    (3): fi-ctg-p8600 fi-ilk-m540 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4268 -> Patchwork_9160

  CI_DRM_4268: 7138a144b0da03dcbe54731f7ac0dab6948beafb @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4503: ae0ea2a0cff1cf8516d18ada5b9db01c56b73ed9 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9160: d0aacc9e35bdcbfc2de3f612727fb5829cb6b5a2 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

d0aacc9e35bd drm/i915: Hold request reference for submission until retirement
cda0eeead5d8 drm/i915: Move engine request retirement to intel_engine_cs
91807ea0b483 drm/i915: Wait for engines to idle before retiring
0e1792ea6ee9 drm/i915: Move rate-limiting request retire to after submission
3050c9d29fb3 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
cedc7a7c2b86 drm/i915/execlists: Unify CSB access pointers
ae196ee5aeae drm/i915/execlists: Process one CSB interrupt at a time
a70428062fe9 drm/i915/execlists: Pull CSB reset under the timeline.lock
f0a4fe328767 drm/i915/execlists: Pull submit after dequeue under timeline lock
14fd12cfeba8 drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
1ba6410ddc51 drm/i915: Be irqsafe inside reset

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9160/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
  2018-05-31 18:52 ` [PATCH 07/11] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Chris Wilson
@ 2018-05-31 19:57   ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2018-05-31 19:57 UTC (permalink / raw)
  To: intel-gfx

Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on average! Deferring our work to a tasklet allowed us to do the
processing with irqs enabled, reducing the impact to an average of about
50us.

We have since eradicated the use of forcewaked mmio from inside the CSB
processing and ELSP submission, bringing the impact down to around 5us
(on Kabylake); an order of magnitude better than our measurements 2
years ago on Broadwell, and only about 2x worse on average than
gem_syslatency on an unladen system.

In this iteration of the tasklet-vs-direct submission debate, we seek a
compromise whereby we submit new requests immediately to the HW but
defer processing the CS interrupt onto a tasklet. We gain the advantage
of low latency and ksoftirqd avoidance when waking up the HW, while
avoiding the system-wide starvation caused by our CS irq storms.

Comparing the impact on the maximum latency observed (that is the time stolen
from an RT process) over a 120s interval, repeated several times (using
gem_syslatency, similar to RT's cyclictest) while the system is fully
laden with i915 nops, we see that direct submission can actually improve
the worst case.

Maximum latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 2)
  x Always using tasklets (a couple of >1000us outliers removed)
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|          +                                                             |
|          +                                                             |
|          +                                                             |
|          +       +                                                     |
|          + +     +                                                     |
|       +  + +     +  x     x     x                                      |
|      +++ + +     +  x  x  x  x  x  x                                   |
|      +++ + ++  + +  *x x  x  x  x  x                                   |
|      +++ + ++  + *  *x x  *  x  x  x                                   |
|    + +++ + ++  * * +*xxx  *  x  x  xx                                  |
|    * +++ + ++++* *x+**xx+ *  x  x xxxx x                               |
|   **x++++*++**+*x*x****x+ * +x xx xxxx x          x                    |
|x* ******+***************++*+***xxxxxx* xx*x     xxx +                x+|
|             |__________MA___________|                                  |
|      |______M__A________|                                              |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 118            91           186           124     125.28814     16.279137
+ 120            92           187           109     112.00833     13.458617
Difference at 95.0% confidence
	-13.2798 +/- 3.79219
	-10.5994% +/- 3.02677%
	(Student's t, pooled s = 14.9237)

However, the mean latency is adversely affected:

Mean latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 1)
  x Always using tasklets
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|           xxxxxx                                        +   ++         |
|           xxxxxx                                        +   ++         |
|           xxxxxx                                      + +++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ +++        |
|           xxxxxxx                                   + ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|          xxxxxxxxxx                                +++++++++++++++     |
|         xxxxxxxxxxx    x                           +++++++++++++++     |
|x       xxxxxxxxxxxxx   x           +            + ++++++++++++++++++  +|
|           |__A__|                                                      |
|                                                      |____A___|        |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120         3.506         3.727         3.631     3.6321417    0.02773109
+ 120         3.834         4.149         4.039     4.0375167   0.041221676
Difference at 95.0% confidence
	0.405375 +/- 0.00888913
	11.1608% +/- 0.244735%
	(Student's t, pooled s = 0.03513)

However, since the mean latency corresponds to the amount of irqsoff
processing we have to do for a CS interrupt, we need only speed that up
to benefit not just system latency but also our own throughput.
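
The compromise reduces to a small decision at submission time; a sketch
with made-up types (the real code holds engine->timeline.lock and keys
off the tasklet state, see __submit_queue() below):

#include <stdbool.h>
#include <stdio.h>

struct engine {
        bool reset_in_progress; /* tasklet disabled across reset */
        bool from_process_ctx;  /* new request, not a CS interrupt */
};

static void submit(struct engine *e)
{
        if (e->reset_in_progress)
                return; /* replayed when the engine restarts */

        if (e->from_process_ctx)
                puts("write ELSP directly");    /* low latency */
        else
                puts("kick the tasklet");       /* keep the irq short */
}

int main(void)
{
        struct engine submitter = { false, true };
        struct engine irq = { false, false };

        submit(&submitter);
        submit(&irq);
        return 0;
}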

v2: Remember to defer submissions when under reset.
v4: Only use direct submission for new requests
v5: Be aware that when mixing direct tasklet evaluation and deferred
tasklets, we may end up idling before running the deferred tasklet.

Testcase: igt/gem_exec_latency/*rthog*
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h         |   5 +
 drivers/gpu/drm/i915/i915_irq.c         |  11 +-
 drivers/gpu/drm/i915/intel_engine_cs.c  |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 148 ++++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 -
 5 files changed, 99 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 261da577829a..7892ac773916 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -88,4 +88,9 @@ static inline void __tasklet_enable_sync_once(struct tasklet_struct *t)
 		tasklet_kill(t);
 }
 
+static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
+{
+	return likely(!atomic_read(&t->count));
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..3a8eacab5172 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1462,14 +1462,10 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 static void
 gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
+		tasklet = true;
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
@@ -1477,7 +1473,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		tasklet_hi_schedule(&engine->execlists.tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
@@ -2185,7 +2181,6 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 
 		I915_WRITE(VLV_IER, ier);
 		I915_WRITE(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
-		POSTING_READ(GEN8_MASTER_IRQ);
 
 		gen8_gt_irq_handler(dev_priv, master_ctl, gt_iir);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index bc0193199a03..43880bf0f529 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1329,12 +1329,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 		ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
 		read = GEN8_CSB_READ_PTR(ptr);
 		write = GEN8_CSB_WRITE_PTR(ptr);
-		drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], interrupt posted? %s, tasklet queued? %s (%s)\n",
+		drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], tasklet queued? %s (%s)\n",
 			   read, execlists->csb_head,
 			   write,
 			   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
-			   yesno(test_bit(ENGINE_IRQ_EXECLIST,
-					  &engine->irq_posted)),
 			   yesno(test_bit(TASKLET_STATE_SCHED,
 					  &engine->execlists.tasklet.state)),
 			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)));
@@ -1515,11 +1513,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	spin_unlock(&b->rb_lock);
 	local_irq_restore(flags);
 
-	drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s) (execlists? %s)\n",
+	drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s)\n",
 		   engine->irq_posted,
 		   yesno(test_bit(ENGINE_IRQ_BREADCRUMB,
-				  &engine->irq_posted)),
-		   yesno(test_bit(ENGINE_IRQ_EXECLIST,
 				  &engine->irq_posted)));
 
 	drm_printf(m, "HWSP:\n");
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7b36aa984304..46e8d29acf5f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -557,13 +557,15 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 {
 	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
 
+	__unwind_incomplete_requests(container_of(execlists,
+						  typeof(struct intel_engine_cs),
+						  execlists));
 	execlists_cancel_port_requests(execlists);
-	execlists_unwind_incomplete_requests(execlists);
 
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
-static void __execlists_dequeue(struct intel_engine_cs *engine)
+static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -575,7 +577,11 @@ static void __execlists_dequeue(struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->timeline.lock);
 
-	/* Hardware submission is through 2 ports. Conceptually each port
+	GEM_BUG_ON(execlists_is_active(&engine->execlists,
+				       EXECLISTS_ACTIVE_PREEMPT));
+
+	/*
+	 * Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
 	 * static for a context, and unique to each, so we only execute
 	 * requests belonging to a single context from each ring. RING_HEAD
@@ -765,15 +771,6 @@ static void __execlists_dequeue(struct intel_engine_cs *engine)
 		   !port_isset(engine->execlists.port));
 }
 
-static void execlists_dequeue(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&engine->timeline.lock, flags);
-	__execlists_dequeue(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
-}
-
 void
 execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 {
@@ -870,14 +867,6 @@ static void reset_irq(struct intel_engine_cs *engine)
 	synchronize_hardirq(engine->i915->drm.irq);
 
 	clear_gtiir(engine);
-
-	/*
-	 * The port is checked prior to scheduling a tasklet, but
-	 * just in case we have suspended the tasklet to do the
-	 * wedging make sure that when it wakes, it decides there
-	 * is no work to do by clearing the irq_posted bit.
-	 */
-	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 }
 
 static void execlists_cancel_requests(struct intel_engine_cs *engine)
@@ -950,6 +939,12 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	local_irq_restore(flags);
 }
 
+static inline bool
+reset_in_progress(const struct intel_engine_execlists *execlists)
+{
+	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
+}
+
 static void process_csb(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -957,10 +952,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	const u32 * const buf = execlists->csb_status;
 	u8 head, tail;
 
-	/* Clear before reading to catch new interrupts */
-	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-	smp_mb__after_atomic();
-
 	/* Note that csb_write, csb_status may be either in HWSP or mmio */
 	head = execlists->csb_head;
 	tail = READ_ONCE(*execlists->csb_write);
@@ -1090,19 +1081,9 @@ static void process_csb(struct intel_engine_cs *engine)
 	execlists->csb_head = head;
 }
 
-/*
- * Check the unread Context Status Buffers and manage the submission of new
- * contexts to the ELSP accordingly.
- */
-static void execlists_submission_tasklet(unsigned long data)
+static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 {
-	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-
-	GEM_TRACE("%s awake?=%d, active=%x, irq-posted?=%d\n",
-		  engine->name,
-		  engine->i915->gt.awake,
-		  engine->execlists.active,
-		  test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+	lockdep_assert_held(&engine->timeline.lock);
 
 	/*
 	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
@@ -1114,18 +1095,33 @@ static void execlists_submission_tasklet(unsigned long data)
 	 */
 	GEM_BUG_ON(!engine->i915->gt.awake);
 
-	/*
-	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
-	 * imposing the cost of a locked atomic transaction when submitting a
-	 * new request (outside of the context-switch interrupt).
-	 */
-	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-		process_csb(engine);
-
+	process_csb(engine);
 	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
 		execlists_dequeue(engine);
 }
 
+/*
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
+static void execlists_submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
+	unsigned long flags;
+
+	GEM_TRACE("%s awake?=%d, active=%x\n",
+		  engine->name,
+		  engine->i915->gt.awake,
+		  engine->execlists.active);
+
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+
+	if (engine->i915->gt.awake) /* we may be delayed until after we idle! */
+		__execlists_submission_tasklet(engine);
+
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+}
+
 static void queue_request(struct intel_engine_cs *engine,
 			  struct i915_sched_node *node,
 			  int prio)
@@ -1134,16 +1130,30 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __update_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
+static void __submit_queue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	if (reset_in_progress(execlists))
+		return; /* defer until we restart the engine following reset */
+
+	if (execlists->tasklet.func == execlists_submission_tasklet)
+		__execlists_submission_tasklet(engine);
+	else
+		tasklet_hi_schedule(&execlists->tasklet);
 }
 
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+	if (prio > engine->execlists.queue_priority) {
+		__update_queue(engine, prio);
+		__submit_queue(engine);
+	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1155,11 +1165,12 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
+	submit_queue(engine, rq_prio(request));
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
@@ -1286,8 +1297,11 @@ static void execlists_schedule(struct i915_request *request,
 		}
 
 		if (prio > engine->execlists.queue_priority &&
-		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
+			/* defer submission until after all of our updates */
+			__update_queue(engine, prio);
+			tasklet_hi_schedule(&engine->execlists.tasklet);
+		}
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
@@ -1831,6 +1845,7 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request *request, *active;
+	unsigned long flags;
 
 	GEM_TRACE("%s\n", engine->name);
 
@@ -1852,8 +1867,10 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 * and avoid blaming an innocent request if the stall was due to the
 	 * preemption itself.
 	 */
-	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-		process_csb(engine);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+	synchronize_hardirq(engine->i915->drm.irq);
+
+	process_csb(engine);
 
 	/*
 	 * The last active request can then be no later than the last request
@@ -1863,15 +1880,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	active = NULL;
 	request = port_request(execlists->port);
 	if (request) {
-		unsigned long flags;
-
 		/*
 		 * Prevent the breadcrumb from advancing before we decide
 		 * which request is currently active.
 		 */
 		intel_engine_stop_cs(engine);
 
-		spin_lock_irqsave(&engine->timeline.lock, flags);
 		list_for_each_entry_from_reverse(request,
 						 &engine->timeline.requests,
 						 link) {
@@ -1881,12 +1895,28 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 
 			active = request;
 		}
-		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 	}
 
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
 	return active;
 }
 
+static void reset_csb_pointers(struct intel_engine_execlists *execlists)
+{
+	/*
+	 * After a reset, the HW starts writing into CSB entry [0]. We
+	 * therefore have to set our HEAD pointer back one entry so that
+	 * the *first* entry we check is entry 0. To complicate this further,
+	 * as we don't wait for the first interrupt after reset, we have to
+	 * fake the HW write to point back to the last entry so that our
+	 * inline comparison of our cached head position against the last HW
+	 * write works even before the first interrupt.
+	 */
+	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
+	WRITE_ONCE(*execlists->csb_write, (GEN8_CSB_ENTRIES - 1) | 0xff << 16);
+}
+
 static void execlists_reset(struct intel_engine_cs *engine,
 			    struct i915_request *request)
 {
@@ -1916,7 +1946,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	__unwind_incomplete_requests(engine);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
-	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+	reset_csb_pointers(&engine->execlists);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
@@ -2408,7 +2438,6 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 			upper_32_bits(ce->lrc_desc);
 	}
 
-	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
 	execlists->csb_read =
 		i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
 	if (csb_force_mmio(i915)) {
@@ -2423,6 +2452,7 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 		execlists->csb_write =
 			&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
 	}
+	reset_csb_pointers(execlists);
 
 	return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 88418477f9ab..f02346291080 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -358,7 +358,6 @@ struct intel_engine_cs {
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-#define ENGINE_IRQ_EXECLIST 1
 
 	/* Rather than have every client wait upon all user interrupts,
 	 * with the herd waking after every interrupt and each doing the
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2)
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (12 preceding siblings ...)
  2018-05-31 19:48 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2018-05-31 20:14 ` Patchwork
  2018-05-31 20:33 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 20:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2)
URL   : https://patchwork.freedesktop.org/series/44041/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
df8d5c388e8c drm/i915: Be irqsafe inside reset
827a42f841af drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
-:41: WARNING:LONG_LINE: line over 100 characters
#41: FILE: drivers/gpu/drm/i915/intel_lrc.c:979:
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));

total: 0 errors, 1 warnings, 0 checks, 42 lines checked
d75be47cd2c7 drm/i915/execlists: Pull submit after dequeue under timeline lock
089902205660 drm/i915/execlists: Pull CSB reset under the timeline.lock
bd7b385b52f6 drm/i915/execlists: Process one CSB interrupt at a time
-:35: WARNING:MEMORY_BARRIER: memory barrier without comment
#35: FILE: drivers/gpu/drm/i915/intel_lrc.c:966:
+	smp_mb__after_atomic();

-:77: WARNING:LONG_LINE: line over 100 characters
#77: FILE: drivers/gpu/drm/i915/intel_lrc.c:990:
+		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",

-:78: WARNING:LONG_LINE: line over 100 characters
#78: FILE: drivers/gpu/drm/i915/intel_lrc.c:991:
+		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");

-:140: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#140: FILE: drivers/gpu/drm/i915/intel_lrc.c:1022:
+			  status, buf[2*head + 1],
 			               ^

-:176: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#176: FILE: drivers/gpu/drm/i915/intel_lrc.c:1040:
+		    buf[2*head + 1] == execlists->preempt_complete_status) {
 		         ^

total: 0 errors, 3 warnings, 2 checks, 301 lines checked
0c7415c81af2 drm/i915/execlists: Unify CSB access pointers
fc7a70b009e1 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
-:27: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#27: 
Comparing the impact on the maximum latency observed (that is the time stolen

-:102: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")'
#102: 
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")

total: 1 errors, 1 warnings, 0 checks, 359 lines checked
95590ea35f5d drm/i915: Move rate-limiting request retire to after submission
2f23f62da048 drm/i915: Wait for engines to idle before retiring
f9ddd078e8f5 drm/i915: Move engine request retirement to intel_engine_cs
f14b08071ab6 drm/i915: Hold request reference for submission until retirement

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] drm/i915: Hold request reference for submission until retirement
  2018-05-31 18:52 ` [PATCH 11/11] drm/i915: Hold request reference for submission until retirement Chris Wilson
@ 2018-05-31 20:21   ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2018-05-31 20:21 UTC (permalink / raw)
  To: intel-gfx

Currently the async submission backends (guc and execlists) hold an
extra reference to the requests being processed, as they are not
serialised with request retirement. If, instead, we prevent the request
from being dropped from the engine timeline until after submission has
finished processing it, we can use a single reference held for the
entire submission process (currently, it is held only for the
submission fence).

By doing so we remove a few atomics from inside the irqoff path, on the
order of 200ns as measured by gem_syslatency, bringing the impact of
direct submission into line with the previous tasklet implementation.
The tradeoff is that, as we may postpone the retirement, we have to
check for any residual requests after detecting that the engines are
idle.

v2: switch-to-kernel-context needs to be cognisant of the delayed
release on the engine->timeline again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c     |  2 +-
 drivers/gpu/drm/i915/i915_request.c         | 20 +++++-----
 drivers/gpu/drm/i915/intel_engine_cs.c      | 42 +++++++++++++++++----
 drivers/gpu/drm/i915/intel_guc_submission.c |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c            | 10 +----
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  4 +-
 6 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 94e4db1870aa..cfce2531c19f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -641,7 +641,7 @@ static bool engine_has_kernel_context_barrier(struct intel_engine_cs *engine)
 		return true;
 
 	/* The engine is idle; check that it is idling in the kernel context. */
-	return engine->last_retired_context == ce;
+	return intel_engine_has_kernel_context(engine);
 }
 
 int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f91942e4d852..2cfa6f3a2f16 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -347,17 +347,15 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_upto(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
+	struct list_head * const requests = &engine->timeline.requests;
 	struct i915_request *tmp;
 
 	if (list_empty(&rq->link))
 		return;
 
-	do {
-		tmp = list_first_entry(&engine->timeline.requests,
-				       typeof(*tmp), link);
-
-		intel_engine_retire_request(engine, tmp);
-	} while (tmp != rq);
+	do
+		tmp = list_first_entry(requests, typeof(*tmp), link);
+	while (intel_engine_retire_request(engine, tmp) && tmp != rq);
 }
 
 static void i915_request_retire(struct i915_request *request)
@@ -376,6 +374,8 @@ static void i915_request_retire(struct i915_request *request)
 
 	trace_i915_request_retire(request);
 
+	__retire_engine_upto(request->engine, request);
+
 	advance_ring(request);
 	free_capture_list(request);
 
@@ -414,8 +414,6 @@ static void i915_request_retire(struct i915_request *request)
 	atomic_dec_if_positive(&request->gem_context->ban_score);
 	intel_context_unpin(request->hw_context);
 
-	__retire_engine_upto(request->engine, request);
-
 	unreserve_gt(request->i915);
 
 	i915_sched_node_fini(request->i915, &request->sched);
@@ -722,8 +720,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		       rq->timeline->fence_context,
 		       timeline_get_seqno(rq->timeline));
 
-	/* We bump the ref for the fence chain */
-	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
+	/* We bump the ref for the fence chain and for the submit backend. */
+	refcount_set(&rq->fence.refcount.refcount, 3);
+
+	i915_sw_fence_init(&rq->submit, submit_notify);
 	init_waitqueue_head(&rq->execute);
 
 	i915_sched_node_init(&rq->sched);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index cce7234b9071..6fa2ef8862ce 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1035,11 +1035,11 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
  * executed if the engine is already idle, is the kernel context
  * (#i915.kernel_context).
  */
-bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
+bool intel_engine_has_kernel_context(struct intel_engine_cs *engine)
 {
 	const struct intel_context *kernel_context =
 		to_intel_context(engine->i915->kernel_context, engine);
-	struct i915_request *rq;
+	const struct intel_context *last;
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
@@ -1048,11 +1048,15 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 	 * the last request that remains in the timeline. When idle, it is
 	 * the last executed context as tracked by retirement.
 	 */
-	rq = __i915_gem_active_peek(&engine->timeline.last_request);
-	if (rq)
-		return rq->hw_context == kernel_context;
-	else
-		return engine->last_retired_context == kernel_context;
+	last = engine->last_retired_context;
+
+	spin_lock_irq(&engine->timeline.lock);
+	if (!list_empty(&engine->timeline.requests))
+		last = list_last_entry(&engine->timeline.requests,
+				       struct i915_request, link)->hw_context;
+	spin_unlock_irq(&engine->timeline.lock);
+
+	return last == kernel_context;
 }
 
 void intel_engines_reset_default_submission(struct drm_i915_private *i915)
@@ -1071,8 +1075,10 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
  *
  * This request has been completed and is part of the chain being retired by
  * the caller, so drop any reference to it from the engine.
+ *
+ * Returns: true if the reference was dropped, false if it was still busy.
  */
-void intel_engine_retire_request(struct intel_engine_cs *engine,
+bool intel_engine_retire_request(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
 	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
@@ -1085,6 +1091,10 @@ void intel_engine_retire_request(struct intel_engine_cs *engine,
 	GEM_BUG_ON(rq->engine != engine);
 	GEM_BUG_ON(!i915_request_completed(rq));
 
+	/* Don't drop the final ref until after the backend has finished */
+	if (port_request(engine->execlists.port) == rq)
+		return false;
+
 	local_irq_disable();
 
 	spin_lock(&engine->timeline.lock);
@@ -1116,6 +1126,19 @@ void intel_engine_retire_request(struct intel_engine_cs *engine,
 	if (engine->last_retired_context)
 		intel_context_unpin(engine->last_retired_context);
 	engine->last_retired_context = rq->hw_context;
+
+	i915_request_put(rq);
+	return true;
+}
+
+static void engine_retire_requests(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq, *next;
+
+	list_for_each_entry_safe(rq, next, &engine->timeline.requests, link) {
+		if (WARN_ON(!intel_engine_retire_request(engine, rq)))
+			break;
+	}
 }
 
 /**
@@ -1148,6 +1171,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 				"%s is not idle before parking\n",
 				engine->name);
 			intel_engine_dump(engine, &p, NULL);
+			engine->cancel_requests(engine);
 		}
 
 		/* Must be reset upon idling, or we may miss the busy wakeup. */
@@ -1156,6 +1180,8 @@ void intel_engines_park(struct drm_i915_private *i915)
 		if (engine->park)
 			engine->park(engine);
 
+		engine_retire_requests(engine);
+
 		if (engine->pinned_default_state) {
 			i915_gem_object_unpin_map(engine->default_state);
 			engine->pinned_default_state = NULL;
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 133367a17863..6f6223644140 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -669,8 +669,7 @@ static void guc_submit(struct intel_engine_cs *engine)
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(port_isset(port));
-
-	port_set(port, i915_request_get(rq));
+	port_set(port, rq);
 }
 
 static inline int rq_prio(const struct i915_request *rq)
@@ -793,7 +792,6 @@ static void guc_submission_tasklet(unsigned long data)
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
-		i915_request_put(rq);
 
 		port = execlists_port_complete(execlists, port);
 		if (port_isset(port)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 46e8d29acf5f..d519231200d9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -513,11 +513,7 @@ static bool can_merge_ctx(const struct intel_context *prev,
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
-
-	if (port_isset(port))
-		i915_request_put(port_request(port));
-
-	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
+	port_set(port, port_pack(rq, port_count(port)));
 }
 
 static void inject_preempt_context(struct intel_engine_cs *engine)
@@ -793,8 +789,6 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 					       INTEL_CONTEXT_SCHEDULE_OUT :
 					       INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 
-		i915_request_put(rq);
-
 		memset(port, 0, sizeof(*port));
 		port++;
 	}
@@ -1061,8 +1055,6 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			execlists_context_schedule_out(rq,
 						       INTEL_CONTEXT_SCHEDULE_OUT);
-			i915_request_put(rq);
-
 			GEM_TRACE("%s completed ctx=%d\n",
 				  engine->name, port->context_id);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bd0067047044..3265e2ca047d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -888,7 +888,7 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
-void intel_engine_retire_request(struct intel_engine_cs *engine,
+bool intel_engine_retire_request(struct intel_engine_cs *engine,
 				 struct i915_request *rq);
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 
@@ -1064,7 +1064,7 @@ gen8_emit_ggtt_write(u32 *cs, u32 value, u32 gtt_offset)
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 
-bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine);
+bool intel_engine_has_kernel_context(struct intel_engine_cs *engine);
 void intel_engine_lost_context(struct intel_engine_cs *engine);
 
 void intel_engines_park(struct drm_i915_private *i915);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2)
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (13 preceding siblings ...)
  2018-05-31 20:14 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2) Patchwork
@ 2018-05-31 20:33 ` Patchwork
  2018-05-31 20:37 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3) Patchwork
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 20:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2)
URL   : https://patchwork.freedesktop.org/series/44041/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4268 -> Patchwork_9162 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/44041/revisions/2/mbox/

== Known issues ==

  Here are the changes found in Patchwork_9162 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_module_reload@basic-reload-inject:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106725, fdo#106248)

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         PASS -> FAIL (fdo#102575)

    igt@kms_flip@basic-flip-vs-dpms:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106000)

    igt@prime_vgem@basic-fence-flip:
      fi-ilk-650:         PASS -> FAIL (fdo#104008)

    
    ==== Possible fixes ====

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       FAIL (fdo#103841, fdo#102672) -> SKIP

    igt@kms_flip@basic-flip-vs-modeset:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248
  fdo#106725 https://bugs.freedesktop.org/show_bug.cgi?id=106725


== Participating hosts (42 -> 40) ==

  Additional (1): fi-bxt-dsi 
  Missing    (3): fi-ctg-p8600 fi-ilk-m540 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4268 -> Patchwork_9162

  CI_DRM_4268: 7138a144b0da03dcbe54731f7ac0dab6948beafb @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4503: ae0ea2a0cff1cf8516d18ada5b9db01c56b73ed9 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9162: f14b08071ab62b30a7662d290c8f7af8e0a6de6d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

f14b08071ab6 drm/i915: Hold request reference for submission until retirement
f9ddd078e8f5 drm/i915: Move engine request retirement to intel_engine_cs
2f23f62da048 drm/i915: Wait for engines to idle before retiring
95590ea35f5d drm/i915: Move rate-limiting request retire to after submission
fc7a70b009e1 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
0c7415c81af2 drm/i915/execlists: Unify CSB access pointers
bd7b385b52f6 drm/i915/execlists: Process one CSB interrupt at a time
089902205660 drm/i915/execlists: Pull CSB reset under the timeline.lock
d75be47cd2c7 drm/i915/execlists: Pull submit after dequeue under timeline lock
827a42f841af drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
df8d5c388e8c drm/i915: Be irqsafe inside reset

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9162/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3)
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (14 preceding siblings ...)
  2018-05-31 20:33 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-31 20:37 ` Patchwork
  2018-05-31 20:54 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-05-31 21:54 ` ✗ Fi.CI.IGT: failure " Patchwork
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 20:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3)
URL   : https://patchwork.freedesktop.org/series/44041/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
4f0a70687283 drm/i915: Be irqsafe inside reset
80c4b2f2bb87 drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
-:41: WARNING:LONG_LINE: line over 100 characters
#41: FILE: drivers/gpu/drm/i915/intel_lrc.c:979:
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));

total: 0 errors, 1 warnings, 0 checks, 42 lines checked
805cf1fa48d2 drm/i915/execlists: Pull submit after dequeue under timeline lock
4430842b2698 drm/i915/execlists: Pull CSB reset under the timeline.lock
db1455d68cf7 drm/i915/execlists: Process one CSB interrupt at a time
-:35: WARNING:MEMORY_BARRIER: memory barrier without comment
#35: FILE: drivers/gpu/drm/i915/intel_lrc.c:966:
+	smp_mb__after_atomic();

-:77: WARNING:LONG_LINE: line over 100 characters
#77: FILE: drivers/gpu/drm/i915/intel_lrc.c:990:
+		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",

-:78: WARNING:LONG_LINE: line over 100 characters
#78: FILE: drivers/gpu/drm/i915/intel_lrc.c:991:
+		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");

-:140: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#140: FILE: drivers/gpu/drm/i915/intel_lrc.c:1022:
+			  status, buf[2*head + 1],
 			               ^

-:176: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#176: FILE: drivers/gpu/drm/i915/intel_lrc.c:1040:
+		    buf[2*head + 1] == execlists->preempt_complete_status) {
 		         ^

total: 0 errors, 3 warnings, 2 checks, 301 lines checked
765d6aaa9b4a drm/i915/execlists: Unify CSB access pointers
54bfb646a85a drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
-:27: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#27: 
Comparing the impact on the maximum latency observed (that is the time stolen

-:102: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")'
#102: 
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")

total: 1 errors, 1 warnings, 0 checks, 359 lines checked
f68f438e5165 drm/i915: Move rate-limiting request retire to after submission
9d9175a4bd19 drm/i915: Wait for engines to idle before retiring
5cb96575ec5c drm/i915: Move engine request retirement to intel_engine_cs
eb2138838e70 drm/i915: Hold request reference for submission until retirement

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3)
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (15 preceding siblings ...)
  2018-05-31 20:37 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3) Patchwork
@ 2018-05-31 20:54 ` Patchwork
  2018-05-31 21:54 ` ✗ Fi.CI.IGT: failure " Patchwork
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 20:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3)
URL   : https://patchwork.freedesktop.org/series/44041/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4268 -> Patchwork_9163 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/44041/revisions/3/mbox/

== Known issues ==

  Here are the changes found in Patchwork_9163 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         PASS -> FAIL (fdo#102575)

    igt@kms_frontbuffer_tracking@basic:
      fi-hsw-peppy:       PASS -> DMESG-FAIL (fdo#106103, fdo#102614)

    igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#105719)

    
    ==== Possible fixes ====

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       FAIL (fdo#103841, fdo#102672) -> SKIP

    igt@kms_flip@basic-flip-vs-modeset:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS

    
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#105719 https://bugs.freedesktop.org/show_bug.cgi?id=105719
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106103 https://bugs.freedesktop.org/show_bug.cgi?id=106103


== Participating hosts (42 -> 40) ==

  Additional (1): fi-bxt-dsi 
  Missing    (3): fi-ctg-p8600 fi-ilk-m540 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4268 -> Patchwork_9163

  CI_DRM_4268: 7138a144b0da03dcbe54731f7ac0dab6948beafb @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4503: ae0ea2a0cff1cf8516d18ada5b9db01c56b73ed9 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9163: eb2138838e704ba6d157a492695a3467dd17e4c4 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

eb2138838e70 drm/i915: Hold request reference for submission until retirement
5cb96575ec5c drm/i915: Move engine request retirement to intel_engine_cs
9d9175a4bd19 drm/i915: Wait for engines to idle before retiring
f68f438e5165 drm/i915: Move rate-limiting request retire to after submission
54bfb646a85a drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
765d6aaa9b4a drm/i915/execlists: Unify CSB access pointers
db1455d68cf7 drm/i915/execlists: Process one CSB interrupt at a time
4430842b2698 drm/i915/execlists: Pull CSB reset under the timeline.lock
805cf1fa48d2 drm/i915/execlists: Pull submit after dequeue under timeline lock
80c4b2f2bb87 drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
4f0a70687283 drm/i915: Be irqsafe inside reset

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9163/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3)
  2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
                   ` (16 preceding siblings ...)
  2018-05-31 20:54 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-31 21:54 ` Patchwork
  17 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2018-05-31 21:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3)
URL   : https://patchwork.freedesktop.org/series/44041/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4268_full -> Patchwork_9163_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9163_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9163_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/44041/revisions/3/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9163_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL

    
    ==== Warnings ====

    igt@drv_selftest@live_execlists:
      shard-apl:          PASS -> SKIP +1

    igt@gem_exec_schedule@deep-blt:
      shard-kbl:          PASS -> SKIP +2

    igt@gem_mocs_settings@mocs-rc6-vebox:
      shard-kbl:          SKIP -> PASS +3

    igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-draw-mmap-gtt:
      shard-hsw:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9163_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_gtt:
      shard-kbl:          PASS -> FAIL (fdo#105347)
      shard-glk:          PASS -> INCOMPLETE (fdo#103359, k.org#198133)

    igt@kms_flip_tiling@flip-x-tiled:
      shard-glk:          PASS -> FAIL (fdo#103822, fdo#104724) +1

    igt@kms_setmode@basic:
      shard-apl:          PASS -> FAIL (fdo#99912)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_hangcheck:
      shard-kbl:          DMESG-FAIL (fdo#106560) -> PASS

    igt@gem_eio@hibernate:
      shard-snb:          FAIL (fdo#105957) -> PASS

    igt@gem_eio@suspend:
      shard-snb:          INCOMPLETE (fdo#105411) -> PASS

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking:
      shard-glk:          FAIL (fdo#105703) -> PASS

    
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105347 https://bugs.freedesktop.org/show_bug.cgi?id=105347
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#105957 https://bugs.freedesktop.org/show_bug.cgi?id=105957
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4268 -> Patchwork_9163

  CI_DRM_4268: 7138a144b0da03dcbe54731f7ac0dab6948beafb @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4503: ae0ea2a0cff1cf8516d18ada5b9db01c56b73ed9 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9163: eb2138838e704ba6d157a492695a3467dd17e4c4 @ git://anongit.freedesktop.org/gfx-ci/linux

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9163/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/11] drm/i915: Be irqsafe inside reset
  2018-05-31 18:51 ` [PATCH 01/11] drm/i915: Be irqsafe inside reset Chris Wilson
@ 2018-06-01 13:44   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-01 13:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/05/2018 19:51, Chris Wilson wrote:
> As we want to be able to call i915_reset_engine and co from a softirq or
> timer context, we need to be irqsafe at all timers. So we have to forgo

s/timers/times/

> the simple spin_lock_irq for the full spin_lock_irqsave.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f5c4ef052001..550fa8288c45 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3142,15 +3142,17 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
>   		 */
>   		request = i915_gem_find_active_request(engine);
>   		if (request) {
> +			unsigned long flags;
> +
>   			i915_gem_context_mark_innocent(request->gem_context);
>   			dma_fence_set_error(&request->fence, -EAGAIN);
>   
>   			/* Rewind the engine to replay the incomplete rq */
> -			spin_lock_irq(&engine->timeline.lock);
> +			spin_lock_irqsave(&engine->timeline.lock, flags);
>   			request = list_prev_entry(request, link);
>   			if (&request->link == &engine->timeline.requests)
>   				request = NULL;
> -			spin_unlock_irq(&engine->timeline.lock);
> +			spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   		}
>   	}
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/11] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
  2018-05-31 18:51 ` [PATCH 02/11] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization Chris Wilson
@ 2018-06-01 13:54   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-01 13:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/05/2018 19:51, Chris Wilson wrote:
> We can avoid the mmio read of the CSB pointers after reset based on the
> knowledge that the HW always starts writing at entry 0 in the CSB buffer.
> We need to reset our CSB head tracking after GPU reset (and on
> sanitization after resume) so that we are expecting to read from entry
> 0.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 15 ++++++---------
>   1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 517e92c6a70b..e5cf049c18f8 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -965,22 +965,19 @@ static void process_csb(struct intel_engine_cs *engine)
>   			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>   		unsigned int head, tail;
>   
> -		if (unlikely(execlists->csb_use_mmio)) {
> -			buf = (u32 * __force)
> -				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> -			execlists->csb_head = -1; /* force mmio read of CSB */
> -		}
> -
>   		/* Clear before reading to catch new interrupts */
>   		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>   		smp_mb__after_atomic();
>   
> -		if (unlikely(execlists->csb_head == -1)) { /* after a reset */
> +		if (unlikely(execlists->csb_use_mmio)) {
>   			if (!fw) {
>   				intel_uncore_forcewake_get(i915, execlists->fw_domains);
>   				fw = true;
>   			}
>   
> +			buf = (u32 * __force)
> +				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +
>   			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
>   			tail = GEN8_CSB_WRITE_PTR(head);
>   			head = GEN8_CSB_READ_PTR(head);
> @@ -1959,7 +1956,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
>   	spin_unlock(&engine->timeline.lock);
>   
>   	/* Following the reset, we need to reload the CSB read/write pointers */
> -	engine->execlists.csb_head = -1;
> +	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
>   
>   	local_irq_restore(flags);
>   
> @@ -2460,7 +2457,7 @@ static int logical_ring_init(struct intel_engine_cs *engine)
>   			upper_32_bits(ce->lrc_desc);
>   	}
>   
> -	engine->execlists.csb_head = -1;
> +	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
>   
>   	return 0;
>   
> 

Looks OK. Just not that exciting due to the rarity of the event.
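
The GEN8_CSB_ENTRIES - 1 initial value works because process_csb()
pre-increments head before reading an entry, so the first iteration
after a reset wraps straight to index 0. A sketch, using
GEN8_CSB_ENTRIES == 6:

	unsigned int head = GEN8_CSB_ENTRIES - 1; /* 5 after reset */

	if (++head == GEN8_CSB_ENTRIES)
		head = 0; /* first entry consumed is index 0 */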

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-05-31 18:51 ` [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock Chris Wilson
@ 2018-06-01 14:02   ` Tvrtko Ursulin
  2018-06-01 14:07     ` Chris Wilson
  2018-06-04 14:19   ` Tvrtko Ursulin
  1 sibling, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-01 14:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/05/2018 19:51, Chris Wilson wrote:
> In the next patch, we will begin processing the CSB from inside the
> interrupt handler. This means that updating the execlists->port[] will

The message that we will be processing CSB from irq handler, here and in 
following patch 5/11, doesn't seem to be true. So why is this needed? 
Why not just drop some patches from the series?

Regards,

Tvrtko

> no longer be locked by the tasklet but by the engine->timeline.lock
> instead. Pull dequeue and submit under the same lock for protection.
> (An alternative, future, plan is to keep the in/out arrays separate for
> concurrent processing and reduced lock coverage.)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 32 ++++++++++++--------------------
>   1 file changed, 12 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e5cf049c18f8..d207a1bf9dc9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -562,7 +562,7 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
>   	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
>   }
>   
> -static bool __execlists_dequeue(struct intel_engine_cs *engine)
> +static void __execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> @@ -617,11 +617,11 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		 * the HW to indicate that it has had a chance to respond.
>   		 */
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> -			return false;
> +			return;
>   
>   		if (need_preempt(engine, last, execlists->queue_priority)) {
>   			inject_preempt_context(engine);
> -			return false;
> +			return;
>   		}
>   
>   		/*
> @@ -646,7 +646,7 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		 * priorities of the ports haven't been switch.
>   		 */
>   		if (port_count(&port[1]))
> -			return false;
> +			return;
>   
>   		/*
>   		 * WaIdleLiteRestore:bdw,skl
> @@ -746,8 +746,10 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		port != execlists->port ? rq_prio(last) : INT_MIN;
>   
>   	execlists->first = rb;
> -	if (submit)
> +	if (submit) {
>   		port_assign(port, last);
> +		execlists_submit_ports(engine);
> +	}
>   
>   	/* We must always keep the beast fed if we have work piled up */
>   	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> @@ -756,24 +758,19 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   	if (last)
>   		execlists_user_begin(execlists, execlists->port);
>   
> -	return submit;
> +	/* If the engine is now idle, so should be the flag; and vice versa. */
> +	GEM_BUG_ON(execlists_is_active(&engine->execlists,
> +				       EXECLISTS_ACTIVE_USER) ==
> +		   !port_isset(engine->execlists.port));
>   }
>   
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	unsigned long flags;
> -	bool submit;
>   
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
> -	submit = __execlists_dequeue(engine);
> +	__execlists_dequeue(engine);
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> -
> -	if (submit)
> -		execlists_submit_ports(engine);
> -
> -	GEM_BUG_ON(port_isset(execlists->port) &&
> -		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
>   }
>   
>   void
> @@ -1156,11 +1153,6 @@ static void execlists_submission_tasklet(unsigned long data)
>   
>   	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
>   		execlists_dequeue(engine);
> -
> -	/* If the engine is now idle, so should be the flag; and vice versa. */
> -	GEM_BUG_ON(execlists_is_active(&engine->execlists,
> -				       EXECLISTS_ACTIVE_USER) ==
> -		   !port_isset(engine->execlists.port));
>   }
>   
>   static void queue_request(struct intel_engine_cs *engine,
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-06-01 14:02   ` Tvrtko Ursulin
@ 2018-06-01 14:07     ` Chris Wilson
  2018-06-04  9:25       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2018-06-01 14:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-01 15:02:38)
> 
> On 31/05/2018 19:51, Chris Wilson wrote:
> > In the next patch, we will begin processing the CSB from inside the
> > interrupt handler. This means that updating the execlists->port[] will
> 
> The message that we will be processing CSB from irq handler, here and in 
> following patch 5/11, doesn't seem to be true. So why is this needed? 
> Why not just drop some patches from the series?

It will still be called from irq context, as submit_request is called
from irq context. Hence we still require irqsafe variants.
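
The distinction matters because plain spin_unlock_irq() unconditionally
re-enables local interrupts, which would be wrong for a caller that
already had them disabled. A generic sketch (stand-in lock, not the
i915 code):

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(lock); /* stand-in for engine->timeline.lock */

	static void critical_section(void)
	{
		unsigned long flags;

		spin_lock_irqsave(&lock, flags); /* safe from any context */
		/* ... touch the protected state ... */
		spin_unlock_irqrestore(&lock, flags); /* restores prior irq state */
	}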

So yes, while the overall direction of the patchset changed slightly to
be less enthusiastic about calling from irq context, such calls were not
eliminated.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-06-01 14:07     ` Chris Wilson
@ 2018-06-04  9:25       ` Tvrtko Ursulin
  2018-06-04 10:12         ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-04  9:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/06/2018 15:07, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-01 15:02:38)
>>
>> On 31/05/2018 19:51, Chris Wilson wrote:
>>> In the next patch, we will begin processing the CSB from inside the
>>> interrupt handler. This means that updating the execlists->port[] will
>>
>> The message that we will be processing CSB from irq handler, here and in
>> following patch 5/11, doesn't seem to be true. So why is this needed?
>> Why not just drop some patches from the series?
> 
> It will still be called from irq context, as submit_request is called
> from irq context. Hence we still require irqsafe variants.

submit_request is already called from irqoff contexts so I don't get it.

> So yes, while the overall direction of the patchset changed slightly to
> be less enthusiastic about calling from irq context, such calls were not
> eliminated.

I have a problem with commit messages where one says we'll be processing 
CSB directly from hard irq, while another says we'll always use the tasklet.

It is very hard to review it like this, since I cannot rely on the high 
level narrative for a little bit of help in figuring out what is needed 
in this latest incarnation, and what perhaps isn't. And we shouldn't 
really do it like this in general (have inconsistent commit messages).

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-06-04  9:25       ` Tvrtko Ursulin
@ 2018-06-04 10:12         ` Chris Wilson
  2018-06-04 10:58           ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2018-06-04 10:12 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-04 10:25:46)
> 
> On 01/06/2018 15:07, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-01 15:02:38)
> >>
> >> On 31/05/2018 19:51, Chris Wilson wrote:
> >>> In the next patch, we will begin processing the CSB from inside the
> >>> interrupt handler. This means that updating the execlists->port[] will
> >>
> >> The message that we will be processing CSB from irq handler, here and in
> >> following patch 5/11, doesn't seem to be true. So why is this needed?
> >> Why not just drop some patches from the series?
> > 
> > It will still be called from irq context; as submit_request is called
> > from irq context. Hence we still require irqsafe variants.
> 
> submit_request is already called from irqoff contexts so I don't get it.

Right, but the patch series pulls the CSB processing into
submit_request as well.
 
> > So yes, while the overall direction of the patchset changed slightly to
> > be less enthusiastic about calling from irq context, such calls were not
> > eliminated.
> 
> I have a problem with commit messages where one says we'll be processing 
> CSB directly from hard irq, while another says we'll always use the tasklet.

It's done inside the tasklet callback; the tasklet callback may be
invoked directly, and that may also happen inside an irq.
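
Roughly along these lines (a sketch of the idea, not the exact patch):

	#include <linux/interrupt.h>

	/* Run the tasklet callback inline if we can claim it */
	static void tasklet_run_direct(struct tasklet_struct *t)
	{
		if (tasklet_trylock(t)) {
			t->func(t->data); /* inline, even under an irq */
			tasklet_unlock(t);
		} else {
			tasklet_hi_schedule(t); /* already running elsewhere */
		}
	}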
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-06-04 10:12         ` Chris Wilson
@ 2018-06-04 10:58           ` Tvrtko Ursulin
  2018-06-04 11:15             ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-04 10:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/06/2018 11:12, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-04 10:25:46)
>>
>> On 01/06/2018 15:07, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-06-01 15:02:38)
>>>>
>>>> On 31/05/2018 19:51, Chris Wilson wrote:
>>>>> In the next patch, we will begin processing the CSB from inside the
>>>>> interrupt handler. This means that updating the execlists->port[] will
>>>>
>>>> The message that we will be processing CSB from irq handler, here and in
>>>> following patch 5/11, doesn't seem to be true. So why is this needed?
>>>> Why not just drop some patches from the series?
>>>
>>> It will still be called from irq context, as submit_request is called
>>> from irq context. Hence we still require irqsafe variants.
>>
>> submit_request is already called from irqoff contexts so I don't get it.
> 
> Right, but the patch series pulls the CSB processing into
> submit_request as well.
>   
>>> So yes, while the overall direction of the patchset changed slightly to
>>> be less enthusiastic about calling from irq context, such calls were not
>>> eliminated.
>>
>> I have a problem with commit messages where one says we'll be processing
>> CSB directly from hard irq, while another says we'll always use the tasklet.
> 
> It's done inside the tasklet callback; the tasklet callback may be
> invoked directly, and that may also happen inside an irq.

Via notify_ring yes. I was looking for something on the context switch 
path all this time.

So CSB processing via notify_ring is quite optimistic, and my question is 
whether it brings anything. Or is the whole change purely due to the dequeue?

Another mystery later in the patch series is what happens with 
writel(execlists->csb_read), which is mmio, but I see you have removed 
forcewake handling and one commit says there is no more mmio.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-06-04 10:58           ` Tvrtko Ursulin
@ 2018-06-04 11:15             ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2018-06-04 11:15 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-04 11:58:00)
> 
> On 04/06/2018 11:12, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-04 10:25:46)
> >>
> >> On 01/06/2018 15:07, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-06-01 15:02:38)
> >>>>
> >>>> On 31/05/2018 19:51, Chris Wilson wrote:
> >>>>> In the next patch, we will begin processing the CSB from inside the
> >>>>> interrupt handler. This means that updating the execlists->port[] will
> >>>>
> >>>> The message that we will be processing CSB from irq handler, here and in
> >>>> following patch 5/11, doesn't seem to be true. So why is this needed?
> >>>> Why not just drop some patches from the series?
> >>>
> >>> It will still be called from irq context, as submit_request is called
> >>> from irq context. Hence we still require irqsafe variants.
> >>
> >> submit_request is already called from irqoff contexts so I don't get it.
> > 
> > Right, but the patch series pulls the CSB processing into
> > submit_request as well.
> >   
> >>> So yes, while the overall direction of the patchset changed slightly to
> >>> be less enthusiastic about calling from irq context, such calls were not
> >>> eliminated.
> >>
> >> I have a problem with commit messages where one says we'll be processing
> >> CSB directly from hard irq, while another says we'll always use the tasklet.
> > 
> > It's done inside the tasklet callback; the tasklet callback may be
> > invoked directly, and that may also happen inside an irq.
> 
> Via notify_ring yes. I was looking for something on the context switch 
> path all this time.

Also don't forget third-party callers may come in from under their irq.
 
> So CSB processing via notify_ring is quite optimistic, and my question is
> whether it brings anything. Or is the whole change purely due to the dequeue?

That's the essence of the major improvement for ring switching. Pretty
sure I cooked up gem_exec_latency/rthog to show that one as well; if not,
that's certainly one I'd like to show.

> Another mystery later in the patch series is what happens with 
> writel(execlists->csb_read), which is mmio, but I see you have removed 
> forcewake handling and one commit says there is no more mmio.

We removed forcewake for that last year. Where I say mmio, think
forcewaked mmio reads.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-05-31 18:51 ` [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock Chris Wilson
  2018-06-01 14:02   ` Tvrtko Ursulin
@ 2018-06-04 14:19   ` Tvrtko Ursulin
  1 sibling, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-04 14:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/05/2018 19:51, Chris Wilson wrote:
> In the next patch, we will begin processing the CSB from inside the
> interrupt handler. This means that updating the execlists->port[] will

I'd put an extra note here that this is via the user interrupt hooks and 
not the more obvious csb interrupt.

> no longer be locked by the tasklet but by the engine->timeline.lock
> instead. Pull dequeue and submit under the same lock for protection.
> (An alternative, future, plan is to keep the in/out arrays separate for
> concurrent processing and reduced lock coverage.)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 32 ++++++++++++--------------------
>   1 file changed, 12 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e5cf049c18f8..d207a1bf9dc9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -562,7 +562,7 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
>   	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
>   }
>   
> -static bool __execlists_dequeue(struct intel_engine_cs *engine)
> +static void __execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
> @@ -617,11 +617,11 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		 * the HW to indicate that it has had a chance to respond.
>   		 */
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> -			return false;
> +			return;
>   
>   		if (need_preempt(engine, last, execlists->queue_priority)) {
>   			inject_preempt_context(engine);
> -			return false;
> +			return;
>   		}
>   
>   		/*
> @@ -646,7 +646,7 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		 * priorities of the ports haven't been switch.
>   		 */
>   		if (port_count(&port[1]))
> -			return false;
> +			return;
>   
>   		/*
>   		 * WaIdleLiteRestore:bdw,skl
> @@ -746,8 +746,10 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   		port != execlists->port ? rq_prio(last) : INT_MIN;
>   
>   	execlists->first = rb;
> -	if (submit)
> +	if (submit) {
>   		port_assign(port, last);
> +		execlists_submit_ports(engine);
> +	}
>   
>   	/* We must always keep the beast fed if we have work piled up */
>   	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
> @@ -756,24 +758,19 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
>   	if (last)
>   		execlists_user_begin(execlists, execlists->port);
>   
> -	return submit;
> +	/* If the engine is now idle, so should be the flag; and vice versa. */
> +	GEM_BUG_ON(execlists_is_active(&engine->execlists,
> +				       EXECLISTS_ACTIVE_USER) ==
> +		   !port_isset(engine->execlists.port));
>   }
>   
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	unsigned long flags;
> -	bool submit;
>   
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
> -	submit = __execlists_dequeue(engine);
> +	__execlists_dequeue(engine);
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> -
> -	if (submit)
> -		execlists_submit_ports(engine);
> -
> -	GEM_BUG_ON(port_isset(execlists->port) &&
> -		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
>   }
>   
>   void
> @@ -1156,11 +1153,6 @@ static void execlists_submission_tasklet(unsigned long data)
>   
>   	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
>   		execlists_dequeue(engine);
> -
> -	/* If the engine is now idle, so should be the flag; and vice versa. */
> -	GEM_BUG_ON(execlists_is_active(&engine->execlists,
> -				       EXECLISTS_ACTIVE_USER) ==
> -		   !port_isset(engine->execlists.port));
>   }
>   
>   static void queue_request(struct intel_engine_cs *engine,
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock
  2018-05-31 18:51 ` [PATCH 04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock Chris Wilson
@ 2018-06-04 14:25   ` Tvrtko Ursulin
  2018-06-04 15:29     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-04 14:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/05/2018 19:51, Chris Wilson wrote:
> In the following patch, we will process the CSB interrupt under the
> timeline.lock and not under the tasklet lock. This also means that we

Implied tasklet lock? Or tasklet serialization?

> will need to protect access to common variables such as
> execlists->csb_head with the timeline.lock during reset.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d207a1bf9dc9..ec54c29b610f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
>   		  engine->name, request ? request->global_seqno : 0,
>   		  intel_engine_get_seqno(engine));
>   
> -	/* See execlists_cancel_requests() for the irq/spinlock split. */
> -	local_irq_save(flags);
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
>   	/*
>   	 * Catch up with any missed context-switch interrupts.
> @@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
>   	reset_irq(engine);

There is a synchronize_hardirq in this one, so this could be a deadlock, 
I think, depending on lock ordering. If another CPU enters the irq 
handler and tries to take the timeline lock which this thread already 
holds, then they deadlock each other.
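
Something like this hypothetical interleaving:

	/*
	 *   CPU0 (reset path)                CPU1 (CS irq)
	 *   -----------------                -------------
	 *   spin_lock_irqsave(&tl->lock)     interrupt fires
	 *   reset_irq(engine)                handler runs
	 *     -> synchronize_hardirq()         -> spin_lock(&tl->lock)
	 *        waits for CPU1's handler        spins, CPU0 holds it
	 *
	 * CPU0 cannot return from synchronize_hardirq() until CPU1's
	 * handler exits, and the handler cannot exit while CPU0 holds
	 * the timeline lock.
	 */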

Regards,

Tvrtko

>   
>   	/* Push back any incomplete requests for replay after the reset. */
> -	spin_lock(&engine->timeline.lock);
>   	__unwind_incomplete_requests(engine);
> -	spin_unlock(&engine->timeline.lock);
>   
>   	/* Following the reset, we need to reload the CSB read/write pointers */
>   	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
>   
> -	local_irq_restore(flags);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   
>   	/*
>   	 * If the request was innocent, we leave the request in the ELSP
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/11] drm/i915/execlists: Process one CSB interrupt at a time
  2018-05-31 18:51 ` [PATCH 05/11] drm/i915/execlists: Process one CSB interrupt at a time Chris Wilson
@ 2018-06-04 14:27   ` Tvrtko Ursulin
  2018-06-04 15:32     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-04 14:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/05/2018 19:51, Chris Wilson wrote:
> In the next patch, we will process the CSB events directly from the CS
> interrupt handler, being called for each interrupt. Hence, we will no

Correct this to explain that it is the user interrupt handler.

> longer have the need for a loop until the has-interrupt bit is clear,
> and in the meantime can remove that small optimisation.

Hm, isn't this assuming the optimistic csb processing from the user 
interrupt sees the event? If it doesn't, couldn't we still end up 
needing the loop here?

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 274 +++++++++++++++----------------
>   1 file changed, 135 insertions(+), 139 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ec54c29b610f..df2b7005df65 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -954,166 +954,162 @@ static void process_csb(struct intel_engine_cs *engine)
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct execlist_port *port = execlists->port;
>   	struct drm_i915_private *i915 = engine->i915;
> +
> +	/* The HWSP contains a (cacheable) mirror of the CSB */
> +	const u32 *buf =
> +		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> +	unsigned int head, tail;
>   	bool fw = false;
>   
> -	do {
> -		/* The HWSP contains a (cacheable) mirror of the CSB */
> -		const u32 *buf =
> -			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> -		unsigned int head, tail;
> +	/* Clear before reading to catch new interrupts */
> +	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +	smp_mb__after_atomic();
>   
> -		/* Clear before reading to catch new interrupts */
> -		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -		smp_mb__after_atomic();
> +	if (unlikely(execlists->csb_use_mmio)) {
> +		intel_uncore_forcewake_get(i915, execlists->fw_domains);
> +		fw = true;
>   
> -		if (unlikely(execlists->csb_use_mmio)) {
> -			if (!fw) {
> -				intel_uncore_forcewake_get(i915, execlists->fw_domains);
> -				fw = true;
> -			}
> +		buf = (u32 * __force)
> +			(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
>   
> -			buf = (u32 * __force)
> -				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +		head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +		tail = GEN8_CSB_WRITE_PTR(head);
> +		head = GEN8_CSB_READ_PTR(head);
> +		execlists->csb_head = head;
> +	} else {
> +		const int write_idx =
> +			intel_hws_csb_write_index(i915) -
> +			I915_HWS_CSB_BUF0_INDEX;
>   
> -			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> -			tail = GEN8_CSB_WRITE_PTR(head);
> -			head = GEN8_CSB_READ_PTR(head);
> -			execlists->csb_head = head;
> -		} else {
> -			const int write_idx =
> -				intel_hws_csb_write_index(i915) -
> -				I915_HWS_CSB_BUF0_INDEX;
> +		head = execlists->csb_head;
> +		tail = READ_ONCE(buf[write_idx]);
> +		rmb(); /* Hopefully paired with a wmb() in HW */
> +	}
> +	GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
> +		  engine->name,
> +		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
> +		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
>   
> -			head = execlists->csb_head;
> -			tail = READ_ONCE(buf[write_idx]);
> -			rmb(); /* Hopefully paired with a wmb() in HW */
> -		}
> -		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
> -			  engine->name,
> -			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
> -			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
> +	while (head != tail) {
> +		struct i915_request *rq;
> +		unsigned int status;
> +		unsigned int count;
>   
> -		while (head != tail) {
> -			struct i915_request *rq;
> -			unsigned int status;
> -			unsigned int count;
> +		if (++head == GEN8_CSB_ENTRIES)
> +			head = 0;
>   
> -			if (++head == GEN8_CSB_ENTRIES)
> -				head = 0;
> +		/*
> +		 * We are flying near dragons again.
> +		 *
> +		 * We hold a reference to the request in execlist_port[]
> +		 * but no more than that. We are operating in softirq
> +		 * context and so cannot hold any mutex or sleep. That
> +		 * prevents us stopping the requests we are processing
> +		 * in port[] from being retired simultaneously (the
> +		 * breadcrumb will be complete before we see the
> +		 * context-switch). As we only hold the reference to the
> +		 * request, any pointer chasing underneath the request
> +		 * is subject to a potential use-after-free. Thus we
> +		 * store all of the bookkeeping within port[] as
> +		 * required, and avoid using unguarded pointers beneath
> +		 * request itself. The same applies to the atomic
> +		 * status notifier.
> +		 */
>   
> -			/*
> -			 * We are flying near dragons again.
> -			 *
> -			 * We hold a reference to the request in execlist_port[]
> -			 * but no more than that. We are operating in softirq
> -			 * context and so cannot hold any mutex or sleep. That
> -			 * prevents us stopping the requests we are processing
> -			 * in port[] from being retired simultaneously (the
> -			 * breadcrumb will be complete before we see the
> -			 * context-switch). As we only hold the reference to the
> -			 * request, any pointer chasing underneath the request
> -			 * is subject to a potential use-after-free. Thus we
> -			 * store all of the bookkeeping within port[] as
> -			 * required, and avoid using unguarded pointers beneath
> -			 * request itself. The same applies to the atomic
> -			 * status notifier.
> -			 */
> +		status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
> +		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
> +			  engine->name, head,
> +			  status, buf[2*head + 1],
> +			  execlists->active);
> +
> +		if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
> +			      GEN8_CTX_STATUS_PREEMPTED))
> +			execlists_set_active(execlists,
> +					     EXECLISTS_ACTIVE_HWACK);
> +		if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
> +			execlists_clear_active(execlists,
> +					       EXECLISTS_ACTIVE_HWACK);
> +
> +		if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
> +			continue;
>   
> -			status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
> -			GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
> -				  engine->name, head,
> -				  status, buf[2*head + 1],
> -				  execlists->active);
> -
> -			if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
> -				      GEN8_CTX_STATUS_PREEMPTED))
> -				execlists_set_active(execlists,
> -						     EXECLISTS_ACTIVE_HWACK);
> -			if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
> -				execlists_clear_active(execlists,
> -						       EXECLISTS_ACTIVE_HWACK);
> -
> -			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
> -				continue;
> +		/* We should never get a COMPLETED | IDLE_ACTIVE! */
> +		GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
>   
> -			/* We should never get a COMPLETED | IDLE_ACTIVE! */
> -			GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
> +		if (status & GEN8_CTX_STATUS_COMPLETE &&
> +		    buf[2*head + 1] == execlists->preempt_complete_status) {
> +			GEM_TRACE("%s preempt-idle\n", engine->name);
> +			complete_preempt_context(execlists);
> +			continue;
> +		}
>   
> -			if (status & GEN8_CTX_STATUS_COMPLETE &&
> -			    buf[2*head + 1] == execlists->preempt_complete_status) {
> -				GEM_TRACE("%s preempt-idle\n", engine->name);
> -				complete_preempt_context(execlists);
> -				continue;
> -			}
> +		if (status & GEN8_CTX_STATUS_PREEMPTED &&
> +		    execlists_is_active(execlists,
> +					EXECLISTS_ACTIVE_PREEMPT))
> +			continue;
>   
> -			if (status & GEN8_CTX_STATUS_PREEMPTED &&
> -			    execlists_is_active(execlists,
> -						EXECLISTS_ACTIVE_PREEMPT))
> -				continue;
> +		GEM_BUG_ON(!execlists_is_active(execlists,
> +						EXECLISTS_ACTIVE_USER));
>   
> -			GEM_BUG_ON(!execlists_is_active(execlists,
> -							EXECLISTS_ACTIVE_USER));
> +		rq = port_unpack(port, &count);
> +		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%d) (current %d), prio=%d\n",
> +			  engine->name,
> +			  port->context_id, count,
> +			  rq ? rq->global_seqno : 0,
> +			  rq ? rq->fence.context : 0,
> +			  rq ? rq->fence.seqno : 0,
> +			  intel_engine_get_seqno(engine),
> +			  rq ? rq_prio(rq) : 0);
> +
> +		/* Check the context/desc id for this event matches */
> +		GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
> +
> +		GEM_BUG_ON(count == 0);
> +		if (--count == 0) {
> +			/*
> +			 * On the final event corresponding to the
> +			 * submission of this context, we expect either
> +			 * an element-switch event or a completion
> +			 * event (and on completion, the active-idle
> +			 * marker). No more preemptions, lite-restore
> +			 * or otherwise.
> +			 */
> +			GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
> +			GEM_BUG_ON(port_isset(&port[1]) &&
> +				   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
> +			GEM_BUG_ON(!port_isset(&port[1]) &&
> +				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>   
> -			rq = port_unpack(port, &count);
> -			GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%d) (current %d), prio=%d\n",
> -				  engine->name,
> -				  port->context_id, count,
> -				  rq ? rq->global_seqno : 0,
> -				  rq ? rq->fence.context : 0,
> -				  rq ? rq->fence.seqno : 0,
> -				  intel_engine_get_seqno(engine),
> -				  rq ? rq_prio(rq) : 0);
> +			/*
> +			 * We rely on the hardware being strongly
> +			 * ordered, that the breadcrumb write is
> +			 * coherent (visible from the CPU) before the
> +			 * user interrupt and CSB is processed.
> +			 */
> +			GEM_BUG_ON(!i915_request_completed(rq));
>   
> -			/* Check the context/desc id for this event matches */
> -			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
> +			execlists_context_schedule_out(rq,
> +						       INTEL_CONTEXT_SCHEDULE_OUT);
> +			i915_request_put(rq);
>   
> -			GEM_BUG_ON(count == 0);
> -			if (--count == 0) {
> -				/*
> -				 * On the final event corresponding to the
> -				 * submission of this context, we expect either
> -				 * an element-switch event or a completion
> -				 * event (and on completion, the active-idle
> -				 * marker). No more preemptions, lite-restore
> -				 * or otherwise.
> -				 */
> -				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
> -				GEM_BUG_ON(port_isset(&port[1]) &&
> -					   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
> -				GEM_BUG_ON(!port_isset(&port[1]) &&
> -					   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
> +			GEM_TRACE("%s completed ctx=%d\n",
> +				  engine->name, port->context_id);
>   
> -				/*
> -				 * We rely on the hardware being strongly
> -				 * ordered, that the breadcrumb write is
> -				 * coherent (visible from the CPU) before the
> -				 * user interrupt and CSB is processed.
> -				 */
> -				GEM_BUG_ON(!i915_request_completed(rq));
> -
> -				execlists_context_schedule_out(rq,
> -							       INTEL_CONTEXT_SCHEDULE_OUT);
> -				i915_request_put(rq);
> -
> -				GEM_TRACE("%s completed ctx=%d\n",
> -					  engine->name, port->context_id);
> -
> -				port = execlists_port_complete(execlists, port);
> -				if (port_isset(port))
> -					execlists_user_begin(execlists, port);
> -				else
> -					execlists_user_end(execlists);
> -			} else {
> -				port_set(port, port_pack(rq, count));
> -			}
> +			port = execlists_port_complete(execlists, port);
> +			if (port_isset(port))
> +				execlists_user_begin(execlists, port);
> +			else
> +				execlists_user_end(execlists);
> +		} else {
> +			port_set(port, port_pack(rq, count));
>   		}
> +	}
>   
> -		if (head != execlists->csb_head) {
> -			execlists->csb_head = head;
> -			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -			       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> -		}
> -	} while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
> +	if (head != execlists->csb_head) {
> +		execlists->csb_head = head;
> +		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> +		       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +	}
>   
>   	if (unlikely(fw))
>   		intel_uncore_forcewake_put(i915, execlists->fw_domains);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/11] drm/i915/execlists: Unify CSB access pointers
  2018-05-31 18:51 ` [PATCH 06/11] drm/i915/execlists: Unify CSB access pointers Chris Wilson
@ 2018-06-04 14:53   ` Tvrtko Ursulin
  2018-06-04 16:58     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-04 14:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[CC Daniele for his great documentation searching skills.]

On 31/05/2018 19:51, Chris Wilson wrote:
> Following the removal of the last workarounds, the only CSB mmio access
> is for the old vGPU interface. The mmio registers presented by vGPU do
> not require forcewake and can be treated as ordinary volatile memory,
> i.e. they behave just like the HWSP access, just at a different location.
> We can reduce the CSB access inside the irq handler to a set of
> read/write/buffer pointers and treat the various paths identically and
> not worry about forcewake.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c  |  12 ---
>   drivers/gpu/drm/i915/intel_lrc.c        | 116 ++++++++++--------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  23 +++--
>   3 files changed, 65 insertions(+), 86 deletions(-)

[snip]

> @@ -1103,16 +1083,11 @@ static void process_csb(struct intel_engine_cs *engine)
>   		} else {
>   			port_set(port, port_pack(rq, count));
>   		}
> -	}
> -
> -	if (head != execlists->csb_head) {
> -		execlists->csb_head = head;
> -		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -		       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> -	}
> +	} while (head != tail);
>   
> -	if (unlikely(fw))
> -		intel_uncore_forcewake_put(i915, execlists->fw_domains);
> +	writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> +	       execlists->csb_read);

This is the write of RING_CONTEXT_STATUS_PTR, and the mystery is whether
it is or isn't a shadowed register. We don't have it in the shadowed
list in intel_uncore.c, but we stopped taking forcewake for it after the
HWSP conversion and it seems to work regardless. I think if we could
confirm that it is officially shadowed, it would be good to put it in
that list so it is documented properly in the code.
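
As an illustrative sketch only (the helper and table below are
assumptions, not the actual intel_uncore.c identifiers): a shadowed
register can be written without forcewake, so the write path just
short-circuits when the offset is found in the shadow table.

    static bool gen8_reg_is_shadowed(u32 offset)
    {
            /* hypothetical table; the real list lives in intel_uncore.c */
            static const u32 shadowed[] = {
                    0x2030, /* e.g. RING_TAIL(RENDER_RING_BASE) */
                    /*
                     * Adding the per-engine RING_CONTEXT_STATUS_PTR
                     * offsets here is what documenting the register
                     * would amount to.
                     */
            };
            int i;

            for (i = 0; i < ARRAY_SIZE(shadowed); i++)
                    if (offset == shadowed[i])
                            return true;

            return false;
    }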

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock
  2018-06-04 14:25   ` Tvrtko Ursulin
@ 2018-06-04 15:29     ` Chris Wilson
  2018-06-05  8:25       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2018-06-04 15:29 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-04 15:25:26)
> 
> On 31/05/2018 19:51, Chris Wilson wrote:
> > In the following patch, we will process the CSB interrupt under the
> > timeline.lock and not under the tasklet lock. This also means that we
> 
> Implied tasklet lock? Or tasklet serialization?

We were using the tasklet to provide exclusive access to the port[], so
we were effectively using it as a serialisation primitive, i.e. as a lock.
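
As a minimal sketch of the property being relied upon (illustrative
only; the function below is not the driver's actual code):

    /*
     * A tasklet never runs concurrently with itself, so as long as
     * port[] is only touched from inside the tasklet body, the
     * tasklet itself is the lock. Once the CSB is also processed from
     * process or irq context, that guarantee is gone and the
     * timeline.lock has to take over.
     */
    static void submission_tasklet_sketch(unsigned long data)
    {
            struct intel_engine_cs *engine = (struct intel_engine_cs *)data;

            /* exclusive access to execlists->port[] by construction */
            process_csb(engine);
    }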
 
> > will need to protect access to common variables such as
> > execlists->csb_head with the timeline.lock during reset.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
> >   1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index d207a1bf9dc9..ec54c29b610f 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
> >                 engine->name, request ? request->global_seqno : 0,
> >                 intel_engine_get_seqno(engine));
> >   
> > -     /* See execlists_cancel_requests() for the irq/spinlock split. */
> > -     local_irq_save(flags);
> > +     spin_lock_irqsave(&engine->timeline.lock, flags);
> >   
> >       /*
> >        * Catch up with any missed context-switch interrupts.
> > @@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
> >       reset_irq(engine);
> 
> There is a synchronize_hardirq() in this one, so I think this could
> deadlock, depending on lock ordering. If another CPU enters the irq
> handler and tries to take the timeline lock which this thread already
> holds, they deadlock each other.

Nothing, and that would explain the deadlock in the other thread.
-Chris
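
To spell out the interleaving (a hedged reconstruction, assuming the
irq handler takes engine->timeline.lock once the later patches land):

    CPU0: execlists_reset()              CPU1: CS interrupt
    ----------------------------------   --------------------------------
    spin_lock_irqsave(&timeline.lock)
    reset_irq(engine)
      -> synchronize_hardirq()           handler entered
         waits for CPU1 to exit handler    spin_lock(&timeline.lock)
         ...                               spins on CPU0's lock forever

Each side waits on the other, so neither can make progress.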
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/11] drm/i915/execlists: Process one CSB interrupt at a time
  2018-06-04 14:27   ` Tvrtko Ursulin
@ 2018-06-04 15:32     ` Chris Wilson
  2018-06-05  8:30       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2018-06-04 15:32 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-04 15:27:34)
> 
> On 31/05/2018 19:51, Chris Wilson wrote:
> > In the next patch, we will process the CSB events directly from the CS
> > interrupt handler, being called for each interrupt. Hence, we will no
> 
> Correct this to explain that it is the user interrupt handler.

?

> > longer have the need for a loop until the has-interrupt bit is clear,
> > and in the meantime can remove that small optimisation.
> 
> Hm, isn't this assuming the optimistic csb processing from user 
> interrupt sees the event? If it doesn't, couldn't we still end up 
> needing the loop here?

No. We get the CS interrupt when hw updates the write pointer. (Not sure
if it's just the first write.)
-Chris
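
Condensed from the quoted diff, the single pass per interrupt being
relied upon looks roughly like this (a sketch, not the driver code
verbatim):

    /*
     * Entries between the read pointer (head) and the HW write
     * pointer (tail) are drained in one pass; a fresh CS interrupt
     * arrives whenever the HW moves the write pointer again.
     */
    static void drain_csb(const u32 *buf, unsigned int head,
                          unsigned int tail)
    {
            while (head != tail) {
                    u32 status;

                    if (++head == GEN8_CSB_ENTRIES)
                            head = 0;

                    status = READ_ONCE(buf[2 * head]); /* event flags */
                    (void)status; /* buf[2 * head + 1] holds the ctx id */
            }
    }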
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/11] drm/i915/execlists: Unify CSB access pointers
  2018-06-04 14:53   ` Tvrtko Ursulin
@ 2018-06-04 16:58     ` Daniele Ceraolo Spurio
  2018-06-05  8:38       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-06-04 16:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, intel-gfx



On 04/06/18 07:53, Tvrtko Ursulin wrote:
> 
> [CC Daniele for his great documentation searching skills.]
> 
> On 31/05/2018 19:51, Chris Wilson wrote:
>> Following the removal of the last workarounds, the only CSB mmio access
>> is for the old vGPU interface. The mmio registers presented by vGPU do
>> not require forcewake and can be treated as ordinary volatile memory,
>> i.e. they behave just like the HWSP access, just at a different location.
>> We can reduce the CSB access inside the irq handler to a set of
>> read/write/buffer pointers and treat the various paths identically and
>> not worry about forcewake.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/intel_engine_cs.c  |  12 ---
>>   drivers/gpu/drm/i915/intel_lrc.c        | 116 ++++++++++--------------
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  23 +++--
>>   3 files changed, 65 insertions(+), 86 deletions(-)
> 
> [snip]
> 
>> @@ -1103,16 +1083,11 @@ static void process_csb(struct intel_engine_cs 
>> *engine)
>>           } else {
>>               port_set(port, port_pack(rq, count));
>>           }
>> -    }
>> -
>> -    if (head != execlists->csb_head) {
>> -        execlists->csb_head = head;
>> -        writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
>> -               i915->regs + 
>> i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
>> -    }
>> +    } while (head != tail);
>> -    if (unlikely(fw))
>> -        intel_uncore_forcewake_put(i915, execlists->fw_domains);
>> +    writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
>> +           execlists->csb_read);
> 
> This is the write of RING_CONTEXT_STATUS_PTR, and the mystery is whether 
> it is or isn't a shadowed register. We don't have it in the shadowed 
> list in intel_uncore.c, but we stopped taking forcewake for it after the 
> HWSP conversion and it seems to work regardless. I think if we could 
> confirm that it is officially shadowed, it would be good to put it in 
> that list so it is documented properly in the code.
> 

I couldn't find the list of shadowed registers for gen8-9, but the gen10 
list (Bspec: 18333) seems to indicate that among the registers involved 
here only ELSP and TAIL are shadowed.

AFAIK up to gen10 the write of head to RING_CONTEXT_STATUS_PTR is for SW 
use, to track the head position of the next CSB entry to read; HW only 
cares about the tail. So given that on non-gvt we only read the register 
back after a reset, we shouldn't have issues even if a write is missed 
(logs aside). With gen11 there is a new possible usage of the head value 
via the new CSB_STATUS register, which can be used to read the CSB entry 
pointed to by head and then automatically bump the head value. Using 
that register would require forcewake and would also deprecate the 
manual head update, so we'd be covered if we ever wanted to use it.
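
Purely as a hypothetical illustration of that gen11 flow (the register
name comes from the description above; the csb_status field and its
semantics are assumptions):

    /*
     * Reading CSB_STATUS would return the entry at the current head
     * and HW would bump the head itself, so SW would never write the
     * read pointer at all -- at the cost of forcewake on the read.
     */
    u32 event = readl(execlists->csb_status); /* assumed field */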

Daniele

> Regards,
> 
> Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock
  2018-06-04 15:29     ` Chris Wilson
@ 2018-06-05  8:25       ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-05  8:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/06/2018 16:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-04 15:25:26)
>>
>> On 31/05/2018 19:51, Chris Wilson wrote:
>>> In the following patch, we will process the CSB interrupt under the
>>> timeline.lock and not under the tasklet lock. This also means that we
>>
>> Implied tasklet lock? Or tasklet serialization?
> 
> We were using the tasklet to provide exclusive access to the port[], so
> we were effectively using it as a serialisation primitive, i.e. as a lock.

Yes of course, I was just wondering whether it is worth being more 
verbose than the "tasklet lock" wording. I think of it more as 
"serialized by the tasklet" and not so much as a lock.

>>> will need to protect access to common variables such as
>>> execlists->csb_head with the timeline.lock during reset.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
>>>    1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index d207a1bf9dc9..ec54c29b610f 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
>>>                  engine->name, request ? request->global_seqno : 0,
>>>                  intel_engine_get_seqno(engine));
>>>    
>>> -     /* See execlists_cancel_requests() for the irq/spinlock split. */
>>> -     local_irq_save(flags);
>>> +     spin_lock_irqsave(&engine->timeline.lock, flags);
>>>    
>>>        /*
>>>         * Catch up with any missed context-switch interrupts.
>>> @@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
>>>        reset_irq(engine);
>>
>> There is a synchronize_hardirq() in this one, so I think this could
>> deadlock, depending on lock ordering. If another CPU enters the irq
>> handler and tries to take the timeline lock which this thread already
>> holds, they deadlock each other.
> 
> Nothing, and that would explain the deadlock in the other thread.

Nothing what? :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/11] drm/i915/execlists: Process one CSB interrupt at a time
  2018-06-04 15:32     ` Chris Wilson
@ 2018-06-05  8:30       ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-05  8:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/06/2018 16:32, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-04 15:27:34)
>>
>> On 31/05/2018 19:51, Chris Wilson wrote:
>>> In the next patch, we will process the CSB events directly from the CS
>>> interrupt handler, being called for each interrupt. Hence, we will no
>>
>> Correct this to explain that it is the user interrupt handler.
> 
> ?

The series is not processing the CSB from the CS interrupt, but from the 
user interrupt handler, via notify_ring, fence signaling, etc., as 
discussed in the other sub-thread. Or am I still missing something?

>>> longer have the need for a loop until the has-interrupt bit is clear,
>>> and in the meantime can remove that small optimisation.
>>
>> Hm, isn't this assuming the optimistic csb processing from user
>> interrupt sees the event? If it doesn't, couldn't we still end up
>> needing the loop here?
> 
> No. We get the CS interrupt when hw updates the write pointer. (Not sure
> if it's just the first write.)

Yeah I confused something here.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/11] drm/i915/execlists: Unify CSB access pointers
  2018-06-04 16:58     ` Daniele Ceraolo Spurio
@ 2018-06-05  8:38       ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2018-06-05  8:38 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Chris Wilson, intel-gfx


On 04/06/2018 17:58, Daniele Ceraolo Spurio wrote:
> On 04/06/18 07:53, Tvrtko Ursulin wrote:
>>
>> [CC Daniele for his great documentation searching skills.]
>>
>> On 31/05/2018 19:51, Chris Wilson wrote:
>>> Following the removal of the last workarounds, the only CSB mmio access
>>> is for the old vGPU interface. The mmio registers presented by vGPU do
>>> not require forcewake and can be treated as ordinary volatile memory,
>>> i.e. they behave just like the HWSP access, just at a different location.
>>> We can reduce the CSB access inside the irq handler to a set of
>>> read/write/buffer pointers and treat the various paths identically and
>>> not worry about forcewake.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/intel_engine_cs.c  |  12 ---
>>>   drivers/gpu/drm/i915/intel_lrc.c        | 116 ++++++++++--------------
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  23 +++--
>>>   3 files changed, 65 insertions(+), 86 deletions(-)
>>
>> [snip]
>>
>>> @@ -1103,16 +1083,11 @@ static void process_csb(struct 
>>> intel_engine_cs *engine)
>>>           } else {
>>>               port_set(port, port_pack(rq, count));
>>>           }
>>> -    }
>>> -
>>> -    if (head != execlists->csb_head) {
>>> -        execlists->csb_head = head;
>>> -        writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
>>> -               i915->regs + 
>>> i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
>>> -    }
>>> +    } while (head != tail);
>>> -    if (unlikely(fw))
>>> -        intel_uncore_forcewake_put(i915, execlists->fw_domains);
>>> +    writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
>>> +           execlists->csb_read);
>>
>> This is the write of RING_CONTEXT_STATUS_PTR, and the mystery is 
>> whether it is or isn't a shadowed register. We don't have it in the 
>> shadowed list in intel_uncore.c, but we stopped taking forcewake for 
>> it after the HWSP conversion and it seems to work regardless. I think 
>> if we could confirm that it is officially shadowed, it would be good 
>> to put it in that list so it is documented properly in the code.
>>
> 
> I couldn't find the list of shadowed registers for gen8-9, but the gen10 
> list (Bspec: 18333) seems to indicate that among the registers involved 
> here only ELSP and TAIL are shadowed.
> 
> AFAIK up to gen10 the write of head to RING_CONTEXT_STATUS_PTR is for SW 
> use, to track the head position of the next CSB entry to read; HW only 
> cares about the tail. So given that on non-gvt we only read the register 
> back after a reset, we shouldn't have issues even if a write is missed 
> (logs aside). With gen11 there is a new possible usage of the head value 
> via the new CSB_STATUS register, which can be used to read the CSB entry 
> pointed to by head and then automatically bump the head value. Using 
> that register would require forcewake and would also deprecate the 
> manual head update, so we'd be covered if we ever wanted to use it.

Thanks!

So it seems we could just drop this write and save some cycles, since 
after this patch AFAICS it is never read back, only written to.
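
A hypothetical shape of that follow-up (untested; whether the vGPU
path still wants the pointer, and whether the csb_use_mmio flag
survives the earlier patch, are both assumptions):

    /*
     * HW only consumes the tail, so confine the read-pointer write to
     * the vGPU path that may still inspect it, or drop it entirely.
     */
    if (execlists->csb_use_mmio)
            writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
                   execlists->csb_read);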

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2018-06-05  8:38 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31 18:51 ksoftirqd avoidance Chris Wilson
2018-05-31 18:51 ` [PATCH 01/11] drm/i915: Be irqsafe inside reset Chris Wilson
2018-06-01 13:44   ` Tvrtko Ursulin
2018-05-31 18:51 ` [PATCH 02/11] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization Chris Wilson
2018-06-01 13:54   ` Tvrtko Ursulin
2018-05-31 18:51 ` [PATCH 03/11] drm/i915/execlists: Pull submit after dequeue under timeline lock Chris Wilson
2018-06-01 14:02   ` Tvrtko Ursulin
2018-06-01 14:07     ` Chris Wilson
2018-06-04  9:25       ` Tvrtko Ursulin
2018-06-04 10:12         ` Chris Wilson
2018-06-04 10:58           ` Tvrtko Ursulin
2018-06-04 11:15             ` Chris Wilson
2018-06-04 14:19   ` Tvrtko Ursulin
2018-05-31 18:51 ` [PATCH 04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock Chris Wilson
2018-06-04 14:25   ` Tvrtko Ursulin
2018-06-04 15:29     ` Chris Wilson
2018-06-05  8:25       ` Tvrtko Ursulin
2018-05-31 18:51 ` [PATCH 05/11] drm/i915/execlists: Process one CSB interrupt at a time Chris Wilson
2018-06-04 14:27   ` Tvrtko Ursulin
2018-06-04 15:32     ` Chris Wilson
2018-06-05  8:30       ` Tvrtko Ursulin
2018-05-31 18:51 ` [PATCH 06/11] drm/i915/execlists: Unify CSB access pointers Chris Wilson
2018-06-04 14:53   ` Tvrtko Ursulin
2018-06-04 16:58     ` Daniele Ceraolo Spurio
2018-06-05  8:38       ` Tvrtko Ursulin
2018-05-31 18:52 ` [PATCH 07/11] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Chris Wilson
2018-05-31 19:57   ` [PATCH] " Chris Wilson
2018-05-31 18:52 ` [PATCH 08/11] drm/i915: Move rate-limiting request retire to after submission Chris Wilson
2018-05-31 18:52 ` [PATCH 09/11] drm/i915: Wait for engines to idle before retiring Chris Wilson
2018-05-31 18:52 ` [PATCH 10/11] drm/i915: Move engine request retirement to intel_engine_cs Chris Wilson
2018-05-31 18:52 ` [PATCH 11/11] drm/i915: Hold request reference for submission until retirement Chris Wilson
2018-05-31 20:21   ` [PATCH] " Chris Wilson
2018-05-31 19:30 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset Patchwork
2018-05-31 19:48 ` ✗ Fi.CI.BAT: failure " Patchwork
2018-05-31 20:14 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev2) Patchwork
2018-05-31 20:33 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-31 20:37 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Be irqsafe inside reset (rev3) Patchwork
2018-05-31 20:54 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-31 21:54 ` ✗ Fi.CI.IGT: failure " Patchwork
