All of lore.kernel.org
 help / color / mirror / Atom feed
* [CI 1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
@ 2017-02-06 17:05 Chris Wilson
  2017-02-06 17:05 ` [CI 2/2] drm/i915: Avoid unguarded reads from the request pointer Chris Wilson
  2017-02-06 19:52 ` ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Patchwork
  0 siblings, 2 replies; 5+ messages in thread
From: Chris Wilson @ 2017-02-06 17:05 UTC (permalink / raw)
  To: intel-gfx

It is required that the caller declare the exact number of dwords they
wish to write into the ring. This is required for two reasons, we need
to allocate sufficient space for the entire command packet and we need
to be sure that the contents are completely written to avoid executing
stale data. The current interface requires for any bug to be caught in
review, the reader has to carefully count the number of
intel_ring_emit() between intel_ring_begin() and intel_ring_advance().
If we record the end of the packet of each intel_ring_begin() we can
also have CI check for us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h         | 9 +++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index a585d47c420a..412fa2794cbb 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -28,9 +28,18 @@
 #ifdef CONFIG_DRM_I915_DEBUG_GEM
 #define GEM_BUG_ON(expr) BUG_ON(expr)
 #define GEM_WARN_ON(expr) WARN_ON(expr)
+
+#define GEM_BUG_ONLY(expr) expr
+#define GEM_BUG_ONLY_DECLARE(var) var
+#define GEM_BUG_ONLY_ON(expr) GEM_BUG_ON(expr)
+
 #else
 #define GEM_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
 #define GEM_WARN_ON(expr) (BUILD_BUG_ON_INVALID(expr), 0)
+
+#define GEM_BUG_ONLY(expr) do { } while (0)
+#define GEM_BUG_ONLY_DECLARE(var)
+#define GEM_BUG_ONLY_ON(expr)
 #endif
 
 #define I915_NUM_ENGINES 5
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d32cbba25d98..383083ef2210 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2238,6 +2238,7 @@ int intel_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 
 	ring->space -= bytes;
 	GEM_BUG_ON(ring->space < 0);
+	GEM_BUG_ONLY(ring->advance = ring->tail + bytes);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b9c15cd40fbf..2c6d3655985e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -144,6 +144,8 @@ struct intel_ring {
 
 	u32 head;
 	u32 tail;
+	GEM_BUG_ONLY_DECLARE(u32 advance);
+
 	int space;
 	int size;
 	int effective_size;
@@ -516,6 +518,7 @@ static inline void intel_ring_advance(struct intel_ring *ring)
 	 * reserved for the command packet (i.e. the value passed to
 	 * intel_ring_begin()).
 	 */
+	GEM_BUG_ONLY_ON(ring->tail != ring->advance);
 }
 
 static inline u32 intel_ring_offset(struct intel_ring *ring, void *addr)
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [CI 2/2] drm/i915: Avoid unguarded reads from the request pointer
  2017-02-06 17:05 [CI 1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Chris Wilson
@ 2017-02-06 17:05 ` Chris Wilson
  2017-02-06 19:52 ` ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Patchwork
  1 sibling, 0 replies; 5+ messages in thread
From: Chris Wilson @ 2017-02-06 17:05 UTC (permalink / raw)
  To: intel-gfx

In commit 86aa7e760a67 ("drm/i915: Assert that the context-switch
completion matches our context") I added a read to the irq tasklet
handler that compared the on-chip status with that of our sw tracking,
using an unguarded read of the request pointer to get the context and
beyond. Whilst we hold a reference to the request, we do not hold
anything on the context and if we are unlucky it may be reaped from a
second thread retiring the request (since it may retire the request as
soon as the breadcrumb is complete, even before we finish processing the
context switch) as we try to read from the context pointer.

Avoid the racy read from underneath the request by storing the expected
result in the execlist_port[].

v2: Include commentary about port[].request being unprotected.

Fixes: 86aa7e760a67 ("drm/i915: Assert that the context-switch completion matches our context")
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_ctx_create
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 24 +++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 15b8a132b8e0..df8e6f74dc7e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -351,6 +351,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 		execlists_context_status_change(port[0].request,
 						INTEL_CONTEXT_SCHEDULE_IN);
 	desc[0] = execlists_update_context(port[0].request);
+	GEM_BUG_ONLY(port[0].context_id = upper_32_bits(desc[0]));
 	port[0].count++;
 
 	if (port[1].request) {
@@ -358,6 +359,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 		execlists_context_status_change(port[1].request,
 						INTEL_CONTEXT_SCHEDULE_IN);
 		desc[1] = execlists_update_context(port[1].request);
+		GEM_BUG_ONLY(port[1].context_id = upper_32_bits(desc[1]));
 		port[1].count = 1;
 	} else {
 		desc[1] = 0;
@@ -560,13 +562,29 @@ static void intel_lrc_irq_handler(unsigned long data)
 			unsigned int idx = ++head % GEN8_CSB_ENTRIES;
 			unsigned int status = readl(buf + 2 * idx);
 
+			/* We are flying near dragons again.
+			 *
+			 * We hold a reference to the request in execlist_port[]
+			 * but no more than that. We are operating in softirq
+			 * context and so cannot hold any mutex or sleep. That
+			 * prevents us stopping the requests we are processing
+			 * in port[] from being retired simultaneously (the
+			 * breadcrumb will be complete before we see the
+			 * context-switch). As we only hold the reference to the
+			 * request, any pointer chasing underneath the request
+			 * is subject to a potential use-after-free. Thus we
+			 * store all of the bookkeeping within port[] as
+			 * required, and avoid using unguarded pointers beneath
+			 * request itself. The same applies to the atomic
+			 * status notifier.
+			 */
+
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
 			/* Check the context/desc id for this event matches */
-			GEM_BUG_ON(readl(buf + 2 * idx + 1) !=
-				   upper_32_bits(intel_lr_context_descriptor(port[0].request->ctx,
-									     engine)));
+			GEM_BUG_ONLY_ON(readl(buf + 2 * idx + 1) !=
+					port[0].context_id);
 
 			GEM_BUG_ON(port[0].count == 0);
 			if (--port[0].count == 0) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2c6d3655985e..896838ca502c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -381,6 +381,7 @@ struct intel_engine_cs {
 	struct execlist_port {
 		struct drm_i915_gem_request *request;
 		unsigned int count;
+		GEM_BUG_ONLY_DECLARE(u32 context_id);
 	} execlist_port[2];
 	struct rb_root execlist_queue;
 	struct rb_node *execlist_first;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
  2017-02-06 17:05 [CI 1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Chris Wilson
  2017-02-06 17:05 ` [CI 2/2] drm/i915: Avoid unguarded reads from the request pointer Chris Wilson
@ 2017-02-06 19:52 ` Patchwork
  2017-02-06 20:44   ` Chris Wilson
  1 sibling, 1 reply; 5+ messages in thread
From: Patchwork @ 2017-02-06 19:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
URL   : https://patchwork.freedesktop.org/series/19169/
State : success

== Summary ==

Series 19169v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/19169/revisions/1/mbox/

Test kms_force_connector_basic:
        Subgroup force-connector-state:
                skip       -> PASS       (fi-snb-2520m)

fi-bdw-5557u     total:252  pass:238  dwarn:0   dfail:0   fail:0   skip:14 
fi-bsw-n3050     total:252  pass:213  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:252  pass:230  dwarn:0   dfail:0   fail:0   skip:22 
fi-bxt-t5700     total:83   pass:70   dwarn:0   dfail:0   fail:0   skip:12 
fi-byt-j1900     total:252  pass:225  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:252  pass:221  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:252  pass:233  dwarn:0   dfail:0   fail:0   skip:19 
fi-hsw-4770r     total:252  pass:233  dwarn:0   dfail:0   fail:0   skip:19 
fi-ilk-650       total:252  pass:199  dwarn:0   dfail:0   fail:0   skip:53 
fi-ivb-3520m     total:252  pass:231  dwarn:0   dfail:0   fail:0   skip:21 
fi-ivb-3770      total:252  pass:231  dwarn:0   dfail:0   fail:0   skip:21 
fi-kbl-7500u     total:252  pass:229  dwarn:0   dfail:0   fail:2   skip:21 
fi-skl-6260u     total:252  pass:239  dwarn:0   dfail:0   fail:0   skip:13 
fi-skl-6700hq    total:252  pass:232  dwarn:0   dfail:0   fail:0   skip:20 
fi-skl-6700k     total:252  pass:227  dwarn:4   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:252  pass:239  dwarn:0   dfail:0   fail:0   skip:13 
fi-snb-2520m     total:252  pass:221  dwarn:0   dfail:0   fail:0   skip:31 
fi-snb-2600      total:252  pass:220  dwarn:0   dfail:0   fail:0   skip:32 

6e475fdb541d8148ea949f62e7c2082e57bc6736 drm-tip: 2017y-02m-06d-16h-01m-53s UTC integration manifest
730018a drm/i915: Avoid unguarded reads from the request pointer
fa9a892 drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3715/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
  2017-02-06 19:52 ` ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Patchwork
@ 2017-02-06 20:44   ` Chris Wilson
  2017-02-07 11:02     ` Mika Kuoppala
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2017-02-06 20:44 UTC (permalink / raw)
  To: intel-gfx

On Mon, Feb 06, 2017 at 07:52:15PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
> URL   : https://patchwork.freedesktop.org/series/19169/
> State : success
> 
> == Summary ==
> 
> Series 19169v1 Series without cover letter
> https://patchwork.freedesktop.org/api/1.0/series/19169/revisions/1/mbox/
> 
> Test kms_force_connector_basic:
>         Subgroup force-connector-state:
>                 skip       -> PASS       (fi-snb-2520m)

Ok, I pushed this as given Mika's feedback this may be responsible for
skl (at least) sporadically dieing in CI. I'm sure the macros for the
conditional code I added can and will be improved.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
  2017-02-06 20:44   ` Chris Wilson
@ 2017-02-07 11:02     ` Mika Kuoppala
  0 siblings, 0 replies; 5+ messages in thread
From: Mika Kuoppala @ 2017-02-07 11:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Feb 06, 2017 at 07:52:15PM -0000, Patchwork wrote:
>> == Series Details ==
>> 
>> Series: series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
>> URL   : https://patchwork.freedesktop.org/series/19169/
>> State : success
>> 
>> == Summary ==
>> 
>> Series 19169v1 Series without cover letter
>> https://patchwork.freedesktop.org/api/1.0/series/19169/revisions/1/mbox/
>> 
>> Test kms_force_connector_basic:
>>         Subgroup force-connector-state:
>>                 skip       -> PASS       (fi-snb-2520m)
>
> Ok, I pushed this as given Mika's feedback this may be responsible for
> skl (at least) sporadically dieing in CI. I'm sure the macros for the
> conditional code I added can and will be improved.

Observations with my kbl indicate that the retirement freed the request's
context, which then got referenced by the GEM_BUG_ON. Instead of neat
trace, full system hang ensued.

With this patch, my kbl seems very stable again.

I suspect that atleast some incomplete CI results we have seen,
on execlist machines, were due to this.

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-07 11:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-06 17:05 [CI 1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Chris Wilson
2017-02-06 17:05 ` [CI 2/2] drm/i915: Avoid unguarded reads from the request pointer Chris Wilson
2017-02-06 19:52 ` ✓ Fi.CI.BAT: success for series starting with [CI,1/2] drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance() Patchwork
2017-02-06 20:44   ` Chris Wilson
2017-02-07 11:02     ` Mika Kuoppala

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.