From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>, <dri-devel@lists.freedesktop.org>
Cc: daniele.ceraolospurio@intel.com, john.c.harrison@intel.com
Subject: [PATCH 03/33] drm/i915: Hold reference to intel_context over life of i915_request
Date: Mon, 26 Jul 2021 17:23:18 -0700
Message-ID: <20210727002348.97202-4-matthew.brost@intel.com>
In-Reply-To: <20210727002348.97202-1-matthew.brost@intel.com>

Hold a reference to the intel_context over the life of an i915_request.
Without this, an i915_request can exist after its context has been
destroyed (e.g. the request is retired and the context closed, but user
space still holds a reference to the request via an out fence). In the
case of GuC submission + virtual engines, the engine that the request
references is destroyed along with the context, which can trigger a bad
pointer dereference in the fence ops (e.g. i915_fence_get_driver_name).
We could likely change i915_fence_get_driver_name to avoid touching the
engine, but let's just be safe and hold the intel_context reference.
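
For illustration, the fence ops reach the engine through the request. A
minimal sketch of the kind of dereference chain involved (simplified,
and not necessarily the exact upstream helper) looks like this:

	static const char *i915_fence_get_driver_name(struct dma_fence *fence)
	{
		struct i915_request *rq = to_request(fence);

		/*
		 * rq->engine may be a GuC virtual engine whose lifetime is
		 * tied to the intel_context; if that context has already
		 * been freed, this chain dereferences freed memory.
		 */
		return dev_name(rq->engine->i915->drm.dev);
	}

With the intel_context reference held for the life of the request, the
engine (virtual or physical) remains valid until the fence itself is
released.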

v2:
 (John Harrison)
  - Update comment explaining how GuC mode and execlists mode deal with
    virtual engines differently

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 55 ++++++++++++-----------------
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 39a21d96577e..57c9187aff74 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -125,39 +125,17 @@ static void i915_fence_release(struct dma_fence *fence)
 	i915_sw_fence_fini(&rq->semaphore);
 
 	/*
-	 * Keep one request on each engine for reserved use under mempressure
-	 *
-	 * We do not hold a reference to the engine here and so have to be
-	 * very careful in what rq->engine we poke. The virtual engine is
-	 * referenced via the rq->context and we released that ref during
-	 * i915_request_retire(), ergo we must not dereference a virtual
-	 * engine here. Not that we would want to, as the only consumer of
-	 * the reserved engine->request_pool is the power management parking,
-	 * which must-not-fail, and that is only run on the physical engines.
-	 *
-	 * Since the request must have been executed to be have completed,
-	 * we know that it will have been processed by the HW and will
-	 * not be unsubmitted again, so rq->engine and rq->execution_mask
-	 * at this point is stable. rq->execution_mask will be a single
-	 * bit if the last and _only_ engine it could execution on was a
-	 * physical engine, if it's multiple bits then it started on and
-	 * could still be on a virtual engine. Thus if the mask is not a
-	 * power-of-two we assume that rq->engine may still be a virtual
-	 * engine and so a dangling invalid pointer that we cannot dereference
-	 *
-	 * For example, consider the flow of a bonded request through a virtual
-	 * engine. The request is created with a wide engine mask (all engines
-	 * that we might execute on). On processing the bond, the request mask
-	 * is reduced to one or more engines. If the request is subsequently
-	 * bound to a single engine, it will then be constrained to only
-	 * execute on that engine and never returned to the virtual engine
-	 * after timeslicing away, see __unwind_incomplete_requests(). Thus we
-	 * know that if the rq->execution_mask is a single bit, rq->engine
-	 * can be a physical engine with the exact corresponding mask.
+	 * Keep one request on each engine for reserved use under mempressure;
+	 * do not use this with virtual engines, as it is really only needed
+	 * for kernel contexts.
 	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+
+	intel_context_put(rq->context);
 
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -956,7 +934,19 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
 
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over the life of an i915_request.
+	 * Without this, an i915_request can exist after the context has been
+	 * destroyed (e.g. request retired, context closed, but user space holds
+	 * a reference to the request from an out fence). In the case of GuC
+	 * submission + virtual engine, the engine that the request references
+	 * is also destroyed, which can trigger a bad pointer dereference in
+	 * fence ops (e.g. i915_fence_get_driver_name). We could likely change
+	 * these functions to avoid touching the engine, but let's just be safe
+	 * and hold the intel_context reference. In execlists mode the request
+	 * always eventually points to a physical engine so this isn't an issue.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1033,6 +1023,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
-- 
2.28.0


