From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f51.google.com ([74.125.82.51]:32869 "EHLO mail-wm0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751145AbbLUM2Q (ORCPT ); Mon, 21 Dec 2015 07:28:16 -0500 Received: by mail-wm0-f51.google.com with SMTP id p187so66017988wmp.0 for ; Mon, 21 Dec 2015 04:28:15 -0800 (PST) Date: Mon, 21 Dec 2015 13:28:17 +0100 From: Daniel Vetter To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, Daniel Vetter , stable@vger.kernel.org Subject: Re: [PATCH] drm/i915: Hide one invalid cancellation bug in i915_switch_context() Message-ID: <20151221122816.GQ30437@phenom.ffwll.local> References: <1450376285-609-1-git-send-email-chris@chris-wilson.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1450376285-609-1-git-send-email-chris@chris-wilson.co.uk> Sender: stable-owner@vger.kernel.org List-ID: On Thu, Dec 17, 2015 at 06:18:05PM +0000, Chris Wilson wrote: > As we add the VMA to the request early, it may be cancelled during > execbuf reservation. This will leave the context object pointing to a > dangling request; i915_wait_request() simply skips the wait and so we > may unbind the object whilst it is still active. > > We can partially prevent such atrocity by doing the RCS context > initialisation earlier. This ensures that one callsite from blowing up > (and for igt this is a frequent culprit due to how the stressful batches > are submitted). > > Signed-off-by: Chris Wilson > Cc: Daniel Vetter > Cc: stable@vger.kernel.org > --- > drivers/gpu/drm/i915/i915_gem_context.c | 29 +++++++++++++++++------------ > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 900ffd044db8..657686e6492f 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -657,7 +657,6 @@ static int do_switch(struct drm_i915_gem_request *req) > struct drm_i915_private *dev_priv = ring->dev->dev_private; > struct intel_context *from = ring->last_context; > u32 hw_flags = 0; > - bool uninitialized = false; > int ret, i; > > if (from != NULL && ring == &dev_priv->ring[RCS]) { > @@ -764,6 +763,15 @@ static int do_switch(struct drm_i915_gem_request *req) > to->remap_slice &= ~(1< } > > + if (!to->legacy_hw_ctx.initialized) { > + if (ring->init_context) { > + ret = ring->init_context(req); > + if (ret) > + goto unpin_out; > + } > + to->legacy_hw_ctx.initialized = true; > + } > + > /* The backing object for the context is done after switching to the > * *next* context. Therefore we cannot retire the previous context until > * the next context has already started running. In fact, the below code > @@ -772,6 +780,14 @@ static int do_switch(struct drm_i915_gem_request *req) > */ > if (from != NULL) { > from->legacy_hw_ctx.rcs_state->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION; > + /* XXX Note very well this is dangerous! > + * We are pinning this object using this request as our > + * active reference. However, this request may yet be cancelled > + * during the execbuf dispatch, leaving us waiting on a > + * dangling request. Waiting upon this dangling request is > + * ignored, which means that we may unbind the context whilst > + * the GPU is still writing to the backing storage. > + */ > i915_vma_move_to_active(i915_gem_obj_to_ggtt(from->legacy_hw_ctx.rcs_state), req); Hm, since this is just a partial fix, what about instead marking any request that has been put to use already in move_to_active. And then when we try to cancel it from execbuf notice that and not cancel it? Fixes both bugs and makes the entire thing a bit more robust since it allows us to throw stuff at a request without ordering constraints. -Daniel > /* As long as MI_SET_CONTEXT is serializing, ie. it flushes the > * whole damn pipeline, we don't need to explicitly mark the > @@ -787,21 +803,10 @@ static int do_switch(struct drm_i915_gem_request *req) > i915_gem_context_unreference(from); > } > > - uninitialized = !to->legacy_hw_ctx.initialized; > - to->legacy_hw_ctx.initialized = true; > - > done: > i915_gem_context_reference(to); > ring->last_context = to; > > - if (uninitialized) { > - if (ring->init_context) { > - ret = ring->init_context(req); > - if (ret) > - DRM_ERROR("ring init context: %d\n", ret); > - } > - } > - > return 0; > > unpin_out: > -- > 2.6.4 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch