Re: [Intel-gfx] [PATCH 2/3] drm/i915: Fix some invalid requests cancellations

From: Daniel Vetter <daniel@ffwll.ch>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH 2/3] drm/i915: Fix some invalid requests cancellations
Date: Mon, 15 Feb 2016 10:47:26 +0100	[thread overview]
Message-ID: <20160215094726.GH11240@phenom.ffwll.local> (raw)
In-Reply-To: <56BC862E.2020305@linux.intel.com>

On Thu, Feb 11, 2016 at 01:01:34PM +0000, Tvrtko Ursulin wrote:
> On 29/01/16 16:49, Chris Wilson wrote:
> >As we add the VMA to the request early, it may be cancelled during
> >execbuf reservation. This will leave the context object pointing to a
> 
> I don't get it, request is created after the reservation.

That's the problem - we add vma for a given request (well ring/seqno pair)
before that request exists.
-Daniel

> 
> >dangling request; i915_wait_request() simply skips the wait and so we
> >may unbind the object whilst it is still active.
> >
> >However, if at any point we make a change to the hardware (and equally
> >importantly our bookkeeping in the driver), we cannot cancel the request
> >as what has already been written must be submitted. Submitting a partial
> >request is far easier than trying to unwind the incomplete change.
> >
> >Unfortunately this patch undoes the excess breadcrumb usage that olr
> >prevented, e.g. if we interrupt batchbuffer submission then we submit
> >the requests along with the memory writes and interrupt (even though we
> >do no real work). Disassociating requests from breadcrumbs (and
> >semaphores) is a topic for a past/future series, but now much more
> >important.
> >
> >v2: Rebase
> >
> >Note that igt/gem_concurrent_blit tiggers both misrendering and a GPF
> >that is fixed by this patch.
> >
> >Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
> >Testcase: igt/gem_concurrent_blit
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: stable@vger.kernel.org
> >---
> >  drivers/gpu/drm/i915/i915_drv.h            |  1 -
> >  drivers/gpu/drm/i915/i915_gem.c            |  7 ++-----
> >  drivers/gpu/drm/i915/i915_gem_context.c    | 21 +++++++++------------
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 16 +++++-----------
> >  drivers/gpu/drm/i915/intel_display.c       |  2 +-
> >  drivers/gpu/drm/i915/intel_lrc.c           |  1 -
> >  6 files changed, 17 insertions(+), 31 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index a2d2f08b7515..f828a7ffed37 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2823,7 +2823,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
> >  			     struct drm_file *file_priv);
> >  void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> >  					struct drm_i915_gem_request *req);
> >-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
> >  int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> >  				   struct drm_i915_gem_execbuffer2 *args,
> >  				   struct list_head *vmas);
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index e9b19bca1383..f764f33580fc 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3407,12 +3407,9 @@ int i915_gpu_idle(struct drm_device *dev)
> >  				return PTR_ERR(req);
> >
> >  			ret = i915_switch_context(req);
> >-			if (ret) {
> >-				i915_gem_request_cancel(req);
> >-				return ret;
> >-			}
> >-
> >  			i915_add_request_no_flush(req);
> >+			if (ret)
> >+				return ret;
> >  		}
> >
> >  		ret = intel_ring_idle(ring);
> >diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >index 83a097c94911..5da7adc3f7b2 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_context.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >@@ -652,7 +652,6 @@ static int do_switch(struct drm_i915_gem_request *req)
> >  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> >  	struct intel_context *from = ring->last_context;
> >  	u32 hw_flags = 0;
> >-	bool uninitialized = false;
> >  	int ret, i;
> >
> >  	if (from != NULL && ring == &dev_priv->ring[RCS]) {
> >@@ -759,6 +758,15 @@ static int do_switch(struct drm_i915_gem_request *req)
> >  			to->remap_slice &= ~(1<<i);
> >  	}
> >
> >+	if (!to->legacy_hw_ctx.initialized) {
> >+		if (ring->init_context) {
> >+			ret = ring->init_context(req);
> >+			if (ret)
> >+				goto unpin_out;
> >+		}
> >+		to->legacy_hw_ctx.initialized = true;
> >+	}
> >+
> >  	/* The backing object for the context is done after switching to the
> >  	 * *next* context. Therefore we cannot retire the previous context until
> >  	 * the next context has already started running. In fact, the below code
> >@@ -782,21 +790,10 @@ static int do_switch(struct drm_i915_gem_request *req)
> >  		i915_gem_context_unreference(from);
> >  	}
> >
> >-	uninitialized = !to->legacy_hw_ctx.initialized;
> >-	to->legacy_hw_ctx.initialized = true;
> >-
> >  done:
> >  	i915_gem_context_reference(to);
> >  	ring->last_context = to;
> >
> >-	if (uninitialized) {
> >-		if (ring->init_context) {
> >-			ret = ring->init_context(req);
> >-			if (ret)
> >-				DRM_ERROR("ring init context: %d\n", ret);
> >-		}
> >-	}
> >-
> >  	return 0;
> >
> >  unpin_out:
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 8fd00d279447..61bb15507b30 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -1133,7 +1133,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
> >  	}
> >  }
> >
> >-void
> >+static void
> >  i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params)
> >  {
> >  	/* Unconditionally force add_request to emit a full flush. */
> >@@ -1318,7 +1318,6 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
> >  	trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
> >
> >  	i915_gem_execbuffer_move_to_active(vmas, params->request);
> >-	i915_gem_execbuffer_retire_commands(params);
> 
> Hm this whole block block from trace_... to
> i915_gem_execbuffer_retire_commands could be moved to do_execbuffer as and
> independent cleanup patch if under "if (ret == 0)".
> 
> So the concern is that something has been written to the ringbuffer but not
> all of it?
> 
> Why do we have to submit that, I am assuming whatever was written is not
> interesting to be executed unless the whole bb went in.
> 
> So could we just rewind the pointers? Store them at the beginning and rewind
> if something failed.
> 
> >
> >  	return 0;
> >  }
> >@@ -1616,8 +1615,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	}
> >
> >  	ret = i915_gem_request_add_to_client(req, file);
> >-	if (ret)
> >+	if (ret) {
> >+		i915_gem_request_cancel(req);
> >  		goto err_batch_unpin;
> >+	}
> 
> Doesn't look like this can fail so a side cleanup only?
> 
> Regards,
> 
> Tvrtko

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch