From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [RFC 17/44] drm/i915: Prelude to splitting
 i915_gem_do_execbuffer in two
Date: Wed, 23 Jul 2014 20:14:59 +0200
Message-ID: <20140723181458.GW15237@phenom.ffwll.local>
References: <1403803475-16337-1-git-send-email-John.C.Harrison@Intel.com>
 <1403803475-16337-18-git-send-email-John.C.Harrison@Intel.com>
 <20140702113423.189ec11c@jbarnes-desktop>
 <20140707192132.GE5821@phenom.ffwll.local>
 <53CFE3E6.2080208@Intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com
 [209.85.212.172])
 by gabe.freedesktop.org (Postfix) with ESMTP id 1D9766E68F
 for <Intel-GFX@lists.freedesktop.org>; Wed, 23 Jul 2014 11:14:51 -0700 (PDT)
Received: by mail-wi0-f172.google.com with SMTP id n3so8276736wiv.11
 for <Intel-GFX@lists.freedesktop.org>; Wed, 23 Jul 2014 11:14:49 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <53CFE3E6.2080208@Intel.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
To: John Harrison <John.C.Harrison@Intel.com>
Cc: Intel-GFX@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

On Wed, Jul 23, 2014 at 05:33:42PM +0100, John Harrison wrote:
> 
> On 07/07/2014 20:21, Daniel Vetter wrote:
> >On Wed, Jul 02, 2014 at 11:34:23AM -0700, Jesse Barnes wrote:
> >>On Thu, 26 Jun 2014 18:24:08 +0100
> >>John.C.Harrison@Intel.com wrote:
> >>
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>>The scheduler decouples the submission of batch buffers to the driver with their
> >>>submission to the hardware. This basically means splitting the execbuffer()
> >>>function in half. This change rearranges some code ready for the split to occur.
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   23 ++++++++++++++++-------
> >>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index ec274ef..fda9187 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -32,6 +32,7 @@
> >>>  #include "i915_trace.h"
> >>>  #include "intel_drv.h"
> >>>  #include <linux/dma_remapping.h>
> >>>+#include "i915_scheduler.h"
> >>>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >>>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> >>>@@ -874,10 +875,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
> >>>  	if (flush_domains & I915_GEM_DOMAIN_GTT)
> >>>  		wmb();
> >>>-	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>-	 * any residual writes from the previous batch.
> >>>-	 */
> >>>-	return intel_ring_invalidate_all_caches(ring);
> >>>+	return 0;
> >>>  }
> >>>  static bool
> >>>@@ -1219,8 +1217,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  		}
> >>>  	}
> >>>-	intel_runtime_pm_get(dev_priv);
> >>>-
> >>>  	ret = i915_mutex_lock_interruptible(dev);
> >>>  	if (ret)
> >>>  		goto pre_mutex_err;
> >>>@@ -1331,6 +1327,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	if (ret)
> >>>  		goto err;
> >>>+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>+
> >>>+	/* To be split into two functions here... */
> >>>+
> >>>+	intel_runtime_pm_get(dev_priv);
> >>>+
> >>>+	/* Unconditionally invalidate gpu caches and ensure that we do flush
> >>>+	 * any residual writes from the previous batch.
> >>>+	 */
> >>>+	ret = intel_ring_invalidate_all_caches(ring);
> >>>+	if (ret)
> >>>+		goto err;
> >>>+
> >>>+	/* Switch to the correct context for the batch */
> >>>  	ret = i915_switch_context(ring, ctx);
> >>>  	if (ret)
> >>>  		goto err;
> >>>@@ -1381,7 +1391,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>>  	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
> >>>-	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
> >>>  	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
> >>>  err:
> >>I'd like Chris to take a look too, but it looks safe afaict.
> >>
> >>Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> >switch_context can fail with EINTR so we really can't move stuff to the
> >active list before that point. Or we need to make sure that all the stuff
> >between the old and new move_to_active callsite can't fail.
> >
> >Or we need to track this and tell userspace with an EIO and adjusted reset
> >stats that something between our point of no return where the kernel
> >committed to executing the batch failed.
> >
> >Or we need to unrol move_to_active (which is currently not really
> >possible).
> >-Daniel
> 
> switch_context can fail with quite a lot of different error codes. Is there
> anything particularly special about EINTR? I can't spot that particular code
> path at the moment.
> 
> The context switch is done at the point of submission to the hardware. As
> batch buffers can be re-ordered between submission to driver and submission
> to hardware, there is no point choosing a context any earlier. Whereas the
> move to active needs to be done at the point of submission to the driver.
> The object needs to be marked as in use even though the batch buffer that
> actually uses it might not be executed for some time. From the software
> viewpoint, the object is in use and all the syncrhonisation code needs to
> know that.
> 
> The scheduler makes the batch buffer execution asynchronous to its
> submission to the driver. There is no way to communicate back a return code
> to user land. Instead, it is up to the scheduler to check the return codes
> from all the execution paths and to retry later if something fails for a
> temporary reason. Or to discard the buffer if it is truly toast.

EINTR is simply really easy to test&hit since you can provoke it with
signals. And X uses signals excessively. One point where EINTR might
happen is in intel_ring_begin, the other when we try to pin the context
into ggtt. The other error codes are true exceptions and will much harder
to hit.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch