From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 40/46] drm/i915: Multi-batch execbuffer2
Date: Mon, 9 Aug 2021 19:02:21 +0200	[thread overview]
Message-ID: <YRFfnZYTRSWGFUSy@phenom.ffwll.local> (raw)
In-Reply-To: <20210803222943.27686-41-matthew.brost@intel.com>

On Tue, Aug 03, 2021 at 03:29:37PM -0700, Matthew Brost wrote:
> For contexts with width set to two or more, we add a mode to execbuf2
> which implies there are N batch buffers in the buffer list, each of
> which will be sent to one of the engines from the engine map array
> (I915_CONTEXT_PARAM_ENGINES, I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT).
> 
> Those N batches can either be the first N or the last N objects in the
> list, as controlled by the existing execbuffer2 flag.
> 
> The N batches will be submitted to consecutive engines from the previously
> configured allowed engine array starting at index 0.
> 
> Input and output fences are fully supported, with the latter getting
> signalled when all batch buffers have completed.
> 
> Last, it isn't safe for subsequent batches to touch any objects written
> to by a multi-BB submission until all the batches in that submission
> complete. As such all batches in a multi-BB submission must be combined
> into a single composite fence and put into the dma reservation excl
> fence slot.
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

So either I've missed something, or this has the exact same deadlock issue
as the old submit fence, except now it's all internal to the kmd.

Also, this is bad news (if I'm right about what's going on here).

- Between each batch submission we drop the dma_resv_locks on the objects.
  This can currently even happen due to relocations within a submission,
  but since we don't allow relocations on platforms with parallel
  submit/guc scheduler, this could be worked around.

- When the buffer is unlocked someone else could step in and do exactly
  what you say is not allowed, namely touch the object.

- The individual batch fences won't complete until the last one has
  finished, leading to a deadlock (sketched in userspace below) which might
  or might not get resolved by gpu reset code. Since the deadlock is on the
  submission side I'm assuming the answer is "it won't be resolved by gpu
  reset", but maybe you do have a "I'm stuck for too long, let's ragequit"
  timer in your state machine somewhere. The old bonded submit would at
  least be rescued by the hangcheck we re-added, because there it's all
  completely free-floating requests.

- ttm on dgpu makes this all substantially worse.
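
To make the circular wait concrete, here's a userspace caricature of it,
assuming the composite fence only signals once all N batches have been
submitted and completed. pthreads stand in for dma_fence/dma_resv and every
name is made up: obj_lock plays the dma_resv lock we drop between batch 0
and batch 1, and third_party() is whoever grabs the object in that window
and then waits on the composite fence.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fence_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t fence_cond = PTHREAD_COND_INITIALIZER;
static int batches_done;	/* "composite fence" signals at 2 */

static void *third_party(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&obj_lock);	/* object is unlocked mid-submission */
	pthread_mutex_lock(&fence_lock);
	while (batches_done < 2)	/* wait on the composite fence */
		pthread_cond_wait(&fence_cond, &fence_lock);
	pthread_mutex_unlock(&fence_lock);
	pthread_mutex_unlock(&obj_lock);
	return NULL;
}

static void submit_batch(void)
{
	pthread_mutex_lock(&obj_lock);	/* per-batch lock/unlock, as in the patch */
	pthread_mutex_lock(&fence_lock);
	batches_done++;
	pthread_cond_broadcast(&fence_cond);
	pthread_mutex_unlock(&fence_lock);
	pthread_mutex_unlock(&obj_lock);
}

int main(void)
{
	pthread_t t;

	submit_batch();			/* batch 0 of 2 */
	pthread_create(&t, NULL, third_party, NULL);
	sleep(1);			/* let the third party win the race */
	submit_batch();			/* batch 1 blocks on obj_lock forever */
	printf("composite fence signalled\n");	/* never reached */
	return 0;
}

Run it and batch 1 never gets past obj_lock, so batches_done never reaches
2 and the waiter never wakes up - exactly the submission-side deadlock that
gpu reset can't see.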

The fundamental fix is still to build up a single i915_request, go through
the execbuf flow once, and then split things up again in the backend. That
would also mean all your prep work to pull the execbuf prep steps out of
do_execbuf() is a pure distraction.
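
Roughly the shape I mean, as a compilable stand-in; every type and helper
below is a stub invented for illustration, not actual i915 code:

struct stub_request {
	int num_batches;
};

/* One trip through execbuf: lock, pin and fence-reserve every object once,
 * and build a single request covering all N batches. */
static struct stub_request *execbuf_prepare_once(int num_batches)
{
	static struct stub_request rq;

	rq.num_batches = num_batches;
	return &rq;
}

/* Only the backend knows the request fans out to N engines; it splits the
 * payload here, after all dma_resv bookkeeping is done, so nothing can
 * slip in between the individual batches. */
static int backend_submit_parallel(struct stub_request *rq)
{
	int i, err = 0;

	for (i = 0; i < rq->num_batches && !err; i++)
		err = 0;	/* submit batch i to engine i */
	return err;
}

int submit_multi_bb(int num_batches)
{
	return backend_submit_parallel(execbuf_prepare_once(num_batches));
}

The point being that all dma_resv locking and fence bookkeeping happens
exactly once, before anything is handed to the backend, so there is no
window for a third party to sneak in between the batches.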

I don't yet fully understand all the ordering rules drm/sched has, but I
don't think it will be any happier about this kind of submission model.

tldr; what do?

Cheers, Daniel
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 262 +++++++++++++++---
>  drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>  drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
>  drivers/gpu/drm/i915/i915_vma.c               |  13 +-
>  drivers/gpu/drm/i915/i915_vma.h               |  16 +-
>  5 files changed, 266 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b6143973ac67..ecdb583cc2eb 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -252,6 +252,9 @@ struct i915_execbuffer {
>  	struct eb_vma *batch; /** identity of the batch obj/vma */
>  	struct i915_vma *trampoline; /** trampoline used for chaining */
>  
> +	/** used for excl fence in dma_resv objects when > 1 BB submitted */
> +	struct dma_fence *composite_fence;
> +
>  	/* batch_index in vma list */
>  	unsigned int batch_index;
>  
> @@ -367,11 +370,6 @@ static int eb_create(struct i915_execbuffer *eb)
>  		eb->lut_size = -eb->buffer_count;
>  	}
>  
> -	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
> -		eb->batch_index = 0;
> -	else
> -		eb->batch_index = eb->args->buffer_count - 1;
> -
>  	return 0;
>  }
>  
> @@ -2241,7 +2239,7 @@ static int eb_relocate_parse(struct i915_execbuffer *eb)
>  	return err;
>  }
>  
> -static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first)
> +static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first, bool last)
>  {
>  	const unsigned int count = eb->buffer_count;
>  	unsigned int i = count;
> @@ -2289,8 +2287,16 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb, bool first)
>  		}
>  
>  		if (err == 0)
> -			err = i915_vma_move_to_active(vma, eb->request,
> -						      flags | __EXEC_OBJECT_NO_RESERVE);
> +			err = _i915_vma_move_to_active(vma, eb->request,
> +						       flags | __EXEC_OBJECT_NO_RESERVE,
> +						       !last ?
> +						       NULL :
> +						       eb->composite_fence ?
> +						       eb->composite_fence :
> +						       &eb->request->fence,
> +						       eb->composite_fence ?
> +						       eb->composite_fence :
> +						       &eb->request->fence);
>  	}
>  
>  #ifdef CONFIG_MMU_NOTIFIER
> @@ -2528,14 +2534,14 @@ static int eb_parse(struct i915_execbuffer *eb)
>  }
>  
>  static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch,
> -		     bool first)
> +		     bool first, bool last)
>  {
>  	int err;
>  
>  	if (intel_context_nopreempt(eb->context))
>  		__set_bit(I915_FENCE_FLAG_NOPREEMPT, &eb->request->fence.flags);
>  
> -	err = eb_move_to_gpu(eb, first);
> +	err = eb_move_to_gpu(eb, first, last);
>  	if (err)
>  		return err;
>  
> @@ -2748,7 +2754,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
>  }
>  
>  static int
> -eb_select_engine(struct i915_execbuffer *eb)
> +eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)
>  {
>  	struct intel_context *ce;
>  	unsigned int idx;
> @@ -2763,6 +2769,18 @@ eb_select_engine(struct i915_execbuffer *eb)
>  	if (IS_ERR(ce))
>  		return PTR_ERR(ce);
>  
> +	if (batch_number > 0) {
> +		struct intel_context *parent = ce;
> +
> +		GEM_BUG_ON(!intel_context_is_parent(parent));
> +
> +		for_each_child(parent, ce)
> +			if (!--batch_number)
> +				break;
> +		intel_context_put(parent);
> +		intel_context_get(ce);
> +	}
> +
>  	intel_gt_pm_get(ce->engine->gt);
>  
>  	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
> @@ -3155,13 +3173,49 @@ parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args,
>  				    eb);
>  }
>  
> +static int setup_composite_fence(struct i915_execbuffer *eb,
> +				 struct dma_fence **out_fence,
> +				 unsigned int num_batches)
> +{
> +	struct dma_fence_array *fence_array;
> +	struct dma_fence **fences = kmalloc(num_batches * sizeof(*fences),
> +					    GFP_KERNEL);
> +	struct intel_context *parent = intel_context_to_parent(eb->context);
> +	int i;
> +
> +	if (!fences)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_batches; ++i)
> +		fences[i] = out_fence[i];
> +
> +	fence_array = dma_fence_array_create(num_batches,
> +					     fences,
> +					     parent->fence_context,
> +					     ++parent->seqno,
> +					     false);
> +	if (!fence_array) {
> +		kfree(fences);
> +		return -ENOMEM;
> +	}
> +
> +	/* Move ownership to the dma_fence_array created above */
> +	for (i = 0; i < num_batches; ++i)
> +		dma_fence_get(fences[i]);
> +
> +	eb->composite_fence = &fence_array->base;
> +
> +	return 0;
> +}
> +
>  static int
>  i915_gem_do_execbuffer(struct drm_device *dev,
>  		       struct drm_file *file,
>  		       struct drm_i915_gem_execbuffer2 *args,
>  		       struct drm_i915_gem_exec_object2 *exec,
> -		       int batch_index,
> +		       unsigned int batch_index,
>  		       unsigned int num_batches,
> +		       unsigned int batch_number,
>  		       struct dma_fence *in_fence,
>  		       struct dma_fence *exec_fence,
>  		       struct dma_fence **out_fence)
> @@ -3170,6 +3224,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>  	struct i915_execbuffer eb;
>  	struct i915_vma *batch;
>  	int err;
> +	bool first = batch_number == 0;
> +	bool last = batch_number + 1 == num_batches;
>  
>  	BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS);
>  	BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS &
> @@ -3194,6 +3250,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>  	eb.batch_start_offset = args->batch_start_offset;
>  	eb.batch_len = args->batch_len;
>  	eb.trampoline = NULL;
> +	eb.composite_fence = NULL;
>  
>  	eb.fences = NULL;
>  	eb.num_fences = 0;
> @@ -3219,14 +3276,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>  	GEM_BUG_ON(!eb.lut_size);
>  
>  	eb.num_batches = num_batches;
> -	if (batch_index >= 0)
> -		eb.batch_index = batch_index;
> +	eb.batch_index = batch_index;
>  
>  	err = eb_select_context(&eb);
>  	if (unlikely(err))
>  		goto err_destroy;
>  
> -	err = eb_select_engine(&eb);
> +	err = eb_select_engine(&eb, batch_number);
>  	if (unlikely(err))
>  		goto err_context;
>  
> @@ -3275,6 +3331,23 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>  			goto err_ext;
>  	}
>  
> +	if (out_fence) {
> +		/* Move ownership to caller (i915_gem_execbuffer2_ioctl) */
> +		out_fence[batch_number] = dma_fence_get(&eb.request->fence);
> +
> +		/*
> +		 * Need to create a composite fence (dma_fence_array,
> +		 * eb.composite_fence) for the excl fence of the dma_resv
> +		 * objects as each BB can write to the object. Since we create
> +		 */
> +		if (num_batches > 1 && last) {
> +			err = setup_composite_fence(&eb, out_fence,
> +						    num_batches);
> +			if (err < 0)
> +				goto err_request;
> +		}
> +	}
> +
>  	if (exec_fence) {
>  		err = i915_request_await_execution(eb.request,
>  						   exec_fence);
> @@ -3307,17 +3380,27 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>  		intel_gt_buffer_pool_mark_active(eb.batch_pool, eb.request);
>  
>  	trace_i915_request_queue(eb.request, eb.batch_flags);
> -	err = eb_submit(&eb, batch, true);
> +	err = eb_submit(&eb, batch, first, last);
>  
>  err_request:
> +	if (last)
> +		set_bit(I915_FENCE_FLAG_SUBMIT_PARALLEL,
> +			&eb.request->fence.flags);
> +
>  	i915_request_get(eb.request);
>  	err = eb_request_add(&eb, err);
>  
>  	if (eb.fences)
>  		signal_fence_array(&eb);
>  
> -	if (!err && out_fence)
> -		*out_fence = dma_fence_get(&eb.request->fence);
> +	/*
> +	 * Ownership of the composite fence (dma_fence_array,
> +	 * eb.composite_fence) has been moved to the dma_resv objects these BB
> +	 * write to in i915_vma_move_to_active. It is ok to release the creation
> +	 * reference of this fence now.
> +	 */
> +	if (eb.composite_fence)
> +		dma_fence_put(eb.composite_fence);
>  
>  	if (unlikely(eb.gem_context->syncobj)) {
>  		drm_syncobj_replace_fence(eb.gem_context->syncobj,
> @@ -3368,6 +3451,17 @@ static bool check_buffer_count(size_t count)
>  	return !(count < 1 || count > INT_MAX || count > SIZE_MAX / sz - 1);
>  }
>  
> +/* Release fences from the dma_fence_get in i915_gem_do_execbuffer. */
> +static inline void put_out_fences(struct dma_fence **out_fences,
> +				  unsigned int num_batches)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_batches; ++i)
> +		if (out_fences[i])
> +			dma_fence_put(out_fences[i]);
> +}
> +
>  int
>  i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>  			   struct drm_file *file)
> @@ -3375,13 +3469,16 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>  	struct drm_i915_private *i915 = to_i915(dev);
>  	struct drm_i915_gem_execbuffer2 *args = data;
>  	struct drm_i915_gem_exec_object2 *exec2_list;
> -	struct dma_fence **out_fence_p = NULL;
> -	struct dma_fence *out_fence = NULL;
> +	struct dma_fence **out_fences = NULL;
>  	struct dma_fence *in_fence = NULL;
>  	struct dma_fence *exec_fence = NULL;
>  	int out_fence_fd = -1;
>  	const size_t count = args->buffer_count;
>  	int err;
> +	struct i915_gem_context *ctx;
> +	struct intel_context *parent = NULL;
> +	unsigned int num_batches = 1, i;
> +	bool is_parallel = false;
>  
>  	if (!check_buffer_count(count)) {
>  		drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count);
> @@ -3404,10 +3501,39 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>  	if (err)
>  		return err;
>  
> +	ctx = i915_gem_context_lookup(file->driver_priv, args->rsvd1);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	if (i915_gem_context_user_engines(ctx)) {
> +		parent = i915_gem_context_get_engine(ctx, args->flags &
> +						     I915_EXEC_RING_MASK);
> +		if (IS_ERR(parent)) {
> +			err = PTR_ERR(parent);
> +			goto err_context;
> +		}
> +
> +		if (intel_context_is_parent(parent)) {
> +			if (args->batch_len) {
> +				err = -EINVAL;
> +				goto err_context;
> +			}
> +
> +			num_batches = parent->guc_number_children + 1;
> +			if (num_batches > count) {
> +				i915_gem_context_put(ctx);
> +				goto err_parent;
> +			}
> +			is_parallel = true;
> +		}
> +	}
> +
>  	if (args->flags & I915_EXEC_FENCE_IN) {
>  		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
> -		if (!in_fence)
> -			return -EINVAL;
> +		if (!in_fence) {
> +			err = -EINVAL;
> +			goto err_parent;
> +		}
>  	}
>  
>  	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
> @@ -3423,13 +3549,25 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>  		}
>  	}
>  
> -	if (args->flags & I915_EXEC_FENCE_OUT) {
> -		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
> -		if (out_fence_fd < 0) {
> -			err = out_fence_fd;
> +	/*
> +	 * We always allocate out fences when doing multi-BB submission as
> +	 * this is required to create an excl fence for any dma buf objects
> +	 * these BBs touch.
> +	 */
> +	if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) {
> +		out_fences = kcalloc(num_batches, sizeof(*out_fences),
> +				     GFP_KERNEL);
> +		if (!out_fences) {
> +			err = -ENOMEM;
>  			goto err_out_fence;
>  		}
> -		out_fence_p = &out_fence;
> +		if (args->flags & I915_EXEC_FENCE_OUT) {
> +			out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
> +			if (out_fence_fd < 0) {
> +				err = out_fence_fd;
> +				goto err_out_fence;
> +			}
> +		}
>  	}
>  
>  	/* Allocate extra slots for use by the command parser */
> @@ -3449,8 +3587,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data,
>  		goto err_copy;
>  	}
>  
> -	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, -1, 1,
> -				     in_fence, exec_fence, out_fence_p);
> +	/*
> +	 * Downstream submission code expects all parallel submissions to occur
> +	 * in intel_context sequence, thus only 1 submission can happen at a
> +	 * time.
> +	 */
> +	if (is_parallel)
> +		mutex_lock(&parent->parallel_submit);
> +
> +	err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
> +				     args->flags & I915_EXEC_BATCH_FIRST ?
> +				     0 : count - num_batches,
> +				     num_batches,
> +				     0,
> +				     in_fence,
> +				     exec_fence,
> +				     out_fences);
> +
> +	for (i = 1; err == 0 && i < num_batches; i++)
> +		err = i915_gem_do_execbuffer(dev, file, args, exec2_list,
> +					     args->flags & I915_EXEC_BATCH_FIRST ?
> +					     i : count - num_batches + i,
> +					     num_batches,
> +					     i,
> +					     NULL,
> +					     NULL,
> +					     out_fences);
> +
> +	if (is_parallel)
> +		mutex_unlock(&parent->parallel_submit);
>  
>  	/*
>  	 * Now that we have begun execution of the batchbuffer, we ignore
> @@ -3491,8 +3656,31 @@ end:;
>  	}
>  
>  	if (!err && out_fence_fd >= 0) {
> +		struct dma_fence *out_fence = NULL;
>  		struct sync_file *sync_fence;
>  
> +		if (is_parallel) {
> +			struct dma_fence_array *fence_array;
> +
> +			/*
> +			 * The dma_fence_array now owns out_fences (from
> +			 * dma_fence_get in i915_gem_do_execbuffer) assuming
> +			 * successful creation of dma_fence_array.
> +			 */
> +			fence_array = dma_fence_array_create(num_batches,
> +							     out_fences,
> +							     parent->fence_context,
> +							     ++parent->seqno,
> +							     false);
> +			if (!fence_array)
> +				goto put_out_fences;
> +
> +			out_fence = &fence_array->base;
> +			out_fences = NULL;
> +		} else {
> +			out_fence = out_fences[0];
> +		}
> +
>  		sync_fence = sync_file_create(out_fence);
>  		if (sync_fence) {
>  			fd_install(out_fence_fd, sync_fence->file);
> @@ -3500,9 +3688,15 @@ end:;
>  			args->rsvd2 |= (u64)out_fence_fd << 32;
>  			out_fence_fd = -1;
>  		}
> +
> +		/*
> +		 * The sync_file now owns out_fence, drop the creation
> +		 * reference.
> +		 */
>  		dma_fence_put(out_fence);
> -	} else if (out_fence) {
> -		dma_fence_put(out_fence);
> +	} else if (out_fences) {
> +put_out_fences:
> +		put_out_fences(out_fences, num_batches);
>  	}
>  
>  	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
> @@ -3513,9 +3707,15 @@ end:;
>  	if (out_fence_fd >= 0)
>  		put_unused_fd(out_fence_fd);
>  err_out_fence:
> +	kfree(out_fences);
>  	dma_fence_put(exec_fence);
>  err_exec_fence:
>  	dma_fence_put(in_fence);
> +err_parent:
> +	if (parent)
> +		intel_context_put(parent);
> +err_context:
> +	i915_gem_context_put(ctx);
>  
>  	return err;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index f396993374da..2c07f5f22c94 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -472,6 +472,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>  	ce->guc_id = GUC_INVALID_LRC_ID;
>  	INIT_LIST_HEAD(&ce->guc_id_link);
>  
> +	mutex_init(&ce->parallel_submit);
> +	ce->fence_context = dma_fence_context_alloc(1);
> +
>  	/*
>  	 * Initialize fence to be complete as this is expected to be complete
>  	 * unless there is a pending schedule disable outstanding.
> @@ -498,6 +501,8 @@ void intel_context_fini(struct intel_context *ce)
>  		for_each_child_safe(ce, child, next)
>  			intel_context_put(child);
>  
> +	mutex_destroy(&ce->parallel_submit);
> +
>  	mutex_destroy(&ce->pin_mutex);
>  	i915_active_fini(&ce->active);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index fdc4890335b7..8af9ace4c052 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -235,6 +235,15 @@ struct intel_context {
>  
>  	/* Last request submitted on a parent */
>  	struct i915_request *last_rq;
> +
> +	/* Parallel submission mutex */
> +	struct mutex parallel_submit;
> +
> +	/* Fence context for parallel submission */
> +	u64 fence_context;
> +
> +	/* Seqno for parallel submission */
> +	u32 seqno;
>  };
>  
>  #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 4b7fc4647e46..ed4e790276a9 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -1234,9 +1234,11 @@ int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq)
>  	return i915_active_add_request(&vma->active, rq);
>  }
>  
> -int i915_vma_move_to_active(struct i915_vma *vma,
> -			    struct i915_request *rq,
> -			    unsigned int flags)
> +int _i915_vma_move_to_active(struct i915_vma *vma,
> +			     struct i915_request *rq,
> +			     unsigned int flags,
> +			     struct dma_fence *shared_fence,
> +			     struct dma_fence *excl_fence)
>  {
>  	struct drm_i915_gem_object *obj = vma->obj;
>  	int err;
> @@ -1257,7 +1259,7 @@ int i915_vma_move_to_active(struct i915_vma *vma,
>  			intel_frontbuffer_put(front);
>  		}
>  
> -		dma_resv_add_excl_fence(vma->resv, &rq->fence);
> +		dma_resv_add_excl_fence(vma->resv, excl_fence);
>  		obj->write_domain = I915_GEM_DOMAIN_RENDER;
>  		obj->read_domains = 0;
>  	} else {
> @@ -1267,7 +1269,8 @@ int i915_vma_move_to_active(struct i915_vma *vma,
>  				return err;
>  		}
>  
> -		dma_resv_add_shared_fence(vma->resv, &rq->fence);
> +		if (shared_fence)
> +			dma_resv_add_shared_fence(vma->resv, shared_fence);
>  		obj->write_domain = 0;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index ed69f66c7ab0..a36da651dbff 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -57,9 +57,19 @@ static inline bool i915_vma_is_active(const struct i915_vma *vma)
>  
>  int __must_check __i915_vma_move_to_active(struct i915_vma *vma,
>  					   struct i915_request *rq);
> -int __must_check i915_vma_move_to_active(struct i915_vma *vma,
> -					 struct i915_request *rq,
> -					 unsigned int flags);
> +
> +int __must_check _i915_vma_move_to_active(struct i915_vma *vma,
> +					  struct i915_request *rq,
> +					  unsigned int flags,
> +					  struct dma_fence *shared_fence,
> +					  struct dma_fence *excl_fence);
> +static inline int __must_check
> +i915_vma_move_to_active(struct i915_vma *vma,
> +			struct i915_request *rq,
> +			unsigned int flags)
> +{
> +	return _i915_vma_move_to_active(vma, rq, flags, &rq->fence, &rq->fence);
> +}
>  
>  #define __i915_vma_flags(v) ((unsigned long *)&(v)->flags.counter)
>  
> -- 
> 2.28.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Thread overview: 186+ messages
2021-08-03 22:28 [PATCH 00/46] Parallel submission aka multi-bb execbuf Matthew Brost
2021-08-03 22:28 ` [Intel-gfx] " Matthew Brost
2021-08-03 22:28 ` [PATCH 01/46] drm/i915/guc: Allow flexible number of context ids Matthew Brost
2021-08-03 22:28   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:28 ` [PATCH 02/46] drm/i915/guc: Connect the number of guc_ids to debugfs Matthew Brost
2021-08-03 22:28   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 03/46] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-05  8:27   ` Daniel Vetter
2021-08-05  8:27     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 04/46] drm/i915/guc: Don't allow requests not ready to consume all guc_ids Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-05  8:29   ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 05/46] drm/i915/guc: Introduce guc_submit_engine object Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 06/46] drm/i915/guc: Check return of __xa_store when registering a context Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 07/46] drm/i915/guc: Non-static lrc descriptor registration buffer Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 08/46] drm/i915/guc: Take GT PM ref when deregistering context Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 09/46] drm/i915: Add GT PM unpark worker Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 10/46] drm/i915/guc: Take engine PM when a context is pinned with GuC submission Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:23   ` Daniel Vetter
2021-08-09 14:23     ` [Intel-gfx] " Daniel Vetter
2021-08-09 18:11     ` Matthew Brost
2021-08-09 18:11       ` [Intel-gfx] " Matthew Brost
2021-08-10  6:43       ` Daniel Vetter
2021-08-10  6:43         ` [Intel-gfx] " Daniel Vetter
2021-08-10 21:29         ` Matthew Brost
2021-08-10 21:29           ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 11/46] drm/i915/guc: Don't call switch_to_kernel_context " Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:27   ` Daniel Vetter
2021-08-09 18:20     ` Matthew Brost
2021-08-10  6:47       ` Daniel Vetter
2021-08-11 17:47         ` Matthew Brost
2021-08-03 22:29 ` [PATCH 12/46] drm/i915/guc: Selftest for GuC flow control Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 13/46] drm/i915: Add logical engine mapping Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:28   ` Daniel Vetter
2021-08-09 14:28     ` [Intel-gfx] " Daniel Vetter
2021-08-09 18:28     ` Matthew Brost
2021-08-09 18:28       ` [Intel-gfx] " Matthew Brost
2021-08-10  6:49       ` Daniel Vetter
2021-08-10  6:49         ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 14/46] drm/i915: Expose logical engine instance to user Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:30   ` Daniel Vetter
2021-08-09 14:30     ` [Intel-gfx] " Daniel Vetter
2021-08-09 18:37     ` Matthew Brost
2021-08-09 18:37       ` [Intel-gfx] " Matthew Brost
2021-08-10  6:53       ` Daniel Vetter
2021-08-10  6:53         ` [Intel-gfx] " Daniel Vetter
2021-08-11 17:55         ` Matthew Brost
2021-08-11 17:55           ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 15/46] drm/i915/guc: Introduce context parent-child relationship Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:37   ` Daniel Vetter
2021-08-09 14:40     ` Daniel Vetter
2021-08-09 18:45       ` Matthew Brost
2021-08-09 18:44     ` Matthew Brost
2021-08-10  8:45       ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 16/46] drm/i915/guc: Implement GuC parent-child context pin / unpin functions Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:17   ` Daniel Vetter
2021-08-09 18:58     ` Matthew Brost
2021-08-10  8:53       ` Daniel Vetter
2021-08-10  9:07         ` Daniel Vetter
2021-08-11 18:06           ` Matthew Brost
2021-08-12 14:45             ` Daniel Vetter
2021-08-12 14:52               ` Daniel Vetter
2021-08-11 18:23         ` Matthew Brost
2021-08-03 22:29 ` [PATCH 17/46] drm/i915/guc: Add multi-lrc context registration Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 18/46] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 19/46] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:31   ` Daniel Vetter
2021-08-09 15:31     ` [Intel-gfx] " Daniel Vetter
2021-08-09 19:03     ` Matthew Brost
2021-08-09 19:03       ` [Intel-gfx] " Matthew Brost
2021-08-10  9:12       ` Daniel Vetter
2021-08-10  9:12         ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 20/46] drm/i915/guc: Add hang check to GuC submit engine Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:35   ` Daniel Vetter
2021-08-09 15:35     ` [Intel-gfx] " Daniel Vetter
2021-08-09 19:05     ` Matthew Brost
2021-08-09 19:05       ` [Intel-gfx] " Matthew Brost
2021-08-10  9:18       ` Daniel Vetter
2021-08-10  9:18         ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 21/46] drm/i915/guc: Add guc_child_context_destroy Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:36   ` Daniel Vetter
2021-08-09 19:06     ` Matthew Brost
2021-08-03 22:29 ` [PATCH 22/46] drm/i915/guc: Implement multi-lrc submission Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 23/46] drm/i915/guc: Insert submit fences between requests in parent-child relationship Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:32   ` Daniel Vetter
2021-08-09 16:39     ` Matthew Brost
2021-08-09 17:03       ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 24/46] drm/i915/guc: Implement multi-lrc reset Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 25/46] drm/i915/guc: Update debugfs for GuC multi-lrc Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:36   ` Daniel Vetter
2021-08-09 16:36     ` [Intel-gfx] " Daniel Vetter
2021-08-09 19:13     ` Matthew Brost
2021-08-09 19:13       ` [Intel-gfx] " Matthew Brost
2021-08-10  9:23       ` Daniel Vetter
2021-08-10  9:23         ` [Intel-gfx] " Daniel Vetter
2021-08-10  9:27         ` Daniel Vetter
2021-08-10  9:27           ` [Intel-gfx] " Daniel Vetter
2021-08-10 17:29           ` Matthew Brost
2021-08-10 17:29             ` [Intel-gfx] " Matthew Brost
2021-08-11 10:04             ` Daniel Vetter
2021-08-11 10:04               ` [Intel-gfx] " Daniel Vetter
2021-08-11 17:35               ` Matthew Brost
2021-08-11 17:35                 ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 26/46] drm/i915: Connect UAPI to GuC multi-lrc interface Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:37   ` Daniel Vetter
2021-08-09 16:37     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 27/46] drm/i915/doc: Update parallel submit doc to point to i915_drm.h Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 28/46] drm/i915/guc: Add basic GuC multi-lrc selftest Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 29/46] drm/i915/guc: Extend GuC flow control selftest for multi-lrc Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 30/46] drm/i915/guc: Implement no mid batch preemption " Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 31/46] drm/i915: Move secure execbuf check to execbuf2 Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 32/46] drm/i915: Move input/exec fence handling to i915_gem_execbuffer2 Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 33/46] drm/i915: Move output " Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 34/46] drm/i915: Return output fence from i915_gem_do_execbuffer Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 35/46] drm/i915: Store batch index in struct i915_execbuffer Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 36/46] drm/i915: Allow callers of i915_gem_do_execbuffer to override the batch index Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 37/46] drm/i915: Teach execbuf there can be more than one batch in the objects list Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 38/46] drm/i915: Only track object dependencies on first request Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 39/46] drm/i915: Force parallel contexts to use copy engine for reloc Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:39   ` Daniel Vetter
2021-08-09 16:39     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 40/46] drm/i915: Multi-batch execbuffer2 Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 17:02   ` Daniel Vetter [this message]
2021-08-09 17:02     ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 41/46] drm/i915: Eliminate unnecessary VMA calls for multi-BB submission Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 17:07   ` Daniel Vetter
2021-08-09 17:12     ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 42/46] drm/i915: Hold all parallel requests until last request, properly handle error Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 43/46] drm/i915/guc: Handle errors in multi-lrc requests Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 44/46] drm/i915: Enable multi-bb execbuf Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 45/46] drm/i915/execlists: Weak parallel submission support for execlists Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 46/46] drm/i915/guc: Add delay before disabling scheduling on contexts Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 17:17   ` Daniel Vetter
2021-08-09 19:32     ` Matthew Brost
2021-08-11  9:55       ` Daniel Vetter
2021-08-11 17:43         ` Matthew Brost
2021-08-12 14:04           ` Daniel Vetter
2021-08-12 19:26   ` Daniel Vetter
2021-08-03 22:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Parallel submission aka multi-bb execbuf (rev2) Patchwork
2021-08-03 22:53 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-08-03 22:57 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2021-08-03 23:19 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-08-05  3:53 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
