From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 10/20] drm/i915: Export a preallocate variant of i915_active_acquire()
Date: Mon, 13 Jul 2020 15:29:17 +0100
Message-ID: <f2797001-2cca-e82c-42e2-015a9b874ec2@linux.intel.com>
In-Reply-To: <20200706061926.6687-11-chris@chris-wilson.co.uk>


On 06/07/2020 07:19, Chris Wilson wrote:
> Sometimes we have to be very careful not to allocate underneath a mutex
> (or spinlock) and yet still want to track activity. Enter
> i915_active_acquire_for_context(). This raises the activity counter on
> i915_active prior to use and ensures that the fence-tree contains a slot
> for the context.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   4 +-
>   drivers/gpu/drm/i915/i915_active.c            | 113 +++++++++++++++---
>   drivers/gpu/drm/i915/i915_active.h            |  14 ++-
>   4 files changed, 113 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 1015c4fd9f3e..6d20be29ff3c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1789,7 +1789,7 @@ __parser_mark_active(struct i915_vma *vma,
>   {
>   	struct intel_gt_buffer_pool_node *node = vma->private;
>   
> -	return i915_active_ref(&node->active, tl, fence);
> +	return i915_active_ref(&node->active, tl->fence_context, fence);
>   }
>   
>   static int
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index 4546284fede1..e4a5326633b8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -479,7 +479,9 @@ __intel_timeline_get_seqno(struct intel_timeline *tl,
>   	 * free it after the current request is retired, which ensures that
>   	 * all writes into the cacheline from previous requests are complete.
>   	 */
> -	err = i915_active_ref(&tl->hwsp_cacheline->active, tl, &rq->fence);
> +	err = i915_active_ref(&tl->hwsp_cacheline->active,
> +			      tl->fence_context,
> +			      &rq->fence);
>   	if (err)
>   		goto err_cacheline;
>   
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index d960d0be5bd2..3f595446fd44 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -217,11 +217,10 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb *cb)
>   }
>   
>   static struct i915_active_fence *
> -active_instance(struct i915_active *ref, struct intel_timeline *tl)
> +active_instance(struct i915_active *ref, u64 idx)
>   {
>   	struct active_node *node, *prealloc;
>   	struct rb_node **p, *parent;
> -	u64 idx = tl->fence_context;
>   
>   	/*
>   	 * We track the most recently used timeline to skip a rbtree search
> @@ -353,21 +352,17 @@ __active_del_barrier(struct i915_active *ref, struct active_node *node)
>   	return ____active_del_barrier(ref, node, barrier_to_engine(node));
>   }
>   
> -int i915_active_ref(struct i915_active *ref,
> -		    struct intel_timeline *tl,
> -		    struct dma_fence *fence)
> +int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
>   {
>   	struct i915_active_fence *active;
>   	int err;
>   
> -	lockdep_assert_held(&tl->mutex);
> -
>   	/* Prevent reaping in case we malloc/wait while building the tree */
>   	err = i915_active_acquire(ref);
>   	if (err)
>   		return err;
>   
> -	active = active_instance(ref, tl);
> +	active = active_instance(ref, idx);
>   	if (!active) {
>   		err = -ENOMEM;
>   		goto out;
> @@ -384,32 +379,104 @@ int i915_active_ref(struct i915_active *ref,
>   		atomic_dec(&ref->count);
>   	}
>   	if (!__i915_active_fence_set(active, fence))
> -		atomic_inc(&ref->count);
> +		__i915_active_acquire(ref);
>   
>   out:
>   	i915_active_release(ref);
>   	return err;
>   }
>   
> -struct dma_fence *
> -i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
> +static struct dma_fence *
> +__i915_active_set_fence(struct i915_active *ref,
> +			struct i915_active_fence *active,
> +			struct dma_fence *fence)
>   {
>   	struct dma_fence *prev;
>   
>   	/* We expect the caller to manage the exclusive timeline ordering */
>   	GEM_BUG_ON(i915_active_is_idle(ref));
>   
> +	if (is_barrier(active)) { /* proto-node used by our idle barrier */
> +		/*
> +		 * This request is on the kernel_context timeline, and so
> +		 * we can use it to substitute for the pending idle-barrier
> +		 * request that we want to emit on the kernel_context.
> +		 */
> +		__active_del_barrier(ref, node_from_active(active));
> +		RCU_INIT_POINTER(active->fence, NULL);

The condition, comment and operation up to here are duplicated from 
i915_active_ref (the part not shown in the diff).

A more important question is the relationship between i915_active_ref 
and __i915_active_ref. The latter actually calls this new 
__i915_active_set_fence, which is kind of surprising considering the 
usual patterns.

Is there any duplication in the APIs, i.e. an opportunity to consolidate? 
For instance, should i915_active_ref be broken up into 
preallocate/lookup/install helpers too?
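
To make the question concrete, here is a rough userspace sketch of what 
a preallocate/lookup/install split could look like. It is only a toy 
model: every toy_* name is hypothetical, a linked list stands in for the 
rbtree, and locking, RCU, idle barriers and the acquire/release 
refcounting are all left out. It is not a proposal for the actual 
i915_active code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are the real i915 types. */
struct toy_fence { int seqno; };

struct toy_node {
	uint64_t idx;            /* fence context this slot tracks */
	struct toy_fence *fence; /* last fence installed, or NULL */
	struct toy_node *next;
};

struct toy_active {
	struct toy_node *slots;  /* a list here; the driver uses an rbtree */
	int count;               /* number of slots holding a fence */
};

/* lookup: never allocates, so usable where allocation is forbidden */
static struct toy_node *toy_lookup(struct toy_active *ref, uint64_t idx)
{
	struct toy_node *n;

	for (n = ref->slots; n; n = n->next)
		if (n->idx == idx)
			return n;

	return NULL;
}

/* preallocate: may allocate, meant to run before any lock is taken */
static int toy_preallocate(struct toy_active *ref, uint64_t idx)
{
	struct toy_node *n = toy_lookup(ref, idx);

	if (n)
		return 0;

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;

	n->idx = idx;
	n->next = ref->slots;
	ref->slots = n;
	return 0;
}

/* install: swap the fence into an existing slot, return the old one */
static struct toy_fence *toy_install(struct toy_active *ref,
				     struct toy_node *n,
				     struct toy_fence *fence)
{
	struct toy_fence *prev = n->fence;

	n->fence = fence;
	if (!prev)
		ref->count++;

	return prev;
}

/* allocating front end: preallocate + lookup + install */
static int toy_ref(struct toy_active *ref, uint64_t idx,
		   struct toy_fence *fence)
{
	if (toy_preallocate(ref, idx))
		return -1;

	toy_install(ref, toy_lookup(ref, idx), fence);
	return 0;
}

/* non-allocating front end: lookup + install, fails without a slot */
static int toy_ref_prealloc(struct toy_active *ref, uint64_t idx,
			    struct toy_fence *fence,
			    struct toy_fence **prev)
{
	struct toy_node *n = toy_lookup(ref, idx);

	if (!n)
		return -1;

	*prev = toy_install(ref, n, fence);
	return 0;
}
```

In this shape both front ends share one install step, so anything 
special that has to happen at install time would live in one place 
rather than two. Whether the real code can be arranged this way is a 
separate question that the sketch does not answer.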

Another confusing end result is that we'd end up with both 
__i915_active_fence_set(active-fence, dma-fence) and
__i915_active_set_fence(active, active-fence, dma-fence).

I am not offering any solutions, just raising oddities at this point. :)

Regards,

Tvrtko

> +		atomic_dec(&ref->count);
> +	}
> +
>   	rcu_read_lock();
> -	prev = __i915_active_fence_set(&ref->excl, f);
> +	prev = __i915_active_fence_set(active, fence);
>   	if (prev)
>   		prev = dma_fence_get_rcu(prev);
>   	else
> -		atomic_inc(&ref->count);
> +		__i915_active_acquire(ref);
>   	rcu_read_unlock();
>   
>   	return prev;
>   }
>   
> +static struct i915_active_fence *
> +__active_lookup(struct i915_active *ref, u64 idx)
> +{
> +	struct active_node *node;
> +	struct rb_node *p;
> +
> +	/* Like active_instance() but with no malloc */
> +
> +	node = READ_ONCE(ref->cache);
> +	if (node && node->timeline == idx)
> +		return &node->base;
> +
> +	spin_lock_irq(&ref->tree_lock);
> +	GEM_BUG_ON(i915_active_is_idle(ref));
> +
> +	p = ref->tree.rb_node;
> +	while (p) {
> +		node = rb_entry(p, struct active_node, node);
> +		if (node->timeline == idx) {
> +			ref->cache = node;
> +			spin_unlock_irq(&ref->tree_lock);
> +			return &node->base;
> +		}
> +
> +		if (node->timeline < idx)
> +			p = p->rb_right;
> +		else
> +			p = p->rb_left;
> +	}
> +
> +	spin_unlock_irq(&ref->tree_lock);
> +
> +	return NULL;
> +}
> +
> +struct dma_fence *
> +__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
> +{
> +	struct dma_fence *prev = ERR_PTR(-ENOENT);
> +	struct i915_active_fence *active;
> +
> +	if (!i915_active_acquire_if_busy(ref))
> +		return ERR_PTR(-EINVAL);
> +
> +	active = __active_lookup(ref, idx);
> +	if (active)
> +		prev = __i915_active_set_fence(ref, active, fence);
> +
> +	i915_active_release(ref);
> +	return prev;
> +}
> +
> +struct dma_fence *
> +i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
> +{
> +	/* We expect the caller to manage the exclusive timeline ordering */
> +	return __i915_active_set_fence(ref, &ref->excl, f);
> +}
> +
>   bool i915_active_acquire_if_busy(struct i915_active *ref)
>   {
>   	debug_active_assert(ref);
> @@ -443,6 +510,24 @@ int i915_active_acquire(struct i915_active *ref)
>   	return err;
>   }
>   
> +int i915_active_acquire_for_context(struct i915_active *ref, u64 idx)
> +{
> +	struct i915_active_fence *active;
> +	int err;
> +
> +	err = i915_active_acquire(ref);
> +	if (err)
> +		return err;
> +
> +	active = active_instance(ref, idx);
> +	if (!active) {
> +		i915_active_release(ref);
> +		return -ENOMEM;
> +	}
> +
> +	return 0; /* return with active ref */
> +}
> +
>   void i915_active_release(struct i915_active *ref)
>   {
>   	debug_active_assert(ref);
> @@ -804,7 +889,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
>   			 */
>   			RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN));
>   			node->base.cb.node.prev = (void *)engine;
> -			atomic_inc(&ref->count);
> +			__i915_active_acquire(ref);
>   		}
>   		GEM_BUG_ON(rcu_access_pointer(node->base.fence) != ERR_PTR(-EAGAIN));
>   
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index cf4058150966..2e0bcb3289ec 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -163,14 +163,18 @@ void __i915_active_init(struct i915_active *ref,
>   	__i915_active_init(ref, active, retire, &__mkey, &__wkey);	\
>   } while (0)
>   
> -int i915_active_ref(struct i915_active *ref,
> -		    struct intel_timeline *tl,
> -		    struct dma_fence *fence);
> +struct dma_fence *
> +__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
> +int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
>   
>   static inline int
>   i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
>   {
> -	return i915_active_ref(ref, i915_request_timeline(rq), &rq->fence);
> +	struct intel_timeline *tl = i915_request_timeline(rq);
> +
> +	lockdep_assert_held(&tl->mutex);
> +
> +	return i915_active_ref(ref, tl->fence_context, &rq->fence);
>   }
>   
>   struct dma_fence *
> @@ -198,7 +202,9 @@ int i915_request_await_active(struct i915_request *rq,
>   #define I915_ACTIVE_AWAIT_BARRIER BIT(2)
>   
>   int i915_active_acquire(struct i915_active *ref);
> +int i915_active_acquire_for_context(struct i915_active *ref, u64 idx);
>   bool i915_active_acquire_if_busy(struct i915_active *ref);
> +
>   void i915_active_release(struct i915_active *ref);
>   
>   static inline void __i915_active_acquire(struct i915_active *ref)
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
