From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
Subject: Re: [Intel-gfx] [PATCH v6 01/64] drm/i915: Do not share hwsp across contexts any more, v6
Date: Thu, 14 Jan 2021 12:06:09 +0000
Message-ID: <4639c3c8-f5a9-def1-7c73-9afade95e7e5@linux.intel.com>
In-Reply-To: <20210105153558.134272-2-maarten.lankhorst@linux.intel.com>


On 05/01/2021 15:34, Maarten Lankhorst wrote:
> Instead of sharing pages with breadcrumbs, give each timeline a
> single page. This allows unrelated timelines not to share locks
> any more during command submission.
> 
> As an additional benefit, seqno wraparound no longer requires
> i915_vma_pin, which means we no longer need to worry about a
> potential -EDEADLK at a point where we are ready to submit.
> 
> Changes since v1:
> - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
> - Extra check for completion in intel_read_hwsp().
> Changes since v2:
> - Fix inconsistent indent in hwsp_alloc() (kbuild)
> - memset entire cacheline to 0.
> Changes since v3:
> - Do same in intel_timeline_reset_seqno(), and clflush for good measure.
> Changes since v4:
> - Use refcounting on timeline, instead of relying on i915_active.
> - Fix waiting on kernel requests.
> Changes since v5:
> - Bump amount of slots to maximum (256), for best wraparounds.
> - Add hwsp_offset to i915_request to fix potential wraparound hang.
> - Ensure timeline wrap test works with the changes.
> - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
>    fix a hang.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1
> Reported-by: kernel test robot <lkp@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/gen2_engine_cs.c      |   2 +-
>   drivers/gpu/drm/i915/gt/gen6_engine_cs.c      |   8 +-
>   drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |  12 +-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   1 +
>   drivers/gpu/drm/i915/gt/intel_gt_types.h      |   4 -
>   drivers/gpu/drm/i915/gt/intel_timeline.c      | 422 ++++--------------
>   .../gpu/drm/i915/gt/intel_timeline_types.h    |  17 +-
>   drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |   5 +-
>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |  83 ++--
>   drivers/gpu/drm/i915/i915_request.c           |   4 -
>   drivers/gpu/drm/i915/i915_request.h           |  17 +-
>   11 files changed, 160 insertions(+), 415 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
> index b491a64919c8..9646200d2792 100644
> --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
> @@ -143,7 +143,7 @@ static u32 *__gen2_emit_breadcrumb(struct i915_request *rq, u32 *cs,
>   				   int flush, int post)
>   {
>   	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +	GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>   
>   	*cs++ = MI_FLUSH;
>   
> diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
> index ce38d1bcaba3..dd094e21bb51 100644
> --- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
> @@ -161,7 +161,7 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>   		 PIPE_CONTROL_DC_FLUSH_ENABLE |
>   		 PIPE_CONTROL_QW_WRITE |
>   		 PIPE_CONTROL_CS_STALL);
> -	*cs++ = i915_request_active_timeline(rq)->hwsp_offset |
> +	*cs++ = i915_request_active_offset(rq) |

"Active offset" is maybe a bit non-descriptive. How about 
i915_request_hwsp_seqno()?
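
I.e. the same body as the i915_request_active_offset() added at the
end of the patch, just renamed (a sketch of the suggestion only, not
tested):

static inline u32
i915_request_hwsp_seqno(const struct i915_request *rq)
{
	return page_mask_bits(i915_request_active_timeline(rq)->hwsp_offset) +
	       offset_in_page(rq->hwsp_seqno);
}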

>   		PIPE_CONTROL_GLOBAL_GTT;
>   	*cs++ = rq->fence.seqno;
>   
> @@ -359,7 +359,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>   		 PIPE_CONTROL_QW_WRITE |
>   		 PIPE_CONTROL_GLOBAL_GTT_IVB |
>   		 PIPE_CONTROL_CS_STALL);
> -	*cs++ = i915_request_active_timeline(rq)->hwsp_offset;
> +	*cs++ = i915_request_active_offset(rq);
>   	*cs++ = rq->fence.seqno;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> @@ -374,7 +374,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
>   u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
>   {
>   	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +	GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>   
>   	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>   	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> @@ -394,7 +394,7 @@ u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
>   	int i;
>   
>   	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +	GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR);
>   
>   	*cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
>   		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index 1972dd5dca00..33eaa0203b1d 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -338,15 +338,13 @@ static inline u32 preempt_address(struct intel_engine_cs *engine)
>   
>   static u32 hwsp_offset(const struct i915_request *rq)
>   {
> -	const struct intel_timeline_cacheline *cl;
> +	const struct intel_timeline *tl;
>   
> -	/* Before the request is executed, the timeline/cachline is fixed */
> +	/* Before the request is executed, the timeline is fixed */
> +	tl = rcu_dereference_protected(rq->timeline,
> +				       !i915_request_signaled(rq));
>   
> -	cl = rcu_dereference_protected(rq->hwsp_cacheline, 1);
> -	if (cl)
> -		return cl->ggtt_offset;
> -
> -	return rcu_dereference_protected(rq->timeline, 1)->hwsp_offset;
> +	return page_mask_bits(tl->hwsp_offset) + offset_in_page(rq->hwsp_seqno);

I don't get this.

tl->hwsp_offset is an offset, not a vaddr. What does page_mask_bits() 
on it achieve?

Then rq->hwsp_seqno is a vaddr assigned as:

   rq->hwsp_seqno = tl->hwsp_seqno;

And elsewhere in the patch:

+	timeline->hwsp_map = vaddr;
+	seqno = vaddr + timeline->hwsp_offset;
+	WRITE_ONCE(*seqno, 0);
+	timeline->hwsp_seqno = seqno;

So page_mask_bits(tl->hwsp_offset) + offset_in_page(rq->hwsp_seqno) 
seems to me to reduce to 0 + tl->hwsp_offset, or tl->hwsp_offset.
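
Spelled out, my reading (a sketch only, assuming hwsp_map is page 
aligned and tl->hwsp_offset is still the plain in-page offset set at 
init time):

	/* at init (from the patch above): */
	timeline->hwsp_map = vaddr;            /* page aligned */
	timeline->hwsp_seqno = vaddr + timeline->hwsp_offset;

	/* hence: */
	page_mask_bits(timeline->hwsp_offset); /* == 0 */
	offset_in_page(rq->hwsp_seqno);        /* == timeline->hwsp_offset */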

Maybe I am blind...

>   }
>   
>   int gen8_emit_init_breadcrumb(struct i915_request *rq)
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 1847d3c2ea99..1db608d1f26f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -754,6 +754,7 @@ static int measure_breadcrumb_dw(struct intel_context *ce)
>   	frame->rq.engine = engine;
>   	frame->rq.context = ce;
>   	rcu_assign_pointer(frame->rq.timeline, ce->timeline);
> +	frame->rq.hwsp_seqno = ce->timeline->hwsp_seqno;
>   
>   	frame->ring.vaddr = frame->cs;
>   	frame->ring.size = sizeof(frame->cs);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index a83d3e18254d..f7dab4fc926c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -39,10 +39,6 @@ struct intel_gt {
>   	struct intel_gt_timelines {
>   		spinlock_t lock; /* protects active_list */
>   		struct list_head active_list;
> -
> -		/* Pack multiple timelines' seqnos into the same page */
> -		spinlock_t hwsp_lock;
> -		struct list_head hwsp_free_list;
>   	} timelines;
>   
>   	struct intel_gt_requests {
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index 7fe05918a76e..3d43f15f34f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -12,21 +12,9 @@
>   #include "intel_ring.h"
>   #include "intel_timeline.h"
>   
> -#define ptr_set_bit(ptr, bit) ((typeof(ptr))((unsigned long)(ptr) | BIT(bit)))
> -#define ptr_test_bit(ptr, bit) ((unsigned long)(ptr) & BIT(bit))
> +#define TIMELINE_SEQNO_BYTES 8
>   
> -#define CACHELINE_BITS 6
> -#define CACHELINE_FREE CACHELINE_BITS
> -
> -struct intel_timeline_hwsp {
> -	struct intel_gt *gt;
> -	struct intel_gt_timelines *gt_timelines;
> -	struct list_head free_link;
> -	struct i915_vma *vma;
> -	u64 free_bitmap;
> -};
> -
> -static struct i915_vma *__hwsp_alloc(struct intel_gt *gt)
> +static struct i915_vma *hwsp_alloc(struct intel_gt *gt)
>   {
>   	struct drm_i915_private *i915 = gt->i915;
>   	struct drm_i915_gem_object *obj;
> @@ -45,220 +33,59 @@ static struct i915_vma *__hwsp_alloc(struct intel_gt *gt)
>   	return vma;
>   }
>   
> -static struct i915_vma *
> -hwsp_alloc(struct intel_timeline *timeline, unsigned int *cacheline)
> -{
> -	struct intel_gt_timelines *gt = &timeline->gt->timelines;
> -	struct intel_timeline_hwsp *hwsp;
> -
> -	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);
> -
> -	spin_lock_irq(&gt->hwsp_lock);
> -
> -	/* hwsp_free_list only contains HWSP that have available cachelines */
> -	hwsp = list_first_entry_or_null(&gt->hwsp_free_list,
> -					typeof(*hwsp), free_link);
> -	if (!hwsp) {
> -		struct i915_vma *vma;
> -
> -		spin_unlock_irq(&gt->hwsp_lock);
> -
> -		hwsp = kmalloc(sizeof(*hwsp), GFP_KERNEL);
> -		if (!hwsp)
> -			return ERR_PTR(-ENOMEM);
> -
> -		vma = __hwsp_alloc(timeline->gt);
> -		if (IS_ERR(vma)) {
> -			kfree(hwsp);
> -			return vma;
> -		}
> -
> -		GT_TRACE(timeline->gt, "new HWSP allocated\n");
> -
> -		vma->private = hwsp;
> -		hwsp->gt = timeline->gt;
> -		hwsp->vma = vma;
> -		hwsp->free_bitmap = ~0ull;
> -		hwsp->gt_timelines = gt;
> -
> -		spin_lock_irq(&gt->hwsp_lock);
> -		list_add(&hwsp->free_link, &gt->hwsp_free_list);
> -	}
> -
> -	GEM_BUG_ON(!hwsp->free_bitmap);
> -	*cacheline = __ffs64(hwsp->free_bitmap);
> -	hwsp->free_bitmap &= ~BIT_ULL(*cacheline);
> -	if (!hwsp->free_bitmap)
> -		list_del(&hwsp->free_link);
> -
> -	spin_unlock_irq(&gt->hwsp_lock);
> -
> -	GEM_BUG_ON(hwsp->vma->private != hwsp);
> -	return hwsp->vma;
> -}
> -
> -static void __idle_hwsp_free(struct intel_timeline_hwsp *hwsp, int cacheline)
> -{
> -	struct intel_gt_timelines *gt = hwsp->gt_timelines;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&gt->hwsp_lock, flags);
> -
> -	/* As a cacheline becomes available, publish the HWSP on the freelist */
> -	if (!hwsp->free_bitmap)
> -		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
> -
> -	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
> -	hwsp->free_bitmap |= BIT_ULL(cacheline);
> -
> -	/* And if no one is left using it, give the page back to the system */
> -	if (hwsp->free_bitmap == ~0ull) {
> -		i915_vma_put(hwsp->vma);
> -		list_del(&hwsp->free_link);
> -		kfree(hwsp);
> -	}
> -
> -	spin_unlock_irqrestore(&gt->hwsp_lock, flags);
> -}
> -
> -static void __rcu_cacheline_free(struct rcu_head *rcu)
> -{
> -	struct intel_timeline_cacheline *cl =
> -		container_of(rcu, typeof(*cl), rcu);
> -
> -	/* Must wait until after all *rq->hwsp are complete before removing */
> -	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
> -	__idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
> -
> -	i915_active_fini(&cl->active);
> -	kfree(cl);
> -}
> -
> -static void __idle_cacheline_free(struct intel_timeline_cacheline *cl)
> -{
> -	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
> -	call_rcu(&cl->rcu, __rcu_cacheline_free);
> -}
> -
>   __i915_active_call
> -static void __cacheline_retire(struct i915_active *active)
> +static void __timeline_retire(struct i915_active *active)
>   {
> -	struct intel_timeline_cacheline *cl =
> -		container_of(active, typeof(*cl), active);
> +	struct intel_timeline *tl =
> +		container_of(active, typeof(*tl), active);
>   
> -	i915_vma_unpin(cl->hwsp->vma);
> -	if (ptr_test_bit(cl->vaddr, CACHELINE_FREE))
> -		__idle_cacheline_free(cl);
> +	i915_vma_unpin(tl->hwsp_ggtt);
> +	intel_timeline_put(tl);
>   }
>   
> -static int __cacheline_active(struct i915_active *active)
> +static int __timeline_active(struct i915_active *active)
>   {
> -	struct intel_timeline_cacheline *cl =
> -		container_of(active, typeof(*cl), active);
> +	struct intel_timeline *tl =
> +		container_of(active, typeof(*tl), active);
>   
> -	__i915_vma_pin(cl->hwsp->vma);
> +	__i915_vma_pin(tl->hwsp_ggtt);
> +	intel_timeline_get(tl);
>   	return 0;
>   }
>   
> -static struct intel_timeline_cacheline *
> -cacheline_alloc(struct intel_timeline_hwsp *hwsp, unsigned int cacheline)
> -{
> -	struct intel_timeline_cacheline *cl;
> -	void *vaddr;
> -
> -	GEM_BUG_ON(cacheline >= BIT(CACHELINE_BITS));
> -
> -	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
> -	if (!cl)
> -		return ERR_PTR(-ENOMEM);
> -
> -	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
> -	if (IS_ERR(vaddr)) {
> -		kfree(cl);
> -		return ERR_CAST(vaddr);
> -	}
> -
> -	cl->hwsp = hwsp;
> -	cl->vaddr = page_pack_bits(vaddr, cacheline);
> -
> -	i915_active_init(&cl->active, __cacheline_active, __cacheline_retire);
> -
> -	return cl;
> -}
> -
> -static void cacheline_acquire(struct intel_timeline_cacheline *cl,
> -			      u32 ggtt_offset)
> -{
> -	if (!cl)
> -		return;
> -
> -	cl->ggtt_offset = ggtt_offset;
> -	i915_active_acquire(&cl->active);
> -}
> -
> -static void cacheline_release(struct intel_timeline_cacheline *cl)
> -{
> -	if (cl)
> -		i915_active_release(&cl->active);
> -}
> -
> -static void cacheline_free(struct intel_timeline_cacheline *cl)
> -{
> -	if (!i915_active_acquire_if_busy(&cl->active)) {
> -		__idle_cacheline_free(cl);
> -		return;
> -	}
> -
> -	GEM_BUG_ON(ptr_test_bit(cl->vaddr, CACHELINE_FREE));
> -	cl->vaddr = ptr_set_bit(cl->vaddr, CACHELINE_FREE);
> -
> -	i915_active_release(&cl->active);
> -}
> -
>   static int intel_timeline_init(struct intel_timeline *timeline,
>   			       struct intel_gt *gt,
>   			       struct i915_vma *hwsp,
>   			       unsigned int offset)
>   {
>   	void *vaddr;
> +	u32 *seqno;
>   
>   	kref_init(&timeline->kref);
>   	atomic_set(&timeline->pin_count, 0);
>   
>   	timeline->gt = gt;
>   
> -	timeline->has_initial_breadcrumb = !hwsp;
> -	timeline->hwsp_cacheline = NULL;
> -
> -	if (!hwsp) {
> -		struct intel_timeline_cacheline *cl;
> -		unsigned int cacheline;
> -
> -		hwsp = hwsp_alloc(timeline, &cacheline);
> +	if (hwsp) {
> +		timeline->hwsp_offset = offset;
> +		timeline->hwsp_ggtt = i915_vma_get(hwsp);
> +	} else {
> +		timeline->has_initial_breadcrumb = true;
> +		hwsp = hwsp_alloc(gt);
>   		if (IS_ERR(hwsp))
>   			return PTR_ERR(hwsp);
> -
> -		cl = cacheline_alloc(hwsp->private, cacheline);
> -		if (IS_ERR(cl)) {
> -			__idle_hwsp_free(hwsp->private, cacheline);
> -			return PTR_ERR(cl);
> -		}
> -
> -		timeline->hwsp_cacheline = cl;
> -		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
> -
> -		vaddr = page_mask_bits(cl->vaddr);
> -	} else {
> -		timeline->hwsp_offset = offset;
> -		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> -		if (IS_ERR(vaddr))
> -			return PTR_ERR(vaddr);
> +		timeline->hwsp_ggtt = hwsp;
>   	}
>   
> -	timeline->hwsp_seqno =
> -		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
> +	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> +	if (IS_ERR(vaddr))
> +		return PTR_ERR(vaddr);
> +
> +	timeline->hwsp_map = vaddr;
> +	seqno = vaddr + timeline->hwsp_offset;
> +	WRITE_ONCE(*seqno, 0);
> +	timeline->hwsp_seqno = seqno;
>   
> -	timeline->hwsp_ggtt = i915_vma_get(hwsp);
>   	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
>   
>   	timeline->fence_context = dma_fence_context_alloc(1);
> @@ -269,6 +96,7 @@ static int intel_timeline_init(struct intel_timeline *timeline,
>   	INIT_LIST_HEAD(&timeline->requests);
>   
>   	i915_syncmap_init(&timeline->sync);
> +	i915_active_init(&timeline->active, __timeline_active, __timeline_retire);
>   
>   	return 0;
>   }
> @@ -279,23 +107,18 @@ void intel_gt_init_timelines(struct intel_gt *gt)
>   
>   	spin_lock_init(&timelines->lock);
>   	INIT_LIST_HEAD(&timelines->active_list);
> -
> -	spin_lock_init(&timelines->hwsp_lock);
> -	INIT_LIST_HEAD(&timelines->hwsp_free_list);
>   }
>   
> -static void intel_timeline_fini(struct intel_timeline *timeline)
> +static void intel_timeline_fini(struct rcu_head *rcu)
>   {
> -	GEM_BUG_ON(atomic_read(&timeline->pin_count));
> -	GEM_BUG_ON(!list_empty(&timeline->requests));
> -	GEM_BUG_ON(timeline->retire);
> +	struct intel_timeline *timeline =
> +		container_of(rcu, struct intel_timeline, rcu);
>   
> -	if (timeline->hwsp_cacheline)
> -		cacheline_free(timeline->hwsp_cacheline);
> -	else
> -		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> +	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
>   
>   	i915_vma_put(timeline->hwsp_ggtt);
> +	i915_active_fini(&timeline->active);
> +	kfree(timeline);
>   }
>   
>   struct intel_timeline *
> @@ -361,9 +184,9 @@ int intel_timeline_pin(struct intel_timeline *tl, struct i915_gem_ww_ctx *ww)
>   	GT_TRACE(tl->gt, "timeline:%llx using HWSP offset:%x\n",
>   		 tl->fence_context, tl->hwsp_offset);
>   
> -	cacheline_acquire(tl->hwsp_cacheline, tl->hwsp_offset);
> +	i915_active_acquire(&tl->active);
>   	if (atomic_fetch_inc(&tl->pin_count)) {
> -		cacheline_release(tl->hwsp_cacheline);
> +		i915_active_release(&tl->active);
>   		__i915_vma_unpin(tl->hwsp_ggtt);
>   	}
>   
> @@ -372,9 +195,13 @@ int intel_timeline_pin(struct intel_timeline *tl, struct i915_gem_ww_ctx *ww)
>   
>   void intel_timeline_reset_seqno(const struct intel_timeline *tl)
>   {
> +	u32 *hwsp_seqno = (u32 *)tl->hwsp_seqno;
>   	/* Must be pinned to be writable, and no requests in flight. */
>   	GEM_BUG_ON(!atomic_read(&tl->pin_count));
> -	WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno);
> +
> +	memset(hwsp_seqno + 1, 0, TIMELINE_SEQNO_BYTES - sizeof(*hwsp_seqno));
> +	WRITE_ONCE(*hwsp_seqno, tl->seqno);
> +	clflush(hwsp_seqno);
>   }
>   
>   void intel_timeline_enter(struct intel_timeline *tl)
> @@ -450,106 +277,23 @@ static u32 timeline_advance(struct intel_timeline *tl)
>   	return tl->seqno += 1 + tl->has_initial_breadcrumb;
>   }
>   
> -static void timeline_rollback(struct intel_timeline *tl)
> -{
> -	tl->seqno -= 1 + tl->has_initial_breadcrumb;
> -}
> -
>   static noinline int
>   __intel_timeline_get_seqno(struct intel_timeline *tl,
> -			   struct i915_request *rq,
>   			   u32 *seqno)
>   {
> -	struct intel_timeline_cacheline *cl;
> -	unsigned int cacheline;
> -	struct i915_vma *vma;
> -	void *vaddr;
> -	int err;
> -
> -	might_lock(&tl->gt->ggtt->vm.mutex);
> -	GT_TRACE(tl->gt, "timeline:%llx wrapped\n", tl->fence_context);
> -
> -	/*
> -	 * If there is an outstanding GPU reference to this cacheline,
> -	 * such as it being sampled by a HW semaphore on another timeline,
> -	 * we cannot wraparound our seqno value (the HW semaphore does
> -	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
> -	 * So if the cacheline is still busy, we must detach ourselves
> -	 * from it and leave it inflight alongside its users.
> -	 *
> -	 * However, if nobody is watching and we can guarantee that nobody
> -	 * will, we could simply reuse the same cacheline.
> -	 *
> -	 * if (i915_active_request_is_signaled(&tl->last_request) &&
> -	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
> -	 *	return 0;
> -	 *
> -	 * That seems unlikely for a busy timeline that needed to wrap in
> -	 * the first place, so just replace the cacheline.
> -	 */
> -
> -	vma = hwsp_alloc(tl, &cacheline);
> -	if (IS_ERR(vma)) {
> -		err = PTR_ERR(vma);
> -		goto err_rollback;
> -	}
> -
> -	err = i915_ggtt_pin(vma, NULL, 0, PIN_HIGH);
> -	if (err) {
> -		__idle_hwsp_free(vma->private, cacheline);
> -		goto err_rollback;
> -	}
> +	u32 next_ofs = offset_in_page(tl->hwsp_offset + TIMELINE_SEQNO_BYTES);
>   
> -	cl = cacheline_alloc(vma->private, cacheline);
> -	if (IS_ERR(cl)) {
> -		err = PTR_ERR(cl);
> -		__idle_hwsp_free(vma->private, cacheline);
> -		goto err_unpin;
> -	}
> -	GEM_BUG_ON(cl->hwsp->vma != vma);
> -
> -	/*
> -	 * Attach the old cacheline to the current request, so that we only
> -	 * free it after the current request is retired, which ensures that
> -	 * all writes into the cacheline from previous requests are complete.
> -	 */
> -	err = i915_active_ref(&tl->hwsp_cacheline->active,
> -			      tl->fence_context,
> -			      &rq->fence);
> -	if (err)
> -		goto err_cacheline;
> +	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
> +	if (TIMELINE_SEQNO_BYTES <= BIT(5) && (next_ofs & BIT(5)))
> +		next_ofs = offset_in_page(next_ofs + BIT(5));
>   
> -	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
> -	cacheline_free(tl->hwsp_cacheline);
> -
> -	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
> -	i915_vma_put(tl->hwsp_ggtt);
> -
> -	tl->hwsp_ggtt = i915_vma_get(vma);
> -
> -	vaddr = page_mask_bits(cl->vaddr);
> -	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
> -	tl->hwsp_seqno =
> -		memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
> -
> -	tl->hwsp_offset += i915_ggtt_offset(vma);
> -	GT_TRACE(tl->gt, "timeline:%llx using HWSP offset:%x\n",
> -		 tl->fence_context, tl->hwsp_offset);
> -
> -	cacheline_acquire(cl, tl->hwsp_offset);
> -	tl->hwsp_cacheline = cl;
> +	tl->hwsp_offset = i915_ggtt_offset(tl->hwsp_ggtt) + next_ofs;
> +	tl->hwsp_seqno = tl->hwsp_map + next_ofs;
> +	intel_timeline_reset_seqno(tl);
>   
>   	*seqno = timeline_advance(tl);
>   	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
>   	return 0;
> -
> -err_cacheline:
> -	cacheline_free(cl);
> -err_unpin:
> -	i915_vma_unpin(vma);
> -err_rollback:
> -	timeline_rollback(tl);
> -	return err;
>   }
>   
>   int intel_timeline_get_seqno(struct intel_timeline *tl,
> @@ -559,51 +303,52 @@ int intel_timeline_get_seqno(struct intel_timeline *tl,
>   	*seqno = timeline_advance(tl);
>   
>   	/* Replace the HWSP on wraparound for HW semaphores */
> -	if (unlikely(!*seqno && tl->hwsp_cacheline))
> -		return __intel_timeline_get_seqno(tl, rq, seqno);
> +	if (unlikely(!*seqno && tl->has_initial_breadcrumb))
> +		return __intel_timeline_get_seqno(tl, seqno);
>   
>   	return 0;
>   }
>   
> -static int cacheline_ref(struct intel_timeline_cacheline *cl,
> -			 struct i915_request *rq)
> -{
> -	return i915_active_add_request(&cl->active, rq);
> -}
> -
>   int intel_timeline_read_hwsp(struct i915_request *from,
>   			     struct i915_request *to,
>   			     u32 *hwsp)
>   {
> -	struct intel_timeline_cacheline *cl;
> +	struct intel_timeline *tl;
>   	int err;
>   
> -	GEM_BUG_ON(!rcu_access_pointer(from->hwsp_cacheline));
> -
>   	rcu_read_lock();
> -	cl = rcu_dereference(from->hwsp_cacheline);
> -	if (i915_request_completed(from)) /* confirm cacheline is valid */
> -		goto unlock;
> -	if (unlikely(!i915_active_acquire_if_busy(&cl->active)))
> -		goto unlock; /* seqno wrapped and completed! */
> -	if (unlikely(i915_request_completed(from)))
> -		goto release;
> +	tl = rcu_dereference(from->timeline);
> +	if (i915_request_completed(from) ||
> +	    !i915_active_acquire_if_busy(&tl->active))
> +		tl = NULL;
> +
> +	if (tl) {
> +		/* hwsp_offset may wraparound, so use from->hwsp_seqno */
> +		*hwsp = i915_ggtt_offset(tl->hwsp_ggtt) +
> +			offset_in_page(from->hwsp_seqno);
> +	}
> +
> +	/* ensure we wait on the right request, if not, we completed */
> +	if (tl && i915_request_completed(from)) {
> +		i915_active_release(&tl->active);
> +		tl = NULL;
> +	}
>   	rcu_read_unlock();
>   
> -	err = cacheline_ref(cl, to);
> -	if (err)
> +	if (!tl)
> +		return 1;
> +
> +	/* Can't do semaphore waits on kernel context */
> +	if (!tl->has_initial_breadcrumb) {
> +		err = -EINVAL;
>   		goto out;
> +	}
> +
> +	err = i915_active_add_request(&tl->active, to);
>   
> -	*hwsp = cl->ggtt_offset;
>   out:
> -	i915_active_release(&cl->active);
> +	i915_active_release(&tl->active);
>   	return err;
> -
> -release:
> -	i915_active_release(&cl->active);
> -unlock:
> -	rcu_read_unlock();
> -	return 1;
>   }
>   
>   void intel_timeline_unpin(struct intel_timeline *tl)
> @@ -612,8 +357,7 @@ void intel_timeline_unpin(struct intel_timeline *tl)
>   	if (!atomic_dec_and_test(&tl->pin_count))
>   		return;
>   
> -	cacheline_release(tl->hwsp_cacheline);
> -
> +	i915_active_release(&tl->active);
>   	__i915_vma_unpin(tl->hwsp_ggtt);
>   }
>   
> @@ -622,8 +366,11 @@ void __intel_timeline_free(struct kref *kref)
>   	struct intel_timeline *timeline =
>   		container_of(kref, typeof(*timeline), kref);
>   
> -	intel_timeline_fini(timeline);
> -	kfree_rcu(timeline, rcu);
> +	GEM_BUG_ON(atomic_read(&timeline->pin_count));
> +	GEM_BUG_ON(!list_empty(&timeline->requests));
> +	GEM_BUG_ON(timeline->retire);
> +
> +	call_rcu(&timeline->rcu, intel_timeline_fini);
>   }
>   
>   void intel_gt_fini_timelines(struct intel_gt *gt)
> @@ -631,7 +378,6 @@ void intel_gt_fini_timelines(struct intel_gt *gt)
>   	struct intel_gt_timelines *timelines = &gt->timelines;
>   
>   	GEM_BUG_ON(!list_empty(&timelines->active_list));
> -	GEM_BUG_ON(!list_empty(&timelines->hwsp_free_list));
>   }
>   
>   void intel_gt_show_timelines(struct intel_gt *gt,
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> index e360f50706bf..9c1d767a867f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> @@ -18,7 +18,6 @@
>   struct i915_vma;
>   struct i915_syncmap;
>   struct intel_gt;
> -struct intel_timeline_hwsp;
>   
>   struct intel_timeline {
>   	u64 fence_context;
> @@ -45,12 +44,11 @@ struct intel_timeline {
>   	atomic_t pin_count;
>   	atomic_t active_count;
>   
> +	void *hwsp_map;
>   	const u32 *hwsp_seqno;
>   	struct i915_vma *hwsp_ggtt;
>   	u32 hwsp_offset;
>   
> -	struct intel_timeline_cacheline *hwsp_cacheline;
> -
>   	bool has_initial_breadcrumb;
>   
>   	/**
> @@ -67,6 +65,8 @@ struct intel_timeline {
>   	 */
>   	struct i915_active_fence last_request;
>   
> +	struct i915_active active;
> +
>   	/** A chain of completed timelines ready for early retirement. */
>   	struct intel_timeline *retire;
>   
> @@ -90,15 +90,4 @@ struct intel_timeline {
>   	struct rcu_head rcu;
>   };
>   
> -struct intel_timeline_cacheline {
> -	struct i915_active active;
> -
> -	struct intel_timeline_hwsp *hwsp;
> -	void *vaddr;
> -
> -	u32 ggtt_offset;
> -
> -	struct rcu_head rcu;
> -};
> -
>   #endif /* __I915_TIMELINE_TYPES_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
> index 439c8984f5fa..fd9414b0d1bd 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
> @@ -42,6 +42,9 @@ static int perf_end(struct intel_gt *gt)
>   
>   static int write_timestamp(struct i915_request *rq, int slot)
>   {
> +	struct intel_timeline *tl =
> +		rcu_dereference_protected(rq->timeline,
> +					  !i915_request_signaled(rq));
>   	u32 cmd;
>   	u32 *cs;
>   
> @@ -54,7 +57,7 @@ static int write_timestamp(struct i915_request *rq, int slot)
>   		cmd++;
>   	*cs++ = cmd;
>   	*cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(rq->engine->mmio_base));
> -	*cs++ = i915_request_timeline(rq)->hwsp_offset + slot * sizeof(u32);
> +	*cs++ = tl->hwsp_offset + slot * sizeof(u32);
>   	*cs++ = 0;
>   
>   	intel_ring_advance(rq, cs);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> index 6f3a3687ef0f..6e32e91cdab4 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
> @@ -666,7 +666,7 @@ static int live_hwsp_wrap(void *arg)
>   	if (IS_ERR(tl))
>   		return PTR_ERR(tl);
>   
> -	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
> +	if (!tl->has_initial_breadcrumb)
>   		goto out_free;
>   
>   	err = intel_timeline_pin(tl, NULL);
> @@ -833,12 +833,26 @@ static int setup_watcher(struct hwsp_watcher *w, struct intel_gt *gt)
>   	return 0;
>   }
>   
> +static void switch_tl_lock(struct i915_request *from, struct i915_request *to)
> +{
> +	/* some light mutex juggling required; think co-routines */
> +
> +	if (from) {
> +		lockdep_unpin_lock(&from->context->timeline->mutex, from->cookie);
> +		mutex_unlock(&from->context->timeline->mutex);
> +	}
> +
> +	if (to) {
> +		mutex_lock(&to->context->timeline->mutex);
> +		to->cookie = lockdep_pin_lock(&to->context->timeline->mutex);
> +	}
> +}
> +
>   static int create_watcher(struct hwsp_watcher *w,
>   			  struct intel_engine_cs *engine,
>   			  int ringsz)
>   {
>   	struct intel_context *ce;
> -	struct intel_timeline *tl;
>   
>   	ce = intel_context_create(engine);
>   	if (IS_ERR(ce))
> @@ -851,11 +865,8 @@ static int create_watcher(struct hwsp_watcher *w,
>   		return PTR_ERR(w->rq);
>   
>   	w->addr = i915_ggtt_offset(w->vma);
> -	tl = w->rq->context->timeline;
>   
> -	/* some light mutex juggling required; think co-routines */
> -	lockdep_unpin_lock(&tl->mutex, w->rq->cookie);
> -	mutex_unlock(&tl->mutex);
> +	switch_tl_lock(w->rq, NULL);
>   
>   	return 0;
>   }
> @@ -864,15 +875,13 @@ static int check_watcher(struct hwsp_watcher *w, const char *name,
>   			 bool (*op)(u32 hwsp, u32 seqno))
>   {
>   	struct i915_request *rq = fetch_and_zero(&w->rq);
> -	struct intel_timeline *tl = rq->context->timeline;
>   	u32 offset, end;
>   	int err;
>   
>   	GEM_BUG_ON(w->addr - i915_ggtt_offset(w->vma) > w->vma->size);
>   
>   	i915_request_get(rq);
> -	mutex_lock(&tl->mutex);
> -	rq->cookie = lockdep_pin_lock(&tl->mutex);
> +	switch_tl_lock(NULL, rq);
>   	i915_request_add(rq);
>   
>   	if (i915_request_wait(rq, 0, HZ) < 0) {
> @@ -901,10 +910,7 @@ static int check_watcher(struct hwsp_watcher *w, const char *name,
>   static void cleanup_watcher(struct hwsp_watcher *w)
>   {
>   	if (w->rq) {
> -		struct intel_timeline *tl = w->rq->context->timeline;
> -
> -		mutex_lock(&tl->mutex);
> -		w->rq->cookie = lockdep_pin_lock(&tl->mutex);
> +		switch_tl_lock(NULL, w->rq);
>   
>   		i915_request_add(w->rq);
>   	}
> @@ -942,7 +948,7 @@ static struct i915_request *wrap_timeline(struct i915_request *rq)
>   	}
>   
>   	i915_request_put(rq);
> -	rq = intel_context_create_request(ce);
> +	rq = i915_request_create(ce);
>   	if (IS_ERR(rq))
>   		return rq;
>   
> @@ -977,7 +983,7 @@ static int live_hwsp_read(void *arg)
>   	if (IS_ERR(tl))
>   		return PTR_ERR(tl);
>   
> -	if (!tl->hwsp_cacheline)
> +	if (!tl->has_initial_breadcrumb)
>   		goto out_free;
>   
>   	for (i = 0; i < ARRAY_SIZE(watcher); i++) {
> @@ -999,7 +1005,7 @@ static int live_hwsp_read(void *arg)
>   		do {
>   			struct i915_sw_fence *submit;
>   			struct i915_request *rq;
> -			u32 hwsp;
> +			u32 hwsp, dummy;
>   
>   			submit = heap_fence_create(GFP_KERNEL);
>   			if (!submit) {
> @@ -1017,14 +1023,26 @@ static int live_hwsp_read(void *arg)
>   				goto out;
>   			}
>   
> -			/* Skip to the end, saving 30 minutes of nops */
> -			tl->seqno = -10u + 2 * (count & 3);
> -			WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno);
>   			ce->timeline = intel_timeline_get(tl);
>   
> -			rq = intel_context_create_request(ce);
> +			/* Ensure timeline is mapped, done during first pin */
> +			err = intel_context_pin(ce);
> +			if (err) {
> +				intel_context_put(ce);
> +				goto out;
> +			}
> +
> +			/*
> +			 * Start at a new wrap, and set seqno right before another wrap,
> +			 * saving 30 minutes of nops
> +			 */
> +			tl->seqno = -12u + 2 * (count & 3);
> +			__intel_timeline_get_seqno(tl, &dummy);
> +
> +			rq = i915_request_create(ce);
>   			if (IS_ERR(rq)) {
>   				err = PTR_ERR(rq);
> +				intel_context_unpin(ce);
>   				intel_context_put(ce);
>   				goto out;
>   			}
> @@ -1034,32 +1052,35 @@ static int live_hwsp_read(void *arg)
>   							    GFP_KERNEL);
>   			if (err < 0) {
>   				i915_request_add(rq);
> +				intel_context_unpin(ce);
>   				intel_context_put(ce);
>   				goto out;
>   			}
>   
> -			mutex_lock(&watcher[0].rq->context->timeline->mutex);
> +			switch_tl_lock(rq, watcher[0].rq);
>   			err = intel_timeline_read_hwsp(rq, watcher[0].rq, &hwsp);
>   			if (err == 0)
>   				err = emit_read_hwsp(watcher[0].rq, /* before */
>   						     rq->fence.seqno, hwsp,
>   						     &watcher[0].addr);
> -			mutex_unlock(&watcher[0].rq->context->timeline->mutex);
> +			switch_tl_lock(watcher[0].rq, rq);

Ugh, this is some advanced stuff. Trickiness level out of 10?

The change is "touching" two mutexes instead of one.

Do you have a branch somewhere I could check out at this stage and look 
at the code to try and figure it out more easily?

>   			if (err) {
>   				i915_request_add(rq);
> +				intel_context_unpin(ce);
>   				intel_context_put(ce);
>   				goto out;
>   			}
>   
> -			mutex_lock(&watcher[1].rq->context->timeline->mutex);
> +			switch_tl_lock(rq, watcher[1].rq);
>   			err = intel_timeline_read_hwsp(rq, watcher[1].rq, &hwsp);
>   			if (err == 0)
>   				err = emit_read_hwsp(watcher[1].rq, /* after */
>   						     rq->fence.seqno, hwsp,
>   						     &watcher[1].addr);
> -			mutex_unlock(&watcher[1].rq->context->timeline->mutex);
> +			switch_tl_lock(watcher[1].rq, rq);
>   			if (err) {
>   				i915_request_add(rq);
> +				intel_context_unpin(ce);
>   				intel_context_put(ce);
>   				goto out;
>   			}
> @@ -1068,6 +1089,7 @@ static int live_hwsp_read(void *arg)
>   			i915_request_add(rq);
>   
>   			rq = wrap_timeline(rq);
> +			intel_context_unpin(ce);
>   			intel_context_put(ce);
>   			if (IS_ERR(rq)) {
>   				err = PTR_ERR(rq);
> @@ -1107,8 +1129,8 @@ static int live_hwsp_read(void *arg)
>   			    3 * watcher[1].rq->ring->size)
>   				break;
>   
> -		} while (!__igt_timeout(end_time, NULL));
> -		WRITE_ONCE(*(u32 *)tl->hwsp_seqno, 0xdeadbeef);
> +		} while (!__igt_timeout(end_time, NULL) &&
> +			 count < (PAGE_SIZE / TIMELINE_SEQNO_BYTES - 1) / 2);
>   
>   		pr_info("%s: simulated %lu wraps\n", engine->name, count);
>   		err = check_watcher(&watcher[1], "after", cmp_gte);
> @@ -1153,9 +1175,7 @@ static int live_hwsp_rollover_kernel(void *arg)
>   		}
>   
>   		GEM_BUG_ON(i915_active_fence_isset(&tl->last_request));
> -		tl->seqno = 0;
> -		timeline_rollback(tl);
> -		timeline_rollback(tl);
> +		tl->seqno = -2u;
>   		WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno);
>   
>   		for (i = 0; i < ARRAY_SIZE(rq); i++) {
> @@ -1235,11 +1255,10 @@ static int live_hwsp_rollover_user(void *arg)
>   			goto out;
>   
>   		tl = ce->timeline;
> -		if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
> +		if (!tl->has_initial_breadcrumb)
>   			goto out;
>   
> -		timeline_rollback(tl);
> -		timeline_rollback(tl);
> +		tl->seqno = -4u;
>   		WRITE_ONCE(*(u32 *)tl->hwsp_seqno, tl->seqno);
>   
>   		for (i = 0; i < ARRAY_SIZE(rq); i++) {
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index bfd82d8259f4..b661b3b5992e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -851,7 +851,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>   	rq->fence.seqno = seqno;
>   
>   	RCU_INIT_POINTER(rq->timeline, tl);
> -	RCU_INIT_POINTER(rq->hwsp_cacheline, tl->hwsp_cacheline);
>   	rq->hwsp_seqno = tl->hwsp_seqno;
>   	GEM_BUG_ON(i915_request_completed(rq));
>   
> @@ -1090,9 +1089,6 @@ emit_semaphore_wait(struct i915_request *to,
>   	if (i915_request_has_initial_breadcrumb(to))
>   		goto await_fence;
>   
> -	if (!rcu_access_pointer(from->hwsp_cacheline))
> -		goto await_fence;
> -
>   	/*
>   	 * If this or its dependents are waiting on an external fence
>   	 * that may fail catastrophically, then we want to avoid using
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index a8c413203f72..c25b0afcc225 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -237,16 +237,6 @@ struct i915_request {
>   	 */
>   	const u32 *hwsp_seqno;
>   
> -	/*
> -	 * If we need to access the timeline's seqno for this request in
> -	 * another request, we need to keep a read reference to this associated
> -	 * cacheline, so that we do not free and recycle it before the foreign
> -	 * observers have completed. Hence, we keep a pointer to the cacheline
> -	 * inside the timeline's HWSP vma, but it is only valid while this
> -	 * request has not completed and guarded by the timeline mutex.
> -	 */
> -	struct intel_timeline_cacheline __rcu *hwsp_cacheline;
> -
>   	/** Position in the ring of the start of the request */
>   	u32 head;
>   
> @@ -615,4 +605,11 @@ i915_request_active_timeline(const struct i915_request *rq)
>   					 lockdep_is_held(&rq->engine->active.lock));
>   }
>   
> +static inline u32
> +i915_request_active_offset(const struct i915_request *rq)
> +{
> +	return page_mask_bits(i915_request_active_timeline(rq)->hwsp_offset) +
> +	       offset_in_page(rq->hwsp_seqno);

Seems to be the same conundrum I had earlier in hwsp_offset(). TBD.

Regards,

Tvrtko

> +}
> +
>   #endif /* I915_REQUEST_H */
> 