* [PATCH 00/13] [REPOST] Broadwell HW semaphores
@ 2014-01-29 19:55 Ben Widawsky
  2014-01-29 19:55 ` [PATCH 01/13] drm/i915: Move semaphore specific ring members to struct Ben Widawsky
                   ` (12 more replies)
  0 siblings, 13 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

These are the remaining patches for enabling HW semaphores on Broadwell.
The patches are rebased against the latest drm-intel-nightly, and the
only other intentional modifications were those requested by Chris. The
functionality they provide is the same as before. Unfortunately, I will
not have the ability to test these until I return from FOSDEM - however
it sounds like we have some reviewer time now.

Since last time:
I had a couple of rebase conflicts, and build errors as a result of
things being moved (like the invention of the module parameter
structure). They were trivial, so the only concern there would be if my
'test-every-commit' script blew up. The series should also include all
the feedback I received from Chris on the first round. I wasn't sure
what to do with a couple of the things Chris said, on "drm/i915/bdw:
collect semaphore error state" for example. Anything I was confused
about is left in.

As before, the series is pushed here:
git://people.freedesktop.org/~bwidawsk/drm-intel bdw-sema

If you find yourself with a problem after running these, you can either
disable semaphores from the kernel command line, or revert "drm/i915:
unleash semaphores on gen8".


Ben Widawsky (13):
  drm/i915: Move semaphore specific ring members to struct
  drm/i915: Virtualize the ringbuffer signal func
  drm/i915: Move ring_begin to signal()
  drm/i915: Make semaphore updates more precise
  drm/i915: gen specific ring init
  drm/i915/bdw: implement semaphore signal
  drm/i915/bdw: implement semaphore wait
  drm/i915: FORCE_RESTORE for gen8 semaphores
  drm/i915/bdw: poll semaphores
  drm/i915: Extract semaphore error collection
  drm/i915/bdw: collect semaphore error state
  drm/i915: unleash semaphores on gen8
  drm/i915: semaphore debugfs

 drivers/gpu/drm/i915/i915_debugfs.c     |  69 +++++++
 drivers/gpu/drm/i915/i915_drv.c         |   6 -
 drivers/gpu/drm/i915/i915_drv.h         |   2 +
 drivers/gpu/drm/i915/i915_gem.c         |  10 +-
 drivers/gpu/drm/i915/i915_gem_context.c |   9 +
 drivers/gpu/drm/i915/i915_gpu_error.c   |  76 ++++++--
 drivers/gpu/drm/i915/i915_reg.h         |   8 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 330 ++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  87 ++++++++-
 9 files changed, 483 insertions(+), 114 deletions(-)

-- 
1.8.5.3


* [PATCH 01/13] drm/i915: Move semaphore specific ring members to struct
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 02/13] drm/i915: Virtualize the ringbuffer signal func Ben Widawsky
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem.c         | 10 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c   |  6 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 84 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 17 ++++---
 4 files changed, 60 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39770f7..1ab6cd0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2082,8 +2082,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 	for_each_ring(ring, dev_priv, i) {
 		intel_ring_init_seqno(ring, seqno);
 
-		for (j = 0; j < ARRAY_SIZE(ring->sync_seqno); j++)
-			ring->sync_seqno[j] = 0;
+		for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
+			ring->semaphore.sync_seqno[j] = 0;
 	}
 
 	return 0;
@@ -2715,7 +2715,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	idx = intel_ring_sync_index(from, to);
 
 	seqno = obj->last_read_seqno;
-	if (seqno <= from->sync_seqno[idx])
+	if (seqno <= from->semaphore.sync_seqno[idx])
 		return 0;
 
 	ret = i915_gem_check_olr(obj->ring, seqno);
@@ -2723,13 +2723,13 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		return ret;
 
 	trace_i915_gem_ring_sync_to(from, to, seqno);
-	ret = to->sync_to(to, from, seqno);
+	ret = to->semaphore.sync_to(to, from, seqno);
 	if (!ret)
 		/* We use last_read_seqno because sync_to()
 		 * might have just caused seqno wrap under
 		 * the radar.
 		 */
-		from->sync_seqno[idx] = obj->last_read_seqno;
+		from->semaphore.sync_seqno[idx] = obj->last_read_seqno;
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 4cc9162..a8b91fc 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -767,14 +767,14 @@ static void i915_record_ring_state(struct drm_device *dev,
 			= I915_READ(RING_SYNC_0(ring->mmio_base));
 		error->semaphore_mboxes[ring->id][1]
 			= I915_READ(RING_SYNC_1(ring->mmio_base));
-		error->semaphore_seqno[ring->id][0] = ring->sync_seqno[0];
-		error->semaphore_seqno[ring->id][1] = ring->sync_seqno[1];
+		error->semaphore_seqno[ring->id][0] = ring->semaphore.sync_seqno[0];
+		error->semaphore_seqno[ring->id][1] = ring->semaphore.sync_seqno[1];
 	}
 
 	if (HAS_VEBOX(dev)) {
 		error->semaphore_mboxes[ring->id][2] =
 			I915_READ(RING_SYNC_2(ring->mmio_base));
-		error->semaphore_seqno[ring->id][2] = ring->sync_seqno[2];
+		error->semaphore_seqno[ring->id][2] = ring->semaphore.sync_seqno[2];
 	}
 
 	if (INTEL_INFO(dev)->gen >= 4) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d897a19..4588d7f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -675,7 +675,7 @@ gen6_add_request(struct intel_ring_buffer *ring)
 
 	if (i915_semaphore_is_enabled(dev)) {
 		for_each_ring(useless, dev_priv, i) {
-			u32 mbox_reg = ring->signal_mbox[i];
+			u32 mbox_reg = ring->semaphore.signal_mbox[i];
 			if (mbox_reg != GEN6_NOSYNC)
 				update_mboxes(ring, mbox_reg);
 		}
@@ -720,7 +720,7 @@ gen6_ring_sync(struct intel_ring_buffer *waiter,
 	 */
 	seqno -= 1;
 
-	WARN_ON(signaller->semaphore_register[waiter->id] ==
+	WARN_ON(signaller->semaphore.mbox[waiter->id] ==
 		MI_SEMAPHORE_SYNC_INVALID);
 
 	ret = intel_ring_begin(waiter, 4);
@@ -729,9 +729,8 @@ gen6_ring_sync(struct intel_ring_buffer *waiter,
 
 	/* If seqno wrap happened, omit the wait with no-ops */
 	if (likely(!i915_gem_has_seqno_wrapped(waiter->dev, seqno))) {
-		intel_ring_emit(waiter,
-				dw1 |
-				signaller->semaphore_register[waiter->id]);
+		intel_ring_emit(waiter, dw1 |
+					signaller->semaphore.mbox[waiter->id]);
 		intel_ring_emit(waiter, seqno);
 		intel_ring_emit(waiter, 0);
 		intel_ring_emit(waiter, MI_NOOP);
@@ -1328,7 +1327,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	ring->size = 32 * PAGE_SIZE;
-	memset(ring->sync_seqno, 0, sizeof(ring->sync_seqno));
+	memset(ring->semaphore.sync_seqno, 0,
+	       sizeof(ring->semaphore.sync_seqno));
 
 	init_waitqueue_head(&ring->irq_queue);
 
@@ -1871,15 +1871,15 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
-		ring->sync_to = gen6_ring_sync;
-		ring->semaphore_register[RCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore_register[VCS] = MI_SEMAPHORE_SYNC_RV;
-		ring->semaphore_register[BCS] = MI_SEMAPHORE_SYNC_RB;
-		ring->semaphore_register[VECS] = MI_SEMAPHORE_SYNC_RVE;
-		ring->signal_mbox[RCS] = GEN6_NOSYNC;
-		ring->signal_mbox[VCS] = GEN6_VRSYNC;
-		ring->signal_mbox[BCS] = GEN6_BRSYNC;
-		ring->signal_mbox[VECS] = GEN6_VERSYNC;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_RVE;
+		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_VRSYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_BRSYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_VERSYNC;
 	} else if (IS_GEN5(dev)) {
 		ring->add_request = pc_render_add_request;
 		ring->flush = gen4_render_ring_flush;
@@ -2047,15 +2047,15 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
 		}
-		ring->sync_to = gen6_ring_sync;
-		ring->semaphore_register[RCS] = MI_SEMAPHORE_SYNC_VR;
-		ring->semaphore_register[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore_register[BCS] = MI_SEMAPHORE_SYNC_VB;
-		ring->semaphore_register[VECS] = MI_SEMAPHORE_SYNC_VVE;
-		ring->signal_mbox[RCS] = GEN6_RVSYNC;
-		ring->signal_mbox[VCS] = GEN6_NOSYNC;
-		ring->signal_mbox[BCS] = GEN6_BVSYNC;
-		ring->signal_mbox[VECS] = GEN6_VEVSYNC;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_VVE;
+		ring->semaphore.signal_mbox[RCS] = GEN6_RVSYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_BVSYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_VEVSYNC;
 	} else {
 		ring->mmio_base = BSD_RING_BASE;
 		ring->flush = bsd_ring_flush;
@@ -2104,15 +2104,15 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen6_ring_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->sync_to = gen6_ring_sync;
-	ring->semaphore_register[RCS] = MI_SEMAPHORE_SYNC_BR;
-	ring->semaphore_register[VCS] = MI_SEMAPHORE_SYNC_BV;
-	ring->semaphore_register[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-	ring->semaphore_register[VECS] = MI_SEMAPHORE_SYNC_BVE;
-	ring->signal_mbox[RCS] = GEN6_RBSYNC;
-	ring->signal_mbox[VCS] = GEN6_VBSYNC;
-	ring->signal_mbox[BCS] = GEN6_NOSYNC;
-	ring->signal_mbox[VECS] = GEN6_VEBSYNC;
+	ring->semaphore.sync_to = gen6_ring_sync;
+	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
+	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
+	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
+	ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_BVE;
+	ring->semaphore.signal_mbox[RCS] = GEN6_RBSYNC;
+	ring->semaphore.signal_mbox[VCS] = GEN6_VBSYNC;
+	ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
+	ring->semaphore.signal_mbox[VECS] = GEN6_VEBSYNC;
 	ring->init = init_ring_common;
 
 	return intel_init_ring_buffer(dev, ring);
@@ -2145,15 +2145,15 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_put = hsw_vebox_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->sync_to = gen6_ring_sync;
-	ring->semaphore_register[RCS] = MI_SEMAPHORE_SYNC_VER;
-	ring->semaphore_register[VCS] = MI_SEMAPHORE_SYNC_VEV;
-	ring->semaphore_register[BCS] = MI_SEMAPHORE_SYNC_VEB;
-	ring->semaphore_register[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-	ring->signal_mbox[RCS] = GEN6_RVESYNC;
-	ring->signal_mbox[VCS] = GEN6_VVESYNC;
-	ring->signal_mbox[BCS] = GEN6_BVESYNC;
-	ring->signal_mbox[VECS] = GEN6_NOSYNC;
+	ring->semaphore.sync_to = gen6_ring_sync;
+	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
+	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
+	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
+	ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
+	ring->semaphore.signal_mbox[RCS] = GEN6_RVESYNC;
+	ring->semaphore.signal_mbox[VCS] = GEN6_VVESYNC;
+	ring->semaphore.signal_mbox[BCS] = GEN6_BVESYNC;
+	ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
 	ring->init = init_ring_common;
 
 	return intel_init_ring_buffer(dev, ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 71a73f4..b5fc768 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -83,7 +83,6 @@ struct  intel_ring_buffer {
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
 	u32		trace_irq_seqno;
-	u32		sync_seqno[I915_NUM_RINGS-1];
 	bool __must_check (*irq_get)(struct intel_ring_buffer *ring);
 	void		(*irq_put)(struct intel_ring_buffer *ring);
 
@@ -111,14 +110,18 @@ struct  intel_ring_buffer {
 #define I915_DISPATCH_SECURE 0x1
 #define I915_DISPATCH_PINNED 0x2
 	void		(*cleanup)(struct intel_ring_buffer *ring);
-	int		(*sync_to)(struct intel_ring_buffer *ring,
+
+	struct {
+		u32	sync_seqno[I915_NUM_RINGS-1];
+		/* AKA wait() */
+		int	(*sync_to)(struct intel_ring_buffer *ring,
 				   struct intel_ring_buffer *to,
 				   u32 seqno);
-
-	/* our mbox written by others */
-	u32		semaphore_register[I915_NUM_RINGS];
-	/* mboxes this ring signals to */
-	u32		signal_mbox[I915_NUM_RINGS];
+		/* our mbox written by others */
+		u32		mbox[I915_NUM_RINGS];
+		/* mboxes this ring signals to */
+		u32		signal_mbox[I915_NUM_RINGS];
+	} semaphore;
 
 	/**
 	 * List of objects currently involved in rendering from the
-- 
1.8.5.3
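
Editor's sketch, not part of the patch: the regrouping above can be shown
in a few lines of standalone C. The names sem_ring, SEM_NUM_RINGS and the
two helpers are hypothetical stand-ins, not i915 symbols; the point is only
that callers reach the semaphore state through one nested member.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEM_NUM_RINGS 4u /* hypothetical stand-in for I915_NUM_RINGS */

struct sem_ring {
	int id;

	/* Everything semaphore related lives behind one member. */
	struct {
		uint32_t sync_seqno[SEM_NUM_RINGS - 1];
		uint32_t mbox[SEM_NUM_RINGS];        /* our mbox written by others */
		uint32_t signal_mbox[SEM_NUM_RINGS]; /* mboxes this ring signals to */
	} semaphore;
};

/* Clear only the per-ring sync seqnos, leaving the mbox tables alone. */
static void sem_reset_sync_seqnos(struct sem_ring *ring)
{
	memset(ring->semaphore.sync_seqno, 0,
	       sizeof(ring->semaphore.sync_seqno));
}

/* Bounds-checked read of one sync seqno slot. */
static uint32_t sem_sync_seqno(const struct sem_ring *ring, unsigned int idx)
{
	assert(idx < SEM_NUM_RINGS - 1);
	return ring->semaphore.sync_seqno[idx];
}
```

Grouping the fields this way changes no behaviour; it only makes it
possible to swap the whole semaphore block per generation later.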


* [PATCH 02/13] drm/i915: Virtualize the ringbuffer signal func
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
  2014-01-29 19:55 ` [PATCH 01/13] drm/i915: Move semaphore specific ring members to struct Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 03/13] drm/i915: Move ring_begin to signal() Ben Widawsky
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

This abstraction again is in preparation for gen8. Gen8 will bring new
semantics for doing this operation.

While here, make the writes of MI_NOOPs explicit for non-existent rings.
This should have been implicit before.

NOTE: This is going to be removed in a few patches.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 42 ++++++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4588d7f..1ff3d9d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -632,20 +632,32 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
 	ring->scratch.obj = NULL;
 }
 
-static void
-update_mboxes(struct intel_ring_buffer *ring,
-	      u32 mmio_offset)
+static void gen6_signal(struct intel_ring_buffer *signaller)
 {
+	struct drm_i915_private *dev_priv = signaller->dev->dev_private;
+	struct intel_ring_buffer *useless;
+	int i;
+
 /* NB: In order to be able to do semaphore MBOX updates for varying number
  * of rings, it's easiest if we round up each individual update to a
  * multiple of 2 (since ring updates must always be a multiple of 2)
  * even though the actual update only requires 3 dwords.
  */
 #define MBOX_UPDATE_DWORDS 4
-	intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
-	intel_ring_emit(ring, mmio_offset);
-	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
-	intel_ring_emit(ring, MI_NOOP);
+	for_each_ring(useless, dev_priv, i) {
+		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
+		if (mbox_reg != GEN6_NOSYNC) {
+			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
+			intel_ring_emit(signaller, mbox_reg);
+			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
+			intel_ring_emit(signaller, MI_NOOP);
+		} else {
+			intel_ring_emit(signaller, MI_NOOP);
+			intel_ring_emit(signaller, MI_NOOP);
+			intel_ring_emit(signaller, MI_NOOP);
+			intel_ring_emit(signaller, MI_NOOP);
+		}
+	}
 }
 
 /**
@@ -661,9 +673,7 @@ static int
 gen6_add_request(struct intel_ring_buffer *ring)
 {
 	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	struct intel_ring_buffer *useless;
-	int i, ret, num_dwords = 4;
+	int ret, num_dwords = 4;
 
 	if (i915_semaphore_is_enabled(dev))
 		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
@@ -673,13 +683,7 @@ gen6_add_request(struct intel_ring_buffer *ring)
 	if (ret)
 		return ret;
 
-	if (i915_semaphore_is_enabled(dev)) {
-		for_each_ring(useless, dev_priv, i) {
-			u32 mbox_reg = ring->semaphore.signal_mbox[i];
-			if (mbox_reg != GEN6_NOSYNC)
-				update_mboxes(ring, mbox_reg);
-		}
-	}
+	ring->semaphore.signal(ring);
 
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
@@ -1872,6 +1876,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.signal = gen6_signal;
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
@@ -2048,6 +2053,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 				gen6_ring_dispatch_execbuffer;
 		}
 		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.signal = gen6_signal;
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
@@ -2105,6 +2111,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
 	ring->semaphore.sync_to = gen6_ring_sync;
+	ring->semaphore.signal = gen6_signal;
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
@@ -2146,6 +2153,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
 	ring->semaphore.sync_to = gen6_ring_sync;
+	ring->semaphore.signal = gen6_signal;
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b5fc768..e01a1ff 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -121,6 +121,8 @@ struct  intel_ring_buffer {
 		u32		mbox[I915_NUM_RINGS];
 		/* mboxes this ring signals to */
 		u32		signal_mbox[I915_NUM_RINGS];
+
+		void		(*signal)(struct intel_ring_buffer *signaller);
 	} semaphore;
 
 	/**
-- 
1.8.5.3
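
Editor's sketch, not part of the patch: a toy model of the virtualized
signal hook, assuming a ring that is just an array of dwords. The type
toy_ring, the opcode values and toy_add_request() are hypothetical; only
the shape (a per-ring function pointer that emits four dwords per ring,
with explicit no-ops where there is nothing to signal) follows the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_RINGS 4
#define TOY_NOSYNC    0u          /* hypothetical "no mailbox" marker */
#define TOY_NOOP      0x00000000u /* hypothetical no-op encoding */
#define TOY_LOAD_REG  0x11000001u /* hypothetical register-write opcode */

struct toy_ring {
	uint32_t buf[64];
	size_t tail;
	uint32_t seqno;
	struct {
		uint32_t signal_mbox[TOY_NUM_RINGS];
		void (*signal)(struct toy_ring *signaller);
	} semaphore;
};

static void toy_emit(struct toy_ring *ring, uint32_t dword)
{
	assert(ring->tail < sizeof(ring->buf) / sizeof(ring->buf[0]));
	ring->buf[ring->tail++] = dword;
}

/* Emit exactly four dwords per ring: a mailbox update or four no-ops. */
static void toy_gen6_signal(struct toy_ring *signaller)
{
	for (int i = 0; i < TOY_NUM_RINGS; i++) {
		uint32_t mbox_reg = signaller->semaphore.signal_mbox[i];

		if (mbox_reg != TOY_NOSYNC) {
			toy_emit(signaller, TOY_LOAD_REG);
			toy_emit(signaller, mbox_reg);
			toy_emit(signaller, signaller->seqno);
			toy_emit(signaller, TOY_NOOP);
		} else {
			for (int j = 0; j < 4; j++)
				toy_emit(signaller, TOY_NOOP);
		}
	}
}

/* The caller no longer knows how signalling works; it calls the hook. */
static void toy_add_request(struct toy_ring *ring)
{
	ring->semaphore.signal(ring);
	toy_emit(ring, ring->seqno); /* stand-in for the breadcrumb write */
}
```

A different generation would install a different function in
semaphore.signal without touching toy_add_request().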


* [PATCH 03/13] drm/i915: Move ring_begin to signal()
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
  2014-01-29 19:55 ` [PATCH 01/13] drm/i915: Move semaphore specific ring members to struct Ben Widawsky
  2014-01-29 19:55 ` [PATCH 02/13] drm/i915: Virtualize the ringbuffer signal func Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 04/13] drm/i915: Make semaphore updates more precise Ben Widawsky
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Add_request has always contained both the semaphore mailbox updates and
the breadcrumb writes. Since the semaphore signal is the one which
actually knows the number of dwords it needs to emit to the ring, we
move the ring_begin to that function. This allows us to remove the
hideously shared #define.

On a related note, gen8 will use a different number of dwords for
semaphores, but not for add request.

v2: Make number of dwords an explicit part of signalling (via function
argument). (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 39 +++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 +++-
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1ff3d9d..70f7190 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -632,18 +632,28 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
 	ring->scratch.obj = NULL;
 }
 
-static void gen6_signal(struct intel_ring_buffer *signaller)
+static int gen6_signal(struct intel_ring_buffer *signaller,
+		       unsigned int num_dwords)
 {
-	struct drm_i915_private *dev_priv = signaller->dev->dev_private;
+	struct drm_device *dev = signaller->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *useless;
-	int i;
+	int i, ret;
 
-/* NB: In order to be able to do semaphore MBOX updates for varying number
- * of rings, it's easiest if we round up each individual update to a
- * multiple of 2 (since ring updates must always be a multiple of 2)
- * even though the actual update only requires 3 dwords.
- */
+	/* NB: In order to be able to do semaphore MBOX updates for varying
+	 * number of rings, it's easiest if we round up each individual update
+	 * to a multiple of 2 (since ring updates must always be a multiple of
+	 * 2) even though the actual update only requires 3 dwords.
+	 */
 #define MBOX_UPDATE_DWORDS 4
+	if (i915_semaphore_is_enabled(dev))
+		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
+
+	ret = intel_ring_begin(signaller, num_dwords);
+	if (ret)
+		return ret;
+#undef MBOX_UPDATE_DWORDS
+
 	for_each_ring(useless, dev_priv, i) {
 		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
 		if (mbox_reg != GEN6_NOSYNC) {
@@ -658,6 +668,8 @@ static void gen6_signal(struct intel_ring_buffer *signaller)
 			intel_ring_emit(signaller, MI_NOOP);
 		}
 	}
+
+	return 0;
 }
 
 /**
@@ -672,19 +684,12 @@ static void gen6_signal(struct intel_ring_buffer *signaller)
 static int
 gen6_add_request(struct intel_ring_buffer *ring)
 {
-	struct drm_device *dev = ring->dev;
-	int ret, num_dwords = 4;
-
-	if (i915_semaphore_is_enabled(dev))
-		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
-#undef MBOX_UPDATE_DWORDS
+	int ret;
 
-	ret = intel_ring_begin(ring, num_dwords);
+	ret = ring->semaphore.signal(ring, 4);
 	if (ret)
 		return ret;
 
-	ring->semaphore.signal(ring);
-
 	intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(ring, I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
 	intel_ring_emit(ring, ring->outstanding_lazy_seqno);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index e01a1ff..c69ae10 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -122,7 +122,9 @@ struct  intel_ring_buffer {
 		/* mboxes this ring signals to */
 		u32		signal_mbox[I915_NUM_RINGS];
 
-		void		(*signal)(struct intel_ring_buffer *signaller);
+		/* num_dwords is space the caller will need for atomic update */
+		int		(*signal)(struct intel_ring_buffer *signaller,
+					  unsigned int num_dwords);
 	} semaphore;
 
 	/**
-- 
1.8.5.3
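
Editor's sketch, not part of the patch: a toy model of why the space
reservation moves into signal(). The caller passes the dwords it needs for
itself, and the hook adds its own before reserving. All rsv_* names, the
32-dword ring and the placeholder payloads are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RSV_NUM_RINGS   4
#define RSV_MBOX_DWORDS 4
#define RSV_RING_DWORDS 32

struct rsv_ring {
	uint32_t buf[RSV_RING_DWORDS];
	size_t tail;
	size_t reserved; /* dwords left in the current reservation */
	int semaphores_enabled;
	int (*signal)(struct rsv_ring *signaller, unsigned int num_dwords);
};

/* Toy stand-in for a ring_begin(): fail if the space is not there. */
static int rsv_begin(struct rsv_ring *ring, unsigned int num_dwords)
{
	if (num_dwords > RSV_RING_DWORDS - ring->tail)
		return -ENOSPC;
	ring->reserved = num_dwords;
	return 0;
}

static void rsv_emit(struct rsv_ring *ring, uint32_t dword)
{
	assert(ring->reserved > 0);
	ring->buf[ring->tail++] = dword;
	ring->reserved--;
}

/* Reserve the caller's dwords plus our own, then emit only our own. */
static int rsv_signal(struct rsv_ring *signaller, unsigned int num_dwords)
{
	unsigned int own = 0;
	int ret;

	if (signaller->semaphores_enabled)
		own = (RSV_NUM_RINGS - 1) * RSV_MBOX_DWORDS;

	ret = rsv_begin(signaller, num_dwords + own);
	if (ret)
		return ret;

	for (unsigned int i = 0; i < own; i++)
		rsv_emit(signaller, 0); /* placeholder mailbox traffic */
	return 0;
}

static int rsv_add_request(struct rsv_ring *ring)
{
	int ret = ring->signal(ring, 4); /* 4 dwords for the breadcrumb */

	if (ret)
		return ret;
	for (int i = 0; i < 4; i++)
		rsv_emit(ring, 0xb0b0b0b0u); /* placeholder breadcrumb */
	return 0;
}
```

Because a single reservation covers both parts, the sequence is either
emitted whole or not at all, which is the "atomic update" the new header
comment refers to.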


* [PATCH 04/13] drm/i915: Make semaphore updates more precise
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (2 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 03/13] drm/i915: Move ring_begin to signal() Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-30 11:25   ` Ville Syrjälä
  2014-02-11 20:20   ` [PATCH] [v2] " Ben Widawsky
  2014-01-29 19:55 ` [PATCH 05/13] drm/i915: gen specific ring init Ben Widawsky
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

With the ring mask we now have an easy way to know the number of rings
in the system, and therefore can accurately predict the number of dwords
to emit for semaphore signalling. This was not possible (easily)
previously.

There should be no functional impact, simply fewer instructions emitted.

While we're here, simply round the total up to 2 instead of the fancier
rounding we did before, which rounded up per mbox, i.e. to 4.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 43 +++++++++++++++++----------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 70f7190..97789ff 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -635,24 +635,20 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
 static int gen6_signal(struct intel_ring_buffer *signaller,
 		       unsigned int num_dwords)
 {
+#define MBOX_UPDATE_DWORDS 4
 	struct drm_device *dev = signaller->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *useless;
-	int i, ret;
+	int i, ret, num_rings;
 
-	/* NB: In order to be able to do semaphore MBOX updates for varying
-	 * number of rings, it's easiest if we round up each individual update
-	 * to a multiple of 2 (since ring updates must always be a multiple of
-	 * 2) even though the actual update only requires 3 dwords.
-	 */
-#define MBOX_UPDATE_DWORDS 4
-	if (i915_semaphore_is_enabled(dev))
-		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
+	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
+	num_dwords = round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
+#undef MBOX_UPDATE_DWORDS
 
-	ret = intel_ring_begin(signaller, num_dwords);
+	/* XXX: + 4 for the caller */
+	ret = intel_ring_begin(signaller, num_dwords + 4);
 	if (ret)
 		return ret;
-#undef MBOX_UPDATE_DWORDS
 
 	for_each_ring(useless, dev_priv, i) {
 		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
@@ -661,14 +657,11 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
 			intel_ring_emit(signaller, mbox_reg);
 			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
 			intel_ring_emit(signaller, MI_NOOP);
-		} else {
-			intel_ring_emit(signaller, MI_NOOP);
-			intel_ring_emit(signaller, MI_NOOP);
-			intel_ring_emit(signaller, MI_NOOP);
-			intel_ring_emit(signaller, MI_NOOP);
 		}
 	}
 
+	WARN_ON(i != num_rings);
+
 	return 0;
 }
 
@@ -686,7 +679,11 @@ gen6_add_request(struct intel_ring_buffer *ring)
 {
 	int ret;
 
-	ret = ring->semaphore.signal(ring, 4);
+	if (ring->semaphore.signal)
+		ret = ring->semaphore.signal(ring, 4);
+	else
+		ret = intel_ring_begin(ring, 4);
+
 	if (ret)
 		return ret;
 
@@ -1881,7 +1878,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		ring->semaphore.sync_to = gen6_ring_sync;
-		ring->semaphore.signal = gen6_signal;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
@@ -2058,7 +2056,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 				gen6_ring_dispatch_execbuffer;
 		}
 		ring->semaphore.sync_to = gen6_ring_sync;
-		ring->semaphore.signal = gen6_signal;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
@@ -2116,7 +2115,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
 	ring->semaphore.sync_to = gen6_ring_sync;
-	ring->semaphore.signal = gen6_signal;
+	if (i915_semaphore_is_enabled(dev))
+		ring->semaphore.signal = gen6_signal;
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
@@ -2158,7 +2158,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
 	ring->semaphore.sync_to = gen6_ring_sync;
-	ring->semaphore.signal = gen6_signal;
+	if (i915_semaphore_is_enabled(dev))
+		ring->semaphore.signal = gen6_signal;
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
-- 
1.8.5.3
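
Editor's sketch, not part of the patch: the dword accounting described in
the commit message, written as standalone C. The cnt_* helpers are
hypothetical; in the kernel the bit count comes from hweight_long() and the
rounding from round_up(). Unlike the diff, which adds a literal 4 for the
caller, this sketch takes the caller's dwords as a parameter.

```c
#include <assert.h>

#define CNT_MBOX_UPDATE_DWORDS 4u

/* Count set bits; each set bit in the mask is one ring that exists. */
static unsigned int cnt_popcount(unsigned long mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Round up to the next multiple of 2. */
static unsigned int cnt_round_up_2(unsigned int n)
{
	return (n + 1u) & ~1u;
}

/* Dwords to reserve: one update per *other* ring, plus the caller's. */
static unsigned int cnt_signal_dwords(unsigned long ring_mask,
				      unsigned int caller_dwords)
{
	unsigned int num_rings = cnt_popcount(ring_mask);

	assert(num_rings >= 1);
	return cnt_round_up_2((num_rings - 1) * CNT_MBOX_UPDATE_DWORDS) +
	       caller_dwords;
}
```

With four rings and a 4-dword breadcrumb this gives 3 * 4 + 4 = 16 dwords;
with three rings it gives 12, where the earlier fixed-size scheme would
still have reserved 16.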


* [PATCH 05/13] drm/i915: gen specific ring init
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (3 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 04/13] drm/i915: Make semaphore updates more precise Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 06/13] drm/i915/bdw: implement semaphore signal Ben Widawsky
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Gen8 has already had some differentiation with how it handles rings.
Semaphores bring yet more differences, and now is as good a time as any
to do the split.

Also, since gen8 doesn't actually use semaphores up until this point,
put the proper "NULL" values in for the mbox info.

v2: v1 had a stale commit message

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 134 ++++++++++++++++++++++----------
 1 file changed, 92 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 97789ff..37ae2b1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1861,19 +1861,33 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	ring->id = RCS;
 	ring->mmio_base = RENDER_RING_BASE;
 
-	if (INTEL_INFO(dev)->gen >= 6) {
+	if (INTEL_INFO(dev)->gen >= 8) {
+		ring->add_request = gen6_add_request;
+		ring->flush = gen8_render_ring_flush;
+		ring->irq_get = gen8_ring_get_irq;
+		ring->irq_put = gen8_ring_put_irq;
+		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
+		ring->get_seqno = gen6_ring_get_seqno;
+		ring->set_seqno = ring_set_seqno;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
+		ring->semaphore.signal = gen6_signal;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
+	} else if (INTEL_INFO(dev)->gen >= 6) {
 		ring->add_request = gen6_add_request;
 		ring->flush = gen7_render_ring_flush;
 		if (INTEL_INFO(dev)->gen == 6)
 			ring->flush = gen6_render_ring_flush;
-		if (INTEL_INFO(dev)->gen >= 8) {
-			ring->flush = gen8_render_ring_flush;
-			ring->irq_get = gen8_ring_get_irq;
-			ring->irq_put = gen8_ring_put_irq;
-		} else {
-			ring->irq_get = gen6_ring_get_irq;
-			ring->irq_put = gen6_ring_put_irq;
-		}
+		ring->irq_get = gen6_ring_get_irq;
+		ring->irq_put = gen6_ring_put_irq;
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
@@ -1915,6 +1929,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = I915_USER_INTERRUPT;
 	}
 	ring->write_tail = ring_write_tail;
+
 	if (IS_HASWELL(dev))
 		ring->dispatch_execbuffer = hsw_ring_dispatch_execbuffer;
 	else if (IS_GEN8(dev))
@@ -2048,24 +2063,35 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_put = gen8_ring_put_irq;
 			ring->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
+			ring->semaphore.sync_to = gen6_ring_sync;
+			if (i915_semaphore_is_enabled(dev))
+				ring->semaphore.signal = gen6_signal;
+			ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
+			ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
+			ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
+			ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
+			ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
+			ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
+			ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
+			ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
 		} else {
 			ring->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 			ring->irq_get = gen6_ring_get_irq;
 			ring->irq_put = gen6_ring_put_irq;
 			ring->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
+			ring->semaphore.sync_to = gen6_ring_sync;
+			if (i915_semaphore_is_enabled(dev))
+				ring->semaphore.signal = gen6_signal;
+			ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
+			ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
+			ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
+			ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_VVE;
+			ring->semaphore.signal_mbox[RCS] = GEN6_RVSYNC;
+			ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
+			ring->semaphore.signal_mbox[BCS] = GEN6_BVSYNC;
+			ring->semaphore.signal_mbox[VECS] = GEN6_VEVSYNC;
 		}
-		ring->semaphore.sync_to = gen6_ring_sync;
-		if (i915_semaphore_is_enabled(dev))
-			ring->semaphore.signal = gen6_signal;
-		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
-		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
-		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_VVE;
-		ring->semaphore.signal_mbox[RCS] = GEN6_RVSYNC;
-		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[BCS] = GEN6_BVSYNC;
-		ring->semaphore.signal_mbox[VECS] = GEN6_VEVSYNC;
 	} else {
 		ring->mmio_base = BSD_RING_BASE;
 		ring->flush = bsd_ring_flush;
@@ -2108,23 +2134,35 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
 	} else {
 		ring->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 		ring->irq_get = gen6_ring_get_irq;
 		ring->irq_put = gen6_ring_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_BVE;
+		ring->semaphore.signal_mbox[RCS] = GEN6_RBSYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_VBSYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_VEBSYNC;
 	}
-	ring->semaphore.sync_to = gen6_ring_sync;
-	if (i915_semaphore_is_enabled(dev))
-		ring->semaphore.signal = gen6_signal;
-	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
-	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
-	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-	ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_BVE;
-	ring->semaphore.signal_mbox[RCS] = GEN6_RBSYNC;
-	ring->semaphore.signal_mbox[VCS] = GEN6_VBSYNC;
-	ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
-	ring->semaphore.signal_mbox[VECS] = GEN6_VEBSYNC;
+
 	ring->init = init_ring_common;
 
 	return intel_init_ring_buffer(dev, ring);
@@ -2151,23 +2189,35 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
 	} else {
 		ring->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
 		ring->irq_get = hsw_vebox_get_irq;
 		ring->irq_put = hsw_vebox_put_irq;
+		ring->semaphore.sync_to = gen6_ring_sync;
+		if (i915_semaphore_is_enabled(dev))
+			ring->semaphore.signal = gen6_signal;
+		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
+		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
+		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
+		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
+		ring->semaphore.signal_mbox[RCS] = GEN6_RVESYNC;
+		ring->semaphore.signal_mbox[VCS] = GEN6_VVESYNC;
+		ring->semaphore.signal_mbox[BCS] = GEN6_BVESYNC;
+		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->semaphore.sync_to = gen6_ring_sync;
-	if (i915_semaphore_is_enabled(dev))
-		ring->semaphore.signal = gen6_signal;
-	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
-	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
-	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
-	ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-	ring->semaphore.signal_mbox[RCS] = GEN6_RVESYNC;
-	ring->semaphore.signal_mbox[VCS] = GEN6_VVESYNC;
-	ring->semaphore.signal_mbox[BCS] = GEN6_BVESYNC;
-	ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
+
 	ring->init = init_ring_common;
 
 	return intel_init_ring_buffer(dev, ring);
-- 
1.8.5.3


* [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (4 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 05/13] drm/i915: gen specific ring init Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-30 12:38   ` Ville Syrjälä
  2014-01-29 19:55 ` [PATCH 07/13] drm/i915/bdw: implement semaphore wait Ben Widawsky
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Semaphore signalling works similarly to previous GENs, except that the
per-ring mailboxes no longer exist. Instead, you must define your own
space somewhere in the GTT.

The comments in the code define the layout I've opted for, which should
be fairly future-proof, i.e. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc.).

NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.
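
A minimal sketch of the offset arithmetic behind that layout, written as
standalone C rather than driver code. It assumes four rings with ids in
the order RCS, VCS, BCS, VECS and 8-byte slots; the helper names
(signal_offset, wait_offset, layout_is_consistent) are made up for
illustration and are not part of the patch, which uses macros that also
add the GGTT base of the semaphore object.

```c
#include <stdint.h>

/* Assumed for this sketch: four rings, ids in this order. */
enum ring_id { RCS = 0, VCS, BCS, VECS, NUM_RINGS };

/* Slots are 8 bytes wide so that every write is qword aligned. */
#define SEQNO_SIZE ((uint32_t)sizeof(uint64_t))

/* Offset of the slot ring `from` writes when it signals ring `to`. */
static uint32_t signal_offset(unsigned int from, unsigned int to)
{
	return (from * NUM_RINGS * SEQNO_SIZE) + (SEQNO_SIZE * to);
}

/* Offset of the slot ring `waiter` reads when it waits on ring `from`. */
static uint32_t wait_offset(unsigned int waiter, unsigned int from)
{
	return (from * NUM_RINGS * SEQNO_SIZE) + (SEQNO_SIZE * waiter);
}

/* Returns 1 if every signal slot is the slot the peer ring waits on. */
static int layout_is_consistent(void)
{
	unsigned int a, b;

	for (a = 0; a < NUM_RINGS; a++)
		for (b = 0; b < NUM_RINGS; b++)
			if (signal_offset(a, b) != wait_offset(b, a))
				return 0;
	return 1;
}
```

Under these assumptions, signal_offset(VCS, BCS) evaluates to 0x30, and
the wait table is simply the transpose of the signal table.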

v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         |   1 +
 drivers/gpu/drm/i915/i915_reg.h         |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
 4 files changed, 197 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3673ba1..f521059 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
 
 	struct pci_dev *bridge_dev;
 	struct intel_ring_buffer ring[I915_NUM_RINGS];
+	struct drm_i915_gem_object *semaphore_obj;
 	uint32_t last_seqno, next_seqno;
 
 	drm_dma_handle_t *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index cbbaf26..8b745dc 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -216,7 +216,7 @@
 #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
 #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
 #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
-#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
+#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
 #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
 #define   MI_SEMAPHORE_UPDATE	    (1<<21)
 #define   MI_SEMAPHORE_COMPARE	    (1<<20)
@@ -241,6 +241,8 @@
 #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
 #define   MI_FORCE_RESTORE		(1<<1)
 #define   MI_RESTORE_INHIBIT		(1<<0)
+#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
+#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
@@ -329,6 +331,7 @@
 #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
 #define   PIPE_CONTROL_NOTIFY				(1<<8)
+#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 37ae2b1..b750835 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -619,6 +619,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
 static void render_ring_cleanup(struct intel_ring_buffer *ring)
 {
 	struct drm_device *dev = ring->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (dev_priv->semaphore_obj) {
+		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
+		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
+		dev_priv->semaphore_obj = NULL;
+	}
 
 	if (ring->scratch.obj == NULL)
 		return;
@@ -632,6 +639,86 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
 	ring->scratch.obj = NULL;
 }
 
+static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
+			   unsigned int num_dwords)
+{
+#define MBOX_UPDATE_DWORDS 8
+	struct drm_device *dev = signaller->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ring_buffer *waiter;
+	int i, ret, num_rings;
+
+	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
+	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;
+#undef MBOX_UPDATE_DWORDS
+
+	/* XXX: + 4 for the caller */
+	ret = intel_ring_begin(signaller, num_dwords + 4);
+	if (ret)
+		return ret;
+
+	for_each_ring(waiter, dev_priv, i) {
+		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
+		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
+			continue;
+
+		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
+		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
+					   PIPE_CONTROL_QW_WRITE |
+					   PIPE_CONTROL_FLUSH_ENABLE);
+		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
+		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
+		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
+		intel_ring_emit(signaller, 0);
+		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
+					   MI_SEMAPHORE_TARGET(waiter->id));
+		intel_ring_emit(signaller, 0);
+	}
+
+	WARN_ON(i != num_rings);
+
+	return 0;
+}
+
+static int gen8_xcs_signal(struct intel_ring_buffer *signaller,
+			   unsigned int num_dwords)
+{
+#define MBOX_UPDATE_DWORDS 6
+	struct drm_device *dev = signaller->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ring_buffer *waiter;
+	int i, ret, num_rings;
+
+	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
+	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;
+#undef MBOX_UPDATE_DWORDS
+
+	/* XXX: + 4 for the caller */
+	ret = intel_ring_begin(signaller, num_dwords + 4);
+	if (ret)
+		return ret;
+
+	for_each_ring(waiter, dev_priv, i) {
+		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
+		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
+			continue;
+
+		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
+					   MI_FLUSH_DW_OP_STOREDW);
+		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
+					   MI_FLUSH_DW_USE_GTT);
+		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
+		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
+		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
+					   MI_SEMAPHORE_TARGET(waiter->id));
+		intel_ring_emit(signaller, 0);
+	}
+
+	WARN_ON(i != num_rings);
+
+	return 0;
+}
+
 static int gen6_signal(struct intel_ring_buffer *signaller,
 		       unsigned int num_dwords)
 {
@@ -1852,16 +1939,67 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring,
 	return 0;
 }
 
+/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
+ * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
+ */
+#define SEQNO_SIZE sizeof(uint64_t)
+#define GEN8_SIGNAL_OFFSET(to) \
+	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	(ring->id * I915_NUM_RINGS * SEQNO_SIZE) + \
+	(SEQNO_SIZE * (to)))
+
+#define GEN8_WAIT_OFFSET(from) \
+	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	((from) * I915_NUM_RINGS * SEQNO_SIZE) + \
+	(SEQNO_SIZE * ring->id))
+
+#define GEN8_RING_SEMAPHORE_INIT do { \
+	if (!dev_priv->semaphore_obj) { \
+		break; \
+	} \
+	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
+	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
+	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
+	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
+	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(RCS); \
+	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(VCS); \
+	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(BCS); \
+	ring->semaphore.mbox[VECS] = GEN8_WAIT_OFFSET(VECS); \
+	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
+	ring->semaphore.mbox[ring->id] = GEN6_NOSYNC; \
+	} while(0)
+#undef seqno_size
+
+
+
 int intel_init_render_ring_buffer(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring = &dev_priv->ring[RCS];
+	struct drm_i915_gem_object *obj;
+	int ret;
 
 	ring->name = "render ring";
 	ring->id = RCS;
 	ring->mmio_base = RENDER_RING_BASE;
 
 	if (INTEL_INFO(dev)->gen >= 8) {
+		if (i915_semaphore_is_enabled(dev)) {
+			obj = i915_gem_alloc_object(dev, 4096);
+			if (obj == NULL) {
+				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
+				i915.semaphores = 0;
+			} else {
+				i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+				ret = i915_gem_obj_ggtt_pin(obj, 0, false, true);
+				if (ret != 0) {
+					drm_gem_object_unreference(&obj->base);
+					DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
+					i915.semaphores = 0;
+				} else
+					dev_priv->semaphore_obj = obj;
+			}
+		}
 		ring->add_request = gen6_add_request;
 		ring->flush = gen8_render_ring_flush;
 		ring->irq_get = gen8_ring_get_irq;
@@ -1870,17 +2008,11 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
 		ring->semaphore.sync_to = gen6_ring_sync;
-		if (i915_semaphore_is_enabled(dev))
-			ring->semaphore.signal = gen6_signal;
-		ring->semaphore.signal = gen6_signal;
-		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
+		if (i915_semaphore_is_enabled(dev)) {
+			BUG_ON(!dev_priv->semaphore_obj);
+			ring->semaphore.signal = gen8_rcs_signal;
+			GEN8_RING_SEMAPHORE_INIT;
+		}
 	} else if (INTEL_INFO(dev)->gen >= 6) {
 		ring->add_request = gen6_add_request;
 		ring->flush = gen7_render_ring_flush;
@@ -1947,9 +2079,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 
 	/* Workaround batchbuffer to combat CS tlb bug. */
 	if (HAS_BROKEN_CS_TLB(dev)) {
-		struct drm_i915_gem_object *obj;
-		int ret;
-
 		obj = i915_gem_alloc_object(dev, I830_BATCH_LIMIT);
 		if (obj == NULL) {
 			DRM_ERROR("Failed to allocate batch bo\n");
@@ -2064,16 +2193,10 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
 			ring->semaphore.sync_to = gen6_ring_sync;
-			if (i915_semaphore_is_enabled(dev))
-				ring->semaphore.signal = gen6_signal;
-			ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
-			ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-			ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-			ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-			ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
-			ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
-			ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
-			ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
+			if (i915_semaphore_is_enabled(dev)) {
+				ring->semaphore.signal = gen8_xcs_signal;
+				GEN8_RING_SEMAPHORE_INIT;
+			}
 		} else {
 			ring->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 			ring->irq_get = gen6_ring_get_irq;
@@ -2135,16 +2258,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		ring->semaphore.sync_to = gen6_ring_sync;
-		if (i915_semaphore_is_enabled(dev))
-			ring->semaphore.signal = gen6_signal;
-		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
+		if (i915_semaphore_is_enabled(dev)) {
+			ring->semaphore.signal = gen8_xcs_signal;
+			GEN8_RING_SEMAPHORE_INIT;
+		}
 	} else {
 		ring->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 		ring->irq_get = gen6_ring_get_irq;
@@ -2190,16 +2307,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		ring->semaphore.sync_to = gen6_ring_sync;
-		if (i915_semaphore_is_enabled(dev))
-			ring->semaphore.signal = gen6_signal;
-		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.mbox[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-		ring->semaphore.signal_mbox[RCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[VCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[BCS] = GEN6_NOSYNC;
-		ring->semaphore.signal_mbox[VECS] = GEN6_NOSYNC;
+		if (i915_semaphore_is_enabled(dev)) {
+			ring->semaphore.signal = gen8_xcs_signal;
+			GEN8_RING_SEMAPHORE_INIT;
+		}
 	} else {
 		ring->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
 		ring->irq_get = hsw_vebox_get_irq;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c69ae10..f1e7a66 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -111,6 +111,39 @@ struct  intel_ring_buffer {
 #define I915_DISPATCH_PINNED 0x2
 	void		(*cleanup)(struct intel_ring_buffer *ring);
 
+	/* GEN8 signal/wait table
+	 *	  signal to  signal to    signal to   signal to
+	 *	    RCS         VCS          BCS        VECS
+	 *      ------------------------------------------------------
+	 *  RCS | NOP (0x00) | VCS (0x08) | BCS (0x10) | VECS (0x18) |
+	 *	|-----------------------------------------------------
+	 *  VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) |
+	 *	|-----------------------------------------------------
+	 *  BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) |
+	 *	|-----------------------------------------------------
+	 * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) |  NOP (0x78) |
+	 *	|-----------------------------------------------------
+	 *
+	 * Generalization:
+	 *  f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id)
+	 *  ie. transpose of g(x, y)
+	 *
+	 *	 sync from   sync from    sync from    sync from
+	 *	    RCS         VCS          BCS        VECS
+	 *      ------------------------------------------------------
+	 *  RCS | NOP (0x00) | VCS (0x20) | BCS (0x40) | VECS (0x60) |
+	 *	|-----------------------------------------------------
+	 *  VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) |
+	 *	|-----------------------------------------------------
+	 *  BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x70) |
+	 *	|-----------------------------------------------------
+	 * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) |  NOP (0x78) |
+	 *	|-----------------------------------------------------
+	 *
+	 * Generalization:
+	 *  g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id)
+	 *  ie. transpose of f(x, y)
+	 */
 	struct {
 		u32	sync_seqno[I915_NUM_RINGS-1];
 		/* AKA wait() */
@@ -120,7 +153,10 @@ struct  intel_ring_buffer {
 		/* our mbox written by others */
 		u32		mbox[I915_NUM_RINGS];
 		/* mboxes this ring signals to */
-		u32		signal_mbox[I915_NUM_RINGS];
+		union {
+			u32		signal_mbox[I915_NUM_RINGS];
+			u64		signal_ggtt[I915_NUM_RINGS];
+		};
 
 		/* num_dwords is space the caller will need for atomic update */
 		int		(*signal)(struct intel_ring_buffer *signaller,
-- 
1.8.5.3


* [PATCH 07/13] drm/i915/bdw: implement semaphore wait
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (5 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 06/13] drm/i915/bdw: implement semaphore signal Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-30 12:48   ` Ville Syrjälä
  2014-01-29 19:55 ` [PATCH 08/13] drm/i915: FORCE_RESTORE for gen8 semaphores Ben Widawsky
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Semaphore waits use a new instruction, MI_SEMAPHORE_WAIT. The location
of the seqno to wait on is defined by the table in the previous patch.
Nothing else differs from the semaphore synchronization code of previous
GENs.
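
A rough sketch of the four dwords the wait emits, as standalone C. The
bit values are copied from the i915_reg.h hunks in this series;
MI_INSTR is assumed to shift the opcode left by 23 as it does in
i915_reg.h, and build_semaphore_wait is a hypothetical helper standing
in for the intel_ring_emit() calls.

```c
#include <stdint.h>

/* Assumed to match i915_reg.h: opcode shifted left by 23, flags OR'ed in. */
#define MI_INSTR(opcode, flags)   (((uint32_t)(opcode) << 23) | (flags))
#define MI_SEMAPHORE_WAIT         MI_INSTR(0x1c, 2)
#define MI_SEMAPHORE_GLOBAL_GTT   (1u << 22)
#define MI_SEMAPHORE_SAD_GTE_SDD  (1u << 12)

/* Hypothetical helper: fill the four dwords of one wait command. */
static void build_semaphore_wait(uint32_t cmd[4], uint32_t seqno,
				 uint64_t gtt_offset)
{
	/* dword 0: command header plus addressing and compare flags */
	cmd[0] = MI_SEMAPHORE_WAIT |
		 MI_SEMAPHORE_GLOBAL_GTT |
		 MI_SEMAPHORE_SAD_GTE_SDD;
	/* dword 1: the seqno the waiter blocks on */
	cmd[1] = seqno;
	/* dwords 2 and 3: low and high halves of the slot address */
	cmd[2] = (uint32_t)(gtt_offset & 0xffffffffu);
	cmd[3] = (uint32_t)(gtt_offset >> 32);
}
```

The address passed in would be the wait offset from the table, i.e. the
slot the signalling ring writes for this waiter.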

v2: Update macros to not require the other ring's ring->id (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_reg.h         |  3 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 66 +++++++++++++++------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 30 +++++++++++++++
 3 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 8b745dc..6e8edaf 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -243,6 +243,9 @@
 #define   MI_RESTORE_INHIBIT		(1<<0)
 #define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
 #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
+#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
+#define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
 #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b750835..3cfcc78 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -797,6 +797,31 @@ static inline bool i915_gem_has_seqno_wrapped(struct drm_device *dev,
  * @signaller - ring which has, or will signal
  * @seqno - seqno which the waiter will block on
  */
+
+static int
+gen8_ring_sync(struct intel_ring_buffer *waiter,
+	       struct intel_ring_buffer *signaller,
+	       u32 seqno)
+{
+	struct drm_i915_private *dev_priv = waiter->dev->dev_private;
+	int ret;
+
+	ret = intel_ring_begin(waiter, 4);
+	if (ret)
+		return ret;
+
+	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
+				MI_SEMAPHORE_GLOBAL_GTT |
+				MI_SEMAPHORE_SAD_GTE_SDD);
+	intel_ring_emit(waiter, seqno);
+	intel_ring_emit(waiter,
+			lower_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
+	intel_ring_emit(waiter,
+			upper_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
+	intel_ring_advance(waiter);
+	return 0;
+}
+
 static int
 gen6_ring_sync(struct intel_ring_buffer *waiter,
 	       struct intel_ring_buffer *signaller,
@@ -1939,39 +1964,6 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring,
 	return 0;
 }
 
-/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
- * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
- */
-#define SEQNO_SIZE sizeof(uint64_t)
-#define GEN8_SIGNAL_OFFSET(to) \
-	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
-	(ring->id * I915_NUM_RINGS * SEQNO_SIZE) + \
-	(SEQNO_SIZE * (to)))
-
-#define GEN8_WAIT_OFFSET(from) \
-	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
-	((from) * I915_NUM_RINGS * SEQNO_SIZE) + \
-	(SEQNO_SIZE * ring->id))
-
-#define GEN8_RING_SEMAPHORE_INIT do { \
-	if (!dev_priv->semaphore_obj) { \
-		break; \
-	} \
-	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
-	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
-	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
-	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
-	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(RCS); \
-	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(VCS); \
-	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(BCS); \
-	ring->semaphore.mbox[VECS] = GEN8_WAIT_OFFSET(VECS); \
-	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
-	ring->semaphore.mbox[ring->id] = GEN6_NOSYNC; \
-	} while(0)
-#undef seqno_size
-
-
-
 int intel_init_render_ring_buffer(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
@@ -2007,7 +1999,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
-		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.sync_to = gen8_ring_sync;
 		if (i915_semaphore_is_enabled(dev)) {
 			BUG_ON(!dev_priv->semaphore_obj);
 			ring->semaphore.signal = gen8_rcs_signal;
@@ -2192,7 +2184,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->irq_put = gen8_ring_put_irq;
 			ring->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
-			ring->semaphore.sync_to = gen6_ring_sync;
+			ring->semaphore.sync_to = gen8_ring_sync;
 			if (i915_semaphore_is_enabled(dev)) {
 				ring->semaphore.signal = gen8_xcs_signal;
 				GEN8_RING_SEMAPHORE_INIT;
@@ -2257,7 +2249,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
-		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.sync_to = gen8_ring_sync;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.signal = gen8_xcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
@@ -2306,7 +2298,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
-		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.sync_to = gen8_ring_sync;
 		if (i915_semaphore_is_enabled(dev)) {
 			ring->semaphore.signal = gen8_xcs_signal;
 			GEN8_RING_SEMAPHORE_INIT;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f1e7a66..ed55370 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -33,6 +33,36 @@ struct  intel_hw_status_page {
 #define I915_READ_IMR(ring) I915_READ(RING_IMR((ring)->mmio_base))
 #define I915_WRITE_IMR(ring, val) I915_WRITE(RING_IMR((ring)->mmio_base), val)
 
+/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
+ * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
+ */
+#define i915_semaphore_seqno_size sizeof(uint64_t)
+#define GEN8_SIGNAL_OFFSET(to) \
+	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	(ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
+	(i915_semaphore_seqno_size * (to)))
+
+#define GEN8_WAIT_OFFSET(__ring, from) \
+	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	((from) * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
+	(i915_semaphore_seqno_size * (__ring)->id))
+
+#define GEN8_RING_SEMAPHORE_INIT do { \
+	if (!dev_priv->semaphore_obj) { \
+		break; \
+	} \
+	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
+	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
+	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
+	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
+	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(ring, RCS); \
+	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(ring, VCS); \
+	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(ring, BCS); \
+	ring->semaphore.mbox[VECS] = GEN8_WAIT_OFFSET(ring, VECS); \
+	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
+	ring->semaphore.mbox[ring->id] = GEN6_NOSYNC; \
+	} while(0)
+
 enum intel_ring_hangcheck_action {
 	HANGCHECK_IDLE = 0,
 	HANGCHECK_WAIT,
-- 
1.8.5.3


* [PATCH 08/13] drm/i915: FORCE_RESTORE for gen8 semaphores
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (6 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 07/13] drm/i915/bdw: implement semaphore wait Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 09/13] drm/i915/bdw: poll semaphores Ben Widawsky
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Implement the note indicated in the bspec.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2b0598e..bb84be8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -659,6 +659,15 @@ static int do_switch(struct intel_ring_buffer *ring,
 	if (!to->is_initialized || is_default_context(to))
 		hw_flags |= MI_RESTORE_INHIBIT;
 
+	/* When SW intends to use semaphore signaling between Command streamers,
+	 * it must avoid lite restores in HW by programming "Force Restore" bit
+	 * to ‘1’ in context descriptor during context submission
+	 *
+	 * XXX: is this really needed for ringbuffer mode?
+	 */
+	if (IS_GEN8(ring->dev) && i915_semaphore_is_enabled(ring->dev))
+		hw_flags |= MI_FORCE_RESTORE;
+
 	ret = mi_set_context(ring, to, hw_flags);
 	if (ret)
 		goto unpin_out;
-- 
1.8.5.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 09/13] drm/i915/bdw: poll semaphores
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (7 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 08/13] drm/i915: FORCE_RESTORE for gen8 semaphores Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-30 13:26   ` Ville Syrjälä
  2014-01-29 19:55 ` [PATCH 10/13] drm/i915: Extract semaphore error collection Ben Widawsky
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 3cfcc78..3a3ba81 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -812,6 +812,7 @@ gen8_ring_sync(struct intel_ring_buffer *waiter,
 
 	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
 				MI_SEMAPHORE_GLOBAL_GTT |
+				MI_SEMAPHORE_POLL |
 				MI_SEMAPHORE_SAD_GTE_SDD);
 	intel_ring_emit(waiter, seqno);
 	intel_ring_emit(waiter,
-- 
1.8.5.3


* [PATCH 10/13] drm/i915: Extract semaphore error collection
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (8 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 09/13] drm/i915/bdw: poll semaphores Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 11/13] drm/i915/bdw: collect semaphore error state Ben Widawsky
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a8b91fc..efaad96 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -754,6 +754,24 @@ i915_error_first_batchbuffer(struct drm_i915_private *dev_priv,
 	return NULL;
 }
 
+static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error,
+					struct intel_ring_buffer *ring)
+{
+	error->semaphore_mboxes[ring->id][0]
+			= I915_READ(RING_SYNC_0(ring->mmio_base));
+	error->semaphore_mboxes[ring->id][1]
+		= I915_READ(RING_SYNC_1(ring->mmio_base));
+	error->semaphore_seqno[ring->id][0] = ring->semaphore.sync_seqno[0];
+	error->semaphore_seqno[ring->id][1] = ring->semaphore.sync_seqno[1];
+
+	if (HAS_VEBOX(dev_priv->dev)) {
+		error->semaphore_mboxes[ring->id][2] =
+			I915_READ(RING_SYNC_2(ring->mmio_base));
+		error->semaphore_seqno[ring->id][2] = ring->semaphore.sync_seqno[2];
+	}
+}
+
 static void i915_record_ring_state(struct drm_device *dev,
 				   struct drm_i915_error_state *error,
 				   struct intel_ring_buffer *ring)
@@ -763,18 +781,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 	if (INTEL_INFO(dev)->gen >= 6) {
 		error->rc_psmi[ring->id] = I915_READ(ring->mmio_base + 0x50);
 		error->fault_reg[ring->id] = I915_READ(RING_FAULT_REG(ring));
-		error->semaphore_mboxes[ring->id][0]
-			= I915_READ(RING_SYNC_0(ring->mmio_base));
-		error->semaphore_mboxes[ring->id][1]
-			= I915_READ(RING_SYNC_1(ring->mmio_base));
-		error->semaphore_seqno[ring->id][0] = ring->semaphore.sync_seqno[0];
-		error->semaphore_seqno[ring->id][1] = ring->semaphore.sync_seqno[1];
-	}
-
-	if (HAS_VEBOX(dev)) {
-		error->semaphore_mboxes[ring->id][2] =
-			I915_READ(RING_SYNC_2(ring->mmio_base));
-		error->semaphore_seqno[ring->id][2] = ring->semaphore.sync_seqno[2];
+		gen6_record_semaphore_state(dev_priv, error, ring);
 	}
 
 	if (INTEL_INFO(dev)->gen >= 4) {
-- 
1.8.5.3


* [PATCH 11/13] drm/i915/bdw: collect semaphore error state
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (9 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 10/13] drm/i915: Extract semaphore error collection Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-30 14:53   ` Ville Syrjälä
  2014-01-29 19:55 ` [PATCH 12/13] drm/i915: unleash semaphores on gen8 Ben Widawsky
  2014-01-29 19:55 ` [PATCH 13/13] drm/i915: semaphore debugfs Ben Widawsky
  12 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Since the semaphore information is in an object, just dump it, and let
the user parse it later.

NOTE: The page being used for the semaphores is incoherent with the
CPU. No matter what I do, I cannot figure out a way to read anything but
0s. Note that the semaphore waits are indeed working.

v2: Don't print signal and wait (they should be the same). Instead,
print sync_seqno (Chris)

v3: Free the semaphore error object (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c   | 47 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 12 ++++-----
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f521059..b08e6eb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -313,6 +313,7 @@ struct drm_i915_error_state {
 	u32 acthd[I915_NUM_RINGS];
 	u32 semaphore_mboxes[I915_NUM_RINGS][I915_NUM_RINGS - 1];
 	u32 semaphore_seqno[I915_NUM_RINGS][I915_NUM_RINGS - 1];
+	struct drm_i915_error_object *semaphore_obj;
 	u32 rc_psmi[I915_NUM_RINGS]; /* sleep state */
 	/* our own tracking of ring head and tail */
 	u32 cpu_ring_head[I915_NUM_RINGS];
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index efaad96..d6afc01 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -297,6 +297,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 	struct drm_device *dev = error_priv->dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_i915_error_state *error = error_priv->error;
+	struct drm_i915_error_object *obj;
 	int i, j, page, offset, elt;
 
 	if (!error) {
@@ -345,8 +346,6 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				    error->pinned_bo_count[0]);
 
 	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
-		struct drm_i915_error_object *obj;
-
 		if ((obj = error->ring[i].batchbuffer)) {
 			err_printf(m, "%s --- gtt_offset = 0x%08x\n",
 				   dev_priv->ring[i].name,
@@ -421,6 +420,19 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		}
 	}
 
+	obj = error->semaphore_obj;
+	if (obj) {
+		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
+			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
+				   elt * 4,
+				   obj->pages[0][elt],
+				   obj->pages[0][elt+1],
+				   obj->pages[0][elt+2],
+				   obj->pages[0][elt+3]);
+		}
+	}
+
 	if (error->overlay)
 		intel_overlay_print_error_state(m, error->overlay);
 
@@ -491,6 +503,7 @@ static void i915_error_state_free(struct kref *error_ref)
 		kfree(error->ring[i].requests);
 	}
 
+	i915_error_object_free(error->semaphore_obj);
 	kfree(error->active_bo);
 	kfree(error->overlay);
 	kfree(error->display);
@@ -772,6 +785,31 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
 	}
 }
 
+static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error,
+					struct intel_ring_buffer *ring)
+{
+	struct intel_ring_buffer *useless;
+	int i;
+
+	if (!i915_semaphore_is_enabled(dev_priv->dev))
+		return;
+
+	if (!error->semaphore_obj)
+		error->semaphore_obj =
+			i915_error_object_create(dev_priv,
+						 dev_priv->semaphore_obj,
+						 &dev_priv->gtt.base);
+
+	for_each_ring(useless, dev_priv, i) {
+		u16 signal_offset = GEN8_SIGNAL_OFFSET(ring, i) / 4;
+		u32 *tmp = error->semaphore_obj->pages[0];
+
+		error->semaphore_mboxes[ring->id][i] = tmp[signal_offset];
+		error->semaphore_seqno[ring->id][i] = ring->semaphore.sync_seqno[i];
+	}
+}
+
 static void i915_record_ring_state(struct drm_device *dev,
 				   struct drm_i915_error_state *error,
 				   struct intel_ring_buffer *ring)
@@ -781,7 +819,10 @@ static void i915_record_ring_state(struct drm_device *dev,
 	if (INTEL_INFO(dev)->gen >= 6) {
 		error->rc_psmi[ring->id] = I915_READ(ring->mmio_base + 0x50);
 		error->fault_reg[ring->id] = I915_READ(RING_FAULT_REG(ring));
-		gen6_record_semaphore_state(dev_priv, error, ring);
+		if (INTEL_INFO(dev)->gen >= 8)
+			gen8_record_semaphore_state(dev_priv, error, ring);
+		else
+			gen6_record_semaphore_state(dev_priv, error, ring);
 	}
 
 	if (INTEL_INFO(dev)->gen >= 4) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ed55370..4ca2789 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -37,9 +37,9 @@ struct  intel_hw_status_page {
  * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
  */
 #define i915_semaphore_seqno_size sizeof(uint64_t)
-#define GEN8_SIGNAL_OFFSET(to) \
+#define GEN8_SIGNAL_OFFSET(__ring, to) \
 	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
-	(ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
+	((__ring)->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
 	(i915_semaphore_seqno_size * (to)))
 
 #define GEN8_WAIT_OFFSET(__ring, from) \
@@ -51,10 +51,10 @@ struct  intel_hw_status_page {
 	if (!dev_priv->semaphore_obj) { \
 		break; \
 	} \
-	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
-	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
-	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
-	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
+	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \
+	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \
+	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \
+	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \
 	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(ring, RCS); \
 	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(ring, VCS); \
 	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(ring, BCS); \
-- 
1.8.5.3


* [PATCH 12/13] drm/i915: unleash semaphores on gen8
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (10 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 11/13] drm/i915/bdw: collect semaphore error state Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  2014-01-29 19:55 ` [PATCH 13/13] drm/i915: semaphore debugfs Ben Widawsky
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Everything should be lined up now to make gen8 semaphores work like they
did on previous generations, so just do it.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a071748..e34bcf2 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -364,12 +364,6 @@ bool i915_semaphore_is_enabled(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen < 6)
 		return false;
 
-	/* Until we get further testing... */
-	if (IS_GEN8(dev)) {
-		WARN_ON(!i915.preliminary_hw_support);
-		return false;
-	}
-
 	if (i915.semaphores >= 0)
 		return i915.semaphores;
 
-- 
1.8.5.3


* [PATCH 13/13] drm/i915: semaphore debugfs
  2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
                   ` (11 preceding siblings ...)
  2014-01-29 19:55 ` [PATCH 12/13] drm/i915: unleash semaphores on gen8 Ben Widawsky
@ 2014-01-29 19:55 ` Ben Widawsky
  12 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-01-29 19:55 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

Simple debugfs file to display the current state of semaphores. This is
useful if you want to see the state without hanging the GPU.

NOTE: This patch is optional to the series.

NOTE2: Like the GPU error state collection, the reads are currently
incoherent.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 69 +++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index bc8707f..6837b98 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2074,6 +2074,74 @@ static int i915_power_domain_info(struct seq_file *m, void *unused)
 	return 0;
 }
 
+static int i915_semaphore_status(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ring_buffer *ring;
+	int i, j, ret;
+
+	if (!i915_semaphore_is_enabled(dev)) {
+		seq_puts(m, "Semaphores are disabled\n");
+		return 0;
+	}
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	if (IS_BROADWELL(dev)) {
+		struct page *page;
+		uint64_t *seqno;
+
+		page = i915_gem_object_get_page(dev_priv->semaphore_obj, 0);
+
+		seqno = (uint64_t *)kmap_atomic(page);
+		for_each_ring(ring, dev_priv, i) {
+			uint64_t offset;
+
+			seq_printf(m, "%s\n", ring->name);
+
+			seq_puts(m, "  Last signal:");
+			for (j = 0; j < I915_NUM_RINGS; j++) {
+				offset = i * I915_NUM_RINGS + j;
+				seq_printf(m, "0x%08llx (0x%02llx) ",
+					   seqno[offset], offset * 8);
+			}
+			seq_putc(m, '\n');
+
+			seq_puts(m, "  Last wait:  ");
+			for (j = 0; j < I915_NUM_RINGS; j++) {
+				offset = i + (j * I915_NUM_RINGS);
+				seq_printf(m, "0x%08llx (0x%02llx) ",
+					   seqno[offset], offset * 8);
+			}
+			seq_putc(m, '\n');
+
+		}
+		kunmap_atomic(seqno);
+	} else {
+		seq_puts(m, "  Last signal:");
+		for_each_ring(ring, dev_priv, i)
+			for (j = 0; j < I915_NUM_RINGS; j++)
+				seq_printf(m, "0x%08x\n", I915_READ(ring->semaphore.signal_mbox[j]));
+		seq_putc(m, '\n');
+	}
+
+	seq_puts(m, "\nSync seqno:\n");
+	for_each_ring(ring, dev_priv, i) {
+		for (j = 0; j < I915_NUM_RINGS; j++) {
+			seq_printf(m, "  0x%08x ", ring->semaphore.sync_seqno[j]);
+		}
+		seq_putc(m, '\n');
+	}
+	seq_putc(m, '\n');
+
+	mutex_unlock(&dev->struct_mutex);
+	return 0;
+}
+
 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
@@ -3486,6 +3554,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_energy_uJ", i915_energy_uJ, 0},
 	{"i915_pc8_status", i915_pc8_status, 0},
 	{"i915_power_domain_info", i915_power_domain_info, 0},
+	{"i915_semaphore_status", i915_semaphore_status, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
-- 
1.8.5.3


* Re: [PATCH 04/13] drm/i915: Make semaphore updates more precise
  2014-01-29 19:55 ` [PATCH 04/13] drm/i915: Make semaphore updates more precise Ben Widawsky
@ 2014-01-30 11:25   ` Ville Syrjälä
  2014-02-11 16:08     ` Ben Widawsky
  2014-02-11 20:20   ` [PATCH] [v2] " Ben Widawsky
  1 sibling, 1 reply; 40+ messages in thread
From: Ville Syrjälä @ 2014-01-30 11:25 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Jan 29, 2014 at 11:55:24AM -0800, Ben Widawsky wrote:
> With the ring mask we now have an easy way to know the number of rings
> in the system, and therefore can accurately predict the number of dwords
> to emit for semaphore signalling. This was not possible (easily)
> previously.
> 
> There should be no functional impact, simply fewer instructions emitted.
> 
> While we're here, simply round up to 2 instead of the fancier
> rounding we did before, which rounded up per mbox, i.e. to 4.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 43 +++++++++++++++++----------------
>  1 file changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 70f7190..97789ff 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -635,24 +635,20 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
>  static int gen6_signal(struct intel_ring_buffer *signaller,
>  		       unsigned int num_dwords)
>  {
> +#define MBOX_UPDATE_DWORDS 4
>  	struct drm_device *dev = signaller->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_ring_buffer *useless;
> -	int i, ret;
> +	int i, ret, num_rings;
>  
> -	/* NB: In order to be able to do semaphore MBOX updates for varying
> -	 * number of rings, it's easiest if we round up each individual update
> -	 * to a multiple of 2 (since ring updates must always be a multiple of
> -	 * 2) even though the actual update only requires 3 dwords.
> -	 */
> -#define MBOX_UPDATE_DWORDS 4
> -	if (i915_semaphore_is_enabled(dev))
> -		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
> +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> +	num_dwords = round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);

num_dwords +=

Also round_up() is useless since it's already a multiple of 4. Or did
you mean to change it to emit only 3 dwords per mbox?

> +#undef MBOX_UPDATE_DWORDS
>  
> -	ret = intel_ring_begin(signaller, num_dwords);
> +	/* XXX: + 4 for the caller */
> +	ret = intel_ring_begin(signaller, num_dwords + 4);

The += earlier gets rid of the +4 here.
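
The two comments above can be sketched as a small standalone C fragment.
This is only an illustration, not the driver code: signal_dwords() and
ROUND_UP_POW2() are hypothetical names, and the macro is a local copy of
what I understand the kernel's power-of-two round_up() to do. It shows
that (num_rings - 1) * 4 is unchanged by rounding up to 2, and that
accumulating with "+=" keeps the caller's 4 dwords without a separate
"+ 4" at the intel_ring_begin() call.

```c
#include <assert.h>

#define MBOX_UPDATE_DWORDS 4

/* Local copy of the kernel's power-of-two round_up() semantics (assumed). */
#define ROUND_UP_POW2(x, y) ((((x) - 1) | ((y) - 1)) + 1)

/*
 * Hypothetical helper: dwords to reserve for one signal. caller_dwords is
 * the space the caller asked for (4 in gen6_add_request()).
 */
static unsigned int signal_dwords(unsigned int caller_dwords,
				  unsigned int num_rings)
{
	unsigned int mbox_dwords;

	assert(num_rings >= 1);
	mbox_dwords = (num_rings - 1) * MBOX_UPDATE_DWORDS;

	/* Already a multiple of 4, so rounding up to 2 changes nothing. */
	assert(ROUND_UP_POW2(mbox_dwords, 2u) == mbox_dwords);

	/* "+=" in the patch: add to the caller's request, don't overwrite it. */
	return caller_dwords + mbox_dwords;
}
```

With four rings and a caller asking for 4 dwords this reserves 16; the
rounding would only start to matter if the per-mbox update shrank to 3
dwords.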

>  	if (ret)
>  		return ret;
> -#undef MBOX_UPDATE_DWORDS
>  
>  	for_each_ring(useless, dev_priv, i) {
>  		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
> @@ -661,14 +657,11 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
>  			intel_ring_emit(signaller, mbox_reg);
>  			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
>  			intel_ring_emit(signaller, MI_NOOP);
> -		} else {
> -			intel_ring_emit(signaller, MI_NOOP);
> -			intel_ring_emit(signaller, MI_NOOP);
> -			intel_ring_emit(signaller, MI_NOOP);
> -			intel_ring_emit(signaller, MI_NOOP);
>  		}
>  	}
>  
> +	WARN_ON(i != num_rings);

So we're not expecting dev_priv->ring[] to be sparsely populated ever?
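
To make the concern concrete, here is a minimal sketch. It assumes that
for_each_ring() walks i over all I915_NUM_RINGS slots and merely skips
rings that are not initialized; that is my reading of the macro, not
something shown in this patch. The helpers below are hypothetical models,
not driver functions. Under that assumption i always ends up equal to
I915_NUM_RINGS, so the check would trip whenever ring_mask has fewer bits
set than there are slots, sparse or not.

```c
#include <assert.h>

#define I915_NUM_RINGS 4

/* Hypothetical stand-in for hweight_long(): count set bits in the mask. */
static int count_rings(unsigned long ring_mask)
{
	int n = 0;

	for (; ring_mask; ring_mask &= ring_mask - 1)
		n++;
	return n;
}

/*
 * Models a for_each_ring()-style loop (assumed behaviour): i walks the
 * whole array and absent rings are skipped. Returns i after the loop.
 */
static int final_index(unsigned long ring_mask)
{
	int i;

	for (i = 0; i < I915_NUM_RINGS; i++) {
		if (!(ring_mask & (1UL << i)))
			continue;	/* ring not present, nothing emitted */
	}
	return i;
}
```

For a mask of 0x7 (three rings), final_index() gives 4 while
count_rings() gives 3, which is the i != num_rings case.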

> +
>  	return 0;
>  }
>  
> @@ -686,7 +679,11 @@ gen6_add_request(struct intel_ring_buffer *ring)
>  {
>  	int ret;
>  
> -	ret = ring->semaphore.signal(ring, 4);
> +	if (ring->semaphore.signal)
> +		ret = ring->semaphore.signal(ring, 4);
> +	else
> +		ret = intel_ring_begin(ring, 4);
> +
>  	if (ret)
>  		return ret;
>  
> @@ -1881,7 +1878,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  		ring->get_seqno = gen6_ring_get_seqno;
>  		ring->set_seqno = ring_set_seqno;
>  		ring->semaphore.sync_to = gen6_ring_sync;
> -		ring->semaphore.signal = gen6_signal;
> +		if (i915_semaphore_is_enabled(dev))
> +			ring->semaphore.signal = gen6_signal;

I guess we could also set .sync_to conditionally, but it doesn't really
matter since we won't call it anyway w/o semaphores enabled.

>  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
>  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
>  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
> @@ -2058,7 +2056,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>  				gen6_ring_dispatch_execbuffer;
>  		}
>  		ring->semaphore.sync_to = gen6_ring_sync;
> -		ring->semaphore.signal = gen6_signal;
> +		if (i915_semaphore_is_enabled(dev))
> +			ring->semaphore.signal = gen6_signal;
>  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
>  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
>  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
> @@ -2116,7 +2115,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
>  	}
>  	ring->semaphore.sync_to = gen6_ring_sync;
> -	ring->semaphore.signal = gen6_signal;
> +	if (i915_semaphore_is_enabled(dev))
> +		ring->semaphore.signal = gen6_signal;
>  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
>  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
>  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
> @@ -2158,7 +2158,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
>  	}
>  	ring->semaphore.sync_to = gen6_ring_sync;
> -	ring->semaphore.signal = gen6_signal;
> +	if (i915_semaphore_is_enabled(dev))
> +		ring->semaphore.signal = gen6_signal;
>  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
>  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
>  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
> -- 
> 1.8.5.3
> 

-- 
Ville Syrjälä
Intel OTC


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-29 19:55 ` [PATCH 06/13] drm/i915/bdw: implement semaphore signal Ben Widawsky
@ 2014-01-30 12:38   ` Ville Syrjälä
  2014-01-30 12:46     ` Chris Wilson
  2014-02-11 22:11     ` Ben Widawsky
  0 siblings, 2 replies; 40+ messages in thread
From: Ville Syrjälä @ 2014-01-30 12:38 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Jan 29, 2014 at 11:55:26AM -0800, Ben Widawsky wrote:
> Semaphore signalling works similarly to previous GENs with the exception
> that the per ring mailboxes no longer exist. Instead you must define
> your own space, somewhere in the GTT.
> 
> The comments in the code define the layout I've opted for, which should
> be fairly future proof. Ie. I tried to define offsets in abstract terms
> (NUM_RINGS, seqno size, etc).
> 
> NOTE: If one wanted to move this to the HWSP they could. I've decided
> one 4k object would be easier to deal with, and provide potential wins
> with cache locality, but that's all speculative.
> 
> v2: Update the macro to not need the other ring's ring->id (Chris)
> Update the comment to use the correct formula (Chris)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   1 +
>  drivers/gpu/drm/i915/i915_reg.h         |   5 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
>  4 files changed, 197 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3673ba1..f521059 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
>  
>  	struct pci_dev *bridge_dev;
>  	struct intel_ring_buffer ring[I915_NUM_RINGS];
> +	struct drm_i915_gem_object *semaphore_obj;
>  	uint32_t last_seqno, next_seqno;
>  
>  	drm_dma_handle_t *status_page_dmah;
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index cbbaf26..8b745dc 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -216,7 +216,7 @@
>  #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
>  #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
>  #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
> -#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
> +#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
>  #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
>  #define   MI_SEMAPHORE_UPDATE	    (1<<21)
>  #define   MI_SEMAPHORE_COMPARE	    (1<<20)
> @@ -241,6 +241,8 @@
>  #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
>  #define   MI_FORCE_RESTORE		(1<<1)
>  #define   MI_RESTORE_INHIBIT		(1<<0)
> +#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
> +#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
>  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
>  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
>  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> @@ -329,6 +331,7 @@
>  #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
>  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
>  #define   PIPE_CONTROL_NOTIFY				(1<<8)
> +#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
>  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
>  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
>  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 37ae2b1..b750835 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -619,6 +619,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
>  static void render_ring_cleanup(struct intel_ring_buffer *ring)
>  {
>  	struct drm_device *dev = ring->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	if (dev_priv->semaphore_obj) {
> +		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> +		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
> +		dev_priv->semaphore_obj = NULL;
> +	}
>  
>  	if (ring->scratch.obj == NULL)
>  		return;
> @@ -632,6 +639,86 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
>  	ring->scratch.obj = NULL;
>  }
>  
> +static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
> +			   unsigned int num_dwords)
> +{
> +#define MBOX_UPDATE_DWORDS 8
> +	struct drm_device *dev = signaller->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_ring_buffer *waiter;
> +	int i, ret, num_rings;
> +
> +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> +	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;

Again num_dwords +=

> +#undef MBOX_UPDATE_DWORDS
> +
> +	/* XXX: + 4 for the caller */
> +	ret = intel_ring_begin(signaller, num_dwords + 4);

and the +4 goes away.

> +	if (ret)
> +		return ret;
> +
> +	for_each_ring(waiter, dev_priv, i) {
> +		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> +		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> +			continue;
> +
> +		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
> +		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
> +					   PIPE_CONTROL_QW_WRITE |
> +					   PIPE_CONTROL_FLUSH_ENABLE);
> +		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
> +		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> +		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> +		intel_ring_emit(signaller, 0);
> +		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> +					   MI_SEMAPHORE_TARGET(waiter->id));
> +		intel_ring_emit(signaller, 0);
> +	}
> +
> +	WARN_ON(i != num_rings);
> +
> +	return 0;
> +}

<snip>

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c69ae10..f1e7a66 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -111,6 +111,39 @@ struct  intel_ring_buffer {
>  #define I915_DISPATCH_PINNED 0x2
>  	void		(*cleanup)(struct intel_ring_buffer *ring);
>  
> +	/* GEN8 signal/wait table
> +	 *	  signal to  signal to    signal to   signal to
> +	 *	    RCS         VCS          BCS        VECS
> +	 *      ------------------------------------------------------
> +	 *  RCS | NOP (0x00) | VCS (0x08) | BCS (0x10) | VECS (0x18) |
> +	 *	|-----------------------------------------------------
> +	 *  VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) |
> +	 *	|-----------------------------------------------------
> +	 *  BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) |
> +	 *	|-----------------------------------------------------
> +	 * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) |  NOP (0x78) |
> +	 *	|-----------------------------------------------------
> +	 *
> +	 * Generalization:
> +	 *  f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id)
> +	 *  ie. transpose of g(x, y)
> +	 *
> +	 *	 sync from   sync from    sync from    sync from
> +	 *	    RCS         VCS          BCS        VECS
> +	 *      ------------------------------------------------------
> +	 *  RCS | NOP (0x00) | VCS (0x20) | BCS (0x40) | VECS (0x60) |
> +	 *	|-----------------------------------------------------
> +	 *  VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) |
> +	 *	|-----------------------------------------------------
> +	 *  BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x70) |
> +	 *	|-----------------------------------------------------
> +	 * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) |  NOP (0x78) |
> +	 *	|-----------------------------------------------------
> +	 *
> +	 * Generalization:
> +	 *  g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id)
> +	 *  ie. transpose of f(x, y)
> +	 */
>  	struct {
>  		u32	sync_seqno[I915_NUM_RINGS-1];
>  		/* AKA wait() */
> @@ -120,7 +153,10 @@ struct  intel_ring_buffer {
>  		/* our mbox written by others */
>  		u32		mbox[I915_NUM_RINGS];

mbox should also get a u64 friend, right?
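
For anyone checking the offsets in the quoted comment, the two
generalizations f() and g() can be evaluated with a short standalone
sketch. It assumes the ring ids are RCS=0, VCS=1, BCS=2, VECS=3 and
reports byte offsets relative to the start of the semaphore object; the
function names are hypothetical and the GGTT base is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RINGS 4			/* RCS, VCS, BCS, VECS */
#define SEQNO_SIZE sizeof(uint64_t)	/* qword slots, as in the patch */

enum ring_id { RCS = 0, VCS, BCS, VECS };	/* assumed id ordering */

/* f(x, y): slot that ring x writes when signalling ring y. */
static unsigned int signal_offset(enum ring_id x, enum ring_id y)
{
	return x * NUM_RINGS * SEQNO_SIZE + SEQNO_SIZE * y;
}

/* g(x, y): slot that ring x polls when waiting on ring y. */
static unsigned int wait_offset(enum ring_id x, enum ring_id y)
{
	return y * NUM_RINGS * SEQNO_SIZE + SEQNO_SIZE * x;
}
```

With these ids, signal_offset(RCS, VCS) is 0x08 and
wait_offset(BCS, VECS) is 0x70. The transpose relationship shows up as
signal_offset(a, b) == wait_offset(b, a) for every pair, i.e. the slot a
signaller writes for a given waiter is the slot that waiter reads.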

>  		/* mboxes this ring signals to */
> -		u32		signal_mbox[I915_NUM_RINGS];
> +		union {
> +			u32		signal_mbox[I915_NUM_RINGS];
> +			u64		signal_ggtt[I915_NUM_RINGS];
> +		};
>  
>  		/* num_dwords is space the caller will need for atomic update */
>  		int		(*signal)(struct intel_ring_buffer *signaller,
> -- 
> 1.8.5.3
> 

-- 
Ville Syrjälä
Intel OTC


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-30 12:38   ` Ville Syrjälä
@ 2014-01-30 12:46     ` Chris Wilson
  2014-01-30 13:18       ` Daniel Vetter
  2014-02-11 22:11     ` Ben Widawsky
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Wilson @ 2014-01-30 12:46 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky, Ben Widawsky

On Thu, Jan 30, 2014 at 02:38:17PM +0200, Ville Syrjälä wrote:
> On Wed, Jan 29, 2014 at 11:55:26AM -0800, Ben Widawsky wrote:
> > Semaphore signalling works similarly to previous GENs with the exception
> > that the per ring mailboxes no longer exist. Instead you must define
> > your own space, somewhere in the GTT.
> > 
> > The comments in the code define the layout I've opted for, which should
> > be fairly future proof. Ie. I tried to define offsets in abstract terms
> > (NUM_RINGS, seqno size, etc).
> > 
> > NOTE: If one wanted to move this to the HWSP they could. I've decided
> > one 4k object would be easier to deal with, and provide potential wins
> > with cache locality, but that's all speculative.
> > 
> > v2: Update the macro to not need the other ring's ring->id (Chris)
> > Update the comment to use the correct formula (Chris)
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |   1 +
> >  drivers/gpu/drm/i915/i915_reg.h         |   5 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
> >  4 files changed, 197 insertions(+), 46 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 3673ba1..f521059 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
> >  
> >  	struct pci_dev *bridge_dev;
> >  	struct intel_ring_buffer ring[I915_NUM_RINGS];
> > +	struct drm_i915_gem_object *semaphore_obj;
> >  	uint32_t last_seqno, next_seqno;
> >  
> >  	drm_dma_handle_t *status_page_dmah;
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index cbbaf26..8b745dc 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -216,7 +216,7 @@
> >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
> >  #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
> >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
> > -#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
> > +#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
> >  #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
> >  #define   MI_SEMAPHORE_UPDATE	    (1<<21)
> >  #define   MI_SEMAPHORE_COMPARE	    (1<<20)
> > @@ -241,6 +241,8 @@
> >  #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
> >  #define   MI_FORCE_RESTORE		(1<<1)
> >  #define   MI_RESTORE_INHIBIT		(1<<0)
> > +#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
> > +#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
> >  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> >  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
> >  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> > @@ -329,6 +331,7 @@
> >  #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
> >  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
> >  #define   PIPE_CONTROL_NOTIFY				(1<<8)
> > +#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */

Oh. So they changed how post-sync writes operated - this should be a
separate fix for stable I believe (so that batches are not run before we
have finished invalidating the TLBs required).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 07/13] drm/i915/bdw: implement semaphore wait
  2014-01-29 19:55 ` [PATCH 07/13] drm/i915/bdw: implement semaphore wait Ben Widawsky
@ 2014-01-30 12:48   ` Ville Syrjälä
  0 siblings, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2014-01-30 12:48 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Jan 29, 2014 at 11:55:27AM -0800, Ben Widawsky wrote:
> Semaphore waits use a new instruction, MI_SEMAPHORE_WAIT. The seqno to
> wait on is all well defined by the table in the previous patch. There is
> nothing else different from previous GEN's semaphore synchronization
> code.
> 
> v2: Update macros to not require the other ring's ring->id (Chris)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_reg.h         |  3 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 66 +++++++++++++++------------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 30 +++++++++++++++
>  3 files changed, 62 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 8b745dc..6e8edaf 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -243,6 +243,9 @@
>  #define   MI_RESTORE_INHIBIT		(1<<0)
>  #define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
>  #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
> +#define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
> +#define   MI_SEMAPHORE_POLL		(1<<15)
> +#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
>  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
>  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
>  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b750835..3cfcc78 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -797,6 +797,31 @@ static inline bool i915_gem_has_seqno_wrapped(struct drm_device *dev,
>   * @signaller - ring which has, or will signal
>   * @seqno - seqno which the waiter will block on
>   */
> +
> +static int
> +gen8_ring_sync(struct intel_ring_buffer *waiter,
> +	       struct intel_ring_buffer *signaller,
> +	       u32 seqno)
> +{
> +	struct drm_i915_private *dev_priv = waiter->dev->dev_private;
> +	int ret;
> +
> +	ret = intel_ring_begin(waiter, 4);
> +	if (ret)
> +		return ret;
> +
> +	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
> +				MI_SEMAPHORE_GLOBAL_GTT |
> +				MI_SEMAPHORE_SAD_GTE_SDD);
> +	intel_ring_emit(waiter, seqno);
> +	intel_ring_emit(waiter,
> +			lower_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
> +	intel_ring_emit(waiter,
> +			upper_32_bits(GEN8_WAIT_OFFSET(waiter, signaller->id)));
> +	intel_ring_advance(waiter);
> +	return 0;
> +}
> +
>  static int
>  gen6_ring_sync(struct intel_ring_buffer *waiter,
>  	       struct intel_ring_buffer *signaller,
> @@ -1939,39 +1964,6 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring,
>  	return 0;
>  }
>  
> -/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
> - * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
> - */
> -#define SEQNO_SIZE sizeof(uint64_t)
> -#define GEN8_SIGNAL_OFFSET(to) \
> -	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> -	(ring->id * I915_NUM_RINGS * SEQNO_SIZE) + \
> -	(SEQNO_SIZE * (to)))
> -
> -#define GEN8_WAIT_OFFSET(from) \
> -	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> -	((from) * I915_NUM_RINGS * SEQNO_SIZE) + \
> -	(SEQNO_SIZE * ring->id))
> -
> -#define GEN8_RING_SEMAPHORE_INIT do { \
> -	if (!dev_priv->semaphore_obj) { \
> -		break; \
> -	} \
> -	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
> -	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
> -	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
> -	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
> -	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(RCS); \
> -	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(VCS); \
> -	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(BCS); \
> -	ring->semaphore.mbox[VECS] = GEN8_WAIT_OFFSET(VECS); \
> -	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
> -	ring->semaphore.mbox[ring->id] = GEN6_NOSYNC; \
> -	} while(0)
> -#undef seqno_size
> -
> -
> -

Maybe stick this stuff into the header to begin with to avoid churn.

>  int intel_init_render_ring_buffer(struct drm_device *dev)
>  {
>  	drm_i915_private_t *dev_priv = dev->dev_private;
> @@ -2007,7 +1999,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>  		ring->get_seqno = gen6_ring_get_seqno;
>  		ring->set_seqno = ring_set_seqno;
> -		ring->semaphore.sync_to = gen6_ring_sync;
> +		ring->semaphore.sync_to = gen8_ring_sync;
>  		if (i915_semaphore_is_enabled(dev)) {
>  			BUG_ON(!dev_priv->semaphore_obj);
>  			ring->semaphore.signal = gen8_rcs_signal;
> @@ -2192,7 +2184,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>  			ring->irq_put = gen8_ring_put_irq;
>  			ring->dispatch_execbuffer =
>  				gen8_ring_dispatch_execbuffer;
> -			ring->semaphore.sync_to = gen6_ring_sync;
> +			ring->semaphore.sync_to = gen8_ring_sync;
>  			if (i915_semaphore_is_enabled(dev)) {
>  				ring->semaphore.signal = gen8_xcs_signal;
>  				GEN8_RING_SEMAPHORE_INIT;
> @@ -2257,7 +2249,7 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>  		ring->irq_get = gen8_ring_get_irq;
>  		ring->irq_put = gen8_ring_put_irq;
>  		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
> -		ring->semaphore.sync_to = gen6_ring_sync;
> +		ring->semaphore.sync_to = gen8_ring_sync;
>  		if (i915_semaphore_is_enabled(dev)) {
>  			ring->semaphore.signal = gen8_xcs_signal;
>  			GEN8_RING_SEMAPHORE_INIT;
> @@ -2306,7 +2298,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>  		ring->irq_get = gen8_ring_get_irq;
>  		ring->irq_put = gen8_ring_put_irq;
>  		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
> -		ring->semaphore.sync_to = gen6_ring_sync;
> +		ring->semaphore.sync_to = gen8_ring_sync;
>  		if (i915_semaphore_is_enabled(dev)) {
>  			ring->semaphore.signal = gen8_xcs_signal;
>  			GEN8_RING_SEMAPHORE_INIT;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index f1e7a66..ed55370 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -33,6 +33,36 @@ struct  intel_hw_status_page {
>  #define I915_READ_IMR(ring) I915_READ(RING_IMR((ring)->mmio_base))
>  #define I915_WRITE_IMR(ring, val) I915_WRITE(RING_IMR((ring)->mmio_base), val)
>  
> +/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
> + * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
> + */
> +#define i915_semaphore_seqno_size sizeof(uint64_t)
> +#define GEN8_SIGNAL_OFFSET(to) \
> +	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> +	(ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
> +	(i915_semaphore_seqno_size * (to)))
> +
> +#define GEN8_WAIT_OFFSET(__ring, from) \
> +	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> +	((from) * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
> +	(i915_semaphore_seqno_size * (__ring)->id))
> +
> +#define GEN8_RING_SEMAPHORE_INIT do { \
> +	if (!dev_priv->semaphore_obj) { \
> +		break; \
> +	} \
> +	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
> +	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
> +	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
> +	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
> +	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(ring, RCS); \
> +	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(ring, VCS); \
> +	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(ring, BCS); \
> +	ring->semaphore.mbox[VECS] = GEN8_WAIT_OFFSET(ring, VECS); \
> +	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
> +	ring->semaphore.mbox[ring->id] = GEN6_NOSYNC; \
> +	} while(0)
> +
>  enum intel_ring_hangcheck_action {
>  	HANGCHECK_IDLE = 0,
>  	HANGCHECK_WAIT,
> -- 
> 1.8.5.3
> 

-- 
Ville Syrjälä
Intel OTC

* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-30 12:46     ` Chris Wilson
@ 2014-01-30 13:18       ` Daniel Vetter
  2014-01-30 13:25         ` Chris Wilson
  2014-01-30 13:35         ` Chris Wilson
  0 siblings, 2 replies; 40+ messages in thread
From: Daniel Vetter @ 2014-01-30 13:18 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä,
	Ben Widawsky, Intel GFX, Ben Widawsky

On Thu, Jan 30, 2014 at 1:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Oh. So they changed how post-sync writes operated - this should be a
> separate fix for stable I believe (so that batches are not run before we
> have finished invalidating the TLBs required).

We have an igt to exercise tlb invalidation stuff, which runs on all
rings. But it only runs a batch, so only uses the CS tlb. Do we need
to extend this?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-30 13:18       ` Daniel Vetter
@ 2014-01-30 13:25         ` Chris Wilson
  2014-01-30 13:35         ` Chris Wilson
  1 sibling, 0 replies; 40+ messages in thread
From: Chris Wilson @ 2014-01-30 13:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel GFX, Ben Widawsky, Ben Widawsky

On Thu, Jan 30, 2014 at 02:18:32PM +0100, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 1:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Oh. So they changed how post-sync writes operated - this should be a
> > separate fix for stable I believe (so that batches are not run before we
> > have finished invalidating the TLBs required).
> 
> We have an igt to exercise tlb invalidation stuff, which runs on all
> rings. But it only runs a batch, so only uses the CS tlb. Do we need
> to extend this?

You could try and catch out the sampler. Or it may be that the
hardware internally serialises the operation of invalidating the TLBs
and lookup. Or it may be just such a slim window that it will only be
hit during a demo and never a test case ;)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 09/13] drm/i915/bdw: poll semaphores
  2014-01-29 19:55 ` [PATCH 09/13] drm/i915/bdw: poll semaphores Ben Widawsky
@ 2014-01-30 13:26   ` Ville Syrjälä
  0 siblings, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2014-01-30 13:26 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Jan 29, 2014 at 11:55:29AM -0800, Ben Widawsky wrote:
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 3cfcc78..3a3ba81 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -812,6 +812,7 @@ gen8_ring_sync(struct intel_ring_buffer *waiter,
>  
>  	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
>  				MI_SEMAPHORE_GLOBAL_GTT |
> +				MI_SEMAPHORE_POLL |
>  				MI_SEMAPHORE_SAD_GTE_SDD);

I was thinking that we shouldn't need this. However the docs suck a bit
and they don't actually specify whether the hardware will wait for the
signal before even checking the semaphore once. But that sounds so
wrong that it can't possibly be true.

>  	intel_ring_emit(waiter, seqno);
>  	intel_ring_emit(waiter,
> -- 
> 1.8.5.3
> 

-- 
Ville Syrjälä
Intel OTC

* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-30 13:18       ` Daniel Vetter
  2014-01-30 13:25         ` Chris Wilson
@ 2014-01-30 13:35         ` Chris Wilson
  2014-02-11 21:48           ` Ben Widawsky
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Wilson @ 2014-01-30 13:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel GFX, Ben Widawsky, Ben Widawsky

On Thu, Jan 30, 2014 at 02:18:32PM +0100, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 1:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Oh. So they changed how post-sync writes operated - this should be a
> > separate fix for stable I believe (so that batches are not run before we
> > have finished invalidating the TLBs required).
> 
> We have an igt to exercise tlb invalidation stuff, which runs on all
> rings. But it only runs a batch, so only uses the CS tlb. Do we need
> to extend this?

So the spec says:

Pipe Control Flush Enable (IVB+)
If ENABLED, the PIPE_CONTROL command will wait until all previous writes
of immediate data from post sync circles are complete before executing
the next command.

Post Sync Operation
This field specifies an optional action to be taken upon completion of
the synchronization operation.

TLB Invalidate
If ENABLED, all TLBs belonging to Render Engine will be invalidated once
the flush operation is complete.

Command Streamer Stall Enable
If ENABLED, the sync operation will not occur until all previous flush
operations pending a completion of those previous flushes will complete,
including the flush produced from this command. This enables the command
to act similar to the legacy MI_FLUSH command.

Going by that, the order is

flush, stall, TLB invalidate / post-sync op, [pipe control flush]

Based on my reading of the above (unless someone has a more definitive
source), without PIPE_CONTROL_FLUSH_ENABLE the CS can continue
operations as soon as the flush is complete - in parallel to the TLB
invalidate. Adding PIPE_CONTROL_FLUSH_ENABLE would then stall the CS
until the post-sync operation completes. That still leaves the
possibility that the TLB invalidate is being performed in parallel and
itself provides no CS sync.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 11/13] drm/i915/bdw: collect semaphore error state
  2014-01-29 19:55 ` [PATCH 11/13] drm/i915/bdw: collect semaphore error state Ben Widawsky
@ 2014-01-30 14:53   ` Ville Syrjälä
  2014-01-30 14:58     ` Chris Wilson
  2014-02-12  0:23     ` Ben Widawsky
  0 siblings, 2 replies; 40+ messages in thread
From: Ville Syrjälä @ 2014-01-30 14:53 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Wed, Jan 29, 2014 at 11:55:31AM -0800, Ben Widawsky wrote:
> Since the semaphore information is in an object, just dump it, and let
> the user parse it later.
> 
> > NOTE: The page being used for the semaphores is incoherent with the
> CPU. No matter what I do, I cannot figure out a way to read anything but
> 0s. Note that the semaphore waits are indeed working.
> 
> v2: Don't print signal, and wait (they should be the same). Instead,
> print sync_seqno (Chris)
> 
> v3: Free the semaphore error object (Chris)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c   | 47 ++++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 12 ++++-----
>  3 files changed, 51 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index f521059..b08e6eb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -313,6 +313,7 @@ struct drm_i915_error_state {
>  	u32 acthd[I915_NUM_RINGS];
>  	u32 semaphore_mboxes[I915_NUM_RINGS][I915_NUM_RINGS - 1];
>  	u32 semaphore_seqno[I915_NUM_RINGS][I915_NUM_RINGS - 1];
> +	struct drm_i915_error_object *semaphore_obj;
>  	u32 rc_psmi[I915_NUM_RINGS]; /* sleep state */
>  	/* our own tracking of ring head and tail */
>  	u32 cpu_ring_head[I915_NUM_RINGS];
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index efaad96..d6afc01 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -297,6 +297,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  	struct drm_device *dev = error_priv->dev;
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	struct drm_i915_error_state *error = error_priv->error;
> +	struct drm_i915_error_object *obj;
>  	int i, j, page, offset, elt;
>  
>  	if (!error) {
> @@ -345,8 +346,6 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  				    error->pinned_bo_count[0]);
>  
>  	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
> -		struct drm_i915_error_object *obj;
> -
>  		if ((obj = error->ring[i].batchbuffer)) {
>  			err_printf(m, "%s --- gtt_offset = 0x%08x\n",
>  				   dev_priv->ring[i].name,
> @@ -421,6 +420,19 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  		}
>  	}
>  
> +	obj = error->semaphore_obj;
> +	if (obj) {

Chris will come along and change this to

if ((obj = error->semaphore_obj))


> +		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
> +		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
> +			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
> +				   elt * 4,
> +				   obj->pages[0][elt],
> +				   obj->pages[0][elt+1],
> +				   obj->pages[0][elt+2],
> +				   obj->pages[0][elt+3]);
> +		}

That'll be the third copy of this page dumping code. Time to refactor?

> +	}
> +
>  	if (error->overlay)
>  		intel_overlay_print_error_state(m, error->overlay);
>  
> @@ -491,6 +503,7 @@ static void i915_error_state_free(struct kref *error_ref)
>  		kfree(error->ring[i].requests);
>  	}
>  
> +	i915_error_object_free(error->semaphore_obj);
>  	kfree(error->active_bo);
>  	kfree(error->overlay);
>  	kfree(error->display);
> @@ -772,6 +785,31 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
>  	}
>  }
>  
> +static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
> +					struct drm_i915_error_state *error,
> +					struct intel_ring_buffer *ring)
> +{
> +	struct intel_ring_buffer *useless;
> +	int i;
> +
> +	if (!i915_semaphore_is_enabled(dev_priv->dev))
> +		return;
> +
> +	if (!error->semaphore_obj)
> +		error->semaphore_obj =
> +			i915_error_object_create(dev_priv,
> +						 dev_priv->semaphore_obj,
> +						 &dev_priv->gtt.base);
> +
> +	for_each_ring(useless, dev_priv, i) {
> +		u16 signal_offset = GEN8_SIGNAL_OFFSET(ring, i) / 4;

GEN8_SIGNAL_OFFSET() returns the full ggtt offset.

> +		u32 *tmp = error->semaphore_obj->pages[0];
> +
> +		error->semaphore_mboxes[ring->id][i] = tmp[signal_offset];
> +		error->semaphore_seqno[ring->id][i] = ring->semaphore.sync_seqno[i];
> +	}
> +}
> +
>  static void i915_record_ring_state(struct drm_device *dev,
>  				   struct drm_i915_error_state *error,
>  				   struct intel_ring_buffer *ring)
> @@ -781,7 +819,10 @@ static void i915_record_ring_state(struct drm_device *dev,
>  	if (INTEL_INFO(dev)->gen >= 6) {
>  		error->rc_psmi[ring->id] = I915_READ(ring->mmio_base + 0x50);
>  		error->fault_reg[ring->id] = I915_READ(RING_FAULT_REG(ring));
> -		gen6_record_semaphore_state(dev_priv, error, ring);
> +		if (INTEL_INFO(dev)->gen >= 8)
> +			gen8_record_semaphore_state(dev_priv, error, ring);
> +		else
> +			gen6_record_semaphore_state(dev_priv, error, ring);
>  	}
>  
>  	if (INTEL_INFO(dev)->gen >= 4) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index ed55370..4ca2789 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -37,9 +37,9 @@ struct  intel_hw_status_page {
>   * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
>   */
>  #define i915_semaphore_seqno_size sizeof(uint64_t)
> -#define GEN8_SIGNAL_OFFSET(to) \
> +#define GEN8_SIGNAL_OFFSET(__ring, to) \
>  	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> -	(ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
> +	((__ring)->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
>  	(i915_semaphore_seqno_size * (to)))
>  
>  #define GEN8_WAIT_OFFSET(__ring, from) \
> @@ -51,10 +51,10 @@ struct  intel_hw_status_page {
>  	if (!dev_priv->semaphore_obj) { \
>  		break; \
>  	} \
> -	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
> -	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
> -	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
> -	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
> +	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \
> +	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \
> +	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \
> +	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \
>  	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(ring, RCS); \
>  	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(ring, VCS); \
>  	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(ring, BCS); \
> -- 
> 1.8.5.3
> 

-- 
Ville Syrjälä
Intel OTC

* Re: [PATCH 11/13] drm/i915/bdw: collect semaphore error state
  2014-01-30 14:53   ` Ville Syrjälä
@ 2014-01-30 14:58     ` Chris Wilson
  2014-02-12  0:19       ` Ben Widawsky
  2014-02-12  0:23     ` Ben Widawsky
  1 sibling, 1 reply; 40+ messages in thread
From: Chris Wilson @ 2014-01-30 14:58 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky, Ben Widawsky

On Thu, Jan 30, 2014 at 04:53:32PM +0200, Ville Syrjälä wrote:
> On Wed, Jan 29, 2014 at 11:55:31AM -0800, Ben Widawsky wrote:
> > +	obj = error->semaphore_obj;
> > +	if (obj) {
> 
> Chris will come along and change this to
> 
> if ((obj = error->semaphore_obj))

I was merely keeping in style with the rest of the code. Which was
probably written by me, so I can't win!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 04/13] drm/i915: Make semaphore updates more precise
  2014-01-30 11:25   ` Ville Syrjälä
@ 2014-02-11 16:08     ` Ben Widawsky
  2014-02-11 17:13       ` Ville Syrjälä
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 16:08 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky

On Thu, Jan 30, 2014 at 01:25:42PM +0200, Ville Syrjälä wrote:
> On Wed, Jan 29, 2014 at 11:55:24AM -0800, Ben Widawsky wrote:
> > With the ring mask we now have an easy way to know the number of rings
> > in the system, and therefore can accurately predict the number of dwords
> > to emit for semaphore signalling. This was not possible (easily)
> > previously.
> > 
> > There should be no functional impact, simply fewer instructions emitted.
> > 
> > While we're here, simply do the round up to 2 instead of the fancier
> > rounding we did before, which rounded up per mbox, ie. to 4.
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 43 +++++++++++++++++----------------
> >  1 file changed, 22 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 70f7190..97789ff 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -635,24 +635,20 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
> >  static int gen6_signal(struct intel_ring_buffer *signaller,
> >  		       unsigned int num_dwords)
> >  {
> > +#define MBOX_UPDATE_DWORDS 4
> >  	struct drm_device *dev = signaller->dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct intel_ring_buffer *useless;
> > -	int i, ret;
> > +	int i, ret, num_rings;
> >  
> > -	/* NB: In order to be able to do semaphore MBOX updates for varying
> > -	 * number of rings, it's easiest if we round up each individual update
> > -	 * to a multiple of 2 (since ring updates must always be a multiple of
> > -	 * 2) even though the actual update only requires 3 dwords.
> > -	 */
> > -#define MBOX_UPDATE_DWORDS 4
> > -	if (i915_semaphore_is_enabled(dev))
> > -		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
> > +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> > +	num_dwords = round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
> 
> num_dwords +=
> 

Hmm. I think I may have submitted the wrong patch here since I was
pretty certain Chris had caught this before. Anyway, thanks.

> Also round_up() is useless since it's already a multiple of 4. Or did
> you mean to change it to emit only 3 dwords per mbox?

Yep, I meant to change it to 3, thanks. Ditto on the 'maybe this was the
wrong patch' theory.

> 
> > +#undef MBOX_UPDATE_DWORDS
> >  
> > -	ret = intel_ring_begin(signaller, num_dwords);
> > +	/* XXX: + 4 for the caller */
> > +	ret = intel_ring_begin(signaller, num_dwords + 4);
> 
> The += earlier gets rid of the +4 here.
> 
> >  	if (ret)
> >  		return ret;
> > -#undef MBOX_UPDATE_DWORDS
> >  
> >  	for_each_ring(useless, dev_priv, i) {
> >  		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
> > @@ -661,14 +657,11 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
> >  			intel_ring_emit(signaller, mbox_reg);
> >  			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> >  			intel_ring_emit(signaller, MI_NOOP);
> > -		} else {
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -			intel_ring_emit(signaller, MI_NOOP);
> >  		}
> >  	}
> >  
> > +	WARN_ON(i != num_rings);
> 
> So we're not expecting dev_priv->ring[] to be sparsely populated ever?

I'll never say "ever." Currently however it is unexpected. I suppose it
would be nicer to hit the WARN sooner in the init path, but I'm not sure
how offended you are by this. The code is still slightly muddy with
regards to rings on the HW vs. enabled rings - but it's a separate issue
which is slowly getting better.

In either case, there is a bug here because after for_each_ring, i will
be > num_rings on SNB and IVB.

> 
> > +
> >  	return 0;
> >  }
> >  
> > @@ -686,7 +679,11 @@ gen6_add_request(struct intel_ring_buffer *ring)
> >  {
> >  	int ret;
> >  
> > -	ret = ring->semaphore.signal(ring, 4);
> > +	if (ring->semaphore.signal)
> > +		ret = ring->semaphore.signal(ring, 4);
> > +	else
> > +		ret = intel_ring_begin(ring, 4);
> > +
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -1881,7 +1878,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> >  		ring->get_seqno = gen6_ring_get_seqno;
> >  		ring->set_seqno = ring_set_seqno;
> >  		ring->semaphore.sync_to = gen6_ring_sync;
> > -		ring->semaphore.signal = gen6_signal;
> > +		if (i915_semaphore_is_enabled(dev))
> > +			ring->semaphore.signal = gen6_signal;
> 
> I guess we could also set .sync_to conditionally, but doesn't really
> matter since we won't call it anyway w/o semaphores enabled.
> 

I prefer that. Not sure why I didn't do it in the first place.

> >  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
> >  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
> >  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
> > @@ -2058,7 +2056,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
> >  				gen6_ring_dispatch_execbuffer;
> >  		}
> >  		ring->semaphore.sync_to = gen6_ring_sync;
> > -		ring->semaphore.signal = gen6_signal;
> > +		if (i915_semaphore_is_enabled(dev))
> > +			ring->semaphore.signal = gen6_signal;
> >  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
> >  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
> >  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
> > @@ -2116,7 +2115,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
> >  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> >  	}
> >  	ring->semaphore.sync_to = gen6_ring_sync;
> > -	ring->semaphore.signal = gen6_signal;
> > +	if (i915_semaphore_is_enabled(dev))
> > +		ring->semaphore.signal = gen6_signal;
> >  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
> >  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
> >  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
> > @@ -2158,7 +2158,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
> >  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> >  	}
> >  	ring->semaphore.sync_to = gen6_ring_sync;
> > -	ring->semaphore.signal = gen6_signal;
> > +	if (i915_semaphore_is_enabled(dev))
> > +		ring->semaphore.signal = gen6_signal;
> >  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
> >  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
> >  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
> -- 
> Ville Syrjälä
> Intel OTC

FYI: here is the diff I am using:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 97789ff..3bec0f5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -635,18 +635,18 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
 static int gen6_signal(struct intel_ring_buffer *signaller,
 		       unsigned int num_dwords)
 {
-#define MBOX_UPDATE_DWORDS 4
+#define MBOX_UPDATE_DWORDS 3
 	struct drm_device *dev = signaller->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *useless;
 	int i, ret, num_rings;
 
 	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
-	num_dwords = round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
+	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
 #undef MBOX_UPDATE_DWORDS
 
 	/* XXX: + 4 for the caller */
-	ret = intel_ring_begin(signaller, num_dwords + 4);
+	ret = intel_ring_begin(signaller, num_dwords);
 	if (ret)
 		return ret;
 
@@ -656,7 +656,6 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit(signaller, mbox_reg);
 			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
-			intel_ring_emit(signaller, MI_NOOP);
 		}
 	}
 
@@ -1877,9 +1876,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
-		ring->semaphore.sync_to = gen6_ring_sync;
-		if (i915_semaphore_is_enabled(dev))
+		if (i915_semaphore_is_enabled(dev)) {
+			ring->semaphore.sync_to = gen6_ring_sync;
 			ring->semaphore.signal = gen6_signal;
+		}
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
@@ -2055,9 +2055,10 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
 		}
-		ring->semaphore.sync_to = gen6_ring_sync;
-		if (i915_semaphore_is_enabled(dev))
+		if (i915_semaphore_is_enabled(dev)) {
+			ring->semaphore.sync_to = gen6_ring_sync;
 			ring->semaphore.signal = gen6_signal;
+		}
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
@@ -2114,9 +2115,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen6_ring_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->semaphore.sync_to = gen6_ring_sync;
-	if (i915_semaphore_is_enabled(dev))
+	if (i915_semaphore_is_enabled(dev)) {
 		ring->semaphore.signal = gen6_signal;
+		ring->semaphore.sync_to = gen6_ring_sync;
+	}
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
@@ -2157,9 +2159,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_put = hsw_vebox_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->semaphore.sync_to = gen6_ring_sync;
-	if (i915_semaphore_is_enabled(dev))
+	if (i915_semaphore_is_enabled(dev)) {
+		ring->semaphore.sync_to = gen6_ring_sync;
 		ring->semaphore.signal = gen6_signal;
+	}
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 04/13] drm/i915: Make semaphore updates more precise
  2014-02-11 16:08     ` Ben Widawsky
@ 2014-02-11 17:13       ` Ville Syrjälä
  0 siblings, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2014-02-11 17:13 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 08:08:27AM -0800, Ben Widawsky wrote:
> On Thu, Jan 30, 2014 at 01:25:42PM +0200, Ville Syrjälä wrote:
> > On Wed, Jan 29, 2014 at 11:55:24AM -0800, Ben Widawsky wrote:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 97789ff..3bec0f5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -635,18 +635,18 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
>  static int gen6_signal(struct intel_ring_buffer *signaller,
>  		       unsigned int num_dwords)
>  {
> -#define MBOX_UPDATE_DWORDS 4
> +#define MBOX_UPDATE_DWORDS 3
>  	struct drm_device *dev = signaller->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_ring_buffer *useless;
>  	int i, ret, num_rings;
>  
>  	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> -	num_dwords = round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
> +	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
>  #undef MBOX_UPDATE_DWORDS
>  
>  	/* XXX: + 4 for the caller */
> -	ret = intel_ring_begin(signaller, num_dwords + 4);
> +	ret = intel_ring_begin(signaller, num_dwords);
>  	if (ret)
>  		return ret;
>  
> @@ -656,7 +656,6 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
>  			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
>  			intel_ring_emit(signaller, mbox_reg);
>  			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> -			intel_ring_emit(signaller, MI_NOOP);
>  		}
>  	}

Still need to emit an extra MI_NOOP if num_dwords got rounded.
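
To spell out the arithmetic: with 3 dwords per mailbox update the loop emits
(num_rings - 1) * 3 dwords, which is odd whenever num_rings is even, so the
round_up() reserves one dword more than the loop writes. A standalone sketch
of that calculation follows; the helper names are invented for illustration
and it assumes every other ring gets exactly one mailbox update.

```c
#include <stdbool.h>

/* One update is MI_LOAD_REGISTER_IMM(1), the mailbox register, the seqno. */
#define MBOX_UPDATE_DWORDS 3

/* Dwords the mailbox loop writes: one update per other ring. */
static unsigned int mbox_dwords(unsigned int num_rings)
{
	return (num_rings - 1) * MBOX_UPDATE_DWORDS;
}

/* Same count rounded up to an even number, like round_up(x, 2). */
static unsigned int mbox_dwords_reserved(unsigned int num_rings)
{
	return (mbox_dwords(num_rings) + 1) & ~1u;
}

/* True when the reservation is one dword larger than what the loop emits,
 * i.e. when a trailing MI_NOOP is needed to keep the tail consistent. */
static bool needs_pad_noop(unsigned int num_rings)
{
	return mbox_dwords_reserved(num_rings) != mbox_dwords(num_rings);
}
```

So a 3-ring part emits 6 dwords and needs no padding, while a 4-ring part
emits 9, reserves 10, and needs the extra MI_NOOP.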

>  
> @@ -1877,9 +1876,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>  		ring->get_seqno = gen6_ring_get_seqno;
>  		ring->set_seqno = ring_set_seqno;
> -		ring->semaphore.sync_to = gen6_ring_sync;
> -		if (i915_semaphore_is_enabled(dev))
> +		if (i915_semaphore_is_enabled(dev)) {
> +			ring->semaphore.sync_to = gen6_ring_sync;
>  			ring->semaphore.signal = gen6_signal;
> +		}
>  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
>  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
>  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH] [v2] drm/i915: Make semaphore updates more precise
  2014-01-29 19:55 ` [PATCH 04/13] drm/i915: Make semaphore updates more precise Ben Widawsky
  2014-01-30 11:25   ` Ville Syrjälä
@ 2014-02-11 20:20   ` Ben Widawsky
  2014-02-11 20:53     ` Ville Syrjälä
  1 sibling, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 20:20 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky, Ben Widawsky

With the ring mask we now have an easy way to know the number of rings
in the system, and therefore can accurately predict the number of dwords
to emit for semaphore signalling. This was not possible (easily)
previously.

There should be no functional impact, simply fewer instructions emitted.

While we're here, simply round the total up to a multiple of 2 instead
of the fancier rounding we did before, which rounded up per mbox, i.e.
to 4 dwords. This also allows us to drop the unnecessary MI_NOOP, so
each update is really 3 dwords, not 4.

v2: Use 3 dwords instead of 4 (Ville)
Do the proper calculation to get the number of dwords to emit (Ville)
Conditionally set .sync_to when semaphores are enabled (Ville)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 55 ++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 70f7190..483684f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -635,24 +635,19 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
 static int gen6_signal(struct intel_ring_buffer *signaller,
 		       unsigned int num_dwords)
 {
+#define MBOX_UPDATE_DWORDS 3
 	struct drm_device *dev = signaller->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *useless;
-	int i, ret;
+	int i, ret, num_rings;
 
-	/* NB: In order to be able to do semaphore MBOX updates for varying
-	 * number of rings, it's easiest if we round up each individual update
-	 * to a multiple of 2 (since ring updates must always be a multiple of
-	 * 2) even though the actual update only requires 3 dwords.
-	 */
-#define MBOX_UPDATE_DWORDS 4
-	if (i915_semaphore_is_enabled(dev))
-		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
+	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
+	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
+#undef MBOX_UPDATE_DWORDS
 
 	ret = intel_ring_begin(signaller, num_dwords);
 	if (ret)
 		return ret;
-#undef MBOX_UPDATE_DWORDS
 
 	for_each_ring(useless, dev_priv, i) {
 		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
@@ -660,15 +655,13 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit(signaller, mbox_reg);
 			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
-			intel_ring_emit(signaller, MI_NOOP);
-		} else {
-			intel_ring_emit(signaller, MI_NOOP);
-			intel_ring_emit(signaller, MI_NOOP);
-			intel_ring_emit(signaller, MI_NOOP);
-			intel_ring_emit(signaller, MI_NOOP);
 		}
 	}
 
+	/* If num_dwords was rounded, make sure the tail pointer is correct */
+	if (num_rings % 2 == 0)
+		intel_ring_emit(signaller, MI_NOOP);
+
 	return 0;
 }
 
@@ -686,7 +679,11 @@ gen6_add_request(struct intel_ring_buffer *ring)
 {
 	int ret;
 
-	ret = ring->semaphore.signal(ring, 4);
+	if (ring->semaphore.signal)
+		ret = ring->semaphore.signal(ring, 4);
+	else
+		ret = intel_ring_begin(ring, 4);
+
 	if (ret)
 		return ret;
 
@@ -1880,8 +1877,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		ring->get_seqno = gen6_ring_get_seqno;
 		ring->set_seqno = ring_set_seqno;
-		ring->semaphore.sync_to = gen6_ring_sync;
-		ring->semaphore.signal = gen6_signal;
+		if (i915_semaphore_is_enabled(dev)) {
+			ring->semaphore.sync_to = gen6_ring_sync;
+			ring->semaphore.signal = gen6_signal;
+		}
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
@@ -2057,8 +2056,10 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			ring->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
 		}
-		ring->semaphore.sync_to = gen6_ring_sync;
-		ring->semaphore.signal = gen6_signal;
+		if (i915_semaphore_is_enabled(dev)) {
+			ring->semaphore.sync_to = gen6_ring_sync;
+			ring->semaphore.signal = gen6_signal;
+		}
 		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
 		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
 		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
@@ -2115,8 +2116,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		ring->irq_put = gen6_ring_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->semaphore.sync_to = gen6_ring_sync;
-	ring->semaphore.signal = gen6_signal;
+	if (i915_semaphore_is_enabled(dev)) {
+		ring->semaphore.signal = gen6_signal;
+		ring->semaphore.sync_to = gen6_ring_sync;
+	}
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
@@ -2157,8 +2160,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		ring->irq_put = hsw_vebox_put_irq;
 		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 	}
-	ring->semaphore.sync_to = gen6_ring_sync;
-	ring->semaphore.signal = gen6_signal;
+	if (i915_semaphore_is_enabled(dev)) {
+		ring->semaphore.sync_to = gen6_ring_sync;
+		ring->semaphore.signal = gen6_signal;
+	}
 	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
 	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
 	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
-- 
1.8.5.4

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] [v2] drm/i915: Make semaphore updates more precise
  2014-02-11 20:20   ` [PATCH] [v2] " Ben Widawsky
@ 2014-02-11 20:53     ` Ville Syrjälä
  2014-02-11 21:50       ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Ville Syrjälä @ 2014-02-11 20:53 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 12:20:42PM -0800, Ben Widawsky wrote:
> With the ring mask we now have an easy way to know the number of rings
> in the system, and therefore can accurately predict the number of dwords
> to emit for semaphore signalling. This was not possible (easily)
> previously.
> 
> There should be no functional impact, simply fewer instructions emitted.
> 
> While we're here, simply round the total up to a multiple of 2 instead
> of the fancier rounding we did before, which rounded up per mbox, i.e.
> to 4 dwords. This also allows us to drop the unnecessary MI_NOOP, so
> each update is really 3 dwords, not 4.
> 
> v2: Use 3 dwords instead of 4 (Ville)
> Do the proper calculation to get the number of dwords to emit (Ville)
> Conditionally set .sync_to when semaphores are enabled (Ville)
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Yeah looks OK now. Well, assuming we don't keep going when we fail to
init one or more rings, because in that case the loop would fail to emit
all the dwords it was supposed to.
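
A rough model of that failure case, as a standalone sketch rather than driver
code: the reservation is derived from the number of rings the hardware
advertises (the ring mask), while the loop only walks rings that actually
initialized. The function names are invented, and the model assumes the
signaller itself is initialized and skips its own mailbox.

```c
#define MBOX_UPDATE_DWORDS 3

/* Space reserved up front: caller's dwords plus the rounded mailbox total,
 * computed from the rings the hardware advertises. */
static unsigned int reserved_dwords(unsigned int hw_rings,
				    unsigned int caller_dwords)
{
	unsigned int mbox = (hw_rings - 1) * MBOX_UPDATE_DWORDS;

	return caller_dwords + ((mbox + 1) & ~1u);
}

/* Dwords actually written: one update per other initialized ring, the
 * optional padding MI_NOOP, then whatever the caller emits afterwards. */
static unsigned int emitted_dwords(unsigned int hw_rings,
				   unsigned int initialized_rings,
				   unsigned int caller_dwords)
{
	unsigned int n = (initialized_rings - 1) * MBOX_UPDATE_DWORDS;

	if (hw_rings % 2 == 0)
		n += 1;	/* padding MI_NOOP for the rounded reservation */

	return n + caller_dwords;
}
```

With 4 rings advertised, all 4 initialized and 4 caller dwords, both sides
come to 14. If one ring failed to initialize the loop writes only 11 of the
14 reserved dwords, which is the mismatch described above.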

IIRC the rest of the patches looked good up to 05/11. So for patches
01-05:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 55 ++++++++++++++++++---------------
>  1 file changed, 30 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 70f7190..483684f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -635,24 +635,19 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
>  static int gen6_signal(struct intel_ring_buffer *signaller,
>  		       unsigned int num_dwords)
>  {
> +#define MBOX_UPDATE_DWORDS 3
>  	struct drm_device *dev = signaller->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_ring_buffer *useless;
> -	int i, ret;
> +	int i, ret, num_rings;
>  
> -	/* NB: In order to be able to do semaphore MBOX updates for varying
> -	 * number of rings, it's easiest if we round up each individual update
> -	 * to a multiple of 2 (since ring updates must always be a multiple of
> -	 * 2) even though the actual update only requires 3 dwords.
> -	 */
> -#define MBOX_UPDATE_DWORDS 4
> -	if (i915_semaphore_is_enabled(dev))
> -		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
> +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> +	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
> +#undef MBOX_UPDATE_DWORDS
>  
>  	ret = intel_ring_begin(signaller, num_dwords);
>  	if (ret)
>  		return ret;
> -#undef MBOX_UPDATE_DWORDS
>  
>  	for_each_ring(useless, dev_priv, i) {
>  		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
> @@ -660,15 +655,13 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
>  			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
>  			intel_ring_emit(signaller, mbox_reg);
>  			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> -			intel_ring_emit(signaller, MI_NOOP);
> -		} else {
> -			intel_ring_emit(signaller, MI_NOOP);
> -			intel_ring_emit(signaller, MI_NOOP);
> -			intel_ring_emit(signaller, MI_NOOP);
> -			intel_ring_emit(signaller, MI_NOOP);
>  		}
>  	}
>  
> +	/* If num_dwords was rounded, make sure the tail pointer is correct */
> +	if (num_rings % 2 == 0)
> +		intel_ring_emit(signaller, MI_NOOP);
> +
>  	return 0;
>  }
>  
> @@ -686,7 +679,11 @@ gen6_add_request(struct intel_ring_buffer *ring)
>  {
>  	int ret;
>  
> -	ret = ring->semaphore.signal(ring, 4);
> +	if (ring->semaphore.signal)
> +		ret = ring->semaphore.signal(ring, 4);
> +	else
> +		ret = intel_ring_begin(ring, 4);
> +
>  	if (ret)
>  		return ret;
>  
> @@ -1880,8 +1877,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>  		ring->get_seqno = gen6_ring_get_seqno;
>  		ring->set_seqno = ring_set_seqno;
> -		ring->semaphore.sync_to = gen6_ring_sync;
> -		ring->semaphore.signal = gen6_signal;
> +		if (i915_semaphore_is_enabled(dev)) {
> +			ring->semaphore.sync_to = gen6_ring_sync;
> +			ring->semaphore.signal = gen6_signal;
> +		}
>  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
>  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
>  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
> @@ -2057,8 +2056,10 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>  			ring->dispatch_execbuffer =
>  				gen6_ring_dispatch_execbuffer;
>  		}
> -		ring->semaphore.sync_to = gen6_ring_sync;
> -		ring->semaphore.signal = gen6_signal;
> +		if (i915_semaphore_is_enabled(dev)) {
> +			ring->semaphore.sync_to = gen6_ring_sync;
> +			ring->semaphore.signal = gen6_signal;
> +		}
>  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
>  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
>  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
> @@ -2115,8 +2116,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>  		ring->irq_put = gen6_ring_put_irq;
>  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
>  	}
> -	ring->semaphore.sync_to = gen6_ring_sync;
> -	ring->semaphore.signal = gen6_signal;
> +	if (i915_semaphore_is_enabled(dev)) {
> +		ring->semaphore.signal = gen6_signal;
> +		ring->semaphore.sync_to = gen6_ring_sync;
> +	}
>  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
>  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
>  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
> @@ -2157,8 +2160,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>  		ring->irq_put = hsw_vebox_put_irq;
>  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
>  	}
> -	ring->semaphore.sync_to = gen6_ring_sync;
> -	ring->semaphore.signal = gen6_signal;
> +	if (i915_semaphore_is_enabled(dev)) {
> +		ring->semaphore.sync_to = gen6_ring_sync;
> +		ring->semaphore.signal = gen6_signal;
> +	}
>  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
>  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
>  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
> -- 
> 1.8.5.4

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-30 13:35         ` Chris Wilson
@ 2014-02-11 21:48           ` Ben Widawsky
  2014-02-11 22:23             ` Chris Wilson
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 21:48 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Ville Syrjälä,
	Ben Widawsky, Intel GFX

On Thu, Jan 30, 2014 at 01:35:41PM +0000, Chris Wilson wrote:
> On Thu, Jan 30, 2014 at 02:18:32PM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 1:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > Oh. So they changed how post-sync writes operated - this should be a
> > > separate fix for stable I believe (so that batches are not run before we
> > > have finished invalidating the TLBs required).
> > 
> > We have an igt to exercise tlb invalidation stuff, which runs on all
> > rings. But it only runs a batch, so only uses the CS tlb. Do we need
> > to extend this?
> 
> So the spec says:
> 
> Pipe Control Flush Enable (IVB+)
> If ENABLED, the PIPE_CONTROL command will wait until all previous writes
> of immediate data from post sync circles are complete before executing
> the next command.
> 
> Post Sync Operation
> This field specifies an optional action to be taken upon completion of
> the synchronization operation.
> 
> TLB Invalidate
> If ENABLED, all TLBs belonging to Render Engine will be invalidated once
> the flush operation is complete.
> 
> Command Streamer Stall Enable
> If ENABLED, the sync operation will not occur until all previous flush
> operations pending a completion of those previous flushes will complete,
> including the flush produced from this command. This enables the command
> to act similar to the legacy MI_FLUSH command.
> 
> Going by that, the order is
> 
> flush, stall, TLB invalidate / post-sync op, [pipe control flush]
> 
> Based on my reading of the above (unless someone has a more
> definitive source), without the CONTROL_FLUSH_ENABLE the CS
> can continue operations as soon as the flush is complete - in parallel
> to the TLB invalidate. Adding CONTROL_FLUSH_ENABLE would then stall the
> CS until the post-sync operation completes. That still leaves the
> possibility that the TLB invalidate is being performed in parallel and
> itself provides no CS sync.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

So... what's the verdict?

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] [v2] drm/i915: Make semaphore updates more precise
  2014-02-11 20:53     ` Ville Syrjälä
@ 2014-02-11 21:50       ` Ben Widawsky
  0 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 21:50 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 10:53:40PM +0200, Ville Syrjälä wrote:
> On Tue, Feb 11, 2014 at 12:20:42PM -0800, Ben Widawsky wrote:
> > With the ring mask we now have an easy way to know the number of rings
> > in the system, and therefore can accurately predict the number of dwords
> > to emit for semaphore signalling. This was not possible (easily)
> > previously.
> > 
> > There should be no functional impact, simply fewer instructions emitted.
> > 
> > While we're here, simply round the total up to a multiple of 2 instead
> > of the fancier rounding we did before, which rounded up per mbox, i.e.
> > to 4 dwords. This also allows us to drop the unnecessary MI_NOOP, so
> > each update is really 3 dwords, not 4.
> > 
> > v2: Use 3 dwords instead of 4 (Ville)
> > Do the proper calculation to get the number of dwords to emit (Ville)
> > Conditionally set .sync_to when semaphores are enabled (Ville)
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> Yeah looks OK now. Well, assuming we don't keep going when we fail to
> init one or more rings, because in that case the loop would fail to emit
> all the dwords it was supposed to.
> 

Yeah. I don't think this is ever the behavior we should aim for.
More generally, though, I feel our code chickens out too often. If
the HW is supposed to support it, I'd rather get a bug report than try
to limp along.

> IIRC the rest of the patches looked good up to 05/11. So for patches
> 01-05:
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Thanks.

> 
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 55 ++++++++++++++++++---------------
> >  1 file changed, 30 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 70f7190..483684f 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -635,24 +635,19 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
> >  static int gen6_signal(struct intel_ring_buffer *signaller,
> >  		       unsigned int num_dwords)
> >  {
> > +#define MBOX_UPDATE_DWORDS 3
> >  	struct drm_device *dev = signaller->dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct intel_ring_buffer *useless;
> > -	int i, ret;
> > +	int i, ret, num_rings;
> >  
> > -	/* NB: In order to be able to do semaphore MBOX updates for varying
> > -	 * number of rings, it's easiest if we round up each individual update
> > -	 * to a multiple of 2 (since ring updates must always be a multiple of
> > -	 * 2) even though the actual update only requires 3 dwords.
> > -	 */
> > -#define MBOX_UPDATE_DWORDS 4
> > -	if (i915_semaphore_is_enabled(dev))
> > -		num_dwords += ((I915_NUM_RINGS-1) * MBOX_UPDATE_DWORDS);
> > +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> > +	num_dwords += round_up((num_rings-1) * MBOX_UPDATE_DWORDS, 2);
> > +#undef MBOX_UPDATE_DWORDS
> >  
> >  	ret = intel_ring_begin(signaller, num_dwords);
> >  	if (ret)
> >  		return ret;
> > -#undef MBOX_UPDATE_DWORDS
> >  
> >  	for_each_ring(useless, dev_priv, i) {
> >  		u32 mbox_reg = signaller->semaphore.signal_mbox[i];
> > @@ -660,15 +655,13 @@ static int gen6_signal(struct intel_ring_buffer *signaller,
> >  			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
> >  			intel_ring_emit(signaller, mbox_reg);
> >  			intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -		} else {
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -			intel_ring_emit(signaller, MI_NOOP);
> > -			intel_ring_emit(signaller, MI_NOOP);
> >  		}
> >  	}
> >  
> > +	/* If num_dwords was rounded, make sure the tail pointer is correct */
> > +	if (num_rings % 2 == 0)
> > +		intel_ring_emit(signaller, MI_NOOP);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -686,7 +679,11 @@ gen6_add_request(struct intel_ring_buffer *ring)
> >  {
> >  	int ret;
> >  
> > -	ret = ring->semaphore.signal(ring, 4);
> > +	if (ring->semaphore.signal)
> > +		ret = ring->semaphore.signal(ring, 4);
> > +	else
> > +		ret = intel_ring_begin(ring, 4);
> > +
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -1880,8 +1877,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
> >  		ring->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> >  		ring->get_seqno = gen6_ring_get_seqno;
> >  		ring->set_seqno = ring_set_seqno;
> > -		ring->semaphore.sync_to = gen6_ring_sync;
> > -		ring->semaphore.signal = gen6_signal;
> > +		if (i915_semaphore_is_enabled(dev)) {
> > +			ring->semaphore.sync_to = gen6_ring_sync;
> > +			ring->semaphore.signal = gen6_signal;
> > +		}
> >  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_INVALID;
> >  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_RV;
> >  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_RB;
> > @@ -2057,8 +2056,10 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
> >  			ring->dispatch_execbuffer =
> >  				gen6_ring_dispatch_execbuffer;
> >  		}
> > -		ring->semaphore.sync_to = gen6_ring_sync;
> > -		ring->semaphore.signal = gen6_signal;
> > +		if (i915_semaphore_is_enabled(dev)) {
> > +			ring->semaphore.sync_to = gen6_ring_sync;
> > +			ring->semaphore.signal = gen6_signal;
> > +		}
> >  		ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VR;
> >  		ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_INVALID;
> >  		ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VB;
> > @@ -2115,8 +2116,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
> >  		ring->irq_put = gen6_ring_put_irq;
> >  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> >  	}
> > -	ring->semaphore.sync_to = gen6_ring_sync;
> > -	ring->semaphore.signal = gen6_signal;
> > +	if (i915_semaphore_is_enabled(dev)) {
> > +		ring->semaphore.signal = gen6_signal;
> > +		ring->semaphore.sync_to = gen6_ring_sync;
> > +	}
> >  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_BR;
> >  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_BV;
> >  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_INVALID;
> > @@ -2157,8 +2160,10 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
> >  		ring->irq_put = hsw_vebox_put_irq;
> >  		ring->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
> >  	}
> > -	ring->semaphore.sync_to = gen6_ring_sync;
> > -	ring->semaphore.signal = gen6_signal;
> > +	if (i915_semaphore_is_enabled(dev)) {
> > +		ring->semaphore.sync_to = gen6_ring_sync;
> > +		ring->semaphore.signal = gen6_signal;
> > +	}
> >  	ring->semaphore.mbox[RCS] = MI_SEMAPHORE_SYNC_VER;
> >  	ring->semaphore.mbox[VCS] = MI_SEMAPHORE_SYNC_VEV;
> >  	ring->semaphore.mbox[BCS] = MI_SEMAPHORE_SYNC_VEB;
> > -- 
> > 1.8.5.4
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-01-30 12:38   ` Ville Syrjälä
  2014-01-30 12:46     ` Chris Wilson
@ 2014-02-11 22:11     ` Ben Widawsky
  2014-02-11 22:22       ` Ben Widawsky
  1 sibling, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 22:11 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky

On Thu, Jan 30, 2014 at 02:38:17PM +0200, Ville Syrjälä wrote:
> On Wed, Jan 29, 2014 at 11:55:26AM -0800, Ben Widawsky wrote:
> > Semaphore signalling works similarly to previous GENs with the exception
> > that the per ring mailboxes no longer exist. Instead you must define
> > your own space, somewhere in the GTT.
> > 
> > The comments in the code define the layout I've opted for, which should
> > be fairly future proof. Ie. I tried to define offsets in abstract terms
> > (NUM_RINGS, seqno size, etc).
> > 
> > NOTE: If one wanted to move this to the HWSP they could. I've decided
> > one 4k object would be easier to deal with, and provide potential wins
> > with cache locality, but that's all speculative.
> > 
> > v2: Update the macro to not need the other ring's ring->id (Chris)
> > Update the comment to use the correct formula (Chris)
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |   1 +
> >  drivers/gpu/drm/i915/i915_reg.h         |   5 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
> >  4 files changed, 197 insertions(+), 46 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 3673ba1..f521059 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
> >  
> >  	struct pci_dev *bridge_dev;
> >  	struct intel_ring_buffer ring[I915_NUM_RINGS];
> > +	struct drm_i915_gem_object *semaphore_obj;
> >  	uint32_t last_seqno, next_seqno;
> >  
> >  	drm_dma_handle_t *status_page_dmah;
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index cbbaf26..8b745dc 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -216,7 +216,7 @@
> >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
> >  #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
> >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
> > -#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
> > +#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
> >  #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
> >  #define   MI_SEMAPHORE_UPDATE	    (1<<21)
> >  #define   MI_SEMAPHORE_COMPARE	    (1<<20)
> > @@ -241,6 +241,8 @@
> >  #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
> >  #define   MI_FORCE_RESTORE		(1<<1)
> >  #define   MI_RESTORE_INHIBIT		(1<<0)
> > +#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
> > +#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
> >  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> >  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
> >  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> > @@ -329,6 +331,7 @@
> >  #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
> >  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
> >  #define   PIPE_CONTROL_NOTIFY				(1<<8)
> > +#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
> >  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
> >  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
> >  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 37ae2b1..b750835 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -619,6 +619,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
> >  static void render_ring_cleanup(struct intel_ring_buffer *ring)
> >  {
> >  	struct drm_device *dev = ring->dev;
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +
> > +	if (dev_priv->semaphore_obj) {
> > +		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> > +		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
> > +		dev_priv->semaphore_obj = NULL;
> > +	}
> >  
> >  	if (ring->scratch.obj == NULL)
> >  		return;
> > @@ -632,6 +639,86 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
> >  	ring->scratch.obj = NULL;
> >  }
> >  
> > +static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
> > +			   unsigned int num_dwords)
> > +{
> > +#define MBOX_UPDATE_DWORDS 8
> > +	struct drm_device *dev = signaller->dev;
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	struct intel_ring_buffer *waiter;
> > +	int i, ret, num_rings;
> > +
> > +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> > +	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;
> 
> Again num_dwords +=
> 
> > +#undef MBOX_UPDATE_DWORDS
> > +
> > +	/* XXX: + 4 for the caller */
> > +	ret = intel_ring_begin(signaller, num_dwords + 4);
> 
> and the +4 goes away.
> 
> > +	if (ret)
> > +		return ret;
> > +
> > +	for_each_ring(waiter, dev_priv, i) {
> > +		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> > +		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> > +			continue;
> > +
> > +		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
> > +		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
> > +					   PIPE_CONTROL_QW_WRITE |
> > +					   PIPE_CONTROL_FLUSH_ENABLE);
> > +		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
> > +		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> > +		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> > +		intel_ring_emit(signaller, 0);
> > +		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> > +					   MI_SEMAPHORE_TARGET(waiter->id));
> > +		intel_ring_emit(signaller, 0);
> > +	}
> > +
> > +	WARN_ON(i != num_rings);
> > +
> > +	return 0;
> > +}
> 
> <snip>

Got those, thanks.
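
For reference, the accounting asked for above ("num_dwords +=" and dropping the "+ 4") can be sketched in isolation. This is a standalone illustration rather than driver code: signal_space_dwords() and count_rings() are hypothetical names, and count_rings() only stands in for the kernel's hweight_long(). The 8 dwords per waiter come from the quoted loop (6 for the PIPE_CONTROL, 2 for MI_SEMAPHORE_SIGNAL).

```c
#include <assert.h>
#include <stdint.h>

/* Dwords emitted per waiter in the quoted loop:
 * 6 for PIPE_CONTROL plus 2 for MI_SEMAPHORE_SIGNAL. */
#define MBOX_UPDATE_DWORDS 8

/* Portable population count; stands in for the kernel's hweight_long(). */
static unsigned int count_rings(unsigned long ring_mask)
{
	unsigned int n = 0;

	while (ring_mask) {
		ring_mask &= ring_mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: total ring space to reserve when the signal
 * function adds its own dwords on top of what the caller requested,
 * so that no separate "+ 4" is needed at the intel_ring_begin() call.
 */
static unsigned int signal_space_dwords(unsigned long ring_mask,
					unsigned int caller_dwords)
{
	unsigned int num_rings = count_rings(ring_mask);

	assert(num_rings >= 1);
	return caller_dwords + (num_rings - 1) * MBOX_UPDATE_DWORDS;
}
```

With four rings (ring_mask 0xf) and a caller that needs 4 dwords, this reserves 4 + 3 * 8 = 28 dwords.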

> 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index c69ae10..f1e7a66 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -111,6 +111,39 @@ struct  intel_ring_buffer {
> >  #define I915_DISPATCH_PINNED 0x2
> >  	void		(*cleanup)(struct intel_ring_buffer *ring);
> >  
> > +	/* GEN8 signal/wait table
> > +	 *	  signal to  signal to    signal to   signal to
> > +	 *	    RCS         VCS          BCS        VECS
> > +	 *      ------------------------------------------------------
> > +	 *  RCS | NOP (0x00) | VCS (0x08) | BCS (0x10) | VECS (0x18) |
> > +	 *	|-----------------------------------------------------
> > +	 *  VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) |
> > +	 *	|-----------------------------------------------------
> > +	 *  BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) |
> > +	 *	|-----------------------------------------------------
> > +	 * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) |  NOP (0x78) |
> > +	 *	|-----------------------------------------------------
> > +	 *
> > +	 * Generalization:
> > +	 *  f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id)
> > +	 *  ie. transpose of g(x, y)
> > +	 *
> > +	 *	 sync from   sync from    sync from    sync from
> > +	 *	    RCS         VCS          BCS        VECS
> > +	 *      ------------------------------------------------------
> > +	 *  RCS | NOP (0x00) | VCS (0x20) | BCS (0x40) | VECS (0x60) |
> > +	 *	|-----------------------------------------------------
> > +	 *  VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) |
> > +	 *	|-----------------------------------------------------
> > +	 *  BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x70) |
> > +	 *	|-----------------------------------------------------
> > +	 * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) |  NOP (0x78) |
> > +	 *	|-----------------------------------------------------
> > +	 *
> > +	 * Generalization:
> > +	 *  g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id)
> > +	 *  ie. transpose of f(x, y)
> > +	 */
> >  	struct {
> >  		u32	sync_seqno[I915_NUM_RINGS-1];
> >  		/* AKA wait() */
> > @@ -120,7 +153,10 @@ struct  intel_ring_buffer {
> >  		/* our mbox written by others */
> >  		u32		mbox[I915_NUM_RINGS];
> 
> mbox should also get a u64 friend, right?

mbox should be gen6 only, given the change to using the gtt on gen8. At
this point in the series, semaphores should be forcibly disabled on
gen8, so the code looks wrong, but the path cannot [should not] be
taken.

I suppose I should kill the initialization of mbox for gen8, or somehow
consolidate with a union to prevent confusion.

> 
> >  		/* mboxes this ring signals to */
> > -		u32		signal_mbox[I915_NUM_RINGS];
> > +		union {
> > +			u32		signal_mbox[I915_NUM_RINGS];
> > +			u64		signal_ggtt[I915_NUM_RINGS];
> > +		};
> >  
> >  		/* num_dwords is space the caller will need for atomic update */
> >  		int		(*signal)(struct intel_ring_buffer *signaller,
> > -- 
> > 1.8.5.3
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-02-11 22:11     ` Ben Widawsky
@ 2014-02-11 22:22       ` Ben Widawsky
  2014-02-11 23:01         ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 22:22 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 02:11:04PM -0800, Ben Widawsky wrote:
> On Thu, Jan 30, 2014 at 02:38:17PM +0200, Ville Syrjälä wrote:
> > On Wed, Jan 29, 2014 at 11:55:26AM -0800, Ben Widawsky wrote:
> > > Semaphore signalling works similarly to previous GENs with the exception
> > > that the per ring mailboxes no longer exist. Instead you must define
> > > your own space, somewhere in the GTT.
> > > 
> > > The comments in the code define the layout I've opted for, which should
> > > be fairly future proof. Ie. I tried to define offsets in abstract terms
> > > (NUM_RINGS, seqno size, etc).
> > > 
> > > NOTE: If one wanted to move this to the HWSP they could. I've decided
> > > one 4k object would be easier to deal with, and provide potential wins
> > > with cache locality, but that's all speculative.
> > > 
> > > v2: Update the macro to not need the other ring's ring->id (Chris)
> > > Update the comment to use the correct formula (Chris)
> > > 
> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.h         |   1 +
> > >  drivers/gpu/drm/i915/i915_reg.h         |   5 +-
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
> > >  drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
> > >  4 files changed, 197 insertions(+), 46 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index 3673ba1..f521059 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
> > >  
> > >  	struct pci_dev *bridge_dev;
> > >  	struct intel_ring_buffer ring[I915_NUM_RINGS];
> > > +	struct drm_i915_gem_object *semaphore_obj;
> > >  	uint32_t last_seqno, next_seqno;
> > >  
> > >  	drm_dma_handle_t *status_page_dmah;
> > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > index cbbaf26..8b745dc 100644
> > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > @@ -216,7 +216,7 @@
> > >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
> > >  #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
> > >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
> > > -#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
> > > +#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
> > >  #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
> > >  #define   MI_SEMAPHORE_UPDATE	    (1<<21)
> > >  #define   MI_SEMAPHORE_COMPARE	    (1<<20)
> > > @@ -241,6 +241,8 @@
> > >  #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
> > >  #define   MI_FORCE_RESTORE		(1<<1)
> > >  #define   MI_RESTORE_INHIBIT		(1<<0)
> > > +#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
> > > +#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
> > >  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> > >  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
> > >  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> > > @@ -329,6 +331,7 @@
> > >  #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
> > >  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
> > >  #define   PIPE_CONTROL_NOTIFY				(1<<8)
> > > +#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
> > >  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
> > >  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
> > >  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 37ae2b1..b750835 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -619,6 +619,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
> > >  static void render_ring_cleanup(struct intel_ring_buffer *ring)
> > >  {
> > >  	struct drm_device *dev = ring->dev;
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +
> > > +	if (dev_priv->semaphore_obj) {
> > > +		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> > > +		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
> > > +		dev_priv->semaphore_obj = NULL;
> > > +	}
> > >  
> > >  	if (ring->scratch.obj == NULL)
> > >  		return;
> > > @@ -632,6 +639,86 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
> > >  	ring->scratch.obj = NULL;
> > >  }
> > >  
> > > +static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
> > > +			   unsigned int num_dwords)
> > > +{
> > > +#define MBOX_UPDATE_DWORDS 8
> > > +	struct drm_device *dev = signaller->dev;
> > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +	struct intel_ring_buffer *waiter;
> > > +	int i, ret, num_rings;
> > > +
> > > +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> > > +	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;
> > 
> > Again num_dwords +=
> > 
> > > +#undef MBOX_UPDATE_DWORDS
> > > +
> > > +	/* XXX: + 4 for the caller */
> > > +	ret = intel_ring_begin(signaller, num_dwords + 4);
> > 
> > and the +4 goes away.
> > 
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	for_each_ring(waiter, dev_priv, i) {
> > > +		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> > > +		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> > > +			continue;
> > > +
> > > +		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
> > > +		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
> > > +					   PIPE_CONTROL_QW_WRITE |
> > > +					   PIPE_CONTROL_FLUSH_ENABLE);
> > > +		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
> > > +		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> > > +		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> > > +		intel_ring_emit(signaller, 0);
> > > +		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> > > +					   MI_SEMAPHORE_TARGET(waiter->id));
> > > +		intel_ring_emit(signaller, 0);
> > > +	}
> > > +
> > > +	WARN_ON(i != num_rings);
> > > +
> > > +	return 0;
> > > +}
> > 
> > <snip>
> 
> Got those, thanks.
> 
> > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > index c69ae10..f1e7a66 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > @@ -111,6 +111,39 @@ struct  intel_ring_buffer {
> > >  #define I915_DISPATCH_PINNED 0x2
> > >  	void		(*cleanup)(struct intel_ring_buffer *ring);
> > >  
> > > +	/* GEN8 signal/wait table
> > > +	 *	  signal to  signal to    signal to   signal to
> > > +	 *	    RCS         VCS          BCS        VECS
> > > +	 *      ------------------------------------------------------
> > > +	 *  RCS | NOP (0x00) | VCS (0x08) | BCS (0x10) | VECS (0x18) |
> > > +	 *	|-----------------------------------------------------
> > > +	 *  VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) |
> > > +	 *	|-----------------------------------------------------
> > > +	 *  BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) |
> > > +	 *	|-----------------------------------------------------
> > > +	 * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) |  NOP (0x78) |
> > > +	 *	|-----------------------------------------------------
> > > +	 *
> > > +	 * Generalization:
> > > +	 *  f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id)
> > > +	 *  ie. transpose of g(x, y)
> > > +	 *
> > > +	 *	 sync from   sync from    sync from    sync from
> > > +	 *	    RCS         VCS          BCS        VECS
> > > +	 *      ------------------------------------------------------
> > > +	 *  RCS | NOP (0x00) | VCS (0x20) | BCS (0x40) | VECS (0x60) |
> > > +	 *	|-----------------------------------------------------
> > > +	 *  VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) |
> > > +	 *	|-----------------------------------------------------
> > > +	 *  BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x70) |
> > > +	 *	|-----------------------------------------------------
> > > +	 * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) |  NOP (0x78) |
> > > +	 *	|-----------------------------------------------------
> > > +	 *
> > > +	 * Generalization:
> > > +	 *  g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id)
> > > +	 *  ie. transpose of f(x, y)
> > > +	 */
> > >  	struct {
> > >  		u32	sync_seqno[I915_NUM_RINGS-1];
> > >  		/* AKA wait() */
> > > @@ -120,7 +153,10 @@ struct  intel_ring_buffer {
> > >  		/* our mbox written by others */
> > >  		u32		mbox[I915_NUM_RINGS];
> > 
> > mbox should also get a u64 friend, right?
> 
> mbox should be gen6 only, given the change to using the gtt on gen8. In
> this point in the series, semaphores should be forcibly disabled on
> gen8, so the code looks wrong, but the path cannot [should not] be
> taken.
> 
> I suppose I should kill the initialization of mbox for gen8, or somehow
> consolidate with a union to prevent confusion.
> 

Just to clarify, it should be:

gen6:
signal() uses signal_mbox
wait() uses mbox

gen8:
signal() uses signal_ggtt
wait() uses arithmetic to figure out the offset
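
A minimal sketch of that arithmetic, following the f(x, y) and g(x, y) generalizations in the quoted comment. It assumes four rings with ids RCS=0, VCS=1, BCS=2, VECS=3 and an 8-byte slot per seqno; signal_offset() and wait_offset() are hypothetical names used only for illustration, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RINGS  4u /* RCS, VCS, BCS, VECS, as in the quoted table */
#define SEQNO_SIZE 8u /* bytes per slot; one u64, matching signal_ggtt */

/* Ring ids assumed from the column order of the quoted table. */
enum ring_id { RCS = 0, VCS = 1, BCS = 2, VECS = 3 };

/* f(x, y): byte offset of the slot that signaller x writes for waiter y. */
static uint32_t signal_offset(enum ring_id signaller, enum ring_id waiter)
{
	assert((unsigned int)signaller < NUM_RINGS);
	assert((unsigned int)waiter < NUM_RINGS);
	return (uint32_t)signaller * NUM_RINGS * SEQNO_SIZE +
	       SEQNO_SIZE * (uint32_t)waiter;
}

/* g(x, y): byte offset of the slot that waiter x reads when syncing
 * from signaller y. It is the transpose of f, so it resolves to the
 * same slot the signaller wrote. */
static uint32_t wait_offset(enum ring_id waiter, enum ring_id signaller)
{
	return signal_offset(signaller, waiter);
}
```

For example, signal_offset(VCS, BCS) and wait_offset(BCS, VCS) both evaluate to 0x30, so the waiter needs no stored address, only the two ring ids.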

> > 
> > >  		/* mboxes this ring signals to */
> > > -		u32		signal_mbox[I915_NUM_RINGS];
> > > +		union {
> > > +			u32		signal_mbox[I915_NUM_RINGS];
> > > +			u64		signal_ggtt[I915_NUM_RINGS];
> > > +		};
> > >  
> > >  		/* num_dwords is space the caller will need for atomic update */
> > >  		int		(*signal)(struct intel_ring_buffer *signaller,
> > > -- 
> > > 1.8.5.3
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> > -- 
> > Ville Syrjälä
> > Intel OTC
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center

-- 
Ben Widawsky, Intel Open Source Technology Center


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-02-11 21:48           ` Ben Widawsky
@ 2014-02-11 22:23             ` Chris Wilson
  2014-02-11 22:25               ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Chris Wilson @ 2014-02-11 22:23 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 01:48:22PM -0800, Ben Widawsky wrote:
> On Thu, Jan 30, 2014 at 01:35:41PM +0000, Chris Wilson wrote:
> > On Thu, Jan 30, 2014 at 02:18:32PM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 1:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > Oh. So they changed how post-sync writes operated - this should be a
> > > > separate fix for stable I believe (so that batches are not run before we
> > > > have finished invalidating the TLBs required).
> > > 
> > > We have an igt to exercise tlb invalidation stuff, which runs on all
> > > rings. But it only runs a batch, so only uses the CS tlb. Do we need
> > > to extend this?
> > 
> > So the spec says:
> > 
> > Pipe Control Flush Enable (IVB+)
> > If ENABLED, the PIPE_CONTROL command will wait until all previous writes
> > of immediate data from post sync circles are complete before executing
> > the next command.
> > 
> > Post Sync Operation
> > This field specifies an optional action to be taken upon completion of
> > the synchronization operation.
> > 
> > TLB Invalidate
> > If ENABLED, all TLBs belonging to Render Engine will be invalidated once
> > the flush operation is complete.
> > 
> > Command Streamer Stall Enable
> > If ENABLED, the sync operation will not occur until all previous flush
> > operations pending a completion of those previous flushes will complete,
> > including the flush produced from this command. This enables the command
> > to act similar to the legacy MI_FLUSH command.
> > 
> > Going by that, the order is
> > 
> > flush, stall, TLB invalidate / post-sync op, [pipe control flush]
> > 
> > Based on my reading of the above (unless someone has a more
> > definitive source), without the CONTROL_FLUSH_ENABLE the CS
> > can continue operations as soon as the flush is complete - in parallel
> > to the TLB invalidate. Adding CONTROL_FLUSH_ENABLE would then stall the
> > CS until the post-sync operation completes. That still leaves the
> > possibility that the TLB invalidate is being performed in parallel and
> > itself provides no CS sync.
> > -Chris
> > 
> > -- 
> > Chris Wilson, Intel Open Source Technology Centre
> 
> So... what's the verdict?

Gut feeling is that it fixes an issue with IVB TLB invalidate.
(Not yet sure if the bug I was looking at was accidentally fixed at the
same time as testing this.)
So cc stable@
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-02-11 22:23             ` Chris Wilson
@ 2014-02-11 22:25               ` Ben Widawsky
  2014-02-11 22:28                 ` Chris Wilson
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 22:25 UTC (permalink / raw)
  To: Chris Wilson, Ben Widawsky, Daniel Vetter,
	Ville Syrjälä,
	Intel GFX

On Tue, Feb 11, 2014 at 10:23:38PM +0000, Chris Wilson wrote:
> On Tue, Feb 11, 2014 at 01:48:22PM -0800, Ben Widawsky wrote:
> > On Thu, Jan 30, 2014 at 01:35:41PM +0000, Chris Wilson wrote:
> > > On Thu, Jan 30, 2014 at 02:18:32PM +0100, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 1:46 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > > Oh. So they changed how post-sync writes operated - this should be a
> > > > > separate fix for stable I believe (so that batches are not run before we
> > > > > have finished invalidating the TLBs required).
> > > > 
> > > > We have an igt to exercise tlb invalidation stuff, which runs on all
> > > > rings. But it only runs a batch, so only uses the CS tlb. Do we need
> > > > to extend this?
> > > 
> > > So the spec says:
> > > 
> > > Pipe Control Flush Enable (IVB+)
> > > If ENABLED, the PIPE_CONTROL command will wait until all previous writes
> > > of immediate data from post sync circles are complete before executing
> > > the next command.
> > > 
> > > Post Sync Operation
> > > This field specifies an optional action to be taken upon completion of
> > > the synchronization operation.
> > > 
> > > TLB Invalidate
> > > If ENABLED, all TLBs belonging to Render Engine will be invalidated once
> > > the flush operation is complete.
> > > 
> > > Command Streamer Stall Enable
> > > If ENABLED, the sync operation will not occur until all previous flush
> > > operations pending a completion of those previous flushes will complete,
> > > including the flush produced from this command. This enables the command
> > > to act similar to the legacy MI_FLUSH command.
> > > 
> > > Going by that, the order is
> > > 
> > > flush, stall, TLB invalidate / post-sync op, [pipe control flush]
> > > 
> > > Based on my reading of the above (unless someone has a more
> > > definitive source), without the CONTROL_FLUSH_ENABLE the CS
> > > can continue operations as soon as the flush is complete - in parallel
> > > to the TLB invalidate. Adding CONTROL_FLUSH_ENABLE would then stall the
> > > CS until the post-sync operation completes. That still leaves the
> > > possibility that the TLB invalidate is being performed in parallel and
> > > itself provides no CS sync.
> > > -Chris
> > > 
> > > -- 
> > > Chris Wilson, Intel Open Source Technology Centre
> > 
> > So... what's the verdict?
> 
> Gut feeling is that it fixes an issue with IVB TLB invalidate.
> (Not yet sure if the bug I was looking at was accidentally fixed at the
> same time as testing this.)
> So cc stable@
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

You still want a separate patch?

-- 
Ben Widawsky, Intel Open Source Technology Center


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-02-11 22:25               ` Ben Widawsky
@ 2014-02-11 22:28                 ` Chris Wilson
  0 siblings, 0 replies; 40+ messages in thread
From: Chris Wilson @ 2014-02-11 22:28 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Ben Widawsky, Intel GFX

On Tue, Feb 11, 2014 at 02:25:43PM -0800, Ben Widawsky wrote:
> On Tue, Feb 11, 2014 at 10:23:38PM +0000, Chris Wilson wrote:
> > Gut feeling is that it fixes an issue with IVB TLB invalidate.
> > (Not yet sure if the bug I was looking at was accidentally fixed at the
> > same time as testing this.)
> > So cc stable@

> You still want a separate patch?

Actually, bad news for me. The bug I thought had gone was merely
dormant. It reappeared, so I have no known issue that this fixes. :(

I still think we need to add the Pipe Control Flush Enable to the TLB
invalidate sequence, but it is no longer urgent.
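
As a rough illustration of that suggestion, the flag word for a TLB-invalidating PIPE_CONTROL could be composed as below. This is only a sketch: tlb_invalidate_flags() is a hypothetical helper, bit 7 for the flush enable is taken from the quoted patch, and the other two bit positions are assumptions that should be checked against i915_reg.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 7 comes from the quoted patch (gen7+). */
#define PIPE_CONTROL_FLUSH_ENABLE   (1u << 7)
/* The two positions below are assumptions; verify against i915_reg.h. */
#define PIPE_CONTROL_TLB_INVALIDATE (1u << 18)
#define PIPE_CONTROL_CS_STALL       (1u << 20)

/*
 * Hypothetical helper: build the PIPE_CONTROL flag dword for a TLB
 * invalidate, optionally asking the command streamer to wait for the
 * post-sync write before executing the next command.
 */
static uint32_t tlb_invalidate_flags(int want_flush_enable)
{
	uint32_t flags = PIPE_CONTROL_TLB_INVALIDATE | PIPE_CONTROL_CS_STALL;

	if (want_flush_enable)
		flags |= PIPE_CONTROL_FLUSH_ENABLE;

	assert(flags & PIPE_CONTROL_TLB_INVALIDATE);
	return flags;
}
```

Whether the flush enable actually orders the TLB invalidate itself is the open question in this thread; the sketch only shows where the bit would go.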
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-02-11 22:22       ` Ben Widawsky
@ 2014-02-11 23:01         ` Ben Widawsky
  2014-02-12  9:29           ` Ville Syrjälä
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2014-02-11 23:01 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 02:22:37PM -0800, Ben Widawsky wrote:
> On Tue, Feb 11, 2014 at 02:11:04PM -0800, Ben Widawsky wrote:
> > On Thu, Jan 30, 2014 at 02:38:17PM +0200, Ville Syrjälä wrote:
> > > On Wed, Jan 29, 2014 at 11:55:26AM -0800, Ben Widawsky wrote:
> > > > Semaphore signalling works similarly to previous GENs with the exception
> > > > that the per ring mailboxes no longer exist. Instead you must define
> > > > your own space, somewhere in the GTT.
> > > > 
> > > > The comments in the code define the layout I've opted for, which should
> > > > be fairly future proof. Ie. I tried to define offsets in abstract terms
> > > > (NUM_RINGS, seqno size, etc).
> > > > 
> > > > NOTE: If one wanted to move this to the HWSP they could. I've decided
> > > > one 4k object would be easier to deal with, and provide potential wins
> > > > with cache locality, but that's all speculative.
> > > > 
> > > > v2: Update the macro to not need the other ring's ring->id (Chris)
> > > > Update the comment to use the correct formula (Chris)
> > > > 
> > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_drv.h         |   1 +
> > > >  drivers/gpu/drm/i915/i915_reg.h         |   5 +-
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
> > > >  4 files changed, 197 insertions(+), 46 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index 3673ba1..f521059 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
> > > >  
> > > >  	struct pci_dev *bridge_dev;
> > > >  	struct intel_ring_buffer ring[I915_NUM_RINGS];
> > > > +	struct drm_i915_gem_object *semaphore_obj;
> > > >  	uint32_t last_seqno, next_seqno;
> > > >  
> > > >  	drm_dma_handle_t *status_page_dmah;
> > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > > index cbbaf26..8b745dc 100644
> > > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > > @@ -216,7 +216,7 @@
> > > >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
> > > >  #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
> > > >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
> > > > -#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
> > > > +#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
> > > >  #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
> > > >  #define   MI_SEMAPHORE_UPDATE	    (1<<21)
> > > >  #define   MI_SEMAPHORE_COMPARE	    (1<<20)
> > > > @@ -241,6 +241,8 @@
> > > >  #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
> > > >  #define   MI_FORCE_RESTORE		(1<<1)
> > > >  #define   MI_RESTORE_INHIBIT		(1<<0)
> > > > +#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
> > > > +#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
> > > >  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> > > >  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
> > > >  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> > > > @@ -329,6 +331,7 @@
> > > >  #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
> > > >  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
> > > >  #define   PIPE_CONTROL_NOTIFY				(1<<8)
> > > > +#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
> > > >  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
> > > >  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
> > > >  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index 37ae2b1..b750835 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -619,6 +619,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
> > > >  static void render_ring_cleanup(struct intel_ring_buffer *ring)
> > > >  {
> > > >  	struct drm_device *dev = ring->dev;
> > > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > > +
> > > > +	if (dev_priv->semaphore_obj) {
> > > > +		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> > > > +		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
> > > > +		dev_priv->semaphore_obj = NULL;
> > > > +	}
> > > >  
> > > >  	if (ring->scratch.obj == NULL)
> > > >  		return;
> > > > @@ -632,6 +639,86 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
> > > >  	ring->scratch.obj = NULL;
> > > >  }
> > > >  
> > > > +static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
> > > > +			   unsigned int num_dwords)
> > > > +{
> > > > +#define MBOX_UPDATE_DWORDS 8
> > > > +	struct drm_device *dev = signaller->dev;
> > > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > > +	struct intel_ring_buffer *waiter;
> > > > +	int i, ret, num_rings;
> > > > +
> > > > +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> > > > +	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;
> > > 
> > > Again num_dwords +=
> > > 
> > > > +#undef MBOX_UPDATE_DWORDS
> > > > +
> > > > +	/* XXX: + 4 for the caller */
> > > > +	ret = intel_ring_begin(signaller, num_dwords + 4);
> > > 
> > > and the +4 goes away.
> > > 
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	for_each_ring(waiter, dev_priv, i) {
> > > > +		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> > > > +		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> > > > +			continue;
> > > > +
> > > > +		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
> > > > +		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
> > > > +					   PIPE_CONTROL_QW_WRITE |
> > > > +					   PIPE_CONTROL_FLUSH_ENABLE);
> > > > +		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
> > > > +		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> > > > +		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> > > > +		intel_ring_emit(signaller, 0);
> > > > +		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> > > > +					   MI_SEMAPHORE_TARGET(waiter->id));
> > > > +		intel_ring_emit(signaller, 0);
> > > > +	}
> > > > +
> > > > +	WARN_ON(i != num_rings);
> > > > +
> > > > +	return 0;
> > > > +}
> > > 
> > > <snip>
> > 
> > Got those, thanks.
> > 
> > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > index c69ae10..f1e7a66 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > @@ -111,6 +111,39 @@ struct  intel_ring_buffer {
> > > >  #define I915_DISPATCH_PINNED 0x2
> > > >  	void		(*cleanup)(struct intel_ring_buffer *ring);
> > > >  
> > > > +	/* GEN8 signal/wait table
> > > > +	 *	  signal to  signal to    signal to   signal to
> > > > +	 *	    RCS         VCS          BCS        VECS
> > > > +	 *      ------------------------------------------------------
> > > > +	 *  RCS | NOP (0x00) | BCS (0x08) | VCS (0x10) | VECS (0x18) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 *  VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 *  BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) |  NOP (0x78) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 *
> > > > +	 * Generalization:
> > > > +	 *  f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id)
> > > > +	 *  ie. transpose of g(x, y)
> > > > +	 *
> > > > +	 *	 sync from   sync from    sync from    sync from
> > > > +	 *	    RCS         VCS          BCS        VECS
> > > > +	 *      ------------------------------------------------------
> > > > +	 *  RCS | NOP (0x00) | BCS (0x20) | VCS (0x40) | VECS (0x60) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 *  VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 *  BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x60) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) |  NOP (0x78) |
> > > > +	 *	|-----------------------------------------------------
> > > > +	 *
> > > > +	 * Generalization:
> > > > +	 *  g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id)
> > > > +	 *  ie. transpose of f(x, y)
> > > > +	 */
> > > >  	struct {
> > > >  		u32	sync_seqno[I915_NUM_RINGS-1];
> > > >  		/* AKA wait() */
> > > > @@ -120,7 +153,10 @@ struct  intel_ring_buffer {
> > > >  		/* our mbox written by others */
> > > >  		u32		mbox[I915_NUM_RINGS];
> > > 
> > > mbox should also get a u64 friend, right?
> > 
> > mbox should be gen6 only, given the change to using the gtt on gen8. In
> > this point in the series, semaphores should be forcibly disabled on
> > gen8, so the code looks wrong, but the path cannot [should not] be
> > taken.
> > 
> > I suppose I should kill the initialization of mbox for gen8, or somehow
> > consolidate with a union to prevent confusion.
> > 
> 
> Just to clarify it should be
> 
> gen6:
> signal uses signal_mbox for signal
> wait uses mbox
> 
> gen8:
> signal uses signal_ggtt for signal
> wait uses arithmetic to figure out the offset
> 

Ok, I've fixed this up to make things clearer, but it ends up in the
next patch. So look there when I repost.

> > > 
> > > >  		/* mboxes this ring signals to */
> > > > -		u32		signal_mbox[I915_NUM_RINGS];
> > > > +		union {
> > > > +			u32		signal_mbox[I915_NUM_RINGS];
> > > > +			u64		signal_ggtt[I915_NUM_RINGS];
> > > > +		};
> > > >  
> > > >  		/* num_dwords is space the caller will need for atomic update */
> > > >  		int		(*signal)(struct intel_ring_buffer *signaller,
> > > > -- 
> > > > 1.8.5.3
> > > > 
> > > 
> > > -- 
> > > Ville Syrjälä
> > > Intel OTC
> > 
> > -- 
> > Ben Widawsky, Intel Open Source Technology Center
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 11/13] drm/i915/bdw: collect semaphore error state
  2014-01-30 14:58     ` Chris Wilson
@ 2014-02-12  0:19       ` Ben Widawsky
  0 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-02-12  0:19 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä, Ben Widawsky, Intel GFX

On Thu, Jan 30, 2014 at 02:58:09PM +0000, Chris Wilson wrote:
> On Thu, Jan 30, 2014 at 04:53:32PM +0200, Ville Syrjälä wrote:
> > On Wed, Jan 29, 2014 at 11:55:31AM -0800, Ben Widawsky wrote:
> > > +	obj = error->semaphore_obj;
> > > +	if (obj) {
> > 
> > Chris will come along and change this to
> > 
> > if ((obj = error->semaphore_obj))
> 
> I was merely keeping in style with the rest of the code. Which was
> probably written by me, so I can't win!
> -Chris

My choice was intentional. Chris accuses me of polka dotting even when
it wasn't intentional - so why not.

-- 
Ben Widawsky, Intel Open Source Technology Center


* Re: [PATCH 11/13] drm/i915/bdw: collect semaphore error state
  2014-01-30 14:53   ` Ville Syrjälä
  2014-01-30 14:58     ` Chris Wilson
@ 2014-02-12  0:23     ` Ben Widawsky
  1 sibling, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-02-12  0:23 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel GFX, Ben Widawsky

On Thu, Jan 30, 2014 at 04:53:32PM +0200, Ville Syrjälä wrote:
> On Wed, Jan 29, 2014 at 11:55:31AM -0800, Ben Widawsky wrote:
> > Since the semaphore information is in an object, just dump it, and let
> > the user parse it later.
> > 
> > NOTE: The page being used for the semaphores are incoherent with the
> > CPU. No matter what I do, I cannot figure out a way to read anything but
> > 0s. Note that the semaphore waits are indeed working.
> > 
> > v2: Don't print signal, and wait (they should be the same). Instead,
> > print sync_seqno (Chris)
> > 
> > v3: Free the semaphore error object (Chris)
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  1 +
> >  drivers/gpu/drm/i915/i915_gpu_error.c   | 47 ++++++++++++++++++++++++++++++---
> >  drivers/gpu/drm/i915/intel_ringbuffer.h | 12 ++++-----
> >  3 files changed, 51 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index f521059..b08e6eb 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -313,6 +313,7 @@ struct drm_i915_error_state {
> >  	u32 acthd[I915_NUM_RINGS];
> >  	u32 semaphore_mboxes[I915_NUM_RINGS][I915_NUM_RINGS - 1];
> >  	u32 semaphore_seqno[I915_NUM_RINGS][I915_NUM_RINGS - 1];
> > +	struct drm_i915_error_object *semaphore_obj;
> >  	u32 rc_psmi[I915_NUM_RINGS]; /* sleep state */
> >  	/* our own tracking of ring head and tail */
> >  	u32 cpu_ring_head[I915_NUM_RINGS];
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index efaad96..d6afc01 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -297,6 +297,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >  	struct drm_device *dev = error_priv->dev;
> >  	drm_i915_private_t *dev_priv = dev->dev_private;
> >  	struct drm_i915_error_state *error = error_priv->error;
> > +	struct drm_i915_error_object *obj;
> >  	int i, j, page, offset, elt;
> >  
> >  	if (!error) {
> > @@ -345,8 +346,6 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >  				    error->pinned_bo_count[0]);
> >  
> >  	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
> > -		struct drm_i915_error_object *obj;
> > -
> >  		if ((obj = error->ring[i].batchbuffer)) {
> >  			err_printf(m, "%s --- gtt_offset = 0x%08x\n",
> >  				   dev_priv->ring[i].name,
> > @@ -421,6 +420,19 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >  		}
> >  	}
> >  
> > +	obj = error->semaphore_obj;
> > +	if (obj) {
> 
> Chris will come along and change this to
> 
> if ((obj = error->semaphore_obj))
> 
> 
> > +		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
> > +		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
> > +			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
> > +				   elt * 4,
> > +				   obj->pages[0][elt],
> > +				   obj->pages[0][elt+1],
> > +				   obj->pages[0][elt+2],
> > +				   obj->pages[0][elt+3]);
> > +		}
> 
> That'll be the third copy of this page dumping code. Time to refactor?
> 

I did enough refactoring in this series - but acked.
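
One possible shape for the refactoring suggested above is a single helper
that formats a run of dwords four per row. This is only a sketch:
dump_dwords() is a hypothetical name, and it writes through snprintf() into
a caller-supplied buffer rather than err_printf() so that it compiles
outside the driver:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the posted patch): format ndwords
 * dwords from data, four per row, each row prefixed with its byte offset.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int dump_dwords(char *buf, size_t len, const uint32_t *data,
		       unsigned int ndwords)
{
	int used = 0;
	unsigned int elt;

	for (elt = 0; elt + 3 < ndwords; elt += 4) {
		int n = snprintf(buf + used, len - used,
				 "[%04x] %08" PRIx32 " %08" PRIx32
				 " %08" PRIx32 " %08" PRIx32 "\n",
				 elt * 4, data[elt], data[elt + 1],
				 data[elt + 2], data[elt + 3]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* row would not fit */
		used += n;
	}
	return used;
}
```

Each of the existing dump loops would then reduce to one call with the
object's page and the number of dwords to print.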

> > +	}
> > +
> >  	if (error->overlay)
> >  		intel_overlay_print_error_state(m, error->overlay);
> >  
> > @@ -491,6 +503,7 @@ static void i915_error_state_free(struct kref *error_ref)
> >  		kfree(error->ring[i].requests);
> >  	}
> >  
> > +	i915_error_object_free(error->semaphore_obj);
> >  	kfree(error->active_bo);
> >  	kfree(error->overlay);
> >  	kfree(error->display);
> > @@ -772,6 +785,31 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
> >  	}
> >  }
> >  
> > +static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
> > +					struct drm_i915_error_state *error,
> > +					struct intel_ring_buffer *ring)
> > +{
> > +	struct intel_ring_buffer *useless;
> > +	int i;
> > +
> > +	if (!i915_semaphore_is_enabled(dev_priv->dev))
> > +		return;
> > +
> > +	if (!error->semaphore_obj)
> > +		error->semaphore_obj =
> > +			i915_error_object_create(dev_priv,
> > +						 dev_priv->semaphore_obj,
> > +						 &dev_priv->gtt.base);
> > +
> > +	for_each_ring(useless, dev_priv, i) {
> > +		u16 signal_offset = GEN8_SIGNAL_OFFSET(ring, i) / 4;
> 
> GEN8_SIGNAL_OFFSET() returns the full ggtt offset.
> 

Fixed locally. I'll wait until you respond to one of my earlier mails
and then resend.

> > +		u32 *tmp = error->semaphore_obj->pages[0];
> > +
> > +		error->semaphore_mboxes[ring->id][i] = tmp[signal_offset];
> > +		error->semaphore_seqno[ring->id][i] = ring->semaphore.sync_seqno[i];
> > +	}
> > +}
> > +
> >  static void i915_record_ring_state(struct drm_device *dev,
> >  				   struct drm_i915_error_state *error,
> >  				   struct intel_ring_buffer *ring)
> > @@ -781,7 +819,10 @@ static void i915_record_ring_state(struct drm_device *dev,
> >  	if (INTEL_INFO(dev)->gen >= 6) {
> >  		error->rc_psmi[ring->id] = I915_READ(ring->mmio_base + 0x50);
> >  		error->fault_reg[ring->id] = I915_READ(RING_FAULT_REG(ring));
> > -		gen6_record_semaphore_state(dev_priv, error, ring);
> > +		if (INTEL_INFO(dev)->gen >= 8)
> > +			gen8_record_semaphore_state(dev_priv, error, ring);
> > +		else
> > +			gen6_record_semaphore_state(dev_priv, error, ring);
> >  	}
> >  
> >  	if (INTEL_INFO(dev)->gen >= 4) {
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index ed55370..4ca2789 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -37,9 +37,9 @@ struct  intel_hw_status_page {
> >   * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
> >   */
> >  #define i915_semaphore_seqno_size sizeof(uint64_t)
> > -#define GEN8_SIGNAL_OFFSET(to) \
> > +#define GEN8_SIGNAL_OFFSET(__ring, to) \
> >  	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
> > -	(ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
> > +	((__ring)->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
> >  	(i915_semaphore_seqno_size * (to)))
> >  
> >  #define GEN8_WAIT_OFFSET(__ring, from) \
> > @@ -51,10 +51,10 @@ struct  intel_hw_status_page {
> >  	if (!dev_priv->semaphore_obj) { \
> >  		break; \
> >  	} \
> > -	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
> > -	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
> > -	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
> > -	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
> > +	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \
> > +	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \
> > +	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \
> > +	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \
> >  	ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(ring, RCS); \
> >  	ring->semaphore.mbox[VCS] = GEN8_WAIT_OFFSET(ring, VCS); \
> >  	ring->semaphore.mbox[BCS] = GEN8_WAIT_OFFSET(ring, BCS); \
> > -- 
> > 1.8.5.3
> > 
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Ben Widawsky, Intel Open Source Technology Center


* Re: [PATCH 06/13] drm/i915/bdw: implement semaphore signal
  2014-02-11 23:01         ` Ben Widawsky
@ 2014-02-12  9:29           ` Ville Syrjälä
  0 siblings, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2014-02-12  9:29 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX, Ben Widawsky

On Tue, Feb 11, 2014 at 03:01:31PM -0800, Ben Widawsky wrote:
> On Tue, Feb 11, 2014 at 02:22:37PM -0800, Ben Widawsky wrote:
> > On Tue, Feb 11, 2014 at 02:11:04PM -0800, Ben Widawsky wrote:
> > > On Thu, Jan 30, 2014 at 02:38:17PM +0200, Ville Syrjälä wrote:
> > > > On Wed, Jan 29, 2014 at 11:55:26AM -0800, Ben Widawsky wrote:
> > > > > Semaphore signalling works similarly to previous GENs with the exception
> > > > > that the per ring mailboxes no longer exist. Instead you must define
> > > > > your own space, somewhere in the GTT.
> > > > > 
> > > > > The comments in the code define the layout I've opted for, which should
> > > > > be fairly future proof. Ie. I tried to define offsets in abstract terms
> > > > > (NUM_RINGS, seqno size, etc).
> > > > > 
> > > > > NOTE: If one wanted to move this to the HWSP they could. I've decided
> > > > > one 4k object would be easier to deal with, and provide potential wins
> > > > > with cache locality, but that's all speculative.
> > > > > 
> > > > > v2: Update the macro to not need the other ring's ring->id (Chris)
> > > > > Update the comment to use the correct formula (Chris)
> > > > > 
> > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_drv.h         |   1 +
> > > > >  drivers/gpu/drm/i915/i915_reg.h         |   5 +-
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 199 +++++++++++++++++++++++++-------
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.h |  38 +++++-
> > > > >  4 files changed, 197 insertions(+), 46 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > > index 3673ba1..f521059 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > @@ -1380,6 +1380,7 @@ typedef struct drm_i915_private {
> > > > >  
> > > > >  	struct pci_dev *bridge_dev;
> > > > >  	struct intel_ring_buffer ring[I915_NUM_RINGS];
> > > > > +	struct drm_i915_gem_object *semaphore_obj;
> > > > >  	uint32_t last_seqno, next_seqno;
> > > > >  
> > > > >  	drm_dma_handle_t *status_page_dmah;
> > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > > > index cbbaf26..8b745dc 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > > > @@ -216,7 +216,7 @@
> > > > >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19)
> > > > >  #define   MI_DISPLAY_FLIP_IVB_PLANE_C  (4 << 19)
> > > > >  #define   MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19)
> > > > > -#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6+ */
> > > > > +#define MI_SEMAPHORE_MBOX	MI_INSTR(0x16, 1) /* gen6, gen7 */
> > > > >  #define   MI_SEMAPHORE_GLOBAL_GTT    (1<<22)
> > > > >  #define   MI_SEMAPHORE_UPDATE	    (1<<21)
> > > > >  #define   MI_SEMAPHORE_COMPARE	    (1<<20)
> > > > > @@ -241,6 +241,8 @@
> > > > >  #define   MI_RESTORE_EXT_STATE_EN	(1<<2)
> > > > >  #define   MI_FORCE_RESTORE		(1<<1)
> > > > >  #define   MI_RESTORE_INHIBIT		(1<<0)
> > > > > +#define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
> > > > > +#define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
> > > > >  #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
> > > > >  #define   MI_MEM_VIRTUAL	(1 << 22) /* 965+ only */
> > > > >  #define MI_STORE_DWORD_INDEX	MI_INSTR(0x21, 1)
> > > > > @@ -329,6 +331,7 @@
> > > > >  #define   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE		(1<<10) /* GM45+ only */
> > > > >  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
> > > > >  #define   PIPE_CONTROL_NOTIFY				(1<<8)
> > > > > +#define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
> > > > >  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
> > > > >  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
> > > > >  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > index 37ae2b1..b750835 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -619,6 +619,13 @@ static int init_render_ring(struct intel_ring_buffer *ring)
> > > > >  static void render_ring_cleanup(struct intel_ring_buffer *ring)
> > > > >  {
> > > > >  	struct drm_device *dev = ring->dev;
> > > > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > > > +
> > > > > +	if (dev_priv->semaphore_obj) {
> > > > > +		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> > > > > +		drm_gem_object_unreference(&dev_priv->semaphore_obj->base);
> > > > > +		dev_priv->semaphore_obj = NULL;
> > > > > +	}
> > > > >  
> > > > >  	if (ring->scratch.obj == NULL)
> > > > >  		return;
> > > > > @@ -632,6 +639,86 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring)
> > > > >  	ring->scratch.obj = NULL;
> > > > >  }
> > > > >  
> > > > > +static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
> > > > > +			   unsigned int num_dwords)
> > > > > +{
> > > > > +#define MBOX_UPDATE_DWORDS 8
> > > > > +	struct drm_device *dev = signaller->dev;
> > > > > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > > > > +	struct intel_ring_buffer *waiter;
> > > > > +	int i, ret, num_rings;
> > > > > +
> > > > > +	num_rings = hweight_long(INTEL_INFO(dev)->ring_mask);
> > > > > +	num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS;
> > > > 
> > > > Again num_dwords +=
> > > > 
> > > > > +#undef MBOX_UPDATE_DWORDS
> > > > > +
> > > > > +	/* XXX: + 4 for the caller */
> > > > > +	ret = intel_ring_begin(signaller, num_dwords + 4);
> > > > 
> > > > and the +4 goes away.
> > > > 
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	for_each_ring(waiter, dev_priv, i) {
> > > > > +		u64 gtt_offset = signaller->semaphore.signal_ggtt[i];
> > > > > +		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
> > > > > +			continue;
> > > > > +
> > > > > +		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
> > > > > +		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
> > > > > +					   PIPE_CONTROL_QW_WRITE |
> > > > > +					   PIPE_CONTROL_FLUSH_ENABLE);
> > > > > +		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
> > > > > +		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> > > > > +		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
> > > > > +		intel_ring_emit(signaller, 0);
> > > > > +		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
> > > > > +					   MI_SEMAPHORE_TARGET(waiter->id));
> > > > > +		intel_ring_emit(signaller, 0);
> > > > > +	}
> > > > > +
> > > > > +	WARN_ON(i != num_rings);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > 
> > > > <snip>
> > > 
> > > Got those, thanks.
> > > 
> > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > > index c69ae10..f1e7a66 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > > > > @@ -111,6 +111,39 @@ struct  intel_ring_buffer {
> > > > >  #define I915_DISPATCH_PINNED 0x2
> > > > >  	void		(*cleanup)(struct intel_ring_buffer *ring);
> > > > >  
> > > > > +	/* GEN8 signal/wait table
> > > > > +	 *	  signal to  signal to    signal to   signal to
> > > > > +	 *	    RCS         VCS          BCS        VECS
> > > > > +	 *      ------------------------------------------------------
> > > > > +	 *  RCS | NOP (0x00) | BCS (0x08) | VCS (0x10) | VECS (0x18) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 *  VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 *  BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) |  NOP (0x78) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 *
> > > > > +	 * Generalization:
> > > > > +	 *  f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id)
> > > > > +	 *  ie. transpose of g(x, y)
> > > > > +	 *
> > > > > +	 *	 sync from   sync from    sync from    sync from
> > > > > +	 *	    RCS         VCS          BCS        VECS
> > > > > +	 *      ------------------------------------------------------
> > > > > +	 *  RCS | NOP (0x00) | BCS (0x20) | VCS (0x40) | VECS (0x60) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 *  VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 *  BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x60) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) |  NOP (0x78) |
> > > > > +	 *	|-----------------------------------------------------
> > > > > +	 *
> > > > > +	 * Generalization:
> > > > > +	 *  g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id)
> > > > > +	 *  ie. transpose of f(x, y)
> > > > > +	 */
> > > > >  	struct {
> > > > >  		u32	sync_seqno[I915_NUM_RINGS-1];
> > > > >  		/* AKA wait() */
> > > > > @@ -120,7 +153,10 @@ struct  intel_ring_buffer {
> > > > >  		/* our mbox written by others */
> > > > >  		u32		mbox[I915_NUM_RINGS];
> > > > 
> > > > mbox should also get a u64 friend, right?
> > > 
> > > mbox should be gen6 only, given the change to using the gtt on gen8. In
> > > this point in the series, semaphores should be forcibly disabled on
> > > gen8, so the code looks wrong, but the path cannot [should not] be
> > > taken.
> > > 
> > > I suppose I should kill the initialization of mbox for gen8, or somehow
> > > consolidate with a union to prevent confusion.
> > > 
> > 
> > Just to clarify it should be
> > 
> > gen6:
> > signal uses signal_mbox for signal
> > wait uses mbox
> > 
> > gen8:
> > signal uses signal_ggtt for signal
> > wait uses arithmetic to figure out the offset
> > 
> 
> Ok, I've fixed this up to make things clearer, but it ends up in the
> next patch. So look there when I repost.

I was confused by these:
+ ring->semaphore.mbox[RCS] = GEN8_WAIT_OFFSET(ring, RCS);

So you did store the wait offset into mbox, but then you didn't use the
precomputed values and instead recomputed on the spot in gen8
ring_sync().
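
For reference, the f(x, y) and g(x, y) generalizations from the comment
quoted above can be worked through in a small standalone sketch. The enum
values, SEQNO_SIZE and both function names are illustrative assumptions
(ring ids are taken to be RCS=0, VCS=1, BCS=2, VECS=3), and the results are
offsets relative to the start of the semaphore object, without the GGTT
base that GEN8_SIGNAL_OFFSET() and GEN8_WAIT_OFFSET() add:

```c
#include <stdint.h>

/* Illustrative constants, not the driver's own definitions. */
enum ring_id { RCS = 0, VCS = 1, BCS = 2, VECS = 3, NUM_RINGS = 4 };
#define SEQNO_SIZE sizeof(uint64_t)

/* f(x, y): slot that ring `from` writes when it signals ring `to`. */
static inline uint32_t signal_offset(enum ring_id from, enum ring_id to)
{
	return (uint32_t)(from * NUM_RINGS * SEQNO_SIZE + SEQNO_SIZE * to);
}

/* g(x, y): slot that ring `waiter` reads when it syncs to ring `from`. */
static inline uint32_t wait_offset(enum ring_id waiter, enum ring_id from)
{
	return (uint32_t)(from * NUM_RINGS * SEQNO_SIZE + SEQNO_SIZE * waiter);
}
```

With these definitions signal_offset(VCS, BCS) is 0x30 and
wait_offset(VCS, RCS) equals signal_offset(RCS, VCS), so a wait offset can
be derived on the spot or read back from a precomputed table with the same
result. By the g(x, y) formula, the BCS row / VECS column cell of the
second table would be 0x70 rather than the 0x60 shown.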


> 
> > > > 
> > > > >  		/* mboxes this ring signals to */
> > > > > -		u32		signal_mbox[I915_NUM_RINGS];
> > > > > +		union {
> > > > > +			u32		signal_mbox[I915_NUM_RINGS];
> > > > > +			u64		signal_ggtt[I915_NUM_RINGS];
> > > > > +		};
> > > > >  
> > > > >  		/* num_dwords is space the caller will need for atomic update */
> > > > >  		int		(*signal)(struct intel_ring_buffer *signaller,
> > > > > -- 
> > > > > 1.8.5.3
> > > > > 
> > > > 
> > > > -- 
> > > > Ville Syrjälä
> > > > Intel OTC
> > > 
> > > -- 
> > > Ben Widawsky, Intel Open Source Technology Center
> > 
> > -- 
> > Ben Widawsky, Intel Open Source Technology Center
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center

-- 
Ville Syrjälä
Intel OTC


* [PATCH 11/13] drm/i915/bdw: collect semaphore error state
  2014-04-29 21:52 [PATCH 00/13] [REPOST] BDW Semaphores Ben Widawsky
@ 2014-04-29 21:52 ` Ben Widawsky
  0 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2014-04-29 21:52 UTC (permalink / raw)
  To: Intel GFX

Since the semaphore information is in an object, just dump it, and let
the user parse it later.

NOTE: The page being used for the semaphores is incoherent with the
CPU. No matter what I do, I cannot figure out a way to read anything but
0s. Note that the semaphore waits are indeed working.

v2: Don't print signal and wait (they should be the same). Instead,
print sync_seqno (Chris)

v3: Free the semaphore error object (Chris)

v4: Fix semaphore offset calculation during error state collection
(Ville)

v5: VCS2 rebase
Make semaphore object error capture coding style consistent (Ville)
Do the proper math for the signal offset (Ville)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c   | 51 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 14 ++++-----
 3 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 44cb744..237faf3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -328,6 +328,7 @@ struct drm_i915_error_state {
 	u64 fence[I915_MAX_NUM_FENCES];
 	struct intel_overlay_error_state *overlay;
 	struct intel_display_error_state *display;
+	struct drm_i915_error_object *semaphore_obj;
 
 	struct drm_i915_error_ring {
 		bool valid;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a7eaab2..50d2af8 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -326,6 +326,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 	struct drm_device *dev = error_priv->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_error_state *error = error_priv->error;
+	struct drm_i915_error_object *obj;
 	int i, j, offset, elt;
 	int max_hangcheck_score;
 
@@ -394,8 +395,6 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				    error->pinned_bo_count[0]);
 
 	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
-		struct drm_i915_error_object *obj;
-
 		obj = error->ring[i].batchbuffer;
 		if (obj) {
 			err_puts(m, dev_priv->ring[i].name);
@@ -458,6 +457,18 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		}
 	}
 
+	if ((obj = error->semaphore_obj)) {
+		err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
+			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
+				   elt * 4,
+				   obj->pages[0][elt],
+				   obj->pages[0][elt+1],
+				   obj->pages[0][elt+2],
+				   obj->pages[0][elt+3]);
+		}
+	}
+
 	if (error->overlay)
 		intel_overlay_print_error_state(m, error->overlay);
 
@@ -528,6 +539,7 @@ static void i915_error_state_free(struct kref *error_ref)
 		kfree(error->ring[i].requests);
 	}
 
+	i915_error_object_free(error->semaphore_obj);
 	kfree(error->active_bo);
 	kfree(error->overlay);
 	kfree(error->display);
@@ -745,6 +757,33 @@ static void i915_gem_record_fences(struct drm_device *dev,
 }
 
 
+static void gen8_record_semaphore_state(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error,
+					struct intel_ring_buffer *ring,
+					struct drm_i915_error_ring *ering)
+{
+	struct intel_ring_buffer *useless;
+	int i;
+
+	if (!i915_semaphore_is_enabled(dev_priv->dev))
+		return;
+
+	if (!error->semaphore_obj)
+		error->semaphore_obj =
+			i915_error_object_create(dev_priv,
+						 dev_priv->semaphore_obj,
+						 &dev_priv->gtt.base);
+
+	for_each_ring(useless, dev_priv, i) {
+		u16 signal_offset =
+			(GEN8_SIGNAL_OFFSET(ring, i) & PAGE_MASK) / 4;
+		u32 *tmp = error->semaphore_obj->pages[0];
+
+		ering->semaphore_mboxes[i] = tmp[signal_offset];
+		ering->semaphore_seqno[i] = ring->semaphore.sync_seqno[i];
+	}
+}
+
 static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
 					struct intel_ring_buffer *ring,
 					struct drm_i915_error_ring *ering)
@@ -762,6 +801,7 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
 }
 
 static void i915_record_ring_state(struct drm_device *dev,
+				   struct drm_i915_error_state *error,
 				   struct intel_ring_buffer *ring,
 				   struct drm_i915_error_ring *ering)
 {
@@ -770,7 +810,10 @@ static void i915_record_ring_state(struct drm_device *dev,
 	if (INTEL_INFO(dev)->gen >= 6) {
 		ering->rc_psmi = I915_READ(ring->mmio_base + 0x50);
 		ering->fault_reg = I915_READ(RING_FAULT_REG(ring));
-		gen6_record_semaphore_state(dev_priv, ring, ering);
+		if (INTEL_INFO(dev)->gen >= 8)
+			gen8_record_semaphore_state(dev_priv, error, ring, ering);
+		else
+			gen6_record_semaphore_state(dev_priv, ring, ering);
 	}
 
 	if (INTEL_INFO(dev)->gen >= 4) {
@@ -897,7 +940,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 
 		error->ring[i].valid = true;
 
-		i915_record_ring_state(dev, ring, &error->ring[i]);
+		i915_record_ring_state(dev, error, ring, &error->ring[i]);
 
 		error->ring[i].pid = -1;
 		request = i915_gem_find_active_request(ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a7ff166..20af934 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -40,9 +40,9 @@ struct  intel_hw_status_page {
  * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
  */
 #define i915_semaphore_seqno_size sizeof(uint64_t)
-#define GEN8_SIGNAL_OFFSET(to) \
+#define GEN8_SIGNAL_OFFSET(__ring, to) \
 	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
-	(ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
+	((__ring)->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \
 	(i915_semaphore_seqno_size * (to)))
 
 #define GEN8_WAIT_OFFSET(__ring, from) \
@@ -54,11 +54,11 @@ struct  intel_hw_status_page {
 	if (!dev_priv->semaphore_obj) { \
 		break; \
 	} \
-	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \
-	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \
-	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \
-	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \
-	ring->semaphore.signal_ggtt[VCS2] = GEN8_SIGNAL_OFFSET(VCS2); \
+	ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(ring, RCS); \
+	ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(ring, VCS); \
+	ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(ring, BCS); \
+	ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(ring, VECS); \
+	ring->semaphore.signal_ggtt[VCS2] = GEN8_SIGNAL_OFFSET(ring, VCS2); \
 	ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \
 	} while(0)
 
-- 
1.9.2
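
The v4/v5 notes concern how the signal offset is turned into an index into
the captured page. A minimal sketch of that arithmetic follows, assuming
PAGE_MASK has its usual Linux definition of ~(PAGE_SIZE - 1) and a 4 KiB
page; the macros are redefined locally and both function names are
hypothetical:

```c
#include <stdint.h>

/* Local stand-ins for the kernel macros (assumed 4 KiB pages). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Dword index of a GGTT address within its single backing page. */
static inline uint32_t dword_index_in_page(uint64_t ggtt_offset)
{
	return (uint32_t)((ggtt_offset & (PAGE_SIZE - 1)) / sizeof(uint32_t));
}

/* What "(offset & PAGE_MASK) / 4" stored in a u16 evaluates to. */
static inline uint16_t posted_expression(uint64_t ggtt_offset)
{
	return (uint16_t)((ggtt_offset & PAGE_MASK) / 4);
}
```

For a GGTT address of 0x123030 the first helper yields dword 12, while the
expression as posted yields 0x8c00 after truncation to u16. If the
assumption about PAGE_MASK holds, masking with (PAGE_SIZE - 1) may be what
the error capture wants here.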


end of thread, other threads:[~2014-04-29 21:52 UTC | newest]

Thread overview: 40+ messages
2014-01-29 19:55 [PATCH 00/13] [REPOST] Broadwell HW semaphores Ben Widawsky
2014-01-29 19:55 ` [PATCH 01/13] drm/i915: Move semaphore specific ring members to struct Ben Widawsky
2014-01-29 19:55 ` [PATCH 02/13] drm/i915: Virtualize the ringbuffer signal func Ben Widawsky
2014-01-29 19:55 ` [PATCH 03/13] drm/i915: Move ring_begin to signal() Ben Widawsky
2014-01-29 19:55 ` [PATCH 04/13] drm/i915: Make semaphore updates more precise Ben Widawsky
2014-01-30 11:25   ` Ville Syrjälä
2014-02-11 16:08     ` Ben Widawsky
2014-02-11 17:13       ` Ville Syrjälä
2014-02-11 20:20   ` [PATCH] [v2] " Ben Widawsky
2014-02-11 20:53     ` Ville Syrjälä
2014-02-11 21:50       ` Ben Widawsky
2014-01-29 19:55 ` [PATCH 05/13] drm/i915: gen specific ring init Ben Widawsky
2014-01-29 19:55 ` [PATCH 06/13] drm/i915/bdw: implement semaphore signal Ben Widawsky
2014-01-30 12:38   ` Ville Syrjälä
2014-01-30 12:46     ` Chris Wilson
2014-01-30 13:18       ` Daniel Vetter
2014-01-30 13:25         ` Chris Wilson
2014-01-30 13:35         ` Chris Wilson
2014-02-11 21:48           ` Ben Widawsky
2014-02-11 22:23             ` Chris Wilson
2014-02-11 22:25               ` Ben Widawsky
2014-02-11 22:28                 ` Chris Wilson
2014-02-11 22:11     ` Ben Widawsky
2014-02-11 22:22       ` Ben Widawsky
2014-02-11 23:01         ` Ben Widawsky
2014-02-12  9:29           ` Ville Syrjälä
2014-01-29 19:55 ` [PATCH 07/13] drm/i915/bdw: implement semaphore wait Ben Widawsky
2014-01-30 12:48   ` Ville Syrjälä
2014-01-29 19:55 ` [PATCH 08/13] drm/i915: FORCE_RESTORE for gen8 semaphores Ben Widawsky
2014-01-29 19:55 ` [PATCH 09/13] drm/i915/bdw: poll semaphores Ben Widawsky
2014-01-30 13:26   ` Ville Syrjälä
2014-01-29 19:55 ` [PATCH 10/13] drm/i915: Extract semaphore error collection Ben Widawsky
2014-01-29 19:55 ` [PATCH 11/13] drm/i915/bdw: collect semaphore error state Ben Widawsky
2014-01-30 14:53   ` Ville Syrjälä
2014-01-30 14:58     ` Chris Wilson
2014-02-12  0:19       ` Ben Widawsky
2014-02-12  0:23     ` Ben Widawsky
2014-01-29 19:55 ` [PATCH 12/13] drm/i915: unleash semaphores on gen8 Ben Widawsky
2014-01-29 19:55 ` [PATCH 13/13] drm/i915: semaphore debugfs Ben Widawsky
2014-04-29 21:52 [PATCH 00/13] [REPOST] BDW Semaphores Ben Widawsky
2014-04-29 21:52 ` [PATCH 11/13] drm/i915/bdw: collect semaphore error state Ben Widawsky
