* [PATCH 0/8] DPF (GPU l3 parity detection) improvements
@ 2013-09-13  5:28 Ben Widawsky
  2013-09-13  5:28 ` [PATCH 1/8] drm/i915: Remove extra "ring" Ben Widawsky
                   ` (17 more replies)
  0 siblings, 18 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, vishnu.venkatesh

Since IVB, our driver has supported GPU L3 cacheline remapping for
parity errors. This is known as "DPF", for Dynamic Parity Feature. I am
told such an error is a good predictor of a subsequent error in the
same part of the cache. To address this for workloads that require
precise and correct data, such as GPGPU workloads, the HW has extra
space in the cache which can be dynamically remapped to stand in for
the old, faulting parts of the cache. I should also note that, to my
knowledge, no such error has actually been seen on either Ivybridge or
Haswell in the wild.

Note, and reminder: GPU L3 is not the same thing as "L3." It is a
special (usually incoherent) cache that is only used by certain
components within the GPU.

Included in the patches:
1. Fix HSW test cases previously submitted and bikeshedded by Ville.
2. Support for an extra area of L3 added in certain HSW SKUs.
3. Error injection support from user space, for testing.
4. A reference daemon for listening to the parity error events.

Caveats:
* I've not implemented the "hang" injection. It was not clear to me what it
  does, and I don't really see how it benefits testing the software I have
  written.

* I am currently missing a test which uses the error injection.
  Volunteers who want to help, please raise your hand. If not, I'll get
  to it as soon as possible.

* We do have a race with the udev mechanism of error delivery. If I
  understand the way udev works, if we have more than 1 event before the
  daemon is woken, the properties will give us the failing cache location
  of the last error only. I think this is okay because of the earlier
  statement that a parity error is a good indicator of a future parity
  error. One thing I've not done is track when errors are missed, which
  should be possible even if the location of a missed error can't be
  retrieved.

* There is no way to read out the per context remapping information through
  sysfs. I only expose whether or not a context has outstanding remaps through
  debugfs. This does affect the testability a bit, but the implementation is
  simple enough that I'm not terribly worried.

Ben Widawsky (8):
  drm/i915: Remove extra "ring"
  drm/i915: Round l3 parity reads down
  drm/i915: Fix l3 parity user buffer offset
  drm/i915: Fix HSW parity test
  drm/i915: Add second slice l3 remapping
  drm/i915: Make l3 remapping use the ring
  drm/i915: Keep a list of all contexts
  drm/i915: Do remaps for all contexts

 drivers/gpu/drm/i915/i915_debugfs.c     | 23 ++++++---
 drivers/gpu/drm/i915/i915_drv.h         | 13 +++--
 drivers/gpu/drm/i915/i915_gem.c         | 46 +++++++++---------
 drivers/gpu/drm/i915/i915_gem_context.c | 20 +++++++-
 drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_reg.h         |  6 +++
 drivers/gpu/drm/i915/i915_sysfs.c       | 57 +++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
 include/uapi/drm/i915_drm.h             |  8 ++--
 9 files changed, 175 insertions(+), 88 deletions(-)

-- 
1.8.4


* [PATCH 1/8] drm/i915: Remove extra "ring"
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 2/8] drm/i915: Round l3 parity reads down Ben Widawsky
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Sadly, this isn't the first time we've done this:
http://lists.freedesktop.org/archives/intel-gfx/2013-June/029065.html

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9ac4e31..1d77624 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1462,7 +1462,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 
 	for_each_ring(ring, dev_priv, i) {
 		if (ring->default_context) {
-			seq_printf(m, "HW default context %s ring ", ring->name);
+			seq_printf(m, "HW default context %s ", ring->name);
 			describe_obj(m, ring->default_context->obj);
 			seq_putc(m, '\n');
 		}
-- 
1.8.4


* [PATCH 2/8] drm/i915: Round l3 parity reads down
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
  2013-09-13  5:28 ` [PATCH 1/8] drm/i915: Remove extra "ring" Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset Ben Widawsky
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

We always read a register for l3 parity reads, and we don't really want
to ever let userspace trick us into giving back less than the dword.

Writes are okay because we assume everything will be 0 filled, and as
such, if a user really wants to write less than a dword, let them.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index c8c4112..9070f50 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -121,6 +121,8 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	uint32_t misccpctl;
 	int i, ret;
 
+	count = round_down(count, 4);
+
 	ret = l3_access_valid(drm_dev, offset);
 	if (ret)
 		return ret;
-- 
1.8.4
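
The rounding added in the hunk above can be tried outside the kernel.
What follows is a minimal userspace sketch, not the driver code:
round_down_pow2() is a hypothetical stand-in for the kernel's
round_down() macro and assumes the alignment is a power of two, and
l3_read_len() is a name made up for illustration.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's round_down() macro.
 * Assumes align is a power of two. */
static size_t round_down_pow2(size_t x, size_t align)
{
	return x & ~(align - 1);
}

/* Number of bytes a dword-granular read would hand back for a request
 * of the given size. Requests shorter than one dword yield 0 bytes. */
static size_t l3_read_len(size_t requested)
{
	return round_down_pow2(requested, 4);
}
```

With this, a 7 byte request is trimmed to 4 bytes and a 3 byte request
to 0, so a partial dword is never returned.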


* [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
  2013-09-13  5:28 ` [PATCH 1/8] drm/i915: Remove extra "ring" Ben Widawsky
  2013-09-13  5:28 ` [PATCH 2/8] drm/i915: Round l3 parity reads down Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13 12:56   ` Daniel Vetter
  2013-09-13  5:28 ` [PATCH 4/8] drm/i915: Fix HSW parity test Ben Widawsky
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

The buf pointer used during l3_write is just char *, therefore it does
not require any silly addition of offset.

v2: Also fix i915_l3_read, using logic suggested by Ville

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_sysfs.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 9070f50..d572435 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -127,6 +127,8 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	if (ret)
 		return ret;
 
+	count = min_t(int, GEN7_L3LOG_SIZE-offset, count);
+
 	ret = i915_mutex_lock_interruptible(drm_dev);
 	if (ret)
 		return ret;
@@ -134,14 +136,14 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	misccpctl = I915_READ(GEN7_MISCCPCTL);
 	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
 
-	for (i = offset; count >= 4 && i < GEN7_L3LOG_SIZE; i += 4, count -= 4)
-		*((uint32_t *)(&buf[i])) = I915_READ(GEN7_L3LOG_BASE + i);
+	for (i = 0; i < count; i += 4)
+		*((uint32_t *)(&buf[i])) = I915_READ(GEN7_L3LOG_BASE + offset + i);
 
 	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
 
 	mutex_unlock(&drm_dev->struct_mutex);
 
-	return i - offset;
+	return i;
 }
 
 static ssize_t
@@ -186,9 +188,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 	if (temp)
 		dev_priv->l3_parity.remap_info = temp;
 
-	memcpy(dev_priv->l3_parity.remap_info + (offset/4),
-	       buf + (offset/4),
-	       count);
+	memcpy(dev_priv->l3_parity.remap_info + (offset/4), buf, count);
 
 	i915_gem_l3_remap(drm_dev);
 
-- 
1.8.4
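
The indexing convention this patch settles on (destination buffer
indexed from 0, only the register side advanced by offset) can be shown
with a small userspace sketch. Everything here is an assumption made
for illustration: fake_l3log stands in for the register file that the
driver reads with I915_READ(), sketch_l3_read() is a hypothetical name,
and the validity check at the top is a guess at what a bounds and
alignment check would look like, not a copy of l3_access_valid().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L3LOG_SIZE 0x80 /* mirrors GEN7_L3LOG_SIZE from the series */

/* Hypothetical stand-in for the L3 log registers. */
static uint32_t fake_l3log[L3LOG_SIZE / 4];

/* Copy up to count bytes, starting at register offset `offset`, into
 * buf. buf is filled from index 0; only the source side uses offset.
 * Returns the number of bytes copied. */
static size_t sketch_l3_read(char *buf, size_t offset, size_t count)
{
	size_t i;

	/* Assumed validity check: in range and dword aligned. */
	if (offset >= L3LOG_SIZE || (offset & 3))
		return 0;

	count &= ~(size_t)3;                 /* whole dwords only */
	if (count > L3LOG_SIZE - offset)
		count = L3LOG_SIZE - offset; /* clamp to end of the log */

	for (i = 0; i < count; i += 4)
		memcpy(&buf[i], &fake_l3log[(offset + i) / 4], 4);

	return i;
}
```

Because the loop counter starts at 0, it doubles as the byte count to
return, which is the same simplification the patch makes with
"return i".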

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 4/8] drm/i915: Fix HSW parity test
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (2 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  8:17   ` Ville Syrjälä
  2013-09-13  5:28 ` [PATCH 5/8] drm/i915: Add second slice l3 remapping Ben Widawsky
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Haswell changed the log registers to be WO, so we can no longer read
them to determine the programming (which sucks, see later note). For
now, simply use the cached value, and hope HW doesn't screw us over.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57441
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_sysfs.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index d572435..43c2e81 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -133,6 +133,19 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	if (ret)
 		return ret;
 
+	if (IS_HASWELL(drm_dev)) {
+		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
+		if ((!dev_priv->l3_parity.remap_info))
+			memset(buf + offset, 0, last - offset);
+		else
+			memcpy(buf + offset,
+			       dev_priv->l3_parity.remap_info + (offset/4),
+			       last - offset);
+
+		i = last;
+		goto out;
+	}
+
 	misccpctl = I915_READ(GEN7_MISCCPCTL);
 	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
 
@@ -141,6 +154,7 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 
 	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
 
+out:
 	mutex_unlock(&drm_dev->struct_mutex);
 
 	return i;
-- 
1.8.4
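
Serving reads from a cached copy when the registers are write-only can
be sketched in userspace as follows. This is an illustration under
stated assumptions, not the hunk above: sketch_cached_read() is a
hypothetical name, the shadow pointer plays the role of
l3_parity.remap_info, and the sketch fills the destination buffer from
index 0, following the convention of patch 3, which is a choice made
for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L3LOG_SIZE 0x80 /* mirrors GEN7_L3LOG_SIZE from the series */

/* Read from a software shadow of write-only registers.
 * shadow may be NULL when nothing has been programmed yet, in which
 * case the caller sees zeros. Returns the number of bytes produced. */
static size_t sketch_cached_read(char *buf, const uint32_t *shadow,
				 size_t offset, size_t count)
{
	/* Assumed validity check: in range and dword aligned. */
	if (offset >= L3LOG_SIZE || (offset & 3))
		return 0;

	count &= ~(size_t)3;                 /* whole dwords only */
	if (count > L3LOG_SIZE - offset)
		count = L3LOG_SIZE - offset; /* clamp to end of the log */

	if (!shadow)
		memset(buf, 0, count);
	else
		memcpy(buf, shadow + offset / 4, count);

	return count;
}
```

The trade-off is the one the commit message names: the value returned
is whatever software last wrote, so it is only as trustworthy as the
assumption that the hardware kept that programming.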


* [PATCH 5/8] drm/i915: Add second slice l3 remapping
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (3 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 4/8] drm/i915: Fix HSW parity test Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  9:38   ` Ville Syrjälä
  2013-09-13  5:28 ` [PATCH 6/8] drm/i915: Make l3 remapping use the ring Ben Widawsky
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Certain HSW SKUs have a second bank of L3. This second bank has its own
remapping register set and its own interrupt, separate from those of the
first "slice". A slice is simply a term to define some subset of the
GPU's l3 cache. This patch implements both the interrupt handler and the
ability to communicate with userspace about this second slice.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h         |  9 ++--
 drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++----
 drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_reg.h         |  6 +++
 drivers/gpu/drm/i915/i915_sysfs.c       | 34 ++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
 include/uapi/drm/i915_drm.h             |  8 ++--
 7 files changed, 115 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 81ba5bb..eb90461 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -918,9 +918,11 @@ struct i915_ums_state {
 	int mm_suspended;
 };
 
+#define MAX_L3_SLICES 2
 struct intel_l3_parity {
-	u32 *remap_info;
+	u32 *remap_info[MAX_L3_SLICES];
 	struct work_struct error_work;
+	int which_slice;
 };
 
 struct i915_gem_mm {
@@ -1686,7 +1688,8 @@ struct drm_i915_file_private {
 
 #define HAS_FORCE_WAKE(dev) (INTEL_INFO(dev)->has_force_wake)
 
-#define HAS_L3_GPU_CACHE(dev) (IS_IVYBRIDGE(dev) || IS_HASWELL(dev))
+#define HAS_L3_GPU_CACHE(dev) (INTEL_INFO(dev)->gen >= 7)
+#define NUM_L3_SLICES(dev) (IS_HSW_GT3(dev) ? 2 : HAS_L3_GPU_CACHE(dev))
 
 #define GT_FREQUENCY_MULTIPLIER 50
 
@@ -1947,7 +1950,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
-void i915_gem_l3_remap(struct drm_device *dev);
+void i915_gem_l3_remap(struct drm_device *dev, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
 void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5b510a3..b11f7d6c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4256,16 +4256,21 @@ i915_gem_idle(struct drm_device *dev)
 	return 0;
 }
 
-void i915_gem_l3_remap(struct drm_device *dev)
+void i915_gem_l3_remap(struct drm_device *dev, int slice)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
+	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
+	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
 	u32 misccpctl;
 	int i;
 
 	if (!HAS_L3_GPU_CACHE(dev))
 		return;
 
-	if (!dev_priv->l3_parity.remap_info)
+	if (NUM_L3_SLICES(dev) < 2 && slice)
+		return;
+
+	if (!remap_info)
 		return;
 
 	misccpctl = I915_READ(GEN7_MISCCPCTL);
@@ -4273,17 +4278,17 @@ void i915_gem_l3_remap(struct drm_device *dev)
 	POSTING_READ(GEN7_MISCCPCTL);
 
 	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
-		u32 remap = I915_READ(GEN7_L3LOG_BASE + i);
-		if (remap && remap != dev_priv->l3_parity.remap_info[i/4])
+		u32 remap = I915_READ(reg_base + i);
+		if (remap && remap != remap_info[i/4])
 			DRM_DEBUG("0x%x was already programmed to %x\n",
-				  GEN7_L3LOG_BASE + i, remap);
-		if (remap && !dev_priv->l3_parity.remap_info[i/4])
+				  reg_base + i, remap);
+		if (remap && !remap_info[i/4])
 			DRM_DEBUG_DRIVER("Clearing remapped register\n");
-		I915_WRITE(GEN7_L3LOG_BASE + i, dev_priv->l3_parity.remap_info[i/4]);
+		I915_WRITE(reg_base + i, remap_info[i/4]);
 	}
 
 	/* Make sure all the writes land before disabling dop clock gating */
-	POSTING_READ(GEN7_L3LOG_BASE);
+	POSTING_READ(reg_base);
 
 	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
 }
@@ -4377,7 +4382,7 @@ int
 i915_gem_init_hw(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
-	int ret;
+	int ret, i;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4396,7 +4401,8 @@ i915_gem_init_hw(struct drm_device *dev)
 		I915_WRITE(GEN7_MSG_CTL, temp);
 	}
 
-	i915_gem_l3_remap(dev);
+	for (i = 0; i < NUM_L3_SLICES(dev); i++)
+		i915_gem_l3_remap(dev, i);
 
 	i915_gem_init_swizzling(dev);
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 13d26cf..62cdf05 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -882,9 +882,10 @@ static void ivybridge_parity_work(struct work_struct *work)
 	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
 						    l3_parity.error_work);
 	u32 error_status, row, bank, subbank;
-	char *parity_event[5];
+	char *parity_event[6];
 	uint32_t misccpctl;
 	unsigned long flags;
+	uint8_t slice = 0;
 
 	/* We must turn off DOP level clock gating to access the L3 registers.
 	 * In order to prevent a get/put style interface, acquire struct mutex
@@ -892,45 +893,63 @@ static void ivybridge_parity_work(struct work_struct *work)
 	 */
 	mutex_lock(&dev_priv->dev->struct_mutex);
 
+	/* If we've screwed up tracking, just let the interrupt fire again */
+	if (WARN_ON(!dev_priv->l3_parity.which_slice))
+		goto out;
+
 	misccpctl = I915_READ(GEN7_MISCCPCTL);
 	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
 	POSTING_READ(GEN7_MISCCPCTL);
 
-	error_status = I915_READ(GEN7_L3CDERRST1);
-	row = GEN7_PARITY_ERROR_ROW(error_status);
-	bank = GEN7_PARITY_ERROR_BANK(error_status);
-	subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
+	while ((slice = ffs(dev_priv->l3_parity.which_slice)) != 0) {
+		u32 reg;
 
-	I915_WRITE(GEN7_L3CDERRST1, GEN7_PARITY_ERROR_VALID |
-				    GEN7_L3CDERRST1_ENABLE);
-	POSTING_READ(GEN7_L3CDERRST1);
+		if (WARN_ON(slice >= MAX_L3_SLICES))
+			break;
 
-	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
+		dev_priv->l3_parity.which_slice &= ~(1<<slice);
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	ilk_enable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+		reg = GEN7_L3CDERRST1 + (slice * 0x200);
 
-	mutex_unlock(&dev_priv->dev->struct_mutex);
+		error_status = I915_READ(reg);
+		row = GEN7_PARITY_ERROR_ROW(error_status);
+		bank = GEN7_PARITY_ERROR_BANK(error_status);
+		subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
+
+		I915_WRITE(reg, GEN7_PARITY_ERROR_VALID | GEN7_L3CDERRST1_ENABLE);
+		POSTING_READ(reg);
+
+		parity_event[0] = I915_L3_PARITY_UEVENT "=1";
+		parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
+		parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
+		parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
+		parity_event[4] = kasprintf(GFP_KERNEL, "SLICE=%d", slice);
+		parity_event[5] = NULL;
 
-	parity_event[0] = I915_L3_PARITY_UEVENT "=1";
-	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
-	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
-	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
-	parity_event[4] = NULL;
+		kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
+				   KOBJ_CHANGE, parity_event);
 
-	kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
-			   KOBJ_CHANGE, parity_event);
+		DRM_DEBUG("Parity error: Slice = %d, Row = %d, Bank = %d, Sub bank = %d.\n",
+			  slice, row, bank, subbank);
 
-	DRM_DEBUG("Parity error: Row = %d, Bank = %d, Sub bank = %d.\n",
-		  row, bank, subbank);
+		kfree(parity_event[4]);
+		kfree(parity_event[3]);
+		kfree(parity_event[2]);
+		kfree(parity_event[1]);
+	}
+
+	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
+
+out:
+	WARN_ON(dev_priv->l3_parity.which_slice);
+	spin_lock_irqsave(&dev_priv->irq_lock, flags);
+	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);
+	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
 
-	kfree(parity_event[3]);
-	kfree(parity_event[2]);
-	kfree(parity_event[1]);
+	mutex_unlock(&dev_priv->dev->struct_mutex);
 }
 
-static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
+static void ivybridge_parity_error_irq_handler(struct drm_device *dev, u32 iir)
 {
 	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
 
@@ -938,9 +957,12 @@ static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
 		return;
 
 	spin_lock(&dev_priv->irq_lock);
-	ilk_disable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
+	ilk_disable_gt_irq(dev_priv, GT_PARITY_ERROR);
 	spin_unlock(&dev_priv->irq_lock);
 
+	iir &= GT_PARITY_ERROR;
+	dev_priv->l3_parity.which_slice =
+		1 << (iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 ? 1 : 0);
 	queue_work(dev_priv->wq, &dev_priv->l3_parity.error_work);
 }
 
@@ -975,8 +997,8 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 		i915_handle_error(dev, false);
 	}
 
-	if (gt_iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT)
-		ivybridge_parity_error_irq_handler(dev);
+	if (gt_iir & GT_PARITY_ERROR)
+		ivybridge_parity_error_irq_handler(dev, gt_iir);
 }
 
 #define HPD_STORM_DETECT_PERIOD 1000
@@ -2261,8 +2283,8 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
 	dev_priv->gt_irq_mask = ~0;
 	if (HAS_L3_GPU_CACHE(dev)) {
 		/* L3 parity interrupt is always unmasked. */
-		dev_priv->gt_irq_mask = ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
-		gt_irqs |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
+		dev_priv->gt_irq_mask = ~GT_PARITY_ERROR;
+		gt_irqs |= GT_PARITY_ERROR;
 	}
 
 	gt_irqs |= GT_RENDER_USER_INTERRUPT;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index bcee89b..4155a1d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -927,6 +927,7 @@
 #define GT_BLT_USER_INTERRUPT			(1 << 22)
 #define GT_BSD_CS_ERROR_INTERRUPT		(1 << 15)
 #define GT_BSD_USER_INTERRUPT			(1 << 12)
+#define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1 << 11) /* hsw+; rsvd on snb, ivb, vlv */
 #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1 <<  5) /* !snb */
 #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT	(1 <<  4)
 #define GT_RENDER_CS_MASTER_ERROR_INTERRUPT	(1 <<  3)
@@ -937,6 +938,9 @@
 #define PM_VEBOX_CS_ERROR_INTERRUPT		(1 << 12) /* hsw+ */
 #define PM_VEBOX_USER_INTERRUPT			(1 << 10) /* hsw+ */
 
+#define GT_PARITY_ERROR				(GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 | \
+						 GT_RENDER_L3_PARITY_ERROR_INTERRUPT)
+
 /* These are all the "old" interrupts */
 #define ILK_BSD_USER_INTERRUPT				(1<<5)
 #define I915_PIPE_CONTROL_NOTIFY_INTERRUPT		(1<<18)
@@ -4742,6 +4746,7 @@
 
 /* IVYBRIDGE DPF */
 #define GEN7_L3CDERRST1			0xB008 /* L3CD Error Status 1 */
+#define HSW_L3CDERRST11			0xB208 /* L3CD Error Status register 1 slice 1 */
 #define   GEN7_L3CDERRST1_ROW_MASK	(0x7ff<<14)
 #define   GEN7_PARITY_ERROR_VALID	(1<<13)
 #define   GEN7_L3CDERRST1_BANK_MASK	(3<<11)
@@ -4755,6 +4760,7 @@
 #define   GEN7_L3CDERRST1_ENABLE	(1<<7)
 
 #define GEN7_L3LOG_BASE			0xB070
+#define HSW_L3LOG_BASE_SLICE1		0xB270
 #define GEN7_L3LOG_SIZE			0x80
 
 #define GEN7_HALF_SLICE_CHICKEN1	0xe100 /* IVB GT1 + VLV */
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 43c2e81..d208f2d 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -119,6 +119,7 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	struct drm_device *drm_dev = dminor->dev;
 	struct drm_i915_private *dev_priv = drm_dev->dev_private;
 	uint32_t misccpctl;
+	int slice = (int)(uintptr_t)attr->private;
 	int i, ret;
 
 	count = round_down(count, 4);
@@ -135,11 +136,11 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 
 	if (IS_HASWELL(drm_dev)) {
 		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
-		if ((!dev_priv->l3_parity.remap_info))
+		if ((!dev_priv->l3_parity.remap_info[slice]))
 			memset(buf + offset, 0, last - offset);
 		else
 			memcpy(buf + offset,
-			       dev_priv->l3_parity.remap_info + (offset/4),
+			       dev_priv->l3_parity.remap_info[slice] + (offset/4),
 			       last - offset);
 
 		i = last;
@@ -170,6 +171,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 	struct drm_device *drm_dev = dminor->dev;
 	struct drm_i915_private *dev_priv = drm_dev->dev_private;
 	u32 *temp = NULL; /* Just here to make handling failures easy */
+	int slice = (int)(uintptr_t)attr->private;
 	int ret;
 
 	ret = l3_access_valid(drm_dev, offset);
@@ -180,7 +182,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 	if (ret)
 		return ret;
 
-	if (!dev_priv->l3_parity.remap_info) {
+	if (!dev_priv->l3_parity.remap_info[slice]) {
 		temp = kzalloc(GEN7_L3LOG_SIZE, GFP_KERNEL);
 		if (!temp) {
 			mutex_unlock(&drm_dev->struct_mutex);
@@ -200,11 +202,11 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 	 * at this point it is left as a TODO.
 	*/
 	if (temp)
-		dev_priv->l3_parity.remap_info = temp;
+		dev_priv->l3_parity.remap_info[slice] = temp;
 
-	memcpy(dev_priv->l3_parity.remap_info + (offset/4), buf, count);
+	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
 
-	i915_gem_l3_remap(drm_dev);
+	i915_gem_l3_remap(drm_dev, slice);
 
 	mutex_unlock(&drm_dev->struct_mutex);
 
@@ -216,7 +218,17 @@ static struct bin_attribute dpf_attrs = {
 	.size = GEN7_L3LOG_SIZE,
 	.read = i915_l3_read,
 	.write = i915_l3_write,
-	.mmap = NULL
+	.mmap = NULL,
+	.private = (void *)0
+};
+
+static struct bin_attribute dpf_attrs_1 = {
+	.attr = {.name = "l3_parity_slice_1", .mode = (S_IRUSR | S_IWUSR)},
+	.size = GEN7_L3LOG_SIZE,
+	.read = i915_l3_read,
+	.write = i915_l3_write,
+	.mmap = NULL,
+	.private = (void *)1
 };
 
 static ssize_t gt_cur_freq_mhz_show(struct device *kdev,
@@ -527,6 +539,13 @@ void i915_setup_sysfs(struct drm_device *dev)
 		ret = device_create_bin_file(&dev->primary->kdev, &dpf_attrs);
 		if (ret)
 			DRM_ERROR("l3 parity sysfs setup failed\n");
+
+		if (NUM_L3_SLICES(dev) > 1) {
+			ret = device_create_bin_file(&dev->primary->kdev,
+						     &dpf_attrs_1);
+			if (ret)
+				DRM_ERROR("l3 parity slice 1 setup failed\n");
+		}
 	}
 
 	ret = 0;
@@ -550,6 +569,7 @@ void i915_teardown_sysfs(struct drm_device *dev)
 		sysfs_remove_files(&dev->primary->kdev.kobj, vlv_attrs);
 	else
 		sysfs_remove_files(&dev->primary->kdev.kobj, gen6_attrs);
+	device_remove_bin_file(&dev->primary->kdev,  &dpf_attrs_1);
 	device_remove_bin_file(&dev->primary->kdev,  &dpf_attrs);
 #ifdef CONFIG_PM
 	sysfs_unmerge_group(&dev->primary->kdev.kobj, &rc6_attr_group);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 686e5b2..3539b45 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -570,7 +570,7 @@ static int init_render_ring(struct intel_ring_buffer *ring)
 		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
 
 	if (HAS_L3_GPU_CACHE(dev))
-		I915_WRITE_IMR(ring, ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
+		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR);
 
 	return ret;
 }
@@ -1000,7 +1000,7 @@ gen6_ring_get_irq(struct intel_ring_buffer *ring)
 		if (HAS_L3_GPU_CACHE(dev) && ring->id == RCS)
 			I915_WRITE_IMR(ring,
 				       ~(ring->irq_enable_mask |
-					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
+					 GT_PARITY_ERROR));
 		else
 			I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
 		ilk_enable_gt_irq(dev_priv, ring->irq_enable_mask);
@@ -1021,7 +1021,7 @@ gen6_ring_put_irq(struct intel_ring_buffer *ring)
 	if (--ring->irq_refcount == 0) {
 		if (HAS_L3_GPU_CACHE(dev) && ring->id == RCS)
 			I915_WRITE_IMR(ring,
-				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
+				       ~GT_PARITY_ERROR);
 		else
 			I915_WRITE_IMR(ring, ~0);
 		ilk_disable_gt_irq(dev_priv, ring->irq_enable_mask);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 55bb572..3a4e97b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -38,10 +38,10 @@
  *
  * I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch
  *	event from the gpu l3 cache. Additional information supplied is ROW,
- *	BANK, SUBBANK of the affected cacheline. Userspace should keep track of
- *	these events and if a specific cache-line seems to have a persistent
- *	error remap it with the l3 remapping tool supplied in intel-gpu-tools.
- *	The value supplied with the event is always 1.
+ *	BANK, SUBBANK, SLICE of the affected cacheline. Userspace should keep
+ *	track of these events and if a specific cache-line seems to have a
+ *	persistent error remap it with the l3 remapping tool supplied in
+ *	intel-gpu-tools.  The value supplied with the event is always 1.
  *
  * I915_ERROR_UEVENT - Generated upon error detection, currently only via
  *	hangcheck. The error detection event is a good indicator of when things
-- 
1.8.4
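
The per-slice register arithmetic and the pending-slice bitmask used by
this patch can be exercised in a small userspace sketch. The register
offsets are the ones defined in the i915_reg.h hunk above; the function
names are hypothetical. One deliberate difference from the hunk: ffs()
returns a 1-based bit position, so the sketch subtracts 1 to get a
zero-based slice index. That conversion is an assumption about the
intended behaviour, not something the patch itself does.

```c
#include <stdint.h>
#include <strings.h> /* ffs() */

#define L3CDERRST1_BASE 0xB008 /* GEN7_L3CDERRST1 in the patch */
#define L3LOG_BASE      0xB070 /* GEN7_L3LOG_BASE in the patch */
#define SLICE_STRIDE    0x200  /* distance between slice register sets */
#define MAX_SLICES      2

/* Address of a per-slice register, given its slice 0 address. */
static uint32_t slice_reg(uint32_t base, int slice)
{
	return base + (uint32_t)slice * SLICE_STRIDE;
}

/* Remove the lowest pending slice from *pending and return its
 * zero-based index, or -1 if no valid slice is pending.
 * ffs() is 1-based, hence the "- 1". */
static int pop_pending_slice(unsigned int *pending)
{
	int bit = ffs((int)*pending);

	if (bit == 0 || bit > MAX_SLICES)
		return -1;

	*pending &= ~(1u << (bit - 1));
	return bit - 1;
}
```

With a stride of 0x200, slice 1 of the log lands on 0xB270 and slice 1
of the error status register on 0xB208, matching HSW_L3LOG_BASE_SLICE1
and HSW_L3CDERRST11 in the patch.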


* [PATCH 6/8] drm/i915: Make l3 remapping use the ring
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (4 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 5/8] drm/i915: Add second slice l3 remapping Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13 16:16   ` Daniel Vetter
  2013-09-13  5:28 ` [PATCH 7/8] drm/i915: Keep a list of all contexts Ben Widawsky
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Using LRI for setting the remapping registers allows us to stream l3
remapping information. This is necessary to handle per context remaps as
we'll see implemented in an upcoming patch.

Using the ring also means we don't need to frob the DOP clock gating
bits.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h   |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   | 39 +++++++++++++++++----------------------
 drivers/gpu/drm/i915/i915_sysfs.c |  3 ++-
 3 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eb90461..493a9cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1950,7 +1950,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
-void i915_gem_l3_remap(struct drm_device *dev, int slice);
+int i915_gem_l3_remap(struct intel_ring_buffer *ring, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
 void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b11f7d6c..fa01c69 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4256,41 +4256,36 @@ i915_gem_idle(struct drm_device *dev)
 	return 0;
 }
 
-void i915_gem_l3_remap(struct drm_device *dev, int slice)
+int i915_gem_l3_remap(struct intel_ring_buffer *ring, int slice)
 {
+	struct drm_device *dev = ring->dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
 	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
-	u32 misccpctl;
-	int i;
+	int i, ret;
 
 	if (!HAS_L3_GPU_CACHE(dev))
-		return;
+		return 0;
 
 	if (NUM_L3_SLICES(dev) < 2 && slice)
-		return;
+		return 0;
 
 	if (!remap_info)
-		return;
+		return 0;
 
-	misccpctl = I915_READ(GEN7_MISCCPCTL);
-	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
-	POSTING_READ(GEN7_MISCCPCTL);
+	ret = intel_ring_begin(ring, GEN7_L3LOG_SIZE / 4 * 3);
+	if (ret)
+		return ret;
 
 	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
-		u32 remap = I915_READ(reg_base + i);
-		if (remap && remap != remap_info[i/4])
-			DRM_DEBUG("0x%x was already programmed to %x\n",
-				  reg_base + i, remap);
-		if (remap && !remap_info[i/4])
-			DRM_DEBUG_DRIVER("Clearing remapped register\n");
-		I915_WRITE(reg_base + i, remap_info[i/4]);
+		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+		intel_ring_emit(ring, reg_base + i);
+		intel_ring_emit(ring, remap_info[i/4]);
 	}
 
-	/* Make sure all the writes land before disabling dop clock gating */
-	POSTING_READ(reg_base);
+	intel_ring_advance(ring);
 
-	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
+	return ret;
 }
 
 void i915_gem_init_swizzling(struct drm_device *dev)
@@ -4401,15 +4396,15 @@ i915_gem_init_hw(struct drm_device *dev)
 		I915_WRITE(GEN7_MSG_CTL, temp);
 	}
 
-	for (i = 0; i < NUM_L3_SLICES(dev); i++)
-		i915_gem_l3_remap(dev, i);
-
 	i915_gem_init_swizzling(dev);
 
 	ret = i915_gem_init_rings(dev);
 	if (ret)
 		return ret;
 
+	for (i = 0; i < NUM_L3_SLICES(dev); i++)
+		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
+
 	/*
 	 * XXX: There was some w/a described somewhere suggesting loading
 	 * contexts before PPGTT.
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index d208f2d..65a7274 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -206,7 +206,8 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 
 	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
 
-	i915_gem_l3_remap(drm_dev, slice);
+	if (i915_gem_l3_remap(&dev_priv->ring[RCS], slice))
+		count = 0;
 
 	mutex_unlock(&drm_dev->struct_mutex);
 
-- 
1.8.4

* [PATCH 7/8] drm/i915: Keep a list of all contexts
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (5 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 6/8] drm/i915: Make l3 remapping use the ring Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 8/8] drm/i915: Do remaps for " Ben Widawsky
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

I have implemented this patch before without creating a separate list
(I'm having trouble finding the links, but the message IDs are:
<1364942743-6041-2-git-send-email-ben@bwidawsk.net>
<1365118914-15753-9-git-send-email-ben@bwidawsk.net>)

However, the code is much simpler when it just uses a list, and it makes
the code in the next patch a lot prettier.

As you'll see in the next patch, the reason for this is to be able to
specify when a context needs to get L3 remapping. More details there.
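
For readers unfamiliar with the pattern, the following is a minimal userspace
sketch of the intrusive-list bookkeeping this patch relies on. It is not the
driver code: struct list_node and every demo_* name are hypothetical stand-ins
for the kernel's struct list_head, list_add_tail() and list_del(), chosen so
the example compiles outside the kernel.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* An empty list (or an unlinked node) points at itself. */
static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node from whatever list it is on. */
static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

/* Hypothetical context and device; only the fields the sketch needs. */
struct demo_ctx {
	int id;
	struct list_node link;
};

struct demo_dev {
	struct list_node context_list;
};

/* Creation links the context into the device-wide list. */
static struct demo_ctx *demo_ctx_create(struct demo_dev *dev, int id)
{
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->id = id;
	list_init(&ctx->link);
	list_add_tail_node(&ctx->link, &dev->context_list);
	return ctx;
}

/* Freeing unlinks first, so the list never holds a dangling entry. */
static void demo_ctx_free(struct demo_ctx *ctx)
{
	list_del_node(&ctx->link);
	free(ctx);
}

/* Walk the list the way list_for_each_entry() would. */
static int demo_count_contexts(const struct demo_dev *dev)
{
	const struct list_node *pos;
	int n = 0;

	for (pos = dev->context_list.next; pos != &dev->context_list;
	     pos = pos->next)
		n++;
	return n;
}
```

The point of the design is that any code holding the device can visit every
live context without knowing who created it.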

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 15 +++++++++------
 drivers/gpu/drm/i915/i915_drv.h         |  3 +++
 drivers/gpu/drm/i915/i915_gem.c         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c |  3 +++
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 1d77624..ada0950 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1442,6 +1442,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	struct drm_device *dev = node->minor->dev;
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
+	struct i915_hw_context *ctx;
 	int ret, i;
 
 	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
@@ -1460,12 +1461,14 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		seq_putc(m, '\n');
 	}
 
-	for_each_ring(ring, dev_priv, i) {
-		if (ring->default_context) {
-			seq_printf(m, "HW default context %s ", ring->name);
-			describe_obj(m, ring->default_context->obj);
-			seq_putc(m, '\n');
-		}
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		seq_puts(m, "HW context ");
+		for_each_ring(ring, dev_priv, i)
+			if (ring->default_context == ctx)
+				seq_printf(m, "(default context %s) ", ring->name);
+
+		describe_obj(m, ctx->obj);
+		seq_putc(m, '\n');
 	}
 
 	mutex_unlock(&dev->mode_config.mutex);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 493a9cd..68f992b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -606,6 +606,8 @@ struct i915_hw_context {
 	struct intel_ring_buffer *ring;
 	struct drm_i915_gem_object *obj;
 	struct i915_ctx_hang_stats hang_stats;
+
+	struct list_head link;
 };
 
 struct i915_fbc {
@@ -1344,6 +1346,7 @@ typedef struct drm_i915_private {
 
 	bool hw_contexts_disabled;
 	uint32_t hw_context_size;
+	struct list_head context_list;
 
 	u32 fdi_rx_config;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa01c69..fe4e579 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4576,6 +4576,7 @@ i915_gem_load(struct drm_device *dev)
 	INIT_LIST_HEAD(&dev_priv->vm_list);
 	i915_init_vm(dev_priv, &dev_priv->gtt.base);
 
+	INIT_LIST_HEAD(&dev_priv->context_list);
 	INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
 	INIT_LIST_HEAD(&dev_priv->mm.bound_list);
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 26c3fcc..2bbdce8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -129,6 +129,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	struct i915_hw_context *ctx = container_of(ctx_ref,
 						   typeof(*ctx), ref);
 
+	list_del(&ctx->link);
 	drm_gem_object_unreference(&ctx->obj->base);
 	kfree(ctx);
 }
@@ -147,6 +148,7 @@ create_hw_context(struct drm_device *dev,
 
 	kref_init(&ctx->ref);
 	ctx->obj = i915_gem_alloc_object(dev, dev_priv->hw_context_size);
+	INIT_LIST_HEAD(&ctx->link);
 	if (ctx->obj == NULL) {
 		kfree(ctx);
 		DRM_DEBUG_DRIVER("Context object allocated failed\n");
@@ -166,6 +168,7 @@ create_hw_context(struct drm_device *dev,
 	 * assertion in the context switch code.
 	 */
 	ctx->ring = &dev_priv->ring[RCS];
+	list_add_tail(&ctx->link, &dev_priv->context_list);
 
 	/* Default context will never have a file_priv */
 	if (file_priv == NULL)
-- 
1.8.4

* [PATCH 8/8] drm/i915: Do remaps for all contexts
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (6 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 7/8] drm/i915: Keep a list of all contexts Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  9:17   ` Ville Syrjälä
  2013-09-13  5:28 ` [PATCH 09/16] intel_l3_parity: Fix indentation Ben Widawsky
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

On both Ivybridge and Haswell, row remapping information is saved and
restored with context. This means we never actually properly supported
the l3 remapping because our sysfs interface is asynchronous (and not
tied to any context), and the known faulty HW would be reused by the
next context to run.

Note that due to the asynchronous nature of the sysfs entry, there is no
point modifying the registers for the existing context. Instead we set a
flag for all contexts to load the correct remapping information on the
next run. Interested clients can use debugfs to determine whether or not
the row has been remapped.
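
The deferred scheme described above can be sketched in plain C roughly as
follows. This is an illustration under assumptions, not the driver code: the
demo_* names, the callback type and DEMO_MAX_SLICES are hypothetical, and the
callback stands in for emitting the remap commands on the ring.

```c
#include <stdint.h>

#define DEMO_MAX_SLICES 2	/* assumed upper bound for the sketch */

/* Hypothetical context: bit i set means slice i must be reprogrammed. */
struct demo_ctx {
	uint8_t remap_slice;
};

/* Stand-in for the ring-based remap; returns 0 on success. */
typedef int (*demo_remap_fn)(int slice);

/* What the sysfs write does: flag the slice in every known context. */
static void demo_mark_all(struct demo_ctx *ctxs, int n, int slice)
{
	for (int i = 0; i < n; i++)
		ctxs[i].remap_slice |= (uint8_t)(1u << slice);
}

/*
 * What the context switch does: reprogram each flagged slice and clear the
 * flag only on success, so a failed remap is retried on the next switch.
 * Returns the number of slices that were reprogrammed.
 */
static int demo_switch_to(struct demo_ctx *to, demo_remap_fn remap)
{
	int done = 0;

	for (int i = 0; i < DEMO_MAX_SLICES; i++) {
		if (!(to->remap_slice & (1u << i)))
			continue;
		if (remap(i) == 0) {
			to->remap_slice &= (uint8_t)~(1u << i);
			done++;
		}
	}
	return done;
}

/* Two trivial callbacks used to exercise both paths. */
static int demo_remap_ok(int slice)   { (void)slice; return 0; }
static int demo_remap_fail(int slice) { (void)slice; return -1; }
```

Keeping the flag per context means a context that never runs again costs
nothing, while one that does run picks up the new remap exactly once.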

One could propose at this point that we just do the remapping in the
kernel. Since we have to maintain the sysfs interface anyway, I'm not
sure how useful that would be, and I do like keeping the policy in
userspace (it wasn't my original decision to make the interface the way
it is, so I'm not attached).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  8 +++++++
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 17 +++++++++++++--
 drivers/gpu/drm/i915/i915_sysfs.c       | 38 +++++++++++----------------------
 4 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ada0950..80bed69 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -145,6 +145,13 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (%s)", obj->ring->name);
 }
 
+static void describe_ctx(struct seq_file *m, struct i915_hw_context *ctx)
+{
+	seq_putc(m, ctx->is_initialized ? 'I' : 'i');
+	seq_putc(m, ctx->remap_slice ? 'R' : 'r');
+	seq_putc(m, ' ');
+}
+
 static int i915_gem_object_list_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -1463,6 +1470,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		seq_puts(m, "HW context ");
+		describe_ctx(m, ctx);
 		for_each_ring(ring, dev_priv, i)
 			if (ring->default_context == ctx)
 				seq_printf(m, "(default context %s) ", ring->name);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 68f992b..6ba78cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -602,6 +602,7 @@ struct i915_hw_context {
 	struct kref ref;
 	int id;
 	bool is_initialized;
+	uint8_t remap_slice;
 	struct drm_i915_file_private *file_priv;
 	struct intel_ring_buffer *ring;
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2bbdce8..72b7202 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -140,7 +140,7 @@ create_hw_context(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_context *ctx;
-	int ret;
+	int ret, i;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
 	if (ctx == NULL)
@@ -181,6 +181,8 @@ create_hw_context(struct drm_device *dev,
 
 	ctx->file_priv = file_priv;
 	ctx->id = ret;
+	for (i = 0; i < NUM_L3_SLICES(dev); i++)
+		ctx->remap_slice |= 1 << i;
 
 	return ctx;
 
@@ -396,7 +398,7 @@ static int do_switch(struct i915_hw_context *to)
 	struct intel_ring_buffer *ring = to->ring;
 	struct i915_hw_context *from = ring->last_context;
 	u32 hw_flags = 0;
-	int ret;
+	int ret, i;
 
 	BUG_ON(from != NULL && from->obj != NULL && from->obj->pin_count == 0);
 
@@ -432,6 +434,17 @@ static int do_switch(struct i915_hw_context *to)
 		return ret;
 	}
 
+	for (i = 0; i < MAX_L3_SLICES; i++) {
+		if (!(to->remap_slice & (1<<i)))
+			continue;
+
+		ret = i915_gem_l3_remap(ring, i);
+		if (!ret) {
+			to->remap_slice &= ~(1<<i);
+		}
+		/* If it failed, try again next round */
+	}
+
 	/* The backing object for the context is done after switching to the
 	 * *next* context. Therefore we cannot retire the previous context until
 	 * the next context has already started running. In fact, the below code
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 65a7274..f0d7e1c 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -118,9 +118,8 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
 	struct drm_device *drm_dev = dminor->dev;
 	struct drm_i915_private *dev_priv = drm_dev->dev_private;
-	uint32_t misccpctl;
 	int slice = (int)(uintptr_t)attr->private;
-	int i, ret;
+	int ret;
 
 	count = round_down(count, 4);
 
@@ -134,31 +133,16 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
 	if (ret)
 		return ret;
 
-	if (IS_HASWELL(drm_dev)) {
-		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
-		if ((!dev_priv->l3_parity.remap_info[slice]))
-			memset(buf + offset, 0, last - offset);
-		else
-			memcpy(buf + offset,
-			       dev_priv->l3_parity.remap_info[slice] + (offset/4),
-			       last - offset);
-
-		i = last;
-		goto out;
-	}
-
-	misccpctl = I915_READ(GEN7_MISCCPCTL);
-	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
-
-	for (i = 0; i < count; i += 4)
-		*((uint32_t *)(&buf[i])) = I915_READ(GEN7_L3LOG_BASE + offset + i);
-
-	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
+	if ((!dev_priv->l3_parity.remap_info[slice]))
+		memset(buf + offset, 0, count);
+	else
+		memcpy(buf + offset,
+		       dev_priv->l3_parity.remap_info[slice] + (offset/4),
+		       count);
 
-out:
 	mutex_unlock(&drm_dev->struct_mutex);
 
-	return i;
+	return count;
 }
 
 static ssize_t
@@ -170,6 +154,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
 	struct drm_device *drm_dev = dminor->dev;
 	struct drm_i915_private *dev_priv = drm_dev->dev_private;
+	struct i915_hw_context *ctx;
 	u32 *temp = NULL; /* Just here to make handling failures easy */
 	int slice = (int)(uintptr_t)attr->private;
 	int ret;
@@ -206,8 +191,9 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
 
 	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
 
-	if (i915_gem_l3_remap(&dev_priv->ring[RCS], slice))
-		count = 0;
+	/* NB: We defer the remapping until we switch to the context */
+	list_for_each_entry(ctx, &dev_priv->context_list, link)
+		ctx->remap_slice |= (1<<slice);
 
 	mutex_unlock(&drm_dev->struct_mutex);
 
-- 
1.8.4

* [PATCH 09/16] intel_l3_parity: Fix indentation
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (7 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 8/8] drm/i915: Do remaps for " Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support Ben Widawsky
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/intel_l3_parity.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index ad027ac..970dcd6 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -58,12 +58,12 @@ static void dumpit(void)
 		for (j = 0; j < NUM_SUBBANKS; j++) {
 			struct l3_log_register *reg = &l3log[i][j];
 
-		if (reg->row0_enable)
-			printf("Row %d, Bank %d, Subbank %d is disabled\n",
-			       reg->row0, i, j);
-		if (reg->row1_enable)
-			printf("Row %d, Bank %d, Subbank %d is disabled\n",
-			       reg->row1, i, j);
+			if (reg->row0_enable)
+				printf("Row %d, Bank %d, Subbank %d is disabled\n",
+				       reg->row0, i, j);
+			if (reg->row1_enable)
+				printf("Row %d, Bank %d, Subbank %d is disabled\n",
+				       reg->row1, i, j);
 		}
 	}
 }
-- 
1.8.4

* [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (8 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 09/16] intel_l3_parity: Fix indentation Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-16 18:18   ` Bell, Bryan J
  2013-09-13  5:28 ` [PATCH 11/16] intel_l3_parity: Use getopt for the l3 parity tool Ben Widawsky
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/intel_l3_parity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index 970dcd6..a3d268b 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -120,7 +120,7 @@ int main(int argc, char *argv[])
 	assert(ret != -1);
 
 	fd = open(path, O_RDWR);
-	if (fd == -1 && IS_IVYBRIDGE(devid)) {
+	if (fd == -1 && intel_gen(devid) > 6) {
 		perror("Opening sysfs");
 		exit(EXIT_FAILURE);
 	} else if (fd == -1)
-- 
1.8.4

* [PATCH 11/16] intel_l3_parity: Use getopt for the l3 parity tool
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (9 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 12/16] intel_l3_parity: Hardware info argument Ben Widawsky
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Add new command line arguments in addition to supporting the old
features. This patch only introduces one feature, the -e argument to
enable a specific row/bank/subbank. Previously you could only enable
all. Otherwise, it has what you expect (we prefer -r -b -s for
specifying the row/bank/subbank).
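
As a rough illustration of the command-line shape described above, here is a
reduced getopt_long() sketch. It is not the tool's code: struct demo_opts and
demo_parse() are hypothetical, and the deprecated inline form of -d and the
range checks are left out to keep it short.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical result of parsing; action is 0 until one is seen. */
struct demo_opts {
	int row, bank, sbank;
	int action;	/* one of 'l', 'a', 'e', 'd' */
};

/* Returns 0 on success, -1 on unknown input or unless exactly one action. */
static int demo_parse(int argc, char *argv[], struct demo_opts *o)
{
	static const struct option long_options[] = {
		{ "list", no_argument, 0, 'l' },
		{ "clear-all", no_argument, 0, 'a' },
		{ "enable", no_argument, 0, 'e' },
		{ "disable", no_argument, 0, 'd' },
		{ "row", required_argument, 0, 'r' },
		{ "bank", required_argument, 0, 'b' },
		{ "subbank", required_argument, 0, 's' },
		{ 0, 0, 0, 0 }
	};
	int c;

	o->row = o->bank = o->sbank = 0;
	o->action = 0;
	optind = 1;	/* restart scanning so the function can be reused */

	while ((c = getopt_long(argc, argv, "r:b:s:aled",
				long_options, NULL)) != -1) {
		switch (c) {
		case 'r':
			o->row = atoi(optarg);
			break;
		case 'b':
			o->bank = atoi(optarg);
			break;
		case 's':
			o->sbank = atoi(optarg);
			break;
		case 'a':
		case 'l':
		case 'e':
		case 'd':
			/* Only one action may be given per invocation. */
			if (o->action)
				return -1;
			o->action = c;
			break;
		default:
			return -1;
		}
	}
	return o->action ? 0 : -1;
}
```

Separating the location options (-r, -b, -s) from the action lets the same
coordinates be reused for both enable and disable.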

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tests/sysfs_l3_parity   |  12 ++---
 tools/intel_l3_parity.c | 122 ++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 109 insertions(+), 25 deletions(-)

diff --git a/tests/sysfs_l3_parity b/tests/sysfs_l3_parity
index 6f814a1..a0dfad9 100755
--- a/tests/sysfs_l3_parity
+++ b/tests/sysfs_l3_parity
@@ -8,20 +8,20 @@ fi
 SOURCE_DIR="$( dirname "${BASH_SOURCE[0]}" )"
 . $SOURCE_DIR/drm_lib.sh
 
-$SOURCE_DIR/../tools/intel_l3_parity -c
+$SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -e
 
 #Check that we can remap a row
-$SOURCE_DIR/../tools/intel_l3_parity 0,0,0
-disabled=`$SOURCE_DIR/../tools/intel_l3_parity | grep -c 'Row 0, Bank 0, Subbank 0 is disabled'`
+$SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -d
+disabled=`$SOURCE_DIR/../tools/intel_l3_parity -l | grep -c 'Row 0, Bank 0, Subbank 0 is disabled'`
 if [ "$disabled" != "1" ] ; then
 	echo "Fail"
 	exit 1
 fi
 
-$SOURCE_DIR/../tools/intel_l3_parity -c
+$SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -e
 
 #Check that we can clear remaps
-if [ `$SOURCE_DIR/../tools/intel_l3_parity | wc -c` != "0" ] ; then
-	echo "Fail"
+if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -c` != "0" ] ; then
+	echo "Fail 2"
 	exit 1
 fi
diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index a3d268b..fa5adaa 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -33,12 +33,14 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <getopt.h>
 #include "intel_chipset.h"
 #include "intel_gpu_tools.h"
 #include "drmtest.h"
 
 #define NUM_BANKS 4
 #define NUM_SUBBANKS 8
+#define MAX_ROW (1<<12)
 #define NUM_REGS (NUM_BANKS * NUM_SUBBANKS)
 
 struct __attribute__ ((__packed__)) l3_log_register {
@@ -93,17 +95,34 @@ static int disable_rbs(int row, int bank, int sbank)
 	return 0;
 }
 
-static int do_parse(int argc, char *argv[])
+static void enables_rbs(int row, int bank, int sbank)
 {
-	int row, bank, sbank, i, ret;
+	struct l3_log_register *reg = &l3log[bank][sbank];
 
-	for (i = 1; i < argc; i++) {
-		ret = sscanf(argv[i], "%d,%d,%d", &row, &bank, &sbank);
-		if (ret != 3)
-			return i;
-		assert(disable_rbs(row, bank, sbank) == 0);
-	}
-	return 0;
+	if (!reg->row0_enable && !reg->row1_enable)
+		return;
+
+	if (reg->row1_enable && reg->row1 == row)
+		reg->row1_enable = 0;
+	else if (reg->row0_enable && reg->row0 == row)
+		reg->row0_enable = 0;
+}
+
+static void usage(const char *name)
+{
+	printf("usage: %s [OPTIONS] [ACTION]\n"
+		"Operate on the i915 L3 GPU cache (should be run as root)\n\n"
+		" OPTIONS:\n"
+		"  -r, --row=[row]			The row to act upon (default 0)\n"
+		"  -b, --bank=[bank]			The bank to act upon (default 0)\n"
+		"  -s, --subbank=[subbank]		The subbank to act upon (default 0)\n"
+		" ACTIONS (only 1 may be specified at a time):\n"
+		"  -h, --help				Display this help\n"
+		"  -l, --list				List the current L3 logs\n"
+		"  -a, --clear-all			Clear all disabled rows\n"
+		"  -e, --enable				Enable row, bank, subbank (undo -d)\n"
+		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead)\n",
+		name);
 }
 
 int main(int argc, char *argv[])
@@ -111,7 +130,9 @@ int main(int argc, char *argv[])
 	const int device = drm_get_card();
 	char *path;
 	unsigned int devid;
+	int row = 0, bank = 0, sbank = 0;
 	int drm_fd, fd, ret;
+	int action = '0';
 
 	drm_fd = drm_open_any();
 	devid = intel_get_drm_devid(drm_fd);
@@ -134,19 +155,82 @@ int main(int argc, char *argv[])
 
 	assert(lseek(fd, 0, SEEK_SET) == 0);
 
-	if (argc == 1) {
-		dumpit();
-		exit(EXIT_SUCCESS);
-	} else if (!strncmp("-c", argv[1], 2)) {
-		memset(l3log, 0, sizeof(l3log));
-	} else {
-		ret = do_parse(argc, argv);
-		if (ret != 0) {
-			fprintf(stderr, "Malformed command line at %s\n", argv[ret]);
-			exit(EXIT_FAILURE);
+	while (1) {
+		int c, option_index = 0;
+		static struct option long_options[] = {
+			{ "help", no_argument, 0, 'h' },
+			{ "list", no_argument, 0, 'l' },
+			{ "clear-all", no_argument, 0, 'a' },
+			{ "enable", no_argument, 0, 'e' },
+			{ "disable", optional_argument, 0, 'd' },
+			{ "row", required_argument, 0, 'r' },
+			{ "bank", required_argument, 0, 'b' },
+			{ "subbank", required_argument, 0, 's' },
+			{0, 0, 0, 0}
+		};
+
+		c = getopt_long(argc, argv, "hr:b:s:aled::", long_options,
+				&option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+			case '?':
+			case 'h':
+				usage(argv[0]);
+				exit(EXIT_SUCCESS);
+			case 'r':
+				row = atoi(optarg);
+				if (row >= MAX_ROW)
+					exit(EXIT_FAILURE);
+				break;
+			case 'b':
+				bank = atoi(optarg);
+				if (bank >= NUM_BANKS)
+					exit(EXIT_FAILURE);
+				break;
+			case 's':
+				sbank = atoi(optarg);
+				if (sbank >= NUM_SUBBANKS)
+					exit(EXIT_FAILURE);
+				break;
+			case 'd':
+				if (optarg) {
+					ret = sscanf(optarg, "%d,%d,%d", &row, &bank, &sbank);
+					if (ret != 3)
+						exit(EXIT_FAILURE);
+				}
+			case 'a':
+			case 'l':
+			case 'e':
+				if (action != '0') {
+					fprintf(stderr, "Only one action may be specified\n");
+					exit(EXIT_FAILURE);
+				}
+				action = c;
+				break;
+			default:
+				abort();
 		}
 	}
 
+	switch (action) {
+		case 'l':
+			dumpit();
+			exit(EXIT_SUCCESS);
+		case 'a':
+			memset(l3log, 0, sizeof(l3log));
+			break;
+		case 'e':
+			enables_rbs(row, bank, sbank);
+			break;
+		case 'd':
+			assert(disable_rbs(row, bank, sbank) == 0);
+			break;
+		default:
+			abort();
+	}
+
 	ret = write(fd, l3log, NUM_REGS * sizeof(uint32_t));
 	if (ret == -1) {
 		perror("Writing sysfs");
-- 
1.8.4

* [PATCH 12/16] intel_l3_parity: Hardware info argument
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (10 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 11/16] intel_l3_parity: Use getopt for the l3 parity tool Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 13/16] intel_l3_parity: slice support Ben Widawsky
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Add a new command line argument to the tool which will spit out various
parameters for the given hardware. As a result of this, some new
defines are added to help with the various info.
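
For reference, the size arithmetic behind the new defines works out as in the
sketch below. The DEMO_* names and demo_l3_size() are hypothetical; the
constants mirror the values in the diff (4096 rows, 4 bytes per row, 8
subbanks), and the bank count is passed in because it depends on the SKU.

```c
/* Constants mirroring the defines added by this patch. */
#define DEMO_MAX_ROW        (1 << 12)	/* 4096 rows */
#define DEMO_BYTES_PER_ROW  4		/* each row addresses up to 4 bytes */
#define DEMO_NUM_SUBBANKS   8

/* Bytes in one bank: rows * bytes per row * subbanks = 128K. */
static unsigned int demo_bank_size(void)
{
	return DEMO_MAX_ROW * DEMO_BYTES_PER_ROW * DEMO_NUM_SUBBANKS;
}

/* Total L3 size in bytes for a part with the given number of banks. */
static unsigned int demo_l3_size(unsigned int num_banks)
{
	return demo_bank_size() * num_banks;
}
```

With 4 banks this gives 512K, and with the 2 banks assumed for IVB GT1 it
gives 256K, which is what the -H output reports after shifting right by 10.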

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/intel_l3_parity.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index fa5adaa..5031fbd 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -38,9 +38,19 @@
 #include "intel_gpu_tools.h"
 #include "drmtest.h"
 
-#define NUM_BANKS 4
+static unsigned int devid;
+/* L3 size is always a function of banks. The number of banks cannot be
+ * determined by number of slices however */
+#define MAX_BANKS 4
+#define NUM_BANKS \
+	((devid == PCI_CHIP_IVYBRIDGE_GT1 || devid == PCI_CHIP_IVYBRIDGE_M_GT1) ? 2 : 4)
 #define NUM_SUBBANKS 8
+#define BYTES_PER_BANK (128 << 10)
+/* Each row addresses [up to] 4b. This multiplied by the number of subbanks
+ * will give the L3 size per bank.
+ * TODO: Row size is fixed on IVB, and variable on HSW.*/
 #define MAX_ROW (1<<12)
+#define L3_SIZE ((MAX_ROW * 4) * NUM_SUBBANKS *  NUM_BANKS)
 #define NUM_REGS (NUM_BANKS * NUM_SUBBANKS)
 
 struct __attribute__ ((__packed__)) l3_log_register {
@@ -50,7 +60,7 @@ struct __attribute__ ((__packed__)) l3_log_register {
 	uint32_t row1_enable	: 1;
 	uint32_t rsvd1		: 4;
 	uint32_t row1		: 11;
-} l3log[NUM_BANKS][NUM_SUBBANKS];
+} l3log[MAX_BANKS][NUM_SUBBANKS];
 
 static void dumpit(void)
 {
@@ -118,6 +128,7 @@ static void usage(const char *name)
 		"  -s, --subbank=[subbank]		The subbank to act upon (default 0)\n"
 		" ACTIONS (only 1 may be specified at a time):\n"
 		"  -h, --help				Display this help\n"
+		"  -H, --hw-info				Display the current L3 properties\n"
 		"  -l, --list				List the current L3 logs\n"
 		"  -a, --clear-all			Clear all disabled rows\n"
 		"  -e, --enable				Enable row, bank, subbank (undo -d)\n"
@@ -129,7 +140,6 @@ int main(int argc, char *argv[])
 {
 	const int device = drm_get_card();
 	char *path;
-	unsigned int devid;
 	int row = 0, bank = 0, sbank = 0;
 	int drm_fd, fd, ret;
 	int action = '0';
@@ -163,13 +173,14 @@ int main(int argc, char *argv[])
 			{ "clear-all", no_argument, 0, 'a' },
 			{ "enable", no_argument, 0, 'e' },
 			{ "disable", optional_argument, 0, 'd' },
+			{ "hw-info", no_argument, 0, 'H' },
 			{ "row", required_argument, 0, 'r' },
 			{ "bank", required_argument, 0, 'b' },
 			{ "subbank", required_argument, 0, 's' },
 			{0, 0, 0, 0}
 		};
 
-		c = getopt_long(argc, argv, "hr:b:s:aled::", long_options,
+		c = getopt_long(argc, argv, "hHr:b:s:aled::", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -179,6 +190,11 @@ int main(int argc, char *argv[])
 			case 'h':
 				usage(argv[0]);
 				exit(EXIT_SUCCESS);
+			case 'H':
+				printf("Number of banks: %d\n", NUM_BANKS);
+				printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
+				printf("L3 size: %dK\n", L3_SIZE >> 10);
+				exit(EXIT_SUCCESS);
 			case 'r':
 				row = atoi(optarg);
 				if (row >= MAX_ROW)
-- 
1.8.4

* [PATCH 13/16] intel_l3_parity: slice support
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (11 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 12/16] intel_l3_parity: Hardware info argument Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 14/16] intel_l3_parity: Actually support multiple slices Ben Widawsky
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Haswell GT3 adds a new slice which is kept distinct from the old
register interface. Plumb it into the code, though it's only 1 slice
still.
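
The slice selection added here hinges on one iteration macro, which can be
sketched in isolation as below. This assumes two slices purely for
illustration (the patch itself still defines a single slice), and
DEMO_MAX_SLICES and demo_collect_slices() are hypothetical names.

```c
#define DEMO_MAX_SLICES 2	/* assumed; the patch keeps this at 1 for now */

/* -1 means "act on every slice"; otherwise the index of the one slice. */
static int which_slice = -1;

/* Visit either all slices or only the selected one. */
#define for_each_slice(i) \
	for ((i) = (which_slice == -1) ? 0 : which_slice; \
	     (i) < ((which_slice == -1) ? DEMO_MAX_SLICES : which_slice + 1); \
	     (i)++)

/* Record the visited slice indices in out[]; returns how many there were. */
static int demo_collect_slices(int *out)
{
	int i, n = 0;

	for_each_slice(i)
		out[n++] = i;
	return n;
}
```

Folding the "all or one" decision into the loop bounds keeps every per-slice
operation in the tool free of special cases.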

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/intel_l3_parity.c | 113 +++++++++++++++++++++++++++++-------------------
 1 file changed, 68 insertions(+), 45 deletions(-)

diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index 5031fbd..fd09d39 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -52,6 +52,8 @@ static unsigned int devid;
 #define MAX_ROW (1<<12)
 #define L3_SIZE ((MAX_ROW * 4) * NUM_SUBBANKS *  NUM_BANKS)
 #define NUM_REGS (NUM_BANKS * NUM_SUBBANKS)
+#define MAX_SLICES 1
+#define REAL_MAX_SLICES 1
 
 struct __attribute__ ((__packed__)) l3_log_register {
 	uint32_t row0_enable	: 1;
@@ -60,15 +62,21 @@ struct __attribute__ ((__packed__)) l3_log_register {
 	uint32_t row1_enable	: 1;
 	uint32_t rsvd1		: 4;
 	uint32_t row1		: 11;
-} l3log[MAX_BANKS][NUM_SUBBANKS];
+} l3logs[REAL_MAX_SLICES][MAX_BANKS][NUM_SUBBANKS];
 
-static void dumpit(void)
+static int which_slice = -1;
+#define for_each_slice(__i) \
+	for ((__i) = (which_slice == -1) ? 0 : which_slice; \
+			(__i) < ((which_slice == -1) ? MAX_SLICES : (which_slice + 1)); \
+			(__i)++)
+
+static void dumpit(int slice)
 {
 	int i, j;
 
 	for (i = 0; i < NUM_BANKS; i++) {
 		for (j = 0; j < NUM_SUBBANKS; j++) {
-			struct l3_log_register *reg = &l3log[i][j];
+			struct l3_log_register *reg = &l3logs[slice][i][j];
 
 			if (reg->row0_enable)
 				printf("Row %d, Bank %d, Subbank %d is disabled\n",
@@ -80,9 +88,9 @@ static void dumpit(void)
 	}
 }
 
-static int disable_rbs(int row, int bank, int sbank)
+static int disable_rbs(int row, int bank, int sbank, int slice)
 {
-	struct l3_log_register *reg = &l3log[bank][sbank];
+	struct l3_log_register *reg = &l3logs[slice][bank][sbank];
 
 	// can't map more than 2 rows
 	if (reg->row0_enable && reg->row1_enable)
@@ -105,9 +113,9 @@ static int disable_rbs(int row, int bank, int sbank)
 	return 0;
 }
 
-static void enables_rbs(int row, int bank, int sbank)
+static void enables_rbs(int row, int bank, int sbank, int slice)
 {
-	struct l3_log_register *reg = &l3log[bank][sbank];
+	struct l3_log_register *reg = &l3logs[slice][bank][sbank];
 
 	if (!reg->row0_enable && !reg->row1_enable)
 		return;
@@ -126,6 +134,7 @@ static void usage(const char *name)
 		"  -r, --row=[row]			The row to act upon (default 0)\n"
 		"  -b, --bank=[bank]			The bank to act upon (default 0)\n"
 		"  -s, --subbank=[subbank]		The subbank to act upon (default 0)\n"
+		"  -w, --slice=[slice]			Which slice to act on (default: -1 [all])\n"
 		" ACTIONS (only 1 may be specified at a time):\n"
 		"  -h, --help				Display this help\n"
 		"  -H, --hw-info				Display the current L3 properties\n"
@@ -139,31 +148,30 @@ static void usage(const char *name)
 int main(int argc, char *argv[])
 {
 	const int device = drm_get_card();
-	char *path;
+	char *path[REAL_MAX_SLICES];
 	int row = 0, bank = 0, sbank = 0;
-	int drm_fd, fd, ret;
+	int fd[REAL_MAX_SLICES] = {0}, ret, i;
 	int action = '0';
-
-	drm_fd = drm_open_any();
+	int drm_fd = drm_open_any();
 	devid = intel_get_drm_devid(drm_fd);
 
-	ret = asprintf(&path, "/sys/class/drm/card%d/l3_parity", device);
-	assert(ret != -1);
-
-	fd = open(path, O_RDWR);
-	if (fd == -1 && intel_gen(devid) > 6) {
-		perror("Opening sysfs");
-		exit(EXIT_FAILURE);
-	} else if (fd == -1)
+	if (intel_gen(devid) < 7)
 		exit(EXIT_SUCCESS);
 
-	ret = read(fd, l3log, NUM_REGS * sizeof(uint32_t));
-	if (ret == -1) {
-		perror("Reading sysfs");
-		exit(EXIT_FAILURE);
+	ret = asprintf(&path[0], "/sys/class/drm/card%d/l3_parity", device);
+	assert(ret != -1);
+
+	for_each_slice(i) {
+		fd[i] = open(path[i], O_RDWR);
+		assert(fd[i] != -1);
+		ret = read(fd[i], l3logs[i], NUM_REGS * sizeof(uint32_t));
+		if (ret == -1) {
+			perror("Reading sysfs");
+			exit(EXIT_FAILURE);
+		}
+		assert(lseek(fd[i], 0, SEEK_SET) == 0);
 	}
 
-	assert(lseek(fd, 0, SEEK_SET) == 0);
 
 	while (1) {
 		int c, option_index = 0;
@@ -177,10 +185,11 @@ int main(int argc, char *argv[])
 			{ "row", required_argument, 0, 'r' },
 			{ "bank", required_argument, 0, 'b' },
 			{ "subbank", required_argument, 0, 's' },
+			{ "slice", required_argument, 0, 'w' },
 			{0, 0, 0, 0}
 		};
 
-		c = getopt_long(argc, argv, "hHr:b:s:aled::", long_options,
+		c = getopt_long(argc, argv, "hHr:b:s:w:aled::", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -191,6 +200,7 @@ int main(int argc, char *argv[])
 				usage(argv[0]);
 				exit(EXIT_SUCCESS);
 			case 'H':
+				printf("Number of slices: %d\n", MAX_SLICES);
 				printf("Number of banks: %d\n", NUM_BANKS);
 				printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
 				printf("L3 size: %dK\n", L3_SIZE >> 10);
@@ -210,6 +220,11 @@ int main(int argc, char *argv[])
 				if (sbank >= NUM_SUBBANKS)
 					exit(EXIT_FAILURE);
 				break;
+			case 'w':
+				which_slice = atoi(optarg);
+				if (which_slice > 1)
+					exit(EXIT_FAILURE);
+				break;
 			case 'd':
 				if (optarg) {
 					ret = sscanf(optarg, "%d,%d,%d", &row, &bank, &sbank);
@@ -230,30 +245,38 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	switch (action) {
-		case 'l':
-			dumpit();
-			exit(EXIT_SUCCESS);
-		case 'a':
-			memset(l3log, 0, sizeof(l3log));
-			break;
-		case 'e':
-			enables_rbs(row, bank, sbank);
-			break;
-		case 'd':
-			assert(disable_rbs(row, bank, sbank) == 0);
-			break;
-		default:
-			abort();
+	/* Per slice operations */
+	for_each_slice(i) {
+		switch (action) {
+			case 'l':
+				dumpit(i);
+				break;
+			case 'a':
+				memset(l3logs[i], 0, NUM_REGS * sizeof(struct l3_log_register));
+				break;
+			case 'e':
+				enables_rbs(row, bank, sbank, i);
+				break;
+			case 'd':
+				assert(disable_rbs(row, bank, sbank, i) == 0);
+				break;
+			default:
+				abort();
+		}
 	}
 
-	ret = write(fd, l3log, NUM_REGS * sizeof(uint32_t));
-	if (ret == -1) {
-		perror("Writing sysfs");
-		exit(EXIT_FAILURE);
+	if (action == 'l')
+		exit(EXIT_SUCCESS);
+
+	for_each_slice(i) {
+		ret = write(fd[i], l3logs[i], NUM_REGS * sizeof(uint32_t));
+		if (ret == -1) {
+			perror("Writing sysfs");
+			exit(EXIT_FAILURE);
+		}
+		close(fd[i]);
 	}
 
-	close(fd);
 
 	exit(EXIT_SUCCESS);
 }
-- 
1.8.4
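
The per-slice loops in the patch above depend on how -w narrows the iteration. What follows is a minimal sketch of that selection in plain C, not code from the tool. It assumes the convention the tool documents (which_slice == -1 means "all slices"); struct slice_range and select_slices() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: half-open range [first, end) of slices to visit. */
struct slice_range {
	int first;
	int end;
};

/*
 * Translate the -w argument into a range of slices.
 * which_slice == -1 selects every slice; any other value selects one.
 * Returns false for a request outside [-1, max_slices).
 */
static bool select_slices(int which_slice, int max_slices,
			  struct slice_range *out)
{
	assert(out != NULL);

	if (which_slice < -1 || which_slice >= max_slices)
		return false;

	out->first = (which_slice == -1) ? 0 : which_slice;
	out->end = (which_slice == -1) ? max_slices : which_slice + 1;
	return true;
}
```

Unlike the bounds check in the patch, this sketch also rejects values below -1. That is a design choice of the example, not a statement about how the tool behaves.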


* [PATCH 14/16] intel_l3_parity: Actually support multiple slices
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (12 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 13/16] intel_l3_parity: slice support Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  5:28 ` [PATCH 15/16] intel_l3_parity: Support error injection Ben Widawsky
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/intel_l3_parity.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index fd09d39..cf15541 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -41,19 +41,28 @@
 static unsigned int devid;
 /* L3 size is always a function of banks. The number of banks cannot be
  * determined by number of slices however */
-#define MAX_BANKS 4
-#define NUM_BANKS \
-	((devid == PCI_CHIP_IVYBRIDGE_GT1 || devid == PCI_CHIP_IVYBRIDGE_M_GT1) ? 2 : 4)
+static inline int num_banks(void) {
+	if (IS_HSW_GT3(devid))
+		return 8; /* 4 per each slice */
+	else if (IS_HSW_GT1(devid) ||
+			devid == PCI_CHIP_IVYBRIDGE_GT1 ||
+			devid == PCI_CHIP_IVYBRIDGE_M_GT1)
+		return 2;
+	else
+		return 4;
+}
 #define NUM_SUBBANKS 8
 #define BYTES_PER_BANK (128 << 10)
 /* Each row addresses [up to] 4b. This multiplied by the number of subbanks
  * will give the L3 size per bank.
  * TODO: Row size is fixed on IVB, and variable on HSW.*/
 #define MAX_ROW (1<<12)
-#define L3_SIZE ((MAX_ROW * 4) * NUM_SUBBANKS *  NUM_BANKS)
-#define NUM_REGS (NUM_BANKS * NUM_SUBBANKS)
-#define MAX_SLICES 1
-#define REAL_MAX_SLICES 1
+#define MAX_BANKS_PER_SLICE 4
+#define NUM_REGS (MAX_BANKS_PER_SLICE * NUM_SUBBANKS)
+#define MAX_SLICES (IS_HSW_GT3(devid) ? 2 : 1)
+#define REAL_MAX_SLICES 2
+/* TODO support SLM config */
+#define L3_SIZE ((MAX_ROW * 4) * NUM_SUBBANKS *  num_banks())
 
 struct __attribute__ ((__packed__)) l3_log_register {
 	uint32_t row0_enable	: 1;
@@ -62,7 +71,7 @@ struct __attribute__ ((__packed__)) l3_log_register {
 	uint32_t row1_enable	: 1;
 	uint32_t rsvd1		: 4;
 	uint32_t row1		: 11;
-} l3logs[REAL_MAX_SLICES][MAX_BANKS][NUM_SUBBANKS];
+} l3logs[REAL_MAX_SLICES][MAX_BANKS_PER_SLICE][NUM_SUBBANKS];
 
 static int which_slice = -1;
 #define for_each_slice(__i) \
@@ -74,16 +83,16 @@ static void dumpit(int slice)
 {
 	int i, j;
 
-	for (i = 0; i < NUM_BANKS; i++) {
+	for (i = 0; i < MAX_BANKS_PER_SLICE; i++) {
 		for (j = 0; j < NUM_SUBBANKS; j++) {
 			struct l3_log_register *reg = &l3logs[slice][i][j];
 
 			if (reg->row0_enable)
-				printf("Row %d, Bank %d, Subbank %d is disabled\n",
-				       reg->row0, i, j);
+				printf("Slice %d, Row %d, Bank %d, Subbank %d is disabled\n",
+				       slice, reg->row0, i, j);
 			if (reg->row1_enable)
-				printf("Row %d, Bank %d, Subbank %d is disabled\n",
-				       reg->row1, i, j);
+				printf("Slice %d, Row %d, Bank %d, Subbank %d is disabled\n",
+				       slice, reg->row1, i, j);
 		}
 	}
 }
@@ -160,6 +169,8 @@ int main(int argc, char *argv[])
 
 	ret = asprintf(&path[0], "/sys/class/drm/card%d/l3_parity", device);
 	assert(ret != -1);
+	ret = asprintf(&path[1], "/sys/class/drm/card%d/l3_parity_slice_1", device);
+	assert(ret != -1);
 
 	for_each_slice(i) {
 		fd[i] = open(path[i], O_RDWR);
@@ -201,9 +212,9 @@ int main(int argc, char *argv[])
 				exit(EXIT_SUCCESS);
 			case 'H':
 				printf("Number of slices: %d\n", MAX_SLICES);
-				printf("Number of banks: %d\n", NUM_BANKS);
+				printf("Number of banks: %d\n", num_banks());
 				printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
-				printf("L3 size: %dK\n", L3_SIZE >> 10);
+				printf("Max L3 size: %dK\n", L3_SIZE >> 10);
 				exit(EXIT_SUCCESS);
 			case 'r':
 				row = atoi(optarg);
@@ -212,7 +223,7 @@ int main(int argc, char *argv[])
 				break;
 			case 'b':
 				bank = atoi(optarg);
-				if (bank >= NUM_BANKS)
+				if (bank >= num_banks() || bank >= MAX_BANKS_PER_SLICE)
 					exit(EXIT_FAILURE);
 				break;
 			case 's':
@@ -222,7 +233,7 @@ int main(int argc, char *argv[])
 				break;
 			case 'w':
 				which_slice = atoi(optarg);
-				if (which_slice > 1)
+				if (which_slice >= MAX_SLICES)
 					exit(EXIT_FAILURE);
 				break;
 			case 'd':
-- 
1.8.4
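
The "Max L3 size" printed by -H follows directly from the constants in this patch. A small sketch of the arithmetic, assuming the values shown above (MAX_ROW of 1<<12, 8 subbanks, 4 bytes per row); BYTES_PER_ROW and l3_size_bytes() are hypothetical names, not identifiers from the tool.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SUBBANKS 8
#define MAX_ROW (1 << 12)
#define BYTES_PER_ROW 4 /* hypothetical name for the "* 4" factor in L3_SIZE */

/*
 * Maximum L3 size in bytes for a given bank count.
 * The patch returns 2, 4 or 8 banks depending on the device.
 */
static uint32_t l3_size_bytes(int banks)
{
	assert(banks > 0);
	return (uint32_t)MAX_ROW * BYTES_PER_ROW * NUM_SUBBANKS * (uint32_t)banks;
}
```

With these constants one bank works out to 128K, which agrees with BYTES_PER_BANK (128 << 10) in the source, so 2, 4 and 8 banks give 256K, 512K and 1024K respectively.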


* [PATCH 15/16] intel_l3_parity: Support error injection
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (13 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 14/16] intel_l3_parity: Actually support multiple slices Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  9:12   ` Daniel Vetter
  2013-09-13  5:28 ` [PATCH 16/16] intel_l3_parity: Support a daemonic mode Ben Widawsky
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Haswell added the ability to inject errors, which is extremely useful for
testing. Add two actions to the tool: one to inject (-i) and one to
uninject (-u).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tests/sysfs_l3_parity   |  2 +-
 tools/intel_l3_parity.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/tests/sysfs_l3_parity b/tests/sysfs_l3_parity
index a0dfad9..e9d4411 100755
--- a/tests/sysfs_l3_parity
+++ b/tests/sysfs_l3_parity
@@ -21,7 +21,7 @@ fi
 $SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -e
 
 #Check that we can clear remaps
-if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -c` != "0" ] ; then
+if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -l` != 1 ] ; then
 	echo "Fail 2"
 	exit 1
 fi
diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index cf15541..cd6754e 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -79,6 +79,20 @@ static int which_slice = -1;
 			(__i) < ((which_slice == -1) ? MAX_SLICES : (which_slice + 1)); \
 			(__i)++)
 
+static void decode_dft(uint32_t dft)
+{
+	if (IS_IVYBRIDGE(devid) || !(dft & 1)) {
+		printf("Error injection disabled\n");
+		return;
+	}
+	printf("Error injection enabled\n");
+	printf("  Hang = %s\n", (dft >> 28) & 0x1 ? "yes" : "no");
+	printf("  Row = %d\n", (dft >> 7) & 0x7ff);
+	printf("  Bank = %d\n", (dft >> 2) & 0x3);
+	printf("  Subbank = %d\n", (dft >> 4) & 0x7);
+	printf("  Slice = %d\n", (dft >> 1) & 0x1);
+}
+
 static void dumpit(int slice)
 {
 	int i, j;
@@ -150,7 +164,9 @@ static void usage(const char *name)
 		"  -l, --list				List the current L3 logs\n"
 		"  -a, --clear-all			Clear all disabled rows\n"
 		"  -e, --enable				Enable row, bank, subbank (undo -d)\n"
-		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n",
+		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n"
+		"  -i, --inject				[HSW only] Cause hardware to inject a row error\n"
+		"  -u, --uninject			[HSW only] Turn off hardware error injection (undo -i)\n",
 		name);
 }
 
@@ -158,6 +174,7 @@ int main(int argc, char *argv[])
 {
 	const int device = drm_get_card();
 	char *path[REAL_MAX_SLICES];
+	uint32_t dft;
 	int row = 0, bank = 0, sbank = 0;
 	int fd[REAL_MAX_SLICES] = {0}, ret, i;
 	int action = '0';
@@ -167,6 +184,8 @@ int main(int argc, char *argv[])
 	if (intel_gen(devid) < 7)
 		exit(EXIT_SUCCESS);
 
+	assert(intel_register_access_init(intel_get_pci_device(), 0) == 0);
+
 	ret = asprintf(&path[0], "/sys/class/drm/card%d/l3_parity", device);
 	assert(ret != -1);
 	ret = asprintf(&path[1], "/sys/class/drm/card%d/l3_parity_slice_1", device);
@@ -183,6 +202,7 @@ int main(int argc, char *argv[])
 		assert(lseek(fd[i], 0, SEEK_SET) == 0);
 	}
 
+	dft = intel_register_read(0xb038);
 
 	while (1) {
 		int c, option_index = 0;
@@ -192,6 +212,8 @@ int main(int argc, char *argv[])
 			{ "clear-all", no_argument, 0, 'a' },
 			{ "enable", no_argument, 0, 'e' },
 			{ "disable", optional_argument, 0, 'd' },
+			{ "inject", no_argument, 0, 'i' },
+			{ "uninject", no_argument, 0, 'u' },
 			{ "hw-info", no_argument, 0, 'H' },
 			{ "row", required_argument, 0, 'r' },
 			{ "bank", required_argument, 0, 'b' },
@@ -200,7 +222,7 @@ int main(int argc, char *argv[])
 			{0, 0, 0, 0}
 		};
 
-		c = getopt_long(argc, argv, "hHr:b:s:w:aled::", long_options,
+		c = getopt_long(argc, argv, "hHr:b:s:w:aled::iu", long_options,
 				&option_index);
 		if (c == -1)
 			break;
@@ -215,6 +237,7 @@ int main(int argc, char *argv[])
 				printf("Number of banks: %d\n", num_banks());
 				printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
 				printf("Max L3 size: %dK\n", L3_SIZE >> 10);
+				printf("Has error injection: %s\n", IS_HASWELL(devid) ? "yes" : "no");
 				exit(EXIT_SUCCESS);
 			case 'r':
 				row = atoi(optarg);
@@ -236,6 +259,12 @@ int main(int argc, char *argv[])
 				if (which_slice >= MAX_SLICES)
 					exit(EXIT_FAILURE);
 				break;
+			case 'i':
+			case 'u':
+				if (!IS_HASWELL(devid)) {
+					fprintf(stderr, "Error injection supported on HSW+ only\n");
+					exit(EXIT_FAILURE);
+				}
 			case 'd':
 				if (optarg) {
 					ret = sscanf(optarg, "%d,%d,%d", &row, &bank, &sbank);
@@ -256,6 +285,23 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (action == 'i') {
+		if (which_slice == -1) {
+			fprintf(stderr, "Cannot inject errors to multiple slices (modify -w)\n");
+			exit(EXIT_FAILURE);
+		}
+
+		if (((dft >> 1) & 1) != which_slice) {
+			fprintf(stderr, "DFT register already has slice %d enabled, and we don't support multiple slices. Try modifying -w; but sometimes the register sticks in the wrong way\n", (dft >> 1) & 1);
+			exit(EXIT_FAILURE);
+		}
+		if (dft & 1 && ((dft >> 1) & 1) == which_slice)
+			printf("warning: overwriting existing injections. This is very dangerous.\n");
+	}
+
+	if (action == 'l')
+		decode_dft(dft);
+
 	/* Per slice operations */
 	for_each_slice(i) {
 		switch (action) {
@@ -271,11 +317,30 @@ int main(int argc, char *argv[])
 			case 'd':
 				assert(disable_rbs(row, bank, sbank, i) == 0);
 				break;
+			case 'i':
+				if (bank == 3) {
+					fprintf(stderr, "The hardware does not support error injection on bank 3.\n");
+					exit(EXIT_FAILURE);
+				}
+				dft |= row << 7;
+				dft |= sbank << 4;
+				dft |= bank << 2;
+				assert(i < 2);
+				dft |= i << 1; /* slice */
+				dft |= 1 << 0; /* enable */
+				intel_register_write(0xb038, dft);
+				break;
+			case 'u':
+				intel_register_write(0xb038, dft & ~(1<<0));
+				break;
+			case 'L':
+				break;
 			default:
 				abort();
 		}
 	}
 
+	intel_register_access_fini();
 	if (action == 'l')
 		exit(EXIT_SUCCESS);
 
-- 
1.8.4
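
The inject path ORs the new row/bank/subbank/slice into whatever was read back from 0xb038, so stale field bits from an earlier injection can survive. Below is a sketch of an encode/decode pair that clears the target fields first. The bit positions are the ones decode_dft() uses in the patch; the macro names, struct dft_target and both helpers are hypothetical, and nothing here touches hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as read back by decode_dft(); the macro names are made up. */
#define DFT_ENABLE		(1u << 0)
#define DFT_SLICE_SHIFT		1
#define DFT_SLICE_MASK		0x1u
#define DFT_BANK_SHIFT		2
#define DFT_BANK_MASK		0x3u
#define DFT_SUBBANK_SHIFT	4
#define DFT_SUBBANK_MASK	0x7u
#define DFT_ROW_SHIFT		7
#define DFT_ROW_MASK		0x7ffu

/* All bits that describe the injection target. */
#define DFT_TARGET_BITS \
	((DFT_SLICE_MASK << DFT_SLICE_SHIFT) | \
	 (DFT_BANK_MASK << DFT_BANK_SHIFT) | \
	 (DFT_SUBBANK_MASK << DFT_SUBBANK_SHIFT) | \
	 (DFT_ROW_MASK << DFT_ROW_SHIFT))

struct dft_target {
	unsigned int slice, bank, subbank, row;
};

/* Build a new register value: keep unrelated bits, replace the target. */
static uint32_t dft_encode(uint32_t old, struct dft_target t)
{
	uint32_t v;

	assert(t.slice <= DFT_SLICE_MASK && t.bank <= DFT_BANK_MASK);
	assert(t.subbank <= DFT_SUBBANK_MASK && t.row <= DFT_ROW_MASK);

	v = old & ~DFT_TARGET_BITS;
	v |= t.row << DFT_ROW_SHIFT;
	v |= t.subbank << DFT_SUBBANK_SHIFT;
	v |= t.bank << DFT_BANK_SHIFT;
	v |= t.slice << DFT_SLICE_SHIFT;
	return v | DFT_ENABLE;
}

/* Extract the target fields, mirroring the shifts in decode_dft(). */
static struct dft_target dft_decode(uint32_t v)
{
	struct dft_target t;

	t.slice = (v >> DFT_SLICE_SHIFT) & DFT_SLICE_MASK;
	t.bank = (v >> DFT_BANK_SHIFT) & DFT_BANK_MASK;
	t.subbank = (v >> DFT_SUBBANK_SHIFT) & DFT_SUBBANK_MASK;
	t.row = (v >> DFT_ROW_SHIFT) & DFT_ROW_MASK;
	return t;
}
```

Bits outside the target fields (for example the hang bit at position 28) pass through dft_encode() untouched, which matches how the patch preserves the rest of the register.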


* [PATCH 16/16] intel_l3_parity: Support a daemonic mode
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (14 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 15/16] intel_l3_parity: Support error injection Ben Widawsky
@ 2013-09-13  5:28 ` Ben Widawsky
  2013-09-13  9:44 ` [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ville Syrjälä
  2013-09-17  0:52 ` Bell, Bryan J
  17 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13  5:28 UTC (permalink / raw)
  To: intel-gfx; +Cc: bryan.j.bell, Ben Widawsky, Ben Widawsky, vishnu.venkatesh

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/Makefile.am              |   6 ++-
 tools/intel_l3_parity.c        |  39 +++++++++++++--
 tools/intel_l3_parity.h        |  31 ++++++++++++
 tools/intel_l3_udev_listener.c | 108 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 179 insertions(+), 5 deletions(-)
 create mode 100644 tools/intel_l3_parity.h
 create mode 100644 tools/intel_l3_udev_listener.c

diff --git a/tools/Makefile.am b/tools/Makefile.am
index 47bd5b3..19810cf 100644
--- a/tools/Makefile.am
+++ b/tools/Makefile.am
@@ -39,7 +39,7 @@ dist_bin_SCRIPTS = intel_gpu_abrt
 
 AM_CPPFLAGS = -I$(top_srcdir) -I$(top_srcdir)/lib
 AM_CFLAGS = $(DRM_CFLAGS) $(PCIACCESS_CFLAGS) $(CWARNFLAGS) $(CAIRO_CFLAGS)
-LDADD = $(top_builddir)/lib/libintel_tools.la $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS)
+LDADD = $(top_builddir)/lib/libintel_tools.la $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS) $(LIBUDEV_LIBS)
 
 intel_dump_decode_SOURCES = 	\
 	intel_dump_decode.c
@@ -50,3 +50,7 @@ intel_error_decode_SOURCES =	\
 intel_bios_reader_SOURCES =	\
 	intel_bios_reader.c	\
 	intel_bios.h
+
+intel_l3_parity_SOURCES =	\
+	intel_l3_parity.c	\
+	intel_l3_udev_listener.c
diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index cd6754e..1af424e 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -37,6 +37,14 @@
 #include "intel_chipset.h"
 #include "intel_gpu_tools.h"
 #include "drmtest.h"
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+#if HAVE_UDEV
+#include <libudev.h>
+#include <syslog.h>
+#endif
+#include "intel_l3_parity.h"
 
 static unsigned int devid;
 /* L3 size is always a function of banks. The number of banks cannot be
@@ -157,7 +165,8 @@ static void usage(const char *name)
 		"  -r, --row=[row]			The row to act upon (default 0)\n"
 		"  -b, --bank=[bank]			The bank to act upon (default 0)\n"
 		"  -s, --subbank=[subbank]		The subbank to act upon (default 0)\n"
-		"  -w, --slice=[slice]			Which slice to act on (default: -1 [all])"
+		"  -w, --slice=[slice]			Which slice to act on (default: -1 [all])\n"
+		"    , --daemon				Run the listener (-L) as a daemon\n"
 		" ACTIONS (only 1 may be specified at a time):\n"
 		"  -h, --help				Display this help\n"
 		"  -H, --hw-info				Display the current L3 properties\n"
@@ -166,7 +175,8 @@ static void usage(const char *name)
 		"  -e, --enable				Enable row, bank, subbank (undo -d)\n"
 		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n"
 		"  -i, --inject				[HSW only] Cause hardware to inject a row error\n"
-		"  -u, --uninject			[HSW only] Turn off hardware error injection (undo -i)\n",
+		"  -u, --uninject			[HSW only] Turn off hardware error injection (undo -i)\n"
+		"  -L, --listen				Listen for uevent errors\n",
 		name);
 }
 
@@ -179,6 +189,7 @@ int main(int argc, char *argv[])
 	int fd[REAL_MAX_SLICES] = {0}, ret, i;
 	int action = '0';
 	int drm_fd = drm_open_any();
+	int daemonize = 0;
 	devid = intel_get_drm_devid(drm_fd);
 
 	if (intel_gen(devid) < 7)
@@ -206,7 +217,7 @@ int main(int argc, char *argv[])
 
 	while (1) {
 		int c, option_index = 0;
-		static struct option long_options[] = {
+		struct option long_options[] = {
 			{ "help", no_argument, 0, 'h' },
 			{ "list", no_argument, 0, 'l' },
 			{ "clear-all", no_argument, 0, 'a' },
@@ -215,18 +226,23 @@ int main(int argc, char *argv[])
 			{ "inject", no_argument, 0, 'i' },
 			{ "uninject", no_argument, 0, 'u' },
 			{ "hw-info", no_argument, 0, 'H' },
+			{ "listen", no_argument, 0, 'L' },
 			{ "row", required_argument, 0, 'r' },
 			{ "bank", required_argument, 0, 'b' },
 			{ "subbank", required_argument, 0, 's' },
 			{ "slice", required_argument, 0, 'w' },
+			{ "daemon", no_argument, &daemonize, 1 },
 			{0, 0, 0, 0}
 		};
 
-		c = getopt_long(argc, argv, "hHr:b:s:w:aled::iu", long_options,
+		c = getopt_long(argc, argv, "hHr:b:s:w:aled::iuL", long_options,
 				&option_index);
 		if (c == -1)
 			break;
 
+		if (c == 0)
+			continue;
+
 		switch (c) {
 			case '?':
 			case 'h':
@@ -274,6 +290,7 @@ int main(int argc, char *argv[])
 			case 'a':
 			case 'l':
 			case 'e':
+			case 'L':
 				if (action != '0') {
 					fprintf(stderr, "Only one action may be specified\n");
 					exit(EXIT_FAILURE);
@@ -299,6 +316,20 @@ int main(int argc, char *argv[])
 			printf("warning: overwriting existing injections. This is very dangerous.\n");
 	}
 
+	/* Daemon doesn't work like the other commands */
+	if (action == 'L') {
+		struct l3_parity par;
+		struct l3_location loc;
+		if (daemonize) {
+			assert(daemon(0, 0) == 0);
+			openlog(argv[0], LOG_CONS | LOG_PID, LOG_USER);
+		}
+		memset(&par, 0, sizeof(par));
+		assert(l3_uevent_setup(&par) == 0);
+		assert(l3_listen(&par, daemonize == 1, &loc) == 0);
+		exit(EXIT_SUCCESS);
+	}
+
 	if (action == 'l')
 		decode_dft(dft);
 
diff --git a/tools/intel_l3_parity.h b/tools/intel_l3_parity.h
new file mode 100644
index 0000000..65697c4
--- /dev/null
+++ b/tools/intel_l3_parity.h
@@ -0,0 +1,31 @@
+#ifndef INTEL_L3_PARITY_H_
+#define INTEL_L3_PARITY_H_
+
+#include <stdint.h>
+#include <stdbool.h>
+
+struct l3_parity {
+	struct udev *udev;
+	struct udev_monitor *uevent_monitor;
+	int fd;
+	fd_set fdset;
+};
+
+struct l3_location {
+	uint8_t slice;
+	uint16_t row;
+	uint8_t bank;
+	uint8_t subbank;
+};
+
+#if HAVE_UDEV
+int l3_uevent_setup(struct l3_parity *par);
+/* Listens (blocks) for an l3 parity event. Returns the location of the error. */
+int l3_listen(struct l3_parity *par, bool daemon, struct l3_location *loc);
+#define l3_uevent_teardown(par) {}
+#else
+#define l3_uevent_setup(par) -1
+#define l3_listen(par, daemon, loc) -1
+#endif
+
+#endif
diff --git a/tools/intel_l3_udev_listener.c b/tools/intel_l3_udev_listener.c
new file mode 100644
index 0000000..c50820c
--- /dev/null
+++ b/tools/intel_l3_udev_listener.c
@@ -0,0 +1,108 @@
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#if HAVE_UDEV
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <libudev.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <syslog.h>
+#include "i915_drm.h"
+#include "intel_l3_parity.h"
+
+#ifndef I915_L3_PARITY_UEVENT
+#define I915_L3_PARITY_UEVENT "L3_PARITY_ERROR"
+#endif
+
+int l3_uevent_setup(struct l3_parity *par)
+{
+	struct udev *udev;
+	struct udev_monitor *uevent_monitor;
+	fd_set fdset;
+	int fd, ret = -1;
+
+	udev = udev_new();
+	if (!udev) {
+		return -1;
+	}
+
+	uevent_monitor = udev_monitor_new_from_netlink(udev, "udev");
+	if (!uevent_monitor)
+		goto err_out;
+
+	ret = udev_monitor_filter_add_match_subsystem_devtype(uevent_monitor, "drm", "drm_minor");
+	if (ret < 0)
+		goto err_out;
+
+	ret = udev_monitor_enable_receiving(uevent_monitor);
+	if (ret < 0)
+		goto err_out;
+
+	fd = udev_monitor_get_fd(uevent_monitor);
+	FD_ZERO(&fdset);
+	FD_SET(fd, &fdset);
+
+	par->udev = udev;
+	par->fd = fd;
+	par->fdset = fdset;
+	par->uevent_monitor = uevent_monitor;
+	return 0;
+
+err_out:
+	udev_unref(udev);
+	return ret;
+}
+
+int l3_listen(struct l3_parity *par, bool daemon, struct l3_location *loc)
+{
+	struct udev_device *udev_dev;
+	const char *parity_status;
+	char *err_msg;
+	int ret;
+
+again:
+	ret = select(par->fd + 1, &par->fdset, NULL, NULL, NULL);
+	/* Number of bits set is returned, must be >= 1 */
+	if (ret <= 0) {
+		return ret;
+	}
+
+	assert(FD_ISSET(par->fd, &par->fdset));
+
+	udev_dev = udev_monitor_receive_device(par->uevent_monitor);
+	if (!udev_dev)
+		return -1;
+
+	parity_status = udev_device_get_property_value(udev_dev, I915_L3_PARITY_UEVENT);
+	if (!parity_status || strncmp(parity_status, "1", 1))
+		goto again;
+
+	loc->slice = atoi(udev_device_get_property_value(udev_dev, "SLICE"));
+	loc->row = atoi(udev_device_get_property_value(udev_dev, "ROW"));
+	loc->bank = atoi(udev_device_get_property_value(udev_dev, "BANK"));
+	loc->subbank = atoi(udev_device_get_property_value(udev_dev, "SUBBANK"));
+
+	udev_device_unref(udev_dev);
+
+	asprintf(&err_msg, "Parity error detected on: %d,%d,%d,%d. "
+			"Try to run intel_l3_parity -r %d -b %d -s %d -w %d -d",
+			loc->slice, loc->row, loc->bank, loc->subbank,
+			loc->row, loc->bank, loc->subbank, loc->slice);
+	if (daemon)
+		syslog(LOG_INFO, "%s\n", err_msg);
+	else
+		fprintf(stderr, "%s\n", err_msg);
+
+	free(err_msg);
+
+	if (daemon)
+		goto again;
+	return 0;
+}
+#endif
-- 
1.8.4
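
The listener converts the SLICE/ROW/BANK/SUBBANK properties with atoi(), which cannot report a missing or malformed value. A sketch of a stricter conversion in plain C follows; parse_location_field() is a hypothetical helper, not part of the patch, and the limits passed to it are up to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal uevent property value.
 * Rejects NULL, empty strings, trailing junk and values above max.
 * On success stores the value in *out and returns true.
 */
static bool parse_location_field(const char *s, unsigned long max,
				 unsigned long *out)
{
	char *end;
	unsigned long v;

	assert(out != NULL);

	if (!s || *s == '\0')
		return false;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || end == s || *end != '\0' || v > max)
		return false;

	*out = v;
	return true;
}
```

A caller could pass the string returned by udev_device_get_property_value() together with a field limit (0x7ff for an 11-bit row, for instance) and skip the event when the helper returns false.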


* Re: [PATCH 4/8] drm/i915: Fix HSW parity test
  2013-09-13  5:28 ` [PATCH 4/8] drm/i915: Fix HSW parity test Ben Widawsky
@ 2013-09-13  8:17   ` Ville Syrjälä
  0 siblings, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2013-09-13  8:17 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Thu, Sep 12, 2013 at 10:28:30PM -0700, Ben Widawsky wrote:
> Haswell changed the log registers to be WO, so we can no longer read
> them to determine the programming (which sucks, see later note). For
> now, simply use the cached value, and hope HW doesn't screw us over.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57441
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_sysfs.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index d572435..43c2e81 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -133,6 +133,19 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
>  	if (ret)
>  		return ret;
>  
> +	if (IS_HASWELL(drm_dev)) {
> +		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
> +		if ((!dev_priv->l3_parity.remap_info))
                   ^^

Also could just flip the if vs. else branches around to avoid the '!'.

> +			memset(buf + offset, 0, last - offset);
> +		else
> +			memcpy(buf + offset,
> +			       dev_priv->l3_parity.remap_info + (offset/4),
> +			       last - offset);
> +
> +		i = last;

And it looks like this didn't get updated after we bikesh(r)edded the
register read part. It should just be:

 if (...)
  memset(buf, 0, count);
 else
  memcpy(buf, dev_priv->l3_parity.remap_info + (offset/4), count);


> +		goto out;
> +	}
> +
>  	misccpctl = I915_READ(GEN7_MISCCPCTL);
>  	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
>  
> @@ -141,6 +154,7 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
   
>  	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
>  
> +out:
>  	mutex_unlock(&drm_dev->struct_mutex);
>  
>  	return i;
> -- 
> 1.8.4
> 

-- 
Ville Syrjälä
Intel OTC
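
For reference, the simplified read path suggested above can be modelled in user space as follows. This is only a sketch with the branches flipped to avoid the '!'. L3LOG_SIZE stands in for GEN7_L3LOG_SIZE and its value here is an assumption, cached_l3_read() is a hypothetical name, and the offset is assumed to be 4-byte aligned because the cache is indexed in 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L3LOG_SIZE 128 /* assumption: size of the log in bytes */

/*
 * Fill buf with up to count bytes of the cached remap table, starting at
 * byte offset into the table. buf already points at the destination for
 * this chunk, so it is not advanced by offset. A NULL cache means nothing
 * was ever remapped and zeroes are reported.
 * Returns the number of bytes produced.
 */
static size_t cached_l3_read(char *buf, const uint32_t *remap_info,
			     size_t offset, size_t count)
{
	assert(buf != NULL);

	if (offset >= L3LOG_SIZE)
		return 0;
	if (count > L3LOG_SIZE - offset)
		count = L3LOG_SIZE - offset;

	if (remap_info)
		memcpy(buf, remap_info + (offset / 4), count);
	else
		memset(buf, 0, count);

	return count;
}
```

The clamp at the top plays the role of the min_t() in the original hunk; whether the kernel already bounds count for this attribute is not something this sketch relies on.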


* Re: [PATCH 15/16] intel_l3_parity: Support error injection
  2013-09-13  5:28 ` [PATCH 15/16] intel_l3_parity: Support error injection Ben Widawsky
@ 2013-09-13  9:12   ` Daniel Vetter
  2013-09-13 15:54     ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Vetter @ 2013-09-13  9:12 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Thu, Sep 12, 2013 at 10:28:41PM -0700, Ben Widawsky wrote:
> Haswell added the ability to inject errors which is extremely useful for
> testing. Add two arguments to the tool to inject, and uninject.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Do we run any risk that a concurrent write/read to the same register range
could hang the machine due to the same-cacheline workaround we need? Just
want to make sure that when we integrate this into a testcase there are no
surprises like with intel_gpu_top ...
-Daniel

> ---
>  tests/sysfs_l3_parity   |  2 +-
>  tools/intel_l3_parity.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/sysfs_l3_parity b/tests/sysfs_l3_parity
> index a0dfad9..e9d4411 100755
> --- a/tests/sysfs_l3_parity
> +++ b/tests/sysfs_l3_parity
> @@ -21,7 +21,7 @@ fi
>  $SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -e
>  
>  #Check that we can clear remaps
> -if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -c` != "0" ] ; then
> +if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -l` != 1 ] ; then
>  	echo "Fail 2"
>  	exit 1
>  fi
> diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
> index cf15541..cd6754e 100644
> --- a/tools/intel_l3_parity.c
> +++ b/tools/intel_l3_parity.c
> @@ -79,6 +79,20 @@ static int which_slice = -1;
>  			(__i) < ((which_slice == -1) ? MAX_SLICES : (which_slice + 1)); \
>  			(__i)++)
>  
> +static void decode_dft(uint32_t dft)
> +{
> +	if (IS_IVYBRIDGE(devid) || !(dft & 1)) {
> +		printf("Error injection disabled\n");
> +		return;
> +	}
> +	printf("Error injection enabled\n");
> +	printf("  Hang = %s\n", (dft >> 28) & 0x1 ? "yes" : "no");
> +	printf("  Row = %d\n", (dft >> 7) & 0x7ff);
> +	printf("  Bank = %d\n", (dft >> 2) & 0x3);
> +	printf("  Subbank = %d\n", (dft >> 4) & 0x7);
> +	printf("  Slice = %d\n", (dft >> 1) & 0x1);
> +}
> +
>  static void dumpit(int slice)
>  {
>  	int i, j;
> @@ -150,7 +164,9 @@ static void usage(const char *name)
>  		"  -l, --list				List the current L3 logs\n"
>  		"  -a, --clear-all			Clear all disabled rows\n"
>  		"  -e, --enable				Enable row, bank, subbank (undo -d)\n"
> -		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n",
> +		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n"
> +		"  -i, --inject				[HSW only] Cause hardware to inject a row errors\n"
> +		"  -u, --uninject			[HSW only] Turn off hardware error injectection (undo -i)\n",
>  		name);
>  }
>  
> @@ -158,6 +174,7 @@ int main(int argc, char *argv[])
>  {
>  	const int device = drm_get_card();
>  	char *path[REAL_MAX_SLICES];
> +	uint32_t dft;
>  	int row = 0, bank = 0, sbank = 0;
>  	int fd[REAL_MAX_SLICES] = {0}, ret, i;
>  	int action = '0';
> @@ -167,6 +184,8 @@ int main(int argc, char *argv[])
>  	if (intel_gen(devid) < 7)
>  		exit(EXIT_SUCCESS);
>  
> +	assert(intel_register_access_init(intel_get_pci_device(), 0) == 0);
> +
>  	ret = asprintf(&path[0], "/sys/class/drm/card%d/l3_parity", device);
>  	assert(ret != -1);
>  	ret = asprintf(&path[1], "/sys/class/drm/card%d/l3_parity_slice_1", device);
> @@ -183,6 +202,7 @@ int main(int argc, char *argv[])
>  		assert(lseek(fd[i], 0, SEEK_SET) == 0);
>  	}
>  
> +	dft = intel_register_read(0xb038);
>  
>  	while (1) {
>  		int c, option_index = 0;
> @@ -192,6 +212,8 @@ int main(int argc, char *argv[])
>  			{ "clear-all", no_argument, 0, 'a' },
>  			{ "enable", no_argument, 0, 'e' },
>  			{ "disable", optional_argument, 0, 'd' },
> +			{ "inject", no_argument, 0, 'i' },
> +			{ "uninject", no_argument, 0, 'u' },
>  			{ "hw-info", no_argument, 0, 'H' },
>  			{ "row", required_argument, 0, 'r' },
>  			{ "bank", required_argument, 0, 'b' },
> @@ -200,7 +222,7 @@ int main(int argc, char *argv[])
>  			{0, 0, 0, 0}
>  		};
>  
> -		c = getopt_long(argc, argv, "hHr:b:s:w:aled::", long_options,
> +		c = getopt_long(argc, argv, "hHr:b:s:w:aled::iu", long_options,
>  				&option_index);
>  		if (c == -1)
>  			break;
> @@ -215,6 +237,7 @@ int main(int argc, char *argv[])
>  				printf("Number of banks: %d\n", num_banks());
>  				printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
>  				printf("Max L3 size: %dK\n", L3_SIZE >> 10);
> +				printf("Has error injection: %s\n", IS_HASWELL(devid) ? "yes" : "no");
>  				exit(EXIT_SUCCESS);
>  			case 'r':
>  				row = atoi(optarg);
> @@ -236,6 +259,12 @@ int main(int argc, char *argv[])
>  				if (which_slice >= MAX_SLICES)
>  					exit(EXIT_FAILURE);
>  				break;
> +			case 'i':
> +			case 'u':
> +				if (!IS_HASWELL(devid)) {
> +					fprintf(stderr, "Error injection supported on HSW+ only\n");
> +					exit(EXIT_FAILURE);
> +				}
>  			case 'd':
>  				if (optarg) {
>  					ret = sscanf(optarg, "%d,%d,%d", &row, &bank, &sbank);
> @@ -256,6 +285,23 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> +	if (action == 'i') {
> +		if (((dft >> 1) & 1) != which_slice) {
> +			fprintf(stderr, "DFT register already has slice %d enabled, and we don't support multiple slices. Try modifying -w; but sometimes the register sticks in the wrong way\n", (dft >> 1) & 1);
> +			exit(EXIT_FAILURE);
> +		}
> +
> +		if (which_slice == -1) {
> +			fprintf(stderr, "Cannot inject errors to multiple slices (modify -w)\n");
> +			exit(EXIT_FAILURE);
> +		}
> +		if (dft & 1 && ((dft >> 1) && 1) == which_slice)
> +			printf("warning: overwriting existing injections. This is very dangerous.\n");
> +	}
> +
> +	if (action == 'l')
> +		decode_dft(dft);
> +
>  	/* Per slice operations */
>  	for_each_slice(i) {
>  		switch (action) {
> @@ -271,11 +317,30 @@ int main(int argc, char *argv[])
>  			case 'd':
>  				assert(disable_rbs(row, bank, sbank, i) == 0);
>  				break;
> +			case 'i':
> +				if (bank == 3) {
> +					fprintf(stderr, "The hardware does not support error inject on bank 3.\n");
> +					exit(EXIT_FAILURE);
> +				}
> +				dft |= row << 7;
> +				dft |= sbank << 4;
> +				dft |= bank << 2;
> +				assert(i < 2);
> +				dft |= i << 1; /* slice */
> +				dft |= 1 << 0; /* enable */
> +				intel_register_write(0xb038, dft);
> +				break;
> +			case 'u':
> +				intel_register_write(0xb038, dft & ~(1<<0));
> +				break;
> +			case 'L':
> +				break;
>  			default:
>  				abort();
>  		}
>  	}
>  
> +	intel_register_access_fini();
>  	if (action == 'l')
>  		exit(EXIT_SUCCESS);
>  
> -- 
> 1.8.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 8/8] drm/i915: Do remaps for all contexts
  2013-09-13  5:28 ` [PATCH 8/8] drm/i915: Do remaps for " Ben Widawsky
@ 2013-09-13  9:17   ` Ville Syrjälä
  2013-09-13  9:20     ` Ville Syrjälä
  2013-09-17 20:42     ` Ben Widawsky
  0 siblings, 2 replies; 40+ messages in thread
From: Ville Syrjälä @ 2013-09-13  9:17 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Thu, Sep 12, 2013 at 10:28:34PM -0700, Ben Widawsky wrote:
> On both Ivybridge and Haswell, row remapping information is saved and
> restored with context. This means, we never actually properly supported
> the l3 remapping because our sysfs interface is asynchronous (and not
> tied to any context), and the known faulty HW would be reused by the
> next context to run.
> 
> Note that due to the asynchronous nature of the sysfs entry, there is no
> point modifying the registers for the existing context. Instead we set a
> flag for all contexts to load the correct remapping information on the
> next run. Interested clients can use debugfs to determine whether or not
> the row has been remapped.
> 
> One could propose at this point that we just do the remapping in the
> kernel. I guess since we have to maintain the sysfs interface anyway,
> I'm not sure how useful it is, and I do like keeping the policy in
> userspace; (it wasn't my original decision to make the
> interface the way it is, so I'm not attached).
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  8 +++++++
>  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>  drivers/gpu/drm/i915/i915_gem_context.c | 17 +++++++++++++--
>  drivers/gpu/drm/i915/i915_sysfs.c       | 38 +++++++++++----------------------
>  4 files changed, 36 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ada0950..80bed69 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -145,6 +145,13 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  		seq_printf(m, " (%s)", obj->ring->name);
>  }
>  
> +static void describe_ctx(struct seq_file *m, struct i915_hw_context *ctx)
> +{
> +	seq_putc(m, ctx->is_initialized ? 'I' : 'i');
> +	seq_putc(m, ctx->remap_slice ? 'R' : 'r');
> +	seq_putc(m, ' ');
> +}
> +
>  static int i915_gem_object_list_info(struct seq_file *m, void *data)
>  {
>  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> @@ -1463,6 +1470,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  
>  	list_for_each_entry(ctx, &dev_priv->context_list, link) {
>  		seq_puts(m, "HW context ");
> +		describe_ctx(m, ctx);
>  		for_each_ring(ring, dev_priv, i)
>  			if (ring->default_context == ctx)
>  				seq_printf(m, "(default context %s) ", ring->name);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 68f992b..6ba78cd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -602,6 +602,7 @@ struct i915_hw_context {
>  	struct kref ref;
>  	int id;
>  	bool is_initialized;
> +	uint8_t remap_slice;
>  	struct drm_i915_file_private *file_priv;
>  	struct intel_ring_buffer *ring;
>  	struct drm_i915_gem_object *obj;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 2bbdce8..72b7202 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -140,7 +140,7 @@ create_hw_context(struct drm_device *dev,
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct i915_hw_context *ctx;
> -	int ret;
> +	int ret, i;
>  
>  	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>  	if (ctx == NULL)
> @@ -181,6 +181,8 @@ create_hw_context(struct drm_device *dev,
>  
>  	ctx->file_priv = file_priv;
>  	ctx->id = ret;
> +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> +		ctx->remap_slice |= 1 << 1;
                                         ^
Should be 1 << i.
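
To spell out the one-character nit: with a constant shift of 1 the loop ORs
in bit 1 on every iteration, so slice 0 never gets flagged. A minimal
userspace sketch of the difference follows; both helper names are made up
for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: one pending-remap bit per slice, as presumably intended. */
static uint8_t remap_mask_intended(int num_slices)
{
	uint8_t mask = 0;
	int i;

	for (i = 0; i < num_slices; i++)
		mask |= 1 << i;
	return mask;
}

/* Same loop, but with the constant shift count from the patch as posted. */
static uint8_t remap_mask_as_posted(int num_slices)
{
	uint8_t mask = 0;
	int i;

	for (i = 0; i < num_slices; i++)
		mask |= 1 << 1;
	return mask;
}
```

For two slices the first helper yields 0x3 and the second 0x2. For a single
slice the second still yields 0x2, i.e. only a slice that does not exist is
marked for remapping.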

>  
>  	return ctx;
>  
> @@ -396,7 +398,7 @@ static int do_switch(struct i915_hw_context *to)
>  	struct intel_ring_buffer *ring = to->ring;
>  	struct i915_hw_context *from = ring->last_context;
>  	u32 hw_flags = 0;
> -	int ret;
> +	int ret, i;
>  
>  	BUG_ON(from != NULL && from->obj != NULL && from->obj->pin_count == 0);
>  
> @@ -432,6 +434,17 @@ static int do_switch(struct i915_hw_context *to)
>  		return ret;
>  	}
>  
> +	for (i = 0; i < MAX_L3_SLICES; i++) {
> +		if (!(to->remap_slice & (1<<i)))
> +			continue;
> +
> +		ret = i915_gem_l3_remap(ring, i);
> +		if (!ret) {
> +			to->remap_slice &= ~(1<<i);
> +		}
> +		/* If it failed, try again next round */
> +	}
> +
>  	/* The backing object for the context is done after switching to the
>  	 * *next* context. Therefore we cannot retire the previous context until
>  	 * the next context has already started running. In fact, the below code
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index 65a7274..f0d7e1c 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -118,9 +118,8 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
>  	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
>  	struct drm_device *drm_dev = dminor->dev;
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> -	uint32_t misccpctl;
>  	int slice = (int)(uintptr_t)attr->private;
> -	int i, ret;
> +	int ret;
>  
>  	count = round_down(count, 4);
>  
> @@ -134,31 +133,16 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
>  	if (ret)
>  		return ret;
>  
> -	if (IS_HASWELL(drm_dev)) {
> -		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
> -		if ((!dev_priv->l3_parity.remap_info[slice]))
> -			memset(buf + offset, 0, last - offset);
> -		else
> -			memcpy(buf + offset,
> -			       dev_priv->l3_parity.remap_info[slice] + (offset/4),
> -			       last - offset);
> -
> -		i = last;
> -		goto out;
> -	}
> -
> -	misccpctl = I915_READ(GEN7_MISCCPCTL);
> -	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
> -
> -	for (i = 0; i < count; i += 4)
> -		*((uint32_t *)(&buf[i])) = I915_READ(GEN7_L3LOG_BASE + offset + i);
> -
> -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> +	if ((!dev_priv->l3_parity.remap_info[slice]))
> +		memset(buf + offset, 0, count);
> +	else
> +		memcpy(buf + offset,
> +		       dev_priv->l3_parity.remap_info[slice] + (offset/4),
> +		       count);

Needs fixing after patch 4 is fixed.

>  
> -out:
>  	mutex_unlock(&drm_dev->struct_mutex);
>  
> -	return i;
> +	return count;
>  }
>  
>  static ssize_t
> @@ -170,6 +154,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
>  	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
>  	struct drm_device *drm_dev = dminor->dev;
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> +	struct i915_hw_context *ctx;
>  	u32 *temp = NULL; /* Just here to make handling failures easy */
>  	int slice = (int)(uintptr_t)attr->private;
>  	int ret;
> @@ -206,8 +191,9 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
>  
>  	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
>  
> -	if (i915_gem_l3_remap(&dev_priv->ring[RCS], slice))
> -		count = 0;
> +	/* NB: We defer the remapping until we switch to the context */
> +	list_for_each_entry(ctx, &dev_priv->context_list, link)
> +		ctx->remap_slice |= (1<<slice);

Should we force a context switch on the next batch if the current
context has remap_slice != 0?

Also what happens if hw_contexts_disabled==false?
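
To make the first question concrete, here is a toy userspace model of the
bookkeeping in this patch as I read it. struct toy_ctx, toy_mark_all(),
toy_switch_to() and the remap callbacks are all hypothetical stand-ins, not
driver code; the only thing modelled is the per-context pending mask.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_L3_SLICES 2

/* Toy stand-in for struct i915_hw_context; only the pending mask is kept. */
struct toy_ctx {
	uint8_t remap_slice;
};

/* Models the sysfs write in this patch: flag the slice on every context. */
static void toy_mark_all(struct toy_ctx *ctxs, size_t n, int slice)
{
	size_t k;

	for (k = 0; k < n; k++)
		ctxs[k].remap_slice |= 1 << slice;
}

/*
 * Models the loop added to do_switch(): try each pending slice and clear its
 * bit only on success. remap() is a hypothetical callback returning 0 on
 * success. Returns the number of slices that were remapped.
 */
static int toy_switch_to(struct toy_ctx *to, int (*remap)(int slice))
{
	int i, done = 0;

	for (i = 0; i < MAX_L3_SLICES; i++) {
		if (!(to->remap_slice & (1 << i)))
			continue;
		if (remap(i) == 0) {
			to->remap_slice &= ~(1 << i);
			done++;
		}
		/* On failure the bit stays set and is retried on the next switch. */
	}
	return done;
}

/* Hypothetical remap callbacks for exercising the model. */
static int toy_remap_ok(int slice)   { (void)slice; return 0; }
static int toy_remap_fail(int slice) { (void)slice; return -1; }
```

In this model a context that is never switched to again keeps its bit set
indefinitely, which is the case the question above is about.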

>  
>  	mutex_unlock(&drm_dev->struct_mutex);
>  
> -- 
> 1.8.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC


* Re: [PATCH 8/8] drm/i915: Do remaps for all contexts
  2013-09-13  9:17   ` Ville Syrjälä
@ 2013-09-13  9:20     ` Ville Syrjälä
  2013-09-17 20:42     ` Ben Widawsky
  1 sibling, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2013-09-13  9:20 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Fri, Sep 13, 2013 at 12:17:17PM +0300, Ville Syrjälä wrote:
> On Thu, Sep 12, 2013 at 10:28:34PM -0700, Ben Widawsky wrote:
> > On both Ivybridge and Haswell, row remapping information is saved and
> > restored with context. This means, we never actually properly supported
> > the l3 remapping because our sysfs interface is asynchronous (and not
> > tied to any context), and the known faulty HW would be reused by the
> > next context to run.
> > 
> > Note that due to the asynchronous nature of the sysfs entry, there is no
> > point modifying the registers for the existing context. Instead we set a
> > flag for all contexts to load the correct remapping information on the
> > next run. Interested clients can use debugfs to determine whether or not
> > the row has been remapped.
> > 
> > One could propose at this point that we just do the remapping in the
> > kernel. I guess since we have to maintain the sysfs interface anyway,
> > I'm not sure how useful it is, and I do like keeping the policy in
> > userspace; (it wasn't my original decision to make the
> > interface the way it is, so I'm not attached).
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c     |  8 +++++++
> >  drivers/gpu/drm/i915/i915_drv.h         |  1 +
> >  drivers/gpu/drm/i915/i915_gem_context.c | 17 +++++++++++++--
> >  drivers/gpu/drm/i915/i915_sysfs.c       | 38 +++++++++++----------------------
> >  4 files changed, 36 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index ada0950..80bed69 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -145,6 +145,13 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  		seq_printf(m, " (%s)", obj->ring->name);
> >  }
> >  
> > +static void describe_ctx(struct seq_file *m, struct i915_hw_context *ctx)
> > +{
> > +	seq_putc(m, ctx->is_initialized ? 'I' : 'i');
> > +	seq_putc(m, ctx->remap_slice ? 'R' : 'r');
> > +	seq_putc(m, ' ');
> > +}
> > +
> >  static int i915_gem_object_list_info(struct seq_file *m, void *data)
> >  {
> >  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> > @@ -1463,6 +1470,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
> >  
> >  	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> >  		seq_puts(m, "HW context ");
> > +		describe_ctx(m, ctx);
> >  		for_each_ring(ring, dev_priv, i)
> >  			if (ring->default_context == ctx)
> >  				seq_printf(m, "(default context %s) ", ring->name);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 68f992b..6ba78cd 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -602,6 +602,7 @@ struct i915_hw_context {
> >  	struct kref ref;
> >  	int id;
> >  	bool is_initialized;
> > +	uint8_t remap_slice;
> >  	struct drm_i915_file_private *file_priv;
> >  	struct intel_ring_buffer *ring;
> >  	struct drm_i915_gem_object *obj;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2bbdce8..72b7202 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -140,7 +140,7 @@ create_hw_context(struct drm_device *dev,
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct i915_hw_context *ctx;
> > -	int ret;
> > +	int ret, i;
> >  
> >  	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> >  	if (ctx == NULL)
> > @@ -181,6 +181,8 @@ create_hw_context(struct drm_device *dev,
> >  
> >  	ctx->file_priv = file_priv;
> >  	ctx->id = ret;
> > +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > +		ctx->remap_slice |= 1 << 1;
>                                          ^
> Should be 1 << i.
> 
> >  
> >  	return ctx;
> >  
> > @@ -396,7 +398,7 @@ static int do_switch(struct i915_hw_context *to)
> >  	struct intel_ring_buffer *ring = to->ring;
> >  	struct i915_hw_context *from = ring->last_context;
> >  	u32 hw_flags = 0;
> > -	int ret;
> > +	int ret, i;
> >  
> >  	BUG_ON(from != NULL && from->obj != NULL && from->obj->pin_count == 0);
> >  
> > @@ -432,6 +434,17 @@ static int do_switch(struct i915_hw_context *to)
> >  		return ret;
> >  	}
> >  
> > +	for (i = 0; i < MAX_L3_SLICES; i++) {
> > +		if (!(to->remap_slice & (1<<i)))
> > +			continue;
> > +
> > +		ret = i915_gem_l3_remap(ring, i);
> > +		if (!ret) {
> > +			to->remap_slice &= ~(1<<i);
> > +		}
> > +		/* If it failed, try again next round */
> > +	}
> > +
> >  	/* The backing object for the context is done after switching to the
> >  	 * *next* context. Therefore we cannot retire the previous context until
> >  	 * the next context has already started running. In fact, the below code
> > diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> > index 65a7274..f0d7e1c 100644
> > --- a/drivers/gpu/drm/i915/i915_sysfs.c
> > +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> > @@ -118,9 +118,8 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
> >  	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
> >  	struct drm_device *drm_dev = dminor->dev;
> >  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> > -	uint32_t misccpctl;
> >  	int slice = (int)(uintptr_t)attr->private;
> > -	int i, ret;
> > +	int ret;
> >  
> >  	count = round_down(count, 4);
> >  
> > @@ -134,31 +133,16 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
> >  	if (ret)
> >  		return ret;
> >  
> > -	if (IS_HASWELL(drm_dev)) {
> > -		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
> > -		if ((!dev_priv->l3_parity.remap_info[slice]))
> > -			memset(buf + offset, 0, last - offset);
> > -		else
> > -			memcpy(buf + offset,
> > -			       dev_priv->l3_parity.remap_info[slice] + (offset/4),
> > -			       last - offset);
> > -
> > -		i = last;
> > -		goto out;
> > -	}
> > -
> > -	misccpctl = I915_READ(GEN7_MISCCPCTL);
> > -	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
> > -
> > -	for (i = 0; i < count; i += 4)
> > -		*((uint32_t *)(&buf[i])) = I915_READ(GEN7_L3LOG_BASE + offset + i);
> > -
> > -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > +	if ((!dev_priv->l3_parity.remap_info[slice]))
> > +		memset(buf + offset, 0, count);
> > +	else
> > +		memcpy(buf + offset,
> > +		       dev_priv->l3_parity.remap_info[slice] + (offset/4),
> > +		       count);
> 
> Needs fixing after patch 4 is fixed.
> 
> >  
> > -out:
> >  	mutex_unlock(&drm_dev->struct_mutex);
> >  
> > -	return i;
> > +	return count;
> >  }
> >  
> >  static ssize_t
> > @@ -170,6 +154,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
> >  	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
> >  	struct drm_device *drm_dev = dminor->dev;
> >  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> > +	struct i915_hw_context *ctx;
> >  	u32 *temp = NULL; /* Just here to make handling failures easy */
> >  	int slice = (int)(uintptr_t)attr->private;
> >  	int ret;
> > @@ -206,8 +191,9 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
> >  
> >  	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
> >  
> > -	if (i915_gem_l3_remap(&dev_priv->ring[RCS], slice))
> > -		count = 0;
> > +	/* NB: We defer the remapping until we switch to the context */
> > +	list_for_each_entry(ctx, &dev_priv->context_list, link)
> > +		ctx->remap_slice |= (1<<slice);
> 
> Should we force a context switch on the next batch if the current
> context has remap_slice != 0?
> 
> Also what happens if hw_contexts_disabled==false?

Argh. I mean true of course. Brain has regressed and no longer
understands a single "disabled" boolean.

> 
> >  
> >  	mutex_unlock(&drm_dev->struct_mutex);
> >  
> > -- 
> > 1.8.4
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Ville Syrjälä
Intel OTC


* Re: [PATCH 5/8] drm/i915: Add second slice l3 remapping
  2013-09-13  5:28 ` [PATCH 5/8] drm/i915: Add second slice l3 remapping Ben Widawsky
@ 2013-09-13  9:38   ` Ville Syrjälä
  2013-09-17 18:45     ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Ville Syrjälä @ 2013-09-13  9:38 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Thu, Sep 12, 2013 at 10:28:31PM -0700, Ben Widawsky wrote:
> Certain HSW SKUs have a second bank of L3. This L3 remapping has a
> separate register set, and interrupt from the first "slice". A slice is
> simply a term to define some subset of the GPU's l3 cache. This patch
> implements both the interrupt handler, and ability to communicate with
> userspace about this second slice.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  9 ++--
>  drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++----
>  drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_reg.h         |  6 +++
>  drivers/gpu/drm/i915/i915_sysfs.c       | 34 ++++++++++---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
>  include/uapi/drm/i915_drm.h             |  8 ++--
>  7 files changed, 115 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 81ba5bb..eb90461 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -918,9 +918,11 @@ struct i915_ums_state {
>  	int mm_suspended;
>  };
>  
> +#define MAX_L3_SLICES 2
>  struct intel_l3_parity {
> -	u32 *remap_info;
> +	u32 *remap_info[MAX_L3_SLICES];
>  	struct work_struct error_work;
> +	int which_slice;
>  };
>  
>  struct i915_gem_mm {
> @@ -1686,7 +1688,8 @@ struct drm_i915_file_private {
>  
>  #define HAS_FORCE_WAKE(dev) (INTEL_INFO(dev)->has_force_wake)
>  
> -#define HAS_L3_GPU_CACHE(dev) (IS_IVYBRIDGE(dev) || IS_HASWELL(dev))
> +#define HAS_L3_GPU_CACHE(dev) (INTEL_INFO(dev)->gen >= 7)
> +#define NUM_L3_SLICES(dev) (IS_HSW_GT3(dev) ? 2 : HAS_L3_GPU_CACHE(dev))
>  
>  #define GT_FREQUENCY_MULTIPLIER 50
>  
> @@ -1947,7 +1950,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
> -void i915_gem_l3_remap(struct drm_device *dev);
> +void i915_gem_l3_remap(struct drm_device *dev, int slice);
>  void i915_gem_init_swizzling(struct drm_device *dev);
>  void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
>  int __must_check i915_gpu_idle(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5b510a3..b11f7d6c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4256,16 +4256,21 @@ i915_gem_idle(struct drm_device *dev)
>  	return 0;
>  }
>  
> -void i915_gem_l3_remap(struct drm_device *dev)
> +void i915_gem_l3_remap(struct drm_device *dev, int slice)
>  {
>  	drm_i915_private_t *dev_priv = dev->dev_private;
> +	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
> +	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
>  	u32 misccpctl;
>  	int i;
>  
>  	if (!HAS_L3_GPU_CACHE(dev))
>  		return;
>  
> -	if (!dev_priv->l3_parity.remap_info)
> +	if (NUM_L3_SLICES(dev) < 2 && slice)
> +		return;

This check is redundant as we should never populate
l3_parity.remap_info[1] when there's no second slice.

> +
> +	if (!remap_info)
>  		return;
>  
>  	misccpctl = I915_READ(GEN7_MISCCPCTL);
> @@ -4273,17 +4278,17 @@ void i915_gem_l3_remap(struct drm_device *dev)
>  	POSTING_READ(GEN7_MISCCPCTL);
>  
>  	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
> -		u32 remap = I915_READ(GEN7_L3LOG_BASE + i);
> -		if (remap && remap != dev_priv->l3_parity.remap_info[i/4])
> +		u32 remap = I915_READ(reg_base + i);
> +		if (remap && remap != remap_info[i/4])
>  			DRM_DEBUG("0x%x was already programmed to %x\n",
> -				  GEN7_L3LOG_BASE + i, remap);
> -		if (remap && !dev_priv->l3_parity.remap_info[i/4])
> +				  reg_base + i, remap);
> +		if (remap && !remap_info[i/4])
>  			DRM_DEBUG_DRIVER("Clearing remapped register\n");
> -		I915_WRITE(GEN7_L3LOG_BASE + i, dev_priv->l3_parity.remap_info[i/4]);
> +		I915_WRITE(reg_base + i, remap_info[i/4]);
>  	}
>  
>  	/* Make sure all the writes land before disabling dop clock gating */
> -	POSTING_READ(GEN7_L3LOG_BASE);
> +	POSTING_READ(reg_base);
>  
>  	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
>  }
> @@ -4377,7 +4382,7 @@ int
>  i915_gem_init_hw(struct drm_device *dev)
>  {
>  	drm_i915_private_t *dev_priv = dev->dev_private;
> -	int ret;
> +	int ret, i;
>  
>  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>  		return -EIO;
> @@ -4396,7 +4401,8 @@ i915_gem_init_hw(struct drm_device *dev)
>  		I915_WRITE(GEN7_MSG_CTL, temp);
>  	}
>  
> -	i915_gem_l3_remap(dev);
> +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> +		i915_gem_l3_remap(dev, i);
>  
>  	i915_gem_init_swizzling(dev);
>  
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 13d26cf..62cdf05 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -882,9 +882,10 @@ static void ivybridge_parity_work(struct work_struct *work)
>  	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
>  						    l3_parity.error_work);
>  	u32 error_status, row, bank, subbank;
> -	char *parity_event[5];
> +	char *parity_event[6];
>  	uint32_t misccpctl;
>  	unsigned long flags;
> +	uint8_t slice = 0;
>  
>  	/* We must turn off DOP level clock gating to access the L3 registers.
>  	 * In order to prevent a get/put style interface, acquire struct mutex
> @@ -892,45 +893,63 @@ static void ivybridge_parity_work(struct work_struct *work)
>  	 */
>  	mutex_lock(&dev_priv->dev->struct_mutex);
>  
> +	/* If we've screwed up tracking, just let the interrupt fire again */
> +	if (WARN_ON(!dev_priv->l3_parity.which_slice))
> +		goto out;
> +
>  	misccpctl = I915_READ(GEN7_MISCCPCTL);
>  	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
>  	POSTING_READ(GEN7_MISCCPCTL);
>  
> -	error_status = I915_READ(GEN7_L3CDERRST1);
> -	row = GEN7_PARITY_ERROR_ROW(error_status);
> -	bank = GEN7_PARITY_ERROR_BANK(error_status);
> -	subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
> +	while ((slice = ffs(dev_priv->l3_parity.which_slice)) != 0) {
> +		u32 reg;
>  
> -	I915_WRITE(GEN7_L3CDERRST1, GEN7_PARITY_ERROR_VALID |
> -				    GEN7_L3CDERRST1_ENABLE);
> -	POSTING_READ(GEN7_L3CDERRST1);
> +		if (WARN_ON(slice >= MAX_L3_SLICES))
> +			break;

Could be >= NUM_L3_SLICES(dev) for a bit of extra paranoia. Also we
would fail to clear invalid bits from which_slice in this case, and
thus we'd get the WARN every time the work runs. But I guess this
should never happen in any case so probably not worth worrying about
this too much.
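
For illustration only, a userspace sketch of a drain loop that clears each
bit before checking it, so an out-of-range bit cannot stick around and
re-trigger the warning. toy_ffs() mirrors ffs() semantics (1-based index,
0 when no bit is set), which is why the sketch subtracts one before using the
result as a slice index. drain_slices() is a made-up name, not driver code.

```c
/* Local stand-in with ffs() semantics: 1-based index of the lowest set bit. */
static int toy_ffs(unsigned int v)
{
	int pos = 1;

	if (!v)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Hypothetical model of the work loop: consume every set bit in *pending and
 * return how many valid slices were handled. Bits at or above num_slices are
 * dropped rather than left behind.
 */
static int drain_slices(unsigned int *pending, int num_slices)
{
	int handled = 0;
	int bit;

	while ((bit = toy_ffs(*pending)) != 0) {
		int slice = bit - 1;		/* convert to a 0-based slice */

		*pending &= ~(1u << slice);	/* clear first, valid or not */
		if (slice >= num_slices)
			continue;		/* the kernel would WARN here */
		handled++;
	}
	return handled;
}
```

With num_slices passed in from something like NUM_L3_SLICES(dev), the same
loop also covers the extra-paranoia variant of the bounds check.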

>  
> -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> +		dev_priv->l3_parity.which_slice &= ~(1<<slice);
>  
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	ilk_enable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +		reg = GEN7_L3CDERRST1 + (slice * 0x200);
>  
> -	mutex_unlock(&dev_priv->dev->struct_mutex);
> +		error_status = I915_READ(reg);
> +		row = GEN7_PARITY_ERROR_ROW(error_status);
> +		bank = GEN7_PARITY_ERROR_BANK(error_status);
> +		subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
> +
> +		I915_WRITE(reg, GEN7_PARITY_ERROR_VALID | GEN7_L3CDERRST1_ENABLE);
> +		POSTING_READ(reg);
> +
> +		parity_event[0] = I915_L3_PARITY_UEVENT "=1";
> +		parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
> +		parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
> +		parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> +		parity_event[4] = kasprintf(GFP_KERNEL, "SLICE=%d", slice);
> +		parity_event[5] = NULL;
>  
> -	parity_event[0] = I915_L3_PARITY_UEVENT "=1";
> -	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
> -	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
> -	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> -	parity_event[4] = NULL;
> +		kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
> +				   KOBJ_CHANGE, parity_event);
>  
> -	kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
> -			   KOBJ_CHANGE, parity_event);
> +		DRM_DEBUG("Parity error: Slice = %d, Row = %d, Bank = %d, Sub bank = %d.\n",
> +			  slice, row, bank, subbank);
>  
> -	DRM_DEBUG("Parity error: Row = %d, Bank = %d, Sub bank = %d.\n",
> -		  row, bank, subbank);
> +		kfree(parity_event[4]);
> +		kfree(parity_event[3]);
> +		kfree(parity_event[2]);
> +		kfree(parity_event[1]);
> +	}
> +
> +	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> +
> +out:
> +	WARN_ON(dev_priv->l3_parity.which_slice);

First I figured the irq could rearm this behind our back, but we disable
the irq until the work is done. So yeah, this is fine.

> +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> +	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);

Is it actually safe to enable the second slice irq when there's no second
slice? The docs say it's just "reserved", but there's no mention of whether
it's RO or whether there could be side effects.
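
If it turns out not to be safe, one option is to build the mask from the
slice count instead of using a fixed constant. A rough sketch, assuming a
hypothetical helper parity_irq_mask(); the bit positions are the ones from
the i915_reg.h hunk in this patch, redefined here so the snippet stands
alone.

```c
#include <stdint.h>

/* Bit positions as defined in this patch's i915_reg.h hunk. */
#define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1u << 11)
#define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1u << 5)

/* Hypothetical helper: include the slice 1 bit only when slice 1 exists. */
static uint32_t parity_irq_mask(int num_slices)
{
	uint32_t mask = 0;

	if (num_slices >= 1)
		mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
	if (num_slices >= 2)
		mask |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1;
	return mask;
}
```

Callers would then pass the result of something like NUM_L3_SLICES(dev), so
the reserved bit is never touched on single-slice parts.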

> +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>  
> -	kfree(parity_event[3]);
> -	kfree(parity_event[2]);
> -	kfree(parity_event[1]);
> +	mutex_unlock(&dev_priv->dev->struct_mutex);
>  }
>  
> -static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
> +static void ivybridge_parity_error_irq_handler(struct drm_device *dev, u32 iir)
>  {
>  	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
>  
> @@ -938,9 +957,12 @@ static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
>  		return;
>  
>  	spin_lock(&dev_priv->irq_lock);
> -	ilk_disable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> +	ilk_disable_gt_irq(dev_priv, GT_PARITY_ERROR);
>  	spin_unlock(&dev_priv->irq_lock);
>  
> +	iir &= GT_PARITY_ERROR;
> +	dev_priv->l3_parity.which_slice =
> +		1 << (iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 ? 1 : 0);

What if both slices report an error at the same time?
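
As posted, the expression picks exactly one slice, so slice 0 is dropped
whenever the slice 1 bit is set. A sketch of recording both, assuming a
made-up helper slices_from_iir(); the bit positions are copied from the
i915_reg.h hunk in this patch.

```c
#include <stdint.h>

/* Bit positions as defined in this patch's i915_reg.h hunk. */
#define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1u << 11)
#define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1u << 5)

/* Hypothetical helper: translate IIR parity bits into a per-slice bitmask. */
static uint32_t slices_from_iir(uint32_t iir)
{
	uint32_t which = 0;

	if (iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT)
		which |= 1u << 0;	/* slice 0 */
	if (iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1)
		which |= 1u << 1;	/* slice 1 */
	return which;
}
```

The handler could OR this into which_slice rather than assigning it, so that
neither slice is lost when both bits are set in the same IIR read.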

>  	queue_work(dev_priv->wq, &dev_priv->l3_parity.error_work);
>  }
>  
> @@ -975,8 +997,8 @@ static void snb_gt_irq_handler(struct drm_device *dev,
>  		i915_handle_error(dev, false);
>  	}
>  
> -	if (gt_iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT)
> -		ivybridge_parity_error_irq_handler(dev);
> +	if (gt_iir & GT_PARITY_ERROR)
> +		ivybridge_parity_error_irq_handler(dev, gt_iir);
>  }
>  
>  #define HPD_STORM_DETECT_PERIOD 1000
> @@ -2261,8 +2283,8 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
>  	dev_priv->gt_irq_mask = ~0;
>  	if (HAS_L3_GPU_CACHE(dev)) {
>  		/* L3 parity interrupt is always unmasked. */
> -		dev_priv->gt_irq_mask = ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
> -		gt_irqs |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
> +		dev_priv->gt_irq_mask = ~GT_PARITY_ERROR;
> +		gt_irqs |= GT_PARITY_ERROR;
>  	}
>  
>  	gt_irqs |= GT_RENDER_USER_INTERRUPT;
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index bcee89b..4155a1d 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -927,6 +927,7 @@
>  #define GT_BLT_USER_INTERRUPT			(1 << 22)
>  #define GT_BSD_CS_ERROR_INTERRUPT		(1 << 15)
>  #define GT_BSD_USER_INTERRUPT			(1 << 12)
> +#define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1	(1 << 11) /* hsw+; rsvd on snb, ivb, vlv */
>  #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT	(1 <<  5) /* !snb */
>  #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT	(1 <<  4)
>  #define GT_RENDER_CS_MASTER_ERROR_INTERRUPT	(1 <<  3)
> @@ -937,6 +938,9 @@
>  #define PM_VEBOX_CS_ERROR_INTERRUPT		(1 << 12) /* hsw+ */
>  #define PM_VEBOX_USER_INTERRUPT			(1 << 10) /* hsw+ */
>  
> +#define GT_PARITY_ERROR				(GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 | \
> +						 GT_RENDER_L3_PARITY_ERROR_INTERRUPT)
> +
>  /* These are all the "old" interrupts */
>  #define ILK_BSD_USER_INTERRUPT				(1<<5)
>  #define I915_PIPE_CONTROL_NOTIFY_INTERRUPT		(1<<18)
> @@ -4742,6 +4746,7 @@
>  
>  /* IVYBRIDGE DPF */
>  #define GEN7_L3CDERRST1			0xB008 /* L3CD Error Status 1 */
> +#define HSW_L3CDERRST11			0xB208 /* L3CD Error Status register 1 slice 1 */
>  #define   GEN7_L3CDERRST1_ROW_MASK	(0x7ff<<14)
>  #define   GEN7_PARITY_ERROR_VALID	(1<<13)
>  #define   GEN7_L3CDERRST1_BANK_MASK	(3<<11)
> @@ -4755,6 +4760,7 @@
>  #define   GEN7_L3CDERRST1_ENABLE	(1<<7)
>  
>  #define GEN7_L3LOG_BASE			0xB070
> +#define HSW_L3LOG_BASE_SLICE1		0xB270
>  #define GEN7_L3LOG_SIZE			0x80
>  
>  #define GEN7_HALF_SLICE_CHICKEN1	0xe100 /* IVB GT1 + VLV */
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index 43c2e81..d208f2d 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -119,6 +119,7 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
>  	struct drm_device *drm_dev = dminor->dev;
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
>  	uint32_t misccpctl;
> +	int slice = (int)(uintptr_t)attr->private;
>  	int i, ret;
>  
>  	count = round_down(count, 4);
> @@ -135,11 +136,11 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
>  
>  	if (IS_HASWELL(drm_dev)) {
>  		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
> -		if ((!dev_priv->l3_parity.remap_info))
> +		if ((!dev_priv->l3_parity.remap_info[slice]))
>  			memset(buf + offset, 0, last - offset);
>  		else
>  			memcpy(buf + offset,
> -			       dev_priv->l3_parity.remap_info + (offset/4),
> +			       dev_priv->l3_parity.remap_info[slice] + (offset/4),
>  			       last - offset);
>  
>  		i = last;
> @@ -170,6 +171,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
>  	struct drm_device *drm_dev = dminor->dev;
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
>  	u32 *temp = NULL; /* Just here to make handling failures easy */
> +	int slice = (int)(uintptr_t)attr->private;
>  	int ret;
>  
>  	ret = l3_access_valid(drm_dev, offset);
> @@ -180,7 +182,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
>  	if (ret)
>  		return ret;
>  
> -	if (!dev_priv->l3_parity.remap_info) {
> +	if (!dev_priv->l3_parity.remap_info[slice]) {
>  		temp = kzalloc(GEN7_L3LOG_SIZE, GFP_KERNEL);
>  		if (!temp) {
>  			mutex_unlock(&drm_dev->struct_mutex);
> @@ -200,11 +202,11 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
>  	 * at this point it is left as a TODO.
>  	*/
>  	if (temp)
> -		dev_priv->l3_parity.remap_info = temp;
> +		dev_priv->l3_parity.remap_info[slice] = temp;
>  
> -	memcpy(dev_priv->l3_parity.remap_info + (offset/4), buf, count);
> +	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
>  
> -	i915_gem_l3_remap(drm_dev);
> +	i915_gem_l3_remap(drm_dev, slice);
>  
>  	mutex_unlock(&drm_dev->struct_mutex);
>  
> @@ -216,7 +218,17 @@ static struct bin_attribute dpf_attrs = {
>  	.size = GEN7_L3LOG_SIZE,
>  	.read = i915_l3_read,
>  	.write = i915_l3_write,
> -	.mmap = NULL
> +	.mmap = NULL,
> +	.private = (void *)0
> +};
> +
> +static struct bin_attribute dpf_attrs_1 = {
> +	.attr = {.name = "l3_parity_slice_1", .mode = (S_IRUSR | S_IWUSR)},
> +	.size = GEN7_L3LOG_SIZE,
> +	.read = i915_l3_read,
> +	.write = i915_l3_write,
> +	.mmap = NULL,
> +	.private = (void *)1
>  };
>  
>  static ssize_t gt_cur_freq_mhz_show(struct device *kdev,
> @@ -527,6 +539,13 @@ void i915_setup_sysfs(struct drm_device *dev)
>  		ret = device_create_bin_file(&dev->primary->kdev, &dpf_attrs);
>  		if (ret)
>  			DRM_ERROR("l3 parity sysfs setup failed\n");
> +
> +		if (NUM_L3_SLICES(dev) > 1) {
> +			ret = device_create_bin_file(&dev->primary->kdev,
> +						     &dpf_attrs_1);
> +			if (ret)
> +				DRM_ERROR("l3 parity slice 1 setup failed\n");
> +		}
>  	}
>  
>  	ret = 0;
> @@ -550,6 +569,7 @@ void i915_teardown_sysfs(struct drm_device *dev)
>  		sysfs_remove_files(&dev->primary->kdev.kobj, vlv_attrs);
>  	else
>  		sysfs_remove_files(&dev->primary->kdev.kobj, gen6_attrs);
> +	device_remove_bin_file(&dev->primary->kdev,  &dpf_attrs_1);
>  	device_remove_bin_file(&dev->primary->kdev,  &dpf_attrs);
>  #ifdef CONFIG_PM
>  	sysfs_unmerge_group(&dev->primary->kdev.kobj, &rc6_attr_group);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 686e5b2..3539b45 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -570,7 +570,7 @@ static int init_render_ring(struct intel_ring_buffer *ring)
>  		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
>  
>  	if (HAS_L3_GPU_CACHE(dev))
> -		I915_WRITE_IMR(ring, ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> +		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR);
>  
>  	return ret;
>  }
> @@ -1000,7 +1000,7 @@ gen6_ring_get_irq(struct intel_ring_buffer *ring)
>  		if (HAS_L3_GPU_CACHE(dev) && ring->id == RCS)
>  			I915_WRITE_IMR(ring,
>  				       ~(ring->irq_enable_mask |
> -					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
> +					 GT_PARITY_ERROR));
>  		else
>  			I915_WRITE_IMR(ring, ~ring->irq_enable_mask);
>  		ilk_enable_gt_irq(dev_priv, ring->irq_enable_mask);
> @@ -1021,7 +1021,7 @@ gen6_ring_put_irq(struct intel_ring_buffer *ring)
>  	if (--ring->irq_refcount == 0) {
>  		if (HAS_L3_GPU_CACHE(dev) && ring->id == RCS)
>  			I915_WRITE_IMR(ring,
> -				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> +				       ~GT_PARITY_ERROR);
>  		else
>  			I915_WRITE_IMR(ring, ~0);
>  		ilk_disable_gt_irq(dev_priv, ring->irq_enable_mask);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 55bb572..3a4e97b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -38,10 +38,10 @@
>   *
>   * I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch
>   *	event from the gpu l3 cache. Additional information supplied is ROW,
> - *	BANK, SUBBANK of the affected cacheline. Userspace should keep track of
> - *	these events and if a specific cache-line seems to have a persistent
> - *	error remap it with the l3 remapping tool supplied in intel-gpu-tools.
> - *	The value supplied with the event is always 1.
> + *	BANK, SUBBANK, SLICE of the affected cacheline. Userspace should keep
> + *	track of these events and if a specific cache-line seems to have a
> + *	persistent error remap it with the l3 remapping tool supplied in
> + *	intel-gpu-tools.  The value supplied with the event is always 1.
>   *
>   * I915_ERROR_UEVENT - Generated upon error detection, currently only via
>   *	hangcheck. The error detection event is a good indicator of when things
> -- 
> 1.8.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

* Re: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (15 preceding siblings ...)
  2013-09-13  5:28 ` [PATCH 16/16] intel_l3_parity: Support a daemonic mode Ben Widawsky
@ 2013-09-13  9:44 ` Ville Syrjälä
  2013-09-17  0:52 ` Bell, Bryan J
  17 siblings, 0 replies; 40+ messages in thread
From: Ville Syrjälä @ 2013-09-13  9:44 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, bryan.j.bell, vishnu.venkatesh

For patches 1,2,3,6,7:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

-- 
Ville Syrjälä
Intel OTC

* Re: [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset
  2013-09-13  5:28 ` [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset Ben Widawsky
@ 2013-09-13 12:56   ` Daniel Vetter
  0 siblings, 0 replies; 40+ messages in thread
From: Daniel Vetter @ 2013-09-13 12:56 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Thu, Sep 12, 2013 at 10:28:29PM -0700, Ben Widawsky wrote:
> The buf pointer used during l3_write is just char *, therefore it does
> not require the silly addition of offset.
> 
> v2: Also fix i915_l3_read with a suggested logic from Ville
> 
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Everything up to this patch is merged to dinq; thanks for the patches and review.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 15/16] intel_l3_parity: Support error injection
  2013-09-13  9:12   ` Daniel Vetter
@ 2013-09-13 15:54     ` Ben Widawsky
  2013-09-13 16:14       ` Daniel Vetter
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13 15:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Fri, Sep 13, 2013 at 11:12:11AM +0200, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 10:28:41PM -0700, Ben Widawsky wrote:
> > Haswell added the ability to inject errors which is extremely useful for
> > testing. Add two arguments to the tool to inject, and uninject.
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> 
> Do we run any risk that a concurrent write/read to the same register range
> could hang the machine due to the same-cacheline w/a we need? Just want to
> make sure that when we integrate this into a testcase there's no surprises
> like with intel_gpu_top ...
> -Daniel

The race against the kernel is ever present on all tests/tools. Are we
running parallel igt yet? If so, I can make the read/write functions
threadsafe.

On this note in particular I suppose we can make a debugfs entry like
the forcewake one to allow user space to do register accesses.

Interestingly, this also reminds me of another caveat I meant to put in
the commit message and forgot: the error injection register is also
per context, which makes it a pain to clear (and makes writing the test
case painful). I'm even beginning to think maybe a debugfs entry for this
register is the way to go.

As a side note, the injection feature is entirely debug only - but
agreed, random hangs in the test suite are not good.

[snip]

> 
> > ---
> >  tests/sysfs_l3_parity   |  2 +-
> >  tools/intel_l3_parity.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 68 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tests/sysfs_l3_parity b/tests/sysfs_l3_parity
> > index a0dfad9..e9d4411 100755
> > --- a/tests/sysfs_l3_parity
> > +++ b/tests/sysfs_l3_parity
> > @@ -21,7 +21,7 @@ fi
> >  $SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -e
> >  
> >  #Check that we can clear remaps
> > -if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -c` != "0" ] ; then
> > +if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -l` != 1 ] ; then
> >  	echo "Fail 2"
> >  	exit 1
> >  fi
> > diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
> > index cf15541..cd6754e 100644
> > --- a/tools/intel_l3_parity.c
> > +++ b/tools/intel_l3_parity.c
> > @@ -79,6 +79,20 @@ static int which_slice = -1;
> >  			(__i) < ((which_slice == -1) ? MAX_SLICES : (which_slice + 1)); \
> >  			(__i)++)
> >  
> > +static void decode_dft(uint32_t dft)
> > +{
> > +	if (IS_IVYBRIDGE(devid) || !(dft & 1)) {
> > +		printf("Error injection disabled\n");
> > +		return;
> > +	}
> > +	printf("Error injection enabled\n");
> > +	printf("  Hang = %s\n", (dft >> 28) & 0x1 ? "yes" : "no");
> > +	printf("  Row = %d\n", (dft >> 7) & 0x7ff);
> > +	printf("  Bank = %d\n", (dft >> 2) & 0x3);
> > +	printf("  Subbank = %d\n", (dft >> 4) & 0x7);
> > +	printf("  Slice = %d\n", (dft >> 1) & 0x1);
> > +}
> > +
> >  static void dumpit(int slice)
> >  {
> >  	int i, j;
> > @@ -150,7 +164,9 @@ static void usage(const char *name)
> >  		"  -l, --list				List the current L3 logs\n"
> >  		"  -a, --clear-all			Clear all disabled rows\n"
> >  		"  -e, --enable				Enable row, bank, subbank (undo -d)\n"
> > -		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n",
> > +		"  -d, --disable=<row,bank,subbank>	Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead)\n"
> > +		"  -i, --inject				[HSW only] Cause hardware to inject a row error\n"
> > +		"  -u, --uninject			[HSW only] Turn off hardware error injection (undo -i)\n",
> >  		name);
> >  }
> >  
> > @@ -158,6 +174,7 @@ int main(int argc, char *argv[])
> >  {
> >  	const int device = drm_get_card();
> >  	char *path[REAL_MAX_SLICES];
> > +	uint32_t dft;
> >  	int row = 0, bank = 0, sbank = 0;
> >  	int fd[REAL_MAX_SLICES] = {0}, ret, i;
> >  	int action = '0';
> > @@ -167,6 +184,8 @@ int main(int argc, char *argv[])
> >  	if (intel_gen(devid) < 7)
> >  		exit(EXIT_SUCCESS);
> >  
> > +	assert(intel_register_access_init(intel_get_pci_device(), 0) == 0);
> > +
> >  	ret = asprintf(&path[0], "/sys/class/drm/card%d/l3_parity", device);
> >  	assert(ret != -1);
> >  	ret = asprintf(&path[1], "/sys/class/drm/card%d/l3_parity_slice_1", device);
> > @@ -183,6 +202,7 @@ int main(int argc, char *argv[])
> >  		assert(lseek(fd[i], 0, SEEK_SET) == 0);
> >  	}
> >  
> > +	dft = intel_register_read(0xb038);
> >  
> >  	while (1) {
> >  		int c, option_index = 0;
> > @@ -192,6 +212,8 @@ int main(int argc, char *argv[])
> >  			{ "clear-all", no_argument, 0, 'a' },
> >  			{ "enable", no_argument, 0, 'e' },
> >  			{ "disable", optional_argument, 0, 'd' },
> > +			{ "inject", no_argument, 0, 'i' },
> > +			{ "uninject", no_argument, 0, 'u' },
> >  			{ "hw-info", no_argument, 0, 'H' },
> >  			{ "row", required_argument, 0, 'r' },
> >  			{ "bank", required_argument, 0, 'b' },
> > @@ -200,7 +222,7 @@ int main(int argc, char *argv[])
> >  			{0, 0, 0, 0}
> >  		};
> >  
> > -		c = getopt_long(argc, argv, "hHr:b:s:w:aled::", long_options,
> > +		c = getopt_long(argc, argv, "hHr:b:s:w:aled::iu", long_options,
> >  				&option_index);
> >  		if (c == -1)
> >  			break;
> > @@ -215,6 +237,7 @@ int main(int argc, char *argv[])
> >  				printf("Number of banks: %d\n", num_banks());
> >  				printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
> >  				printf("Max L3 size: %dK\n", L3_SIZE >> 10);
> > +				printf("Has error injection: %s\n", IS_HASWELL(devid) ? "yes" : "no");
> >  				exit(EXIT_SUCCESS);
> >  			case 'r':
> >  				row = atoi(optarg);
> > @@ -236,6 +259,12 @@ int main(int argc, char *argv[])
> >  				if (which_slice >= MAX_SLICES)
> >  					exit(EXIT_FAILURE);
> >  				break;
> > +			case 'i':
> > +			case 'u':
> > +				if (!IS_HASWELL(devid)) {
> > +					fprintf(stderr, "Error injection supported on HSW+ only\n");
> > +					exit(EXIT_FAILURE);
> > +				}
> >  			case 'd':
> >  				if (optarg) {
> >  					ret = sscanf(optarg, "%d,%d,%d", &row, &bank, &sbank);
> > @@ -256,6 +285,23 @@ int main(int argc, char *argv[])
> >  		}
> >  	}
> >  
> > +	if (action == 'i') {
> > +		if (((dft >> 1) & 1) != which_slice) {
> > +			fprintf(stderr, "DFT register already has slice %d enabled, and we don't support multiple slices. Try modifying -w; but sometimes the register sticks in the wrong way\n", (dft >> 1) & 1);
> > +			exit(EXIT_FAILURE);
> > +		}
> > +
> > +		if (which_slice == -1) {
> > +			fprintf(stderr, "Cannot inject errors to multiple slices (modify -w)\n");
> > +			exit(EXIT_FAILURE);
> > +		}
> > +		if (dft & 1 && ((dft >> 1) & 1) == which_slice)
> > +			printf("warning: overwriting existing injections. This is very dangerous.\n");
> > +	}
> > +
> > +	if (action == 'l')
> > +		decode_dft(dft);
> > +
> >  	/* Per slice operations */
> >  	for_each_slice(i) {
> >  		switch (action) {
> > @@ -271,11 +317,30 @@ int main(int argc, char *argv[])
> >  			case 'd':
> >  				assert(disable_rbs(row, bank, sbank, i) == 0);
> >  				break;
> > +			case 'i':
> > +				if (bank == 3) {
> > +					fprintf(stderr, "The hardware does not support error inject on bank 3.\n");
> > +					exit(EXIT_FAILURE);
> > +				}
> > +				dft |= row << 7;
> > +				dft |= sbank << 4;
> > +				dft |= bank << 2;
> > +				assert(i < 2);
> > +				dft |= i << 1; /* slice */
> > +				dft |= 1 << 0; /* enable */
> > +				intel_register_write(0xb038, dft);
> > +				break;
> > +			case 'u':
> > +				intel_register_write(0xb038, dft & ~(1<<0));
> > +				break;
> > +			case 'L':
> > +				break;
> >  			default:
> >  				abort();
> >  		}
> >  	}
> >  
> > +	intel_register_access_fini();
> >  	if (action == 'l')
> >  		exit(EXIT_SUCCESS);
> >  
> > -- 
> > 1.8.4
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Ben Widawsky, Intel Open Source Technology Center

* Re: [PATCH 15/16] intel_l3_parity: Support error injection
  2013-09-13 15:54     ` Ben Widawsky
@ 2013-09-13 16:14       ` Daniel Vetter
  2013-09-13 16:29         ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Vetter @ 2013-09-13 16:14 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Fri, Sep 13, 2013 at 5:54 PM, Ben Widawsky <ben@bwidawsk.net> wrote:
> On Fri, Sep 13, 2013 at 11:12:11AM +0200, Daniel Vetter wrote:
>> On Thu, Sep 12, 2013 at 10:28:41PM -0700, Ben Widawsky wrote:
>> > Haswell added the ability to inject errors which is extremely useful for
>> > testing. Add two arguments to the tool to inject, and uninject.
>> >
>> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>>
>> Do we run any risk that a concurrent write/read to the same register range
>> could hang the machine due to the same-cacheline w/a we need? Just want to
>> make sure that when we integrate this into a testcase there's no surprises
>> like with intel_gpu_top ...
>> -Daniel
>
> The race against the kernel is ever present on all tests/tools. Are we
> running parallel igt yet? If so, I can make the read/write functions
> threadsafe.
>
> On this note in particular I suppose we can make a debugfs entry like
> the forcewake one to allow user space to do register accesses.
>
> Interestingly, this also reminds me of another caveat I meant to put in
> the commit message and forgot: the error injection register is also
> per context, which makes it a pain to clear (and makes writing the test
> case painful). I'm even beginning to think maybe a debugfs entry for this
> register is the way to go.
>
> As a side note, the injection feature is entirely debug only - but
> agreed, random hangs in the test suite are not good.

Hm, this will be tricky. If nothing else writes this range (i.e. not
our interrupt handler) we could use a secure batchbuffer and emit the
MI_LRI from the userspace batch. Then we could submit some workload
using hw contexts that uses the l3$ cache (I guess without something
in there it won't notice the injected error) and after the error is
detected we could simply kill the context, restoring the original
state again.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 6/8] drm/i915: Make l3 remapping use the ring
  2013-09-13  5:28 ` [PATCH 6/8] drm/i915: Make l3 remapping use the ring Ben Widawsky
@ 2013-09-13 16:16   ` Daniel Vetter
  0 siblings, 0 replies; 40+ messages in thread
From: Daniel Vetter @ 2013-09-13 16:16 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Thu, Sep 12, 2013 at 10:28:32PM -0700, Ben Widawsky wrote:
> Using LRI for setting the remapping registers allows us to stream l3
> remapping information. This is necessary to handle per context remaps as
> we'll see implemented in an upcoming patch.
> 
> Using the ring also means we don't need to frob the DOP clock gating
> bits.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h   |  2 +-
>  drivers/gpu/drm/i915/i915_gem.c   | 39 +++++++++++++++++----------------------
>  drivers/gpu/drm/i915/i915_sysfs.c |  3 ++-
>  3 files changed, 20 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index eb90461..493a9cd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1950,7 +1950,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
> -void i915_gem_l3_remap(struct drm_device *dev, int slice);
> +int i915_gem_l3_remap(struct intel_ring_buffer *ring, int slice);
>  void i915_gem_init_swizzling(struct drm_device *dev);
>  void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
>  int __must_check i915_gpu_idle(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b11f7d6c..fa01c69 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4256,41 +4256,36 @@ i915_gem_idle(struct drm_device *dev)
>  	return 0;
>  }
>  
> -void i915_gem_l3_remap(struct drm_device *dev, int slice)
> +int i915_gem_l3_remap(struct intel_ring_buffer *ring, int slice)
>  {
> +	struct drm_device *dev = ring->dev;
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
>  	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
> -	u32 misccpctl;
> -	int i;
> +	int i, ret;
>  
>  	if (!HAS_L3_GPU_CACHE(dev))
> -		return;
> +		return 0;
>  
>  	if (NUM_L3_SLICES(dev) < 2 && slice)
> -		return;
> +		return 0;
>  
>  	if (!remap_info)
> -		return;
> +		return 0;
>  
> -	misccpctl = I915_READ(GEN7_MISCCPCTL);
> -	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
> -	POSTING_READ(GEN7_MISCCPCTL);
> +	ret = intel_ring_begin(ring, GEN7_L3LOG_SIZE / 4 * 3);
> +	if (ret)
> +		return ret;
>  
>  	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
> -		u32 remap = I915_READ(reg_base + i);
> -		if (remap && remap != remap_info[i/4])
> -			DRM_DEBUG("0x%x was already programmed to %x\n",
> -				  reg_base + i, remap);
> -		if (remap && !remap_info[i/4])
> -			DRM_DEBUG_DRIVER("Clearing remapped register\n");
> -		I915_WRITE(reg_base + i, remap_info[i/4]);
> +		intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
> +		intel_ring_emit(ring, reg_base + i);
> +		intel_ring_emit(ring, remap_info[i/4]);

I think a comment here would be good, explaining that on Haswell we never
read back this register range, so the writes should be safe against
concurrent register access. Or is this not a concern here?
-Daniel

>  	}
>  
> -	/* Make sure all the writes land before disabling dop clock gating */
> -	POSTING_READ(reg_base);
> +	intel_ring_advance(ring);
>  
> -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> +	return ret;
>  }
>  
>  void i915_gem_init_swizzling(struct drm_device *dev)
> @@ -4401,15 +4396,15 @@ i915_gem_init_hw(struct drm_device *dev)
>  		I915_WRITE(GEN7_MSG_CTL, temp);
>  	}
>  
> -	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> -		i915_gem_l3_remap(dev, i);
> -
>  	i915_gem_init_swizzling(dev);
>  
>  	ret = i915_gem_init_rings(dev);
>  	if (ret)
>  		return ret;
>  
> +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> +		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> +
>  	/*
>  	 * XXX: There was some w/a described somewhere suggesting loading
>  	 * contexts before PPGTT.
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index d208f2d..65a7274 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -206,7 +206,8 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
>  
>  	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
>  
> -	i915_gem_l3_remap(drm_dev, slice);
> +	if (i915_gem_l3_remap(&dev_priv->ring[RCS], slice))
> +		count = 0;
>  
>  	mutex_unlock(&drm_dev->struct_mutex);
>  
> -- 
> 1.8.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH 15/16] intel_l3_parity: Support error injection
  2013-09-13 16:14       ` Daniel Vetter
@ 2013-09-13 16:29         ` Ben Widawsky
  0 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-13 16:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Ben Widawsky, bryan.j.bell, vishnu.venkatesh

On Fri, Sep 13, 2013 at 06:14:38PM +0200, Daniel Vetter wrote:
> On Fri, Sep 13, 2013 at 5:54 PM, Ben Widawsky <ben@bwidawsk.net> wrote:
> > On Fri, Sep 13, 2013 at 11:12:11AM +0200, Daniel Vetter wrote:
> >> On Thu, Sep 12, 2013 at 10:28:41PM -0700, Ben Widawsky wrote:
> >> > Haswell added the ability to inject errors which is extremely useful for
> >> > testing. Add two arguments to the tool to inject, and uninject.
> >> >
> >> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> >>
> >> Do we run any risk that a concurrent write/read to the same register range
> >> could hang the machine due to the same-cacheline w/a we need? Just want to
> >> make sure that when we integrate this into a testcase there's no surprises
> >> like with intel_gpu_top ...
> >> -Daniel
> >
> > The race against the kernel is ever present on all tests/tools. Are we
> > running parallel igt yet? If so, I can make the read/write functions
> > threadsafe.
> >
> > On this note in particular I suppose we can make a debugfs entry like
> > the forcewake one to allow user space to do register accesses.
> >
> > Interestingly, this also reminds me of another caveat I meant to put in
> > the commit message and forgot: the error injection register is also
> > per context, which makes it a pain to clear (and makes writing the test
> > case painful). I'm even beginning to think maybe a debugfs entry for this
> > register is the way to go.
> >
> > As a side note, the injection feature is entirely debug only - but
> > agreed, random hangs in the test suite are not good.
> 
> Hm, this will be tricky. If nothing else writes this range (i.e. not
> our interrupt handler) we could use a secure batchbuffer and emit the
> MI_LRI from the userspace batch. Then we could submit some workload
> using hw contexts that uses the l3$ cache (I guess without something
> in there it won't notice the injected error) and after the error is
> detected we could simply kill the context, restoring the original
> state again.
> -Daniel

Actually, I don't think anything else in the cacheline of the error
injection register is accessed after driver load.

-- 
Ben Widawsky, Intel Open Source Technology Center

* Re: [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support
  2013-09-13  5:28 ` [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support Ben Widawsky
@ 2013-09-16 18:18   ` Bell, Bryan J
  2013-09-17 23:52     ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Bell, Bryan J @ 2013-09-16 18:18 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Widawsky, Benjamin, Venkatesh, Vishnu

L3 dynamic parity is not supported on VLV. Please add the check for VLV. 

I can send you the email thread, if needed. 

--Thanks
Bryan

-----Original Message-----
From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
Sent: Thursday, September 12, 2013 10:29 PM
To: intel-gfx@lists.freedesktop.org
Cc: Venkatesh, Vishnu; Bell, Bryan J; Widawsky, Benjamin; Ben Widawsky
Subject: [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 tools/intel_l3_parity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
index 970dcd6..a3d268b 100644
--- a/tools/intel_l3_parity.c
+++ b/tools/intel_l3_parity.c
@@ -120,7 +120,7 @@ int main(int argc, char *argv[])
 	assert(ret != -1);
 
 	fd = open(path, O_RDWR);
-	if (fd == -1 && IS_IVYBRIDGE(devid)) {
+	if (fd == -1 && intel_gen(devid) > 6) {
 		perror("Opening sysfs");
 		exit(EXIT_FAILURE);
 	} else if (fd == -1)
--
1.8.4

* Re: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
  2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
                   ` (16 preceding siblings ...)
  2013-09-13  9:44 ` [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ville Syrjälä
@ 2013-09-17  0:52 ` Bell, Bryan J
  2013-09-17  4:15   ` Ben Widawsky
  17 siblings, 1 reply; 40+ messages in thread
From: Bell, Bryan J @ 2013-09-17  0:52 UTC (permalink / raw)
  To: intel-gfx; +Cc: Widawsky, Benjamin, Venkatesh, Vishnu

The "hang" injection is for scenarios like:
(1) L3 error occurs
(2) Workload completion, reported to user mode driver, e.g. OpenCL
(3) L3 error interrupt, handled.

If (2) occurs before (3), it's possible to report that a GPGPU workload successfully completed when in fact it did not due to the L3 error. 

Whether the "hang" bit is set should be up to user mode.

--Thanks
Bryan
-----Original Message-----
From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
Sent: Thursday, September 12, 2013 10:28 PM
To: intel-gfx@lists.freedesktop.org
Cc: Venkatesh, Vishnu; Bell, Bryan J; Widawsky, Benjamin
Subject: [PATCH 0/8] DPF (GPU l3 parity detection) improvements

Since IVB, our driver has supported GPU L3 cacheline remapping for parity errors. This is known as, "DPF" for Dynamic Parity Feature. I am told such an error is a good predictor for a subsequent error in the same part of the cache.  To address this possible issue for workloads requiring precise and correct data, like GPGPU workloads the HW has extra space in the cache which can be dynamically remapped to fill in the old, faulting parts of the cache. I should also note, to my knowledge, no such error has actually been seen on either Ivybridge or Haswell in the wild.

Note, and reminder: GPU L3 is not the same thing as "L3." It is a special (usually incoherent) cache that is only used by certain components within the GPU.

Included in the patches:
1. Fix HSW test cases previously submitted and bikeshedded by Ville.
2. Support for an extra area of L3 added in certain HSW SKUs
3. Error injection support from the user space for test.
4. A reference daemon for listening to the parity error events.

Caveats:
* I've not implemented the "hang" injection. I was not clear what it does, and
  I don't really see how it benefits testing the software I have written.

* I am currently missing a test which uses the error injection.
  Volunteers who want to help, please raise your hand. If not, I'll get
  to it as soon as possible.

* We do have a race with the udev mechanism of error delivery. If I
  understand the way udev works, if we have more than 1 event before the
  daemon is woken, the properties will get us the failing cache location
  of the last error only. I think this is okay because of the earlier statement
  that a parity error is a good indicator of a future parity error. One thing
  which I've not done is trying to track when there are missed errors which
  should be possible even if the info about the location of the error can't be
  retrieved.

* There is no way to read out the per context remapping information through
  sysfs. I only expose whether or not a context has outstanding remaps through
  debugfs. This does effect the testability a bit, but the implementation is
  simple enough that I'm not terrible worried.

Ben Widawsky (8):
  drm/i915: Remove extra "ring"
  drm/i915: Round l3 parity reads down
  drm/i915: Fix l3 parity user buffer offset
  drm/i915: Fix HSW parity test
  drm/i915: Add second slice l3 remapping
  drm/i915: Make l3 remapping use the ring
  drm/i915: Keep a list of all contexts
  drm/i915: Do remaps for all contexts

 drivers/gpu/drm/i915/i915_debugfs.c     | 23 ++++++---
 drivers/gpu/drm/i915/i915_drv.h         | 13 +++--
 drivers/gpu/drm/i915/i915_gem.c         | 46 +++++++++---------
 drivers/gpu/drm/i915/i915_gem_context.c | 20 +++++++-
 drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_reg.h         |  6 +++
 drivers/gpu/drm/i915/i915_sysfs.c       | 57 +++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
 include/uapi/drm/i915_drm.h             |  8 ++--
 9 files changed, 175 insertions(+), 88 deletions(-)

--
1.8.4

* Re: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
  2013-09-17  0:52 ` Bell, Bryan J
@ 2013-09-17  4:15   ` Ben Widawsky
  2013-09-17  7:27     ` Daniel Vetter
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-17  4:15 UTC (permalink / raw)
  To: Bell, Bryan J; +Cc: intel-gfx, Venkatesh, Vishnu

I see. I had thought the hang bit was part of the test injection, when
it's actually modifying the behavior of L3 errors. Any opinions on
what the default should be (agreed that policy should be controlled by
user space, but we can control the default)? What does a "hang" mean
exactly: is the rest of memory still responsive? Is L3?

On Mon, Sep 16, 2013 at 5:52 PM, Bell, Bryan J <bryan.j.bell@intel.com> wrote:
> The "hang" injection is for scenarios like:
> (1) L3 error occurs
> (2) Workload completion, reported to user mode driver, e.g. OpenCL
> (3) L3 error interrupt, handled.
>
> If (2) occurs before (3), it's possible to report that a GPGPU workload successfully completed when in fact it did not due to the L3 error.
>
> Whether the "hang" bit is set should be up to user mode.
>
> --Thanks
> Bryan
> -----Original Message-----
> From: Ben Widawsky [mailto:benjamin.widawsky@intel.com]
> Sent: Thursday, September 12, 2013 10:28 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Venkatesh, Vishnu; Bell, Bryan J; Widawsky, Benjamin
> Subject: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
>
> Since IVB, our driver has supported GPU L3 cacheline remapping for parity errors. This is known as, "DPF" for Dynamic Parity Feature. I am told such an error is a good predictor for a subsequent error in the same part of the cache.  To address this possible issue for workloads requiring precise and correct data, like GPGPU workloads the HW has extra space in the cache which can be dynamically remapped to fill in the old, faulting parts of the cache. I should also note, to my knowledge, no such error has actually been seen on either Ivybridge or Haswell in the wild.
>
> Note, and reminder: GPU L3 is not the same thing as "L3." It is a special (usually incoherent) cache that is only used by certain components within the GPU.
>
> Included in the patches:
> 1. Fix HSW test cases previously submitted and bikeshedded by Ville.
> 2. Support for an extra area of L3 added in certain HSW SKUs
> 3. Error injection support from the user space for test.
> 4. A reference daemon for listening to the parity error events.
>
> Caveats:
> * I've not implemented the "hang" injection. I was not clear what it does, and
>   I don't really see how it benefits testing the software I have written.
>
> * I am currently missing a test which uses the error injection.
>   Volunteers who want to help, please raise your hand. If not, I'll get
>   to it as soon as possible.
>
> * We do have a race with the udev mechanism of error delivery. If I
>   understand the way udev works, if we have more than 1 event before the
>   daemon is woken, the properties will get us the failing cache location
>   of the last error only. I think this is okay because of the earlier statement
>   that a parity error is a good indicator of a future parity error. One thing
>   which I've not done is trying to track when there are missed errors which
>   should be possible even if the info about the location of the error can't be
>   retrieved.
>
> * There is no way to read out the per context remapping information through
>   sysfs. I only expose whether or not a context has outstanding remaps through
>   debugfs. This does affect the testability a bit, but the implementation is
>   simple enough that I'm not terribly worried.
>
> Ben Widawsky (8):
>   drm/i915: Remove extra "ring"
>   drm/i915: Round l3 parity reads down
>   drm/i915: Fix l3 parity user buffer offset
>   drm/i915: Fix HSW parity test
>   drm/i915: Add second slice l3 remapping
>   drm/i915: Make l3 remapping use the ring
>   drm/i915: Keep a list of all contexts
>   drm/i915: Do remaps for all contexts
>
>  drivers/gpu/drm/i915/i915_debugfs.c     | 23 ++++++---
>  drivers/gpu/drm/i915/i915_drv.h         | 13 +++--
>  drivers/gpu/drm/i915/i915_gem.c         | 46 +++++++++---------
>  drivers/gpu/drm/i915/i915_gem_context.c | 20 +++++++-
>  drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_reg.h         |  6 +++
>  drivers/gpu/drm/i915/i915_sysfs.c       | 57 +++++++++++++++-------
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
>  include/uapi/drm/i915_drm.h             |  8 ++--
>  9 files changed, 175 insertions(+), 88 deletions(-)
>
> --
> 1.8.4
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
  2013-09-17  4:15   ` Ben Widawsky
@ 2013-09-17  7:27     ` Daniel Vetter
  2013-09-17 18:23       ` Bell, Bryan J
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Vetter @ 2013-09-17  7:27 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Bell, Bryan J, intel-gfx, Venkatesh, Vishnu

On Tue, Sep 17, 2013 at 6:15 AM, Ben Widawsky
<benjamin.widawsky@intel.com> wrote:
> I see. I had thought the hang bit was part of the test injection, when
> it's actually modifying the behavior of L3 errors. Any opinions on
> what the default should be (agreed that policy should be controlled by
> user space, but we can control the default)? What does a "hang" mean
> exactly, is the rest of memory still responsive, L3?

I guess we want a hang-on-L3-error bit at context creation. But that
seems orthogonal to fixing l3 error reporting on hsw and wiring up the
test facilities, so I'd wait until someone screams for this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
  2013-09-17  7:27     ` Daniel Vetter
@ 2013-09-17 18:23       ` Bell, Bryan J
  0 siblings, 0 replies; 40+ messages in thread
From: Bell, Bryan J @ 2013-09-17 18:23 UTC (permalink / raw)
  To: Daniel Vetter, Widawsky, Benjamin; +Cc: intel-gfx, Venkatesh, Vishnu

On Windows, with the hang bit set, we do the cache line replacement after an error occurs, but nothing else [so the L3 log registers are definitely responsive; I don't know whether other memory is as well].


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/8] drm/i915: Add second slice l3 remapping
  2013-09-13  9:38   ` Ville Syrjälä
@ 2013-09-17 18:45     ` Ben Widawsky
  2013-09-17 18:51       ` Bell, Bryan J
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-17 18:45 UTC (permalink / raw)
  To: Ville Syrjälä, bryan.j.bell
  Cc: intel-gfx, vishnu.venkatesh, Ben Widawsky

On Fri, Sep 13, 2013 at 12:38:01PM +0300, Ville Syrjälä wrote:
> On Thu, Sep 12, 2013 at 10:28:31PM -0700, Ben Widawsky wrote:
> > Certain HSW SKUs have a second bank of L3. This L3 remapping has a
> > separate register set, and interrupt from the first "slice". A slice is
> > simply a term to define some subset of the GPU's l3 cache. This patch
> > implements both the interrupt handler, and ability to communicate with
> > userspace about this second slice.
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  9 ++--
> >  drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++----
> >  drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
> >  drivers/gpu/drm/i915/i915_reg.h         |  6 +++
> >  drivers/gpu/drm/i915/i915_sysfs.c       | 34 ++++++++++---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
> >  include/uapi/drm/i915_drm.h             |  8 ++--
> >  7 files changed, 115 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 81ba5bb..eb90461 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -918,9 +918,11 @@ struct i915_ums_state {
> >  	int mm_suspended;
> >  };
> >  
> > +#define MAX_L3_SLICES 2
> >  struct intel_l3_parity {
> > -	u32 *remap_info;
> > +	u32 *remap_info[MAX_L3_SLICES];
> >  	struct work_struct error_work;
> > +	int which_slice;
> >  };
> >  
> >  struct i915_gem_mm {
> > @@ -1686,7 +1688,8 @@ struct drm_i915_file_private {
> >  
> >  #define HAS_FORCE_WAKE(dev) (INTEL_INFO(dev)->has_force_wake)
> >  
> > -#define HAS_L3_GPU_CACHE(dev) (IS_IVYBRIDGE(dev) || IS_HASWELL(dev))
> > +#define HAS_L3_GPU_CACHE(dev) (INTEL_INFO(dev)->gen >= 7)
> > +#define NUM_L3_SLICES(dev) (IS_HSW_GT3(dev) ? 2 : HAS_L3_GPU_CACHE(dev))
> >  
> >  #define GT_FREQUENCY_MULTIPLIER 50
> >  
> > @@ -1947,7 +1950,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
> >  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
> >  int __must_check i915_gem_init(struct drm_device *dev);
> >  int __must_check i915_gem_init_hw(struct drm_device *dev);
> > -void i915_gem_l3_remap(struct drm_device *dev);
> > +void i915_gem_l3_remap(struct drm_device *dev, int slice);
> >  void i915_gem_init_swizzling(struct drm_device *dev);
> >  void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
> >  int __must_check i915_gpu_idle(struct drm_device *dev);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 5b510a3..b11f7d6c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4256,16 +4256,21 @@ i915_gem_idle(struct drm_device *dev)
> >  	return 0;
> >  }
> >  
> > -void i915_gem_l3_remap(struct drm_device *dev)
> > +void i915_gem_l3_remap(struct drm_device *dev, int slice)
> >  {
> >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > +	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
> > +	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
> >  	u32 misccpctl;
> >  	int i;
> >  
> >  	if (!HAS_L3_GPU_CACHE(dev))
> >  		return;
> >  
> > -	if (!dev_priv->l3_parity.remap_info)
> > +	if (NUM_L3_SLICES(dev) < 2 && slice)
> > +		return;
> 
> This check is redundant as we should never populate
> l3_parity.remap_info[1] when there's no second slice.
> 

Got it. Smashed the early exit check together while at it.
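
As a rough illustration of what a merged early exit could look like, here
is a stand-alone sketch. It is only one plausible reading of the change
described above, not the driver code: struct fake_dev, has_l3_gpu_cache
and l3_remap_would_run() are made-up names for this example, and only
MAX_L3_SLICES and the per-slice remap_info[] array are taken from the
patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_L3_SLICES 2

/* Hypothetical stand-in for the parts of dev_priv used by the check. */
struct fake_dev {
	bool has_l3_gpu_cache;
	uint32_t *remap_info[MAX_L3_SLICES];
};

/*
 * Returns true if a remap for @slice would be programmed.
 * A slice with no remap table (including a second slice that does not
 * exist, whose table is never populated) takes the same exit as a
 * device without GPU L3, so no separate slice-count test is needed.
 */
static bool l3_remap_would_run(const struct fake_dev *dev, int slice)
{
	if (!dev->has_l3_gpu_cache || !dev->remap_info[slice])
		return false;

	return true;
}
```

The caller is still expected to pass a slice index below MAX_L3_SLICES;
the sketch does not bounds-check it.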

> > +
> > +	if (!remap_info)
> >  		return;
> >  
> >  	misccpctl = I915_READ(GEN7_MISCCPCTL);
> > @@ -4273,17 +4278,17 @@ void i915_gem_l3_remap(struct drm_device *dev)
> >  	POSTING_READ(GEN7_MISCCPCTL);
> >  
> >  	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
> > -		u32 remap = I915_READ(GEN7_L3LOG_BASE + i);
> > -		if (remap && remap != dev_priv->l3_parity.remap_info[i/4])
> > +		u32 remap = I915_READ(reg_base + i);
> > +		if (remap && remap != remap_info[i/4])
> >  			DRM_DEBUG("0x%x was already programmed to %x\n",
> > -				  GEN7_L3LOG_BASE + i, remap);
> > -		if (remap && !dev_priv->l3_parity.remap_info[i/4])
> > +				  reg_base + i, remap);
> > +		if (remap && !remap_info[i/4])
> >  			DRM_DEBUG_DRIVER("Clearing remapped register\n");
> > -		I915_WRITE(GEN7_L3LOG_BASE + i, dev_priv->l3_parity.remap_info[i/4]);
> > +		I915_WRITE(reg_base + i, remap_info[i/4]);
> >  	}
> >  
> >  	/* Make sure all the writes land before disabling dop clock gating */
> > -	POSTING_READ(GEN7_L3LOG_BASE);
> > +	POSTING_READ(reg_base);
> >  
> >  	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> >  }
> > @@ -4377,7 +4382,7 @@ int
> >  i915_gem_init_hw(struct drm_device *dev)
> >  {
> >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > -	int ret;
> > +	int ret, i;
> >  
> >  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
> >  		return -EIO;
> > @@ -4396,7 +4401,8 @@ i915_gem_init_hw(struct drm_device *dev)
> >  		I915_WRITE(GEN7_MSG_CTL, temp);
> >  	}
> >  
> > -	i915_gem_l3_remap(dev);
> > +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > +		i915_gem_l3_remap(dev, i);
> >  
> >  	i915_gem_init_swizzling(dev);
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 13d26cf..62cdf05 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -882,9 +882,10 @@ static void ivybridge_parity_work(struct work_struct *work)
> >  	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
> >  						    l3_parity.error_work);
> >  	u32 error_status, row, bank, subbank;
> > -	char *parity_event[5];
> > +	char *parity_event[6];
> >  	uint32_t misccpctl;
> >  	unsigned long flags;
> > +	uint8_t slice = 0;
> >  
> >  	/* We must turn off DOP level clock gating to access the L3 registers.
> >  	 * In order to prevent a get/put style interface, acquire struct mutex
> > @@ -892,45 +893,63 @@ static void ivybridge_parity_work(struct work_struct *work)
> >  	 */
> >  	mutex_lock(&dev_priv->dev->struct_mutex);
> >  
> > +	/* If we've screwed up tracking, just let the interrupt fire again */
> > +	if (WARN_ON(!dev_priv->l3_parity.which_slice))
> > +		goto out;
> > +
> >  	misccpctl = I915_READ(GEN7_MISCCPCTL);
> >  	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
> >  	POSTING_READ(GEN7_MISCCPCTL);
> >  
> > -	error_status = I915_READ(GEN7_L3CDERRST1);
> > -	row = GEN7_PARITY_ERROR_ROW(error_status);
> > -	bank = GEN7_PARITY_ERROR_BANK(error_status);
> > -	subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
> > +	while ((slice = ffs(dev_priv->l3_parity.which_slice)) != 0) {
> > +		u32 reg;
> >  
> > -	I915_WRITE(GEN7_L3CDERRST1, GEN7_PARITY_ERROR_VALID |
> > -				    GEN7_L3CDERRST1_ENABLE);
> > -	POSTING_READ(GEN7_L3CDERRST1);
> > +		if (WARN_ON(slice >= MAX_L3_SLICES))
> > +			break;
> 
> Could be >= NUM_L3_SLICES(dev) for a bit of extra paranoia. Also we
> would fail to clear invalid bits from which_slice in this case, and
> thus we'd get the WARN every time the work runs. But I guess this
> should never happen in any case so probably not worth worrying about
> this too much.

Not worth worrying, but I didn't mean to be so noisy. I've fixed this
with WARN_ON_ONCE.
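
For readers unfamiliar with the warn-once idiom, the sketch below is a
user-space approximation of the behaviour: the condition is evaluated on
every call, but the message is printed only the first time it is true.
It is not the kernel macro; SKETCH_WARN_ON_ONCE, warn_count and
slice_out_of_range() are names invented for this example, and the macro
relies on GNU C statement expressions (gcc and clang).

```c
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* number of messages printed, for the sketch only */

/*
 * Evaluates to the truth value of @cond every time, but prints at most
 * once per call site.
 */
#define SKETCH_WARN_ON_ONCE(cond) ({				\
	static bool warned_;					\
	bool hit_ = !!(cond);					\
	if (hit_ && !warned_) {					\
		warned_ = true;					\
		warn_count++;					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	}							\
	hit_;							\
})

/* Returns true if @slice is not a valid index for @num_slices slices. */
static bool slice_out_of_range(unsigned int slice, unsigned int num_slices)
{
	return SKETCH_WARN_ON_ONCE(slice >= num_slices);
}
```

Calling slice_out_of_range(2, 2) repeatedly returns true each time but
writes a single line to stderr, which is the "less noisy" property
wanted here.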

> 
> >  
> > -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > +		dev_priv->l3_parity.which_slice &= ~(1<<slice);
> >  
> > -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> > -	ilk_enable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> > -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> > +		reg = GEN7_L3CDERRST1 + (slice * 0x200);
> >  
> > -	mutex_unlock(&dev_priv->dev->struct_mutex);
> > +		error_status = I915_READ(reg);
> > +		row = GEN7_PARITY_ERROR_ROW(error_status);
> > +		bank = GEN7_PARITY_ERROR_BANK(error_status);
> > +		subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
> > +
> > +		I915_WRITE(reg, GEN7_PARITY_ERROR_VALID | GEN7_L3CDERRST1_ENABLE);
> > +		POSTING_READ(reg);
> > +
> > +		parity_event[0] = I915_L3_PARITY_UEVENT "=1";
> > +		parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
> > +		parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
> > +		parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> > +		parity_event[4] = kasprintf(GFP_KERNEL, "SLICE=%d", slice);
> > +		parity_event[5] = NULL;
> >  
> > -	parity_event[0] = I915_L3_PARITY_UEVENT "=1";
> > -	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
> > -	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
> > -	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> > -	parity_event[4] = NULL;
> > +		kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
> > +				   KOBJ_CHANGE, parity_event);
> >  
> > -	kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
> > -			   KOBJ_CHANGE, parity_event);
> > +		DRM_DEBUG("Parity error: Slice = %d, Row = %d, Bank = %d, Sub bank = %d.\n",
> > +			  slice, row, bank, subbank);
> >  
> > -	DRM_DEBUG("Parity error: Row = %d, Bank = %d, Sub bank = %d.\n",
> > -		  row, bank, subbank);
> > +		kfree(parity_event[4]);
> > +		kfree(parity_event[3]);
> > +		kfree(parity_event[2]);
> > +		kfree(parity_event[1]);
> > +	}
> > +
> > +	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > +
> > +out:
> > +	WARN_ON(dev_priv->l3_parity.which_slice);
> 
> First I figured the irq could rearm this behind our back, but we disable
> the irq until the work is done. So yeah, this is fine.
> 
> > +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> > +	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);
> 
> Is it actually safe to enable the second slice irq when there's no second
> slice? The docs say it's just "reserved", but there is no mention of whether
> it is RO or whether there could be side effects.

Tests on my machine appear to work. But I don't know for certain. Bryan,
could you answer this?

> 
> > +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> >  
> > -	kfree(parity_event[3]);
> > -	kfree(parity_event[2]);
> > -	kfree(parity_event[1]);
> > +	mutex_unlock(&dev_priv->dev->struct_mutex);
> >  }
> >  
> > -static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
> > +static void ivybridge_parity_error_irq_handler(struct drm_device *dev, u32 iir)
> >  {
> >  	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
> >  
> > @@ -938,9 +957,12 @@ static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
> >  		return;
> >  
> >  	spin_lock(&dev_priv->irq_lock);
> > -	ilk_disable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> > +	ilk_disable_gt_irq(dev_priv, GT_PARITY_ERROR);
> >  	spin_unlock(&dev_priv->irq_lock);
> >  
> > +	iir &= GT_PARITY_ERROR;
> > +	dev_priv->l3_parity.which_slice =
> > +		1 << (iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 ? 1 : 0);
> 
> What if both slices report an error at the same time?

I was thinking that such an event cannot occur, but on rethinking it
you are right that it's possible. I really hope this never happens, but
it's fixed. Anyway, it should have been |=, not =.
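
To make the both-slices case concrete, here is a small stand-alone
sketch of accumulating pending slices in a bitmask and draining it with
ffs(). It is a model under stated assumptions, not the driver code: the
SKETCH_PARITY_S0/S1 bit positions, note_parity_irq() and drain_slices()
are invented for this example. One detail worth keeping in mind is that
ffs() returns a 1-based bit position (0 means no bit set), so the sketch
subtracts one before using the result as a slice index.

```c
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define MAX_L3_SLICES 2

/* Hypothetical IIR bit positions, chosen only for this sketch. */
#define SKETCH_PARITY_S0 (1u << 5)
#define SKETCH_PARITY_S1 (1u << 11)

/* Fold the parity bits of one IIR value into the pending-slice mask. */
static unsigned int note_parity_irq(unsigned int pending, uint32_t iir)
{
	if (iir & SKETCH_PARITY_S0)
		pending |= 1u << 0;
	if (iir & SKETCH_PARITY_S1)
		pending |= 1u << 1;

	return pending;
}

/*
 * Drain the mask, storing each handled slice index in @out.
 * Returns the number of slices handled. Bits at or above
 * MAX_L3_SLICES stop the loop and are left set in @pending.
 */
static int drain_slices(unsigned int *pending, int out[MAX_L3_SLICES])
{
	int n = 0, bit;

	while ((bit = ffs((int)*pending)) != 0) {
		int slice = bit - 1;	/* ffs() is 1-based */

		if (slice >= MAX_L3_SLICES)
			break;

		*pending &= ~(1u << slice);
		out[n++] = slice;
	}

	return n;
}
```

Because each interrupt bit is tested separately and OR-ed in, two
errors arriving before the work runs end up as mask 0x3 and both slices
get handled.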


[snip]

I'll resend the patch after Bryan answers the question about both
interrupts.
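
Since the uevent in this patch gains a SLICE property next to ROW, BANK
and SUBBANK, a listener has to cope with events both with and without
it. The following is a minimal sketch of parsing such KEY=value strings;
it is not the reference daemon, and struct l3_parity_report and
parse_parity_props() are hypothetical names. It assumes the properties
have already been collected into a NULL-terminated string array.

```c
#include <stdio.h>
#include <string.h>

struct l3_parity_report {
	int row, bank, subbank, slice;
};

/*
 * Parse a NULL-terminated array of "KEY=value" strings.
 * Returns 0 when ROW, BANK and SUBBANK were all found, -1 otherwise.
 * SLICE is optional and defaults to 0 when absent. Unknown keys are
 * ignored.
 */
static int parse_parity_props(const char *const *props,
			      struct l3_parity_report *r)
{
	int have = 0;

	memset(r, 0, sizeof(*r));

	for (; *props; props++) {
		/* sscanf() matches from the start, so BANK= never
		 * matches a SUBBANK= string. */
		if (sscanf(*props, "ROW=%d", &r->row) == 1)
			have |= 1;
		else if (sscanf(*props, "BANK=%d", &r->bank) == 1)
			have |= 2;
		else if (sscanf(*props, "SUBBANK=%d", &r->subbank) == 1)
			have |= 4;
		else
			(void)sscanf(*props, "SLICE=%d", &r->slice);
	}

	return have == 7 ? 0 : -1;
}
```

A real listener would obtain these strings from udev or a netlink
socket; that part is deliberately left out here.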

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/8] drm/i915: Add second slice l3 remapping
  2013-09-17 18:45     ` Ben Widawsky
@ 2013-09-17 18:51       ` Bell, Bryan J
  2013-09-17 19:02         ` Ville Syrjälä
  0 siblings, 1 reply; 40+ messages in thread
From: Bell, Bryan J @ 2013-09-17 18:51 UTC (permalink / raw)
  To: Ben Widawsky, Ville Syrjälä
  Cc: intel-gfx, Venkatesh, Vishnu, Widawsky, Benjamin

>> > +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
>> > +	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);
>> 
>> Is it actually safe to enable the second slice irq when there's no
>> second slice? The docs say it's just "reserved", but there is no
>> mention of whether it is RO or whether there could be side effects.

>Tests on my machine appear to work. But I don't know for certain. Bryan, could you answer this?

On the Windows driver we enable the IRQ on all HSW SKUs and haven't seen any issues.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/8] drm/i915: Add second slice l3 remapping
  2013-09-17 18:51       ` Bell, Bryan J
@ 2013-09-17 19:02         ` Ville Syrjälä
  2013-09-17 19:08           ` Bell, Bryan J
  0 siblings, 1 reply; 40+ messages in thread
From: Ville Syrjälä @ 2013-09-17 19:02 UTC (permalink / raw)
  To: Bell, Bryan J
  Cc: Ben Widawsky, Venkatesh, Vishnu, intel-gfx, Widawsky, Benjamin

On Tue, Sep 17, 2013 at 06:51:31PM +0000, Bell, Bryan J wrote:
> >> > +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> >> > +	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);
> >> 
> >> Is it actually safe to enable the second slice irq when there's no
> >> second slice? The docs say it's just "reserved", but there is no
> >> mention of whether it is RO or whether there could be side effects.
> 
> >Tests on my machine appear to work. But I don't know for certain. Bryan, could you answer this?
> 
> On the Windows driver we enable the IRQ on all HSW SKUs and haven't seen any issues.

This code would enable it for IVB too. Any data on that?

> 
> -----Original Message-----
> From: Ben Widawsky [mailto:ben@bwidawsk.net] 
> Sent: Tuesday, September 17, 2013 11:46 AM
> To: Ville Syrjälä; Bell, Bryan J
> Cc: Widawsky, Benjamin; intel-gfx@lists.freedesktop.org; Venkatesh, Vishnu
> Subject: Re: [Intel-gfx] [PATCH 5/8] drm/i915: Add second slice l3 remapping
> 
> On Fri, Sep 13, 2013 at 12:38:01PM +0300, Ville Syrjälä wrote:
> > On Thu, Sep 12, 2013 at 10:28:31PM -0700, Ben Widawsky wrote:
> > > Certain HSW SKUs have a second bank of L3. This L3 remapping has a 
> > > separate register set, and interrupt from the first "slice". A slice 
> > > is simply a term to define some subset of the GPU's l3 cache. This 
> > > patch implements both the interrupt handler, and ability to 
> > > communicate with userspace about this second slice.
> > > 
> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.h         |  9 ++--
> > >  drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++----
> > >  drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
> > >  drivers/gpu/drm/i915/i915_reg.h         |  6 +++
> > >  drivers/gpu/drm/i915/i915_sysfs.c       | 34 ++++++++++---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
> > >  include/uapi/drm/i915_drm.h             |  8 ++--
> > >  7 files changed, 115 insertions(+), 58 deletions(-)
> > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index 81ba5bb..eb90461 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -918,9 +918,11 @@ struct i915_ums_state {
> > >  	int mm_suspended;
> > >  };
> > >  
> > > +#define MAX_L3_SLICES 2
> > >  struct intel_l3_parity {
> > > -	u32 *remap_info;
> > > +	u32 *remap_info[MAX_L3_SLICES];
> > >  	struct work_struct error_work;
> > > +	int which_slice;
> > >  };
> > >  
> > >  struct i915_gem_mm {
> > > @@ -1686,7 +1688,8 @@ struct drm_i915_file_private {
> > >  
> > >  #define HAS_FORCE_WAKE(dev) (INTEL_INFO(dev)->has_force_wake)
> > >  
> > > -#define HAS_L3_GPU_CACHE(dev) (IS_IVYBRIDGE(dev) || IS_HASWELL(dev))
> > > +#define HAS_L3_GPU_CACHE(dev) (INTEL_INFO(dev)->gen >= 7)
> > > +#define NUM_L3_SLICES(dev) (IS_HSW_GT3(dev) ? 2 : HAS_L3_GPU_CACHE(dev))
> > >  
> > >  #define GT_FREQUENCY_MULTIPLIER 50
> > >  
> > > @@ -1947,7 +1950,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
> > >  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
> > >  int __must_check i915_gem_init(struct drm_device *dev);
> > >  int __must_check i915_gem_init_hw(struct drm_device *dev);
> > > -void i915_gem_l3_remap(struct drm_device *dev);
> > > +void i915_gem_l3_remap(struct drm_device *dev, int slice);
> > >  void i915_gem_init_swizzling(struct drm_device *dev);
> > >  void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
> > >  int __must_check i915_gpu_idle(struct drm_device *dev);
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 5b510a3..b11f7d6c 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -4256,16 +4256,21 @@ i915_gem_idle(struct drm_device *dev)
> > >  	return 0;
> > >  }
> > >  
> > > -void i915_gem_l3_remap(struct drm_device *dev)
> > > +void i915_gem_l3_remap(struct drm_device *dev, int slice)
> > >  {
> > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > +	u32 reg_base = GEN7_L3LOG_BASE + (slice * 0x200);
> > > +	u32 *remap_info = dev_priv->l3_parity.remap_info[slice];
> > >  	u32 misccpctl;
> > >  	int i;
> > >  
> > >  	if (!HAS_L3_GPU_CACHE(dev))
> > >  		return;
> > >  
> > > -	if (!dev_priv->l3_parity.remap_info)
> > > +	if (NUM_L3_SLICES(dev) < 2 && slice)
> > > +		return;
> > 
> > This check is redundant as we should never populate 
> > l3_parity.remap_info[1] when there's no second slice.
> > 
> 
> Got it. Smashed the early exit check together while at it.
> 
> > > +
> > > +	if (!remap_info)
> > >  		return;
> > >  
> > >  	misccpctl = I915_READ(GEN7_MISCCPCTL);
> > > @@ -4273,17 +4278,17 @@ void i915_gem_l3_remap(struct drm_device *dev)
> > >  	POSTING_READ(GEN7_MISCCPCTL);
> > >  
> > >  	for (i = 0; i < GEN7_L3LOG_SIZE; i += 4) {
> > > -		u32 remap = I915_READ(GEN7_L3LOG_BASE + i);
> > > -		if (remap && remap != dev_priv->l3_parity.remap_info[i/4])
> > > +		u32 remap = I915_READ(reg_base + i);
> > > +		if (remap && remap != remap_info[i/4])
> > >  			DRM_DEBUG("0x%x was already programmed to %x\n",
> > > -				  GEN7_L3LOG_BASE + i, remap);
> > > -		if (remap && !dev_priv->l3_parity.remap_info[i/4])
> > > +				  reg_base + i, remap);
> > > +		if (remap && !remap_info[i/4])
> > >  			DRM_DEBUG_DRIVER("Clearing remapped register\n");
> > > -		I915_WRITE(GEN7_L3LOG_BASE + i, dev_priv->l3_parity.remap_info[i/4]);
> > > +		I915_WRITE(reg_base + i, remap_info[i/4]);
> > >  	}
> > >  
> > >  	/* Make sure all the writes land before disabling dop clock gating */
> > > -	POSTING_READ(GEN7_L3LOG_BASE);
> > > +	POSTING_READ(reg_base);
> > >  
> > >  	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > >  }
> > > @@ -4377,7 +4382,7 @@ int
> > >  i915_gem_init_hw(struct drm_device *dev)
> > >  {
> > >  	drm_i915_private_t *dev_priv = dev->dev_private;
> > > -	int ret;
> > > +	int ret, i;
> > >  
> > >  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
> > >  		return -EIO;
> > > @@ -4396,7 +4401,8 @@ i915_gem_init_hw(struct drm_device *dev)
> > >  		I915_WRITE(GEN7_MSG_CTL, temp);
> > >  	}
> > >  
> > > -	i915_gem_l3_remap(dev);
> > > +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > +		i915_gem_l3_remap(dev, i);
> > >  
> > >  	i915_gem_init_swizzling(dev);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c 
> > > b/drivers/gpu/drm/i915/i915_irq.c index 13d26cf..62cdf05 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -882,9 +882,10 @@ static void ivybridge_parity_work(struct work_struct *work)
> > >  	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
> > >  						    l3_parity.error_work);
> > >  	u32 error_status, row, bank, subbank;
> > > -	char *parity_event[5];
> > > +	char *parity_event[6];
> > >  	uint32_t misccpctl;
> > >  	unsigned long flags;
> > > +	uint8_t slice = 0;
> > >  
> > >  	/* We must turn off DOP level clock gating to access the L3 registers.
> > >  	 * In order to prevent a get/put style interface, acquire struct mutex
> > > @@ -892,45 +893,63 @@ static void ivybridge_parity_work(struct work_struct *work)
> > >  	 */
> > >  	mutex_lock(&dev_priv->dev->struct_mutex);
> > >  
> > > +	/* If we've screwed up tracking, just let the interrupt fire again */
> > > +	if (WARN_ON(!dev_priv->l3_parity.which_slice))
> > > +		goto out;
> > > +
> > >  	misccpctl = I915_READ(GEN7_MISCCPCTL);
> > >  	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
> > >  	POSTING_READ(GEN7_MISCCPCTL);
> > >  
> > > -	error_status = I915_READ(GEN7_L3CDERRST1);
> > > -	row = GEN7_PARITY_ERROR_ROW(error_status);
> > > -	bank = GEN7_PARITY_ERROR_BANK(error_status);
> > > -	subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
> > > +	while ((slice = ffs(dev_priv->l3_parity.which_slice)) != 0) {
> > > +		u32 reg;
> > >  
> > > -	I915_WRITE(GEN7_L3CDERRST1, GEN7_PARITY_ERROR_VALID |
> > > -				    GEN7_L3CDERRST1_ENABLE);
> > > -	POSTING_READ(GEN7_L3CDERRST1);
> > > +		if (WARN_ON(slice >= MAX_L3_SLICES))
> > > +			break;
> > 
> > Could be >= NUM_L3_SLICES(dev) for a bit of extra paranoia. Also we 
> > would fail to clear invalid bits from which_slice in this case, and 
> > thus we'd get the WARN every time the work runs. But I guess this 
> > should never happen in any case so probably not worth worrying about 
> > this too much.
> 
> Not worth worrying, but I didn't mean to be so noisy. I've fixed this with WARN_ON_ONCE.
> 
> > 
> > >  
> > > -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > > +		dev_priv->l3_parity.which_slice &= ~(1<<slice);
> > >  
> > > -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> > > -	ilk_enable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> > > -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> > > +		reg = GEN7_L3CDERRST1 + (slice * 0x200);
> > >  
> > > -	mutex_unlock(&dev_priv->dev->struct_mutex);
> > > +		error_status = I915_READ(reg);
> > > +		row = GEN7_PARITY_ERROR_ROW(error_status);
> > > +		bank = GEN7_PARITY_ERROR_BANK(error_status);
> > > +		subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
> > > +
> > > +		I915_WRITE(reg, GEN7_PARITY_ERROR_VALID | GEN7_L3CDERRST1_ENABLE);
> > > +		POSTING_READ(reg);
> > > +
> > > +		parity_event[0] = I915_L3_PARITY_UEVENT "=1";
> > > +		parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
> > > +		parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
> > > +		parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> > > +		parity_event[4] = kasprintf(GFP_KERNEL, "SLICE=%d", slice);
> > > +		parity_event[5] = NULL;
> > >  
> > > -	parity_event[0] = I915_L3_PARITY_UEVENT "=1";
> > > -	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
> > > -	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
> > > -	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> > > -	parity_event[4] = NULL;
> > > +		kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
> > > +				   KOBJ_CHANGE, parity_event);
> > >  
> > > -	kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
> > > -			   KOBJ_CHANGE, parity_event);
> > > +		DRM_DEBUG("Parity error: Slice = %d, Row = %d, Bank = %d, Sub bank = %d.\n",
> > > +			  slice, row, bank, subbank);
> > >  
> > > -	DRM_DEBUG("Parity error: Row = %d, Bank = %d, Sub bank = %d.\n",
> > > -		  row, bank, subbank);
> > > +		kfree(parity_event[4]);
> > > +		kfree(parity_event[3]);
> > > +		kfree(parity_event[2]);
> > > +		kfree(parity_event[1]);
> > > +	}
> > > +
> > > +	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > > +
> > > +out:
> > > +	WARN_ON(dev_priv->l3_parity.which_slice);
> > 
> > First I figured the irq could rearm this behind our back, but we 
> > disable the irq until the work is done. So yeah, this is fine.
> > 
> > > +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> > > +	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);
> > 
> > > Is it actually safe to enable the second slice irq when there's no
> > > second slice? The docs say it's just "reserved", but there's no mention
> > > of whether it's RO or whether there could be side effects.
> 
> Tests on my machine appear to work. But I don't know for certain. Bryan, could you answer this?
> 
> > 
> > > +	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> > >  
> > > -	kfree(parity_event[3]);
> > > -	kfree(parity_event[2]);
> > > -	kfree(parity_event[1]);
> > > +	mutex_unlock(&dev_priv->dev->struct_mutex);
> > >  }
> > >  
> > > -static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
> > > +static void ivybridge_parity_error_irq_handler(struct drm_device *dev, u32 iir)
> > >  {
> > >  	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
> > >  
> > > @@ -938,9 +957,12 @@ static void ivybridge_parity_error_irq_handler(struct drm_device *dev)
> > >  		return;
> > >  
> > >  	spin_lock(&dev_priv->irq_lock);
> > > -	ilk_disable_gt_irq(dev_priv, GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> > > +	ilk_disable_gt_irq(dev_priv, GT_PARITY_ERROR);
> > >  	spin_unlock(&dev_priv->irq_lock);
> > >  
> > > +	iir &= GT_PARITY_ERROR;
> > > +	dev_priv->l3_parity.which_slice =
> > > +		1 << (iir & GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 ? 1 : 0);
> > 
> > What if both slices report an error at the same time?
> 
> I was thinking that such an event can not occur, but on rethinking it you are right that it's possible. I really hope this never happens, but it's fixed. Anyway, it should have been |=, not =
> 
> 
> [snip]
> 
> I'll resend the patch after Bryan answers the question about both interrupts.
> 
> --
> Ben Widawsky, Intel Open Source Technology Center

-- 
Ville Syrjälä
Intel OTC
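
A minimal user-space sketch of the pending-slice bookkeeping discussed in this subthread, not the driver code itself. The interrupt side ORs every reporting slice into a mask (the `|=` fix mentioned above), and the worker drains the mask one bit at a time. The names `note_parity_irq` and `drain_pending` and the `PARITY_IRQ_S0`/`PARITY_IRQ_S1` bit values are assumptions made for illustration. Note that `ffs()` is 1-based and returns 0 when no bit is set, so the slice index is its result minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define MAX_L3_SLICES 2

/* Hypothetical interrupt bit positions, chosen only for this sketch. */
#define PARITY_IRQ_S0 (1u << 5)
#define PARITY_IRQ_S1 (1u << 11)

/* Interrupt side: accumulate with |= so that two slices reporting at the
 * same time, or a second interrupt before the worker runs, lose nothing. */
static void note_parity_irq(uint32_t iir, unsigned int *pending)
{
	assert(pending != NULL);
	if (iir & PARITY_IRQ_S0)
		*pending |= 1u << 0;
	if (iir & PARITY_IRQ_S1)
		*pending |= 1u << 1;
}

/* Worker side: drain the mask. Returns how many valid slices were seen
 * and stores their indices, lowest first, in handled[]. */
static int drain_pending(unsigned int *pending, int handled[MAX_L3_SLICES])
{
	int n = 0, bit;

	assert(pending != NULL);
	while ((bit = ffs((int)*pending)) != 0) {
		int slice = bit - 1;	/* ffs() counts from 1 */

		/* Clear first, so a stray out-of-range bit cannot loop forever. */
		*pending &= ~(1u << slice);
		if (slice >= MAX_L3_SLICES)
			continue;
		handled[n++] = slice;
	}
	return n;
}
```

Clearing the bit before the range check means an invalid bit is dropped once instead of triggering the same warning on every run of the worker, which is the noise concern raised in the review.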

* Re: [PATCH 5/8] drm/i915: Add second slice l3 remapping
  2013-09-17 19:02         ` Ville Syrjälä
@ 2013-09-17 19:08           ` Bell, Bryan J
  0 siblings, 0 replies; 40+ messages in thread
From: Bell, Bryan J @ 2013-09-17 19:08 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Ben Widawsky, Venkatesh, Vishnu, intel-gfx, Widawsky, Benjamin

> -----Original Message-----
> From: Ville Syrjälä [mailto:ville.syrjala@linux.intel.com] 
> Sent: Tuesday, September 17, 2013 12:02 PM
> To: Bell, Bryan J
> Cc: Ben Widawsky; Widawsky, Benjamin; intel-gfx@lists.freedesktop.org; Venkatesh, Vishnu
> Subject: Re: [Intel-gfx] [PATCH 5/8] drm/i915: Add second slice l3 remapping
> 
> On Tue, Sep 17, 2013 at 06:51:31PM +0000, Bell, Bryan J wrote:
> > >> > +	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> > >> > +	ilk_enable_gt_irq(dev_priv, GT_PARITY_ERROR);
> > >> 
> > >> Is it actually safe to enable the second slice irq when there's no
> > >> second slice? The docs say it's just "reserved", but there's no mention
> > >> of whether it's RO or whether there could be side effects.
> > 
> > >Tests on my machine appear to work. But I don't know for certain. Bryan, could you answer this?
> > 
> > On the Windows driver we enable the IRQ on all HSW skus and haven't seen any issues. 
> 
> This code would enable it for IVB too. Any data on that?

No data on IVB, I recommend against enabling it on IVB. 

FYI: My understanding is that L3 DPF code and/or interrupts should be disabled on VLV.
VLV does not support dynamic L3 row replacement. 

--Regards
Bryan
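
A minimal sketch of the per-platform gating suggested in this reply: both slice interrupts on HSW, only the first slice on IVB, and nothing on VLV. The enum `gpu_platform`, the function `parity_irq_mask`, and the bit values are hypothetical names for illustration; the real driver would use its own platform checks and register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical platform tags and interrupt bits, for illustration only. */
enum gpu_platform { PLAT_IVB, PLAT_HSW, PLAT_VLV };

#define PARITY_IRQ_S0 (1u << 5)
#define PARITY_IRQ_S1 (1u << 11)

/* Return the parity interrupt bits that are reasonable to unmask. */
static uint32_t parity_irq_mask(enum gpu_platform p)
{
	switch (p) {
	case PLAT_HSW:
		/* Reported as enabled on all HSW SKUs without issues. */
		return PARITY_IRQ_S0 | PARITY_IRQ_S1;
	case PLAT_IVB:
		/* No data for the second-slice bit on IVB: leave it masked. */
		return PARITY_IRQ_S0;
	case PLAT_VLV:
		/* No dynamic L3 row replacement on VLV: enable nothing. */
		return 0;
	}
	assert(!"unknown platform");
	return 0;
}
```

Computing the mask once and using the same value in both the enable and disable paths keeps the two sides from drifting apart.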

* Re: [PATCH 8/8] drm/i915: Do remaps for all contexts
  2013-09-13  9:17   ` Ville Syrjälä
  2013-09-13  9:20     ` Ville Syrjälä
@ 2013-09-17 20:42     ` Ben Widawsky
  1 sibling, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-17 20:42 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: intel-gfx, vishnu.venkatesh, bryan.j.bell, Ben Widawsky

On Fri, Sep 13, 2013 at 12:17:17PM +0300, Ville Syrjälä wrote:
> On Thu, Sep 12, 2013 at 10:28:34PM -0700, Ben Widawsky wrote:
> > On both Ivybridge and Haswell, row remapping information is saved and
> > restored with context. This means, we never actually properly supported
> > the l3 remapping because our sysfs interface is asynchronous (and not
> > tied to any context), and the known faulty HW would be reused by the
> > next context to run.
> > 
> > Note that due to the asynchronous nature of the sysfs entry, there is no
> > point modifying the registers for the existing context. Instead we set a
> > flag for all contexts to load the correct remapping information on the
> > next run. Interested clients can use debugfs to determine whether or not
> > the row has been remapped.
> > 
> > One could propose at this point that we just do the remapping in the
> > kernel. I guess since we have to maintain the sysfs interface anyway,
> > I'm not sure how useful it is, and I do like keeping the policy in
> > userspace; (it wasn't my original decision to make the
> > interface the way it is, so I'm not attached).
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c     |  8 +++++++
> >  drivers/gpu/drm/i915/i915_drv.h         |  1 +
> >  drivers/gpu/drm/i915/i915_gem_context.c | 17 +++++++++++++--
> >  drivers/gpu/drm/i915/i915_sysfs.c       | 38 +++++++++++----------------------
> >  4 files changed, 36 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index ada0950..80bed69 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -145,6 +145,13 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
> >  		seq_printf(m, " (%s)", obj->ring->name);
> >  }
> >  
> > +static void describe_ctx(struct seq_file *m, struct i915_hw_context *ctx)
> > +{
> > +	seq_putc(m, ctx->is_initialized ? 'I' : 'i');
> > +	seq_putc(m, ctx->remap_slice ? 'R' : 'r');
> > +	seq_putc(m, ' ');
> > +}
> > +
> >  static int i915_gem_object_list_info(struct seq_file *m, void *data)
> >  {
> >  	struct drm_info_node *node = (struct drm_info_node *) m->private;
> > @@ -1463,6 +1470,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
> >  
> >  	list_for_each_entry(ctx, &dev_priv->context_list, link) {
> >  		seq_puts(m, "HW context ");
> > +		describe_ctx(m, ctx);
> >  		for_each_ring(ring, dev_priv, i)
> >  			if (ring->default_context == ctx)
> >  				seq_printf(m, "(default context %s) ", ring->name);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 68f992b..6ba78cd 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -602,6 +602,7 @@ struct i915_hw_context {
> >  	struct kref ref;
> >  	int id;
> >  	bool is_initialized;
> > +	uint8_t remap_slice;
> >  	struct drm_i915_file_private *file_priv;
> >  	struct intel_ring_buffer *ring;
> >  	struct drm_i915_gem_object *obj;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2bbdce8..72b7202 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -140,7 +140,7 @@ create_hw_context(struct drm_device *dev,
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct i915_hw_context *ctx;
> > -	int ret;
> > +	int ret, i;
> >  
> >  	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> >  	if (ctx == NULL)
> > @@ -181,6 +181,8 @@ create_hw_context(struct drm_device *dev,
> >  
> >  	ctx->file_priv = file_priv;
> >  	ctx->id = ret;
> > +	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > +		ctx->remap_slice |= 1 << 1;
>                                          ^
> That should be i, i.e. 1 << i.
> 
> >  
> >  	return ctx;
> >  
> > @@ -396,7 +398,7 @@ static int do_switch(struct i915_hw_context *to)
> >  	struct intel_ring_buffer *ring = to->ring;
> >  	struct i915_hw_context *from = ring->last_context;
> >  	u32 hw_flags = 0;
> > -	int ret;
> > +	int ret, i;
> >  
> >  	BUG_ON(from != NULL && from->obj != NULL && from->obj->pin_count == 0);
> >  
> > @@ -432,6 +434,17 @@ static int do_switch(struct i915_hw_context *to)
> >  		return ret;
> >  	}
> >  
> > +	for (i = 0; i < MAX_L3_SLICES; i++) {
> > +		if (!(to->remap_slice & (1<<i)))
> > +			continue;
> > +
> > +		ret = i915_gem_l3_remap(ring, i);
> > +		if (!ret) {
> > +			to->remap_slice &= ~(1<<i);
> > +		}
> > +		/* If it failed, try again next round */
> > +	}
> > +
> >  	/* The backing object for the context is done after switching to the
> >  	 * *next* context. Therefore we cannot retire the previous context until
> >  	 * the next context has already started running. In fact, the below code
> > diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> > index 65a7274..f0d7e1c 100644
> > --- a/drivers/gpu/drm/i915/i915_sysfs.c
> > +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> > @@ -118,9 +118,8 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
> >  	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
> >  	struct drm_device *drm_dev = dminor->dev;
> >  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> > -	uint32_t misccpctl;
> >  	int slice = (int)(uintptr_t)attr->private;
> > -	int i, ret;
> > +	int ret;
> >  
> >  	count = round_down(count, 4);
> >  
> > @@ -134,31 +133,16 @@ i915_l3_read(struct file *filp, struct kobject *kobj,
> >  	if (ret)
> >  		return ret;
> >  
> > -	if (IS_HASWELL(drm_dev)) {
> > -		int last = min_t(int, GEN7_L3LOG_SIZE, count + offset);
> > -		if ((!dev_priv->l3_parity.remap_info[slice]))
> > -			memset(buf + offset, 0, last - offset);
> > -		else
> > -			memcpy(buf + offset,
> > -			       dev_priv->l3_parity.remap_info[slice] + (offset/4),
> > -			       last - offset);
> > -
> > -		i = last;
> > -		goto out;
> > -	}
> > -
> > -	misccpctl = I915_READ(GEN7_MISCCPCTL);
> > -	I915_WRITE(GEN7_MISCCPCTL, misccpctl & ~GEN7_DOP_CLOCK_GATE_ENABLE);
> > -
> > -	for (i = 0; i < count; i += 4)
> > -		*((uint32_t *)(&buf[i])) = I915_READ(GEN7_L3LOG_BASE + offset + i);
> > -
> > -	I915_WRITE(GEN7_MISCCPCTL, misccpctl);
> > +	if ((!dev_priv->l3_parity.remap_info[slice]))
> > +		memset(buf + offset, 0, count);
> > +	else
> > +		memcpy(buf + offset,
> > +		       dev_priv->l3_parity.remap_info[slice] + (offset/4),
> > +		       count);
> 
> Needs fixing after patch 4 is fixed.
> 
> >  
> > -out:
> >  	mutex_unlock(&drm_dev->struct_mutex);
> >  
> > -	return i;
> > +	return count;
> >  }
> >  
> >  static ssize_t
> > @@ -170,6 +154,7 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
> >  	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
> >  	struct drm_device *drm_dev = dminor->dev;
> >  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> > +	struct i915_hw_context *ctx;
> >  	u32 *temp = NULL; /* Just here to make handling failures easy */
> >  	int slice = (int)(uintptr_t)attr->private;
> >  	int ret;
> > @@ -206,8 +191,9 @@ i915_l3_write(struct file *filp, struct kobject *kobj,
> >  
> >  	memcpy(dev_priv->l3_parity.remap_info[slice] + (offset/4), buf, count);
> >  
> > -	if (i915_gem_l3_remap(&dev_priv->ring[RCS], slice))
> > -		count = 0;
> > +	/* NB: We defer the remapping until we switch to the context */
> > +	list_for_each_entry(ctx, &dev_priv->context_list, link)
> > +		ctx->remap_slice |= (1<<slice);
> 
> Should we force a context switch on the next batch if the current
> context has remap_slice != 0?
> 

I think that's a good idea as long as you're willing to review the added
complexity.
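
To make the suggestion concrete, here is a minimal user-space sketch of forcing the switch path when the current context still has pending remaps. It is a model under stated assumptions, not driver code: struct model_ctx, mark_slice_dirty() and model_switch() are hypothetical names, and the counter only stands in for emitting the remap on the ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_L3_SLICES 2

/* Hypothetical stand-in for struct i915_hw_context. */
struct model_ctx {
	uint8_t remap_slice;  /* bit n set: slice n still needs its remap emitted */
	unsigned remaps_done; /* counts emitted remaps, for illustration only */
};

/* Mark every context so the remap is replayed when it is next switched to. */
static void mark_slice_dirty(struct model_ctx *ctxs, size_t n, int slice)
{
	assert(slice >= 0 && slice < MAX_L3_SLICES);
	for (size_t i = 0; i < n; i++)
		ctxs[i].remap_slice |= (uint8_t)(1u << slice);
}

/*
 * Model of the switch path. Switching to the already-current context is
 * normally skipped; here it is taken anyway when remaps are pending.
 * Returns 1 if the switch path ran, 0 if it was skipped.
 */
static int model_switch(struct model_ctx **current, struct model_ctx *to)
{
	if (*current == to && to->remap_slice == 0)
		return 0;

	for (int slice = 0; slice < MAX_L3_SLICES; slice++) {
		if (!(to->remap_slice & (1u << slice)))
			continue;
		to->remaps_done++; /* stands in for emitting the remap on the ring */
		to->remap_slice &= (uint8_t)~(1u << slice);
	}
	*current = to;
	return 1;
}
```

The extra complexity is the first condition in model_switch(): the "same context" shortcut has to look at remap_slice before it bails out.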

> Also what happens if hw_contexts_disabled==false?
> 

Well, if I had gotten my way and we'd merged full PPGTT already,
hw_contexts couldn't be disabled; but alas it is a valid concern.

I think punting is the right thing to do here. Mesa won't run if
contexts are disabled. However, an error message and an error return
value would be appropriate. What do you think?
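
Roughly what I have in mind for punting, as a user-space sketch rather than the actual handler: struct model_dev and model_l3_write() are made-up names, and -ENXIO is only a placeholder for whichever error code we settle on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for the relevant driver state. */
struct model_dev {
	bool hw_contexts_disabled;
};

/* Model of the write handler: refuse the write when contexts are unavailable. */
static ssize_t model_l3_write(const struct model_dev *dev, size_t count)
{
	assert(dev != NULL);

	if (dev->hw_contexts_disabled) {
		fprintf(stderr, "L3 remapping requires HW contexts\n");
		return -ENXIO; /* error code chosen for illustration only */
	}

	/* ... copy the remap data and mark every context dirty here ... */
	return (ssize_t)count;
}
```

Returning an error instead of silently accepting the data means user space can tell that the remap will never be applied.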

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support
  2013-09-16 18:18   ` Bell, Bryan J
@ 2013-09-17 23:52     ` Ben Widawsky
  2013-09-17 23:59       ` Ben Widawsky
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Widawsky @ 2013-09-17 23:52 UTC (permalink / raw)
  To: Bell, Bryan J; +Cc: intel-gfx, Venkatesh, Vishnu, Widawsky, Benjamin

On Mon, Sep 16, 2013 at 06:18:28PM +0000, Bell, Bryan J wrote:
> L3 dynamic parity is not supported on VLV. Please add the check for VLV. 
> 
> I can send you the email thread, if needed. 
> 
> --Thanks
> Bryan

More importantly, we need to fix this in the kernel too. Thanks for
catching this.
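
For reference, the check could look roughly like the sketch below. It is a self-contained model: struct model_devinfo and model_has_l3_dpf() are hypothetical, and the real tool and driver would use their own device-id helpers instead. It also assumes VLV is the only gen7 part without the feature, which is all this thread establishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a device; not the real devid helpers. */
struct model_devinfo {
	int gen;            /* graphics generation, e.g. 7 for IVB/HSW/VLV */
	bool is_valleyview; /* true for VLV, which lacks L3 dynamic parity */
};

/* L3 dynamic parity: gen7 and later, except Valleyview. */
static bool model_has_l3_dpf(const struct model_devinfo *info)
{
	assert(info != NULL);
	return info->gen > 6 && !info->is_valleyview;
}
```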

> 
> -----Original Message-----
> From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
> Sent: Thursday, September 12, 2013 10:29 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Venkatesh, Vishnu; Bell, Bryan J; Widawsky, Benjamin; Ben Widawsky
> Subject: [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  tools/intel_l3_parity.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
> index 970dcd6..a3d268b 100644
> --- a/tools/intel_l3_parity.c
> +++ b/tools/intel_l3_parity.c
> @@ -120,7 +120,7 @@ int main(int argc, char *argv[])
>  	assert(ret != -1);
>  
>  	fd = open(path, O_RDWR);
> -	if (fd == -1 && IS_IVYBRIDGE(devid)) {
> +	if (fd == -1 && intel_gen(devid) > 6) {
>  		perror("Opening sysfs");
>  		exit(EXIT_FAILURE);
>  	} else if (fd == -1)
> --
> 1.8.4
> 

-- 
Ben Widawsky, Intel Open Source Technology Center


* Re: [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support
  2013-09-17 23:52     ` Ben Widawsky
@ 2013-09-17 23:59       ` Ben Widawsky
  0 siblings, 0 replies; 40+ messages in thread
From: Ben Widawsky @ 2013-09-17 23:59 UTC (permalink / raw)
  To: Bell, Bryan J; +Cc: intel-gfx, Widawsky, Benjamin, Venkatesh, Vishnu

On Tue, Sep 17, 2013 at 04:52:59PM -0700, Ben Widawsky wrote:
> On Mon, Sep 16, 2013 at 06:18:28PM +0000, Bell, Bryan J wrote:
> > L3 dynamic parity is not supported on VLV. Please add the check for VLV. 
> > 
> > I can send you the email thread, if needed. 
> > 
> > --Thanks
> > Bryan
> 
> More importantly, we need to fix this in the kernel too. Thanks for
> catching this.

Correction: we need to fix my breaking of it in the kernel patches.
> 
> > 
> > -----Original Message-----
> > From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
> > Sent: Thursday, September 12, 2013 10:29 PM
> > To: intel-gfx@lists.freedesktop.org
> > Cc: Venkatesh, Vishnu; Bell, Bryan J; Widawsky, Benjamin; Ben Widawsky
> > Subject: [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  tools/intel_l3_parity.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
> > index 970dcd6..a3d268b 100644
> > --- a/tools/intel_l3_parity.c
> > +++ b/tools/intel_l3_parity.c
> > @@ -120,7 +120,7 @@ int main(int argc, char *argv[])
> >  	assert(ret != -1);
> >  
> >  	fd = open(path, O_RDWR);
> > -	if (fd == -1 && IS_IVYBRIDGE(devid)) {
> > +	if (fd == -1 && intel_gen(devid) > 6) {
> >  		perror("Opening sysfs");
> >  		exit(EXIT_FAILURE);
> >  	} else if (fd == -1)
> > --
> > 1.8.4
> > 
> 
> -- 
> Ben Widawsky, Intel Open Source Technology Center

-- 
Ben Widawsky, Intel Open Source Technology Center


Thread overview: 40+ messages
2013-09-13  5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
2013-09-13  5:28 ` [PATCH 1/8] drm/i915: Remove extra "ring" Ben Widawsky
2013-09-13  5:28 ` [PATCH 2/8] drm/i915: Round l3 parity reads down Ben Widawsky
2013-09-13  5:28 ` [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset Ben Widawsky
2013-09-13 12:56   ` Daniel Vetter
2013-09-13  5:28 ` [PATCH 4/8] drm/i915: Fix HSW parity test Ben Widawsky
2013-09-13  8:17   ` Ville Syrjälä
2013-09-13  5:28 ` [PATCH 5/8] drm/i915: Add second slice l3 remapping Ben Widawsky
2013-09-13  9:38   ` Ville Syrjälä
2013-09-17 18:45     ` Ben Widawsky
2013-09-17 18:51       ` Bell, Bryan J
2013-09-17 19:02         ` Ville Syrjälä
2013-09-17 19:08           ` Bell, Bryan J
2013-09-13  5:28 ` [PATCH 6/8] drm/i915: Make l3 remapping use the ring Ben Widawsky
2013-09-13 16:16   ` Daniel Vetter
2013-09-13  5:28 ` [PATCH 7/8] drm/i915: Keep a list of all contexts Ben Widawsky
2013-09-13  5:28 ` [PATCH 8/8] drm/i915: Do remaps for " Ben Widawsky
2013-09-13  9:17   ` Ville Syrjälä
2013-09-13  9:20     ` Ville Syrjälä
2013-09-17 20:42     ` Ben Widawsky
2013-09-13  5:28 ` [PATCH 09/16] intel_l3_parity: Fix indentation Ben Widawsky
2013-09-13  5:28 ` [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support Ben Widawsky
2013-09-16 18:18   ` Bell, Bryan J
2013-09-17 23:52     ` Ben Widawsky
2013-09-17 23:59       ` Ben Widawsky
2013-09-13  5:28 ` [PATCH 11/16] intel_l3_parity: Use getopt for the l3 parity tool Ben Widawsky
2013-09-13  5:28 ` [PATCH 12/16] intel_l3_parity: Hardware info argument Ben Widawsky
2013-09-13  5:28 ` [PATCH 13/16] intel_l3_parity: slice support Ben Widawsky
2013-09-13  5:28 ` [PATCH 14/16] intel_l3_parity: Actually support multiple slices Ben Widawsky
2013-09-13  5:28 ` [PATCH 15/16] intel_l3_parity: Support error injection Ben Widawsky
2013-09-13  9:12   ` Daniel Vetter
2013-09-13 15:54     ` Ben Widawsky
2013-09-13 16:14       ` Daniel Vetter
2013-09-13 16:29         ` Ben Widawsky
2013-09-13  5:28 ` [PATCH 16/16] intel_l3_parity: Support a daemonic mode Ben Widawsky
2013-09-13  9:44 ` [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ville Syrjälä
2013-09-17  0:52 ` Bell, Bryan J
2013-09-17  4:15   ` Ben Widawsky
2013-09-17  7:27     ` Daniel Vetter
2013-09-17 18:23       ` Bell, Bryan J
