All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] [RFC] Dynamic parity detection
@ 2012-04-13 23:05 Ben Widawsky
  2012-04-13 23:05 ` [PATCH 1/2] drm/i915: Dynamic Parity Detection handling Ben Widawsky
  2012-04-13 23:05 ` [PATCH 2/2] drm/i915: l3 parity sysfs interface Ben Widawsky
  0 siblings, 2 replies; 6+ messages in thread
From: Ben Widawsky @ 2012-04-13 23:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

I'm still working on the test cases for this feature. I just wanted to
put these patches out there to make sure people were okay with the
concept.

I think it's pretty straightforward... uevent on error, and binary sysfs
interface to remap the bad rows.

If nobody has any issue with the "design" I'll repost the series when
I've tested this on specially modified hardware.

Ben Widawsky (2):
  drm/i915: Dynamic Parity Detection handling
  drm/i915: l3 parity sysfs interface

 drivers/gpu/drm/i915/i915_drv.h         |    2 +
 drivers/gpu/drm/i915/i915_irq.c         |   65 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h         |   26 ++++++++
 drivers/gpu/drm/i915/i915_sysfs.c       |  103 ++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |    6 +-
 5 files changed, 197 insertions(+), 5 deletions(-)

-- 
1.7.10

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] drm/i915: Dynamic Parity Detection handling
  2012-04-13 23:05 [PATCH 0/2] [RFC] Dynamic parity detection Ben Widawsky
@ 2012-04-13 23:05 ` Ben Widawsky
  2012-04-13 23:05 ` [PATCH 2/2] drm/i915: l3 parity sysfs interface Ben Widawsky
  1 sibling, 0 replies; 6+ messages in thread
From: Ben Widawsky @ 2012-04-13 23:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Ben Widawsky

On IVB hardware we are given an interrupt whenever a parity error occurs
in the L3 cache. This is a very rare occurrence (in fact to test this I
need to use specially instrumented silicon).

When a row in the L3 cache detects a parity error, we alert userspace
via a uevent. With this information userspace can use a sysfs interface
(follow-up patch) to remap those row. Based on the fabrication
process, and random events like gamma rays, certain rows *can* be more
susceptible.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |    2 +
 drivers/gpu/drm/i915/i915_irq.c         |   65 ++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h         |   16 ++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |    6 ++-
 4 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 422f424..8f31e0f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -805,6 +805,8 @@ typedef struct drm_i915_private {
 
 	struct drm_property *broadcast_rgb_property;
 	struct drm_property *force_audio_property;
+
+	struct work_struct parity_error_work;
 } drm_i915_private_t;
 
 enum hdmi_force_audio {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index febddc2..ef7c126 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -430,6 +430,63 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 }
 
+
+/**
+ * ivybridge_parity_work - Workqueue called when a parity error interrupt
+ * occurred.
+ *
+ * Doesn't actually do anything except notify userspace so that userspace may
+ * disable things later on.
+ */
+static void ivybridge_parity_work(struct work_struct *work)
+{
+	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
+						    parity_error_work);
+
+	u32 error_status, row, bank, subbank;
+	char *parity_event[5];
+
+	/* We must turn off DOP level clock gating to access the L3 registers.
+	 * In order to prevent a get/put style interface, acquire struct mutex
+	 * any time we access those registers.
+	 */
+	mutex_lock(&dev_priv->dev->struct_mutex);
+	I915_WRITE(GEN7_MISCCPCTL,
+		   I915_READ(GEN7_MISCCPCTL) & ~GEN7_DOP_CLOCK_GATE_ENABLE);
+	POSTING_READ(GEN7_MISCCPCTL);
+	error_status = I915_READ(GEN7_L3CDERRST1);
+	row = GEN7_PARITY_ERROR_ROW(error_status);
+	bank = GEN7_PARITY_ERROR_BANK(error_status);
+	subbank = GEN7_PARITY_ERROR_SUBBANK(error_status);
+	I915_WRITE(GEN7_L3CDERRST1, GEN7_PARITY_ERROR_VALID);
+	POSTING_READ(GEN7_MISCCPCTL);
+	I915_WRITE(GEN7_MISCCPCTL,
+		   I915_READ(GEN7_MISCCPCTL) | GEN7_DOP_CLOCK_GATE_ENABLE);
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+
+	parity_event[0] = "L3_PARITY_ERROR=1";
+	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
+	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
+	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
+	parity_event[4] = NULL;
+
+	kobject_uevent_env(&dev_priv->dev->primary->kdev.kobj,
+			   KOBJ_CHANGE, parity_event);
+
+	kfree(parity_event[3]);
+	kfree(parity_event[2]);
+	kfree(parity_event[1]);
+
+}
+
+void ivybridge_handle_parity_error(struct drm_device *dev)
+{
+	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
+
+	queue_work(dev_priv->wq, &dev_priv->parity_error_work);
+	DRM_INFO("Parity error interrupt. Scheduling work\n");
+}
+
 static void snb_gt_irq_handler(struct drm_device *dev,
 			       struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
@@ -449,6 +506,9 @@ static void snb_gt_irq_handler(struct drm_device *dev,
 		DRM_ERROR("GT error interrupt 0x%08x\n", gt_iir);
 		i915_handle_error(dev, false);
 	}
+
+	if (gt_iir & GT_GEN7_L3_PARITY_ERROR_INTERRUPT)
+		ivybridge_handle_parity_error(dev);
 }
 
 static irqreturn_t valleyview_irq_handler(DRM_IRQ_ARGS)
@@ -1972,6 +2032,9 @@ static void ironlake_irq_preinstall(struct drm_device *dev)
 	if (IS_GEN6(dev) || IS_IVYBRIDGE(dev))
 		INIT_WORK(&dev_priv->rps_work, gen6_pm_rps_work);
 
+	if (IS_IVYBRIDGE(dev))
+		INIT_WORK(&dev_priv->parity_error_work, ivybridge_parity_work);
+
 	I915_WRITE(HWSTAM, 0xeffe);
 
 	/* XXX hotplug from PCH */
@@ -2152,7 +2215,7 @@ static int ivybridge_irq_postinstall(struct drm_device *dev)
 	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
 
 	render_irqs = GT_USER_INTERRUPT | GEN6_BSD_USER_INTERRUPT |
-		GEN6_BLITTER_USER_INTERRUPT;
+		GEN6_BLITTER_USER_INTERRUPT | GT_GEN7_L3_PARITY_ERROR_INTERRUPT;
 	I915_WRITE(GTIER, render_irqs);
 	POSTING_READ(GTIER);
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 972321f..de1685e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3991,6 +3991,22 @@
 #define   GEN6_RC6			3
 #define   GEN6_RC7			4
 
+#define GEN7_MISCCPCTL			(0x9424)
+#define   GEN7_DOP_CLOCK_GATE_ENABLE	(1<<0)
+
+/* IVYBRIDGE DPF */
+#define GEN7_L3CDERRST1			0xB008 /* L3CD Error Status 1 */
+#define   GEN7_L3CDERRST1_ROW_MASK	(0x7ff<<14)
+#define   GEN7_PARITY_ERROR_VALID	(1<<13)
+#define   GEN7_L3CDERRST1_BANK_MASK	(3<<11)
+#define   GEN7_L3CDERRST1_SUBBANK_MASK	(7<<8)
+#define GEN7_PARITY_ERROR_ROW(reg) \
+		((reg & GEN7_L3CDERRST1_ROW_MASK) >> 14)
+#define GEN7_PARITY_ERROR_BANK(reg) \
+		((reg & GEN7_L3CDERRST1_BANK_MASK) >> 11)
+#define GEN7_PARITY_ERROR_SUBBANK(reg) \
+		((reg & GEN7_L3CDERRST1_SUBBANK_MASK) >> 8)
+
 #define G4X_AUD_VID_DID			0x62020
 #define INTEL_AUDIO_DEVCL		0x808629FB
 #define INTEL_AUDIO_DEVBLC		0x80862801
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 467b331..f956741 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1446,7 +1446,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		ring->flush = gen6_render_ring_flush;
 		ring->irq_get = gen6_ring_get_irq;
 		ring->irq_put = gen6_ring_put_irq;
-		ring->irq_enable = GT_USER_INTERRUPT;
+		ring->irq_enable = GT_USER_INTERRUPT |
+				   GT_GEN7_L3_PARITY_ERROR_INTERRUPT;
 		ring->get_seqno = gen6_ring_get_seqno;
 	} else if (IS_GEN5(dev)) {
 		ring->add_request = pc_render_add_request;
@@ -1471,7 +1472,8 @@ int intel_render_ring_init_dri(struct drm_device *dev, u64 start, u32 size)
 		ring->add_request = gen6_add_request;
 		ring->irq_get = gen6_ring_get_irq;
 		ring->irq_put = gen6_ring_put_irq;
-		ring->irq_enable = GT_USER_INTERRUPT;
+		ring->irq_enable = GT_USER_INTERRUPT |
+				   GT_GEN7_L3_PARITY_ERROR_INTERRUPT;
 	} else if (IS_GEN5(dev)) {
 		ring->add_request = pc_render_add_request;
 		ring->get_seqno = pc_render_get_seqno;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] drm/i915: l3 parity sysfs interface
  2012-04-13 23:05 [PATCH 0/2] [RFC] Dynamic parity detection Ben Widawsky
  2012-04-13 23:05 ` [PATCH 1/2] drm/i915: Dynamic Parity Detection handling Ben Widawsky
@ 2012-04-13 23:05 ` Ben Widawsky
  2012-04-13 23:24   ` Chris Wilson
  1 sibling, 1 reply; 6+ messages in thread
From: Ben Widawsky @ 2012-04-13 23:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Ben Widawsky

Dumb binary interfaces which allow root-only updates of our cache
remapping registers.  See intel-gpu-tools for how this can/should be
used.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h   |   10 ++++
 drivers/gpu/drm/i915/i915_sysfs.c |  103 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index de1685e..1a93395 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4007,6 +4007,16 @@
 #define GEN7_PARITY_ERROR_SUBBANK(reg) \
 		((reg & GEN7_L3CDERRST1_SUBBANK_MASK) >> 8)
 
+#define GEN7_L3LOG_BASE				0xB070
+#define GEN7_L3LOG_SIZE				0x80
+#if 0
+#define   GEN7_L3LOG_ROW1(reg)         ((reg>>21) & IVYBRDIGE_L3_ROW_MASK)
+#define   GEN7_L3LOG_VLDERR1           (1<<16)
+#define   GEN7_L3LOG_ROW0(reg)         ((reg>>5) & IVYBRDIGE_L3_ROW_MASK)
+#define   GEN7_L3LOG_VLDERR0           (1<<0)
+#endif
+
+
 #define G4X_AUD_VID_DID			0x62020
 #define INTEL_AUDIO_DEVCL		0x808629FB
 #define INTEL_AUDIO_DEVBLC		0x80862801
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index f1b5108..fe37960 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -92,20 +92,119 @@ static struct attribute_group rc6_attr_group = {
 	.attrs =  rc6_attrs
 };
 
+static bool is_l3_access_valid(struct drm_device *dev, loff_t offset)
+{
+	if (!IS_IVYBRIDGE(dev))
+		return false;
+
+	if (offset >= GEN7_L3LOG_SIZE)
+		return false;
+
+	if (offset % 4 != 0)
+		return false;
+
+	return true;
+}
+
+static ssize_t
+i915_l3_read(struct file *filp, struct kobject *kobj,
+	     struct bin_attribute *attr, char *buf,
+	     loff_t offset, size_t count)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
+	struct drm_device *drm_dev = dminor->dev;
+	struct drm_i915_private *dev_priv = drm_dev->dev_private;
+	int i;
+
+	if (is_l3_access_valid(drm_dev, offset))
+		return 0;
+
+	if (i915_mutex_lock_interruptible(drm_dev))
+		return 0;
+
+	I915_WRITE(GEN7_MISCCPCTL,
+		   I915_READ(GEN7_MISCCPCTL) & ~GEN7_DOP_CLOCK_GATE_ENABLE);
+	POSTING_READ(GEN7_MISCCPCTL);
+
+	for (i = offset; count >= 4 && i < GEN7_L3LOG_SIZE; i += 4, count -= 4)
+		buf[i] = I915_READ(GEN7_L3LOG_BASE + i);
+
+	I915_WRITE(GEN7_MISCCPCTL,
+		   I915_READ(GEN7_MISCCPCTL) | GEN7_DOP_CLOCK_GATE_ENABLE);
+	POSTING_READ(GEN7_MISCCPCTL);
+
+	mutex_unlock(&drm_dev->struct_mutex);
+
+	return i - offset;
+}
+
+static ssize_t
+i915_l3_write(struct file *filp, struct kobject *kobj,
+	      struct bin_attribute *attr, char *buf,
+	      loff_t offset, size_t count)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct drm_minor *dminor = container_of(dev, struct drm_minor, kdev);
+	struct drm_device *drm_dev = dminor->dev;
+	struct drm_i915_private *dev_priv = drm_dev->dev_private;
+	int i;
+
+	if (is_l3_access_valid(drm_dev, offset))
+		return 0;
+
+	if (i915_mutex_lock_interruptible(drm_dev))
+		return 0;
+
+	I915_WRITE(GEN7_MISCCPCTL,
+		   I915_READ(GEN7_MISCCPCTL) & ~GEN7_DOP_CLOCK_GATE_ENABLE);
+	POSTING_READ(GEN7_MISCCPCTL);
+
+
+	for (i = offset; count >= 4 && i < GEN7_L3LOG_SIZE; i += 4, count -= 4) {
+		uint32_t data = *(uint32_t *)(&buf[i]);
+		I915_WRITE(GEN7_L3LOG_BASE + i, data);
+	}
+
+	I915_WRITE(GEN7_MISCCPCTL,
+		   I915_READ(GEN7_MISCCPCTL) | GEN7_DOP_CLOCK_GATE_ENABLE);
+	POSTING_READ(GEN7_MISCCPCTL);
+
+	mutex_unlock(&drm_dev->struct_mutex);
+
+	return 0;
+}
+
+static struct bin_attribute dpf_attrs = {
+	.attr = {.name = "l3_parity", .mode = (S_IRWXU)},
+	.size = GEN7_L3LOG_SIZE,
+	.read = i915_l3_read,
+	.write = i915_l3_write,
+	.mmap = NULL
+};
+
 void i915_setup_sysfs(struct drm_device *dev)
 {
 	int ret;
 
-	/* ILK doesn't have any residency information */
+	/* ILK and below don't yet have relevant sysfs files */
 	if (INTEL_INFO(dev)->gen < 6)
 		return;
 
 	ret = sysfs_merge_group(&dev->primary->kdev.kobj, &rc6_attr_group);
 	if (ret)
-		DRM_ERROR("sysfs setup failed\n");
+		DRM_ERROR("RC6 residency sysfs setup failed\n");
+
+	if (!IS_IVYBRIDGE(dev))
+		return;
+
+	ret = device_create_bin_file(&dev->primary->kdev, &dpf_attrs);
+	if (ret)
+		DRM_ERROR("l3 parity sysfs setup failed\n");
 }
 
 void i915_teardown_sysfs(struct drm_device *dev)
 {
+	device_remove_bin_file(&dev->primary->kdev,  &dpf_attrs);
 	sysfs_unmerge_group(&dev->primary->kdev.kobj, &rc6_attr_group);
 }
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/i915: l3 parity sysfs interface
  2012-04-13 23:05 ` [PATCH 2/2] drm/i915: l3 parity sysfs interface Ben Widawsky
@ 2012-04-13 23:24   ` Chris Wilson
  2012-04-14  1:39     ` Ben Widawsky
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2012-04-13 23:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Ben Widawsky

On Fri, 13 Apr 2012 16:05:14 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> Dumb binary interfaces which allow root-only updates of our cache
> remapping registers.  See intel-gpu-tools for how this can/should be
> used.

Initial comments: don't bother posting a read just before a read, and do
return errors from the sysfs read/write functions (the return value is
signed for that purpose). A lesser issue is that if you are worried
about necessity of posting-reads, you should also worry about the effect of
the weak ordering of writes.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/i915: l3 parity sysfs interface
  2012-04-13 23:24   ` Chris Wilson
@ 2012-04-14  1:39     ` Ben Widawsky
  2012-04-14  1:42       ` Ben Widawsky
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Widawsky @ 2012-04-14  1:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Ben Widawsky

On Sat, 14 Apr 2012 00:24:03 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Fri, 13 Apr 2012 16:05:14 -0700, Ben Widawsky <ben@bwidawsk.net>
> wrote:
> > Dumb binary interfaces which allow root-only updates of our cache
> > remapping registers.  See intel-gpu-tools for how this can/should be
> > used.
> 
> Initial comments: don't bother posting a read just before a read, and
> do return errors from the sysfs read/write functions (the return
> value is signed for that purpose). A lesser issue is that if you are
> worried about necessity of posting-reads, you should also worry about
> the effect of the weak ordering of writes.
> -Chris
> 

Thanks for the advice on the sysfs return values. Some of the code I
was referring from just returns 0, but this seems to be not the right
thing to do after looking further into it.

As for the POSTING_READ it was intentional. Unless reads are always in
order??? I have to make sure the clock gating disable propogates before
I can read the L3 registers.

Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] drm/i915: l3 parity sysfs interface
  2012-04-14  1:39     ` Ben Widawsky
@ 2012-04-14  1:42       ` Ben Widawsky
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Widawsky @ 2012-04-14  1:42 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx, Ben Widawsky

On Fri, 13 Apr 2012 18:39:24 -0700
Ben Widawsky <ben@bwidawsk.net> wrote:

> On Sat, 14 Apr 2012 00:24:03 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > On Fri, 13 Apr 2012 16:05:14 -0700, Ben Widawsky <ben@bwidawsk.net>
> > wrote:
> > > Dumb binary interfaces which allow root-only updates of our cache
> > > remapping registers.  See intel-gpu-tools for how this can/should
> > > be used.
> > 
> > Initial comments: don't bother posting a read just before a read,
> > and do return errors from the sysfs read/write functions (the return
> > value is signed for that purpose). A lesser issue is that if you are
> > worried about necessity of posting-reads, you should also worry
> > about the effect of the weak ordering of writes.
> > -Chris
> > 
> 
> Thanks for the advice on the sysfs return values. Some of the code I
> was referring from just returns 0, but this seems to be not the right
> thing to do after looking further into it.
> 
> As for the POSTING_READ it was intentional. Unless reads are always in
> order??? I have to make sure the clock gating disable propogates
> before I can read the L3 registers.
>

On further thought, yeah - you are right.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-04-14  1:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-13 23:05 [PATCH 0/2] [RFC] Dynamic parity detection Ben Widawsky
2012-04-13 23:05 ` [PATCH 1/2] drm/i915: Dynamic Parity Detection handling Ben Widawsky
2012-04-13 23:05 ` [PATCH 2/2] drm/i915: l3 parity sysfs interface Ben Widawsky
2012-04-13 23:24   ` Chris Wilson
2012-04-14  1:39     ` Ben Widawsky
2012-04-14  1:42       ` Ben Widawsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.