* [PATCH v2 0/3] hangcheck fixes
@ 2011-10-04 21:11 Ben Widawsky
2011-10-04 21:11 ` [PATCH 1/3] drm/i915: collect more per ring error info Ben Widawsky
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Ben Widawsky @ 2011-10-04 21:11 UTC (permalink / raw)
To: intel-gfx; +Cc: Ben Widawsky
I split out a lot more of the per ring registers in this version, as
well as added the patch to remove the kick_ring, which AFAICT doesn't
really help anything.
Patch 3 can be dropped if not desired.
Ben Widawsky (3):
drm/i915: collect more per ring error info
drm/i915: check acthd for all rings
drm/i915: remove ring kick
drivers/gpu/drm/i915/i915_debugfs.c | 44 +++++---
drivers/gpu/drm/i915/i915_drv.h | 33 ++----
drivers/gpu/drm/i915/i915_irq.c | 200 ++++++++++++++++++++---------------
drivers/gpu/drm/i915/i915_reg.h | 14 +++-
4 files changed, 164 insertions(+), 127 deletions(-)
--
1.7.6.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] drm/i915: collect more per ring error info
2011-10-04 21:11 [PATCH v2 0/3] hangcheck fixes Ben Widawsky
@ 2011-10-04 21:11 ` Ben Widawsky
2011-10-07 13:52 ` Daniel Vetter
2011-10-04 21:11 ` [PATCH 2/3] drm/i915: check acthd for all rings Ben Widawsky
2011-10-04 21:11 ` [PATCH 3/3] drm/i915: remove ring kick Ben Widawsky
2 siblings, 1 reply; 8+ messages in thread
From: Ben Widawsky @ 2011-10-04 21:11 UTC (permalink / raw)
To: intel-gfx; +Cc: Ben Widawsky
V1: Just added per ring page faults
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
V2: Added much more per ring stuff
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/i915_debugfs.c | 44 +++++++++++++++----------
drivers/gpu/drm/i915/i915_drv.h | 28 ++++++----------
drivers/gpu/drm/i915/i915_irq.c | 60 ++++++++++++++++++++--------------
drivers/gpu/drm/i915/i915_reg.h | 14 +++++++-
4 files changed, 84 insertions(+), 62 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8e95d66..332a418 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -747,34 +747,42 @@ static int i915_error_state(struct seq_file *m, void *unused)
seq_printf(m, "Time: %ld s %ld us\n", error->time.tv_sec,
error->time.tv_usec);
seq_printf(m, "PCI ID: 0x%04x\n", dev->pci_device);
- seq_printf(m, "EIR: 0x%08x\n", error->eir);
+ seq_printf(m, "EIR: 0x%08x\n", error->eir[RCS]);
seq_printf(m, "PGTBL_ER: 0x%08x\n", error->pgtbl_er);
if (INTEL_INFO(dev)->gen >= 6) {
+ seq_printf(m, "GFX Page Fault: 0x%08x\n", error->page_fault[RCS]);
+ seq_printf(m, "Media Page Fault: 0x%08x\n", error->page_fault[VCS]);
+ seq_printf(m, "Blitter Page Fault: 0x%08x\n", error->page_fault[BCS]);
seq_printf(m, "ERROR: 0x%08x\n", error->error);
seq_printf(m, "Blitter command stream:\n");
- seq_printf(m, " ACTHD: 0x%08x\n", error->bcs_acthd);
- seq_printf(m, " IPEIR: 0x%08x\n", error->bcs_ipeir);
- seq_printf(m, " IPEHR: 0x%08x\n", error->bcs_ipehr);
- seq_printf(m, " INSTDONE: 0x%08x\n", error->bcs_instdone);
- seq_printf(m, " seqno: 0x%08x\n", error->bcs_seqno);
+ seq_printf(m, " ACTHD: 0x%08x\n", error->acthd[BCS]);
+ seq_printf(m, " IPEIR: 0x%08x\n", error->ipeir[BCS]);
+ seq_printf(m, " IPEHR: 0x%08x\n", error->ipehr[BCS]);
+ seq_printf(m, " INSTDONE: 0x%08x\n", error->instdone[BCS]);
+ seq_printf(m, " FADDR: 0x%08x\n", error->fadd[BCS]);
+ seq_printf(m, " INSTPS: 0x%08x\n", error->instps[BCS]);
+ seq_printf(m, " seqno: 0x%08x\n", error->seqno[BCS]);
seq_printf(m, "Video (BSD) command stream:\n");
- seq_printf(m, " ACTHD: 0x%08x\n", error->vcs_acthd);
- seq_printf(m, " IPEIR: 0x%08x\n", error->vcs_ipeir);
- seq_printf(m, " IPEHR: 0x%08x\n", error->vcs_ipehr);
- seq_printf(m, " INSTDONE: 0x%08x\n", error->vcs_instdone);
- seq_printf(m, " seqno: 0x%08x\n", error->vcs_seqno);
+ seq_printf(m, " ACTHD: 0x%08x\n", error->acthd[VCS]);
+ seq_printf(m, " IPEIR: 0x%08x\n", error->ipeir[VCS]);
+ seq_printf(m, " IPEHR: 0x%08x\n", error->ipehr[VCS]);
+ seq_printf(m, " INSTDONE: 0x%08x\n", error->instdone[VCS]);
+ seq_printf(m, " FADDR: 0x%08x\n", error->fadd[VCS]);
+ seq_printf(m, " INSTPS: 0x%08x\n", error->instps[VCS]);
+ seq_printf(m, " seqno: 0x%08x\n", error->seqno[VCS]);
}
seq_printf(m, "Render command stream:\n");
- seq_printf(m, " ACTHD: 0x%08x\n", error->acthd);
- seq_printf(m, " IPEIR: 0x%08x\n", error->ipeir);
- seq_printf(m, " IPEHR: 0x%08x\n", error->ipehr);
- seq_printf(m, " INSTDONE: 0x%08x\n", error->instdone);
+ seq_printf(m, " ACTHD: 0x%08x\n", error->acthd[RCS]);
+ seq_printf(m, " IPEIR: 0x%08x\n", error->ipeir[RCS]);
+ seq_printf(m, " IPEHR: 0x%08x\n", error->ipehr[RCS]);
+ seq_printf(m, " INSTDONE: 0x%08x\n", error->instdone[RCS]);
if (INTEL_INFO(dev)->gen >= 4) {
seq_printf(m, " INSTDONE1: 0x%08x\n", error->instdone1);
- seq_printf(m, " INSTPS: 0x%08x\n", error->instps);
+ seq_printf(m, " INSTPS: 0x%08x\n", error->instps[RCS]);
}
- seq_printf(m, " INSTPM: 0x%08x\n", error->instpm);
- seq_printf(m, " seqno: 0x%08x\n", error->seqno);
+ seq_printf(m, " INSTPM: 0x%08x\n", error->instpm[RCS]);
+ seq_printf(m, " FADDR: 0x%08x\n", error->fadd[RCS]);
+ seq_printf(m, " seqno: 0x%08x\n", error->seqno[RCS]);
for (i = 0; i < dev_priv->num_fence_regs; i++)
seq_printf(m, " fence[%d] = %08llx\n", i, error->fence[i]);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 15c0ca5..567275c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -146,28 +146,20 @@ struct sdvo_device_mapping {
struct intel_display_error_state;
struct drm_i915_error_state {
- u32 eir;
+ u32 eir[I915_NUM_RINGS];
u32 pgtbl_er;
u32 pipestat[I915_MAX_PIPES];
- u32 ipeir;
- u32 ipehr;
- u32 instdone;
- u32 acthd;
+ u32 ipeir[I915_NUM_RINGS];
+ u32 ipehr[I915_NUM_RINGS];
+ u32 instdone[I915_NUM_RINGS];
+ u32 acthd[I915_NUM_RINGS];
+ u32 page_fault[I915_NUM_RINGS];
+ u32 fadd[I915_NUM_RINGS];
u32 error; /* gen6+ */
- u32 bcs_acthd; /* gen6+ blt engine */
- u32 bcs_ipehr;
- u32 bcs_ipeir;
- u32 bcs_instdone;
- u32 bcs_seqno;
- u32 vcs_acthd; /* gen6+ bsd engine */
- u32 vcs_ipehr;
- u32 vcs_ipeir;
- u32 vcs_instdone;
- u32 vcs_seqno;
- u32 instpm;
- u32 instps;
+ u32 instpm[I915_NUM_RINGS];
+ u32 instps[I915_NUM_RINGS];
u32 instdone1;
- u32 seqno;
+ u32 seqno[I915_NUM_RINGS];
u64 bbaddr;
u64 fence[16];
struct timeval time;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 012732b..97e338b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -906,45 +906,55 @@ static void i915_capture_error_state(struct drm_device *dev)
DRM_INFO("capturing error event; look for more information in /debug/dri/%d/i915_error_state\n",
dev->primary->index);
- error->seqno = dev_priv->ring[RCS].get_seqno(&dev_priv->ring[RCS]);
- error->eir = I915_READ(EIR);
+ error->seqno[RCS] = dev_priv->ring[RCS].get_seqno(&dev_priv->ring[RCS]);
+ error->eir[RCS] = I915_READ(EIR);
error->pgtbl_er = I915_READ(PGTBL_ER);
for_each_pipe(pipe)
error->pipestat[pipe] = I915_READ(PIPESTAT(pipe));
- error->instpm = I915_READ(INSTPM);
+ error->instpm[RCS] = I915_READ(INSTPM);
error->error = 0;
if (INTEL_INFO(dev)->gen >= 6) {
error->error = I915_READ(ERROR_GEN6);
-
- error->bcs_acthd = I915_READ(BCS_ACTHD);
- error->bcs_ipehr = I915_READ(BCS_IPEHR);
- error->bcs_ipeir = I915_READ(BCS_IPEIR);
- error->bcs_instdone = I915_READ(BCS_INSTDONE);
- error->bcs_seqno = 0;
+ error->page_fault[RCS] = I915_READ(GEN6_FAULT);
+ error->fadd[RCS] = I915_READ(GEN6_DMA_FADD);
+
+ error->eir[BCS] = I915_READ(GEN6_BLITTER_EIR);
+ error->acthd[BCS] = I915_READ(BCS_ACTHD);
+ error->ipehr[BCS] = I915_READ(BCS_IPEHR);
+ error->ipeir[BCS] = I915_READ(BCS_IPEIR);
+ error->fadd[BCS] = I915_READ(GEN6_BCS_FADD);
+ error->instdone[BCS] = I915_READ(BCS_INSTDONE);
+ error->instps[BCS] = I915_READ(GEN6_BCS_INSTPS);
+ error->seqno[BCS] = 0;
if (dev_priv->ring[BCS].get_seqno)
- error->bcs_seqno = dev_priv->ring[BCS].get_seqno(&dev_priv->ring[BCS]);
-
- error->vcs_acthd = I915_READ(VCS_ACTHD);
- error->vcs_ipehr = I915_READ(VCS_IPEHR);
- error->vcs_ipeir = I915_READ(VCS_IPEIR);
- error->vcs_instdone = I915_READ(VCS_INSTDONE);
- error->vcs_seqno = 0;
+ error->seqno[BCS] = dev_priv->ring[BCS].get_seqno(&dev_priv->ring[BCS]);
+ error->page_fault[BCS] = I915_READ(GEN6_BLT_FAULT);
+
+ error->eir[VCS] = I915_READ(GEN6_BSD_EIR);
+ error->acthd[VCS] = I915_READ(VCS_ACTHD);
+ error->ipehr[VCS] = I915_READ(VCS_IPEHR);
+ error->ipeir[VCS] = I915_READ(VCS_IPEIR);
+ error->fadd[VCS] = I915_READ(GEN6_VCS_FADD);
+ error->instdone[VCS] = I915_READ(VCS_INSTDONE);
+ error->instps[VCS] = I915_READ(GEN6_VCS_INSTPS);
+ error->seqno[VCS] = 0;
if (dev_priv->ring[VCS].get_seqno)
- error->vcs_seqno = dev_priv->ring[VCS].get_seqno(&dev_priv->ring[VCS]);
+ error->seqno[VCS] = dev_priv->ring[VCS].get_seqno(&dev_priv->ring[VCS]);
+ error->page_fault[VCS] = I915_READ(GEN6_BSD_FAULT);
}
if (INTEL_INFO(dev)->gen >= 4) {
- error->ipeir = I915_READ(IPEIR_I965);
- error->ipehr = I915_READ(IPEHR_I965);
- error->instdone = I915_READ(INSTDONE_I965);
- error->instps = I915_READ(INSTPS);
+ error->ipeir[RCS] = I915_READ(IPEIR_I965);
+ error->ipehr[RCS] = I915_READ(IPEHR_I965);
+ error->instdone[RCS] = I915_READ(INSTDONE_I965);
+ error->instps[RCS] = I915_READ(INSTPS);
error->instdone1 = I915_READ(INSTDONE1);
error->acthd = I915_READ(ACTHD_I965);
error->bbaddr = I915_READ64(BB_ADDR);
} else {
- error->ipeir = I915_READ(IPEIR);
- error->ipehr = I915_READ(IPEHR);
- error->instdone = I915_READ(INSTDONE);
- error->acthd = I915_READ(ACTHD);
+ error->ipeir[RCS] = I915_READ(IPEIR);
+ error->ipehr[RCS] = I915_READ(IPEHR);
+ error->instdone[RCS] = I915_READ(INSTDONE);
+ error->acthd[RCS] = I915_READ(ACTHD);
error->bbaddr = 0;
}
i915_gem_record_fences(dev, error);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 138eae1..ca28efc 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,7 @@
#define IPEHR_I965 0x02068
#define INSTDONE_I965 0x0206c
#define INSTPS 0x02070 /* 965+ only */
+#define GEN6_DMA_FADD 0x02078
#define INSTDONE1 0x0207c /* 965+ only */
#define ACTHD_I965 0x02074
#define HWS_PGA 0x02080
@@ -362,12 +363,21 @@
#define VCS_INSTDONE 0x1206C
#define VCS_IPEIR 0x12064
#define VCS_IPEHR 0x12068
+#define GEN6_VCS_INSTPS 0x12070 /* Not clear if this is valid */
#define VCS_ACTHD 0x12074
+#define GEN6_VCS_FADD 0x12078
+#define GEN6_VCS_INSTDONE 0x12090
#define BCS_INSTDONE 0x2206C
#define BCS_IPEIR 0x22064
#define BCS_IPEHR 0x22068
+#define GEN6_BCS_INSTPS 0x22070
#define BCS_ACTHD 0x22074
+#define GEN6_BCS_FADD 0x22078
+#define GEN6_BCS_INSTDONE 0x22090
+#define GEN6_FAULT 0x04094
+#define GEN6_BSD_FAULT 0x04194
+#define GEN6_BLT_FAULT 0x04294
#define ERROR_GEN6 0x040a0
/* GM45+ chicken bits -- debug workaround bits that may be required
@@ -545,6 +555,8 @@
#define GEN6_BLITTER_COMMAND_PARSER_MASTER_ERROR (1 << 25)
#define GEN6_BLITTER_SYNC_STATUS (1 << 24)
#define GEN6_BLITTER_USER_INTERRUPT (1 << 22)
+#define GEN6_BLITTER_EIR 0x220b0
+#define GEN6_BLITTER_EMR 0x220b4
#define GEN6_BLITTER_ECOSKPD 0x221d0
#define GEN6_BLITTER_LOCK_SHIFT 16
@@ -559,7 +571,7 @@
#define GEN6_BSD_HWSTAM 0x12098
#define GEN6_BSD_IMR 0x120a8
#define GEN6_BSD_USER_INTERRUPT (1 << 12)
-
+#define GEN6_BSD_EIR 0x120b0
#define GEN6_BSD_RNCID 0x12198
/*
--
1.7.6.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] drm/i915: check acthd for all rings
2011-10-04 21:11 [PATCH v2 0/3] hangcheck fixes Ben Widawsky
2011-10-04 21:11 ` [PATCH 1/3] drm/i915: collect more per ring error info Ben Widawsky
@ 2011-10-04 21:11 ` Ben Widawsky
2011-10-07 13:59 ` Daniel Vetter
2011-10-04 21:11 ` [PATCH 3/3] drm/i915: remove ring kick Ben Widawsky
2 siblings, 1 reply; 8+ messages in thread
From: Ben Widawsky @ 2011-10-04 21:11 UTC (permalink / raw)
To: intel-gfx; +Cc: Ben Widawsky
On Gen6+ we have other rings which may be in use. We haven't hung if the
blit or media ring is still going
Before rebase:
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/i915_drv.h | 5 +-
drivers/gpu/drm/i915/i915_irq.c | 143 +++++++++++++++++++++++++++------------
2 files changed, 102 insertions(+), 46 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 567275c..edfa8be 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -323,9 +323,8 @@ typedef struct drm_i915_private {
#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
struct timer_list hangcheck_timer;
int hangcheck_count;
- uint32_t last_acthd;
- uint32_t last_instdone;
- uint32_t last_instdone1;
+ uint32_t last_acthd[I915_NUM_RINGS];
+ uint64_t last_instdone[I915_NUM_RINGS];
unsigned long cfb_size;
unsigned int cfb_fb;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 97e338b..6b6abe1 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -948,7 +948,7 @@ static void i915_capture_error_state(struct drm_device *dev)
error->instdone[RCS] = I915_READ(INSTDONE_I965);
error->instps[RCS] = I915_READ(INSTPS);
error->instdone1 = I915_READ(INSTDONE1);
- error->acthd = I915_READ(ACTHD_I965);
+ error->acthd[RCS] = I915_READ(ACTHD_I965);
error->bbaddr = I915_READ64(BB_ADDR);
} else {
error->ipeir[RCS] = I915_READ(IPEIR);
@@ -1666,6 +1666,85 @@ static bool kick_ring(struct intel_ring_buffer *ring)
return false;
}
+static bool
+instdone_stuck(struct drm_device *dev)
+{
+ struct drm_i915_private *dev_priv = dev->dev_private;
+ uint64_t instdone = 0, instdone1 = 0, vcs_instdone = 0, bcs_instdone = 0;
+ bool stuck;
+
+ switch (INTEL_INFO(dev)->gen) {
+ case 7:
+ case 6:
+ bcs_instdone = I915_READ(BCS_INSTDONE);
+ case 5:
+ vcs_instdone = I915_READ(VCS_INSTDONE);
+ case 4:
+ instdone = I915_READ(INSTDONE_I965);
+ instdone1 = I915_READ(INSTDONE1);
+ break;
+ case 3:
+ case 2:
+ instdone = I915_READ(INSTDONE);
+ break;
+ }
+
+ stuck =
+ (dev_priv->last_instdone[RCS] == ((instdone << 32) | instdone1)) &&
+ (dev_priv->last_instdone[VCS] == vcs_instdone) &&
+ (dev_priv->last_instdone[BCS] == bcs_instdone);
+
+ dev_priv->last_instdone[RCS] = (instdone << 32) | instdone1;
+ dev_priv->last_instdone[VCS] = vcs_instdone;
+ dev_priv->last_instdone[BCS] = bcs_instdone;
+
+ return stuck;
+}
+
+static bool
+acthd_stuck(struct drm_device *dev)
+{
+ struct drm_i915_private *dev_priv = dev->dev_private;
+ uint32_t acthd = 0, vcs_acthd = 0, bcs_acthd = 0;
+ bool stuck = false;
+
+ switch (INTEL_INFO(dev)->gen) {
+ case 7:
+ case 6:
+ bcs_acthd = intel_ring_get_active_head(&dev_priv->ring[BCS]);
+ case 5:
+ vcs_acthd = intel_ring_get_active_head(&dev_priv->ring[VCS]);
+ case 4:
+ case 3:
+ case 2:
+ acthd = intel_ring_get_active_head(&dev_priv->ring[RCS]);
+ break;
+ }
+
+ stuck =
+ (dev_priv->last_acthd[RCS] == acthd) &&
+ (dev_priv->last_acthd[VCS] == vcs_acthd) &&
+ (dev_priv->last_acthd[BCS] == bcs_acthd);
+
+ dev_priv->last_acthd[RCS] = acthd;
+ dev_priv->last_acthd[VCS] = vcs_acthd;
+ dev_priv->last_acthd[BCS] = bcs_acthd;
+
+ return stuck;
+}
+
+static bool gpu_stuck(struct drm_device *dev)
+{
+ #define NUM_HANGCHECKS_TO_RESET 1
+
+ struct drm_i915_private *dev_priv = dev->dev_private;
+
+ if (dev_priv->hangcheck_count++ < NUM_HANGCHECKS_TO_RESET)
+ return false;
+
+ return acthd_stuck(dev) && instdone_stuck(dev);
+}
+
/**
* This is called when the chip hasn't reported back with completed
* batchbuffers in a long time. The first time this is called we simply record
@@ -1676,13 +1755,11 @@ void i915_hangcheck_elapsed(unsigned long data)
{
struct drm_device *dev = (struct drm_device *)data;
drm_i915_private_t *dev_priv = dev->dev_private;
- uint32_t acthd, instdone, instdone1;
bool err = false;
if (!i915_enable_hangcheck)
return;
- /* If all work is done then ACTHD clearly hasn't advanced. */
if (i915_hangcheck_ring_idle(&dev_priv->ring[RCS], &err) &&
i915_hangcheck_ring_idle(&dev_priv->ring[VCS], &err) &&
i915_hangcheck_ring_idle(&dev_priv->ring[BCS], &err)) {
@@ -1692,50 +1769,30 @@ void i915_hangcheck_elapsed(unsigned long data)
return;
}
- if (INTEL_INFO(dev)->gen < 4) {
- acthd = I915_READ(ACTHD);
- instdone = I915_READ(INSTDONE);
- instdone1 = 0;
- } else {
- acthd = I915_READ(ACTHD_I965);
- instdone = I915_READ(INSTDONE_I965);
- instdone1 = I915_READ(INSTDONE1);
- }
-
- if (dev_priv->last_acthd == acthd &&
- dev_priv->last_instdone == instdone &&
- dev_priv->last_instdone1 == instdone1) {
- if (dev_priv->hangcheck_count++ > 1) {
- DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
+ if (gpu_stuck(dev)) {
+ DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
- if (!IS_GEN2(dev)) {
- /* Is the chip hanging on a WAIT_FOR_EVENT?
- * If so we can simply poke the RB_WAIT bit
- * and break the hang. This should work on
- * all but the second generation chipsets.
- */
-
- if (kick_ring(&dev_priv->ring[RCS]))
- goto repeat;
+ if (!IS_GEN2(dev)) {
+ /* Is the chip hanging on a WAIT_FOR_EVENT?
+ * If so we can simply poke the RB_WAIT bit
+ * and break the hang. This should work on
+ * all but the second generation chipsets.
+ */
- if (HAS_BSD(dev) &&
- kick_ring(&dev_priv->ring[VCS]))
- goto repeat;
+ if (kick_ring(&dev_priv->ring[RCS]))
+ goto repeat;
- if (HAS_BLT(dev) &&
- kick_ring(&dev_priv->ring[BCS]))
- goto repeat;
- }
+ if (HAS_BSD(dev) &&
+ kick_ring(&dev_priv->ring[VCS]))
+ goto repeat;
- i915_handle_error(dev, true);
- return;
+ if (HAS_BLT(dev) &&
+ kick_ring(&dev_priv->ring[BCS]))
+ goto repeat;
}
- } else {
- dev_priv->hangcheck_count = 0;
- dev_priv->last_acthd = acthd;
- dev_priv->last_instdone = instdone;
- dev_priv->last_instdone1 = instdone1;
+ i915_handle_error(dev, true);
+ return;
}
repeat:
--
1.7.6.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] drm/i915: remove ring kick
2011-10-04 21:11 [PATCH v2 0/3] hangcheck fixes Ben Widawsky
2011-10-04 21:11 ` [PATCH 1/3] drm/i915: collect more per ring error info Ben Widawsky
2011-10-04 21:11 ` [PATCH 2/3] drm/i915: check acthd for all rings Ben Widawsky
@ 2011-10-04 21:11 ` Ben Widawsky
2011-10-07 14:00 ` Daniel Vetter
2 siblings, 1 reply; 8+ messages in thread
From: Ben Widawsky @ 2011-10-04 21:11 UTC (permalink / raw)
To: intel-gfx; +Cc: Ben Widawsky
Ring kick was racy (forcewake), and doesn't really do anything for us
that shouldn't be done by reset.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
drivers/gpu/drm/i915/i915_irq.c | 41 ---------------------------------------
1 files changed, 0 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6b6abe1..6e168b6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1645,27 +1645,6 @@ static bool i915_hangcheck_ring_idle(struct intel_ring_buffer *ring, bool *err)
return false;
}
-static bool kick_ring(struct intel_ring_buffer *ring)
-{
- struct drm_device *dev = ring->dev;
- struct drm_i915_private *dev_priv = dev->dev_private;
- u32 tmp = I915_READ_CTL(ring);
- if (tmp & RING_WAIT) {
- DRM_ERROR("Kicking stuck wait on %s\n",
- ring->name);
- I915_WRITE_CTL(ring, tmp);
- return true;
- }
- if (IS_GEN6(dev) &&
- (tmp & RING_WAIT_SEMAPHORE)) {
- DRM_ERROR("Kicking stuck semaphore on %s\n",
- ring->name);
- I915_WRITE_CTL(ring, tmp);
- return true;
- }
- return false;
-}
-
static bool
instdone_stuck(struct drm_device *dev)
{
@@ -1771,26 +1750,6 @@ void i915_hangcheck_elapsed(unsigned long data)
if (gpu_stuck(dev)) {
DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
-
- if (!IS_GEN2(dev)) {
- /* Is the chip hanging on a WAIT_FOR_EVENT?
- * If so we can simply poke the RB_WAIT bit
- * and break the hang. This should work on
- * all but the second generation chipsets.
- */
-
- if (kick_ring(&dev_priv->ring[RCS]))
- goto repeat;
-
- if (HAS_BSD(dev) &&
- kick_ring(&dev_priv->ring[VCS]))
- goto repeat;
-
- if (HAS_BLT(dev) &&
- kick_ring(&dev_priv->ring[BCS]))
- goto repeat;
- }
-
i915_handle_error(dev, true);
return;
}
--
1.7.6.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] drm/i915: collect more per ring error info
2011-10-04 21:11 ` [PATCH 1/3] drm/i915: collect more per ring error info Ben Widawsky
@ 2011-10-07 13:52 ` Daniel Vetter
2011-10-07 16:19 ` Chris Wilson
0 siblings, 1 reply; 8+ messages in thread
From: Daniel Vetter @ 2011-10-07 13:52 UTC (permalink / raw)
To: Ben Widawsky; +Cc: intel-gfx
On Tue, Oct 04, 2011 at 02:11:51PM -0700, Ben Widawsky wrote:
> V1: Just added per ring page faults
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> V2: Added much more per ring stuff
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
While you wrestle this, can we cut down on the line-noise a bit by using
ring->mmio_base an per-ring functions? By storing VCS, BCS, RCS in
ring->id instead of the flags that are currently there this should reduce
quite a bit.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] drm/i915: check acthd for all rings
2011-10-04 21:11 ` [PATCH 2/3] drm/i915: check acthd for all rings Ben Widawsky
@ 2011-10-07 13:59 ` Daniel Vetter
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Vetter @ 2011-10-07 13:59 UTC (permalink / raw)
To: Ben Widawsky; +Cc: intel-gfx
On Tue, Oct 04, 2011 at 02:11:52PM -0700, Ben Widawsky wrote:
> On Gen6+ we have other rings which may be in use. We haven't hung if the
> blit or media ring is still going
>
> Before rebase:
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
I've spotted another goof-up besides the instdone_stuck && acthd_stuck
I've overlooked last time around.
> ---
> drivers/gpu/drm/i915/i915_drv.h | 5 +-
> drivers/gpu/drm/i915/i915_irq.c | 143 +++++++++++++++++++++++++++------------
> 2 files changed, 102 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 567275c..edfa8be 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -323,9 +323,8 @@ typedef struct drm_i915_private {
> #define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
> struct timer_list hangcheck_timer;
> int hangcheck_count;
> - uint32_t last_acthd;
> - uint32_t last_instdone;
> - uint32_t last_instdone1;
> + uint32_t last_acthd[I915_NUM_RINGS];
> + uint64_t last_instdone[I915_NUM_RINGS];
>
> unsigned long cfb_size;
> unsigned int cfb_fb;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 97e338b..6b6abe1 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -948,7 +948,7 @@ static void i915_capture_error_state(struct drm_device *dev)
> error->instdone[RCS] = I915_READ(INSTDONE_I965);
> error->instps[RCS] = I915_READ(INSTPS);
> error->instdone1 = I915_READ(INSTDONE1);
> - error->acthd = I915_READ(ACTHD_I965);
> + error->acthd[RCS] = I915_READ(ACTHD_I965);
> error->bbaddr = I915_READ64(BB_ADDR);
> } else {
> error->ipeir[RCS] = I915_READ(IPEIR);
> @@ -1666,6 +1666,85 @@ static bool kick_ring(struct intel_ring_buffer *ring)
> return false;
> }
>
> +static bool
> +instdone_stuck(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> + uint64_t instdone = 0, instdone1 = 0, vcs_instdone = 0, bcs_instdone = 0;
> + bool stuck;
> +
> + switch (INTEL_INFO(dev)->gen) {
> + case 7:
> + case 6:
> + bcs_instdone = I915_READ(BCS_INSTDONE);
> + case 5:
> + vcs_instdone = I915_READ(VCS_INSTDONE);
> + case 4:
> + instdone = I915_READ(INSTDONE_I965);
> + instdone1 = I915_READ(INSTDONE1);
> + break;
> + case 3:
> + case 2:
> + instdone = I915_READ(INSTDONE);
> + break;
> + }
> +
> + stuck =
> + (dev_priv->last_instdone[RCS] == ((instdone << 32) | instdone1)) &&
> + (dev_priv->last_instdone[VCS] == vcs_instdone) &&
> + (dev_priv->last_instdone[BCS] == bcs_instdone);
> +
> + dev_priv->last_instdone[RCS] = (instdone << 32) | instdone1;
> + dev_priv->last_instdone[VCS] = vcs_instdone;
> + dev_priv->last_instdone[BCS] = bcs_instdone;
> +
> + return stuck;
> +}
> +
> +static bool
> +acthd_stuck(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> + uint32_t acthd = 0, vcs_acthd = 0, bcs_acthd = 0;
> + bool stuck = false;
> +
> + switch (INTEL_INFO(dev)->gen) {
> + case 7:
> + case 6:
> + bcs_acthd = intel_ring_get_active_head(&dev_priv->ring[BCS]);
> + case 5:
> + vcs_acthd = intel_ring_get_active_head(&dev_priv->ring[VCS]);
> + case 4:
> + case 3:
> + case 2:
> + acthd = intel_ring_get_active_head(&dev_priv->ring[RCS]);
> + break;
> + }
> +
> + stuck =
> + (dev_priv->last_acthd[RCS] == acthd) &&
> + (dev_priv->last_acthd[VCS] == vcs_acthd) &&
> + (dev_priv->last_acthd[BCS] == bcs_acthd);
> +
> + dev_priv->last_acthd[RCS] = acthd;
> + dev_priv->last_acthd[VCS] = vcs_acthd;
> + dev_priv->last_acthd[BCS] = bcs_acthd;
> +
> + return stuck;
> +}
> +
> +static bool gpu_stuck(struct drm_device *dev)
> +{
> + #define NUM_HANGCHECKS_TO_RESET 1
> +
> + struct drm_i915_private *dev_priv = dev->dev_private;
> +
> + if (dev_priv->hangcheck_count++ < NUM_HANGCHECKS_TO_RESET)
> + return false;
> +
> + return acthd_stuck(dev) && instdone_stuck(dev);
First check whether the gpu ist stuck, then increment the hangcheck_count
If the gpu is not stuck, we also need to clear the hangcheck_count again.
> +}
> +
> /**
> * This is called when the chip hasn't reported back with completed
> * batchbuffers in a long time. The first time this is called we simply record
> @@ -1676,13 +1755,11 @@ void i915_hangcheck_elapsed(unsigned long data)
> {
> struct drm_device *dev = (struct drm_device *)data;
> drm_i915_private_t *dev_priv = dev->dev_private;
> - uint32_t acthd, instdone, instdone1;
> bool err = false;
>
> if (!i915_enable_hangcheck)
> return;
>
> - /* If all work is done then ACTHD clearly hasn't advanced. */
> if (i915_hangcheck_ring_idle(&dev_priv->ring[RCS], &err) &&
> i915_hangcheck_ring_idle(&dev_priv->ring[VCS], &err) &&
> i915_hangcheck_ring_idle(&dev_priv->ring[BCS], &err)) {
> @@ -1692,50 +1769,30 @@ void i915_hangcheck_elapsed(unsigned long data)
> return;
> }
>
> - if (INTEL_INFO(dev)->gen < 4) {
> - acthd = I915_READ(ACTHD);
> - instdone = I915_READ(INSTDONE);
> - instdone1 = 0;
> - } else {
> - acthd = I915_READ(ACTHD_I965);
> - instdone = I915_READ(INSTDONE_I965);
> - instdone1 = I915_READ(INSTDONE1);
> - }
> -
> - if (dev_priv->last_acthd == acthd &&
> - dev_priv->last_instdone == instdone &&
> - dev_priv->last_instdone1 == instdone1) {
> - if (dev_priv->hangcheck_count++ > 1) {
> - DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
> + if (gpu_stuck(dev)) {
> + DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
>
> - if (!IS_GEN2(dev)) {
> - /* Is the chip hanging on a WAIT_FOR_EVENT?
> - * If so we can simply poke the RB_WAIT bit
> - * and break the hang. This should work on
> - * all but the second generation chipsets.
> - */
> -
> - if (kick_ring(&dev_priv->ring[RCS]))
> - goto repeat;
> + if (!IS_GEN2(dev)) {
> + /* Is the chip hanging on a WAIT_FOR_EVENT?
> + * If so we can simply poke the RB_WAIT bit
> + * and break the hang. This should work on
> + * all but the second generation chipsets.
> + */
>
> - if (HAS_BSD(dev) &&
> - kick_ring(&dev_priv->ring[VCS]))
> - goto repeat;
> + if (kick_ring(&dev_priv->ring[RCS]))
> + goto repeat;
>
> - if (HAS_BLT(dev) &&
> - kick_ring(&dev_priv->ring[BCS]))
> - goto repeat;
> - }
> + if (HAS_BSD(dev) &&
> + kick_ring(&dev_priv->ring[VCS]))
> + goto repeat;
>
> - i915_handle_error(dev, true);
> - return;
> + if (HAS_BLT(dev) &&
> + kick_ring(&dev_priv->ring[BCS]))
> + goto repeat;
> }
> - } else {
> - dev_priv->hangcheck_count = 0;
>
> - dev_priv->last_acthd = acthd;
> - dev_priv->last_instdone = instdone;
> - dev_priv->last_instdone1 = instdone1;
> + i915_handle_error(dev, true);
> + return;
> }
>
> repeat:
> --
> 1.7.6.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 3/3] drm/i915: remove ring kick
2011-10-04 21:11 ` [PATCH 3/3] drm/i915: remove ring kick Ben Widawsky
@ 2011-10-07 14:00 ` Daniel Vetter
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Vetter @ 2011-10-07 14:00 UTC (permalink / raw)
To: Ben Widawsky; +Cc: intel-gfx
On Tue, Oct 04, 2011 at 02:11:53PM -0700, Ben Widawsky wrote:
> Ring kick was racy (forcewake), and doesn't really do anything for us
> that shouldn't be done by reset.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
> drivers/gpu/drm/i915/i915_irq.c | 41 ---------------------------------------
> 1 files changed, 0 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 6b6abe1..6e168b6 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1645,27 +1645,6 @@ static bool i915_hangcheck_ring_idle(struct intel_ring_buffer *ring, bool *err)
> return false;
> }
>
> -static bool kick_ring(struct intel_ring_buffer *ring)
> -{
> - struct drm_device *dev = ring->dev;
> - struct drm_i915_private *dev_priv = dev->dev_private;
> - u32 tmp = I915_READ_CTL(ring);
> - if (tmp & RING_WAIT) {
> - DRM_ERROR("Kicking stuck wait on %s\n",
> - ring->name);
> - I915_WRITE_CTL(ring, tmp);
> - return true;
> - }
> - if (IS_GEN6(dev) &&
> - (tmp & RING_WAIT_SEMAPHORE)) {
> - DRM_ERROR("Kicking stuck semaphore on %s\n",
> - ring->name);
> - I915_WRITE_CTL(ring, tmp);
> - return true;
> - }
As discussed on irc, I think we need to retain the kick ring stuck on
MI_WAIT for older, broken userspace. Also, kicking rings stuck on MI_WAITS
should be pretty harmless, at most we see a bit of gargabe on the screen.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] drm/i915: collect more per ring error info
2011-10-07 13:52 ` Daniel Vetter
@ 2011-10-07 16:19 ` Chris Wilson
0 siblings, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2011-10-07 16:19 UTC (permalink / raw)
To: Daniel Vetter, Ben Widawsky; +Cc: intel-gfx
On Fri, 7 Oct 2011 15:52:01 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Oct 04, 2011 at 02:11:51PM -0700, Ben Widawsky wrote:
> > V1: Just added per ring page faults
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > V2: Added much more per ring stuff
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>
> While you wrestle this, can we cut down on the line-noise a bit by using
> ring->mmio_base an per-ring functions? By storing VCS, BCS, RCS in
> ring->id instead of the flags that are currently there this should reduce
> quite a bit.
You beat me to it, I was also going to suggest this. Seconded ;-)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-10-07 16:19 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-04 21:11 [PATCH v2 0/3] hangcheck fixes Ben Widawsky
2011-10-04 21:11 ` [PATCH 1/3] drm/i915: collect more per ring error info Ben Widawsky
2011-10-07 13:52 ` Daniel Vetter
2011-10-07 16:19 ` Chris Wilson
2011-10-04 21:11 ` [PATCH 2/3] drm/i915: check acthd for all rings Ben Widawsky
2011-10-07 13:59 ` Daniel Vetter
2011-10-04 21:11 ` [PATCH 3/3] drm/i915: remove ring kick Ben Widawsky
2011-10-07 14:00 ` Daniel Vetter
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.