All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: [PATCH] drm/i915: add interface to simulate gpu hangs
Date: Thu,  3 May 2012 14:48:16 +0200	[thread overview]
Message-ID: <1336049296-5494-1-git-send-email-daniel.vetter@ffwll.ch> (raw)
In-Reply-To: <1335532667-10597-1-git-send-email-daniel.vetter@ffwll.ch>

gpu reset is a very important piece of our infrastructure.
Unfortunately we only really it test by actually hanging the gpu,
which often has bad side-effects for the entire system. And the gpu
hang handling code is one of the rather complicated pieces of code we
have, consisting of
- hang detection
- error capture
- actual gpu reset
- reset of all the gem bookkeeping
- reinitialition of the entire gpu

This patch adds a debugfs to selectively stopping rings by ceasing to
update the hw tail pointer, which will result in the gpu no longer
updating it's head pointer and eventually to the hangcheck firing.
This way we can exercise the gpu hang code under controlled conditions
without a dying gpu taking down the entire systems.

Patch motivated by me forgetting to properly reinitialize ppgtt after
a gpu reset.

Usage:

echo $((1 << $ringnum)) > i915_ring_stop # stops one ring

echo 0xffffffff > i915_ring_stop # stops all, future-proof version

then run whatever testload is desired. i915_ring_stop automatically
resets after a gpu hang is detected to avoid hanging the gpu to fast
and declaring it wedged.

v2: Incorporate feedback from Chris Wilson.

v3: Add the missing cleanup.

v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
Eugeni Dodonov.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   65 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.c         |    2 +
 drivers/gpu/drm/i915/i915_drv.h         |    2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 120db46..602e100 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1585,6 +1585,64 @@ static const struct file_operations i915_wedged_fops = {
 };
 
 static ssize_t
+i915_ring_stop_read(struct file *filp,
+		    char __user *ubuf,
+		    size_t max,
+		    loff_t *ppos)
+{
+	struct drm_device *dev = filp->private_data;
+	drm_i915_private_t *dev_priv = dev->dev_private;
+	char buf[20];
+	int len;
+
+	len = snprintf(buf, sizeof(buf),
+		       "0x%08x\n", dev_priv->stop_rings);
+
+	if (len > sizeof(buf))
+		len = sizeof(buf);
+
+	return simple_read_from_buffer(ubuf, max, ppos, buf, len);
+}
+
+static ssize_t
+i915_ring_stop_write(struct file *filp,
+		     const char __user *ubuf,
+		     size_t cnt,
+		     loff_t *ppos)
+{
+	struct drm_device *dev = filp->private_data;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	char buf[20];
+	int val = 0;
+
+	if (cnt > 0) {
+		if (cnt > sizeof(buf) - 1)
+			return -EINVAL;
+
+		if (copy_from_user(buf, ubuf, cnt))
+			return -EFAULT;
+		buf[cnt] = 0;
+
+		val = simple_strtoul(buf, NULL, 0);
+	}
+
+	DRM_DEBUG_DRIVER("Stopping rings 0x%08x\n", val);
+
+	mutex_lock(&dev->struct_mutex);
+	dev_priv->stop_rings = val;
+	mutex_unlock(&dev->struct_mutex);
+
+	return cnt;
+}
+
+static const struct file_operations i915_ring_stop_fops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.read = i915_ring_stop_read,
+	.write = i915_ring_stop_write,
+	.llseek = default_llseek,
+};
+static ssize_t
 i915_max_freq_read(struct file *filp,
 		   char __user *ubuf,
 		   size_t max,
@@ -1884,6 +1942,11 @@ int i915_debugfs_init(struct drm_minor *minor)
 				  &i915_cache_sharing_fops);
 	if (ret)
 		return ret;
+	ret = i915_debugfs_create(minor->debugfs_root, minor,
+				  "i915_ring_stop",
+				  &i915_ring_stop_fops);
+	if (ret)
+		return ret;
 
 	return drm_debugfs_create_files(i915_debugfs_list,
 					I915_DEBUGFS_ENTRIES,
@@ -1902,6 +1965,8 @@ void i915_debugfs_cleanup(struct drm_minor *minor)
 				 1, minor);
 	drm_debugfs_remove_files((struct drm_info_list *) &i915_cache_sharing_fops,
 				 1, minor);
+	drm_debugfs_remove_files((struct drm_info_list *) &i915_ring_stop_fops,
+				 1, minor);
 }
 
 #endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 8a98f9a..90a84f9 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -800,6 +800,8 @@ int i915_reset(struct drm_device *dev, u8 flags)
 	if (!mutex_trylock(&dev->struct_mutex))
 		return -EBUSY;
 
+	dev_priv->stop_rings = 0;
+
 	i915_gem_reset(dev);
 
 	ret = -ENODEV;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0095c8d..5e62b35 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -413,6 +413,8 @@ typedef struct drm_i915_private {
 	uint32_t last_instdone;
 	uint32_t last_instdone1;
 
+	unsigned int stop_rings;
+
 	unsigned long cfb_size;
 	unsigned int cfb_fb;
 	enum plane cfb_plane;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 427b7c5..d2de16d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1221,7 +1221,11 @@ int intel_ring_begin(struct intel_ring_buffer *ring,
 
 void intel_ring_advance(struct intel_ring_buffer *ring)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
 	ring->tail &= ring->size - 1;
+	if (dev_priv->stop_rings & intel_ring_flag(ring))
+		return;
 	ring->write_tail(ring, ring->tail);
 }
 
-- 
1.7.7.6

  parent reply	other threads:[~2012-05-03 13:52 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-27 13:17 [PATCH 01/10] drm/i915: add interface to simulate gpu hangs Daniel Vetter
2012-04-27 13:17 ` [PATCH 02/10] drm/i915: rework dev->first_error locking Daniel Vetter
2012-05-04 17:04   ` Eugeni Dodonov
2012-04-27 13:17 ` [PATCH 03/10] drm/i915: allow the existing error_state to be destroyed Daniel Vetter
2012-05-04 11:56   ` Daniel Vetter
2012-05-04 17:15     ` Eugeni Dodonov
2012-05-04 19:58       ` Daniel Vetter
2012-04-27 13:17 ` [PATCH 04/10] drm/i915: simplify i915_reset a bit Daniel Vetter
2012-05-04 16:47   ` Eugeni Dodonov
2012-04-27 13:17 ` [PATCH 05/10] drm/i915: extract intel_gpu_reset Daniel Vetter
2012-04-30  1:03   ` Ben Widawsky
2012-05-04 16:51   ` Eugeni Dodonov
2012-04-27 13:17 ` [PATCH 06/10] drm/i915: make gpu hangman more resilient Daniel Vetter
2012-05-04 16:47   ` Eugeni Dodonov
2012-04-27 13:17 ` [PATCH 07/10] drm/i915: kill flags parameter for reset functions Daniel Vetter
2012-05-04 16:47   ` Eugeni Dodonov
2012-04-27 13:17 ` [PATCH 08/10] drm/i915: also reset the media engine on gen4/5 Daniel Vetter
2012-04-27 13:17 ` [PATCH 09/10] drm/i915: remove modeset reset from i915_reset Daniel Vetter
2012-04-27 13:17 ` [PATCH 10/10] drm/i915: kill gen4 gpu reset code Daniel Vetter
2012-04-27 18:49   ` Eric Anholt
2012-04-27 19:17     ` Daniel Vetter
2012-04-27 23:26       ` Eric Anholt
2012-05-02 19:33         ` [PATCH] drm/i915: fix gen4 gpu reset Daniel Vetter
2012-05-02 19:54           ` Kenneth Graunke
2012-05-04 12:07             ` Daniel Vetter
2012-05-04 17:06           ` Eugeni Dodonov
2012-04-28  4:56 ` [PATCH 01/10] drm/i915: add interface to simulate gpu hangs Ben Widawsky
2012-05-02 15:23   ` Daniel Vetter
2012-05-03 12:48 ` Daniel Vetter [this message]
2012-05-03 19:00   ` [PATCH] " Eugeni Dodonov
2012-05-05 19:13     ` Daniel Vetter
  -- strict thread matches above, loose matches on Subject: below --
2011-11-10 13:18 [PATCH 3/9] " Daniel Vetter
2011-11-10 16:34 ` [PATCH] " Daniel Vetter
2011-12-02 22:21   ` Daniel Vetter
2011-12-03  1:33     ` Chris Wilson
2011-12-05 23:20     ` Ben Widawsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1336049296-5494-1-git-send-email-daniel.vetter@ffwll.ch \
    --to=daniel.vetter@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.