EIO cleanup

All of lore.kernel.org
 help / color / mirror / Atom feed

* EIO cleanup
@ 2016-04-09 11:19 Chris Wilson
  2016-04-09 11:19 ` [PATCH 01/10] drm/i915: Force clean compilation with -Werror Chris Wilson
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx

Mostly reviewed series, just 2/10 needs attention (and 1/10 will be good
to run through 0day to make sure it is invisible to the automated builders).
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 01/10] drm/i915: Force clean compilation with -Werror
  2016-04-09 11:19 EIO cleanup Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 11:19 ` [PATCH 02/10] drm/i915: Disentangle i915_drv.h includes Chris Wilson
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula

Our driver compiles clean (nowadays thanks to 0day) but for me, at least,
it would be beneficial if the compiler threw an error rather than a
warning when it found a piece of suspect code. (I use this to
compile-check patch series and want to break on the first compiler error
in order to fix the patch.)

v2: Kick off a new "Debugging" submenu for i915.ko

At this point, we applied it to the kernel and promptly kicked it out
again as it broke buildbots (due to a compiler warning on 32bits):

commit 908d759b210effb33d927a8cb6603a16448474e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue May 26 07:46:21 2015 +0200

    Revert "drm/i915: Force clean compilation with -Werror"

v3: Avoid enabling -Werror for allyesconfig/allmodconfig builds, using
COMPILE_TEST as a suitable proxy suggested by Andrew Morton. (Damien)
Only make the option available for EXPERT to reinforce that the option
should not be casually enabled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/Kconfig.debug | 17 +++++++++++++++++
 drivers/gpu/drm/i915/Makefile      |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 649a562ddf17..61485c8ba3a8 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -1,3 +1,20 @@
+config DRM_I915_WERROR
+        bool "Force GCC to throw an error instead of a warning when compiling"
+        # As this may inadvertently break the build, only allow the user
+        # to shoot oneself in the foot iff they aim really hard
+        depends on EXPERT
+        # We use the dependency on !COMPILE_TEST to not be enabled in
+        # allmodconfig or allyesconfig configurations
+        depends on !COMPILE_TEST
+        default n
+        help
+          Add -Werror to the build flags for (and only for) i915.ko.
+          Do not enable this unless you are writing code for the i915.ko module.
+
+          Recommended for driver developers only.
+
+          If in doubt, say "N".
+
 config DRM_I915_DEBUG
         bool "Enable additional driver debugging"
         depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 7ffb51b0cbc2..0b88ba0f3c1f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -2,6 +2,8 @@
 # Makefile for the drm device driver.  This driver provides support for the
 # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
 
+subdir-ccflags-$(CONFIG_DRM_I915_WERROR) := -Werror
+
 # Please keep these build lists sorted!
 
 # core driver code
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 02/10] drm/i915: Disentangle i915_drv.h includes
  2016-04-09 11:19 EIO cleanup Chris Wilson
  2016-04-09 11:19 ` [PATCH 01/10] drm/i915: Force clean compilation with -Werror Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-11  8:00   ` Joonas Lahtinen
  2016-04-09 11:19 ` [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option Chris Wilson
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx

Separate out the layers of includes (linux, drm, intel, i915) so that it
is a little easier to order our definitions between our multiple
reentrant headers. A couple of headers needed fixes to make them more
standalone (forgotten includes, forward declarations etc).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  | 29 +++++++++++++++++------------
 drivers/gpu/drm/i915/intel_guc.h |  2 ++
 drivers/gpu/drm/i915/intel_lrc.h |  2 ++
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 542401659013..1753077aebbc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -33,27 +33,32 @@
 #include <uapi/drm/i915_drm.h>
 #include <uapi/drm/drm_fourcc.h>
 
-#include <drm/drmP.h>
-#include "i915_params.h"
-#include "i915_reg.h"
-#include "intel_bios.h"
-#include "intel_ringbuffer.h"
-#include "intel_lrc.h"
-#include "i915_gem_gtt.h"
-#include "i915_gem_render_state.h"
 #include <linux/io-mapping.h>
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
-#include <drm/intel-gtt.h>
-#include <drm/drm_legacy.h> /* for struct drm_dma_handle */
-#include <drm/drm_gem.h>
 #include <linux/backlight.h>
 #include <linux/hashtable.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
-#include "intel_guc.h"
+#include <linux/shmem_fs.h>
+
+#include <drm/drmP.h>
+#include <drm/intel-gtt.h>
+#include <drm/drm_legacy.h> /* for struct drm_dma_handle */
+#include <drm/drm_gem.h>
+
+#include "i915_params.h"
+#include "i915_reg.h"
+
+#include "intel_bios.h"
 #include "intel_dpll_mgr.h"
+#include "intel_guc.h"
+#include "intel_lrc.h"
+#include "intel_ringbuffer.h"
+
+#include "i915_gem_gtt.h"
+#include "i915_gem_render_state.h"
 
 /* General customization:
  */
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 73002e901ff2..3bb85b127cb0 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -27,6 +27,8 @@
 #include "intel_guc_fwif.h"
 #include "i915_guc_reg.h"
 
+struct drm_i915_gem_request;
+
 struct i915_guc_client {
 	struct drm_i915_gem_object *client_obj;
 	struct intel_context *owner;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 0b0853eee91e..5136a2cf50b5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -24,6 +24,8 @@
 #ifndef _INTEL_LRC_H_
 #define _INTEL_LRC_H_
 
+#include "intel_ringbuffer.h"
+
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
 /* Execlists regs */
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option
  2016-04-09 11:19 EIO cleanup Chris Wilson
  2016-04-09 11:19 ` [PATCH 01/10] drm/i915: Force clean compilation with -Werror Chris Wilson
  2016-04-09 11:19 ` [PATCH 02/10] drm/i915: Disentangle i915_drv.h includes Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-11  8:12   ` Joonas Lahtinen
  2016-04-09 11:19 ` [PATCH 04/10] drm/i915: Hide the atomic_read(reset_counter) behind a helper Chris Wilson
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx

Currently there is a #define to enable extra BUG_ON for debugging
requests and associated activities. I want to expand its use to cover
all of GEM internals (so that we can saturate the code with asserts).
We can add a Kconfig option to make it easier to enable - with the usual
caveats of not enabling unless explicitly requested.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Kconfig.debug | 12 ++++++++++++
 drivers/gpu/drm/i915/i915_drv.h    |  1 +
 drivers/gpu/drm/i915/i915_gem.c    | 12 +++++-------
 drivers/gpu/drm/i915/i915_gem.h    | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem.h

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 61485c8ba3a8..8f404103341d 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -27,3 +27,15 @@ config DRM_I915_DEBUG
 
           If in doubt, say "N".
 
+config DRM_I915_DEBUG_GEM
+        bool "Insert extra checks into the GEM internals"
+        default n
+        depends on DRM_I915_WERROR
+        help
+          Enable extra sanity checks (including BUGs) along the GEM driver
+          paths that may slow the system down and if hit hang the machine.
+
+          Recommended for driver developers only.
+
+          If in doubt, say "N".
+
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1753077aebbc..fcecc0a7332f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -57,6 +57,7 @@
 #include "intel_lrc.h"
 #include "intel_ringbuffer.h"
 
+#include "i915_gem.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_render_state.h"
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a65a7663b88..716b7e00dd88 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -38,8 +38,6 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
-#define RQ_BUG_ON(expr)
-
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1521,7 +1519,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
 
 			i915_gem_object_retire__read(obj, i);
 		}
-		RQ_BUG_ON(obj->active);
+		GEM_BUG_ON(obj->active);
 	}
 
 	return 0;
@@ -2422,8 +2420,8 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 static void
 i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
-	RQ_BUG_ON(obj->last_write_req == NULL);
-	RQ_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
+	GEM_BUG_ON(obj->last_write_req == NULL);
+	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
 
 	i915_gem_request_assign(&obj->last_write_req, NULL);
 	intel_fb_obj_flush(obj, true, ORIGIN_CS);
@@ -2434,8 +2432,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
 
-	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
-	RQ_BUG_ON(!(obj->active & (1 << ring)));
+	GEM_BUG_ON(obj->last_read_req[ring] == NULL);
+	GEM_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->engine_list[ring]);
 	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
new file mode 100644
index 000000000000..8292e797d9b5
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright © 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __I915_GEM_H__
+#define __I915_GEM_H__
+
+#ifdef CONFIG_DRM_I915_DEBUG_GEM
+#define GEM_BUG_ON(expr) BUG_ON(expr)
+#else
+#define GEM_BUG_ON(expr)
+#endif
+
+#endif /* __I915_GEM_H__ */
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 04/10] drm/i915: Hide the atomic_read(reset_counter) behind a helper
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (2 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 11:19 ` [PATCH 05/10] drm/i915: Simplify checking of GPU reset_counter in display pageflips Chris Wilson
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.

v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h         | 32 ++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem.c         | 16 ++++++++--------
 drivers/gpu/drm/i915/i915_irq.c         |  2 +-
 drivers/gpu/drm/i915/intel_display.c    | 18 +++++++++++-------
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  7 ++++---
 7 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9640738aabf2..5def0c076ee2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4722,7 +4722,7 @@ i915_wedged_get(void *data, u64 *val)
 	struct drm_device *dev = data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	*val = atomic_read(&dev_priv->gpu_error.reset_counter);
+	*val = i915_reset_counter(&dev_priv->gpu_error);
 
 	return 0;
 }
@@ -4741,7 +4741,7 @@ i915_wedged_set(void *data, u64 val)
 	 * while it is writing to 'i915_wedged'
 	 */
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error))
+	if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
 		return -EAGAIN;
 
 	intel_runtime_pm_get(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fcecc0a7332f..38f6dce05574 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3049,20 +3049,44 @@ void i915_gem_retire_requests_ring(struct intel_engine_cs *engine);
 int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 				      bool interruptible);
 
+static inline u32 i915_reset_counter(struct i915_gpu_error *error)
+{
+	return atomic_read(&error->reset_counter);
+}
+
+static inline bool __i915_reset_in_progress(u32 reset)
+{
+	return unlikely(reset & I915_RESET_IN_PROGRESS_FLAG);
+}
+
+static inline bool __i915_reset_in_progress_or_wedged(u32 reset)
+{
+	return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+}
+
+static inline bool __i915_terminally_wedged(u32 reset)
+{
+	return unlikely(reset & I915_WEDGED);
+}
+
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
-	return unlikely(atomic_read(&error->reset_counter)
-			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+	return __i915_reset_in_progress(i915_reset_counter(error));
+}
+
+static inline bool i915_reset_in_progress_or_wedged(struct i915_gpu_error *error)
+{
+	return __i915_reset_in_progress_or_wedged(i915_reset_counter(error));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-	return atomic_read(&error->reset_counter) & I915_WEDGED;
+	return __i915_terminally_wedged(i915_reset_counter(error));
 }
 
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
-	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
+	return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
 static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 716b7e00dd88..eb79a54c09bc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -83,7 +83,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-#define EXIT_COND (!i915_reset_in_progress(error) || \
+#define EXIT_COND (!i915_reset_in_progress_or_wedged(error) || \
 		   i915_terminally_wedged(error))
 	if (EXIT_COND)
 		return 0;
@@ -1112,7 +1112,7 @@ int
 i915_gem_check_wedge(struct i915_gpu_error *error,
 		     bool interruptible)
 {
-	if (i915_reset_in_progress(error)) {
+	if (i915_reset_in_progress_or_wedged(error)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
@@ -1299,7 +1299,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		/* We need to check whether any gpu reset happened in between
 		 * the caller grabbing the seqno and now ... */
-		if (reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter)) {
+		if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
 			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
 			 * is truely gone. */
 			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
@@ -1474,7 +1474,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 		return ret;
 
 	ret = __i915_wait_request(req,
-				  atomic_read(&dev_priv->gpu_error.reset_counter),
+				  i915_reset_counter(&dev_priv->gpu_error),
 				  interruptible, NULL, NULL);
 	if (ret)
 		return ret;
@@ -1563,7 +1563,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	if (readonly) {
 		struct drm_i915_gem_request *req;
@@ -3128,7 +3128,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
 		if (obj->last_read_req[i] == NULL)
@@ -3173,7 +3173,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(from_req,
-					  atomic_read(&i915->gpu_error.reset_counter),
+					  i915_reset_counter(&i915->gpu_error),
 					  i915->mm.interruptible,
 					  NULL,
 					  &i915->rps.semaphores);
@@ -4154,7 +4154,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 
 		target = request;
 	}
-	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	if (target)
 		i915_gem_request_reference(target);
 	spin_unlock(&file_priv->mm.lock);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 679f08c944ef..f620581e567a 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2501,7 +2501,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 	 * the reset in-progress bit is only ever set by code outside of this
 	 * work we don't need to worry about any other races.
 	 */
-	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
+	if (i915_reset_in_progress_or_wedged(error) && !i915_terminally_wedged(error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
 		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
 				   reset_event);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index feb7028341b8..a1a15342be72 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3200,10 +3200,12 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+	unsigned reset_counter;
 	bool pending;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	if (intel_crtc->reset_counter != reset_counter ||
+	    __i915_reset_in_progress_or_wedged(reset_counter))
 		return false;
 
 	spin_lock_irq(&dev->event_lock);
@@ -10903,9 +10905,11 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 {
 	struct drm_device *dev = crtc->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	unsigned reset_counter;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
-	    crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
+	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	if (crtc->reset_counter != reset_counter ||
+	    __i915_reset_in_progress_or_wedged(reset_counter))
 		return true;
 
 	/*
@@ -11568,7 +11572,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		goto cleanup;
 
 	atomic_inc(&intel_crtc->unpin_work_count);
-	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+	intel_crtc->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev))
 		work->flip_count = I915_READ(PIPE_FLIPCOUNT_G4X(pipe)) + 1;
@@ -13379,10 +13383,10 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 		return ret;
 
 	ret = drm_atomic_helper_prepare_planes(dev, state);
-	if (!ret && !async && !i915_reset_in_progress(&dev_priv->gpu_error)) {
+	if (!ret && !async && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
 		u32 reset_counter;
 
-		reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+		reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 		mutex_unlock(&dev->struct_mutex);
 
 		for_each_plane_in_state(state, plane, plane_state, i) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f209ecfdcb5c..95fc704eeade 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1057,7 +1057,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *engine)
 		return;
 
 	ret = intel_engine_idle(engine);
-	if (ret && !i915_reset_in_progress(&to_i915(engine->dev)->gpu_error))
+	if (ret && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  engine->name, ret);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e144f4f301bf..493124c3d934 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2386,8 +2386,8 @@ int intel_engine_idle(struct intel_engine_cs *engine)
 
 	/* Make sure we do not trigger any retires */
 	return __i915_wait_request(req,
-				   atomic_read(&to_i915(engine->dev)->gpu_error.reset_counter),
-				   to_i915(engine->dev)->mm.interruptible,
+				   i915_reset_counter(&req->i915->gpu_error),
+				   req->i915->mm.interruptible,
 				   NULL, NULL);
 }
 
@@ -3212,7 +3212,8 @@ intel_stop_engine(struct intel_engine_cs *engine)
 		return;
 
 	ret = intel_engine_idle(engine);
-	if (ret && !i915_reset_in_progress(&to_i915(engine->dev)->gpu_error))
+	if (ret &&
+	    !i915_reset_in_progress_or_wedged(&to_i915(engine->dev)->gpu_error))
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  engine->name, ret);
 
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 05/10] drm/i915: Simplify checking of GPU reset_counter in display pageflips
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (3 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 04/10] drm/i915: Hide the atomic_read(reset_counter) behind a helper Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 11:19 ` [PATCH 06/10] drm/i915: Tighten reset_counter for reset status Chris Wilson
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

If we, when we store the reset_counter for the operation, we ensure that
it is not in a wedged or in the middle of a reset, we can then assert that
if any reset occurs the reset_counter must change. Later we can just
compare the operation's reset epoch against the current counter to see
if we need to abort the operation (to handle the hang).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_display.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a1a15342be72..3c01ebf10fa2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3198,14 +3198,12 @@ void intel_finish_reset(struct drm_device *dev)
 static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	unsigned reset_counter;
 	bool pending;
 
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-	if (intel_crtc->reset_counter != reset_counter ||
-	    __i915_reset_in_progress_or_wedged(reset_counter))
+	reset_counter = i915_reset_counter(&to_i915(dev)->gpu_error);
+	if (intel_crtc->reset_counter != reset_counter)
 		return false;
 
 	spin_lock_irq(&dev->event_lock);
@@ -10908,8 +10906,7 @@ static bool page_flip_finished(struct intel_crtc *crtc)
 	unsigned reset_counter;
 
 	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-	if (crtc->reset_counter != reset_counter ||
-	    __i915_reset_in_progress_or_wedged(reset_counter))
+	if (crtc->reset_counter != reset_counter)
 		return true;
 
 	/*
@@ -11571,8 +11568,13 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	if (ret)
 		goto cleanup;
 
-	atomic_inc(&intel_crtc->unpin_work_count);
 	intel_crtc->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+	if (__i915_reset_in_progress_or_wedged(intel_crtc->reset_counter)) {
+		ret = -EIO;
+		goto cleanup;
+	}
+
+	atomic_inc(&intel_crtc->unpin_work_count);
 
 	if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev))
 		work->flip_count = I915_READ(PIPE_FLIPCOUNT_G4X(pipe)) + 1;
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 06/10] drm/i915: Tighten reset_counter for reset status
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (4 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 05/10] drm/i915: Simplify checking of GPU reset_counter in display pageflips Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 11:19 ` [PATCH 07/10] drm/i915: Store the reset counter when constructing a request Chris Wilson
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter thereby incrementing the gobal
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.

PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.c     | 39 ++++++++++++++++++++++---------------
 drivers/gpu/drm/i915/i915_drv.h     |  3 ---
 drivers/gpu/drm/i915/i915_gem.c     | 27 +++++++++----------------
 drivers/gpu/drm/i915/i915_irq.c     | 21 ++------------------
 5 files changed, 36 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5def0c076ee2..70676fad8669 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4722,7 +4722,7 @@ i915_wedged_get(void *data, u64 *val)
 	struct drm_device *dev = data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	*val = i915_reset_counter(&dev_priv->gpu_error);
+	*val = i915_terminally_wedged(&dev_priv->gpu_error);
 
 	return 0;
 }
@@ -4741,7 +4741,7 @@ i915_wedged_set(void *data, u64 val)
 	 * while it is writing to 'i915_wedged'
 	 */
 
-	if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
+	if (i915_reset_in_progress(&dev_priv->gpu_error))
 		return -EAGAIN;
 
 	intel_runtime_pm_get(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 29b4e79c85a6..310dc61817fa 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -880,23 +880,32 @@ int i915_resume_switcheroo(struct drm_device *dev)
 int i915_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	bool simulated;
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+	unsigned reset_counter;
 	int ret;
 
 	intel_reset_gt_powersave(dev);
 
 	mutex_lock(&dev->struct_mutex);
 
-	i915_gem_reset(dev);
+	/* Clear any previous failed attempts at recovery. Time to try again. */
+	atomic_andnot(I915_WEDGED, &error->reset_counter);
 
-	simulated = dev_priv->gpu_error.stop_rings != 0;
+	/* Clear the reset-in-progress flag and increment the reset epoch. */
+	reset_counter = atomic_inc_return(&error->reset_counter);
+	if (WARN_ON(__i915_reset_in_progress(reset_counter))) {
+		ret = -EIO;
+		goto error;
+	}
+
+	i915_gem_reset(dev);
 
 	ret = intel_gpu_reset(dev, ALL_ENGINES);
 
 	/* Also reset the gpu hangman. */
-	if (simulated) {
+	if (error->stop_rings != 0) {
 		DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-		dev_priv->gpu_error.stop_rings = 0;
+		error->stop_rings = 0;
 		if (ret == -ENODEV) {
 			DRM_INFO("Reset not implemented, but ignoring "
 				 "error for simulated gpu hangs\n");
@@ -909,8 +918,7 @@ int i915_reset(struct drm_device *dev)
 
 	if (ret) {
 		DRM_ERROR("Failed to reset chip: %i\n", ret);
-		mutex_unlock(&dev->struct_mutex);
-		return ret;
+		goto error;
 	}
 
 	intel_overlay_reset(dev_priv);
@@ -929,20 +937,14 @@ int i915_reset(struct drm_device *dev)
 	 * was running at the time of the reset (i.e. we weren't VT
 	 * switched away).
 	 */
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
-	dev_priv->gpu_error.reload_in_reset = true;
-
 	ret = i915_gem_init_hw(dev);
-
-	dev_priv->gpu_error.reload_in_reset = false;
-
-	mutex_unlock(&dev->struct_mutex);
 	if (ret) {
 		DRM_ERROR("Failed hw init on reset %d\n", ret);
-		return ret;
+		goto error;
 	}
 
+	mutex_unlock(&dev->struct_mutex);
+
 	/*
 	 * rps/rc6 re-init is necessary to restore state lost after the
 	 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -953,6 +955,11 @@ int i915_reset(struct drm_device *dev)
 		intel_enable_gt_powersave(dev);
 
 	return 0;
+
+error:
+	atomic_or(I915_WEDGED, &error->reset_counter);
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 38f6dce05574..a4026babd43b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1391,9 +1391,6 @@ struct i915_gpu_error {
 
 	/* For missed irq/seqno simulation. */
 	unsigned int test_irq_rings;
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset   */
-	bool reload_in_reset;
 };
 
 enum modeset_restore {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eb79a54c09bc..2cab1786be79 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -83,9 +83,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
 	int ret;
 
-#define EXIT_COND (!i915_reset_in_progress_or_wedged(error) || \
-		   i915_terminally_wedged(error))
-	if (EXIT_COND)
+	if (!i915_reset_in_progress(error))
 		return 0;
 
 	/*
@@ -94,17 +92,16 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 	 * we should simply try to bail out and fail as gracefully as possible.
 	 */
 	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       EXIT_COND,
+					       !i915_reset_in_progress(error),
 					       10*HZ);
 	if (ret == 0) {
 		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
 		return -EIO;
 	} else if (ret < 0) {
 		return ret;
+	} else {
+		return 0;
 	}
-#undef EXIT_COND
-
-	return 0;
 }
 
 int i915_mutex_lock_interruptible(struct drm_device *dev)
@@ -1113,22 +1110,16 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
 		     bool interruptible)
 {
 	if (i915_reset_in_progress_or_wedged(error)) {
+		/* Recovery complete, but the reset failed ... */
+		if (i915_terminally_wedged(error))
+			return -EIO;
+
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
 			return -EIO;
 
-		/* Recovery complete, but the reset failed ... */
-		if (i915_terminally_wedged(error))
-			return -EIO;
-
-		/*
-		 * Check if GPU Reset is in progress - we need intel_ring_begin
-		 * to work properly to reinit the hw state while the gpu is
-		 * still marked as reset-in-progress. Handle this with a flag.
-		 */
-		if (!error->reload_in_reset)
-			return -EAGAIN;
+		return -EAGAIN;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f620581e567a..af9129966120 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2483,7 +2483,6 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 static void i915_reset_and_wakeup(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_gpu_error *error = &dev_priv->gpu_error;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
 	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
@@ -2501,7 +2500,7 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 	 * the reset in-progress bit is only ever set by code outside of this
 	 * work we don't need to worry about any other races.
 	 */
-	if (i915_reset_in_progress_or_wedged(error) && !i915_terminally_wedged(error)) {
+	if (i915_reset_in_progress(&dev_priv->gpu_error)) {
 		DRM_DEBUG_DRIVER("resetting chip\n");
 		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
 				   reset_event);
@@ -2529,25 +2528,9 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
 
 		intel_runtime_pm_put(dev_priv);
 
-		if (ret == 0) {
-			/*
-			 * After all the gem state is reset, increment the reset
-			 * counter and wake up everyone waiting for the reset to
-			 * complete.
-			 *
-			 * Since unlock operations are a one-sided barrier only,
-			 * we need to insert a barrier here to order any seqno
-			 * updates before
-			 * the counter increment.
-			 */
-			smp_mb__before_atomic();
-			atomic_inc(&dev_priv->gpu_error.reset_counter);
-
+		if (ret == 0)
 			kobject_uevent_env(&dev->primary->kdev->kobj,
 					   KOBJ_CHANGE, reset_done_event);
-		} else {
-			atomic_or(I915_WEDGED, &error->reset_counter);
-		}
 
 		/*
 		 * Note: The wake_up also serves as a memory barrier so that
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 07/10] drm/i915: Store the reset counter when constructing a request
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (5 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 06/10] drm/i915: Tighten reset_counter for reset status Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-11  7:56   ` Joonas Lahtinen
  2016-04-09 11:19 ` [PATCH 08/10] drm/i915: Simplify reset_counter handling during atomic modesetting Chris Wilson
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by:: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 +-
 drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++----------------------
 drivers/gpu/drm/i915/intel_display.c    |  7 +-----
 drivers/gpu/drm/i915/intel_lrc.c        |  7 ------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 -----
 5 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a4026babd43b..02e56161fac2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2243,6 +2243,7 @@ struct drm_i915_gem_request {
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
 	struct intel_engine_cs *engine;
+	unsigned reset_counter;
 
 	 /** GEM sequence number associated with the previous request,
 	  * when the HWS breadcrumb is equal to this the GPU is processing
@@ -3116,7 +3117,6 @@ void __i915_add_request(struct drm_i915_gem_request *req,
 #define i915_add_request_no_flush(req) \
 	__i915_add_request(req, NULL, false)
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct intel_rps_client *rps);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2cab1786be79..80ca6bab3258 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1213,7 +1213,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
- * @reset_counter: reset sequence associated with the given request
  * @interruptible: do an interruptible wait (normally yes)
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  *
@@ -1228,7 +1227,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
  * errno with remaining time filled in timeout argument.
  */
 int __i915_wait_request(struct drm_i915_gem_request *req,
-			unsigned reset_counter,
 			bool interruptible,
 			s64 *timeout,
 			struct intel_rps_client *rps)
@@ -1290,7 +1288,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 		/* We need to check whether any gpu reset happened in between
 		 * the caller grabbing the seqno and now ... */
-		if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
+		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
 			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
 			 * is truely gone. */
 			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
@@ -1460,13 +1458,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-	if (ret)
-		return ret;
-
-	ret = __i915_wait_request(req,
-				  i915_reset_counter(&dev_priv->gpu_error),
-				  interruptible, NULL, NULL);
+	ret = __i915_wait_request(req, interruptible, NULL, NULL);
 	if (ret)
 		return ret;
 
@@ -1541,7 +1533,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_request *requests[I915_NUM_ENGINES];
-	unsigned reset_counter;
 	int ret, i, n = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -1550,12 +1541,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	if (!obj->active)
 		return 0;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
-	if (ret)
-		return ret;
-
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-
 	if (readonly) {
 		struct drm_i915_gem_request *req;
 
@@ -1577,9 +1562,9 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 	}
 
 	mutex_unlock(&dev->struct_mutex);
+	ret = 0;
 	for (i = 0; ret == 0 && i < n; i++)
-		ret = __i915_wait_request(requests[i], reset_counter, true,
-					  NULL, rps);
+		ret = __i915_wait_request(requests[i], true, NULL, rps);
 	mutex_lock(&dev->struct_mutex);
 
 	for (i = 0; i < n; i++) {
@@ -2684,6 +2669,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 			 struct drm_i915_gem_request **req_out)
 {
 	struct drm_i915_private *dev_priv = to_i915(engine->dev);
+	unsigned reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	struct drm_i915_gem_request *req;
 	int ret;
 
@@ -2692,6 +2678,11 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 
 	*req_out = NULL;
 
+	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
+				   dev_priv->mm.interruptible);
+	if (ret)
+		return ret;
+
 	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
 	if (req == NULL)
 		return -ENOMEM;
@@ -2703,6 +2694,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->engine = engine;
+	req->reset_counter = reset_counter;
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
@@ -3081,11 +3073,9 @@ retire:
 int
 i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_wait *args = data;
 	struct drm_i915_gem_object *obj;
 	struct drm_i915_gem_request *req[I915_NUM_ENGINES];
-	unsigned reset_counter;
 	int i, n = 0;
 	int ret;
 
@@ -3119,7 +3109,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	}
 
 	drm_gem_object_unreference(&obj->base);
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
 		if (obj->last_read_req[i] == NULL)
@@ -3132,7 +3121,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 	for (i = 0; i < n; i++) {
 		if (ret == 0)
-			ret = __i915_wait_request(req[i], reset_counter, true,
+			ret = __i915_wait_request(req[i], true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
 		i915_gem_request_unreference__unlocked(req[i]);
@@ -3164,7 +3153,6 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
 		ret = __i915_wait_request(from_req,
-					  i915_reset_counter(&i915->gpu_error),
 					  i915->mm.interruptible,
 					  NULL,
 					  &i915->rps.semaphores);
@@ -4120,7 +4108,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
 	struct drm_i915_gem_request *request, *target = NULL;
-	unsigned reset_counter;
 	int ret;
 
 	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
@@ -4145,7 +4132,6 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 
 		target = request;
 	}
-	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 	if (target)
 		i915_gem_request_reference(target);
 	spin_unlock(&file_priv->mm.lock);
@@ -4153,7 +4139,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (target == NULL)
 		return 0;
 
-	ret = __i915_wait_request(target, reset_counter, true, NULL, NULL);
+	ret = __i915_wait_request(target, true, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3c01ebf10fa2..438e2f7ca836 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11360,7 +11360,6 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 
 	if (mmio_flip->req) {
 		WARN_ON(__i915_wait_request(mmio_flip->req,
-					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
 		i915_gem_request_unreference__unlocked(mmio_flip->req);
@@ -13386,9 +13385,6 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 
 	ret = drm_atomic_helper_prepare_planes(dev, state);
 	if (!ret && !async && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
-		u32 reset_counter;
-
-		reset_counter = i915_reset_counter(&dev_priv->gpu_error);
 		mutex_unlock(&dev->struct_mutex);
 
 		for_each_plane_in_state(state, plane, plane_state, i) {
@@ -13399,8 +13395,7 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 				continue;
 
 			ret = __i915_wait_request(intel_plane_state->wait_req,
-						  reset_counter, true,
-						  NULL, NULL);
+						  true, NULL, NULL);
 
 			/* Swallow -EIO errors to allow updates during hw lockup. */
 			if (ret == -EIO)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 95fc704eeade..fe2bbd4c9a65 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -892,16 +892,9 @@ static int logical_ring_prepare(struct drm_i915_gem_request *req, int bytes)
  */
 int intel_logical_ring_begin(struct drm_i915_gem_request *req, int num_dwords)
 {
-	struct drm_i915_private *dev_priv;
 	int ret;
 
 	WARN_ON(req == NULL);
-	dev_priv = req->i915;
-
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
 
 	ret = logical_ring_prepare(req, num_dwords * sizeof(uint32_t));
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 493124c3d934..2ce316388b62 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2386,7 +2386,6 @@ int intel_engine_idle(struct intel_engine_cs *engine)
 
 	/* Make sure we do not trigger any retires */
 	return __i915_wait_request(req,
-				   i915_reset_counter(&req->i915->gpu_error),
 				   req->i915->mm.interruptible,
 				   NULL, NULL);
 }
@@ -2517,11 +2516,6 @@ int intel_ring_begin(struct drm_i915_gem_request *req,
 	engine = req->engine;
 	dev_priv = req->i915;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
-	if (ret)
-		return ret;
-
 	ret = __intel_ring_prepare(engine, num_dwords * sizeof(uint32_t));
 	if (ret)
 		return ret;
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 08/10] drm/i915: Simplify reset_counter handling during atomic modesetting
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (6 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 07/10] drm/i915: Store the reset counter when constructing a request Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 11:19 ` [PATCH 09/10] drm/i915: Prevent leaking of -EIO from i915_wait_request() Chris Wilson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Now that the reset_counter is stored on the request, we can rearrange
the code to handle reading the counter versus waiting during the atomic
modesetting for readibility (by deleting the hairiest of codes).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_display.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 438e2f7ca836..8858f57488a6 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13384,9 +13384,9 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 		return ret;
 
 	ret = drm_atomic_helper_prepare_planes(dev, state);
-	if (!ret && !async && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
-		mutex_unlock(&dev->struct_mutex);
+	mutex_unlock(&dev->struct_mutex);
 
+	if (!ret && !async) {
 		for_each_plane_in_state(state, plane, plane_state, i) {
 			struct intel_plane_state *intel_plane_state =
 				to_intel_plane_state(plane_state);
@@ -13400,19 +13400,15 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 			/* Swallow -EIO errors to allow updates during hw lockup. */
 			if (ret == -EIO)
 				ret = 0;
-
-			if (ret)
+			if (ret) {
+				mutex_lock(&dev->struct_mutex);
+				drm_atomic_helper_cleanup_planes(dev, state);
+				mutex_unlock(&dev->struct_mutex);
 				break;
+			}
 		}
-
-		if (!ret)
-			return 0;
-
-		mutex_lock(&dev->struct_mutex);
-		drm_atomic_helper_cleanup_planes(dev, state);
 	}
 
-	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
 
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 09/10] drm/i915: Prevent leaking of -EIO from i915_wait_request()
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (7 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 08/10] drm/i915: Simplify reset_counter handling during atomic modesetting Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 11:19 ` [PATCH 10/10] drm/i915: Suppress error message when GPU resets are disabled Chris Wilson
  2016-04-09 12:27 ` ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Force clean compilation with -Werror Patchwork
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 --
 drivers/gpu/drm/i915/i915_gem.c         | 44 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_userptr.c |  6 ++---
 drivers/gpu/drm/i915/intel_display.c    | 13 +++++-----
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  3 +--
 6 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 02e56161fac2..97ff6eeb1f0c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3044,8 +3044,6 @@ i915_gem_find_active_request(struct intel_engine_cs *engine);
 
 bool i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_engine_cs *engine);
-int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
-				      bool interruptible);
 
 static inline u32 i915_reset_counter(struct i915_gpu_error *error)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 80ca6bab3258..04678942fc32 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -206,11 +206,10 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->madv == __I915_MADV_PURGED);
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, true);
-	if (ret) {
+	if (WARN_ON(ret)) {
 		/* In the event of a disaster, abandon all caches and
 		 * hope for the best.
 		 */
-		WARN_ON(ret != -EIO);
 		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}
 
@@ -1105,15 +1104,13 @@ put_rpm:
 	return ret;
 }
 
-int
-i915_gem_check_wedge(struct i915_gpu_error *error,
-		     bool interruptible)
+static int
+i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 {
-	if (i915_reset_in_progress_or_wedged(error)) {
-		/* Recovery complete, but the reset failed ... */
-		if (i915_terminally_wedged(error))
-			return -EIO;
+	if (__i915_terminally_wedged(reset_counter))
+		return -EIO;
 
+	if (__i915_reset_in_progress(reset_counter)) {
 		/* Non-interruptible callers can't handle -EAGAIN, hence return
 		 * -EIO unconditionally for these. */
 		if (!interruptible)
@@ -1287,13 +1284,14 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		prepare_to_wait(&engine->irq_queue, &wait, state);
 
 		/* We need to check whether any gpu reset happened in between
-		 * the caller grabbing the seqno and now ... */
+		 * the request being submitted and now. If a reset has occurred,
+		 * the request is effectively complete (we either are in the
+		 * process of or have discarded the rendering and completely
+		 * reset the GPU. The results of the request are lost and we
+		 * are free to continue on with the original operation.
+		 */
 		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
-			/* ... but upgrade the -EAGAIN to an -EIO if the gpu
-			 * is truely gone. */
-			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-			if (ret == 0)
-				ret = -EAGAIN;
+			ret = 0;
 			break;
 		}
 
@@ -2154,11 +2152,10 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->madv == __I915_MADV_PURGED);
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, true);
-	if (ret) {
+	if (WARN_ON(ret)) {
 		/* In the event of a disaster, abandon all caches and
 		 * hope for the best.
 		 */
-		WARN_ON(ret != -EIO);
 		i915_gem_clflush_object(obj, true);
 		obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 	}
@@ -2678,8 +2675,11 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 
 	*req_out = NULL;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
-				   dev_priv->mm.interruptible);
+	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
+	 * EIO if the GPU is already wedged, or EAGAIN to drop the struct_mutex
+	 * and restart.
+	 */
+	ret = i915_gem_check_wedge(reset_counter, dev_priv->mm.interruptible);
 	if (ret)
 		return ret;
 
@@ -4114,9 +4114,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret)
 		return ret;
 
-	ret = i915_gem_check_wedge(&dev_priv->gpu_error, false);
-	if (ret)
-		return ret;
+	/* ABI: return -EIO if already wedged */
+	if (i915_terminally_wedged(&dev_priv->gpu_error))
+		return -EIO;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 962cb4c507cb..d44e4e6d820e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -78,10 +78,8 @@ static void cancel_userptr(struct work_struct *work)
 		was_interruptible = dev_priv->mm.interruptible;
 		dev_priv->mm.interruptible = false;
 
-		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) {
-			int ret = i915_vma_unbind(vma);
-			WARN_ON(ret && ret != -EIO);
-		}
+		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
+			WARN_ON(i915_vma_unbind(vma));
 		WARN_ON(i915_gem_object_put_pages(obj));
 
 		dev_priv->mm.interruptible = was_interruptible;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 8858f57488a6..5c61b1a23af1 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13396,11 +13396,9 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
 
 			ret = __i915_wait_request(intel_plane_state->wait_req,
 						  true, NULL, NULL);
-
-			/* Swallow -EIO errors to allow updates during hw lockup. */
-			if (ret == -EIO)
-				ret = 0;
 			if (ret) {
+				/* Any hang should be swallowed by the wait */
+				WARN_ON(ret == -EIO);
 				mutex_lock(&dev->struct_mutex);
 				drm_atomic_helper_cleanup_planes(dev, state);
 				mutex_unlock(&dev->struct_mutex);
@@ -13751,10 +13749,11 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		 */
 		if (needs_modeset(crtc_state))
 			ret = i915_gem_object_wait_rendering(old_obj, true);
-
-		/* Swallow -EIO errors to allow updates during hw lockup. */
-		if (ret && ret != -EIO)
+		if (ret) {
+			/* GPU hangs should have been swallowed by the wait */
+			WARN_ON(ret == -EIO);
 			return ret;
+		}
 	}
 
 	/* For framebuffer backed by dmabuf, wait for fence */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fe2bbd4c9a65..2f5cddf066e7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1050,7 +1050,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *engine)
 		return;
 
 	ret = intel_engine_idle(engine);
-	if (ret && !i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
+	if (ret)
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  engine->name, ret);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2ce316388b62..8cac03e21cd0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -3206,8 +3206,7 @@ intel_stop_engine(struct intel_engine_cs *engine)
 		return;
 
 	ret = intel_engine_idle(engine);
-	if (ret &&
-	    !i915_reset_in_progress_or_wedged(&to_i915(engine->dev)->gpu_error))
+	if (ret)
 		DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
 			  engine->name, ret);
 
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 10/10] drm/i915: Suppress error message when GPU resets are disabled
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (8 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 09/10] drm/i915: Prevent leaking of -EIO from i915_wait_request() Chris Wilson
@ 2016-04-09 11:19 ` Chris Wilson
  2016-04-09 12:27 ` ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Force clean compilation with -Werror Patchwork
  10 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 11:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

If we do not have lowlevel support for reseting the GPU, or if the user
has explicitly disabled reseting the device, the failure is expected.
Since it is an expected failure, we should be using a lower priority
message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just
emit the expected failure as a DEBUG message.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 310dc61817fa..18eb3e698763 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -917,7 +917,10 @@ int i915_reset(struct drm_device *dev)
 		pr_notice("drm/i915: Resetting chip after gpu hang\n");
 
 	if (ret) {
-		DRM_ERROR("Failed to reset chip: %i\n", ret);
+		if (ret != -ENODEV)
+			DRM_ERROR("Failed to reset chip: %i\n", ret);
+		else
+			DRM_DEBUG_DRIVER("GPU reset disabled\n");
 		goto error;
 	}
 
-- 
2.8.0.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Force clean compilation with -Werror
  2016-04-09 11:19 EIO cleanup Chris Wilson
                   ` (9 preceding siblings ...)
  2016-04-09 11:19 ` [PATCH 10/10] drm/i915: Suppress error message when GPU resets are disabled Chris Wilson
@ 2016-04-09 12:27 ` Patchwork
  2016-04-09 12:39   ` Chris Wilson
  10 siblings, 1 reply; 17+ messages in thread
From: Patchwork @ 2016-04-09 12:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/10] drm/i915: Force clean compilation with -Werror
URL   : https://patchwork.freedesktop.org/series/5487/
State : failure

== Summary ==

Series 5487v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/5487/revisions/1/mbox/

Test drv_module_reload_basic:
                dmesg-warn -> PASS       (ilk-hp8440p)
Test kms_flip:
        Subgroup basic-flip-vs-dpms:
                pass       -> DMESG-WARN (ilk-hp8440p) UNSTABLE
        Subgroup basic-flip-vs-modeset:
                pass       -> DMESG-WARN (ilk-hp8440p) UNSTABLE
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-a:
                skip       -> FAIL       (bsw-nuc-2)
                pass       -> FAIL       (ivb-t430s)
                pass       -> FAIL       (bdw-ultra)
                pass       -> FAIL       (skl-i7k-2)
                pass       -> FAIL       (byt-nuc)
                pass       -> FAIL       (snb-x220t)
                pass       -> FAIL       (snb-dellxps)
                pass       -> FAIL       (hsw-brixbox)
                pass       -> FAIL       (bdw-nuci7)
                pass       -> FAIL       (ilk-hp8440p)
        Subgroup hang-read-crc-pipe-b:
                skip       -> FAIL       (bsw-nuc-2)
                pass       -> FAIL       (ivb-t430s)
                pass       -> FAIL       (bdw-ultra)
                pass       -> FAIL       (skl-i7k-2)
                pass       -> FAIL       (byt-nuc)
                pass       -> FAIL       (snb-x220t)
                pass       -> FAIL       (snb-dellxps)
                pass       -> FAIL       (hsw-brixbox)
                pass       -> FAIL       (bdw-nuci7)
                pass       -> FAIL       (ilk-hp8440p)
        Subgroup hang-read-crc-pipe-c:
                pass       -> FAIL       (bsw-nuc-2)
                pass       -> FAIL       (ivb-t430s)
                pass       -> FAIL       (bdw-ultra)
                pass       -> FAIL       (skl-i7k-2)
                skip       -> FAIL       (byt-nuc)
                skip       -> FAIL       (snb-x220t)
                skip       -> FAIL       (snb-dellxps)
                pass       -> FAIL       (hsw-brixbox)
                pass       -> FAIL       (bdw-nuci7)
                skip       -> FAIL       (ilk-hp8440p)

bdw-nuci7        total:196  pass:181  dwarn:0   dfail:0   fail:3   skip:12 
bdw-ultra        total:196  pass:172  dwarn:0   dfail:0   fail:3   skip:21 
bsw-nuc-2        total:196  pass:158  dwarn:0   dfail:0   fail:3   skip:35 
byt-nuc          total:196  pass:159  dwarn:0   dfail:0   fail:3   skip:34 
hsw-brixbox      total:196  pass:171  dwarn:0   dfail:0   fail:3   skip:22 
ilk-hp8440p      total:196  pass:128  dwarn:2   dfail:0   fail:3   skip:63 
ivb-t430s        total:196  pass:168  dwarn:0   dfail:0   fail:3   skip:25 
skl-i7k-2        total:196  pass:170  dwarn:0   dfail:0   fail:3   skip:23 
snb-dellxps      total:196  pass:160  dwarn:0   dfail:0   fail:3   skip:33 
snb-x220t        total:196  pass:160  dwarn:0   dfail:0   fail:4   skip:32 

Results at /archive/results/CI_IGT_test/Patchwork_1855/

56aa709c9614bf7f39ee255fd0ddf4f1b2743387 drm-intel-nightly: 2016y-04m-09d-11h-10m-49s UTC integration manifest
2991addbcc20b7c9edef418e957f0a4b5f3f6338 drm/i915: Suppress error message when GPU resets are disabled
2d2ba3cc386af50a9dbd5fe68eac37d8a6483878 drm/i915: Prevent leaking of -EIO from i915_wait_request()
8d03b898b939ac59dec888dbdaa4fc7785c7d0d0 drm/i915: Simplify reset_counter handling during atomic modesetting
c17a7f934130cff780c8195adcd158f9cc16e2a2 drm/i915: Store the reset counter when constructing a request
eddf9a904c8c4d3f840af264fa069cd5cdbb9ff0 drm/i915: Tighten reset_counter for reset status
51e07f54e96cf699b532b0d9ede3ffe84a757c94 drm/i915: Simplify checking of GPU reset_counter in display pageflips
c99c64ac47dff274a4d3da9dc31f0210832b8e17 drm/i915: Hide the atomic_read(reset_counter) behind a helper
1ce3cd107f28e31cc0bf43281bd863e4a1c3a8b2 drm/i915: Add GEM debugging Kconfig option
c68a744b2ec476f0ab5b576a3409b09a289e2f85 drm/i915: Disentangle i915_drv.h includes
7baec233ff87dd04ef53d8c10f9fc26438ca944f drm/i915: Force clean compilation with -Werror

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Force clean compilation with -Werror
  2016-04-09 12:27 ` ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Force clean compilation with -Werror Patchwork
@ 2016-04-09 12:39   ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-09 12:39 UTC (permalink / raw)
  To: intel-gfx

On Sat, Apr 09, 2016 at 12:27:35PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: series starting with [01/10] drm/i915: Force clean compilation with -Werror
> URL   : https://patchwork.freedesktop.org/series/5487/
> State : failure
> 
> == Summary ==
> 
> Series 5487v1 Series without cover letter
> http://patchwork.freedesktop.org/api/1.0/series/5487/revisions/1/mbox/
> 
> Test drv_module_reload_basic:
>                 dmesg-warn -> PASS       (ilk-hp8440p)
> Test kms_flip:
>         Subgroup basic-flip-vs-dpms:
>                 pass       -> DMESG-WARN (ilk-hp8440p) UNSTABLE
>         Subgroup basic-flip-vs-modeset:
>                 pass       -> DMESG-WARN (ilk-hp8440p) UNSTABLE
> Test kms_pipe_crc_basic:
>         Subgroup hang-read-crc-pipe-a:
>                 skip       -> FAIL       (bsw-nuc-2)

Forgot to push the test fix since it was using the defunct debug-only
mechanism and not hanging the GPU at all.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 07/10] drm/i915: Store the reset counter when constructing a request
  2016-04-09 11:19 ` [PATCH 07/10] drm/i915: Store the reset counter when constructing a request Chris Wilson
@ 2016-04-11  7:56   ` Joonas Lahtinen
  0 siblings, 0 replies; 17+ messages in thread
From: Joonas Lahtinen @ 2016-04-11  7:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

On la, 2016-04-09 at 12:19 +0100, Chris Wilson wrote:
> As the request is only valid during the same global reset epoch, we can
> record the current reset_counter when constructing the request and reuse
> it when waiting upon that request in future. This removes a very hairy
> atomic check serialised by the struct_mutex at the time of waiting and
> allows us to transfer those waits to a central dispatcher for all
> waiters and all requests.
> 
> PS: With per-engine resets, we obviously cannot assume a global reset
> epoch for the requests - a per-engine epoch makes the most sense. The
> challenge then is how to handle checking in the waiter for when to break
> the wait, as the fine-grained reset may also want to requeue the
> request (i.e. the assumption that just because the epoch changes the
> request is completed may be broken - or we just avoid breaking that
> assumption with the fine-grained resets).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by:: Daniel Vetter <daniel.vetter@ffwll.ch>

Double :'s.

Regards, Joonas

--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 02/10] drm/i915: Disentangle i915_drv.h includes
  2016-04-09 11:19 ` [PATCH 02/10] drm/i915: Disentangle i915_drv.h includes Chris Wilson
@ 2016-04-11  8:00   ` Joonas Lahtinen
  0 siblings, 0 replies; 17+ messages in thread
From: Joonas Lahtinen @ 2016-04-11  8:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On la, 2016-04-09 at 12:19 +0100, Chris Wilson wrote:
> Separate out the layers of includes (linux, drm, intel, i915) so that it
> is a little easier to order our definitions between our multiple
> reentrant headers. A couple of headers needed fixes to make them more
> standalone (forgotten includes, forward declarations etc).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h  | 29 +++++++++++++++++------------
>  drivers/gpu/drm/i915/intel_guc.h |  2 ++
>  drivers/gpu/drm/i915/intel_lrc.h |  2 ++
>  3 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 542401659013..1753077aebbc 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -33,27 +33,32 @@
>  #include 
>  #include 
>  
> -#include 
> -#include "i915_params.h"
> -#include "i915_reg.h"
> -#include "intel_bios.h"
> -#include "intel_ringbuffer.h"
> -#include "intel_lrc.h"
> -#include "i915_gem_gtt.h"
> -#include "i915_gem_render_state.h"
>  #include 
>  #include 
>  #include 
> -#include 
> -#include  /* for struct drm_dma_handle */
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include "intel_guc.h"
> +#include 
> +
> +#include 
> +#include 
> +#include  /* for struct drm_dma_handle */
> +#include 
> +
> +#include "i915_params.h"
> +#include "i915_reg.h"
> +
> +#include "intel_bios.h"
>  #include "intel_dpll_mgr.h"
> +#include "intel_guc.h"
> +#include "intel_lrc.h"
> +#include "intel_ringbuffer.h"
> +
> +#include "i915_gem_gtt.h"
> +#include "i915_gem_render_state.h"
>  
>  /* General customization:
>   */
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 73002e901ff2..3bb85b127cb0 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -27,6 +27,8 @@
>  #include "intel_guc_fwif.h"
>  #include "i915_guc_reg.h"
>  
> +struct drm_i915_gem_request;
> +
>  struct i915_guc_client {
>  	struct drm_i915_gem_object *client_obj;
>  	struct intel_context *owner;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 0b0853eee91e..5136a2cf50b5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -24,6 +24,8 @@
>  #ifndef _INTEL_LRC_H_
>  #define _INTEL_LRC_H_
>  
> +#include "intel_ringbuffer.h"
> +
>  #define GEN8_LR_CONTEXT_ALIGN 4096
>  
>  /* Execlists regs */
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option
  2016-04-09 11:19 ` [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option Chris Wilson
@ 2016-04-11  8:12   ` Joonas Lahtinen
  2016-04-11  8:27     ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Joonas Lahtinen @ 2016-04-11  8:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On la, 2016-04-09 at 12:19 +0100, Chris Wilson wrote:
> Currently there is a #define to enable extra BUG_ON for debugging
> requests and associated activities. I want to expand its use to cover
> all of GEM internals (so that we can saturate the code with asserts).
> We can add a Kconfig option to make it easier to enable - with the usual
> caveats of not enabling unless explicitly requested.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A few comments below, still:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/Kconfig.debug | 12 ++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h    |  1 +
>  drivers/gpu/drm/i915/i915_gem.c    | 12 +++++-------
>  drivers/gpu/drm/i915/i915_gem.h    | 34 ++++++++++++++++++++++++++++++++++
>  4 files changed, 52 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem.h
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
> index 61485c8ba3a8..8f404103341d 100644
> --- a/drivers/gpu/drm/i915/Kconfig.debug
> +++ b/drivers/gpu/drm/i915/Kconfig.debug
> @@ -27,3 +27,15 @@ config DRM_I915_DEBUG
>  
>            If in doubt, say "N".
>  
> +config DRM_I915_DEBUG_GEM
> +        bool "Insert extra checks into the GEM internals"
> +        default n
> +        depends on DRM_I915_WERROR

Not sure if this needs to be explicit.

> +        help
> +          Enable extra sanity checks (including BUGs) along the GEM driver
> +          paths that may slow the system down and if hit hang the machine.
> +
> +          Recommended for driver developers only.
> +
> +          If in doubt, say "N".
> +
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1753077aebbc..fcecc0a7332f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -57,6 +57,7 @@
>  #include "intel_lrc.h"
>  #include "intel_ringbuffer.h"
>  
> +#include "i915_gem.h"
>  #include "i915_gem_gtt.h"
>  #include "i915_gem_render_state.h"
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5a65a7663b88..716b7e00dd88 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -38,8 +38,6 @@
>  #include 
>  #include 
>  
> -#define RQ_BUG_ON(expr)
> -
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>  static void
> @@ -1521,7 +1519,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>  
>  			i915_gem_object_retire__read(obj, i);
>  		}
> -		RQ_BUG_ON(obj->active);
> +		GEM_BUG_ON(obj->active);
>  	}
>  
>  	return 0;
> @@ -2422,8 +2420,8 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>  static void
>  i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
>  {
> -	RQ_BUG_ON(obj->last_write_req == NULL);
> -	RQ_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
> +	GEM_BUG_ON(obj->last_write_req == NULL);
> +	GEM_BUG_ON(!(obj->active & intel_engine_flag(obj->last_write_req->engine)));
>  
>  	i915_gem_request_assign(&obj->last_write_req, NULL);
>  	intel_fb_obj_flush(obj, true, ORIGIN_CS);
> @@ -2434,8 +2432,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>  {
>  	struct i915_vma *vma;
>  
> -	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
> -	RQ_BUG_ON(!(obj->active & (1 << ring)));
> +	GEM_BUG_ON(obj->last_read_req[ring] == NULL);
> +	GEM_BUG_ON(!(obj->active & (1 << ring)));
>  
>  	list_del_init(&obj->engine_list[ring]);
>  	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> new file mode 100644
> index 000000000000..8292e797d9b5
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -0,0 +1,34 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __I915_GEM_H__
> +#define __I915_GEM_H__
> +
> +#ifdef CONFIG_DRM_I915_DEBUG_GEM
> +#define GEM_BUG_ON(expr) BUG_ON(expr)
> +#else
> +#define GEM_BUG_ON(expr)

There seems to be no consistent way of doing this within kernel so
guess this is fin, s/(expr)//g would do, too.

Regards, Joonas

> +#endif
> +
> +#endif /* __I915_GEM_H__ */
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option
  2016-04-11  8:12   ` Joonas Lahtinen
@ 2016-04-11  8:27     ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2016-04-11  8:27 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Apr 11, 2016 at 11:12:48AM +0300, Joonas Lahtinen wrote:
> On la, 2016-04-09 at 12:19 +0100, Chris Wilson wrote:
> > Currently there is a #define to enable extra BUG_ON for debugging
> > requests and associated activities. I want to expand its use to cover
> > all of GEM internals (so that we can saturate the code with asserts).
> > We can add a Kconfig option to make it easier to enable - with the usual
> > caveats of not enabling unless explicitly requested.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A few comments below, still:
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> > ---
> >  drivers/gpu/drm/i915/Kconfig.debug | 12 ++++++++++++
> >  drivers/gpu/drm/i915/i915_drv.h    |  1 +
> >  drivers/gpu/drm/i915/i915_gem.c    | 12 +++++-------
> >  drivers/gpu/drm/i915/i915_gem.h    | 34 ++++++++++++++++++++++++++++++++++
> >  4 files changed, 52 insertions(+), 7 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_gem.h
> > 
> > diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
> > index 61485c8ba3a8..8f404103341d 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.debug
> > +++ b/drivers/gpu/drm/i915/Kconfig.debug
> > @@ -27,3 +27,15 @@ config DRM_I915_DEBUG
> >  
> >            If in doubt, say "N".
> >  
> > +config DRM_I915_DEBUG_GEM
> > +        bool "Insert extra checks into the GEM internals"
> > +        default n
> > +        depends on DRM_I915_WERROR
> 
> Not sure if this needs to be explicit.

I was using it to hide the option unless in "developer" mode. When we
hit these BUG_ON we can often limp on without any noticeable effect
(yes, they are precursors of infinite loops, memory corruption and
whatnot but not always ;)

> > +#ifdef CONFIG_DRM_I915_DEBUG_GEM
> > +#define GEM_BUG_ON(expr) BUG_ON(expr)
> > +#else
> > +#define GEM_BUG_ON(expr)
> 
> There seems to be no consistent way of doing this within kernel so
> guess this is fin, s/(expr)//g would do, too.

The choice is whether we want side-effects from expr to be evaluated. I
voted for no, as that allows us to write complicated expr such that only
the BUG_ON() need to jump through 29 pointers and function calls to sanity
check calling conditions. And partly why I want Werror enabled so that
I have to fix any bugs in the BUG_ON() for the code to compile!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-04-11  8:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-09 11:19 EIO cleanup Chris Wilson
2016-04-09 11:19 ` [PATCH 01/10] drm/i915: Force clean compilation with -Werror Chris Wilson
2016-04-09 11:19 ` [PATCH 02/10] drm/i915: Disentangle i915_drv.h includes Chris Wilson
2016-04-11  8:00   ` Joonas Lahtinen
2016-04-09 11:19 ` [PATCH 03/10] drm/i915: Add GEM debugging Kconfig option Chris Wilson
2016-04-11  8:12   ` Joonas Lahtinen
2016-04-11  8:27     ` Chris Wilson
2016-04-09 11:19 ` [PATCH 04/10] drm/i915: Hide the atomic_read(reset_counter) behind a helper Chris Wilson
2016-04-09 11:19 ` [PATCH 05/10] drm/i915: Simplify checking of GPU reset_counter in display pageflips Chris Wilson
2016-04-09 11:19 ` [PATCH 06/10] drm/i915: Tighten reset_counter for reset status Chris Wilson
2016-04-09 11:19 ` [PATCH 07/10] drm/i915: Store the reset counter when constructing a request Chris Wilson
2016-04-11  7:56   ` Joonas Lahtinen
2016-04-09 11:19 ` [PATCH 08/10] drm/i915: Simplify reset_counter handling during atomic modesetting Chris Wilson
2016-04-09 11:19 ` [PATCH 09/10] drm/i915: Prevent leaking of -EIO from i915_wait_request() Chris Wilson
2016-04-09 11:19 ` [PATCH 10/10] drm/i915: Suppress error message when GPU resets are disabled Chris Wilson
2016-04-09 12:27 ` ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Force clean compilation with -Werror Patchwork
2016-04-09 12:39   ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.