[PATCH 1/5] drm/i915: Do a full device reset after being wedged

* [PATCH 1/5] drm/i915: Do a full device reset after being wedged
@ 2018-09-03  8:33 Chris Wilson
  2018-09-03  8:33 ` [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault Chris Wilson
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Chris Wilson @ 2018-09-03  8:33 UTC (permalink / raw)
  To: intel-gfx

We only call unset_wedged on the global reset path (since it's a global
operation), so if we are terminally wedged and wish to reset, take the
full device reset path rather than the quicker individual engine resets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e31093ce871c..10f28a2ee2e6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3309,7 +3309,8 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(dev_priv)) {
+	if (intel_has_reset_engine(dev_priv) &&
+	    !i915_terminally_wedged(&dev_priv->gpu_error)) {
 		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-- 
2.19.0.rc1
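
The decision this patch changes can be sketched in isolation. The following is a minimal model, not the driver code: `struct gpu_model`, `enum reset_path` and `choose_reset_path()` are hypothetical names, and the two booleans only stand in for the results of intel_has_reset_engine() and i915_terminally_wedged(). It shows only which path is tried first; the fallback from a failed engine reset to a full reset is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; not the real i915 structures. */
enum reset_path { RESET_PER_ENGINE, RESET_FULL_DEVICE };

struct gpu_model {
	bool has_reset_engine;   /* models intel_has_reset_engine() */
	bool terminally_wedged;  /* models i915_terminally_wedged() */
};

/*
 * Pick the reset path to try first. A wedged device must take the
 * full-device path, because only that path clears the wedged state.
 */
enum reset_path choose_reset_path(const struct gpu_model *gpu)
{
	assert(gpu != NULL);

	if (gpu->has_reset_engine && !gpu->terminally_wedged)
		return RESET_PER_ENGINE;

	return RESET_FULL_DEVICE;
}
```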

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
@ 2018-09-03  8:33 ` Chris Wilson
  2018-09-03 10:30   ` Joonas Lahtinen
  2018-09-03  8:33 ` [PATCH 3/5] drm/i915: Force the slow path after a user-write error Chris Wilson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2018-09-03  8:33 UTC (permalink / raw)
  To: intel-gfx

We do not explicitly mark the PTE for the user's GTT mmap as being
wrprotect, so we don't get a refault when we would need to change a
read-only mapping into read-write. As such, we must presume that if the
vma has PROT_WRITE it may be written to. Although this is supposed to be
indicated by set-domain, there are cases (e.g. after swap) where
userspace may not be aware of the implicit domain change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b7bbfe59697..625e07c56fe2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2018,7 +2018,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
+	bool write = area->vm_flags & VM_WRITE;
 	struct i915_vma *vma;
 	pgoff_t page_offset;
 	int ret;
-- 
2.19.0.rc1
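
To illustrate the difference between the two ways of deriving `write`, here is a small self-contained model. It is only a sketch: `struct fault_model` and the `MODEL_*` flag values are invented for the example and do not match the kernel's definitions of FAULT_FLAG_WRITE or VM_WRITE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel defines its own. */
#define MODEL_FAULT_FLAG_WRITE 0x1u
#define MODEL_VM_WRITE         0x2u

struct fault_model {
	unsigned int fault_flags; /* flags of this particular fault */
	unsigned int vm_flags;    /* protection flags of the whole mapping */
};

/* Old behaviour: only a write fault marks the object as written. */
bool write_from_fault(const struct fault_model *f)
{
	assert(f != NULL);
	return (f->fault_flags & MODEL_FAULT_FLAG_WRITE) != 0;
}

/* New behaviour: any writable mapping is presumed to be written to. */
bool write_from_vma(const struct fault_model *f)
{
	assert(f != NULL);
	return (f->vm_flags & MODEL_VM_WRITE) != 0;
}
```

A read fault on a writable mapping is the case that differs: the old test reports no write, while the new one conservatively reports a possible write.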


* [PATCH 3/5] drm/i915: Force the slow path after a user-write error
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
  2018-09-03  8:33 ` [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault Chris Wilson
@ 2018-09-03  8:33 ` Chris Wilson
  2018-09-03 10:08   ` Joonas Lahtinen
  2018-09-03  8:33 ` [PATCH 4/5] drm/i915: Early rejection of buffer allocations larger than RAM Chris Wilson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2018-09-03  8:33 UTC (permalink / raw)
  To: intel-gfx

If we fail to write the user relocation back when it is changed, force
ourselves to take the slow relocation path where we can handle faults in
the write path. There is still an element of dubiousness: having
patched up the batch to use the correct offset, the batch no longer
matches the presumed_offset in the relocation, so a second pass may miss
any changes in layout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a926d7d47183..931be2651f01 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1486,8 +1486,10 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct i915_vma *vma)
 				 * can read from this userspace address.
 				 */
 				offset = gen8_canonical_addr(offset & ~UPDATE);
-				__put_user(offset,
-					   &urelocs[r-stack].presumed_offset);
+				if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {
+					remain = -EFAULT;
+					goto out;
+				}
 			}
 		} while (r++, --count);
 		urelocs += ARRAY_SIZE(stack);
@@ -1572,7 +1574,6 @@ static int eb_copy_relocations(const struct i915_execbuffer *eb)
 
 		relocs = kvmalloc_array(size, 1, GFP_KERNEL);
 		if (!relocs) {
-			kvfree(relocs);
 			err = -ENOMEM;
 			goto err;
 		}
@@ -1586,6 +1587,7 @@ static int eb_copy_relocations(const struct i915_execbuffer *eb)
 			if (__copy_from_user((char *)relocs + copied,
 					     (char __user *)urelocs + copied,
 					     len)) {
+end_user:
 				kvfree(relocs);
 				err = -EFAULT;
 				goto err;
@@ -1609,7 +1611,6 @@ static int eb_copy_relocations(const struct i915_execbuffer *eb)
 			unsafe_put_user(-1,
 					&urelocs[copied].presumed_offset,
 					end_user);
-end_user:
 		user_access_end();
 
 		eb->exec[i].relocs_ptr = (uintptr_t)relocs;
-- 
2.19.0.rc1
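
The first hunk stops ignoring the result of the write-back. A rough userspace model of that pattern follows; `put_fn`, `put_ok()`, `put_fault()` and `relocate_fast()` are hypothetical names for this sketch, and the callback merely imitates __put_user() returning nonzero on a fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical write-back callback: 0 on success, nonzero on a fault. */
typedef int (*put_fn)(uint64_t *dst, uint64_t value);

int put_ok(uint64_t *dst, uint64_t value)
{
	*dst = value;
	return 0;
}

int put_fault(uint64_t *dst, uint64_t value)
{
	(void)dst;
	(void)value;
	return -1; /* pretend the user page could not be written */
}

/*
 * Fast path: write back every presumed offset that turned out stale.
 * A failed write is reported as -EFAULT rather than being dropped, so
 * that the caller can retry on a path that is allowed to fault.
 */
int relocate_fast(uint64_t *presumed, const uint64_t *actual,
		  size_t count, put_fn put)
{
	assert(presumed != NULL && actual != NULL && put != NULL);

	for (size_t i = 0; i < count; i++) {
		if (presumed[i] == actual[i])
			continue;
		if (put(&presumed[i], actual[i]))
			return -EFAULT;
	}

	return 0;
}
```

In this model a caller would treat -EFAULT as the signal to fall back to its slower path instead of carrying on with a relocation entry that was never updated.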


* [PATCH 4/5] drm/i915: Early rejection of buffer allocations larger than RAM
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
  2018-09-03  8:33 ` [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault Chris Wilson
  2018-09-03  8:33 ` [PATCH 3/5] drm/i915: Force the slow path after a user-write error Chris Wilson
@ 2018-09-03  8:33 ` Chris Wilson
  2018-09-03 10:10   ` Joonas Lahtinen
  2018-09-03  8:33 ` [PATCH 5/5] drm/i915: Forcibly flush unwanted requests in drop-caches Chris Wilson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2018-09-03  8:33 UTC (permalink / raw)
  To: intel-gfx

We currently try to pin and allocate the whole buffer at once. If that
object is larger than RAM, we will try to pin the whole of physical
memory, force the machine into OOM, and then still fail the allocation.

If the request is obviously too large, error out early. We opt to do
this in the backend to make it easy to use alternate paths that do not
require the entire object to be pinned, or that can easily handle proxy
objects that are larger than physical memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 625e07c56fe2..89834ce19acd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2533,13 +2533,21 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	gfp_t noreclaim;
 	int ret;
 
-	/* Assert that the object is not currently in any GPU domain. As it
+	/*
+	 * Assert that the object is not currently in any GPU domain. As it
 	 * wasn't in the GTT, there shouldn't be any way it could have been in
 	 * a GPU cache
 	 */
 	GEM_BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS);
 	GEM_BUG_ON(obj->write_domain & I915_GEM_GPU_DOMAINS);
 
+	/*
+	 * If there's no chance of allocating enough pages for the whole
+	 * object, bail early.
+	 */
+	if (page_count > totalram_pages)
+		return -ENOMEM;
+
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
 	if (st == NULL)
 		return -ENOMEM;
@@ -2550,7 +2558,8 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 		return -ENOMEM;
 	}
 
-	/* Get the list of pages out of our struct file.  They'll be pinned
+	/*
+	 * Get the list of pages out of our struct file.  They'll be pinned
 	 * at this point until we release them.
 	 *
 	 * Fail silently without starting the shrinker
@@ -2582,7 +2591,8 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			i915_gem_shrink(dev_priv, 2 * page_count, NULL, *s++);
 			cond_resched();
 
-			/* We've tried hard to allocate the memory by reaping
+			/*
+			 * We've tried hard to allocate the memory by reaping
 			 * our own buffer, now let the real VM do its job and
 			 * go down in flames if truly OOM.
 			 *
@@ -2594,7 +2604,8 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 				/* reclaim and warn, but no oom */
 				gfp = mapping_gfp_mask(mapping);
 
-				/* Our bo are always dirty and so we require
+				/*
+				 * Our bo are always dirty and so we require
 				 * kswapd to reclaim our pages (direct reclaim
 				 * does not effectively begin pageout of our
 				 * buffers on its own). However, direct reclaim
@@ -2638,7 +2649,8 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 
 	ret = i915_gem_gtt_prepare_pages(obj, st);
 	if (ret) {
-		/* DMA remapping failed? One possible cause is that
+		/*
+		 * DMA remapping failed? One possible cause is that
 		 * it could not reserve enough large entries, asking
 		 * for PAGE_SIZE chunks instead may be helpful.
 		 */
@@ -2672,7 +2684,8 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	sg_free_table(st);
 	kfree(st);
 
-	/* shmemfs first checks if there is enough memory to allocate the page
+	/*
+	 * shmemfs first checks if there is enough memory to allocate the page
 	 * and reports ENOSPC should there be insufficient, along with the usual
 	 * ENOMEM for a genuine allocation failure.
 	 *
-- 
2.19.0.rc1
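
The early check amounts to a single comparison made before any allocation is attempted. A minimal sketch, assuming a 4 KiB page size and a caller-supplied RAM size (the names `MODEL_PAGE_SIZE` and `check_object_fits()` are made up for this example; the patch itself compares against totalram_pages):

```c
#include <assert.h>
#include <errno.h>

/* Assumed page size for the sketch; real systems may differ. */
#define MODEL_PAGE_SIZE 4096ull

/*
 * Reject an object that can never be fully backed by RAM.
 * size_bytes is expected to be a multiple of the page size.
 */
int check_object_fits(unsigned long long size_bytes,
		      unsigned long long total_ram_pages)
{
	unsigned long long page_count;

	assert(size_bytes % MODEL_PAGE_SIZE == 0);
	page_count = size_bytes / MODEL_PAGE_SIZE;

	/* No chance of allocating enough pages: bail before trying. */
	if (page_count > total_ram_pages)
		return -ENOMEM;

	return 0;
}
```

An object exactly as large as RAM still passes this test; it only filters out requests that are impossible to satisfy, leaving the borderline cases to the normal allocation path.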


* [PATCH 5/5] drm/i915: Forcibly flush unwanted requests in drop-caches
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
                   ` (2 preceding siblings ...)
  2018-09-03  8:33 ` [PATCH 4/5] drm/i915: Early rejection of buffer allocations larger than RAM Chris Wilson
@ 2018-09-03  8:33 ` Chris Wilson
  2018-09-03 10:24   ` Joonas Lahtinen
  2018-09-03  8:42 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Do a full device reset after being wedged Patchwork
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2018-09-03  8:33 UTC (permalink / raw)
  To: intel-gfx

Add a mode to debugfs/drop-caches to flush unwanted requests off the GPU
(by wedging the device and resetting). This is very useful if a test
terminated leaving a long queue of hanging batches that would ordinarily
require a round trip through hangcheck for each.

It reduces the inter-test operation to just a write into drop-caches to
reset driver/GPU state between tests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 52 ++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a5265c236a33..4ad0e2ed8610 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4131,13 +4131,17 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_ring_test_irq_fops,
 #define DROP_FREED	BIT(4)
 #define DROP_SHRINK_ALL	BIT(5)
 #define DROP_IDLE	BIT(6)
+#define DROP_RESET_ACTIVE	BIT(7)
+#define DROP_RESET_SEQNO	BIT(8)
 #define DROP_ALL (DROP_UNBOUND	| \
 		  DROP_BOUND	| \
 		  DROP_RETIRE	| \
 		  DROP_ACTIVE	| \
 		  DROP_FREED	| \
 		  DROP_SHRINK_ALL |\
-		  DROP_IDLE)
+		  DROP_IDLE	| \
+		  DROP_RESET_ACTIVE | \
+		  DROP_RESET_SEQNO)
 static int
 i915_drop_caches_get(void *data, u64 *val)
 {
@@ -4149,53 +4153,69 @@ i915_drop_caches_get(void *data, u64 *val)
 static int
 i915_drop_caches_set(void *data, u64 val)
 {
-	struct drm_i915_private *dev_priv = data;
-	struct drm_device *dev = &dev_priv->drm;
+	struct drm_i915_private *i915 = data;
 	int ret = 0;
 
 	DRM_DEBUG("Dropping caches: 0x%08llx [0x%08llx]\n",
 		  val, val & DROP_ALL);
 
+	if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
+		i915_gem_set_wedged(i915);
+
 	/* No need to check and wait for gpu resets, only libdrm auto-restarts
 	 * on ioctls on -EAGAIN. */
-	if (val & (DROP_ACTIVE | DROP_RETIRE)) {
-		ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (val & (DROP_ACTIVE | DROP_RETIRE | DROP_RESET_SEQNO)) {
+		ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
 		if (ret)
 			return ret;
 
 		if (val & DROP_ACTIVE)
-			ret = i915_gem_wait_for_idle(dev_priv,
+			ret = i915_gem_wait_for_idle(i915,
 						     I915_WAIT_INTERRUPTIBLE |
 						     I915_WAIT_LOCKED,
 						     MAX_SCHEDULE_TIMEOUT);
 
+		if (val & DROP_RESET_SEQNO) {
+			intel_runtime_pm_get(i915);
+			ret = i915_gem_set_global_seqno(&i915->drm, 1);
+			intel_runtime_pm_put(i915);
+		}
+
 		if (val & DROP_RETIRE)
-			i915_retire_requests(dev_priv);
+			i915_retire_requests(i915);
 
-		mutex_unlock(&dev->struct_mutex);
+		mutex_unlock(&i915->drm.struct_mutex);
+	}
+
+	if (val & DROP_RESET_ACTIVE &&
+	    i915_terminally_wedged(&i915->gpu_error)) {
+		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
+		wait_on_bit(&i915->gpu_error.flags,
+			    I915_RESET_HANDOFF,
+			    TASK_UNINTERRUPTIBLE);
 	}
 
 	fs_reclaim_acquire(GFP_KERNEL);
 	if (val & DROP_BOUND)
-		i915_gem_shrink(dev_priv, LONG_MAX, NULL, I915_SHRINK_BOUND);
+		i915_gem_shrink(i915, LONG_MAX, NULL, I915_SHRINK_BOUND);
 
 	if (val & DROP_UNBOUND)
-		i915_gem_shrink(dev_priv, LONG_MAX, NULL, I915_SHRINK_UNBOUND);
+		i915_gem_shrink(i915, LONG_MAX, NULL, I915_SHRINK_UNBOUND);
 
 	if (val & DROP_SHRINK_ALL)
-		i915_gem_shrink_all(dev_priv);
+		i915_gem_shrink_all(i915);
 	fs_reclaim_release(GFP_KERNEL);
 
 	if (val & DROP_IDLE) {
 		do {
-			if (READ_ONCE(dev_priv->gt.active_requests))
-				flush_delayed_work(&dev_priv->gt.retire_work);
-			drain_delayed_work(&dev_priv->gt.idle_work);
-		} while (READ_ONCE(dev_priv->gt.awake));
+			if (READ_ONCE(i915->gt.active_requests))
+				flush_delayed_work(&i915->gt.retire_work);
+			drain_delayed_work(&i915->gt.idle_work);
+		} while (READ_ONCE(i915->gt.awake));
 	}
 
 	if (val & DROP_FREED)
-		i915_gem_drain_freed_objects(dev_priv);
+		i915_gem_drain_freed_objects(i915);
 
 	return ret;
 }
-- 
2.19.0.rc1
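
The conditions that the new flags add to i915_drop_caches_set() can be modelled as three predicates. This is a sketch only: the helper names are invented, and the bit positions of DROP_RETIRE and DROP_ACTIVE are assumed here because the hunk above shows only bits 4 to 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BIT(n)      (UINT64_C(1) << (n))

#define DROP_RETIRE       MODEL_BIT(2) /* bit position assumed */
#define DROP_ACTIVE       MODEL_BIT(3) /* bit position assumed */
#define DROP_IDLE         MODEL_BIT(6)
#define DROP_RESET_ACTIVE MODEL_BIT(7)
#define DROP_RESET_SEQNO  MODEL_BIT(8)

static_assert(DROP_RESET_SEQNO == 0x100, "DROP_RESET_SEQNO is bit 8");

/* Step 1: wedge the device only if work is still outstanding. */
bool should_wedge(uint64_t val, bool engines_idle)
{
	return (val & DROP_RESET_ACTIVE) && !engines_idle;
}

/* Step 2: these modes run under struct_mutex in the patch. */
bool needs_struct_mutex(uint64_t val)
{
	return (val & (DROP_ACTIVE | DROP_RETIRE | DROP_RESET_SEQNO)) != 0;
}

/* Step 3: a wedged device is recovered through a full reset. */
bool should_full_reset(uint64_t val, bool terminally_wedged)
{
	return (val & DROP_RESET_ACTIVE) && terminally_wedged;
}
```

Step 3 relies on patch 1/5: once the device is terminally wedged, i915_handle_error() takes the full device reset path, which is what clears the wedged state again.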


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Do a full device reset after being wedged
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
                   ` (3 preceding siblings ...)
  2018-09-03  8:33 ` [PATCH 5/5] drm/i915: Forcibly flush unwanted requests in drop-caches Chris Wilson
@ 2018-09-03  8:42 ` Patchwork
  2018-09-03  8:59 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2018-09-03  8:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Do a full device reset after being wedged
URL   : https://patchwork.freedesktop.org/series/49061/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
4125e8b6522d drm/i915: Do a full device reset after being wedged
81b274ed9f78 drm/i915: Flag any possible writes for a GTT fault
537b66e108d7 drm/i915: Force the slow path after a user-write error
-:25: WARNING:LONG_LINE: line over 100 characters
#25: FILE: drivers/gpu/drm/i915/i915_gem_execbuffer.c:1489:
+				if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {

-:25: CHECK:SPACING: spaces preferred around that '-' (ctx:VxV)
#25: FILE: drivers/gpu/drm/i915/i915_gem_execbuffer.c:1489:
+				if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {
 				                                          ^

total: 0 errors, 1 warnings, 1 checks, 33 lines checked
592099574554 drm/i915: Early rejection of buffer allocations larger than RAM
f7020f7c61c7 drm/i915: Forcibly flush unwanted requests in drop-caches


* ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Do a full device reset after being wedged
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
                   ` (4 preceding siblings ...)
  2018-09-03  8:42 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Do a full device reset after being wedged Patchwork
@ 2018-09-03  8:59 ` Patchwork
  2018-09-03  9:56 ` [PATCH 1/5] " Joonas Lahtinen
  2018-09-03 10:29 ` ✓ Fi.CI.IGT: success for series starting with [1/5] " Patchwork
  7 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2018-09-03  8:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Do a full device reset after being wedged
URL   : https://patchwork.freedesktop.org/series/49061/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4753 -> Patchwork_10068 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/49061/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10068:

  === IGT changes ===

    ==== Warnings ====

    {igt@pm_rpm@module-reload}:
      fi-hsw-4770r:       PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_10068 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@amdgpu/amd_basic@userptr:
      fi-kbl-8809g:       PASS -> INCOMPLETE (fdo#107402)

    igt@drv_module_reload@basic-reload-inject:
      fi-hsw-4770r:       PASS -> DMESG-WARN (fdo#107425)

    {igt@pm_rpm@module-reload}:
      fi-glk-dsi:         NOTRUN -> WARN (fdo#107708, fdo#107602)
      fi-cnl-psr:         PASS -> WARN (fdo#107708, fdo#107602)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s4-devices:
      fi-blb-e6850:       INCOMPLETE (fdo#107718) -> PASS

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       FAIL (fdo#102672, fdo#103841) -> SKIP

    igt@kms_frontbuffer_tracking@basic:
      fi-hsw-peppy:       DMESG-WARN (fdo#102614) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      {fi-byt-clapper}:   FAIL (fdo#107362, fdo#103191) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    igt@prime_vgem@basic-fence-flip:
      fi-ilk-650:         FAIL (fdo#104008) -> PASS

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
  fdo#107402 https://bugs.freedesktop.org/show_bug.cgi?id=107402
  fdo#107425 https://bugs.freedesktop.org/show_bug.cgi?id=107425
  fdo#107602 https://bugs.freedesktop.org/show_bug.cgi?id=107602
  fdo#107708 https://bugs.freedesktop.org/show_bug.cgi?id=107708
  fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718


== Participating hosts (49 -> 48) ==

  Additional (3): fi-cfl-8109u fi-glk-dsi fi-skl-6700hq 
  Missing    (4): fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4753 -> Patchwork_10068

  CI_DRM_4753: 0892613a442a70a96cba33b12bb344033b557879 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4619: 9e5fa9112546e5767d57237db8eace7c815b1996 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10068: f7020f7c61c7f49683d92ce208121a80b22206c2 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

f7020f7c61c7 drm/i915: Forcibly flush unwanted requests in drop-caches
592099574554 drm/i915: Early rejection of buffer allocations larger than RAM
537b66e108d7 drm/i915: Force the slow path after a user-write error
81b274ed9f78 drm/i915: Flag any possible writes for a GTT fault
4125e8b6522d drm/i915: Do a full device reset after being wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10068/issues.html

* Re: [PATCH 1/5] drm/i915: Do a full device reset after being wedged
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
                   ` (5 preceding siblings ...)
  2018-09-03  8:59 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-09-03  9:56 ` Joonas Lahtinen
  2018-09-03 10:29 ` ✓ Fi.CI.IGT: success for series starting with [1/5] " Patchwork
  7 siblings, 0 replies; 14+ messages in thread
From: Joonas Lahtinen @ 2018-09-03  9:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-09-03 11:33:33)
> We only call unset_wedged on the global reset path (since it's a global
> operation), so if we are terminally wedged and wish to reset, take the
> full device reset path rather than the quicker individual engine resets.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

<SNIP>

> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3309,7 +3309,8 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>          * Try engine reset when available. We fall back to full reset if
>          * single reset fails.
>          */
> -       if (intel_has_reset_engine(dev_priv)) {
> +       if (intel_has_reset_engine(dev_priv) &&
> +           !i915_terminally_wedged(&dev_priv->gpu_error)) {

NOT terminally wedged AND can reset individually reads clearer, but
either way:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

* Re: [PATCH 3/5] drm/i915: Force the slow path after a user-write error
  2018-09-03  8:33 ` [PATCH 3/5] drm/i915: Force the slow path after a user-write error Chris Wilson
@ 2018-09-03 10:08   ` Joonas Lahtinen
  0 siblings, 0 replies; 14+ messages in thread
From: Joonas Lahtinen @ 2018-09-03 10:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-09-03 11:33:35)
> If we fail to write the user relocation back when it is changed, force
> ourselves to take the slow relocation path where we can handle faults in
> the write path. There is still an element of dubiousness: having
> patched up the batch to use the correct offset, the batch no longer
> matches the presumed_offset in the relocation, so a second pass may miss
> any changes in layout.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

* Re: [PATCH 4/5] drm/i915: Early rejection of buffer allocations larger than RAM
  2018-09-03  8:33 ` [PATCH 4/5] drm/i915: Early rejection of buffer allocations larger than RAM Chris Wilson
@ 2018-09-03 10:10   ` Joonas Lahtinen
  0 siblings, 0 replies; 14+ messages in thread
From: Joonas Lahtinen @ 2018-09-03 10:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-09-03 11:33:36)
> We currently try to pin and allocate the whole buffer at once. If that
> object is larger than RAM, we will try to pin the whole of physical
> memory, force the machine into OOM, and then still fail the allocation.
> 
> If the request is obviously too large, error out early. We opt to do
> this in the backend to make it easy to use alternate paths that do not
> require the entire object to be pinned, or that can easily handle proxy
> objects that are larger than physical memory.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

* Re: [PATCH 5/5] drm/i915: Forcibly flush unwanted requests in drop-caches
  2018-09-03  8:33 ` [PATCH 5/5] drm/i915: Forcibly flush unwanted requests in drop-caches Chris Wilson
@ 2018-09-03 10:24   ` Joonas Lahtinen
  0 siblings, 0 replies; 14+ messages in thread
From: Joonas Lahtinen @ 2018-09-03 10:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-09-03 11:33:37)
> Add a mode to debugfs/drop-caches to flush unwanted requests off the GPU
> (by wedging the device and resetting). This is very useful if a test
> terminated leaving a long queue of hanging batches that would ordinarily
> require a round trip through hangcheck for each.
> 
> It reduces the inter-test operation to just a write into drop-caches to
> reset driver/GPU state between tests.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

* ✓ Fi.CI.IGT: success for series starting with [1/5] drm/i915: Do a full device reset after being wedged
  2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
                   ` (6 preceding siblings ...)
  2018-09-03  9:56 ` [PATCH 1/5] " Joonas Lahtinen
@ 2018-09-03 10:29 ` Patchwork
  2018-09-03 11:05   ` Chris Wilson
  7 siblings, 1 reply; 14+ messages in thread
From: Patchwork @ 2018-09-03 10:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Do a full device reset after being wedged
URL   : https://patchwork.freedesktop.org/series/49061/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4753_full -> Patchwork_10068_full =

== Summary - SUCCESS ==

  No regressions found.

  

== Known issues ==

  Here are the changes found in Patchwork_10068_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-glk:          PASS -> INCOMPLETE (k.org#198133, fdo#103359)

    igt@kms_setmode@basic:
      shard-apl:          PASS -> FAIL (fdo#99912)

    igt@perf@blocking:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    
    ==== Possible fixes ====

    igt@kms_vblank@pipe-b-ts-continuation-suspend:
      shard-apl:          INCOMPLETE (fdo#103927) -> PASS

    igt@perf@polling:
      shard-hsw:          FAIL (fdo#102252) -> PASS

    
  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4753 -> Patchwork_10068

  CI_DRM_4753: 0892613a442a70a96cba33b12bb344033b557879 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4619: 9e5fa9112546e5767d57237db8eace7c815b1996 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10068: f7020f7c61c7f49683d92ce208121a80b22206c2 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10068/shards.html

* Re: [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault
  2018-09-03  8:33 ` [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault Chris Wilson
@ 2018-09-03 10:30   ` Joonas Lahtinen
  0 siblings, 0 replies; 14+ messages in thread
From: Joonas Lahtinen @ 2018-09-03 10:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-09-03 11:33:34)
> We do not explicitly mark the PTE for the user's GTT mmap as being
> wrprotect, so we don't get a refault when we would need to change a
> read-only mapping into read-write. As such, we must presume that if the
> vma has PROT_WRITE it may be written to. Although this is supposed to be
> indicated by set-domain, there are cases (e.g. after swap) where
> userspace may not be aware of the implicit domain change.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

* Re: ✓ Fi.CI.IGT: success for series starting with [1/5] drm/i915: Do a full device reset after being wedged
  2018-09-03 10:29 ` ✓ Fi.CI.IGT: success for series starting with [1/5] " Patchwork
@ 2018-09-03 11:05   ` Chris Wilson
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2018-09-03 11:05 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Quoting Patchwork (2018-09-03 11:29:24)
> == Series Details ==
> 
> Series: series starting with [1/5] drm/i915: Do a full device reset after being wedged
> URL   : https://patchwork.freedesktop.org/series/49061/
> State : success
> 
> == Summary ==
> 
> = CI Bug Log - changes from CI_DRM_4753_full -> Patchwork_10068_full =
> 
> == Summary - SUCCESS ==
> 
>   No regressions found.

A peaceful change. Thanks for the review,
-Chris

Thread overview: 14+ messages
2018-09-03  8:33 [PATCH 1/5] drm/i915: Do a full device reset after being wedged Chris Wilson
2018-09-03  8:33 ` [PATCH 2/5] drm/i915: Flag any possible writes for a GTT fault Chris Wilson
2018-09-03 10:30   ` Joonas Lahtinen
2018-09-03  8:33 ` [PATCH 3/5] drm/i915: Force the slow path after a user-write error Chris Wilson
2018-09-03 10:08   ` Joonas Lahtinen
2018-09-03  8:33 ` [PATCH 4/5] drm/i915: Early rejection of buffer allocations larger than RAM Chris Wilson
2018-09-03 10:10   ` Joonas Lahtinen
2018-09-03  8:33 ` [PATCH 5/5] drm/i915: Forcibly flush unwanted requests in drop-caches Chris Wilson
2018-09-03 10:24   ` Joonas Lahtinen
2018-09-03  8:42 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/5] drm/i915: Do a full device reset after being wedged Patchwork
2018-09-03  8:59 ` ✓ Fi.CI.BAT: success " Patchwork
2018-09-03  9:56 ` [PATCH 1/5] " Joonas Lahtinen
2018-09-03 10:29 ` ✓ Fi.CI.IGT: success for series starting with [1/5] " Patchwork
2018-09-03 11:05   ` Chris Wilson
