All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
@ 2014-01-19 21:20 Chris Wilson
  2014-01-19 21:55 ` Daniel Vetter
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2014-01-19 21:20 UTC (permalink / raw)
  To: intel-gfx

On older generations (gen2, gen3) the GPU requires fences for many
operations, such as blits. The display hardware also requires fences for
scanouts and this leads to a situation where an arbitrary number of
fences may be pinned by old scanouts following a pageflip but before we
have executed the unpin workqueue. This is unpredictable by userspace
and leads to random EDEADLK when submitting an otherwise benign
execbuffer. However, we can detect when we have an outstanding flip and
so cause userspace to wait upon their completion before finally
declaring that the system is starved of fences. This is really no worse
than forcing the GPU to stall waiting for older execbuffer to retire and
release their fences before we can reallocate them for the next
execbuffer.

Reported-and-tested-by: dimon@gmx.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 41d5a99..686620c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3214,7 +3214,7 @@ i915_find_fence_reg(struct drm_device *dev)
 	}
 
 	if (avail == NULL)
-		return NULL;
+		goto deadlock;
 
 	/* None available, try to steal one or wait for a user to finish */
 	list_for_each_entry(reg, &dev_priv->mm.fence_list, lru_list) {
@@ -3224,7 +3224,22 @@ i915_find_fence_reg(struct drm_device *dev)
 		return reg;
 	}
 
-	return NULL;
+deadlock:
+	{ /* Wait for completion of pending flips which consume fences */
+		struct intel_crtc *crtc;
+
+		list_for_each_entry(crtc, &dev->mode_config.crtc_list, base.head) {
+			if (atomic_read(&crtc->unpin_work_count) == 0)
+				continue;
+
+			if (crtc->unpin_work)
+				intel_wait_for_vblank(dev, crtc->pipe);
+
+			return ERR_PTR(-EAGAIN);
+		}
+	}
+
+	return ERR_PTR(-EDEADLK);
 }
 
 /**
@@ -3269,8 +3284,8 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 		}
 	} else if (enable) {
 		reg = i915_find_fence_reg(dev);
-		if (reg == NULL)
-			return -EDEADLK;
+		if (IS_ERR(reg))
+			return PTR_ERR(reg);
 
 		if (reg->obj) {
 			struct drm_i915_gem_object *old = reg->obj;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-19 21:20 [PATCH] drm/i915: Wait for completion of pending flips when starved of fences Chris Wilson
@ 2014-01-19 21:55 ` Daniel Vetter
  2014-01-20  9:49   ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2014-01-19 21:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Sun, Jan 19, 2014 at 10:20 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On older generations (gen2, gen3) the GPU requires fences for many
> operations, such as blits. The display hardware also requires fences for
> scanouts and this leads to a situation where an arbitrary number of
> fences may be pinned by old scanouts following a pageflip but before we
> have executed the unpin workqueue. This is unpredictable by userspace
> and leads to random EDEADLK when submitting an otherwise benign
> execbuffer. However, we can detect when we have an outstanding flip and
> so cause userspace to wait upon their completion before finally
> declaring that the system is starved of fences. This is really no worse
> than forcing the GPU to stall waiting for older execbuffer to retire and
> release their fences before we can reallocate them for the next
> execbuffer.
>
> Reported-and-tested-by: dimon@gmx.net
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

New subtest for kms_flip which submits such a blt buffer while a
pageflip is still pending? Also there's a certain chance we'll starve
the unpin work, similar to the issues around flushing the unpin work
in our pageflip implementation. Finally I think this should be
encapsulated into a little helper to be used in evict_something.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-19 21:55 ` Daniel Vetter
@ 2014-01-20  9:49   ` Chris Wilson
  2014-01-20 10:17     ` [PATCH 1/2] " Chris Wilson
  2014-01-20 10:37     ` [PATCH] " Daniel Vetter
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Wilson @ 2014-01-20  9:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sun, Jan 19, 2014 at 10:55:26PM +0100, Daniel Vetter wrote:
> On Sun, Jan 19, 2014 at 10:20 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On older generations (gen2, gen3) the GPU requires fences for many
> > operations, such as blits. The display hardware also requires fences for
> > scanouts and this leads to a situation where an arbitrary number of
> > fences may be pinned by old scanouts following a pageflip but before we
> > have executed the unpin workqueue. This is unpredictable by userspace
> > and leads to random EDEADLK when submitting an otherwise benign
> > execbuffer. However, we can detect when we have an outstanding flip and
> > so cause userspace to wait upon their completion before finally
> > declaring that the system is starved of fences. This is really no worse
> > than forcing the GPU to stall waiting for older execbuffer to retire and
> > release their fences before we can reallocate them for the next
> > execbuffer.
> >
> > Reported-and-tested-by: dimon@gmx.net
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> New subtest for kms_flip which submits such a blt buffer while a
> pageflip is still pending?

Correct.

> Also there's a certain chance we'll starve
> the unpin work, similar to the issues around flushing the unpin work
> in our pageflip implementation.

If you mean that we will never run the unpin workqueue, that's what the
implementation will fix, eventually, after a busy-spin in userspace since
set_need_resched() was removed. I can teach userspace to yield() after
an EAGAIN which seems a reasonable compromise (userspace gets a bonus
for being cooperative rather than penalized for using up its timeslice.)

> Finally I think this should be
> encapsulated into a little helper to be used in evict_something.

Yes, this should work for our ENOSPC due to pageflips as well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20  9:49   ` Chris Wilson
@ 2014-01-20 10:17     ` Chris Wilson
  2014-01-20 10:17       ` [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding Chris Wilson
                         ` (2 more replies)
  2014-01-20 10:37     ` [PATCH] " Daniel Vetter
  1 sibling, 3 replies; 14+ messages in thread
From: Chris Wilson @ 2014-01-20 10:17 UTC (permalink / raw)
  To: intel-gfx

On older generations (gen2, gen3) the GPU requires fences for many
operations, such as blits. The display hardware also requires fences for
scanouts and this leads to a situation where an arbitrary number of
fences may be pinned by old scanouts following a pageflip but before we
have executed the unpin workqueue. This is unpredictable by userspace
and leads to random EDEADLK when submitting an otherwise benign
execbuffer. However, we can detect when we have an outstanding flip and
so cause userspace to wait upon their completion before finally
declaring that the system is starved of fences. This is really no worse
than forcing the GPU to stall waiting for older execbuffer to retire and
release their fences before we can reallocate them for the next
execbuffer.

v2: move the test for a pending fb unpin to a common routine for
later reuse during eviction

Reported-and-tested-by: dimon@gmx.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c      |   13 +++++++++----
 drivers/gpu/drm/i915/intel_display.c |   24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_drv.h     |    1 +
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 41d5a99..ba99b31 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3214,7 +3214,7 @@ i915_find_fence_reg(struct drm_device *dev)
 	}
 
 	if (avail == NULL)
-		return NULL;
+		goto deadlock;
 
 	/* None available, try to steal one or wait for a user to finish */
 	list_for_each_entry(reg, &dev_priv->mm.fence_list, lru_list) {
@@ -3224,7 +3224,12 @@ i915_find_fence_reg(struct drm_device *dev)
 		return reg;
 	}
 
-	return NULL;
+deadlock:
+	/* Wait for completion of pending flips which consume fences */
+	if (intel_has_pending_fb_unpin(dev))
+		return ERR_PTR(-EAGAIN);
+
+	return ERR_PTR(-EDEADLK);
 }
 
 /**
@@ -3269,8 +3274,8 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 		}
 	} else if (enable) {
 		reg = i915_find_fence_reg(dev);
-		if (reg == NULL)
-			return -EDEADLK;
+		if (IS_ERR(reg))
+			return PTR_ERR(reg);
 
 		if (reg->obj) {
 			struct drm_i915_gem_object *old = reg->obj;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 25c01c2..03e703d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2982,6 +2982,30 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 	return pending;
 }
 
+bool intel_has_pending_fb_unpin(struct drm_device *dev)
+{
+	struct intel_crtc *crtc;
+
+	/* Note that we don't need to be called with mode_config.lock here
+	 * as our list of CRTC objects is static for the lifetime of the
+	 * device and so cannot disappear as we iterate. Similarly, we can
+	 * happily treat the predicates as racy, atomic checks as userspace
+	 * cannot claim and pin a new fb without at least acquring the
+	 * struct_mutex and so serialising with us.
+	 */
+	list_for_each_entry(crtc, &dev->mode_config.crtc_list, base.head) {
+		if (atomic_read(&crtc->unpin_work_count) == 0)
+			continue;
+
+		if (crtc->unpin_work)
+			intel_wait_for_vblank(dev, crtc->pipe);
+
+		return true;
+	}
+
+	return false;
+}
+
 static void intel_crtc_wait_for_pending_flips(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 2b72b1d..713009b 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -632,6 +632,7 @@ void intel_ddi_get_config(struct intel_encoder *encoder,
 
 /* intel_display.c */
 const char *intel_output_name(int output);
+bool intel_has_pending_fb_unpin(struct drm_device *dev);
 int intel_pch_rawclk(struct drm_device *dev);
 void intel_mark_busy(struct drm_device *dev);
 void intel_mark_fb_busy(struct drm_i915_gem_object *obj,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding
  2014-01-20 10:17     ` [PATCH 1/2] " Chris Wilson
@ 2014-01-20 10:17       ` Chris Wilson
  2014-01-21 11:03         ` Bloomfield, Jon
  2014-01-20 20:30       ` [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences Chris Wilson
  2014-01-21 11:03       ` Bloomfield, Jon
  2 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2014-01-20 10:17 UTC (permalink / raw)
  To: intel-gfx

Since an old pageflip will keep its scanout buffer object pinned until
it has executed its unpin task on the common workqueue, we can clog up
our GGTT with stale pinned objects. As we cannot flush those workqueues
without dropping our locks, we have to resort to falling back to
userspace and telling them to repeat the operation in order to have a
chance to run our workqueues and free up the required memory. If we
fail, then we are forced to report ENOSPC back to userspace causing the
operation to fail and best-case scenario is that it introduces temporary
corruption.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_evict.c |   27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index fed72a6..525b242 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,8 +27,10 @@
  */
 
 #include <drm/drmP.h>
-#include "i915_drv.h"
 #include <drm/i915_drm.h>
+
+#include "i915_drv.h"
+#include "intel_drv.h"
 #include "i915_trace.h"
 
 static bool
@@ -54,6 +56,7 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 	struct list_head eviction_list, unwind_list;
 	struct i915_vma *vma;
 	int ret = 0;
+	int pass = 0;
 
 	trace_i915_gem_evict(dev, min_size, alignment, flags);
 
@@ -120,14 +123,24 @@ none:
 	/* Can we unpin some objects such as idle hw contents,
 	 * or pending flips?
 	 */
-	ret = flags & PIN_NONBLOCK ? -ENOSPC : i915_gpu_idle(dev);
-	if (ret)
-		return ret;
+	if (flags & PIN_NONBLOCK)
+		return -ENOSPC;
 
 	/* Only idle the GPU and repeat the search once */
-	i915_gem_retire_requests(dev);
-	flags |= PIN_NONBLOCK;
-	goto search_again;
+	if (pass++ == 0) {
+		ret = i915_gpu_idle(dev);
+		if (ret)
+			return ret;
+
+		i915_gem_retire_requests(dev);
+		goto search_again;
+	}
+
+	/* If we still have pending pageflip completions, drop
+	 * back to userspace to give our workqueues time to
+	 * acquire our locks and unpin the old scanouts.
+	 */
+	return intel_has_pending_fb_unpin(dev) ? -EAGAIN : -ENOSPC;
 
 found:
 	/* drm_mm doesn't allow any other other operations while
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20  9:49   ` Chris Wilson
  2014-01-20 10:17     ` [PATCH 1/2] " Chris Wilson
@ 2014-01-20 10:37     ` Daniel Vetter
  2014-01-20 10:44       ` Chris Wilson
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2014-01-20 10:37 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Mon, Jan 20, 2014 at 09:49:24AM +0000, Chris Wilson wrote:
> On Sun, Jan 19, 2014 at 10:55:26PM +0100, Daniel Vetter wrote:
> > On Sun, Jan 19, 2014 at 10:20 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > On older generations (gen2, gen3) the GPU requires fences for many
> > > operations, such as blits. The display hardware also requires fences for
> > > scanouts and this leads to a situation where an arbitrary number of
> > > fences may be pinned by old scanouts following a pageflip but before we
> > > have executed the unpin workqueue. This is unpredictable by userspace
> > > and leads to random EDEADLK when submitting an otherwise benign
> > > execbuffer. However, we can detect when we have an outstanding flip and
> > > so cause userspace to wait upon their completion before finally
> > > declaring that the system is starved of fences. This is really no worse
> > > than forcing the GPU to stall waiting for older execbuffer to retire and
> > > release their fences before we can reallocate them for the next
> > > execbuffer.
> > >
> > > Reported-and-tested-by: dimon@gmx.net
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > New subtest for kms_flip which submits such a blt buffer while a
> > pageflip is still pending?
> 
> Correct.
> 
> > Also there's a certain chance we'll starve
> > the unpin work, similar to the issues around flushing the unpin work
> > in our pageflip implementation.
> 
> If you mean that we will never run the unpin workqueue, that's what the
> implementation will fix, eventually, after a busy-spin in userspace since
> set_need_resched() was removed. I can teach userspace to yield() after
> an EAGAIN which seems a reasonable compromise (userspace gets a bonus
> for being cooperative rather than penalized for using up its timeslice.)

yield won't help, we need to block on the work-queue draining like we do
in the pageflip code with flush_workqueue. At least we've had bug reports
in the past where someone found it intriguing to run his entire userspace
with rt prio, which ended up starving the sched_normal workqueue and so
livelocked the entire system.

Instead of busy-looping through userspace with -EAGAIN I think we should
keep all the unpin works on a spinlock-protected list and synchronously
unpin the buffers in the get_fence and evict_something paths (after the
flip completed, we've removed the unpin entry from the list and dropped
the spinlock ofc).

The only downside is that we have a notch more complexity since we need to
manually check for gpu hangs and bail out correctly if there is one. Which
means another kms_flip subtest, but that shouldn't be too much fuzz with
the combinatorial testflags we already have.

Since we don't have a test where rt threads starve our workers for the
normal pageflip code I think we can eshew that part here, too. I'll add it
to the i-g-t wishlist though for a rainy afternoon ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20 10:37     ` [PATCH] " Daniel Vetter
@ 2014-01-20 10:44       ` Chris Wilson
  2014-01-20 16:17         ` Daniel Vetter
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2014-01-20 10:44 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Jan 20, 2014 at 11:37:42AM +0100, Daniel Vetter wrote:
> On Mon, Jan 20, 2014 at 09:49:24AM +0000, Chris Wilson wrote:
> > On Sun, Jan 19, 2014 at 10:55:26PM +0100, Daniel Vetter wrote:
> > > Also there's a certain chance we'll starve
> > > the unpin work, similar to the issues around flushing the unpin work
> > > in our pageflip implementation.
> > 
> > If you mean that we will never run the unpin workqueue, that's what the
> > implementation will fix, eventually, after a busy-spin in userspace since
> > set_need_resched() was removed. I can teach userspace to yield() after
> > an EAGAIN which seems a reasonable compromise (userspace gets a bonus
> > for being cooperative rather than penalized for using up its timeslice.)
> 
> yield won't help, we need to block on the work-queue draining like we do
> in the pageflip code with flush_workqueue. At least we've had bug reports
> in the past where someone found it intriguing to run his entire userspace
> with rt prio, which ended up starving the sched_normal workqueue and so
> livelocked the entire system.

But isn't that exactly the behaviour the RT user programmed?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20 10:44       ` Chris Wilson
@ 2014-01-20 16:17         ` Daniel Vetter
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vetter @ 2014-01-20 16:17 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx

On Mon, Jan 20, 2014 at 10:44:27AM +0000, Chris Wilson wrote:
> On Mon, Jan 20, 2014 at 11:37:42AM +0100, Daniel Vetter wrote:
> > On Mon, Jan 20, 2014 at 09:49:24AM +0000, Chris Wilson wrote:
> > > On Sun, Jan 19, 2014 at 10:55:26PM +0100, Daniel Vetter wrote:
> > > > Also there's a certain chance we'll starve
> > > > the unpin work, similar to the issues around flushing the unpin work
> > > > in our pageflip implementation.
> > > 
> > > If you mean that we will never run the unpin workqueue, that's what the
> > > implementation will fix, eventually, after a busy-spin in userspace since
> > > set_need_resched() was removed. I can teach userspace to yield() after
> > > an EAGAIN which seems a reasonable compromise (userspace gets a bonus
> > > for being cooperative rather than penalized for using up its timeslice.)
> > 
> > yield won't help, we need to block on the work-queue draining like we do
> > in the pageflip code with flush_workqueue. At least we've had bug reports
> > in the past where someone found it intriguing to run his entire userspace
> > with rt prio, which ended up starving the sched_normal workqueue and so
> > livelocked the entire system.
> 
> But isn't that exactly the behaviour the RT user programmed?

Well userspace asked for pageflips and execbuffers and the kernel
delivered a deadlock. So not quite imo ;-)

I know it's a ridiculous corner-case but in general I think the kernel
should ensure that forward process of userspace requests happens and that
offloading things to workqueues is just an implementation details. Also,
epxlicit waits and locks have better debug instrumentation compared to
hand-rolled busy-loops.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20 10:17     ` [PATCH 1/2] " Chris Wilson
  2014-01-20 10:17       ` [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding Chris Wilson
@ 2014-01-20 20:30       ` Chris Wilson
  2014-01-21  9:32         ` Daniel Vetter
  2014-01-21 11:03       ` Bloomfield, Jon
  2 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2014-01-20 20:30 UTC (permalink / raw)
  To: intel-gfx

On Mon, Jan 20, 2014 at 10:17:36AM +0000, Chris Wilson wrote:
> On older generations (gen2, gen3) the GPU requires fences for many
> operations, such as blits. The display hardware also requires fences for
> scanouts and this leads to a situation where an arbitrary number of
> fences may be pinned by old scanouts following a pageflip but before we
> have executed the unpin workqueue. This is unpredictable by userspace
> and leads to random EDEADLK when submitting an otherwise benign
> execbuffer. However, we can detect when we have an outstanding flip and
> so cause userspace to wait upon their completion before finally
> declaring that the system is starved of fences. This is really no worse
> than forcing the GPU to stall waiting for older execbuffer to retire and
> release their fences before we can reallocate them for the next
> execbuffer.
> 
> v2: move the test for a pending fb unpin to a common routine for
> later reuse during eviction
> 
> Reported-and-tested-by: dimon@gmx.net
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
Testcase: i-g-t/kms_flip/flip-vs-fences
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

The testcase is only for the trivial reproduction scenario, not the more
sporadic situation involving a slightly hungry workqueue, but it is
enough to demonstrate the issue and the effectiveness of the simple
workaround (for typical userspace).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20 20:30       ` [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences Chris Wilson
@ 2014-01-21  9:32         ` Daniel Vetter
  2014-01-21  9:37           ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2014-01-21  9:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Bloomfield, Jon

On Mon, Jan 20, 2014 at 9:30 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Mon, Jan 20, 2014 at 10:17:36AM +0000, Chris Wilson wrote:
>> On older generations (gen2, gen3) the GPU requires fences for many
>> operations, such as blits. The display hardware also requires fences for
>> scanouts and this leads to a situation where an arbitrary number of
>> fences may be pinned by old scanouts following a pageflip but before we
>> have executed the unpin workqueue. This is unpredictable by userspace
>> and leads to random EDEADLK when submitting an otherwise benign
>> execbuffer. However, we can detect when we have an outstanding flip and
>> so cause userspace to wait upon their completion before finally
>> declaring that the system is starved of fences. This is really no worse
>> than forcing the GPU to stall waiting for older execbuffer to retire and
>> release their fences before we can reallocate them for the next
>> execbuffer.
>>
>> v2: move the test for a pending fb unpin to a common routine for
>> later reuse during eviction
>>
>> Reported-and-tested-by: dimon@gmx.net
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
> Testcase: i-g-t/kms_flip/flip-vs-fences
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> The testcase is only for the trivial reproduction scenario, not the more
> sporadic situation involving a slightly hungry workqueue, but it is
> enough to demonstrate the issue and the effectiveness of the simple
> workaround (for typical userspace).

Testcase convinced me, this is better than status quo ;-) I guess I'll
add a short comment though when merging that busy-looping through
EAGAIN is a but un-neat.

Jon, since this ties into the evict_something fun we've had last year
can you please review this patch plus the follow-up one?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-21  9:32         ` Daniel Vetter
@ 2014-01-21  9:37           ` Chris Wilson
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2014-01-21  9:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Jan 21, 2014 at 10:32:50AM +0100, Daniel Vetter wrote:
> On Mon, Jan 20, 2014 at 9:30 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > The testcase is only for the trivial reproduction scenario, not the more
> > sporadic situation involving a slightly hungry workqueue, but it is
> > enough to demonstrate the issue and the effectiveness of the simple
> > workaround (for typical userspace).
> 
> Testcase convinced me, this is better than status quo ;-) I guess I'll
> add a short comment though when merging that busy-looping through
> EAGAIN is a but un-neat.

On the other hand, it does reduce the RT problem into a typical priority
inversion issue - all the lock dropping and complex error case handling
is built in and does not need introduction of more fragile code. So
reducing it to a known problem is itself neat. :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding
  2014-01-20 10:17       ` [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding Chris Wilson
@ 2014-01-21 11:03         ` Bloomfield, Jon
  2014-01-21 12:19           ` Daniel Vetter
  0 siblings, 1 reply; 14+ messages in thread
From: Bloomfield, Jon @ 2014-01-21 11:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

> From: intel-gfx-bounces@lists.freedesktop.org [mailto:intel-gfx-
> bounces@lists.freedesktop.org] On Behalf Of Chris Wilson
> Sent: Monday, January 20, 2014 10:18 AM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip
> completions are outstanding
> 
> Since an old pageflip will keep its scanout buffer object pinned until it has
> executed its unpin task on the common workqueue, we can clog up our
> GGTT with stale pinned objects. As we cannot flush those workqueues
> without dropping our locks, we have to resort to falling back to userspace
> and telling them to repeat the operation in order to have a chance to run our
> workqueues and free up the required memory. If we fail, then we are forced
> to report ENOSPC back to userspace causing the operation to fail and best-
> case scenario is that it introduces temporary corruption.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences
  2014-01-20 10:17     ` [PATCH 1/2] " Chris Wilson
  2014-01-20 10:17       ` [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding Chris Wilson
  2014-01-20 20:30       ` [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences Chris Wilson
@ 2014-01-21 11:03       ` Bloomfield, Jon
  2 siblings, 0 replies; 14+ messages in thread
From: Bloomfield, Jon @ 2014-01-21 11:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

> -----Original Message-----
> From: intel-gfx-bounces@lists.freedesktop.org [mailto:intel-gfx-
> bounces@lists.freedesktop.org] On Behalf Of Chris Wilson
> Sent: Monday, January 20, 2014 10:18 AM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH 1/2] drm/i915: Wait for completion of pending
> flips when starved of fences
> 
> On older generations (gen2, gen3) the GPU requires fences for many
> operations, such as blits. The display hardware also requires fences for
> scanouts and this leads to a situation where an arbitrary number of fences
> may be pinned by old scanouts following a pageflip but before we have
> executed the unpin workqueue. This is unpredictable by userspace and leads
> to random EDEADLK when submitting an otherwise benign execbuffer.
> However, we can detect when we have an outstanding flip and so cause
> userspace to wait upon their completion before finally declaring that the
> system is starved of fences. This is really no worse than forcing the GPU to
> stall waiting for older execbuffer to retire and release their fences before we
> can reallocate them for the next execbuffer.
> 
> v2: move the test for a pending fb unpin to a common routine for later reuse
> during eviction
> 
> Reported-and-tested-by: dimon@gmx.net
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Review-by: Jon Bloomfield <jon.bloomfield@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding
  2014-01-21 11:03         ` Bloomfield, Jon
@ 2014-01-21 12:19           ` Daniel Vetter
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vetter @ 2014-01-21 12:19 UTC (permalink / raw)
  To: Bloomfield, Jon; +Cc: intel-gfx

On Tue, Jan 21, 2014 at 11:03:43AM +0000, Bloomfield, Jon wrote:
> > From: intel-gfx-bounces@lists.freedesktop.org [mailto:intel-gfx-
> > bounces@lists.freedesktop.org] On Behalf Of Chris Wilson
> > Sent: Monday, January 20, 2014 10:18 AM
> > To: intel-gfx@lists.freedesktop.org
> > Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip
> > completions are outstanding
> > 
> > Since an old pageflip will keep its scanout buffer object pinned until it has
> > executed its unpin task on the common workqueue, we can clog up our
> > GGTT with stale pinned objects. As we cannot flush those workqueues
> > without dropping our locks, we have to resort to falling back to userspace
> > and telling them to repeat the operation in order to have a chance to run our
> > workqueues and free up the required memory. If we fail, then we are forced
> > to report ENOSPC back to userspace causing the operation to fail and best-
> > case scenario is that it introduces temporary corruption.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>

Both merged, with some frobbery to make the 2nd patch apply.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-01-21 12:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-19 21:20 [PATCH] drm/i915: Wait for completion of pending flips when starved of fences Chris Wilson
2014-01-19 21:55 ` Daniel Vetter
2014-01-20  9:49   ` Chris Wilson
2014-01-20 10:17     ` [PATCH 1/2] " Chris Wilson
2014-01-20 10:17       ` [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding Chris Wilson
2014-01-21 11:03         ` Bloomfield, Jon
2014-01-21 12:19           ` Daniel Vetter
2014-01-20 20:30       ` [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences Chris Wilson
2014-01-21  9:32         ` Daniel Vetter
2014-01-21  9:37           ` Chris Wilson
2014-01-21 11:03       ` Bloomfield, Jon
2014-01-20 10:37     ` [PATCH] " Daniel Vetter
2014-01-20 10:44       ` Chris Wilson
2014-01-20 16:17         ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.