* [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
@ 2017-01-26 15:59 Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-26 15:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

When writing some testcases for nonblocking modesets, I found out that
the infinite wait on the old fb was causing issues.

A nonblocking modeset with a hang on the old fb, followed by a blocking
update, was enough to trigger it.

For old platforms patch 2 is needed, else
kms_busy.extended-modeset-hang-oldfb-*
will put the GPU in a bad state when the 90s timeout in swap_state is
reached; for newer platforms patch 3 fixes this. The display doesn't need
to be reset when there are no CS flips and the GPU reset doesn't clobber
the display.

If you have a newer platform (gen4+), the display reset can be
simulated with the kms_busy.extended-modeset-hang-oldfb-with-reset-* test.

Maarten Lankhorst (3):
  drm/atomic: Bump timeout for waiting for hw_done to 90s in swap_state
  drm/i915: Set a timeout when waiting for fence on the old fb
  drm/i915: Skip modeset locking when atomic pageflips are used.

 drivers/gpu/drm/drm_atomic_helper.c  | 12 ++++++++----
 drivers/gpu/drm/i915/intel_display.c | 14 +++++++++++++-
 2 files changed, 21 insertions(+), 5 deletions(-)

-- 
2.7.4


* [PATCH 1/3] drm/atomic: Bump timeout for waiting for hw_done to 90s in swap_state
  2017-01-26 15:59 [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly Maarten Lankhorst
@ 2017-01-26 15:59 ` Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-26 15:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

Bump the timeout from 10s to 90s and treat it as a single budget shared
across all CRTCs: each wait consumes part of the remaining time, and the
loop now bails out on the first timeout instead of waiting the full
period again for every remaining CRTC.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/drm_atomic_helper.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index b535f782b2c0..34338678db26 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1977,7 +1977,6 @@ void drm_atomic_helper_swap_state(struct drm_atomic_state *state,
 				  bool stall)
 {
 	int i;
-	long ret;
 	struct drm_connector *connector;
 	struct drm_connector_state *old_conn_state, *new_conn_state;
 	struct drm_crtc *crtc;
@@ -1987,6 +1986,8 @@ void drm_atomic_helper_swap_state(struct drm_atomic_state *state,
 	struct drm_crtc_commit *commit;
 
 	if (stall) {
+		long ret = 90*HZ;
+
 		for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
 			spin_lock(&crtc->commit_lock);
 			commit = list_first_entry_or_null(&crtc->commit_list,
@@ -1999,11 +2000,14 @@ void drm_atomic_helper_swap_state(struct drm_atomic_state *state,
 				continue;
 
 			ret = wait_for_completion_timeout(&commit->hw_done,
-							  10*HZ);
-			if (ret == 0)
+							  ret);
+			drm_crtc_commit_put(commit);
+
+			if (ret == 0) {
 				DRM_ERROR("[CRTC:%d:%s] hw_done timed out\n",
 					  crtc->base.id, crtc->name);
-			drm_crtc_commit_put(commit);
+				break;
+			}
 		}
 	}
 
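For reference, this is the resulting stall loop once the patch is
applied, reconstructed from the two hunks above (the context elided
between them, the commit lookup and get under commit_lock, is
approximated from the surrounding helper):

	if (stall) {
		long ret = 90*HZ;

		for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
			spin_lock(&crtc->commit_lock);
			commit = list_first_entry_or_null(&crtc->commit_list,
					struct drm_crtc_commit, commit_entry);
			if (commit)
				drm_crtc_commit_get(commit);
			spin_unlock(&crtc->commit_lock);

			if (!commit)
				continue;

			/*
			 * ret doubles as the remaining budget: each wait
			 * consumes part of the shared 90s, so the total
			 * stall stays bounded no matter how many CRTCs
			 * are pending.
			 */
			ret = wait_for_completion_timeout(&commit->hw_done,
							  ret);
			drm_crtc_commit_put(commit);

			if (ret == 0) {
				DRM_ERROR("[CRTC:%d:%s] hw_done timed out\n",
					  crtc->base.id, crtc->name);
				break;
			}
		}
	}
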
-- 
2.7.4


* [PATCH 2/3] drm/i915: Set a timeout when waiting for fence on the old fb
  2017-01-26 15:59 [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly Maarten Lankhorst
  2017-01-26 15:59 ` [PATCH 1/3] drm/atomic: Bump timeout for waiting for hw_done to 90s in swap_state Maarten Lankhorst
@ 2017-01-26 15:59 ` Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-26 15:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

The wait on the old fb's fence during a modeset was infinite, so a hang
on the old fb could stall the commit forever. Bound the wait with
I915_RESET_TIMEOUT so hang recovery can kick in instead.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 7c19b3c7a4ee..35d25e58a37e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -14797,7 +14797,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		if (needs_modeset(crtc_state)) {
 			ret = i915_sw_fence_await_reservation(&intel_state->commit_ready,
 							      old_obj->resv, NULL,
-							      false, 0,
+							      false, I915_RESET_TIMEOUT,
 							      GFP_KERNEL);
 			if (ret < 0)
 				return ret;
-- 
2.7.4


* [PATCH 3/3] drm/i915: Skip modeset locking when atomic pageflips are used.
  2017-01-26 15:59 [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly Maarten Lankhorst
  2017-01-26 15:59 ` [PATCH 1/3] drm/atomic: Bump timeout for waiting for hw_done to 90s in swap_state Maarten Lankhorst
  2017-01-26 15:59 ` [PATCH 2/3] drm/i915: Set a timeout when waiting for fence on the old fb Maarten Lankhorst
@ 2017-01-26 15:59 ` Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-26 15:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel

With the atomic helper for pageflips there are no CS flips that
require resetting, so on most platforms we can completely skip the
locking.

Because we may end up reverting the page_flip patch, add an explicit
function pointer check so that nothing breaks if someone reverts the
page flip patch and forgets about this code.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 35d25e58a37e..bf5891fc1369 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3519,6 +3519,8 @@ static bool gpu_reset_clobbers_display(struct drm_i915_private *dev_priv)
 		INTEL_GEN(dev_priv) < 5 && !IS_G4X(dev_priv);
 }
 
+static const struct drm_crtc_funcs intel_crtc_funcs;
+
 void intel_prepare_reset(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = &dev_priv->drm;
@@ -3526,6 +3528,11 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	struct drm_atomic_state *state;
 	int ret;
 
+	if (intel_crtc_funcs.page_flip == drm_atomic_helper_page_flip &&
+	    !i915.force_reset_modeset_test &&
+	    !gpu_reset_clobbers_display(dev_priv))
+		return;
+
 	/*
 	 * Need mode_config.mutex so that we don't
 	 * trample ongoing ->detect() and whatnot.
@@ -3574,6 +3581,11 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 	struct drm_atomic_state *state = dev_priv->modeset_restore_state;
 	int ret;
 
+	if (intel_crtc_funcs.page_flip == drm_atomic_helper_page_flip &&
+	    !i915.force_reset_modeset_test &&
+	    !gpu_reset_clobbers_display(dev_priv))
+		return;
+
 	/*
 	 * Flips in the rings will be nuked by the reset,
 	 * so complete all pending flips so that user space
-- 
2.7.4


* Re: [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-26 15:59 [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly Maarten Lankhorst
  2017-01-26 15:59 ` [PATCH 3/3] drm/i915: Skip modeset locking when atomic pageflips are used Maarten Lankhorst
@ 2017-01-26 16:39 ` Ville Syrjälä
From: Ville Syrjälä @ 2017-01-26 16:39 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-gfx, dri-devel

On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> When writing some testcases for nonblocking modesets. I found out that the
> infinite wait on the old fb was causing issues.
> 
> A nonblocking modeset with a hang on the old fb, followed by a blocking
> update was enough to trigger it.
> 
> For old platforms patch 2 is needed else
> kms_busy.extended-modeset-hang-oldfb-*
> will put the gpu in a bad state when the 90s in swap_state is reached,
> for newer platforms patch 3 will fix this. The display doesn't need to be
> reset when there are no CS flips and no display reset.
> 
> If you have a newer platform, (gen4+) then the display reset can be

g4x+

> simulated with the kms_busy.extended-modeset-hang-oldfb-with-reset-* test.
> 
> Maarten Lankhorst (3):
>   drm/atomic: Bump timeout for waiting for hw_done to 90s in swap_state
>   drm/i915: Set a timeout when waiting for fence on the old fb
>   drm/i915: Skip modeset locking when atomic pageflips are used.
> 
>  drivers/gpu/drm/drm_atomic_helper.c  | 12 ++++++++----
>  drivers/gpu/drm/i915/intel_display.c | 14 +++++++++++++-
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> -- 
> 2.7.4
> 

-- 
Ville Syrjälä
Intel OTC

* Re: [Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-26 15:59 [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly Maarten Lankhorst
  2017-01-26 16:39 ` [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly Ville Syrjälä
@ 2017-01-27  9:30 ` Chris Wilson
From: Chris Wilson @ 2017-01-27  9:30 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-gfx, dri-devel

On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> When writing some testcases for nonblocking modesets. I found out that the
> infinite wait on the old fb was causing issues.

The crux of the issue here is the locked wait for old dependencies and
the inability to inject the intel_prepare_reset disabling of all planes.
There are a couple of locked waits on struct_mutex within the modeset
locks: for intel_overlay, and for when we happen to be using the display
plane for the first time.

The first I suggested solving by using fences to track dependencies and
keep the order between atomic states. Cancelling the outstanding
modesets, replacing with a disable and then on restore jumping to the
final state looks doable. It also requires avoiding the struct_mutex for
disabling, which is quite easy. To avoid the wait under struct_mutex,
we've talked about switching to mmio, but for starters we could move the
wait from inside intel_overlay into the fence for the atomic operation.
(But that's a little more surgery than we would like for intel_overlay I
guess - dig out Ville's patches for overlay planes?) And to prevent the
wait under struct_mutex for pin_to_display_plane, my plan is to move
that to an async fenced operation that is then naturally waited upon by
the atomic modeset.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-27  9:30 ` [Intel-gfx] " Chris Wilson
@ 2017-01-27 14:21   ` Daniel Vetter
From: Daniel Vetter @ 2017-01-27 14:21 UTC (permalink / raw)
  To: Chris Wilson, Maarten Lankhorst, intel-gfx, dri-devel

On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > When writing some testcases for nonblocking modesets. I found out that the
> > infinite wait on the old fb was causing issues.
> 
> The crux of the issue here is the locked wait for old dependencies and
> the inability to inject the intel_prepare_reset disabling of all planes.
> There are a couple of locked waits on struct_mutex within the modeset
> locks for intel_overlay and if we happen to be using the display plane
> for the first time.
> 
> The first I suggested solving using fences to track dependencies and
> keep the order between atomic states. Cancelling the outstanding
> modesets, replacing with a disable and then on restore jumping to the
> final state look doable. It also requires avoiding the struct_mutex for
> disabling, which is quite easy. To avoid the wait under struct_mutex,
> we've talked about switching to mmio, but for starters we could move the
> wait from inside intel_overlay into the fence for the atomic operation.
> (But's that a little more surgery than we would like for intel_overlay I
> guess - dig out Ville's patches for overlay planes?) And to prevent the
> wait under struct_mutex for pin_to_display_plane, my plane is to move
> that to an async fenced operation that is then naturally waited upon by
> the atomic modeset.

A bit more of a hack, but a different idea, and I think a hack for
gen2/3/4 is ok:

We complete all the requests before we start the hw reset with fence.error
= -EIO. But we do this only when we need to get at the display locks. A
slightly more elegant solution would be to trylock modeset locks, and if
one of them fails (and only then) complete all requests with -EIO to get
the concurrent modeset to proceed before we reset the hardware. That's
essentially the logic we had before all the reworks, and it worked. But I
didn't look at how scary that all would be to make it work again ...
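
In rough code, the trylock variant could look like this (a sketch only:
complete_all_requests() is a hypothetical stand-in for failing every
outstanding request with fence.error = -EIO, and mode_config.mutex
stands in for the full set of modeset locks):

	static void reset_trylock_sketch(struct drm_i915_private *dev_priv)
	{
		struct drm_device *dev = &dev_priv->drm;

		if (!mutex_trylock(&dev->mode_config.mutex)) {
			/*
			 * A concurrent modeset holds the locks, possibly
			 * stuck waiting on a hung request: complete all
			 * requests with -EIO so it can finish, then block
			 * on the lock ourselves.
			 */
			complete_all_requests(dev_priv, -EIO); /* hypothetical */
			mutex_lock(&dev->mode_config.mutex);
		}
	}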
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-27 14:21   ` Daniel Vetter
@ 2017-01-27 14:31     ` Chris Wilson
From: Chris Wilson @ 2017-01-27 14:31 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, dri-devel

On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > When writing some testcases for nonblocking modesets. I found out that the
> > > infinite wait on the old fb was causing issues.
> > 
> > The crux of the issue here is the locked wait for old dependencies and
> > the inability to inject the intel_prepare_reset disabling of all planes.
> > There are a couple of locked waits on struct_mutex within the modeset
> > locks for intel_overlay and if we happen to be using the display plane
> > for the first time.
> > 
> > The first I suggested solving using fences to track dependencies and
> > keep the order between atomic states. Cancelling the outstanding
> > modesets, replacing with a disable and then on restore jumping to the
> > final state look doable. It also requires avoiding the struct_mutex for
> > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > we've talked about switching to mmio, but for starters we could move the
> > wait from inside intel_overlay into the fence for the atomic operation.
> > (But's that a little more surgery than we would like for intel_overlay I
> > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > wait under struct_mutex for pin_to_display_plane, my plane is to move
> > that to an async fenced operation that is then naturally waited upon by
> > the atomic modeset.
> 
> A bit more a hack, but a different idea, and I think hack for gen234.0 is
> ok:
> 
> We complete all the requests before we start the hw reset with fence.error
> = -EIO. But we do this only when we need to get at the display locks. A
> slightly more elegant solution would be to trylock modeset locks, and if
> one of them fails (and only then) complete all requests with -EIO to get
> the concurrent modeset to proceed before we reset the hardware. That's
> essentially the logic we had before all the reworks, and it worked. But I
> didn't look at how scary that all would be to make it work again ...

The holder of the modeset lock may not just be waiting on our requests
(even on pnv we can expect that there are already users celebrating that
pnv+nouveau finally works ;), and the display is not the only
user/observer of those requests. Using the requests to break the modeset
lock just feels like the wrong approach.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-27 14:31     ` Chris Wilson
@ 2017-01-27 14:58       ` Daniel Vetter
From: Daniel Vetter @ 2017-01-27 14:58 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Maarten Lankhorst, intel-gfx, dri-devel

On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> > On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > > When writing some testcases for nonblocking modesets. I found out that the
> > > > infinite wait on the old fb was causing issues.
> > > 
> > > The crux of the issue here is the locked wait for old dependencies and
> > > the inability to inject the intel_prepare_reset disabling of all planes.
> > > There are a couple of locked waits on struct_mutex within the modeset
> > > locks for intel_overlay and if we happen to be using the display plane
> > > for the first time.
> > > 
> > > The first I suggested solving using fences to track dependencies and
> > > keep the order between atomic states. Cancelling the outstanding
> > > modesets, replacing with a disable and then on restore jumping to the
> > > final state look doable. It also requires avoiding the struct_mutex for
> > > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > > we've talked about switching to mmio, but for starters we could move the
> > > wait from inside intel_overlay into the fence for the atomic operation.
> > > (But's that a little more surgery than we would like for intel_overlay I
> > > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > > wait under struct_mutex for pin_to_display_plane, my plane is to move
> > > that to an async fenced operation that is then naturally waited upon by
> > > the atomic modeset.
> > 
> > A bit more a hack, but a different idea, and I think hack for gen234.0 is
> > ok:
> > 
> > We complete all the requests before we start the hw reset with fence.error
> > = -EIO. But we do this only when we need to get at the display locks. A
> > slightly more elegant solution would be to trylock modeset locks, and if
> > one of them fails (and only then) complete all requests with -EIO to get
> > the concurrent modeset to proceed before we reset the hardware. That's
> > essentially the logic we had before all the reworks, and it worked. But I
> > didn't look at how scary that all would be to make it work again ...
> 
> The modeset lock may not just be waiting on our requests (even on pnv we
> can expect that there are already users celebrating that pnv+nouveau
> finally works ;) and that the display is not the only user/observer of
> those requests. Using the requests to break the modeset lock just feels
> like the wrong approach.

It's a cycle, and we need to break it somewhere. Another option might be
to break the cycle the same way we do it for gem locks: Wake up everyone
and restart the modeset ioctl. Since the trouble only happens for
synchronous modesets where we hold the locks while waiting for fences, we
can also break out of that and restart. And I also don't think that would
leak to other drivers; after all, our gem locking restart dances also don't
leak to other drivers - it's just our own driver's locks which are affected
by these special wakeup semantics.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-27 14:58       ` [Intel-gfx] " Daniel Vetter
@ 2017-01-27 15:08         ` Chris Wilson
From: Chris Wilson @ 2017-01-27 15:08 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, dri-devel

On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> > On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> > > On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > > > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > > > When writing some testcases for nonblocking modesets. I found out that the
> > > > > infinite wait on the old fb was causing issues.
> > > > 
> > > > The crux of the issue here is the locked wait for old dependencies and
> > > > the inability to inject the intel_prepare_reset disabling of all planes.
> > > > There are a couple of locked waits on struct_mutex within the modeset
> > > > locks for intel_overlay and if we happen to be using the display plane
> > > > for the first time.
> > > > 
> > > > The first I suggested solving using fences to track dependencies and
> > > > keep the order between atomic states. Cancelling the outstanding
> > > > modesets, replacing with a disable and then on restore jumping to the
> > > > final state look doable. It also requires avoiding the struct_mutex for
> > > > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > > > we've talked about switching to mmio, but for starters we could move the
> > > > wait from inside intel_overlay into the fence for the atomic operation.
> > > > (But's that a little more surgery than we would like for intel_overlay I
> > > > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > > > wait under struct_mutex for pin_to_display_plane, my plane is to move
> > > > that to an async fenced operation that is then naturally waited upon by
> > > > the atomic modeset.
> > > 
> > > A bit more a hack, but a different idea, and I think hack for gen234.0 is
> > > ok:
> > > 
> > > We complete all the requests before we start the hw reset with fence.error
> > > = -EIO. But we do this only when we need to get at the display locks. A
> > > slightly more elegant solution would be to trylock modeset locks, and if
> > > one of them fails (and only then) complete all requests with -EIO to get
> > > the concurrent modeset to proceed before we reset the hardware. That's
> > > essentially the logic we had before all the reworks, and it worked. But I
> > > didn't look at how scary that all would be to make it work again ...
> > 
> > The modeset lock may not just be waiting on our requests (even on pnv we
> > can expect that there are already users celebrating that pnv+nouveau
> > finally works ;) and that the display is not the only user/observer of
> > those requests. Using the requests to break the modeset lock just feels
> > like the wrong approach.
> 
> It's a cycle, and we need to break it somewhere. Another option might be
> to break the cycle the same way we do it for gem locks: Wake up everyone
> and restart the modeset ioctl. Since the trouble only happens for
> synchronous modesets where we hold the locks while waiting for fences, we
> can also break out of that and restart. And I also don't think that would
> leak to other drivers, after all our gem locking restart dances also don't
> leak to other drivers - it's just our own driver's lock which are affected
> by these special wakupe semantics.

It's a queue of nonblocking modesets that we need to worry about, afaik.
Moving the wait for a blocking modeset outside of the modeset locks is
easily achievable (and I have at least an idea for avoiding the other
waits under both the modeset locks and struct_mutex). So the challenge is how to
inject all-planes-off for gen3 and then allow the queue to continue again
afterwards.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-27 15:08         ` Chris Wilson
@ 2017-01-30  8:17           ` Daniel Vetter
From: Daniel Vetter @ 2017-01-30  8:17 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Maarten Lankhorst, intel-gfx, dri-devel

On Fri, Jan 27, 2017 at 03:08:45PM +0000, Chris Wilson wrote:
> On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
> > On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> > > On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> > > > On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> > > > > On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> > > > > > When writing some testcases for nonblocking modesets. I found out that the
> > > > > > infinite wait on the old fb was causing issues.
> > > > > 
> > > > > The crux of the issue here is the locked wait for old dependencies and
> > > > > the inability to inject the intel_prepare_reset disabling of all planes.
> > > > > There are a couple of locked waits on struct_mutex within the modeset
> > > > > locks for intel_overlay and if we happen to be using the display plane
> > > > > for the first time.
> > > > > 
> > > > > The first I suggested solving using fences to track dependencies and
> > > > > keep the order between atomic states. Cancelling the outstanding
> > > > > modesets, replacing with a disable and then on restore jumping to the
> > > > > final state look doable. It also requires avoiding the struct_mutex for
> > > > > disabling, which is quite easy. To avoid the wait under struct_mutex,
> > > > > we've talked about switching to mmio, but for starters we could move the
> > > > > wait from inside intel_overlay into the fence for the atomic operation.
> > > > > (But's that a little more surgery than we would like for intel_overlay I
> > > > > guess - dig out Ville's patches for overlay planes?) And to prevent the
> > > > > wait under struct_mutex for pin_to_display_plane, my plane is to move
> > > > > that to an async fenced operation that is then naturally waited upon by
> > > > > the atomic modeset.
> > > > 
> > > > A bit more a hack, but a different idea, and I think hack for gen234.0 is
> > > > ok:
> > > > 
> > > > We complete all the requests before we start the hw reset with fence.error
> > > > = -EIO. But we do this only when we need to get at the display locks. A
> > > > slightly more elegant solution would be to trylock modeset locks, and if
> > > > one of them fails (and only then) complete all requests with -EIO to get
> > > > the concurrent modeset to proceed before we reset the hardware. That's
> > > > essentially the logic we had before all the reworks, and it worked. But I
> > > > didn't look at how scary that all would be to make it work again ...
> > > 
> > > The modeset lock may not just be waiting on our requests (even on pnv we
> > > can expect that there are already users celebrating that pnv+nouveau
> > > finally works ;) and that the display is not the only user/observer of
> > > those requests. Using the requests to break the modeset lock just feels
> > > like the wrong approach.
> > 
> > It's a cycle, and we need to break it somewhere. Another option might be
> > to break the cycle the same way we do it for gem locks: Wake up everyone
> > and restart the modeset ioctl. Since the trouble only happens for
> > synchronous modesets where we hold the locks while waiting for fences, we
> > can also break out of that and restart. And I also don't think that would
> > leak to other drivers, after all our gem locking restart dances also don't
> > leak to other drivers - it's just our own driver's lock which are affected
> > by these special wakupe semantics.
> 
> It's a queue of nonblocking modesets that we need to worry about, afaik.
> Moving the wait for blocking modeset outside of modeset lock is easily
> achievable (and avoiding the other waits under both the modeset + 
> struct_mutex I have at least an idea for). So the challenge is how to
> inject all-planes-off for gen3 and then allow the queue to continue again
> afterwards.

Hm right, I missed the nonblocking updates which don't take locks. But
assuming we do the display reset for gpu resets as a full modeset (i.e.
going through ->atomic_commit) it should still work out correctly:

Starting state: gpu is hung, nonblocking modeset waiting for some requests
to complete.

1. hangcheck kicks in, fires off reset work.

2. We complete all requests with fence.error = -EIO and wake up any
waiters. That means no re-queueing for older platforms, but oh well.

3. We grab all the display locks. Nothing happens yet.

4. We reset the chip, display dies.

5. We run ->atomic_commit to restore things. This will also force the
nonblocking commit worker to complete before this display restore touches
anything.

The only trouble I see is that the nonblocking worker can still touch the
display block while we kill it, which isn't awesome. But we can fix that
by waiting for all pending nonblocking commits in step 3 manually (without
calling into atomic_commit), as long as we do step 2.

So completing everything with EIO unconditionally still seems like the
simplest option that actually works for pre-g4x ...
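
As reset-work pseudocode (the helpers are hypothetical stand-ins, not
existing i915 functions; the numbers refer to the steps above):

	static void gpu_reset_flow_sketch(struct drm_i915_private *dev_priv)
	{
		struct drm_device *dev = &dev_priv->drm;

		/* 2. Complete all requests with fence.error = -EIO,
		 *    waking any waiters. */
		complete_all_requests(dev_priv, -EIO);	/* hypothetical */

		/* 3. Grab the display locks and manually flush pending
		 *    nonblocking commits so the worker cannot touch the
		 *    hardware during the reset. */
		drm_modeset_lock_all(dev);
		flush_nonblocking_commits(dev);		/* hypothetical */

		/* 4. Reset the chip; on pre-g4x the display dies. */
		do_gpu_reset(dev_priv);			/* hypothetical */

		/* 5. Restore the display with a full modeset through
		 *    ->atomic_commit. */
		restore_display_state(dev_priv);	/* hypothetical */

		drm_modeset_unlock_all(dev);
	}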
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-30  8:17           ` Daniel Vetter
@ 2017-01-30 14:42             ` Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-30 14:42 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, intel-gfx, dri-devel

On 30-01-17 at 09:17, Daniel Vetter wrote:
> On Fri, Jan 27, 2017 at 03:08:45PM +0000, Chris Wilson wrote:
>> On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
>>> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
>>>> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
>>>>> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
>>>>>> On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
>>>>>>> When writing some testcases for nonblocking modesets. I found out that the
>>>>>>> infinite wait on the old fb was causing issues.
>>>>>> The crux of the issue here is the locked wait for old dependencies and
>>>>>> the inability to inject the intel_prepare_reset disabling of all planes.
>>>>>> There are a couple of locked waits on struct_mutex within the modeset
>>>>>> locks for intel_overlay and if we happen to be using the display plane
>>>>>> for the first time.
>>>>>>
>>>>>> The first I suggested solving using fences to track dependencies and
>>>>>> keep the order between atomic states. Cancelling the outstanding
>>>>>> modesets, replacing with a disable and then on restore jumping to the
>>>>>> final state look doable. It also requires avoiding the struct_mutex for
>>>>>> disabling, which is quite easy. To avoid the wait under struct_mutex,
>>>>>> we've talked about switching to mmio, but for starters we could move the
>>>>>> wait from inside intel_overlay into the fence for the atomic operation.
>>>>>> (But's that a little more surgery than we would like for intel_overlay I
>>>>>> guess - dig out Ville's patches for overlay planes?) And to prevent the
>>>>>> wait under struct_mutex for pin_to_display_plane, my plane is to move
>>>>>> that to an async fenced operation that is then naturally waited upon by
>>>>>> the atomic modeset.
>>>>> A bit more a hack, but a different idea, and I think hack for gen234.0 is
>>>>> ok:
>>>>>
>>>>> We complete all the requests before we start the hw reset with fence.error
>>>>> = -EIO. But we do this only when we need to get at the display locks. A
>>>>> slightly more elegant solution would be to trylock modeset locks, and if
>>>>> one of them fails (and only then) complete all requests with -EIO to get
>>>>> the concurrent modeset to proceed before we reset the hardware. That's
>>>>> essentially the logic we had before all the reworks, and it worked. But I
>>>>> didn't look at how scary that all would be to make it work again ...
>>>> The modeset lock may not just be waiting on our requests (even on pnv we
>>>> can expect that there are already users celebrating that pnv+nouveau
>>>> finally works ;) and that the display is not the only user/observer of
>>>> those requests. Using the requests to break the modeset lock just feels
>>>> like the wrong approach.
>>> It's a cycle, and we need to break it somewhere. Another option might be
>>> to break the cycle the same way we do it for gem locks: Wake up everyone
>>> and restart the modeset ioctl. Since the trouble only happens for
>>> synchronous modesets where we hold the locks while waiting for fences, we
>>> can also break out of that and restart. And I also don't think that would
>>> leak to other drivers, after all our gem locking restart dances also don't
>>> leak to other drivers - it's just our own driver's lock which are affected
>>> by these special wakupe semantics.
>> It's a queue of nonblocking modesets that we need to worry about, afaik.
>> Moving the wait for blocking modeset outside of modeset lock is easily
>> achievable (and avoiding the other waits under both the modeset + 
>> struct_mutex I have at least an idea for). So the challenge is how to
>> inject all-planes-off for gen3 and then allow the queue to continue again
>> afterwards.
> Hm right, I missed the nonblocking updates which don't take locks. But
> assuming we do the display reset for gpu resets as a full modeset (i.e.
> going through ->atomic_commit) it should still work out correctly:
>
> Starting state: gpu is hung, nonblocking modeset waiting for some requests
> to complete.
Missing one evil detail here, else things would have moved forward:

An unrelated thread performs a blocking commit, and holds all locks until the nonblocking modeset completes.
> 1. hangcheck kicks in, fires off reset work.
>
> 2. We complete all requests with fence.error = -EIO and wake up any
> waiters. That means no re-queueing for older platforms, but oh well.
>
> 3. We grab all the display locks. Nothing happens yet.
>
> 4. We reset the chip, display dies.
>
> 5. We run ->atomic_commit to restore things. This will also force the
> nonblocking commit worker to complete before this display restore touches
> anything.
>
> The only trouble I see is that the nonblocking worker can still touch the
> display block while we kill it, which isn't awesome. But we can fix that
> by waiting for all pending nonblocking commits in step 3 manually (without
> calling into atomic_commit), as long as we do step 2.
>
> So completing everything with EIO unconditionally still seems like the
> simplest option that actually works for pre-g4x ...
> -Daniel



* [PATCH] drm/i915: Skip modeset locking when atomic pageflips are used.
  2017-01-30  8:17           ` Daniel Vetter
  2017-01-30 14:42             ` Maarten Lankhorst
@ 2017-01-30 15:25             ` Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-30 15:25 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, intel-gfx, dri-devel

With the atomic helper for pageflips there are no CS flips that
require resetting, so on most platforms we can completely skip the
locking.

Because we may end up reverting the page_flip patch, add an explicit
function pointer check so that nothing breaks if someone reverts the
page flip patch and forgets about this code.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Testcase: kms_busy.extended-modeset-hang-oldfb-*
---
This is a standalone patch to fix modeset hangs on g4x+.

The path for gen4 and lower is simulated in kms_busy.extended-modeset-hang-oldfb-with-reset
and still fails.

 drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index d8db1caec1b8..1dd480a6752a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3529,6 +3529,8 @@ static bool gpu_reset_clobbers_display(struct drm_i915_private *dev_priv)
 		INTEL_GEN(dev_priv) < 5 && !IS_G4X(dev_priv);
 }
 
+static const struct drm_crtc_funcs intel_crtc_funcs;
+
 void intel_prepare_reset(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = &dev_priv->drm;
@@ -3536,6 +3538,11 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	struct drm_atomic_state *state;
 	int ret;
 
+	if (intel_crtc_funcs.page_flip == drm_atomic_helper_page_flip &&
+	    !i915.force_reset_modeset_test &&
+	    !gpu_reset_clobbers_display(dev_priv))
+		return;
+
 	/*
 	 * Need mode_config.mutex so that we don't
 	 * trample ongoing ->detect() and whatnot.
@@ -3584,6 +3591,11 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 	struct drm_atomic_state *state = dev_priv->modeset_restore_state;
 	int ret;
 
+	if (intel_crtc_funcs.page_flip == drm_atomic_helper_page_flip &&
+	    !i915.force_reset_modeset_test &&
+	    !gpu_reset_clobbers_display(dev_priv))
+		return;
+
 	/*
 	 * Flips in the rings will be nuked by the reset,
 	 * so complete all pending flips so that user space
-- 
2.7.4



* Re: [Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-30 14:42             ` Maarten Lankhorst
@ 2017-01-31  7:46               ` Daniel Vetter
From: Daniel Vetter @ 2017-01-31  7:46 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: intel-gfx, dri-devel

On Mon, Jan 30, 2017 at 03:42:17PM +0100, Maarten Lankhorst wrote:
> On 30-01-17 at 09:17, Daniel Vetter wrote:
> > On Fri, Jan 27, 2017 at 03:08:45PM +0000, Chris Wilson wrote:
> >> On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
> >>> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> >>>> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> >>>>> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> >>>>>> On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> >>>>>>> When writing some testcases for nonblocking modesets. I found out that the
> >>>>>>> infinite wait on the old fb was causing issues.
> >>>>>> The crux of the issue here is the locked wait for old dependencies and
> >>>>>> the inability to inject the intel_prepare_reset disabling of all planes.
> >>>>>> There are a couple of locked waits on struct_mutex within the modeset
> >>>>>> locks for intel_overlay and if we happen to be using the display plane
> >>>>>> for the first time.
> >>>>>>
> >>>>>> The first I suggested solving using fences to track dependencies and
> >>>>>> keep the order between atomic states. Cancelling the outstanding
> >>>>>> modesets, replacing with a disable and then on restore jumping to the
> >>>>>> final state look doable. It also requires avoiding the struct_mutex for
> >>>>>> disabling, which is quite easy. To avoid the wait under struct_mutex,
> >>>>>> we've talked about switching to mmio, but for starters we could move the
> >>>>>> wait from inside intel_overlay into the fence for the atomic operation.
> >>>>>> (But's that a little more surgery than we would like for intel_overlay I
> >>>>>> guess - dig out Ville's patches for overlay planes?) And to prevent the
> >>>>>> wait under struct_mutex for pin_to_display_plane, my plane is to move
> >>>>>> that to an async fenced operation that is then naturally waited upon by
> >>>>>> the atomic modeset.
> >>>>> A bit more a hack, but a different idea, and I think hack for gen234.0 is
> >>>>> ok:
> >>>>>
> >>>>> We complete all the requests before we start the hw reset with fence.error
> >>>>> = -EIO. But we do this only when we need to get at the display locks. A
> >>>>> slightly more elegant solution would be to trylock modeset locks, and if
> >>>>> one of them fails (and only then) complete all requests with -EIO to get
> >>>>> the concurrent modeset to proceed before we reset the hardware. That's
> >>>>> essentially the logic we had before all the reworks, and it worked. But I
> >>>>> didn't look at how scary that all would be to make it work again ...
> >>>> The modeset lock may not just be waiting on our requests (even on pnv we
> >>>> can expect that there are already users celebrating that pnv+nouveau
> >>>> finally works ;) and that the display is not the only user/observer of
> >>>> those requests. Using the requests to break the modeset lock just feels
> >>>> like the wrong approach.
> >>> It's a cycle, and we need to break it somewhere. Another option might be
> >>> to break the cycle the same way we do it for gem locks: Wake up everyone
> >>> and restart the modeset ioctl. Since the trouble only happens for
> >>> synchronous modesets where we hold the locks while waiting for fences, we
> >>> can also break out of that and restart. And I also don't think that would
> >>> leak to other drivers, after all our gem locking restart dances also don't
> >>> leak to other drivers - it's just our own driver's lock which are affected
> >>> by these special wakupe semantics.
> >> It's a queue of nonblocking modesets that we need to worry about, afaik.
> >> Moving the wait for blocking modeset outside of modeset lock is easily
> >> achievable (and avoiding the other waits under both the modeset + 
> >> struct_mutex I have at least an idea for). So the challenge is how to
> >> inject all-planes-off for gen3 and then allow the queue to continue again
> >> afterwards.
> > Hm right, I missed the nonblocking updates which don't take locks. But
> > assuming we do the display reset for gpu resets as a full modeset (i.e.
> > going through ->atomic_commit) it should still work out correctly:
> >
> > Starting state: gpu is hung, nonblocking modeset waiting for some requests
> > to complete.
> Missing one evil detail here, else things would have moved forward..
> 
> A unrelated thread performs a blocking commit, and holds all locks until the nonblocking modeset completes.

And where is the problem in that? If we first set all fences to -EIO, and
then try to grab locks, that other thread will be able to complete. After
all, this scheme worked before we reworked the reset logic completely.
-Daniel

> > 1. hangcheck kicks in, fires off reset work.
> >
> > 2. We complete all requests with fence.error = -EIO and wake up any
> > waiters. That means no re-queueing for older platforms, but oh well.
> >
> > 3. We grab all the display locks. Nothing happens yet.
> >
> > 4. We reset the chip, display dies.
> >
> > 5. We run ->atomic_commit to restore things. This will also force the
> > nonblocking commit worker to complete before this display restore touches
> > anything.
> >
> > The only trouble I see is that the nonblocking worker can still touch the
> > display block while we kill it, which isn't awesome. But we can fix that
> > by waiting for all pending nonblocking commits in step 3 manually (without
> > calling into atomic_commit), as long as we do step 2.
> >
> > So completing everything with EIO unconditionally still seems like the
> > simplest option that actually works for pre-g4x ...
> > -Daniel
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [Intel-gfx] [PATCH 0/3] drm/i915: Handle hanging during nonblocking modeset correctly.
  2017-01-31  7:46               ` [Intel-gfx] " Daniel Vetter
@ 2017-01-31  9:11                 ` Maarten Lankhorst
From: Maarten Lankhorst @ 2017-01-31  9:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, dri-devel

Hey,

On 31-01-17 at 08:46, Daniel Vetter wrote:
> On Mon, Jan 30, 2017 at 03:42:17PM +0100, Maarten Lankhorst wrote:
>> On 30-01-17 at 09:17, Daniel Vetter wrote:
>>> On Fri, Jan 27, 2017 at 03:08:45PM +0000, Chris Wilson wrote:
>>>> On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
>>>>> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
>>>>>> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
>>>>>>> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
>>>>>>>> On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
>>>>>>>>> When writing some testcases for nonblocking modesets. I found out that the
>>>>>>>>> infinite wait on the old fb was causing issues.
>>>>>>>> The crux of the issue here is the locked wait for old dependencies and
>>>>>>>> the inability to inject the intel_prepare_reset disabling of all planes.
>>>>>>>> There are a couple of locked waits on struct_mutex within the modeset
>>>>>>>> locks for intel_overlay and if we happen to be using the display plane
>>>>>>>> for the first time.
>>>>>>>>
>>>>>>>> The first I suggested solving using fences to track dependencies and
>>>>>>>> keep the order between atomic states. Cancelling the outstanding
>>>>>>>> modesets, replacing with a disable and then on restore jumping to the
>>>>>>>> final state look doable. It also requires avoiding the struct_mutex for
>>>>>>>> disabling, which is quite easy. To avoid the wait under struct_mutex,
>>>>>>>> we've talked about switching to mmio, but for starters we could move the
>>>>>>>> wait from inside intel_overlay into the fence for the atomic operation.
>>>>>>>> (But's that a little more surgery than we would like for intel_overlay I
>>>>>>>> guess - dig out Ville's patches for overlay planes?) And to prevent the
>>>>>>>> wait under struct_mutex for pin_to_display_plane, my plane is to move
>>>>>>>> that to an async fenced operation that is then naturally waited upon by
>>>>>>>> the atomic modeset.
>>>>>>> A bit more a hack, but a different idea, and I think hack for gen234.0 is
>>>>>>> ok:
>>>>>>>
>>>>>>> We complete all the requests before we start the hw reset with fence.error
>>>>>>> = -EIO. But we do this only when we need to get at the display locks. A
>>>>>>> slightly more elegant solution would be to trylock modeset locks, and if
>>>>>>> one of them fails (and only then) complete all requests with -EIO to get
>>>>>>> the concurrent modeset to proceed before we reset the hardware. That's
>>>>>>> essentially the logic we had before all the reworks, and it worked. But I
>>>>>>> didn't look at how scary that all would be to make it work again ...
>>>>>> The modeset lock may not just be waiting on our requests (even on pnv we
>>>>>> can expect that there are already users celebrating that pnv+nouveau
>>>>>> finally works ;) and that the display is not the only user/observer of
>>>>>> those requests. Using the requests to break the modeset lock just feels
>>>>>> like the wrong approach.
>>>>> It's a cycle, and we need to break it somewhere. Another option might be
>>>>> to break the cycle the same way we do it for gem locks: Wake up everyone
>>>>> and restart the modeset ioctl. Since the trouble only happens for
>>>>> synchronous modesets where we hold the locks while waiting for fences, we
>>>>> can also break out of that and restart. And I also don't think that would
>>>>> leak to other drivers, after all our gem locking restart dances also don't
>>>>> leak to other drivers - it's just our own driver's lock which are affected
>>>>> by these special wakupe semantics.
>>>> It's a queue of nonblocking modesets that we need to worry about, afaik.
>>>> Moving the wait for blocking modeset outside of modeset lock is easily
>>>> achievable (and avoiding the other waits under both the modeset + 
>>>> struct_mutex I have at least an idea for). So the challenge is how to
>>>> inject all-planes-off for gen3 and then allow the queue to continue again
>>>> afterwards.
>>> Hm right, I missed the nonblocking updates which don't take locks. But
>>> assuming we do the display reset for gpu resets as a full modeset (i.e.
>>> going through ->atomic_commit) it should still work out correctly:
>>>
>>> Starting state: gpu is hung, nonblocking modeset waiting for some requests
>>> to complete.
>> Missing one evil detail here, else things would have moved forward..
>>
>> A unrelated thread performs a blocking commit, and holds all locks until the nonblocking modeset completes.
> And where is the problem in that? If we first set all fences to -EIO, and
> then try to grab locks, that other thread will be able to complete. After
> all this scheme worked before we reworked the reset logic completely.
True, but we probably still want to cap the timeout (patch 2) to prevent a deadlock when a fence on another driver misbehaves.

And if we have a timeout, then things will move forward eventually even if we wait for locks, though it might still be a good idea to complete everything with -EIO first
to make it happen faster. :)
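
A capped wait could look roughly like so (an illustrative sketch; the
10s cap is an arbitrary example value, and dma_fence_wait_timeout()
returns the remaining jiffies, or 0 on timeout):

	long timeout = msecs_to_jiffies(10000); /* arbitrary example cap */

	timeout = dma_fence_wait_timeout(fence, false, timeout);
	if (timeout == 0)
		DRM_ERROR("fence wait timed out, continuing to avoid deadlock\n");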

~Maarten
