All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Daniel Vetter" <daniel@ffwll.ch>,
	"Ville Syrjälä" <ville.syrjala@linux.intel.com>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	stable <stable@vger.kernel.org>
Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore
Date: Mon, 3 Jul 2017 18:42:33 +0200	[thread overview]
Message-ID: <20170703164233.5zyeqv2ot4ihhprg@phenom.ffwll.local> (raw)
In-Reply-To: <149906981450.32733.596187619908591705@mail.alporthouse.com>

On Mon, Jul 03, 2017 at 09:16:54AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-07-03 09:03:36)
> > On Fri, Jun 30, 2017 at 5:39 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >> Yeah, but my point is that this here is an extremely fancy and fragile
> > >> (and afaics in this form, incomplete) fix for something that in the past
> > >> was fixed much, much simpler. Why do we need this massive amount of
> > >> complexity now? Who's asking for all this (we don't even have anyone yet
> > >> asking for fully queued atomic commits, much less on gen4)?
> > >
> > > It was never "fixed", it was always borked; broken by design.
> > 
> > Hm, what was broken by design in gen3/4 reset? We never bothered to
> > resubmit rendering when the gpu died, but besides that I'm not aware
> > of a deisgn issue in that logic. We nuked in-flight pageflips (and
> > restored those), and we stalled for any pending modesets (grabbing
> > locks did that since all modesets where blocking), and that made sure
> > the hw was in a consistent state.
> 
> KMS reset was taking mutexes within reset without any means of breaking the
> inherent deadlock, instead relying on something else to magically fix
> it. We only ever engineered around struct_mutex for reset, the remains
> of the deadlock upon struct_mutex within modeset is an issue but not the
> one causing trouble here.

So on the kms side we've had:
- grab kms locks -> grab struct_mutex -> wait for rendering

- on the reset side we've had the same order afair, with the caveat that
  we've broken the "wait for rendering" complete before trying to grab any
  of the locks in the reset path.

I thought that the problem is that gpu reset stopping force-completing
everything asap, which then lead the kms side to time-out at a point where
it shouldn't and start to fall over.

So before

commit 221fe7994554cc3985fc5d761ed7e44dcae0fa52
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 9 14:11:51 2016 +0100

    drm/i915: Perform a direct reset of the GPU from the waiter

what was the deadlock we've had? Besides breaking the "wait for rendering"
I don't remember any inversions we've had. And for the breaking we've had
a fairly complex dance of barriers and reset_in_progress and waking up
waiters to make sure it would catch them all ...

> Please do note the bugs that indicate that even without modeset reset,
> hw is not in a consistent state.

So there's more bugs, not sure how that is relevant for gpu reset?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel <dri-devel@lists.freedesktop.org>,
	stable <stable@vger.kernel.org>,
	intel-gfx <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v4 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore
Date: Mon, 3 Jul 2017 18:42:33 +0200	[thread overview]
Message-ID: <20170703164233.5zyeqv2ot4ihhprg@phenom.ffwll.local> (raw)
In-Reply-To: <149906981450.32733.596187619908591705@mail.alporthouse.com>

On Mon, Jul 03, 2017 at 09:16:54AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-07-03 09:03:36)
> > On Fri, Jun 30, 2017 at 5:39 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >> Yeah, but my point is that this here is an extremely fancy and fragile
> > >> (and afaics in this form, incomplete) fix for something that in the past
> > >> was fixed much, much simpler. Why do we need this massive amount of
> > >> complexity now? Who's asking for all this (we don't even have anyone yet
> > >> asking for fully queued atomic commits, much less on gen4)?
> > >
> > > It was never "fixed", it was always borked; broken by design.
> > 
> > Hm, what was broken by design in gen3/4 reset? We never bothered to
> > resubmit rendering when the gpu died, but besides that I'm not aware
> > of a deisgn issue in that logic. We nuked in-flight pageflips (and
> > restored those), and we stalled for any pending modesets (grabbing
> > locks did that since all modesets where blocking), and that made sure
> > the hw was in a consistent state.
> 
> KMS reset was taking mutexes within reset without any means of breaking the
> inherent deadlock, instead relying on something else to magically fix
> it. We only ever engineered around struct_mutex for reset, the remains
> of the deadlock upon struct_mutex within modeset is an issue but not the
> one causing trouble here.

So on the kms side we've had:
- grab kms locks -> grab struct_mutex -> wait for rendering

- on the reset side we've had the same order afair, with the caveat that
  we've broken the "wait for rendering" complete before trying to grab any
  of the locks in the reset path.

I thought that the problem is that gpu reset stopping force-completing
everything asap, which then lead the kms side to time-out at a point where
it shouldn't and start to fall over.

So before

commit 221fe7994554cc3985fc5d761ed7e44dcae0fa52
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 9 14:11:51 2016 +0100

    drm/i915: Perform a direct reset of the GPU from the waiter

what was the deadlock we've had? Besides breaking the "wait for rendering"
I don't remember any inversions we've had. And for the breaking we've had
a fairly complex dance of barriers and reset_in_progress and waking up
waiters to make sure it would catch them all ...

> Please do note the bugs that indicate that even without modeset reset,
> hw is not in a consistent state.

So there's more bugs, not sure how that is relevant for gpu reset?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2017-07-03 16:42 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-29 13:49 [PATCH 0/5] drm/i915: Fix pre-g4x GPU reset, again ville.syrjala
2017-06-29 13:49 ` [PATCH 1/5] drm/atomic: Refactor drm_atomic_state_realloc_connectors() ville.syrjala
2017-06-30 13:15   ` Daniel Vetter
2017-06-29 13:49 ` [PATCH 2/5] drm/atomic: Introduce drm_atomic_helper_duplicate_committed_state() ville.syrjala
2017-06-29 13:49 ` [PATCH 3/5] drm/i915% Store vma gtt offset in plane state ville.syrjala
2017-06-29 13:49 ` [PATCH 4/5] drm/i915: Refactor __intel_atomic_commit_tail() ville.syrjala
2017-06-29 13:49 ` [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore ville.syrjala
2017-06-29 13:56   ` Chris Wilson
2017-06-29 14:36   ` [PATCH v4 " ville.syrjala
2017-06-29 14:36     ` ville.syrjala
2017-06-29 17:57     ` Chris Wilson
2017-06-29 19:26       ` Ville Syrjälä
2017-06-29 19:26         ` Ville Syrjälä
2017-06-30 13:35         ` Daniel Vetter
2017-06-30 13:35           ` Daniel Vetter
2017-06-30 13:53           ` Ville Syrjälä
2017-06-30 13:53             ` Ville Syrjälä
2017-06-30 15:30             ` Daniel Vetter
2017-06-30 15:30               ` Daniel Vetter
2017-06-30 15:39               ` Chris Wilson
2017-06-30 15:39                 ` Chris Wilson
2017-07-03  8:03                 ` Daniel Vetter
2017-07-03  8:03                   ` Daniel Vetter
2017-07-03  8:16                   ` Chris Wilson
2017-07-03 16:42                     ` Daniel Vetter [this message]
2017-07-03 16:42                       ` Daniel Vetter
2017-06-30 13:25   ` [PATCH " Daniel Vetter
2017-06-30 13:25     ` Daniel Vetter
2017-06-30 13:30     ` Daniel Vetter
2017-06-30 13:30       ` Daniel Vetter
2017-06-30 15:44       ` Ville Syrjälä
2017-06-30 15:44         ` Ville Syrjälä
2017-06-30 18:23         ` Daniel Vetter
2017-06-30 18:46           ` Ville Syrjälä
2017-06-30 18:46             ` Ville Syrjälä
2017-07-03  7:55             ` Daniel Vetter
2017-07-03  7:55               ` Daniel Vetter
2017-07-03  9:30               ` Ville Syrjälä
2017-07-03  9:30                 ` Ville Syrjälä
2017-07-03 16:48                 ` Daniel Vetter
2017-07-03 16:48                   ` Daniel Vetter
2017-06-29 14:24 ` ✓ Fi.CI.BAT: success for drm/i915: Fix pre-g4x GPU reset, again Patchwork
2017-06-29 15:07   ` Ville Syrjälä
2017-06-29 14:55 ` ✓ Fi.CI.BAT: success for drm/i915: Fix pre-g4x GPU reset, again (rev2) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170703164233.5zyeqv2ot4ihhprg@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=chris@chris-wilson.co.uk \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=stable@vger.kernel.org \
    --cc=ville.syrjala@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.