From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 00/81] modeset rework
Date: Sun, 15 Jul 2012 16:57:33 +0200
Message-ID: <20120715145733.GB5184@phenom.ffwll.local>
References: <1342016944-23395-1-git-send-email-daniel.vetter@ffwll.ch>
	<1342358035_1196@CP5-2952>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from mail-wi0-f177.google.com (mail-wi0-f177.google.com
	[209.85.212.177])
	by gabe.freedesktop.org (Postfix) with ESMTP id 048F99E752
	for <intel-gfx@lists.freedesktop.org>;
	Sun, 15 Jul 2012 07:57:29 -0700 (PDT)
Received: by wibhm11 with SMTP id hm11so1530477wib.12
	for <intel-gfx@lists.freedesktop.org>;
	Sun, 15 Jul 2012 07:57:29 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <1342358035_1196@CP5-2952>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, Intel Graphics Development <intel-gfx@lists.freedesktop.org>
List-Id: intel-gfx@lists.freedesktop.org

On Sun, Jul 15, 2012 at 02:13:41PM +0100, Chris Wilson wrote:
> On Wed, 11 Jul 2012 16:27:43 +0200, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Or: How to fix cpu edp in 81 simple steps. Admittedly this includes some minor
> > detours.
> 
> Let me quickly jump in with my demands:
> 
> That modeset should detect failure and propagate that back to the user.
> 
> The modesetting code has outgrown its origins of setting a couple of
> registers and be done, as we are now in the domain of fiddly link
> training and end-point negotations. So it is fragile, can conceivably
> fail and so the errors need to be reported rather than blindly assume
> the output is now working.
> 
> So all of these nice new WARNs I'm seeing need to return error codes as
> well.

Tbh failure propagation hasn't been front-and-center when I've done this
rework, but I've certainly noticed that we currenlty suck at this. Hence
nothing too coherent, but just a few thoughts:

- I think the WARNs should stay as-is. With the new code I only call them
  once the modeset has completely, or at least once our code thinks it has
  completed. Imo it's a really good tool to catch state-tracking goof-ups
  or modeset sequence ordering issues, not more. I.e. all the WARNs I've
  hit cleared indicated an issue in the code itself, and usually where the
  harbingers of this pesky permanent-black-screen-till-reboot issue.

- The current error code handling inherited from the crtc helpers is
  rather lacking, i.e. I wondered whether I should smash a quick patch on
  top of this series to make set_base interruptible. And then noticed that
  we only return a bool in set_base (and many other such places). So even
  improving this for some simple&know cases will be quite some work.

- I think we should also differentiate between the different reasons why a
  modeset can fail: From modes that the output doesn't support, failure in
  pinning the fb because the gpu died, global resource allocation issues
  (e.g. lack of pll, fdi links, ...) to the hw actually doing something
  wrong. Current set_mode ioctl is restricted to -ERRNO, but I wonder
  whether we should add something more flexible for the global modeset
  code (similar to how we already have tons of reasons to reject a mode in
  the ->mode_valid functions).

- For global resource allocation and reservation (get_pch, pin_and_fence
  ...) I think we should move this kind of stuff out of the crtc/encoder
  callbacks into a prepare step (maybe merged together with the
  ->mode_fixup step). That way the actual modeset code should only fail if
  the hw is in a non-cooperative mood. This is also a necessary step to
  implement Jesse's idea of a modeset configuration checker. If we'd add
  some drm modeset specific error codes the mode configuration selection
  tool could even show nice tooltips on greyed-out-modes, explaning why
  the mode doesn't work (e.g. "not enough memory bandwidth", "not enough
  plls", or some generic "not enough resources" error ...). Yeah, I can
  have dreams ...

- After all that, we'd be left with with the true hw issues like link
  training failures or stuck pipes. Judging by bug reports and own
  experience, once we're in such a state, we're stuck in such a state and
  re-doing the modeset doesn't help too much. We already try to restore
  the old mode, but if that also fails I'm not too sure what userspace
  should do with the resulting -EIO.

At least for the short-term I think that our modeset code will profits the
most if we exploit the simplified state machine of the new code and cut
down on the statespace we're handling in the code. Like this patch series
already does for DP. Also, all these WARNs should help in improving our
modeset experience indirectly, through more bug reports and their fixes
;-)

Long term I agree that we need robuster error handling, but I think
handling of hw error should be about the last item: Ime we tend to keep on
flailing once we start, and I expect the pay-off in code quality and
robustness of the previous items to be better. In any case, this is all
stuff that should sit on top of this modeset rework.

Cc'ing Ville, maybe he has some further ideas from his global modeset
work.

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48