From: Tarun Vyas <tarun.vyas@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org, dhinakaran.pandiyan@intel.com
Subject: Re: [PATCH] drm/i915: Wait for PSR exit before checking for vblank evasion for an atomic update
Date: Thu, 3 May 2018 10:08:43 -0700
Message-ID: <20180503170843.GA42101@otc-chromeosbuild-5>
In-Reply-To: <20180503165856.GB17530@intel.com>

On Thu, May 03, 2018 at 09:58:56AM -0700, Rodrigo Vivi wrote:
> On Wed, May 02, 2018 at 03:31:15PM -0700, Tarun Vyas wrote:
> > On Wed, May 02, 2018 at 01:04:06PM -0700, Vivi, Rodrigo wrote:
> > > On Wed, May 02, 2018 at 09:51:43PM +0300, Ville Syrjälä wrote:
> > > > On Wed, May 02, 2018 at 11:19:14AM -0700, Tarun Vyas wrote:
> > > > > On Mon, Apr 30, 2018 at 10:19:33AM -0700, Rodrigo Vivi wrote:
> > > > > > On Sun, Apr 29, 2018 at 09:00:18PM -0700, Tarun Vyas wrote:
> > > > > > > From: Tarun <tarun.vyas@intel.com>
> > > > > > >
> > > > > > > The PIPEDSL freezes on PSR entry and if PSR hasn't fully exited, then
> > > > > > > the pipe_update_start call schedules itself out to check back later.
> > > > > > >
> > > > > > > On ChromeOS-4.4 kernel, which is fairly up-to-date w.r.t drm/i915 but
> > > > > > > lags w.r.t core kernel code, hot plugging an external display triggers
> > > > > > > tons of "potential atomic update errors" in the dmesg, on *pipe A*. A
> > > > > > > closer analysis reveals that we try to read the scanline 3 times and
> > > > > > > eventually time out, b/c PSR hasn't fully exited, leading to a PIPEDSL
> > > > > > > stuck @ 1599. This issue is not seen on upstream kernels, b/c for *some*
> > > > > > > reason we loop inside intel_pipe_update_start for ~2+ msec which in this
> > > > > > > case is more than enough to exit PSR fully, hence an *unstuck* PIPEDSL
> > > > > > > counter, hence no error. On the other hand, the ChromeOS kernel spends
> > > > > > > ~1.1 msec looping inside intel_pipe_update_start and hence errors out
> > > > > > > b/c the source is still in PSR.
> > > > > > >
> > > > > > > Regardless, we should wait for PSR exit (if PSR is supported and active
> > > > > > > on the current pipe) before reading the PIPEDSL, b/c if we haven't
> > > > > > > fully exited PSR, then checking for vblank evasion isn't actually
> > > > > > > applicable.
> > > > > > >
> > > > > > > This scenario applies to a configuration with an additional pipe,
> > > > > > > as of now.
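
(The idea, roughly: wait for the PSR status to report idle at the top of
intel_pipe_update_start(), before the scanline loop ever samples PIPEDSL.
A minimal sketch of that placement follows; it is not the literal patch,
and the register/field names (EDP_PSR_STATUS_CTL and friends) plus the
25 ms timeout are assumptions for illustration. dev_priv and
new_crtc_state are the locals already available in that function.)

	/* Sketch only: make sure PSR is idle before trying vblank evasion. */
	if (new_crtc_state->has_psr &&
	    intel_wait_for_register(dev_priv, EDP_PSR_STATUS_CTL,
				    EDP_PSR_STATUS_STATE_MASK,
				    EDP_PSR_STATUS_STATE_IDLE, 25))
		DRM_ERROR("PSR idle timed out, atomic update may fail\n");

	/* ... existing min/max computation and scanline wait loop ... */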
> > > > > >
> > > > > > I honestly believe you're picking the wrong culprit here, by "coincidence".
> > > > > > PSR will allow DC states with the screen on, and DC states will mess up all
> > > > > > register reads....
> > > > > >
> > > > > > Probably what you are missing in your kernel is some power domain
> > > > > > grab that would keep DC_OFF and consequently allow sane reads of these
> > > > > > registers.
> > > > > >
> > > > > > Maybe Imre has a quick idea of what you could be missing on your kernel
> > > > > > that we already have on upstream one.
> > > > > >
> > > > > > Thanks,
> > > > > > Rodrigo.
> > > > > >
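(For reference, the kind of power-domain grab Rodrigo means looks roughly
like the fragment below: POWER_DOMAIN_MODESET sits in the DC_OFF well, so
holding a reference keeps DC5/DC6 from kicking in while display registers
are read. Purely illustrative, assuming a dev_priv in scope.)

	/* Keep DC states off so reads like PIPEDSL stay sane. */
	intel_display_power_get(dev_priv, POWER_DOMAIN_MODESET);

	/* ... read/program display registers ... */

	intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
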
> > > > > Thanks for the quick response, Rodrigo!
> > > > > Some key observations based on my experiments so far:
> > > > >        for (;;) {
> > > > >                 /*
> > > > >                  * prepare_to_wait() has a memory barrier, which guarantees
> > > > >                  * other CPUs can see the task state update by the time we
> > > > >                  * read the scanline.
> > > > >                  */
> > > > >                 prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > > >
> > > > >                 scanline = intel_get_crtc_scanline(crtc);
> > > > >                 if (scanline < min || scanline > max)
> > > > >                         break;
> > > > >
> > > > >                 if (timeout <= 0) {
> > > > >                         DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > > >                                   pipe_name(crtc->pipe));
> > > > >                         break;
> > > > >                 }
> > > > >
> > > > >                 local_irq_enable();
> > > > >
> > > > >                 timeout = schedule_timeout(timeout);
> > > > >
> > > > >                 local_irq_disable();
> > > > >         }
> > > > > 1. In the above loop inside pipe_update_start, the *first time* we read the PIPEDSL with PSR1 and an external display connected, it always reads 1599, for *both* kernels (upstream and ChromeOS-4.4). The PSR_STATUS also reads exactly the same for *both* kernels and shows that we haven't *fully* exited PSR.
> > > > >
> > > > > 2. The difference between the two kernels comes after this first read of the PIPEDSL. ChromeOS-4.4 spends ~1 msec inside that loop and upstream spends ~2 msec. I suspect this is because of scheduling changes between the two kernels, b/c I can't find any i915-specific code running in that loop, except for vblank processing.
> > > > >
> > > > > 3. So to summarize it, both the kernels are in the same state w.r.t PSR and PIPEDSL value when they read the PIPEDSL for the first time inside the loop. *When* the kernels *transition* to a *full PSR exit* is what is differing.
> > > 
> > > Oh! So you really are getting reliable counters....
> > > 
> > > > >
> > > > > My rationale for this patch is that the pipe_update_start function is meant to evade the ~100 usec window before a vblank, but *if* we haven't *fully* exited PSR (which is true for both kernels on the first PIPEDSL read), then vblank evasion is *not applicable* b/c the PIPEDSL will be messed up. So we shouldn't bother evading the vblank until we have fully exited PSR.
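
(For context on the evasion window: upstream derives it from the mode
timings roughly as below, with VBLANK_EVASION_TIME_US being ~100 usec.
With PSR active and PIPEDSL frozen at 1599, the reported scanline sits
inside [min, max] until the loop's timeout fires.)

	/* Roughly how intel_pipe_update_start() computes the window to avoid. */
	min = vblank_start - intel_usecs_to_scanlines(adjusted_mode,
						      VBLANK_EVASION_TIME_US);
	max = vblank_start - 1;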
> > > >
> > > > Yeah, I think this is the right direction. The problem really is the
> > > > extra vblank pulse that the hardware generates (or at least can
> > > > generate depending on a chicken bit) when it exits PSR. We have no
> > > > control over when that happens and hence we have no control over when
> > > > the registers get latched. And yet we still have to somehow prevent
> > > > the register latching from occurring while we're in the middle of
> > > > reprogramming them.
> > > 
> > > I see the problem now. Thanks for the explanation.
> > > 
> > > >
> > > > There are a couple of ways to avoid this:
> > > > 1) Set the chicken bit so that we don't get the vblank pulse. The
> > > >    pipe should restart from the vblank start, so we would have one
> > > >    full frame to reprogram the registers. However, IIRC DK told me
> > > >    there is no way to fully eliminate it in all cases so this
> > > >    option is probably out. There was also some implication for FBC
> > > >    which I already forgot.
> > > > 2) Make sure we've exited PSR before reprogramming the registers
> > > >    (ie. what you do).
> > > > 3) Use the DOUBLE_BUFFER_CTL to prevent the extra vblank pulse from
> > > >    latching the registers while we're still reprogramming them.
> > > >    This feature only exists on SKL+ so is not a solution for
> > > >    HSW/BDW. But maybe HSW/BDW didn't even have the extra vblank
> > > >    pulse?
> > > >
> > > > Option 2) does provide a consistent behaviour on all platforms, so I
> > > > do kinda like it. It also avoids a bigger rework on account of the
> > > > DOUBLE_BUFFER_CTL. I do think we'll have to start using
> > > > DOUBLE_BUFFER_CTL anyway due to other issues, but at least this way
> > > > we don't block PSR progress on that work.
> > > 
> > > My vote is for option 2. Seems more straightforward and broader.
> > > 
> > > DK?
> > > 
> > > My only request on the patch itself would be to create a function
> > > in intel_psr.c, intel_psr_wait_for_idle() or something like this,
> > > and put the register wait logic inside it instead of spreading
> > > the PSR code around.
> > > 
> > > Thanks,
> > > Rodrigo.
> > > 
> > > >
> > > > --
> > > > Ville Syrjälä
> > > > Intel
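
(Something along the lines Rodrigo is asking for might look like the
sketch below. Illustrative only: the function name matches his
suggestion, but the register/field names, the NULL check on
dev_priv->psr.enabled and the 25 ms timeout are assumptions, and locking
against concurrent PSR enable/disable still has to be sorted out.)

	/* Sketch: block until the PSR state machine reports IDLE. */
	int intel_psr_wait_for_idle(struct drm_i915_private *dev_priv)
	{
		if (!dev_priv->psr.enabled)
			return 0;

		return intel_wait_for_register(dev_priv, EDP_PSR_STATUS_CTL,
					       EDP_PSR_STATUS_STATE_MASK,
					       EDP_PSR_STATUS_STATE_IDLE,
					       25);
	}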
> > Thanks for the comments, Ville and Rodrigo. I'll rework this to move the wait to intel_psr.c. There is a psr_wait_for_idle() in there, but there are some PSR locks being passed around inside it (eventually released by the caller). Also, the max timeout specified there is 50 msec, which might be way too much?
> 
> ouch! that function is ugly.... unlock, then lock back again...
> (especially unlocking without any assert that it's locked... :/)
> 
> If you can improve that, or split it in a way that we reuse some code, it would be nice...
>
Yes, I was also considering splitting it up. I'll rework it and send it out. Thanks!

-Tarun 
> >
> > Best,
> > Tarun

