All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@bootlin.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "Hollingworth, Gordon" <gordon@raspberrypi.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Eben Upton <eben@raspberrypi.org>
Subject: Re: [RFC PATCH] drm/vc4: Add a load tracker to prevent HVS underflow errors
Date: Mon, 29 Oct 2018 10:10:56 +0100	[thread overview]
Message-ID: <20181029101056.3a33ac66@bbrezillon> (raw)
In-Reply-To: <20181029090301.GR21967@phenom.ffwll.local>

On Mon, 29 Oct 2018 10:03:01 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, Oct 29, 2018 at 09:41:36AM +0100, Boris Brezillon wrote:
> > On Mon, 29 Oct 2018 09:06:40 +0100
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Fri, Oct 26, 2018 at 06:10:03PM +0300, Ville Syrjälä wrote:  
> > > > On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote:    
> > > > > On Fri, 26 Oct 2018 16:26:03 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >     
> > > > > > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> > > > > > <boris.brezillon@bootlin.com> wrote:    
> > > > > > >
> > > > > > > On Fri, 26 Oct 2018 15:30:26 +0200
> > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > >      
> > > > > > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:      
> > > > > > > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > >      
> > > > > > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:      
> > > > > > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > > >      
> > > > > > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:      
> > > > > > > > > > > > > Hi Daniel,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > > > > >      
> > > > > > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:      
> > > > > > > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > > > > > > >   reporting those errors
> > > > > > > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test      
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > > > > > > level, using DRM_ERROR,      
> > > > > > > > > > > > >
> > > > > > > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > > > > > > someone comes and update the config.      
> > > > > > > > > > > >
> > > > > > > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > > > > > > for nothing).
> > > > > > > > > > > >      
> > > > > > > > > > > > >      
> > > > > > > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > > > > > > crc support already.
> > > > > > > > > > > > > >      
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any opinion on that?      
> > > > > > > > > > > >
> > > > > > > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > > > > > > update them to suit our needs.      
> > > > > > > > > > >
> > > > > > > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > > > > > > >
> > > > > > > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > > > > > > a per-device underrun status.      
> > > > > > > > > >
> > > > > > > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > > > > > > level dmesg output. Anything at that level is considered a CI failure.      
> > > > > > > > >
> > > > > > > > > So igt is grepping dmesg to detect when an underrun happens?      
> > > > > > > >
> > > > > > > > No, but the CI runner is also observing dmesg. Anything in there at
> > > > > > > > warning or higher level is considered a failure.      
> > > > > > >
> > > > > > > Eric, do you do the same when you launch the IGT testsuite?
> > > > > > >      
> > > > > > > >      
> > > > > > > > > > What do you need the debugfs file for?      
> > > > > > > > >
> > > > > > > > > I just thought having a debugfs file to expose the underrun information
> > > > > > > > > would be cleaner than having to grep in dmesg to detect such failures.      
> > > > > > > >
> > > > > > > > The issue is that you want to detect underruns everywhere, not just in the
> > > > > > > > specific tests you're checking for it. Anything that does a modeset could
> > > > > > > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > > > > > > on i915 hw, if you misprogram is sufficiently).      
> > > > > > >
> > > > > > > In my specific case, I want to have the IGT test check the underrun
> > > > > > > value while the test is being executed so that I know which exact
> > > > > > > configuration triggers the underrun situation. At least that's how I
> > > > > > > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > > > > > > problem when all we do is tracking regressions.      
> > > > > > 
> > > > > > Ok, that makes sense, and explains why you want the overall underrun
> > > > > > counter exposed in debugfs.    
> > > > > 
> > > > > Just one tiny detail which is not exactly related to this discussion
> > > > > but I thought I'd mention it here: underrun is actually not a counter,
> > > > > it's just a boolean. I used an atomic_t type to avoid having to add a
> > > > > spinlock to protect the variable (the variable is modified from
> > > > > an interrupt context and read from a non-atomic context). So, the
> > > > > acceptable values for underrun are currently 0 or 1. I can make it a
> > > > > counter if required.    
> > > > 
> > > > One idea I had a while back for i915 would be to count the number of
> > > > frames that had an underrun. So basically something like this:
> > > > 
> > > > underrun_irq() {
> > > > 	underrun_frames=1
> > > > 	disable_underrun_irq();
> > > > 	vblank_get();
> > > > }
> > > > vblank_irq() {
> > > > 	if (underrun) {
> > > > 		underrun_frames++;
> > > > 	} else if (underrun_frames) {
> > > > 		vblank_put();
> > > > 		enable_underrun_irq();
> > > > 		DEBUG("%d frames had an underrun\n", underrun_frames);
> > > > 		underrun_frames=0;
> > > > 	}
> > > > }
> > > > 
> > > > This avoids drowning in underrun interrupts while still
> > > > reporting at least how many frames had problems.
> > > > 
> > > > But I haven't had time to implement that yet.    
> > 
> > May I ask why you need to know how many frames had underrun issues?  
> 
> Much quicker testing when you're experimenting around with e.g. watermark
> settings. Full modeset for resetting can easily take a few seconds. I
> think we even had patches that restored the interrupt after 1ms already,
> to have more accurate sampling to judge whether a given change made things
> worse or better. We might have somewhat strange hw :-)
> 
> > > The other upshot of a counter is that there's no problem with resetting.
> > > Userspace simply grabs the counter before and after the test and compares.
> > > If you only have 0/1 you need some way to reset, or otherwise automated
> > > running in a CI farm isn't possible.  
> > 
> > The reset was done during atomic commit in my proposal, so no action
> > was required on the user side apart from applying a new config.
> > Anyway, I'll change that for a real counter.  
> 
> We want to measure underruns while doing full modesets too (specifically,
> when shutting down the pipeline). At least for our hw this has
> historically helped greatly in catching programming sequence issues.

Looks like you have particular needs for intel HW which I don't know
about, maybe I shouldn't be the one driving this rework...
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2018-10-29  9:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-16  9:40 [RFC PATCH] drm/vc4: Add a load tracker to prevent HVS underflow errors Boris Brezillon
2018-10-16 12:57 ` Daniel Vetter
2018-10-16 13:10   ` Boris Brezillon
2018-10-16 13:12     ` Daniel Vetter
2018-10-16 13:28       ` Ville Syrjälä
2018-10-16 16:41         ` Daniel Vetter
2018-10-16 17:39           ` Ville Syrjälä
2018-10-16 17:51             ` Daniel Vetter
2018-10-17 13:45           ` Boris Brezillon
2018-10-23  7:55   ` Boris Brezillon
2018-10-23 13:44     ` Daniel Vetter
2018-10-25  8:09       ` Boris Brezillon
2018-10-25  9:33         ` Daniel Vetter
2018-10-25  9:41           ` Boris Brezillon
2018-10-26 13:30             ` Daniel Vetter
2018-10-26 13:57               ` Boris Brezillon
2018-10-26 14:26                 ` Daniel Vetter
2018-10-26 14:52                   ` Boris Brezillon
2018-10-26 15:10                     ` Ville Syrjälä
2018-10-29  8:06                       ` Daniel Vetter
2018-10-29  8:41                         ` Boris Brezillon
2018-10-29  9:03                           ` Daniel Vetter
2018-10-29  9:10                             ` Boris Brezillon [this message]
2018-10-29 13:48                               ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181029101056.3a33ac66@bbrezillon \
    --to=boris.brezillon@bootlin.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=eben@raspberrypi.org \
    --cc=gordon@raspberrypi.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.