From: "Govindapillai, Vinod" <vinod.govindapillai@intel.com>
To: "ville.syrjala@linux.intel.com" <ville.syrjala@linux.intel.com>,
	"Lisovskiy, Stanislav" <stanislav.lisovskiy@intel.com>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: program wm blocks to at least blocks required per line
Date: Thu, 7 Apr 2022 12:09:48 +0000	[thread overview]
Message-ID: <5b133d1f8fb9d6c96270e8c00f0ae978d28da9a8.camel@intel.com> (raw)
In-Reply-To: <20220407064350.GA24386@intel.com>

On Thu, 2022-04-07 at 09:43 +0300, Lisovskiy, Stanislav wrote:
> On Wed, Apr 06, 2022 at 09:09:06PM +0300, Ville Syrjälä wrote:
> > On Wed, Apr 06, 2022 at 08:14:58PM +0300, Lisovskiy, Stanislav wrote:
> > > On Wed, Apr 06, 2022 at 05:01:39PM +0300, Ville Syrjälä wrote:
> > > > On Wed, Apr 06, 2022 at 04:45:26PM +0300, Lisovskiy, Stanislav wrote:
> > > > > On Wed, Apr 06, 2022 at 03:48:02PM +0300, Ville Syrjälä wrote:
> > > > > > On Mon, Apr 04, 2022 at 04:49:18PM +0300, Vinod Govindapillai wrote:
> > > > > > > In configurations with single DRAM channel, for usecases like
> > > > > > > 4K 60 Hz, FIFO underruns are observed quite frequently. Looks
> > > > > > > like the wm0 watermark values need to be bumped up because the
> > > > > > > wm0 memory latency calculations are probably not taking the DRAM
> > > > > > > channel's impact into account.
> > > > > > > 
> > > > > > > As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > one plane_blocks_per_line we should have selected method2.
> > > > > > > Assuming that modern HW versions have enough dbuf to hold
> > > > > > > at least one line, set the wm blocks to equivalent to blocks
> > > > > > > per line.
> > > > > > > 
> > > > > > > cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > > > > > > 
> > > > > > > Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/intel_pm.c | 19 ++++++++++++++++++-
> > > > > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > index 8824f269e5f5..ae28a8c63ca4 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > @@ -5474,7 +5474,24 @@ static void skl_compute_plane_wm(const struct intel_crtc_state
> > > > > > > *crtc_state,
> > > > > > >  		}
> > > > > > >  	}
> > > > > > >  
> > > > > > > -	blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > > +	/*
> > > > > > > +	 * Lets have blocks at minimum equivalent to plane_blocks_per_line
> > > > > > > +	 * as there will be at minimum one line for lines configuration.
> > > > > > > +	 *
> > > > > > > +	 * As per the Bspec 49325, if the ddb allocation can hold at least
> > > > > > > +	 * one plane_blocks_per_line, we should have selected method2 in
> > > > > > > +	 * the above logic. Assuming that modern versions have enough dbuf
> > > > > > > +	 * and method2 guarantees blocks equivalent to at least 1 line,
> > > > > > > +	 * select the blocks as plane_blocks_per_line.
> > > > > > > +	 *
> > > > > > > +	 * TODO: Revisit the logic when we have better understanding on DRAM
> > > > > > > +	 * channels' impact on the level 0 memory latency and the relevant
> > > > > > > +	 * wm calculations.
> > > > > > > +	 */
> > > > > > > +	blocks = skl_wm_has_lines(dev_priv, level) ?
> > > > > > > +			max_t(u32, fixed16_to_u32_round_up(selected_result) + 1,
> > > > > > > +				  fixed16_to_u32_round_up(wp->plane_blocks_per_line)) :
> > > > > > > +			fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > 
> > > > > > That looks rather convoluted.
> > > > > > 
> > > > > >   blocks = fixed16_to_u32_round_up(selected_result) + 1;
> > > > > > + /* blah */
> > > > > > + if (has_lines)
> > > > > > +	blocks = max(blocks, fixed16_to_u32_round_up(wp->plane_blocks_per_line));
> > > > > 
> > > > > We probably need to do similar refactoring in the whole function ;-)
> > > > > 
> > > > > > Also, since Art said nothing like this should actually be needed,
> > > > > > I think the comment should make it a bit clearer that this
> > > > > > is just a hack to work around the underruns with some single
> > > > > > memory channel configurations.
> > > > > 
> > > > > It is actually not quite a hack, because we are missing the
> > > > > implementation of that condition from BSpec 49325, which instructs
> > > > > us to select method2 when the ddb blocks allocation is known and
> > > > > the ratio is >= 1.
> > > > 
> > > > The ddb allocation is not yet known, so we're implementing the
> > > > algorithm 100% correctly.
> > > > 
> > > > And this patch does not implement that missing part anyway.
> > > 
> > > Yes, as I understood it, method2 would just give a number of blocks at
> > > least equal to the dbuf blocks per line.
> > > 
> > > Wonder whether we should actually fully implement this BSpec clause
> > > and add it at the point where the ddb allocation is known, or are there
> > > any obstacles to doing that, besides having to reshuffle this function a bit?
> > 
> > We need to calculate the wm to figure out how much ddb to allocate,
> > and then we'd need the ddb allocation to figure out how to calculate
> > the wm. Very much chicken vs. egg right there. We'd have to do some
> > kind of hideous loop where we'd calculate everything twice. I don't
> > really want to do that since I'd actually like to move the wm
> > calculation to happen already much earlier during .check_plane()
> > as that could reduce the amount of redundant wm calculations we
> > are currently doing.
> 
> I might be missing some details right now, but why do we need the ddb
> allocation to calculate the wms?
> 
> I thought it's like we usually calculate the wm levels + min_ddb_allocation,
> and then based on that we allocate min_ddb + extra for each plane.
> It is correct that at the moment we calculate the wms we only have
> min_ddb available, so if this level were even enabled, we would need
> at least min_ddb blocks.
> 
> I think we could just use that min_ddb value here for that purpose,
> because the condition anyway checks whether
> (plane buffer allocation / plane blocks per line) >= 1. So even if
> this wm level were enabled, the plane buffer allocation would be
> at least min_ddb _or higher_ - however, that won't affect the
> condition, because even if it happens to be "plane buffer allocation
> + some extra", the ratio would still be valid.
> So if the condition holds for min_ddb / plane blocks per line, we can
> probably safely state that it will also hold later.

min_ddb is 110% of the blocks calculated from the selected one of the two methods (blocks + 10%).
So it depends on which method we choose, and I don't think we can use it for any assumptions here.
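
As a rough sketch of the relation above (just an illustration of the ~10% margin described here,
not the exact skl_compute_plane_wm() code), the min_ddb derivation is something like:

	/* hypothetical helper, only to illustrate the "blocks + 10%" margin */
	static u32 sketch_min_ddb_alloc(u32 blocks)
	{
		/* 110% of the selected method's blocks, rounded up */
		return DIV_ROUND_UP(blocks * 11, 10);
	}

Since "blocks" here is already the result of whichever method was selected, min_ddb inherits that
choice.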

But in any case, I think this patch does not cause any harm in most of the usecases expected of
skl+ platforms, which have enough dbuf!

Per-plane ddb allocation happens based on the highest wm level whose min_ddb can fit into the
allocation. If a level does not fit, that level and the ones above it are disabled, along with the
corresponding package C state transitions.
Now if you look at the logic that selects which method to use: if the latency >= linetime, we
select the large buffer method, which guarantees that there are at least plane_blocks_per_line
blocks. So I think we can safely assume that the latency for the wm levels above wm0 will mostly
be higher than the linetime, which implies using the "large buffer" method.

So this change is mostly limited to wm0 and hence should not impact the ddb allocation, though the
memory fetch bursts might happen slightly more frequently when the processor is in C0?
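
For reference, the shape Ville suggested earlier in the thread would make the hunk look roughly
like this (an untested sketch, same logic as the patch above):

	blocks = fixed16_to_u32_round_up(selected_result) + 1;
	/*
	 * Workaround for FIFO underruns seen with some single memory channel
	 * configurations: never let a level drop below one plane_blocks_per_line.
	 */
	if (skl_wm_has_lines(dev_priv, level))
		blocks = max(blocks,
			     fixed16_to_u32_round_up(wp->plane_blocks_per_line));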

BR
vinod

> 
> Stan
> 
> > -- 
> > Ville Syrjälä
> > Intel

Thread overview: 15+ messages
2022-04-04 13:49 [Intel-gfx] [PATCH] drm/i915: program wm blocks to at least blocks required per line Vinod Govindapillai
2022-04-04 19:09 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2022-04-04 19:42 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-04-05  0:14 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-04-06  8:14 ` [Intel-gfx] [PATCH] " Lisovskiy, Stanislav
2022-04-06  9:21   ` Govindapillai, Vinod
2022-04-06 12:48 ` Ville Syrjälä
2022-04-06 13:45   ` Lisovskiy, Stanislav
2022-04-06 14:01     ` Ville Syrjälä
2022-04-06 14:15       ` Govindapillai, Vinod
2022-04-06 17:14       ` Lisovskiy, Stanislav
2022-04-06 18:09         ` Ville Syrjälä
2022-04-07  6:43           ` Lisovskiy, Stanislav
2022-04-07 12:09             ` Govindapillai, Vinod [this message]
2022-04-07 12:31               ` Lisovskiy, Stanislav
