Re: [Intel-gfx] [PATCH 8/9] drm/i915: Implement .get_format_info() hook for CCS

From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chad Versace <chadversary@chromium.org>,
	intel-gfx@lists.freedesktop.org, Ben Widawsky <ben@bwidawsk.net>,
	Vandana Kannan <vandana.kannan@intel.com>,
	dri-devel@lists.freedesktop.org,
	Jason Ekstrand <jason@jlekstrand.net>
Subject: Re: [Intel-gfx] [PATCH 8/9] drm/i915: Implement .get_format_info() hook for CCS
Date: Mon, 27 Feb 2017 17:13:41 +0200	[thread overview]
Message-ID: <20170227151341.GE31595@intel.com> (raw)
In-Reply-To: <20170226224150.GA26183@chadversary-glaptop0.roam.corp.google.com>

On Sun, Feb 26, 2017 at 02:41:50PM -0800, Chad Versace wrote:
> On Wed 04 Jan 2017, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > SKL+ display engine can scan out certain kinds of compressed surfaces
> > produced by the render engine. This involved telling the display engine
> > the location of the color control surfae (CCS) which describes which
> > parts of the main surface are compressed and which are not. The location
> > of CCS is provided by userspace as just another plane with its own offset.
> > 
> > By providing our own format information for the CCS formats, we should
> > be able to make framebuffer_check() do the right thing for the CCS
> > surface as well.
> > 
> > Note that we'll return the same format info for both Y and Yf tiled
> > format as that's what happens with the non-CCS Y vs. Yf as well. If
> > desired, we could potentially return a unique pointer for each
> > pixel_format+tiling+ccs combination, in which case we immediately be
> > able to tell if any of that stuff changed by just comparing the
> > pointers. But that does sound a bit wasteful space wise.
> > 
> > v2: Drop the 'dev' argument from the hook
> > v3: Include the description of the CCS surface layout
> > 
> > Cc: Vandana Kannan <vandana.kannan@intel.com>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Jason Ekstrand <jason@jlekstrand.net>
> > Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
> >  include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+)
> 
> 
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 9e1bb7fabcde..4581e3d41e5c 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -230,6 +230,55 @@ extern "C" {
> >  #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
> >  
> >  /*
> > + * Intel color control surface (CCS) for render compression
> > + *
> > + * The framebuffer format must be one of the 8:8:8:8 RGB formats,
> > + * the main surface will be plane index 0 and will be Y/Yf-tiled,
> > + * the CCS will be plane index 1.
> > + *
> 
> The paragraph below is...
> 
> > + * Each byte of CCS contains 4 pairs of bits, with each pair of bits
> > + * matching an area of 8x4 pixels of the main surface. Which would seem
> > + * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
> > + * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
> > + * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
> > + * -----------
> > + * | 01 | 23 |
> > + *  ----------
> > + * | 45 | 67 |
> > + * -----------
> 
> ...almost correct. The hw docs state that each bit-pair of the CCS maps
> cacheline-pair, horizontally adjacent in the Y tile, of the primary surface. As
> a consequence, the remainder of the above paragraph is correct for 32-bit
> formats, but not others.

Which is why the comment says at the very top that the fb needs to
use a 8:8:8:8 format. IIRC that's the only thing the hardware supports.

> 
> This is not a nitpick, because Vulkan and EGL users may want to share
> dma_bufs with a fat format like R32G32B32A32_UNORM, and want to have CCS
> enabled when possible. As long as the users use the dma_buf only the 3D
> engine, and don't submit it to KMS, it's all safe.
> 
> For those users, we should document the bit-pair/cacheline-pair relationship
> correctly. And then preface the following detailed explanation and nice ascii
> diagrams by saying "this applies only to the 32-bit formats".
> 
> Here's the relevant PRM quote:
> 
>      The Color Control Surface (CCS) contains the compression status
>      of the cache-line pairs. The compression state of the cache-line
>      pair is specified by 2 bits in the CCS.  Each CCS cache-line
>      represents an area on the main surface of 16x16 sets of 128 byte
>      Y-tiled cache-line-pairs. CCS is always Y tiled.
> 
> See this Mesa comment for more details:
> https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/isl/isl.c?h=17.0#n211
> 
> > + * Actually only the lower bit of the pair seems to have any effect.
> > + * No idea why. 0 in the lower bit would seem to mean not compressed,
> > + * and 1 is compressed.  The interpreation of the main surface data
> > + * when the block is marked compressed is unknown as of now.
> 
> If I recall correctly (it's been several months since I inspected the
> bits), the high bit is actually significant. Bit pattern 11 means that
> the data in primary surface's cacheline-pair is invalid; the 3D engine
> interprets the color of all pixels in that cacheline-pair to be the
> clear color stored in RENDER_SURFACE_STATE. Before the display engine
> can consume the surface, userspace is required to do a partial resolve,
> flushing the clear color into the primary surface. So it makes sense
> that the kernel would never observe that bit pattern in an incoming ccs
> surface, as long as userspace is doing its job correctly. And it makes
> sense that the display engine would ignore the high bit, because there is
> no way to provide the clear color to the display engine (at least
> according my reading of the docs).
> 
> Jason, does this match your understanding of the high bit?
> 
> > + *
> > + * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
> > + * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
> > + * (1 cacheline) will match an area of 4x2 tiles on the main surface.
> > + *
> > + * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
> > + * ------------------------
> > + * |  0 |  64 | ... | 448 |
> > + * |  1 |  65 |     | 449 |
> > + * |  2 |  66 |     | 450 |
> > + * |  . |   . |     |   . |
> > + * |  . |   . |     |   . |
> > + * |  . |   . |     |   . |
> > + * | 63 | 127 |     | 511 |
> > + * ------------------------
> > + *
> > + * This will match an area of 1024x512 pixels on the main surface.
> > + *
> > + * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
> > + * and for the purposes of the CCS plane offset we assume cpp==1. As each
> > + * byte matches a 16x8 area of the main surface, the dimensions of the CCS
> > + * plane will thus appear to be width/16 x height/8. Both planes must be
> > + * contained within the same gem object.
> 
> Again, the above paragraphs should clarify that they apply only to 32-bit formats.
> 
> > + */
> > +#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
> > +#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel