All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm: Generalized NV Block Linear DRM format mod
@ 2019-10-14 22:13 James Jones
  2019-10-14 22:13 ` James Jones
  0 siblings, 1 reply; 6+ messages in thread
From: James Jones @ 2019-10-14 22:13 UTC (permalink / raw)
  To: Ben Skeggs, David Airlie, Daniel Vetter
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Beyond general review, I'm looking for feedback on a few things
specifically here:

-Is the level of backwards compatibility described here sufficient?
 Technically I can make the user space drivers support the old
 modifiers too, but that would mean the layout they specify would
 morph based on the GPU they're being used on, and sharing buffers
 between two different NV GPUs, which would appear to be possible,
 would result in corruption on one side or the other.

-I used "magic" numbers for all the bit shifting.  Would it be
 better to use __fourcc_XXX constants like the broadcom modifiers
 do?  I wasn't sure which style was preferred.  The nouveau code is
 full of magic numbers, but that's a bit lower level than this file.

If preferred, I can send this out as part of a patchset that adds
support for the modifiers to nouveau and TegraDRM, but I have some
things to clean up there before it's ready for proper review, and
I didn't want to block review of the basic modifier layout on that
work.


_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] drm: Generalized NV Block Linear DRM format mod
  2019-10-14 22:13 [PATCH] drm: Generalized NV Block Linear DRM format mod James Jones
@ 2019-10-14 22:13 ` James Jones
       [not found]   ` <20191014221321.8163-2-jajones-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: James Jones @ 2019-10-14 22:13 UTC (permalink / raw)
  To: Ben Skeggs, David Airlie, Daniel Vetter; +Cc: nouveau, James Jones, dri-devel

Builds upon the existing NVIDIA 16Bx2 block linear
format modifiers by adding more "fields" to the
existing parameterized
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
macro that allow fully defining a unique-across-
all-NVIDIA-hardware bit layout using a minimal
set of fields and values.  The new modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
effectively backwards compatible with the existing
macro, introducing a superset of the previously
definable format modifiers.

Backwards compatibility has two quirks.  First,
the zero value for the "kind" field, which is
implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
macro, must be special cased in drivers and
assumed to map to the pre-Turing generic kind of
0xfe, since a kind of "zero" is reserved for
linear buffer layouts on all GPUs.

Second, it is assumed backwards compatibility
is only needed when running on Tegra GPUs, and
specifically Tegra GPUs prior to Xavier.  This
is based on two assertions:

-Tegra GPUs prior to Xavier used a slightly
 different raw bit layout than desktop GPUs,
 making it impossible to directly share block
 linear buffers between the two.

-Support for the existing block linear modifiers
 was incomplete, making them useful only for
 exporting buffers created by nouveau and
 importing them to Tegra DRM as framebuffers for
 scan out.  There was no support for adding
 framebuffers using format modifiers in nouveau,
 nor importing dma-buf/PRIME GEM objects into
 nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not
intended for use on desktop GPUs, and as a
corrolary, were not intended to support sharing
block linear buffers across two different NVIDIA
GPUs.

Signed-off-by: James Jones <jajones@nvidia.com>
---
 include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
 1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..cc9853d42a24 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,99 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
 
 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size).  Must be zero.
+ *
+ *             Note there is no log2(width) parameter.  Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)).  Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind.  This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50.  It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below.  If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ *             To grandfather in prior block linear format modifiers to this
+ *             layout, the page kind "0", which corresponds to "pitch/linear"
+ *             and hence is unusable with block-linear layouts, is remapped
+ *             within drivers to the value 0xfe, which corresponds to the
+ *             "generic" kind used for simple single-sample color formats on
+ *             pre-Turing GPUs.
+ *
+ * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
+ *             starting with Fermi GPUs.  Additionally, the mapping between page
+ *             kind and bit layout has changed at various points.
+ *
+ *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
+ *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
+ *               2 = Gob Height 8, Turing+ Page Kind mapping
+ *               3 = Reserved for future use.
+ *
+ * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
+ *             bit remapping step that occurs at an even lower level than the
+ *             page kind and block linear swizzles.  This causes the layout of
+ *             surfaces mapped in those SOC's GPUs to be incompatible with the
+ *             equivalent mapping on other GPUs in the same system.
+ *
+ *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
+ *               1 = Desktop GPU and Tegra Xavier+ Layout
+ *
+ * 24:23 c     Lossless Framebuffer Compression type.
+ *
+ *               0 = none
+ *               1 = ROP/3D, actual compression implied by the Page Kind field
+ *               2 = CDE horizontal
+ *               3 = CDE vertical
+ *
+ * 55:25 -     Reserved for future use.  Must be zero.
+ */
+#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
+	fourcc_mod_code(NVIDIA, (0x10 | \
+				 ((h) & 0xf) | \
+				 (((k) & 0xff) << 12) | \
+				 (((g) & 0x3) << 20) | \
+				 (((s) & 0x1) << 22) | \
+				 (((c) & 0x3) << 23)))
+
+/*
+ * 16Bx2 Block Linear layout, used by Tegra K1 and later
  *
  * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
  * vertically by a power of 2 (1 to 32 GOBs) to form a block.
@@ -518,20 +610,20 @@ extern "C" {
  * in full detail.
  */
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
-	fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
 
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
-	fourcc_mod_code(NVIDIA, 0x10)
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
-	fourcc_mod_code(NVIDIA, 0x11)
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
-	fourcc_mod_code(NVIDIA, 0x12)
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
-	fourcc_mod_code(NVIDIA, 0x13)
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
-	fourcc_mod_code(NVIDIA, 0x14)
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
 #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
-	fourcc_mod_code(NVIDIA, 0x15)
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
 
 /*
  * Some Broadcom modifiers take parameters, for example the number of
-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm: Generalized NV Block Linear DRM format mod
       [not found]   ` <20191014221321.8163-2-jajones-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
@ 2019-10-15 14:19     ` Daniel Vetter
       [not found]       ` <20191015141913.GO11828-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2019-10-15 14:19 UTC (permalink / raw)
  To: James Jones
  Cc: David Airlie, nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs,
	Daniel Vetter

On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> Builds upon the existing NVIDIA 16Bx2 block linear
> format modifiers by adding more "fields" to the
> existing parameterized
> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> macro that allow fully defining a unique-across-
> all-NVIDIA-hardware bit layout using a minimal
> set of fields and values.  The new modifier macro
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> effectively backwards compatible with the existing
> macro, introducing a superset of the previously
> definable format modifiers.
> 
> Backwards compatibility has two quirks.  First,
> the zero value for the "kind" field, which is
> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> macro, must be special cased in drivers and
> assumed to map to the pre-Turing generic kind of
> 0xfe, since a kind of "zero" is reserved for
> linear buffer layouts on all GPUs.
> 
> Second, it is assumed backwards compatibility
> is only needed when running on Tegra GPUs, and
> specifically Tegra GPUs prior to Xavier.  This
> is based on two assertions:
> 
> -Tegra GPUs prior to Xavier used a slightly
>  different raw bit layout than desktop GPUs,
>  making it impossible to directly share block
>  linear buffers between the two.
> 
> -Support for the existing block linear modifiers
>  was incomplete, making them useful only for
>  exporting buffers created by nouveau and
>  importing them to Tegra DRM as framebuffers for
>  scan out.  There was no support for adding
>  framebuffers using format modifiers in nouveau,
>  nor importing dma-buf/PRIME GEM objects into
>  nouveau userspace drivers with modifiers in Mesa.
> 
> Hence it is assumed the prior modifiers were not
> intended for use on desktop GPUs, and as a
> corrolary, were not intended to support sharing
> block linear buffers across two different NVIDIA
> GPUs.
> 
> Signed-off-by: James Jones <jajones@nvidia.com>
> ---
>  include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>  1 file changed, 100 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..cc9853d42a24 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -497,7 +497,99 @@ extern "C" {
>  #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>  
>  /*
> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
> + * and Tegra GPUs starting with Tegra K1.
> + *
> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
> + * based on the architecture generation.  GOBs themselves are then arranged in
> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
> + * a block depth or height of "4").
> + *
> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
> + * in full detail.
> + *
> + *       Macro
> + * Bits  Param Description
> + * ----  ----- -----------------------------------------------------------------
> + *
> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
> + *             size).  Must be zero.
> + *
> + *             Note there is no log2(width) parameter.  Some portions of the
> + *             hardware support a block width of two gobs, but it is impractical
> + *             to use due to lack of support elsewhere, and has no known
> + *             benefits.
> + *
> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
> + *             in blocks, specified via log2(tile width in blocks)).  Must be
> + *             zero.
> + *
> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
> + *             tables of all GPUs >= NV50.  It affects the exact layout of bits
> + *             in memory and can be derived from the tuple
> + *
> + *               (format, GPU model, compression type, samples per pixel)
> + *
> + *             Where compression type is defined below.  If GPU model were
> + *             implied by the format modifier, format, or memory buffer, page
> + *             kind would not need to be included in the modifier itself, but
> + *             since the modifier should define the layout of the associated
> + *             memory buffer independent from any device or other context, it
> + *             must be included here.
> + *
> + *             To grandfather in prior block linear format modifiers to this
> + *             layout, the page kind "0", which corresponds to "pitch/linear"
> + *             and hence is unusable with block-linear layouts, is remapped
> + *             within drivers to the value 0xfe, which corresponds to the
> + *             "generic" kind used for simple single-sample color formats on
> + *             pre-Turing GPUs.

Hm, maybe a tiny static inline function which canonizalizes modifiers?
Something like

static inline u64
drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
is_pre_turing)
{
}

Would then give you a nice place to stick this backward compat note and
make it really clear what should be done. I think establishing this as a
pattern would also be nice, since I'm sure we'll have a pile more of these
cases where modifiers turn out to assume a few too many things about the
platform they're used on (we have a similar case on the intel side too).

Just a drive-by idea, feel free to ignore.

Cheers, Daniel

> + *
> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
> + *             starting with Fermi GPUs.  Additionally, the mapping between page
> + *             kind and bit layout has changed at various points.
> + *
> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> + *               2 = Gob Height 8, Turing+ Page Kind mapping
> + *               3 = Reserved for future use.
> + *
> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
> + *             bit remapping step that occurs at an even lower level than the
> + *             page kind and block linear swizzles.  This causes the layout of
> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
> + *             equivalent mapping on other GPUs in the same system.
> + *
> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> + *               1 = Desktop GPU and Tegra Xavier+ Layout
> + *
> + * 24:23 c     Lossless Framebuffer Compression type.
> + *
> + *               0 = none
> + *               1 = ROP/3D, actual compression implied by the Page Kind field
> + *               2 = CDE horizontal
> + *               3 = CDE vertical
> + *
> + * 55:25 -     Reserved for future use.  Must be zero.
> + */
> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> +	fourcc_mod_code(NVIDIA, (0x10 | \
> +				 ((h) & 0xf) | \
> +				 (((k) & 0xff) << 12) | \
> +				 (((g) & 0x3) << 20) | \
> +				 (((s) & 0x1) << 22) | \
> +				 (((c) & 0x3) << 23)))
> +
> +/*
> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>   *
>   * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
>   * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> @@ -518,20 +610,20 @@ extern "C" {
>   * in full detail.
>   */
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> -	fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> +	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>  
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> -	fourcc_mod_code(NVIDIA, 0x10)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> -	fourcc_mod_code(NVIDIA, 0x11)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> -	fourcc_mod_code(NVIDIA, 0x12)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> -	fourcc_mod_code(NVIDIA, 0x13)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> -	fourcc_mod_code(NVIDIA, 0x14)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> -	fourcc_mod_code(NVIDIA, 0x15)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>  
>  /*
>   * Some Broadcom modifiers take parameters, for example the number of
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm: Generalized NV Block Linear DRM format mod
       [not found]       ` <20191015141913.GO11828-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2019-10-15 15:07         ` James Jones
       [not found]           ` <d6252e64-57b5-3007-62ae-7e5aff0df5f6-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: James Jones @ 2019-10-15 15:07 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: David Airlie, nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Ben Skeggs, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 10/15/19 7:19 AM, Daniel Vetter wrote:
> On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
>> Builds upon the existing NVIDIA 16Bx2 block linear
>> format modifiers by adding more "fields" to the
>> existing parameterized
>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
>> macro that allow fully defining a unique-across-
>> all-NVIDIA-hardware bit layout using a minimal
>> set of fields and values.  The new modifier macro
>> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
>> effectively backwards compatible with the existing
>> macro, introducing a superset of the previously
>> definable format modifiers.
>>
>> Backwards compatibility has two quirks.  First,
>> the zero value for the "kind" field, which is
>> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
>> macro, must be special cased in drivers and
>> assumed to map to the pre-Turing generic kind of
>> 0xfe, since a kind of "zero" is reserved for
>> linear buffer layouts on all GPUs.
>>
>> Second, it is assumed backwards compatibility
>> is only needed when running on Tegra GPUs, and
>> specifically Tegra GPUs prior to Xavier.  This
>> is based on two assertions:
>>
>> -Tegra GPUs prior to Xavier used a slightly
>>   different raw bit layout than desktop GPUs,
>>   making it impossible to directly share block
>>   linear buffers between the two.
>>
>> -Support for the existing block linear modifiers
>>   was incomplete, making them useful only for
>>   exporting buffers created by nouveau and
>>   importing them to Tegra DRM as framebuffers for
>>   scan out.  There was no support for adding
>>   framebuffers using format modifiers in nouveau,
>>   nor importing dma-buf/PRIME GEM objects into
>>   nouveau userspace drivers with modifiers in Mesa.
>>
>> Hence it is assumed the prior modifiers were not
>> intended for use on desktop GPUs, and as a
>> corrolary, were not intended to support sharing
>> block linear buffers across two different NVIDIA
>> GPUs.
>>
>> Signed-off-by: James Jones <jajones@nvidia.com>
>> ---
>>   include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>>   1 file changed, 100 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> index 3feeaa3f987a..cc9853d42a24 100644
>> --- a/include/uapi/drm/drm_fourcc.h
>> +++ b/include/uapi/drm/drm_fourcc.h
>> @@ -497,7 +497,99 @@ extern "C" {
>>   #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>>   
>>   /*
>> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
>> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
>> + * and Tegra GPUs starting with Tegra K1.
>> + *
>> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
>> + * based on the architecture generation.  GOBs themselves are then arranged in
>> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
>> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
>> + * a block depth or height of "4").
>> + *
>> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
>> + * in full detail.
>> + *
>> + *       Macro
>> + * Bits  Param Description
>> + * ----  ----- -----------------------------------------------------------------
>> + *
>> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
>> + *             compatibility with the existing
>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>> + *
>> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
>> + *             compatibility with the existing
>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>> + *
>> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
>> + *             size).  Must be zero.
>> + *
>> + *             Note there is no log2(width) parameter.  Some portions of the
>> + *             hardware support a block width of two gobs, but it is impractical
>> + *             to use due to lack of support elsewhere, and has no known
>> + *             benefits.
>> + *
>> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
>> + *             in blocks, specified via log2(tile width in blocks)).  Must be
>> + *             zero.
>> + *
>> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
>> + *             tables of all GPUs >= NV50.  It affects the exact layout of bits
>> + *             in memory and can be derived from the tuple
>> + *
>> + *               (format, GPU model, compression type, samples per pixel)
>> + *
>> + *             Where compression type is defined below.  If GPU model were
>> + *             implied by the format modifier, format, or memory buffer, page
>> + *             kind would not need to be included in the modifier itself, but
>> + *             since the modifier should define the layout of the associated
>> + *             memory buffer independent from any device or other context, it
>> + *             must be included here.
>> + *
>> + *             To grandfather in prior block linear format modifiers to this
>> + *             layout, the page kind "0", which corresponds to "pitch/linear"
>> + *             and hence is unusable with block-linear layouts, is remapped
>> + *             within drivers to the value 0xfe, which corresponds to the
>> + *             "generic" kind used for simple single-sample color formats on
>> + *             pre-Turing GPUs.
> 
> Hm, maybe a tiny static inline function which canonizalizes modifiers?
> Something like
> 
> static inline u64
> drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
> is_pre_turing)
> {
> }
> 
> Would then give you a nice place to stick this backward compat note and
> make it really clear what should be done. I think establishing this as a
> pattern would also be nice, since I'm sure we'll have a pile more of these
> cases where modifiers turn out to assume a few too many things about the
> platform they're used on (we have a similar case on the intel side too).

To make sure I'm clear, it would behave like this?

   fixed_mod = canonicalize(old_style_valid_mod, true);
   assert(fixed_mod == old_style_valid_mod | (0xfe << 12));
   fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
   assert(fixed_mod == new_style_valid_mod);

I'm unclear what it should do in this case though:

   fixed_mod = canonicalize(old_style_valid_mod, false);

Since there's no code out there using the old style modifiers with 
Turing+ yet, and I don't want to try to support such usage.  Maybe just 
drop the "is_pre_turing" parameter and always canonicalize by mapping 0 
-> 0xfe as the comment above states, and not touching other values?  Or 
should any invalid modifier, including this case, return 
DRM_FORMAT_MOD_INVALID?  This latter idea seems risky because it would 
cause software compiled against old drm_fourcc.h to potentially reject 
format modifiers from newer kernels or libraries with an expanded 
representation, but maybe that's what we want if only driver components 
are supposed to call this function.

Thanks,
-James

> Just a drive-by idea, feel free to ignore.
> 
> Cheers, Daniel
> 
>> + *
>> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
>> + *             starting with Fermi GPUs.  Additionally, the mapping between page
>> + *             kind and bit layout has changed at various points.
>> + *
>> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
>> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
>> + *               2 = Gob Height 8, Turing+ Page Kind mapping
>> + *               3 = Reserved for future use.
>> + *
>> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
>> + *             bit remapping step that occurs at an even lower level than the
>> + *             page kind and block linear swizzles.  This causes the layout of
>> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
>> + *             equivalent mapping on other GPUs in the same system.
>> + *
>> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
>> + *               1 = Desktop GPU and Tegra Xavier+ Layout
>> + *
>> + * 24:23 c     Lossless Framebuffer Compression type.
>> + *
>> + *               0 = none
>> + *               1 = ROP/3D, actual compression implied by the Page Kind field
>> + *               2 = CDE horizontal
>> + *               3 = CDE vertical
>> + *
>> + * 55:25 -     Reserved for future use.  Must be zero.
>> + */
>> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
>> +	fourcc_mod_code(NVIDIA, (0x10 | \
>> +				 ((h) & 0xf) | \
>> +				 (((k) & 0xff) << 12) | \
>> +				 (((g) & 0x3) << 20) | \
>> +				 (((s) & 0x1) << 22) | \
>> +				 (((c) & 0x3) << 23)))
>> +
>> +/*
>> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>>    *
>>    * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
>>    * vertically by a power of 2 (1 to 32 GOBs) to form a block.
>> @@ -518,20 +610,20 @@ extern "C" {
>>    * in full detail.
>>    */
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
>> -	fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
>> +	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>>   
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x10)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x11)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x12)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x13)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x14)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
>> -	fourcc_mod_code(NVIDIA, 0x15)
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>>   
>>   /*
>>    * Some Broadcom modifiers take parameters, for example the number of
>> -- 
>> 2.17.1
>>
> 
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm: Generalized NV Block Linear DRM format mod
       [not found]           ` <d6252e64-57b5-3007-62ae-7e5aff0df5f6-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
@ 2019-10-15 15:42             ` Daniel Vetter
       [not found]               ` <CAKMK7uGC46L_PxSYRxpjnF_gDWvo5K7=6ZBo_Gc1g27M1+TrXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2019-10-15 15:42 UTC (permalink / raw)
  To: James Jones; +Cc: David Airlie, Nouveau Dev, Ben Skeggs, dri-devel

On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones@nvidia.com> wrote:
>
> On 10/15/19 7:19 AM, Daniel Vetter wrote:
> > On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> >> Builds upon the existing NVIDIA 16Bx2 block linear
> >> format modifiers by adding more "fields" to the
> >> existing parameterized
> >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> >> macro that allow fully defining a unique-across-
> >> all-NVIDIA-hardware bit layout using a minimal
> >> set of fields and values.  The new modifier macro
> >> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> >> effectively backwards compatible with the existing
> >> macro, introducing a superset of the previously
> >> definable format modifiers.
> >>
> >> Backwards compatibility has two quirks.  First,
> >> the zero value for the "kind" field, which is
> >> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> >> macro, must be special cased in drivers and
> >> assumed to map to the pre-Turing generic kind of
> >> 0xfe, since a kind of "zero" is reserved for
> >> linear buffer layouts on all GPUs.
> >>
> >> Second, it is assumed backwards compatibility
> >> is only needed when running on Tegra GPUs, and
> >> specifically Tegra GPUs prior to Xavier.  This
> >> is based on two assertions:
> >>
> >> -Tegra GPUs prior to Xavier used a slightly
> >>   different raw bit layout than desktop GPUs,
> >>   making it impossible to directly share block
> >>   linear buffers between the two.
> >>
> >> -Support for the existing block linear modifiers
> >>   was incomplete, making them useful only for
> >>   exporting buffers created by nouveau and
> >>   importing them to Tegra DRM as framebuffers for
> >>   scan out.  There was no support for adding
> >>   framebuffers using format modifiers in nouveau,
> >>   nor importing dma-buf/PRIME GEM objects into
> >>   nouveau userspace drivers with modifiers in Mesa.
> >>
> >> Hence it is assumed the prior modifiers were not
> >> intended for use on desktop GPUs, and as a
> >> corrolary, were not intended to support sharing
> >> block linear buffers across two different NVIDIA
> >> GPUs.
> >>
> >> Signed-off-by: James Jones <jajones@nvidia.com>
> >> ---
> >>   include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
> >>   1 file changed, 100 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> >> index 3feeaa3f987a..cc9853d42a24 100644
> >> --- a/include/uapi/drm/drm_fourcc.h
> >> +++ b/include/uapi/drm/drm_fourcc.h
> >> @@ -497,7 +497,99 @@ extern "C" {
> >>   #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
> >>
> >>   /*
> >> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> >> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
> >> + * and Tegra GPUs starting with Tegra K1.
> >> + *
> >> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
> >> + * based on the architecture generation.  GOBs themselves are then arranged in
> >> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
> >> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
> >> + * a block depth or height of "4").
> >> + *
> >> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
> >> + * in full detail.
> >> + *
> >> + *       Macro
> >> + * Bits  Param Description
> >> + * ----  ----- -----------------------------------------------------------------
> >> + *
> >> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
> >> + *             compatibility with the existing
> >> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> >> + *
> >> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
> >> + *             compatibility with the existing
> >> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> >> + *
> >> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
> >> + *             size).  Must be zero.
> >> + *
> >> + *             Note there is no log2(width) parameter.  Some portions of the
> >> + *             hardware support a block width of two gobs, but it is impractical
> >> + *             to use due to lack of support elsewhere, and has no known
> >> + *             benefits.
> >> + *
> >> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
> >> + *             in blocks, specified via log2(tile width in blocks)).  Must be
> >> + *             zero.
> >> + *
> >> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
> >> + *             tables of all GPUs >= NV50.  It affects the exact layout of bits
> >> + *             in memory and can be derived from the tuple
> >> + *
> >> + *               (format, GPU model, compression type, samples per pixel)
> >> + *
> >> + *             Where compression type is defined below.  If GPU model were
> >> + *             implied by the format modifier, format, or memory buffer, page
> >> + *             kind would not need to be included in the modifier itself, but
> >> + *             since the modifier should define the layout of the associated
> >> + *             memory buffer independent from any device or other context, it
> >> + *             must be included here.
> >> + *
> >> + *             To grandfather in prior block linear format modifiers to this
> >> + *             layout, the page kind "0", which corresponds to "pitch/linear"
> >> + *             and hence is unusable with block-linear layouts, is remapped
> >> + *             within drivers to the value 0xfe, which corresponds to the
> >> + *             "generic" kind used for simple single-sample color formats on
> >> + *             pre-Turing GPUs.
> >
> > Hm, maybe a tiny static inline function which canonizalizes modifiers?
> > Something like
> >
> > static inline u64
> > drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
> > is_pre_turing)
> > {
> > }
> >
> > Would then give you a nice place to stick this backward compat note and
> > make it really clear what should be done. I think establishing this as a
> > pattern would also be nice, since I'm sure we'll have a pile more of these
> > cases where modifiers turn out to assume a few too many things about the
> > platform they're used on (we have a similar case on the intel side too).
>
> To make sure I'm clear, it would behave like this?
>
>    fixed_mod = canonicalize(old_style_valid_mod, true);
>    assert(fixed_mod == old_style_valid_mod | (0xfe << 12));
>    fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
>    assert(fixed_mod == new_style_valid_mod);
>
> I'm unclear what it should do in this case though:
>
>    fixed_mod = canonicalize(old_style_valid_mod, false);
>
> Since there's no code out there using the old style modifiers with
> Turing+ yet, and I don't want to try to support such usage.  Maybe just
> drop the "is_pre_turing" parameter and always canonicalize by mapping 0
> -> 0xfe as the comment above states, and not touching other values?  Or
> should any invalid modifier, including this case, return
> DRM_FORMAT_MOD_INVALID?  This latter idea seems risky because it would
> cause software compiled against old drm_fourcc.h to potentially reject
> format modifiers from newer kernels or libraries with an expanded
> representation, but maybe that's what we want if only driver components
> are supposed to call this function.

That's where I'm showing that I have no clue about nvidia buffer
formats I guess :-)

If 0 never makes sense, then yeah I guess you could just
unconditionally canonicalize. Maybe you want to reject the old/legacy
style on turing+ plus, but that's a tradeoff up to you guys - as long
as it's consistent across all involved drivers. Not rejecting it would
essentially make 0 an alias for 0xfe everywhere, post and pre-turing.
More reasons for having a shared canonicalize function which enforces
the same behaviour (whichever you pick) everywhere I think.

And yeah the canonicalize function would only be called by drivers,
not by clients.
-Daniel


> Thanks,
> -James
>
> > Just a drive-by idea, feel free to ignore.
> >
> > Cheers, Daniel
> >
> >> + *
> >> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
> >> + *             starting with Fermi GPUs.  Additionally, the mapping between page
> >> + *             kind and bit layout has changed at various points.
> >> + *
> >> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
> >> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> >> + *               2 = Gob Height 8, Turing+ Page Kind mapping
> >> + *               3 = Reserved for future use.
> >> + *
> >> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
> >> + *             bit remapping step that occurs at an even lower level than the
> >> + *             page kind and block linear swizzles.  This causes the layout of
> >> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
> >> + *             equivalent mapping on other GPUs in the same system.
> >> + *
> >> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> >> + *               1 = Desktop GPU and Tegra Xavier+ Layout
> >> + *
> >> + * 24:23 c     Lossless Framebuffer Compression type.
> >> + *
> >> + *               0 = none
> >> + *               1 = ROP/3D, actual compression implied by the Page Kind field
> >> + *               2 = CDE horizontal
> >> + *               3 = CDE vertical
> >> + *
> >> + * 55:25 -     Reserved for future use.  Must be zero.
> >> + */
> >> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> >> +    fourcc_mod_code(NVIDIA, (0x10 | \
> >> +                             ((h) & 0xf) | \
> >> +                             (((k) & 0xff) << 12) | \
> >> +                             (((g) & 0x3) << 20) | \
> >> +                             (((s) & 0x1) << 22) | \
> >> +                             (((c) & 0x3) << 23)))
> >> +
> >> +/*
> >> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
> >>    *
> >>    * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
> >>    * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> >> @@ -518,20 +610,20 @@ extern "C" {
> >>    * in full detail.
> >>    */
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> >> -    fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> >> +    DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
> >>
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x10)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x11)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x12)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x13)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x14)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
> >>   #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> >> -    fourcc_mod_code(NVIDIA, 0x15)
> >> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
> >>
> >>   /*
> >>    * Some Broadcom modifiers take parameters, for example the number of
> >> --
> >> 2.17.1
> >>
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm: Generalized NV Block Linear DRM format mod
       [not found]               ` <CAKMK7uGC46L_PxSYRxpjnF_gDWvo5K7=6ZBo_Gc1g27M1+TrXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2019-10-16 17:24                 ` James Jones
  0 siblings, 0 replies; 6+ messages in thread
From: James Jones @ 2019-10-16 17:24 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: David Airlie, Nouveau Dev, Ben Skeggs, dri-devel

On 10/15/19 8:42 AM, Daniel Vetter wrote:
> On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones@nvidia.com> wrote:
>>
>> On 10/15/19 7:19 AM, Daniel Vetter wrote:
>>> On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
>>>> Builds upon the existing NVIDIA 16Bx2 block linear
>>>> format modifiers by adding more "fields" to the
>>>> existing parameterized
>>>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
>>>> macro that allow fully defining a unique-across-
>>>> all-NVIDIA-hardware bit layout using a minimal
>>>> set of fields and values.  The new modifier macro
>>>> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
>>>> effectively backwards compatible with the existing
>>>> macro, introducing a superset of the previously
>>>> definable format modifiers.
>>>>
>>>> Backwards compatibility has two quirks.  First,
>>>> the zero value for the "kind" field, which is
>>>> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
>>>> macro, must be special cased in drivers and
>>>> assumed to map to the pre-Turing generic kind of
>>>> 0xfe, since a kind of "zero" is reserved for
>>>> linear buffer layouts on all GPUs.
>>>>
>>>> Second, it is assumed backwards compatibility
>>>> is only needed when running on Tegra GPUs, and
>>>> specifically Tegra GPUs prior to Xavier.  This
>>>> is based on two assertions:
>>>>
>>>> -Tegra GPUs prior to Xavier used a slightly
>>>>    different raw bit layout than desktop GPUs,
>>>>    making it impossible to directly share block
>>>>    linear buffers between the two.
>>>>
>>>> -Support for the existing block linear modifiers
>>>>    was incomplete, making them useful only for
>>>>    exporting buffers created by nouveau and
>>>>    importing them to Tegra DRM as framebuffers for
>>>>    scan out.  There was no support for adding
>>>>    framebuffers using format modifiers in nouveau,
>>>>    nor importing dma-buf/PRIME GEM objects into
>>>>    nouveau userspace drivers with modifiers in Mesa.
>>>>
>>>> Hence it is assumed the prior modifiers were not
>>>> intended for use on desktop GPUs, and as a
>>>> corrolary, were not intended to support sharing
>>>> block linear buffers across two different NVIDIA
>>>> GPUs.
>>>>
>>>> Signed-off-by: James Jones <jajones@nvidia.com>
>>>> ---
>>>>    include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>>>>    1 file changed, 100 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>>>> index 3feeaa3f987a..cc9853d42a24 100644
>>>> --- a/include/uapi/drm/drm_fourcc.h
>>>> +++ b/include/uapi/drm/drm_fourcc.h
>>>> @@ -497,7 +497,99 @@ extern "C" {
>>>>    #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>>>>
>>>>    /*
>>>> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
>>>> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
>>>> + * and Tegra GPUs starting with Tegra K1.
>>>> + *
>>>> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
>>>> + * based on the architecture generation.  GOBs themselves are then arranged in
>>>> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
>>>> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
>>>> + * a block depth or height of "4").
>>>> + *
>>>> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
>>>> + * in full detail.
>>>> + *
>>>> + *       Macro
>>>> + * Bits  Param Description
>>>> + * ----  ----- -----------------------------------------------------------------
>>>> + *
>>>> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
>>>> + *             compatibility with the existing
>>>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>>>> + *
>>>> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
>>>> + *             compatibility with the existing
>>>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>>>> + *
>>>> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
>>>> + *             size).  Must be zero.
>>>> + *
>>>> + *             Note there is no log2(width) parameter.  Some portions of the
>>>> + *             hardware support a block width of two gobs, but it is impractical
>>>> + *             to use due to lack of support elsewhere, and has no known
>>>> + *             benefits.
>>>> + *
>>>> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
>>>> + *             in blocks, specified via log2(tile width in blocks)).  Must be
>>>> + *             zero.
>>>> + *
>>>> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
>>>> + *             tables of all GPUs >= NV50.  It affects the exact layout of bits
>>>> + *             in memory and can be derived from the tuple
>>>> + *
>>>> + *               (format, GPU model, compression type, samples per pixel)
>>>> + *
>>>> + *             Where compression type is defined below.  If GPU model were
>>>> + *             implied by the format modifier, format, or memory buffer, page
>>>> + *             kind would not need to be included in the modifier itself, but
>>>> + *             since the modifier should define the layout of the associated
>>>> + *             memory buffer independent from any device or other context, it
>>>> + *             must be included here.
>>>> + *
>>>> + *             To grandfather in prior block linear format modifiers to this
>>>> + *             layout, the page kind "0", which corresponds to "pitch/linear"
>>>> + *             and hence is unusable with block-linear layouts, is remapped
>>>> + *             within drivers to the value 0xfe, which corresponds to the
>>>> + *             "generic" kind used for simple single-sample color formats on
>>>> + *             pre-Turing GPUs.
>>>
>>> Hm, maybe a tiny static inline function which canonizalizes modifiers?
>>> Something like
>>>
>>> static inline u64
>>> drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
>>> is_pre_turing)
>>> {
>>> }
>>>
>>> Would then give you a nice place to stick this backward compat note and
>>> make it really clear what should be done. I think establishing this as a
>>> pattern would also be nice, since I'm sure we'll have a pile more of these
>>> cases where modifiers turn out to assume a few too many things about the
>>> platform they're used on (we have a similar case on the intel side too).
>>
>> To make sure I'm clear, it would behave like this?
>>
>>     fixed_mod = canonicalize(old_style_valid_mod, true);
>>     assert(fixed_mod == old_style_valid_mod | (0xfe << 12));
>>     fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
>>     assert(fixed_mod == new_style_valid_mod);
>>
>> I'm unclear what it should do in this case though:
>>
>>     fixed_mod = canonicalize(old_style_valid_mod, false);
>>
>> Since there's no code out there using the old style modifiers with
>> Turing+ yet, and I don't want to try to support such usage.  Maybe just
>> drop the "is_pre_turing" parameter and always canonicalize by mapping 0
>> -> 0xfe as the comment above states, and not touching other values?  Or
>> should any invalid modifier, including this case, return
>> DRM_FORMAT_MOD_INVALID?  This latter idea seems risky because it would
>> cause software compiled against old drm_fourcc.h to potentially reject
>> format modifiers from newer kernels or libraries with an expanded
>> representation, but maybe that's what we want if only driver components
>> are supposed to call this function.
> 
> That's where I'm showing that I have no clue about nvidia buffer
> formats I guess :-)
> 
> If 0 never makes sense, then yeah I guess you could just
> unconditionally canonicalize. Maybe you want to reject the old/legacy
> style on turing+ plus, but that's a tradeoff up to you guys - as long
> as it's consistent across all involved drivers. Not rejecting it would
> essentially make 0 an alias for 0xfe everywhere, post and pre-turing.
> More reasons for having a shared canonicalize function which enforces
> the same behaviour (whichever you pick) everywhere I think.
> 
> And yeah the canonicalize function would only be called by drivers,
> not by clients.
> -Daniel

I wrote up a variant of this that canonicalizes any NV format mod to 
make it easier to use, and rebased my patches on top of it.  I only 
ended up with one caller, but I do like the design.  Sent out as v2.

Thanks,
-James

> 
>> Thanks,
>> -James
>>
>>> Just a drive-by idea, feel free to ignore.
>>>
>>> Cheers, Daniel
>>>
>>>> + *
>>>> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
>>>> + *             starting with Fermi GPUs.  Additionally, the mapping between page
>>>> + *             kind and bit layout has changed at various points.
>>>> + *
>>>> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
>>>> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
>>>> + *               2 = Gob Height 8, Turing+ Page Kind mapping
>>>> + *               3 = Reserved for future use.
>>>> + *
>>>> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
>>>> + *             bit remapping step that occurs at an even lower level than the
>>>> + *             page kind and block linear swizzles.  This causes the layout of
>>>> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
>>>> + *             equivalent mapping on other GPUs in the same system.
>>>> + *
>>>> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
>>>> + *               1 = Desktop GPU and Tegra Xavier+ Layout
>>>> + *
>>>> + * 24:23 c     Lossless Framebuffer Compression type.
>>>> + *
>>>> + *               0 = none
>>>> + *               1 = ROP/3D, actual compression implied by the Page Kind field
>>>> + *               2 = CDE horizontal
>>>> + *               3 = CDE vertical
>>>> + *
>>>> + * 55:25 -     Reserved for future use.  Must be zero.
>>>> + */
>>>> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
>>>> +    fourcc_mod_code(NVIDIA, (0x10 | \
>>>> +                             ((h) & 0xf) | \
>>>> +                             (((k) & 0xff) << 12) | \
>>>> +                             (((g) & 0x3) << 20) | \
>>>> +                             (((s) & 0x1) << 22) | \
>>>> +                             (((c) & 0x3) << 23)))
>>>> +
>>>> +/*
>>>> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>>>>     *
>>>>     * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
>>>>     * vertically by a power of 2 (1 to 32 GOBs) to form a block.
>>>> @@ -518,20 +610,20 @@ extern "C" {
>>>>     * in full detail.
>>>>     */
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
>>>> -    fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
>>>> +    DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>>>>
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x10)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x11)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x12)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x13)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x14)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x15)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>>>>
>>>>    /*
>>>>     * Some Broadcom modifiers take parameters, for example the number of
>>>> --
>>>> 2.17.1
>>>>
>>>
> 
> 
> 
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-10-16 17:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-14 22:13 [PATCH] drm: Generalized NV Block Linear DRM format mod James Jones
2019-10-14 22:13 ` James Jones
     [not found]   ` <20191014221321.8163-2-jajones-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2019-10-15 14:19     ` Daniel Vetter
     [not found]       ` <20191015141913.GO11828-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-10-15 15:07         ` James Jones
     [not found]           ` <d6252e64-57b5-3007-62ae-7e5aff0df5f6-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2019-10-15 15:42             ` Daniel Vetter
     [not found]               ` <CAKMK7uGC46L_PxSYRxpjnF_gDWvo5K7=6ZBo_Gc1g27M1+TrXg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-10-16 17:24                 ` James Jones

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.