* [PATCH v5 00/19] drm/i915/dg2: Enabling 64k page size and flat ccs
@ 2022-02-01 10:41 ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Matthew Auld, Lionel Landwerlin

This series introduces the enabling patches for the new memory
compression feature, Flat CCS, and 64K page support for i915 local
memory, along with documentation on the uAPI impact. The details of the
features and their implications for the uAPI are included below, and are
also added to Documentation/gpu/rfc/i915_dg2.rst.

DG2 64K page size support:
==========================

On discrete platforms, starting from DG2, we have to contend with GTT
page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
objects.  Specifically the hardware only supports 64K or larger GTT
page sizes for such memory. The kernel will already ensure that all
I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
sizes underneath.

Note that the returned size here will always reflect any rounding up
required by the kernel, i.e. a 4K request will now become 64K on devices
such as DG2.

Special DG2 GTT address alignment requirement:

The GTT alignment will also need to be at least 2M for such objects.

Note that due to how the hardware implements 64K GTT page support, we
have some further complications:

1) The entire PDE (which covers a 2MB virtual address range) must
contain only 64K PTEs, i.e. mixing 4K and 64K PTEs in the same
PDE is forbidden by the hardware.

2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
objects.

To keep things simple for userland, we mandate that any GTT mappings
must be aligned to and rounded up to 2MB. As this only wastes virtual
address space, spares userland from having to copy any needlessly
complicated PDE sharing scheme (coloring), and only affects DG2, this
is deemed to be a good compromise.
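
As a rough illustration of the 2MB rule above, userland could compute the
GTT range to reserve for a device-local object as sketched below. This is
only a sketch: DG2_LMEM_GTT_ALIGN and the helper names are hypothetical
and do not come from the i915 uAPI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constant, not taken from the i915 uAPI headers. */
#define DG2_LMEM_GTT_ALIGN (2ull << 20) /* 2MB: the range one PDE covers */

/* Round value up to the next multiple of align (a power of two). */
static inline uint64_t round_up_pow2(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/* GTT range to reserve for a device-local object of the given size. */
static inline uint64_t dg2_lmem_gtt_size(uint64_t obj_size)
{
	return round_up_pow2(obj_size, DG2_LMEM_GTT_ALIGN);
}

/* True if a GTT offset is usable for a device-local object. */
static inline bool dg2_lmem_gtt_offset_ok(uint64_t offset)
{
	return (offset & (DG2_LMEM_GTT_ALIGN - 1)) == 0;
}
```

Under these assumptions a 64K object still occupies a full 2MB of virtual
address space, which is the waste the compromise accepts in exchange for
never mixing page sizes within one PDE.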

Flat CCS support for lmem
=========================
On Xe-HP and later devices, we use dedicated compression control state
(CCS) stored in local memory for each surface, to support the 3D and
media compression formats.

The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boots, the required memory
is reserved for the CCS data and a secure register is programmed with
the CCS base address.
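
To make the 1/256 ratio concrete, the reservation can be estimated as in
the sketch below. The names FLAT_CCS_RATIO and flat_ccs_size() are made up
for this example, and rounding up is an assumption of the sketch rather
than something stated above.

```c
#include <stdint.h>

/* Hypothetical name: one byte of CCS covers 256 bytes of local memory. */
#define FLAT_CCS_RATIO 256ull

/* Bytes of CCS needed to cover lmem_size bytes of local memory, rounded up. */
static inline uint64_t flat_ccs_size(uint64_t lmem_size)
{
	return (lmem_size + FLAT_CCS_RATIO - 1) / FLAT_CCS_RATIO;
}
```

For example, a device with 8GiB of local memory would need 32MiB set aside
for CCS data under this ratio.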

Flat CCS data needs to be cleared when an lmem object is allocated. CCS
data can be copied in and out of the CCS region through
XY_CTRL_SURF_COPY_BLT. The CPU can't access the CCS data directly.

When we exhaust the lmem, if the object's placements support smem, then
we can directly decompress the compressed lmem object into smem and
start using it from smem itself.

But when we need to swap out the compressed lmem object into an smem
region even though the object's placements don't support smem, we copy
the lmem content as it is into the smem region along with the CCS data
(using XY_CTRL_SURF_COPY_BLT). When the object is referenced again, the
lmem content will be swapped in along with restoration of the CCS data
(using XY_CTRL_SURF_COPY_BLT) at the corresponding location.

Flat-CCS Modifiers for different compression formats
====================================================
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of
Flat CCS render compression formats. Though the general layout is the
same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, a new hashing/compression
algorithm is used. Render compression uses 128 byte compression blocks.

I915_FORMAT_MOD_4_TILED_DG2_MC_CCS - used to indicate the buffers of
Flat CCS media compression formats. Though the general layout is the
same as I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, a new hashing/compression
algorithm is used. Media compression uses 256 byte compression blocks.

I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of
Flat CCS clear color render compression formats. This is the unified
compression format for clear color render compression. The general
layout is a tiled layout using 4KB tiles, i.e. the Tile4 layout. The
fast clear color value expected by HW is located in the fb at offset 0
of plane #1.

v2:
  Fixed some formatting issues and platform naming issues
  Added some more documentation on Flat-CCS

v3:
  Plane programming is handled for flat-ccs and clear color
  Tile4 and flat ccs modifier patches are rebased on table based
    modifier reference method
  Three patches are squashed
  Y tile is pruned for DG2.
  flat_ccs_cc plane format info is added
  Added mesa, compute and media people for required uAPI ack.

v4:
  Rebasing of the patches

v5:
  KDoc is enhanced for cc modifier. [Nanley & Lionel]
  Use inbuilt macros for functional fix [Bob]
  Addressed review comments from Matt
  Platform coverage fix for modifiers [Imre]

Abdiel Janulgue (1):
  drm/i915/lmem: Enable lmem for platforms with Flat CCS

Anshuman Gupta (1):
  drm/i915/dg2: Flat CCS Support

Ayaz A Siddiqui (1):
  drm/i915/gt: Clear compress metadata for Xe_HP platforms

CQ Tang (1):
  drm/i915/xehpsdv: Add has_flat_ccs to device info

Matt Roper (1):
  drm/i915/dg2: Add DG2 unified compression

Matthew Auld (6):
  drm/i915: enforce min GTT alignment for discrete cards
  drm/i915: support 64K GTT pages for discrete cards
  drm/i915/gtt: allow overriding the pt alignment
  drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  drm/i915/migrate: add acceleration support for DG2
  drm/i915/uapi: document behaviour for DG2 64K support

Mika Kahola (1):
  uapi/drm/dg2: Introduce format modifier for DG2 clear color

Ramalingam C (4):
  drm/i915: add needs_compact_pt flag
  Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
  drm/i915/Flat-CCS: Document on Flat-CCS memory compression
  Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI

Robert Beckett (1):
  drm/i915: add gtt misalignment test

Stanislav Lisovskiy (2):
  drm/i915: Introduce new Tile 4 format
  drm/i915/dg2: Tile 4 plane format support

 Documentation/gpu/rfc/i915_dg2.rst            |  32 ++
 Documentation/gpu/rfc/index.rst               |   3 +
 drivers/gpu/drm/i915/display/intel_display.c  |   5 +-
 drivers/gpu/drm/i915/display/intel_fb.c       |  68 +++-
 drivers/gpu/drm/i915/display/intel_fb.h       |   1 +
 drivers/gpu/drm/i915/display/intel_fbc.c      |   1 +
 .../drm/i915/display/intel_plane_initial.c    |   1 +
 .../drm/i915/display/skl_universal_plane.c    |  70 +++-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++
 .../i915/gem/selftests/i915_gem_client_blt.c  |  21 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 158 +++++++-
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  14 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |  19 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |   1 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  12 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  31 +-
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 336 ++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  17 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  24 +-
 drivers/gpu/drm/i915/i915_drv.h               |  18 +-
 drivers/gpu/drm/i915/i915_pci.c               |   4 +
 drivers/gpu/drm/i915/i915_reg.h               |   4 +
 drivers/gpu/drm/i915/i915_vma.c               |   9 +
 drivers/gpu/drm/i915/intel_device_info.h      |   3 +
 drivers/gpu/drm/i915/intel_pm.c               |   1 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 224 ++++++++++--
 include/uapi/drm/drm_fourcc.h                 |  43 +++
 include/uapi/drm/i915_drm.h                   |  44 ++-
 28 files changed, 1102 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_dg2.rst

-- 
2.20.1


* [PATCH v5 01/19] drm/i915: add needs_compact_pt flag
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Lionel Landwerlin

Add a new platform flag, needs_compact_pt, to mark the requirement of
compact pt layout support for the ppGTT when using 64K GTT pages.

With this flag, has_64k_pages only indicates the requirement of 64K
GTT page sizes or larger for device local memory access.

v6:
	* minor doc formatting

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 11 ++++++++---
 drivers/gpu/drm/i915/i915_pci.c          |  2 ++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 00e7594b59c9..4afdfa5fd3b3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1512,12 +1512,17 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 /*
  * Set this flag, when platform requires 64K GTT page sizes or larger for
- * device local memory access. Also this flag implies that we require or
- * at least support the compact PT layout for the ppGTT when using the 64K
- * GTT pages.
+ * device local memory access.
  */
 #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
 
+/*
+ * Set this flag when platform doesn't allow both 64k pages and 4k pages in
+ * the same PT. this flag means we need to support compact PT layout for the
+ * ppGTT when using the 64K GTT pages.
+ */
+#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt)
+
 #define HAS_IPC(dev_priv)		 (INTEL_INFO(dev_priv)->display.has_ipc)
 
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 2df2db0a5d70..ce6ae6a3cbdf 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = {
 	PLATFORM(INTEL_XEHPSDV),
 	.display = { },
 	.has_64k_pages = 1,
+	.needs_compact_pt = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) |
@@ -1046,6 +1047,7 @@ static const struct intel_device_info dg2_info = {
 	PLATFORM(INTEL_DG2),
 	.has_guc_deprivilege = 1,
 	.has_64k_pages = 1,
+	.needs_compact_pt = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) |
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index abf1e103c558..d8da40d01dca 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -130,6 +130,7 @@ enum intel_ppgtt_type {
 	/* Keep has_* in alphabetical order */ \
 	func(has_64bit_reloc); \
 	func(has_64k_pages); \
+	func(needs_compact_pt); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_global_mocs); \
-- 
2.20.1


* [PATCH v5 02/19] drm/i915: enforce min GTT alignment for discrete cards
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Thomas Hellström, Lionel Landwerlin, Matthew Auld, Rodrigo Vivi

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

For compact-pt we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.
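
The alignment selection described above can be modelled outside the driver
roughly as follows. This is a simplified sketch, not the driver code: the
struct, enum and function names are stand-ins invented for the example,
and the 4K default assumes I915_GTT_MIN_ALIGNMENT is the 4K GTT page size.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  (4ull << 10)
#define SZ_64K (64ull << 10)
#define SZ_2M  (2ull << 20)

/* Simplified stand-ins for the driver's memory types and platform flags. */
enum mem_type { MEM_SYSTEM, MEM_LOCAL };

struct platform_flags {
	bool has_64k_pages;
	bool needs_compact_pt;
};

/* Minimum GTT alignment for an object backed by the given memory type. */
static inline uint64_t min_gtt_alignment(const struct platform_flags *p,
					 enum mem_type type)
{
	if (type != MEM_LOCAL || !p->has_64k_pages)
		return SZ_4K;
	return p->needs_compact_pt ? SZ_2M : SZ_64K;
}

/* With compact-pt, the reservation is also padded up to that alignment. */
static inline uint64_t gtt_reservation_size(const struct platform_flags *p,
					    enum mem_type type, uint64_t size)
{
	uint64_t align = min_gtt_alignment(p, type);

	if (p->needs_compact_pt)
		size = (size + align - 1) & ~(align - 1);
	return size;
}
```

In this model a 64K lmem object on a compact-pt platform reserves 2MB of
GTT space, while the same object on a 64K-only platform keeps its 64K
size and only its start address is aligned to 64K.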

v3:
	* use needs_compact_pt flag to discriminate between
	  64K and 64K with compact-pt
	* add i915_vm_obj_min_alignment
	* use i915_vm_obj_min_alignment to round up vma reservation
	  if compact-pt instead of hard coding
v5:
	* fix i915_vm_obj_min_alignment for internal objects which
	  have no memory region
v6:
	* tiled_blits_create correctly pick largest required alignment

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 21 ++--
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 12 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 18 ++++
 drivers/gpu/drm/i915/i915_vma.c               |  9 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 5 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index c8ff8bf0986d..3675d12a7d9a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -410,14 +411,19 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL);
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -428,7 +434,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +461,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -486,8 +492,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +505,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 46be4197b93f..df23ebdfc994 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -223,6 +223,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+	} else if (HAS_64K_PAGES(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 8073438b67c8..ba9f040f8606 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -29,6 +29,8 @@
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -223,6 +225,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -384,6 +387,21 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	return vm->min_alignment[type];
+}
+
+static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
+					    struct drm_i915_gem_object  *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+	enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM;
+
+	return i915_vm_min_alignment(vm, type);
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 845cd88f8313..3558b16a929c 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -757,6 +757,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE);
 	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
 
+	alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj));
+	/*
+	 * for compact-pt we round up the reservation to prevent
+	 * any smaller pages being used within the same PDE
+	 */
+	if (NEEDS_COMPACT_PT(vma->vm->i915))
+		size = round_up(size, alignment);
+
 	/* If binding the object/GGTT view requires more space than the entire
 	 * aperture has, reject it early before evicting everything in a vain
 	 * attempt to find space.
@@ -769,6 +777,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	}
 
 	color = 0;
+
 	if (i915_vm_has_cache_coloring(vma->vm))
 		color = vma->obj->cache_level;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fba1c8be1649..b80788a2b7f9 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma_resource *mock_vma_res;
 	unsigned int size;
@@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma_res->bi.pages = obj->mm.pages;
-			mock_vma_res->node_size = BIT_ULL(size);
+			mock_vma_res->node_size = BIT_ULL(aligned_size);
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.20.1



* [Intel-gfx] [PATCH v5 02/19] drm/i915: enforce min GTT alignment for discrete cards
@ 2022-02-01 10:41   ` Ramalingam C
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Thomas Hellström, Matthew Auld

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

For compact-pt we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.

v3:
	* use needs_compact_pt flag to discriminate between
	  64K and 64K with compact-pt
	* add i915_vm_obj_min_alignment
	* use i915_vm_obj_min_alignment to round up vma reservation
	  if compact-pt instead of hard coding
v5:
	* fix i915_vm_obj_min_alignment for internal objects which
	  have no memory region
v6:
	* tiled_blits_create: correctly pick the largest required alignment

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 21 ++--
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 12 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 18 ++++
 drivers/gpu/drm/i915/i915_vma.c               |  9 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 5 files changed, 115 insertions(+), 41 deletions(-)
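
As a rough illustration of the alignment rules described in the commit
message, the standalone sketch below models the per-memory-type table
filled in by i915_address_space_init() and the adjustment made in
i915_vma_insert(). It is not driver code: vm_model, mem_type,
vm_model_init() and vma_adjust() are hypothetical names, and the 4K
default assumes I915_GTT_MIN_ALIGNMENT is 4K.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  (UINT64_C(1) << 12)
#define SZ_64K (UINT64_C(1) << 16)
#define SZ_2M  (UINT64_C(1) << 21)

/* Hypothetical stand-ins for the intel_memory_type values in the patch. */
enum mem_type { MEM_SYSTEM, MEM_LOCAL, MEM_STOLEN_LOCAL, MEM_TYPE_COUNT };

/* Hypothetical model of the per-VM minimum alignment table. */
struct vm_model {
	uint64_t min_alignment[MEM_TYPE_COUNT];
	bool needs_compact_pt;
};

/* Default everything to 4K, then raise the local-memory entries. */
static void vm_model_init(struct vm_model *vm, bool has_64k_pages,
			  bool needs_compact_pt)
{
	for (int i = 0; i < MEM_TYPE_COUNT; i++)
		vm->min_alignment[i] = SZ_4K; /* assumed default */

	vm->needs_compact_pt = needs_compact_pt;

	if (has_64k_pages) {
		uint64_t align = needs_compact_pt ? SZ_2M : SZ_64K;

		vm->min_alignment[MEM_LOCAL] = align;
		vm->min_alignment[MEM_STOLEN_LOCAL] = align;
	}
}

/* Round x up to a power-of-two alignment. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align && !(align & (align - 1)));
	return (x + align - 1) & ~(align - 1);
}

/*
 * Raise the requested alignment to the per-type minimum; with compact-pt,
 * also pad the reservation so no smaller pages share the same PDE.
 */
static void vma_adjust(const struct vm_model *vm, enum mem_type type,
		       uint64_t *size, uint64_t *alignment)
{
	if (*alignment < vm->min_alignment[type])
		*alignment = vm->min_alignment[type];

	if (vm->needs_compact_pt)
		*size = round_up_pow2(*size, *alignment);
}
```

In this model a 64K local-memory object on a compact-pt device ends up
with a 2M alignment and a 2M reservation, whereas without compact-pt
only the alignment is raised to 64K and the size is left alone.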

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index c8ff8bf0986d..3675d12a7d9a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -410,14 +411,19 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL);
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -428,7 +434,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +461,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -486,8 +492,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +505,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 46be4197b93f..df23ebdfc994 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -223,6 +223,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+	} else if (HAS_64K_PAGES(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 8073438b67c8..ba9f040f8606 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -29,6 +29,8 @@
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -223,6 +225,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -384,6 +387,21 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	return vm->min_alignment[type];
+}
+
+static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
+					    struct drm_i915_gem_object  *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+	enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM;
+
+	return i915_vm_min_alignment(vm, type);
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 845cd88f8313..3558b16a929c 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -757,6 +757,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE);
 	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
 
+	alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj));
+	/*
+	 * for compact-pt we round up the reservation to prevent
+	 * any smaller pages being used within the same PDE
+	 */
+	if (NEEDS_COMPACT_PT(vma->vm->i915))
+		size = round_up(size, alignment);
+
 	/* If binding the object/GGTT view requires more space than the entire
 	 * aperture has, reject it early before evicting everything in a vain
 	 * attempt to find space.
@@ -769,6 +777,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	}
 
 	color = 0;
+
 	if (i915_vm_has_cache_coloring(vma->vm))
 		color = vma->obj->cache_level;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fba1c8be1649..b80788a2b7f9 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma_resource *mock_vma_res;
 	unsigned int size;
@@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma_res->bi.pages = obj->mm.pages;
-			mock_vma_res->node_size = BIT_ULL(size);
+			mock_vma_res->node_size = BIT_ULL(aligned_size);
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.20.1
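
The selftest changes above share one pattern: each test now steps
through the hole in strides that are never smaller than the minimum
alignment. A minimal sketch of that arithmetic follows, assuming
power-of-two alignments. The helper names ilog2_u64(), stride_order(),
slot_addr() and stride_bytes() are hypothetical, and the 64K value in
the note below is for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0; stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u64(uint64_t x)
{
	unsigned int n = 0;

	assert(x);
	while (x >>= 1)
		n++;
	return n;
}

/*
 * Stride order in the style of lowlevel_hole()/drunk_hole(): the object
 * size order, but never below the order of the minimum alignment.
 */
static unsigned int stride_order(unsigned int size_order,
				 uint64_t min_alignment)
{
	unsigned int align_order = ilog2_u64(min_alignment);

	return size_order > align_order ? size_order : align_order;
}

/* Address of slot n when the hole is carved into aligned strides. */
static uint64_t slot_addr(uint64_t hole_start, unsigned int n,
			  unsigned int size_order, uint64_t min_alignment)
{
	return hole_start +
	       ((uint64_t)n << stride_order(size_order, min_alignment));
}

/*
 * Byte stride in the style of fill_hole()/walk_hole(): the object size
 * rounded up to the minimum alignment.
 */
static uint64_t stride_bytes(uint64_t obj_size, uint64_t min_alignment)
{
	assert(min_alignment && !(min_alignment & (min_alignment - 1)));
	return (obj_size + min_alignment - 1) & ~(min_alignment - 1);
}
```

With a 64K minimum alignment, a 4K object (order 12) is placed every
64K, while objects of order 16 and above keep their natural stride.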



* [PATCH v5 03/19] drm/i915: support 64K GTT pages for discrete cards
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Thomas Hellström, kernel test robot, Lionel Landwerlin,
	Matthew Auld, Rodrigo Vivi

From: Matthew Auld <matthew.auld@intel.com>

Discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

v4: don't return uninitialized err in igt_ppgtt_compact
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 169 insertions(+), 3 deletions(-)
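
The index arithmetic behind the compact page-table can be sketched as
follows. This is a standalone illustration, not the driver code: the
macro and function names are hypothetical, and it assumes the usual
gen8 layout of 512 eight-byte 4K PTEs per page-table, with each
page-table covering 2M of address space.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_SHIFT_4K  12
#define PTES_PER_PT   512u                         /* 4K PTEs per page-table */
#define PTES_PER_64K  16u                          /* 64K / 4K */
#define COMPACT_PTES  (PTES_PER_PT / PTES_PER_64K) /* 32 entries, 256B */

/* Index of the 4K PTE slot for a GTT address (address bits 20:12). */
static unsigned int pte_index_4k(uint64_t addr)
{
	return (unsigned int)((addr >> PTE_SHIFT_4K) & (PTES_PER_PT - 1));
}

/* Index into a compact page-table; the address must be 64K aligned. */
static unsigned int pte_index_compact(uint64_t addr)
{
	unsigned int idx = pte_index_4k(addr);

	assert(idx % PTES_PER_64K == 0);
	return idx / PTES_PER_64K;
}

/* Convert a range counted in 4K PTEs into a count of compact entries. */
static unsigned int compact_count(unsigned int count_4k)
{
	assert(count_4k % PTES_PER_64K == 0);
	return count_4k / PTES_PER_64K;
}
```

Dividing both the index and the count by 16 is the same scaling that
the clear and insert paths in this patch apply when a page-table is
compact.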

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index f36191ebf964..a7d9bdb85d70 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B that gives us 32 entries per pt, however since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If we ever hit such a case,
+	 * this test should catch it, since some writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(i915, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 
 	if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c43e724afa9f..62471730266c 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
+			  struct i915_vma_resource *vma_res,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma_res->start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(pt->is_compact);
+				index =  __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				if (!iter->sg)
+					break;
+
+				rem = sg_dma_len(iter->sg);
+				if (!rem)
+					break;
+
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma_res->page_sizes_gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
@@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma_res);
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index ba9f040f8606..e6ce0be6d484 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -160,6 +162,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 48e6e2f87700..043652dc6892 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Intel-gfx] [PATCH v5 03/19] drm/i915: support 64K GTT pages for discrete cards
@ 2022-02-01 10:41   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Thomas Hellström, Matthew Auld

From: Matthew Auld <matthew.auld@intel.com>

Discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.
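
For illustration only (not part of the patch), the 256B figure above can
be checked with a small standalone sketch. It assumes 8-byte PTEs and that
one page table covers a 2M range; the SKETCH_* names and helpers are
hypothetical and do not exist in i915.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, assumed for this sketch (not i915 definitions). */
#define SKETCH_PT_RANGE		(2u << 20)	/* VA range covered by one PT */
#define SKETCH_PTE_BYTES	8u		/* one 64-bit PTE */

/* Number of PTEs needed to map one 2M range with the given page size. */
static inline uint32_t sketch_pt_entries(uint32_t page_size)
{
	/* page_size must be a non-zero power of two, at most 2M */
	assert(page_size && !(page_size & (page_size - 1)));
	assert(page_size <= SKETCH_PT_RANGE);

	return SKETCH_PT_RANGE / page_size;
}

/* Bytes of page-table storage those PTEs occupy. */
static inline uint32_t sketch_pt_bytes(uint32_t page_size)
{
	return sketch_pt_entries(page_size) * SKETCH_PTE_BYTES;
}
```

With 64K pages this gives 32 entries in 256 bytes; with 4K pages it gives
the usual 512 entries filling a whole 4K page.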

v4: don't return uninitialized err in igt_ppgtt_compact
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index f36191ebf964..a7d9bdb85d70 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B that gives us 32 entries per pt, however since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If we ever hit such a case this
+	 * test should catch it since some of our writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(i915, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 
 	if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c43e724afa9f..62471730266c 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
+			  struct i915_vma_resource *vma_res,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma_res->start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(pt->is_compact);
+				index =  __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				if (!iter->sg)
+					break;
+
+				rem = sg_dma_len(iter->sg);
+				if (!rem)
+					break;
+
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma_res->page_sizes_gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
@@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma_res);
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index ba9f040f8606..e6ce0be6d484 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -160,6 +162,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 48e6e2f87700..043652dc6892 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 04/19] drm/i915: add gtt misalignment test
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Robert Beckett, Thomas Hellström, kernel test robot,
	Lionel Landwerlin, Matthew Auld

From: Robert Beckett <bob.beckett@collabora.com>

Add a test to check the handling of misaligned offsets and sizes.
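
For illustration only (not part of the patch), here is a standalone sketch
of the sizes the test expects for an LMEM object on a platform that needs
compact page tables: the vma size rounded up to 64K and the drm_mm node
size rounded up to 2M. The sketch_* helpers are hypothetical stand-ins for
the kernel's round_up() and the logic in misaligned_case().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_64K	(64ull << 10)
#define SKETCH_SZ_2M	(2ull << 20)

/* Round x up to a multiple of align; align must be a power of two. */
static inline uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align && !(align & (align - 1)));

	return (x + align - 1) & ~(align - 1);
}

/* Expected vma->size for an LMEM object behind a compact page table. */
static inline uint64_t sketch_expected_vma_size(uint64_t size)
{
	return sketch_round_up(size, SKETCH_SZ_64K);
}

/* Expected vma->node.size: the node is padded out to a full 2M. */
static inline uint64_t sketch_expected_node_size(uint64_t size)
{
	return sketch_round_up(size, SKETCH_SZ_2M);
}
```

So a 4K request is expected to yield a 64K vma inside a 2M node, while a 1M
request keeps a 1M vma but still occupies a 2M node.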

v4:
	* remove spurious blank lines
	* explicitly cast intel_region_id to intel_memory_type in misaligned_pin
Reported-by: kernel test robot <lkp@intel.com>
v6:
	* use NEEDS_COMPACT_PT instead of hard coding for DG2

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index b80788a2b7f9..c23b1e5cc436 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -22,10 +22,12 @@
  *
  */
 
+#include "gt/intel_gtt.h"
 #include <linux/list_sort.h>
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_region.h"
 #include "gem/selftests/mock_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
@@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm,
 	return err;
 }
 
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr,
+			   u64 addr, u64 size, unsigned long flags)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err = 0;
+	u64 expected_vma_size, expected_node_size;
+
+	obj = i915_gem_object_create_region(mr, size, 0, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	vma = i915_vma_instance(obj, vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_put;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, addr | flags);
+	if (err)
+		goto err_put;
+	i915_vma_unpin(vma);
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1));
+	expected_node_size = expected_vma_size;
+
+	if (NEEDS_COMPACT_PT(vm->i915) && i915_gem_object_is_lmem(obj)) {
+		/* compact-pt should expand lmem node to 2MB */
+		expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
+		expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+	}
+
+	if (vma->size != expected_vma_size || vma->node.size != expected_node_size) {
+		err = i915_vma_unbind(vma);
+		err = -EBADSLT;
+		goto err_put;
+	}
+
+	err = i915_vma_unbind(vma);
+	if (err)
+		goto err_put;
+
+	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+
+err_put:
+	i915_gem_object_put(obj);
+	cleanup_freed_objects(vm->i915);
+	return err;
+}
+
+static int misaligned_pin(struct i915_address_space *vm,
+			  u64 hole_start, u64 hole_end,
+			  unsigned long end_time)
+{
+	struct intel_memory_region *mr;
+	enum intel_region_id id;
+	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	int err = 0;
+	u64 hole_size = hole_end - hole_start;
+
+	if (i915_is_ggtt(vm))
+		flags |= PIN_GLOBAL;
+
+	for_each_memory_region(mr, vm->i915, id) {
+		u64 min_alignment = i915_vm_min_alignment(vm, (enum intel_memory_type)id);
+		u64 size = min_alignment;
+		u64 addr = round_up(hole_start + (hole_size / 2), min_alignment);
+
+		/* we can't test < 4k alignment due to flags being encoded in lower bits */
+		if (min_alignment != I915_GTT_PAGE_SIZE_4K) {
+			err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags);
+			/* misaligned should error with -EINVAL */
+			if (!err)
+				err = -EBADSLT;
+			if (err != -EINVAL)
+				return err;
+		}
+
+		/* test for vma->size expansion to min page size */
+		err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
+		if (min_alignment > hole_size) {
+			if (!err)
+				err = -EBADSLT;
+			else if (err == -ENOSPC)
+				err = 0;
+		}
+		if (err)
+			return err;
+
+		/* test for intermediate size not expanding vma->size for large alignments */
+		err = misaligned_case(vm, mr, addr, size / 2, flags);
+		if (min_alignment > hole_size) {
+			if (!err)
+				err = -EBADSLT;
+			else if (err == -ENOSPC)
+				err = 0;
+		}
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 			  int (*func)(struct i915_address_space *vm,
 				      u64 hole_start, u64 hole_end,
@@ -1136,6 +1252,11 @@ static int igt_ppgtt_shrink_boom(void *arg)
 	return exercise_ppgtt(arg, shrink_boom);
 }
 
+static int igt_ppgtt_misaligned_pin(void *arg)
+{
+	return exercise_ppgtt(arg, misaligned_pin);
+}
+
 static int sort_holes(void *priv, const struct list_head *A,
 		      const struct list_head *B)
 {
@@ -1208,6 +1329,11 @@ static int igt_ggtt_lowlevel(void *arg)
 	return exercise_ggtt(arg, lowlevel_hole);
 }
 
+static int igt_ggtt_misaligned_pin(void *arg)
+{
+	return exercise_ggtt(arg, misaligned_pin);
+}
+
 static int igt_ggtt_page(void *arg)
 {
 	const unsigned int count = PAGE_SIZE/sizeof(u32);
@@ -2180,12 +2306,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_ppgtt_fill),
 		SUBTEST(igt_ppgtt_shrink),
 		SUBTEST(igt_ppgtt_shrink_boom),
+		SUBTEST(igt_ppgtt_misaligned_pin),
 		SUBTEST(igt_ggtt_lowlevel),
 		SUBTEST(igt_ggtt_drunk),
 		SUBTEST(igt_ggtt_walk),
 		SUBTEST(igt_ggtt_pot),
 		SUBTEST(igt_ggtt_fill),
 		SUBTEST(igt_ggtt_page),
+		SUBTEST(igt_ggtt_misaligned_pin),
 		SUBTEST(igt_cs_tlb),
 	};
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 05/19] drm/i915/gtt: allow overriding the pt alignment
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Lionel Landwerlin

From: Matthew Auld <matthew.auld@intel.com>

On some platforms we have alignment restrictions when accessing LMEM
from the GTT. In the next few patches we need to be able to modify
the page-tables directly via the GTT itself.
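
For illustration only (not part of the patch), the override reduces to a
default-with-fallback on the stash's pt_sz field. A standalone sketch,
with hypothetical names and without the discrete-platform check the patch
also asserts:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE_4K 4096

/*
 * Pick the size of the backing page for each PT. A requested size of 0
 * means "not set", in which case we fall back to the usual 4K.
 */
static inline int sketch_resolve_pt_sz(int requested)
{
	int pt_sz = requested ? requested : SKETCH_PAGE_SIZE_4K;

	/* Any size we end up with must be a power of two. */
	assert(pt_sz > 0 && !(pt_sz & (pt_sz - 1)));

	return pt_sz;
}
```

Callers that never set the field keep the existing 4K behaviour; a caller
that sets it to, say, 64K gets 64K-sized PT backing pages.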

Suggested-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 10 +++++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c | 16 ++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index e6ce0be6d484..2c62411eb52c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -200,6 +200,14 @@ void *__px_vaddr(struct drm_i915_gem_object *p);
 struct i915_vm_pt_stash {
 	/* preallocated chains of page tables/directories */
 	struct i915_page_table *pt[2];
+	/*
+	 * Optionally override the alignment/size of the physical page that
+	 * contains each PT. If not set defaults back to the usual
+	 * I915_GTT_PAGE_SIZE_4K. This does not influence the other paging
+	 * structures. MUST be a power-of-two. ONLY applicable on discrete
+	 * platforms.
+	 */
+	int pt_sz;
 };
 
 struct i915_vma_ops {
@@ -591,7 +599,7 @@ void free_scratch(struct i915_address_space *vm);
 
 struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
 struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz);
-struct i915_page_table *alloc_pt(struct i915_address_space *vm);
+struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz);
 struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(int npde);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 043652dc6892..d91e2beb7517 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -12,7 +12,7 @@
 #include "gen6_ppgtt.h"
 #include "gen8_ppgtt.h"
 
-struct i915_page_table *alloc_pt(struct i915_address_space *vm)
+struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz)
 {
 	struct i915_page_table *pt;
 
@@ -20,7 +20,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 	if (unlikely(!pt))
 		return ERR_PTR(-ENOMEM);
 
-	pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	pt->base = vm->alloc_pt_dma(vm, sz);
 	if (IS_ERR(pt->base)) {
 		kfree(pt);
 		return ERR_PTR(-ENOMEM);
@@ -221,17 +221,25 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 			   u64 size)
 {
 	unsigned long count;
-	int shift, n;
+	int shift, n, pt_sz;
 
 	shift = vm->pd_shift;
 	if (!shift)
 		return 0;
 
+	pt_sz = stash->pt_sz;
+	if (!pt_sz)
+		pt_sz = I915_GTT_PAGE_SIZE_4K;
+	else
+		GEM_BUG_ON(!IS_DGFX(vm->i915));
+
+	GEM_BUG_ON(!is_power_of_2(pt_sz));
+
 	count = pd_count(size, shift);
 	while (count--) {
 		struct i915_page_table *pt;
 
-		pt = alloc_pt(vm);
+		pt = alloc_pt(vm, pt_sz);
 		if (IS_ERR(pt)) {
 			i915_vm_free_pt_stash(vm, stash);
 			return PTR_ERR(pt);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Intel-gfx] [PATCH v5 05/19] drm/i915/gtt: allow overriding the pt alignment
@ 2022-02-01 10:41   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Thomas Hellström, Matthew Auld

From: Matthew Auld <matthew.auld@intel.com>

On some platforms we have alignment restrictions when accessing LMEM
from the GTT. In the next patch few patches we need to be able to modify
the page-tables directly via the GTT itself.

Suggested-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 10 +++++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c | 16 ++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index e6ce0be6d484..2c62411eb52c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -200,6 +200,14 @@ void *__px_vaddr(struct drm_i915_gem_object *p);
 struct i915_vm_pt_stash {
 	/* preallocated chains of page tables/directories */
 	struct i915_page_table *pt[2];
+	/*
+	 * Optionally override the alignment/size of the physical page that
+	 * contains each PT. If not set defaults back to the usual
+	 * I915_GTT_PAGE_SIZE_4K. This does not influence the other paging
+	 * structures. MUST be a power-of-two. ONLY applicable on discrete
+	 * platforms.
+	 */
+	int pt_sz;
 };
 
 struct i915_vma_ops {
@@ -591,7 +599,7 @@ void free_scratch(struct i915_address_space *vm);
 
 struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
 struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz);
-struct i915_page_table *alloc_pt(struct i915_address_space *vm);
+struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz);
 struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(int npde);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 043652dc6892..d91e2beb7517 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -12,7 +12,7 @@
 #include "gen6_ppgtt.h"
 #include "gen8_ppgtt.h"
 
-struct i915_page_table *alloc_pt(struct i915_address_space *vm)
+struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz)
 {
 	struct i915_page_table *pt;
 
@@ -20,7 +20,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 	if (unlikely(!pt))
 		return ERR_PTR(-ENOMEM);
 
-	pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	pt->base = vm->alloc_pt_dma(vm, sz);
 	if (IS_ERR(pt->base)) {
 		kfree(pt);
 		return ERR_PTR(-ENOMEM);
@@ -221,17 +221,25 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 			   u64 size)
 {
 	unsigned long count;
-	int shift, n;
+	int shift, n, pt_sz;
 
 	shift = vm->pd_shift;
 	if (!shift)
 		return 0;
 
+	pt_sz = stash->pt_sz;
+	if (!pt_sz)
+		pt_sz = I915_GTT_PAGE_SIZE_4K;
+	else
+		GEM_BUG_ON(!IS_DGFX(vm->i915));
+
+	GEM_BUG_ON(!is_power_of_2(pt_sz));
+
 	count = pd_count(size, shift);
 	while (count--) {
 		struct i915_page_table *pt;
 
-		pt = alloc_pt(vm);
+		pt = alloc_pt(vm, pt_sz);
 		if (IS_ERR(pt)) {
 			i915_vm_free_pt_stash(vm, stash);
 			return PTR_ERR(pt);
-- 
2.20.1


* [PATCH v5 06/19] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Matthew Auld, Lionel Landwerlin, Thomas Hellström

From: Matthew Auld <matthew.auld@intel.com>

If this is LMEM then we get a 32-entry PT, with each PTE pointing to
some 64K block of memory; otherwise it's just the usual 512-entry PT.
This very much assumes the caller knows what they are doing.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)
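
For reviewers, the index arithmetic used by
__xehpsdv_ppgtt_insert_entry_lm() can be sketched standalone. This is an
illustration only, not driver code: pte_index_4k() and
pte_index_compact() are hypothetical helpers, and the 12-bit PTE shift
and 9 index bits per level are assumed to match GEN8_PTE_SHIFT and
gen8_pd_index() at level 0.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_SHIFT     12      /* assumed equal to GEN8_PTE_SHIFT */
#define PT_INDEX_MASK 0x1ffu  /* assumed: 9 index bits, 512 slots per PT */
#define SZ_64K        0x10000u

/* Slot used in the usual 512-entry PT (hypothetical helper). */
static unsigned int pte_index_4k(uint64_t offset)
{
	return (unsigned int)((offset >> PTE_SHIFT) & PT_INDEX_MASK);
}

/*
 * Slot used in the compact 32-entry PT: each PTE covers 64K, so
 * sixteen consecutive 4K slots collapse into a single entry.
 */
static unsigned int pte_index_compact(uint64_t offset)
{
	/* Mirrors the GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K)) check. */
	assert((offset & (SZ_64K - 1)) == 0);

	return pte_index_4k(offset) / 16;
}
```

With these assumptions a 2M PDE range maps onto compact slots 0..31, and
the slot index wraps back to 0 at the next 2M boundary.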

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 62471730266c..f574da00eff1 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -715,13 +715,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 		gen8_pdp_for_page_index(vm, idx);
 	struct i915_page_directory *pd =
 		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
 	gen8_pte_t *vaddr;
 
-	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+	GEM_BUG_ON(pt->is_compact);
+
+	vaddr = px_vaddr(pt);
 	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
 	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
 }
 
+static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
+					    dma_addr_t addr,
+					    u64 offset,
+					    enum i915_cache_level level,
+					    u32 flags)
+{
+	u64 idx = offset >> GEN8_PTE_SHIFT;
+	struct i915_page_directory * const pdp =
+		gen8_pdp_for_page_index(vm, idx);
+	struct i915_page_directory *pd =
+		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
+	gen8_pte_t *vaddr;
+
+	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
+	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
+
+	if (!pt->is_compact) {
+		vaddr = px_vaddr(pd);
+		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
+		pt->is_compact = true;
+	}
+
+	vaddr = px_vaddr(pt);
+	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
+}
+
+static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
+				       dma_addr_t addr,
+				       u64 offset,
+				       enum i915_cache_level level,
+				       u32 flags)
+{
+	if (flags & PTE_LM)
+		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
+						       level, flags);
+
+	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
+}
+
 static int gen8_init_scratch(struct i915_address_space *vm)
 {
 	u32 pte_flags;
@@ -921,7 +964,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 
 	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
 	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
-	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
+	if (HAS_64K_PAGES(gt->i915))
+		ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry;
+	else
+		ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
 	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
 	ppgtt->vm.clear_range = gen8_ppgtt_clear;
 	ppgtt->vm.foreach = gen8_ppgtt_foreach;
-- 
2.20.1



* [PATCH v5 07/19] drm/i915/migrate: add acceleration support for DG2
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Matthew Auld, Lionel Landwerlin, Thomas Hellström

From: Matthew Auld <matthew.auld@intel.com>

This is all kinds of awkward since we now have to contend with using 64K
GTT pages when mapping anything in LMEM (including the page-tables
themselves).

v2: Rebased [Ram]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 179 +++++++++++++++++++-----
 1 file changed, 147 insertions(+), 32 deletions(-)
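
The window layout and offset arithmetic this patch introduces can be
sketched outside the driver as follows. This is a simplified
illustration under stated assumptions, not the implementation:
window_offset(), pte_window_offset() and pte_dwords_left() are
hypothetical helpers, CHUNK_SZ is taken to be 8 MiB as the existing
"We copy in 8MiB chunks" comment says, and the per-engine
"instance << 32" bias is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K   0x10000ull
#define SZ_2M    0x200000ull
#define CHUNK_SZ 0x800000ull	/* assumed 8 MiB */

/* Window used by one end of a transfer on a HAS_64K_PAGES platform. */
static uint64_t window_offset(bool is_lmem, bool is_dst)
{
	if (!is_lmem)
		return 0;	/* smem window, mapped with 4K PTEs */

	/* Two lmem windows, both mapped with 64K PTEs. */
	return is_dst ? 2 * CHUNK_SZ : CHUNK_SZ;
}

/* GTT address at which the PTEs backing a window offset are rewritten. */
static uint64_t pte_window_offset(uint64_t offset, bool has_64k_pages)
{
	if (has_64k_pages) {
		assert((offset % SZ_2M) == 0);
		/* One 64K-aligned PT per 2M, placed after three windows. */
		return offset / SZ_2M * SZ_64K + 3 * CHUNK_SZ;
	}

	/* One 8-byte PTE per 4K page, placed after two windows. */
	return (offset >> 12) * sizeof(uint64_t) + 2 * CHUNK_SZ;
}

/*
 * PTE dwords that may still be written before the next 2M (PDE)
 * boundary, on the 64K-page path; each PTE is two dwords.
 */
static int pte_dwords_left(uint32_t total, uint32_t page_size,
			   int dword_length)
{
	if ((total % SZ_2M) == 0)
		return dword_length;

	return (int)((SZ_2M - total % SZ_2M) / page_size * 2);
}
```

For example, under these assumptions the PTEs for the first lmem window
(offset CHUNK_SZ) start four 64K page tables into the PTE window, since
CHUNK_SZ spans four 2M PDEs.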

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 18b44af56969..cac791155244 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine)
 	return true;
 }
 
+static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
+				struct i915_page_table *pt,
+				void *data)
+{
+	struct insert_pte_data *d = data;
+
+	/*
+	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
+	 * we have a correctly set up PDE structure for later use.
+	 */
+	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
+	GEM_BUG_ON(!pt->is_compact);
+	d->offset += SZ_2M;
+}
+
+static void xehpsdv_insert_pte(struct i915_address_space *vm,
+			       struct i915_page_table *pt,
+			       void *data)
+{
+	struct insert_pte_data *d = data;
+
+	/*
+	 * We are playing tricks here, since the actual pt, from the hw
+	 * pov, is only 256 bytes with 32 entries, or 4096 bytes with 512
+	 * entries, but we are still guaranteed that the physical
+	 * alignment is 64K underneath for the pt, and we are careful
+	 * not to access the space in the void.
+	 */
+	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
+	d->offset += SZ_64K;
+}
+
 static void insert_pte(struct i915_address_space *vm,
 		       struct i915_page_table *pt,
 		       void *data)
@@ -74,7 +106,12 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 	 * i.e. within the same non-preemptible window so that we do not switch
 	 * to another migration context that overwrites the PTE.
 	 *
-	 * TODO: Add support for huge LMEM PTEs
+	 * On platforms with HAS_64K_PAGES support we have three windows, and
+	 * dedicate two of them to mapping lmem pages (smem <-> smem is not a
+	 * thing), since we are forced to use 64K GTT pages underneath, which
+	 * also requires modifying the PDE. An alternative might be to instead
+	 * map the PD into the GTT, and then on the fly toggle the 4K/64K mode
+	 * in the PDE from the same batch that also modifies the PTEs.
 	 */
 
 	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
@@ -86,6 +123,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 		goto err_vm;
 	}
 
+	if (HAS_64K_PAGES(gt->i915))
+		stash.pt_sz = I915_GTT_PAGE_SIZE_64K;
+
 	/*
 	 * Each engine instance is assigned its own chunk in the VM, so
 	 * that we can run multiple instances concurrently
@@ -105,14 +145,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 		 * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
 		 * 4x2 page directories for source/destination.
 		 */
-		sz = 2 * CHUNK_SZ;
+		if (HAS_64K_PAGES(gt->i915))
+			sz = 3 * CHUNK_SZ;
+		else
+			sz = 2 * CHUNK_SZ;
 		d.offset = base + sz;
 
 		/*
 		 * We need another page directory setup so that we can write
 		 * the 8x512 PTE in each chunk.
 		 */
-		sz += (sz >> 12) * sizeof(u64);
+		if (HAS_64K_PAGES(gt->i915))
+			sz += (sz / SZ_2M) * SZ_64K;
+		else
+			sz += (sz >> 12) * sizeof(u64);
 
 		err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
 		if (err)
@@ -133,7 +179,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 			goto err_vm;
 
 		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
-		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
+		if (HAS_64K_PAGES(gt->i915)) {
+			vm->vm.foreach(&vm->vm, base, d.offset - base,
+				       xehpsdv_insert_pte, &d);
+			d.offset = base + CHUNK_SZ;
+			vm->vm.foreach(&vm->vm,
+				       d.offset,
+				       2 * CHUNK_SZ,
+				       xehpsdv_toggle_pdes, &d);
+		} else {
+			vm->vm.foreach(&vm->vm, base, d.offset - base,
+				       insert_pte, &d);
+		}
 	}
 
 	return &vm->vm;
@@ -269,19 +326,38 @@ static int emit_pte(struct i915_request *rq,
 		    u64 offset,
 		    int length)
 {
+	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
 	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
 						       is_lmem ? PTE_LM : 0);
 	struct intel_ring *ring = rq->ring;
-	int total = 0;
+	int pkt, dword_length;
+	u32 total = 0;
+	u32 page_size;
 	u32 *hdr, *cs;
-	int pkt;
 
 	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
 
+	page_size = I915_GTT_PAGE_SIZE;
+	dword_length = 0x400;
+
 	/* Compute the page directory offset for the target address range */
-	offset >>= 12;
-	offset *= sizeof(u64);
-	offset += 2 * CHUNK_SZ;
+	if (has_64K_pages) {
+		GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M));
+
+		offset /= SZ_2M;
+		offset *= SZ_64K;
+		offset += 3 * CHUNK_SZ;
+
+		if (is_lmem) {
+			page_size = I915_GTT_PAGE_SIZE_64K;
+			dword_length = 0x40;
+		}
+	} else {
+		offset >>= 12;
+		offset *= sizeof(u64);
+		offset += 2 * CHUNK_SZ;
+	}
+
 	offset += (u64)rq->engine->instance << 32;
 
 	cs = intel_ring_begin(rq, 6);
@@ -289,7 +365,7 @@ static int emit_pte(struct i915_request *rq,
 		return PTR_ERR(cs);
 
 	/* Pack as many PTE updates as possible into a single MI command */
-	pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+	pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5);
 	pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 	hdr = cs;
@@ -299,6 +375,8 @@ static int emit_pte(struct i915_request *rq,
 
 	do {
 		if (cs - hdr >= pkt) {
+			int dword_rem;
+
 			*hdr += cs - hdr - 2;
 			*cs++ = MI_NOOP;
 
@@ -310,7 +388,18 @@ static int emit_pte(struct i915_request *rq,
 			if (IS_ERR(cs))
 				return PTR_ERR(cs);
 
-			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+			dword_rem = dword_length;
+			if (has_64K_pages) {
+				if (IS_ALIGNED(total, SZ_2M)) {
+					offset = round_up(offset, SZ_64K);
+				} else {
+					dword_rem = SZ_2M - (total & (SZ_2M - 1));
+					dword_rem /= page_size;
+					dword_rem *= 2;
+				}
+			}
+
+			pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5);
 			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 			hdr = cs;
@@ -319,13 +408,15 @@ static int emit_pte(struct i915_request *rq,
 			*cs++ = upper_32_bits(offset);
 		}
 
+		GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
+
 		*cs++ = lower_32_bits(encode | it->dma);
 		*cs++ = upper_32_bits(encode | it->dma);
 
 		offset += 8;
-		total += I915_GTT_PAGE_SIZE;
+		total += page_size;
 
-		it->dma += I915_GTT_PAGE_SIZE;
+		it->dma += page_size;
 		if (it->dma >= it->max) {
 			it->sg = __sg_next(it->sg);
 			if (!it->sg || sg_dma_len(it->sg) == 0)
@@ -356,7 +447,8 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
-static int emit_copy(struct i915_request *rq, int size)
+static int emit_copy(struct i915_request *rq,
+		     u32 dst_offset, u32 src_offset, int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
@@ -371,31 +463,31 @@ static int emit_copy(struct i915_request *rq, int size)
 		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else if (ver >= 8) {
 		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else {
 		GEM_BUG_ON(instance);
 		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 	}
 
 	intel_ring_advance(rq, cs);
@@ -423,6 +515,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 src_offset, dst_offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -450,15 +543,28 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
-			       CHUNK_SZ);
+		src_offset = 0;
+		dst_offset = CHUNK_SZ;
+		if (HAS_64K_PAGES(ce->engine->i915)) {
+			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
+
+			src_offset = 0;
+			dst_offset = 0;
+			if (src_is_lmem)
+				src_offset = CHUNK_SZ;
+			if (dst_is_lmem)
+				dst_offset = 2 * CHUNK_SZ;
+		}
+
+		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+			       src_offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
 		}
 
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
-			       CHUNK_SZ, len);
+			       dst_offset, len);
 		if (err < 0)
 			goto out_rq;
 		if (err < len) {
@@ -470,7 +576,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, len);
+		err = emit_copy(rq, dst_offset, src_offset, len);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -488,14 +594,18 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, int size, u32 value)
+static int emit_clear(struct i915_request *rq,
+		      u64 offset,
+		      int size,
+		      u32 value)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
-	u32 instance = rq->engine->instance;
 	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
+	offset += (u64)rq->engine->instance << 32;
+
 	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
@@ -505,17 +615,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0; /* offset */
-		*cs++ = instance;
+		*cs++ = lower_32_bits(offset);
+		*cs++ = upper_32_bits(offset);
 		*cs++ = value;
 		*cs++ = MI_NOOP;
 	} else {
-		GEM_BUG_ON(instance);
+		GEM_BUG_ON(upper_32_bits(offset));
 		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0;
+		*cs++ = lower_32_bits(offset);
 		*cs++ = value;
 	}
 
@@ -542,6 +652,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -569,7 +680,11 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
+		offset = 0;
+		if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
+			offset = CHUNK_SZ;
+
+		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -579,7 +694,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, len, value);
+		err = emit_clear(rq, offset, len, value);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1



* [PATCH v5 08/19] drm/i915/uapi: document behaviour for DG2 64K support
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tony Ye, Thomas Hellström, Jordan Justen, Kenneth Graunke,
	Lionel Landwerlin, Slawomir Milczarek, Matthew Auld, mesa-dev

From: Matthew Auld <matthew.auld@intel.com>

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.
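
As a rough illustration of the rules documented here, the sketch below shows
how userland might compute the expected object size and the GTT reservation
for an I915_MEMORY_CLASS_DEVICE object on DG2. The helper names (align_up,
dg2_lmem_object_size, dg2_lmem_gtt_size, dg2_lmem_gtt_offset_ok) are
hypothetical and not part of the uAPI; only the 64K and 2M values are taken
from this patch.

```c
#include <stdint.h>

#define SZ_64K (64ull * 1024)
#define SZ_2M  (2ull * 1024 * 1024)

/* Round x up to the next multiple of a; a must be a power of two. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Size expected back from the kernel for a device-memory object (64K min page). */
static inline uint64_t dg2_lmem_object_size(uint64_t requested)
{
	return align_up(requested, SZ_64K);
}

/* GTT range to reserve for the mapping: rounded up to 2M (hypothetical helper). */
static inline uint64_t dg2_lmem_gtt_size(uint64_t object_size)
{
	return align_up(object_size, SZ_2M);
}

/* A GTT offset for such an object is acceptable only if it is 2M aligned. */
static inline int dg2_lmem_gtt_offset_ok(uint64_t offset)
{
	return (offset & (SZ_2M - 1)) == 0;
}
```

Under these assumptions a 4K request is reported as 64K and occupies a full
2M slot of virtual address space, which is the trade-off described in the
uapi comment.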

v3: fix typos and less emphasis
v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5e678917da70..77e5e74c32c1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
 	/**
 	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 	 * the user with the GTT offset at which this object will be pinned.
+	 *
 	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 	 * presumed_offset of the object.
+	 *
 	 * During execbuffer2 the kernel populates it with the value of the
 	 * current GTT offset of the object, for future presumed_offset writes.
+	 *
+	 * See struct drm_i915_gem_create_ext for the rules when dealing with
+	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+	 * minimum page sizes, like DG2.
 	 */
 	__u64 offset;
 
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 * Note that for some devices we have might have further minimum
-	 * page-size restrictions(larger than 4K), like for device local-memory.
-	 * However in general the final size here should always reflect any
-	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
-	 * extension to place the object in device local-memory.
+	 *
+	 * DG2 64K min page size implications:
+	 *
+	 * On discrete platforms, starting from DG2, we have to contend with GTT
+	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+	 * objects.  Specifically the hardware only supports 64K or larger GTT
+	 * page sizes for such memory. The kernel will already ensure that all
+	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+	 * sizes underneath.
+	 *
+	 * Note that the returned size here will always reflect any required
+	 * rounding up done by the kernel, e.g. 4K will now become 64K on devices
+	 * such as DG2.
+	 *
+	 * Special DG2 GTT address alignment requirement:
+	 *
+	 * The GTT alignment will also need to be at least 2M for such objects.
+	 *
+	 * Note that due to how the hardware implements 64K GTT page support, we
+	 * have some further complications:
+	 *
+	 *   1) The entire PDE (which covers a 2MB virtual address range) must
+	 *   contain only 64K PTEs, i.e. mixing 4K and 64K PTEs in the same
+	 *   PDE is forbidden by the hardware.
+	 *
+	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+	 *   objects.
+	 *
+	 * To keep things simple for userland, we mandate that any GTT mappings
+	 * must be aligned to and rounded up to 2MB. As this only wastes virtual
+	 * address space and avoids userland having to copy any needlessly
+	 * complicated PDE sharing scheme (coloring) and only affects DG2, this
+	 * is deemed to be a good compromise.
 	 */
 	__u64 size;
 	/**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 09/19] Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tony Ye, Jordan Justen, Daniel Vetter, Kenneth Graunke,
	Lionel Landwerlin, Slawomir Milczarek, Matthew Auld, mesa-dev

Document the 64k page size support added as part of DG2 enabling and its
implicit impact on the uAPI.

v2: improved the Flat-CCS documentation [Danvet & CQ]
v3: made only for 64k pagesize support

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Daniel Vetter <daniel.vetter@ffwll.ch>
cc: Matthew Auld <matthew.auld@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 Documentation/gpu/rfc/i915_dg2.rst | 25 +++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst    |  3 +++
 2 files changed, 28 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_dg2.rst

diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
new file mode 100644
index 000000000000..f4eb5a219897
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_dg2.rst
@@ -0,0 +1,25 @@
+====================
+I915 DG2 RFC Section
+====================
+
+Upstream plan
+=============
+The plan to upstream DG2 enabling is:
+
+* Merge basic HW enabling for DG2 (still without pciid)
+* Merge the 64k support for lmem
+* Merge the flat CCS enabling patches
+* Add the pciid for DG2 and enable DG2 in CI
+
+
+64K page support for lmem
+=========================
+On DG2 hw, local-memory supports a minimum GTT page size of 64k only. 4k is not
+supported anymore.
+
+DG2 hw doesn't support 64k (lmem) and 4k (smem) pages in the same ppgtt
+page table. Refer to struct drm_i915_gem_create_ext for the implications of
+handling the 64k page size.
+
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+        :functions: drm_i915_gem_create_ext
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..afb320ed4028 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -20,6 +20,9 @@ host such documentation:
 
     i915_gem_lmem.rst
 
+.. toctree::
+    i915_dg2.rst
+
 .. toctree::
 
     i915_scheduler.rst
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 10/19] drm/i915/xehpsdv: Add has_flat_ccs to device info
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Lionel Landwerlin, CQ Tang, Matthew Auld

From: CQ Tang <cq.tang@intel.com>

Platforms of XeHP and beyond support 3D surface (buffer) compression and
various compression formats. This is accomplished by an additional
compression control state (CCS) stored for each surface.

Gen 12 devices (TGL family and DG1) store compression states in a separate
region of memory. It is managed by user-space and has an associated set of
user-space managed page tables used by hardware for address translation.

Xe HP and beyond (XEHPSDV, DG2, etc.) introduce a new feature, Flat CCS.
It replaces the AUX page tables with a flat indexed region of device
memory for storing compression states.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 6 ++++++
 drivers/gpu/drm/i915/i915_pci.c          | 1 +
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4afdfa5fd3b3..384977798c8e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1528,6 +1528,12 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
 #define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM)
 
+/*
+ * Platform has dedicated compression control state for each lmem surface,
+ * stored in lmem, to support the 3D and media compression formats.
+ */
+#define HAS_FLAT_CCS(dev_priv)   (INTEL_INFO(dev_priv)->has_flat_ccs)
+
 #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
 
 #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index ce6ae6a3cbdf..3976482582b8 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1003,6 +1003,7 @@ static const struct intel_device_info adl_p_info = {
 	XE_HP_PAGE_SIZES, \
 	.dma_mask_size = 46, \
 	.has_64bit_reloc = 1, \
+	.has_flat_ccs = 1, \
 	.has_global_mocs = 1, \
 	.has_gt_uc = 1, \
 	.has_llc = 1, \
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index d8da40d01dca..ef7c7c988b7b 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -133,6 +133,7 @@ enum intel_ppgtt_type {
 	func(needs_compact_pt); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
+	func(has_flat_ccs); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
 	func(has_guc_deprivilege); \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 11/19] drm/i915/lmem: Enable lmem for platforms with Flat CCS
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Abdiel Janulgue, Matthew Auld, Lionel Landwerlin

From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

A portion of device memory is reserved for Flat CCS, so usable
device memory is reduced by the size of Flat CCS. The start of the
Flat CCS region is specified in XEHPSDV_FLAT_CCS_BASE_ADDR.
So to get the effective device memory we need to subtract the
Flat CCS memory size from the total device memory.
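
A minimal sketch of that calculation, assuming (as the setup_lmem() hunk in
this patch does) that the register value shifted right by
XEHPSDV_CCS_BASE_SHIFT gives the Flat CCS base in 64K units. The function
name usable_lmem_size and its standalone signature are hypothetical; the
driver reads the register and the BAR length itself.

```c
#include <stdint.h>

#define CCS_BASE_SHIFT 8              /* XEHPSDV_CCS_BASE_SHIFT in this patch */
#define SZ_64K         (64ull * 1024)

/*
 * Compute the lmem size usable by the driver, given the total BAR size and
 * the raw XEHPSDV_FLAT_CCS_BASE_ADDR value. Everything from the Flat CCS
 * base to the end of the BAR is treated as reserved.
 *
 * Returns 0 on success, -1 if the reported base lies beyond the BAR.
 */
static int usable_lmem_size(uint64_t bar_size, uint32_t ccs_base_reg,
			    uint64_t *usable)
{
	uint64_t flat_ccs_base =
		(uint64_t)(ccs_base_reg >> CCS_BASE_SHIFT) * SZ_64K;

	if (bar_size < flat_ccs_base)
		return -1;

	/* bar_size - (bar_size - flat_ccs_base) simplifies to the base itself */
	*usable = flat_ccs_base;
	return 0;
}
```

For example, with a 1 GiB BAR and a register value of 0x3f8000, the base is
0x3f80 * 64K = 0x3f800000, so the top 8 MiB are reserved for Flat CCS.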

v2:
  Addressed the small bar related issue [Matt]
  Removed a redundant check [Matt]

Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c          | 19 ++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt.h          |  1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +++++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h             |  3 +++
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f59933abbb3a..e40d98cb3a2d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -911,6 +911,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
 	return intel_uncore_read_fw(gt->uncore, reg);
 }
 
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
+{
+	int type;
+	u8 sliceid, subsliceid;
+
+	for (type = 0; type < NUM_STEERING_TYPES; type++) {
+		if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
+			intel_gt_get_valid_steering(gt, type, &sliceid,
+						    &subsliceid);
+			return intel_uncore_read_with_mcr_steering(gt->uncore,
+								   reg,
+								   sliceid,
+								   subsliceid);
+		}
+	}
+
+	return intel_uncore_read(gt->uncore, reg);
+}
+
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 2dad46c3eff2..0f571c8ee22b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
 }
 
 u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
 
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index 21215a080088..f1d37b46b505 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -205,8 +205,28 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	if (!IS_DGFX(i915))
 		return ERR_PTR(-ENODEV);
 
-	/* Stolen starts from GSMBASE on DG1 */
-	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
+	if (HAS_FLAT_CCS(i915)) {
+		u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
+
+		lmem_size = pci_resource_len(pdev, 2);
+		flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
+		flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
+
+		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
+			return ERR_PTR(-ENODEV);
+
+		tile_stolen = lmem_size - flat_ccs_base;
+
+		/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
+		if (tile_stolen == lmem_size)
+			DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");
+
+		lmem_size -= tile_stolen;
+	} else {
+		/* Stolen starts from GSMBASE without CCS */
+		lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
+	}
+
 
 	io_start = pci_resource_start(pdev, 2);
 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0f36af8dc3a1..9b5423572fe9 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -11651,6 +11651,9 @@ enum skl_power_gate {
 #define   SGGI_DIS			REG_BIT(15)
 #define   SGR_DIS			REG_BIT(13)
 
+#define XEHPSDV_FLAT_CCS_BASE_ADDR             _MMIO(0x4910)
+#define   XEHPSDV_CCS_BASE_SHIFT               8
+
 /* gamt regs */
 #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
 #define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW  0x67F1427F /* max/min for LRA1/2 */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 12/19] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Lionel Landwerlin, CQ Tang, Matthew Auld, Ayaz A Siddiqui

From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>

Xe-HP and later devices support Flat CCS, which reserves a portion of
the device memory to store compression metadata. When clearing a
device memory buffer object we also need to clear the associated
CCS buffer.

Flat CCS memory cannot be directly accessed by software. The address of
the CCS buffer associated with the main BO is calculated automatically
by the device itself. KMD/UMD can only access this buffer indirectly,
using the XY_CTRL_SURF_COPY_BLT command via the address of the device
memory buffer.
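
To make the command-stream cost concrete, the sketch below mirrors the
arithmetic of calc_ctrl_surf_instr_size() from this patch as standalone C.
The constants are copied from the intel_gpu_commands.h hunk; the function
name ctrl_surf_instr_dwords, the div_round_up helper and the has_flat_ccs
parameter (standing in for HAS_FLAT_CCS(i915)) are illustrative only.

```c
#include <stdint.h>

#define NUM_CCS_BYTES_PER_BLOCK 256
#define NUM_CCS_BLKS_PER_XFER   1024
#define XY_CTRL_SURF_INSTR_SIZE 5   /* dwords per XY_CTRL_SURF_COPY_BLT */
#define MI_FLUSH_DW_SIZE        3   /* dwords per MI_FLUSH_DW */

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
	return (n + d - 1) / d;
}

/* Dwords needed to clear the CCS state backing `size` bytes of lmem. */
static uint32_t ctrl_surf_instr_dwords(uint32_t size, int has_flat_ccs)
{
	uint32_t num_blks, num_cmds;

	if (!has_flat_ccs || !size)
		return 0;

	/* Same rounding as GET_CCS_SIZE() in this patch */
	num_blks = div_round_up(size, NUM_CCS_BYTES_PER_BLOCK);
	/* One command moves at most NUM_CCS_BLKS_PER_XFER blocks */
	num_cmds = div_round_up(num_blks, NUM_CCS_BLKS_PER_XFER);

	/* One flush before and one after the copy commands */
	return num_cmds * XY_CTRL_SURF_INSTR_SIZE + 2 * MI_FLUSH_DW_SIZE;
}
```

With the 8M CHUNK_SZ used by the migrate code this works out to 32 commands,
i.e. 32 * 5 + 6 = 166 dwords per chunk.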

v2: Fixed issues with platform naming [Lucas]
v3: Rebased [Ram]
    Used the round_up funcs [Bob]

Cc: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  14 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 114 ++++++++++++++++++-
 2 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f8253012d166..07bf5a1753bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -203,6 +203,20 @@
 #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
 #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
 
+#define XY_CTRL_SURF_INSTR_SIZE	5
+#define MI_FLUSH_DW_SIZE		3
+#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT	21
+#define   DST_ACCESS_TYPE_SHIFT	20
+#define   CCS_SIZE_SHIFT		8
+#define   XY_CTRL_SURF_MOCS_SHIFT	25
+#define   NUM_CCS_BYTES_PER_BLOCK	256
+#define   NUM_CCS_BLKS_PER_XFER	1024
+#define   INDIRECT_ACCESS		0
+#define   DIRECT_ACCESS		1
+#define  MI_FLUSH_LLC			BIT(9)
+#define  MI_FLUSH_CCS			BIT(16)
+
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
 #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index cac791155244..3e1cf224cdf0 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -16,6 +16,8 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
+#define GET_CCS_SIZE(i915, size)	(HAS_FLAT_CCS(i915) ? \
+					DIV_ROUND_UP(size, NUM_CCS_BYTES_PER_BLOCK) : 0)
 
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
@@ -594,19 +596,105 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
+{
+	/* Mask the 3 LSB to use the PPGTT address space */
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = lower_32_bits(dst);
+	*cmd++ = upper_32_bits(dst);
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_SIZE(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+	 * blocks. One XY_CTRL_SURF_COPY_BLT command can
+	 * transfer up to 1024 blocks.
+	 */
+	num_blks = GET_CCS_SIZE(i915, size);
+	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
+	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
+
+	/*
+	 * We need to add a flush before and after
+	 * XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u16 num_ccs_blocks)
+{
+	int i = num_ccs_blocks;
+
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		/*
+		 * We use a bitwise AND with 1023 since the size field
+		 * takes values in the range 0 - 1023
+		 */
+		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
+			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
+			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
+			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
+			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
+			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		i -= NUM_CCS_BLKS_PER_XFER;
+	} while (i > 0);
+
+	return cmd;
+}
+
 static int emit_clear(struct i915_request *rq,
 		      u64 offset,
 		      int size,
-		      u32 value)
+		      u32 value,
+		      bool is_lmem)
 {
+	struct drm_i915_private *i915 = rq->engine->i915;
 	const int ver = GRAPHICS_VER(rq->engine->i915);
+	u32 num_ccs_blks, ccs_ring_size;
 	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
 	offset += (u64)rq->engine->instance << 32;
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	/* Clear flat CCS only when value is 0 */
+	ccs_ring_size = (is_lmem && !value) ?
+			 calc_ctrl_surf_instr_size(i915, size)
+			 : 0;
+
+	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -629,6 +717,26 @@ static int emit_clear(struct i915_request *rq,
 		*cs++ = value;
 	}
 
+	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
+		num_ccs_blks = GET_CCS_SIZE(i915, size);
+
+		/*
+		 * Flat CCS surface can only be accessed via
+		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
+		 * mapping of associated LMEM.
+		 * We can clear ccs surface by writing all 0s,
+		 * so we will flush the previously cleared buffer
+		 * and use it as a source.
+		 */
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
+					      DIRECT_ACCESS, INDIRECT_ACCESS,
+					      1, 1, num_ccs_blks);
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
+
+		if (ccs_ring_size & 1)
+			*cs++ = MI_NOOP;
+	}
 	intel_ring_advance(rq, cs);
 	return 0;
 }
@@ -694,7 +802,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, offset, len, value);
+		err = emit_clear(rq, offset, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1


* [PATCH v5 13/19] drm/i915: Introduce new Tile 4 format
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Nanley Chery, Lionel Landwerlin, Stanislav Lisovskiy, Matthew Auld

From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

This tiling layout uses 4KB tiles in a row-major layout. It has the same
shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It
only differs from Tile Y at the 256B granularity in between. At this
granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape
of 64B x 8 rows.
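
A minimal sketch of what the 4KB (128B x 32 rows) tile shape implies for
sizing a surface, assuming only whole-tile rounding. The helper names are
hypothetical, and the driver or hardware may impose further alignment that
is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Tile shape taken from the description above. */
#define TILE4_WIDTH_BYTES  128u
#define TILE4_HEIGHT_ROWS  32u
#define TILE4_SIZE_BYTES   (TILE4_WIDTH_BYTES * TILE4_HEIGHT_ROWS)

static_assert(TILE4_SIZE_BYTES == 4096, "128B x 32 rows is one 4KB tile");

/* Round v up to the next multiple of a (a must be non-zero). */
static uint32_t align_up(uint32_t v, uint32_t a)
{
	return (v + a - 1) / a * a;
}

/* Row pitch in bytes, rounded up to a whole number of tile widths. */
static uint32_t tile4_stride(uint32_t width_px, uint32_t cpp)
{
	return align_up(width_px * cpp, TILE4_WIDTH_BYTES);
}

/* Total bytes once the height is also rounded up to whole tiles. */
static uint64_t tile4_surface_size(uint32_t width_px, uint32_t height_px,
				   uint32_t cpp)
{
	uint32_t rows = align_up(height_px, TILE4_HEIGHT_ROWS);

	return (uint64_t)tile4_stride(width_px, cpp) * rows;
}
```

For a 1920x1080 surface at 4 bytes per pixel this yields a 7680-byte stride
(60 tiles across) and 1088 rows (34 tiles down), i.e. 2040 tiles.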

Reviewed-by: Imre Deak <imre.deak@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
---
 include/uapi/drm/drm_fourcc.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index fc0c1454d275..b73fe6797fc3 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -572,6 +572,17 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
 
+/*
+ * Intel Tile 4 layout
+ *
+ * This is a tiled layout using 4KB tiles in a row-major layout. It has the same
+ * shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It
+ * only differs from Tile Y at the 256B granularity in between. At this
+ * granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape
+ * of 64B x 8 rows.
+ */
+#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
-- 
2.20.1


* [PATCH v5 14/19] drm/i915/dg2: Tile 4 plane format support
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Lionel Landwerlin, Stanislav Lisovskiy, Matthew Auld

From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

Tile 4, as described in the bspec, is a 4K tile organized into
64B subtiles with the same basic shape as legacy TileY.
It will be supported by Display13.

v2: - Moved the Tile4 struct associating modifier/display to
      the beginning (Imre Deak)
    - Removed unneeded case I915_FORMAT_MOD_4_TILED modifier
      checks (Imre Deak)
    - Fixed I915_FORMAT_MOD_4_TILED to be 9 instead of 12
      (Imre Deak)

v3: - Rebased patch on top of new changes related to plane_caps.
    - Added static assert to check that PLANE_CTL_TILED_YF
      matches PLANE_CTL_TILED_4 (Nanley Chery)
    - Fixed naming and layout description for Tile 4 in drm uapi
      header (Nanley Chery)

v4: - Extracted drm_fourcc changes to separate patch(Nanley Chery)
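
The static assert from v3 reflects that the Yf and Tile 4 encodings share
the same PLANE_CTL tiling field value (5 in the i915_reg.h hunk of this
patch), so hardware readout has to disambiguate by platform. A minimal
sketch of that decision, with hypothetical enum and function names standing
in for the driver's modifiers and HAS_4TILE():

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the three modifiers the readout can choose between. */
enum fb_tiling {
	FB_TILING_YF,
	FB_TILING_YF_CCS,
	FB_TILING_4,
};

/* Both encodings use field value 5 in this patch. */
#define TILED_FIELD_YF 5u
#define TILED_FIELD_4  5u

static_assert(TILED_FIELD_YF == TILED_FIELD_4,
	      "readout relies on Yf and Tile 4 sharing one encoding");

/*
 * Decode tiling field value 5. On platforms with Tile 4 the value always
 * means Tile 4; otherwise it is Yf, with or without render compression.
 */
static enum fb_tiling decode_tiled_field_5(bool has_4tile,
					   bool render_decompression)
{
	if (has_4tile)
		return FB_TILING_4;

	return render_decompression ? FB_TILING_YF_CCS : FB_TILING_YF;
}
```

As in the patch, the decompression bit is not consulted on Tile 4
platforms.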

Reviewed-by: Imre Deak <imre.deak@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  1 +
 drivers/gpu/drm/i915/display/intel_fb.c       | 15 +++++++++++-
 drivers/gpu/drm/i915/display/intel_fb.h       |  1 +
 drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
 .../drm/i915/display/intel_plane_initial.c    |  1 +
 .../drm/i915/display/skl_universal_plane.c    | 23 ++++++++++++-------
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_pci.c               |  1 +
 drivers/gpu/drm/i915/i915_reg.h               |  1 +
 drivers/gpu/drm/i915/intel_device_info.h      |  1 +
 drivers/gpu/drm/i915/intel_pm.c               |  1 +
 11 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 75de794185b2..189767cef356 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -7737,6 +7737,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state, struct int
 		case I915_FORMAT_MOD_X_TILED:
 		case I915_FORMAT_MOD_Y_TILED:
 		case I915_FORMAT_MOD_Yf_TILED:
+		case I915_FORMAT_MOD_4_TILED:
 			break;
 		default:
 			drm_dbg_kms(&i915->drm,
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 23cfe2e5ce2a..94c57facbb46 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -135,11 +135,16 @@ struct intel_modifier_desc {
 					 INTEL_PLANE_CAP_CCS_MC)
 #define INTEL_PLANE_CAP_TILING_MASK	(INTEL_PLANE_CAP_TILING_X | \
 					 INTEL_PLANE_CAP_TILING_Y | \
-					 INTEL_PLANE_CAP_TILING_Yf)
+					 INTEL_PLANE_CAP_TILING_Yf | \
+					 INTEL_PLANE_CAP_TILING_4)
 #define INTEL_PLANE_CAP_TILING_NONE	0
 
 static const struct intel_modifier_desc intel_modifiers[] = {
 	{
+		.modifier = I915_FORMAT_MOD_4_TILED,
+		.display_ver = { 13, 13 },
+		.plane_caps = INTEL_PLANE_CAP_TILING_4,
+	}, {
 		.modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS,
 		.display_ver = { 12, 13 },
 		.plane_caps = INTEL_PLANE_CAP_TILING_Y | INTEL_PLANE_CAP_CCS_MC,
@@ -545,6 +550,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_4_TILED:
+		/*
+		 * Each 4K tile consists of 64B(8*8) subtiles, with
+		 * same shape as Y Tile(i.e 4*16B OWords)
+		 */
+		return 128;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 		if (intel_fb_is_ccs_aux_plane(fb, color_plane))
 			return 128;
@@ -650,6 +661,7 @@ static unsigned int intel_fb_modifier_to_tiling(u64 fb_modifier)
 		return I915_TILING_Y;
 	case INTEL_PLANE_CAP_TILING_X:
 		return I915_TILING_X;
+	case INTEL_PLANE_CAP_TILING_4:
 	case INTEL_PLANE_CAP_TILING_Yf:
 	case INTEL_PLANE_CAP_TILING_NONE:
 		return I915_TILING_NONE;
@@ -737,6 +749,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Yf_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
 	default:
diff --git a/drivers/gpu/drm/i915/display/intel_fb.h b/drivers/gpu/drm/i915/display/intel_fb.h
index ba9df8986c1e..12386f13a4e0 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.h
+++ b/drivers/gpu/drm/i915/display/intel_fb.h
@@ -27,6 +27,7 @@ struct intel_plane_state;
 #define INTEL_PLANE_CAP_TILING_X	BIT(3)
 #define INTEL_PLANE_CAP_TILING_Y	BIT(4)
 #define INTEL_PLANE_CAP_TILING_Yf	BIT(5)
+#define INTEL_PLANE_CAP_TILING_4	BIT(6)
 
 bool intel_fb_is_ccs_modifier(u64 modifier);
 bool intel_fb_is_rc_ccs_cc_modifier(u64 modifier);
diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index bcdffe62f3cb..dccd9b7cde3f 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -946,6 +946,7 @@ static bool tiling_is_valid(const struct intel_plane_state *plane_state)
 	case I915_FORMAT_MOD_Y_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return DISPLAY_VER(i915) >= 9;
+	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_X_TILED:
 		return true;
 	default:
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index e4186a0b8edb..426a8bd30abc 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -126,6 +126,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	case DRM_FORMAT_MOD_LINEAR:
 	case I915_FORMAT_MOD_X_TILED:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 		break;
 	default:
 		drm_dbg(&dev_priv->drm,
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 1223075595ff..5299dfe68802 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -762,6 +762,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_X;
 	case I915_FORMAT_MOD_Y_TILED:
 		return PLANE_CTL_TILED_Y;
+	case I915_FORMAT_MOD_4_TILED:
+		return PLANE_CTL_TILED_4;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2011,9 +2013,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 	case DRM_FORMAT_Y216:
 	case DRM_FORMAT_XVYU12_16161616:
 	case DRM_FORMAT_XVYU16161616:
-		if (modifier == DRM_FORMAT_MOD_LINEAR ||
-		    modifier == I915_FORMAT_MOD_X_TILED ||
-		    modifier == I915_FORMAT_MOD_Y_TILED)
+		if (!intel_fb_is_ccs_modifier(modifier))
 			return true;
 		fallthrough;
 	default:
@@ -2106,6 +2106,8 @@ static u8 skl_get_plane_caps(struct drm_i915_private *i915,
 		caps |= INTEL_PLANE_CAP_TILING_Y;
 	if (DISPLAY_VER(i915) < 12)
 		caps |= INTEL_PLANE_CAP_TILING_Yf;
+	if (HAS_4TILE(i915))
+		caps |= INTEL_PLANE_CAP_TILING_4;
 
 	if (skl_plane_has_rc_ccs(i915, pipe, plane_id)) {
 		caps |= INTEL_PLANE_CAP_CCS_RC;
@@ -2278,6 +2280,7 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 	unsigned int aligned_height;
 	struct drm_framebuffer *fb;
 	struct intel_framebuffer *intel_fb;
+	static_assert(PLANE_CTL_TILED_YF == PLANE_CTL_TILED_4);
 
 	if (!plane->get_hw_state(plane, &pipe))
 		return;
@@ -2340,11 +2343,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		else
 			fb->modifier = I915_FORMAT_MOD_Y_TILED;
 		break;
-	case PLANE_CTL_TILED_YF:
-		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
-		else
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
+		if (HAS_4TILE(dev_priv)) {
+			fb->modifier = I915_FORMAT_MOD_4_TILED;
+		} else {
+			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
+			else
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+		}
 		break;
 	default:
 		MISSING_CASE(tiling);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 384977798c8e..3011c05b5b9c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1424,6 +1424,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
 
 #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
+#define HAS_4TILE(dev_priv)	(INTEL_INFO(dev_priv)->has_4tile)
 #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
 #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
 #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 3976482582b8..436aae34b1f1 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1045,6 +1045,7 @@ static const struct intel_device_info dg2_info = {
 	DGFX_FEATURES,
 	.graphics.rel = 55,
 	.media.rel = 55,
+	.has_4tile = 1,
 	PLATFORM(INTEL_DG2),
 	.has_guc_deprivilege = 1,
 	.has_64k_pages = 1,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9b5423572fe9..14065164fdcf 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6450,6 +6450,7 @@ enum {
 #define   PLANE_CTL_TILED_X			REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 1)
 #define   PLANE_CTL_TILED_Y			REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 4)
 #define   PLANE_CTL_TILED_YF			REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 5)
+#define   PLANE_CTL_TILED_4                     REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 5)
 #define   PLANE_CTL_ASYNC_FLIP			REG_BIT(9)
 #define   PLANE_CTL_FLIP_HORIZONTAL		REG_BIT(8)
 #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	REG_BIT(4) /* TGL+ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index ef7c7c988b7b..081f9849797d 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -134,6 +134,7 @@ enum intel_ppgtt_type {
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_flat_ccs); \
+	func(has_4tile); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
 	func(has_guc_deprivilege); \
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 46b21680e601..e81791bf7d03 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5402,6 +5402,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
 	}
 
 	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
+		      modifier == I915_FORMAT_MOD_4_TILED ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED ||
 		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
-- 
2.20.1


* [Intel-gfx] [PATCH v5 14/19] drm/i915/dg2: Tile 4 plane format support
@ 2022-02-01 10:41   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Matthew Auld

From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

Tile 4, as described in the bspec, is a 4K tile organized into
64B subtiles with the same basic shape as legacy TileY.
It will be supported by Display13.

v2: - Moved the Tile4 struct associating modifier/display to
      the beginning (Imre Deak)
    - Removed unneeded case I915_FORMAT_MOD_4_TILED modifier
      checks (Imre Deak)
    - Fixed I915_FORMAT_MOD_4_TILED to be 9 instead of 12
      (Imre Deak)

v3: - Rebased patch on top of new changes related to plane_caps.
    - Added static assert to check that PLANE_CTL_TILED_YF
      matches PLANE_CTL_TILED_4 (Nanley Chery)
    - Fixed naming and layout description for Tile 4 in drm uapi
      header (Nanley Chery)

v4: - Extracted drm_fourcc changes to separate patch(Nanley Chery)

Reviewed-by: Imre Deak <imre.deak@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  1 +
 drivers/gpu/drm/i915/display/intel_fb.c       | 15 +++++++++++-
 drivers/gpu/drm/i915/display/intel_fb.h       |  1 +
 drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
 .../drm/i915/display/intel_plane_initial.c    |  1 +
 .../drm/i915/display/skl_universal_plane.c    | 23 ++++++++++++-------
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_pci.c               |  1 +
 drivers/gpu/drm/i915/i915_reg.h               |  1 +
 drivers/gpu/drm/i915/intel_device_info.h      |  1 +
 drivers/gpu/drm/i915/intel_pm.c               |  1 +
 11 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 75de794185b2..189767cef356 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -7737,6 +7737,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state, struct int
 		case I915_FORMAT_MOD_X_TILED:
 		case I915_FORMAT_MOD_Y_TILED:
 		case I915_FORMAT_MOD_Yf_TILED:
+		case I915_FORMAT_MOD_4_TILED:
 			break;
 		default:
 			drm_dbg_kms(&i915->drm,
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 23cfe2e5ce2a..94c57facbb46 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -135,11 +135,16 @@ struct intel_modifier_desc {
 					 INTEL_PLANE_CAP_CCS_MC)
 #define INTEL_PLANE_CAP_TILING_MASK	(INTEL_PLANE_CAP_TILING_X | \
 					 INTEL_PLANE_CAP_TILING_Y | \
-					 INTEL_PLANE_CAP_TILING_Yf)
+					 INTEL_PLANE_CAP_TILING_Yf | \
+					 INTEL_PLANE_CAP_TILING_4)
 #define INTEL_PLANE_CAP_TILING_NONE	0
 
 static const struct intel_modifier_desc intel_modifiers[] = {
 	{
+		.modifier = I915_FORMAT_MOD_4_TILED,
+		.display_ver = { 13, 13 },
+		.plane_caps = INTEL_PLANE_CAP_TILING_4,
+	}, {
 		.modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS,
 		.display_ver = { 12, 13 },
 		.plane_caps = INTEL_PLANE_CAP_TILING_Y | INTEL_PLANE_CAP_CCS_MC,
@@ -545,6 +550,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_4_TILED:
+		/*
+		 * Each 4K tile consists of 64B(8*8) subtiles, with
+		 * same shape as Y Tile(i.e 4*16B OWords)
+		 */
+		return 128;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 		if (intel_fb_is_ccs_aux_plane(fb, color_plane))
 			return 128;
@@ -650,6 +661,7 @@ static unsigned int intel_fb_modifier_to_tiling(u64 fb_modifier)
 		return I915_TILING_Y;
 	case INTEL_PLANE_CAP_TILING_X:
 		return I915_TILING_X;
+	case INTEL_PLANE_CAP_TILING_4:
 	case INTEL_PLANE_CAP_TILING_Yf:
 	case INTEL_PLANE_CAP_TILING_NONE:
 		return I915_TILING_NONE;
@@ -737,6 +749,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Yf_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
 	default:
diff --git a/drivers/gpu/drm/i915/display/intel_fb.h b/drivers/gpu/drm/i915/display/intel_fb.h
index ba9df8986c1e..12386f13a4e0 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.h
+++ b/drivers/gpu/drm/i915/display/intel_fb.h
@@ -27,6 +27,7 @@ struct intel_plane_state;
 #define INTEL_PLANE_CAP_TILING_X	BIT(3)
 #define INTEL_PLANE_CAP_TILING_Y	BIT(4)
 #define INTEL_PLANE_CAP_TILING_Yf	BIT(5)
+#define INTEL_PLANE_CAP_TILING_4	BIT(6)
 
 bool intel_fb_is_ccs_modifier(u64 modifier);
 bool intel_fb_is_rc_ccs_cc_modifier(u64 modifier);
diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index bcdffe62f3cb..dccd9b7cde3f 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -946,6 +946,7 @@ static bool tiling_is_valid(const struct intel_plane_state *plane_state)
 	case I915_FORMAT_MOD_Y_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return DISPLAY_VER(i915) >= 9;
+	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_X_TILED:
 		return true;
 	default:
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index e4186a0b8edb..426a8bd30abc 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -126,6 +126,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	case DRM_FORMAT_MOD_LINEAR:
 	case I915_FORMAT_MOD_X_TILED:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 		break;
 	default:
 		drm_dbg(&dev_priv->drm,
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 1223075595ff..5299dfe68802 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -762,6 +762,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_X;
 	case I915_FORMAT_MOD_Y_TILED:
 		return PLANE_CTL_TILED_Y;
+	case I915_FORMAT_MOD_4_TILED:
+		return PLANE_CTL_TILED_4;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2011,9 +2013,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 	case DRM_FORMAT_Y216:
 	case DRM_FORMAT_XVYU12_16161616:
 	case DRM_FORMAT_XVYU16161616:
-		if (modifier == DRM_FORMAT_MOD_LINEAR ||
-		    modifier == I915_FORMAT_MOD_X_TILED ||
-		    modifier == I915_FORMAT_MOD_Y_TILED)
+		if (!intel_fb_is_ccs_modifier(modifier))
 			return true;
 		fallthrough;
 	default:
@@ -2106,6 +2106,8 @@ static u8 skl_get_plane_caps(struct drm_i915_private *i915,
 		caps |= INTEL_PLANE_CAP_TILING_Y;
 	if (DISPLAY_VER(i915) < 12)
 		caps |= INTEL_PLANE_CAP_TILING_Yf;
+	if (HAS_4TILE(i915))
+		caps |= INTEL_PLANE_CAP_TILING_4;
 
 	if (skl_plane_has_rc_ccs(i915, pipe, plane_id)) {
 		caps |= INTEL_PLANE_CAP_CCS_RC;
@@ -2278,6 +2280,7 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 	unsigned int aligned_height;
 	struct drm_framebuffer *fb;
 	struct intel_framebuffer *intel_fb;
+	static_assert(PLANE_CTL_TILED_YF == PLANE_CTL_TILED_4);
 
 	if (!plane->get_hw_state(plane, &pipe))
 		return;
@@ -2340,11 +2343,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		else
 			fb->modifier = I915_FORMAT_MOD_Y_TILED;
 		break;
-	case PLANE_CTL_TILED_YF:
-		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
-		else
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
+		if (HAS_4TILE(dev_priv)) {
+			fb->modifier = I915_FORMAT_MOD_4_TILED;
+		} else {
+			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
+			else
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+		}
 		break;
 	default:
 		MISSING_CASE(tiling);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 384977798c8e..3011c05b5b9c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1424,6 +1424,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
 
 #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
+#define HAS_4TILE(dev_priv)	(INTEL_INFO(dev_priv)->has_4tile)
 #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
 #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
 #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 3976482582b8..436aae34b1f1 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1045,6 +1045,7 @@ static const struct intel_device_info dg2_info = {
 	DGFX_FEATURES,
 	.graphics.rel = 55,
 	.media.rel = 55,
+	.has_4tile = 1,
 	PLATFORM(INTEL_DG2),
 	.has_guc_deprivilege = 1,
 	.has_64k_pages = 1,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 9b5423572fe9..14065164fdcf 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6450,6 +6450,7 @@ enum {
 #define   PLANE_CTL_TILED_X			REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 1)
 #define   PLANE_CTL_TILED_Y			REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 4)
 #define   PLANE_CTL_TILED_YF			REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 5)
+#define   PLANE_CTL_TILED_4                     REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 5)
 #define   PLANE_CTL_ASYNC_FLIP			REG_BIT(9)
 #define   PLANE_CTL_FLIP_HORIZONTAL		REG_BIT(8)
 #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	REG_BIT(4) /* TGL+ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index ef7c7c988b7b..081f9849797d 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -134,6 +134,7 @@ enum intel_ppgtt_type {
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_flat_ccs); \
+	func(has_4tile); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
 	func(has_guc_deprivilege); \
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 46b21680e601..e81791bf7d03 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5402,6 +5402,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
 	}
 
 	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
+		      modifier == I915_FORMAT_MOD_4_TILED ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED ||
 		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
-- 
2.20.1



* [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Radhakrishna Sripada, Anshuman Gupta, Lionel Landwerlin, Matthew Auld

From: Matt Roper <matthew.d.roper@intel.com>

DG2 unifies render compression and media compression into a single
format for the first time.  The programming and buffer layout is
supposed to match compression on older gen12 platforms, but the actual
compression algorithm is different from any previous platform; as such,
we need a new framebuffer modifier to represent buffers in this format,
but otherwise we can re-use the existing gen12 compression driver logic.
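
For reference, a minimal standalone sketch of how the new modifier values
are encoded. It restates fourcc_mod_code() and the Intel vendor id from
include/uapi/drm/drm_fourcc.h so that it builds outside the kernel tree;
codes 10 and 11 are the ones added by this patch. The helpers mod_vendor()
and mod_is_dg2_ccs() are hypothetical names used only for illustration and
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Restated from include/uapi/drm/drm_fourcc.h so this builds standalone. */
#define DRM_FORMAT_MOD_VENDOR_INTEL 0x01
#define fourcc_mod_code(vendor, val) \
	((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | \
	 ((val) & 0x00ffffffffffffffULL))

/* Values taken from this series; 10 and 11 are added by this patch. */
#define I915_FORMAT_MOD_4_TILED            fourcc_mod_code(INTEL, 9)
#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)

/* The two compression modifiers must stay distinguishable. */
static_assert(I915_FORMAT_MOD_4_TILED_DG2_RC_CCS !=
	      I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
	      "RC and MC modifiers must differ");

/* Hypothetical helper: the vendor id occupies the top 8 bits. */
static inline unsigned int mod_vendor(uint64_t modifier)
{
	return (unsigned int)(modifier >> 56);
}

/* Hypothetical helper: true for the two DG2 CCS modifiers added here. */
static inline int mod_is_dg2_ccs(uint64_t modifier)
{
	return modifier == I915_FORMAT_MOD_4_TILED_DG2_RC_CCS ||
	       modifier == I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
}
```

Userspace would normally take these values from drm_fourcc.h rather than
redefining them; the redefinition here only keeps the sketch self-contained.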

v2:
  Display version fix [Imre]

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/display/intel_fb.c       | 13 ++++++++++
 .../drm/i915/display/skl_universal_plane.c    | 26 ++++++++++++++++---
 include/uapi/drm/drm_fourcc.h                 | 22 ++++++++++++++++
 3 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 94c57facbb46..4d4d01963f15 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -141,6 +141,14 @@ struct intel_modifier_desc {
 
 static const struct intel_modifier_desc intel_modifiers[] = {
 	{
+		.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
+		.display_ver = { 13, 13 },
+		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
+	}, {
+		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
+		.display_ver = { 13, 13 },
+		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC,
+	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED,
 		.display_ver = { 13, 13 },
 		.plane_caps = INTEL_PLANE_CAP_TILING_4,
@@ -550,6 +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
 	case I915_FORMAT_MOD_4_TILED:
 		/*
 		 * Each 4K tile consists of 64B(8*8) subtiles, with
@@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
+	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
+		return 16 * 1024;
 	default:
 		MISSING_CASE(fb->modifier);
 		return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 5299dfe68802..c38ae0876c15 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_Y;
 	case I915_FORMAT_MOD_4_TILED:
 		return PLANE_CTL_TILED_4;
+	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+		return PLANE_CTL_TILED_4 |
+			PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+			PLANE_CTL_CLEAR_COLOR_DISABLE;
+	case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
+		return PLANE_CTL_TILED_4 |
+			PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
+			PLANE_CTL_CLEAR_COLOR_DISABLE;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct drm_i915_private *i915,
 	if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0))
 		return false;
 
+	/* Wa_14013215631 */
+	if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
+		return false;
+
 	return plane_id < PLANE_SPRITE4;
 }
 
@@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 	case PLANE_CTL_TILED_Y:
 		plane_config->tiling = I915_TILING_Y;
 		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
-			fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
-				I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
-				I915_FORMAT_MOD_Y_TILED_CCS;
+			if (DISPLAY_VER(dev_priv) >= 12)
+				fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
+			else
+				fb->modifier = I915_FORMAT_MOD_Y_TILED_CCS;
 		else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
 			fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS;
 		else
@@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		break;
 	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
 		if (HAS_4TILE(dev_priv)) {
-			fb->modifier = I915_FORMAT_MOD_4_TILED;
+			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
+			else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
+			else
+				fb->modifier = I915_FORMAT_MOD_4_TILED;
 		} else {
 			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
 				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index b73fe6797fc3..b8fb7b44c03c 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -583,6 +583,28 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
 
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * DG2 uses a new compression format for render compression. The general
+ * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
+ * but a new hashing/compression algorithm is used, so a fresh modifier must
+ * be associated with buffers of this type. Render compression uses 128 byte
+ * compression blocks.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * DG2 uses a new compression format for media compression. The general
+ * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
+ * but a new hashing/compression algorithm is used, so a fresh modifier must
+ * be associated with buffers of this type. Media compression uses 256 byte
+ * compression blocks.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
-- 
2.20.1



* [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Mika Kahola, Lionel Landwerlin, Matthew Auld, Anshuman Gupta

From: Mika Kahola <mika.kahola@intel.com>

DG2 clear color render compression uses Tile4 layout. Therefore, we need
to define a new format modifier for uAPI to support clear color rendering.
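
The readout side has to tell the three compressed Tile4 variants apart
from PLANE_CTL alone. A minimal standalone sketch of that decision order
follows, assuming the same precedence as the skl_get_initial_plane_config()
hunk in this patch. The CTL_* names, the enum and the function are
hypothetical stand-ins; bit 4 matches the PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE
definition visible in this series, while the other two bit positions are
placeholders chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 4 is from this series; bits 15 and 13 are placeholders. */
#define CTL_RENDER_DECOMPRESSION (1u << 15)
#define CTL_CLEAR_COLOR_DISABLE  (1u << 13)
#define CTL_MEDIA_DECOMPRESSION  (1u << 4)

/* Hypothetical stand-ins for the four Tile4 framebuffer modifiers. */
enum tile4_mod {
	TILE4_PLAIN,
	TILE4_RC_CCS,
	TILE4_MC_CCS,
	TILE4_RC_CCS_CC,
};

static inline enum tile4_mod tile4_mod_from_plane_ctl(uint32_t val)
{
	const uint32_t rc_mask = CTL_RENDER_DECOMPRESSION |
				 CTL_CLEAR_COLOR_DISABLE;

	/* Render compression with clear color disabled: plain RC CCS. */
	if ((val & rc_mask) == rc_mask)
		return TILE4_RC_CCS;
	/* Media compression is always programmed with clear color off. */
	if (val & CTL_MEDIA_DECOMPRESSION)
		return TILE4_MC_CCS;
	/* Render compression with clear color left enabled. */
	if (val & CTL_RENDER_DECOMPRESSION)
		return TILE4_RC_CCS_CC;
	return TILE4_PLAIN;
}
```

The full-mask comparison has to come first: testing the render
decompression bit alone would also match the RC_CCS_CC case.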

v2:
  Display version is fixed. [Imre]
  KDoc is enhanced for cc modifier. [Nanley & Lionel]

Signed-off-by: Mika Kahola <mika.kahola@intel.com>
cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
 drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
 include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 4d4d01963f15..3df6ef5ffec5 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = {
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
 		.display_ver = { 13, 13 },
 		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
+	}, {
+		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
+		.display_ver = { 13, 13 },
+		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
+
+		.ccs.cc_planes = BIT(1),
 	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
 		.display_ver = { 13, 13 },
@@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 		else
 			return 512;
 	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
 	case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
 	case I915_FORMAT_MOD_4_TILED:
 		/*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
 	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
 	case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
 		return 16 * 1024;
 	default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index c38ae0876c15..b4dced1907c5 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_4 |
 			PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
 			PLANE_CTL_CLEAR_COLOR_DISABLE;
+	case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
+		return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		break;
 	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
 		if (HAS_4TILE(dev_priv)) {
-			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+			u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+				      PLANE_CTL_CLEAR_COLOR_DISABLE;
+
+			if ((val & rc_mask) == rc_mask)
 				fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
 			else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
 				fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
+			else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
 			else
 				fb->modifier = I915_FORMAT_MOD_4_TILED;
 		} else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index b8fb7b44c03c..697614ea4b84 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -605,6 +605,16 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
 
+/*
+ * Intel color control surfaces (CCS) for DG2 clear color render compression.
+ *
+ * DG2 uses a unified compression format for clear color render compression.
+ * The general layout is a tiled layout using 4KB tiles, i.e. Tile4 layout.
+ *
+ * Fast clear color value expected by HW is located in fb at offset 0 of plane#1
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
-- 
2.20.1



* [PATCH v5 17/19] drm/i915/dg2: Flat CCS Support
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Lionel Landwerlin, Matthew Auld, Mika Kahola, Anshuman Gupta

From: Anshuman Gupta <anshuman.gupta@intel.com>

Discrete graphics from DG2 onwards support the new flat CCS mapping,
which brings in a display feature to avoid the AUX walk for compressed
surfaces. This support builds on top of the Flat CCS support added for
XEHPSDV. The flat CCS surface base address should be 64K aligned, and
compressed displayable surfaces must use the Tile4 format.
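
To illustrate the 64K base-address requirement stated above, here is a
minimal standalone sketch of an alignment check and round-up. The macro
and both helper names are hypothetical and do not exist in the driver;
only the 64K figure comes from this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64K, per the requirement in the commit message (name is hypothetical). */
#define FLAT_CCS_BASE_ALIGN (64u * 1024u)

/* The mask arithmetic below only works for power-of-two alignments. */
static_assert((FLAT_CCS_BASE_ALIGN & (FLAT_CCS_BASE_ALIGN - 1)) == 0,
	      "alignment must be a power of two");

/* Hypothetical helper: true if base sits on a 64K boundary. */
static inline bool flat_ccs_base_is_aligned(uint64_t base)
{
	return (base & (FLAT_CCS_BASE_ALIGN - 1)) == 0;
}

/* Hypothetical helper: round base up to the next 64K boundary. */
static inline uint64_t flat_ccs_align_up(uint64_t base)
{
	const uint64_t mask = FLAT_CCS_BASE_ALIGN - 1;

	return (base + mask) & ~mask;
}
```

In the kernel itself the generic IS_ALIGNED() and ALIGN() macros would be
the natural choice; they are spelled out here only so the sketch compiles
on its own.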

HAS: 1407880786
B.Spec : 7655
B.Spec : 53902

Cc: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  4 ++-
 drivers/gpu/drm/i915/display/intel_fb.c       | 32 +++++++++++++------
 .../drm/i915/display/skl_universal_plane.c    | 16 ++++++----
 3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 189767cef356..2828ae612179 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8588,7 +8588,9 @@ static void intel_atomic_prepare_plane_clear_colors(struct intel_atomic_state *s
 
 		/*
 		 * The layout of the fast clear color value expected by HW
-		 * (the DRM ABI requiring this value to be located in fb at offset 0 of plane#2):
+		 * (the DRM ABI requiring this value to be located in fb at
+		 * offset 0 of the cc plane: plane #2 on previous generations,
+		 * plane #1 for flat CCS):
 		 * - 4 x 4 bytes per-channel value
 		 *   (in surface type specific float/int format provided by the fb user)
 		 * - 8 bytes native color value used by the display
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 3df6ef5ffec5..e94923e9dbb1 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -107,6 +107,21 @@ static const struct drm_format_info gen12_ccs_cc_formats[] = {
 	  .hsub = 1, .vsub = 1, .has_alpha = true },
 };
 
+static const struct drm_format_info gen12_flat_ccs_cc_formats[] = {
+	{ .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, },
+	{ .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, },
+	{ .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, .has_alpha = true },
+	{ .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, .has_alpha = true },
+};
+
 struct intel_modifier_desc {
 	u64 modifier;
 	struct {
@@ -150,6 +165,8 @@ static const struct intel_modifier_desc intel_modifiers[] = {
 		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
 
 		.ccs.cc_planes = BIT(1),
+
+		FORMAT_OVERRIDE(gen12_flat_ccs_cc_formats),
 	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
 		.display_ver = { 13, 13 },
@@ -399,17 +416,13 @@ bool intel_fb_plane_supports_modifier(struct intel_plane *plane, u64 modifier)
 static bool format_is_yuv_semiplanar(const struct intel_modifier_desc *md,
 				     const struct drm_format_info *info)
 {
-	int yuv_planes;
-
 	if (!info->is_yuv)
 		return false;
 
-	if (plane_caps_contain_any(md->plane_caps, INTEL_PLANE_CAP_CCS_MASK))
-		yuv_planes = 4;
+	if (hweight8(md->ccs.planar_aux_planes) == 2)
+		return info->num_planes == 4;
 	else
-		yuv_planes = 2;
-
-	return info->num_planes == yuv_planes;
+		return info->num_planes == 2;
 }
 
 /**
@@ -534,12 +547,13 @@ static unsigned int gen12_ccs_aux_stride(struct intel_framebuffer *fb, int ccs_p
 
 int skl_main_to_aux_plane(const struct drm_framebuffer *fb, int main_plane)
 {
+	const struct intel_modifier_desc *md = lookup_modifier(fb->modifier);
 	struct drm_i915_private *i915 = to_i915(fb->dev);
 
-	if (intel_fb_is_ccs_modifier(fb->modifier))
+	if (md->ccs.packed_aux_planes | md->ccs.planar_aux_planes)
 		return main_to_ccs_plane(fb, main_plane);
 	else if (DISPLAY_VER(i915) < 11 &&
-		 intel_format_info_is_yuv_semiplanar(fb->format, fb->modifier))
+		 format_is_yuv_semiplanar(md, fb->format))
 		return 1;
 	else
 		return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index b4dced1907c5..18e50583abaa 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -1176,8 +1176,10 @@ skl_plane_update_arm(struct intel_plane *plane,
 	intel_de_write_fw(dev_priv, PLANE_OFFSET(pipe, plane_id),
 			  PLANE_OFFSET_Y(y) | PLANE_OFFSET_X(x));
 
-	intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
-			  skl_plane_aux_dist(plane_state, color_plane));
+	/* FLAT CCS doesn't need to program AUX_DIST */
+	if (!HAS_FLAT_CCS(dev_priv))
+		intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
+				  skl_plane_aux_dist(plane_state, color_plane));
 
 	if (DISPLAY_VER(dev_priv) < 11)
 		intel_de_write_fw(dev_priv, PLANE_AUX_OFFSET(pipe, plane_id),
@@ -1557,9 +1559,10 @@ static int skl_check_main_surface(struct intel_plane_state *plane_state)
 
 	/*
 	 * CCS AUX surface doesn't have its own x/y offsets, we must make sure
-	 * they match with the main surface x/y offsets.
+	 * they match with the main surface x/y offsets. On DG2
+	 * there's no aux plane on the fb, so skip this check.
 	 */
-	if (intel_fb_is_ccs_modifier(fb->modifier)) {
+	if (intel_fb_is_ccs_modifier(fb->modifier) && aux_plane) {
 		while (!skl_check_main_ccs_coordinates(plane_state, x, y,
 						       offset, aux_plane)) {
 			if (offset == 0)
@@ -1603,6 +1606,8 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state)
 	const struct drm_framebuffer *fb = plane_state->hw.fb;
 	unsigned int rotation = plane_state->hw.rotation;
 	int uv_plane = 1;
+	int ccs_plane = intel_fb_is_ccs_modifier(fb->modifier) ?
+			skl_main_to_aux_plane(fb, uv_plane) : 0;
 	int max_width = intel_plane_max_width(plane, fb, uv_plane, rotation);
 	int max_height = intel_plane_max_height(plane, fb, uv_plane, rotation);
 	int x = plane_state->uapi.src.x1 >> 17;
@@ -1623,8 +1628,7 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state)
 	offset = intel_plane_compute_aligned_offset(&x, &y,
 						    plane_state, uv_plane);
 
-	if (intel_fb_is_ccs_modifier(fb->modifier)) {
-		int ccs_plane = main_to_ccs_plane(fb, uv_plane);
+	if (ccs_plane) {
 		u32 aux_offset = plane_state->view.color_plane[ccs_plane].offset;
 		u32 alignment = intel_surf_alignment(fb, uv_plane);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Intel-gfx] [PATCH v5 17/19] drm/i915/dg2: Flat CCS Support
@ 2022-02-01 10:41   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Matthew Auld

From: Anshuman Gupta <anshuman.gupta@intel.com>

From DG2 onwards, discrete graphics supports the new flat CCS mapping,
which lets the display avoid the AUX walk for compressed surfaces.
This builds on the Flat CCS support added for XEHPSDV.
The Flat CCS surface base address should be 64K aligned, and
compressed displayable surfaces must use the Tile4 format.

HAS: 1407880786
B.Spec : 7655
B.Spec : 53902

Cc: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
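A minimal userspace sketch of the plane-count rule that the reworked
format_is_yuv_semiplanar() in this patch applies, assuming the aux-plane
bitmask fits in 8 bits. popcount8() is a portable stand-in for the kernel's
hweight8(), and is_yuv_semiplanar_sketch() and its flattened parameters are
hypothetical names used only for illustration, not i915 code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for the kernel's hweight8(): number of set bits. */
static unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none are left. */
	for (; v; v &= (uint8_t)(v - 1))
		n++;
	return n;
}

/*
 * Hypothetical mirror of the check in the patch: a semiplanar YUV fb has
 * two main planes, plus two aux planes only when the modifier declares
 * two planar aux planes.
 */
static bool is_yuv_semiplanar_sketch(uint8_t planar_aux_planes,
				     bool is_yuv, unsigned int num_planes)
{
	if (!is_yuv)
		return false;

	if (popcount8(planar_aux_planes) == 2)
		return num_planes == 4;

	return num_planes == 2;
}
```

With no planar aux planes in the modifier description (the flat CCS case),
a two-plane YUV format is still classified as semiplanar, which is the
behaviour the old INTEL_PLANE_CAP_CCS_MASK test could not express.
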
 drivers/gpu/drm/i915/display/intel_display.c  |  4 ++-
 drivers/gpu/drm/i915/display/intel_fb.c       | 32 +++++++++++++------
 .../drm/i915/display/skl_universal_plane.c    | 16 ++++++----
 3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 189767cef356..2828ae612179 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8588,7 +8588,9 @@ static void intel_atomic_prepare_plane_clear_colors(struct intel_atomic_state *s
 
 		/*
 		 * The layout of the fast clear color value expected by HW
-		 * (the DRM ABI requiring this value to be located in fb at offset 0 of plane#2):
+		 * (the DRM ABI requiring this value to be located in fb at
+		 * offset 0 of cc plane: plane #2 on previous generations,
+		 * plane #1 for flat ccs):
 		 * - 4 x 4 bytes per-channel value
 		 *   (in surface type specific float/int format provided by the fb user)
 		 * - 8 bytes native color value used by the display
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 3df6ef5ffec5..e94923e9dbb1 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -107,6 +107,21 @@ static const struct drm_format_info gen12_ccs_cc_formats[] = {
 	  .hsub = 1, .vsub = 1, .has_alpha = true },
 };
 
+static const struct drm_format_info gen12_flat_ccs_cc_formats[] = {
+	{ .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, },
+	{ .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, },
+	{ .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, .has_alpha = true },
+	{ .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2,
+	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
+	  .hsub = 1, .vsub = 1, .has_alpha = true },
+};
+
 struct intel_modifier_desc {
 	u64 modifier;
 	struct {
@@ -150,6 +165,8 @@ static const struct intel_modifier_desc intel_modifiers[] = {
 		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
 
 		.ccs.cc_planes = BIT(1),
+
+		FORMAT_OVERRIDE(gen12_flat_ccs_cc_formats),
 	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
 		.display_ver = { 13, 13 },
@@ -399,17 +416,13 @@ bool intel_fb_plane_supports_modifier(struct intel_plane *plane, u64 modifier)
 static bool format_is_yuv_semiplanar(const struct intel_modifier_desc *md,
 				     const struct drm_format_info *info)
 {
-	int yuv_planes;
-
 	if (!info->is_yuv)
 		return false;
 
-	if (plane_caps_contain_any(md->plane_caps, INTEL_PLANE_CAP_CCS_MASK))
-		yuv_planes = 4;
+	if (hweight8(md->ccs.planar_aux_planes) == 2)
+		return info->num_planes == 4;
 	else
-		yuv_planes = 2;
-
-	return info->num_planes == yuv_planes;
+		return info->num_planes == 2;
 }
 
 /**
@@ -534,12 +547,13 @@ static unsigned int gen12_ccs_aux_stride(struct intel_framebuffer *fb, int ccs_p
 
 int skl_main_to_aux_plane(const struct drm_framebuffer *fb, int main_plane)
 {
+	const struct intel_modifier_desc *md = lookup_modifier(fb->modifier);
 	struct drm_i915_private *i915 = to_i915(fb->dev);
 
-	if (intel_fb_is_ccs_modifier(fb->modifier))
+	if (md->ccs.packed_aux_planes | md->ccs.planar_aux_planes)
 		return main_to_ccs_plane(fb, main_plane);
 	else if (DISPLAY_VER(i915) < 11 &&
-		 intel_format_info_is_yuv_semiplanar(fb->format, fb->modifier))
+		 format_is_yuv_semiplanar(md, fb->format))
 		return 1;
 	else
 		return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index b4dced1907c5..18e50583abaa 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -1176,8 +1176,10 @@ skl_plane_update_arm(struct intel_plane *plane,
 	intel_de_write_fw(dev_priv, PLANE_OFFSET(pipe, plane_id),
 			  PLANE_OFFSET_Y(y) | PLANE_OFFSET_X(x));
 
-	intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
-			  skl_plane_aux_dist(plane_state, color_plane));
+	/* FLAT CCS doesn't need to program AUX_DIST */
+	if (!HAS_FLAT_CCS(dev_priv))
+		intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
+				  skl_plane_aux_dist(plane_state, color_plane));
 
 	if (DISPLAY_VER(dev_priv) < 11)
 		intel_de_write_fw(dev_priv, PLANE_AUX_OFFSET(pipe, plane_id),
@@ -1557,9 +1559,10 @@ static int skl_check_main_surface(struct intel_plane_state *plane_state)
 
 	/*
 	 * CCS AUX surface doesn't have its own x/y offsets, we must make sure
-	 * they match with the main surface x/y offsets.
+	 * they match with the main surface x/y offsets. On DG2
+	 * there's no aux plane on the fb, so skip this check.
 	 */
-	if (intel_fb_is_ccs_modifier(fb->modifier)) {
+	if (intel_fb_is_ccs_modifier(fb->modifier) && aux_plane) {
 		while (!skl_check_main_ccs_coordinates(plane_state, x, y,
 						       offset, aux_plane)) {
 			if (offset == 0)
@@ -1603,6 +1606,8 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state)
 	const struct drm_framebuffer *fb = plane_state->hw.fb;
 	unsigned int rotation = plane_state->hw.rotation;
 	int uv_plane = 1;
+	int ccs_plane = intel_fb_is_ccs_modifier(fb->modifier) ?
+			skl_main_to_aux_plane(fb, uv_plane) : 0;
 	int max_width = intel_plane_max_width(plane, fb, uv_plane, rotation);
 	int max_height = intel_plane_max_height(plane, fb, uv_plane, rotation);
 	int x = plane_state->uapi.src.x1 >> 17;
@@ -1623,8 +1628,7 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state)
 	offset = intel_plane_compute_aligned_offset(&x, &y,
 						    plane_state, uv_plane);
 
-	if (intel_fb_is_ccs_modifier(fb->modifier)) {
-		int ccs_plane = main_to_ccs_plane(fb, uv_plane);
+	if (ccs_plane) {
 		u32 aux_offset = plane_state->view.color_plane[ccs_plane].offset;
 		u32 alignment = intel_surf_alignment(fb, uv_plane);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 18/19] drm/i915/Flat-CCS: Document on Flat-CCS memory compression
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tony Ye, Jordan Justen, Kenneth Graunke, Lionel Landwerlin,
	Slawomir Milczarek, Matthew Auld, mesa-dev

Document the Flat-CCS feature, the kernel handling it requires, and the
modifiers used.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
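A minimal sketch of the sizing implied by the DOC comment in this patch,
assuming the 1/256 ratio applies uniformly to any lmem range. The helper
names flat_ccs_bytes() and flat_ccs_swapout_bytes() are hypothetical, and
rounding a partial 256-byte block up to a whole CCS byte is an assumption
of this sketch rather than something the patch states:

```c
#include <stdint.h>

/* Ratio stated in the DOC comment: CCS data is 1/256 of the lmem it covers. */
#define FLAT_CCS_RATIO 256u

/* Hypothetical helper (not an i915 function): CCS bytes for a range of lmem. */
static uint64_t flat_ccs_bytes(uint64_t lmem_bytes)
{
	/* Assumption: a partial 256-byte block still needs one CCS byte. */
	return lmem_bytes / FLAT_CCS_RATIO + (lmem_bytes % FLAT_CCS_RATIO != 0);
}

/*
 * Hypothetical helper: smem needed to swap out a compressed object "as is",
 * i.e. the raw lmem content plus the CCS data copied alongside it.
 */
static uint64_t flat_ccs_swapout_bytes(uint64_t obj_bytes)
{
	return obj_bytes + flat_ccs_bytes(obj_bytes);
}
```

For example, a 16 GiB lmem region would need 64 MiB reserved for CCS, and
swapping out a 64 KiB compressed object as is would need 64 KiB + 256 bytes
of smem under these assumptions.
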
 drivers/gpu/drm/i915/gt/intel_migrate.c | 47 +++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3e1cf224cdf0..5bdab0b3c735 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -596,6 +596,53 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when an lmem object is allocated.
+ * CCS data can be copied in and out of the CCS region through
+ * XY_CTRL_SURF_COPY_BLT. The CPU can't access the CCS data directly.
+ *
+ * When we exhaust the lmem, if the object's placements support smem, we can
+ * directly decompress the compressed lmem object into smem and start using it
+ * from smem itself.
+ *
+ * But when we need to swap out the compressed lmem object into a smem region
+ * even though the object's placements don't support smem, we copy the lmem
+ * content as is into the smem region along with the CCS data (using
+ * XY_CTRL_SURF_COPY_BLT). When the object is next referenced, the lmem content
+ * is swapped back in and the CCS data is restored (using
+ * XY_CTRL_SURF_COPY_BLT) at the corresponding location.
+ *
+ *
+ * Flat-CCS Modifiers for different compression formats
+ * ----------------------------------------------------
+ *
+ * I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS
+ * render compression formats. Though the general layout is the same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, a new hashing/compression algorithm is
+ * used. Render compression uses 128 byte compression blocks.
+ *
+ * I915_FORMAT_MOD_4_TILED_DG2_MC_CCS - used to indicate the buffers of Flat CCS
+ * media compression formats. Though the general layout is the same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, a new hashing/compression algorithm is
+ * used. Media compression uses 256 byte compression blocks.
+ *
+ * I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat
+ * CCS clear color render compression formats. Unified compression format for
+ * clear color render compression. The general layout is a tiled layout using
+ * 4KB tiles, i.e. the Tile4 layout.
+ */
+
 static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
 {
 	/* Mask the 3 LSB to use the PPGTT address space */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Intel-gfx] [PATCH v5 18/19] drm/i915/Flat-CCS: Document on Flat-CCS memory compression
@ 2022-02-01 10:41   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Simon Ser, Kenneth Graunke, Slawomir Milczarek, Pekka Paalanen,
	Matthew Auld, mesa-dev

Document the Flat-CCS feature, the kernel handling it requires, and the
modifiers used.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 47 +++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3e1cf224cdf0..5bdab0b3c735 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -596,6 +596,53 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when an lmem object is allocated.
+ * CCS data can be copied in and out of the CCS region through
+ * XY_CTRL_SURF_COPY_BLT. The CPU can't access the CCS data directly.
+ *
+ * When we exhaust the lmem, if the object's placements support smem, we can
+ * directly decompress the compressed lmem object into smem and start using it
+ * from smem itself.
+ *
+ * But when we need to swap out the compressed lmem object into a smem region
+ * even though the object's placements don't support smem, we copy the lmem
+ * content as is into the smem region along with the CCS data (using
+ * XY_CTRL_SURF_COPY_BLT). When the object is next referenced, the lmem content
+ * is swapped back in and the CCS data is restored (using
+ * XY_CTRL_SURF_COPY_BLT) at the corresponding location.
+ *
+ *
+ * Flat-CCS Modifiers for different compression formats
+ * ----------------------------------------------------
+ *
+ * I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS
+ * render compression formats. Though the general layout is the same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, a new hashing/compression algorithm is
+ * used. Render compression uses 128 byte compression blocks.
+ *
+ * I915_FORMAT_MOD_4_TILED_DG2_MC_CCS - used to indicate the buffers of Flat CCS
+ * media compression formats. Though the general layout is the same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, a new hashing/compression algorithm is
+ * used. Media compression uses 256 byte compression blocks.
+ *
+ * I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat
+ * CCS clear color render compression formats. Unified compression format for
+ * clear color render compression. The general layout is a tiled layout using
+ * 4KB tiles, i.e. the Tile4 layout.
+ */
+
 static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
 {
 	/* Mask the 3 LSB to use the PPGTT address space */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v5 19/19] Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:41   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Tony Ye, Jordan Justen, Daniel Vetter, Kenneth Graunke,
	Lionel Landwerlin, Slawomir Milczarek, Matthew Auld, mesa-dev

Add details of the Flat-CCS support being added as part of DG2 enabling
and its implicit impact on the uAPI.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 Documentation/gpu/rfc/i915_dg2.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
index f4eb5a219897..9d28b1812bc7 100644
--- a/Documentation/gpu/rfc/i915_dg2.rst
+++ b/Documentation/gpu/rfc/i915_dg2.rst
@@ -23,3 +23,10 @@ handling the 64k page size.
 
 .. kernel-doc:: include/uapi/drm/i915_drm.h
         :functions: drm_i915_gem_create_ext
+
+
+Flat CCS support for lmem
+=========================
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_migrate.c
+        :doc: Flat-CCS - Memory compression for Local memory
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Intel-gfx] [PATCH v5 19/19] Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
@ 2022-02-01 10:41   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-01 10:41 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, Kenneth Graunke, Slawomir Milczarek,
	Pekka Paalanen, Matthew Auld, Simon Ser, mesa-dev

Add details of the Flat-CCS support being added as part of DG2 enabling
and its implicit impact on the uAPI.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 Documentation/gpu/rfc/i915_dg2.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
index f4eb5a219897..9d28b1812bc7 100644
--- a/Documentation/gpu/rfc/i915_dg2.rst
+++ b/Documentation/gpu/rfc/i915_dg2.rst
@@ -23,3 +23,10 @@ handling the 64k page size.
 
 .. kernel-doc:: include/uapi/drm/i915_drm.h
         :functions: drm_i915_gem_create_ext
+
+
+Flat CCS support for lmem
+=========================
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_migrate.c
+        :doc: Flat-CCS - Memory compression for Local memory
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 07/19] drm/i915/migrate: add acceleration support for DG2
  2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
@ 2022-02-01 10:49     ` Matthew Auld
  -1 siblings, 0 replies; 80+ messages in thread
From: Matthew Auld @ 2022-02-01 10:49 UTC (permalink / raw)
  To: Ramalingam C, dri-devel, intel-gfx
  Cc: Thomas Hellström, Lionel Landwerlin

On 01/02/2022 10:41, Ramalingam C wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> This is all kinds of awkward since we now have to contend with using 64K
> GTT pages when mapping anything in LMEM(including the page-tables
> themselves).
> 
> v2: Rebased [Ram]
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>

This version seems to be missing your review feedback, which I 
incorporated here[1].

[1] https://patchwork.freedesktop.org/series/97544/
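
As a reading aid for the quoted patch below, here is a rough userspace
sketch of its address arithmetic, assuming CHUNK_SZ is 8 MiB as the "We copy
in 8MiB chunks" comment indicates. pick_windows() and pte_window_offset()
are hypothetical names; they only mirror the window selection in
intel_context_migrate_copy() and the page-table offset computed at the top
of emit_pte(), and leave out the per-engine "instance << 32" bias:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K   (64ull << 10)
#define SZ_2M    (2ull << 20)
#define CHUNK_SZ (8ull << 20)	/* assumption: one 8 MiB copy chunk */

struct copy_windows {
	uint64_t src;
	uint64_t dst;
};

/* Which VA window each side of the copy is mapped through. */
static struct copy_windows pick_windows(bool has_64k_pages,
					bool src_is_lmem, bool dst_is_lmem)
{
	struct copy_windows w = { 0, CHUNK_SZ };

	if (has_64k_pages) {
		/* Window 0 maps smem, windows 1 and 2 are reserved for lmem. */
		w.src = src_is_lmem ? CHUNK_SZ : 0;
		w.dst = dst_is_lmem ? 2 * CHUNK_SZ : 0;
	}
	return w;
}

/* Where the PTEs for a given window offset live inside the migrate VM. */
static uint64_t pte_window_offset(uint64_t offset, bool has_64k_pages)
{
	if (has_64k_pages) {
		/* One page table per 2 MiB of VA, spaced 64 KiB apart. */
		assert(offset % SZ_2M == 0);
		return offset / SZ_2M * SZ_64K + 3 * CHUNK_SZ;
	}

	/* One 8-byte PTE per 4 KiB page, placed after the two windows. */
	return (offset >> 12) * sizeof(uint64_t) + 2 * CHUNK_SZ;
}
```

Under these assumptions the lmem source window at CHUNK_SZ has its PTEs at
3 * CHUNK_SZ + 4 * 64K on a 64K-page platform, against 2 * CHUNK_SZ + 16K on
older ones.
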

> ---
>   drivers/gpu/drm/i915/gt/intel_migrate.c | 179 +++++++++++++++++++-----
>   1 file changed, 147 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 18b44af56969..cac791155244 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine)
>   	return true;
>   }
>   
> +static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
> +				struct i915_page_table *pt,
> +				void *data)
> +{
> +	struct insert_pte_data *d = data;
> +
> +	/*
> +	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
> +	 * we have a correctly setup PDE structure for later use.
> +	 */
> +	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
> +	GEM_BUG_ON(!pt->is_compact);
> +	d->offset += SZ_2M;
> +}
> +
> +static void xehpsdv_insert_pte(struct i915_address_space *vm,
> +			       struct i915_page_table *pt,
> +			       void *data)
> +{
> +	struct insert_pte_data *d = data;
> +
> +	/*
> +	 * We are playing tricks here, since the actual pt, from the hw
> +	 * pov, is only 256bytes with 32 entries, or 4096bytes with 512
> +	 * entries, but we are still guaranteed that the physical
> +	 * alignment is 64K underneath for the pt, and we are careful
> +	 * not to access the space in the void.
> +	 */
> +	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
> +	d->offset += SZ_64K;
> +}
> +
>   static void insert_pte(struct i915_address_space *vm,
>   		       struct i915_page_table *pt,
>   		       void *data)
> @@ -74,7 +106,12 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   	 * i.e. within the same non-preemptible window so that we do not switch
>   	 * to another migration context that overwrites the PTE.
>   	 *
> -	 * TODO: Add support for huge LMEM PTEs
> +	 * On platforms with HAS_64K_PAGES support we have three windows, and
> +	 * dedicate two windows just for mapping lmem pages(smem <-> smem is not
> +	 * a thing), since we are forced to use 64K GTT pages underneath which
> +	 * requires also modifying the PDE. An alternative might be to instead
> +	 * map the PD into the GTT, and then on the fly toggle the 4K/64K mode
> +	 * in the PDE from the same batch that also modifies the PTEs.
>   	 */
>   
>   	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
> @@ -86,6 +123,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   		goto err_vm;
>   	}
>   
> +	if (HAS_64K_PAGES(gt->i915))
> +		stash.pt_sz = I915_GTT_PAGE_SIZE_64K;
> +
>   	/*
>   	 * Each engine instance is assigned its own chunk in the VM, so
>   	 * that we can run multiple instances concurrently
> @@ -105,14 +145,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   		 * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
>   		 * 4x2 page directories for source/destination.
>   		 */
> -		sz = 2 * CHUNK_SZ;
> +		if (HAS_64K_PAGES(gt->i915))
> +			sz = 3 * CHUNK_SZ;
> +		else
> +			sz = 2 * CHUNK_SZ;
>   		d.offset = base + sz;
>   
>   		/*
>   		 * We need another page directory setup so that we can write
>   		 * the 8x512 PTE in each chunk.
>   		 */
> -		sz += (sz >> 12) * sizeof(u64);
> +		if (HAS_64K_PAGES(gt->i915))
> +			sz += (sz / SZ_2M) * SZ_64K;
> +		else
> +			sz += (sz >> 12) * sizeof(u64);
>   
>   		err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
>   		if (err)
> @@ -133,7 +179,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   			goto err_vm;
>   
>   		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
> -		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
> +		if (HAS_64K_PAGES(gt->i915)) {
> +			vm->vm.foreach(&vm->vm, base, d.offset - base,
> +				       xehpsdv_insert_pte, &d);
> +			d.offset = base + CHUNK_SZ;
> +			vm->vm.foreach(&vm->vm,
> +				       d.offset,
> +				       2 * CHUNK_SZ,
> +				       xehpsdv_toggle_pdes, &d);
> +		} else {
> +			vm->vm.foreach(&vm->vm, base, d.offset - base,
> +				       insert_pte, &d);
> +		}
>   	}
>   
>   	return &vm->vm;
> @@ -269,19 +326,38 @@ static int emit_pte(struct i915_request *rq,
>   		    u64 offset,
>   		    int length)
>   {
> +	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
>   	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
>   						       is_lmem ? PTE_LM : 0);
>   	struct intel_ring *ring = rq->ring;
> -	int total = 0;
> +	int pkt, dword_length;
> +	u32 total = 0;
> +	u32 page_size;
>   	u32 *hdr, *cs;
> -	int pkt;
>   
>   	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
>   
> +	page_size = I915_GTT_PAGE_SIZE;
> +	dword_length = 0x400;
> +
>   	/* Compute the page directory offset for the target address range */
> -	offset >>= 12;
> -	offset *= sizeof(u64);
> -	offset += 2 * CHUNK_SZ;
> +	if (has_64K_pages) {
> +		GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M));
> +
> +		offset /= SZ_2M;
> +		offset *= SZ_64K;
> +		offset += 3 * CHUNK_SZ;
> +
> +		if (is_lmem) {
> +			page_size = I915_GTT_PAGE_SIZE_64K;
> +			dword_length = 0x40;
> +		}
> +	} else {
> +		offset >>= 12;
> +		offset *= sizeof(u64);
> +		offset += 2 * CHUNK_SZ;
> +	}
> +
>   	offset += (u64)rq->engine->instance << 32;
>   
>   	cs = intel_ring_begin(rq, 6);
> @@ -289,7 +365,7 @@ static int emit_pte(struct i915_request *rq,
>   		return PTR_ERR(cs);
>   
>   	/* Pack as many PTE updates as possible into a single MI command */
> -	pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
> +	pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5);
>   	pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
>   
>   	hdr = cs;
> @@ -299,6 +375,8 @@ static int emit_pte(struct i915_request *rq,
>   
>   	do {
>   		if (cs - hdr >= pkt) {
> +			int dword_rem;
> +
>   			*hdr += cs - hdr - 2;
>   			*cs++ = MI_NOOP;
>   
> @@ -310,7 +388,18 @@ static int emit_pte(struct i915_request *rq,
>   			if (IS_ERR(cs))
>   				return PTR_ERR(cs);
>   
> -			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
> +			dword_rem = dword_length;
> +			if (has_64K_pages) {
> +				if (IS_ALIGNED(total, SZ_2M)) {
> +					offset = round_up(offset, SZ_64K);
> +				} else {
> +					dword_rem = SZ_2M - (total & (SZ_2M - 1));
> +					dword_rem /= page_size;
> +					dword_rem *= 2;
> +				}
> +			}
> +
> +			pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5);
>   			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
>   
>   			hdr = cs;
> @@ -319,13 +408,15 @@ static int emit_pte(struct i915_request *rq,
>   			*cs++ = upper_32_bits(offset);
>   		}
>   
> +		GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
> +
>   		*cs++ = lower_32_bits(encode | it->dma);
>   		*cs++ = upper_32_bits(encode | it->dma);
>   
>   		offset += 8;
> -		total += I915_GTT_PAGE_SIZE;
> +		total += page_size;
>   
> -		it->dma += I915_GTT_PAGE_SIZE;
> +		it->dma += page_size;
>   		if (it->dma >= it->max) {
>   			it->sg = __sg_next(it->sg);
>   			if (!it->sg || sg_dma_len(it->sg) == 0)
> @@ -356,7 +447,8 @@ static bool wa_1209644611_applies(int ver, u32 size)
>   	return height % 4 == 3 && height <= 8;
>   }
>   
> -static int emit_copy(struct i915_request *rq, int size)
> +static int emit_copy(struct i915_request *rq,
> +		     u32 dst_offset, u32 src_offset, int size)
>   {
>   	const int ver = GRAPHICS_VER(rq->engine->i915);
>   	u32 instance = rq->engine->instance;
> @@ -371,31 +463,31 @@ static int emit_copy(struct i915_request *rq, int size)
>   		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = CHUNK_SZ; /* dst offset */
> +		*cs++ = dst_offset;
>   		*cs++ = instance;
>   		*cs++ = 0;
>   		*cs++ = PAGE_SIZE;
> -		*cs++ = 0; /* src offset */
> +		*cs++ = src_offset;
>   		*cs++ = instance;
>   	} else if (ver >= 8) {
>   		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = CHUNK_SZ; /* dst offset */
> +		*cs++ = dst_offset;
>   		*cs++ = instance;
>   		*cs++ = 0;
>   		*cs++ = PAGE_SIZE;
> -		*cs++ = 0; /* src offset */
> +		*cs++ = src_offset;
>   		*cs++ = instance;
>   	} else {
>   		GEM_BUG_ON(instance);
>   		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
> -		*cs++ = CHUNK_SZ; /* dst offset */
> +		*cs++ = dst_offset;
>   		*cs++ = PAGE_SIZE;
> -		*cs++ = 0; /* src offset */
> +		*cs++ = src_offset;
>   	}
>   
>   	intel_ring_advance(rq, cs);
> @@ -423,6 +515,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   	GEM_BUG_ON(ce->ring->size < SZ_64K);
>   
>   	do {
> +		u32 src_offset, dst_offset;
>   		int len;
>   
>   		rq = i915_request_create(ce);
> @@ -450,15 +543,28 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
> -			       CHUNK_SZ);
> +		src_offset = 0;
> +		dst_offset = CHUNK_SZ;
> +		if (HAS_64K_PAGES(ce->engine->i915)) {
> +			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
> +
> +			src_offset = 0;
> +			dst_offset = 0;
> +			if (src_is_lmem)
> +				src_offset = CHUNK_SZ;
> +			if (dst_is_lmem)
> +				dst_offset = 2 * CHUNK_SZ;
> +		}
> +
> +		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
> +			       src_offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
>   		}
>   
>   		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
> -			       CHUNK_SZ, len);
> +			       dst_offset, len);
>   		if (err < 0)
>   			goto out_rq;
>   		if (err < len) {
> @@ -470,7 +576,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		err = emit_copy(rq, len);
> +		err = emit_copy(rq, dst_offset, src_offset, len);
>   
>   		/* Arbitration is re-enabled between requests. */
>   out_rq:
> @@ -488,14 +594,18 @@ intel_context_migrate_copy(struct intel_context *ce,
>   	return err;
>   }
>   
> -static int emit_clear(struct i915_request *rq, int size, u32 value)
> +static int emit_clear(struct i915_request *rq,
> +		      u64 offset,
> +		      int size,
> +		      u32 value)
>   {
>   	const int ver = GRAPHICS_VER(rq->engine->i915);
> -	u32 instance = rq->engine->instance;
>   	u32 *cs;
>   
>   	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
>   
> +	offset += (u64)rq->engine->instance << 32;
> +
>   	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
> @@ -505,17 +615,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = 0; /* offset */
> -		*cs++ = instance;
> +		*cs++ = lower_32_bits(offset);
> +		*cs++ = upper_32_bits(offset);
>   		*cs++ = value;
>   		*cs++ = MI_NOOP;
>   	} else {
> -		GEM_BUG_ON(instance);
> +		GEM_BUG_ON(upper_32_bits(offset));
>   		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = 0;
> +		*cs++ = lower_32_bits(offset);
>   		*cs++ = value;
>   	}
>   
> @@ -542,6 +652,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>   	GEM_BUG_ON(ce->ring->size < SZ_64K);
>   
>   	do {
> +		u32 offset;
>   		int len;
>   
>   		rq = i915_request_create(ce);
> @@ -569,7 +680,11 @@ intel_context_migrate_clear(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
> +		offset = 0;
> +		if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
> +			offset = CHUNK_SZ;
> +
> +		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
> @@ -579,7 +694,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		err = emit_clear(rq, len, value);
> +		err = emit_clear(rq, offset, len, value);
>   
>   		/* Arbitration is re-enabled between requests. */
>   out_rq:
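
The window selection that intel_context_migrate_copy() performs in the hunk above can be read in isolation as follows. This is only a minimal userspace sketch, not the driver code: `struct copy_windows` and `pick_copy_windows()` are hypothetical names introduced for illustration, and `CHUNK_SZ` is assumed to be 8MiB, following the "We copy in 8MiB chunks" comment in migrate_vm().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CHUNK_SZ is 8MiB, per the "8MiB chunks" comment in migrate_vm(). */
#define CHUNK_SZ (8u << 20)

/* Hypothetical helper type: virtual offsets of the source and destination windows. */
struct copy_windows {
	uint32_t src_offset;
	uint32_t dst_offset;
};

/*
 * Mirrors the offset selection in the patch: without 64K pages the source
 * uses window 0 and the destination window 1. With 64K pages, smem stays
 * in window 0 (4K PTEs), while lmem uses window 1 (as source) or window 2
 * (as destination), both of which are mapped with 64K PTEs.
 */
static struct copy_windows
pick_copy_windows(bool has_64k_pages, bool src_is_lmem, bool dst_is_lmem)
{
	struct copy_windows w = { .src_offset = 0, .dst_offset = CHUNK_SZ };

	if (has_64k_pages) {
		/* smem <-> smem is not expected here (GEM_BUG_ON in the patch). */
		assert(src_is_lmem || dst_is_lmem);

		w.src_offset = src_is_lmem ? CHUNK_SZ : 0;
		w.dst_offset = dst_is_lmem ? 2 * CHUNK_SZ : 0;
	}

	return w;
}
```

Under these assumptions an lmem -> lmem copy on a 64K-page platform uses windows 1 and 2, while a copy with smem on one side always places the smem half in window 0, so a 4K-PTE mapping never shares a window with a 64K-PTE mapping.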


* Re: [Intel-gfx] [PATCH v5 07/19] drm/i915/migrate: add acceleration support for DG2
@ 2022-02-01 10:49     ` Matthew Auld
From: Matthew Auld @ 2022-02-01 10:49 UTC (permalink / raw)
  To: Ramalingam C, dri-devel, intel-gfx; +Cc: Thomas Hellström

On 01/02/2022 10:41, Ramalingam C wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> This is all kinds of awkward since we now have to contend with using 64K
> GTT pages when mapping anything in LMEM(including the page-tables
> themselves).
> 
> v2: Rebased [Ram]
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>

This version seems to be missing your review feedback, which I
incorporated here [1].

[1] https://patchwork.freedesktop.org/series/97544/

> ---
>   drivers/gpu/drm/i915/gt/intel_migrate.c | 179 +++++++++++++++++++-----
>   1 file changed, 147 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 18b44af56969..cac791155244 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine)
>   	return true;
>   }
>   
> +static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
> +				struct i915_page_table *pt,
> +				void *data)
> +{
> +	struct insert_pte_data *d = data;
> +
> +	/*
> +	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
> +	 * we have a correctly setup PDE structure for later use.
> +	 */
> +	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
> +	GEM_BUG_ON(!pt->is_compact);
> +	d->offset += SZ_2M;
> +}
> +
> +static void xehpsdv_insert_pte(struct i915_address_space *vm,
> +			       struct i915_page_table *pt,
> +			       void *data)
> +{
> +	struct insert_pte_data *d = data;
> +
> +	/*
> +	 * We are playing tricks here, since the actual pt, from the hw
> +	 * pov, is only 256bytes with 32 entries, or 4096bytes with 512
> +	 * entries, but we are still guaranteed that the physical
> +	 * alignment is 64K underneath for the pt, and we are careful
> +	 * not to access the space in the void.
> +	 */
> +	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
> +	d->offset += SZ_64K;
> +}
> +
>   static void insert_pte(struct i915_address_space *vm,
>   		       struct i915_page_table *pt,
>   		       void *data)
> @@ -74,7 +106,12 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   	 * i.e. within the same non-preemptible window so that we do not switch
>   	 * to another migration context that overwrites the PTE.
>   	 *
> -	 * TODO: Add support for huge LMEM PTEs
> +	 * On platforms with HAS_64K_PAGES support we have three windows, and
> +	 * dedicate two windows just for mapping lmem pages(smem <-> smem is not
> +	 * a thing), since we are forced to use 64K GTT pages underneath which
> +	 * requires also modifying the PDE. An alternative might be to instead
> +	 * map the PD into the GTT, and then on the fly toggle the 4K/64K mode
> +	 * in the PDE from the same batch that also modifies the PTEs.
>   	 */
>   
>   	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
> @@ -86,6 +123,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   		goto err_vm;
>   	}
>   
> +	if (HAS_64K_PAGES(gt->i915))
> +		stash.pt_sz = I915_GTT_PAGE_SIZE_64K;
> +
>   	/*
>   	 * Each engine instance is assigned its own chunk in the VM, so
>   	 * that we can run multiple instances concurrently
> @@ -105,14 +145,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   		 * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
>   		 * 4x2 page directories for source/destination.
>   		 */
> -		sz = 2 * CHUNK_SZ;
> +		if (HAS_64K_PAGES(gt->i915))
> +			sz = 3 * CHUNK_SZ;
> +		else
> +			sz = 2 * CHUNK_SZ;
>   		d.offset = base + sz;
>   
>   		/*
>   		 * We need another page directory setup so that we can write
>   		 * the 8x512 PTE in each chunk.
>   		 */
> -		sz += (sz >> 12) * sizeof(u64);
> +		if (HAS_64K_PAGES(gt->i915))
> +			sz += (sz / SZ_2M) * SZ_64K;
> +		else
> +			sz += (sz >> 12) * sizeof(u64);
>   
>   		err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
>   		if (err)
> @@ -133,7 +179,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
>   			goto err_vm;
>   
>   		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
> -		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
> +		if (HAS_64K_PAGES(gt->i915)) {
> +			vm->vm.foreach(&vm->vm, base, d.offset - base,
> +				       xehpsdv_insert_pte, &d);
> +			d.offset = base + CHUNK_SZ;
> +			vm->vm.foreach(&vm->vm,
> +				       d.offset,
> +				       2 * CHUNK_SZ,
> +				       xehpsdv_toggle_pdes, &d);
> +		} else {
> +			vm->vm.foreach(&vm->vm, base, d.offset - base,
> +				       insert_pte, &d);
> +		}
>   	}
>   
>   	return &vm->vm;
> @@ -269,19 +326,38 @@ static int emit_pte(struct i915_request *rq,
>   		    u64 offset,
>   		    int length)
>   {
> +	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
>   	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
>   						       is_lmem ? PTE_LM : 0);
>   	struct intel_ring *ring = rq->ring;
> -	int total = 0;
> +	int pkt, dword_length;
> +	u32 total = 0;
> +	u32 page_size;
>   	u32 *hdr, *cs;
> -	int pkt;
>   
>   	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
>   
> +	page_size = I915_GTT_PAGE_SIZE;
> +	dword_length = 0x400;
> +
>   	/* Compute the page directory offset for the target address range */
> -	offset >>= 12;
> -	offset *= sizeof(u64);
> -	offset += 2 * CHUNK_SZ;
> +	if (has_64K_pages) {
> +		GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M));
> +
> +		offset /= SZ_2M;
> +		offset *= SZ_64K;
> +		offset += 3 * CHUNK_SZ;
> +
> +		if (is_lmem) {
> +			page_size = I915_GTT_PAGE_SIZE_64K;
> +			dword_length = 0x40;
> +		}
> +	} else {
> +		offset >>= 12;
> +		offset *= sizeof(u64);
> +		offset += 2 * CHUNK_SZ;
> +	}
> +
>   	offset += (u64)rq->engine->instance << 32;
>   
>   	cs = intel_ring_begin(rq, 6);
> @@ -289,7 +365,7 @@ static int emit_pte(struct i915_request *rq,
>   		return PTR_ERR(cs);
>   
>   	/* Pack as many PTE updates as possible into a single MI command */
> -	pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
> +	pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5);
>   	pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
>   
>   	hdr = cs;
> @@ -299,6 +375,8 @@ static int emit_pte(struct i915_request *rq,
>   
>   	do {
>   		if (cs - hdr >= pkt) {
> +			int dword_rem;
> +
>   			*hdr += cs - hdr - 2;
>   			*cs++ = MI_NOOP;
>   
> @@ -310,7 +388,18 @@ static int emit_pte(struct i915_request *rq,
>   			if (IS_ERR(cs))
>   				return PTR_ERR(cs);
>   
> -			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
> +			dword_rem = dword_length;
> +			if (has_64K_pages) {
> +				if (IS_ALIGNED(total, SZ_2M)) {
> +					offset = round_up(offset, SZ_64K);
> +				} else {
> +					dword_rem = SZ_2M - (total & (SZ_2M - 1));
> +					dword_rem /= page_size;
> +					dword_rem *= 2;
> +				}
> +			}
> +
> +			pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5);
>   			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
>   
>   			hdr = cs;
> @@ -319,13 +408,15 @@ static int emit_pte(struct i915_request *rq,
>   			*cs++ = upper_32_bits(offset);
>   		}
>   
> +		GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
> +
>   		*cs++ = lower_32_bits(encode | it->dma);
>   		*cs++ = upper_32_bits(encode | it->dma);
>   
>   		offset += 8;
> -		total += I915_GTT_PAGE_SIZE;
> +		total += page_size;
>   
> -		it->dma += I915_GTT_PAGE_SIZE;
> +		it->dma += page_size;
>   		if (it->dma >= it->max) {
>   			it->sg = __sg_next(it->sg);
>   			if (!it->sg || sg_dma_len(it->sg) == 0)
> @@ -356,7 +447,8 @@ static bool wa_1209644611_applies(int ver, u32 size)
>   	return height % 4 == 3 && height <= 8;
>   }
>   
> -static int emit_copy(struct i915_request *rq, int size)
> +static int emit_copy(struct i915_request *rq,
> +		     u32 dst_offset, u32 src_offset, int size)
>   {
>   	const int ver = GRAPHICS_VER(rq->engine->i915);
>   	u32 instance = rq->engine->instance;
> @@ -371,31 +463,31 @@ static int emit_copy(struct i915_request *rq, int size)
>   		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = CHUNK_SZ; /* dst offset */
> +		*cs++ = dst_offset;
>   		*cs++ = instance;
>   		*cs++ = 0;
>   		*cs++ = PAGE_SIZE;
> -		*cs++ = 0; /* src offset */
> +		*cs++ = src_offset;
>   		*cs++ = instance;
>   	} else if (ver >= 8) {
>   		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = CHUNK_SZ; /* dst offset */
> +		*cs++ = dst_offset;
>   		*cs++ = instance;
>   		*cs++ = 0;
>   		*cs++ = PAGE_SIZE;
> -		*cs++ = 0; /* src offset */
> +		*cs++ = src_offset;
>   		*cs++ = instance;
>   	} else {
>   		GEM_BUG_ON(instance);
>   		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
> -		*cs++ = CHUNK_SZ; /* dst offset */
> +		*cs++ = dst_offset;
>   		*cs++ = PAGE_SIZE;
> -		*cs++ = 0; /* src offset */
> +		*cs++ = src_offset;
>   	}
>   
>   	intel_ring_advance(rq, cs);
> @@ -423,6 +515,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   	GEM_BUG_ON(ce->ring->size < SZ_64K);
>   
>   	do {
> +		u32 src_offset, dst_offset;
>   		int len;
>   
>   		rq = i915_request_create(ce);
> @@ -450,15 +543,28 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
> -			       CHUNK_SZ);
> +		src_offset = 0;
> +		dst_offset = CHUNK_SZ;
> +		if (HAS_64K_PAGES(ce->engine->i915)) {
> +			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
> +
> +			src_offset = 0;
> +			dst_offset = 0;
> +			if (src_is_lmem)
> +				src_offset = CHUNK_SZ;
> +			if (dst_is_lmem)
> +				dst_offset = 2 * CHUNK_SZ;
> +		}
> +
> +		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
> +			       src_offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
>   		}
>   
>   		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
> -			       CHUNK_SZ, len);
> +			       dst_offset, len);
>   		if (err < 0)
>   			goto out_rq;
>   		if (err < len) {
> @@ -470,7 +576,7 @@ intel_context_migrate_copy(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		err = emit_copy(rq, len);
> +		err = emit_copy(rq, dst_offset, src_offset, len);
>   
>   		/* Arbitration is re-enabled between requests. */
>   out_rq:
> @@ -488,14 +594,18 @@ intel_context_migrate_copy(struct intel_context *ce,
>   	return err;
>   }
>   
> -static int emit_clear(struct i915_request *rq, int size, u32 value)
> +static int emit_clear(struct i915_request *rq,
> +		      u64 offset,
> +		      int size,
> +		      u32 value)
>   {
>   	const int ver = GRAPHICS_VER(rq->engine->i915);
> -	u32 instance = rq->engine->instance;
>   	u32 *cs;
>   
>   	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
>   
> +	offset += (u64)rq->engine->instance << 32;
> +
>   	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
> @@ -505,17 +615,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = 0; /* offset */
> -		*cs++ = instance;
> +		*cs++ = lower_32_bits(offset);
> +		*cs++ = upper_32_bits(offset);
>   		*cs++ = value;
>   		*cs++ = MI_NOOP;
>   	} else {
> -		GEM_BUG_ON(instance);
> +		GEM_BUG_ON(upper_32_bits(offset));
>   		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
>   		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
>   		*cs++ = 0;
>   		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> -		*cs++ = 0;
> +		*cs++ = lower_32_bits(offset);
>   		*cs++ = value;
>   	}
>   
> @@ -542,6 +652,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>   	GEM_BUG_ON(ce->ring->size < SZ_64K);
>   
>   	do {
> +		u32 offset;
>   		int len;
>   
>   		rq = i915_request_create(ce);
> @@ -569,7 +680,11 @@ intel_context_migrate_clear(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
> +		offset = 0;
> +		if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
> +			offset = CHUNK_SZ;
> +
> +		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
>   		if (len <= 0) {
>   			err = len;
>   			goto out_rq;
> @@ -579,7 +694,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>   		if (err)
>   			goto out_rq;
>   
> -		err = emit_clear(rq, len, value);
> +		err = emit_clear(rq, offset, len, value);
>   
>   		/* Arbitration is re-enabled between requests. */
>   out_rq:
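
The address arithmetic in migrate_vm() and emit_pte() quoted above can be checked with a small standalone sketch. It is an illustration under stated assumptions rather than the driver implementation: the three function names are hypothetical, `CHUNK_SZ` is assumed to be 8MiB, and the per-engine-instance shift (`instance << 32`) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K   (64ull << 10)
#define SZ_2M    (2ull << 20)
/* Assumption: CHUNK_SZ is 8MiB, per the "8MiB chunks" comment in migrate_vm(). */
#define CHUNK_SZ (8ull << 20)

/*
 * Virtual address at which the PTEs for window offset `offset` are written.
 * With 64K pages each 2M PDE range gets its own 64K-aligned page-table
 * slot, placed after the three data windows; otherwise each 4K page needs
 * one 8-byte PTE, placed after the two data windows.
 */
static uint64_t pte_write_offset(bool has_64k_pages, uint64_t offset)
{
	if (has_64k_pages) {
		assert(offset % SZ_2M == 0); /* GEM_BUG_ON(!IS_ALIGNED(...)) in the patch */
		return offset / SZ_2M * SZ_64K + 3 * CHUNK_SZ;
	}

	return (offset >> 12) * sizeof(uint64_t) + 2 * CHUNK_SZ;
}

/* Size reserved per engine instance: data windows plus page-table space. */
static uint64_t window_vm_size(bool has_64k_pages)
{
	uint64_t sz = (has_64k_pages ? 3 : 2) * CHUNK_SZ;

	if (has_64k_pages)
		sz += sz / SZ_2M * SZ_64K;
	else
		sz += (sz >> 12) * sizeof(uint64_t);

	return sz;
}

/*
 * Dwords that may still be emitted before crossing into the next 2M PDE
 * range: the remaining pages up to the boundary, two dwords per 64-bit PTE.
 */
static uint32_t dwords_until_2m_boundary(uint32_t total, uint32_t page_size)
{
	uint32_t rem = (uint32_t)(SZ_2M - (total & (SZ_2M - 1)));

	return rem / page_size * 2;
}
```

With these assumptions the page-table area starts at 24MiB on 64K-page platforms, and the slot for window 1 begins four 64K slots (one per 2M PDE range of window 0) after that.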


* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/dg2: Enabling 64k page size and flat ccs (rev5)
@ 2022-02-01 12:45 ` Patchwork
From: Patchwork @ 2022-02-01 12:45 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/dg2: Enabling 64k page size and flat ccs (rev5)
URL   : https://patchwork.freedesktop.org/series/95686/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
7f652d8d712c drm/i915: add needs_compact_pt flag
f93c6346dd31 drm/i915: enforce min GTT alignment for discrete cards
-:298: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#298: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:457:
+						if (offset < hole_start + aligned_size)

-:310: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#310: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:481:
+						if (offset + aligned_size > hole_end)

-:328: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#328: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:497:
+						if (offset < hole_start + aligned_size)

-:340: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#340: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:520:
+						if (offset + aligned_size > hole_end)

-:358: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#358: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:536:
+						if (offset < hole_start + aligned_size)

-:370: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#370: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:560:
+						if (offset + aligned_size > hole_end)

-:388: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#388: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:576:
+						if (offset < hole_start + aligned_size)

-:400: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#400: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:599:
+						if (offset + aligned_size > hole_end)

total: 0 errors, 8 warnings, 0 checks, 434 lines checked
7373e84047bf drm/i915: support 64K GTT pages for discrete cards
29fc5ce90221 drm/i915: add gtt misalignment test
08193c9fbdc7 drm/i915/gtt: allow overriding the pt alignment
441f25ff52f8 drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
a8710e64c4a8 drm/i915/migrate: add acceleration support for DG2
d4227abc52e4 drm/i915/uapi: document behaviour for DG2 64K support
d49ca1a2fc0d Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
-:24: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#24: 
new file mode 100644

-:29: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#29: FILE: Documentation/gpu/rfc/i915_dg2.rst:1:
+====================

total: 0 errors, 2 warnings, 0 checks, 34 lines checked
19bfdb83b514 drm/i915/xehpsdv: Add has_flat_ccs to device info
9dad7bbc9b86 drm/i915/lmem: Enable lmem for platforms with Flat CCS
90946c4b0096 drm/i915/gt: Clear compress metadata for Xe_HP platforms
035ac563c069 drm/i915: Introduce new Tile 4 format
-:9: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#9: 
granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape

total: 0 errors, 1 warnings, 0 checks, 17 lines checked
41874785a443 drm/i915/dg2: Tile 4 plane format support
-:13: WARNING:TYPO_SPELLING: 'assocating' may be misspelled - perhaps 'associating'?
#13: 
v2: - Moved Tile4 assocating struct for modifier/display to
                  ^^^^^^^^^^

total: 0 errors, 1 warnings, 0 checks, 159 lines checked
a4ae214f55c2 drm/i915/dg2: Add DG2 unified compression
a5ef8bfef402 uapi/drm/dg2: Introduce format modifier for DG2 clear color
c529f71d7954 drm/i915/dg2: Flat CCS Support
62e5cdbc9a5d drm/i915/Flat-CCS: Document on Flat-CCS memory compression
f0e7ebd7ff02 Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI




* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/dg2: Enabling 64k page size and flat ccs (rev5)
@ 2022-02-01 12:47 ` Patchwork
From: Patchwork @ 2022-02-01 12:47 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/dg2: Enabling 64k page size and flat ccs (rev5)
URL   : https://patchwork.freedesktop.org/series/95686/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/dg2: Enabling 64k page size and flat ccs (rev5)
@ 2022-02-01 13:15 ` Patchwork
From: Patchwork @ 2022-02-01 13:15 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx


== Series Details ==

Series: drm/i915/dg2: Enabling 64k page size and flat ccs (rev5)
URL   : https://patchwork.freedesktop.org/series/95686/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11172 -> Patchwork_22148
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22148 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22148, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/index.html

Participating hosts (44 -> 42)
------------------------------

  Additional (1): fi-pnv-d510 
  Missing    (3): fi-icl-u2 fi-bdw-samus fi-kbl-8809g 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22148:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gtt:
    - fi-bsw-kefka:       [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-bsw-kefka/igt@i915_selftest@live@gtt.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-kefka/igt@i915_selftest@live@gtt.html
    - fi-bwr-2160:        [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-bwr-2160/igt@i915_selftest@live@gtt.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bwr-2160/igt@i915_selftest@live@gtt.html
    - fi-skl-guc:         [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-skl-guc/igt@i915_selftest@live@gtt.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-skl-guc/igt@i915_selftest@live@gtt.html
    - fi-blb-e6850:       NOTRUN -> [DMESG-FAIL][7]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-blb-e6850/igt@i915_selftest@live@gtt.html
    - fi-kbl-7567u:       [PASS][8] -> [DMESG-FAIL][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-kbl-7567u/igt@i915_selftest@live@gtt.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-7567u/igt@i915_selftest@live@gtt.html
    - fi-glk-j4005:       [PASS][10] -> [DMESG-FAIL][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-glk-j4005/igt@i915_selftest@live@gtt.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-glk-j4005/igt@i915_selftest@live@gtt.html
    - fi-bsw-nick:        NOTRUN -> [INCOMPLETE][12]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-nick/igt@i915_selftest@live@gtt.html
    - fi-cfl-8109u:       [PASS][13] -> [DMESG-FAIL][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-cfl-8109u/igt@i915_selftest@live@gtt.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cfl-8109u/igt@i915_selftest@live@gtt.html
    - fi-snb-2520m:       [PASS][15] -> [DMESG-FAIL][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-snb-2520m/igt@i915_selftest@live@gtt.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-snb-2520m/igt@i915_selftest@live@gtt.html
    - fi-cfl-8700k:       [PASS][17] -> [DMESG-FAIL][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-cfl-8700k/igt@i915_selftest@live@gtt.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cfl-8700k/igt@i915_selftest@live@gtt.html
    - fi-cml-u2:          [PASS][19] -> [DMESG-FAIL][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-cml-u2/igt@i915_selftest@live@gtt.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cml-u2/igt@i915_selftest@live@gtt.html
    - fi-ilk-650:         [PASS][21] -> [DMESG-FAIL][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-ilk-650/igt@i915_selftest@live@gtt.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-ilk-650/igt@i915_selftest@live@gtt.html
    - fi-bsw-n3050:       [PASS][23] -> [DMESG-FAIL][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-bsw-n3050/igt@i915_selftest@live@gtt.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-n3050/igt@i915_selftest@live@gtt.html
    - fi-cfl-guc:         [PASS][25] -> [DMESG-FAIL][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-cfl-guc/igt@i915_selftest@live@gtt.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cfl-guc/igt@i915_selftest@live@gtt.html
    - fi-skl-6700k2:      [PASS][27] -> [DMESG-FAIL][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-skl-6700k2/igt@i915_selftest@live@gtt.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-skl-6700k2/igt@i915_selftest@live@gtt.html
    - fi-elk-e7500:       [PASS][29] -> [DMESG-FAIL][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-elk-e7500/igt@i915_selftest@live@gtt.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-elk-e7500/igt@i915_selftest@live@gtt.html
    - fi-kbl-soraka:      [PASS][31] -> [INCOMPLETE][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-kbl-soraka/igt@i915_selftest@live@gtt.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-soraka/igt@i915_selftest@live@gtt.html
    - fi-glk-dsi:         [PASS][33] -> [DMESG-FAIL][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-glk-dsi/igt@i915_selftest@live@gtt.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-glk-dsi/igt@i915_selftest@live@gtt.html
    - fi-ivb-3770:        [PASS][35] -> [DMESG-FAIL][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-ivb-3770/igt@i915_selftest@live@gtt.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-ivb-3770/igt@i915_selftest@live@gtt.html
    - bat-dg1-6:          [PASS][37] -> [DMESG-FAIL][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/bat-dg1-6/igt@i915_selftest@live@gtt.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/bat-dg1-6/igt@i915_selftest@live@gtt.html
    - fi-snb-2600:        [PASS][39] -> [DMESG-FAIL][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-snb-2600/igt@i915_selftest@live@gtt.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-snb-2600/igt@i915_selftest@live@gtt.html
    - fi-rkl-guc:         [PASS][41] -> [DMESG-FAIL][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-rkl-guc/igt@i915_selftest@live@gtt.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-rkl-guc/igt@i915_selftest@live@gtt.html
    - fi-kbl-x1275:       [PASS][43] -> [DMESG-FAIL][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-kbl-x1275/igt@i915_selftest@live@gtt.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-x1275/igt@i915_selftest@live@gtt.html
    - fi-kbl-7500u:       [PASS][45] -> [DMESG-FAIL][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-kbl-7500u/igt@i915_selftest@live@gtt.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-7500u/igt@i915_selftest@live@gtt.html
    - fi-rkl-11600:       [PASS][47] -> [DMESG-FAIL][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-rkl-11600/igt@i915_selftest@live@gtt.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-rkl-11600/igt@i915_selftest@live@gtt.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-hsw-4770:        [PASS][49] -> [INCOMPLETE][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@gtt:
    - {bat-jsl-2}:        [PASS][51] -> [DMESG-FAIL][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/bat-jsl-2/igt@i915_selftest@live@gtt.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/bat-jsl-2/igt@i915_selftest@live@gtt.html
    - {fi-ehl-2}:         [PASS][53] -> [DMESG-FAIL][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-ehl-2/igt@i915_selftest@live@gtt.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-ehl-2/igt@i915_selftest@live@gtt.html
    - {fi-tgl-dsi}:       [PASS][55] -> [INCOMPLETE][56]
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-tgl-dsi/igt@i915_selftest@live@gtt.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-tgl-dsi/igt@i915_selftest@live@gtt.html
    - {fi-jsl-1}:         [PASS][57] -> [DMESG-FAIL][58]
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-jsl-1/igt@i915_selftest@live@gtt.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-jsl-1/igt@i915_selftest@live@gtt.html
    - {bat-jsl-1}:        [PASS][59] -> [DMESG-FAIL][60]
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/bat-jsl-1/igt@i915_selftest@live@gtt.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/bat-jsl-1/igt@i915_selftest@live@gtt.html

  
Known issues
------------

  Here are the changes found in Patchwork_22148 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-skl-6700k2:      [PASS][61] -> [DMESG-FAIL][62] ([i915#2291] / [i915#541])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-skl-6700k2/igt@i915_selftest@live@gt_heartbeat.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-skl-6700k2/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@gtt:
    - bat-adlp-4:         [PASS][63] -> [DMESG-FAIL][64] ([i915#4955])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/bat-adlp-4/igt@i915_selftest@live@gtt.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/bat-adlp-4/igt@i915_selftest@live@gtt.html
    - fi-bxt-dsi:         [PASS][65] -> [DMESG-FAIL][66] ([i915#2927])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-bxt-dsi/igt@i915_selftest@live@gtt.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bxt-dsi/igt@i915_selftest@live@gtt.html
    - fi-pnv-d510:        NOTRUN -> [DMESG-FAIL][67] ([i915#2927])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-pnv-d510/igt@i915_selftest@live@gtt.html

  * igt@kms_chamelium@dp-crc-fast:
    - fi-bsw-nick:        NOTRUN -> [SKIP][68] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-nick/igt@kms_chamelium@dp-crc-fast.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [PASS][69] -> [DMESG-WARN][70] ([i915#4269])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  * igt@prime_vgem@basic-fence-flip:
    - fi-bsw-nick:        NOTRUN -> [SKIP][71] ([fdo#109271]) +44 similar issues
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-nick/igt@prime_vgem@basic-fence-flip.html

  * igt@prime_vgem@basic-userptr:
    - fi-pnv-d510:        NOTRUN -> [SKIP][72] ([fdo#109271]) +39 similar issues
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-pnv-d510/igt@prime_vgem@basic-userptr.html

  * igt@runner@aborted:
    - fi-rkl-11600:       NOTRUN -> [FAIL][73] ([i915#4312])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-rkl-11600/igt@runner@aborted.html
    - fi-snb-2600:        NOTRUN -> [FAIL][74] ([i915#4312])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-snb-2600/igt@runner@aborted.html
    - fi-ilk-650:         NOTRUN -> [FAIL][75] ([fdo#109271] / [i915#4312])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-ilk-650/igt@runner@aborted.html
    - fi-pnv-d510:        NOTRUN -> [FAIL][76] ([fdo#109271] / [i915#2403] / [i915#4312])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-pnv-d510/igt@runner@aborted.html
    - fi-kbl-x1275:       NOTRUN -> [FAIL][77] ([i915#1436] / [i915#4312])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-x1275/igt@runner@aborted.html
    - fi-bsw-kefka:       NOTRUN -> [FAIL][78] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-kefka/igt@runner@aborted.html
    - fi-cfl-8700k:       NOTRUN -> [FAIL][79] ([i915#4312])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cfl-8700k/igt@runner@aborted.html
    - fi-cfl-8109u:       NOTRUN -> [FAIL][80] ([i915#4312])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cfl-8109u/igt@runner@aborted.html
    - fi-glk-dsi:         NOTRUN -> [FAIL][81] ([i915#4312] / [k.org#202321])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-glk-dsi/igt@runner@aborted.html
    - fi-snb-2520m:       NOTRUN -> [FAIL][82] ([i915#4312])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-snb-2520m/igt@runner@aborted.html
    - fi-bwr-2160:        NOTRUN -> [FAIL][83] ([i915#4312])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bwr-2160/igt@runner@aborted.html
    - fi-kbl-soraka:      NOTRUN -> [FAIL][84] ([i915#1436] / [i915#4312])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-soraka/igt@runner@aborted.html
    - fi-kbl-7500u:       NOTRUN -> [FAIL][85] ([i915#1436] / [i915#4312])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-7500u/igt@runner@aborted.html
    - bat-adlp-4:         NOTRUN -> [FAIL][86] ([i915#4312])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/bat-adlp-4/igt@runner@aborted.html
    - fi-cml-u2:          NOTRUN -> [FAIL][87] ([i915#4312])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cml-u2/igt@runner@aborted.html
    - fi-rkl-guc:         NOTRUN -> [FAIL][88] ([i915#4312])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-rkl-guc/igt@runner@aborted.html
    - fi-ivb-3770:        NOTRUN -> [FAIL][89] ([fdo#109271] / [i915#4312])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-ivb-3770/igt@runner@aborted.html
    - fi-bxt-dsi:         NOTRUN -> [FAIL][90] ([i915#4312])
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bxt-dsi/igt@runner@aborted.html
    - bat-dg1-6:          NOTRUN -> [FAIL][91] ([i915#4312])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/bat-dg1-6/igt@runner@aborted.html
    - fi-elk-e7500:       NOTRUN -> [FAIL][92] ([fdo#109271] / [i915#4312])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-elk-e7500/igt@runner@aborted.html
    - fi-cfl-guc:         NOTRUN -> [FAIL][93] ([i915#4312])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-cfl-guc/igt@runner@aborted.html
    - fi-glk-j4005:       NOTRUN -> [FAIL][94] ([i915#4312] / [k.org#202321])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-glk-j4005/igt@runner@aborted.html
    - fi-kbl-7567u:       NOTRUN -> [FAIL][95] ([i915#1436] / [i915#4312])
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-kbl-7567u/igt@runner@aborted.html
    - fi-skl-guc:         NOTRUN -> [FAIL][96] ([i915#1436] / [i915#4312])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-skl-guc/igt@runner@aborted.html
    - fi-skl-6700k2:      NOTRUN -> [FAIL][97] ([i915#1436] / [i915#4312])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-skl-6700k2/igt@runner@aborted.html
    - fi-bsw-n3050:       NOTRUN -> [FAIL][98] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-n3050/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_ctx_exec@basic:
    - fi-bsw-nick:        [DMESG-WARN][99] ([i915#3428]) -> [PASS][100]
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-bsw-nick/igt@gem_ctx_exec@basic.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-nick/igt@gem_ctx_exec@basic.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [DMESG-FAIL][101] ([i915#5026]) -> [PASS][102]
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-blb-e6850/igt@i915_selftest@live@requests.html

  
#### Warnings ####

  * igt@kms_psr@primary_page_flip:
    - fi-skl-6600u:       [FAIL][103] ([i915#4547]) -> [INCOMPLETE][104] ([i915#4838])
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-skl-6600u/igt@kms_psr@primary_page_flip.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-skl-6600u/igt@kms_psr@primary_page_flip.html

  * igt@runner@aborted:
    - fi-bsw-nick:        [FAIL][105] ([i915#3428] / [i915#4312]) -> [FAIL][106] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-bsw-nick/igt@runner@aborted.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-bsw-nick/igt@runner@aborted.html
    - fi-blb-e6850:       [FAIL][107] ([fdo#109271] / [i915#2403] / [i915#2426] / [i915#4312]) -> [FAIL][108] ([fdo#109271] / [i915#2403] / [i915#4312])
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11172/fi-blb-e6850/igt@runner@aborted.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/fi-blb-e6850/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#2426]: https://gitlab.freedesktop.org/drm/intel/issues/2426
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2927]: https://gitlab.freedesktop.org/drm/intel/issues/2927
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3428]: https://gitlab.freedesktop.org/drm/intel/issues/3428
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3970]: https://gitlab.freedesktop.org/drm/intel/issues/3970
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4391]: https://gitlab.freedesktop.org/drm/intel/issues/4391
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4838]: https://gitlab.freedesktop.org/drm/intel/issues/4838
  [i915#4897]: https://gitlab.freedesktop.org/drm/intel/issues/4897
  [i915#4955]: https://gitlab.freedesktop.org/drm/intel/issues/4955
  [i915#5026]: https://gitlab.freedesktop.org/drm/intel/issues/5026
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541
  [k.org#202321]: https://bugzilla.kernel.org/show_bug.cgi?id=202321


Build changes
-------------

  * Linux: CI_DRM_11172 -> Patchwork_22148

  CI-20190529: 20190529
  CI_DRM_11172: 466c37c518256a1c79ed5f6ed4d3db1866c17910 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6337: 7c9c034619ef9dbfbfe041fbf3973a1cf1ac7a22 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22148: f0e7ebd7ff02d63583876459802d24e708019203 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

f0e7ebd7ff02 Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
62e5cdbc9a5d drm/i915/Flat-CCS: Document on Flat-CCS memory compression
c529f71d7954 drm/i915/dg2: Flat CCS Support
a5ef8bfef402 uapi/drm/dg2: Introduce format modifier for DG2 clear color
a4ae214f55c2 drm/i915/dg2: Add DG2 unified compression
41874785a443 drm/i915/dg2: Tile 4 plane format support
035ac563c069 drm/i915: Introduce new Tile 4 format
90946c4b0096 drm/i915/gt: Clear compress metadata for Xe_HP platforms
9dad7bbc9b86 drm/i915/lmem: Enable lmem for platforms with Flat CCS
19bfdb83b514 drm/i915/xehpsdv: Add has_flat_ccs to device info
d49ca1a2fc0d Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
d4227abc52e4 drm/i915/uapi: document behaviour for DG2 64K support
a8710e64c4a8 drm/i915/migrate: add acceleration support for DG2
441f25ff52f8 drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
08193c9fbdc7 drm/i915/gtt: allow overriding the pt alignment
29fc5ce90221 drm/i915: add gtt misalignment test
7373e84047bf drm/i915: support 64K GTT pages for discrete cards
f93c6346dd31 drm/i915: enforce min GTT alignment for discrete cards
7f652d8d712c drm/i915: add needs_compact_pt flag

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22148/index.html

[-- Attachment #2: Type: text/html, Size: 26708 bytes --]


* Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
  (?)
@ 2022-02-12  1:17   ` Nanley Chery
  2022-02-15 14:53     ` Juha-Pekka Heikkila
  -1 siblings, 1 reply; 80+ messages in thread
From: Nanley Chery @ 2022-02-12  1:17 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx, Nanley Chery, Matthew Auld, dri-devel

On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com> wrote:
>
> From: Matt Roper <matthew.d.roper@intel.com>
>
> DG2 unifies render compression and media compression into a single
> format for the first time.  The programming and buffer layout is
> supposed to match compression on older gen12 platforms, but the actual
> compression algorithm is different from any previous platform; as such,
> we need a new framebuffer modifier to represent buffers in this format,
> but otherwise we can re-use the existing gen12 compression driver logic.
>
> v2:
>   Display version fix [Imre]
>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
> cc: Anshuman Gupta <anshuman.gupta@intel.com>
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_fb.c       | 13 ++++++++++
>  .../drm/i915/display/skl_universal_plane.c    | 26 ++++++++++++++++---
>  include/uapi/drm/drm_fourcc.h                 | 22 ++++++++++++++++
>  3 files changed, 57 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> index 94c57facbb46..4d4d01963f15 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> @@ -141,6 +141,14 @@ struct intel_modifier_desc {
>
>  static const struct intel_modifier_desc intel_modifiers[] = {
>         {
> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
> +               .display_ver = { 13, 13 },
> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
> +       }, {
> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
> +               .display_ver = { 13, 13 },
> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC,
> +       }, {
>                 .modifier = I915_FORMAT_MOD_4_TILED,
>                 .display_ver = { 13, 13 },
>                 .plane_caps = INTEL_PLANE_CAP_TILING_4,
> @@ -550,6 +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>                         return 128;
>                 else
>                         return 512;
> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>         case I915_FORMAT_MOD_4_TILED:
>                 /*
>                  * Each 4K tile consists of 64B(8*8) subtiles, with
> @@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
>         case I915_FORMAT_MOD_4_TILED:
>         case I915_FORMAT_MOD_Yf_TILED:
>                 return 1 * 1024 * 1024;
> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> +               return 16 * 1024;
>         default:
>                 MISSING_CASE(fb->modifier);
>                 return 0;
> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> index 5299dfe68802..c38ae0876c15 100644
> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>                 return PLANE_CTL_TILED_Y;
>         case I915_FORMAT_MOD_4_TILED:
>                 return PLANE_CTL_TILED_4;
> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> +               return PLANE_CTL_TILED_4 |
> +                       PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> +                       PLANE_CTL_CLEAR_COLOR_DISABLE;
> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> +               return PLANE_CTL_TILED_4 |
> +                       PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
> +                       PLANE_CTL_CLEAR_COLOR_DISABLE;
>         case I915_FORMAT_MOD_Y_TILED_CCS:
>         case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>                 return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> @@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct drm_i915_private *i915,
>         if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0))
>                 return false;
>
> +       /* Wa_14013215631 */
> +       if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
> +               return false;
> +
>         return plane_id < PLANE_SPRITE4;
>  }
>
> @@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>         case PLANE_CTL_TILED_Y:
>                 plane_config->tiling = I915_TILING_Y;
>                 if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> -                       fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
> -                               I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
> -                               I915_FORMAT_MOD_Y_TILED_CCS;
> +                       if (DISPLAY_VER(dev_priv) >= 12)
> +                               fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
> +                       else
> +                               fb->modifier = I915_FORMAT_MOD_Y_TILED_CCS;
>                 else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>                         fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS;
>                 else
> @@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>                 break;
>         case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
>                 if (HAS_4TILE(dev_priv)) {
> -                       fb->modifier = I915_FORMAT_MOD_4_TILED;
> +                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
> +                       else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
> +                       else
> +                               fb->modifier = I915_FORMAT_MOD_4_TILED;
>                 } else {
>                         if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>                                 fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index b73fe6797fc3..b8fb7b44c03c 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -583,6 +583,28 @@ extern "C" {
>   */
>  #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
>
> +/*
> + * Intel color control surfaces (CCS) for DG2 render compression.
> + *
> + * DG2 uses a new compression format for render compression. The general
> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> + * but a new hashing/compression algorithm is used, so a fresh modifier must
> + * be associated with buffers of this type. Render compression uses 128 byte
> + * compression blocks.

I think I've seen a way to configure the compression block size on TGL
at least. I can't find the spec text for that at the moment though...
Could we omit these mentions?

> + */
> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
> +

How about something like:

The main surface is Tile 4 and at plane index 0. The CCS plane is
hidden from userspace. The main surface pitch is required to be a
multiple of four Tile 4 widths. The CCS is configured with the render
compression format associated with the main surface format.

...I think the CCS is technically accessible via the blitter engine,
so the part about the plane being "hidden" may need some tweaking.
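
To make the pitch rule in the suggested wording concrete, here is a minimal
sketch of the check. It assumes a Tile 4 tile is 128 bytes wide (a 4 KiB tile
of 32 rows); that value is not shown in the quoted hunk. The helper names are
hypothetical and not part of i915 or any userspace library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one Tile 4 tile is 128 bytes wide (4 KiB tile, 32 rows). */
#define TILE4_WIDTH_BYTES   128u
/* "A multiple of four Tile 4 widths", per the suggested modifier text. */
#define DG2_CCS_PITCH_ALIGN (4u * TILE4_WIDTH_BYTES)

/* Hypothetical helper: true if the main surface pitch satisfies the rule. */
static inline bool dg2_ccs_pitch_is_valid(uint32_t pitch)
{
	return pitch != 0 && (pitch % DG2_CCS_PITCH_ALIGN) == 0;
}

/* Hypothetical helper: round a pitch up to the next valid value. */
static inline uint32_t dg2_ccs_align_pitch(uint32_t pitch)
{
	/* Guard against wrap-around in the round-up below. */
	assert(pitch <= UINT32_MAX - (DG2_CCS_PITCH_ALIGN - 1));
	return (pitch + DG2_CCS_PITCH_ALIGN - 1) /
		DG2_CCS_PITCH_ALIGN * DG2_CCS_PITCH_ALIGN;
}
```

Under that assumption, a 1920-pixel-wide XRGB8888 surface (7680 bytes per row)
already satisfies the rule, while a 1366-pixel-wide one (5464 bytes) would need
its pitch rounded up.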


-Nanley

> +/*
> + * Intel color control surfaces (CCS) for DG2 media compression.
> + *
> + * DG2 uses a new compression format for media compression. The general
> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> + * but a new hashing/compression algorithm is used, so a fresh modifier must
> + * be associated with buffers of this type. Media compression uses 256 byte
> + * compression blocks.
> + */
> +#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
> +
>  /*
>   * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>   *
> --
> 2.20.1
>


* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
  (?)
@ 2022-02-12  1:19   ` Nanley Chery
  2022-02-15 14:55     ` Juha-Pekka Heikkila
  -1 siblings, 1 reply; 80+ messages in thread
From: Nanley Chery @ 2022-02-12  1:19 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx, Nanley Chery, Matthew Auld, dri-devel

On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com> wrote:
>
> From: Mika Kahola <mika.kahola@intel.com>
>
> DG2 clear color render compression uses Tile4 layout. Therefore, we need
> to define a new format modifier for uAPI to support clear color rendering.
>
> v2:
>   Display version is fixed. [Imre]
>   KDoc is enhanced for cc modifier. [Nanley & Lionel]
>
> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> cc: Anshuman Gupta <anshuman.gupta@intel.com>
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
>  drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
>  include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> index 4d4d01963f15..3df6ef5ffec5 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = {
>                 .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
>                 .display_ver = { 13, 13 },
>                 .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
> +       }, {
> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
> +               .display_ver = { 13, 13 },
> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
> +
> +               .ccs.cc_planes = BIT(1),
>         }, {
>                 .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>                 .display_ver = { 13, 13 },
> @@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>                 else
>                         return 512;
>         case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>         case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>         case I915_FORMAT_MOD_4_TILED:
>                 /*
> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
>         case I915_FORMAT_MOD_Yf_TILED:
>                 return 1 * 1024 * 1024;
>         case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>         case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>                 return 16 * 1024;
>         default:
> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> index c38ae0876c15..b4dced1907c5 100644
> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>                 return PLANE_CTL_TILED_4 |
>                         PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
>                         PLANE_CTL_CLEAR_COLOR_DISABLE;
> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> +               return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>         case I915_FORMAT_MOD_Y_TILED_CCS:
>         case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>                 return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>                 break;
>         case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
>                 if (HAS_4TILE(dev_priv)) {
> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> +                       u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> +                                     PLANE_CTL_CLEAR_COLOR_DISABLE;
> +
> +                       if ((val & rc_mask) == rc_mask)
>                                 fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>                         else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>                                 fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
> +                       else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
>                         else
>                                 fb->modifier = I915_FORMAT_MOD_4_TILED;
>                 } else {
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index b8fb7b44c03c..697614ea4b84 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -605,6 +605,16 @@ extern "C" {
>   */
>  #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
>
> +/*
> + * Intel color control surfaces (CCS) for DG2 clear color render compression.
> + *
> + * DG2 uses a unified compression format for clear color render compression.

What's unified about DG2's compression format? If this doesn't affect
the layout, maybe we should drop this sentence.

> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> + *

This also needs a pitch aligned to four tiles, right? I think we can
save some effort by referencing the DG2_RC_CCS modifier here.

> + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1

Why is the expected offset hardcoded to 0 instead of relying on the
offset provided by the modifier API? This looks like a bug.

We should probably give some info about the relevant fields in the
fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
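
As a rough illustration of both points, the sketch below lays out the clear
color plane fields as I recall them from the GEN12_RC_CCS_CC comment in
drm_fourcc.h, and derives the location from the per-plane offset rather than a
hardcoded 0. Whether DG2 uses the same 256-bit layout is an assumption, and the
struct and helper names are hypothetical. The plane index 1 comes from
.ccs.cc_planes = BIT(1) in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout of the 256-bit clear color structure, modelled on the
 * GEN12_RC_CCS_CC description; not confirmed for DG2.
 */
struct clear_color_plane_sketch {
	uint32_t raw_rgba[4];   /* raw clear color: R, G, B, A (3D engine) */
	uint32_t converted_lo;  /* lower converted clear color (display) */
	uint32_t converted_hi;  /* higher converted clear color (display) */
	uint32_t reserved[2];   /* discard/depth-valid bits, ignored by display */
};

_Static_assert(sizeof(struct clear_color_plane_sketch) == 32,
	       "clear color structure is expected to be 256 bits");

/*
 * Hypothetical helper: byte offset of the converted clear color inside the
 * framebuffer object, taken from the per-plane offsets[] supplied through
 * the modifier API instead of assuming offset 0.
 */
static inline uint64_t
clear_color_converted_offset(const uint32_t offsets[4], unsigned int cc_plane)
{
	assert(cc_plane < 4);
	return (uint64_t)offsets[cc_plane] +
		offsetof(struct clear_color_plane_sketch, converted_lo);
}
```

With a clear color plane placed at, say, 0x800000 in the object, the display
would read the converted value at 0x800010 under this layout.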

-Nanley

> + */
> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
> +
>  /*
>   * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>   *
> --
> 2.20.1
>


* Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-02-12  1:17   ` Nanley Chery
@ 2022-02-15 14:53     ` Juha-Pekka Heikkila
  2022-02-17 17:15         ` Chery, Nanley G
  0 siblings, 1 reply; 80+ messages in thread
From: Juha-Pekka Heikkila @ 2022-02-15 14:53 UTC (permalink / raw)
  To: Nanley Chery, Ramalingam C
  Cc: intel-gfx, Nanley Chery, Matthew Auld, dri-devel

On 12.2.2022 3.17, Nanley Chery wrote:
> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com> wrote:
>>
>> From: Matt Roper <matthew.d.roper@intel.com>
>>
>> DG2 unifies render compression and media compression into a single
>> format for the first time.  The programming and buffer layout is
>> supposed to match compression on older gen12 platforms, but the actual
>> compression algorithm is different from any previous platform; as such,
>> we need a new framebuffer modifier to represent buffers in this format,
>> but otherwise we can re-use the existing gen12 compression driver logic.
>>
>> v2:
>>    Display version fix [Imre]
>>
>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>> cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
>> Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
>> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> ---
>>   drivers/gpu/drm/i915/display/intel_fb.c       | 13 ++++++++++
>>   .../drm/i915/display/skl_universal_plane.c    | 26 ++++++++++++++++---
>>   include/uapi/drm/drm_fourcc.h                 | 22 ++++++++++++++++
>>   3 files changed, 57 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
>> index 94c57facbb46..4d4d01963f15 100644
>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
>> @@ -141,6 +141,14 @@ struct intel_modifier_desc {
>>
>>   static const struct intel_modifier_desc intel_modifiers[] = {
>>          {
>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
>> +               .display_ver = { 13, 13 },
>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
>> +       }, {
>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>> +               .display_ver = { 13, 13 },
>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC,
>> +       }, {
>>                  .modifier = I915_FORMAT_MOD_4_TILED,
>>                  .display_ver = { 13, 13 },
>>                  .plane_caps = INTEL_PLANE_CAP_TILING_4,
>> @@ -550,6 +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>>                          return 128;
>>                  else
>>                          return 512;
>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>          case I915_FORMAT_MOD_4_TILED:
>>                  /*
>>                   * Each 4K tile consists of 64B(8*8) subtiles, with
>> @@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
>>          case I915_FORMAT_MOD_4_TILED:
>>          case I915_FORMAT_MOD_Yf_TILED:
>>                  return 1 * 1024 * 1024;
>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>> +               return 16 * 1024;
>>          default:
>>                  MISSING_CASE(fb->modifier);
>>                  return 0;
>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>> index 5299dfe68802..c38ae0876c15 100644
>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>> @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>>                  return PLANE_CTL_TILED_Y;
>>          case I915_FORMAT_MOD_4_TILED:
>>                  return PLANE_CTL_TILED_4;
>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>> +               return PLANE_CTL_TILED_4 |
>> +                       PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
>> +                       PLANE_CTL_CLEAR_COLOR_DISABLE;
>> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>> +               return PLANE_CTL_TILED_4 |
>> +                       PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
>> +                       PLANE_CTL_CLEAR_COLOR_DISABLE;
>>          case I915_FORMAT_MOD_Y_TILED_CCS:
>>          case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>>                  return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>> @@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct drm_i915_private *i915,
>>          if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0))
>>                  return false;
>>
>> +       /* Wa_14013215631 */
>> +       if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
>> +               return false;
>> +
>>          return plane_id < PLANE_SPRITE4;
>>   }
>>
>> @@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>>          case PLANE_CTL_TILED_Y:
>>                  plane_config->tiling = I915_TILING_Y;
>>                  if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>> -                       fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
>> -                               I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
>> -                               I915_FORMAT_MOD_Y_TILED_CCS;
>> +                       if (DISPLAY_VER(dev_priv) >= 12)
>> +                               fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
>> +                       else
>> +                               fb->modifier = I915_FORMAT_MOD_Y_TILED_CCS;
>>                  else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>>                          fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS;
>>                  else
>> @@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>>                  break;
>>          case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
>>                  if (HAS_4TILE(dev_priv)) {
>> -                       fb->modifier = I915_FORMAT_MOD_4_TILED;
>> +                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>> +                       else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
>> +                       else
>> +                               fb->modifier = I915_FORMAT_MOD_4_TILED;
>>                  } else {
>>                          if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>                                  fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> index b73fe6797fc3..b8fb7b44c03c 100644
>> --- a/include/uapi/drm/drm_fourcc.h
>> +++ b/include/uapi/drm/drm_fourcc.h
>> @@ -583,6 +583,28 @@ extern "C" {
>>    */
>>   #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
>>
>> +/*
>> + * Intel color control surfaces (CCS) for DG2 render compression.
>> + *
>> + * DG2 uses a new compression format for render compression. The general
>> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
>> + * but a new hashing/compression algorithm is used, so a fresh modifier must
>> + * be associated with buffers of this type. Render compression uses 128 byte
>> + * compression blocks.
> 
> I think I've seen a way to configure the compression block size on TGL
> at least. I can't find the spec text for that at the moment though...
> Could we omit these mentions?

I'm not sure why the general possibility of changing the compression
block size is relevant. Any hw feature can be reconfigured, but this
text defines how this modifier is implemented.

Say you take an I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer,
including its control surface, copy it out, and later restore the
framebuffer from that same data. Is the result expected to be valid?

/Juha-Pekka
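
For anyone reading along, here is a minimal sketch of how the modifier
values in the quoted drm_fourcc.h hunk are encoded: the vendor ID sits in
the top 8 bits and the vendor-specific value in the low 56 bits. The
sketch_* and SKETCH_* names are hypothetical local stand-ins, not the uAPI
macros; real code should include drm_fourcc.h and use fourcc_mod_code().

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the drm_fourcc.h encoding (assumption: these mirror
 * the uAPI definitions; they are not the uAPI macros themselves). */
#define SKETCH_VENDOR_INTEL   0x01ULL
#define SKETCH_MOD_VALUE_MASK 0x00ffffffffffffffULL

/* Pack an 8-bit vendor ID and a 56-bit vendor-specific value. */
uint64_t sketch_mod_code(uint64_t vendor, uint64_t val)
{
	assert(vendor <= 0xff);
	return (vendor << 56) | (val & SKETCH_MOD_VALUE_MASK);
}

/* Recover the vendor ID from a packed modifier. */
uint64_t sketch_mod_vendor(uint64_t modifier)
{
	return modifier >> 56;
}

/* Recover the vendor-specific value from a packed modifier. */
uint64_t sketch_mod_value(uint64_t modifier)
{
	return modifier & SKETCH_MOD_VALUE_MASK;
}
```

With this encoding, value 10 (DG2_RC_CCS) and value 11 (DG2_MC_CCS) differ
only in the low bits, so the modifier itself carries no block-size field;
whatever the kdoc says about 128 or 256 byte blocks is purely part of the
documented contract.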

> 
>> + */
>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
>> +
> 
> How about something like:
> 
> The main surface is Tile 4 and at plane index 0. The CCS plane is
> hidden from userspace. The main surface pitch is required to be a
> multiple of four Tile 4 widths. The CCS is configured with the render
> compression format associated with the main surface format.
> 
> ....I think the CCS is technically accessible via the blitter engine,
> so the part about the plane being "hidden" may need some tweaking.
> 
> 
> -Nanley
> 
>> +/*
>> + * Intel color control surfaces (CCS) for DG2 media compression.
>> + *
>> + * DG2 uses a new compression format for media compression. The general
>> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
>> + * but a new hashing/compression algorithm is used, so a fresh modifier must
>> + * be associated with buffers of this type. Media compression uses 256 byte
>> + * compression blocks.
>> + */
>> +#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
>> +
>>   /*
>>    * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>>    *
>> --
>> 2.20.1
>>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-12  1:19   ` Nanley Chery
@ 2022-02-15 14:55     ` Juha-Pekka Heikkila
  2022-02-15 15:02         ` Chery, Nanley G
  0 siblings, 1 reply; 80+ messages in thread
From: Juha-Pekka Heikkila @ 2022-02-15 14:55 UTC (permalink / raw)
  To: Nanley Chery, Ramalingam C
  Cc: intel-gfx, Nanley Chery, Matthew Auld, dri-devel

On 12.2.2022 3.19, Nanley Chery wrote:
> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com> wrote:
>>
>> From: Mika Kahola <mika.kahola@intel.com>
>>
>> DG2 clear color render compression uses Tile4 layout. Therefore, we need
>> to define a new format modifier for uAPI to support clear color rendering.
>>
>> v2:
>>    Display version is fixed. [Imre]
>>    KDoc is enhanced for cc modifier. [Nanley & Lionel]
>>
>> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
>> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> ---
>>   drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
>>   drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
>>   include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
>>   3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
>> index 4d4d01963f15..3df6ef5ffec5 100644
>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
>> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = {
>>                  .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
>>                  .display_ver = { 13, 13 },
>>                  .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
>> +       }, {
>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
>> +               .display_ver = { 13, 13 },
>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
>> +
>> +               .ccs.cc_planes = BIT(1),
>>          }, {
>>                  .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>>                  .display_ver = { 13, 13 },
>> @@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>>                  else
>>                          return 512;
>>          case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>          case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>          case I915_FORMAT_MOD_4_TILED:
>>                  /*
>> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
>>          case I915_FORMAT_MOD_Yf_TILED:
>>                  return 1 * 1024 * 1024;
>>          case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>          case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>                  return 16 * 1024;
>>          default:
>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>> index c38ae0876c15..b4dced1907c5 100644
>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>>                  return PLANE_CTL_TILED_4 |
>>                          PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
>>                          PLANE_CTL_CLEAR_COLOR_DISABLE;
>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>> +               return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>          case I915_FORMAT_MOD_Y_TILED_CCS:
>>          case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>>                  return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>>                  break;
>>          case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
>>                  if (HAS_4TILE(dev_priv)) {
>> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>> +                       u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
>> +                                     PLANE_CTL_CLEAR_COLOR_DISABLE;
>> +
>> +                       if ((val & rc_mask) == rc_mask)
>>                                  fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>>                          else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>>                                  fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
>> +                       else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
>>                          else
>>                                  fb->modifier = I915_FORMAT_MOD_4_TILED;
>>                  } else {
>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> index b8fb7b44c03c..697614ea4b84 100644
>> --- a/include/uapi/drm/drm_fourcc.h
>> +++ b/include/uapi/drm/drm_fourcc.h
>> @@ -605,6 +605,16 @@ extern "C" {
>>    */
>>   #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
>>
>> +/*
>> + * Intel color control surfaces (CCS) for DG2 clear color render compression.
>> + *
>> + * DG2 uses a unified compression format for clear color render compression.
> 
> What's unified about DG2's compression format? If this doesn't affect
> the layout, maybe we should drop this sentence.
> 
>> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
>> + *
> 
> This also needs a pitch aligned to four tiles, right? I think we can
> save some effort by referencing the DG2_RC_CCS modifier here.
> 
>> + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1
> 
> Why is the expected offset hardcoded to 0 instead of relying on the
> offset provided by the modifier API? This looks like a bug.

Hi Nanley,

Can you elaborate a bit: which offset from the modifier API applies to
the cc surface?

> 
> We should probably give some info about the relevant fields in the
> fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).

agree, that's totally missing here.

/Juha-Pekka
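
As a reading aid for the quoted skl_plane_ctl_tiling() and
skl_get_initial_plane_config() hunks, the following is a rough standalone
sketch of how the three Tile 4 CCS modifiers map to PLANE_CTL bits and
back. The SK_* bit positions and the enum are placeholders chosen for
illustration (the real definitions live in the i915 headers); only the
order of the checks is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, assumed for illustration only. */
#define SK_RENDER_DECOMP (1u << 15)
#define SK_MEDIA_DECOMP  (1u << 14)
#define SK_CC_DISABLE    (1u << 13)

/* Hypothetical stand-ins for the I915_FORMAT_MOD_4_TILED* modifiers. */
enum sk_modifier { SK_4_TILED, SK_RC_CCS, SK_MC_CCS, SK_RC_CCS_CC };

/* Forward mapping, following the quoted skl_plane_ctl_tiling() cases. */
uint32_t sk_plane_ctl(enum sk_modifier mod)
{
	switch (mod) {
	case SK_4_TILED:
		return 0;
	case SK_RC_CCS:
		return SK_RENDER_DECOMP | SK_CC_DISABLE;
	case SK_MC_CCS:
		return SK_MEDIA_DECOMP | SK_CC_DISABLE;
	case SK_RC_CCS_CC:
		return SK_RENDER_DECOMP;
	}
	assert(!"unhandled modifier");
	return 0;
}

/* Readout, in the same order as the quoted initial-plane-config hunk. */
enum sk_modifier sk_readout_tile4(uint32_t val)
{
	const uint32_t rc_mask = SK_RENDER_DECOMP | SK_CC_DISABLE;

	if ((val & rc_mask) == rc_mask)
		return SK_RC_CCS;       /* render decompression, clear color off */
	if (val & SK_MEDIA_DECOMP)
		return SK_MC_CCS;
	if (val & SK_RENDER_DECOMP)
		return SK_RC_CCS_CC;    /* render decompression, clear color on */
	return SK_4_TILED;
}
```

The point of checking the combined mask first is that RC_CCS and RC_CCS_CC
both set the render decompression bit; only the clear color disable bit
tells them apart.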

> 
>> + */
>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
>> +
>>   /*
>>    * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>>    *
>> --
>> 2.20.1
>>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 14:55     ` Juha-Pekka Heikkila
@ 2022-02-15 15:02         ` Chery, Nanley G
  0 siblings, 0 replies; 80+ messages in thread
From: Chery, Nanley G @ 2022-02-15 15:02 UTC (permalink / raw)
  To: juhapekka.heikkila, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel



> -----Original Message-----
> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Sent: Tuesday, February 15, 2022 6:56 AM
> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
> <ramalingam.c@intel.com>
> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>; dri-
> devel <dri-devel@lists.freedesktop.org>
> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format
> modifier for DG2 clear color
> 
> On 12.2.2022 3.19, Nanley Chery wrote:
> > On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com>
> wrote:
> >>
> >> From: Mika Kahola <mika.kahola@intel.com>
> >>
> >> DG2 clear color render compression uses Tile4 layout. Therefore, we
> >> need to define a new format modifier for uAPI to support clear color
> rendering.
> >>
> >> v2:
> >>    Display version is fixed. [Imre]
> >>    KDoc is enhanced for cc modifier. [Nanley & Lionel]
> >>
> >> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> >> cc: Anshuman Gupta <anshuman.gupta@intel.com>
> >> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> >> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
> >>   drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
> >>   include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
> >>   3 files changed, 26 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
> >> b/drivers/gpu/drm/i915/display/intel_fb.c
> >> index 4d4d01963f15..3df6ef5ffec5 100644
> >> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> >> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> >> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
> intel_modifiers[] = {
> >>                  .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
> >>                  .display_ver = { 13, 13 },
> >>                  .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> >> INTEL_PLANE_CAP_CCS_MC,
> >> +       }, {
> >> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
> >> +               .display_ver = { 13, 13 },
> >> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> >> + INTEL_PLANE_CAP_CCS_RC_CC,
> >> +
> >> +               .ccs.cc_planes = BIT(1),
> >>          }, {
> >>                  .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
> >>                  .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
> >> intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
> >>                  else
> >>                          return 512;
> >>          case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>          case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>          case I915_FORMAT_MOD_4_TILED:
> >>                  /*
> >> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
> drm_framebuffer *fb,
> >>          case I915_FORMAT_MOD_Yf_TILED:
> >>                  return 1 * 1024 * 1024;
> >>          case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>          case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>                  return 16 * 1024;
> >>          default:
> >> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> index c38ae0876c15..b4dced1907c5 100644
> >> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
> >>                  return PLANE_CTL_TILED_4 |
> >>                          PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
> >>                          PLANE_CTL_CLEAR_COLOR_DISABLE;
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >> +               return PLANE_CTL_TILED_4 |
> >> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >>          case I915_FORMAT_MOD_Y_TILED_CCS:
> >>          case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
> >>                  return PLANE_CTL_TILED_Y |
> >> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc
> *crtc,
> >>                  break;
> >>          case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
> >>                  if (HAS_4TILE(dev_priv)) {
> >> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >> +                       u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> >> +                                     PLANE_CTL_CLEAR_COLOR_DISABLE;
> >> +
> >> +                       if ((val & rc_mask) == rc_mask)
> >>                                  fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
> >>                          else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> >>                                  fb->modifier =
> >> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
> >> +                       else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >> +                               fb->modifier =
> >> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
> >>                          else
> >>                                  fb->modifier = I915_FORMAT_MOD_4_TILED;
> >>                  } else {
> >> diff --git a/include/uapi/drm/drm_fourcc.h
> >> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84
> >> 100644
> >> --- a/include/uapi/drm/drm_fourcc.h
> >> +++ b/include/uapi/drm/drm_fourcc.h
> >> @@ -605,6 +605,16 @@ extern "C" {
> >>    */
> >>   #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
> fourcc_mod_code(INTEL,
> >> 11)
> >>
> >> +/*
> >> + * Intel color control surfaces (CCS) for DG2 clear color render compression.
> >> + *
> >> + * DG2 uses a unified compression format for clear color render
> compression.
> >
> > What's unified about DG2's compression format? If this doesn't affect
> > the layout, maybe we should drop this sentence.
> >
> >> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> >> + *
> >
> > This also needs a pitch aligned to four tiles, right? I think we can
> > save some effort by referencing the DG2_RC_CCS modifier here.
> >
> >> + * Fast clear color value expected by HW is located in fb at offset
> >> + 0 of plane#1
> >
> > Why is the expected offset hardcoded to 0 instead of relying on the
> > offset provided by the modifier API? This looks like a bug.
> 
> Hi Nanley,
> 
> can you elaborate a bit, which offset from modifier API that applies to cc surface?
> 

Hi Juha-Pekka,

On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].

-Nanley

> >
> > We should probably give some info about the relevant fields in the
> > fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
> 
> agree, that's totally missing here.
> 
> /Juha-Pekka
> 
> >
> >> + */
> >> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
> fourcc_mod_code(INTEL,
> >> +12)
> >> +
> >>   /*
> >>    * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> >>    *
> >> --
> >> 2.20.1
> >>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 15:02         ` Chery, Nanley G
  (?)
@ 2022-02-15 16:15         ` Juha-Pekka Heikkila
  2022-02-15 16:44             ` Chery, Nanley G
  -1 siblings, 1 reply; 80+ messages in thread
From: Juha-Pekka Heikkila @ 2022-02-15 16:15 UTC (permalink / raw)
  To: Chery, Nanley G, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel

On 15.2.2022 17.02, Chery, Nanley G wrote:
> 
> 
>> -----Original Message-----
>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>> Sent: Tuesday, February 15, 2022 6:56 AM
>> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
>> <ramalingam.c@intel.com>
>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
>> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>; dri-
>> devel <dri-devel@lists.freedesktop.org>
>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format
>> modifier for DG2 clear color
>>
>> On 12.2.2022 3.19, Nanley Chery wrote:
>>> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com>
>> wrote:
>>>>
>>>> From: Mika Kahola <mika.kahola@intel.com>
>>>>
>>>> DG2 clear color render compression uses Tile4 layout. Therefore, we
>>>> need to define a new format modifier for uAPI to support clear color
>> rendering.
>>>>
>>>> v2:
>>>>     Display version is fixed. [Imre]
>>>>     KDoc is enhanced for cc modifier. [Nanley & Lionel]
>>>>
>>>> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
>>>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
>>>> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
>>>>    drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
>>>>    include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
>>>>    3 files changed, 26 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
>>>> b/drivers/gpu/drm/i915/display/intel_fb.c
>>>> index 4d4d01963f15..3df6ef5ffec5 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
>>>> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
>> intel_modifiers[] = {
>>>>                   .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
>>>>                   .display_ver = { 13, 13 },
>>>>                   .plane_caps = INTEL_PLANE_CAP_TILING_4 |
>>>> INTEL_PLANE_CAP_CCS_MC,
>>>> +       }, {
>>>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
>>>> +               .display_ver = { 13, 13 },
>>>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
>>>> + INTEL_PLANE_CAP_CCS_RC_CC,
>>>> +
>>>> +               .ccs.cc_planes = BIT(1),
>>>>           }, {
>>>>                   .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>>>>                   .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
>>>> intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>>>>                   else
>>>>                           return 512;
>>>>           case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>           case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>>>           case I915_FORMAT_MOD_4_TILED:
>>>>                   /*
>>>> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
>> drm_framebuffer *fb,
>>>>           case I915_FORMAT_MOD_Yf_TILED:
>>>>                   return 1 * 1024 * 1024;
>>>>           case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>           case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>>>                   return 16 * 1024;
>>>>           default:
>>>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>> index c38ae0876c15..b4dced1907c5 100644
>>>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>>>>                   return PLANE_CTL_TILED_4 |
>>>>                           PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
>>>>                           PLANE_CTL_CLEAR_COLOR_DISABLE;
>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>> +               return PLANE_CTL_TILED_4 |
>>>> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>>>           case I915_FORMAT_MOD_Y_TILED_CCS:
>>>>           case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>>>>                   return PLANE_CTL_TILED_Y |
>>>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>>> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc
>> *crtc,
>>>>                   break;
>>>>           case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
>>>>                   if (HAS_4TILE(dev_priv)) {
>>>> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>>> +                       u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
>>>> +                                     PLANE_CTL_CLEAR_COLOR_DISABLE;
>>>> +
>>>> +                       if ((val & rc_mask) == rc_mask)
>>>>                                   fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>>>>                           else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>>>>                                   fb->modifier =
>>>> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
>>>> +                       else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>>> +                               fb->modifier =
>>>> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
>>>>                           else
>>>>                                   fb->modifier = I915_FORMAT_MOD_4_TILED;
>>>>                   } else {
>>>> diff --git a/include/uapi/drm/drm_fourcc.h
>>>> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84
>>>> 100644
>>>> --- a/include/uapi/drm/drm_fourcc.h
>>>> +++ b/include/uapi/drm/drm_fourcc.h
>>>> @@ -605,6 +605,16 @@ extern "C" {
>>>>     */
>>>>    #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
>> fourcc_mod_code(INTEL,
>>>> 11)
>>>>
>>>> +/*
>>>> + * Intel color control surfaces (CCS) for DG2 clear color render compression.
>>>> + *
>>>> + * DG2 uses a unified compression format for clear color render
>> compression.
>>>
>>> What's unified about DG2's compression format? If this doesn't affect
>>> the layout, maybe we should drop this sentence.
>>>
>>>> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
>>>> + *
>>>
>>> This also needs a pitch aligned to four tiles, right? I think we can
>>> save some effort by referencing the DG2_RC_CCS modifier here.
>>>
>>>> + * Fast clear color value expected by HW is located in fb at offset
>>>> + 0 of plane#1
>>>
>>> Why is the expected offset hardcoded to 0 instead of relying on the
>>> offset provided by the modifier API? This looks like a bug.
>>
>> Hi Nanley,
>>
>> can you elaborate a bit, which offset from modifier API that applies to cc surface?
>>
> 
> Hi Juha-Pekka,
> 
> On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
> 

Hi Nanley,

This offset comes from userspace when the framebuffer is created; at that
point the userspace caller can point it at whatever offset it wants.
Normally offset[0] is set to 0 and offset[n] to the start of plane n,
which is not required to come directly after the end of plane n-1. Or did
I misunderstand what you meant?

For the cc plane this offset is unlikely to be zero anyway, and the caller
can move it as it sees fit so that byte 0 of the cc plane (plane[1]) sits
wherever it is wanted.
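
To illustrate, here is a minimal sketch of how a userspace caller might pick
the plane offsets. It is not i915 or libdrm code: `struct fb_planes`,
`align_up()` and `layout_planes()` are made-up names, `struct fb_planes` only
mirrors the offsets[] array of drm_mode_fb_cmd2, and the gap and alignment
values are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the offsets[] array of struct drm_mode_fb_cmd2. */
struct fb_planes {
	uint32_t offsets[4];
};

/* Round v up to a multiple of a; a must be a non-zero power of two. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (v + a - 1) & ~(a - 1);
}

/*
 * Plane 0 (main surface) starts at byte 0 of the bo. Plane 1 (cc plane)
 * starts at the first cc_align boundary at or after main_size + gap, so it
 * does not have to sit directly after the end of plane 0.
 * gap and cc_align are illustrative parameters, not hardware requirements.
 */
static void layout_planes(struct fb_planes *fb, uint32_t main_size,
			  uint32_t gap, uint32_t cc_align)
{
	fb->offsets[0] = 0;
	fb->offsets[1] = align_up(main_size + gap, cc_align);
}
```

With a 2 MiB main surface, a 4 KiB gap and 4 KiB alignment, the cc plane would
start at 0x201000 rather than at 0x200000.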

/Juha-Pekka

> 
>>>
>>> We should probably give some info about the relevant fields in the
>>> fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
>>
>> agree, that's totally missing here.
>>
>> /Juha-Pekka
>>
>>>
>>>> + */
>>>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
>> fourcc_mod_code(INTEL,
>>>> +12)
>>>> +
>>>>    /*
>>>>     * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>>>>     *
>>>> --
>>>> 2.20.1
>>>>
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 16:15         ` Juha-Pekka Heikkila
@ 2022-02-15 16:44             ` Chery, Nanley G
  0 siblings, 0 replies; 80+ messages in thread
From: Chery, Nanley G @ 2022-02-15 16:44 UTC (permalink / raw)
  To: juhapekka.heikkila, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel



> -----Original Message-----
> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Sent: Tuesday, February 15, 2022 8:15 AM
> To: Chery, Nanley G <nanley.g.chery@intel.com>; Nanley Chery
> <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>
> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Auld, Matthew
> <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format
> modifier for DG2 clear color
> 
> On 15.2.2022 17.02, Chery, Nanley G wrote:
> >
> >
> >> -----Original Message-----
> >> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> >> Sent: Tuesday, February 15, 2022 6:56 AM
> >> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
> >> <ramalingam.c@intel.com>
> >> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
> >> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
> >> dri- devel <dri-devel@lists.freedesktop.org>
> >> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce
> >> format modifier for DG2 clear color
> >>
> >> On 12.2.2022 3.19, Nanley Chery wrote:
> >>> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com>
> >> wrote:
> >>>>
> >>>> From: Mika Kahola <mika.kahola@intel.com>
> >>>>
> >>>> DG2 clear color render compression uses Tile4 layout. Therefore, we
> >>>> need to define a new format modifier for uAPI to support clear
> >>>> color
> >> rendering.
> >>>>
> >>>> v2:
> >>>>     Display version is fixed. [Imre]
> >>>>     KDoc is enhanced for cc modifier. [Nanley & Lionel]
> >>>>
> >>>> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> >>>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
> >>>> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> >>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
> >>>>    drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
> >>>>    include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
> >>>>    3 files changed, 26 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
> >>>> b/drivers/gpu/drm/i915/display/intel_fb.c
> >>>> index 4d4d01963f15..3df6ef5ffec5 100644
> >>>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> >>>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> >>>> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
> >> intel_modifiers[] = {
> >>>>                   .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
> >>>>                   .display_ver = { 13, 13 },
> >>>>                   .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> >>>> INTEL_PLANE_CAP_CCS_MC,
> >>>> +       }, {
> >>>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
> >>>> +               .display_ver = { 13, 13 },
> >>>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> >>>> + INTEL_PLANE_CAP_CCS_RC_CC,
> >>>> +
> >>>> +               .ccs.cc_planes = BIT(1),
> >>>>           }, {
> >>>>                   .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
> >>>>                   .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
> >>>> intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
> >>>>                   else
> >>>>                           return 512;
> >>>>           case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>>>           case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>>>           case I915_FORMAT_MOD_4_TILED:
> >>>>                   /*
> >>>> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
> >> drm_framebuffer *fb,
> >>>>           case I915_FORMAT_MOD_Yf_TILED:
> >>>>                   return 1 * 1024 * 1024;
> >>>>           case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>>>           case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>>>                   return 16 * 1024;
> >>>>           default:
> >>>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>> index c38ae0876c15..b4dced1907c5 100644
> >>>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
> >>>>                   return PLANE_CTL_TILED_4 |
> >>>>                           PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
> >>>>                           PLANE_CTL_CLEAR_COLOR_DISABLE;
> >>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>>> +               return PLANE_CTL_TILED_4 |
> >>>> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >>>>           case I915_FORMAT_MOD_Y_TILED_CCS:
> >>>>           case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
> >>>>                   return PLANE_CTL_TILED_Y |
> >>>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >>>> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct
> >>>> intel_crtc
> >> *crtc,
> >>>>                   break;
> >>>>           case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
> >>>>                   if (HAS_4TILE(dev_priv)) {
> >>>> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >>>> +                       u32 rc_mask =
> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> >>>> +
> >>>> + PLANE_CTL_CLEAR_COLOR_DISABLE;
> >>>> +
> >>>> +                       if ((val & rc_mask) == rc_mask)
> >>>>                                   fb->modifier =
> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
> >>>>                           else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> >>>>                                   fb->modifier =
> >>>> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
> >>>> +                       else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >>>> +                               fb->modifier =
> >>>> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
> >>>>                           else
> >>>>                                   fb->modifier = I915_FORMAT_MOD_4_TILED;
> >>>>                   } else {
> >>>> diff --git a/include/uapi/drm/drm_fourcc.h
> >>>> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84
> >>>> 100644
> >>>> --- a/include/uapi/drm/drm_fourcc.h
> >>>> +++ b/include/uapi/drm/drm_fourcc.h
> >>>> @@ -605,6 +605,16 @@ extern "C" {
> >>>>     */
> >>>>    #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
> >> fourcc_mod_code(INTEL,
> >>>> 11)
> >>>>
> >>>> +/*
> >>>> + * Intel color control surfaces (CCS) for DG2 clear color render
> compression.
> >>>> + *
> >>>> + * DG2 uses a unified compression format for clear color render
> >> compression.
> >>>
> >>> What's unified about DG2's compression format? If this doesn't
> >>> affect the layout, maybe we should drop this sentence.
> >>>
> >>>> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> >>>> + *
> >>>
> >>> This also needs a pitch aligned to four tiles, right? I think we can
> >>> save some effort by referencing the DG2_RC_CCS modifier here.
> >>>
> >>>> + * Fast clear color value expected by HW is located in fb at
> >>>> + offset
> >>>> + 0 of plane#1
> >>>
> >>> Why is the expected offset hardcoded to 0 instead of relying on the
> >>> offset provided by the modifier API? This looks like a bug.
> >>
> >> Hi Nanley,
> >>
> >> can you elaborate a bit, which offset from modifier API that applies to cc
> surface?
> >>
> >
> > Hi Juha-Pekka,
> >
> > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
> >
> 
> Hi Nanley,
> 
> this offset is coming from userspace on creation of framebuffer, at that moment
> from userspace caller can point to offset of desire. Normally offset[0] is set at 0
> and then offset[n] at plane n start which is not stated to have to be exactly after
> plane n-1 end. Or did I misunderstand what you meant?
> 

Perhaps. At least, I'm not sure what you mean. This modifier description
seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane
must be zero. Are you saying that's correct? That would match neither the
GEN12_RC_CCS_CC behavior nor mesa's expectations.

-Nanley

> For cc plane this offset likely will not be zero anyway and caller can move it as see
> fit to have cc plane (plane[1]) location[0] at place where wanted to have it.
> 
> /Juha-Pekka
> 
> >
> >>>
> >>> We should probably give some info about the relevant fields in the
> >>> fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
> >>
> >> agree, that's totally missing here.
> >>
> >> /Juha-Pekka
> >>
> >>>
> >>>> + */
> >>>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
> >> fourcc_mod_code(INTEL,
> >>>> +12)
> >>>> +
> >>>>    /*
> >>>>     * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> >>>>     *
> >>>> --
> >>>> 2.20.1
> >>>>
> >


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 16:44             ` Chery, Nanley G
  (?)
@ 2022-02-15 17:31             ` Juha-Pekka Heikkila
  2022-02-15 18:24                 ` Chery, Nanley G
  -1 siblings, 1 reply; 80+ messages in thread
From: Juha-Pekka Heikkila @ 2022-02-15 17:31 UTC (permalink / raw)
  To: Chery, Nanley G, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel

On 15.2.2022 18.44, Chery, Nanley G wrote:
> 
> 
>> -----Original Message-----
>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>> Sent: Tuesday, February 15, 2022 8:15 AM
>> To: Chery, Nanley G <nanley.g.chery@intel.com>; Nanley Chery
>> <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>
>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Auld, Matthew
>> <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format
>> modifier for DG2 clear color
>>
>> On 15.2.2022 17.02, Chery, Nanley G wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>>>> Sent: Tuesday, February 15, 2022 6:56 AM
>>>> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
>>>> <ramalingam.c@intel.com>
>>>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
>>>> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
>>>> dri- devel <dri-devel@lists.freedesktop.org>
>>>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce
>>>> format modifier for DG2 clear color
>>>>
>>>> On 12.2.2022 3.19, Nanley Chery wrote:
>>>>> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com>
>>>> wrote:
>>>>>>
>>>>>> From: Mika Kahola <mika.kahola@intel.com>
>>>>>>
>>>>>> DG2 clear color render compression uses Tile4 layout. Therefore, we
>>>>>> need to define a new format modifier for uAPI to support clear
>>>>>> color
>>>> rendering.
>>>>>>
>>>>>> v2:
>>>>>>      Display version is fixed. [Imre]
>>>>>>      KDoc is enhanced for cc modifier. [Nanley & Lionel]
>>>>>>
>>>>>> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
>>>>>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
>>>>>> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
>>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
>>>>>>     drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
>>>>>>     include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
>>>>>>     3 files changed, 26 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>> b/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>> index 4d4d01963f15..3df6ef5ffec5 100644
>>>>>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
>>>> intel_modifiers[] = {
>>>>>>                    .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
>>>>>>                    .display_ver = { 13, 13 },
>>>>>>                    .plane_caps = INTEL_PLANE_CAP_TILING_4 |
>>>>>> INTEL_PLANE_CAP_CCS_MC,
>>>>>> +       }, {
>>>>>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
>>>>>> +               .display_ver = { 13, 13 },
>>>>>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
>>>>>> + INTEL_PLANE_CAP_CCS_RC_CC,
>>>>>> +
>>>>>> +               .ccs.cc_planes = BIT(1),
>>>>>>            }, {
>>>>>>                    .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>>>>>>                    .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
>>>>>> intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>>>>>>                    else
>>>>>>                            return 512;
>>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>>>>>            case I915_FORMAT_MOD_4_TILED:
>>>>>>                    /*
>>>>>> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
>>>> drm_framebuffer *fb,
>>>>>>            case I915_FORMAT_MOD_Yf_TILED:
>>>>>>                    return 1 * 1024 * 1024;
>>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>>>>>                    return 16 * 1024;
>>>>>>            default:
>>>>>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>> index c38ae0876c15..b4dced1907c5 100644
>>>>>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>>>>>>                    return PLANE_CTL_TILED_4 |
>>>>>>                            PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
>>>>>>                            PLANE_CTL_CLEAR_COLOR_DISABLE;
>>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>>> +               return PLANE_CTL_TILED_4 |
>>>>>> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>>>>>            case I915_FORMAT_MOD_Y_TILED_CCS:
>>>>>>            case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>>>>>>                    return PLANE_CTL_TILED_Y |
>>>>>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>>>>> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct
>>>>>> intel_crtc
>>>> *crtc,
>>>>>>                    break;
>>>>>>            case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
>>>>>>                    if (HAS_4TILE(dev_priv)) {
>>>>>> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>>>>> +                       u32 rc_mask =
>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
>>>>>> +
>>>>>> + PLANE_CTL_CLEAR_COLOR_DISABLE;
>>>>>> +
>>>>>> +                       if ((val & rc_mask) == rc_mask)
>>>>>>                                    fb->modifier =
>> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>>>>>>                            else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>>>>>>                                    fb->modifier =
>>>>>> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
>>>>>> +                       else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>>>>> +                               fb->modifier =
>>>>>> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
>>>>>>                            else
>>>>>>                                    fb->modifier = I915_FORMAT_MOD_4_TILED;
>>>>>>                    } else {
>>>>>> diff --git a/include/uapi/drm/drm_fourcc.h
>>>>>> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84
>>>>>> 100644
>>>>>> --- a/include/uapi/drm/drm_fourcc.h
>>>>>> +++ b/include/uapi/drm/drm_fourcc.h
>>>>>> @@ -605,6 +605,16 @@ extern "C" {
>>>>>>      */
>>>>>>     #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
>>>> fourcc_mod_code(INTEL,
>>>>>> 11)
>>>>>>
>>>>>> +/*
>>>>>> + * Intel color control surfaces (CCS) for DG2 clear color render
>> compression.
>>>>>> + *
>>>>>> + * DG2 uses a unified compression format for clear color render
>>>> compression.
>>>>>
>>>>> What's unified about DG2's compression format? If this doesn't
>>>>> affect the layout, maybe we should drop this sentence.
>>>>>
>>>>>> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
>>>>>> + *
>>>>>
>>>>> This also needs a pitch aligned to four tiles, right? I think we can
>>>>> save some effort by referencing the DG2_RC_CCS modifier here.
>>>>>
>>>>>> + * Fast clear color value expected by HW is located in fb at
>>>>>> + offset
>>>>>> + 0 of plane#1
>>>>>
>>>>> Why is the expected offset hardcoded to 0 instead of relying on the
>>>>> offset provided by the modifier API? This looks like a bug.
>>>>
>>>> Hi Nanley,
>>>>
>>>> can you elaborate a bit, which offset from modifier API that applies to cc
>> surface?
>>>>
>>>
>>> Hi Juha-Pekka,
>>>
>>> On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
>>>
>>
>> Hi Nanley,
>>
>> this offset is coming from userspace on creation of framebuffer, at that moment
>> from userspace caller can point to offset of desire. Normally offset[0] is set at 0
>> and then offset[n] at plane n start which is not stated to have to be exactly after
>> plane n-1 end. Or did I misunderstand what you meant?
>>
> 
> Perhaps, at least, I'm not sure what you're meaning to say. This modifier description
> seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane
> must be zero. Are you saying that it's correct? This doesn't match the
> GEN12_RC_CCS_CC behavior and doesn't match mesa's expectations.
> 

It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color
plane must be zero"; it says "Fast clear color value expected by HW is
located in fb at offset 0 of plane#1".

The location of plane#1 is given by drm_mode_fb_cmd2::offsets[1], and
nothing is stated about that offset.

These offsets are just offsets into the bo that holds the framebuffer
data, so drm_mode_fb_cmd2::offsets[1] can be changed as one wishes, and
the cc information is found starting at byte 0 of the plane that
drm_mode_fb_cmd2::offsets[1] points to.
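
Put as code, the reading I mean is roughly the following. This is only a
sketch of the address arithmetic, not the driver's implementation;
`plane_byte()`, `clear_color_location()` and `CC_PLANE` are hypothetical names,
and the array argument stands in for drm_mode_fb_cmd2::offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the clear color plane for this modifier (plane#1). */
#define CC_PLANE 1u

/*
 * Position, in bytes from the start of the bo, of byte `pos` inside
 * plane `plane`. offsets[] is assumed to hold the per-plane offsets
 * that userspace passed in at framebuffer creation.
 */
static uint64_t plane_byte(const uint32_t offsets[4], unsigned int plane,
			   uint32_t pos)
{
	assert(plane < 4);
	return (uint64_t)offsets[plane] + pos;
}

/*
 * "Offset 0 of plane#1" is relative to the start of plane#1, so the
 * clear color value sits at offsets[1] + 0 within the bo, whatever
 * value userspace chose for offsets[1].
 */
static uint64_t clear_color_location(const uint32_t offsets[4])
{
	return plane_byte(offsets, CC_PLANE, 0);
}
```

So with offsets[1] = 0x210000 the clear color value is expected at byte
0x210000 of the bo, and moving offsets[1] moves it along.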

/Juha-Pekka

> 
>> For cc plane this offset likely will not be zero anyway and caller can move it as see
>> fit to have cc plane (plane[1]) location[0] at place where wanted to have it.
>>
>> /Juha-Pekka
>>
>>>
>>>>>
>>>>> We should probably give some info about the relevant fields in the
>>>>> fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
>>>>
>>>> agree, that's totally missing here.
>>>>
>>>> /Juha-Pekka
>>>>
>>>>>
>>>>>> + */
>>>>>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
>>>> fourcc_mod_code(INTEL,
>>>>>> +12)
>>>>>> +
>>>>>>     /*
>>>>>>      * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>>>>>>      *
>>>>>> --
>>>>>> 2.20.1
>>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 17:31             ` Juha-Pekka Heikkila
@ 2022-02-15 18:24                 ` Chery, Nanley G
  0 siblings, 0 replies; 80+ messages in thread
From: Chery, Nanley G @ 2022-02-15 18:24 UTC (permalink / raw)
  To: juhapekka.heikkila, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel



> -----Original Message-----
> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Sent: Tuesday, February 15, 2022 9:32 AM
> To: Chery, Nanley G <nanley.g.chery@intel.com>; Nanley Chery
> <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>
> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Auld, Matthew
> <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format
> modifier for DG2 clear color
> 
> On 15.2.2022 18.44, Chery, Nanley G wrote:
> >
> >
> >> -----Original Message-----
> >> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> >> Sent: Tuesday, February 15, 2022 8:15 AM
> >> To: Chery, Nanley G <nanley.g.chery@intel.com>; Nanley Chery
> >> <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>
> >> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Auld, Matthew
> >> <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> >> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce
> >> format modifier for DG2 clear color
> >>
> >> On 15.2.2022 17.02, Chery, Nanley G wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> >>>> Sent: Tuesday, February 15, 2022 6:56 AM
> >>>> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
> >>>> <ramalingam.c@intel.com>
> >>>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
> >>>> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
> >>>> dri- devel <dri-devel@lists.freedesktop.org>
> >>>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce
> >>>> format modifier for DG2 clear color
> >>>>
> >>>> On 12.2.2022 3.19, Nanley Chery wrote:
> >>>>> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C
> >>>>> <ramalingam.c@intel.com>
> >>>> wrote:
> >>>>>>
> >>>>>> From: Mika Kahola <mika.kahola@intel.com>
> >>>>>>
> >>>>>> DG2 clear color render compression uses Tile4 layout. Therefore,
> >>>>>> we need to define a new format modifier for uAPI to support clear
> >>>>>> color
> >>>> rendering.
> >>>>>>
> >>>>>> v2:
> >>>>>>      Display version is fixed. [Imre]
> >>>>>>      KDoc is enhanced for cc modifier. [Nanley & Lionel]
> >>>>>>
> >>>>>> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
> >>>>>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
> >>>>>> Signed-off-by: Juha-Pekka Heikkilä
> >>>>>> <juha-pekka.heikkila@intel.com>
> >>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
> >>>>>>     drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
> >>>>>>     include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
> >>>>>>     3 files changed, 26 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
> >>>>>> b/drivers/gpu/drm/i915/display/intel_fb.c
> >>>>>> index 4d4d01963f15..3df6ef5ffec5 100644
> >>>>>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> >>>>>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> >>>>>> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
> >>>> intel_modifiers[] = {
> >>>>>>                    .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
> >>>>>>                    .display_ver = { 13, 13 },
> >>>>>>                    .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> >>>>>> INTEL_PLANE_CAP_CCS_MC,
> >>>>>> +       }, {
> >>>>>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
> >>>>>> +               .display_ver = { 13, 13 },
> >>>>>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> >>>>>> + INTEL_PLANE_CAP_CCS_RC_CC,
> >>>>>> +
> >>>>>> +               .ccs.cc_planes = BIT(1),
> >>>>>>            }, {
> >>>>>>                    .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
> >>>>>>                    .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
> >>>>>> intel_tile_width_bytes(const struct drm_framebuffer *fb, int
> color_plane)
> >>>>>>                    else
> >>>>>>                            return 512;
> >>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>>>>>            case I915_FORMAT_MOD_4_TILED:
> >>>>>>                    /*
> >>>>>> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const
> >>>>>> struct
> >>>> drm_framebuffer *fb,
> >>>>>>            case I915_FORMAT_MOD_Yf_TILED:
> >>>>>>                    return 1 * 1024 * 1024;
> >>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>>>>>            case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>>>>>                    return 16 * 1024;
> >>>>>>            default:
> >>>>>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>>>> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>>>> index c38ae0876c15..b4dced1907c5 100644
> >>>>>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>>>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >>>>>> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
> >>>>>>                    return PLANE_CTL_TILED_4 |
> >>>>>>                            PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
> >>>>>>                            PLANE_CTL_CLEAR_COLOR_DISABLE;
> >>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
> >>>>>> +               return PLANE_CTL_TILED_4 |
> >>>>>> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >>>>>>            case I915_FORMAT_MOD_Y_TILED_CCS:
> >>>>>>            case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
> >>>>>>                    return PLANE_CTL_TILED_Y |
> >>>>>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >>>>>> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct
> >>>>>> intel_crtc
> >>>> *crtc,
> >>>>>>                    break;
> >>>>>>            case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+
> */
> >>>>>>                    if (HAS_4TILE(dev_priv)) {
> >>>>>> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >>>>>> +                       u32 rc_mask =
> >> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> >>>>>> +
> >>>>>> + PLANE_CTL_CLEAR_COLOR_DISABLE;
> >>>>>> +
> >>>>>> +                       if ((val & rc_mask) == rc_mask)
> >>>>>>                                    fb->modifier =
> >> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
> >>>>>>                            else if (val &
> PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> >>>>>>                                    fb->modifier =
> >>>>>> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
> >>>>>> +                       else if (val &
> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >>>>>> +                               fb->modifier =
> >>>>>> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
> >>>>>>                            else
> >>>>>>                                    fb->modifier = I915_FORMAT_MOD_4_TILED;
> >>>>>>                    } else {
> >>>>>> diff --git a/include/uapi/drm/drm_fourcc.h
> >>>>>> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84
> >>>>>> 100644
> >>>>>> --- a/include/uapi/drm/drm_fourcc.h
> >>>>>> +++ b/include/uapi/drm/drm_fourcc.h
> >>>>>> @@ -605,6 +605,16 @@ extern "C" {
> >>>>>>      */
> >>>>>>     #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
> >>>> fourcc_mod_code(INTEL,
> >>>>>> 11)
> >>>>>>
> >>>>>> +/*
> >>>>>> + * Intel color control surfaces (CCS) for DG2 clear color render
> >> compression.
> >>>>>> + *
> >>>>>> + * DG2 uses a unified compression format for clear color render
> >>>> compression.
> >>>>>
> >>>>> What's unified about DG2's compression format? If this doesn't
> >>>>> affect the layout, maybe we should drop this sentence.
> >>>>>
> >>>>>> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> >>>>>> + *
> >>>>>
> >>>>> This also needs a pitch aligned to four tiles, right? I think we
> >>>>> can save some effort by referencing the DG2_RC_CCS modifier here.
> >>>>>
> >>>>>> + * Fast clear color value expected by HW is located in fb at
> >>>>>> + offset
> >>>>>> + 0 of plane#1
> >>>>>
> >>>>> Why is the expected offset hardcoded to 0 instead of relying on
> >>>>> the offset provided by the modifier API? This looks like a bug.
> >>>>
> >>>> Hi Nanley,
> >>>>
> >>>> can you elaborate a bit, which offset from modifier API that
> >>>> applies to cc
> >> surface?
> >>>>
> >>>
> >>> Hi Juha-Pekka,
> >>>
> >>> On the kernel-side of things, I'm thinking of
> drm_mode_fb_cmd2::offsets[1].
> >>>
> >>
> >> Hi Nanley,
> >>
> >> this offset is coming from userspace on creation of framebuffer, at
> >> that moment from userspace caller can point to offset of desire.
> >> Normally offset[0] is set at 0 and then offset[n] at plane n start
> >> which is not stated to have to be exactly after plane n-1 end. Or did I
> misunderstand what you meant?
> >>
> >
> > Perhaps, at least, I'm not sure what you're meaning to say. This
> > modifier description seems to say that the drm_mode_fb_cmd2::offsets
> > value for the clear color plane must be zero. Are you saying that it's
> > correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
> match mesa's expectations.
> >
> 
> It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must
> be zero", it says "Fast clear color value expected by HW is located in fb at offset 0
> of plane#1".
> 

Yes, it doesn't say that exactly, but that's what it seems to say. With every other
modifier, it's implied that the data for the plane begins at the offset specified
through the modifier API. So, explicitly mentioning it here (and with that wording)
conveys a new requirement.

> Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's
> nothing stated about that offset.
> 

Technically, plane #1's location is specified to be the combination of ::handles[1]
and ::offsets[1]. In practice though, I can imagine that there are areas of the stack
that are implicitly requiring that all ::handles[] entries match.
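
To illustrate the point about ::handles[] and ::offsets[], here is a minimal
sketch, not taken from the patch. The struct is a cut-down stand-in for
struct drm_mode_fb_cmd2, and plane_start() and all_handles_match() are
hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PLANES 4

/* Cut-down stand-in for struct drm_mode_fb_cmd2 (illustration only). */
struct fb_cmd2_sketch {
	uint32_t handles[SKETCH_MAX_PLANES]; /* GEM handle backing each plane */
	uint32_t offsets[SKETCH_MAX_PLANES]; /* byte offset of each plane in its bo */
};

/* Hypothetical type: a plane's start is a (bo, byte offset) pair. */
struct plane_location {
	uint32_t handle;
	uint32_t offset;
};

/* The data of plane `plane` begins at offsets[plane] inside handles[plane]. */
static struct plane_location plane_start(const struct fb_cmd2_sketch *cmd,
					 int plane)
{
	struct plane_location loc;

	assert(plane >= 0 && plane < SKETCH_MAX_PLANES);
	loc.handle = cmd->handles[plane];
	loc.offset = cmd->offsets[plane];
	return loc;
}

/*
 * Hypothetical check for the implicit assumption mentioned above:
 * every plane is backed by the same buffer object.
 */
static int all_handles_match(const struct fb_cmd2_sketch *cmd, int num_planes)
{
	int i;

	assert(num_planes >= 1 && num_planes <= SKETCH_MAX_PLANES);
	for (i = 1; i < num_planes; i++)
		if (cmd->handles[i] != cmd->handles[0])
			return 0;
	return 1;
}
```

With this model, "offset 0 of plane #1" simply means the first byte at
offsets[1] within handles[1], whatever value userspace chose for offsets[1].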

> These offsets are just offsets to bo which contain the framebuffer information
> hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc
> information is found starting at drm_mode_fb_cmd2::offsets[1][0]
> 

If the clear color handling is the same as GEN12_RC_CCS_CC (apart from the plane
index), I propose that we drop this sentence to avoid any confusion.

This offset discussion raises another question. The description says that the value
expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine?
The kernel is still giving the display engine the packed values at ::offsets[1] + 16B, right?
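
For reference, a rough sketch of what I mean by "+ 16B". It assumes the DG2
clear color plane keeps the layout documented for GEN12_RC_CCS_CC in
drm_fourcc.h (128 bits of raw color, then 64 bits of converted color), which
is the very thing being asked. The struct and helper names are made up for
illustration and are not the i915 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clear color block as described for GEN12_RC_CCS_CC (256 bits). */
struct clear_color_sketch {
	uint32_t raw[4];       /* bytes 0..15: raw R, G, B, A for the render engine */
	uint32_t converted_lo; /* bytes 16..19: lower converted clear color */
	uint32_t converted_hi; /* bytes 20..23: higher converted clear color */
	uint32_t ignored[2];   /* bytes 24..31: not used by the display engine */
};

/* Byte offset of the converted (display) value inside the clear color plane. */
#define CC_CONVERTED_OFFSET offsetof(struct clear_color_sketch, converted_lo)

/*
 * Hypothetical helper: fetch the 64-bit converted value that the display
 * engine would consume, given a CPU mapping of the bo and the plane offset
 * (i.e. ::offsets[cc_plane]).
 */
static uint64_t read_display_clear_color(const uint8_t *bo, size_t bo_size,
					 uint32_t cc_plane_offset)
{
	uint64_t val;

	assert((size_t)cc_plane_offset + CC_CONVERTED_OFFSET + sizeof(val) <= bo_size);
	memcpy(&val, bo + cc_plane_offset + CC_CONVERTED_OFFSET, sizeof(val));
	return val;
}
```

So the render engine would use the raw values at byte 0 of the plane and the
display engine the converted values 16 bytes in, both relative to
::offsets[cc_plane].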

-Nanley

> /Juha-Pekka
> 
> >
> >> For cc plane this offset likely will not be zero anyway and caller
> >> can move it as see fit to have cc plane (plane[1]) location[0] at place where
> wanted to have it.
> >>
> >> /Juha-Pekka
> >>
> >>>
> >>>>>
> >>>>> We should probably give some info about the relevant fields in the
> >>>>> fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
> >>>>
> >>>> agree, that's totally missing here.
> >>>>
> >>>> /Juha-Pekka
> >>>>
> >>>>>
> >>>>>> + */
> >>>>>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
> >>>> fourcc_mod_code(INTEL,
> >>>>>> +12)
> >>>>>> +
> >>>>>>     /*
> >>>>>>      * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> >>>>>>      *
> >>>>>> --
> >>>>>> 2.20.1
> >>>>>>
> >>>
> >


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 18:24                 ` Chery, Nanley G
  (?)
@ 2022-02-15 19:34                 ` Juha-Pekka Heikkila
  2022-03-21 13:20                   ` Imre Deak
  -1 siblings, 1 reply; 80+ messages in thread
From: Juha-Pekka Heikkila @ 2022-02-15 19:34 UTC (permalink / raw)
  To: Chery, Nanley G, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel

On 15.2.2022 20.24, Chery, Nanley G wrote:
> 
> 
>> -----Original Message-----
>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>> Sent: Tuesday, February 15, 2022 9:32 AM
>> To: Chery, Nanley G <nanley.g.chery@intel.com>; Nanley Chery
>> <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>
>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Auld, Matthew
>> <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format
>> modifier for DG2 clear color
>>
>> On 15.2.2022 18.44, Chery, Nanley G wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>>>> Sent: Tuesday, February 15, 2022 8:15 AM
>>>> To: Chery, Nanley G <nanley.g.chery@intel.com>; Nanley Chery
>>>> <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>
>>>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Auld, Matthew
>>>> <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
>>>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce
>>>> format modifier for DG2 clear color
>>>>
>>>> On 15.2.2022 17.02, Chery, Nanley G wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>>>>>> Sent: Tuesday, February 15, 2022 6:56 AM
>>>>>> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
>>>>>> <ramalingam.c@intel.com>
>>>>>> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
>>>>>> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>;
>>>>>> dri- devel <dri-devel@lists.freedesktop.org>
>>>>>> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce
>>>>>> format modifier for DG2 clear color
>>>>>>
>>>>>> On 12.2.2022 3.19, Nanley Chery wrote:
>>>>>>> On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C
>>>>>>> <ramalingam.c@intel.com>
>>>>>> wrote:
>>>>>>>>
>>>>>>>> From: Mika Kahola <mika.kahola@intel.com>
>>>>>>>>
>>>>>>>> DG2 clear color render compression uses Tile4 layout. Therefore,
>>>>>>>> we need to define a new format modifier for uAPI to support clear
>>>>>>>> color
>>>>>> rendering.
>>>>>>>>
>>>>>>>> v2:
>>>>>>>>       Display version is fixed. [Imre]
>>>>>>>>       KDoc is enhanced for cc modifier. [Nanley & Lionel]
>>>>>>>>
>>>>>>>> Signed-off-by: Mika Kahola <mika.kahola@intel.com>
>>>>>>>> cc: Anshuman Gupta <anshuman.gupta@intel.com>
>>>>>>>> Signed-off-by: Juha-Pekka Heikkilä
>>>>>>>> <juha-pekka.heikkila@intel.com>
>>>>>>>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/i915/display/intel_fb.c            |  8 ++++++++
>>>>>>>>      drivers/gpu/drm/i915/display/skl_universal_plane.c |  9 ++++++++-
>>>>>>>>      include/uapi/drm/drm_fourcc.h                      | 10 ++++++++++
>>>>>>>>      3 files changed, 26 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>>>> b/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>>>> index 4d4d01963f15..3df6ef5ffec5 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
>>>>>>>> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
>>>>>> intel_modifiers[] = {
>>>>>>>>                     .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
>>>>>>>>                     .display_ver = { 13, 13 },
>>>>>>>>                     .plane_caps = INTEL_PLANE_CAP_TILING_4 |
>>>>>>>> INTEL_PLANE_CAP_CCS_MC,
>>>>>>>> +       }, {
>>>>>>>> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
>>>>>>>> +               .display_ver = { 13, 13 },
>>>>>>>> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
>>>>>>>> + INTEL_PLANE_CAP_CCS_RC_CC,
>>>>>>>> +
>>>>>>>> +               .ccs.cc_planes = BIT(1),
>>>>>>>>             }, {
>>>>>>>>                     .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>>>>>>>>                     .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
>>>>>>>> intel_tile_width_bytes(const struct drm_framebuffer *fb, int
>> color_plane)
>>>>>>>>                     else
>>>>>>>>                             return 512;
>>>>>>>>             case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>>>>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>>>>>             case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>>>>>>>             case I915_FORMAT_MOD_4_TILED:
>>>>>>>>                     /*
>>>>>>>> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const
>>>>>>>> struct
>>>>>> drm_framebuffer *fb,
>>>>>>>>             case I915_FORMAT_MOD_Yf_TILED:
>>>>>>>>                     return 1 * 1024 * 1024;
>>>>>>>>             case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
>>>>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>>>>>             case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
>>>>>>>>                     return 16 * 1024;
>>>>>>>>             default:
>>>>>>>> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>>>> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>>>> index c38ae0876c15..b4dced1907c5 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
>>>>>>>> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>>>>>>>>                     return PLANE_CTL_TILED_4 |
>>>>>>>>                             PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
>>>>>>>>                             PLANE_CTL_CLEAR_COLOR_DISABLE;
>>>>>>>> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
>>>>>>>> +               return PLANE_CTL_TILED_4 |
>>>>>>>> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>>>>>>>             case I915_FORMAT_MOD_Y_TILED_CCS:
>>>>>>>>             case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>>>>>>>>                     return PLANE_CTL_TILED_Y |
>>>>>>>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
>>>>>>>> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct
>>>>>>>> intel_crtc
>>>>>> *crtc,
>>>>>>>>                     break;
>>>>>>>>             case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+
>> */
>>>>>>>>                     if (HAS_4TILE(dev_priv)) {
>>>>>>>> -                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>>>>>>> +                       u32 rc_mask =
>>>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
>>>>>>>> +
>>>>>>>> + PLANE_CTL_CLEAR_COLOR_DISABLE;
>>>>>>>> +
>>>>>>>> +                       if ((val & rc_mask) == rc_mask)
>>>>>>>>                                     fb->modifier =
>>>> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>>>>>>>>                             else if (val &
>> PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>>>>>>>>                                     fb->modifier =
>>>>>>>> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
>>>>>>>> +                       else if (val &
>> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>>>>>>>> +                               fb->modifier =
>>>>>>>> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
>>>>>>>>                             else
>>>>>>>>                                     fb->modifier = I915_FORMAT_MOD_4_TILED;
>>>>>>>>                     } else {
>>>>>>>> diff --git a/include/uapi/drm/drm_fourcc.h
>>>>>>>> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84
>>>>>>>> 100644
>>>>>>>> --- a/include/uapi/drm/drm_fourcc.h
>>>>>>>> +++ b/include/uapi/drm/drm_fourcc.h
>>>>>>>> @@ -605,6 +605,16 @@ extern "C" {
>>>>>>>>       */
>>>>>>>>      #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
>>>>>> fourcc_mod_code(INTEL,
>>>>>>>> 11)
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * Intel color control surfaces (CCS) for DG2 clear color render
>>>> compression.
>>>>>>>> + *
>>>>>>>> + * DG2 uses a unified compression format for clear color render
>>>>>> compression.
>>>>>>>
>>>>>>> What's unified about DG2's compression format? If this doesn't
>>>>>>> affect the layout, maybe we should drop this sentence.
>>>>>>>
>>>>>>>> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
>>>>>>>> + *
>>>>>>>
>>>>>>> This also needs a pitch aligned to four tiles, right? I think we
>>>>>>> can save some effort by referencing the DG2_RC_CCS modifier here.
>>>>>>>
>>>>>>>> + * Fast clear color value expected by HW is located in fb at
>>>>>>>> + offset
>>>>>>>> + 0 of plane#1
>>>>>>>
>>>>>>> Why is the expected offset hardcoded to 0 instead of relying on
>>>>>>> the offset provided by the modifier API? This looks like a bug.
>>>>>>
>>>>>> Hi Nanley,
>>>>>>
>>>>>> can you elaborate a bit, which offset from modifier API that
>>>>>> applies to cc
>>>> surface?
>>>>>>
>>>>>
>>>>> Hi Juha-Pekka,
>>>>>
>>>>> On the kernel side of things, I'm thinking of
>>>>> drm_mode_fb_cmd2::offsets[1].
>>>>>
>>>>
>>>> Hi Nanley,
>>>>
>>>> this offset comes from userspace when the framebuffer is created; at
>>>> that moment the userspace caller can point it at whatever offset it wants.
>>>> Normally offset[0] is set to 0 and offset[n] to the start of plane n,
>>>> which is not required to be exactly after the end of plane n-1. Or did I
>>>> misunderstand what you meant?
>>>>
>>>
>>> Perhaps; at least, I'm not sure what you mean to say. This
>>> modifier description seems to say that the drm_mode_fb_cmd2::offsets
>>> value for the clear color plane must be zero. Are you saying that it's
>>> correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
>>> match mesa's expectations.
>>>
>>
>> It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must
>> be zero", it says "Fast clear color value expected by HW is located in fb at offset 0
>> of plane#1".
>>
> 
> Yes, it doesn't say that exactly, but that's what it seems to say. With every other
> modifier, it's implied that the data for the plane begins at the offset specified
> through the modifier API. So, explicitly mentioning it here (and with that wording)
> conveys a new requirement.

I have no objection to changing this description, but for reference the
gen12 version of the same says "The main surface is Y-tiled and is at
plane index 0 whereas CCS is linear and at index 1. The clear color is
stored at index 2, and the pitch should be ignored."; only plane indexes
are mentioned. In any case, I wrote neither of these descriptions.

> 
>> Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's
>> nothing stated about that offset.
>>
> 
> Technically, plane #1's location is specified to be the combination of ::handles[1]
> and ::offsets[1]. In practice though, I can imagine that there are areas of the stack
> that are implicitly requiring that all ::handles[] entries match.

I didn't think we needed to go deeper, as you had only started to talk
about drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.

> 
>> These offsets are just offsets into the bo which contains the framebuffer data,
>> hence drm_mode_fb_cmd2::offsets[1] can be changed as one wishes, and the cc
>> information is found starting at drm_mode_fb_cmd2::offsets[1][0]
>>
> 
> If the clear color handling is the same as GEN12_RC_CCS_CC (apart from the plane
> index), I propose that we drop this sentence to avoid any confusion.
> 

But it needs to be defined as part of the modifier. It's the modifier
features which are being described here.

> This offset discussion raises another question. The description says that the value
> expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine?
> The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?

Generally the answer is yes, but these parts can be seen in patch
"[PATCH v5 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed
there. Here "HW" should probably be changed to something meaningful, though.

/Juha-Pekka
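
As a reading aid for the offset discussion above, here is a minimal sketch of how a consumer might locate the clear color data. It assumes, as suggested in the thread, that plane #1 begins at drm_mode_fb_cmd2::offsets[1] within the buffer named by ::handles[1], that the value read by the render engine sits at byte 0 of that plane, and that the packed value for the display engine sits 16 bytes further in. The struct `fb_planes`, the macros, and both helpers are hypothetical names used only for illustration; the 16-byte figure comes from the question asked in this thread, not from a spec.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the per-plane fields of
 * struct drm_mode_fb_cmd2; only what this sketch needs. */
struct fb_planes {
	uint32_t handles[4];
	uint32_t offsets[4];
};

/* Clear color plane index for the proposed DG2 modifier (per the thread). */
#define CC_PLANE 1
/* Assumption taken from the thread: packed display value at +16 bytes. */
#define CC_DISPLAY_DELTA 16u

/* Byte offset, within the bo named by handles[CC_PLANE], of the clear
 * color value at "offset 0 of plane#1". */
static uint64_t cc_render_offset(const struct fb_planes *fb)
{
	return fb->offsets[CC_PLANE];
}

/* Byte offset of the packed value assumed to be read by the display engine. */
static uint64_t cc_display_offset(const struct fb_planes *fb)
{
	return (uint64_t)fb->offsets[CC_PLANE] + CC_DISPLAY_DELTA;
}
```

Under these assumptions, a framebuffer created with offsets[1] = 0x200000 would place the render value at byte 0x200000 and the display value at byte 0x200010 of the buffer object; "offset 0 of plane#1" is relative to the plane start, not to the start of the bo.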

>>
>>>
>>>> For the cc plane this offset will likely not be zero anyway, and the caller
>>>> can move it as it sees fit to have the cc plane (plane[1]) location[0] wherever
>>>> it wants to have it.
>>>>
>>>> /Juha-Pekka
>>>>
>>>>>
>>>>>>>
>>>>>>> We should probably give some info about the relevant fields in the
>>>>>>> fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
>>>>>>
>>>>>> agree, that's totally missing here.
>>>>>>
>>>>>> /Juha-Pekka
>>>>>>
>>>>>>>
>>>>>>>> + */
>>>>>>>> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
>>>>>>>> +
>>>>>>>>      /*
>>>>>>>>       * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
>>>>>>>>       *
>>>>>>>> --
>>>>>>>> 2.20.1
>>>>>>>>
>>>>>
>>>
> 
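
For readers unfamiliar with how the modifier values in this patch are formed, the sketch below mirrors, to the best of my understanding, what fourcc_mod_code() in drm_fourcc.h does: the vendor ID goes into the top 8 bits and the vendor-specific value into the low 56 bits. The helper names `mod_code`, `mod_vendor` and `mod_value` are hypothetical; 0x01 is assumed to match DRM_FORMAT_MOD_VENDOR_INTEL.

```c
#include <stdint.h>

/* Assumed to match DRM_FORMAT_MOD_VENDOR_INTEL in drm_fourcc.h. */
#define MOD_VENDOR_INTEL 0x01ull

/* Pack a vendor ID (top 8 bits) and a vendor value (low 56 bits). */
static uint64_t mod_code(uint64_t vendor, uint64_t val)
{
	return (vendor << 56) | (val & 0x00ffffffffffffffull);
}

/* Recover the vendor ID from a packed modifier. */
static uint8_t mod_vendor(uint64_t modifier)
{
	return (uint8_t)(modifier >> 56);
}

/* Recover the vendor-specific value from a packed modifier. */
static uint64_t mod_value(uint64_t modifier)
{
	return modifier & 0x00ffffffffffffffull;
}
```

With this packing, the values 10, 11 and 12 used for the DG2 RC_CCS, MC_CCS and RC_CCS_CC modifiers differ only in the low bits and share the Intel vendor byte.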


^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-02-15 14:53     ` Juha-Pekka Heikkila
@ 2022-02-17 17:15         ` Chery, Nanley G
  0 siblings, 0 replies; 80+ messages in thread
From: Chery, Nanley G @ 2022-02-17 17:15 UTC (permalink / raw)
  To: juhapekka.heikkila, Nanley Chery, C, Ramalingam
  Cc: intel-gfx, Auld, Matthew, dri-devel



> -----Original Message-----
> From: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Sent: Tuesday, February 15, 2022 6:54 AM
> To: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam
> <ramalingam.c@intel.com>
> Cc: intel-gfx <intel-gfx@lists.freedesktop.org>; Chery, Nanley G
> <nanley.g.chery@intel.com>; Auld, Matthew <matthew.auld@intel.com>; dri-
> devel <dri-devel@lists.freedesktop.org>
> Subject: Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified
> compression
> 
> On 12.2.2022 3.17, Nanley Chery wrote:
> > On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C <ramalingam.c@intel.com>
> wrote:
> >>
> >> From: Matt Roper <matthew.d.roper@intel.com>
> >>
> >> DG2 unifies render compression and media compression into a single
> >> format for the first time.  The programming and buffer layout is
> >> supposed to match compression on older gen12 platforms, but the
> >> actual compression algorithm is different from any previous platform;
> >> as such, we need a new framebuffer modifier to represent buffers in
> >> this format, but otherwise we can re-use the existing gen12 compression
> driver logic.
> >>
> >> v2:
> >>    Display version fix [Imre]
> >>
> >> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> >> cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
> >> Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
> >> cc: Anshuman Gupta <anshuman.gupta@intel.com>
> >> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> >> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/display/intel_fb.c       | 13 ++++++++++
> >>   .../drm/i915/display/skl_universal_plane.c    | 26 ++++++++++++++++---
> >>   include/uapi/drm/drm_fourcc.h                 | 22 ++++++++++++++++
> >>   3 files changed, 57 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c
> >> b/drivers/gpu/drm/i915/display/intel_fb.c
> >> index 94c57facbb46..4d4d01963f15 100644
> >> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> >> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> >> @@ -141,6 +141,14 @@ struct intel_modifier_desc {
> >>
> >>   static const struct intel_modifier_desc intel_modifiers[] = {
> >>          {
> >> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
> >> +               .display_ver = { 13, 13 },
> >> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> INTEL_PLANE_CAP_CCS_MC,
> >> +       }, {
> >> +               .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
> >> +               .display_ver = { 13, 13 },
> >> +               .plane_caps = INTEL_PLANE_CAP_TILING_4 |
> INTEL_PLANE_CAP_CCS_RC,
> >> +       }, {
> >>                  .modifier = I915_FORMAT_MOD_4_TILED,
> >>                  .display_ver = { 13, 13 },
> >>                  .plane_caps = INTEL_PLANE_CAP_TILING_4, @@ -550,6
> >> +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int
> color_plane)
> >>                          return 128;
> >>                  else
> >>                          return 512;
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >>          case I915_FORMAT_MOD_4_TILED:
> >>                  /*
> >>                   * Each 4K tile consists of 64B(8*8) subtiles, with
> >> @@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct
> drm_framebuffer *fb,
> >>          case I915_FORMAT_MOD_4_TILED:
> >>          case I915_FORMAT_MOD_Yf_TILED:
> >>                  return 1 * 1024 * 1024;
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >> +               return 16 * 1024;
> >>          default:
> >>                  MISSING_CASE(fb->modifier);
> >>                  return 0;
> >> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> index 5299dfe68802..c38ae0876c15 100644
> >> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> >> @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
> >>                  return PLANE_CTL_TILED_Y;
> >>          case I915_FORMAT_MOD_4_TILED:
> >>                  return PLANE_CTL_TILED_4;
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
> >> +               return PLANE_CTL_TILED_4 |
> >> +                       PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> >> +                       PLANE_CTL_CLEAR_COLOR_DISABLE;
> >> +       case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
> >> +               return PLANE_CTL_TILED_4 |
> >> +                       PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
> >> +                       PLANE_CTL_CLEAR_COLOR_DISABLE;
> >>          case I915_FORMAT_MOD_Y_TILED_CCS:
> >>          case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
> >>                  return PLANE_CTL_TILED_Y |
> >> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> >> @@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct
> drm_i915_private *i915,
> >>          if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0))
> >>                  return false;
> >>
> >> +       /* Wa_14013215631 */
> >> +       if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
> >> +               return false;
> >> +
> >>          return plane_id < PLANE_SPRITE4;
> >>   }
> >>
> >> @@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
> >>          case PLANE_CTL_TILED_Y:
> >>                  plane_config->tiling = I915_TILING_Y;
> >>                  if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >> -                       fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
> >> -                               I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
> >> -                               I915_FORMAT_MOD_Y_TILED_CCS;
> >> +                       if (DISPLAY_VER(dev_priv) >= 12)
> >> +                               fb->modifier =
> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
> >> +                       else
> >> +                               fb->modifier =
> >> + I915_FORMAT_MOD_Y_TILED_CCS;
> >>                  else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> >>                          fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS;
> >>                  else
> >> @@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
> >>                  break;
> >>          case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
> >>                  if (HAS_4TILE(dev_priv)) {
> >> -                       fb->modifier = I915_FORMAT_MOD_4_TILED;
> >> +                       if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
> >> +                       else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> >> +                               fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
> >> +                       else
> >> +                               fb->modifier =
> >> + I915_FORMAT_MOD_4_TILED;
> >>                  } else {
> >>                          if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> >>                                  fb->modifier =
> >> I915_FORMAT_MOD_Yf_TILED_CCS; diff --git
> >> a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index
> >> b73fe6797fc3..b8fb7b44c03c 100644
> >> --- a/include/uapi/drm/drm_fourcc.h
> >> +++ b/include/uapi/drm/drm_fourcc.h
> >> @@ -583,6 +583,28 @@ extern "C" {
> >>    */
> >>   #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
> >>
> >> +/*
> >> + * Intel color control surfaces (CCS) for DG2 render compression.
> >> + *
> >> + * DG2 uses a new compression format for render compression. The general
> >> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> >> + * but a new hashing/compression algorithm is used, so a fresh modifier must
> >> + * be associated with buffers of this type. Render compression uses 128 byte
> >> + * compression blocks.
> >
> > I think I've seen a way to configure the compression block size on TGL
> > at least. I can't find the spec text for that at the moment though...
> > Could we omit these mentions?
> 
> Not sure why the general possibility of changing the compression block size is
> relevant? All hw features can be changed, but this defines how this modifier is
> implemented.
> 

I was concerned about compatibility between the different modes, but I've
looked into the restrictions here and don't see any problems with this.

> Say you take an I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer, including the
> control surface, and copy it out, then come back and restore the framebuffer with
> the same information. Is it expected to be valid?
> 
> /Juha-Pekka
> 
> >
> >> + */
> >> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
> >> +
> >
> > How about something like:
> >
> > The main surface is Tile 4 and at plane index 0. The CCS plane is
> > hidden from userspace. The main surface pitch is required to be a
> > multiple of four Tile 4 widths. The CCS is configured with the render
> > compression format associated with the main surface format.
> >

Actually, let's omit the last sentence. CCS has always been affected
by the main surface format, so I don't think there's a need to mention it
specifically for the DG2 modifier.

We do need to mention the 4-tile-wide pitch requirement though.

-Nanley
 
> > ....I think the CCS is technically accessible via the blitter engine,
> > so the part about the plane being "hidden" may need some tweaking.
> >
> >
> > -Nanley
> >
> >> +/*
> >> + * Intel color control surfaces (CCS) for DG2 media compression.
> >> + *
> >> + * DG2 uses a new compression format for media compression. The general
> >> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> >> + * but a new hashing/compression algorithm is used, so a fresh modifier must
> >> + * be associated with buffers of this type. Media compression uses 256 byte
> >> + * compression blocks.
> >> + */
> >> +#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
> >> +
> >>   /*
> >>    * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> >>    *
> >> --
> >> 2.20.1
> >>
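
The 4-tile-wide pitch requirement mentioned above could be checked along these lines. This is only a sketch: it assumes a Tile 4 tile is 128 bytes wide (a 4 KiB tile of 32 rows), which is my reading of intel_tile_width_bytes() rather than something stated in this mail, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a Tile 4 tile is 128 bytes wide (4 KiB tile, 32 rows). */
#define TILE4_WIDTH_BYTES 128u
/* From the review: the main surface pitch must be a multiple of four tiles. */
#define DG2_CCS_PITCH_TILES 4u
#define DG2_CCS_PITCH_ALIGN (TILE4_WIDTH_BYTES * DG2_CCS_PITCH_TILES)

/* True if the main surface pitch satisfies the 4-tile-wide requirement. */
static bool dg2_ccs_pitch_ok(uint32_t pitch_bytes)
{
	return pitch_bytes != 0 && (pitch_bytes % DG2_CCS_PITCH_ALIGN) == 0;
}

/* Round a pitch up to the next multiple of four Tile 4 widths. */
static uint32_t dg2_ccs_align_pitch(uint32_t pitch_bytes)
{
	return (pitch_bytes + DG2_CCS_PITCH_ALIGN - 1) /
	       DG2_CCS_PITCH_ALIGN * DG2_CCS_PITCH_ALIGN;
}
```

Under the 128-byte assumption the alignment works out to 512 bytes, so a 1920-pixel-wide XRGB8888 surface (7680 bytes per row) already complies, while a 1366-pixel-wide one (5464 bytes) would need its pitch padded to 5632 bytes.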


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
  2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
@ 2022-02-18  5:39     ` Lucas De Marchi
  -1 siblings, 0 replies; 80+ messages in thread
From: Lucas De Marchi @ 2022-02-18  5:39 UTC (permalink / raw)
  To: Ramalingam C
  Cc: Daniel Vetter, intel-gfx, Kenneth Graunke, dri-devel,
	Slawomir Milczarek, Matthew Auld, mesa-dev

On Tue, Feb 01, 2022 at 04:11:22PM +0530, Ramalingam C wrote:
>Details of the 64k pagesize support added as part of DG2 enabling and its
>implicit impact on the uAPI.
>
>v2: improvised the Flat-CCS documentation [Danvet & CQ]
>v3: made only for 64k pagesize support
>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>cc: Matthew Auld <matthew.auld@intel.com>
>cc: Simon Ser <contact@emersion.fr>
>cc: Pekka Paalanen <ppaalanen@gmail.com>
>Cc: Jordan Justen <jordan.l.justen@intel.com>
>Cc: Kenneth Graunke <kenneth@whitecape.org>
>Cc: mesa-dev@lists.freedesktop.org
>Cc: Tony Ye <tony.ye@intel.com>
>Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
>---
> Documentation/gpu/rfc/i915_dg2.rst | 25 +++++++++++++++++++++++++
> Documentation/gpu/rfc/index.rst    |  3 +++
> 2 files changed, 28 insertions(+)
> create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
>
>diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
>new file mode 100644
>index 000000000000..f4eb5a219897
>--- /dev/null
>+++ b/Documentation/gpu/rfc/i915_dg2.rst
>@@ -0,0 +1,25 @@
>+====================
>+I915 DG2 RFC Section
>+====================
>+
>+Upstream plan
>+=============
>+Plan to upstream the DG2 enabling is:
>+
>+* Merge basic HW enabling for DG2 (Still without pciid)
>+* Merge the 64k support for lmem
>+* Merge the flat CCS enabling patches
>+* Add the pciid for DG2 and enable the DG2 in CI

Does this make sense after the fact? On an earlier version of this patch,
Daniel Vetter asked for this to be moved to be the first patch. I see
you added it in the cover letter, but keeping this in
gpu/rfc/i915_dg2.rst doesn't make much sense IMO. Maybe just drop this
patch?

Lucas De Marchi

>+
>+
>+64K page support for lmem
>+=========================
>+On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not
>+supported anymore.
>+
>+DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt
>+Page table. Refer the struct drm_i915_gem_create_ext for the implication of
>+handling the 64k page size.
>+
>+.. kernel-doc:: include/uapi/drm/i915_drm.h
>+        :functions: drm_i915_gem_create_ext
>diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
>index 91e93a705230..afb320ed4028 100644
>--- a/Documentation/gpu/rfc/index.rst
>+++ b/Documentation/gpu/rfc/index.rst
>@@ -20,6 +20,9 @@ host such documentation:
>
>     i915_gem_lmem.rst
>
>+.. toctree::
>+    i915_dg2.rst
>+
> .. toctree::
>
>     i915_scheduler.rst
>-- 
>2.20.1
>
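
To make the 64K restriction in the quoted documentation concrete, here is a small sketch of the rounding that userspace can expect. It combines the 64K minimum page size described in the patch with the 2M GTT alignment and size padding described in the cover letter of this series. The helper names are hypothetical and this is not the kernel's implementation.

```c
#include <stdint.h>

#define SZ_64K (64ull * 1024)
#define SZ_2M  (2ull * 1024 * 1024)

/* Round v up to a multiple of a, where a must be a power of two. */
static uint64_t round_up_pow2(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Size the kernel is expected to report for a device-memory object:
 * rounded up to the 64K minimum page size (4K becomes 64K). */
static uint64_t dg2_lmem_object_size(uint64_t requested)
{
	return round_up_pow2(requested, SZ_64K);
}

/* GTT start address for such an object, aligned up to 2M (cover letter). */
static uint64_t dg2_lmem_gtt_start(uint64_t addr)
{
	return round_up_pow2(addr, SZ_2M);
}

/* Virtual address range consumed by the mapping, padded to 2M so that a
 * PDE never mixes 4K and 64K PTEs. */
static uint64_t dg2_lmem_gtt_size(uint64_t object_size)
{
	return round_up_pow2(object_size, SZ_2M);
}
```

Per the cover letter, the padding only costs virtual address space: a 4K request becomes a 64K object and occupies a full 2M slot in the GTT.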

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 09/19] Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
  2022-02-18  5:39     ` Lucas De Marchi
@ 2022-02-18  8:20       ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-18  8:20 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Daniel Vetter, intel-gfx, Kenneth Graunke, dri-devel,
	Slawomir Milczarek, Matthew Auld, mesa-dev

On 2022-02-17 at 21:39:16 -0800, Lucas De Marchi wrote:
> On Tue, Feb 01, 2022 at 04:11:22PM +0530, Ramalingam C wrote:
> > Details of the 64k pagesize support added as part of DG2 enabling and its
> > implicit impact on the uAPI.
> > 
> > v2: improvised the Flat-CCS documentation [Danvet & CQ]
> > v3: made only for 64k pagesize support
> > 
> > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > cc: Matthew Auld <matthew.auld@intel.com>
> > cc: Simon Ser <contact@emersion.fr>
> > cc: Pekka Paalanen <ppaalanen@gmail.com>
> > Cc: Jordan Justen <jordan.l.justen@intel.com>
> > Cc: Kenneth Graunke <kenneth@whitecape.org>
> > Cc: mesa-dev@lists.freedesktop.org
> > Cc: Tony Ye <tony.ye@intel.com>
> > Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
> > ---
> > Documentation/gpu/rfc/i915_dg2.rst | 25 +++++++++++++++++++++++++
> > Documentation/gpu/rfc/index.rst    |  3 +++
> > 2 files changed, 28 insertions(+)
> > create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
> > 
> > diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
> > new file mode 100644
> > index 000000000000..f4eb5a219897
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/i915_dg2.rst
> > @@ -0,0 +1,25 @@
> > +====================
> > +I915 DG2 RFC Section
> > +====================
> > +
> > +Upstream plan
> > +=============
> > +Plan to upstream the DG2 enabling is:
> > +
> > +* Merge basic HW enabling for DG2 (Still without pciid)
> > +* Merge the 64k support for lmem
> > +* Merge the flat CCS enabling patches
> > +* Add the pciid for DG2 and enable the DG2 in CI
> 
> Does this make sense after the fact? On an earlier version of this patch,
> Daniel Vetter asked for this to be moved to be the first patch. I see
> you added it to the cover letter, but keeping this in
> gpu/rfc/i915_dg2.rst doesn't make much sense IMO. Maybe just drop this
> patch?

Yes. I couldn't move this to the start of the series because the kdoc
referenced here comes from later patches in the series.

But now that we have the kdoc for the uAPI in the respective patches
themselves, we could drop this patch.

Daniel, I hope you agree with that?

Ram.
> 
> Lucas De Marchi
> 
> > +
> > +
> > +64K page support for lmem
> > +=========================
> > +On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not
> > +supported anymore.
> > +
> > +DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt
> > +Page table. Refer the struct drm_i915_gem_create_ext for the implication of
> > +handling the 64k page size.
> > +
> > +.. kernel-doc:: include/uapi/drm/i915_drm.h
> > +        :functions: drm_i915_gem_create_ext
> > diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
> > index 91e93a705230..afb320ed4028 100644
> > --- a/Documentation/gpu/rfc/index.rst
> > +++ b/Documentation/gpu/rfc/index.rst
> > @@ -20,6 +20,9 @@ host such documentation:
> > 
> >     i915_gem_lmem.rst
> > 
> > +.. toctree::
> > +    i915_dg2.rst
> > +
> > .. toctree::
> > 
> >     i915_scheduler.rst
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 11/19] drm/i915/lmem: Enable lmem for platforms with Flat CCS
  2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
  (?)
@ 2022-02-18 10:08   ` Lucas De Marchi
  2022-02-18 10:17     ` Lucas De Marchi
  -1 siblings, 1 reply; 80+ messages in thread
From: Lucas De Marchi @ 2022-02-18 10:08 UTC (permalink / raw)
  To: Ramalingam C; +Cc: Abdiel Janulgue, intel-gfx, Matthew Auld, dri-devel

On Tue, Feb 01, 2022 at 04:11:24PM +0530, Ramalingam C wrote:
>From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>
>A portion of device memory is reserved for Flat CCS so usable
>device memory will be reduced by size of Flat CCS. Size of
>Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”.
>So to get effective device memory we need to subtract
>total device memory by Flat CCS memory size.
>
>v2:
>  Addressed the small bar related issue [Matt]
>  Removed a reduntant check [Matt]
>
>Cc: Matthew Auld <matthew.auld@intel.com>
>Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>---
> drivers/gpu/drm/i915/gt/intel_gt.c          | 19 ++++++++++++++++
> drivers/gpu/drm/i915/gt/intel_gt.h          |  1 +
> drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +++++++++++++++++++--
> drivers/gpu/drm/i915/i915_reg.h             |  3 +++
> 4 files changed, 45 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>index f59933abbb3a..e40d98cb3a2d 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>@@ -911,6 +911,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
> 	return intel_uncore_read_fw(gt->uncore, reg);
> }
>
>+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
>+{
>+	int type;
>+	u8 sliceid, subsliceid;
>+
>+	for (type = 0; type < NUM_STEERING_TYPES; type++) {
>+		if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
>+			intel_gt_get_valid_steering(gt, type, &sliceid,
>+						    &subsliceid);
>+			return intel_uncore_read_with_mcr_steering(gt->uncore,
>+								   reg,
>+								   sliceid,
>+								   subsliceid);
>+		}
>+	}
>+
>+	return intel_uncore_read(gt->uncore, reg);
>+}
>+
> void intel_gt_info_print(const struct intel_gt_info *info,
> 			 struct drm_printer *p)
> {
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>index 2dad46c3eff2..0f571c8ee22b 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>@@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
> }
>
> u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
>+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
>
> void intel_gt_info_print(const struct intel_gt_info *info,
> 			 struct drm_printer *p);
>diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>index 21215a080088..f1d37b46b505 100644
>--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>@@ -205,8 +205,28 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
> 	if (!IS_DGFX(i915))
> 		return ERR_PTR(-ENODEV);
>
>-	/* Stolen starts from GSMBASE on DG1 */
>-	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
>+	if (HAS_FLAT_CCS(i915)) {
>+		u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
>+
>+		lmem_size = pci_resource_len(pdev, 2);
>+		flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);

nit since this will need a respin due to conflicts:
we usually call _reg an i915_reg_t variable. But here you have the
value, not the register. Maybe "flat_ccs_base_addr"?


>+		flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
>+
>+		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
>+			return ERR_PTR(-ENODEV);
>+
>+		tile_stolen = lmem_size - flat_ccs_base;
>+
>+		/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
>+		if (tile_stolen == lmem_size)
>+			DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");

drm_err()

>+
>+		lmem_size -= tile_stolen;
>+	} else {
>+		/* Stolen starts from GSMBASE without CCS */
>+		lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
>+	}
>+
>
> 	io_start = pci_resource_start(pdev, 2);
> 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
>diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>index 0f36af8dc3a1..9b5423572fe9 100644
>--- a/drivers/gpu/drm/i915/i915_reg.h
>+++ b/drivers/gpu/drm/i915/i915_reg.h
>@@ -11651,6 +11651,9 @@ enum skl_power_gate {
> #define   SGGI_DIS			REG_BIT(15)
> #define   SGR_DIS			REG_BIT(13)
>
>+#define XEHPSDV_FLAT_CCS_BASE_ADDR             _MMIO(0x4910)
>+#define   XEHPSDV_CCS_BASE_SHIFT               8
>+

you will have a conflict here... I fixed it locally by moving to
gt/intel_gt_regs.h


With the above,

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Lucas De Marchi
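
For readers following the review, the size computation in the quoted
setup_lmem() hunk can be sketched in user space as below. This is only an
illustrative sketch, not the driver code: the helper names
flat_ccs_base_from_reg() and usable_lmem_size() are hypothetical, the
register value is passed in as a plain integer instead of being read from
hardware, and only the shift of 8 and the 64K unit are taken from the patch.

```c
#include <stdint.h>

#define SZ_64K                 (64ULL * 1024)
#define XEHPSDV_CCS_BASE_SHIFT 8 /* shift value as defined in the quoted patch */

/*
 * Hypothetical helper: convert a raw register value into a byte offset.
 * The quoted patch shifts the value right by 8 and scales it by 64K.
 */
static uint64_t flat_ccs_base_from_reg(uint32_t reg_val)
{
	return (uint64_t)(reg_val >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
}

/*
 * Hypothetical helper mirroring the arithmetic of the quoted hunk.
 * Returns the usable lmem size in bytes, or 0 when the CCS base lies
 * beyond the BAR (the patch returns -ENODEV in that case).
 */
static uint64_t usable_lmem_size(uint64_t bar_size, uint32_t reg_val)
{
	uint64_t flat_ccs_base = flat_ccs_base_from_reg(reg_val);
	uint64_t tile_stolen;

	if (bar_size < flat_ccs_base)
		return 0;

	/* Everything from the CCS base to the end of the BAR is reserved. */
	tile_stolen = bar_size - flat_ccs_base;

	/* Equivalent to flat_ccs_base; kept in two steps to match the patch. */
	return bar_size - tile_stolen;
}
```

Note that with an unpopulated register (value 0) the sketch yields a usable
size of 0, which corresponds to the case where the patch logs the
"did not have expected value" error.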

> /* gamt regs */
> #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
> #define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW  0x67F1427F /* max/min for LRA1/2 */
>-- 
>2.20.1
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 11/19] drm/i915/lmem: Enable lmem for platforms with Flat CCS
  2022-02-18 10:08   ` Lucas De Marchi
@ 2022-02-18 10:17     ` Lucas De Marchi
  0 siblings, 0 replies; 80+ messages in thread
From: Lucas De Marchi @ 2022-02-18 10:17 UTC (permalink / raw)
  To: Ramalingam C; +Cc: Abdiel Janulgue, intel-gfx, Matthew Auld, dri-devel

On Fri, Feb 18, 2022 at 02:08:18AM -0800, Lucas De Marchi wrote:
>On Tue, Feb 01, 2022 at 04:11:24PM +0530, Ramalingam C wrote:
>>From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>>
>>A portion of device memory is reserved for Flat CCS so usable
>>device memory will be reduced by size of Flat CCS. Size of
>>Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”.
>>So to get effective device memory we need to subtract
>>total device memory by Flat CCS memory size.
>>
>>v2:
>> Addressed the small bar related issue [Matt]
>> Removed a reduntant check [Matt]
>>
>>Cc: Matthew Auld <matthew.auld@intel.com>
>>Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>>---
>>drivers/gpu/drm/i915/gt/intel_gt.c          | 19 ++++++++++++++++
>>drivers/gpu/drm/i915/gt/intel_gt.h          |  1 +
>>drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +++++++++++++++++++--
>>drivers/gpu/drm/i915/i915_reg.h             |  3 +++
>>4 files changed, 45 insertions(+), 2 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>index f59933abbb3a..e40d98cb3a2d 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>@@ -911,6 +911,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
>>	return intel_uncore_read_fw(gt->uncore, reg);
>>}
>>
>>+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
>>+{
>>+	int type;
>>+	u8 sliceid, subsliceid;
>>+
>>+	for (type = 0; type < NUM_STEERING_TYPES; type++) {
>>+		if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
>>+			intel_gt_get_valid_steering(gt, type, &sliceid,
>>+						    &subsliceid);
>>+			return intel_uncore_read_with_mcr_steering(gt->uncore,
>>+								   reg,
>>+								   sliceid,
>>+								   subsliceid);
>>+		}
>>+	}
>>+
>>+	return intel_uncore_read(gt->uncore, reg);
>>+}
>>+
>>void intel_gt_info_print(const struct intel_gt_info *info,
>>			 struct drm_printer *p)
>>{
>>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>>index 2dad46c3eff2..0f571c8ee22b 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>@@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
>>}
>>
>>u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
>>+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
>>
>>void intel_gt_info_print(const struct intel_gt_info *info,
>>			 struct drm_printer *p);
>>diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>>index 21215a080088..f1d37b46b505 100644
>>--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>>+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>>@@ -205,8 +205,28 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
>>	if (!IS_DGFX(i915))
>>		return ERR_PTR(-ENODEV);
>>
>>-	/* Stolen starts from GSMBASE on DG1 */
>>-	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
>>+	if (HAS_FLAT_CCS(i915)) {
>>+		u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
>>+
>>+		lmem_size = pci_resource_len(pdev, 2);
>>+		flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
>
>nit since this will need a respin due to conflicts:
>we usually call _reg an i915_reg_t variable. But here you have the
>value, not the register. Maybe "flat_ccs_base_addr"?
>
>
>>+		flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
>>+
>>+		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
>>+			return ERR_PTR(-ENODEV);
>>+
>>+		tile_stolen = lmem_size - flat_ccs_base;
>>+
>>+		/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
>>+		if (tile_stolen == lmem_size)
>>+			DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");
>
>drm_err()
>
>>+
>>+		lmem_size -= tile_stolen;
>>+	} else {
>>+		/* Stolen starts from GSMBASE without CCS */
>>+		lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
>>+	}
>>+
>>
>>	io_start = pci_resource_start(pdev, 2);
>>	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
>>diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>>index 0f36af8dc3a1..9b5423572fe9 100644
>>--- a/drivers/gpu/drm/i915/i915_reg.h
>>+++ b/drivers/gpu/drm/i915/i915_reg.h
>>@@ -11651,6 +11651,9 @@ enum skl_power_gate {
>>#define   SGGI_DIS			REG_BIT(15)
>>#define   SGR_DIS			REG_BIT(13)
>>
>>+#define XEHPSDV_FLAT_CCS_BASE_ADDR             _MMIO(0x4910)
>>+#define   XEHPSDV_CCS_BASE_SHIFT               8
>>+
>
>you will have a conflict here... I fixed it locally by moving to
>gt/intel_gt_regs.h
>
>
>With the above,
>
>Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Also, I may be wrong and couldn't test it today, but... to finally have BAT
working on CI, this patch and "drm/i915/xehpsdv: Add has_flat_ccs to device info" are
the only changes from the FlatCCS part that are strictly
needed, aren't they?
"drm/i915/migrate: add acceleration support for DG2", all the plane
format changes, support for clearing CCS, etc. are things that could go on top.

Lucas De Marchi

>
>Lucas De Marchi
>
>>/* gamt regs */
>>#define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
>>#define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW  0x67F1427F /* max/min for LRA1/2 */
>>-- 
>>2.20.1
>>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v5 00/19] drm/i915/dg2: Enabling 64k page size and flat ccs
  2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 19:04   ` Ramalingam C
  -1 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-18 19:04 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Matthew Auld, Lionel Landwerlin

Just a note here: to get DG2 enabled with basic support on CI sooner, we
have split out a subset of this series and posted it separately at
https://patchwork.freedesktop.org/series/100419/

The remaining patches will be pursued on top of the above series. Thanks for
the review comments; we will address them together with the reviewers.

Ram.

On 2022-02-01 at 16:11:13 +0530, Ramalingam C wrote:
> This series introduces the enabling patches for new memory compression
> feature Flat CCS and 64k page support for i915 local memory, along with
> documentation on the uAPI impact. Included the details of the feature and
> the implications on the uAPI below. Which is also added into
> Documentation/gpu/rfc/i915_dg2.rst
> 
> DG2 64K page size support:
> =========================
> 
> On discrete platforms, starting from DG2, we have to contend with GTT
> page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
> objects.  Specifically the hardware only supports 64K or larger GTT
> page sizes for such memory. The kernel will already ensure that all
> I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
> sizes underneath.
> 
> Note that the returned size here will always reflect any required
> rounding up done by the kernel, i.e 4K will now become 64K on devices
> such as DG2.
> 
> Special DG2 GTT address alignment requirement:
> 
> The GTT alignment will also need to be at least 2M for such objects.
> 
> Note that due to how the hardware implements 64K GTT page support, we
> have some further complications:
> 
> 1) The entire PDE (which covers a 2MB virtual address range), must
> contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
> PDE is forbidden by the hardware.
> 
> 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
> objects.
> 
> To keep things simple for userland, we mandate that any GTT mappings
> must be aligned to and rounded up to 2MB. As this only wastes virtual
> address space and avoids userland having to copy any needlessly
> complicated PDE sharing scheme (coloring) and only affects DG2, this
> is deemed to be a good compromise.
> 
> Flat CCS support for lmem
> =========================
> On Xe-HP and later devices, we use dedicated compression control state
> (CCS) stored in local memory for each surface, to support the 3D and
> media compression formats.
> 
> The memory required for the CCS of the entire local memory is 1/256 of
> the local memory size. So before the kernel boot, the required memory is
> reserved for the CCS data and a secure register will be programmed with
> the CCS base address.
> 
> Flat CCS data needs to be cleared when a lmem object is allocated. And
> CCS data can be copied in and out of CCS region through
> XY_CTRL_SURF_COPY_BLT. CPU can’t access the CCS data directly.
> 
> When we exaust the lmem, if the object’s placements support smem, then
> we can directly decompress the compressed lmem object into smem and
> start using it from smem itself.
> 
> But when we need to swapout the compressed lmem object into a smem
> region though objects’ placement doesn’t support smem, then we copy the
> lmem content as it is into smem region along with ccs data (using
> XY_CTRL_SURF_COPY_BLT). When the object is referred, lmem content will
> be swaped in along with restoration of the CCS data (using
> XY_CTRL_SURF_COPY_BLT) at corresponding location.
> 
> Flat-CCS Modifiers for different compression formats
> ====================================================
> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of
> Flat CCS render compression formats. Though the general layout is same
> as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression
> algorithm is used. Render compression uses 128 byte compression blocks
> 
> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS -used to indicate the buffers of Flat
> CCS media compression formats. Though the general layout is same as
> I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm
> is used. Media compression uses 256 byte compression blocks.
> 
> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of
> Flat CCS clear color render compression formats. Unified compression
> format for clear color render compression. The genral layout is a tiled
> layout using 4Kb tiles i.e Tile4 layout. Fast clear color value expected
> by HW is located in fb at offset 0 of plane#1
> 
> v2:
>   Fixed some formatting issues and platform naming issues
>   Added some more documentation on Flat-CCS
> 
> v3:
>   Plane programming is handled for flat-ccs and clear color
>   Tile4 and flat ccs modifier patches are rebased on table based
>     modifier reference method
>   Three patches are squashed
>   Y tile is pruned for DG2.
>   flat_ccs_cc plane format info is added
>   Added mesa, compute and media ppl for required uAPI ack.
> 
> v4:
>   Rebasing of the patches
> 
> v5:
>   KDoc is enhanced for cc modifier. [Nanley & Lionel]
>   inbuild macro usage for functional fix [Bob]
>   Addressed review comments from Matt
>   Platform coverage fix for modifiers [Imre]
> 
> Abdiel Janulgue (1):
>   drm/i915/lmem: Enable lmem for platforms with Flat CCS
> 
> Anshuman Gupta (1):
>   drm/i915/dg2: Flat CCS Support
> 
> Ayaz A Siddiqui (1):
>   drm/i915/gt: Clear compress metadata for Xe_HP platforms
> 
> CQ Tang (1):
>   drm/i915/xehpsdv: Add has_flat_ccs to device info
> 
> Matt Roper (1):
>   drm/i915/dg2: Add DG2 unified compression
> 
> Matthew Auld (6):
>   drm/i915: enforce min GTT alignment for discrete cards
>   drm/i915: support 64K GTT pages for discrete cards
>   drm/i915/gtt: allow overriding the pt alignment
>   drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
>   drm/i915/migrate: add acceleration support for DG2
>   drm/i915/uapi: document behaviour for DG2 64K support
> 
> Mika Kahola (1):
>   uapi/drm/dg2: Introduce format modifier for DG2 clear color
> 
> Ramalingam C (4):
>   drm/i915: add needs_compact_pt flag
>   Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
>   drm/i915/Flat-CCS: Document on Flat-CCS memory compression
>   Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
> 
> Robert Beckett (1):
>   drm/i915: add gtt misalignment test
> 
> Stanislav Lisovskiy (2):
>   drm/i915: Introduce new Tile 4 format
>   drm/i915/dg2: Tile 4 plane format support
> 
>  Documentation/gpu/rfc/i915_dg2.rst            |  32 ++
>  Documentation/gpu/rfc/index.rst               |   3 +
>  drivers/gpu/drm/i915/display/intel_display.c  |   5 +-
>  drivers/gpu/drm/i915/display/intel_fb.c       |  68 +++-
>  drivers/gpu/drm/i915/display/intel_fb.h       |   1 +
>  drivers/gpu/drm/i915/display/intel_fbc.c      |   1 +
>  .../drm/i915/display/intel_plane_initial.c    |   1 +
>  .../drm/i915/display/skl_universal_plane.c    |  70 +++-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++
>  .../i915/gem/selftests/i915_gem_client_blt.c  |  21 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 158 +++++++-
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  14 +
>  drivers/gpu/drm/i915/gt/intel_gt.c            |  19 +
>  drivers/gpu/drm/i915/gt/intel_gt.h            |   1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  12 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  31 +-
>  drivers/gpu/drm/i915/gt/intel_migrate.c       | 336 ++++++++++++++++--
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  17 +-
>  drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  24 +-
>  drivers/gpu/drm/i915/i915_drv.h               |  18 +-
>  drivers/gpu/drm/i915/i915_pci.c               |   4 +
>  drivers/gpu/drm/i915/i915_reg.h               |   4 +
>  drivers/gpu/drm/i915/i915_vma.c               |   9 +
>  drivers/gpu/drm/i915/intel_device_info.h      |   3 +
>  drivers/gpu/drm/i915/intel_pm.c               |   1 +
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 224 ++++++++++--
>  include/uapi/drm/drm_fourcc.h                 |  43 +++
>  include/uapi/drm/i915_drm.h                   |  44 ++-
>  28 files changed, 1102 insertions(+), 122 deletions(-)
>  create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
> 
> -- 
> 2.20.1
> 
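
As a reading aid, the sizing rules described in the quoted cover letter (64K
rounding for device-memory objects, 2MB alignment and rounding for their GTT
mappings, and a CCS reservation of 1/256 of local memory) can be sketched as
plain arithmetic. The helper names below are hypothetical and do not exist
in i915; this is a sketch of the stated rules, not of the kernel
implementation.

```c
#include <stdint.h>

#define SZ_64K (64ULL << 10)
#define SZ_2M  (2ULL << 20)

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Object size reported back after rounding up to the 64K minimum page size. */
static uint64_t lmem_object_size(uint64_t requested)
{
	return round_up_pow2(requested, SZ_64K);
}

/* Virtual address space consumed by a mapping once rounded up to 2MB. */
static uint64_t gtt_mapping_size(uint64_t object_size)
{
	return round_up_pow2(object_size, SZ_2M);
}

/* A GTT offset for such an object must be 2MB aligned. */
static int gtt_offset_is_valid(uint64_t offset)
{
	return (offset & (SZ_2M - 1)) == 0;
}

/* Memory reserved for flat CCS: 1/256 of the local memory size. */
static uint64_t flat_ccs_size(uint64_t lmem_size)
{
	return lmem_size / 256;
}
```

For example, a 4K request becomes a 64K object that occupies a 2MB slot in
the GTT, and a device with 16 GiB of local memory would set aside 64 MiB
for CCS data under the 1/256 rule.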

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 00/19] drm/i915/dg2: Enabling 64k page size and flat ccs
@ 2022-02-18 19:04   ` Ramalingam C
  0 siblings, 0 replies; 80+ messages in thread
From: Ramalingam C @ 2022-02-18 19:04 UTC (permalink / raw)
  To: dri-devel, intel-gfx; +Cc: Matthew Auld

Just a note here: to get DG2 enabled with basic support on CI sooner, we
have split out a subset of this series and posted it separately at
https://patchwork.freedesktop.org/series/100419/

The remaining patches will be pursued on top of the above series. Thanks for
the review comments; we will address them together with the reviewers.

Ram.

On 2022-02-01 at 16:11:13 +0530, Ramalingam C wrote:
> This series introduces the enabling patches for new memory compression
> feature Flat CCS and 64k page support for i915 local memory, along with
> documentation on the uAPI impact. Included the details of the feature and
> the implications on the uAPI below. Which is also added into
> Documentation/gpu/rfc/i915_dg2.rst
> 
> DG2 64K page size support:
> =========================
> 
> On discrete platforms, starting from DG2, we have to contend with GTT
> page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
> objects.  Specifically the hardware only supports 64K or larger GTT
> page sizes for such memory. The kernel will already ensure that all
> I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
> sizes underneath.
> 
> Note that the returned size here will always reflect any required
> rounding up done by the kernel, i.e 4K will now become 64K on devices
> such as DG2.
> 
> Special DG2 GTT address alignment requirement:
> 
> The GTT alignment will also need to be at least 2M for such objects.
> 
> Note that due to how the hardware implements 64K GTT page support, we
> have some further complications:
> 
> 1) The entire PDE (which covers a 2MB virtual address range), must
> contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
> PDE is forbidden by the hardware.
> 
> 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
> objects.
> 
> To keep things simple for userland, we mandate that any GTT mappings
> must be aligned to and rounded up to 2MB. As this only wastes virtual
> address space and avoids userland having to copy any needlessly
> complicated PDE sharing scheme (coloring) and only affects DG2, this
> is deemed to be a good compromise.
> 
> Flat CCS support for lmem
> =========================
> On Xe-HP and later devices, we use dedicated compression control state
> (CCS) stored in local memory for each surface, to support the 3D and
> media compression formats.
> 
> The memory required for the CCS of the entire local memory is 1/256 of
> the local memory size. So before the kernel boot, the required memory is
> reserved for the CCS data and a secure register will be programmed with
> the CCS base address.
> 
> Flat CCS data needs to be cleared when a lmem object is allocated. And
> CCS data can be copied in and out of CCS region through
> XY_CTRL_SURF_COPY_BLT. CPU can’t access the CCS data directly.
> 
> When we exaust the lmem, if the object’s placements support smem, then
> we can directly decompress the compressed lmem object into smem and
> start using it from smem itself.
> 
> But when we need to swapout the compressed lmem object into a smem
> region though objects’ placement doesn’t support smem, then we copy the
> lmem content as it is into smem region along with ccs data (using
> XY_CTRL_SURF_COPY_BLT). When the object is referred, lmem content will
> be swaped in along with restoration of the CCS data (using
> XY_CTRL_SURF_COPY_BLT) at corresponding location.
> 
> Flat-CCS Modifiers for different compression formats
> ====================================================
> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of
> Flat CCS render compression formats. Though the general layout is same
> as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression
> algorithm is used. Render compression uses 128 byte compression blocks
> 
> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS -used to indicate the buffers of Flat
> CCS media compression formats. Though the general layout is same as
> I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm
> is used. Media compression uses 256 byte compression blocks.
> 
> I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of
> Flat CCS clear color render compression formats. Unified compression
> format for clear color render compression. The genral layout is a tiled
> layout using 4Kb tiles i.e Tile4 layout. Fast clear color value expected
> by HW is located in fb at offset 0 of plane#1
> 
> v2:
>   Fixed some formatting issues and platform naming issues
>   Added some more documentation on Flat-CCS
> 
> v3:
>   Plane programming is handled for flat-ccs and clear color
>   Tile4 and flat ccs modifier patches are rebased on table based
>     modifier reference method
>   Three patches are squashed
>   Y tile is pruned for DG2.
>   flat_ccs_cc plane format info is added
>   Added mesa, compute and media ppl for required uAPI ack.
> 
> v4:
>   Rebasing of the patches
> 
> v5:
>   KDoc is enhanced for cc modifier. [Nanley & Lionel]
>   inbuild macro usage for functional fix [Bob]
>   Addressed review comments from Matt
>   Platform coverage fix for modifiers [Imre]
> 
> Abdiel Janulgue (1):
>   drm/i915/lmem: Enable lmem for platforms with Flat CCS
> 
> Anshuman Gupta (1):
>   drm/i915/dg2: Flat CCS Support
> 
> Ayaz A Siddiqui (1):
>   drm/i915/gt: Clear compress metadata for Xe_HP platforms
> 
> CQ Tang (1):
>   drm/i915/xehpsdv: Add has_flat_ccs to device info
> 
> Matt Roper (1):
>   drm/i915/dg2: Add DG2 unified compression
> 
> Matthew Auld (6):
>   drm/i915: enforce min GTT alignment for discrete cards
>   drm/i915: support 64K GTT pages for discrete cards
>   drm/i915/gtt: allow overriding the pt alignment
>   drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
>   drm/i915/migrate: add acceleration support for DG2
>   drm/i915/uapi: document behaviour for DG2 64K support
> 
> Mika Kahola (1):
>   uapi/drm/dg2: Introduce format modifier for DG2 clear color
> 
> Ramalingam C (4):
>   drm/i915: add needs_compact_pt flag
>   Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI
>   drm/i915/Flat-CCS: Document on Flat-CCS memory compression
>   Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
> 
> Robert Beckett (1):
>   drm/i915: add gtt misalignment test
> 
> Stanislav Lisovskiy (2):
>   drm/i915: Introduce new Tile 4 format
>   drm/i915/dg2: Tile 4 plane format support
> 
>  Documentation/gpu/rfc/i915_dg2.rst            |  32 ++
>  Documentation/gpu/rfc/index.rst               |   3 +
>  drivers/gpu/drm/i915/display/intel_display.c  |   5 +-
>  drivers/gpu/drm/i915/display/intel_fb.c       |  68 +++-
>  drivers/gpu/drm/i915/display/intel_fb.h       |   1 +
>  drivers/gpu/drm/i915/display/intel_fbc.c      |   1 +
>  .../drm/i915/display/intel_plane_initial.c    |   1 +
>  .../drm/i915/display/skl_universal_plane.c    |  70 +++-
>  .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++
>  .../i915/gem/selftests/i915_gem_client_blt.c  |  21 +-
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 158 +++++++-
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  14 +
>  drivers/gpu/drm/i915/gt/intel_gt.c            |  19 +
>  drivers/gpu/drm/i915/gt/intel_gt.h            |   1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  12 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  31 +-
>  drivers/gpu/drm/i915/gt/intel_migrate.c       | 336 ++++++++++++++++--
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  17 +-
>  drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  24 +-
>  drivers/gpu/drm/i915/i915_drv.h               |  18 +-
>  drivers/gpu/drm/i915/i915_pci.c               |   4 +
>  drivers/gpu/drm/i915/i915_reg.h               |   4 +
>  drivers/gpu/drm/i915/i915_vma.c               |   9 +
>  drivers/gpu/drm/i915/intel_device_info.h      |   3 +
>  drivers/gpu/drm/i915/intel_pm.c               |   1 +
>  drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 224 ++++++++++--
>  include/uapi/drm/drm_fourcc.h                 |  43 +++
>  include/uapi/drm/i915_drm.h                   |  44 ++-
>  28 files changed, 1102 insertions(+), 122 deletions(-)
>  create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-02-17 17:15         ` Chery, Nanley G
@ 2022-03-18 17:39           ` Imre Deak
  -1 siblings, 0 replies; 80+ messages in thread
From: Imre Deak @ 2022-03-18 17:39 UTC (permalink / raw)
  To: Chery, Nanley G
  Cc: Nanley Chery, juhapekka.heikkila, intel-gfx, dri-devel, Auld, Matthew

On Thu, Feb 17, 2022 at 05:15:15PM +0000, Chery, Nanley G wrote:
> > >> [...]
> > >> --- a/include/uapi/drm/drm_fourcc.h
> > >> +++ b/include/uapi/drm/drm_fourcc.h
> > >> @@ -583,6 +583,28 @@ extern "C" {
> > >>    */
> > >>   #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
> > >>
> > >> +/*
> > >> + * Intel color control surfaces (CCS) for DG2 render compression.
> > >> + *
> > >> + * DG2 uses a new compression format for render compression. The general
> > >> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> > >> + * but a new hashing/compression algorithm is used, so a fresh modifier must
> > >> + * be associated with buffers of this type. Render compression uses 128 byte
> > >> + * compression blocks.
> > >
> > > I think I've seen a way to configure the compression block size on TGL
> > > at least. I can't find the spec text for that at the moment though...
> > > Could we omit these mentions?
> > 
> > Not sure why general possibility of changing compression block size is relevant?
> > All hw features can be changed but this defines how this modifier is being
> > implemented.
> 
> I was concerned about compatibility between the different modes, but I've
> looked into the restrictions here and don't see any problems with this.
> 
> > Say you take I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer including
> > control surface and copy it out, then come back and restore framebuffer with
> > same information. It is expected to be valid?
>
> > /Juha-Pekka
> > 
> > >> + */
> > >> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
> > >> +
> > >
> > > How about something like:
> > >
> > > The main surface is Tile 4 and at plane index 0. The CCS plane is
> > > hidden from userspace. The main surface pitch is required to be a
> > > multiple of four Tile 4 widths. The CCS is configured with the render
> > > compression format associated with the main surface format.
> 
> Actually, let's omit the last sentence. CCS has always been affected
> by the main surface format, so I don't think there's a need to mention it
> specifically for the DG2 modifier.
>
> We do need to mention the 4-tile-wide pitch requirement though.

Agreed, the DG2 layout of planes and the tile format used - both
different wrt. the GEN12_RC_CCS format - should be described here.

> -Nanley
>  
> > > ....I think the CCS is technically accessible via the blitter engine,
> > > so the part about the plane being "hidden" may need some tweaking.

Maybe "outside of the GEM object"? Capturing all of the above, would you
be OK with the following?

Intel color control surfaces (CCS) for DG2 render compression.

The main surface is Tile 4 and at plane index 0. The CCS data is stored
outside of the GEM object in a reserved memory area dedicated for the
storage of the CCS data from all GEM objects. The main surface pitch is
required to be a multiple of four Tile 4 widths. 


Intel color control surfaces (CCS) for DG2 media compression.

The main surface is Tile 4 and at plane index 0. For semi-planar formats
like NV12, the UV plane is Tile 4 at plane index 1. The CCS data both for
the main and semi-planar UV planes are stored outside of the GEM object
in a reserved memory area dedicated for the storage of the CCS data from
all GEM objects. The main surface pitch is required to be a multiple of
four Tile 4 widths. 
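
To make the pitch rule in the two descriptions above concrete, here is a
minimal sketch of how userspace might validate or round up a main surface
pitch. It is not taken from the patch. The 128 byte Tile 4 width is an
assumption (a 4 KiB tile laid out as 128 bytes x 32 rows), and both helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one Tile 4 tile is 128 bytes wide (4 KiB = 128 bytes x 32 rows). */
#define TILE4_WIDTH_BYTES	128u
/* From the proposed text: the pitch must be a multiple of four Tile 4 widths. */
#define DG2_CCS_PITCH_ALIGN	(4u * TILE4_WIDTH_BYTES)

/* Hypothetical helper: true if the main surface pitch satisfies the rule. */
static bool dg2_ccs_pitch_ok(uint32_t pitch_bytes)
{
	return pitch_bytes != 0 && pitch_bytes % DG2_CCS_PITCH_ALIGN == 0;
}

/* Hypothetical helper: round a pitch up to the next value that satisfies it. */
static uint32_t dg2_ccs_align_pitch(uint32_t pitch_bytes)
{
	return (pitch_bytes + DG2_CCS_PITCH_ALIGN - 1) /
	       DG2_CCS_PITCH_ALIGN * DG2_CCS_PITCH_ALIGN;
}
```

Under that assumption a 1920 pixel wide XRGB8888 surface (7680 byte pitch)
already satisfies the rule, while a 5000 byte pitch would have to be padded
to 5120 bytes.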

> > > -Nanley
> > >
> > >> +/*
> > >> + * Intel color control surfaces (CCS) for DG2 media compression.
> > >> + *
> > >> + * DG2 uses a new compression format for media compression. The
> > >> +general
> > >> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> > >> + * but a new hashing/compression algorithm is used, so a fresh
> > >> +modifier must
> > >> + * be associated with buffers of this type. Media compression uses
> > >> +256 byte
> > >> + * compression blocks.
> > >> + */
> > >> +#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
> > fourcc_mod_code(INTEL,
> > >> +11)
> > >> +
> > >>   /*
> > >>    * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> > >>    *
> > >> --
> > >> 2.20.1
> > >>
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-02-15 19:34                 ` Juha-Pekka Heikkila
@ 2022-03-21 13:20                   ` Imre Deak
  2022-03-23 23:42                       ` Chery, Nanley G
  0 siblings, 1 reply; 80+ messages in thread
From: Imre Deak @ 2022-03-21 13:20 UTC (permalink / raw)
  To: Nanley G Chery, Juha-Pekka Heikkila
  Cc: dri-devel, intel-gfx, Auld, Matthew, Nanley Chery

Hi Nanley, JP,

On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
> [...]
> > > > > > > > > diff --git a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644
> > > > > > > > > --- a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > +++ b/include/uapi/drm/drm_fourcc.h
> > > > > > > > > @@ -605,6 +605,16 @@ extern "C" {
> > > > > > > > >       */
> > > > > > > > >      #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
> > > > > > > > > 
> > > > > > > > > +/*
> > > > > > > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression.
> > > > > > > > > + *
> > > > > > > > > + * DG2 uses a unified compression format for clear color render compression.
> > > > > > > > 
> > > > > > > > What's unified about DG2's compression format? If this doesn't
> > > > > > > > affect the layout, maybe we should drop this sentence.

"Unified" here probably refers to the fact that the DG2 render engine is
capable of generating both a render and a media compressed surface, as
opposed to earlier platforms. The display engine still needs to know
which compression format the FB uses, hence we need both an RC and an MC
modifier. Based on this I also think we can drop the mention of unified
compression.

> > > > > > > > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> > > > > > > > > + *
> > > > > > > > 
> > > > > > > > This also needs a pitch aligned to four tiles, right? I think we
> > > > > > > > can save some effort by referencing the DG2_RC_CCS modifier here.
> > > > > > > > 
> > > > > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1
> > > > > > > > 
> > > > > > > > Why is the expected offset hardcoded to 0 instead of relying on
> > > > > > > > the offset provided by the modifier API? This looks like a bug.
> > > > > > > 
> > > > > > > Hi Nanley,
> > > > > > > 
> > > > > > > can you elaborate a bit, which offset from modifier API that
> > > > > > > applies to cc surface?
> > > > > > 
> > > > > > Hi Juha-Pekka,
> > > > > > 
> > > > > > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
> > > > > 
> > > > > Hi Nanley,
> > > > > 
> > > > > this offset is coming from userspace on creation of framebuffer, at
> > > > > that moment from userspace caller can point to offset of desire.
> > > > > Normally offset[0] is set at 0 and then offset[n] at plane n start
> > > > > which is not stated to have to be exactly after plane n-1 end. Or did I
> > > > > misunderstand what you meant?
> > > > 
> > > > Perhaps, at least, I'm not sure what you're meaning to say. This
> > > > modifier description seems to say that the drm_mode_fb_cmd2::offsets
> > > > value for the clear color plane must be zero. Are you saying that it's
> > > > correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
> > > > match mesa's expectations.
> > > 
> > > It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must
> > > be zero", it says "Fast clear color value expected by HW is located in fb at offset 0
> > > of plane#1".
> > 
> > Yes, it doesn't say that exactly, but that's what it seems to say. With every other
> > modifier, it's implied that the data for the plane begins at the offset specified
> > through the modifier API. So, explicitly mentioning it here (and with that wording)
> > conveys a new requirement.
> 
> I don't have objections on changing this description but for reference gen12
> version of the same says "The main surface is Y-tiled and is at plane index
> 0 whereas CCS is linear and at index 1. The clear color is stored at index
> 2, and the pitch should be ignored.", only plane indexes are mentioned. I
> anyway wrote neither of these descriptions.
> 
> > > Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's
> > > nothing stated about that offset.
> > 
> > Technically, plane #1's location is specified to be the combination of ::handles[1]
> > and ::offsets[1]. In practice though, I can imagine that there are areas of the stack
> > that are implicitly requiring that all ::handles[] entries match.

The FB modifier API requires all ::handles[] to match, that is all
planes must be contained in one GEM object.
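
As a rough illustration of that requirement (a sketch only, not the actual
kernel check): the struct below is a hypothetical stand-in for the handles[]
and offsets[] fields of drm_mode_fb_cmd2, and fb_handles_match() is a
made-up helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct drm_mode_fb_cmd2. */
struct fb_planes {
	uint32_t handles[4];	/* GEM handle per plane */
	uint32_t offsets[4];	/* byte offset of each plane within its object */
};

/*
 * True if every used plane refers to the same GEM object as plane 0.
 * Each plane is then located by its own offsets[] entry alone.
 */
static bool fb_handles_match(const struct fb_planes *fb, unsigned int num_planes)
{
	for (unsigned int i = 1; i < num_planes && i < 4; i++)
		if (fb->handles[i] != fb->handles[0])
			return false;
	return true;
}
```

With this in mind, offsets[1] is the only thing that locates the clear color
plane, and it can be any value userspace picked when creating the FB.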

> I didn't think we needed to go deeper as you started to just talk about how
> drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
> 
> > > These offsets are just offsets to bo which contain the framebuffer information
> > > hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc
> > > information is found starting at drm_mode_fb_cmd2::offsets[1][0]
> > 
> > If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane
> > index), I propose that we drop this sentence due to avoid any confusion.
> 
> But it need to defined as part of the modifier. It's the modifier features
> which are being described here.
> 
> > This offset discussion raises another question. The description says that the value
> > expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine?
> > The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
> 
> Generally answer is yes but these parts you can see in patch "[PATCH v5
> 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed there. Here
> "HW" should probably be changed something meaningful though.

The 256 bit clear color format starting at plane index 1 matches the one
described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC. So yes, "HW" refers
to the render engine, and display consumes the 64 bit data at
::offsets[1] + 16 bytes (the DE ignores the 64 bit data starting at
::offsets[1] + 24 bytes).

The following captures all of the above; would it be OK?

Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.

The main surface is Tile 4 and at plane index 0. The CCS data is stored
outside of the GEM object in a reserved memory area dedicated for the
storage of the CCS data from all GEM objects. The main surface pitch is
required to be a multiple of four Tile 4 widths. The clear color is stored
at plane index 1 and the pitch should be ignored. The format of the 256
bits clear color data matches the one used for the
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
for details.
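
For reference, the 256 bit layout discussed above can be sketched as a C
struct. This is illustrative only: the struct and helper names are
hypothetical, and splitting the first 128 bits into four 32 bit raw channels
is an assumption based on the GEN12_RC_CCS_CC description. The offsets of
16 and 24 bytes are the ones quoted earlier in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 256 bit clear color data at ::offsets[1]. */
struct clear_color_256 {
	uint32_t raw_rgba[4];		/* bytes 0..15: raw clear color (assumed R, G, B, A) */
	uint32_t converted[2];		/* bytes 16..23: 64 bit data consumed by display */
	uint32_t ignored_by_de[2];	/* bytes 24..31: ignored by the display engine */
};

static_assert(sizeof(struct clear_color_256) == 32, "clear color data is 256 bits");
static_assert(offsetof(struct clear_color_256, converted) == 16, "display data at +16");
static_assert(offsetof(struct clear_color_256, ignored_by_de) == 24, "ignored data at +24");

/* Byte position within the GEM object of the data the display engine reads. */
static uint64_t display_clear_color_pos(uint32_t plane1_offset)
{
	return (uint64_t)plane1_offset + offsetof(struct clear_color_256, converted);
}
```

The point of the sketch is that nothing in the layout depends on offsets[1]
being zero; the display data is always found 16 bytes past wherever the
clear color plane starts.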

--Imre

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-03-18 17:39           ` Imre Deak
@ 2022-03-23 23:40             ` Chery, Nanley G
  -1 siblings, 0 replies; 80+ messages in thread
From: Chery, Nanley G @ 2022-03-23 23:40 UTC (permalink / raw)
  To: Deak, Imre
  Cc: Nanley Chery, juhapekka.heikkila, intel-gfx, dri-devel, Auld, Matthew



> -----Original Message-----
> From: Deak, Imre <imre.deak@intel.com>
> Sent: Friday, March 18, 2022 10:40 AM
> To: Chery, Nanley G <nanley.g.chery@intel.com>
> Cc: juhapekka.heikkila@gmail.com; Nanley Chery <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>; intel-gfx <intel-
> gfx@lists.freedesktop.org>; Auld, Matthew <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> Subject: Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
> 
> On Thu, Feb 17, 2022 at 05:15:15PM +0000, Chery, Nanley G wrote:
> > > >> [...]
> > > >> --- a/include/uapi/drm/drm_fourcc.h
> > > >> +++ b/include/uapi/drm/drm_fourcc.h
> > > >> @@ -583,6 +583,28 @@ extern "C" {
> > > >>    */
> > > >>   #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 9)
> > > >>
> > > >> +/*
> > > >> + * Intel color control surfaces (CCS) for DG2 render compression.
> > > >> + *
> > > >> + * DG2 uses a new compression format for render compression. The general
> > > >> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> > > >> + * but a new hashing/compression algorithm is used, so a fresh modifier must
> > > >> + * be associated with buffers of this type. Render compression uses 128 byte
> > > >> + * compression blocks.
> > > >
> > > > I think I've seen a way to configure the compression block size on TGL
> > > > at least. I can't find the spec text for that at the moment though...
> > > > Could we omit these mentions?
> > >
> > > Not sure why general possibility of changing compression block size is relevant?
> > > All hw features can be changed but this defines how this modifier is being
> > > implemented.
> >
> > I was concerned about compatibility between the different modes, but I've
> > looked into the restrictions here and don't see any problems with this.
> >
> > > Say you take I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer including
> > > control surface and copy it out, then come back and restore framebuffer with
> > > same information. It is expected to be valid?
> >
> > > /Juha-Pekka
> > >
> > > >> + */
> > > >> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
> > > >> +
> > > >
> > > > How about something like:
> > > >
> > > > The main surface is Tile 4 and at plane index 0. The CCS plane is
> > > > hidden from userspace. The main surface pitch is required to be a
> > > > multiple of four Tile 4 widths. The CCS is configured with the render
> > > > compression format associated with the main surface format.
> >
> > Actually, let's omit the last sentence. CCS has always been affected
> > by the main surface format, so I don't think there's a need to mention it
> > specifically for the DG2 modifier.
> >
> > We do need to mention the 4-tile-wide pitch requirement though.
> 
> Agreed, the DG2 layout of planes and the tile format used - both
> different wrt. the GEN12_RC_CCS format - should be described here.
> 
> > -Nanley
> >
> > > > ....I think the CCS is technically accessible via the blitter engine,
> > > > so the part about the plane being "hidden" may need some tweaking.
> 
> Maybe outside of the GEM object? Capturing all the above would you be ok
> with the following?:
> 
> Intel color control surfaces (CCS) for DG2 render compression.
> 
> The main surface is Tile 4 and at plane index 0. The CCS data is stored
> outside of the GEM object in a reserved memory area dedicated for the
> storage of the CCS data from all GEM objects. The main surface pitch is
> required to be a multiple of four Tile 4 widths.
> 
> 
> Intel color control surfaces (CCS) for DG2 media compression.
> 
> The main surface is Tile 4 and at plane index 0. For semi-planar formats
> like NV12, the UV plane is Tile 4 at plane index 1. The CCS data both for
> the main and semi-planar UV planes are stored outside of the GEM object

This kind of implies that the Y plane is the main surface, but it's no more
"main" than the UV plane, right? Seems like we should specifically call out
the Y plane for clarity. Maybe something like:

For semi-planar formats like NV12, the Y and UV planes are Tile 4 and are 
located at plane indices 0 and 1, respectively. The CCS for all planes are stored 
outside of the GEM object

> in a reserved memory area dedicated for the storage of the CCS data from
> all GEM objects. The main surface pitch is required to be a multiple of
> four Tile 4 widths.
> 

Looks good to me. Main suggestion I have here is to substitute 
"from all GEM objects" with "for all compressible GEM objects".
Happy to look at further revisions, but with that change at least,
Acked-by: Nanley Chery <nanley.g.chery@intel.com>

> > > > -Nanley
> > > >
> > > >> +/*
> > > >> + * Intel color control surfaces (CCS) for DG2 media compression.
> > > >> + *
> > > >> + * DG2 uses a new compression format for media compression. The
> > > >> +general
> > > >> + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
> > > >> + * but a new hashing/compression algorithm is used, so a fresh
> > > >> +modifier must
> > > >> + * be associated with buffers of this type. Media compression uses
> > > >> +256 byte
> > > >> + * compression blocks.
> > > >> + */
> > > >> +#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
> > > fourcc_mod_code(INTEL,
> > > >> +11)
> > > >> +
> > > >>   /*
> > > >>    * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> > > >>    *
> > > >> --
> > > >> 2.20.1
> > > >>
> >

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-03-21 13:20                   ` Imre Deak
@ 2022-03-23 23:42                       ` Chery, Nanley G
  0 siblings, 0 replies; 80+ messages in thread
From: Chery, Nanley G @ 2022-03-23 23:42 UTC (permalink / raw)
  To: Deak, Imre, Juha-Pekka Heikkila
  Cc: dri-devel, intel-gfx, Auld,  Matthew, Nanley Chery



> -----Original Message-----
> From: Deak, Imre <imre.deak@intel.com>
> Sent: Monday, March 21, 2022 6:20 AM
> To: Chery, Nanley G <nanley.g.chery@intel.com>; Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Cc: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> Auld, Matthew <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
> 
> Hi Nanley, JP,
> 
> On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
> > [...]
> > > > > > > > > > diff --git a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644
> > > > > > > > > > --- a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > +++ b/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > @@ -605,6 +605,16 @@ extern "C" {
> > > > > > > > > >       */
> > > > > > > > > >      #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
> > > > > > > > > >
> > > > > > > > > > +/*
> > > > > > > > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression.
> > > > > > > > > > + *
> > > > > > > > > > + * DG2 uses a unified compression format for clear color render compression.
> > > > > > > > >
> > > > > > > > > What's unified about DG2's compression format? If this doesn't
> > > > > > > > > affect the layout, maybe we should drop this sentence.
> 
> Unified here probably refers to the fact the DG2 render engine is
> capable of generating both a render and a media compressed surface as
> opposed to earlier platforms. The display engine still needs to know
> which compression format the FB uses, hence we need both an RC and MC
> modifier. Based on this I also think we can drop the mention of unified
> compression.
> 
> > > > > > > > > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> > > > > > > > > > + *
> > > > > > > > >
> > > > > > > > > This also needs a pitch aligned to four tiles, right? I think we
> > > > > > > > > can save some effort by referencing the DG2_RC_CCS modifier here.
> > > > > > > > >
> > > > > > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1
> > > > > > > > >
> > > > > > > > > Why is the expected offset hardcoded to 0 instead of relying on
> > > > > > > > > the offset provided by the modifier API? This looks like a bug.
> > > > > > > >
> > > > > > > > Hi Nanley,
> > > > > > > >
> > > > > > > > can you elaborate a bit, which offset from modifier API that
> > > > > > > > applies to cc surface?
> > > > > > >
> > > > > > > Hi Juha-Pekka,
> > > > > > >
> > > > > > > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
> > > > > >
> > > > > > Hi Nanley,
> > > > > >
> > > > > > this offset comes from userspace when the framebuffer is created; at
> > > > > > that point the userspace caller can pick whatever offset it wants.
> > > > > > Normally offsets[0] is set to 0 and offsets[n] to the start of plane n,
> > > > > > which is not required to be exactly at the end of plane n-1. Or did I
> > > > > > misunderstand what you meant?
> > > > >
> > > > > Perhaps, at least, I'm not sure what you're meaning to say. This
> > > > > modifier description seems to say that the drm_mode_fb_cmd2::offsets
> > > > > value for the clear color plane must be zero. Are you saying that it's
> > > > > correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
> > > > > match mesa's expectations.
> > > >
> > > > It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must
> > > > be zero", it says "Fast clear color value expected by HW is located in fb at offset 0
> > > > of plane#1".
> > >
> > > Yes, it doesn't say that exactly, but that's what it seems to say. With every other
> > > modifier, it's implied that the data for the plane begins at the offset specified
> > > through the modifier API. So, explicitly mentioning it here (and with that wording)
> > > conveys a new requirement.
> >
> > I don't have objections on changing this description but for reference gen12
> > version of the same says "The main surface is Y-tiled and is at plane index
> > 0 whereas CCS is linear and at index 1. The clear color is stored at index
> > 2, and the pitch should be ignored.", only plane indexes are mentioned. I
> > anyway wrote neither of these descriptions.
> >
> > > > Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's
> > > > nothing stated about that offset.
> > >
> > > Technically, plane #1's location is specified to be the combination of ::handles[1]
> > > and ::offsets[1]. In practice though, I can imagine that there are areas of the stack
> > > that are implicitly requiring that all ::handles[] entries match.
> 
> The FB modifier API requires all ::handles[] to match, that is all
> planes must be contained in one GEM object.
> 

Is this a requirement for i915, or for all DRM drivers? I couldn't find anything in the
generic DRM headers or docs requiring this. Feel free to ping me about this offline.

> > I didn't think we needed to go deeper, as you had started to talk only about
> > drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
> >
> > > > These offsets are just offsets into the bo which contains the framebuffer
> > > > information, hence drm_mode_fb_cmd2::offsets[1] can be changed as one wishes
> > > > and the cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
> > >
> > > If the clear color handling is the same as GEN12_RC_CCS_CC (apart from the plane
> > > index), I propose that we drop this sentence to avoid any confusion.
> >
> > But it needs to be defined as part of the modifier. It's the modifier features
> > which are being described here.
> >
> > > This offset discussion raises another question. The description says that the value
> > > expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine?
> > > The kernel is still giving the display engine the packed values at ::offsets[1] + 16B, right?
> >
> > Generally the answer is yes, but these parts you can see in patch "[PATCH v5
> > 17/19] drm/i915/dg2: Flat CCS Support" and they should be discussed there. Here
> > "HW" should probably be changed to something meaningful though.
> 
> The 256 bit clear color format starting at plane index 1 matches the one
> described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC. So yes, "HW" refers
> to the render engine and display consumes the 64 bit data at
> ::offsets[1] + 16 bytes (and DE ignores the 64 bit data starting at
> ::offsets[1] + 24 bytes).
> 
> The following captures all the above, would it be ok?:
> 
> Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
> 
> The main surface is Tile 4 and at plane index 0. The CCS data is stored
> outside of the GEM object in a reserved memory area dedicated for the
> storage of the CCS data from all GEM objects. The main surface pitch is
"for all compressible GEM objects" instead of "from all GEM objects"? (since SMEM objects don't have this)

> required to be a multiple of four Tile 4 widths. The clear color is stored
> at plane index 1 and the pitch should be ignored. The format of the 256
> bits clear color data matches the one used for the
   ^
"256 bits of"?

> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
> for details.
> 

Looks good to me. With the above minor changes,
Acked-by: Nanley Chery <nanley.g.chery@intel.com>

> --Imre

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
  2022-03-23 23:40             ` Chery, Nanley G
@ 2022-03-24 14:19               ` Imre Deak
  -1 siblings, 0 replies; 80+ messages in thread
From: Imre Deak @ 2022-03-24 14:19 UTC (permalink / raw)
  To: Chery, Nanley G; +Cc: Nanley Chery, intel-gfx, dri-devel, Auld, Matthew

On Thu, Mar 24, 2022 at 01:40:37AM +0200, Chery, Nanley G wrote:
> > [...]
> > Capturing all the above would you be ok with the following?:
> > 
> > Intel color control surfaces (CCS) for DG2 render compression.
> > 
> > The main surface is Tile 4 and at plane index 0. The CCS data is stored
> > outside of the GEM object in a reserved memory area dedicated for the
> > storage of the CCS data from all GEM objects. The main surface pitch is
> > required to be a multiple of four Tile 4 widths.
> > 
> > 
> > Intel color control surfaces (CCS) for DG2 media compression.
> > 
> > The main surface is Tile 4 and at plane index 0. For semi-planar formats
> > like NV12, the UV plane is Tile 4 at plane index 1. The CCS data both for
> > the main and semi-planar UV planes are stored outside of the GEM object
> 
> This kind of implies that the Y plane is the main surface, but it's not more
> "main" than the UV plane right? Seems like we should specifically call out the
> Y plane for clarity. Maybe something like:
> 
> For semi-planar formats like NV12, the Y and UV planes are Tile 4 and are 
> located at plane indices 0 and 1, respectively. The CCS for all planes are stored 
> outside of the GEM object

Ok, makes sense.

> > in a reserved memory area dedicated for the storage of the CCS data from
> > all GEM objects. The main surface pitch is required to be a multiple of
> > four Tile 4 widths.
> 
> Looks good to me. Main suggestion I have here is to substitute 
> "from all GEM objects" with "for all compressible GEM objects".

"for all RC/RC_CC/MC CCS compressible GEM objects" would be more
precise, in case there are other ways to compress data. Either way looks
ok to me.
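
To illustrate the pitch rule in the descriptions above, here is a rough
sketch. It assumes a Tile 4 tile is 128 bytes wide and 32 rows high; that
is consistent with a 4K tile, but the 128 byte width is an assumption that
is not stated in this thread. The macro and function names are
hypothetical and this is not the i915 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one Tile 4 tile is 128 bytes wide and 32 rows high. */
#define TILE4_WIDTH_BYTES 128u
#define TILE4_HEIGHT_ROWS 32u
/* From the modifier description: pitch is a multiple of four tile widths. */
#define DG2_CCS_PITCH_TILES 4u
#define DG2_CCS_PITCH_ALIGN (TILE4_WIDTH_BYTES * DG2_CCS_PITCH_TILES)

/* Sanity check that the assumed tile geometry gives a 4 KiB tile. */
static_assert(TILE4_WIDTH_BYTES * TILE4_HEIGHT_ROWS == 4096u,
              "Tile 4 is expected to be 4 KiB");

/* True if the main surface pitch (in bytes) satisfies the alignment rule. */
static bool dg2_ccs_pitch_is_valid(uint32_t pitch)
{
    return pitch != 0 && (pitch % DG2_CCS_PITCH_ALIGN) == 0;
}

/* Round a pitch (in bytes) up to the next valid value. */
static uint64_t dg2_ccs_align_pitch(uint32_t pitch)
{
    return ((uint64_t)pitch + DG2_CCS_PITCH_ALIGN - 1) /
           DG2_CCS_PITCH_ALIGN * DG2_CCS_PITCH_ALIGN;
}
```

Under these assumptions a 1920 pixel wide XRGB8888 surface has a pitch of
7680 bytes, which is already a multiple of 512, while a 1366 pixel wide one
(5464 bytes) would have to be padded to 5632 bytes.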

> Happy to look at further revisions, but with that change at least,
> Acked-by: Nanley Chery <nanley.g.chery@intel.com>

Thanks. 

--Imre

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
  2022-03-23 23:42                       ` Chery, Nanley G
@ 2022-03-24 14:45                         ` Imre Deak
  -1 siblings, 0 replies; 80+ messages in thread
From: Imre Deak @ 2022-03-24 14:45 UTC (permalink / raw)
  To: Chery, Nanley G
  Cc: Nanley Chery, Juha-Pekka Heikkila, intel-gfx, dri-devel, Auld, Matthew

On Thu, Mar 24, 2022 at 01:42:33AM +0200, Chery, Nanley G wrote:
> > -----Original Message-----
> > From: Deak, Imre <imre.deak@intel.com>
> > Sent: Monday, March 21, 2022 6:20 AM
> > To: Chery, Nanley G <nanley.g.chery@intel.com>; Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> > Cc: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> > Auld, Matthew <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> > Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
> > 
> > Hi Nanley, JP,
> > 
> > On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
> > > [...]
> > > > > > > > > > > diff --git a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644
> > > > > > > > > > > --- a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > > +++ b/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > > @@ -605,6 +605,16 @@ extern "C" {
> > > > > > > > > > >       */
> > > > > > > > > > >      #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
> > > > > > > > > > >
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression.
> > > > > > > > > > > + *
> > > > > > > > > > > + * DG2 uses a unified compression format for clear color render compression.
> > > > > > > > > >
> > > > > > > > > > What's unified about DG2's compression format? If this doesn't
> > > > > > > > > > affect the layout, maybe we should drop this sentence.
> > 
> > Unified here probably refers to the fact the DG2 render engine is
> > capable of generating both a render and a media compressed surface as
> > opposed to earlier platforms. The display engine still needs to know
> > which compression format the FB uses, hence we need both an RC and MC
> > modifier. Based on this I also think we can drop the mention of unified
> > compression.
> > 
> > > > > > > > > > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> > > > > > > > > > > + *
> > > > > > > > > >
> > > > > > > > > > This also needs a pitch aligned to four tiles, right? I think we
> > > > > > > > > > can save some effort by referencing the DG2_RC_CCS modifier here.
> > > > > > > > > >
> > > > > > > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1
> > > > > > > > > >
> > > > > > > > > > Why is the expected offset hardcoded to 0 instead of relying on
> > > > > > > > > > the offset provided by the modifier API? This looks like a bug.
> > > > > > > > >
> > > > > > > > > Hi Nanley,
> > > > > > > > >
> > > > > > > > > can you elaborate a bit, which offset from modifier API that
> > > > > > > > > applies to cc surface?
> > > > > > > >
> > > > > > > > Hi Juha-Pekka,
> > > > > > > >
> > > > > > > > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
> > > > > > >
> > > > > > > Hi Nanley,
> > > > > > >
> > > > > > > this offset comes from userspace when the framebuffer is created; at
> > > > > > > that point the userspace caller can pick whatever offset it wants.
> > > > > > > Normally offsets[0] is set to 0 and offsets[n] to the start of plane n,
> > > > > > > which is not required to be exactly at the end of plane n-1. Or did I
> > > > > > > misunderstand what you meant?
> > > > > >
> > > > > > Perhaps, at least, I'm not sure what you're meaning to say. This
> > > > > > modifier description seems to say that the drm_mode_fb_cmd2::offsets
> > > > > > value for the clear color plane must be zero. Are you saying that it's
> > > > > > correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
> > > > > > match mesa's expectations.
> > > > >
> > > > > It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must
> > > > > be zero", it says "Fast clear color value expected by HW is located in fb at offset 0
> > > > > of plane#1".
> > > >
> > > > Yes, it doesn't say that exactly, but that's what it seems to say. With every other
> > > > modifier, it's implied that the data for the plane begins at the offset specified
> > > > through the modifier API. So, explicitly mentioning it here (and with that wording)
> > > > conveys a new requirement.
> > >
> > > I don't have objections on changing this description but for reference gen12
> > > version of the same says "The main surface is Y-tiled and is at plane index
> > > 0 whereas CCS is linear and at index 1. The clear color is stored at index
> > > 2, and the pitch should be ignored.", only plane indexes are mentioned. I
> > > anyway wrote neither of these descriptions.
> > >
> > > > > Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's
> > > > > nothing stated about that offset.
> > > >
> > > > Technically, plane #1's location is specified to be the combination of ::handles[1]
> > > > and ::offsets[1]. In practice though, I can imagine that there are areas of the stack
> > > > that are implicitly requiring that all ::handles[] entries match.
> > 
> > The FB modifier API requires all ::handles[] to match, that is all
> > planes must be contained in one GEM object.
> 
> Is this a requirement for i915, or for all DRM drivers? I couldn't find anything in the
> generic DRM headers or docs requiring this. Feel free to ping me about this offline.

It's only an i915 requirement actually.

IIUC, it was added for the SKL CCS modifiers (2e2adb05736c3), where SKL
has a restriction on the location of the CCS (and UV) planes. The
feasible way to conform to these limits was to require all planes to
reside in the same GEM object.

For DG2 there was no plan to expose the CCS plane, so this restriction
made no difference in that respect.
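
A minimal sketch of the i915 restriction described above, in case it helps:
reject a framebuffer whose planes do not all come from the same GEM object.
The helper name and the MAX_FB_PLANES constant are made up for
illustration. Real code would take handles[] from struct drm_mode_fb_cmd2,
and this is not the actual i915 check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the size of drm_mode_fb_cmd2::handles[]. */
#define MAX_FB_PLANES 4

/*
 * Returns true if every plane in use refers to the same GEM handle.
 * num_planes is the number of planes the format/modifier pair defines.
 */
static bool fb_planes_share_one_bo(const uint32_t handles[MAX_FB_PLANES],
                                   unsigned int num_planes)
{
    unsigned int i;

    assert(handles != NULL);

    if (num_planes == 0 || num_planes > MAX_FB_PLANES)
        return false;

    /* Compare every further plane against plane 0. */
    for (i = 1; i < num_planes; i++) {
        if (handles[i] != handles[0])
            return false;
    }

    return true;
}
```

For an RC_CCS_CC framebuffer on DG2 this would be called with
num_planes == 2 (main surface plus clear color plane).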

> > > I didn't think we needed to go deeper, as you had started to talk only about
> > > drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
> > >
> > > > > These offsets are just offsets into the bo which contains the framebuffer
> > > > > information, hence drm_mode_fb_cmd2::offsets[1] can be changed as one wishes
> > > > > and the cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
> > > >
> > > > If the clear color handling is the same as GEN12_RC_CCS_CC (apart from the plane
> > > > index), I propose that we drop this sentence to avoid any confusion.
> > >
> > > But it needs to be defined as part of the modifier. It's the modifier features
> > > which are being described here.
> > >
> > > > This offset discussion raises another question. The description says that the value
> > > > expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine?
> > > > The kernel is still giving the display engine the packed values at ::offsets[1] + 16B, right?
> > >
> > > Generally the answer is yes, but these parts you can see in patch "[PATCH v5
> > > 17/19] drm/i915/dg2: Flat CCS Support" and they should be discussed there. Here
> > > "HW" should probably be changed to something meaningful though.
> > 
> > The 256 bit clear color format starting at plane index 1 matches the one
> > described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC. So yes, "HW" refers
> > to the render engine and display consumes the 64 bit data at
> > ::offsets[1] + 16 bytes (and DE ignores the 64 bit data starting at
> > ::offsets[1] + 24 bytes).
> > 
> > The following captures all the above, would it be ok?:
> > 
> > Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
> > 
> > The main surface is Tile 4 and at plane index 0. The CCS data is stored
> > outside of the GEM object in a reserved memory area dedicated for the
> > storage of the CCS data from all GEM objects. The main surface pitch is
> "for all compressible GEM objects" instead of "from all GEM objects"? (since SMEM objects don't have this)

Makes sense, optionally spelled out as "for all RC/RC_CC/MC CCS
compressible GEM objects".

> > required to be a multiple of four Tile 4 widths. The clear color is stored
> > at plane index 1 and the pitch should be ignored. The format of the 256
> > bits clear color data matches the one used for the
>    ^
> "256 bits of"?

Ok.

> > I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
> > for details.
> 
> Looks good to me. With the above minor changes,
> Acked-by: Nanley Chery <nanley.g.chery@intel.com>

Thanks.
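
For reference, the 256 bits of clear color data discussed above could be
pictured roughly as follows. This is only a sketch: the struct, field and
function names are hypothetical, and the split of the first 128 bits into
four 32 bit raw channel values follows the GEN12_RC_CCS_CC description
rather than anything stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of the clear color plane; names are hypothetical. */
struct clear_color_plane {
    uint32_t raw_rgba[4];      /* bytes 0..15: raw clear color, used by the render engine */
    uint32_t converted[2];     /* bytes 16..23: 64 bits consumed by the display engine */
    uint32_t ignored_by_de[2]; /* bytes 24..31: 64 bits the display engine ignores */
};

static_assert(sizeof(struct clear_color_plane) == 32,
              "clear color data is 256 bits");
static_assert(offsetof(struct clear_color_plane, converted) == 16,
              "display engine data starts 16 bytes into the plane");
static_assert(offsetof(struct clear_color_plane, ignored_by_de) == 24,
              "ignored data starts 24 bytes into the plane");

/*
 * Byte offset inside the GEM object at which the display engine reads its
 * 64 bits, given the plane 1 offset from drm_mode_fb_cmd2::offsets[1].
 */
static uint64_t de_clear_color_offset(uint32_t plane1_offset)
{
    return (uint64_t)plane1_offset +
           offsetof(struct clear_color_plane, converted);
}
```

So with ::offsets[1] == 4096 the display engine would read its 64 bits
starting at byte 4112 of the object.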

--Imre

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
@ 2022-03-24 14:45                         ` Imre Deak
  0 siblings, 0 replies; 80+ messages in thread
From: Imre Deak @ 2022-03-24 14:45 UTC (permalink / raw)
  To: Chery, Nanley G; +Cc: Nanley Chery, intel-gfx, dri-devel, Auld, Matthew

On Thu, Mar 24, 2022 at 01:42:33AM +0200, Chery, Nanley G wrote:
> > -----Original Message-----
> > From: Deak, Imre <imre.deak@intel.com>
> > Sent: Monday, March 21, 2022 6:20 AM
> > To: Chery, Nanley G <nanley.g.chery@intel.com>; Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> > Cc: Nanley Chery <nanleychery@gmail.com>; C, Ramalingam <ramalingam.c@intel.com>; intel-gfx <intel-gfx@lists.freedesktop.org>;
> > Auld, Matthew <matthew.auld@intel.com>; dri-devel <dri-devel@lists.freedesktop.org>
> > Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
> > 
> > Hi Nanley, JP,
> > 
> > On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
> > > [...]
> > > > > > > > > > > diff --git a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644
> > > > > > > > > > > --- a/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > > +++ b/include/uapi/drm/drm_fourcc.h
> > > > > > > > > > > @@ -605,6 +605,16 @@ extern "C" {
> > > > > > > > > > >       */
> > > > > > > > > > >      #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
> > > > > > > > > > >
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression.
> > > > > > > > > > > + *
> > > > > > > > > > > + * DG2 uses a unified compression format for clear color render compression.
> > > > > > > > > >
> > > > > > > > > > What's unified about DG2's compression format? If this doesn't
> > > > > > > > > > affect the layout, maybe we should drop this sentence.
> > 
> > Unified here probably refers to the fact the DG2 render engine is
> > capable of generating both a render and a media compressed surface as
> > opposed to earlier platforms. The display engine still needs to know
> > which compression format the FB uses, hence we need both an RC and MC
> > modifier. Based on this I also think we can drop the mention of unified
> > compression.
> > 
> > > > > > > > > > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
> > > > > > > > > > > + *
> > > > > > > > > >
> > > > > > > > > > This also needs a pitch aligned to four tiles, right? I think we
> > > > > > > > > > can save some effort by referencing the DG2_RC_CCS modifier here.
> > > > > > > > > >
> > > > > > > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1
> > > > > > > > > >
> > > > > > > > > > Why is the expected offset hardcoded to 0 instead of relying on
> > > > > > > > > > the offset provided by the modifier API? This looks like a bug.
> > > > > > > > >
> > > > > > > > > Hi Nanley,
> > > > > > > > >
> > > > > > > > > can you elaborate a bit, which offset from modifier API that
> > > > > > > > > applies to cc surface?
> > > > > > > >
> > > > > > > > Hi Juha-Pekka,
> > > > > > > >
> > > > > > > > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
> > > > > > >
> > > > > > > Hi Nanley,
> > > > > > >
> > > > > > > this offset is coming from userspace on creation of framebuffer, at
> > > > > > > that moment from userspace caller can point to offset of desire.
> > > > > > > Normally offset[0] is set at 0 and then offset[n] at plane n start
> > > > > > > which is not stated to have to be exactly after plane n-1 end. Or did I
> > > > > > > misunderstand what you meant?
> > > > > >
> > > > > > Perhaps, at least, I'm not sure what you're meaning to say. This
> > > > > > modifier description seems to say that the drm_mode_fb_cmd2::offsets
> > > > > > value for the clear color plane must be zero. Are you saying that it's
> > > > > > correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
> > > > > > match mesa's expectations.
> > > > >
> > > > > It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must
> > > > > be zero", it says "Fast clear color value expected by HW is located in fb at offset 0
> > > > > of plane#1".
> > > >
> > > > Yes, it doesn't say that exactly, but that's what it seems to say. With every other
> > > > modifier, it's implied that the data for the plane begins at the offset specified
> > > > through the modifier API. So, explicitly mentioning it here (and with that wording)
> > > > conveys a new requirement.
> > >
> > > I don't have objections on changing this description but for reference gen12
> > > version of the same says "The main surface is Y-tiled and is at plane index
> > > 0 whereas CCS is linear and at index 1. The clear color is stored at index
> > > 2, and the pitch should be ignored.", only plane indexes are mentioned. I
> > > anyway wrote neither of these descriptions.
> > >
> > > > > Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's
> > > > > nothing stated about that offset.
> > > >
> > > > Technically, plane #1's location is specified to be the combination of ::handles[1]
> > > > and ::offsets[1]. In practice though, I can imagine that there are areas of the stack
> > > > that are implicitly requiring that all ::handles[] entries match.
> > 
> > The FB modifier API requires all ::handles[] to match, that is all
> > planes must be contained in one GEM object.
> 
> This is a requirement for i915, or for all drm drivers? I couldn't find anything in the
> generic DRM headers or docs requiring this. Feel free to ping me about this offline.

It's only an i915 requirement actually.

IIUC, it was added for the SKL CCS modifiers (2e2adb05736c3), where SKL
has a restriction on the location of the CCS (and UV) planes. The
feasible way to conform to these limits was to require all planes to
reside in the same GEM object.

For DG2 there was no plan to expose the CCS plane, so wrt. that this
restriction didn't make a difference.
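To make the i915 restriction above concrete, here is a minimal sketch of the kind of check it implies. The struct and function names are hypothetical stand-ins (only the handles[] and offsets[] fields of drm_mode_fb_cmd2 are mirrored), and this is not the driver's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct drm_mode_fb_cmd2. */
struct fb_cmd2_sketch {
	uint32_t handles[4];	/* one GEM handle per plane */
	uint32_t offsets[4];	/* byte offset of each plane in its object */
};

/*
 * Returns true when every plane in [0, num_planes) uses the same GEM
 * handle as plane 0, i.e. all planes live in one GEM object.
 */
static bool fb_planes_share_one_object(const struct fb_cmd2_sketch *cmd,
				       int num_planes)
{
	int i;

	assert(num_planes >= 1 && num_planes <= 4);

	for (i = 1; i < num_planes; i++) {
		if (cmd->handles[i] != cmd->handles[0])
			return false;
	}

	return true;
}
```

Note that the check says nothing about offsets[]: each plane can start anywhere inside the shared object, which is the point made earlier in the thread.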

> > > I didn't think we needed to go deeper as you just started talking about
> > > drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
> > >
> > > > > These offsets are just offsets to bo which contain the framebuffer information
> > > > > hence drm_mode_fb_cmd2::offsets[1] can be changed as one wishes and cc
> > > > > information is found starting at drm_mode_fb_cmd2::offsets[1][0]
> > > >
> > > > If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane
> > > > index), I propose that we drop this sentence to avoid any confusion.
> > >
> > > But it needs to be defined as part of the modifier. It's the modifier features
> > > which are being described here.
> > >
> > > > This offset discussion raises another question. The description says that the value
> > > > expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine?
> > > > The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
> > >
> > > Generally answer is yes but these parts you can see in patch "[PATCH v5
> > > 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed there. Here
> > > "HW" should probably be changed to something meaningful though.
> > 
> > The 256 bit clear color format starting at plane index 1 matches the one
> > described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC . So yes, "HW" refers
> > to the render engine and display consumes the 64 bit data at
> > ::offsets[1] + 16 bytes (and DE ignores the 64 bit data starting at
> > ::offsets[1] + 24 bytes).
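The layout described above can be sketched as a C struct. The struct, field and helper names are hypothetical; only the sizes and byte offsets come from the thread and from the code comment in intel_display.c (4 x 4 bytes per-channel value, then 8 bytes of native color used by the display):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; describes the 256 bit clear color at the cc plane start. */
struct cc_plane_sketch {
	uint32_t channel[4];		/* bytes  0..15: per-channel values */
	uint64_t display_native;	/* bytes 16..23: consumed by the display */
	uint64_t display_ignored;	/* bytes 24..31: ignored by the display */
};

static_assert(sizeof(struct cc_plane_sketch) == 32,
	      "clear color data is 256 bits");
static_assert(offsetof(struct cc_plane_sketch, display_native) == 16,
	      "display value sits 16 bytes into the cc plane");
static_assert(offsetof(struct cc_plane_sketch, display_ignored) == 24,
	      "ignored part sits 24 bytes into the cc plane");

/*
 * Byte offset, within the GEM object, of the value the display reads,
 * given the cc plane offset (offsets[1] for the DG2 modifier).
 */
static uint64_t cc_display_value_offset(uint32_t cc_plane_offset)
{
	return (uint64_t)cc_plane_offset +
	       offsetof(struct cc_plane_sketch, display_native);
}
```

The only difference from GEN12_RC_CCS_CC in this picture is the plane index used to find the start of the struct (1 instead of 2).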
> > 
> > The following captures all the above, would it be ok?:
> > 
> > Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
> > 
> > The main surface is Tile 4 and at plane index 0. The CCS data is stored
> > outside of the GEM object in a reserved memory area dedicated for the
> > storage of the CCS data from all GEM objects. The main surface pitch is
>                                                       ^
> 		                "for all compressible" ? (since SMEM objects don't have this) 

Makes sense, optionally also mentioning that "for all RC/RC_CC/MC CCS
compressible GEM objects".

> > required to be a multiple of four Tile 4 widths. The clear color is stored
> > at plane index 1 and the pitch should be ignored. The format of the 256
> > bits clear color data matches the one used for the
>    ^
> "256 bits of"?

Ok.

> > I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description
> > for details.
> 
> Looks good to me. With the above minor changes,
> Acked-by: Nanley Chery <nanley.g.chery@intel.com>

Thanks.

--Imre


* Re: [Intel-gfx] [PATCH v5 17/19] drm/i915/dg2: Flat CCS Support
  2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
@ 2022-03-24 16:16   ` Imre Deak
From: Imre Deak @ 2022-03-24 16:16 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx, Matthew Auld, dri-devel

On Tue, Feb 01, 2022 at 04:11:30PM +0530, Ramalingam C wrote:
> From: Anshuman Gupta <anshuman.gupta@intel.com>
> 
> From DG2 onwards, discrete gfx has support for the new flat CCS mapping,
> which brings in a display feature to avoid the Aux walk for compressed
> surfaces. This support builds on top of the Flat CCS support added in XEHPSDV.
> The FLAT CCS surface base address should be 64k aligned.
> Compressed displayable surfaces must use the tile4 format.
> 
> HAS: 1407880786
> B.Spec : 7655
> B.Spec : 53902
> 
> Cc: Mika Kahola <mika.kahola@intel.com>
> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/display/intel_display.c  |  4 ++-
>  drivers/gpu/drm/i915/display/intel_fb.c       | 32 +++++++++++++------
>  .../drm/i915/display/skl_universal_plane.c    | 16 ++++++----
>  3 files changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 189767cef356..2828ae612179 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -8588,7 +8588,9 @@ static void intel_atomic_prepare_plane_clear_colors(struct intel_atomic_state *s
>  
>  		/*
>  		 * The layout of the fast clear color value expected by HW
> -		 * (the DRM ABI requiring this value to be located in fb at offset 0 of plane#2):
> +		 * (the DRM ABI requiring this value to be located in fb at
> +		 * offset 0 of cc plane, plane #2 on previous generations or
> +		 * plane #1 for flat ccs):
>  		 * - 4 x 4 bytes per-channel value
>  		 *   (in surface type specific float/int format provided by the fb user)
>  		 * - 8 bytes native color value used by the display
> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> index 3df6ef5ffec5..e94923e9dbb1 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> @@ -107,6 +107,21 @@ static const struct drm_format_info gen12_ccs_cc_formats[] = {
>  	  .hsub = 1, .vsub = 1, .has_alpha = true },
>  };
>  
> +static const struct drm_format_info gen12_flat_ccs_cc_formats[] = {
> +	{ .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2,
> +	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
> +	  .hsub = 1, .vsub = 1, },
> +	{ .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2,
> +	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
> +	  .hsub = 1, .vsub = 1, },
> +	{ .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2,
> +	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
> +	  .hsub = 1, .vsub = 1, .has_alpha = true },
> +	{ .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2,
> +	  .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
> +	  .hsub = 1, .vsub = 1, .has_alpha = true },
> +};
> +
>  struct intel_modifier_desc {
>  	u64 modifier;
>  	struct {
> @@ -150,6 +165,8 @@ static const struct intel_modifier_desc intel_modifiers[] = {
>  		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
>  
>  		.ccs.cc_planes = BIT(1),
> +
> +		FORMAT_OVERRIDE(gen12_flat_ccs_cc_formats),
>  	}, {
>  		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
>  		.display_ver = { 13, 13 },
> @@ -399,17 +416,13 @@ bool intel_fb_plane_supports_modifier(struct intel_plane *plane, u64 modifier)
>  static bool format_is_yuv_semiplanar(const struct intel_modifier_desc *md,
>  				     const struct drm_format_info *info)
>  {
> -	int yuv_planes;
> -
>  	if (!info->is_yuv)
>  		return false;
>  
> -	if (plane_caps_contain_any(md->plane_caps, INTEL_PLANE_CAP_CCS_MASK))
> -		yuv_planes = 4;
> +	if (hweight8(md->ccs.planar_aux_planes) == 2)
> +		return info->num_planes == 4;
>  	else
> -		yuv_planes = 2;
> -
> -	return info->num_planes == yuv_planes;
> +		return info->num_planes == 2;
>  }
>  
>  /**
> @@ -534,12 +547,13 @@ static unsigned int gen12_ccs_aux_stride(struct intel_framebuffer *fb, int ccs_p
>  
>  int skl_main_to_aux_plane(const struct drm_framebuffer *fb, int main_plane)
>  {
> +	const struct intel_modifier_desc *md = lookup_modifier(fb->modifier);
>  	struct drm_i915_private *i915 = to_i915(fb->dev);
>  
> -	if (intel_fb_is_ccs_modifier(fb->modifier))
> +	if (md->ccs.packed_aux_planes | md->ccs.planar_aux_planes)
>  		return main_to_ccs_plane(fb, main_plane);
>  	else if (DISPLAY_VER(i915) < 11 &&
> -		 intel_format_info_is_yuv_semiplanar(fb->format, fb->modifier))
> +		 format_is_yuv_semiplanar(md, fb->format))
>  		return 1;
>  	else
>  		return 0;
> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> index b4dced1907c5..18e50583abaa 100644
> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> @@ -1176,8 +1176,10 @@ skl_plane_update_arm(struct intel_plane *plane,
>  	intel_de_write_fw(dev_priv, PLANE_OFFSET(pipe, plane_id),
>  			  PLANE_OFFSET_Y(y) | PLANE_OFFSET_X(x));
>  
> -	intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
> -			  skl_plane_aux_dist(plane_state, color_plane));
> +	/* FLAT CCS doesn't need to program AUX_DIST */
> +	if (!HAS_FLAT_CCS(dev_priv))
> +		intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
> +				  skl_plane_aux_dist(plane_state, color_plane));
>  
>  	if (DISPLAY_VER(dev_priv) < 11)
>  		intel_de_write_fw(dev_priv, PLANE_AUX_OFFSET(pipe, plane_id),
> @@ -1557,9 +1559,10 @@ static int skl_check_main_surface(struct intel_plane_state *plane_state)
>  
>  	/*
>  	 * CCS AUX surface doesn't have its own x/y offsets, we must make sure
> -	 * they match with the main surface x/y offsets.
> +	 * they match with the main surface x/y offsets. On DG2
> +	 * there's no aux plane on fb so skip this checking.
>  	 */
> -	if (intel_fb_is_ccs_modifier(fb->modifier)) {
> +	if (intel_fb_is_ccs_modifier(fb->modifier) && aux_plane) {
>  		while (!skl_check_main_ccs_coordinates(plane_state, x, y,
>  						       offset, aux_plane)) {
>  			if (offset == 0)
> @@ -1603,6 +1606,8 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state)
>  	const struct drm_framebuffer *fb = plane_state->hw.fb;
>  	unsigned int rotation = plane_state->hw.rotation;
>  	int uv_plane = 1;
> +	int ccs_plane = intel_fb_is_ccs_modifier(fb->modifier) ?
> +			skl_main_to_aux_plane(fb, uv_plane) : 0;
>  	int max_width = intel_plane_max_width(plane, fb, uv_plane, rotation);
>  	int max_height = intel_plane_max_height(plane, fb, uv_plane, rotation);
>  	int x = plane_state->uapi.src.x1 >> 17;
> @@ -1623,8 +1628,7 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state)
>  	offset = intel_plane_compute_aligned_offset(&x, &y,
>  						    plane_state, uv_plane);
>  
> -	if (intel_fb_is_ccs_modifier(fb->modifier)) {
> -		int ccs_plane = main_to_ccs_plane(fb, uv_plane);
> +	if (ccs_plane) {
>  		u32 aux_offset = plane_state->view.color_plane[ccs_plane].offset;
>  		u32 alignment = intel_surf_alignment(fb, uv_plane);
>  
> -- 
> 2.20.1
> 
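For reference, the reworked format_is_yuv_semiplanar() logic in the intel_fb.c hunk can be exercised on its own with a small sketch. The function names below are hypothetical, hweight8_sketch() stands in for the kernel's hweight8(), and the mask argument stands in for md->ccs.planar_aux_planes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's hweight8(): number of set bits in a byte. */
static unsigned int hweight8_sketch(uint8_t mask)
{
	unsigned int count = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		count++;
	}

	return count;
}

/*
 * Mirrors the patched check: with two planar aux planes the fb has
 * 4 planes (Y, UV, and a CCS plane for each), otherwise 2 (Y, UV).
 */
static bool is_yuv_semiplanar_sketch(bool is_yuv, uint8_t planar_aux_planes,
				     int num_planes)
{
	assert(num_planes >= 1);

	if (!is_yuv)
		return false;

	if (hweight8_sketch(planar_aux_planes) == 2)
		return num_planes == 4;

	return num_planes == 2;
}
```

As I read the patch, the point of counting aux planes instead of testing INTEL_PLANE_CAP_CCS_MASK is that a flat CCS modifier has CCS capability but no aux planes in the fb, so a semiplanar YUV fb there still has only 2 planes.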


end of thread, other threads:[~2022-03-24 16:16 UTC | newest]

Thread overview: 80+ messages
2022-02-01 10:41 [PATCH v5 00/19] drm/i915/dg2: Enabling 64k page size and flat ccs Ramalingam C
2022-02-01 10:41 ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 01/19] drm/i915: add needs_compact_pt flag Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 02/19] drm/i915: enforce min GTT alignment for discrete cards Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 03/19] drm/i915: support 64K GTT pages " Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 04/19] drm/i915: add gtt misalignment test Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 05/19] drm/i915/gtt: allow overriding the pt alignment Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 06/19] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 07/19] drm/i915/migrate: add acceleration support for DG2 Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:49   ` Matthew Auld
2022-02-01 10:49     ` [Intel-gfx] " Matthew Auld
2022-02-01 10:41 ` [PATCH v5 08/19] drm/i915/uapi: document behaviour for DG2 64K support Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 09/19] Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-18  5:39   ` Lucas De Marchi
2022-02-18  5:39     ` Lucas De Marchi
2022-02-18  8:20     ` Ramalingam C
2022-02-18  8:20       ` Ramalingam C
2022-02-01 10:41 ` [PATCH v5 10/19] drm/i915/xehpsdv: Add has_flat_ccs to device info Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 11/19] drm/i915/lmem: Enable lmem for platforms with Flat CCS Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-18 10:08   ` Lucas De Marchi
2022-02-18 10:17     ` Lucas De Marchi
2022-02-01 10:41 ` [PATCH v5 12/19] drm/i915/gt: Clear compress metadata for Xe_HP platforms Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 13/19] drm/i915: Introduce new Tile 4 format Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 14/19] drm/i915/dg2: Tile 4 plane format support Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-12  1:17   ` Nanley Chery
2022-02-15 14:53     ` Juha-Pekka Heikkila
2022-02-17 17:15       ` Chery, Nanley G
2022-02-17 17:15         ` Chery, Nanley G
2022-03-18 17:39         ` Imre Deak
2022-03-18 17:39           ` Imre Deak
2022-03-23 23:40           ` Chery, Nanley G
2022-03-23 23:40             ` Chery, Nanley G
2022-03-24 14:19             ` Imre Deak
2022-03-24 14:19               ` Imre Deak
2022-02-01 10:41 ` [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-12  1:19   ` Nanley Chery
2022-02-15 14:55     ` Juha-Pekka Heikkila
2022-02-15 15:02       ` Chery, Nanley G
2022-02-15 15:02         ` Chery, Nanley G
2022-02-15 16:15         ` Juha-Pekka Heikkila
2022-02-15 16:44           ` Chery, Nanley G
2022-02-15 16:44             ` Chery, Nanley G
2022-02-15 17:31             ` Juha-Pekka Heikkila
2022-02-15 18:24               ` Chery, Nanley G
2022-02-15 18:24                 ` Chery, Nanley G
2022-02-15 19:34                 ` Juha-Pekka Heikkila
2022-03-21 13:20                   ` Imre Deak
2022-03-23 23:42                     ` Chery, Nanley G
2022-03-23 23:42                       ` Chery, Nanley G
2022-03-24 14:45                       ` Imre Deak
2022-03-24 14:45                         ` Imre Deak
2022-02-01 10:41 ` [PATCH v5 17/19] drm/i915/dg2: Flat CCS Support Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-03-24 16:16   ` Imre Deak
2022-02-01 10:41 ` [PATCH v5 18/19] drm/i915/Flat-CCS: Document on Flat-CCS memory compression Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 10:41 ` [PATCH v5 19/19] Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI Ramalingam C
2022-02-01 10:41   ` [Intel-gfx] " Ramalingam C
2022-02-01 12:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/dg2: Enabling 64k page size and flat ccs (rev5) Patchwork
2022-02-01 12:47 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-02-01 13:15 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-02-18 19:04 ` [PATCH v5 00/19] drm/i915/dg2: Enabling 64k page size and flat ccs Ramalingam C
2022-02-18 19:04   ` [Intel-gfx] " Ramalingam C
