All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs
@ 2021-10-21 14:26 ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

This series introduces the enabling patches for new memory compression
feature Flat CCS and 64k page support for i915 local memory, along with
documentation on the uAPI impact. Included the details of the feature and
the implications on the uAPI below. Which is also added into
Documentation/gpu/rfc/i915_dg2.rst

DG2 64K page size support:
=========================

On discrete platforms, starting from DG2, we have to contend with GTT
page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
objects. Specifically the hardware only supports 64K or larger GTT page
sizes for such memory. The kernel will already ensure that all
I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
sizes underneath.

Note that the returned size here will always reflect any required
rounding up done by the kernel, i.e 4K will now become 64K on devices
such as DG2.

Special DG2 GTT address alignment requirement:
=============================================

The GTT alignment will also need be at least 64K for such objects.

Note that due to how the hardware implements 64K GTT page support, we
have some further complications:

1) The entire PDE(which covers a 2M virtual address range), must contain
only 64K PTEs, i.e mixing 4K and 64K PTEs in the same PDE is forbidden
by the hardware.

2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
objects.

To handle the above the kernel implements a memory coloring scheme to
prevent userspace from mixing I915_MEMORY_CLASS_DEVICE and
I915_MEMORY_CLASS_SYSTEM objects in the same PDE. If the kernel is ever
unable to evict the required pages for the given PDE(different color)
when inserting the object into the GTT then it will simply fail the
request.

Since userspace needs to manage the GTT address space themselves,
special care is needed to ensure this doesn’t happen. The simplest
scheme is to simply align and round up all I915_MEMORY_CLASS_DEVICE
objects to 2M, which avoids any issues here. At the very least this is
likely needed for objects that can be placed in both
I915_MEMORY_CLASS_DEVICE and I915_MEMORY_CLASS_SYSTEM, to avoid
potential issues when the kernel needs to migrate the object behind the
scenes, since that might also involve evicting other objects.

To summarise the GTT rules, on platforms like DG2:

1) All objects that can be placed in I915_MEMORY_CLASS_DEVICE must have
64K alignment. The kernel will reject this otherwise.

2) All I915_MEMORY_CLASS_DEVICE objects must never be placed in the same
PDE with other I915_MEMORY_CLASS_SYSTEM objects. The kernel will reject
this otherwise.

3) Objects that can be placed in both I915_MEMORY_CLASS_DEVICE and
I915_MEMORY_CLASS_SYSTEM should probably be aligned and padded out to
2M.

Flat CCS support for lmem
=========================
On Xe-HP and later devices, we use dedicated compression control state
(CCS) stored in local memory for each surface, to support the 3D and
media compression formats.

The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boot, the required memory is
reserved for the CCS data and a secure register will be programmed with
the CCS base address.

Flat CCS data needs to be cleared when a lmem object is allocated. And
CCS data can be copied in and out of CCS region through
XY_CTRL_SURF_COPY_BLT. CPU can’t access the CCS data directly.

When we exaust the lmem, if the object’s placements support smem, then
we can directly decompress the compressed lmem object into smem and
start using it from smem itself.

But when we need to swapout the compressed lmem object into a smem
region though objects’ placement doesn’t support smem, then we copy the
lmem content as it is into smem region along with ccs data (using
XY_CTRL_SURF_COPY_BLT). When the object is referred, lmem content will
be swaped in along with restoration of the CCS data (using
XY_CTRL_SURF_COPY_BLT) at corresponding location.

Flat-CCS Modifiers for different compression formats
====================================================
I915_FORMAT_MOD_F_TILED_DG2_RC_CCS - used to indicate the buffers of
Flat CCS render compression formats. Though the general layout is same
as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression
algorithm is used. Render compression uses 128 byte compression blocks

I915_FORMAT_MOD_F_TILED_DG2_MC_CCS -used to indicate the buffers of Flat
CCS media compression formats. Though the general layout is same as
I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm
is used. Media compression uses 256 byte compression blocks.

I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC - used to indicate the buffers of
Flat CCS clear color render compression formats. Unified compression
format for clear color render compression. The genral layout is a tiled
layout using 4Kb tiles i.e Tile4 layout.

v2:
  Fixed some formatting issues and platform naming issues
  Added some more documentation on Flat-CCS


Abdiel Janulgue (1):
  drm/i915/lmem: Enable lmem for platforms with Flat CCS

Ayaz A Siddiqui (1):
  drm/i915/gt: Clear compress metadata for Xe_HP platforms

Bommu Krishnaiah (1):
  drm/i915: Add vm min alignment support

CQ Tang (1):
  drm/i915/xehpsdv: Add has_flat_ccs to device info

Matt Roper (1):
  uapi/drm/dg2: Format modifier for DG2 unified compression and clear
    color

Matthew Auld (8):
  drm/i915/xehpsdv: set min page-size to 64K
  drm/i915/xehpsdv: enforce min GTT alignment
  drm/i915: enforce min page size for scratch
  drm/i915/gtt/xehpsdv: move scratch page to system memory
  drm/i915/xehpsdv: support 64K GTT pages
  drm/i915/selftests: account for min_alignment in GTT selftests
  drm/i915/xehpsdv: implement memory coloring
  drm/i915/uapi: document behaviour for DG2 64K support

Ramalingam C (2):
  drm/i915/Flat-CCS: Document on Flat-CCS memory compression
  Doc/gpu/rfc/i915: i915 DG2 uAPI

Stanislav Lisovskiy (1):
  drm/i915/dg2: Tile 4 plane format support

Stuart Summers (1):
  drm/i915: Add has_64k_pages flag

 Documentation/gpu/rfc/i915_dg2.rst            |  32 ++++
 Documentation/gpu/rfc/index.rst               |   3 +
 drivers/gpu/drm/i915/display/intel_display.c  |   4 +
 .../drm/i915/display/intel_display_types.h    |  10 +-
 drivers/gpu/drm/i915/display/intel_fb.c       |  14 ++
 drivers/gpu/drm/i915/display/intel_fbc.c      |   1 +
 .../drm/i915/display/intel_plane_initial.c    |   1 +
 .../drm/i915/display/skl_universal_plane.c    |  81 +++++++--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   6 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  61 +++++++
 .../i915/gem/selftests/i915_gem_client_blt.c  |  23 ++-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   1 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 145 ++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |   2 +
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  14 ++
 drivers/gpu/drm/i915/gt/intel_gt.c            |  19 ++
 drivers/gpu/drm/i915/gt/intel_gt.h            |   1 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  23 ++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  20 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 167 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  27 ++-
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  17 ++
 drivers/gpu/drm/i915/i915_pci.c               |   4 +
 drivers/gpu/drm/i915/i915_reg.h               |   4 +
 drivers/gpu/drm/i915/i915_vma.c               |  55 ++++--
 drivers/gpu/drm/i915/intel_device_info.h      |   3 +
 drivers/gpu/drm/i915/intel_pm.c               |   1 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  96 ++++++----
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |   2 +
 include/uapi/drm/drm_fourcc.h                 |  38 ++++
 include/uapi/drm/i915_drm.h                   |  67 ++++++-
 33 files changed, 861 insertions(+), 87 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_dg2.rst

-- 
2.20.1


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs
@ 2021-10-21 14:26 ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

This series introduces the enabling patches for new memory compression
feature Flat CCS and 64k page support for i915 local memory, along with
documentation on the uAPI impact. Included the details of the feature and
the implications on the uAPI below. Which is also added into
Documentation/gpu/rfc/i915_dg2.rst

DG2 64K page size support:
=========================

On discrete platforms, starting from DG2, we have to contend with GTT
page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
objects. Specifically the hardware only supports 64K or larger GTT page
sizes for such memory. The kernel will already ensure that all
I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
sizes underneath.

Note that the returned size here will always reflect any required
rounding up done by the kernel, i.e 4K will now become 64K on devices
such as DG2.

Special DG2 GTT address alignment requirement:
=============================================

The GTT alignment will also need be at least 64K for such objects.

Note that due to how the hardware implements 64K GTT page support, we
have some further complications:

1) The entire PDE(which covers a 2M virtual address range), must contain
only 64K PTEs, i.e mixing 4K and 64K PTEs in the same PDE is forbidden
by the hardware.

2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
objects.

To handle the above the kernel implements a memory coloring scheme to
prevent userspace from mixing I915_MEMORY_CLASS_DEVICE and
I915_MEMORY_CLASS_SYSTEM objects in the same PDE. If the kernel is ever
unable to evict the required pages for the given PDE(different color)
when inserting the object into the GTT then it will simply fail the
request.

Since userspace needs to manage the GTT address space themselves,
special care is needed to ensure this doesn’t happen. The simplest
scheme is to simply align and round up all I915_MEMORY_CLASS_DEVICE
objects to 2M, which avoids any issues here. At the very least this is
likely needed for objects that can be placed in both
I915_MEMORY_CLASS_DEVICE and I915_MEMORY_CLASS_SYSTEM, to avoid
potential issues when the kernel needs to migrate the object behind the
scenes, since that might also involve evicting other objects.

To summarise the GTT rules, on platforms like DG2:

1) All objects that can be placed in I915_MEMORY_CLASS_DEVICE must have
64K alignment. The kernel will reject this otherwise.

2) All I915_MEMORY_CLASS_DEVICE objects must never be placed in the same
PDE with other I915_MEMORY_CLASS_SYSTEM objects. The kernel will reject
this otherwise.

3) Objects that can be placed in both I915_MEMORY_CLASS_DEVICE and
I915_MEMORY_CLASS_SYSTEM should probably be aligned and padded out to
2M.

Flat CCS support for lmem
=========================
On Xe-HP and later devices, we use dedicated compression control state
(CCS) stored in local memory for each surface, to support the 3D and
media compression formats.

The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boot, the required memory is
reserved for the CCS data and a secure register will be programmed with
the CCS base address.

Flat CCS data needs to be cleared when a lmem object is allocated. And
CCS data can be copied in and out of CCS region through
XY_CTRL_SURF_COPY_BLT. CPU can’t access the CCS data directly.

When we exaust the lmem, if the object’s placements support smem, then
we can directly decompress the compressed lmem object into smem and
start using it from smem itself.

But when we need to swapout the compressed lmem object into a smem
region though objects’ placement doesn’t support smem, then we copy the
lmem content as it is into smem region along with ccs data (using
XY_CTRL_SURF_COPY_BLT). When the object is referred, lmem content will
be swaped in along with restoration of the CCS data (using
XY_CTRL_SURF_COPY_BLT) at corresponding location.

Flat-CCS Modifiers for different compression formats
====================================================
I915_FORMAT_MOD_F_TILED_DG2_RC_CCS - used to indicate the buffers of
Flat CCS render compression formats. Though the general layout is same
as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression
algorithm is used. Render compression uses 128 byte compression blocks

I915_FORMAT_MOD_F_TILED_DG2_MC_CCS -used to indicate the buffers of Flat
CCS media compression formats. Though the general layout is same as
I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm
is used. Media compression uses 256 byte compression blocks.

I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC - used to indicate the buffers of
Flat CCS clear color render compression formats. Unified compression
format for clear color render compression. The genral layout is a tiled
layout using 4Kb tiles i.e Tile4 layout.

v2:
  Fixed some formatting issues and platform naming issues
  Added some more documentation on Flat-CCS


Abdiel Janulgue (1):
  drm/i915/lmem: Enable lmem for platforms with Flat CCS

Ayaz A Siddiqui (1):
  drm/i915/gt: Clear compress metadata for Xe_HP platforms

Bommu Krishnaiah (1):
  drm/i915: Add vm min alignment support

CQ Tang (1):
  drm/i915/xehpsdv: Add has_flat_ccs to device info

Matt Roper (1):
  uapi/drm/dg2: Format modifier for DG2 unified compression and clear
    color

Matthew Auld (8):
  drm/i915/xehpsdv: set min page-size to 64K
  drm/i915/xehpsdv: enforce min GTT alignment
  drm/i915: enforce min page size for scratch
  drm/i915/gtt/xehpsdv: move scratch page to system memory
  drm/i915/xehpsdv: support 64K GTT pages
  drm/i915/selftests: account for min_alignment in GTT selftests
  drm/i915/xehpsdv: implement memory coloring
  drm/i915/uapi: document behaviour for DG2 64K support

Ramalingam C (2):
  drm/i915/Flat-CCS: Document on Flat-CCS memory compression
  Doc/gpu/rfc/i915: i915 DG2 uAPI

Stanislav Lisovskiy (1):
  drm/i915/dg2: Tile 4 plane format support

Stuart Summers (1):
  drm/i915: Add has_64k_pages flag

 Documentation/gpu/rfc/i915_dg2.rst            |  32 ++++
 Documentation/gpu/rfc/index.rst               |   3 +
 drivers/gpu/drm/i915/display/intel_display.c  |   4 +
 .../drm/i915/display/intel_display_types.h    |  10 +-
 drivers/gpu/drm/i915/display/intel_fb.c       |  14 ++
 drivers/gpu/drm/i915/display/intel_fbc.c      |   1 +
 .../drm/i915/display/intel_plane_initial.c    |   1 +
 .../drm/i915/display/skl_universal_plane.c    |  81 +++++++--
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c    |   6 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  61 +++++++
 .../i915/gem/selftests/i915_gem_client_blt.c  |  23 ++-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   1 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 145 ++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |   2 +
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  14 ++
 drivers/gpu/drm/i915/gt/intel_gt.c            |  19 ++
 drivers/gpu/drm/i915/gt/intel_gt.h            |   1 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  23 ++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  20 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 167 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  27 ++-
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  17 ++
 drivers/gpu/drm/i915/i915_pci.c               |   4 +
 drivers/gpu/drm/i915/i915_reg.h               |   4 +
 drivers/gpu/drm/i915/i915_vma.c               |  55 ++++--
 drivers/gpu/drm/i915/intel_device_info.h      |   3 +
 drivers/gpu/drm/i915/intel_pm.c               |   1 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  96 ++++++----
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |   2 +
 include/uapi/drm/drm_fourcc.h                 |  38 ++++
 include/uapi/drm/i915_drm.h                   |  67 ++++++-
 33 files changed, 861 insertions(+), 87 deletions(-)
 create mode 100644 Documentation/gpu/rfc/i915_dg2.rst

-- 
2.20.1


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v2 01/17] drm/i915: Add has_64k_pages flag
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Stuart Summers, Ramalingam C

From: Stuart Summers <stuart.summers@intel.com>

Add a new platform flag, has_64k_pages, for platforms supporting
base page sizes of 64k.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 2 ++
 drivers/gpu/drm/i915/i915_pci.c          | 2 ++
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 12256218634f..a16fde38a252 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1714,6 +1714,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_MSLICES(dev_priv) \
 	(INTEL_INFO(dev_priv)->has_mslices)
 
+#define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
+
 #define HAS_IPC(dev_priv)		 (INTEL_INFO(dev_priv)->display.has_ipc)
 
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 169837de395d..8ef484a23652 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1015,6 +1015,7 @@ static const struct intel_device_info xehpsdv_info = {
 	DGFX_FEATURES,
 	PLATFORM(INTEL_XEHPSDV),
 	.display = { },
+	.has_64k_pages = 1,
 	.pipe_mask = 0,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
@@ -1033,6 +1034,7 @@ static const struct intel_device_info dg2_info = {
 	.graphics_rel = 55,
 	.media_rel = 55,
 	PLATFORM(INTEL_DG2),
+	.has_64k_pages = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) |
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 8e6f48d1eb7b..dd453b96af19 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -123,6 +123,7 @@ enum intel_ppgtt_type {
 	func(is_dgfx); \
 	/* Keep has_* in alphabetical order */ \
 	func(has_64bit_reloc); \
+	func(has_64k_pages); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_global_mocs); \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 01/17] drm/i915: Add has_64k_pages flag
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Stuart Summers, Ramalingam C

From: Stuart Summers <stuart.summers@intel.com>

Add a new platform flag, has_64k_pages, for platforms supporting
base page sizes of 64k.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 2 ++
 drivers/gpu/drm/i915/i915_pci.c          | 2 ++
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 12256218634f..a16fde38a252 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1714,6 +1714,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_MSLICES(dev_priv) \
 	(INTEL_INFO(dev_priv)->has_mslices)
 
+#define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
+
 #define HAS_IPC(dev_priv)		 (INTEL_INFO(dev_priv)->display.has_ipc)
 
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 169837de395d..8ef484a23652 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1015,6 +1015,7 @@ static const struct intel_device_info xehpsdv_info = {
 	DGFX_FEATURES,
 	PLATFORM(INTEL_XEHPSDV),
 	.display = { },
+	.has_64k_pages = 1,
 	.pipe_mask = 0,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
@@ -1033,6 +1034,7 @@ static const struct intel_device_info dg2_info = {
 	.graphics_rel = 55,
 	.media_rel = 55,
 	PLATFORM(INTEL_DG2),
+	.has_64k_pages = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) |
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 8e6f48d1eb7b..dd453b96af19 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -123,6 +123,7 @@ enum intel_ppgtt_type {
 	func(is_dgfx); \
 	/* Keep has_* in alphabetical order */ \
 	func(has_64bit_reloc); \
+	func(has_64k_pages); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_global_mocs); \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 02/17] drm/i915/xehpsdv: set min page-size to 64K
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

LMEM should be allocated at 64K granularity, since 4K page support will
eventually be dropped for LMEM when using the PPGTT.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c  | 6 +++++-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 ++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index ddd37ccb1362..f52a06f05fc7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -778,6 +778,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
 	struct intel_uncore *uncore = &i915->uncore;
 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
 	struct intel_memory_region *mem;
+	resource_size_t min_page_size;
 	resource_size_t io_start;
 	resource_size_t lmem_size;
 	u64 lmem_base;
@@ -789,8 +790,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
 	lmem_size = pci_resource_len(pdev, 2) - lmem_base;
 	io_start = pci_resource_start(pdev, 2) + lmem_base;
 
+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
+						I915_GTT_PAGE_SIZE_4K;
+
 	mem = intel_memory_region_create(i915, lmem_base, lmem_size,
-					 I915_GTT_PAGE_SIZE_4K, io_start,
+					 min_page_size, io_start,
 					 type, instance,
 					 &i915_region_stolen_lmem_ops);
 	if (IS_ERR(mem))
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index afb35d2e5c73..073d28d96669 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -193,6 +193,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	struct intel_uncore *uncore = gt->uncore;
 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
 	struct intel_memory_region *mem;
+	resource_size_t min_page_size;
 	resource_size_t io_start;
 	resource_size_t lmem_size;
 	int err;
@@ -207,10 +208,12 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
 		return ERR_PTR(-ENODEV);
 
+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
+						I915_GTT_PAGE_SIZE_4K;
 	mem = intel_memory_region_create(i915,
 					 0,
 					 lmem_size,
-					 I915_GTT_PAGE_SIZE_4K,
+					 min_page_size,
 					 io_start,
 					 INTEL_MEMORY_LOCAL,
 					 0,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 02/17] drm/i915/xehpsdv: set min page-size to 64K
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

LMEM should be allocated at 64K granularity, since 4K page support will
eventually be dropped for LMEM when using the PPGTT.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c  | 6 +++++-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 ++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index ddd37ccb1362..f52a06f05fc7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
@@ -778,6 +778,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
 	struct intel_uncore *uncore = &i915->uncore;
 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
 	struct intel_memory_region *mem;
+	resource_size_t min_page_size;
 	resource_size_t io_start;
 	resource_size_t lmem_size;
 	u64 lmem_base;
@@ -789,8 +790,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
 	lmem_size = pci_resource_len(pdev, 2) - lmem_base;
 	io_start = pci_resource_start(pdev, 2) + lmem_base;
 
+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
+						I915_GTT_PAGE_SIZE_4K;
+
 	mem = intel_memory_region_create(i915, lmem_base, lmem_size,
-					 I915_GTT_PAGE_SIZE_4K, io_start,
+					 min_page_size, io_start,
 					 type, instance,
 					 &i915_region_stolen_lmem_ops);
 	if (IS_ERR(mem))
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index afb35d2e5c73..073d28d96669 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -193,6 +193,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	struct intel_uncore *uncore = gt->uncore;
 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
 	struct intel_memory_region *mem;
+	resource_size_t min_page_size;
 	resource_size_t io_start;
 	resource_size_t lmem_size;
 	int err;
@@ -207,10 +208,12 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
 		return ERR_PTR(-ENODEV);
 
+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
+						I915_GTT_PAGE_SIZE_4K;
 	mem = intel_memory_region_create(i915,
 					 0,
 					 lmem_size,
-					 I915_GTT_PAGE_SIZE_4K,
+					 min_page_size,
 					 io_start,
 					 INTEL_MEMORY_LOCAL,
 					 0,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 03/17] drm/i915/xehpsdv: enforce min GTT alignment
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses to 64K, both
for the ppgtt and ggtt.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 90546fa58fc1..c31b4bc8af16 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -670,8 +670,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	}
 
 	color = 0;
-	if (vma->obj && i915_vm_has_cache_coloring(vma->vm))
-		color = vma->obj->cache_level;
+	if (vma->obj) {
+		if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj))
+			alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
+
+		if (i915_vm_has_cache_coloring(vma->vm))
+			color = vma->obj->cache_level;
+	}
 
 	if (flags & PIN_OFFSET_FIXED) {
 		u64 offset = flags & PIN_OFFSET_MASK;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 03/17] drm/i915/xehpsdv: enforce min GTT alignment
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses to 64K, both
for the ppgtt and ggtt.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 90546fa58fc1..c31b4bc8af16 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -670,8 +670,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	}
 
 	color = 0;
-	if (vma->obj && i915_vm_has_cache_coloring(vma->vm))
-		color = vma->obj->cache_level;
+	if (vma->obj) {
+		if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj))
+			alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
+
+		if (i915_vm_has_cache_coloring(vma->vm))
+			color = vma->obj->cache_level;
+	}
 
 	if (flags & PIN_OFFSET_FIXED) {
 		u64 offset = flags & PIN_OFFSET_MASK;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 04/17] drm/i915: enforce min page size for scratch
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

From: Matthew Auld <matthew.auld@intel.com>

If the device needs 64K minimum GTT pages for device local-memory,
like on XEHPSDV, then we need to fail the allocation if we can't
meet it, instead of falling back to 4K pages, otherwise we can't
safely support the insertion of device local-memory pages for
this vm, since the HW expects the correct physical alignment and
size for every PTE, if we mark the page-table as 64K GTT mode.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 67d14afa6623..2a6eec5f0d58 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -334,6 +334,18 @@ int setup_scratch_page(struct i915_address_space *vm)
 		if (size == I915_GTT_PAGE_SIZE_4K)
 			return -ENOMEM;
 
+		/*
+		 * If we need 64K minimum GTT pages for device local-memory,
+		 * like on XEHPSDV, then we need to fail the allocation here,
+		 * otherwise we can't safely support the insertion of
+		 * local-memory pages for this vm, since the HW expects the
+		 * correct physical alignment and size when the page-table is
+		 * operating in 64K GTT mode, which includes any scratch PTEs,
+		 * since userpsace can still touch them.
+		 */
+		if (HAS_64K_PAGES(vm->i915))
+			return -ENOMEM;
+
 		size = I915_GTT_PAGE_SIZE_4K;
 	} while (1);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 04/17] drm/i915: enforce min page size for scratch
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

From: Matthew Auld <matthew.auld@intel.com>

If the device needs 64K minimum GTT pages for device local-memory,
like on XEHPSDV, then we need to fail the allocation if we can't
meet it, instead of falling back to 4K pages, otherwise we can't
safely support the insertion of device local-memory pages for
this vm, since the HW expects the correct physical alignment and
size for every PTE, if we mark the page-table as 64K GTT mode.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 67d14afa6623..2a6eec5f0d58 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -334,6 +334,18 @@ int setup_scratch_page(struct i915_address_space *vm)
 		if (size == I915_GTT_PAGE_SIZE_4K)
 			return -ENOMEM;
 
+		/*
+		 * If we need 64K minimum GTT pages for device local-memory,
+		 * like on XEHPSDV, then we need to fail the allocation here,
+		 * otherwise we can't safely support the insertion of
+		 * local-memory pages for this vm, since the HW expects the
+		 * correct physical alignment and size when the page-table is
+		 * operating in 64K GTT mode, which includes any scratch PTEs,
+		 * since userpsace can still touch them.
+		 */
+		if (HAS_64K_PAGES(vm->i915))
+			return -ENOMEM;
+
 		size = I915_GTT_PAGE_SIZE_4K;
 	} while (1);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 05/17] drm/i915/gtt/xehpsdv: move scratch page to system memory
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

From: Matthew Auld <matthew.auld@intel.com>

On some platforms the hw has dropped support for 4K GTT pages when
dealing with LMEM, and due to the design of 64K GTT pages in the hw, we
can only mark the *entire* page-table as operating in 64K GTT mode,
since the enable bit is still on the pde, and not the pte. And since we
we still need to allow 4K GTT pages for SMEM objects, we can't have a
"normal" 4K page-table with scratch pointing to LMEM, since that's
undefined from the hw pov. The simplest solution is to just move the 64K
scratch page to SMEM on such platforms and call it a day, since that
should work for all configurations.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c      |  1 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c      | 23 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_ggtt.c      |  2 ++
 drivers/gpu/drm/i915/gt/intel_gtt.c       |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h       |  2 ++
 drivers/gpu/drm/i915/selftests/mock_gtt.c |  2 ++
 6 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 890191f286e3..49e7651d764a 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -440,6 +440,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
 	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
 
 	ppgtt->base.vm.alloc_pt_dma = alloc_pt_dma;
+	ppgtt->base.vm.alloc_scratch_dma = alloc_pt_dma;
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 
 	ppgtt->base.pd = __alloc_pd(I915_PDES);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 037a9a6e4889..6bff6bf1a450 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -777,10 +777,29 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 	 */
 	ppgtt->vm.has_read_only = !IS_GRAPHICS_VER(gt->i915, 11, 12);
 
-	if (HAS_LMEM(gt->i915))
+	if (HAS_LMEM(gt->i915)) {
 		ppgtt->vm.alloc_pt_dma = alloc_pt_lmem;
-	else
+
+		/*
+		 * On some platforms the hw has dropped support for 4K GTT pages
+		 * when dealing with LMEM, and due to the design of 64K GTT
+		 * pages in the hw, we can only mark the *entire* page-table as
+		 * operating in 64K GTT mode, since the enable bit is still on
+		 * the pde, and not the pte. And since we still need to allow
+		 * 4K GTT pages for SMEM objects, we can't have a "normal" 4K
+		 * page-table with scratch pointing to LMEM, since that's
+		 * undefined from the hw pov. The simplest solution is to just
+		 * move the 64K scratch page to SMEM on such platforms and call
+		 * it a day, since that should work for all configurations.
+		 */
+		if (HAS_64K_PAGES(gt->i915))
+			ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
+		else
+			ppgtt->vm.alloc_scratch_dma = alloc_pt_lmem;
+	} else {
 		ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
+		ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
+	}
 
 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index f17383e76eb7..289316007029 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1077,6 +1077,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
 	ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
 
 	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	ggtt->vm.clear_range = nop_clear_range;
 	if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
@@ -1129,6 +1130,7 @@ static int i915_gmch_probe(struct i915_ggtt *ggtt)
 		(struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
 
 	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	if (needs_idle_maps(i915)) {
 		drm_notice(&i915->drm,
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 2a6eec5f0d58..56fbd37a6b54 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -298,7 +298,7 @@ int setup_scratch_page(struct i915_address_space *vm)
 	do {
 		struct drm_i915_gem_object *obj;
 
-		obj = vm->alloc_pt_dma(vm, size);
+		obj = vm->alloc_scratch_dma(vm, size);
 		if (IS_ERR(obj))
 			goto skip;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index bc6750263359..6d13f4ab4d4a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -265,6 +265,8 @@ struct i915_address_space {
 
 	struct drm_i915_gem_object *
 		(*alloc_pt_dma)(struct i915_address_space *vm, int sz);
+	struct drm_i915_gem_object *
+		(*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
 
 	u64 (*pte_encode)(dma_addr_t addr,
 			  enum i915_cache_level level,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index cc047ec594f9..32ca8962d0ab 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -78,6 +78,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
 
 	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	ppgtt->vm.clear_range = mock_clear_range;
 	ppgtt->vm.insert_page = mock_insert_page;
@@ -118,6 +119,7 @@ void mock_init_ggtt(struct drm_i915_private *i915, struct i915_ggtt *ggtt)
 	ggtt->vm.total = 4096 * PAGE_SIZE;
 
 	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	ggtt->vm.clear_range = mock_clear_range;
 	ggtt->vm.insert_page = mock_insert_page;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 05/17] drm/i915/gtt/xehpsdv: move scratch page to system memory
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

From: Matthew Auld <matthew.auld@intel.com>

On some platforms the hw has dropped support for 4K GTT pages when
dealing with LMEM, and due to the design of 64K GTT pages in the hw, we
can only mark the *entire* page-table as operating in 64K GTT mode,
since the enable bit is still on the pde, and not the pte. And since we
we still need to allow 4K GTT pages for SMEM objects, we can't have a
"normal" 4K page-table with scratch pointing to LMEM, since that's
undefined from the hw pov. The simplest solution is to just move the 64K
scratch page to SMEM on such platforms and call it a day, since that
should work for all configurations.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c      |  1 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c      | 23 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_ggtt.c      |  2 ++
 drivers/gpu/drm/i915/gt/intel_gtt.c       |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h       |  2 ++
 drivers/gpu/drm/i915/selftests/mock_gtt.c |  2 ++
 6 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 890191f286e3..49e7651d764a 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -440,6 +440,7 @@ struct i915_ppgtt *gen6_ppgtt_create(struct intel_gt *gt)
 	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
 
 	ppgtt->base.vm.alloc_pt_dma = alloc_pt_dma;
+	ppgtt->base.vm.alloc_scratch_dma = alloc_pt_dma;
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 
 	ppgtt->base.pd = __alloc_pd(I915_PDES);
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 037a9a6e4889..6bff6bf1a450 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -777,10 +777,29 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 	 */
 	ppgtt->vm.has_read_only = !IS_GRAPHICS_VER(gt->i915, 11, 12);
 
-	if (HAS_LMEM(gt->i915))
+	if (HAS_LMEM(gt->i915)) {
 		ppgtt->vm.alloc_pt_dma = alloc_pt_lmem;
-	else
+
+		/*
+		 * On some platforms the hw has dropped support for 4K GTT pages
+		 * when dealing with LMEM, and due to the design of 64K GTT
+		 * pages in the hw, we can only mark the *entire* page-table as
+		 * operating in 64K GTT mode, since the enable bit is still on
+		 * the pde, and not the pte. And since we still need to allow
+		 * 4K GTT pages for SMEM objects, we can't have a "normal" 4K
+		 * page-table with scratch pointing to LMEM, since that's
+		 * undefined from the hw pov. The simplest solution is to just
+		 * move the 64K scratch page to SMEM on such platforms and call
+		 * it a day, since that should work for all configurations.
+		 */
+		if (HAS_64K_PAGES(gt->i915))
+			ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
+		else
+			ppgtt->vm.alloc_scratch_dma = alloc_pt_lmem;
+	} else {
 		ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
+		ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
+	}
 
 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index f17383e76eb7..289316007029 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1077,6 +1077,7 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
 	ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
 
 	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	ggtt->vm.clear_range = nop_clear_range;
 	if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
@@ -1129,6 +1130,7 @@ static int i915_gmch_probe(struct i915_ggtt *ggtt)
 		(struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
 
 	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	if (needs_idle_maps(i915)) {
 		drm_notice(&i915->drm,
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 2a6eec5f0d58..56fbd37a6b54 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -298,7 +298,7 @@ int setup_scratch_page(struct i915_address_space *vm)
 	do {
 		struct drm_i915_gem_object *obj;
 
-		obj = vm->alloc_pt_dma(vm, size);
+		obj = vm->alloc_scratch_dma(vm, size);
 		if (IS_ERR(obj))
 			goto skip;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index bc6750263359..6d13f4ab4d4a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -265,6 +265,8 @@ struct i915_address_space {
 
 	struct drm_i915_gem_object *
 		(*alloc_pt_dma)(struct i915_address_space *vm, int sz);
+	struct drm_i915_gem_object *
+		(*alloc_scratch_dma)(struct i915_address_space *vm, int sz);
 
 	u64 (*pte_encode)(dma_addr_t addr,
 			  enum i915_cache_level level,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index cc047ec594f9..32ca8962d0ab 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -78,6 +78,7 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name)
 	i915_address_space_init(&ppgtt->vm, VM_CLASS_PPGTT);
 
 	ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	ppgtt->vm.clear_range = mock_clear_range;
 	ppgtt->vm.insert_page = mock_insert_page;
@@ -118,6 +119,7 @@ void mock_init_ggtt(struct drm_i915_private *i915, struct i915_ggtt *ggtt)
 	ggtt->vm.total = 4096 * PAGE_SIZE;
 
 	ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+	ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
 
 	ggtt->vm.clear_range = mock_clear_range;
 	ggtt->vm.insert_page = mock_insert_page;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 06/17] drm/i915/xehpsdv: support 64K GTT pages
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

XEHPSDV optimises 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  61 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 106 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 41d0680f3bd7..9c2ffa4090f1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1451,6 +1451,66 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct i915_gem_context *ctx = arg;
+	struct drm_i915_private *i915 = ctx->i915;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B that gives us 32 entries per pt, however since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If ever hit such a case this
+	 * test should catch it since some of our writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return err;
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(ctx, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct i915_gem_context *ctx = arg;
@@ -1681,6 +1741,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 	struct i915_gem_context *ctx;
 	struct i915_address_space *vm;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 6bff6bf1a450..fec0f20f1b93 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -454,6 +464,93 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_vma *vma,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vma->vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma->node.start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vma->vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma->page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(!i915_gem_object_is_lmem(vma->obj));
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(i915_gem_object_is_lmem(vma->obj));
+				GEM_BUG_ON(pt->is_compact);
+				index =  __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				GEM_BUG_ON(!iter->sg);
+
+				rem = sg_dma_len(iter->sg);
+				GEM_BUG_ON(!rem);
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma->page_sizes.gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 				   struct sgt_dma *iter,
 				   enum i915_cache_level cache_level,
@@ -586,7 +683,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma);
 
 	if (vma->page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vma, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma->node.start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 6d13f4ab4d4a..6d0233ffae17 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -89,6 +89,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -154,6 +156,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 4396bfd630d8..b8238f5bc8b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 06/17] drm/i915/xehpsdv: support 64K GTT pages
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

XEHPSDV optimises 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  61 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 106 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 41d0680f3bd7..9c2ffa4090f1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1451,6 +1451,66 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct i915_gem_context *ctx = arg;
+	struct drm_i915_private *i915 = ctx->i915;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B that gives us 32 entries per pt, however since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If ever hit such a case this
+	 * test should catch it since some of our writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return err;
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(ctx, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct i915_gem_context *ctx = arg;
@@ -1681,6 +1741,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 	struct i915_gem_context *ctx;
 	struct i915_address_space *vm;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 6bff6bf1a450..fec0f20f1b93 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -454,6 +464,93 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_vma *vma,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vma->vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma->node.start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vma->vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma->page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(!i915_gem_object_is_lmem(vma->obj));
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(i915_gem_object_is_lmem(vma->obj));
+				GEM_BUG_ON(pt->is_compact);
+				index =  __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				GEM_BUG_ON(!iter->sg);
+
+				rem = sg_dma_len(iter->sg);
+				GEM_BUG_ON(!rem);
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma->page_sizes.gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 				   struct sgt_dma *iter,
 				   enum i915_cache_level cache_level,
@@ -586,7 +683,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma);
 
 	if (vma->page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vma, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma->node.start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 6d13f4ab4d4a..6d0233ffae17 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -89,6 +89,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -154,6 +156,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 4396bfd630d8..b8238f5bc8b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 07/17] drm/i915: Add vm min alignment support
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Bommu Krishnaiah, Wilson Chris P,
	Ramalingam C

From: Bommu Krishnaiah <krishnaiah.bommu@intel.com>

Replace the hard coded 4K alignment value with vm->min_alignment.

Cc: Wilson Chris P <chris.p.wilson@intel.com>
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 23 ++++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  9 ++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++++++++
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index 8402ed925a69..6b9b861e43e5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 56fbd37a6b54..4743921b7638 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -216,6 +216,15 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 6d0233ffae17..20101eef4c95 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -28,6 +28,8 @@
 #include "gt/intel_reset.h"
 #include "i915_selftest.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -224,6 +226,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -382,6 +385,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	return vm->min_alignment[type];
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 07/17] drm/i915: Add vm min alignment support
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Bommu Krishnaiah, Wilson Chris P,
	Ramalingam C

From: Bommu Krishnaiah <krishnaiah.bommu@intel.com>

Replace the hard coded 4K alignment value with vm->min_alignment.

Cc: Wilson Chris P <chris.p.wilson@intel.com>
Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 23 ++++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  9 ++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++++++++
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index 8402ed925a69..6b9b861e43e5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -39,6 +39,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 56fbd37a6b54..4743921b7638 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -216,6 +216,15 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 6d0233ffae17..20101eef4c95 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -28,6 +28,8 @@
 #include "gt/intel_reset.h"
 #include "i915_selftest.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -224,6 +226,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -382,6 +385,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	return vm->min_alignment[type];
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 08/17] drm/i915/selftests: account for min_alignment in GTT selftests
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

From: Matthew Auld <matthew.auld@intel.com>

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 1 file changed, 63 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 46f4236039a9..fdb4bf88293b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -237,6 +237,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma *mock_vma;
 	unsigned int size;
@@ -250,9 +252,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -273,8 +276,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -297,10 +300,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -343,7 +346,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma->pages = obj->mm.pages;
-			mock_vma->node.size = BIT_ULL(size);
+			mock_vma->node.size = BIT_ULL(aligned_size);
 			mock_vma->node.start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -354,7 +357,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -398,8 +401,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -440,14 +445,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -469,22 +477,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -505,22 +516,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -542,22 +556,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -578,9 +595,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -610,6 +627,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -619,6 +637,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -637,7 +657,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -689,6 +709,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -697,6 +718,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -709,13 +732,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -760,6 +783,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -767,15 +791,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -815,7 +842,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -867,11 +894,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -912,7 +942,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 08/17] drm/i915/selftests: account for min_alignment in GTT selftests
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C

From: Matthew Auld <matthew.auld@intel.com>

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind update the GTT selftests to take this
into account.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 1 file changed, 63 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 46f4236039a9..fdb4bf88293b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -237,6 +237,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma *mock_vma;
 	unsigned int size;
@@ -250,9 +252,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -273,8 +276,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -297,10 +300,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -343,7 +346,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma->pages = obj->mm.pages;
-			mock_vma->node.size = BIT_ULL(size);
+			mock_vma->node.size = BIT_ULL(aligned_size);
 			mock_vma->node.start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -354,7 +357,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -398,8 +401,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -440,14 +445,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -469,22 +477,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -505,22 +516,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -542,22 +556,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -578,9 +595,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -610,6 +627,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -619,6 +637,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -637,7 +657,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -689,6 +709,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -697,6 +718,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -709,13 +732,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -760,6 +783,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -767,15 +791,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -815,7 +842,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -867,11 +894,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -912,7 +942,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 09/17] drm/i915/xehpsdv: implement memory coloring
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

The basic idea is that each 2M block(page-table) has a color, depending
on if the page-table is occupied by LMEM objects(64K) or SMEM
objects(4K), where our goal is to prevent mixing 64K and 4K GTT pages in
the page-table, which is not supported by the HW.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 16 ++++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  6 ++++
 drivers/gpu/drm/i915/i915_gem_evict.c | 17 ++++++++++
 drivers/gpu/drm/i915/i915_vma.c       | 46 +++++++++++++++++++--------
 4 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index fec0f20f1b93..666745adbe93 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -464,6 +464,19 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void xehpsdv_ppgtt_color_adjust(const struct drm_mm_node *node,
+				       unsigned long color,
+				       u64 *start,
+				       u64 *end)
+{
+	if (i915_node_color_differs(node, color))
+		*start = round_up(*start, SZ_2M);
+
+	node = list_next_entry(node, node_list);
+	if (i915_node_color_differs(node, color))
+		*end = round_down(*end, SZ_2M);
+}
+
 static void
 xehpsdv_ppgtt_insert_huge(struct i915_vma *vma,
 			  struct sgt_dma *iter,
@@ -901,6 +914,9 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 		ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
 	}
 
+	if (HAS_64K_PAGES(gt->i915))
+		ppgtt->vm.mm.color_adjust = xehpsdv_ppgtt_color_adjust;
+
 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
 		goto err_free;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 20101eef4c95..34696acde342 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -397,6 +397,12 @@ i915_vm_has_cache_coloring(struct i915_address_space *vm)
 	return i915_is_ggtt(vm) && vm->mm.color_adjust;
 }
 
+static inline bool
+i915_vm_has_memory_coloring(struct i915_address_space *vm)
+{
+	return !i915_is_ggtt(vm) && vm->mm.color_adjust;
+}
+
 static inline struct i915_ggtt *
 i915_vm_to_ggtt(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 2b73ddb11c66..006bf4924c24 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -292,6 +292,13 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
 
 		/* Always look at the page afterwards to avoid the end-of-GTT */
 		end += I915_GTT_PAGE_SIZE;
+	} else if (i915_vm_has_memory_coloring(vm)) {
+		/*
+		 * Expand the search the cover the page-table boundries, in
+		 * case we need to flip the color of the page-table(s).
+		 */
+		start = round_down(start, SZ_2M);
+		end = round_up(end, SZ_2M);
 	}
 	GEM_BUG_ON(start >= end);
 
@@ -321,6 +328,16 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
 				if (node->color == target->color)
 					continue;
 			}
+		} else if (i915_vm_has_memory_coloring(vm)) {
+			if (node->start + node->size <= target->start) {
+				if (node->color == target->color)
+					continue;
+			}
+
+			if (node->start >= target->start + target->size) {
+				if (node->color == target->color)
+					continue;
+			}
 		}
 
 		if (i915_vma_is_pinned(vma)) {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index c31b4bc8af16..92b124ecc38c 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -585,6 +585,10 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color)
 	struct drm_mm_node *node = &vma->node;
 	struct drm_mm_node *other;
 
+	/* Only valid to be called on an already inserted vma */
+	GEM_BUG_ON(!drm_mm_node_allocated(node));
+	GEM_BUG_ON(list_empty(&node->node_list));
+
 	/*
 	 * On some machines we have to be careful when putting differing types
 	 * of snoopable memory together to avoid the prefetcher crossing memory
@@ -592,22 +596,34 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color)
 	 * these constraints apply and set the drm_mm.color_adjust
 	 * appropriately.
 	 */
-	if (!i915_vm_has_cache_coloring(vma->vm))
-		return true;
-
-	/* Only valid to be called on an already inserted vma */
-	GEM_BUG_ON(!drm_mm_node_allocated(node));
-	GEM_BUG_ON(list_empty(&node->node_list));
+	if (i915_vm_has_cache_coloring(vma->vm)) {
+		other = list_prev_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(other))
+			return false;
 
-	other = list_prev_entry(node, node_list);
-	if (i915_node_color_differs(other, color) &&
-	    !drm_mm_hole_follows(other))
-		return false;
+		other = list_next_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(node))
+			return false;
+	/*
+	 * On XEHPSDV we need to make sure we are not mixing LMEM and SMEM objects
+	 * in the same page-table, i.e mixing 64K and 4K gtt pages in the same
+	 * page-table.
+	 */
+	} else if (i915_vm_has_memory_coloring(vma->vm)) {
+		other = list_prev_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(other) &&
+		    !IS_ALIGNED(other->start + other->size, SZ_2M))
+			return false;
 
-	other = list_next_entry(node, node_list);
-	if (i915_node_color_differs(other, color) &&
-	    !drm_mm_hole_follows(node))
-		return false;
+		other = list_next_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(node) &&
+		    !IS_ALIGNED(other->start, SZ_2M))
+			return false;
+	}
 
 	return true;
 }
@@ -676,6 +692,8 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 
 		if (i915_vm_has_cache_coloring(vma->vm))
 			color = vma->obj->cache_level;
+		else if (i915_vm_has_memory_coloring(vma->vm))
+			color = i915_gem_object_is_lmem(vma->obj);
 	}
 
 	if (flags & PIN_OFFSET_FIXED) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 09/17] drm/i915/xehpsdv: implement memory coloring
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Joonas Lahtinen

From: Matthew Auld <matthew.auld@intel.com>

The basic idea is that each 2M block(page-table) has a color, depending
on if the page-table is occupied by LMEM objects(64K) or SMEM
objects(4K), where our goal is to prevent mixing 64K and 4K GTT pages in
the page-table, which is not supported by the HW.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 16 ++++++++++
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  6 ++++
 drivers/gpu/drm/i915/i915_gem_evict.c | 17 ++++++++++
 drivers/gpu/drm/i915/i915_vma.c       | 46 +++++++++++++++++++--------
 4 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index fec0f20f1b93..666745adbe93 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -464,6 +464,19 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void xehpsdv_ppgtt_color_adjust(const struct drm_mm_node *node,
+				       unsigned long color,
+				       u64 *start,
+				       u64 *end)
+{
+	if (i915_node_color_differs(node, color))
+		*start = round_up(*start, SZ_2M);
+
+	node = list_next_entry(node, node_list);
+	if (i915_node_color_differs(node, color))
+		*end = round_down(*end, SZ_2M);
+}
+
 static void
 xehpsdv_ppgtt_insert_huge(struct i915_vma *vma,
 			  struct sgt_dma *iter,
@@ -901,6 +914,9 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 		ppgtt->vm.alloc_scratch_dma = alloc_pt_dma;
 	}
 
+	if (HAS_64K_PAGES(gt->i915))
+		ppgtt->vm.mm.color_adjust = xehpsdv_ppgtt_color_adjust;
+
 	err = gen8_init_scratch(&ppgtt->vm);
 	if (err)
 		goto err_free;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 20101eef4c95..34696acde342 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -397,6 +397,12 @@ i915_vm_has_cache_coloring(struct i915_address_space *vm)
 	return i915_is_ggtt(vm) && vm->mm.color_adjust;
 }
 
+static inline bool
+i915_vm_has_memory_coloring(struct i915_address_space *vm)
+{
+	return !i915_is_ggtt(vm) && vm->mm.color_adjust;
+}
+
 static inline struct i915_ggtt *
 i915_vm_to_ggtt(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 2b73ddb11c66..006bf4924c24 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -292,6 +292,13 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
 
 		/* Always look at the page afterwards to avoid the end-of-GTT */
 		end += I915_GTT_PAGE_SIZE;
+	} else if (i915_vm_has_memory_coloring(vm)) {
+		/*
+		 * Expand the search the cover the page-table boundries, in
+		 * case we need to flip the color of the page-table(s).
+		 */
+		start = round_down(start, SZ_2M);
+		end = round_up(end, SZ_2M);
 	}
 	GEM_BUG_ON(start >= end);
 
@@ -321,6 +328,16 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
 				if (node->color == target->color)
 					continue;
 			}
+		} else if (i915_vm_has_memory_coloring(vm)) {
+			if (node->start + node->size <= target->start) {
+				if (node->color == target->color)
+					continue;
+			}
+
+			if (node->start >= target->start + target->size) {
+				if (node->color == target->color)
+					continue;
+			}
 		}
 
 		if (i915_vma_is_pinned(vma)) {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index c31b4bc8af16..92b124ecc38c 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -585,6 +585,10 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color)
 	struct drm_mm_node *node = &vma->node;
 	struct drm_mm_node *other;
 
+	/* Only valid to be called on an already inserted vma */
+	GEM_BUG_ON(!drm_mm_node_allocated(node));
+	GEM_BUG_ON(list_empty(&node->node_list));
+
 	/*
 	 * On some machines we have to be careful when putting differing types
 	 * of snoopable memory together to avoid the prefetcher crossing memory
@@ -592,22 +596,34 @@ bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color)
 	 * these constraints apply and set the drm_mm.color_adjust
 	 * appropriately.
 	 */
-	if (!i915_vm_has_cache_coloring(vma->vm))
-		return true;
-
-	/* Only valid to be called on an already inserted vma */
-	GEM_BUG_ON(!drm_mm_node_allocated(node));
-	GEM_BUG_ON(list_empty(&node->node_list));
+	if (i915_vm_has_cache_coloring(vma->vm)) {
+		other = list_prev_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(other))
+			return false;
 
-	other = list_prev_entry(node, node_list);
-	if (i915_node_color_differs(other, color) &&
-	    !drm_mm_hole_follows(other))
-		return false;
+		other = list_next_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(node))
+			return false;
+	/*
+	 * On XEHPSDV we need to make sure we are not mixing LMEM and SMEM objects
+	 * in the same page-table, i.e mixing 64K and 4K gtt pages in the same
+	 * page-table.
+	 */
+	} else if (i915_vm_has_memory_coloring(vma->vm)) {
+		other = list_prev_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(other) &&
+		    !IS_ALIGNED(other->start + other->size, SZ_2M))
+			return false;
 
-	other = list_next_entry(node, node_list);
-	if (i915_node_color_differs(other, color) &&
-	    !drm_mm_hole_follows(node))
-		return false;
+		other = list_next_entry(node, node_list);
+		if (i915_node_color_differs(other, color) &&
+		    !drm_mm_hole_follows(node) &&
+		    !IS_ALIGNED(other->start, SZ_2M))
+			return false;
+	}
 
 	return true;
 }
@@ -676,6 +692,8 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 
 		if (i915_vm_has_cache_coloring(vma->vm))
 			color = vma->obj->cache_level;
+		else if (i915_vm_has_memory_coloring(vma->vm))
+			color = i915_gem_object_is_lmem(vma->obj);
 	}
 
 	if (flags & PIN_OFFSET_FIXED) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 10/17] drm/i915/xehpsdv: Add has_flat_ccs to device info
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Joonas Lahtinen, Ramalingam C

From: CQ Tang <cq.tang@intel.com>

Gen12+ devices support 3D surface (buffer) compression and various
compression formats. This is accomplished by an additional compression
control state (CCS) stored for each surface.

Gen 12 devices(TGL family and DG1) stores compression states in a separate
region of memory. It is managed by user-space and has an associated set of
user-space managed page tables used by hardware for address translation.

In Gen12.5 devices(XEHPSDV, DG2, etc), there is a new feature introduced
i.e Flat CCS. It replaced AUX page tables with a flat indexed region of
device memory for storing compression states.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 2 ++
 drivers/gpu/drm/i915/i915_pci.c          | 1 +
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a16fde38a252..57948e0ee48b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1721,6 +1721,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
 #define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM)
 
+#define HAS_FLAT_CCS(dev_priv)   (INTEL_INFO(dev_priv)->has_flat_ccs)
+
 #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
 
 #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 8ef484a23652..68367b505dc4 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -991,6 +991,7 @@ static const struct intel_device_info adl_p_info = {
 	XE_HP_PAGE_SIZES, \
 	.dma_mask_size = 46, \
 	.has_64bit_reloc = 1, \
+	.has_flat_ccs = 1, \
 	.has_global_mocs = 1, \
 	.has_gt_uc = 1, \
 	.has_llc = 1, \
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index dd453b96af19..87ee1d86d2ac 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -126,6 +126,7 @@ enum intel_ppgtt_type {
 	func(has_64k_pages); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
+	func(has_flat_ccs); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
 	func(has_l3_dpf); \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 10/17] drm/i915/xehpsdv: Add has_flat_ccs to device info
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Joonas Lahtinen, Ramalingam C

From: CQ Tang <cq.tang@intel.com>

Gen12+ devices support 3D surface (buffer) compression and various
compression formats. This is accomplished by an additional compression
control state (CCS) stored for each surface.

Gen 12 devices(TGL family and DG1) stores compression states in a separate
region of memory. It is managed by user-space and has an associated set of
user-space managed page tables used by hardware for address translation.

In Gen12.5 devices(XEHPSDV, DG2, etc), there is a new feature introduced
i.e Flat CCS. It replaced AUX page tables with a flat indexed region of
device memory for storing compression states.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 2 ++
 drivers/gpu/drm/i915/i915_pci.c          | 1 +
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a16fde38a252..57948e0ee48b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1721,6 +1721,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
 #define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM)
 
+#define HAS_FLAT_CCS(dev_priv)   (INTEL_INFO(dev_priv)->has_flat_ccs)
+
 #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
 
 #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 8ef484a23652..68367b505dc4 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -991,6 +991,7 @@ static const struct intel_device_info adl_p_info = {
 	XE_HP_PAGE_SIZES, \
 	.dma_mask_size = 46, \
 	.has_64bit_reloc = 1, \
+	.has_flat_ccs = 1, \
 	.has_global_mocs = 1, \
 	.has_gt_uc = 1, \
 	.has_llc = 1, \
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index dd453b96af19..87ee1d86d2ac 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -126,6 +126,7 @@ enum intel_ppgtt_type {
 	func(has_64k_pages); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
+	func(has_flat_ccs); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
 	func(has_l3_dpf); \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 11/17] drm/i915/lmem: Enable lmem for platforms with Flat CCS
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Abdiel Janulgue, Ramalingam C

From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

A portion of device memory is reserved for Flat CCS so usable
device memory will be reduced by size of Flat CCS. Size of
Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”.
So to get effective device memory we need to subtract
total device memory by Flat CCS memory size.

Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c          | 19 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt.h          |  1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 22 +++++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h             |  3 +++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1cb1948ac959..fd82ebee8724 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -900,6 +900,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
 	return intel_uncore_read_fw(gt->uncore, reg);
 }
 
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
+{
+	int type;
+	u8 sliceid, subsliceid;
+
+	for (type = 0; type < NUM_STEERING_TYPES; type++) {
+		if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
+			intel_gt_get_valid_steering(gt, type, &sliceid,
+						    &subsliceid);
+			return intel_uncore_read_with_mcr_steering(gt->uncore,
+								   reg,
+								   sliceid,
+								   subsliceid);
+		}
+	}
+
+	return intel_uncore_read(gt->uncore, reg);
+}
+
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 74e771871a9b..24b78398a587 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -84,6 +84,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
 }
 
 u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
 
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index 073d28d96669..d1f88beb26fe 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -201,8 +201,26 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	if (!IS_DGFX(i915))
 		return ERR_PTR(-ENODEV);
 
-	/* Stolen starts from GSMBASE on DG1 */
-	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
+	if (HAS_FLAT_CCS(i915)) {
+		u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
+
+		lmem_size = pci_resource_len(pdev, 2);
+		flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
+		flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
+		tile_stolen = lmem_size - flat_ccs_base;
+
+		/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
+		if (tile_stolen == lmem_size)
+			DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");
+
+		lmem_size -= tile_stolen;
+	} else {
+		/* Stolen starts from GSMBASE without CCS */
+		lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
+		if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
+			return ERR_PTR(-ENODEV);
+	}
+
 
 	io_start = pci_resource_start(pdev, 2);
 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 1e221fbe37fd..3693eb03f5aa 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12469,6 +12469,9 @@ enum skl_power_gate {
 #define GEN12_GSMBASE			_MMIO(0x108100)
 #define GEN12_DSMBASE			_MMIO(0x1080C0)
 
+#define XEHPSDV_FLAT_CCS_BASE_ADDR             _MMIO(0x4910)
+#define   XEHPSDV_CCS_BASE_SHIFT               8
+
 /* gamt regs */
 #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
 #define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW  0x67F1427F /* max/min for LRA1/2 */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 11/17] drm/i915/lmem: Enable lmem for platforms with Flat CCS
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Abdiel Janulgue, Ramalingam C

From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

A portion of device memory is reserved for Flat CCS so usable
device memory will be reduced by size of Flat CCS. Size of
Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”.
So to get effective device memory we need to subtract
total device memory by Flat CCS memory size.

Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c          | 19 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt.h          |  1 +
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 22 +++++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h             |  3 +++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1cb1948ac959..fd82ebee8724 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -900,6 +900,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
 	return intel_uncore_read_fw(gt->uncore, reg);
 }
 
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
+{
+	int type;
+	u8 sliceid, subsliceid;
+
+	for (type = 0; type < NUM_STEERING_TYPES; type++) {
+		if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
+			intel_gt_get_valid_steering(gt, type, &sliceid,
+						    &subsliceid);
+			return intel_uncore_read_with_mcr_steering(gt->uncore,
+								   reg,
+								   sliceid,
+								   subsliceid);
+		}
+	}
+
+	return intel_uncore_read(gt->uncore, reg);
+}
+
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 74e771871a9b..24b78398a587 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -84,6 +84,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
 }
 
 u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
 
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index 073d28d96669..d1f88beb26fe 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -201,8 +201,26 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	if (!IS_DGFX(i915))
 		return ERR_PTR(-ENODEV);
 
-	/* Stolen starts from GSMBASE on DG1 */
-	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
+	if (HAS_FLAT_CCS(i915)) {
+		u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
+
+		lmem_size = pci_resource_len(pdev, 2);
+		flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
+		flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
+		tile_stolen = lmem_size - flat_ccs_base;
+
+		/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
+		if (tile_stolen == lmem_size)
+			DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");
+
+		lmem_size -= tile_stolen;
+	} else {
+		/* Stolen starts from GSMBASE without CCS */
+		lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
+		if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
+			return ERR_PTR(-ENODEV);
+	}
+
 
 	io_start = pci_resource_start(pdev, 2);
 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 1e221fbe37fd..3693eb03f5aa 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12469,6 +12469,9 @@ enum skl_power_gate {
 #define GEN12_GSMBASE			_MMIO(0x108100)
 #define GEN12_DSMBASE			_MMIO(0x1080C0)
 
+#define XEHPSDV_FLAT_CCS_BASE_ADDR             _MMIO(0x4910)
+#define   XEHPSDV_CCS_BASE_SHIFT               8
+
 /* gamt regs */
 #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
 #define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW  0x67F1427F /* max/min for LRA1/2 */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 12/17] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ayaz A Siddiqui, Ramalingam C

From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>

Xe-hp and latest devices support Flat CCS which reserved a portion of
the device memory to store compression metadata, during the clearing of
device memory buffer object we also need to clear the associated CCS buffer.

Flat CCS memory can not be directly accessed by S/W.
Address of CCS buffer associated main BO is automatically calculated
by device itself. KMD/UMD can only access this buffer indirectly using
XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.

v2: Fixed issues with platform naming [Lucas]

Cc: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  14 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 120 ++++++++++++++++++-
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f8253012d166..07bf5a1753bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -203,6 +203,20 @@
 #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
 #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
 
+#define XY_CTRL_SURF_INSTR_SIZE	5
+#define MI_FLUSH_DW_SIZE		3
+#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT	21
+#define   DST_ACCESS_TYPE_SHIFT	20
+#define   CCS_SIZE_SHIFT		8
+#define   XY_CTRL_SURF_MOCS_SHIFT	25
+#define   NUM_CCS_BYTES_PER_BLOCK	256
+#define   NUM_CCS_BLKS_PER_XFER	1024
+#define   INDIRECT_ACCESS		0
+#define   DIRECT_ACCESS		1
+#define  MI_FLUSH_LLC			BIT(9)
+#define  MI_FLUSH_CCS			BIT(16)
+
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
 #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index afb1cce9a352..0bed01750884 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -17,6 +17,7 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
+#define GET_CCS_SIZE(i915, size)	(HAS_FLAT_CCS(i915) ? (size) >> 8 : 0)
 
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
@@ -490,15 +491,104 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, int size, u32 value)
+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
+{
+	/* Mask the 3 LSB to use the PPGTT address space */
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = lower_32_bits(dst);
+	*cmd++ = upper_32_bits(dst);
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_SIZE(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
+	 * trnasfer upto 1024 blocks.
+	 */
+	num_blks = (GET_CCS_SIZE(i915, size) +
+			   (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;
+	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
+	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
+
+	/*
+	 * We need to add a flush before and after
+	 * XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u16 num_ccs_blocks)
+{
+	int i = num_ccs_blocks;
+
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		/*
+		 * We use logical AND with 1023 since the size field
+		 * takes values which is in the range of 0 - 1023
+		 */
+		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
+			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
+			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
+			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
+			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
+			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		i -= NUM_CCS_BLKS_PER_XFER;
+	} while (i > 0);
+
+	return cmd;
+}
+
+static int emit_clear(struct i915_request *rq,
+		      int size,
+		      u32 value,
+		      bool is_lmem)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
 	u32 *cs;
+	struct drm_i915_private *i915 = rq->engine->i915;
+	u32 num_ccs_blks, ccs_ring_size;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	/* Clear flat css only when value is 0 */
+	ccs_ring_size = (is_lmem && !value) ?
+			 calc_ctrl_surf_instr_size(i915, size)
+			 : 0;
+
+	cs = intel_ring_begin(rq, ver >= 8 ? 8 + ccs_ring_size : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -521,6 +611,30 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
 		*cs++ = value;
 	}
 
+	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
+		num_ccs_blks = (GET_CCS_SIZE(i915, size) +
+				NUM_CCS_BYTES_PER_BLOCK - 1) >> 8;
+		/*
+		 * Flat CCS surface can only be accessed via
+		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
+		 * mapping of associated LMEM.
+		 * We can clear ccs surface by writing all 0s,
+		 * so we will flush the previously cleared buffer
+		 * and use it as a source.
+		 */
+
+		cs = i915_flush_dw(cs, (u64)instance << 32,
+				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = _i915_ctrl_surf_copy_blt(cs,
+					      (u64)instance << 32,
+					      (u64)instance << 32,
+					      DIRECT_ACCESS,
+					      INDIRECT_ACCESS,
+					      1, 1,
+					      num_ccs_blks);
+		cs = i915_flush_dw(cs, (u64)instance << 32,
+				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+	}
 	intel_ring_advance(rq, cs);
 	return 0;
 }
@@ -581,7 +695,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, len, value);
+		err = emit_clear(rq, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 12/17] drm/i915/gt: Clear compress metadata for Xe_HP platforms
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ayaz A Siddiqui, Ramalingam C

From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>

Xe-hp and latest devices support Flat CCS which reserved a portion of
the device memory to store compression metadata, during the clearing of
device memory buffer object we also need to clear the associated CCS buffer.

Flat CCS memory can not be directly accessed by S/W.
Address of CCS buffer associated main BO is automatically calculated
by device itself. KMD/UMD can only access this buffer indirectly using
XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.

v2: Fixed issues with platform naming [Lucas]

Cc: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  14 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 120 ++++++++++++++++++-
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f8253012d166..07bf5a1753bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -203,6 +203,20 @@
 #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
 #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
 
+#define XY_CTRL_SURF_INSTR_SIZE	5
+#define MI_FLUSH_DW_SIZE		3
+#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT	21
+#define   DST_ACCESS_TYPE_SHIFT	20
+#define   CCS_SIZE_SHIFT		8
+#define   XY_CTRL_SURF_MOCS_SHIFT	25
+#define   NUM_CCS_BYTES_PER_BLOCK	256
+#define   NUM_CCS_BLKS_PER_XFER	1024
+#define   INDIRECT_ACCESS		0
+#define   DIRECT_ACCESS		1
+#define  MI_FLUSH_LLC			BIT(9)
+#define  MI_FLUSH_CCS			BIT(16)
+
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
 #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index afb1cce9a352..0bed01750884 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -17,6 +17,7 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
+#define GET_CCS_SIZE(i915, size)	(HAS_FLAT_CCS(i915) ? (size) >> 8 : 0)
 
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
@@ -490,15 +491,104 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, int size, u32 value)
+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
+{
+	/* Mask the 3 LSB to use the PPGTT address space */
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = lower_32_bits(dst);
+	*cmd++ = upper_32_bits(dst);
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_SIZE(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
+	 * trnasfer upto 1024 blocks.
+	 */
+	num_blks = (GET_CCS_SIZE(i915, size) +
+			   (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;
+	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
+	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
+
+	/*
+	 * We need to add a flush before and after
+	 * XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u16 num_ccs_blocks)
+{
+	int i = num_ccs_blocks;
+
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		/*
+		 * We use logical AND with 1023 since the size field
+		 * takes values which is in the range of 0 - 1023
+		 */
+		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
+			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
+			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
+			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
+			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
+			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		i -= NUM_CCS_BLKS_PER_XFER;
+	} while (i > 0);
+
+	return cmd;
+}
+
+static int emit_clear(struct i915_request *rq,
+		      int size,
+		      u32 value,
+		      bool is_lmem)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
 	u32 *cs;
+	struct drm_i915_private *i915 = rq->engine->i915;
+	u32 num_ccs_blks, ccs_ring_size;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	/* Clear flat css only when value is 0 */
+	ccs_ring_size = (is_lmem && !value) ?
+			 calc_ctrl_surf_instr_size(i915, size)
+			 : 0;
+
+	cs = intel_ring_begin(rq, ver >= 8 ? 8 + ccs_ring_size : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -521,6 +611,30 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
 		*cs++ = value;
 	}
 
+	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
+		num_ccs_blks = (GET_CCS_SIZE(i915, size) +
+				NUM_CCS_BYTES_PER_BLOCK - 1) >> 8;
+		/*
+		 * Flat CCS surface can only be accessed via
+		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
+		 * mapping of associated LMEM.
+		 * We can clear ccs surface by writing all 0s,
+		 * so we will flush the previously cleared buffer
+		 * and use it as a source.
+		 */
+
+		cs = i915_flush_dw(cs, (u64)instance << 32,
+				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = _i915_ctrl_surf_copy_blt(cs,
+					      (u64)instance << 32,
+					      (u64)instance << 32,
+					      DIRECT_ACCESS,
+					      INDIRECT_ACCESS,
+					      1, 1,
+					      num_ccs_blks);
+		cs = i915_flush_dw(cs, (u64)instance << 32,
+				   MI_FLUSH_LLC | MI_FLUSH_CCS);
+	}
 	intel_ring_advance(rq, cs);
 	return 0;
 }
@@ -581,7 +695,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, len, value);
+		err = emit_clear(rq, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Stanislav Lisovskiy, Imre Deak,
	Matt Roper, Maarten Lankhorst

From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

TileF(Tile4 in bspec) format is 4K tile organized into
64B subtiles with same basic shape as for legacy TileY
which will be supported by Display13.

v2: - Fixed wrong case condition(Jani Nikula)
    - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)

v3: - s/I915_TILING_F/TILING_4/g
    - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
    - Removed unneeded fencing code

Cc: Imre Deak <imre.deak@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  1 +
 drivers/gpu/drm/i915/display/intel_fb.c       |  7 ++++
 drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
 .../drm/i915/display/intel_plane_initial.c    |  1 +
 .../drm/i915/display/skl_universal_plane.c    | 36 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_pci.c               |  1 +
 drivers/gpu/drm/i915/i915_reg.h               |  1 +
 drivers/gpu/drm/i915/intel_device_info.h      |  1 +
 drivers/gpu/drm/i915/intel_pm.c               |  1 +
 include/uapi/drm/drm_fourcc.h                 |  8 +++++
 11 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index ce5d6633029a..9b678839bf2b 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8877,6 +8877,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state)
 		case I915_FORMAT_MOD_X_TILED:
 		case I915_FORMAT_MOD_Y_TILED:
 		case I915_FORMAT_MOD_Yf_TILED:
+		case I915_FORMAT_MOD_4_TILED:
 			break;
 		default:
 			drm_dbg_kms(&i915->drm,
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index fa1f375e696b..e19739fef825 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_4_TILED:
+		/*
+		 * Each 4K tile consists of 64B(8*8) subtiles, with
+		 * same shape as Y Tile(i.e 4*16B OWords)
+		 */
+		return 128;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 		if (is_ccs_plane(fb, color_plane))
 			return 128;
@@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Yf_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
 	default:
diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index 1f66de77a6b1..f079a771f802 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private *dev_priv,
 	case DRM_FORMAT_MOD_LINEAR:
 	case I915_FORMAT_MOD_Y_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 		return DISPLAY_VER(dev_priv) >= 9;
 	case I915_FORMAT_MOD_X_TILED:
 		return true;
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index dcd698a02da2..d80855ee9b96 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -125,6 +125,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	case DRM_FORMAT_MOD_LINEAR:
 	case I915_FORMAT_MOD_X_TILED:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 		break;
 	default:
 		drm_dbg(&dev_priv->drm,
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 7444b88829ea..0eb4509f7f7a 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
 	DRM_FORMAT_MOD_INVALID
 };
 
+static const u64 dg2_plane_format_modifiers[] = {
+	I915_FORMAT_MOD_X_TILED,
+	I915_FORMAT_MOD_4_TILED,
+	DRM_FORMAT_MOD_LINEAR,
+	DRM_FORMAT_MOD_INVALID
+};
+
 int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
 {
 	switch (format) {
@@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_X;
 	case I915_FORMAT_MOD_Y_TILED:
 		return PLANE_CTL_TILED_Y;
+	case I915_FORMAT_MOD_4_TILED:
+		return PLANE_CTL_TILED_F;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -1283,6 +1292,7 @@ static int skl_plane_check_fb(const struct intel_crtc_state *crtc_state,
 	     fb->modifier == I915_FORMAT_MOD_Yf_TILED ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
 	     fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
+	     fb->modifier == I915_FORMAT_MOD_4_TILED ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC)) {
@@ -2001,6 +2011,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
 			return false;
 		break;
+	case I915_FORMAT_MOD_4_TILED:
+		if (!HAS_FTILE(dev_priv))
+			return false;
+		break;
 	default:
 		return false;
 	}
@@ -2041,9 +2055,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 	case DRM_FORMAT_Y216:
 	case DRM_FORMAT_XVYU12_16161616:
 	case DRM_FORMAT_XVYU16161616:
-		if (modifier == DRM_FORMAT_MOD_LINEAR ||
-		    modifier == I915_FORMAT_MOD_X_TILED ||
-		    modifier == I915_FORMAT_MOD_Y_TILED)
+		if (!is_ccs_modifier(modifier))
 			return true;
 		fallthrough;
 	default:
@@ -2054,8 +2066,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
 					    enum plane_id plane_id)
 {
+	if (HAS_FTILE(dev_priv))
+		return dg2_plane_format_modifiers;
 	/* Wa_22011186057 */
-	if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
+	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
 		return adlp_step_a_plane_format_modifiers;
 	else if (gen12_plane_supports_mc_ccs(dev_priv, plane_id))
 		return gen12_plane_format_modifiers_mc_ccs;
@@ -2325,11 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		else
 			fb->modifier = I915_FORMAT_MOD_Y_TILED;
 		break;
-	case PLANE_CTL_TILED_YF:
-		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
-		else
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
+		if (DISPLAY_VER(dev_priv) >= 13) {
+			fb->modifier = I915_FORMAT_MOD_4_TILED;
+		} else {
+			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
+			else
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+		}
 		break;
 	default:
 		MISSING_CASE(tiling);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 57948e0ee48b..fdd8ddb0cdf6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1628,6 +1628,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
 
 #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
+#define HAS_FTILE(dev_priv)    (INTEL_INFO(dev_priv)->has_ftile)
 #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
 #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
 #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 68367b505dc4..0d7d88ee43ca 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -972,6 +972,7 @@ static const struct intel_device_info adl_p_info = {
 	.display.has_cdclk_crawl = 1,
 	.display.has_modular_fia = 1,
 	.display.has_psr_hw_tracking = 0,
+	.has_ftile = 1, \
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
 	.ppgtt_size = 48,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3693eb03f5aa..0fe8e8e6cc31 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7195,6 +7195,7 @@ enum {
 #define   PLANE_CTL_TILED_X			(1 << 10)
 #define   PLANE_CTL_TILED_Y			(4 << 10)
 #define   PLANE_CTL_TILED_YF			(5 << 10)
+#define   PLANE_CTL_TILED_F			(5 << 10)
 #define   PLANE_CTL_ASYNC_FLIP			(1 << 9)
 #define   PLANE_CTL_FLIP_HORIZONTAL		(1 << 8)
 #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	(1 << 4) /* TGL+ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 87ee1d86d2ac..f95d4a939e10 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -126,6 +126,7 @@ enum intel_ppgtt_type {
 	func(has_64k_pages); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
+	func(has_ftile); \
 	func(has_flat_ccs); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 201477ca408a..0db5f32adfd5 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5377,6 +5377,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
 	}
 
 	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
+		      modifier == I915_FORMAT_MOD_4_TILED ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED ||
 		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 45a914850be0..982b0a9fa78b 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -558,6 +558,14 @@ extern "C" {
  * pitch is required to be a multiple of 4 tile widths.
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
+/*
+ * Intel F-tiling(aka Tile4) layout
+ *
+ * This is a tiled layout using 4Kb tiles in row-major layout.
+ * Within the tile pixels are laid out in 64 byte units / sub-tiles in OWORD
+ * (16 bytes) chunks column-major..
+ */
+#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
 
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Stanislav Lisovskiy, Imre Deak,
	Matt Roper, Maarten Lankhorst

From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>

TileF(Tile4 in bspec) format is 4K tile organized into
64B subtiles with same basic shape as for legacy TileY
which will be supported by Display13.

v2: - Fixed wrong case condition(Jani Nikula)
    - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)

v3: - s/I915_TILING_F/TILING_4/g
    - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
    - Removed unneeded fencing code

Cc: Imre Deak <imre.deak@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  1 +
 drivers/gpu/drm/i915/display/intel_fb.c       |  7 ++++
 drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
 .../drm/i915/display/intel_plane_initial.c    |  1 +
 .../drm/i915/display/skl_universal_plane.c    | 36 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_pci.c               |  1 +
 drivers/gpu/drm/i915/i915_reg.h               |  1 +
 drivers/gpu/drm/i915/intel_device_info.h      |  1 +
 drivers/gpu/drm/i915/intel_pm.c               |  1 +
 include/uapi/drm/drm_fourcc.h                 |  8 +++++
 11 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index ce5d6633029a..9b678839bf2b 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8877,6 +8877,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state)
 		case I915_FORMAT_MOD_X_TILED:
 		case I915_FORMAT_MOD_Y_TILED:
 		case I915_FORMAT_MOD_Yf_TILED:
+		case I915_FORMAT_MOD_4_TILED:
 			break;
 		default:
 			drm_dbg_kms(&i915->drm,
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index fa1f375e696b..e19739fef825 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_4_TILED:
+		/*
+		 * Each 4K tile consists of 64B(8*8) subtiles, with
+		 * same shape as Y Tile(i.e 4*16B OWords)
+		 */
+		return 128;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 		if (is_ccs_plane(fb, color_plane))
 			return 128;
@@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Yf_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
 	default:
diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
index 1f66de77a6b1..f079a771f802 100644
--- a/drivers/gpu/drm/i915/display/intel_fbc.c
+++ b/drivers/gpu/drm/i915/display/intel_fbc.c
@@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private *dev_priv,
 	case DRM_FORMAT_MOD_LINEAR:
 	case I915_FORMAT_MOD_Y_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 		return DISPLAY_VER(dev_priv) >= 9;
 	case I915_FORMAT_MOD_X_TILED:
 		return true;
diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
index dcd698a02da2..d80855ee9b96 100644
--- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
+++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
@@ -125,6 +125,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	case DRM_FORMAT_MOD_LINEAR:
 	case I915_FORMAT_MOD_X_TILED:
 	case I915_FORMAT_MOD_Y_TILED:
+	case I915_FORMAT_MOD_4_TILED:
 		break;
 	default:
 		drm_dbg(&dev_priv->drm,
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 7444b88829ea..0eb4509f7f7a 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
 	DRM_FORMAT_MOD_INVALID
 };
 
+static const u64 dg2_plane_format_modifiers[] = {
+	I915_FORMAT_MOD_X_TILED,
+	I915_FORMAT_MOD_4_TILED,
+	DRM_FORMAT_MOD_LINEAR,
+	DRM_FORMAT_MOD_INVALID
+};
+
 int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
 {
 	switch (format) {
@@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_X;
 	case I915_FORMAT_MOD_Y_TILED:
 		return PLANE_CTL_TILED_Y;
+	case I915_FORMAT_MOD_4_TILED:
+		return PLANE_CTL_TILED_F;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -1283,6 +1292,7 @@ static int skl_plane_check_fb(const struct intel_crtc_state *crtc_state,
 	     fb->modifier == I915_FORMAT_MOD_Yf_TILED ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
 	     fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
+	     fb->modifier == I915_FORMAT_MOD_4_TILED ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
 	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC)) {
@@ -2001,6 +2011,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
 			return false;
 		break;
+	case I915_FORMAT_MOD_4_TILED:
+		if (!HAS_FTILE(dev_priv))
+			return false;
+		break;
 	default:
 		return false;
 	}
@@ -2041,9 +2055,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 	case DRM_FORMAT_Y216:
 	case DRM_FORMAT_XVYU12_16161616:
 	case DRM_FORMAT_XVYU16161616:
-		if (modifier == DRM_FORMAT_MOD_LINEAR ||
-		    modifier == I915_FORMAT_MOD_X_TILED ||
-		    modifier == I915_FORMAT_MOD_Y_TILED)
+		if (!is_ccs_modifier(modifier))
 			return true;
 		fallthrough;
 	default:
@@ -2054,8 +2066,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
 					    enum plane_id plane_id)
 {
+	if (HAS_FTILE(dev_priv))
+		return dg2_plane_format_modifiers;
 	/* Wa_22011186057 */
-	if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
+	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
 		return adlp_step_a_plane_format_modifiers;
 	else if (gen12_plane_supports_mc_ccs(dev_priv, plane_id))
 		return gen12_plane_format_modifiers_mc_ccs;
@@ -2325,11 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		else
 			fb->modifier = I915_FORMAT_MOD_Y_TILED;
 		break;
-	case PLANE_CTL_TILED_YF:
-		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
-		else
-			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
+		if (DISPLAY_VER(dev_priv) >= 13) {
+			fb->modifier = I915_FORMAT_MOD_4_TILED;
+		} else {
+			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
+			else
+				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
+		}
 		break;
 	default:
 		MISSING_CASE(tiling);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 57948e0ee48b..fdd8ddb0cdf6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1628,6 +1628,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
 
 #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
+#define HAS_FTILE(dev_priv)    (INTEL_INFO(dev_priv)->has_ftile)
 #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
 #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
 #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 68367b505dc4..0d7d88ee43ca 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -972,6 +972,7 @@ static const struct intel_device_info adl_p_info = {
 	.display.has_cdclk_crawl = 1,
 	.display.has_modular_fia = 1,
 	.display.has_psr_hw_tracking = 0,
+	.has_ftile = 1, \
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
 	.ppgtt_size = 48,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3693eb03f5aa..0fe8e8e6cc31 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7195,6 +7195,7 @@ enum {
 #define   PLANE_CTL_TILED_X			(1 << 10)
 #define   PLANE_CTL_TILED_Y			(4 << 10)
 #define   PLANE_CTL_TILED_YF			(5 << 10)
+#define   PLANE_CTL_TILED_F			(5 << 10)
 #define   PLANE_CTL_ASYNC_FLIP			(1 << 9)
 #define   PLANE_CTL_FLIP_HORIZONTAL		(1 << 8)
 #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	(1 << 4) /* TGL+ */
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 87ee1d86d2ac..f95d4a939e10 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -126,6 +126,7 @@ enum intel_ppgtt_type {
 	func(has_64k_pages); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
+	func(has_ftile); \
 	func(has_flat_ccs); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 201477ca408a..0db5f32adfd5 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5377,6 +5377,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
 	}
 
 	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
+		      modifier == I915_FORMAT_MOD_4_TILED ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED ||
 		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
 		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 45a914850be0..982b0a9fa78b 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -558,6 +558,14 @@ extern "C" {
  * pitch is required to be a multiple of 4 tile widths.
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
+/*
+ * Intel F-tiling(aka Tile4) layout
+ *
+ * This is a tiled layout using 4Kb tiles in row-major layout.
+ * Within the tile pixels are laid out in 64 byte units / sub-tiles in OWORD
+ * (16 bytes) chunks column-major..
+ */
+#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
 
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Matt Roper, Ramalingam C,
	Simon Ser, Pekka Paalanen

From: Matt Roper <matthew.d.roper@intel.com>

DG2 unifies render compression and media compression into a single
format for the first time.  The programming and buffer layout is
supposed to match compression on older gen12 platforms, but the
actual compression algorithm is different from any previous platform; as
such, we need a new framebuffer modifier to represent buffers in this
format, but otherwise we can re-use the existing gen12 compression driver
logic.

DG2 clear color render compression uses Tile4 layout. Therefore, we need
to define a new format modifier for uAPI to support clear color rendering.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  3 ++
 .../drm/i915/display/intel_display_types.h    | 10 +++-
 drivers/gpu/drm/i915/display/intel_fb.c       |  7 +++
 .../drm/i915/display/skl_universal_plane.c    | 49 +++++++++++++++++--
 include/uapi/drm/drm_fourcc.h                 | 30 ++++++++++++
 5 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 9b678839bf2b..2949fe9f5b9f 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -1013,6 +1013,9 @@ intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
 					  cmd->pixel_format);
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
 		return lookup_format_info(gen12_ccs_formats,
 					  ARRAY_SIZE(gen12_ccs_formats),
 					  cmd->pixel_format);
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index 39e11eaec1a3..3aa47a9965d9 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -2047,14 +2047,20 @@ static inline bool is_ccs_modifier(u64 modifier)
 	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC ||
 	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
 	       modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
-	       modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
+	       modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_MC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC;
 }
 
 static inline bool is_gen12_ccs_modifier(u64 modifier)
 {
 	return modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
 	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC ||
-	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS;
+	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_MC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC;
 }
 
 #endif /*  __INTEL_DISPLAY_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index e19739fef825..8216b03b8aae 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -127,6 +127,9 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
 	case I915_FORMAT_MOD_4_TILED:
 		/*
 		 * Each 4K tile consists of 64B(8*8) subtiles, with
@@ -314,6 +317,10 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+		return 16 * 1024;
 	default:
 		MISSING_CASE(fb->modifier);
 		return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 0eb4509f7f7a..0aaccaff46f7 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -207,7 +207,19 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
 	DRM_FORMAT_MOD_INVALID
 };
 
+static const u64 dg2_step_a_b_plane_format_modifiers[] = {
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS,
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC,
+	I915_FORMAT_MOD_X_TILED,
+	I915_FORMAT_MOD_4_TILED,
+	DRM_FORMAT_MOD_LINEAR,
+	DRM_FORMAT_MOD_INVALID
+};
+
 static const u64 dg2_plane_format_modifiers[] = {
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS,
+	I915_FORMAT_MOD_F_TILED_DG2_MC_CCS,
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC,
 	I915_FORMAT_MOD_X_TILED,
 	I915_FORMAT_MOD_4_TILED,
 	DRM_FORMAT_MOD_LINEAR,
@@ -804,6 +816,16 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_Y;
 	case I915_FORMAT_MOD_4_TILED:
 		return PLANE_CTL_TILED_F;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+		return PLANE_CTL_TILED_F |
+			PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+			PLANE_CTL_CLEAR_COLOR_DISABLE;
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+		return PLANE_CTL_TILED_F |
+			PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
+			PLANE_CTL_CLEAR_COLOR_DISABLE;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
+		return PLANE_CTL_TILED_F | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2011,7 +2033,14 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
 			return false;
 		break;
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+		/* Wa_14013215631 */
+		if (IS_DG2_DISP_STEP(dev_priv, STEP_A0, STEP_C0))
+			return false;
+		fallthrough;
 	case I915_FORMAT_MOD_4_TILED:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
 		if (!HAS_FTILE(dev_priv))
 			return false;
 		break;
@@ -2036,7 +2065,8 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 	case DRM_FORMAT_P010:
 	case DRM_FORMAT_P012:
 	case DRM_FORMAT_P016:
-		if (modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS)
+		if (modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
+		    modifier == I915_FORMAT_MOD_F_TILED_DG2_MC_CCS)
 			return true;
 		fallthrough;
 	case DRM_FORMAT_RGB565:
@@ -2066,7 +2096,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
 					    enum plane_id plane_id)
 {
-	if (HAS_FTILE(dev_priv))
+	/* Wa_14013215631 */
+	if (IS_DG2_DISP_STEP(dev_priv, STEP_A0, STEP_C0))
+		return dg2_step_a_b_plane_format_modifiers;
+	else if (HAS_FTILE(dev_priv))
 		return dg2_plane_format_modifiers;
 	/* Wa_22011186057 */
 	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
@@ -2341,7 +2374,17 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		break;
 	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
 		if (DISPLAY_VER(dev_priv) >= 13) {
-			fb->modifier = I915_FORMAT_MOD_4_TILED;
+			u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+					PLANE_CTL_CLEAR_COLOR_DISABLE;
+
+			if ((val & rc_mask) == rc_mask)
+				fb->modifier = I915_FORMAT_MOD_F_TILED_DG2_RC_CCS;
+			else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_F_TILED_DG2_MC_CCS;
+			else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC;
+			else
+				fb->modifier = I915_FORMAT_MOD_4_TILED;
 		} else {
 			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
 				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 982b0a9fa78b..606ed0e0c880 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -567,6 +567,36 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
 
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * DG2 uses a new compression format for render compression. The general
+ * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
+ * but a new hashing/compression algorithm is used, so a fresh modifier must
+ * be associated with buffers of this type. Render compression uses 128 byte
+ * compression blocks.
+ */
+#define I915_FORMAT_MOD_F_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 13)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * DG2 uses a new compression format for media compression. The general
+ * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS,
+ * but a new hashing/compression algorithm is used, so a fresh modifier must
+ * be associated with buffers of this type. Media compression uses 256 byte
+ * compression blocks.
+ */
+#define I915_FORMAT_MOD_F_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 14)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 clear color render compression.
+ *
+ * DG2 uses a unified compression format for clear color render compression.
+ * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
+ */
+#define I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 15)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Matt Roper, Ramalingam C,
	Simon Ser, Pekka Paalanen

From: Matt Roper <matthew.d.roper@intel.com>

DG2 unifies render compression and media compression into a single
format for the first time.  The programming and buffer layout is
supposed to match compression on older gen12 platforms, but the
actual compression algorithm is different from any previous platform; as
such, we need a new framebuffer modifier to represent buffers in this
format, but otherwise we can re-use the existing gen12 compression driver
logic.

DG2 clear color render compression uses Tile4 layout. Therefore, we need
to define a new format modifier for uAPI to support clear color rendering.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  3 ++
 .../drm/i915/display/intel_display_types.h    | 10 +++-
 drivers/gpu/drm/i915/display/intel_fb.c       |  7 +++
 .../drm/i915/display/skl_universal_plane.c    | 49 +++++++++++++++++--
 include/uapi/drm/drm_fourcc.h                 | 30 ++++++++++++
 5 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 9b678839bf2b..2949fe9f5b9f 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -1013,6 +1013,9 @@ intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
 					  cmd->pixel_format);
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
 		return lookup_format_info(gen12_ccs_formats,
 					  ARRAY_SIZE(gen12_ccs_formats),
 					  cmd->pixel_format);
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index 39e11eaec1a3..3aa47a9965d9 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -2047,14 +2047,20 @@ static inline bool is_ccs_modifier(u64 modifier)
 	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC ||
 	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
 	       modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
-	       modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
+	       modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_MC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC;
 }
 
 static inline bool is_gen12_ccs_modifier(u64 modifier)
 {
 	return modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
 	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC ||
-	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS;
+	       modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_MC_CCS ||
+	       modifier == I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC;
 }
 
 #endif /*  __INTEL_DISPLAY_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index e19739fef825..8216b03b8aae 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -127,6 +127,9 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
 			return 128;
 		else
 			return 512;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
 	case I915_FORMAT_MOD_4_TILED:
 		/*
 		 * Each 4K tile consists of 64B(8*8) subtiles, with
@@ -314,6 +317,10 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
 	case I915_FORMAT_MOD_4_TILED:
 	case I915_FORMAT_MOD_Yf_TILED:
 		return 1 * 1024 * 1024;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+		return 16 * 1024;
 	default:
 		MISSING_CASE(fb->modifier);
 		return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index 0eb4509f7f7a..0aaccaff46f7 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -207,7 +207,19 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
 	DRM_FORMAT_MOD_INVALID
 };
 
+static const u64 dg2_step_a_b_plane_format_modifiers[] = {
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS,
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC,
+	I915_FORMAT_MOD_X_TILED,
+	I915_FORMAT_MOD_4_TILED,
+	DRM_FORMAT_MOD_LINEAR,
+	DRM_FORMAT_MOD_INVALID
+};
+
 static const u64 dg2_plane_format_modifiers[] = {
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS,
+	I915_FORMAT_MOD_F_TILED_DG2_MC_CCS,
+	I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC,
 	I915_FORMAT_MOD_X_TILED,
 	I915_FORMAT_MOD_4_TILED,
 	DRM_FORMAT_MOD_LINEAR,
@@ -804,6 +816,16 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
 		return PLANE_CTL_TILED_Y;
 	case I915_FORMAT_MOD_4_TILED:
 		return PLANE_CTL_TILED_F;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+		return PLANE_CTL_TILED_F |
+			PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+			PLANE_CTL_CLEAR_COLOR_DISABLE;
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+		return PLANE_CTL_TILED_F |
+			PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
+			PLANE_CTL_CLEAR_COLOR_DISABLE;
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
+		return PLANE_CTL_TILED_F | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
 	case I915_FORMAT_MOD_Y_TILED_CCS:
 	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
 		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2011,7 +2033,14 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
 			return false;
 		break;
+	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
+		/* Wa_14013215631 */
+		if (IS_DG2_DISP_STEP(dev_priv, STEP_A0, STEP_C0))
+			return false;
+		fallthrough;
 	case I915_FORMAT_MOD_4_TILED:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
+	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
 		if (!HAS_FTILE(dev_priv))
 			return false;
 		break;
@@ -2036,7 +2065,8 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 	case DRM_FORMAT_P010:
 	case DRM_FORMAT_P012:
 	case DRM_FORMAT_P016:
-		if (modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS)
+		if (modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
+		    modifier == I915_FORMAT_MOD_F_TILED_DG2_MC_CCS)
 			return true;
 		fallthrough;
 	case DRM_FORMAT_RGB565:
@@ -2066,7 +2096,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
 static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
 					    enum plane_id plane_id)
 {
-	if (HAS_FTILE(dev_priv))
+	/* Wa_14013215631 */
+	if (IS_DG2_DISP_STEP(dev_priv, STEP_A0, STEP_C0))
+		return dg2_step_a_b_plane_format_modifiers;
+	else if (HAS_FTILE(dev_priv))
 		return dg2_plane_format_modifiers;
 	/* Wa_22011186057 */
 	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
@@ -2341,7 +2374,17 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
 		break;
 	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
 		if (DISPLAY_VER(dev_priv) >= 13) {
-			fb->modifier = I915_FORMAT_MOD_4_TILED;
+			u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+					PLANE_CTL_CLEAR_COLOR_DISABLE;
+
+			if ((val & rc_mask) == rc_mask)
+				fb->modifier = I915_FORMAT_MOD_F_TILED_DG2_RC_CCS;
+			else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_F_TILED_DG2_MC_CCS;
+			else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+				fb->modifier = I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC;
+			else
+				fb->modifier = I915_FORMAT_MOD_4_TILED;
 		} else {
 			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
 				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 982b0a9fa78b..606ed0e0c880 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -567,6 +567,36 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
 
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * DG2 uses a new compression format for render compression. The general
+ * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
+ * but a new hashing/compression algorithm is used, so a fresh modifier must
+ * be associated with buffers of this type. Render compression uses 128 byte
+ * compression blocks.
+ */
+#define I915_FORMAT_MOD_F_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 13)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * DG2 uses a new compression format for media compression. The general
+ * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS,
+ * but a new hashing/compression algorithm is used, so a fresh modifier must
+ * be associated with buffers of this type. Media compression uses 256 byte
+ * compression blocks.
+ */
+#define I915_FORMAT_MOD_F_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 14)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 clear color render compression.
+ *
+ * DG2 uses a unified compression format for clear color render compression.
+ * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
+ */
+#define I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 15)
+
 /*
  * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
  *
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 15/17] drm/i915/uapi: document behaviour for DG2 64K support
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Simon Ser,
	Pekka Paalanen

From: Matthew Auld <matthew.auld@intel.com>

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 include/uapi/drm/i915_drm.h | 67 ++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 914ebd9290e5..89bcf5a77958 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
 	/**
 	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 	 * the user with the GTT offset at which this object will be pinned.
+	 *
 	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 	 * presumed_offset of the object.
+	 *
 	 * During execbuffer2 the kernel populates it with the value of the
 	 * current GTT offset of the object, for future presumed_offset writes.
+	 *
+	 * See struct drm_i915_gem_create_ext for the rules when dealing with
+	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+	 * minimum page sizes, like DG2.
 	 */
 	__u64 offset;
 
@@ -3144,11 +3150,62 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 * Note that for some devices we have might have further minimum
-	 * page-size restrictions(larger than 4K), like for device local-memory.
-	 * However in general the final size here should always reflect any
-	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
-	 * extension to place the object in device local-memory.
+	 *
+	 * **DG2 64K min page size implications:**
+	 *
+	 * On discrete platforms, starting from DG2, we have to contend with GTT
+	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+	 * objects.  Specifically the hardware only supports 64K or larger GTT
+	 * page sizes for such memory. The kernel will already ensure that all
+	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+	 * sizes underneath.
+	 *
+	 * Note that the returned size here will always reflect any required
+	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
+	 * such as DG2.
+	 *
+	 * **Special DG2 GTT address alignment requirement:**
+	 *
+	 * The GTT alignment will also need be at least 64K for  such objects.
+	 *
+	 * Note that due to how the hardware implements 64K GTT page support, we
+	 * have some further complications:
+	 *
+	 *   1) The entire PDE(which covers a 2M virtual address range), must
+	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+	 *   PDE is forbidden by the hardware.
+	 *
+	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+	 *   objects.
+	 *
+	 * To handle the above the kernel implements a memory coloring scheme to
+	 * prevent userspace from mixing I915_MEMORY_CLASS_DEVICE and
+	 * I915_MEMORY_CLASS_SYSTEM objects in the same PDE. If the kernel is
+	 * ever unable to evict the required pages for the given PDE(different
+	 * color) when inserting the object into the GTT then it will simply
+	 * fail the request.
+	 *
+	 * Since userspace needs to manage the GTT address space themselves,
+	 * special care is needed to ensure this doesn't happen. The simplest
+	 * scheme is to simply align and round up all I915_MEMORY_CLASS_DEVICE
+	 * objects to 2M, which avoids any issues here. At the very least this
+	 * is likely needed for objects that can be placed in both
+	 * I915_MEMORY_CLASS_DEVICE and I915_MEMORY_CLASS_SYSTEM, to avoid
+	 * potential issues when the kernel needs to migrate the object behind
+	 * the scenes, since that might also involve evicting other objects.
+	 *
+	 * **To summarise the GTT rules, on platforms like DG2:**
+	 *
+	 *   1) All objects that can be placed in I915_MEMORY_CLASS_DEVICE must
+	 *   have 64K alignment. The kernel will reject this otherwise.
+	 *
+	 *   2) All I915_MEMORY_CLASS_DEVICE objects must never be placed in
+	 *   the same PDE with other I915_MEMORY_CLASS_SYSTEM objects. The
+	 *   kernel will reject this otherwise.
+	 *
+	 *   3) Objects that can be placed in both I915_MEMORY_CLASS_DEVICE and
+	 *   I915_MEMORY_CLASS_SYSTEM should probably be aligned and padded out
+	 *   to 2M.
 	 */
 	__u64 size;
 	/**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 15/17] drm/i915/uapi: document behaviour for DG2 64K support
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Simon Ser,
	Pekka Paalanen

From: Matthew Auld <matthew.auld@intel.com>

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 include/uapi/drm/i915_drm.h | 67 ++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 914ebd9290e5..89bcf5a77958 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
 	/**
 	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 	 * the user with the GTT offset at which this object will be pinned.
+	 *
 	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 	 * presumed_offset of the object.
+	 *
 	 * During execbuffer2 the kernel populates it with the value of the
 	 * current GTT offset of the object, for future presumed_offset writes.
+	 *
+	 * See struct drm_i915_gem_create_ext for the rules when dealing with
+	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+	 * minimum page sizes, like DG2.
 	 */
 	__u64 offset;
 
@@ -3144,11 +3150,62 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 * Note that for some devices we have might have further minimum
-	 * page-size restrictions(larger than 4K), like for device local-memory.
-	 * However in general the final size here should always reflect any
-	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
-	 * extension to place the object in device local-memory.
+	 *
+	 * **DG2 64K min page size implications:**
+	 *
+	 * On discrete platforms, starting from DG2, we have to contend with GTT
+	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+	 * objects.  Specifically the hardware only supports 64K or larger GTT
+	 * page sizes for such memory. The kernel will already ensure that all
+	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+	 * sizes underneath.
+	 *
+	 * Note that the returned size here will always reflect any required
+	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
+	 * such as DG2.
+	 *
+	 * **Special DG2 GTT address alignment requirement:**
+	 *
+	 * The GTT alignment will also need be at least 64K for  such objects.
+	 *
+	 * Note that due to how the hardware implements 64K GTT page support, we
+	 * have some further complications:
+	 *
+	 *   1) The entire PDE(which covers a 2M virtual address range), must
+	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+	 *   PDE is forbidden by the hardware.
+	 *
+	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+	 *   objects.
+	 *
+	 * To handle the above the kernel implements a memory coloring scheme to
+	 * prevent userspace from mixing I915_MEMORY_CLASS_DEVICE and
+	 * I915_MEMORY_CLASS_SYSTEM objects in the same PDE. If the kernel is
+	 * ever unable to evict the required pages for the given PDE(different
+	 * color) when inserting the object into the GTT then it will simply
+	 * fail the request.
+	 *
+	 * Since userspace needs to manage the GTT address space themselves,
+	 * special care is needed to ensure this doesn't happen. The simplest
+	 * scheme is to simply align and round up all I915_MEMORY_CLASS_DEVICE
+	 * objects to 2M, which avoids any issues here. At the very least this
+	 * is likely needed for objects that can be placed in both
+	 * I915_MEMORY_CLASS_DEVICE and I915_MEMORY_CLASS_SYSTEM, to avoid
+	 * potential issues when the kernel needs to migrate the object behind
+	 * the scenes, since that might also involve evicting other objects.
+	 *
+	 * **To summarise the GTT rules, on platforms like DG2:**
+	 *
+	 *   1) All objects that can be placed in I915_MEMORY_CLASS_DEVICE must
+	 *   have 64K alignment. The kernel will reject this otherwise.
+	 *
+	 *   2) All I915_MEMORY_CLASS_DEVICE objects must never be placed in
+	 *   the same PDE with other I915_MEMORY_CLASS_SYSTEM objects. The
+	 *   kernel will reject this otherwise.
+	 *
+	 *   3) Objects that can be placed in both I915_MEMORY_CLASS_DEVICE and
+	 *   I915_MEMORY_CLASS_SYSTEM should probably be aligned and padded out
+	 *   to 2M.
 	 */
 	__u64 size;
 	/**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 16/17] drm/i915/Flat-CCS: Document on Flat-CCS memory compression
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Simon Ser,
	Pekka Paalanen

Documents the Flat-CCS feature and kernel handling required along with
modifiers used.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 47 +++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 0bed01750884..ad5a28da1c6a 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -491,6 +491,53 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before the kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when a lmem object is allocated.
+ * And CCS data can be copied in and out of CCS region through
+ * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
+ *
+ * When we exaust the lmem, if the object's placements support smem, then we can
+ * directly decompress the compressed lmem object into smem and start using it
+ * from smem itself.
+ *
+ * But when we need to swapout the compressed lmem object into a smem region
+ * though objects' placement doesn't support smem, then we copy the lmem content
+ * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
+ * When the object is referred, lmem content will be swaped in along with
+ * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
+ * location.
+ *
+ *
+ * Flat-CCS Modifiers for different compression formats
+ * ----------------------------------------------------
+ *
+ * I915_FORMAT_MOD_F_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS
+ * render compression formats. Though the general layout is same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression algorithm is
+ * used. Render compression uses 128 byte compression blocks
+ *
+ * I915_FORMAT_MOD_F_TILED_DG2_MC_CCS -used to indicate the buffers of Flat CCS
+ * media compression formats. Though the general layout is same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm is
+ * used. Media compression uses 256 byte compression blocks.
+ *
+ * I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat
+ * CCS clear color render compression formats. Unified compression format for
+ * clear color render compression. The genral layout is a tiled layout using
+ * 4Kb tiles i.e Tile4 layout.
+ */
+
 static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
 {
 	/* Mask the 3 LSB to use the PPGTT address space */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 16/17] drm/i915/Flat-CCS: Document on Flat-CCS memory compression
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Simon Ser,
	Pekka Paalanen

Documents the Flat-CCS feature and kernel handling required along with
modifiers used.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 47 +++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 0bed01750884..ad5a28da1c6a 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -491,6 +491,53 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before the kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when a lmem object is allocated.
+ * And CCS data can be copied in and out of CCS region through
+ * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
+ *
+ * When we exaust the lmem, if the object's placements support smem, then we can
+ * directly decompress the compressed lmem object into smem and start using it
+ * from smem itself.
+ *
+ * But when we need to swapout the compressed lmem object into a smem region
+ * though objects' placement doesn't support smem, then we copy the lmem content
+ * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
+ * When the object is referred, lmem content will be swaped in along with
+ * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
+ * location.
+ *
+ *
+ * Flat-CCS Modifiers for different compression formats
+ * ----------------------------------------------------
+ *
+ * I915_FORMAT_MOD_F_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS
+ * render compression formats. Though the general layout is same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression algorithm is
+ * used. Render compression uses 128 byte compression blocks
+ *
+ * I915_FORMAT_MOD_F_TILED_DG2_MC_CCS -used to indicate the buffers of Flat CCS
+ * media compression formats. Though the general layout is same as
+ * I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm is
+ * used. Media compression uses 256 byte compression blocks.
+ *
+ * I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat
+ * CCS clear color render compression formats. Unified compression format for
+ * clear color render compression. The genral layout is a tiled layout using
+ * 4Kb tiles i.e Tile4 layout.
+ */
+
 static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
 {
 	/* Mask the 3 LSB to use the PPGTT address space */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v2 17/17] Doc/gpu/rfc/i915: i915 DG2 uAPI
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:26   ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Daniel Vetter,
	Simon Ser, Pekka Paalanen

Details of the new features getting added as part of DG2 enabling and their
implicit impact on the uAPI.

v2: improvised the Flat-CCS documentation [Danvet & CQ]

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Daniel Vetter <daniel.vetter@ffwll.ch>
cc: Matthew Auld <matthew.auld@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 Documentation/gpu/rfc/i915_dg2.rst | 32 ++++++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst    |  3 +++
 2 files changed, 35 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_dg2.rst

diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
new file mode 100644
index 000000000000..9d28b1812bc7
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_dg2.rst
@@ -0,0 +1,32 @@
+====================
+I915 DG2 RFC Section
+====================
+
+Upstream plan
+=============
+Plan to upstream the DG2 enabling is:
+
+* Merge basic HW enabling for DG2 (Still without pciid)
+* Merge the 64k support for lmem
+* Merge the flat CCS enabling patches
+* Add the pciid for DG2 and enable the DG2 in CI
+
+
+64K page support for lmem
+=========================
+On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not
+supported anymore.
+
+DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt
+Page table. Refer the struct drm_i915_gem_create_ext for the implication of
+handling the 64k page size.
+
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+        :functions: drm_i915_gem_create_ext
+
+
+Flat CCS support for lmem
+=========================
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_migrate.c
+        :doc: Flat-CCS - Memory compression for Local memory
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..afb320ed4028 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -20,6 +20,9 @@ host such documentation:
 
     i915_gem_lmem.rst
 
+.. toctree::
+    i915_dg2.rst
+
 .. toctree::
 
     i915_scheduler.rst
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Intel-gfx] [PATCH v2 17/17] Doc/gpu/rfc/i915: i915 DG2 uAPI
@ 2021-10-21 14:26   ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-21 14:26 UTC (permalink / raw)
  To: dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, Matthew Auld, lucas.demarchi,
	rodrigo.vivi, Hellstrom Thomas, Ramalingam C, Daniel Vetter,
	Simon Ser, Pekka Paalanen

Details of the new features getting added as part of DG2 enabling and their
implicit impact on the uAPI.

v2: improvised the Flat-CCS documentation [Danvet & CQ]

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Daniel Vetter <daniel.vetter@ffwll.ch>
cc: Matthew Auld <matthew.auld@intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
---
 Documentation/gpu/rfc/i915_dg2.rst | 32 ++++++++++++++++++++++++++++++
 Documentation/gpu/rfc/index.rst    |  3 +++
 2 files changed, 35 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_dg2.rst

diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst
new file mode 100644
index 000000000000..9d28b1812bc7
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_dg2.rst
@@ -0,0 +1,32 @@
+====================
+I915 DG2 RFC Section
+====================
+
+Upstream plan
+=============
+Plan to upstream the DG2 enabling is:
+
+* Merge basic HW enabling for DG2 (Still without pciid)
+* Merge the 64k support for lmem
+* Merge the flat CCS enabling patches
+* Add the pciid for DG2 and enable the DG2 in CI
+
+
+64K page support for lmem
+=========================
+On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not
+supported anymore.
+
+DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt
+Page table. Refer the struct drm_i915_gem_create_ext for the implication of
+handling the 64k page size.
+
+.. kernel-doc:: include/uapi/drm/i915_drm.h
+        :functions: drm_i915_gem_create_ext
+
+
+Flat CCS support for lmem
+=========================
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_migrate.c
+        :doc: Flat-CCS - Memory compression for Local memory
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 91e93a705230..afb320ed4028 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -20,6 +20,9 @@ host such documentation:
 
     i915_gem_lmem.rst
 
+.. toctree::
+    i915_dg2.rst
+
 .. toctree::
 
     i915_scheduler.rst
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:27     ` Lisovskiy, Stanislav
  -1 siblings, 0 replies; 57+ messages in thread
From: Lisovskiy, Stanislav @ 2021-10-21 14:27 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Imre Deak,
	Matt Roper, Maarten Lankhorst

On Thu, Oct 21, 2021 at 07:56:23PM +0530, Ramalingam C wrote:
> From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> 
> TileF(Tile4 in bspec) format is 4K tile organized into
> 64B subtiles with same basic shape as for legacy TileY
> which will be supported by Display13.
> 
> v2: - Fixed wrong case condition(Jani Nikula)
>     - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)
> 
> v3: - s/I915_TILING_F/TILING_4/g
>     - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
>     - Removed unneeded fencing code
> 
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display.c  |  1 +
>  drivers/gpu/drm/i915/display/intel_fb.c       |  7 ++++
>  drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
>  .../drm/i915/display/intel_plane_initial.c    |  1 +
>  .../drm/i915/display/skl_universal_plane.c    | 36 ++++++++++++++-----
>  drivers/gpu/drm/i915/i915_drv.h               |  1 +
>  drivers/gpu/drm/i915/i915_pci.c               |  1 +
>  drivers/gpu/drm/i915/i915_reg.h               |  1 +
>  drivers/gpu/drm/i915/intel_device_info.h      |  1 +
>  drivers/gpu/drm/i915/intel_pm.c               |  1 +
>  include/uapi/drm/drm_fourcc.h                 |  8 +++++
>  11 files changed, 50 insertions(+), 9 deletions(-)

Was I supposed to change TILE_F/TILE_4 everywhere first,
as per your comment?

Stan

> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index ce5d6633029a..9b678839bf2b 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -8877,6 +8877,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state)
>  		case I915_FORMAT_MOD_X_TILED:
>  		case I915_FORMAT_MOD_Y_TILED:
>  		case I915_FORMAT_MOD_Yf_TILED:
> +		case I915_FORMAT_MOD_4_TILED:
>  			break;
>  		default:
>  			drm_dbg_kms(&i915->drm,
> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> index fa1f375e696b..e19739fef825 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> @@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>  			return 128;
>  		else
>  			return 512;
> +	case I915_FORMAT_MOD_4_TILED:
> +		/*
> +		 * Each 4K tile consists of 64B(8*8) subtiles, with
> +		 * same shape as Y Tile(i.e 4*16B OWords)
> +		 */
> +		return 128;
>  	case I915_FORMAT_MOD_Y_TILED_CCS:
>  		if (is_ccs_plane(fb, color_plane))
>  			return 128;
> @@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
>  	case I915_FORMAT_MOD_Y_TILED_CCS:
>  	case I915_FORMAT_MOD_Yf_TILED_CCS:
>  	case I915_FORMAT_MOD_Y_TILED:
> +	case I915_FORMAT_MOD_4_TILED:
>  	case I915_FORMAT_MOD_Yf_TILED:
>  		return 1 * 1024 * 1024;
>  	default:
> diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
> index 1f66de77a6b1..f079a771f802 100644
> --- a/drivers/gpu/drm/i915/display/intel_fbc.c
> +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
> @@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private *dev_priv,
>  	case DRM_FORMAT_MOD_LINEAR:
>  	case I915_FORMAT_MOD_Y_TILED:
>  	case I915_FORMAT_MOD_Yf_TILED:
> +	case I915_FORMAT_MOD_4_TILED:
>  		return DISPLAY_VER(dev_priv) >= 9;
>  	case I915_FORMAT_MOD_X_TILED:
>  		return true;
> diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> index dcd698a02da2..d80855ee9b96 100644
> --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
> +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> @@ -125,6 +125,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
>  	case DRM_FORMAT_MOD_LINEAR:
>  	case I915_FORMAT_MOD_X_TILED:
>  	case I915_FORMAT_MOD_Y_TILED:
> +	case I915_FORMAT_MOD_4_TILED:
>  		break;
>  	default:
>  		drm_dbg(&dev_priv->drm,
> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> index 7444b88829ea..0eb4509f7f7a 100644
> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> @@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
>  	DRM_FORMAT_MOD_INVALID
>  };
>  
> +static const u64 dg2_plane_format_modifiers[] = {
> +	I915_FORMAT_MOD_X_TILED,
> +	I915_FORMAT_MOD_4_TILED,
> +	DRM_FORMAT_MOD_LINEAR,
> +	DRM_FORMAT_MOD_INVALID
> +};
> +
>  int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
>  {
>  	switch (format) {
> @@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>  		return PLANE_CTL_TILED_X;
>  	case I915_FORMAT_MOD_Y_TILED:
>  		return PLANE_CTL_TILED_Y;
> +	case I915_FORMAT_MOD_4_TILED:
> +		return PLANE_CTL_TILED_F;
>  	case I915_FORMAT_MOD_Y_TILED_CCS:
>  	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>  		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> @@ -1283,6 +1292,7 @@ static int skl_plane_check_fb(const struct intel_crtc_state *crtc_state,
>  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
>  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
> +	     fb->modifier == I915_FORMAT_MOD_4_TILED ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC)) {
> @@ -2001,6 +2011,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
>  		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
>  			return false;
>  		break;
> +	case I915_FORMAT_MOD_4_TILED:
> +		if (!HAS_FTILE(dev_priv))
> +			return false;
> +		break;
>  	default:
>  		return false;
>  	}
> @@ -2041,9 +2055,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
>  	case DRM_FORMAT_Y216:
>  	case DRM_FORMAT_XVYU12_16161616:
>  	case DRM_FORMAT_XVYU16161616:
> -		if (modifier == DRM_FORMAT_MOD_LINEAR ||
> -		    modifier == I915_FORMAT_MOD_X_TILED ||
> -		    modifier == I915_FORMAT_MOD_Y_TILED)
> +		if (!is_ccs_modifier(modifier))
>  			return true;
>  		fallthrough;
>  	default:
> @@ -2054,8 +2066,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
>  static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
>  					    enum plane_id plane_id)
>  {
> +	if (HAS_FTILE(dev_priv))
> +		return dg2_plane_format_modifiers;
>  	/* Wa_22011186057 */
> -	if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> +	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
>  		return adlp_step_a_plane_format_modifiers;
>  	else if (gen12_plane_supports_mc_ccs(dev_priv, plane_id))
>  		return gen12_plane_format_modifiers_mc_ccs;
> @@ -2325,11 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>  		else
>  			fb->modifier = I915_FORMAT_MOD_Y_TILED;
>  		break;
> -	case PLANE_CTL_TILED_YF:
> -		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> -			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> -		else
> -			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> +	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
> +		if (DISPLAY_VER(dev_priv) >= 13) {
> +			fb->modifier = I915_FORMAT_MOD_4_TILED;
> +		} else {
> +			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> +				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> +			else
> +				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> +		}
>  		break;
>  	default:
>  		MISSING_CASE(tiling);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 57948e0ee48b..fdd8ddb0cdf6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1628,6 +1628,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>  #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
>  
>  #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
> +#define HAS_FTILE(dev_priv)    (INTEL_INFO(dev_priv)->has_ftile)
>  #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
>  #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
>  #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 68367b505dc4..0d7d88ee43ca 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -972,6 +972,7 @@ static const struct intel_device_info adl_p_info = {
>  	.display.has_cdclk_crawl = 1,
>  	.display.has_modular_fia = 1,
>  	.display.has_psr_hw_tracking = 0,
> +	.has_ftile = 1, \
>  	.platform_engine_mask =
>  		BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
>  	.ppgtt_size = 48,
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 3693eb03f5aa..0fe8e8e6cc31 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7195,6 +7195,7 @@ enum {
>  #define   PLANE_CTL_TILED_X			(1 << 10)
>  #define   PLANE_CTL_TILED_Y			(4 << 10)
>  #define   PLANE_CTL_TILED_YF			(5 << 10)
> +#define   PLANE_CTL_TILED_F			(5 << 10)
>  #define   PLANE_CTL_ASYNC_FLIP			(1 << 9)
>  #define   PLANE_CTL_FLIP_HORIZONTAL		(1 << 8)
>  #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	(1 << 4) /* TGL+ */
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index 87ee1d86d2ac..f95d4a939e10 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -126,6 +126,7 @@ enum intel_ppgtt_type {
>  	func(has_64k_pages); \
>  	func(gpu_reset_clobbers_display); \
>  	func(has_reset_engine); \
> +	func(has_ftile); \
>  	func(has_flat_ccs); \
>  	func(has_global_mocs); \
>  	func(has_gt_uc); \
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 201477ca408a..0db5f32adfd5 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5377,6 +5377,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
>  	}
>  
>  	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
> +		      modifier == I915_FORMAT_MOD_4_TILED ||
>  		      modifier == I915_FORMAT_MOD_Yf_TILED ||
>  		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
>  		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 45a914850be0..982b0a9fa78b 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -558,6 +558,14 @@ extern "C" {
>   * pitch is required to be a multiple of 4 tile widths.
>   */
>  #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
> +/*
> + * Intel F-tiling(aka Tile4) layout
> + *
> + * This is a tiled layout using 4Kb tiles in row-major layout.
> + * Within the tile pixels are laid out in 64 byte units / sub-tiles in OWORD
> + * (16 bytes) chunks column-major..
> + */
> +#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
>  
>  /*
>   * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support
@ 2021-10-21 14:27     ` Lisovskiy, Stanislav
  0 siblings, 0 replies; 57+ messages in thread
From: Lisovskiy, Stanislav @ 2021-10-21 14:27 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Imre Deak,
	Matt Roper, Maarten Lankhorst

On Thu, Oct 21, 2021 at 07:56:23PM +0530, Ramalingam C wrote:
> From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> 
> TileF(Tile4 in bspec) format is 4K tile organized into
> 64B subtiles with same basic shape as for legacy TileY
> which will be supported by Display13.
> 
> v2: - Fixed wrong case condition(Jani Nikula)
>     - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)
> 
> v3: - s/I915_TILING_F/TILING_4/g
>     - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
>     - Removed unneeded fencing code
> 
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display.c  |  1 +
>  drivers/gpu/drm/i915/display/intel_fb.c       |  7 ++++
>  drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
>  .../drm/i915/display/intel_plane_initial.c    |  1 +
>  .../drm/i915/display/skl_universal_plane.c    | 36 ++++++++++++++-----
>  drivers/gpu/drm/i915/i915_drv.h               |  1 +
>  drivers/gpu/drm/i915/i915_pci.c               |  1 +
>  drivers/gpu/drm/i915/i915_reg.h               |  1 +
>  drivers/gpu/drm/i915/intel_device_info.h      |  1 +
>  drivers/gpu/drm/i915/intel_pm.c               |  1 +
>  include/uapi/drm/drm_fourcc.h                 |  8 +++++
>  11 files changed, 50 insertions(+), 9 deletions(-)

Was I supposed to change TILE_F/TILE_4 everywhere first,
as per your comment?

Stan

> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index ce5d6633029a..9b678839bf2b 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -8877,6 +8877,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state)
>  		case I915_FORMAT_MOD_X_TILED:
>  		case I915_FORMAT_MOD_Y_TILED:
>  		case I915_FORMAT_MOD_Yf_TILED:
> +		case I915_FORMAT_MOD_4_TILED:
>  			break;
>  		default:
>  			drm_dbg_kms(&i915->drm,
> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> index fa1f375e696b..e19739fef825 100644
> --- a/drivers/gpu/drm/i915/display/intel_fb.c
> +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> @@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
>  			return 128;
>  		else
>  			return 512;
> +	case I915_FORMAT_MOD_4_TILED:
> +		/*
> +		 * Each 4K tile consists of 64B(8*8) subtiles, with
> +		 * same shape as Y Tile(i.e 4*16B OWords)
> +		 */
> +		return 128;
>  	case I915_FORMAT_MOD_Y_TILED_CCS:
>  		if (is_ccs_plane(fb, color_plane))
>  			return 128;
> @@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
>  	case I915_FORMAT_MOD_Y_TILED_CCS:
>  	case I915_FORMAT_MOD_Yf_TILED_CCS:
>  	case I915_FORMAT_MOD_Y_TILED:
> +	case I915_FORMAT_MOD_4_TILED:
>  	case I915_FORMAT_MOD_Yf_TILED:
>  		return 1 * 1024 * 1024;
>  	default:
> diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
> index 1f66de77a6b1..f079a771f802 100644
> --- a/drivers/gpu/drm/i915/display/intel_fbc.c
> +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
> @@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private *dev_priv,
>  	case DRM_FORMAT_MOD_LINEAR:
>  	case I915_FORMAT_MOD_Y_TILED:
>  	case I915_FORMAT_MOD_Yf_TILED:
> +	case I915_FORMAT_MOD_4_TILED:
>  		return DISPLAY_VER(dev_priv) >= 9;
>  	case I915_FORMAT_MOD_X_TILED:
>  		return true;
> diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> index dcd698a02da2..d80855ee9b96 100644
> --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
> +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> @@ -125,6 +125,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
>  	case DRM_FORMAT_MOD_LINEAR:
>  	case I915_FORMAT_MOD_X_TILED:
>  	case I915_FORMAT_MOD_Y_TILED:
> +	case I915_FORMAT_MOD_4_TILED:
>  		break;
>  	default:
>  		drm_dbg(&dev_priv->drm,
> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> index 7444b88829ea..0eb4509f7f7a 100644
> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> @@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
>  	DRM_FORMAT_MOD_INVALID
>  };
>  
> +static const u64 dg2_plane_format_modifiers[] = {
> +	I915_FORMAT_MOD_X_TILED,
> +	I915_FORMAT_MOD_4_TILED,
> +	DRM_FORMAT_MOD_LINEAR,
> +	DRM_FORMAT_MOD_INVALID
> +};
> +
>  int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
>  {
>  	switch (format) {
> @@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
>  		return PLANE_CTL_TILED_X;
>  	case I915_FORMAT_MOD_Y_TILED:
>  		return PLANE_CTL_TILED_Y;
> +	case I915_FORMAT_MOD_4_TILED:
> +		return PLANE_CTL_TILED_F;
>  	case I915_FORMAT_MOD_Y_TILED_CCS:
>  	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
>  		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> @@ -1283,6 +1292,7 @@ static int skl_plane_check_fb(const struct intel_crtc_state *crtc_state,
>  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
>  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
> +	     fb->modifier == I915_FORMAT_MOD_4_TILED ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
>  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC)) {
> @@ -2001,6 +2011,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
>  		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
>  			return false;
>  		break;
> +	case I915_FORMAT_MOD_4_TILED:
> +		if (!HAS_FTILE(dev_priv))
> +			return false;
> +		break;
>  	default:
>  		return false;
>  	}
> @@ -2041,9 +2055,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
>  	case DRM_FORMAT_Y216:
>  	case DRM_FORMAT_XVYU12_16161616:
>  	case DRM_FORMAT_XVYU16161616:
> -		if (modifier == DRM_FORMAT_MOD_LINEAR ||
> -		    modifier == I915_FORMAT_MOD_X_TILED ||
> -		    modifier == I915_FORMAT_MOD_Y_TILED)
> +		if (!is_ccs_modifier(modifier))
>  			return true;
>  		fallthrough;
>  	default:
> @@ -2054,8 +2066,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
>  static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
>  					    enum plane_id plane_id)
>  {
> +	if (HAS_FTILE(dev_priv))
> +		return dg2_plane_format_modifiers;
>  	/* Wa_22011186057 */
> -	if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> +	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
>  		return adlp_step_a_plane_format_modifiers;
>  	else if (gen12_plane_supports_mc_ccs(dev_priv, plane_id))
>  		return gen12_plane_format_modifiers_mc_ccs;
> @@ -2325,11 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
>  		else
>  			fb->modifier = I915_FORMAT_MOD_Y_TILED;
>  		break;
> -	case PLANE_CTL_TILED_YF:
> -		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> -			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> -		else
> -			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> +	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
> +		if (DISPLAY_VER(dev_priv) >= 13) {
> +			fb->modifier = I915_FORMAT_MOD_4_TILED;
> +		} else {
> +			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> +				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> +			else
> +				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> +		}
>  		break;
>  	default:
>  		MISSING_CASE(tiling);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 57948e0ee48b..fdd8ddb0cdf6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1628,6 +1628,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>  #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
>  
>  #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
> +#define HAS_FTILE(dev_priv)    (INTEL_INFO(dev_priv)->has_ftile)
>  #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
>  #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
>  #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 68367b505dc4..0d7d88ee43ca 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -972,6 +972,7 @@ static const struct intel_device_info adl_p_info = {
>  	.display.has_cdclk_crawl = 1,
>  	.display.has_modular_fia = 1,
>  	.display.has_psr_hw_tracking = 0,
> +	.has_ftile = 1, \
>  	.platform_engine_mask =
>  		BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
>  	.ppgtt_size = 48,
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 3693eb03f5aa..0fe8e8e6cc31 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7195,6 +7195,7 @@ enum {
>  #define   PLANE_CTL_TILED_X			(1 << 10)
>  #define   PLANE_CTL_TILED_Y			(4 << 10)
>  #define   PLANE_CTL_TILED_YF			(5 << 10)
> +#define   PLANE_CTL_TILED_F			(5 << 10)
>  #define   PLANE_CTL_ASYNC_FLIP			(1 << 9)
>  #define   PLANE_CTL_FLIP_HORIZONTAL		(1 << 8)
>  #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	(1 << 4) /* TGL+ */
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index 87ee1d86d2ac..f95d4a939e10 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -126,6 +126,7 @@ enum intel_ppgtt_type {
>  	func(has_64k_pages); \
>  	func(gpu_reset_clobbers_display); \
>  	func(has_reset_engine); \
> +	func(has_ftile); \
>  	func(has_flat_ccs); \
>  	func(has_global_mocs); \
>  	func(has_gt_uc); \
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 201477ca408a..0db5f32adfd5 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5377,6 +5377,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
>  	}
>  
>  	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
> +		      modifier == I915_FORMAT_MOD_4_TILED ||
>  		      modifier == I915_FORMAT_MOD_Yf_TILED ||
>  		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
>  		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 45a914850be0..982b0a9fa78b 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -558,6 +558,14 @@ extern "C" {
>   * pitch is required to be a multiple of 4 tile widths.
>   */
>  #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
> +/*
> + * Intel F-tiling(aka Tile4) layout
> + *
> + * This is a tiled layout using 4Kb tiles in row-major layout.
> + * Within the tile pixels are laid out in 64 byte units / sub-tiles in OWORD
> + * (16 bytes) chunks column-major..
> + */
> +#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
>  
>  /*
>   * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:28     ` Simon Ser
  -1 siblings, 0 replies; 57+ messages in thread
From: Simon Ser @ 2021-10-21 14:28 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Matt Roper,
	Pekka Paalanen

For the include/uapi/drm/drm_fourcc.h changes:

Acked-by: Simon Ser <contact@emersion.fr>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
@ 2021-10-21 14:28     ` Simon Ser
  0 siblings, 0 replies; 57+ messages in thread
From: Simon Ser @ 2021-10-21 14:28 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Matt Roper,
	Pekka Paalanen

For the include/uapi/drm/drm_fourcc.h changes:

Acked-by: Simon Ser <contact@emersion.fr>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
@ 2021-10-21 14:35     ` Ville Syrjälä
  -1 siblings, 0 replies; 57+ messages in thread
From: Ville Syrjälä @ 2021-10-21 14:35 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Matt Roper,
	Simon Ser, Pekka Paalanen

On Thu, Oct 21, 2021 at 07:56:24PM +0530, Ramalingam C wrote:
> From: Matt Roper <matthew.d.roper@intel.com>
> 
> DG2 unifies render compression and media compression into a single
> format for the first time.  The programming and buffer layout is
> supposed to match compression on older gen12 platforms, but the
> actual compression algorithm is different from any previous platform; as
> such, we need a new framebuffer modifier to represent buffers in this
> format, but otherwise we can re-use the existing gen12 compression driver
> logic.
> 
> DG2 clear color render compression uses Tile4 layout. Therefore, we need
> to define a new format modifier for uAPI to support clear color rendering.
> 
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Simon Ser <contact@emersion.fr>
> Cc: Pekka Paalanen <ppaalanen@gmail.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display.c  |  3 ++
>  .../drm/i915/display/intel_display_types.h    | 10 +++-
>  drivers/gpu/drm/i915/display/intel_fb.c       |  7 +++
>  .../drm/i915/display/skl_universal_plane.c    | 49 +++++++++++++++++--
>  include/uapi/drm/drm_fourcc.h                 | 30 ++++++++++++
>  5 files changed, 94 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 9b678839bf2b..2949fe9f5b9f 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -1013,6 +1013,9 @@ intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
>  					  cmd->pixel_format);
>  	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS:
>  	case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
> +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
> +	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
> +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
>  		return lookup_format_info(gen12_ccs_formats,
>  					  ARRAY_SIZE(gen12_ccs_formats),
>  					  cmd->pixel_format);

That seems not right. Flat CCS is invisible to the user so the format
info should not include a CCS plane.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
@ 2021-10-21 14:35     ` Ville Syrjälä
  0 siblings, 0 replies; 57+ messages in thread
From: Ville Syrjälä @ 2021-10-21 14:35 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Matt Roper,
	Simon Ser, Pekka Paalanen

On Thu, Oct 21, 2021 at 07:56:24PM +0530, Ramalingam C wrote:
> From: Matt Roper <matthew.d.roper@intel.com>
> 
> DG2 unifies render compression and media compression into a single
> format for the first time.  The programming and buffer layout is
> supposed to match compression on older gen12 platforms, but the
> actual compression algorithm is different from any previous platform; as
> such, we need a new framebuffer modifier to represent buffers in this
> format, but otherwise we can re-use the existing gen12 compression driver
> logic.
> 
> DG2 clear color render compression uses Tile4 layout. Therefore, we need
> to define a new format modifier for uAPI to support clear color rendering.
> 
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Simon Ser <contact@emersion.fr>
> Cc: Pekka Paalanen <ppaalanen@gmail.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display.c  |  3 ++
>  .../drm/i915/display/intel_display_types.h    | 10 +++-
>  drivers/gpu/drm/i915/display/intel_fb.c       |  7 +++
>  .../drm/i915/display/skl_universal_plane.c    | 49 +++++++++++++++++--
>  include/uapi/drm/drm_fourcc.h                 | 30 ++++++++++++
>  5 files changed, 94 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 9b678839bf2b..2949fe9f5b9f 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -1013,6 +1013,9 @@ intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
>  					  cmd->pixel_format);
>  	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS:
>  	case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
> +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
> +	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
> +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
>  		return lookup_format_info(gen12_ccs_formats,
>  					  ARRAY_SIZE(gen12_ccs_formats),
>  					  cmd->pixel_format);

That seems not right. Flat CCS is invisible to the user so the format
info should not include a CCS plane.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/dg2: Enabling 64k page size and flat ccs (rev2)
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
                   ` (17 preceding siblings ...)
  (?)
@ 2021-10-21 16:02 ` Patchwork
  -1 siblings, 0 replies; 57+ messages in thread
From: Patchwork @ 2021-10-21 16:02 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/dg2: Enabling 64k page size and flat ccs (rev2)
URL   : https://patchwork.freedesktop.org/series/95686/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
a3c94fe1d5cd drm/i915: Add has_64k_pages flag
45ceb0b40034 drm/i915/xehpsdv: set min page-size to 64K
03200bc06b46 drm/i915/xehpsdv: enforce min GTT alignment
9a84cb61c10d drm/i915: enforce min page size for scratch
b515c67a4176 drm/i915/gtt/xehpsdv: move scratch page to system memory
3ca8d7a82f95 drm/i915/xehpsdv: support 64K GTT pages
2ed3c1383b59 drm/i915: Add vm min alignment support
fd0534c7e65a drm/i915/selftests: account for min_alignment in GTT selftests
-:108: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#108: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:456:
+						if (offset < hole_start + aligned_size)

-:120: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#120: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:480:
+						if (offset + aligned_size > hole_end)

-:138: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#138: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:496:
+						if (offset < hole_start + aligned_size)

-:150: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#150: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:519:
+						if (offset + aligned_size > hole_end)

-:168: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#168: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:535:
+						if (offset < hole_start + aligned_size)

-:180: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#180: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:559:
+						if (offset + aligned_size > hole_end)

-:198: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#198: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:575:
+						if (offset < hole_start + aligned_size)

-:210: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#210: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:598:
+						if (offset + aligned_size > hole_end)

total: 0 errors, 8 warnings, 0 checks, 297 lines checked
08702a20cfb0 drm/i915/xehpsdv: implement memory coloring
dc2516e13157 drm/i915/xehpsdv: Add has_flat_ccs to device info
2287c2405ffa drm/i915/lmem: Enable lmem for platforms with Flat CCS
39e9fd04a236 drm/i915/gt: Clear compress metadata for Xe_HP platforms
-:8: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#8: 
device memory buffer object we also need to clear the associated CCS buffer.

total: 0 errors, 1 warnings, 0 checks, 171 lines checked
6d6995163aa1 drm/i915/dg2: Tile 4 plane format support
-:198: WARNING:LINE_CONTINUATIONS: Avoid unnecessary line continuations
#198: FILE: drivers/gpu/drm/i915/i915_pci.c:975:
+	.has_ftile = 1, \

total: 0 errors, 1 warnings, 0 checks, 168 lines checked
a25cd0cf423a uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
86bc18650b6c drm/i915/uapi: document behaviour for DG2 64K support
05178dab38b0 drm/i915/Flat-CCS: Document on Flat-CCS memory compression
dd2578c083ba Doc/gpu/rfc/i915: i915 DG2 uAPI
-:18: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#18: 
new file mode 100644

-:23: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#23: FILE: Documentation/gpu/rfc/i915_dg2.rst:1:
+====================

total: 0 errors, 2 warnings, 0 checks, 41 lines checked



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/dg2: Enabling 64k page size and flat ccs (rev2)
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
                   ` (18 preceding siblings ...)
  (?)
@ 2021-10-21 16:04 ` Patchwork
  -1 siblings, 0 replies; 57+ messages in thread
From: Patchwork @ 2021-10-21 16:04 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/dg2: Enabling 64k page size and flat ccs (rev2)
URL   : https://patchwork.freedesktop.org/series/95686/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:318:49: error: static assertion failed: "amd_sriov_msg_pf2vf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1416:25: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1416:25:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1416:25:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1417:17: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1417:17:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1417:17:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1476:17: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1476:17:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1476:17:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:355:16: error: incompatible types in comparison expression (different type sizes):
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:355:16:    unsigned long *
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:355:16:    unsigned long long *
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4481:31: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4481:31:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4481:31:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4483:33: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4483:33:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4483:33:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:294:25: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:294:25:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:294:25:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:295:17: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:295:17:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:295:17:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:344:17: error: incompatible types in comparison expression (different address spaces):
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:344:17:    struct dma_fence *
+drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:344:17:    struct dma_fence [noderef] __rcu *
+drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c:117:1: warning: no newline at end of file
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
+drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion faile



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/dg2: Enabling 64k page size and flat ccs (rev2)
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
                   ` (19 preceding siblings ...)
  (?)
@ 2021-10-21 16:33 ` Patchwork
  -1 siblings, 0 replies; 57+ messages in thread
From: Patchwork @ 2021-10-21 16:33 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 5092 bytes --]

== Series Details ==

Series: drm/i915/dg2: Enabling 64k page size and flat ccs (rev2)
URL   : https://patchwork.freedesktop.org/series/95686/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10770 -> Patchwork_21407
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21407 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21407, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21407:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gem:
    - fi-pnv-d510:        [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10770/fi-pnv-d510/igt@i915_selftest@live@gem.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/fi-pnv-d510/igt@i915_selftest@live@gem.html

  
Known issues
------------

  Here are the changes found in Patchwork_21407 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_prime@i915-to-amd:
    - fi-snb-2520m:       NOTRUN -> [SKIP][3] ([fdo#109271]) +37 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/fi-snb-2520m/igt@amdgpu/amd_prime@i915-to-amd.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-snb-2520m:       NOTRUN -> [SKIP][4] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/fi-snb-2520m/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@runner@aborted:
    - fi-pnv-d510:        NOTRUN -> [FAIL][5] ([fdo#109271] / [i915#2403] / [i915#4312])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/fi-pnv-d510/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - {fi-hsw-gt1}:       [DMESG-WARN][6] ([i915#3303]) -> [PASS][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10770/fi-hsw-gt1/igt@i915_selftest@live@hangcheck.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/fi-hsw-gt1/igt@i915_selftest@live@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312


Participating hosts (38 -> 8)
------------------------------

  ERROR: It appears as if the changes made in Patchwork_21407 prevented too many machines from booting.

  Additional (1): fi-snb-2520m 
  Missing    (31): fi-kbl-soraka fi-rkl-11600 fi-rkl-guc fi-icl-u2 fi-apl-guc fi-icl-y fi-skl-6600u fi-cml-u2 fi-bxt-dsi fi-bdw-5557u fi-bsw-n3050 fi-glk-dsi fi-ilk-650 fi-kbl-7500u fi-ctg-p8600 fi-bsw-nick fi-skl-6700k2 fi-kbl-7567u fi-tgl-dsi fi-skl-guc fi-cfl-8700k fi-ehl-2 fi-jsl-1 fi-hsw-4200u fi-tgl-1115g4 fi-bsw-cyan fi-cfl-guc fi-kbl-guc fi-cfl-8109u fi-kbl-8809g fi-bsw-kefka 


Build changes
-------------

  * Linux: CI_DRM_10770 -> Patchwork_21407

  CI-20190529: 20190529
  CI_DRM_10770: 214e8b46143416c4a130cbaeea8430ad9fa19f63 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6258: 4c80c71d7dec29b6376846ae96bd04dc0b6e34d9 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21407: dd2578c083baf57fb6fbd219b904c1b2d8c992aa @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

dd2578c083ba Doc/gpu/rfc/i915: i915 DG2 uAPI
05178dab38b0 drm/i915/Flat-CCS: Document on Flat-CCS memory compression
86bc18650b6c drm/i915/uapi: document behaviour for DG2 64K support
a25cd0cf423a uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
6d6995163aa1 drm/i915/dg2: Tile 4 plane format support
39e9fd04a236 drm/i915/gt: Clear compress metadata for Xe_HP platforms
2287c2405ffa drm/i915/lmem: Enable lmem for platforms with Flat CCS
dc2516e13157 drm/i915/xehpsdv: Add has_flat_ccs to device info
08702a20cfb0 drm/i915/xehpsdv: implement memory coloring
fd0534c7e65a drm/i915/selftests: account for min_alignment in GTT selftests
2ed3c1383b59 drm/i915: Add vm min alignment support
3ca8d7a82f95 drm/i915/xehpsdv: support 64K GTT pages
b515c67a4176 drm/i915/gtt/xehpsdv: move scratch page to system memory
9a84cb61c10d drm/i915: enforce min page size for scratch
03200bc06b46 drm/i915/xehpsdv: enforce min GTT alignment
45ceb0b40034 drm/i915/xehpsdv: set min page-size to 64K
a3c94fe1d5cd drm/i915: Add has_64k_pages flag

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21407/index.html

[-- Attachment #2: Type: text/html, Size: 6054 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 02/17] drm/i915/xehpsdv: set min page-size to 64K
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
@ 2021-10-22  6:47     ` Lucas De Marchi
  -1 siblings, 0 replies; 57+ messages in thread
From: Lucas De Marchi @ 2021-10-22  6:47 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	rodrigo.vivi, Hellstrom Thomas, Joonas Lahtinen

On Thu, Oct 21, 2021 at 07:56:12PM +0530, Ramalingam C wrote:
>From: Matthew Auld <matthew.auld@intel.com>
>
>LMEM should be allocated at 64K granularity, since 4K page support will
>eventually be dropped for LMEM when using the PPGTT.

this is not for xehpsdv only. So this should probably drop the prefix
and be:

For discrete cards that support 64k pages we should be using it since 4K
page support ...



Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Lucas De Marhci

>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_stolen.c  | 6 +++++-
> drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 ++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>index ddd37ccb1362..f52a06f05fc7 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>@@ -778,6 +778,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
> 	struct intel_uncore *uncore = &i915->uncore;
> 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> 	struct intel_memory_region *mem;
>+	resource_size_t min_page_size;
> 	resource_size_t io_start;
> 	resource_size_t lmem_size;
> 	u64 lmem_base;
>@@ -789,8 +790,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
> 	lmem_size = pci_resource_len(pdev, 2) - lmem_base;
> 	io_start = pci_resource_start(pdev, 2) + lmem_base;
>
>+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
>+						I915_GTT_PAGE_SIZE_4K;
>+
> 	mem = intel_memory_region_create(i915, lmem_base, lmem_size,
>-					 I915_GTT_PAGE_SIZE_4K, io_start,
>+					 min_page_size, io_start,
> 					 type, instance,
> 					 &i915_region_stolen_lmem_ops);
> 	if (IS_ERR(mem))
>diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>index afb35d2e5c73..073d28d96669 100644
>--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>@@ -193,6 +193,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
> 	struct intel_uncore *uncore = gt->uncore;
> 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> 	struct intel_memory_region *mem;
>+	resource_size_t min_page_size;
> 	resource_size_t io_start;
> 	resource_size_t lmem_size;
> 	int err;
>@@ -207,10 +208,12 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
> 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
> 		return ERR_PTR(-ENODEV);
>
>+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
>+						I915_GTT_PAGE_SIZE_4K;
> 	mem = intel_memory_region_create(i915,
> 					 0,
> 					 lmem_size,
>-					 I915_GTT_PAGE_SIZE_4K,
>+					 min_page_size,
> 					 io_start,
> 					 INTEL_MEMORY_LOCAL,
> 					 0,
>-- 
>2.20.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 02/17] drm/i915/xehpsdv: set min page-size to 64K
@ 2021-10-22  6:47     ` Lucas De Marchi
  0 siblings, 0 replies; 57+ messages in thread
From: Lucas De Marchi @ 2021-10-22  6:47 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	rodrigo.vivi, Hellstrom Thomas, Joonas Lahtinen

On Thu, Oct 21, 2021 at 07:56:12PM +0530, Ramalingam C wrote:
>From: Matthew Auld <matthew.auld@intel.com>
>
>LMEM should be allocated at 64K granularity, since 4K page support will
>eventually be dropped for LMEM when using the PPGTT.

this is not for xehpsdv only. So this should probably drop the prefix
and be:

For discrete cards that support 64k pages we should be using it since 4K
page support ...



Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Lucas De Marhci

>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>---
> drivers/gpu/drm/i915/gem/i915_gem_stolen.c  | 6 +++++-
> drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 ++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>index ddd37ccb1362..f52a06f05fc7 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
>@@ -778,6 +778,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
> 	struct intel_uncore *uncore = &i915->uncore;
> 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> 	struct intel_memory_region *mem;
>+	resource_size_t min_page_size;
> 	resource_size_t io_start;
> 	resource_size_t lmem_size;
> 	u64 lmem_base;
>@@ -789,8 +790,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
> 	lmem_size = pci_resource_len(pdev, 2) - lmem_base;
> 	io_start = pci_resource_start(pdev, 2) + lmem_base;
>
>+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
>+						I915_GTT_PAGE_SIZE_4K;
>+
> 	mem = intel_memory_region_create(i915, lmem_base, lmem_size,
>-					 I915_GTT_PAGE_SIZE_4K, io_start,
>+					 min_page_size, io_start,
> 					 type, instance,
> 					 &i915_region_stolen_lmem_ops);
> 	if (IS_ERR(mem))
>diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>index afb35d2e5c73..073d28d96669 100644
>--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
>@@ -193,6 +193,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
> 	struct intel_uncore *uncore = gt->uncore;
> 	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> 	struct intel_memory_region *mem;
>+	resource_size_t min_page_size;
> 	resource_size_t io_start;
> 	resource_size_t lmem_size;
> 	int err;
>@@ -207,10 +208,12 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
> 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
> 		return ERR_PTR(-ENODEV);
>
>+	min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
>+						I915_GTT_PAGE_SIZE_4K;
> 	mem = intel_memory_region_create(i915,
> 					 0,
> 					 lmem_size,
>-					 I915_GTT_PAGE_SIZE_4K,
>+					 min_page_size,
> 					 io_start,
> 					 INTEL_MEMORY_LOCAL,
> 					 0,
>-- 
>2.20.1
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 01/17] drm/i915: Add has_64k_pages flag
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
  (?)
@ 2021-10-22  6:47   ` Lucas De Marchi
  -1 siblings, 0 replies; 57+ messages in thread
From: Lucas De Marchi @ 2021-10-22  6:47 UTC (permalink / raw)
  To: Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	rodrigo.vivi, Hellstrom Thomas, Stuart Summers

On Thu, Oct 21, 2021 at 07:56:11PM +0530, Ramalingam C wrote:
>From: Stuart Summers <stuart.summers@intel.com>
>
>Add a new platform flag, has_64k_pages, for platforms supporting
>base page sizes of 64k.
>
>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>Signed-off-by: Ramalingam C <ramalingam.c@intel.com>


Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Lucas De Marchi

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 07/17] drm/i915: Add vm min alignment support
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
@ 2021-10-22 16:56     ` Matthew Auld
  -1 siblings, 0 replies; 57+ messages in thread
From: Matthew Auld @ 2021-10-22 16:56 UTC (permalink / raw)
  To: Ramalingam C, dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, lucas.demarchi, rodrigo.vivi,
	Hellstrom Thomas, Bommu Krishnaiah, Wilson Chris P

On 21/10/2021 15:26, Ramalingam C wrote:
> From: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> 
> Replace the hard coded 4K alignment value with vm->min_alignment.
> 
> Cc: Wilson Chris P <chris.p.wilson@intel.com>
> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Although likely want to squash patch patches 3, 7 and 8, as suggested by 
Chris.

> ---
>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 ++++++++++++-------
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |  9 ++++++++
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++++++++
>   3 files changed, 33 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index 8402ed925a69..6b9b861e43e5 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -39,6 +39,7 @@ struct tiled_blits {
>   	struct blit_buffer scratch;
>   	struct i915_vma *batch;
>   	u64 hole;
> +	u64 align;
>   	u32 width;
>   	u32 height;
>   };
> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>   		goto err_free;
>   	}
>   
> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
> +
> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>   	hole_size *= 2; /* room to maneuver */
> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> +	hole_size += 2 * t->align; /* padding on either side */
>   
>   	mutex_lock(&t->ce->vm->mutex);
>   	memset(&hole, 0, sizeof(hole));
>   	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
> +					  hole_size, t->align,
> +					  I915_COLOR_UNEVICTABLE,
>   					  0, U64_MAX,
>   					  DRM_MM_INSERT_BEST);
>   	if (!err)
> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>   		goto err_put;
>   	}
>   
> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> +	t->hole = hole.start + t->align;
>   	pr_info("Using hole at %llx\n", t->hole);
>   
>   	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>   static int tiled_blits_prepare(struct tiled_blits *t,
>   			       struct rnd_state *prng)
>   {
> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>   	u32 *map;
>   	int err;
>   	int i;
> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>   
>   static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>   {
> -	u64 offset =
> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>   	int err;
>   
>   	/* We want to check position invariant tiling across GTT eviction */
> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>   
>   	/* Reposition so that we overlap the old addresses, and slightly off */
>   	err = tiled_blit(t,
> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> +			 &t->buffers[2], t->hole + t->align,
>   			 &t->buffers[1], t->hole + 3 * offset / 2);
>   	if (err)
>   		return err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 56fbd37a6b54..4743921b7638 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -216,6 +216,15 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   
>   	GEM_BUG_ON(!vm->total);
>   	drm_mm_init(&vm->mm, 0, vm->total);
> +
> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> +		 ARRAY_SIZE(vm->min_alignment));
> +
> +	if (HAS_64K_PAGES(vm->i915)) {
> +		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +	}
> +
>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>   
>   	INIT_LIST_HEAD(&vm->bound_list);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 6d0233ffae17..20101eef4c95 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -28,6 +28,8 @@
>   #include "gt/intel_reset.h"
>   #include "i915_selftest.h"
>   #include "i915_vma_types.h"
> +#include "i915_params.h"
> +#include "intel_memory_region.h"
>   
>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>   
> @@ -224,6 +226,7 @@ struct i915_address_space {
>   	struct device *dma;
>   	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>   	u64 reserved;		/* size addr space reserved */
> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>   
>   	unsigned int bind_async_flags;
>   
> @@ -382,6 +385,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>   	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>   }
>   
> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
> +					enum intel_memory_type type)
> +{
> +	return vm->min_alignment[type];
> +}
> +
>   static inline bool
>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>   {
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 07/17] drm/i915: Add vm min alignment support
@ 2021-10-22 16:56     ` Matthew Auld
  0 siblings, 0 replies; 57+ messages in thread
From: Matthew Auld @ 2021-10-22 16:56 UTC (permalink / raw)
  To: Ramalingam C, dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, lucas.demarchi, rodrigo.vivi,
	Hellstrom Thomas, Bommu Krishnaiah, Wilson Chris P

On 21/10/2021 15:26, Ramalingam C wrote:
> From: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> 
> Replace the hard coded 4K alignment value with vm->min_alignment.
> 
> Cc: Wilson Chris P <chris.p.wilson@intel.com>
> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Although likely want to squash patch patches 3, 7 and 8, as suggested by 
Chris.

> ---
>   .../i915/gem/selftests/i915_gem_client_blt.c  | 23 ++++++++++++-------
>   drivers/gpu/drm/i915/gt/intel_gtt.c           |  9 ++++++++
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |  9 ++++++++
>   3 files changed, 33 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> index 8402ed925a69..6b9b861e43e5 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
> @@ -39,6 +39,7 @@ struct tiled_blits {
>   	struct blit_buffer scratch;
>   	struct i915_vma *batch;
>   	u64 hole;
> +	u64 align;
>   	u32 width;
>   	u32 height;
>   };
> @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>   		goto err_free;
>   	}
>   
> -	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
> +	t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
> +	t->align = max(t->align,
> +		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
> +
> +	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
>   	hole_size *= 2; /* room to maneuver */
> -	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
> +	hole_size += 2 * t->align; /* padding on either side */
>   
>   	mutex_lock(&t->ce->vm->mutex);
>   	memset(&hole, 0, sizeof(hole));
>   	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
> -					  hole_size, 0, I915_COLOR_UNEVICTABLE,
> +					  hole_size, t->align,
> +					  I915_COLOR_UNEVICTABLE,
>   					  0, U64_MAX,
>   					  DRM_MM_INSERT_BEST);
>   	if (!err)
> @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
>   		goto err_put;
>   	}
>   
> -	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
> +	t->hole = hole.start + t->align;
>   	pr_info("Using hole at %llx\n", t->hole);
>   
>   	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
> @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
>   static int tiled_blits_prepare(struct tiled_blits *t,
>   			       struct rnd_state *prng)
>   {
> -	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
> +	u64 offset = round_up(t->width * t->height * 4, t->align);
>   	u32 *map;
>   	int err;
>   	int i;
> @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
>   
>   static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>   {
> -	u64 offset =
> -		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
> +	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
>   	int err;
>   
>   	/* We want to check position invariant tiling across GTT eviction */
> @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
>   
>   	/* Reposition so that we overlap the old addresses, and slightly off */
>   	err = tiled_blit(t,
> -			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
> +			 &t->buffers[2], t->hole + t->align,
>   			 &t->buffers[1], t->hole + 3 * offset / 2);
>   	if (err)
>   		return err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
> index 56fbd37a6b54..4743921b7638 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> @@ -216,6 +216,15 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   
>   	GEM_BUG_ON(!vm->total);
>   	drm_mm_init(&vm->mm, 0, vm->total);
> +
> +	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
> +		 ARRAY_SIZE(vm->min_alignment));
> +
> +	if (HAS_64K_PAGES(vm->i915)) {
> +		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
> +	}
> +
>   	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
>   
>   	INIT_LIST_HEAD(&vm->bound_list);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 6d0233ffae17..20101eef4c95 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -28,6 +28,8 @@
>   #include "gt/intel_reset.h"
>   #include "i915_selftest.h"
>   #include "i915_vma_types.h"
> +#include "i915_params.h"
> +#include "intel_memory_region.h"
>   
>   #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>   
> @@ -224,6 +226,7 @@ struct i915_address_space {
>   	struct device *dma;
>   	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>   	u64 reserved;		/* size addr space reserved */
> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>   
>   	unsigned int bind_async_flags;
>   
> @@ -382,6 +385,12 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>   	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>   }
>   
> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
> +					enum intel_memory_type type)
> +{
> +	return vm->min_alignment[type];
> +}
> +
>   static inline bool
>   i915_vm_has_cache_coloring(struct i915_address_space *vm)
>   {
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 06/17] drm/i915/xehpsdv: support 64K GTT pages
  2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
@ 2021-10-22 17:04     ` Matthew Auld
  -1 siblings, 0 replies; 57+ messages in thread
From: Matthew Auld @ 2021-10-22 17:04 UTC (permalink / raw)
  To: Ramalingam C, dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, lucas.demarchi, rodrigo.vivi,
	Hellstrom Thomas, Joonas Lahtinen

On 21/10/2021 15:26, Ramalingam C wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> XEHPSDV optimises 64K GTT pages for local-memory, since everything
> should be allocated at 64K granularity. We say goodbye to sparse
> entries, and instead get a compact 256B page-table for 64K pages,
> which should be more cache friendly. 4K pages for local-memory
> are no longer supported by the HW.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  61 ++++++++++
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 106 +++++++++++++++++-
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
>   4 files changed, 168 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 41d0680f3bd7..9c2ffa4090f1 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -1451,6 +1451,66 @@ static int igt_ppgtt_sanity_check(void *arg)
>   	return err;
>   }
>   
> +static int igt_ppgtt_compact(void *arg)
> +{
> +	struct i915_gem_context *ctx = arg;
> +	struct drm_i915_private *i915 = ctx->i915;
> +	struct drm_i915_gem_object *obj;
> +	int err;
> +
> +	/*
> +	 * Simple test to catch issues with compact 64K pages -- since the pt is
> +	 * compacted to 256B that gives us 32 entries per pt, however since the
> +	 * backing page for the pt is 4K, any extra entries we might incorrectly
> +	 * write out should be ignored by the HW. If ever hit such a case this
> +	 * test should catch it since some of our writes would land in scratch.
> +	 */
> +
> +	if (!HAS_64K_PAGES(i915)) {
> +		pr_info("device lacks compact 64K page support, skipping\n");
> +		return 0;
> +	}
> +
> +	if (!HAS_LMEM(i915)) {
> +		pr_info("device lacks LMEM support, skipping\n");
> +		return 0;
> +	}
> +
> +	/* We want the range to cover multiple page-table boundaries. */
> +	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
> +	if (IS_ERR(obj))
> +		return err;
> +
> +	err = i915_gem_object_pin_pages_unlocked(obj);
> +	if (err)
> +		goto out_put;
> +
> +	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
> +		pr_info("LMEM compact unable to allocate huge-page(s)\n");
> +		goto out_unpin;
> +	}
> +
> +	/*
> +	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
> +	 * insertion.
> +	 */
> +	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
> +
> +	err = igt_write_huge(ctx, obj);
> +	if (err)
> +		pr_err("LMEM compact write-huge failed\n");
> +
> +out_unpin:
> +	i915_gem_object_unpin_pages(obj);
> +out_put:
> +	i915_gem_object_put(obj);
> +
> +	if (err == -ENOMEM)
> +		err = 0;
> +
> +	return err;
> +}

And yeah parroting what Chris said: we probably need something more 
rigorous here for exercising the memory coloring. Although maybe wait 
for userpspace poeple to comment on the 64K uapi interactions first; 
maybe it turns we don't need to support coloring for userspace objects, 
or something...

> +
>   static int igt_tmpfs_fallback(void *arg)
>   {
>   	struct i915_gem_context *ctx = arg;
> @@ -1681,6 +1741,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(igt_tmpfs_fallback),
>   		SUBTEST(igt_ppgtt_smoke_huge),
>   		SUBTEST(igt_ppgtt_sanity_check),
> +		SUBTEST(igt_ppgtt_compact),
>   	};
>   	struct i915_gem_context *ctx;
>   	struct i915_address_space *vm;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 6bff6bf1a450..fec0f20f1b93 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   						   start, end, lvl);
>   		} else {
>   			unsigned int count;
> +			unsigned int pte = gen8_pd_index(start, 0);
> +			unsigned int num_ptes;
>   			u64 *vaddr;
>   
>   			count = gen8_pt_count(start, end);
> @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   			    atomic_read(&pt->used));
>   			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
>   
> +			num_ptes = count;
> +			if (pt->is_compact) {
> +				GEM_BUG_ON(num_ptes % 16);
> +				GEM_BUG_ON(pte % 16);
> +				num_ptes /= 16;
> +				pte /= 16;
> +			}
> +
>   			vaddr = px_vaddr(pt);
> -			memset64(vaddr + gen8_pd_index(start, 0),
> +			memset64(vaddr + pte,
>   				 vm->scratch[0]->encode,
> -				 count);
> +				 num_ptes);
>   
>   			atomic_sub(count, &pt->used);
>   			start += count;
> @@ -454,6 +464,93 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   	return idx;
>   }
>   
> +static void
> +xehpsdv_ppgtt_insert_huge(struct i915_vma *vma,
> +			  struct sgt_dma *iter,
> +			  enum i915_cache_level cache_level,
> +			  u32 flags)
> +{
> +	const gen8_pte_t pte_encode = vma->vm->pte_encode(0, cache_level, flags);
> +	unsigned int rem = sg_dma_len(iter->sg);
> +	u64 start = vma->node.start;
> +
> +	GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm));
> +
> +	do {
> +		struct i915_page_directory * const pdp =
> +			gen8_pdp_for_page_address(vma->vm, start);
> +		struct i915_page_directory * const pd =
> +			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
> +		struct i915_page_table *pt =
> +			i915_pt_entry(pd, __gen8_pte_index(start, 1));
> +		gen8_pte_t encode = pte_encode;
> +		unsigned int page_size;
> +		gen8_pte_t *vaddr;
> +		u16 index, max;
> +
> +		max = I915_PDES;
> +
> +		if (vma->page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
> +		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
> +		    rem >= I915_GTT_PAGE_SIZE_2M &&
> +		    !__gen8_pte_index(start, 0)) {
> +			index = __gen8_pte_index(start, 1);
> +			encode |= GEN8_PDE_PS_2M;
> +			page_size = I915_GTT_PAGE_SIZE_2M;
> +
> +			vaddr = px_vaddr(pd);
> +		} else {
> +			if (encode & GEN12_PPGTT_PTE_LM) {
> +				GEM_BUG_ON(!i915_gem_object_is_lmem(vma->obj));
> +				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
> +				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
> +				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
> +						       I915_GTT_PAGE_SIZE_64K));
> +
> +				index = __gen8_pte_index(start, 0) / 16;
> +				page_size = I915_GTT_PAGE_SIZE_64K;
> +
> +				max /= 16;
> +
> +				vaddr = px_vaddr(pd);
> +				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
> +
> +				pt->is_compact = true;
> +			} else {
> +				GEM_BUG_ON(i915_gem_object_is_lmem(vma->obj));
> +				GEM_BUG_ON(pt->is_compact);
> +				index =  __gen8_pte_index(start, 0);
> +				page_size = I915_GTT_PAGE_SIZE;
> +			}
> +
> +			vaddr = px_vaddr(pt);
> +		}
> +
> +		do {
> +			GEM_BUG_ON(rem < page_size);
> +			vaddr[index++] = encode | iter->dma;
> +
> +			start += page_size;
> +			iter->dma += page_size;
> +			rem -= page_size;
> +			if (iter->dma >= iter->max) {
> +				iter->sg = __sg_next(iter->sg);
> +				GEM_BUG_ON(!iter->sg);
> +
> +				rem = sg_dma_len(iter->sg);
> +				GEM_BUG_ON(!rem);
> +				iter->dma = sg_dma_address(iter->sg);
> +				iter->max = iter->dma + rem;
> +
> +				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
> +					break;
> +			}
> +		} while (rem >= page_size && index < max);
> +
> +		vma->page_sizes.gtt |= page_size;
> +	} while (iter->sg && sg_dma_len(iter->sg));
> +}
> +
>   static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   				   struct sgt_dma *iter,
>   				   enum i915_cache_level cache_level,
> @@ -586,7 +683,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   	struct sgt_dma iter = sgt_dma(vma);
>   
>   	if (vma->page_sizes.sg > I915_GTT_PAGE_SIZE) {
> -		gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
> +		if (HAS_64K_PAGES(vm->i915))
> +			xehpsdv_ppgtt_insert_huge(vma, &iter, cache_level, flags);
> +		else
> +			gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
>   	} else  {
>   		u64 idx = vma->node.start >> GEN8_PTE_SHIFT;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 6d13f4ab4d4a..6d0233ffae17 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -89,6 +89,8 @@ typedef u64 gen8_pte_t;
>   
>   #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
>   
> +#define GEN12_PDE_64K BIT(6)
> +
>   /*
>    * Cacheability Control is a 4-bit value. The low three bits are stored in bits
>    * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> @@ -154,6 +156,7 @@ struct i915_page_table {
>   		atomic_t used;
>   		struct i915_page_table *stash;
>   	};
> +	bool is_compact;
>   };
>   
>   struct i915_page_directory {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 4396bfd630d8..b8238f5bc8b1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> +	pt->is_compact = false;
>   	atomic_set(&pt->used, 0);
>   	return pt;
>   }
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/17] drm/i915/xehpsdv: support 64K GTT pages
@ 2021-10-22 17:04     ` Matthew Auld
  0 siblings, 0 replies; 57+ messages in thread
From: Matthew Auld @ 2021-10-22 17:04 UTC (permalink / raw)
  To: Ramalingam C, dri-devel, intel-gfx
  Cc: Daniel Vetter, CQ Tang, lucas.demarchi, rodrigo.vivi,
	Hellstrom Thomas, Joonas Lahtinen

On 21/10/2021 15:26, Ramalingam C wrote:
> From: Matthew Auld <matthew.auld@intel.com>
> 
> XEHPSDV optimises 64K GTT pages for local-memory, since everything
> should be allocated at 64K granularity. We say goodbye to sparse
> entries, and instead get a compact 256B page-table for 64K pages,
> which should be more cache friendly. 4K pages for local-memory
> are no longer supported by the HW.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  61 ++++++++++
>   drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 106 +++++++++++++++++-
>   drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
>   4 files changed, 168 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> index 41d0680f3bd7..9c2ffa4090f1 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
> @@ -1451,6 +1451,66 @@ static int igt_ppgtt_sanity_check(void *arg)
>   	return err;
>   }
>   
> +static int igt_ppgtt_compact(void *arg)
> +{
> +	struct i915_gem_context *ctx = arg;
> +	struct drm_i915_private *i915 = ctx->i915;
> +	struct drm_i915_gem_object *obj;
> +	int err;
> +
> +	/*
> +	 * Simple test to catch issues with compact 64K pages -- since the pt is
> +	 * compacted to 256B that gives us 32 entries per pt, however since the
> +	 * backing page for the pt is 4K, any extra entries we might incorrectly
> +	 * write out should be ignored by the HW. If ever hit such a case this
> +	 * test should catch it since some of our writes would land in scratch.
> +	 */
> +
> +	if (!HAS_64K_PAGES(i915)) {
> +		pr_info("device lacks compact 64K page support, skipping\n");
> +		return 0;
> +	}
> +
> +	if (!HAS_LMEM(i915)) {
> +		pr_info("device lacks LMEM support, skipping\n");
> +		return 0;
> +	}
> +
> +	/* We want the range to cover multiple page-table boundaries. */
> +	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
> +	if (IS_ERR(obj))
> +		return err;
> +
> +	err = i915_gem_object_pin_pages_unlocked(obj);
> +	if (err)
> +		goto out_put;
> +
> +	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
> +		pr_info("LMEM compact unable to allocate huge-page(s)\n");
> +		goto out_unpin;
> +	}
> +
> +	/*
> +	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
> +	 * insertion.
> +	 */
> +	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
> +
> +	err = igt_write_huge(ctx, obj);
> +	if (err)
> +		pr_err("LMEM compact write-huge failed\n");
> +
> +out_unpin:
> +	i915_gem_object_unpin_pages(obj);
> +out_put:
> +	i915_gem_object_put(obj);
> +
> +	if (err == -ENOMEM)
> +		err = 0;
> +
> +	return err;
> +}

And yeah parroting what Chris said: we probably need something more 
rigorous here for exercising the memory coloring. Although maybe wait 
for userpspace poeple to comment on the 64K uapi interactions first; 
maybe it turns we don't need to support coloring for userspace objects, 
or something...

> +
>   static int igt_tmpfs_fallback(void *arg)
>   {
>   	struct i915_gem_context *ctx = arg;
> @@ -1681,6 +1741,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(igt_tmpfs_fallback),
>   		SUBTEST(igt_ppgtt_smoke_huge),
>   		SUBTEST(igt_ppgtt_sanity_check),
> +		SUBTEST(igt_ppgtt_compact),
>   	};
>   	struct i915_gem_context *ctx;
>   	struct i915_address_space *vm;
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 6bff6bf1a450..fec0f20f1b93 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   						   start, end, lvl);
>   		} else {
>   			unsigned int count;
> +			unsigned int pte = gen8_pd_index(start, 0);
> +			unsigned int num_ptes;
>   			u64 *vaddr;
>   
>   			count = gen8_pt_count(start, end);
> @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
>   			    atomic_read(&pt->used));
>   			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
>   
> +			num_ptes = count;
> +			if (pt->is_compact) {
> +				GEM_BUG_ON(num_ptes % 16);
> +				GEM_BUG_ON(pte % 16);
> +				num_ptes /= 16;
> +				pte /= 16;
> +			}
> +
>   			vaddr = px_vaddr(pt);
> -			memset64(vaddr + gen8_pd_index(start, 0),
> +			memset64(vaddr + pte,
>   				 vm->scratch[0]->encode,
> -				 count);
> +				 num_ptes);
>   
>   			atomic_sub(count, &pt->used);
>   			start += count;
> @@ -454,6 +464,93 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
>   	return idx;
>   }
>   
> +static void
> +xehpsdv_ppgtt_insert_huge(struct i915_vma *vma,
> +			  struct sgt_dma *iter,
> +			  enum i915_cache_level cache_level,
> +			  u32 flags)
> +{
> +	const gen8_pte_t pte_encode = vma->vm->pte_encode(0, cache_level, flags);
> +	unsigned int rem = sg_dma_len(iter->sg);
> +	u64 start = vma->node.start;
> +
> +	GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm));
> +
> +	do {
> +		struct i915_page_directory * const pdp =
> +			gen8_pdp_for_page_address(vma->vm, start);
> +		struct i915_page_directory * const pd =
> +			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
> +		struct i915_page_table *pt =
> +			i915_pt_entry(pd, __gen8_pte_index(start, 1));
> +		gen8_pte_t encode = pte_encode;
> +		unsigned int page_size;
> +		gen8_pte_t *vaddr;
> +		u16 index, max;
> +
> +		max = I915_PDES;
> +
> +		if (vma->page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
> +		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
> +		    rem >= I915_GTT_PAGE_SIZE_2M &&
> +		    !__gen8_pte_index(start, 0)) {
> +			index = __gen8_pte_index(start, 1);
> +			encode |= GEN8_PDE_PS_2M;
> +			page_size = I915_GTT_PAGE_SIZE_2M;
> +
> +			vaddr = px_vaddr(pd);
> +		} else {
> +			if (encode & GEN12_PPGTT_PTE_LM) {
> +				GEM_BUG_ON(!i915_gem_object_is_lmem(vma->obj));
> +				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
> +				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
> +				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
> +						       I915_GTT_PAGE_SIZE_64K));
> +
> +				index = __gen8_pte_index(start, 0) / 16;
> +				page_size = I915_GTT_PAGE_SIZE_64K;
> +
> +				max /= 16;
> +
> +				vaddr = px_vaddr(pd);
> +				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
> +
> +				pt->is_compact = true;
> +			} else {
> +				GEM_BUG_ON(i915_gem_object_is_lmem(vma->obj));
> +				GEM_BUG_ON(pt->is_compact);
> +				index =  __gen8_pte_index(start, 0);
> +				page_size = I915_GTT_PAGE_SIZE;
> +			}
> +
> +			vaddr = px_vaddr(pt);
> +		}
> +
> +		do {
> +			GEM_BUG_ON(rem < page_size);
> +			vaddr[index++] = encode | iter->dma;
> +
> +			start += page_size;
> +			iter->dma += page_size;
> +			rem -= page_size;
> +			if (iter->dma >= iter->max) {
> +				iter->sg = __sg_next(iter->sg);
> +				GEM_BUG_ON(!iter->sg);
> +
> +				rem = sg_dma_len(iter->sg);
> +				GEM_BUG_ON(!rem);
> +				iter->dma = sg_dma_address(iter->sg);
> +				iter->max = iter->dma + rem;
> +
> +				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
> +					break;
> +			}
> +		} while (rem >= page_size && index < max);
> +
> +		vma->page_sizes.gtt |= page_size;
> +	} while (iter->sg && sg_dma_len(iter->sg));
> +}
> +
>   static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
>   				   struct sgt_dma *iter,
>   				   enum i915_cache_level cache_level,
> @@ -586,7 +683,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>   	struct sgt_dma iter = sgt_dma(vma);
>   
>   	if (vma->page_sizes.sg > I915_GTT_PAGE_SIZE) {
> -		gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
> +		if (HAS_64K_PAGES(vm->i915))
> +			xehpsdv_ppgtt_insert_huge(vma, &iter, cache_level, flags);
> +		else
> +			gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags);
>   	} else  {
>   		u64 idx = vma->node.start >> GEN8_PTE_SHIFT;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 6d13f4ab4d4a..6d0233ffae17 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -89,6 +89,8 @@ typedef u64 gen8_pte_t;
>   
>   #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
>   
> +#define GEN12_PDE_64K BIT(6)
> +
>   /*
>    * Cacheability Control is a 4-bit value. The low three bits are stored in bits
>    * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
> @@ -154,6 +156,7 @@ struct i915_page_table {
>   		atomic_t used;
>   		struct i915_page_table *stash;
>   	};
> +	bool is_compact;
>   };
>   
>   struct i915_page_directory {
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index 4396bfd630d8..b8238f5bc8b1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> +	pt->is_compact = false;
>   	atomic_set(&pt->used, 0);
>   	return pt;
>   }
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
  2021-10-21 14:35     ` [Intel-gfx] " Ville Syrjälä
  (?)
@ 2021-10-25 11:20     ` Juha-Pekka Heikkila
  2021-10-26 15:44       ` Ramalingam C
  -1 siblings, 1 reply; 57+ messages in thread
From: Juha-Pekka Heikkila @ 2021-10-25 11:20 UTC (permalink / raw)
  To: Ville Syrjälä, Ramalingam C
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Matt Roper,
	Simon Ser, Pekka Paalanen

On 21.10.2021 17.35, Ville Syrjälä wrote:
> On Thu, Oct 21, 2021 at 07:56:24PM +0530, Ramalingam C wrote:
>> From: Matt Roper <matthew.d.roper@intel.com>
>>
>> DG2 unifies render compression and media compression into a single
>> format for the first time.  The programming and buffer layout is
>> supposed to match compression on older gen12 platforms, but the
>> actual compression algorithm is different from any previous platform; as
>> such, we need a new framebuffer modifier to represent buffers in this
>> format, but otherwise we can re-use the existing gen12 compression driver
>> logic.
>>
>> DG2 clear color render compression uses Tile4 layout. Therefore, we need
>> to define a new format modifier for uAPI to support clear color rendering.
>>
>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
>> Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
>> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
>> cc: Simon Ser <contact@emersion.fr>
>> Cc: Pekka Paalanen <ppaalanen@gmail.com>
>> ---
>>   drivers/gpu/drm/i915/display/intel_display.c  |  3 ++
>>   .../drm/i915/display/intel_display_types.h    | 10 +++-
>>   drivers/gpu/drm/i915/display/intel_fb.c       |  7 +++
>>   .../drm/i915/display/skl_universal_plane.c    | 49 +++++++++++++++++--
>>   include/uapi/drm/drm_fourcc.h                 | 30 ++++++++++++
>>   5 files changed, 94 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
>> index 9b678839bf2b..2949fe9f5b9f 100644
>> --- a/drivers/gpu/drm/i915/display/intel_display.c
>> +++ b/drivers/gpu/drm/i915/display/intel_display.c
>> @@ -1013,6 +1013,9 @@ intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
>>   					  cmd->pixel_format);
>>   	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS:
>>   	case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
>> +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
>> +	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
>> +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
>>   		return lookup_format_info(gen12_ccs_formats,
>>   					  ARRAY_SIZE(gen12_ccs_formats),
>>   					  cmd->pixel_format);
> 
> That seems not right. Flat CCS is invisible to the user so the format
> info should not include a CCS plane.
> 

I had cleaned out those rc and mc ccs from here long time ago, I wonder 
where did they come back from? On my development tree they're not there. 
Also I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC is here in totally wrong 
place, it should have its own gen12_flat_ccs_cc_formats table.

/Juha-Pekka

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs
  2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
                   ` (20 preceding siblings ...)
  (?)
@ 2021-10-25 17:31 ` Robert Beckett
  -1 siblings, 0 replies; 57+ messages in thread
From: Robert Beckett @ 2021-10-25 17:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: ramalingam.c

(apologies for not quoting, I wasn't subscribed before now)


some quick thoughts:

- Can we split these patches in to two series, one for each topic. They 
don't seem specifically related.

- to simplify 64K page support, could we just set minimum allocation 
size to 64K and round up for allocation requests?
Placement then becomes much simpler, no need to align the va to 2MB, 
just fit it in wherever it fits and always use 64K PTEs in GTT

This would simplify the code a lot and would benefit performance due up 
to 16x fewer page walks.
If we did this, we would not have to consider 2MB boundaries at all, we 
could drop all the colour handling etc.

The only down side might be some waste of allocation if there are lots 
of very small buffers.
However, I think most gfx related use cases would not be badly affected 
by this (even a cursor plane is 64k, usually).

Are there any usecases that you are aware of that would be impacted 
badly by this idea? (maybe some compute workload?)


- flat ccs modifiers: there seems to be some confusion over whether 
there should be a separate modifier for this.
As it dictates a new layout it seems like it should be a new modifier.
Was there any internal discussions about this that you could elaborate 
on here?


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support
  2021-10-21 14:27     ` [Intel-gfx] " Lisovskiy, Stanislav
@ 2021-10-26 15:08       ` Ramalingam C
  -1 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-26 15:08 UTC (permalink / raw)
  To: Lisovskiy, Stanislav
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Imre Deak,
	Matt Roper, Maarten Lankhorst

On 2021-10-21 at 17:27:08 +0300, Lisovskiy, Stanislav wrote:
> On Thu, Oct 21, 2021 at 07:56:23PM +0530, Ramalingam C wrote:
> > From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > 
> > TileF(Tile4 in bspec) format is 4K tile organized into
> > 64B subtiles with same basic shape as for legacy TileY
> > which will be supported by Display13.
> > 
> > v2: - Fixed wrong case condition(Jani Nikula)
> >     - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)
> > 
> > v3: - s/I915_TILING_F/TILING_4/g
> >     - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
> >     - Removed unneeded fencing code
> > 
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display.c  |  1 +
> >  drivers/gpu/drm/i915/display/intel_fb.c       |  7 ++++
> >  drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
> >  .../drm/i915/display/intel_plane_initial.c    |  1 +
> >  .../drm/i915/display/skl_universal_plane.c    | 36 ++++++++++++++-----
> >  drivers/gpu/drm/i915/i915_drv.h               |  1 +
> >  drivers/gpu/drm/i915/i915_pci.c               |  1 +
> >  drivers/gpu/drm/i915/i915_reg.h               |  1 +
> >  drivers/gpu/drm/i915/intel_device_info.h      |  1 +
> >  drivers/gpu/drm/i915/intel_pm.c               |  1 +
> >  include/uapi/drm/drm_fourcc.h                 |  8 +++++
> >  11 files changed, 50 insertions(+), 9 deletions(-)
> 
> Was I supposed to change TILE_F/TILE_4 everywhere first,
> as per your comment?
Stan, if you think that is the right think to do, please go ahead!

Ram
> 
> Stan
> 
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > index ce5d6633029a..9b678839bf2b 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -8877,6 +8877,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state)
> >  		case I915_FORMAT_MOD_X_TILED:
> >  		case I915_FORMAT_MOD_Y_TILED:
> >  		case I915_FORMAT_MOD_Yf_TILED:
> > +		case I915_FORMAT_MOD_4_TILED:
> >  			break;
> >  		default:
> >  			drm_dbg_kms(&i915->drm,
> > diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> > index fa1f375e696b..e19739fef825 100644
> > --- a/drivers/gpu/drm/i915/display/intel_fb.c
> > +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> > @@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
> >  			return 128;
> >  		else
> >  			return 512;
> > +	case I915_FORMAT_MOD_4_TILED:
> > +		/*
> > +		 * Each 4K tile consists of 64B(8*8) subtiles, with
> > +		 * same shape as Y Tile(i.e 4*16B OWords)
> > +		 */
> > +		return 128;
> >  	case I915_FORMAT_MOD_Y_TILED_CCS:
> >  		if (is_ccs_plane(fb, color_plane))
> >  			return 128;
> > @@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
> >  	case I915_FORMAT_MOD_Y_TILED_CCS:
> >  	case I915_FORMAT_MOD_Yf_TILED_CCS:
> >  	case I915_FORMAT_MOD_Y_TILED:
> > +	case I915_FORMAT_MOD_4_TILED:
> >  	case I915_FORMAT_MOD_Yf_TILED:
> >  		return 1 * 1024 * 1024;
> >  	default:
> > diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
> > index 1f66de77a6b1..f079a771f802 100644
> > --- a/drivers/gpu/drm/i915/display/intel_fbc.c
> > +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
> > @@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private *dev_priv,
> >  	case DRM_FORMAT_MOD_LINEAR:
> >  	case I915_FORMAT_MOD_Y_TILED:
> >  	case I915_FORMAT_MOD_Yf_TILED:
> > +	case I915_FORMAT_MOD_4_TILED:
> >  		return DISPLAY_VER(dev_priv) >= 9;
> >  	case I915_FORMAT_MOD_X_TILED:
> >  		return true;
> > diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> > index dcd698a02da2..d80855ee9b96 100644
> > --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
> > +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> > @@ -125,6 +125,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
> >  	case DRM_FORMAT_MOD_LINEAR:
> >  	case I915_FORMAT_MOD_X_TILED:
> >  	case I915_FORMAT_MOD_Y_TILED:
> > +	case I915_FORMAT_MOD_4_TILED:
> >  		break;
> >  	default:
> >  		drm_dbg(&dev_priv->drm,
> > diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> > index 7444b88829ea..0eb4509f7f7a 100644
> > --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> > +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> > @@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
> >  	DRM_FORMAT_MOD_INVALID
> >  };
> >  
> > +static const u64 dg2_plane_format_modifiers[] = {
> > +	I915_FORMAT_MOD_X_TILED,
> > +	I915_FORMAT_MOD_4_TILED,
> > +	DRM_FORMAT_MOD_LINEAR,
> > +	DRM_FORMAT_MOD_INVALID
> > +};
> > +
> >  int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
> >  {
> >  	switch (format) {
> > @@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
> >  		return PLANE_CTL_TILED_X;
> >  	case I915_FORMAT_MOD_Y_TILED:
> >  		return PLANE_CTL_TILED_Y;
> > +	case I915_FORMAT_MOD_4_TILED:
> > +		return PLANE_CTL_TILED_F;
> >  	case I915_FORMAT_MOD_Y_TILED_CCS:
> >  	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
> >  		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> > @@ -1283,6 +1292,7 @@ static int skl_plane_check_fb(const struct intel_crtc_state *crtc_state,
> >  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
> >  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
> > +	     fb->modifier == I915_FORMAT_MOD_4_TILED ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC)) {
> > @@ -2001,6 +2011,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
> >  		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> >  			return false;
> >  		break;
> > +	case I915_FORMAT_MOD_4_TILED:
> > +		if (!HAS_FTILE(dev_priv))
> > +			return false;
> > +		break;
> >  	default:
> >  		return false;
> >  	}
> > @@ -2041,9 +2055,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
> >  	case DRM_FORMAT_Y216:
> >  	case DRM_FORMAT_XVYU12_16161616:
> >  	case DRM_FORMAT_XVYU16161616:
> > -		if (modifier == DRM_FORMAT_MOD_LINEAR ||
> > -		    modifier == I915_FORMAT_MOD_X_TILED ||
> > -		    modifier == I915_FORMAT_MOD_Y_TILED)
> > +		if (!is_ccs_modifier(modifier))
> >  			return true;
> >  		fallthrough;
> >  	default:
> > @@ -2054,8 +2066,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
> >  static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
> >  					    enum plane_id plane_id)
> >  {
> > +	if (HAS_FTILE(dev_priv))
> > +		return dg2_plane_format_modifiers;
> >  	/* Wa_22011186057 */
> > -	if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> > +	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> >  		return adlp_step_a_plane_format_modifiers;
> >  	else if (gen12_plane_supports_mc_ccs(dev_priv, plane_id))
> >  		return gen12_plane_format_modifiers_mc_ccs;
> > @@ -2325,11 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
> >  		else
> >  			fb->modifier = I915_FORMAT_MOD_Y_TILED;
> >  		break;
> > -	case PLANE_CTL_TILED_YF:
> > -		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> > -			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> > -		else
> > -			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> > +	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
> > +		if (DISPLAY_VER(dev_priv) >= 13) {
> > +			fb->modifier = I915_FORMAT_MOD_4_TILED;
> > +		} else {
> > +			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> > +				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> > +			else
> > +				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> > +		}
> >  		break;
> >  	default:
> >  		MISSING_CASE(tiling);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 57948e0ee48b..fdd8ddb0cdf6 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1628,6 +1628,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
> >  #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
> >  
> >  #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
> > +#define HAS_FTILE(dev_priv)    (INTEL_INFO(dev_priv)->has_ftile)
> >  #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
> >  #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
> >  #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index 68367b505dc4..0d7d88ee43ca 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -972,6 +972,7 @@ static const struct intel_device_info adl_p_info = {
> >  	.display.has_cdclk_crawl = 1,
> >  	.display.has_modular_fia = 1,
> >  	.display.has_psr_hw_tracking = 0,
> > +	.has_ftile = 1, \
> >  	.platform_engine_mask =
> >  		BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
> >  	.ppgtt_size = 48,
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 3693eb03f5aa..0fe8e8e6cc31 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -7195,6 +7195,7 @@ enum {
> >  #define   PLANE_CTL_TILED_X			(1 << 10)
> >  #define   PLANE_CTL_TILED_Y			(4 << 10)
> >  #define   PLANE_CTL_TILED_YF			(5 << 10)
> > +#define   PLANE_CTL_TILED_F			(5 << 10)
> >  #define   PLANE_CTL_ASYNC_FLIP			(1 << 9)
> >  #define   PLANE_CTL_FLIP_HORIZONTAL		(1 << 8)
> >  #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	(1 << 4) /* TGL+ */
> > diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> > index 87ee1d86d2ac..f95d4a939e10 100644
> > --- a/drivers/gpu/drm/i915/intel_device_info.h
> > +++ b/drivers/gpu/drm/i915/intel_device_info.h
> > @@ -126,6 +126,7 @@ enum intel_ppgtt_type {
> >  	func(has_64k_pages); \
> >  	func(gpu_reset_clobbers_display); \
> >  	func(has_reset_engine); \
> > +	func(has_ftile); \
> >  	func(has_flat_ccs); \
> >  	func(has_global_mocs); \
> >  	func(has_gt_uc); \
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 201477ca408a..0db5f32adfd5 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5377,6 +5377,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
> >  	}
> >  
> >  	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
> > +		      modifier == I915_FORMAT_MOD_4_TILED ||
> >  		      modifier == I915_FORMAT_MOD_Yf_TILED ||
> >  		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
> >  		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 45a914850be0..982b0a9fa78b 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -558,6 +558,14 @@ extern "C" {
> >   * pitch is required to be a multiple of 4 tile widths.
> >   */
> >  #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
> > +/*
> > + * Intel F-tiling(aka Tile4) layout
> > + *
> > + * This is a tiled layout using 4Kb tiles in row-major layout.
> > + * Within the tile pixels are laid out in 64 byte units / sub-tiles in OWORD
> > + * (16 bytes) chunks column-major..
> > + */
> > +#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
> >  
> >  /*
> >   * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support
@ 2021-10-26 15:08       ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-26 15:08 UTC (permalink / raw)
  To: Lisovskiy, Stanislav
  Cc: dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Imre Deak,
	Matt Roper, Maarten Lankhorst

On 2021-10-21 at 17:27:08 +0300, Lisovskiy, Stanislav wrote:
> On Thu, Oct 21, 2021 at 07:56:23PM +0530, Ramalingam C wrote:
> > From: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > 
> > TileF(Tile4 in bspec) format is 4K tile organized into
> > 64B subtiles with same basic shape as for legacy TileY
> > which will be supported by Display13.
> > 
> > v2: - Fixed wrong case condition(Jani Nikula)
> >     - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)
> > 
> > v3: - s/I915_TILING_F/TILING_4/g
> >     - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
> >     - Removed unneeded fencing code
> > 
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> > ---
> >  drivers/gpu/drm/i915/display/intel_display.c  |  1 +
> >  drivers/gpu/drm/i915/display/intel_fb.c       |  7 ++++
> >  drivers/gpu/drm/i915/display/intel_fbc.c      |  1 +
> >  .../drm/i915/display/intel_plane_initial.c    |  1 +
> >  .../drm/i915/display/skl_universal_plane.c    | 36 ++++++++++++++-----
> >  drivers/gpu/drm/i915/i915_drv.h               |  1 +
> >  drivers/gpu/drm/i915/i915_pci.c               |  1 +
> >  drivers/gpu/drm/i915/i915_reg.h               |  1 +
> >  drivers/gpu/drm/i915/intel_device_info.h      |  1 +
> >  drivers/gpu/drm/i915/intel_pm.c               |  1 +
> >  include/uapi/drm/drm_fourcc.h                 |  8 +++++
> >  11 files changed, 50 insertions(+), 9 deletions(-)
> 
> Was I supposed to change TILE_F/TILE_4 everywhere first,
> as per your comment?
Stan, if you think that is the right think to do, please go ahead!

Ram
> 
> Stan
> 
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > index ce5d6633029a..9b678839bf2b 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -8877,6 +8877,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state)
> >  		case I915_FORMAT_MOD_X_TILED:
> >  		case I915_FORMAT_MOD_Y_TILED:
> >  		case I915_FORMAT_MOD_Yf_TILED:
> > +		case I915_FORMAT_MOD_4_TILED:
> >  			break;
> >  		default:
> >  			drm_dbg_kms(&i915->drm,
> > diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
> > index fa1f375e696b..e19739fef825 100644
> > --- a/drivers/gpu/drm/i915/display/intel_fb.c
> > +++ b/drivers/gpu/drm/i915/display/intel_fb.c
> > @@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane)
> >  			return 128;
> >  		else
> >  			return 512;
> > +	case I915_FORMAT_MOD_4_TILED:
> > +		/*
> > +		 * Each 4K tile consists of 64B(8*8) subtiles, with
> > +		 * same shape as Y Tile(i.e 4*16B OWords)
> > +		 */
> > +		return 128;
> >  	case I915_FORMAT_MOD_Y_TILED_CCS:
> >  		if (is_ccs_plane(fb, color_plane))
> >  			return 128;
> > @@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb,
> >  	case I915_FORMAT_MOD_Y_TILED_CCS:
> >  	case I915_FORMAT_MOD_Yf_TILED_CCS:
> >  	case I915_FORMAT_MOD_Y_TILED:
> > +	case I915_FORMAT_MOD_4_TILED:
> >  	case I915_FORMAT_MOD_Yf_TILED:
> >  		return 1 * 1024 * 1024;
> >  	default:
> > diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c
> > index 1f66de77a6b1..f079a771f802 100644
> > --- a/drivers/gpu/drm/i915/display/intel_fbc.c
> > +++ b/drivers/gpu/drm/i915/display/intel_fbc.c
> > @@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private *dev_priv,
> >  	case DRM_FORMAT_MOD_LINEAR:
> >  	case I915_FORMAT_MOD_Y_TILED:
> >  	case I915_FORMAT_MOD_Yf_TILED:
> > +	case I915_FORMAT_MOD_4_TILED:
> >  		return DISPLAY_VER(dev_priv) >= 9;
> >  	case I915_FORMAT_MOD_X_TILED:
> >  		return true;
> > diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> > index dcd698a02da2..d80855ee9b96 100644
> > --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c
> > +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c
> > @@ -125,6 +125,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
> >  	case DRM_FORMAT_MOD_LINEAR:
> >  	case I915_FORMAT_MOD_X_TILED:
> >  	case I915_FORMAT_MOD_Y_TILED:
> > +	case I915_FORMAT_MOD_4_TILED:
> >  		break;
> >  	default:
> >  		drm_dbg(&dev_priv->drm,
> > diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> > index 7444b88829ea..0eb4509f7f7a 100644
> > --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
> > +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
> > @@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = {
> >  	DRM_FORMAT_MOD_INVALID
> >  };
> >  
> > +static const u64 dg2_plane_format_modifiers[] = {
> > +	I915_FORMAT_MOD_X_TILED,
> > +	I915_FORMAT_MOD_4_TILED,
> > +	DRM_FORMAT_MOD_LINEAR,
> > +	DRM_FORMAT_MOD_INVALID
> > +};
> > +
> >  int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
> >  {
> >  	switch (format) {
> > @@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
> >  		return PLANE_CTL_TILED_X;
> >  	case I915_FORMAT_MOD_Y_TILED:
> >  		return PLANE_CTL_TILED_Y;
> > +	case I915_FORMAT_MOD_4_TILED:
> > +		return PLANE_CTL_TILED_F;
> >  	case I915_FORMAT_MOD_Y_TILED_CCS:
> >  	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
> >  		return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
> > @@ -1283,6 +1292,7 @@ static int skl_plane_check_fb(const struct intel_crtc_state *crtc_state,
> >  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
> >  	     fb->modifier == I915_FORMAT_MOD_Yf_TILED_CCS ||
> > +	     fb->modifier == I915_FORMAT_MOD_4_TILED ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS ||
> >  	     fb->modifier == I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC)) {
> > @@ -2001,6 +2011,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
> >  		if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> >  			return false;
> >  		break;
> > +	case I915_FORMAT_MOD_4_TILED:
> > +		if (!HAS_FTILE(dev_priv))
> > +			return false;
> > +		break;
> >  	default:
> >  		return false;
> >  	}
> > @@ -2041,9 +2055,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
> >  	case DRM_FORMAT_Y216:
> >  	case DRM_FORMAT_XVYU12_16161616:
> >  	case DRM_FORMAT_XVYU16161616:
> > -		if (modifier == DRM_FORMAT_MOD_LINEAR ||
> > -		    modifier == I915_FORMAT_MOD_X_TILED ||
> > -		    modifier == I915_FORMAT_MOD_Y_TILED)
> > +		if (!is_ccs_modifier(modifier))
> >  			return true;
> >  		fallthrough;
> >  	default:
> > @@ -2054,8 +2066,10 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane,
> >  static const u64 *gen12_get_plane_modifiers(struct drm_i915_private *dev_priv,
> >  					    enum plane_id plane_id)
> >  {
> > +	if (HAS_FTILE(dev_priv))
> > +		return dg2_plane_format_modifiers;
> >  	/* Wa_22011186057 */
> > -	if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> > +	else if (IS_ADLP_DISPLAY_STEP(dev_priv, STEP_A0, STEP_B0))
> >  		return adlp_step_a_plane_format_modifiers;
> >  	else if (gen12_plane_supports_mc_ccs(dev_priv, plane_id))
> >  		return gen12_plane_format_modifiers_mc_ccs;
> > @@ -2325,11 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
> >  		else
> >  			fb->modifier = I915_FORMAT_MOD_Y_TILED;
> >  		break;
> > -	case PLANE_CTL_TILED_YF:
> > -		if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> > -			fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> > -		else
> > -			fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> > +	case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_F on XE_LPD+ */
> > +		if (DISPLAY_VER(dev_priv) >= 13) {
> > +			fb->modifier = I915_FORMAT_MOD_4_TILED;
> > +		} else {
> > +			if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> > +				fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
> > +			else
> > +				fb->modifier = I915_FORMAT_MOD_Yf_TILED;
> > +		}
> >  		break;
> >  	default:
> >  		MISSING_CASE(tiling);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 57948e0ee48b..fdd8ddb0cdf6 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1628,6 +1628,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
> >  #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
> >  
> >  #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
> > +#define HAS_FTILE(dev_priv)    (INTEL_INFO(dev_priv)->has_ftile)
> >  #define HAS_SNOOP(dev_priv)	(INTEL_INFO(dev_priv)->has_snoop)
> >  #define HAS_EDRAM(dev_priv)	((dev_priv)->edram_size_mb)
> >  #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6)
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index 68367b505dc4..0d7d88ee43ca 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -972,6 +972,7 @@ static const struct intel_device_info adl_p_info = {
> >  	.display.has_cdclk_crawl = 1,
> >  	.display.has_modular_fia = 1,
> >  	.display.has_psr_hw_tracking = 0,
> > +	.has_ftile = 1, \
> >  	.platform_engine_mask =
> >  		BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
> >  	.ppgtt_size = 48,
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 3693eb03f5aa..0fe8e8e6cc31 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -7195,6 +7195,7 @@ enum {
> >  #define   PLANE_CTL_TILED_X			(1 << 10)
> >  #define   PLANE_CTL_TILED_Y			(4 << 10)
> >  #define   PLANE_CTL_TILED_YF			(5 << 10)
> > +#define   PLANE_CTL_TILED_F			(5 << 10)
> >  #define   PLANE_CTL_ASYNC_FLIP			(1 << 9)
> >  #define   PLANE_CTL_FLIP_HORIZONTAL		(1 << 8)
> >  #define   PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE	(1 << 4) /* TGL+ */
> > diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> > index 87ee1d86d2ac..f95d4a939e10 100644
> > --- a/drivers/gpu/drm/i915/intel_device_info.h
> > +++ b/drivers/gpu/drm/i915/intel_device_info.h
> > @@ -126,6 +126,7 @@ enum intel_ppgtt_type {
> >  	func(has_64k_pages); \
> >  	func(gpu_reset_clobbers_display); \
> >  	func(has_reset_engine); \
> > +	func(has_ftile); \
> >  	func(has_flat_ccs); \
> >  	func(has_global_mocs); \
> >  	func(has_gt_uc); \
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 201477ca408a..0db5f32adfd5 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5377,6 +5377,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state,
> >  	}
> >  
> >  	wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED ||
> > +		      modifier == I915_FORMAT_MOD_4_TILED ||
> >  		      modifier == I915_FORMAT_MOD_Yf_TILED ||
> >  		      modifier == I915_FORMAT_MOD_Y_TILED_CCS ||
> >  		      modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 45a914850be0..982b0a9fa78b 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -558,6 +558,14 @@ extern "C" {
> >   * pitch is required to be a multiple of 4 tile widths.
> >   */
> >  #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
> > +/*
> > + * Intel F-tiling(aka Tile4) layout
> > + *
> > + * This is a tiled layout using 4Kb tiles in row-major layout.
> > + * Within the tile pixels are laid out in 64 byte units / sub-tiles in OWORD
> > + * (16 bytes) chunks column-major..
> > + */
> > +#define I915_FORMAT_MOD_4_TILED         fourcc_mod_code(INTEL, 12)
> >  
> >  /*
> >   * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Intel-gfx] [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color
  2021-10-25 11:20     ` Juha-Pekka Heikkila
@ 2021-10-26 15:44       ` Ramalingam C
  0 siblings, 0 replies; 57+ messages in thread
From: Ramalingam C @ 2021-10-26 15:44 UTC (permalink / raw)
  To: Juha-Pekka Heikkila
  Cc: Ville Syrjälä,
	dri-devel, intel-gfx, Daniel Vetter, CQ Tang, Matthew Auld,
	lucas.demarchi, rodrigo.vivi, Hellstrom Thomas, Matt Roper,
	Simon Ser, Pekka Paalanen

On 2021-10-25 at 14:20:02 +0300, Juha-Pekka Heikkila wrote:
> On 21.10.2021 17.35, Ville Syrjälä wrote:
> > On Thu, Oct 21, 2021 at 07:56:24PM +0530, Ramalingam C wrote:
> > > From: Matt Roper <matthew.d.roper@intel.com>
> > > 
> > > DG2 unifies render compression and media compression into a single
> > > format for the first time.  The programming and buffer layout is
> > > supposed to match compression on older gen12 platforms, but the
> > > actual compression algorithm is different from any previous platform; as
> > > such, we need a new framebuffer modifier to represent buffers in this
> > > format, but otherwise we can re-use the existing gen12 compression driver
> > > logic.
> > > 
> > > DG2 clear color render compression uses Tile4 layout. Therefore, we need
> > > to define a new format modifier for uAPI to support clear color rendering.
> > > 
> > > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > > Signed-off-by: Mika Kahola <mika.kahola@intel.com> (v2)
> > > Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
> > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > > cc: Simon Ser <contact@emersion.fr>
> > > Cc: Pekka Paalanen <ppaalanen@gmail.com>
> > > ---
> > >   drivers/gpu/drm/i915/display/intel_display.c  |  3 ++
> > >   .../drm/i915/display/intel_display_types.h    | 10 +++-
> > >   drivers/gpu/drm/i915/display/intel_fb.c       |  7 +++
> > >   .../drm/i915/display/skl_universal_plane.c    | 49 +++++++++++++++++--
> > >   include/uapi/drm/drm_fourcc.h                 | 30 ++++++++++++
> > >   5 files changed, 94 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > > index 9b678839bf2b..2949fe9f5b9f 100644
> > > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > > @@ -1013,6 +1013,9 @@ intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
> > >   					  cmd->pixel_format);
> > >   	case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS:
> > >   	case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
> > > +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS:
> > > +	case I915_FORMAT_MOD_F_TILED_DG2_MC_CCS:
> > > +	case I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC:
> > >   		return lookup_format_info(gen12_ccs_formats,
> > >   					  ARRAY_SIZE(gen12_ccs_formats),
> > >   					  cmd->pixel_format);
> > 
> > That seems not right. Flat CCS is invisible to the user so the format
> > info should not include a CCS plane.
> > 
> 
> I had cleaned out those rc and mc ccs from here long time ago, I wonder
> where did they come back from? On my development tree they're not there.
> Also I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC is here in totally wrong place,
> it should have its own gen12_flat_ccs_cc_formats table.

Oops, there was another piece missed from this upstreaming effort. I
will push that too...

Ram
> 
> /Juha-Pekka

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2021-10-26 15:58 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 14:26 [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs Ramalingam C
2021-10-21 14:26 ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 01/17] drm/i915: Add has_64k_pages flag Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-22  6:47   ` Lucas De Marchi
2021-10-21 14:26 ` [PATCH v2 02/17] drm/i915/xehpsdv: set min page-size to 64K Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-22  6:47   ` Lucas De Marchi
2021-10-22  6:47     ` [Intel-gfx] " Lucas De Marchi
2021-10-21 14:26 ` [PATCH v2 03/17] drm/i915/xehpsdv: enforce min GTT alignment Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 04/17] drm/i915: enforce min page size for scratch Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 05/17] drm/i915/gtt/xehpsdv: move scratch page to system memory Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 06/17] drm/i915/xehpsdv: support 64K GTT pages Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-22 17:04   ` Matthew Auld
2021-10-22 17:04     ` [Intel-gfx] " Matthew Auld
2021-10-21 14:26 ` [PATCH v2 07/17] drm/i915: Add vm min alignment support Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-22 16:56   ` Matthew Auld
2021-10-22 16:56     ` [Intel-gfx] " Matthew Auld
2021-10-21 14:26 ` [PATCH v2 08/17] drm/i915/selftests: account for min_alignment in GTT selftests Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 09/17] drm/i915/xehpsdv: implement memory coloring Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 10/17] drm/i915/xehpsdv: Add has_flat_ccs to device info Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 11/17] drm/i915/lmem: Enable lmem for platforms with Flat CCS Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 12/17] drm/i915/gt: Clear compress metadata for Xe_HP platforms Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 13/17] drm/i915/dg2: Tile 4 plane format support Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:27   ` Lisovskiy, Stanislav
2021-10-21 14:27     ` [Intel-gfx] " Lisovskiy, Stanislav
2021-10-26 15:08     ` Ramalingam C
2021-10-26 15:08       ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 14/17] uapi/drm/dg2: Format modifier for DG2 unified compression and clear color Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:28   ` Simon Ser
2021-10-21 14:28     ` [Intel-gfx] " Simon Ser
2021-10-21 14:35   ` Ville Syrjälä
2021-10-21 14:35     ` [Intel-gfx] " Ville Syrjälä
2021-10-25 11:20     ` Juha-Pekka Heikkila
2021-10-26 15:44       ` Ramalingam C
2021-10-21 14:26 ` [PATCH v2 15/17] drm/i915/uapi: document behaviour for DG2 64K support Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 16/17] drm/i915/Flat-CCS: Document on Flat-CCS memory compression Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 14:26 ` [PATCH v2 17/17] Doc/gpu/rfc/i915: i915 DG2 uAPI Ramalingam C
2021-10-21 14:26   ` [Intel-gfx] " Ramalingam C
2021-10-21 16:02 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/dg2: Enabling 64k page size and flat ccs (rev2) Patchwork
2021-10-21 16:04 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-10-21 16:33 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2021-10-25 17:31 ` [Intel-gfx] [PATCH v2 00/17] drm/i915/dg2: Enabling 64k page size and flat ccs Robert Beckett

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.