* [PATCH v2 0/4] drm/i915/ttm: Evict and store of compressed object
@ 2022-03-01 21:53 ` Ramalingam C
  0 siblings, 0 replies; 25+ messages in thread
From: Ramalingam C @ 2022-03-01 21:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

On Xe-HP and later devices, we use dedicated compression control
state (CCS) stored in local memory for each surface, to support
the 3D and media compression formats.

The memory required for the CCS of the entire local memory is
1/256 of the local memory size. So before the kernel boots, the
required memory is reserved for the CCS data, and a secure register
is programmed with the CCS base address.

So when we allocate an object in local memory, we don't need to
explicitly allocate space for the CCS data. But when we evict the
object into smem, we need smem space of obj_size + (obj_size/256) to
hold the compression-related data along with the object.

Hence, when we create the smem backing for an object that can be
placed in lmem, we create it with this extra space.

When we swap out a local memory object on a flat-ccs capable
platform, we need to capture the CCS data along with the main memory,
and we need to restore it when we swap the content back in.

When an lmem object is swapped into an smem object, the smem object
will have the extra pages required to hold the CCS data corresponding
to the lmem main memory. So the main memory of lmem is copied into
the initial pages of the smem, and then the CCS data corresponding to
the main memory is copied to the subsequent pages of smem.

Swap-in happens in exactly the reverse order: first the main memory
of lmem is restored from the smem's initial pages, and then the CCS
data is restored from the subsequent pages of smem.

Extracting and restoring the CCS data is done through a special
command called XY_CTRL_SURF_COPY_BLT.
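
To make the sizing concrete, here is a minimal userspace C sketch of
the smem backing-store math described above. The helper name and the
PAGE_SIZE value are illustrative assumptions, not taken from the
series itself.

  #include <stdio.h>
  #include <stdint.h>

  #define PAGE_SIZE          4096
  #define BYTES_PER_CCS_BYTE 256  /* one CCS byte covers 256 main-memory bytes */

  /* Pages needed in smem for an evicted lmem object plus its CCS data. */
  static uint64_t smem_backing_pages(uint64_t obj_size)
  {
          uint64_t ccs_bytes = (obj_size + BYTES_PER_CCS_BYTE - 1) / BYTES_PER_CCS_BYTE;
          uint64_t main_pages = (obj_size + PAGE_SIZE - 1) / PAGE_SIZE;
          uint64_t ccs_pages = (ccs_bytes + PAGE_SIZE - 1) / PAGE_SIZE;

          /* Main memory fills the initial pages, CCS data the subsequent ones. */
          return main_pages + ccs_pages;
  }

  int main(void)
  {
          /* A 64 MiB lmem object needs 256 KiB of CCS: 16384 + 64 pages. */
          printf("%llu\n", (unsigned long long)smem_backing_pages(64ull << 20));
          return 0;
  }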

Test-with: 20220301212513.30772-1-ramalingam.c@intel.com

Ayaz A Siddiqui (1):
  drm/i915/gt: Clear compress metadata for Xe_HP platforms

Ramalingam C (3):
  drm/ttm: parameter to add extra pages into ttm_tt
  drm/i915/gem: Extra pages in ttm_tt for ccs data
  drm/i915/migrate: Evict and restore the flatccs capable lmem obj

 drivers/gpu/drm/drm_gem_vram_helper.c        |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  23 +-
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 +
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 327 +++++++++++++++++--
 drivers/gpu/drm/qxl/qxl_ttm.c                |   2 +-
 drivers/gpu/drm/ttm/ttm_agp_backend.c        |   2 +-
 drivers/gpu/drm/ttm/ttm_tt.c                 |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c   |   2 +-
 include/drm/ttm/ttm_tt.h                     |   4 +-
 9 files changed, 357 insertions(+), 32 deletions(-)

-- 
2.20.1



* [PATCH v2 1/4] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
@ 2022-03-01 21:53   ` Ramalingam C
  -1 siblings, 0 replies; 25+ messages in thread
From: Ramalingam C @ 2022-03-01 21:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld, Ayaz A Siddiqui

From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>

Xe-HP and later devices support Flat CCS, which reserves a portion
of device memory to store compression metadata. When clearing a
device memory buffer object, we also need to clear the associated
CCS buffer.

Flat CCS memory cannot be directly accessed by software. The address
of the CCS buffer associated with a main BO is calculated
automatically by the device itself; KMD/UMD can only access this
buffer indirectly, using the XY_CTRL_SURF_COPY_BLT command via the
address of the device memory buffer.
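
For reference, the ring-space math this implies can be modelled in a
small standalone C sketch. The constant names mirror the ones added
by this patch; DIV_ROUND_UP is spelled out because this is userspace.

  #include <stdio.h>
  #include <stdint.h>

  #define NUM_BYTES_PER_CCS_BYTE  256
  #define NUM_CCS_BYTES_PER_BLOCK 256
  #define NUM_CCS_BLKS_PER_XFER   1024
  #define XY_CTRL_SURF_INSTR_SIZE 5  /* dwords */
  #define MI_FLUSH_DW_SIZE        3  /* dwords */

  #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

  /* Ring dwords needed to copy the CCS of `size` bytes of lmem. */
  static uint32_t ctrl_surf_instr_dwords(uint64_t size)
  {
          uint64_t ccs_bytes = DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE);
          uint64_t blks = DIV_ROUND_UP(ccs_bytes, NUM_CCS_BYTES_PER_BLOCK);
          uint64_t cmds = DIV_ROUND_UP(blks, NUM_CCS_BLKS_PER_XFER);

          /* One MI_FLUSH_DW before and one after the blit(s). */
          return cmds * XY_CTRL_SURF_INSTR_SIZE + 2 * MI_FLUSH_DW_SIZE;
  }

  int main(void)
  {
          /* 64 MiB of lmem -> 256 KiB CCS -> 1024 blocks -> exactly one command. */
          printf("%u dwords\n", ctrl_surf_instr_dwords(64ull << 20));
          return 0;
  }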

v2: Fixed issues with platform naming [Lucas]
v3: Rebased [Ram]
    Used the round_up funcs [Bob]
v4: Fixed ccs blk calculation [Ram]
    Added Kdoc on flat-ccs.
v5: GENMASK is used [Matt]
    mocs fix [Matt]
    Comments Fix [Matt]
    Flush address programming [Ram]
v6: FLUSH_DW is fixed
    Few coding style fix

Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 143 ++++++++++++++++++-
 2 files changed, 154 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f8253012d166..237c1baccc64 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -203,6 +203,21 @@
 #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
 #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
 
+#define XY_CTRL_SURF_INSTR_SIZE	5
+#define MI_FLUSH_DW_SIZE		3
+#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT		21
+#define   DST_ACCESS_TYPE_SHIFT		20
+#define   CCS_SIZE_MASK			GENMASK(17, 8)
+#define   XY_CTRL_SURF_MOCS_MASK	GENMASK(31, 25)
+#define   NUM_CCS_BYTES_PER_BLOCK	256
+#define   NUM_BYTES_PER_CCS_BYTE	256
+#define   NUM_CCS_BLKS_PER_XFER		1024
+#define   INDIRECT_ACCESS		0
+#define   DIRECT_ACCESS			1
+#define  MI_FLUSH_LLC			BIT(9)
+#define  MI_FLUSH_CCS			BIT(16)
+
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
 #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 20444d6ceb3c..330fcdc3e0cf 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -16,6 +16,8 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
+#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
+					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
 
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
@@ -467,6 +469,110 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before the kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when a lmem object is allocated.
+ * And CCS data can be copied in and out of CCS region through
+ * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
+ *
+ * When we exhaust the lmem, if the object's placements support smem, then we can
+ * directly decompress the compressed lmem object into smem and start using it
+ * from smem itself.
+ *
+ * But when we need to swapout the compressed lmem object into a smem region
+ * though objects' placement doesn't support smem, then we copy the lmem content
+ * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
+ * When the object is referred, lmem content will be swaped in along with
+ * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
+ * location.
+ */
+
+static inline u32 *i915_flush_dw(u32 *cmd, u32 flags)
+{
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = 0;
+	*cmd++ = 0;
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_BYTES(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
+	 * transfer upto 1024 blocks.
+	 */
+	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+				NUM_CCS_BYTES_PER_BLOCK);
+	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
+	total_size = XY_CTRL_SURF_INSTR_SIZE * num_cmds;
+
+	/*
+	 * Adding a flush before and after XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u32 ccs_blocks)
+{
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		int blks_per_copy;
+
+		blks_per_copy = ccs_blocks >= NUM_CCS_BLKS_PER_XFER ?
+				NUM_CCS_BLKS_PER_XFER : ccs_blocks;
+		*cmd++ = XY_CTRL_SURF_COPY_BLT |
+			 src_mem_access << SRC_ACCESS_TYPE_SHIFT |
+			 dst_mem_access << DST_ACCESS_TYPE_SHIFT |
+			 FIELD_PREP(CCS_SIZE_MASK, blks_per_copy - 1);
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = (upper_32_bits(src_addr) & 0xFFFF) |
+			  FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, src_mocs);
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = (upper_32_bits(dst_addr) & 0xFFFF) |
+			  FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, dst_mocs);
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		ccs_blocks -= blks_per_copy;
+	} while (ccs_blocks > 0);
+
+	return cmd;
+}
+
 static int emit_copy(struct i915_request *rq,
 		     u32 dst_offset, u32 src_offset, int size)
 {
@@ -614,16 +720,24 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
+static int emit_clear(struct i915_request *rq, u64 offset, int size,
+		      u32 value, bool is_lmem)
 {
-	const int ver = GRAPHICS_VER(rq->engine->i915);
+	struct drm_i915_private *i915 = rq->engine->i915;
+	const int ver = GRAPHICS_VER(i915);
+	u32 num_ccs_blks, ccs_ring_size;
+	int mocs = rq->engine->gt->mocs.uc_index << 1;
 	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
 	offset += (u64)rq->engine->instance << 32;
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	/* Clear CCS only when value is 0 */
+	ccs_ring_size = (is_lmem && !value) ?
+			 calc_ctrl_surf_instr_size(i915, size) : 0;
+
+	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -646,6 +760,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
 		*cs++ = value;
 	}
 
+	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
+		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+					    NUM_CCS_BYTES_PER_BLOCK);
+
+		/*
+		 * Flat CCS surface can only be accessed via
+		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
+		 * mapping of associated LMEM.
+		 * We can clear ccs surface by writing all 0s,
+		 * so we will flush the previously cleared buffer
+		 * and use it as a source.
+		 */
+		cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
+					      DIRECT_ACCESS, INDIRECT_ACCESS,
+					      mocs, mocs, num_ccs_blks);
+		cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
+
+		if (ccs_ring_size & 1)
+			*cs++ = MI_NOOP;
+	}
 	intel_ring_advance(rq, cs);
 	return 0;
 }
@@ -711,7 +846,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, offset, len, value);
+		err = emit_clear(rq, offset, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1



* [PATCH v2 2/4] drm/ttm: parameter to add extra pages into ttm_tt
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
@ 2022-03-01 21:53   ` Ramalingam C
  -1 siblings, 0 replies; 25+ messages in thread
From: Ramalingam C @ 2022-03-01 21:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld, Christian Koenig

When a driver needs extra pages in a ttm_tt, to facilitate such a
requirement, a parameter called "extra_pages" is added to
ttm_tt_init().
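
A before/after sketch of a typical call site (illustrative only;
every existing caller converted in this patch simply passes 0, while
a later patch in this series passes a CCS page count):

  /* Before: the ttm_tt is sized purely from the BO. */
  ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);

  /* After: a driver may request extra trailing pages, e.g. for CCS data. */
  ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, ccs_pages);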

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
---
 drivers/gpu/drm/drm_gem_vram_helper.c      |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c    |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c              |  2 +-
 drivers/gpu/drm/ttm/ttm_agp_backend.c      |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c               | 12 +++++++-----
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
 include/drm/ttm/ttm_tt.h                   |  4 +++-
 7 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index dc7f938bfff2..123045b58fec 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -867,7 +867,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo,
 	if (!tt)
 		return NULL;
 
-	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);
+	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
 	if (ret < 0)
 		goto err_ttm_tt_init;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 45cc5837ce00..1a8262f5f692 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -283,7 +283,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		i915_tt->is_shmem = true;
 	}
 
-	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
+	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
 	if (ret)
 		goto err_free;
 
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index b2e33d5ba5d0..52156b54498f 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
 	ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
 	if (ttm == NULL)
 		return NULL;
-	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
+	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
 		kfree(ttm);
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c
index 6ddc16f0fe2b..d27691f2e451 100644
--- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
+++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
@@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo,
 	agp_be->mem = NULL;
 	agp_be->bridge = bridge;
 
-	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {
+	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
 		kfree(agp_be);
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index d234aab800a0..1a66d9fc589a 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
 static void ttm_tt_init_fields(struct ttm_tt *ttm,
 			       struct ttm_buffer_object *bo,
 			       uint32_t page_flags,
-			       enum ttm_caching caching)
+			       enum ttm_caching caching,
+			       unsigned long extra_pages)
 {
-	ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
+	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
 	ttm->caching = ttm_cached;
 	ttm->page_flags = page_flags;
 	ttm->dma_address = NULL;
@@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 }
 
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
-		uint32_t page_flags, enum ttm_caching caching)
+		uint32_t page_flags, enum ttm_caching caching,
+		unsigned long extra_pages)
 {
-	ttm_tt_init_fields(ttm, bo, page_flags, caching);
+	ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
 
 	if (ttm_tt_alloc_page_directory(ttm)) {
 		pr_err("Failed allocating page table\n");
@@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
 {
 	int ret;
 
-	ttm_tt_init_fields(ttm, bo, page_flags, caching);
+	ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
 
 	if (page_flags & TTM_TT_FLAG_EXTERNAL)
 		ret = ttm_sg_tt_alloc_page_directory(ttm);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index b84ecc6d6611..4e3938e62c08 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -517,7 +517,7 @@ static struct ttm_tt *vmw_ttm_tt_create(struct ttm_buffer_object *bo,
 				     ttm_cached);
 	else
 		ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
-				  ttm_cached);
+				  ttm_cached, 0);
 	if (unlikely(ret != 0))
 		goto out_no_init;
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index f20832139815..17a0310e8aaa 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -140,6 +140,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
  * @bo: The buffer object we create the ttm for.
  * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
  * @caching: the desired caching state of the pages
+ * @extra_pages: Extra pages needed for the driver.
  *
  * Create a struct ttm_tt to back data with system memory pages.
  * No pages are actually allocated.
@@ -147,7 +148,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
  * NULL: Out of memory.
  */
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
-		uint32_t page_flags, enum ttm_caching caching);
+		uint32_t page_flags, enum ttm_caching caching,
+		unsigned long extra_pages);
 int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
 		   uint32_t page_flags, enum ttm_caching caching);
 
-- 
2.20.1



* [PATCH v2 3/4] drm/i915/gem: Extra pages in ttm_tt for ccs data
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
@ 2022-03-01 21:53   ` Ramalingam C
  -1 siblings, 0 replies; 25+ messages in thread
From: Ramalingam C @ 2022-03-01 21:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld, Christian Koenig

On Xe-HP and later devices, we use dedicated compression control
state (CCS) stored in local memory for each surface, to support the
3D and media compression formats.

The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boots, the required memory
is reserved for the CCS data, and a secure register is programmed
with the CCS base address.

So when we allocate an object in local memory, we don't need to
explicitly allocate space for the CCS data. But when we evict the
object into smem, we need smem space of obj_size + (obj_size/256) to
hold the compression-related data along with the object.

Hence, when we create the smem backing for an object that can be
placed in lmem, we create it with this extra space.
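
As a sanity check of the 1/256 math, a standalone sketch of the
ccs_pages computation used below (PAGE_SIZE is assumed to be 4 KiB
here; the constants mirror the ones used in the patch):

  #include <stdio.h>
  #include <stdint.h>

  #define PAGE_SIZE              4096
  #define NUM_BYTES_PER_CCS_BYTE 256

  #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

  int main(void)
  {
          uint64_t size = 16ull << 20; /* 16 MiB lmem-capable object */
          uint64_t ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE),
                                            PAGE_SIZE);

          /* 16 MiB / 256 = 64 KiB of CCS -> 16 extra pages in the ttm_tt. */
          printf("%llu extra pages\n", (unsigned long long)ccs_pages);
          return 0;
  }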

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 1a8262f5f692..c7a36861c38d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -20,6 +20,7 @@
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
 #include "gem/i915_gem_ttm_pm.h"
+#include "gt/intel_gpu_commands.h"
 
 #define I915_TTM_PRIO_PURGE     0
 #define I915_TTM_PRIO_NO_PAGES  1
@@ -255,12 +256,27 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = {
 	.release = i915_ttm_tt_release
 };
 
+static inline bool
+i915_gem_object_has_lmem_placement(struct drm_i915_gem_object *obj)
+{
+	int i;
+
+	for (i = 0; i < obj->mm.n_placements; i++)
+		if (obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
+			return true;
+
+	return false;
+}
+
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
+	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
+						     bdev);
 	struct ttm_resource_manager *man =
 		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	unsigned long ccs_pages = 0;
 	enum ttm_caching caching;
 	struct i915_ttm_tt *i915_tt;
 	int ret;
@@ -283,7 +299,12 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		i915_tt->is_shmem = true;
 	}
 
-	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
+	if (HAS_FLAT_CCS(i915) && i915_gem_object_has_lmem_placement(obj))
+		ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size,
+						      NUM_BYTES_PER_CCS_BYTE),
+					 PAGE_SIZE);
+
+	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, ccs_pages);
 	if (ret)
 		goto err_free;
 
-- 
2.20.1



* [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
@ 2022-03-01 21:53   ` Ramalingam C
  -1 siblings, 0 replies; 25+ messages in thread
From: Ramalingam C @ 2022-03-01 21:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

When we swap out a local memory object on a flat-ccs capable
platform, we need to capture the CCS data along with the main memory,
and we need to restore it when we swap the content back in.

When an lmem object is swapped into an smem object, the smem object
will have the extra pages required to hold the CCS data corresponding
to the lmem main memory. So the main memory of lmem is copied into
the initial pages of the smem, and then the CCS data corresponding to
the main memory is copied to the subsequent pages of smem. The CCS
data is 1/256 of the lmem size.

Swap-in happens in exactly the reverse order: first the main memory
of lmem is restored from the smem's initial pages, and then the CCS
data is restored from the subsequent pages of smem.

Extracting and restoring the CCS data is done through a special
command called XY_CTRL_SURF_COPY_BLT.
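
A simplified userspace model of the two-pass chunking described above
(CHUNK_SZ and the 1/256 ratio match the series; the request emission
itself is schematic):

  #include <stdio.h>
  #include <stdint.h>

  #define CHUNK_SZ               (8u << 20) /* 8 MiB per request */
  #define NUM_BYTES_PER_CCS_BYTE 256

  static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

  int main(void)
  {
          uint32_t obj_size = 24u << 20; /* 24 MiB lmem object being evicted */
          uint32_t ccs_bytes = obj_size / NUM_BYTES_PER_CCS_BYTE;
          uint32_t ccs_chunk = CHUNK_SZ / NUM_BYTES_PER_CCS_BYTE; /* 32 KiB */
          uint32_t left;

          /* Pass 1: main memory, one CHUNK_SZ request at a time. */
          for (left = obj_size; left; left -= min_u32(left, CHUNK_SZ))
                  printf("main copy: %u bytes\n", min_u32(left, CHUNK_SZ));

          /* Pass 2: CCS data, at most GET_CCS_BYTES(CHUNK_SZ) per request. */
          for (left = ccs_bytes; left; left -= min_u32(left, ccs_chunk))
                  printf("ccs copy: %u bytes\n", min_u32(left, ccs_chunk));

          return 0;
  }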

v2: Fixing the ccs handling

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 184 +++++++++++++++++++++---
 1 file changed, 167 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 330fcdc3e0cf..73ac7382aeb6 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -341,12 +341,9 @@ static int emit_no_arbitration(struct i915_request *rq)
 	return 0;
 }
 
-static int emit_pte(struct i915_request *rq,
-		    struct sgt_dma *it,
+static int emit_pte(struct i915_request *rq, struct sgt_dma *it,
 		    enum i915_cache_level cache_level,
-		    bool is_lmem,
-		    u64 offset,
-		    int length)
+		    bool is_lmem, u64 offset, int length)
 {
 	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
 	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
@@ -573,14 +570,54 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
 	return cmd;
 }
 
+static int emit_ccs_copy(struct i915_request *rq,
+			 bool dst_is_lmem, u32 dst_offset,
+			 bool src_is_lmem, u32 src_offset, int size)
+{
+	struct drm_i915_private *i915 = rq->engine->i915;
+	int mocs = rq->engine->gt->mocs.uc_index << 1;
+	u32 num_ccs_blks, ccs_ring_size;
+	u8 src_access, dst_access;
+	u32 *cs;
+
+	GEM_BUG_ON(!(src_is_lmem ^ dst_is_lmem) || !HAS_FLAT_CCS(i915));
+
+	ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
+	WARN_ON(!ccs_ring_size);
+
+	cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+				    NUM_CCS_BYTES_PER_BLOCK);
+
+	src_access = !src_is_lmem && dst_is_lmem;
+	dst_access = !src_access;
+
+	cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
+	cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
+				      src_access, dst_access,
+				      mocs, mocs, num_ccs_blks);
+	cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
+	if (ccs_ring_size & 1)
+		*cs++ = MI_NOOP;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int emit_copy(struct i915_request *rq,
-		     u32 dst_offset, u32 src_offset, int size)
+		     bool dst_is_lmem, u32 dst_offset,
+		     bool src_is_lmem, u32 src_offset, int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
 	u32 *cs;
 
 	cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
+
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -620,6 +657,18 @@ static int emit_copy(struct i915_request *rq,
 	return 0;
 }
 
+static int scatter_list_length(struct scatterlist *sg)
+{
+	int len = 0;
+
+	while (sg) {
+		len += sg_dma_len(sg);
+		sg = sg_next(sg);
+	};
+
+	return len;
+}
+
 int
 intel_context_migrate_copy(struct intel_context *ce,
 			   const struct i915_deps *deps,
@@ -632,7 +681,10 @@ intel_context_migrate_copy(struct intel_context *ce,
 			   struct i915_request **out)
 {
 	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
+	struct drm_i915_private *i915 = ce->engine->i915;
+	u32 src_sz, dst_sz, ccs_bytes = 0, bytes_to_cpy;
 	struct i915_request *rq;
+	bool ccs_copy = false;
 	int err;
 
 	GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
@@ -640,9 +692,28 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
+	if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) {
+		src_sz = scatter_list_length(src);
+		dst_sz = scatter_list_length(dst);
+
+		if (src_is_lmem)
+			bytes_to_cpy = src_sz;
+		else if (dst_is_lmem)
+			bytes_to_cpy = dst_sz;
+
+		/*
+		 * When there is a eviction of ccs needed smem will have the
+		 * extra pages for the ccs data
+		 *
+		 * TO-DO: Want to move the size mismatch check to a WARN_ON,
+		 * but still we have some requests of smem->lmem with same size.
+		 * Need to fix it.
+		 */
+		ccs_bytes = src_sz != dst_sz ? GET_CCS_BYTES(i915, bytes_to_cpy) : 0;
+	}
+
 	do {
-		u32 src_offset, dst_offset;
-		int len;
+		u32 src_offset, dst_offset, copy_sz;
 
 		rq = i915_request_create(ce);
 		if (IS_ERR(rq)) {
@@ -682,27 +753,82 @@ intel_context_migrate_copy(struct intel_context *ce,
 				dst_offset = 2 * CHUNK_SZ;
 		}
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
-			       src_offset, CHUNK_SZ);
-		if (len <= 0) {
-			err = len;
+		if (ccs_copy) {
+			/* Flat-CCS: CCS data copy */
+			if (!src_is_lmem) { /* src is smem */
+				/*
+				 * We can only copy the ccs data corresponding to
+				 * the CHUNK_SZ of lmem which is
+				 * GET_CCS_BYTES(i915, CHUNK_SZ))
+				 */
+				src_sz = min_t(int, bytes_to_cpy,
+					       GET_CCS_BYTES(i915, CHUNK_SZ));
+				dst_sz = CHUNK_SZ;
+			} else {
+				src_sz = CHUNK_SZ;
+				dst_sz = min_t(int, bytes_to_cpy,
+					       GET_CCS_BYTES(i915, CHUNK_SZ));
+			}
+		} else if (!ccs_copy && ccs_bytes) {
+			/* Flat-CCS: Main memory copy */
+			if (!src_is_lmem) {
+				src_sz = min_t(int, bytes_to_cpy, CHUNK_SZ);
+				dst_sz = CHUNK_SZ;
+			} else {
+				dst_sz = min_t(int, bytes_to_cpy, CHUNK_SZ);
+				src_sz = CHUNK_SZ;
+			}
+		} else { /* ccs handling is not required */
+			src_sz = CHUNK_SZ;
+		}
+
+		src_sz = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+				  src_offset, src_sz);
+		if (src_sz <= 0) {
+			err = src_sz;
 			goto out_rq;
 		}
 
+		if (!ccs_bytes)
+			dst_sz = src_sz;
+
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
-			       dst_offset, len);
+			       dst_offset, dst_sz);
 		if (err < 0)
 			goto out_rq;
-		if (err < len) {
+		if (err < dst_sz && !ccs_bytes) {
 			err = -EINVAL;
 			goto out_rq;
 		}
 
+		dst_sz = err;
+
 		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, dst_offset, src_offset, len);
+		if (ccs_copy) {
+			/*
+			 * Using max of src_sz and dst_sz, as we need to
+			 * pass the lmem size corresponding to the ccs
+			 * blocks we need to handle.
+			 */
+			copy_sz = max_t(int, src_sz, dst_sz);
+			err = emit_ccs_copy(rq, dst_is_lmem, dst_offset,
+					    src_is_lmem, src_offset,
+					    copy_sz);
+
+			/* Converting back to ccs bytes */
+			copy_sz = GET_CCS_BYTES(i915, copy_sz);
+		} else {
+			WARN(src_sz != dst_sz, "%d != %d", src_sz, dst_sz);
+			copy_sz = src_sz;
+			err = emit_copy(rq, dst_is_lmem, dst_offset,
+					src_is_lmem, src_offset, copy_sz);
+		}
+
+		if (!err && ccs_bytes)
+			bytes_to_cpy -= copy_sz;
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -710,9 +836,33 @@ intel_context_migrate_copy(struct intel_context *ce,
 			i915_request_put(*out);
 		*out = i915_request_get(rq);
 		i915_request_add(rq);
-		if (err || !it_src.sg || !sg_dma_len(it_src.sg))
-			break;
 
+		if (err || !it_src.sg || !sg_dma_len(it_src.sg) ||
+		    !it_dst.sg || !sg_dma_len(it_src.sg)) {
+			if (err || !ccs_bytes)
+				break;
+
+			GEM_BUG_ON(bytes_to_cpy);
+			if (ccs_copy) {
+				break;
+			} else if (ccs_bytes) {
+				if (src_is_lmem) {
+					WARN_ON(it_src.sg && sg_dma_len(it_src.sg));
+					it_src = sg_sgt(src);
+				} else {
+					WARN_ON(it_dst.sg && sg_dma_len(it_dst.sg));
+					it_dst = sg_sgt(dst);
+				}
+				bytes_to_cpy = ccs_bytes;
+				ccs_copy = true;
+
+				continue;
+			} else {
+				DRM_ERROR("Invalid state\n");
+				err = -EINVAL;
+				break;
+			}
+		}
 		cond_resched();
 	} while (1);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Intel-gfx] [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj
@ 2022-03-01 21:53   ` Ramalingam C
  0 siblings, 0 replies; 25+ messages in thread
From: Ramalingam C @ 2022-03-01 21:53 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

When we are swapping out the local memory obj on flat-ccs capable platform,
we need to capture the ccs data too along with main meory and we need to
restore it when we are swapping in the content.

When lmem object is swapped into a smem obj, smem obj will
have the extra pages required to hold the ccs data corresponding to the
lmem main memory. So main memory of lmem will be copied into the initial
pages of the smem and then ccs data corresponding to the main memory
will be copied to the subsequent pages of smem. ccs data is 1/256 of
lmem size.

Swapin happens exactly in reverse order. First main memory of lmem is
restored from the smem's initial pages and the ccs data will be restored
from the subsequent pages of smem.

Extracting and restoring the CCS data is done through a special cmd called
XY_CTRL_SURF_COPY_BLT

v2: Fixing the ccs handling

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 184 +++++++++++++++++++++---
 1 file changed, 167 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 330fcdc3e0cf..73ac7382aeb6 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -341,12 +341,9 @@ static int emit_no_arbitration(struct i915_request *rq)
 	return 0;
 }
 
-static int emit_pte(struct i915_request *rq,
-		    struct sgt_dma *it,
+static int emit_pte(struct i915_request *rq, struct sgt_dma *it,
 		    enum i915_cache_level cache_level,
-		    bool is_lmem,
-		    u64 offset,
-		    int length)
+		    bool is_lmem, u64 offset, int length)
 {
 	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
 	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
@@ -573,14 +570,54 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
 	return cmd;
 }
 
+static int emit_ccs_copy(struct i915_request *rq,
+			 bool dst_is_lmem, u32 dst_offset,
+			 bool src_is_lmem, u32 src_offset, int size)
+{
+	struct drm_i915_private *i915 = rq->engine->i915;
+	int mocs = rq->engine->gt->mocs.uc_index << 1;
+	u32 num_ccs_blks, ccs_ring_size;
+	u8 src_access, dst_access;
+	u32 *cs;
+
+	GEM_BUG_ON(!(src_is_lmem ^ dst_is_lmem) || !HAS_FLAT_CCS(i915));
+
+	ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
+	WARN_ON(!ccs_ring_size);
+
+	cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+				    NUM_CCS_BYTES_PER_BLOCK);
+
+	src_access = !src_is_lmem && dst_is_lmem;
+	dst_access = !src_access;
+
+	cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
+	cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
+				      src_access, dst_access,
+				      mocs, mocs, num_ccs_blks);
+	cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
+	if (ccs_ring_size & 1)
+		*cs++ = MI_NOOP;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int emit_copy(struct i915_request *rq,
-		     u32 dst_offset, u32 src_offset, int size)
+		     bool dst_is_lmem, u32 dst_offset,
+		     bool src_is_lmem, u32 src_offset, int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
 	u32 *cs;
 
 	cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
+
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -620,6 +657,18 @@ static int emit_copy(struct i915_request *rq,
 	return 0;
 }
 
+static int scatter_list_length(struct scatterlist *sg)
+{
+	int len = 0;
+
+	while (sg) {
+		len += sg_dma_len(sg);
+		sg = sg_next(sg);
+	};
+
+	return len;
+}
+
 int
 intel_context_migrate_copy(struct intel_context *ce,
 			   const struct i915_deps *deps,
@@ -632,7 +681,10 @@ intel_context_migrate_copy(struct intel_context *ce,
 			   struct i915_request **out)
 {
 	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
+	struct drm_i915_private *i915 = ce->engine->i915;
+	u32 src_sz, dst_sz, ccs_bytes = 0, bytes_to_cpy;
 	struct i915_request *rq;
+	bool ccs_copy = false;
 	int err;
 
 	GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
@@ -640,9 +692,28 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
+	if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) {
+		src_sz = scatter_list_length(src);
+		dst_sz = scatter_list_length(dst);
+
+		if (src_is_lmem)
+			bytes_to_cpy = src_sz;
+		else if (dst_is_lmem)
+			bytes_to_cpy = dst_sz;
+
+		/*
+		 * When there is a eviction of ccs needed smem will have the
+		 * extra pages for the ccs data
+		 *
+		 * TO-DO: Want to move the size mismatch check to a WARN_ON,
+		 * but still we have some requests of smem->lmem with same size.
+		 * Need to fix it.
+		 */
+		ccs_bytes = src_sz != dst_sz ? GET_CCS_BYTES(i915, bytes_to_cpy) : 0;
+	}
+
 	do {
-		u32 src_offset, dst_offset;
-		int len;
+		u32 src_offset, dst_offset, copy_sz;
 
 		rq = i915_request_create(ce);
 		if (IS_ERR(rq)) {
@@ -682,27 +753,82 @@ intel_context_migrate_copy(struct intel_context *ce,
 				dst_offset = 2 * CHUNK_SZ;
 		}
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
-			       src_offset, CHUNK_SZ);
-		if (len <= 0) {
-			err = len;
+		if (ccs_copy) {
+			/* Flat-CCS: CCS data copy */
+			if (!src_is_lmem) { /* src is smem */
+				/*
+				 * We can only copy the ccs data corresponding to
+				 * the CHUNK_SZ of lmem which is
+				 * GET_CCS_BYTES(i915, CHUNK_SZ))
+				 */
+				src_sz = min_t(int, bytes_to_cpy,
+					       GET_CCS_BYTES(i915, CHUNK_SZ));
+				dst_sz = CHUNK_SZ;
+			} else {
+				src_sz = CHUNK_SZ;
+				dst_sz = min_t(int, bytes_to_cpy,
+					       GET_CCS_BYTES(i915, CHUNK_SZ));
+			}
+		} else if (!ccs_copy && ccs_bytes) {
+			/* Flat-CCS: Main memory copy */
+			if (!src_is_lmem) {
+				src_sz = min_t(int, bytes_to_cpy, CHUNK_SZ);
+				dst_sz = CHUNK_SZ;
+			} else {
+				dst_sz = min_t(int, bytes_to_cpy, CHUNK_SZ);
+				src_sz = CHUNK_SZ;
+			}
+		} else { /* ccs handling is not required */
+			src_sz = CHUNK_SZ;
+		}
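+
+		/*
+		 * Net effect of the selection above for the flat-CCS
+		 * cases: the lmem side is mapped one CHUNK_SZ window at
+		 * a time, while the smem side advances by the actual
+		 * byte count, i.e. up to CHUNK_SZ in the main-memory
+		 * pass and up to GET_CCS_BYTES(i915, CHUNK_SZ) in the
+		 * CCS pass.
+		 */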
+
+		src_sz = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+				  src_offset, src_sz);
+		if (src_sz <= 0) {
+			err = src_sz;
 			goto out_rq;
 		}
 
+		if (!ccs_bytes)
+			dst_sz = src_sz;
+
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
-			       dst_offset, len);
+			       dst_offset, dst_sz);
 		if (err < 0)
 			goto out_rq;
-		if (err < len) {
+		if (err < dst_sz && !ccs_bytes) {
 			err = -EINVAL;
 			goto out_rq;
 		}
 
+		dst_sz = err;
+
 		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, dst_offset, src_offset, len);
+		if (ccs_copy) {
+			/*
+			 * Using max of src_sz and dst_sz, as we need to
+			 * pass the lmem size corresponding to the ccs
+			 * blocks we need to handle.
+			 */
+			copy_sz = max_t(int, src_sz, dst_sz);
+			err = emit_ccs_copy(rq, dst_is_lmem, dst_offset,
+					    src_is_lmem, src_offset,
+					    copy_sz);
+
+			/* Converting back to ccs bytes */
+			copy_sz = GET_CCS_BYTES(i915, copy_sz);
+		} else {
+			WARN(src_sz != dst_sz, "%d != %d", src_sz, dst_sz);
+			copy_sz = src_sz;
+			err = emit_copy(rq, dst_is_lmem, dst_offset,
+					src_is_lmem, src_offset, copy_sz);
+		}
+
+		if (!err && ccs_bytes)
+			bytes_to_cpy -= copy_sz;
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -710,9 +836,33 @@ intel_context_migrate_copy(struct intel_context *ce,
 			i915_request_put(*out);
 		*out = i915_request_get(rq);
 		i915_request_add(rq);
-		if (err || !it_src.sg || !sg_dma_len(it_src.sg))
-			break;
 
+		if (err || !it_src.sg || !sg_dma_len(it_src.sg) ||
+		    !it_dst.sg || !sg_dma_len(it_dst.sg)) {
+			if (err || !ccs_bytes)
+				break;
+
+			GEM_BUG_ON(bytes_to_cpy);
+			if (ccs_copy) {
+				break;
+			} else if (ccs_bytes) {
+				if (src_is_lmem) {
+					WARN_ON(it_src.sg && sg_dma_len(it_src.sg));
+					it_src = sg_sgt(src);
+				} else {
+					WARN_ON(it_dst.sg && sg_dma_len(it_dst.sg));
+					it_dst = sg_sgt(dst);
+				}
+				bytes_to_cpy = ccs_bytes;
+				ccs_copy = true;
+
+				continue;
+			} else {
+				DRM_ERROR("Invalid state\n");
+				err = -EINVAL;
+				break;
+			}
+		}
 		cond_resched();
 	} while (1);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread
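
To see the chunking arithmetic of the copy loop in isolation, below is
a minimal userspace C sketch. It models only the per-request size
selection, not the GPU command emission; the 1/256 CCS ratio matches
the series, while the 8M CHUNK_SZ and the sample object size are
illustrative assumptions.

#include <stdio.h>

#define CHUNK_SZ		(8 << 20)	/* assumed chunk size */
#define NUM_BYTES_PER_CCS_BYTE	256		/* 1 CCS byte per 256 main bytes */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define GET_CCS_BYTES(size)	DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE)

int main(void)
{
	unsigned long obj_sz = 3UL * CHUNK_SZ + 4096;	/* sample lmem object */
	unsigned long bytes_to_cpy, sz;

	/* Pass 1: main memory, one CHUNK_SZ window per request. */
	for (bytes_to_cpy = obj_sz; bytes_to_cpy; bytes_to_cpy -= sz) {
		sz = bytes_to_cpy < CHUNK_SZ ? bytes_to_cpy : CHUNK_SZ;
		printf("main copy: %lu bytes\n", sz);
	}

	/* Pass 2: CCS data, at most GET_CCS_BYTES(CHUNK_SZ) per request. */
	for (bytes_to_cpy = GET_CCS_BYTES(obj_sz); bytes_to_cpy;
	     bytes_to_cpy -= sz) {
		sz = bytes_to_cpy < GET_CCS_BYTES(CHUNK_SZ) ?
		     bytes_to_cpy : GET_CCS_BYTES(CHUNK_SZ);
		printf("ccs copy:  %lu bytes\n", sz);
	}

	return 0;
}

Running it prints three 8M main copies plus a 4096-byte tail, then
three 32768-byte CCS copies plus a 16-byte tail, mirroring how the
loop resets bytes_to_cpy to ccs_bytes once the first pass exhausts the
source scatterlist.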

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/ttm: Evict and store of compressed object (rev2)
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
                   ` (4 preceding siblings ...)
  (?)
@ 2022-03-02  1:51 ` Patchwork
  -1 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2022-03-02  1:51 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Evict and store of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/99759/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
9108a19bd675 drm/i915/gt: Clear compress metadata for Xe_HP platforms
03b09cc7472c drm/ttm: parameter to add extra pages into ttm_tt
-:88: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'
#88: FILE: drivers/gpu/drm/ttm/ttm_tt.c:150:
+		uint32_t page_flags, enum ttm_caching caching,

-:135: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'
#135: FILE: include/drm/ttm/ttm_tt.h:151:
+		uint32_t page_flags, enum ttm_caching caching,

total: 0 errors, 0 warnings, 2 checks, 88 lines checked
6ff06596f9c2 drm/i915/gem: Extra pages in ttm_tt for ccs data
6b1bfebe31ca drm/i915/migrate: Evict and restore the flatccs capable lmem obj



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/ttm: Evict and store of compressed object (rev2)
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
                   ` (5 preceding siblings ...)
  (?)
@ 2022-03-02  1:53 ` Patchwork
  -1 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2022-03-02  1:53 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Evict and store of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/99759/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/ttm: Evict and store of compressed object (rev2)
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
                   ` (6 preceding siblings ...)
  (?)
@ 2022-03-02  2:23 ` Patchwork
  -1 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2022-03-02  2:23 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx


== Series Details ==

Series: drm/i915/ttm: Evict and store of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/99759/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11308 -> Patchwork_22455
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/index.html

Participating hosts (48 -> 43)
------------------------------

  Additional (1): fi-pnv-d510 
  Missing    (6): fi-kbl-soraka fi-hsw-4200u fi-bsw-cyan fi-icl-u2 fi-ctg-p8600 fi-bdw-samus 

Known issues
------------

  Here are the changes found in Patchwork_22455 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
    - fi-snb-2600:        NOTRUN -> [SKIP][1] ([fdo#109271]) +17 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-snb-2600/igt@amdgpu/amd_cs_nop@sync-fork-compute0.html

  * igt@gem_exec_suspend@basic-s3@smem:
    - fi-skl-6600u:       [PASS][2] -> [INCOMPLETE][3] ([i915#4547])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/fi-skl-6600u/igt@gem_exec_suspend@basic-s3@smem.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-skl-6600u/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@gem_huc_copy@huc-copy:
    - fi-pnv-d510:        NOTRUN -> [SKIP][4] ([fdo#109271]) +57 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-pnv-d510/igt@gem_huc_copy@huc-copy.html

  * igt@i915_selftest@live@gt_engines:
    - bat-dg1-6:          [PASS][5] -> [INCOMPLETE][6] ([i915#4418])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/bat-dg1-6/igt@i915_selftest@live@gt_engines.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/bat-dg1-6/igt@i915_selftest@live@gt_engines.html

  * igt@i915_selftest@live@workarounds:
    - fi-bdw-5557u:       NOTRUN -> [INCOMPLETE][7] ([i915#5170])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-bdw-5557u/igt@i915_selftest@live@workarounds.html

  * igt@kms_chamelium@vga-edid-read:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][8] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-bdw-5557u/igt@kms_chamelium@vga-edid-read.html

  * igt@kms_psr@cursor_plane_move:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][9] ([fdo#109271]) +13 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-bdw-5557u/igt@kms_psr@cursor_plane_move.html

  * igt@runner@aborted:
    - bat-dg1-6:          NOTRUN -> [FAIL][10] ([i915#4312])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/bat-dg1-6/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [INCOMPLETE][11] ([i915#3921]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  * igt@kms_busy@basic@flip:
    - {bat-adlp-6}:       [DMESG-WARN][13] ([i915#3576]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/bat-adlp-6/igt@kms_busy@basic@flip.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/bat-adlp-6/igt@kms_busy@basic@flip.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [DMESG-WARN][15] ([i915#4269]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4418]: https://gitlab.freedesktop.org/drm/intel/issues/4418
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#5127]: https://gitlab.freedesktop.org/drm/intel/issues/5127
  [i915#5170]: https://gitlab.freedesktop.org/drm/intel/issues/5170


Build changes
-------------

  * IGT: IGT_6361 -> IGTPW_6726
  * Linux: CI_DRM_11308 -> Patchwork_22455

  CI-20190529: 20190529
  CI_DRM_11308: e1345b708d76052ef668aaba362ef8ad81ed0b9b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_6726: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6726/index.html
  IGT_6361: 2372a4beb6a33c5f0799a4a8ccbb93794f52dbca @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22455: 6b1bfebe31ca68e851ddb478cfba9aeff39a1260 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

6b1bfebe31ca drm/i915/migrate: Evict and restore the flatccs capable lmem obj
6ff06596f9c2 drm/i915/gem: Extra pages in ttm_tt for ccs data
03b09cc7472c drm/ttm: parameter to add extra pages into ttm_tt
9108a19bd675 drm/i915/gt: Clear compress metadata for Xe_HP platforms

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/index.html


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/ttm: Evict and store of compressed object (rev2)
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
                   ` (7 preceding siblings ...)
  (?)
@ 2022-03-02  8:28 ` Patchwork
  -1 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2022-03-02  8:28 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx


== Series Details ==

Series: drm/i915/ttm: Evict and store of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/99759/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11308_full -> Patchwork_22455_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22455_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22455_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (13 -> 13)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22455_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_eio@reset-stress:
    - shard-snb:          [PASS][1] -> [TIMEOUT][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-snb2/igt@gem_eio@reset-stress.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-snb6/igt@gem_eio@reset-stress.html

  * {igt@gem_lmem_swapping@heavy-verify-multi-ccs} (NEW):
    - shard-iclb:         NOTRUN -> [SKIP][3] +4 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb1/igt@gem_lmem_swapping@heavy-verify-multi-ccs.html

  * {igt@gem_lmem_swapping@parallel-random-verify-ccs} (NEW):
    - shard-tglb:         NOTRUN -> [SKIP][4] +4 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@gem_lmem_swapping@parallel-random-verify-ccs.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@kms_plane_scaling@downscale-with-pixel-format-factor-0-5@pipe-a-edp-1-downscale-with-pixel-format}:
    - {shard-rkl}:        NOTRUN -> [SKIP][5] +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-rkl-6/igt@kms_plane_scaling@downscale-with-pixel-format-factor-0-5@pipe-a-edp-1-downscale-with-pixel-format.html

  
New tests
---------

  New tests have been introduced between CI_DRM_11308_full and Patchwork_22455_full:

### New IGT tests (22) ###

  * igt@gem_lmem_swapping@heavy-verify-multi-ccs:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  * igt@gem_lmem_swapping@heavy-verify-random-ccs:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  * igt@gem_lmem_swapping@parallel-random-verify-ccs:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  * igt@gem_lmem_swapping@verify-ccs:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  * igt@gem_lmem_swapping@verify-random-ccs:
    - Statuses : 7 skip(s)
    - Exec time: [0.0] s

  * igt@kms_flip@absolute-wf_vblank@d-hdmi-a3:
    - Statuses : 1 pass(s)
    - Exec time: [7.80] s

  * igt@kms_flip@nonexisting-fb-interruptible@d-hdmi-a3:
    - Statuses : 1 pass(s)
    - Exec time: [0.62] s

  * igt@kms_flip@wf_vblank-ts-check-interruptible@d-hdmi-a3:
    - Statuses : 1 pass(s)
    - Exec time: [8.07] s

  * igt@kms_plane_scaling@invalid-num-scalers@pipe-d-edp-1-invalid-num-scalers:
    - Statuses : 1 pass(s)
    - Exec time: [0.02] s

  * igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-d-edp-1-planes-downscale:
    - Statuses : 1 pass(s)
    - Exec time: [1.28] s

  * igt@kms_plane_scaling@planes-scaling-unity-scaling@pipe-d-edp-1-planes-unity-scaling:
    - Statuses : 1 pass(s)
    - Exec time: [1.28] s

  * igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-75@pipe-d-edp-1-planes-upscale-downscale:
    - Statuses : 1 pass(s)
    - Exec time: [1.28] s

  * igt@kms_plane_scaling@planes-upscale-20x20@pipe-d-edp-1-planes-upscale:
    - Statuses : 1 pass(s)
    - Exec time: [1.22] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-a-edp-1-planes-upscale-downscale:
    - Statuses : 3 pass(s)
    - Exec time: [0.13, 2.04] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-a-hdmi-a-1-planes-upscale-downscale:
    - Statuses : 1 pass(s)
    - Exec time: [0.34] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-a-vga-1-planes-upscale-downscale:
    - Statuses : 1 skip(s)
    - Exec time: [0.04] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-b-edp-1-planes-upscale-downscale:
    - Statuses : 3 pass(s)
    - Exec time: [1.23, 1.85] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-b-hdmi-a-2-planes-upscale-downscale:
    - Statuses : 1 pass(s)
    - Exec time: [0.34] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-b-vga-1-planes-upscale-downscale:
    - Statuses : 1 skip(s)
    - Exec time: [0.03] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-c-edp-1-planes-upscale-downscale:
    - Statuses : 2 pass(s) 1 skip(s)
    - Exec time: [0.39, 1.27] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-c-hdmi-a-1-planes-upscale-downscale:
    - Statuses : 1 skip(s)
    - Exec time: [0.09] s

  * igt@kms_plane_scaling@planes-upscale-factor-0-25-downscale-factor-0-75@pipe-d-edp-1-planes-upscale-downscale:
    - Statuses : 1 pass(s)
    - Exec time: [1.22] s

  

Known issues
------------

  Here are the changes found in Patchwork_22455_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@display-2x:
    - shard-tglb:         NOTRUN -> [SKIP][6] ([i915#1839])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb7/igt@feature_discovery@display-2x.html
    - shard-iclb:         NOTRUN -> [SKIP][7] ([i915#1839])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb2/igt@feature_discovery@display-2x.html

  * igt@gem_ctx_persistence@legacy-engines-queued:
    - shard-snb:          NOTRUN -> [SKIP][8] ([fdo#109271] / [i915#1099]) +2 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-snb5/igt@gem_ctx_persistence@legacy-engines-queued.html

  * igt@gem_ctx_sseu@mmap-args:
    - shard-tglb:         NOTRUN -> [SKIP][9] ([i915#280])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb2/igt@gem_ctx_sseu@mmap-args.html

  * igt@gem_eio@in-flight-1us:
    - shard-tglb:         [PASS][10] -> [TIMEOUT][11] ([i915#3063])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-tglb5/igt@gem_eio@in-flight-1us.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb1/igt@gem_eio@in-flight-1us.html

  * igt@gem_eio@in-flight-contexts-10ms:
    - shard-tglb:         NOTRUN -> [TIMEOUT][12] ([i915#3063])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@gem_eio@in-flight-contexts-10ms.html

  * igt@gem_exec_balancer@parallel-bb-first:
    - shard-tglb:         NOTRUN -> [DMESG-WARN][13] ([i915#5076])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb8/igt@gem_exec_balancer@parallel-bb-first.html

  * igt@gem_exec_balancer@parallel-out-fence:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][14] ([i915#5076])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl3/igt@gem_exec_balancer@parallel-out-fence.html

  * igt@gem_exec_endless@dispatch@bcs0:
    - shard-tglb:         [PASS][15] -> [INCOMPLETE][16] ([i915#3778])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-tglb7/igt@gem_exec_endless@dispatch@bcs0.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb3/igt@gem_exec_endless@dispatch@bcs0.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-skl:          NOTRUN -> [FAIL][17] ([i915#2846])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl4/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-skl:          NOTRUN -> [SKIP][18] ([fdo#109271]) +265 similar issues
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl1/igt@gem_exec_fair@basic-flow@rcs0.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-iclb:         [PASS][19] -> [FAIL][20] ([i915#2842])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-iclb4/igt@gem_exec_fair@basic-none-share@rcs0.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb6/igt@gem_exec_fair@basic-none-share@rcs0.html
    - shard-glk:          [PASS][21] -> [FAIL][22] ([i915#2842])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-glk5/igt@gem_exec_fair@basic-none-share@rcs0.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-glk1/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-kbl:          NOTRUN -> [FAIL][23] ([i915#2842]) +3 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl6/igt@gem_exec_fair@basic-none@vcs0.html
    - shard-tglb:         NOTRUN -> [FAIL][24] ([i915#2842]) +5 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb2/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_exec_fair@basic-none@vecs0:
    - shard-iclb:         NOTRUN -> [FAIL][25] ([i915#2842]) +3 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb8/igt@gem_exec_fair@basic-none@vecs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-kbl:          [PASS][26] -> [FAIL][27] ([i915#2842])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-kbl7/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl1/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_params@secure-non-root:
    - shard-tglb:         NOTRUN -> [SKIP][28] ([fdo#112283])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb2/igt@gem_exec_params@secure-non-root.html

  * igt@gem_lmem_swapping@heavy-random:
    - shard-iclb:         NOTRUN -> [SKIP][29] ([i915#4613]) +2 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb1/igt@gem_lmem_swapping@heavy-random.html
    - shard-glk:          NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#4613]) +1 similar issue
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-glk3/igt@gem_lmem_swapping@heavy-random.html

  * igt@gem_lmem_swapping@heavy-verify-random:
    - shard-kbl:          NOTRUN -> [SKIP][31] ([fdo#109271] / [i915#4613]) +1 similar issue
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl7/igt@gem_lmem_swapping@heavy-verify-random.html
    - shard-skl:          NOTRUN -> [SKIP][32] ([fdo#109271] / [i915#4613]) +4 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl9/igt@gem_lmem_swapping@heavy-verify-random.html
    - shard-tglb:         NOTRUN -> [SKIP][33] ([i915#4613]) +2 similar issues
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb6/igt@gem_lmem_swapping@heavy-verify-random.html

  * igt@gem_lmem_swapping@smem-oom:
    - shard-apl:          NOTRUN -> [SKIP][34] ([fdo#109271] / [i915#4613]) +1 similar issue
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl7/igt@gem_lmem_swapping@smem-oom.html

  * igt@gem_pxp@create-regular-context-1:
    - shard-iclb:         NOTRUN -> [SKIP][35] ([i915#4270]) +3 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb5/igt@gem_pxp@create-regular-context-1.html

  * igt@gem_pxp@reject-modify-context-protection-off-2:
    - shard-tglb:         NOTRUN -> [SKIP][36] ([i915#4270]) +3 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb1/igt@gem_pxp@reject-modify-context-protection-off-2.html

  * igt@gem_render_copy@y-tiled-mc-ccs-to-y-tiled-ccs:
    - shard-iclb:         NOTRUN -> [SKIP][37] ([i915#768]) +1 similar issue
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb7/igt@gem_render_copy@y-tiled-mc-ccs-to-y-tiled-ccs.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-skl:          NOTRUN -> [SKIP][38] ([fdo#109271] / [i915#3323])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl10/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@input-checking:
    - shard-apl:          NOTRUN -> [DMESG-WARN][39] ([i915#4991])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl4/igt@gem_userptr_blits@input-checking.html
    - shard-kbl:          NOTRUN -> [DMESG-WARN][40] ([i915#4991])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl6/igt@gem_userptr_blits@input-checking.html

  * igt@gem_userptr_blits@unsync-unmap-cycles:
    - shard-tglb:         NOTRUN -> [SKIP][41] ([i915#3297])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@gem_userptr_blits@unsync-unmap-cycles.html
    - shard-iclb:         NOTRUN -> [SKIP][42] ([i915#3297])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb1/igt@gem_userptr_blits@unsync-unmap-cycles.html

  * igt@gen3_render_tiledy_blits:
    - shard-tglb:         NOTRUN -> [SKIP][43] ([fdo#109289]) +4 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb1/igt@gen3_render_tiledy_blits.html

  * igt@gen7_exec_parse@batch-without-end:
    - shard-iclb:         NOTRUN -> [SKIP][44] ([fdo#109289]) +4 similar issues
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb7/igt@gen7_exec_parse@batch-without-end.html

  * igt@gen9_exec_parse@shadow-peek:
    - shard-tglb:         NOTRUN -> [SKIP][45] ([i915#2527] / [i915#2856]) +5 similar issues
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@gen9_exec_parse@shadow-peek.html

  * igt@gen9_exec_parse@unaligned-access:
    - shard-iclb:         NOTRUN -> [SKIP][46] ([i915#2856]) +1 similar issue
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb2/igt@gen9_exec_parse@unaligned-access.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-skl:          NOTRUN -> [FAIL][47] ([i915#454])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl9/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp:
    - shard-apl:          NOTRUN -> [SKIP][48] ([fdo#109271] / [i915#1937])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl7/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp.html

  * igt@i915_pm_rc6_residency@media-rc6-accuracy:
    - shard-tglb:         NOTRUN -> [SKIP][49] ([fdo#109289] / [fdo#111719])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb7/igt@i915_pm_rc6_residency@media-rc6-accuracy.html

  * igt@i915_pm_rc6_residency@rc6-idle:
    - shard-tglb:         NOTRUN -> [WARN][50] ([i915#2681] / [i915#2684])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb7/igt@i915_pm_rc6_residency@rc6-idle.html

  * igt@i915_pm_rpm@gem-execbuf-stress-pc8:
    - shard-iclb:         NOTRUN -> [SKIP][51] ([fdo#109293] / [fdo#109506])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb5/igt@i915_pm_rpm@gem-execbuf-stress-pc8.html

  * igt@i915_pm_sseu@full-enable:
    - shard-tglb:         NOTRUN -> [SKIP][52] ([i915#4387])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb3/igt@i915_pm_sseu@full-enable.html
    - shard-iclb:         NOTRUN -> [SKIP][53] ([i915#4387])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb2/igt@i915_pm_sseu@full-enable.html

  * igt@i915_selftest@live@hangcheck:
    - shard-snb:          [PASS][54] -> [INCOMPLETE][55] ([i915#3921])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-snb7/igt@i915_selftest@live@hangcheck.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-snb7/igt@i915_selftest@live@hangcheck.html

  * igt@i915_suspend@forcewake:
    - shard-kbl:          [PASS][56] -> [DMESG-WARN][57] ([i915#180]) +1 similar issue
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-kbl7/igt@i915_suspend@forcewake.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl6/igt@i915_suspend@forcewake.html

  * igt@kms_async_flips@crc:
    - shard-skl:          NOTRUN -> [FAIL][58] ([i915#4272])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl8/igt@kms_async_flips@crc.html

  * igt@kms_atomic@plane-primary-overlay-mutable-zpos:
    - shard-tglb:         NOTRUN -> [SKIP][59] ([i915#404])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@kms_atomic@plane-primary-overlay-mutable-zpos.html

  * igt@kms_atomic_transition@plane-all-modeset-transition:
    - shard-iclb:         NOTRUN -> [SKIP][60] ([i915#1769])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb1/igt@kms_atomic_transition@plane-all-modeset-transition.html
    - shard-tglb:         NOTRUN -> [SKIP][61] ([i915#1769])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb8/igt@kms_atomic_transition@plane-all-modeset-transition.html

  * igt@kms_big_fb@linear-8bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][62] ([fdo#111614]) +3 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb7/igt@kms_big_fb@linear-8bpp-rotate-270.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][63] ([fdo#110725] / [fdo#111614]) +2 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb2/igt@kms_big_fb@x-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][64] ([i915#3743])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl8/igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-async-flip.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip:
    - shard-apl:          NOTRUN -> [SKIP][65] ([fdo#109271] / [i915#3777]) +3 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl4/igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@y-tiled-32bpp-rotate-0:
    - shard-glk:          [PASS][66] -> [DMESG-WARN][67] ([i915#118])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-glk1/igt@kms_big_fb@y-tiled-32bpp-rotate-0.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-glk4/igt@kms_big_fb@y-tiled-32bpp-rotate-0.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][68] ([i915#3763])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl1/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-async-flip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip:
    - shard-kbl:          NOTRUN -> [SKIP][69] ([fdo#109271] / [i915#3777]) +4 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl6/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][70] ([fdo#111615]) +9 similar issues
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb8/igt@kms_big_fb@yf-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-skl:          NOTRUN -> [SKIP][71] ([fdo#109271] / [i915#3777]) +3 similar issues
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl9/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0:
    - shard-apl:          NOTRUN -> [SKIP][72] ([fdo#109271]) +206 similar issues
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl2/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0.html
    - shard-iclb:         NOTRUN -> [SKIP][73] ([fdo#110723]) +3 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb3/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0.html

  * igt@kms_big_joiner@basic:
    - shard-tglb:         NOTRUN -> [SKIP][74] ([i915#2705])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb2/igt@kms_big_joiner@basic.html
    - shard-iclb:         NOTRUN -> [SKIP][75] ([i915#2705])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb7/igt@kms_big_joiner@basic.html

  * igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_rc_ccs_cc:
    - shard-glk:          NOTRUN -> [SKIP][76] ([fdo#109271] / [i915#3886]) +5 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-glk6/igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_rc_ccs_cc.html
    - shard-iclb:         NOTRUN -> [SKIP][77] ([fdo#109278] / [i915#3886]) +5 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb3/igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_rc_ccs_cc.html
    - shard-apl:          NOTRUN -> [SKIP][78] ([fdo#109271] / [i915#3886]) +14 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl2/igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][79] ([i915#3689] / [i915#3886]) +5 similar issues
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb7/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][80] ([fdo#109271] / [i915#3886]) +9 similar issues
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl4/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][81] ([i915#3689]) +8 similar issues
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb7/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_ccs.html

  * igt@kms_ccs@pipe-b-bad-pixel-format-y_tiled_ccs:
    - shard-snb:          NOTRUN -> [SKIP][82] ([fdo#109271]) +160 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-snb7/igt@kms_ccs@pipe-b-bad-pixel-format-y_tiled_ccs.html

  * igt@kms_ccs@pipe-c-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][83] ([fdo#109271] / [i915#3886]) +10 similar issues
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl4/igt@kms_ccs@pipe-c-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-d-missing-ccs-buffer-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][84] ([fdo#111615] / [i915#3689]) +11 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb2/igt@kms_ccs@pipe-d-missing-ccs-buffer-yf_tiled_ccs.html

  * igt@kms_cdclk@plane-scaling:
    - shard-tglb:         NOTRUN -> [SKIP][85] ([i915#3742])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@kms_cdclk@plane-scaling.html

  * igt@kms_chamelium@dp-crc-multiple:
    - shard-apl:          NOTRUN -> [SKIP][86] ([fdo#109271] / [fdo#111827]) +20 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl4/igt@kms_chamelium@dp-crc-multiple.html

  * igt@kms_chamelium@dp-hpd-for-each-pipe:
    - shard-kbl:          NOTRUN -> [SKIP][87] ([fdo#109271] / [fdo#111827]) +22 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl7/igt@kms_chamelium@dp-hpd-for-each-pipe.html

  * igt@kms_chamelium@hdmi-hpd:
    - shard-glk:          NOTRUN -> [SKIP][88] ([fdo#109271] / [fdo#111827]) +11 similar issues
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-glk1/igt@kms_chamelium@hdmi-hpd.html

  * igt@kms_chamelium@vga-hpd-for-each-pipe:
    - shard-skl:          NOTRUN -> [SKIP][89] ([fdo#109271] / [fdo#111827]) +20 similar issues
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl6/igt@kms_chamelium@vga-hpd-for-each-pipe.html

  * igt@kms_color_chamelium@pipe-b-ctm-limited-range:
    - shard-tglb:         NOTRUN -> [SKIP][90] ([fdo#109284] / [fdo#111827]) +16 similar issues
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@kms_color_chamelium@pipe-b-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-c-ctm-green-to-red:
    - shard-snb:          NOTRUN -> [SKIP][91] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-snb6/igt@kms_color_chamelium@pipe-c-ctm-green-to-red.html

  * igt@kms_color_chamelium@pipe-c-gamma:
    - shard-iclb:         NOTRUN -> [SKIP][92] ([fdo#109284] / [fdo#111827]) +13 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb8/igt@kms_color_chamelium@pipe-c-gamma.html

  * igt@kms_color_chamelium@pipe-d-ctm-limited-range:
    - shard-iclb:         NOTRUN -> [SKIP][93] ([fdo#109278] / [fdo#109284] / [fdo#111827])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb2/igt@kms_color_chamelium@pipe-d-ctm-limited-range.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-tglb:         NOTRUN -> [SKIP][94] ([i915#1063]) +2 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb6/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@lic:
    - shard-apl:          NOTRUN -> [TIMEOUT][95] ([i915#1319]) +1 similar issue
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-apl7/igt@kms_content_protection@lic.html
    - shard-kbl:          NOTRUN -> [TIMEOUT][96] ([i915#1319])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl3/igt@kms_content_protection@lic.html

  * igt@kms_content_protection@type1:
    - shard-iclb:         NOTRUN -> [SKIP][97] ([fdo#109300] / [fdo#111066])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb4/igt@kms_content_protection@type1.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x32-sliding:
    - shard-tglb:         NOTRUN -> [SKIP][98] ([i915#3319]) +3 similar issues
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@kms_cursor_crc@pipe-b-cursor-32x32-sliding.html

  * igt@kms_cursor_crc@pipe-b-cursor-512x512-rapid-movement:
    - shard-iclb:         NOTRUN -> [SKIP][99] ([fdo#109278] / [fdo#109279]) +2 similar issues
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb4/igt@kms_cursor_crc@pipe-b-cursor-512x512-rapid-movement.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x512-random:
    - shard-tglb:         NOTRUN -> [SKIP][100] ([fdo#109279] / [i915#3359]) +8 similar issues
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb5/igt@kms_cursor_crc@pipe-c-cursor-512x512-random.html

  * igt@kms_cursor_crc@pipe-c-cursor-max-size-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][101] ([i915#3359]) +6 similar issues
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb3/igt@kms_cursor_crc@pipe-c-cursor-max-size-rapid-movement.html

  * igt@kms_cursor_edge_walk@pipe-c-256x256-bottom-edge:
    - shard-iclb:         NOTRUN -> [INCOMPLETE][102] ([i915#2295])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb4/igt@kms_cursor_edge_walk@pipe-c-256x256-bottom-edge.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - shard-tglb:         NOTRUN -> [SKIP][103] ([i915#4103]) +2 similar issues
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb6/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-atomic:
    - shard-tglb:         NOTRUN -> [SKIP][104] ([fdo#109274] / [fdo#111825]) +10 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-tglb2/igt@kms_cursor_legacy@cursora-vs-flipb-atomic.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-legacy:
    - shard-iclb:         NOTRUN -> [SKIP][105] ([fdo#109274] / [fdo#109278]) +3 similar issues
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb5/igt@kms_cursor_legacy@cursora-vs-flipb-legacy.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-skl:          [PASS][106] -> [FAIL][107] ([i915#2346] / [i915#533])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11308/shard-skl4/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-skl9/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@flip-vs-cursor-varying-size:
    - shard-iclb:         NOTRUN -> [FAIL][108] ([i915#2346])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb7/igt@kms_cursor_legacy@flip-vs-cursor-varying-size.html

  * igt@kms_cursor_legacy@pipe-d-single-move:
    - shard-iclb:         NOTRUN -> [SKIP][109] ([fdo#109278]) +46 similar issues
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-iclb3/igt@kms_cursor_legacy@pipe-d-single-move.html

  * igt@kms_cursor_legacy@pipe-d-torture-bo:
    - shard-kbl:          NOTRUN -> [SKIP][110] ([fdo#109271] / [i915#533]) +2 similar issues
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/shard-kbl4/igt@kms_cursor_legac

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22455/index.html


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/4] drm/ttm: parameter to add extra pages into ttm_tt
  2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
@ 2022-03-02 12:54     ` Thomas Hellström
  -1 siblings, 0 replies; 25+ messages in thread
From: Thomas Hellström @ 2022-03-02 12:54 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel; +Cc: Matthew Auld, Christian Koenig

On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote:
> When a driver needs extra pages in ttm_tt, to facilitate such a
> requirement, a parameter called "extra_pages" is added for
> ttm_tt_init.

nit: Please use imperative wording in commit title and description,
"Add a parameter to add extra pages.."

> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Christian Koenig <christian.koenig@amd.com>
> cc: Hellstrom Thomas <thomas.hellstrom@intel.com>

Otherwise LGTM.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>  drivers/gpu/drm/drm_gem_vram_helper.c      |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c    |  2 +-
>  drivers/gpu/drm/qxl/qxl_ttm.c              |  2 +-
>  drivers/gpu/drm/ttm/ttm_agp_backend.c      |  2 +-
>  drivers/gpu/drm/ttm/ttm_tt.c               | 12 +++++++-----
>  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
>  include/drm/ttm/ttm_tt.h                   |  4 +++-
>  7 files changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c
> b/drivers/gpu/drm/drm_gem_vram_helper.c
> index dc7f938bfff2..123045b58fec 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -867,7 +867,7 @@ static struct ttm_tt
> *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo,
>         if (!tt)
>                 return NULL;
>  
> -       ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);
> +       ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
>         if (ret < 0)
>                 goto err_ttm_tt_init;
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 45cc5837ce00..1a8262f5f692 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -283,7 +283,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct
> ttm_buffer_object *bo,
>                 i915_tt->is_shmem = true;
>         }
>  
> -       ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
> +       ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
>         if (ret)
>                 goto err_free;
>  
> diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c
> b/drivers/gpu/drm/qxl/qxl_ttm.c
> index b2e33d5ba5d0..52156b54498f 100644
> --- a/drivers/gpu/drm/qxl/qxl_ttm.c
> +++ b/drivers/gpu/drm/qxl/qxl_ttm.c
> @@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct
> ttm_buffer_object *bo,
>         ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
>         if (ttm == NULL)
>                 return NULL;
> -       if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
> +       if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
>                 kfree(ttm);
>                 return NULL;
>         }
> diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c
> b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> index 6ddc16f0fe2b..d27691f2e451 100644
> --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
> +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> @@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct
> ttm_buffer_object *bo,
>         agp_be->mem = NULL;
>         agp_be->bridge = bridge;
>  
> -       if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {
> +       if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
>                 kfree(agp_be);
>                 return NULL;
>         }
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> b/drivers/gpu/drm/ttm/ttm_tt.c
> index d234aab800a0..1a66d9fc589a 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev,
> struct ttm_tt *ttm)
>  static void ttm_tt_init_fields(struct ttm_tt *ttm,
>                                struct ttm_buffer_object *bo,
>                                uint32_t page_flags,
> -                              enum ttm_caching caching)
> +                              enum ttm_caching caching,
> +                              unsigned long extra_pages)
>  {
> -       ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
> +       ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>         ttm->caching = ttm_cached;
>         ttm->page_flags = page_flags;
>         ttm->dma_address = NULL;
> @@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt
> *ttm,
>  }
>  
>  int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> -               uint32_t page_flags, enum ttm_caching caching)
> +               uint32_t page_flags, enum ttm_caching caching,
> +               unsigned long extra_pages)
>  {
> -       ttm_tt_init_fields(ttm, bo, page_flags, caching);
> +       ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
>  
>         if (ttm_tt_alloc_page_directory(ttm)) {
>                 pr_err("Failed allocating page table\n");
> @@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct
> ttm_buffer_object *bo,
>  {
>         int ret;
>  
> -       ttm_tt_init_fields(ttm, bo, page_flags, caching);
> +       ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
>  
>         if (page_flags & TTM_TT_FLAG_EXTERNAL)
>                 ret = ttm_sg_tt_alloc_page_directory(ttm);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> index b84ecc6d6611..4e3938e62c08 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> @@ -517,7 +517,7 @@ static struct ttm_tt *vmw_ttm_tt_create(struct
> ttm_buffer_object *bo,
>                                      ttm_cached);
>         else
>                 ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
> -                                 ttm_cached);
> +                                 ttm_cached, 0);
>         if (unlikely(ret != 0))
>                 goto out_no_init;
>  
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index f20832139815..17a0310e8aaa 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -140,6 +140,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo,
> bool zero_alloc);
>   * @bo: The buffer object we create the ttm for.
>   * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
>   * @caching: the desired caching state of the pages
> + * @extra_pages: Extra pages needed for the driver.
>   *
>   * Create a struct ttm_tt to back data with system memory pages.
>   * No pages are actually allocated.
> @@ -147,7 +148,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo,
> bool zero_alloc);
>   * NULL: Out of memory.
>   */
>  int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> -               uint32_t page_flags, enum ttm_caching caching);
> +               uint32_t page_flags, enum ttm_caching caching,
> +               unsigned long extra_pages);
>  int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
>                    uint32_t page_flags, enum ttm_caching caching);
>  



^ permalink raw reply	[flat|nested] 25+ messages in thread
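
For context, the in-tree user of a non-zero extra_pages argument is
patch 3/4 later in this thread. Condensed from that patch, the call
pattern for a driver that needs extra backing pages (here for flat-CCS
data) is:

	unsigned long ccs_pages = 0;

	if (HAS_FLAT_CCS(i915) && i915_gem_object_has_lmem_placement(obj))
		/* one CCS byte is needed per 256 bytes of object payload */
		ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size,
						      NUM_BYTES_PER_CCS_BYTE),
					 PAGE_SIZE);

	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, ccs_pages);

All other converted callers simply pass 0 and keep their current
behaviour.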

* Re: [PATCH v2 3/4] drm/i915/gem: Extra pages in ttm_tt for ccs data
  2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
@ 2022-03-02 12:58     ` Thomas Hellström
  -1 siblings, 0 replies; 25+ messages in thread
From: Thomas Hellström @ 2022-03-02 12:58 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel; +Cc: Matthew Auld, Christian Koenig

On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote:
> On Xe-HP and later devices, we use dedicated compression control
> state (CCS) stored in local memory for each surface, to support the
> 3D and media compression formats.
> 
> The memory required for the CCS of the entire local memory is 1/256 of
> the local memory size. So before the kernel boot, the required memory
> is reserved for the CCS data and a secure register will be programmed
> with the CCS base address.
> 
> So when we allocate an object in local memory we don't need to
> explicitly allocate the space for the ccs data. But when we evict the
> obj into smem, to hold the compression related data along with the obj
> we need smem space of obj_size + (obj_size/256).
> 
> Hence when we create smem for an obj with an lmem placement
> possibility, we create it with the extra space.

Nit: Again, imperative wording.


> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Christian Koenig <christian.koenig@amd.com>
> cc: Hellstrom Thomas <thomas.hellstrom@intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 1a8262f5f692..c7a36861c38d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -20,6 +20,7 @@
>  #include "gem/i915_gem_ttm.h"
>  #include "gem/i915_gem_ttm_move.h"
>  #include "gem/i915_gem_ttm_pm.h"
> +#include "gt/intel_gpu_commands.h"
>  
>  #define I915_TTM_PRIO_PURGE     0
>  #define I915_TTM_PRIO_NO_PAGES  1
> @@ -255,12 +256,27 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = {
>         .release = i915_ttm_tt_release
>  };
>  
> +static inline bool
> +i915_gem_object_has_lmem_placement(struct drm_i915_gem_object *obj)
> +{
> +       int i;
> +
> +       for (i = 0; i < obj->mm.n_placements; i++)
> +               if (obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
> +                       return true;
> +
> +       return false;
> +}
> +
>  static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>                                          uint32_t page_flags)
>  {
> +       struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
> +                                                    bdev);
>         struct ttm_resource_manager *man =
>                 ttm_manager_type(bo->bdev, bo->resource->mem_type);
>         struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> +       unsigned long ccs_pages = 0;
>         enum ttm_caching caching;
>         struct i915_ttm_tt *i915_tt;
>         int ret;
> @@ -283,7 +299,12 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>                 i915_tt->is_shmem = true;
>         }
>  
> -       ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
> +       if (HAS_FLAT_CCS(i915) && i915_gem_object_has_lmem_placement(obj))
> +               ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size,
> +                                                     NUM_BYTES_PER_CCS_BYTE),
> +                                        PAGE_SIZE);
> +
> +       ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, ccs_pages);
>         if (ret)
>                 goto err_free;
>  



^ permalink raw reply	[flat|nested] 25+ messages in thread
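
The ccs_pages computation in the hunk above can be checked in isolation: one
CCS byte covers NUM_BYTES_PER_CCS_BYTE (256) main-memory bytes, and the
result is rounded up to whole pages. A standalone sketch of the same
two-level DIV_ROUND_UP (assuming 4 KiB pages; not kernel code):

#include <stdio.h>

#define PAGE_SIZE              4096UL
#define NUM_BYTES_PER_CCS_BYTE 256
#define DIV_ROUND_UP(n, d)     (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long bo_size = 16UL << 20; /* 16 MiB lmem object */

	/* 16 MiB of main memory needs 64 KiB of CCS ... */
	unsigned long ccs_bytes = DIV_ROUND_UP(bo_size, NUM_BYTES_PER_CCS_BYTE);
	/* ... which rounds up to 16 extra 4 KiB pages in the ttm_tt. */
	unsigned long ccs_pages = DIV_ROUND_UP(ccs_bytes, PAGE_SIZE);

	printf("ccs_bytes = %lu, ccs_pages = %lu\n", ccs_bytes, ccs_pages);
	return 0;
}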

* Re: [PATCH v2 2/4] drm/ttm: parameter to add extra pages into ttm_tt
  2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
@ 2022-03-02 13:24     ` Christian König
  -1 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2022-03-02 13:24 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

On 01.03.22 at 22:53, Ramalingam C wrote:
> When a driver needs extra pages in ttm_tt, to facilitate such a
> requirement, a parameter called "extra_pages" is added to
> ttm_tt_init.
>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Christian Koenig <christian.koenig@amd.com>
> cc: Hellstrom Thomas <thomas.hellstrom@intel.com>

With the nits pointed out by Thomas the patch is Reviewed-by: Christian 
König <christian.koenig@amd.com> as well.

Let me know through which branch you want to push this upstream (i915 or 
drm-misc-next).

Thanks,
Christian.

> ---
>   drivers/gpu/drm/drm_gem_vram_helper.c      |  2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c    |  2 +-
>   drivers/gpu/drm/qxl/qxl_ttm.c              |  2 +-
>   drivers/gpu/drm/ttm/ttm_agp_backend.c      |  2 +-
>   drivers/gpu/drm/ttm/ttm_tt.c               | 12 +++++++-----
>   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
>   include/drm/ttm/ttm_tt.h                   |  4 +++-
>   7 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
> index dc7f938bfff2..123045b58fec 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -867,7 +867,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo,
>   	if (!tt)
>   		return NULL;
>   
> -	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);
> +	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
>   	if (ret < 0)
>   		goto err_ttm_tt_init;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 45cc5837ce00..1a8262f5f692 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -283,7 +283,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>   		i915_tt->is_shmem = true;
>   	}
>   
> -	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
> +	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
>   	if (ret)
>   		goto err_free;
>   
> diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
> index b2e33d5ba5d0..52156b54498f 100644
> --- a/drivers/gpu/drm/qxl/qxl_ttm.c
> +++ b/drivers/gpu/drm/qxl/qxl_ttm.c
> @@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
>   	ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
>   	if (ttm == NULL)
>   		return NULL;
> -	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
> +	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
>   		kfree(ttm);
>   		return NULL;
>   	}
> diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> index 6ddc16f0fe2b..d27691f2e451 100644
> --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
> +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> @@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo,
>   	agp_be->mem = NULL;
>   	agp_be->bridge = bridge;
>   
> -	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {
> +	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
>   		kfree(agp_be);
>   		return NULL;
>   	}
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index d234aab800a0..1a66d9fc589a 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
>   static void ttm_tt_init_fields(struct ttm_tt *ttm,
>   			       struct ttm_buffer_object *bo,
>   			       uint32_t page_flags,
> -			       enum ttm_caching caching)
> +			       enum ttm_caching caching,
> +			       unsigned long extra_pages)
>   {
> -	ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
> +	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>   	ttm->caching = ttm_cached;
>   	ttm->page_flags = page_flags;
>   	ttm->dma_address = NULL;
> @@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>   }
>   
>   int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> -		uint32_t page_flags, enum ttm_caching caching)
> +		uint32_t page_flags, enum ttm_caching caching,
> +		unsigned long extra_pages)
>   {
> -	ttm_tt_init_fields(ttm, bo, page_flags, caching);
> +	ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
>   
>   	if (ttm_tt_alloc_page_directory(ttm)) {
>   		pr_err("Failed allocating page table\n");
> @@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
>   {
>   	int ret;
>   
> -	ttm_tt_init_fields(ttm, bo, page_flags, caching);
> +	ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
>   
>   	if (page_flags & TTM_TT_FLAG_EXTERNAL)
>   		ret = ttm_sg_tt_alloc_page_directory(ttm);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> index b84ecc6d6611..4e3938e62c08 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> @@ -517,7 +517,7 @@ static struct ttm_tt *vmw_ttm_tt_create(struct ttm_buffer_object *bo,
>   				     ttm_cached);
>   	else
>   		ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
> -				  ttm_cached);
> +				  ttm_cached, 0);
>   	if (unlikely(ret != 0))
>   		goto out_no_init;
>   
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index f20832139815..17a0310e8aaa 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -140,6 +140,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
>    * @bo: The buffer object we create the ttm for.
>    * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
>    * @caching: the desired caching state of the pages
> + * @extra_pages: Extra pages needed for the driver.
>    *
>    * Create a struct ttm_tt to back data with system memory pages.
>    * No pages are actually allocated.
> @@ -147,7 +148,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
>    * NULL: Out of memory.
>    */
>   int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> -		uint32_t page_flags, enum ttm_caching caching);
> +		uint32_t page_flags, enum ttm_caching caching,
> +		unsigned long extra_pages);
>   int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
>   		   uint32_t page_flags, enum ttm_caching caching);
>   


^ permalink raw reply	[flat|nested] 25+ messages in thread
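
For a driver-side view of the interface change, a hedged sketch of a
ttm_tt_create() backend that reserves one trailing driver-private page; the
"foo" driver and its metadata page are hypothetical, only the ttm_tt_init()
signature is taken from the patch:

#include <drm/ttm/ttm_tt.h>
#include <linux/slab.h>

static struct ttm_tt *foo_ttm_tt_create(struct ttm_buffer_object *bo,
					uint32_t page_flags)
{
	struct ttm_tt *tt;

	tt = kzalloc(sizeof(*tt), GFP_KERNEL);
	if (!tt)
		return NULL;

	/* The new last argument asks TTM to back extra trailing pages. */
	if (ttm_tt_init(tt, bo, page_flags, ttm_cached, 1)) {
		kfree(tt);
		return NULL;
	}

	return tt;
}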

* Re: [Intel-gfx] [PATCH v2 0/4] drm/i915/ttm: Evict and store of compressed object
  2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
                   ` (8 preceding siblings ...)
  (?)
@ 2022-03-02 15:31 ` Das, Nirmoy
  -1 siblings, 0 replies; 25+ messages in thread
From: Das, Nirmoy @ 2022-03-02 15:31 UTC (permalink / raw)
  To: intel-gfx

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> for the series as well.

On 01/03/2022 22:53, Ramalingam C wrote:
> On Xe-HP and later devices, we use dedicated compression control
> state (CCS) stored in local memory for each surface, to support
> the 3D and media compression formats.
>
> The memory required for the CCS of the entire local memory is
> 1/256 of the local memory size. So before the kernel
> boot, the required memory is reserved for the CCS data and a
> secure register will be programmed with the CCS base address.
>
> So when we allocate an object in local memory we don't need to explicitly
> allocate the space for the ccs data. But when we evict the obj into smem,
> to hold the compression related data along with the obj we need smem
> space of obj_size + (obj_size/256).
>
> Hence when we create smem for an obj with an lmem placement possibility,
> we create it with the extra space.
>
> When we are swapping out a local memory obj on a flat-ccs capable platform,
> we need to capture the ccs data too along with the main memory, and we need
> to restore it when we are swapping in the content.
>
> When an lmem object is swapped into a smem obj, the smem obj will
> have the extra pages required to hold the ccs data corresponding to the
> lmem main memory. So the main memory of the lmem is copied into the initial
> pages of the smem, and then the ccs data corresponding to the main memory
> is copied into the subsequent pages of the smem.
>
> Swapin happens in exactly the reverse order. First the main memory of the
> lmem is restored from the smem's initial pages, and then the ccs data is
> restored from the subsequent pages of the smem.
>
> Extracting and restoring the CCS data is done through a special cmd
> called XY_CTRL_SURF_COPY_BLT.
>
> Test-with: 20220301212513.30772-1-ramalingam.c@intel.com
>
> Ayaz A Siddiqui (1):
>    drm/i915/gt: Clear compress metadata for Xe_HP platforms
>
> Ramalingam C (3):
>    drm/ttm: parameter to add extra pages into ttm_tt
>    drm/i915/gem: Extra pages in ttm_tt for ccs data
>    drm/i915/migrate: Evict and restore the flatccs capable lmem obj
>
>   drivers/gpu/drm/drm_gem_vram_helper.c        |   2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  23 +-
>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 +
>   drivers/gpu/drm/i915/gt/intel_migrate.c      | 327 +++++++++++++++++--
>   drivers/gpu/drm/qxl/qxl_ttm.c                |   2 +-
>   drivers/gpu/drm/ttm/ttm_agp_backend.c        |   2 +-
>   drivers/gpu/drm/ttm/ttm_tt.c                 |  12 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c   |   2 +-
>   include/drm/ttm/ttm_tt.h                     |   4 +-
>   9 files changed, 357 insertions(+), 32 deletions(-)
>

^ permalink raw reply	[flat|nested] 25+ messages in thread
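
The swap-out layout the cover letter describes - main memory in the leading
pages, CCS data appended behind it - reduces to a little offset arithmetic.
A standalone sketch (illustrative names, 4 KiB pages assumed; not code from
the series):

#include <stdio.h>

#define PAGE_SIZE              4096UL
#define NUM_BYTES_PER_CCS_BYTE 256
#define DIV_ROUND_UP(n, d)     (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long lmem_size = 8UL << 20; /* 8 MiB lmem object */
	unsigned long main_pages = lmem_size / PAGE_SIZE;
	unsigned long ccs_pages =
		DIV_ROUND_UP(DIV_ROUND_UP(lmem_size, NUM_BYTES_PER_CCS_BYTE),
			     PAGE_SIZE);

	/* Swap-out fills the head of the smem object with main memory ... */
	printf("main memory: smem pages [0, %lu)\n", main_pages);
	/* ... and appends the CCS data in the trailing pages. */
	printf("ccs data:    smem pages [%lu, %lu)\n",
	       main_pages, main_pages + ccs_pages);
	return 0; /* swap-in copies the same two ranges back the other way */
}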

* Re: [PATCH v2 1/4] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
@ 2022-03-03  7:39     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 25+ messages in thread
From: Hellstrom, Thomas @ 2022-03-03  7:39 UTC (permalink / raw)
  To: dri-devel, C, Ramalingam, intel-gfx; +Cc: Siddiqui, Ayaz A, Auld, Matthew

On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote:
> From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> 
> Xe-HP and later devices support Flat CCS, which reserves a portion of
> the device memory to store compression metadata. During the clearing
> of a device memory buffer object we also need to clear the associated
> CCS buffer.
> 
> Flat CCS memory cannot be directly accessed by S/W.
> The address of the CCS buffer associated with the main BO is calculated
> automatically by the device itself. KMD/UMD can only access this buffer
> indirectly, using the XY_CTRL_SURF_COPY_BLT cmd via the address of the
> device memory buffer.
> 
> v2: Fixed issues with platform naming [Lucas]
> v3: Rebased [Ram]
>     Used the round_up funcs [Bob]
> v4: Fixed ccs blk calculation [Ram]
>     Added Kdoc on flat-ccs.
> v5: GENMASK is used [Matt]
>     mocs fix [Matt]
>     Comments Fix [Matt]
>     Flush address programming [Ram]
> v6: FLUSH_DW is fixed
>     Few coding style fix
> 
> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
>  drivers/gpu/drm/i915/gt/intel_migrate.c      | 143 ++++++++++++++++++-
>  2 files changed, 154 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index f8253012d166..237c1baccc64 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -203,6 +203,21 @@
>  #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
>  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
>  
> +#define XY_CTRL_SURF_INSTR_SIZE        5
> +#define MI_FLUSH_DW_SIZE               3
> +#define XY_CTRL_SURF_COPY_BLT          ((2 << 29) | (0x48 << 22) | 3)
> +#define   SRC_ACCESS_TYPE_SHIFT        21
> +#define   DST_ACCESS_TYPE_SHIFT        20
> +#define   CCS_SIZE_MASK                GENMASK(17, 8)
> +#define   XY_CTRL_SURF_MOCS_MASK       GENMASK(31, 25)
> +#define   NUM_CCS_BYTES_PER_BLOCK      256
> +#define   NUM_BYTES_PER_CCS_BYTE       256
> +#define   NUM_CCS_BLKS_PER_XFER        1024
> +#define   INDIRECT_ACCESS              0
> +#define   DIRECT_ACCESS                1
> +#define  MI_FLUSH_LLC                  BIT(9)
> +#define  MI_FLUSH_CCS                  BIT(16)
> +
>  #define COLOR_BLT_CMD                  (2 << 29 | 0x40 << 22 | (5 - 2))
>  #define XY_COLOR_BLT_CMD               (2 << 29 | 0x50 << 22)
>  #define SRC_COPY_BLT_CMD               (2 << 29 | 0x43 << 22)
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 20444d6ceb3c..330fcdc3e0cf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -16,6 +16,8 @@ struct insert_pte_data {
>  };
>  
>  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> +#define GET_CCS_BYTES(i915, size)      (HAS_FLAT_CCS(i915) ? \
> +                                        DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
>  
>  static bool engine_supports_migration(struct intel_engine_cs *engine)
>  {
> @@ -467,6 +469,110 @@ static bool wa_1209644611_applies(int ver, u32 size)
>         return height % 4 == 3 && height <= 8;
>  }
>  
> +/**
> + * DOC: Flat-CCS - Memory compression for Local memory
> + *
> + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> + * stored in local memory for each surface, to support the 3D and media
> + * compression formats.
> + *
> + * The memory required for the CCS of the entire local memory is 1/256 of the
> + * local memory size. So before the kernel boot, the required memory is reserved
> + * for the CCS data and a secure register will be programmed with the CCS base
> + * address.
> + *
> + * Flat CCS data needs to be cleared when an lmem object is allocated.
> + * And CCS data can be copied in and out of the CCS region through
> + * XY_CTRL_SURF_COPY_BLT. The CPU can't access the CCS data directly.
> + *
> + * When we exhaust the lmem, if the object's placements support smem, then we
> + * can directly decompress the compressed lmem object into smem and start using
> + * it from smem itself.
> + *
> + * But when we need to swap out the compressed lmem object into a smem region
> + * though the object's placement doesn't support smem, then we copy the lmem
> + * content as it is into the smem region along with the ccs data (using
> + * XY_CTRL_SURF_COPY_BLT). When the object is referenced, the lmem content will
> + * be swapped in along with restoration of the CCS data (using
> + * XY_CTRL_SURF_COPY_BLT) at the corresponding location.
> + */
> +
> +static inline u32 *i915_flush_dw(u32 *cmd, u32 flags)
> +{
> +       *cmd++ = MI_FLUSH_DW | flags;
> +       *cmd++ = 0;
> +       *cmd++ = 0;
> +
> +       return cmd;
> +}
> +
> +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> +{
> +       u32 num_cmds, num_blks, total_size;
> +
> +       if (!GET_CCS_BYTES(i915, size))
> +               return 0;
> +
> +       /*
> +        * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> +        * blocks. One XY_CTRL_SURF_COPY_BLT command can
> +        * transfer up to 1024 blocks.
> +        */
> +       num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +                               NUM_CCS_BYTES_PER_BLOCK);
> +       num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
> +       total_size = XY_CTRL_SURF_INSTR_SIZE * num_cmds;
> +
> +       /*
> +        * Adding a flush before and after the XY_CTRL_SURF_COPY_BLT
> +        */
> +       total_size += 2 * MI_FLUSH_DW_SIZE;
> +
> +       return total_size;
> +}
> +

Since we should always interleave the ctrl_surf_copy_blt() on max
CHUNK_SZ pieces of LMEM (See also patch 4/4), I figure we would never
need to split the command since it can do 64M worth of LMEM in a single
command vs a CHUNK_SZ of 8M. Instead perhaps an assert that CHUNK_SZ
never exceeds the capability of the XY_CTRL_SURF_COPY_BLT?

Also I think it's important that we try to figure out whether we can
use the XY_FAST_COLOR_BLT command to also clear the CCS on DG2. That
would save us a lot of code, and at least on DG1 (without CCS) it
speeds up clearing significantly.

/Thomas

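A sketch of the suggested assert, using the constants from the patch: one
XY_CTRL_SURF_COPY_BLT covers 1024 blocks x 256 CCS bytes x 256 main-memory
bytes per CCS byte = 64M of LMEM, so an 8M CHUNK_SZ always fits in a single
command (standalone C11 sketch, not a change from the series):

#define NUM_CCS_BLKS_PER_XFER   1024
#define NUM_CCS_BYTES_PER_BLOCK 256
#define NUM_BYTES_PER_CCS_BYTE  256
#define CHUNK_SZ                (8 << 20)

#define MAX_MAIN_BYTES_PER_CTRL_SURF_BLT \
	((unsigned long long)NUM_CCS_BLKS_PER_XFER * \
	 NUM_CCS_BYTES_PER_BLOCK * NUM_BYTES_PER_CCS_BYTE)

/* One control-surface blit must always cover a whole chunk. */
_Static_assert(CHUNK_SZ <= MAX_MAIN_BYTES_PER_CTRL_SURF_BLT,
	       "CHUNK_SZ exceeds what one XY_CTRL_SURF_COPY_BLT can cover");
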
> +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> +                                    u8 src_mem_access, u8 dst_mem_access,
> +                                    int src_mocs, int dst_mocs,
> +                                    u32 ccs_blocks)
> +{
> +       /*
> +        * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> +        * data in and out of the CCS region.
> +        *
> +        * We can copy at most 1024 blocks of 256 bytes using one
> +        * XY_CTRL_SURF_COPY_BLT instruction.
> +        *
> +        * In case we need to copy more than 1024 blocks, we need to add
> +        * another instruction to the same batch buffer.
> +        *
> +        * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> +        *
> +        * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> +        */
> +       do {
> +               int blks_per_copy;
> +
> +               blks_per_copy = ccs_blocks >= NUM_CCS_BLKS_PER_XFER ?
> +                               NUM_CCS_BLKS_PER_XFER : ccs_blocks;
> +               *cmd++ = XY_CTRL_SURF_COPY_BLT |
> +                        src_mem_access << SRC_ACCESS_TYPE_SHIFT |
> +                        dst_mem_access << DST_ACCESS_TYPE_SHIFT |
> +                        FIELD_PREP(CCS_SIZE_MASK, blks_per_copy - 1);
> +               *cmd++ = lower_32_bits(src_addr);
> +               *cmd++ = (upper_32_bits(src_addr) & 0xFFFF) |
> +                         FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, src_mocs);
> +               *cmd++ = lower_32_bits(dst_addr);
> +               *cmd++ = (upper_32_bits(dst_addr) & 0xFFFF) |
> +                         FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, dst_mocs);
> +               src_addr += SZ_64M;
> +               dst_addr += SZ_64M;
> +               ccs_blocks -= blks_per_copy;
> +       } while (ccs_blocks > 0);
> +
> +       return cmd;
> +}
> +
>  static int emit_copy(struct i915_request *rq,
>                      u32 dst_offset, u32 src_offset, int size)
>  {
> @@ -614,16 +720,24 @@ intel_context_migrate_copy(struct intel_context *ce,
>         return err;
>  }
>  
> -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> +                     u32 value, bool is_lmem)
>  {
> -       const int ver = GRAPHICS_VER(rq->engine->i915);
> +       struct drm_i915_private *i915 = rq->engine->i915;
> +       const int ver = GRAPHICS_VER(i915);
> +       u32 num_ccs_blks, ccs_ring_size;
> +       int mocs = rq->engine->gt->mocs.uc_index << 1;
>         u32 *cs;
>  
>         GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
>  
>         offset += (u64)rq->engine->instance << 32;
>  
> -       cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> +       /* Clear CCS only when value is 0 */
> +       ccs_ring_size = (is_lmem && !value) ?
> +                        calc_ctrl_surf_instr_size(i915, size) : 0;
> +
> +       cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
>         if (IS_ERR(cs))
>                 return PTR_ERR(cs);
>  
> @@ -646,6 +760,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
>                 *cs++ = value;
>         }
>  
> +       if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
> +               num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +                                           NUM_CCS_BYTES_PER_BLOCK);
> +
> +               /*
> +                * Flat CCS surface can only be accessed via
> +                * XY_CTRL_SURF_COPY_BLT CMD and using indirect
> +                * mapping of associated LMEM.
> +                * We can clear ccs surface by writing all 0s,
> +                * so we will flush the previously cleared buffer
> +                * and use it as a source.
> +                */
> +               cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +               cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
> +                                             DIRECT_ACCESS, INDIRECT_ACCESS,
> +                                             mocs, mocs, num_ccs_blks);
> +               cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +
> +               if (ccs_ring_size & 1)
> +                       *cs++ = MI_NOOP;
> +       }
>         intel_ring_advance(rq, cs);
>         return 0;
>  }
> @@ -711,7 +846,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>                 if (err)
>                         goto out_rq;
>  
> -               err = emit_clear(rq, offset, len, value);
> +               err = emit_clear(rq, offset, len, value, is_lmem);
>  
>                 /* Arbitration is re-enabled between requests. */
>  out_rq:


^ permalink raw reply	[flat|nested] 25+ messages in thread
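
For a concrete feel of calc_ctrl_surf_instr_size() in the patch above: an 8M
chunk needs 32 KiB of CCS, i.e. 128 blocks, so one XY_CTRL_SURF_COPY_BLT
(5 dwords) plus the two MI_FLUSH_DWs (3 dwords each) gives 11 ring dwords,
which the emission paths then round up to an even count. A standalone sketch
of that arithmetic (not kernel code):

#include <stdio.h>

#define XY_CTRL_SURF_INSTR_SIZE 5
#define MI_FLUSH_DW_SIZE        3
#define NUM_CCS_BYTES_PER_BLOCK 256
#define NUM_BYTES_PER_CCS_BYTE  256
#define NUM_CCS_BLKS_PER_XFER   1024
#define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int size = 8U << 20; /* one CHUNK_SZ worth of lmem */
	unsigned int ccs_bytes = DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE);
	unsigned int num_blks = DIV_ROUND_UP(ccs_bytes, NUM_CCS_BYTES_PER_BLOCK);
	unsigned int num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
	unsigned int dwords = XY_CTRL_SURF_INSTR_SIZE * num_cmds +
			      2 * MI_FLUSH_DW_SIZE;

	/* 32768 CCS bytes -> 128 blocks -> 1 command -> 5 + 6 = 11 dwords */
	printf("ccs_bytes=%u blks=%u cmds=%u dwords=%u\n",
	       ccs_bytes, num_blks, num_cmds, dwords);
	return 0;
}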

* Re: [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj
  2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
@ 2022-03-03  8:04     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 25+ messages in thread
From: Hellstrom, Thomas @ 2022-03-03  8:04 UTC (permalink / raw)
  To: dri-devel, C, Ramalingam, intel-gfx; +Cc: Auld, Matthew

On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote:
> When we are swapping out a local memory obj on a flat-ccs capable
> platform, we need to capture the ccs data too along with the main
> memory, and we need to restore it when we are swapping in the content.
> 
> When an lmem object is swapped into a smem obj, the smem obj will
> have the extra pages required to hold the ccs data corresponding to
> the lmem main memory. So the main memory of the lmem is copied into
> the initial pages of the smem, and then the ccs data corresponding to
> that main memory is copied into the subsequent pages of the smem. The
> ccs data is 1/256 of the lmem size.
> 
> Swapin happens in exactly the reverse order. First the main memory of
> the lmem is restored from the smem's initial pages, and then the ccs
> data is restored from the subsequent pages of the smem.
> 
> Extracting and restoring the CCS data is done through a special cmd
> called XY_CTRL_SURF_COPY_BLT.
> 
> v2: Fixing the ccs handling
> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c | 184 +++++++++++++++++++++---
>  1 file changed, 167 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 330fcdc3e0cf..73ac7382aeb6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -341,12 +341,9 @@ static int emit_no_arbitration(struct i915_request *rq)
>         return 0;
>  }
>  
> -static int emit_pte(struct i915_request *rq,
> -                   struct sgt_dma *it,
> +static int emit_pte(struct i915_request *rq, struct sgt_dma *it,
>                     enum i915_cache_level cache_level,
> -                   bool is_lmem,
> -                   u64 offset,
> -                   int length)
> +                   bool is_lmem, u64 offset, int length)

Above change seems unrelated?

>  {
>         bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
>         const u64 encode = rq->context->vm->pte_encode(0, cache_level,
> @@ -573,14 +570,54 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
>         return cmd;
>  }
>  
> +static int emit_ccs_copy(struct i915_request *rq,
> +                        bool dst_is_lmem, u32 dst_offset,
> +                        bool src_is_lmem, u32 src_offset, int size)
> +{
> +       struct drm_i915_private *i915 = rq->engine->i915;
> +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> +       u32 num_ccs_blks, ccs_ring_size;
> +       u8 src_access, dst_access;
> +       u32 *cs;
> +
> +       GEM_BUG_ON(!(src_is_lmem ^ dst_is_lmem) || !HAS_FLAT_CCS(i915));
> +
> +       ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
> +       WARN_ON(!ccs_ring_size);
> +
> +       cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
> +       if (IS_ERR(cs))
> +               return PTR_ERR(cs);
> +
> +       num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +                                   NUM_CCS_BYTES_PER_BLOCK);
> +
> +       src_access = !src_is_lmem && dst_is_lmem;
> +       dst_access = !src_access;
> +
> +       cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +       cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
> +                                     src_access, dst_access,
> +                                     mocs, mocs, num_ccs_blks);
> +       cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +       if (ccs_ring_size & 1)
> +               *cs++ = MI_NOOP;
> +
> +       intel_ring_advance(rq, cs);
> +
> +       return 0;
> +}
> +
>  static int emit_copy(struct i915_request *rq,
> -                    u32 dst_offset, u32 src_offset, int size)
> +                    bool dst_is_lmem, u32 dst_offset,
> +                    bool src_is_lmem, u32 src_offset, int size)
>  {
>         const int ver = GRAPHICS_VER(rq->engine->i915);
>         u32 instance = rq->engine->instance;
>         u32 *cs;
>  
>         cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
> +
>         if (IS_ERR(cs))
>                 return PTR_ERR(cs);

Changes to emit_copy() above seem unrelated?
Also for the verbatim copy we need to adjust the compression flags in
the main copy blit.

>  
> @@ -620,6 +657,18 @@ static int emit_copy(struct i915_request *rq,
>         return 0;
>  }
>  
> +static int scatter_list_length(struct scatterlist *sg)
> +{
> +       int len = 0;
> +
> +       while (sg) {

Terminate loop if (sg_dma_len() == 0) ?

> +               len += sg_dma_len(sg);
> +               sg = sg_next(sg);
> +       };
> +
> +       return len;
> +}
> +
>  int
>  intel_context_migrate_copy(struct intel_context *ce,
>                            const struct i915_deps *deps,
> @@ -632,7 +681,10 @@ intel_context_migrate_copy(struct intel_context *ce,
>                            struct i915_request **out)

Perhaps add a parameter "verbatim" to indicate whether we want to do a
verbatim copy or not. That way we can differentiate between eviction
(verbatim) and migration (ordinary blit)?

>  {
>         struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
> +       struct drm_i915_private *i915 = ce->engine->i915;
> +       u32 src_sz, dst_sz, ccs_bytes = 0, bytes_to_cpy;
>         struct i915_request *rq;
> +       bool ccs_copy = false;
>         int err;
>  
>         GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
> @@ -640,9 +692,28 @@ intel_context_migrate_copy(struct intel_context *ce,
>  
>         GEM_BUG_ON(ce->ring->size < SZ_64K);
>  
> +       if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) {
> +               src_sz = scatter_list_length(src);
> +               dst_sz = scatter_list_length(dst);
> +
> +               if (src_is_lmem)
> +                       bytes_to_cpy = src_sz;
> +               else if (dst_is_lmem)
> +                       bytes_to_cpy = dst_sz;
> +
> +               /*
> +                * When there is an eviction of ccs needed, smem will have the
> +                * extra pages for the ccs data
> +                *
> +                * TO-DO: Want to move the size mismatch check to a WARN_ON,
> +                * but still we have some requests of smem->lmem with same size.
> +                * Need to fix it.
> +                */
> +               ccs_bytes = src_sz != dst_sz ? GET_CCS_BYTES(i915, bytes_to_cpy) : 0;
> +       }
> +
>         do {
> -               u32 src_offset, dst_offset;
> -               int len;
> +               u32 src_offset, dst_offset, copy_sz;
>  
>                 rq = i915_request_create(ce);
>                 if (IS_ERR(rq)) {
> @@ -682,27 +753,82 @@ intel_context_migrate_copy(struct intel_context *ce,
>                                 dst_offset = 2 * CHUNK_SZ;
>                 }
>  
> -               len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
> -                              src_offset, CHUNK_SZ);
> -               if (len <= 0) {
> -                       err = len;
> +               if (ccs_copy) {

This loop was hard to understand already before this patch. Could we
try to break out some loop functionality into separate functions?

Also, if I understand the flow correctly, we're first blitting all the
chunks of the main surface, and after that the CCS data? However, for
the control surface blit's indirect addressing of LMEM to work, I figure
*all* main surface LMEM pages for which we blit control data need to be
present in the CHUNK_SZ window VMA, which is only true for small
buffers. Hence we need to interleave main surface and CCS copies when
we need to split the main surface into chunks, perhaps something like

for_each_chunk() {

	disable_preemption();

	emit_pte(lmem);
	emit_pte(system);
	xy_fast_copy_blt();
	emit_pte(system_ccs_region); // Still use the system window for this
	tlb_flush();  // Flush the updated system ptes
	xy_ctrl_surf_copy_blt();

	enable_preemption();

}

And also check whether we need to do the ctrl_surface blit first
depending on blit direction (according to the docs).

Thanks,
Thomas


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Intel-gfx] [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj
@ 2022-03-03  8:04     ` Hellstrom, Thomas
  0 siblings, 0 replies; 25+ messages in thread
From: Hellstrom, Thomas @ 2022-03-03  8:04 UTC (permalink / raw)
  To: dri-devel, C, Ramalingam, intel-gfx; +Cc: Auld, Matthew

On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote:
> When we swap out a local memory object on a flat-ccs capable
> platform, we need to capture the ccs data along with the main
> memory, and restore it when swapping the content back in.
> 
> When an lmem object is swapped into an smem object, the smem object
> will have the extra pages required to hold the ccs data
> corresponding to the lmem main memory. So the lmem main memory is
> copied into the initial pages of the smem, and then the ccs data
> corresponding to it is copied into the subsequent pages. The ccs
> data is 1/256 of the lmem size.
> 
> Swapin happens in exactly the reverse order: first the lmem main
> memory is restored from the smem's initial pages, and then the ccs
> data is restored from the subsequent pages.
> 
> Extracting and restoring the CCS data is done through a special
> command called XY_CTRL_SURF_COPY_BLT.
> 
> v2: Fixing the ccs handling
> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c | 184 +++++++++++++++++++++-
> --
>  1 file changed, 167 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c
> b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 330fcdc3e0cf..73ac7382aeb6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -341,12 +341,9 @@ static int emit_no_arbitration(struct
> i915_request *rq)
>         return 0;
>  }
>  
> -static int emit_pte(struct i915_request *rq,
> -                   struct sgt_dma *it,
> +static int emit_pte(struct i915_request *rq, struct sgt_dma *it,
>                     enum i915_cache_level cache_level,
> -                   bool is_lmem,
> -                   u64 offset,
> -                   int length)
> +                   bool is_lmem, u64 offset, int length)

Above change seems unrelated?

>  {
>         bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
>         const u64 encode = rq->context->vm->pte_encode(0,
> cache_level,
> @@ -573,14 +570,54 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd,
> u64 src_addr, u64 dst_addr,
>         return cmd;
>  }
>  
> +static int emit_ccs_copy(struct i915_request *rq,
> +                        bool dst_is_lmem, u32 dst_offset,
> +                        bool src_is_lmem, u32 src_offset, int size)
> +{
> +       struct drm_i915_private *i915 = rq->engine->i915;
> +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> +       u32 num_ccs_blks, ccs_ring_size;
> +       u8 src_access, dst_access;
> +       u32 *cs;
> +
> +       GEM_BUG_ON(!(src_is_lmem ^ dst_is_lmem) ||
> !HAS_FLAT_CCS(i915));
> +
> +       ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
> +       WARN_ON(!ccs_ring_size);
> +
> +       cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
> +       if (IS_ERR(cs))
> +               return PTR_ERR(cs);
> +
> +       num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +                                   NUM_CCS_BYTES_PER_BLOCK);
> +
> +       src_access = !src_is_lmem && dst_is_lmem;
> +       dst_access = !src_access;
> +
> +       cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +       cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
> +                                     src_access, dst_access,
> +                                     mocs, mocs, num_ccs_blks);
> +       cs = i915_flush_dw(cs, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +       if (ccs_ring_size & 1)
> +               *cs++ = MI_NOOP;
> +
> +       intel_ring_advance(rq, cs);
> +
> +       return 0;
> +}
> +
>  static int emit_copy(struct i915_request *rq,
> -                    u32 dst_offset, u32 src_offset, int size)
> +                    bool dst_is_lmem, u32 dst_offset,
> +                    bool src_is_lmem, u32 src_offset, int size)
>  {
>         const int ver = GRAPHICS_VER(rq->engine->i915);
>         u32 instance = rq->engine->instance;
>         u32 *cs;
>  
>         cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
> +
>         if (IS_ERR(cs))
>                 return PTR_ERR(cs);

Changes to emit_copy() above seem unrelated?
Also for the verbatim copy we need to adjust the compression flags in
the main copy blit.

>  
> @@ -620,6 +657,18 @@ static int emit_copy(struct i915_request *rq,
>         return 0;
>  }
>  
> +static int scatter_list_length(struct scatterlist *sg)
> +{
> +       int len = 0;
> +
> +       while (sg) {

Terminate the loop if sg_dma_len() == 0?

> +               len += sg_dma_len(sg);
> +               sg = sg_next(sg);
> +       };
> +
> +       return len;
> +}
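
Something along these lines, perhaps (a minimal sketch, assuming a
zero sg_dma_len() marks the end of the DMA-mapped portion of the
table):

static int scatter_list_length(struct scatterlist *sg)
{
	int len = 0;

	/* Stop at the first unmapped entry rather than walking the
	 * whole table.
	 */
	while (sg && sg_dma_len(sg)) {
		len += sg_dma_len(sg);
		sg = sg_next(sg);
	}

	return len;
}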
> +
>  int
>  intel_context_migrate_copy(struct intel_context *ce,
>                            const struct i915_deps *deps,
> @@ -632,7 +681,10 @@ intel_context_migrate_copy(struct intel_context
> *ce,
>                            struct i915_request **out)

Perhaps add a parameter "verbatim" to indicate whether we want to do a
verbatim copy or not. That way we can differentiate between eviction
(verbatim) and migration (ordinary blit)?
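
A minimal sketch of that signature, assuming the flag is threaded
down from the TTM move path (the parameter name and the callers'
choice of value are suggestions only):

int
intel_context_migrate_copy(struct intel_context *ce,
			   const struct i915_deps *deps,
			   struct scatterlist *src,
			   enum i915_cache_level src_cache_level,
			   bool src_is_lmem,
			   struct scatterlist *dst,
			   enum i915_cache_level dst_cache_level,
			   bool dst_is_lmem,
			   bool verbatim, /* true: eviction/restore, false: migration */
			   struct i915_request **out);

Callers doing eviction/swapin could then pass verbatim = true, while
ordinary placement moves pass false and get a plain blit.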

>  {
>         struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
> +       struct drm_i915_private *i915 = ce->engine->i915;
> +       u32 src_sz, dst_sz, ccs_bytes = 0, bytes_to_cpy;
>         struct i915_request *rq;
> +       bool ccs_copy = false;
>         int err;
>  
>         GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
> @@ -640,9 +692,28 @@ intel_context_migrate_copy(struct intel_context
> *ce,
>  
>         GEM_BUG_ON(ce->ring->size < SZ_64K);
>  
> +       if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) {
> +               src_sz = scatter_list_length(src);
> +               dst_sz = scatter_list_length(dst);
> +
> +               if (src_is_lmem)
> +                       bytes_to_cpy = src_sz;
> +               else if (dst_is_lmem)
> +                       bytes_to_cpy = dst_sz;
> +
> +               /*
> +                * When eviction of the ccs data is needed, smem
> +                * will have the extra pages to hold it.
> +                *
> +                * TODO: Move the size mismatch check to a WARN_ON,
> +                * but we still get some smem->lmem requests with
> +                * matching sizes. That needs to be fixed first.
> +                */
> +               ccs_bytes = src_sz != dst_sz ? GET_CCS_BYTES(i915, bytes_to_cpy) : 0;
> +       }
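
(For scale, with the ccs data at 1/256 of the main surface as stated
in the commit message, the arithmetic here works out to something
like the following; the macro expansion is an assumption:)

	/* e.g. evicting a 64 MiB lmem object: */
	bytes_to_cpy = 64 * SZ_1M;                     /* main surface */
	ccs_bytes = GET_CCS_BYTES(i915, bytes_to_cpy); /* 256 KiB */
	/* so the smem side carries 64 MiB + 256 KiB worth of pages */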
> +
>         do {
> -               u32 src_offset, dst_offset;
> -               int len;
> +               u32 src_offset, dst_offset, copy_sz;
>  
>                 rq = i915_request_create(ce);
>                 if (IS_ERR(rq)) {
> @@ -682,27 +753,82 @@ intel_context_migrate_copy(struct intel_context
> *ce,
>                                 dst_offset = 2 * CHUNK_SZ;
>                 }
>  
> -               len = emit_pte(rq, &it_src, src_cache_level,
> src_is_lmem,
> -                              src_offset, CHUNK_SZ);
> -               if (len <= 0) {
> -                       err = len;
> +               if (ccs_copy) {

This loop was already hard to understand before this patch. Could we
try to break some of its functionality out into separate functions?
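
For instance, the source/destination PTE setup could move out into a
helper along these lines (name hypothetical, untested; the error
handling mirrors the existing loop):

static int emit_chunk_ptes(struct i915_request *rq,
			   struct sgt_dma *it_src, struct sgt_dma *it_dst,
			   enum i915_cache_level src_level,
			   enum i915_cache_level dst_level,
			   bool src_is_lmem, bool dst_is_lmem,
			   u32 src_offset, u32 dst_offset)
{
	int len, err;

	/* Bind up to one window's worth of source pages... */
	len = emit_pte(rq, it_src, src_level, src_is_lmem,
		       src_offset, CHUNK_SZ);
	if (len <= 0)
		return len;

	/* ...and exactly the matching amount of destination pages. */
	err = emit_pte(rq, it_dst, dst_level, dst_is_lmem,
		       dst_offset, len);
	if (err < 0)
		return err;
	if (err < len)
		return -EINVAL;

	return len;
}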

Also, if I understand the flow correctly, we're first blitting all the
chunks of the main surface, and only after that the CCS data? However,
for the control surface blit's indirect addressing of LMEM to work, I
figure *all* main surface LMEM pages for which we blit control data
need to be present in the CHUNK_SZ window VMA, which is only true for
small buffers. Hence we need to interleave main surface and CCS copies
whenever we need to split the main surface into chunks, perhaps
something like:

for_each_chunk() {
	disable_preemption();

	emit_pte(lmem);
	emit_pte(system);
	xy_fast_copy_blt();

	emit_pte(system_ccs_region); // Still use the system window for this
	tlb_flush();                 // Flush the updated system PTEs
	xy_ctrl_surf_copy_blt();

	enable_preemption();
}

Also check whether the ctrl_surface blit needs to be done first,
depending on the blit direction (according to the docs).
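
Modulo that ordering question, in terms of this patch's helpers the
per-chunk body could then read roughly as below (untested sketch,
shown for the eviction (lmem -> smem) direction; it_ccs iterating the
trailing CCS pages of the smem sgt, and the exact TLB flush, are
assumptions):

	len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
		       src_offset, CHUNK_SZ);
	if (len <= 0)
		goto out_rq;

	err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
		       dst_offset, len);
	if (err < len) {
		err = err < 0 ? err : -EINVAL;
		goto out_rq;
	}

	/* Main surface chunk first... */
	err = emit_copy(rq, dst_is_lmem, dst_offset,
			src_is_lmem, src_offset, len);

	/* ...then the CCS data for exactly this chunk, while its lmem
	 * pages are still bound in the window.
	 */
	if (!err && ccs_bytes) {
		err = emit_pte(rq, &it_ccs, dst_cache_level, false,
			       dst_offset, GET_CCS_BYTES(i915, len));
		if (err >= 0)
			err = emit_ccs_copy(rq, dst_is_lmem, dst_offset,
					    src_is_lmem, src_offset, len);
	}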

Thanks,
Thomas


^ permalink raw reply	[flat|nested] 25+ messages in thread


Thread overview: 25+ messages
2022-03-01 21:53 [PATCH v2 0/4] drm/i915/ttm: Evict and store of compressed object Ramalingam C
2022-03-01 21:53 ` [Intel-gfx] " Ramalingam C
2022-03-01 21:53 ` [PATCH v2 1/4] drm/i915/gt: Clear compress metadata for Xe_HP platforms Ramalingam C
2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
2022-03-03  7:39   ` Hellstrom, Thomas
2022-03-03  7:39     ` [Intel-gfx] " Hellstrom, Thomas
2022-03-01 21:53 ` [PATCH v2 2/4] drm/ttm: parameter to add extra pages into ttm_tt Ramalingam C
2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
2022-03-02 12:54   ` Thomas Hellström
2022-03-02 12:54     ` [Intel-gfx] " Thomas Hellström
2022-03-02 13:24   ` Christian König
2022-03-02 13:24     ` [Intel-gfx] " Christian König
2022-03-01 21:53 ` [PATCH v2 3/4] drm/i915/gem: Extra pages in ttm_tt for ccs data Ramalingam C
2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
2022-03-02 12:58   ` Thomas Hellström
2022-03-02 12:58     ` [Intel-gfx] " Thomas Hellström
2022-03-01 21:53 ` [PATCH v2 4/4] drm/i915/migrate: Evict and restore the flatccs capable lmem obj Ramalingam C
2022-03-01 21:53   ` [Intel-gfx] " Ramalingam C
2022-03-03  8:04   ` Hellstrom, Thomas
2022-03-03  8:04     ` [Intel-gfx] " Hellstrom, Thomas
2022-03-02  1:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/ttm: Evict and store of compressed object (rev2) Patchwork
2022-03-02  1:53 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-03-02  2:23 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-03-02  8:28 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-03-02 15:31 ` [Intel-gfx] [PATCH v2 0/4] drm/i915/ttm: Evict and store of compressed object Das, Nirmoy
